This post is the second in a series describing our experience using Microsoft Power BI for an internal dashboard project. In our first post, Creating A Power BI Dashboard, we discussed our dashboard needs, goals, and our approach. Now, we’ll dive into how to actually get the data with Power Query.
Our dashboard initially will report on billable and non-billable hours of our consultants, calculating revenue, margin, and utilization.
To build our dashboard, the first step is getting data out of our time tracking system, Harvest. Harvest presented 3 options:
Use Harvest Web API
But using the Harvest Web API presents some challenges. Power Query supports a varied and increasing number of data sources. However, anything not pre-built into the tool requires some custom coding. Microsoft Program Manager Kasper de Jonge blogged about getting Power Query to use the Twitter API with some OAuth hacking (with some limitations). I decided the first iteration of the dashboard, I wanted to spend effort on modeling the data, not wrestling with OAuth, not to mention the ramp up time to grok the Harvest API.
Run report, then choose exportoption (Excel, CSV)
This means running custom reports, picking the desired date range, and exporting the report as CSV or Excel. This option requires too many manual steps, which means more chances to make a mistake.
Export/download all organization data via CSV file through admin interface
This is the path we selected. It involves getting a full organization data export using the Harvest administrative page. While this means re-downloading much of the same data every week, our total file size is around 5 MB, so this is acceptable. For the initial dashboard iteration, we opted to forgo incremental data updates for simplicity.
Now that we we have our data source (obfuscated in this blog post for privacy), what do we do?
Read the data into, Excel shape it as needed, and load it into the Excel Data Model.
While Excel has a host of legacy data import options, each data source wizard has a different user interface with different transformation options. Power Query is the shiny tool that presents a unified user interface and an amazing amount of flexibility and transformative capabilities. It have an intuitive Office ribbon interface for getting the data, transforming it, then loading it into an Excel table, the Excel data model, or both. Each step is saved as a part of a single script in the Power Query Formula Language (code-named â€œMâ€). In addition, Power Query has an extensive and growing Power Query Formula Library that provides the data manipulation functionality. The relationship is analogous to a language like C# lets you create programs, but using the .NET Library greatly expands your capabilities.
Power Query has the following system requirements: Windows 7 and greater, Office 2010 Professional Plus, or Office 2013. For Office 2013, they offer a Power Query Public and Power Query Premium flavors, depending on your Office version. See the download page for full details.
– Download the Power Query add-in.
– Alternatively, you can use the (now in preview) Power BI designer (to be covered in a future post)
Once Power Query is installed, you’ll see the Power Query tab (you may need to go to File -> Options -> Add-Ins to enable it).
Power Query supports many different data sources, with more added regularly. The data source from Harvest is a CVS file. To load the data, we choose From File -> From CSV and browse to the CSV file. You can also paste in a URL to a file share or a Share Point document (which is where we eventually decided to store and update the file). Once the file is read, it opens up the Query Editor window.
When loading a data source, Power Query will scan the data and attempt to set headers and data types for each column. You will notice on the right-side of the screen shot, the Applied Steps section. Every time you interact with the data, the “step” is recorded, making it easy to follow complex transformations, and undo (delete) steps if needed.
For our file, we want to do the following:
– Ensure the data types are correct
– Derive new columns (for business keys)
– Remove unneeded columns
Most of the data types it gets correct, but based on the sampling it may suggest integer when we actually want currency. The two columns we need as currency are “Billable Rate” and “Billable Amount”. I highlight them both by Ctrl-click on each column header and choosing “Currency” from the Data Type menu.
Thinking ahead to our data model (covered in a future post), we will be modeling a traditional star schema. Meaning, some of the columns in this file will be loaded into separate tables keeping only a referring key in our fact table. We will generate the business keys for the future dimension tables, Project and Consultant.
For Project, we will concatenate the “Client” and “Project” columns. We click “Add Column”, give it the name ProjectKey and type in the formula
For Consultant, we will do the same thing, this time concatenating “Last Name” and “First Name” columns and titling it “ConsultantKey”. Unfortunately, Harvest does not provide us with a unique identifier for consultants, so the responsibility for preventing name clashes falls on the Harvest administrator when time comes to add new consultants into Harvest. For our purposes now, this was a satisfactory trade-off.
Now, we can remove unneeded columns. This can be accomplished by either Ctrl-click the columns to remove and choose “Remove Columns” from the toolbar, or Ctrl-click the columns we want to keep and choose “Remove Other Columns”.
As we keep transforming the data, more steps are added to the Applied Steps section. While the default names are helpful (for example “Added Custom” and “Added Custom1”), this can be improved upon by renaming them something more descriptive. You can right-click each step and select “Rename” from the menu. I chose to rename them “Add ProjectKey” and “Add ConsultantKey”.
At this point, you can view the entire script by clicking the “View” tab and choosing “Advanced Editor”. While on this tab, I also recommend clicking the “Formula Bar” option so you can see the formulas being created, even when just using the user interface.
Our script at this point:
That Was Easy
The next post in this series will show a few more advanced examples.