5.3 Getting Started With Power BI
Dashboarding tools are, for the most part, story-driven: Microsoft’s Power BI is one such tool. Among its main attributes, we note that its learning curve (while reasonably steep, as is the case with all new tools) is attenuated somewhat by its distinctly Microsoft-ish point-and-click functionality, and that it allows dashboards to be easily published and distributed on the web (either internally or externally).66 On the negative side of the ledger, Power BI does not play nicely with macOS and Linux (although there are workarounds).
5.3.1 Power BI Set-Up
Once Power BI has been downloaded and installed on a PC, we open it. The layout will look as follows:

Figure 5.18: The components of the default Power BI view.
We will explain what each of these components does as we go along: for now, we simply want to ensure that their locations on the screen are easy to find.
With older versions of Power BI, the layout might look like:

Figure 5.19: The components of the default (old) Power BI view.
5.3.2 Importing and Modeling Data
Let’s start by taking a look at the datasets that we are going to use in this section (we might use others as well, as the need arises).
The first dataset contains accounting transactions:
accounting.trans <- readxl::read_excel("data/DataSetAccounting.xlsx", sheet = 1)
knitr::kable(head(accounting.trans,10))
Accounting Control Number | Journal Voucher Type Code | Accounting Effective Date | Journal Voucher Item Amount | Project Identifier |
---|---|---|---|---|
5000085 | MC | 2018-03-01 | 151551.10 | PR007 |
5000086 | SA | 2018-03-02 | 84067.38 | PR009 |
5000087 | O&M | 2018-03-03 | 93526.46 | PR010 |
5000088 | MIC | 2018-03-04 | 53365.74 | PR011 |
5000089 | SA | 2018-03-05 | 68619.29 | PR012 |
5000090 | O&M | 2018-03-06 | 10035.78 | PR013 |
5000091 | MC | 2018-04-02 | 134484.51 | PR007 |
5000093 | SA | 2018-04-03 | 71770.03 | PR009 |
5000094 | O&M | 2018-04-04 | 46450.02 | PR010 |
5000095 | MC | 2018-04-05 | 55050.76 | PR011 |
accounting.code <- readxl::read_excel("data/DataSetAccounting.xlsx", sheet = 3)
knitr::kable(head(accounting.code,4))
Code | Description | Custom Sort |
---|---|---|
O&M | Operations and Maintenance | 1 |
MC | Major Capital | 2 |
MIC | Minor Capital | 3 |
SA | Salary | 4 |
The second contains information about projects and full-time equivalents working on these projects:
projects <- readxl::read_excel("data/DataSetProject.xlsx", sheet = 1)
knitr::kable(head(projects,10))
Project Identifier | Project Name | O&M Budget | Salary Budget | Major Cap Budget | Minor Cap Budget | FTE Budget |
---|---|---|---|---|---|---|
PR001 | Parks | 2500000 | 2000000 | 5000000 | 1000000 | 9.0 |
PR002 | Buildings | 5000000 | 4000000 | 5000000 | 5000000 | 6.0 |
PR003 | Emergency Response | 3000000 | 7000000 | 800000 | 3000000 | 6.0 |
PR004 | Office | 4000000 | 4000000 | 8000000 | 200000 | 12.0 |
PR005 | Roads | 5000000 | 2000000 | 2500000 | 1000000 | 7.0 |
PR006 | Science | 5000000 | 10000000 | 8000000 | 2000000 | 7.0 |
PR007 | Heritage | 1500000 | 5000000 | 3000000 | 1700000 | 9.0 |
PR008 | Celebration | 2000000 | 4000000 | 0 | 0 | 7.0 |
PR009 | Research | 5000000 | 1200000 | 10000000 | 500000 | 6.5 |
PR010 | Upgrades | 4000000 | 2000000 | 10000000 | 500000 | 10.0 |
projects.FTE <- readxl::read_excel("data/DataSetProject.xlsx", sheet = 2)
knitr::kable(head(projects.FTE,10))
Project Identifier | Date | FTE (- out + in) | Group-Level |
---|---|---|---|
PR006 | 2018-09-01 | 2.0 | AS-02 |
PR016 | 2019-07-01 | 2.0 | AS-02 |
PR023 | 2020-02-01 | 2.0 | AS-02 |
PR027 | 2020-06-01 | 3.0 | AS-02 |
PR007 | 2019-02-01 | 1.0 | AS-02 |
PR014 | 2019-09-01 | 1.0 | AS-02 |
PR024 | 2020-07-01 | -1.0 | AS-02 |
PR004 | 2019-01-01 | 0.5 | AS-02 |
PR008 | 2019-05-01 | 2.0 | AS-02 |
PR018 | 2020-03-01 | 0.5 | AS-02 |
Save both of these files in an easy-to-remember location on the computer.
So how does Power BI import data? It is important to realize that Power BI is really the combination of two products: Power Query and Power BI proper.67 Data is imported and manipulated at a low level through Power Query (this is done in the background using the Microsoft language M). Once this is completed and we close Power Query, the data is pushed into Power BI, where we manipulate it further using an Excel-like language called DAX.
M
M is Power BI’s data transformation engine; M Query is a mashup query language used to query data sets across multiple data sources. M contains commands to transform data and to return query and transformation results to the Power BI data model.68 Normally, we would use M Query to query data sources, and to clean and load data into Power BI.
M can be used for data preparation and data transformation tasks before the data is loaded into the model. For instance, instead of bringing 3 tables into Power BI, we could use M to remove unneeded columns and merge the tables together into a single table, which would then be loaded into the data model (see the sketch below). This can help improve the performance of Power BI once the data model is fully loaded.
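A minimal M sketch of this idea might look as follows (the query names Accounting_Transactions and Journal_Codes, and the merged-column name CodeInfo, are placeholders rather than names taken from the datasets above):

let
    // keep only the needed columns from two previously-staged queries
    Trans = Table.SelectColumns(Accounting_Transactions, {"Journal Voucher Type Code", "Journal Voucher Item Amount"}),
    Codes = Table.SelectColumns(Journal_Codes, {"Code", "Description"}),
    // left-outer merge on the shared code column...
    Merged = Table.NestedJoin(Trans, {"Journal Voucher Type Code"}, Codes, {"Code"}, "CodeInfo", JoinKind.LeftOuter),
    // ...then expand the nested table to surface the Description column
    Result = Table.ExpandTableColumn(Merged, "CodeInfo", {"Description"})
in
    Result

Only the single merged table Result would then be loaded into the data model.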
DAX
DAX (the Data Analysis eXpressions language) is Power BI’s analytical engine. It is a common language shared by SQL Server Analysis Services Tabular, Power BI, and Excel’s Power Pivot. Once the data is loaded in Power BI, DAX is used to create custom columns, tables, and/or measures (among others).
Unlike M, DAX has similarities to Excel functions, but it provides a much wider range of functionality (and more power) than they do. We will provide a deeper discussion of DAX at a later stage.
Importing Datasets
Typically, we might want to “tweak” the data before loading it into Power BI. We can accomplish this using the Power Query Interface:
open Power BI;
close the yellow “Hello” screen (we will return to this shortly);
in the “Home” ribbon at the top of the Power BI screen, there is a clickable region named “Get Data” – activate it to bring up several data format options:
Figure 5.20: Available data format options.
among all the options, select “Excel” (the format in which the datasets are available);
click on the file DataSetAccounting.xlsx, and select “Open”;

Figure 5.21: Options for importing data from an Excel spreadsheet.
The corresponding M code is:
= Excel.Workbook(File.Contents("data\DataSetAccounting.xlsx"), null, true)
click on the first TABLE (the one named “Accounting_Transactions”) without selecting the checkbox (yet) – this brings up a display of the table (clicking on the other options will bring up their displays);
select the “Accounting_Transactions” checkbox (don’t click on load yet!);
we will tweak the data before loading it in Power BI, so we edit the transformation by clicking on “Transform Data”.
Figure 5.22: Transforming the data before loading it in Power BI.
The Power Query screen will look as below:

Figure 5.23: Power Query screen components.
The corresponding M code is:
= Source{[Item="Accounting_Transactions",Kind="Table"]}[Data]
Refining Datasets
Using Power Query:
change the name of the table by double clicking (or right clicking) on the table name and editing to remove the underscore;
the M code at this stage (an automatic type-detection step) is:
= Table.TransformColumnTypes(Accounting_Transactions_Table,{{"Accounting Control Number", Int64.Type}, {"Journal Voucher Type Code", type text}, {"Accounting Effective Date", type date}, {"Journal Voucher Item Amount", type number}, {"Project Identifier", type text}})
change the name of the “Accounting Effective Date” column to “Effective Date”;
do the same and rename “Journal Voucher Item Amount” to “Item Amount” (note the new step in the query step box);
the corresponding M code is:
= Table.RenameColumns(#"Changed Type",{{"Accounting Effective Date", "Effective Date"}, {"Journal Voucher Item Amount", "Item Amount"}})
remove the observations with dates in March 2018 in the “Effective Date” column by clicking on the drop-down arrow by the column header, selecting “Date Filters”, then “After”, then selecting March 31, 2018, and pressing “OK”;
The corresponding M code is:
= Table.SelectRows(#"Renamed Columns", each [Effective Date] > #date(2018, 3, 31))
click on “Close and Apply” and load Power BI with the refined data;
save the .pbix file with an appropriate name (say, Test.pbix), and in an appropriate location.
All the steps can be re-traced in the Power Query APPLIED STEPS panel:

Figure 5.24: Re-tracing the Power Query steps.
Let’s repeat this procedure using the same spreadsheet, but adding the “Journal Voucher Type Code” tab this time (remember to click on “Transform Data” to verify that the data looks as it should); change its name to “XREF Journal Code”. In this context, “XREF” stands for “cross-reference”, a table of observations that “explain” the primary data on which we evaluate and build charts. It is a good Power BI practice to name all cross-reference tables starting with the string “XREF”; this will allow us to easily group them together.69

Figure 5.25: Loading the second table (don’t forget to press the Close and Apply button).
Linking Datasets
We now have two tables. We will use Power BI to link them (assuming that there is a natural way to do so, i.e., that they have one column in common):
click on the “Model” button in Power BI (see Figure 5.18 for a location reminder);
two tables appear, but they are not yet linked:
Figure 5.26: Two unlinked tables in the data model space.
left-click on Code in the “XREF Journal Code” table, drag across to Journal Voucher Type Code in the “Accounting Transactions” table, and release to see the new connection in the data model:

Figure 5.27: Two linked tables in the data model space.
Double clicking the connection/line between the tables brings up the relationship editor:

Figure 5.28: The relationship editor.
The greyed-out columns show which variables are connected in the data model, while the cardinality describes how the two tables are related:
one-to-many / many-to-one,
one-to-one, or
many-to-many (which is generally not recommended, for a variety of reasons).70
The cross-filter direction is the direction in which Power BI applies a filter from one table to another. We leave it as “Single” for now, but note that this could be modified if required.
We are now in a position to create calculations and (most excitingly) charts.
5.3.3 Calculations and Charts
In Excel, formulas are entered in cells; in Power BI, the corresponding quantities are independent calculations made using the DAX language.
DAX is a collection of functions, operators, and constants, which are used to create new data entities (measures, columns, tables); in effect, DAX helps to create new information from data already found in the model (it is a form of data reduction) [100].
Calculated Columns, Measures, and Tables
Consider the following typical DAX formula:
Total Amount = sum('Accounting Transactions'[Item Amount])
In this example:
the measure name is
Total Amount
;the operator
=
indicates the beginning of the formula – when calculated, the formula returns a single result;the function (in this case,
sum
) contains a lone argument, which is contained inside the parentheses( ... )
;the table from which the data is taken is found just to the left of the brackets
[ ... ]
(in this case,Accounting Transactions
), andthe referenced column from the table is found inside the brackets
[ ... ]
(in this case,Item Amount
).
The single quotation marks around the table name are needed because the name contains a space character; quotation marks are never needed around the column name, even if it contains a space character, since column names are already enclosed in brackets. As an example, the following would be acceptable (assuming these were the actual table and column names):
Total Amount = sum(Accounting_Transactions[Item Amount])
In Excel, calculations are added to a cell (to be sure, we can also add a calculation to every cell in a range, but it is the same calculation, possibly using different inputs).
In Power BI, a calculation can “live” in any one of three places (not including Power Query, but this is a topic for a later section).
A calculation can create:
a new data column (a calculated column);
a new data value calculated solely to be used in a visualization (a measure), or
a new data table (a calculated table).
These are accessed via the Modeling ribbon (see below).

Figure 5.29: Calculation buttons found in the Modeling ribbon.
Let us take a quick look at each of the types.
Calculated columns are added to an existing data table; they are evaluated for each row in the table, immediately after ‘Enter’ is hit to complete the formula. These columns are saved into the model, so they take up space in the model (a calculated column on a table with 1M rows adds 1M new data points to the table). Calculated columns are often used to filter on their result, rather than simply as a calculated result (in slicers, for instance).
Calculated measures, on the other hand, are only evaluated when used in a visualization (or when the visualization is rendered); they are therefore not saved anywhere (aside from the cache). This helps explain why measures are generally preferred to calculated columns, but there are trade-offs related to performance (report runtime vs. pre-processing), storage space, and what type of expressions can be used in the formula.
Finally, calculated tables are new tables of data, calculated from existing tables in the data model. As is the case with calculated columns, they are evaluated for each row in the source table, immediately after ‘Enter’ is hit to complete the formula, and they are saved into the model, which means that they take up space. They can be linked to any other table in the data model.
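To make the distinction concrete, here is a hedged DAX sketch of each type (the names are illustrative, not taken from what follows):

// calculated column: evaluated once per row of the table, and stored in the model
Is Salary = IF('Accounting Transactions'[Journal Voucher Type Code] = "SA", "Salary", "Other")

// calculated measure: evaluated only when a visualization requests it; nothing is stored
Total Amount = SUM('Accounting Transactions'[Item Amount])

// calculated table: a new table derived from an existing one, and stored in the model
Salary Transactions = FILTER('Accounting Transactions', 'Accounting Transactions'[Journal Voucher Type Code] = "SA")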
Columns vs. Measures vs. Tables
When do we use calculated columns over measures? Apart from space considerations,71 it is often the case that both are reasonable options. In most situations, the computational needs make the determination.
For instance, we would use calculated columns to accomplish the following tasks:
place the calculated results in a slicer, or in the axis of a chart, or use the result as a filter condition in a DAX query;
define an expression that is strictly bound to the current row (for example, Cost * Volume does not work on an average or on a sum of two or more columns);
categorize text or numbers (for example, a range of values for a measure, or a range of customer ages, such as 0–18, 18–25, etc.).
On the other hand, we would use calculated measures whenever a resulting calculation needs to be displayed in the values area or in the plot area of a chart, such as when calculating the cost percentage on a certain subset of the data. As measures can incorporate data from different tables, they do not “belong” to a specific table in the same manner as a calculated column does. Consequently, it is a good practice to create a separate “home” (table) for calculated measures.
We only discuss calculated tables briefly for the moment. They could prove useful as intermediate steps in data analytic endeavours, or more simply when:
the desired calculated result is a matrix of some form;
data tables need to be summarized;
data needs to be isolated from other tables;
new relationships need to be created in the data model.
Examples of Calculated Columns and Measures
As a first example of a DAX expression, we divide the “Item Amount” column by 1000 to obtain a new calculated column that represents the item amount in units of $1,000:
click on the data button to access the table view (note the available tables and columns in the Fields panel, at the right of the screen);
select the “Accounting Transactions” table in the Fields panel (by clicking on it) to produce a new ribbon item called “Table tools”;

Figure 5.30: Accessing the table tools.
- click on “New Column” (you can also access this functionality from the “Home” ribbon);

Figure 5.31: Accessing the column tools.
- Type in the formula below in the space reserved to that effect, then hit “Enter” (the auto-complete functionality will not appear when cut-and-pasting):
Item Amount $k = divide('Accounting Transactions'[Item Amount], 1000, 0)
We can see a new column in the Fields panel:

Figure 5.32: Creating a new column with DAX.
We can do a bit of housekeeping to clean things up before we create a chart:
click on the column header for Item Amount $k (alternatively, select that variable from the Fields panel, on the right) to bring up the “Column tools” in the top ribbon;
the “Format” dropdown menu is found in the “Formatting” section of the “Column tools” ribbon; click on the menu and change the format to “Currency”;

Figure 5.33: Cleaning up the new column.
- to display the currency with the usual 2 decimal places, locate the dropdown box that says “Auto”, click on it, and change the value to “2” (this can also be done for the original Item Amount column).

Figure 5.34: Formatting the number of decimal places.
At this point, we can start using Power BI for … well, what it’s supposed to be used for: namely, charts. We create a simple display using the new column:
access the “Report” screen;
in the Visualizations panel, select the “Stacked Bar Chart” icon (the first icon in the list);
from the “XREF Journal Code” table (in the Fields panel), drag the Description column onto the “Y-Axis” field;
from the “Accounting Transactions” table, drag the Item Amount $k column onto the “X-Axis” field.
The resulting chart looks like the one below:

Figure 5.35: The first Power BI chart!
The quantity that is displayed on the \(x\)-axis is in fact the sum of Item Amount $k for all items with the same Description in the “Accounting Transactions” table; that is the default statistic.
But it does not have to be the sum; right-clicking on the “X-Axis” menu brings up the other statistics.72
For instance, if we select the average Item Amount $k per row in the “Accounting Transactions” table, we get the following chart:

Figure 5.36: A different Power BI chart, using the same underlying data.
We can also format the aesthetics of the chart. This can be accomplished by first selecting the chart (simply by clicking on it), and then pressing on the “Format your visual” icon in the Visualization panel (the icon with a paintbrush and a chart). This will bring up a number of items, the exact listing depending on the chart type (this is another important thing to explore).
For instance, we can modify the colour of each bar in the chart through the “Bars” option.

Figure 5.37: Changing the aesthetics of the Power BI chart.
We can easily modify the chart by selecting a different chart type in the list: for instance, a donut chart.

Figure 5.38: Another Power BI chart!
Let us now build a measure. The best practice in such a case is to create a “Measure Table” in which the measure can reside:
- in the “Home” ribbon, click on the “Enter Data” icon (in the “Data” section);

Figure 5.39: On the way to a calculated measure.
ignore everything except the “Name” field at the bottom (left);
override the name “Table” with “_Measures” (making sure to include the underscore _);

Figure 5.40: On the way to a calculated measure.
- click on “Load” – the measure table appears at the top of the list in the Fields panel (because of the underscore).

Figure 5.41: On the way to a calculated measure.
Next, we populate that table with a measure that once again uses the DIVIDE function:
click on the new “_Measures” table in the Fields panel (the top ribbon will change to “Table tools”);
click on “New measure”;

Figure 5.42: Creating a new measure.
- start to TYPE the following (do not cut and paste):
Item Amount $k Measure = DIVIDE(item ...
Power BI is not completing the string in the same way as it did when we created a calculated column – why is that the case?

Figure 5.43: Trouble in paradise?
The following would provide auto-complete options, however:
Item Amount $k Measure = DIVIDE(SUM('Accounting Transactions'[Item Amount]), 1000, 0)

Figure 5.44: Trouble in paradise?
By default, measures perform an operation on an entire column, not row-by-row (there are exceptions, which will be covered at a later stage). In this case, we could also DIVIDE the average, max, min, standard deviation, etc. by 1000;73
- complete the measure formula and create a chart from it as we did for the calculated column (note the icon next to the measure in the Fields panel, on the right).

Figure 5.45: A chart created from a measure.
Mathematical and Logical Operations
In the preceding text, we showed a standard use of SUM and DIVIDE; the expected standard arithmetic expressions (SUM, DIVIDE, PRODUCT, MAX, MIN, AVERAGE, etc.) are all available in DAX.
Most arithmetic expressions are straightforward, but some are very specific, such as the DIVIDE operation. It is easy to divide manually using a / (we all do it, and most of the time no problems will arise), but DIVIDE has a built-in failure option that provides an alternate result (such as 0 for a “divide by 0” error):
DIVIDE(<numerator>, <denominator> [,<alternateresult>])
It is worth spending a bit of time playing around with these expressions to determine their full functionality.
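As an illustration, here is a hedged sketch of a measure that takes advantage of the alternate result (the measure name is ours, not one used elsewhere in this section):

// returns 0 (instead of an error) if the table happens to contain no rows
Avg Amount per Transaction = DIVIDE(SUM('Accounting Transactions'[Item Amount]), COUNTROWS('Accounting Transactions'), 0)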
We finish this section on DAX by introducing logical operations, such as the AND or the OR operation. There are multiple ways of achieving this; we use a simple IF statement to select a value when a logical condition is met.
As an example, assume we want to flag a transaction when the Item Amount value is above $100,000. We do this by adding a new calculated column with the following logic:
if the Item Amount value is equal to or greater than $100,000, then we add the word “Check”;
if the value is below $100,000, then we leave the value in the new column blank.
This can be accomplished as follows:
click on the “Accounting Transactions” table (in the Fields panel);
click on “Column Tools”, then “New Column” (in the top ribbon);
type in the following formula:
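a DAX expression along the following lines implements the stated logic (the calculated column name Item Check is a placeholder):

Item Check = IF('Accounting Transactions'[Item Amount] >= 100000, "Check", BLANK()) //flag large transactions; leave blank otherwise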
- press “Enter” to complete the calculation.
We can now create charts using this new column:

Figure 5.46: A chart using the result of a logical computation.
Filtering and Slicing
As we can see in the previous examples, we calculate measures and columns by performing an operation on an entire data column. What if we only want to do this on a subset of the data?
Here are two ways to do this:
build the filter into the calculation itself;
filter the result using a slicer or using the Power BI built-in filter function.
We are going to do an example of each.
First, we build a filter directly into our calculations, using probably THE most useful DAX operation (“CALCULATE”).
Suppose we want to add the dollar amounts, but only those that arise from salaries (in the “Accounting Transactions” table):
click on the “_Measures” table (in the Fields panel);
create a new measure, and type the following at the prompt:
As a quick aside, we can include a carriage return in a formula by holding down “Shift” and hitting “Enter”; it is good practice to use carriage returns to ensure that measures and columns are easy to read. Another best practice is to comment code, which is done in Power BI by typing 2 forward slashes //, as below:
Total Salary = CALCULATE(SUM('Accounting Transactions'[Item Amount]), //this is a comment
'Accounting Transactions'[Journal Voucher Type Code]="SA") //another comment
We can easily create three new measures, Total Major Capital, Total Minor Capital, and Total O&M, corresponding to Journal Voucher Type Codes MC, MIC, and O&M, respectively (remember to format the measures again), and build a clustered bar chart from them; the first of these is sketched below.
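Total Major Capital = CALCULATE(SUM('Accounting Transactions'[Item Amount]), //sum of the amounts...
    'Accounting Transactions'[Journal Voucher Type Code]="MC") //...restricted to major capital transactions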
This is not the only way to do this, of course (we created the chart in a previous section); the nice thing about using measures is that they are easy to re-order (by changing the order in the “X-Axis” field).
We can also filter visualizations by using slicers. In fact, the default setting in Power BI allows a user to click on any chart, which will in turn filter the other charts.
We set up a simple slicer using “Project Identifier” by clicking on the slicer icon in the Visualization panel, and then dragging the “Project Identifier” column into the “Field” box.

Figure 5.47: Using a slicer on a chart.
The chart can be updated for various combinations of project identifiers.

Figure 5.48: Chart for one specific project.
We can also filter visualizations by using the Filter panel: click on the desired chart and expand the filter pane, then drag the column to use as a filter in the corresponding slot.

Figure 5.49: Chart for one specific project, filtered via the Filter panel.
5.3.4 Organizing Data
Organizing the available data is a critical part of building reports and dashboards. Fortunately, Power BI has some powerful built-in capabilities to accomplish this task.
Hierarchies
Hierarchies provide a way to order data levels. They allow us to summarize data in different ways.
When a date is imported in Power BI, a date hierarchy is automatically created:
Year > Quarter > Month > Day
For instance, the hierarchies created in Exercise 10 can be navigated up and down, as shown below:

Figure 5.50: Illustration of the hierarchical structure of dates in Power BI.
It is a very simple process to create a custom hierarchy in Power BI: simply drag a column on top of another. As an example:
enter the report view;
right-click on “Project Identifier” (or click on the 3 dots ...) and select “Create Hierarchy”;
right-click on “Journal Voucher Type Code” (or click on the 3 dots ...) and select “Add to Hierarchy > Project Identifier Hierarchy”;

Figure 5.51: Building a custom hierarchy.
- create a bar chart and add the hierarchy to the X-Axis and “Item Amount” to the values.

Figure 5.52: Bar chart with a custom hierarchical structure (two levels: projects and codes).
We can also use hierarchies with matrices and arrays, as in the example below:
- insert a matrix on the canvas (use the icon in the Visualization panel);

Figure 5.53: Matrix icon in the visualization panel.
add the hierarchy to the “Rows” field;
add “Item Amount” to the “Values” field;
open the “Effective Date” Hierarchy and drag the “Year” to the “Columns” field, in essence creating a pivot table.

Figure 5.54: Pivot tables in Power BI.
Groups
We use groups when we want to categorize or classify similar rows of data in a table.
There is a built-in Power BI function to accomplish this for two types of data:
grouping text or categories (such as grouping all EX equivalent job categories together, say), and
grouping or “bucketing” numerical intervals (such as creating an axis where we can count the number of items that fall in a particular set of ranges: 0-100, 101-200, 201-300, > 300).
Neither approach is perfect, but 90% of new users’ requirements will be covered; we will discuss more complex scenarios at a later stage.
Let us start with an example where we group categorical data via the following steps:
in either the Data or Report views, click on the “Accounting Transactions” table in the Fields panel;
click on the “Project Identifier” column and either click on “Data groups” in the “Column tools” menu or on the 3 dots ... next to the “Project Identifier” name in the Fields panel, and select “New group”;
in the list called “Ungrouped Values”, click on PR001, hold down SHIFT and click on PR010 to select that block (CTRL also works for individual items);

Figure 5.55: Groups dialog box (categorical).
click on “Group” and rename the Group in the “Groups and members” field to “Group 1” (double-click on the concatenation of all names to edit the group name);
repeat the process for PR011 – PR020 (Group 2) and PR021 – PR030 (Group 3);

Figure 5.56: New groups: Group 1, Group 2, Group 3.
- Hit “OK”.

Figure 5.57: Project Identifier grouped variable.
There is now a new column in the “Accounting Transactions” table, displaying the Project Identifier group associated with each transaction (row). This column can be used in a chart axis, slicer, filter, or as data (for count operations only, however).74

Figure 5.58: New variable in the Accounting Transactions table.
Next, we provide an example that illustrates how to group numerical data into intervals:
in either the Data or Report views, click on the “Accounting Transactions” table;
click on the “Item Amount $k” column and, as per the previous example, either click on “Data groups” in the “Column tools” menu or on the 3 dots ... next to the “Item Amount $k” name in the Fields menu, and select “New group”;

Figure 5.59: Reaching the groups dialog box (numerical).
There are 2 grouping options for numerical data (see the “Group Type” dropdown menu):
lists work in the same way as the previous example where we were grouping categorical project identifiers;
bins are used to create “buckets” (intervals) of column values, which can either be determined using the “Size of Bins” or the “Number of Bins” parameters.
“Size of bins” is typically more useful: in this example, using “Bin count” = 50 yields a “Bin size” of $3828.27, which does not particularly roll off the tongue. Setting the “Bin size” to something sensible (say, $10,000) will provide reasonable (and easily interpretable) intervals.
Figure 5.60: Groups dialog box (numerical: list and bins).
When we do so, we see a new column appear in the “Accounting Transactions” table, “Item Amount $k (bins)” (the name can be changed in the Group dialog, if needed).
We create a histogram with this new column:
In the report view, add a Stacked Column Chart;
drag the “Item Amount $k (bins)” column to the X-axis field;
drag the “Item Amount $k (bins)” column to the Y-axis field (in the dropdown menu, make sure that “count” is selected rather than some other statistic);

Figure 5.61: Histogram using a grouped numerical variable (bin size = 10,000).
- edit the bin size (see below) to obtain coarser or finer histograms.


Figure 5.62: Histogram using a grouped numerical variable (bin size = 20,000, left; 5000, right).
Custom Sorting
We can click anywhere on a Power BI chart canvas and sort by either axis, or by the legend (ascending or descending, numerically or alphabetically). Some examples are shown below:



Figure 5.63: Various ways of sorting a chart.
We can also sort according to values provided in another column. In the “XREF Journal Code” table, there is a column named “Custom Sort”.

Figure 5.64: Custom sort order for journal codes.
We can sort the chart according to the order shown in this column as follows:
click on the “Description” column name in the Fields panel;
in the “Column tools” menu, click on “Sort By Column” and select “Custom Sort”:

Figure 5.65: Custom sort order for journal codes.
If we change the custom sort order by swapping the values around in the Excel spreadsheet and hitting “Refresh”, the chart will update with the new sort order.
There is another way to sort charts, of course. As an illustration, suppose that we create a chart with some columns from the “Accounting Transactions” table and “Journal Voucher Type Code” on the axis, while keeping the custom sort order. This could come into play for a variety of reasons (such as needing to pull in a value from another table to do row-by-row calculations), so let us look at how this could be achieved using programmatic means.
This can be achieved using the DAX function RELATED:
click on the “Accounting Transactions” table and click on “New Column” in the “Table tools”;
type in the following DAX code at the prompt:
Custom Sort Order = RELATED('XREF Journal Code'[Custom Sort])
the “RELATED” operation looks at all connected tables;
we specify the column in the connected table we want to reference;
once selected, the code copies the data to the new column.
In essence, this produces results much as “VLOOKUP” would in Excel.
5.3.5 Data Wrangling
According to Wikipedia:
“Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. The goal of data wrangling is to assure quality and useful data. Data analysts typically spend the majority of their time in the process of data wrangling compared to the actual analysis of the data. The process of data wrangling may include further munging, data visualization, data aggregation, training a statistical model, as well as many other potential uses. Data wrangling typically follows a set of general steps which begin with extracting the data in a raw form from the data source, “munging” the raw data (e.g., sorting) or parsing the data into predefined data structures, and finally depositing the resulting content into a data sink for storage and future use.”
In this section, we look at a few high-level approaches to import and clean data before visualizing it. All of this is done in Power Query, but it must be noted that some of these activities could also be completed in Power BI.
The main reason why we suggest doing the majority of data wrangling in Power Query is that only the post-processed data is loaded into Power BI, making the data set more streamlined. If we wrangle the data in Power BI using DAX, then the entire data set has to be uploaded first, which is typically less efficient.
In all of the following examples, then, we need to access the data through Power Query; this is accomplished by pressing the “Transform Data” button in the “Home” menu.
Removing Rows
We can remove rows in Power Query by using the column filter (this is very similar to the “Filter” function in Excel). As an example:
in the “Accounting Transactions” query, click on the “Effective Date” column;
to remove all transactions taking place after 01-Apr-2018, click on the small down arrow next to the column name, then on Date Filters > Before…

Figure 5.66: Removing rows using filters.
- enter the appropriate filter in the dialog box and press “OK”:

Figure 5.67: Removing rows using filters.
Those rows are now removed completely from the data set and will NOT be loaded into Power BI.
The second way to remove rows is to use the “Remove Rows” button in the “Home” ribbon:

Figure 5.68: Removing rows using the menu.
“Remove Top Rows” removes the top X rows from a table: this can prove useful when a data source (such as an Excel spreadsheet) contains non-data information at the top, say;
“Remove Bottom Rows” is identical to the above, except that it removes rows found at the bottom of a data source;
“Remove Alternate Rows” removes alternate rows in a pattern: this can prove useful when a data source contains data rows interleaved with subtotals, say;
“Remove Duplicates” removes all duplicate rows;
“Remove Blank Rows” removes all blank rows;
“Remove Errors” removes rows with errors (but this should only be used sparingly… it is preferable to fix errors).
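Each of these buttons simply records an M step in the query; here is a hedged sketch of a few of them (the step name Source is a placeholder for the query’s previous step):

= Table.Skip(Source, 3)          // Remove Top Rows (here, the top 3)
= Table.RemoveLastN(Source, 2)   // Remove Bottom Rows (here, the bottom 2)
= Table.Distinct(Source)         // Remove Duplicates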
Replacing Values
Another useful data wrangling trick is to replace values. As an example, if a transaction’s date is unknown for some reason, we may find an “N/A” in an observation’s date column. The presence of such an “N/A” makes it impossible to format that column as a date, which could “break” measures and charts that use this column.
Faced with this situation, we might decide to replace the “N/A” with a null (or empty) value:75
in Power Query, select the column in which you want to replace values;
click on the “Replace Values” button in the “Home” ribbon;
fill the “Value To Find” and “Replace With” fields;
the options to match the entire cell contents and/or to replace using special characters are also available in the “Advanced options” dialog (not available in every Power BI version).

Figure 5.69: Replacing values dialog box.
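The corresponding M step would look something like this (the previous step and column names are assumed):

= Table.ReplaceValue(#"Changed Type", "N/A", null, Replacer.ReplaceValue, {"Effective Date"})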
Splitting Columns
At times, a text data column may contain information which could be more useful if it were split into one (or more) separate columns. For example, if we have a column named “Group - Level” where the entries all consist of a group name and a level, it could be advantageous, from a visual hierarchy point of view, to split the entries into two separate columns: “Group” and “Level” (we suggest keeping the original column in the dataset, however).
In this example, we remove the project number from the project identifier:
select the “Project Identifier” column from “Accounting Transactions”;
right-click on the column heading and select “Duplicate Column”;
select the duplicate and click on the “Split Column” button and select “By Number of Characters”.

Figure 5.70: Splitting a column.
- Type in “2” in the “Number of characters” field (given the structure of the project identifiers) and then select “Once as far left as possible”:

Figure 5.71: Splitting a column.
- Rename the columns “Project Code” and “Project Number”.
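For reference, the recorded split step in M would be along these lines (the duplicated column’s name may differ on your machine):

= Table.SplitColumn(#"Duplicated Column", "Project Identifier - Copy", Splitter.SplitTextByPositions({0, 2}), {"Project Code", "Project Number"})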
There are a few different options when it comes to splitting columns:
“By Delimiter” – for example split at every instance of a comma;
“By Number of Characters” – see previous example;
“By Positions” – at fixed points in the string;
“By Lowercase to Uppercase” – e.g., abcDEF would split into abc and then DEF;
“By Uppercase to Lowercase” – as above but in reverse;
“By Digit to Non-Digit” – e.g., 123ABC would split into 123 and ABC;
“By Non-Digit to Digit” – as above but in reverse.
Trimming and Cleaning
Often, we need to import data that comes with leading and/or trailing white spaces on some records. We can trim such records by selecting any text column, pressing the “Format” button in the “Transform” menu, and finally selecting “Trim”.

Figure 5.72: Trimming a text column.
NOTE: the “Trim” function sometimes fails to achieve the desired effect, as there are two types of blank spaces in text files: a regular space (character code 32) and a non-breaking space (character code 160). The latter will not be removed using this procedure. Instead, we must use a special character search and replace the tricky strings (we will discuss this at a later stage).
At times, data contains unprintable (or non-printable) characters: carriage returns, tabs, line breaks, etc.76 To remove these, simply perform the steps as per the previous example, but select “Clean” from the options.
If the data contains a lot of text (e.g., comments or descriptions), it is a best practice to perform a “Clean” step in Power Query before importing the data into Power BI.
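Both operations are recorded as column transformations in M; a minimal sketch, assuming a text column named Comments and a previous step named Source:

= Table.TransformColumns(Source, {{"Comments", Text.Trim, type text}})   // Trim
= Table.TransformColumns(Source, {{"Comments", Text.Clean, type text}})  // Clean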
Other useful text formatting options include:
“lowercase” – changes all characters in a text column to lowercase;
“UPPERCASE” – changes all characters in a text column to UPPERCASE;
“Capitalize Each Word” – performs as described;
“Trim” – see previous example;
“Clean” – see previous example;
“Add Prefix” – adds a text value to the front of each string in a text column;
“Add Suffix” – adds a text value to the end of each string in a text column.
Appending Tables
In the rest of this section, we are going to work through an exercise that includes appending three tables together. The exercise will also use several items we have reviewed up to now.
We want to import new tables of data, each of which contains a “Project Identifier” column. We will create a cross-reference table using “Project Identifier” to link all the tables together in the data model.
This could be done manually (and in fact, most users might choose this approach), but it can become annoying when new projects are added to either table. Instead, we will dynamically create an automatic cross-reference table from all of the “Project Identifier” columns.
Let us follow these steps:
import BOTH tabs from the DataSetProject.xlsx file (Project_Tombstone and FTE_Count);
change the name of both tables to remove the underscore;
review the formatting of all columns in both tables.
In the next part of the exercise, we create a single XREF table containing a set of unique “Project Identifiers” that can be used to link all of the tables together:
- right-click the “Accounting Transactions” query and select the “Duplicate” option;
select the duplicate table and do the following:
rename it “XREF Project Identifier”;
select the “Project Identifier” column, right click and select “Remove Other Columns”;
Figure 5.73: How to remove all columns but one.
do the same for “Project Tombstone” query but rename it “DNL Project Tombstone”;
do the same for “FTE Count” query but rename it “DNL FTE Count” (DNL stands for DO NOT LOAD – we will come back to that later).
We now have 3 tables, each containing a single column, “Project Identifier”. NOTE: the column names in each table need to be IDENTICAL!
Next, we are going to “Append” these tables, in other words put them all together into a single table. We do so by first clicking on the “target” table (the one which will eventually be uploaded into Power BI). We could create a new table, but for illustration purposes, we are simply going to use the “XREF Project Identifier” table.
Click on “XREF Project Identifier”, then in the “Home” menu click on “Append Queries”. Two options will appear; select “Append Queries” (not “Append Queries as New”).

Figure 5.74: Location of the Append Queries button.
The Append dialog box asks whether we want to append only 2 or 3+ tables. Select the “Three or more” option; click on “DNL FTE Count” and then “ADD”, then repeat for “DNL Project Tombstone”.

Figure 5.75: The Append dialog box.
The resulting “XREF Project Identifier” query might not look any different, but the instructions now exist for Power Query to append all the tables together.
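Under the hood, the append is recorded as a single M step, something like the following (the first argument refers to the query’s previous step, whose name may differ):

= Table.Combine({#"Removed Other Columns", #"DNL FTE Count", #"DNL Project Tombstone"})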
We now apply the following data transformations:
trim the “Project Identifier” column;
clean the “Project Identifier” column;
select “Remove Rows > Remove Duplicates”;
select “Remove Rows > Remove Blank Rows”.
The column now contains a unique list of Project Identifiers based on all three tables. We are not quite finished, however.
Remember those tables we renamed with the DNL prefix? If we hit “Close and Apply” now, they will be loaded into the data set and create confusion. To ensure that they are not loaded, right-click the “DNL FTE Count” query, say, and uncheck “Enable Load”.77 Repeat these steps for the “DNL Project Tombstone” query.
Finally hit “Close and Apply”, then “Save” in the Power BI canvas.
When we hit “Close and Apply”, Power Query tries to link the tables together: this should show up in the modeling tab. The initial view is very messy, so we move the tables closer together.
As we can see, Power BI did not get all the connections right, so we delete the wrong table connections and create the correct ones.
Finally, note that it is good practice to place the XREF tables across the top of the model view, and the data tables below them: this tends to make the data model easier to read.
We can now create charts, measures, columns, slicers, etc. using “Project Identifier” from the new XREF table.
References
66. Not everyone favours the point-and-click approach, which can become byzantine on a dime; there is, thankfully, a programmatic approach (that uses M and DAX). We will discuss those in the forthcoming sections.
67. Power Query can also be found in Microsoft Excel and other software.
68. See Information Gathering for (slightly) more information on the topic [61].
69. And will go some way towards helping us remain sane when things get complicated!
70. See Getting Insight From Data for (slightly) more information on the topic [61].
71. Recall that measures are more computationally efficient than calculated columns and that they do not add additional data into the model.
72. For text data, the only options are “Count” and “Count (Distinct)”.
73. Even though the measure was created from a column that was formatted as currency, we still need to separately format the measure as currency.
74. To obtain an “Other” bucket, check the “Include other group” checkbox in the Groups dialog box.
75. Technically, we should really try to figure out what the actual transaction date was, but this is not always possible.
76. As an aside, the first 32 ASCII codes (0 – 31) are reserved as control codes for printers and other peripherals.
77. This step is known to be successful when the query name changes to italics.