5.1 Visualization Toolkit
With data displays, we try to highlight:
a relationship – show a connection or correlation between two or more variables, such as the impact of an aging population on health care;
a comparison – set some variables apart from others and display how they interact, such as the number of fans attending hockey games for different teams in a season;
a composition – collect different types of information that make up a whole and display them together, such as the various search terms that visitors used to land on your site, or how many visitors came from various sources (links, search engines, or direct traffic), and
a distribution – lay out a collection of related or unrelated information to see how it correlates (if at all), and to understand if there’s any interaction between the variables, such as the number of bugs reported during each month after a new software release.
Here are some examples of what some of those types of visualizations can look like; we will give a more thorough treatment shortly (comprehensive catalogues can be found in [6]–[8], [98], among others).

Figure 5.1: Decision Tree: classification scheme for the kyphosis dataset (personal file).

Figure 5.2: Histogram of reported weekly work hours (personal file).

Figure 5.3: Decision tree bubble chart: estimated average project effort (in red) overlaid on product complexity, programmer capability, and product count in NASA’s COCOMO dataset (personal file).

Figure 5.4: Association rules network: diagnosis network around COPD in the Danish Medical Dataset [99].

Figure 5.5: Classification scatterplot: artificial dataset (personal file).

Figure 5.6: Classification bubble chart: Hertzsprung-Russell diagram of stellar evolution (European Southern Observatory).

Figure 5.7: Time series: trend, seasonality, shifts of a supply chain metric (personal file).
5.1.1 Simple Text and Tables
One or two numbers to focus on.
Good at “setting the scene”.
Draws focus to an area of the report.

Figure 2.2: CAPTION
Tables interact with our verbal system, which means that we read them.
Tables are used to compare values: audiences will look for the rows that concern them.
Table design needs to blend into the background: the data should stand out, not the borders.
For dense tables/data, use alternating row colours.

Figure 3.33: CAPTION
Leverage colour to convey magnitude.
Use the saturation of a single colour with a legend (white = low, blue = high) rather than differentiation (different colours).
The numbers can be removed without altering the message.

Figure 3.34: CAPTION
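The single-colour saturation idea amounts to a linear interpolation between white and one hue. The sketch below is plain Python, not a Power BI or Excel feature; the function names and the choice of pure RGB blue as the "high" colour are assumptions made for illustration.

```python
def saturation_colour(value, lo, hi, high_rgb=(0, 0, 255)):
    """Map a value in [lo, hi] to an RGB tuple between white (lo) and high_rgb (hi)."""
    # Position of the value within the range, as a fraction in [0, 1].
    t = 1.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside the range
    white = (255, 255, 255)
    # Linear interpolation, channel by channel, from white towards the chosen hue.
    return tuple(round(w + t * (h - w)) for w, h in zip(white, high_rgb))


def shade_table(rows):
    """Background colour for every cell, using ONE colour scale shared by the whole table."""
    cells = [v for row in rows for v in row]
    lo, hi = min(cells), max(cells)
    return [[saturation_colour(v, lo, hi) for v in row] for row in rows]


table = [[12, 40, 7], [25, 3, 31]]  # made-up values
for row in shade_table(table):
    print(row)
```

Using one scale for the whole table is what makes cells comparable; a per-row scale would let the same shade mean different things in different rows.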
5.1.2 Scatterplots and Bubble Charts
Scatterplots show the relationship between 2 variables; bubble charts show the relationship between 3 variables.
Use average lines (dotted lines) to provide context.
Power BI offers far fewer options than Excel.
Consider using groupings to add clarity (e.g., colour gradients).

Figure 4.2: CAPTION
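Average lines are simply the means of the two plotted variables, and their value as context comes from splitting the plot into quadrants. A minimal sketch in plain Python; the function name `quadrants` and the quadrant labels are hypothetical.

```python
from statistics import mean


def quadrants(points):
    """Return the two average-line positions and each point's quadrant relative to them."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_avg, y_avg = mean(xs), mean(ys)  # where the dotted lines would be drawn
    labels = []
    for x, y in points:
        horiz = "right" if x >= x_avg else "left"
        vert = "upper" if y >= y_avg else "lower"
        labels.append(f"{vert}-{horiz}")
    return x_avg, y_avg, labels


pts = [(1, 2), (2, 5), (4, 1), (5, 8)]  # made-up data
x_avg, y_avg, labels = quadrants(pts)
print(x_avg, y_avg, labels)
```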

Figure 3.35: CAPTION
from: https://medium.muz.li/guide-to-data-visualization-comparison-part-1-678382ceef00

Figure 5.8: CAPTION
from: https://towardsdatascience.com/bubble-charts-why-how-f96d2c86d167

Figure 4.3: CAPTION
http://www.hockeyabstract.com/playerusagecharts

Figure 5.9: CAPTION
Colour + geometry allow us to plot (at least) 2 extra variables on a 2D scatter plot
May need to re-scale or bin the available data
A movie could be used to visualize an additional ordinal variable
Text can also be added to visualize an additional categorical variable
Works best when the chart is not too cluttered
A personal favourite – a good mixture of traditional and modern features
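Rescaling a variable for marker size is usually done so that the bubble's area, not its radius, is proportional to the value: doubling the radius quadruples the area and exaggerates differences. The sketch below assumes non-negative values and an arbitrary maximum area of 400; binning with `bisect` is one simple way to turn a continuous variable into colour groups. (matplotlib's `scatter` documents its `s` argument as marker size in points², so area-based values like these suit it.) The helper names are hypothetical.

```python
import math
from bisect import bisect_left


def bubble_areas(values, max_area=400.0):
    """Scale non-negative values so that marker AREA (not radius) is proportional to value."""
    top = max(values)
    return [max_area * v / top for v in values]


def radii(areas):
    """Radius of the circle with each area, for drawing routines that expect a radius."""
    return [math.sqrt(a / math.pi) for a in areas]


def bin_values(values, edges):
    """Index of the bin for each value; edges are sorted, inclusive upper bounds."""
    return [bisect_left(edges, v) for v in values]


sizes = [10, 40, 90]                  # made-up third variable
areas = bubble_areas(sizes)
print([round(a, 1) for a in areas])   # value 40 gets 4x the area of value 10
print([round(r, 2) for r in radii(areas)])
print(bin_values([3, 15, 70], edges=[10, 50]))  # e.g. low / medium / high colour groups
```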
5.1.3 Linegraphs and Sparklines
Line charts can show a single series or multiple series of data; they are particularly useful for time series.
Axis scale should be clear and relevant.
May wish to “anchor” the \(y\)-axis when using dynamic filters; otherwise, the graph can jump around as people interact with it.

Figure 4.5: CAPTION
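Anchoring the axis means computing its limits once, from the full dataset, rather than from whatever subset a filter leaves visible. The text sparkline below (a hypothetical helper built from Unicode block characters, not a Power BI feature) makes the difference visible: auto-scaling stretches any filtered series to full height, while anchored `lo`/`hi` values preserve its true level.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values, lo=None, hi=None):
    """Render values as Unicode blocks; passing lo/hi anchors the vertical scale."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    steps = len(BARS) - 1
    out = []
    for v in values:
        idx = 0 if span == 0 else round((v - lo) * steps / span)
        out.append(BARS[min(max(idx, 0), steps)])  # clamp values outside [lo, hi]
    return "".join(out)


all_series = {"north": [1, 2, 3, 4, 5, 6, 7, 8], "south": [1, 1, 2, 2]}  # made-up data
# Anchor: limits come from ALL series, not from the currently filtered one.
lo = min(min(s) for s in all_series.values())
hi = max(max(s) for s in all_series.values())

print(sparkline(all_series["south"]))          # auto-scaled: a small change looks huge
print(sparkline(all_series["south"], lo, hi))  # anchored to the full data range
```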
5.1.4 Bar Charts and Histograms
Very versatile and useful.
ALWAYS (?) have a zero baseline.
Use graph axis OR data labels. Axis for broad statements, data labels for more detail.
Horizontal charts are apparently easier to read (according to many studies).
Think about the ordering of categories.

Figure 4.6: CAPTION
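Two of the guidelines above, a zero baseline and a deliberate ordering of categories, can be illustrated with a tiny text-based horizontal bar chart. This is a sketch with made-up values and a hypothetical helper; it assumes non-negative values.

```python
def text_barh(data, width=20):
    """Horizontal text bar chart: sorted descending, bar lengths measured from zero."""
    items = sorted(data.items(), key=lambda kv: kv[1], reverse=True)  # order categories
    top = max(v for _, v in items)
    label_w = max(len(k) for k, _ in items)
    lines = []
    for label, value in items:
        bar = "#" * round(width * value / top)  # length proportional to value, from 0
        lines.append(f"{label.ljust(label_w)} |{bar} {value}")
    return "\n".join(lines)


print(text_barh({"Ottawa": 30, "Montreal": 60, "Toronto": 45}))
```

With a truncated baseline at 25, Montreal's bar (35 units) would be seven times Ottawa's (5 units) instead of twice as long, which is why the zero baseline matters.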
Stacked bar charts are designed for comparing totals, but can quickly become overwhelming.
Hard to sort / order.
Filtering is complicated in Power BI (what do you click on, and how does the chart respond when a filter is applied?).

Figure 5.10: CAPTION
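Each segment of a stacked bar starts where the previous one ends, so a stacked bar is a running sum of its parts. The sketch below (hypothetical helper, made-up data) shows why totals are easy to compare (they are the bar tops) while inner segments are not: they start at different heights in different bars.

```python
from itertools import accumulate


def stack_segments(parts):
    """(start, end) of each segment in one stacked bar: a running sum of its parts."""
    ends = list(accumulate(parts))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Made-up data: one stacked bar per team, segments = three ticket types
bars = {"Team A": [5, 3, 2], "Team B": [4, 6, 1]}
for team, parts in bars.items():
    segs = stack_segments(parts)
    print(team, segs, "total =", segs[-1][1])
```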
100% bar charts work well for visualizing portions of a whole on a scale from negative to positive.
Consistent baseline on far left and right
Easy to compare
Drawback: there is no measure of the relative magnitude of the data (each bar is scaled to 100%).
Research shows that horizontal charts are easier to process than vertical ones.

Figure 5.11: CAPTION
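A 100% bar normalizes each bar by its own total, which is exactly why magnitude disappears. For a negative-to-positive scale, the segments can then be shifted so the negative categories sit left of zero. In this sketch the counts are made up, the helpers are hypothetical, and splitting a neutral middle category evenly across zero is an assumed (if common) convention.

```python
def to_percent(counts):
    """Normalize counts so they sum to 100 (the size of the total is lost)."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def diverging_offsets(percents, n_negative, has_neutral=True):
    """Left edge of each segment: negatives left of 0, a neutral middle category straddling 0."""
    left = -sum(percents[:n_negative])
    if has_neutral:
        left -= percents[n_negative] / 2
    starts = []
    for p in percents:
        starts.append(left)
        left += p
    return starts


# Made-up responses: strongly disagree, disagree, neutral, agree, strongly agree
counts = [10, 20, 20, 30, 20]
pct = to_percent(counts)
print(pct)
print(diverging_offsets(pct, n_negative=2))
```

Note that `to_percent([1, 2, 2, 3, 2])` produces the same bar as counts ten times larger, which is the "no relative measure of magnitude" drawback in concrete form.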
Waterfall charts show how an initial value increases or decreases through a series of intermediate values.
Different colours can be used for increases and decreases.
Hard to remove elements without removing context (hard to declutter the chart).
Large increases / decreases look odd…

Figure 5.12: CAPTION
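Behind a waterfall chart is a running level: each intermediate bar floats between the level before and after its change, and the sign of the change can drive the colour. A sketch with hypothetical labels and made-up amounts, assuming positive start and end totals.

```python
def waterfall(start, changes):
    """Return (label, bottom, top, kind) for each bar; kind can be mapped to a colour."""
    bars = [("start", 0, start, "total")]
    level = start
    for label, delta in changes:
        new_level = level + delta
        kind = "increase" if delta >= 0 else "decrease"
        # The floating bar spans the old and new levels, whichever is lower first.
        bars.append((label, min(level, new_level), max(level, new_level), kind))
        level = new_level
    bars.append(("end", 0, level, "total"))
    return bars


for bar in waterfall(100, [("sales", 40), ("returns", -15), ("costs", -30)]):
    print(bar)
```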
5.1.5 Area Charts and Treemaps
Try to avoid: human brains have a hard time attributing a value to a 2D area…
… except for numbers with vastly different magnitudes.

Figure 3.37: CAPTION
Treemaps simultaneously show the big picture and make it easy to compare related items.
Easy to process data sub-categories.
Useful to prioritize “big ticket items” in dynamic dashboards.
Labeling and colouring are tricky.

Figure 3.38: CAPTION
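A treemap needs a layout algorithm that turns values into rectangles with proportional areas. The simplest is slice-and-dice, sketched below for one level: the rectangle is cut along its longer side in proportion to the values. Applying it again inside each rectangle (with sub-category values) gives the nested levels; more elaborate layouts, such as squarified treemaps, keep the rectangles closer to square. Names and data are illustrative only.

```python
def slice_layout(values, x, y, w, h):
    """One level of slice-and-dice: cut the rectangle along its longer side, in proportion to values."""
    total = sum(values)
    rects, offset = [], 0.0
    for v in values:
        frac = v / total
        if w >= h:   # wide rectangle: place pieces side by side
            rects.append((x + offset * w, y, frac * w, h))
        else:        # tall rectangle: stack pieces vertically
            rects.append((x, y + offset * h, w, frac * h))
        offset += frac
    return rects


budget = {"rent": 50, "food": 30, "travel": 20}   # made-up values
rects = slice_layout(list(budget.values()), 0, 0, 10, 5)
for name, (rx, ry, rw, rh) in zip(budget, rects):
    print(f"{name:7s} x={rx:.1f} y={ry:.1f} w={rw:.1f} h={rh:.1f} area={rw * rh:.1f}")
```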
5.1.7 Maps, Heat Maps, and Choropleths
Maps should be a course in their own right. In Power BI, you HAVE to have latitude/longitude data to avoid rendering problems.
Build a hierarchy of location details (city, province, country) to give drill down options.
For now, avoid the ArcGIS map (it does not support embedding).
The shape map is best for province-level summaries.

Figure 5.13: CAPTION
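A location hierarchy gives drill-down options because the same measure is available at every level of the path. The sketch below is plain Python with made-up figures (not Power BI's data model): it rolls city-level records up to province and country totals, keyed by the location path.

```python
from collections import defaultdict


def rollup(records):
    """Aggregate (country, province, city, value) records at every level of the hierarchy."""
    totals = defaultdict(float)
    for country, province, city, value in records:
        totals[(country,)] += value
        totals[(country, province)] += value
        totals[(country, province, city)] += value
    return dict(totals)


sales = [  # made-up records
    ("Canada", "Ontario", "Ottawa", 120.0),
    ("Canada", "Ontario", "Toronto", 300.0),
    ("Canada", "Quebec", "Montreal", 180.0),
]
totals = rollup(sales)
print(totals[("Canada",)])            # country level
print(totals[("Canada", "Ontario")])  # province level (one drill-down step)
```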
Most of us are quite familiar with geographical maps, so they tend to be easier to interpret.
Can produce a striking effect when the visualization shows unexpected results; this can, however, mask significant information (or the lack of it), or change the way you view things.

Figure 4.13: CAPTION

Figure 4.14: CAPTION

Figure 3.40: CAPTION
[Paul Breding]
Heat maps are ideal for looking at the relationship between 3 or 4 variables if one of them represents a percentage or a value within a set range (in order to fix the colour scale, for comparison purposes) and the others can act as categorical or size variables.
It is better to bin the data, even if the axis variables are continuous (this decreases the number of observations required for the chart to be useful).
Easier to read if colours are selected along natural colour gradients, such as Red \(\to\) Green or Red \(\to\) Yellow \(\to\) Green (although these are not ideal for colour-blind readers).

Figure 3.41: CAPTION
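Binning continuous axis variables and fixing the colour scale can be sketched as follows. This assumes the third variable is a percentage, so intensities are computed on a fixed 0 to 100 scale rather than the data's own min and max; bin edges are treated as inclusive upper bounds. Helper names and rows are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean


def heatmap_grid(rows, x_edges, y_edges):
    """Bin (x, y, value) rows into a 2D grid; each cell holds the mean value, or None if empty.

    Edges are sorted, inclusive upper bounds, so k edges give k + 1 bins per axis.
    """
    cells = defaultdict(list)
    for x, y, v in rows:
        cells[(bisect_left(x_edges, x), bisect_left(y_edges, y))].append(v)
    nx, ny = len(x_edges) + 1, len(y_edges) + 1
    return [[mean(cells[(i, j)]) if (i, j) in cells else None for i in range(nx)]
            for j in range(ny)]


def intensity(v, lo=0, hi=100):
    """Colour intensity on a FIXED 0-100 scale, so grids built from different data stay comparable."""
    return None if v is None else (v - lo) / (hi - lo)


rows = [(1, 1, 20), (2, 1, 40), (8, 1, 90), (8, 9, 60)]  # made-up (x, y, percentage)
grid = heatmap_grid(rows, x_edges=[5], y_edges=[5])
print(grid)
print([[intensity(v) for v in row] for row in grid])
```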
The Horizon of Pedestrian Risk: the rate of fatal traffic incidents involving pedestrians, each hour of the day, throughout the seasons of the year. The seasonal shift of our setting sun traces an arc of elevated risk – an echo of the curve of the Earth, itself (Note: ???). Source: Fatality Analysis Reporting System (NHTSA 2006-2010)
[J. Nelson, IDV Solutions]

Figure 3.42: CAPTION
[NBAsavant.com]

Figure 3.43: CAPTION
[left: A.E. McCann, right: author unknown]

Figure 4.15: CAPTION
5.1.8 Parallel Coordinates

Figure 4.16: CAPTION
[A. E. McCann, V. Cruz vs. L. Fitzgerald Peer Analysis]
5.1.9 Text Visualizations and Representations
For maximal impact, font size should be a function of frequency.
Typically used for univariate categorical data, but small multiples, cloud shape, word placement, colour, and hue could be used to integrate more variates.
The word placement and colour choice algorithms are “hidden”.
Could be used to answer authorship questions.

Figure 4.17: CAPTION
http://theeviljam.co.uk/2011/09/02/most-pirated-artists-2007-2010-word-cloud/
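Making font size a function of frequency can be sketched as a count followed by a rescaling. The linear mapping between a minimum and maximum point size is an assumption (other monotone mappings would also work), and the tokenizer is deliberately crude; placement and colouring, the "hidden" parts, are not attempted. The text is assumed to contain at least one word.

```python
import re
from collections import Counter


def font_sizes(text, min_pt=10, max_pt=48, top_n=5):
    """Font size (in points) for the top_n words, linear in frequency between min_pt and max_pt."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    hi, lo = counts[0][1], counts[-1][1]
    span = hi - lo
    return {
        word: max_pt if span == 0 else round(min_pt + (c - lo) * (max_pt - min_pt) / span)
        for word, c in counts
    }


text = "data data data data chart chart chart plot plot colour"  # made-up text
print(font_sizes(text, top_n=3))
```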

Figure 4.18: CAPTION

Figure 4.19: CAPTION

Figure 5.14: CAPTION

Figure 5.15: CAPTION
5.1.10 Trees, Dendrograms, and Network Diagrams

Figure 3.45: CAPTION

Figure 3.46: CAPTION

Figure 5.16: CAPTION

Figure 4.20: CAPTION
[P.Z. Meyers]
https://phys.org/news/2015-09-tree-life-million-species.html
5.1.13 Miscellaneous Charts
Chernoff faces are designed on the premise that people can easily understand facial expressions.
Can accommodate up to 18 or 36 facial feature variables.
Works well in some instances, but not in others:
most facial features are not ordinal;
faces are more than the sum of their parts, and
not all facial features carry emotions.

Figure 3.49: CAPTION
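The core of a Chernoff face is a mapping from each variable to one facial-feature parameter after rescaling all variables to a common range. The sketch below stops before drawing: it min-max rescales each column to [0, 1] and assigns it to a feature. The feature names and the "car" data are hypothetical. The sketch also exposes the weakness noted above: the assignment of variables to features is arbitrary, yet it changes how the face reads.

```python
def face_parameters(rows, feature_names):
    """Min-max rescale each column to [0, 1] and assign it to a facial feature."""
    columns = list(zip(*rows))
    if len(columns) > len(feature_names):
        raise ValueError("more variables than facial features")
    scaled_cols = []
    for col in columns:
        lo, hi = min(col), max(col)
        # A constant column gets the neutral midpoint instead of dividing by zero.
        scaled_cols.append([0.5 if hi == lo else (v - lo) / (hi - lo) for v in col])
    return [dict(zip(feature_names, vals)) for vals in zip(*scaled_cols)]


features = ["face_width", "eye_size", "mouth_curve"]  # hypothetical feature names
cars = [(150, 4.2, 30), (200, 3.1, 22), (100, 5.0, 35)]  # made-up observations
for params in face_parameters(cars, features):
    print(params)
```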