5.1 Visualization Toolkit

With data displays, we try to highlight:

  • a relationship – show a connection or correlation between two or more variables, such as the impact of an aging population on health care;

  • a comparison – set some variables apart from others, and display how those variables interact, such as the number of fans attending hockey games for different teams in a season;

  • a composition – collect different types of information that make up a whole and display them together, such as the various search terms that visitors used to land on your site, or how many visitors came from various sources (links, search engines, or direct traffic), and

  • a distribution – lay out a collection of related or unrelated information to see how it correlates (if at all), and to understand if there’s any interaction between the variables, such as the number of bugs reported during each month after a new software release.

Here are examples of what some of these visualizations can look like; we will give a more thorough treatment shortly (comprehensive catalogues can be found in [6], [8], [98], among others).

Figure 5.1: Decision Tree: classification scheme for the kyphosis dataset (personal file).

Figure 5.2: Histogram of reported weekly work hours (personal file).

Figure 5.3: Decision tree bubble chart: estimated average project effort (in red) overlaid on product complexity, programmer capability, and product count in NASA’s COCOMO dataset (personal file).

Figure 5.4: Association rules network: diagnosis network around COPD in the Danish Medical Dataset [99].

Figure 5.5: Classification scatterplot: two categories in an artificial dataset (personal file).

Figure 5.6: Classification bubble chart: Hertzsprung-Russell diagram of stellar evolution (European Southern Observatory).

Figure 5.7: Time series: trend, seasonality, shifts of a supply chain metric (personal file).

5.1.1 Simple Text and Tables

One or two numbers to focus on.

Good at “setting the scene”.

Draws focus to an area of the report.


Figure 2.2: CAPTION

Tables interact with our verbal system, which means that we read them:

  • they are used to compare values;

  • audiences will look for the rows that are relevant to them.

Table design needs to blend into the background:

  • the data should stand out, not the borders;

  • for dense tables, use alternating row colours.


Figure 3.33: CAPTION

Leverage colour to convey magnitude:

  • use the saturation of a single colour rather than differentiation (different colours);

  • with a legend (white = low, blue = high), numbers can be removed without altering the message.
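As a minimal sketch of the single-colour saturation idea, the snippet below maps each table value to a shade between white (low) and blue (high). The function name `saturation_colour`, the linear scale, and the regional values are illustrative assumptions, not part of the original notes.

```python
def saturation_colour(value, low, high):
    """Map a value in [low, high] to a hex colour between white and blue."""
    if high == low:
        t = 0.0
    else:
        # Clamp to [0, 1] so out-of-range values still give a valid colour.
        t = min(1.0, max(0.0, (value - low) / (high - low)))
    # White is (255, 255, 255) and pure blue is (0, 0, 255):
    # fade the red and green channels together, keep blue at full strength.
    fade = round(255 * (1 - t))
    return f"#{fade:02x}{fade:02x}ff"


# Hypothetical table values (assumed for illustration only).
table = {"North": 12, "South": 30, "East": 21}
low, high = min(table.values()), max(table.values())
cell_colours = {region: saturation_colour(v, low, high) for region, v in table.items()}
```

The resulting hex codes can be passed as cell background colours to whichever table or plotting tool is in use.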


Figure 3.34: CAPTION

5.1.2 Scatterplots and Bubble Charts

Show the relationship between 2 variables (scatterplot) or 3 variables (bubble plot):

  • use average lines (dotted lines) to provide context;

  • there are far fewer options in Power BI than in Excel;

  • consider using groupings to add clarity (e.g. colour gradients).
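The average lines mentioned above can be computed before plotting. The sketch below finds the mean of each variable (where the dotted lines would be drawn) and labels each point by the quadrant it falls in, which could then drive a colour grouping. The data points and the `quadrant` helper are hypothetical.

```python
from statistics import mean

# Hypothetical (x, y) observations.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (6.0, 7.0)]

# Positions of the vertical and horizontal average lines.
x_avg = mean(x for x, _ in points)
y_avg = mean(y for _, y in points)


def quadrant(x, y):
    """Label a point by its position relative to the two average lines."""
    horizontal = "right" if x >= x_avg else "left"
    vertical = "upper" if y >= y_avg else "lower"
    return f"{vertical}-{horizontal}"


groups = [quadrant(x, y) for x, y in points]
```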


Figure 4.2: CAPTION


Figure 3.35: CAPTION

from: https://medium.muz.li/guide-to-data-visualization-comparison-part-1-678382ceef00


Figure 5.8: CAPTION

from: https://towardsdatascience.com/bubble-charts-why-how-f96d2c86d167


Figure 4.3: CAPTION



Figure 5.9: CAPTION

Colour + geometry allow us to plot (at least) 2 extra variables on a 2D scatter plot

May need to re-scale or bin the available data

A movie could be used to visualize an additional ordinal variable

Text can also be added to visualize an additional categorical variable

Works best when the chart is not too cluttered

A personal favourite – a good mixture of traditional and modern features
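A sketch of the re-scaling and binning step for a bubble chart follows. It assumes a common convention that is not stated in the notes: the bubble area, rather than the radius, is made proportional to the value. The function names, bin edges, and labels are all hypothetical.

```python
import math
from bisect import bisect_right


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius chosen so that the bubble AREA is proportional to the value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    # Area grows with radius squared, so take the square root of the ratio.
    return max_radius * math.sqrt(value / max_value)


def bin_label(value, edges, labels):
    """Assign a value to a labelled bin; edges must be sorted ascending."""
    return labels[bisect_right(edges, value)]


# Hypothetical bins for a colour grouping: <10, 10 to <100, 100 and above.
edges = [10, 100]
labels = ["small", "medium", "large"]
example = [(v, bubble_radius(v, 400), bin_label(v, edges, labels)) for v in (4, 100, 400)]
```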

5.1.3 Linegraphs and Sparklines

Line charts can show a single series or multiple series of data; they are particularly useful for showing time series.

Axis scale should be clear and relevant.

May wish to “anchor” the \(y\)-axis if using dynamic filters; otherwise the graph can jump around as people interact with it.
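One way to anchor the axis, sketched here under assumed data, is to compute the axis limits once from the full dataset and reuse them for every filtered view. The `anchored_limits` helper, the padding value, and the regional series are hypothetical.

```python
def anchored_limits(all_values, padding=0.05):
    """Axis limits computed from ALL the data, with a small margin on each side."""
    lo, hi = min(all_values), max(all_values)
    margin = (hi - lo) * padding
    return lo - margin, hi + margin


# Hypothetical series, one per filter choice.
sales = {"East": [10, 12, 15], "West": [40, 42, 80]}

# Limits are derived from every series, not from the one currently displayed,
# so switching the filter from "East" to "West" leaves the axis unchanged.
all_values = [v for series in sales.values() for v in series]
y_limits = anchored_limits(all_values)
```

The same `y_limits` pair would then be applied to each chart, whatever the filter selection.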


Figure 4.5: CAPTION

5.1.4 Bar Charts and Histograms

Very versatile and useful.

ALWAYS (?) have a zero baseline.

Use graph axis OR data labels. Axis for broad statements, data labels for more detail.

Horizontal charts are apparently easier to read (according to many studies).

Think about the ordering of categories.


Figure 4.6: CAPTION

Stacked bar charts are designed for comparing totals, but can quickly become overwhelming.

Hard to sort / order.

Filtering is complicated in Power BI (what do you click on, and how does the chart respond when a filter is clicked?)


Figure 5.10: CAPTION

100% bar charts work well for visualizing portions of a whole on a scale from negative to positive

Consistent baseline on far left and right

Easy to compare

Drawback: the chart gives no indication of the absolute magnitude of the data

Research shows that horizontal is easier to process than vertical
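The computation behind a 100% bar chart can be sketched as follows: each row of counts is converted to percentage shares of its own total. The survey questions and counts are hypothetical. Note how both rows fill the same 0 to 100 range even though one has five times as many responses, which is the loss of magnitude mentioned above.

```python
def to_percent_shares(counts):
    """Convert raw counts for one bar into percentages of that bar's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute shares of an empty bar")
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical survey results on a negative-to-positive scale.
survey = {
    "Q1": {"Disagree": 10, "Neutral": 30, "Agree": 60},  # 100 responses
    "Q2": {"Disagree": 5, "Neutral": 5, "Agree": 10},    # 20 responses
}
shares = {question: to_percent_shares(c) for question, c in survey.items()}
```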


Figure 5.11: CAPTION

Waterfall charts show how an initial value increases or decreases through a series of intermediate values.

Different colours can be used for increases and decreases.

Hard to remove elements without removing context (hard to declutter the chart).

Large increases / decreases look odd…
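The chart described above (commonly called a waterfall chart) needs, for each intermediate value, the bottom and top of a floating bar. A minimal sketch of that bookkeeping is given below; the function name and the financial figures are hypothetical.

```python
def waterfall_segments(start, changes):
    """Return the floating bars of a waterfall chart and the final total.

    Each segment is (label, bottom, top, direction), where bottom and top
    are the running totals before and after the change, in ascending order.
    """
    segments = []
    running = start
    for label, delta in changes:
        new_total = running + delta
        bottom, top = min(running, new_total), max(running, new_total)
        direction = "increase" if delta >= 0 else "decrease"
        segments.append((label, bottom, top, direction))
        running = new_total
    return segments, running


# Hypothetical example: starting value of 100 followed by three changes.
segments, final_total = waterfall_segments(
    100, [("Sales", 40), ("Returns", -15), ("Costs", -50)]
)
```

The `direction` field can be used to pick a different colour for increases and decreases.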


Figure 5.12: CAPTION

5.1.5 Area Charts and Treemaps

Try to avoid: human brains have a hard time attributing a value to a 2D area…

… except for numbers with vastly different magnitudes.


Figure 3.37: CAPTION

Simultaneously shows the big picture and allows related items to be compared easily.

Easy to process data sub-categories.

Useful to prioritize “big ticket items” in dynamic dashboards.

Labeling and colouring are tricky.


Figure 3.38: CAPTION

5.1.6 Boxplots

5.1.7 Maps, Heat Maps, and Choropleths

Should be a course in their own right – in Power BI you HAVE to have Lat/Long data to avoid rendering problems.

Build a hierarchy of location details (city, province, country) to give drill down options.

For now, avoid the ArcGIS map (it does not support embedding)

Shape map is best for province level summaries


Figure 5.13: CAPTION

Most of us are quite familiar with geographical maps, so they tend to be easier to interpret.

Maps can produce a striking effect when the visualization shows unexpected results; these may mask significant information (or a lack of it), or change the way you view things


Figure 4.13: CAPTION


Figure 4.14: CAPTION


Figure 3.40: CAPTION

[Paul Breding]

Heat maps are ideal for looking at the relationship between 3 or 4 variables, if one of them represents a percentage or a value within a set range (in order to fix the colour scale, for comparison purposes) and the others can act as categorical or size variables

It is better to bin the data, even if the axis variables are continuous (binning decreases the number of observations required for the chart to be useful)

Easier to read if colours are selected along natural colour gradients, such as Red \(\to\) Green or Red \(\to\) Yellow \(\to\) Green for instance (although such gradients are not ideal for colour-blind readers)
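The binning step can be sketched as follows: two continuous axis variables are cut into fixed-width cells, and the third variable is averaged within each cell to give the value that drives the colour. The bin widths, the choice of the mean as the summary, and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def bin_grid(records, x_width, y_width):
    """Average the third variable within each (x, y) cell of a regular grid."""
    cells = defaultdict(list)
    for x, y, value in records:
        # Floor division assigns each observation to a fixed-width cell.
        key = (int(x // x_width), int(y // y_width))
        cells[key].append(value)
    return {key: mean(values) for key, values in cells.items()}


# Hypothetical (x, y, value) observations.
records = [(1.5, 0.2, 10.0), (1.9, 0.8, 20.0), (3.1, 0.5, 40.0)]
grid = bin_grid(records, x_width=2, y_width=1)
```

Cells with no observations are simply absent from `grid`, so a plotting routine would need to decide how to display them.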


Figure 3.41: CAPTION

The Horizon of Pedestrian Risk. The rate of fatal traffic incidents involving pedestrians, each hour of the day, throughout the seasons of the year. The seasonal shift of our setting sun traces an arc of elevated risk – an echo of the curve of the Earth, itself (Note: ???). Source: Fatality Analysis Reporting System (NHTSA 2006-2010)

[J. Nelson, IDV Solutions]


Figure 3.42: CAPTION



Figure 3.43: CAPTION

[left: A.E. McCann, right: author unknown]


Figure 4.15: CAPTION

5.1.8 Parallel Coordinates


Figure 4.16: CAPTION

[A. E. McCann, V. Cruz vs. L. Fitzgerald Peer Analysis]

5.1.9 Text Visualizations and Representations

For maximal impact, font size should be a function of frequency.

Typically used for univariate categorical data, but small multiples, cloud shape, word placement, colour, and hue could be used to integrate more variables.

The word placement and colour choice algorithms are “hidden”.

Could be used to answer authorship questions.
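To illustrate font size as a function of frequency, the sketch below counts words and scales each count linearly between a minimum and a maximum font size. The linear scale, the size range, and the sample text are assumptions; other scalings (logarithmic, square root) are equally plausible.

```python
from collections import Counter


def font_sizes(text, min_size=10.0, max_size=48.0):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        # If every word is equally frequent, use the minimum size throughout.
        t = 0.0 if hi == lo else (n - lo) / (hi - lo)
        sizes[word] = min_size + t * (max_size - min_size)
    return sizes


# Hypothetical text.
sizes = font_sizes("data data data chart chart map")
```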


Figure 4.17: CAPTION



Figure 4.18: CAPTION


Figure 4.19: CAPTION


Figure 5.14: CAPTION


Figure 5.15: CAPTION

5.1.10 Trees, Dendrograms, and Network Diagrams


Figure 3.45: CAPTION


Figure 3.46: CAPTION


Figure 5.16: CAPTION


Figure 4.20: CAPTION

[P.Z. Meyers]


5.1.11 Small Multiples


Figure 5.17: CAPTION


Figure 3.48: CAPTION

5.1.12 Interactive and Animated Visualizations

5.1.13 Miscellaneous Charts

Chernoff faces are designed on the premise that people can easily understand facial expressions.

Can accommodate up to 18 or 36 facial feature variables.

Works well in some instances, but in others…

  • most facial features are not ordinal;

  • faces are more than the sum of their parts;

  • not all facial features carry emotions.


Figure 3.49: CAPTION


A. Cairo, The Functional Art. New Riders, 2013.
I. Meirelles, Design for Information: An Introduction to the Histories, Theories, and Best Practices Behind Effective Information Visualizations. Rockport, 2013.
N. Yau, FlowingData.
A. B. Jensen et al., “Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients,” Nature Communications, vol. 5, 2014, doi: 10.1038/ncomms5022.