3.2 Visual Design and Data Charts

When it comes to data charts, the Gestalt principles are not the entire story, however; various other factors enter the visual design picture.

3.2.1 Visualization and Memory

The first thing to realize is that human brains engage different types of memory when they are faced with visual information and storytelling:

  • iconic memory directs the eye (pre-attentive processes);

  • short-term memory limits how many charts one may reasonably expect to encounter on a report/dashboard page or an infographic poster, say, and

  • long-term memory helps the audience retain the visual message and information presented to them.

Iconic memory is the sensory memory register for the visual domain; it is a fast-decaying, high-capacity store of visual information. Most importantly, iconic memory is exceedingly brief: it provides a coherent representation of our entire visual perception in a heartbeat. We think it evolved as a survival mechanism attuned to pre-attentive attributes (see Figure 3.8), the subconscious accumulation of information from the environment. In a nutshell, iconic memory draws our eyes to anything in our field of vision that seems out of place (and potentially represents a danger); once the situation has been “resolved” to our satisfaction, the triggering stimuli are absorbed by our regular memory and become nearly invisible.9

Visual thinking seeks patterns. In the search for such patterns, pre-attentive processes are fast, instinctive, and efficient, and they are compatible with multitasking. They gather information and build patterns by going from features to objects.

We can easily locate Waldo in the image below using only pre-attentive attributes; the task is so easily accomplished that the image can become uninteresting in short order: we have seen what there was to see and are ready to move on to the next order of business.

Figure 2.2: Using pre-attentive processes to easily locate Waldo, the observation of interest (based on a fantastic idea in [34]).

Attentive processes, on the other hand, are slow, deliberate, and focused; they require time and concentration (cognitive energy). They identify objects first, then drill down to the features.

Any serious attempt to locate Waldo in the image below requires shedding pre-attentive processes; the task can easily become engrossing (explaining the books’ popularity, presumably).

Figure 3.33: Failure of pre-attentive processes to easily locate Waldo, the observation of interest (based on a fantastic idea in [34]).

Most data presentations do not ask for such a commitment on the part of their audience (in fact, this often proves counter-productive in data storytelling, where the main principle is to clearly show and tell); how long would you be willing to search before concluding that the exercise was a charade and that Waldo is nowhere to be found in the picture?10

Figure 3.34: Classical illustration of the difference between a pre-attentive (left) vs. an attentive search (right) imposed by design element choices: we can still find the observation of interest on the right, but it requires work – not so on the left [original author unknown].

Short-term memory caps how many visual information chunks we can hold in our brains simultaneously, namely about 4, for the most part (there are exceptions, of course, but they are rare). When presented with more information (such as a fifth chart on a dashboard page, say), some of the chunks currently held in the short-term memory register must be processed out to make room for the new information. The old chunks become invisible (until they are processed in again, at the expense of other charts). This continuous short-term memory back-and-forth can become quite distracting and interfere with the audience’s “willing suspension of disbelief”, which is to say, their willingness to play along with the data presentation.

This is not to say that we can only read charts with 4 or fewer markers (be they lines, bars, dots, and the like); in practice, we try to build charts so that they form focused hierarchies of visual parts – the Gestalt principles can help in that regard.11

Good visual design uses pre-attentive processes to draw in the audience and short-term memory limitations to avoid overpopulating the display pages, all while making the charts interesting enough to reward sticking around for a longer look and increasing the odds that the audience will retain the message. This is where attentive processes and long-term memory kick in.

Long-term memory is built up over a lifetime of experiences and forms the basis of pattern recognition and general cognitive processing; it is an aggregate of visual memory and verbal memory. The combination of relatively simple images with relatively simple text helps us trigger long-term memory, making the story “stick”. Design and text choices can also provide different contexts (and thus, different long-term memories).

As an example, consider the following text cards accompanying an unseen chart related to Access to Information and Privacy (ATIP) requests:

The option on the left might be the chart description provided as an accessibility aid (see Practical Suggestions). We will remember the topic (ATIP requests) and that there were large numbers (with different magnitudes), but it is a safe bet that the vast majority of the audience will not remember the exact numbers: there is really little point to reporting to the nearest unit with quantities in the tens of thousands or the millions.

The text on the right, on the other hand, brings us a step closer to conveying a data story, with its focus on general trends and meaningful approximations, relative magnitudes, and the added statistical summary. It is likely that some audience members would remember 30K requests, 6.6M pages, and 230 pages/request when all is said and done.
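The move from exact counts to “meaningful approximations” can be sketched in a few lines of Python. This is only an illustration: the function name `approximate`, the choice of two significant digits, and the exact counts below are our own assumptions (the original counts are not reproduced here); the counts were picked so that the rounded values match those quoted above.

```python
import math

def approximate(value, sig_digits=2):
    """Round `value` to `sig_digits` significant digits and add a K/M/B suffix."""
    if value == 0:
        return "0"
    magnitude = math.floor(math.log10(abs(value)))
    rounded = round(value, sig_digits - 1 - magnitude)
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(rounded) >= threshold:
            return f"{rounded / threshold:g}{suffix}"
    return f"{rounded:g}"

# Hypothetical exact counts, used only to illustrate the rounding.
requests = 29_512
pages = 6_648_003

print(approximate(requests))          # number of requests
print(approximate(pages))             # number of pages
print(approximate(pages / requests))  # pages per request
```

With these made-up inputs, the three printed values are 30K, 6.6M, and 230: short enough to be remembered, and precise enough for the story.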

We will have more to say on the topic in Visualization and Storytelling.

3.2.2 Colour, Sizing, and Layout Considerations

Practitioners often end up spending quite a lot of time on considerations related to colour, size, and layout.12 What is all the fuss about?


Colour theory is an old and complicated topic, mixing artistic, psychological, cultural, and scientific viewpoints, and it is not without its controversies;13 we will not dwell on these topics, but here is a little start.

From our perspective, it will suffice to know that there are three primary colours: blue, red, and yellow.14 Additional colours can be built by mixing the primary colours in various combinations: purple is an equal mixture of red and blue, orange of yellow and red, and green of blue and yellow.15 This information is often presented as colour wheels.

Figure 3.35: The main colour wheels.

The colour wheels provide a foundation for colour schemes, of which there are many:

  • achromatic (colourless, using only blacks, whites and grays);

  • monochromatic (1-colour schemes);

  • complementary (colours directly across from each other on the colour wheel);

  • split complementary (2 of the 3 colours are adjacent; 1 of the colours is opposite);

  • split-left and split-right complementary (“split” colours are either to the left or right of the complementary colour);

  • analogous (any 3 adjacent primary, secondary, or tertiary colours on the colour wheel);

  • colour diad (2 colours that are 2 colours apart on the colour wheel);

  • colour triad (3 colours, equally distant from each other on the colour wheel), and

  • colour tetrad (4 or more colours on the colour wheel).
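Several of these schemes amount to rotating a base colour around a wheel. The sketch below uses the `colorsys` module from Python’s standard library; the helper names (`rotate_hue`, `complementary`, `triad`, `analogous`, `to_hex`) and the 30° step are our own. One important caveat: `colorsys` works on the hue circle built from red, green, and blue, not on the blue-red-yellow wheel described above, so the “opposite” colours it returns differ from those of the painter’s wheel.

```python
import colorsys

def rotate_hue(rgb, degrees):
    """Rotate an (r, g, b) colour (components in 0..1) around the HLS hue circle."""
    hue, lightness, saturation = colorsys.rgb_to_hls(*rgb)
    hue = (hue + degrees / 360.0) % 1.0
    return colorsys.hls_to_rgb(hue, lightness, saturation)

def to_hex(rgb):
    """Format an (r, g, b) colour as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))

def complementary(rgb):
    """Base colour and the colour directly across the circle."""
    return [rgb, rotate_hue(rgb, 180)]

def triad(rgb):
    """Three colours equally distant from each other."""
    return [rotate_hue(rgb, d) for d in (0, 120, 240)]

def analogous(rgb, step=30):
    """Base colour with its two neighbours, `step` degrees away."""
    return [rotate_hue(rgb, d) for d in (-step, 0, step)]

red = (1.0, 0.0, 0.0)
print([to_hex(c) for c in complementary(red)])
print([to_hex(c) for c in triad(red)])
print([to_hex(c) for c in analogous(red)])
```

On this hue circle the complement of pure red is cyan, whereas the blue-red-yellow wheel places green opposite red; the rotation logic is the same, only the wheel changes.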

Schemes are used in a variety of settings: can you recognize them in the charts below?16

When it comes to colour, the advice is simple: less is more. Colour should be used sparingly (graphic designers are taught to “get it right, in black and white”). But based on the Gestalt principles, monochrome schemes can be particularly effective: we select a few base colours and build monochrome schemes from these colours (with an achromatic scheme thrown in for good measure).
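One way to build a monochrome scheme from a base colour is to keep its hue and saturation fixed and vary only the lightness. The sketch below does so with the standard-library `colorsys` module; the function name `monochrome_scheme`, the lightness bounds, and the base colour are assumptions made for illustration, not recommended values.

```python
import colorsys

def monochrome_scheme(base_rgb, n=5, lightest=0.85, darkest=0.25):
    """Return n shades of base_rgb (components in 0..1), from light to dark."""
    if n < 2:
        raise ValueError("need at least two shades")
    hue, _, saturation = colorsys.rgb_to_hls(*base_rgb)
    step = (darkest - lightest) / (n - 1)
    return [
        colorsys.hls_to_rgb(hue, lightest + i * step, saturation)
        for i in range(n)
    ]

base_blue = (0.12, 0.47, 0.71)               # hypothetical base colour
blues = monochrome_scheme(base_blue)          # 5 blues, light to dark
greys = monochrome_scheme((0.5, 0.5, 0.5))    # zero saturation: achromatic ramp

for shade in blues:
    print(tuple(round(c, 2) for c in shade))
```

Passing a grey as the base colour yields the achromatic scheme, since a colour with zero saturation has no hue to preserve.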

When appropriate, the job of selecting a colour scheme can be made easier by using a corporate identity: this maximizes buy-in, but be wary of schemes that run afoul of well-established conventions (especially those involving heavy use of reds) and accessibility considerations (in particular, schemes that ignore issues related to colourblindness and low visibility). Whatever colour scheme is settled on, it is good practice to take the time to create a template, share it around, and stick to it.

Colour Palettes

Designers have built palettes with a variety of properties and made them available to practitioners (various data visualization software packages also have commonly-used default palettes; see Top R Color Palettes to Know for Great Data Visualization and Choosing a Color Palette For Your Power BI Report, for instance).

Here is a very small sample of pre-built colour palettes.

“Statistical graphics are often augmented by the use of color coding information contained in some variable. When this involves the shading of areas (and not only points or lines)—e.g., as in bar plots, pie charts, mosaic displays or heatmaps—it is important that the colors are perceptually based and do not introduce optical illusions or systematic bias. Based on the perceptually-based Hue-Chroma-Luminance (HCL) color space suitable color palettes are derived for coding categorical data (qualitative palettes) and numerical variables (sequential and diverging palettes).” [36]

“Graphics with scientific data become clearer when the colours are chosen carefully. It is convenient to have good default schemes ready for each type of data, with colours that are: distinct for all people, including colour-blind readers; distinct from black and white; distinct on screen and paper; matching well together. This document shows such schemes, developed with the help of mathematical descriptions of colour differences and the two main types of colour-blind vision” [37]

“Requests for sets of colours which would be maximally different for use in color coding, providing maximum contrast for those with deficient color vision, have resulted in the selection of 22 shades from the ISCC-NBS Centroid Colors.” [38]

“A method is presented for choosing high-contrast sets of colors for additive color mixers (e.g., CRT). The method is based on data about target-location performance of human observers and adapts the color sets to the gamut of the color processor in use. The method produces any specified number of colors spread as far from each other as possible in color space to maximize contrast. Applications of high-contrast sets of colors are indicated, illustrative results are presented and discussed, and variations of the method are suggested.” [39]

There is a French idiomatic expression that does not seem to have a direct translation in English (and which is quite a propos, given the topic at hand): “des goûts et des couleurs, on ne discute pas”. We must definitely invest time and effort to think critically about colour use and palettes for data presentations, but it is important to keep in mind that no amount of effort will please every stakeholder and audience member.

Case in point, when we ask our workshop participants which of the schemes of Figure 3.36 they prefer most (or least), we never reach a consensus. It is preferable not to over-think this.

Figure 3.36: Competing colour palettes.


We tend to view bigger charts as being more important than smaller ones. Once all charts have been decluttered, things of equal importance are sized similarly, and everything else scales to importance.

In the chart above, we would expect the audience to spend about half the time on the funnel chart, and the remaining time more or less split evenly on the other two charts.


If we assume that a page contains the short-term memory limit of 4 charts, how should they be placed? The standard answer, in the West at least, is to place them according to the reading flow, zig-zagging from the top-left to the bottom-right, as below. This layout is also supported by the Gestalt principle of continuity as it allows for easy and natural alignment of the charts.

Figure 3.37: The standard 4-chart layout.

Our concentration depletes itself as we scan (and resets itself upon turning to a new page, whether in a book or a dashboard). Charts should be ordered according to some preference, of course, but in order not to overtax the audience, they should get progressively simpler as we approach the 4th slot.
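The two constraints (no more than 4 charts per page, and charts that get simpler toward the last slot) can be sketched as a small pagination routine. Everything here is hypothetical: the `Chart` fields, the `priority` and `complexity` scores, and the rule of grouping by priority first are one possible reading of “ordered according to some preference”, not a prescribed method.

```python
from dataclasses import dataclass

MAX_CHARTS_PER_PAGE = 4  # the short-term memory cap discussed above

@dataclass
class Chart:
    title: str
    priority: int    # 1 = most important (hypothetical ranking)
    complexity: int  # hypothetical score: higher = more effort to read

def paginate(charts, per_page=MAX_CHARTS_PER_PAGE):
    """Group charts into pages by priority, then order each page from complex to simple."""
    by_priority = sorted(charts, key=lambda c: c.priority)
    pages = [by_priority[i:i + per_page] for i in range(0, len(by_priority), per_page)]
    return [sorted(page, key=lambda c: c.complexity, reverse=True) for page in pages]

charts = [
    Chart("Funnel", 1, 5),
    Chart("Trend line", 2, 3),
    Chart("Bar chart", 3, 4),
    Chart("KPI card", 4, 1),
    Chart("Map", 5, 4),
    Chart("Table", 6, 2),
]

for number, page in enumerate(paginate(charts), start=1):
    print(number, [chart.title for chart in page])
```

The slots of each page are then filled in reading order, so that the most demanding chart sits in the first slot and the simplest in the last.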

This also applies to alternative layouts, such as those presented below.

Figure 3.38: Alternative 4-chart layouts.

3.2.3 Decluttering

In the world of charts and storytelling with data, less is more (one more time, for good measure): clutter gets in the way of clarity.

Every element on a chart or dashboard page adds to the cognitive load required to “read” the presentation; our brains are like all other organs in that they require energy to function17 – the lower the energy requirement, the better (as a general evolutionary strategy).

So decluttering a chart or a dashboard page is simple: all we need to do is identify and remove anything that is not adding value to the graphics.18

Tufte refers to the data-ink ratio – “the larger the share of a graphic’s ink devoted to data, the better” [1] – while Duarte refers to this as “maximizing the signal-to-noise ratio”, where the signal is the information or the story we want to communicate [40]: the chart should first and foremost be about the data AND the story.
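As a rough numerical illustration of Tufte’s idea, we can tally the “ink” used by each chart element and compute the share devoted to data. The element names and ink amounts below are invented for the example (in practice, ink is rarely measured so literally), and the function name `data_ink_ratio` is our own.

```python
def data_ink_ratio(elements):
    """Share of total ink devoted to data; elements are (name, ink, carries_data) tuples."""
    total_ink = sum(ink for _, ink, _ in elements)
    data_ink = sum(ink for _, ink, carries_data in elements if carries_data)
    return data_ink / total_ink if total_ink else 0.0

# Hypothetical ink budget for a cluttered line chart (arbitrary units).
cluttered = [
    ("data lines", 50, True),
    ("gridlines", 30, False),
    ("border", 20, False),
]

# Decluttering: drop every element that does not carry data.
decluttered = [element for element in cluttered if element[2]]

print(data_ink_ratio(cluttered))
print(data_ink_ratio(decluttered))
```

In this toy example, removing the gridlines and the border raises the ratio from one half to one; real charts usually keep some non-data ink (axis labels, titles) because it helps tell the story.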

We use the Gestalt principles to organize/highlight data in the chart (see section 3.1.3).

In particular, we should:

  • align all elements (graphs, text, lines, etc.), without relying on the eye (use position boxes and values when possible);

  • remove chart borders, gridlines, data markers (unless the marker is the only signal in the chart), and useless annotations (ditto);

  • clean the axis labels;

  • label the data directly (instead of using a legend box);

  • use consistent fonts, font size, colour, and alignment (L.C. Muth discusses font choices);

  • avoid rotated text – everything should either be horizontal or vertical, but not slanted (even then, there is an incompatibility in how vertical text is rendered: from top to bottom, or from bottom to top?);

  • use white space and keep clear margins free of text and visuals – we basically think of white space as a border.

For instance, we consider the following chart to be clunky and awkward.

It is obviously a chart, but it is not clear what we are meant to focus on – there are too many distractions. Compare with the decluttered version of the same data:

Here, we are free to focus on the story: “More Things” used to dominate over “Things”, but the roles were reversed in May. Why is that so? The chart does not say, presumably because this is not part of the story (although it easily could have been, in a different context).

3.2.4 The Layered Grammar of Graphics

It is one thing to recognize when charts are effective (Hall-of-Fame/Hall-of-Shame), when their aesthetics make them easy to read (Gestalt Principles), and when they are laid out in a dashboard which tells a compelling visual story (Anatomy of Storytelling Dashboards). We can also fairly easily recognize when they fail to do so and provide recommendations, as needed (Decluttering, Evolving a Storytelling Chart).

But it is another thing altogether to start with raw data and to build such charts – storyboarding, for instance, makes do entirely without charts in the planning stages, and dashboard layout guidelines and Gestalt principles considerations explicitly assume that base charts have already been produced.

The problem is often compounded by software choices made by clients or employers without practitioners’ input, which can leave analysts grasping at straws and blindly tossing charts together.

The grammar of graphics [41], [42] provides a reliable tool-agnostic path out of the wilderness. Why “grammar”? D.J. Sarkar elucidates:

“A grammar is defined as a set of structural rules which helps define and establish the components of a language. A language’s system/structure usually consists of syntax and semantics. A grammar of graphics is a framework which follows a layered approach to describe and construct visualizations or graphics in a structured manner. The layered grammar of graphics uses pre-defined components to build charts (instead of random trials and errors).” [43]

Naturally, the layered grammar of graphics is best described with the help of a chart, courtesy of the QCBS R Workshop Series (slightly modified):

What does that mean? There are 7 layers in this version of the grammar of graphics:

  1. data (required): the plotting observations are found in rows, the variables in columns;

  2. aesthetics (required): the mapping of the dataset’s variables to the chart’s design elements (position, shape, size, colour, etc.);

  3. geometry (required): the type of chart on which the data is represented (bar, line, scatterplot, pie, etc.);

  4. facets (optional): the subsets of the data represented on the chart (variable levels);

  5. statistics (optional): the measures that help provide chart context (centrality, dispersion, trend, etc.);

  6. coordinates (required): the chart’s plotting space (axes, scale, etc.), and

  7. themes (required): the design choices that are used to create a visual identity (fonts, colours, etc.).
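The seven layers can be written down as a small data structure, which makes the required/optional distinction (and the reliance on defaults) explicit. The sketch below is not the API of any plotting library: the class name `ChartSpec`, its default values, and the toy dataset are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ChartSpec:
    # Required layers: must be supplied by the analyst.
    data: list                     # rows = observations, keys = variables
    aesthetics: dict               # design element -> variable name
    geometry: str                  # e.g. "scatterplot", "bar", "line"
    # Optional layers.
    facets: Optional[str] = None
    statistics: list = field(default_factory=list)
    # Required layers that are typically left to (hypothetical) defaults.
    coordinates: dict = field(default_factory=lambda: {"x": "linear", "y": "linear"})
    theme: str = "default"

    def __post_init__(self):
        # Every aesthetic must map to a variable that exists in the data.
        variables = set()
        for row in self.data:
            variables.update(row)
        unknown = sorted(v for v in self.aesthetics.values() if v not in variables)
        if unknown:
            raise ValueError(f"aesthetics refer to unknown variables: {unknown}")

# Toy dataset (made up) with three variables.
rows = [
    {"height": 150, "weight": 52, "group": "a"},
    {"height": 172, "weight": 70, "group": "b"},
]

spec = ChartSpec(
    data=rows,
    aesthetics={"x": "height", "y": "weight", "colour": "group"},
    geometry="scatterplot",
)
print(spec.coordinates, spec.theme)
```

Only the first three layers had to be specified; the coordinates and the theme fall back on defaults, which mirrors how the grammar tends to be used in practice.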

In practice, layers 2 and 3 are often decided on simultaneously; while layers 6 and 7 are required, applications often leave them unspecified in favour of default settings (the most famous such application is without doubt Wickham’s celebrated ggplot2, see section 5.6).

Considering the layered grammar of graphics prior to embarking on analysis and visualization can help analysts save precious process time; a good way to practice (and thus reduce the required time) is to get into the habit of deconstructing charts using the grammar (which is the opposite of its habitual use in building charts).

Deconstructing a Chart Using the Layered Grammar of Graphics

The Gapminder dataset (https://gapminder.org) contains socio-demographic information (upwards of 500 variables) for the Earth’s nations, for years ranging from 1800 to 2020. We have already studied one chart built from this dataset in Principles of Analytical Design, the 2012 Health and Wealth of Nations, shown below.

Figure 3.39: Health and Wealth of Nations, in 2012 (Gapminder Foundation).

What can be said about this chart, from the perspective of layered grammar?

It uses a subset of the 2012 data, with the bubble chart geometry, which allows for the use of 4 design elements: position (x2), size, and fill. The aesthetics mapping between the geometry and the data is made explicit below:

  • horizontal position \(\to\) income per person

  • vertical position \(\to\) life expectancy

  • fill \(\to\) region

  • size \(\to\) population

This chart admits no facet as all observations are plotted in the same space (unique set of axes). Two statistics are added to the canvas: the world life expectancy (equator) and the world income per person (prime meridian). On the coordinates front, life expectancy is displayed linearly (as this quantity is distributed more or less uniformly on its range), whereas income per person and population are displayed logarithmically (due to the presence of long tails in the positive direction for both quantities). Finally, the chart is displayed with the old Gapminder World style, providing a nice twist on the traditional cartographer’s map.
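The effect of the coordinate choice on a long-tailed quantity can be checked numerically. In the sketch below, the income values are made up (they are not Gapminder data) and `axis_positions` is a hypothetical helper that maps values to relative positions along an axis, from 0 (left) to 1 (right).

```python
import math

def axis_positions(values, scale="linear"):
    """Map positive values to relative axis positions in [0, 1]."""
    if scale == "log":
        transformed = [math.log10(v) for v in values]
    else:
        transformed = [float(v) for v in values]
    low, high = min(transformed), max(transformed)
    if high == low:
        return [0.5 for _ in transformed]
    return [(t - low) / (high - low) for t in transformed]

# Hypothetical incomes per person, with a long tail in the positive direction.
incomes = [500, 1_000, 2_000, 8_000, 32_000, 128_000]

linear = axis_positions(incomes)
logarithmic = axis_positions(incomes, scale="log")

print([round(p, 2) for p in linear])
print([round(p, 2) for p in logarithmic])
```

On the linear axis, five of the six observations are squeezed into the leftmost quarter of the plotting space; on the logarithmic axis, they spread out across the whole range, which makes the bubbles much easier to tell apart.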

3.2.5 Examples

We present a few final examples related to the material found in this module.


Iris is a genus of plants with showy flowers. The iris dataset contains 150 observations of 5 attributes for specimens collected by Anderson, mostly from a pasture on the Gaspé peninsula in the 1930s [44].

The attributes are

  • petal width

  • petal length

  • sepal width

  • sepal length

  • species (virginica, versicolor, setosa)

A “description” of these features is provided by the picture below:

This dataset has become synonymous with data analysis,19 being used to showcase just about every algorithm under the sun. It is thus with a certain amount of sadness that we also use it in this section.20

We look at various versions of the scatterplot matrix of the iris dataset below, ranging from excessively simple but uncluttered to chock full of information in a chaotic soup of clutter.
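As a reminder of what a scatterplot matrix contains, the sketch below enumerates its panels for the four numerical iris attributes (species is categorical and is typically mapped to colour instead). The helper name `scatterplot_matrix_panels` is our own; no plotting is performed.

```python
from itertools import combinations

def scatterplot_matrix_panels(variables):
    """Count the panels of a scatterplot matrix and list the distinct variable pairs."""
    k = len(variables)
    return {
        "panels": k * k,   # full k-by-k grid
        "diagonal": k,     # one panel per variable (labels or distributions)
        "distinct_pairs": list(combinations(variables, 2)),
    }

iris_measurements = ["sepal length", "sepal width", "petal length", "petal width"]
layout = scatterplot_matrix_panels(iris_measurements)

print(layout["panels"], layout["diagonal"], len(layout["distinct_pairs"]))
for x, y in layout["distinct_pairs"]:
    print(f"{x} vs. {y}")
```

Each distinct pair appears twice in the grid (once above and once below the diagonal), which is one reason why a scatterplot matrix can become cluttered so quickly.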

Figure 3.40: Scatterplot matrix; iris dataset. The clutter level increases from the first to the last chart.

The first is probably too sparse to be of use; the last one will drive anybody unlucky enough to gaze upon it mad.21 The perfect mix would probably combine the best features of the second and third versions.

The next example is taken from [3].

Figure 3.41: Cluttered chart.

Figure 3.42: Remove border and gridlines (left); remove markers (right).

Figure 3.43: Clean-up axis labels and legend (left); colour code the lines to get a decluttered chart (right).

The Layered Grammar of Graphics

Here is a brief grammar of graphics deconstruction of seven additional charts built via the Gapminder dataset. Note that these are not the only charts that could be displayed, far from it! But it is a start.

Data: Gapminder countries, 2009

Geometry: stacked density chart

Aesthetics:

  • x: daily income

  • (y: percentage per country)

  • fill: region

Facets: none

Statistics: extreme poverty proportion

Coordinates: logarithmic (x)

Theme: Gapminder Tools; adornment (extreme poverty vertical line)

Data: Gapminder countries, 2009

Geometry: bubble chart

Aesthetics:

  • x: total fertility

  • y: income per person

  • fill: UNICEF region

  • size: population

Facets: none

Statistics: none

Coordinates: logarithmic (x, y, size)

Theme: Gapminder Tools

Data: Gapminder countries, 2009

Geometry: scatterplot chart

Aesthetics:

  • x: total fertility

  • y: life expectancy

Facets: none

Statistics: line of best fit, confidence interval for mean response

Coordinates: linear (x, y)

Theme: ggplot2 default

Data: Gapminder countries, 2009

Geometry: density chart

Aesthetics:

  • x: infant mortality

  • fill: continent

Facets: continent

Statistics: none

Coordinates: linear (x)

Theme: Darjeeling1

Data: Gapminder countries, 2009

Geometry: bubble chart

Aesthetics:

  • x: income per person

  • y: HIV infection rate

  • fill: WHO region

  • size: HIV infected population

Facets: none

Statistics: none

Coordinates: logarithmic (x, y, size)

Theme: old Gapminder World

Data: Gapminder countries, 2009

Geometry: boxplot chart

Aesthetics:

  • x: fertility rate

  • y: continent

  • fill: continent

“Facets”: continent (?)

Statistics: 5-pt summary (built-in)

Coordinates: linear (x)

Theme: Darjeeling1

Data: selected Gapminder countries, 1960-2011

Geometry: line chart

Aesthetics:

  • x: total fertility

  • y: percentage per country

  • colour: country

  • size: year

Facets: none

Statistics: none

Coordinates: linear (x, y, size)

Theme: custom (author unknown)
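The deconstructions above all share the same shape, which suggests recording them in a structured form and generating the text card from it. The sketch below does so for the scatterplot deconstruction (total fertility against life expectancy); the layer order follows the cards above, while the `render` function and the dictionary layout are our own assumptions.

```python
LAYERS = ("Data", "Geometry", "Aesthetics", "Facets", "Statistics", "Coordinates", "Theme")

def render(spec):
    """Turn a deconstruction (dict keyed by layer name) into a text card."""
    lines = []
    for layer in LAYERS:
        value = spec.get(layer, "none")  # unspecified layers are reported as "none"
        if isinstance(value, dict):
            # Aesthetics: one bullet per design element -> variable mapping.
            lines.append(f"{layer}:")
            lines.extend(f"  • {element}: {variable}" for element, variable in value.items())
        else:
            lines.append(f"{layer}: {value}")
    return "\n".join(lines)

scatterplot = {
    "Data": "Gapminder countries, 2009",
    "Geometry": "scatterplot chart",
    "Aesthetics": {"x": "total fertility", "y": "life expectancy"},
    "Statistics": "line of best fit, confidence interval for mean response",
    "Coordinates": "linear (x, y)",
    "Theme": "ggplot2 default",
}

print(render(scatterplot))
```

Keeping deconstructions in this form makes it easy to compare charts layer by layer (for instance, to list all charts that share a geometry but differ in their aesthetics).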


[1] E. Tufte, The Visual Display of Quantitative Information. Graphics Press, 2001.
[3] C. Nussbaumer Knaflic, Storytelling with Data. Wiley, 2015.
[34] L. V. Groeger, “A big article about wee things,” ProPublica, Sep. 2014.
[35] S. McCloud, Making Comics: Storytelling Secrets of Comics, Manga and Graphic Novels. Harper, 2006.
[36] A. Zeileis, K. Hornik, and P. Murrell, “Escaping RGBland: Selecting colors for statistical graphics,” Computational Statistics & Data Analysis, vol. 53, no. 9, pp. 3259–3270, 2009.
[37] P. Tol, “Colour schemes,” SRON Technical Note, vol. 9–2, no. 3.2, 2021.
[38] K. L. Kelly, “22 colours of maximum contrast,” Color Eng., no. 6, 1965.
[39] R. C. Carter and E. C. Carter, “High-contrast sets of colors,” Applied Optics, vol. 21, no. 16, pp. 2936–2939, Aug. 1982.
[40] N. Duarte, Resonate: Present visual stories that transform audiences. Wiley, 2013.
[41] L. Wilkinson, The Grammar of Graphics. Springer, 1999.
[42] H. Wickham, “A layered grammar of graphics,” Journal of Computational and Graphical Statistics, no. 19, pp. 3–28, 2009.
[44] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 7, pp. 179–188, 1936.
[45] A. M. Raja, “Penguins dataset overview - iris alternative,” Towards Data Science, Jun. 2020.

  9. We can get used to many things that do not outright kill us quite quickly: it took one of us (Patrick) about 30 minutes to become comfortable driving on the left on a trip to New Zealand after 20 years of driving on the right in Canada, but he has not been able to shake off his French Canadian accent after nearly 30 years of speaking English.↩︎

  10. Sorry about that…↩︎

  11. This could cause confusion: what we mean is that there may be various hierarchical levels in a chart. Think of the universe being composed of galactic groupings, themselves composed of galaxies, which are composed of stars, which may or may not have attendant planets and other bodies circling them. The equivalent structure might be a dashboard composed of pages, each consisting of a certain number of charts, which may hold clusters of markers; effective charts try to keep the number of components in a hierarchical level small (it is not always possible to do so, of course).↩︎

  12. Overly much, perhaps, but that might simply be because they are fairly easy to control with modern tools.↩︎

  13. Do colours have emotional meanings, for instance?↩︎

  14. That is the traditional subtractive (RYB) colour model, which is not the only such model (see CMYK, [35]).↩︎

  15. That last one we would have a harder time believing had we not seen it time and time again on kids’ drawings… perhaps it has to do with there not being a green rod in the eye?↩︎

  16. ↩︎

  17. Think of cognitive load as the mental effort required to process information.↩︎

  18. To be fair, this may be easier said than done… especially if we have laboured to bring a chart to fruition. “Everything that we have added was needed!”, we might scream in despair. Ask around, when in doubt – and leave the ego at the coat check.↩︎

  19. To the point that the standard joke is that “it’s not necessary to be a gardener to become a data analyst, but it helps”.↩︎

  20. Note that the iris dataset has started being phased out in favour of the penguin dataset [45], for reasons that do not solely have to do with its overuse (hint: take a look at the name of the journal that published Fisher’s paper).↩︎

  21. MAD!!↩︎