Statistics New Zealand Graphics Guidelines



The Graphics Guidelines have been prepared as a supplement to the Protocols for Official Statistics. They will assist with the implementation of Principle 8 of the protocol.

Principle 8: in analysing and reporting the results of a collection, objectivity and professionalism must be maintained and the data impartially presented in ways which are easy to understand.

1. Introduction

Graphs have two primary uses. Firstly, they can be used to explore and analyse data in order to uncover patterns and relationships. The second use of graphs forms the scope of these guidelines: the communication and display of results.

Graphs are widely used to communicate information. Unfortunately a focus on eye-catching graphic design and a lack of attention to principles for accurate presentation of information can result in graphs which are not clear and are misunderstood. The objective of these guidelines is to provide assistance in the production of graphs which accurately reflect the major story in the data and are presented in the clearest and most consistent possible way.

The design principles that follow are based primarily on recommendations from the following sources.

1. A. Wallgren, B. Wallgren, R. Persson, U. Jorner, and J. Haaland (1996). Graphing Statistics and Data, (Statistics Sweden). 93pp, Sage Publications Inc, Newbury Park.
2. E. Tufte, (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut.
3. N. Fisher, Informative graphics. CSIRO.
4. The Australian Bureau of Statistics graphics standards (1990).

These references (especially the first one) should be consulted where information is required that is beyond the scope of this document.

Producing graphs is an art as well as a skill. While adherence to the points in these guidelines will go a long way towards ensuring that a graph presents data in the best way possible, the process is not complete until at least one other person has reviewed the graph. This final step is vital.

A special point of note is that one should be careful when using computer packages to produce graphs. Many default settings, such as vertically written text or large spacing between bars on a bar graph, are not ideal for good graphics.

1.1 When and why use graphics?

Graphs can be more revealing than statistical tables. The objective of a graph should be to convey the major story being revealed by the data in an unambiguous and illuminating form. Graphs should not only emphasise important statistical messages and indicate relative sizes or trends, but also create reader interest in the statistics.

The first step in deciding what to graph is to analyse the statistical output and understand the major elements to be represented. It should then be decided whether a graph is the best way of representing these elements. A table may be better.

How does one choose between use of a graph or a table? There are some fairly simple indicators of situations in which a table will be preferable. These are where the data sets:

1. are very small (perhaps just 3 or 4 values),
2. have several cross-classifications,
3. have comments attached to some of the data points,
4. contain numerical values which are of direct interest, or
5. contain numerical values that are likely to be required for future reference.

In general, however, a graph is preferable to a table since patterns can be revealed more easily.

Graphs should generally be located as closely as possible to the relevant tabular or descriptive presentation. In some cases, however (for example in small publications, or those where users may wish to compare one graph with another), it may be more appropriate to show all the graphs together.

1.2 Types of variables

Different types of variables require different sorts of graph. A variable will be one of the following:

1. Qualitative ('Words') e.g. Sex, Region
2. Quantitative ('Numbers')
a) Discrete ('Certain Values') e.g. number of rooms, family size.
b) Continuous ('All Values') e.g. an economic index, age, weight, temperature.

Continuous variables are often grouped into classes such as age ranges.

1.3 What should be in a graph?

The following are principles of constructing a graph. The components of a graph are defined in the Appendix.

- The graph should induce the reader to think about the data it contains, not the graphic design.

- Graphs should not give a false impression of the data by exaggerating differences.

- There should be as much white space as possible on a graph. In practice this means that a large proportion of the ink on a graph should be used to present the data itself. Grids, tick marks, labels etc. should be kept to a minimum.

- Graph 'junk' should be minimised, e.g. hatching, stipples, unnecessary labelling and third dimensions.

- Graphs that contain only a few data points should be small in size.

- The interior of a graph, the plot area, is for data. This region should not be cluttered. Labels should be kept to a minimum; tick marks and scale labels should be outside the data region and when several series of data sets are included in the data region they must be visually distinguishable.

- The amount of text in the graph should be kept to a minimum. Explanatory text should be restricted to the title of the graph (and, where absolutely necessary, footnotes).

- A graph should still be intelligible after black-and-white photocopying or printing, so lines or bars should be distinguished by more than just colour.

Section 4, 'Graphics standards', gives more detail on what should be in a graph.

2. The process of graphing

The process of graphing falls into two stages, the first comprising data analysis and selection of the graph, and the second comprising construction of the graph and critical review of it.

2.1 First stage: Data analysis and graph selection

1. Perform a statistical analysis of the data set to find out what patterns and relationships (if any) it contains.
2. Decide if this information in the data is to be presented in a graph rather than in a table.
3. Decide on the basic, or primary, variables involved.
4. Identify the types of variables: quantitative or qualitative (categorical).
5. Decide on the specific variables and comparisons of interest.
6. Select an appropriate graph (time series, bar graph, dot graph etc) for the type of data. The interpretation should not be prejudiced by the technique of presentation.

2.2 Second stage: Construction of the graph

1. Construct an initial graph.
2. Consider re-ordering the variables.
3. Consider using extra plots.
4. Consider adding or removing zero.
5. Consider allowing for a break in an axis.
6. Consider changing the size of the graph.
7. Re-check against principles of graph construction.
- Is the graph easy to read?
- Can the graph be misinterpreted?
- Does the graph have a good size and shape?
- Is the graph in the right place?
- Does the graph benefit from being in colour?
8. Try the graph out on somebody.

These steps should be repeated until all points are satisfied.

3. When to use which graphs

3.1 Bar graphs

- A bar graph is best for comparison of quantities.

- Use when graphing a continuous variable by a categorical variable or when graphing classes (e.g. age ranges).

- Keep category labels as short as possible.

- It is often best to align the bars horizontally. This means that there is room for longer category labels (although these should be kept as short as possible). Vertical bar graphs with labels which do not fit neatly along the axis and require very large legends are difficult to interpret. The principal exception is when time is involved: the time axis should always be horizontal.

Example 3.1.1: Simple bar graph

Horizontal bars allow space for long category labels thus facilitating reading of the graph. Note that the bars are not touching and are evenly spaced. This indicates the categorical nature of the data.





Example 3.1.2: Simple bar graph, poor example.

The use of vertical bars rather than horizontal ones means that a key is required for the categories, which makes the graph difficult to read. The graph is also problematic in that there is a broken axis which is not clearly enough indicated (see 'Graphics standards' below).





- The best way to order the bars depends on the intended purpose of the graph. In Example 3.1.1, the categories are geographical regions ordered from north to south, demonstrating the pattern in expected Maori population growth rates from north to south. Where there is no inherent ordering of the categories or the inherent order of the categories is not relevant to the message, order the bars by size. In Example 3.1.2 the categories are also geographic, but are ordered to demonstrate a pattern of increasing life expectancy.

- If more than one graph is displayed with a common set of categories then the bars should appear in the same order in all the graphs. Graphs that may be compared with one another should also be the same size.

- The width of the gap between bars should be about 50 per cent of the bar width.
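The 50 per cent rule above fixes the bar width once the number of bars and the length of the category axis are known. The following is a minimal sketch of that arithmetic; the function name and the assumption that there is no margin before the first bar or after the last one are illustrative only and are not part of these guidelines.

```python
def bar_layout(n_bars, axis_length, gap_ratio=0.5):
    """Return (bar_width, gap, starts) for evenly spaced bars.

    gap_ratio is the gap expressed as a fraction of the bar width
    (0.5 corresponds to the recommended 50 per cent).
    Assumption: no margin before the first bar or after the last bar.
    """
    if n_bars < 1:
        raise ValueError("need at least one bar")
    # n bars plus (n - 1) gaps must fill the axis:
    #   n * width + (n - 1) * gap_ratio * width = axis_length
    bar_width = axis_length / (n_bars + (n_bars - 1) * gap_ratio)
    gap = gap_ratio * bar_width
    # Position of the start edge of each bar along the axis.
    starts = [i * (bar_width + gap) for i in range(n_bars)]
    return bar_width, gap, starts


# Four bars on an axis 11 units long.
width, gap, starts = bar_layout(4, 11.0)
```

The same function can be used to check a package's default layout: if the gap it produces is much larger than half the bar width, the default should be overridden.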

Where there is a third variable, e.g. time, one option is to use a grouped bar graph. In this case

- Group by either time period or category, depending on the message you wish to convey.
- Use no more than four categories in a group.

Example 3.1.3: Grouped bar graph. Grouping by years.

Here the graph is vertical so that time runs horizontally as is conventional.




Alternatively, if you want to highlight how individual categories change over time then group by category.

Example 3.1.4: Grouped bar graph. Grouping by categories.




3.2 Dot graphs

Experiments have shown that dot graphs enable the user to interpret data more clearly than bar graphs, particularly where large numbers of observations are being graphed.

- Use as an alternative to horizontal bar graphs where there are a relatively large number of variable values.
- Use where the order of the categories is unimportant.
- Best used when portraying the value of categories in descending order of size.
- These graphs are strongly recommended.

Example 3.2.1: Dot graph

This example illustrates the flexibility of the dot graph: here a broken axis is neatly and clearly indicated.




3.3 Line graphs

- A line graph is best for showing changes and trends, especially over time.

- Use when graphing a continuous variable by a continuous variable. A common example is a time series, that is a graph where one variable is time.

- Display a maximum of three dependent variables (i.e. lines) on any one graph. Otherwise the graph can become crowded and difficult to read.

- Use a different line style for each variable, even if the lines are also distinguished by colour. This facilitates black-and-white printing and photocopying.

- Where multiple lines overlap such that they are difficult to distinguish, consider using more than one graph to display the data, or perhaps a grouped bar graph.

- In a graph of a time series, time should run horizontally.

- Consider using a vertical bar graph for time series where the series is short and the message relates to comparison of individual quantities, e.g. yearly results, rather than to changes.

- Where there is a visible seasonal component in a time series then at least two years' data should be graphed, or the seasonal component of the variation may be mistaken for a trend.

- Equal intervals (of time, for example) should be equally spaced. It follows that unequal intervals will be unequally spaced. For example, where data are for 1994, 1995, and 1997, the distance between 1995 and 1997 on the time axis should be twice that of the distance between 1994 and 1995.

- Where there is a discontinuity in the data, for example because of a change in the definition of a variable, do not join the points across the discontinuity. The discontinuity should be explained in the caption.
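The rule above on equal and unequal intervals amounts to placing each point at a position proportional to its time value rather than to its order in the data. A minimal sketch, with a hypothetical function name and an arbitrary axis length, is:

```python
def time_axis_positions(times, axis_length=1.0):
    """Map time values to axis positions so that equal intervals
    of time are equally spaced and unequal intervals are not."""
    first, last = min(times), max(times)
    if first == last:
        raise ValueError("need at least two distinct time points")
    span = last - first
    # Position is proportional to elapsed time, not to the index of the point.
    return [(t - first) * axis_length / span for t in times]


# Data for 1994, 1995 and 1997 on an axis 3 units long.
positions = time_axis_positions([1994, 1995, 1997], axis_length=3.0)
gap_94_95 = positions[1] - positions[0]
gap_95_97 = positions[2] - positions[1]
```

Here the gap between 1995 and 1997 comes out at twice the gap between 1994 and 1995, as the rule requires. Packages that treat years as category labels will space the three points evenly, which is the error the rule guards against.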

Example 3.3.1: Simple line graph

This time series presents a clear message. In particular the proportions of the graph are such that both the overall trend and the local deviations from it are obvious.




Example 3.3.2: Multiple line graph

The lines in this graph are clearly labelled, and it is generally clear to read. However there are further improvements to be made. The y-axis does not start at zero, yet the level of the data is important. The broken axis should have been clearly indicated as described below in the 'Graphics Standards' section. Some labels need horizontal alignment - this will mean further abbreviation of the tick-mark labels on the time axis. Also, the symbols are too heavy.





Example 3.3.3: Multiple line graph, poor example.

There are too many lines in this graph and some are overlapping. It is difficult to read.




3.4 Histograms

Histograms look like vertical bar graphs except with the bars touching.

- Use to graph the frequency distribution (counts) of classes of a continuous variable.

- The area of each bar represents the quantity. Always try to make the intervals for the continuous variable equal so the bar widths will be equal. However, if the intervals must be unequal, the bar widths should be unequal in the same proportion and the height of each bar should be adjusted so that its area still represents the quantity.

For example, where the continuous variable is age and the intervals are five years in all but one category where the interval is ten years, the width of the ten-year bar should be twice that of the other bars. To preserve areas, it should also be half the height of a 5-year bar with the same number of counts.
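The height adjustment described above can be sketched as follows. The function name and the idea of a 'reference width' (the usual class width, whose bars keep a height equal to their count) are assumptions made for illustration.

```python
def histogram_heights(counts, widths, reference_width):
    """Return bar heights such that the area of each bar
    (height * width) is proportional to the count in its class.

    Bars of reference_width keep a height equal to their count;
    wider bars are lowered and narrower bars raised in proportion.
    """
    if len(counts) != len(widths):
        raise ValueError("counts and widths must have the same length")
    return [count * reference_width / width
            for count, width in zip(counts, widths)]


# Two 5-year classes and one 10-year class with the same count as the second.
heights = histogram_heights(counts=[120, 100, 100],
                            widths=[5, 5, 10],
                            reference_width=5)
```

The ten-year bar is given half the height of the five-year bar with the same count, so the two bars have equal areas.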

Example 3.4.1: Histogram: population pyramid.

A population pyramid is a special case of histogram used for demographic data and comprises two back-to-back horizontal histograms, one for men and one for women. Note in this example that the last age class, 90+, is open-ended, therefore it is unclear what width (and therefore, preserving area, what length) the bar should be. Misinterpretation is avoided in this case because there are so few in this category anyway.




3.5 Pie graphs

Pie graphs convey the relative sizes of the components of a total. They should only be used for overview situations and when the number of categories is few (preferably not more than five).

- Pie graphs may be unsuitable where there are several components of similar size. Pie graphs may be more suitable where one or two components dominate the total.

- A pie graph should represent one period only and should not be used to compare two or more periods.

Example 3.5.1: Pie graph, poor example.

There are too many segments in this graph and the third dimension makes comparison even more difficult by distorting apparent segment sizes. This pie graph is probably no more informative than a table of the numbers, and is potentially more likely to be misread. For graphical display of these data a bar graph would have been better.




3.6 Stacked bar/area graph

- A stacked bar graph is used to illustrate the variation in both the relative proportions of the components and in the total.

- Use preferably where just two variables are being portrayed, as middle components do not align to a common base point.

- An area graph is used as a kind of 'stacked line graph' in situations where a line graph would be used to show a single series.

- Scale breaks are not permitted.

Example 3.6.1: Stacked bar graph, poor example

In this graph there is no common baseline for the middle components (6th form certificate, school certificate) making comparison between ethnic groups very difficult. A grouped bar graph would have been better (see Example 3.6.2).

There are also meaningless tick-marks on the category axis which should be removed, as should the figures inside the bars. A horizontal scale should be included.





Example 3.6.2: grouped bar graph as an alternative to Example 3.6.1.

Comparison between ethnic groups is now easy since there is a common baseline.




3.7 Pictograms

A pictogram is an illustration of graphical information using pictures. Pictograms should be used very sparingly as they can misrepresent the message the graph is meant to convey.

A common problem in pictograms is that areas or even volumes are used to show changes in one-dimensional data. Not only are areas and volumes difficult for the reader to compare, but very often the mistake is made of varying two or three dimensions simultaneously. For example a difference in the data of a factor of two may be represented as an area difference of a factor of four or a volume difference of a factor of eight. A good rule is that the number of variable dimensions depicted should not exceed the number of dimensions in the data.
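The factors of four and eight quoted above follow from simple arithmetic, sketched below. The function name is hypothetical; the sketch assumes that every varied dimension of the picture is scaled by the full ratio in the data.

```python
def apparent_ratio(data_ratio, dimensions_varied):
    """Apparent size ratio of two pictures when each of
    dimensions_varied dimensions is scaled by data_ratio."""
    if dimensions_varied not in (1, 2, 3):
        raise ValueError("a picture can vary in 1, 2 or 3 dimensions")
    return data_ratio ** dimensions_varied


# A doubling in the data, drawn by scaling length, area or volume.
length_ratio = apparent_ratio(2, 1)   # 2: matches the data
area_ratio = apparent_ratio(2, 2)     # 4: twice the true difference
volume_ratio = apparent_ratio(2, 3)   # 8: four times the true difference
```

Only the one-dimensional case reproduces the ratio in the data, which is why a chain of equal-sized symbols (in effect a bar) is the safe form of pictogram.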

Example 3.7.1: Pictogram.

In this example, images of people are used to represent the relative sizes of the New Zealand population in different years in an eye-catching way. In this case, the graph represents the data accurately because the population size is represented by the length of the chain of people. In effect, this is a bar chart.





Example 3.7.2: Pictogram, poor example.

The numbers are represented in this pictogram by the sizes of the helmets. These are difficult to compare. Does the reader compare the heights, areas, or apparent volumes of the helmets? It would have been better to illustrate each number by a proportional number of soldiers in a manner similar to Example 3.7.1.




4. Graphics standards

The following requirements should be adhered to for any graph.

4.1 Shape and size

- If the nature of the data suggests the shape of the graphic, follow that suggestion.
- Otherwise the frame should be 1.5 times as wide as it is high.
- Small graphs should be used for simple messages, larger graphs for more complex messages.
- Comparison of related graphs should be facilitated by using identical scales of measurement and placing graphs side by side.

4.2 Graph title

- A graph title must be left aligned.
- The title should be informative but as short as possible. Supplement it with a separate caption under the graph if necessary.
- The title should be in mixed upper and lower (title) case, e.g. 'Sex Ratios of Elderly, Urban and Rural Areas'.

4.3 Plot Border

- A graph should have a border around the plot area if the plot area is the same colour as the rest of the page. This helps visually to connect the elements of the graph.

4.4 Scale

- A scale should be chosen which results in a balanced presentation and assists interpolation between labelled tick points. Use 1, 2, 5 (or 10, 20, 50 etc) as scale intervals. This will result in easily recognisable values (even numbers and multiples of 5) in the scale. For example, avoid a scale such as 30, 60, 90, 120, which omits 100.

- Intervals should be evenly spaced. Non-linear scales (e.g. logarithmic) should only be used where absolutely necessary and where readers will not be misled.

- Use the same scale and format for graphs that are likely to be compared.
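One way of arriving at a 1, 2, 5 scale interval is sketched below. This is an illustrative search, not a prescribed method: the function name is hypothetical, and the default limit of seven labels is an assumption chosen to stay under the limit of 8 given for the vertical axis in Section 4.7.

```python
import math


def nice_scale(data_min, data_max, max_labels=7):
    """Return (low, high, step) where step is 1, 2 or 5 times a power
    of ten and the scale from low to high covers the data with at
    most max_labels labelled tick marks."""
    if data_max <= data_min:
        raise ValueError("data_max must exceed data_min")
    if max_labels < 3:
        raise ValueError("max_labels must be at least 3")
    span = data_max - data_min
    # Start one power of ten below the span and work upwards.
    exponent = math.floor(math.log10(span)) - 1
    while True:
        for multiplier in (1, 2, 5):
            step = multiplier * 10 ** exponent
            low = math.floor(data_min / step) * step
            high = math.ceil(data_max / step) * step
            n_labels = round((high - low) / step) + 1
            if n_labels <= max_labels:
                return low, high, step
        exponent += 1


# Data running from 0 to 120: the scale 0, 20, ..., 120 includes 100.
scale = nice_scale(0, 120)
```

For data from 0 to 120 the sketch selects an interval of 20, giving the labels 0, 20, 40, 60, 80, 100, 120 rather than 30, 60, 90, 120.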

4.5 Axes

- Place the axes at the left and bottom of the graph.
- A second right hand axis should be used where the graph spans a whole page.
- Where the vertical axis has positive and negative values, the zero line should be clearly indicated.
- Two different types of vertical axis for different overlaid graphs should in general not be used as this is confusing to the reader. However occasionally this is a useful tool to compare patterns of trends (see Section 4.18).

4.6 Axis labels

- There should be name labels for both axes. These should begin with an initial capital followed by lower case e.g. 'Number never married'. An exception is where the category labels in a bar graph clearly identify what is being plotted on the axis (e.g. years, region names). In this case an axis name label may only add clutter to the graph.
- The unit and scale of measurement should be placed in the axis title and not in the graph title.
- The interval between the two highest y-axis labels should contain data.

4.7 Tick marks

- Tick marks must be outside of the axes.
- The width of the axis and the number of plot points will determine the number of tick points that are labelled. The number of labelled (major) tick marks must be less than 10 for the horizontal axis and less than 8 for the vertical axis.
- Minor ticks should be kept to the minimum necessary for clarity.
- The data should span the tick marks, i.e. the data should begin at the first tick mark and end at the last tick mark.
- The first and last major tick mark along the horizontal axis of a time series graph must be labelled.
- Do not put ticks between bars on graphs. They have no value and are confusing to readers.

4.8 Tick mark labels

- Numeric tick mark labels should have fewer than 4 digits and must have fewer than 6 digits (i.e. preferably 3 or fewer, with a maximum of 5 digits). A comma should be used as a thousands separator in graphs as it makes large numbers easier to read.

- The scale factor is the scaling to apply to the values labelling the tick marks e.g. a maximum scale value of 55,000 with a scale factor of 1,000 will display 55 as the maximum figure on the axis. The correct numeric axis label depends on the scale factor, the scale of the data, and the units of measurement (e.g. a label of '$M' where the scale factor is 1,000,000 and the units are dollars).

- The maximum and minimum values of the numeric scale and the interval between tick marks must be selected appropriately so that suitable values appear for the axis tick labels. The value (maximum minus minimum) must be divisible by the specified interval value with no remainder.

- The tick mark labels should always be written under the plot area, not under the zero line.
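The rules above on digits, thousands separators, scale factors and divisibility can be combined in a small check, sketched below. The function and parameter names are hypothetical, and the sketch assumes whole-number scale limits and intervals.

```python
def tick_labels(minimum, maximum, interval, scale_factor=1, max_digits=5):
    """Return tick labels from minimum to maximum, divided by
    scale_factor and written with a comma as thousands separator.

    Raises ValueError if (maximum - minimum) is not a whole multiple
    of interval, or if any label has more than max_digits digits.
    """
    if interval <= 0 or (maximum - minimum) % interval != 0:
        raise ValueError("(maximum - minimum) must be divisible by interval")
    labels = []
    value = minimum
    while value <= maximum:
        scaled = value / scale_factor
        # Write whole numbers without a trailing '.0'.
        text = f"{scaled:,.0f}" if scaled == int(scaled) else f"{scaled:,}"
        if sum(ch.isdigit() for ch in text) > max_digits:
            raise ValueError(f"label {text!r} has too many digits; "
                             "use a larger scale factor")
        labels.append(text)
        value += interval
    return labels


# A maximum of 55,000 with a scale factor of 1,000 is labelled 55.
thousands = tick_labels(0, 55000, 5000, scale_factor=1000)
plain = tick_labels(0, 20000, 5000)
```

With a scale factor of 1,000 the last label is 55, as in the example above, and the axis title would then carry the unit, e.g. '(000)'.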

4.9 Category labels

- Labels for categories of variables should be as short as possible consistent with interpretability.

4.10 Label alignment

- All labels should run horizontally.

4.11 Data point labels

It may occasionally be necessary to identify specific data points with labels. These should be

- inside the plot border, and
- with a line joining a label to its corresponding point.

4.12 Line styles

- Use different line styles where lines cross or touch each other.

4.13 Legend

- Each line in a graph should be individually labelled if space permits. These labels must be clearly associated with the correct line only.
- Otherwise a legend (see definition of legend in the appendix) should be shown outside the graph area, preferably next to the lines, and to the right.
- Each column or group of columns in a column graph should be individually labelled when clearly possible, in preference to a legend.

4.14 Colours

- When colour is available use it sparingly, and normally only one colour, in soft tones and in a limited range of shades.

- Ensure that lines or bars can still be distinguished when reproduced in black and white, e.g. by using different line styles.

4.15 Fonts

- Use a sans serif font, preferably Arial, for any text. On axes Arial Narrow can be used. Sources should be written in a font such as Goudy Old Style Italic, 7 points. In general the size depends on the publication. As a guideline, the Publication Section of Statistics New Zealand uses 12-point Arial for headings in analytical reports.

4.16 Bars

- Bars must be filled with a solid shade, not hatching.
- Bars in a bar graph should not normally have borders (or the border must be the same shade as the bar itself).
- In grouped bar graphs in monochrome, the bars should be shaded in tints with the first bar shaded 80%, the second 60%, the third (where it exists) 40%, and the fourth (where it exists) 20%. If the background is grey, then the 20% bar may not show up very well. In this case a border is permitted around the bar.
- Where colour is used, it should be muted and preferably only one colour should be used. A colour should not be especially prominent as this could give false emphasis on the category (e.g. black, red and green: the red category is likely to be overemphasised). Recommended colours are blue, (bluish) green and purple.
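The monochrome tint sequence above (80%, 60%, 40%, 20%) can be generated rather than typed in for each graph. The sketch below is illustrative only: the function names are hypothetical, and the conversion to a hexadecimal grey assumes that a tint of 100% is black and 0% is white, which may not match a particular printing process.

```python
def monochrome_tints(n_bars):
    """Return the tint percentage for each bar in a group,
    darkest first: 80, 60, 40, 20."""
    if not 1 <= n_bars <= 4:
        raise ValueError("a group should contain between one and four bars")
    return [80 - 20 * i for i in range(n_bars)]


def tint_to_grey_hex(tint_percent):
    """Convert a tint percentage to a hexadecimal grey.
    Assumption: 100% tint is black (#000000), 0% is white (#ffffff)."""
    level = 255 * (100 - tint_percent) // 100
    return f"#{level:02x}{level:02x}{level:02x}"


# Shades for a group of three bars.
shades = [tint_to_grey_hex(t) for t in monochrome_tints(3)]
```

A group of more than four bars raises an error, in line with the limit of four categories per group given in Section 3.1.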

4.17 Grids

- Subtle (grey-on-white or white-on-grey) grids should be used where possible to facilitate accurate value judgements.

4.18 Multiple scales

- Care should be taken with graphs with two different vertical scales as they can be difficult to interpret. Usually it is better to present the data on separate graphs (see Section 4.5).

- They are easiest to interpret where changes in pattern, not in levels, are of interest.

Example 4.18.1: Poor example: graph with two different vertical scales.

There is no relationship between the two scales so they only confuse the reader. In addition there is a broken axis which is not clearly indicated. This graph would be better as two separate graphs.




4.19 Use of symbols and abbreviations

- To save space, text in axis titles, tick mark labels and legends should use symbols and abbreviations as long as these are generally well understood. If in doubt, avoid their use.

- Whether symbols and abbreviations are used in a graph depends on the space available, but they should be used only where necessary or where the abbreviation or symbol is generally well understood.

- Some examples of numeric axis titles are:

index - for index numbers
per cent - for per cent
ratio - for ratios that are not per cent
(000) - for thousands
million - for millions
$ - for dollars
$(000) - for thousands of dollars
$(million) - for millions of dollars

- For labels with months, in small graphs abbreviate to the first letter only without a full stop, i.e. J F M A M J J A S O N D.

4.20 Broken axes

Often the zero level for a continuous variable needs to be shown to indicate the level of the data and to prevent a misleading picture, but the scale required to do so would visually suppress small but interesting differences between the data points. In this case a percentage change graph should be considered instead. If this is not possible, the solution is to use a broken axis.

- When a broken axis is used it should be indicated clearly. For both bar and line graphs the preferred way of showing this is with a clear break across all the bars and the axis as shown in Example 4.20.1. A double slash at the y-axis may also be required. No indication, or only subtle indication, of a broken axis might cause the reader to have an exaggerated impression of the importance of small differences between the bars or data points.

Example 4.20.1: Broken axes.

The following graphs have clearly indicated broken axes.







Do not break an axis near the top of a bar or data point. This can make it difficult to tell what value is being represented.

Example 4.20.2: Broken axes, poor example.

This graph has the axis broken near the top of a bar. It is unclear exactly where the scale break is. The graph could be improved by placing the lowest bar at least one scale interval above the break and by indicating the broken axis more clearly as in the examples above.




- At least 50% of the original scale should be displayed in a graph with a broken axis. This restricts over-emphasising minimal differences between the bars.

- The absolute value of an index has no intrinsic meaning, so it is not necessary to include the zero point on a graph of index values, or to indicate a broken axis, unless the purpose is to make ratios comparable.
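The 50% rule for broken axes, and the earlier advice to keep the break clear of the data, can be checked numerically. The sketch below rests on one reading of the rule, namely that 'the original scale' is the range from zero to the axis maximum and that the part cut out by the break is what is not displayed; the function names are hypothetical.

```python
def broken_axis_ok(axis_max, break_start, break_end, min_shown=0.5):
    """True if at least min_shown of the scale from zero to axis_max
    is still displayed once break_start..break_end is cut out.
    Assumption: the 'original scale' runs from zero to axis_max."""
    if not 0 <= break_start < break_end <= axis_max:
        raise ValueError("need 0 <= break_start < break_end <= axis_max")
    removed = break_end - break_start
    return (axis_max - removed) / axis_max >= min_shown


def break_clear_of_data(values, break_end, interval):
    """True if the lowest value lies at least one scale interval
    above the top of the break."""
    return min(values) >= break_end + interval


# Axis from 0 to 100 with the range 5 to 45 cut out: 60% is still shown.
acceptable = broken_axis_ok(100, 5, 45)
# Cutting out 5 to 85 leaves only 20% of the scale.
too_severe = broken_axis_ok(100, 5, 85)
```

If either check fails, a percentage change graph or a different scale should be reconsidered before resorting to the break.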

Example 4.20.3: Graph of an index.

It is the changes in the indexes in this graph that are significant, not the absolute values of the indexes. Therefore the axis does not need to start at zero and the broken axis does not need to be indicated.




4.21 Data sources

- The source of the data should always be given beneath the graph.

4.22 Caption

- All graphs should have a caption.

- Captions should make graphs as nearly self-explanatory (i.e. independent of the text) as is feasible. If this is not feasible there should be a reference to the specific part of the text.

4.23 Medium

- Always keep the medium of presentation in mind and test the graph in that medium before finalising. For example, print out the diagram to see it in printed form.

Appendix: Components of a graph

The following diagram illustrates the components of a graph as referred to in these guidelines.