Working with numbers

Three ‘zero-confused’ examples from last week (see the bottom of this post) got me thinking about tackling this topic. The result is wordy but, I hope, helpful. Feedback is welcome: I’d love to nail down some good guidance on this.


This table, used in an earlier post to compare who was and wasn’t affected by the New Orleans flooding, is a good example of how to present numbers with the appropriate amount of detail for their context. Populations are given as 346,000 and 138,000, and percentage breakdowns as 76%, 18% or 2%: enough detail for the desired comparison, and a consistent level of detail throughout.

This sort of simple and confident handling of numbers is deceptively hard to achieve, especially if you’re someone who shudders at the mention of ‘significant figures’, ‘rounding’ or ‘decimal places’. These are the names of the tools that let us deal sensibly with the presentation of numbers, but I know their very mention can transport people straight back to their school maths class, which, by and large, they hated every minute of.

So maybe a better approach is to look at the principles that make it worthwhile (clarity, context, precision, relevance, audience) rather than those maths-y labels. Here goes…

When we say on the phone “I’ll be there in half an hour”, it’s quite likely we’ll arrive sometime in the next 25 to 35 minutes. But for the context of meeting up with a friend, “half an hour” will do. If you said “see you in 27 minutes”, that would raise a laugh, being an odd level of precision for the given context. The same ideas apply to numbers in journalism.

In the interest of telling your audience something as clearly and simply as possible, less is more. Leave out as much detail as you can. Consider the comparison of two populations given above: 346,000 and 138,000. These could have been given as 345,983 and 138,031, but it’s much quicker and easier to compare the simplified numbers. Precision down to the last person just isn’t relevant in this context.
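For anyone who likes to see this as a calculation, here is a minimal Python sketch of the simplification above. The function name `to_nearest_thousand` is my own invention for illustration, and it assumes the nearest thousand is the right level of detail for the comparison being made.

```python
def to_nearest_thousand(n: int) -> str:
    """Round a whole number to the nearest thousand and add comma separators."""
    # round() with a negative second argument rounds to the left of the
    # decimal point: -3 means "nearest thousand".
    return f"{round(n, -3):,}"


print(to_nearest_thousand(345_983))  # 346,000
print(to_nearest_thousand(138_031))  # 138,000
```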

Another way of saying this is that you only need to keep including figures up to the point where they’re no longer needed to distinguish the things you’re comparing. Take for example the finishing times of the athletes in the last World Championship men’s 100m final: 9.92, 10.08, 10.09, 10.19, 10.26, 10.27, 10.95 (Usain Bolt was the other one but he was disqualified). There is enough detail to put them in finishing order, so a further figure on the end isn’t needed. But if the times had been given with one figure fewer, 10.08 and 10.09 would both have been reported as 10.1, and you wouldn’t have had a finishing order.
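The same test can be written as a few lines of Python. This is only a sketch: `min_decimals_to_distinguish` is a hypothetical helper name, and it assumes that “distinguishable” simply means no two values look the same once rounded to a given number of decimal places.

```python
def min_decimals_to_distinguish(values, max_decimals=6):
    """Return the fewest decimal places at which all values still look different."""
    for places in range(max_decimals + 1):
        # Format every value to the same number of decimal places.
        shown = [f"{v:.{places}f}" for v in values]
        # If no two formatted values collide, this level of detail is enough.
        if len(set(shown)) == len(values):
            return places
    return None  # values could not be told apart within max_decimals


times = [9.92, 10.08, 10.09, 10.19, 10.26, 10.27, 10.95]

print(min_decimals_to_distinguish(times))  # 2
print(f"{10.08:.1f}", f"{10.09:.1f}")      # 10.1 10.1
```

Two decimal places is the least detail that keeps every finisher separate; with one decimal place, second and third become indistinguishable.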

This next point might sound obvious, but don’t report numbers to a more precise degree than the source material you’re working from. The temptation to do this comes from the fact that your calculator screen will present back to you as many figures as it has room for. However, those figures are just the product of an inanimate object performing a mathematical process, and it doesn’t know or care how precise the input numbers were. You do know, so the responsibility lies with you to take notice of only the first few figures if all that you input were numbers with only a few figures. An example: the average finishing time from the 100m final above. My calculator says 10.25142857 seconds (because its screen can display 10 figures). I would report this as 10.25 seconds, the same level of precision as the input numbers.
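Here is that average worked through in Python, as a rough sketch. The helper name `mean_at_input_precision` is made up for this post, and the number of decimal places is something you supply yourself, because the computer, like the calculator, has no idea how precise the inputs were.

```python
def mean_at_input_precision(values, decimals):
    """Average the values, then report the result to the inputs' decimal places."""
    raw_mean = sum(values) / len(values)
    # The raw mean carries far more digits than the inputs justify,
    # so cut it back to the precision the inputs were recorded at.
    return f"{raw_mean:.{decimals}f}"


times = [9.92, 10.08, 10.09, 10.19, 10.26, 10.27, 10.95]

print(sum(times) / len(times))             # a long string of digits, like the calculator
print(mean_at_input_precision(times, 2))   # 10.25
```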

That example is slightly misleading, given that its input numbers are highly accurate, having been recorded on cutting-edge precision instruments. Never forget that economic data and population estimates are much more woolly, as are numbers that have already been simplified, such as the 346,000 and 138,000 population figures given above. You have two options when performing further calculations on these less precise types of number: either go back and find the raw figures they came from, or reduce the level of precision of your answer to reflect the fact that the numbers you put into the calculation were already simplified. Applying the second option to the 100m example, 10.3 seconds would do for the average finishing time.

Maintaining a consistent level of precision across all the figures quoted in a single table or story means you are comparing apples with apples. So if you quote a percentage share of 76% black, then don’t quote 18.1% white, 2.8% Hispanic, 2% Asian and 0.8% other. Keep them all the same. In practice I think it is the zeros combined with a decimal point that confuse people and lead to apples being compared with pears: ‘zero-confusion’.

Three recent examples of ‘zero-confusion’ from The Guardian, Information is Beautiful and the BBC

Often people think that a zero after a decimal point has no meaning. When they see 2.0, the 0 gets dropped. This is wrong. The 0 in 2.0 is as important as the 4 in 2.4 or the 7 in 2.7. Another way of looking at it is to pretend you were looking at 20, 24 and 27. You wouldn’t drop the 0 from 20. It’s what makes it 20 and not 24 or 27 or twenty-anything-else.
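Software can fall into the same trap. As a small illustration in Python (the variable names are mine, and the three values are just the ones from the paragraph above), the general-purpose `g` format drops the trailing zero, while a fixed one-decimal-place format keeps every number at the same level of detail.

```python
values = [2.0, 2.4, 2.7]

# The "g" format strips trailing zeros, so 2.0 loses a figure the others keep.
zero_confused = [f"{v:g}" for v in values]

# The ".1f" format gives every value exactly one decimal place.
consistent = [f"{v:.1f}" for v in values]

print(zero_confused)  # ['2', '2.4', '2.7']
print(consistent)     # ['2.0', '2.4', '2.7']
```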

Comparing this ‘zero-confusion’ with my “see you in half an hour” example, it’s a bit like a bunch of friends all letting you know when they are going to arrive: “Stuck on tube, still 20 mins away”, “There in 10 mins”, “I’ll be there in 6 mins and 45 seconds”. Two give a sensible level of precision, and the last one is a bit strange (possibly better suited to the context of boiling eggs?). There’s nothing strictly wrong with mixing up how precise you’re being; you just wouldn’t do it. There’s no reason to, it’s out of place, and ultimately it just trips your audience up.