# Working with numbers

It was three ‘zero-confused’ examples from last week (see the bottom of this post) that got me thinking about tackling this topic. The result is wordy, but I hope helpful. Feedback is welcome: I’d love to nail down some good guidance.

This table, used in an earlier post, compares who was and wasn’t affected by the New Orleans flooding. It is a good example of how to present numbers with the appropriate amount of detail for their context. Populations are given as 346,000 and 138,000, percentage breakdowns as 76%, 18% or 2%: enough detail for the desired comparison, and a consistent level of detail throughout.

This sort of simple and confident handling of numbers is deceptively hard to achieve, especially if you’re someone who shudders at the mention of ‘significant figures’, ‘rounding’ or ‘decimal places’. These are the names of the tools that let us present numbers sensibly, but I know their very mention can transport people straight back to their school maths class which, by and large, they hated every minute of.

So maybe looking at the principles behind why it’s worth it is a better approach: clarity, context, precision, relevance, audience rather than those maths-y labels. Here goes…

When we say on the phone “I’ll be there in half an hour” it’s quite likely we’ll arrive sometime in the next 25 to 35 minutes. But in the context of meeting up with a friend, “half an hour” will do. If you said “see you in 27 minutes” that would raise a laugh, being an odd level of precision for the context. The same ideas apply to numbers in journalism.

In the interest of telling your audience something as clearly and simply as possible, less is more. **Leave out as much detail as you can**. Consider the comparison of two populations given above: 346,000 and 138,000. These could have been given as 345,983 and 138,031, but it’s much quicker and easier to compare the simplified numbers. Precision down to the last person just isn’t relevant in this context.
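As a rough sketch of that simplification in Python (the helper name `round_to_nearest` is my own invention, and rounding to the nearest thousand is simply the level of detail that suits this comparison):

```python
def round_to_nearest(value: int, step: int = 1000) -> int:
    """Round a non-negative integer to the nearest multiple of `step` (halves round up)."""
    return ((value + step // 2) // step) * step


# The detailed figures quoted above.
populations = [345_983, 138_031]

# The simplified figures a reader can compare at a glance.
simplified = [round_to_nearest(p) for p in populations]
print([f"{p:,}" for p in simplified])  # ['346,000', '138,000']
```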

Another way of saying this is that you only need to **include figures up until the point they’re no longer needed** to distinguish the things you’re comparing. Take for example the finishing times of the athletes in the last World Championship men’s 100m final: 9.92, 10.08, 10.09, 10.19, 10.26, 10.27, 10.95 (Usain Bolt was the other one but he was disqualified). There is enough detail to let us put them in a winning order, so a further figure added to the end isn’t needed. But if the times had been given with one fewer figure then 10.08 and 10.09 would both have been reported as 10.1 and you’d not have had a winning order.
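One way to check this for yourself is to try fewer and fewer decimal places until two values collide. The sketch below is only an illustration: the function names are hypothetical, and I have assumed conventional round-half-up rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Finishing times from the 100m final quoted above, kept as strings
# so that no binary floating-point error creeps in.
times = ["9.92", "10.08", "10.09", "10.19", "10.26", "10.27", "10.95"]


def round_places(value: str, places: int) -> Decimal:
    """Round a decimal string to `places` decimal places (halves round up)."""
    quantum = Decimal(1).scaleb(-places)  # places=2 gives Decimal('0.01')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


def min_places_to_distinguish(values, max_places=6):
    """Smallest number of decimal places at which every value stays distinct."""
    for places in range(max_places + 1):
        rounded = {round_places(v, places) for v in values}
        if len(rounded) == len(values):
            return places
    return None  # still tied even at max_places


print(min_places_to_distinguish(times))  # 2
print(round_places("10.08", 1), round_places("10.09", 1))  # 10.1 10.1
```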

This next point might sound obvious but **don’t report numbers to a more precise degree than the source** material you’re working from. The temptation to do this comes from the fact that your calculator screen will present back to you as many figures as it has room for. However, they are just the product of an inanimate object performing a mathematical process, and it doesn’t know or care how precise the input numbers were. You do know how precise the input numbers were, so the responsibility lies with you to take notice only of the first few figures if all that you input were numbers with only a few figures. An example: the average finishing time from the 100m final above. My calculator says 10.25142857 seconds (because its screen can display 10 figures). I would report this as 10.25 seconds, the same level of precision as the input numbers.
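The same calculation can be sketched in Python. This is a minimal illustration rather than a recipe: it assumes the inputs are typed in as decimal strings, so the number of decimal places they carry can be read back from them.

```python
from decimal import Decimal, ROUND_HALF_UP

times = [Decimal(t) for t in
         ["9.92", "10.08", "10.09", "10.19", "10.26", "10.27", "10.95"]]

# What the calculator shows: far more figures than the inputs justify.
mean = sum(times) / len(times)

# Decimal places carried by the most precise input (2 for these times).
places = max(-t.as_tuple().exponent for t in times)

# Report the mean at the same level of precision as the inputs.
reported = mean.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)
print(reported)  # 10.25
```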

That calculation is slightly misleading given that the input numbers for it are highly accurate, having been recorded on cutting-edge precision instruments. Never forget that **economic data and population estimates are much more woolly**, as are numbers that have already been simplified, for example the 346,000 and 138,000 population figures given above. You have two options when performing further calculations on these types of less precise numbers. Either go back and **find the raw figures** they came from **or reduce the level of precision of the answer** you get to reflect the fact the numbers you input into the calculation were already simplified. 10.3 seconds would therefore do for the average 100m finishing time.
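Reducing the precision of an answer usually means rounding to a number of significant figures. Here is a small sketch, assuming a hypothetical helper `round_sig` and assuming that the simplified population figures carry three significant figures:

```python
import math


def round_sig(value: float, figures: int) -> float:
    """Round `value` to `figures` significant figures."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 0 for 2.5, 1 for 10.25, 5 for 346000.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - magnitude)


# A ratio built from already-simplified figures should not claim more
# precision than those figures had.
ratio = 346_000 / 138_000
print(round_sig(ratio, 3))        # 2.51

# The 100m average, cut back by one figure.
print(round_sig(10.25142857, 3))  # 10.3
```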

Maintaining a **consistent level of precision** across all the figures quoted in a single table or story means you are comparing apples with apples. So if you quote a percentage share of 76% black, then don’t quote 18.1% white, 2.8% Hispanic, 2% Asian and 0.8% other. Keep them all the same. In practice I think it is the zeros combined with a decimal point that confuse people and lead to the comparing of apples and pears: ‘zero-confusion’.
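If the figures are coming out of a script or spreadsheet export, one way to enforce this is to format the whole column in a single step. The sketch below reuses the mixed-precision shares from the example above; `format_shares` is an illustrative name, not a library function.

```python
# The inconsistent figures from the example above.
shares = {"black": 76, "white": 18.1, "hispanic": 2.8, "asian": 2, "other": 0.8}


def format_shares(values: dict, places: int = 0) -> dict:
    """Format every share with the same number of decimal places."""
    return {label: f"{value:.{places}f}%" for label, value in values.items()}


print(format_shares(shares))
# {'black': '76%', 'white': '18%', 'hispanic': '3%', 'asian': '2%', 'other': '1%'}
```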

Often people think that after a decimal point the zero has no meaning. When they see 2.0 the 0 gets dropped. This is wrong. The 0 in 2.0 is as important as the 4 or 7 is in 2.4 or 2.7. Another way of looking at it is to pretend you were looking at 20, 24 and 27. You wouldn’t drop the 0 from 20. It’s what makes it 20 and not 24 or 27 or twenty-anything-else.
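Software will happily drop that zero for you if you let it. In Python, for instance, the general-purpose `g` format strips trailing zeros while a fixed-point format keeps them (a small illustration using the numbers above):

```python
values = [2.0, 2.4, 2.7]

# Fixed-point formatting keeps the trailing zero, so every value
# shows the same level of precision.
kept = [f"{v:.1f}" for v in values]
print(kept)     # ['2.0', '2.4', '2.7']

# The 'g' format drops it, which makes 2.0 look less precise than its neighbours.
dropped = [f"{v:g}" for v in values]
print(dropped)  # ['2', '2.4', '2.7']
```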

Comparing this ‘zero-confusion’ with my “see you in half an hour” example, it’s a bit like a bunch of friends all letting you know when they are going to arrive: “Stuck on tube, still 20 mins away”, “There in 10 mins”, “I’ll be there in 6 mins and 45 seconds”. Two give a sensible level of precision, and the last one is a bit strange (possibly better suited to the context of boiling eggs?). There’s nothing strictly wrong with mixing up how precise you’re being; you just wouldn’t do it. There’s no reason to, it’s out of place, and ultimately it just trips your audience up.

I said I’d welcome feedback, and I do – obviously I’m not the only person who’s sensitive to these issues! Thank you all for your comments.

I love the line, “leave out as much detail as you can”.

And I generally love this article. My one dissent is not so much against what is said here, but a common misuse of the principle. If you are providing estimates NOT in the text of an analysis, but in something like detailed tables, rounding is not the ideal way to tell data users about the accuracy of your estimate. Particularly if the data users are going to use your data for other analyses.

An example: If your reasonably unbiased estimates of average earnings for two groups are $12,395 and $9,637, it is much more meaningful to take an income ratio of those two numbers rather than of $12,000 and $10,000, even if 2 standard errors > $500. This is even more so if the analysis being done by a data user is going to look at a number of such ratios over time or over different groups.
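To put rough numbers on that example (a quick sketch using only the figures quoted in the comment above):

```python
# Income ratio from the detailed estimates.
full = 12_395 / 9_637

# Income ratio from the same estimates rounded to the nearest thousand.
rounded = 12_000 / 10_000

print(f"{full:.2f}")     # 1.29
print(f"{rounded:.2f}")  # 1.20
```

Rounding the inputs first moves the ratio from about 1.29 to 1.20, a shift that comes from the rounding itself and not from the data.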

It is always difficult to prevent misuse of data, but suppressing it is not a good answer.

Helpful article. Thank you.

In the first table the implied decimal points shift to the left when going from double to single digits. All of the decimal points in a column, real or implied, should line up.

Part of the problem is that common tools don’t support paying attention to and displaying significance. MS Excel does a poor job at this, as the easiest choice there is to give all of your numbers the same number of decimal places. The constant-decimal-places idea applies best to money, but I would argue that if you are talking about many millions of dollars in any real situation, you have lost track of counting individual cents and so they don’t really mean anything there either. And as you note, calculators display a lot of digits, leading users to think they mean something.
