Working with data

The steps I think should be followed when working with data:
1 – find out as much as you can about the data, from its original source
2 – collect and clean the data
3 – explore the data, get to know it
4 – find some stories
5 – tell some stories
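
To make steps 2 and 3 a little more concrete, here's a minimal sketch in Python with pandas – purely illustrative, with a made-up file name and columns – of what "collect and clean" followed by "explore" can look like in practice:

```python
import pandas as pd

# Step 2: collect and clean (hypothetical CSV of school results)
df = pd.read_csv("school_results.csv")                     # load the raw data
df["score"] = pd.to_numeric(df["score"], errors="coerce")  # turn stray text into numbers (or NaN)
df = df.dropna(subset=["score"])                           # drop rows with no usable score

# Step 3: explore, get to know it
print(df.describe())                                       # ranges, means, outliers
print(df.groupby("region")["score"].median())              # a first cut at possible stories
```

None of this is the hard part, of course – the hard part is step 1, knowing where those numbers came from and what they can and can't tell you.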

As for the visual? Don't think of it as one visual. There'll be visuals used in exploring, different, better-suited ones for finding stories, and simpler ones again for telling them.

Don’t be tempted to throw all the data at the audience just to prove the process you’ve been through. It could even be that what you’ve found out is in fact best followed up in words, pictures or video.

Thinking as I write, I suppose the above steps are in fact the steps data journalism involves. And this may result in a data visualisation, but it doesn’t have to.

I was finally prompted to write the above by related talks I heard at The Design of Understanding last week. You'll find more discussion of them below if you're interested.

Help with numbers

In this world of all things data, Michael Blastland is a guiding voice worth listening to. When creating data visualisations, people often get so caught up in crafting the visual that they forget to spend time finding out about the data it's based on. Seeing data pinned down by a visualisation makes it appear concrete. But if you haven't bothered to investigate the provenance of that data, then in reality the visualisation is likely to be misleading at best, wrong at worst.

The news panel raised the question of which part of the team should provide the interpretation of the data and, by implication, tackle the validity problem. The current scenario is that, in the face of deadlines and a shortage of data-handling expertise, everyone tends to bury their heads in the sand on this.

By way of example, BERG's Schooloscope project was put in the spotlight. While it's an unrivalled success in terms of making a dry, dense table of data digestible and friendly, Blastland pointed out that the margins of uncertainty for a large chunk of the data are so wide that they invalidate it. But no-one had any solutions for how to visualise uncertainty. I feel a conference coming on… The Design of Uncertainty?
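
For what it's worth, one common approach – offered here only as a sketch, not something anyone at the event proposed – is to at least draw the uncertainty rather than hide it, for instance as error bars around each value. The numbers below are invented purely to illustrate:

```python
import matplotlib.pyplot as plt

# Invented figures: a score per school plus a (deliberately wide) margin of uncertainty
schools = ["A", "B", "C", "D"]
scores = [62, 58, 71, 55]
margins = [12, 15, 9, 14]   # +/- uncertainty on each score

fig, ax = plt.subplots()
ax.bar(schools, scores, yerr=margins, capsize=6, color="#cccccc")
ax.set_ylabel("Score")
ax.set_title("Scores with margins of uncertainty shown as error bars")
plt.show()
```

Whether error bars that dwarf the differences between schools make for a friendly visual is another question – which is rather Blastland's point.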

Accuracy of the data aside, Jack Schulze of BERG offered an analogy for working with data: treat it as a material – only by knowing its properties will you know how best to work with it. This is something I strongly agree with, as per the steps at the top of this post.