What to ask for
What to ask for: a checklist
In the chapter "Getting Data" in the next section of the Handbook, we will deal with asking governments for data (or getting it by other means). To set the scene for this, and to work out whether your government already publishes usable data, have a quick look at the following questions:
- Is the government's data published in a machine-readable format, e.g. CSV, XML or JSON? There is nothing wrong with publishing a PDF to support a data release (in fact, a well-formatted document is often useful for cross-referencing and sanity-checking the data), but it shouldn't be the only thing published. If you are asking for a policy document, ask for the underlying data in a spreadsheet so you can check the numbers.
- Does the government publish a 'data dictionary' to explain the terms used in the dataset? This should include definitions of column headers, explanations of terms and ranges used within the main body of the data, and explanations of any changes in terminology introduced since the dataset was last released.
- How is the published data actually used internally by governments? Do some sanity checks on the minimum and maximum values of different columns to make sure they fall into the documented ranges and don't seem out of place. Do you see negative values where you don't expect them? Negative values usually mean money owed, but confirm this against the documentation or with the publisher.
- Is the structure of the data the same across years? If not, is there a description of how it changes? It never hurts to contact the publisher and ask about the change and why it occurred. The publisher may have their name and contact details on the report or webpage. If there is no named contact, call the department's enquiries number or send a message to its email address asking to meet or discuss the data.
- How aggregated is the data? What is the number of real-world financial transactions that are expressed by a single line of the dataset you have? For budgets this will mostly be hard to tell - but with transactional expenditure you want to make sure that the data is fairly disaggregated. Ideally, each entry represents a transaction - but even if this isn't true you'll still want to ensure the number is not in the tens or hundreds of thousands (e.g. government programmes as a whole).
- Ask for reference data. If your budget or spending data is augmented with reference data, make sure you have access to it. This might include functional or category codes on budget line items, location codes for describing recipient location, or codes that indicate the status of the record.
- Also ask for the guidelines people were given when creating the dataset. This will make it easier to understand what is included within the data, e.g. whether the numbers are in thousands or millions.
- Final tip: if the data you want is not given then narrow your scope. Your chances of success will be higher if you narrow the scope of the data you're requesting from the government and you are specific. Government is the de facto keeper of all kinds of data, so parameters that narrow your request are always helpful.
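
Some of the checks in this list (minimum and maximum values, unexpected negative values, and changes in structure between years) can be automated once you have the data in a machine-readable format. The following is only a minimal sketch in Python using the standard library. The sample data, the column names (`payee`, `amount`, `amount_gbp`, `region_code`, `status`) and the range of 0 to 50,000 are all invented for illustration; substitute the columns and documented ranges from your own dataset and its data dictionary.

```python
import csv
import io

# Hypothetical sample releases; every name and value here is invented.
SAMPLE_2012 = """payee,amount,region_code
Acme Ltd,1200.50,N1
Roadworks Co,-300.00,S2
City Clinic,98000.00,N1
"""

SAMPLE_2013 = """payee,amount_gbp,region_code,status
Acme Ltd,1500.00,N1,paid
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def column_summary(rows, column):
    """Return the minimum, maximum and number of negative values in a column."""
    values = [float(row[column]) for row in rows if row[column].strip()]
    return {
        "min": min(values),
        "max": max(values),
        "negatives": sum(1 for v in values if v < 0),
    }


def out_of_range(rows, column, low, high):
    """Return the rows whose value lies outside the documented range."""
    return [
        row
        for row in rows
        if row[column].strip() and not (low <= float(row[column]) <= high)
    ]


def header_changes(old_rows, new_rows):
    """List the column headers added or removed between two releases."""
    old, new = set(old_rows[0]), set(new_rows[0])
    return {"added": sorted(new - old), "removed": sorted(old - new)}


rows_2012 = read_rows(SAMPLE_2012)
rows_2013 = read_rows(SAMPLE_2013)

print(column_summary(rows_2012, "amount"))
print(out_of_range(rows_2012, "amount", 0, 50000))
print(header_changes(rows_2012, rows_2013))
```

A check like this only tells you where to look. Rows flagged as out of range, or columns that appear or disappear between years, are questions to take back to the publisher rather than proof of an error.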