
Spending Data Handbook

Countering arguments against publishing data

Everyone in the spending data community has stories about struggling with government officials to obtain transactional spending data in machine-readable format. Often publishers simply do not know that civil society wants data in a particular format, but there are also deliberate obstructions.

In this FAQ, we provide a list of the most typical excuses for refusing to release data in computer-friendly formats – and a set of compelling replies.

Arguments against publishing data

… in machine-readable format

"PDFs are on my computer, therefore they are machine-readable"

False. Being on your computer makes them electronic copies; it does not make them machine-readable.

PDFs are essentially a set of instructions for a printer on how to print a page. They look nice and appealing to the human eye, but to a computer, they are little more than a picture. Image files are not machine-readable for the same reason.

From the perspective of someone trying to do data work, there are better and worse PDFs. Better PDFs are machine-generated, typically something like an Excel spreadsheet or a structured Word document converted into a PDF. Often you can copy and paste information from them, although there may be formatting problems or other issues. Worse PDFs are scanned documents. Often, to add to the misery, they will be copies of faxes: smudged, speckled, tea-, water- or mould-stained, or crooked, and sometimes all of the above.
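To illustrate what "machine-readable" buys the user: a file in a format such as CSV can be turned into structured records with a few lines of standard-library code, whereas the same figures in a scanned PDF would first have to be recovered by OCR or re-typed by hand. The sketch below is a minimal Python example; the column names (`date`, `supplier`, `amount`) and the rows are made-up assumptions for illustration, not real spending data.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample rows for illustration only; not real spending data.
SAMPLE_CSV = """date,supplier,amount
2012-04-02,Acme Ltd,1250.00
2012-04-05,Example Services,830.50
2012-04-09,Acme Ltd,410.25
"""

def load_transactions(text):
    """Parse CSV text into a list of dicts, converting the amount to Decimal."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "date": row["date"],
            "supplier": row["supplier"],
            # Decimal avoids binary floating-point rounding on money values.
            "amount": Decimal(row["amount"]),
        }
        for row in reader
    ]

transactions = load_transactions(SAMPLE_CSV)
total = sum(t["amount"] for t in transactions)
print(len(transactions), total)  # 3 2490.75
```

Once the data is in this form, totals, per-supplier breakdowns and cross-checks are one line each; none of that is possible until a scanned document has been converted by hand.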

"If we publish in machine-readable and open formats, someone will alter the data and use it to discredit us"

Again, false. Someone determined to use the data will do so even if they have to extract it from documents by hand, and manual extraction is exactly where mistakes creep in. Publishing the data in a machine-readable format lets users start working with it straight away and prevents accidental alterations introduced by re-typing.

Our advice would be the following:

  • Publish both machine-readable and non-machine-readable formats. We insist on the former for analysis, but the latter can also be useful, e.g. to cross-reference numbers and to provide a form that is easy to read and share as a report.
  • Encourage users of the data to show their work. A good data project will usually:
    • Link back to the original source data
    • Link to any modified data, with an explanation of how it was changed and any underlying calculations clearly visible. When you provide such a clear audit trail, others will be able to replicate your work and check that everything was done without errors. In journalism, this is sometimes known as the "nerd box". See the Handbook chapter on cleaning data for more details on tracking data provenance.
    • Offer the data source the chance to comment on calculations from the data in order to clear up misunderstandings.
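Part of such an audit trail can be recorded in machine-readable form alongside the modified data. The Python sketch below is one possible approach, not a prescribed format: it stores a SHA-256 fingerprint of the original source file, so others can confirm they are starting from the same data, together with a plain-language list of the changes made. The URL, the sample bytes and the record layout are hypothetical.

```python
import hashlib
import json

def sha256_hex(data):
    """Return the hex SHA-256 digest of the raw source bytes."""
    return hashlib.sha256(data).hexdigest()

def build_provenance(source_url, source_bytes, steps):
    """Assemble a simple provenance record (a machine-readable 'nerd box')."""
    return {
        "source_url": source_url,
        "source_sha256": sha256_hex(source_bytes),
        "steps": list(steps),
    }

# Hypothetical source URL and file contents, for illustration only.
raw = b"date,supplier,amount\n2012-04-02,Acme Ltd,1250.00\n"
record = build_provenance(
    "https://example.org/spending/2012-04.csv",
    raw,
    [
        "Removed rows with an empty amount field",
        "Converted amounts from text to decimal numbers",
    ],
)
print(json.dumps(record, indent=2))
```

Anyone who downloads the original file can recompute the digest and compare it with `source_sha256`; a mismatch means the source changed or a different file was used.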

… in sufficient levels of detail

"We cannot release spending data as it contains personal information"

False. Public authorities holding spending data that includes personal information should not refrain from publishing it. Instead, these authorities should review the data properly and redact personal information accordingly. Workflows can be developed so that this effort is minimal.
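As an illustration of what such a workflow might automate, here is a minimal Python sketch. It assumes a hypothetical `recipient_type` column that marks payments to private individuals as `individual`; the column names and rows are invented for the example. The recipient's name is replaced while the amount is kept, so the spending itself stays visible.

```python
REDACTED = "REDACTED (private individual)"

def redact_personal_data(rows):
    """Return copies of rows with the names of private individuals replaced.

    Assumes a hypothetical 'recipient_type' field equal to 'individual'
    for natural persons; all other rows are copied unchanged.
    """
    redacted = []
    for row in rows:
        new_row = dict(row)  # copy so the original records are not modified
        if new_row.get("recipient_type") == "individual":
            new_row["recipient"] = REDACTED
        redacted.append(new_row)
    return redacted

# Hypothetical sample rows, for illustration only.
rows = [
    {"recipient": "Acme Ltd", "recipient_type": "company", "amount": 1200},
    {"recipient": "Jane Doe", "recipient_type": "individual", "amount": 300},
]
for row in redact_personal_data(rows):
    print(row["recipient"], row["amount"])
```

In practice, deciding which rows count as personal data is the hard part and needs the kind of review described in the privacy guide; code like this only applies that decision consistently.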

We see real risks of local and national governments holding back spending data with this excuse and have therefore provided a guide for public authorities on how to deal with personal information in spending data; see the privacy guide in the appendix.

Access to data from the EU farm subsidy programme is a clear example: privacy (in this case, that of farmers) was used as an argument to decide a case at the European Court of Justice, and the ruling significantly reduced access to data on farm subsidy payments.

"We cannot release spending data due to third parties' confidentiality concerns"

Public authorities should publish information about transactions between themselves and their contractors and commercial vendors. It is not uncommon, however, for either public officials or commercial contractors to attempt to block releases by citing the commercial confidentiality of the supplier (the third party).

This argument comes up most often when the request is for the contracts themselves, yet even contracts are often released in full without redactions.

"We cannot release granular data. You can get aggregated expenditures"

Not useful. Access to line-by-line transactional spending data is essential in order to ensure accountability. In order to be able to investigate suppliers and procurement practices, detailed transaction-level spending data is required.

A few countries currently release such data, with the UK, the US, Brazil and Slovenia among the leaders in this field. Even these leaders still have work to do.

We have also noticed that several countries that have decided to disclose transactional data have set fairly high disclosure thresholds. Such practices remain a serious concern and should be challenged, as large shares of public spending can be hidden below these thresholds.

Between countries, disclosure thresholds vary widely:

  • United States (federal level): USD 25,000
  • United Kingdom, National: GBP 25,000
  • United Kingdom, Councils: GBP 500 (for spending data), GBP 50,000 (for contracts)
  • Slovenia: No minimum disclosure threshold
  • Greece: No minimum disclosure threshold

Without more information on how these levels were chosen in each country, it is hard to judge whether they are reasonable.
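One way to test whether a threshold is reasonable is to ask how much spending it hides. Where a full transaction list is available (for example, from a body that publishes everything), the calculation is simple. The Python sketch below assumes that transactions strictly below the threshold go undisclosed; whether a given regime uses "below" or "at or below" is an assumption to check. The amounts are hypothetical and chosen only to show how a few large payments can coexist with many undisclosed ones.

```python
from decimal import Decimal

def hidden_below_threshold(amounts, threshold):
    """Return (count, share of total value) of transactions below the threshold.

    Assumes transactions strictly below the threshold are not disclosed.
    """
    total = sum(amounts, Decimal(0))
    hidden = [a for a in amounts if a < threshold]
    hidden_total = sum(hidden, Decimal(0))
    share = hidden_total / total if total else Decimal(0)
    return len(hidden), share

# Hypothetical amounts, for illustration only: five payments just under
# a 25,000 threshold and one large payment above it.
amounts = [Decimal(x) for x in ["24000"] * 5 + ["100000"]]
count, share = hidden_below_threshold(amounts, Decimal("25000"))
print(count, round(share * 100, 1))  # 5 54.5
```

In this made-up example, only one payment would be published, yet more than half of the money spent would stay out of view.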

Helping the government help itself

Everyone knows it's important for CSOs, journalists, and other citizen groups to work together; you won't meet much resistance to this idea. But what about when you get pushback from the government even after you've built a strong coalition to advocate for better access to data?

One answer you can give them is: "Government, darling, you're only hurting yourself." Whenever the word "transparency" is mentioned, what springs to mind is a bright light being shone into dark corners, exposing the dark secrets of corrupt bureaucrats and inefficient expenditure. But what is often overlooked is that governments also stand to benefit from more transparent publishing practices.

In this section, we begin to present the case for why it's actually in governments' interests to open up their data.

Connecting different levels of government

Simply put: when spending data is open, its circulation within the government itself also necessarily improves.

Successfully publishing open spending data means creating consistent, standardized datasets and sharing them in an accessible way. Governments themselves need this kind of high-quality and accessible data. When sub-national governments have information on national budgeting priorities, it allows them to adjust their own budgets to account for cuts or increases. Many local governments have small staffs and rely on revenue estimates and models at the federal level to estimate their own revenue. Consistent, standardized data-sharing across levels of government would allow them to share models, best practices, and software with each other, instead of custom-building everything from scratch. Publishing open data makes this sharing the default behavior.

It's a little terrifying to think that many budgeting decisions are made almost entirely in the dark. As soon as one executive budget proposal is finalised and published, work often begins on producing the next one. Within governments, those who have to draw up the next year's plan need access to information such as actual quarterly expenditures – and quickly! – in order to work out whether a government department is properly resourced from the outset or is drastically under- or overspending. Opening up spending data has the side effect of providing government officials with this fast, reliable access to the data they need.

Case study: British Columbia

When the province of British Columbia built a data portal, its motivations were primarily:

  • citizen engagement - they wanted citizens to better understand the workings of government
  • innovation - they wanted people to build applications and tools using the data
  • making handovers effective - a large share of the workforce was approaching retirement age, and those in charge wanted to make sure that the necessary information was handed over well in advance

One of the less expected outcomes of their portal was its massive uptake by civil servants themselves. In 2012, approximately one third of all the portal's traffic originated from government computers. The technology enabled faster access to relevant data within government departments, contributing to better collaboration on policies that required fiscal data. There was also an increase of about 20 percent in the number of Freedom of Information (FoI) requests, showing that releasing a small amount of data fuelled wider interest in data.
