In a laudable effort toward transparency and open data, the Office of the Third Party (the name bestowed on the Green Party in the Legislative Assembly) has released its expenses in comma-delimited ASCII format. In a blog post announcing the release, leader Peter Bevan-Baker asked for feedback, and it seems only appropriate to provide that feedback in the open as well.
The CSV file of the expenses appears to be simply a CSV export of the same spreadsheet used to generate the PDF file of the expenses, and as such it’s intended for human consumption, not machine consumption.
Given that one of the benefits offered by open data is new opportunities for data visualization and analysis, machine readability generally trumps human readability (the PDF is fine for humans).
For example, if I wanted to track the expenses over the long term myself, to support longer-term analysis, the first thing I would do is store them in a database. Storing them in a database in their current form would mean stripping out all of the titles, subtitles, subtotals and grand totals intended for human readers.
I would also need to convert the “amount” column to plain decimal numbers, removing the “C$” from each figure so that the figures could be stored as numbers rather than text.
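As a rough sketch of that cleanup, the Python below assumes a hypothetical three-column layout (date, description, amount) in which title and header rows have no numeric amount, and subtotal or total rows either have an empty date or contain the word “total” in the description. The real file’s layout may well differ, and the “total” test is only a heuristic. `Decimal` is used instead of floating point so currency values aren’t subject to binary rounding.

```python
import csv
import io
import sys
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn a human-formatted figure like 'C$1,234.56' into a Decimal.

    Returns None for cells that aren't numbers (titles, headers, blanks).
    """
    cleaned = text.strip().replace("C$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def clean_rows(raw_csv):
    """Keep only line items from a human-oriented expense export.

    Assumes a hypothetical layout of date, description, amount columns.
    """
    items = []
    for record in csv.reader(io.StringIO(raw_csv)):
        if len(record) < 3:
            continue  # blank or short line
        date, description, amount_text = (cell.strip() for cell in record[:3])
        amount = parse_amount(amount_text)
        if amount is None:
            continue  # title, subtitle or column-header row
        if not date or "total" in description.lower():
            continue  # subtotals and grand totals are derived, not data
        items.append((date, description, amount))
    return items


if __name__ == "__main__":
    # Invented sample input, shaped like a spreadsheet export.
    sample = (
        "Office of the Third Party: Expenses,,\n"
        "Date,Description,Amount\n"
        "2015-07-02,Cell phone,C$212.40\n"
        '2015-07-15,Conference registration,"C$1,050.00"\n'
        ',Subtotal,"C$1,262.40"\n'
    )
    writer = csv.writer(sys.stdout)
    writer.writerow(["date", "description", "amount"])
    for item in clean_rows(sample):
        writer.writerow(item)  # Decimal is written via str(), e.g. 212.40
```

The output is one header row followed by one row per expense, with no decorative rows and no currency symbols.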
The result would look like this (visualized in a spreadsheet):
In this more machine-friendly format, I could simply import the CSV file directly into a database with no additional effort required.
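To illustrate how little effort that import takes, here is a sketch using Python’s built-in `sqlite3` and `csv` modules. It assumes the cleaned file has a header row of `date,description,amount` (hypothetical column names) and plain decimal amounts. Amounts are stored as integer cents so that sums stay exact.

```python
import csv
import io
import sqlite3
from decimal import Decimal


def load_expenses(conn, clean_csv):
    """Load a machine-friendly expense CSV into SQLite; returns rows inserted.

    Assumes a header row of date,description,amount (hypothetical names)
    and plain decimal amounts with no currency symbols or subtotals.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS expenses ("
        " date TEXT NOT NULL,"
        " description TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )
    rows = [
        # Store whole cents so that SUM() stays exact.
        (r["date"], r["description"], int(Decimal(r["amount"]) * 100))
        for r in csv.DictReader(io.StringIO(clean_csv))
    ]
    conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    cleaned = (
        "date,description,amount\n"
        "2015-07-02,Cell phone,212.40\n"
        "2015-07-15,Conference registration,1050.00\n"
    )
    print(load_expenses(conn, cleaned), "rows loaded")
    (total,) = conn.execute("SELECT SUM(amount_cents) FROM expenses").fetchone()
    print(f"Total: ${total / 100:.2f}")
```

Because each release would arrive in the same shape, the same function could be run against every new file, building up the long-term record described above.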
Abbreviations, First Names and Context
What’s the CRRF-NAF 2015 Conference? Who is Pat? Who is Peter? What is the WI? (I know the answer, but does everyone? Will people know in 50 years?)
Using acronyms that aren’t generally understood, or only first names, means some of the meaning is lost. Spelling things out ensures everyone can understand, now and in the future.
Why did Pat go to the Cumberland Energy Symposium? And to the CRRF/NAF Conference and the Atlantic Summer Institute?
I don’t doubt that there are good reasons for all three, but it would be nice to know the context for expenses like this, even if it’s just a simple sentence like “Supporting the leader’s work on local energy independence.”
Why are the cell phone bills sometimes more than $200 a month when all-you-can-use cell phone plans are available for less than $60 a month? Maybe there’s a good reason for this; I don’t know (maybe there’s more than one phone? Maybe they need to shop around for a better plan?). Explaining expenses that might appear unusual to the lay reader can head off misunderstandings. And, again, it adds helpful context.
Many of these issues can be solved simply by elaborating on the data already there: spelling out abbreviations, adding last names and titles. In other cases it might be useful to accompany the data with a set of explanatory notes that add context (explaining conferences, cell phone plans, bank charges, etc.).
Hosting and Preservation
The expenses were released through Peter Bevan-Baker’s own website. I would prefer to find the information on the Office of the Third Party website; I would expect it to have a better chance of being archived as part of the records of the Assembly there than on Peter’s own website. It would also benefit from being exposed to the Legislative Assembly’s search feature.
In any case, ensuring there’s a plan for long-term preservation of the data is important.
This is a great step forward, and my commentary above is offered in the spirit of “yes, and…” only. I hope the other parties quickly follow suit, and that this results in a sort of “openness arms race” where each tries to best the others in the clarity, context and transparency offered to the public when reporting on the spending of public funds.