I have a longtime interest in the presentation of Prince Edward Island statutes and regulations online. I worked with the Government of PEI on its website for eight years, and getting this material online was a significant project that took many more years than it should have, mostly because of technology challenges that had nothing to do with the Internet (chiefly the word processor used to maintain the material). We also had to wait for the PDF file format to emerge as a de facto standard for distributing complex documents on the web before the project became truly feasible.
But PDFs have their limits, especially when it comes to parsing documents programmatically, so I have an interest in going “beyond the PDF” for distributing statutes and regulations. And, handily enough, I have a test case: because of my involvement with the PEI Home and School Federation, I have more than a passing interest in the School Act and its regulations, and I’m interested in ways of presenting and annotating them that would enliven the documents and bring them to a wider audience.
To begin this process, I requested a Microsoft Word-formatted copy of the School Act from Legislative Counsel’s office, which they were quick to provide. When I opened the file in OpenOffice.org, however, it was presented to me as a “Read Only” document: I couldn’t edit it, and I couldn’t see any of its formatting, so I had no way of understanding how styles were used in Word to structure it. Fortunately, this was quickly resolved by saving it as a native OpenOffice.org document (File | Save As… | ODF Text Document). Once I did this, the names of the styles in the document were revealed.
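Once the file is saved as ODF, the same style information can be read without opening OpenOffice.org at all, which is the first step toward automatic parsing. An ODF Text Document (.odt) is a ZIP archive: the body text lives in `content.xml` and the named styles in `styles.xml`. The following is a minimal sketch using only the Python standard library, assuming that layout. The function names `styled_paragraphs` and `applied_styles` are hypothetical, chosen for this sketch. It assumes, as OpenOffice.org typically does, that direct formatting produces automatic styles (`P1`, `P2`, …) whose `style:parent-style-name` points at the real named style, and that a name with a space such as “Act Title” is stored internally as `Act_20_Title` with a separate display name.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def _attr(prefix, name):
    """Build the '{uri}name' key ElementTree uses for namespaced names."""
    return "{%s}%s" % (NS[prefix], name)


def styled_paragraphs(path):
    """Yield (style display name, text) for each paragraph and heading in the body."""
    with zipfile.ZipFile(path) as odt:
        content = ET.fromstring(odt.read("content.xml"))
        styles = ET.fromstring(odt.read("styles.xml"))

    # Map internal style names (e.g. "Act_20_Title") to display names ("Act Title").
    display = {}
    for s in styles.iter(_attr("style", "style")):
        name = s.get(_attr("style", "name"))
        display[name] = s.get(_attr("style", "display-name"), name)

    # Map automatic styles (P1, P2, ...) back to the named style they are based on.
    parent = {}
    for s in content.iterfind("office:automatic-styles/style:style", NS):
        base = s.get(_attr("style", "parent-style-name"))
        if base:
            parent[s.get(_attr("style", "name"))] = base

    body = content.find("office:body/office:text", NS)
    for el in body.iter():
        if el.tag in (_attr("text", "p"), _attr("text", "h")):
            name = el.get(_attr("text", "style-name"), "")
            name = parent.get(name, name)
            # itertext() joins nested spans; special elements like <text:s/> are dropped.
            yield display.get(name, name), "".join(el.itertext()).strip()


def applied_styles(path):
    """Count how many paragraphs use each named style, like the "Applied Styles" view."""
    return Counter(style for style, _ in styled_paragraphs(path))
```

Calling `applied_styles("school-act.odt")` (a hypothetical filename) would return a `Counter` keyed by display names such as “Act Title” and “Definitions”, which makes any stray or inconsistent style easy to spot.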
For example, the definitions are all assigned the “Definitions” style.
With the School Act open, the “Format | Styles and Formatting” tool in OpenOffice.org lists the following under “Applied Styles”:
- Act Title
Rearranging that list so that it reflects the hierarchy of the School Act transforms it to:
The only inconsistency in the document appears to be the use of the “Topic1” style for “PART I” at the beginning of the Act, which should, I think, be assigned the “Part” style. Otherwise, the styling appears consistent enough to allow automatic parsing of the document, which will be my next step.
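As a hedged sketch of what that parsing step might look like: given the paragraphs as (style, text) pairs in document order, a stack can nest them by heading level. Only “Act Title”, “Part”, “Topic1” and “Definitions” are style names taken from the document; the names `Node`, `normalize`, `build_tree` and `outline`, and the depth mapping passed to `build_tree`, are hypothetical, and a real mapping would come from the rearranged hierarchy of styles. The `normalize` step patches the one inconsistency noted above by treating a “Topic1” paragraph that begins with “PART ” as a “Part”.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    style: str
    text: str
    children: list = field(default_factory=list)


def normalize(style, text):
    """Treat "PART I" styled as Topic1 as if it had been styled Part."""
    if style == "Topic1" and text.upper().startswith("PART "):
        return "Part"
    return style


def build_tree(paragraphs, levels):
    """Nest (style, text) pairs into a tree.

    `levels` maps heading-like styles to a depth (0 = outermost). A paragraph
    whose style is not in `levels` becomes a leaf under the most recent heading.
    """
    root = Node("Document", "")
    stack = [(-1, root)]  # (depth, node) from the root down to the open heading
    for style, text in paragraphs:
        style = normalize(style, text)
        node = Node(style, text)
        depth = levels.get(style)
        if depth is None:
            stack[-1][1].children.append(node)
            continue
        # A new heading closes every open heading at its own depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root


def outline(node, indent=0):
    """Print the tree, one indented line per node."""
    for child in node.children:
        print("  " * indent + f"[{child.style}] {child.text}")
        outline(child, indent + 1)


if __name__ == "__main__":
    sample = [
        ("Act Title", "SCHOOL ACT"),
        ("Topic1", "PART I"),
        ("Definitions", "(sample definition text)"),
        ("Part", "PART II"),
    ]
    outline(build_tree(sample, {"Act Title": 0, "Part": 1}))
```

A stack fits this job because each heading closes every open heading at its own level or deeper, which is how the nesting of Parts and their subordinate headings works in a printed Act.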