Parsing Prince Edward Island Legislation: Understanding Styles

I have a longtime interest in the presentation of Prince Edward Island statutes and regulations online. I worked with the Government of PEI on its website for 8 years, and getting this material online was a significant project that took many more years than it should have, mostly because of technology challenges that had nothing to do with the Internet, chief among them the word processor used to maintain the material. We also had to wait for PDF to emerge as a de facto standard for distributing complex documents on the web before it was really feasible.

But there’s a limit to PDFs, especially when it comes to programmatic parsing of documents, and so I have an interest in “beyond the PDF” for distributing statutes and regulations. And, handily enough, I have a test case to use: because of my involvement with the PEI Home and School Federation I have more than just a passing interest in the School Act and its regulations, and I’m interested in ways of presenting it and annotating it that would enliven the document and spread it to a wider audience.

To begin this process I requested a Microsoft Word-formatted copy of the School Act from Legislative Counsel’s office, which they were quick to provide. When I opened this file in OpenOffice.org, however, it was presented to me as a “Read Only” document: I couldn’t edit it and I couldn’t see any of its formatting, so I couldn’t understand how styles were used in Word to structure it. Fortunately this was quickly resolved by saving it as a native OpenOffice.org document (File | Save As… | ODF Text Document). Once I did this, the names of the styles in the document were revealed.

So, for example, the definitions are all assigned the “Definition” style:

[Screenshot: School Act in OpenOffice.org]

Looking in the “Format | Styles and Formatting” tool of OpenOffice.org with the School Act open, the styles listed under “Applied Styles” are as follows:

  • Act Title
  • AmendingSubsection
  • CenteredText
  • Chapter
  • Clause
  • ClauseCont
  • Default
  • Definition
  • DefSidenote
  • Footer
  • Header
  • Part
  • SecSubCont
  • SecSubSidenote
  • Section
  • Subclause
  • Subsection
  • Topic1
  • Topic2
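The same list could probably be pulled out of the saved file without opening OpenOffice.org at all. What follows is only a rough sketch, not something I have run against the School Act itself. It assumes that the ODF file is an ordinary zip archive with a content.xml inside (which is how ODF text documents are packaged), that each paragraph carries a text:style-name attribute, and that automatically generated styles (names like P1) point back to a named style through style:parent-style-name. The function names are my own invention, and style names that contain spaces may show up in an encoded form.

```python
import xml.etree.ElementTree as ET
import zipfile

# Standard ODF namespace URIs for the text: and style: prefixes.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def applied_paragraph_styles(content_xml):
    """Return a sorted list of the paragraph style names used in content.xml."""
    root = ET.fromstring(content_xml)

    # Map automatic styles (P1, P2, ...) to the named style they derive from.
    parents = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        name = style.get(f"{{{STYLE_NS}}}name")
        parent = style.get(f"{{{STYLE_NS}}}parent-style-name")
        if name and parent:
            parents[name] = parent

    # Collect the style of every paragraph (text:p) and heading (text:h).
    used = set()
    for tag in ("p", "h"):
        for para in root.iter(f"{{{TEXT_NS}}}{tag}"):
            name = para.get(f"{{{TEXT_NS}}}style-name")
            if name:
                used.add(parents.get(name, name))
    return sorted(used)


def styles_in_odt(path):
    """Hypothetical helper: read content.xml out of an .odt file on disk."""
    with zipfile.ZipFile(path) as odt:
        return applied_paragraph_styles(odt.read("content.xml"))
```

Calling styles_in_odt() with the path to the saved document should, if those assumptions hold, return something close to the list above.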

Rearranging that list so that it reflects the hierarchy of the School Act transforms it to:

Chapter
ActTitle
  Topic1
    Section
      Definition
  Part
    Topic2
      Section
        Subsection
    Topic2
      Section
        Subsection
          Clause
        SecSubCont

The only inconsistency in the document appears to be the use of the “Topic1” style for “PART I” at the beginning of the Act, which should, I think, be assigned the “Part” style. Otherwise the styling appears consistent enough to allow for automatic parsing of the document, which will be my next step.
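To give a sense of what that parsing might look like, here is a minimal sketch that turns a flat sequence of (style, text) paragraphs into a nested tree. The depth assigned to each style is my own reading of the hierarchy listed earlier, with “Topic1” treated as equivalent to “Part” and “Subclause” assumed to sit below “Clause”. Styles left out of the table (headers, footers, sidenotes) are simply skipped. The names DEPTH, build_tree and outline are made up for the illustration.

```python
# Assumed nesting depth for each structural style; smaller means closer to the top.
DEPTH = {
    "Chapter": 0, "ActTitle": 0,
    "Part": 1, "Topic1": 1,
    "Topic2": 2,
    "Section": 3,
    "Subsection": 4, "Definition": 4, "SecSubCont": 4,
    "Clause": 5,
    "Subclause": 6,
}


def build_tree(paragraphs):
    """Nest (style, text) pairs into a tree using the DEPTH table."""
    root = {"style": "Document", "text": "", "children": []}
    stack = [(-1, root)]  # (depth, node) pairs for the currently open branch
    for style, text in paragraphs:
        depth = DEPTH.get(style)
        if depth is None:
            continue  # not a structural style, e.g. Header or Footer
        node = {"style": style, "text": text, "children": []}
        # Close every open node that is at the same depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


def outline(node, indent=0):
    """Return the tree as a list of indented style names."""
    lines = []
    for child in node["children"]:
        lines.append("  " * indent + child["style"])
        lines.extend(outline(child, indent + 1))
    return lines
```

Fed a list of paragraphs in document order, build_tree() returns a root node whose children are the top-level elements, and outline() gives a quick way to compare the result against the hierarchy above by eye.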
