Following from yesterday’s resuscitation of the Charlottetown Building Permits RSS feed, I decided that it was finally time to get around to seeing if there was enough data locked inside the City’s PDF files to create a map of building permit approvals. It turned out to be not that difficult to do using some open source wrangling. Here’s what I did.
The goal was to take the 219 PDF files I was able to scrape from the City’s Building Permit Approval page and to pull out enough information about each approval to be able to geocode it. I did this using the excellent pdftotext utility, part of the open source Xpdf package. Doing this:
pdftotext -raw Weekly_approvals_webpage_21_Oct_2011.pdf Weekly_approvals_webpage_21_Oct_2011.txt
produces a plain ASCII text file that looks like this:
10-533 335067 402-bld-10 20-Oct-10 3-Oct-11 18-22 Water Street...
10-569 363556 439-BLD-10 17-Nov-10 3-Oct-11 20 Lapthorne Avenue...
11-002 1018274 001-bld-11 4-Jan-11 6-Oct-11 375 Mount Edward Road...
11-136 342436 326-bld-11 26-Aug-11 3-Oct-11 134 Kent Street...
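With 219 files to convert, running pdftotext by hand gets tedious. Here is a minimal Python sketch of one way to loop over a folder of PDFs; it is not the script I actually used. It assumes pdftotext is on the PATH, and the helper names are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path):
    """Build the pdftotext call for one PDF, writing a .txt file beside it."""
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")
    # -raw keeps the text in content-stream order, as in the command shown earlier
    return ["pdftotext", "-raw", str(pdf_path), str(txt_path)]


def convert_all(directory):
    """Run pdftotext over every PDF found in the given directory."""
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        # check=True raises an error if pdftotext fails on a file
        subprocess.run(pdftotext_command(pdf_path), check=True)
```

Calling convert_all("permits"), where "permits" is a hypothetical folder holding the scraped PDFs, would leave one text file next to each PDF.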
From those files, because the Provincial Property Identification Number — the PID — is always a 6 or 7 digit number, and because such numbers rarely, if ever, appear elsewhere in the files, I was able to pull out the PID for every approval using some PHP:
preg_match('/\d{6,7}/',$line,$matches)
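For readers who don't use PHP, the same idea can be sketched in Python. This is an illustrative equivalent rather than the code I ran, and the function name is hypothetical. The sample lines are the ones shown earlier, with the trailing "..." left off.

```python
import re

# Same pattern as the PHP call: the first run of 6 or 7 digits on the line
PID_PATTERN = re.compile(r"\d{6,7}")


def extract_pid(line):
    """Return the first 6- or 7-digit number in the line, or None if there is none."""
    match = PID_PATTERN.search(line)
    return match.group(0) if match else None


lines = [
    "10-533 335067 402-bld-10 20-Oct-10 3-Oct-11 18-22 Water Street",
    "11-002 1018274 001-bld-11 4-Jan-11 6-Oct-11 375 Mount Edward Road",
]
pids = [extract_pid(line) for line in lines]
print(pids)  # ['335067', '1018274']
```

The shorter runs of digits in the permit numbers and dates are skipped because none of them reaches six digits in a row.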
From there I looked up each PID in the freely-available Provincial Civic Address data, leaving me with a CSV file like this:
-63.12688,46.23066,22 WATER ST,"10-533 335067 402-bld-10 20-Oct-10...
-63.12606,46.24454,20 LAPTHORN AV,"10-569 363556 439-BLD-10 17-Nov-10...
-63.14558,46.27834,375 MOUNT EDWARD RD,"11-002 1018274 001-bld-11 4-Jan-11...
-63.12808,46.23572,134 KENT ST,"11-136 342436 326-bld-11 26-Aug-11 3-Oct-11...
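The lookup itself is a simple join on the PID. The sketch below shows the general shape in Python. It assumes the civic address data has been exported to a CSV with columns named PID, LONGITUDE, LATITUDE and ADDRESS. Those column names are my invention for the example, and the real file may well be laid out differently. The two sample rows reuse values from the output above.

```python
import csv
import io

# Hypothetical extract of the civic address data; real column names may differ
CIVIC_ADDRESS_CSV = """PID,LONGITUDE,LATITUDE,ADDRESS
335067,-63.12688,46.23066,22 WATER ST
342436,-63.12808,46.23572,134 KENT ST
"""


def load_address_index(csv_file):
    """Map each PID to a (longitude, latitude, address) tuple."""
    index = {}
    for row in csv.DictReader(csv_file):
        # Keep the first address seen for a PID; a property can have several
        index.setdefault(
            row["PID"], (row["LONGITUDE"], row["LATITUDE"], row["ADDRESS"])
        )
    return index


def geocode(permit_line, pid, index):
    """Return a CSV-ready row for one permit, or None when the PID is unknown."""
    hit = index.get(pid)
    if hit is None:
        return None
    longitude, latitude, address = hit
    return [longitude, latitude, address, permit_line]


index = load_address_index(io.StringIO(CIVIC_ADDRESS_CSV))
row = geocode(
    "10-533 335067 402-bld-10 20-Oct-10 3-Oct-11 18-22 Water Street",
    "335067",
    index,
)
```

Permits whose PID isn't in the address data come back as None, so they can be counted or logged rather than silently dropped.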
This CSV contains geocoded records of the 1,985 building permits I was able to scrape out of the PDF files. Finally, I used the open source KMLCSV Converter app to convert the CSV file into a mappable KML file, and from there it was simply a matter of doing any of:
- Feeding the KML file to Google Maps
- Adapting my PEI Schools Map to show the Building Permits on a CloudMade-driven OpenStreetMap map
- Opening the KML file in Google Earth
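I used a ready-made converter for the CSV-to-KML step, but the KML needed for simple point markers is small enough to generate directly. Here is a rough Python sketch that builds one Placemark per row. The function name and the choice of address as the marker name are assumptions for illustration, not what the converter app does.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows):
    """Build a KML string from (longitude, latitude, address, permit) rows."""
    # Emit the KML namespace as the default namespace instead of an ns0: prefix
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for longitude, latitude, address, permit in rows:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = address
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = permit
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML wants longitude first, then latitude
        coordinates = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coordinates.text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


kml_text = rows_to_kml(
    [("-63.12688", "46.23066", "22 WATER ST", "10-533 335067 402-bld-10")]
)
```

ElementTree escapes any special characters in the text, so addresses or permit descriptions containing an ampersand won't break the file.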
I continue to hope that the City of Charlottetown will eventually release building permit data in an open format so that all the scripery-scrapery required to do this can be eliminated and we can all concentrate on doing interesting things with the data rather than on getting the data in the first place.
Comments
Amazing! Has this data been updated for 2012 by any chance?
Thank you for being so awesome!
Tom
The data hasn’t been updated since I first posted, no.