I’ve spent a good part of the day fielding calls from reporters about OpenCorporations.org and the changes that have led me to plan its demise. So far I’ve done phone interviews with The Guardian, the Globe and Mail and CBC Radio, and I just finished taping a TV interview for Compass. Some points I’ve tried to make in those conversations:
- Search engines (like Google and Yahoo) index the pages on the Internet by having “robots” that visit all of the pages on the Internet: so Google can tell me where to go for pink ice cream cake because its robot visited this page and this page and this page — and 296,000 others — and found the words “pink ice cream cake” on them and added these words to its index.
- OpenCorporations.org has its own “search robot” that does exactly the same thing, albeit with the robot trained only on the Corporate Register pages.
- There was no robots.txt limitation on the Government of PEI webserver that prevented search engines from indexing the Corporate Register. There still isn’t. That’s where I took my “it’s okay if you index this content” cue.
- For years Google, and other search spiders, have been indexing PEI corporations data: here’s a Google search for ‘Homburg’ that shows this in action. So the changes not only shut off the tap for OpenCorporations, but also for the rest of the web.
- It’s completely within Government’s right to control the indexing of resources on their website and, even if it were possible, I wouldn’t try to circumvent the restrictions they’ve put in place, which clearly telegraph a “don’t index this” intent.
- Freedom of Information and Protection of Privacy legislation was not written to anticipate mashups like OpenCorporations: it’s an open question as to whether Government has a duty to anticipate and act against potential remixing. I think the project was valuable if only for the focus it put on issues like this.
- I don’t think that Government acted with the aim of hiding anything, or preventing the lid from blowing off anything: I think they were forced to make an impromptu policy decision based on sudden focus on an unanticipated use of new technology; I happen to think they made the wrong policy decision, but I think their motives are pure.
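To make the robot and robots.txt points above concrete, here is a minimal sketch in Python of how a well-behaved search robot consults robots.txt rules before adding a page to its index. Everything in it is an assumption for illustration: the `ExampleBot` agent name, the `example.gov` URLs, the `/register/` path and the helper functions are hypothetical, not the actual OpenCorporations code or the Government of PEI's configuration. It only shows the robots.txt convention, not the restrictions that were actually put in place.

```python
from collections import defaultdict
from urllib.robotparser import RobotFileParser


def make_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def build_index(pages, rules, agent="ExampleBot"):
    """Map each word to the set of URLs it appears on, skipping disallowed pages."""
    index = defaultdict(set)
    for url, text in pages.items():
        if not rules.can_fetch(agent, url):
            continue  # robots.txt says "don't index this", so the robot moves on
        for word in text.lower().split():
            index[word].add(url)
    return index


# Hypothetical pages standing in for what a robot would have downloaded.
PAGES = {
    "http://example.gov/register/1": "Homburg Holdings",
    "http://example.gov/news/1": "pink ice cream cake",
}

# No rules at all: every page may be indexed.
open_rules = make_rules("")

# A rule asking all robots to stay out of /register/.
closed_rules = make_rules("User-agent: *\nDisallow: /register/")

open_index = build_index(PAGES, open_rules)
closed_index = build_index(PAGES, closed_rules)
```

With no rules, a search for "homburg" in `open_index` finds the register page; with the `Disallow` rule in place, the same robot leaves that page out of `closed_index` while still indexing everything else.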
If nothing else, I’ve discovered through this experience that if you create a tool that’s especially useful for journalists, they will be especially interested if you have to shut it down. I happen to think that’s a good thing, and I’ve been generally impressed with the journalistic understanding of the subtleties of the story.
Comments
It’s a little sad that the new “you better not be a robot” prevention mechanism makes the Corporate Register website harder for the vision impaired to get the information.
“but I think their motives are pure.”
Wow, are you naive. They don’t want the heat any more than Richard Brown wanted to interrogate civil servants during the Polar hearings but now wants to protect them.
Good story on Compass. The thought occurred to me: why don’t you leave your site up?
People investigating the PNP scandal are interested in the period you already have. It would be a great service.
It is also worth pointing out that the technical challenge of “breaking” this captcha is trivial. The government would have been much more reasonable to simply create a Terms of Use that did not allow scraping of the site content.
It amazes me that Provincial Government data is treated so recklessly and that so little thought is put into how to handle and distribute it. Instead of considering the options appropriately, the response was just “put on the locks!” with no consideration of accessibility issues or other legitimate uses.
I am sure that captcha must be breaking some Canadian accessibility regulations. Even if it does not, here are some tools that can help visually impaired people to bypass it.
- For Firefox users, here is a Greasemonkey script: http://tnt.goldnet.ca/opencorp…
- For others, here is a simple bookmarklet: http://tnt.goldnet.ca/opencorp…
It seems to work 9 out of 10 times. I did not bother too much with making it perfect; it is just a proof of concept, and someone else might care to make it better. In that case, here is the PHP code it uses: http://tnt.goldnet.ca/opencorp…
For now this is on my testing server and I might move it or remove it as I please so if you really want to use it, make Peter host it himself :)
This episode in governmental information management has proved important. What still plagues me is why those who controlled access saw it as urgent. What did they fear would happen if they left it running while they came to a reasoned position?
My fear is that the default position of most government organizations on PEI is ‘We must protect you, even from yourselves.’ So, of course, shut it down and then see if it is safe. Rather than default to free information until we can come up with even a decent rationalization about why it should be shut down.
These are the same ideas that lead to regulations in gas, wages and rents. Though I am pleasantly surprised about the cell phone ban being put on hold.