Reverse Engineering the CBC Storm Centre

The CBC Storm Centre is the de facto place to look online for what’s delayed, closed and cancelled during severe weather on Prince Edward Island.

I was curious about what’s under the hood of the web app: it appears to load its data from a remote source when you open the page, which suggests there’s some sort of open data lurking in the shadows that one might leverage for other purposes.

Opening the Firefox Developer Tools while the Storm Centre is loading, the first thing you notice is an AJAX request for a JSON resource that describes a Google Spreadsheet:

Screen shot of Firefox Developer Tools showing AJAX loading of a Google Spreadsheet as part of the Storm Centre page load.

Looking at the contents of that file, I find a link to this public Google Spreadsheet, which contains all of the data related to closures for the Storm Centre. For example, there’s a row showing “Private Institute of Hair Design and Aesthetics” is closed:

Screen shot of a single row from the Google Spreadsheet.

And, sure enough, that’s what’s rendered into the Storm Centre:

Screen shot of CBC Storm Centre showing the same closed notice, rendered

Also in that original JSON resource there’s a link to an XML version of the same data, with one entry element per closure:

  <entry>
    <id>https://spreadsheets.google.com/feeds/list/16JACsgpXZkzpQkyzDkzzZxehhtvnnfjbyGY7CDyeq3o/od6/public/basic/dcgjs</id>
    <updated>2017-02-17T17:05:50.230Z</updated>
    <category scheme="http://schemas.google.com/spreadsheets/2006" term="http://schemas.google.com/spreadsheets/2006#list"/>
    <title type="text">Schools</title>
    <content type="text">closurestatus: Closed, name: Private Institute of Hair Design and Aesthetics</content>
    <link href="https://spreadsheets.google.com/feeds/list/16JACsgpXZkzpQkyzDkzzZxehhtvnnfjbyGY7CDyeq3o/od6/public/basic/dcgjs" rel="self" type="application/atom+xml"/>
  </entry>

That XML data has enough structure that it’s possible to use it for alternate renderings of the storm closure data.
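As a rough sketch of what that structure offers, here is how the per-entry fields could be pulled out with SimpleXML. The sample is trimmed from the entry shown above; the `<feed>` wrapper and the `category` label for the title are my assumptions, not something confirmed by the feed itself.

```php
<?php
// Trimmed sample built from the entry shown above. The <feed> wrapper is an
// assumption standing in for the rest of the real document.
$sample = <<<XML
<feed>
  <entry>
    <updated>2017-02-17T17:05:50.230Z</updated>
    <title type="text">Schools</title>
    <content type="text">closurestatus: Closed, name: Private Institute of Hair Design and Aesthetics</content>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($sample);
$rows = array();
foreach ($feed->entry as $entry) {
  // Cast each SimpleXMLElement to a plain string before storing it.
  $rows[] = array(
    'category' => (string) $entry->title,   // assumed to be the sheet's category
    'updated'  => (string) $entry->updated,
    'content'  => (string) $entry->content, // still flattened text at this point
  );
}
print_r($rows);
```

The title and the updated timestamp come through cleanly as their own elements; only the content element needs further work.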

For example, here’s some hacky PHP that reads the XML and transforms it into a simple HTML file:

<?php

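// Fetch the public list feed and collect the entries with an active status.
$closures = array();
$xml = simplexml_load_file('https://spreadsheets.google.com/feeds/list/16JACsgpXZkzpQkyzDkzzZxehhtvnnfjbyGY7CDyeq3o/od6/public/basic');
if ($xml === false) {
  die('Could not load the closures feed.');
}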
foreach($xml->entry as $key => $entry) {
  if ((strpos($entry->content, "closurestatus: Closed") !== false) or
     (strpos($entry->content, "closurestatus: Delay") !== false) or
     (strpos($entry->content, "closurestatus: Cancelled") !== false)) {
    $closures[] = parseEntry($entry->content);
  }
}
array_multisort($closures);
$oldstatus = '';
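foreach($closures as $key => $c) {
  // Start a new section whenever the status changes; escape text for HTML.
  if ($c['status'] != $oldstatus) {
    print "<h1>" . htmlspecialchars($c['status']) . "</h1>";
  }
  print "<h2>" . htmlspecialchars($c['name']) . "</h2>";
  print "<p>" . htmlspecialchars($c['notes']) . "</p>";
  $oldstatus = $c['status'];
}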

function parseEntry($content) {
  $elements = array();
  preg_match('/closurestatus: (.*), name:/', $content, $matches);
  $elements['status'] = $matches[1];
  if (!preg_match('/name: (.*), closurenotes/', $content, $matches)) {
    preg_match('/name: (.*)$/', $content, $matches);
  }
  $elements['name'] = $matches[1];
  // Notes are optional, so fall back to an empty string when there are none.
  $elements['notes'] = preg_match('/closurenotes: (.*)/', $content, $matches) ? $matches[1] : '';
  $elements['notes'] = preg_replace("/, configlabel:.*$/", '', $elements['notes']);
  $elements['notes'] = preg_replace("/, configvalue:.*$/", '', $elements['notes']);
  $elements['name'] = preg_replace("/, configlabel:.*$/", '', $elements['name']);
  $elements['name'] = preg_replace("/, configvalue:.*$/", '', $elements['name']);
  $elements['notes'] = preg_replace("/, configinstructions:.*$/", '', $elements['notes']);
  $elements['name'] = preg_replace("/, configinstructions:.*$/", '', $elements['name']);
  return $elements;
}

The result looks like this, in part:

Cancelled

Chances Drop In Play in Stratford

Food Safety Course scheduled for Charlottetown today

Will be rescheduled at a later date

Closed

Chances Family Centre Programs (in schools)

French Language School Board

Delay

ACOA office in Ch’town

Delaying opening until 10:30, further announcement by 9

And so on. The PHP is so hacky because the “content” for each closure isn’t structured data within the XML; it’s just plain text:

<content type="text">closurestatus: Closed, name: Chances Family Centre Programs (in schools) </content>

And so some parsing is required.
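One way that parsing might be made a little less hacky is to split the text on the column keys themselves. This is only a sketch: `parseContentFields` is a hypothetical helper, the key list is limited to the keys that appear in this post, and the second sample string is reconstructed from the rendered output above rather than copied from the feed.

```php
<?php
// Hypothetical helper (not part of the original script): split the flattened
// "key: value, key: value" text into an associative array.
function parseContentFields(string $content): array {
  // Keys seen in this post; any other key in the live feed would need adding.
  $keys = 'closurestatus|name|closurenotes|configlabel|configvalue|configinstructions';
  // Split only on a ", " that is immediately followed by a known key and ": ",
  // so commas inside a value are left alone.
  $parts = preg_split('/, (?=(?:' . $keys . '): )/', trim($content));
  $fields = array();
  foreach ($parts as $part) {
    $pair = explode(': ', $part, 2);
    if (count($pair) === 2) {
      $fields[$pair[0]] = trim($pair[1]);
    }
  }
  return $fields;
}

// Content string taken from the entry quoted above.
$closed = parseContentFields('closurestatus: Closed, name: Chances Family Centre Programs (in schools) ');

// Assumed shape, reconstructed from the rendered "Delay" output above.
$delay = parseContentFields('closurestatus: Delay, name: ACOA office in Ch’town, closurenotes: Delaying opening until 10:30, further announcement by 9');

print_r($closed);
print_r($delay);
```

A value that itself contained something like ", name: " would still be split in the wrong place, so this remains a best-effort approach to plain text.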

But it’s a start.

Comments

Bob Shand on February 17, 2017 - 20:47

Doesn't the open Google sheet seem ripe for abuse? Or am I missing something?

Peter Rukavina on February 17, 2017 - 21:30

It’s a read-only spreadsheet for the public.