Although I was deeply involved in the original project to broadcast audio of the Legislative Assembly of Prince Edward Island online (to the point where I was working with Island Tel technicians to run a 2-wire copper circuit from Province House to the Sullivan Building), I was long gone from the project by the time video broadcast was started, so I’ve no secret insider knowledge of how it all works.
But I’m naturally curious, so here goes.
From the main Video Archives page, when you click on the link for a specific day, you end up loading the same page, but with three parameters. Here are the parameters for April 21, 2016, for example:
file=20160421
number=2
year=2016
The file parameter is self-evident: it’s the year, the month, and the day as YYYYMMDD.
The year also explains itself: it’s YYYY.
The number parameter appears to be either 1 or 2, depending on whether there was just a daytime session (1) or both a daytime and an evening session (2) on the given date.
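As a quick sketch of how those three pieces fit together, here's the April 21, 2016 query string assembled in bash (the number value still has to be chosen by hand, depending on how many sittings there were that day):
DAY=20160421                  # the file parameter, YYYYMMDD
YEAR=${DAY:0:4}               # the year parameter is just the first four characters
NUMBER=2                      # 1 for a single sitting, 2 for daytime and evening sittings
echo "file=${DAY}&number=${NUMBER}&year=${YEAR}"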
On this page, there’s an instance of JWPlayer that has either one or two playlists referenced, like:
http://198.167.125.144:1935/leg/mp4:20160420A.mp4/playlist.m3u8
for a Wednesday, where the House only sits once, and:
http://198.167.125.144:1935/leg/mp4:20160421A.mp4/playlist.m3u8
http://198.167.125.144:1935/leg/mp4:20160421B.mp4/playlist.m3u8
for a Thursday, where it sits in the afternoon and the evening.
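So the playlist URLs can be generated mechanically from the date and a sitting letter; a minimal sketch:
DAY=20160421
for SITTING in A B; do        # drop the B for days with a single sitting
    echo "http://198.167.125.144:1935/leg/mp4:${DAY}${SITTING}.mp4/playlist.m3u8"
done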
These M3U8 files are playlists that reference another M3U8 file; inside, they look like this (for April 20, 2016):
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=1371939,CODECS="avc1.77.31,mp4a.40.2",RESOLUTION=640x480
chunklist_w1709954322.m3u8
That last line is the filename of another M3U8 file that contains the filenames of the actual “chunks” of the video, each roughly 10 seconds long:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:12
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:11.433,
media_w1709954322_0.ts
#EXTINF:10.167,
media_w1709954322_1.ts
#EXTINF:10.166,
media_w1709954322_2.ts
#EXTINF:10.167,
media_w1709954322_3.ts
#EXTINF:10.166,
media_w1709954322_4.ts
...
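Putting those two levels together, a couple of curl calls get you from a sitting to its list of chunk filenames; roughly:
BASE="http://198.167.125.144:1935/leg/mp4:20160420A.mp4"
CHUNKLIST=$(curl -sS "${BASE}/playlist.m3u8" | tail -1)   # e.g. chunklist_w1709954322.m3u8
curl -sS "${BASE}/${CHUNKLIST}"                           # lists the media_..._N.ts chunk filenames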
This means that you can grab video for any 10-second chunk from a URL like this:
http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_0.ts
where 20160420 is the YYYYMMDD, followed by an A (first sitting of the day) or a B (second sitting of the day), followed by a filename with a number at the end that increments for each 10-second chunk.
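In other words, a chunk URL can be assembled from four pieces, something like this (the w1709954322 identifier comes from the chunklist filename above):
DAY=20160420
SITTING=A                     # A = first sitting of the day, B = second
UNIQUEID=w1709954322          # taken from this sitting's chunklist filename
CHUNK=0                       # increments once per roughly 10 seconds of video
echo "http://198.167.125.144:1935/leg/mp4:${DAY}${SITTING}.mp4/media_${UNIQUEID}_${CHUNK}.ts"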
So, for example, if I want to get 10 seconds of video from April 20, 2016 starting 30 minutes into the sitting, I would calculate that 30 minutes contains 180 10-second chunks of video; since the chunks are numbered starting from zero, the video should be at:
http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_179.ts
And, sure enough, I can grab that video using FFmpeg:
ffmpeg -i http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_179.ts 20160420.ts
And if I want to grab a minute of video from that point I can concatenate six chunks together (MPEG transport streams are nice inasmuch as you can freely join them together like this and everything continues to work):
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_179.ts >> all.ts
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_180.ts >> all.ts
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_181.ts >> all.ts
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_182.ts >> all.ts
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_183.ts >> all.ts
curl -sS http://198.167.125.144:1935/leg/mp4:20160420A.mp4/media_w1709954322_184.ts >> all.ts
ffmpeg -i all.ts 20160420A-30-minutes-in-1-minute.mp4
This would give me an MP4 file containing one minute of video, the concatenation of 6 chunks of 10 seconds each.
To generalize this, just using BASH, to pull video starting at a given time for a given duration from a given date, I can do this:
#!/bin/bash
# Usage: ./get-video.sh DATESTAMP START-MINUTE DURATION-MINUTES
# e.g.:  ./get-video.sh 20160414A 30 2
DATESTAMP=$1
STARTMINUTE=$2
MINUTES=$3

# Fetch the master playlist and pull the unique "w..." ID out of the chunklist filename.
curl -sS http://198.167.125.144:1935/leg/mp4:${DATESTAMP}.mp4/playlist.m3u8 > /tmp/playlist.m3u8
IFS=_ array=(`tail -1 /tmp/playlist.m3u8`)
IFS=. array=(${array[1]})
UNIQUEID="${array[0]}"

# Six 10-second chunks per minute; the chunks are numbered from zero.
START=$(($STARTMINUTE * 6 - 1))
DURATION=$(($MINUTES * 6))
END=$(($START + $DURATION))

echo "Getting video for ${DATESTAMP}"
rm -f /tmp/concatenated-video.ts

# Append each chunk, in order, to a single transport stream.
while [ ${START} -lt ${END} ]; do
    echo "Getting chunk ${START}"
    curl -sS "http://198.167.125.144:1935/leg/mp4:${DATESTAMP}.mp4/media_${UNIQUEID}_${START}.ts" >> /tmp/concatenated-video.ts
    let START=START+1
done

# Convert the concatenated transport stream to an MP4.
echo "Got video; converting..."
ffmpeg -loglevel panic \
    -i /tmp/concatenated-video.ts \
    ${DATESTAMP}-${STARTMINUTE}-${MINUTES}.mp4
I save that as a BASH script called get-video.sh and then make it executable and run it with three parameters:
chmod +x get-video.sh
./get-video.sh 20160414A 30 2
This will grab 2 minutes of video from the afternoon session on April 14, 2016 starting 30 minutes in. Uploading this to YouTube results in this video:
One can imagine that, understanding all this, it should now be trivial to do all sorts of remixing, indexing, visualizing and other useful hijinks with the video archive.
Comments
Alas, half of the earlier videos from previous years appear to be missing. At one point I planned to write a script to work out which ones... I'm not sure if there is a handy guide to when each session sat.
You could probably parse each sitting on the Legislative Assembly Journal (http://www.assembly.pe.ca/journal/index.php) to get the days when they sat.
"CLASS=onedayplain" contains the links to the PDF journals, which follow the naming convention "YYYY-MM-DD-journal.pdf", so you could extract the days they sat from that.
Note that the IP address for the Legislative Assembly video server has changed from 198.167.125.144 to 207.34.29.52, so the code above should be updated accordingly.
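If you're using the get-video.sh script above, a one-line substitution will point it at the new address (GNU sed shown; BSD/macOS sed wants -i ''):
sed -i 's/198\.167\.125\.144/207\.34\.29\.52/g' get-video.sh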