After seeing 8 years of our Netflix streaming activity day-by-day on a calendar, I got curious about what time of day we most often stream, so I used the same raw data I used yesterday to find out.
There’s a date column in the Netflix streaming activity export that looks like this:
1526177708874 1526169219659 1525735831181 1525130481740 1525043053677 1525040614148 1524960752757 1523330480349 1523145136457 1523142542204 1523142413550 1523064942413 1523058300622
That doesn’t look much like a human-readable date, mind you: that’s because it’s a numeric representation of a date, a timestamp that represents numerically the time since January 1, 1970.
Take 1526177708874, for example: if I divide that by 1,000 I get 1526177708, and if I feed that value to DuckDuckGo, which has a handy “convert a Unix timestamp to a date” featurelette, I get Sun May 13 02:15:08 2018 UTC. Which is, indeed, the time I watched the Catalina episode of the TV show Love.
What I really need for my purposes here isn’t specific dates, though: I want the hour of the day. So I paste that date column into a text file, and then load it into a spreadsheet and convert each value to a date with the formula:
=(A2/1000)/86400+date(1970,1,1)
which gives me a new column of dates, which I can format as a time:
1526177708874 02:15 1526169219659 23:53 1525735831181 23:30 1525130481740 23:21 1525043053677 23:04 1525040614148 22:23 1524960752757 00:12 1523330480349 03:21
Those values, though, are UTC times, not local times, so I need to take one more step, and do a blunt force time zone conversion (with no accounting for daylight time):
=B2-4/24
Now I have a third column, with the local time we streamed each episode:
1526177708874 02:15 22:15 1526169219659 23:53 19:53 1525735831181 23:30 19:30 1525130481740 23:21 19:21 1525043053677 23:04 19:04 1525040614148 22:23 18:23 1524960752757 00:12 20:12 1523330480349 03:21 23:21
I copy that column’s contents into a text file, times.txt, and use a text editor and a regular expression search-and-replace to end up with just the hour part of those times.
22 19 19 19 19 18 20 23
With just the hours in times.txt now, I use trusty uniq and awk to prepare a histogram-ready count of streams watch by the hour:
sort times.txt | uniq -c | awk '{ print $2 "," $1}' > histogram.csv
The result is a list of each hour of the day along with the number of Netflix streams we started during that hour, from 2010 to yesterday:
01,1 05,5 06,30 07,61 08,80 09,62 10,50 11,55 12,62 13,74 14,54 15,68 16,125 17,266 18,267 19,357 20,361 21,367 22,242 23,54
I import that into a spreadsheet and create a bar chart to visualize it:
The result maps to my expectations: we mostly watch Netflix in the evening, with the peak activity at 9:00 p.m., which is around the time Oliver’s traditionally gone to bed. We never stream between 1:00 a.m. and 5:00 a.m., and even then I suspect that the early morning streams are either time zone shifts when we watch Netflix while traveling.
Comments