Annotated Shell Script for Counting Blog Posts by the Week

Ton wants to count the number of blog posts he writes per week. I responded with a shell script, but without any documentation.

Here’s the missing manual.

First, the entire script:

curl -s https://www.zylstra.org/blog/feed/ | \
  grep pubDate | \
  sed -e 's/<pubDate>//g' | \
  sed -e 's/<\/pubDate>//g' | \
  while read -r line ; do
    date -j -f "%a, %d %b %Y %T %z" "$line" "+%V"
  done | \
  sort -n | \
  uniq -c

The basic idea here is “get the RSS feed of Ton’s blog, pull out the publication dates of each post, convert each date to a week number, and then total posts by week number.” As of this writing, the output of the script is:

   5 23
   6 24
   4 25

Which tells me, for example, that this week, which is week 25 of 2018, Ton has written 4 posts so far.

Here’s a line by line breakdown of how the script works.

curl -s https://www.zylstra.org/blog/feed/ | \

Use the curl command, in silent mode (-s) so as to not print a lot of unhelpful progress information, to get Ton’s RSS feed. This returns a chunk of XML, with one <item> element for each of the 15 most recent blog posts, like this:

<item>
    <title></title>
    <link>https://www.zylstra.org/blog/2018/06/4125/</link>
    <comments>https://www.zylstra.org/blog/2018/06/4125/#comments</comments>
    <pubDate>Wed, 06 Jun 2018 11:23:08 +0000</pubDate>
    <dc:creator><![CDATA[Ton Zijlstra]]></dc:creator>
        <category><![CDATA[microblog]]></category>
    <category><![CDATA[blogroll]]></category>
    <category><![CDATA[opml]]></category>
    <guid isPermaLink="false">https://www.zylstra.org/blog/?p=4125</guid>
    <description><![CDATA[Frank Meeuwsen describes 5 blogs he currently enjoys following, and in an aside mentions he now follows 250+ blogs. Quick question for him: do you publish your list of feeds somewhere? My list of about a 100 feeds is on the right hand side.]]></description>
        <content:encoded><![CDATA[<p>Frank Meeuwsen <a href="http://diggingthedigital.com/5-favoriete-blogs/">describes 5 blogs he currently enjoys following</a>, and in an aside mentions he now follows 250+ blogs. Quick question for him: do you publish your list of feeds somewhere? My <a href="https://zylstra.org/tonsrss_june2018.opml">list of about a 100 feeds</a> is on the right hand side.</p>
]]></content:encoded>
      <wfw:commentRss>https://www.zylstra.org/blog/2018/06/4125/feed/</wfw:commentRss>
    <slash:comments>2</slash:comments>
  <post-id xmlns="com-wordpress:feed-additions:1">4125</post-id>  
  </item>

Notice that each post has a pubDate element that identifies the date it was published; that’s the date I need to focus on.

The vertical bar (|) at the end of this first line says “take the output of this command, and provide it as the input for the next command” and the backslash (\) says “this script continues on the next line; we’re not over yet.”
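
As a small aside on those two characters, here is a minimal sketch that uses made-up sample numbers rather than the feed. It also shows that, in a POSIX shell, a line ending in | is already incomplete, so the trailing backslash after a pipe is optional; the script keeps it for explicitness.

```shell
# Sample input only; these numbers are not from the feed.
# printf writes three lines, and the pipe hands them to sort.
# No backslash after |: the shell keeps reading the next line anyway.
sorted=$(printf '25\n23\n24\n' |
  sort -n)
echo "$sorted"
```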

grep pubDate | \

This extracts only the lines that have pubDate in them, resulting in:

		<pubDate>Wed, 20 Jun 2018 09:04:14 +0000</pubDate>
		<pubDate>Tue, 19 Jun 2018 19:41:41 +0000</pubDate>
		<pubDate>Tue, 19 Jun 2018 18:06:01 +0000</pubDate>
		<pubDate>Tue, 19 Jun 2018 10:19:59 +0000</pubDate>
		<pubDate>Sun, 17 Jun 2018 18:31:41 +0000</pubDate>
		<pubDate>Sat, 16 Jun 2018 13:02:18 +0000</pubDate>
		<pubDate>Sat, 16 Jun 2018 08:38:42 +0000</pubDate>
		<pubDate>Fri, 15 Jun 2018 13:26:06 +0000</pubDate>
		<pubDate>Fri, 15 Jun 2018 13:05:11 +0000</pubDate>
		<pubDate>Mon, 11 Jun 2018 06:49:12 +0000</pubDate>
		<pubDate>Sun, 10 Jun 2018 19:29:24 +0000</pubDate>
		<pubDate>Sun, 10 Jun 2018 19:02:20 +0000</pubDate>
		<pubDate>Fri, 08 Jun 2018 10:49:14 +0000</pubDate>
		<pubDate>Thu, 07 Jun 2018 19:11:10 +0000</pubDate>
		<pubDate>Wed, 06 Jun 2018 11:23:08 +0000</pubDate>

Now I just have the dates: this is coming along just fine. Next, I need to remove the <pubDate> and </pubDate> tags from each line; for clarity, I do this in two steps:

  sed -e 's/<pubDate>//g' | \
  sed -e 's/<\/pubDate>//g' | \

This uses sed, which you can think of as an “automated text editor,” to search for those two tags and replace them with nothing; the result is:

		Wed, 20 Jun 2018 09:04:14 +0000
		Tue, 19 Jun 2018 19:41:41 +0000
		Tue, 19 Jun 2018 18:06:01 +0000
		Tue, 19 Jun 2018 10:19:59 +0000
		Sun, 17 Jun 2018 18:31:41 +0000
		Sat, 16 Jun 2018 13:02:18 +0000
		Sat, 16 Jun 2018 08:38:42 +0000
		Fri, 15 Jun 2018 13:26:06 +0000
		Fri, 15 Jun 2018 13:05:11 +0000
		Mon, 11 Jun 2018 06:49:12 +0000
		Sun, 10 Jun 2018 19:29:24 +0000
		Sun, 10 Jun 2018 19:02:20 +0000
		Fri, 08 Jun 2018 10:49:14 +0000
		Thu, 07 Jun 2018 19:11:10 +0000
		Wed, 06 Jun 2018 11:23:08 +0000
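
The grep and the two sed steps can also be folded into a single sed call. This is a sketch of an alternative, not what the script above does, and extract_pubdates is a name made up for illustration. It prints only the lines that contain a pubDate element and keeps just the text between the tags, which also drops the leading tabs.

```shell
# Hypothetical helper: read feed XML on stdin, print one date per line.
# -n suppresses sed's default output; the trailing p prints only the
# lines where the substitution matched. \(.*\) captures the text
# between the tags, and \1 puts it back as the whole line.
extract_pubdates() {
  sed -n 's:.*<pubDate>\(.*\)</pubDate>.*:\1:p'
}

# Usage with two sample lines instead of the live feed:
printf '\t\t<pubDate>Wed, 06 Jun 2018 11:23:08 +0000</pubDate>\n\t\t<title>x</title>\n' |
  extract_pubdates
```

Using : as the sed delimiter avoids having to escape the slash in the closing tag.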

Next, I need to convert those dates into weeks. Fortunately the date command has an easy way of doing this. Using a while…do…done loop, I convert each line’s date into a week number:

  while read -r line ; do
    date -j -f "%a, %d %b %Y %T %z" "$line" "+%V"
  done 

The key here is that the format string, which follows -f, needs to match the format of the dates I’m converting: each of the placeholders in that string stands for a different part of the date. For example, %a is the abbreviated three-letter name of the day. The -j flag tells date to just parse and print the date rather than try to set the system clock. The +%V is the format string of the output, and it represents the ISO 8601 week number of the year. The result is this:

25
25
25
25
24
24
24
24
24
24
23
23
23
23
23
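
The -j and -f flags used above belong to the BSD date that ships with macOS. GNU date, common on Linux, uses -f for something else but can parse these RFC 822 style dates directly with -d. The sketch below assumes one of those two implementations is installed; week_of is a hypothetical helper name, not part of the original script.

```shell
# Hypothetical helper: print the ISO 8601 week number for one pubDate string.
week_of() {
  if date --version >/dev/null 2>&1; then
    # GNU date: --version succeeds, and -d parses the date string itself.
    date -d "$1" "+%V"
  else
    # BSD/macOS date: same invocation as in the script above.
    date -j -f "%a, %d %b %Y %T %z" "$1" "+%V"
  fi
}

# Sample date taken from the feed excerpt; prints 23.
week_of "Wed, 06 Jun 2018 11:23:08 +0000"
```

Both branches report the week in the local time zone, as the original does; adding -u to either would use UTC instead, which can matter for posts published close to midnight on a Sunday.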

That’s the week number of each of the blog posts, in reverse chronological order. I want the weeks to be in chronological order, so I sort them numerically with:

 sort -n | \

To result in:

23
23
23
23
23
24
24
24
24
24
24
25
25
25
25

And, finally, I use a superpower of the uniq command, the -c option, which, says the man page, will “Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.” That’s exactly what I want:

 uniq -c

Put all that together and you get:

   5 23
   6 24
   4 25

Each week number, preceded by the number of blog posts written that week.
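
One detail worth seeing in isolation: uniq only collapses adjacent identical lines, so the sort step is what makes the counts correct, not just what puts them in order. A minimal sketch with made-up week numbers (count_by_week is a hypothetical name):

```shell
# Hypothetical helper: read week numbers on stdin, print "count week" lines.
count_by_week() {
  sort -n | uniq -c
}

# Unsorted sample input, not real feed data.
printf '25\n24\n25\n23\n24\n' | count_by_week
```

Without the sort, uniq -c on that same input would report five separate lines, each with a count of 1. The amount of leading whitespace before each count differs between uniq implementations.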

Shell scripting is something I’m so happy to have burned into my muscle memory, as I use it every day to accomplish similar feats. The underlying principle of:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

is one that’s served me well, and dovetails nicely with how I approach the world.

Comments

Ton Zijlstra on June 21, 2018 - 10:25

Thank you Peter for this annotation. I noticed that my blogpost you link to did not receive a Webmention.....not sure if it is a case of not sending one from your blog, or not listening from the side of my blog?

Peter Rukavina on June 21, 2018 - 11:08

Alas the state of Webmention for Drupal is sad and sorry, and I’ve yet to find a reliable solution to this.

Roland Tanglao on June 26, 2018 - 16:49

moreThanOneWayToSkinACat :

This is fabulous and it would be even easier to write such a script if Ton was using a static site instead of WordPress (I think Ton is using WP). Then you could just do ls -l with wc -l in a loop to get the number of posts in a week

Also mysql queries on the WP database would work too, right?