Going Secure. Or All the Images

I’m in the midst of a plan to migrate this site to a secure server – https instead of http – and as part of that plan I need to ferret out all the embedded images that are called from non-secure hosts, so as to avoid mixed-content issues.

There are 6,953 pages that make up this site, all the blog posts and “about” pages taken together. This bit of PHP code extracts the URLs of all the images embedded in the body of those pages by directly querying Drupal’s body field table (field_data_body):

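// $db7 is assumed to be an existing PDO connection to the Drupal 7 database.
$query = "SELECT entity_id, body_value FROM field_data_body";
$result = $db7->query($query);
libxml_use_internal_errors(true); // don't print warnings about malformed post HTML
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
  if (trim((string) $row['body_value']) === '') {
    continue; // loadHTML() refuses empty input
  }
  $doc = new DOMDocument();
  $doc->loadHTML($row['body_value']);
  foreach ($doc->getElementsByTagName('img') as $tag) {
    print $tag->getAttribute('src') . "\n";
  }
  libxml_clear_errors();
}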

That script identifies 4,325 images in total, ranging from the good old:

/1by1.gif

to images on hosts like Flickr:

https://farm2.staticflickr.com/1603/23977645632_c8b864d187_c.jpg

Some of these images – like 1by1.gif – return 404 errors, and I’ll need to make some manual corrections to the HTML for those posts; others, like that Flickr example, are perfectly fine to serve on the new secure site as they’re already hosted on a secure server (note the https in the Flickr URL). But there are a lot of images that are served from non-secure hosts that I control, like:

http://media.ruk.ca/images/email-keyboard-20160124-120123.png

For images like that, I’ll need to change the URL to either:

//media.ruk.ca/images/email-keyboard-20160124-120123.png

or

https://media.ruk.ca/images/email-keyboard-20160124-120123.png
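Going from the non-secure form to either of those is mechanical. Here is a minimal PHP sketch; secureImageUrl() is a hypothetical helper, and since it only rewrites the scheme it assumes the host really does serve the same path over HTTPS (it doesn't check).

```php
<?php
// Hypothetical helper: rewrite a non-secure (http://) image URL into either a
// protocol-relative form (//host/path) or an explicit https:// form.
// URLs that are already https, protocol-relative, or relative pass through unchanged.
// Assumption: the host serves the same path over HTTPS; nothing here verifies that.
function secureImageUrl(string $url, bool $protocolRelative = false): string
{
    if (stripos($url, 'http://') !== 0) {
        return $url; // not an http:// URL, nothing to do
    }
    $rest = substr($url, strlen('http://'));
    return ($protocolRelative ? '//' : 'https://') . $rest;
}

$url = 'http://media.ruk.ca/images/email-keyboard-20160124-120123.png';
echo secureImageUrl($url, true), "\n";
echo secureImageUrl($url), "\n";
```

The protocol-relative form follows whatever scheme the page was loaded with, while the explicit https form always loads securely.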

Of the 4,325 images, 3,587 (83%) are from non-secure hosts, 244 (6%) are from secure hosts, and 494 are relative embeds with no host indicated. It breaks down like this:

  • Flickr (non-secure): 1678 images
  • Flickr (secure): 233 images
  • ruk.ca (non-secure): 1011 images
  • media.ruk.ca (non-secure): 632 images
  • Third-party hosts (non-secure): 255 images
  • Relative embeds without host: 494 images
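A breakdown like that can be produced from the list of src values the extraction script prints. This is a rough sketch, not the code behind these numbers: classifyImageSrc() and tallyImageSources() are hypothetical names, and treating protocol-relative URLs as non-secure and collapsing every Flickr host into one bucket are assumptions.

```php
<?php
// Sketch: bucket <img> src values by scheme and host. Assumptions: an explicit
// https:// URL is "secure"; http:// and protocol-relative //host/... URLs are
// "non-secure" because the site itself is still served over http; anything
// without a host is "relative". All Flickr hosts collapse to "flickr".
function classifyImageSrc(string $src): array
{
    $src = trim($src);
    $host = parse_url($src, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return ['relative', null];
    }
    $host = strtolower($host);
    if (preg_match('/(^|\.)(static)?flickr\.com$/', $host)) {
        $host = 'flickr';
    }
    $scheme = parse_url($src, PHP_URL_SCHEME);
    $secure = is_string($scheme) && strtolower($scheme) === 'https';
    return [$secure ? 'secure' : 'non-secure', $host];
}

// Count how many src values fall into each "host (category)" bucket.
function tallyImageSources(array $srcs): array
{
    $counts = [];
    foreach ($srcs as $src) {
        [$category, $host] = classifyImageSrc($src);
        $key = $host === null ? $category : "$host ($category)";
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}

print_r(tallyImageSources([
    '/1by1.gif',
    'https://farm2.staticflickr.com/1603/23977645632_c8b864d187_c.jpg',
    'http://farm6.static.flickr.com/5014/5410942568_cf15fddf45_z.jpg',
    'http://media.ruk.ca/images/email-keyboard-20160124-120123.png',
]));
```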

My plan is to move all of these images to a secure server under my control and then to rewrite the embedded URLs to point there.

As an aside, one of the things I found out while I was under the hood mucking about with the blog was that I’ve written 1,439,497 words here since 1999. That’s Catch-22 times 8 or Fahrenheit 451 times 31. If only mine were such quality words as those.
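A word count like that can be approximated from the same body_value column. This is a sketch under one assumption about what a word is (any whitespace-separated run of characters once the HTML tags are removed); countWords() is a hypothetical helper, and other definitions will give somewhat different totals.

```php
<?php
// Sketch: estimate the number of words in a post body. Assumptions: the body
// is HTML as stored in field_data_body, and a "word" is any run of characters
// between whitespace, so numbers and URLs written as text count as words.
function countWords(string $html): int
{
    // Replace tags with spaces so "<p>one</p><p>two</p>" stays two words.
    $text = preg_replace('/<[^>]*>/', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Split on ASCII whitespace and non-breaking spaces (from &nbsp;).
    $words = preg_split('/[\s\x{00A0}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? 0 : count($words);
}

echo countWords('<p>If only mine were</p><p>such&nbsp;quality words.</p>'), "\n";
```

Summing countWords() over every row returned by the extraction query would give a site-wide total.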

Comments

Ton Zijlstra on January 27, 2016 - 02:54

This is good to read. I'll think about adding an s to my blog as well. When I first looked into that years ago, getting certificates was an expensive and time-consuming thing. Now it's much less so, so it's no longer an impediment. Where and how did you arrange your certificates?

Peter Rukavina on January 27, 2016 - 08:08

The key to understanding SSL certificates is that, under the hood, they are all the same: the only thing that differs is the degree of verification the issuer enforces.

So a $199 certificate and a $29 one offer the same security functionality, but the $199 issuer will call you and ask for business records, etc., while the $29 issuer will verify you by email.

This is reflected in the way that browsers indicate the security in the address bar, but otherwise has no impact on securing the traffic to and from your site.

The other thing to notice is that many certificate authorities have several brands with different pricing. Verisign, Thawte, Geotrust and RapidSSL are all brands of Symantec, for example.

I bought my certificate wholesale from RapidSSL for $34 (I signed up to be a “partner,” which is free and immediate). I bought the cheapest certificate with the least amount of verification required.

The trickiest part is installing the certificate, especially ensuring that you have the proper intermediate certificate installed for your issuer. Fortunately the quality of documentation has improved a lot in recent years. Let me know if you need any help.

Ton Zijlstra on January 27, 2016 - 10:49

Thank you Peter. For some other domains (on my VPS) I've installed certs before, all the 'lite' versions with the mail verification you mention. I've also found a few free suppliers out there.

Peter Rukavina on January 27, 2016 - 11:11

Ultimately, to my mind, the primary considerations, other than price, are how easy it is to request (and, ultimately, renew) the certificate, and whether the certificate authority will be around, at least in some form, for the long haul.

Peter Rukavina on January 28, 2016 - 10:11

Starting out, my first job is the old non-secure Flickr images, with URLs like this:

http://farm6.static.flickr.com/5014/5410942568_cf15fddf45_z.jpg

which is this image, of Premier Ghiz and Oliver:

It turns out to be enough to change the http to https, and the image continues to render properly:

https://farm6.static.flickr.com/5014/5410942568_cf15fddf45_z.jpg

I’ve installed the Scanner module for Drupal and used it to search and replace as follows (checking the Use regular expressions setting):

  • Search: http:\/\/farm(.*).static.flickr.com
  • Replace: https://farm$1.static.flickr.com

That handled 519 instances of non-secure Flickr images. I then did the same thing with a similar pattern:

  • Search: http:\/\/farm(.*).staticflickr.com
  • Replace: https://farm$1.staticflickr.com

This handled a further 233 instances. And then:

  • Search: http:\/\/static.flickr.com
  • Replace: https://static.flickr.com

That handled 96 images.
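Outside Scanner, the same three rewrites can be expressed with PHP’s preg_replace(). This sketch deliberately tightens the patterns rather than reproducing them: dots are escaped, and the farm number is matched with \d+ instead of (.*). A greedy (.*) can run on to a later Flickr URL on the same line, so only the first URL gets rewritten and the second stays on http. secureFlickrUrls() is a hypothetical name.

```php
<?php
// Sketch of the three Flickr rewrites as preg_replace() calls, with tightened
// patterns (escaped dots, \d+ for the farm number) instead of greedy (.*).
// Legacy photosNN.flickr.com URLs are intentionally left alone.
function secureFlickrUrls(string $html): string
{
    $patterns = [
        '#http://farm(\d+)\.static\.flickr\.com#' => 'https://farm$1.static.flickr.com',
        '#http://farm(\d+)\.staticflickr\.com#'   => 'https://farm$1.staticflickr.com',
        '#http://static\.flickr\.com#'            => 'https://static.flickr.com',
    ];
    return preg_replace(array_keys($patterns), array_values($patterns), $html);
}

// Two Flickr embeds on one line: both end up on https.
echo secureFlickrUrls(
    '<img src="http://farm6.static.flickr.com/5014/5410942568_cf15fddf45_z.jpg">'
    . '<img src="http://farm3.static.flickr.com/1/2_b.jpg">'
), "\n";
```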

That left me with these URLs for Flickr images, on very old posts like this one:

  • http://photos15.flickr.com/21458537_2ce6489824.jpg
  • http://photos17.flickr.com/21685001_21c1dfcd15_m.jpg
  • http://photos23.flickr.com/25486311_cf6946a304_o.jpg
  • http://photos23.flickr.com/26402898_3f0272a091_m.jpg
  • http://photos22.flickr.com/27217482_6edd3b7687_m.jpg
  • http://photos23.flickr.com/27369653_200b66c4bc_m.jpg
  • http://photos23.flickr.com/28771284_af27352828_m.jpg
  • http://photos22.flickr.com/31937663_179365d0b2_m.jpg
  • http://photos29.flickr.com/38143854_eb8ddb18ac.jpg

That appears to be a URL form that Flickr no longer supports. Those non-secure images were broken, so I needed to manually re-embed the (securely served) images into the related posts, a task that Scanner made easier because I could use it to find the affected posts.
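Finding the posts that still use the old photosNN.flickr.com form can also be done directly in PHP. A sketch, assuming $bodies is an array of post bodies keyed by entity_id (for example, built from the field_data_body query at the top of this post); findLegacyFlickrPosts() and the sample IDs are hypothetical.

```php
<?php
// Sketch: list which posts still embed the retired photosNN.flickr.com URLs,
// so they can be fixed by hand. Assumption: $bodies maps a post ID to its
// body HTML, and URLs end at a quote, whitespace or angle bracket.
function findLegacyFlickrPosts(array $bodies): array
{
    $found = [];
    foreach ($bodies as $id => $html) {
        if (preg_match_all('#https?://photos\d+\.flickr\.com/[^"\'\s<>]+#i', $html, $m)) {
            $found[$id] = array_values(array_unique($m[0]));
        }
    }
    return $found;
}

// Hypothetical post IDs and bodies, for illustration only.
$bodies = [
    101 => '<img src="http://photos15.flickr.com/21458537_2ce6489824.jpg">',
    102 => '<img src="https://farm6.static.flickr.com/5014/5410942568_cf15fddf45_z.jpg">',
];
print_r(findLegacyFlickrPosts($bodies));
```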

So I’ve now handled the 1,678 non-secure embedded Flickr images, and can move on to the other non-secure hosts.

Peter Rukavina on January 28, 2016 - 10:48

Moving on to images like this:

http://media.ruk.ca/images/email-keyboard-20160124-120123.png

These images come from the server, still hosted at silverorange, that hosted this site in its pre-Drupal and Drupal 6 days. That server has an SFTP server that I’ve wired up to Skitch on my Mac, so even after the move to a new host and Drupal 7, my workflow has been to upload to that host any images that I want to embed but not put on Flickr.

As I don’t have an SSL setup on that media.ruk.ca server, my strategy here is to copy the images to the Drupal 7 “files” directory.

So I used scp to copy the images. There were 2,000+ of them, even though only 632 were embedded in posts here: some I linked to rather than embedded, and some I embedded elsewhere (Twitter, Facebook, etc.).

I created a new subdirectory in the Drupal 7 files directory (mostly so that I could treat these files as a group if there were ever a need in the future) and then copied the media.ruk.ca files into it via scp:

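mkdir -p sites/ruk.ca/files/media.ruk.ca
cd sites/ruk.ca/files/media.ruk.ca
# Quote the remote path so the * is expanded on media.ruk.ca, not by the local shell.
scp -r 'user@media.ruk.ca:/www/htdocs-ruk/images/*' .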

And so after the copy:

http://media.ruk.ca/images/GirlishUmbrella.png

is also available at:

//ruk.ca/sites/ruk.ca/files/media.ruk.ca/GirlishUmbrella.png

Which means the image will render properly whether or not it’s served from a secure server.

Then I used Scanner to change the embed links:

  • Search: http:\/\/media.ruk.ca\/images\/(.*)
  • Replace: //ruk.ca/sites/ruk.ca/files/media.ruk.ca/$1
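The same substitution can be sketched with PHP’s preg_replace(). Here the filename is matched with a negated character class rather than (.*), assuming filenames never contain quotes, whitespace or angle brackets. With (.*) the first match runs to the end of the line, so a second media.ruk.ca URL on that line is carried along inside $1 and left on http, which is one possible source of leftovers that need manual cleanup. rehostMediaImages() is a hypothetical name.

```php
<?php
// Sketch: point media.ruk.ca image URLs at the copies in the Drupal 7 files
// directory. Each URL is rewritten separately, even when several share a line,
// and any subdirectory under images/ is preserved in $1.
function rehostMediaImages(string $html): string
{
    return preg_replace(
        '#http://media\.ruk\.ca/images/([^"\'\s<>]+)#',
        '//ruk.ca/sites/ruk.ca/files/media.ruk.ca/$1',
        $html
    );
}

echo rehostMediaImages('<img src="http://media.ruk.ca/images/GirlishUmbrella.png">'), "\n";
```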

This handled 343 images. There were a few others I needed to clean up manually, but they’re all handled now.

I’ve updated my Skitch setup to use SFTP to transfer files directly into the Drupal 7 files directory now, so the issue is handled going forward.

Peter Rukavina on January 28, 2016 - 10:57

I’m now down to 753 non-secure embedded images.

There was an unfortunate period, between this post in July 2008 and this post in October 2008, when I used Nokia’s Ovi photo-sharing service as the image-embedding source for the blog. Nokia shut down Ovi at some point, and I didn’t grab the photos out of Ovi before they did.

I’ll have to see if I can find them in the Web Archive or elsewhere.

Peter Rukavina on January 28, 2016 - 11:22

The post from Iceland is a good example of why it’s unfortunate that I used Ovi at all.

Fortunately I was able to retrieve the photos, from my local archive and from Flickr, and update the post manually.