Getting my old posts into Drupal

After I succeeded at getting my taxonomy into Drupal, my next task was to get 10 years' worth of blog posts from my old homebrew blogging system into Drupal too.  The Drupal module Node Import was a great help here, but the old posts needed a little massaging before I could import them, so I wrote a PHP script to do it.

My script was quite simple: it took the existing blog posts from a MySQL table, did the massaging, and output them as a vertical bar-delimited ASCII text file ready for Node Import.  The “massaging” amounted to the following:

  • I stripped the HTML tags from the post titles – just used strip_tags on this field.
  • I escaped the vertical bars inside posts themselves – just changed all instances of | to \|.
  • I created a “Taxonomy” field with a double-vertical bar-delimited list of taxonomy terms – for example Internet||Weblogs||Firefox.
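
As a rough illustration of those three steps (a Python sketch, not the original PHP script), the massaging could look something like the following. The function names and the argument order are hypothetical, the regular expression is only a crude stand-in for PHP's strip_tags, and the sketch escapes nothing but vertical bars, because that is all the list above describes.

```python
import re


def strip_tags(text):
    """Crude stand-in for PHP's strip_tags: drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", text)


def escape_pipes(text):
    """Change every | to \\| so it is not read as a field separator."""
    return text.replace("|", "\\|")


def format_record(post_id, created, title, body, terms):
    """Build one pipe-delimited, double-quoted export line for a post.

    Assumption: fields appear in the order ID, date, title, body, taxonomy.
    Embedded double quotes are not handled here; the post does not say how
    (or whether) they were treated.
    """
    fields = [
        str(post_id),
        created,
        strip_tags(title),    # titles lose their HTML tags
        escape_pipes(body),   # bodies keep their HTML, with pipes escaped
        "||".join(terms),     # taxonomy terms joined with a double bar
    ]
    return "|".join('"%s"' % field for field in fields)
```

Calling format_record(5487, "2009-05-01 11:11:34", "<em>Post</em> Title", "<p>Some text.</p>", ["Internet", "Blogs", "Firefox"]) returns a single line with five quoted fields separated by vertical bars.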

Once my script ran on my archive of posts, I had an ASCII file that looked something like this:

"5487"|"2009-05-01 11:11:34"|"Post Title"|"
<p>This is the post text in HTML.</p>"|"Internet||Blogs||Firefox"

The first field is the unique record ID on my old system, which I wanted to preserve in Drupal to make referencing old posts easier. The final step, once I had the ASCII file exported, was to run it through iconv to convert the character encoding from ISO-8859-1, used in my homebrew system’s tables, to the UTF-8 used by Drupal:

iconv -f ISO-8859-1 -t UTF-8 drupal-export.txt > drupal-export.utf8.txt
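
Where iconv isn't available, the same conversion can be sketched in a few lines of Python. This is only an illustration of what the command above does; the function names are hypothetical and the file names would be whatever you pass in.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path, dst_path):
    """Read the whole export file, convert it, and write the result."""
    with open(src_path, "rb") as src:
        converted = latin1_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

One thing to watch for with either tool: every byte is valid ISO-8859-1, so the conversion never fails, even if the source file is already UTF-8. Running it twice on the same file produces doubled-up accented characters rather than an error.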

Now I was ready for Node Import.  In Drupal I went to Administer > Content Management > Import Content, clicked “New Import,” and then walked through the wizard's steps to define the import.  Some notes on each of the steps:

  • Step One
    • I selected “Story content type”, as this was the content type I decided to use for blog posts.
  • Step Two
    • Remember that you have to “Browse” for the file, then click “Upload” to actually upload it before you go on to the next step.
  • Step Three
    • Delimiter Separated Values
      • Record Separator: Newline
      • Field Separator: Pipe (|)
      • Text Delimiter: Double Quote (")
      • Escape Character: Backslash (\)
    • If you make the above selections and then click on the “Reload Page” button, you can see a preview of your import in the “Sample data” section of the page, and can get a quick visual indication of whether you selected properly.
  • Step Four
    • I mapped each of my export file’s fields to the appropriate Drupal field.
    • I’d previously added a CCK field called “Previous Number” to hold the original blog post’s record number.
  • Step Six
    • This is the step where you define the default field settings to apply to every post that has no value for a given field in your export file.
  • Step Seven
    • This is the most useful step of all: it gives you a preview of 5 imported items as they will appear once imported (and you can change “5” to a greater number using the “Number of records to preview” drop-down list at the top).  If posts don’t look right here, then they’re not going to look right when they import, so check the preview carefully for possible glitches.
    • If you find problems that require creating a new version of the ASCII file export, you can just click on the module’s “Back” button (not the browser’s) to go back to Step Two and upload another file; all the choices you’ve made on the other steps are remembered.
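
Before uploading a file at all, it can save a round trip to parse it locally with the same Step Three settings (newline records, pipe fields, double-quote text delimiter, backslash escape). The sketch below uses Python's csv module as a rough approximation; Node Import's own parser may treat edge cases differently, and the function name and the five-record default are hypothetical choices that mirror the preview step.

```python
import csv
import io


def read_records(text, limit=5):
    """Parse export text with the Step Three settings; return up to `limit` rows."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter="|",    # Field Separator: Pipe
        quotechar='"',    # Text Delimiter: Double Quote
        escapechar="\\",  # Escape Character: Backslash
    )
    records = []
    for row in reader:
        if not row:       # skip blank lines between records
            continue
        records.append(row)
        if len(records) >= limit:
            break
    return records
```

Each row comes back as a list of field values. A record with an unexpected number of fields is a sign that a vertical bar or a quote in that post still needs attention. The taxonomy field can be split back into terms with split("||").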

It took me about an hour of back-and-forth, looking at the preview of the import in Drupal, making tweaks to my export, uploading another version of the file, previewing again, and so on, to get things working properly; this was mostly working around peculiarities in posts on my old system, and didn’t have much to do with Drupal itself.

Once I was ready to launch the import itself it went quite quickly, and was done in under 30 minutes.  The result: more than 5,000 old blog posts, mapped to a hierarchical taxonomy, in a new home in Drupal.
