Getting my old taxonomy into Drupal

My old homebrew blogging system used a hierarchical set of categories to tag posts.  It looked something like this:

  • Technology
    • Apple
      • Steve Wozniak
    • Backup
    • Barcodes
      • UPC

And so on.  When I migrated the blog to Drupal, I needed a way of getting this set of categories out of the old system and into Drupal. Here’s how I did it.

First I created a new Drupal Vocabulary (Drupal-speak for “set of categories”) called “Topics” and then I looked in the drupal.vocabulary table to find that the vocabulary had a record number – or vid – of 5.  I noted this for later.

The old system’s categories were stored in a MySQL table that looked like this:

CREATE TABLE `categories` (
  `number` int(11) default NULL,
  `name` char(75) default NULL,
  `parent` int(11) default NULL,

For each category there was a number, a name (like “Apple”) and the number of the parent category, if any.  While there are various ways of importing categories into a Drupal “taxonomy”, none of them exactly fit the bill for me, so I decided to go the “write a PHP script to do it” route.

First I selected all of the categories, in numerical order (because I wanted to make sure that parents appeared before their children):

SELECT number,name,parent from categories order by number

For each of the categories I inserted a record into the Drupal term_data table:

INSERT into drupal.term_data (vid,name) values (5,'" . addslashes($name) . "')"

In then retrieved the tid (or “term ID”) value:

select LAST_INSERT_ID() as tid

and noted this for later use should I need to identify the ID of this category should it be a parent to a child encountered later.  I then also inserted a record into the term_hierarchy table, using a value of either 0 (zero) for categories with no parent, or the tid value for the parent, noted earlier, for this with a parent:

INSERT into drupal.term_hierarchy (tid,parent) values ($tid,$parent)

Put it all together and the code looks something like this:

$query = "SELECT number,name,parent from categories order by number";
$result = MYSQL_QUERY($query);
$howmanyrecords = MYSQL_NUMROWS($result);

$currentrecord = 0 ;

while ($currentrecord < $howmanyrecords) {
    $number = mysql_result($result,$currentrecord,"number");
    $name = mysql_result($result,$currentrecord,"name");
    $parent = mysql_result($result,$currentrecord,"parent");

    $query2 = "INSERT into drupal.term_data (vid,name) values (5,'" . addslashes($name) . "')";
    $result2 = MYSQL_QUERY($query2);

    $query2 = "select LAST_INSERT_ID() as tid";
    $result2 = MYSQL_QUERY($query2);
    $tid = trim(MYSQL_RESULT($result2,$currentrecord2,"tid"));

    $tids[$number] = $tid;

    if ($parent <> '') {
        $parent = $tids[$parent];
    else {
        $parent = 0;

    $query2 = "INSERT into drupal.term_hierarchy (tid,parent) values ($tid,$parent)";
    $result2 = MYSQL_QUERY($query2);

The end result was a 788-term vocabulary, with hierarchy preserved, in Drupal, ready to attach to the posts I would import next.