Earlier in this space, I detailed a problem I ran into with accented characters when trying to set up a system that uses NetNewsWire to maintain the blogroll for my weblog.
At long last, I’ve cracked the accented character problem, and so, if only to assist others who find themselves in the same boat, I present the final results here.
My goal was to export my list of RSS subscriptions in NetNewsWire to a text file, transfer that text file to my webserver, and then post-process it in PHP, loading it into a MySQL database where I could manipulate it at will, including using it to draw this blogroll page on the fly.
The first step is to use AppleScript to export the subscription list as a text file:
tell application "NetNewsWire"
    set c to ""
    set linefeed to "\n"
    repeat with thisSub in subscriptions
        set s to "" as Unicode text
        set s to s & (is group of thisSub) & linefeed
        set s to s & (inGroup of thisSub) & linefeed
        set s to s & (display name of thisSub) & linefeed
        set s to s & (givenName of thisSub) & linefeed
        set s to s & (givenDescription of thisSub) & linefeed
        set s to s & (home URL of thisSub) & linefeed
        set s to s & (RSS URL of thisSub) & linefeed
        set s to s & (icon URL of thisSub) & linefeed
        set t to TECConvertText s fromCode "UNICODE-2-0" toCode "ISO-8859-1"
        set c to c & t
    end repeat
end tell
set blogroll to "blogroll.txt"
set f to (POSIX path of blogroll)
set n to open for access file f with write permission
write c to n
close access n
This script loops through each of my subscriptions in NetNewsWire, gathers the relevant parts to export as a Unicode string called s, and then converts that string to ISO-8859-1 (aka ISO Latin-1) using a scripting addition called TEC OSAX.
To download and install TEC OSAX, simply follow the download link on this page, and then copy the file called TEC.osax to /Library/ScriptingAdditions (you may need to create that folder if it doesn’t exist already; note that there’s no space in the folder name).
Once the conversion to ISO-8859-1 is complete, the resulting string t is added to a string c which will later be written to a file.
Once all the information is gathered about each subscription, the string c, which contains a linefeed-separated list of attributes for each subscription, is written to a file called blogroll.txt.
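To make the file layout concrete, here is a minimal sketch in Python (the real post-processing is done in PHP, so this is only an illustration) that reads such a file into one record per subscription. The field labels and the `parse_blogroll` helper are hypothetical names for the eight attributes the AppleScript writes, in order, and the sketch assumes that no attribute contains a linefeed of its own.

```python
# Hypothetical labels for the eight attributes, in the order the AppleScript writes them.
FIELDS = [
    "is_group", "in_group", "display_name", "given_name",
    "given_description", "home_url", "rss_url", "icon_url",
]

def parse_blogroll(data: bytes) -> list:
    """Split linefeed-separated ISO-8859-1 bytes into one dict per subscription."""
    lines = data.decode("iso-8859-1").split("\n")
    # Every attribute is followed by a linefeed, so the last split element is empty.
    if lines and lines[-1] == "":
        lines.pop()
    if len(lines) % len(FIELDS) != 0:
        raise ValueError("line count is not a multiple of the field count")
    return [
        dict(zip(FIELDS, lines[i:i + len(FIELDS)]))
        for i in range(0, len(lines), len(FIELDS))
    ]

# Made-up sample record: empty description and icon URL, example.com addresses.
sample = (
    "false\nFriends\nFran\xe7ois\nFran\xe7ois\n\n"
    "http://example.com/\nhttp://example.com/rss.xml\n\n"
).encode("iso-8859-1")
records = parse_blogroll(sample)
print(records[0]["display_name"])
```

A description that spans several lines would break the fixed eight-lines-per-record assumption, so a real parser would need to guard against that.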
This file is then copied to my webserver, using SCP, by a shell script, which then post-processes the file using a PHP script, the important part of which is this:
$string = htmlentities($string);
This line, which appears in the loop that reads in and parses the blogroll text file, converts the accented characters in the ISO-8859-1 character set to HTML entities.
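For readers who want to see what that conversion amounts to, here is a rough Python equivalent (a sketch, not the PHP function itself). The `to_entities` helper is a hypothetical name; it relies on the standard library's `html.entities.codepoint2name` table of HTML 4 named entities. One thing to be aware of: in PHP 5.4 and later, `htmlentities` assumes UTF-8 unless told otherwise, so ISO-8859-1 input would need the encoding passed explicitly, as in `htmlentities($string, ENT_COMPAT, 'ISO-8859-1')`.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace every character that has an HTML 4 named entity with that entity."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )

raw = b"Fran\xe7ois Nonnenmacher"  # ISO-8859-1 bytes, as read from the text file
print(to_entities(raw.decode("iso-8859-1")))  # prints Fran&ccedil;ois Nonnenmacher
```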
The end result is that the ç that started out in NetNewsWire as MacRoman 0x8D is converted to Unicode U+00E7, then to the ISO-8859-1 character 0xE7, and finally to the HTML entity &ccedil;.
And so accents are preserved, and François Nonnenmacher comes out as Fran&ccedil;ois Nonnenmacher in the generated HTML, which browsers display with the cedilla intact.
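The whole chain for a single character can be checked with a few lines of Python. This is only a sketch using Python's built-in `mac_roman` and `iso-8859-1` codecs as stand-ins for the TEC OSAX conversion and PHP's `htmlentities`; the variable names are made up for illustration.

```python
from html.entities import codepoint2name

mac_byte = bytes([0x8D])                   # the byte for ç in MacRoman
char = mac_byte.decode("mac_roman")        # decoded to a Unicode character
latin1_byte = char.encode("iso-8859-1")    # re-encoded as a single ISO-8859-1 byte
entity = f"&{codepoint2name[ord(char)]};"  # looked up as a named HTML entity

print(f"U+{ord(char):04X}", latin1_byte.hex(), entity)  # prints U+00E7 e7 &ccedil;
```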
Comments
I’m curious as to why you didn’t make use of the export-to-OPML functionality that is built in NetNewsWire. Wouldn’t this have cut out a few steps? Couldn’t you have used a PHP-XML module to read the OPML file?
What about changing the charset of your blog to UTF-8?
You will have to do it at some point anyway ;-)
I second Steve’s opinion here ;-) I’m doing something similar (less fancy!) to maintain my NNW blogroll:
I export it to OPML, convert it with a (very) simple style sheet into an HTML snippet that I then paste into my weblog.
Granted, it requires two manual steps each time I update, but that takes less than a minute or two, and setting up the whole thing from scratch took me just under 20 minutes ;-)
For a more detailed description, check http://tomster.org/geek/xml/coreblogroll/view
But you know what? I think I’m gonna change my solution to use a modified version of your AppleScript anyway… You know why? Because it’s cooler to USE ;-)
And another thought: do you think it’s possible for your AppleScript to preserve the grouping of one’s subscriptions? The OPML export only contains a flat listing of all subscriptions. It would be nice to be able to preserve which Group/Category one has put a particular subscription in.
TextEncodingConverter can convert directly from MacRoman to ISO 8859-1. You should be able to skip the Unicode step.