As I alluded to on the motherblog, I was happily renovating my blogroll when I ran into problems with François Nonnenmacher’s blog. Here’s what happened.
It’s all about the ç.
I use NetNewsWire as an RSS newsreader. It occurred to me that because I have all of my RSS feeds handily organized in that tool already, I might just as well use it as the source well from which to draw my blogroll. Because NetNewsWire is AppleScriptable, this should have been an easy task (if you ignore the fact that AppleScript itself is one of those “just enough like regular English so as to be completely incomprehensible to me” programming languages).
The actual process of dumping the data out was easy:
tell application "NetNewsWire"
	set s to ""
	set lf to "\n" -- renamed from linefeed, which later AppleScript versions define as a built-in constant
	repeat with thisSub in subscriptions
		-- one line per property, for every subscription
		set s to s & (is group of thisSub) & lf
		set s to s & (inGroup of thisSub) & lf
		set s to s & (display name of thisSub) & lf
		set s to s & (givenName of thisSub) & lf
		set s to s & (givenDescription of thisSub) & lf
		set s to s & (home URL of thisSub) & lf
		set s to s & (RSS URL of thisSub) & lf
		set s to s & (icon URL of thisSub) & lf
	end repeat
end tell
-- POSIX file turns the slash-separated path into a file reference
set blogroll to "/Users/peter/blogroll.txt"
set f to POSIX file blogroll
set n to open for access f with write permission
set eof n to 0 -- discard anything left over from a previous run
-- with no "as" parameter, write uses the default (Mac OS Roman) encoding
write s to n
close access n
Where I ran into problems was that the ç in François’ name came out as the Mac OS Roman character with hex value 0x8D.
In the grander scheme of things, that’s not a big problem: using this table, I could have worked up a translator that converted every instance of 0x8D to ç, and done the same for all the other accented characters.
But I wanted a cleaner solution.
So I entered the murky world of Unicode and UTF-8, and experimented with AppleScript’s ability to output Unicode text and Perl’s ability to deal with it.
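For anyone following along, here is a minimal sketch of the AppleScript side, under assumptions: Standard Additions’ write command accepts an "as «class utf8»" parameter that encodes the text as UTF-8 instead of the default Mac OS Roman. The sample string, the path, and the error handling are hypothetical placeholders, not part of my actual blogroll script.

```applescript
-- Hypothetical sample text; in the real script this would be the string built from NetNewsWire
set s to "François Nonnenmacher"
-- Hypothetical path; POSIX file turns it into a file reference
set f to POSIX file "/Users/peter/blogroll.txt"
set n to open for access f with write permission
try
	set eof n to 0 -- truncate any earlier contents
	-- «class utf8» asks write to encode the text as UTF-8,
	-- so ç should become the two bytes C3 A7 rather than the single byte 8D
	write s to n as «class utf8»
	close access n
on error errMsg
	-- make sure the file is not left open if the write fails
	close access n
	error errMsg
end try
```

If this behaves as intended, ç (U+00E7) lands in the file as the two bytes 0xC3 0xA7 instead of the single byte 0x8D, and whatever reads the file afterwards (Perl, in my case) would then need to be told it is reading UTF-8.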
All that experimenting led me off on a series of wild goose chases. I’m sure this is all supposed to be easy to figure out, and my difficulty may be that I’m assuming it’s much harder than it is. However, at this stage I’m suffering from Unicode fatigue, and I need to step away for a minute and understand the issues on a broader level. Joel’s piece helps tremendously.
I’ll report back when I’ve cracked this nut.