One of the afflictions we untrained, shoot-from-the-hip, make-it-up-as-you-go-along programmers suffer from is a tendency to get stuck in our ways. If it works, you keep doing it. Learning happens when things don’t work, and there’s seldom enough time to go back and retrofit old code with what we’ve learned.
This affliction hit me in a big way this week.
My little project of the day was to take a database of 700,000-odd Canadian postal codes and turn them into an ESRI shapefile that I could load into MapInfo. Or, in simpler terms, I needed to plot postal codes on a map.
Using PHP/MapScript, I wrote a PHP script to loop through the database of postal codes, grab the required information — postal code, place name, province, latitude and longitude — and add each, in turn, to a map layer.
And it worked.
Except that it worked really, really slowly.
It started off quickly enough — zooming through the first 10% of the postal code file in a couple of minutes. But once it got more than 50,000 points in its belly, things slowed to a crawl, and adding each point to the map layer was taking 5 to 10 seconds. I left the script running over the weekend, and it was only halfway through the “J” postal codes in Quebec, alphabetically, when I returned on Monday morning.
My immediate suspicion was that the more points I was adding to the map layer, the slower it was to add new ones. So I tweaked the part of the script related to adding points to the map layer. But things still ran just as slowly. I stripped out even more fluff, and made the code even more efficient. Still got bogged down. I switched the code to create ten individual layers, one for each province. No improvement.
This morning, with a clearer head, I thought “maybe it has nothing to do with the map layer part of the code; maybe it’s something else.” So I stripped out all of the map-related code, leaving me with, essentially, a script that looped through 700,000 records in a database and displayed its progress. Much to my surprise, the code ran just as slowly as when it was creating map points. So the problem was database related.
To the manual.
I’ve written hundreds of thousands of lines of database-related PHP code over the past 5 or 6 years, and the vast majority of those queries have returned fewer than a dozen records. Which is far fewer than 700,000.
Right here in the PHP manual it says, about the mysql_result function I’ve always used to grab results:
When working on large result sets, you should consider using one of the functions that fetch an entire row (specified below). As these functions return the contents of multiple cells in one function call, they’re MUCH quicker than mysql_result(). Also, note that specifying a numeric offset for the field argument is much quicker than specifying a fieldname or tablename.fieldname argument.
So I modified the code to use mysql_fetch_row instead.
And now I can digitize the entire country in about a minute.
Which, assuming the original script would have ever completed, is about a 300,000 times improvement in speed. Presumably that’s why the word MUCH is capitalized in the PHP manual!
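For anyone who hasn’t run into this, here is a rough sketch of the row-at-a-time pattern the manual recommends. Python’s built-in sqlite3 module stands in for PHP’s mysql extension, which has no direct Python counterpart to mysql_result(). The table name `postal_codes` and its column names are hypothetical. Each row comes back from the cursor as a single tuple and is unpacked by position. Nothing here goes back to the result set to look up an individual cell or a field name.

```python
import sqlite3
from typing import Iterator, Tuple

# Hypothetical record layout: (postal code, place name, province, latitude, longitude)
PostalPoint = Tuple[str, str, str, float, float]

# Hypothetical table and column names, for illustration only.
QUERY = (
    "SELECT postal_code, place_name, province, latitude, longitude "
    "FROM postal_codes ORDER BY postal_code"
)


def iter_points(conn: sqlite3.Connection) -> Iterator[PostalPoint]:
    """Yield one record per row: a single fetch per row, not one call per cell."""
    for row in conn.execute(QUERY):  # the cursor hands back each row as a tuple
        # Unpack by position (numeric offsets), in the order the SELECT lists the columns.
        code, place, province, lat, lon = row
        yield (code, place, province, float(lat), float(lon))
```

Because it is a generator, the loop never holds all 700,000 records in memory at once. In a script like mine, each tuple would be handed to the map-layer code as it arrives.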
Moral of the story: never make assumptions about where problems lie, never assume you know everything. And read the manual.
Here’s the Canadian Dept. of Foreign Affairs description of the safety and security of a country. Can you identify which country it is (without using Google)?
Crime occurrences are high in many… cities, but are generally concentrated in areas that travellers are unlikely to visit. Travellers, however, should remain vigilant and alert to their surroundings. Full cooperation is recommended when stopped by police. Street crime can spill over into commercial, hotel, and entertainment areas. Racial tensions and poverty occasionally prompt riots; these are usually confined to the poorer districts of major cities, but the violence can spread to central commercial and hotel districts.
Spectrum, the weekly Deutsche Welle science program, has an interesting piece about the use of iris scanners at Nine Zero, a Boston hotel where I’ve stayed before (albeit in the off-off season, when rates are cheap). You can listen here (it’s the second item in the programme, about 7 minutes in).
The piece was produced by Sharon Dempsey, who interviewed me last month at the DNC.
Let’s say you have a MapInfo polygon layer called “Regions,” and a point layer called “Points.” You can find out which Region each Point is inside using a MapInfo SQL query like:
SELECT Points.id,Regions.id from Points,Regions WHERE Points.Obj WITHIN Regions.Obj
But what if you want to do the opposite: get a list of all the Points that fall outside every Region (you might call them “orphans”)? You can do this:
SELECT Points.id from Points WHERE not Obj within any (SELECT Obj from Regions)
Cool.
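MapInfo does the geometry for you, but here is a Python sketch of what the two queries compute. It uses the standard even-odd (ray-casting) point-in-polygon test, which is not necessarily how MapInfo evaluates WITHIN. Points lying exactly on a region boundary may be classified differently. Regions are assumed to be simple polygons without holes, and the ids and coordinates in the usage lines are made up.

```python
from typing import Dict, Hashable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the closing edge back to the first vertex is implied


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd test: cast a ray from pt toward +x and count the edges it crosses."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also skips horizontal edges, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_within_regions(points: Dict[Hashable, Point],
                          regions: Dict[Hashable, Polygon]) -> List[Tuple[Hashable, Hashable]]:
    """Like: SELECT Points.id, Regions.id FROM Points, Regions WHERE Points.Obj WITHIN Regions.Obj"""
    return [(pid, rid)
            for pid, pt in points.items()
            for rid, poly in regions.items()
            if point_in_polygon(pt, poly)]


def orphan_points(points: Dict[Hashable, Point],
                  regions: Dict[Hashable, Polygon]) -> List[Hashable]:
    """Like: SELECT Points.id FROM Points WHERE not Obj within any (SELECT Obj from Regions)"""
    return [pid for pid, pt in points.items()
            if not any(point_in_polygon(pt, poly) for poly in regions.values())]
```

For example, with two square regions and a third point far away from both, `orphan_points` returns only the id of the distant point.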
Through our work with YANKEE, The Old Farmer’s Almanac and Elections PEI, we’ve had to learn the basics of GIS (Geographic Information Systems, or, more generally, “maps on computers”). And we’ve had to become adept at finding sources of free GIS data.
The United States is much more enlightened about the public use of government information, and so a lot of U.S. Government generated GIS data is freely available online. Ironically, it’s often possible to get Canadian data for free from the U.S. too. Three excellent starting points are:
- The Map Layers Warehouse of the U.S. National Atlas has a great selection of base layers, running from basics like roads and state lines to magnetic fields, zebra mussel infestations and time zones. I’m a particular fan of their Shaded Relief of North America. They make their data available for free download in ESRI Shapefile and SDTS formats.
- The AWIPS Map Database page, from the National Weather Service, is, as you would expect, weather-focused. But they do have some basic layers, like Canadian provincial and territorial boundaries, that aren’t available from the U.S. National Atlas.
- The Canadian GeoBase site is proof that Canada is catching up. In particular, I’ve found their National Road Network a useful resource (although, at least for PEI, it currently has no street and road names attached; these are coming in a later revision).
Put the data you can get from those three sites together with an open source mapping application like MapServer and you can create powerful mapping applications (like this and this) at no capital cost.
What’s missing from the mix is, alas, an open source GIS editing application — something along the lines of MapInfo or ArcView. There is GRASS, but I have found its complexity so insurmountable as to be almost useless. I’m sure it can be very powerful if you are willing to mount an assault on its learning curve, but that would take the kind of time and dedication I can’t afford.
Even without a GIS app to create new layers, however, you can still use MapServer with PHP or Perl to create new layers programmatically, both points and polygons. Although this isn’t “point and click,” it does make it possible to map your own data on top of others’ base layers.
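As a taste of what “programmatically” can mean, here is a minimal Python sketch that skips MapServer entirely. It writes the geometry files of a point shapefile directly from the byte layout in ESRI’s published Shapefile Technical Description. The function name and file paths are hypothetical, and this is not how MapScript does it. It handles points only and leaves out the .dbf attribute table, which is where postal code, place name and province would live. A shapefile that MapInfo or MapServer will open normally needs that .dbf as well.

```python
import struct
from typing import List, Tuple


def _header(file_length_words: int, bbox: Tuple[float, float, float, float]) -> bytes:
    """100-byte header shared by the .shp and .shx files."""
    xmin, ymin, xmax, ymax = bbox
    return (
        # File code 9994, five unused ints, file length in 16-bit words (all big-endian).
        struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_length_words)
        # Version 1000 and shape type 1 (Point), little-endian.
        + struct.pack("<2i", 1000, 1)
        # Bounding box, then Z and M ranges (unused for plain points), little-endian doubles.
        + struct.pack("<8d", xmin, ymin, xmax, ymax, 0.0, 0.0, 0.0, 0.0)
    )


def write_point_shapefile(basename: str, points: List[Tuple[float, float]]) -> None:
    """Write basename.shp and basename.shx for (x, y) points, e.g. (longitude, latitude)."""
    if not points:
        raise ValueError("need at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    record_words = 10  # record content: int shape type + two doubles = 20 bytes = 10 words
    shp_words = 50 + len(points) * (4 + record_words)  # header + (8-byte record header + content)
    shx_words = 50 + len(points) * 4  # header + one 8-byte index entry per record
    with open(basename + ".shp", "wb") as shp, open(basename + ".shx", "wb") as shx:
        shp.write(_header(shp_words, bbox))
        shx.write(_header(shx_words, bbox))
        offset = 50  # where the first record starts, in 16-bit words
        for number, (x, y) in enumerate(points, start=1):
            shp.write(struct.pack(">2i", number, record_words))  # record header, big-endian
            shp.write(struct.pack("<i2d", 1, x, y))              # Point content, little-endian
            shx.write(struct.pack(">2i", offset, record_words))  # index entry: offset, length
            offset += 4 + record_words
```

Usage would look like `write_point_shapefile("postal_codes", [(-63.1, 46.2), (-64.8, 46.1)])`, with longitude as x and latitude as y.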
I’m pretty convinced that things are heating up at the nexus of GIS, GPS, telephony and the web: these tools, and this data, can drive a lot of interesting experimentation in people’s basements.
My friend Stephen has a theory, mentioned here before, that people without children cannot understand what it is like to have children. In other words, all those “hey, you should have kids… it’s great” advertisements that we, the childed, beam out are effectively “blah blah blah children blah blah” to those without. Stephen would maintain, I think, that some sort of switch gets thrown as soon as you’re caring for children, and that after that switch gets thrown, nothing is ever the same.
I tend to agree. I certainly know that nothing I imagined beforehand about having a child bears any resemblance to what it’s actually like. Partly this means that none of the “we’ll never be able to have any fun again” paranoia panned out. And partly it’s that it isn’t possible to understand, in advance, the whole “I will jump in front of a train for you” kind of love that child rearing engenders almost instantly.
Put another way, there is a great divide of understanding between people with children and people without. I don’t mean to suggest that people who, through choice or circumstance, don’t rear children are lesser people, simply that there are certain things they can’t sense.
Which is all to say that, after reading this observation from Tom Peters about his own life circumstances, I’m left, mostly, saying “huh?”
I’m pretty sure that he’s describing something substantial and important. And I’m just as sure that from here on the outside of his life looking in, it’s almost completely impossible for me to understand anything about it.
Something has happened to Tom Peters, and as a result he’s standing on the other side of a great divide of understanding. I’m not sure whether he’s capable of communicating where he is to me, or whether I’m even capable of listening.
Last night’s episode of The Amazing Race was, I think, the weakest one I’ve seen. There were a number of things that made it so: the seemingly endless focus on ostrich egg eating, the unusually strong emphasis on intra-team conflict, and, most oddly, the elimination of Charla and Mirna that was compressed into an anti-climactic minute or two at the very end.
The pacing seemed off for the entire episode, and what might have received emphasis didn’t, and vice versa.
Charla and Mirna were, arguably, the stars of Amazing Race 5 to date, for they brought the most interesting relationship, the quickest thinking, and the most “rebounds from certain defeat” to the table. Last night their departure was handled so quickly that it was as if they simply disappeared.
There were some good parts to the episode.
Chip and Kim, probably the most positive players, and seemingly the only ones paying any attention to their surroundings, had a good experience in the “deliver a chair” challenge, and took enough time away from the crazy race pace to make a small connection with the local family they delivered the chair to.
The variety of experiences the teams had with hiring a minibus in Tanzania was also interesting: teams that acted like rich North Americans and waved their money around were met with hostility and delay; teams that went with the flow did better (if I followed the proceedings correctly, one team paid as little as $3 for a ride that cost another team upwards of $100).
Edward Hasbrouck, who provides weekly commentary on each episode, is on vacation until September 1, so we’ll not hear from him, I presume, on this or the next two episodes.
First, if we’re supposed to feel all warm and patriotic when Canadian athletes win medals at the Olympics, are we supposed to feel inadequate and useless when they come 11th? Or 43rd?
Second, how should we read this message on the CBC website:
Due to International Olympic Committee (IOC) restrictions regarding the online transmission of Olympic Games coverage, CBC.ca is prohibited from streaming any live or on-demand audio/video files that may include protected Olympic material.
Doesn’t this say, in effect, “all that superfluous local content that we play most of the time is less important than allowing the sporty people to watch divers trying to hit the water at the same time?”
I don’t mean to suggest that the CBC shouldn’t broadcast the Olympics — certainly they are of interest. But these “restrictions” about “protected Olympic material” mean that the thousands of people who use CBC’s streaming audio on the web are left in silence.
The CBC, I think, has made a deal with the devil.
Plasma has a branding problem.
You’ve got your plasma televisions.
And your plasma rays.
There are plasma gemstones.
And the more ominous sounding Plasma: Fourth State of Matter.
So when our friendly blood collectors across the street at 85 Fitzroy St. recruited me to become a plasma donor, it was hard not to have very Star Trek visions of what this might entail.
(By “recruited,” I mean “I walked in, after seeing the ‘Donor Clinic Today’ sign out front for the 2,000th time, and said ‘I’d like to give blood.’” Before I knew it, I’d been upsold to plasma.)
So in 20 minutes I’m heading across the street to plasmanate for the first time.
I’ll let you know how it goes.
Update: It all went fine. People were super nice, and the technology behind the “suck out his plasma and give him back his blood” process is nifty. It didn’t hurt. Lots of questions about sex with IV drug users and trips to the Congo, but that’s understandable. They’re all super-careful about double- and triple-checking everything. It was a good hour away from telephone, television, cell phone and other stresses. I’ll go again.