Looking back over my wide and varied experience of technical meltdowns, I find myself realizing that when things go wrong, it’s often the DNS that’s at fault.
Of course it never seems like it’s the DNS that’s at fault, so I spend a lot of time debugging lots of other things. But then, once everything else is eliminated as a possible cause, it turns out that faulty DNS is the root of the problem.
Today, for example, I spent a lot of time helping a client figure out why staff on their internal LAN couldn’t access an administrative application on the offsite webserver.
There was nothing obvious on the server itself, nothing on the firewall on either end, and access from everywhere else on earth seemed just fine.
The problem was tricky to nail down because it would come and go. Sometimes access for one person would be instant, while the person at the next desk would be plagued with constant timeouts.
Things got unplugged, re-plugged, examined and sorted through.
It wasn’t until I compared the output of the excellent Charles tool for one of my (perfectly normal) sessions with the output for one of my client’s (constantly timing out) sessions that the problem became obvious: they were hitting the wrong IP address.
It turns out that the DNS server they have set up on their LAN had a duplicate, incorrect entry for the webserver in place. So a DNS query on the LAN would return two IP addresses: one correct, and one that led nowhere.
So some staff were getting the right IP address the first time and going to town, while others were getting the wrong IP address first and then either getting no access at all, or timing out on the dead address and eventually getting through after 20 seconds or so.
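A quick sketch of the kind of check that would have caught this sooner: instead of trusting whichever address a browser happens to try first, ask the resolver for every address a name maps to. The hostname below is a stand-in, not the client's real webserver name.

```python
import socket

def resolve_all(hostname):
    """Return every unique IP address DNS reports for a hostname."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# On the client's LAN, their webserver's name would have shown two
# addresses here -- one correct, one leading nowhere.
print(resolve_all("localhost"))
```

If a name that should map to a single server comes back with two addresses, you have found your intermittent timeout.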
We’ve just had the errant entry removed from the DNS, and so far things are swimming along like there was no tomorrow and everyone is happy.
Memories of the hurricane and DNS flood back.
It’s always the DNS…