Ping Times Keep Rising

This has got to be one of the oddest problems I have ever come across.

We were setting up a dedicated line between a client site and ours. We got things the way we wanted on each side. The client was having trouble connecting to us, and as part of debugging, I tried to ping the client's remote server.

The ping did not fail, but it took a very long time for the first ping to come back. Normally, with a problem like this, you'd expect a timeout instead. Then the second ping came back and reported a time twice that of the first! The third, three times as long. And so on.

Well, this was really messed up. The ping time should be more or less constant, certainly not an arithmetic progression. I don't remember what all I tried. I do remember the next big piece of the puzzle. We hooked up a FreeBSD machine in place of my Linux server (the one with the weird pings), and it got perfectly normal pings.

I really beat my head against this one. Was Linux so broken it couldn't handle basic TCP/IP? (No.) Was there a bug in the client's firewall? Our firewall? A bug in the client's TCP/IP stack that was just handled differently by Linux and FreeBSD on our end?

Well, I finally figured it out. I'm not even sure how, but it involved a cross between desperation and intuition.

Linux's ping command (I think it was Red Hat 6.x) defaulted to displaying hostnames rather than IP addresses. Since I was pinging by IP, it was doing a reverse lookup on every reply. It turned out that the DNS server for that reverse lookup domain wasn't responding. (That was our client's problem.) Every single ping would try to do a reverse lookup. Since each lookup failed, the next ping didn't have the name it needed either, so it tried again. The pings themselves were taking the normal amount of time and going out back-to-back, less than a second apart, which is practically zero compared with the DNS timeouts. They weren't waiting for the reverse lookups at all. But each reverse lookup couldn't start until the previous one had failed.

So the times were effectively the length of the DNS timeout T, times the number of that particular ping, minus the practically zero gap between sends: T - 0 = 1T for the first, 2T - 2*0 = 2T for the second, and so on.
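To see why serialized lookups make the delays grow linearly, here is a toy simulation of the mechanism described above. It is a sketch under stated assumptions, not ping's actual code: each ping is sent a fixed interval apart, every reply takes the same round-trip time, every reverse lookup blocks for a fixed timeout before failing, and lookups run strictly one after another. The function name `simulate_apparent_delays` and the example numbers (a 1-second interval, a 50 ms round trip, a 30-second DNS timeout) are hypothetical. It measures the delay from sending each ping to its result being printed.

```python
def simulate_apparent_delays(count, interval_ms, rtt_ms, lookup_ms):
    """Return, for each ping, the delay (ms) from sending it to printing it.

    Hypothetical toy model, not real ping internals:
    - ping k (0-based) is sent at k * interval_ms,
    - its reply arrives rtt_ms later,
    - before printing, a reverse lookup blocks for lookup_ms (the DNS timeout),
    - only one lookup runs at a time.
    """
    delays = []
    previous_done = 0  # time at which the previous reverse lookup gave up
    for k in range(count):
        sent = k * interval_ms
        reply_arrives = sent + rtt_ms
        # This lookup cannot start until the reply is in AND the
        # previous lookup has finished timing out.
        lookup_start = max(reply_arrives, previous_done)
        previous_done = lookup_start + lookup_ms
        delays.append(previous_done - sent)
    return delays


if __name__ == "__main__":
    # Dead reverse-DNS server: each lookup waits out a 30 s timeout.
    print(simulate_apparent_delays(3, 1000, 50, 30000))  # [30050, 59050, 88050]
    # No reverse lookup at all: every ping shows the plain round trip.
    print(simulate_apparent_delays(3, 1000, 50, 0))  # [50, 50, 50]
```

With these assumed numbers the delays come out to roughly 30, 59 and 88 seconds, close to T, 2T and 3T, because the one-second gap between sends is small next to the timeout. Setting `lookup_ms` to 0, as if no name lookup were attempted, gives a flat 50 ms for every ping.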

I did a web search to figure out where to file a bug on this. Turns out, the powers that be had already looked at this issue, and decided that this was correct behaviour. I forget the reasoning, but at least...

On my current Linux system, if you ping by IP address rather than name, it doesn't try to look up the name.

