Daniel was tracking down what appeared to be a networking problem…
- The server reported error 113 (No route to host).
- However, an strace did not reveal the networking stack ever returning that.
- On the other side, IP packets were actually received.
- When confronted with mysteries like this, I get suspicious – mainly of (fellow) programmers.
- I suggested a grep through the source code, which revealed a hard-coded line in the code itself: return -EHOSTUNREACH;
- Mystery solved, which allowed us to find what was actually going on.
- Don’t just believe or presume the supposed origin of an error.
- Programmers often take shortcuts that cause grief later. I fully appreciate how the above code came about, but I still think it was wrong. Mapping a “similar” situation onto an existing error code is convenient, but when an error occurs, the most important thing is that people can track down the root cause. Reporting this error outside its original context (an error code reported by the network stack) is clearly unhelpful: it misdirects people and makes them waste time tracking it down, as happened above.
- Hooray once again for Open Source, which makes it so much easier to figure these things out. While possibly briefly embarrassing for the programmer, more eyes allow code to improve better and faster and, perhaps, also encourage better coding practices from the outset (I can hope!).
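To make the point about borrowed error codes concrete, here is a minimal sketch in C. It is not the code Daniel found. The condition (a peer the program does not know about) and every name in it (lookup_peer_misleading, lookup_peer_explicit, describe_error, APP_ERR_PEER_NOT_REGISTERED) are hypothetical and made up for illustration. It only contrasts reusing a network-stack errno with returning a code that names the real cause.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-level error codes. The value 1000 is an
 * assumption: it is picked so that it should not collide with any
 * errno value on the target platform. */
enum app_error {
    APP_OK = 0,
    APP_ERR_PEER_NOT_REGISTERED = 1000
};

/* The shortcut: borrow a network-stack errno for an application-level
 * condition. Whoever sees the result reads "No route to host" and goes
 * looking at routing tables and firewalls. */
int lookup_peer_misleading(int peer_known)
{
    if (!peer_known)
        return -EHOSTUNREACH;
    return 0;
}

/* The alternative: return a code that identifies the actual cause. */
int lookup_peer_explicit(int peer_known)
{
    if (!peer_known)
        return -APP_ERR_PEER_NOT_REGISTERED;
    return 0;
}

/* Turn a negative return value into a message that also says where the
 * error came from: the application itself, or the system (errno). */
const char *describe_error(int rc, char *buf, size_t len)
{
    if (rc == 0)
        snprintf(buf, len, "ok");
    else if (rc == -APP_ERR_PEER_NOT_REGISTERED)
        snprintf(buf, len, "application: peer not registered");
    else
        snprintf(buf, len, "system: %s", strerror(-rc));
    return buf;
}
```

With the first variant, the message is indistinguishable from one the network stack would have produced, which is exactly the trap described above. With the second, the message points straight at the application, so nobody needs strace or grep to find out who raised it.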
What do you think?