I’m writing a paper on error handling in large-scale software development. It’s taken me quite some time to get my head around some of the issues. I think some of the currently accepted wisdom isn’t wisdom at all, and the quality of error handling in most software seems to bear this out. Here’s one way of looking at the whole discussion of exceptions vs. error codes, checked vs. unchecked exceptions, and low-level vs. high-level errors:
There are three fundamentally different kinds of error handling in programs:
- ImmediateCaller: A subroutine foo calls subroutine bar, which detects an error. The errors bar can detect should be documented, and foo may be able to recover from or ignore some of them, depending on the specific kind of error. If it can’t, it will pass the failure upwards.
- Conclusion 1: You may need to document specific errors and supply additional information to the immediate caller.
- UpperLayer: Subroutine baz gets the error passed up by foo. It no longer knows about bar, so any documentation or specific error information produced by bar is no longer useful to it. But since the error didn’t originate in foo, neither is any additional information supplied by foo (see section 4.4 for some relaxations of this claim).
- Conclusion 2: The error information must support generic handling strategies, such as a blind retry.
- Conclusion 3: Low-level exceptions should not be remapped by upper layers.
- Human: The error cannot be corrected by the program, but must be corrected by a human. Since the error was detected by bar, bar has the most specific knowledge of it and must describe it.
- Conclusion 4: The routine detecting an error is responsible for describing it in terms understandable to a human, often the user.
It is not always the user who can fix the error; sometimes it is a system administrator or even the programmer.
- Conclusion 5: The system must be able to tell the user whether they should contact another human.
In that case the user is told “Contact your system administrator”. The error description produced by the low-level routine is not shown to the user, but must be forwarded to the administrator instead.
- Conclusion 6: The system must produce different messages for consumption by different humans.
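The three cases and six conclusions can be sketched in Python. This is a minimal illustration under my own assumptions: `AppError`, `Fixer`, `CacheMiss` and `messages` are hypothetical names, not part of any library, and `foo` and `bar` are stand-ins for the subroutines in the text. The detecting routine writes the human-readable description, the immediate caller handles only the specific error it knows about, and everything else propagates unchanged, carrying generic information (`retryable`, `fixer`) that upper layers can act on and that lets the system route each message to the right human.

```python
from enum import Enum
from typing import Dict, Optional


class Fixer(Enum):
    """Who can correct an error (Conclusion 5)."""
    USER = "user"
    ADMIN = "system administrator"
    PROGRAMMER = "programmer"


class AppError(Exception):
    """Hypothetical base class for errors that may reach upper layers and humans."""

    def __init__(self, description: str, fixer: Fixer, retryable: bool = False):
        super().__init__(description)
        self.description = description  # written by the detecting routine (Conclusion 4)
        self.fixer = fixer              # who can act on it (Conclusion 5)
        self.retryable = retryable      # generic hint for upper layers (Conclusion 2)


class CacheMiss(AppError):
    """A specific, documented error of bar that its immediate caller may handle."""


def bar(key: str, cache: Dict[str, str], config_path: Optional[str]) -> str:
    if config_path is None:
        # bar detected the problem, so bar describes it in human terms.
        raise AppError(
            f"No configuration file is set for '{key}'; "
            "add one to the service configuration.",
            Fixer.ADMIN,
        )
    if key not in cache:
        # If this ever escaped foo it would be a bug, hence PROGRAMMER.
        raise CacheMiss(f"'{key}' is not cached", Fixer.PROGRAMMER)
    return cache[key]


def foo(key: str, cache: Dict[str, str], config_path: Optional[str]) -> str:
    """ImmediateCaller: recovers from CacheMiss; any other error passes up unchanged."""
    try:
        return bar(key, cache, config_path)
    except CacheMiss:
        cache[key] = f"computed:{key}"  # recovery specific to this kind of error
        return cache[key]


def messages(err: AppError) -> Dict[str, str]:
    """Human: one message per audience (Conclusion 6)."""
    if err.fixer is Fixer.USER:
        return {"user": err.description}
    return {
        "user": f"Contact your {err.fixer.value}.",
        err.fixer.value: err.description,
    }


cache: Dict[str, str] = {}
print(foo("x", cache, "/etc/app.conf"))  # computed:x
try:
    foo("y", cache, None)
except AppError as err:
    print(messages(err)["user"])  # Contact your system administrator.
```

Note that `foo` never wraps or renames the configuration error, so the description written by `bar` survives intact until it is split into a user message and an administrator message.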
If these three radically different ways of dealing with errors are not kept separate, confusion ensues, and most current arguments in the developer community can be seen as resulting from this confusion. The debate over the relative merits of checked vs. unchecked exceptions, for example, can be seen as a failure to distinguish the different requirements of ImmediateCaller and UpperLayer: from the point of view of ImmediateCaller it is really great to know exactly what might go wrong with the function it is calling, but if it can do nothing about an error, that specific information is of no use to UpperLayer. And since ImmediateCaller is not supposed to remap, it cannot write a reasonable exception specification for its own methods: it would have to list every specific error of everything it calls.
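Python has no checked exceptions, so the following sketch illustrates only the handling side of this argument: an upper layer can apply a generic strategy such as a blind retry without declaring, naming or remapping any of the low-level error types. `AppError`, `TransientNetworkError`, `baz` and its `attempts`/`delay` parameters are hypothetical, chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AppError(Exception):
    """Hypothetical base class: upper layers look only at its generic attributes."""

    def __init__(self, description: str, retryable: bool = False):
        super().__init__(description)
        self.retryable = retryable


class TransientNetworkError(AppError):
    """A specific low-level error; baz never mentions this type."""

    def __init__(self, description: str):
        super().__init__(description, retryable=True)


def baz(operation: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """UpperLayer: a blind retry driven only by the generic 'retryable' flag."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except AppError as err:
            if not err.retryable or attempt == attempts:
                raise  # re-raise the original error unchanged: no remapping
            time.sleep(delay)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientNetworkError("connection reset")
    return "payload"


print(baz(flaky_fetch), calls["n"])  # payload 3
```

The only contract between `baz` and the layers below is the `AppError` base class and its generic attributes, which is a much smaller surface than a list of every specific error.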
It should be noted that only in a minority of cases can ImmediateCaller do anything useful. C#’s chief architect, Anders Hejlsberg, seems to agree with this, saying that catches should be rare.
Error codes as error information (as returned by most system calls in Unix, VMS or Win32) can be adequate for ImmediateCaller, but are often insufficient for UpperLayer and completely useless for the Human.
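To make the contrast concrete, here is a sketch that mimics a C-style error-code API in Python by turning `OSError` into its `errno` value. The functions `read_settings`, `settings_or_defaults` and `message_for_human` are hypothetical; `errno` and `os.strerror` are standard-library facilities. A specific code lets the immediate caller recover, but an upper layer receives only an integer that says nothing about whether a retry might help, and the generic text for a human names neither the file nor who should fix it.

```python
import errno
import os
from typing import Optional, Tuple


def read_settings(path: str) -> Tuple[int, Optional[str]]:
    """Hypothetical error-code style API: returns (code, text); code 0 means success."""
    try:
        with open(path, encoding="utf-8") as f:
            return 0, f.read()
    except OSError as exc:
        return (exc.errno or errno.EIO), None


def settings_or_defaults(path: str) -> str:
    """ImmediateCaller: comparing against a specific code is enough to recover."""
    code, text = read_settings(path)
    if code == errno.ENOENT:
        return "defaults"  # a missing file is expected and harmless here
    if code != 0:
        # An upper layer receiving this learns only a number; whether a retry
        # could help is not encoded anywhere.
        raise RuntimeError(f"read_settings failed with code {code}")
    return text or ""


def message_for_human(code: int) -> str:
    """All a bare code can offer a human: a generic, context-free string."""
    return os.strerror(code)


missing = "/no/such/dir/settings.conf"
code, _ = read_settings(missing)
print(settings_or_defaults(missing))  # defaults
print(message_for_human(code))  # names neither the file nor who should fix it
```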