On Error handling

I’m writing a paper on error handling in large-scale software development. It’s taken me quite some time to get my head around some of the issues. I think some of the currently accepted wisdom isn’t, and the quality of error handling in most software seems to bear this out. Here’s one way of looking at the whole discussion of exceptions vs. error codes, checked vs. unchecked exceptions, and low-level vs. high-level errors:

There are three really different kinds of error handling in programs, which I’ll call ImmediateCaller, UpperLayer and Human:

ImmediateCaller: A subroutine foo calls a subroutine bar, which detects an error. The errors bar can detect should be documented, and foo may be able to recover from or ignore some of them, depending on the specific kind of error. If it can’t, it’ll pass the failure upwards.

  • Conclusion 1 You may need to document specific errors and supply additional information to the immediate caller.
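
A minimal Python sketch of case ImmediateCaller. The names (`UserNotFound`, the in-memory `_USERS` table) are hypothetical and chosen only for illustration: bar documents two errors, and foo recovers from the one it can do something about and lets the other pass upwards untouched.

```python
class UserNotFound(Exception):
    """Documented error: the requested user does not exist."""

    def __init__(self, name):
        super().__init__(f"no user named {name!r}")
        self.name = name  # extra information for the immediate caller


# Hypothetical in-memory store standing in for a real user database.
_USERS = {"alice": "Alice Liddell"}


def bar(name):
    """Return the full name of user `name`.

    Raises:
        TypeError: if `name` is not a string.
        UserNotFound: if there is no such user.
    """
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    if name not in _USERS:
        raise UserNotFound(name)
    return _USERS[name]


def foo(name):
    """ImmediateCaller: handles the one documented error it can do something about."""
    try:
        full_name = bar(name)
    except UserNotFound:
        full_name = "guest"  # recovery: unknown users are greeted as guests
    # TypeError (and anything else) is not caught here: it passes upwards unchanged.
    return f"Hello, {full_name}!"
```

Here `foo("bob")` returns "Hello, guest!", while `foo(42)` lets bar's `TypeError` reach foo's own caller.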

UpperLayer: Subroutine baz receives the error passed up by foo. It knows nothing about bar, so any documentation or specific error information produced by bar is of no use to it. And since the error did not originate in foo, neither is any additional information supplied by foo (see section 4.4 for some relaxations of this claim).

  • Conclusion 2 The error information must support generic handling strategies, like a blind retry.
  • Conclusion 3 Low-level exceptions should not be remapped by upper layers.
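
One possible shape of such a generic strategy, sketched in Python: a blind retry that knows nothing about bar or its errors. `retry` and its parameters are hypothetical, not an existing library API. In line with Conclusion 3, the last failure is re-raised as-is rather than remapped.

```python
import time


def retry(operation, attempts=3, delay=0.1):
    """Generic UpperLayer strategy: blindly retry `operation`.

    The wrapper does not know which low-level errors can occur. If every
    attempt fails, the last exception is re-raised unchanged (same type,
    same message, same attributes), so nothing is remapped.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # re-raise the original exception as-is
            time.sleep(delay)  # wait a little before the next attempt
```

A blind retry is only sensible for operations that are safe to repeat; the point is that the wrapper needs no knowledge of the specific low-level errors in order to apply it.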

H: The error cannot be corrected by the program; it must be corrected by a human. Since the error was detected by bar, bar has the most specific knowledge of it and must describe it.

  • Conclusion 4 The routine detecting an error is responsible for describing it in terms understandable to a human, often the user.
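
A small Python sketch of Conclusion 4, assuming a hypothetical configuration parser: `parse_port` plays the role of bar and writes the human-readable description where the details (file, value, valid range) are known, while the top level only passes that text on.

```python
class ConfigError(Exception):
    """A configuration problem, described for a human by the routine that found it."""


def parse_port(text, source):
    """bar: parse a port number; on failure, explain the problem in human terms.

    Only this routine knows which file and which value were involved,
    so it writes the description itself.
    """
    hint = "Please enter a whole number between 1 and 65535."
    if not text.strip().isdecimal():
        raise ConfigError(f"In {source}: the port {text!r} is not a number. {hint}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ConfigError(f"In {source}: the port {port} is out of range. {hint}")
    return port


def run_and_report(operation):
    """Top level: knows nothing about parse_port; it only passes the description on."""
    try:
        return f"OK: {operation()}"
    except Exception as e:
        return f"Error: {e}"
```

For example, `run_and_report(lambda: parse_port("eighty", "app.conf"))` returns a message beginning with "Error: In app.conf: the port 'eighty' is not a number.", although the top level knows nothing about ports.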

The user is not always the one who can fix the error; sometimes it is a system administrator or even the programmer.

  • Conclusion 5 The system must be able to tell the user if they should contact another human.

In that case the user is told “Contact your system administrator.” The error description given by the low-level routine is not shown to the user, but must be forwarded to the administrator instead.

  • Conclusion 6 The system must produce different messages for consumption by different humans.
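
A Python sketch of Conclusions 5 and 6, under an assumed (hypothetical) scheme in which each failure records who can fix it. The routine that detects the error supplies the detail; `report` decides whether the user sees that detail or only a pointer to another human, in which case the detail is forwarded to that human instead.

```python
class Failure(Exception):
    """An error only a human can fix.

    detail: description written by the routine that detected the error
    who:    who can fix it: "user", "admin" or "programmer" (hypothetical scheme)
    """

    def __init__(self, detail, who="user"):
        super().__init__(detail)
        self.detail = detail
        self.who = who


# What the user is told when someone else has to act.
CONTACT = {
    "admin": "Contact your system administrator.",
    "programmer": "An internal error occurred. Please report it to the developers.",
}


def report(error, channels):
    """Return the message for the user and forward details to whoever can fix it.

    `channels` maps "admin"/"programmer" to a list standing in for a real
    channel such as a log file, e-mail or a bug tracker.
    """
    if error.who == "user":
        return error.detail  # the user can act on the description directly
    channels[error.who].append(error.detail)  # specifics go to the right human
    return CONTACT[error.who]  # the user only learns whom to contact
```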

If these three radically different ways of dealing with errors are not kept separate, confusion ensues, and most current arguments in the developer community can be seen as resulting from this confusion. The debate over the relative merits of checked vs. unchecked exceptions, for example, can be seen as a failure to separate the different requirements of cases ImmediateCaller and UpperLayer: from the point of view of ImmediateCaller it is really great to know exactly what might go wrong with the function it is calling, but if there is nothing it can do about an error, that knowledge is of no use to UpperLayer. And since ImmediateCaller is not supposed to remap errors, it cannot write a reasonable exception specification for its own methods.

Note that ImmediateCaller can do something useful only in a minority of cases. C#’s chief architect Anders Hejlsberg seems to agree, saying that catch blocks should be rare.

Error codes as error information (as returned by most system calls on Unix, VMS or Win32) can be adequate for ImmediateCaller, but are often insufficient for UpperLayer and completely useless for the Human.
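
To illustrate, a Python sketch using a hypothetical C-style wrapper, `read_file_rc`, that returns an errno value instead of raising. The immediate caller can compare the code with `errno.ENOENT` and recover; once the bare number has travelled upwards, `os.strerror` can only turn it into generic text such as "No such file or directory", without saying which file was involved or what to do about it.

```python
import errno
import os


def read_file_rc(path):
    """Hypothetical C-style wrapper: returns (error_code, data); 0 means success."""
    try:
        with open(path, "rb") as f:
            return 0, f.read()
    except OSError as e:
        return e.errno, None


def load_settings(path):
    """ImmediateCaller: the code is enough to recognise 'no settings file yet'."""
    code, data = read_file_rc(path)
    if code == errno.ENOENT:
        return 0, b""  # recovered: a missing file means default settings
    return code, data  # any other code is passed upward as a bare number


def describe(code):
    """UpperLayer/Human: only a number and a generic text are left.

    The path, the operation and what to do about it were lost on the way up.
    """
    return f"error {code}: {os.strerror(code)}"
```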

This entry was posted in CS.

2 Responses to On Error handling

  1. simo says:

    This sentence didn’t really parse; could you please explain more: “The discussion of the relative merits of checked exceptions can be seen as mixing cases ImmediateCaller and UpperLayer.”

  2. Mika says:

    Hmm. I rewrote some of that. The idea is that when you are considering what ImmediateCaller might do, you want to document all the interesting error conditions. To make that documentation as specific as possible, you may consider using checked exceptions, as they make the compiler verify your documentation.

    Now the problem is that well-designed APIs do not have that many interesting specific error conditions (e.g., you should not try to open a file, catch the file-not-found error and then create the file; you should use an OpenIfExistsOtherwiseCreate call instead, O_CREAT in Unix-speak). Many things can go wrong that merit their own error codes, descriptions or exception classes, but even ImmediateCaller is not interested in most of them.
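
    In Python, for instance, the open-or-create call can be sketched with `os.open` and `os.O_CREAT` (`open_or_create` is a hypothetical helper name). The pattern to avoid would be opening, catching `FileNotFoundError` and then creating: apart from being an extra error path, the gap between the failed open and the create is a race window, and creating with a truncating mode such as `"w"` could wipe a file another process created in the meantime.

```python
import os
import tempfile


def open_or_create(path):
    """Open `path` for reading and writing, creating it if it does not exist.

    A single os.open call with O_CREAT: there is no file-not-found error to
    catch, and an existing file is opened as-is (no O_TRUNC, so its contents
    are kept).
    """
    return os.open(path, os.O_RDWR | os.O_CREAT, 0o644)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "counter.txt")
        fd = open_or_create(p)  # file did not exist: it is created
        os.write(fd, b"1")
        os.close(fd)
        fd = open_or_create(p)  # file exists: it is opened, contents kept
        print(os.read(fd, 1))  # b'1'
        os.close(fd)
```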

    Once ImmediateCaller is out of the picture, the need for specific exceptions goes away, as UpperLayer won’t be able to do anything reasonable with that specific information. Most of the time the operation simply fails or succeeds, and you either roll back or continue. So having to either list all those low-level errors in your exception specification or map them to something else becomes an unnecessary burden with no benefits, especially since mapping them risks losing information.
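
    A sketch of that “roll back or continue” handling in Python, assuming each step of an operation comes with a hypothetical undo function. `run_all_or_nothing` does not need to know which low-level error occurred, and it re-raises the original exception instead of mapping it to something else.

```python
def run_all_or_nothing(steps):
    """Run (do, undo) pairs in order; on any failure, undo the finished steps.

    The handler does not care which low-level error occurred: the operation
    either succeeds or fails as a whole. The original exception reaches the
    caller unchanged instead of being mapped to a new type.
    (Sketch only: an undo that itself fails is not handled here.)
    """
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)  # remember how to reverse this step
    except Exception:
        for undo in reversed(done):  # roll back in reverse order
            undo()
        raise  # re-raise the original, unmapped exception
```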