Improving exception-management in OCaml

July 2, 2008 § 7 Comments

Short version

Catch me if you can is a small library for OCaml 3.10. The latest release is version 0.2, which you may find here. This library improves management of errors in OCaml. It is released under the LGPL licence. It has been written by David Teller, Arnaud Spiwack, Till Varoquaux and Gabriel Scherer.

Long version

As all languages of the ML family — and most modern languages indeed — OCaml permits the management of exceptional situations using exceptions. This mechanism lets programmer register protected sections of code, as well as exception handlers to handle any exception which may be raised during the execution of a protected section. Whenever an exception is raised, the protected section of code is immediately stopped and the corresponding exception handler is executed instead. In addition, exceptions may convey some information regarding the nature of the exceptional circumstance.

In OCaml, the mechanism is fast, it’s convenient and it’s type-safe, much like the rest of the language (barring any type-unsafe interaction with C). However, a few things are missing. If we consider the rest of the language, exceptions are both heavyweight and clumsy: each exception must be declared before being used and there’s no way to introduce a polymorphic type parameter in the exception. In addition, languages such as Java offer to important features missing in OCaml: automatic case coverage and exception hierarchies. While a nice tool exists to provide case coverage for exceptions in OCaml, this tool is complex and unfortunately unmaintained.

Catch me if you can offers an alternative mechanism, comparable to ML exceptions, to handle errors. In comparison with OCaml’s native exception mechanism, this library adds:

automatic inference of exceptions (i.e. no need to declare your exceptions, unless you want to)
more flexible exceptions (i.e. exceptions may have polymorphic type parameters, constraints, etc.)
hierarchies (i.e. an IOException is a sub-case of Exception and a super-case of NetworkException)
case coverage (i.e. the compiler can tell you if you forgot a case or sometimes if you wrote useless ones)
conditional success handlers (i.e. do something with the result in case of success)
conditional success-and-failure handlers (i.e. “finally”).

To attain this, we replace the mechanism of exceptions by an error monad, we replace exception constructors with polymorphic variants and we introduce a dose of syntactic sugar.

Examples

Expression evaluator

Let’s write a simple expression evaluator for the following set of expressions:

type expr =
  | Value of float
  | Div     of expr * expr
  | Add    of expr * expr
  | Mult   of expr * expr
  | Subs  of expr * expr

These may be evaluated using the following function:

let rec eval = function
 | Value x    -> x
 | Add (x,y) -> eval x +. eval y
 | Mult(x,y)  -> eval x *. eval y
 | Div(x,y)   -> eval x /. eval y
 | Subs(x,y) -> eval x -. eval y

Of course, this function is bound to fail in case of division by zero. While this is expected, there is nothing in the source code — much less in the type of the function — to let us know which exception will be raised in case of division by zero.

An alternative would be to add manual error checking, as follows:

type ('a, 'b) result =
 | Ok of 'a
 | Error of 'b

let rec eval = function
 | Value x    -> OK x
 | Add (x,y) -> (match eval x with
                      | Error e -> Error e
                      | Ok x' ->  match eval y with
                               | Error e -> Error e
                               | Ok y'    -> Ok (x' +. y'))
 | Mult (x,y) -> (match eval x with
                      | Error e -> Error e
                      | Ok x' ->  match eval y with
                               | Error e -> Error e
                               | Ok y'    -> Ok (x' *. y'))
 | Div (x,y) -> (match eval x with
                      | Error e -> Error e
                      | Ok x' ->  match eval y with
                               | Error e -> Error e
                               | Ok y'    -> if y' = 0. then Error "Division by zero"
                                                else              Ok (x' /. y'))
 | Subs (x,y) -> (match eval x with
                      | Error e -> Error e
                      | Ok x' ->  match eval y with
                               | Error e -> Error e
                               | Ok y'    -> Ok (x' -. y'))
(*eval : expr -> (float, string) result*)

After this transformation, the type of the exception appears in the type of eval — here, we used strings, but anything else would have been fine. Of course, the downside is that this is unreadable. Well, what about the following ?

let rec eval = function
 | Value x    -> return x
 | Add (x,y) -> perform with module Error
                        x' <-- eval x;
                        y' <-- eval y;
                        return (x' +. y')
 | Mult (x,y) -> perform with module Error
                        x' <-- eval x;
                        y' <-- eval y;
                        return (x' *. y')
 | Div (x,y) -> perform with module Error
                        x' <-- eval x;
                        y' <-- eval y;
                        if y'=0. then throw "Division by zero"
                        else            return (x' /. y')
 | Subs (x,y) -> perform with module Error
                        x' <-- eval x;
                        y' <-- eval y;
                        return (x' -. y')
(*eval : expr -> (float, string) result*)

This extract uses [our customized version of] Pa_monad (included in the package), which brings syntactic support for monads. While this is more verbose than the original version, it’s also safer, insofar as we can guarantee that exceptions won’t remain uncaught.

Still too long? Then what about using the appropriate operators?

open Error.Operators
let rec eval = function
 | Value x    -> x
 | Add (x,y) -> eval x +. eval y
 | Mult(x,y) -> eval x *. eval y
 | Div(x,y)   -> eval x /. eval y
 | Subs(x,y) -> eval x -. eval y

Except for the module opening, that’s the same thing as our first listing. Just with the added safety.

Throwing, catching and hierarchies

By the way, the type of the result is

(*eval : expr -> (float, [> `Arithmetic of (unit, [> `Div_by_zero of (unit, _) exc ]) exc ]) result*)

That is, eval may either succeed and return a float or fail and return an arithmetic exception, which also turns out to be a division by zero. That’s classes of exceptions.

With our syntactic sugar, raising such an exception is done by

throw (exception Arithmetic (); Div_by_zero ())

Note that we could have put some content instead of (). Note that exceptions are typed as they appear in the code and don’t need to be declared (if you wonder, polymorphic variants are involved in this).

Of course, various kinds of exceptions may be combined, as in the following extract:

match ... with
| 1 -> throw (exception Arithmetic (); Div_by_zero ())
| 2 -> throw (exception Arithmetic (); Overflow "by gosh !")
| 3 -> throw (exception IO file_descr)
| ...
(*
('a, [> `Arithmetic of (unit, [> `Div_by_zero of (unit, _) exc
                                     |   `Overflow of (string, _)   exc]) exc
     |   `IO of (int, [`> ]) exc ]) result*)
*)

While the type of the expression is difficult to read, catching is easy

attempt ... with
 | val s -> (*success*)
 | Arithmetic (); Div_by_zero () -> (*Division by zero*)
 | Arithmetic (); _                   -> (*Other arithmetic*)
 | IO _                                  -> (*Some IO stuff *)
 | finally _                             -> (*Don't forget to close the door*)

This extract introduces three keywords:

attempt is our replacement for try
val is used to pattern-match against the result of a successful evaluation
finally is used to pattern-match against the final result, whether this result was obtained after a successful evaluation or after an exception was raised and handled.

Unbreaking tail-recursion

The following extract is wrong:

let line_count filename =
  let rec loop file count =
  try
    ignore (input_line file);
    loop file (count + 1)
  with
    End_of_file -> count
  in
    loop (open_file filename) 0

Don’t get me wrong, it will compile and run. The problem is that it’s not tail-recursive. In other words, it will be much slower and much more memory-consuming than if exceptions had been ignored. Why ? Because exception End_of_file may have been raised from the next call to loop, so the recursive call cannot be optimized into a non-recursive call. Of course, exceptions can’t be ignored in this extract, as they are required to determine when to stop reading the file. Now, a simple transformation would make the problem go away :

let line_count filename =
  let rec loop file count =
    let should_continue =
    try
      ignore (input_line file);
      true
    with End_of_file -> false
  in
    if should_continue then loop file (count + 1)
    else                    count
  in
    loop (open_file filename) 0

Well, the transformation is simple, but it’s annoying and hard to read. What’s even more annoying is that it’s quite common. With Catch me if you can, we would rather write the following:

let input_line2 x = Error.legacy input_line x

let line_count filename =
  let rec loop file count =
    attempt input_line2 file with
      | val _ -> loop file (count + 1)
      | _     -> count
  in
  loop (open_file filename) 0

In this extract, legacy is a simple manner of wrapping an existing, one-argument, function and convert it to our new exception style. It’s not quite as good as wrapping the function manually and giving it an actual exception, but it’s better than nothing.

All in all, the resulting function line_count is shorter, easier to read, takes less memory and is also faster than the original.

What about performance?

Now, that’s a complex question. Short answer: there’s an acceptable performance loss. False short answer: we wrote a paper on that subject.

Tagged: camlp4, catch, code generation, code rewriting, compiler support, coverage, error, error management, exceptional situation, exceptions, Functional Programming, guarantees, Java, languages, ml, monad, OCaml, performance, preprocessor, Programmation Fonctionnelle, programming, programming languages, rewriting, sml, static analysis, syntactic sugar, try, type-safe, typed, types

§ 7 Responses to Improving exception-management in OCaml

Bartek Ćwikłowski says:

July 3, 2008 at 1:19 am

great stuff! I just have one question: is it possible to write pure (and by pure I mean without using built-in exceptions anywhere in implementation) “catch” function, which would infer appropriate type with one less class of exceptions possible?

Reply
yoric says:

July 3, 2008 at 5:52 am

If there’s a way, I haven’t found it: polymorphic variants, as implemented in OCaml, do not support substraction, i.e. there is no way of expressing type
```
[`Foo of stuff | 'a] -> ['a]
```
.

Reply
Sandro says:

July 4, 2008 at 9:40 pm

I knew an exhaustive statically checked, yet extensible technique would eventually be found for exception handling. I’ve been looking for something like this for a language I’m toying with.

I imagine there are also more optimizations to make this more efficient, ie. monomorphization and more aggressive inlining.

Given monadic encodings of effects and type and effect systems are equivalent, I wonder if extending the language with a throw construct with an effect UNWIND that unwinds the stack to a marker, while passing a variant, could be used to provide native exceptions with these same semantics.

I also looked at coroutines to implement exception handling, and while the memory overhead of additional stacks is a downside, the performance will likely be better than all the thunking and indirection of a naive monadic approach.

I look forward to reading more on this work though!

Reply
yoric says:

July 4, 2008 at 10:41 pm

I imagine there are also more optimizations to make this more efficient, ie. monomorphization and more aggressive inlining.

In this context, I believe that monomorphization would offer very little in terms of performance improvement. As you may have seen in the paper, we already proceed to somewhat aggressive inlining, which leads to vast improvements in some circumstances, minor or non-existent in others.

Given monadic encodings of effects and type and effect systems are equivalent, I wonder if extending the language with a throw construct with an effect UNWIND that unwinds the stack to a marker, while passing a variant, could be used to provide native exceptions with these same semantics.

I’m not sure I’d claim that types and effects and monads are strictly equivalent (i.e. types and effects could presumably apply to complex algebras, while monads are limited by the existing type system). However, yes, all this monadic stuff would become irrelevant in presence of a rich enough types and effects system. That’s something I intend to study, whenever time permits.

I also looked at coroutines to implement exception handling, and while the memory overhead of additional stacks is a downside, the performance will likely be better than all the thunking and indirection of a naive monadic approach.

Interesting. Have you written down something on the subject?

Reply
Sandro says:

July 5, 2008 at 4:15 am

> I’m not sure I’d claim that types and effects and monads are strictly equivalent

Well, I meant the traditional single-effect function type and effect systems. There is a direct correspondence between them [1].

>However, yes, all this monadic stuff would become irrelevant in presence of a rich enough types and effects system. That’s something I intend to study, whenever time permits.

I’m only just starting to grasp type and effects. Is there existing literature on using type and effects with exception handling?

> Interesting. Have you written down something on the [coroutines]?

It’s mainly informal reasoning based on the literature on one-shot continuations and coroutine implementations. Consider a naive implementation based on stack-switching and setjmp/longjmp. In this case, a coroutine is basically a small stack whose top is the jmpbuf structure.

The cost of entering a try-catch block is the allocation of a new stack, a register-save-and-restore, and an indirect return. The new stack benefits from bulk allocation/deallocation ala regions, so the cost is low (one or two pointer updates). Register-save isn’t too bad either. The indirect-return can be costly on current hardware, and probably can’t be helped without some sort of analysis to determine whether the try block is scoped to the enclosing function. This would reduce a coroutine switch to a native exception throw.

Contrast with a monadic implementation, where a number of small thunks are created and must be GC’d, and a number of indirect calls are made, many of which can’t be properly predicted. Your own measurements seem to support these costs.

I had hoped that some aggressive inlining/monomorphization/fusion techniques could eliminate all of the allocations and the indirect calls, which I would think would have brought this type of exception handling close to native exception performance, but perhaps my mental model is missing some costs which aren’t amenable to these optimizations.

That said, I think it’s interesting how far core ML can be pushed with only small orthogonal extensions, like polymorphic variants, instead of adding specific extensions like exceptions.

[1] http://lambda-the-ultimate.org/node/2551

Reply
Paolo Donadeo says:

July 19, 2008 at 5:19 pm

Very interesting work, my compliments for the article too.

Reply
yoric says:

July 19, 2008 at 6:34 pm

Thanks. Unfortunately, the paper was just rejected.

Reply

Il y a du thé renversé au bord de la table