Improving exception-management in OCaml
July 2, 2008 § 7 Comments
Catch me if you can is a small library for OCaml 3.10. The latest release is version 0.2, which you may find here. This library improves management of errors in OCaml. It is released under the LGPL licence. It has been written by David Teller, Arnaud Spiwack, Till Varoquaux and Gabriel Scherer.
As all languages of the ML family — and most modern languages indeed — OCaml permits the management of exceptional situations using exceptions. This mechanism lets programmer register protected sections of code, as well as exception handlers to handle any exception which may be raised during the execution of a protected section. Whenever an exception is raised, the protected section of code is immediately stopped and the corresponding exception handler is executed instead. In addition, exceptions may convey some information regarding the nature of the exceptional circumstance.
In OCaml, the mechanism is fast, it’s convenient and it’s type-safe, much like the rest of the language (barring any type-unsafe interaction with C). However, a few things are missing. If we consider the rest of the language, exceptions are both heavyweight and clumsy: each exception must be declared before being used and there’s no way to introduce a polymorphic type parameter in the exception. In addition, languages such as Java offer to important features missing in OCaml: automatic case coverage and exception hierarchies. While a nice tool exists to provide case coverage for exceptions in OCaml, this tool is complex and unfortunately unmaintained.
Catch me if you can offers an alternative mechanism, comparable to ML exceptions, to handle errors. In comparison with OCaml’s native exception mechanism, this library adds:
- automatic inference of exceptions (i.e. no need to declare your exceptions, unless you want to)
- more flexible exceptions (i.e. exceptions may have polymorphic type parameters, constraints, etc.)
- hierarchies (i.e. an IOException is a sub-case of Exception and a super-case of NetworkException)
- case coverage (i.e. the compiler can tell you if you forgot a case or sometimes if you wrote useless ones)
- conditional success handlers (i.e. do something with the result in case of success)
- conditional success-and-failure handlers (i.e. “finally”).
To attain this, we replace the mechanism of exceptions by an error monad, we replace exception constructors with polymorphic variants and we introduce a dose of syntactic sugar.
Let’s write a simple expression evaluator for the following set of expressions:
type expr = | Value of float | Div of expr * expr | Add of expr * expr | Mult of expr * expr | Subs of expr * expr
These may be evaluated using the following function:
let rec eval = function | Value x -> x | Add (x,y) -> eval x +. eval y | Mult(x,y) -> eval x *. eval y | Div(x,y) -> eval x /. eval y | Subs(x,y) -> eval x -. eval y
Of course, this function is bound to fail in case of division by zero. While this is expected, there is nothing in the source code — much less in the type of the function — to let us know which exception will be raised in case of division by zero.
An alternative would be to add manual error checking, as follows:
type ('a, 'b) result = | Ok of 'a | Error of 'b let rec eval = function | Value x -> OK x | Add (x,y) -> (match eval x with | Error e -> Error e | Ok x' -> match eval y with | Error e -> Error e | Ok y' -> Ok (x' +. y')) | Mult (x,y) -> (match eval x with | Error e -> Error e | Ok x' -> match eval y with | Error e -> Error e | Ok y' -> Ok (x' *. y')) | Div (x,y) -> (match eval x with | Error e -> Error e | Ok x' -> match eval y with | Error e -> Error e | Ok y' -> if y' = 0. then Error "Division by zero" else Ok (x' /. y')) | Subs (x,y) -> (match eval x with | Error e -> Error e | Ok x' -> match eval y with | Error e -> Error e | Ok y' -> Ok (x' -. y')) (*eval : expr -> (float, string) result*)
After this transformation, the type of the exception appears in the type of
eval — here, we used strings, but anything else would have been fine. Of course, the downside is that this is unreadable. Well, what about the following ?
let rec eval = function | Value x -> return x | Add (x,y) -> perform with module Error x' <-- eval x; y' <-- eval y; return (x' +. y') | Mult (x,y) -> perform with module Error x' <-- eval x; y' <-- eval y; return (x' *. y') | Div (x,y) -> perform with module Error x' <-- eval x; y' <-- eval y; if y'=0. then throw "Division by zero" else return (x' /. y') | Subs (x,y) -> perform with module Error x' <-- eval x; y' <-- eval y; return (x' -. y') (*eval : expr -> (float, string) result*)
This extract uses [our customized version of] Pa_monad (included in the package), which brings syntactic support for monads. While this is more verbose than the original version, it’s also safer, insofar as we can guarantee that exceptions won’t remain uncaught.
Still too long? Then what about using the appropriate operators?
open Error.Operators let rec eval = function | Value x -> x | Add (x,y) -> eval x +. eval y | Mult(x,y) -> eval x *. eval y | Div(x,y) -> eval x /. eval y | Subs(x,y) -> eval x -. eval y
Except for the module opening, that’s the same thing as our first listing. Just with the added safety.
Throwing, catching and hierarchies
By the way, the type of the result is
(*eval : expr -> (float, [> `Arithmetic of (unit, [> `Div_by_zero of (unit, _) exc ]) exc ]) result*)
That is, eval may either succeed and return a float or fail and return an arithmetic exception, which also turns out to be a division by zero. That’s classes of exceptions.
With our syntactic sugar, raising such an exception is done by
throw (exception Arithmetic (); Div_by_zero ())
Note that we could have put some content instead of
(). Note that exceptions are typed as they appear in the code and don’t need to be declared (if you wonder, polymorphic variants are involved in this).
Of course, various kinds of exceptions may be combined, as in the following extract:
match ... with | 1 -> throw (exception Arithmetic (); Div_by_zero ()) | 2 -> throw (exception Arithmetic (); Overflow "by gosh !") | 3 -> throw (exception IO file_descr) | ... (* ('a, [> `Arithmetic of (unit, [> `Div_by_zero of (unit, _) exc | `Overflow of (string, _) exc]) exc | `IO of (int, [`> ]) exc ]) result*) *)
While the type of the expression is difficult to read, catching is easy
attempt ... with | val s -> (*success*) | Arithmetic (); Div_by_zero () -> (*Division by zero*) | Arithmetic (); _ -> (*Other arithmetic*) | IO _ -> (*Some IO stuff *) | finally _ -> (*Don't forget to close the door*)
This extract introduces three keywords:
attemptis our replacement for
valis used to pattern-match against the result of a successful evaluation
finallyis used to pattern-match against the final result, whether this result was obtained after a successful evaluation or after an exception was raised and handled.
The following extract is wrong:
let line_count filename = let rec loop file count = try ignore (input_line file); loop file (count + 1) with End_of_file -> count in loop (open_file filename) 0
Don’t get me wrong, it will compile and run. The problem is that it’s not tail-recursive. In other words, it will be much slower and much more memory-consuming than if exceptions had been ignored. Why ? Because exception End_of_file may have been raised from the next call to loop, so the recursive call cannot be optimized into a non-recursive call. Of course, exceptions can’t be ignored in this extract, as they are required to determine when to stop reading the file. Now, a simple transformation would make the problem go away :
let line_count filename = let rec loop file count = let should_continue = try ignore (input_line file); true with End_of_file -> false in if should_continue then loop file (count + 1) else count in loop (open_file filename) 0
Well, the transformation is simple, but it’s annoying and hard to read. What’s even more annoying is that it’s quite common. With Catch me if you can, we would rather write the following:
let input_line2 x = Error.legacy input_line x let line_count filename = let rec loop file count = attempt input_line2 file with | val _ -> loop file (count + 1) | _ -> count in loop (open_file filename) 0
In this extract, legacy is a simple manner of wrapping an existing, one-argument, function and convert it to our new exception style. It’s not quite as good as wrapping the function manually and giving it an actual exception, but it’s better than nothing.
All in all, the resulting function line_count is shorter, easier to read, takes less memory and is also faster than the original.
What about performance?
Now, that’s a complex question. Short answer: there’s an acceptable performance loss. False short answer: we wrote a paper on that subject.