A taste of OCaml Batteries Included

November 7, 2008 § 14 Comments

I don’t know about you, but I have the feeling that many people are interested in OCaml Batteries Included but don’t dare try it yet, because of its Alpha status and because it isn’t yet available for their favorite Linux distribution. Well, that’s probably a healthy level of caution.

Of course, just reading the manual is probably not the best way of getting a feel for OCaml Batteries Included.

So I’ve decided to do something about it. From time to time, I’ll post here a few samples of what you can do with OCaml Batteries Included and how you can do it.

For today, let’s start with displaying the contents of a file. You know, Unix’s cat or MS-DOS’s type.

open System, IO, File

iter (fun x -> copy (open_in x) stdout) (args ())

That’s it. Three lines, one of them blank.

Now, for details: open System, IO, File opens three modules: System (which contains all system-related functions, including input/output, file management, etc.), IO (the submodule of System containing all the operations on inputs and outputs), and File (the submodule of System containing everything necessary to open files for reading, writing, etc.).

Let’s move on to the last line. Function iter is defined in module Standard, which means that you don’t need to open any module to be able to use it. This function is a general imperative loop on enumerations, the equivalent of a for-each loop in some languages. Perhaps I should detail what enumerations are: they are a read-and-forget data structure which is used pervasively in OCaml Batteries Included and which replaces streams. Unlike lists, arrays, etc., enumerations are built lazily and discarded as they are read, which makes them quite convenient for loops, or for accessing possibly huge sets of data. Depending on your background, you may think of them either as streams (in OCaml and most languages) or as iterators (in Python, JavaScript and other dynamic languages). Oh, and for functional-minded people, don’t worry, your usual functional loops are available on enumerations, too. You can fold, map or unfold at will.
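
To make the "built lazily" part concrete, here is a minimal sketch that uses the standard library's Seq module (OCaml 4.14 or later) as a stand-in for enumerations. It is only an analogy, not the Batteries API: the names naturals and first_squares are mine, and a Seq can be traversed again, whereas an enumeration is consumed as it is read.

```ocaml
(* An infinite, lazily built sequence: 0, 1, 2, ...
   Nothing is computed at this point. *)
let naturals : int Seq.t = Seq.ints 0

(* Elements are produced on demand, so taking a finite prefix of an
   infinite sequence terminates. *)
let first_squares : int list =
  naturals
  |> Seq.map (fun n -> n * n)
  |> Seq.take 5
  |> List.of_seq

(* Print the prefix that was actually computed. *)
let () = List.iter (fun n -> Printf.printf "%d " n) first_squares
```

Only the five elements that are actually requested are ever computed, which is what makes this style usable on very large (or unbounded) sources of data.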

So, what does iter do? Well, if asked, OCaml will tell you that it has type ('a -> unit) -> 'a Enum.t -> unit. In other words, for any type 'a, this function takes as first argument a function (let’s call it f), as second argument an enumeration of elements of type 'a and returns nothing. Function f itself should take as argument an element of type 'a and return nothing. In other words, iter takes a function which works on one element of type 'a and turns it into a function which works on a whole enumeration of elements of type 'a. Yep, it’s called a loop.
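
If you don't have Batteries at hand, you can still play with a function of the same shape: the standard library's Seq.iter has type ('a -> unit) -> 'a Seq.t -> unit. The sketch below assumes Seq as a substitute for Enum, and the name seen is mine. It records what the loop body receives, to show that f is called once per element, in order.

```ocaml
(* Stand-in for Batteries' iter, built on the standard library's Seq.iter,
   which has the analogous type. *)
let iter : ('a -> unit) -> 'a Seq.t -> unit = Seq.iter

(* Accumulates every element handed to the loop body (most recent first). *)
let seen : string list ref = ref []

(* The function passed to iter works on ONE element; iter applies it to
   every element of the sequence. *)
let () = iter (fun s -> seen := s :: !seen) (List.to_seq ["a"; "b"; "c"])

(* Show the elements in the order in which they were visited. *)
let () = print_endline (String.concat "," (List.rev !seen))
```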

Before looking at the definition of the function, let’s take a look at the enumeration passed to iter: args (). Well, if we look at the documentation (for instance by using our on-line help), we may read

args(): An enumeration of the arguments passed to this program through the command line.

So, args () is your usual pair argc, argv (if you come from C) or args[] (if you come from Java). Unlike in Java, you don’t have to declare them if your program doesn’t use them and, unlike in both, it’s an enumeration, which makes more sense than an array, since you don’t need to modify the arguments and you always move forward through them. Still, if you need it as an array, it’s available in a package. No more on this for the moment.
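
For readers who want to see the idea without Batteries, here is a rough sketch of an args-like function on top of the standard library (OCaml 4.14 or later). The helper args_of is hypothetical, and I am assuming that the enumeration skips the program name stored in Sys.argv.(0), as the plain OCaml loop further down does.

```ocaml
(* Hypothetical helper: view an argv-style array as a lazy sequence,
   skipping the program name stored at index 0. *)
let args_of (argv : string array) : string Seq.t =
  Seq.drop 1 (Array.to_seq argv)

(* Stand-in for Batteries' args (): the arguments of the current program. *)
let args () : string Seq.t = args_of Sys.argv

(* Print each command-line argument on its own line. *)
let () = Seq.iter print_endline (args ())
```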

What’s left? Oh, yes, the function. As the code indicates, fun x -> copy (open_in x) stdout is an anonymous function (also known as a “lambda” in a few languages). This function takes an argument x, the name of a file. Function copy, defined in module IO, takes two arguments, an input (a source of data) and an output (a sink of data), and copies the whole contents of the input into the output. The first argument here is open_in x, that is, the result of applying function open_in to argument x. Function open_in, defined in module File, opens a file for reading. The result open_in x is therefore an input which lets us read the contents of file x. The second argument of copy is stdout, that is, the standard output, which is usually the screen. In other words, fun x -> copy (open_in x) stdout is a function which takes as argument the name of a file, opens that file and prints its contents on the screen. Note that everything is done lazily, so the contents are never completely present in memory. In other words, this works on files of theoretically unlimited length.
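
To give an idea of how a copy can work without ever holding a whole file in memory, here is a minimal sketch on plain standard-library channels (OCaml 4.08 or later). It shows one plausible approach, copying through a fixed-size buffer, and is not necessarily how Batteries implements IO.copy. The chunk_size parameter and the cat helper are my own additions.

```ocaml
(* Hypothetical stand-in for IO.copy: move data from ic to oc one chunk at
   a time, so that at most chunk_size bytes are held in memory. *)
let copy ?(chunk_size = 4096) (ic : in_channel) (oc : out_channel) : unit =
  let buf = Bytes.create chunk_size in
  let rec loop () =
    (* input returns 0 once the end of the file has been reached. *)
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      output oc buf 0 n;
      loop ()
    end
  in
  loop ()

(* Print one file on the standard output. The channel is closed even if
   copy raises an exception. *)
let cat (path : string) : unit =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> copy ic stdout)
```

With these two definitions, the whole utility would be Array.iteri (fun i p -> if i > 0 then cat p) Sys.argv.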

Bottom line: this utility reads all the files whose names are given on the command line and prints their content on the screen. In three lines of code.

Note that we could equally have written our utility

open System, IO, File

iter f (args ()) where f x = copy (open_in x) stdout

or

open System, IO, File

let f x = copy (open_in x) stdout;;
iter f (args ())

or, in one line,

iter (fun x -> System.IO.copy (System.File.open_in x) stdout) (args ())

or a number of other variants.

Without Batteries Included, the same code, in OCaml, would have looked like

for i = 1 to Array.length Sys.argv - 1 do
  let file = open_in Sys.argv.(i) in
  let ended = ref false in
    while not !ended do
      try print_endline (input_line file)
      with End_of_file -> ended := true
    done
done

Not much longer but definitely more complicated. Oh, and for fun, here’s the Java version:

import java.io.*;

public class Demo {
  public static final void main(String args[]) {
    try {
      for(String x : args) {
        BufferedReader reader = new BufferedReader(new FileReader(x));
        for(String read = reader.readLine();
                       read != null;
                       read = reader.readLine())
            System.out.println(read);
      }
    } catch (IOException e) {
    }
  }
}

Not quite as short, not quite as simple and, in my humble opinion, not quite as nice.

§ 14 Responses to A taste of OCaml Batteries Included

  • ChriS says:

    Can you really write “open System, IO, File” (i.e. some preprocessing is enabled by default)? Otherwise “open System open IO open File” is not so much longer…

  • yoric says:

    Yes, you can. As you write, it’s not a big gain, although I personally find this more readable. What is more interesting, though, is that you can write

    open System, IO, File in some_expression

    to evaluate some_expression in a context where modules System, IO and File are (locally) opened. That’s quite useful for local operator overloading, for instance.

  • Jon Harrop says:

    Looks like you’re leaking file handles. Here’s a correct plain OCaml version following yours:

    for i=1 to Array.length Sys.argv - 1 do
      let ch = open_in Sys.argv.(i) in
      try
        while not(eof ch) do
          print_endline(input_line ch)
        done
      with End_of_file -> close_in ch
    done;;
    

    However, I would just use the Stream module:

    for i=1 to Array.length Sys.argv - 1 do
      let ch = open_in Sys.argv.(i) in
      Stream.iter print_char (Stream.of_channel ch);
      close_in ch
    done;;
    
  • Jon Harrop says:

    Except that my “correct” OCaml version used a function that doesn’t exist in OCaml. Here’s what I meant to write:

    for i=1 to Array.length Sys.argv - 1 do
      let ch = open_in Sys.argv.(i) in
      try
        while true do
          print_endline(input_line ch)
        done
      with End_of_file -> close_in ch
    done;;
    
  • yoric says:

    Indeed, your plain OCaml version is better than my plain OCaml version. Note that the Batteries version also semi-leaks file descriptors, by design, as this is a 3-liner program. It’s not a full leak, as file descriptors are reclaimed at garbage collection and/or at program exit. To ensure that channels are closed in a timely manner, we may rewrite

    open System, IO, File  
       
    iter f (args ()) where f x = with_file_in x (flip copy stdout)
    

    or

    open System, IO, File  
       
    iter f (args ()) where f x = with_file_in x (fun inp -> copy inp stdout)
    

    For information, according to my measurements, your stream-functional Stream version is about 2 to 3 times slower than the stream-functional Batteries version, while your imperative version is about 3 times faster, due to faster input/output. I’m measuring native versions, with Batteries compiled in debug mode.

    The Batteries version is faster than the Stream version due to faster higher-level libraries. The Batteries version is slower than the imperative version as this one uses lower-level libraries, which are much less generic. One difference, among many, is that our I/O infrastructure permits things such as transparent transcoding of text, transparent compression to gzip/decompression from gzip, etc.

  • Jon Harrop says:

    The speed is interesting but I am still concerned with the correctness.

    You say that the file handles are closed when they are garbage collected: so you have implemented this yourself by wrapping them in an object and specifying a finalizer?

    You say that the file handles are closed when the program exits but I do not believe that. Specifically, OCaml is buggy wrt calling finalizers when programs complete (they are often never called) but I do not believe you can rely upon the OS to close them either (IIRC, Windows does not).

    Finally, you may like to study the design of IDisposable and IEnumerable from .NET because they do something similar. The IDisposable interface presents a Dispose function that can be used to clean up a resource deterministically but which is also used as a finalizer when the object is collected. The IEnumerable interface is akin to impure streams and can call Dispose() on a handle when traversal of the stream is complete. F# provides a “use” equivalent of “let” that calls Dispose() automatically when the value goes out of scope.

    The transcoding and decompression of streams on-the-fly is a fantastic idea.

  • Doh says:

    Now here goes the PHP version:

    array_shift($argv);
    
    foreach ($argv as $file_name) echo file_get_contents($file_name);
    

    Beat the clarity of that.

  • yoric says:

    You say that the file handles are closed when they are garbage collected: so you have implemented this yourself by wrapping them in an object and specifying a finalizer?

    File handles (and more generally inputs and outputs, which do not need to map to file handles), are closed:

    when they are garbage collected
    when the program ends
    when leaving the scope of a higher-order function such as with_file_in
    for input from files, when the end of file has been reached (unless the corresponding option has been chosen)
    for an output wrapping an underlying output (e.g. transcoding), when the underlying output has been closed
    for an input wrapping an underlying input (e.g. transcoding), when the underlying input has been closed
    when the user closes the handle manually

    Hopefully, this should cover all situations.

    Specifically, OCaml is buggy wrt calling finalizers when programs complete (they are often never called).

    That’s actually not a bug, it’s part of the definition of finalization — I don’t know any single programming language in which finalizers are guaranteed to be called.

    F# provides a “use” equivalent of “let” that calls Dispose() automatically when the value goes out of scope.

    I’ve been working on something similar. For the moment, we’re using higher-order functions for this purpose, but something more complex and more robust may be added to Batteries at some point, under the guise of a Camlp4 syntax extension.

    The transcoding and decompression of streams on-the-fly is a fantastic idea.

    Thanks. Not ours, to be true, but thanks 🙂

  • zheng says:

    You may also consider Camlish:

    open Camlish;;
    
    Array.iteri
      (fun i s -> if i <> 0 then !! {(cmd "cat") with pin = pin_file s})
      Sys.argv
    

    and the pout can be redirected into various OCaml data structures, including streams.

  • yoric says:

    Beat the clarity of that.

    I’m not very familiar with php, but if my recollections are correct, the actual complete program is

    #!/usr/bin/php
    <?php
    array_shift($argv);
    
    foreach ($argv as $file_name) echo file_get_contents($file_name);
    ?>
    

    which does lose some clarity. However, even then, I grant you that it remains clear. Something which is less clear to me is whether/when file descriptors are actually closed. Could you answer that question? The manual of php couldn’t help me there.

    Note that, when it works, your php version seems faster than the Batteries version and slower than the low-level OCaml version. However, when tested against a 500Mb file, the result is

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 742842369 bytes) in /tmp/test.php on line 5

    So I’m afraid that your php version, no matter how clear and how fast, is not really a good competition for either of the OCaml versions.

    For information, if you prefer foreach, with OCaml Batteries Included, we could have written similarly

    open System, IO, File   
    
    foreach (args ()) (fun file_name -> print_all (open_in file_name))
    

    However, none of this is the point. The programming language community is quite aware that a few dynamic languages (I’m thinking of Python, Ruby and Php) have extensive libraries which permit the development of simple tools in a matter of a few lines of code. This batteries included approach is probably the biggest reason behind the success of these languages.

    On the other side of the spectrum, we have functional languages such as OCaml or Haskell, both of which are sometimes erroneously described as dynamic, because they can do most of the interesting things which are possible in Python, Ruby and Php, as well as plenty which aren’t (I’m thinking type-safety and pattern-matching, but also functors, type-classes, local modules, static analysis of exception-safety, provably correct code, compile-time code generation, embedded domain-specific languages, syntax customization, etc.). Despite their many qualities, these languages have never been taken seriously in large part because of the lack of a library which would actually make these languages useful.

    So the point is the following: if we can write small programs in OCaml which are nearly as concise as their counterparts in Python, Ruby or Php, and if the library scales up to large programs, then we have achieved our objectives. Because, when we reach this point, we will have most of what makes Python, Ruby or Php so attractive, and plenty of things that these languages are missing.

    We certainly haven’t reached that point, mind you, but we’re getting there. The many versions of cat which appear on this page are examples of what we can do now. And I believe they already show large progress.

  • yoric says:

    You may also consider Camlish:

    Well, the point was to reimplement cat, not to use the existing one 🙂

    Btw, with Camlish + Batteries Included, you get the slightly simpler

    open Camlish;;
    
    iter (fun s -> !! {(cmd "cat") with pin = pin_file s}) (args())
    

  • e tate says:

    or in one line of lisp. 🙂

    note that you can use any function you like as the read fn.

    (iter (for ln in-file x using #'read-line) (format t "~a" ln))

  • […] Additionally, the OCaml Batteries Included project was created as an attempt to bundle a standard set of commonly-used library together with the language core. Even if this project is still in alpha stage, it definitely looks promising. […]


You are currently reading A taste of OCaml Batteries Included at Il y a du thé renversé au bord de la table.
