January 23, 2010 § Leave a Comment
January 27, 2009 § 6 Comments
Or, OCaml is a scripting language, too.
Note: These extracts use the latest version of Batteries, currently available from the git repository. Barring any accident, this version should be made public within the next few days.
A few days ago, when writing some code for OCaml Batteries Included, I realized that, to properly embed Camomile’s Unicode transcoding module, I would need to manually write 500+ boring lines, all of them looking like:
| `ascii -> Encoding.of_name "ASCII"
The idea behind that pattern matching was to define a type-safe phantom type for text encodings. Upon installation, Camomile generates a directory containing about 540 files, one per text encoding, and it seemed like a good idea to rely upon something less fragile than a string name.
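To make the target concrete, here is a minimal sketch of the kind of mapping being generated, cut down to three encodings. The type name encoding, the function name_of_encoding and the three constructors are illustrative assumptions, not the generated list. The call to Camomile's Encoding.of_name is left out so that the sketch runs without Camomile installed.

```ocaml
(* A closed polymorphic variant type: one constructor per encoding.
   These three constructors are illustrative, not the generated list. *)
type encoding = [ `ascii | `utf_8 | `iso_8859_1 ]

(* Map each constructor to the string name of its encoding. *)
let name_of_encoding : encoding -> string = function
  | `ascii -> "ASCII"
  | `utf_8 -> "UTF-8"
  | `iso_8859_1 -> "ISO-8859-1"
```

With such a type, a misspelled encoding is rejected by the compiler instead of failing at run time on a string lookup.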
Of course, writing this pattern-matching manually was out of the question: it was boring, error-prone, and while Batteries deserves sacrifices, it doesn’t quite deserve that level of mind-numbing activity. The alternative was to generate both the list of constructors and the pattern-matching code from the contents of the directory. I could have done it with some scripting language, but that sounded like a good chance to test-drive the numerous new functions of the String module of Batteries (73 for 28 in the Base Library).
The main program
The structure of the program is easy: read the contents of a directory. For each file, do some treatment on the file name and print the result:
open Shell
foreach (files_of argv.(1)) do_something
foreach is the same function as iter but with its arguments reversed. It’s sometimes much more readable. Instead of reading the contents of a directory with Shell.files_of, we could just as well have traversed the command-line arguments with args, or read the lines of standard input.
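As a rough illustration of the argument order, here is a sketch of foreach over plain lists from the standard library. This is an assumption made for the sake of a runnable example: the Batteries version works on enumerations, not lists.

```ocaml
(* [foreach] is [List.iter] with its two arguments swapped, so the
   collection comes first and the body of the loop comes last. *)
let foreach xs f = List.iter f xs

let () =
  foreach [ "ASCII"; "UTF-8" ] (fun name ->
      print_endline name)
```

Putting the function last lets a long anonymous function trail at the end of the call, which is where the readability gain comes from.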
Actually, we could just as well generalize to a (possibly empty) set of directories. For this purpose, we just need to map our function files_of to the enumeration of command-line arguments. This yields an enumeration of enumerations, which we turn into a flat enumeration with flatten. In my mind, that’s somewhat nicer and more readable than nested loops.
Our main program now looks like:
open Shell, Enum
foreach (flatten (map files_of (args ()))) do_something
Or, for those of us who prefer operators to parentheses:
open Shell, Enum
(foreach **> flatten **> map files_of **> args ()) do_something
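For readers without Batteries at hand, the same map-then-flatten shape can be sketched with the standard library alone. Everything below is an assumption for illustration: lists stand in for enumerations, files_of is rebuilt on top of Sys.readdir, and the standard right-associative application operator @@ plays the role that **> plays above. The optional list_dir argument is hypothetical and only exists so the function can be exercised without touching the file system.

```ocaml
(* Entries of one directory, sorted so the result is deterministic. *)
let files_of dir = List.sort compare (Array.to_list (Sys.readdir dir))

(* Map [list_dir] over every directory, then flatten the list of lists.
   [@@] is right-associative function application. *)
let all_files ?(list_dir = files_of) dirs =
  List.flatten @@ List.map list_dir @@ dirs

(* Command-line arguments without the program name. *)
let args () =
  match Array.to_list Sys.argv with
  | [] -> []
  | _ :: rest -> rest
```

Calling all_files on the result of args () then gives the flat list of entries for every directory named on the command line, and an empty list when no directory is given.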
It’s now time to take a file name and turn it into
- a nice constructor name
- a file name without extension.
That second point is the easiest, so let’s start with it. We have a function Filename.chop_extension just for this purpose. So, if we were interested only in printing the list of files without their extension, we could define
let do_something x = print_endline (Filename.chop_extension x)
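One caveat worth sketching: Filename.chop_extension raises Invalid_argument when the name has no extension at all. Assuming OCaml 4.04 or later, Filename.remove_extension returns such a name unchanged. The helper name base_name and the file name used below are made up for the example.

```ocaml
(* [Filename.chop_extension] raises [Invalid_argument] when the name has
   no extension; [Filename.remove_extension] returns the name unchanged. *)
let base_name file = Filename.remove_extension file

let () =
  print_endline (base_name "ISO-8859-1.enc"); (* hypothetical file name *)
  print_endline (base_name "README")
```

Which of the two is preferable depends on whether a file without extension in the directory should be treated as an error or silently accepted.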
The first point is slightly trickier, as we need to
- remove the extension from the file name (done)
- prepend character `
- replace any illicit character with _ (slightly more annoying: I know that the list of illicit characters which may actually appear in my list of files contains ) and whitespaces, but I’d rather not go and check manually which other characters may turn out problematic)
- prepend something before names which start with a digit, as digits cannot appear as the first character of an OCaml constructor (a tad annoying, too)
- make everything lowercase, just because it’s nicer (trivial).
Let’s deal with the third item, it’s bound to be central. Let’s see, replacing characters could be done with regular expressions, something I dislike, or with function String.map. It’s nicer, type-safer, and it has a counterpart Rope.map for Unicode, if we ever need one. Now, functions Char.is_letter and Char.is_digit will help us determine which characters are safe. Using them together, we obtain the following function:
open Char
let replace s = String.map (fun c -> if is_letter c || is_digit c then c else '_') s
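The same function can be tried out with the standard library only. In this sketch, is_letter and is_digit are hand-written, ASCII-only stand-ins for the Batteries Char functions, which is an assumption about what those functions accept.

```ocaml
(* ASCII-only tests, standing in for the Batteries Char functions. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* Replace every character that is neither a letter nor a digit. *)
let replace s =
  String.map (fun c -> if is_letter c || is_digit c then c else '_') s

let () = print_endline (replace "ISO-8859-1")
```

Because the test is a whitelist, any character not explicitly known to be safe becomes an underscore, so there is no need to enumerate the problematic ones.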
Let’s solve the fourth item on our list. We need to check the first character of a string and to determine whether it’s a digit. Well, we already know how to do this. Let’s call our prefix p:
let clean_digit p s = if is_digit s.[0] then p^s else s
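A small edge case: indexing the first character of an empty string raises Invalid_argument. File names are never empty, so this does not matter here, but a defensive variant is cheap. The sketch below uses a hand-written is_digit so that it runs without Batteries.

```ocaml
(* ASCII digit test, standing in for the Batteries Char.is_digit. *)
let is_digit c = c >= '0' && c <= '9'

(* Prepend prefix [p] when [s] starts with a digit. The length test
   avoids the [Invalid_argument] that [s.[0]] raises on an empty string. *)
let clean_digit p s =
  if String.length s > 0 && is_digit s.[0] then p ^ s else s
```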
If we chain up everything, we obtain
let constructor p s = "`" ^ (if is_digit r.[0] then p^r else r)
  where r = lowercase (String.map (fun c -> if is_letter c || is_digit c then c else '_') s)
I like this
Now that we have both our strings, we just need to be able to combine and print them. For this purpose, Printf is probably the most concise tool. Here, we can just write
let print s1 s2 = Printf.printf " | %s -> %S\n" s1 s2
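The two conversions in that format string behave differently, which is the whole point here: the constructor must appear bare and the encoding name must appear as a string literal. A quick sketch, using sprintf instead of printf so the result can be inspected (the function name line is made up for the example):

```ocaml
(* [%s] inserts the string as is; [%S] inserts it as an OCaml string
   literal, with surrounding double quotes and escapes. *)
let line s1 s2 = Printf.sprintf " | %s -> %S" s1 s2

let () = print_endline (line "`ascii" "ASCII")
```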
We could parameterize upon the format used by printf and we’re bound to do this sooner or later, but let’s keep it simple for now.
The complete program
open Shell, Enum
foreach (flatten **> map files_of **> args ()) do_something
  where do_something s =
    let name = Filename.chop_extension s in
    Printf.printf " | %s -> %S\n" c name
      where c = "`" ^ (if Char.is_digit r.[0] then "codemap_"^r else r)
      where r = lowercase (String.map (fun c -> if Char.is_letter c || Char.is_digit c then c else '_') name)
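For comparison, here is a sketch of the same pipeline written against the standard library only, without Batteries and without the where syntax extension. It is not the program above, just an approximation under a few assumptions: OCaml 4.10 or later (for List.concat_map), ASCII-only character tests, lists instead of enumerations, and Filename.remove_extension in place of chop_extension. The names constructor and line_of_file are made up for the example.

```ocaml
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* Build the constructor name from a file name without its extension. *)
let constructor prefix name =
  let r =
    String.lowercase_ascii
      (String.map (fun c -> if is_letter c || is_digit c then c else '_') name)
  in
  "`" ^ (if r <> "" && is_digit r.[0] then prefix ^ r else r)

(* One line of the generated pattern matching for one file. *)
let line_of_file file =
  let name = Filename.remove_extension file in
  Printf.sprintf " | %s -> %S" (constructor "codemap_" name) name

(* Print one line per entry of each directory given on the command line. *)
let () =
  Sys.argv |> Array.to_list |> List.tl
  |> List.concat_map (fun dir -> Array.to_list (Sys.readdir dir))
  |> List.iter (fun file -> print_endline (line_of_file file))
```

Run without arguments, it prints nothing. Run with one or more directories, it prints one pattern-matching case per entry.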
I don’t know about you but I find this pretty nice, for a type-safe language. I’m sure it would have been possible to make something shorter in Perl or awk, and suggestions are welcome regarding how to improve this but I’m rather happy. And, once again, we’re not trying to beat Python, Perl or awk in concision, just to do something comparably good, because we already beat them by far in speed and safety.
So, what do you think?
November 20, 2008 § Leave a Comment
Well, my previous post on the Hierarchy of OCaml Batteries Included certainly triggered reactions. Essentially, judging from these, the OCaml community doesn’t seem to want a module hierarchy. So here’s a reworked version of the library layout, without hierarchy. Again, feedback is appreciated and should go to the OCaml mailing-list.
- Standard (automatically opened)
- Monad Interfaces for monadic operations
- Concurrency Interfaces for concurrency operations
I.1.i. Built-in threads
- Threads A module containing aliases to Condition, Event…
- CoThreads as Threads but with implementations coming from coThreads
I.1.iii. Shared memory
- Shm_* Placeholders
III. Mutable containers
IV. Persistent containers
- Numeric Interfaces for number-related stuff
- Safe_float placeholder
- Text Definition of text-related interfaces
- StringText A module containing aliases to String and Char [1]
- RopeText As StringText but with implementations from Rope and UChar
- UTF8Text As StringText but with implementations from UTF8 and UChar
VI. Distribution-related stuff
VIII. Network (placeholders)
VIII.4. Generic server
- PCRE place-holder
- Date placeholder
[1] Actually a slightly modified version of Char to match signatures for Latin-1 and Unicode
November 10, 2008 § 10 Comments
note: There seems to have been a WordPress bug. For some reason, the extended release notes on OCaml Batteries Included were replaced by something quite unrelated. My apologies for this.
Dear programmers, I am happy to inform you that the second alpha release of OCaml Batteries Included has landed. You may now download it from the Forge. A GODI package is also available and a Debian package should follow soon (you should be able to find the old one here) and you can read the documentation on-line.
So, what’s new in this release?
October 11, 2008 § 4 Comments
It has landed
The first alpha version of OCaml Batteries Included has landed. It is now available as source code, from the Forge or as a GODI package. You may find the API documentation here and the complete documentation there.
Remember, this is alpha-level code, use at your own risk. Also note that documentation generation is very slow (10+ minutes on my laptop), so don’t worry if installation seems to last forever, it’s nearly true but not quite.
Reviews, comments, suggestions and bug reports are particularly welcome. We have a few trackers for these, as well as a forum, so don’t hesitate to use them. We’re also looking for volunteers to give us a hand, so please consider stepping forward.
What is OCaml Batteries Included?
Twenty years ago, a language was just a compiler. You could measure the quality of a language from the beauty of its semantics, the clarity of its syntax, the speed of generated code. That was then. In the meantime, the Java and .Net nuclear plants have been built, while the Python and Ruby communities have gradually developed their own powerhouses. All these platforms have amply demonstrated that a language can only be as beautiful, clear and fast as the libraries which developers are actually going to use for their work. In other words, it’s not about the language anymore, it’s about the platform.
At this point in time, out-of-the-box, OCaml isn’t a usable platform. There is no Unicode, there are no modern user interface toolkits, no distributed programming infrastructures, no network services, no type-safe communications, no analysis of other languages, no interfacing with industrial platforms, no XML, no modern two- or three-dimensional drawing engine, etc. This would not be too much of a problem if OCaml provided an easy way of installing new libraries and of using these libraries once they are installed. This would also not be too much of a problem if OCaml could somehow guarantee that trivial data structures didn’t need to be reinvented by most projects and that communication between libraries happened flawlessly.
So no, out of the box, OCaml isn’t a usable platform. However, no matter what you’re trying to do, chances are that the community has already developed or adapted a tool to make your life easier. Easy installation of OCaml packages is quite possible, if you are using Debian or Ubuntu (apt-get), Fedora or Red Hat (yum) or actually, any Unix-compatible platform, including MacOS X and Windows (GODI). Simple and reliable usage of installed libraries can be done with Findlib. A comprehensive Unicode library is available (Camomile), as well as a modern user interface toolkit or two (LablGtk, OCamlRT), a number of distributed programming infrastructures (OCamlMPI, BSML, Opis, Camlp3l, OCamlNAE), libraries to interface with industrial platforms (OCamlJava, SpiderCaml …), to analyse other languages (CIL, Dalton/FlowCaml, …), to read or write XML (PXP, Expat, …), to draw in two or three dimensions (Cairo, OpenGL), to test your programs (OUnit), etc. OCaml even offers a built-in tool to customize the language itself (Camlp4).
What’s missing? A few things. For now, not all important libraries are available as simple-to-install packages. That problem is being addressed by the devoted packagers of Debian, Fedora and GODI, while the possibility exists that the recently announced Symbiosis will also help address the issue. The other missing part is a standard set of libraries, language extensions and data structures which developers could be assured to find on every OCaml installation, and which would let them write programs without having to endlessly reinvent the same basic wheels, and without spending their time writing adapters between libraries which should work well together but don’t use the same conventions. OCaml Batteries Included is one possible answer to this problem.
OCaml Batteries Included consists of:
- A core set of libraries, designed to define the basic standard data structures. This set, largely based on both the Base library of OCaml and ExtLib, extends the basic strings, arrays, lists… provided with OCaml and introduces numerous data structures, including enumerations, lazy lists, extendable inputs and outputs, dynamic arrays, unicode ropes…
- A uniformization layer, the glue, on top of chosen existing libraries. Note that we are not forking any of these libraries, only providing an additional layer on top of them. The libraries may be installed manually by the user or, preferably, by automatic dependency resolution thanks to apt-get, yum, GODI or some other packaging tool. The uniformization layer serves to guarantee that libraries play together nicely, that only one manner of reading from a file or writing to the output is necessary, that all data provided by one library may be decoded by another, etc.
- Additional documentation on every library for which we provide glue, including low-level documentation (“what the heck does this function do?”), mid-level documentation (“why should I use that function?”) and high-level documentation (“OK, I’m new here, where do I start?”)
- A handful of language extensions, provided as Camlp4 modules, to solve common issues, automatically generate boilerplate code, improve readability, …
- In the future, possibly a set of external tools.
- Build tools to make all of this transparent.
- And, of course, a logo.
OCaml Batteries Included is a project maintained by the community, which means that it depends on you. If you have ideas, suggestions, complaints, bug reports, if you want to participate, to write code, documentation, tutorials, build tools, review code, have a say in policies, if you want a package to be included in the Batteries, or just to contact us, please visit our website and take advantage of our bug trackers, feature-request trackers, forums, etc. (courtesy of OCaml Core). We can also often be seen on IRC, on server freenode, channel #ocaml and, of course, on the OCaml Mailing-List.
OCaml Batteries Included is also a work-in-progress. The version you have in front of your eyes does not contain everything we want to put in it, nor even all the code we have written for it. But we’ll get there. We intend to integrate additional libraries progressively, from minor release to minor release, with major milestones approximately twice per year. This policy is not written in stone and is largely subject to debate, so don’t hesitate to comment on the subject.
For more information on the contents of OCaml Batteries Included, see the manual.
Relations to other libraries
Project Gallium’s Base Library
First, a word on vocabulary. We call “Base library” the library provided by INRIA with the default distribution of OCaml. We don’t call it “standard library” for the simple reason that there are several libraries vying for the status of standard, including Batteries Included.
The relation between Batteries Included and the Base Library is simple: the Base Library is one of the libraries for which Batteries Included provides a uniformization layer. We are not forking the library, merely providing additional functions, additional documentation, boilerplate code…
The complete Base Library is available in Batteries Included as module Legacy. Most modules are also available inside the Batteries module hierarchy, sometimes under different names, and usually completed by numerous new functions. A few modules are considered obsolete and appear only in Legacy.
Jane Street’s Core
Jane Street’s Core is another library vying for the status of standard, this time produced by Jane Street Capital. This library is comparable in purpose and design to the core of Batteries Included (actually, we draw some inspiration from them and we hope that they are going to draw some inspiration from us, too) but there is no code shared between Batteries Included and Jane Street’s Core for the moment.
In the future, Batteries Included may depend on Jane Street’s Core. This is not the case yet as, according to one of Jane Street’s Core’s authors, this library may change quite a lot, and in what we understand as possibly incompatible ways, before reaching version 1.0.
We already depend on two components of Jane Street’s Core: Type-Conv (a general infrastructure which may be used to generate boilerplate code) and Sexplib (an instance of Type-Conv used to generate code for serializing to/deserializing from human-readable S-Expressions). We intend to extend this to a third component of Jane Street’s Core: Bin-Prot, another instance of Type-Conv, used this time to generate serialization to/from binary protocols.
ExtLib is another extension of the OCaml Base Library. ExtLib was designed as a relatively small addition, with the idea of fitting nicely within the OCaml Base Library. Since 2005, ExtLib has essentially stopped growing, following a conscious choice from the maintainers.
OCaml Batteries Included is largely based on ExtLib. Indeed, the core of Batteries is essentially a fork of ExtLib. We intend to maintain upward compatibility with ExtLib (with the exception of changes in module names), although we have added a great number of features, fixed bugs, split code, etc. For most purposes, OCaml Batteries Included contains ExtLib and, in our minds, supersedes this library. We hope that the maintainers and developers of ExtLib will consider migrating to OCaml Batteries Included and helping us with their skills.
Community OCaml is an attempt to turn OCaml into an out-of-the-box complete development platform, including numerous libraries. We have joined forces with the developers of Community OCaml and we share most of our code. Although the design of Community OCaml is not completely decided yet, it is quite possible that this will end up as a distribution consisting of OCaml + Batteries Included + some package manager.
OCamlNet is an implementation of numerous Internet protocols for OCaml, along with so many utilities that it almost constitutes a complete general-purpose development library for OCaml.
OCaml Batteries Included will depend on OCamlNet and incorporate OCamlNet into its hierarchy. In particular, for the moment, OCaml Batteries Included provides little in the way of libraries for interacting with the operating system, as we intend to use OCamlNet for that purpose.
Caml Development Kit
The Caml Development Kit, or CDK, was another attempt to build a development platform around OCaml by bundling together OCaml itself and a number of interesting libraries and tools. In contrast to Batteries Included, this was a monolithic distribution, with a custom compiler and very little documentation. Development on CDK seems to have died around 2001.
There is no relation between CDK and Batteries Included other than the common goal. Any feedback from former CDK users or developers will be welcome, though.
GODI, Apt, Yum
GODI, Apt and Yum are the three main package managers available for OCaml users. The first one is OCaml-specific, the second one lives in Debian/Ubuntu/Knoppix world and the third one in the Red Hat/Fedora world.
OCaml Batteries Included has no direct relation with any of these package managers, nor does it replicate their work. However, as Batteries relies on numerous external libraries, use of a package manager is strongly recommended.
Findlib is a compile-time package manager for OCaml. It lets developers easily specify which libraries are required and manages compile-time dependencies between installed libraries.
OCaml Batteries Included uses Findlib extensively.
Comparable projects, somewhere else
Python Batteries Included
To the best of our knowledge, the term “Batteries Included” was first coined by the Python community to describe the standard distribution of Python, which in addition to usual data structures, contains modules for handling databases, compression, a number of file-format (de)coders including JSON, (X)HTML, XML, CSV, cryptography, logging, text interfaces, threading, inter-process communication, implementation of client and server protocols from SMTP to HTTP, communication with external web browsers, image and sound manipulation, internationalization, lexing and parsing, graphical interfaces, unit testing, sandboxing, reflection, OS-specific services, …
This massive number of out-of-the-box features, along with the conciseness of the language, are probably the two main reasons for the success of Python: simple tasks, even those which require complex libraries, may often be programmed with only a few lines and in a few minutes.
Well, obviously, we’re trying to provide something as useful with OCaml Batteries Included, but with the added safety and speed of OCaml.
Haskell Batteries Included
The Haskell Platform (also known as “Haskell Batteries Included”) is a recent project undertaken by the Haskell community with objectives comparable to OCaml Batteries Included: “provides a comprehensive, stable and quality tested base for Haskell projects to work from.” At the time of this writing, the Haskell Platform has not released any software, although a first release is expected within a few weeks.
Like OCaml Batteries, the Haskell Platform is community-led and relies on a number of decentralized libraries. Like OCaml Batteries, the Haskell Platform requires package management. There are a number of differences in the methods, though.
Some differences are probably trivial: where we strive to work with any of the major package management systems, the Haskell Platform is based solely on Cabal, which may allow them a better integration. Where OCaml Batteries attempts to reclassify existing libraries into one uniform hierarchy of modules, the Haskell Platform keeps the original module names, as decided by their original authors, pre-Platform.
Others are quite far-reaching: OCaml Batteries provides an extended core of libraries to serve as support for standardization and uniformization, while the Haskell Platform doesn’t. Similarly, the Haskell Platform doesn’t add any uniformization layer, which means that the task of getting libraries to work together lies upon the end-user. Some Haskell libraries may be patched into compliance and uniformization, but this is not always possible, short of creating cyclic dependencies between libraries which should work together but don’t. Finally, the Haskell Platform has much stricter guidelines regarding libraries which can or can’t make it into the Platform. This is a good thing if library authors are willing to fix whichever problems prevent the inclusion of their work — something which we don’t assume for OCaml Batteries, at least not yet.
Despite these differences, both projects seem based on solid foundations. And hopefully, both will achieve large success.
September 27, 2008 § Leave a Comment
Just a quick word for people who may be curious about the development of OCaml Batteries Included. Work is proceeding nicely and we’re getting close to a first official release. We’ve moved things around quite a lot recently, worked on the documentation and added a few nice features (read-only strings and arrays, uniform numeric modules with type-class-style dictionaries). We’re about to add Unicode support for inputs and outputs (based on Camomile) and an improved Scanf module and that should be it for a first release.
As a side-note, the Haskell community seems to be involved in much the same process as Batteries Included, with the Haskell Platform, aka Haskell Batteries Included. Both their schedule and their list of packages seem a little more precise than ours but the overall objective remains the same: take a great programming language used mostly by academics and turn it into a complete development platform able to compete with the best the industrial world is able to offer. The main difference, it seems, is that the Haskell Platform doesn’t have a glue layer designed to uniformize APIs. The other main difference, I’m afraid, is that the Haskell community seems much larger these days than the OCaml community, or perhaps just more active or more vocal. It is my hope that a larger and more convenient standard library will help draw (back?) both academics and developers to the OCaml world. A little more academic support wouldn’t hurt, of course.
Back to OCaml Batteries Included, I hope we’ll be able to release by October 10th. At that point, we’ll need beta-testing and it will be time to decide what should get into Batteries Included 0.2. I’m sure everyone has ideas and suggestions; it will soon be time to share them.
August 29, 2008 § Leave a Comment
After a few discussions on IRC, by e-mail and on forums, I have come to realize that both the purpose of Batteries Included and what the development of Batteries involved were quite unclear to most people — and that we should probably have started our work in quite a different manner. All these discussions have prompted a few changes and the release of a first pre-version of Batteries Included, which you may find on the OCamlForge project (you may also browse source code here and API documentation here).
This release represents what we should have produced in the first place: a simple and uniform presentation layer on top of existing libraries.
July 4, 2008 § Leave a Comment
A quick word regarding the current status of Extrapol and its release.
Development of Extrapol progresses. With our current set of samples, Extrapol works flawlessly. We’re now adding features, improving error reporting, moving the hard-wired model of the C standard library out of the tool and into an external configuration file, and progressively moving towards larger and more realistic samples. Development will come to an abrupt (and temporary) halt at the end of this week, though, due to personal matters (i.e. I’m getting married).
The release planned for next week, on the other hand, is canceled. As the research field of applied security is very competitive, and after careful discussion with the rest of my research team, we have decided to only release a version of Extrapol after the scientific content has been accepted for publication in a conference or journal. At the request of one of the institutes which funds this research, I will also refrain from posting detailed information on the theory and algorithms behind Extrapol, until these are cleared by the institute and accepted for publication. Without going into details, Extrapol is expected to serve in critical infrastructures, which explains the need for clearance.
However, rest assured that there will be a release and it will be open-source (presumably licenced under a combination of MIT and LGPL). The only question is when — and this probably won’t happen before November.