(re)introducing OS.File

June 27, 2012 § 6 Comments

OS.File is a new JavaScript library available to Firefox and Thunderbird developers and add-on developers. This library offers efficient, low-level, backgrounded interaction with the file system, with a number of primitives to take advantage of the specific features of each platform. It is also a nice example of systems programming in JavaScript. Please use it, look at the code, and report bugs and missing features.

(re)Introducing OS.File

A considerable aspect of our work, at Mozilla, is to ensure that the user experience is smooth and responsive. One of the main tools available to developers to permit such responsive code is multi-threading: any computation or interaction with the system that takes too long can (and should) be pushed into the background, and should interact asynchronously with the user interface.

Now, one of the critical bottlenecks in any application is I/O: accessing the disk (or the network, or the database…) is typically orders of magnitude slower than any in-memory operation, and it can sometimes disrupt the user experience of the complete system. This is true on desktop systems, and even more so on smartphones and tablets.

What this means is that we need a nice library to perform I/O, and by nice, I mean:

  • I/O should be backgrounded;
  • the number of I/O operations should be carefully controlled.

This is what OS.File is all about: OS.File is a library available to developers (including add-on developers) on the Mozilla platforms (Firefox, Thunderbird, Songbird, InstantBird, Boot-to-Gecko, etc.). This library is available (only) to JavaScript, and it offers low-level access to the file system, available to background threads.

As its name implies, OS.File is a system library, not a web library, so web application developers will not have access to it.

A first usable version of OS.File landed a few days ago and is now available in nightly builds of Mozilla Platform applications. We are progressively adding features, and I would like to invite all developers who need to do I/O to try it, report any bugs and request any features they need.

Using OS.File

OS.File offers both a cross-platform API (module OS.File itself) and bindings to platform-specific functions (modules OS.Win.File and OS.Unix.File), as well as utilities for system programming (modules OS.Shared and OS.Constants). In this post, I will only discuss module OS.File itself.

By design, in this first delivery, module OS.File is quite minimalistic. Features will be added progressively (see next section). You can find the documentation of OS.File on MDN, as usual.

For the moment, module OS.File can be used only from a chrome worker (i.e. a privileged JavaScript background thread).

Renaming a file


OS.File.move("a.tmp", "b.tmp");

In case of error, this will raise an exception of type OS.File.Error.

Copying a file, handling errors, options


try {
  OS.File.copy("b.tmp", "c.tmp", {noOverwrite: true});
} catch(ex) {
  if (ex.becauseNoSuchFile) {
    // b.tmp does not exist
  } else if (ex.becauseFileExists) {
    // c.tmp exists and we do not want to overwrite it
  }
}
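The error-handling pattern above can be factored into a small helper. This is only a sketch: `explainCopyError`, `copyWithoutOverwrite` and `fakeCopy` are hypothetical names, and the copy function is passed in as a parameter so that the sketch runs outside a chrome worker. In a chrome worker you would pass `OS.File.copy` instead of the stand-in. Unlike the snippet above, it rethrows errors it does not recognize rather than ignoring them.

```javascript
// Hypothetical helper: map the error flags shown above to a message.
// Returns null when the error is not one we know how to explain.
function explainCopyError(ex, source, dest) {
  if (ex.becauseNoSuchFile) {
    return source + " does not exist";
  }
  if (ex.becauseFileExists) {
    return dest + " already exists and will not be overwritten";
  }
  return null;
}

// Hypothetical wrapper. |copy| is expected to behave like OS.File.copy:
// it either succeeds or throws an error carrying the flags above.
function copyWithoutOverwrite(copy, source, dest) {
  try {
    copy(source, dest, {noOverwrite: true});
    return "copied " + source + " to " + dest;
  } catch (ex) {
    const explanation = explainCopyError(ex, source, dest);
    if (explanation === null) {
      throw ex; // Unknown error: do not swallow it
    }
    return explanation;
  }
}

// Stand-in for OS.File.copy (an assumption, for demonstration only):
// it always fails as if the destination already existed.
function fakeCopy(source, dest, options) {
  const ex = new Error("simulated failure");
  ex.becauseFileExists = true;
  throw ex;
}

const outcome = copyWithoutOverwrite(fakeCopy, "b.tmp", "c.tmp");
console.log(outcome);
```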

Open a file, read a prefix


let buffer = new ArrayBuffer(12); // Also works with a js-ctypes C pointer
let file;
try {
  file = OS.File.open("myfile.tmp"); // No options: open for reading
  let bytes = file.read(buffer, 12);
  // Do something with these bytes
  // ...
} finally {
  if (file) {
    file.close();
  }
}

Open a file for writing


let file = OS.File.open("myfile.tmp", {create:true}); // Fail if the file already exists

Note that this operation will only require one I/O interaction with the operating system – this is much faster than first checking whether the file already exists, and then creating it if it does not.
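To make the "one interaction" point concrete outside a chrome worker, here is a sketch of the same idea using Node.js's built-in `fs` module as a stand-in. It is not OS.File, and `createExclusive` is a hypothetical name. The `"wx"` flag asks the operating system to create the file and to fail if it already exists, all in a single open call, so there is no window between "check" and "create" in which another process could slip in.

```javascript
// Stand-in sketch using Node.js's fs module, NOT OS.File.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create |file| only if it does not exist yet, with a single open call.
// Returns true if we created the file, false if it was already there.
function createExclusive(file) {
  try {
    const fd = fs.openSync(file, "wx"); // "wx": create, fail if it exists
    fs.closeSync(fd);
    return true;
  } catch (ex) {
    if (ex.code === "EEXIST") {
      return false;
    }
    throw ex; // Any other error is unexpected here
  }
}

// Demonstration in a fresh temporary directory
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "osfile-sketch-"));
const target = path.join(dir, "myfile.tmp");
const first = createExclusive(target);  // The file does not exist yet
const second = createExclusive(target); // The file exists now
console.log(first, second);

// Clean up
fs.unlinkSync(target);
fs.rmdirSync(dir);
```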

Open a file with OS-specific options


let file = OS.File.open("myfile.tmp",
  {create:true},
  {unixMode: OS.Constants.libc.S_IRWXU | OS.Constants.libc.S_IRWXG }
);
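The `unixMode` value above is an ordinary bitmask. As a sketch of what it encodes, the following uses the conventional POSIX numeric values for `S_IRWXU` and `S_IRWXG`, hard-coded here as an assumption so that the example runs anywhere. In a chrome worker you would read them from `OS.Constants.libc` instead. `formatMode` is a hypothetical helper.

```javascript
// Conventional POSIX values, hard-coded for illustration only.
const S_IRWXU = 0o700; // Read, write, execute for the owner
const S_IRWXG = 0o070; // Read, write, execute for the group

// Hypothetical helper: render the nine permission bits as in |ls -l|.
function formatMode(mode) {
  const letters = "rwxrwxrwx";
  let out = "";
  for (let i = 0; i < 9; i++) {
    const bit = 1 << (8 - i); // From owner-read down to other-execute
    out += (mode & bit) ? letters[i] : "-";
  }
  return out;
}

const mode = S_IRWXU | S_IRWXG;
console.log(mode.toString(8), formatMode(mode));
```

With these values, the combined mode grants full access to the owner and the group, and nothing to other users.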

Short FAQ

What’s good about OS.File?

  • Finally, file I/O for JavaScript workers.
  • An API much more JavaScript-friendly than what already existed in the Mozilla Platform.
  • Options and low-level functions to ensure that we perform a minimal amount of actual I/O.

Wasn’t all that already possible?

The existing I/O libraries on the Mozilla Platform could not be used from background threads. Some functions could be backgrounded, but only very few of them.

JavaScript-friendly wrappers had been written around these libraries, but they only covered a few of the features of these libraries, in addition to which they could not be used from background threads either.

How is OS.File implemented?

OS.File is implemented in pure JavaScript, using the (very nice) js-ctypes library to perform calls to the OS APIs.

Why JavaScript and not C++?

Because we want the code to be easily accessible to the community.

Isn’t that slow?

Well, firstly, JavaScript has grown into a very fast language. These days, expecting without benchmarks that C++ is faster than JavaScript on hot code can cause surprises.

In addition, writing the library in C++ would have meant that we needed to cross language barriers quite often, which is bad for performance, due to:

  • complex memory management;
  • bad JIT-ability; and
  • need to convert all data structures, in particular strings.

We attempt to avoid this as much as possible.

For the moment, however, OS.File has not been benchmarked. We await real-world applications.

Work in progress

We are currently hard at work extending OS.File. The next few landings should add:

Features are driven by application requirements, so if you need some other feature, please do not hesitate to contact me on IRC or to file a bug on Bugzilla.

C data finalization – in JavaScript

May 2, 2012 § 4 Comments

A few iterations ago, the Mozilla Platform introduced js-ctypes, a very nice Foreign Function Interface (FFI) for JavaScript. Like its inspiration, Python’s ctypes, js-ctypes lets (privileged) JavaScript code open native libraries, import their functions and call these functions almost as if they were regular JavaScript functions.

Here is an example using the Unix libc:

// Open the C library
let libcCandidates = [
  'libSystem.dylib',// MacOS X
  'libc.so.6',      // Linux
  'libc.so'         // Android, B2G
];
let libc;
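for (let candidate of libcCandidates) {
  try {
    // ctypes.open throws if the library cannot be loaded
    libc = ctypes.open(candidate);
    break;
  } catch (ex) {
    // Not available on this platform, try the next candidate
  }
}
if (!libc) {
  throw new Error("Could not open the C library");
}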

// Import some functions from libc
let open = libc.declare("open", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*const char* path*/ctypes.char.ptr,
  /*int oflag*/ ctypes.int,
  /*int mode*/ ctypes.int);
let read = libc.declare("read", ctypes.default_abi,
  /*return ssize_t*/ ctypes.ssize_t,
  /*int fildes*/ ctypes.int,
  /*void *buf*/ ctypes.voidptr_t,
  /*size_t nbytes*/ ctypes.size_t);
let close = libc.declare("close", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*int fd*/ ctypes.int);

// Now use them
let myfile = open("/etc/passwd", 0, 0);
if (myfile == -1)
  throw new Error("Could not open file");
// ...

If you are familiar with XPConnect, the mechanism generally used in the Mozilla Platform for letting JavaScript and C++ interact, you can see that using js-ctypes to call native code directly is much nicer than adding a C++ XPCOM/XPConnect layer. From what I hear, it also seems to be much faster, as XPConnect needs to perform expensive magic to ensure that memory is properly passed between JavaScript and C++. In addition, this selfsame memory magic now prevents XPConnect from being used from threads other than the main thread, which makes js-ctypes the only way to perform any system access from worker threads.

Now, js-ctypes nicely solves the issue of calling native code from JavaScript. However, JavaScript and C are very different languages, with very different paradigms, so getting them to coexist requires a little more than simply the ability to place calls or convert values. In particular, C has:

  • manual resource management (memory must be released, file descriptors must be closed, locks must be released, etc.);
  • no language-level mechanism for error management (a task smaller than a process cannot be killed because of an error).

By contrast, JavaScript has:

  • automated memory management, but no support for automatically managing resources other than memory (no user-level finalization or scoped-resource mechanism);
  • several language/vm-level mechanisms that can kill a task in non-trivial manners (exceptions, “this script is busy”, etc.)

Unfortunately, putting all of this together makes it quite difficult to write JavaScript code that manipulates C resources without leaking. Such leaks can cause both performance issues (memory leaks, in particular, tend to slow down the whole system) and hard-to-track errors (leaking file descriptors can prevent the application from opening any new file, or, under Windows, can prevent the application from reopening some files that were improperly closed, while leaking locks can completely freeze an application).

Introducing C data finalization

For this reason, we have recently added a new feature to js-ctypes, designed to add automated resource management to JavaScript: C data finalization.

Specifying a finalizer is simple:

function openfile(path, flags, mode) {
  let fd = open(path, flags, mode);
  if (fd == -1) {
    throw new Error("Could not open file " + path);
  }
  return ctypes.CDataFinalizer(fd, close);
}

What this code does is ensure that, whenever the file descriptor is garbage-collected, function close is called, releasing the C resources represented by that file descriptor. This value is C data with a finalizer, aka CDataFinalizer.

You can use it just as you would use the C data through js-ctypes:

let myfile = openfile("/etc/passwd", 0, 0);
let result = read(myfile, myarray, 4096); // Read some data
// Wherever required, |myfile| is automatically converted to
// the underlying integer value.
// Once |myfile| has no reference, it will (eventually) be
// closed.

It is, of course, possible (and strongly recommended) to close the file manually to ensure that resources are immediately available for the process and the rest of the system:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let result = myfile.dispose(); // This calls |close|.

// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.

Or, an equivalent but more verbose solution, using forget:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let fd = myfile.forget();
// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.
let result = close(fd);
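
The contract of `dispose` and `forget` described above can be summed up with a toy model in plain JavaScript. This is emphatically not how js-ctypes implements `CDataFinalizer`, and it does not model finalization at garbage-collection time. `makeToyFinalizer` and `fakeClose` are hypothetical names used only to illustrate who calls the finalizer and when the value stops being usable.

```javascript
// Toy model of the dispose/forget contract. NOT the js-ctypes implementation.
function makeToyFinalizer(value, finalizer) {
  let live = true;
  // Hand the raw value back exactly once, then mark the wrapper as dead.
  function take() {
    if (!live) {
      throw new Error("value already disposed of or forgotten");
    }
    live = false;
    return value;
  }
  return {
    // Conversion to the underlying value only works while live
    valueOf() {
      if (!live) {
        throw new Error("value already disposed of or forgotten");
      }
      return value;
    },
    // Call the finalizer now and return its result
    dispose() {
      return finalizer(take());
    },
    // Return the raw value WITHOUT calling the finalizer
    forget() {
      return take();
    }
  };
}

// Stand-in for the libc |close| function: records what it was asked to close.
const closed = [];
function fakeClose(fd) {
  closed.push(fd);
  return 0;
}

const a = makeToyFinalizer(3, fakeClose);
const disposeResult = a.dispose(); // Calls fakeClose(3)

const b = makeToyFinalizer(4, fakeClose);
const rawFd = b.forget();          // Does not call fakeClose
console.log(closed, disposeResult, rawFd);
```

In this model, as in the text above, `forget` leaves the caller responsible for closing the raw value by hand.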

This mechanism is, of course, not restricted to file descriptors. It has been used successfully with other data structures, including malloc-allocated strings.

Details and caveat

At the time of writing, JavaScript does not feature finalization and might never do so. There are good reasons for this: finalization considerably complicates the garbage-collector and introduces the possibility of subtle bugs and leaks that the various JS implementors do not want to inflict on their users (if you are curious, two of the main problems are resurrection of dead references and finalization of cyclic data structures).

Consequently, C data finalizers are not full-featured finalizers. Indeed, their main limitation is that the first argument must be a C value and the second argument must be a pointer to a C function. For the above-mentioned reasons, letting users specify any JavaScript function as a finalizer would open a can of worms that nobody really wants to see crawling around.

Also, before using a finalizer, you should be aware that JavaScript garbage-collection is not necessarily deterministic – during the testing phase of CDataFinalizer, we have encountered memory errors caused by developers (ok, I will confess, that was me, sorry guys) making invalid assumptions about just when values would be garbage-collected. Let me emphasize this: any hypothesis you make about when a value is finalized is bound to be regularly false. In other words, C data finalizers should be used as a last line of defense, not as the default mechanism for recovering resources.
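
One way to follow this advice is to release the resource explicitly in a `finally` block and keep the finalizer only as a safety net. The sketch below uses a hypothetical helper, `withResource`, which assumes nothing more than a `dispose()` method on the acquired value. The resource here is a plain-JavaScript stand-in so that the example runs outside a chrome worker.

```javascript
// Hypothetical helper: acquire a resource, use it, and always dispose of it,
// whether |use| returns normally or throws.
function withResource(acquire, use) {
  const resource = acquire();
  try {
    return use(resource);
  } finally {
    resource.dispose(); // Deterministic release, no reliance on the GC
  }
}

// Stand-in resource (an assumption): counts how many times it was disposed of.
let disposed = 0;
function acquireFake() {
  return {
    dispose() {
      disposed++;
    }
  };
}

// Normal path: the result of |use| is returned after disposal
const value = withResource(acquireFake, function () {
  return 42;
});

// Error path: the resource is still disposed of, and the error propagates
let propagated = false;
try {
  withResource(acquireFake, function () {
    throw new Error("boom");
  });
} catch (ex) {
  propagated = true;
}
console.log(value, disposed, propagated);
```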

Still, C data finalizers are a powerful mechanism that makes manipulation of C values with JavaScript much more reliable. Indeed, it is one of the core mechanisms used pervasively by the OS.File library.

Edit: As per Steve Fink’s suggestion, I have emphasized that users should not rely on the behavior of garbage-collection/finalization, and clarified the can of worms.
