Asynchronous file I/O for the Mozilla Platform

October 3, 2012 § 17 Comments

The Mozilla platform has recently been extended with a new JavaScript library for asynchronous, efficient file I/O. With this library, developers of Firefox, Firefox OS and add-ons can easily write code that behaves nicely with respect to the process and the operating system. Please use it, report bugs and contribute.

Off-main thread file I/O

Almost one year ago, Mozilla started Project Snappy. The objective of Project Snappy is to improve, wherever possible, the responsiveness of Firefox, the Mozilla Platform, and now, Firefox OS, based on performance data collected from volunteer users. Thanks to this real-world performance data, we have been able to identify a number of bottlenecks at all levels of Firefox. As it turns out, one of the main bottlenecks is main thread file I/O, i.e. reading from a file or writing to a file from the thread that also runs most of the code of Firefox and its add-ons.


Call for teachers and speakers

September 26, 2012 § 2 Comments

For Mozilla, the 2012-2013 academic year is the year of Firefox OS, the year of Open Web Applications, and the year in which the Mozilla community launches its campaign to free mobile phones and applications!

We need you to teach Open Web Application technologies to French-speaking communities.
The goal of the courses is to train engineering students (or equivalent) in the technologies needed for the open development of open web applications. In particular, we are looking for courses on the following topics:
  • Advanced JavaScript (closures, events, prototypes, iterators/generators, timeouts)

How the courses will run

As far as possible, the courses will be given in French to a MIAGE class at the Université d’Évry, where they will also be recorded. The courses will be made available to the whole French-speaking Mozilla community as part of Mozilla Education. The corresponding materials will themselves be made available under a free license on github.
If you cannot come to Évry but can record courses yourself, do not hesitate to contact us. We will make your courses available to the French-speaking public.
If you have other ideas, do not hesitate to suggest them. For the moment, our recording capacity is limited, but we will do our best to record your course and make it available.

How to take part

  • If you can teach these courses (entirely or in part), follow the links above
  • To propose other courses that you can teach yourself, add a description of your courses at https://github.com/Yoric/Mozilla-Courses/issues (click on “New issue”). Please mention any geographical constraints or specific equipment needs.

What next?

Our goal is to extend this initiative beyond the Paris region and, above all, beyond France. In particular, we hope to organize courses in French-speaking Africa. To be continued in the next episode!

Getting file information with OS.File

July 31, 2012 § 2 Comments

OS.File keeps gaining new features.

Today, let me show you OS.File.stat and OS.File.prototype.stat, two functions used to get information on a file, such as its size, its creation date or its nature.

How to

There are two ways to get information on a file.

The first technique is to simply call OS.File.stat with the path of the file you wish to inspect:

// File sessionstore.js in the user’s profile directory
let path = OS.Path.join(OS.Constants.Path.profileDir, "sessionstore.js");
let stat = OS.File.stat(path);

This returns an OS.File.Info object containing all the interesting information on the file.

if (stat.isDir) {
  dump("This is a directory\n");
} else if (stat.isSymLink) {
  dump("This is a symbolic link\n");
}
dump("The file contains " + stat.size + " bytes\n");
dump("The file was created at " + stat.creationDate + "\n");
dump("The file was last accessed at " + stat.lastAccessDate + "\n");
dump("The file was last modified at " + stat.lastModificationDate + "\n");

Additionally, under Unix, some security information is available:

if ("unixOwner" in OS.File.Info.prototype) {
  dump("The file belongs to user " + stat.unixOwner +
    " in group " + stat.unixGroup +
    " and has mode " + stat.unixMode);
}

That’s it.

The second technique will let you get information on a file that is already opened:

let file = OS.File.open(path);
let stat = file.stat();

The result is exactly the same. Of course, file.stat() is faster if you have already opened the file, while OS.File.stat(path) is faster than opening the file, calling file.stat() and then closing it.

Exercise

Let’s put OS.File.stat and OS.File.DirectoryIterator to good use for getting the list of all files in a directory, ordered by last modification date.

function sortedEntries(path) {
  // Get the list of all files in directory
  let iterator = new OS.File.DirectoryIterator(path);
  let entries;
  try {
    entries = [entry for (entry in iterator)];
  } finally {
    iterator.close();
  }

  // If we are under Windows, we have all information in entries already
  // We can make this happen without any further I/O
  if ("winLastModificationDate" in OS.File.DirectoryIterator.prototype) {
    return entries.sort(function compare(x, y) {
      return x.winLastModificationDate - y.winLastModificationDate;
    });
  } else {
    // On other systems, we have to call stat before we can order
    // (|for each| walks the values of an array, |for ... in| only its indices)
    let sortable = [{entry: entry, stat: OS.File.stat(entry.path)} for each (entry in entries)];
    // Array comprehension is cool
    let sorted = sortable.sort(function compare(x, y) {
      return x.stat.lastModificationDate - y.stat.lastModificationDate;
    });
    return [x.entry for each (x in sorted)];
  }
}

Note that OS.File.DirectoryIterator does not return special files “.” and “..”.

For bonus points, let’s do the same, but only for non-directory files in the directory:

function nonDirectoryEntries(path) {
  // Get the list of all files in directory
  let iterator = new OS.File.DirectoryIterator(path);
  try {
    for (let entry in iterator) {
      if (!entry.isDir) {
        // Generators are cool, too
        yield entry;
      }
    }
  } finally {
    iterator.close();
  }
}

function sortedEntries(path) {
  // Get the list of all non-directory files in directory
  let entries = nonDirectoryEntries(path);
  if ("winLastModificationDate" in OS.File.DirectoryIterator.prototype) {
    // ... as above
  } else {
    // ... as above
  }
}

We could of course remove directories after sorting, but removing them first saves both computation time (we sort a shorter array) and I/O (on non-Windows platforms, we only need to call stat on a smaller set of files).
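
To make the saving concrete, here is a minimal sketch in standard JavaScript (runnable in Node.js, so it is not OS.File code). The entry objects and the counting stat function are hypothetical stand-ins: the stat function only records how often it is called, where real code would call OS.File.stat.

```javascript
// Hypothetical directory listing: plain objects standing in for entries.
const entries = [
  { path: "/tmp/a.txt", isDir: false, mtime: 30 },
  { path: "/tmp/sub",   isDir: true,  mtime: 10 },
  { path: "/tmp/b.txt", isDir: false, mtime: 20 },
  { path: "/tmp/sub2",  isDir: true,  mtime: 40 },
];

// Builds a stand-in for a stat call that counts its invocations.
function makeCountingStat() {
  const counter = { calls: 0 };
  function stat(entry) {
    counter.calls++;
    return { lastModificationDate: entry.mtime };
  }
  return { stat, counter };
}

// Pairs each entry with its stat result, then sorts by modification date.
function sortByDate(list, stat) {
  return list
    .map((entry) => ({ entry, info: stat(entry) }))
    .sort((x, y) => x.info.lastModificationDate - y.info.lastModificationDate)
    .map((x) => x.entry);
}

// Strategy 1: sort everything, then drop directories.
const late = makeCountingStat();
const sortedThenFiltered = sortByDate(entries, late.stat).filter((e) => !e.isDir);

// Strategy 2: drop directories first, then sort what is left.
const early = makeCountingStat();
const filteredThenSorted = sortByDate(entries.filter((e) => !e.isDir), early.stat);

console.log(late.counter.calls, early.counter.calls); // 4 2
```

Both strategies return the same two files in the same order; only the number of simulated stat calls differs.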

Homework

As a Programming Language guy, I see an opportunity to develop this API into a nice Domain Specific Language that would let developers formulate queries and would let the engine generate OS-optimized functions to execute these queries.

For instance:

OS.File.Query.SelectFromDir().
  where({isDir: false}).
  sortedBy({lastModificationDate: true})
  // returns the above function, including the optimizations

OS.File.Query.SelectFromDir().
  where({path: /.*\.tmp$/, isSymLink:false}).
  sortedBy({creationDate: true})

I do not have plans to implement anything of the sort at the moment, but this sounds like a nice student project. If you are interested, do not hesitate to drop me a line.
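
As a starting point for such a project, here is a minimal sketch of what the front end of a query builder could look like, in standard JavaScript runnable in Node.js. Everything in it is hypothetical: selectFromEntries, compile and the shape of the entries are invented for illustration, and the sketch works on an in-memory array, so it performs none of the OS-specific optimizations discussed above.

```javascript
// Hypothetical query builder; none of these names exist in OS.File.
function selectFromEntries() {
  const filters = [];
  let sortKey = null;
  const query = {
    // Keep entries whose fields match the pattern: a RegExp is tested
    // against the field, any other value is compared with ===.
    where(pattern) {
      for (const [key, expected] of Object.entries(pattern)) {
        filters.push((entry) =>
          expected instanceof RegExp
            ? expected.test(entry[key])
            : entry[key] === expected);
      }
      return query;
    },
    // Sort by the first key of the spec; true means ascending order.
    sortedBy(spec) {
      const [key, ascending] = Object.entries(spec)[0];
      sortKey = { key, ascending };
      return query;
    },
    // "Compile" the query into a plain function over an array of entries.
    compile() {
      return (entries) => {
        const kept = entries.filter((e) => filters.every((f) => f(e)));
        if (sortKey) {
          const sign = sortKey.ascending ? 1 : -1;
          kept.sort((x, y) => sign * (x[sortKey.key] - y[sortKey.key]));
        }
        return kept;
      };
    },
  };
  return query;
}

const tmpFiles = selectFromEntries()
  .where({ path: /\.tmp$/, isSymLink: false })
  .sortedBy({ creationDate: true })
  .compile();

const result = tmpFiles([
  { path: "b.tmp", isSymLink: false, creationDate: 2 },
  { path: "a.txt", isSymLink: false, creationDate: 1 },
  { path: "c.tmp", isSymLink: true,  creationDate: 0 },
  { path: "a.tmp", isSymLink: false, creationDate: 1 },
]);
console.log(result.map((e) => e.path)); // [ 'a.tmp', 'b.tmp' ]
```

The interesting part of the project would be the compile step: deciding, per platform, which fields come for free with the directory entry and which require an extra stat call.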

That’s all folks.

In the next entries of this blog, I expect to introduce, in no particular order:

  • path manipulation with OS.File;
  • reading/writing with encodings in OS.File;
  • off-main-thread async I/O for the main thread;
  • benchmarks.

OS.File: Iterating through a directory

July 17, 2012 § 2 Comments

Since its first landing, OS.File has been steadily gaining new features. Today, let me show you OS.File.DirectoryIterator. As its name implies, this class serves to iterate through the contents of a directory.

How to

Iterating through a directory is quite simple:

let iterator = new OS.File.DirectoryIterator("/tmp");
try {
  for (let entry in iterator) {
    // Do something with the entry.
  }
} finally {
  iterator.close(); // Release system resources as soon as possible
}

As usual with OS.File, calling iterator.close() is not strictly necessary, but is a good habit to take, as it releases critical resources immediately.
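
If you find yourself repeating this try/finally pattern, it can be wrapped in a small helper. The sketch below is standard JavaScript runnable in Node.js; withResource and the fake iterator are hypothetical names used for illustration, not part of OS.File.

```javascript
// Runs `use` on a freshly opened resource and always closes it afterwards,
// whether `use` returns normally or throws.
function withResource(open, use) {
  const resource = open();
  try {
    return use(resource);
  } finally {
    resource.close();
  }
}

// Hypothetical stand-in for a directory iterator: it only records
// whether close() has been called.
function openFakeIterator() {
  return {
    closed: false,
    names: ["a.txt", "b.txt"],
    close() { this.closed = true; },
  };
}

let seen = null;
const count = withResource(openFakeIterator, (iterator) => {
  seen = iterator;
  return iterator.names.length;
});
console.log(count, seen.closed); // 2 true
```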

Of course, should you need all the entries for future consumption, you can place them all in an array as follows:

let array = [entry for (entry in iterator)]; // Array comprehensions in JS, at last!

Each entry contains the available information about one file:

for (let entry in iterator) {
  // Checking the type of the entry
  if (entry.isDir) {
    console.log(entry.name, "is a directory");
  } else if (entry.isLink) {
    console.log(entry.name, "is a link");
  } else {
    console.log(entry.name, "is a regular file");
  }

  // Getting the full path to the entry
  console.log("Full path", entry.path);
}

One of the main design goals of OS.File is to be I/O efficient. Here, this means that during the iteration, the library will perform exactly one I/O call per entry, with one additional call for opening the iterator and one for closing. With respect to I/O, getting the type, name and path of the file is free.

Under Windows, a few additional pieces of information are available, also for free. As usual with OS.File, OS-specific features are prefixed: winCreationTime, winLastWriteTime and winLastAccessTime.

For instance, to list the creation times of entries under Windows:

for (let entry in iterator) {
  if ("winCreationTime" in entry) {
    console.log("The file was created at", entry.winCreationTime);
  }
}

Or, to sort a list of entries by creation time:

let entries = [entry for (entry in iterator)];
if (entries.length > 0 && "winCreationTime" in OS.File.DirectoryIterator.Entry.prototype) {
  entries = entries.sort(function(a, b) {
    return a.winCreationTime - b.winCreationTime;
  })
}

If you wonder why we introduced fields winCreationTime et al. and not a cross-platform field creationTime, recall that, for the sake of I/O efficiency, each entry only contains the information returned by one single I/O call. As the Windows call returns more information than the Unix version, an entry under Windows offers more information than under Unix.

Finally, the Windows back-end offers an additional feature: iterating through only the subset of the entries of the directory matching some pattern (a wildcard pattern such as "*.tmp", rather than a full regular expression). As usual, since the feature is Windows only, it is prefixed by win.

let iterator = new OS.File.DirectoryIterator("C:\\System\\TEMP",
    /*platform-specific options*/ { winPattern: "*.tmp" } );
// ... do something with that iterator

FAQ

What’s this I/O efficiency?

The two main goals with OS.File are:

  • provide off-main-thread I/O;
  • be I/O efficient.

I/O efficiency is all about minimizing the number of actual I/O calls. This is critical because some platforms have extremely slow storage (e.g. smartphones, tablets) and because, regardless of the platforms, doing too much I/O penalizes not just your application but potentially all the applications running on the system, which is quite bad for the user experience. Finally, I/O is often expensive in terms of energy, so wasting I/O is wasting battery.

Consequently, one of the key design choices of OS.File is to provide operations that are low-level enough that they do not hide any I/O from the developer (which could cause the developer to perform more I/O than they think) and, since not all platforms have the same features, to offer system-specific information that the developer can use to optimize their algorithms for a platform.

How does OS.File compare to Node.js I/O in terms of I/O-efficiency?

OS.File is designed for efficient off-main-thread I/O. For the moment, OS.File does not provide an asynchronous API that can be used from the main thread, although we are working on fixing this.

By contrast, Node.js low-level I/O is designed to mirror a subset of an old version of Posix, and provides both a synchronous and an asynchronous API on top of these calls.

The choice made by Node.js works well on the platforms for which Node.js is generally targeted (e.g. Unix-based servers) but we need something better to cope with the platforms for which Firefox and Firefox OS are targeted (e.g. not only Unix but also Windows machines, as well as battery-powered devices with slow storage, etc.).

How does directory iteration compare to Node.js directory iteration?

Node.js provides a primitive readdir to iterate through a directory. This primitive returns an array of file names. The implementation of this primitive already costs about n I/O calls, where n is the number of files in the directory.

Consequently, the algorithm to determine which entries of a directory are subdirectories costs

  • about n I/O calls to establish the list of entries; then
  • about n I/O calls to determine which are subdirectories.

This makes walking a directory recursively (to empty it or to copy it to another drive, for instance) twice as expensive as necessary. Note that this measure is very much non-scientific, as the I/O call to determine if an entry is a subdirectory can be much more expensive than the call to list the entry, depending on the OS.

By comparison, the OS.File directory iterator requires about n I/O calls for this purpose.

Similarly, under all platforms, finding the file accessed least recently has a cost of about 2·n I/O calls under Node.js.

With OS.File, the cost is similar to Node.js under Unix, but only n under Windows. Upcoming work with OS.File should also considerably reduce the I/O cost under Linux, Android and Firefox OS.

Finally, for an algorithm that can break from iteration once some condition is met (e.g. checking whether at least one file matches some condition), Node.js will still require n I/O calls, while OS.File generally requires only as many I/O calls as there are entries visited before the condition is met.
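
The difference between the two styles can be illustrated with a small simulation in standard JavaScript (runnable in Node.js). This is a sketch under stated assumptions: the list of names is made up, and each read of an entry is merely counted as one simulated I/O call; no real directory is touched and no OS.File or Node.js file API is used.

```javascript
// Hypothetical directory contents; each read of an entry is counted as
// one simulated I/O call.
const names = ["a.log", "b.tmp", "c.txt", "d.tmp", "e.log"];

// Eager listing, in the style of a readdir-like call: reads everything.
function listAll(counter) {
  return names.map((name) => {
    counter.calls++;
    return name;
  });
}

// Lazy listing, in the style of an iterator: reads one entry per step.
function* iterate(counter) {
  for (const name of names) {
    counter.calls++;
    yield name;
  }
}

// Returns true as soon as one entry satisfies the predicate.
function hasMatch(iterable, predicate) {
  for (const name of iterable) {
    if (predicate(name)) {
      return true; // stop early; a generator does no further work
    }
  }
  return false;
}

const isTmp = (name) => name.endsWith(".tmp");

const eager = { calls: 0 };
const lazy = { calls: 0 };
const foundEager = hasMatch(listAll(eager), isTmp);
const foundLazy = hasMatch(iterate(lazy), isTmp);

console.log(foundEager, eager.calls); // true 5
console.log(foundLazy, lazy.calls);   // true 2
```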

How does this compare to XPCOM nsILocalFile::directoryEntries?

Until now, the only manner of listing entries in a directory on the Mozilla Platform was nsILocalFile::directoryEntries.

Generally, OS.File directory iteration is more convenient to use from JavaScript, can be called off-main thread, and provides more information than nsILocalFile::directoryEntries for the same I/O cost, which makes it more I/O efficient for e.g. iterating a directory looking for a file matching some condition, or walking recursively through a hierarchy.

The drawbacks are that OS.File directory iteration cannot be used from the main thread yet and cannot be called from C++.

HOWTO: Calling into Mozilla C internals from JavaScript

July 13, 2012 § 10 Comments

I have been using js-ctypes to call into libxul from JavaScript, so I figured I should share the recipe.

Several versions ago, Firefox, Thunderbird and other applications using the Mozilla Platform (aka Gecko) started shipping with js-ctypes, a very powerful foreign function interface for JavaScript. Using this mechanism, privileged JavaScript code can call into native libraries. Since then, js-ctypes has been used to implement all sorts of tools, from interaction with MacOS X GUI to direct calls to the Linux kernel to scripting Dalvik, the Android JVM.

One of the interesting and lesser-known features of js-ctypes is that you can also use it to call into Gecko internals. You can use it to script a number of previously inaccessible Gecko features. I personally use it to access the Unicode conversion primitives of Gecko, and to help me with serialization of js-ctypes values, but there is no reason not to use it for other things, such as testing native libxul code from JavaScript.

Caveat: This is a hack. Doing this may void your warranty. Also, if I were reviewing an add-on that depends on this, I would reject it. You have been warned.

Let’s see how. At the time of this writing, the following steps require a recent Nightly build of Firefox/Thunderbird/… but the feature should be available shortly to released versions.

Import js-ctypes and OS.Constants.

Components.utils.import("resource://gre/modules/ctypes.jsm");

Components.classes["@mozilla.org/net/osfileconstantsservice;1"].
 getService(Components.interfaces.nsIOSFileConstantsService).
 init();

If your code is executed in a worker thread, this extract is not required.

If you are running an older version of Firefox (i.e. anything except a recent Nightly), you will need to replace the references to nsIOSFileConstantsService/OS.Constants with more verbose calls to XPCOM, and this will work only on the main thread.

Open libxul

libxul is the native library that contains the Gecko internals. To open it,

let libxul = ctypes.open(OS.Constants.Path.libxul);

Play with it!

You are basically done. Now, you can import any C function (or a number of C++ functions, with a little more effort) to JavaScript. Let’s see how to import function DumpJSStack, for instance. This function is part of the JavaScript Virtual Machine. As its name implies, it prints the JavaScript stack. Not very useful for us at this stage, as JavaScript can print its own stack quite easily, but a good way to test our toy.

The definition of DumpJSStack is the following:

extern "C" void DumpJSStack()

In other words, this function takes no argument, returns nothing and is designed to be called from C (or from js-ctypes). This makes it an ideal candidate to import it with js-ctypes:

let dumpJSStack = libxul.declare("DumpJSStack",
  ctypes.default_abi,
  /*return*/ ctypes.void_t);

We can now use the function as if it were a regular JS function:

dumpJSStack();

(prints some information about how you got to this point – works only from the main thread)

Play with it: printf_stderr

We can similarly import function printf_stderr, which is often quite useful for debugging:

extern "C" void printf_stderr(char* fmt, ...);

To import it:

let printf_stderr = libxul.declare("printf_stderr",
 ctypes.default_abi,
 /*return*/ ctypes.void_t,
 /*fmt*/ ctypes.char.ptr,
 /* ... */ "...");

More features

There are many more features that can be accessed through libxul and that go beyond the scope of this blog entry. Let us simply list a few of them.

What about:

  • walking the native stack using NS_StackWalk? see file nsStackWalk.h for more details;
  • interacting with other processes using PR_CreateProcess, PR_WaitProcess, PR_KillProcess? see prprocess.h for more details;
  • using the built-in cryptographic features of the Mozilla Platform? see nss/* for more details;

and certainly more…

Note that, in contrast to XPCOM-based calls, most js-ctypes based calls work from all threads.

(re)introducing OS.File

June 27, 2012 § 6 Comments

OS.File is a new JavaScript library available to Firefox and Thunderbird developers and add-on developers. This library offers efficient, low-level, backgrounded, interaction with the file system, with a number of primitives to take advantage of the specific features of each platform. It is also a nice example of systems programming in JavaScript. Please use it, look at the code, and please report bugs and missing features.

(re)Introducing OS.File

A considerable aspect of our work, at Mozilla, is to ensure that the user experience is smooth and responsive. One of the main tools available to developers to permit such responsive code is multi-threading: any computation or interaction with the system that takes too long can (and should) be pushed into the background, and should interact asynchronously with the user interface.

Now, one of the critical bottlenecks in any application is I/O: accessing the disk (or the network, or the database…) is typically orders of magnitude slower than any in-memory operation – plus it can sometimes disrupt the user experience of the complete system. This is true on desktop systems and this is even more true on smartphones and tablets.

What this means is that we need a nice library to perform I/O, and by nice, I mean:

  • I/O should be backgrounded;
  • the number of I/O operations should be carefully controlled.

This is what OS.File is all about: OS.File is a library available to developers (including add-on developers) on the Mozilla platforms (Firefox, Thunderbird, Songbird, InstantBird, Boot-to-Gecko, etc.). This library is available (only) to JavaScript, and it offers low-level access to the file system, available to background threads.

As its name implies, OS.File is a system library, not a web library, so web application developers will not have access to it.

A first usable version of OS.File landed a few days ago and is now available in Nightly builds of Mozilla Platform applications. We are progressively working on adding features, and I would like to invite all developers who need to do I/O to try it, report any bugs and request any features they need.

Using OS.File

OS.File offers both a cross-platform API (module OS.File itself) and bindings to platform-specific functions (modules OS.Win.File and OS.Unix.File), as well as utilities for system programming (modules OS.Shared and OS.Constants). In this post, I will only discuss module OS.File itself.

By design, in this first delivery, module OS.File is quite minimalistic. Features will be added progressively (see next section). You can find the documentation of OS.File on MDN, as usual.

For the moment, module OS.File can be used only from a chrome worker (i.e. a privileged JavaScript background thread).

Renaming a file


OS.File.move("a.tmp", "b.tmp");

In case of error, this will raise an exception of type OS.File.Error.

Copying a file, handling errors, options


try {
  OS.File.copy("b.tmp", "c.tmp", {noOverwrite: true});
} catch(ex) {
  if (ex.becauseNoSuchFile) {
    // b.tmp does not exist
  } else if (ex.becauseFileExists) {
    // c.tmp exists and we do not want to overwrite it
  }
}
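
The because… properties make it possible to branch on the cause of a failure without comparing platform-specific error codes by hand. Here is a minimal sketch of that pattern in standard JavaScript (runnable in Node.js). It is not how OS.File.Error is implemented: the FileError class, the string codes and the in-memory copy function are hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical error type; the class and its string codes are illustrative.
class FileError extends Error {
  constructor(operation, code) {
    super(operation + " failed with code " + code);
    this.operation = operation;
    this.code = code;
  }
  get becauseNoSuchFile() { return this.code === "ENOENT"; }
  get becauseFileExists() { return this.code === "EEXIST"; }
}

// In-memory stand-in for a copy operation: `files` maps names to contents.
function copy(files, source, dest, options = {}) {
  if (!files.has(source)) {
    throw new FileError("copy", "ENOENT");
  }
  if (options.noOverwrite && files.has(dest)) {
    throw new FileError("copy", "EEXIST");
  }
  files.set(dest, files.get(source));
}

// Handles the two expected failures and rethrows anything else.
function describeCopy(files, source, dest) {
  try {
    copy(files, source, dest, { noOverwrite: true });
    return "copied";
  } catch (ex) {
    if (ex.becauseNoSuchFile) return "source missing";
    if (ex.becauseFileExists) return "destination exists";
    throw ex;
  }
}

const files = new Map([["b.tmp", "data"]]);
console.log(describeCopy(files, "b.tmp", "c.tmp")); // copied
console.log(describeCopy(files, "b.tmp", "c.tmp")); // destination exists
console.log(describeCopy(files, "x.tmp", "y.tmp")); // source missing
```

Note the final throw ex: errors that are neither of the two expected cases are propagated rather than silently swallowed.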

Open a file, read a prefix


let buffer = new ArrayBuffer(12); // Also works with a js-ctypes C pointer
let file;
try {
  file = OS.File.open("myfile.tmp"); // No options: open for reading
  let bytes = file.read(buffer, 12);
  // Do something with these bytes
  // ...
} finally {
  if (file) {
    file.close();
  }
}

Open a file for writing


let file = OS.File.open("myfile.tmp", {create:true}); // Fail if the file already exists

Note that this operation will only require one I/O interaction with the operating system – this is much faster than first checking whether the file already exists, and then creating it if it does not.
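
The same idea exists outside the Mozilla Platform. As a point of comparison, the sketch below uses the Node.js fs module rather than OS.File (an assumption made only so that the example can be run as is): the "wx" flag asks the operating system to create the file and to fail if it already exists, in a single call. The helper name createExclusive and the temporary directory are illustrative.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Creates the file only if it does not exist yet, in one open call.
// Returns { created, fd }; fd is null when the file was already there.
function createExclusive(filePath) {
  try {
    const fd = fs.openSync(filePath, "wx");
    return { created: true, fd };
  } catch (ex) {
    if (ex.code === "EEXIST") {
      return { created: false, fd: null };
    }
    throw ex; // any other failure is unexpected here
  }
}

// Work in a throwaway directory so that the example leaves no trace.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "osfile-sketch-"));
const target = path.join(dir, "myfile.tmp");

const first = createExclusive(target);  // the file does not exist yet
const second = createExclusive(target); // now it does

if (first.fd !== null) {
  fs.closeSync(first.fd);
}
fs.rmSync(dir, { recursive: true, force: true });

console.log(first.created, second.created); // true false
```

Besides saving one I/O call, letting the operating system perform the check avoids the window between "check" and "create" in which another process could create the file.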

Open a file with OS-specific options


let file = OS.File.open("myfile.tmp",
  {create:true},
  {unixMode: OS.Constants.libc.S_IRWXU | OS.Constants.libc.S_IRWXG }
);

Short FAQ

What’s good about OS.File?

  • Finally, file I/O for JavaScript workers.
  • An API much more JavaScript-friendly than what already existed in the Mozilla Platform.
  • Options and low-level functions to ensure that we perform a minimal amount of actual I/O.

Wasn’t all that already possible?

The existing I/O libraries on the Mozilla Platform could not be used from background threads. Some functions could be backgrounded, but only very few of them.

JavaScript-friendly wrappers had been written around these libraries, but they only covered a few of the features of these libraries, in addition to which they could not be used from background threads either.

How is OS.File implemented?

OS.File is implemented in pure JavaScript, using the (very nice) js-ctypes library to perform calls to the OS APIs.

Why JavaScript and not C++?

Because we want the code to be easily accessible to the community.

Isn’t that slow?

Well, firstly, JavaScript has grown into a very fast language. These days, expecting without benchmarks that C++ is faster than JavaScript on hot code can cause surprises.

In addition, writing the library in C++ would have meant that we needed to cross language barriers quite often, which is bad for performance, due to:

  • complex memory management;
  • bad JIT-ability; and
  • need to convert all data structures, in particular strings.

We attempt to avoid this as much as possible.

For the moment, however, OS.File has not been benchmarked. We await real-world applications.

Work in progress

We are currently hard at work extending OS.File. The next few landings should add:

Features are driven by application requirements, so if you need some other feature, please do not hesitate to contact me on IRC or to file a bug on Bugzilla.

Fun with Windows paths.

June 19, 2012 § 3 Comments

I am currently attempting to implement a JavaScript library to handle file system paths in a portable manner.

Right now, I am having lots of fun with Windows paths and I wanted to share a few tidbits.

Under Windows, a path name can look like:

  1. “\\?\drivename:” followed by backslash-separated components.
    Such paths can be either relative or absolute.
    In such paths, “.”, “..” and “/” are regular file names.
  2. “\\.\drivename:” followed by backslash-separated components.
    Such paths can be either relative or absolute.
    In such paths, “.”, “..” and “/” are special names.
  3. “\\?\UNC\servername” followed by backslash-separated components.
    Such paths can only be absolute.
    In such paths, “.”, “..” and “/” are regular file names.
  4. “\\servername” followed by slash- or backslash-separated components.
    Such paths can only be absolute.
    In such paths, “.”, “..” and “/” are special names.
  5. “drivename:” followed by slash- or backslash-separated components.
    Such paths can be either relative or absolute.
    In such paths, “.”, “..” and “/” are special names.
  6. Just a series of slash- or backslash-separated components.
    Such paths can be either relative or absolute.
    In such paths, “.”, “..” and “/” are special names.
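
To give an idea of what a library has to do before it can even start manipulating a path, here is a minimal sketch of a classifier for the six schemes listed above, in standard JavaScript runnable in Node.js. The function name and the shape of its result are hypothetical, and the sketch deliberately ignores the other complications mentioned in this post (drive name formats, length limits, reserved names, case sensitivity); it also only recognizes backslashes in the leading prefixes.

```javascript
// Classifies a Windows path into one of the six schemes listed above.
// The function name and the returned shape are illustrative only.
function classifyWindowsPath(path) {
  let scheme;
  if (path.startsWith("\\\\?\\UNC\\")) {
    scheme = 3; // \\?\UNC\servername followed by components
  } else if (path.startsWith("\\\\?\\")) {
    scheme = 1; // \\?\drivename: followed by components
  } else if (path.startsWith("\\\\.\\")) {
    scheme = 2; // \\.\drivename: followed by components
  } else if (path.startsWith("\\\\")) {
    scheme = 4; // \\servername followed by components
  } else if (/^[^\\/:]+:/.test(path)) {
    scheme = 5; // drivename: followed by components
  } else {
    scheme = 6; // just a series of components
  }
  // Only the \\?\ schemes treat ".", ".." and "/" as regular file names.
  const dotsAreSpecial = scheme !== 1 && scheme !== 3;
  return { scheme, dotsAreSpecial };
}

console.log(classifyWindowsPath("\\\\?\\C:\\a\\..\\b"));   // { scheme: 1, dotsAreSpecial: false }
console.log(classifyWindowsPath("\\\\server\\share\\x")); // { scheme: 4, dotsAreSpecial: true }
console.log(classifyWindowsPath("C:/a/../b"));             // { scheme: 5, dotsAreSpecial: true }
console.log(classifyWindowsPath("a\\b"));                  // { scheme: 6, dotsAreSpecial: true }
```

The order of the tests matters: the “\\?\UNC\” prefix has to be checked before “\\?\”, which in turn has to be checked before the plain “\\” prefix.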

To simplify things further, depending on the version of Windows, a drive name can be:

  • only one letter between A and Z;
  • any sequence of letters between A and Z;
  • something that looks like Volume{41AF5D4F-04CC-4D15-9389-734BD6F52A7E}.

Also

  • if a path starts with “\\?\”, its length is limited to 32,767 chars;
  • otherwise, its length is limited to 260 chars.

Also

  • some names such as “LPT”, “COM”, etc. are reserved and cannot be used as file names;
  • … unless your path starts with “\\”.

Also

  • paths are case-insensitive;
  • … except when they are case-sensitive because of the disk format;
  • … except when they are case-sensitive because of something else.

Fortunately, the Windows API provides the following functions to simplify matters:

  • PathCanonicalize (completely broken);
  • GetFullPathName (broken);
  • GetLongPathName (requires access permissions just to tell you if a path is well-formatted);
  • UriCanonicalize (not sure what it does exactly, I haven’t tested it yet).

Of course, not all Windows API functions accept all schemes.

As you can imagine, I am having lots of fun.

Quick exercise: given two paths A and B (either absolute or relative), how do you determine the path obtained by concatenating A and B?

If you are interested in following my progress, details are on bugzilla.

C data finalization – in JavaScript

May 2, 2012 § 4 Comments

A few iterations ago, the Mozilla Platform introduced js-ctypes, a very nice Foreign Function Interface (FFI) for JavaScript. Like its inspiration, Python’s ctypes, js-ctypes lets (privileged) JavaScript code open native libraries, import their functions and call these functions almost as if they were regular JavaScript functions.

Here is an example using the Unix libc:

// Open the C library
let libcCandidates = [
  'libSystem.dylib',// MacOS X
  'libc.so.6',      // Linux
  'libc.so'         // Android, B2G
];
let libc;
for each(let candidate in libcCandidates) {
  try {
    // ctypes.open throws if the library cannot be loaded
    libc = ctypes.open(candidate);
    break;
  } catch (ex) {
    // Not the library of this platform, try the next candidate
  }
}

// Import some functions from libc
let open = libc.declare("open", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*const char* path*/ ctypes.char.ptr,
  /*int oflag*/ ctypes.int,
  /*int mode*/ ctypes.int);
let read = libc.declare("read", ctypes.default_abi,
  /*return ssize_t*/ ctypes.ssize_t,
  /*int fildes*/ ctypes.int,
  /*void *buf*/ ctypes.voidptr_t,
  /*size_t nbytes*/ ctypes.size_t);
let close = libc.declare("close", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*int fd*/ ctypes.int);

// Now use them
let myfile = open("/etc/passwd", 0, 0);
if (myfile == -1)
  throw new Error("Could not open file");
// ...

If you are familiar with XPConnect, the mechanism generally used in the Mozilla Platform for letting JavaScript and C++ interact, you can see that using js-ctypes to call native code directly is much nicer than adding a C++ XPCOM/XPConnect layer. From what I hear, it seems to be also much faster, as XPConnect needs to perform expensive magic to ensure that memory is properly passed between JavaScript and C++. In addition, this selfsame memory magic now prevents XPConnect from being executed from threads other than the main thread, which makes js-ctypes the only manner of doing any system access from worker threads.

Now, js-ctypes nicely solves the issue of calling native code from JavaScript. However, JavaScript and C are very different languages, with very different paradigms, so getting them to coexist requires a little more than simply the ability to place calls or convert values. In particular, C has:

  • manual resource management (memory must be released, file descriptors must be closed, locks must be released, etc.);
  • no language-level mechanism for error management (a task smaller than a process cannot be killed because of an error).

By contrast, JavaScript has:

  • automated memory management, but no support for automatically managing resources other than memory (no user-level finalization or scoped resources mechanism);
  • several language/vm-level mechanisms that can kill a task in non-trivial manners (exceptions, “this script is busy”, etc.)

Unfortunately, putting all of this together makes it quite difficult to write JavaScript code that manipulates C resources without leaking. Such leaks can cause both performance issues (memory leaks, in particular, tend to slow down the whole system) and hard-to-track errors (leaking file descriptors can prevent the application from opening any new file, or, under Windows, can prevent the application from reopening some files that were improperly closed, while leaking locks can completely freeze an application).

Introducing C data finalization

For this reason, we have recently added a new feature to js-ctypes, designed to add automated resource management to JavaScript: C data finalization.

Specifying a finalizer is simple:

function openfile(path, flags, mode) {
  let fd = open(path, flags, mode);
  if (fd == -1) {
    throw new Error("Could not open file " + path);
  }
  return ctypes.CDataFinalizer(fd, close);
}

What this code does is ensure that, whenever the file descriptor is garbage-collected, function close is called, releasing the C resources represented by that file descriptor. This value is C data with a finalizer, aka CDataFinalizer.

You can use it just as you would use the C data through js-ctypes:

let myfile = openfile("/etc/passwd", 0, 0);
let result = read(myfile, myarray, 4096); // Read some data
// Wherever required, |myfile| is automatically converted to
// the underlying integer value.
// Once |myfile| has no reference, it will (eventually) be
// closed.

It is, of course, possible (and strongly recommended) to close the file manually to ensure that resources are immediately available for the process and the rest of the system:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let result = myfile.dispose(); // This calls |close|.

// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.

Or, an equivalent but more verbose solution, using forget:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let fd = myfile.forget();
// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.
let result = close(fd);

This mechanism is, of course, not restricted to file descriptors. It has been applied with success to other data structures, including malloc-allocated strings.

Details and caveats

JavaScript does not feature finalization and might never do so. There are good reasons for this: finalization considerably complicates the garbage-collector and introduces the possibility of subtle bugs and leaks that the various JS implementors do not want to inflict on their users (if you are curious, two of the main problems are resurrection of dead references and finalization of cyclic data structures).

Consequently, C data finalizers are not full-featured finalizers. Indeed, the main limitation of a C data finalizer is that its first argument must be a C value and its second argument must be a pointer to a C function. For the above-mentioned reasons, letting users specify any JavaScript function as a finalizer would open a can of worms that nobody really wants to see crawling around.

Also, before using a finalizer, you should be aware that JavaScript garbage-collection is not necessarily deterministic – during the testing phase of CDataFinalizer, we have encountered memory errors caused by developers (ok, I will confess, that was me, sorry guys) making invalid assumptions about just when values would be garbage-collected. Let me emphasize this: any hypothesis you make about when a value is finalized is bound to be regularly false. In other words, C data finalizers should be used as a last line of defense, not as the default mechanism for recovering resources.

Still, C data finalizers are a powerful mechanism that makes manipulation of C values with JavaScript much more reliable. Indeed, it is one of the core mechanisms used pervasively by the OS.File library.

edit As per Steve Fink’s suggestion, I have emphasized that users should not rely on the behavior of garbage-collection/finalization, and clarified the can of worms.

Scoped resources for all

April 12, 2012 § 2 Comments

A small class hierarchy has been added to MFBT, the “Mozilla Framework Based on Templates”, which contains some of the core classes of the Mozilla Platform. This hierarchy introduces general-purpose and specialized classes for scope-based resource management. When it applies, scope-based resource management is both faster than reference-counting and closer to the semantics of the algorithm, so you should use it :)

The codebase of Mozilla is largely written in C++. While C++ does not offer any form of automatic memory management, the (sometimes scary) flexibility of the language has allowed numerous projects to customize the manner in which memory and other resources are managed, and Mozilla is no exception. Largely, the Mozilla C++ codebase uses reference-counting, to provide automatic memory management in most cases.

While reference-counting is quite imperfect, and while future versions of Mozilla might possibly use other forms of memory management, it is also a very useful tool for such a large codebase. However, in some cases, reference-counting is just too much. Indeed, in a number of simple cases, we prefer the simpler mechanism of scope-based resource management, which is more predictable, faster and more resource-efficient, at the cost of not being able to scale up to the more complex cases, for which reference-counting or even more powerful mechanisms are much better suited.

Scope-based resource management is designed to handle resources that should be cleaned-up as soon as you leave a given scope (typically, the function), regardless of how you leave it (by reaching the end, with a break, a return or an exception).

The following extract illustrates the use of scoped resource allocation:

// returns true in case of success, false in case of error
bool copy(const char* sourceName, const char* destName, size_t bufSize) {
   ScopedFD source(open(sourceName, O_RDONLY));
   if (source.get() == -1) return false;

   ScopedFD dest(open(destName, O_WRONLY|O_CREAT, 0600));
   if (dest.get() == -1) return false;
     // source is closed automatically

   ScopedDeleteArray<char> buf(new char[bufSize]);
   if (buf.get() == NULL) return false;
     // source, dest are closed automatically

   while (true) {
     const int bytesRead = read(source.get(), buf.rwget(), bufSize);
     if (bytesRead == 0) break;
     if (bytesRead == -1) return false;
       // source, dest, buf are cleaned-up

     int writePos = 0; // not const: advanced after each partial write
     while (writePos < bytesRead) {
       // resume after the bytes that have already been written
       const int bytesWritten = write(dest.get(), buf.get() + writePos,
                                      bytesRead - writePos);
       if (bytesWritten == -1) return false;
         // source, dest, buf are cleaned-up
       writePos += bytesWritten;
     }
   }

   return true;
      // source, dest, buf are cleaned-up
}

As you can see, the main point of these scope-based resource management classes is that they are cleaned up automatically both in case of success and in case of error. In some cases, we wish to clean up resources only in case of error, as follows:

// returns -1 in case of error, the destination file descriptor in case of success
int copy(const char* sourceName, const char* destName, size_t bufSize) {
   ScopedFD source(open(sourceName, O_RDONLY));
   if (source.get() == -1) return -1;

   ScopedFD dest(open(destName, O_WRONLY|O_CREAT, 0600));
   if (dest.get() == -1) return -1;
      // source is closed automatically

   ScopedDeleteArray<char> buf(new char[bufSize]);
   if (buf.get() == NULL) return -1;
      // source, dest are closed automatically

   while (true) {
     const int bytesRead = read(source.get(), buf.rwget(), bufSize);
     if (bytesRead == 0) break;
     if (bytesRead == -1) return -1;
      // source, dest, buf are cleaned-up

     int writePos = 0; // not const: advanced after each partial write
     while (writePos < bytesRead) {
       // resume after the bytes that have already been written
       const int bytesWritten = write(dest.get(), buf.get() + writePos,
                                      bytesRead - writePos);
       if (bytesWritten == -1) return -1;
        // source, dest, buf are cleaned-up
       writePos += bytesWritten;
     }
   }

   return dest.forget();
   // source and buf are cleaned-up, not dest
}

While both examples could undoubtedly be implemented with reference-counting or without any form of automated resource management, this would either make the source code much more complex and harder to maintain (for purely manual resource management) or make the executable slower and less explicit in terms of ownership (for reference-counting). In other words, scope-based resource management is the right choice for these algorithms.
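To give an idea of what "purely manual resource management" costs, here is a rough sketch of the first copy example written without any scoped class. The name copyManually is hypothetical, the code assumes a POSIX system, and I have added O_TRUNC so that repeated runs overwrite the destination. It is one possible way of writing it, not code from the Mozilla tree.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical manual counterpart of the first copy() example.
// returns true in case of success, false in case of error
bool copyManually(const char* sourceName, const char* destName, size_t bufSize) {
  assert(sourceName != NULL && destName != NULL);

  const int source = open(sourceName, O_RDONLY);
  if (source == -1) return false;

  const int dest = open(destName, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (dest == -1) {
    close(source); // every early exit must remember what is already open
    return false;
  }

  char* buf = new char[bufSize];
  bool ok = true;
  while (ok) {
    const ssize_t bytesRead = read(source, buf, bufSize);
    if (bytesRead == 0) break;                  // end of file
    if (bytesRead == -1) { ok = false; break; } // fall through to cleanup

    ssize_t writePos = 0;
    while (writePos < bytesRead) {
      const ssize_t bytesWritten = write(dest, buf + writePos,
                                         bytesRead - writePos);
      if (bytesWritten == -1) { ok = false; break; }
      writePos += bytesWritten;
    }
  }

  // Cleanup must be repeated, in the right order, on every path that
  // reaches the end of the function.
  delete[] buf;
  close(dest);
  close(source);
  return ok;
}
```

The control flow has to be bent around the cleanup code (a status flag, breaks instead of returns), and adding a fourth resource means revisiting every exit. That is exactly the bookkeeping that the scoped classes take over.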

The Mozilla codebase already offered a few classes for scope-based resource management. Unfortunately, these classes were scattered throughout the code, some of them were specific to some compilers, and they were generally not designed to be reusable.

We have recently started consolidating these classes into a simple and extensible hierarchy of classes. If you need them, you can find the root of this hierarchy, as well as the most commonly used classes, on mozilla-central, as part of the MFBT:

  • ScopedFreePtr<T> is suited to deallocate C-style pointers allocated with malloc;
  • ScopedDeletePtr<T> is suited to deallocate C++-style pointers allocated with new;
  • ScopedDeleteArray<T> is suited to deallocate C++-style pointers allocated with new[];
  • root class Scoped<Trait> and macro SCOPED_TEMPLATE are designed to make it extremely simple to define new classes to handle other cases.

For instance, class ScopedFD, as used in the above examples to close Unix-style file descriptors, can be defined with the following few lines of code:


struct ScopedFDTrait
{
public:
  typedef int type;
  static type empty() { return -1; }
  static void release(type fd) {
    if (fd != -1) {
      close(fd);
    }
  }
};
SCOPED_TEMPLATE(ScopedFD, ScopedFDTrait);
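If you wonder how a trait such as the one above can drive a whole class, here is a minimal, self-contained sketch of the idea. MiniScoped, ToyHandleTrait, releasedHandles and useTwoHandles are hypothetical names invented for this illustration. This is not the actual MFBT code, which offers more (rwget, operators, etc.); it only assumes the same trait shape: a type, an empty() value and a release() function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a trait-driven scoped class.
template <typename Trait>
class MiniScoped {
public:
  typedef typename Trait::type Resource;

  MiniScoped() : value_(Trait::empty()) {}
  explicit MiniScoped(Resource value) : value_(value) {}

  // Leaving the scope releases whatever is still owned.
  ~MiniScoped() { Trait::release(value_); }

  // Read access; ownership stays with the scoped object.
  Resource get() const { return value_; }

  // Give up ownership: the caller becomes responsible for the resource.
  Resource forget() {
    Resource tmp = value_;
    value_ = Trait::empty();
    return tmp;
  }

  // Release now instead of waiting for the end of the scope.
  void dispose() {
    Trait::release(value_);
    value_ = Trait::empty();
  }

private:
  MiniScoped(const MiniScoped&);            // not copyable
  MiniScoped& operator=(const MiniScoped&); // not assignable
  Resource value_;
};

// Toy resource: integer handles, plus a log of the handles released so far.
std::vector<int>& releasedHandles() {
  static std::vector<int> log;
  return log;
}

struct ToyHandleTrait {
  typedef int type;
  static type empty() { return -1; }
  static void release(type handle) {
    if (handle != -1) {
      releasedHandles().push_back(handle);
    }
  }
};

typedef MiniScoped<ToyHandleTrait> ScopedToyHandle;

// Handle 1 is released when the function returns; handle 2 escapes
// through forget() and is therefore not released here.
int useTwoHandles() {
  ScopedToyHandle first(1);
  ScopedToyHandle second(2);
  assert(first.get() == 1);
  return second.forget();
}
```

The empty() value is what makes forget() and dispose() safe: once a resource has been handed over or released, the destructor only sees the empty value and release() does nothing.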

So, well, if you need scope-based resource management, you know where to find it!

I will blog shortly about the situation in JavaScript.

Student project updates

April 12, 2012 § Leave a Comment

As mentioned a few times on this blog, I take part in a few Mozilla-related Student Projects as a mentor or a helper. For this year, projects are finished. Let us take a look at the results.

Save as .epub (Firefox add-on)

(Kevin CORRE, Benjamin ROCHER, Elie AHUMA, Sylvestre ANTOINE – Université d’Orléans, MIAGE + IRAD)

Objective Add the following feature to Firefox, as an add-on: Save a page and its resources as one file, using open standard .epub. This open-standard file can then be transferred to just about any device, edited with LibreOffice, etc.

Current status I have seen version 0.1 on addons.mozilla.org, although it seems to have disappeared in the meantime. I suppose it still needs a little polish, but, hey, it’s a good start.

Follow this project This project lives on github.

Detect use of the wrong account (Thunderbird add-on)

(Baptiste MEYNIER, Johan JANS, Maxime DENOYER, Mustapha OUCHEIKH – Université d’Orléans, MIAGE + IRAD)

Objective Add the following feature to Thunderbird, as an add-on: Detect that a message is being sent to a correspondent using the wrong account (e.g. using a professional account for a personal message or a personal account for a professional message).

Current status The project has reached an important milestone and has become usable. I have the impression that no version has been released, which is a shame, but it already offers some useful features. I really hope that the students continue the project and turn it into a complete add-on.

Follow this project This project lives on github (currently on a non-master branch).

Simplify the addition of several alarms for the same event in Lightning (Thunderbird add-on)

(Loïc LE MÉRO aka Morkai – Université d’Évry, MIAGE 2)

Objective Lightning offers the ability to add several alarms for the same event (e.g. 1 day before then 15 minutes before). Improve the user interface to make this more discoverable.

Current status Code is complete, what is left is to submit it upstream.

Follow this project This project lives on Bugzilla.

Extend Lightning alarms (Thunderbird add-on)

(Anto DOMINIC PAUL – Université d’Évry, MIAGE 2)

Objective Lightning offers the ability to attach alarms to an event. Extend this feature to make it possible to play music or execute a script when the alarm is triggered.

Current status I have seen a working version. Not 100% sure about security, but it looks very promising. Guys, I hope you can finish that alarm, I want to use it.

Follow this project This project lives on Bugzilla.

Handle resources in Lightning events (Thunderbird add-on)

(Julien LACROIX – Université d’Évry, MIAGE 2)

Objective Add the ability to attach resource requirements to events: a picnic requires food (one resource), drinks (one resource), cutlery (one resource), etc… Who will bring them? Also, add the ability to attach a geolocation to events, to help find the way. Who brings the beer?

Current status It works! And it was submitted as an add-on for Thunderbird.

Follow this project This project lives on github.

Remind me that I need to reply within 24h/remind me that I expect a reply within 24h (Thunderbird add-on)

(Vincent LEGUEVEL, Mickael MAINGE – Université d’Évry, MIAGE 2)

Objective Add the ability to mark a message as “I need to answer within …” / “I expect an answer within …”. Nag the user as long as she hasn’t sent or received the reply.

Current status A little disappointed. A subset of the features is here, but as far as I can tell, not unified as one single add-on.

Follow this project This project lives on two Bugzilla bugs: need to send / expect to receive.
