Introducing JavaScript native file management

December 6, 2011 § 28 Comments

Summary

The Mozilla Platform keeps improving: JavaScript native file management is ongoing work to provide a high-performance, JavaScript-friendly API for manipulating the file system.

The Mozilla Platform, JavaScript and Files

The Mozilla Platform is the application development framework behind Firefox, Thunderbird, Instantbird, Camino, Songbird and a number of other applications.

While the performance-critical components of the Mozilla Platform are developed in C/C++, an increasing number of components and add-ons are implemented in pure JavaScript. While JavaScript cannot hope to match the speed or robustness of C++ yet (edit: at least not in all respects), the richness and dynamism of the language permit the creation of extremely flexible and developer-friendly APIs, as well as quick prototyping and concise implementation of complex algorithms without the fear of memory errors and with features such as higher-level programming, asynchronous programming and now clean and efficient multi-threading. If you combine this with the impressive speed-ups experienced by JavaScript in recent years, it is easy to understand why the language has become a key element in the current effort to make the Mozilla Platform and its add-ons faster and more responsive at all levels.

Many improvements to the JavaScript platform are pushing the boundary of what can be done in JavaScript. Core Modules, strict mode and the let construct are powerful tools that empower developers to produce reusable, clean and safe JavaScript libraries. The Mozilla Platform offers XPConnect and now js-ctypes, two extremely powerful technologies that let privileged JavaScript masquerade as C/C++ and get access to the low-level features of the platform. Other technologies such as the Web Workers expose low-level operating system features through fast, JavaScript-friendly APIs (note that the Mozilla Platform has exposed threads and processes to JavaScript at least since 2005 – Web Workers are faster, nicer, and play much more nicely with the runtime, in particular with respect to garbage-collection and the memory model).

Today, I would like to introduce one such improvement: native file management for JavaScript, also known as OS.File.

Since JavaScript has become a key component of the Mozilla Platform, the Mozilla Platform needs a great library for manipulating files in JavaScript. While both XPConnect and js-ctypes can be (and have been) used for this purpose, our objective with this library is to go well beyond the file management APIs that have been exposed to JavaScript so far, regardless of the platform, in terms of:

expressiveness;
integration with the JavaScript side of the Mozilla Platform;
operating system-level features;
performance;
extensibility.

This library is a work in progress by the Mozilla Performance Team, and we are hopeful that a fully working prototype will be available by early January. Not everything is implemented yet, and all sorts of adjustments can still be made based on your feedback.

Once we have delivered, it is our hope that you will use this library for your future works on the Mozilla Platform, whether you are extending the Mozilla Platform, developing an add-on or an application, or refactoring some existing feature.

Let me emphasize that this is a Mozilla Platform API (hence the “OS” prefix), not a Web API. In contrast to the HTML5 File object, this API gives full access to the system, without any security limitation, and is definitely not meant to be scriptable by web applications, under any circumstances.

Manipulating files, the JavaScript way

Reading from a file

Let us start with something simple: reading from a file.

First, open the library:

Components.utils.import("resource://gre/modules/osfile.jsm");

OS.File is a JavaScript module, in other words it is shared between all users in the same thread. This is particularly important for speed, as this gives us the ability to perform aggressive caching of certain data.

Once you have opened the module, you may read your file:

var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName, function(myFile) {
  return myFile.readString();
});

This extract:

opens file "/home/yoric/hello" for reading;
reads the contents of the file as a string (assuming ASCII encoding);
closes the file;
reports an error if anything wrong has happened either during opening or during reading;
places the result in variable contents.

This short listing already demonstrates a few interesting elements of the API. Firstly, notice the use of function using. This function performs scope-bound resource management to ensure that the file is properly closed once it is no longer needed, even in the presence of errors. This has roughly the same role as a finally block in Java or a destructor on a C++ auto-pointer. I will return to the topic of resource management later. For the moment, suffice it to say that closing a file through using or method close is optional but recommended, as open files are a limited resource on all operating systems.
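To make the idea of scope-bound resource management more concrete, here is a minimal sketch of how a using-style helper can be written in plain JavaScript. It is not the OS.File implementation: FakeFile and this standalone using function are hypothetical stand-ins that keep the example self-contained.

```javascript
// Hypothetical stand-in for an open file; NOT the OS.File implementation.
function FakeFile(name) {
  this.name = name;
  this.closed = false;
}
FakeFile.prototype.readString = function () {
  if (this.closed) {
    throw new Error("file is already closed");
  }
  return "contents of " + this.name;
};
FakeFile.prototype.close = function () {
  this.closed = true;
};

// Open the resource, pass it to the callback, and close it on every exit
// path, whether the callback returns normally or throws.
function using(name, callback) {
  var file = new FakeFile(name);
  try {
    return callback(file);
  } finally {
    file.close();
  }
}

var seen = null;
var contents = using("hello", function (myFile) {
  seen = myFile;
  return myFile.readString();
});
// At this point `contents` holds the string and `seen.closed` is true.
```

The try/finally block is what guarantees cleanup: the caller never has to remember to call close, and an exception raised inside the callback still propagates after the file has been closed.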

Had we decided to entrust JavaScript to close the file by itself at some point in the future, we could have simply written:

var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading(fileName).readString();

Secondly, consider OS.File.openForReading. As its name suggests, this function/object serves to open an existing file for reading, and it fails if the file does not exist yet. The API provides such functions for all common scenarios, all of which accept optional flags to customize Unix-style file rights, Windows-style sharing properties and other Unix- or Windows-style attributes. Alternatively, function/object/constructor OS.File is the general manner of controlling all details of file opening.

The extracts above do not demonstrate any feature that could not have been achieved with XPConnect. However, let us briefly compare our extracts with an XPConnect-based implementation along similar lines:

the OS.File implementation consists of 2 to 4 lines, including resource cleanup and error-handling / a comparable XPConnect-based implementation requires about 30 lines;
the OS.File implementation works both in the main thread and in a background thread / a comparable XPConnect-based implementation works only in the main thread;
benchmarks are not available yet, but I have hope that the OS.File implementation should be slightly faster due to a lower overhead and an optimized implementation of readString;
in case of error, the OS.File implementation raises an exception with constructor OS.File.Error / the XPConnect-based implementation raises a generic XPConnect exception;
if the file does not exist, the OS.File implementation raises an error while executing OS.File.openForReading / the XPConnect-based implementation raises an error later in the process;
if executed on the main thread, the OS.File implementation will print a warning.
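The value of a dedicated error constructor is that callers can tell file errors apart from everything else. The following sketch illustrates the pattern only; SketchFileError, openForReadingSketch and the made-up path "/exists" are assumptions of mine, not part of OS.File.

```javascript
// Hypothetical error type, standing in for something like OS.File.Error.
function SketchFileError(operation, path) {
  this.name = "SketchFileError";
  this.operation = operation;
  this.path = path;
  this.message = operation + " failed on " + path;
}
SketchFileError.prototype = Object.create(Error.prototype);
SketchFileError.prototype.constructor = SketchFileError;

// Pretend to open a file; only one made-up path "exists".
// The error is raised at opening time, not later in the process.
function openForReadingSketch(path) {
  if (path !== "/exists") {
    throw new SketchFileError("open", path);
  }
  return { path: path };
}

// Handle file errors specifically and let anything else propagate.
function describeFailure(path) {
  try {
    openForReadingSketch(path);
    return "ok";
  } catch (ex) {
    if (ex instanceof SketchFileError) {
      return "file error during " + ex.operation;
    }
    throw ex;
  }
}
```

With a generic exception type, the instanceof test above would not be possible and callers would have to inspect messages or result codes instead.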

Note that OS.File manipulates this and closures in the JavaScript fashion, which makes it possible to make our extracts even more concise, as follows:

var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName, function() {
  return this.readString();
});

or, equivalently,

var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName,
  OS.File.prototype.readString);
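The reason a prototype method can be passed directly as the callback is that the helper invokes the callback with the file as this. Here is a minimal sketch of that mechanism, again with hypothetical names (FakeFile, usingFile) rather than the real OS.File code.

```javascript
// Hypothetical stand-in for an open file; NOT the OS.File implementation.
function FakeFile(text) {
  this.text = text;
  this.closed = false;
}
FakeFile.prototype.readString = function () {
  return this.text;
};
FakeFile.prototype.close = function () {
  this.closed = true;
};

// Invoke the callback with the file both as `this` and as first argument.
function usingFile(text, callback) {
  var file = new FakeFile(text);
  try {
    return callback.call(file, file);
  } finally {
    file.close();
  }
}

// Three equivalent ways of writing the callback.
var viaArgument = usingFile("hi", function (f) { return f.readString(); });
var viaThis = usingFile("hi", function () { return this.readString(); });
var viaPrototype = usingFile("hi", FakeFile.prototype.readString);
```

All three calls return the same string, because callback.call(file, file) makes the file available under whichever style the callback prefers.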

Of course, OS.File is not limited to strings. Indeed, to return a typed array, simply replace readString with readBuffer. For better performance, it is also possible to reuse an existing buffer. This is done by replacing readBuffer with readTo.

Also, OS.File is not limited to reading entire files. Indeed, all read/write functions accept an optional argument that may be used to determine a subset of the file that must be read:

var fileName = "/home/yoric/hello";
var contents = OS.File.openForReading.using(fileName,
  {fileOffset: 10, bytes: 100},
  OS.File.prototype.readString);
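As an illustration of how such an optional argument can be interpreted, here is a small sketch that works on an in-memory typed array instead of a real file. The option names fileOffset and bytes come from the extract above; the helper readSubset and its defaults (start at the beginning, read to the end) are assumptions of mine.

```javascript
// Hypothetical helper: select the part of `data` described by `options`.
// Missing options fall back to "from the start" and "up to the end".
function readSubset(data, options) {
  options = options || {};
  var start = options.fileOffset === undefined ? 0 : options.fileOffset;
  var length = options.bytes === undefined ? data.length - start : options.bytes;
  // subarray returns a view on the same memory and clamps out-of-range ends.
  return data.subarray(start, start + length);
}

var data = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
var whole = readSubset(data);
var middle = readSubset(data, { fileOffset: 2, bytes: 3 });
var tail = readSubset(data, { fileOffset: 7 });
```

Because every field is optional, callers who want the entire content pay no syntactic cost, while callers who need a window into the data only name the fields they care about.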

Well-known directories

The operations we have demonstrated so far use a hard-coded path “/home/yoric/hello”. This is not a very good idea, as this path is valid only under Linux, not under Windows or MacOS. Therefore, we would much rather ask the Mozilla Platform to select the path for us. For this purpose, we may replace the first line with:

var fileName = OS.Path.home.get("hello");

This extract:

uses global object OS.Path (part of library OS.File);
requests the path to the user’s home directory;
requests item "hello" at this path.

The extract demonstrates a few things. Firstly, the use of OS.Path. This object contains paths to well-known directories, and can be extended with new directories. Each path has constructor OS.Path, and supports a method get that serves to enter into files/directories. Secondly, the use of OS.Path as a path for functions of module OS.File: any function of this module accepts an OS.Path in place of a hard-coded directory.

Note that OS.Path objects are purely in-memory constructs. Building an OS.Path does not cause any call to the file system.
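To show what "purely in-memory" means in practice, here is a minimal sketch of a path object whose get method only builds a new list of components. SketchPath is hypothetical and handles Unix-style paths only; it is not the real OS.Path.

```javascript
// Hypothetical path object; NOT the real OS.Path. It only stores path
// components in memory and never touches the file system.
function SketchPath(components) {
  this.components = components;
}
// Return a new path one level deeper; the receiver is left unchanged.
SketchPath.prototype.get = function (name) {
  return new SketchPath(this.components.concat([name]));
};
// Render as a Unix-style absolute path (a simplification for this sketch).
SketchPath.prototype.toString = function () {
  return "/" + this.components.join("/");
};

var home = new SketchPath(["home", "yoric"]);
var hello = home.get("hello");
// String(hello) is "/home/yoric/hello"; `home` itself is unchanged.
```

Since get never consults the disk, building a path to a file that does not exist is perfectly fine; any error only shows up when the path is actually opened.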

As previously, something similar is feasible with XPConnect. Comparing with an XPConnect-based implementation, we may notice that:

the OS.File implementation consists of 1 line / a comparable XPConnect-based implementation consists of 1 to 4 lines, depending on the use of additional libraries;
the OS.File implementation works both in the main thread and in a background thread / again, XPConnect works only in the main thread;
benchmarks are not available yet, but I have hope that the OS.File implementation should be slightly faster due to a lower overhead and use of caching.

Behaving nicely

The operations we have demonstrated so far are synchronous. This is probably not problematic for file opening, but reading a large file synchronously from the main thread is a very bad idea, as it will freeze the user interface until completed. It is therefore a good idea to either send the operation to a background thread or to ensure that reading takes place in small chunks.

OS.File supports both scenarios by integrating with (work-in-progress) libraries Promise and Schedule, both of which will be introduced in another post, once their API has stabilized.

The first step to reading asynchronously is to open library Promise. We will take the opportunity to open Schedule as well:

Components.utils.import("resource://gre/modules/promise.jsm");
Components.utils.import("resource://gre/modules/schedule.jsm");

Now that the modules are open, we may use asynchronous reading and asynchronous writing functions:

var promisedContents = OS.File.openForReading(fileName).
    readString.async();

This operation schedules progressive reading of the file and immediately returns. Note that we do not close the file, as this would stop reading, probably before the operation is complete. The result of the operation, promisedContents, is a Promise, i.e. a variable that will eventually contain a value, and that may be observed or polled, as follows:

promisedContents.onsuccess(function(contents) {
  console.log("It worked", contents);
});
promisedContents.onerror(function(error) {
  console.log("It failed", error);
});
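For readers unfamiliar with promises, the following sketch shows one minimal way an object with onsuccess and onerror could behave. The method names follow the extract above, but MiniPromise, resolve and reject are hypothetical; the actual Promise module is still in flux and may well differ.

```javascript
// Hypothetical minimal promise; NOT the Promise module mentioned above.
function MiniPromise() {
  this.state = "pending"; // "pending", "success" or "error"
  this.value = undefined;
  this.successCallbacks = [];
  this.errorCallbacks = [];
}
// Register an observer; fire immediately if the value is already there.
MiniPromise.prototype.onsuccess = function (callback) {
  if (this.state === "success") {
    callback(this.value);
  } else if (this.state === "pending") {
    this.successCallbacks.push(callback);
  }
};
MiniPromise.prototype.onerror = function (callback) {
  if (this.state === "error") {
    callback(this.value);
  } else if (this.state === "pending") {
    this.errorCallbacks.push(callback);
  }
};
// Settle at most once, then notify the matching observers.
MiniPromise.prototype.settle = function (state, value, callbacks) {
  if (this.state !== "pending") {
    return;
  }
  this.state = state;
  this.value = value;
  callbacks.forEach(function (callback) { callback(value); });
};
MiniPromise.prototype.resolve = function (value) {
  this.settle("success", value, this.successCallbacks);
};
MiniPromise.prototype.reject = function (error) {
  this.settle("error", error, this.errorCallbacks);
};

var log = [];
var promised = new MiniPromise();
promised.onsuccess(function (contents) { log.push("It worked: " + contents); });
promised.onerror(function (error) { log.push("It failed: " + error); });
promised.resolve("hello");
// An observer registered after completion still receives the value.
promised.onsuccess(function (contents) { log.push("late: " + contents); });
```

The key property is that observers may be attached before or after the value arrives, which is what lets the code that uses a value be written after the code that fetches it.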

Similarly, reading from a background thread is a simple operation:

var promisedContents = Schedule.bg(function() {
  // This function runs in the background thread, so the module is loaded there.
  importScripts("resource://gre/modules/osfile.jsm");
  var fileName = "/home/yoric/hello";
  return OS.File.openForReading.using(fileName, function(myFile) {
    return myFile.readString();
  });
});

The call to Schedule.bg “simply” sends a task to a background thread and ensures that any result, error, etc. is routed back to the promise. The promised value itself is used exactly as in the previous example.

Once again, we may compare to the XPConnect-based implementation:

OS.File-based implementation of asynchronous reading takes 3 lines including opening, closing, resource management / general XPConnect-based implementation of asynchronous reading takes about 10-15 lines, although reading from a hard-coded path or a resource inside the Mozilla Platform can be reduced to 5-6 lines;
OS.File implementation of background reading takes 5 lines / XPConnect does not expose sufficient features to permit background reading, although such features could certainly be implemented in C++ and exposed through XPConnect;
OS.File-based implementation only works for files / XPConnect-based implementation works for just about any construction;
benchmarks are not available, but I have hope that the OS.File implementation should be faster than the XPConnect-based implementation due to a less generic implementation and a lower overhead;
the promises used in the OS.File-based implementation encourage writing code in natural order, in which the code that uses a value appears after the code that fetches the value / XPConnect-based implementation encourages backwards coding, in which the function that uses a value appears before the code that fetches the value (aka “asynchronous spaghetti programming”).

API summary

The API defines the following constructors:

OS.File – all operations upon an open file, including reading, writing, accessing or altering information, flushing, closing the file;
OS.Dir – all operations upon an open directory, including listing its contents, walking through the directory, opening an item of the directory, removing an item of the directory;
OS.Path – all operations on paths which do not involve opening a directory, including concatenation, climbing up and down the tree;
OS.File.Error – all file-system related errors.

and the following global objects:

OS.File – opening a file, with or without auto-cleanup;
OS.Dir – opening a directory;
OS.Path – well-known directories and files.

Speed

Writing fast, cross-platform, file manipulation code is a complex task. Indeed, some platforms accelerate opening a file from a directory (e.g. Linux), while other platforms do not have such operations (e.g. MacOS, Windows). Some platforms let applications collect all information regarding a file with a single system call (Unix), while others spread the work through several system calls (Windows). The amount of information that may be obtained about a file without having to perform additional system calls varies from OS to OS, as does the maximal length of a path (e.g. under Windows, the value of MAX_PATH does not reflect the actual limit), etc.

The design of OS.File takes this into account, as well as the experience from the previous generations of file manipulation APIs in the Mozilla Platform (prfile and nsIFile/nsILocalFile), and works hard to minimize the number of system calls required for each operation, and to let experts fine-tune their code for performance. While benchmarking is not available yet, we have good hope that this will make it possible to write IO code that runs much faster, in particular on platforms with slow file systems (e.g. Android).

In addition, although this should have a much smaller impact, OS.File uses the JSAPI as its bridge between C++ and JavaScript; at the time of this writing, this is the fastest C++-to-JavaScript bridge on the Mozilla Platform.

Responsiveness

Speed is not sufficient to ensure responsiveness. For this purpose, long-running operations are provided with asynchronous variants that divide the work into smaller chunks to avoid freezing up the thread. The API does not enforce the use of these asynchronous variants, as experience shows that such a drastic choice is sometimes too constraining for progressive refactoring of synchronous code towards better asynchronicity.
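The general technique of dividing work into chunks can be sketched as follows. This is only an illustration of the idea on an in-memory string, not how OS.File is implemented; readInChunks, enqueue and the chunk size are hypothetical, and chunkSize is assumed to be a positive number.

```javascript
// Process `data` in chunks of `chunkSize` characters, handing control back
// to the caller's scheduler between chunks. `schedule` is any function that
// runs a task later (for example, a wrapper around setTimeout).
function readInChunks(data, chunkSize, schedule, onDone) {
  var parts = [];
  var offset = 0;
  function step() {
    if (offset >= data.length) {
      onDone(parts.join(""));
      return;
    }
    parts.push(data.slice(offset, offset + chunkSize));
    offset += chunkSize;
    schedule(step); // yield before handling the next chunk
  }
  schedule(step);
}

// A deterministic scheduler for demonstration purposes: tasks wait in a
// queue until drained, so other work could run between two chunks.
var queue = [];
function enqueue(task) {
  queue.push(task);
}

var result = null;
var steps = 0;
readInChunks("abcdefghij", 4, enqueue, function (text) { result = text; });
while (queue.length > 0) {
  queue.shift()();
  steps++;
}
```

In a real application the scheduler would hand each step back to the event loop, so that user interface events can be processed between two chunks instead of waiting for the whole file.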

Every operation can be backgrounded thanks to the Schedule module. At the time of this writing, it is not possible to send a file from a thread to another one, but we have a pretty clear idea of how we can do this, so this should become possible at some point in the future.

What now?

As mentioned, this is a work in progress. I am currently hard at work on building a complete prototype by the end of December, with the hope of landing something soon afterwards. I expect that benchmarking will continue after this stage to fine-tune some low-level choices and improve the API. If you wish to follow progress – or vote for this feature – we have a Bugzilla tracking bug on the topic, and a whole host of subbugs.

Note that this API will not replace nsIFile, although once it has landed, some of our JavaScript code will progressively migrate from nsIFile to OS.File.

If you have any feedback, now is a great time to send it. Would you use this API? Would you need a specific or obscure feature that is currently missing in the Mozilla Platform or that risks being lost?

In future posts, I will introduce further examples and detail some of the choices that we have made to ensure the best possible speed on all platforms.

Stay tuned!

Tagged: api, asynchronous, bsd, development, file, file system, Firefox, future, javascript, js, macos, mozilla, mozilla platform, open-source, performance, platform, posix, programming, promise, responsiveness, snappy, speed, threads, webapi, windows

§ 28 Responses to Introducing JavaScript native file management

Axel Rauschmayer says:

December 6, 2011 at 11:46 am

It would be nice if you could collaborate with Node.js here.

- yoric says:
  
  December 6, 2011 at 12:37 pm
  
  I would be glad to.
  
  - Axel Rauschmayer says:
    
    December 6, 2011 at 12:57 pm
    
    Idea: Contact them at [1] and ask them how and if they would be willing to collaborate. Proposing something that is completely different from what they have is probably tricky, but you could point out places in their current API that you think need to be improved.
    
    [1] http://groups.google.com/group/nodejs-dev
Michael Ratcliffe says:

December 6, 2011 at 11:55 am

Awesome, I remember my head almost exploding when I first had to manipulate files using nsIFile … the new API is far more understandable.

Chris says:

December 6, 2011 at 12:14 pm

why don’t you just use the node.js apis? Do you really need to reinvent the wheel?

- yoric says:
  
  December 6, 2011 at 12:36 pm
  
  Well, one could ask why the Node.js developers did not reuse our old file APIs or another high-level API such as Glib, and had to reinvent the wheel 🙂 – and, in fact, they were certainly right to not use our old file APIs, because the high-level ones were both too complex and not targeted for high responsiveness and the low-level ones were a tad obscure.
  
  Now, we have equally good reasons not to use the Node.js APIs. Unless there has been a drastic change in Node.js in the past few months, Node.js’ File APIs are 100% “generic Unix”-centric, which is a very sane choice for server-side programming, but not for us. In contrast, our platform is meant to host desktop applications that interact nicely with the operating system, even when this operating system is, say, Windows or Android. This needs a different design. For similar reasons, the design of Node.js does not take advantage of platform-specific non-portable optimizations that speed up e.g. file opening or directory walking under some circumstances. Server-side, this is probably not a big deal: hard drives are fast, and I assume that disk usage by Node.js is not a dominating cost. On platforms such as Android phones, on the other hand, where JavaScript is fast but the file system itself is slow, we need to try much harder to play to the strengths of the file system.
  
  Now, on the other hand, it might be interesting to eventually contribute this library to Node.js. If a Node.js developer is reading these lines and is interested, feel free to contact me, and we will see what we can do.
  
David says:

December 6, 2011 at 4:15 pm

I wouldn’t lose any sleep over collaborating with node.js developers.

Ted Mielczarek (@TedMielczarek) says:

December 6, 2011 at 5:35 pm

Nice work, looks a lot better than the existing XPCOM mess. My only complaint would be the verbosity of some of your APIs. “OS.File.openForReading.using(fn, …)” is quite a handful to type. Compare vs. Python’s “with open(fn) as f:”.

- yoric says:
  
  December 6, 2011 at 5:46 pm
  
  Thanks.
  And yes, I admit that it is sometimes verbose. I am generally a big fan of conciseness when it does not hurt readability or robustness, so if you have good ideas to reduce verbosity, I would love to hear them.
  
  Let’s see what we have:
  - namespacing – we need this namespacing to avoid confusion with other file libraries (although “OS.File” could certainly be renamed “OSFile”), but users can work around most of this verbosity with a simple “var OF = OS.File”;
  - scope-bound resource management – if someone finds a way to implement Python’s magical “with” without the need for higher-order functions or with a lighter syntax, I will be eager to adopt;
  - openForReading – guilty as charged, I purposely used a long function name to differentiate it from “createNewFile”, “createOrOverwrite”, etc., as I am convinced that this will make code easier to read and write.
  Reply
  - pd says:
    
    December 9, 2011 at 4:02 pm
    
    Complaining about the length of these method calls is like saying that a 100 metre sprint is too long compared to a marathon. Relative to XPConnect the length of naming in this new API is orders of magnitude better. Well done on sticking to your beliefs with a longer literal name too. openForReading is perfectly literal and logical and far from too long.
    
    Ted, does your editor have code completion?
Taras Glek says:

December 6, 2011 at 9:28 pm

This looks nice, however opening files should not happen on main thread either. There need to be convenient async versions of open/stat and the ones in your example should print a warning on main thread.

- yoric says:
  
  December 6, 2011 at 10:01 pm
  
  No problem. I will see if I can get any OS-accelerated support for async open/stat, if possible.
  
Steve Fink says:

December 6, 2011 at 9:46 pm

I can understand why you picked openForReading, but I’m not convinced. It seems like you’ll end up needing flags anyway, unless you’re ok with having openForReadingInBinaryMode, createOrAppendWithoutCaching, and openForReadingAndWriting(…AndTruncateIfItAlreadyExists). Or are you pulling out a handful of the most important flags (unix: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC?) into the name and using parameters for the rest?

- yoric says:
  
  December 6, 2011 at 10:00 pm
  
  Or are you pulling out a handful of the most important flags (unix: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC?) into the name and using parameters for the rest?
  
  Indeed, I only put a handful of flags in the function name, to handle the most common cases. At the moment, the complete list is:
  - openForReading
  - createOrOverwrite
  - createNewFile
  - appendToExistingFile
  - openOrCreate
  Under Unix, this is O_CREAT, O_TRUNC, plus one additional case. Under Windows, this is exactly dwCreationDisposition. Each function comes with “reasonable” default flags and also accepts all the usual Unix flags and Windows flags. I make no attempt to normalize flags beyond the flags previously mentioned and appending.
  
  - martensms says:
    
    December 10, 2011 at 6:44 pm
    
    Hum… wouldn’t it be nicer to have something like:
    
    OS.File.read()
    OS.File.open()
    OS.File.append()
    OS.File.write() -> having a parameter to overwrite existing data, like a true/false switch?
    
    I think the long names are pretty heavy when it comes to typing as they may change in future if the library evolves.
    
    Most scripting languages (well, even PHP) solve it with a parameter where you can modify stdin / stdout handling, like the ‘w+’, ‘a+’ in PHP. I don’t prefer these parameters, but I think optional true/false switch parameters are a good way to go.
    
    read(…, true) -> dunno
    
    append(…, true) -> will enforce writing, even if file doesn’t exist
    open(…, true) -> will create the file if it doesnt exist
    write(…, true) -> will overwrite an already existing file
    
    Most of my C/C++ File System or Low Level APIs have those short-named functions as they are mostly public wrappers for internal functionality. What do you think about this idea?
    
    Greets from Mayence,
    Chris
  - yoric says:
    
    December 10, 2011 at 6:50 pm
    
    Interesting idea, I like it. However, keep in mind that, if we count all options (including Unix-specific and Windows-specific options), we could end up with 30 such booleans.
mawrya says:

December 6, 2011 at 10:26 pm

I think developers have wanted this sort of simplified file I/O control for some time now, so this is good to hear. Are you familiar with jslib? http://jslib.mozdev.org/libraries/io/io.html

I started using jsLib back in 2004 to get this simplified file access and was always surprised this sort of thing wasn’t baked into the mozilla code base. I haven’t used it in a few years but I believe many people are still using it in various mozilla platform projects.

- yoric says:
  
  December 7, 2011 at 9:13 am
  
  Thanks for the pointer, I admit that I had completely forgotten the existence of jslib.
  
  Looking at the API, it seems that jslib is a thin layer on top of nsIFile and nsIIOService. Interesting to keep in mind, but I believe that OS.File is heading for something both more convenient, more responsive and faster. The main difference being, of course, that jslib is pure JS on top of XPConnect, while OS.File contains considerable amounts of C/C++.
  
philikon says:

December 7, 2011 at 5:43 am

There are some great ideas here, but as Taras points out we must stop doing any kind of sync I/O on the mainthread. In my experience, even simply providing it as an option will lead to people using it accidentally and affecting responsiveness a lot.

Also, I would recommend taking a look at FileUtils.jsm (https://developer.mozilla.org/en/JavaScript_code_modules/FileUtils.jsm). There’s some great stuff there already and we can easily extend to provide even more helpers, along the lines of the stuff you suggest. Patches are welcome, and I’ll be happy to review.

- yoric says:
  
  December 7, 2011 at 8:37 am
  
  Thanks for the offer. I am quite aware of FileUtils and it is definitely a very useful tool. Unfortunately, for fast and responsive I/O, we are far beyond simple helpers, for the simple reason that XPConnect does not work on non-main threads, so I am not completely sure that integrating OS.File and FileUtils would be a good idea.
  
  As for the option of doing any kind of sync I/O on the main thread, I am trying to find the best compromise between helping people write code and enforcing our policies. If I just forbid synchronous code on the main thread, I suspect that users of the library will “simply” add nested event loops, which will turn simple annoyances into total nightmares.
  
Panos Astithas says:

December 7, 2011 at 8:12 am

You may be interested to know that there is already at least one Promise implementation in the tree, in browser/devtools/shared/Promise.jsm. It’s pretty spartan, but you could expand on it if it would make sense for your use case.

- yoric says:
  
  December 7, 2011 at 8:33 am
  
  Thank you very much, I will definitely do that.
  We should get bug 703336 going…
  
  - yoric says:
    
    December 13, 2011 at 8:48 am
    
    Done, thank you very much for the pointer. I have refactored all the code around Promise, and it works nicely.
Tane Piper (@tanepiper) says:

December 9, 2011 at 2:17 pm

I already wrote a reference implementation of the NodeJS file system for the Chrome FileSystem stuff – feel free to take a look and use it for this: https://github.com/tanepiper/webfs

- yoric says:
  
  December 9, 2011 at 2:27 pm
  
  Interesting, thanks.
  
pd says:

December 9, 2011 at 2:41 pm

“the OS.File implementation consists in 2 to 4 lines”
should be
“the OS.File implementation consists of 2 to 4 lines”

- yoric says:
  
  December 9, 2011 at 2:44 pm
  
  Thanks, fixed.
  
pd says:

December 9, 2011 at 4:14 pm

As a non-C++ developer who has been wanting to get into Mozilla (or “Mozilla Platform” or GRE or Gecko 2 which became 8 AFAIK or XULRunner, or whatever it is called today) development for some time, this is a very interesting API. I do feel though that there’s now a whole jumble of file manipulation tools/APIs in the “Platform” which is a bit worrying.

It’s great to see that you are focusing on speed, multi-threading and developer-friendly syntax.

Keep up the great work.
