Beautiful Off Main Thread File I/O

October 18, 2012 § 7 Comments

Now that the main work on Off Main Thread File I/O for Firefox is complete, I have finally found some time to test-drive the combination of Task.js and OS.File. Let me tell you one thing: it rocks!

« Read the rest of this entry »

Getting file information with OS.File

July 31, 2012 § 2 Comments

OS.File keeps gaining new features.

Today, let me show you OS.File.stat and OS.File.prototype.stat, two data structures used to get information on a file, such as its size, its creation date or its nature.

How to

There are two ways to get information on a file.

The first technique is to simply call OS.File.stat with the path of the file you wish to open:

// File sessionstore.js in the user’s profile directory
let path = OS.Path.join(OS.Constants.Path.profileDir, "sessionstore.js");
let stat = OS.File.stat(path)

This returns a OS.File.Info object containing all the interesting information on the file.

if (stat.isDir) {
  dump("This is a directory\n");
} else if (stat.isSymLink) {
  dump("This is a symbolic link\n");
}
dump("The file contains " + stat.size + "bytes\n”);
dump("The file was created at " + stat.creationDate + "\n");
dump("The file was last accessed at " + stat.lastAccessDate + "\n");
dump("The file was last modified at " + stat.lastModificationDate + "\n");

Additionally, under Unix, some security information is available:

if ("unixOwner" in OS.File.Info.prototype) {
  dump("The file belongs to user " + stat.unixOwner +
    " in group " + stat.unixGroup +
    " and has mode " + stat.unixMode);
}

That’s it.

The second technique will let you get information on a file that is already opened:

let file = OS.File.open(path);
let stat = file.stat();

The result is exactly the same. Of course, file.stat() is faster if you have already opened the file, while OS.File.stat(path) if faster than opening the file, calling file.stat() then closing it.

Exercise

Let’s put OS.File.stat and OS.File.DirectoryIterator to good use for getting the list of all files in a directory, ordered by last modification date.

function sortedEntries(path) {
  // Get the list of all files in directory
  let iterator = new OS.File.DirectoryIterator(path);
  let entries;
  try {
    entries = [entry for (entry in iterator)];
  } finally {
    iterator.close();
  }

  // If we are under Windows, we have all information in entries already
  // We can make this happen without any further I/O
  if ("winLastModificationDate" in OS.File.DirectoryIterator.prototype) {
    return entries.sort(function compare(x,y) {
      return x.winLastModificationDate - y.winLastModificationDate;
    }
  } else {
    // On other systems, we have to call stat before we can order
    let sortable = [{entry: entry, stat: OS.File.stat(entry.path)} for (entry in entries)];
    // Array comprehension is cool
    let sorted = sortable.sort(function compare(x, y)) {
      return x.stat.lastModificationDate - y.stat.lastModificationDate;
    }
    return [x.entry for (x in sorted)];
  }
}

Note that OS.File.DirectoryIterator does not return special files “.” and “..”.

For bonus points, let’s do the same, but only for non-directory files in the directory:

function nonDirectoryEntries(path) {
  // Get the list of all files in directory
  let iterator = new OS.File.DirectoryIterator(path);
  try {
    for (let entry in iterator) {
      if (!entry.isDir) {
        // Generators are cool, too
        yield entry;
      }
    }
  } finally {
    iterator.close();
  }
}

function sortedEntries(path) {
  // Get the list of all non-directory files in directory
  let entries = nonDirectoryEntries(path);
  if ("winLastModificationDate" in OS.File.DirectoryIterator.prototype) {
    // ... as above
  } else {
    // ... as above
  }
}

We could of course remove directories after sorting, but removing it initially saves both computation time (we sort through a shorter array) and I/O (under non-Windows platforms, we only need to call stat on a smaller set of files).

Homework

As a Programming Language guy, I see an opportunity to develop this API into a nice Domain Specific Language that would let developers formulate queries and would let the engine generate OS-optimized functions to execute these queries.

For instance:

OS.File.Query.SelectFromDir().
  where({isDir: false}).
  sortedBy({lastModificationDate: true})
  // returns the above function, including the optimizations

 

OS.File.Query.SelectFromDir().
  where({path: /.*\.tmp^/, isSymLink:false}).
  sortedBy({creationDate: true})

I do not have plans to implement anything such at the moment, but this sounds like a nice student project. If you are interested, do not hesitate to drop me a line.

That’s all folks.

In the next entries of this blog, I expect to introduce, in no particular order:

  • path manipulation with OS.File;
  • reading/writing with encodings in OS.File;
  • off-main-thread async I/O for the main thread;

benchmarks.

OS.File: Iterating through a directory

July 17, 2012 § 2 Comments

Since its first landing, OS.File has been steadily gaining new features. Today, let me show you OS.File.DirectoryIterator. As its name implies, this class serves to iterate through the contents of a directory.

How to

Iterating through a directory is quite simple:

let iterator = new OS.File.DirectoryIterator("/tmp");
try {
  for (let entry in iterator) {
    // Do something with the entry.
  }
} finally {
  iterator.close(); // Release system resources as soon as possible
}

As usual with OS.File, calling iterator.close() is not strictly necessary, but is a good habit to take, as it releases critical resources immediately.

Of course, should you need all the entries for future consumption, you can place them all in an array as follows:

let array = [entry for (entry in iterator)]; // Array comprehensions in JS, at last!

Each entry contains the available information about one file:

for (let entry in iterator) {
  // Checking the type of the entry
  if (entry.isDir) {
    console.log(entry.name, "is a directory");
  } else if (entry.isLink) {
    console.log(entry.name, "is a link”);
  } else {
    console.log(entry.name, "is a regular file");
  }

  // Getting the full path to the entry
  console.log("Full path", entry.path);
}

One of the main design goals of OS.File is to be I/O efficient. Here, this means that during the iteration, the library will perform exactly one I/O call per entry, with one additional call for opening the iterator and one for closing. With respect to I/O, getting the type, name and path of the file is free.

Under Windows, a few additional informations are available, also for free. As usual with OS.File, OS-specific features are prefixed: winCreationTime, winLastWriteTime and winLastAccessTime.

For instance, to list the creation times of entries under Windows:

for (let entry in iterator) {
  if ("winCreationTime" in entry) {
    console.log("The file was created at", entry.winCreationTime);
  }
}

Or, to sort a list of entries by creation time:

let entries = [entry for (entry in iterator)];
if (entries.length > 0 && "winCreationTime" in OS.File.DirectoryIterator.Entry.prototype) {
  entries = entries.sort(function(a, b) {
    return a.winCreationTime - b.winCreationTime;
  })
}

If you wonder why we introduced fields winCreationTime et al. and not a cross-platform field creationTime, recall that, for the sake of I/O efficiency, each entry only contains the information returned by one single I/O call. As the Windows call returns more information than the Unix version, an entry under Windows offers more information than under Unix.

Finally, the Windows back-end offers an additional feature: iterating through only the subset of the entries of the directory matching some regular expression. As usual, since the feature is Windows only, it is prefixed by win.

let iterator = new OS.File.DirectoryIterator("C:\\System\\TEMP",
    /*platform-specific options*/ { winPattern: "*.tmp" } );
// ... do something with that iterator

FAQ

What’s this I/O efficiency?

The two main goals with OS.File are:

  • provide off-main-thread I/O;
  • be I/O efficient.

I/O efficiency is all about minimizing the number of actual I/O calls. This is critical because some platforms have extremely slow storage (e.g. smartphones, tablets) and because, regardless of the platforms, doing too much I/O penalizes not just your application but potentially all the applications running on the system, which is quite bad for the user experience. Finally, I/O is often expensive in terms of energy, so wasting I/O is wasting battery.

Consequently, one of the key design choices of OS.File is to provide operations that are low-level enough that they do not hide any I/O from the developer (which could cause the developer to perform more I/O than they think) and, since not all platforms have the same features, offer system-specific information that the developer can use to optimize his algorithms for a platform.

How does OS.File compare to Node.js I/O in terms of I/O-efficiency?

OS.File is designed for efficient off-main-thread I/O. For the moment, OS.File does not provide an asynchronous API that can be used from the main thread, although we are working on fixing this.

By contrast, Node.js low-level I/O is designed to mirror a subset of an old version of Posix, and provides both a synchronous and an asynchronous API on top of these calls.

The choice made by Node.js works well on the platforms for which Node.js is generally targeted (e.g. Unix-based servers) but we need better to cope with the platforms for which Firefox and Firefox OS are targeted (e.g. not only Unix but also Windows machines, as well as battery-powered devices with slow storage, etc.).

How does directory iteration compare to Node.js directory iteration?

Node.js provides a primitive readdir to iterate through a directory. This primitive returns an array of file names. The implementation of this primitive already costs about n I/O calls, where n is the number of files in the directory.

Consequently, the algorithm to determine which entries of a directory are subdirectories costs

  • about n I/O calls to establish the list of entries ; then
  • about n I/O calls to determine which are subdirectories.

This makes walking a directory recursively (to empty it or to copy it to another drive, for instance) twice more expensive than necessary. Note that this measure is very much non-scientific, as the I/O call to determine if an entry is a subdirectory can be much more expensive than the call to list the entry, depending on the OS.

By comparison, the OS.File directory iterator requires about n I/O calls for this purpose.

Similarly, under all platforms, finding the file accessed least recently has a cost of about 2·n I/O calls under Node.js.

With OS.File, the cost is similar to Node.js under Unix, but only n under Windows. Upcoming work with OS.File should also onsiderably reduce the I/O cost under Linux, Android and Firefox OS.

Finally, for an algorithm that can break from iteration once some condition is met (e.g. looking if at least one file matches some condition), Node.js will still require n I/O calls, while OS.File generally requires only as many I/O calls.

How does this compare to XPCOM nsILocalFile::directoryEntries?

Until now, the only manner of listing entries in a directory on the Mozilla Platform was nsILocalFile::directoryEntries.

Generally, OS.File directory is more convenient to use from JavaScript, can be called off-main thread, and provides more information than nsILocalFile::directoryEntries for the same I/O cost, which makes it more I/O efficient for e.g. iterating a directory looking for a file matching some condition, or walking recursively through a hierarchy.

The counterparts are that OS.File directory iteration cannot be used from the main thread yet and cannot be called from C++.

(re)introducing OS.File

June 27, 2012 § 6 Comments

OS.File is a new JavaScript library available to Firefox and Thunderbird developers and add-on developers. This library offers efficient, low-level, backgrounded, interaction with the file system, with a number of primitives to take advantage of the specific features of each platform. It is also a nice example of systems programming in JavaScript. Please use it, look at the code, and please report bugs and missing features.

(re)Introducing OS.File

A considerable aspect of our work, at Mozilla, is to ensure that the user experience is smooth and responsive. One of the main tools available to developers to permit such responsive code is multi-threading: any computation or interaction with the system that takes too long can (and should) be pushed into the background, and should interact asynchronously with the user interface.

Now, one of critical bottlenecks in any application is I/O: accessing the disk (or the network, or the database…) is typically orders of magnitude slower than any in-memory operation – plus it can sometimes disrupt the user experience of the complete system. This is true on desktop systems and this is even more true on smartphones and tablets.

What this means is that we need a nice library to perform I/O, and by nice, I mean:

  • I/O should be backgrounded;
  • the number of I/O operations should be carefully controlled.

This is what OS.File is all about: OS.File is a library available to developers (including add-on developers) on the Mozilla platforms
(Firefox, Thunderbird, Songbird, InstantBird, Boot-to-Gecko, etc.). This library is available (only) to JavaScript, and it offers
low-level access to the file system, available to background threads.

As its name implies, OS.File is a system library, not a web library, so web application developers will not have access to it.

A first usable version of OS.File has landed a few days ago and is now available on nightly build of Mozilla Platform applications. We are progressively working on adding features, and I would like to invite all developers who need to do I/O to try it, report any bugs and request any features they need.

Using OS.File

OS.File offers both a cross-platform API (module OS.File itself) and bindings to platform-specific functions (modules OS.Win.File and OS.Unix.File), as well as utilities for system programming (modules OS.Shared and OS.Constants). In this post, I will only discuss module OS.File itself.

By design, in this first delivery, module OS.File is quite minimalistic. Features will be added progressively (see next section). You can find the documentation of OS.File on MDN, as usual.

For the moment, module OS.File can be used only from a chrome worker (i.e. a privileged JavaScript background thread).

Renaming a file


OS.File.move("a.tmp", "b.tmp");

In case of error, this will raise an exception of type OS.File.Error.

Copying a file, handling errors, options


try {
  OS.File.copy("b.tmp", "c.tmp", {noOverwrite: true});
} catch(ex) {
  if (ex.becauseNoSuchFile) {
    // b.tmp does not exist
  } else if (ex.becauseFileExists) {
    // c.tmp exists and we do not want to overwrite it
  }
}

Open a file, read a prefix


let buffer = new ArrayBuffer(12); // Also works with a js-ctypes C pointer
let file
try {
  file = OS.File.open("myfile.tmp"); // No options: open for reading
  let bytes = file.read(buffer, 12);
  // Do something with these bytes
  // ...
} finally {
  if (file) {
    file.close();
  }
}

Open a file for writing


let file = OS.File.open("myfile.tmp", {create:true}); // Fail if the file already exists

Note that this operation will only require one I/O interaction with the operating system – this is much faster than first checking whether the file already exists, and then creating it if it does not.

Open a file with OS-specific options


let file = OS.File.open("myfile.tmp",
  {create:true},
  {unixMode: OS.Constants.libc.S_IRWXU | OS.Constants.libc.S_IRWXG }
);

Short FAQ

What’s good about OS.File?

  • Finally, file I/O for JavaScript workers.
  • An API much more JavaScript-friendly than what already existed in the Mozilla Platform.
  • Options and low-level functions to ensure that we perform minimal amount of actual I/O.

Wasn’t all that already possible?

The existing I/O libraries on the Mozilla Platform could not be used from background threads. Some functions could be backgrounded, but only very few of them.

JavaScript-friendly wrappers had been written around these libraries, but they only covered a few of the features of these libraries, in addition to which they could not be used from background threads either.

How is OS.File implemented?

OS.File is implemented in pure JavaScript, using the (very nice) js-ctypes library to perform calls to the OS APIs.

Why JavaScript and not C++?

Because we want the code to be easily accessible to the community.

Isn’t that slow?

Well, firstly, JavaScript has grown into a very fast language. These days, expecting without benchmarks that C++ is faster than JavaScript on hot code can cause surprises.

In addition, writing the library in C++ would have meant that we needed to cross language barriers quite often, which is bad for performance, due to:

  • complex memory management;
  • bad JIT-ability; and
  • need to convert all data structures, in particular strings.

We attempt to avoid this as much as possible.

For the moment, however, OS.File has not been benchmarked. We await real-world applications.

Work in progress

We are currently hard at work extending OS.File. The next few landings should add:

Features are driven by application requirements, so if you need some other feature, please do not hesitate to contact me on IRC or to file a bug on Bugzilla.

Introducing JavaScript native file management

December 6, 2011 § 28 Comments

Summary

The Mozilla Platform keeps improving: JavaScript native file management is an undergoing work to provide a high-performance JavaScript-friendly API to manipulate the file system.

The Mozilla Platform, JavaScript and Files

The Mozilla Platform is the application development framework behind Firefox, Thunderbird, Instantbird, Camino, Songbird and a number of other applications.

While the performance-critical components of the Mozilla Platform are developed in C/C++, an increasing number of components and add-ons are implemented in pure JavaScript. While JavaScript cannot hope to match the speed or robustness of C++ yet (edit: at least not on all aspects), the richness and dynamism of the language permit the creation of extremely flexible and developer-friendly APIs, as well as quick prototyping and concise implementation of complex algorithms without the fear of memory errors and with features such as higher-level programming, asynchronous programming and now clean and efficient multi-threading. If you combine this with the impressive speed-ups experienced by JavaScript in the recent years, it is easy to understand why the language has become a key element in the current effort to make the Mozilla Platform and its add-ons faster and more responsive at all levels.

« Read the rest of this entry »

Where Am I?

You are currently browsing entries tagged with file at Il y a du thé renversé au bord de la table.

Follow

Get every new post delivered to your Inbox.

Join 29 other followers