Recent changes to OS.File

April 8, 2014 § 5 Comments

A quick post to summarize some of the recent improvements to OS.File.

Encoding/decoding

To write a string, you can now pass the string directly to writeAtomic:

OS.File.writeAtomic(path, "Here is a string", { encoding: "utf-8"})

Similarly, you can now read strings from read:

OS.File.read(path, { encoding: "utf-8" } ); // Resolves to a string.

Doing this is at least as fast as calling TextEncoder/TextDecoder yourself (see below).

Native implementation

OS.File.read has been reimplemented in C++. The main consequence is that this function can now be used safely during startup, without having to wait for the underlying OS.File ChromeWorker to start. Also, decoding (see above) is performed off the main thread, which makes it much faster.

According to my benchmarks, using OS.File.read to read strings is about 2-5x faster than NetUtil.asyncFetch on large files and doesn’t block the main thread for more than 5ms, while asyncFetch performs string decoding on the main thread. Also, it doesn’t perform any main thread I/O by opposition to NetUtil.asyncFetch.

Backups

When using writeAtomic, it is now possible to request existing files to be backed up almost atomically. In many cases, this is a good strategy to ensure that data is safely written to disk, without having to use a flush, which would be expensive for the whole system.

yield OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", backupTo: path + ".backup} } );

Compression

writeAtomic and read both now support an implementation of lz4 compression

yield OS.File.writeAtomic(path, data, { compression: "lz4"});
yield OS.File.read(path, { compression: "lz4"});

Note that this format will not be understood by any command-line tool. It is somewhat proprietary. Also note that (de)compression is performed on the ChromeWorker thread for the time being, so it doesn’t benefit from the native reimplementation mentioned above.

Creating directories recursively

let dir = OS.Path.join(OS.Constants.Path.profileDir, "a", "b", "c", "d");
yield OS.File.makeDir(dir, { from: OS.Constants.Path.profileDir });

OS.File: Iterating through a directory

July 17, 2012 § 2 Comments

Since its first landing, OS.File has been steadily gaining new features. Today, let me show you OS.File.DirectoryIterator. As its name implies, this class serves to iterate through the contents of a directory.

How to

Iterating through a directory is quite simple:

let iterator = new OS.File.DirectoryIterator("/tmp");
try {
  for (let entry in iterator) {
    // Do something with the entry.
  }
} finally {
  iterator.close(); // Release system resources as soon as possible
}

As usual with OS.File, calling iterator.close() is not strictly necessary, but is a good habit to take, as it releases critical resources immediately.

Of course, should you need all the entries for future consumption, you can place them all in an array as follows:

let array = [entry for (entry in iterator)]; // Array comprehensions in JS, at last!

Each entry contains the available information about one file:

for (let entry in iterator) {
  // Checking the type of the entry
  if (entry.isDir) {
    console.log(entry.name, "is a directory");
  } else if (entry.isLink) {
    console.log(entry.name, "is a link”);
  } else {
    console.log(entry.name, "is a regular file");
  }

  // Getting the full path to the entry
  console.log("Full path", entry.path);
}

One of the main design goals of OS.File is to be I/O efficient. Here, this means that during the iteration, the library will perform exactly one I/O call per entry, with one additional call for opening the iterator and one for closing. With respect to I/O, getting the type, name and path of the file is free.

Under Windows, a few additional informations are available, also for free. As usual with OS.File, OS-specific features are prefixed: winCreationTime, winLastWriteTime and winLastAccessTime.

For instance, to list the creation times of entries under Windows:

for (let entry in iterator) {
  if ("winCreationTime" in entry) {
    console.log("The file was created at", entry.winCreationTime);
  }
}

Or, to sort a list of entries by creation time:

let entries = [entry for (entry in iterator)];
if (entries.length > 0 && "winCreationTime" in OS.File.DirectoryIterator.Entry.prototype) {
  entries = entries.sort(function(a, b) {
    return a.winCreationTime - b.winCreationTime;
  })
}

If you wonder why we introduced fields winCreationTime et al. and not a cross-platform field creationTime, recall that, for the sake of I/O efficiency, each entry only contains the information returned by one single I/O call. As the Windows call returns more information than the Unix version, an entry under Windows offers more information than under Unix.

Finally, the Windows back-end offers an additional feature: iterating through only the subset of the entries of the directory matching some regular expression. As usual, since the feature is Windows only, it is prefixed by win.

let iterator = new OS.File.DirectoryIterator("C:\\System\\TEMP",
    /*platform-specific options*/ { winPattern: "*.tmp" } );
// ... do something with that iterator

FAQ

What’s this I/O efficiency?

The two main goals with OS.File are:

  • provide off-main-thread I/O;
  • be I/O efficient.

I/O efficiency is all about minimizing the number of actual I/O calls. This is critical because some platforms have extremely slow storage (e.g. smartphones, tablets) and because, regardless of the platforms, doing too much I/O penalizes not just your application but potentially all the applications running on the system, which is quite bad for the user experience. Finally, I/O is often expensive in terms of energy, so wasting I/O is wasting battery.

Consequently, one of the key design choices of OS.File is to provide operations that are low-level enough that they do not hide any I/O from the developer (which could cause the developer to perform more I/O than they think) and, since not all platforms have the same features, offer system-specific information that the developer can use to optimize his algorithms for a platform.

How does OS.File compare to Node.js I/O in terms of I/O-efficiency?

OS.File is designed for efficient off-main-thread I/O. For the moment, OS.File does not provide an asynchronous API that can be used from the main thread, although we are working on fixing this.

By contrast, Node.js low-level I/O is designed to mirror a subset of an old version of Posix, and provides both a synchronous and an asynchronous API on top of these calls.

The choice made by Node.js works well on the platforms for which Node.js is generally targeted (e.g. Unix-based servers) but we need better to cope with the platforms for which Firefox and Firefox OS are targeted (e.g. not only Unix but also Windows machines, as well as battery-powered devices with slow storage, etc.).

How does directory iteration compare to Node.js directory iteration?

Node.js provides a primitive readdir to iterate through a directory. This primitive returns an array of file names. The implementation of this primitive already costs about n I/O calls, where n is the number of files in the directory.

Consequently, the algorithm to determine which entries of a directory are subdirectories costs

  • about n I/O calls to establish the list of entries ; then
  • about n I/O calls to determine which are subdirectories.

This makes walking a directory recursively (to empty it or to copy it to another drive, for instance) twice more expensive than necessary. Note that this measure is very much non-scientific, as the I/O call to determine if an entry is a subdirectory can be much more expensive than the call to list the entry, depending on the OS.

By comparison, the OS.File directory iterator requires about n I/O calls for this purpose.

Similarly, under all platforms, finding the file accessed least recently has a cost of about 2·n I/O calls under Node.js.

With OS.File, the cost is similar to Node.js under Unix, but only n under Windows. Upcoming work with OS.File should also onsiderably reduce the I/O cost under Linux, Android and Firefox OS.

Finally, for an algorithm that can break from iteration once some condition is met (e.g. looking if at least one file matches some condition), Node.js will still require n I/O calls, while OS.File generally requires only as many I/O calls.

How does this compare to XPCOM nsILocalFile::directoryEntries?

Until now, the only manner of listing entries in a directory on the Mozilla Platform was nsILocalFile::directoryEntries.

Generally, OS.File directory is more convenient to use from JavaScript, can be called off-main thread, and provides more information than nsILocalFile::directoryEntries for the same I/O cost, which makes it more I/O efficient for e.g. iterating a directory looking for a file matching some condition, or walking recursively through a hierarchy.

The counterparts are that OS.File directory iteration cannot be used from the main thread yet and cannot be called from C++.

OS.File, step-by-step: The Schedule API

December 13, 2011 § 7 Comments

Summary

One of the key components of OS.File is the Schedule API, a tiny yet powerful JavaScript core designed to considerably simplify the development of asynchronous modules. In this post, we introduce the Schedule API.

Introduction

In a previous post, I introduced OS.File, a Mozilla Platform library designed make the life of developers easier and to help them produce high-performance, high-responsiveness file management routines.

In this post, I would like to concentrate on one of the core items of OS.File: the Schedule API. Note that the Schedule API is not limited to OS.File and is designed to be useful for all sorts of other modules.

« Read the rest of this entry »

Where Am I?

You are currently browsing entries tagged with OS.File at Il y a du thé renversé au bord de la table.

Follow

Get every new post delivered to your Inbox.

Join 33 other followers