Recent changes to OS.File

April 8, 2014

A quick post to summarize some of the recent improvements to OS.File.

Encoding/decoding

To write a string, you can now pass the string directly to writeAtomic:

OS.File.writeAtomic(path, "Here is a string", { encoding: "utf-8" });

Similarly, you can now read strings from read:

OS.File.read(path, { encoding: "utf-8" } ); // Resolves to a string.

Doing this is at least as fast as calling TextEncoder/TextDecoder yourself (see below).
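
For comparison, here is roughly what the manual route looks like: encode the string to bytes yourself before writing, and decode the bytes yourself after reading. This is only a sketch; it runs in any JavaScript environment that exposes the standard TextEncoder and TextDecoder globals, the OS.File calls appear in comments only, and the variable names are mine.

```javascript
// Manual encoding: turn the string into a Uint8Array of UTF-8 bytes.
// Without the `encoding` option, an array like this is what you would
// hand to OS.File.writeAtomic(path, bytes).
const text = "Here is a string";
const bytes = new TextEncoder().encode(text);

// Manual decoding: turn the bytes that a read resolved to back into
// a string.
const decoded = new TextDecoder("utf-8").decode(bytes);
```

With the encoding option, both steps disappear from the calling code.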

Native implementation

OS.File.read has been reimplemented in C++. The main consequence is that this function can now be used safely during startup, without having to wait for the underlying OS.File ChromeWorker to start. Also, decoding (see above) is performed off the main thread, which makes it much faster.

According to my benchmarks, using OS.File.read to read strings is about 2-5x faster than NetUtil.asyncFetch on large files and doesn’t block the main thread for more than 5ms, whereas asyncFetch performs string decoding on the main thread. Also, unlike NetUtil.asyncFetch, it doesn’t perform any main thread I/O.

Backups

When using writeAtomic, it is now possible to request that the existing file be backed up almost atomically. In many cases, this is a good strategy to ensure that data is safely written to disk, without having to use a flush, which would be expensive for the whole system.

yield OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", backupTo: path + ".backup" });
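
To make the idea concrete, here is a sketch of the write-then-rename sequence that such a backup implies. It is written against Node.js's synchronous fs module rather than OS.File, and writeWithBackup is a hypothetical helper of mine; I am assuming this ordering for illustration, not quoting the OS.File implementation.

```javascript
const fs = require("fs");

// Hypothetical helper, not part of OS.File: write `data` to `path`,
// keeping the previous version of the file at `backupTo`.
function writeWithBackup(path, data, { tmpPath, backupTo }) {
  // 1. Write the new contents to a temporary file, so that `path`
  //    never holds a half-written file.
  fs.writeFileSync(tmpPath, data);
  // 2. Move the existing file, if any, out of the way.
  if (fs.existsSync(path)) {
    fs.renameSync(path, backupTo);
  }
  // 3. Move the temporary file into place.
  fs.renameSync(tmpPath, path);
}
```

In this sketch, a crash between steps 2 and 3 leaves no file at path, but the previous contents survive in the backup, which is why "almost atomically" is the right way to put it.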

Compression

writeAtomic and read now both support an implementation of lz4 compression:

yield OS.File.writeAtomic(path, data, { compression: "lz4"});
yield OS.File.read(path, { compression: "lz4"});

Note that this format will not be understood by any command-line tool. It is somewhat proprietary. Also note that (de)compression is performed on the ChromeWorker thread for the time being, so it doesn’t benefit from the native reimplementation mentioned above.

Creating directories recursively

makeDir can now create a directory along with its missing ancestors, starting from the directory passed as from:

let dir = OS.Path.join(OS.Constants.Path.profileDir, "a", "b", "c", "d");
yield OS.File.makeDir(dir, { from: OS.Constants.Path.profileDir });
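
As I understand the from option, it creates every missing directory between from and the target, and from itself must already exist. The sketch below shows that walk using Node.js's fs and path modules instead of OS.File; makeDirFrom is a hypothetical name, and the error handling is my own assumption.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper, not part of OS.File: create `dir` and any missing
// ancestors located below `from`. `from` is assumed to exist already.
function makeDirFrom(dir, from) {
  const rel = path.relative(from, dir);
  // Refuse targets that are not located inside `from`.
  if (rel.startsWith("..") || path.isAbsolute(rel)) {
    throw new Error(dir + " is not a descendant of " + from);
  }
  let current = from;
  // Walk down one path segment at a time, creating what is missing.
  for (const segment of rel.split(path.sep)) {
    current = path.join(current, segment);
    if (!fs.existsSync(current)) {
      fs.mkdirSync(current);
    }
  }
}
```

Calling makeDirFrom(path.join(root, "a", "b", "c", "d"), root) creates a, a/b, a/b/c and a/b/c/d in turn, skipping any that already exist.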

Copying streams asynchronously

October 18, 2013

In the Mozilla Platform, I/O is largely about streams. Copying streams is a rather common activity, e.g. for the purpose of downloading files, decompressing archives, saving decoded images, etc. As usual, doing any I/O on the main thread is a very bad idea, so the recommended manner of copying streams is to use one of the asynchronous stream copy APIs provided by the platform: NS_AsyncCopy (in C++) and NetUtil.asyncCopy (in JavaScript). I have recently audited both to ascertain whether they accidentally cause main thread I/O and here are the results of my investigations.

In C++

What NS_AsyncCopy does

NS_AsyncCopy is a well-designed (if a little complex) API. It copies the full contents of an input stream into an output stream, then closes both. NS_AsyncCopy can be called with both synchronous and asynchronous streams. By default, all operations take place off the main thread, which is exactly what is needed.

In particular, even when used with the dreaded Safe File Output Stream, NS_AsyncCopy will perform every piece of I/O off the main thread.

The default setting of reading data in chunks of 4 KB might not be appropriate for all data, as it may cause too much I/O, in particular if you are reading a small file. There is no obvious way for clients to detect the right setting without causing file I/O, so it might be a good idea to eventually extend NS_AsyncCopy to autodetect the “right” chunk size for simple cases.

Bottom line: NS_AsyncCopy is not perfect but it is quite good and it does not cause main thread I/O.

Limitations

NS_AsyncCopy will, of course, not remove main thread I/O that takes place externally. If you open a stream from the main thread, this can cause main thread I/O. In particular, file streams should really be opened with the DEFER_OPEN flag. Other streams, such as nsIJARInputStream, do not support any form of deferred opening (bug 928329), and will cause main thread I/O when they are opened.

While NS_AsyncCopy does only off main thread I/O, using a Safe File Output Stream will cause a Flush. The Flush operation is very expensive for the whole system, even when executed off the main thread. For this reason, Safe File Output Stream is generally not the right choice of output stream (bug 928321).

Finally, if you only want to copy a file, prefer OS.File.copy (if you can call JS). This function is simpler, entirely off main thread, and supports OS-specific accelerations.

In JavaScript

What NetUtil.asyncCopy does

NetUtil.asyncCopy is a utility method that lets JS clients call NS_AsyncCopy. Theoretically, it should have the same behavior. However, some oddities make its performance lower.

As NS_AsyncCopy requires one of its streams to be buffered, NetUtil.asyncCopy calls nsIIOUtil::inputStreamIsBuffered and nsIIOUtil::outputStreamIsBuffered. These methods detect whether a stream is buffered by attempting to perform buffered I/O. Whenever they succeed, this causes main thread I/O (bug 928340).

Limitations

Generally speaking, NetUtil.asyncCopy has the same limitations as NS_AsyncCopy. In particular, in any case in which you can replace NetUtil.asyncCopy with OS.File.copy, you should pick the latter, which is both simpler and faster.

Also, NetUtil.asyncCopy cannot read directly from a Zip file (bug 927366).

Finally, NetUtil.asyncCopy does not fit the “modern” way of writing asynchronous code on the Mozilla Platform (bug 922298).

Helping out

We need to fix a few bugs to improve the performance of asynchronous copy. If you wish to help, please do not hesitate to pick any of the bugs listed above and get in touch with me.
