Shutting down things asynchronously

February 14, 2014

This blog entry is part of the Making Firefox Feel As Fast As Its Benchmarks series. The fourth entry of the series was growing much too long for a single blog post, so I have decided to cut it into bite-size entries.

A long time ago, Firefox was completely synchronous. One operation started, then finished, and then we proceeded to the next operation. However, this model didn’t scale up to today’s needs in terms of performance and performance perception, so we set out to rewrite the code and make it asynchronous wherever it matters. These days, many things in Firefox are asynchronous. Many services get started concurrently during startup or afterwards. Most disk writes are entrusted to an IO thread that performs and finishes them in the background, without having to stop the rest of Firefox.

Needless to say, this raises all sorts of interesting issues. For instance: « how do I make sure that Firefox will not quit before it has finished writing my files? » In this blog entry, I will discuss this issue and, more generally, the AsyncShutdown mechanism, designed to implement shutdown dependencies for asynchronous services.
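This sketch illustrates the general idea of shutdown dependencies. It is not AsyncShutdown itself. Services register "blockers" (functions returning promises) on a barrier, and shutdown proceeds only once every blocker has settled or a timeout expires. The class name `ShutdownBarrier`, its `addBlocker`/`wait` methods and the timeout policy are hypothetical choices for illustration, not the AsyncShutdown.jsm implementation.

```javascript
// Minimal sketch of a shutdown barrier (hypothetical names, not AsyncShutdown.jsm).
// Services register "blockers"; shutdown waits until all of them have settled.
class ShutdownBarrier {
  constructor(name) {
    this.name = name;
    this.blockers = new Map(); // label -> function returning a promise (or a value)
  }

  addBlocker(label, condition) {
    this.blockers.set(label, condition);
  }

  removeBlocker(label) {
    return this.blockers.delete(label);
  }

  // Resolves with the labels of all blockers once every condition has settled.
  // Rejects after timeoutMs, so that a stuck service cannot prevent the
  // application from quitting forever.
  async wait({ timeoutMs = 10000 } = {}) {
    const pending = [...this.blockers].map(([label, condition]) =>
      Promise.resolve()
        .then(condition)
        // A failing blocker is reported but does not hold back the others.
        .catch(error => console.error(`${this.name}: blocker "${label}" failed`, error))
        .then(() => label));

    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`${this.name}: shutdown timed out`)), timeoutMs);
    });

    try {
      return await Promise.race([Promise.all(pending), timeout]);
    } finally {
      clearTimeout(timer); // The timeout can no longer fire once we are done.
    }
  }
}

// Usage: a service that must finish writing before shutdown proceeds.
const beforeQuit = new ShutdownBarrier("before-quit");
let written = false;
beforeQuit.addBlocker("MyService: writing data", async () => {
  await new Promise(resolve => setTimeout(resolve, 10)); // pretend to write
  written = true;
});
beforeQuit.wait().then(labels => console.log(labels, written));
```

The timeout is a design choice of the sketch. It trades a possibly incomplete write for the guarantee that the application eventually quits.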

Is my data on the disk? Safety properties of OS.File.writeAtomic

February 5, 2014

If you have been writing front-end or add-on code recently, chances are that you have been using the OS.File library and, in particular, OS.File.writeAtomic to write files. (Note: If you have been writing files without using OS.File.writeAtomic, chances are that you are doing something wrong that will cause Firefox to jank – please don’t.) As the name implies, OS.File.writeAtomic will make efforts to write your data atomically, so as to ensure its survivability in case of crash, power loss, etc.

However, you should not trust this function blindly, because it has its limitations. Let us take a look at exactly what guarantees writeAtomic provides.

Algorithm: just write

Snippet: OS.File.writeAtomic(path, data)

What it does

  1. reduce the size of the file at path to 0;
  2. send data to the operating system kernel for writing;
  3. close the file.

Worst case scenarios

  1. if the process crashes between 1. and 2. (a few microseconds), the full content of path may be lost;
  2. if the operating system crashes or the computer loses power suddenly before the kernel flushes its buffers (which may happen at any point up to 30 seconds after 1.), the full content of path may be lost;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path.

Performance: very good.

Algorithm: write and rename

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp" })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. rename tmpPath on top of path.

Worst case scenarios

  1. if the process crashes at any moment, nothing is lost, but a file tmpPath may be left on the disk;
  2. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing metadata (which may happen at any point after 1., typically up to 30 seconds), the full content of path may be lost;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path.

Performance: almost as good as Just Write.

Side-note: on the ext4fs file system, the kernel automatically adds a flush, which transparently transforms the safety properties of this operation into those of the algorithm detailed next.

Native equivalent: in XPCOM/C++, the mostly-equivalent solution is the atomic-file-output-stream.

Algorithm: write, flush and rename

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", flush: true })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. flush the writing of data to tmpPath;
  5. rename tmpPath on top of path.
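To make these steps concrete outside of Firefox, here is a minimal sketch of "write, flush and rename". It uses the synchronous Node.js `fs` API as a stand-in for OS.File, which performs its I/O off the main thread. `writeAtomicSketch` is a hypothetical name. One deliberate difference from the list above: `fsync` needs an open file descriptor, so this sketch flushes just before closing rather than after. Passing `flush: false` gives the plain "write and rename" algorithm.

```javascript
const fs = require("fs");

// Hypothetical sketch of "write, flush and rename" using Node's fs module
// (a stand-in for OS.File.writeAtomic, not its implementation).
function writeAtomicSketch(path, data, { tmpPath = path + ".tmp", flush = false } = {}) {
  const fd = fs.openSync(tmpPath, "w"); // 1. create (or truncate) tmpPath
  try {
    fs.writeFileSync(fd, data);         // 2. hand the data to the kernel
    if (flush) {
      fs.fsyncSync(fd);                 // 4. force the data to the disk (needs an open fd)
    }
  } finally {
    fs.closeSync(fd);                   // 3. close the file
  }
  fs.renameSync(tmpPath, path);         // 5. replace path in a single step
}

// Usage (hypothetical file name and data):
// writeAtomicSketch("sessionstore.json", JSON.stringify(state), { flush: true });
```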

Worst case scenarios

  1. if the process crashes at any moment, nothing is lost, but a file tmpPath may be left on the disk;
  2. if the operating system crashes, nothing is lost, but a file tmpPath may be left on the disk;
  3. if the computer loses power suddenly while the hard drive is flushing its internal hardware buffers (which is very hard to predict), nothing is lost, but an incomplete file tmpPath may be left on the disk.

Performance: some operating systems (Windows) or file systems (ext3fs) cannot flush a single file and instead need to flush all the files on the device, which considerably slows down the whole operating system. On some others (ext4fs), this operation is essentially free. On some versions of MacOS X, flushing actually doesn’t do anything.

Native equivalent: in XPCOM/C++, the mostly-equivalent solution is the safe-file-output-stream.

Algorithm: write, backup, rename

(not landed yet)

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", backupTo: path + ".backup" })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. rename the file at path to backupTo;
  5. rename the file at tmpPath on top of path.
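The backup variant had not landed when this was written. The following is therefore only a hypothetical sketch of the steps above, again using Node.js `fs` as a stand-in. It also includes matching recovery logic on the reading side. The function names, and the use of a parse step (`JSON.parse` by default) to detect empty or truncated files, are assumptions of this sketch.

```javascript
const fs = require("fs");

// Hypothetical sketch of "write, backup, rename" (Node's fs as a stand-in).
function writeWithBackup(path, data, { tmpPath = path + ".tmp", backupTo = path + ".backup" } = {}) {
  fs.writeFileSync(tmpPath, data);      // 1-3. create, write and close tmpPath
  try {
    fs.renameSync(path, backupTo);      // 4. keep the previous version around
  } catch (ex) {
    if (ex.code !== "ENOENT") throw ex; // First write: nothing to back up.
  }
  fs.renameSync(tmpPath, path);         // 5. install the new version
}

// Reading side: if path is missing, empty or truncated (i.e. fails to
// parse), fall back to backupTo.
function readWithRecovery(path, { backupTo = path + ".backup", parse = JSON.parse } = {}) {
  try {
    return parse(fs.readFileSync(path, "utf8"));
  } catch (ex) {
    return parse(fs.readFileSync(backupTo, "utf8"));
  }
}
```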

Worst case scenarios

  1. if the process crashes between 4. and 5., the file at path may be lost and backupTo should be used instead for recovery;
  2. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing metadata (which may happen at any point after 1., typically up to 30 seconds), the file at path may be empty and backupTo should be used instead for recovery;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path, and backupTo should be used instead for recovery.

Performance: almost as good as Write and Rename.

Making Firefox Feel as Fast as its Benchmarks – part 3 – Going multi-threaded

October 29, 2013

As we saw in the previous posts, our browser behaves as follows:

function browser() {
  while (true) {
    handleEvents();  // Let's make this faster
    updateDisplay();
  }
}

The key to making the browser smooth is to make handleEvents() do less. We have already discussed the ongoing work to make Firefox multi-process, along with its goals and its limitations. Another, mostly orthogonal, path is to go multi-threaded.

Going multi-threaded

Going multi-threaded is all about splitting the event loop into several loops, executed concurrently, on several cores whenever applicable and possible:

function browser() {
  main() ||| eventWorker() ||| displayWorker() // Running concurrently
}

task main() { // Main thread (time-critical)
  while (true) {
    handleEvents(); // Some of your code here
    updateDisplay();
  }
}

task eventWorker() { // A worker thread handling some events
  while (true) {
    handleEvents(); // Some of your code here
  }
}

task displayWorker() { // Another worker thread, helping with the display
  while (true) {
    updateDisplay();
  }
}

The main thread remains time-critical and needs to loop 60 times per second, while other threads handle some of the workload of both handleEvents() and updateDisplay(). Now, this treatment is only useful if we can isolate operations that slow down the main loop measurably. As it turns out, there are many such operations lying around, including:

  • Network I/O;
  • Disk I/O;
  • Database I/O;
  • GPU I/O;
  • Treating large amounts of data.

It is easy to see why Network I/O could considerably slow down the main loop, if it were handled on the main thread – after all, some requests take several seconds to receive a reply, or never do, and if the main thread had to wait for the completion of these requests before it proceeded, this would cause multi-second gaps between two frames, which is simply not acceptable.

The cost of disk I/O, however, is often underestimated. Few developers realize that _any_ disk operation can take an unbounded amount of time – even closing a file or checking whether a file exists can, in some cases, take several seconds. This may seem counter-intuitive, as these operations do very little besides book-keeping, but one must not forget that they rely upon the device itself and that said device can unpredictably become very slow, typically because it is otherwise busy, or asleep – or even because that device is actually a network device.

Database I/O is a special case of Disk I/O that we generally single out because its cost is often much higher than users suspect – recall that, in addition to saving, a database management system will typically need to maintain a journal and to flush the drive regularly, to protect data against both software and hardware failures, including sudden power loss. Consequently, unless the database has been heavily customized to lift the safety requirements in favor of performance, you should expect that every operation on your database will cause heavy disk I/O.

Finally, treating large amounts of data, or applying any other form of heavy algorithm, will of course take time.

None of these operations should take place on the main thread. Moving them off the main thread will largely contribute to getting rid of the jank caused by these operations.
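As a rough illustration that runs outside Firefox, the sketch below uses the Node.js `worker_threads` module as a stand-in for a ChromeWorker. A worker performs a computation and posts back the result, while the calling thread only waits on a promise and stays free to run its loop. `sumInWorker` is a hypothetical helper, and the summation merely stands in for "treating large amounts of data".

```javascript
const { Worker } = require("worker_threads");

// Hypothetical helper: run an expensive computation on another thread and
// return a promise, so that the calling (main) thread never blocks.
function sumInWorker(numbers) {
  return new Promise((resolve, reject) => {
    // The worker source is passed inline (eval: true) to keep the sketch
    // self-contained; real worker code would live in its own file.
    const worker = new Worker(`
      const { parentPort, workerData } = require("worker_threads");
      let total = 0;
      for (const n of workerData) total += n; // the "heavy" work
      parentPort.postMessage(total);
    `, { eval: true, workerData: numbers });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

sumInWorker([1, 2, 3, 4]).then(total => console.log(total)); // 10
```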

Coding for multi-threading

In the Firefox web browser, threads are materialized as instances of nsIThread in C++ code and as instances of ChromeWorker in JavaScript code. For this discussion, I will concentrate on JavaScript code as refactoring C++ code is, well, complicated. Side-note: if you are new here, recall that Chrome Workers have nothing to do with the Chrome Web Browser and everything to do with the Mozilla Chrome, i.e. the parts of Gecko and Firefox written in JavaScript.

Chrome Workers are an extension of Web Workers, and have the same semantics, plus a few additions. Instantiating a ChromeWorker requires a source file:

let worker = new ChromeWorker("resource://path/to/my_file.js");

We may send messages to and from a Chrome Worker:

// In the parent
worker.postMessage(someValue);

// In the worker
self.postMessage(someValue);

and, of course, receive messages:

// In the parent
worker.addEventListener("message", function(msg) {
  // A copy of the message appears in msg.data
});

// In the worker
self.addEventListener("message", function(msg) {
  // A copy of the message appears in msg.data
});

In either case, the contents of the message get copied between threads, with essentially the same semantics as JSON.stringify/JSON.parse. If necessary, binary data in messages (ArrayBuffer or the upcoming Typed Objects) can be transferred instead of being copied, which is faster.
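The difference between copying and transferring can be observed directly. With a worker, a transfer is requested by passing a transfer list as the second argument, as in `worker.postMessage(data, [buffer])`. To run outside Firefox, this sketch uses a Node.js `MessageChannel` (a global since Node 15) as a stand-in, because it follows the same `postMessage(message, transferList)` convention.

```javascript
// Node.js stand-in for a worker channel: MessageChannel ports follow the same
// postMessage(message, transferList) convention as workers.
const { port1 } = new MessageChannel();

const copied = new ArrayBuffer(1024 * 1024);
port1.postMessage(copied);                     // structured clone: 1 MB is copied

const transferred = new ArrayBuffer(1024 * 1024);
port1.postMessage(transferred, [transferred]); // ownership moves, no copy

console.log(copied.byteLength);      // 1048576: the sender keeps its buffer
console.log(transferred.byteLength); // 0: the sender's buffer is now detached

port1.close(); // Also disables the other end of the channel.
```

The price of a transfer is that the sender loses access to the buffer. For large payloads, this is usually well worth it, since the main thread does not pay for the copy.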

Like Web Workers, Chrome Workers are very good at performing computations. In addition, they have a number of low-level libraries to access system features. Such libraries can be loaded with the chrome worker module loader:

let MyModule = require("resource://...");

Further modules can be defined for consumption with the chrome worker module loader:

module.exports = {
  foo: function() { /* ... */ }
};

Finally, they can call into C code using the js-ctypes foreign function interface:

let lib = ctypes.open("path/to/my_lib");
// declare(name, ABI, return type, ...argument types)
let fun = lib.declare("myFunction", ctypes.default_abi, ctypes.void_t); // void myFunction()
fun(); // Call into C
lib.close();

Together, the module loader and js-ctypes make for a powerful combination that has been used to provide access to low-level libraries, including low-level file manipulation (module OS.File), phone communication (module RIL, shorthand for Radio Interface Layer), file (de)compression, etc.

Limitations

Where multi-process is good at protecting a process against other processes, going multi-threaded works nicely to protect a process (a tab, the UI, etc.) against itself. Threads take up far fewer resources than processes and are also much faster to start and stop. However, they have very strict limitations.

The main limitation is that they do not have access to all the main thread APIs. Each API needs to be ported individually to Chrome Workers. Until recently, there was no way to define or load modules. At the moment, there is no way to read or write a compressed file from a Chrome Worker, or to access a database from a Chrome Worker. In most cases, this is only a question of time and manpower, and we can hope to eventually bring almost all important APIs to Chrome Workers. However, some APIs cannot be ported at all, in particular any API that requires a DOM window, which is most (fortunately not all) DOM APIs.

Also, the paradigm behind Chrome Workers is purely asynchronous. This means that there is no way for a Chrome Worker to wait synchronously until some treatment has been completed by the main thread. This complicates code in a few cases but, in general, this is rarely a problem.

Also, the cost of communication needs to be taken into account, as copying long messages can block the main thread. In some cases, it may be necessary to aggressively optimize messages to avoid such situations.

Refactoring for multi-threading

The first thing to take into consideration while refactoring for multi-threading is whether this is the best strategy. Since most APIs and most customization possibilities live on the main thread, most features need to be produced and/or consumed by the main thread. This does not mean that going multi-threaded is not possible, only that your code will probably end up looking like an asynchronous API meant to be used mostly on the main thread but implemented off the main thread. This also means that your consumers must be designed to accept an asynchronous API. We will cover making things asynchronous in another entry of this series.

Once we have decided to go multi-threaded, the next step is to determine what goes off the main thread. Generally, you want to move as much as you can off the main thread. The only limits are things that you simply cannot move off the main thread (e.g. access to the document), or data that you would need to copy (not transfer) across threads and whose copy would slow down the main thread unacceptably. This, of course, is something that can be determined only by benchmarking.

Next, you will need to define a communication protocol between the main thread and the worker. Threads communicate by sending pure data (i.e. objects without methods, without DOM nodes, etc.), and binary data can be transferred for high performance. Recall that communications are asynchronous, so if you want a thread to respond to another one, you will need to build into your protocol an identifier that matches each reply to its request. This is not built-in, but quite easy to do. Handling errors requires a little finesse, as uncaught exceptions on the worker are transmitted to an onerror listener instead of the usual onmessage listener, and lose some information along the way.
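The request/reply matching and the error handling described above can be sketched independently of the transport. The main thread tags each request with an id and keeps the pending promise in a map. The worker catches its own exceptions and sends them back as plain data, so they never reach onerror and lose nothing on the way. `createClient`, `createServer` and the message shapes `{ id, method, args }` / `{ id, ok, result | error }` are hypothetical. `send` stands for `worker.postMessage` on one side and `self.postMessage` on the other, replaced here by an asynchronous loopback so that the sketch runs on its own.

```javascript
// Main-thread side (hypothetical helper): tag each request with an id and
// keep its promise until the matching reply arrives.
function createClient(send) {
  let nextId = 0;
  const pending = new Map(); // id -> { resolve, reject }
  return {
    call(method, ...args) {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        send({ id, method, args });
      });
    },
    // To be called with msg.data from the "message" listener.
    onMessage({ id, ok, result, error }) {
      const request = pending.get(id);
      if (!request) {
        return; // Unknown or already answered.
      }
      pending.delete(id);
      if (ok) {
        request.resolve(result);
      } else {
        // Rebuild an Error, keeping the worker-side stack for debugging.
        const ex = new Error(error.message);
        ex.stack = error.stack;
        request.reject(ex);
      }
    },
  };
}

// Worker side (hypothetical helper): always reply, even on failure, so that
// errors reach the caller as data instead of going to onerror.
function createServer(send, methods) {
  return {
    onMessage({ id, method, args }) {
      try {
        send({ id, ok: true, result: methods[method](...args) });
      } catch (ex) {
        const isError = ex instanceof Error;
        send({
          id,
          ok: false,
          error: { message: isError ? ex.message : String(ex), stack: isError ? ex.stack : "" },
        });
      }
    },
  };
}

// Usage: an asynchronous loopback stands in for worker.postMessage/self.postMessage.
const server = createServer(msg => setTimeout(() => client.onMessage(msg), 0), {
  add: (a, b) => a + b,
  fail: () => { throw new Error("BOOM!"); },
});
const client = createClient(msg => setTimeout(() => server.onMessage(msg), 0));
client.call("add", 2, 3).then(sum => console.log(sum)); // 5
client.call("fail").catch(ex => console.log(ex.message)); // BOOM!
```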

In some (hopefully rare) cases, you will need to add new bindings to native code, so as to call C functions (only C, not C++) from JavaScript. For this purpose, take a look at the documentation of js-ctypes, our JavaScript FFI, and osfile_shared_allthreads.jsm, a set of lightweight extensions to js-ctypes that handle a number of platform-specific gotchas. As finding the correct libraries to link is sometimes tricky, you should take advantage of OS.Constants.Path, which already lists some of them. Don’t hesitate to file bugs if you realize that something important is missing. Also, in a few (hopefully almost non-existent) cases, you will need to write additional C code wrapping native code, typically to expose some C++-only features. For this purpose, take a look at an example.

Unsurprisingly, the next step is to write the JS code. The usual caveats apply; just don’t forget to use the module system. Worker code goes into its own file, typically with extension “.js”. It is generally a good idea to mention “worker” in the name of the file, e.g. “foo_worker.js”, and to deploy your code to "resource://.../worker/..." or "chrome://.../worker/..." to avoid ambiguities. To construct the worker, it is then sufficient to call new ChromeWorker("resource://path/to/your/file.js"). The worker code will be started lazily when the first message is sent.

For automated testing, you can for instance use mochitest-chrome or (once bug 930924 has landed) xpcshell-tests. In the latter, if you need to add new worker code for the sake of testing, you should install it with the chrome:// protocol. Also, for any testing, don’t forget to look at your system console, as worker errors are displayed on that console by default.

That’s it! In a future blog entry, I will write more about common patterns for writing or refactoring asynchronous code, which comes in very handy for code that uses your new API.

Contributing

Refactoring Firefox as a set of asynchronous APIs backed by off main thread implementations is a considerable task. The best way to help make it happen is to contribute code, tests or documentation.

Copying streams asynchronously

October 18, 2013

In the Mozilla Platform, I/O is largely about streams. Copying streams is a rather common activity, e.g. for the purpose of downloading files, decompressing archives, saving decoded images, etc. As usual, doing any I/O on the main thread is a very bad idea, so the recommended way of copying streams is to use one of the asynchronous stream copy APIs provided by the platform: NS_AsyncCopy (in C++) and NetUtil.asyncCopy (in JavaScript). I have recently audited both to ascertain whether they accidentally cause main thread I/O, and here are the results of my investigations.

In C++

What NS_AsyncCopy does

NS_AsyncCopy is a well-designed (if a little complex) API. It copies the full contents of an input stream into an output stream, then closes both. NS_AsyncCopy can be called with both synchronous and asynchronous streams. By default, all operations take place off the main thread, which is exactly what is needed.

In particular, even when used with the dreaded Safe File Output Stream, NS_AsyncCopy will perform every piece of I/O off the main thread.

The default setting of reading data by chunks of 4kb might not be appropriate for all data, as it may cause too much I/O, in particular if you are reading a small file. There is no obvious way for clients to detect the right setting without causing file I/O, so it might be a good idea to eventually extend NS_AsyncCopy to autodetect the “right” chunk size for simple cases.

Bottom line: NS_AsyncCopy is not perfect but it is quite good and it does not cause main thread I/O.
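For comparison only, Node.js offers a similar primitive, shown here as a stand-in rather than as NS_AsyncCopy. `stream.pipeline` copies a readable stream into a writable one and closes both when done, and the `highWaterMark` option plays the role of the chunk size. `copyStreamSketch` and its 4 KiB default (mirroring the default mentioned above) are assumptions of this sketch.

```javascript
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Hypothetical sketch: copy src to dest chunk by chunk without blocking the
// event loop; pipeline() closes both streams when done or on error.
async function copyStreamSketch(src, dest, { chunkSize = 4 * 1024 } = {}) {
  await pipeline(
    fs.createReadStream(src, { highWaterMark: chunkSize }),
    fs.createWriteStream(dest)
  );
}

// For a plain file-to-file copy, a dedicated call is simpler, much like
// OS.File.copy: fs.promises.copyFile(src, dest).
```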

Limitations

NS_AsyncCopy will, of course, not remove main thread I/O that takes place externally. If you open a stream from the main thread, this can cause main thread I/O. In particular, file streams should really be opened with the DEFER_OPEN flag. Other streams, such as nsIJARInputStream, do not support any form of deferred opening (bug 928329), and will cause main thread I/O when they are opened.

While NS_AsyncCopy does only off main thread I/O, using a Safe File Output Stream will cause a Flush. The Flush operation is very expensive for the whole system, even when executed off the main thread. For this reason, Safe File Output Stream is generally not the right choice of output stream (bug 928321).

Finally, if you only want to copy a file, prefer OS.File.copy (if you can call JS). This function is simpler, entirely off main thread, and supports OS-specific accelerations.

In JavaScript

What NetUtil.asyncCopy does

NetUtil.asyncCopy is a utility method that lets JS clients call NS_AsyncCopy. Theoretically, it should have the same behavior. However, some oddities make its performance lower.

As NS_AsyncCopy requires one of its streams to be buffered, NetUtil.asyncCopy calls nsIIOUtil::inputStreamIsBuffered and nsIIOUtil::outputStreamIsBuffered. These methods detect whether a stream is buffered by attempting to perform buffered I/O. Whenever they succeed, this causes main thread I/O (bug 928340).

Limitations

Generally speaking, NetUtil.asyncCopy has the same limitations as NS_AsyncCopy. In particular, in any case in which you can replace NetUtil.asyncCopy with OS.File.copy, you should pick the latter, which is both simpler and faster.

Also, NetUtil.asyncCopy cannot read directly from a Zip file (bug 927366).

Finally, NetUtil.asyncCopy does not fit the “modern” way of writing asynchronous code on the Mozilla Platform (bug 922298).

Helping out

We need to fix a few bugs to improve the performance of asynchronous copy. If you wish to help, please do not hesitate to pick any of the bugs listed above and get in touch with me.

Trapping uncaught asynchronous errors

October 14, 2013

While the official specification of DOM Promise is still being worked on, Mozilla has been using Promise internally for several years already. This API is available to the platform front-end and to add-ons. In the past few weeks, Promise.jsm (our implementation of Promise) and Task.jsm (our implementation of Beautiful Concurrency in JavaScript, built on top of Promise) have been updated with a few new features that should make everybody’s life much easier.

Reporting errors

The #1 issue developers encounter with the use of Promise and Task is error-handling. In non-Promise code, if a piece of code throws an error, by default, that error will eventually be reported by window.onerror or any of the other error-handling mechanisms.

function fail() {
  let x;
  return x.toString();
}

fail(); // Displays somewhere: "TypeError: x is undefined"

By contrast, with Promise and/or Task, if a piece of code throws an error or rejects, by default, this error will be completely ignored:

Task.spawn(function*() {
  fail(); // Error is ignored
});

Task.spawn(function*() {
  yield fail(); // Error is ignored
});

somePromise.then(function onSuccess() {
  fail(); // Error is ignored
});

somePromise.then(function onSuccess() {
  return fail(); // Error is ignored
});

Debugging the error requires careful instrumentation of the code, which is error-prone, time-consuming, often not compositional and generally ugly to maintain:

Task.spawn(function*() {
  try {
    fail();
  } catch (ex) {
    Components.utils.reportError(ex);
    throw ex;
    // The error report is incomplete, re-throwing loses stack information
    // and can cause double-reporting
  }
});

The main reason errors end up dropped silently is that it is difficult to find out whether an error is eventually caught by an error handler – recall that, with Promise and Task, error handlers can be registered long after the error has been triggered.

Well, after long debates, we eventually found solutions to fix the issue :)

Simple case: Reporting programming errors

Our first heuristic is that programming errors are, well, programming errors, and that programmers are bound to be looking for them.

So,

Task.spawn(function*() {
  fail(); // Error is not invisible anymore
});

will now cause the following error message:

*************************
A coding exception was thrown and uncaught in a Task.

Full message: TypeError: x is undefined
Full stack: fail@Scratchpad/2:23
@Scratchpad/2:27
TaskImpl_run@resource://gre/modules/Task.jsm:217
TaskImpl@resource://gre/modules/Task.jsm:182
Task_spawn@resource://gre/modules/Task.jsm:152
@Scratchpad/2:26
*************************

The message appears on stderr (if you have launched Firefox from the command-line) and in the system logs, so it won’t disrupt your daily routine, but if you are running tests or debugging your code, you should see it.

A similar error message will be printed out if the error is thrown from a raw Promise, without use of Task.

These error messages are limited to programming errors and appear only if the errors are thrown, not passed as rejections.

General case: Reporting uncaught errors

Now, we have just landed a more general support for displaying uncaught errors.

Uncaught thrown error

Task.spawn(function* () {
  throw new Error("BOOM!"); // This will be displayed
});

Uncaught rejection

Task.spawn(function* () {
  yield Promise.reject("BOOM!"); // This will also be displayed
});

Uncaught and clearly incorrect rejection

Task.spawn(function* () {
  Promise.reject("BOOM!");
  // Oops, forgot to yield.
  // Nevermind, this will be displayed, too
});

These will be displayed in the browser console as follows:

A promise chain failed to handle a rejection: on Mon Oct 14 2013 16:50:15 GMT+0200 (CEST), Error: BOOM! at
@Scratchpad/2:27
TaskImpl_run@resource://gre/modules/Task.jsm:217
TaskImpl@resource://gre/modules/Task.jsm:182
Task_spawn@resource://gre/modules/Task.jsm:152
@Scratchpad/2:26

These error messages appear for every uncaught error or rejection, once it is certain that the error/rejection cannot be caught anymore. If you are curious about the implementation, just know that it hooks into the garbage-collector to be informed once the error/promise cannot be caught anymore.
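For readers who want to experiment outside Firefox, Node.js offers a comparable hook, `process.on("unhandledRejection")`. It is shown here as a stand-in, not as how Promise.jsm works. Node decides that a rejection is unhandled once the current batch of microtasks has run, rather than when the promise is garbage-collected. As a result, it may also report rejections that end up being handled later, in which case it emits `rejectionHandled`. The message format merely imitates the one above.

```javascript
// Node.js stand-in for reporting uncaught rejections (not Promise.jsm).
const reported = [];
process.on("unhandledRejection", reason => {
  const details = reason instanceof Error ? reason.stack : String(reason);
  const message = `A promise chain failed to handle a rejection: on ${new Date()}, ${details}`;
  reported.push(message);
  console.error(message);
});

// Uncaught thrown error inside an async function: this will be displayed.
(async function () {
  throw new Error("BOOM!");
})();

// Uncaught raw rejection: this will be displayed, too.
Promise.reject("BOOM!");
```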

This should prove very helpful when debugging Promise- or Task-related errors. Have fun :)

Support for ES6 generators

You may have noticed that the above examples use function*() instead of function(). Be sure to thank Brandon Benvie who has recently updated Task.jsm to be compatible with ES6 generators :)

Making Firefox Feel as Fast as its Benchmarks – Part 2 – Towards multi-process

October 9, 2013

As we saw in the first post of this series, our browser behaves as follows:

function browser() {
  while (true) {
    handleEvents();  // Let's make this faster
    updateDisplay();
  }
}

As we discussed, the key to making the browser smooth is to make handleEvents() do less. One way of doing this is to go multi-process.

Going multi-process

Chrome is multi-process. Internet Explorer 4 was multi-process and so is Internet Explorer 8+ (don’t ask me where IE 5, 6, 7 went). Well, Firefox OS is multi-process, too, and Firefox for Android used to be multi-process until we canceled this approach due to Android-specific issues. For the moment, Firefox Desktop is only slightly multi-process, although we are heading further in this direction with project electrolysis (e10s, for short).

In a classical browser (i.e. not FirefoxOS, not the Firefox Web Runtime), going multi-process means running the user interface and system-style code in one process (the “parent process”) and running code specific to the web or to a single tab in another process (the “child process”). Whether all tabs share a process or each tab is a separate process, or even each iframe is a process, is an additional design choice that, I believe, is still open to discussion in Firefox. In FirefoxOS and in the Firefox Web Runtime (part of Firefox Desktop and Firefox for Android), going multi-process generally means one process per (web) application.

Since code is separated between processes, each handleEvents() has less to do and will therefore, at least in theory, execute faster. Additionally, this is better for security, insofar as a compromised web-specific process affords an attacker less control than a compromised single-process browser. Finally, this makes it possible to crash a child process if necessary, without having to crash the whole browser.

Coding for multi-process

In the Firefox web browser, the multi-process architecture is called e10s and looks as follows:

function parent() {
  while (true) {
    handleEvents();  // Some of your code here
    updateDisplay(); // Just the ui
  }
}
function child() {
  while (true) {
    handleEvents();  // Some of your code here
    updateDisplay(); // Just some web
  }
}

parent() ||| child()

The parent process and the child process are not totally independent. Very often, they need to communicate. For instance, when the user browses in a tab, the parent process needs to update its history menu. Similarly, every few seconds, Firefox saves its state to provide quick recovery in case of crash – the parent asks each tab for its information and, once all replies have arrived, gathers them into one data structure and saves them all.

For this purpose, the parent and the child can send messages to each other through the Message Manager. A Message Manager can let a child process communicate with a single parent process and can let a parent process communicate with one or more child processes:

// Sender-side
messageManager.sendAsyncMessage("MessageTopic", {data});

// Receiver-side
messageManager.addMessageListener("MessageTopic", this);
// ...
receiveMessage: function(message) {
  switch (message.name) {
  case "MessageTopic":
    // do something with message.data
    // ...
    break;
  }
}

Additionally, code executed in the parent process can inject code in the child process using the Message Manager, as follows:

messageManager.loadFrameScript("resource://path/to/script.js", true);

Once injected, the code behaves as any (privileged) code in the child process.

As you may see, communications are purely asynchronous – we do not wish the Message Manager to stop a process and wait until another process is done with its tasks, as this would totally defeat the purpose of multi-processing. There is an exception, called the Cross Process Object Wrapper, which I am not going to cover, as this mechanism is meant to be used only during a transition phase.

Limitations

It is tempting to see multi-process architecture as a silver bullet that semi-magically makes Firefox (or its competitors) fast and smooth. There are, however, quite clear limitations to the model.

Firstly, going multi-process has a cost. As demonstrated by Chrome, each process consumes lots of memory. Each process needs to load its libraries and its JavaScript VM, each script must be JIT-ed for each process, each process needs its communication channel towards the GPU, etc. Optimizing this is possible, as demonstrated by FirefoxOS (which runs nicely with 256 MB), but is a challenge.

Similarly, starting a multi-process browser can be much slower than starting a single-process browser. Between the instant the user launches the browser and the instant it becomes actually usable, many things need to happen: launching the parent process, which in turn launches the child processes, setting up the communication channels, JIT compiling all the scripts that need compilation, etc. The same cost appears when shutting down the processes.

Also, using several processes brings about a risk of contention on resources. Two processes may need to access the disk cache at the same time, or the cookies, or the session storage, or the GPU or the audio device. All of this needs to be managed carefully and can, in certain cases, considerably slow down both processes.

Also, some APIs are synchronous by specification. If, for some reason, a child process needs to access the DOM of another child process – as may happen in the case of iframes – both child processes need to become synchronous. During the operation, they both behave as a single process, only with much slower DOM operations.

And finally, going multi-process will of course not make a tab magically responsive if this tab itself is the source of the slowdown – in other words, multi-process is not very useful for games.

Refactoring for multi-process

Many APIs, both in Firefox itself and in add-ons, are not e10s-compliant yet. The task of refactoring Firefox APIs into something e10s-compliant is in progress and can be followed here. Let’s see what needs to be done to refactor an API for multi-process.

Firstly, not all APIs need to be converted. Only APIs that access web content on behalf of non-web content need to be converted to e10s style. An example is Page Info, which needs to access web content (the list of links and images from a page) for the purpose of non-web content (the Page Info button and dialog). As multi-process communication is asynchronous, such APIs must already be asynchronous or be made asynchronous, and the code that calls them must in turn be made asynchronous if it is not already, which is itself a major task. We will cover making things asynchronous in another entry of this series.
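To illustrate how asynchrony spreads from an API to its callers, here is a minimal sketch in plain JavaScript. All names are hypothetical, and the child-process round-trip is simulated by `fetchLinksFromContent`, which simply resolves on a timer; this is not the actual Page Info code.

```javascript
// Minimal sketch: making an API asynchronous forces its callers to
// become asynchronous too. All names here are hypothetical.

// Before: web content lives in the same process, so a direct call works.
function getLinkCountSync(doc) {
  return doc.links.length;
}

function updateDialogSync(dialog, doc) {
  dialog.linkCount = getLinkCountSync(doc);
}

// After: web content lives in a child process. This stub simulates the
// round-trip by resolving on a later turn of the event loop.
function fetchLinksFromContent() {
  return new Promise(resolve => {
    setTimeout(() => resolve(["a.html", "b.html"]), 0);
  });
}

function getLinkCountAsync() {
  return fetchLinksFromContent().then(links => links.length);
}

// The caller cannot read the result when the call returns, so it must
// itself become asynchronous (and so must its own callers).
async function updateDialogAsync(dialog) {
  dialog.linkCount = await getLinkCountAsync();
}
```

Note that `updateDialogAsync` returns before `linkCount` is set: any code that reads `dialog.linkCount` must itself wait on the returned promise.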

Once we have decided to make an asynchronous API e10s-compliant, the next step is to determine which part of the implementation needs to reside in a child process and which part in the parent process. Typically, anything that touches the web content must reside in the child process. Generally, we consider that the parent process is more performance-critical than child processes, so if you have code that could reside either in the child process or in the parent process, and if placing that code in the child process will not cause duplication of work or very expensive communication, it is a good idea to move it to the child process. This is, of course, a rule of thumb, and nothing replaces testing and benchmarking.

The next step is to define a communication protocol. Messages exchanged between the parent process and child processes all need a name. If you are working on feature Foo, by convention, the names of your messages should start with “Foo:”. Recall that message sending is asynchronous, so if you need a message to receive an answer, you will need two messages: one for sending the request (“Foo:GetState”) and one for replying once the operation is complete (“Foo:State”). Messages can carry arbitrary data in the form of a JavaScript structure (i.e. any object that can be converted to/from JSON without loss). If necessary, these structures may be used to attach unique identifiers to messages, so as to easily match a reply to its request; this feature is not built into the message manager but may easily be implemented on top of it. Also, do not forget to take communication timeouts into account: a process may fail to reply because it has crashed or been killed for any reason.
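The request/reply pattern with identifiers and a timeout can be sketched as follows. `FakeChannel` is a hypothetical in-memory stand-in that only imitates the shape of the message manager (`addMessageListener`, `sendAsyncMessage`) so that the example runs under Node; it is not the Firefox implementation, and the timeout value is left to the caller.

```javascript
// Sketch: request/reply messaging with unique identifiers and a timeout.
// Assumption: FakeChannel is a hypothetical in-memory stand-in for a
// message manager, not the Firefox API.
class FakeChannel {
  constructor() {
    this.listeners = new Map(); // message name -> list of callbacks
    this.peer = null;           // channel at the other end
  }
  addMessageListener(name, callback) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(callback);
  }
  sendAsyncMessage(name, data) {
    // Copy through JSON (messages carry JSON-compatible data only) and
    // deliver on a later turn of the event loop, never synchronously.
    const copy = JSON.parse(JSON.stringify(data));
    const peer = this.peer;
    setTimeout(() => {
      for (const callback of peer.listeners.get(name) || []) {
        callback({ name, data: copy });
      }
    }, 0);
  }
}

function connect() {
  const parent = new FakeChannel();
  const child = new FakeChannel();
  parent.peer = child;
  child.peer = parent;
  return { parent, child };
}

// Parent side: pending requests, indexed by identifier.
let nextRequestId = 0;
const pending = new Map(); // id -> { resolve, timer }

function listenForReplies(channel) {
  channel.addMessageListener("Foo:State", message => {
    const request = pending.get(message.data.id);
    if (!request) return; // reply arrived after the timeout: drop it
    pending.delete(message.data.id);
    clearTimeout(request.timer);
    request.resolve(message.data.state);
  });
}

function getState(channel, timeoutMs) {
  const id = ++nextRequestId;
  return new Promise((resolve, reject) => {
    // The child may have crashed or been killed: do not wait forever.
    const timer = setTimeout(() => {
      pending.delete(id);
      reject(new Error("Foo:GetState timed out"));
    }, timeoutMs);
    pending.set(id, { resolve, timer });
    channel.sendAsyncMessage("Foo:GetState", { id });
  });
}

// Child side: answer each request, echoing its identifier.
function serveState(channel, state) {
  channel.addMessageListener("Foo:GetState", message => {
    channel.sendAsyncMessage("Foo:State", { id: message.data.id, state });
  });
}
```

For instance, after `listenForReplies(parent)` and `serveState(child, { tabs: 3 })`, calling `getState(parent, 1000)` resolves with `{ tabs: 3 }`, whereas a request sent to a channel with no responder is rejected once its timeout expires. Replies that arrive after the timeout are dropped, since their request is no longer pending.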

The last step is to actually write the code. Code executed by the parent process typically goes into a .js file loaded from XUL (e.g. browser.js) or a .jsm module, as usual. Code executed by a child process goes into its own file, typically a .js file, and must be injected into the child process during initialization using either window.messageManager.loadFrameScript (to inject it into all child processes) or browser.messageManager.loadFrameScript (to inject it into a specific child process).
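A child-side frame script answering the “Foo:GetState” request might look like the sketch below. In Firefox, frame scripts see `content`, `addMessageListener` and `sendAsyncMessage` as globals; here they are replaced by hypothetical stubs so that the body runs under Node, and the `chrome://` URL in the comment is made up for illustration.

```javascript
// Sketch of a frame script for feature Foo, runnable outside Firefox.
// The parent would load the real file with something like:
//   browser.messageManager.loadFrameScript("chrome://foo/content/foo-content.js", false);
// (the chrome:// URL is hypothetical).

// --- Hypothetical stubs standing in for the frame-script globals ---
const sentMessages = [];
const messageListeners = new Map();
function addMessageListener(name, callback) {
  messageListeners.set(name, callback);
}
function sendAsyncMessage(name, data) {
  sentMessages.push({ name, data });
}
const content = {
  document: { title: "Example page", links: [{ href: "a.html" }, { href: "b.html" }] },
};

// --- Frame script body: runs in the child, next to the web content ---
addMessageListener("Foo:GetState", message => {
  const doc = content.document;
  // Reply with JSON-compatible data only, echoing the request identifier.
  sendAsyncMessage("Foo:State", {
    id: message.data.id,
    state: { title: doc.title, links: Array.from(doc.links, link => link.href) },
  });
});
```

Because the frame script echoes the request's identifier, the parent can use it to match each reply to its pending request.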

That’s it! In a future blog entry, I will write more about common patterns for writing or refactoring asynchronous code, which comes in very handy for code that uses your new API.

Contributing to e10s

The ongoing e10s refactoring of Firefox is a considerable task. The best way to help make it happen is to contribute code or help with testing.

What’s next?

In the next blog entry, I will demonstrate how to make front-end and add-on code multi-threaded.

Asynchronous database connections in the Mozilla Platform

July 19, 2013

One of the core components of the Mozilla Platform is mozStorage, our low-level database, based on sqlite3. mozStorage is used just about everywhere in our codebase, to power indexedDB and localStorage, but also site permissions, cookies, XUL templates, the download manager (*), forms, bookmarks, the add-ons manager (*), Firefox Health Report, the search service (*), etc. – not to mention numerous add-ons.

(*) Some components are currently moving away from mozStorage for performance and footprint reasons as they do not need the safety guarantees provided by mozStorage.

A long time ago, mozStorage and its users were completely synchronous and main-thread based. Needless to say, this eventually proved to be a design that doesn't scale very well. So, we set out on a quest to make mozStorage off-main-thread-friendly and to move all these uses off the main thread.

These days, whether you are developing add-ons or contributing to the Mozilla codebase, everything you need to access storage off the main thread is readily available to you. Let me introduce the two recommended flavors.

Note: This blog entry does not cover using databases from *web applications* but from the *Mozilla Platform*. From web applications, you should use indexedDB.


Chrome Workers, now with modules!

July 17, 2013

One of the main objectives of Project Async is to encourage Firefox developers and Firefox add-on developers to use Chrome Workers to ensure that whatever they do doesn't block Firefox's UI thread. The main obstacle, for the moment, is that Chrome Workers have access to very few features, so one of the tasks of Project Async is to add features to Chrome Workers.

Today, let me introduce the Module Loader for Chrome Workers.


Project Async & Responsive, issue 1

April 26, 2013

In the previous episodes

Our intrepid heroes, a splinter cell from Snappy, have set out on a quest to offer alternatives to all JavaScript-accessible APIs that block the main thread of Firefox & Mozilla Apps.

Recently completed

Various cleanups on Session Restore (Bug 863227, 862442, 861409)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe and form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can only be done on the main thread, the jank caused by data collection alone is often noticeable (i.e. > 50 ms). We are in the process of refactoring Session Restore to make it both faster and more responsive. These bugs are steps towards optimizations.

Status Landed. More cleanups in progress.

Telemetry for Number of Threads (Bug 724368)

Summary As we make Gecko and add-ons more and more concurrent, we need to measure whether this concurrency can cause accidental internal Denial of Service. The objective of this bug is to measure the number of threads in use.

Status Landed. You can follow progress in histograms BACKGROUNDFILESAVER_THREAD_COUNT and SIMPLEMEASURES_MAXIMALNUMBEROFCONCURRENTTHREADS.

Reduce number of fsync() in Firefox Health Report (Bug 830492)

Summary Firefox Health Report stores its data using mozStorage. The objective of this bug is to reduce the number of expensive disk synchronizations performed by FHR.

Status Landed.

Ongoing bugs

Out Of Process Thumbnailing (Bug 841495)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily. This bug is about using another process to capture thumbnails of pages currently being visited. This can be useful for both security/privacy reasons (i.e. to ensure that bank account numbers do not show up in thumbnails) and responsiveness (i.e. to ensure that we never block browsing).

Status In progress.

Cutting Session Restore data collection into chunks (Bug 838577)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe and form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can only be done on the main thread, the jank caused by data collection alone is often noticeable (i.e. > 50 ms). This bug is about cutting data collection into smaller chunks, to remove that jank.
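The general idea of cutting collection into chunks can be sketched as follows: process items one at a time and, whenever the current slice of work exceeds a time budget, yield to the event loop so that other events can run. This is a minimal illustration under stated assumptions, not the Session Restore implementation; `collectOne` and the 10 ms budget are hypothetical.

```javascript
// Minimal sketch of chunked data collection (not the Session Restore code).
// Assumptions: `collectOne` is a hypothetical per-item collector (e.g. one
// tab), and DEFAULT_BUDGET_MS is an illustrative value.
const DEFAULT_BUDGET_MS = 10;

// Resolve on a later turn of the event loop, letting pending events run.
function yieldToEventLoop() {
  return new Promise(resolve => setTimeout(resolve, 0));
}

async function collectInChunks(items, collectOne, budgetMs = DEFAULT_BUDGET_MS) {
  const results = [];
  let sliceStart = Date.now();
  for (const item of items) {
    results.push(collectOne(item));
    // Once this slice has used up its budget, pause so that other
    // work (input handling, painting) can run before the next slice.
    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToEventLoop();
      sliceStart = Date.now();
    }
  }
  return results;
}
```

This only bounds the length of a slice if each individual item is cheap: a single `collectOne` call that takes longer than the budget still blocks the thread for its whole duration.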

Status Working prototype.

Off Main Thread database storage (Bug 702559)

Summary We are in the process of moving as many uses of mozStorage as possible off the main thread. While mozStorage already supports doing much I/O off the main thread, there is no clear way for developers to enforce this. This bug is about providing a subset of mozStorage that performs all I/O off the main thread and that will serve as target for both ongoing refactorings and future uses of mozStorage, in particular by add-ons.

Status Working prototype.

Improve transaction management in the JavaScript API for Off Main Thread database storage (Bug 856925)

Summary Sqlite.jsm is our JavaScript library for convenient Off Main Thread database storage. This bug is about improving how implicit transactions are handled by the library, hence improving performance.

Status In progress.

Refactor how Places data is backed up (Bugs 852040, 852041, 852034, 852032, 855638, 865643, 846636, 846635, 860625, 854927, 855190)

Summary Places is the database containing bookmarks, history, etc. Historically, Places was implemented purely on the main thread, which is something we very much want to remove, as any main thread I/O can block the user interface for arbitrarily lengthy durations. This set of bugs is part of the larger effort to get rid of Places main thread I/O. The objective here is to isolate and clean up Places backup, to later allow removing it entirely from the main thread.

Status Working prototype.

APIs for updating/reading Places Off Main Thread (Bugs 834539, 834545)

Summary These bugs are part of the effort to provide a clean API, usable by platform and add-ons, to access/modify Places information off the main thread.

Status In progress.

Move Form History to use Off Main Thread storage (Bug 566746)

Summary This bug is part of the larger effort to get rid of main thread I/O. The objective here is to move Form History I/O off the main thread.

Status In progress.

Make about:home use IndexedDB instead of LocalStorage (Bug 789348)

Summary Currently, page about:home uses localStorage to store some data. This is not good, as localStorage does blocking main thread I/O. This bug is about porting about:home to use indexedDB instead.

Status In progress.

Download Architecture Improvements (Bug 825588 and sub-bugs)

Summary Our architecture for downloads has grown organically for about 15 years. Part of it is executed on the main thread and synchronously. The objective of this meta-bug is to re-implement Downloads with a modern architecture, asynchronous, off the main thread, and accessible from JavaScript.

Status In progress.

Constant stack space Promise (Bug 810490)

Summary Much of our main thread asynchronous work uses Promises. The current implementation of Promise is deeply recursive and consumes considerable stack space. The objective here is to replace it with a new implementation of Promise that works in (almost) constant stack space.

Status Working partial prototype.

Reduce amount of I/O in session restore (Bug 833286)

Summary The algorithm used by Session Restore to back up its state is needlessly expensive. The objective of this bug is to replace it by an alternative implementation that requires much less I/O.

Status Working prototype.

Planning stage

Move session recording and activity monitoring to Gecko (Bug 841561)

Summary Firefox Health Report implements a sophisticated mechanism for determining whether a user is actively using the browser. This mechanism could be reimplemented in a more efficient and straightforward manner by moving it to Gecko itself. This is the objective of this bug.

Status Looking for developer.

Non-blocking Thumbnailing (Bug 744100)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily, as capturing a page requires repainting it into memory from the main thread. This bug is about removing the “repaint into memory” step entirely and instead collaborating with the renderer to obtain a copy of the most recently rendered image.

Status Design in progress.

Evaluate dropping “www.” from rev_host column (Bug 843357)

Summary The objective of this bug is to simplify parts of the Places database and its users by removing some data that appears unnecessary.

Status Evaluation in progress.

Optimize worker communication for large messages (Bug 852187)

Summary We sometimes need to communicate very large messages between the main thread and workers. For large enough messages, the communication itself ends up being very expensive for the main thread. This bug is about optimizing such communications.

Status Design in progress.

Announcing Project Async & Responsive

April 10, 2013

tl;dr

Project Snappy has been retired and replaced by several smaller projects, including Async & Responsive. The objective of this project is to improve the responsiveness of Firefox and the Mozilla Platform by converting key components to make them asynchronous and, wherever possible, to move them off the main thread.

The setting

Firefox and other Mozilla applications are great products, in particular in terms of performance. They are based on an extremely fast rendering engine, Gecko, and its companion JavaScript engine, which, in addition to being the richest JS engine around, is also, these days, quite possibly the fastest. What is not so great, unfortunately, is that despite this great core performance, Mozilla applications have often been perceived as slow and sluggish.

Project Snappy was formed about 18 months ago to focus the effort by Mozilla developers to fight this perceived sluggishness. During this period, we have made tremendous progress, thanks to the commitment of everyone involved. Indeed, most of the long-term objectives of Snappy have been reached already. We have therefore decided to retire project Snappy, in favor of both a larger project Performance, and several sub-projects focusing on distinct aspects of Performance.

Let me introduce Asynchronous & Responsive [1], one of the sub-projects of Performance.

Project outline

Despite considerable progress, much of Firefox still behaves as a single-threaded application. Most services and components are initialized sequentially in the main thread, run in the main thread and are shut down sequentially in the main thread. Also, most add-ons execute essentially in the main thread. As a consequence, any long-lived task can disrupt the user experience.

There are historical reasons for this, but in most cases, there is no deep blocker that would prevent us from rewriting services. Project Asynchronous & Responsive is now starting to support and focus the ongoing effort to get rid of main thread services and components, both in platform code and in add-on code, for the betterment of all Mozillakind.

This entails:

  • identifying blockers that prevent platform and add-on developers from deploying their code on non-main threads (generally, worker threads);
  • helping platform and add-on developers transition their code off-main thread;
  • actually transitioning some of our services and components off the main thread.

Please note that we have no intention of working on the JavaScript VM, on DOM or Graphics. These teams already have dedicated developers working on moving things off the main thread.

Following our progress

As I am the tech lead of this project, you will find more information on this blog, under category Performance.

I will try and post updates every second week.

[1] If you have an idea of a nicer name that does not sound too much like “Snappy”, we are interested :) Marxist jokes about Workers might or might not be accepted.

Where Am I?

You are currently browsing entries tagged with javascript at Il y a du thé renversé au bord de la table.
