Making Firefox Feel as Fast as its Benchmarks – part 3 – Going multi-threaded

October 29, 2013

As we saw in the previous posts, our browser behaves as follows:

function browser() {
  while (true) {
    handleEvents();  // Let's make this faster
    updateDisplay();
  }
}

The key to making the browser smooth is to make handleEvents() do less. We have already discussed the ongoing work to make Firefox multi-process, its goals and its limitations. Another, mostly orthogonal, path is to go multi-threaded.

Going multi-threaded

Going multi-threaded is all about splitting the event loop in several loops, executed concurrently, on several cores whenever applicable and possible:

function browser() {
  main() ||| worker() ||| worker() // Running concurrently
}

task main() { // Main thread (time-critical)
  while (true) {
    handleEvents(); // Some of your code here
    updateDisplay();
  }
}

task worker() {
  while (true) {
    handleEvents(); // Some of your code here
  }
}

task worker() {
  while (true) {
    updateDisplay();
  }
}

The main thread remains time-critical and needs to loop 60 times per second, while other threads handle some of the workload of both handleEvents() and updateDisplay(). Now, this approach is only useful if we can isolate operations that measurably slow down the main loop. As it turns out, there are many such operations lying around, including:

  • Network I/O;
  • Disk I/O;
  • Database I/O;
  • GPU I/O;
  • Processing large amounts of data.

It is easy to see why Network I/O could considerably slow down the main loop, if it were handled on the main thread – after all, some requests take several seconds to receive a reply, or never do, and if the main thread had to wait for the completion of these requests before it proceeded, this would cause multi-second gaps between two frames, which is simply not acceptable.

The cost of disk I/O, however, is often underestimated. Few developers realize that _any_ disk operation can take an unbounded amount of time – even closing a file or checking whether a file exists can, in some cases, take several seconds. This may seem counter-intuitive, as these operations do very little besides book-keeping, but one must not forget that they rely upon the device itself and that said device can unpredictably become very slow, typically because it is otherwise busy, or asleep – or even because that device is actually a network device.

Database I/O is a special case of disk I/O that we generally single out because its cost is often much higher than users suspect – recall that, in addition to saving, a database management system will typically need to maintain a journal and to flush the drive regularly, to protect data against both software and hardware failures, including sudden power loss. Consequently, unless the database has been heavily customized to lift the safety requirements in favor of performance, you should expect that every operation on your database will cause heavy disk I/O.

Finally, processing large amounts of data, or applying any other heavyweight algorithm, will of course take time.

None of these operations should take place on the main thread. Moving them off the main thread goes a long way towards getting rid of the jank they cause.
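For instance, OS.File (which we will meet again below) exposes file operations to the main thread as an asynchronous API whose implementation lives off the main thread. A minimal sketch of main-thread usage, with an illustrative path:

// Main thread: request disk I/O asynchronously; the actual system calls
// run off the main thread, so a slow disk cannot delay the next frame.
Components.utils.import("resource://gre/modules/osfile.jsm");

OS.File.exists("/tmp/some-file.sqlite").then(function(exists) {
  console.log("File exists:", exists); // Runs later, once the result is available
});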

Coding for multi-threading

In the Firefox web browser, threads are materialized as instances of nsIThread in C++ code and as instances of ChromeWorker in JavaScript code. For this discussion, I will concentrate on JavaScript code, as refactoring C++ code is, well, complicated. Side-note: if you are new here, recall that Chrome Workers have nothing to do with the Chrome Web Browser and everything to do with the Mozilla Chrome, i.e. the parts of Gecko and Firefox written in JavaScript.

Chrome Workers are an extension of Web Workers, and have the same semantics, plus a few additions. Instantiating a ChromeWorker requires a source file:

let worker = new ChromeWorker("resource://path/to/my_file.js");

We may send messages to and from a Chrome Worker

// In the parent
worker.postMessage(someValue);

// In the worker
self.postMessage(someValue);

and, of course, receive messages

// In the parent
worker.addEventListener("message", function(msg) {
  // A copy of the message appears in msg.data
});

// In the worker
self.addEventListener("message", function(msg) {
  // A copy of the message appears in msg.data
});

In either case, the contents of the message are copied between threads, with essentially the same semantics as JSON.stringify/JSON.parse. If necessary, binary data in messages (an ArrayBuffer or the upcoming Typed Objects) can be transferred instead of copied, which is faster.
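For instance, here is a sketch of transferring an ArrayBuffer rather than copying it (the message shape is illustrative):

// Transfer the buffer: ownership moves to the worker, no copy is made.
let buffer = new ArrayBuffer(16 * 1024 * 1024);
worker.postMessage({ payload: buffer }, [buffer]);
console.log(buffer.byteLength); // 0: the buffer has been neutered on this side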

Like Web Workers, Chrome Workers are very good at performing computations. In addition, they have access to a number of low-level libraries for system features. Such libraries can be loaded with the chrome worker module loader:

let MyModule = require("resource://...");

Further modules can be defined for consumption with the chrome worker module loader:

module.exports = {
  foo: // ...
};
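Putting the two together, a worker file might look like the following sketch (the require.js and module URLs are assumptions, adjust them to your setup):

// my_file.js - a sketch of a worker that loads and uses a module.
importScripts("resource://gre/modules/workers/require.js"); // makes |require| available

let MyModule = require("resource://path/to/my_module.js");

self.addEventListener("message", function(msg) {
  // Delegate the actual work to the module, reply with the result.
  self.postMessage(MyModule.foo(msg.data));
});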

Finally, they can call into C code using the js-ctypes foreign function interface:

let lib = ctypes.open("path/to/my_lib");
let fun = lib.declare("myFunction", ctypes.default_abi, ctypes.void_t); // void myFunction()
fun(); // Call into C
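As a slightly more complete sketch, here is how one might declare and call a C function that takes an argument, assuming a Linux-style libc name (the exact library name differs per platform):

let libc = ctypes.open("libc.so.6"); // platform-specific; see OS.Constants.Path below
let abs = libc.declare("abs",
                       ctypes.default_abi,
                       ctypes.int,   // return type: int
                       ctypes.int);  // argument type: int
console.log(abs(-42));              // 42
libc.close();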

The module loader and js-ctypes make for a powerful combination that has been used to provide access to low-level libraries, including low-level file manipulation (module OS.File), phone communication (module RIL, short for Radio Interface Layer), file (de)compression, etc.

Limitations

Where multi-process is good at protecting a process against other processes, going multi-threaded works nicely to protect a process (a tab, the UI, etc.) against itself. Threads take up far fewer resources than processes and are also much faster to start and stop. However, they have very strict limitations.

The main limitation is that they do not have access to all the main thread APIs. Each API needs to be ported individually to Chrome Workers. Until recently, there was no way to define or load modules. At the moment, there is no way to read or write a compressed file from a Chrome Worker, or to access a database from a Chrome Worker. In most cases, this is only a question of time and manpower, and we can hope to eventually bring almost all important APIs to Chrome Workers. However, some APIs cannot be ported at all, in particular any API that requires a DOM window, which is most (fortunately not all) DOM APIs.

Also, the paradigm behind Chrome Workers is purely asynchronous. This means that there is no way for a Chrome Worker to wait synchronously until some operation has been completed by the main thread. This complicates code in a few cases, but in general it is rarely a problem.

Also, the communication mechanism needs to be taken into account, as copying large messages can block the main thread. In some cases, it may be necessary to aggressively optimize messages to avoid such situations.

Refactoring for multi-threading

The first thing to take into consideration while refactoring for multi-threading is whether this is the best strategy. Since most APIs and most customization possibilities live on the main thread, most features need to be produced and/or consumed by the main thread. This does not mean that going multi-threaded is not possible, only that your code will probably end up looking like an asynchronous API, meant to be used mostly on the main thread but implemented off the main thread. This also means that your consumers must be architected to accept an asynchronous API. We will cover making things asynchronous in another entry of this series.

Once we have decided to go multi-threaded, the next step is to determine what goes off the main thread. Generally, you want to move as much as you can off the main thread. The only limits are things that you simply cannot move off the main thread (e.g. access to the document), or cases in which the data you need to copy (not transfer) across threads would slow down the main thread unacceptably. This, of course, is something that can be determined only by benchmarking.

Next, you will need to define a communication protocol between the main thread and the worker. Threads communicate by sending pure data (i.e. objects without methods, without DOM nodes, etc.), and binary data can be transferred for high performance. Recall that communications are asynchronous, so if you want a thread to respond to another one, you will need to build identifiers into your protocol to match each reply to its request. This is not built-in, but quite easy to do. Handling errors requires a little finesse, as uncaught exceptions on the worker are transmitted to an onerror listener instead of the usual onmessage listener, and lose some information along the way.
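As an illustration, here is a minimal sketch of such a protocol on the main-thread side; the message fields (id, fun, args) and the use of Promise are purely illustrative, not an existing API:

let currentId = 0;
let pending = new Map(); // id -> { resolve, reject }

function callWorker(fun, args) {
  return new Promise(function(resolve, reject) {
    let id = ++currentId;
    pending.set(id, { resolve: resolve, reject: reject });
    worker.postMessage({ id: id, fun: fun, args: args });
  });
}

worker.addEventListener("message", function(msg) {
  // The worker echoes the id back, so we can find the matching request.
  let { id, result, error } = msg.data;
  let request = pending.get(id);
  pending.delete(id);
  error ? request.reject(error) : request.resolve(result);
});

worker.addEventListener("error", function(event) {
  // Uncaught exceptions in the worker land here, with limited information.
  console.error("Worker error:", event.message, event.filename, event.lineno);
});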

In some (hopefully rare) cases, you will need to add new bindings to native code, so as to call C functions (only C, not C++) from JavaScript. For this purpose, take a look at the documentation of js-ctypes, our JavaScript FFI, and osfile_shared_allthreads.jsm, a set of lightweight extensions to js-ctypes that handle a number of platform-specific gotchas. As finding the correct libraries to link is sometimes tricky, you should take advantage of OS.Constants.Path, that already lists some of them. Don’t hesitate to file bugs if you realize that something important is missing. Also, in a few (hopefully almost non-existent) cases, you will need to expose additional C code to native code, typically to expose some C++-only features. For this purpose, take a look at an example.

Unsurprisingly, the next step is to write the JS code. The usual caveats apply; just don’t forget to use the module system. Worker code goes into its own file, typically with extension “.js”. It is generally a good idea to mention “worker” in the name of the file, e.g. “foo_worker.js”, and to deploy your code to "resource://.../worker/..." or "chrome://.../worker/..." to avoid ambiguities. To construct the worker, it is then sufficient to call new ChromeWorker("resource://path/to/your/file.js"). The worker code will be started lazily when the first message is sent.
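For reference, the worker side of the protocol sketched above could look like this hypothetical skeleton:

// foo_worker.js - the worker side of the illustrative request/reply protocol.
let handlers = {
  // One entry per long-running operation, e.g.
  // computeStuff: function(input) { /* heavy work */ return someResult; }
};

self.addEventListener("message", function(msg) {
  let { id, fun, args } = msg.data;
  try {
    let result = handlers[fun].apply(null, args);
    self.postMessage({ id: id, result: result });
  } catch (ex) {
    // Reply with an explicit error rather than relying on onerror alone.
    self.postMessage({ id: id, error: ex.message });
  }
});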

For automated testing, you can for instance use mochitest-chrome or (once bug 930924 has landed) xpcshell-tests. In the latter, if you need to add new worker code for the sake of testing, you should install it with the chrome:// protocol. Also, for any testing, don’t forget to look at your system console, as worker errors are displayed on that console by default.

That’s it! In a future blog entry, I will write more about common patterns for writing or refactoring asynchronous code, which comes in very handy for code that uses your new API.

Contributing

Refactoring Firefox as a set of asynchronous APIs backed by off-main-thread implementations is a considerable task. To make it happen, the best way is to contribute to coding, testing or documentation.

Chrome Workers, now with modules!

July 17, 2013

One of the main objectives of Project Async is to encourage Firefox developers and Firefox add-on developers to use Chrome Workers to ensure that whatever they do doesn’t block Firefox’s UI thread. The main obstacle, for the moment, is that Chrome Workers have access to very few features, so one of the tasks of Project Async is to add features to Chrome Workers.

Today, let me introduce the Module Loader for Chrome Workers.

« Read the rest of this entry »

Announcing Project Async & Responsive

April 10, 2013

tl;dr

Project Snappy has been retired and replaced by several smaller projects, including Async & Responsive. The objective of this project is to improve the responsiveness of Firefox and the Mozilla Platform by converting key components to make them asynchronous and, wherever possible, to move them off the main thread.

The setting

Firefox and other Mozilla applications are great products, in particular in terms of performance. They are based on an extremely fast rendering engine, Gecko, and its companion JavaScript engine, which, in addition to being the richest JS engine around, is also, these days, quite possibly the fastest. What is not so great, unfortunately, is that despite this great core performance, Mozilla applications have often been perceived as slow and sluggish.

Project Snappy was formed about 18 months ago to focus the efforts of Mozilla developers on fighting this perceived sluggishness. During this period, we have made tremendous progress, thanks to the commitment of everyone involved. Indeed, most of the long-term objectives of Snappy have already been reached. We have therefore decided to retire Project Snappy in favor of a larger Performance project and several sub-projects focusing on distinct aspects of performance.

Let me introduce Asynchronous & Responsive [1], one of the sub-projects of Performance.

Project outline

Despite considerable progress, much of Firefox still behaves as a single-threaded application. Most services and components are initialized sequentially on the main thread, run on the main thread, and are shut down sequentially on the main thread. Also, most add-ons execute essentially on the main thread. As a consequence, any long-lived task can disrupt the user experience.

There are historical reasons for this but, in most cases, there is no deep blocker that would prevent us from rewriting these services. Project Asynchronous & Responsive is now starting to support and focus the ongoing effort to get rid of main-thread services and components, both in platform code and in add-on code, for the betterment of all Mozillakind.

This entails:

  • identifying blockers that prevent platform and add-on developers from deploying their code on non-main threads (generally, worker threads);
  • helping platform and add-on developers transition their code off-main thread;
  • actually transitioning some of our services and components off the main thread.

Please note that we have no intention of working on the JavaScript VM, on DOM or Graphics. These teams already have dedicated developers working on moving things off the main thread.

Following our progress

As I am the tech lead of this project, you will find more information on this blog, under category Performance.

I will try and post updates every second week.

[1] If you have an idea of a nicer name that does not sound too much like “Snappy”, we are interested 🙂 Marxist jokes about Workers might or might not be accepted.

Tales of Science-Fiction Bugs: The Thing That Killed Talos

November 12, 2012

Have you ever encountered one of these bugs? One in which every single line of your code is correct, in which every type-check passes, every single unit test succeeds, the specifications are fulfilled, but somehow, for no reason that can be explained rationally, it just does not work? I call them Science-Fiction Bugs. I am sure that you have met some of them. For some reason, the Mozilla Performance Team seems to stumble upon such bugs rather often, perhaps because we spend so much time refactoring other teams’ code long after the original authors have moved on to other features, and combining their code with undertested libraries and technologies. Truly, this is life on the Frontier.

Today, I would like to tell you the tale of one of these Science-Fiction Bugs: The Thing That Killed Talos.

« Read the rest of this entry »
