Firefox, the Browser that has your Back[up]

June 26, 2014 § 19 Comments

One of the most important features of Firefox, in my opinion, is Session Restore. This component is responsible for ensuring that, even in case of a crash, or if you upgrade your browser or an add-on that requires a restart, your browser can reopen immediately and in the state in which you left it. As far as I am concerned, this feature is a life-saver.

Unfortunately, there are a few situations in which the Session Restore file may be corrupted: typically, if the computer is rebooted before the write is complete, or if it loses power, or if the operating system crashes or the disk is disconnected, we may end up losing our precious Session Restore. While each of these circumstances is rare, its frequency needs to be plugged into the following formula:

seldom · .5 billion users = a lot

I am excited to announce that we have just landed a new and improved Session Restore component in Firefox 33 that protects your precious data better than ever.

How it works

Firefox needs Session Restore to handle the following situations:

  • restarting Firefox without data loss after a crash of either Firefox, the Operating System, a driver or the hardware, or after Firefox has been killed by the Operating System during shutdown;
  • restarting Firefox without data loss after Firefox has been restarted due to an add-on or an upgrade;
  • quitting Firefox and, later, restarting without data loss.

In order to handle all of this, Firefox needs to take a snapshot of the state of the browser whenever anything happens, whether the user browses, fills a form, scrolls, or an application sets a Session Cookie, Session Storage, etc. (this is actually capped to one save every 15 seconds, to avoid overloading the computer). In addition, Firefox performs a clean save during shutdown.
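The capping described above can be pictured as a small throttle. This is only a sketch under assumptions: the class `SaveThrottle`, its methods and the injected `now` clock are hypothetical names chosen for illustration and are not taken from the Session Restore code.

```javascript
// Hypothetical sketch: coalesce state changes into at most one save per interval.
const SAVE_INTERVAL_MS = 15000; // the 15-second cap mentioned above

class SaveThrottle {
  // `save` performs the actual write; `now` returns the current time in ms
  // (injected so that the behavior can be checked without waiting).
  constructor(save, now = () => Date.now()) {
    this.save = save;
    this.now = now;
    this.lastSave = -Infinity;
    this.dirty = false;
  }

  // Called whenever anything changes (navigation, form input, scroll, ...).
  notifyChange() {
    this.dirty = true;
    return this.maybeSave();
  }

  // Writes only if something changed and the interval has elapsed.
  maybeSave() {
    if (!this.dirty || this.now() - this.lastSave < SAVE_INTERVAL_MS) {
      return false;
    }
    this.dirty = false;
    this.lastSave = this.now();
    this.save();
    return true;
  }

  // Called during shutdown: one final save, regardless of the interval.
  flush() {
    this.dirty = false;
    this.lastSave = this.now();
    this.save();
  }
}
```

A real implementation would also arm a timer so that a pending change gets written once the interval has elapsed; in this sketch, `maybeSave` simply has to be called again.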

While at the level of the application, the write mechanism itself is simple and robust, a number of things beyond the control of the developer can prevent either the Operating System or the hard drive itself from completing this write consistently – a typical example being tripping on the power plug of a desktop computer during the write.

The new mechanism involves two parts:

  • keeping smart backups to maximize the chances that at least one copy will be readable;
  • making use of the available backups to transparently avoid or minimize data loss.

The implementation actually takes very few lines of code, the key being to know the risks against which we defend.

Keeping backups

During runtime, Firefox remembers which files are known to be valid backups and which files should be discarded. Whenever a user interaction or a script requires it, Firefox writes the contents of Session Restore to a file called sessionstore-backups/recovery.js. If it is known to be good, the previous version of sessionstore-backups/recovery.js is first moved to sessionstore-backups/recovery.bak. In most cases, both files are valid and recovery.js contains a state less than 15 seconds old, while recovery.bak contains a state less than 30 seconds old. Additionally, the writes on both files are separated by at least 15 seconds. In most circumstances, this is sufficient to ensure that, even in case of a hard drive crash during a write to recovery.js, at least recovery.bak has been entirely written to disk.
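The rotation between recovery.js and recovery.bak can be sketched as follows. This is an illustration, not the Firefox implementation: the class `RecoveryWriter` is hypothetical, and a plain `Map` stands in for the profile directory, where the real code performs asynchronous file writes.

```javascript
// Hypothetical sketch of the recovery.js / recovery.bak rotation.
// `disk` is a Map from file name to contents, standing in for the profile directory.
const RECOVERY = "sessionstore-backups/recovery.js";
const RECOVERY_BACKUP = "sessionstore-backups/recovery.bak";

class RecoveryWriter {
  constructor(disk) {
    this.disk = disk;
    // True only once we know that the current recovery.js was written entirely.
    this.recoveryIsGood = false;
  }

  write(state) {
    if (this.recoveryIsGood && this.disk.has(RECOVERY)) {
      // Keep the previous known-good state as the backup.
      this.disk.set(RECOVERY_BACKUP, this.disk.get(RECOVERY));
      this.disk.delete(RECOVERY);
    }
    this.recoveryIsGood = false; // a write is now in flight
    this.disk.set(RECOVERY, JSON.stringify(state));
    this.recoveryIsGood = true; // the write has returned
  }
}
```

The point of the `recoveryIsGood` flag is that a file of unknown quality never overwrites a backup that is known to be valid.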

During shutdown, Firefox writes a clean startup file to sessionstore.js. In most cases, this file is valid and contains the exact state of Firefox at the time of shutdown (minus some privacy filters). During startup, if sessionstore.js is valid, Firefox moves it to sessionstore-backups/previous.js. Whenever this file exists, it is valid and contains the exact state of Firefox at the time of the latest clean shutdown/startup. Note that, in case of crash, the latest clean shutdown/startup might be older than the latest actual startup, but this backup is useful nevertheless.

Finally, on the first startup after an update, Firefox copies sessionstore.js, if it is available and valid, to sessionstore-backups/upgrade.js-[build id]. This mechanism is designed primarily for testers of Firefox Nightly, who keep on the very edge, upgrading Firefox every day to check for bugs. Testers, if we introduce a bug that affects Session Restore, this can save your life.

As a side-note, we never use the operating system’s flush call, as 1/ it does not provide the guarantees that most developers expect; 2/ on most operating systems, it causes catastrophic slowdowns.

Recovering

All in all, Session Restore may contain the following files:

  • sessionstore.js (contains the state of Firefox during the latest shutdown – this file is absent in case of crash);
  • sessionstore-backups/recovery.js (contains the state of Firefox ≤ 15 seconds before the latest shutdown or crash – the file is absent in case of clean shutdown, if privacy settings instruct us to wipe it during shutdown, and after the write to sessionstore.js has returned);
  • sessionstore-backups/recovery.bak (contains the state of Firefox ≤ 30 seconds before the latest shutdown or crash – the file is absent in case of clean shutdown, if privacy settings instruct us to wipe it during shutdown, and after the removal of sessionstore-backups/recovery.js has returned);
  • sessionstore-backups/previous.js (contains the state of Firefox during the previous successful shutdown);
  • sessionstore-backups/upgrade.js-[build id] (contains the state of Firefox after your latest upgrade).

All these files use the JSON format. While this format has drawbacks, it has two huge advantages in this setting:

  • it is quite human-readable, which makes it easy to recover manually in case of an extreme crash;
  • its syntax is quite rigid, which makes it easy to find out whether it was written incompletely.

As our main threat is a crash that prevents us from writing the file entirely, we take advantage of the latter quality to determine whether a file is valid. Based on this, we test each file in the order indicated above, until we find one that is valid. We then proceed to restore it.
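The selection described above can be sketched in a few lines. The function names `parseIfValid` and `pickSession`, and the `readFile` callback, are assumptions made for this illustration and do not come from the Firefox source.

```javascript
// Order in which candidate files are tried, as listed above.
// upgrade.js-[build id] would come last; its exact name depends on the build,
// so it is left out of this default list.
const RESTORE_ORDER = [
  "sessionstore.js",
  "sessionstore-backups/recovery.js",
  "sessionstore-backups/recovery.bak",
  "sessionstore-backups/previous.js",
];

// A file counts as valid if it parses as JSON and yields an object.
// A write that was interrupted leaves truncated JSON, which fails to parse.
function parseIfValid(text) {
  try {
    const state = JSON.parse(text);
    return state !== null && typeof state === "object" ? state : null;
  } catch (ex) {
    return null;
  }
}

// `readFile(name)` returns the contents of the file, or null/undefined if absent.
function pickSession(readFile, order = RESTORE_ORDER) {
  for (const name of order) {
    const text = readFile(name);
    if (text == null) {
      continue; // absent file: try the next candidate
    }
    const state = parseIfValid(text);
    if (state !== null) {
      return { source: name, state };
    }
  }
  return null; // nothing usable
}
```

For instance, with a truncated sessionstore.js and an intact recovery.js, `pickSession` skips the first file and returns the state found in the second.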

If Firefox was shut down cleanly:

  1. In most cases, sessionstore.js is valid;
  2. In most cases in which sessionstore.js is invalid, sessionstore-backups/recovery.js is still present and valid (the likelihood of it being present is obviously higher if privacy settings do not instruct Firefox to remove it during shutdown);
  3. In most cases in which sessionstore-backups/recovery.js is invalid, sessionstore-backups/recovery.bak is still present, with an even higher likelihood of being valid (the likelihood of it being present is obviously higher if privacy settings do not instruct Firefox to remove it during shutdown);
  4. In most cases in which the previous files are absent or invalid, sessionstore-backups/previous.js is still present, in which case it is always valid;
  5. In most cases in which the previous files are absent or invalid, sessionstore-backups/upgrade.js-[...] is still present, in which case it is always valid.

Similarly, if Firefox crashed or was killed:

  1. In most cases, sessionstore-backups/recovery.js is present and valid;
  2. In most cases in which sessionstore-backups/recovery.js is invalid, sessionstore-backups/recovery.bak is present, with an even higher likelihood of being valid;
  3. In most cases in which the previous files are absent or invalid, sessionstore-backups/previous.js is still present, in which case it is always valid;
  4. In most cases in which the previous files are absent or invalid, sessionstore-backups/upgrade.js-[...] is still present, in which case it is always valid.

Numbers crunching

Statistics collected on Firefox Nightly 32 suggest that, out of 11.95 million startups, 75,310 involved a corrupted sessionstore.js. That’s roughly one corrupted sessionstore.js every 158 startups, which is quite a lot. This may be influenced by the fact that users of Firefox Nightly live on pre-alpha, so are more likely to encounter crashes or Firefox bugs than regular users, and that some of them use add-ons that may modify sessionstore.js themselves.

With the new algorithm, assuming that the probability for each file to be corrupted is independent and is p = 1/158, the probability of losing more than 30 seconds of data after a crash goes down to p^3 ≅ 1 / 4,000,000. If we haven’t removed the recovery files, the probability of losing more than 30 seconds of data after a clean shutdown and restart goes down to p^4 ≅ 1 / 630,000,000. This still means that, statistically speaking, at every startup, there is one user of Firefox somewhere around the world who will lose more than 30 seconds of data, but this is better than the previous situation by several orders of magnitude.
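The arithmetic can be checked with a few lines of code. The helper `oneIn` is a name introduced here for illustration. With the rounded value p = 1/158, the results are 1 in 3,944,312 and 1 in 623,201,296; using the unrounded ratio of startups to corrupted files gives slightly larger numbers, in line with the rounded figures quoted above.

```javascript
// If each of `files` independent copies is corrupted with probability
// 1 / startupsPerCorruption, all copies are lost once every N startups.
// This returns N.
function oneIn(startupsPerCorruption, files) {
  let n = 1;
  for (let i = 0; i < files; i++) {
    n *= startupsPerCorruption;
  }
  return n;
}

const measured = 11950000 / 75310; // startups per corrupted sessionstore.js

console.log(Math.floor(measured)); // 158
console.log(oneIn(158, 3)); // 3944312
console.log(oneIn(158, 4)); // 623201296
console.log(Math.round(oneIn(measured, 3))); // unrounded ratio, crash case
console.log(Math.round(oneIn(measured, 4))); // unrounded ratio, clean shutdown case
```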

It is my hope that this new mechanism will transparently make your life better. Have fun with Firefox!

Revisiting uncaught asynchronous errors in the Mozilla Platform

May 30, 2014 § Leave a comment

Consider the following feature and its xpcshell test:

// In a module Foo
function doSomething() {
  // ...
  OS.File.writeAtomic("/an invalid path", "foo");
  // ...
}

// In the corresponding unit test
add_task(function*() {
  // ...
  Foo.doSomething();
  // ...
});

Function doSomething is obviously wrong, as it performs a write operation that cannot succeed. Until we started our work on uncaught asynchronous errors, the test passed without any warning. A few months ago, we managed to rework Promise to ensure that the test at least produced a warning. Now, this test will actually fail with the following message:

A promise chain failed to handle a rejection – Error during operation ‘write’ at …

This is particularly useful for tracking subsystems that completely forget to handle errors or tasks that forget to call yield.
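One way to get rid of such a failure is to hand the promise back to the caller, so that the rejection can be handled on purpose. The sketch below uses standard JavaScript promises and a hypothetical `writeAtomic` stand-in rather than OS.File.writeAtomic and Promise.jsm, so it only illustrates the pattern; `doSomethingSafely` is likewise a name invented for this example.

```javascript
// Hypothetical stand-in for OS.File.writeAtomic: rejects for one known-bad path.
function writeAtomic(path, contents) {
  if (path === "/an invalid path") {
    return Promise.reject(new Error("Error during operation 'write' at " + path));
  }
  return Promise.resolve(contents.length);
}

// Returning the promise lets callers observe the failure
// instead of silently dropping it.
function doSomething() {
  return writeAtomic("/an invalid path", "foo");
}

// The caller registers a rejection handler, so the rejection is no longer uncaught.
function doSomethingSafely() {
  return doSomething().then(
    () => "written",
    (error) => "failed: " + error.message
  );
}
```

In a test written with add_task, the equivalent would be to yield the promise returned by `doSomething` and either expect the error or fix the path.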

Who is affected?

This change does not affect the runtime behavior of applications, only test suites.

  • xpcshell: landed as part of bug 976205;
  • mochitest / devtools tests: waiting for all existing offending tests to be fixed, code is ready as part of bug 1016387;
  • add-on sdk: not started, bug 998277.

This change only affects the use of Promise.jsm. Support for DOM Promise is in bug 989960.

Details

We obtain a rejected Promise by:

  • throwing from inside a Task; or
  • throwing from a Promise handler; or
  • calling Promise.reject.

A rejection can be handled by any client of the rejected promise by registering a rejection handler. To complicate things, the rejection handler can be registered either before the rejection or after it.

In this series of patches, we cause a test failure if we end up with a Promise that is rejected and has no rejection handler either:

  • immediately after the Promise is garbage-collected;
  • at the end of the add_task during which the rejection took place;
  • at the end of the entire xpcshell test;

(whichever comes first).
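The bookkeeping behind this check can be pictured as follows. This is a simplified sketch and not the code of Promise.jsm: the class `RejectionTracker` and its methods are hypothetical, and the garbage-collection trigger is left out.

```javascript
// Hypothetical sketch: remember rejections until someone handles them.
class RejectionTracker {
  constructor() {
    this.unhandled = new Map(); // id -> rejection reason
    this.nextId = 0;
  }

  // A promise was rejected and has no rejection handler so far.
  noteRejected(reason) {
    const id = this.nextId++;
    this.unhandled.set(id, reason);
    return id;
  }

  // A rejection handler was registered, possibly after the rejection itself.
  noteHandled(id) {
    this.unhandled.delete(id);
  }

  // Called at the end of an add_task or of the whole test:
  // returns the leftover reasons and forgets them.
  flush() {
    const leftovers = Array.from(this.unhandled.values());
    this.unhandled.clear();
    return leftovers;
  }
}
```

A test harness built on such a tracker would report a failure for each reason returned by `flush`.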

Opting out

There are extremely few tests that should need to raise asynchronous errors and not catch them. So far, we have needed this in two tests: one that tests the asynchronous error mechanism itself and another one that willingly crashes subprocesses to ensure that Firefox remains stable.

You should not need to opt out of this mechanism. However, if you absolutely need to, we have a mechanism for opting out. For more details, see object Promise.Debugging in Promise.jsm.

Any question?

Feel free to contact either me or Paolo Amadio.

A curse and a blessing

April 7, 2014 § 38 Comments

The curse

When Brendan Eich stepped in as CEO, he and Mozilla immediately faced a storm demanding his resignation because of his political opinions. To the best of my knowledge, none of those responsible for the storm were employees of the Mozilla Corporation and only 4 or 5 of them were members of the Mozilla Community (they were part of the Mozilla Foundation, which is a different organization).

When Brendan Eich resigned from his position as an employee of Mozilla, Mozilla was immediately faced by a storm assuming that Brendan Eich had been fired, either because of his opinions or as a surrender to the first storm.

Both storms are still raging, fueled by angry (and dismayed and saddened) crowds and incompetent news reporting.

We will miss Brendan. We have suffered and we will continue suffering from these storms. But we can also salvage something from them.

The blessing

Think about it. We are being criticized by angry crowds. But the individuals who form these crowds are not our enemies. Many of them care deeply about Freedom of Speech and are shocked because they believe that we are extinguishing this freedom. Others care primarily about equality, an equality that can seldom be achieved wherever there is no Freedom of Speech.

Freedom of Speech. This is one of the core values of Mozilla, one of the values for which we have been fighting all these years.

We are being criticized by some of the people who need us most. They are our users, or our potential users, and they are getting in touch with us. Through Facebook, through Twitter, through the contribute form, through the governance mailing-list, through our blogs, or in real life discussions.

Some will say that we should ignore them. Some will be tempted to answer anger with anger and criticism with superiority.

Do neither. They are our users. They deserve to be heard.

We should listen to them. We should answer their concerns, not with FAQs or with press releases, but with individual answers, because these concerns are valid. We should explain what really happened. We should show them how Mozilla is largely about defending Freedom of Speech through the Open Web.

So please join the effort to answer the angry crowds. If you can, please reach out to media and the public and get the story out there. If only one person out of a hundred angry users receives the message and decides to join the community and the fight for the open web, we will have salvaged a victory out of the storm.

Wouldn’t it be nice?

April 2, 2014 § 2 Comments

Wouldn’t it be nice if Mozilla were a political party, with a single stance, a single state of mind and a single opinion?

Wouldn’t it be nice if people could decide to vote for or against Mozilla based on a single opinion of its leader?

But that’s not the case. We are Mozilla. We have thousands of different voices. We agree that users must be defended on the web. We fight for privacy and for freedom of speech and for education. On everything else, we might disagree, but that’s ok. We are Mozilla. We won’t let that stop us.

So please don’t ask us to exclude one of our own, no matter how much you disagree with his positions. We are Mozilla. We always disagree on most things that are not our mission. And we move forward, together.

Of course, if you want to change Mozilla, how we work and what we think, there is one way to do it. You can join us. Don’t worry, you don’t have to agree with us on much.

Shutting down things asynchronously

February 14, 2014 § Leave a comment

This blog entry is part of the Making Firefox Feel As Fast As Its Benchmarks series. The fourth entry of the series was growing much too long for a single blog post, so I have decided to cut it into bite-size entries.

A long time ago, Firefox was completely synchronous. One operation started, then finished, and then we proceeded to the next operation. However, this model didn’t scale up to today’s needs in terms of performance and performance perception, so we set out to rewrite the code and make it asynchronous wherever it matters. These days, many things in Firefox are asynchronous. Many services get started concurrently during startup or afterwards. Most disk writes are entrusted to an IO thread that performs and finishes them in the background, without having to stop the rest of Firefox.

Needless to say, this raises all sorts of interesting issues. For instance: « how do I make sure that Firefox will not quit before it has finished writing my files? » In this blog entry, I will discuss this issue and, more generally, the AsyncShutdown mechanism, designed to implement shutdown dependencies for asynchronous services.

Copying streams asynchronously

October 18, 2013 § Leave a comment

In the Mozilla Platform, I/O is largely about streams. Copying streams is a rather common activity, e.g. for the purpose of downloading files, decompressing archives, saving decoded images, etc. As usual, doing any I/O on the main thread is a very bad idea, so the recommended manner of copying streams is to use one of the asynchronous stream copy APIs provided by the platform: NS_AsyncCopy (in C++) and NetUtil.asyncCopy (in JavaScript). I have recently audited both to ascertain whether they accidentally cause main thread I/O and here are the results of my investigations.

In C++

What NS_AsyncCopy does

NS_AsyncCopy is a well-designed (if a little complex) API. It copies the full contents of an input stream into an output stream, then closes both. NS_AsyncCopy can be called with both synchronous and asynchronous streams. By default, all operations take place off the main thread, which is exactly what is needed.

In particular, even when used with the dreaded Safe File Output Stream, NS_AsyncCopy will perform every piece of I/O out of the main thread.

The default setting of reading data by chunks of 4kb might not be appropriate to all data, as it may cause too much I/O, in particular if you are reading a small file. There is no obvious way for clients to detect the right setting without causing file I/O, so it might be a good idea to eventually extend NS_AsyncCopy to autodetect the “right” chunk size for simple cases.

Bottom line: NS_AsyncCopy is not perfect but it is quite good and it does not cause main thread I/O.

Limitations

NS_AsyncCopy will, of course, not remove main thread I/O that takes place externally. If you open a stream from the main thread, this can cause main thread I/O. In particular, file streams should really be opened with the DEFER_OPEN flag. Other streams, such as nsIJARInputStream, do not support any form of deferred opening (bug 928329), and will cause main thread I/O when they are opened.

While NS_AsyncCopy does only off main thread I/O, using a Safe File Output Stream will cause a Flush. The Flush operation is very expensive for the whole system, even when executed off the main thread. For this reason, Safe File Output Stream is generally not the right choice of output stream (bug 928321).

Finally, if you only want to copy a file, prefer OS.File.copy (if you can call JS). This function is simpler, entirely off main thread, and supports OS-specific accelerations.

In JavaScript

What NetUtil.asyncCopy does

NetUtil.asyncCopy is a utility method that lets JS clients call NS_AsyncCopy. Theoretically, it should have the same behavior. However, some oddities make its performance lower.

As NS_AsyncCopy requires one of its streams to be buffered, NetUtil.asyncCopy calls nsIIOUtil::inputStreamIsBuffered and nsIIOUtil::outputStreamIsBuffered. These methods detect whether a stream is buffered by attempting to perform buffered I/O. Whenever they succeed, this causes main thread I/O (bug 928340).

Limitations

Generally speaking, NetUtil.asyncCopy has the same limitations as NS_AsyncCopy. In particular, in any case in which you can replace NetUtil.asyncCopy with OS.File.copy, you should pick the latter, which is both simpler and faster.

Also, NetUtil.asyncCopy cannot read directly from a Zip file (bug 927366).

Finally, NetUtil.asyncCopy does not fit the “modern” way of writing asynchronous code on the Mozilla Platform (bug 922298).

Helping out

We need to fix a few bugs to improve the performance of asynchronous copy. If you wish to help, please do not hesitate to pick any of the bugs listed above and get in touch with me.

Making Firefox Feel as Fast as its Benchmarks – Part 2 – Towards multi-process

October 9, 2013 § 2 Comments

As we saw in the first post of this series, our browser behaves as follows:

function browser() {
  while (true) {
    handleEvents();  // Let's make this faster
    updateDisplay();
  }
}

As we discussed, the key to making the browser smooth is to make handleEvents() do less. One way of doing this is to go multi-process.

Going multi-process

Chrome is multi-process. Internet Explorer 4 was multi-process and so is Internet Explorer 8+ (don’t ask me where IE 5, 6, 7 went). Firefox OS is multi-process too, and Firefox for Android used to be multi-process until we canceled this approach due to Android-specific issues. For the moment, Firefox Desktop is only slightly multi-process, although we are heading further in this direction with project electrolysis (e10s, for short).

In a classical browser (i.e. not FirefoxOS, not the Firefox Web Runtime), going multi-process means running the user interface and system-style code in one process (the “parent process”) and running code specific to the web or to a single tab in another process (the “child process”). Whether all tabs share a process or each tab is a separate process, or even each iframe is a process, is an additional design choice that, I believe, is still open to discussion in Firefox. In FirefoxOS and in the Firefox Web Runtime (part of Firefox Desktop and Firefox for Android), going multi-process generally means one process per (web) application.

Since code is separated between processes, each handleEvents() has less to do and will therefore, at least in theory, execute faster. Additionally, this is better for security, insofar as a compromised web-specific process affords an attacker less control than compromising the full process. Finally, this gives the possibility to crash a child process if necessary, without having to crash the whole browser.

Coding for multi-process

In the Firefox web browser, the multi-process architecture is called e10s and looks as follows:

function parent() {
  while (true) {
    handleEvents();  // Some of your code here
    updateDisplay(); // Just the ui
  }
}
function child() {
  while (true) {
    handleEvents();  // Some of your code here
    updateDisplay(); // Just some web
  }
}

parent() ||| child()

The parent process and the child process are not totally independent. Very often, they need to communicate. For instance, when the user browses in a tab, the child needs to let the parent know, so that the parent can update the history menu it displays. Similarly, every few seconds, Firefox saves its state to provide quick recovery in case of crash – the parent asks each tab for its information and, once all replies have arrived, gathers them into one data structure and saves them all.

For this purpose, the parent and the child can send messages to each other through the Message Manager. A Message Manager can let a child process communicate with a single parent process and can let a parent process communicate with one or more child processes:

// Sender-side
messageManager.sendAsyncMessage("MessageTopic", {data})

// Receiver-side
messageManager.addMessageListener("MessageTopic", this);
// ...
receiveMessage: function(message) {
  switch (message.name) {
  case "MessageTopic":
    // do something with message.data
    // ...
    break;
  }
}

Additionally, code executed in the parent process can inject code in the child process using the Message Manager, as follows:

messageManager.loadFrameScript("resource://path/to/script.js", true);

Once injected, the code behaves as any (privileged) code in the child process.

As you may see, communications are purely asynchronous – we do not wish the Message Manager to stop a process and wait until another process is done with its tasks, as this would totally defeat the purpose of multi-processing. There is an exception, called the Cross Process Object Wrapper, which I am not going to cover, as this mechanism is meant to be used only during a transition phase.

Limitations

It is tempting to see multi-process architecture as a silver bullet that semi-magically makes Firefox (or its competitors) fast and smooth. There are, however, quite clear limitations to the model.

Firstly, going multi-process has a cost. As demonstrated by Chrome, each process consumes lots of memory. Each process needs to load its libraries and its JavaScript VM, each script must be JIT-ed for each process, each process needs its communication channel towards the GPU, etc. Optimizing this is possible, as demonstrated by FirefoxOS (which runs nicely with 256 MB) but is a challenge.

Similarly, starting a multi-process browser can be much slower than starting a single-process browser. Between the instant the user launches the browser and the instant it becomes actually usable, many things need to happen: launching the parent process, which in turn launches the children processes, setting up the communication channels, JIT compiling all the scripts that need compilation, etc. The same cost appears when shutting down the processes.

Also, using several processes brings about a risk of contention on resources. Two processes may need to access the disk cache at the same time, or the cookies, or the session storage, or the GPU or the audio device. All of this needs to be managed carefully and can, in certain cases, slow down considerably both processes.

Also, some APIs are synchronous by specifications. If, for some reason, a child process needs to access the DOM of another child process – as may happen in the case of iframes – both child processes need to become synchronous. During the operation, they both behave as a single process, with just extremely slower DOM operations.

And finally, going multi-process will of course not make a tab magically responsive if this tab itself is the source of the slowdown – in other words, multi-process is not very useful for games.

Refactoring for multi-process

Many APIs, both in Firefox itself and in add-ons, are not e10s-compliant yet. The task of refactoring Firefox APIs into something e10s-compliant is in progress and can be followed here. Let’s see what needs to be done to refactor an API for multi-process.

Firstly, this does not apply to all APIs. APIs that access web content for non-web content need to be converted to e10s-style – an example is Page Info, which needs to access web content (the list of links and images from that page) for the purpose of non-web content (the Page Info button and dialog). As multi-process communication is asynchronous, this means that such APIs must be asynchronous already or must be made asynchronous if they are not, and that code that calls these APIs needs to be made asynchronous if it is not asynchronous already, which in itself is already a major task. We will cover making things asynchronous in another entry of this series.

Once we have decided to make an asynchronous API e10s-compliant, the following step is to determine which part of the implementation needs to reside in a child process and which part in the parent process. Typically, anything that touches the web content must reside in the child process. As a rule of thumb, we generally consider that the parent process is more performance-critical than children-processes, so if you have code that could reside either in the child process or in the parent process, and if placing that code in the child process will not cause duplication of work or very expensive communications, it is a good idea to move it to the child process. This is, of course, a rule of thumb, and nothing replaces testing and benchmarking.

The next step is to define a communication protocol. Messages exchanged between the parent process and child processes all need a name. If you are working on feature Foo, by convention, the name of your messages should start with “Foo:”. Recall that message sending is asynchronous, so if you need a message to receive an answer, you will need two messages: one for sending the request (“Foo:GetState”) and one for replying once the operation is complete (“Foo:State”). Messages can carry arbitrary data in the form of a JavaScript structure (i.e. any object that can be converted to/from JSON without loss). If necessary, these structures may be used to attach unique identifiers to messages, so as to easily match a reply to its request – this feature is not built into the message manager but may easily be implemented on top of it. Also, do not forget to take into account communication timeouts – recall that a process may fail to reply because it has crashed or been killed for any reason.
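Such a request/reply protocol, with identifiers and a timeout, could be sketched as follows. Everything here is an assumption made for illustration: `FakeChannel` is an in-memory stand-in that only imitates the sendAsyncMessage/addMessageListener shape of the Message Manager, and `FooClient` and `installFooResponder` are hypothetical names.

```javascript
// Hypothetical in-memory channel: delivers messages asynchronously to listeners.
class FakeChannel {
  constructor() {
    this.listeners = new Map(); // message name -> array of listeners
  }
  addMessageListener(name, listener) {
    if (!this.listeners.has(name)) {
      this.listeners.set(name, []);
    }
    this.listeners.get(name).push(listener);
  }
  sendAsyncMessage(name, data) {
    // Round-trip through JSON: only JSON-compatible data crosses the channel.
    const copy = JSON.parse(JSON.stringify(data));
    setTimeout(() => {
      for (const listener of this.listeners.get(name) || []) {
        listener.receiveMessage({ name, data: copy });
      }
    }, 0);
  }
}

// Parent side: sends "Foo:GetState" and matches "Foo:State" replies by id.
class FooClient {
  constructor(channel, timeoutMs = 1000) {
    this.channel = channel;
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // id -> { resolve, timer }
    this.nextId = 0;
    channel.addMessageListener("Foo:State", this);
  }
  getState() {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      // Give up if the other side never answers (e.g. it crashed).
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error("Foo:GetState timed out"));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, timer });
      this.channel.sendAsyncMessage("Foo:GetState", { id });
    });
  }
  receiveMessage(message) {
    const entry = this.pending.get(message.data.id);
    if (!entry) {
      return; // late or unknown reply: ignore it
    }
    clearTimeout(entry.timer);
    this.pending.delete(message.data.id);
    entry.resolve(message.data.state);
  }
}

// Child side: answers each request, echoing the id it received.
function installFooResponder(channel, getState) {
  channel.addMessageListener("Foo:GetState", {
    receiveMessage(message) {
      channel.sendAsyncMessage("Foo:State", {
        id: message.data.id,
        state: getState(),
      });
    },
  });
}
```

Echoing the identifier in the reply is what lets several requests be in flight at once without their answers getting mixed up.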

The last step is to actually write the code. Code executed by the parent process typically goes into some .js file loaded from XUL (e.g. browser.js) or a .jsm module, as usual. Code executed by a child process goes into its own file, typically a .js, and must be injected into the child process during initialization by using window.messageManager.loadFrameScript (to inject in all child processes) or browser.messageManager.loadFrameScript (to inject in a specific child process).

That’s it! In a future blog entry, I will write more about common patterns for writing or refactoring asynchronous code, which comes in very handy for code that uses your new API.

Contributing to e10s

The ongoing e10s refactoring of Firefox is a considerable task. To make it happen, the best way is to contribute to coding or testing.

What’s next?

In the next blog entry, I will demonstrate how to make front-end and add-on code multi-threaded.

Project Async & Responsive, issue 4

July 4, 2013 § 4 Comments

It is told that, on the far side of the Sea of Ocean, in a mythical city sitting on a Great Lake, the Fellowship finally met. They saw face-to-face, reviewed each other’s quests, prepared for future adventures and ate perhaps a little too much.

Storage: Support for mozIStorageAsyncConnection (bug 702559) has finally landed, thus bringing type-safe off-main thread storage and adding off-main thread opening of storage. The never-ending work to convert synchronous database clients to asynchronous storage proceeds, including inline autocomplete (bug 791776), async annotations (bug 699844), async bookmarks backups (https://etherpad.mozilla.org/places-backups-changes).
Other: We have improved the startup speed for Session Restore (bug 887780), added a backup feature during upgrade for session data (bug 876168) and we are still progressing on cleaning up and making Session Restore asynchronous. Finally, a battle-tested version of the module loader for workers is now nearly ready to land, along with a refactored version of the core of OS.File (bug 888479).

Alors comme ça, votre projet a besoin de contributeurs ?

June 21, 2013 § Leave a comment

How to disappoint a contributor

Once upon a time, there was a free software project (or, for that matter, a volunteer-run project). One day, a stranger showed up and announced that they wanted to help. This was good news, as the project badly needed more contributors. Unfortunately, after a few days, the stranger disappeared, because they could not find a way to help.

It is a rather sad story. It may sound familiar, whether you were the existing contributor or the stranger who wanted to contribute. Sadly, this story is common in projects that are trying to grow. Let us see what we could do to change the ending of the tale.

Once upon a time, there was a free software project (or, for that matter, a volunteer-run project). One day, a stranger showed up and announced that they wanted to help. This was good news, as the project badly needed more contributors. The existing contributors had in fact prepared documents to guide newcomers and were ready to answer the stranger's questions. The stranger followed the contribution guide. Having followed it, the stranger looked for something to contribute to. Unfortunately, after a few days or a few weeks, the stranger had not found a way to help and disappeared.

Despite all the efforts of the existing contributors, the story is just as sad. So what can we do to reach a happy ending?

Once upon a time, there was a free software project (or, for that matter, a volunteer-run project). As in all projects of this kind, there was an enormous amount to do and not enough contributors to do it all. The contributors had made a habit of writing down, on an easily accessible task list, everything they had not yet had time to complete. Some of these tasks were suitable for newcomers. With future contributors in mind, the existing contributors therefore made sure that these suitable tasks were easy to find and that any newcomer could easily contact the person who had added the task to the list, to ask for advice. In addition, the existing contributors had prepared documents to guide newcomers and were ready to answer the stranger's questions. The stranger followed the contribution guide, which led to suitable tasks. The stranger found an interesting task and a mentor to help them get started. They lived happily ever after and had many contributions.

The mentored bugs system

At Mozilla, we have been using the system I have just described for a few years, with impressive success. Every few days, on the projects I follow, new contributors show up, follow the tutorials, pick a task, and get to work immediately. Most of the time, they end up publishing their contributions and, a little later, becoming mentors themselves on other tasks.

Marking a task as mentored takes about two seconds.

  1. I have just opened a bug on Bugzilla and I realize that a beginner could certainly handle it with a little help;
  2. I add the information [mentor=Yoric] to the free-form field ("whiteboard") of the bug; from then on, newcomers can find this bug in our mentored bugs search engine;
  3. while I am at it, I add the information [lang=js][lang=c++] to the same free-form field; from then on, newcomers looking for bugs in either of the two technologies "JavaScript" or "C++" will see this bug;
  4. that is all; one of these days, a contributor may contact me to ask whether they can work on this bug.
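To make steps 2 and 3 concrete, here is a minimal sketch of how a search tool might read such whiteboard tags. The functions `parseWhiteboard` and `isMentoredFor` are hypothetical names introduced for illustration; they are not part of Bugzilla or of the actual mentored bugs search engine, whose implementation is not described here.

```javascript
// Hypothetical helper (not a Bugzilla API): extracts [key=value] tags
// from a free-form whiteboard string into a Map of key -> list of values.
function parseWhiteboard(whiteboard) {
  const tags = new Map();
  const pattern = /\[([a-z]+)=([^\]]+)\]/gi;
  for (const match of whiteboard.matchAll(pattern)) {
    const key = match[1].toLowerCase();
    const value = match[2].trim();
    if (!tags.has(key)) {
      tags.set(key, []);
    }
    tags.get(key).push(value);
  }
  return tags;
}

// Hypothetical filter: a bug is shown to a newcomer if it has a mentor
// and is tagged with the language the newcomer is interested in.
function isMentoredFor(whiteboard, lang) {
  const tags = parseWhiteboard(whiteboard);
  const langs = (tags.get("lang") || []).map((l) => l.toLowerCase());
  return tags.has("mentor") && langs.includes(lang.toLowerCase());
}
```

For instance, `isMentoredFor("[mentor=Yoric][lang=js][lang=c++]", "c++")` returns `true`, while a whiteboard without a `[mentor=...]` tag is never listed.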

Of course, this example uses Bugzilla and technical contributions, but it is fairly simple to extend the system to other task trackers and to purely non-technical tasks.

For a newcomer, getting started is also very simple:

  1. read our introductory document and follow the link to the mentored bugs search engine;
  2. choose some areas of interest and a bug;
  3. contact the mentor by mail or by irc.

Some steps could still be made smoother (the mentor's name and address are not always easy to find on screen, etc.), but this is in progress. We hope that the system can eventually be generalized to all Mozilla projects, technical or not.

So, if you take part in a project (Mozilla or otherwise) that does not use such a mentored bugs system and that is looking for contributors, I strongly encourage you to give it a try.

Project Async & Responsive, issue 1

April 26, 2013 § 3 Comments

In the previous episodes

Our intrepid heroes, a splinter cell from Snappy, have set out on a quest to offer alternatives to all JavaScript-accessible APIs that block the main thread of Firefox & Mozilla Apps.

Recently completed

Various cleanups on Session Restore (Bug 863227, 862442, 861409)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe, form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can be done only on the main thread, the jank caused by data collection alone is often noticeable (i.e. > 50 ms). We are in the process of refactoring Session Restore to make it both faster and more responsive. These bugs are steps towards optimizations.

Status Landed. More cleanups in progress.

Telemetry for Number of Threads (Bug 724368)

Summary As we make Gecko and add-ons more and more concurrent, we need to measure whether this concurrency can cause accidental internal Denial of Service. The objective of this bug is to measure the number of threads in use.

Status Landed. You can follow progression in histograms BACKGROUNDFILESAVER_THREAD_COUNT and SIMPLEMEASURES_MAXIMALNUMBEROFCONCURRENTTHREADS.

Reduce number of fsync() in Firefox Health Report (Bug 830492)

Summary Firefox Health Report stores its data using mozStorage. The objective of this bug is to reduce the number of expensive disk synchronizations performed by FHR.

Status Landed.

Ongoing bugs

Out Of Process Thumbnailing (Bug 841495)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily. This bug is about using another process to capture thumbnails of pages currently being visited. This can be useful for both security/privacy reasons (i.e. to ensure that bank account numbers do not show up in thumbnails) and responsiveness (i.e. to ensure that we never block browsing).

Status In progress.

Cutting Session Restore data collection into chunks (Bug 838577)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe, form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can be done only on the main thread, the jank caused by data collection alone is often noticeable (i.e. > 50 ms). This bug is about cutting data collection into smaller chunks, to remove that jank.

Status Working prototype.
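
As an illustration of the chunking idea only, here is a minimal sketch, not the prototype from Bug 838577. It assumes that the per-tab work can be isolated in a function; `collectTabState`, `collectInChunks` and `runChunked` are hypothetical names, and the tab objects are simplified placeholders.

```javascript
// Hypothetical stand-in for the per-tab work; real data collection
// covers windows, iframes, forms, etc.
function collectTabState(tab) {
  return { url: tab.url, scroll: tab.scroll };
}

// Collects state for `chunkSize` tabs, then yields so that the caller
// can hand control back to the event loop before the next chunk.
function* collectInChunks(tabs, chunkSize) {
  const states = [];
  for (let i = 0; i < tabs.length; i++) {
    states.push(collectTabState(tabs[i]));
    const chunkDone = (i + 1) % chunkSize === 0;
    if (chunkDone && i + 1 < tabs.length) {
      yield states.length; // Number of tabs processed so far.
    }
  }
  return states;
}

// Drives the generator one chunk at a time. `schedule` decides when the
// next chunk runs; by default, setTimeout lets pending events run first.
function runChunked(generator, onDone, schedule = (fn) => setTimeout(fn, 0)) {
  const step = () => {
    const { value, done } = generator.next();
    if (done) {
      onDone(value);
    } else {
      schedule(step);
    }
  };
  step();
}
```

With this shape, no single turn of the event loop processes more than `chunkSize` tabs, so the longest pause is bounded by the cost of one chunk rather than by the size of the whole session.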

Off Main Thread database storage (Bug 702559)

Summary We are in the process of moving as many uses of mozStorage as possible out of the main thread. While mozStorage already supports doing much I/O off the main thread, there is no clear way for developers to enforce this. This bug is about providing a subset of mozStorage that performs all I/O off the main thread and that will serve as target for both ongoing refactorings and future uses of mozStorage, in particular by add-ons.

Status Working prototype.

Improve transaction management in the JavaScript API for Off Main Thread database storage (Bug 856925)

Summary Sqlite.jsm is our JavaScript library for convenient Off Main Thread database storage. This bug is about improving how implicit transactions are handled by the library, hence improving performance.

Status In progress.

Refactor how Places data is backed up (Bugs 852040, 852041, 852034, 852032, 855638, 865643, 846636, 846635, 860625, 854927, 855190)

Summary Places is the database containing bookmarks, history, etc. Historically, Places was implemented purely on the main thread, which is something we very much want to remove, as any main thread I/O can block the user interface for arbitrarily lengthy durations. This set of bugs is part of the larger effort to get rid of Places main thread I/O. The objective here is to isolate and clean up Places backup, to later allow removing it entirely from the main thread.

Status Working prototype.

APIs for updating/reading Places Off Main Thread (Bugs 834539, 834545)

Summary These bugs are part of the effort to provide a clean API, usable by platform and add-ons, to access/modify Places information off the main thread.

Status In progress.

Move Form History to use Off Main Thread storage (Bug 566746)

Summary This bug is part of the larger effort to get rid of Places main thread I/O. The objective here is to move Form History I/O off the main thread.

Status In progress.

Make about:home use IndexedDB instead of LocalStorage (Bug 789348)

Summary Currently, page about:home uses localStorage to store some data. This is not good, as localStorage does blocking main thread I/O. This bug is about porting about:home to use indexedDB instead.

Status In progress.

Download Architecture Improvements (Bug 825588 and sub-bugs)

Summary Our Architecture for Downloads has grown organically for about 15 years. Part of it is executed on the main thread and synchronously. The objective of this meta-bug is to re-implement Downloads with a modern architecture, asynchronous, off the main thread, and accessible from JavaScript.

Status In progress.

Constant stack space Promise (Bug 810490)

Summary Much of our main thread asynchronous work uses Promises. The current implementation of Promise is very recursive and eats considerable amounts of stack. The objective here is to replace it with a new implementation of Promise that works in (almost) constant stack space.

Status Working partial prototype.
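
One general way to keep chains of callbacks from growing the stack is a trampoline: callbacks are placed in a queue and run from a single loop, instead of calling each other directly. The sketch below illustrates that general technique under simplifying assumptions; it is not the implementation from Bug 810490, it runs synchronously (which a real Promise must not do), and `makeTrampoline`, `countWithTrampoline` and `countRecursively` are hypothetical names.

```javascript
// Minimal trampoline: tasks are queued and run from a single loop, so a
// task that schedules another task never nests a new stack frame inside it.
function makeTrampoline() {
  const queue = [];
  let draining = false;
  return function enqueue(task) {
    queue.push(task);
    if (draining) {
      return; // The loop below will pick this task up.
    }
    draining = true;
    try {
      while (queue.length > 0) {
        queue.shift()();
      }
    } finally {
      draining = false;
    }
  };
}

// A chain of `n` dependent steps, each one scheduling the next.
// Stack depth stays constant regardless of `n`.
function countWithTrampoline(n) {
  const enqueue = makeTrampoline();
  let count = 0;
  const step = () => {
    count++;
    if (count < n) {
      enqueue(step);
    }
  };
  enqueue(step);
  return count;
}

// The same chain with direct calls: stack depth grows with `n`.
function countRecursively(n, count = 0) {
  return count + 1 < n ? countRecursively(n, count + 1) : count + 1;
}
```

For a long enough chain, `countRecursively` is likely to exhaust the stack on most engines, whereas `countWithTrampoline` only ever has a handful of frames alive at once.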

Reduce amount of I/O in session restore (Bug 833286)

Summary The algorithm used by Session Restore to back up its state is needlessly expensive. The objective of this bug is to replace it with an alternative implementation that requires much less I/O.

Status Working prototype.

Planning stage

Move session recording and activity monitoring to Gecko (Bug 841561)

Summary Firefox Health Report implements a sophisticated mechanism for determining whether a user is actively using the browser. This mechanism could be reimplemented in a more efficient and straightforward manner by moving it to Gecko itself. This is the objective of this bug.

Status Looking for developer.

Non-blocking Thumbnailing (Bug 744100)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily, as capturing a page requires repainting it into memory from the main thread. This bug is about removing the “repaint into memory” step completely and instead collaborating with the renderer to obtain a copy of the latest image rendered.

Status Design in progress.

Evaluate dropping “www.” from rev_host column (Bug 843357)

Summary The objective of this bug is to simplify parts of the Places database and its users by removing some data that appears unnecessary.

Status Evaluation in progress.
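
To give an idea of the data involved, here is a small sketch that assumes the rev_host column stores the host name reversed character by character, followed by a trailing dot. That format is an assumption of this example, and `reverseHost` and `stripWww` are hypothetical helpers, not Places functions.

```javascript
// Assumed rev_host format: the host reversed character by character,
// followed by a trailing dot.
function reverseHost(host) {
  return Array.from(host).reverse().join("") + ".";
}

// Drops a leading "www." if there is one.
function stripWww(host) {
  return host.startsWith("www.") ? host.slice(4) : host;
}

// With the prefix kept:   "www.mozilla.org" -> "gro.allizom.www."
// With the prefix dropped: "www.mozilla.org" -> "gro.allizom."
const withWww = reverseHost("www.mozilla.org");
const withoutWww = reverseHost(stripWww("www.mozilla.org"));
```

Under that assumption, dropping the prefix makes "www.mozilla.org" and "mozilla.org" map to the same value, which is the kind of simplification this bug evaluates.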

Optimize worker communication for large messages (Bug 852187)

Summary We sometimes need to communicate very large messages between the main thread and workers. For large enough messages, the communication itself ends up being very expensive for the main thread. This bug is about optimizing such communications.

Status Design in progress.
