Firefox, the Browser that has your Back[up]

June 26, 2014 § 21 Comments

One of the most important features of Firefox, in my opinion, is Session Restore. This component is responsible for ensuring that, even in case of crash, or if you upgrade your browser or an add-on that requires restart, your browser can reopen immediately and in the state in which you left it. As far as I am concerned, this feature is a life-safer.

Unfortunately, there are a few situations in which the Session Restore file may be corrupted – typically, if the computer is rebooted before the write is complete, or if it loses power, or if the operating system crashes or the disk is disconnected, we may end up losing our precious Session Restore. While any of these circumstances happens quite seldom, it needs to be applied as part of the following formula:

seldom · .5 billion users = a lot

I am excited to announce that we have just landed a new and improved Session Restore component in Firefox 33 that protects your precious data better than ever.

How it works

Firefox needs Session Restore to handle the following situations:

  • restarting Firefox without data loss after a crash of either Firefox, the Operating System, a driver or the hardware, or after Firefox has been killed by the Operating System during shutdown;
  • restarting Firefox without data loss after Firefox has been restarted due to an add-on or an upgrade;
  • quitting Firefox and, later, restarting without data loss.

In order to handle all of this, Firefox needs to take a snapshot of the state of the browser whenever anything happens, whether the user browses, fills a form, scrolls, or an application sets a Session Cookie, Session Storage, etc. (this is actually capped to one save every 15 seconds, to avoid overloading the computer). In addition, Firefox performs a clean save during shutdown.

While at the level of the application, the write mechanism itself is simple and robust, a number of things beyond the control of the developer can prevent either the Operating System or the hard drive itself from completing this write consistently – a typical example being tripping on the power plug of a desktop computer during the write.

The new mechanism involves two parts:

  • keeping smart backups to maximize the chances that at least one copy will be readable;
  • making use of the available backups to transparently avoid or minimize data loss.

The implementation actually takes very few lines of code, the key being to know the risks against which we defend.

Keeping backups

During runtime, Firefox remembers which files are known to be valid backups and which files should be discarded. Whenever a user interaction or a script requires it, Firefox writes the contents of Session Restore to a file called sessionstore-backups/recovery.js. If it is known to be good, the previous version of sessionstore-backups/recovery.js is first moved to sessionstore-backups/recovery.bak. In most cases, both files are valid and recovery.js contains a state less than 15 seconds old, while recovery.bak contains a state less than 30 seconds old. Additionally, the writes on both files are separated by at least 15 seconds. In most circumstances, this is sufficient to ensure that, even of hard drive crash during a write to recover.js, at least recovery.bak has been entirely written to disk.

During shutdown, Firefox writes a clean startup file to sessionstore.js. In most cases, this file is valid and contains the exact state of Firefox at the time of shutdown (minus some privacy filters). During startup, if sessionstore.js is valid, Firefox moves it to sessiontore-backup/previous.js. Whenever this file exists, it is valid and contains the exact state of Firefox at the time of the latest clean shutdown/startup. Note that, in case of crash, the latest clean shutdown/startup might be older than the latest actual startup, but this backup is useful nevertheless.

Finally, on the first startup after an update, Firefox copies sessionstore.js, if it is available and valid, to sessionstore-backups/upgrade.js-[build id]. This mechanism is designed primarily for testers of Firefox Nightly, who keep on the very edge, upgrading Firefox every day to check for bugs. Testers, if we introduce a bug that affects Session Restore, this can save your life.

As a side-note, we never use the operating system’s flush call, as 1/ it does not provide the guarantees that most developers expect; 2/ on most operating systems, it causes catastrophic slowdowns.

Recovering

All in all, Session Restore may contain the following files:

  • sessionstore.js (contains the state of Firefox during the latest shutdown – this file is absent in case of crash);
  • sessionstore-backups/recovery.js (contains the state of Firefox ≤ 15 seconds before the latest shutdown or crash – the file is absent in case of clean shutdown, if privacy settings instruct us to wipe it during shutdown, and after the write to sessionstore.js has returned);
  • sessionstore-backups/recovery.bak (contains the state of Firefox ≤ 30 seconds before the latest shutdown or crash – the file is absent in case of clean shutdown, if privacy settings instruct us to wipe it during shutdown, and after the removal of sessionstore-backups/recovery.js has returned);
  • sessionstore-backups/previous.js (contains the state of Firefox during the previous successful shutdown);
  • sessionstore-backups/upgrade.js-[build id] (contains the state of Firefox after your latest upgrade).

All these files use the JSON format. While this format has drawbacks, it has two huge advantages in this setting:

  • it is quite human-readable, which makes it easy to recover manually in case of an extreme crash;
  • its syntax is quite rigid, which makes it easy to find out whether it was written incompletely.

As our main threat is a crash that prevents us from writing the file entirely, we take advantage of the latter quality to determine whether a file is valid. Based on this, we test each file in the order indicated above, until we find one that is valid. We then proceed to restore it.

If Firefox was shutdown cleanly:

  1. In most cases, sessionstore.js is valid;
  2. In most cases in which sessionstore.js is invalid, sessionstore-backups/recovery.js is still present and valid (the likelihood of it being present is obviously higher if privacy settings do not instruct Firefox to remove it during shutdown);
  3. In most cases in which sessionstore-backups/recovery.js is invalid, sessionstore-backups/recovery.bak is still present, with an even higher likelihood of being valid (the likelihood of it being present is obviously higher if privacy settings do not instruct Firefox to remove it during shutdown);
  4. In most cases in which the previous files are absent or invalid, sessionstore-backups/previous.js is still present, in which case it is always valid;
  5. In most cases in which the previous files are absent or invalid, sessionstore-backups/upgrade.js-[...] is still present, in which case it is always valid.

Similarly, if Firefox crashed or was killed:

  1. In most cases, sessionstore-backups/recovery.js is present and valid;
  2. In most cases in which sessionstore-backups/recovery.js is invalid, sessionstore-backups/recovery.bak is pressent, with an even higher likelihood of being valid;
  3. In most cases in which the previous files are absent or invalid, sessionstore-backups/previous.js is still present, in which case it is always valid;
  4. In most cases in which the previous files are absent or invalid, sessionstore-backups/upgrade.js-[...] is still present, in which case it is always valid.

Numbers crunching

Statistics collected on Firefox Nightly 32 suggest that, out of 11.95 millions of startups, 75,310 involved a corrupted sessionstore.js. That’s roughly a corrupted sessionstore.js every 158 startups, which is quite a lot. This may be influenced by the fact that users of Firefox Nightly live on pre-alpha, so are more likely to encounter crashes or Firefox bugs than regular users, and that some of them use add-ons that may modify sessionstore.js themselves.

With the new algorithm, assuming that the probability for each file to be corrupted is independent and is p = 1/158, the probability of losing more than 30 seconds of data after a crash goes down to p^3 ≅ 1 / 4,000,000. If we haven’t removed the recovery files, the probability of losing more than 30 seconds of data after a clean shutdown and restart goes down to p^4 ≅ 1 / 630,000,000. This still means that , statistically speaking, at every startup, there is one user of Firefox somewhere around the world who will lose more than 30 seconds of data, but this is much, better than the previous situation by several orders of magnitude.

It is my hope that this new mechanism will transparently make your life better. Have fun with Firefox!

Shutting down things asynchronously

February 14, 2014 § Leave a comment

This blog entry is part of the Making Firefox Feel As Fast As Its Benchmarks series. The fourth entry of the series was growing much too long for a single blog post, so I have decided to cut it into bite-size entries.

A long time ago, Firefox was completely synchronous. One operation started, then finished, and then we proceeded to the next operation. However, this model didn’t scale up to today’s needs in terms of performance and performance perception, so we set out to rewrite the code and make it asynchronous wherever it matters. These days, many things in Firefox are asynchronous. Many services get started concurrently during startup or afterwards. Most disk writes are entrusted to an IO thread that performs and finishes them in the background, without having to stop the rest of Firefox.

Needless to say, this raises all sorts of interesting issues. For instance: « how do I make sure that Firefox will not quit before it has finished writing my files? » In this blog entry, I will discuss this issue and, more generally, the AsyncShutdown mechanism, designed to implement shutdown dependencies for asynchronous services.

« Read the rest of this entry »

Trapping uncaught asynchronous errors

October 14, 2013 § 2 Comments

While the official specifications of DOM Promise is still being worked on, Mozilla has been using Promise internally for several years already. This API is available to the platform front-end and to add-ons. In the past few weeks, Promise.jsm (our implementation of Promise) and Task.jsm (our implementation of Beautiful Concurrency in JavaScript, built on top of Promise) have been updated with a few new features that should make everybody’s life much easier.

Reporting errors

The #1 issue developers encounter with the use of Promise and Task is error-handling. In non-Promise code, if a piece of code throws an error, by default, that error will eventually be reported by window.onerror or any of the other error-handling mechanisms.

function fail() {
  let x;
  return x.toString();
}

fail(); // Displays somewhere: "TypeError: x is undefined"

By opposition, with Promise and/or Task, if a piece of code throws an error or rejects, by default, this error will be completely ignored:

Task.spawn(function*() {
  fail(); // Error is ignored
});

 

Task.spawn(function*() {
  yield fail(); // Error is ignored
});

 

somePromise.then(function onSuccess() {
  fail(); // Error is ignored
});

 

somePromise.then(function onSuccess() {
  return fail(); // Error is ignored
});

Debugging the error requires careful instrumentation of the code, which is error-prone, time-consuming, often not compositional and generally ugly to maintain:

Task.spawn(function*() {
  try {
    fail();
  } catch (ex) {
    Components.utils.reportError(ex);
    throw ex;
    // The error report is incomplete, re-throwing loses stack information
    // and can cause double-reporting
  }
});

The main reason we errors end up dropped silently is that it is difficult to find out whether an error is eventually caught by an error-handler – recall that, with Promise and Task, error handlers can be registered long after the error has been triggered.

Well, after long debates, we eventually found solutions to fix the issue :)

Simple case: Reporting programming errors

Our first heuristic is that programming errors are, well, programming errors, and that programmers are bound to be looking for them.

So,

Task.spawn(function*() {
  fail(); // Error is not invisible anymore
});

will now cause the following error message

*************************
A coding exception was thrown and uncaught in a Task.

Full message: TypeError: x is undefined
Full stack: fail@Scratchpad/2:23
@Scratchpad/2:27
TaskImpl_run@resource://gre/modules/Task.jsm:217
TaskImpl@resource://gre/modules/Task.jsm:182
Task_spawn@resource://gre/modules/Task.jsm:152
@Scratchpad/2:26
*************************

The message appears on stderr (if you have launched Firefox from the command-line) and in the system logs, so it won’t disrupt your daily routine, but if you are running tests or debugging your code, you should see it.

A similar error message will be printed out if the error is thrown from a raw Promise, without use of Task.

These error messages are limited to programming errors and appear only if the errors are thrown, not passed as rejections.

General case: Reporting uncaught errors

Now, we have just landed a more general support for displaying uncaught errors.

Uncaught thrown error

Task.spawn(function* () {
  throw new Error("BOOM!"); // This will be displayed
});

Uncaught rejection

Task.spawn(function* () {
  yield Promise.reject("BOOM!"); // This will also be displayed
});

Uncaught and clearly incorrect rejection

Task.spawn(function* () {
  Promise.reject("BOOM!");
  // Oops, forgot to yield.
  // Nevermind, this will be displayed, too
});

These will be displayed in the browser console as follows:

A promise chain failed to handle a rejection: on Mon Oct 14 2013 16:50:15 GMT+0200 (CEST), Error: BOOM! at
@Scratchpad/2:27
TaskImpl_run@resource://gre/modules/Task.jsm:217
TaskImpl@resource://gre/modules/Task.jsm:182
Task_spawn@resource://gre/modules/Task.jsm:152
@Scratchpad/2:26

These error messages appear for every uncaught error or rejection, once it is certain that the error/rejection cannot be caught anymore. If you are curious about the implementation, just know that it hooks into the garbage-collector to be informed once the error/promise cannot be caught anymore.

This should prove very helpful when debugging Promise- or Task-related errors. Have fun :)

Support for ES6 generators

You may have noticed that the above examples use function*() instead of function(). Be sure to thank Brandon Benvie who has recently updated Task.jsm to be compatible with ES6 generators :)

Project Async & Responsive, issue 1

April 26, 2013 § 3 Comments

In the previous episodes

Our intrepid heroes, a splinter cell from Snappy, have set out on a quest to offer alternatives to all JavaScript-accessible APIs that blocks the main thread of Firefox & Mozilla Apps.

Recently completed

Various cleanups on Session Restore (Bug 863227, 862442, 861409)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe, form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can be done only on the main thread, just the jank of data collection is often noticeable (i.e. > 50 ms). We are in the process of refactoring Session Restore to make it both faster and more responsive. These bugs are steps towards optimizations.

Status Landed. More cleanups in progress.

Telemetry for Number of Threads (Bug 724368)

Summary As we make Gecko and add-ons more and more concurrent, we need to measure whether this concurrency can cause accidental internal Denial of Service. The objective of this bug is to measure.

Status Landed. You can follow progression in histograms BACKGROUNDFILESAVER_THREAD_COUNT and SIMPLEMEASURES_MAXIMALNUMBEROFCONCURRENTTHREADS.

Reduce number of fsync() in Firefox Health Report (Bug 830492)

Summary Firefox Health Report stores its data using mozStorage. The objective of this bug is to reduce the number of expensive disk synchronizations performed by FHR.

Status Landed.

Ongoing bugs

Out Of Process Thumbnailing (Bug 841495)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily. This bug is about using another process to capture thumbnails of pages currently being visited. This can be useful for both security/privacy reasons (i.e. to ensure that bank account numbers do not show up in thumbnails) and responsiveness (i.e. to ensure that we never block browsing).

Status In progress.

Cutting Session Restore data collection into chunks (Bug 838577)

Summary Currently, we regularly (~every 15 seconds) save the state of every single window, tab, iframe, form being displayed, so as to be able to restore the session quickly in case of crash, power failure, etc. As this can be done only on the main thread, just the jank of data collection is often noticeable (i.e. > 50 ms). This bug is about cutting data collection in smaller chunks, to remove that jank.

Status Working prototype.

Off Main Thread database storage (Bug 702559)

Summary We are in the process of moving as many uses of mozStorage out of the main thread. While mozStorage already supports doing much I/O off the main thread, there is no clear way for developers to enforce this. This bug is about providing a subset of mozStorage that performs all I/O off the main thread and that will serve as target for both ongoing refactorings and future uses of mozStorage, in particular by add-ons.

Status Working prototype.

Improvements transaction management by JavaScript API for Off Main Thread database storage (Bug 856925)

Summary Sqlite.jsm is our JavaScript library for convenient Off Main Thread database storage. This bug is about improving how implicit transactions are handled by the library, hence improving performance.

Status In progress.

Refactor how Places data is backed up (Bugs 852040, 852041, 852034, 852032, 855638, 865643, 846636, 846635, 860625, 854927, 855190)

Summary Places is the database containing bookmarks, history, etc. Historically, Places was implemented purely on the main thread, which is something we very much want to remove, as any main thread I/O can block the user interface for arbitrarily lengthy durations. This set of bugs is part of the larger effort to get rid of Places main thread I/O. The objective here is to isolate and cleanup Places backup, to later allow removing it entirely from the main thread.

Status Working prototype.

APIs for updating/reading Places Off Main Thread (Bugs 834539, 834545)

Summary These bugs are part of the effort to provide a clean API, usable by platform and add-ons, to access/modify Places information off the main thread.

Status In progress.

Move Form History to use Off Main Thread storage (Bug 566746)

Summary This bug is part of the larger effort to get rid of Places main thread I/O. The objective here is to move Form History I/O off the main thread.

Status In progress.

Make about:home use IndexedDB instead of LocalStorage (Bug 789348)

Summary Currently, page about:home uses localStorage to store some data. This is not good, as localStorage does blocking main thread I/O. This bug is about porting about:home to use indexedDB instead.

Status In progress.

Download Architecture Improvements (Bug 825588 and sub-bugs)

Summary Our Architecture for Downloads has grown organically for about 15 years. Part of it is executed on the main thread and synchronously. The objective of this meta-bug is to re-implement Downloads with a modern architecture, asynchronous, off the main thread, and accessible from JavaScript.

Status In progress.

Constant stack space Promise (Bug 810490)

Summary Much of our main thread asynchronous work uses Promises. The current implementation of Promise is very recursive and eats considerable amounts of stack. The objective here is to replace it with a new implementation of Promise that works in (almost) constant stack space.

Status Working partial prototype.

Reduce amount of I/O in session restore (Bug 833286)

Summary The algorithm used by Session Restore to back up its state is needlessly expensive. The objective of this bug is to replace it by an alternative implementation that requires much less I/O.

Status Working prototype.

Planning stage

Move session recording and activity monitoring to Gecko (Bug 841561)

Summary Firefox Health Report implements a sophisticated mechanism for determining whether a user is actively using the browser. This mechanism could be reimplemented in a more efficient and straightforward manner by moving it to Gecko itself. This is the objective of this bug.

Status Looking for developer.

Non-blocking Thumbnailing (Bug 744100)

Summary Currently, we capture thumbnails of all pages, e.g. for display in about:newtab, in Panorama or in add-ons. This blocks the main thread temporarily, as capturing a page requires repainting it into memory from the main thread. This bug is about removing completely the “repaint into memory” step and rather collaborate with the renderer to obtain a copy of the latest image rendered.

Status Design in progress.

Evaluate dropping “www.” from rev_host column (Bug 843357)

Summary The objective of this bug is to simplify parts of the Places database and its users by removing some data that appears unnecessary.

Status Evaluation in progress.

Optimize worker communication for large messages (Bug 852187)

Summary We sometimes need to communicate very large messages between the main thread and workers. For large enough messages, the communication itself ends up being very expensive for the main thread. This bug is about optimizing such communications.

Status Design in progress.

Why Firefox OS matters to me

March 16, 2013 § 5 Comments

These days, everybody seems to be talking about Firefox OS. About how removing the barrier of the marketplace will make the world a better place, or about how HTML5 is so darn great, or about the fact that a gazillion constructors and operators are supporting Firefox OS. And that’s great, because Firefox OS is an impressively good product and deserves this attention.

However, all this craze is missing one feature that makes Firefox OS my choice of mobile operating system: I can write a playable prototype for a simple game, from scratch, in two hours.

Of course, this was a prototype, and completing the game took me a few more days of adding 8 bit graphics, optimizing, toying with the rules, adding difficulty levels, high scores, etc. But after just two hours, I could play the game on computer, tablet and cellphone, and decide where to proceed from here. This was both my first HTML5 game and my first mobile game, by the way. It is by no means an AAA game, but it is fun enough that I sometimes play it in the subway. By the way, did I mention that, once I was satisfied with this game, I could publish it in just a few seconds, simply by hosting it anywhere on the web?

Oh, and another feature: I wrote a quite usable comic book reader in the subway, while commuting from/to work. It took me a few days of commuting (three days, I seem to remember) to obtain a tool that works quite nicely. Due to screen size, I prefer using it on my Android tablet than on a cellphone, but that’s the wonders of HTML5 and Open Web Applications: I developed for one, and it worked for both. Did I mention that this was my first attempt at writing a web application that does file I/O or that uses the touch screen intelligently? I will try and finalize and release this application one of these days.

Now, other developers or users might not share this feeling, but this simplicity to start coding and publish and evolve a game or application is of tremendous importance to me. Because one day, I will have a child in age of playing video games. And for his birthday, I will have a chance to download a 5€ game from the Firefox Marketplace (or anywhere else), but more importantly, I will be able to build a game with his favorite characters as support cast and him as a hero. I hope he will love it. And I will not need to ask for permission.

If there is some application you want to develop, neither will you.

Mozilla Student Projects update

February 27, 2013 § 3 Comments

It has been quite some time since the last update. Since then, many things have happened, both with the Student Projects and with the world of Mozilla. We have had the exciting FirefoxOS AppDays, many alpha, beta and near-final versions of FirefoxOS, and the MWC launch of FirefoxOS.

Well, without further ado, let us see how the student projects have progressed.

« Read the rest of this entry »

Asynchronous file I/O for the Mozilla Platform

October 3, 2012 § 18 Comments

The Mozilla platform has recently been extended with a new JavaScript library for asynchronous, efficient, file I/O. With this library, developers of Firefox, Firefox OS and add-ons can easily write code that behave nicely with respect to the process and the operating system. Please use it, report bugs and contribute.

Off-main thread file I/O

Almost one year ago, Mozilla started Project Snappy. The objective of Project Snappy is to improve, wherever possible, the responsiveness of Firefox, the Mozilla Platform, and now, Firefox OS, based on performance data collected from volunteer users. Thanks to this real-world performance data, we have been able to identify a number of bottlenecks at all levels of Firefox. As it turns out, one of the main bottlenecks is main thread file I/O, i.e. reading from a file or writing to a file from the thread that also runs most of the code of Firefox and its add-ons.

« Read the rest of this entry »

Where Am I?

You are currently browsing entries tagged with programming at Il y a du thé renversé au bord de la table.

Follow

Get every new post delivered to your Inbox.

Join 31 other followers