Je Suis Charlie, but I Am Vigilant

January 13, 2015 § 2 Comments

(This text has initially been written for the French-speaking Mozilla Community. Most members of the community haven’t had a chance to review or sign it yet.)

20150108_144913

I am Charlie. Some of us grew up with Cabu’s children cartoons or Wolinkski’s willies. Some of us laughed at Charb’s cover pages, others much less, and yet others had never even heard of Charlie Hebdo. Despite our differences, from the bottom of our heart, we are with those who defend Free Speech, the right to discuss, draw, make laugh or cringe.

I am a Cop. Some among us work directly with law enforcement, or ensure the online safety of individuals or organizations. Some of us make their voice heard when legal or executive powers around the world decide that security, convenience or economic interests matter more than the rights of users. All, we salute the police officers murdered or wounded these last few days as they attempted to save innocents.

I am Jew, or Muslim, or Anything else. Some among us are Jew, or Muslim, or Christian, or anything else, and, frankly, most of us don’t care who is what. All, we are horrified that, in the 21st century, extremists may still decide to murder innocents, solely because they might be Jew, and because they had decided to go the grocery store. All, we are appalled that, in the 21st century, extremists may still decide to attack innocents, just because they might be Muslems, through threats, physical violence or through their places of cult. All, we are shocked whenever opportunists praise murders or violence, or call for hatred or the ones or the others.

I am Collateral. Before we are the Mozilla Community, we are a community of individuals. Any one of us could have been at the front desk of this building, or on the path of that car, hostage or collateral kill of the assassins. Our minute of silence is for the anonymous ones, too.

I am Vigilant. Some of us believe that strong and immediate measures must be taken. Others prefer to wait until the emotion has passed before we can think of an appropriate response. All, we wait to see how the attacks of January 7th and January 9th 2015 will change our society. All, we remain vigilant, to make sure that, on top of the blood of the dead, our society does not choose to sacrifice Human Rights, Free Speech and Privacy, in the name of a securitarian ideology or opportunistic economical interests.

I am the French-speaking Mozilla Community.


Text edited by myself. List of signatures of the French version.

Je Suis Charlie

January 7, 2015 § Leave a comment

Charlie

Vous souhaitez apprendre à développer des Logiciels Libres ?

November 29, 2014 § Leave a comment

Cette année, la Communauté Mozilla propose à Paris un cycle de Cours/TDs autour du Développement de Logiciels Libres.

Au programme :

  • comment se joindre à un projet existant ;
  • comment communiquer dans une équipe distribuée ;
  • comment financer un projet de logiciel libre ;
  • qualité du code ;
  • du code !
  • (et beaucoup plus).

Pour plus de détails, et pour vous inscrire, tout est ici.

Attention, les cours commencent le 8 décembre !

RFC: We deserve better than runtime warnings

November 20, 2014 § 2 Comments

Consider the following scenario:

  1. Module A prints warnings when it’s used incorrectly;
  2. Module B uses module A correctly;
  3. Some future refactoring of module B starts using module A incorrectly, hence displaying the warnings;
  4. Nobody realises for months, because we have too many warnings;
  5. Eventually, something breaks.

How often has this happened to everyone of us?

This scenario has many variants (e.g. module A changed and nobody realized that module B is now in a situation it misuses module A), but they all boil down to the same thing: runtime warnings are designed to be lost, not fixed. To make things worse, many of our warnings are not actionable, simply because we have no way of knowing where they come from – I’m looking at you, Cu.reportError.

So how do we fix this?

We would certainly save considerable amounts of time if warnings caused immediate assertion failures, or alternatively test failures (i.e. fail, but only when running the unit tests). Unfortunately, we can do neither, as we have a number of tests that trigger the warnings either

  • by design (e.g. to check that we can recover from such misuses of A, or because we still need a now-considered-incorrect use of an API to keep working until we have ported all the clients to the better API);
  • or for historical reasons (e.g. the now incorrect use of A used to be correct, but we haven’t fixed all tests that depend on it yet).

However, I believe that causing test failures is still the solution. We just need a mechanism that supports a form of whitelisting to cope with the aforementioned cases.

Introducing RuntimeAssert

RuntimeAssert is an experiment at provoding a standard mechanism to replace warnings. I have a prototype implemented as part of bug 1080457. Feedback would be appreciated.

The key features are the following:

  • when a test suite is running, a call to `RuntimeAssert` causes the test suite to fail;
  • when a test suite is running, a call to `RuntimeAssert` contains at least the filename/line number of where it was triggered, preferably a stack wherever available;
  • individual tests can whitelist families of calls to `RuntimeAssert` and mark them either as expected;
  • individual tests can whitelist families of calls to `RuntimeAssert` and mark them as pending fix;
  • when a test suite is not running, a call to `RuntimeAssert` does nothing costly (it may default to PR_LOG or Cu.reportError).

Possible API:

  • in JS, we trigger a test failure by calling RuntimeAssert.fail(keyword, string or Error) from production code;
  • in C++, we likewise trigger a test failure by calling MOZ_RUNTIME_ASSERT(keyword, string);
  • in the testsuite, we may whitelist errors by calling Assert.whitelist.expected(keyword, regexp)  or Assert.whitelist.FIXME(keyword, regexp).

Examples:

//
// Module
//
let MyModule = {
  oldAPI: function(foo) {
    RuntimeAssert.fail(“Deprecation”, “Please use MyModule.newAPI instead of MyModule.oldAPI”);
    // ...
  },
  newAPI: function(foo) {
    // ...
  },
};

let MyModule2 = {
  api: function() {
    return somePromise().then(null, error => {
      RuntimeAssert.fail(“MyModule2.api”, error);
      // Rather than leaving this error uncaught, let’s make it actionable.
    });
  },

  api2: function(date) {
    if (typeof date == “number”) {
      RuntimeAssert.fail(“MyModule2.api2”, “Passing a number has been deprecated, please pass a Date”);
      date = new Date(date);
    }
    // ...
   }
}


//
// Whitelisting a RuntimeAssert in a test.
//

// This entire test is about MyModule.oldAPI, warnings are normal.
Assert.whitelist.expected(“Deprecation”, /Please use MyModule.newAPI/);

// We haven’t fixed all calls to MyModule2.api2, so they should still warn, but not cause an orange.
Assert.whitelist.FIXME(“MyModule2.api2”, /please pass a Date/);

Assert.whitelist.expected(“MyModule2.api”, /TypeError/, function() {
  // In this test, we will trigger a TypeError in MyModule2.api, that’s entirely expected.
  // Ignore such errors within the (async) scope of this function.
});

Applications

In the long-term, I believe that RuntimeAssert (or some other mechanism) should replace almost all our calls to Cu.reportError.

In the short-term, I plan to use this for reporting

  • uncaught Promise rejections, which currently require a bit too much hacking for my tastes;
  • errors in XPCOM.lazyModuleGetter & co;
  • failures during AsyncShutdown;
  • deprecation warnings as part of Deprecated.jsm.

The Future of Promise

November 19, 2014 § Leave a comment

If you are writing JavaScript in mozilla-central or in an add-on, or if you are writing WebIDL code, by now, you have probably made use of Promise. You may even have noticed that we now have several implementations of Promise in mozilla-central, and that things are moving fast, and sometimes breaking.
At the moment, we have two active implementations of Promise:
(as well as a little code using an older, long deprecated, implementation of Promise)
This is somewhat confusing, but the good news is that we are working hard at making it simpler and moving everything to DOM Promise.

General Overview

Many components of mozilla-central have been using Promise for several years, way before a standard was adopted, or even discussed. So we had to come up with our implementation(s) of Promise. These implementations were progressively folded into Promise.jsm, which is now used pervasively in mozilla-central and add-ons.
In parallel, Promise were specified, submitted for standardisation, implemented in Firefox, and finally standardised. This is the second implementation we call DOM Promise. This implementation is starting to be used in many places on the web.
Having two implementations of Promise with the same feature set doesn’t make sense. Fortunately, Promise.jsm was designed to match the API of Promise that we believed would be standardised, and was progressively refactored and extended to follow these developments, so both APIs are almost identical.
Our objective is to move entirely to DOM Promise. There are still a few things that need to happen before this is possible, but we are getting close. I hope that we can get there by the end of 2014.

Missing pieces

Debugging and testing

At the moment, Promise.jsm is much better than DOM Promise in two aspects:
  • it is easier to inspect a promise from Promise.jsm for debugging purposes (not anymore, things have been moving fast while I was writing this blog entry);
  • Promise.jsm integrates nicely in the test suite, to make sure that uncaught errors are reported and cause test failures.
In both topics, we are hard at work bringing DOM Promise to feature parity with Promise.jsm and then some (bug 989960, bug 1083361). Most of the patches are in the pipeline already.

API differences

  • Promise.jsm offers an additional function Promise.defer, which didn’t make it to standardization.
This function may easily be written on top of DOM Promise, so this is not a hard blocker. We are going to add this function to a module `PromiseUtils.jsm`.
  • Also, there is a slight bug in DOM Promise that gives it a slightly unexpected behavior in a few edge cases. This should not hit developers who use DOM Promise as expected, but this might surprise people who know the exact scheduling algorithm and expect it to be consistent between Promise.jsm and DOM Promise.

Oh, wait, that’s fixed already.

Wrapping it up

Once we have done all of this, we will be able to replace Promise.jsm with an empty shell that defers all implementations to DOM Promise. Eventually, we will deprecate and remove this module.

As a developer, what should I do?

For the moment, you should keep using Promise.jsm, because of the better testing/debugging support. However, please do not use Promise.defer. Rather, use PromiseUtils.defer, which is strictly equivalent but is not going away.
We will inform everyone once DOM Promise becomes the right choice for everything.
If your code doesn’t use Promise.defer, migrating to DOM Promise should be as simple as removing the line that imports Promise.jsm in your module.

What David Did During Q3

September 30, 2014 § 6 Comments

September is ending, and with it Q3 of 2014. It’s time for a brief report, so here is what happened during the summer.

Session Restore

After ~18 months working on Session Restore, I am progressively switching away from that topic. Most of the main performance issues that we set out to solve have been solved already, we have considerably improved safety, cleaned up lots of the code, and added plenty of measurements.

During this quarter, I have been working on various attempts to optimize both loading speed and saving speed. Unfortunately, both ongoing works were delayed by external factors and postponed to a yet undetermined date. I have also been hard at work on trying to pin down performance regressions (which turned out to be external to Session Restore) and safety bugs (which were eventually found and fixed by Tim Taubert).

In the next quarter, I plan to work on Session Restore only in a support role, for the purpose of reviewing and mentoring.

Also, a rant The work on Session Restore has relied heavily on collaboration between the Perf team and the FxTeam. Unfortunately, the resources were not always available to make this collaboration work. I imagine that the FxTeam is spread too thin onto too many tasks, with too many fires to fight. Regardless, the symptom I experienced is that during the course of this work, both low-priority, high-priority and safety-critical patches have been left to rot without reviews, despite my repeated requests, for 6, 8 or 10 weeks, much to the dismay of everyone involved. This means man·months of work thrown to /dev/null, along with quarterly objectives, morale, opportunities, contributors and good ideas.

I will try and blog about this, eventually. But please, in the future, everyone: remember that in the long run, the priority of getting reviews done (or explaining that you’re not going to) is a quite higher than the priority of writing code.

Async Tooling

Many improvements to Async Tooling landed during Q3. We now have the PromiseWorker, which simplifies considerably the work of interacting between the main thread and workers, for both Firefox and add-on developers. I hear that the first add-on to make use of this new feature is currently being developed. New features, bugfixes and optimizations landed for OS.File. We have also landed the ability to watch for changes in a directory (under Windows only, for the time being).

Sadly, my work on interactions between Promise and the Test Suite is currently blocked until the DevTools team manages to get all the uncaught asynchronous errors under control. It’s hard work, and I can understand that it is not a high priority for them, so in Q4, I will try to find a way to land my work and activate it only for a subset of the mochitest suites.

Places

I have recently joined the newly restarted effort to improve the performance of Places, the subsystem that handles our bookmarks, history, etc. For the moment, I am still getting warmed up, but I expect that most of my work during Q4 will be related to Places.

Shutdown

Most of my effort during Q3 was spent improving the Shutdown of Firefox. Where we already had support for shutting down asynchronously JavaScript services/consumers, we now also have support for native services and consumers. Also, I am in the process of landing Telemetry that will let us find out the duration of the various stages of shutdown, an information that we could not access until now.

As it turns out, we had many crashes during asynchronous shutdown, a few of them safety-critical. At the time, we did not have the necessary tools to determine to prioritize our efforts or to find out whether our patches had effectively fixed bugs, so I built a dashboard to extract and display the relevant information on such crashes. This proved a wise investment, as we spent plenty of time fighting AsyncShutdown-related fires using this dashboard.

In addition to the “clean shutdown” mechanism provided by AsyncShutdown, we also now have the Shutdown Terminator. This is a watchdog subsystem, launched during shutdown, and it ensures that, no matter what, Firefox always eventually shuts down. I am waiting for data from our Crash Scene Investigators to tell us how often we need this watchdog in practice.

Community

I lost track of how many code contributors I interacted with during the quarter, but that represents hundreds of e-mails, as well as countless hours on IRC and Bugzilla, and a few hours on ask.mozilla.org. This year’s mozEdu teaching is also looking good.

We also launched FirefoxOS in France, with big success. I found myself in a supermarket, presenting the ZTE Open C and the activities of Mozilla to the crowds, and this was a pleasing experience.

For Q4, expect more mozEdu, more mentoring, and more sleepless hours helping contributors debug their patches :)

The Battle of Session Restore – Season 1 Episode 3 – All With Measure

July 17, 2014 § 4 Comments

Plot For the second time, our heroes prepared for battle. The startup of Firefox was too slow and Session Restore was one of the battle fields.

When Firefox starts, Session Restore is in charge of restoring the browser to its previous state, in case of a crash, a restart, or for the users who have configured Firefox to resume from its previous state. This entails numerous activities during startup:

  1. read sessionstore.js from disk, decode it and parse it (recall that the file is potentially several Mb large), handling errors;
  2. backup sessionstore.js in case of startup crash.
  3. create windows, tabs, frames;
  4. populate history, scroll position, forms, session cookies, session storage, etc.

It is common wisdom that Session Restore must have a large impact on Firefox startup. But before we could minimize this impact, we needed to measure it.

Benchmarking is not easy

When we first set foot on Session Restore territory, the contribution of that module to startup duration was uncharted. This was unsurprising, as this aspect of the Firefox performance effort was still quite young. To this day, we have not finished chartering startup or even Session Restore’s startup.

So how do we measure the impact of Session Restore on startup?

A first tool we use is Timeline Events, which let us determine how long it takes to reach a specific point of startup. Session Restore has had events `sessionRestoreInitialized` and `sessionRestored` for years. Unfortunately, these events did not tell us much about Session Restore itself.

The first serious attempt at measuring the impact of Session Restore on startup Performance was actually not due to the Performance team but rather to the metrics team. Indeed, data obtained through Firefox Health Report participants indicated that something wrong had happened.

Oops, something is going wrong

Indicator `d2` in the graph measures the duration between `firstPaint` (which is the instant at which we start displaying content in our windows) and `sessionRestored` (which is the instant at which we are satisfied that Session Restore has opened its first tab). While this measure is imperfect, the dip was worrying – indeed, it represented startups that lasted several seconds longer than usual.

Upon further investigation, we concluded that the performance regression was indeed due to Session Restore. While we had not planned to start optimizing the startup component of Session Restore, this battle was forced upon us. We had to recover from that regression and we had to start monitoring startup much better.

A second tool is Telemetry Histograms for measuring duration of individual operations, such as reading sessionstore.js or parsing it. We progressively added measures for most of the operations of Session Restore. While these measures are quite helpful, they are also unfortunately very unstable in real-world conditions, as they are affected both by scheduling (the operations are asynchronous), by the work load of the machine, by the actual contents of sessionstore.js, etc.

The following graph displays the average duration of reading and decoding sessionstore.js among Telemetry participants: Telemetry 4

Difference in colors represent successive versions of Firefox. As we can see, this graph is quite noisy, certainly due to the factors mentioned above (the spikes don’t correspond to any meaningful change in Firefox or Session Restore). Also, we can see a considerable increase in the duration of the read operation. This was quite surprising for us, given that this increase corresponds to the introduction of a much faster, off the main thread, reading and decoding primitive. At the time, we were stymied by this change, which did not correspond to our experience. We have now concluded that by changing the asynchronous operation used to read the file, we have simply changed the scheduling, which makes the operation appear longer, while in practice it simply does not block the rest of the startup from taking place on another thread.

One major tool was missing for our arsenal: a stable benchmark, always executed on the same machine, with the same contents of sessionstore.js, and that would let us determine more exactly (almost daily, actually) the impact of our patches upon Session Restore:Session Restore Talos

This test, based on our Talos benchmark suite, has proved both to be very stable, and to react quickly to patches that affected its performance. It measures the duration between the instant at which we start initializing Session Restore (a new event `sessionRestoreInit`) and the instant at which we start displaying the results (event `sessionRestored`).

With these measures at hand, we are now in a much better position to detect performance regressions (or improvements) to Session Restore startup, and to start actually working on optimizing it – we are now preparing to using this suite to experiment with “what if” situations to determine which levers would be most useful for such an optimization work.

Evolution of startup duration

Our first benchmark measures the time elapsed between start and stop of Session Restore if the user has requested all windows to be reopened automatically

restoreAs we can see, the performance on Linux 32 bits, Windows XP and Mac OS 10.6 is rather decreasing, while the performance on Linux 64 bits, Windows 7 and 8 and MacOS 10.8 is improving. Since the algorithm used by Session Restore upon startup is exactly the same for all platforms, and since “modern” platforms are speeding up while “old” platforms are slowing down, this suggests that the performance changes are not due to changes inside Session Restore. The origin of these changes is unclear. I suspect the influence of newer versions of the compilers or some of the external libraries we use, or perhaps new and improved (for some platforms) gfx.

Still, seeing the modern platforms speed up is good news. As of Firefox 31, any change we make that causes a slowdown of Session Restore will cause an immediate alert so that we can react immediately.

Our second benchmark measures the time elapsed if the user does not wish windows to be reopened automatically. We still need to read and parse sessionstore.js to find whether it is valid, so as to decide whether we can show the “Restore” button on about:home.

norestoreWe see peaks in Firefox 27 and Firefox 28, as well as a slight decrease of performance on Windows XP and Linux. Again, in the future, we will be able to react better to such regressions.

The influence of factors upon startup

With the help of our benchmarks, we were able to run “what if” scenarios to find out which of the data manipulated by Session Restore contributed to startup duration. We did this in a setting in which we restore windows:size-restore

and in a setting in which we do not:

size-norestore

Interestingly, increasing the size of sessionstore.js has apparently no influence on startup duration. Therefore, we do not need to optimize reading and parsing sessionstore.js. Similarly, optimizing history, cookies or form data would not gain us anything.

The single largest most expensive piece of data is the set of open windows – interestingly, this is the case even when we do not restore windows. More precisely, any optimization should target, by order of priority:

  1. the cost of opening/restoring windows;
  2. the cost of opening/restoring tabs;
  3. the cost of dealing with windows data, even when we do not restore them.

What’s next?

Now that we have information on which parts of Session Restore startup need to be optimized, the next step is to actually optimize them. Stay tuned!

Where Am I?

You are currently browsing the Informatique / Computer science category at Il y a du thé renversé au bord de la table.

Follow

Get every new post delivered to your Inbox.

Join 34 other followers