Q2 2014 Report
July 1, 2014 § Leave a comment
Q2 2014 was a difficult quarter at Mozilla, with all the agitation around Brendan Eich, Australis, Media Extensions, etc. Still, I have the feeling that we managed to get a lot done despite the intense pressure. Here is a quick highlight of my main accomplishments for Q2 2014.
A considerable amount of my time was spent working on Session Restore. The main objective is to decrease the jank caused by Session Restore taking snapshots of the session and to decrease the time Session Restore takes to restore the state of Firefox. Much of the activity this quarter dealt with measuring performance, so as to best optimize it and improving safety.
Reworking Session Restore backups
With Firefox 33, the backups of Session Restore state have been completely redesigned. The new system should prove orders of magnitude safer, in addition to now being fully transparent.
Next steps We are still lacking measurements to confirm that this is as successful as the mathematics suggest. If you are interested, there is a mentored bug open.
Talos tests and Telemetry on Session Restore startup
Optimizing startup is difficult, and generally impossible if you do not know what to optimize. With Firefox 32 and 33, we have new benchmarks and real world measurements to help us determine immediately the influence of patches on Session Restore startup.
Next steps Using these benchmarks to experiment with possible optimizations. This is in progress.
Cleaning up Session Restore file
One of our objectives is to decrease the size of the Session Restore file, to reduce the amount of I/O (hence battery use and hardware wear and tear) and memory usage. As a first step, we have introduced a mechanism that progressively removes from the “Undo Close” feature tabs and windows that have been closed at least 2 weeks ago. Interestingly, Telemetry indicates that this clean-up has no effect on the size of the Session Restore file. Experiments run later during the quarter, using the Talos tests, also strongly suggest that the data that we could clean up and that we do not clean up yet have essentially no influence on startup duration.
Next steps I believe that this strategy will therefore not be pursued during the next quarters.
Preserving compatibility with Tor Browser
While refactoring Session Restore, we have hit a number of obstacles in the form of add-ons using private or semi-private APIs that we wished to remove. We have managed to work along with add-on authors and, as far as I know, we have not broken any add-on yet. In particular, we have maintained compatibility with the Tor Browser, which is a heavily customized distribution of Firefox targeted towards privacy.
Next steps Providing a clean API for add-ons. This will require discussing with add-on authors to find out what they need.
I am in charge of the Async Project, which is all about giving front-end and add-on developers tools to develop asynchronous code that does not jank. As usual, this involved plenty of activity in a number of different directions.
Auto-closing Sqlite.jsm databases (mentoring Michael Brennan)
Sqlite.jsm databases can now be closed automatically during garbage-collection. On user’s computers, this will increase safety, as failing to close a database causes shutdown-time assertion failures. However, to use resources effectively, pragmatism dictates that databaes should be closed manually, so failing to close a database in the Mozilla codebase will still cause test failures.
Reworking OS.File shutdown
On devices with little memory (typically Firefox Phones), one of the techniques used to save memory is to shutdown the OS.File worker as early as possible, re-launching it later if necessary. As it turns out, the task is more complicated than it seems, due to possibilities of race conditions. Unfortunately, this means that in some extreme cases, Firefox OS applications could lock down and fail to shutdown properly without being killed by the OS. This is now fixed. Somewhere along the way, this helped us to make the PromiseWorker used by OS.File more resilient to low-level errors.
Next steps Making the PromiseWorker usable by other modules than OS.File, including testing and add-ons.
OS.File for Android and Firefox OS
OS.File was initially designed for desktop devices. Now that it is used in a number of places on mobile devices, I have mercilessly hunted down all compatibility issues between OS.File and our two mobile platforms. Compatibility tests are now activated on all platforms and should avoid any regression.
AsyncShutdown Barrier mechanism
The shutdown process of Firefox has always been a dark and scary place, full of unspecified dependencies. As a result, any refactoring or addition a new dependency could break many things in new and interesting ways. I have introduced the AsyncShutdown Barrier mechanism that lets us specify clear, explicit and extensible dependencies, handles ordering of shutdowns, as well as error reporting if a dependency is unmet. This Barrier is now used by Sqlite.jsm, OS.File, Firefox Health Report, Session Restore, Page Thumbnails and fixes a number of major issues.
Next steps Porting AsyncShutdown Barrier to allow native components to register with it.
Fixing Firefox 30 shutdown freezes (with Tim Taubert)
Many users of Firefox 30 encountered issues that caused Firefox to freeze during shutdown. We found out that the issue was caused was triggered by Page Thumbnails and caused by a bug in ChromeWorkers, which did not handle an error case gracefully. I applied AsyncShutdown Barrier to ensure that Page Thumbnails always completed without triggering the error case, while Tim Taubert ensured that the Chrome Workers handled the error robustly.
Making Firefox Health Report shutdown more robust
While porting Firefox Health Report to AsyncShutdown, we encountered an elusive bug that manifested itself by causing rare shutdown crashes. After months of experimenting, instrumenting and attempting to fix the issue, we eventually traced it back to a more serious bug in shutdown, which apparently does not always send the proper notifications. Using the AsyncShutdown Barrier, we managed to work around the issue and make FHR’s shutdown both more robust and better instrumented in case of crash. This later helped us locate another issue that prevents a proper shutdown when some databases have been corrupted.
Next steps Fix the upstream shutdown bug, make our shutdown more resilient in case of database corruption.
The other aspect of writing asynchronous code is making sure that developers can debug it. Now that we have hit a critical mass of developers writing async code, it was high time to help them work with it.
Rewriting Task stack traces to be meaningful
Now that we know how to handle uncaught errors, the main remaining weaknesses of Promise-based and Task-based code is that their stack traces lose much information. Since Firefox 33, Task-based stack traces are now transparently rewritten into something developer-redable. Somewhere along the way, I have also patched xpcshell and mochitests to ensure that they take advantage of this rewriting. Experience shows that this is very useful and that the runtime cost is negligible.
Next steps Evaluate the runtime cost of doing the same thing for Promise-based code.
Making xpcshell tests fail in case of uncaught promise error
Uncaught promise errors were treated by the test suites as warnings, TBPL did not report them, and they remained consequantly more often than not ignored (or even unseen) by the developers. I have reworked the xpcshell test harness to consider all uncaught promise errors as oranges and fixed all offenders.
Next steps Doing the same for mochitests. Code is ready, but a few offenders remain.
Dealing with political feedback around the nomination and departure of Brendan Eich
Along with many others, I made my best to engage people who voiced their negative feedback either at the nomination or at the departure of Brendan Eich. Unfortunately, this took time and efforts, but I believe that staying in touch with our users is part of what makes the difference between Mozilla and other browser vendors.
Working with new contributors
I estimate that I have worked with ~30 potential new contributors during the quarter. Many have unfortunately decided to postpone or abandon their efforts towards contributing, but a few have stayed, to work either with me or with other teams. At the moment, I am following 5 promising contributors. In particular, I am quite happy to welcome Dexter (who is working on a very sophisticated patch to let code watch for file modifications) and Kushagra (who has landed several test suite bugs).
Next steps More of it!
Working with universities
A group of École Centrale de Lyon successfully completed an online tool to help grassroot projects find volunteers. It was nice mentoring them.
I was invited to deliver a presentation on performance at Zedge, in Trondheim, Norway. That was fun 🙂
Next steps Publish the slides.
Let’s get started with Q3!