March 26, 2014
Plot
Our heroes received their assignment. They had to go deep into the Perflines, in the long lost territory of Session Restore, and do whatever it took to get Session Restore back into Perfland. However, they quickly realized that they had been sent on a mission without return – and without a map. This is their tale.
Session Restore is a critical component of Firefox. This component records the current state of your browser to ensure that you can always resume browsing without losing that state, even if Firefox crashes, if your computer loses power, or if your browser is being upgraded. Unfortunately, we have had many reports of Session Restore slowing down Firefox. In February 2013, a two-person Perf/Fx-team task force started working on the performance of Session Restore. This task force eventually grew to four people from Perf, Fx-team and e10s, along with half a dozen occasional contributors.
To this day, the effort has lasted 13 months. In this series of blog entries, I intend to present our work, our results and, more importantly, the lessons we have learnt along the way, sometimes painfully.
Fixing yes, but fixing what?
We had reports of Session Restore blocking Firefox for several seconds every 15 seconds, which made Firefox essentially useless.
The job of Session Restore is to record everything possible of the state of the current browsing session. This means the list of windows, the list of tabs, the current address of each tab, but also the history of each tab, scroll position, anchors, DOM SessionStorage, session cookies, etc. Oh, and this goes recursively for both nested frames and history. All of this is saved to a JSON-formatted file called sessionstore.js, every 15 seconds of user activity. To this day, the largest reported sessionstore.js file is 150 MB, but Telemetry indicates that 95% of users used to have a file of less than 1 MB (numbers are lower these days, after we spent time eliminating unnecessary data from sessionstore.js).
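To give a feel for why this data grows so quickly, here is a minimal Python sketch of a nested session state and its JSON serialization. Every field name in it (`windows`, `tabs`, `entries`, `children`, `scroll`, `storage`, `cookies`) is an assumption chosen for illustration, not the actual sessionstore.js schema, and the helper functions are hypothetical.

```python
import json


def make_entry(url, children=()):
    """One history entry. Nested frames reuse the same shape recursively."""
    return {"url": url, "scroll": "0,0", "children": list(children)}


def make_tab(entries, index):
    """A tab keeps its whole history, plus which entry is currently shown."""
    return {"entries": entries, "index": index, "storage": {}}


def count_entries(entry):
    """Count an entry and all the frames nested below it."""
    return 1 + sum(count_entries(child) for child in entry["children"])


# Hypothetical session: one window, one tab, two history entries,
# the second of which contains one nested frame.
session = {
    "windows": [
        {
            "tabs": [
                make_tab(
                    [
                        make_entry("https://example.org/"),
                        make_entry(
                            "https://example.org/page",
                            [make_entry("https://example.org/frame")],
                        ),
                    ],
                    index=2,
                )
            ],
            "cookies": [],
        }
    ]
}

# The whole structure is serialized in one go, so its size grows with
# windows x tabs x history entries x nested frames.
serialized = json.dumps(session)
size_bytes = len(serialized.encode("utf-8"))
total_entries = sum(
    count_entries(entry)
    for window in session["windows"]
    for tab in window["tabs"]
    for entry in tab["entries"]
)
```

Even in this toy case, each additional tab brings its full history and every nested frame along with it, which is how a file can reach the sizes mentioned above.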
We started the effort to fix Session Restore from only a few bug reports:
- sometimes, users lost sessionstore.js data;
- sometimes, data collection took ages.
Unfortunately, we had no data on:
- the size of the file;
- the actual duration of data collection;
- how long it took to write data to the disk.
To complicate things further, Session Restore had been left without an owner for several years. Irregular patching to support new features of the web and new configurations had progressively turned the code and data structures into a mess that nobody fully understood.
We had, however, a few hints:
- Session Restore needs to collect lots of data;
- Session Restore had been designed a long time ago, for users with few tabs, and when web pages stored very little information;
- serializing and writing to JSON is inefficient;
- in bad cases, saving could take several seconds;
- the collection of data was purely monolithic;
- reading and writing data was done entirely on the main thread, which was a very bad thing to do;
- the client API caused full recollections at each request;
- the data structure used by Session Restore had progressively become an undocumented mess.
There were a number of obvious sources of inefficiency that we could fix without further data, and we set out to fix them immediately. In a few cases, however, we found out the hard way that optimizing without hard data is a time-consuming and useless exercise. Consequently, a considerable part of our work has been to use Telemetry to determine where we could best apply our optimization effort, and to confirm that this effort yielded results. In many cases, this meant adding coarse-grained probes, then progressively completing them with finer-grained probes, in parallel with actually writing optimizations.
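The coarse-then-fine approach to probes can be sketched as follows. This is a simplified Python illustration, not the Telemetry API: the `probe` helper, the probe names (`SAVE_TOTAL_MS`, `COLLECT_MS`, `SERIALIZE_MS`) and the `collect_state` stand-in are all hypothetical.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

# Probe name -> list of recorded durations, in milliseconds.
samples = defaultdict(list)


@contextmanager
def probe(name):
    """Record how long the enclosed block takes under the given probe name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        samples[name].append((time.perf_counter() - start) * 1000.0)


def collect_state():
    """Stand-in for the real data collection (hypothetical)."""
    return {"windows": [{"tabs": [{"url": "https://example.org/"}]}]}


def save_session():
    # Coarse-grained probe: the first one to add, covering the whole save.
    with probe("SAVE_TOTAL_MS"):
        # Finer-grained probes, added later to see where the time goes.
        with probe("COLLECT_MS"):
            state = collect_state()
        with probe("SERIALIZE_MS"):
            payload = json.dumps(state)
    return payload


payload = save_session()
```

The coarse probe tells you whether there is a problem at all; the nested ones tell you which step to optimize first, and later whether the optimization actually helped.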
To be continued…
In the next episode, our heroes will fight Main Thread File I/O… and the consequences of removing it.