November 6, 2015 § Leave a comment
In part 1, we discussed the design of time measurement within the Firefox Performance Monitor. Despite the intuition, the Performance Monitor had neither the same set of objectives as the Gecko Profiler, nor the same set of constraints, and we ended up picking a design that was not a sampling profiler. In particular, instead of capturing performance data on stacks, the Monitor captures performance data on Groups, a notion that we have not discussed yet. In this part, we will focus on bridging the gap between our low-level instrumentation and actual add-ons and webpages, as may be seen by the user.
Designing the Firefox Performance Stats Monitor, part 1: Measuring time without killing battery or performance
October 27, 2015 § Leave a comment
For a few versions, Firefox Nightly has been monitoring the performance of add-ons, thanks to the Performance Stats API. While we are waiting for the greenlight to let it graduate to Firefox Aurora, as well as investigating a few lingering false-positives, and while v2 is approaching steadily, it is time for a brain dump on this toolbox and its design.
The initial objective of this monitor is to be able to flag both add-ons and webpages that cause noticeable slowdowns, so as to let users disable/close whatever is making their use of Firefox miserable. We also envision more advanced uses that could let us find out if features of webpages cause slowdowns on specific OS/hardware combinations.
July 16, 2015 § Leave a comment
School year 2014-2015 is ending. It’s time for a brief report.
July 13, 2015 § 31 Comments
A long time ago, XUL was an extraordinary component of Firefox. It meant that front-end and add-on developers could deliver user interfaces in a single, mostly-declarative, language, and see them adapt automatically to the look and feel of each OS. Ten years later, XUL has become a burden: most of its features have been ported to HTML5, often with slightly different semantics – which makes Gecko needlessly complex – and nobody understands XUL – which makes contributions harder than they should be. So, we have reached a stage at which we basically agree that, in a not-too-distant future, Firefox should not rely upon XUL anymore.
But wait, it’s not the only thing that needs to change. We also want to support piecewise updates for Firefox. We want Firefox to start fast. We want the UI to remain responsive. We want to keep supporting add-ons. Oh, and we want contributors, too. And we don’t want to lose internationalization.
Mmmh… and perhaps we don’t want to restart Firefox from bare Gecko.
All of the above are worthy objectives, but getting them all will require some careful thought.
So I’d like to put together a list of all our requirements, against which we could evaluate potential solutions, re-architectures, etc. for the front-end:
- Get rid of the deprecated (XUL) bits of Gecko in a finite time.
- Don’t break Firefox .
- Firefox should start fast.
- The UI should not suffer from jank.
- The UI should not cause jank.
- Look and feel like a native app, even with add-ons.
- Keep supporting internationalization.
- Keep supporting lightweight themes.
- Keep supporting acccessibility.
- Use technologies that the world understands.
- Use technologies that are useful to add-on authors.
- Support piece-wise, restart-less front-end updates.
- Provide an add-ons API that won’t break.
- Code most of the front-end with the add-ons API.
 I have heard this claim contested. Some apparently suggest that we should actually break Firefox and base all our XUL-less, Go Faster initiatives on a clean slate from e.g. Browser.html or Servo. If you wish to defend this, please step forward :)
Does this sound like a correct list for all of you?
May 6, 2015 § 13 Comments
When it is at its best, Firefox is fast. Really, really fast. When things start slowing down, though, using Firefox is much less fun. So, one of main objectives of the developers of Firefox is making sure that Firefox is and remains as smooth and responsive as humanly possible. There is, however, one thing that can slow down Firefox, and that remains out of the control of the developers: add-ons. Good add-ons are extraordinary, but small coding errors – or sometimes necessary hacks – can quickly drive the performance of Firefox into the ground.
So, how can an add-on developer (or add-on reviewer) find out whether her add-on is fast? Sadly, not much. Testing certainly helps, and the Profiler is invaluable to help pinpoint a slowdown once it has been noticed, but what about the performance of add-ons in everyday use? What about the experience of users?
To solve this issue, we decided to work on a set of tools to help add-on developers and reviewers find out the performance of their add-ons. Oh, and also to let users find out quickly if an add-on is slowing down their everyday experience.
On recent Nightly builds of Firefox, you may now open about:performance to get an overview of the performance cost of add-ons and webpages :
The main resources we monitor are :
- jank, which measures how much the add-on impacts the responsiveness of Firefox. For 60fps performance, jank should always remain ≤ 4. If an add-on regularly causes jank to increase past 6, you should be worried.
- CPOW aka blocking cross-process communications, which measures how much the add-on is causing Firefox to freeze waiting for a process to respond. Anything above 0 is bad.
Note that the design of this page is far from stable. I realise it’s not very user-friendly at the moment, so don’t hesitate to file bugs to help us improve it. Also note that, when running with e10s, the page doesn’t display all the useful information. We are working on it.
Add-on developers and reviewers can now find information on the performance of their add-ons on a dedicated dashboard.
These are real-world performance data, as extracted from user’s computers. The two histograms available for the time being are:
- MISBEHAVING_ADDONS_JANK_LEVEL, which measures the jank, as detailed above;
- MISBEHAVING_ADDONS_CPOW_TIME_MS, which measure the amount of time spent in CPOW, as detailed above.
If you are an add-on developer, you should monitor regularly the performance of your add-on on this page. If you notice suspicious values, you should try and find out what causes these performance issues. Don’t hesitate and reach out to us, we will try and help you.
Slow add-on Notification
Add-on developers and reviewers, as well as end-users, are now informed when an add-on causes either jank or CPOW performance issues:
Note that this feature is not ready to ride the trains, and we do not have a specific idea of when it will be made available for users of Aurora/DeveloperEdition. This is partly because the UX is not good enough yet, partly because the thresholds will certainly change, and partly because we want to give add-on developers time to fix any issue before the users see a dialog that suggest that an add-on should be uninstalled.
Performance Stats API
By the way, we have an API for accessing performance stats. Very imaginatively, it’s called PerformanceStats.jsm [link]. While this API will still change during the coming weeks you can start playing with it if you are interested. Some add-ons may be able to throttle their performance use based on this data. Also, I hope that, in time, someone will be able to write a version of about:performance much nicer than mine :)
Challenges and work ahead
For the moment, we are in the process of stabilizing the API, its implementation and its performance. In parallel, we are working on making the UX of about:performance more useful. Once both are done, we are going to proceed with adding more measurements, making the code more e10s-friendly and measuring the performance of webpages.
If you are an add-on developer and if you feel that your add-on is tagged as slow by error, or if you have great ideas on how to make this data useful, feel free to ping me, preferably on IRC. You can find me on irc.mozilla.org, channel #developers, where I am Yoric.
July 17, 2014 § 4 Comments
Plot For the second time, our heroes prepared for battle. The startup of Firefox was too slow and Session Restore was one of the battle fields.
When Firefox starts, Session Restore is in charge of restoring the browser to its previous state, in case of a crash, a restart, or for the users who have configured Firefox to resume from its previous state. This entails numerous activities during startup:
- read sessionstore.js from disk, decode it and parse it (recall that the file is potentially several Mb large), handling errors;
- backup sessionstore.js in case of startup crash.
- create windows, tabs, frames;
- populate history, scroll position, forms, session cookies, session storage, etc.
It is common wisdom that Session Restore must have a large impact on Firefox startup. But before we could minimize this impact, we needed to measure it.
Benchmarking is not easy
When we first set foot on Session Restore territory, the contribution of that module to startup duration was uncharted. This was unsurprising, as this aspect of the Firefox performance effort was still quite young. To this day, we have not finished chartering startup or even Session Restore’s startup.
So how do we measure the impact of Session Restore on startup?
A first tool we use is Timeline Events, which let us determine how long it takes to reach a specific point of startup. Session Restore has had events `
sessionRestoreInitialized` and `
sessionRestored` for years. Unfortunately, these events did not tell us much about Session Restore itself.
The first serious attempt at measuring the impact of Session Restore on startup Performance was actually not due to the Performance team but rather to the metrics team. Indeed, data obtained through Firefox Health Report participants indicated that something wrong had happened.
d2` in the graph measures the duration between `
firstPaint` (which is the instant at which we start displaying content in our windows) and `
sessionRestored` (which is the instant at which we are satisfied that Session Restore has opened its first tab). While this measure is imperfect, the dip was worrying – indeed, it represented startups that lasted several seconds longer than usual.
Upon further investigation, we concluded that the performance regression was indeed due to Session Restore. While we had not planned to start optimizing the startup component of Session Restore, this battle was forced upon us. We had to recover from that regression and we had to start monitoring startup much better.
A second tool is Telemetry Histograms for measuring duration of individual operations, such as reading sessionstore.js or parsing it. We progressively added measures for most of the operations of Session Restore. While these measures are quite helpful, they are also unfortunately very unstable in real-world conditions, as they are affected both by scheduling (the operations are asynchronous), by the work load of the machine, by the actual contents of sessionstore.js, etc.
Difference in colors represent successive versions of Firefox. As we can see, this graph is quite noisy, certainly due to the factors mentioned above (the spikes don’t correspond to any meaningful change in Firefox or Session Restore). Also, we can see a considerable increase in the duration of the read operation. This was quite surprising for us, given that this increase corresponds to the introduction of a much faster, off the main thread, reading and decoding primitive. At the time, we were stymied by this change, which did not correspond to our experience. We have now concluded that by changing the asynchronous operation used to read the file, we have simply changed the scheduling, which makes the operation appear longer, while in practice it simply does not block the rest of the startup from taking place on another thread.
One major tool was missing for our arsenal: a stable benchmark, always executed on the same machine, with the same contents of sessionstore.js, and that would let us determine more exactly (almost daily, actually) the impact of our patches upon Session Restore:
This test, based on our Talos benchmark suite, has proved both to be very stable, and to react quickly to patches that affected its performance. It measures the duration between the instant at which we start initializing Session Restore (a new event `
sessionRestoreInit`) and the instant at which we start displaying the results (event `
With these measures at hand, we are now in a much better position to detect performance regressions (or improvements) to Session Restore startup, and to start actually working on optimizing it – we are now preparing to using this suite to experiment with “what if” situations to determine which levers would be most useful for such an optimization work.
Evolution of startup duration
Our first benchmark measures the time elapsed between start and stop of Session Restore if the user has requested all windows to be reopened automatically
As we can see, the performance on Linux 32 bits, Windows XP and Mac OS 10.6 is rather decreasing, while the performance on Linux 64 bits, Windows 7 and 8 and MacOS 10.8 is improving. Since the algorithm used by Session Restore upon startup is exactly the same for all platforms, and since “modern” platforms are speeding up while “old” platforms are slowing down, this suggests that the performance changes are not due to changes inside Session Restore. The origin of these changes is unclear. I suspect the influence of newer versions of the compilers or some of the external libraries we use, or perhaps new and improved (for some platforms) gfx.
Still, seeing the modern platforms speed up is good news. As of Firefox 31, any change we make that causes a slowdown of Session Restore will cause an immediate alert so that we can react immediately.
Our second benchmark measures the time elapsed if the user does not wish windows to be reopened automatically. We still need to read and parse sessionstore.js to find whether it is valid, so as to decide whether we can show the “Restore” button on about:home.
The influence of factors upon startup
With the help of our benchmarks, we were able to run “what if” scenarios to find out which of the data manipulated by Session Restore contributed to startup duration. We did this in a setting in which we restore windows:
and in a setting in which we do not:
Interestingly, increasing the size of sessionstore.js has apparently no influence on startup duration. Therefore, we do not need to optimize reading and parsing sessionstore.js. Similarly, optimizing history, cookies or form data would not gain us anything.
The single largest most expensive piece of data is the set of open windows – interestingly, this is the case even when we do not restore windows. More precisely, any optimization should target, by order of priority:
- the cost of opening/restoring windows;
- the cost of opening/restoring tabs;
- the cost of dealing with windows data, even when we do not restore them.
Now that we have information on which parts of Session Restore startup need to be optimized, the next step is to actually optimize them. Stay tuned!
April 2, 2014 § Leave a comment
Plot Our heroes set out for the first battle. Session Restore’s file I/O was clearly inefficient. Not only was it performing redundant operations, but also it was blocking the main thread doing so. The time had come to take it back. Little did our heroes know that the forces of Regression were lurking and that real battle would be fought long after the I/O had been rewritten and made non-blocking.
For historical reasons, some of Session Restore’s File I/O was quite inefficient. Reading and backing up were performed purely on the main thread, which could cause multi-second pauses in extreme cases, and 100ms+ pauses in common cases. Writing was done mostly off the main thread, but the underlying library used caused accidental main thread I/O, with the same effect, and disk flushing. Disk flushing is extremely inefficient on most operating systems and can quickly bring the whole system to its knees, so needs to be avoided.
In addition to performing main thread I/O and flushing, Session Restore’s I/O had several immediate weaknesses. One of the weaknesses was due to its crash detection mechanism, that required Session Restore to rewrite sessionstore.js immediately after startup, just to store a boolean indicating that we had not crashed. Recall that the largest sessionsstore.js known to this date weighs 150+Mb, and that 1Mb+ instances represented ~5% of our users. Rewriting all this data (and blocking startup while doing so) for a simple boolean flag was clearly unacceptable. We fixed this issue by separating the crash detection mechanism into its own module and ensuring that it only needed to write a few bytes. Another weakness was due to the backup code, which required a full (and inefficient) copy during startup, whereas a simple renaming would have been sufficient.
Having fixed all of this, we were happy. We were wrong.
Sadly, Telemetry archives do not reach back far enough to let me provide data confirming any speed improvement. Note for future perf developers including future self: backup your this data or blog immediately before The Cloud eats it.
As for measuring the effects of a flush, at the moment, we do not have a good way to do this, as the main impact is not on the process itself but on the whole system. The best we can do is measure the total number of flushes, but that doesn’t really help.
Full speed… backwards?
The first indication that something was wrong was a large increase in Telemetry measure SESSIONRESTORED, which measures the total amount of time between the launch of the browser and the moment Session Restore has completed initialization. After a short period of bafflement, we concluded that this increase was normal and was due to a change of initialization order – indeed, since OS.File I/O was executed off the main thread, the results of reading the sessionstore.js file could only be received once the main thread was idle and could receive messages from other threads. While this interpretation was partly correct, it masked a very real problem that we only detected much later. Additionally, during our refactorings, we changed the instant at which Session Restore initialization was executed, which muddled the waters even further.
The second indication arrived much later, when the Metrics team extracted Firefox Health Report data from released versions and got in touch with the Performance team to inform us of a large regression in firstPaint-to-sessionRestored time. For most of our users, Firefox was now taking more than 500ms more to load, which was very bad.
After some time spent understanding the data, attempting to reproduce the measure and bisecting to find out at which changeset the regression had taken place, as well as instrumenting code with additional performance probes, we finally concluded that the problem was due to our use I/O thread, the “SessionWorker”. More precisely, this thread was very slow to launch during startup. Digging deeper, we concluded that the problem was not in the code of the SessionWorker itself, but that the loading of the underlying thread was simply too slow. More precisely, loading was fine on a first run, but on second run, disk I/O contention between the resources required by the worker (the cache for the source code of SessionWorker and its dependencies) and the resources required by the rest of the browser (other source code, but also icons, translation files, etc) slowed down things considerably. Replacing the SessionWorker by a raw use of OS.File would not have improved the situation – ironically, just as the SessionWorker, our fast I/O library was loading slowly because of slow file I/O. Further measurements indicated that this slow loading could take up to 6 seconds in extreme cases, with an average of 340ms.
The patch using the new version of OS.File.read() has landed a few days ago. We are still in the process of trying to make sense of Telemetry numbers. While Telemetry indicates that the total time to read and decode the file has considerably increased, the total time between the start of the read and the time we finish startup seems to have decreased nicely by .5 seconds (75th percentile) to 4 seconds (95th percentile). We suspect that we are confronted to yet another case in which concurrency makes performance measurement more difficult.
We have not attempted to measure the duration of shutdown-time I/O at the moment.
Losing data or privacy
While we received no reports of bugs caused by this risk, we solved the issue by plugging Session Restore’s shutdown into AsyncShutdown.
Changing the back-end
One of our initial intuitions when starting with this work was that the back-end format used to store session data (large JSON file) was inefficient and needed to be changed. Before doing so, however, we instrumented the relevant code carefully. As it turns out, we could indeed gain some performance by improving the back-end format, but this would be a relatively small win in comparison with everything else that we have done.
We have several possible designs for a new back-end, but we have decided not to proceed for the time being, as there are still larger gains to be obtained with simpler changes. More on this in future blog entries.
Before setting out on this quest, we were already aware that performance refactorings were often more complex than they appeared. Our various misadventures have confirmed it. I strongly believe that, by changing I/O, we have improved the performance of Session Restore in many ways. Unfortunately, I cannot prove that we have improved runtime (because old data has disappeared), and we are still not certain that we have not regressed start-up.
If there are lessons to be learned, it is that:
- there is no performance work without performance measurements;
- once your code is sophisticated enough, measuring and understanding the results is much harder than improving performance.
On the upside, all this work has succeeded at:
- improving our performance measurements of many points of Session Restore;
- finding out weaknesses of ChromeWorkers and fixing some of these;
- finding out weaknesses of OS.File and fixing some of these;
- fixing Session Restore’s backup code that consumed resources and didn’t really do much useful;
- avoiding unnecessary performance refactorings where they would not have helped.
The work on improving Session Restore file I/O is still ongoing. For one thing, we are still waiting for confirmation that our latest round of optimizations does not cause unwanted regressions. Also, we are currently working on Talos benchmarks and Telemetry measurements to let us catch such regressions earlier.
This work has also spawned other works for other teams on improving the performance of ChromeWorkers’ startup and communication speed.
In the next episode
Drama. Explosions. Asynchronicity. Electrolysis. And more.