September 19, 2016 § Leave a comment
You can find my new blog on github. Still rough around the edges, but I’m planning to improve this as I go.
February 17, 2016 § Leave a comment
One of these days, using the Cloud of OpaqueCompany™, I will be able to set the colour of my lightbulbs by talking to my TV. Somewhere along the way, my house will become a little bit more energy hungry and a little bit more dependent on the Cloud of OpaqueCompany™. That’s the promise of the Internet of Things. Isn’t that neat? Isn’t that exciting?
Not really. At least, not for me. But, for some reason, whenever I read about that Internet of Things, it is about expensive gadgets that, to me, sound like Christmas commercials: marginally useful, designed by marketers for spoilt westerners to be consumed then forgotten before the next Christmas shopping spree.
But this doesn’t have to be.
I have spent a little time scratching the surface and trying to determine whether there was something more to this Internet of Things, besides the shopping list. I came back convinced that, once you forget the marketing, this Internet of Things can become a revolution as big as the Personal Computer or the World Wide Web – at least if we let it fall into the right hands.
Say you are the owner or manager of a small business, say a restaurant. Chances are that you need a burglar alarm, either because you fear that you are going to be burglarised, or because your insurance requires one. You have two solutions. Either you go to a store and buy some off-the-shelf product, or you contract a company, draw up a list of requirements and pay for a custom setup. In either case, you are a consumer, and you are stuck with what you paid for. But needs change. Perhaps the insurance policy now requires you to have an alarm that can call the police automatically. Perhaps neighbours complained about the noise of the alarm and you need to turn it into a silent alarm that rings your cellphone. Perhaps the insurer has changed its policy and now requires you to take pictures of the burglary. Perhaps you have had work done and the small window in the bathroom is now large enough that it could be used to break in. Or water damage has destroyed one of your sensors and you need to replace it, but the model doesn’t exist anymore. Or you are tired of triggering the alarm when you take out the garbage and need to refine the policy. Or your product was linked to a subscription, to call the police on your behalf, but the provider has stopped this service. In any of these cases, you are probably stuck, because your needs have made you a consumer and you are served only as long as there is a market for your specific need.
Now, consider an alternate universe, in which you just need to walk or drive to the nearest store, buy a few off-the-shelf motion detectors, for the price of a few dollars and simply attach them in your restaurant, where you see fit. They use open standards, so you can install an app to get them to work together, or even better, use your cellphone to script them visually into doing what you need. Do you need to add one or ten, or replace them with different models, or add door-lock sensors? It’s just as easy. Do you need to add a camera? Well, place it and use your cellphone to add that camera to your script. Use your cellphone again and customise the effect, to call the police, or ring your cellphone, or deactivate a single alarm between 11pm and 11.30pm, because that’s when you take out the trash. And if your product is linked to a subscription, because it uses open standards, you can switch provider as needed. In this universe, the Internet of Things has put you in control – not a Cloud, not a silo – and drastically cut your costs and your dependencies.
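To make the "deactivate a single alarm between 11pm and 11.30pm" idea concrete, here is a minimal sketch of what such a rule could boil down to once the visual script is translated into code. Everything in it is an assumption made for illustration: the sensor names, the `should_raise_alarm` function and the fixed mute window do not come from any real product or standard.

```python
from datetime import time

# Hypothetical mute window: the half hour during which the trash goes out.
MUTE_START = time(23, 0)
MUTE_END = time(23, 30)


def should_raise_alarm(sensor_id: str, triggered_at: time, muted_sensors: set[str]) -> bool:
    """Return False only when a muted sensor fires inside the mute window."""
    in_window = MUTE_START <= triggered_at < MUTE_END
    return not (sensor_id in muted_sensors and in_window)


# The back door is muted while the trash is taken out; every other
# sensor keeps working during the same half hour.
muted = {"back-door"}
back_door_at_2310 = should_raise_alarm("back-door", time(23, 10), muted)
front_door_at_2310 = should_raise_alarm("front-door", time(23, 10), muted)
```

The point of the sketch is that the rule belongs to the owner: changing the window or adding a sensor is a one-line edit rather than a new contract.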
A few months ago, Mozilla started pivoting from smartphones to the Web of Things – that’s the name we give to the Internet of Things done right, with open standards and you in charge, rather than silos and the Opaque Cloud™. I can make no promise that we are going to succeed, but I believe in the huge potential of this Web of Things.
By the way, it doesn’t stop at restaurants. The exact same open standards can help you guard against fires in your house or humidity in your server room. They can help crowdsource flood detection in cities exposed to flash floods or automate experiments in a physics lab. They can watch your heartbeat or listen to your snores. They can determine which part of the village farm needs to be irrigated first or which part of the sewers needs the most attention.
Some of these problems already have commercial solutions. But what about your next problem, the one that hasn’t attracted the attention of any company large enough to produce devices specifically for you?
Here is to the Web of Things. Let’s make sure that it falls into the right hands.
November 6, 2015 § Leave a comment
In part 1, we discussed the design of time measurement within the Firefox Performance Monitor. Despite the intuition, the Performance Monitor had neither the same set of objectives as the Gecko Profiler, nor the same set of constraints, and we ended up picking a design that was not a sampling profiler. In particular, instead of capturing performance data on stacks, the Monitor captures performance data on Groups, a notion that we have not discussed yet. In this part, we will focus on bridging the gap between our low-level instrumentation and actual add-ons and webpages, as may be seen by the user.
Designing the Firefox Performance Stats Monitor, part 1: Measuring time without killing battery or performance
October 27, 2015 § Leave a comment
For a few versions, Firefox Nightly has been monitoring the performance of add-ons, thanks to the Performance Stats API. While we are waiting for the green light to let it graduate to Firefox Aurora, investigating a few lingering false positives, and watching v2 approach steadily, it is time for a brain dump on this toolbox and its design.
The initial objective of this monitor is to be able to flag both add-ons and webpages that cause noticeable slowdowns, so as to let users disable/close whatever is making their use of Firefox miserable. We also envision more advanced uses that could let us find out if features of webpages cause slowdowns on specific OS/hardware combinations.
September 30, 2014 § 6 Comments
September is ending, and with it Q3 of 2014. It’s time for a brief report, so here is what happened during the summer.
After ~18 months working on Session Restore, I am progressively switching away from that topic. Most of the main performance issues that we set out to solve have been solved already, we have considerably improved safety, cleaned up lots of the code, and added plenty of measurements.
During this quarter, I have been working on various attempts to optimize both loading speed and saving speed. Unfortunately, both efforts were delayed by external factors and postponed to an as yet undetermined date. I have also been hard at work on trying to pin down performance regressions (which turned out to be external to Session Restore) and safety bugs (which were eventually found and fixed by Tim Taubert).
In the next quarter, I plan to work on Session Restore only in a support role, for the purpose of reviewing and mentoring.
Also, a rant
The work on Session Restore has relied heavily on collaboration between the Perf team and the FxTeam. Unfortunately, the resources were not always available to make this collaboration work. I imagine that the FxTeam is spread too thin across too many tasks, with too many fires to fight. Regardless, the symptom I experienced is that during the course of this work, low-priority, high-priority and safety-critical patches alike have been left to rot without reviews, despite my repeated requests, for 6, 8 or 10 weeks, much to the dismay of everyone involved. This means man-months of work thrown to /dev/null, along with quarterly objectives, morale, opportunities, contributors and good ideas.
I will try and blog about this, eventually. But please, in the future, everyone: remember that in the long run, the priority of getting reviews done (or explaining that you’re not going to) is quite a bit higher than the priority of writing code.
Many improvements to Async Tooling landed during Q3. We now have the PromiseWorker, which considerably simplifies communication between the main thread and workers, for both Firefox and add-on developers. I hear that the first add-on to make use of this new feature is currently being developed. New features, bugfixes and optimizations landed for OS.File. We have also landed the ability to watch for changes in a directory (under Windows only, for the time being).
Sadly, my work on interactions between Promise and the Test Suite is currently blocked until the DevTools team manages to get all the uncaught asynchronous errors under control. It’s hard work, and I can understand that it is not a high priority for them, so in Q4, I will try to find a way to land my work and activate it only for a subset of the mochitest suites.
I have recently joined the newly restarted effort to improve the performance of Places, the subsystem that handles our bookmarks, history, etc. For the moment, I am still getting warmed up, but I expect that most of my work during Q4 will be related to Places.
As it turns out, we had many crashes during asynchronous shutdown, a few of them safety-critical. At the time, we did not have the necessary tools to prioritize our efforts or to find out whether our patches had effectively fixed bugs, so I built a dashboard to extract and display the relevant information on such crashes. This proved a wise investment, as we spent plenty of time fighting AsyncShutdown-related fires using this dashboard.
In addition to the “clean shutdown” mechanism provided by AsyncShutdown, we also now have the Shutdown Terminator. This is a watchdog subsystem, launched during shutdown, and it ensures that, no matter what, Firefox always eventually shuts down. I am waiting for data from our Crash Scene Investigators to tell us how often we need this watchdog in practice.
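The watchdog idea is simple enough to sketch in a few lines. The following is not the Shutdown Terminator itself, only a minimal Python illustration of the pattern, with a hypothetical `ShutdownWatchdog` class and a caller-supplied `force_exit` callback: arm a timer when shutdown starts, and if the clean shutdown has not finished by the deadline, terminate anyway.

```python
import threading


class ShutdownWatchdog:
    """Call `force_exit` unless `disarm()` is called within `timeout_s` seconds."""

    def __init__(self, timeout_s: float, force_exit):
        # threading.Timer runs `force_exit` on its own thread after the delay.
        self._timer = threading.Timer(timeout_s, force_exit)
        self._timer.daemon = True

    def arm(self) -> None:
        """Start the countdown; call this when shutdown begins."""
        self._timer.start()

    def disarm(self) -> None:
        """Cancel the countdown; call this when clean shutdown has completed."""
        self._timer.cancel()
```

In a real process, `force_exit` would be something drastic such as `os._exit`, since the whole point is that the regular shutdown path can no longer be trusted to finish.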
I lost track of how many code contributors I interacted with during the quarter, but that represents hundreds of e-mails, as well as countless hours on IRC and Bugzilla, and a few hours on ask.mozilla.org. This year’s mozEdu teaching is also looking good.
We also launched FirefoxOS in France, with big success. I found myself in a supermarket, presenting the ZTE Open C and the activities of Mozilla to the crowds, and this was a pleasing experience.
For Q4, expect more mozEdu, more mentoring, and more sleepless hours helping contributors debug their patches 🙂
July 17, 2014 § 4 Comments
Plot
For the second time, our heroes prepared for battle. The startup of Firefox was too slow and Session Restore was one of the battlefields.
When Firefox starts, Session Restore is in charge of restoring the browser to its previous state, in case of a crash, a restart, or for the users who have configured Firefox to resume from its previous state. This entails numerous activities during startup:
- read sessionstore.js from disk, decode it and parse it (recall that the file is potentially several MB in size), handling errors;
- back up sessionstore.js in case of startup crash;
- create windows, tabs, frames;
- populate history, scroll position, forms, session cookies, session storage, etc.
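The first two steps of this list can be sketched as follows. This is an illustration in Python, not the actual Session Restore code; the function name `load_session`, the backup path, and the choice to back up only after a successful parse are all assumptions made for the example.

```python
import json
import shutil
from pathlib import Path


def load_session(path: Path, backup: Path):
    """Read and parse the session file; return None if it is missing or corrupt."""
    try:
        raw = path.read_text(encoding="utf-8")  # read and decode
        state = json.loads(raw)                 # parse
    except (OSError, ValueError):
        # Missing file, unreadable file, bad encoding or invalid JSON:
        # start with an empty session instead of failing startup.
        return None
    # Keep a copy of a file known to parse, in case startup crashes later.
    shutil.copyfile(path, backup)
    return state
```

Creating windows and populating them with history, form data and cookies would then walk the returned structure, which is where the bulk of the startup work described below takes place.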
It is common wisdom that Session Restore must have a large impact on Firefox startup. But before we could minimize this impact, we needed to measure it.
Benchmarking is not easy
When we first set foot on Session Restore territory, the contribution of that module to startup duration was uncharted. This was unsurprising, as this aspect of the Firefox performance effort was still quite young. To this day, we have not finished charting startup or even Session Restore’s startup.
So how do we measure the impact of Session Restore on startup?
A first tool we use is Timeline Events, which let us determine how long it takes to reach a specific point of startup. Session Restore has had events `sessionRestoreInitialized` and `sessionRestored` for years. Unfortunately, these events did not tell us much about Session Restore itself.
The first serious attempt at measuring the impact of Session Restore on startup performance was actually not due to the Performance team but rather to the metrics team. Indeed, data obtained through Firefox Health Report participants indicated that something had gone wrong.
`d2` in the graph measures the duration between `firstPaint` (which is the instant at which we start displaying content in our windows) and `sessionRestored` (which is the instant at which we are satisfied that Session Restore has opened its first tab). While this measure is imperfect, the dip was worrying – indeed, it represented startups that lasted several seconds longer than usual.
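In other words, `d2` is just the difference between two timestamps on the startup timeline. A minimal sketch, assuming the events are available as a mapping from event name to milliseconds since process start (the helper name and the numbers are made up for the example):

```python
def duration_between(events: dict[str, float], start: str, end: str) -> float:
    """Duration in milliseconds between two named timeline events."""
    return events[end] - events[start]


# Made-up timestamps, in milliseconds since process start.
events = {"firstPaint": 1200.0, "sessionRestored": 1850.0}
d2 = duration_between(events, "firstPaint", "sessionRestored")
```

Because `firstPaint` itself depends on everything that happens before it, a change in `d2` does not by itself prove that Session Restore is the culprit, which is why the measure is imperfect.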
Upon further investigation, we concluded that the performance regression was indeed due to Session Restore. While we had not planned to start optimizing the startup component of Session Restore, this battle was forced upon us. We had to recover from that regression and we had to start monitoring startup much better.
A second tool is Telemetry Histograms for measuring the duration of individual operations, such as reading sessionstore.js or parsing it. We progressively added measures for most of the operations of Session Restore. While these measures are quite helpful, they are also unfortunately very unstable in real-world conditions, as they are affected by scheduling (the operations are asynchronous), by the workload of the machine, by the actual contents of sessionstore.js, etc.
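For readers who have never looked at this kind of instrumentation, the principle is to time each operation and count how many samples fall into each duration bucket. The sketch below is a generic Python illustration, not the Telemetry API: the bucket bounds, the `Histogram` class and the `timed` helper are assumptions chosen for the example.

```python
import time
from bisect import bisect_right
from contextlib import contextmanager

# Hypothetical lower bounds of the buckets, in milliseconds.
BUCKETS_MS = [0, 1, 5, 10, 50, 100, 500, 1000]


def bucket_index(duration_ms: float) -> int:
    """Index of the last bucket whose lower bound is <= duration_ms."""
    return max(0, bisect_right(BUCKETS_MS, duration_ms) - 1)


class Histogram:
    def __init__(self) -> None:
        self.counts = [0] * len(BUCKETS_MS)

    def add(self, duration_ms: float) -> None:
        self.counts[bucket_index(duration_ms)] += 1


@contextmanager
def timed(histogram: Histogram):
    """Record the wall-clock duration of the enclosed block into `histogram`."""
    start = time.monotonic()
    try:
        yield
    finally:
        histogram.add((time.monotonic() - start) * 1000.0)
```

Note that `timed` measures wall-clock time between start and end, so for an asynchronous operation it also counts the time spent waiting to be scheduled, which is one source of the instability mentioned above.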
Differences in color represent successive versions of Firefox. As we can see, this graph is quite noisy, certainly due to the factors mentioned above (the spikes don’t correspond to any meaningful change in Firefox or Session Restore). Also, we can see a considerable increase in the duration of the read operation. This was quite surprising for us, given that this increase corresponds to the introduction of a much faster, off-main-thread reading and decoding primitive. At the time, we were stymied by this change, which did not correspond to our experience. We have now concluded that by changing the asynchronous operation used to read the file, we have simply changed the scheduling, which makes the operation appear longer, while in practice it simply does not block the rest of the startup from taking place on another thread.
One major tool was missing from our arsenal: a stable benchmark, always executed on the same machine, with the same contents of sessionstore.js, and that would let us determine more exactly (almost daily, actually) the impact of our patches upon Session Restore:
This test, based on our Talos benchmark suite, has proved both to be very stable, and to react quickly to patches that affected its performance. It measures the duration between the instant at which we start initializing Session Restore (a new event `sessionRestoreInit`) and the instant at which we start displaying the results (event `
With these measures at hand, we are now in a much better position to detect performance regressions (or improvements) to Session Restore startup, and to start actually working on optimizing it – we are now preparing to use this suite to experiment with “what if” situations to determine which levers would be most useful for such an optimization work.
Evolution of startup duration
Our first benchmark measures the time elapsed between start and stop of Session Restore if the user has requested all windows to be reopened automatically.
As we can see, performance on Linux 32 bits, Windows XP and Mac OS 10.6 is degrading, while performance on Linux 64 bits, Windows 7 and 8 and Mac OS 10.8 is improving. Since the algorithm used by Session Restore upon startup is exactly the same for all platforms, and since “modern” platforms are speeding up while “old” platforms are slowing down, this suggests that the performance changes are not due to changes inside Session Restore. The origin of these changes is unclear. I suspect the influence of newer versions of the compilers or some of the external libraries we use, or perhaps new and improved (for some platforms) gfx.
Still, seeing the modern platforms speed up is good news. As of Firefox 31, any change we make that causes a slowdown of Session Restore will cause an immediate alert so that we can react immediately.
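How the alerts are actually computed is outside the scope of this post, but the general idea of a regression alert on a stable benchmark can be sketched in a few lines. The function name and the 5% tolerance below are assumptions chosen for the example, not the rule used by our infrastructure.

```python
from statistics import mean


def is_regression(baseline_ms: list[float], recent_ms: list[float], tolerance: float = 0.05) -> bool:
    """Flag a slowdown when the recent mean exceeds the baseline mean by more than `tolerance`."""
    return mean(recent_ms) > mean(baseline_ms) * (1.0 + tolerance)


# Made-up benchmark durations, in milliseconds.
slower = is_regression([100.0, 102.0, 98.0], [120.0, 118.0, 121.0])
steady = is_regression([100.0, 102.0, 98.0], [101.0, 99.0, 100.0])
```

A check this naive only works because the benchmark is stable; on the noisy real-world histograms discussed earlier, it would fire constantly.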
Our second benchmark measures the time elapsed if the user does not wish windows to be reopened automatically. We still need to read and parse sessionstore.js to find whether it is valid, so as to decide whether we can show the “Restore” button on about:home.
The influence of factors upon startup
With the help of our benchmarks, we were able to run “what if” scenarios to find out which of the data manipulated by Session Restore contributed to startup duration. We did this in a setting in which we restore windows:
and in a setting in which we do not:
Interestingly, increasing the size of sessionstore.js has apparently no influence on startup duration. Therefore, we do not need to optimize reading and parsing sessionstore.js. Similarly, optimizing history, cookies or form data would not gain us anything.
The single most expensive piece of data is the set of open windows – interestingly, this is the case even when we do not restore windows. More precisely, any optimization should target, in order of priority:
- the cost of opening/restoring windows;
- the cost of opening/restoring tabs;
- the cost of dealing with windows data, even when we do not restore them.
Now that we have information on which parts of Session Restore startup need to be optimized, the next step is to actually optimize them. Stay tuned!