Is my data on the disk? Safety properties of OS.File.writeAtomic

February 5, 2014 § 1 Comment

If you have been writing front-end or add-on code recently, chances are that you have been using the OS.File library and, in particular, OS.File.writeAtomic to write files. (Note: if you have been writing files without using OS.File.writeAtomic, chances are that you are doing something wrong that will cause Firefox to jank – please don’t.) As the name implies, OS.File.writeAtomic makes efforts to write your data atomically, so as to ensure its survivability in case of a crash, power loss, etc.

However, you should not trust this function blindly, because it has its limitations. Let us take a look at exactly which guarantees writeAtomic provides.

Algorithm: just write

Snippet: OS.File.writeAtomic(path, data)

What it does

  1. reduce the size of the file at path to 0;
  2. send data to the operating system kernel for writing;
  3. close the file.

Worst case scenarios

  1. if the process crashes between 1. and 2. (a few microseconds), the full content of path may be lost;
  2. if the operating system crashes or the computer loses power suddenly before the kernel flushes its buffers (which may happen at any point up to 30 seconds after 1.), the full content of path may be lost;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path.

Performance: very good.
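
To make this concrete, here is a minimal usage sketch of this simplest form. It assumes privileged (chrome) code with access to osfile.jsm; the file name and data are purely illustrative.

    // Minimal sketch of the "just write" form; path and data are illustrative.
    Components.utils.import("resource://gre/modules/osfile.jsm");

    let path = OS.Path.join(OS.Constants.Path.profileDir, "mydata.json");
    let bytes = new TextEncoder().encode(JSON.stringify({ answer: 42 }));

    // No tmpPath, no flush: fast, but subject to the worst cases above.
    OS.File.writeAtomic(path, bytes).then(
      () => console.log("Data handed off to the kernel"),
      error => console.error("Write failed", error)
    );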

Algorithm: write and rename

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp" })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. rename tmpPath on top of path.

Worst case scenarios

  1. if the process crashes at any moment, nothing is lost, but a file tmpPath may be left on the disk;
  2. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing metadata (which may happen at any point after 1., typically up to 30 seconds), the full content of path may be lost;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path.

Performance: almost as good as Just Write.

Side-note: On the ext4fs file system, the kernel automatically adds a flush, which transparently transforms the safety properties of this operation into those of the algorithm detailed next.

Native equivalent: In XPCOM/C++, the mostly-equivalent solution is the atomic-file-output-stream.
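
As worst case 1 above notes, a crash may leave the temporary file behind. Here is a possible clean-up sketch, to be run for instance at startup; the paths are illustrative and this helper is not part of OS.File itself.

    // Sketch: discard a stale temporary file possibly left behind by a crash.
    // "path" is the real destination; ".tmp" matches the tmpPath used above.
    let tmpPath = path + ".tmp";
    OS.File.exists(tmpPath).then(exists => {
      if (exists) {
        // The leftover file may be incomplete; the authoritative data is
        // still at "path", so it is safe to discard the leftover.
        return OS.File.remove(tmpPath);
      }
    }).then(null, error => console.error("Could not clean up", tmpPath, error));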

Algorithm: write, flush and rename

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", flush: true })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. flush the writing of data to tmpPath;
  5. rename tmpPath on top of path.

Worst case scenarios

  1. if the process crashes at any moment, nothing is lost, but a file tmpPath may be left on the disk;
  2. if the operating system crashes, nothing is lost, but a file tmpPath may be left on the disk;
  3. if the computer loses power suddenly while the hard drive is flushing its internal hardware buffers (which is very hard to predict), nothing is lost, but an incomplete file tmpPath may be left on the disk.

Performance: some operating systems (Windows) or file systems (ext3fs) cannot flush a single file and instead need to flush all the files on the device, which considerably slows down the whole operating system. On some others (ext4fs), this operation is essentially free. On some versions of MacOS X, flushing actually doesn’t do anything.

Native equivalent: In XPCOM/C++, the mostly-equivalent solution is the safe-file-output-stream.
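
Since the flush can be expensive on some platforms, one option is to reserve it for the writes whose loss would really hurt. Here is a sketch of that pattern; the save helper and its isCritical flag are illustrative names, not part of OS.File.

    // Sketch: pay for the flush only when the data really matters.
    // ("save" and "isCritical" are illustrative, not OS.File API.)
    Components.utils.import("resource://gre/modules/osfile.jsm");

    function save(path, object, isCritical) {
      let bytes = new TextEncoder().encode(JSON.stringify(object));
      return OS.File.writeAtomic(path, bytes, {
        tmpPath: path + ".tmp",
        flush: isCritical === true
      });
    }

    // Routine UI state: write and rename, no flush.
    save(OS.Path.join(OS.Constants.Path.profileDir, "ui-state.json"),
         { sidebarOpen: true }, false);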

Algorithm: write, backup, rename

(not landed yet)

Snippet: OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp", backupTo: path + ".backup" })

What it does

  1. create a new file at tmpPath;
  2. send data to the operating system kernel for writing to tmpPath;
  3. close the file;
  4. rename the file at path to backupTo;
  5. rename the file at tmpPath on top of path.

Worst case scenarios

  1. if the process crashes between 4. and 5., the file at path may be lost and backupTo should be used instead for recovery;
  2. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing metadata (which may happen at any point after 1., typically up to 30 seconds), the file at path may be empty and backupTo should be used instead for recovery;
  3. if the operating system crashes or the computer loses power suddenly while the operating system kernel is flushing (which may happen at any point after 1., typically up to 30 seconds), and if your data is larger than one sector (typically 32kb), data may be written incompletely, resulting in a corrupted file at path; backupTo should be used instead for recovery.

Performance: almost as good as Write and Rename.
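
Once this variant lands, the read side will need a matching recovery path. Here is a sketch of what that could look like; readJSON, readJSONWithFallback and the paths are hypothetical helpers, not part of OS.File.

    // Sketch: read a JSON file, falling back to the backup written by
    // writeAtomic if the main file is missing or corrupted.
    // (All helper names are illustrative.)
    function readJSON(path) {
      let decoder = new TextDecoder();
      return OS.File.read(path).then(bytes => JSON.parse(decoder.decode(bytes)));
    }

    function readJSONWithFallback(path, backupPath) {
      // A rejection here covers both a missing file and a JSON parse error.
      return readJSON(path).then(null, () => readJSON(backupPath));
    }

    readJSONWithFallback(path, path + ".backup").then(
      data => console.log("Recovered", data),
      error => console.error("Neither the file nor its backup is usable", error)
    );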

JavaScript, this static language (part 1)

October 20, 2011 § 7 Comments

tl;dr

JavaScript is a dynamic language. However, by borrowing a few pages from static languages – and a few existing tools – we can considerably improve reliability and maintainability.

« Writing one million lines of code of JavaScript is simply impossible »

(source: speaker in a recent open-source conference)

JavaScript is a dynamic language – a very dynamic one, in which programs can rewrite themselves, objects may lose or gain methods through side-effects on themselves or on their prototypes, and, more generally, nothing is fixed.

And dynamic languages are fun. They make writing code simple and fast. They are vastly more suited to prototyping than static languages. Dynamism also makes it possible to write extremely powerful tools that can perform JIT translation from other syntaxes, add missing features to existing classes and functions and more generally fully customize the experience of the developer.

Unfortunately, such dynamism comes with severe drawbacks. Safety-minded developers will tell you that, because of this dynamism, they simply cannot trust any snippet, as this snippet may behave in a manner that does not match its source code. They will conclude that you cannot write safe, or even modular, applications in JavaScript.

Many engineering-minded developers will also tell you that they simply cannot work in JavaScript, and they will not have much difficulty finding examples of situations in which the use of a dynamic language in a complex project can, effectively, kill the project. If you do not believe them, consider a large codebase, and the (rather common) case of a large transversal refactoring, for instance to replace an obsolete API with a newer one. Do this in Java (or, even better, in a more modern mostly-static language such as OCaml, Haskell, F# or Scala), and you can use the compiler to automatically and immediately spot any place where the API has not been updated; it will also spot a number of errors that you may have made during the refactoring. Even better, if the API was designed to be safe-by-design, the compiler will automatically spot even complex errors, including calling functions/methods in the wrong order, or ownership errors. Do the same in JavaScript and, while your code will be written faster, you should expect to be hunting bugs weeks or even months later.

I know that the Python community has considerably suffered from such problems during version transitions. I am less familiar with the world of PHP, but I believe it is no accident that Facebook is progressively arming itself with PHP static analysis tools. I also believe that it is no accident that Google is now introducing a typed language as a candidate replacement for JavaScript.

Today it is JavaScript’s turn; if not today, then surely tomorrow. I have seen applications consisting of hundreds of thousands of lines of JavaScript. And as if just maintaining these applications were not difficult enough, the rapid release cycles of both Mozilla and Chrome mean that external and internal APIs now change every six weeks. This means breakage. And, more precisely, this means that we need new tools to help us predict breakages and to help developers (both add-on developers and browser contributors) react before these breakages hit their users.

So let’s do something about it. Let’s make our JavaScript a strongly, statically typed language!

Or let’s do something a little smarter.

JavaScript, with discipline

At this point, I would like to ask readers to please kindly stop preparing tar and feathers for me. I realize fully that JavaScript is a dynamic language and that turning it into a static language will certainly result in something quite disagreeable to use. Something that is verbose, has lost most of the power of JavaScript, and gained no safety guarantees.

Trust me on this, there is a way to obtain the best of both worlds, without sacrificing anything. Before discussing the manner in which we can attain this, let us first set objectives that we can hope to achieve with a type-disciplined JavaScript.

Finding errors

The main benefit of strong, static typing, is that it helps find errors.

  • Even the simplest analyses can find all syntax errors, all unbound variables, all variables bound several times and consequently almost all scoping errors, which can already save considerable time for developers. Such an analysis requires no human intervention from the developer besides, of course, fixing any error that has been thus detected. As a bonus, in most cases, the analysis can suggest fixes.
  • Similarly trivial forms of analysis can also detect suspicious uses of break or continue, weird uses of switch(), suspicious accesses to private fields of objects, as well as suspicious occurrences of eval – in my book, eval is always suspicious.
  • Slightly more sophisticated analyses can find most occurrences of functions or methods invoked with the wrong number of arguments. Again, this is without human intervention. With type annotations/documentation, we can move from most occurrences to all occurrences.
  • This same analysis, when applied to public APIs, can provide developers with more information regarding how their code can be (mis)used.
  • At the same level of complexity, analysis can find most erroneous accesses to fields/methods, suspicious array traversals, suspicious calls to iterators/generators, etc. Again, with type annotations/documentation, we can move from most to all.
  • Going a little further in complexity, analysis can find fragile uses of this, uncaught exceptions, etc.
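
To make this concrete, here is a deliberately contrived snippet (every name in it is invented) containing the kinds of problems that the analyses above can flag without any help from the developer:

    // All names below are invented for illustration.
    function computeTotal(prices, taxRate) {
      var total = 0;
      for (var i = 0; i < prices.length; i++) {
        total += prices[i];
      }
      return total * (1 + taxRaet);  // unbound variable: typo for "taxRate"
    }

    var items = [1.5, 2.5];
    computeTotal(items);             // invoked with the wrong number of arguments

    switch (items.length > 0) {      // weird use of switch() on a boolean
      case true:
        break;
    }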

Types as documentation

Public APIs must be documented. This is true in any language, no matter where it stands on the static/dynamic scale. In static languages, one may observe how documentation generation tools insert type information, either from annotations provided by the user (as in Java/JavaDoc) or from type information inferred by the compiler (as in OCaml/OCamlDoc). But look at the documentation of Python, Erlang or JavaScript libraries and you will find the exact same information, either clearly labelled or hidden somewhere in the prose: every single value/function/method comes with a form of type signature, whether formal or informal.

In other words, type information is a critical piece of documentation. If JavaScript developers provide explicit type annotations along with their public APIs, they have not wasted time: they have simply written part of the documentation in advance. Even better, if such types can be inferred automatically from the source code, this piece of documentation can be written automatically by the type-checker.
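
As an illustration, here is a hypothetical function annotated with the JSDoc-style vocabulary understood by tools such as Google’s Closure Compiler (more on it below). The annotations double as documentation for human readers and as input for the type-checker:

    /**
     * Return the index of the first tab whose URL matches |pattern|,
     * or -1 if there is none. (Hypothetical function, for illustration.)
     *
     * @param {!Array.<{url: string}>} tabs Open tabs.
     * @param {!RegExp} pattern Pattern to look for.
     * @return {number} Index of the first match, or -1.
     */
    function findTab(tabs, pattern) {
      for (var i = 0; i < tabs.length; i++) {
        if (pattern.test(tabs[i].url)) {
          return i;
        }
      }
      return -1;
    }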

Types as QA metric

While disciples of type-checking tend to consider typing as something boolean, the truth is more subtle: it is quite possible that one piece of code does not pass type-checking while the rest of the code does. Indeed, with advanced type systems that do not support decidable type inference, this is only to be expected.

The direct consequence is that type-checking can be seen as a spectrum of quality. A codebase can be considered as failing if the static checking phase detects evident errors, typically unbound values or out-of-scope break, continue, etc. Otherwise, every attempt to type a value that results in a type error is a hint of poor QA practice that can be reported to the developer. This yields a percentage of values that can be typed – obtain 100% and get a QA stamp of approval for this specific metric.

Typed JavaScript, in practice

Most of what the previous paragraphs describe is already possible in practice, with existing tools. Indeed, I have personal experience using JavaScript static type checking as a bug-finding tool and a QA metric. On the first day, this technique helped me find plenty of dead code as well as 750+ errors, with only a dozen false positives.

For this purpose, I have used Google’s Closure Compiler. This tool detects errors, supports a simple vocabulary for documentation/annotations, fails only if very clear errors are detected (typically syntax errors) and provides as metric a percentage of well-typed code. It does not accept JavaScript 1.7 yet, unfortunately, but this can certainly be added.

I also know of existing academic work to provide static type-checking for JavaScript, although I am unsure as to the maturity of such works.

Finally, Mozilla is currently working on a different type inference mechanism for JavaScript. While this mechanism is not primarily aimed at finding errors, my personal intuition is that it may be possible to repurpose it.

What’s next?

I hope that I have convinced you of the interest of investigating manners of introducing static, strong type-checking to JavaScript. In a second part, I will detail how and where I believe that this can be done in Mozilla.

Next performance: OWASP 2010

June 22, 2010 § Leave a comment

I haven’t had much time to update this blog in the past few months. Well, the good news is that all this time — mostly spent on OPA — is starting to pay off. I’m starting to like our OPA platform quite a lot. Our next release, OPA S3, is shaping up to be absolutely great.

I’m now on my way to OWASP AppSec Research 2010, where I’ll present some of the core design of OPA. Normally, my slides will be made public after the talk, so I’ll try and link them here as soon as I return.

In the meantime, if you’re curious about OPA, I’m starring in a few Dailymotion tutorial slideshows :)

An IRC channel for OPA

January 26, 2010 § Leave a comment

Just a short entry to inform you that we now have an IRC channel for general discussion about OPA. It’s on Freenode and it’s called, well, #opa.

We call it OPA

November 28, 2009 § 7 Comments

Web applications are nice. They’re useful, they’re cross-platform, and users need no installation, no upgrades, no maintenance, not even the computing or storage power to which they are accustomed. As weird as it may sound, I’ve even seen announcements for web applications supposed to run your games on remote high-end computers so that you can actually play on low-end computers. Go web applications!

Of course, there are a few downsides to web applications. Firstly, they require a web connection. Secondly, they are largely composed of plumbing. Finally, ensuring their security is a constant fight.


Firefox, extensions and security

September 22, 2008 § 1 Comment

About a year ago, one of my colleagues was working on a communication system for emergency services. This system, designed to be used during natural disasters or major terrorist attacks, was meant to allow better coordination between the police, firefighters and rescue teams. Oh, and in case it actually turned out to be a terrorist attack, the system had to be secure. The conclusion of the colleague in question: “since Firefox is secure, I will write this as a Firefox extension, and it will be secure.”

At this point, I hope you are already quietly making fun of my colleague. That said, do you actually know what risks you are taking when you write an extension? Do you know what risks you are making the user take? And do you know how far you can count on the Mozilla platform to help you write secure extensions or applications?

When I get the chance, I will try to write a detailed article on the subject. In the meantime, for those who are asking themselves these questions, here are the slides (in French) of the talk I gave last Saturday at the Mozilla Add-Ons Workshop in Paris.

Oh, one hint: be afraid.

What’s up with Batteries?

August 27, 2008 § 7 Comments

A few days ago, someone asked me whether there was more to OCaml Batteries Included than Lazy Lists. The answer is a definite yes. If you have looked at the repository recently, you may have seen that there is plenty more waiting for testing and for a release.

Let’s take a brief look at what is done and what is left to do.


Extrapol and Korset

June 19, 2008 § Leave a comment

A colleague recently pointed me towards Korset, a program developed by Ohad Ben-Cohen and Avishai Wool promising features comparable to those of Extrapol. While I must admit I’m slightly skeptical about the promise of “provable zero false alarm” — since the problem is undecidable, people usually tend to develop “provably complete” rather than “provably sound” analyses — it sounds like an interesting development.

Now, from what I understand, Korset will be presented at Black Hat in a few months, and the rules of the conference forbid the developers from giving away any details. Until then, we have no way of comparing the unreleased Extrapol with the equally unreleased Korset.

Note: the tarball for the first prototype of Extrapol is waiting on my hard drive for release clearance. I hope I’ll be able to release it next Tuesday or Wednesday. Stay tuned.

Extrapol source code available (not)

June 17, 2008 § Leave a comment

A quick note to inform you that the repository for Extrapol is now public. The source code, as available on the repository, does not have a licence yet and will not compile as-is, due to dependencies on libraries available elsewhere. Stay tuned for an actual release.

Update: Sorry, the repository has been cut off by the administrator. I’ll inform you when the sources are back.


MLS for Thunderbird or “O gosh, perhaps I shouldn’t have sent confidential info to a public mailing-list”

November 1, 2007 § 4 Comments

This entry is a brief presentation of work in progress by a group of my students at ENSI de Bourges, Vincent Tarbouriech and Roland Thaisong.

The problem

If your daily work is to deal with sensitive subjects, whether in a laboratory, in industry, in an administration or in a newspaper, chances are that your computer hosts a fair amount of confidential information. By definition, if someone who doesn’t have the necessary credentials gets their hands on this information, you’re in trouble.

As long as the information remains on your computer, your computer uses a reasonably safe operating system, you’re behind a reasonably safe firewall and you don’t need to communicate any of it to anyone, you should be reasonably safe from malicious third parties. However, if e-mail happens to be one of your work tools and if you need to send some of that information to some people but not to others, accidental leaks become a definite possibility.

