OCaml | Il y a du thé renversé au bord de la table

JavaScript, this static language (part 1)

October 20, 2011 § 7 Comments

tl;dr

JavaScript is a dynamic language. However, by borrowing a few pages from static languages – and a few existing tools – we can considerable improve reliability and maintainability.

« Writing one million lines of code of JavaScript is simply impossible »

(source: speaker in a recent open-source conference)

JavaScript is a dynamic language – a very dynamic one, in which programs can rewrite themselves, objects may lose or gain methods through side-effects on themselves on on their prototypes, and, more generally, nothing is fixed.

And dynamic languages are fun. They make writing code simple and fast. They are vastly more suited to prototyping than static languages. Dynamism also makes it possible to write extremely powerful tools that can perform JIT translation from other syntaxes, add missing features to existing classes and functions and more generally fully customize the experience of the developer.

Unfortunately, such dynamism comes with severe drawbacks. Safety-minded developers will tell you that, because of this dynamism, they simply cannot trust any snippet, as this snippet may behave in a manner that does not match its source code. They will conclude that you cannot write safe, or even modular, applications in JavaScript.

Many engineering-minded developers will also tell you that they simply cannot work in JavaScript, and they will not have much difficulty finding examples of situations in which the use of a dynamic language in a complex project can, effectively, kill the project. If you do not believe them, consider a large codebase, and the (rather common) case of a large transversal refactoring, for instance to replace an obsolete API by a newer one. Do this in Java (or, even better, in a more modern mostly-static language such as OCaml, Haskell, F# or Scala), and you can use the compiler to automatically and immediately spot any place where the API has not been updated, and will spot a number of errors that you may have made with the refactoring. Even better, if the API was designed to be safe-by-design, the compiler will automatically spot even complex errors that you may have done during refactoring, including calling functions/methods in the wrong order, or ownership errors. Do the same in JavaScript and, while your code will be written faster, you should expect to be hunting bugs weeks or even months later.

I know that the Python community has considerably suffered from such problems during version transitions. I am less familiar with the world of PHP, but I believe this is no accident that Facebook is progressively arming itself with PHP static analysis tools. I also believe that this is no accident that Google is now introducing a typed language as a candidate replacement for JavaScript.

That is because today is the turn of JavaScript, or if not today, surely tomorrow. I have seen applications consisting in hundreds of thousands of lines of JavaScript. And if just maintaining these applications is not difficult enough, the rapid release cycles of both Mozilla and Chrome, mean that external and internal APIs are now changing every six weeks. This means breakage. And, more precisely, this means that we need new tools to help us predict breakages and help developers (both add-on developers and browser contributors) react before these breakages hit their users.

So let’s do something about it. Let’s make our JavaScript a strongly, statically typed language!

Or let’s do something a little smarter.

JavaScript, with discipline

At this point, I would like to ask readers to please kindly stop preparing tar and feathers for me. I realize fully that JavaScript is a dynamic language and that turning it into a static language will certainly result in something quite disagreeable to use. Something that is verbose, has lost most of the power of JavaScript, and gained no safety guarantees.

Trust me on this, there is a way to obtain the best of both worlds, without sacrificing anything. Before discussing the manner in which we can attain this, let us first set objectives that we can hope to achieve with a type-disciplined JavaScript.

Finding errors

The main benefit of strong, static typing, is that it helps find errors.

Even the simplest analyses can find all syntax errors, all unbound variables, all variables bound several times and consequently almost all scoping errors, which can already save considerable time for developers. Such an analysis requires no human intervention from the developer besides, of course, fixing any error that has been thus detected. As a bonus, in most cases, the analysis can suggest fixes.
Similarly trivial forms of analysis can also detect suspicious calls to break or continue, weird uses of switch(), suspicious calls to private fields of objects, as well as suspicious occurrences of eval – in my book, eval is always suspicious.
Slightly more sophisticated analyses can find most occurrences of functions or methods invoked with the wrong number of arguments. Again, this is without human intervention. With type annotations/documentation, we can move from most occurrences to all occurrences.
This same analysis, when applied to public APIs, can provide developers with more informations regarding how their code can be (mis)used.
At the same level of complexity, analysis can find most erroneous access to fields/methods, suspicious array traversals, suspicious calls to iterators/generators, etc. Again, with type annotations/documentation, we can move from most to all.
Going a little further in complexity, analysis can find fragile uses of this, uncaught exceptions, etc.

Types as documentation

Public APIs must be documented. This is true in any language, no matter where it stands on the static/dynamic scale. In static languages, one may observe how documentation generation tools insert type information, either from annotations provided by the user (as in Java/JavaDoc) or from type information inferred by the compiler (as in OCaml/OCamlDoc). But look at the documentation of Python, Erlang or JavaScript libraries and you will find the exact same information, either clearly labelled or hidden somewhere in the prose: every single value/function/method comes with a form of type signature, whether formal or informal.

In other words, type information is a critical piece of documentation. If JavaScript developers provide explicit type annotations along with their public APIs, they have simply advanced the documentation, not wasted time. Even better, if such type can be automatically inferred from the source code, this piece of documentation can be automatically written by the type-checker.

Types as QA metric

While disciples of type-checking tend to consider typing as something boolean, the truth is more subtle: it quite possible that one piece of code does not pass type-checking while the rest of the code does. Indeed, with advanced type systems that do not support decidable type inference, this is only to be expected.

The direct consequence is that type-checking can be seen as a spectrum of quality. A code can be seen as failing if the static checking phrase can detect evident errors, typically unbound values or out-of-scope break, continue, etc. Otherwise, every attempt to type a value that results in a type error is a hint of poor QA practice that can be reported to the developer. This yields a percentage of values that can be typed – obtain 100% and get a QA stamp of approval for this specific metric.

Typed JavaScript, in practice

Most of the previous paragraphs are already possible in practice, with existing tools. Indeed, I have personally experienced using JavaScript static type checking as a bug-finding tool and a QA metric. On the first day, this technique has helped me find both plenty of dead code and 750+ errors, with only a dozen false positives.

For this purpose, I have used Google’s Closure Compiler. This tool detects errors, supports a simple vocabulary for documentation/annotations, fails only if very clear errors are detected (typically syntax errors) and provides as metric a percentage of well-typed code. It does not accept JavaScript 1.7 yet, unfortunately, but this can certainly be added.

I also know of existing academic work to provide static type-checking for JavaScript, although I am unsure as to the maturity of such works.

Finally, Mozilla is currently working on a different type inference mechanism for JavaScript. While this mechanism is not primarily aimed at finding errors, my personal intuition is that it may be possible to repurpose it.

What’s next?

I hope that I have convinced you of the interest of investigating manners of introducing static, strong type-checking to JavaScript. In a second part, I will detail how and where I believe that this can be done in Mozilla.

Goodbye MLstate, goodbye Opa

September 6, 2011 § 6 Comments

Tides come and tides go.

Two years ago, I accepted to join MLstate, to take lead of the R&D group, and turn Opa from a promising early-stage demo into a world-class technology. And I am happy to say that we succeeded. Certainly, there are still many things that we would like to improve in Opa, but looking back on those two years, I am proud of the work we have accomplished, of the number of topics upon which we have pushed forward the state of the art, and even of many of the mistakes we have made, because they have expanded our understanding so much.

Now, after two years at MLstate, I am leaving. Our work is accomplished and I do not feel that I can contribute in any meaningful way to what MLstate has now become, nor that today’s MLstate can keep me excited and interested any longer. In the past few days, Opa has been featured on Lambda the Ultimate, on Hacker News and on Slashdot. Small and large high-tech companies have tried and enjoyed the technology. What better time than this to set sail and say goodbye to these two exciting years of my life?

As of today, I am not the Head of Research & Development, Chief Scientific Officer or Technological Evangelist at MLstate anymore. I will keep a distant eye on Opa, but I will not design or supervise its future versions. Mathieu Baudet, our COO, is replacing me as the supervisor for the development of Opa, while Adam Koprowski is replacing me as Technological Evangelist. Mathieu is a very intelligent security researcher and I am sure that he will impose a new style to the Opa team, and Adam is a bright and enthusiastic researcher/developer, and certainly the best person at MLstate to carry on Opa advocacy.

I would like to thank my University for supporting this foray into the exciting world of start-ups. I would like to thank our CEO for recruiting such a talented team. I would also like to thank Mehdi Ben Soltane, our CFO/HR director, who managed to do his job with a nice and welcome pinch of humor, even in the toughest of times. And mostly, I would like to thank all the R&D team: Maxime Audouin, Mathieu Barbin, Vincent Benayoun, Anthonin Bonnefoy, Raja Boujbel, Quentin Bourgerie, Sébastien Briais, Valentin Gatien-Baron, Louis Gesbert, Nicolas Glondu, Hugo Heuzard, Adrien Jonquet, Mikolaj Konarski, Adam Koprowski, Laurent LeBrun, Sarah Maarek, Grégoire Makridis, François Pessaux, Guillem Rieu, Pascal Rigaux, Norman Scaife, Rudy Sicard, François-Régis Sinot, Cédric Soulas, Quickie Squeaky, Hugo Venturini, Frédéric Ye, and all our successive generations of interns[1]: you are the best team I have ever had the chance to join, it really was an honor and a pleasure working with you all and I hope that those among you who have chosen to remain in MLstate have as much fun working under Mathieu’s leadership as I had working with you all.

Time to set sail! My next missive should arrive from the next port.

[1] Sorry, I do not have the list of interns at hand. But do not worry, I enjoyed working with you, too 🙂

Opa advocacy

August 28, 2011 § Leave a comment

Opa advocacy and tutorials have moved to their own, dedicated blog. The topics are now covered by Adam Koprowski. Thanks for handling this, Adam!

Opa on Lambda the Ultimate (and now Slashdot)

August 28, 2011 § Leave a comment

There is a nice discussion on Opa on Lambda the Ultimate forums. If you are not familiar with Lambda the Ultimate, know that this is the place for discussing new and exotic programming languages and programming concepts, so the simple fact of seeing a thread on LtU is something of an honor for us.

Listing Opa applications

June 7, 2011 § Leave a comment

Short update The opalang website now lists applications developed in Opa (registration required – part of the closed preview). If you are developing an application in Opa, please consider submitting this to the list.

Crowdsourcing the syntax

May 30, 2011 § 18 Comments

Feedback from Opa testers suggests that we can improve the syntax and make it easier for developers new to Opa to read and write code. We have spent some time both inside the Opa team and with the testers designing two possible revisions to the syntax. Feedback on both possible revisions, as well as alternative ideas, are welcome.

A few days ago, we announced the Opa platform, and I’m happy to announce that things are going very well. We have received numerous applications for the closed preview – we now have onboard people from Mozilla, Google and Twitter, to quote but a few, from many startups, and even from famous defense contractors – and I’d like to start this post by thanking all the applicants. It’s really great to have you guys & gals and your feedback. We are still accepting applications, by the way.

Speaking of feedback, we got plenty of it, too, on just about everything Opa, much of it on the syntax. This focus on syntax is only fair, as syntax is both the first thing a new developer sees of a language and something that they have to live with daily. And feedback on the syntax indicates rather clearly that our syntax, while being extremely concise, was perceived as too exotic by many developers.

Well, we aim to please, so we have spent some time with our testers working on possible syntax revisions, and we have converged on two possible syntaxes. In this post, I will walk you through syntax changes. Please keep in mind that we are very much interested in feedback, so do not hesitate to contact us, either by leaving comments on this blog, by IRC, or at feedback@opalang.org .

Important note: that we will continue supporting the previous syntax for some time and we will provide tools to automatically convert from the previous syntax to the revised syntax.

Let me walk you through syntax changes.

« Read the rest of this entry »

A few Opa applications

May 24, 2011 § Leave a comment

A few open-source Opa applications, written by beta testers or Opa team members, have been open-sourced in the past few days. Expect a few other releases in the upcoming days/weeks:

OpaChat – simple real-time web chat (works)
OpaStorage – simple distributed key/value store (works)
opaCAS – single sign-on (in progress)
Contre-Jour – thumbnail viewer (works)
OpaTetris – I’m sure you can guess what it’s about – based on HTML5 canvas (works) 🙂

Know of any other open-source Opa app? Then let me know!

Unbreaking Scalable Web Development, One Loc at a Time

May 23, 2011 § 18 Comments

The Opa platform was created to address the problem of developing secure, scalable web applications. Opa is a commercially supported open-source programming language designed for web, concurrency, distribution, scalability and security. We have entered closed beta and the code will be released soon on http://opalang.org, as an Owasp project .

Edit The video spawned a conversation on Reddit.
Edit Interesting followup on Hacker News.
Edit Reworked source code & comments for clarity. Thanks for the feedback.
EditCome and chat with us on Freenode, channel #opalang .

If you are a true coder, sometimes, you meet a problem so irritating, or a solution so clumsy, that challenging it is a matter of engineering pride. I assume that many of the greatest technologies we have today were born from such challenges, from OpenGL to the web itself. The pain of pure LAMP-based web development begat Ruby on Rails, Django or Node.js, as well as the current NoSQL generation. Similarly, the pains of scalable large system development with raw tools begat Erlang, Map/Reduce or Project Voldemort.

Opa was born from the pains of developing scalable, secure web applications. Because, for all the merits of existing solutions, we just knew that we could do much, much better.

Unsurprisingly, getting there was quite a challenge. Between the initial idea and an actual platform lay blood, sweat and code, many experiments and failed prototypes, but finally, we got there. After years of development and real-scale testing, we are now getting ready to release the result.

The parents are proud to finally introduce Opa. « Read the rest of this entry »

Post-OWASP AppSec Research

June 28, 2010 § Leave a comment

Well, I’m just back from the Way to Valhalla and OWASP AppSec Research 2010.

The welcome was great, with plenty of people interested in OPA — some of them actually looking enthusiastic. I was quite surprised to realize that a number of researchers, developers and consultants in the web security community are very much aware of the limitations of current-generation approaches to security, but just don’t have the resources to start working on a next-generation approach. Speaking of resources, we’re now getting close to being 7 years into the OPA project, a commitment that not many research groups or companies could make.

Interestingly, during his talk, Dave Wichers, the editor for the OWASP Top 10 Web Application Security Risks project, mentioned that the solution was certainly to switch language and paradigm, to something cleaner and easier to secure. This is, of course, exactly what we have been working on during all these years.

All the slides and videos of the conference should be uploaded soon on the official website. In the meantime, I have uploaded my slides. I’ll try and add some sound if I can work out some sound problems I’ve been encountering recently with my presentations.

Edit The presentation of OPA available on Dailymotion had sound issues. I’ve finally managed to fix them. Enjoy!

Offres d’emploi pour informaticiens de haut niveau

May 31, 2010 § Leave a comment

MLstate est une jeune entreprise innovante en campagne depuis 2008 pour la reconquête du web. Notre objectif : réinventer les bases technologiques et scientifiques des applications web, pour une toile plus saine, plus sûre et plus sécurisée. Notre équipe R&D compte une vingtaine de passionnés, docteurs ou ingénieurs en informatique, et est sur le point de s’agrandir.

Si vous êtes informaticien de haut niveau, inspiré et ingénieux, si vous êtes doté d’une forte culture informatique et scientifique, d’une grande connaissance des langages fonctionnels et impératifs, de la compilation, des systèmes de types, contactez-nous. Le candidat idéal, docteur ou non, avec ou sans expérience industrielle, aura aussi des connaissances en distribution, parallélisme, bases de données, sera capable d’évoluer dans un environnement polyglotte et disposera de la finesse nécessaire pour construire des produits finis.

Les problèmes à résoudre sont difficiles. Pour relever le défi, contactez-nous à careers@mlstate.com .