Copying streams asynchronously

October 18, 2013 § Leave a comment

In the Mozilla Platform, I/O is largely about streams. Copying streams is a rather common activity, e.g. for the purpose of downloading files, decompressing archives, saving decoded images, etc. As usual, doing any I/O on the main thread is a very bad idea, so the recommended manner of copying streams is to use one of the asynchronous string copy APIs provided by the platform: NS_AsyncCopy (in C++) and NetUtil.asyncCopy (in JavaScript). I have recently audited both to ascertain whether they accidentally cause main thread I/O and here are the results of my investigations.

In C++

What NS_AsyncCopy does

NS_AsyncCopy is a well-designed (if a little complex) API. It copies the full contents of an input stream into an output stream, then closes both. NS_AsyncCopy can be called with both synchronous and asynchronous streams. By default, all operations take place off the main thread, which is exactly what is needed.

In particular, even when used with the dreaded Safe File Output Stream, NS_AsyncCopy will perform every piece of I/O out of the main thread.

The default setting of reading data by chunks of 4kb might not be appropriate to all data, as it may cause too much I/O, in particular if you are reading a small file. There is no obvious way for clients to detect the right setting without causing file I/O, so it might be a good idea to eventually extend NS_AsyncCopy to autodetect the “right” chunk size for simple cases.

Bottom line: NS_AsyncCopy is not perfect but it is quite good and it does not cause main thread I/O.

Limitations

NS_AsyncCopy will, of course, not remove main thread I/O that takes place externally. If you open a stream from the main thread, this can cause main thread I/O. In particular, file streams should really be opened with flag DEFER_OPEN flag. Other streams, such as nsIJARInputStream do not support any form of deferred opening (bug 928329), and will cause main thread I/O when they are opened.

While NS_AsyncCopy does only off main thread I/O, using a Safe File Output Stream will cause a Flush. The Flush operation is very expensive for the whole system, even when executed off the main thread. For this reason, Safe File Output Stream is generally not the right choice of output stream (bug 928321).

Finally, if you only want to copy a file, prefer OS.File.copy (if you can call JS). This function is simpler, entirely off main thread, and supports OS-specific accelerations.

In JavaScript

What NetUtil.asyncCopy does

NetUtil.asyncCopy is a utility method that lets JS clients call NS_AsyncCopy. Theoretically, it should have the same behavior. However, some oddities make its performance lower.

As NS_AsyncCopy requires one of its streams to be buffered, NetUtil.asyncCopy calls nsIIOUtil::inputStreamIsBuffered and nsIIOUtil::outputStreamIsBuffered. These methods detect whether a stream is buffered by attempting to perform buffered I/O. Whenever they succeed, this causes main thread I/O (bug 928340).

Limitations

Generally speaking, NetUtil.asyncCopy has the same limitations as NS_AsyncCopy. In particular, in any case in which you can replace NetUtil.asyncCopy with OS.File.copy, you should pick the latter, which is both simpler and faster.

Also, NetUtil.asyncCopy cannot read directly from a Zip file (bug 927366).

Finally, NetUtil.asyncCopy does not fit the “modern” way of writing asynchronous code on the Mozilla Platform (bug 922298).

Helping out

We need to fix a few bugs to improve the performance of asynchronous copy. If you wish to help, please do not hesitate to pick any of the bugs listed above and get in touch with me.

Asynchronous database connections in the Mozilla Platform

July 19, 2013 § 2 Comments

One of the core components of the Mozilla Platform is mozStorage, our low-level database, based on sqlite3. mozStorage is used just about everywhere in our codebase, to power indexedDB, localStorage, but also site permissions, cookies, XUL templates, the download manager (*), forms, bookmarks, the add-ons manager (*), Firefox Health Report, the search service (*), etc. – not to mention numerous add-ons.

(*) Some components are currently moving away from mozStorage for performance and footprint reasons as they do not need the safety guarantees provided by mozStorage.

A long time ago, mozStorage and its users were completely synchronous and main-thread based. Needless to say, this eventually proved to be a design that doesn’t scale very well. So, we set out on a quest to make mozStorage off main thread-friendly and to move all these uses off the main thread.

These days, whether you are developing add-ons or contributing to the Mozilla codebase, everything you need to access storage off the main thread are readily available to you. Let me introduce the two recommended flavors.

Note: This blog entry does not cover using database from *web applications* but from the *Mozilla Platform*. From web applications, you should use indexedDB.

« Read the rest of this entry »

Tales of Science-Fiction Bugs: The Thing That Killed Talos

November 12, 2012 § 4 Comments

Have you ever encountered one of these bugs? One in which every single line of your code is correct, in which every type-check passes, every single unit test succeeds, the specifications are fulfilled but somehow, for no reason that can be explained rationally, it just does not work? I call them Science-Fiction Bugs. I am sure that you have met some of them. For some reason, the Mozilla Performance Team seems to stumble upon such bugs rather often, perhaps because we spend so much time refactoring other team’s code long after the original authors have moved on to other features, and combining their code with undertested libraries and technologies. Truly, this is life on the Frontier.

Today, I would like to tell you the tale of one of these Science-Fiction Bugs: The Thing That Killed Talos.

« Read the rest of this entry »

Fighting the good fight for fun and credits, with Mozilla Education

October 30, 2012 § Leave a comment

Are you a student?

Do you want to fight the good fight, for the Future of the Web, and earn credits along the way?

Mozilla Education maintains a tracker of student project topics. Each project is followed by one (or more) mentor from the Mozilla Community.

Then what are you waiting for? Come and pick or join a project or get in touch to suggest new ideas!

Are you an educator?

The tracker is also open to you. Do not hesitate to pick projects for your students, send students to us or contact us with project ideas.

We offer/accept both Development-oriented, Research-oriented projects and not-CS-oriented-at-all projects.

Are you an open-source developer/community?

If things work out smoothly, we intend to progressively open this tracker to other (non-Mozilla) projects related to the future of the web. Stay tuned – or contact us!

Scoped resources for all

April 12, 2012 § 2 Comments

A small class hierarchy has been added to MFBT, the “Mozilla Framework Based on Templates” which contains some of the core classes of the Mozilla Platform. This hierarchy introduces general-purpose and specialized classes for scope-based resource management. When it applies, Scope-based resource management is both faster than reference-counting and closer to the semantics of the algorithm, so you should use it :)

The codebase of Mozilla is largely written in C++. While C++ does not offer any form of automatic memory management, the (sometimes scary) flexibility of the language has allowed numerous projects to customize the manner in which memory and other resources are managed, and Mozilla is no exception. Largely, the Mozilla C++ codebase uses reference-counting, to provide automatic memory management in most cases.

While reference-counting is quite imperfect, and while future versions of Mozilla might possibly use other forms of memory management, it is also a very useful tool for such a large codebase. However, in some cases, reference-counting is just too much. Indeed, in a number of simple cases, we prefer the simpler mechanism of scope-based resource management, that is both more predictable, faster and more resource-efficient – at the cost of not being able to scale up to the more complex cases for which reference-counting or even more powerful mechanisms become much more suited.

Scope-based resource management is designed to handle resources that should be cleaned-up as soon as you leave a given scope (typically, the function), regardless of how you leave it (by reaching the end, with a break, a return or an exception).

The following extract illustrates the use of scoped resource allocation:

// returns true in case of success, false in case of error
bool copy(const char* sourceName, const char* destName, size_t bufSize) {
   ScopedFD source(open(sourceName, O_RDONLY));
   if (source.get() == -1) return false;

   ScopedFD dest(open(destName, O_WRONLY|O_CREAT, 0600));
   if (dest.get() == -1) return false;
     // source is closed automatically

   ScopedDeleteArray buf(new char[bufSize]);
   if (buf.get() == NULL) return false;
     // source, dest are closed automatically

   while (true) {
     const int bytesRead = read(source.get(), buf.rwget(), bufSize);
     if (bytesRead == 0) break;
     if (bytesRead == -1) return false;
       // source, dest, buf are cleaned-up

     const int writePos = 0;
     while (writePos < bytesRead) {
       const int bytesWritten = write(dest.get(), buf.get(),
                                      bytesRead - writePos);
       if (bytesWritten == -1) return false ;
         // source, dest, buf are cleaned-up
       writePos += bytesWritten;
     }
   }

   return true;
      // source, dest, buf are cleaned-up
}

As you can see, the main point of these scope-based resource management classes is that they are cleaned up automatically both in case of success and in case of error. In some cases, we wish to clean up resources only in case of error, as follows:

// returns -1 in case of error, the destination file descriptor in case of success
int copy(const char* sourceName, const char* destName, size_t bufSize) {
   ScopedFD source(open(sourceName, O_RDONLY));
   if (source.get() == -1) return -1;

   ScopedFD dest(open(destName, O_WRONLY|O_CREAT, 0600));
   if (dest.get() == -1) return -1;
      // source is closed automatically

   ScopedDeleteArray buf(new char[bufSize]);
   if (buf.get() == NULL) return -1;
      // source, dest are closed automatically

   while (true) {
     const int bytesRead = read(source.get(), buf.rwget(), bufSize);
     if (bytesRead == 0) break;
     if (bytesRead == -1) return -1;
      // source, dest, buf are cleaned-up

     const int writePos = 0;
     while (writePos < bytesRead) {
       const int bytesWritten = write(dest.get(), buf.get(),
                                      bytesRead - writePos);
       if (bytesWritten == -1) return -1 ;
        // source, dest, buf are cleaned-up
       writePos += bytesWritten;
     }
   }

   return dest.forget();
   // source and buf are cleaned-up, not dest
}

While both examples could undoubtedly be implemented with reference-counting or without any form of automated resource management, this would either make the source code much more complex and harder to maintain (for purely manual resource management) or make the executable slower and less explicit in terms of ownership (for reference-counting). In other words, scoped-based resource management is the right choice for these algorithms.

Now, the Mozilla codebase has offered a few classes for scope-based resource management. Unfortunately, these classes were scattered throughout the code, some of them were specific to some compilers, and they were generally not designed to be reusable.

We have recently starting consolidating these classes into a simple and extensible hierarchy of classes. If you need them, you can find the root of this hierarchy, as well as the most commonly used classes, on mozilla-central, as part of the MFBT:

  • ScopedFreePtr<T> is suited to deallocate C-style pointers allocated with malloc;
  • ScopedDeletePtr<T> is suited to deallocate C++-style pointers allocated with new;
  • ScopedDeleteArray<T> is suited to deallocate C++-style pointers allocated with new[];
  • root class Scoped<Trait> and macro SCOPED_TEMPLATE are designed to make it extremely simple to define new classes to handle other cases.

For instance, class ScopedFD as used in the above examples to close Unix-style file descriptors, can be defined with the following few lines of code:


struct ScopedFDTrait
{
public:
  typedef int type;
  static type empty() { return -1; }
  static void release(type fd) {
    if (fd != -1) {
      close(fd);
    }
  }
};
SCOPED_TEMPLATE(ScopedFD, ScopedFDTrait);

So, well, if you need scoped-based resource management, you know where to find it!

I will blog shortly about the situation in JavaScript.

The OPA type system, part 1

January 7, 2010 § 2 Comments

edit Part 2 of this post was never written. I no longer work on Opa. For any question regarding Opa, please contact MLstate.

Since the initial announcement regarding OPA, we have received a number of questions regarding all the aspects of the language (including, suprisingly, a few demands for answers and documentation). Well, while we’re busy putting together documentation, benchmarks and FAQ, here’s a quick tour of one of the most fundamental pieces of the language: the type system.

« Read the rest of this entry »

Extrapol update

July 4, 2008 § Leave a comment

A quick work regarding the current status of Extrapol and its release.

Development of Extrapol progresses. With our current set of sample, Extrapol works flawlessly. We’re now adding features, improving error reporting and de-hard-wiring the model of the C standard library from the tool and moving it towards an external configuration file as well as progressively moving towards larger and more realistic samples. Development will come to an abrupt (and temporary) halt at the end of this week, though, due to personal matters (i.e. I’m getting married).

The release planned for next week, on the other hand, is canceled. As the research field of applied security is very competitive, and after careful discussion with the rest of my research team, we have decided to only release a version of Extrapol after the scientific content has been accepted for publication in a conference or journal. At the request of one of the institutes which founds this research, I will also refrain from posting detailed information on the theory and algorithms behind Extrapol, until these are cleared by the institute and accepted for publication. Without entering the details, Extrapol is expected to serve in critical infrastructures, which explains the need for clearance.

However, rest assured that there will be a release and it will be open-source (presumably licenced under a combination of MIT and LGPL). The only question is when — and this probably won’t happen before November.

Where Am I?

You are currently browsing entries tagged with c++ at Il y a du thé renversé au bord de la table.

Follow

Get every new post delivered to your Inbox.

Join 31 other followers