C data finalization – in JavaScript

May 2, 2012

A few iterations ago, the Mozilla Platform introduced js-ctypes, a very nice Foreign Function Interface (FFI) for JavaScript. Like its inspiration, Python’s ctypes, js-ctypes lets (privileged) JavaScript code open native libraries, import their functions and call these functions almost as if they were regular JavaScript functions.

Here is an example using the Unix libc:

// Make |ctypes| available (privileged code only)
Components.utils.import("resource://gre/modules/ctypes.jsm");

// Open the C library
let libcCandidates = [
  'libSystem.dylib',// MacOS X
  'libc.so.6',      // Linux
  'libc.so'         // Android, B2G
];
let libc;
for (let candidate of libcCandidates) {
  try {
    // ctypes.open throws if the library cannot be loaded
    libc = ctypes.open(candidate);
    break;
  } catch (ex) {
    // Not available on this platform, try the next candidate
  }
}
if (!libc) {
  throw new Error("Could not open the C library");
}

// Import some functions from libc
let open = libc.declare("open", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*const char* path*/ctypes.char.ptr,
  /*int oflag*/ ctypes.int,
  /*int mode*/ ctypes.int);
let read = libc.declare("read", ctypes.default_abi,
  /*return ssize_t*/ ctypes.ssize_t,
  /*int fildes*/ ctypes.int,
  /*void *buf*/ ctypes.voidptr_t,
  /*size_t nbytes*/ ctypes.size_t);
let close = libc.declare("close", ctypes.default_abi,
  /*return int*/ ctypes.int,
  /*int fd*/ ctypes.int);

// Now use them
let myfile = open("/etc/passwd", 0, 0);
if (myfile == -1)
  throw new Error("Could not open file");
// ...

If you are familiar with XPConnect, the mechanism generally used in the Mozilla Platform for letting JavaScript and C++ interact, you can see that using js-ctypes to call native code directly is much nicer than adding a C++ XPCOM/XPConnect layer. From what I hear, it also seems to be much faster, as XPConnect needs to perform expensive magic to ensure that memory is properly passed between JavaScript and C++. In addition, this selfsame memory magic now prevents XPConnect from being executed from threads other than the main thread, which makes js-ctypes the only way to access system facilities from worker threads.

Now, js-ctypes nicely solves the issue of calling native code from JavaScript. However, JavaScript and C are very different languages, with very different paradigms, so getting them to coexist requires a little more than simply the ability to place calls or convert values. In particular, C has:

  • manual resource management (memory must be released, file descriptors must be closed, locks must be released, etc.);
  • no language-level mechanism for error management (a task smaller than a process cannot be killed because of an error).

By contrast, JavaScript has:

  • automated memory management, but no support for automatically managing resources other than memory (no user-level finalization or scoped resources mechanism);
  • several language/vm-level mechanisms that can kill a task in non-trivial manners (exceptions, “this script is busy”, etc.)

Unfortunately, putting all of this together makes it quite difficult to write JavaScript code that manipulates C resources without leaking. Such leaks can cause both performance issues (memory leaks, in particular, tend to slow down the whole system) and hard-to-track errors (leaking file descriptors can prevent the application from opening any new file, or, under Windows, can prevent the application from reopening some files that were improperly closed, while leaking locks can completely freeze an application).
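
To make the leak concrete, here is a minimal sketch in plain JavaScript. The functions fakeOpen and fakeClose and the openDescriptors set are hypothetical stand-ins for the libc bindings above, not part of js-ctypes. They only make it possible to observe what stays open when an exception interrupts the code between acquiring and releasing a resource.

```javascript
// Hypothetical stand-ins for the libc bindings: they only track which
// descriptors are currently open, so that a leak becomes observable.
const openDescriptors = new Set();
let nextFd = 3;

function fakeOpen(path) {
  const fd = nextFd++;
  openDescriptors.add(fd);
  return fd;
}

function fakeClose(fd) {
  return openDescriptors.delete(fd) ? 0 : -1;
}

// Naive version: if |work| throws, |fakeClose| is never reached.
function processNaive(path, work) {
  const fd = fakeOpen(path);
  work(fd);
  return fakeClose(fd);
}

// Defensive version: |finally| runs on both normal and exceptional exits.
function processSafe(path, work) {
  const fd = fakeOpen(path);
  try {
    work(fd);
  } finally {
    fakeClose(fd);
  }
}

function failingWork(fd) {
  throw new Error("parse error while reading descriptor " + fd);
}

try { processNaive("/etc/passwd", failingWork); } catch (ex) { /* ignored */ }
// One descriptor is now leaked by the naive version.
const leakedAfterNaive = openDescriptors.size;

try { processSafe("/etc/passwd", failingWork); } catch (ex) { /* ignored */ }
// The defensive version did not add a second leak.
const leakedAfterSafe = openDescriptors.size;
```

A try/finally block covers exceptions, but it does not help when the whole script or thread is killed, which is the case that finalization is meant to cover.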

Introducing C data finalization

For this reason, we have recently added a new feature to js-ctypes, designed to add automated resource management to JavaScript: C data finalization.

Specifying a finalizer is simple:

function openfile(path, flags, mode) {
  let fd = open(path, flags, mode);
  if (fd == -1) {
    throw new Error("Could not open file " + path);
  }
  return ctypes.CDataFinalizer(fd, close);
}

What this code does is ensure that, once the value returned by openfile is garbage-collected, the function close is called on the file descriptor, releasing the C resources it represents. This value is C data with a finalizer, aka CDataFinalizer.

You can use it just as you would use the C data through js-ctypes:

let myfile = openfile("/etc/passwd", 0, 0);
let result = read(myfile, myarray, 4096); // Read some data
// Wherever required, |myfile| is automatically converted to
// the underlying integer value.
// Once |myfile| has no reference, it will (eventually) be
// closed.

It is, of course, possible (and strongly recommended) to close the file manually to ensure that resources are immediately available for the process and the rest of the system:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let result = myfile.dispose(); // This calls |close|.

// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.

Or, an equivalent but more verbose solution, using forget:

let myfile = openfile("/etc/passwd", 0, 0);
// ...
// ... do whatever you wish to do with that file
let fd = myfile.forget();
// From this point, |myfile| cannot be converted to the underlying
// integer value anymore. Any attempt to do so will raise an
// exception.
let result = close(fd);
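
The difference between dispose and forget can be summarized with a small model in plain JavaScript. The class FinalizedValue below is a hypothetical stand-in written for illustration, not the js-ctypes implementation. It only mimics the behavior described above: dispose runs the finalizer and returns its result, forget hands back the raw value without running the finalizer, and either call makes the wrapper unusable.

```javascript
// Hypothetical model of the behavior described in the text.
// This is NOT the js-ctypes implementation, only an illustration.
class FinalizedValue {
  constructor(value, finalizer) {
    this._value = value;
    this._finalizer = finalizer;
    this._live = true;
  }

  // Access the underlying value, failing once it has been released.
  get value() {
    if (!this._live) {
      throw new Error("value has been disposed or forgotten");
    }
    return this._value;
  }

  // Run the finalizer now and return its result.
  dispose() {
    const value = this.value; // throws if already released
    this._live = false;
    return this._finalizer(value);
  }

  // Detach the value without running the finalizer.
  forget() {
    const value = this.value; // throws if already released
    this._live = false;
    return value;
  }
}

// A fake |close| that records which descriptors it was called on.
const closed = [];
const recordClose = fd => { closed.push(fd); return 0; };

const a = new FinalizedValue(3, recordClose);
const disposeResult = a.dispose(); // calls recordClose(3)

const b = new FinalizedValue(4, recordClose);
const rawFd = b.forget();          // does not call recordClose

let useAfterDispose = "no error";
try {
  a.value;
} catch (ex) {
  useAfterDispose = "error";
}
```

In this model, only descriptor 3 reaches the fake close function; descriptor 4 becomes the caller's responsibility again, exactly as in the forget example above.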

This mechanism is, of course, not restricted to file descriptors. It has been used successfully with other data structures, including malloc-allocated strings.

Details and caveats

JavaScript does not feature finalization and might never do so. There are good reasons for this: finalization considerably complicates the garbage-collector and introduces the possibility of subtle bugs and leaks that the various JS implementors do not want to inflict on their users (if you are curious, two of the main problems are resurrection of dead references and finalization of cyclic data structures).

Consequently, C data finalizers are not full-featured finalizers. Indeed, the main limitation of a C data finalizer is that its first argument must be a C value and its second argument must be a pointer to a C function – for the above mentioned reasons, letting users specify any JavaScript function as a finalizer would open a can of worms that nobody really wants to see crawling around.

Also, before using a finalizer, you should be aware that JavaScript garbage-collection is not necessarily deterministic – during the testing phase of CDataFinalizer, we have encountered memory errors caused by developers (ok, I will confess, that was me, sorry guys) making invalid assumptions about just when values would be garbage-collected. Let me emphasize this: any hypothesis you make about when a value is finalized is bound to be regularly false. In other words, C data finalizers should be used as a last line of defense, not as the default mechanism for recovering resources.
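
One way to follow this advice is to release resources explicitly in a finally block and leave the finalizer as a safety net. The sketch below assumes only that the resource object exposes a dispose method, as the values created by ctypes.CDataFinalizer do. The helper withResource and the makeTracked stand-in are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: acquire a resource, use it, and always dispose of it
// explicitly, whether |use| returns normally or throws.
function withResource(acquire, use) {
  const resource = acquire();
  try {
    return use(resource);
  } finally {
    resource.dispose();
  }
}

// Stand-in resource that records when it is acquired and disposed; with
// js-ctypes this would be a value created by ctypes.CDataFinalizer.
const events = [];
function makeTracked(name) {
  events.push("acquire " + name);
  return {
    name,
    dispose() { events.push("dispose " + name); }
  };
}

// Normal exit: the result of |use| is returned after disposal.
const length = withResource(
  () => makeTracked("A"),
  res => res.name.length
);

// Exceptional exit: the resource is still disposed, then the error propagates.
let caught = null;
try {
  withResource(() => makeTracked("B"), () => { throw new Error("boom"); });
} catch (ex) {
  caught = ex.message;
}
```

With js-ctypes, the acquire callback could be a call to the openfile function shown earlier, so that the finalizer only comes into play if the script is killed before the finally block runs.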

Still, C data finalizers are a powerful mechanism that makes manipulation of C values with JavaScript much more reliable. Indeed, it is one of the core mechanisms used pervasively by the OS.File library.

Edit: As per Steve Fink’s suggestion, I have emphasized that users should not rely on the behavior of garbage-collection/finalization, and clarified the can of worms.


§ 4 Responses to C data finalization – in JavaScript

  • Steve Fink says:

    I think this finalization API is a good idea, but I would emphasize that you should *never* rely on its timing, and you should be prepared for it to simply never happen at all — especially with more advanced GC techniques, even closing all tabs associated with a domain is not guaranteed to finalize the JS data associated with those tabs. Maybe we’ll decide that your foreground tab is playing an animation and we don’t have time to clean anything up, or we’ll wait until there’s more memory pressure to do cleanup, or whatever.

    So I’d recommend cleaning up all external resources that you can. The finalizers are still necessary for everything left over, and as long as you’ve registered them then you can feel like you’ve done your duty. But nothing should rely on them running at any given time, and as little as possible should rely on them ever running at all.

    I would also quibble with “…for which we are not quite ready.” We’re nowhere near ready, and the most that would *ever* happen is some sort of severely restricted JS function. “Any JavaScript function” at all just isn’t going to happen.

  • yoric says:

    Fixed, thanks.

  • mossop says:

    So if you can’t rely on the finalizer ever being used you still have to do everything you had to do previously to stop any leaks from happening. So I’m confused what the benefit is here and concerned that developers will think they can use it as an alternative to manual clean-up (despite how bold you make the warnings)

    • yoric says:

      The main benefit is that without finalizers, C resources acquired through js-ctypes by a thread or script simply cannot be recovered if the thread/script is killed.

      A possible future secondary benefit, if we decide to push CDataFinalizer in this direction, could be an aid for tracking leaks of js-ctypes-allocated resources.

      Now, I concur that there is a risk of developers using it as an alternative to manual clean-up. We will have to ensure that documentation discourages developers from doing that.


You are currently reading C data finalization – in JavaScript at Il y a du thé renversé au bord de la table.
