Security analysis

May 25, 2008 § Leave a comment

A few weeks ago, I promised I would tell you more about ExtraPol, my current research project. Well, before doing so, here’s a short reminder about the notion of security in computer science — and the manners of enforcing that security.

While most members of the computer science community agree that safety and security are desirable properties, there is little consensus on the methods to be used for ensuring safety or security. Indeed, even the actual meaning of these properties often remains an open question.

One possibility is to define security in terms of authorizations and safety in terms of real-world hazard. In this context, a system or subsystem is therefore secure if there is no way for something forbidden to happen, while it is safe if its use may only cause acceptable risks. Both notions are very broad and their enforcement is far from trivial. Even the reduced problem of ensuring that the installation and execution of a software application will not breach simple cases of security of a desktop station is an open research issue.

In practice, techniques used or investigated in the domain of security tend to fall roughly into three groups:

static analysis — try and detect security holes before running the program
dynamic analysis — try and detect security breaches as they happen
trace analysis — try and detect security breaches after they have happened.

Static analysis

Static analysis is all about examining the source code of a program and deducing information from that source code — the source code does not have to be in a high-level language, some tools perform analysis on assembly language or on JVML or .Net bytecode.

The most common form of static analysis is the type-checking phase built-in most compilers. While this form of type-checking is usually far from sufficient to detect most security holes it can guarantee memory protection for some languages (e.g. Java) and, coupled with well-chosen APIs, it can even guarantee resource usage protocols in some other languages (e.g. Haskell or OCaml, although I don’t think it is actually used for this purpose in either language). Other forms of type-checking may serve to compute guaranteed bounds to resource usage (e.g. Camelot or works by Yours Truly), to guarantee the absence of information leaks in cryptographic protocols (e.g. works by J. Goubault-Larrecq), the absence of deadlocks (e.g. works by Naoki Kobayashi), to prove mathematical theorems (e.g. Coq) etc. In its general form, a type-checker is the implementation of a specialized form of logic designed for a specific task and for a specific programming language. Each elementary primitive of the language is characterized by a type information (e.g. the set of operations which may be performed with/on that primitive), while each possible combinator of the language is associated to a type deduction rule, which permits deducing the type information of the combination based on the type information its constituents. Note that not all type-checkers are automatic: tools such as Coq require heavy interaction between the type-checker and the programmer.

Another form of static analysis is model-checking, as pioneered by E.M. Clarke, E.A. Emerson and J. Sifakis. Surprisingly enough, this technique is little-known outside of the worlds of research and critical systems — and this, despite the fact that it is actually the most widely used method for the verification of both software and hardware systems. Model-checking serves to check that an algorithm or its implementation matches a formal specification. This has been put to use for embedded systems in space-bound hardware, verification of flood control algorithms, experimental operating systems (e.g. Singularity), etc. In its general form, a model-checker for a given general-purpose logic (typically Temporal Logic or one of its variants) is a tool use to explore all possible states which may be reached by a system and asserting that each possible state verifies the specification.

A third common form of static analysis is abstract interpretation, as pioneered by Rhadia and Patrick Cousot. This technique, also little-known outside of the academic world, has been put to use for the verification of embedded airplane components, the search of possible optimizations during compilation, the search of defects in embedded medical components, etc. In its general form, an abstract interpreter is a language interpreter extended to work even when the value of some variable is totally or partially unknown, for instance because this value may depend on some outside information (e.g. the time of execution, the contents of some file or some user input).

Pros

Well-designed static analysis techniques provide not only hints that the program is correct / safe, but mathematical proofs of properties, which is much stronger.
Once the technique has been designed, mathematical tools exist to prove its safety.
For critical systems, critical properties must be guaranteed before actually deploying the program, which makes static analysis a natural choice.
By opposition to dynamic or trace analysis, most often, static analysis incurs no run-time cost. Rather, static analysis may permit additional optimizations.
Combined with Proof-Carrying Code techniques, static analysis provides checkable guarantees much stronger than program signatures.
Static analysis requires no support from the operating system.

Cons

Designing a static analysis technique correctly is very hard (PhD hard). Techniques must be developed once for each language and may need to be modified to accept new properties.
Most problems are undecidable, which means that any static analysis will be an approximation. Well-designed static analysis are conservative approximations, which means that they will reject some programs which are actually safe.
Static analysis may be algorithmically very heavy, possibly untractable, which means that some programs are just too complex to be examined.
Static analysis requires some kind of source code — static checking from assembler is often untractable.
The validity of the type system does not entail the validity of the type-checker.
Not very friendly with dynamic languages.

Dynamic analysis

Dynamic analysis is all about testing a program, often automatically, and checking as they happen its interactions with the system to determine whether anything unsafe has taken place. This does not require any source code, only an environment suited to testing.

The most visible form of dynamic analysis is testing, with its variant unit testing. Need I detail ? Interactions between the program being tested and the system are defined by the developer or by the user based on specifications — and based on whichever tools are available on the host operating system. The program is then executed and observed. If deemed satisfactory, the program may be deployed. This form of testing is actually not very different from trace analysis.

The other form of dynamic analysis is also pervasive, although usually much less visible: watchdogs and their beefed-up cousin sandboxes. In either case, support of the operating system is needed, sometimes as a virtual machine. This is used inside any modern operating system, under names such as “memory protection”, “access control lists”, “firewall”, “SELinux”, “AppArmor” etc. This is also used in Java or .Net, PHP, etc. In either case, the design of the operating system must allow capturing and, if necessary, rejecting interactions between the program and the system itself. Once this is given, during the actual execution of the program, every input and output of the program is captured and compared to a policy. If the policy specifies that the input/output is not legitimate, the interaction fails or the program is interrupted by force.

Pros

Dynamic analysis does not require the availability of source code.
Designing dynamic analysis policies is easier than designing static analysis.
Although not sufficient for every property, support exists in current operating systems.
Independent of the language.

Cons

There is no way of being 100% sure that dynamic analysis is exhaustive.
Typically, the dynamic analyser may be detected by the program being analysed. This makes it possible to write programs which are dangerous only when not being examined. A variant is to write a program which only becomes dangerous at a given date known to take place after the testing phrase.
Watchdogging / sandboxing incurs a typically heavy slowdown of the program.
Watchdogging / sandboxing require support of the operating system. In some cases, applying new policies may require altering the operating system.
Doesn’t take advantage of any language engineering.
Not necessarily independent from the run-time: a dynamic analyser must be used for native programs, another one for programs executed on the Java Virtual Machine, another one for the .Net Virtual Machine etc. Typically, the costs of these analyzers may stack: a program written for the JVM will incur the cost of the JVM sandbox, in addition to the SELinux sandbox and to the Unix kernel sandbox.
Designing good policies may be easier than designing static analysis — but it’s still quite hard and tedious to get it right. In the words of Theodore T’so’s, “life is too short for SELinux”.

Trace analysis

Trace analysis is all about recording inputs and outputs of programs as they take place — to analyse them later to find out if something wrong has happened.

I must admit that I am much less familiar with trace analysis than with the two previous approaches. In particular, I have no idea about the difficulty of designing instances of the analysis.

Some forms of testing should also be mentioned in trace analysis.

Pros

Trace analysis requires no source code.
Trace analysis may take advantage of more informations than dynamic analysis.
If necessary, trace analysis may be executed on a different computer than the program.
Independent of the language.

Cons

Trace analysis may be sufficient to contain damage but not to prevent it.
Trace analysis incurs some run-time penalty.
Trace analysis induces the creation of possibly huge log files.
Trace analysis may miss breaches.
A large enough breach may destroy or tamper with traces.
Not necessarily independent from the run-time: a dynamic analyser must be used for native programs, another one for programs executed on the Java Virtual Machine, another one for the .Net Virtual Machine etc.

What now ?

This was a short overview of what needs to be done to ensure the security of a program, how it is typically done and what the drawbacks are. As you may imagine, project ExtraPol lies somewhere in the realm of security analysis and attempts to improve the current situation.

More details on ExtraPol in a further post.

Tagged: abstract interpretation, compilation, compiler, debug, dependent types, dynamic analysis, i/o, open-source, policies, programming, security, security breaches, security holes, security policies, side-effects, static analysis, trace, trace analysis, type system

Il y a du thé renversé au bord de la table