MIXY and OCaml
I recently prototyped a system that mixes type checking with symbolic execution for my PL Seminar. The course site already contains out-of-date information regarding papers, but if you check back there I assume we’ll eventually post accurate paper topics and links to the rest of our seminar’s prototype implementations.
I’ll go into a little bit of detail on the MIXY system presented in the PLDI 2010 paper Mixing Type Checking and Symbolic Execution by K.Y. Phang, B.E. Chang, and J.S. Foster. You can take a look at my prototype implementation and some more notes in my github repo.
MIXY
MIXY presents a system that allows you to balance the tradeoff between precision and performance in a static analysis. Type systems typically provide efficient analyses, usually baked into the compilation process, at the cost of coarse results. On the other hand, symbolic execution can yield tremendous precision, but does so at the cost of performance of the analysis, often needing to consult model checkers and/or satisfiability solvers, that can have long tails in runtime.
In MIXY, users annotate a program with typed and symbolic tags, specifying which portions of code should be analyzed by which procedure. For portions of code known to be computationally expensive for symbolic execution, typed tags can weaken precision on just that region, while symbolic execution can still perform precise analysis elsewhere in your program. Alternatively, symbolic execution could refine types based on flow-sensative behavior, allowing a flow-insensitive refined type system to be more precise in its analysis.
The high-level idea is pretty straightforward, so here are a couple examples of what this could look like:
MIX(symbolic) {
let div =
fun (x:int) ->
fun (y:int) ->
if y == 0
then false
else x / y in
MIX(typed) { 10 + MIX(symbolic) { div 7 0 } } }
In the above, we try to reject the classic divide-by-0 mistake everyone always talks about. Simple type systems have no knowledge of values, and cannot catch this error statically. Here, we wrap the application of div
in a symbolic block, symbolically performing the application and recognizing that y == 0
, yielding a false
value. When we return to type check the addition 10 + ...
, we know we are trying to add an int
to a bool
, and we fail the type checker! Notice how the definition of div
obviously doesn’t type-check under any sane type system because the then
and else
branches differ in type. MIXY’s presentation of symbolic execution forks on conditionals, accumulating path conditions on code reachability, so that later on we know under what the return type of div
should be and under what conditions.
let val =
MIX(symbolic) {
if true
then
MIX(typed) { 0 }
else
MIX(typed) { 10 + false }
} in
val
In this example, we have a type error in code that is unreachable at runtime. Everything about this is obviously bad practice, but more subtle examples of this behavior silently live in untyped code all over the place. We know that 10 + false
should fail to type check, so let’s wrap it in a typed block. Since we symbolically evaluate the entire if
-statement, we will only perform typechecking on the else
branch if the path condition is feasible. In this way, we can eliminate analyzing unreachable code that would otherwise cause our analysis to fail. We can use variations of this to ensure that certain conditions in our program are never met, over all inputs.
That’s it for the basic examples I’ve conjured up (you can test them in my prototype!). Check out what I’ve implemented on github, along with my notes on some issues I had regarding implementing the formal system presented in the paper, and with the paper itself. Despite my issues with it, the paper authors implemented a substantial prototype of thier system to check non-null pointer dereferences in C, using logically qualified type inference. The paper contains more concrete examples, and some implementation issues of their own that I sort of glossed over in my prototype.
Functors, briefly
I also want to talk about a couple OCaml things that OCaml people do and you can too.
A ‘functor’ is one of the many functional programming idioms I feel are too often described in dense, mathematical terms for no reason at all.
Functors are simply modules parameterized on the signature of another module. For anyone from an OOP background, you can just think of it as a class parameterized by the interfaces necessary to define it, rather than inheriting from those interfaces. Pretending like OCaml doesn’t have an object system, this is how you implement generic modules that expect to be abale to call certain functions defined to be applied to some parameterized type.
(* Set.Make : functor (Comparable:COMPARABLE) -> sig
* type t
* type elt = Comparable.t
* ...
* val add : t -> elt -> t
* ...
* end *)
module C : COMPARABLE = struct
type t = int
let compare (x:t) (y:t) = Pervasives.compare x y
end
Set.Make(C)
In the above example, the Set.Make
functor is parameterized by a module whose signature defines a type t
and a binary function compare
that operates on two t
s. Throughout the implementation of the Set.Make
module, we can make use of values of type Comparable.t
, aliased to the type elt
, and can apply Comparable.compare
to perform traditional set operations, such as determining if a set contains an element before adding the element to it.
Mutually Recursive Functors
One of the key insights to the MIXY system is its agnosticism towards the type and symbolic analyses it integrates. Type checking and symbolic execution only interact when the analysis crosses the boundary between the two. The ‘mixing rule’ semantics given in the paper translate between type environments and symbolic state, retaining sufficient information to progress in the analysis soundly.
In my implementation, we mix a symbolic execution based off the operational semantics given in the paper, with a standard type system for a language akin to STLC. I finally found a use for mutually recursive functors in OCaml! The signatures for my typechecking and symbolic execution modules are mutually recursive, in the same way you can define mutually recursive even/odd functions. Here’s the concrete application of these functors in my prototype:
module rec T : Analyses.TYP = Typecheck.Make(SE)
and SE : Analyses.SYM = Symbolic_interp.Make(T)
Notice the use of rec
and and
keywords just like mutually recursive function definitions. It’s the same idea. It’s kind of cool that it’s more or less an intuitive implementation of the system’s semantics. I don’t think I ever came across an intuitive need for mutually recursive classes during my C++ days…
Cyclic build dependencies become something you have to think about with mutually recursive functors though. You’ll notice the signatures TYP and SYM are nested inside the Analyses compilation unit. Since the signatures of typechecking and symbolic execution need to know about each other, they must be colocated in the same file, or they would otherwise cause cyclic build dependencies and be rejected by the compiler. This breaks down the ability to compartmentalize your code according to files somewhat, but I think that’s better than the alternative of losing code reuse by squashing both implementations into a single module.