Correctness and testing

If we take the long view, programmers have–to put it gently–a pretty mixed track record vis-a-vis bugs; writing bug-free software is really hard! At the same time, it’s quite important to ensure that compilers are correct. If your compiler has a bug, it could impact anyone writing programs in your language and is likely to be very hard for end-users to track down. So: what can we do to make sure that our software is correct?

We have a number of options, all of which have been applied (to varying degrees of success) to compilers:

Hand-written correctness arguments
Testing on particular programs
Testing different compilers against each other
Formal verification

For this class, our approach is going to be as follows. As we grow our language, we’re going to add features both to our compiler and to a definitional interpreter. This is an interpreter for our language where we don’t care about performance at all; all we care about is that it’s correct. Indeed, this interpreter will serve as the definition of our language.

Here’s such an interpreter for the language we have so far:

let rec interp_exp (exp : s_exp) : int =
  match exp with
  | Num n -> n
  | Lst [Sym "add1"; arg] ->
    (interp_exp arg) + 1
  | Lst [Sym "sub1"; arg] ->
      (interp_exp arg) - 1
  | _ -> raise (BadExpression exp)

let interp (program : string) = string_of_int (interp_exp (parse program))

Notice that our interpreter has a pretty similar structure to our compiler. We’ll see this throughout the semester.

Now that we have an interpreter, we can test our compiler against it:

let difftest (examples : string list) =
  let results = List.map (fun ex -> (compile_and_run ex, interp ex)) examples in
  List.for_all (fun (r1, r2) -> r1 = r2) results

let test () =
  difftest ["43"; "(add1 (add1 3))"; "(sub1 4)"]