L - A toy language

Nov 16, 2017 — Part of a series on Static analysis

In this series we will examine a number of static analyses of programs, and how they might be formulated in terms of the changes to programs. Before turning our attention to such analyses, however, we need to define the semantics of the programs under consideration.

The editor that is part of this project puts changes to documents more centrally in the editing experience than traditional editors. In defining a language for static analysis, however, we shall start by focusing on the structure that can be constructed from these changes: an s-expression.

S-expressions are perhaps most well-known for being the basic building block of the syntax of Lisp and its various dialects. In this series we will stay true to that tradition, defining a minimal Lisp as the basis for further discussion. In the below, we shall refer to this language as L. Despite being quite minimal, L is sufficiently complete for actual usage, and for various static analyses to yield interesting results.

The given language is a subset of Scheme. In the interest of brevity the definition below is somewhat imprecise; for a full definition of Scheme the reader is referred to the excellent book by Abelson & Sussman or the accompanying video lectures. A more precise definition of the language under consideration here may be obtained by taking Scheme as defined in those sources, and limiting it to the special forms as defined below.

Operational semantics

In the below we provide an operational semantics, which is to say: the meaning of the language is entirely defined by its behavior in some imaginary machine. In particular, given an s-expression and an environment, we provide a mechanism for evaluation.

S-expressions have been talked about at length in our description of the editor; With regards to environments, it suffices to say two things:

  • Environments bind values to names; after a value has been bound to a name, looking up the name returns the value.
  • Environments may be nested; if a name is looked up in a nested environment and not found there, lookup recursively proceeds in the enclosing environment.

Case-analysis

We define evaluation in terms of a case-analysis on the shape of the s-expression. For list expressions, we use a pattern-matching-like notation for this shape; the reader is expected to be able to either deal with the lack of precision, or look up a more precise definition in the above-mentioned sources. Note that the definition is recursive: some of the mechanisms of evaluation below are defined in terms of further evaluation.

  • an atom which is either a string or a numbers is a value; its evaluation results in this value itself.
  • any other atom is a variable; its evaluation results in the value resulting from looking up this variable in the present environment.
  • (quote «s-expr») is a quoted expression; its evaluation results in the quoted s-expression as-is (i.e., specifically by not further evaluation the quoted s-expression).
  • (quote «predicate» «consequent» «alternative») is an if-expression; its evaluation depends on the value resulting from evaluating the predicate; if this is true, it is the result of evaluating of the conseqeuent, otherwise that of evaluating the alternative.
  • (define «variable-name» «definition») is a definition; its evaluation does not result in a (meaningful) value; however, as an effect of evaluating the result of evaluating the definition is bound to the variable-name in the present environment.
  • (lambda «parameter names» «body ...») is a lambda; it evaluates to a procedure object which contains the parameter names and body. (Note: unlike in Scheme, in which a special form of define is interpreted as a procedure, in L lambda’s can only be defined directly using the associated special form)
  • (begin «... ...») is a sequence; it evaluates each s-expression in turn, the evaluation of the sequence results in the last element’s evaluation.
  • («procedure» «arguments ...») is a procedure application; evaluation results in the value that is arrived at as follows: first, evaluate the procedure and all arguments in the present environment. Create a new (sub-)environment in which each of the procedure’s parameter names is bound to the value of the associated argument. Evaluate the procedure’s body in this environment. This is the result of evaluating the procedure application.

For the sake of simplicity, L has no primitive procedures; this is no problem, since we can always evaluate programs in some “global” environment in which all required primitive procedures are bound to some name.

An example

As an example, given this definition, consider the following s-expression:

(begin
  (define factorial
    (lambda (n)
      (if (= n 0)
        1
        (* n (factorial (- n 1))))))
  (factorial 5)
)

Evaluating this in an environment where the mathematical symbols are bound to procedures expressing their usual meaning yields the value 120.

Conclusions

This concludes the definition of the language L. In the next article in this series we will see a very first example of a static analysis on programs in L, namely the analysis on the forms in such programs.