Static Analysis - Introduction

Nov 15, 2017 — Part of a series on Static analysis

In this project, we put changes to computer programs more central in the programming experience. As a result, the ability to do static analyses is lifted from single programs to the full dimension of any program’s construction. This has some useful applications, which are demonstrated in a series of articles.

Static program analysis is the analysis of computer programs that is performed without actually executing programs. Some examples are:

  • identifying variables that are declared, but never used;
  • identifying variables that are used, but never declared;
  • any kind of type-checker.

Such analyses are useful in helping the programmer identify potential errors as soon as possible.

Case in point: running a program with undeclared variables will lead to failure at the point the variable is used, which is why running such programs is ruled out in many (but not all) programming environments.

Let’s see how such analyses relate to the present project. This project is about Expressions of Change: environments in which the changes to programs take a central role, e.g. by having an editor which stores histories of structured changes to documents rather than the resulting documents.

In the context of such an editor, doing such static analyses for each version of the program yields an extra dimension of valuable information: in addition to giving information about what is wrong and where, such step by step analyses provide insight into when something went wrong, in terms of the edit-history of your program. This extra dimension may be quite helpful in answering the most important question, which is the question as to why.

As an example, consider an analysis which identifies (attempted) usage of undeclared variables. A traditional such analysis will point at the location in our program where an unknown variable is used. Doing the same analysis for each version of the program (where every modification of the program yields a new version) will help us pinpoint when the problem was introduced. In other words: it will pinpoint a precise modification of the program. By examining this modification, the question of why may be more readily answered. Examples of possible answers are:

  • the variable was renamed at its usage-location;
  • the variable was renamed at the location of its definition;
  • the location of the variable’s definition was removed entirely;
  • the variable has never been defined.

This principle, of applying a given static analysis on each version of the program as a whole, gives us a new axis of analysis “for free”, namely that along the history of the program’s construction. By “for free”, we mean that we need not write any code: we simply take an existing tool for static analysis and run it at each step of the program’s construction. One might call this principle the naive approach to step by step program analysis.

In addition to this naive approach, we could consider approaches to step-wise static analysis which try to utilize both the structured nature of our editor and the stream of change-operations it produces. Such approaches to static analysis will take as input:

  • a previous version of the program,
  • a finished analysis for that previous version of the program and
  • a single change to the program (or part of it).

There are two important reasons to do so:

The first is that the approach of running a full static analysis for each change to a program, might be “free” in terms of the effort to develop the analyses, but is certainly not free in terms of processing power! When we take this approach the amount of required processing power and time is multiplied by the number of changes we wish to analyze. This likely excludes many interesting use-cases, such as the one of interactive feedback on the output of static analyses.

A second reason to consider a more incremental approach to static analysis is that such analyses, by virtue of their incremental formulation, may occasionally lead to output that is more easily understood by the human end-user of the analysis. In such analyses we spend quite some effort in precisely defining how various changes may affect the output of an analysis, and how information flows through a program, with the goal of reducing the analyses’ runtimes. It turns out this has the fortunate side-effect of saving processing power of the human interpreter of the result as well!

In the past month I have written a number of such incremental static analyses; I will present the results here as a series of articles. In the next article in the series we will define a small language, called L, as a basis for various analyses.