Cyclone: Definite Assignment

Definite Assignment

It is unsafe to allow memory to be used as a value of a particular type just because the memory has been allocated at that type. In other words, you cannot use memory that has not been properly initialized. Most safe languages enforce this invariant by making allocation and initialization a single operation. This solution is undesirable in Cyclone for at least two reasons:

Many idioms require declaring variables in a wider scope than is convenient for initializing the variable.
C code, which we wish to port to Cyclone, is full of separated allocation and initialization, including all heap-allocated storage (i.e., malloc).

Inspired by Java’s rules for separate declaration and initialization of local variables, Cyclone has a well-defined, sound system for checking that memory is written before it is used. The rules are more complicated than in Java because we support pointers to uninitialized memory, as is necessary for malloc, and because C’s order-of-evaluation is not completely specified.

Here we begin with idioms that the analysis does and does not permit. With a basic sense of the idea, we expect programmers can generally not worry about the exact rules of the analysis. However, when the compiler rejects code because memory may be uninitialized, the programmer needs to know how to rewrite the code in order to pass the analysis. For this reason, we also give a more complete description of the rules.

We begin with examples not involving pointers. If you are familiar with Java’s definite assignment, you can skip this part, but note that struct and tuple fields are tracked separately. So you can use an initialized field before another field of the same object is initialized. (Java does not allow separate allocation and initialization of object fields. Rather, it inserts null or 0 for you.)

Finally, we do allow uninitialized numeric values to be accessed. Doing so is dangerous and error-prone, but does not compromise type safety, so we allow it.

The following code is accepted:

  extern int maybe();
  int f() {
    int *x, *y, *z;
    if(maybe())
      x = new 3;
    else
      x = new 4;
    while(1) {
      y = x;
      break;
    }
    if(z = new maybe() && maybe() && q = new maybe())
      return q;
    else
      return z;
  }

In short, the analysis checks that every control-flow path between a variable’s declaration and use includes an assignment to the variable. More generally, the analysis works on memory locations, not just variables. The analysis knows that loop bodies and conditional branches are only executed if the value of certain expressions are 0 or not 0.

The following code is safe, but is not accepted:

  extern int maybe();
  int f() {
    int * x = new 1;
    int * y;
    int b = maybe();
    if(b)
      y = 2;
    if(b)
     return y;
   return 0;
  }

The problem is that the analysis does not know that the second if-guard is true only if the first one is. General support for such “data correlation” would require reasoning about two different expressions at different times evaluating to the same value.

Unlike Java, Cyclone supports pointers to uninitialized memory. The following code is accepted:

  extern int maybe();
  int f() {
    int * x;
    int * z;
    int ** y;
    if(maybe()) {
      x = new 3;
      y = &x;
    } else {
      y = &z;
      z = new 3;
    }
    return *y;
  }

The analysis does not know which branch of the if will be taken, so after the conditional it knows that either “x is initialized and y points to x” or “z is initialized and y points to z.” It merges this information to “y points to somewhere initialized,” so the function returns an initialized value, as required. (It is safe to return uninitialized ints, but we reject such programs anyway.)

However, this code is rejected even though it is safe:

  extern int maybe();
  int f() {
    int * x;
    int * z;
    int ** y;
    if(maybe()) {
      y = &x;
    } else {
      y = &z;
    }
    x = new 3;
    z = new 3;
    return *y;
  }

The problem is that the analysis loses too much information after the conditional. Because y may allow (in fact, does allow) access to uninitialized memory and the analysis does not know exactly where y points, the conditional is rejected.

A compelling use of pointers to uninitialized memory is porting C code that uses malloc, such as the following (the cast is not necessary in Cyclone):

  struct IntPair { int x; int y; };
  struct IntPair * same(int z) {
    struct IntPair * ans =
          (struct IntPair *)malloc(sizeof(struct IntPair));
    ans->x = z;
    ans->y = z;
    return ans;
  }

There is limited support for passing a pointer to uninitialized memory to a function that initializes it. See Interprocedural Memory Initialization.

Certain expression forms require their arguments to be fully initialized (that is, everything reachable from the expression must be initialized) even though the memory is not all immediately used. These forms are the expression in “let p = e” and the argument to switch. We hope to relax these restrictions in the future.

You should now know enough to program effectively in Cyclone without immediately initializing all memory. For those wanting a more complete view of the language definition (i.e., what the analysis does and does not accept), we now go into the details. Note that the analysis is sound and well-specified–there is never a reason that the compiler rejects your program for unexplainable reasons.

For each local variable and for each program point that allocates memory, the analysis tracks information about each field. We call each such field a place. For example, in this code:

  struct B { int * x; $(int*,int*) y;};
  void f() {
    struct B b;
    struct B * bp = malloc(sizeof(B));
    ...
  }

the places are b.x, b.y[0], b.y[1], bp, <1>.x, <1>.y[0], and <1>.y[1] where we use <1> to stand for the malloc expression (a program point that does allocation). An initialization state can be “must point to P” where P is a path. For example, after the second declaration above, we have “bp must point to <1>.” An ensuing assignment of the form “bp->x = new 3” would therefore change the initialization state of <1>.x. If there is not a unique path to which a place definitely points, then we keep track of the place’s initialization level and escapedness. A place is escaped if we do not know exactly all of the places that must point to it. For example, both of the following fragments would cause all the places starting with <1> to be escaped afterwards (assuming bp must point to <1>):

  struct B * bp2;                some_fun(bp);
  if(maybe())
     bp2 = bp;

Note that if “p must point to P,” then p is implicitly unescaped because we cannot know that p points to P if we don’t know all the pointers to p. The initialization level is either None or All. All means p and everying reachable from p (following as many pointers as you want) is initialized.

Note that our choice of tracking “must point to” instead of “must alias” forces us to reject some safe programs, such as this one:

  int f() {
    int * x, int *y;
    int **p1;
    if(maybe())
     p1 = &x;
    else
     p1 = &y;
    *p1 = new 7;
    return *p1;
  }

Even though p1 has not escaped, our analysis must give it initialization-level None. Moreover, x and y escape before they are initialized, so the conditional is rejected.

For safety reasons, once a place is escaped, any assignment to it must be a value that is fully initialized, meaning everything reachable from the value is initialized. This phenomenon is why the first function below is accepted but not the second (the list_t typedefs is defined in the List library):

  list_t<`a,`H> copy(list_t<`a> x) {
    struct List *@notnull result, *@notnull prev;

    if (x == NULL) return NULL;
    result = new List{.hd=x->hd,.tl=NULL};
    prev = result;
    for (x=x->tl; x != NULL; x=x->tl) {
      struct List *@notnull temp = malloc(sizeof(struct List));
      temp->hd = x->hd;
      temp->tl = NULL;
      prev->tl = temp;
      prev = temp;
    }
    return result;
  }

  list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a> x) {
    struct List *@notnull result, *@notnull prev;

    if (x == NULL) return NULL;
    result = rnew(r2) List{.hd=x->hd,.tl=NULL};
    prev = result;
    for (x=x->tl; x != NULL; x=x->tl) {
      prev->tl = malloc(sizeof(struct List));
      prev->tl->hd = x->hd;
      prev->tl->tl = NULL;
      prev = prev->tl;
    }
    return result;
  }

In the for body, we do not know where prev must point (on the first loop iteration it points to the first malloc site, but on ensuing iterations it points to the second). Hence prev->tl may be assigned only fully initialized objects.