13 Advanced Features
The features in this section are largely independent of the rest of
Cyclone. It is probably safe to skip them when first learning the
language, but it is valuable to learn them at some point because they
add significant expressiveness to the language.
13.1 Existential Types
The implementation of a struct type can have
existentially bound type variables (as well as region
variables, tag variables, and so on). Here is a useless example:
struct T { <`a> `a f1; `a f2; };
Values of type struct T have two fields with the same (boxed)
type, but there is no way to determine what the type is. Different
values can use different types. To create
such a value, expressions of any appropriate type suffice:
struct T x = T{new 3, new 4};
Optionally, you can explicitly give the type being used for
`a:
struct T x = T{<int*@notnull> new 3, new 4};
As with other lists of type variables, multiple existentially bound
types should be comma-separated.
Once a value of an existential type is created, there is no way to
determine the types at which it was used. For example,
T{"hi","mom"} and T{8,3} both have type
struct T.
The only way to read fields of a struct with existentially
bound type variables is pattern matching. That is, the
field-projection operators (. and ->) will
not type-check. The pattern can give names to the correct
number of type variables or have the type-checker generate names for
omitted ones.
Continuing our useless example, we can write:
void f(struct T t) {
  let T{<`b> x,y} = t;
  x = y;
}
We can now see why the example is useless; there is really nothing
interesting that f can do with the fields of t. In
other words, given T{"hi","mom"}, no code will ever be
able to use the strings "hi" or "mom". In any case,
the scope of the type `b is the same as the scope of the
variables x and y. There is one more restriction:
For subtle reasons, you cannot use a reference pattern (such as
*x) for a field of a struct that has existentially
bound type variables.
Useful examples invariably use function pointers. For a realistic
library, see fn.cyc in the distribution. Here is a smaller (and
sillier) example; see the following two sections for an
explanation of why the regions(`a) <= `r stuff is necessary.
int f1(int x, int y) { return x+y; }
int f2(string x, int y) { printf("%s",x); return y; }
struct T<`r::E> {<`a> : regions(`a) <= `r
  `a f1;
  int f(`a, int);
};
void g(bool b) {
  struct T<`H> t;
  // pack: the hidden type `a is int in one branch and string in the other
  if(b)
    t = T{37,f1};
  else
    t = T{"hi",f2};
  // unpack: `b names the hidden type for the rest of the block
  let T{<`b> arg,fun} = t;
  `b x = arg;
  int (*f)(`b,int) = fun;
  f(arg,19);
}
We could replace the last three lines with fun(arg,19); the
compiler would figure out all the types for us. Similarly, the
explicit types above are for the sake of explanation; in practice, we tend
to rely heavily on type inference when using these advanced typing
constructs.
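For readers coming from C, the following sketch shows the idiom that this
existential struct makes type-safe. It is plain C, not Cyclone: the hidden
type `a becomes void *, and nothing checks that the stored argument and the
stored function agree. The names closure, add_int, add_len, call_closure and
g_example are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Plain-C analogue of struct T: a hidden argument plus a function that
   consumes it. void * stands in for the existential `a; unlike Cyclone,
   C cannot check that the two fields agree. */
struct closure {
    void *arg;
    int (*fn)(void *arg, int y);
};

/* Expects arg to point at an int. */
int add_int(void *arg, int y) {
    return *(int *)arg + y;
}

/* Expects arg to point at a NUL-terminated string. */
int add_len(void *arg, int y) {
    return (int)strlen((const char *)arg) + y;
}

/* The only thing a client can usefully do: hand the hidden argument
   back to the function packaged with it. */
int call_closure(struct closure c, int y) {
    assert(c.fn != NULL);
    return c.fn(c.arg, y);
}

/* Mirrors g: pick one of two packages at run time, then use it uniformly. */
int g_example(int b) {
    static int thirty_seven = 37;
    static char hi[] = "hi";
    struct closure c;
    if (b) {
        c.arg = &thirty_seven;
        c.fn = add_int;
    } else {
        c.arg = hi;
        c.fn = add_len;
    }
    return call_closure(c, 19);
}
```

Swapping the two fn assignments still compiles in C and misbehaves at run
time; the corresponding mismatch in the Cyclone version is a type error,
because both fields must be used at the same `a.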
13.2 The Truth About Effects, Capabilities and Effect Subset Constraints
An effect or capability is a set of (compile-time)
region names. We use effect to refer to the region names that
must be ``live'' for some expression to type-check and
capability to refer to the region names that are ``live'' at
some program point. An effect subset constraint indicates that
all region names that appear in one effect qualifier also appear in
another effect qualifier. Each program point has a set of ``known
effect subset relations''.
The intuition is that a program point's capability and subset
relations must imply that an expression's effect describes live
regions, else the expression does not type-check. As we'll see,
default effects for functions were carefully designed so that most
Cyclone code runs no risk of such an ``effect check'' ever failing.
But using existential types effectively requires a more complete
understanding of the system, though perhaps not as complete as this
section presents.
The form of effects or capabilities is described as follows:
- {} is the empty set. An expression having this effect accesses
at most the heap region.
- `r is the set containing the indivisible effect
`r. This effect variable can be instantiated with a set
consisting of one or more region names.
- e1 + e2 is the set containing the effects e1 and e2.
That is, we write + for set union.
- regions(t), where t is a type, is the set
containing all of the region names contained in t and
regions(`a) for all type variables `a contained in
t.
The description of regions(t) appears circular, but in fact
if we gave the definition for each form of types, it would not be.
The point is that regions(`a) is an ``atomic effect'' in the
sense that it stands for a set of regions that cannot be further
decomposed without knowing `a. The primary uses of
regions(t) are described below.
The form of an effect subset constraint is e1 <= e2 where
e1 and e2 are both effects.
We now describe the capability at each program point. On function
entry, the capability is the function's effect (typically the regions
of the parameters and result, but fully described below). In
a local block or a growable-region statement, the capability is the
capability of the enclosing context plus the block/statement's region
name.
The known effect subset relations at a program point are only those
that are mentioned in the type of the function within which the
program point occurs.
We can now describe an expression's effect: If it reads or writes to
memory described by a region name `r, then the effect
contains {`r}. If it calls a function with effect
e, then the effect contains e. Every function
(type) has an effect, but programmers almost never write down an
explicit effect. To do so, one puts ``; e'' at the end of
the parameter list, where e is an effect. For example, we
could write:
`a id(`a x; {}) { return x; }
because the function does not access any memory.
If a function takes parameters of types t1,...,tn and
returns a value of type t, the default effect is simply
regions(t1)+...+regions(tn)+regions(t). In English, the
default assumption is that a function may dereference any pointers it
is passed, so the corresponding regions must be live. In our example
above, the default effect would have been regions(`a). If
the caller had instantiated `a with int*`r, then
with the default effect, the type-checker would require `r to
be live, but with the explicit effect {} it would not.
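The difference between the default and the explicit effect can be modeled by
treating effects as bit sets. This is a sketch in plain C, not Cyclone, and it
does not claim to reflect how the compiler is implemented; the region
numbering and the names effect, default_effect, call_ok and id_call_ok are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An effect or capability is a set of region names; bit i stands for region i. */
typedef uint32_t effect;

enum { REGION_H = 0, REGION_R = 1 };   /* hypothetical numbering: `H and `r */
#define REGION(i) ((effect)1u << (i))
#define EMPTY_EFFECT ((effect)0)

/* Default effect: regions(t1) + ... + regions(tn) + regions(t).
   The caller supplies regions(ti) for each parameter and for the result. */
effect default_effect(const effect *param_regions, size_t n,
                      effect result_regions) {
    assert(param_regions != NULL || n == 0);
    effect e = result_regions;
    for (size_t i = 0; i < n; i++)
        e |= param_regions[i];         /* + is set union */
    return e;
}

/* A call passes the effect check only if every region in the callee's
   effect is live, i.e. the effect is a subset of the capability. */
int call_ok(effect capability, effect callee_effect) {
    return (callee_effect & ~capability) == 0;
}

/* id instantiated at `a = int*`r, called where only the heap is live. */
int id_call_ok(int use_explicit_empty_effect) {
    effect params[] = { REGION(REGION_R) };      /* regions(int*`r) = {`r} */
    effect dflt = default_effect(params, 1, REGION(REGION_R));
    effect callee = use_explicit_empty_effect ? EMPTY_EFFECT : dflt;
    effect capability = REGION(REGION_H);        /* `r is not live here */
    return call_ok(capability, callee);
}
```

In this model id_call_ok(0) fails the check because the default effect
mentions `r, while id_call_ok(1) passes because the explicit effect {} asks
for nothing.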
However, dangling pointers can be created only when using existential
types, so the difference is rarely noticed.
By default, a function (type) has no effect subset constraints. That
is, the function assumes no relationship among the effect
variables that appear in its type. Adding explicit subset
relationships enables more subtyping in the callee and imposes more stringent
requirements at the call site (namely that the relationship holds).
We can describe when a capability e and a set of effect
subset relations s imply an effect, although your intuition
probably suffices. A ``normalized effect'' is either {}
or a union of ``atomic effects'', where an atomic effect is either
{`r} or regions(`a). For any effect e1,
we can easily compute an equivalent normalized effect e2.
Now, e and s imply e1 if they imply every
{`r} and regions(`a) in e2. To imply
{`r} (or regions(`a)), we need {`r}
(or regions(`a)) to be in (a normalized effect of)
e, or s to contain some `r <= `r2 such
that e and s imply `r2. This is an informal sketch
rather than a precise definition.
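The implication check can be made concrete with a small model. The code is
plain C rather than Cyclone and is only one reading of the informal rule
above, not the type-checker's algorithm. Atomic effects are numbered bits, and
the names subset, implies and call_checks, as well as the numbering of `H and
regions(`b), are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i stands for atomic effect i: a region name {`r} or an opaque
   regions(`a). A normalized effect is then just a bit set. */
typedef uint32_t effect;
#define ATOM(i) ((effect)1u << (i))

/* One known relation "sub <= sup", where sub is a single atom. */
struct subset { effect sub; effect sup; };

/* Does capability e, together with relations s[0..n-1], imply effect e1?
   Every atom of e1 must be in e, or be bounded by a relation whose
   right-hand side is itself implied. `visiting` records atoms already
   being proved so that cyclic relations terminate. */
static int implies_rec(effect e, const struct subset *s, size_t n,
                       effect e1, effect visiting) {
    for (unsigned i = 0; i < 32; i++) {
        effect atom = ATOM(i);
        if (!(e1 & atom) || (e & atom))
            continue;              /* not required, or directly live */
        if (visiting & atom)
            return 0;              /* a circular justification proves nothing */
        int proved = 0;
        for (size_t k = 0; k < n && !proved; k++)
            if (s[k].sub == atom)
                proved = implies_rec(e, s, n, s[k].sup, visiting | atom);
        if (!proved)
            return 0;
    }
    return 1;
}

int implies(effect e, const struct subset *s, size_t n, effect e1) {
    assert(s != NULL || n == 0);
    return implies_rec(e, s, n, e1, 0);
}

enum { HEAP = 0, REGIONS_B = 1 };  /* hypothetical numbering */

/* Capability {`H}; the effect to establish is regions(`b).
   With the bound regions(`b) <= `H the check succeeds, without it it fails. */
int call_checks(int with_bound) {
    struct subset bound = { ATOM(REGIONS_B), ATOM(HEAP) };
    return implies(ATOM(HEAP), &bound, with_bound ? 1 : 0, ATOM(REGIONS_B));
}
```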
All of these complications are unnecessary except for existential
types, to which we now return. Explicit bounds are usually necessary
to make values of existential types usable. To see why, consider the
example from the previous section, with the struct
declaration changed to remove the explicit bound:
struct T {<`a>
  `a f1;
  int f(`a, int);
};
(So the declaration of t should just have type struct
T.) Now the function call f(arg,19) at the end of
g will not type-check because the (default) effect for
f includes regions(`b), which cannot be established
at the call site. But with the bound, we know that
regions(`b) <= `H, which is sufficient to prove the call
won't read through any dangling pointers.
13.3 Interprocedural Memory Initialization
We currently have limited support for functions that initialize
parameters. If you have a *@notnull1 parameter (pointing into any region),
you can use the attribute __attribute__((initializes(1))) to indicate
that the function initializes through the parameter. The 1 names the
first parameter; use a different number for a parameter in another position.
Obviously, this affects the definite-assignment analysis for the
callee and the call-site. In the callee, we know the parameter is
initialized, but not what it points to. The memory pointed to must be
initialized before returning. Care must be taken to reject this code:
void f(int *@notnull*@notnull x) __attribute__((initializes(1))) {
  // rebinds the local x; the memory the caller passed is never written
  x = new (new 0);
  return;
}
In the caller, the actual argument must point to a known location.
Furthermore, this location must not be reachable from any other actual
arguments, i.e., there must be no aliases available to the callee.
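The contract has a familiar counterpart in the plain-C out-parameter idiom,
sketched below. This is ordinary C rather than Cyclone, so nothing here is
checked by the compiler, and the names init_ok, init_wrong and caller are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fulfils the contract: writes through the parameter before returning. */
void init_ok(int *out) {
    assert(out != NULL);
    *out = 42;
}

/* Breaks the contract in the same way as the rejected Cyclone example:
   it rebinds its local copy of the pointer, so the caller's memory
   is never written. */
void init_wrong(int *out) {
    static int elsewhere;
    out = &elsewhere;
    *out = 42;
}

/* The caller passes the address of a known location with no other aliases. */
int caller(void) {
    int x;           /* deliberately left uninitialized */
    init_ok(&x);     /* x is definitely assigned once this call returns */
    return x;
}
```

C accepts init_wrong without complaint; the point of the initializes
attribute is that Cyclone's definite-assignment analysis must not.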
Two common idioms not yet supported are:
- The parameter is
initialized only if the return value satisfies some predicate; for
example, it is 0.
- The caller can pass NULL, meaning do not initialize through this
parameter.