Concepts in Programming Languages, Part II: Statics

创建时间: 2020年3月2日

What are the statics of a programming language? Most programming languages exhibit a phase distinction between static and dynamic phase of processing. People sometime loosey say static of a language happened at "compile-time," and dynamic of a language occurred at "run-time." The static phase of a language contains lexing, parsing, and in the case of static-typed languages, type-checking, and name resolution. And the dynamic phase consists of the execution of a program. We will only focus on static-typed languages in this post since the statics of dynamically-typed languages are trivial.

In the static phase, we consider what the valid operations of a program are. We construct a set of rules on the typing judgments to state if an expression is well-formed for certain types.

Static of a small expression language

Let's consider a pretty dull expression-oriented language below.

Typeτ::=Num ⁣StrExpre::=num[n] ⁣str[s] ⁣plus(e1,e2) ⁣minus(e1,e2) ⁣concat(e1,e2) ⁣len(e)\begin{array}{rcll} \mathbf{Type} & \tau &::= & \text{Num} \\ && \quad\! | & \text{Str} \\ \mathbf{Expr} & e &::= & \text{num}[n] \\ && \quad\! | & \text{str}[s] \\ && \quad\! | & \text{plus}(e_1, e_2) \\ && \quad\! | & \text{minus}(e_1, e_2) \\ && \quad\! | & \text{concat}(e_1, e_2) \\ && \quad\! | & \text{len}(e) \end{array}

The above grammar defined two sorts, Type τ\mathbf{Type} \ \tau and Expr e\mathbf{Expr} \ e. A type in this language can either be number or string. Now it's time to define the inference rules for the derivation of each operation. First, the type of literals are quite obvious

num(n):Num\frac{}{\text{num}(n) : \text{Num}}str(s):Str\frac{}{\text{str}(s) : \text{Str}}

The we can define typing for operations of the language:

e1:Nume2:Numplus(e1,e2):Num\frac{e_1 : \text{Num} \quad e_2 : \text{Num}} {\text{plus}(e_1, e_2) : \text{Num}}

If both the left-hand e1e_1 side and the right-hand e2e_2 side have the type Num\text{Num}, the expression plus(e1,e2)\text{plus}(e_1, e_2) have the type Num\text{Num}. Otherwise, plus(e1,e2)\text{plus}(e_1, e_2) is ill-formed.

We can use the similar way to define the rest of the operations:

e1:Nume2:Numminus(e1,e2):Num\frac{e_1 : \text{Num} \quad e_2 : \text{Num}} {\text{minus}(e_1, e_2) : \text{Num}}e1:Stre2:Strconcat(e1,e2):Str\frac{e_1 : \text{Str} \quad e_2 : \text{Str}} {\text{concat}(e_1, e_2) : \text{Str}}e:Strlen(e):Str\frac{e : \text{Str}} {\text{len}(e) : \text{Str}}

With those basic rules, we can state that in our language, plus(num[1],num[2])\text{plus}(\text{num}[1], \text{num}[2]) is well formed and len(num[1])\text{len}(\text{num}[1]) is a type error.

Typing context

So far, our little language does not have variables. In real programming languages, the type system needs to consider the typing context. Let's introduce variables and let binding into our language:

Expre::= ⁣var(v) ⁣let(v,e1,e2)\begin{array}{rcll} \mathbf{Expr} & e &::= & \cdots \\ && \quad\! | & \text{var}(v) \\ && \quad\! | & \text{let}(v, e_1, e_2) \end{array}

Now we can define variable in our languages such as let(x,num[1],plus(var[x],num[2]))\text{let}(x, \text{num[1]}, \text{plus}(\text{var}[x], \text{num}[2])). In a concrete syntax, the expression looks like

let x = 1;
x + 2 // evaluates to 3

Whether plus(var[x],num[2])\text{plus}(\text{var}[x], \text{num}[2]) make sense depends on whether the variable x is defined in the surrounding context, but our inference rule for plusplus cannot catch that yet.

What you can do is to introduce another concept called typing context Γ\Gamma, which is a mapping from variables to types.

Γ  Γ,v:τ\begin{aligned} \Gamma \equiv& \ \varnothing \\ |& \ \Gamma', v: \tau \end{aligned}

We inductively define Γ\Gamma as either an empty set or the extension of another typing context Γ\Gamma' with one mapping from a variable to a type.

Then you need to change the judgment form of typing to Γe:τ\Gamma \vdash e : \tau, which means "under typing context Γ\Gamma, the type of expression ee is τ\tau."

For most of the rules, nothing exciting happens besides the additional typing context in all judgments.

Γnum(n):Num\frac{}{\Gamma \vdash \text{num}(n) : \text{Num}}Γstr(s):Str\frac{}{\Gamma \vdash \text{str}(s) : \text{Str}}Γe1:NumΓe2:NumΓplus(e1,e2):Num\frac{\Gamma \vdash e_1 : \text{Num} \quad \Gamma \vdash e_2 : \text{Num}} {\Gamma \vdash \text{plus}(e_1, e_2) : \text{Num}}Γe1:NumΓe2:NumΓminus(e1,e2):Num\frac{\Gamma \vdash e_1 : \text{Num} \quad \Gamma \vdash e_2 : \text{Num}} {\Gamma \vdash \text{minus}(e_1, e_2) : \text{Num}}Γe1:StrΓe2:StrΓconcat(e1,e2):Str\frac{\Gamma \vdash e_1 : \text{Str} \quad \Gamma \vdash e_2 : \text{Str}} {\Gamma \vdash \text{concat}(e_1, e_2) : \text{Str}}Γe:StrΓlen(e):Str\frac{\Gamma \vdash e : \text{Str}} {\Gamma \vdash \text{len}(e) : \text{Str}}

For variable name resolution, we need to lookup in the typing context.

Γ,x:τx:τ\frac{} {\Gamma, x : \tau \vdash x : \tau}

For let bindings, we can extend the typing context and check the inference rules recursively.

Γe1:τ1Γ,x:τ1e2:τ2Γlet(x,e1,e2):τ2\frac{ \Gamma \vdash e_1 : \tau_1 \quad \Gamma, x : \tau_1 \vdash e_2 : \tau_2 } {\Gamma \vdash \text{let}(x, e_1, e_2) : \tau_2}

If you want to prevent name shadowing, you can change the rule a little bit.

xΓΓe1:τ1Γ,x:τ1e2:τ2Γlet(x,e1,e2):τ2\frac{ x \notin \Gamma \quad \Gamma \vdash e_1 : \tau_1 \quad \Gamma, x : \tau_1 \vdash e_2 : \tau_2 } {\Gamma \vdash \text{let}(x, e_1, e_2) : \tau_2}