Lecture 2: Pattern Binding (plus scope checking)

Modules referenced in this lecture

Overview and Goals

In Lecture 1 we represented terms in a simple language using well-scoped de Bruijn indices and implemented a big-step call-by-value evaluator.

But, do we need to do all of this work every time? Can library support help?

And, will this approach work for more sophisticated languages, which might include, say, pattern matching?

Additionally, when working on a real implementation, we want to do more with our AST besides evaluate it. For example, we’d like to be able to parse and pretty-print our terms, just to make them easier to work with.

While full parsing and pretty-printing are out of scope (heh) for this tutorial, we do want to consider how well-scoped terms interact with these operations. In particular, if a user writes their code with explicit names, we need to ensure that all names are in scope. Similarly, to print code nicely, we want to preserve the names that the user originally wrote.

The goals of this lecture are to:

1. rebound definitions and generic substitution

Last time, we worked with the module Tutorial.Scoped.Syntax and implemented a simple interpreter for the lambda calculus. The rebound library can replace much of this code with imported definitions and derived operations.

The module Tutorial.Scoped.SyntaxScratch is a version of the same file relying on the library for the definitions of the Bind1 and Bind2 types, the Env data structure, and associated operations. It also uses generic programming (i.e. GHC.Generics) to replace the definition of applyE.

The key parts of this example are the instances of the SubstVar and Subst type classes for the Tm AST type. The first class identifies the Var constructor for the Tm type. This allows the library to have a polymorphic definition of idE (of type idE :: SubstVar v => Env v n n).

instance SubstVar Tm where
  var :: Fin n -> Tm n
  var = Var
  
instance Subst Tm Tm where
  applyE :: Env Tm m n -> Tm m -> Tm n
  applyE env (Var x)              = applyEnv env x
  applyE env (Lam b)              = Lam (applyE env b)
  applyE _   Unit                 = Unit
  applyE env (Pair a b)           = Pair (applyE env a) (applyE env b)
  applyE env (Inj i t)            = Inj i (applyE env t)
  applyE env (App f a)            = App (applyE env f) (applyE env a)
  applyE env (MatchUnit a b)      = MatchUnit (applyE env a) (applyE env b)
  applyE env (MatchPair a b) =
    MatchPair (applyE env a) (applyE env b)
  applyE env (MatchSum a b1 b2) =
    MatchSum (applyE env a) (applyE env b1) (applyE env b2)

The Subst class takes two type parameters and overloads the applyE operation. The polymorphic type looks like this:

applyE :: Subst v c => Env v n m -> c n -> c m

The first parameter is used in the type of the environment: this is the type of term that will replace all variables. The second is the type of term that we are substituting into. By separating these two types, we can apply substitution environments not just to terms, but also to types that contain terms. For example, the rebound library includes instances for Bind1 and Bind2, simplifying the definition above. There is no need to use the up operation at the binders; those instances do that automatically.

However, applyE also has a generic default definition in the Subst type class. We can replace the above definition of applyE with the shorter implementation of isVar that identifies the variable constructor in the type.

instance Subst Tm Tm where
  isVar :: Tm n -> Maybe (Tm :~: Tm, Fin n)
  isVar (Var x) = Just (Refl, x)
  isVar _ = Nothing

The variable case is the only case in the definition of applyE that does anything interesting. All other cases do structural recursive calls. Note that isVar also includes a small proof of correctness.

isVar :: Subst v c => c n -> Maybe (v :~: c, Fin n)

Not only does it need to return the index of the variable, but it also must produce a proof that the two type parameters in the Subst class are the same. We can use Refl in this case because Tm is the same as Tm. But if we were substituting into some other type c, not the same as Tm, then we couldn’t pick out a variable constructor: it would be unsound to replace that variable with a Tm.

Once we define isVar, we can use the default definition of applyE. There is no need to implement this recursion by hand. It is our choice whether we want to do so or not.
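To see why the equality proof matters, here is a minimal sketch that does not use rebound. It assumes a simplified, unindexed version of the class (plain Int variables instead of Fin n), and replaceVar is a hypothetical helper, not a library function. The point is only that matching on Refl is what lets us return a v where a c is expected.

```haskell
{-# LANGUAGE GADTs, MultiParamTypeClasses, ScopedTypeVariables, TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))

-- Simplified stand-in for the library class: terms are unindexed and
-- variables are plain Ints (an assumption made only for this sketch).
class Subst v c where
  isVar :: c -> Maybe (v :~: c, Int)

data Tm = Var Int | App Tm Tm
  deriving (Eq, Show)

instance Subst Tm Tm where
  isVar (Var x) = Just (Refl, x)
  isVar _       = Nothing

-- Hypothetical helper: if the term is a variable, look it up in the
-- environment; otherwise return the term unchanged.
replaceVar :: forall v c. Subst v c => (Int -> v) -> c -> c
replaceVar env t =
  case (isVar t :: Maybe (v :~: c, Int)) of
    Just (Refl, x) -> env x   -- matching on Refl brings v ~ c into scope
    Nothing        -> t
```

Without the Refl match, the branch env x would have type v rather than c and the definition would be rejected.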

The rebound library can do more than the simple pattern matching shown in that file. In Tutorial.Scoped.Syntax we extend the language with deep pattern matching. This allows us to write nested patterns, eliminated with a single form of case expression.

However, before we go into what is required for this extension, let’s talk a bit about parsing, pretty-printing, and scope-checking first.

2. Parsing and pretty-printing scoped syntax

The module Tutorial.Scoped.ScopeCheck provides a parser and pretty printer for well-scoped abstract syntax trees defined in Tutorial.Scoped.Syntax.

However, so that we can focus only on scoping, these operations are broken up into two steps, through the use of a parallel named AST.

In other words, to parse, we divide the work into parsing raw strings into a representation that uses strings for variable names, and a separate projection function that performs scope checking and produces well-scoped syntax trees.

         parse                      project
String ------------> Named Syntax --------------> Scoped Syntax 

In this process, two sorts of failures could occur: perhaps the string doesn’t parse, or perhaps one of the variables is out of scope. This process also resolves shadowing, where the name of one variable hides another.
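The projection step can be sketched without the library. The version below is a simplification under stated assumptions: indices are plain Ints rather than Fin n, the language has only variables, lambdas and applications, and the names Named, Scoped and project are hypothetical. It shows the two behaviours mentioned above: an unbound name makes projection fail, and the most recent binding of a name wins.

```haskell
-- Named syntax: variables are strings.
data Named = NVar String | NLam String Named | NApp Named Named

-- Nameless syntax: de Bruijn indices as plain Ints. Scoping is not
-- enforced by the types here; that is a simplification for this sketch.
data Scoped = Var Int | Lam Scoped | App Scoped Scoped
  deriving (Eq, Show)

-- The context maps each name in scope to its current index.
project :: [(String, Int)] -> Named -> Maybe Scoped
project ctx (NVar x) = Var <$> lookup x ctx   -- Nothing if x is unbound
project ctx (NLam x b) =
  -- Entering a binder: the new variable gets index 0 and every older
  -- variable is shifted by one. Because lookup returns the first match,
  -- an inner binding of x shadows any outer one.
  Lam <$> project ((x, 0) : [(y, i + 1) | (y, i) <- ctx]) b
project ctx (NApp f a) = App <$> project ctx f <*> project ctx a
```

For instance, projecting \x. \x. x in the empty context resolves the occurrence of x to the inner binder, while projecting a lone free variable fails.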

The inverse of parsing is pretty printing. For clarity, as above we do this in two stages. First we inject the scoped syntax into the named syntax, replacing all indices with strings. Then we use a pretty printer for the named syntax.

               inject                pretty-print
Scoped Syntax -------> Named Syntax -------------> String

Below, and in the ScopeCheck module, we use S to refer to the Tutorial.Scoped.Syntax module and N to refer to the Tutorial.Named.Syntax module.

3. Remembering user supplied names

Where do the names come from during pretty printing? Of course, we can just make up names, making sure that we always use new ones. However, that can lead to confusion—we’d like to keep any user-supplied names if possible.

Therefore, we use a simple type, called LocalName, to remember such names for printing.

When a user writes \x. x and we scope-check it, we produce:

S.Lam (S.bind (S.LocalName "x") (S.Var FZ))

This way, the string "x" is stored inside the binder as a LocalName. When later printed, the output reads \ x. x.
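The opposite direction, from indices back to names, can be sketched in the same simplified setting. This is an illustration only: indices are plain Ints, each Lam carries the suggested name as a String, and inject and fresh are hypothetical names. One detail worth showing is that a stored name cannot always be reused as-is. If it is already in scope, reusing it could capture a reference to the outer variable, so the sketch appends primes until the name is unused.

```haskell
data Named = NVar String | NLam String Named | NApp Named Named
  deriving (Eq, Show)

-- Each binder remembers the name the user wrote.
data Scoped = Var Int | Lam String Scoped | App Scoped Scoped

-- Pick a variant of the suggested name that is not already in scope.
fresh :: [String] -> String -> String
fresh used x
  | x `elem` used = fresh used (x ++ "'")
  | otherwise     = x

-- The list holds the printed names of the variables in scope,
-- innermost first, so index i is printed as the i-th entry.
inject :: [String] -> Scoped -> Maybe Named
inject names (Var i)
  | i >= 0 && i < length names = Just (NVar (names !! i))
  | otherwise                  = Nothing   -- ill-scoped index
inject names (Lam x b) = NLam x' <$> inject (x' : names) b
  where
    x' = fresh names x
inject names (App f a) = NApp <$> inject names f <*> inject names a
```

With well-scoped indices of type Fin n the failure case for variables cannot arise, which is one practical benefit of the typed representation.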

A local name is just a wrapper for a string, but we want to make sure that the strings do not interfere with alpha-equivalence.

newtype LocalName = LocalName { name :: String }

instance Eq LocalName where
    _ == _ = True   -- all LocalNames are equal!

The deliberately trivial Eq instance means that two binders with different user names are still considered equal as long as their bodies are equal under de Bruijn comparison. This gives the correct notion of α-equivalence for free:

--  λ x. x
t1 = S.Lam (S.bind (S.LocalName "x") (S.Var FZ))
--  λ y. y
t2 = S.Lam (S.bind (S.LocalName "y") (S.Var FZ))
-- >>> t1 == t2
-- True

The Pat.Bind interface

In Tutorial.Scoped.Syntax, the single-binder type is Pat.Bind Tm Tm LocalName n, using a type defined in the rebound library. For convenience, we abbreviate this type as Bind1 n.

This example is a specific instance of pattern binding: the pattern is a single local name, stored alongside the body of the binder. The abstract type enforces access to the body only through smart constructors and accessors.

Function      Type                              Description
bind          LocalName -> Tm (S n) -> Bind1 n  package a body under a binder
getPat        Bind1 n -> LocalName              retrieve the stored name
getBody       Bind1 n -> Tm (S n)               access the body of the binder
instantiate1  Bind1 n -> Tm n -> Tm n           open the binder by substituting a term

However, pattern binding is a general construct and these operations are more generic than the types listed in the table above.

4. General pattern binding

In our simple language (from module Tutorial.Scoped.SyntaxScratch), we had separate constructors for pattern matching unit, pair and sum values. We would like to combine these into a single Match expression that allows nested patterns. This requires a pattern datatype and a way to bind the variables introduced by a pattern in the branch body.

The Pat and Branch datatypes

data Pat (m :: Nat) where
    PVar  :: LocalName -> Pat N1
    PUnit :: Pat N0
    PPair :: Pat m1 -> Pat m2 -> Pat (m2 + m1)
    PInj  :: Int -> Pat m -> Pat m

In the type Pat m, the index m is the number of variables bound by the pattern, tracked at the type level. PVar binds one variable, PUnit binds none, and PPair binds the sum of its sub-pattern counts. Note the order in the type of PPair: variables from the right sub-pattern (m2) are innermost (have the smaller de Bruijn indices), so they come first in the sum m2 + m1. PInj is a tag for injection patterns and passes the binding count through unchanged.

data Branch (n :: Nat) where
    Branch :: Pat.Bind Tm Tm (Pat m) n -> Branch n

Bind Tm Tm (Pat m) n is the pattern-binding abstraction provided by Rebound.Bind.Pat. For simplicity, we use the type abbreviation BindP m n to stand for this type.

This type pairs a pattern of type Pat m with a body of type Tm (m + n), where the body’s first m free variables are the ones the pattern binds. The existential over m is hidden inside Branch, so callers do not need to know the pattern’s arity statically.

As above, we can use the same operations for working with pattern binders, but this time the operations have the following types:

Function     Type                              Description
bind         Pat m -> Tm (m + n) -> BindP m n  construct a branch
getPat       BindP m n -> Pat m                extract the pattern
getBody      BindP m n -> Tm (m + n)           extract the body
instantiate  BindP m n -> Env Tm m n -> Tm n   open by substituting an environment

The Sized instance

To create the general types for these operations, rebound needs to know the number of variables bound in any type used as a pattern. For types such as Pat m, this is easy — we just use m. However, it is less obvious that the type LocalName binds exactly one variable.

Therefore, rebound uses the Sized class to calculate this information, both in types and also dynamically, using a singleton.


instance Sized LocalName where
    type Size LocalName = N1
    
    size :: LocalName -> SNat (Size (LocalName))
    size _ = s1


instance Sized (Pat m) where
    type Size (Pat m) = m

    size :: Pat m -> SNat (Size (Pat m))
    size (PVar _)      = s1
    size PUnit         = s0
    size (PPair p1 p2) = sPlus (size p2) (size p1)
    size (PInj _ p)    = size p

The type SNat and type class SNatI provide runtime access to type-level natural numbers. Haskell is not a full-spectrum dependently-typed language, so numbers that appear in types cannot be pattern matched at runtime.

data SNat n where
   SZ :: SNat Z
   SS :: SNatI n1 => SNat (S n1)

The SNatI n constraint acts as an implicit argument, and uses Haskell’s type inference to automatically supply runtime naturals when possible. The operations snat and withSNat convert between implicit and explicit arguments.

>>> :t snat
snat :: SNatI n => SNat n

>>> :t withSNat
withSNat :: SNat n -> (SNatI n => r) -> r
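To make the implicit/explicit conversion concrete, here is a self-contained sketch of how such a class and singleton could be defined. It is modelled on the signatures shown above but is our own reconstruction, not the library source; prev, snatToInt and roundTrip are hypothetical helpers added for the demonstration.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, RankNTypes #-}

data Nat = Z | S Nat

-- The successor case stores no predecessor value; it only records that
-- the predecessor can be supplied implicitly.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNatI n => SNat ('S n)

class SNatI (n :: Nat) where
  snat :: SNat n

instance SNatI 'Z where
  snat = SZ

instance SNatI n => SNatI ('S n) where
  snat = SS

-- Explicit to implicit: matching on the singleton makes the instance available.
withSNat :: SNat n -> (SNatI n => r) -> r
withSNat SZ k = k
withSNat SS k = k

-- Hypothetical helper: recover the predecessor singleton implicitly.
prev :: SNatI n => SNat ('S n) -> SNat n
prev _ = snat

-- Hypothetical helper: convert a singleton to an ordinary Int.
snatToInt :: SNat n -> Int
snatToInt SZ   = 0
snatToInt s@SS = 1 + snatToInt (prev s)

-- Explicit -> implicit -> explicit again.
roundTrip :: SNat n -> SNat n
roundTrip s = withSNat s snat
```

In this sketch, snat at type SNat ('S ('S 'Z)) is built entirely by instance resolution, and snatToInt turns it back into the number 2.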

There are singleton versions of various operations for natural numbers. For example, we can add them:

>>> :t sPlus
sPlus :: SNat n1 -> SNat n2 -> SNat (n1 + n2)

Above, in the definition of size for the Pat type, sPlus (size p2) (size p1) mirrors the type m2 + m1 from the PPair constructor, keeping the runtime value and the type-level index in sync.
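The correspondence between the runtime count and the type index can be checked in a small standalone file. The sketch below makes some simplifying assumptions: it uses a singleton with an explicit predecessor instead of the SNatI style, a closed type family named Plus in place of the library's +, and String in place of LocalName. The structure of size is otherwise the same as above.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

-- Type-level addition, defined by recursion on the first argument.
type family Plus (a :: Nat) (b :: Nat) :: Nat where
  Plus 'Z     b = b
  Plus ('S a) b = 'S (Plus a b)

-- Simplified singleton with an explicit predecessor.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- Runtime addition whose result type mirrors Plus clause by clause.
sPlus :: SNat a -> SNat b -> SNat (Plus a b)
sPlus SZ     b = b
sPlus (SS a) b = SS (sPlus a b)

snatToInt :: SNat n -> Int
snatToInt SZ     = 0
snatToInt (SS n) = 1 + snatToInt n

data Pat (m :: Nat) where
  PVar  :: String -> Pat ('S 'Z)
  PUnit :: Pat 'Z
  PPair :: Pat m1 -> Pat m2 -> Pat (Plus m2 m1)
  PInj  :: Int -> Pat m -> Pat m

-- The argument order of sPlus must match the index of PPair; swapping
-- the arguments would be rejected by the type checker.
size :: Pat m -> SNat m
size (PVar _)      = SS SZ
size PUnit         = SZ
size (PPair p1 p2) = sPlus (size p2) (size p1)
size (PInj _ p)    = size p
```

Here the pattern PPair (PVar "x") (PPair (PVar "y") PUnit) has size two, both in its type and at runtime.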

We can also test them for equality. The (overloaded) testEquality operation has a heterogeneous type and produces a proof of equivalence for its indices when its arguments are equal.

>>> :t testEquality @SNat
testEquality @SNat :: TestEquality SNat => SNat a -> SNat b -> Maybe (a :~: b)

5. Alpha-equivalence for branches and patterns

We cannot derive Eq automatically for Pat or Branch because of the dependent index m.

When comparing branches, intuitively, we want to compare their patterns and their bodies. We would like to define an instance declaration like this:

-- Two branches are equal when their patterns are equal and their 
-- bodies are equal
instance Eq (Branch n) where
  Branch b1 == Branch b2 = 
      getPat b1 == getPat b2 && getBody b1 == getBody b2

However, this instance does not type check. The patterns in the two branches may bind different numbers of variables, so they have different types. We therefore need a heterogeneous equality operation.

testEquality: heterogeneous pattern equality

Instead, we can create an instance of the TestEquality class. This instance compares the patterns for equality and also returns a proof that they bind the same number of variables.

instance TestEquality Pat where
  testEquality :: Pat a -> Pat b -> Maybe (a :~: b)
  testEquality (PVar x) (PVar y) = return Refl
  testEquality PUnit PUnit = return Refl
  testEquality (PInj i p) (PInj j p') | i == j = testEquality p p'
  testEquality (PPair p1 p2) (PPair p1' p2') = do
    Refl <- testEquality p1 p1'
    Refl <- testEquality p2 p2'
    return Refl
  testEquality _ _ = Nothing

Notice that PVar patterns are always considered equal regardless of the stored name—consistent with LocalName’s trivial Eq instance.

The returned proof is exactly what we need to be able to compare the bodies of the two branches with the usual Eq type class.

instance Eq (Branch n) where
  (==) :: Branch n -> Branch n -> Bool
  Branch b1 == Branch b2 = 
      case testEquality (getPat b1) (getPat b2) of
        Just Refl -> getBody b1 == getBody b2
        Nothing -> False
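The same technique can be tried out without rebound. The following sketch is a cut-down model under explicit assumptions: Branch stores the pattern and body directly instead of using Pat.Bind, the body may mention only the pattern's variables, injection patterns are omitted, and String stands in for LocalName. It uses TestEquality and Refl from base.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, StandaloneDeriving,
             TypeFamilies, TypeOperators #-}

import Data.Type.Equality (TestEquality (..), (:~:) (Refl))

data Nat = Z | S Nat

type family Plus (a :: Nat) (b :: Nat) :: Nat where
  Plus 'Z     b = b
  Plus ('S a) b = 'S (Plus a b)

data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

deriving instance Eq (Fin n)

-- A tiny well-scoped term language, indexed by the number of variables in scope.
data Tm (n :: Nat) = Var (Fin n) | Unit | Pair (Tm n) (Tm n)
  deriving Eq

data Pat (m :: Nat) where
  PVar  :: String -> Pat ('S 'Z)
  PUnit :: Pat 'Z
  PPair :: Pat m1 -> Pat m2 -> Pat (Plus m2 m1)

-- Names are ignored, so equal patterns are equal up to renaming.
instance TestEquality Pat where
  testEquality (PVar _) (PVar _) = Just Refl
  testEquality PUnit PUnit       = Just Refl
  testEquality (PPair p1 p2) (PPair q1 q2) =
    case (testEquality p1 q1, testEquality p2 q2) of
      (Just Refl, Just Refl) -> Just Refl
      _                      -> Nothing
  testEquality _ _ = Nothing

-- The number of bound variables m is hidden inside the branch.
data Branch where
  Branch :: Pat m -> Tm m -> Branch

instance Eq Branch where
  Branch p1 b1 == Branch p2 b2 =
    case testEquality p1 p2 of
      Just Refl -> b1 == b2   -- both bodies now have the same index
      Nothing   -> False
```

Two branches that differ only in the names of their pattern variables compare equal, while branches whose patterns have different shapes compare unequal without their bodies ever being inspected.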

6. Evaluating pattern matching

The evaluator in Tutorial.Scoped.Eval provides three reduction strategies and uses pattern matching throughout.

Big-step evaluation (eval)

eval :: Tm Z -> Maybe (Tm Z) evaluates closed terms to values. The interesting cases are application and match:

eval (App m n) = do
    mv <- eval m
    nv <- eval n
    case mv of
        Lam b -> eval (instantiate1 b nv)
        _     -> Nothing
eval (Match e brs) = do
    v <- eval e
    br <- findBranch v brs
    eval br

In particular, in the case of a Match expression, findBranch iterates through the list of branches, comparing the value against each pattern using the patternMatch operation. If a comparison succeeds, patternMatch returns an environment mapping each variable bound in the pattern to a subterm of the matched value.

findBranch :: Tm n -> [Branch n] -> Maybe (Tm n)
findBranch _ [] = Nothing
findBranch e (Branch b : rest) =
    case patternMatch (getPat b) e of
        Just r  -> Just (instantiate b r)
        Nothing -> findBranch e rest

More generally, patternMatch compares a pattern against a value and, on success, returns an environment Env Tm m n: a mapping from the m pattern variables to Tm n values.
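The shape of this operation can be illustrated with an untyped sketch. It is not the code from Tutorial.Scoped.Eval: the environment is an ordinary list indexed by de Bruijn position rather than an Env Tm m n, values are a separate hypothetical Val type, and firstMatch is a simplified stand-in for findBranch that returns the bindings instead of instantiating the body.

```haskell
data Pat = PVar String | PUnit | PPair Pat Pat | PInj Int Pat

data Val = VUnit | VPair Val Val | VInj Int Val
  deriving (Eq, Show)

-- On success, position i of the result is the value bound to index i.
patternMatch :: Pat -> Val -> Maybe [Val]
patternMatch (PVar _) v    = Just [v]
patternMatch PUnit    VUnit = Just []
patternMatch (PPair p1 p2) (VPair v1 v2) = do
  env1 <- patternMatch p1 v1
  env2 <- patternMatch p2 v2
  -- Variables of the right sub-pattern are innermost, so they come first.
  Just (env2 ++ env1)
patternMatch (PInj i p) (VInj j v)
  | i == j = patternMatch p v
patternMatch _ _ = Nothing

-- Try the branches in order and return the first one whose pattern matches.
firstMatch :: Val -> [(Pat, body)] -> Maybe ([Val], body)
firstMatch _ [] = Nothing
firstMatch v ((p, b) : rest) =
  case patternMatch p v of
    Just env -> Just (env, b)
    Nothing  -> firstMatch v rest
```

The order of concatenation in the pair case follows the m2 + m1 convention of the PPair constructor; in the typed version that order is enforced by the index, whereas here it is only a comment.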

7. Historical Notes

Scope checking as a compiler pass. The idea of converting named surface syntax to an internal nameless or index-based form is standard in compiler design. Early Lisp interpreters used association lists (alist) to map symbol names to values at runtime — the direct ancestor of the [(String, Fin n)] context used in projectTmWith. In modern compilers this conversion is a distinct front-end pass, often called name resolution or scope analysis, that runs after parsing and before type checking, and uses a more efficient data structure.

Singleton types and SNatI. To use a type-level natural number n :: Nat at runtime — e.g., to enumerate Fin n — one needs a singleton: a runtime value that mirrors the type. Simulating this in Haskell was described by McBride (“Faking It: Simulating Dependent Types in Haskell”, 2002) and later systematized in the singletons library (Eisenberg and Weirich, “Dependently Typed Programming with Singletons”, 2012). The SNatI typeclass and SNat type used by genTm follow this pattern.


Exercises

1. Tracing projectTmWith. Manually trace through the following call, writing down the association list at each recursive step and the final de Bruijn term produced:

projectTm (N.Lam "x" (N.Lam "y" (N.Var "x")))

Now do the same for:

projectTm (N.Case (N.Var "p")
    [(N.Pair [N.Var "x", N.Var "y"], N.Var "x")])

Which variable maps to FZ inside the body — x or y? Why?


2. Extending the conversions with let. Extend projectTmWith and injectTmWith in Tutorial.Scoped.ScopeCheck to handle a let-expression. Assume you have already added Let :: Tm n -> Bind1 n -> Tm n to Tutorial.Scoped.Syntax and N.Let :: String -> N.Tm -> N.Tm -> N.Tm to the named syntax.