Model data as pairs

The previous chapter talks about self-calling procedures. In this chapter, I will introduce two new concepts: pairs and closures. A pair is the concept of a two-element data structure which is defined recursively. A closure is a concept where a procedure has a local state.

I will begin by introducing pairs as defined using standard Scheme procedures. Then, I will make pairs using only lambda expressions, which evaluate into procedures. I will link that with closures. Lastly, I will show recursion combined with pairs.

Scheme pairs

In Scheme, a pair is a two-element data structure, defined recursively.

A pair has exactly two elements.
The order matters.
The elements themself can be pairs.

In Scheme a pair is spelled as cons. This magic word is short for construct, which means: construct me a pair.

> cons
$1 = #<procedure cons (_ _)>

So, cons is a Scheme procedure of two arguments. Here are two examples.

> (cons 1 2)
$2 = (1 . 2)
> (cons 1 (cons 2 3))
$3 = (1 2 . 3)

The . dot means that the data structure created is a pair. The first element is left of the dot. The second element is to the right of the dot.

Why are there three elements in $3? This is because the left element is yet another pair. But Scheme merges the inner with the outer pair as to simplify the output and reduce the number of parentheses printed.

You can ask Scheme if what you have is a pair using the pair? procedure that accepts one argument. It returns #t if the argument is a pair and #f otherwise. Do not forget to write the question mark.

> pair?
$4 = #<procedure pair? (_)>
> (pair? 1)
$5 = #f
> (pair? (cons 1 2))
$6 = #t
> (pair? (cons 1 (cons 2 3)))
$7 = #t

There are two more procedures that you have to know. These are the car and the cdr. The strange names are due to historical reasons.

> car
$8 = #<procedure car (_)>
> cdr
$9 = #<procedure cdr (_)>

Procedures car and cdr accept one argument, a pair, and return the first and the second element in the pair, respectively.

> (car (cons 1 2))
$10 = 1
> (cdr (cons 1 2))
$11 = 2

Pairs out of lambda expressions

We have learned everything needed to write our own cons, car, and cdr procedures. This is not how these are implemented in Guile. Rather the purpose here is to show that you can implement cons, car, and cdr without using any data structures at all but only using procedures.

Consider the identity procedure that returns its argument.

(lambda (x) x)

A pair has exactly two elements and so will our identity procedure accept two arguments, denoted by x and y. But now there is a choice of selecting either x or y as the return value.

(lambda (x y) x)
(lambda (x y) y)

Let us add yet another argument, denoted by z. But for z we will say that it is a procedure, which we call with x and y.

(lambda (x y z) (z x y))

Let us try calling this procedure. It takes three arguments. So, we let 1, 2, and (lambda (x y) x), be the arguments.

> ((lambda (x y z) (z x y)) 1 2 (lambda (x y) x))
$1 = 1

Let us try calling this procedure again. This time using (lambda (x y) y).

> ((lambda (x y z) (z x y)) 1 2 (lambda (x y) y))
$2 = 2

This already resembles a pair. However, the cons procedure accepts only two arguments, instead of three. So, we want a procedure that takes 1 and 2, and returns something new, a thing, something which we call a pair. What if the returned pair is a procedure, which accepts one argument. The argument being a procedure that accepts two arguments. Hopefully, this will be clear in a moment.

Let my-cons be a procedure that accepts two arguments, denoted by x and y, and it returns a procedure that accepts one argument, denoted by p. But that argument p must be a procedure that accepts two arguments, because, my-cons invokes p with arguments x and y.

(define (my-cons x y) (lambda (p) (p x y)))

Let us try calling my-cons with 1 and 2.

> (my-cons 1 2)
$3 = #<procedure 7fa03e1e3900 at <unknown port>:6:22 (p)>

Let us try calling this procedure twice. Once with (lambda (x y) x). Once with (lambda (x y) y).

> ((my-cons 1 2) (lambda (x y) x))
$4 = 1
> ((my-cons 1 2) (lambda (x y) y))
$5 = 2

Now we just need to define these two functions as my-car and my-cdr.

> (define (my-car x y) x)
> (define (my-cdr x y) y)
> (my-car (my-cons 1 2))
$6 = 1
> (my-cdr (my-cons 1 2))
$7 = 2

Let us consider the definition of my-cons in more detail, as what is known as a closure.

Closures

To better understand (define (my-cons x y) (lambda (p) (p x y))), suppose we extend define with a lambda expression.

(define my-cons
  (lambda (x y)
    (lambda (p)
      (p x y))))

A procedure call of my-cons evaluates into a lambda expression, for which the evaluator knows what the values of x and y are, as they are given in the procedure call expression. For example, for the procedure call expression (my-cons 1 2), the values of x and y are 1 and 2, respectively.

The returned value is the expression (lambda (p) (p x y)). The evaluator can evaluate this expression, as it knows the values of x and y. Thus, the expression (my-cons 1 2) evaluates as (lambda (p) (p x y)), which evaluates into a procedure.

> (my-cons 1 2)
$1 = #<procedure 7fae4270ba20 at <unknown port>:2:30 (p)>

In computer programming, we call this closure, the idea that a procedure captures the definitions of variables in which it was created. A typical use of closures is to define private variables. Another use is defining new data structures, like the pair.

Recall that a pair may contain other pairs as its elements. This will let you represent lists out of pairs, as will be shown later. For now, the idea that pairs contain pairs is refered to as the closure property of cons.

This idea of closure property is similar to the idea of closure in math algebra. A closure is a property of an operation defined on a set, where performing that operation on elements of the set, always results in a new element belonging to the same set.

For example, consider the set of integers {1, 2, 3, ...}. Addition is a closed operation on this set because if you add any two integers from this set, the result will always be an integer from the same set. However, division is not a closed operation on the set of integers because dividing some integers may result in a fraction.

If you have a set S and an operation o, closure means that for any two elements a and b in S, the result of (o a b) is also an element of S.

Closure is an important property in algebra because it ensures that operations behave predictably and consistently within a set, allowing for the development of mathematical structures and systems.

With all that said, the meaning of closure depends on the context, in computer programming or in math algebra. To distinguish between the two, we often say lexical closures instead of simply closures.

Recursion with cons

Sometimes it is convenient to write recursive procedures that use cons as a tail-call.

Suppose you wish to find all integers within an interval, which is bounded by a low and a high value.

Let iter be a recursive procedure that accepts three numbers, denoted by x, low, and high.

(define (iter x low high)

Procedure iter returns x, when x equals high.

(define (iter x low high)
  (if (equal? x high)
      x

Otherwise, it constructs a pair of x and a procedure call. The invoked procedure is iter itself, with the argument x incremented by one.

(define (iter x low high)
  (if (equal? x high)
      x
      (cons x (iter (+ x 1) low high))))

Here is the trace of iter.

> ,trace (iter 1 1 5)
trace: |  (iter 1 1 5)
trace: |  |  (iter 2 1 5)
trace: |  |  |  (iter 3 1 5)
trace: |  |  |  |  (iter 4 1 5)
trace: |  |  |  |  |  (iter 5 1 5
trace: |  |  |  |  |  5
trace: |  |  |  |  (4 . 5)
trace: |  |  |  (3 4 . 5)
trace: |  |  (2 3 4 . 5)
trace: |  (1 2 3 4 . 5)

Procedure iter is not tail-recursive, because the tail-call is not a self-tail call.

To solve this problem, we have to move cons from the tail-call position. This can be done by adding a new argument r to iter, which will contain the result.

Consider the new procedure iter.

(define iter x low high r)

It returns r, instead of just x, when x equals high.

(define iter x low high r)
  (if (equal? x high)
      r

Otherwise, it calls itself as follows. The argument x is incremented by one. Arguments low and high are not changed. Argument r is replaced by a pair whose first element is x and whose second element is r.

(define (iter x low high r)
  (if (equal? x high)
      r
      (iter (+ x 1) low high (cons x r))))

Here is the trace of the new iter.

> ,trace (iter 1 1 5 0)
trace: |  (iter 1 1 5 0)
trace: |  (iter 2 1 5 (1 . 0))
trace: |  (iter 3 1 5 (2 1 . 0))
trace: |  (iter 4 1 5 (3 2 1 . 0))
trace: |  (iter 5 1 5 (4 3 2 1 . 0))
trace: |  (4 3 2 1 . 0)

There is a major problem and a few minor problems. Let us focus on the major problem which is that the result is reversed.

Let pair-reverse be a recursive procedure that accepts a pair, and an argument r to store the result.

(define (pair-reverse per r)

Procedure ‘pair-reverse’ checks that per is a pair. When argument per is a pair, pair-reverse calls itself.

(define (pair-reverse per r)
  (if (pair? per)
      (pair-reverse (cdr per) (cons (car per) r))
      r))

Here is the trace of pair-reverse.

> ,trace (pair-reverse (iter 1 1 5 0) 0)
trace: |  (pair-reverse (4 3 2 1 . 0) 0)
trace: |  (pair-reverse (3 2 1 . 0) (4 . 0))
trace: |  (pair-reverse (2 1 . 0) (3 4 . 0))
trace: |  (pair-reverse (1 . 0) (2 3 4 . 0))
trace: |  (pair-reverse 0 (1 2 3 4 . 0))
trace: |  (1 2 3 4 . 0)

Now all that is left is to combine iter and pair-reverse within a new procedure, with a suggestive name not-the-best-in-range.

Let not-the-best-in-range be a procedure that accepts two numbers, denoted by low and high.

(define (not-the-best-in-range low high)

Let iter be a recursive procedure that accepts number, denoted by x, and an argument r to store the result.

(define (not-the-best-in-range low high)
  (define (iter per r)

Procedure iter returns r, when x equals high. Otherwise, it calls itself.

(define (not-the-best-in-range low high)
  (define (iter x r)
    (if (equal? x high)
        r
        (iter (+ x 1) (cons x r))))

What is different from before is that iter no longer has low and high as arguments, since they are constant during recursion. We also need to include pair-reverse.

(define (not-the-best-in-range low high)
  (define (iter x r)
    (if (equal? x high)
        r
        (iter (+ x 1) (cons x r))))
  (define (pair-reverse per r)
    (if (pair? per)
        (pair-reverse (cdr per) (cons (car per) r))
        r))

And include the first call to iter without forgeting to reverse.

(define (not-the-best-in-range low high)
  (define (iter x r)
    (if (equal? x high)
        r
        (iter (+ x 1) (cons x r))))
  (define (pair-reverse per r)
    (if (pair? per)
        (pair-reverse (cdr per) (cons (car per) r))
        r))
  (pair-reverse (iter low 0) high))

Let us call not-the-best-in-range from one to five.

> (not-the-best-in-range 1 5)
(1 2 3 4 . 5)

Prev Index Next