Introduction to Programming, Aug-Dec 2008
Lecture 8, Wed 10 Sep 2008

Outermost reduction and infinite data structures
------------------------------------------------

Outermost reduction permits the definition of infinite data
structures.  For instance, the list of all integers starting at "n"
is given by the function

   listfrom n = n: (listfrom (n+1))

In the definition of listfrom, the outermost expression is the
one involving ":".  This is thus evaluated first, resulting in
the initial "n" being generated.  Haskell then tries to expand
"listfrom (n+1)" which, in turn, generates "n+1" and "listfrom
(n+2)" and so on.  Thus, the output of "listfrom m" is the
infinite list [m, m+1, ...], which is denoted [m..] in Haskell.
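As a quick check, here is a runnable sketch of listfrom. The type signature and main are additions for illustration. Only the prefix that take demands is ever constructed:

```haskell
-- The infinite list n, n+1, n+2, ...
listfrom :: Integer -> [Integer]
listfrom n = n : listfrom (n + 1)

main :: IO ()
main = do
  print (take 5 (listfrom 3))                   -- [3,4,5,6,7]
  print (take 5 (listfrom 3) == take 5 [3 ..])  -- True: same prefix as [3..]
```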

We can use infinite lists, for instance, to define in a natural way
the Sieve of Eratosthenes whose output is the (infinite) list of all
prime numbers.

   sieve (x:xs) = x : (sieve [ y | y <- xs, mod y x > 0])
   primes = sieve [2..]

The function sieve picks up the first number of its input list,
constructs a new list from its input by removing all multiples of
the first number and then recursively runs sieve on this list.
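A runnable sketch of the sieve, assuming Integer elements. The type signatures, the [] equation (never reached when sieving [2..]) and main are additions for illustration:

```haskell
-- Keep the head x, remove its multiples from the rest, sieve the remainder.
sieve :: [Integer] -> [Integer]
sieve (x:xs) = x : sieve [y | y <- xs, mod y x > 0]
sieve []     = []  -- not reached for an infinite input; keeps the match total

primes :: [Integer]
primes = sieve [2 ..]

main :: IO ()
main = print (take 10 primes)  -- [2,3,5,7,11,13,17,19,23,29]
```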

If we work out the reduction for this we get

  primes  ===> sieve [2..]
          ===> 2:(sieve [y | y <- [3..], mod y 2 > 0])
          ===> 2:(sieve (3:[y | y <- [4..], mod y 2 > 0]))
          ===> 2:(3:(sieve [z | z <- [y | y <- [4..], mod y 2 > 0],
                                mod z 3 > 0]))
          ===> 2:(3:(sieve (5:[z | z <- [y | y <- [6..], mod y 2 > 0],
                                   mod z 3 > 0])))
          ===> 2:(3:(5:(sieve [w | w <- [z | z <- [y | y <- [6..],
                                                       mod y 2 > 0],
                                               mod z 3 > 0],
                                   mod w 5 > 0])))
          ===> ...
      
Why is this useful?  It is often conceptually easier to define a
function that returns an infinite list and extract a finite prefix to
get a concrete value.

For instance, from primes we can derive functions such as

   nthprime k = primes !! k

that extracts the k'th prime number, counting from 0: the operator
!! numbers list positions from 0, so nthprime 0 is 2.
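A sketch of nthprime together with one more way of cutting a finite piece out of primes. primesBelow is a hypothetical helper, and the sieve is repeated so that the block runs on its own:

```haskell
sieve :: [Integer] -> [Integer]
sieve (x:xs) = x : sieve [y | y <- xs, mod y x > 0]
sieve []     = []

primes :: [Integer]
primes = sieve [2 ..]

-- (!!) counts positions from 0, so nthprime 0 is the first prime, 2.
nthprime :: Int -> Integer
nthprime k = primes !! k

-- Hypothetical helper: all primes below m. takeWhile is safe here
-- because primes is increasing, so it stops at the first prime >= m.
primesBelow :: Integer -> [Integer]
primesBelow m = takeWhile (< m) primes

main :: IO ()
main = do
  print (nthprime 0, nthprime 4)  -- (2,11)
  print (primesBelow 20)          -- [2,3,5,7,11,13,17,19]
```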

======================================================================

Using infinite lists
--------------------

We look at another example of why infinite lists are a useful
conceptual tool.  Suppose we have a set of cities with direct
flights from some of the cities to some of the other cities.  For
instance, we could represent this situation as follows:

         A ----> B
         |<-     |
         |  \    |
         |   \   v
         |    ---C
         v       |
         D       |
         ^\      |
         | \     |
         |  \    v
         F   --->E
          ------>

This kind of picture is called a graph.  'A', 'B', etc. are called
nodes or vertices and the arrows between them are called edges.
More precisely, this is a directed graph because each edge is
oriented.  In some applications, it helps to use undirected
edges, which are not oriented.

We could represent the edges in this graph as a Haskell function,
as follows:

   edge :: Char -> Char -> Bool
   edge 'A' 'B' = True
   edge 'A' 'D' = True
   edge 'B' 'C' = True
   edge 'C' 'A' = True
   edge 'C' 'E' = True
   edge 'D' 'E' = True
   edge 'F' 'D' = True
   edge 'F' 'E' = True
   edge  _   _  = False

Suppose we now wish to compute the pairs of vertices that are
connected to each other.  Our goal is to construct a function

   connected :: Char -> Char -> Bool

such that connected x y is True if and only if there is a path
from x to y using the given set of edges.

Clearly edge x y implies connected x y.  In general, we can
define connected x y inductively:

   If connected x y and edge y z then connected x z

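Unfortunately, we cannot translate this inductive definition of
connected in the usual direct fashion into a Haskell function: the
graph has cycles, such as A -> B -> C -> A, and a direct recursive
search can go round such a cycle forever without returning an
answer.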
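To see the problem concretely, here is a hypothetical direct translation, connectedNaive (a name chosen for illustration). It applies the inductive rule with the edge at the front of the path (an edge x -> z, then z connected to y), which describes the same relation. It answers queries that succeed, but a query with no path gets stuck in the cycle:

```haskell
-- The graph from the picture above.
edge :: Char -> Char -> Bool
edge 'A' 'B' = True
edge 'A' 'D' = True
edge 'B' 'C' = True
edge 'C' 'A' = True
edge 'C' 'E' = True
edge 'D' 'E' = True
edge 'F' 'D' = True
edge 'F' 'E' = True
edge  _   _  = False

-- Hypothetical direct translation: there is an edge x -> y, or an edge
-- x -> z for some node z from which y is reachable.
connectedNaive :: Char -> Char -> Bool
connectedNaive x y =
  edge x y || or [edge x z && connectedNaive z y | z <- ['A' .. 'F']]

main :: IO ()
main = do
  print (connectedNaive 'A' 'E')  -- True, via A -> B -> C -> E
  print (connectedNaive 'B' 'D')  -- True, via B -> C -> A -> D
  -- Do not try connectedNaive 'A' 'F': no path reaches F, so the search
  -- keeps going round A -> B -> C -> A and never returns.
```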

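Instead, we use the following idea.  We build up the set of paths
inductively, counting the length of a path as the number of nodes
on it.  Initially we have paths of length 0 --- there is only one,
the empty path.  Starting it at any node gives a path of length 1,
and a path of length k can be extended to a path of length k+1 by
adding an edge out of its last node.  We represent a path as a list
of nodes.  Thus, we have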

   type Path = [Char]

   extendpath :: Path -> [Path]
   extendpath p = [p ++ [c] | c <- ['A'..'F'], edge (last p) c]

Now, we can map extendpath over the list of paths of length k to
get the list of paths of length k+1.

   extendall :: [Path] -> [Path]
   extendall [[]] = [[c] | c <- ['A'..'F']]
   extendall l    = [ll | p <- l, ll <- extendpath p]
                    -- equivalently: concat [extendpath p | p <- l]

The first equation of extendall turns the list [[]], which contains
just the empty path, into the paths that consist of a single node,
the start node.  If we start with [[]] and repeatedly apply
extendall, we get lists with longer and longer paths.
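A sketch of the first few steps, with the definitions above written out so that the block runs on its own (step1 and step2 are names chosen here). Since Path is [Char], Haskell prints paths as strings:

```haskell
-- The graph from the picture above.
edge :: Char -> Char -> Bool
edge 'A' 'B' = True
edge 'A' 'D' = True
edge 'B' 'C' = True
edge 'C' 'A' = True
edge 'C' 'E' = True
edge 'D' 'E' = True
edge 'F' 'D' = True
edge 'F' 'E' = True
edge  _   _  = False

type Path = [Char]

-- All one-edge extensions of a non-empty path.
extendpath :: Path -> [Path]
extendpath p = [p ++ [c] | c <- ['A' .. 'F'], edge (last p) c]

-- [[]] (just the empty path) becomes the single-node paths;
-- any other list of paths is extended by one edge.
extendall :: [Path] -> [Path]
extendall [[]] = [[c] | c <- ['A' .. 'F']]
extendall l    = [ll | p <- l, ll <- extendpath p]

main :: IO ()
main = do
  let step1 = extendall [[]]   -- the six single-node paths
      step2 = extendall step1  -- every path with exactly one edge
  print step1  -- ["A","B","C","D","E","F"]
  print step2  -- ["AB","AD","BC","CA","CE","DE","FD","FE"]
```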

The next observation we have is that to check if x and y are
connected, we only need to check for paths without loops from x
to y --- that is, we can assume that the path from x to y does
not visit an intermediate node z twice.  If it did, we can excise
the loop from z to z and get a shorter path that serves our
purpose.  If we have n nodes overall, a loop-free path can have
at most (n-1) edges.  If there are more than (n-1) edges in the
path, some node must repeat.

This suggests that to check all pairs of connected nodes, it is
sufficient to apply extendall n times to the initial list
containing the empty path.

Haskell has a builtin function iterate that works as follows:

   iterate :: (a -> a) -> a -> [a]

That is,

   iterate f x = [x, f x, f (f x), f (f (f x)), ...]
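A small sketch of iterate in action. myIterate is a hand-written equivalent, given only to show how such a function can be defined, not a quotation of the Prelude source:

```haskell
-- A hand-written equivalent of iterate: x, then f x, then f (f x), ...
myIterate :: (a -> a) -> a -> [a]
myIterate f x = x : myIterate f (f x)

main :: IO ()
main = do
  print (take 5 (iterate (* 2) 1))     -- [1,2,4,8,16]
  print (take 5 (myIterate (* 2) 1))   -- [1,2,4,8,16]
  print (take 4 (iterate ('x' :) ""))  -- ["","x","xx","xxx"]
```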

We can now try

   iterate extendall [[]]

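to generate the list we want.  We can then extract the first n+1
elements of this list, the results of applying extendall 0, 1,
..., n times, which between them contain all paths with at most n
nodes, that is, at most n-1 edges:

   firstn = take (n+1) (iterate extendall [[]])

Now, for each path in this list, we extract the start and end
points, skipping the first element [[]], since head and last are
undefined on the empty path.

   connectedpairs  = [(head p, last p) | l <- drop 1 firstn, p <- l]
     where
       firstn = take (n+1) (iterate extendall [[]])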

Finally

   connected x y = (elem (x,y) connectedpairs)

We have used the builtin function "elem x l" that checks if x
belongs to l.
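Putting the pieces together gives the following self-contained sketch for the graph above, with n fixed at 6 as an assumption of this example. In it, firstn keeps the results of applying extendall 0 to n times, and connectedpairs skips the first of these, since head and last are undefined on the empty path:

```haskell
-- The graph from the picture above.
edge :: Char -> Char -> Bool
edge 'A' 'B' = True
edge 'A' 'D' = True
edge 'B' 'C' = True
edge 'C' 'A' = True
edge 'C' 'E' = True
edge 'D' 'E' = True
edge 'F' 'D' = True
edge 'F' 'E' = True
edge  _   _  = False

type Path = [Char]

-- Number of nodes in the graph.
n :: Int
n = 6

extendpath :: Path -> [Path]
extendpath p = [p ++ [c] | c <- ['A' .. 'F'], edge (last p) c]

extendall :: [Path] -> [Path]
extendall [[]] = [[c] | c <- ['A' .. 'F']]
extendall l    = [ll | p <- l, ll <- extendpath p]

-- Results of applying extendall 0, 1, ..., n times.
firstn :: [[Path]]
firstn = take (n + 1) (iterate extendall [[]])

-- (start, end) of every non-empty path; drop 1 skips [[]].
connectedpairs :: [(Char, Char)]
connectedpairs = [(head p, last p) | l <- drop 1 firstn, p <- l]

connected :: Char -> Char -> Bool
connected x y = elem (x, y) connectedpairs

main :: IO ()
main = do
  print (connected 'C' 'B')  -- True, via C -> A -> B
  print (connected 'A' 'F')  -- False: no edge leads into F
  print (connected 'E' 'E')  -- True: the single-node path "E"
```

Unlike the direct recursive attempt, every query here terminates, because connectedpairs is a finite list.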

Notice that we have not bothered about the fact that extendall
generates paths that loop and do other unproductive things.  For
instance, the path ['A','B','C','A','B','C'] belongs to the sixth
iteration of extendall [[]].  But, it does not matter.  All that
we want is a guarantee that every pair (x,y) that is connected is
enumerated by the nth step.

The relation connected that we computed above is the reflexive
and transitive closure of the relation edge.  In general, we can
use the strategy given above to compute the transitive closure of
any binary relation on a set.
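One way to sketch this generalization (the name closure and its argument order are choices made here, not part of the notes) is to pass in the finite set as a list and the relation as a function. To make extension cheap, this version stores each path in reverse, newest node first:

```haskell
-- Reflexive-transitive closure of a relation r on the finite set xs,
-- computed by enumerating paths with up to (length xs) nodes.
closure :: Eq a => [a] -> (a -> a -> Bool) -> a -> a -> Bool
closure xs r x y = (x, y) `elem` pairs
  where
    -- Paths are stored newest node first, so extending is just (:).
    singles   = [[v] | v <- xs]
    extend ps = [z : p | p@(w : _) <- ps, z <- xs, r w z]
    levels    = take (length xs) (iterate extend singles)
    -- last p is the start of a path, head p its end.
    pairs     = [(last p, head p) | ps <- levels, p <- ps]

main :: IO ()
main = do
  let next a b = b == a + 1      -- example relation: b follows a
      nums     = [1 .. 5] :: [Int]
  print (closure nums next 1 5)  -- True: 1 -> 2 -> 3 -> 4 -> 5
  print (closure nums next 5 1)  -- False
  print (closure nums next 3 3)  -- True (reflexive)
```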

Search problems
---------------

An important class of problems consists of search problems, where
there is no closed form for the solution and one must go through
a cycle of expanding out possible solutions and then undoing
partial solutions on reaching a dead end.

A classic problem of this sort is that of placing N queens on an
N x N chessboard such that no two queens attack each other.
Recall that two queens attack each other if they lie on the same
row, column or diagonal.

From the problem description, it is immediate that in any
solution to the problem, there is exactly one queen on each row
(and also on each column).  Thus, one strategy for solving the
problem is the following:

  - Place the first queen on some square of the first row
  - In each succeeding row, place a queen at the leftmost square
    that is not attacked by any of the earlier queens

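If we follow this strategy on an 8 x 8 board after placing the
first queen at the top left corner, we soon reach a dead end: the
next four queens go into columns 3, 5, 2 and 4, after which every
free square in the sixth row is attacked.  Other choices for the
earlier queens can lead to dead ends further down.  For instance,
we may arrive at the following configuration of 7 queens.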

       -------------------------------
      | Q |   |   |   |   |   |   |   |
       -------------------------------
      |   |   | Q |   |   |   |   |   |
       -------------------------------
      |   |   |   |   | Q |   |   |   |
       -------------------------------
      |   |   |   |   |   |   | Q |   |
       -------------------------------
      |   | Q |   |   |   |   |   |   |
       -------------------------------
      |   |   |   | Q |   |   |   |   |
       -------------------------------
      |   |   |   |   |   | Q |   |   |
       -------------------------------
      |   |   |   |   |   |   |   |   |
       -------------------------------

Now, we find that there is no valid position on the last row for
the 8th queen, so we have to abandon this solution and try
another one.  This can be done in a systematic way by retrying
the next possibility for the 7th queen and once again trying the
8th queen.  If all possibilities for the 7th queen fail, we go
back and try the next possibility for the 6th queen.  In this
way, we work backwards and retry the previous move.  This
strategy is called backtracking.
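For comparison, the backtracking just described can be written out directly as a recursive function. This is a sketch with hypothetical names: an arrangement is a list of columns, first row first; safe qs c checks whether a queen in column c of the next row is attacked by the queens in qs; and backtrack returns Nothing when every choice fails:

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Is column c of the next row free of attacks from the queens in qs?
-- The queen placed last is 1 row up, the one before it 2 rows up, etc.
safe :: [Int] -> Int -> Bool
safe qs c = and [c /= q && abs (c - q) /= d | (d, q) <- zip [1 ..] (reverse qs)]

-- Try columns left to right; if the search below a choice fails
-- (Nothing), fall back to the next column.  This is the backtracking step.
backtrack :: Int -> [Int] -> Maybe [Int]
backtrack n qs
  | length qs == n = Just qs
  | otherwise      = listToMaybe (mapMaybe next [c | c <- [1 .. n], safe qs c])
  where
    next c = backtrack n (qs ++ [c])

main :: IO ()
main = do
  print (backtrack 8 [])  -- Just [1,5,8,6,3,7,2,4]
  print (backtrack 3 [])  -- Nothing: no 3 x 3 solution
```

Because mapMaybe produces its results lazily, listToMaybe stops at the first successful column, so later columns are only tried after earlier ones fail.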

We can represent an arrangement of queens as a list of integers,
where the first integer is the column number of the first queen
in the first row, the second integer is the column number of the
second queen in the second row, etc.  Thus, for instance, the
position drawn above corresponds to the list [1,3,5,7,2,4,6].

Given an arrangement of k queens, we can write a function that
computes all valid extensions of this arrangement to k+1 queens,
analogous to the function we wrote to extend paths of length k to
paths of length k+1.  We have to ensure that the new element we
add is not the same as any earlier entry (so that no two queens
are in the same column).   We also have to do some elementary
arithmetic to calculate that the new position is not on any
diagonal that is attacked by any of the previous positions.
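The column and diagonal checks can be sketched as follows. A queen placed d rows below a queen in column q attacks it diagonally exactly when their columns differ by d. The names safe and extend are hypothetical:

```haskell
-- Is column c of the next row free of attacks from the queens in qs?
-- zip pairs each earlier queen with its row distance d from the new row.
safe :: [Int] -> Int -> Bool
safe qs c = and [c /= q && abs (c - q) /= d | (d, q) <- zip [1 ..] (reverse qs)]

-- All valid ways to add a queen in the next row of an n x n board.
extend :: Int -> [Int] -> [[Int]]
extend n qs = [qs ++ [c] | c <- [1 .. n], safe qs c]

main :: IO ()
main = do
  print (extend 8 [1])                    -- [[1,3],[1,4],[1,5],[1,6],[1,7],[1,8]]
  print (extend 8 [1, 3, 5])              -- [[1,3,5,2],[1,3,5,7],[1,3,5,8]]
  print (extend 8 [1, 3, 5, 7, 2, 4, 6])  -- []: no room for an 8th queen
```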

As in the paths example, let us give the name "extendall" to the
function that computes all valid extensions of a list of
arrangements; since the valid columns depend on the size of the
board, it takes n as an extra argument.  We can now solve the n
queens problem by repeatedly applying extendall n to the list
[[]] containing only the empty arrangement and picking up the
value generated after the nth application.  The following function
computes all possible arrangements of n queens on an n x n board.

    queens n = (iterate (extendall n) [[]])!!n

And, the following returns just one such arrangement --- the
first one that is generated.

    queensone n = head ((iterate (extendall n) [[]])!!n)
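The following self-contained sketch fills in extendall for queens. The helpers safe and extend are hypothetical names, implementing the column and diagonal conditions described above:

```haskell
-- Is column c of the next row free of attacks from the queens in qs?
safe :: [Int] -> Int -> Bool
safe qs c = and [c /= q && abs (c - q) /= d | (d, q) <- zip [1 ..] (reverse qs)]

-- All valid ways to add a queen in the next row of an n x n board.
extend :: Int -> [Int] -> [[Int]]
extend n qs = [qs ++ [c] | c <- [1 .. n], safe qs c]

-- Extend every arrangement in the list by one more queen.
extendall :: Int -> [[Int]] -> [[Int]]
extendall n arrs = [a | qs <- arrs, a <- extend n qs]

queens :: Int -> [[Int]]
queens n = iterate (extendall n) [[]] !! n

-- Fails with an exception when there is no solution (e.g. n = 2 or 3).
queensone :: Int -> [Int]
queensone n = head (queens n)

main :: IO ()
main = do
  print (length (queens 8))  -- 92
  print (queensone 8)        -- [1,5,8,6,3,7,2,4]
  print (queens 4)           -- [[2,4,1,3],[3,1,4,2]]
```

Since each level lists its arrangements in lexicographic order, queensone returns the lexicographically smallest solution.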


Notice that some of the arrangements after k iterations may have no
valid extensions (like the arrangement of 7 queens above).  This
does not matter.  If, at some stage, all arrangements die out as
infeasible, we will get the value [] consisting of no valid
arrangements (as opposed to [[]], the list consisting of the
empty arrangement) which will just repeat itself indefinitely.
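A quick way to watch the arrangements die out is a 3 x 3 board, which has no solution. In the sketch below, the helpers are repeated so that the block runs on its own, and their names are hypothetical:

```haskell
safe :: [Int] -> Int -> Bool
safe qs c = and [c /= q && abs (c - q) /= d | (d, q) <- zip [1 ..] (reverse qs)]

extend :: Int -> [Int] -> [[Int]]
extend n qs = [qs ++ [c] | c <- [1 .. n], safe qs c]

extendall :: Int -> [[Int]] -> [[Int]]
extendall n arrs = [a | qs <- arrs, a <- extend n qs]

-- Print the first five stages of the search on a 3 x 3 board.
main :: IO ()
main = mapM_ print (take 5 (iterate (extendall 3) [[]]))
```

It prints [[]], then [[1],[2],[3]], then [[1,3],[3,1]], and then [] at every later stage.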

The best way to visualize the search for solutions to the N
queens problem is to draw a tree in which the root is the empty
arrangement and every node has as its children its valid
extensions.

                                  []
                                   |
              -----------------------------------------
             |     |     |     |     |     |     |     |
            [1]   [2]   [3]   [4]   [5]   [6]   [7]   [8]
             |     |     |     |     |     |     |     |
             |    ...   ...   ...   ...   ...   ...   ...
             |
    -----------------------------
   |     |     |     |     |     |
 [1,3] [1,4] [1,5] [1,6] [1,7] [1,8]
   |     |     |     |     |     |
   |    ...   ...   ...   ...   ...
   |
   -----------------------
  |       |       |       |
[1,3,5] [1,3,6] [1,3,7] [1,3,8]
   |     ...     ...     ...
   |
   --------------------
  |         |         |
[1,3,5,2] [1,3,5,7] [1,3,5,8]
  |         |         |
 ...       ...       ...

The arrangements at the ninth level of this tree are the solutions
that we are looking for.  Superficially, it appears that the
iterative form in which we described the search for solutions is
an inefficient way to find just one solution.  It seems that to
find a single solution, we have to compute all values in this tree one
level at a time.  However, the lazy rewriting strategy of Haskell
will ensure that this is not the case.  In fact, Haskell will
expand arrangements along the leftmost path in this tree, as
shown in the picture above.  If this path is unsuccessful, it
will go back and explore the closest successor that has not been
tried before.  Thus, lazy evaluation ensures that this tree is
examined in a "depth first" fashion rather than a "breadth first"
fashion, which would be very inefficient.

======================================================================