Introduction to Programming, Aug-Dec 2008
Lecture 6, Wed 27 Aug 2008

Folding functions through a list
--------------------------------

Consider the following inductive definitions over lists:

  sum :: [Int] -> Int
  sum [] = 0
  sum (x:xs) = x + (sum xs)

  and :: [Bool] -> Bool
  and [] = True
  and (x:xs) = x && (and xs)

  concat :: [[a]] -> [a]
  concat [] = []
  concat (x:xs) = x ++ (concat xs)

All of these have the general form:

  f [] = y
  f (x:xs) = x op (f xs)

Thus, given a binary function op and an initial value y, we can
evaluate these functions from right to left as follows:

     x_1   x_2   .  .  .  x_n-1  x_n  y
      |                     |     \  /
      |                     |      op
      |                     |       |
      |                     |      y_n
      |                      \    /
      |                        op
      |                        |
      |                       y_n-1
      |     .  .  .      . . .
      |
      |       y_2
       \     /
         op
          |
         y_1

where y_1 is the final value to be returned.

As can be seen from the figure, we are "folding" the binary
function op from right to left through the list.  Since op is a
binary function, we have to provide a second argument y to
combine with the last element x_n.  After this, each intermediate
value is used as the right argument to op, and the next element,
moving leftwards through the list, is taken as the left argument.

This folding operation is described by the Haskell function
foldr, which takes three inputs: the function to be folded, the
initial value, and the list to be folded.

Thus, we can write

   sum xs = foldr (+) 0 xs
   and xs = foldr (&&) True xs
   concat xs = foldr (++) [] xs

More concisely, since xs is the rightmost argument on both sides,
we can write:

   sum = foldr (+) 0
   and = foldr (&&) True
   concat = foldr (++) []
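As a quick check, here is a minimal runnable sketch of the three point-free definitions, assuming the Prelude's own foldr.  The Prelude versions of sum, and and concat are hidden on import so that the names used in the notes can be reused unchanged.

```haskell
import Prelude hiding (and, concat, sum)

-- Each function folds a binary operator through the list from the
-- right, starting from the value returned for the empty list.
sum :: [Int] -> Int
sum = foldr (+) 0

and :: [Bool] -> Bool
and = foldr (&&) True

concat :: [[a]] -> [a]
concat = foldr (++) []
```

For example, sum [1,2,3] evaluates to 6 and concat ["ab","c"] to "abc".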

What is the type of foldr?  In the three examples above, the
function passed to foldr had the same type for both its inputs
and its output.  However, nothing requires this.  Suppose
that the two inputs to the function op are of types a and b,
respectively.

Initially, we have op x_n y = y_n, where x_n is of type a and y
is of type b.  At the next step, we have to apply op to x_n-1 and
y_n.  Since the second argument to op must be of type b, this
constrains y_n to be of type b.  Repeating this argument, every
intermediate value, including the final answer y_1, has type b.
In other words, we have the following general types:

   The function passed to foldr :: a->b->b
   The initial value :: b
   The input list :: [a]
   The output value :: b

So, the type of foldr is

   foldr :: (a->b->b) -> b -> [a] -> b
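The two argument types of the folded function really can differ.  In the following sketch (the names lengthR and allPositive are hypothetical, chosen only for illustration), the list elements have type a while the value being built up has a different type b.

```haskell
-- a is arbitrary, b = Int: each element is ignored and the count
-- accumulated so far is incremented.
lengthR :: [a] -> Int
lengthR = foldr (\_ n -> n + 1) 0

-- a = Int, b = Bool: combine a test on the current element with the
-- answer already computed for the rest of the list.
allPositive :: [Int] -> Bool
allPositive = foldr (\x ok -> x > 0 && ok) True
```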

The function foldr itself can be defined inductively:

   foldr f y [] = y
   foldr f y (x:xs) = f x (foldr f y xs)
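The inductive definition can be run directly.  In this sketch it is renamed myFoldr (a hypothetical name) to avoid clashing with the Prelude's foldr, and a small helper, showNesting (also hypothetical), folds a string-building function through a list to make the right-to-left nesting visible.

```haskell
-- Transcription of the inductive definition of foldr.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ y []       = y
myFoldr f y (x : xs) = f x (myFoldr f y xs)

-- Records, as a string, how the applications of op are nested.
showNesting :: [String] -> String
showNesting = myFoldr (\x acc -> "(" ++ x ++ " op " ++ acc ++ ")") "y"
```

With this, showNesting ["x1","x2","x3"] yields "(x1 op (x2 op (x3 op y)))", matching the picture of foldr.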

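Notice that folding from the right is a natural consequence of
inductively decomposing lists from the left.  This is the
efficient way to decompose lists: a Haskell list is built up from
the front using (:), so the head x and the tail xs of a list x:xs
are available in constant time, whereas reaching the last element
requires walking through the entire list.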

foldl
-----

We can define a symmetric function, foldl, that folds a function
f from left to right through a list.  Given a binary function op
and an initial value y, we evaluate from left to right as follows:

    y   x_1   x_2   . . .  x_n
     \  /      |            |
      op       |            |
       |       |            |
      y_1     /             |
         \   /              |
          op                |
           |                |
          y_2               |
             \              |
               ...          |
                    \       |
                y_(n-1)    /
                      \   /
                       op
                        |
                       y_n

where y_n is the final value to be returned.
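The same string-building trick, applied with the Prelude's foldl, shows the left-to-right nesting.  The name showNestingL is hypothetical; note that the folded function now takes the accumulated value first and the list element second.

```haskell
-- Records, as a string, how the applications of op are nested
-- when folding from the left.
showNestingL :: [String] -> String
showNestingL = foldl (\acc x -> "(" ++ acc ++ " op " ++ x ++ ")") "y"
```

Here showNestingL ["x1","x2","x3"] yields "(((y op x1) op x2) op x3)".  For an operator that is not associative the two folds differ: foldl (-) 0 [1,2,3] is ((0-1)-2)-3 = -6, while foldr (-) 0 [1,2,3] is 1-(2-(3-0)) = 2.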

To illustrate the use of foldl, we revisit the problem of
converting a string to a number.  The function we want to
construct is

  strtonum :: String -> Int

The convention we adopt is that the characters '0', '1', ..., '9'
denote the numbers 0, 1, ..., 9 and all other characters are
interpreted as 0.  Thus strtonum "138" should return 138 and
strtonum "1ab9" should return 1009.

We begin with a function chartonum that converts a single
character to a single-digit integer.  One way to write this
function is to explicitly match the relevant characters '0', '1',
..., '9' and assign 0 to all other values of the input.

  chartonum :: Char -> Int
  chartonum '0' = 0
  chartonum '1' = 1
  ...
  chartonum '9' = 9
  chartonum  x  = 0

Alternatively, we could make use of the fact that the characters
'0' to '9' have consecutive codes in the character encoding, and
write

  import Data.Char (ord)

  chartonum :: Char -> Int
  chartonum c
    | c >= '0' && c <= '9' = ord c - ord '0'
    | otherwise            = 0

Now, to convert a String to an Int, we start from the left and
inductively build up a number.  At each step, we multiply the
number we have so far by 10 and add the next digit.

Pictorially, we have

    0   d_1   d_2   . . .  d_n
    \   /      |            |
     \ /       |            |
 k_1=10*0+d_1 /             |
       \     /              |
   k_2=10*k_1+d_2           |
            \               |
              ...           |
                  \         |
                 k_n-1     /
                     \    /
                k_n=10*k_n-1+d_n


The operation that combines the partially constructed number with
the next digit is the following:

  combine :: Int -> Char -> Int
  combine n c = 10*n + (chartonum c)

We can now write strtonum as

  strtonum = foldl combine 0
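Putting the pieces together, a complete sketch of strtonum might look as follows.  It uses the ord-based version of chartonum, which needs ord from Data.Char.

```haskell
import Data.Char (ord)

-- '0'..'9' map to 0..9; every other character maps to 0.
-- Relies on the digit characters having consecutive codes.
chartonum :: Char -> Int
chartonum c
  | c >= '0' && c <= '9' = ord c - ord '0'
  | otherwise            = 0

-- Shift the number built so far one decimal place left and add
-- the value of the next character.
combine :: Int -> Char -> Int
combine n c = 10 * n + chartonum c

strtonum :: String -> Int
strtonum = foldl combine 0
```

For instance, strtonum "138" evaluates to 138, strtonum "1ab9" to 1009, and strtonum "" to the initial value 0.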

Observe that the type of foldl is the following:

  foldl :: (b->a->b) -> b -> [a] -> b

The difference is in the type of the function f passed to foldl;
since we start from the left, the type of f is (b->a->b) and not
(a->b->b) as in foldr, where we start from the right.

Can we define foldl inductively?  A naive definition would be the
following:

  foldl :: (b->a->b) -> b -> [a] -> b
  foldl f y [] = y
  foldl f y l  = f (foldl f y (init l)) (last l)

However, this is not an efficient definition: computing init l
and last l takes time proportional to the length of the list,
and this is repeated at every step, so the whole fold takes time
quadratic in the length of the list.
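For experimentation, here is a runnable version of this init/last approach, renamed naiveFoldl (a hypothetical name) and with the initial value written as a variable y so that it works for any starting value.  It agrees with the Prelude's foldl, only more slowly.

```haskell
-- Fold from the left by splitting off the *last* element: the
-- answer for l combines the answer for init l with last l.
-- init and last each walk the whole list, so each step costs time
-- proportional to the current length.
naiveFoldl :: (b -> a -> b) -> b -> [a] -> b
naiveFoldl _ y [] = y
naiveFoldl f y l  = f (naiveFoldl f y (init l)) (last l)
```

For example, naiveFoldl (-) 10 [1,2,3] gives ((10-1)-2)-3 = 4, the same as foldl (-) 10 [1,2,3].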

Exercise:

  Devise a more efficient definition for foldl.  Hint: Use an
  auxiliary function that explicitly maintains the incremental
  value that is being computed.

Folding on nonempty lists
-------------------------

We wrote the function sum as 

  foldr (+) 0

If we write a corresponding function to compute the product of
all elements of a list, it would be

  foldr (*) 1

This would work fine for nonempty lists but would give the
somewhat counterintuitive answer that the product of an empty
list is 1.

To consider another example, suppose we want to find the maximum
value in a list.  Intuitively, this corresponds to folding the
built-in function max through the list.  Once again, there is a
problem defining the maximum value of an empty list.  Moreover,
unlike the case of product, it is not even clear what initial
value to supply: it must be no larger than any value that could
occur in the list, which means we have to rely on the underlying
system, for instance on minBound, the smallest value of type Int.
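For Int, the smallest representable value is available as minBound, so one way to supply a starting value is the following sketch (maxWithDefault is a hypothetical name).  It works, but it silently returns minBound for an empty list instead of signalling that the maximum is undefined.

```haskell
-- Fold max through the list, starting from the smallest Int, which
-- cannot exceed any element of the list.
maxWithDefault :: [Int] -> Int
maxWithDefault = foldr max minBound
```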

To get around these difficulties, Haskell provides the functions
foldr1 and foldl1 that work exactly like foldr and foldl,
respectively, but only for nonempty lists.  If the list has only
one element, no folding is done and that element is returned.  If
the list has two or more elements, the function supplied to
foldr1 (or foldl1) is folded through the list beginning with
the last two (or first two) elements in the list.  Applying
either function to an empty list is an error.

Here is an inductive definition of foldr1:

  foldr1 :: (a->a->a) -> [a] -> a
  foldr1 f [x] = x
  foldr1 f [x,y] = f x y
  foldr1 f (x:y:ys) = f x (foldr1 f (y:ys))
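A runnable sketch of foldr1, renamed myFoldr1 (a hypothetical name) to avoid the Prelude clash.  It drops the separate two-element clause, since for a list [x,y] the general clause already gives f x (myFoldr1 f [y]) = f x y, and it makes the empty-list case an explicit error.

```haskell
myFoldr1 :: (a -> a -> a) -> [a] -> a
myFoldr1 _ [x]      = x
myFoldr1 f (x : xs) = f x (myFoldr1 f xs)  -- xs is nonempty here
myFoldr1 _ []       = error "myFoldr1: empty list"
```

For example, myFoldr1 (-) [10,3,2] gives 10-(3-2) = 9, as does the Prelude's foldr1.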

Notice that the function passed to foldr1 has type a->a->a,
unlike the type a->b->b of the function passed to foldr.  This is
because both arguments to f come from the list and the answer
must again be of the same type.

Given this, we can define the product of a list and the maximum
value of a list in terms of foldr1 (or foldl1):

  product = foldr1 (*)     -- or equivalently: foldl1 (*)
  maxlist = foldr1 max     -- or equivalently: foldl1 max

The two versions agree because (*) and max are associative.
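A sketch using the Prelude's foldr1 (productList is a hypothetical name that avoids clashing with the Prelude's product).  Swapping foldr1 for foldl1 does not change the results, but only because (*) and max are associative; with subtraction the two folds give different answers.

```haskell
-- Both functions are defined only for nonempty lists.
productList :: [Int] -> Int
productList = foldr1 (*)

maxlist :: [Int] -> Int
maxlist = foldr1 max
```

Here productList [1,2,3,4] and foldl1 (*) [1,2,3,4] both give 24, whereas foldr1 (-) [10,3,2] gives 10-(3-2) = 9 but foldl1 (-) [10,3,2] gives (10-3)-2 = 5.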

======================================================================

Accumulating intermediate values : scanl and scanr
--------------------------------------------------

As we have seen, the function foldl folds a function f through a
list and produces a single final value.  Pictorially, we had

    y   x_1   x_2   . . .  x_n
     \  /      |            |
      op       |            |
       |       |            |
      y_1     /             |
         \   /              |
          op                |
           |                |
          y_2               |
             \              |
               ...          |
                    \       |
                y_(n-1)    /
                      \   /
                       op
                        |
                       y_n


The computation of y_n involves generating the intermediate
values y, y_1, ..., y_(n-1).  These correspond to "partial"
answers of foldl for prefixes of the list.  Sometimes, these
partial answers are also interesting, and they can be returned
with little extra effort.  The Haskell function scanl achieves
this.  In other words,

  scanl f m l = [y,y_1,...,y_n]

where y = m is the initial value and [y,y_1,...,y_n] is the list
of intermediate values generated when computing

  foldl f m l

Thus, we have

  scanl (+) 0 [1..n] = [0,1,(1+2),...,(1+2+...+n)]
  scanl (*) 1 [1..n] = [0!,1!,2!,...,n!]

Note that the second list has n+1 entries, starting with the
initial value 1 = 0!.
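A small sketch with the Prelude's scanl (runningSums and factorials are hypothetical names).  The factorials use Integer to avoid overflow, and the result has n+1 entries, beginning with the initial value 1.

```haskell
-- Running sums 0, 1, 1+2, ..., 1+2+...+n.
runningSums :: Int -> [Int]
runningSums n = scanl (+) 0 [1 .. n]

-- Running products 0!, 1!, 2!, ..., n!.
factorials :: Int -> [Integer]
factorials n = scanl (*) 1 [1 .. fromIntegral n]
```

For example, factorials 5 is [1,1,2,6,24,120], and in general the last element of scanl f m l equals foldl f m l.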

Symmetrically, scanr returns the partial values generated when
evaluating foldr.  Recall that the picture for foldr was the
following:

     x_1   x_2   .  .  .  x_n-1  x_n  y
      |                     |     \  /
      |                     |      op
      |                     |       |
      |                     |      y_n
      |                      \    /
      |                        op
      |                        |
      |                       y_n-1
      |     .  .  .      . . .
      |
      |       y_2
       \     /
         op
          |
         y_1

Thus, 

  scanr f m l = [y_1,...,y_n,y]

where y = m and [y_1,...,y_n,y] is the list of intermediate values
generated when computing

  foldr f m l

Notice that the output of scanl is the list of foldl values for
longer and longer prefixes of l while the output of scanr is the
list of foldr values for shorter and shorter suffixes of l.
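Correspondingly, here is a sketch with the Prelude's scanr (suffixSums is a hypothetical name): the entries are the sums of shorter and shorter suffixes, ending with the initial value, and the first entry equals foldr f m l.

```haskell
-- suffixSums [x_1,...,x_n] = [x_1+...+x_n, x_2+...+x_n, ..., x_n, 0]
suffixSums :: [Int] -> [Int]
suffixSums = scanr (+) 0
```

For example, suffixSums [1,2,3] is [6,5,3,0].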

======================================================================

Combinatorial functions on lists
--------------------------------

We now look at some combinatorial functions on lists.  All of
these can be defined inductively in terms of the structure of the
list.  However, we will also see that we can use alternative
notation to simplify these definitions.

Initial segments
----------------

We begin with the function initsegs that lists out the initial
segments of a list.  An initial segment of a list is a prefix,
that is, a sublist that consists of the first k elements of the
list for some k.

The idea is straightforward.  The smallest initial segment of a
list is the empty list.  For a list of the form (x:xs), the
initial segments can be obtained by inserting an x at the head of
each initial segment of xs (and explicitly adding a fresh empty
initial segment).  Each initial segment is itself a list, so
initsegs returns a list of lists.

Here is an inductive definition of initsegs:

  initsegs :: [a] -> [[a]]
  initsegs [] = [[]]
  initsegs (x:xs) = [[]] ++ [x:l | l <- initsegs xs]

An alternative, "purer", definition is

  initsegs :: [a] -> [[a]]
  initsegs [] = [[]]
  initsegs (x:xs) = [[]] ++ map (x:) (initsegs xs)

Notice that we can give a much more direct definition in terms
of take:

  initsegs l = [take n l | n <- [0..length l]]
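The recursive and take-based definitions can be checked against each other, and against inits from the standard module Data.List, which computes the same list of prefixes.  In this sketch the take-based version is renamed initsegsTake and the comparison function agreesWithInits is hypothetical; it writes [[]] ++ ... as [] : ..., which is equivalent.

```haskell
import Data.List (inits)

-- Recursive version: the empty prefix, followed by x prepended to
-- every initial segment of xs.
initsegs :: [a] -> [[a]]
initsegs []       = [[]]
initsegs (x : xs) = [] : map (x :) (initsegs xs)

-- Direct version: the prefixes of length 0, 1, ..., length l.
initsegsTake :: [a] -> [[a]]
initsegsTake l = [take n l | n <- [0 .. length l]]

-- True if both versions agree with Data.List.inits on l.
agreesWithInits :: Eq a => [a] -> Bool
agreesWithInits l = initsegs l == inits l && initsegsTake l == inits l
```

For example, initsegs [1,2,3] is [[],[1],[1,2],[1,2,3]].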

One application of initsegs is to define scanl: scanl can be
seen as applying foldl to each initial segment of the given
list (though this recomputes each prefix from scratch, so it is
much less efficient than the built-in scanl).  In other words

  scanl f a l = map (foldl f a) (initsegs l)

or

  scanl f a l = [foldl f a ll | ll <- initsegs l]
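A runnable sketch of this definition (scanlViaInits is a hypothetical name).  It gives the same answers as the Prelude's scanl, but it refolds every prefix from scratch, so it takes quadratic time where scanl takes linear time.

```haskell
initsegs :: [a] -> [[a]]
initsegs []       = [[]]
initsegs (x : xs) = [[]] ++ map (x :) (initsegs xs)

-- scanl as foldl applied to each initial segment.
scanlViaInits :: (b -> a -> b) -> b -> [a] -> [b]
scanlViaInits f a l = map (foldl f a) (initsegs l)
```

For example, scanlViaInits (-) 100 [1,2,3] is [100,99,97,94], the same as scanl (-) 100 [1,2,3].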


All permutations of a list
--------------------------

Our next task is to write a function that lists out all
permutations of a list.  An inductive definition would require
defining "permutations (x:xs)" in terms of "permutations xs".
The logical way to lift "permutations xs" to "permutations
(x:xs)" is to insert x in each possible position within each
permutation of xs.

We begin with a function interleave that inserts a value in every
possible position of a list, returning the list of all results.

  interleave :: a -> [a] -> [[a]]
  interleave x [] = [[x]]
  interleave x (y:ys) = [x:y:ys] ++ map (y:) (interleave x ys)

Alternatively, since x can be inserted at any of the length l + 1
positions of l, we have

  interleave x l = [(take n l) ++ [x] ++ (drop n l) | n <- [0..length l]]
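A runnable sketch of both versions of interleave (the second renamed interleaveTake, a hypothetical name).  Each returns a list of lists, and the split position in the second ranges over 0 to length l so that x can also be placed at the end.

```haskell
-- Either x goes in front of y, or y stays in front and x is
-- inserted somewhere in ys.
interleave :: a -> [a] -> [[a]]
interleave x []       = [[x]]
interleave x (y : ys) = (x : y : ys) : map (y :) (interleave x ys)

-- Split l at every position and put x in the gap.
interleaveTake :: a -> [a] -> [[a]]
interleaveTake x l = [take n l ++ [x] ++ drop n l | n <- [0 .. length l]]
```

Both give [[0,1,2],[1,0,2],[1,2,0]] for x = 0 and l = [1,2].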

Now, we can define

  permutations :: [a] -> [[a]]
  permutations [] = [[]]
  permutations (x:xs) = [ zs | ys <- permutations xs, zs <- interleave x ys ]

If we did not use list comprehension, we would have something
like

  permutations (x:xs) = concat (map (interleave x) (permutations xs))

Notice the need for a concat to remove an extra level of list
brackets.  Even with list comprehension, if we put the call to
interleave in the output expression (to the left of the |), we
have to add a concat.

  permutations (x:xs) = concat [ interleave x ys | ys <- permutations xs]
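Assembling the pieces gives a runnable sketch.  It repeats interleave so that the block is self-contained; permsConcat (a hypothetical name) is the version without a comprehension, written with concatMap, which combines concat and map.  Data.List also exports a function named permutations, but since it is not in the Prelude there is no clash here.

```haskell
interleave :: a -> [a] -> [[a]]
interleave x []       = [[x]]
interleave x (y : ys) = (x : y : ys) : map (y :) (interleave x ys)

-- Insert x in every position of every permutation of xs.
permutations :: [a] -> [[a]]
permutations []       = [[]]
permutations (x : xs) = [zs | ys <- permutations xs, zs <- interleave x ys]

-- The same, using concatMap f = concat . map f.
permsConcat :: [a] -> [[a]]
permsConcat []       = [[]]
permsConcat (x : xs) = concatMap (interleave x) (permsConcat xs)
```

For example, permutations [1,2,3] is [[1,2,3],[2,1,3],[2,3,1],[1,3,2],[3,1,2],[3,2,1]], and a list of n distinct elements has n! permutations.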

Partitions of a list
--------------------

A collection of nonempty lists l1, l2,...,lk is said to be a
partition of the list l if l == l1 ++ l2 ++ ... ++ lk.  For
instance [[1],[2,3],[4,5,6]] is a partition of [1..6].
  
We would like to write a Haskell function "partitions" that takes
as input a list l and returns all the partitions of l.

Notice that each partition is itself written as a list of lists.
Thus, the function partitions will return a list of (list of
lists).

  partitions :: [a] -> [[[a]]]

Since every component of a partition must be nonempty, we only
consider nonempty lists, and the base case is the singleton list.

  partitions [x] = [[[x]]]

For the inductive case, we observe that if we have a set of
partitions for xs, the partitions of (x:xs) are either those in
which x is added to the first component of some partition of xs
or those in which [x] is added as an additional component to some
partition of xs.  Thus


  partitions (x:xs) =    [(x:head l):(tail l)  | l <- partitions xs]
                      ++ [[x]:l | l <- partitions xs]
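A runnable sketch.  Instead of head and tail, it takes each partition of xs apart with the pattern (b : bs) in the generator; this never fails, because every partition of a nonempty list has at least one component.  As in the definition above, the empty list is not a meaningful input; this sketch makes that case an explicit error.

```haskell
partitions :: [a] -> [[[a]]]
partitions []       = error "partitions: empty list"
partitions [x]      = [[[x]]]
partitions (x : xs) =
     [(x : b) : bs | (b : bs) <- partitions xs]  -- x joins the first component
  ++ [[x] : p | p <- partitions xs]              -- x forms a new component
```

For example, partitions [1,2,3] is [[[1,2,3]],[[1,2],[3]],[[1],[2,3]],[[1],[2],[3]]]; a list of length n has 2^(n-1) partitions, one for each choice of cutting or not cutting at each of the n-1 gaps between elements.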

======================================================================