Introduction to Programming, Aug-Dec 2008
Lecture 5, Mon 25 Aug 2008

Notation for lists
------------------

Haskell provides some convenient notation for lists consisting of
a continuous sequence of values.  The list [m..n] is the list of
elements [m,m+1,m+2,...,n].   If n < m, this list is defined to
be empty.  Thus, for example,

   [1..7] = [1,2,3,4,5,6,7]
   [3..3] = [3]
   [4..3] = []

The starting and ending values can be variables.  

An extension of this notation can be used to define arithmetic
progressions, and, as a special case, lists in descending order.
An arithmetic progression is specified by the first two values
(this defines the separation between elements in the sequence)
and an upper bound.  For instance,

  [1,3..8] = [1,3,5,7]
  [2,5..19] = [2,5,8,11,14,17]

Notice that the upper bound may not actually be part of the list
being defined.  The rule is that we keep inserting elements with
the separation defined by the first two elements so long as we do
not cross the upper bound.  

The difference between the first two elements can be negative, in
which case we get a list in descending order and the final value
acts as a lower bound.

  [8,7..5] = [8,7,6,5]
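  [12,8 .. -9] = [12,8,4,0,-4,-8]

Note the space before -9: written as "..-9", Haskell would read
"..-" as a single operator symbol.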
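
The examples above can be checked mechanically.  The following is
a small sketch, not part of the original notes: the names upTo and
rangeExamples are introduced here purely for illustration.  Each
entry of rangeExamples compares a range expression with the list
it is claimed to denote.

```haskell
-- upTo and rangeExamples are illustrative names, not from the lecture.

-- A range whose end points are variables; the result is empty when n < m.
upTo :: Int -> Int -> [Int]
upTo m n = [m .. n]

-- Each entry is True if the range on the left denotes the list on the right.
rangeExamples :: [Bool]
rangeExamples =
  [ upTo 1 7      == [1,2,3,4,5,6,7]
  , upTo 3 3      == [3]
  , upTo 4 3      == []
  , [1,3 .. 8]    == ([1,3,5,7] :: [Int])
  , [2,5 .. 19]   == ([2,5,8,11,14,17] :: [Int])
  , [8,7 .. 5]    == ([8,7,6,5] :: [Int])
  , [12,8 .. -9]  == ([12,8,4,0,-4,-8] :: [Int])
  ]
```

Evaluating "and rangeExamples" in the interpreter is one way to
confirm that every line holds.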

This notation works for any basic type that can be enumerated ---
that is, for each value in the type, there is a well defined
"next" value.  This is true for Char and Bool.  Bool is a
somewhat trivial case in which the two values are arranged so
that False is less than True.  Char is more interesting --- the
next character is the one with the next ord value in the table
representing characters.  As we saw, we can assume that
'a','b',..,'z' form a contiguous sequence in the table.
Likewise, the capital letters 'A','B',...,'Z' form a contiguous
sequence as do the digits '0','1',...,'9'.  Thus, we can write
lists such as

  ['a'..'f']     = ['a','b','c','d','e','f'] = "abcdef"
  ['Z','X'..'S'] = ['Z','X','V','T']         = "ZXVT"
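
The same kind of check works for Char and Bool.  This is only a
sketch; charExamples is a name made up for this illustration.

```haskell
-- charExamples is an illustrative name, not from the lecture.
-- Each entry compares a range over Char or Bool with the expected list.
charExamples :: [Bool]
charExamples =
  [ ['a' .. 'f']      == "abcdef"
  , ['Z','X' .. 'S']  == "ZXVT"
  , ['0' .. '9']      == "0123456789"
  , [False .. True]   == [False, True]
  ]
```

Writing spaces around ".." is a safe habit: without them, a range
that starts with a capitalised name such as False can be misread
by the compiler as a qualified name.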

List comprehension: combining map and filter
--------------------------------------------

In set theory, we can build new sets from old sets using notation
called "set comprehension".  For instance, given the set of
integers {1,2,..,m}, we can define the set of squares of the even
numbers in this set as

        { n^2 | n in {1,2,..,m}, even(n)}

where even(n) is a predicate that evaluates to true precisely
when n is even.

Analogously, we can build new lists from old lists using "list
comprehension".  For instance, the list of squares of all the
even numbers from 1 to m is given by

       [ n^2 | n <- [1..m], iseven n ]

where iseven is the function we wrote earlier to check if n is
even. 

The notation "<-" is supposed to look like the "element of"
notation from set theory.  The way this expression is interpreted
is as follows:

   For each n in the list [1..m], if "iseven n" is true, append n^2
   to the output

The first part, n <- [1..m], is referred to as the generator,
which supplies the initial list of values to be operated on.  The
condition "iseven n" serves to filter the list, while "n^2" on the
left of the "|" is a function that is mapped onto all elements
that survive the filter.  We shall examine the precise
relationship between list comprehension and map/filter later.
For now, let us consider an illustrative example.
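
As a preview, here is a minimal sketch that writes the list of
even squares both as a comprehension and with map and filter.
The definition of iseven below is an assumed stand-in for the
function written earlier in the course, and evenSquares and
evenSquares' are names chosen for this illustration.

```haskell
-- Assumed stand-in for the iseven function written earlier in the course.
iseven :: Int -> Bool
iseven n = mod n 2 == 0

-- Squares of the even numbers from 1 to m, as a list comprehension.
evenSquares :: Int -> [Int]
evenSquares m = [ n^2 | n <- [1 .. m], iseven n ]

-- The same list: filter first, then map the squaring function.
evenSquares' :: Int -> [Int]
evenSquares' m = map (^2) (filter iseven [1 .. m])
```

For m = 7 both definitions give [4,16,36].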

We first write a function to compute the list of divisors of a
number n.

   divisors n = [ m | m <- [1..n], mod n m == 0 ]

This says that the divisors of n are precisely those numbers
between 1 and n such that there is no remainder when n is divided
by the number.

We can now, for instance, write a function that checks that n is
a prime number, as follows:

   prime n = (divisors n == [1,n])

In other words, n is a prime if its list of divisors is precisely
[1,n].  Notice that 1 (correctly) fails to be a prime under this
criterion because divisors 1 = [1] which is not the same as
[1,1].
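
The two definitions can be put in a file together with type
signatures, as in the sketch below.  The signatures and the helper
primesUpTo are additions made for this illustration.

```haskell
-- All numbers between 1 and n that divide n exactly.
divisors :: Int -> [Int]
divisors n = [ m | m <- [1 .. n], mod n m == 0 ]

-- n is prime if its only divisors are 1 and n (and these are distinct).
prime :: Int -> Bool
prime n = divisors n == [1, n]

-- Illustrative helper, not from the lecture: the primes between 1 and k.
primesUpTo :: Int -> [Int]
primesUpTo k = [ p | p <- [1 .. k], prime p ]
```

For instance, divisors 12 is [1,2,3,4,6,12] and primesUpTo 20 is
[2,3,5,7,11,13,17,19].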

Multiple generators
-------------------

We can use more than one generator in a list comprehension.  Here
is an example:

  triples n = [ (x,y,z) | x <- [1..n], y <- [1..n], z <- [1..n] ]

If there are multiple generators, later generators move faster
than earlier ones, so the output of triples n would be

  [(1,1,1),(1,1,2),...,(1,1,n),(1,2,1),(1,2,2),...,
   (1,2,n),...,(n,1,1),(n,1,2),...,(n,n,n)]

Later generators can depend on earlier values.  Thus

  triples n = [ (x,y,z) | x <- [1..n], y <- [x..n], z <- [y..n] ]

is the same as

  triples n = [ (x,y,z) | x <- [1..n], y <- [1..n], z <- [1..n],
                          x <= y && y <= z ]

A generator can use any expression that puts out a list.  For
instance, we can run a generator over the output of triples as
follows to generate the set of all Pythagorean triples with
entries less than or equal to n.

  pythagoras n = [(x,y,z) | (x,y,z) <- triples n,
                               x^2 + y^2 == z^2 ]
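
Here is a self-contained sketch that combines the ordered version
of triples with pythagoras.  The type signatures are additions;
choosing the ordered version is an assumption, made so that each
Pythagorean triple appears only once.

```haskell
-- Triples (x,y,z) with 1 <= x <= y <= z <= n; later generators
-- depend on the values produced by earlier ones.
triples :: Int -> [(Int, Int, Int)]
triples n = [ (x,y,z) | x <- [1 .. n], y <- [x .. n], z <- [y .. n] ]

-- A generator can pattern-match on the elements of any list-valued expression.
pythagoras :: Int -> [(Int, Int, Int)]
pythagoras n = [ (x,y,z) | (x,y,z) <- triples n, x^2 + y^2 == z^2 ]
```

With these definitions, pythagoras 13 is
[(3,4,5),(5,12,13),(6,8,10)].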


Translating list comprehensions
-------------------------------

Though list comprehension notation is extremely convenient, it is
important to recognize that it is only a way of combining map and
filter in a more readable format.

In general, a list comprehension is of the form [ e | Q ] where e
is an expression and Q is a list of generators and conditions.
Roughly speaking, the expressions in e correspond to functions
that have to be mapped over the lists specified by the generators
in Q after filtering out elements according to the conditions in
Q.

We can formally translate list comprehensions by induction on Q.
Here is a first attempt to translate list comprehension using map
and filter:

  [ e | x <- xs, Q] = map f xs where f x = [ e | Q ]
  [ e | p, Q ]      = if p then [ e | Q ] else []
  [ e | ]           = [ e ]      (base case: Q is empty)

Here's how this translation would work:

  [ n^2 | n <- [1..7], mod n 2 == 0 ]
  ==> map f [1..7] where f n = [ n^2 | mod n 2 == 0]
  ==> map f [1..7] where f n = if (mod n 2 == 0) then [n^2] else []
  ==> [[],[4],[],[16],[],[36],[]]

As we can see, something has gone wrong --- there is an extra
level of brackets in the output.  To get rid of this we need the
function concat, which dissolves one level of brackets in a list.
For example:

  concat [[1,2,3],[4,5],[6]] = [1,2,3,4,5,6]
  concat [[],[4],[],[16],[],[36],[]] = [4,16,36]

Now, we can give a correct translation of list comprehension in
terms of map and concat, with a conditional doing the work of
filter.

  [ e | x <- xs, Q] = concat (map f xs) where f x = [ e | Q ]
  [ e | p, Q ]      = if p then [ e | Q ] else []
  [ e | ]           = [ e ]      (base case: Q is empty)

The previous example that we worked out now becomes:

  [ n^2 | n <- [1..7], mod n 2 == 0 ]
  ==> concat (map f [1..7]) where f n = [ n^2 | mod n 2 == 0]
  ==> concat (map f [1..7]) where
        f n = if (mod n 2 == 0) then [n^2] else []
  ==> concat [[],[4],[],[16],[],[36],[]]
  ==> [4,16,36]
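
The translation can also be carried out by hand in real code.
The sketch below does so for the example just worked out and for
a comprehension with two generators.  All four names are made up
for this illustration.

```haskell
-- direct, translated, pairsDirect and pairsTranslated are illustrative names.

-- The comprehension as written.
direct :: [Int]
direct = [ n^2 | n <- [1 .. 7], mod n 2 == 0 ]

-- The generator becomes concat (map f ...); the condition becomes if-then-else.
translated :: [Int]
translated = concat (map f [1 .. 7])
  where
    f n = if mod n 2 == 0 then [n^2] else []

-- A comprehension with two generators.
pairsDirect :: [(Int, Char)]
pairsDirect = [ (x, c) | x <- [1, 2], c <- "ab" ]

-- The translation rule is applied once for each generator.
pairsTranslated :: [(Int, Char)]
pairsTranslated = concat (map f [1, 2])
  where
    f x = concat (map g "ab")
      where
        g c = [(x, c)]
```

The Prelude function concatMap packages this combination:
concatMap f xs is the same as concat (map f xs).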

Operators vs binary functions
-----------------------------

We have seen that some builtin arithmetic operations are defined
as functions --- for instance, div and mod.  These are applied to
their arguments like other functions we write in Haskell, by
placing the arguments after the function name.  For instance, we
have to write "div 7 2" and "mod m 3" rather than "7 div 2" or 
"m mod 3" though the second form is more natural to read.

On the other hand, we have operations such as + and * which
always occur between their operands (this is called infix
notation). 

For binary operations, it turns out that we can freely convert
functions into operators and vice versa.  Any binary function can
be used as an infix operator by enclosing its name in backquotes
(`).  Thus, we can write

   7 `div` 2
   m `mod` 3

In the other direction, any infix operator can be used as a
function written before its arguments by enclosing it in
parentheses.  Thus, we can write

   (+) 7 2   for  7+2
   (*) 3 5   for  3*5

Normally, when we define functions in Haskell, we use names that
start with a lower case letter and have only letters and digits
in the name.  We can also define binary functions whose names are made
up entirely of punctuation marks (with some restrictions).  Such
functions can be used as operators.  To define the type of these
functions, we need the () notation introduced above.  For
instance, we could define an operator *** such that
m *** n = (m+n)^3 as follows:

(***) :: Int -> Int -> Int
(***) m n = (m+n)^3

Alternatively, we can use the operator form in the definition of
the function (but not in the definition of the type!)

(***) :: Int -> Int -> Int
m *** n = (m+n)^3
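
The sketch below uses the new operator in both infix and prefix
form, next to the conversions described earlier.  The name
operatorExamples is made up for this illustration.

```haskell
-- The operator defined in the text.
(***) :: Int -> Int -> Int
m *** n = (m + n)^3

-- operatorExamples is an illustrative name, not from the lecture.
operatorExamples :: [Bool]
operatorExamples =
  [ 1 *** 2      == 27         -- infix use: (1+2)^3
  , (***) 1 2    == 27         -- the same operator used as a prefix function
  , 7 `div` 2    == div 7 2    -- a named function used as an infix operator
  , (+) 7 2      == 7 + 2      -- a built-in operator used as a prefix function
  ]
```

Since no precedence has been declared for ***, Haskell gives it
the default one, which binds more tightly than the arithmetic
operators; an expression such as 1 *** 2 + 3 is therefore read as
(1 *** 2) + 3.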


Some useful functions defined on lists
--------------------------------------

  sum xs   -- adds up the elements of xs, a list of numbers
  and xs   -- for xs of type [Bool], evaluates to True if all
              elements of xs are True.  In particular,
              and [] = True because every x in [] (there
              aren't any!) is True
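
A few sample evaluations, written as a sketch.  The names allEven
and listFunctionExamples are made up for this illustration;
allEven shows the typical use of and over a comprehension of
Boolean values.

```haskell
-- allEven and listFunctionExamples are illustrative names.

-- True if every element of xs is even; True for the empty list.
allEven :: [Int] -> Bool
allEven xs = and [ mod x 2 == 0 | x <- xs ]

listFunctionExamples :: [Bool]
listFunctionExamples =
  [ sum [1 .. 10]      == (55 :: Int)
  , sum []             == (0 :: Int)
  , and [True, True]   == True
  , and [True, False]  == False
  , and []             == True
  , allEven [2,4,6]    == True
  , allEven [2,3]      == False
  , allEven []         == True
  ]
```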


The function zip
----------------

zip combines two lists into a list of pairs.  For instance,

  zip [1,2] ['a','b'] = [(1,'a'),(2,'b')]

If one list is longer than the other one, zip stops constructing
pairs when the shorter list runs out.  Thus,

  zip [1..4] ['a','b','c'] = [(1,'a'),(2,'b'),(3,'c')]
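
The behaviour on lists of unequal length can be checked with a
sketch like the following, where zipExamples is a name made up
for this illustration.  The shorter list decides the length of
the result, whichever argument it is.

```haskell
-- zipExamples is an illustrative name, not from the lecture.
zipExamples :: [Bool]
zipExamples =
  [ zip [1, 2 :: Int] "ab"     == [(1,'a'),(2,'b')]
  , zip [1 .. 4 :: Int] "abc"  == [(1,'a'),(2,'b'),(3,'c')]
  , zip "abc" [1 .. 4 :: Int]  == [('a',1),('b',2),('c',3)]
  , zip ([] :: [Int]) "abc"    == []
  ]
```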

Here are a couple of examples that use zip.

Example 1 (nondecreasing)
-------------------------

Suppose we want to write a function

  nondecreasing :: [Int] -> Bool

such that nondecreasing [x_1,x_2,...,x_n] = True if
      x_1 <= x_2 && x_2 <= x_3 && ... && x_n-1 <= x_n

We can write an inductive definition of nondecreasing.

  nondecreasing [] = True
  nondecreasing [x] = True
  nondecreasing (x:y:ys) = (x <= y) && (nondecreasing (y:ys))

Another way to do this is to observe that we are checking

              x_1 <= x_2
        &&    x_2 <= x_3
        ..
        &&    x_n-1 <= x_n

If we pair up xs with (tail xs) we get the list of pairs
[(x_1,x_2), (x_2,x_3), ... , (x_n-1,x_n)]

We can thus write nondecreasing as follows:

  nondecreasing xs = and [x <= y | (x,y) <- zip xs (tail xs) ]

As we saw earlier, the function and returns True provided x <= y
for each pair (x,y) in the output of zip.  

What happens in the case "nondecreasing []"?  For the empty list
[], tail is not defined and hence this expression should fail.
However, recall that Haskell uses a lazy evaluation strategy.  To
construct "zip l1 l2",  Haskell proceeds as follows:  

-  Extract the head x1 of l1, if any.  If l1 has no more elements,
   quit. 
-  Extract the head x2 of l2, if any.  If l2 has no more elements,
   quit. 
-  Add (x1,x2) to the output of zip and repeat this process with
   (tail l1) and (tail l2).

Thus, if l1 is [], zip quits with an empty list as output,
without evaluating l2.  This means that the invalid function
"tail []" is never evaluated in the case when we compute
"nondecreasing []".
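
Both definitions are given side by side in the sketch below so
that they can be compared on the same inputs.  The second one is
renamed nondecreasingZip here (a name chosen for this
illustration) only so that the two can live in one file.

```haskell
-- Inductive definition from the text.
nondecreasing :: [Int] -> Bool
nondecreasing []       = True
nondecreasing [_]      = True
nondecreasing (x:y:ys) = x <= y && nondecreasing (y:ys)

-- Definition using zip; nondecreasingZip is an illustrative name.
-- For xs = [], zip returns [] without ever evaluating (tail xs).
nondecreasingZip :: [Int] -> Bool
nondecreasingZip xs = and [ x <= y | (x, y) <- zip xs (tail xs) ]
```

Both functions return True on [], [1] and [1,2,2,5], and False on
[3,1].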

It is a matter of taste whether the definition of nondecreasing in
terms of zip is better than the original one.  For someone not
very familiar with Haskell, the first definition is more direct,
but the second would be preferred by someone who knows the
language.

Example 2 (position)
--------------------

Recall our old exercise:


    Write a function position :: Char -> String -> Int such that

      position c s

    returns the first position in s where c occurs and returns -1
    if c does not occur in s.  Note that a valid answer for this
    function must be either -1 or a number in the range
    {0,1,...,length s - 1}.


It is not difficult to write a function that
merely checks whether c occurs in s.  


    exists :: Char -> String -> Bool
    exists c "" = False
    exists c (x:xs)
      | c == x    = True
      | otherwise = exists c xs

The problem is to identify the position at which we have found
the first occurrence of c in s, because this information is not
directly available in the inductive definition of exists.

The solution is to write an auxiliary function that takes three
parameters, c, s and n, where n is the current position that we
are examining in s.  Initially, we call this auxiliary function
with n set to 0.  When we inductively examine (tail s), we
increment n to n+1.  If we find c at (head s) we return n.  If s
becomes empty, we return -1, regardless of the value of n.  Here
is a complete definition of position in terms of this auxiliary
function.

  position :: Char -> String -> Int
  position c s = auxpos c s 0
    where
    auxpos :: Char -> String -> Int -> Int
    auxpos c "" n = -1
    auxpos c (x:xs) n 
      | c == x    = n
      | otherwise = auxpos c xs (n+1)

Another approach is to tag each element of xs with its position.
This can be achieved using zip as follows:

  zip xs [0..(length xs)] 

Notice that the last position is actually (length xs) - 1 but zip
will automatically discard the spurious value.

Now, we can extract the list of all positions where x occurs in
xs

  allpos x xs = [ i | (y,i) <- zip xs [0..(length xs)], x == y ]

If (allpos x xs) is nonempty, head (allpos x xs) gives us the
value that we seek.  What if (allpos x xs) is empty?  We have to
return -1 in this case.  We can do this by tagging on -1 at the
end of (allpos x xs).  Thus, if (allpos x xs) is empty, head
((allpos x xs) ++ [-1]) returns -1; otherwise it returns the
first real position where x occurs in xs.  Putting all this
together, we get

  position x xs = head ([ i | (y,i) <- zip xs [0..(length xs)],
                              x == y ]
                        ++ [-1])
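
The two versions of position are collected in the sketch below
for comparison.  The zip-based version is renamed positionZip (a
name chosen for this illustration) so that both fit in one file,
and the type signature for allpos is an addition.

```haskell
-- Version from the text that threads the current index through auxpos.
position :: Char -> String -> Int
position c s = auxpos c s 0
  where
    auxpos :: Char -> String -> Int -> Int
    auxpos _ "" _ = -1
    auxpos ch (x:xs) n
      | ch == x   = n
      | otherwise = auxpos ch xs (n + 1)

-- All positions at which x occurs in xs, in increasing order.
allpos :: Char -> String -> [Int]
allpos x xs = [ i | (y, i) <- zip xs [0 .. length xs], x == y ]

-- Version using zip; positionZip is an illustrative name.
-- The appended -1 is returned only when allpos finds nothing.
positionZip :: Char -> String -> Int
positionZip x xs = head (allpos x xs ++ [-1])
```

For example, allpos 'l' "hello" is [2,3], so both position and
positionZip return 2 for these arguments, and both return -1 when
asked for 'z' in "hello".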


======================================================================