Introduction to Programming, Aug-Nov 2008
Lecture 21, Monday 17 Nov 2008

Arrays
------

The last abstract datatype we shall consider is the array.  We
can think of an array as a fixed sequence of "cells", each
position labelled by an index, where we can access each cell
directly by specifying the index value.

   index->   1   2   3   4   5   6   7   8  
            -------------------------------
   value-> | a | c | f | i | j | k | a | d |
            -------------------------------

The values stored in an array are all of a uniform type.  An
array is like a list with two important differences:

1. Normally, arrays have a fixed number of elements.

2. The time taken to access the i^th element of an array is
   independent of i.  (Recall that this takes time O(i) in a
   list). 
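A small sketch of the difference, assuming the modern module name Data.Array rather than the Array module used in these notes. The same eight values from the picture are stored once as a list and once as an array. The names xs, arr and sameAt are made up for illustration. The block can be loaded into GHCi.

```haskell
import Data.Array

-- The same eight values stored as a list and as an array.
xs :: [Char]
xs = "acfijkad"

-- Indices 0..7 so that arr ! i lines up with xs !! i.
arr :: Array Int Char
arr = listArray (0, 7) xs

-- xs !! i follows i links from the head of the list, while
-- arr ! i goes directly to cell i (constant time in GHC's
-- implementation).  Both give the same value.
sameAt :: Int -> Bool
sameAt i = xs !! i == arr ! i
```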

Arrays are built into Haskell, so we shall first look at how to
use arrays in Haskell, and then consider how this datatype may be
implemented efficiently.

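To use arrays in Haskell, you first have to import the module
Array as follows:

  import Array

In current versions of GHC this module is called Data.Array (from
the array package), so the import reads:

  import Data.Array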

This makes available the abstract datatype 

  Array a b

Notice that Array has two types associated with it.  The first
type variable, a, refers to the type of the index variable, while
the second type variable, b, refers to the type of the data
stored in the array.
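As a sketch (assuming the modern module Data.Array), here is the array drawn at the start of this lecture, with index type Int and value type Char, together with an array that uses a different pair of types. The names letters and codes are made up for illustration.

```haskell
import Data.Array

-- The array from the picture: indices 1..8 of type Int,
-- values of type Char.
letters :: Array Int Char
letters = listArray (1, 8) "acfijkad"

-- A different choice of types: indices are Chars, values are Ints.
codes :: Array Char Int
codes = listArray ('a', 'e') [97 .. 101]
```

For example, letters ! 4 is 'i', matching the picture.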

The index variable must belong to the type class Ix, which is a
subclass of Ord.  Intuitively, Ix consists of types which have a
total order but which are also discrete, so that one can
enumerate all values that lie between a lower bound and an upper
bound.  Thus, Bool, Char and Int belong to Ix but Float does not.
Also, if t1, t2, ..., tk belong to Ix, the tuple type
(t1,t2,...,tk) belongs to Ix.

Recall that tuples are ordered lexicographically, or in
dictionary order.  Thus, if we have index values from (2,3) to
(3,5), they will be listed in order as

  [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]
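One way to see this order, sketched below: a nested list comprehension whose first component varies slowest produces exactly the list above, and sorting a scrambled copy (which uses the lexicographic Ord instance on tuples) gives the same list back. The names pairs and scrambled are made up for illustration.

```haskell
import Data.Ix (range)
import Data.List (sort)

-- Index pairs from (2,3) to (3,5), first component varying slowest.
pairs :: [(Int, Int)]
pairs = [(i, j) | i <- [2 .. 3], j <- [3 .. 5]]

-- The same pairs in a scrambled order; sorting compares tuples
-- lexicographically and restores the order of pairs.
scrambled :: [(Int, Int)]
scrambled = [(3, 4), (2, 5), (3, 3), (2, 3), (3, 5), (2, 4)]
```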

Formally, the class Ix is defined as follows:

  class  (Ord a) => Ix a  where
    range       :: (a,a) -> [a]
    index       :: (a,a) -> a -> Int
    inRange     :: (a,a) -> a -> Bool
    rangeSize   :: (a,a) -> Int

The function range takes a lower and an upper bound and returns
the list of all values between these bounds.  The function index
returns the position, counting from 0, of a given value in the
list of values between a lower and an upper bound.  The function
inRange specifies whether a given index lies within a given
bound.  Finally, rangeSize returns the number of values between
the bounds, that is, the length of the list computed by range.

For instance:

  range ((2,3),(3,5)) = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]
  index ((2,3),(3,5)) (2,5) = 2
  inRange ((2,3),(3,5)) (3,3) = True
  inRange ((2,3),(3,5)) (3,2) = False
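For pairs, index computes a row-major position: skip one full row of the second component for each step in the first component. The sketch below writes this out by hand as a hypothetical function pairIndex (not a library function) and checks it against the library's index; it mirrors the way the Haskell report defines index on pairs.

```haskell
import Data.Ix

-- Hypothetical hand-written index for pairs: row-major position,
-- counting from 0.
pairIndex :: ((Int, Int), (Int, Int)) -> (Int, Int) -> Int
pairIndex ((l1, l2), (u1, u2)) (i, j)
  | inRange ((l1, l2), (u1, u2)) (i, j) =
      -- rows before row i, each of length (u2 - l2 + 1),
      -- plus the offset of j within row i
      (i - l1) * (u2 - l2 + 1) + (j - l2)
  | otherwise = error "pairIndex: index out of range"

bnds :: ((Int, Int), (Int, Int))
bnds = ((2, 3), (3, 5))
```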

In what follows, we will not bother too much about the details of
the class Ix.  We will be content to use types such as Int and
(Int,Int) as index types of arrays.

It is best to regard an array as a collection of (index,value)
pairs.  Such a list of (index,value) pairs is often called an
"association list".
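A small sketch of this view, assuming the modern module Data.Array: the same association list is searched with the Prelude's lookup and stored in an array, and both give the same value at every index. The names pairsList, fromPairs and agree are made up for illustration.

```haskell
import Data.Array

-- An association list of (index,value) pairs.
pairsList :: [(Int, Char)]
pairsList = [(1, 'a'), (2, 'c'), (3, 'f'), (4, 'i')]

-- The same pairs stored as an array.
fromPairs :: Array Int Char
fromPairs = array (1, 4) pairsList

-- lookup scans the list from the front; (!) goes straight to the
-- cell.  For every index the two agree.
agree :: Int -> Bool
agree i = lookup i pairsList == Just (fromPairs ! i)
```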

Creating arrays
---------------

Haskell supports three functions to construct arrays:


1. array :: (Ix a) => (a,a) -> [(a,b)] -> Array a b

   As the type suggests, the function takes a pair of index
   values, the lower and upper bounds of the indices, and an
   association list, whose pairs may appear in any order, and
   builds an Array from them.

   Here are some examples:

     squares = array (1,4) [(2,4),(3,9),(1,1),(4,16)]
 
     squares = array (1,4) [(i,i*i) | i <- [1..4]]

     somesquares = array (1,4) [(2,4),(1,1),(4,16)]

   The first example explicitly lists out the (index,value) pairs
   in the array, in an arbitrary order.  The second example
   provides the same list using a list comprehension.  The third
   example shows that not all (index,value) pairs need to be
   defined: somesquares!3 is undefined, and evaluating it is an
   error.  If an index is repeated in the list, the Haskell report
   says the value at that index is undefined (GHC in practice
   keeps the last value given).  If an index value is out of
   range, the entire array is undefined.

   Here is an example of an array whose indices are pairs of
   integers such that the entry at (i,j) is i+j.

     sample = array ((2,3),(3,5)) [((i,j),i+j) | i <- [2..3], 
                                                 j <- [3..5] ]


2. listArray :: (Ix a) => (a,a) -> [b] -> Array a b

   In this form, all array values are entered in the correct
   order of index.  For example:

     squares = listArray (1,4) [1,4,9,16]
     sample = listArray ((2,3),(3,5)) [5,6,7,6,7,8]

   Using the range function on Ix defined above, we can define
   listArray in terms of array as follows:

     listArray bounds values = array bounds (zip (range bounds) values)

3. The final form allows us to accumulate values in an array.
   The general idea is to extend the "array" function with the
   ability to provide multiple values for each index, with an
   explicit function for combining these multiple values.

   The function is called accumArray:

     accumArray :: (Ix a) => (b -> c -> b) -> b -> (a,a) -> [(a,c)] -> Array a b

   It takes four arguments:

   accumArray accumulate-function initial-value index-bounds index-value-list

   As an illustrative example, suppose we have a list of
   observations and want to add up all the observations with the
   same index value to build a "histogram".  Here is how we could
   do it:

    histogram = accumArray (+) 0 (lower,upper) observations

   where each value in the histogram for indices in the range
   (lower,upper) is initialized to 0, observations is a list of
   (index,value) pairs, and each such pair is combined with the
   existing value at that index using the function +.
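The three constructors can be tried out together in the sketch below, assuming the modern module Data.Array. The function myListArray is a hypothetical name for the definition of listArray given above, renamed to avoid clashing with the library's listArray, and the observations are made-up sample data.

```haskell
import Data.Array

-- 1. array: an association list in any order.
squares :: Array Int Int
squares = array (1, 4) [(i, i * i) | i <- [1 .. 4]]

sample :: Array (Int, Int) Int
sample = array ((2, 3), (3, 5)) [((i, j), i + j) | i <- [2 .. 3], j <- [3 .. 5]]

-- 2. listArray: values in index order.
squares' :: Array Int Int
squares' = listArray (1, 4) [1, 4, 9, 16]

-- listArray written in terms of array and range, as above
-- (hypothetical name, to avoid a clash with the library function).
myListArray :: Ix a => (a, a) -> [b] -> Array a b
myListArray bnds values = array bnds (zip (range bnds) values)

-- 3. accumArray: add up all observations sharing an index.
-- These observations are made-up sample data.
observations :: [(Int, Int)]
observations = [(1, 2), (3, 1), (1, 5), (2, 4), (3, 3)]

histogram :: Array Int Int
histogram = accumArray (+) 0 (1, 3) observations
```

Here histogram has the value 2+5 = 7 at index 1, 4 at index 2 and 1+3 = 4 at index 3.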

Extracting and updating values
------------------------------

 1. The value at index i in array arr is denoted arr!i.  Asking
    for an index outside the bounds of arr is an error.

 2. We can update an array by providing fresh (index,value) pairs
    with the operator //

    For instance, if we write

      squares = (listArray (1,4) [1,4,9,16])//[(1,3),(3,7)]

    the resulting array corresponds to the association list

      [(1,3),(2,4),(3,7),(4,16)]
 
    As with the "array" function for creating arrays, it is an
    error to give duplicate index values in the list when
    updating an array --- if this happens, the value at that
    index becomes undefined.
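A short sketch of reading and updating, assuming the modern module Data.Array; the names original and updated are made up for illustration. Note that // does not change its argument: it produces a new array, and original still holds its old values afterwards. In GHC's implementation each use of // copies the whole array, so it is best to batch many updates into one list.

```haskell
import Data.Array

original :: Array Int Int
original = listArray (1, 4) [1, 4, 9, 16]

-- (//) builds a new array from original plus the given pairs;
-- original itself is left unchanged.
updated :: Array Int Int
updated = original // [(1, 3), (3, 7)]
```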

Recovering information from an array
------------------------------------

Given an unknown array, we can recover information about its
indices and values:

 1. bounds :: (Ix a) => (Array a b) -> (a,a)

    bounds returns the lower and upper bounds of the indices of
    the array.

    Examples (from the arrays described above):
 
      bounds squares = (1,4)
      bounds sample = ((2,3),(3,5))

 2. indices :: (Ix a) => (Array a b) -> [a]

    indices returns the list of indices of the array.  Since
    range applied to a pair of indices returns the list of values
    between the two indices, we have
    
      indices = range . bounds

    Examples (from the arrays described above):
 
      indices squares = [1,2,3,4]
      indices sample = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]

 3. elems :: (Ix a) => (Array a b) -> [b]

    elems returns the list of values in the array, in order of
    the indices.

      elems arr = [ arr!i | i <- indices arr ]

    Examples (from the arrays described above):
 
      elems squares = [1,4,9,16]
      elems sample = [5,6,7,6,7,8]
    
 4. assocs :: (Ix a) => (Array a b) -> [(a,b)]

    assocs returns the contents of the array as a list of
    (index,value) pairs.

    Examples (from the arrays described above):
 
      assocs squares = [(1,1),(2,4),(3,9),(4,16)]
      assocs sample = [((2,3),5),((2,4),6),((2,5),7),
                       ((3,3),6),((3,4),7),((3,5),8)]
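The identities above can be stated as checks that work for any array. The sketch below assumes the modern module Data.Array; indicesOK, elemsOK and assocsOK are hypothetical names, and squares and sample are the arrays from the examples.

```haskell
import Data.Array

squares :: Array Int Int
squares = listArray (1, 4) [1, 4, 9, 16]

sample :: Array (Int, Int) Int
sample = listArray ((2, 3), (3, 5)) [5, 6, 7, 6, 7, 8]

-- indices = range . bounds
indicesOK :: Ix i => Array i e -> Bool
indicesOK a = indices a == range (bounds a)

-- elems lists the values in index order.
elemsOK :: (Ix i, Eq e) => Array i e -> Bool
elemsOK a = elems a == [a ! i | i <- indices a]

-- assocs pairs each index with its value.
assocsOK :: (Ix i, Eq e) => Array i e -> Bool
assocsOK a = assocs a == zip (indices a) (elems a)
```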
    

An illustration: Matrix multiplication
--------------------------------------

To illustrate the ideas above, let us consider a function to
multiply two matrices.  We want to define a function

  matmult :: (Num b) => (Array (Int,Int) b) ->
                        (Array (Int,Int) b) ->
                        (Array (Int,Int) b)

  Suppose we have a matrix A with rA rows and cA columns, and a
  matrix B with rB rows and cB columns.  We can multiply A by B to
  get a matrix C with rA rows and cB columns provided cA = rB.  We
  have the following formula to compute the (i,j)th entry in C:

    C(i,j) =   sum    A(i,k)*B(k,j)
             1<=k<=cA
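As a small worked instance of the formula, take two made-up 2 by 2 matrices (so rA = cA = rB = cB = 2):

```latex
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}, \qquad
C(1,2) = \sum_{k=1}^{2} A(1,k)\,B(k,2) = 1 \cdot 6 + 2 \cdot 8 = 22,
\qquad
C = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
```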

  We can then write matmult as follows:

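  matmult arra arrb
    | (uca-lca) == (urb-lrb) =
          array cbounds [((i,j), val i j) | (i,j) <- range cbounds ]
    | otherwise = error "matmult: incompatible dimensions"
    where
      ((lra,lca),(ura,uca)) = bounds arra
      ((lrb,lcb),(urb,ucb)) = bounds arrb
      lrc = 1
      urc = (ura-lra)+1
      lcc = 1
      ucc = (ucb-lcb)+1
      cbounds = ((lrc,lcc),(urc,ucc))

      -- val i j is entry (i,j) of C; rows and columns of C are
      -- numbered from 1.  Its type would be Int -> Int -> b, but a
      -- local signature mentioning b is rejected unless the
      -- ScopedTypeVariables extension is used, so it is omitted.
      val i j =
        sum [arra!(lra+i-1,lca+k-1) * arrb!(lrb+k-1,lcb+j-1) |
               k <- [1..(uca-lca)+1] ]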

   We could also use accumArray to eliminate the function val as
   follows: 

  matmult arra arrb
    | (uca-lca) == (urb-lrb) =
          accumArray (+) 0 cbounds
             [((i,j), arra!(lra+i-1,lca+k-1) * arrb!(lrb+k-1,lcb+j-1)) |
                        (i,j) <- range cbounds, k <- [1..(uca-lca)+1] ]
    | otherwise = error "matmult: incompatible dimensions"
    where
      ((lra,lca),(ura,uca)) = bounds arra
      ((lrb,lcb),(urb,ucb)) = bounds arrb
      lrc = 1
      urc = (ura-lra)+1
      lcc = 1
      ucc = (ucb-lcb)+1
      cbounds = ((lrc,lcc),(urc,ucc))
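The general versions above allow matrices with arbitrary lower bounds, which is why they need all the offset arithmetic. When both matrices are assumed to be indexed from (1,1), the definition becomes much shorter. The sketch below (assuming the modern module Data.Array) shows such a simplified variant under the hypothetical name matmult1, together with a hypothetical helper fromRows that builds a matrix from a non-empty list of equal-length rows.

```haskell
import Data.Array

-- Hypothetical helper: build a matrix indexed from (1,1) out of a
-- list of rows.  Assumes at least one row and equal-length rows.
fromRows :: [[e]] -> Array (Int, Int) e
fromRows rows = listArray ((1, 1), (length rows, length (head rows))) (concat rows)

-- Hypothetical simplified matmult for matrices whose bounds start
-- at (1,1): no offsets are needed.
matmult1 :: Num e => Array (Int, Int) e -> Array (Int, Int) e -> Array (Int, Int) e
matmult1 ma mb
  | m == m' = array ((1, 1), (n, p))
                [ ((i, j), sum [ma ! (i, k) * mb ! (k, j) | k <- [1 .. m]])
                | i <- [1 .. n], j <- [1 .. p] ]
  | otherwise = error "matmult1: incompatible dimensions"
  where
    (_, (n, m))  = bounds ma   -- n rows, m columns
    (_, (m', p)) = bounds mb   -- m' rows, p columns
```

With the 2 by 2 matrices from the worked example, elems (matmult1 (fromRows [[1,2],[3,4]]) (fromRows [[5,6],[7,8]])) is [19,22,43,50].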