Introduction to Programming, Aug-Dec 2006

Lecture 18, Thursday 02 Nov 2006
Lecture 19, Tuesday 07 Nov 2006

Arrays
------

The last abstract datatype we shall consider is the array.  We can
think of an array as a fixed sequence of "cells", each labelled by an
index, where we can access each cell directly by specifying the index
value.

  index->   1   2   3   4   5   6   7   8
          ---------------------------------
  value-> | a | c | f | i | j | k | a | d |
          ---------------------------------

The values stored in an array are all of a uniform type.  An array is
like a list with two important differences:

1. Normally, arrays have a fixed number of elements.

2. The time taken to access the i^th element of an array is
   independent of i.  (Recall that this takes time O(i) in a list).

Arrays are built into Haskell, so we shall first look at how to use
arrays in Haskell, and then consider how this datatype may be
implemented in an efficient manner.

To use arrays in Haskell, you have to first import the module Array as
follows:

  import Array

(With current versions of GHC the module is called Data.Array, so the
line to write is "import Data.Array".)

This makes available the abstract datatype

  Array a b

Notice that Array has two types associated with it.  The first type
variable, a, refers to the type of the index variable, while the
second type variable, b, refers to the type of the data stored in the
array.

The index variable must belong to the type class Ix, which is a
subclass of Ord.  Intuitively, Ix consists of types which have a total
order but which are also discrete, so that one can enumerate all
values that lie between a lower bound and an upper bound.  Thus, Bool,
Char and Int belong to Ix but Float does not.  Also, if t1, t2, ...,
tk belong to Ix, the tuple (t1,t2,...,tk) belongs to Ix.  Recall that
tuples are ordered lexicographically, or in dictionary order.
Thus, if we have index values from (2,3) to (3,5), they will be listed
in order as

  [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]

Formally, the class Ix is defined as follows:

  class (Ord a) => Ix a where
    range   :: (a,a) -> [a]
    index   :: (a,a) -> a -> Int
    inRange :: (a,a) -> a -> Bool

The function range takes a lower and upper bound of values and returns
a list of all values between these bounds.  The function index returns
the position (counting from 0) of a specific value in the list of
values between a lower and upper bound.  Finally, inRange specifies
whether a given index lies within a given pair of bounds.

For instance:

  range ((2,3),(3,5))         = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]
  index ((2,3),(3,5)) (2,5)   = 2
  inRange ((2,3),(3,5)) (3,3) = True
  inRange ((2,3),(3,5)) (3,2) = False

In what follows, we will not bother too much about the details of the
class Ix.  We will be content to use types such as Int and (Int,Int)
as index types of arrays.

It is best to regard an array as a collection of (index,value) pairs.
Such a list of (index,value) pairs is often called an "association
list".

Creating arrays
---------------

Haskell supports three functions to construct arrays:

1. array :: (Ix a) => (a,a) -> [(a,b)] -> Array a b

   As the type suggests, the function takes a pair of index values,
   the lower and upper bounds of the indices, and an association list,
   in any order, to generate an Array.  Here are some examples:

     squares = array (1,4) [(2,4),(3,9),(1,1),(4,16)]

     squares = array (1,4) [(i,i*i) | i <- [1..4]]

     somesquares = array (1,4) [(2,4),(1,1),(4,16)]

   The first example explicitly lists out the (index,value) pairs in
   the array, in arbitrary order.  The second example provides the
   same list using list comprehension.  The third example shows that
   not all (index,value) pairs need to be defined: the value at the
   missing index 3 is simply undefined.

   If an index is repeated in the list, the value at that index is
   undefined.  If an index value is out of range, the entire array is
   undefined.
   Here is an example of an array whose indices are pairs of integers
   such that the entry at (i,j) is i+j.

     sample = array ((2,3),(3,5))
                    [((i,j),i+j) | i <- [2..3], j <- [3..5] ]

2. listArray :: (Ix a) => (a,a) -> [b] -> Array a b

   In this form, all array values are entered in the correct order of
   index.  For example:

     squares = listArray (1,4) [1,4,9,16]

     sample = listArray ((2,3),(3,5)) [5,6,7,6,7,8]

   Using the range function on Ix defined above, we can define
   listArray in terms of array as follows:

     listArray bounds values = array bounds (zip (range bounds) values)

3. The final form allows us to accumulate values in an array.  The
   general idea is to extend the "array" function with the ability to
   provide multiple values for each index, with an explicit function
   for combining these multiple values.

   The function is called accumArray.  It takes four arguments:

     accumArray accumulate-function initial-value
                index-bounds index-value-list

   As an illustrative example, suppose we have a list of observations
   and want to add up all the observations with the same index value
   to build a "histogram".  Here is how we could do it:

     histogram = accumArray (+) 0 (lower,upper) observations

   where each value in the histogram for indices in the range
   (lower,upper) is initialized to 0, observations is a list of
   (index,value) pairs and each such pair is combined with the
   existing value at that index using the function +.

Extracting and updating values
------------------------------

1. The value at index i in array arr is denoted arr!i

2. We can update an array by providing fresh (index,value) pairs with
   the operator //

   For instance, if we write

     squares = (listArray (1,4) [1,4,9,16])//[(1,3),(3,7)]

   the resulting array corresponds to the association list

     [(1,3),(2,4),(3,7),(4,16)]

   As with the "array" function for creating arrays, it is an error to
   give duplicate index values in the list when updating an array ---
   if this happens, the value at that index becomes undefined.
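The construction and update functions above can be tried out together
in a small file.  The following is only a sketch: it assumes the
Data.Array module of current GHC, and the observations list is made-up
data chosen for illustration.

```haskell
import Data.Array

-- Squares of 1..4, with the values supplied in index order.
squares :: Array Int Int
squares = listArray (1, 4) [1, 4, 9, 16]

-- (//) builds a new array; squares itself is left unchanged.
patched :: Array Int Int
patched = squares // [(1, 3), (3, 7)]

-- Hypothetical observations: (bucket, count) pairs with repeated buckets.
observations :: [(Int, Int)]
observations = [(1, 2), (3, 1), (1, 5), (2, 4), (3, 3)]

-- Add up the counts bucket by bucket.  Bucket 4 receives no
-- observation, so it keeps the initial value 0.
histogram :: Array Int Int
histogram = accumArray (+) 0 (1, 4) observations
```

After loading this file in GHCi, the expression squares ! 3 evaluates
to 9, assocs patched to [(1,3),(2,4),(3,7),(4,16)], and elems histogram
to [7,4,4,0].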

Recovering information from an array
------------------------------------

Given an unknown array, we can recover information about its indices
and values:

1. bounds :: (Ix a) => (Array a b) -> (a,a)

   bounds returns the lower and upper bounds of the indices of the
   array.

   Examples (from the arrays described above):

     bounds squares = (1,4)
     bounds sample  = ((2,3),(3,5))

2. indices :: (Ix a) => (Array a b) -> [a]

   indices returns the list of indices of the array.  Since range
   applied to a pair of indices returns the list of values between the
   two indices, we have

     indices = range . bounds

   Examples (from the arrays described above):

     indices squares = [1,2,3,4]
     indices sample  = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]

3. elems :: (Ix a) => (Array a b) -> [b]

   elems returns the list of values in the array, in order of the
   indices.

     elems arr = [ arr!i | i <- indices arr ]

   Examples (from the arrays described above):

     elems squares = [1,4,9,16]
     elems sample  = [5,6,7,6,7,8]

4. assocs :: (Ix a) => (Array a b) -> [(a,b)]

   assocs returns the contents of the array as a list of (index,value)
   pairs.

   Examples (from the arrays described above):

     assocs squares = [(1,1),(2,4),(3,9),(4,16)]
     assocs sample  = [((2,3),5),((2,4),6),((2,5),7),
                       ((3,3),6),((3,4),7),((3,5),8)]

An illustration: Matrix multiplication
--------------------------------------

To illustrate the ideas above, let us consider a function to multiply
two matrices.  A matrix is an array indexed by pairs (row,column), so
we want to define a function

  matmult :: (Num b) => (Array (Int,Int) b) ->
                        (Array (Int,Int) b) ->
                        (Array (Int,Int) b)

Suppose we have a matrix A with rA rows and cA columns and a matrix B
with rB rows and cB columns.  We can multiply A by B to get a matrix C
with rA rows and cB columns provided cA = rB.
We have the following formula to compute the (i,j)th entry in C:

  C(i,j) =    sum     A(i,k)*B(k,j)
           1<=k<=cA

We can then write matmult as follows:

  matmult arra arrb
    | (uca-lca) == (urb-lrb) =
        array cbounds [((i,j), val i j) | (i,j) <- range cbounds ]
    where
      ((lra,lca),(ura,uca)) = bounds arra
      ((lrb,lcb),(urb,ucb)) = bounds arrb
      lrc = 1
      urc = (ura-lra)+1
      lcc = 1
      ucc = (ucb-lcb)+1
      cbounds = ((lrc,lcc),(urc,ucc))
      -- val i j is the (i,j)th entry of the product; its type is
      -- Int -> Int -> b for the b in the signature of matmult
      val i j = sum [arra!(lra+i-1,lca+k-1)*arrb!(lrb+k-1,lcb+j-1)
                    | k <- [1..(uca-lca)+1] ]

The guard checks that the number of columns of arra equals the number
of rows of arrb.  The result is always indexed from (1,1), whatever
the lower bounds of the two arguments are.

We could also use accumArray to eliminate the function val as follows:

  matmult arra arrb
    | (uca-lca) == (urb-lrb) =
        accumArray (+) 0 cbounds
          [((i,j), arra!(lra+i-1,lca+k-1)*arrb!(lrb+k-1,lcb+j-1))
          | (i,j) <- range cbounds, k <- [1..(uca-lca)+1] ]
    where
      ((lra,lca),(ura,uca)) = bounds arra
      ((lrb,lcb),(urb,ucb)) = bounds arrb
      lrc = 1
      urc = (ura-lra)+1
      lcc = 1
      ucc = (ucb-lcb)+1
      cbounds = ((lrc,lcc),(urc,ucc))

Implementing arrays
-------------------

We now look at how to implement arrays efficiently.  To simplify the
discussion, we shall assume that the indices associated with the
elements of an N element array are always 1..N.

Here are the functions supported by an array:

  makearray :: Int -> a -> (Array a)            -- makearray N v makes a new
                                                -- array of size N initialized
                                                -- to v
  lookup :: (Array a) -> Int -> a               -- Look up value at an index
  update :: (Array a) -> Int -> a -> (Array a)  -- Update value at an index

A simple implementation of an array is to use a size balanced binary
tree along with size information at each node.

-- The function makearray sets up, once and for all, a size balanced
   binary tree with N elements.  This can be done in time O(N) using
   the function mkbtree, supplying a list with N copies of v as input
   to mkbtree.

-- Position i in the array corresponds to position i in the inorder
   traversal of the tree.  To look up the i^th element, we start at
   the root and check the size of the left and right subtrees.  Let
   these sizes be l and r.
   If i <= l, we know that position i is in the left subtree and move
   left.  If i = l+1, the value we want is at the root.  If i > l+1,
   we look in the right subtree, but adjust the index to look up to
   i-(l+1) because we now want the offset within the right subtree.

-- Update follows the same procedure as lookup to find the value to
   update.

Clearly, the time to look up a value is proportional to log n in the
worst case, since the tree is size balanced.

To make the lookup time uniform for all positions, we can use a
simpler type of tree where values are stored only at leaves ---
internal nodes have no values.  Let us call such a tree a "leaf tree".

  data LTree a = LNil | Leaf a | LNode Int (LTree a) (LTree a)

At internal nodes, we store only an Int, representing the size of the
subtree, as this is required to look up the i^th position.

The constructor "Leaf a" cannot be used in isolation.  A leaf tree is
either empty (LNil) or has at least one internal node (LNode) with at
least one nonempty leaf.  The smallest nonempty leaf tree would look
like this, with a leaf value in the left subtree.

  LNode n (Leaf v) LNil

For instance, here is how a leaf tree for the array we drew above
could look:

                   -------------------
                  /                   \
             -------                 -------
            /       \               /       \
        -----       -----       -----       -----
       /     \     /     \     /     \     /     \
    (1,a) (2,c) (3,f) (4,i) (5,j) (6,k) (7,a) (8,d)

Notice that we have deliberately stored both the index value and the
data value for each element of the array.  This is to emphasize that
the index values need not always be [1..n] for an n element array.
Thus, if we allow arbitrary indices (as Haskell does), we have to
modify our procedure for looking up an index to map the actual index
to its corresponding entry in the range [1..n] before traversing the
tree.

Notice that we do not need the assumption (Eq a) because we do not
need to maintain any relationship between the nodes based on data
values.
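
The translation from an arbitrary index to a position in [1..n] can be
sketched with the functions of the class Ix.  This is only an
illustration: the name position is hypothetical and not part of the
implementation in these notes, and the sketch assumes the Data.Ix
module of current GHC.

```haskell
import Data.Ix (Ix, index, inRange)

-- Map an index lying within the given bounds to a position in [1..n].
-- The function index counts from 0, so we add 1.
position :: Ix i => (i, i) -> i -> Int
position bnds i
  | inRange bnds i = index bnds i + 1
  | otherwise      = error "Index out of bounds"
```

With bounds ((2,3),(3,5)), the index (3,3) is the fourth value in
range order, so position maps it to 4.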

Here is how we would implement the functions

  makearray :: Int -> a -> (Array a)
  lookup :: (Array a) -> Int -> a
  update :: (Array a) -> Int -> a -> (Array a)

given the data definition

  data Array a = Arr (LTree a)

For makearray, we use a variant of mkbtree that we call mkltree.  The
function mkltree takes as input a list of values and constructs a
balanced leaf tree with the input list at its leaves.

  mkltree :: [a] -> (LTree a)
  mkltree l = fst (mkltreeaux l (length l))

  -- mkltreeaux l n builds a leaf tree from the first n values of l
  -- and returns it along with the unused part of l
  mkltreeaux :: [a] -> Int -> (LTree a, [a])
  mkltreeaux [] n     = (LNil,[])
  mkltreeaux l 0      = (LNil,l)
  mkltreeaux (x:xs) 1 = (LNode 1 (Leaf x) LNil, xs)
  mkltreeaux l n      = (LNode n t1 t2,l2)
    where
      m = n `div` 2
      rest = n - m
      (t1,l1) = mkltreeaux l m
      (t2,l2) = mkltreeaux l1 rest

We can now define makearray as follows:

  makearray n v = Arr (mkltree [v | i <- [1..n]])

The definition of lookup is easy.  Since the Prelude already exports a
function called lookup, the module must begin with
"import Prelude hiding (lookup)".

  lookup :: (Array a) -> Int -> a
  lookup (Arr LNil) n = error "Empty array"
  lookup (Arr (LNode m (Leaf v) t2)) 1 = v
  lookup (Arr (LNode m t1 t2)) n
    | n < 1 || n > m  = error "Invalid index"
    | n <= (size t1)  = lookup (Arr t1) n
    | otherwise       = lookup (Arr t2) (n - (size t1))
    where
      size :: (LTree a) -> Int
      size LNil = 0
      size (Leaf v) = 1
      size (LNode m t1 t2) = m

The definition of update is analogous to lookup and is left as an
exercise.

======================================================================
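
For readers who want to check their answer to the exercise, here is
one possible way to complete the implementation.  It is a sketch, not
the only solution.  It assumes the definitions given above, uses
replicate in place of the list comprehension in makearray, and
introduces a local helper named go that is not part of the notes.

```haskell
import Prelude hiding (lookup)  -- avoid the clash with Prelude.lookup

-- Leaf tree and array wrapper, as defined in the notes.
data LTree a = LNil | Leaf a | LNode Int (LTree a) (LTree a)
data Array a = Arr (LTree a)

-- Number of leaves below a node.
size :: LTree a -> Int
size LNil          = 0
size (Leaf _)      = 1
size (LNode m _ _) = m

-- Build a balanced leaf tree whose leaves are the values of the list.
mkltree :: [a] -> LTree a
mkltree l = fst (mkltreeaux l (length l))

mkltreeaux :: [a] -> Int -> (LTree a, [a])
mkltreeaux [] _     = (LNil, [])
mkltreeaux l 0      = (LNil, l)
mkltreeaux (x:xs) 1 = (LNode 1 (Leaf x) LNil, xs)
mkltreeaux l n      = (LNode n t1 t2, l2)
  where
    m        = n `div` 2
    (t1, l1) = mkltreeaux l m
    (t2, l2) = mkltreeaux l1 (n - m)

makearray :: Int -> a -> Array a
makearray n v = Arr (mkltree (replicate n v))

-- Walk down to the i-th leaf, using the sizes to choose a direction.
lookup :: Array a -> Int -> a
lookup (Arr t) i = go t i
  where
    go LNil _     = error "Empty array"
    go (Leaf v) 1 = v
    go (Leaf _) _ = error "Invalid index"
    go (LNode m t1 t2) n
      | n < 1 || n > m = error "Invalid index"
      | n <= size t1   = go t1 n
      | otherwise      = go t2 (n - size t1)

-- Same walk as lookup, but rebuild the nodes on the path so that the
-- i-th leaf holds the new value.  Subtrees off the path are shared
-- with the old array, which itself remains unchanged.
update :: Array a -> Int -> a -> Array a
update (Arr t) i v = Arr (go t i)
  where
    go LNil _     = error "Empty array"
    go (Leaf _) 1 = Leaf v
    go (Leaf _) _ = error "Invalid index"
    go (LNode m t1 t2) n
      | n < 1 || n > m = error "Invalid index"
      | n <= size t1   = LNode m (go t1 n) t2
      | otherwise      = LNode m t1 (go t2 (n - size t1))
```

For example, after arr = update (makearray 5 0) 2 10, the expression
lookup arr 2 evaluates to 10 while lookup arr 1 is still 0.  Only the
nodes on the path from the root to the updated leaf are copied, so
update takes time proportional to log n, just like lookup.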