Introduction to Programming, Aug-Nov 2008

Lecture 21, Monday 17 Nov 2008

Arrays
------

The last abstract datatype we shall consider is the array.  We can think
of an array as a fixed sequence of "cells", each position labelled by an
index, where we can access each cell directly by specifying the index
value.

    index->   1   2   3   4   5   6   7   8
            ---------------------------------
    value-> | a | c | f | i | j | k | a | d |
            ---------------------------------

The values stored in an array are all of a uniform type.

An array is like a list with two important differences:

 1. Normally, arrays have a fixed number of elements.

 2. The time taken to access the i^th element of an array is independent
    of i.  (Recall that this takes time O(i) in a list.)

Arrays are built into Haskell, so we shall first look at how to use
arrays in Haskell, and then consider how this datatype may be implemented
in an efficient manner.

To use arrays in Haskell, you have to first import the module Array as
follows:

    import Array

(In current versions of GHC this module is called Data.Array, so the
import reads "import Data.Array".)

This makes available the abstract datatype

    Array a b

Notice that Array has two types associated with it.  The first type
variable, a, refers to the type of the index, while the second type
variable, b, refers to the type of the data stored in the array.

The index type must belong to the type class Ix, which is a subclass of
Ord.  Intuitively, Ix consists of types that have a total order but are
also discrete, so that one can enumerate all values that lie between a
lower bound and an upper bound.  Thus, Bool, Char and Int belong to Ix,
but Float does not.  Also, if t1, t2, ..., tk belong to Ix, the tuple type
(t1,t2,...,tk) belongs to Ix.  Recall that tuples are ordered
lexicographically, or in dictionary order.
Thus, if we have index values from (2,3) to (3,5), they will be listed in
order as

    [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]

Formally, the class Ix is defined as follows:

    class (Ord a) => Ix a where
      range   :: (a,a) -> [a]
      index   :: (a,a) -> a -> Int
      inRange :: (a,a) -> a -> Bool

The function range takes a lower and an upper bound and returns the list
of all values between these bounds.  The function index returns the
position of a specific value in the list of values between a lower and an
upper bound, counting positions from 0.  Finally, inRange specifies
whether a given index lies within a given pair of bounds.  For instance:

    range ((2,3),(3,5))          = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]
    index ((2,3),(3,5)) (2,5)    = 2
    inRange ((2,3),(3,5)) (3,3)  = True
    inRange ((2,3),(3,5)) (3,2)  = False

In what follows, we will not bother too much about the details of the
class Ix.  We will be content to use types such as Int and (Int,Int) as
index types of arrays.

It is best to regard an array as a collection of (index,value) pairs.
Such a list of (index,value) pairs is often called an "association list".

Creating arrays
---------------

Haskell supports three functions to construct arrays:

 1. array :: (Ix a) => (a,a) -> [(a,b)] -> Array a b

    As the type suggests, the function takes a pair of index values, the
    lower and upper bounds of the indices, and an association list, in
    any order, to generate an Array.  Here are some examples:

      squares     = array (1,4) [(2,4),(3,9),(1,1),(4,16)]
      squares     = array (1,4) [(i,i*i) | i <- [1..4]]
      somesquares = array (1,4) [(2,4),(1,1),(4,16)]

    The first example explicitly lists the (index,value) pairs in the
    array, in an arbitrary order.  The second example is an alternative
    definition of the same array, using a list comprehension.  The third
    example shows that not all (index,value) pairs need to be defined;
    here the value at index 3 is undefined, and any attempt to use it is
    an error.  If an index is repeated in the list, the value at that
    index is undefined.  If an index value is out of range, the entire
    array is undefined.
    Here is an example of an array whose indices are pairs of integers,
    such that the entry at (i,j) is i+j.

      sample = array ((2,3),(3,5)) [((i,j),i+j) | i <- [2..3], j <- [3..5]]

 2. listArray :: (Ix a) => (a,a) -> [b] -> Array a b

    In this form, all array values are supplied in the order of their
    indices.  For example:

      squares = listArray (1,4) [1,4,9,16]
      sample  = listArray ((2,3),(3,5)) [5,6,7,6,7,8]

    Using the function range of the class Ix defined above, we can define
    listArray in terms of array as follows:

      listArray bounds values = array bounds (zip (range bounds) values)

 3. The final form allows us to accumulate values in an array.  The
    general idea is to extend the function array with the ability to
    provide multiple values for each index, together with an explicit
    function for combining these multiple values.  The function is called
    accumArray.  It takes four arguments:

      accumArray accumulate-function initial-value index-bounds index-value-list

    and its type is

      accumArray :: (Ix a) => (b -> c -> b) -> b -> (a,a) -> [(a,c)] -> Array a b

    As an illustrative example, suppose we have a list of observations and
    want to add up all the observations with the same index value to build
    a "histogram".  Here is how we could do it:

      histogram = accumArray (+) 0 (lower,upper) observations

    Each value in the histogram, for indices in the range (lower,upper),
    is initialized to 0; observations is a list of (index,value) pairs,
    and each such pair is combined with the existing value at that index
    using the function +.

Extracting and updating values
------------------------------

 1. The value at index i in array arr is denoted

      arr!i

 2. We can update an array by providing fresh (index,value) pairs with the
    operator //.  For instance, if we write

      squares = (listArray (1,4) [1,4,9,16]) // [(1,3),(3,7)]

    the resulting array corresponds to the association list

      [(1,3),(2,4),(3,7),(4,16)]

    Since Haskell values are immutable, // does not change the original
    array; it returns a new one.  As with the function array, it is an
    error to give duplicate index values in the list when updating an
    array: if this happens, the value at that index becomes undefined.
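To see the construction, lookup and update functions working together,
here is a small module that can be loaded into GHCi.  It is a minimal
sketch assuming a current GHC, where the module is imported as Data.Array
rather than Array.  The names squares and sample follow the text, while
sample', updated, observations and histogram's bounds (1,3) are
hypothetical names and made-up data chosen only for illustration.

```haskell
import Data.Array

-- Built with array from an association list given in arbitrary order.
squares :: Array Int Int
squares = array (1,4) [(2,4),(3,9),(1,1),(4,16)]

-- Index type (Int,Int): the entry at (i,j) is i+j.
sample :: Array (Int,Int) Int
sample = array ((2,3),(3,5)) [((i,j), i+j) | i <- [2..3], j <- [3..5]]

-- The same array built with listArray, values listed in index order.
sample' :: Array (Int,Int) Int
sample' = listArray ((2,3),(3,5)) [5,6,7,6,7,8]

-- A new array equal to squares except at indices 1 and 3;
-- squares itself is left unchanged.
updated :: Array Int Int
updated = squares // [(1,3),(3,7)]

-- Hypothetical observations: (bucket, amount) pairs.
observations :: [(Int,Int)]
observations = [(1,10),(3,5),(1,2),(2,7),(3,1)]

-- Every bucket starts at 0; amounts sharing a bucket are added with (+).
histogram :: Array Int Int
histogram = accumArray (+) 0 (1,3) observations
```

With this module loaded, squares ! 3 evaluates to 9, elems updated to
[3,4,7,16] and elems histogram to [12,7,6], while elems squares is still
[1,4,9,16].  The comparison sample == sample' gives True, so array and
listArray have built the same array.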
Recovering information from an array
------------------------------------

Given an unknown array, we can recover information about its indices and
values.  In the examples below, squares and sample are the arrays
[1,4,9,16] and [5,6,7,6,7,8] as first defined in the section on creating
arrays.

 1. bounds :: (Ix a) => (Array a b) -> (a,a)

    bounds returns the lower and upper bounds of the indices of the array.
    Examples:

      bounds squares = (1,4)
      bounds sample  = ((2,3),(3,5))

 2. indices :: (Ix a) => (Array a b) -> [a]

    indices returns the list of indices of the array.  Since range applied
    to a pair of bounds returns the list of values between them, we have

      indices = range . bounds

    Examples:

      indices squares = [1,2,3,4]
      indices sample  = [(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)]

 3. elems :: (Ix a) => (Array a b) -> [b]

    elems returns the list of values in the array, in order of the indices.

      elems arr = [ arr!i | i <- indices arr ]

    Examples:

      elems squares = [1,4,9,16]
      elems sample  = [5,6,7,6,7,8]

 4. assocs :: (Ix a) => (Array a b) -> [(a,b)]

    assocs returns the contents of the array as a list of (index,value)
    pairs.  Examples:

      assocs squares = [(1,1),(2,4),(3,9),(4,16)]
      assocs sample  = [((2,3),5),((2,4),6),((2,5),7),
                        ((3,3),6),((3,4),7),((3,5),8)]

An illustration: Matrix multiplication
--------------------------------------

To illustrate the ideas above, let us consider a function to multiply two
matrices.  We want to define a function

    matmult :: (Num b) => Array (Int,Int) b -> Array (Int,Int) b
                       -> Array (Int,Int) b

Suppose we have a matrix A with rA rows and cA columns, and a matrix B
with rB rows and cB columns.  We can multiply A by B to get a matrix C
with rA rows and cB columns, provided cA = rB.
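Since a matrix is stored here as an array indexed by (row, column) pairs,
the function bounds is enough to recover its dimensions and to test the
condition cA = rB.  The following is a minimal sketch assuming a current
GHC (module Data.Array); the helpers dims and compatible and the matrices
matA and matB are hypothetical names introduced only for illustration,
not library functions.

```haskell
import Data.Array

-- Hypothetical helper: (rows, columns) of a matrix stored as an array
-- indexed by (row, column), whatever its lower bounds are.
dims :: Array (Int,Int) b -> (Int,Int)
dims arr = (ur - lr + 1, uc - lc + 1)
  where ((lr,lc),(ur,uc)) = bounds arr

-- Hypothetical helper: A can be multiplied by B exactly when the
-- number of columns of A equals the number of rows of B.
compatible :: Array (Int,Int) b -> Array (Int,Int) c -> Bool
compatible a b = snd (dims a) == fst (dims b)

-- A 2x3 matrix indexed from (1,1).
matA :: Array (Int,Int) Int
matA = listArray ((1,1),(2,3)) [1,2,3,4,5,6]

-- A 3x2 matrix whose indices start at (0,5) instead of (1,1).
matB :: Array (Int,Int) Int
matB = listArray ((0,5),(2,6)) [7,8,9,10,11,12]
```

Here dims matA is (2,3) and dims matB is (3,2), so compatible matA matB
is True even though the indices of matB do not start at (1,1): only the
number of rows and columns matters, which is also what the guard in
matmult checks.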
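We have the following formula to compute the (i,j)th entry in C:

    C(i,j) = \sum_{k=1}^{c_A} A(i,k) \, B(k,j)

We can then write matmult as follows.  The result C is always indexed
from (1,1), whatever the bounds of A and B, so the indices into A and B
are shifted accordingly:

    matmult :: (Num b) => Array (Int,Int) b -> Array (Int,Int) b
                       -> Array (Int,Int) b
    matmult arra arrb
      | (uca-lca) == (urb-lrb) =
          array cbounds [((i,j), val i j) | (i,j) <- range cbounds]
      | otherwise = error "matmult: incompatible dimensions"
      where
        ((lra,lca),(ura,uca)) = bounds arra
        ((lrb,lcb),(urb,ucb)) = bounds arrb
        lrc = 1
        urc = (ura-lra)+1
        lcc = 1
        ucc = (ucb-lcb)+1
        cbounds = ((lrc,lcc),(urc,ucc))
        -- val i j computes C(i,j).  It has no type signature: writing
        -- val :: Int -> Int -> b here would introduce a fresh type
        -- variable b, unrelated to the b in the type of matmult.
        val i j = sum [arra!(lra+i-1,lca+k-1) * arrb!(lrb+k-1,lcb+j-1)
                      | k <- [1..(uca-lca)+1]]

We could also use accumArray to eliminate the function val as follows:

    matmult :: (Num b) => Array (Int,Int) b -> Array (Int,Int) b
                       -> Array (Int,Int) b
    matmult arra arrb
      | (uca-lca) == (urb-lrb) =
          accumArray (+) 0 cbounds
            [((i,j), arra!(lra+i-1,lca+k-1) * arrb!(lrb+k-1,lcb+j-1))
            | (i,j) <- range cbounds, k <- [1..(uca-lca)+1]]
      | otherwise = error "matmult: incompatible dimensions"
      where
        ((lra,lca),(ura,uca)) = bounds arra
        ((lrb,lcb),(urb,ucb)) = bounds arrb
        lrc = 1
        urc = (ura-lra)+1
        lcc = 1
        ucc = (ucb-lcb)+1
        cbounds = ((lrc,lcc),(urc,ucc))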
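To check either definition of matmult by hand, here is the formula for
C(i,j) worked out for a small example: a 2x3 matrix A times a 3x2 matrix
B.  The matrices are made-up illustrative values, not taken from the
notes.

```latex
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad
B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix}

C(1,1) = A(1,1)B(1,1) + A(1,2)B(2,1) + A(1,3)B(3,1) = 1 \cdot 7 + 2 \cdot 9 + 3 \cdot 11 = 58
C(1,2) = A(1,1)B(1,2) + A(1,2)B(2,2) + A(1,3)B(3,2) = 1 \cdot 8 + 2 \cdot 10 + 3 \cdot 12 = 64
C(2,1) = A(2,1)B(1,1) + A(2,2)B(2,1) + A(2,3)B(3,1) = 4 \cdot 7 + 5 \cdot 9 + 6 \cdot 11 = 139
C(2,2) = A(2,1)B(1,2) + A(2,2)B(2,2) + A(2,3)B(3,2) = 4 \cdot 8 + 5 \cdot 10 + 6 \cdot 12 = 154

C = \begin{pmatrix} 58 & 64 \\ 139 & 154 \end{pmatrix}
```

If A and B are stored as listArray ((1,1),(2,3)) [1,2,3,4,5,6] and
listArray ((1,1),(3,2)) [7,8,9,10,11,12], either version of matmult
should return an array with bounds ((1,1),(2,2)) whose elems are
[58,64,139,154].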