Introduction to Programming, Aug-Dec 2008

Lecture 6, Wed 27 Aug 2008

Folding functions through a list
--------------------------------

Consider the following inductive definitions over lists:

  sum :: [Int] -> Int
  sum []     = 0
  sum (x:xs) = x + (sum xs)

  and :: [Bool] -> Bool
  and []     = True
  and (x:xs) = x && (and xs)

  concat :: [[a]] -> [a]
  concat []     = []
  concat (x:xs) = x ++ (concat xs)

All of these have the general form:

  f []     = y
  f (x:xs) = x `op` (f xs)

Thus, given a binary function op and an initial value y, we can
evaluate these functions from right to left as follows

    x_1   x_2  . . .  x_n-1   x_n    y
     |                  |       \   /
     |                  |        op
     |                  |        |
     |                  |       y_n
     |                   \     /
     |                     op
     |                     |
     |                   y_n-1
     |                  .
     |               .
     |            .
     |          y_2
      \        /
         op
         |
        y_1

where y_1 is the final value to be returned.

As can be seen from the figure, we are "folding" the binary function
op from right to left through the list. Since op is a binary
function, we have to provide a second argument y to combine with the
last element x_n. After this, each intermediate value is used as the
right argument to op and the next element of the list is taken as the
left argument.

This folding operation is described by the Haskell function foldr,
which takes three inputs: the function to be folded, the initial
value, and the list to be folded. Thus, we can write

  sum xs    = foldr (+) 0 xs
  and xs    = foldr (&&) True xs
  concat xs = foldr (++) [] xs

More concisely, since xs is the rightmost argument on both sides, we
can write:

  sum    = foldr (+) 0
  and    = foldr (&&) True
  concat = foldr (++) []

What is the type of foldr? In the three examples above, the function
passed to foldr had the same type for both its inputs and its output.
However, there is no reason for this. Suppose that the two inputs to
the function op are of types a and b, respectively. Initially, we
have op x_n y = y_n, where x_n is of type a and y is of type b. At
the next step, we have to apply op to x_n-1 and y_n. Since the second
argument to op must be of type b, this constrains y_n, the output of
op, to be of type b.
In other words, we have the following general types:

  The function passed to foldr :: a->b->b
  The initial value            :: b
  The input list               :: [a]
  The output value             :: b

So, the type of foldr is

  foldr :: (a->b->b) -> b -> [a] -> b

The function foldr itself can be defined inductively:

  foldr f y []     = y
  foldr f y (x:xs) = f x (foldr f y xs)

Notice that folding from the right is a natural consequence of
inductively decomposing lists from the left. This is the efficient
way to decompose lists because of the internal representation of
lists in Haskell.

foldl
-----

We can define a symmetric function, foldl, that folds a function f
from left to right through a list. Thus, given a binary function op
and an initial value y, we can fold op through the list from left to
right as follows.

    y     x_1   x_2  . . .  x_n
     \   /       |           |
      op         |           |
      |          |           |
     y_1        /            |
       \      /              |
         op                  |
         |                   |
        y_2                  |
           \                 |
             ...             |
                \            |
              y_(n-1)       /
                   \      /
                      op
                      |
                     y_n

where y_n is the final value to be returned.

To illustrate the use of foldl, we revisit the problem of converting
a string to a number. The function we want to construct is

  strtonum :: String -> Int

The convention we adopt is that the characters '0', '1', ..., '9'
denote the numbers 0,1,...,9 and all other characters are interpreted
as 0. Thus strtonum "138" should return 138 and strtonum "1ab9"
should return 1009.

We begin with a function chartonum that converts a single character
to a single digit integer. One way to write this function is to
explicitly match the relevant characters '0', '1', ..., '9' and
assign 0 to all other values of the input.

  chartonum :: Char -> Int
  chartonum '0' = 0
  chartonum '1' = 1
  ...
  chartonum '9' = 9
  chartonum x   = 0

Alternatively, we could have made use of the fact that '0' to '9' are
consecutive in the internal table representing characters. Using the
function ord from the module Data.Char, which returns the position of
a character in this table, we can write

  chartonum c
    | (c >= '0' && c <= '9') = ord c - ord '0'
    | otherwise              = 0

Now, to convert a String to an Int, we start from the left and
inductively build up a number.
At each step, we multiply the number we have so far by 10 and add the
next digit. Pictorially, we have

    0     d_1      d_2  . . .  d_n
     \   /          |           |
      \ /           |           |
   k_1=10*0+d_1    /            |
        \         /             |
      k_2=10*k_1+d_2            |
             \                  |
               ...              |
                  \             |
                 k_n-1         /
                     \        /
                  k_n=10*k_n-1+d_n

The operation that combines the partially constructed number with the
next digit is the following:

  combine :: Int -> Char -> Int
  combine n c = 10*n + (chartonum c)

We can now write strtonum as

  strtonum = foldl combine 0

Observe that the type of foldl is the following:

  foldl :: (b->a->b) -> b -> [a] -> b

The difference is in the type of the function f passed to foldl;
since we start from the left, the type of f is (b->a->b) and not
(a->b->b) as in foldr, where we start from the right.

Can we define foldl inductively? A naive definition would be the
following:

  foldl :: (b->a->b) -> b -> [a] -> b
  foldl f y [] = y
  foldl f y l  = f (foldl f y (init l)) (last l)

However, this is not an efficient definition because computing
init l and last l takes time proportional to the length of the list,
and this cost is paid again at every step of the induction.

Exercise: Devise a more efficient definition for foldl.

Hint: Use an auxiliary function that explicitly maintains the
incremental value that is being computed.

Folding on nonempty lists
-------------------------

We wrote the function sum as

  foldr (+) 0

If we write a corresponding function to compute the product of all
elements of a list, it would be

  foldr (*) 1

This would work fine for nonempty lists but would give the somewhat
counterintuitive answer that the product of an empty list is 1.

To consider another example, suppose we want to find the maximum
value in a list. Intuitively, this corresponds to folding the builtin
function max through the list. Once again, there is a problem
defining the maximum value of an empty list. Moreover, unlike the
case of product, it is not even clear how to define this function ---
the default value we supply must be smaller than any value actually
in the list, which means we have to rely on the underlying system.
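
The two problems with the empty list can be seen in a small sketch.
The names productR and maxIntR are hypothetical, chosen only to avoid
clashing with the Prelude, and the use of minBound as the default
value is an assumption that applies to bounded types such as Int.

```haskell
-- Hypothetical names; neither function appears in the lecture itself.

-- Product via foldr: the empty list yields the initial value 1.
productR :: [Int] -> Int
productR = foldr (*) 1

-- Maximum via foldr: the default must not exceed any list element,
-- so this sketch borrows the smallest Int from the system (minBound).
maxIntR :: [Int] -> Int
maxIntR = foldr max minBound
```

Here productR [] evaluates to 1 and maxIntR [] evaluates to minBound,
which is a value that does not occur in the list at all.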
To get around these difficulties, Haskell provides the functions
foldr1 and foldl1 that work exactly like foldr and foldl,
respectively, but only for nonempty lists. If the list has only one
element, no folding is done and that element is returned. If the list
has two or more elements, the function supplied to foldr1 (or foldl1)
is folded through the list beginning with the last two (or first two)
elements in the list.

Here is an inductive definition of foldr1:

  foldr1 :: (a->a->a) -> [a] -> a
  foldr1 f [x]      = x
  foldr1 f [x,y]    = f x y
  foldr1 f (x:y:ys) = f x (foldr1 f (y:ys))

Notice that the function passed to foldr1 has type a->a->a, unlike
the type a->b->b of the function passed to foldr. This is because
both arguments to f come from the list and the answer must again be
of the same type.

Given this, we can define the functions product of a list and maximum
value of a list in terms of foldr1 (and foldl1).

  product = foldr1 (*)      -- or, equivalently, foldl1 (*)
  maxlist = foldr1 max      -- or, equivalently, foldl1 max

======================================================================

Accumulating intermediate values : scanl and scanr
--------------------------------------------------

As we have seen, the function foldl folds a function f through a list
and produces a single final value. Pictorially, we had

    y     x_1   x_2  . . .  x_n
     \   /       |           |
      op         |           |
      |          |           |
     y_1        /            |
       \      /              |
         op                  |
         |                   |
        y_2                  |
           \                 |
             ...             |
                \            |
              y_(n-1)       /
                   \      /
                      op
                      |
                     y_n

The computation of y_n involves generating the intermediate values
y, y_1, ..., y_(n-1). These correspond to "partial" answers of foldl
for prefixes of the list. Sometimes, these partial answers are also
interesting and they can be returned with extra effort. The Haskell
function scanl achieves this. In other words,

  scanl f m l = [y,y_1,...,y_n]

where y = m and [y,y_1,...,y_n] is the list of intermediate values
generated when computing

  foldl f m l

Thus, we have

  scanl (+) 0 [1..n] = [0,1,(1+2),...,(1+2+..+n)]
  scanl (*) 1 [1..n] = [1,1!,2!,...,n!]

In both cases the output has n+1 elements, because the initial value
is itself the first element of the result.
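
As a concrete instance, here is a minimal sketch for n = 4. The names
runningSums, factorials and lastIsFold are hypothetical and are used
only for illustration.

```haskell
-- Hypothetical names, used only for illustration.

-- Partial sums of [1..4], starting from the initial value 0.
runningSums :: [Int]
runningSums = scanl (+) 0 [1..4]     -- [0,1,3,6,10]

-- Partial products of [1..4], starting from the initial value 1.
factorials :: [Int]
factorials = scanl (*) 1 [1..4]      -- [1,1,2,6,24]

-- The last value produced by scanl is the answer computed by foldl.
lastIsFold :: Bool
lastIsFold = last runningSums == foldl (+) 0 [1..4]
```

Each list has five elements for a four-element input, since the
initial value is reported as the first partial answer.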
Symmetrically, scanr returns the partial values generated when
evaluating foldr. Recall that the picture for foldr was the
following:

    x_1   x_2  . . .  x_n-1   x_n    y
     |                  |       \   /
     |                  |        op
     |                  |        |
     |                  |       y_n
     |                   \     /
     |                     op
     |                     |
     |                   y_n-1
     |                  .
     |               .
     |            .
     |          y_2
      \        /
         op
         |
        y_1

Thus,

  scanr f m l = [y_1,...,y_n,y]

where y = m and [y_1,...,y_n,y] is the list of intermediate values
generated when computing

  foldr f m l

Notice that the output of scanl is the list of foldl values for
longer and longer prefixes of l while the output of scanr is the list
of foldr values for shorter and shorter suffixes of l.

======================================================================

Combinatorial functions on lists
--------------------------------

We now look at some combinatorial functions on lists. All of these
can be defined inductively in terms of the structure of the list.
However, we will also see that we can use alternative notation to
simplify these definitions.

Initial segments
----------------

We begin with the function initsegs that lists out the initial
segments of a list. An initial segment of a list l is a prefix ---
that is, a sublist that includes the first k elements of l for some
k.

The idea is straightforward. The smallest initial segment of a list
is the empty list. For a list of the form (x:xs), the initial
segments can be obtained by inserting an x at the head of each
initial segment of xs (and explicitly adding a fresh empty initial
segment). Each initial segment is itself a list, so initsegs returns
a list of lists.

Here is an inductive definition of initsegs:

  initsegs :: [a] -> [[a]]
  initsegs []     = [[]]
  initsegs (x:xs) = [[]] ++ [x:l | l <- initsegs xs]

An alternative, "purer", definition is

  initsegs :: [a] -> [[a]]
  initsegs []     = [[]]
  initsegs (x:xs) = [[]] ++ map (x:) (initsegs xs)

Notice that we can generate a much more direct definition in terms of
take.

  initsegs l = [take n l | n <- [0..length l]]

One application of initsegs is to define scanl.
scanl can be seen as repeated application of foldl on each initial
segment of the given list. In other words

  scanl f a l = map (foldl f a) (initsegs l)

or

  scanl f a l = [foldl f a ll | ll <- initsegs l]

All permutations of a list
--------------------------

Our next task is to generate a function that lists out all
permutations of a list. An inductive definition would require
defining "permutations (x:xs)" in terms of "permutations xs". The
logical way to lift "permutations xs" to "permutations (x:xs)" is to
insert x in each possible position within each permutation of xs.

We begin with a function "interleave" which achieves the task of
inserting a value in every possible position of a list. Since it
returns one list for each position, its result is a list of lists.

  interleave :: a -> [a] -> [[a]]
  interleave x []     = [[x]]
  interleave x (y:ys) = [x:y:ys] ++ map (y:) (interleave x ys)

Alternatively, we have

  interleave x l = [(take n l) ++ [x] ++ (drop n l) | n <- [0..length l]]

Here n ranges up to length l, so that x is also inserted after the
last element of l.

Now, we can define

  permutations :: [a] -> [[a]]
  permutations []     = [[]]
  permutations (x:xs) = [ zs | ys <- permutations xs, zs <- interleave x ys ]

If we did not use list comprehension, we would have something like

  permutations (x:xs) = concat (map (interleave x) (permutations xs))

Notice the need for a concat to remove an extra level of list
brackets. Even with list comprehension, if we move the interleave to
the left hand side of the comprehension, we have to add a concat.

  permutations (x:xs) = concat [ interleave x ys | ys <- permutations xs]

Partitions of a list
--------------------

A collection of nonempty lists l1, l2,...,lk is said to be a
partition of the list l if l == l1 ++ l2 ++ ... ++ lk. For instance
[[1],[2,3],[4,5,6]] is a partition of [1..6].

We would like to write a Haskell function "partitions" that takes as
input a list l and returns all the partitions of l. Notice that each
partition is itself written as a list of lists. Thus, the function
partitions will return a list of (list of lists).
  partitions :: [a] -> [[[a]]]

Since we are interested only in nonempty partitions, the base case is
for the singleton list.

  partitions [x] = [[[x]]]

For the inductive case, we observe that if we have a set of
partitions for xs, the partitions of (x:xs) are either those in which
x is added to the first component of some partition of xs or those in
which [x] is added as an additional component to some partition of
xs. Thus

  partitions (x:xs) = [(x:head l):(tail l) | l <- partitions xs] ++
                      [[x]:l | l <- partitions xs]

======================================================================
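
The combinatorial functions of this section can be collected into one
loadable sketch. The definitions follow the text, with one variation
that is an assumption of this sketch: partitions takes the first
component apart with a pattern (p:ps) instead of head and tail. As in
the text, partitions is left undefined on the empty list.

```haskell
-- Insert x at every possible position of a list.
interleave :: a -> [a] -> [[a]]
interleave x []     = [[x]]
interleave x (y:ys) = [x:y:ys] ++ map (y:) (interleave x ys)

-- All permutations: insert x into each permutation of the tail.
permutations :: [a] -> [[a]]
permutations []     = [[]]
permutations (x:xs) = [ zs | ys <- permutations xs, zs <- interleave x ys ]

-- All ways of cutting a nonempty list into consecutive nonempty pieces.
-- No case is given for [], as in the text.
partitions :: [a] -> [[[a]]]
partitions [x]    = [[[x]]]
partitions (x:xs) =
     [ (x:p) : ps | (p:ps) <- partitions xs ]   -- x joins the first piece
  ++ [ [x] : l    | l <- partitions xs ]        -- x forms a piece of its own
```

For example, partitions [1,2,3] evaluates to

  [[[1,2,3]],[[1,2],[3]],[[1],[2,3]],[[1],[2],[3]]]

and concatenating any one of these partitions gives back [1,2,3].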