Introduction to Programming, Aug-Dec 2008
Lecture 17, Wednesday 22 Oct 2008

The abstract datatype Set
-------------------------

A set is a collection of elements without repetitions and without any
prespecified order.  We associate two collections of operations with
sets.

Dictionary operations:

  empty   :: Set a
  isempty :: Set a -> Bool
  member  :: Set a -> a -> Bool
  insert  :: Set a -> a -> Set a
  delete  :: Set a -> a -> Set a

Set operations:

  union     :: Set a -> Set a -> Set a
  intersect :: Set a -> Set a -> Set a
  setdiff   :: Set a -> Set a -> Set a

Possible implementations:

1.  The first possibility is to represent a set as a list, possibly
    with elements repeated.  We use the definition:

      data Set a = Setof [a]

    With this definition, the dictionary operations can be defined as
    follows.

      empty = Setof []
      isempty (Setof l) = l == []
      member (Setof l) x = elem x l
      insert (Setof l) x = Setof (x:l)
      delete (Setof l) x = Setof (filter (/= x) l)

    or, alternatively,

      delete (Setof l) x = Setof [y | y <- l, y /= x]

    If the size of the set is n, the size of the representation, N,
    may be much larger than n because of repetitions.

    Complexity of dictionary operations:

      empty   : O(1)
      isempty : O(1)
      member  : O(N)
      insert  : O(1)
      delete  : O(N)

    Other set operations:

      union (Setof xs) (Setof ys) = Setof (xs++ys)
      intersect (Setof xs) (Setof ys) = Setof [y | y <- ys, elem y xs]
      setdiff (Setof xs) (Setof ys) = Setof [x | x <- xs, not (elem x ys)]

    Let M and N be the sizes of the representations Setof xs and
    Setof ys, respectively.  Then union takes time proportional to
    M+N, while intersect and setdiff take time proportional to MN.

2.  To reduce the size of the list representation to the size of the
    set being represented, we can inductively maintain the set as a
    list without repetitions.  The data definition remains the same:

      data Set a = Setof [a]

    Under the assumption that the list has no repetitions, the
    dictionary operations can be defined as follows.
      empty = Setof []
      isempty (Setof l) = l == []
      member (Setof l) x = elem x l
      insert (Setof l) x = Setof (x:(filter (/= x) l))
      delete (Setof l) x = Setof (filter (/= x) l)

    The only change is in the definition of insert, where we ensure
    that all existing copies of x are removed before we put in the
    new value.

    Now, if the size of the set is n, the complexity of the
    dictionary operations is given by:

      empty   : O(1)
      isempty : O(1)
      member  : O(n)
      insert  : O(n)
      delete  : O(n)

    For the other set operations, intersect and setdiff can be
    retained as in the earlier representation.  When computing the
    union, we have to remove duplicates.  One way is to
    systematically insert all the values from the second set into the
    first set, making use of the fact that insert handles the problem
    of filtering out duplicates.

      union (Setof xs) (Setof []) = Setof xs
      union (Setof xs) (Setof (y:ys)) =
        union (insert (Setof xs) y) (Setof ys)

    With this definition, the union of two sets of sizes m and n
    takes time O(mn), as do intersect and setdiff.

3.  If the values being stored in the set can be compared, we can
    maintain the set as a sorted list without repetitions.

      data (Ord a) => Set a = Setof [a]

      empty = Setof []
      isempty (Setof l) = l == []
      member (Setof l) x = elem x l
      insert (Setof l) x = Setof (listinsert x l)
      delete (Setof l) x = Setof (filter (/= x) l)

    Here, the insert function on sets calls the insert function on
    sorted lists that we have seen in the definition of insertion
    sort (renamed listinsert here, to avoid a clash with the insert
    function on sets).  We do not gain anything in terms of
    complexity for the dictionary operations.  We still have:

      empty   : O(1)
      isempty : O(1)
      member  : O(n)
      insert  : O(n)
      delete  : O(n)

    However, we can now use the fact that we can merge two sorted
    lists in linear time to write more efficient implementations of
    the other set operations.
      union (Setof xs) (Setof ys) = Setof (unionmerge xs ys)
        where
          unionmerge [] ys = ys
          unionmerge xs [] = xs
          unionmerge (x:xs) (y:ys)
            | x < y     = x:(unionmerge xs (y:ys))
            | y < x     = y:(unionmerge (x:xs) ys)
            | otherwise = x:(unionmerge xs ys)

      intersect (Setof xs) (Setof ys) = Setof (intersectmerge xs ys)
        where
          intersectmerge [] ys = []
          intersectmerge xs [] = []
          intersectmerge (x:xs) (y:ys)
            | x < y     = intersectmerge xs (y:ys)
            | y < x     = intersectmerge (x:xs) ys
            | otherwise = x:(intersectmerge xs ys)

      setdiff (Setof xs) (Setof ys) = Setof (setdiffmerge xs ys)
        where
          setdiffmerge [] ys = []
          setdiffmerge xs [] = xs
          setdiffmerge (x:xs) (y:ys)
            | x < y     = x:(setdiffmerge xs (y:ys))
            | y < x     = setdiffmerge (x:xs) ys
            | otherwise = setdiffmerge xs ys

    Merging two lists of lengths m and n takes time O(m+n), so each
    of union, intersect and setdiff now takes time O(m+n).

4.  If the elements of the set can be compared with each other, we
    can move from a sorted list representation to a balanced search
    tree representation.

      data (Ord a) => Set a = Setof (STree a)

    Recall that to maintain balanced search trees, we store the
    height of a node along with the value, so we have

      data (Ord a) => STree a = Nil | Node Int (STree a) a (STree a)

    In this representation, it is clear that the dictionary
    operations member, insert and delete take time O(log n) for a
    set of size n, thus beating the other representations by a long
    way.

    How about union, intersect and setdiff?  The naive
    implementation of union would be to insert each element of the
    second set into the first, or vice versa, yielding an algorithm
    with complexity O(min(n log m, m log n)).  To implement this, we
    can first flatten one of the sets into a list using an inorder,
    preorder or postorder traversal and then systematically run
    through the elements of this list.

    Can we do better?  We know that we can perform unionmerge,
    intersectmerge and setdiffmerge on two sorted lists in linear
    time.
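The three merge functions from the sorted-list representation can be
checked independently of the Set wrapper.  The following
self-contained sketch uses the same names and clauses as in the
notes, lifted to top level:

```haskell
-- Merge-style set operations on sorted, duplicate-free lists,
-- following the definitions in the notes.

unionmerge :: (Ord a) => [a] -> [a] -> [a]
unionmerge [] ys = ys
unionmerge xs [] = xs
unionmerge (x:xs) (y:ys)
  | x < y     = x : unionmerge xs (y:ys)
  | y < x     = y : unionmerge (x:xs) ys
  | otherwise = x : unionmerge xs ys      -- x == y: keep one copy

intersectmerge :: (Ord a) => [a] -> [a] -> [a]
intersectmerge [] _ = []
intersectmerge _ [] = []
intersectmerge (x:xs) (y:ys)
  | x < y     = intersectmerge xs (y:ys)
  | y < x     = intersectmerge (x:xs) ys
  | otherwise = x : intersectmerge xs ys  -- keep common elements

setdiffmerge :: (Ord a) => [a] -> [a] -> [a]
setdiffmerge [] _ = []
setdiffmerge xs [] = xs
setdiffmerge (x:xs) (y:ys)
  | x < y     = x : setdiffmerge xs (y:ys)  -- x is not in the second list
  | y < x     = setdiffmerge (x:xs) ys
  | otherwise = setdiffmerge xs ys          -- drop common elements
```

Each call consumes at least one element of one of the two input
lists, so a call on lists of lengths m and n makes at most m+n
recursive steps.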
    We can obtain a sorted list from a (balanced) search tree using
    inorder.  We can also write a function mkbtree to construct a
    balanced search tree from a sorted list.  If inorder and mkbtree
    work in linear time, we can implement union, intersect and
    setdiff in linear time in the balanced search tree
    representation.

    Let us start by analyzing inorder:

      inorder :: (STree a) -> [a]
      inorder Nil = []
      inorder (Node m t1 x t2) = (inorder t1)++[x]++(inorder t2)

    If t is balanced, the left and right subtrees t1 and t2 are half
    the size of the main tree.  Let T(n) denote the time required to
    generate the inorder traversal of a balanced tree with n nodes.
    The time taken to combine the two recursive inorder traversals is
    proportional to the length of (inorder t1), because ++ takes time
    proportional to the length of its left argument.  Thus, we have

      T(n) = 2 T(n/2) + O(n)

    We have seen this recurrence before (e.g., mergesort) and its
    solution is T(n) = O(n log n), which is larger than the O(n)
    solution that we seek.

    The problem arises because Haskell lists are built up right to
    left, so it is inefficient to write a function that builds a list
    left to right.  An example of this is the naive reverse function
    that we have seen earlier, which takes quadratic time.

      reverse [] = []
      reverse (x:xs) = (reverse xs) ++ [x]

    As with reverse, the key to making inorder more efficient is to
    use an auxiliary parameter.  Let us define a new function

      inorderaux :: (STree a) -> [a] -> [a]

    such that inorderaux t l yields (inorder t)++l.  We can then
    recover inorder t as inorderaux t [].  Here is a definition of
    inorderaux.

      inorderaux Nil l = l
      inorderaux (Node m t1 x t2) l =
        inorderaux t1 (x:(inorderaux t2 l))

    In other words, we compute the inorder traversal of Node m t1 x t2
    from right to left, first placing inorder t2 to the left of l,
    then adding x to the left of the resulting list and finally
    placing inorder t1 at the leftmost position.
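A small sanity check that inorderaux agrees with inorder, under the
STree definition above (the example tree, storing 1..3 with heights
in the Int field, is ours):

```haskell
data STree a = Nil | Node Int (STree a) a (STree a)

-- direct traversal: on balanced trees this is O(n log n), because
-- each use of ++ copies the left traversal
inorder :: STree a -> [a]
inorder Nil = []
inorder (Node _ t1 x t2) = inorder t1 ++ [x] ++ inorder t2

-- accumulator version: inorderaux t l == inorder t ++ l, computed
-- in O(n) because each node does O(1) work (a single cons)
inorderaux :: STree a -> [a] -> [a]
inorderaux Nil l = l
inorderaux (Node _ t1 x t2) l = inorderaux t1 (x : inorderaux t2 l)

-- a small balanced search tree holding 1, 2, 3
example :: STree Int
example = Node 2 (Node 1 Nil 1 Nil) 2 (Node 1 Nil 3 Nil)
```

Evaluating inorderaux example [] should produce [1,2,3], the same
list as inorder example; the accumulator argument simply ends up as
the tail of the result.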
    The complexity of inorderaux is given by:

      T(n) = 2 T(n/2) + O(1)

    The crucial feature is that we need only constant time to combine
    the two recursive computations of size n/2.  For this recurrence,
    the solution is T(n) = O(n), which is what we seek.

    The next function that we have to implement in linear time is
    mkbtree, which constructs a balanced search tree from a sorted
    list.  We need not insist that the input list be sorted.  We
    shall write mkbtree in such a way that the output of mkbtree is a
    balanced tree and

      inorder (mkbtree l) == l

    If l is sorted, this ensures that mkbtree generates a search
    tree.

    The naive way to make a balanced tree from a list is to use the
    centre element of the list as the root and recursively construct
    balanced left and right subtrees from the first and second halves
    of the list (for readability, we omit the height field of Node
    below):

      mkbtree :: [a] -> (STree a)
      mkbtree [] = Nil
      mkbtree [x] = Node Nil x Nil
      mkbtree l = Node (mkbtree left) root (mkbtree right)
        where
          m = (length l) `div` 2
          root = l!!m
          left = take m l
          right = drop (m+1) l

    The complexity of mkbtree is given by the following recurrence:

      T(n) = 2 T(n/2) + O(n)

    The O(n) factor comes in because it takes linear time to compute
    the midpoint of the input list and break it up into two halves.
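The naive mkbtree can be made concrete as follows.  This sketch fills
in the height field that the notes leave out, using a hypothetical
smart constructor node (our addition, not in the notes) that computes
the height from the subtrees; with it, the separate base case for
one-element lists is no longer needed:

```haskell
data STree a = Nil | Node Int (STree a) a (STree a)

height :: STree a -> Int
height Nil = 0
height (Node h _ _ _) = h

-- hypothetical smart constructor: fills in the height field
node :: STree a -> a -> STree a -> STree a
node t1 x t2 = Node (1 + max (height t1) (height t2)) t1 x t2

-- centre element becomes the root; the two halves become subtrees
mkbtree :: [a] -> STree a
mkbtree [] = Nil
mkbtree l  = node (mkbtree left) root (mkbtree right)
  where
    m     = length l `div` 2
    root  = l !! m
    left  = take m l
    right = drop (m+1) l

inorder :: STree a -> [a]
inorder Nil = []
inorder (Node _ t1 x t2) = inorder t1 ++ [x] ++ inorder t2
```

By construction, inorder (mkbtree l) returns l, and the two subtrees
at every node differ in size by at most one, so the tree is balanced.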