Introduction to Programming, Aug-Dec 2008
Lecture 18, Wednesday 29 Oct 2008

The abstract datatype Set
-------------------------

The naive way to make a balanced tree from a list is to use the
centre element of the list as the root and recursively construct
balanced left and right subtrees from the first and second halves of
the list:

  mkbtree :: [a] -> (STree a)
  mkbtree [] = Nil
  mkbtree [x] = Node Nil x Nil
  mkbtree l = Node (mkbtree left) root (mkbtree right)
              where
                m = (length l) `div` 2
                root = l!!m
                left = take m l
                right = drop (m+1) l

The complexity of mkbtree is given by the following recurrence:

  T(n) = 2 T(n/2) + O(n)

The O(n) term appears because it takes linear time to compute the
midpoint of the input list and break it up into two halves. This
recurrence works out to T(n) = O(n log n).

To do better, we need a more sophisticated version of the trick we
used to improve the efficiency of inorder. This time, rather than
constructing a list as output, our input is a list. For optimum
efficiency, we need to process the input from left to right. To
achieve this, we write a function

  mkbtreeaux :: [a] -> Int -> (STree a, [a])

whose behaviour is as follows:

  mkbtreeaux l n = (t,lrest)
    where t is a balanced tree made up from the first n elements of l
    and lrest is the unused part of l (i.e., drop n l)

Here is how we define mkbtreeaux:

  mkbtreeaux [] n = (Nil, [])
  mkbtreeaux l 0 = (Nil, l)
  mkbtreeaux l n = (Node t1 root t2, l2)
                   where
                     m = n `div` 2
                     (t1,(root:rest)) = mkbtreeaux l m
                     (t2,l2) = mkbtreeaux rest (n - (m+1))

As before, we observe that to construct a balanced tree from the
first n elements of l, we need to select the midpoint of the n
elements as the root and inductively make balanced left and right
subtrees from the left and right halves. However, instead of
explicitly finding the midpoint and breaking up the list into two
parts, we build up the tree from left to right using the same
function. Pictorially, we have

  l
  |----------------------------------------------------------|
  <---------------------- n ---------------------->|<- l2 -->|

  <------ m ------>|<----------------- rest ---------------->|
     becomes t1     root|<-- n-(m+1) -->|<------- l2 ------->|
                           becomes t2

For this function, the time complexity is given by

  T(n) = 2 T(n/2) + O(1)

which yields T(n) = O(n), as required. Of course, we define mkbtree
using mkbtreeaux as

  mkbtree l = fst (mkbtreeaux l (length l))

Thus, we can now implement union, intersect and setdiff for the
balanced search tree representation of sets in linear time as a
sequence of three operations:

  1. inorder           => generates sorted lists from the sets in linear time
  2. appropriate merge => combines the sorted lists in linear time
  3. mkbtree           => reconstructs a balanced search tree in linear time

Priority queues
---------------

A priority queue is like a queue, except that elements that enter the
queue have priorities, and hence do not exit the queue in the order
that they entered. (Think of VIPs waiting for darshan at Tirupati.)
Each item in a priority queue is thus a pair (p,v), where v is the
actual value and p is the priority. For simplicity, we denote
priorities using integers. Let us decide that priority p is higher
than p' if p is bigger than p'. [Observe that everything we say
henceforth can be done equivalently by reversing this condition and
saying that p is of higher priority than p' if p is smaller than p'.]
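As a small illustration of this convention, here is a sketch in
Haskell (the names Priority, Item and higherPriority are ours, for
illustration only, and are not used in what follows):

  type Priority = Int

  -- An item in a priority queue: a (priority, value) pair.
  type Item v = (Priority, v)

  -- Our convention: a bigger number means a higher priority, so
  -- (10, "VIP") exits the queue before (2, "ordinary").
  higherPriority :: Item v -> Item v -> Bool
  higherPriority (p, _) (p', _) = p > p'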
We then need to implement the following operations on a priority
queue:

  insert :: (PriorityQueue a) -> a -> (PriorityQueue a)
  delmax :: (PriorityQueue a) -> (a,(PriorityQueue a))

The first operation inserts an element into the queue, while the
second removes an element with the highest priority value.

As with sets, we can quickly run through various implementations of
priority queues and analyze the complexity of implementing the basic
operations.

  1. Unsorted lists: If we maintain a priority queue as an unsorted
     list of pairs (p,v), insert takes time O(1) while delmax takes
     time O(n) for a queue with n elements.

  2. Sorted lists: If we sort the list in 1 in descending order of
     priority values, insert takes O(n) time while delmax takes time
     O(1), because the maximum priority value is always at the head
     of the list.

  3. Balanced search trees: Here, we take time O(log n) to insert a
     value. The maximum value in a search tree is found by following
     the rightmost path from the root down to a leaf. Since all paths
     are of length O(log n), finding the largest value takes time
     O(log n). We can then delete it in time O(log n).

One difficulty with the balanced search tree approach is that, so
far, we have always assumed that we maintain search trees with at
most one copy of any value. In a priority queue, many items may
share the same priority value, so we would have to modify our
definition of search trees accordingly.

Note: From now on, we shall ignore the fact that elements in a
priority queue are pairs (p,v) of priorities and values and think of
them as single entities. Effectively, we are only going to store and
manipulate the priorities.

Heaps (or Heap Trees)
---------------------

Constructing a balanced binary search tree from a priority queue is
essentially equivalent to sorting the elements in the queue, because
we can use our efficient inorder traversal to generate a sorted list
from a binary search tree. Intuitively, it should not be necessary
to order *all* the elements in the queue just to extract the one
with the maximum priority value. A weaker ordering should suffice.
This is achieved by heap trees.

Heap trees are binary trees, just like search trees, except that the
inductive property relating the values at various nodes is
different. Here is the data definition (in which the components of
HNode are arranged differently from those in a binary search tree,
to emphasize that this tree has a different organization):

  data (Ord a) => HTree a = HNil | HNode a (HTree a) (HTree a)

The "heap property" is the following: the value at a node is at
least as large as the values at its two children. A heap tree is one
in which every node satisfies the heap property.

A simple inductive argument establishes that the largest value in a
heap tree is found at the root. However, we cannot say anything
about the relative order of the values in the two subtrees. All of
the following are valid heap trees.

        6            6            6
       / \          / \          / \
      5   2        4   5        3   5
     / \   \      / \   \      / \   \
    3   4   1    2   3   1    1   2   4

Thus, in a heap tree, finding the largest element is straightforward
--- it is always at the root!

[The heap condition we have defined corresponds to what are called
max-heap trees. Dually, we can insist that every node be no larger
than its children and obtain a min-heap tree.]

How do we build a heap tree from a list of values? We can begin by
using mkbtree, defined earlier, to construct a (size) balanced
binary tree from the list in linear time. Thus, it is sufficient to
describe how to convert a (balanced) binary tree into a (balanced)
heap tree.
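Before describing the conversion, here is a sketch that states the
heap property as executable code and encodes the leftmost example
tree above. The names example and isHeap are ours, for testing only;
the HTree type is repeated (without the context) so the fragment is
self-contained:

  data HTree a = HNil | HNode a (HTree a) (HTree a)

  -- The leftmost of the three example heap trees above.
  example :: HTree Int
  example = HNode 6 (HNode 5 (HNode 3 HNil HNil) (HNode 4 HNil HNil))
                    (HNode 2 HNil (HNode 1 HNil HNil))

  -- Check the heap property at every node.
  isHeap :: (Ord a) => HTree a -> Bool
  isHeap HNil = True
  isHeap (HNode x t1 t2) =
      atMost t1 && atMost t2 && isHeap t1 && isHeap t2
    where
      atMost HNil = True
      atMost (HNode y _ _) = x >= y

Here, isHeap example evaluates to True, and isHeap remains a handy
check on the conversion we are about to define.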
Assume that we have a node x whose left and right subtrees, L rooted
at y and R rooted at z, are already heap trees.

         --x--
        /     \
       y       z
       |       |
      / \     / \
     / L \   / R \
     -----   -----

What do we need to do to ensure that the combined tree has the heap
property? Clearly, if x >= max(y,z), there is no problem. If this
does not hold, we exchange x with max(y,z). This brings x either to
the root of L or to the root of R. The heap property now holds at
the top of this tree and also continues to hold in the subtree that
was not affected by the exchange. In the subtree into which x has
moved, we have to inductively repeat the same step.

In this process, a "light" value x will come down as far as it needs
to before settling in a stable position. This is often called
sifting. (A sieve that is used to clean flour lets the smaller
particles of flour through and retains stones and other large
particles above.)

Here is the function that we just described.

  sift :: (Ord a) => (HTree a) -> (HTree a)
  sift HNil = HNil
  sift (HNode x HNil HNil) = HNode x HNil HNil
  sift (HNode x (HNode y t1 t2) HNil)
    | x >= y    = HNode x (HNode y t1 t2) HNil
    | otherwise = HNode y (sift (HNode x t1 t2)) HNil
  sift (HNode x HNil (HNode z t3 t4))
    | x >= z    = HNode x HNil (HNode z t3 t4)
    | otherwise = HNode z HNil (sift (HNode x t3 t4))
  sift (HNode x (HNode y t1 t2) (HNode z t3 t4))
    | x >= max y z = HNode x (HNode y t1 t2) (HNode z t3 t4)
    | y >= max x z = HNode y (sift (HNode x t1 t2)) (HNode z t3 t4)
    | z >= max x y = HNode z (HNode y t1 t2) (sift (HNode x t3 t4))

How long does it take to sift a tree? Observe that the value at the
root descends one level with each recursive call to sift. Thus, sift
can call itself at most log n times if the original tree is
balanced. Moreover, sift does not alter the structure of the tree,
only the positions of the values, so the balance is retained.

We can now systematically heapify a balanced tree bottom up.

  heapify :: (Ord a) => (STree a) -> (HTree a)
  heapify Nil = HNil
  heapify (Node t1 x t2) = sift (HNode x (heapify t1) (heapify t2))

How much time does heapify take? In order to heapify a tree, we have
to first heapify its left and right subtrees and then sift the value
at the top down to its correct position. Assuming the original tree
is balanced, we have the recurrence

  T(N) = 2 T(N/2) + O(log N)
         --------   --------
         2*heapify    sift

If we work this out, which we won't [see Bird's book "Introduction
to Functional Programming using Haskell" for the calculation], it
turns out that T(N) = O(N). Thus, we have the following O(N)
construction of a heap tree from a list of values:

          mkbtree                  heapify
  list -----------> balanced tree -----------> heap tree
           O(N)                      O(N)
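Putting the two phases together, here is a minimal sketch (assuming
the STree and HTree types and the functions mkbtree and heapify
defined above; the name mkheap is ours):

  -- Build a heap tree from a list in O(N) overall: O(N) to build
  -- the balanced tree, followed by O(N) to heapify it.
  mkheap :: (Ord a) => [a] -> HTree a
  mkheap = heapify . mkbtree

For example, mkheap [3,1,4,1,5,9,2,6] yields a balanced heap tree
with the maximum element 9 at the root.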