Introduction to Programming, Aug-Dec 2006

Lecture 17, Tuesday 31 Oct 2006

Priority queues
---------------

A priority queue is like a queue, except that elements that enter the
queue have priorities, and hence do not exit the queue in the order
that they entered.  (Think of VIPs waiting for darshan at Tirupati.)

Each item in a priority queue is thus a pair (p,v) where v is the
actual value and p is the priority.  For simplicity, we denote
priorities using integers.  Let us decide that priority p is higher
than p' if p is bigger than p'.  [Observe that everything we say
henceforth can be done equivalently by reversing this condition and
saying that p is higher priority than p' if p is smaller than p'.]

We then need to implement the following operations in a priority
queue:

  insert :: (PriorityQueue a) -> a -> (PriorityQueue a)
  delmax :: (PriorityQueue a) -> (a,(PriorityQueue a))

The first operation inserts an element into the queue while the second
removes an element with the highest priority value and returns it
together with the remaining queue.

As with sets, we can quickly run through various implementations of
priority queues and analyze the complexity of implementing the basic
operations.

1. Unsorted lists: If we maintain a priority queue as an unsorted
   list of pairs (p,v), insert takes time O(1) while delmax takes
   time O(n) for a queue with n elements.

2. Sorted lists: If we sort the list in 1 in descending order of
   priority values, insert takes O(n) time while delmax takes time
   O(1) because the maximum priority value is always at the head of
   the list.

3. Balanced search trees: Here, we take time O(log n) to insert a
   value.  The maximum value in a search tree is found by following
   the rightmost path to its end.  Since all paths are of length
   O(log n), finding the largest value takes time O(log n).  We can
   then delete it in time O(log n).

One difficulty with the balanced search tree approach is that, so far,
we have always assumed that we maintain search trees with at most one
copy of any value.
In a priority queue, many items may share the same priority value, so
we have to modify our definitions of search trees accordingly.

Note: From now on, we shall ignore the fact that elements in a
priority queue are pairs (p,v) of priorities and values and think of
them as single entities.  Effectively, we are only going to store and
manipulate the priorities.

Heaps (or Heap Trees)
---------------------

Constructing a balanced binary search tree from a priority queue is
essentially equivalent to sorting the elements in the queue, because
we can use our efficient inorder traversal to generate a sorted list
from a binary search tree.

Intuitively, it should not be necessary to order *all* elements in the
queue to extract the one with maximum priority value.  A weaker
ordering should suffice.  This is achieved by heap trees.

Heap trees are binary trees, just like search trees, except that the
inductive property relating values at various nodes is different.
Here is the data definition (in which the components of HNode are
arranged differently from those in a binary search tree to emphasize
that this tree has a different organization).

  data HTree a = HNil | HNode a (HTree a) (HTree a)

Functions that compare the values stored in a heap tree carry the
constraint (Ord a) in their types.

The "heap property" is the following:

  The value at a node is at least as large as the values at its two
  children.

A heap tree is one in which every node satisfies the heap property.  A
simple inductive argument establishes that the largest value in a heap
tree is found at the root.  However, we cannot say anything about the
relative order of the values in the two subtrees.  All of the
following are valid heap trees.

        6              6              6
       / \            / \            / \
      5   2          4   5          3   5
     / \   \        / \   \        / \   \
    3   4   1      2   3   1      1   2   4

Thus, in a heap tree, finding the largest element is straightforward:
it is always at the root!

[The heap condition we have defined corresponds to what are called
max-heap trees.  Dually, we can insist that every node be no larger
than its children and obtain a min-heap tree.]
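
As a quick check of the heap property, here is a minimal sketch of a
function that tests whether a tree is a max-heap.  The names isHeap,
rootOK, leaf and example1 are not part of the lecture and are
introduced only for illustration; the data type is repeated so that
the block stands alone, and equal values between a parent and a child
are assumed to be allowed.

```haskell
-- Heap trees as defined above, repeated so the example stands alone.
data HTree a = HNil | HNode a (HTree a) (HTree a)

-- isHeap (hypothetical name): True if every node is >= its children.
isHeap :: (Ord a) => HTree a -> Bool
isHeap HNil = True
isHeap (HNode x l r) = rootOK x l && rootOK x r && isHeap l && isHeap r
  where
    -- rootOK (hypothetical helper): the parent dominates the child's root.
    rootOK _ HNil          = True
    rootOK p (HNode c _ _) = p >= c

-- leaf (hypothetical helper): a tree with a single node.
leaf :: a -> HTree a
leaf v = HNode v HNil HNil

-- The first of the three example trees drawn above.
example1 :: HTree Int
example1 = HNode 6 (HNode 5 (leaf 3) (leaf 4)) (HNode 2 HNil (leaf 1))
```

Here isHeap example1 evaluates to True, while a tree such as
HNode 1 (leaf 2) HNil is rejected because the root is smaller than its
left child.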

How do we build a heap tree from a list of values?  We can begin by
using mkbtree, defined earlier, to construct a (size) balanced binary
tree from the list in linear time.  Thus, it is sufficient to describe
how to convert a (balanced) binary tree into a (balanced) heap tree.

Assume that we have a node x whose left and right subtrees, L rooted
at y and R rooted at z, are already heap trees.

        --x--
       /     \
      y       z
      |       |
     / \     / \
    / L \   / R \
    -----   -----

What do we need to do to ensure that the combined tree has the heap
property?  Clearly, if x >= max(y,z), there is no problem.  If this
does not hold, we exchange x with max(y,z).  This brings x either to
the root of L or the root of R.  The heap property now holds at the
top of this tree and also continues to hold in the subtree that was
not affected by the exchange.  In the subtree where x has moved, we
have to inductively repeat the same step.

In this process, a "light" value x will come down as far as it needs
to before settling in a stable position.  This is often called
sifting.  (A sieve that is used to clean flour lets smaller particles
of flour through and retains stones and other large particles above.)

Here is the function that we just described.

  sift :: (Ord a) => (HTree a) -> (HTree a)
  sift HNil = HNil
  sift (HNode x HNil HNil) = HNode x HNil HNil
  -- Only a left child: swap with it if it is larger.
  sift (HNode x (HNode y t1 t2) HNil)
    | x >= y    = HNode x (HNode y t1 t2) HNil
    | otherwise = HNode y (sift (HNode x t1 t2)) HNil
  -- Only a right child: swap with it if it is larger.
  sift (HNode x HNil (HNode z t3 t4))
    | x >= z    = HNode x HNil (HNode z t3 t4)
    | otherwise = HNode z HNil (sift (HNode x t3 t4))
  -- Two children: swap with the larger child if it exceeds x.
  sift (HNode x (HNode y t1 t2) (HNode z t3 t4))
    | x >= max y z = HNode x (HNode y t1 t2) (HNode z t3 t4)
    | y >= max x z = HNode y (sift (HNode x t1 t2)) (HNode z t3 t4)
    | otherwise    = HNode z (HNode y t1 t2) (sift (HNode x t3 t4))

How long does it take to sift a tree?  Observe that the value at the
root descends one level with each recursive call.  Thus, sift can call
itself at most O(log n) times if the original tree is balanced.
Moreover, sift does not alter the structure of the tree, only the
positions of the values, so the balance is retained.

We can now systematically heapify a balanced tree bottom up.

  heapify :: (Ord a) => (BTree a) -> (HTree a)
  heapify Nil = HNil
  heapify (Node t1 x t2) = sift (HNode x (heapify t1) (heapify t2))

How much time does heapify take?  In order to heapify a tree, we have
to first heapify its left and right subtrees and then sift the value
at the top to its correct position.  Assuming the original tree is
balanced, we have the recurrence

  T(N) = 2 T(N/2) + O(log N)
         --------   --------
        2*heapify     sift

If we work this out, which we won't [see Bird's book "Introduction to
Functional Programming using Haskell" for the calculation], it turns
out that T(N) = O(N).

Thus, we have the following O(N) construction of a heap tree from a
list of values:

         mkbtree                    heapify
  list -----------> balanced tree -----------> heap tree
          O(N)                       O(N)

Listing out a heap tree in sorted order
---------------------------------------

For balanced search trees, inorder traversal produces a sorted list of
values (in linear time).  We can also list out the values of a heap
tree in sorted order.  We know that the largest value is at the root,
so we put out the root value first.  Now, both the left and right
subtrees are heap trees.  If we list them out using the same process,
they will independently yield lists sorted in descending order.  We
can then merge these lists, always taking the larger head first, to
get a single sorted list.

  horder :: (Ord a) => (HTree a) -> [a]
  horder HNil = []
  horder (HNode x h1 h2) = x:(merge (horder h1) (horder h2))
    where
      -- Merge two descending lists into one descending list.
      merge :: (Ord a) => [a] -> [a] -> [a]
      merge l1 [] = l1
      merge [] l2 = l2
      merge (x:xs) (y:ys)
        | x >= y    = x:(merge xs (y:ys))
        | otherwise = y:(merge (x:xs) ys)

Note that the output of horder is in descending order.  Applying
reverse to the output will produce a list in ascending order in linear
time.

What is the complexity of horder?
Assuming the heap tree is size balanced, we have to inductively horder
both subtrees of size N/2 and merge them, so we have:

  T(N) = 2T(N/2) + O(N)

or T(N) = O(N log N).

Can we do better?  Observe that we can now sort an arbitrary list by
constructing a heap tree and listing it out using horder.  Any sorting
algorithm based on comparisons needs time proportional to N log N in
the worst case.  We can construct a heap in time O(N).  Thus, if we
could improve the complexity of horder below O(N log N), we would have
a comparison-based sorting algorithm that beats this lower bound,
which is impossible.

To complete this discussion, we formally define heapsort:

  heapsort l = horder (heapify (mkbtree l))

or, using the builtin operator "." for function composition:

  heapsort = horder . heapify . mkbtree

Like horder, heapsort produces its output in descending order.

Leftist heaps
-------------

We have seen how to construct a size balanced heap (we shall
henceforth write just "heap" for "heap tree") from a list in linear
time.  Recall that to implement a priority queue using a heap, we need
to implement the following operations:

  insert :: (PriorityQueue a) -> a -> (PriorityQueue a)
  delmax :: (PriorityQueue a) -> (a,(PriorityQueue a))

Thus, the one-shot heap construction procedure we have described is
not enough.  We need an efficient way to update heaps incrementally.

We shall describe a technique to combine two heaps of size M and N in
time O(log(M + N)).  This will solve both the problems above.

1. To insert a value x in heap h, we construct a trivial heap of one
   element containing x and use the union algorithm to combine this
   with h.

2. To delete the maximum value in a heap, we remove the root and then
   combine the left and right subtrees using the union algorithm.

Since union takes time O(log(M+N)), we can implement both the required
operations in logarithmic time.

To define our union operation, we need "leftist heaps".  A leftist
heap is one in which, at every node, the left subtree has at least as
many nodes as the right subtree.

Recall that the definition of a heap does not require any specific
order between the subtrees.
Thus, if we exchange the left and right subtrees of a heap, we still
have a heap.  We can use this fact to write a procedure to convert an
arbitrary heap into a leftist one, bottom up.

We first write a function that realigns the left and right subtrees,
according to size:

  realign :: (HTree a) -> (HTree a)
  realign HNil = HNil
  realign (HNode x h1 h2)
    | (size h1) < (size h2) = HNode x h2 h1
    | otherwise             = HNode x h1 h2
    where
      size :: (HTree a) -> Int
      size HNil = 0
      size (HNode x h1 h2) = 1 + (size h1) + (size h2)

Thus, realign just reorders the left and right subtrees if the leftist
property is violated at a node.

As usual, we can convert size into a constant time function by storing
the size of a heap as one of the values under HNode.  In other words,
we redefine HTree to be

  data HTree a = HNil | HNode Int a (HTree a) (HTree a)

However, for simplicity, we shall stick to the original definition in
the rest of this exposition.

We can now make an entire heap leftist using realign.

  mkleftist :: (HTree a) -> (HTree a)
  mkleftist HNil = HNil
  mkleftist (HNode x h1 h2) = realign (HNode x lh1 lh2)
    where
      lh1 = mkleftist h1
      lh2 = mkleftist h2

Let us call the rightmost path in a heap the "right spine".  The main
property of a leftist heap of size n is that the length of the right
spine, counted in nodes, is at most log (n+1), where logarithms are to
base 2.  This can be proved easily, by induction on the size of the
heap.  Let lrs(h) denote the length of the right spine of heap h.

  n = 0 : The heap is empty, so lrs(h) = 0 = log 1.

  n > 0 : Consider a heap h with root x and left and right subtrees
          h1 and h2 of size p and q, respectively, so that
          size h = 1 + p + q.  Then

          lrs(h)  = 1 + lrs(h2)          -- By definition of right spine
                 <= 1 + log (q + 1)      -- By induction hypothesis on h2
                  = log 2 + log (q + 1)  -- Arithmetic
                  = log (2q + 2)         -- Arithmetic
                 <= log (p + q + 2)      -- h is leftist, so p >= q
                  = log (size h + 1)     -- size h = 1 + p + q

Union of leftist heaps
----------------------

Let us look at the problem of combining two leftist heaps:

  Suppose that h =    x     and h' =    y
                     / \               / \
                   h1   h2           h3   h4

Clearly, the larger of x and y should become the root of the combined
heap.  Suppose x is larger.  We then merge h' with the right subheap
h2, using realign to preserve the leftist nature of the heap.
Symmetrically, if y is larger, we inductively merge h with h4.

Here is the definition of union.

  union :: (Ord a) => (HTree a) -> (HTree a) -> (HTree a)
  union h HNil = h
  union HNil h = h
  union (HNode x h1 h2) (HNode y h3 h4)
    | x < y     = realign (HNode y h3 (union (HNode x h1 h2) h4))
    | otherwise = realign (HNode x h1 (union h2 (HNode y h3 h4)))

Each step of union makes one move down the right spine of either h or
h'.  Since we have already seen that lrs(h) <= log (size h + 1) for
leftist heaps, it follows that a complete evaluation of union makes at
most log(size h + 1) + log(size h' + 1) recursive calls.  Each call
does a constant amount of work, provided size is stored at every node
so that realign takes constant time.

For any positive integer m and constant k, log m^k = k log m, so
log m^k = O(log m).  Since mn <= max(m,n)^2 and
max(m,n) <= m+n <= 2 max(m,n), it follows that

  O(log m + log n) = O(log mn) = O(log max(m,n)) = O(log (m+n))

Hence, O(log(size h)) + O(log(size h')) = O(log(size h + size h')),
which is the bound we want for union.

======================================================================
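
The two priority queue operations described under "Leftist heaps" can
be sketched on top of union as follows.  This is only an illustration
under stated assumptions, not the lecture's own code: it uses the
variant in which the size is stored at each node, the names LHeap,
LNil, LNode, node and drain are hypothetical, and delmax returns a
Maybe (an assumption made here to handle the empty queue, which the
lecture's type signature does not address).

```haskell
-- Leftist max-heap with the size cached at each node.
-- LHeap, LNil and LNode are hypothetical names for this sketch.
data LHeap a = LNil | LNode Int a (LHeap a) (LHeap a)

-- Constant time, because the size is stored in the node.
size :: LHeap a -> Int
size LNil            = 0
size (LNode s _ _ _) = s

-- node (hypothetical helper): builds a node, records its size and puts
-- the larger subtree on the left, playing the role of realign.
node :: a -> LHeap a -> LHeap a -> LHeap a
node x h1 h2
  | size h1 < size h2 = LNode s x h2 h1
  | otherwise         = LNode s x h1 h2
  where s = 1 + size h1 + size h2

-- The larger root wins; the other heap is merged down the right spine.
union :: (Ord a) => LHeap a -> LHeap a -> LHeap a
union h LNil = h
union LNil h = h
union h@(LNode _ x h1 h2) h'@(LNode _ y h3 h4)
  | x < y     = node y h3 (union h h4)
  | otherwise = node x h1 (union h2 h')

-- Insert: union with a one-element heap.
insert :: (Ord a) => LHeap a -> a -> LHeap a
insert h x = union h (LNode 1 x LNil LNil)

-- Delete the maximum: remove the root and combine the two subtrees.
-- Returning Maybe for the empty queue is an assumption of this sketch.
delmax :: (Ord a) => LHeap a -> Maybe (a, LHeap a)
delmax LNil              = Nothing
delmax (LNode _ x h1 h2) = Just (x, union h1 h2)

-- drain (hypothetical helper): repeatedly apply delmax, so the values
-- come out in descending order.
drain :: (Ord a) => LHeap a -> [a]
drain h = case delmax h of
  Nothing      -> []
  Just (x, h') -> x : drain h'
```

For example, drain (foldl insert LNil [3,1,4,1,5,9,2,6]) evaluates to
[9,6,5,4,3,2,1,1].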