Introduction to Programming, Aug-Dec 2008
Lecture 18, Wednesday 29 Oct 2008

The abstract datatype Set
-------------------------

The naive way to make a balanced tree from a list is to use the
centre element of the list as the root and recursively construct
balanced left and right subtrees from the first and second halves of
the list:

  mkbtree :: [a] -> (STree a)
  mkbtree [] = Nil
  mkbtree [x] = Node Nil x Nil
  mkbtree l = Node (mkbtree left) root (mkbtree right)
              where
                m = (length l) `div` 2
                root = l!!m
                left = take m l
                right = drop (m+1) l

The complexity of mkbtree is given by the following recurrence:

  T(n) = 2 T(n/2) + O(n)

The O(n) term appears because it takes linear time to compute the
midpoint of the input list and break it up into two halves. This
recurrence works out to T(n) = O(n log n).

To do better, we need a more sophisticated version of the trick we
used to improve the efficiency of inorder. This time, rather than
constructing a list as output, our input is a list. For optimum
efficiency, we need to process the input from left to right. To
achieve this, we write a function

  mkbtreeaux :: [a] -> Int -> (STree a, [a])

whose behaviour is as follows:

  mkbtreeaux l n = (t,lrest)
    where t is a balanced tree made up from the first n elements of l
    and lrest is the unused part of l (i.e., drop n l)

Here is how we define mkbtreeaux:

  mkbtreeaux [] n = (Nil, [])
  mkbtreeaux l 0 = (Nil, l)
  mkbtreeaux l n = (Node t1 root t2, l2)
                   where
                     m = n `div` 2
                     (t1,(root:rest)) = mkbtreeaux l m
                     (t2,l2) = mkbtreeaux rest (n - (m+1))

As before, we observe that to construct a balanced tree from the
first n elements of l, we need to select the midpoint of the n
elements as the root and inductively make balanced left and right
subtrees from the left and right halves. However, instead of
explicitly finding the midpoint and breaking up the list into two
parts, we build up the tree from left to right using the same
function. Pictorially, we have

  l
  |----------------------------------------------------------|
  <---------------------- n ---------------------->|<- l2 -->|

  <------ m ------>|<----------------- rest ---------------->|
     becomes t1     root|<-- n-(m+1) -->|<------- l2 ------->|
                           becomes t2

For this function, the time complexity is given by

  T(n) = 2 T(n/2) + O(1)

which yields T(n) = O(n), as required. Of course, we define mkbtree
using mkbtreeaux as

  mkbtree l = fst (mkbtreeaux l (length l))

Thus, we can now implement union, intersect and setdiff for the
balanced search tree representation of sets in linear time as a
sequence of three operations:

  1. inorder           => generates sorted lists from the sets in linear time
  2. appropriate merge => combines the sorted lists in linear time
  3. mkbtree           => reconstructs a balanced search tree in linear time

Priority queues
---------------

A priority queue is like a queue, except that elements that enter the
queue have priorities, and hence do not exit the queue in the order
that they entered. (Think of VIPs waiting for darshan at Tirupati.)
Each item in a priority queue is thus a pair (p,v), where v is the
actual value and p is the priority. For simplicity, we denote
priorities using integers. Let us decide that priority p is higher
than p' if p is bigger than p'. [Observe that everything we say
henceforth can be done equivalently by reversing this condition and
saying that p is of higher priority than p' if p is smaller than p'.]
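As a small illustration of this convention, here is a sketch in
Haskell (the names Priority, Item and higherPriority are ours, for
illustration only, and are not used in what follows):

  type Priority = Int

  -- An item in a priority queue: a (priority, value) pair.
  type Item v = (Priority, v)

  -- Our convention: a bigger number means a higher priority, so
  -- (10, "VIP") exits the queue before (2, "ordinary").
  higherPriority :: Item v -> Item v -> Bool
  higherPriority (p, _) (p', _) = p > p'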
We then need to implement the following operations on a priority
queue:

  insert :: (PriorityQueue a) -> a -> (PriorityQueue a)
  delmax :: (PriorityQueue a) -> (a,(PriorityQueue a))

The first operation inserts an element into the queue, while the
second removes an element with the highest priority value.

As with sets, we can quickly run through various implementations of
priority queues and analyze the complexity of implementing the basic
operations.

  1. Unsorted lists: If we maintain a priority queue as an unsorted
     list of pairs (p,v), insert takes time O(1) while delmax takes
     time O(n) for a queue with n elements.

  2. Sorted lists: If we sort the list in 1 in descending order of
     priority values, insert takes O(n) time while delmax takes time
     O(1), because the maximum priority value is always at the head
     of the list.

  3. Balanced search trees: Here, we take time O(log n) to insert a
     value. The maximum value in a search tree is found by following
     the rightmost path from the root down to a leaf. Since all paths
     are of length O(log n), finding the largest value takes time
     O(log n). We can then delete it in time O(log n).

One difficulty with the balanced search tree approach is that, so
far, we have always assumed that we maintain search trees with at
most one copy of any value. In a priority queue, many items may
share the same priority value, so we would have to modify our
definition of search trees accordingly.

Note: From now on, we shall ignore the fact that elements in a
priority queue are pairs (p,v) of priorities and values and think of
them as single entities. Effectively, we are only going to store and
manipulate the priorities.

Heaps (or Heap Trees)
---------------------

Constructing a balanced binary search tree from a priority queue is
essentially equivalent to sorting the elements in the queue, because
we can use our efficient inorder traversal to generate a sorted list
from a binary search tree. Intuitively, it should not be necessary
to order *all* the elements in the queue just to extract the one
with the maximum priority value. A weaker ordering should suffice.
This is achieved by heap trees.

Heap trees are binary trees, just like search trees, except that the
inductive property relating the values at various nodes is
different. Here is the data definition (in which the components of
HNode are arranged differently from those in a binary search tree,
to emphasize that this tree has a different organization):

  data (Ord a) => HTree a = HNil | HNode a (HTree a) (HTree a)

The "heap property" is the following: the value at a node is at
least as large as the values at its two children. A heap tree is one
in which every node satisfies the heap property.

A simple inductive argument establishes that the largest value in a
heap tree is found at the root. However, we cannot say anything
about the relative order of the values in the two subtrees. All of
the following are valid heap trees.

        6            6            6
       / \          / \          / \
      5   2        4   5        3   5
     / \   \      / \   \      / \   \
    3   4   1    2   3   1    1   2   4

Thus, in a heap tree, finding the largest element is straightforward
--- it is always at the root!

[The heap condition we have defined corresponds to what are called
max-heap trees. Dually, we can insist that every node be no larger
than its children and obtain a min-heap tree.]

How do we build a heap tree from a list of values? We can begin by
using mkbtree, defined earlier, to construct a (size) balanced
binary tree from the list in linear time. Thus, it is sufficient to
describe how to convert a (balanced) binary tree into a (balanced)
heap tree.
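Before describing the conversion, here is a sketch that states the
heap property as executable code and encodes the leftmost example
tree above. The names example and isHeap are ours, for testing only;
the HTree type is repeated (without the context) so the fragment is
self-contained:

  data HTree a = HNil | HNode a (HTree a) (HTree a)

  -- The leftmost of the three example heap trees above.
  example :: HTree Int
  example = HNode 6 (HNode 5 (HNode 3 HNil HNil) (HNode 4 HNil HNil))
                    (HNode 2 HNil (HNode 1 HNil HNil))

  -- Check the heap property at every node.
  isHeap :: (Ord a) => HTree a -> Bool
  isHeap HNil = True
  isHeap (HNode x t1 t2) =
      atMost t1 && atMost t2 && isHeap t1 && isHeap t2
    where
      atMost HNil = True
      atMost (HNode y _ _) = x >= y

Here, isHeap example evaluates to True, and isHeap remains a handy
check on the conversion we are about to define.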
Assume that we have a node x whose left and right subtrees, L rooted
at y and R rooted at z, are already heap trees.

         --x--
        /     \
       y       z
       |       |
      / \     / \
     / L \   / R \
     -----   -----

What do we need to do to ensure that the combined tree has the heap
property? Clearly, if x >= max(y,z), there is no problem. If this
does not hold, we exchange x with max(y,z). This brings x either to
the root of L or to the root of R. The heap property now holds at
the top of this tree and also continues to hold in the subtree that
was not affected by the exchange. In the subtree into which x has
moved, we have to inductively repeat the same step.

In this process, a "light" value x will come down as far as it needs
to before settling in a stable position. This is often called
sifting. (A sieve that is used to clean flour lets the smaller
particles of flour through and retains stones and other large
particles above.)

Here is the function that we just described.

  sift :: (Ord a) => (HTree a) -> (HTree a)
  sift HNil = HNil
  sift (HNode x HNil HNil) = HNode x HNil HNil
  sift (HNode x (HNode y t1 t2) HNil)
    | x >= y    = HNode x (HNode y t1 t2) HNil
    | otherwise = HNode y (sift (HNode x t1 t2)) HNil
  sift (HNode x HNil (HNode z t3 t4))
    | x >= z    = HNode x HNil (HNode z t3 t4)
    | otherwise = HNode z HNil (sift (HNode x t3 t4))
  sift (HNode x (HNode y t1 t2) (HNode z t3 t4))
    | x >= max y z = HNode x (HNode y t1 t2) (HNode z t3 t4)
    | y >= max x z = HNode y (sift (HNode x t1 t2)) (HNode z t3 t4)
    | z >= max x y = HNode z (HNode y t1 t2) (sift (HNode x t3 t4))

How long does it take to sift a tree? Observe that the value at the
root descends one level with each recursive call to sift. Thus, sift
can call itself at most log n times if the original tree is
balanced. Moreover, sift does not alter the structure of the tree,
only the positions of the values, so the balance is retained.

We can now systematically heapify a balanced tree bottom up.

  heapify :: (Ord a) => (STree a) -> (HTree a)
  heapify Nil = HNil
  heapify (Node t1 x t2) = sift (HNode x (heapify t1) (heapify t2))

How much time does heapify take? In order to heapify a tree, we have
to first heapify its left and right subtrees and then sift the value
at the top down to its correct position. Assuming the original tree
is balanced, we have the recurrence

  T(N) = 2 T(N/2) + O(log N)
         --------   --------
         2*heapify    sift

If we work this out, which we won't [see Bird's book "Introduction
to Functional Programming using Haskell" for the calculation], it
turns out that T(N) = O(N). Thus, we have the following O(N)
construction of a heap tree from a list of values:

          mkbtree                  heapify
  list -----------> balanced tree -----------> heap tree
           O(N)                      O(N)
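Putting the two phases together, here is a minimal sketch (assuming
the STree and HTree types and the functions mkbtree and heapify
defined above; the name mkheap is ours):

  -- Build a heap tree from a list in O(N) overall: O(N) to build
  -- the balanced tree, followed by O(N) to heapify it.
  mkheap :: (Ord a) => [a] -> HTree a
  mkheap = heapify . mkbtree

For example, mkheap [3,1,4,1,5,9,2,6] yields a balanced heap tree
with the maximum element 9 at the root.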