

Binary Trees
-----------

Binary trees necessarily have two children at each node

>  data Tree a = Nil| Node (Tree a) a (Tree a)



Parse Tree
----------

3+((4*7)+8) ->      +
                   / \
                  3   +
                     / \          The expression relating to this tree is  
                    *   8          unique
                   / \
                  4   7





Reconstructing binary trees from tree traversals
------------------------------------------------

Recall our definition of (polymorphic) binary trees in Haskell:

>  data Tree a = Nil | Node (Tree a) a (Tree a)

We have seen that such trees can unambiguously represent the
structure of arithmetic expressions.  Moreover, we have described
three canonical recursive procedures to list out all the values
in a tree: inorder, preorder and postorder traversals.

In general, a single tree traversal does not uniquely define the
structure of the tree.  For both the following trees, an inorder
traversal yields [1,2,3,4,5,6].

        4                     3
       / \                   / \
      2   5                 2   5
     / \   \               /   / \
    1   3   6             1   4   6

The same ambiguity is present for preorder and postorder
traversals.  The preorder traversal for the first tree above is
[4,2,1,3,5,6].  Here is a different tree with the same preorder
traversal.

        4
       / \
      2   1
         / \
        3   6
         \
          5

Similarly, we can easily construct another tree whose postorder
traversal [1,3,2,6,5,4] matches that of the first tree above.

Can we unambiguosly reconstruct a tree with preorder traversal
[4,2,1,3,5,6] if we fix the inorder traversal to be
[1,2,3,4,5,6]?  Here is how we would do it, by example, on the
tree above.

  Inorder  : [1,2,3,4,5,6]
  Preorder : [4,2,1,3,5,6]

From the preorder traversal, we know that 4 is at the root.  The
rest of the preorder traversal breaks up as two segments,
corresponding to the preorder traversals of the left and the
right subtrees.  From the position of 4 in the inorder traversal,
we know that [1,2,3] is the inorder traversal of the left subtree
and [5,6] is the inorder traversal of the right subtree.  Since
the left subtree has three nodes, we can split the tail of the
preorder traversal after three values.  Thus, we have identified
the root node and the subset of nodes in the left and right
subtrees and recursively broken up the reconstruction problem as
follows:

                         4
                       /   \
          Left subtree       Right subtree
      Inorder : [1,2,3]      Inorder : [5,6]
      Preorder: [2,1,3]      Preorder: [5,6]

This suggests the following Haskell program:

>  reconstruct :: (Eq a) => [a] -> [a] -> (Btree a)
>  -- First argument is inorder traversal, second is preorder traversal

>  reconstruct [] [] = Nil
>  reconstruct [x] [y] |(x==y) = Node Nil x Nil

>  reconstruct (x:xs) (y:ys)   = Node (reconstruct leftin leftpre) y
>                                   (reconstruct rightin rightpre)
>    where
>    leftsize = length (takeWhile (/= y) (x:xs))
>    leftin   = take leftsize (x:xs)
>    rightin  = drop (leftsize+1) (x:xs)
>    leftpre  = take leftsize ys
>    rightpre = drop leftsize ys
>   


In the definition above, "takewhile p l" is the builtin function
that returns the longest prefix of l all of whose elements
satisfy the condition p.  We can write our own definition of
takewhile as follows:

  takewhile :: [a->Bool] -> [a] -> [a]
  takewhile p [] = []
  takewhile p (x:xs)
    | p x       = x:(takewhile p xs)
    | otherwise = []


****Observe that our reconstruction procedure implicitly assumes that
all values in the tree have distinct values.****



Exercise:  Write a Haskell function to reconstruct a binary tree
from its inorder and postorder traversals.

Solution:

>data Btree a = Nil|Node (Btree a) a (Btree a)
>	      deriving(Eq,Show)

>reconstruct::(Eq a)=>[a]->[a]->(Btree a)
>reconstruct [] [] = Nil
>reconstruct [x] [y] |(x==y) = (Node Nil x Nil)
>reconstruct (x:xs) (y) = Node (reconstruct leftin leftpre) (last y) (reconstruct rightin rightpre)
>    where
>    leftsize = length leftin 
>    leftin   = takeWhile (/=(last y)) (x:xs)
>    rightin  = drop (leftsize+1) (x:xs)
>    leftpre  = take leftsize (init y)
>    rightpre = drop leftsize (init y)



Is is possible to reconstruct a binary tree uniquely from its
preorder and postorder traversals?  The following example shows
that this cannot be done in general:

       1   and   1    both have preorder  : [1,2]
      /           \             postorder : [2,1]
     2             2


However, if we impose additional structure on binary trees---for
instance, no node can have a right child without having a left
child---preorder and postorder traversals together uniquely fix
the shape of a tree.   Here is how we could do it, by example

  Preorder  : [4,2,1,3,5,6]
  Postorder : [1,3,2,6,5,4]

4 is clearly the root.  From the preorder traversal we know that
2 is the root of the left subtree and from the postorder
traversal we know that 5 is the root of the right subtree.  This
information is sufficient to recursively breakup the problem as
follows:

                        4
                      /   \
       Preorder : [2,1,3]  Preorder : [5,6]
       Postorder: [1,3,2]  Postorder: [6,5]


Exercise: Write a Haskell function to reconstruct a binary tree
from its preorder and postorder traversals with the restriction
that no node can have a right child without having a left child.

Solution :
>data Btree a = Nil|Node (Btree a) a (Btree a)
>	      deriving(Eq,Show)

>reconstruct::(Eq a)=>[a]->[a]->(Btree a)
>reconstruct [] [] = Nil
>reconstruct [x] [y] |(x==y) = (Node Nil x Nil)
>reconstruct (x:xs) (y) = Node (reconstruct leftpre leftpost) x (reconstruct rightpre rightpost)
>    where
>    leftsize = length leftpre 
>    leftpre     = takeWhile (/=(last (init y))) (xs)
>    rightpre    = drop leftsize xs
>    leftpost    = take leftsize y
>    rightpost   = drop leftsize (init y)



Binary search trees
-------------------

Another important use of binary trees is to store values that we
may want to look up later.  For instance, a binary search tree
could be used to store a dictionary of words.  A binary search
tree satisfies the following property at every node v: all values
in the subtree rooted at v that are smaller than the value stored
at v lie in the left subtree of v and all values in the subtree
rooted at v that are larger than the value stored at v lie in the
right subtree of v.  To emphasize that the values in the tree can
be ordered, we write a elaborate slightly on the Haskell
definition of binary trees to describe search trees.

  data (Ord a) => STree a = Nil | Node (STree a) a (STree a)

Observe that the structure of an STree is identical to that of a
normal Tree, but there is a type class dependence, similar to the
ones we have seen for polymorphic functions such as mergesort and
quicksort.

Here are two examples of search trees over the values
[1,2,3,4,5,6].

        4                     3
       / \                   / \
      2   5                 2   5
     / \   \               /   / \
    1   3   6             1   4   6

Both trees look reasonably well "balanced".  This is not always
the case.  For instance, here is a highly unbalanced search tree
over the same set of values.

                   6
                  /
                 5
                /
               4
              /
             3
            /
           2
          /
         1

To find a value in a binary search tree, we start at the root.
At each node, if we have not already found the value we are
looking for, we can use the search tree property to decide
whether to search in the right subtree or the left subtree.  We
keep walking down the tree in this fashion till we find the value
we seek or we reach a leaf node from where we cannot descend
further.  Thus, each lookup in a binary search tree traverses, in
the worst case, a single path from the root to a leaf node.

How much time does it take to look up a value in a search tree
with n nodes?  Let us say that a tree is balanced if at each node
the size of the left subtree differs from the size of the right
subtree by at most 1.  Initially, we search for the value in the
entire tree, with n nodes.  If we do not find the value at the
root, we search either the left or the right subtree.  Since the
tree is balanced, the number of nodes in each of these subtrees
is at most n/2.  In this way, we successively search trees of
size n, n/2, n/4, ... till we reach a leaf node, a subtree of
size 1.  The length of this sequence is clearly bounded by log n.
Another way of stating this is that the height of a balanced
search tree with n nodes is log n.

Here is a Haskell definition of the search procedure we just
described:

>  findtree :: (Stree a) -> a -> Bool
>  findtree Nil x = False
>  findtree (Node tleft y tright) x 
>    | x == y    = True
>    | x < y     = findtree tleft x
>    | otherwise = findtree tright x

Observe that we a search tree does not contain duplicate values.

Exercise:  What does inorder traversal of a search tree yield?

Soln: A sorted list



-------------------------------------------------------------


Search trees are not static objects.  In general, we have to
insert new values into search trees and remove stale values from
search trees.  There are two aspects to keep in mind while
defining these operations:

  a) We have to ensure that the updated tree remains a search
     tree 

  b) We have to preserve the balanced structure of the tree

For the moment we concentrate on the first aspect.  We will
tackle the problem of maintaining the balanced structure later.

----------------------------------------------------------------

Preserving order in a search tree
---------------------------------

Where should we insert a value into a search tree?  From the
definition of a search tree, there is only one possibility.
Search for the value in the tree.  If it already exists, there is
nothing to be done.  Otherwise, we reach a leaf node.  This is
the same path that we would have to follow to find the new value
after it has been inserted.  So, insert the new value as a left
or right child of the leaf node where the unsuccessful search
terminates.

  inserttree :: (Stree a) -> a -> (Stree a)
  inserttree Nil x = Node Nil x Nil
  inserttree (Node tleft y tright) x 
    | x == y    = Node tleft y tright
    | x < y     = Node (inserttree tleft x) y tright
    | otherwise = Node tleft y (inserttree tright x)

Clearly, the maximum number of steps required to insert a value
into a search tree is equal to the length of the longest path in
the tree.  Thus, if the search tree is balanced and has n nodes,
inserttree takes time log n, but will not in general preserve the
balanced structure of the tree.

How about deleting a value from a tree?  Suppose we find the
value x that we want to delete at some intermediate point in the
tree.

         y == x
        / \
   tleft   tright

We then have to "fuse" tleft and tright into a single tree.  If
tleft is empty, we can just "promote" tright to take the place of
the tree rooted at y.  Symmetrically, we can promote tleft if
tright is empty.  Thus, the nontrivial case is when both tleft
and tright are not empty.

         y == x
       /   \
      w     z
     / \   / \
    t1 t2 t3 t4

One obvious way to deal with this situation is to shift w up in
place of y.  This leaves us with the problem of fusing t1 and t2,
which corresponds to recursively deleting w from the left subtree
of y.

Here is a Haskell definition of the procedure that we have just
described. 

  deletetree :: (Stree a) -> a -> (Stree a)
  deletetree Nil x = Nil
  deletetree (Node tleft y tright) x
    | x < y   = Node (deletetree tleft x) y tright
    | x > y   = Node tleft y (deletetree tright x)

  -- In all cases below, we must have x == y

  deletetree (Node tleft y Nil) x             = tleft
  deletetree (Node Nil y tright) x            = tright
  deletetree (Node (Node t1 w t2) y tright) x = 
                Node (deletetree (Node t1 w t2) w) w tright
      





Search trees (continued)
------------------------

The deletetree function that we have defined is invoked
recursively log n times if the value we want to delete lies at
the root of the tree.  Recall that the interesting case was the
following, where the value to be deleted is y and both the left
and right subtrees of y are not empty.

         y == x
       /   \
      w     z
     / \   / \
    t1 t2 t3 t4

A better solution is to move the largest value from the left
subtree of y in place of y.  The largest value in a search tree
can be found easily, by following the rightmost path in the tree.
Removing this value from a tree is also a relatively easy
operation.  Here is a function that removes the maximum value
from a nonempty tree, returning both the value and the modified
tree, after deletion.

  deletemax :: (STree a) -> (a,STree a)
  deletemax (Node t1 y Nil) = (y,t1)
  deletemax (Node t1 y t2) = (z, Node t1 y tz)
    where (z,tz) = deletemax t2

We can now rewrite deletetree as follows:

  deletetree :: (Stree a) -> a -> (Stree a)
  deletetree Nil x = Nil
  deletetree (Node tleft y tright) x
    | x < y   = Node (deletetree tleft x) y tright
    | x > y   = Node tleft y (deletetree tright x)

  -- In all cases below, we must have x == y

  deletetree (Node Nil y tright) x   = tright
  deletetree (Node tleft y tright) x = Node tz z tright
    where (z,tz) = deletemax tleft

Exercise: Define the function deletemin and change the definition
of deletetree in the "interesting case" to move the smallest
value from the right subtree in place of the node being deleted.



Program can be found in ~/haskell/trial/BStree.hs


Balanced binary trees
---------------------

Having described how to preserve the inductive structure of a
search tree after insert and delete operations, it remains to be
seen how we can maintain the balanced nature of the tree.

We have defined a balanced tree as one in which, at each node,
the sizes of the left and right subtrees differ by at most one.
This definition of balance is difficult to maintain without
incurring a heavy rebalancing cost.  A less stringent definition
is to ask that the heights of the left and right subtrees differ
by at most one.  Such trees are called height-balanced trees or
AVL trees, after Adelson-Velskii and Landis, the first people to
propose the use of such balanced trees.

It is easy to see that a size-balanced tree is always
height-balanced.  The converse is not true in general.  For
instance, the following tree is height-balanced but not
size-balanced.

         4
        / \
       2   5
      /
     1

To use height-balanced trees in place of size-balanced trees, we
have to ensure that the height of a height-balanced tree is
logarithmic in the number of nodes in a tree.  To establish this,
we turn the problem around and consider the following: for a
fixed height h, what is the size S(h) of the smallest
height-balanced tree with height h?

for given h S(h) represents the most skewed tree possible which is height 
balanced

Starting with h=1, we can compute S(h) and draw the corresponding
tree T(h), ignoring the actual values:

  h     S(h)                 T(h)

  1      1                     *

  2      2                     *
                              /
                             *

  3      4                     *
                              / \
                             *   *
                            /
                           *

  ...   ...                   ...

  k     S(k-1)+S(k-2)+1        *
                             /   \
                         T(k-1)  T(k-2)

Thus, the "worst" tree of height k is inductively constructing by
fusing together the worst trees of height k-1 and k-2.  The
recurrence for S(h), given by

  S(1) = 1
  S(2) = 2
  S(k) = S(k-1) + S(k-2) + 1, k > 2

is very similar to that for the Fibonacci numbers

  F(1) = 1
  F(2) = 1
  F(k) = F(k-1) + F(k-2), k > 2

In the case of the Fibonacci numbers, we can show that F(k) is
exponentially large with respect to k.  Analogously, we can show
that S(k) grows exponentially with respect to k.  This means that
even the "most skewed" height-balanced tree of height k has a
size that is exponential in k.  In other words, the "most skewed"
height-balanced tree with n nodes has height logarithmic in n.


Let us assume that we have written a function rebalance that
reestablished the height-balanced property after an insert or
delete.

Here is how we would modify our functions inserttree and
deletetree.

  inserttree :: (Stree a) -> a -> (Stree a)
  inserttree Nil x = Node Nil x Nil
  inserttree (Node tleft y tright) x
    | x == y    = Node tleft y tright
    | x < y     = rebalance (Node (inserttree tleft x) y tright)
    | otherwise = rebalance (Node tleft y (inserttree tright x))


  deletetree :: (Stree a) -> a -> (Stree a)
  deletetree Nil x = Nil
  deletetree (Node tleft y tright) x
    | x < y   = rebalance (Node (deletetree tleft x)) y tright)
    | x > y   = rebalance (Node tleft y (deletetree tright x))

  -- In all cases below, we must have x == y

  deletetree (Node Nil y tright) x   = tright
  deletetree (Node tleft y tright) x = rebalance (Node tz z tright)
    where (z,tz) = deletemax tleft

Let us define the slope of a node to be the difference in heights
between the left and right subtrees of the node.  In a
height-balanced tree, the slope of each node is in the range
{-1,0,+1}.  Inserting or deleting a node can at most change the
slope by 1.  In the modified functions inserttree and deletetree,
we apply rebalancing bottom up, starting from the lowest node
where we potentially disrupt the balance.  Thus, when we need
to rebalance a node, we can assume that its slope is either 2 or
-2 and its subtrees are inductively height-balanced.

Suppose a node has slope 2 (the case -2 will be symmetric) with
both its subtrees height-balanced.  Then, the height of the left
subtree is two more than that of the right subtree.  We shall
consider two cases.

a) The slope at the root of the left subtree is 0 or 1.

   This situation corresponds to the following picture, where t[h]
   denotes that t has height h

                        x
                       / \
                      y   t3[h]
                    /   \
                t1[h+1]  t2[h or h+1]

   We can rotate this tree to the right as follows


                        y
                      /   \
                t1[h+1]    x
                         /   \
               t2[h or h+1]  t3[h]

   It is easy to observe that the slope at y is 0 or -1 and the
   slope at x is either 1 or 0, so the rotated tree is
   height-balanced (recall that t1, t2 and t3 were inductively
   assumed to be height-balanced).

   We have to verify that the resulting tree remains a search
   tree.  To this, it is sufficient to verify that the inorder
   traversals of the two trees match.  It is easy to verify that
   the inorder traversal for both trees is given by

     (inorder t1) ++ [y] ++ (inorder t2) ++ [x] ++ (inorder t3)


b) The slope at the root of the left subtree is -1.

   This situation corresponds to the following picture, where t[h]
   denotes that t has height h

                        x
                       / \
                      y   t3[h]
                    /   \
                t1[h]  t2[h+1]

   where t2 looks like

                        z
                      /   \
         t21[h or h-1]     t22[h or h-1]

   with at least one of t21 or t22 being of height h.

  
  Hence the tree looks like
                       
                           x
                          / \
                         y   t3[h]
                        / \
                    t1[h]  z
                          / \
             t21[h or h-1]   t22[h or h-1]
  
                          

   We can now rotate the subtree rooted at y to the left, as
   follows.

                        x
                      /   \
                    z      t3[h]
                  /   \
                y      t22[h or h-1]
               / \
          t1[h]   t21[h or h-1]

   First, we verify that in both cases, the inorder traversal of
   the left subtree of x is given by

    (inorder t1) ++ [y] ++ (inorder t21) ++ [z] ++ (inorder t22)

   so this left rotation preserves the search tree property.

   Notice that at this stage, the slope at z may be 2 (because it
   is possible that t1[h], t21[h] and t22[h-1]).

   However, we now apply a right rotation at x to get

                 _____________z____________
                /                          \
              y                              x
             / \                            / \
        t1[h]   t21[h or h-1]  t22[h or h-1]   t3[h]

   At this point, clearly the slope of x is -1 or 0, the slope of
   y is 1 or 0 and the slope of z is 0.  Further, the inorder
   traversal of this tree yields

     (inorder t1)++[y]++(inorder t21)++[z]++
                          (inorder t22)++[x]++(inorder t3)

   which is the same as that of the original tree, before the two
   rotations. 

We can now write the function rebalance:

   rebalance :: (Stree a) -> (Stree a)

   rebalance (Node t1 y t2)
     | abs (sy) < 2         = Node t1 y t2
     | sy == 2 && st1 /= -1 = rotateright (Node t1 y t2)
     | sy == 2 && st1 == -1 = rotateright (Node (rotateleft t1) y t2)
     | sy == -2 && st2 /= 1 = rotateleft (Node t1 y t2)
     | sy == -2 && st2 == 1 = rotateleft (Node t1 y (rotateright t2))
     where
       sy  = slope (Node t1 y t2)
       st1 = slope t1
       st2 = slope t2

       rotateright (Node (Node t1 y t2) x t3) = Node t1 y (Node t2 x t3)
       rotateleft  (Node t1 x (Node t2 y t3)) = Node (Node t1 y t2) x t3

How do we compute the slope?  A naive implementation would be as
follows

  slope :: (Stree a) -> Int
  slope Nil = 0
  slope (Node t1 x t2) = (height t1) - (height t2)

  height :: (Stree a) -> Int
  height Nil = 0
  height (Node t1 x t2) = 1 + (max (height t1) (height t2))

Clearly, computing height requires examining each node in the
tree, so computing the slope of a tree is proportional to n,
rather than log n.

The solution is to augment the information stored at each node to
inductively maintain the height of a node.
     
  data (Ord a) => STree a = Nil | Node Int (STree a) a (STree a)
  
where a node of the form (Node m t1 x t2) records that the value
at this node is x and the subtree rooted at this node has height
m.

With this additional information, we have

  height :: (Stree a) -> Int
  height Nil = 0
  height (Node m t1 x t2) = m

Thus, computing the slope at any node takes constant time. 

It remains to modify all the functions we have written so
far---insert, delete, rebalance---to correctly compute the height
value at a node after each update.  This is left as an exercise.



