Introduction to Programming, Aug-Dec 2008
Lecture 13, Monday 06 Oct 2008

Trees in Haskell
----------------

A tree is a structure in which each node has multiple successors,
called children.  There is a special node, called the root, from which
the tree begins.  The root is usually drawn as the topmost node in the
tree.  Every node other than the root has a unique parent (the node of
which this is a child), and hence a unique path back to the root
following parent links.

The following are not trees because some nodes have multiple parents.
In particular, the structure on the left also has two "root" nodes,
4 and 8.

    4   8          3
   / \ /          /|\
  2   5          2 7 5
 / \ / \        /  |/ \
1   3   6      1   4   6

Here are some examples of trees:

    4            3
   / \          /|\
  2   5        2 7 5
 / \   \      /   / \
1   3   6    1   4   6

In the trees we have drawn, a value is stored at each node.  As in
lists, these values have a uniform type --- Int, in the examples
above.

A bottom level node with no children is called a leaf node.  Non-leaf
nodes are called internal nodes.  Internal nodes in a tree need not
have a uniform number of children.  For instance, the node with value
5 in the left tree has only one child while the node with value 2 has
two children.

The order of the children is important.  In the tree on the left, each
node has up to two children and the two children are oriented as left
and right.  Thus, 2 is the left child of the root 4 and 3 is the right
child of 2.  Notice that though 5 has only one child, 6, this is a
right child, not a left child.  The tree on the right has up to three
children per node.

We will typically look at binary trees, in which each node has up to
two children.  Here is how we describe a binary tree over an arbitrary
type a.

data BTree a = Nil | Node (BTree a) a (BTree a)

We could organize the constructor Node in other ways --- for instance

data BTree a = Nil | Node a (BTree a) (BTree a)

or

data BTree a = Nil | Node (BTree a) (BTree a) a

Our choice of putting "a" between the two instances of (BTree a) is
helpful to visualize the structure of the tree.

Size vs height in trees
-----------------------

Lists have a linear structure, so there is only one measure of size
for a list, the length of the list.  Trees are two dimensional, so we
consider two quantities:

  a) size   : the number of nodes in the tree
  b) height : the length of the longest path from the root to a leaf

In general, we cannot fix a relationship between the height of a tree
and its size.  For instance, a tree could be highly skewed, and have
its height equal to its size, as follows.

          6
         /
        5
       /
      4
     /
    3
   /
  2
 /
1

We define a binary tree to be perfect if, at every node, the size of
the left and right subtrees are equal.  Here is a perfect binary tree.

    4
   / \
  2   6
 / \ / \
1  3 5  7

We can assign a level to each node in a binary tree --- the root is at
level 0, the children of the root are at level 1, ..., children of
nodes at level i are at level i+1, ...  In a perfect binary tree, it
is easy to observe that there are 2^i nodes at level i.  Thus, if a
perfect binary tree has height h, then it has nodes at levels
0,1,...,h-1, and thus the size is 2^0 + 2^1 + ... + 2^{h-1} = 2^h - 1.
This shows that in a perfect binary tree, the size is exponential with
respect to the height.  Conversely, a perfect binary tree with n nodes
has height log n.

A more generous notion is that of a balanced tree --- instead of
requiring each node to have equal sized left and right subtrees, we
say that a tree is balanced if the sizes of the left and right
subtrees at each node differ by at most one.
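To make these quantities concrete, here is a small sketch in Haskell,
using the BTree type defined above.  The names size, height and
isbalanced are ours, for illustration only; they do not appear later
in these notes.

-- A hedged sketch, assuming the BTree type above.

-- Number of nodes in the tree.
size :: BTree a -> Int
size Nil            = 0
size (Node tl _ tr) = 1 + size tl + size tr

-- Number of nodes on the longest path from the root down to a leaf.
height :: BTree a -> Int
height Nil            = 0
height (Node tl _ tr) = 1 + max (height tl) (height tr)

-- A tree is balanced if, at every node, the sizes of the left and
-- right subtrees differ by at most one.
isbalanced :: BTree a -> Bool
isbalanced Nil            = True
isbalanced (Node tl _ tr) = abs (size tl - size tr) <= 1
                            && isbalanced tl && isbalanced tr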
Here is a balanced, but not perfect, binary tree.

    4
   / \
  2   6
 /   / \
1   5   7

It is not difficult to show that the exponential gap between height
and size holds for balanced trees as well.

Binary search trees
-------------------

An important use of binary trees is to store values that we may want
to look up later.  For instance, a binary search tree could be used to
store a dictionary of words.

A binary search tree satisfies the following property at every node v:
all values in the subtree rooted at v that are smaller than the value
stored at v lie in the left subtree of v and all values in the subtree
rooted at v that are larger than the value stored at v lie in the
right subtree of v.

To emphasize that the values in the tree can be ordered, we elaborate
slightly on the Haskell definition of binary trees to describe search
trees.

data (Ord a) => STree a = Nil | Node (STree a) a (STree a)

Observe that the structure of an STree is identical to that of a
normal BTree, but there is a type class dependence, similar to the
ones we have seen for polymorphic functions such as mergesort and
quicksort.

Here are two examples of search trees over the values [1,2,3,4,5,6].

    4            3
   / \          / \
  2   5        2   5
 / \   \      /   / \
1   3   6    1   4   6

Both trees look reasonably well "balanced".  This is not always the
case.  For instance, here is a highly unbalanced search tree over the
same set of values.

          6
         /
        5
       /
      4
     /
    3
   /
  2
 /
1

To find a value in a binary search tree, we start at the root.  At
each node, if we have not already found the value we are looking for,
we can use the search tree property to decide whether to search in the
right subtree or the left subtree.  We keep walking down the tree in
this fashion till we find the value we seek or we reach a leaf node
from where we cannot descend further.  Thus, each lookup in a binary
search tree traverses, in the worst case, a single path from the root
to a leaf node.

How much time does it take to look up a value in a search tree with n
nodes?  Recall that a tree is balanced if at each node the size of the
left subtree differs from the size of the right subtree by at most 1.
Initially, we search for the value in the entire tree, with n nodes.
If we do not find the value at the root, we search either the left or
the right subtree.  Since the tree is balanced, the number of nodes in
each of these subtrees is at most n/2.  In this way, we successively
search trees of size n, n/2, n/4, ... till we reach a leaf node, a
subtree of size 1.  The length of this sequence is clearly bounded by
log n.  Another way of stating this is that the height of a balanced
search tree with n nodes is log n.

Here is a Haskell definition of the search procedure we just
described:

findtree :: (Ord a) => (STree a) -> a -> Bool
findtree Nil x = False
findtree (Node tleft y tright) x
  | x == y    = True
  | x < y     = findtree tleft x
  | otherwise = findtree tright x

Observe that a search tree does not contain duplicate values.

Exercise: What does inorder traversal of a search tree yield?  (A
sketch of inorder traversal appears below, just before the discussion
of insertion.)

Search trees are not static objects.  In general, we have to insert
new values into search trees and remove stale values from search
trees.  There are two aspects to keep in mind while defining these
operations:

  a) We have to ensure that the updated tree remains a search tree
  b) We have to preserve the balanced structure of the tree

For the moment we concentrate on the first aspect.  We will tackle the
problem of maintaining the balanced structure later.
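Before turning to insertion and deletion, here is a small sketch of an
inorder traversal (relevant to the exercise above) together with a
sample lookup.  The names inorder and sample are ours, for
illustration only, and assume the STree type and findtree defined
above.

-- A hedged sketch, assuming the STree type and findtree from above.

-- Visit the left subtree, then the value at the node, then the right
-- subtree.
inorder :: (Ord a) => STree a -> [a]
inorder Nil                   = []
inorder (Node tleft y tright) = inorder tleft ++ [y] ++ inorder tright

-- The left example search tree above, written out using the
-- constructors directly.
sample :: STree Int
sample = Node (Node (Node Nil 1 Nil) 2 (Node Nil 3 Nil))
              4
              (Node Nil 5 (Node Nil 6 Nil))

-- findtree sample 3 evaluates to True and findtree sample 7 to False;
-- evaluating inorder sample is one way to check the exercise.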
Where should we insert a value into a search tree?  From the
definition of a search tree, there is only one possibility.  Search
for the value in the tree.  If it already exists, there is nothing to
be done.  Otherwise, we reach a leaf node.  This is the same path that
we would have to follow to find the new value after it has been
inserted.  So, insert the new value as a left or right child of the
leaf node where the unsuccessful search terminates.

inserttree :: (Ord a) => (STree a) -> a -> (STree a)
inserttree Nil x = Node Nil x Nil
inserttree (Node tleft y tright) x
  | x == y    = Node tleft y tright
  | x < y     = Node (inserttree tleft x) y tright
  | otherwise = Node tleft y (inserttree tright x)

Clearly, the maximum number of steps required to insert a value into a
search tree is equal to the length of the longest path in the tree.
Thus, if the search tree is balanced and has n nodes, inserttree takes
time log n, but will not in general preserve the balanced structure of
the tree.

How do we delete a value from a tree?  First, we have to agree on what
happens if the value to be deleted does not occur in the tree.  One
approach is to declare this an error.  It is easier, however, to
interpret "delete x from t" as "delete x from t if the value exists in
t", so if we try to delete x and it is not found in t, the tree t is
unchanged.

Suppose we want to delete a value x from a tree whose root is y.  If
x < y, we inductively delete x from the left subtree of y.  Similarly,
if x > y, we inductively delete x from the right subtree of y.  So,
the interesting case is when x == y.

      y == x
     /      \
    w        z
   / \      / \
  t1  t2  t3   t4

If we remove y, we have a "hole" at the root of this tree.  It is
tempting to move either w (or z) into this place and recursively
delete w from the left subtree (or z from the right subtree).
However, this would not preserve the structure of the tree --- for
instance, if we move w up to the root, values in the tree t2, which
are bigger than w, will end up to the left of w.

The correct solution is to move the largest value from the left
subtree of y or the smallest value from the right subtree of y in
place of y.

The smallest value in a search tree can be found easily, by following
the leftmost path in the tree.  Removing this value from a tree is
also a relatively easy operation.  Here is a function that removes the
minimum value from a nonempty tree, returning both the value and the
modified tree, after deletion.

deletemin :: (Ord a) => (STree a) -> (a, STree a)
deletemin (Node Nil y t2) = (y, t2)
deletemin (Node t1 y t2)  = (z, Node tz y t2)
  where (z, tz) = deletemin t1

We can now write deletetree as follows:

deletetree :: (Ord a) => (STree a) -> a -> (STree a)
deletetree Nil x = Nil
deletetree (Node tleft y tright) x
  | x < y = Node (deletetree tleft x) y tright
  | x > y = Node tleft y (deletetree tright x)
-- In all cases below, we must have x == y
deletetree (Node tleft y Nil) x = tleft
deletetree (Node tleft y tright) x = Node tleft z tz
  where (z, tz) = deletemin tright

Exercise: Define the function deletemax and change the definition of
deletetree in the "interesting case" to move the largest value from
the left subtree in place of the node being deleted.
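To see inserttree and deletetree in action, here is a small hedged
sketch.  The names buildtree and example are ours, for illustration
only, and assume the definitions above.

-- A hedged sketch, assuming inserttree and deletetree as defined
-- above.

-- Build a search tree by inserting the values of a list one by one
-- into an initially empty tree.
buildtree :: (Ord a) => [a] -> STree a
buildtree = foldl inserttree Nil

-- Inserting 4,2,5,1,3,6 in this order yields the left search tree
-- drawn earlier, with 4 at the root.
example :: STree Int
example = buildtree [4,2,5,1,3,6]

-- deletetree example 4 removes the root; following the rule above,
-- the smallest value in the right subtree, namely 5, moves into its
-- place.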