Introduction to Programming, Aug-Dec 2006

Lecture 12, Thursday 21 Sep 2006

Reconstructing binary trees from tree traversals
------------------------------------------------

In general, a single tree traversal does not uniquely define the
structure of the tree. For example, as we have seen, for both the
following trees, an inorder traversal yields [1,2,3,4,5,6].

        4                 3
       / \               / \
      2   5             2   5
     / \   \           /   / \
    1   3   6         1   4   6

The same ambiguity is present for preorder and postorder traversals.
The preorder traversal for the first tree above is [4,2,1,3,5,6].
Here is a different tree with the same preorder traversal.

        4
       / \
      2   1
         / \
        3   6
         \
          5

Similarly, we can easily construct another tree whose postorder
traversal [1,3,2,6,5,4] matches that of the first tree above.

Can we unambiguously reconstruct a tree with preorder traversal
[4,2,1,3,5,6] if we fix the inorder traversal to be [1,2,3,4,5,6]?

Here is how we would do it, by example, on the tree above.

    Inorder  : [1,2,3,4,5,6]
    Preorder : [4,2,1,3,5,6]

From the preorder traversal, we know that 4 is at the root. The rest
of the preorder traversal breaks up into two segments, corresponding
to the preorder traversals of the left and the right subtrees. From
the position of 4 in the inorder traversal, we know that [1,2,3] is
the inorder traversal of the left subtree and [5,6] is the inorder
traversal of the right subtree. Since the left subtree has three
nodes, we can split the tail of the preorder traversal after three
values.
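The split described above can be sketched in Haskell as a single
step. The name splitAtRoot is hypothetical and not part of the
lecture. The sketch assumes that all values are distinct and that the
preorder traversal is nonempty.

```haskell
-- Hypothetical helper (not from the lecture): one splitting step.
-- Given the inorder and preorder traversals of a nonempty tree with
-- distinct values, return the root together with the
-- (inorder, preorder) pairs for the left and right subtrees.
splitAtRoot :: Eq a => [a] -> [a] -> (a, ([a], [a]), ([a], [a]))
splitAtRoot inorder (root:rest) =
    (root, (leftin, leftpre), (rightin, rightpre))
  where
    -- Everything before the root in the inorder traversal is on the left
    (leftin, fromRoot)  = break (== root) inorder
    -- Skip the root itself; the remainder is the right subtree
    rightin             = drop 1 fromRoot
    -- The left subtree has (length leftin) nodes, so split there
    (leftpre, rightpre) = splitAt (length leftin) rest
splitAtRoot _ [] = error "splitAtRoot: empty preorder traversal"
```

On the example, splitAtRoot [1,2,3,4,5,6] [4,2,1,3,5,6] evaluates to
(4, ([1,2,3],[2,1,3]), ([5,6],[5,6])).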

Thus, we have identified the root node and the sets of nodes in the
left and right subtrees, and recursively broken up the reconstruction
problem as follows:

                       4
                     /   \
        Left subtree       Right subtree

     Inorder : [1,2,3]     Inorder : [5,6]
     Preorder: [2,1,3]     Preorder: [5,6]

This suggests the following Haskell program:

    reconstruct :: (Eq a) => [a] -> [a] -> (Btree a)
    -- First argument is inorder traversal, second is preorder traversal
    reconstruct [] []         = Nil
    reconstruct [_] [y]       = Node Nil y Nil
    reconstruct (x:xs) (y:ys) = Node (reconstruct leftin leftpre) y
                                     (reconstruct rightin rightpre)
      where
        -- y is the root; count the values before it in the inorder list
        leftsize = length (takeWhile (/= y) (x:xs))
        leftin   = take leftsize (x:xs)
        rightin  = drop (leftsize+1) (x:xs)
        leftpre  = take leftsize ys
        rightpre = drop leftsize ys

In the definition above, "takeWhile p l" is the builtin function that
returns the longest prefix of l all of whose elements satisfy the
condition p. The comparison (/= y) is the reason for the constraint
(Eq a) in the type.

Observe that our reconstruction procedure implicitly assumes that all
nodes in the tree have distinct values. Otherwise, the position of
the root in the inorder traversal would be ambiguous.

Exercise: Write a Haskell function to reconstruct a binary tree from
its inorder and postorder traversals.

Is it possible to reconstruct a binary tree uniquely from its preorder
and postorder traversals? The following example shows that this
cannot be done in general:

      1    and    1       both have preorder  : [1,2]
     /             \                postorder : [2,1]
    2               2

However, if we impose additional structure on binary trees---for
instance, no node can have a right child without having a left
child---preorder and postorder traversals together uniquely fix the
shape of a tree.

Here is how we could do it, by example.

    Preorder  : [4,2,1,3,5,6]
    Postorder : [1,3,2,6,5,4]

4 is clearly the root. In the preorder traversal, 2 immediately
follows the root, so 2 is the root of the left subtree. In the
postorder traversal, 5 immediately precedes the root, so 5 is the
root of the right subtree.

This information is sufficient to recursively break up the problem as
follows:

                        4
                      /   \
     Preorder : [2,1,3]     Preorder : [5,6]
     Postorder: [1,3,2]     Postorder: [6,5]

Exercise: Write a Haskell function to reconstruct a binary tree from
its preorder and postorder traversals with the restriction that no
node can have a right child without having a left child.

Binary search trees
-------------------

Another important use of binary trees is to store values that we may
want to look up later. For instance, a binary search tree could be
used to store a dictionary of words.

A binary search tree satisfies the following property at every node
v: all values in the subtree rooted at v that are smaller than the
value stored at v lie in the left subtree of v and all values in the
subtree rooted at v that are larger than the value stored at v lie in
the right subtree of v.

To emphasize that the values in the tree can be ordered, we elaborate
slightly on the Haskell definition of binary trees to describe search
trees.

    data (Ord a) => STree a = Nil | Node (STree a) a (STree a)

Observe that the structure of an STree is identical to that of a
normal binary tree, but there is a type class dependence, similar to
the ones we have seen for polymorphic functions such as mergesort and
quicksort. (Current versions of GHC accept a context on a data
declaration only with the deprecated DatatypeContexts extension. The
usual alternative is to drop the context from the declaration and
keep the (Ord a) constraint on the functions that compare values.)

Here are two examples of search trees over the values [1,2,3,4,5,6].

        4                 3
       / \               / \
      2   5             2   5
     / \   \           /   / \
    1   3   6         1   4   6

Both trees look reasonably well "balanced". This is not always the
case. For instance, here is a highly unbalanced search tree over the
same set of values.

              6
             /
            5
           /
          4
         /
        3
       /
      2
     /
    1

To find a value in a binary search tree, we start at the root. At
each node, if we have not already found the value we are looking for,
we can use the search tree property to decide whether to search in
the right subtree or the left subtree. We keep walking down the tree
in this fashion till we find the value we seek or we reach a leaf
node from where we cannot descend further.
Thus, each lookup in a binary search tree traverses, in the worst
case, a single path from the root to a leaf node.

How much time does it take to look up a value in a search tree with n
nodes? Let us say that a tree is balanced if at each node the size of
the left subtree differs from the size of the right subtree by at
most 1. Initially, we search for the value in the entire tree, with n
nodes. If we do not find the value at the root, we search either the
left or the right subtree. Since the tree is balanced, the number of
nodes in each of these subtrees is at most n/2. In this way, we
successively search trees of size n, n/2, n/4, ... till we reach a
leaf node, a subtree of size 1. The length of this sequence is
bounded by about log n, taking logarithms to base 2. Another way of
stating this is that the height of a balanced search tree with n
nodes is about log n.

Here is a Haskell definition of the search procedure we just
described:

    findtree :: (Ord a) => (STree a) -> a -> Bool
    findtree Nil x = False
    findtree (Node tleft y tright) x
      | x == y    = True
      | x < y     = findtree tleft x
      | otherwise = findtree tright x

Observe that a search tree does not contain duplicate values.

Exercise: What does inorder traversal of a search tree yield?

Search trees are not static objects. In general, we have to insert
new values into search trees and remove stale values from search
trees. There are two aspects to keep in mind while defining these
operations:

  a) We have to ensure that the updated tree remains a search tree

  b) We have to preserve the balanced structure of the tree

For the moment we concentrate on the first aspect. We will tackle the
problem of maintaining the balanced structure later.

Where should we insert a value into a search tree? From the
definition of a search tree, there is only one possibility. Search
for the value in the tree. If it already exists, there is nothing to
be done. Otherwise, we reach a leaf node.
This is the same path that we would have to follow to find the new
value after it has been inserted. So, insert the new value as a left
or right child of the leaf node where the unsuccessful search
terminates.

    inserttree :: (Ord a) => (STree a) -> a -> (STree a)
    inserttree Nil x = Node Nil x Nil
    inserttree (Node tleft y tright) x
      | x == y    = Node tleft y tright
      | x < y     = Node (inserttree tleft x) y tright
      | otherwise = Node tleft y (inserttree tright x)

Clearly, the maximum number of steps required to insert a value into
a search tree is equal to the length of the longest path in the tree.
Thus, if the search tree is balanced and has n nodes, inserttree
takes time proportional to log n, but will not in general preserve
the balanced structure of the tree.

How do we delete a value from a tree?

First, we have to agree on what happens if the value to be deleted
does not occur in the tree. One approach is to declare this an error.
It is easier, however, to interpret "delete x from t" as "delete x
from t if the value exists in t", so if we try to delete x and it is
not found in t, the tree t is unchanged.

Suppose we want to delete a value x from a tree whose root is y. If
x < y, we inductively delete x from the left subtree of y. Similarly,
if x > y, we inductively delete x from the right subtree of y. So,
the interesting case is when x == y.

            y == x
           /      \
          w        z
         / \      / \
       t1   t2  t3   t4

If we remove y, we have a "hole" at the root of this tree. It is
tempting to move either w (or z) into this place and recursively
delete w from the left subtree (or z from the right subtree).
However, this would not preserve the structure of the tree --- for
instance, if we move w up to the root, values in the tree t2, which
are bigger than w, will end up to the left of w.

The correct solution is to move the largest value from the left
subtree of y (or the smallest value from the right subtree of y) in
place of y. The largest value in a search tree can be found easily,
by following the rightmost path in the tree.
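Following the rightmost path can be sketched as below. The name
maxtree is hypothetical and not part of the lecture. The sketch
declares STree without the (Ord a) context, an assumption made so
that the block compiles on its own, and returns Nothing for the empty
tree.

```haskell
-- Search tree type as in the lecture, but without the Ord context on
-- the data declaration (an assumption, to keep the block standalone).
data STree a = Nil | Node (STree a) a (STree a)

-- Hypothetical helper: the largest value in a search tree, if any.
-- No comparisons are needed; we only walk down right children.
maxtree :: STree a -> Maybe a
maxtree Nil               = Nothing         -- empty tree has no maximum
maxtree (Node _ y Nil)    = Just y          -- no right child: y is largest
maxtree (Node _ _ tright) = maxtree tright  -- otherwise keep going right
```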
Removing this value from a tree is also a relatively easy operation.
Here is a function that removes the maximum value from a nonempty
tree, returning both the value and the tree that remains after the
deletion.

    deletemax :: (Ord a) => (STree a) -> (a,STree a)
    -- The maximum is the last node on the rightmost path
    deletemax (Node t1 y Nil) = (y,t1)
    deletemax (Node t1 y t2)  = (z, Node t1 y tz)
      where (z,tz) = deletemax t2

We can now write deletetree as follows:

    deletetree :: (Ord a) => (STree a) -> a -> (STree a)
    deletetree Nil x = Nil
    deletetree (Node tleft y tright) x
      | x < y = Node (deletetree tleft x) y tright
      | x > y = Node tleft y (deletetree tright x)
    -- In all cases below, we must have x == y
    deletetree (Node Nil y tright) x   = tright
    deletetree (Node tleft y tright) x = Node tz z tright
      where (z,tz) = deletemax tleft

Exercise: Define the function deletemin and change the definition of
deletetree in the "interesting case" to move the smallest value from
the right subtree in place of the node being deleted.
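
As a rough check on the two aspects mentioned earlier, the sketch
below repeats inserttree, deletemax and deletetree from this section
and adds three hypothetical helpers that are not part of the lecture:
fromList (insert a list of values one by one), height (number of
nodes on the longest path) and isSearchTree (test the search tree
property). STree is declared here without the (Ord a) context, an
assumption made so that the block compiles on its own.

```haskell
data STree a = Nil | Node (STree a) a (STree a)

inserttree :: Ord a => STree a -> a -> STree a
inserttree Nil x = Node Nil x Nil
inserttree (Node tleft y tright) x
  | x == y    = Node tleft y tright
  | x < y     = Node (inserttree tleft x) y tright
  | otherwise = Node tleft y (inserttree tright x)

-- Only defined for nonempty trees, as in the lecture.
deletemax :: STree a -> (a, STree a)
deletemax (Node t1 y Nil) = (y, t1)
deletemax (Node t1 y t2)  = (z, Node t1 y tz)
  where (z, tz) = deletemax t2

deletetree :: Ord a => STree a -> a -> STree a
deletetree Nil _ = Nil
deletetree (Node tleft y tright) x
  | x < y = Node (deletetree tleft x) y tright
  | x > y = Node tleft y (deletetree tright x)
-- In all cases below, we must have x == y
deletetree (Node Nil _ tright) _   = tright
deletetree (Node tleft _ tright) _ = Node tz z tright
  where (z, tz) = deletemax tleft

-- Hypothetical helper: insert the values from left to right.
fromList :: Ord a => [a] -> STree a
fromList = foldl inserttree Nil

-- Hypothetical helper: number of nodes on the longest root-to-leaf path.
height :: STree a -> Int
height Nil          = 0
height (Node l _ r) = 1 + max (height l) (height r)

-- Hypothetical helper: check the search tree property by passing down
-- the strict lower and upper bounds that each value must respect.
isSearchTree :: Ord a => STree a -> Bool
isSearchTree = go Nothing Nothing
  where
    go _ _ Nil = True
    go lo hi (Node l y r) =
      maybe True (< y) lo && maybe True (y <) hi
        && go lo (Just y) l && go (Just y) hi r
```

With these definitions, height (fromList [4,2,5,1,3,6]) is 3, while
height (fromList [6,5,4,3,2,1]) is 6: the same values inserted in
decreasing order give the highly unbalanced tree shown earlier. In
both cases isSearchTree holds, and it continues to hold after
deletetree removes the root.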