Introduction to Programming, Aug-Dec 2008
Lecture 14, Monday 13 Oct 2008

Tree Traversals
---------------

One way to describe a tree is to list out all the values stored in the
nodes.  There are three systematic ways to do this, assuming we read a
tree from left to right:

a) Preorder:  At each node, first list out the value at the node, then
   inductively list out the left and right subtrees in the same manner.

b) Postorder: At each node, inductively list out the left and right
   subtrees in the same manner, then list out the value at the node.

c) Inorder:   At each node, first list out the left subtree, then list
   out the value at the node, and finally list out the right subtree.

Here are some examples:

        4            preorder  : [4,2,1,3,5,6]
       / \
      2   5          postorder : [1,3,2,6,5,4]
     / \   \
    1   3   6        inorder   : [1,2,3,4,5,6]

        3            preorder  : [3,2,1,5,4,6]
       / \
      2   5          postorder : [1,2,4,6,5,3]
     /   / \
    1   4   6        inorder   : [1,2,3,4,5,6]

These traversals can be defined as inductive functions in the obvious
way:

  preorder :: (BTree a) -> [a]
  preorder Nil = []
  preorder (Node t1 a t2) = [a] ++ (preorder t1) ++ (preorder t2)

  inorder :: (BTree a) -> [a]
  inorder Nil = []
  inorder (Node t1 a t2) = (inorder t1) ++ [a] ++ (inorder t2)

  postorder :: (BTree a) -> [a]
  postorder Nil = []
  postorder (Node t1 a t2) = (postorder t1) ++ (postorder t2) ++ [a]
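These definitions refer to the binary tree type from the earlier
lectures.  For completeness, here is one possible form of that
declaration (the exact version we used earlier may differ), together
with the first example tree written out as a value; the names example
and leaf are introduced here only for illustration.

  data BTree a = Nil | Node (BTree a) a (BTree a)

  -- The first example tree above, with 4 at the root.
  example :: BTree Int
  example = Node (Node (leaf 1) 2 (leaf 3))
                 4
                 (Node Nil 5 (leaf 6))
    where leaf x = Node Nil x Nil

Loading these definitions in the interpreter, we can check the
traversals against the lists given above:

  *Main> preorder example
  [4,2,1,3,5,6]
  *Main> inorder example
  [1,2,3,4,5,6]
  *Main> postorder example
  [1,3,2,6,5,4]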
Reconstructing binary trees from tree traversals
------------------------------------------------

In general, a single tree traversal does not uniquely define the
structure of the tree.  For example, as we have seen, for both the
following trees, an inorder traversal yields [1,2,3,4,5,6].

        4                     3
       / \                   / \
      2   5                 2   5
     / \   \               /   / \
    1   3   6             1   4   6

The same ambiguity is present for preorder and postorder traversals.
The preorder traversal for the first tree above is [4,2,1,3,5,6].
Here is a different tree with the same preorder traversal.

        4
       / \
      2   1
         / \
        3   6
         \
          5

Similarly, we can easily construct another tree whose postorder
traversal [1,3,2,6,5,4] matches that of the first tree above.

Can we unambiguously reconstruct a tree with preorder traversal
[4,2,1,3,5,6] if we fix the inorder traversal to be [1,2,3,4,5,6]?
Here is how we would do it, by example, on the tree above.

  Inorder  : [1,2,3,4,5,6]
  Preorder : [4,2,1,3,5,6]

From the preorder traversal, we know that 4 is at the root.  The rest
of the preorder traversal breaks up as two segments, corresponding to
the preorder traversals of the left and the right subtrees.  From the
position of 4 in the inorder traversal, we know that [1,2,3] is the
inorder traversal of the left subtree and [5,6] is the inorder
traversal of the right subtree.  Since the left subtree has three
nodes, we can split the tail of the preorder traversal after three
values.  Thus, we have identified the root node and the subset of
nodes in the left and right subtrees and recursively broken up the
reconstruction problem as follows:

                        4
                      /   \
           Left subtree     Right subtree

       Inorder : [1,2,3]    Inorder : [5,6]
       Preorder: [2,1,3]    Preorder: [5,6]

This suggests the following Haskell program:

  reconstruct :: (Eq a) => [a] -> [a] -> (BTree a)
  -- First argument is inorder traversal, second is preorder traversal
  reconstruct [] []   = Nil
  reconstruct [x] [_] = Node Nil x Nil
  reconstruct (x:xs) (y:ys) =
      Node (reconstruct leftin leftpre) y (reconstruct rightin rightpre)
      where
        leftsize = length (takeWhile (/= y) (x:xs))
        leftin   = take leftsize (x:xs)
        rightin  = drop (leftsize+1) (x:xs)
        leftpre  = take leftsize ys
        rightpre = drop leftsize ys

In the definition above, "takeWhile p l" is the built-in function that
returns the longest prefix of l all of whose elements satisfy the
condition p.

Observe that our reconstruction procedure implicitly assumes that all
the values stored in the tree are distinct.

Exercise: Write a Haskell function to reconstruct a binary tree from
its inorder and postorder traversals.

Is it possible to reconstruct a binary tree uniquely from its preorder
and postorder traversals?  The following example shows that this
cannot be done in general:

       1        and        1        both have   preorder  : [1,2]
      /                     \
     2                       2                  postorder : [2,1]

However, if we impose additional structure on binary trees---for
instance, no node can have a right child without having a left
child---preorder and postorder traversals together uniquely fix the
shape of a tree.  Here is how we could do it, by example:

  Preorder  : [4,2,1,3,5,6]
  Postorder : [1,3,2,6,5,4]

4 is clearly the root.  From the preorder traversal we know that 2 is
the root of the left subtree and from the postorder traversal we know
that 5 is the root of the right subtree.  Locating 2 in the postorder
traversal and 5 in the preorder traversal tells us how many nodes lie
in each subtree, so this information is sufficient to recursively
break up the problem as follows:

                        4
                      /   \
       Preorder : [2,1,3]   Preorder : [5,6]
       Postorder: [1,3,2]   Postorder: [6,5]

Exercise: Write a Haskell function to reconstruct a binary tree from
its preorder and postorder traversals with the restriction that no
node can have a right child without having a left child.
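Though not part of the notes, it is worth checking the reconstruct
function defined above (the inorder/preorder version, not the
exercises) against the first example tree.  Assuming the BTree
declaration and the traversal functions have been loaded in GHCi,
comparing traversals avoids the need for a Show or Eq instance on
trees:

  *Main> inorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [1,2,3,4,5,6]
  *Main> preorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [4,2,1,3,5,6]
  *Main> postorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [1,3,2,6,5,4]

The reconstructed tree has exactly the traversals of the first tree in
this section, as expected.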
Input/Output
------------

So far, we have invoked Haskell functions interactively, via the
Haskell interpreter.  In this mode of operation, there is no need for
a Haskell program to interact with the "outside world".

However, invoking functions interactively has its limitations.  It is
tedious to type in large inputs manually and read off voluminous
output from the screen.  Instead, it would be much more convenient to
be able to read and write data to files.  There is also a natural need
for programs to function offline, without direct interaction from the
user.  Such programs also need mechanisms to take inputs from the
environment and write their output back.

Why is Input/Output an issue in Haskell?
----------------------------------------

In Haskell, computation is rewriting, using function definitions to
simplify expressions.  This rewriting is done lazily --- that is, the
arguments to a function are evaluated only when needed.  A highly
desirable goal is confluence --- the order of evaluation of
independent subexpressions should not affect the outcome of the
computation.

An obvious approach would be to make input and output operations
functions.  For instance, suppose we have a function "read" that reads
an integer from the keyboard.  Consider the following expression that
reads two integers and computes their difference:

  difference = read - read

An immediate problem with this expression is confluence: the order of
evaluation of the two (independent) occurrences of read changes the
value computed.

There is a more subtle problem, arising out of lazy evaluation.
Consider a list of the form [factorial 8, 7, 3+5].  We can extract its
length through the expression:

  length [factorial 8, 7, 3+5]

The function length only needs to check the number of elements in the
list, and not the actual values.  Under Haskell's lazy evaluation
mechanism, this expression evaluates to 3 without actually computing
"factorial 8" or "3+5".  On the other hand, computing

  head [factorial 8, 7, 3+5]

would result in evaluating "factorial 8", but not "3+5".

Consider now the corresponding expressions

  length [read, read, read]    and    head [read, read, read]

Using lazy evaluation, no values are actually read when evaluating the
first expression!  On the other hand, evaluating the second expression
would read one value.  This means that an expression that includes
functions that perform input or output is not guaranteed to actually
execute the operation.

From these observations, we see that input/output actions need to be
done in a specific order.  Further, there should be no uncertainty as
to whether such an action has been performed.
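The laziness argument above can be checked directly in the interpreter
(this experiment is not in the notes).  Below, undefined stands in for
an expression whose evaluation we want to detect: it raises an error
the moment it is evaluated, and the exact wording of the error depends
on the GHC version.

  Prelude> length [undefined, undefined, undefined]
  3
  Prelude> head [7, undefined, 3+5]
  7
  Prelude> head [undefined, 7, 3+5]
  *** Exception: Prelude.undefined

length never examines the elements, so it happily reports 3, while
printing the result of head forces the first element and fails.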
Actions
-------

To fix this problem, Haskell introduces a new kind of quantity called
an action.  We can think of the world of a Haskell program as divided
into two parts.  There is the ideal world of "Values" that contains
Ints, Floats, Chars and functions involving such quantities.  This is
the world that we have been dealing with so far.  Side by side with
this world is the "real" world with a keyboard, screen, data files ...
Actions are used to transfer information from the real world to the
ideal world and back.  For this course, the only actions we deal with
are those involving input and output.

Recall that a function can be viewed as a black box with an input and
an output.  The inputs and outputs to a function are not to be
confused with the input and output that we are trying to formalize.
To avoid confusion, we will refer to the input of a function as its
argument and the output of a function as its result.  Here is our
abstract view of a function.

                 -----------------
    Argument     |               |     Result
    ------------>|               |------------>
                 |               |
                 -----------------

Both the argument and the result of a function lie in the abstract
world of Values.  Remember that the argument or result could itself be
another function over Values.

Actions, on the other hand, simultaneously interact both with the
world of Values and the real world.  Here is an abstract picture of an
action.

                 -----------------
    Argument     |               |     Result
    ------------>|               |------------>
                 |      /\       |
                 |     /||\      |              Value world
    .............|......||.......|............................
                 |      ||       |              Real world
                 -------||--------
                       \||/
                        \/

The vertical arrow that penetrates inside the box denoting the action
represents the fact that an action transfers data between the Value
world and the Real world.

One might argue that we should be more careful and describe whether
the data flows upwards or downwards in an action, or both ways.
However, as we have observed above, the data that flows across this
boundary is inherently sequential.  An action that reads two data
items and then writes one is different from an action that writes one
data item between two reads.  Hence, we cannot separate out the upward
and downward data streams into two "channels" because, if we did so,
we might lose information about the order in which reads and writes
occur.  Instead, we should think of the action as providing a single
doorway between the Value world and the Real world through which data
items pass one at a time, either upwards or downwards.

For instance, an action that reads a character does not need an
argument.  It interacts with the real world to fetch a character and
returns the character that is read as its result.  In Haskell, the
action that does all this is called getChar and has the following
type:

  getChar :: IO Char

The word IO indicates that this action performs input/output.  There
is no argument, only a result type.

How about the symmetric action that takes a Char value as argument and
prints it out?  This action has an argument, but no result.  Haskell
has a trivial type, written "()", whose only value (also written ())
carries no information.  Using this, we can describe the type of
putChar as:

  putChar :: Char -> IO ()

Here, the argument is a Char and the () indicates that the action
produces no useful result.

Notice that we did not need to use () to describe the lack of an
argument to getChar because it is legal for a function/action to have
only a result (in the world of functions, such a quantity would be a
constant that always returns a fixed value).  However, we cannot write
a type that leaves out the result altogether, so we need to use the
type () in this context.

Notice also that a function that reads an argument and does not
generate a result is completely useless --- what would we do with such
an entity?  On the other hand, actions can be "one-sided" with respect
to the world of Values because they perform something nontrivial with
respect to the Real world.  The occurrence of an action changes the
state of the Real world.  For instance, getChar consumes one character
from the keyboard, leaving the input data pointing to the next
character that has been typed.  Similarly, putChar produces a
character on the screen.  In the literature, this behaviour of
updating the Real world while reading and generating values is
referred to as a "side effect" of the function.
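These two types can be verified in GHCi using its :type command.  Note,
in the last query, that supplying the Char argument to putChar yields
a value of type IO (); asking for its type does not perform any
output.

  Prelude> :type getChar
  getChar :: IO Char
  Prelude> :type putChar
  putChar :: Char -> IO ()
  Prelude> :type putChar 'a'
  putChar 'a' :: IO ()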
Composing actions
-----------------

In any nontrivial Haskell program, we compose simple functions to
create more complex ones.  When we compose functions, we set up a pipe
feeding the result of one function as the argument to another one.  It
is natural to want to do the same with actions.

For instance, suppose we want to combine getChar and putChar to
generate a complex action that reads a character from the keyboard and
prints it out to the screen.  In function notation, it would suffice
to write

  putChar (getChar)

However, this notation hides the fact that we are composing actions,
not functions.  For instance, we have to evaluate getChar before
putChar, which is not the order that normal Haskell evaluation would
choose.

To get around these difficulties, Haskell provides an explicit
operator, >>=, to compose actions.  The complex action we are trying
to define is written

  getChar >>= putChar

This is to be interpreted as "first do getChar, then feed the result
as the argument to putChar".  As with functions, we can give this
complex action a name.  For instance, we can write

  echo = (getChar >>= putChar)

What is the type of echo?  The first part of echo is getChar, which
does not require an argument.  The last part of echo is putChar, which
does not produce a result.  The argument that putChar requires is
supplied internally by getChar and is not a visible part of echo.
Thus, echo has type

  echo :: IO ()

Seemingly, echo does nothing at all, since it does not require an
argument and does not produce a result!  What saves the day is the tag
IO, which says that though echo has no visible effect in the Value
world, it does perform some interaction with the Real world.

What is the type of >>=?  It takes two actions and connects the output
of one to the input of the other.  A generic action is of type
(a -> IO b), where a and b stand for normal types in the Value world.
We might guess that the type of >>= is

  (>>=) :: (a -> IO b) -> (b -> IO c) -> (a -> IO c)

However, >>= is restricted to combining actions in which the first
action has no argument, so the actual type of >>= is

  (>>=) :: IO a -> (a -> IO b) -> IO b

Now, suppose we want to repeat echo --- that is, compose echo with
itself.  We could try

  echoTwice = (echo >>= echo)

but we have a problem.  We observed that echo does not require an
argument and does not produce a result, so it is not accurate to talk
of the result of the first echo action being fed as the argument to
the second echo.  What we need is an alternative composition operator
that discards the result of the first action.  This operator is called
>> and is of type

  (>>) :: IO a -> IO b -> IO b

Thus

  echoTwice = (echo >> echo)

What if we want to read a character and print it twice?  We want a
complex action of the form

      getChar -----> putChar -----> putChar
         |                             |
         -------------------------------

We can achieve this by writing, first,

  put2char :: Char -> IO ()
  put2char c = (putChar c) >> (putChar c)

and then writing

  modifiedecho = (getChar >>= put2char)

do: an easier way to generate sequences of actions
---------------------------------------------------

We have seen that composing actions generates, in general, a sequence
of actions.  In this sequence, the results of some actions may be
passed as arguments to one or more later actions.  Haskell provides a
simple notation to describe such sequences, using the special word
"do".  For instance,

  do
    putChar c
    putChar c

is a complex action that generates two putChars in a row.  Thus, we
can rewrite the function put2char as

  put2char :: Char -> IO ()
  put2char c = do
                 putChar c
                 putChar c

Observe that the lines below do are indented in a systematic way.

How do we capture the result of an earlier action to reuse as an
argument later?  The operator <- binds a variable to a value.  Thus,
we can write

  echo :: IO ()
  echo = do
           c <- getChar
           putChar c

In the first line of the do block, c is bound to the result of
getChar.  This value is then used in the second line as an argument to
putChar.

We can now write modifiedecho directly using do notation as follows.

  modifiedecho :: IO ()
  modifiedecho = do
                   c <- getChar
                   putChar c
                   putChar c
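The actions above were all built up at the interpreter prompt.  As a
closing illustration, not taken from the notes, here is one way to
package them into a standalone program; the file name Echo2.hs is
chosen arbitrarily for this sketch.  A standalone Haskell program
begins execution at an action named main, which here reads a character
and prints it twice, followed by a newline.

  -- Echo2.hs : read one character and print it twice.
  -- Run with "runghc Echo2.hs", or compile with ghc.

  main :: IO ()
  main = do
           c <- getChar     -- read one character from the keyboard
           putChar c        -- print it ...
           putChar c        -- ... a second time
           putChar '\n'     -- and end the line

Typing x followed by Enter typically produces xx on the next line:
with the usual line buffering, the character reaches getChar only
after Enter is pressed.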