Introduction to Programming, Aug-Dec 2008
Lecture 14, Monday 13 Oct 2008

Tree Traversals
---------------

One way to describe a tree is to list out all the values stored in the
nodes.  There are three systematic ways to do this, assuming we read a
tree from left to right:

a) Preorder:  At each node, first list out the value at the node, then
   inductively list out the left and right subtrees in the same manner.

b) Postorder: At each node, inductively list out the left and right
   subtrees in the same manner, then list out the value at the node.

c) Inorder:   At each node, first list out the left subtree, then list
   out the value at the node, and finally list out the right subtree.

Here are some examples:

        4            preorder  : [4,2,1,3,5,6]
       / \
      2   5          postorder : [1,3,2,6,5,4]
     / \   \
    1   3   6        inorder   : [1,2,3,4,5,6]

        3            preorder  : [3,2,1,5,4,6]
       / \
      2   5          postorder : [1,2,4,6,5,3]
     /   / \
    1   4   6        inorder   : [1,2,3,4,5,6]

These traversals can be defined as inductive functions in the obvious
way:

  preorder :: (BTree a) -> [a]
  preorder Nil = []
  preorder (Node t1 a t2) = [a] ++ (preorder t1) ++ (preorder t2)

  inorder :: (BTree a) -> [a]
  inorder Nil = []
  inorder (Node t1 a t2) = (inorder t1) ++ [a] ++ (inorder t2)

  postorder :: (BTree a) -> [a]
  postorder Nil = []
  postorder (Node t1 a t2) = (postorder t1) ++ (postorder t2) ++ [a]
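These definitions refer to the binary tree type from the earlier
lectures.  For completeness, here is one possible form of that
declaration (the exact version we used earlier may differ), together
with the first example tree written out as a value; the names example
and leaf are introduced here only for illustration.

  data BTree a = Nil | Node (BTree a) a (BTree a)

  -- The first example tree above, with 4 at the root.
  example :: BTree Int
  example = Node (Node (leaf 1) 2 (leaf 3))
                 4
                 (Node Nil 5 (leaf 6))
    where leaf x = Node Nil x Nil

Loading these definitions in the interpreter, we can check the
traversals against the lists given above:

  *Main> preorder example
  [4,2,1,3,5,6]
  *Main> inorder example
  [1,2,3,4,5,6]
  *Main> postorder example
  [1,3,2,6,5,4]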
Reconstructing binary trees from tree traversals
------------------------------------------------

In general, a single tree traversal does not uniquely define the
structure of the tree.  For example, as we have seen, for both the
following trees, an inorder traversal yields [1,2,3,4,5,6].

        4                     3
       / \                   / \
      2   5                 2   5
     / \   \               /   / \
    1   3   6             1   4   6

The same ambiguity is present for preorder and postorder traversals.
The preorder traversal for the first tree above is [4,2,1,3,5,6].
Here is a different tree with the same preorder traversal.

        4
       / \
      2   1
         / \
        3   6
         \
          5

Similarly, we can easily construct another tree whose postorder
traversal [1,3,2,6,5,4] matches that of the first tree above.

Can we unambiguously reconstruct a tree with preorder traversal
[4,2,1,3,5,6] if we fix the inorder traversal to be [1,2,3,4,5,6]?
Here is how we would do it, by example, on the tree above.

  Inorder  : [1,2,3,4,5,6]
  Preorder : [4,2,1,3,5,6]

From the preorder traversal, we know that 4 is at the root.  The rest
of the preorder traversal breaks up as two segments, corresponding to
the preorder traversals of the left and the right subtrees.  From the
position of 4 in the inorder traversal, we know that [1,2,3] is the
inorder traversal of the left subtree and [5,6] is the inorder
traversal of the right subtree.  Since the left subtree has three
nodes, we can split the tail of the preorder traversal after three
values.  Thus, we have identified the root node and the subset of
nodes in the left and right subtrees and recursively broken up the
reconstruction problem as follows:

                        4
                      /   \
           Left subtree     Right subtree

       Inorder : [1,2,3]    Inorder : [5,6]
       Preorder: [2,1,3]    Preorder: [5,6]

This suggests the following Haskell program:

  reconstruct :: (Eq a) => [a] -> [a] -> (BTree a)
  -- First argument is inorder traversal, second is preorder traversal
  reconstruct [] []   = Nil
  reconstruct [x] [_] = Node Nil x Nil
  reconstruct (x:xs) (y:ys) =
      Node (reconstruct leftin leftpre) y (reconstruct rightin rightpre)
      where
        leftsize = length (takeWhile (/= y) (x:xs))
        leftin   = take leftsize (x:xs)
        rightin  = drop (leftsize+1) (x:xs)
        leftpre  = take leftsize ys
        rightpre = drop leftsize ys

In the definition above, "takeWhile p l" is the built-in function that
returns the longest prefix of l all of whose elements satisfy the
condition p.

Observe that our reconstruction procedure implicitly assumes that all
the values stored in the tree are distinct.

Exercise: Write a Haskell function to reconstruct a binary tree from
its inorder and postorder traversals.

Is it possible to reconstruct a binary tree uniquely from its preorder
and postorder traversals?  The following example shows that this
cannot be done in general:

       1        and        1        both have   preorder  : [1,2]
      /                     \
     2                       2                  postorder : [2,1]

However, if we impose additional structure on binary trees---for
instance, no node can have a right child without having a left
child---preorder and postorder traversals together uniquely fix the
shape of a tree.  Here is how we could do it, by example:

  Preorder  : [4,2,1,3,5,6]
  Postorder : [1,3,2,6,5,4]

4 is clearly the root.  From the preorder traversal we know that 2 is
the root of the left subtree and from the postorder traversal we know
that 5 is the root of the right subtree.  Locating 2 in the postorder
traversal and 5 in the preorder traversal tells us how many nodes lie
in each subtree, so this information is sufficient to recursively
break up the problem as follows:

                        4
                      /   \
       Preorder : [2,1,3]   Preorder : [5,6]
       Postorder: [1,3,2]   Postorder: [6,5]

Exercise: Write a Haskell function to reconstruct a binary tree from
its preorder and postorder traversals with the restriction that no
node can have a right child without having a left child.
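Though not part of the notes, it is worth checking the reconstruct
function defined above (the inorder/preorder version, not the
exercises) against the first example tree.  Assuming the BTree
declaration and the traversal functions have been loaded in GHCi,
comparing traversals avoids the need for a Show or Eq instance on
trees:

  *Main> inorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [1,2,3,4,5,6]
  *Main> preorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [4,2,1,3,5,6]
  *Main> postorder (reconstruct [1,2,3,4,5,6] [4,2,1,3,5,6])
  [1,3,2,6,5,4]

The reconstructed tree has exactly the traversals of the first tree in
this section, as expected.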
Input/Output
------------

So far, we have invoked Haskell functions interactively, via the
Haskell interpreter.  In this mode of operation, there is no need for
a Haskell program to interact with the "outside world".

However, invoking functions interactively has its limitations.  It is
tedious to type in large inputs manually and read off voluminous
output from the screen.  Instead, it would be much more convenient to
be able to read and write data to files.  There is also a natural need
for programs to function offline, without direct interaction from the
user.  Such programs also need mechanisms to take inputs from the
environment and write their output back.

Why is Input/Output an issue in Haskell?
----------------------------------------

In Haskell, computation is rewriting, using function definitions to
simplify expressions.  This rewriting is done lazily --- that is, the
arguments to a function are evaluated only when needed.  A highly
desirable goal is confluence --- the order of evaluation of
independent subexpressions should not affect the outcome of the
computation.

An obvious approach would be to make input and output operations
functions.  For instance, suppose we have a function "read" that reads
an integer from the keyboard.  Consider the following expression that
reads two integers and computes their difference:

  difference = read - read

An immediate problem with this expression is confluence: the order of
evaluation of the two (independent) occurrences of read changes the
value computed.

There is a more subtle problem, arising out of lazy evaluation.
Consider a list of the form [factorial 8, 7, 3+5].  We can extract its
length through the expression:

  length [factorial 8, 7, 3+5]

The function length only needs to check the number of elements in the
list, and not the actual values.  Under Haskell's lazy evaluation
mechanism, this expression evaluates to 3 without actually computing
"factorial 8" or "3+5".  On the other hand, computing

  head [factorial 8, 7, 3+5]

would result in evaluating "factorial 8", but not "3+5".

Consider now the corresponding expressions

  length [read, read, read]    and    head [read, read, read]

Using lazy evaluation, no values are actually read when evaluating the
first expression!  On the other hand, evaluating the second expression
would read one value.  This means that an expression that includes
functions that perform input or output is not guaranteed to actually
execute the operation.

From these observations, we see that input/output actions need to be
done in a specific order.  Further, there should be no uncertainty as
to whether such an action has been performed.
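The laziness argument above can be checked directly in the interpreter
(this experiment is not in the notes).  Below, undefined stands in for
an expression whose evaluation we want to detect: it raises an error
the moment it is evaluated, and the exact wording of the error depends
on the GHC version.

  Prelude> length [undefined, undefined, undefined]
  3
  Prelude> head [7, undefined, 3+5]
  7
  Prelude> head [undefined, 7, 3+5]
  *** Exception: Prelude.undefined

length never examines the elements, so it happily reports 3, while
printing the result of head forces the first element and fails.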
Actions
-------

To fix this problem, Haskell introduces a new kind of quantity called
an action.  We can think of the world of a Haskell program as divided
into two parts.  There is the ideal world of "Values" that contains
Ints, Floats, Chars and functions involving such quantities.  This is
the world that we have been dealing with so far.  Side by side with
this world is the "real" world with a keyboard, screen, data files ...
Actions are used to transfer information from the real world to the
ideal world and back.  For this course, the only actions we deal with
are those involving input and output.

Recall that a function can be viewed as a black box with an input and
an output.  The inputs and outputs to a function are not to be
confused with the input and output that we are trying to formalize.
To avoid confusion, we will refer to the input of a function as its
argument and the output of a function as its result.  Here is our
abstract view of a function.

                 -----------------
    Argument     |               |     Result
    ------------>|               |------------>
                 |               |
                 -----------------

Both the argument and the result of a function lie in the abstract
world of Values.  Remember that the argument or result could itself be
another function over Values.

Actions, on the other hand, simultaneously interact both with the
world of Values and the real world.  Here is an abstract picture of an
action.

                 -----------------
    Argument     |               |     Result
    ------------>|               |------------>
                 |      /\       |
                 |     /||\      |              Value world
    .............|......||.......|............................
                 |      ||       |              Real world
                 -------||--------
                       \||/
                        \/

The vertical arrow that penetrates inside the box denoting the action
represents the fact that an action transfers data between the Value
world and the Real world.

One might argue that we should be more careful and describe whether
the data flows upwards or downwards in an action, or both ways.
However, as we have observed above, the data that flows across this
boundary is inherently sequential.  An action that reads two data
items and then writes one is different from an action that writes one
data item between two reads.  Hence, we cannot separate out the upward
and downward data streams into two "channels" because, if we did so,
we might lose information about the order in which reads and writes
occur.  Instead, we should think of the action as providing a single
doorway between the Value world and the Real world through which data
items pass one at a time, either upwards or downwards.

For instance, an action that reads a character does not need an
argument.  It interacts with the real world to fetch a character and
returns the character that is read as its result.  In Haskell, the
action that does all this is called getChar and has the following
type:

  getChar :: IO Char

The word IO indicates that this action performs input/output.  There
is no argument, only a result type.

How about the symmetric action that takes a Char value as argument and
prints it out?  This action has an argument, but no result.  Haskell
has a trivial type, written "()", whose only value (also written ())
carries no information.  Using this, we can describe the type of
putChar as:

  putChar :: Char -> IO ()

Here, the argument is a Char and the () indicates that the action
produces no useful result.

Notice that we did not need to use () to describe the lack of an
argument to getChar because it is legal for a function/action to have
only a result (in the world of functions, such a quantity would be a
constant that always returns a fixed value).  However, we cannot write
a type that leaves out the result altogether, so we need to use the
type () in this context.

Notice also that a function that reads an argument and does not
generate a result is completely useless --- what would we do with such
an entity?  On the other hand, actions can be "one-sided" with respect
to the world of Values because they perform something nontrivial with
respect to the Real world.  The occurrence of an action changes the
state of the Real world.  For instance, getChar consumes one character
from the keyboard, leaving the input data pointing to the next
character that has been typed.  Similarly, putChar produces a
character on the screen.  In the literature, this behaviour of
updating the Real world while reading and generating values is
referred to as a "side effect" of the function.
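These two types can be verified in GHCi using its :type command.  Note,
in the last query, that supplying the Char argument to putChar yields
a value of type IO (); asking for its type does not perform any
output.

  Prelude> :type getChar
  getChar :: IO Char
  Prelude> :type putChar
  putChar :: Char -> IO ()
  Prelude> :type putChar 'a'
  putChar 'a' :: IO ()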
Composing actions
-----------------

In any nontrivial Haskell program, we compose simple functions to
create more complex ones.  When we compose functions, we set up a pipe
feeding the result of one function as the argument to another one.  It
is natural to want to do the same with actions.

For instance, suppose we want to combine getChar and putChar to
generate a complex action that reads a character from the keyboard and
prints it out to the screen.  In function notation, it would suffice
to write

  putChar (getChar)

However, this notation hides the fact that we are composing actions,
not functions.  For instance, we have to evaluate getChar before
putChar, which is not the order that normal Haskell evaluation would
choose.

To get around these difficulties, Haskell provides an explicit
operator, >>=, to compose actions.  The complex action we are trying
to define is written

  getChar >>= putChar

This is to be interpreted as "first do getChar, then feed the result
as the argument to putChar".  As with functions, we can give this
complex action a name.  For instance, we can write

  echo = (getChar >>= putChar)

What is the type of echo?  The first part of echo is getChar, which
does not require an argument.  The last part of echo is putChar, which
does not produce a result.  The argument that putChar requires is
supplied internally by getChar and is not a visible part of echo.
Thus, echo has type

  echo :: IO ()

Seemingly, echo does nothing at all, since it does not require an
argument and does not produce a result!  What saves the day is the tag
IO, which says that though echo has no visible effect in the Value
world, it does perform some interaction with the Real world.

What is the type of >>=?  It takes two actions and connects the output
of one to the input of the other.  A generic action is of type
(a -> IO b), where a and b stand for normal types in the Value world.
We might guess that the type of >>= is

  (>>=) :: (a -> IO b) -> (b -> IO c) -> (a -> IO c)

However, >>= is restricted to combining actions in which the first
action has no argument, so the actual type of >>= is

  (>>=) :: IO a -> (a -> IO b) -> IO b

Now, suppose we want to repeat echo --- that is, compose echo with
itself.  We could try

  echoTwice = (echo >>= echo)

but we have a problem.  We observed that echo does not require an
argument and does not produce a result, so it is not accurate to talk
of the result of the first echo action being fed as the argument to
the second echo.  What we need is an alternative composition operator
that discards the result of the first action.  This operator is called
>> and is of type

  (>>) :: IO a -> IO b -> IO b

Thus

  echoTwice = (echo >> echo)

What if we want to read a character and print it twice?  We want a
complex action of the form

      getChar -----> putChar -----> putChar
         |                             |
         -------------------------------

We can achieve this by writing, first,

  put2char :: Char -> IO ()
  put2char c = (putChar c) >> (putChar c)

and then writing

  modifiedecho = (getChar >>= put2char)

do: an easier way to generate sequences of actions
---------------------------------------------------

We have seen that composing actions generates, in general, a sequence
of actions.  In this sequence, the results of some actions may be
passed as arguments to one or more later actions.  Haskell provides a
simple notation to describe such sequences, using the special word
"do".  For instance,

  do
    putChar c
    putChar c

is a complex action that generates two putChars in a row.  Thus, we
can rewrite the function put2char as

  put2char :: Char -> IO ()
  put2char c = do
                 putChar c
                 putChar c

Observe that the lines below do are indented in a systematic way.

How do we capture the result of an earlier action to reuse as an
argument later?  The operator <- binds a variable to a value.  Thus,
we can write

  echo :: IO ()
  echo = do
           c <- getChar
           putChar c

In the first line of the do block, c is bound to the result of
getChar.  This value is then used in the second line as an argument to
putChar.

We can now write modifiedecho directly using do notation as follows.

  modifiedecho :: IO ()
  modifiedecho = do
                   c <- getChar
                   putChar c
                   putChar c
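The actions above were all built up at the interpreter prompt.  As a
closing illustration, not taken from the notes, here is one way to
package them into a standalone program; the file name Echo2.hs is
chosen arbitrarily for this sketch.  A standalone Haskell program
begins execution at an action named main, which here reads a character
and prints it twice, followed by a newline.

  -- Echo2.hs : read one character and print it twice.
  -- Run with "runghc Echo2.hs", or compile with ghc.

  main :: IO ()
  main = do
           c <- getChar     -- read one character from the keyboard
           putChar c        -- print it ...
           putChar c        -- ... a second time
           putChar '\n'     -- and end the line

Typing x followed by Enter typically produces xx on the next line:
with the usual line buffering, the character reaches getChar only
after Enter is pressed.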