Introduction to Programming, Aug-Dec 2008
Lecture 16, Monday 20 Oct 2008

Lazy IO
-------

In the last lecture, we discussed an example of "lazy IO" in
Haskell.

  copyfile :: Handle -> Handle -> IO ()
  copyfile fromhandle tohandle =
    do
       s <- hGetContents fromhandle
       hPutStr tohandle s

We said that hGetContents will not necessarily read in the entire
file associated with fromhandle.  Typically, if its argument s is
beyond a certain size, hPutStr writes it out in blocks, generating
a fixed amount of text at a time.  Since hGetContents is lazy, it
reads the contents of fromhandle only on demand, in blocks that
correspond to the way hPutStr writes out its values.

Lazy IO has to be handled carefully.  Suppose we expand copyfile
to do the opening and closing of handles as well as the actual
reading of the file.  Then, the function we have written above
expands as:

  newcopyfile :: FilePath -> FilePath -> IO ()
  newcopyfile fromfile tofile =
    do
       fromhandle <- openFile fromfile ReadMode
       tohandle <- openFile tofile WriteMode
       s <- hGetContents fromhandle
       hPutStr tohandle s
       hClose fromhandle
       hClose tohandle

This works in the same way as the previous copyfile.  Logically
speaking, since hGetContents is supposed to read the entire file
into s, it seems that we could close fromhandle before we write s
to tohandle.  This gives us the following variant of copyfile.

  badcopyfile :: FilePath -> FilePath -> IO ()
  badcopyfile fromfile tofile =
    do
       fromhandle <- openFile fromfile ReadMode
       tohandle <- openFile tofile WriteMode
       s <- hGetContents fromhandle
       hClose fromhandle
       hPutStr tohandle s
       hClose tohandle

How does this version behave?  Well, hGetContents is lazy and
there is no demand made on s, so nothing is read before
fromhandle is closed.  Once the handle is closed, s is fixed as
the empty string, so this version copies nothing: tofile ends up
empty.

The moral of the story is that lazy IO should be handled with
care.  The results you get may vary unexpectedly with slight
perturbations of your code, as we saw here.
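
One way to avoid this problem is to force all of s before closing
fromhandle.  The following is only a sketch, assuming the file is
small enough to hold in memory.  The name strictcopyfile is made up
for this example and is not a library function.  It relies on the
fact that computing length s has to traverse the whole string.

```haskell
import System.IO

-- strictcopyfile is a hypothetical name used only for illustration.
strictcopyfile :: FilePath -> FilePath -> IO ()
strictcopyfile fromfile tofile =
  do
     fromhandle <- openFile fromfile ReadMode
     s <- hGetContents fromhandle
     -- seq forces length s, which reads the whole file,
     -- before hClose is executed
     length s `seq` hClose fromhandle
     tohandle <- openFile tofile WriteMode
     hPutStr tohandle s
     hClose tohandle
```

Here the demand on s comes before hClose, so closing fromhandle
early is harmless.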

Actions are like values
-----------------------

Actions can be thought of as special types of functions.  Thus,
just as we can use functions in place of simple types (for
instance, we can construct lists of functions, pass a function
as an argument or obtain a function as a result), we can use
actions like simple types.

Here is a list of actions of type [IO ()]:

  [ putChar 'c', putChar 'z', echo ]

We can write, for instance, a function that takes a list of
actions and executes them as a sequence:

  dolist :: [IO ()] -> IO ()
  dolist [] = return ()
  dolist (c:cs) = do
                     c
                     dolist cs

Haskell has a builtin function sequence which, when used with IO
actions, has the following type:

  sequence :: [IO a] -> IO [a]

In other words, sequence executes a list of actions in order and
combines their results into a single list.  Here is how sequence
can be defined:

  sequence [] = return []
  sequence (c:cs) = do
                      r <- c
                      rs <- sequence cs
                      return (r:rs)
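
As a small illustration of how sequence collects results, consider
the sketch below.  The helper say and the name lengths are made up
for this example.

```haskell
-- say is a hypothetical helper: it prints a word and returns its length
say :: String -> IO Int
say w = do
  putStrLn w
  return (length w)

-- Runs the three actions from left to right and collects their results
lengths :: IO [Int]
lengths = sequence [say "one", say "two", say "three"]
```

Executing lengths prints the three words in order, one per line,
and returns the list [3,3,5].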

Notice the similarity in structure between sequence and getLine.


  getLine = do
              c <- getChar
              if c == '\n'            -- the extra condition
                then return ""
                else do
                  cs <- getLine
                  return (c:cs)

This is not surprising, since getLine combines the result of a
sequence of getChar's into a list of Char, or String.  The only
difference is that the list of actions in getLine is terminated
by reading '\n', so there is a condition to be checked before
making a recursive call to itself.

User defined "control" structures
---------------------------------

getLine is an example of a "loop" in which we call an action
recursively.  We can easily control this behaviour a bit more.
Suppose we want to write a version of getLine that reads n lines,
for a given integer n, and returns a list of strings, one per
line read.

   getNlines :: Int -> IO [String]
   getNlines 0 = return []
   getNlines n = do
                    thisline <- getLine
                    morelines <- getNlines (n-1)
                    return (thisline:morelines)

In general, if we want to repeat an action n times, for n >= 1,
we can write

   doNtimes :: Int -> IO a -> IO a
   doNtimes 1 act = act
   doNtimes n act = do
                      act
                      doNtimes (n-1) act
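
The definition above has no case for n = 0, so calling it with 0
or a negative number never terminates.  Here is a sketch of a
variant that handles this, restricted to actions of type IO ().
The names repeatN and demo are made up for this example.  The
library function replicateM_ from Control.Monad behaves similarly.

```haskell
import Control.Monad (replicateM_)

-- repeatN is a hypothetical variant of doNtimes that does nothing
-- when n is zero or negative
repeatN :: Int -> IO () -> IO ()
repeatN n act
  | n <= 0    = return ()
  | otherwise = do
      act
      repeatN (n-1) act

-- Each of the two lines below prints "hello" three times
demo :: IO ()
demo = do
  repeatN 3 (putStrLn "hello")
  replicateM_ 3 (putStrLn "hello")
```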

Using let
---------

So far, for local values in a function, we have used where.  For
example:

  mergesort l = merge (mergesort left) (mergesort right)
    where
      n = (length l) `div` 2
      left = take n l
      right = drop n l

Dually, we can put the local definitions before the expression
that uses them, using let, as follows:

  mergesort l =
    let
      n = (length l) `div` 2
      left = take n l
      right = drop n l
    in
      merge (mergesort left) (mergesort right)
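
For reference, here is a sketch of a complete mergesort written
with let.  The base cases and the definition of merge are not
shown in this section, so the versions below are assumptions about
how they would typically be written.

```haskell
-- merge combines two sorted lists into one sorted list
-- (assumed definition, not taken from this lecture)
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

mergesort :: Ord a => [a] -> [a]
mergesort []  = []        -- base cases stop the recursion
mergesort [x] = [x]
mergesort l =
  let
    n = length l `div` 2
    left = take n l
    right = drop n l
  in
    merge (mergesort left) (mergesort right)
```

Note that let ... in ... is an expression, so it appears on the
right hand side of the definition, whereas where is attached to
the definition as a whole.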
 
Inside a do block, we can use a variation of let, without "in",
to reuse the return value of a function.

  do 
    line <- getLine
    let revline = reverse line
    putStr revline

In other words, <- allows us to "remember" the return value of an
action and "let" allows us to "remember" the return value of a
function. 
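
The two kinds of binding can be seen side by side in the sketch
below.  It reads from a file rather than from the keyboard, and
the name reverseLines is made up for this example.

```haskell
-- reverseLines is a hypothetical function used only for illustration
reverseLines :: FilePath -> IO [String]
reverseLines path = do
  contents <- readFile path                    -- <- names the result of an action
  let revlines = map reverse (lines contents)  -- let names the value of an expression
  return revlines
```

Writing revlines <- map reverse (lines contents) would be a type
error here, since map reverse (lines contents) is a list of
strings and not an IO action.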

Using the Haskell compiler
--------------------------

One of the standard Haskell compilers is the Glasgow Haskell
Compiler, which can be invoked using the command ghc.

When you use an interpreter, you interact directly and can choose
the function you want to evaluate.  A compiled program runs
autonomously, so there has to be an unambiguous way of specifying
where the computation should start.  Like many other languages,
Haskell expects computation to start with an action called main,
of type IO (), located in a module Main.

One useful way to organize Haskell code is to put the actual code
in a separate module and use main in module Main to just call the
relevant function and print out its result using the builtin
function print:

   print :: Show a => a -> IO ()

For instance, suppose all our code is in a module called MyModule
and the function to be invoked in MyModule is mymainfunction.
Then, the module Main would look like the following:

  module Main where
  import MyModule
  main = print (mymainfunction)

How do we actually compile the file?  The command is

  ghc --make Main.hs -o outputfilename

In this command, ghc is the name of the compiler while Main.hs is
the file containing the module Main.  The flag "--make" tells ghc
to look up and compile all modules referred to and required by
Main.hs.  The flag "-o" is used to specify the name of the final
executable.  If this is left out, ghc chooses a default name.
Depending on the version of ghc, this is either a.out or a name
derived from the file containing Main (here, Main).