Introduction to Programming, Aug-Dec 2008
Lecture 11, Wednesday 17 Sep 2008

User defined datatypes
----------------------

A datatype is a collection of values with a collective name. For instance, the datatype Int consists of the values {...,-2,-1,0,1,2,...} while the datatype Bool consists of the values {False,True}.

Datatypes can have structure, and may be polymorphic --- for example, tuples. Datatypes can also be recursively defined and hence of unbounded size --- for example, lists.

In Haskell, we can extend the set of built-in types using the data statement.

Enumerated datatypes
--------------------

The simplest form of datatype is one consisting of a finite set of values. We can define such a type using the "data" statement, as follows.

  data Day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

Having introduced this new type, we can directly use it in functions such as:

  weekday :: Day -> Bool
  weekday Sun = False
  weekday Sat = False
  weekday _   = True

We can also write a function "nextday".

  nextday :: Day -> Day
  nextday Sun = Mon
  nextday Mon = Tue
  ...
  nextday Fri = Sat
  nextday Sat = Sun

What happens if we ask Haskell to evaluate "nextday Fri"? Haskell can compute the answer, Sat, but it does not know how to display it, so we get the message

  ERROR - Cannot find "show" function for:
  *** Expression : nextday Fri
  *** Of type    : Day

Similarly, if we ask whether "Tue == Wed", the response is

  ERROR - Cannot infer instance
  *** Instance   : Eq Day
  *** Expression : Tue == Wed

The problem is that we have not associated the new datatype with any type classes, including the most basic ones such as Eq and Show. One way to do this is to write our own instance declarations.

  instance Eq Day where
    Sun == Sun = True
    Mon == Mon = True
    ...
    Sat == Sat = True
    _   == _   = False

  instance Show Day where
    show Sun = "Sun"
    show Mon = "Mon"
    ...
    show Sat = "Sat"

These are the most natural definitions for Eq and Show --- each value is distinct and equal only to itself, and each value is displayed in the same way it is defined.
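For experimenting in an interpreter, here is a self-contained sketch of the Day example in which every case that the notes abbreviate as "..." is written out. It is assembled from the definitions above; weekday is taken to return True exactly for Mon to Fri.

```haskell
-- The Day type, with no derived instances.
data Day = Sun | Mon | Tue | Wed | Thu | Fri | Sat

-- Hand-written Eq instance: each value is equal only to itself.
instance Eq Day where
  Sun == Sun = True
  Mon == Mon = True
  Tue == Tue = True
  Wed == Wed = True
  Thu == Thu = True
  Fri == Fri = True
  Sat == Sat = True
  _   == _   = False

-- Hand-written Show instance: display each value as its constructor name.
instance Show Day where
  show Sun = "Sun"
  show Mon = "Mon"
  show Tue = "Tue"
  show Wed = "Wed"
  show Thu = "Thu"
  show Fri = "Fri"
  show Sat = "Sat"

-- True for Mon..Fri, False for the weekend days Sun and Sat.
weekday :: Day -> Bool
weekday Sun = False
weekday Sat = False
weekday _   = True

-- The following day, wrapping around from Sat to Sun.
nextday :: Day -> Day
nextday Sun = Mon
nextday Mon = Tue
nextday Tue = Wed
nextday Wed = Thu
nextday Thu = Fri
nextday Fri = Sat
nextday Sat = Sun
```

With these instances in scope, "nextday Fri" displays as Sat and "Tue == Wed" evaluates to False instead of producing the errors shown above.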
To make things easier, we can include these "default" instance definitions for Eq and Show using the word "deriving" as follows:

  data Day = Sun | Mon | Tue | Wed | Thu | Fri | Sat
             deriving (Eq, Show)

In the same way, we can derive an instance definition for Ord --- the default definition would order the values in the sequence that they are presented, namely Sun < Mon < ... < Sat.

Note that the built-in datatype Bool can be thought of as defined in this way:

  data Bool = False | True
              deriving (Eq, Ord, Show)

In fact, even Char and Int (whose range is effectively finite because we use a fixed number of bits to represent an Int) can be thought of as defined in the same way.

Datatypes with parameters
-------------------------

We can go beyond finite enumerated types and describe datatypes whose values carry parameters, as in the following example.

  data Shape = Square Float | Circle Float | Rectangle Float Float
               deriving (Eq, Ord, Show)

  area :: Shape -> Float
  area (Square x)      = x*x
  area (Circle r)      = pi*r*r     -- pi is predefined in the Prelude
  area (Rectangle l w) = l*w

Each variant of Shape has a constructor --- Square, Circle or Rectangle. Each constructor takes its own number of arguments: Square and Circle take one Float each, while Rectangle takes two. The values Sun, Mon etc in the type Day are also constructors, with zero arguments.

What happens when we derive Eq for Shape? This ensures that (Square x) is equal to (Square y) provided x == y, but (Square x) is never equal to (Circle y), etc. When we derive Ord, we have Square < Circle < Rectangle, so (Square x) < (Circle y) for all x and y, (Circle z) < (Circle w) if z < w, etc.

Polymorphic datatypes
---------------------

We can extend our definition of Shape so that the parameter can be of any numeric type, not just Float. Here is the corresponding definition. Note that we need to include the type parameter a in the name of the type --- the datatype is "Shape a", not just "Shape". The requirement that a be numeric is expressed by the constraint Num a on the functions, such as size, that do arithmetic on the values. (Haskell 98 also allowed the constraint to be written in the data declaration itself, as in data Num a => Shape a = ..., but such datatype contexts were removed in Haskell 2010.)
  data Shape a = Square a | Circle a | Rectangle a a
                 deriving (Eq, Ord, Show)

  size :: Num a => Shape a -> a
  size (Square x)      = x
  size (Circle r)      = r
  size (Rectangle l w) = l*w

Recursive datatypes
-------------------

We can have recursive datatypes. Here is an example.

  data Mylist = Empty | Listof Int Mylist

Here the constructors are Empty and Listof. Empty has zero arguments and is hence a constant, representing the base case of the recursive type. The constructor Listof combines a value of type Int with a nested instance of Mylist. Here is a value of type Mylist corresponding to the list [1,3,2].

  Listof 1 (Listof 3 (Listof 2 Empty))

In Haskell's builtin definition of lists, Empty is written as [] and Listof is written as an infix constructor ":", so the value above becomes the more familiar

  1 : (3 : (2 : []))

from which we can eliminate the brackets using the right associativity of ":", giving 1 : 3 : 2 : [].

It is a small step to extend Mylist to be polymorphic.

  data Mylist a = Empty | Listof a (Mylist a)

Now, a term that uses the constructor Listof has a value of type "a" and a nested list of the same type. Note again that the full name of the type is "Mylist a", not just "Mylist".

If we change the definition slightly, we get a version of lists where each new element is appended to the right, rather than the left.

  data Mylist a = Empty | Listof (Mylist a) a

In this representation, a list such as [1,3,2] would be written as

  Listof (Listof (Listof Empty 1) 3) 2

For inductively defined types, we can write inductive functions to process them. Just as for builtin lists, we can use pattern matching to decompose a value into its parts. For instance, here is a definition of length corresponding to the last definition of Mylist a. We call it mylength, because the name length is already used by the builtin list function in the Prelude.

  mylength :: Mylist a -> Int
  mylength Empty        = 0
  mylength (Listof l x) = 1 + mylength l

The brackets around (Listof l x) are essential: without them, the equation would be read as defining a function of three arguments.

To illustrate the role played by the type variables in the definition of an inductive datatype, let us consider an example of a polymorphic type that uses multiple type variables.
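Before turning to that example, the single-parameter definitions above can be checked together in one file. This minimal sketch combines the polymorphic Shape a (with the Num constraint on size rather than on the data declaration) and the right-growing version of Mylist a. The helper toBuiltin, which converts a Mylist to a builtin list, and the value example are hypothetical additions, not part of the notes.

```haskell
-- Polymorphic Shape; derived Ord orders Square < Circle < Rectangle.
data Shape a = Square a | Circle a | Rectangle a a
  deriving (Eq, Ord, Show)

-- The Num constraint is needed because size multiplies for Rectangle.
size :: Num a => Shape a -> a
size (Square x)      = x
size (Circle r)      = r
size (Rectangle l w) = l * w

-- Lists that grow to the right: the last element is the outermost one.
data Mylist a = Empty | Listof (Mylist a) a
  deriving Show

mylength :: Mylist a -> Int
mylength Empty        = 0
mylength (Listof l _) = 1 + mylength l

-- Hypothetical helper: convert to a builtin list, keeping element order.
toBuiltin :: Mylist a -> [a]
toBuiltin Empty        = []
toBuiltin (Listof l x) = toBuiltin l ++ [x]

-- Hypothetical value: [1,3,2] in the right-growing representation.
example :: Mylist Int
example = Listof (Listof (Listof Empty 1) 3) 2
```

Evaluating mylength example gives 3 and toBuiltin example gives [1,3,2], while (Square 10 :: Shape Int) < Circle 1 is True, as described for the derived Ord instance.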
Suppose we want to define lists that contain elements of types a and b, such that values of types a and b alternate in the list, beginning with a value of type a. There is no restriction on the last value --- if the list has an odd number of elements, the last value is of type a, otherwise it is of type b. Such a list will look like

  [x_1,y_1,x_2,y_2,....,x_m,y_m]

where each x_i is of type a and each y_i is of type b (with one more x at the end if the length is odd). Notice that if we strip off x_1, the remaining list is of the form

  [y_1,x_2,y_2,....,x_m,y_m]

This is again a list in which values of type a and b alternate, except that the first value is of type b. This observation leads us to the following definition.

  data Twolist a b = Empty | Listof a (Twolist b a)

Notice that within Listof, the inductive call to Twolist inverts the order of the type variables. Thus, after a value of type a, we have a list that has alternate values starting with b. The next unfolding of the inductive definition inverts the types again, so the value after that b is once more of type a, and so on.

======================================================================

Organizing functions in modules
-------------------------------

For small programs, it is acceptable to write all definitions in a single file, together with all the definitions they depend on. However, as programs grow in size, it is desirable to break them up into separate units for the following reasons:

1. The functions defined in one unit may be useful in many contexts. For instance, if we define quicksort and save it as a separate unit, we should be able to include it automatically in another set of functions without rewriting the definition of quicksort.

2. Keeping functions in separate units makes it easier to maintain the programs. Finished portions need not be touched while editing definitions still under development, thus avoiding unintended modifications to definitions that are already correct and complete.

3. By separating out functions, the interdependence of functions on each other is more clearly specified. In particular, we can identify exactly what "interface" each function provides to the rest of the world. Provided we do not change this "interface", we can reimplement the actual function without affecting the correctness of the overall code.

   For instance, we might organize a unit containing a function "sort" to sort lists. Initially, we may have implemented "sort" using insertion sort. At a later date, if we replace the insertion sort implementation by a better algorithm, such as quicksort, the rest of the code is not affected.

The mechanism for collecting Haskell functions in a reusable unit is to declare them as a module. For simplicity, Haskell implementations such as Hugs and GHC require each module to be in a separate file whose name matches the module name. Thus, we can make a unit consisting of quicksort and mergesort as follows:

  module Sortfunctions where

  quicksort :: (Ord a) => [a] -> [a]
  quicksort []     = []
  quicksort (x:xs) = (quicksort lower) ++ [splitter] ++ (quicksort upper)
    where
      splitter = x
      lower    = [ y | y <- xs, y <= x ]
      upper    = [ y | y <- xs, y > x ]

  mergesort :: (Ord a) => [a] -> [a]
  mergesort []  = []
  mergesort [x] = [x]
  mergesort l   = merge (mergesort (front l)) (mergesort (back l))
    where
      front l = take ((length l) `div` 2) l
      back l  = drop ((length l) `div` 2) l
      merge [] ys = ys
      merge xs [] = xs
      merge (x:xs) (y:ys)
        | x < y     = x:(merge xs (y:ys))
        | otherwise = y:(merge (x:xs) ys)

These definitions should be stored in a file called "Sortfunctions.hs", to match the module name. Notice that other than adding an initial line

  module Sortfunctions where

we have not changed the definitions of quicksort and mergesort in any way.

We can now invoke this module in another file as follows:

  import Sortfunctions
  ...

After "import Sortfunctions", we can freely use quicksort and mergesort.
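To try the sorting functions without setting up two files, the definitions can simply be pasted into one file. The sketch below does that and also illustrates reason 3 above: a hypothetical insertionSort (not part of the notes) stands in for the "initial" implementation, and a function sort plays the role of the interface that callers rely on. The names insertionSort and sort here are our own choices for illustration.

```haskell
-- quicksort and mergesort as in Sortfunctions, in a single file.
quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (x:xs) = quicksort lower ++ [x] ++ quicksort upper
  where
    lower = [y | y <- xs, y <= x]
    upper = [y | y <- xs, y > x]

mergesort :: Ord a => [a] -> [a]
mergesort []  = []
mergesort [x] = [x]
mergesort l   = merge (mergesort front) (mergesort back)
  where
    half  = length l `div` 2
    front = take half l
    back  = drop half l
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys)
      | x < y     = x : merge xs (y:ys)
      | otherwise = y : merge (x:xs) ys

-- Hypothetical: insertion sort, the "initial" implementation of sort.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []
  where
    insert y [] = [y]
    insert y (z:zs)
      | y <= z    = y : z : zs
      | otherwise = z : insert y zs

-- The interface callers use. Changing the right-hand side to
-- insertionSort or mergesort would not change any caller's results.
sort :: Ord a => [a] -> [a]
sort = quicksort
```

Since all three implementations return the same sorted list for every input, code that only calls sort is unaffected when the definition behind it changes.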
The file that invokes "import Sortfunctions" need not be a module --- it can be a simple Haskell file that has some additional function definitions, which can freely use mergesort and quicksort.

We have seen an example of invoking modules when we used the functions "ord" and "chr" for the Char type, which required importing the module Char (called Data.Char in Haskell 2010).

Sometimes, we may not want to import all functions from a module. For instance, suppose we want to use only quicksort from Sortfunctions and write our own mergesort. We can then say

  import Sortfunctions hiding (mergesort)

If we did not hide mergesort, we would have to use a different name for the new implementation, because otherwise the name mergesort would refer to two different definitions and every use of it would be ambiguous.

The builtin functions in Haskell (e.g. take, drop, max, etc) are defined in the Standard Prelude, which is implemented as a module called Prelude. This module is imported implicitly in every Haskell file. However, it is possible to explicitly import Prelude and hide some of the builtin functions in case one wants to rewrite these functions. For instance, if we wanted to write different definitions for take and drop, we could initially write

  import Prelude hiding (take,drop)

Symmetrically, it may be desirable to restrict what is visible outside a module. Suppose we use an auxiliary function in a module to define the main function. We may not want this auxiliary function to be visible outside. To restrict the list of functions that a module exports, we write the list of exported functions in the module header line, as follows.

  module Sortfunctions(quicksort,mergesort) where

This line specifies that among all the functions that may be defined in the module Sortfunctions, only quicksort and mergesort are visible to any file that imports this module.

======================================================================
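As a sketch of the "import Prelude hiding (take,drop)" line discussed above, here is a single file that hides the builtin take and drop and supplies its own versions. These replacement definitions are our own illustration, written to agree with the builtin functions on ordinary inputs (a count of zero or less takes nothing and drops nothing).

```haskell
import Prelude hiding (take, drop)

-- Our own take: because the Prelude version is hidden, this name
-- does not clash with the builtin one.
take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take _ []         = []
take n (x:xs)     = x : take (n - 1) xs

-- Our own drop, written in the same style.
drop :: Int -> [a] -> [a]
drop n xs | n <= 0 = xs
drop _ []          = []
drop n (_:xs)      = drop (n - 1) xs
```

Without the hiding clause, every use of take or drop in this file would be ambiguous between the Prelude versions and these definitions.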