The abstract datatype Set
-------------------------


A collection of elements without repetitions and without any
prespecified order is a Set.

Any implementation of the Set should be compatible with the following operations

Dictionary operations:

>  empty   :: Set a
>  isempty :: Set a -> Bool
>  member  :: Set a -> a -> Bool
>  insert  :: Set a -> a -> Set a
>  delete  :: Set a -> a -> Set a

Set operations:

>  union     :: Set a -> Set a -> Set a
>  intersect :: Set a -> Set a -> Set a
>  setdiff   :: Set a -> Set a -> Set a






Possible implementations:

1. (Sets as lists with elemnts repeated)

   The first possibility is to represent a set as a lists,
   possibly with elements repeated.  We use the definition:
  
   data Set a = Setof [a]


   Eg  Set A = {1,2,3,4}
       list = Setof A 
       A = [1,1,1,2,3,2,4,3,4]
   Note that size of list representing the set is not necessarily the size
   of the Set 
   If the size of the set is n, the size of the representation N,
   may be much larger than n, because of repetitions.  



   With this definition, the dictionary operations can be defined
   as follows.

>   empty              = Setof []
>   isempty Setof l    = l == []
>   member (Setof l) x = elem x l
>   insert (Setof l) x = Setof (x:l)
>   delete (Setof l) x = Setof (filter (/= x) l)

   (or, alternatively,)

   delete (Setof l) x = Setof [y | y <- l, y /= x]


   Complexity of dictionary operations:

   empty   : O(1)
   isempty : O(1)
   member  : O(N)
   insert  : O(1)
   delete  : O(N)

   Other set operations:

>   union     (Setof xs) (Setof ys) = (Setof xs++ys)
>   intersect (Setof xs) (Setof ys) = [y | y <- ys, elem y xs]
>   setdiff   (Setof xs) (Setof ys) = [x | x <- xs, not (elem x ys)]



   Let M and N be the size of the sets represented by Setof xs
   and Setof ys, respectively.(In other words the size of the lists holding the    values of the sets) 

  Then, Complexity of Set Operations

  union     : O(M+N)
  intersect : O(MN)
  setdiff   : O(MN)


2. To reduce the size of the list representation to be equal to
   the size of the set being represented, we can inductively
   maintain the set as a list without repetitions. 
   Then the size of the set is neccessarily the size of the representation

   The data definition remains the same:

   data Set a = Setof [a]

   Under the assumption that the list has no repetitions, the
   dictionary operations can be defined as follows.

>   empty              = Setof []
>   isempty Setof l    = l == []
>   member (Setof l) x = elem x l
>   insert (Setof l) x = Setof (x:(filter (/= x) l)) -- Insert x and delete repetitions 
>   delete (Setof l) x = Setof (filter (/= x) l)     -- Delete all occurences of x

   The only change is in the definition of insert, where we have
   ensured that all existing copies of x are removed before we
   put in the new value.

   Now, if the size of the set is n, 
   Complexity of the dictionary operations is given by:

>   empty   : O(1)
>   isempty : O(1)
>   member  : O(n)
>   insert  : O(n) -- Complexity changes from O(1) -> O(n)
>   delete  : O(n)

   For the other set operations, intersect and setdiff can be
   retained as in the earlier representation.  When computing the
   union, we have to removed duplicates.  One way is to
   systematically insert all the values from the second set into
   the first set, making use of the fact that insert handles the
   problem of filtering out duplicates.

>   union (Setof xs) (Setof [])    = Setof xs
>   union (Setof xs) (Setof (y:ys) = union (insert (Setof xs) y)
>                                          (Setof ys)
 
   With this definition, the union of two sets of size m and n
   takes time mn, as do intersect and setdiff.
   

3. Sorted List:

   If the values being stored in the set can be compared, we can
   maintain the set as a sorted list without repetitions.

>   data (Ord a) => Set a = Setof [a]

>   empty              = Setof []
>   isempty Setof l    = l == []
>   member (Setof l) x = elem x l
>   insert (Setof l) x = Setof (insert x l) -- Insert is overloaded (I think ...)
>   delete (Setof l) x = Setof (filter (/= x) l)

   Here, the insert function on sets calls the insert function on
   sorted lists that we have seen in the definition of insertion
   sort (with minor modifications) .
   We do not gain anything in terms of complexity for
   dictionary operations.  We still have:

>   empty   : O(1)
>   isempty : O(1)
>   member  : O(n)
>   insert  : O(n)
>   delete  : O(n)

   However, we can now use the fact that we can merge two sorted
   lists in linear time to write more efficient implementations
   of the other set operations.

>   union (Setof xs) (Setof ys) = Setof (unionmerge xs ys)
>     where
>       unionmerge [] ys = ys
>       unionmerge xs [] = xs
>       unionmerge (x:xs) (y:ys)
>         | x < y     = x:(unionmerge xs (y:ys))
>         | y < x     = y:(unionmerge (x:xs) ys)
>         | otherwise = x:(unionmerge xs ys)
 
   union : O(m+n) m,n = length of lists 

>   intersect (Setof xs) (Setof ys) = Setof (intersectmerge xs ys)
>     where
>       intersectmerge [] ys = []
>       intersectmerge xs [] = []
>       intersectmerge (x:xs) (y:ys)
>         | x < y     = (intersectmerge xs (y:ys))
>         | y < x     = (intersectmerge (x:xs) ys)
>         | otherwise = x:(intersectmerge xs ys)

   intersect : O(m+n) m,n = length of lists 
   
>   setdiff (Setof xs) (Setof ys) = Setof (setdiffmerge xs ys)
>     where
>       setdiffmerge [] ys = []
>       setdiffmerge xs [] = xs
>       intersectmerge (x:xs) (y:ys)
>         | x < y     = x:(intersectmerge xs (y:ys))
>         | y < x     = (intersectmerge (x:xs) ys)
>         | otherwise = (intersectmerge xs ys)
  
   setdiff : O(m+n) m,n = length of lists 
  

   Merging two lists of length m and n takes time m+n, so each of
   union, intersect and setdiff  now takes time m+n.
   
4. Balanced Search Tree

   If the elements of the set can be compared with each other, we
   can move from a sorted list representation to a balanced
   search tree representation.

>   data (Ord a) => Set a = Setof STree a

   Recall that to maintain balanced search trees, we store the
   height of a node along with the value so we have

>   data (Ord a) => STree a = Nil | Node Int (STree a) a (STree a)

   In this reprentation, it is clear that the dictionary
   operations member, insert and delete take time O(log n) for a
   set of size n, thus beating the other representations by a
   long way.

   How about union, intersection and setdiff?

   The naive implementation of union would be to insert each
   element of the second set in the first, or vice versa, yielding
   an algorithm with complexity min(n log m, m log n).   To
   implement this, we can first flatten out one of the sets into
   a list using inorder, preorder or postorder traversal and then
   systematically run through the elements of this list.

   Can we do better?  We know that we can perform unionmerge,
   intersectmerge and setdiffmerge on two sorted lists in linear
   time.  We can obtain a sorted list from a (balanced) search
   tree using inorder.  We can also write a function mkbtree to
   construct a balanced search tree from a sorted list.  If
   inorder and mkbtree work in linear time, we can implement
   union, intersect and setdiff in linear time in the balanced
   search tree representation.

   Let us start with analyzing inorder:

>   inorder :: (STree a) -> [a]
>   inorder Nil = []
>   inorder (Node m t1 x t2) = (inorder t1)++[x]++(inorder t2)

   If t is balanced, the left and right subtrees t1 and t2 are
   half the size of the main tree.  Let T(n) denote the time
   required to generate the inorder traversal of a balanced tree
   with n nodes.  The time taken to combine the two recursive
   inorder traversals is proportional to the length of (inorder
   t1) because ++ takes time proportional to the length of its
   left argument.  Thus, we have

     T(n) = 2 T(n/2) + O(n)

   We have seen this recurrence before (e.g., mergesort) and the
   solution is T(n) = O(n log n), which is larger than the O(n)
   solution that we seek.

   The problem arises because Haskell lists are built up right to
   left, so it is inefficient to write a function that build a
   list left to right.  An example of this is the naive reverse
   function that we have seen earlier, which takes quadratic
   time.

     reverse [] = []
     reverse (x:xs) = (reverse xs) ++ [x]
   
   As with reverse, the key to making inorder more efficient is
   to use an auxiliary parameter.  Let us define a new function

>     inorderaux :: (STree a) -> [a] -> [a]

   such that

     inorderaux t l 

   yields

     (inorder t)++l

   We can then recover inorder t as inorderaux t [].

   Here is a definition of inorderaux.

>     inorderaux Nil l = l
>     inorderaux (Node t1 x t2) l =  inorderaux t1 (x:(inorderaux t2 l))
         
      Eg 
                2               1
       in      / \  []    in   / \  (2:(in  3  [])) = in   1  (2:[3])  = [1,2,3]
              1   3    ->     n   n        / \            / \  
             / \ / \                      n   n          n   n
            N  N N  N

          where

        in   3   []  = in n (3:(in n []) = in n (3:[]) = in n [3] = [3] 
            / \ 
           n   n

   
   In other words, we compute the inorder traversal of Node t1 x
   t2 from right to left, first placing inorder t2 to the left of
   l, then adding x on the left of the resulting list and finally
   placing inorder t1 at the leftmost position.

   The complexity of inorderaux is given by:

      T(n) = 2 T(n/2) + O(1)

   The crucial feature is that we need only constant time to
   combine the two recursive computations of size n/2.  For this
   recurrence, the solution is T(n) = O(n), which is what we
   seek.

   The next function that we have to achieve in linear time is to
   mkbtree, that constructs a balanced search tree from a sorted
   list.  We need not insist that the input list be sorted.  We
   shall write mkbtree in such a way that the output of mkbtree
   is a balanced tree and

>       inorder (mkbtree l) == l

   If l is sorted, this ensures that mkbtree generates a search
   tree.

   The naive way to make a balanced tree from a list is to use
   the centre element of the list as the root and recursively
   construct balanced left and right subtrees from the first and
   second halves of the list:

>       mkbtree :: [a] -> (STree a)

>       mkbtree [] = Nil
>       mkbtree [x] = Node Nil x Nil
>       mkbtree l =  Node (mkbtree left) root (mkbtree right)
>         where
>           m = (length l) `div` 2
>           root == l!!m
>           left = take m l
>           right = drop (m+1) l
   
   The complexity of mkbtree is given by the following
   recurrence:
  

     T(n) = 2 T(n/2) + O(n)

   The O(n) factor comes because it takes linear time to compute
   the midpoint of the input list and break it up into two
   halves.



   To do better, we need a more sophisticated version of the
   trick we used to improve the efficiency of inorder.   This
   time, rather than constructing a list as output, our input is
   a list.  For optimum efficiency, we need to process the input
   from left to right.  To achieve this, we write a function

>     mkbtreeaux :: [a] -> Int -> (STree a, Int)

   whose behaviour is as follows:

     mkbtreeaux l n = (t,lrest)
 
   where t is a balanced tree made up from the first n elements
   of l and lrest is the unused part of l (i.e., drop n l)

   Here is how we define mkbtreeaux:

>     mkbtreeaux [] n = (Nil, [])
>     mkbtreeaux l  0 = (Nil, l)
>     mkbtreeaux l  n = (Node t1 root t2, l2)
>       where
>       m = n `div` 2
>       (t1,(root:rest)) = mkbtreeaux  l m 
>       (t2,l2) = mkbtreeaux rest (n - (m+1)) 


   As before, we observe that to construct a balanced tree from
   the first n elements of l, we need to select the midpoint of
   the n elements as the root and inductively make balanced left
   and right subtrees from the left and right halves.  However,
   instead of explicitly finding the midpoint and breaking up the
   list into two parts, we build up the tree from left to right
   using the same function.

   Pictorially, we have
                   n
   <------------------------------------>
                            l
   |----------------------------------------------------------|
          m                             rest
   <---------------> root |-----------------------------------|
                            n-(m+1)
     becomes t1           <------------>
                                                  l2
                           becomes t2   |---------------------|

   For this function, the time complexity is given by

      T(n) = 2 T(n/2) + O(1)

   which yields T(n) = O(n), as required.

   Thus, we can now implement union, intersect and setdiff for
   the balanced search tree representation of sets in linear time
   as a sequence of three operations:

   1. inorder => generates  sorted lists from the sets in linear time
   2. appropriate merge => combines sorted lists in linear time
   3. mkbtree => reconstructs balanced search tree in linear time
  
