Introduction to Programming, Aug-Dec 2006 Lecture 21, Tuesday 14 Nov 2006 Lecture 22, Thursday 16 Nov 2006 Memoization and Dynamic Programming ----------------------------------- Last time, we saw how we could make the computation of an inductive definition with overlapping subcomputations more efficient by "memoizing" intermediate results. Today we look at some more examples and also study a related idea from the theory of algorithms called dynamic programming. Pinball game ------------ A board has obstacles arranged in a triangle, as follows. () / \ () () / \ / \ () () () / \ / \ / \ () () () () / \ / \ / \ / \ () () () () () Each obstacle has a number of points associated with it. For instance, we could have the following assignment of points to the obstacles in the board above. 15 / \ 28 33 / \ / \ 18 22 16 / \ / \ / \ 35 15 11 17 / \ / \ / \ / \ 29 13 14 26 12 When we drop a ball on the topmost obstacle, it bounces off and goes down, either left or right. Depending on which way it goes, it bounces off an obstacle at the next level, and again gets deflected left or right. This continues till it reaches the bottom row of obstacles. When the ball hits the final obstacle on its run, we add up the points of all the obstacles that it has collided against to obtain our score for the game. With sufficient skill, it is possible to control the ball as it moves down the board and select the path along with it travels. The aim is to choose a path that maximizes the score. For the moment, let us concentrate on calculating the maximum score that one can obtain on a given board (note that there may be more than one way to achieve this maximum score). Later, we will come back to the question of determining a path that actually yields this maximum score. First, we need a way to identify each obstacle unambiguously. We can assign a pair of coordinates (i,j) to each obstacle where i in {1,2,...,n} is the row in which the obstacle lies and j in {1,2,...,i} is the position of the obstacle within that row, from left to right. Let points(i,j) be the points assigned to the obstacle at location (i,j). If we rearrange the triangle a bit, we have the following arrangement. | 1 2 3 4 5 ---> j ---+---------------------------- 1 | 15 | 2 | 28 33 | 3 | 18 22 16 | 4 | 35 15 11 17 | 5 | 29 13 14 26 12 | | v | i | We define score(i,j) to be the maximum score that we can achieve if the ball starts its path at the obstacle labelled (i,j). The quantity we want to calculate is given by score(1,1), corresponding to the obstacle at the top of the triangle. There is a natural inductive definition of score(i,j). After bouncing of (i,j), the ball will next go to either (i+1,j), if it goes left, or (i+1,j+1), if it goes right. If we already know the scores at these two positions, we can computute score(i,j) as follows: score(i,j) = points(i,j) + max [ score(i+1,j), score(i+1,j+1) ] The base case is when i = n, corresponding to the last row. For all j in {1,2,...,n} we have score(n,j) = points(n,j) Computing score(i,j) naively would result in recomputing intermediate values. For instance, both score(3,2) and score(3,3) would require the value of score(4,3). Through memoiziation, we can avoid wasteful recomputation, just as we saw in the fibonacci example. Dynamic programming ------------------- Another way to approach the problem of overlapping subcomputations is to examine the order in which values are required. In each inductive definition, we have a base case for which the value is immediately available. The inductive definition specifies a relationship between a value and its "neighbours". Once we have calculated the base cases, we can compute the positions that lie in the neighbourhood of these base cases. In this way, we can systematically "grow" the set of known values, ensuring at each stage that we already have at hand the values that we need to compute the current value. For example, in the fibonacci case, the base cases are "fib 0" and "fib 1". Having calculated these, we can immediately calculate "fib 2", since the values it depends on are both known. With "fib 2" in hand, we can compute "fib 3". In this way, we proceed to "fib 4", "fib 5", ... We can regard this as filling up the memo table systematically for n = 0,1,2,... Notice that this corresponds to the natural way in which we enumerate the fibonnacci sequence as 1 1 2 3 5 ... Consider our second problem, the pinball game. The base cases in this computation are the values score(n,j), for j in {1,2,...,n}. With this in hand, we can immediately compute the values score(n-1,k), k in {1,2,...,n-1} since both neighbours of the obstacle at (n-1,k) lie in the range of values that are already known. Having calculated row n-1, we can proceed back to row n-2, n-3, ... till we reach the top row. This technique is called dynamic programming. In dynamic programming, we systematically compute the entries of the memo table "bottom up" by analyzing the order in which values would be computed had we used the normal "top down" inductive definition. In a sense, both approaches are equivalent, because we actually compute each value only once. However, in practice, dynamic programming is more efficient because we avoid the overhead associated with keeping many function calls pending while we move down to the base case. Expression evaluation --------------------- Consider infix arithmetic expressions over integers using the operators + and * without parentheses and without any assumptions about the order in which to evaluate subexpressions. Thus, an expression such as 6*3+2*5 may be evaluated as (6*3)+(2*5) = 28 or 6*((3+2)*5) = 150 or ((6*3)+2)*5 = 100, depending on the order of evaluation. Our aim is to find a way of bracketing the expression so as to achieve the maximum overall value. As with the pinball game, for the moment we disregard the problem of finding the actual bracketing and concentrate on calculating the best possible value we can achieve with this expression. For simplicity, let us assume that the numbers in the expression are all single digits. If we number the positions in the input expression from 1, every odd position is an integer and every even position is an arithmetic operator. When we bracket an expression, we explicitly fix an order in which subexpressions are evaluated. Starting with the initial expression, we combine values in pairs, using one of the operators, and keep reducing it till we end up with a single number. Let us focus on what happens in the last step. Consider the expression we saw earlier, 6*3+2*5. The last step could involve any of the three operators. We have no reason to believe, a priori, that one of these three is better than the others, so we have to consider all of them: Case 1: The first occurrence of * is the last operator applied. Then, prior to this, we have optimally evaluated subexpressions 6 and 3+2*5. Case 2: The + is the last operator applied. Then, prior to this, we have optimally evaluated subexpressions 6*3 and 2*5. Case 3: The second occurrence of * is the last operator applied. Then, prior to this, we have optimally evaluated subexpressions 6*3+2 and 5. In each case, the best value we can achieve by applying the last operator is obtained by maximizing the values we obtain from the corresponding subexpressions. (This is a consequence of the choice of operators + and *, which grow monotonically in both their arguments. What would happen if we allowed the operator - in our expression?) The subexpression we have to evaluate is a part of the original expression. The subexpressions generated by subcomputations could overlap. For instance, in Case 2, we have to compute the values of 6*3 and 2*5. The case 6*3 will also occur one step later in Case 3, while 2*5 will occur one step later in Case 1. Hence, it makes sense to memoize these values. How do we identify subexpressions unambiguously, to ensure that we can recognize when we have (or have not) yet computed the value for a given subexpression? Notice that each subexpression is actually a substring of the original expression. Thus, we can identify it by noting its starting and ending points. Thus, in general, we want to compute a function maxval(i,j) where i < j are indices in the range 1,2,...,length(e) for a input expression e. Moreover, assuming that all the integers in the expression are single digit numbers, we have already observed that a subexpression will start and end at an odd index. We can write down an inductive defintion for maxval(i,j). maxval(i,j) = max op(k)(maxval(i,k-1),maxval(k+1,j)) k in [i..j] k even Here, op(k) refers to the function corresponding to the operator at position k --- op(k) is multiplication if the symbol is * and addition if the symbol is +. All the arithmetic operators are at even positions, so we restrict ourselves to even numbers k in the range [i..j]. We pick all such k, inductively compute the subexpressions resulting from splitting the expression at k, and take the maximum. The base case is when the expression has length 1, so that we have a single value. For odd positions i in the range [1..length(e)] maxval(i,i) = value(i) where value(i) is the integer value corresponding to the digit at position i. Using memoization, we can calculate the values we need without recomputing any intermediate values. The final answer we want is maxval(1,length(e)). How would we systematically calculate thse values using dynamic programming. If we place the values maxval(i,j) in a two dimensional array, we have the following picture: | 1 3 5 7 9 ---> j ---+---------------------------- 1 | o x x x x | 3 | . o x x x | 5 | . . o x x | 7 | . . . o x | 9 | . . . . o | | v | i | Notice that we only consider odd values for i and j because each subexpression starts and ends at an odd position. Positions marked . correspond to pairs (i,j) where j < i and hence do not denote valid subexpressions. Positions marked "o" correspond to values of the form maxval(i,i) which are the base case. The remaining positions, marked "x" have to be computed using the inductive definition. According to the inductive definition, to compute maxval(i,j) we need to know maxval(i,k), where i < k <= j and maxval(k,j) where i <= k < j. Suppose we are trying to compute maxval(1,7). This requires us to know maxval(1,5), maxval(1,3) and maxval(1,1) along the row i = 1 and maxval(3,7), maxval(5,7) and maxval(7,7) along the column j = 7. Pictorially, we have the following situation, where y marks the value we are trying to compute and z marks the values that are required to compute it. | 1 3 5 7 9 ---> j ---+---------------------------- 1 | z z z y x | 3 | . o x z x | 5 | . . o z x | 7 | . . . z x | 9 | . . . . x | | v | i | Thus, in the array, to compute a value, we need to know all values to its left and all values directly below it. Notice that each value to the left pairs up with a value below it, according to the inductive definition --- for instance, maxval(1,1) goes with maxval(3,7) corresponding to splitting the expression at position 2, maxval(1,3) goes with maxval (5,7) corresponding to splitting the expression at position 4 and maxvao(5,7) goes with maxval(7,7) corresponding to splitting the expression at position 6. As we observed earlier, the base cases are the values on the diagonal. Having computed the values on the diagonal, we can next tackle the values one off the diagonal, of the form maxval(i,i+2), because, for these positions, we know all the values in the same row and column. We can then proceed to the elements two positions off the diagonal of the from maxval(i,i+4) and so on. In the picture below, we describe the order in which values are calculated. The diagonal elements can be calculated (in any order) in the first pass, so they are labelled 1. The off diagonal elements can then be calculated (in any order) in the second pass, so they are labelled 2. Thus, in 5 passes, we can compute the value maxval(1,9) that is four steps off the diagonal. | 1 3 5 7 9 ---> j ---+---------------------------- 1 | 1 2 3 4 5 | 3 | . 1 2 3 4 | 5 | . . 1 2 3 | 7 | . . . 1 2 | 9 | . . . . 1 | | v | i | Edit distance ------------- Suppose we have two strings u = a1 a2 ... an v = b1 b2 .. bm The aim is to measure how "close" u and v are to each other. (An application of this could be in a spelling checker, which might suggest a replacement v from its dictionary for a mistyped string u, depending on how close v is to u). One way is to calculate how much work we need to do to edit u and v to be equal to each other. In each edit step, we can perform one of the following operations on either string. 1. Delete a letter 2. Insert a letter 3. Modify a letter Notice that each of these operations is reversible in one step. Thus, for instance, if we transform u to u' and v to v' so that u' = v', u -----> u' = v -----> v' we can equivalently reverse the operations used to transfrom u to u' to go from v' back to u. In other words, we can simplify the problem by assuming that we always operate on v, leaving u untouched. For instance, all the following pairs are at edit distance one apart: "bash", "ash" --- delete a letter in the second word "art", "cart" --- insert a letter in the second word "cash", "cast" --- modify a letter in the second word As usual, there is no way to "guess" a good sequence of operations to transform v to u, so we have to try all possibilities systematically. Consider the original u and v, where u = a1 a2 ... an v = b1 b2 ... bm If we start at the left end of both words, we begin with a1 and b1. There are two cases: a1 == b1 : In this case, it is clearly sufficient to transform b2..bm into a2...an, so we can pass over this pair and look at the rest of the word a1 /= b1 : Here we have multiple options: we can modify b1 to be the same as a1, we can insert a1 before b1 or we can delete b1 and proceed. These three cases yield the following subproblems: Modify b1 to a1 : Transform b2..bm to a2..an Insert a1 before b1 : Transform b1..bm to a2..an Delete b1 : Transform b2..bm to a1..an In each case, we have to add the operation that we just performed to the edit distance of the subproblem to obtain the overall edit distance. The same analysis holds, in general, when we reach some intermediate stage where we are considering ai a{i+1} ... an and bj b{j+1} ... bm. Let ed(i,j) denote the edit distance of this pair of words. Then, using Haskell like syntax, ed(i,j) | ai == bj = ed(i+1,j+1) | otherwise = 1 + max(ed(i+1,j+1), -- transform bj to ai ed(i+1,j), -- insert ai before bj ed(i,j+1)) -- delete bj The base case is when either word is empty --- that is, when i is n+1 or j is m+1. If i = n+1, the most efficient solution is to delete all the letters in the second word. Symmetrically, if j = m+1, we have to insert each letter corresponding to the first word. Thus: ed(n+1,j) = (m-j+1) -- delete all the letters bj..bm ed(i,m+1) = (n-i+1) -- insert all the letters ai..an Once again, we observe that we need memoization because of overlapping subproblems. For instance, ed(i,j) requires us to compute ed(i+1,j+1) if ai /= bj. However, we also have to compute ed(i,j+1) and one of the subproblems generated by ed(i,j+1) is again ed(i+1,j+1). How about a dynamic programming solution? Once again, we start by writing out the values that we need to compute in a two dimensional array. The base cases correspond to the bottom row (i = n+1) and the rightmost column (j = m+1), marked by x below. | 1 2 3 4 5 6 .... m+1 ---> j ---+---------------------------- 1 | . . . . . . . . x 2 | . . . . . . . . x 3 | . . . . . . . . x 4 | . . . . . . . . x 5 | . . . . . . . . x 6 | . . . . . . . . x . | . . . . . . . . . . | . . . . . . . . . n+1 | x x x x x x . . x | | v | i | To compute a value ed(i,j) we need its three neighbours to the east (ed(i,j+1)), south east (ed(i+1,j+1)) and south (ed(i+1,j)). After computing the base cases, the value ed(n,m) can be calculated since the values of its three neighbours are known. Having done this, we can calculate the rest of row n, right to left and the rest of column m, bottom to top. We then move to the next unknown corner (n-1,m-1) and repeat the process. Thus we have the following picture describing the order in which the values in the memo table should be calculated. | 1 2 3 4 .... m-1 m m+1 ---> j ---+---------------------------- 1 | m+1 m m-1 m-2. . 3 2 1 2 | m m ^ ^ . . ^ ^ 1 3 | m-1 < m-1 | . . | | 1 4 | m-2 <---- m-2. . | | 1 . | . . . . . . | | 1 . | . . . . . . | | 1 n-1 | 3 <------------- 3 | 1 n | 2 <---------------- 2 1 n+1 | 1 1 1 1 1 1 1 1 1 | | v | i | Recovering the witness ---------------------- In the three problems we have seen --- the pinball game, expression evaluation and edit distance --- we have calculated the optimum value without recording how it is obtained. Thus, though we know the best score we can reach in the pinball game, we don't know how to guide the ball to achieve this score. Likewise, though we know the maximum value attainable by the input arithmetic expression, we don't know how to bracket it to achieve such a value. Finally, in the edit distance problem, we would like to recover a sequence of edit operations that matches the edit distance we calculate. Adding this information is just a matter of recording one more item of information at each point in the memo table. Let us start with the pinball game. Recall the inductive definition of score(i,j): score(i,j) = points(i,j) + max [ score(i+1,j), score(i+1,j+1) ] How do we decide whether the ball should go left or right at (i,j)? Clearly, if the maximum in the next row is at (i+1,j), the ball should go left while if the maximum is at (i,j+1), the ball should go right. If the two values are the same, we could go either way. Thus, while computing score(i,j) we can also compute a quantity direction(i,j) which is set to "L" or "R" (we can even add the possibility "{L,R}" if we want to record all posssible optimal paths). Now, direction(1,1) will tell us the first step in the ball's trajectory. If we follow this direction, we reach (1,1) or (1,2). We keep following the path indicated by direction(i,j) as we walk down the board and recover an optimal path. What about bracketing our arithmetic expression to obtain the optimum value? Recall that maxval(i,j) = max op(k)(maxval(i,k-1),maxval(k+1,j)) k in [i..j] k even The value of k that maximizes the right hand side tells us how to break up the expression at the top level. If k' is this maximum value of k, set breakpoint(i,j) = k' This means that the subexpression spanning positions [i..j] is bracketed as ([i..k'-1]) op(k') ([k'+1..j]). We can now look at breakpoint(i,k'-1) and breakpoint(k'+1,j) to find out how to break the left and right subexpressions. As before, we follow these links till we reach a trivial expression of length one. From this collection of breakpoints, we recover the optimal bracketing for the original expression. In a similar vein, we can build up a sequence of actual edit operations that achieves the edit distance. Each time we calculate a value of ed(i,j) using the inductive definition ed(i,j) | ai == bj = ed(i+1,j+1) | otherwise = 1 + max(ed(i+1,j+1), -- transform bj to ai ed(i+1,j), -- insert ai before bj ed(i,j+1)) -- delete bj the expression that actually determines ed(i,j) tells us what operation we need to perform. We can thus record op(i,j) to be one the values {nil,transform,insert,delete}, with the obvious interpretation. In all these examples, notice that we record a bounded amount of additional information at each point in the memo table (or, alternatively, we maintain two parallel memo tables of bounded size). ======================================================================