\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{hyperref}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{complexity}
\usepackage{graphicx}
\usepackage{palatino} %my favourite font
\usepackage{makeidx}
\makeindex
\newcommand{\handout}[5]{
  \noindent
  \begin{center}
  \framebox[\textwidth]{
    \vbox{
      \hbox to \textwidth { {\bf #2}\hfill {\bf Computational Complexity } }
      \vspace{4mm}
      \hbox to \textwidth { {\Large \hfill #5 \hfill} }
      \vspace{2mm}
      \hbox to \textwidth { {\em #3 \hfill #4} }
    }
  }
  \end{center}
  \vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{Instructor: #3}{Scribe: #4}{#1}}
\newtheorem{theorem}{Theorem}
\newtheorem{theorem*}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{subclaim}[theorem]{Subclaim}
% my custom commands
\newcommand{\inparen}[1]{\left(#1\right)}             %\inparen{x+y} is (x+y)
\newcommand{\inbrace}[1]{\left\{#1\right\}}           %\inbrace{x+y} is {x+y}
\newcommand{\insquar}[1]{\left[#1\right]}             %\insquar{x+y} is [x+y]
\newcommand{\inangle}[1]{\left\langle#1\right\rangle} %\inangle{A} is <A>
\newcommand{\abs}[1]{\left|#1\right|}                 %\abs{x} is |x|
\newcommand{\norm}[1]{\left\Vert#1\right\Vert}        %\norm{x} is ||x||
\newcommand{\union}{\cup}
\newcommand{\Union}{\bigcup}
\newcommand{\intersection}{\cap}
\newcommand{\Intersection}{\bigcap}
\newcommand{\super}[2]{#1^{\inparen{#2}}}             %\super{G}{i-1} is G^{(i-1)}
\newcommand{\setdef}[2]{\inbrace{{#1}\ : \ {#2}}}
\newcommand{\inv}[1]{#1^{-1}}
\newcommand{\inrpdt}[2]{\left\langle{#1},{#2}\right\rangle} %\inrpdt{x}{y} is <x,y>.
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\iso}{\cong}
\newcommand{\bea}[1]{\begin{eqnarray*} #1 \end{eqnarray*}}
\newcommand{\legendre}[2]{\inparen{\frac{#1}{#2}}}
\newcommand{\zo}{\inbrace{0,1}}
\newcommand{\Rot}{\text{Rot}}
\newcommand{\tensor}{\otimes}
% Commands specific to this file
% TODO: Find the right way to typeset group index
\DeclareMathOperator{\Sym}{Sym}
\newcommand{\gpidx}[2]{\insquar{#1 : #2}}                  %\gpidx{H}{K} is [H : K]
\newcommand{\gpigs}[2]{\gpidx{\super{G}{#1}}{\super{G}{#2}}} %Group index of g super ...
\newcommand{\llhd}{\!\!\lhd\!\!\lhd}
\newcommand{\roundoff}[1]{\left\lfloor #1 \right\rceil}
% \newcommand{\ceil}[1]{\lceil #1 \rceil}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\Z}{\mathbb{Z}}
\renewcommand{\R}{\mathbb{R}}
\renewcommand{\C}{\mathbb{C}}
%for expected value
\newcommand{\Ex}{{\bf E}}
%for algorithms
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
%Dirac's notation
\newcommand{\bra}[1]{\left\langle#1\right|}
\newcommand{\ket}[1]{\left|#1\right\rangle}
%ZigZag Product
\newcommand{\zigzag}{\textcircled{z}}

\begin{document}

\lecture{Random Walks on Expanders}{CS640}{Manindra Agrawal}{Ramprasad Saptharishi}

Last class we saw how linear algebra can be used to study expanders; we saw that spectral expansion and vertex expansion are essentially equivalent. To illustrate yet another example of how linear algebra can be used to prove theorems about expanders, let us prove the theorem we informally stated in the earlier lecture: amplifying the success probability of an $\RP$ algorithm. \\

We stated that if we consider a small subset of vertices and take a random walk starting from a vertex in this subset, then the probability that we never leave this set is exponentially small. We shall prove this formally in this class.
\section{Random Walks on Expander Graphs}
\index{expanders!random walks}
\begin{theorem}
Let $S$ be an arbitrary subset of vertices of an $(n,d,\lambda)$ expander, of size at most $\delta n.$ Pick a vertex $X_1$ uniformly at random and start a walk of length $k$ from that vertex. Let the walk be $X_1, X_2, \cdots, X_{k+1}.$ Then
$$
\Pr[X_1,X_2,\cdots, X_{k+1} \in S] \leq \inparen{(1-\lambda)\delta + \lambda}^k
$$
\end{theorem}
\begin{proof}
We shall express all the probabilities in terms of matrix-vector products and analyze them.

We first pick a vertex uniformly at random, so each vertex of the graph is picked with probability $1/n.$ Let us capture this by the vector $\ket{p_1} = \frac{1}{n}\sum_{i=1}^n\ket{i}$ that has the value $1/n$ in every coordinate.

We want the entire walk to stay inside the set $S$, and therefore we want the first vertex $X_1$ to belong to $S$ as well. In order to capture this, we need to rule out the cases when the vertex does not belong to $S.$ This is captured by the diagonal matrix, say $B$, that has a $1$ at the $(i,i)$-th position exactly when $i\in S$, and $0$s everywhere else. Multiplying a vector by this matrix preserves the coordinates corresponding to $S$ and sets all other coordinates to $0.$ In Dirac notation,
$$
B = \sum_{i\in S}\ket{i}\bra{i}
$$
Hence, if we want $X_1$ to belong to $S$, we need to sum up all the coordinates of $B\ket{p_1}$, which is just the $L_1$ norm of $B\ket{p_1}.$ Then
$$
\Pr[X_1\in S] = \abs{B\ket{p_1}}_1 %= \bra{\hat{1}}B\ket{p_1}\leq \sqrt{n}\norm{B\ket{p_1}}
$$
What about the next vertex?
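Before moving on to the next vertex, here is a quick numerical sanity check of $\Pr[X_1\in S] = \abs{B\ket{p_1}}_1 = |S|/n$; a minimal Python sketch (the values of $n$ and $S$ are illustrative choices, not from the lecture):

```python
# Sanity check: |B |p_1>|_1 = |S|/n.
# Vectors are plain Python lists; B is applied by masking coordinates to S.

n = 8
S = {0, 1}              # illustrative "bad" set, so delta = |S|/n = 1/4
p1 = [1.0 / n] * n      # the uniform starting distribution |p_1>

def apply_B(v, S):
    """Keep the coordinates in S, zero out the rest (the projection B)."""
    return [v[i] if i in S else 0.0 for i in range(len(v))]

def l1(v):
    """L_1 norm; for non-negative vectors this is just the sum of entries."""
    return sum(abs(x) for x in v)

prob_first_in_S = l1(apply_B(p1, S))
print(prob_first_in_S)  # 0.25, i.e. |S|/n
```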
Notice that if you start with a probability vector $v$ representing the current distribution over the vertices, the new distribution after choosing a neighbour at random is just $Av$, where $A$ is the normalized adjacency matrix of the graph. And now we need to check if this new vertex is in $S$, and therefore
$$
\Pr[X_1,X_2 \in S] = \abs{BAB\ket{p_1}}_1
$$
Similarly, for a walk of $k$ steps, the final probability is just
$$
\Pr[X_1,X_2,\cdots, X_{k+1}\in S] = \abs{(BA)^kB\ket{p_1}}_1
$$
It is often useful to move to the $L_2$ norm, which is easier to analyze than the $L_1$ norm. Let $\ket{\hat{1}}$ be the vector with every coordinate equal to $1$. Then $\abs{v}_1 = \bra{\hat{1}}\ket{v}$ for any vector $v$ with non-negative entries, and therefore by Cauchy-Schwarz we get $\abs{v}_1 = \bra{\hat{1}}\ket{v} \leq \sqrt{n}\norm{v}.$

Here is one possible approach: we have a notion of a norm of a vector; what about the norm of a matrix? Thinking of a matrix as a transformation taking one vector to another, can the amount by which the matrix ``stretches'' vectors be used to define its norm?
\begin{definition}\index{linear algebra!norm of a matrix}
For an $n\times n$ matrix $C$, the norm of $C$ is the least $\alpha\geq 0$ such that for every vector $\ket{v}$, $\norm{C\ket{v}}\leq \alpha\norm{\ket{v}}.$
\end{definition}
Thus we could use this definition to get an upper bound on $\norm{(BA)^kB\ket{p_1}}$ by eliminating one matrix at a time. Let us see what the norms of $B$ and $A$ are.

The norm of $B$ is $1$: being a projection, $B$ never increases the norm of a vector, and it preserves any vector whose non-zero entries lie exactly at the coordinates corresponding to $S.$ What about $A$? Unfortunately, the norm of $A$ is $1$ as well. Expressing a vector as a linear combination of the eigenbasis, notice that $A$ does not stretch any direction since $\abs{\lambda_i}\leq 1$, and $A$ preserves $\ket{v_1}.$ Hence the norm of $A$ is $1$ too.
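The identity $\Pr[X_1,X_2\in S] = \abs{BAB\ket{p_1}}_1$ can be seen in action on a toy graph. The sketch below uses the $4$-cycle (a $2$-regular graph chosen purely for illustration) and computes the probability both by the matrix formula and by direct enumeration:

```python
from fractions import Fraction

# Toy graph: the 4-cycle, which is 2-regular. A is its normalized
# adjacency (random-walk) matrix; B projects onto S = {0, 1}.
n, S = 4, {0, 1}
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}

def step_A(v):
    """One walk step: (Av)_j = (1/d) * sum of v over neighbours of j."""
    return [sum(v[i] for i in nbrs[j]) / Fraction(2) for j in range(n)]

def apply_B(v):
    """Zero out all coordinates outside S."""
    return [v[i] if i in S else Fraction(0) for i in range(n)]

p1 = [Fraction(1, n)] * n
# Pr[X1, X2 in S] = | B A B |p_1> |_1 (entries are non-negative, so sum).
matrix_prob = sum(apply_B(step_A(apply_B(p1))))

# Direct enumeration over the pairs (X1, X2).
direct_prob = sum(Fraction(1, n) * Fraction(1, 2)
                  for x1 in S for x2 in nbrs[x1] if x2 in S)

print(matrix_prob, direct_prob)  # both equal 1/4
```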
But then this would give us something as trivial as $\Pr[X_1,X_2,\cdots, X_{k+1}\in S] \leq 1.$ Still, we learn something important from this attempt:
\begin{itemize}
\item The matrix $B$ kills all components of the vector in directions outside $S.$
\item The matrix $A$ scales every component orthogonal to $\ket{\hat{1}}$ by a factor of at most $\lambda.$
\end{itemize}
So instead, let us look at the norm of the matrix $BA.$ At this point we shall use a nice trick. Write the matrix $A = (1-\lambda)J + \lambda C,$ where $J$ is the $n\times n$ matrix with every entry equal to $1/n.$ One could express $J$ as $\frac{1}{n}\ket{\hat{1}}\bra{\hat{1}}$ in Dirac notation. We will use the following two important properties of the norm.
\begin{observation}
$\norm{P+Q} \leq \norm{P} + \norm{Q}$ and $\norm{PQ} \leq \norm{P}\norm{Q}$
\end{observation}
\begin{proof}
Suppose $\norm{P} = \alpha$ and $\norm{Q} = \beta.$ Then:
\bea{
\norm{(P+Q)\ket{v}} & = & \norm{P\ket{v} + Q\ket{v}}\\
& \leq & \norm{P\ket{v}}+\norm{Q\ket{v}}\\
& \leq & \alpha\norm{\ket{v}} + \beta\norm{\ket{v}}\\
& = & (\alpha + \beta)\norm{\ket{v}}
}
The other property follows similarly: $\norm{PQ\ket{v}} \leq \norm{P}\norm{Q\ket{v}} \leq \norm{P}\norm{Q}\norm{\ket{v}}.$
\end{proof}
Using this, we have $\norm{BA}\leq (1-\lambda)\norm{BJ} + \lambda\norm{BC}.$ Note that for any vector $\ket{v}$, $J\ket{v}$ has every coordinate equal to the average of all the entries of $v.$ Let us say this average is $t$. Then $BJ\ket{v} = B(t\ket{\hat{1}}).$ Now $B$ kills all but the coordinates that belong to $S$, and therefore $\norm{BJ\ket{v}} = \abs{t}\sqrt{\abs{S}}.$ Therefore
$$
\norm{BJ\ket{v}} = \abs{t}\sqrt{\abs{S}} \leq \abs{t}\sqrt{\delta n} \leq \frac{\abs{\ket{v}}_1}{n}\sqrt{\delta n}\leq \frac{\sqrt{n}\norm{\ket{v}}}{n}\sqrt{\delta n} = \sqrt{\delta}\norm{\ket{v}}
$$
We now need a bound on $\norm{BC}$, and we would be done.
\begin{claim}
$\norm{C} \leq 1.$
\end{claim}
\begin{proof}
By our definition, $C = \frac{1}{\lambda}\inparen{A - (1 - \lambda)J}.$ Let $\ket{v}$ be any vector.
Write $\ket{v} = \ket{v'} + \ket{w}$ where $\ket{v'}$ is parallel to $\ket{\hat{1}}$ and $\ket{w}$ is orthogonal to it.
$$
C\ket{v'} = \frac{1}{\lambda}A\ket{v'} - \frac{1-\lambda}{\lambda}J\ket{v'} = \frac{1}{\lambda}\ket{v'} - \frac{1-\lambda}{\lambda}\ket{v'} = \ket{v'}
$$
As for the orthogonal component, notice that $J\ket{w} = 0$ since $\ket{w}$ is orthogonal to $\ket{\hat{1}}.$ Therefore,
$$
C\ket{w} = \frac{1}{\lambda}\inparen{A\ket{w} - (1 - \lambda)J\ket{w}} = \frac{1}{\lambda}A\ket{w}
$$
Taking norms,
$$
\norm{C\ket{w}} = \frac{1}{\lambda}\norm{A\ket{w}} \leq \norm{\ket{w}}
$$
since $\ket{w}$ is orthogonal to $\ket{\hat{1}}$ and hence $\norm{A\ket{w}}\leq\lambda\norm{\ket{w}}.$ Also note that the vector $C\ket{w}$ lies in the orthogonal complement of $\ket{\hat{1}}.$
\bea{
\norm{C\ket{v}}^2 & = & \norm{C\ket{v'}}^2 + \norm{C\ket{w}}^2\\
& \leq & \norm{\ket{v'}}^2 + \norm{\ket{w}}^2 = \norm{\ket{v}}^2
}
And therefore, $\norm{C} \leq 1.$ It is in fact equal to $1$, since the second eigenvector $\ket{v_2}$ is a witness.
\end{proof}
We can now put these bounds together. The resulting bound $\norm{BA}\leq (1-\lambda)\sqrt{\delta} + \lambda$ is not quite strong enough by itself; instead we bound the norm of $BAB$, using the fact that $B^2 = B$ and hence $(BA)^kB = (BAB)^k.$ First, $\norm{BCB} \leq \norm{B}\norm{C}\norm{B} \leq 1.$ Next, repeating the computation of $\norm{BJ\ket{v}}$ with $B\ket{v}$ in place of $\ket{v}$: the entries of $B\ket{v}$ are supported on $S$, so by Cauchy-Schwarz their average $t$ satisfies $\abs{t}\leq \frac{\sqrt{\abs{S}}}{n}\norm{\ket{v}}$, and hence $\norm{BJB\ket{v}} = \abs{t}\sqrt{\abs{S}} \leq \frac{\abs{S}}{n}\norm{\ket{v}} \leq \delta\norm{\ket{v}}.$ Therefore,
\bea{
\norm{BAB} & \leq & (1 - \lambda)\norm{BJB} + \lambda\norm{BCB}\\
& \leq & (1 - \lambda)\delta + \lambda\\
\implies \norm{(BA)^kB\ket{p_1}} & = & \norm{(BAB)^k\ket{p_1}}\\
& \leq & \inparen{(1 - \lambda)\delta + \lambda}^k\norm{\ket{p_1}}\\
& = & \inparen{(1 - \lambda)\delta + \lambda}^k\frac{1}{\sqrt{n}}\\
\implies \abs{(BA)^kB\ket{p_1}}_1 & \leq & \sqrt{n}\cdot\inparen{(1 - \lambda)\delta + \lambda}^k\frac{1}{\sqrt{n}}\\
& = & \inparen{(1 - \lambda)\delta + \lambda}^k
}
And that proves the theorem.
\end{proof}

\section{Amplifying Success Probability in Randomized Algorithms}
We made a passing remark earlier on how expanders can be used to reduce the number of random bits needed to boost the success probability, compared to the naive boosting technique.
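Before putting the theorem to use, it can be sanity-checked on a toy example. On the complete graph $K_n$, whose walk matrix has second eigenvalue $-1/(n-1)$ (so it is an $(n, n-1, \frac{1}{n-1})$ expander), the probability that an entire walk stays inside $S$ has a simple closed form. The following Python sketch (all parameters are illustrative choices) compares it against the bound of the theorem:

```python
# Exact check of the theorem's bound on K_n. From any vertex of S, a walk
# step stays inside S with probability (|S| - 1)/(n - 1), so
#   Pr[X_1, ..., X_{k+1} in S] = delta * ((|S| - 1)/(n - 1))**k.
n, s, k = 8, 2, 3
delta = s / n
lam = 1 / (n - 1)     # |second eigenvalue| of the walk matrix of K_n

exact = delta * ((s - 1) / (n - 1)) ** k
bound = ((1 - lam) * delta + lam) ** k
assert exact <= bound  # the theorem's inequality holds (here, comfortably)
print(exact, bound)
```

The bound is far from tight on this example; its strength is that it holds for every $(n,d,\lambda)$ expander, not just $K_n$.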
\subsection{Expanders in $\RP$ algorithms}
\index{expanders!boosting $\RP$ success}
Let us look at an $\RP$ algorithm: a probabilistic algorithm with error on only one side. When $x$ is in the language, the algorithm answers yes on all random strings. The only case where you could be fooled is when $x$ is not in the language but your random choice still says yes. The naive approach is to repeat the algorithm, say, $k$ times and hope that at least one random choice says no. Suppose the algorithm initially had error probability $\delta$ and used $m$ random bits. Then after $k$ tries, the error probability is $\delta^k$ and the total number of random bits used is $mk.$ We want to make this more efficient. \\

The idea is to interpret each random $m$-bit string as a vertex of an expander graph. Consider a $(2^m,d,\lambda)$ expander in which each vertex encodes a random string. Pick a vertex at random in the graph; that corresponds to some string. Run the algorithm with this string. Then pick a random neighbour of this vertex and move to it, use the string corresponding to the new vertex as your new random string, and run the algorithm again. Repeat this $k$ times, making a walk of length $k$ on the expander.

Let $S$ be the set of random strings on which your algorithm errs. Since the error probability was $\delta$ to start with, $|S| \leq \delta n$, and we can apply the above theorem: a $k$-step random walk brings the error down to exponentially small in $k.$ The total number of random bits is $m$ to pick the first vertex, and then $\log d$ random bits to pick a random neighbour at every step.
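The walk-based amplifier described above can be sketched as follows. In place of a genuine expander, this illustration uses a hypothetical $4$-regular stand-in graph (constructing explicit expanders is a separate topic), so it only demonstrates the bookkeeping of the walk and the one-sided combination of answers, not a real construction:

```python
import random

def expander_walk_strings(neighbours, k, rng):
    """Sample k+1 'random strings' (vertex labels) via a walk:
    one uniform start, then k steps to uniformly chosen neighbours.
    `neighbours` is a hypothetical adjacency list standing in for an
    expander; this costs m bits for the start plus ~log2(d) bits per step."""
    v = rng.randrange(len(neighbours))
    walk = [v]
    for _ in range(k):
        v = rng.choice(neighbours[v])
        walk.append(v)
    return walk

def amplified_RP(algorithm, neighbours, k, rng):
    """Answer yes only if the algorithm says yes on every string of the
    walk. (In the lecture's convention, a single 'no' is always correct.)"""
    return all(algorithm(v) for v in expander_walk_strings(neighbours, k, rng))

# Illustrative use: a dummy one-sided-error 'algorithm' over 16 strings
# that wrongly says yes exactly on the bad set S = {0, 1}.
nbrs = {v: [(v + 1) % 16, (v - 1) % 16, (v + 4) % 16, (v - 4) % 16]
        for v in range(16)}   # 4-regular stand-in graph, NOT a real expander
bad = {0, 1}
answer = amplified_RP(lambda v: v in bad, nbrs, k=10, rng=random.Random(42))
print(answer)  # almost certainly False: some walk vertex leaves the bad set
```

A real instantiation would use an explicit expander family with constant degree $d$, so that each of the $k$ steps costs only $O(1)$ random bits.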
Therefore, the total number of random bits is just $O(m + k\log d).$ Since $d$ is usually a constant, this is $O(m+k)$, which is way better than $O(mk).$

\subsection{Expanders in $\BPP$ algorithms}
\index{expanders!boosting $\BPP$ success}
In order to reduce the number of random bits in a $\BPP$ algorithm, we need a stronger theorem on random walks. We won't be doing the proof of this; it is far more complex than the one we proved above.
\begin{theorem}
Let $S$ be a subset of vertices of an $(n,d,\lambda)$ expander graph such that $|S|\leq \delta n.$ Let $X_1, X_2, \cdots, X_{k+1}$ be a random walk on this expander graph starting from a uniformly random vertex. Then
$$
\Pr\insquar{\abs{\frac{\#\setdef{i}{X_i\in S}}{k} - \delta} > \epsilon} \leq 2e^{-\frac{(1-\lambda)\epsilon^2 k}{60}}
$$
\end{theorem}
This essentially means that if you start with an error probability of, say, $1/4$, then after a $k$-step random walk the fraction of steps that land in the set of bad vertices is very close to $\delta.$ Therefore, if you started with $1/4$, the fraction of wrong answers along the walk will also be very close to $1/4$, and the majority vote will give you the right answer with probability $1 - 2^{-\Omega(k)}.$ Thus, with $O(m+k)$ random bits, you can amplify the success probability of a $\BPP$ algorithm as well.
\end{document}