\usepackage{palatino} %my favourite font

\newcommand{\handout}[5]{
  \noindent
  \vbox{
        \hbox to \textwidth { {\bf #2}\hfill {\bf Computational Complexity} }
        \hbox to \textwidth { {\Large \hfill #5  \hfill} }
        \hbox to \textwidth { {\em #3 \hfill #4} }
  }
  \vspace*{4mm}
}

\newcommand{\lecture}[4]{\handout{#1}{#2}{Instructor: #3}{Scribe: #4}{#1}}


% my custom commands
\newcommand{\inparen}[1]{\left(#1\right)}             %\inparen{x+y}  is (x+y)
\newcommand{\inbrace}[1]{\left\{#1\right\}}           %\inbrace{x+y}  is {x+y}
\newcommand{\insquar}[1]{\left[#1\right]}             %\insquar{x+y}  is [x+y]
\newcommand{\inangle}[1]{\left\langle#1\right\rangle} %\inangle{A}    is <A>
\newcommand{\abs}[1]{\left|#1\right|}                 %\abs{x}        is |x|
\newcommand{\norm}[1]{\left\Vert#1\right\Vert}        %\norm{x}       is ||x||
\newcommand{\super}[2]{#1^{\inparen{#2}}}             %\super{G}{i-1} is G^{(i-1)}
\newcommand{\setdef}[2]{\inbrace{{#1}\ : \ {#2}}}
\newcommand{\inrpdt}[2]{\left\langle{#1},{#2}\right\rangle}%\inrpdt{x}{y} is <x,y>.
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bea}[1]{\begin{eqnarray*} #1 \end{eqnarray*}}

% Commands specific to this file
% TODO: Find the right way to typeset group index
\newcommand{\gpidx}[2]{\insquar{#1 : #2}}             %\gpidx{H}{K}   is [H : K]
\newcommand{\gpigs}[2]{\gpidx{\super{G}{#1}}{\super{G}{#2}}} %Group index of g super ...
\newcommand{\roundoff}[1]{\left\lfloor #1 \right\rceil}

% \newcommand{\ceil}[1]{\lceil #1 \rceil}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}

%for expected value
\newcommand{\Ex}{{\bf E}}

%for algorithms

%Dirac's notation

%ZigZag Product

\lecture{Random Walks on Expanders}{CS640}{Manindra Agrawal}{Ramprasad Saptharishi}

Last class we saw how linear algebra can be put to use on expanders: we
saw that spectral expansion and vertex expansion are essentially
equivalent.

To illustrate yet another example of how we can use linear algebra to
prove theorems on expanders, let us prove the theorem we informally
stated in the earlier lecture: amplifying the success probability of
an $\RP$ algorithm.

We stated that if we take a random walk starting from a vertex of a
small subset, then the probability that the walk never leaves this
subset is exponentially small. We shall prove this formally in this
class.

\section{Random Walks on Expander Graphs}
\index{expanders!random walks}
\begin{theorem}
Let $S$ be an arbitrary subset, of size at most $\delta n$, of the
vertices of an $(n,d,\lambda)$ expander. Pick a vertex $X_1$ at random
and start a walk of length $k$ from that vertex. Let the walk be $X_1,
X_2, \cdots, X_{k+1}.$ Then
\[
\Pr[X_1,X_2,\cdots, X_{k+1} \in S] \leq \inparen{(1-\lambda)\delta + \lambda}^k
\]
\end{theorem}
We shall express all the probabilities in terms of matrix-vector
products and analyze them. We first pick a vertex at random, so
every vertex of the graph is picked with probability $1/n.$ Let us
capture this by the vector $\ket{p_1}
= \frac{1}{n}\sum_{i=1}^n\ket{i}$ that has the value $1/n$ in every
coordinate.

We want the entire walk in the set $S$ and therefore we want the first
vertex $X_1$ also to belong to the set $S.$ In order to capture this,
we need to rule out the cases when the vertex does not belong to $S.$
This can be captured by the diagonal matrix, say $B$, that has $1$s
exactly at the $(i,i)$-th positions for $i\in S$ and $0$s everywhere
else. Multiplying any vector by this matrix preserves just the
coordinates corresponding to $S$ and kills all other coordinates to
$0.$ In Dirac's notation,
\[
B = \sum_{i\in S}\ket{i}\bra{i}
\]
Hence, if we want $X_1$ to belong to $S$, we need to sum up all the
coordinates of $B\ket{p_1}$, which is just the $L_1$ norm of
$B\ket{p_1}.$ Then
\[
\Pr[X_1\in S] = \abs{B\ket{p_1}}_1
\]

What about the next vertex? Notice that if you start with a
probability vector $v$ representing the current distribution over
vertices, the new distribution after choosing a neighbour at random is
just $Av.$ We then need to check that this new vertex is in $S$, and
hence
\[
\Pr[X_1,X_2 \in S] = \abs{BAB\ket{p_1}}_1
\]
Similarly, for a walk of $k$ steps, the final probability is just
\[
\Pr[X_1,X_2,\cdots, X_{k+1}\in S] = \abs{(BA)^kB\ket{p_1}}_1
\]
It is always useful to convert this into $L_2$ norms, since the $L_2$
norm is easier to analyze than the $L_1$ norm. Let $\ket{\hat{1}}$ be
the vector with every coordinate equal to $1$. Then $\abs{v}_1
= \bra{\hat{1}}\ket{v}$ and therefore, by applying Cauchy-Schwarz, we
get $\abs{v}_1 = \bra{\hat{1}}\ket{v} \leq \sqrt{n}\norm{v}.$
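As a sanity check, these matrix expressions can be verified
numerically on a small concrete graph. The following sketch (Python
with \texttt{numpy}) computes the first two probabilities via the
matrix products; the $8$-cycle and the set $S$ are purely illustrative
choices (the cycle is a convenient regular graph, not an expander):

```python
import numpy as np

n, d = 8, 2
# Normalized adjacency matrix A of the n-cycle: A[i][j] = 1/d whenever
# i and j are adjacent.  Any regular graph would do for this check.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = 1.0 / d
    A[i, (i - 1) % n] = 1.0 / d

S = {0, 1}
B = np.diag([1.0 if i in S else 0.0 for i in range(n)])  # projector onto S
p1 = np.full(n, 1.0 / n)                                 # uniform start |p_1>

# Pr[X_1 in S] is the L1 norm of B|p_1> (a plain sum: entries are nonnegative)
pr1 = (B @ p1).sum()
# Pr[X_1, X_2 in S] = |BAB|p_1>|_1; longer walks iterate (BA)
pr2 = (B @ A @ B @ p1).sum()

print(pr1, pr2)   # 2/8 = 0.25, and (1/8)(1/2) + (1/8)(1/2) = 0.125
```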

Here is one possible approach: we have a notion of a norm of a
vector. What about the norm of a matrix? Thinking of a matrix as a
transformation taking one vector to another, can the amount by which
the matrix ``stretches'' a vector be used to define the norm of the
matrix?
\begin{definition}\index{linear algebra!norm of a matrix}
For an $n\times n$ matrix $C$, the norm of $C$ is defined to be the least
$\alpha\geq 0$ such that for every vector $\ket{v}$,
$\norm{C\ket{v}}\leq \alpha\norm{\ket{v}}.$
\end{definition}

Thus we could use this definition to get an upper bound on
$\norm{(BA)^kB\ket{p_1}}$ by eliminating one matrix at a time. Let us
see what the norms of $B$ and $A$ are.
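For intuition, the norm in this definition is exactly the largest
singular value of the matrix. A quick numerical illustration (Python
with \texttt{numpy}; the random $5\times 5$ matrix is just an
arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((5, 5))      # an arbitrary 5x5 matrix

alpha = np.linalg.norm(C, 2)         # spectral norm = largest singular value

# ||C v|| <= alpha ||v|| for every v ...
for _ in range(100):
    v = rng.standard_normal(5)
    assert np.linalg.norm(C @ v) <= alpha * np.linalg.norm(v) + 1e-9

# ... with equality along the top right-singular vector.
_, _, Vt = np.linalg.svd(C)
v_top = Vt[0]                        # direction stretched the most
print(np.linalg.norm(C @ v_top))     # equals alpha, up to rounding
```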

The norm of $B$ is $1$: multiplying by $B$ never increases any
coordinate, and any vector supported exactly on the coordinates
corresponding to $S$ is preserved by $B.$

What about $A$? Unfortunately, the norm of $A$ is $1$ as
well. Writing a vector in the eigenbasis of $A$, notice that $A$
shrinks every direction other than $\ket{v_1}$, since $|\lambda_i|
\leq \lambda < 1$ for $i\geq 2.$ But since $A$ preserves $\ket{v_1}$,
the norm of $A$ is $1$ as well.
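Both facts are easy to check numerically on a graph whose spectrum we
know exactly. The sketch below uses the complete graph $K_n$ purely as
a convenient choice (its nontrivial eigenvalues are all $-1/(n-1)$, so
$\lambda = 1/(n-1)$); it is not meant to be an expander family:

```python
import numpy as np

n = 6
# Normalized adjacency matrix of the complete graph K_n: every vertex
# has degree d = n-1, so off-diagonal entries are 1/(n-1).
A = (np.ones((n, n)) - np.eye(n)) / (n - 1)

# The norm of A is 1: the all-ones direction is preserved.
one = np.ones(n) / np.sqrt(n)
assert np.allclose(A @ one, one)
assert abs(np.linalg.norm(A, 2) - 1.0) < 1e-12

# Every direction orthogonal to the all-ones vector is shrunk to at
# most lambda times its length; for K_n, lambda = 1/(n-1).
lam = 1.0 / (n - 1)
rng = np.random.default_rng(1)
w = rng.standard_normal(n)
w -= w.mean()                        # project out the all-ones component
print(np.linalg.norm(A @ w) / np.linalg.norm(w))   # at most lam
```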

But then this would only give us something as trivial as
$\Pr[X_1,X_2,\cdots, X_{k+1}\in S] \leq 1.$ Still, we learn
something important from this attempt:
\begin{itemize}
\item The matrix $B$ kills all components of the vector in directions
outside $S.$
\item The matrix $A$ shrinks every component orthogonal to
$\ket{\hat{1}}$ to at most $\lambda$ times its length.
\end{itemize}

So instead, let us look at the norm of the matrix $BA.$ At this point
we shall use a nice trick: write the matrix $A = (1-\lambda)J
+ \lambda C,$ where $J$ is the $n\times n$ matrix with every entry
equal to $1/n.$ One can express $J$ as
$\frac{1}{n}\ket{\hat{1}}\bra{\hat{1}}$ in the Dirac notation. We
will use the following two important properties of the norm:

\[
\norm{P+Q} \leq \norm{P} + \norm{Q} \qquad\mbox{and}\qquad \norm{PQ} \leq \norm{P}\norm{Q}
\]
Suppose $\norm{P} = \alpha$ and $\norm{Q} = \beta.$ Then
\bea{
\norm{(P+Q)\ket{v}} & = & \norm{P\ket{v} + Q\ket{v}}\\
 & \leq & \norm{P\ket{v}}+\norm{Q\ket{v}}\\
 & \leq & \alpha\norm{\ket{v}} + \beta\norm{\ket{v}}\\
 & = & (\alpha + \beta)\norm{\ket{v}}
}
The other property is pretty straightforward as well.

Using this, we have $\norm{BA}\leq (1-\lambda)\norm{BJ}
+ \lambda\norm{BC}.$ Note that for any vector $\ket{v}$, $J\ket{v}$
has every coordinate equal to the average of all the entries of
$\ket{v}.$ Let this average be $t$. Then $BJ\ket{v} =
B(t\ket{\hat{1}}).$ Now $B$ kills all but the coordinates that
belong to $S$, and therefore
\[
\norm{BJ\ket{v}} = |t|\sqrt{|S|} \leq |t|\sqrt{\delta n}
\leq \frac{\abs{\ket{v}}_1}{n}\sqrt{\delta
n}\leq \frac{\sqrt{n}\norm{\ket{v}}}{n}\sqrt{\delta n} = \sqrt{\delta}\norm{\ket{v}}
\]
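This bound is in fact tight when $|S| = \delta n$: the matrix $BJ$
maps $\ket{v}$ to the vector that is $t$ on the $S$-coordinates and
$0$ elsewhere. A quick numerical check (the values of $n$ and $S$ are
arbitrary illustrative choices):

```python
import numpy as np

n = 10
S = [0, 1, 2]                        # |S| = 3, so delta = 0.3
delta = len(S) / n

J = np.full((n, n), 1.0 / n)         # every entry 1/n
B = np.diag([1.0 if i in S else 0.0 for i in range(n)])

# BJ maps v to avg(v) on the S-coordinates and 0 elsewhere, so its
# norm is exactly sqrt(|S|/n) = sqrt(delta).
print(np.linalg.norm(B @ J, 2))      # sqrt(0.3) ~ 0.5477
```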

It remains to get a bound on $\norm{BC}.$ Since $\norm{B} = 1$ and
$\norm{BC} \leq \norm{B}\norm{C}$, it suffices to show that
$\norm{C} \leq 1.$
By our definition, $C = \frac{1}{\lambda}\inparen{A - (1
- \lambda)J}.$ Let $\ket{v}$ be any vector. Write $\ket{v} = \ket{v'}
+ \ket{w}$, where $\ket{v'}$ is parallel to $\ket{\hat{1}}$ and
$\ket{w}$ is orthogonal to it. Since $A\ket{v'} = J\ket{v'}
= \ket{v'}$,
\[
C\ket{v'} = \frac{1}{\lambda}A\ket{v'}
- \frac{1-\lambda}{\lambda}J\ket{v'} = \frac{1}{\lambda}\ket{v'}
- \frac{1-\lambda}{\lambda}\ket{v'} = \ket{v'}
\]
As for the orthogonal component, notice that $J\ket{w} = 0$ since
$\ket{w}$ is orthogonal to $\ket{\hat{1}}.$ Therefore,
\[
C\ket{w} = \frac{1}{\lambda}\inparen{A\ket{w} - (1 - \lambda)J\ket{w}}
= \frac{1}{\lambda}A\ket{w}
\]
Taking norms, since $\ket{w}$ is orthogonal to $\ket{\hat{1}}$ we have
$\norm{A\ket{w}} \leq \lambda\norm{\ket{w}}$, and hence
\[
\norm{C\ket{w}}  = \frac{1}{\lambda}\norm{A\ket{w}} \leq \norm{\ket{w}}
\]
Also note that $C\ket{w}$ lies in the orthogonal complement of
$\ket{\hat{1}}$, while $C\ket{v'} = \ket{v'}$ is parallel to it.
Hence, by Pythagoras,
\bea{
\norm{C\ket{v}}^2 & = & \norm{C\ket{v'}}^2 + \norm{C\ket{w}}^2\\
 & \leq & \norm{\ket{v'}}^2 + \norm{\ket{w}}^2\\
 & = & \norm{\ket{v}}^2
}
Therefore, $\norm{C} \leq 1.$ It is in fact equal to $1$, since
the second eigenvector $\ket{v_2}$ is a witness.
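The decomposition and the claim $\norm{C} = 1$ can also be checked
numerically; the sketch below again uses the complete graph $K_n$,
purely because $\lambda$ is known exactly for it:

```python
import numpy as np

n = 6
A = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # normalized adjacency of K_n
lam = 1.0 / (n - 1)                           # |second eigenvalue| of K_n
J = np.full((n, n), 1.0 / n)

# C is defined by the decomposition A = (1 - lam) J + lam C.
C = (A - (1 - lam) * J) / lam
assert np.allclose((1 - lam) * J + lam * C, A)

print(np.linalg.norm(C, 2))   # 1, witnessed by any second eigenvector
```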

Putting everything together,
\bea{
\norm{BA} & \leq & (1 - \lambda)\norm{BJ} + \lambda\norm{BC}\\
 & \leq & (1 - \lambda)\sqrt{\delta} + \lambda\\
\implies \norm{(BA)^kB\ket{p_1}} & \leq & \inparen{(1
- \lambda)\sqrt{\delta} + \lambda}^k\norm{B\ket{p_1}}\\
 & = & \inparen{(1 - \lambda)\sqrt{\delta}
 + \lambda}^k\sqrt{\frac{|S|}{n^2}}\\
& \leq & \inparen{(1 - \lambda)\sqrt{\delta}
 + \lambda}^k\sqrt{\frac{\delta}{n}}\\
\implies \abs{(BA)^kB\ket{p_1}}_1 & \leq & \sqrt{n}\cdot\inparen{(1
 - \lambda)\sqrt{\delta} + \lambda}^k\sqrt{\frac{\delta}{n}}\\
& = & \inparen{(1 - \lambda)\sqrt{\delta} + \lambda}^k\sqrt{\delta}\\
& \leq  & \inparen{(1 - \lambda)\delta + \lambda}^k
}
And that proves the theorem.
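The whole argument can be exercised end to end on a concrete graph.
The sketch below (again on $K_n$, an illustrative choice, with
arbitrary small parameters) computes $\abs{(BA)^k B\ket{p_1}}_1$
exactly and checks it against the bound of the theorem:

```python
import numpy as np

n, k = 10, 5
A = (np.ones((n, n)) - np.eye(n)) / (n - 1)   # normalized adjacency of K_n
lam = 1.0 / (n - 1)
S = [0, 1, 2]
delta = len(S) / n
B = np.diag([1.0 if i in S else 0.0 for i in range(n)])

p = B @ np.full(n, 1.0 / n)                   # B|p_1>
for _ in range(k):
    p = B @ (A @ p)                           # apply (BA), k times in total

prob = p.sum()                                # Pr[X_1, ..., X_{k+1} in S]
bound = ((1 - lam) * delta + lam) ** k
print(prob, bound)
assert prob <= bound
```

For $K_n$ the walk probability can also be computed by hand, as
$\delta\cdot\inparen{\frac{|S|-1}{n-1}}^k$, which the matrix
computation reproduces.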

\section{Amplifying Success Probability in Randomized Algorithms}

We made a passing remark earlier on how expanders can be used to
reduce the number of random bits used, compared to the naive
technique for boosting the success probability.

\subsection{Expanders in $\RP$ algorithms}
\index{expanders!boosting $\RP$ success}

Let us look at any $\RP$ algorithm: a probabilistic algorithm with
error on only one side. When $x$ is in the language, the algorithm
answers yes on every random string. The only case when you could
be fooled is when $x$ is not in the language but your random choice
says yes anyway. The naive approach is to repeat the algorithm, say,
$k$ times and hope that at least one random choice says no.

Suppose the algorithm had error probability $\delta$ initially and
used $m$ random bits. Then after $k$ independent tries, the error
probability is $\delta^k$ and the total number of random bits is
$mk.$ We want to make this more efficient.

The idea is to interpret each random $m$-bit string as a vertex of an
expander graph. Consider a $(2^m,d,\lambda)$ expander in which each
vertex encodes a random string. Pick a vertex of the graph at random;
that corresponds to some string. Run the algorithm with this
string. Then pick a random neighbour of this vertex and move to
it. Use the string corresponding to the new vertex as your next random
string and run the algorithm again. Repeat this $k$ times, making a
walk of length $k$ on the expander.

Let $S$ be the set of random strings on which the algorithm errs.
Since the error probability was $\delta$ to start with, $|S| \leq
\delta n$ (where $n = 2^m$), and we can apply the above theorem: a
$k$-step random walk brings the error down to exponentially small
in $k.$

The total number of random bits is $m$ to pick the first vertex, and
then $\log d$ random bits to pick a random neighbour at each
step. Therefore, the total number of random bits is just $O(m +
k\log d)$; since $d$ is usually a constant, this is $O(m+k)$, which
is way better than $O(mk).$
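The walk-based amplifier can be sketched as follows. This is an
illustration only: the helper name, the bit accounting, and the
$4$-regular circulant stand-in graph are all assumptions made for the
example; an actual amplifier would use an explicit expander family on
$2^m$ vertices.

```python
import math
import random

def walk_random_strings(neighbors, m, k, rng):
    """Generate the k+1 random strings for the amplified algorithm by a
    length-k walk on a d-regular graph whose 2^m vertices stand for the
    m-bit random strings.  Returns the walk and the random bits consumed."""
    d = len(neighbors[0])
    v = rng.randrange(len(neighbors))        # m bits for the start vertex
    bits_used = m
    walk = [v]
    for _ in range(k):
        v = neighbors[v][rng.randrange(d)]   # ceil(log2 d) bits per step
        bits_used += math.ceil(math.log2(d))
        walk.append(v)
    return walk, bits_used

# A 4-regular circulant graph on 2^4 vertices as a stand-in for an
# explicit expander.
m, d, k = 4, 4, 10
n = 2 ** m
neighbors = [[(v + s) % n for s in (1, -1, 2, -2)] for v in range(n)]

walk, bits_used = walk_random_strings(neighbors, m, k, random.Random(0))
print(bits_used, m * (k + 1))   # m + k log d = 24 bits, versus 44 naively
```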

\subsection{Expanders in $\BPP$ algorithm}
\index{expanders!boosting $\BPP$ success}

In order to reduce the number of random bits in a $\BPP$ algorithm, we
need a stronger theorem on random walks. We won't prove it here; the
proof is far more complex than the one above.

\begin{theorem}
Let $S$ be a subset of vertices of an $(n,d,\lambda)$ expander graph
such that $|S|\leq \delta n.$ Let $X_1, X_2, \cdots, X_{k+1}$ be a
random walk on this expander graph. Then
\[
\Pr\insquar{\abs{\frac{\#\setdef{i}{X_i\in S}}{k} - \delta}
> \epsilon} \leq 2e^{-\frac{(1-\lambda)\epsilon^2\delta k}{60}}
\]
\end{theorem}

This essentially means that if you start with an error probability
of, say, $1/4$, then after a $k$-step random walk the fraction of
times you visit the set of bad vertices is very close to
$\delta.$ Therefore, if you started with $\delta = 1/4$, your final
fraction of bad visits will also be very close to $1/4$, and the
majority vote will give you the right answer with error probability
$2^{-O(k)}.$
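To make the majority-vote arithmetic concrete, here is a small
sketch. It assumes a tail bound of the generic form
$2e^{-c(1-\lambda)\epsilon^2 k}$, with an illustrative constant $c$;
with initial error $\delta = 1/4$, the majority vote fails only when
the fraction of bad visits deviates from $1/4$ by more than
$\epsilon = 1/4$:

```python
import math

def majority_failure_bound(k, lam, eps=0.25, c=1.0 / 60):
    """Tail bound of the form 2 exp(-c (1-lam) eps^2 k): the chance that
    the fraction of walk steps landing in the bad set deviates from its
    expectation by more than eps.  The constant c is illustrative."""
    return 2 * math.exp(-c * (1 - lam) * eps ** 2 * k)

lam = 0.5
for k in (100, 1000, 10000):
    # Decays exponentially in k: doubling k squares the e^{-...} factor.
    print(k, majority_failure_bound(k, lam))
```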

Thus, with $O(m+k)$ random bits, you can amplify the success
probability in a $\BPP$ algorithm as well.