\documentclass[11pt]{article}

\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{hyperref}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{complexity}
\usepackage{graphicx}
\usepackage{palatino} %my favourite font
\usepackage{makeidx}
\makeindex

\newcommand{\handout}[5]{
  \noindent
  \begin{center}
    \framebox[\textwidth]{
      \vbox{
        \hbox to \textwidth { {\bf #2}\hfill {\bf Computational Complexity }
       }
        \vspace{4mm}
        \hbox to \textwidth { {\Large \hfill #5  \hfill} }
        \vspace{2mm}
        \hbox to \textwidth { {\em #3 \hfill #4} }
      }
    }
  \end{center}
  \vspace*{4mm}
}

\newcommand{\lecture}[4]{\handout{#1}{#2}{Instructor: #3}{Scribe:
    #4}{#1}}

\newtheorem{theorem}{Theorem}
\newtheorem*{theorem*}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{subclaim}[theorem]{Subclaim}

% my custom commands
\newcommand{\inparen}[1]{\left(#1\right)}             %\inparen{x+y}  is (x+y)
\newcommand{\inbrace}[1]{\left\{#1\right\}}           %\inbrace{x+y}  is {x+y}
\newcommand{\insquar}[1]{\left[#1\right]}             %\insquar{x+y}  is [x+y]
\newcommand{\inangle}[1]{\left\langle#1\right\rangle} %\inangle{A}    is <A>
\newcommand{\abs}[1]{\left|#1\right|}                 %\abs{x}        is |x|
\newcommand{\norm}[1]{\left\Vert#1\right\Vert}        %\norm{x}       is ||x||
\newcommand{\union}{\cup}
\newcommand{\Union}{\bigcup}
\newcommand{\intersection}{\cap}
\newcommand{\Intersection}{\bigcap}
\newcommand{\super}[2]{#1^{\inparen{#2}}}             %\super{G}{i-1} is G^{(i-1)}
\newcommand{\setdef}[2]{\inbrace{{#1}\ : \ {#2}}}
\newcommand{\inv}[1]{#1^{-1}}
\newcommand{\inrpdt}[2]{\left\langle{#1},{#2}\right\rangle}%\inrpdt{x}{y} is <x,y>.
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\iso}{\cong}
\newcommand{\bea}[1]{\begin{eqnarray*} #1 \end{eqnarray*}}
\newcommand{\legendre}[2]{\inparen{\frac{#1}{#2}}}
\newcommand{\zo}{\inbrace{0,1}}
\newcommand{\Rot}{\text{Rot}}
\newcommand{\tensor}{\otimes}


% Commands specific to this file
% TODO: Find the right way to typeset group index
\DeclareMathOperator{\Sym}{Sym}
\newcommand{\gpidx}[2]{\insquar{#1 : #2}}             %\gpidx{H}{K}   is [H : K]
\newcommand{\gpigs}[2]{\gpidx{\super{G}{#1}}{\super{G}{#2}}} %Group index of g super ...
\newcommand{\llhd}{\!\!\lhd\!\!\lhd}
\newcommand{\roundoff}[1]{\left\lfloor #1 \right\rceil}


% \newcommand{\ceil}[1]{\lceil #1 \rceil}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\Z}{\mathbb{Z}}
\renewcommand{\R}{\mathbb{R}}
\renewcommand{\C}{\mathbb{C}}

%for expected value
\newcommand{\Ex}{{\bf E}}

%for algorithms
\renewcommand{\algorithmicrequire}{\textbf{Input:}}

%Dirac's notation
\newcommand{\bra}[1]{\left\langle#1\right|}
\newcommand{\ket}[1]{\left|#1\right\rangle}

%ZigZag Product
\newcommand{\zigzag}{\textcircled{z}}

\begin{document}

\lecture{Derandomization: The Deathly Hallows}{CS640}{Manindra Agrawal}{Ramprasad Saptharishi}

Last class we revisited $\MA$ protocols. By putting certain
restrictions on the size of the proof, the number of random bits used
by the verifier, and the number of probes into the proof, we arrived
at the definition of $\MIP.$ We stated the following theorem:

\begin{theorem}
$\MIP = \NEXP$
\end{theorem}

In this case we had exponential-sized proofs and polynomially many
probes and random bits. We wanted to see if a scaled-down version
could tell us anything about $\NP.$ Then we stated the PCP
theorem:\index{PCP Theorem}

\begin{theorem}
$\NP = \PCP(O(\log n),O(1)).$ 
\end{theorem}

In fact, the constant number of probes can be brought down to $3.$ The
philosophical interpretation of this is amazing: someone gives you a
polynomial-sized proof of the statement that $x\in L$, and all you
need to do is read $3$ bits of this proof to verify, with good
probability, whether the proof is right or not.

This came as a {\em huge} surprise to complexity theorists and is
considered one of the crowning achievements of the field. If one needs
a picture of why this could be true, think of the proof as being
encoded with an error-correcting code. The idea is that if the proof
is incorrect, then it differs from a correct proof (if any) in some
places. Encode it with an error-correcting code. Then, even if it
differed from an actual proof in just a single place, that difference
propagates to a lot of places in the encoded word, so finding an error
there is a lot easier.

The original proof is quite complex and builds on a lot of earlier
results. A much simpler proof of the PCP theorem was given by Irit
Dinur in 2005. We shall present her proof in this course. \\

\section{The Deathly Hallows}

We shall make a digression and look at randomization again and certain
combinatorial objects that can help reduce (or remove) randomness from
algorithms. 

We shall discuss three combinatorial objects:
\begin{enumerate}
\item Something that squeezes out ``nearly uniform'' bits from a
``partially random'' source.
\item Something that reduces the number of random bits used by
algorithms.
\item Something that eliminates randomness altogether.
\end{enumerate}

One may wonder why we need the second object if we can eliminate
randomness altogether. The fact is that we know how to construct the
second object but not the third.

\subsection{Extractors}

This is our first object: one that extracts ``nearly uniform'' bits
from a ``partially random'' source. We first need to formally define
what we mean by partially random and nearly uniform distributions.

\subsubsection{Some Parameters on Distributions}\index{probability distributions}

Let $X$ be a probability distribution over $\zo^n.$ If $X$ were the
uniform distribution, then for every string $s\in \zo^n$, $\Pr[X = s]
= 2^{-n}.$ But if the distribution is skewed, this need not be true.

We need some way of characterizing how skewed a distribution is.
Suppose the distribution were such that some string, say $00\cdots 0$,
appeared with probability $1/2$; then the distribution is really
skewed, since most of the time you would sample this string.

Hence, one way to characterize the skewness is to look at the item
that gets sampled with the highest probability; this is certainly at
least $2^{-n}.$ If this is not too large, then intuitively the
distribution is not too skewed. This is captured by the definition of
{\em min-entropy.}

\begin{definition}\index{probability distributions!min-entropy}
A distribution $X$ over $\zo^n$ is said to have min-entropy $k$ if
$$
\max_{s\in \zo^n}\Pr[X=s] \leq \frac{1}{2^k}
$$
\end{definition}

Note that the min-entropy of any distribution over $n$-bit strings is
between $0$ and $n.$ If the min-entropy is $n$, then every string has
probability at most $2^{-n}$; since the $2^n$ probabilities sum to
$1$, each must equal exactly $2^{-n}$ and the distribution is uniform.
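
As a small illustrative example, consider the distribution $X$ over
$2$-bit strings given by
$$
\Pr[X=00]=\frac{1}{2},\qquad \Pr[X=01]=\Pr[X=10]=\Pr[X=11]=\frac{1}{6}.
$$
Here $\max_s \Pr[X=s] = 1/2 = 2^{-1}$, so $X$ has min-entropy $1$ even
though it is a distribution over $2$-bit strings.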

Intuitively, if the min-entropy is $k$, then the distribution
essentially has only $k$ truly random bits. To see this, consider the
following distribution over $n$-bit strings: assign probability
$2^{-k}$ to every string whose last $n-k$ bits are zeroes, and
probability $0$ to all other strings. In essence, this is just a
uniform distribution over the first $k$ bits, padded up with zeroes.

Thus, a distribution with min-entropy $k$ has $k$ truly random bits
hidden inside it. We are interested in extracting them somehow.\\

Next is the notion of ``nearly uniform.'' What does it mean to say a
distribution is almost random? That is captured by the following
distance measure between two distributions. 

\begin{definition}\index{probability distributions!distance}
For any two distributions $X,Y$ over $\zo^n$, define the distance
between them as
$$
\delta(X,Y) = \frac{1}{2}\sum_{s\in \zo^n}\abs{\Pr[X=s] - \Pr[Y=s]}
$$
\end{definition}

If $X$ was indeed equal to $Y$ then the distance is zero. And since 
\bea{
\frac{1}{2}\sum_s\abs{\Pr[X=s] - \Pr[Y=s]} & \leq &
\frac{1}{2}\sum_s\Pr[X=s] + \frac{1}{2}\sum_s\Pr[Y=s]\\
& = & \frac{1}{2} + \frac{1}{2} = 1
}
the distance is always a number between $0$ and $1.$ The distance is
equal to $1$ if the two distributions are orthogonal\footnote{for any
string $s$, either $\Pr[X=s]=0$ or $\Pr[Y=s]=0$}.
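
As a small worked example, take $n=1$ and let $X$ be a biased coin
with $\Pr[X=0]=3/4$ and $\Pr[X=1]=1/4.$ Then
$$
\delta(X,U_1) = \frac{1}{2}\inparen{\abs{\frac{3}{4}-\frac{1}{2}} +
\abs{\frac{1}{4}-\frac{1}{2}}} = \frac{1}{4},
$$
so $X$ is at distance $1/4$ from the uniform distribution.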

And with this definition we can formally define the notion of ``nearly
random''. 

\begin{definition}\index{probability distributions!$\epsilon$-close to uniform}
A distribution $X$ is said to be $\epsilon$-close to uniform if
$\delta(X,U_n) \leq \epsilon$ where $U_n$ is the uniform distribution
over $n$-bit strings. 
\end{definition}

Now, if you look back at all the randomized algorithms we have seen,
they assume truly random bits. It can be shown that if we were instead
to provide them with random bits that are $\epsilon$-close to uniform,
then the error probability only goes up from $1/3$ to at most
$1/3+\epsilon.$
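To see why, note that for any event $A\subseteq \zo^n$, and in
particular for the set of random strings on which the algorithm errs,
the statistical distance bounds the change in probability:
$$
\abs{\Pr[X\in A] - \Pr[U_n\in A]} \leq \delta(X,U_n) \leq \epsilon.
$$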

With these, we can now define our extractors formally.

\begin{definition}\index{extractors}
A $(k,\epsilon)$ extractor is a polynomial time computable function
$E:\zo^n\times \zo^t\longrightarrow \zo^m$ such that for any
distribution $X$ over $\zo^n$ of min-entropy at least $k$, 
$$
\delta\inparen{E(X,U_t),U_m} \leq \epsilon
$$
\end{definition}

So $E$ takes a sample from a distribution of reasonable min-entropy,
together with a random {\em seed} of $t$ purely random bits, and {\em
extracts} out the randomness from $X.$ This is meaningless if $t$ is
large. For instance, if $t$ were as large as $m$, then we could just
ignore the first argument of $E$ and output the seed's uniform bits
directly.

However, the power of this is seen when $t$ is much much smaller than
$n$ or $m.$ We shall prove the following theorem later.

\begin{theorem}
There exists an $(n^\delta, \epsilon)$ extractor
$$E:\zo^{n}\times \zo^{O(\log n)} \longrightarrow \zo^{n^{\delta'}}$$
for some $\delta'\leq \delta.$
\end{theorem}

This in essence means that the random seed needs to be only
logarithmic in $m$ and $n.$ With just that many seed bits, you can
extract almost all the random bits from $X.$

It can be shown that the random seed is essential: to do any
reasonable extraction, you need $\Omega(\log n)$ random bits as a
seed. Of course, in later iterations one could use the almost uniform
output bits as the seed, but we need $O(\log n)$ truly random bits to
start with.

In a sense, the seed is a catalyst for the reaction: you need it to
start the process; without it, we would not be able to extract any
randomness.

\subsection{Expanders}\index{expanders}

This is the object that will help us reduce the number of random bits
used in randomized algorithms. This has a totally different definition
compared to extractors; these are graphs. 

\begin{definition}\index{expanders!vertex expansion}
A $d$-regular graph $G$ on $n$ vertices is called an
$(n,d,\epsilon)$-vertex expander if for every subset $S$ of size less
than $n/2$, the number of edges going out of $S$ is at least $\epsilon
d|S|.$ That is, $\abs{E(S,V\setminus S)}\geq \epsilon d|S|.$
\end{definition}

Since $G$ is $d$-regular, the number of edges incident on $S$ is at
most $d|S|.$ The definition of an expander says that at least an
$\epsilon$ fraction of those edges actually leave $S.$
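
As a quick illustration of the definition, consider a non-example: the
cycle on $n$ vertices is $2$-regular, and taking $S$ to be $n/2$
consecutive vertices gives only two edges leaving $S$, so
$$
\abs{E(S,V\setminus S)} = 2 \qquad\text{while}\qquad \epsilon d|S| =
\epsilon n.
$$
Hence the cycle is an $(n,2,\epsilon)$-expander only for $\epsilon
\leq 2/n$; in a good expander, $\epsilon$ is a constant independent of
$n.$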

\subsubsection{Boosting Success Probability using Expanders}
\index{expanders!boosting $\RP$ success}
Let us take some randomized algorithm we saw in class. To see the
power of expanders better, take an algorithm whose error is
one-sided. We saw numerous algorithms where, if $x$ is in the
language, the algorithm accepts with probability $1.$ The error occurs
only when $x\notin L$: we could incorrectly accept $x$ with some small
probability. Let us say this error is $1/3$ to begin with. We want to
reduce this error.

Our earlier approach was to repeat the algorithm $k$ times, bringing
the error probability down to $(1/3)^k.$ What is the number of random
bits used in the process? Suppose a single round uses $m$ purely
random bits. Then this method of boosting uses $km$ random bits in
total.

We shall see how this can be reduced substantially using expander
graphs. Look at all the possible random strings the algorithm could
get. Since the algorithm uses $m$ random bits, the number of possible
random strings is $n=2^m.$ Of these strings, some lead the algorithm
to the right answer, and some do not. Let $S$ be the set of bad
strings, those that lead the algorithm to the wrong answer. Since the
error is bounded by $1/3,$ it follows that $|S|\leq n/3 < n/2.$

Consider an $(n,d,\epsilon)$-expander where each vertex encodes a
sequence of $m$ bits; note that $d$ and $\epsilon$ are constants. Now
$S$, the set of bad vertices, is a subset whose size is less than
$n/2$ and therefore has the expansion property. Pick a random vertex
$v.$ If this vertex is outside $S$, then we already have a witness.
The bad case is when we get a vertex inside $S.$ At this point, pick
one of the $d$ neighbours of $v$ at random and move to that vertex.
Note that, by the expansion property of the graph, at least an
$\epsilon$ fraction of the edges incident on $S$ go out of it.
Therefore, with probability at most $(1-\epsilon)$ you will remain in
$S.$ Now repeat this, picking a random neighbour and moving to it, say
$k$ times. The probability that this entire walk stayed inside $S$ is
about $(1-\epsilon)^k.$ In this way, you again bring the error
probability down to inverse exponential, just like the naive
independent-trials method.

But what is the total number of random bits used? Picking the first
vertex needs $m$ random bits. After that, to pick a neighbour at
random, we need just $\log d$ random bits, and since $d$ is a
constant, this is only a constant number of random bits per step.
Therefore, the total number of random bits used is $m+O(k)$, which is
way better than $mk.$\\
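
As a rough comparison with illustrative numbers: if a single run uses
$m=100$ random bits, we take $k=100$ steps, and the expander is
$d=16$-regular, then independent trials use $km = 10000$ random bits,
while the expander walk uses only
$$
m + k\log d = 100 + 100\cdot 4 = 500
$$
random bits, while still achieving exponentially small error.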

Notice that there is a small catch here: how do we construct this
expander? The problem is that the graph is {\em huge}, with $2^m$
vertices. How do we even construct such graphs? Notice that we really
don't need to store the entire graph (since that would take
exponential space and would destroy all hope of efficient
computation). All we need is a way to compute, given a vertex $v$, the
neighbours of $v$.

This is accomplished by maps called {\em rotation
maps}\index{expanders!rotation maps}. These are maps that take a
vertex $u$ and an integer $i\leq d$ as input, and return the $i$-th
neighbour $v$ of $u$ together with an index $j$ such that $u$ is the
$j$-th neighbour of $v.$ Fortunately, expanders can be constructed
such that this map is polynomial-time computable; we shall see this
soon.
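
As a toy illustration of the format (the cycle is of course not an
expander), consider the cycle on $N$ vertices, where each vertex's
first neighbour is its successor and its second neighbour is its
predecessor. Its rotation map is
$$
\Rot(u,1) = (u+1 \bmod N,\ 2), \qquad \Rot(u,2) = (u-1 \bmod N,\ 1),
$$
and it is clearly computable without ever writing down the whole
graph.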

And thus, we can reduce the number of random bits used in the
boosting. 

\subsection{Pseudo-random Generators}

This is the third object: the one that completely eliminates
randomness. Let us first see the formal definition and then look at
the intuition.

\begin{definition}\index{PRG}
Let $f:\zo^\star\longrightarrow \zo^\star$ be a function that
satisfies the following properties:
\begin{itemize}
\item $f(x)$ is computable in $2^{O(|x|)}$ time.
\item There exists a constant $c$ such that on any string $x$ of size
$c\log n$, $f(x)$ is a string of size $n.$ In other words,
$f$ stretches strings of $O(\log n)$ bits to $n$ bits.

\item For every circuit $C$ of size $n$ with $n$ inputs,
$$
\abs{\Pr_{x\in \zo^n}[C(x)= 1] - \Pr_{y\in \zo^{c\log
n}}[C(f(y))=1]}\leq \frac{1}{n}
$$
\end{itemize}
Then $f$ is called an optimal pseudo-random generator against
linear-sized circuits.
\end{definition}

First, let us explain the adjectives used here. It is called {\em
optimal} because this pseudo-random generator stretches strings of
logarithmic length to $n$ bits. It can be shown that no pseudo-random
generator can stretch strings of length asymptotically smaller than
$\log n$ to $n$ bits. Hence, in that sense, it is optimal.

It is called {\em pseudo-random} because the outputs of $f$ are
strings in $\zo^n$ that are not really random but appear random to the
circuit $C.$ To understand this better, think of $C$ as a randomized
algorithm whose input has been fixed; $C$ is just waiting for the
random coin tosses so that it can give its answer.

What we know is that if the (fixed) input is in the language, then
$\Pr_{x\in \zo^n}[C(x) = 1]$ is at least $2/3$, and if it is not in
the language, this probability is at most $1/3.$ Suppose that instead
of a truly random $x$, we pick a random string $y$ of size $c\log n$
and give $f(y)$ to $C.$ Then, by the third property, we know that
$\Pr_{y\in \zo^{c\log n}}[C(f(y))=1]$ is within $1/n$ of the true
acceptance probability, and is therefore still good enough to tell the
two cases apart. In essence, instead of giving $C$ really random bits,
$f$ provides it with bits that aren't really random but still make $C$
work properly. That is what is meant by a pseudo-random generator {\em
against} linear-sized circuits.

Another way to look at it is to think of $C$ as a distinguisher
circuit: one that tries to distinguish between random and
pseudo-random bits provided as input. The third property tells us that
$C$ cannot really distinguish between them, since the acceptance
probabilities of $C$ on truly random bits and on outputs of $f$ are
not too far from each other. Therefore, the function $f$ is able to
{\em fool} $C$ into thinking that it is actually being provided with
truly random bits. \\

Now notice that $f$ stretches strings as short as $c\log n$ bits to
$n$ bits. Therefore, we can just run over every string $y\in
\zo^{c\log n}$ and find the fraction of seeds for which $C(f(y))=1.$
We are guaranteed that this fraction is within $1/n$ of $C$'s true
acceptance probability, and is therefore still very close to $1/3$ or
$2/3$; hence we will know for sure how $C$ would behave if provided
with truly random bits.
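
Concretely, a minimal sketch of the derandomized procedure is to
compute the fraction
$$
p = \frac{1}{n^c}\abs{\setdef{y\in\zo^{c\log n}}{C(f(y))=1}}
$$
and accept if and only if $p \geq 1/2.$ Since $p$ is within $1/n$ of a
quantity that is at least $2/3$ or at most $1/3$, the threshold $1/2$
correctly separates the two cases for large enough $n.$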

Thus, by running over all $n^c$ possible seeds $y$, which can be done
in polynomial time since $f$ is computable in $2^{O(|x|)} = n^{O(c)}$
time on inputs of length $c\log n$, we can completely remove the
randomness from the algorithm $C.$ Summarizing this as a theorem:

\begin{theorem}
If there exists an optimal pseudo-random generator against
linear-sized circuits, then $\BPP=\P.$\qed
\end{theorem}


Unfortunately, we do not know of any way to construct such PRGs at the
moment. But there is strong evidence to believe that such PRGs exist,
and hence people believe that $\BPP =\P.$ However, finding a PRG of
this kind does not mean we can derandomize classes like $\AM$; to
remove randomness from higher classes, we need more powerful PRGs. \\

We shall get back to this after our discussion on expanders and the
$\PCP$ theorem. We will see that PRGs are strongly connected to
extractors. 



\end{document}