\input{preamble}
\begin{document}
\lecture{Trevisan Extractor}{Pseudorandomness}{Srikanth Srinivasan}{Ramprasad Saptharishi}

\section{Introduction}

In the last meeting, we saw how we could use a hard function to
construct a pseudorandom generator and hence obtain some
derandomization results. This can be thought of as a way by which we
converted one pseudorandom object to another. 

People study another pseudorandom object called
\emph{extractors}. These are objects that ``extracts'' almost pure
random bits from weak random sources. Lets first define a few terms. 

\begin{definition}
Let $X$ be any distribution. The \emph{min-entropy} of $X$ is said to
be $k$ if for every $x$ in the domain, $X(x) \leq 2^{-k}$.
\end{definition}
Intuitively, what this says that there are some $k$ random bits that
are hidden in distribution $X$. 

\begin{definition}
  Given two distributions $X$ and $Y$, the statistical distance
  between the two distributions is denoted by $d(X,Y)$ or $|X - Y|$
  and is defined as
  $$
  d(X,Y) = |X - Y| = \frac{1}{2}\sum_x\abs{X(x) - Y(x)}
  $$
\end{definition}

\begin{observation}
  $d(X,Y) = \max_{A\subseteq S}(X(A) - Y(A))$
\end{observation}
From this observation, it is clear as to why it is called the
``statistical distance''. One could think of $A$ that maximizes the
RHS as a statistical test. Thus, if the distance between $X$ and $Y$
is $\epsilon$, then no statistical test can distinguish between them
with an advantage of more than $\epsilon$. Or in other words, $X$
behaves like $Y$ with respect to any statistical test. \\

Let us take a quick glimpse at the PRG setting, we wanted to create a
PRG such that the output of it behaves like the uniform distribution
with respect to small circuits. This notion of being indistinguishable
with respect to statistical tests seems like a stronger
property. Trevisan used the black-box proof of Nisan-Wigderson's
generator to show that it is actually a proof to show that we have a
good extractor. 

\subsection{Extractors}

\begin{definition}
An $(k,\epsilon)$ extractor is a function
$E:\zo^m\times \zo^l \longrightarrow \zo^n$ such that given any
distribution $X$ over $\zo^m$ with min-entropy at least $k$, 
$$
\abs{E(X,U_l) - U_n} \leq \epsilon        
$$
\end{definition}
Typically one would want $l$ to be as small as possible, and $n$ to be
close to $k+l$ (so that it extracts all the randomness of from the
input. 

How does one prove that a certain function is not an extractor? Here
is how it is usually done. 

Assume that $E$ is not an $(k,\epsilon)$ extractor. Then there exists
some $k$-source $X$ such that $\abs{E(X,U_l) - U_n} > \epsilon$. This
means that there is some $A\subseteq \zo^n$ such that 
\bea{
\Pr[E(x,U_l) \in A] - \Pr[U_n \in A] &> &\epsilon\\
\implies \Pr[A(E(x,y))=1] - \Pr[A(z) = 1]& > &\epsilon
}

For each $x\in \zo^m$, define $R_x = \Pr_y[A(E(x,y))=1] - \Pr[A(z) =
1] \leq 1$. Therefore, the last equation just says that $E_x[R_x]
> \epsilon$. Then, with probability at least $\frac{\epsilon}{2}$ over
the choice of $x$, $R_x > \frac{\epsilon}{2}$ (prove this!). Which
means that the same statistical test $A$ works for a large number of
$x$'s. Let us say $A$ is $\epsilon$-bad for $x$ if $R_x
> \frac{\epsilon}{2}$. 

\emph{Lets assume that we could prove that for any $A$, the
number of $x$'s such that $A$ is $\epsilon$-bad for $x$ is less than
$K$.} 

Then, one of the $x$'s must have large weight put on it. Since
$R_x \geq \frac{\epsilon}{2}$, we have
$K\frac{1}{2^k}\geq \frac{\epsilon}{2}$ which would then show that
$k\geq \log\inparen{\frac{K}{\epsilon}}$. Thus if we started with
a source $X$ with larger min-entropy, we would have a
contradiction. Thus, the function would then give a
$\inparen{\log\inparen{\frac{K}{\epsilon}}
+1, \epsilon}$-extractor. \\

The main bottleneck is in showing that w cn bound the number of $x$'s
that could be bad for any $A$. 

\section{Trevisan's Extractor}

Lets recall the Nisan-Wigderson pseudorandom generator. We started
with a $(n,l,m,\log n)$-design which is a family of $n$ subsets of $[l]$
with cardinality $m$ each such that the pairwise intersection is
bounded by $\log n$. Then, suppose we are given a function $f$, then
the generator $G_f$ is defined as follows:
\bea{
G_f:\zo^l&\longrightarrow&\zo^n\\
G_f(y) & = & f(y|_{S_1})f(y|_{S_2})\cdots f(y|_{S_n})
}
If the function was hard to compute, then this would be pseudorandom
for a certain class of circuits. \\

Let us look at the proof. It proceeded by assuming the contrary, and
then getting a circuit that would predict a bit of the generator
correctly with significant advantage. If we were to replace the
circuit $C$ by an oracle $A$, the entire proof would go through. We
would then end up with a small circuit that has oracle access to $A$
and can compute $f$ significantly. Since $C$ was just used as a black
box, one could think of $C$ as a statistical test $A$ and go
ahead. This was the motivation for Trevisan's extractor. 

\begin{definition}(preliminary)
Trevisan's extractor $E:\zo^M\times \zo^l\longrightarrow \zo^m$ is
defined in the following way. Let $x\in \zo^M$ where $M = 2^m$. Then
$x$ can be thought of as a truthtable of a boolean function
$f_x:\zo^m\longrightarrow \zo$. Now define $E(x,y) = G_{f_x}(y)$.
\end{definition}
\begin{theorem}(intentionally vague) The above function is indeed an
extractor with appropriate parameters. 
\end{theorem}

I am being vague here because giving the parameter takes the focus
off the proof ideas which, in my opinion, is more important. 

\begin{proof}
Just as we outlined in the earlier section, we have a statistical test
$A\subseteq \zo^n$ that distinguishes the output of the extractor from
the uniform distribution. Suppose $A$ is $\epsilon$-bad for $x$, then
$R_x\geq \frac{\epsilon}{2}$. This is just the same as
\bea{
\Pr[A(E(x,y))=1] - \Pr[A(z) = 1] & > & \frac{\epsilon}{2}\\
\implies \Pr[A(G_{f_x}(y))=1] - \Pr[A(z)=1] & > & \frac{\epsilon}{2}
} 

If we bootstrap the Nisan-Wigderson generator's proof, we get a small
oracle circuit that computes $f_x$ at $\frac{1}{2}+\frac{\epsilon}{2}$
points. A quick glimpse as to how we would achieve this. We use a
hybrid argument, we get a next-bit-predictor. We fix the remaining
random bits to keep the inequality. Now using the fact that the
correlation between the sets $S_i$ and $S_j$ is small, we build small
circuits that compute the value over the intersection and use that to
build a small oracle circuit that evaluates $f_x$ at more than
$\frac{1}{2}+\frac{\epsilon}{2n}$ places. Now let this new circuit be
$C''$. How many bits are required to describe $C''$? We need to say
what is the bit that the next-bit-predictor predicts, what are the
fixed random bits, what are the small circuits that evaluate the
function on the intersection. Therefore, the total number of bits is
$\log n + (l - m) + (n-i) + in \leq 2n^2$ bits of information. 

Why do we need to go through all this? Suppose we can completely
recover $x$ somehow,  then the number of $x$'s for which $A$ is
$\epsilon$-bad is at most $2^{2n^2}$. That would then give us a good
extractor! 

Even otherwise, for how many $y$'s is it the case that the truthtable
of $C_x$ agrees with $y$ in more than
$\frac{1}{2}+\frac{\epsilon}{2n}$? This could potentially be huge. But
this is only because our $x$ was arbitrary. What if we had some more
structure on $x$? The answer is glaring in front of the eyes: use an
error correcting code!

\begin{definition}
A function $\text{ECC}:\zo^{M_1}\longrightarrow \zo^{M_2}$ is said to
be an $(M_1,M_2,d)$-code if for all $x_1\neq x_2$, the hamming
distance $\Delta_H(\text{ECC}(x_1) - \text{ECC}(x_2))\geq d$.

It is said to be $\inparen{\rho M_2,L}$-list decodable if for all
$r\in \zo^{M_2}$, 
$$
\abs{\setdef{m\in\zo^{M_2}}{\Delta_H(\text{ECC}(m),r) \leq \rho
M_2}} \leq L
$$
\end{definition}

Therefore, lets now modify the generator as follows. 
\begin{definition}(final)
Let $\text{ECC}$ be an $(\frac{1}{2} - \frac{\epsilon}{2n},L)$-list
decodable error correcting code. Trevisan's extractor is defined as
the function $E(x,y) = G_{f_{\text{ECC}(x)}}(y)$.
\end{definition}

Now, the number of $y$'s that are bad is bounded by $L\cdot
2^{2n^2}$. And therefore, Trevisan's extractor is indeed an
$\inparen{\log L + 2n^2 + \log{\frac{1}{\epsilon} + 1,\epsilon}}$.
\end{proof}

And of course, the existance of a code with such a distance is not
too much to ask for. Infact, we have good codes such that
$(\frac{1}{2} - \frac{\epsilon}{2n},\poly\inparen{\frac{n}{\epsilon}})$ list
decodable codes. 



\end{document}

