%        File: error-correcting.tex
%     Created: Wed Feb 06 12:00 AM 2008 I
% Last Change: Wed Feb 06 12:00 AM 2008 I
%
\input{preamble}
\usepackage{complexity}

%\usepackage{fullpage}
\usepackage{amsthm}
\usepackage{amstext}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{latexsym}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{hyperref}

\date{\today}


\usepackage{url}

%% \newcommand{\heading}[5]{
%%    \renewcommand{\thepage}{#1\arabic{page}}
%%    \noindent
%%    \begin{center}
%%    \framebox{
%%       \vbox{
%%     \hbox to 4.48in { {\bf}
%%          \hfill #2 }
%%        \vspace{4mm}
%%        \hbox to 4.48in { {\Large \hfill #5  \hfill} }
%%        \vspace{2mm}
%%        \hbox to 4.48in { {\it #3 \hfill #4} }
%%       }
%%    }
%%    \end{center}
%% %   \vspace*{4mm}
%% }
%% %may be modified for other lecture notes.
%% \newcommand{\handout}[3]{\heading{#1}{#2}{Speaker: Srikanth}{Scribe: Piyush Srivastava}{#3}}

\begin{document}

\lecture{Error-Correcting Codes}{Pseudorandomness}{Srikanth
Srinivasan}{Piyush Srivastava}
\section{Introduction}

We would now be moving towards a proof of the \emph{Impagliazzo -
Wigderson Theorem}\cite{IW01}, which essentially indicates a
connection between \emph{worst-case} (as opposed to average-case)
hardness and randomness. More precisely, it is the following therem:

\begin{theorem}
If there is a language $L \in \E$, which cannot be decided by
circuits of size $2^{\delta n}$ for any $\delta > 0$, then
$\P = \BPP$.  \label{ImpWig01}
\end{theorem}

Till now, our progress in this direction has been up to
the \emph{Nisan-Wigderson pseudorandom generator}. The NW generator
shows that if we have $L \in \E$, such that $L$ is \emph{average case
hard} for exponential sized circuits, then $\P = \BPP$.


\section{The Approach}
The approach to strengthening the NW generator is to start with
a \emph{worst case hard} language $L$ and code it to another
language $L'$ such that if $L'$ is not average-case hard then $L$ is
not worst case hard. If this holds, then one can simply bootstrap the
NW result and derive the IW result. Error correcting codes would
play a part in the coding step described above.


Let us look at the above in slightly more detail. Let $L_n$ be the $n$
bit ``slices'' of $L$, that is the subsets of $L$ of size $n$
strings. If $L$ is not average case hard, we have ``small'' circuits
which agrees $L_n$ on at least a $\frac{1}{2} +
\frac{1}{2^{\delta n}}$ fraction of inputs. From these circuits, we would want to
construct small ``always-correct'' circuits for another language
$L'$. This is roughly similar to going from corrupted code-words to
the actual code-words in a error-correction setting. This stimulate
the use of Error-correcting codes in this connection.

\section{Preliminaries}

We fix an alphabet $\Sigma$, where $|\Sigma| = q$.

\begin{definition}
A $\insquar{n, k, d}_q$ code is a map $C: \Sigma^k \mapsto \Sigma^n$
such that
$$
\forall x,y \in \Sigma^k\quad ,\quad  x \neq
y \implies \hamming{C(x)}{C(y)} \geq d. 
$$
Here, $\Delta_H$ denotes the Hamming distance. $d$ is called
the \emph{distance} of the code.
\end{definition}

We would normally look at $\Sigma$ as a finite field, and so that $q$
would be a prime power. Also, in this case, $C$ above is called
a \emph{linear code} if it is a linear map, with repect to addition
over the vector space $\Sigma^k$.

\section{Reed-Solomon Codes}
We now describe the \emph{Reed-Solomon} codes. We fix a finite field
$\F$ of size $q \geq n$, and choose distinct $x_1, x_2, \dots,
x_n \in \F$. We also pick an \emph{efficiently invertible}
linear map $M$ that maps a $k$-dimensional vector in $\F^k$ to a
polynomial of degree less than $k$. We denote $M\inparen{{\vect{a}}}$ by
the polynomial $p_{\vect{a}}$. We then define the \emph{Reed-Solomon
code}, using the natural evaluation map for polynomials as:

$$
C: {\vect{a}} \mapsto \inparen{p_{\vect{a}}(x_1),
p_{\vect{a}}(x_2), \dots, p_{\vect{a}}(x_n)}
$$

An example of such a map is one that sends ${\vect{a}}$ to
$p_{\vect{a}}$ where $p_{\vect{a}}$ is the \emph{unique} polynomial of
degree not greater than $k - 1$, such that for $z \in [k],$
$p_{\vect{a}}(x_z) = a_z$. 

Clearly, as $M$ is linear, this is a linear code. The distance $d$
clearly satisfies $d \geq n - k + 1$, as any non-zero codeword can
have zeroes in at most $ k - 1$ positions.  $k$ is called
the \emph{rate} of the code.

In this context, decoding is roughly the following problem: Given a
vector `$\vect{y}$' retrieve `$\vect{a}$' for which the set
$\setdef{i}{p_\vect{a}\inparen{x_i}=y_i}$ is large. This essentialy
captures the idea of detecting and correcting the received word if the
corruption is small. Also, if we need \emph{unique} decoding, the
number of positions in which there is error must be less than
$\ceil{\frac{d-1}{2}}.$


\subsection{Unique decoding for Reed-Solomon Codes: The Berlekemp-Welch Algorithm} 

The Berlekemp-Welch algorithm (Algorithm \ref{algBW}) achieves unique
decoding for Reed-Solomon codes when the number of errors is less than
$\frac{d}{2}$. The algorithm uses two polynomials, which together
describe the position and the value of errors.

We shall denote $B$ by the set of bad indices. That is, $B
= \setdef{i}{p(x_i)\neq y_i}$. 
\begin{eqnarray*}
E(x) & = & \prod_{i \in B}(x - x_i)\\
N(x) & = & p(x)E(x)
\end{eqnarray*}

Note that $\deg E < \frac{d}{2}$ and $\deg N < \frac{d}{2} + k - 1$
and also that 
\begin{equation}
\forall i, N(x_i) = y_iE(x_i) \label{eqn1}
\end{equation}

\begin{algorithm}
\caption{Berlekemp-Welch Decoding}
\label{algBW}
\begin{algorithmic}[1]
\REQUIRE $S = \inbrace{(x_i, y_i), 1 \leq i \leq n}$. Here $\vect{y}$ is the received
code word and $\vect{x}$ is as used in the definition of Reed-Solomon
codes.
\OUTPUT Unique polynomial $p$ such that degree of $p \leq k-1$ such that
$\abs{\setdef{i}{p(x_i)\neq y_i}} < \frac{d}{2}$.
		
\STATE Compute $N'$ of degree at most $\frac{n+k-1}{2}$ and $E'$ of degree less than
$\frac{n-k+1}{2}$ such that equation \ref{eqn1} holds. This can be done through
Gaussian elimination.
\IF {$E' \nmid N'$}
\PRINT ``Error''
\ELSE
\STATE Output the decoded message $\inparen{p(x_1), p(x_2)\dots
p(x_n)}$, where $p = \frac{N'}{E'}$.
\ENDIF
\end{algorithmic}
\end{algorithm}

It only remains to show the following lemma.

\begin{lemma}\label{lem1}
If the algorithm above calculates $N',E'$ where $E'|N'$, then $\frac{N}{E} =
\frac{N'}{E'}$.
\end{lemma}
\begin{proof}
Consider the polynomials $NE'$ and $N'E$. Clearly, $\forall i\in[n], NE'(x_i) =
N'E(x_i)$. Now $\deg{NE'},\deg{N'E} < n$, and they agree on $n$ distinct points.
So, $NE'=N'E$, and the desired result follows.
\end{proof}

\section{Sudan's List Decoding Algorithm}

Unique decoding has the caveat that it cannot work when the number of
errors is larger than $d/2$. However, even in such a case, it might be
possible to produce a small ``list'' of the candidate
messages. \emph{List decoding} is essentially the problem of finding
these lists. List decoding algorithms are known which can handle
$n(1-\delta)$ errors, for any constant $\delta > 0$. We describe here
Sudan's list decoding algorithm(Algorithm
\ref{alg:S}).


\begin{algorithm}
\caption{Sudan's List Decoding Algorithm}
\label{alg:S}
\begin{algorithmic}[1]
\REQUIRE $S = \inbrace{(x_i, y_i)| i \leq i \leq n}, t \in \N$.
\OUTPUT List $L$ of polynomials $p\in\F[x]$ such that:
\begin{itemize}
\item $\deg p \leq k-1$
\item $|\setdef{i}{p(x_i)=y_i}| \geq t$
\end{itemize}
\COMMENT {We would want $t > 2\sqrt{kn}$, as then by choosing $k \approx
\epsilon\sqrt{n}$, we can get $\frac{t}{n} \rightarrow 0$.}
\STATE Find $Q \in \F[x,y]$, such that 
\begin{itemize}
\item $x$-degree of  $Q \leq d_x$
\item $y$-degree of  $Q \leq d_y$
\end{itemize}
and $\forall i \in [n]$, $Q(x_i,y_i) = 0$.
\COMMENT {$d_x$ and $d_y$ are chosen so that the following two conditions are satisfied:
\begin{itemize}
\item $(d_x+1)(d_y+1)> n$
\item $d_x + kd_y < t$
\end{itemize}
}
\STATE Use bivariate factorization to factorize $Q$ into its irreducible factors and take
the irreducible factors of the form $(y-g_j(x))$. Output the list of the $g_j$'s.
\end{algorithmic}
\end{algorithm}

{\bf Informal justification for Algorithm \ref{alg:S}}

The condition $(d_x+1)(d_y+1) > n$ is introduced so that $Q$ can be
obtained by solving a set of $n$ linear equations.

Let $p$ be a polynomial satisfying the conditions to be in the output
list. Consider the polynomial $Q'(x) = Q(x,p(x))$. Clearly, $Q'$ has
at least $t$ distinct roots, and it's degree is upper-bounded by $d_x
+ (k-1)d_y$. So, if $d_x + (k-1)d_y < t$, $Q'$ is identically
zero. Hence $(y-p(x))|Q(x,y)$. Also, $(y-p(x))$ is also clearly an
irreducible factor of $Q$.

Also, there can be at most $d_y$ such factors. Thus, the length of the
output list is at most $d_y$. It remains to fix $d_x$ and $d_y$. It is
clear that taking $d_x < \sqrt{kn}$ and $d_y < \sqrt{\frac{n}{k}}$
ensures the above conditions allow us to take $t = 2\sqrt{kn}$. Also,
the length of the output list is at most $t/2k$.

\section{Reed-Muller codes}
In \emph{Reed-Solomon codes}, we looked at a map from the message to
the evaluation of a univariate polynomial at a fixed set of
points. Reed-Muller codes generalise this idea to looking
at \emph{multivariate} polynomials. We fix a finite field $\F_q$, and
a subset $H \subset \F, |H| = h$ and an $m\in \N$. Our code would be a
map $C: \F^{h^m} \longrightarrow \F^{q^m}$.

Analogous to the Reed-Solomon Code, we define a injective map
$a \mapsto p_a(x_1, x_2, \dots, x_m)$, where $p \in \F[x_1,
x_2, \dots, x_m]$ is an $m$-variate polynomial over $\F$. Our code
would be $C(a) = (p_a(b))_{b \in \F}$. Although any efficiently
invertible map from $a$ to $p_a$ is fine, we may in particular define
$p_a$ using interpolation by imposing the following condition:
$$
\forall \alpha \in H^m\quad,\quad p_a(\alpha) = a_\alpha
$$
(It is clear that the components of $a$ can be indexed by elements of $H^m$.)

The interpolation would give $p_a$ of total degree at most $mh$.


\section{Sub-linear time decoding for Reed-Muller codes}

We said that our idea would be to ``code'' a worst-case hard function
$f$ to a function $f'$ such that $f$ is worst-case hard only if $f'$
is average-case hard. The Sub-linear time decoding for Reed-Muller
codes gives some hope in this direction.

A way of capturing the above technique is the following: Suppose $f'$
is not average case hard. So a small circuit $C'$ computes $f'$ at a
large fraction of inputs. One would then want to show that this
implies the existence of a circuit $C$ which computes $f$ everywhere.

The above discussion motivates the following problem: Given a circuit
$C'$ which can computes a function $f$ that agrees with a polynomial
$p$ on a given fraction say $\epsilon(n)$ of the input ($n$ is the
input length), can we construct a circuit $C$ which calculates $p$ on
all inputs. Ideally, we would like $\epsilon(n)$ to be
inverse-exponential to be helpful in the proof of the
Impagliazzo-Wigderson theorem.  However, we begin by taking
$\epsilon(n) = \frac{9}{10}$ to start with. We would see how to handle
smaller $\epsilon$ in the next talk, and then we would bootstrap the
following theorem that we prove now:

\begin{theorem}	\label{them:subLinDec}
Let $p: \F^m \mapsto \F$ be a polynomial such that $\deg p \leq k$ and
$C'$ be a circuit which computes $f: \F^m \mapsto \F$ where $f$ agrees
with $p$ on at least a $\frac{9}{10}$ fraction of inputs. If, further
$|\F|^m \geq 5k$ then there exits a circuit $C$ such that $|C| =
O(\text{poly }|C'|)$, which computes $p$ on all inputs.
\end{theorem}

\begin{proof}
Fix an input $x \in \F^m$, and choose $r \in \F^m$ uniformly at
random. Now pick $t > 5(k-1), t \in \N$. The reason for this choice
lies in a later part of this proof. Now define $q_x(y) = p(x +
yr)$. Clearly, if we can determine $q_x(0)$, $p(x) = q_x(0)$ is
determined as well. We now try to find $q_x$ using the Berlekemp-Welch
decoding algorithm(Algorithm \ref{algBW}).

Pick non-zero $a_i \in \F^m$ for $1\leq i\leq t$ uniformly at random.
As individual points, $x+a_ir$ is a random point in $\F^m$. So,
$E[\abs{\setdef{i}{f(x+a_ir) \neq q(a_i)}}] < \frac{t}{10}$.

Using Markov's inequality\footnote{$\Pr[X \geq
M] \leq \frac{E[X]}{M}$, where $X$ is a non-negative random
variable over an ordered set, and $M$ is a value in its domain.}, we
get
$$
\Pr_{r,a_1,\dots,a_t}\insquar{\abs{\setdef{i}{f(x + a_ir) \neq q(a_i)}} \geq \frac{3t}{10}} <
\frac{1}{3}
$$

Thus with probability at least $\frac{2}{3}$, at least $\frac{7t}{10}$
of the equations $f(x+a_ir) = q(a_i), 1\leq i \leq t$ hold.

Now, to use Berlekemp-Welch decoding we only need $\frac{3t}{10}
< \frac{d}{2}$, where $d$ is the distance of the code. Here, $d \leq t
- k + 1$, and hence $t > 5(k-1)$ suffices. Also for this reason we
required $|\F^m| > 5k$, as we required so many distinct non-zero
distinct elements from $\F^m$.

The circuit $C$ would thus proceed by evaluating $f$ using $C'$ on the
$t$ inputs, and then using Berlekemp-Welch decoding. We only need to
ensure that the elimination of random bits used in the computation
does not lead to a large increase in the size of this circuit. It is
easy to check that the success probability can be amplified enough to
let us fix a sequence of random bits that work for all inputs. This
would then give us a small circuit we are looking for. 
\end{proof}
%% Suppose we repeated the computation $N = m(t+1)\log_3 q$ times, then
%% the failure probability reduces to less than $P
%% = \frac{1}{q^{m(t+1)}}$, while the number of random field elements
%% required is now $Nm(t+1)$.

%% Call a sequence of random bits \emph{good} for the input $x$ if the BW
%% decoding in the above computation succeeds for this random choice
%% and \emph{bad} otherwise. The number of all such tuples is $ T =
%% q^{(t+1)Nm}$. As the failure probability is at most $P$, the
%% number of sets which are bad for some input cannot exceed $q^{m\inp{1
%% + N\inp{t+1}}}P$, which is less than $T$. Hence we can always choose a
%% tuple, which is of size $N\log q$ and which is good for all inputs. We
%% just hardwire this tuple into $C$. It is clear that $C$ satisfies the
%% required size bound.


\end{document}





