\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{hyperref}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{complexity}
\usepackage{graphicx}
\usepackage{palatino} %what a beautiful font!

\newcommand{\handout}[5]{
  \noindent
  \begin{center}
    \framebox[\textwidth]{
      \vbox{
        \hbox to \textwidth { {\bf #2}\hfill {\bf Computational Number Theory } }
        \vspace{4mm}
        \hbox to \textwidth { {\Large \hfill #5 \hfill} }
        \vspace{2mm}
        \hbox to \textwidth { {\em #3 \hfill #4} }
      }
    }
  \end{center}
  \vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{Instructor: #3}{Scribe: #4}{Lecture #1}}

\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}{Fact}
\newtheorem{subclaim}[theorem]{Subclaim}

% my custom commands
\newcommand{\inparen}[1]{\left(#1\right)}             %\inparen{x+y} is (x+y)
\newcommand{\inbrace}[1]{\left\{#1\right\}}           %\inbrace{x+y} is {x+y}
\newcommand{\insquar}[1]{\left[#1\right]}             %\insquar{x+y} is [x+y]
\newcommand{\inangle}[1]{\left\langle#1\right\rangle} %\inangle{A} is <A>
\newcommand{\abs}[1]{\left|#1\right|}                 %\abs{x} is |x|
\newcommand{\norm}[1]{\left\Vert#1\right\Vert}        %\norm{x} is ||x||
\newcommand{\union}{\cup}
\newcommand{\Union}{\bigcup}
\newcommand{\intersection}{\cap}
\newcommand{\Intersection}{\bigcap}
\newcommand{\super}[2]{#1^{\inparen{#2}}}             %\super{G}{i-1} is G^{(i-1)}
\newcommand{\setdef}[2]{\inbrace{{#1}\ : \ {#2}}}
\newcommand{\inv}[1]{#1^{-1}}
\newcommand{\inrpdt}[2]{\left\langle{#1},{#2}\right\rangle} %\inrpdt{x}{y} is <x,y>.
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\iso}{\cong}
\newcommand{\bea}[1]{\begin{eqnarray*} #1 \end{eqnarray*}}
\newcommand{\legendre}[2]{\inparen{\frac{#1}{#2}}}

% Commands specific to this file
% TODO: Find the right way to typeset group index
\DeclareMathOperator{\Sym}{Sym}
\newcommand{\gpidx}[2]{\insquar{#1 : #2}}             %\gpidx{H}{K} is [H : K]
\newcommand{\gpigs}[2]{\gpidx{\super{G}{#1}}{\super{G}{#2}}} %Group index of g super ...
\newcommand{\llhd}{\!\!\lhd\!\!\lhd}
\newcommand{\roundoff}[1]{\left\lfloor #1 \right\rceil}
% \newcommand{\ceil}[1]{\lceil #1 \rceil}
\newcommand{\floor}[1]{\lfloor #1 \rfloor}
\newcommand{\F}{\mathbb{F}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\Z}{\mathbb{Z}}
\renewcommand{\R}{\mathbb{R}}
\renewcommand{\C}{\mathbb{C}}

%for algorithms
\renewcommand{\algorithmicrequire}{\textbf{Input:}}

% Problems we look at
\newcommand{\GIso}{\lang{Graph\text{-}Iso}} %Without \text, ugly minus instead of hyphen.
\newcommand{\GAut}{\lang{Graph\text{-}Aut}}
\newcommand{\SStab}{\lang{Set\text{-}Stab}}

%for Quantum computing
\newcommand{\q}[1]{\left|#1\right\rangle}

\begin{document}
\lecture{13: Codes: An Introduction}{CS681}{Piyush P Kurur}{Ramprasad Saptharishi}

\section*{Overview}

Over the next few lectures we shall be looking at codes, linear codes in particular. In this class we shall look at the motivation and a glimpse of error detection. The details will be worked out in the lectures to come.

\section{Codes}

Suppose we have two parties, Alice and Bob, who wish to communicate over a channel which could potentially be unreliable. What we mean by unreliable is that some parts of the message could be corrupted or changed. We want to make sure that the recipient can detect such corruption, if any, or sometimes even recover the original message from the corrupted one. We can assume that the channel allows sending strings over a fixed finite alphabet (a finite field, bits, the English alphabet, etc.).
The two questions we need to address here are detection and correction.

\subsection{Block Codes}

What if Alice needs to send a message in, say, English, and the channel has a different alphabet, say binary strings? Then we need some way of converting strings over one alphabet into strings over another. This is achieved by encoding in blocks. In the example of English to binary, we could use ASCII codes: each letter would correspond to a {\em block} of letters of the channel alphabet. Of course, not all blocks of bits correspond to meaningful sentences or messages. A block code is, in general, just a subset of strings. To formally define it:
\begin{definition}
Let $\Sigma$ be the fixed finite alphabet for the channel of communication. A block code $\mathcal{C}$ of length $n$ over this alphabet is a subset of $\Sigma^n.$ Elements of $\mathcal{C}$ are called code words.
\end{definition}
\begin{definition}
For any two strings $x$ and $y$ of the same length, the Hamming distance between them is the number of indices at which $x$ and $y$ differ:
$$
d(x,y) = \abs{\setdef{i}{x_i \neq y_i}}
$$
\end{definition}
\begin{definition}
The distance of a code $\mathcal{C}$ is the minimum distance between distinct codewords. That is,
$$
d(\mathcal{C}) = \min_{\substack{x,y\in\mathcal{C}\\ x\neq y}}d(x,y)
$$
\end{definition}
In a sense, the distance of a code measures how much one needs to change to alter one codeword into another. For example, if the distance of a code is $5$, then there are two codewords (messages) $x$ and $y$ that differ in just $5$ places. Now suppose $x$ was sent through the channel and those $5$ positions were changed by the noise in the channel; then Bob would receive the message $y$ while Alice had sent $x.$ And since $y$ is also a codeword, Bob could completely misinterpret Alice's message. We would therefore like codes to have large distance, so that it takes a lot of corruption to actually alter one codeword into another.
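As a quick sanity check of these definitions, Hamming distance and code distance can be computed directly. This is a sketch; the toy code and the function names are chosen here for illustration, not taken from the lecture.

```python
from itertools import combinations

def hamming_distance(x, y):
    """Number of positions at which x and y differ."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

def code_distance(C):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(C, 2))

# A toy binary block code of length 5 (illustrative only).
C = ["00000", "11100", "00111"]
print(hamming_distance("00000", "11100"))  # 3
print(code_distance(C))                    # 3 (the minimum over all pairs)
```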
\begin{observation}
Let $d$ be the distance of a code $\mathcal{C}.$ Then the code is $(d-1)$-detectable: if the channel corrupts at most $d-1$ letters of the message, then the receiving party can detect that the code word has been corrupted.
\end{observation}
\begin{proof}
As we remarked earlier, since the distance is $d$, it takes at least $d$ corruptions to change one code word into another. Therefore, fewer than $d$ corruptions cannot change a code word $x$ into another codeword. Hence, if the string Bob received is a codeword, he knows for sure that it is exactly the string Alice sent.
\end{proof}
Therefore, if the channel changes at most $t$ letters, any code with distance at least $t+1$ allows error detection. But of course, if Bob received ``I hobe you'', he knows that the message was corrupted, but he has no way of determining whether the message was ``I love you'' or ``I hate you''. In order to correct $t$ errors, you need more than just a distance of $t+1.$
\begin{observation}
If $\mathcal{C}$ is a code of distance at least $2t+1$, then any message that is corrupted in at most $t$ places can be recovered. In other words, the code is $t$-correctable.
\end{observation}
Suppose Alice sent some codeword $x$, and say it was altered by the channel and Bob received it as $y.$ Given that at most $t$ letters were altered, we want to show that Bob can in fact recover the message. Since Bob knows that at most $t$ letters are corrupted, he looks at all codewords at a Hamming distance of at most $t$ from $y.$\footnote{This can be thought of as putting a ball of radius $t$ around $y$.} Now clearly $x$ is a codeword at Hamming distance at most $t$ from $y.$ If $x$ is the only such code word, then Bob knows for sure that the message Alice sent has to be $x.$ We claim that $x$ is indeed the only such codeword.
Suppose not; say there is some other codeword $x'\neq x$ at distance at most $t$ from $y.$ Since $x$ and $y$ differ in at most $t$ places, and $y$ and $x'$ in at most $t$, by the triangle inequality $x$ and $x'$ differ in at most $2t$ places. But this contradicts the assumption that the distance of the code is at least $2t+1.$

In other words, to move from one codeword to another through corruption, you need to corrupt at least $2t+1$ letters. Therefore, if you corrupt just $t$ places, you are definitely closer to where you started than to any other codeword.

But of course, it does not make sense to decode by looking at all code words at distance at most $t$ from $y.$ Even besides that, how do we even figure out whether a given word is a code word or not? So two important properties that we would want a code to have are efficient detection of codewords and efficient decoding.

\section{Linear Codes}

Recall that a block code $\mathcal{C}$ is an arbitrary subset of $\Sigma^n.$ Such codes could have no underlying structure, and that inherently makes detecting whether a string is a codeword hard. Hence comes the idea of linear codes. Since our alphabet is finite, we shall assume that the alphabet is in fact a finite field $\F_q.$ Now our space is $\F_q^n$, which is in fact a vector space over $\F_q$ of dimension $n.$ Instead of looking at arbitrary subsets of this space, linear codes restrict themselves to subspaces of $\F_q^n.$
\begin{definition}
An $\insquar{n,k,d}_q$ linear code is a code $\mathcal{C}$ that is a $k$-dimensional subspace of $\F_q^n$ and has distance $d.$ In particular, if $x,y \in \mathcal{C}$ then so is $\alpha x + \beta y$ for all $\alpha,\beta \in \F_q.$
\end{definition}
To intuitively understand the parameters: we encode messages of length $k$ as codewords of length $n$, and the code can correct up to $\floor{(d-1)/2}$ errors. Thus we want $k$ to be as close to $n$ as possible, and we also try to make $d$ large.
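The ball-of-radius-$t$ argument from the previous section can be turned into a brute-force decoder, which is exactly the kind of inefficient procedure one would like to avoid. A minimal sketch, with a toy code and names chosen here for illustration:

```python
def hamming_distance(x, y):
    """Number of positions at which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def decode(C, y, t):
    """Return the unique codeword within distance t of y, or None.

    By the argument above, if d(C) >= 2t+1 there is at most one
    codeword in the radius-t ball around y."""
    candidates = [x for x in C if hamming_distance(x, y) <= t]
    return candidates[0] if len(candidates) == 1 else None

C = ["000", "111"]          # distance 3, so t = 1 errors are correctable
print(decode(C, "010", 1))  # '000': the single flipped bit is corrected
print(decode(C, "110", 1))  # '111'
```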
It is not possible to increase both arbitrarily, but we want reasonable values of both. To see an example of a linear code, take our field to be $\F_2.$ Then $\mathcal{C} = \inbrace{(0,0,0),(1,1,1)}$ is a $\insquar{3,1,3}_2$ linear code.
\begin{definition}
The weight of a string $x$ is the number of indices at which $x$ is nonzero.
$$
wt(x) = \abs{\setdef{i}{x_i\neq 0}}
$$
\end{definition}
Then we have the following easy observation.
\begin{observation}
If $\mathcal{C}$ is a linear code, then
$$
d(\mathcal{C}) = \min_{0\neq x\in\mathcal{C}} wt(x)
$$
\end{observation}
\begin{proof}
By definition, $d(\mathcal{C}) = \min_{x\neq y} d(x,y)$, and $d(x,y) = wt(x-y).$ Since we are looking at a linear code, $x,y\in \mathcal{C}$ implies $x-y\in \mathcal{C}.$ Therefore, for every pair of codewords $x\neq y$ we have a corresponding codeword $0\neq z = x-y$ whose weight is exactly the distance between $x$ and $y$; conversely, every nonzero codeword $z$ gives the pair $(z,0)$ with $d(z,0) = wt(z).$
\end{proof}

\subsection{Detection}

Suppose we have a linear code $\mathcal{C}$, and given a string $x$ we want to check whether it is in the code or not. Since our code is a subspace of $\F_q^n$, we can represent $\mathcal{C}$ by a basis. Let us say we are looking at $[n,k,\_]_q$ codes, and let our basis be $\inbrace{b_1, b_2, \cdots, b_k}$ where each $b_i \in \F_q^n.$

The idea is to construct a matrix $H$ such that $Hx = \bar{0}$ if and only if $x\in \mathcal{C}.$ In terms of transformations, we want a linear map $H$ such that the kernel of this map is precisely (nothing more, nothing less) $\mathcal{C}.$ The question is, how do we find such a matrix?

We first find a transformation that achieves what we want, and then figure out what the matrix of that transformation should look like. We have a basis $\inbrace{b_1,b_2,\cdots, b_k}$ for our code. Let us first extend this to a basis $\inbrace{b_1, b_2, \cdots, b_n}$ of $\F_q^n.$ Thus, every vector $v\in \F_q^n$ can be written as a unique linear combination of the $b_i$s.
We do the obvious transformation: map each $b_i$ with $i\leq k$ to $0$, and fix the remaining basis elements.
\bea{
v & = & \alpha_1 b_1 + \alpha_2 b_2 + \cdots +\alpha_k b_k + \alpha_{k+1}b_{k+1} + \cdots + \alpha_n b_n\\
Hv & = & \alpha_{k+1}b_{k+1} + \cdots + \alpha_n b_n
}
Now a vector $v$ is in the kernel of $H$ if and only if all the $\alpha_i$ with $i>k$ are zero. In that case $v$ is a linear combination of just the basis elements of $\mathcal{C}$, and therefore must itself be a codeword.

Now comes the question of how to compute the matrix of $H.$ Suppose it turns out that the basis we chose was actually the standard basis $\inbrace{e_1,e_2, \cdots, e_n}.$ What can we then say about the matrix $H$? Clearly, it should map every vector $(\alpha_1, \alpha_2, \cdots, \alpha_n)$ to $(0,0,\cdots, 0,\alpha_{k+1},\cdots, \alpha_n).$ And this matrix is just
$$
\hat{H} = \insquar{\begin{array}{cc} 0 & 0 \\ 0 & I \end{array} }_{n\times n}
$$
where $I$ is the identity matrix of order $n-k.$ But it is unlikely that we start with such a nice basis. The good news is that we can easily move from one basis to another. We just want a way to send each $b_i$ to $e_i$, so that we can then use the transformation $\hat{H}$ to send it to $0$ if $i\leq k$ and keep it non-zero otherwise.

Instead of sending $b_i$ to $e_i$, the other direction is easier to compute. What if we ask for a matrix $B$ such that $Be_i = b_i$? This is easy, because $Be_i$ is just the $i$-th column of $B.$ Thus the matrix is just $\insquar{b_1\ b_2\ \cdots\ b_n}$ where each $b_i$ is written as a column vector; place the basis elements side by side as columns and that is the transformation. Now that we have a matrix $B$ that sends $e_i$ to $b_i$, how do we get a matrix that sends $b_i$ to $e_i$? Take $B^{-1}$! Now consider $\hat{H}B^{-1}$ as a transformation: $\hat{H}B^{-1}b_i = \hat{H}e_i$, which is $0$ if $i\leq k$ and $e_i$ otherwise.
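This construction can be checked concretely over $\F_2$ for the $[3,1,3]_2$ repetition code. The following sketch is illustrative, not from the lecture: the code basis $b_1 = (1,1,1)$, its extension by $e_2, e_3$, and the helper names are all choices made here.

```python
def gf2_inv(M):
    """Invert a square 0/1 matrix over F_2 by Gaussian elimination."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] == 1)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] == 1:
                A[r] = [(a + b) % 2 for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def matmul(X, Y):
    """Matrix product over F_2."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]

n, k = 3, 1
B = [[1, 0, 0],   # columns are b1 = (1,1,1), b2 = e2, b3 = e3
     [1, 1, 0],
     [1, 0, 1]]
# H_hat: zero block of size k, then the identity of order n-k.
H_hat = [[int(i == j and i >= k) for j in range(n)] for i in range(n)]
H = matmul(H_hat, gf2_inv(B))

# The codeword (1,1,1) lies in the kernel; the non-codeword e1 does not.
print(matmul(H, [[1], [1], [1]]))  # [[0], [0], [0]]
print(matmul(H, [[1], [0], [0]]))  # nonzero
```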
Thus this matrix $\hat{H}B^{-1}$ is the matrix we were after: a matrix whose kernel is precisely the code $\mathcal{C}.$ \end{document}