Error-Correcting Codes

After text has been compressed, and even more so after it has been encrypted, the resulting output is random in appearance. This makes it easy to make mistakes in handling it, and hard to see errors if they are introduced through electrical noise on a communications link or any similar cause.

Thus, even if redundancy is removed as far as is possible before encryption by means of data compression, after encryption is complete, it becomes desirable to put redundancy back in, to give resistance to errors.

There are mathematical techniques, given an arbitrary text, to add more symbols to it in a way that gives resistance to a specific number of errors. Since they depend only on the input text, if the ciphertext is used as input, this kind of added redundancy does not help the cryptanalyst, except by giving both him and the legitimate recipient a copy of the ciphertext that is more likely to be accurate.

Thus, telegraph codes were designed, by the use of tables such as the one below, so that an error in a single letter of a codeword, or the switching of two adjacent letters in a codeword, would not result in another valid codeword:

AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AX AY AZ A& 
BB BC BD BE BF BG BH BI BJ BK BL BM BN BO BP BQ BR BS BT BU BV BW BX BY BZ B& BA 
CC CD CE CF CG CH CI CJ CK CL CM CN CO CP CQ CR CS CT CU CV CW CX CY CZ C& CA CB 
DD DE DF DG DH DI DJ DK DL DM DN DO DP DQ DR DS DT DU DV DW DX DY DZ D& DA DB DC 
EE EF EG EH EI EJ EK EL EM EN EO EP EQ ER ES ET EU EV EW EX EY EZ E& EA EB EC ED 
FF FG FH FI FJ FK FL FM FN FO FP FQ FR FS FT FU FV FW FX FY FZ F& FA FB FC FD FE 
GG GH GI GJ GK GL GM GN GO GP GQ GR GS GT GU GV GW GX GY GZ G& GA GB GC GD GE GF 
HH HI HJ HK HL HM HN HO HP HQ HR HS HT HU HV HW HX HY HZ H& HA HB HC HD HE HF HG 
II IJ IK IL IM IN IO IP IQ IR IS IT IU IV IW IX IY IZ I& IA IB IC ID IE IF IG IH 
JJ JK JL JM JN JO JP JQ JR JS JT JU JV JW JX JY JZ J& JA JB JC JD JE JF JG JH JI 
KK KL KM KN KO KP KQ KR KS KT KU KV KW KX KY KZ K& KA KB KC KD KE KF KG KH KI KJ 
LL LM LN LO LP LQ LR LS LT LU LV LW LX LY LZ L& LA LB LC LD LE LF LG LH LI LJ LK 
MM MN MO MP MQ MR MS MT MU MV MW MX MY MZ M& MA MB MC MD ME MF MG MH MI MJ MK ML 
NN NO NP NQ NR NS NT NU NV NW NX NY NZ N& NA NB NC ND NE NF NG NH NI NJ NK NL NM 
OO OP OQ OR OS OT OU OV OW OX OY OZ O& OA OB OC OD OE OF OG OH OI OJ OK OL OM ON 
PP PQ PR PS PT PU PV PW PX PY PZ P& PA PB PC PD PE PF PG PH PI PJ PK PL PM PN PO 
QQ QR QS QT QU QV QW QX QY QZ Q& QA QB QC QD QE QF QG QH QI QJ QK QL QM QN QO QP 
RR RS RT RU RV RW RX RY RZ R& RA RB RC RD RE RF RG RH RI RJ RK RL RM RN RO RP RQ 
SS ST SU SV SW SX SY SZ S& SA SB SC SD SE SF SG SH SI SJ SK SL SM SN SO SP SQ SR 
TT TU TV TW TX TY TZ T& TA TB TC TD TE TF TG TH TI TJ TK TL TM TN TO TP TQ TR TS 
UU UV UW UX UY UZ U& UA UB UC UD UE UF UG UH UI UJ UK UL UM UN UO UP UQ UR US UT 
VV VW VX VY VZ V& VA VB VC VD VE VF VG VH VI VJ VK VL VM VN VO VP VQ VR VS VT VU 
WW WX WY WZ W& WA WB WC WD WE WF WG WH WI WJ WK WL WM WN WO WP WQ WR WS WT WU WV 
XX XY XZ X& XA XB XC XD XE XF XG XH XI XJ XK XL XM XN XO XP XQ XR XS XT XU XV XW 
YY YZ Y& YA YB YC YD YE YF YG YH YI YJ YK YL YM YN YO YP YQ YR YS YT YU YV YW YX 
ZZ Z& ZA ZB ZC ZD ZE ZF ZG ZH ZI ZJ ZK ZL ZM ZN ZO ZP ZQ ZR ZS ZT ZU ZV ZW ZX ZY 

M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L    AA BB CC DD EE FF GG HH II JJ KK LL MM NN OO PP QQ RR SS TT UU VV WW XX YY ZZ 
N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M    A& BA CB DC ED FE GF HG IH JI KJ LK ML NM ON PO QP RQ SR TS UT VU WV XW YX ZY 
O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N    AZ B& CA DB EC FD GE HF IG JH KI LJ MK NL OM PN QO RP SQ TR US VT WU XV YW ZX 
P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O    AY BZ C& DA EB FC GD HE IF JG KH LI MJ NK OL PM QN RO SP TQ UR VS WT XU YV ZW 
Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P    AX BY CZ D& EA FB GC HD IE JF KG LH MI NJ OK PL QM RN SO TP UQ VR WS XT YU ZV 
R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q    AW BX CY DZ E& FA GB HC ID JE KF LG MH NI OJ PK QL RM SN TO UP VQ WR XS YT ZU 
S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R    AV BW CX DY EZ F& GA HB IC JD KE LF MG NH OI PJ QK RL SM TN UO VP WQ XR YS ZT 
T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S    AU BV CW DX EY FZ G& HA IB JC KD LE MF NG OH PI QJ RK SL TM UN VO WP XQ YR ZS 
U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T    AT BU CV DW EX FY GZ H& IA JB KC LD ME NF OG PH QI RJ SK TL UM VN WO XP YQ ZR 
V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U    AS BT CU DV EW FX GY HZ I& JA KB LC MD NE OF PG QH RI SJ TK UL VM WN XO YP ZQ 
W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V    AR BS CT DU EV FW GX HY IZ J& KA LB MC ND OE PF QG RH SI TJ UK VL WM XN YO ZP 
X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W    AQ BR CS DT EU FV GW HX IY JZ K& LA MB NC OD PE QF RG SH TI UJ VK WL XM YN ZO 
Y  Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X    AP BQ CR DS ET FU GV HW IX JY KZ L& MA NB OC PD QE RF SG TH UI VJ WK XL YM ZN 
Z  &  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y    AO BP CQ DR ES FT GU HV IW JX KY LZ M& NA OB PC QD RE SF TG UH VI WJ XK YL ZM 
&  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z    AN BO CP DQ ER FS GT HU IV JW KX LY MZ N& OA PB QC RD SE TF UG VH WI XJ YK ZL 
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &    AM BN CO DP EQ FR GS HT IU JV KW LX MY NZ O& PA QB RC SD TE UF VG WH XI YJ ZK 
B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A    AL BM CN DO EP FQ GR HS IT JU KV LW MX NY OZ P& QA RB SC TD UE VF WG XH YI ZJ 
C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B    AK BL CM DN EO FP GQ HR IS JT KU LV MW NX OY PZ Q& RA SB TC UD VE WF XG YH ZI 
D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C    AJ BK CL DM EN FO GP HQ IR JS KT LU MV NW OX PY QZ R& SA TB UC VD WE XF YG ZH 
E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D    AI BJ CK DL EM FN GO HP IQ JR KS LT MU NV OW PX QY RZ S& TA UB VC WD XE YF ZG 
F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E    AH BI CJ DK EL FM GN HO IP JQ KR LS MT NU OV PW QX RY SZ T& UA VB WC XD YE ZF 
G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F    AG BH CI DJ EK FL GM HN IO JP KQ LR MS NT OU PV QW RX SY TZ U& VA WB XC YD ZE 
H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G    AF BG CH DI EJ FK GL HM IN JO KP LQ MR NS OT PU QV RW SX TY UZ V& WA XB YC ZD 
I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H    AE BF CG DH EI FJ GK HL IM JN KO LP MQ NR OS PT QU RV SW TX UY VZ W& XA YB ZC 
J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I    AD BE CF DG EH FI GJ HK IL JM KN LO MP NQ OR PS QT RU SV TW UX VY WZ X& YA ZB 
K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J    AC BD CE DF EG FH GI HJ IK JL KM LN MO NP OQ PR QS RT SU TV UW VX WY XZ Y& ZA 
L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  &  A  B  C  D  E  F  G  H  I  J  K    AB BC CD DE EF FG GH HI IJ JK KL LM MN NO OP PQ QR RS ST TU UV VW WX XY YZ Z& 

A miniature version of such a code construction table is shown in The Codebreakers by David Kahn, but here I've put the last two letters of the codeword in alphabetical order in the rows of the square on the lower right, since a code compiler would want to generate codewords in alphabetical order. The row of the top square, and the column of the square on the lower right, which contain digraphs beginning with &, are omitted, since those are unused if codewords from a 26-letter alphabet are wanted.

Valid codewords consist of two letters from the top, a letter from the lower left, and two letters on the lower right, such that the single letter shares the same column as the pair of letters from the top and the same row as the pair of letters from the lower right. The use of an extra dummy letter & is required to avoid codewords differing only by a swap of two adjacent letters. The middle square can start with any letter; here it is started with M to indicate that this feature can be used to produce codewords different from those in someone else's code.

For example, given a string of bits, a single bit which contains the parity of that string of bits can be appended to it. In that case, if the result is transmitted, and exactly one bit is in error in what is received, the parity will be wrong, and the fact that an error has taken place will be detected.

Another simple example would be to transmit one's message three times. For every group of three corresponding bits from the three copies of the message, if exactly one error takes place in communications, that error can be corrected, because, of the three received copies of that bit, two will be correct, and only one will be wrong.
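
In a program, the corresponding majority-vote decision takes only a line or two; the following C fragment is merely an illustration of the idea, not something from the original text:

/* Of three received copies of a bit, take whichever value occurs at least
   twice; a single error among the three copies is thereby outvoted. */
int vote3(int a, int b, int c) {
    return (a & b) | (a & c) | (b & c);
}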

These are the two trivial examples of perfect codes: adding a parity bit, and repeating each bit a certain number of times.

If we confine ourselves to the even parity form of a parity bit, both these codes can be represented by matrices:

A parity bit:

1 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 1
0 0 0 1 0 0 0 0 1
0 0 0 0 1 0 0 0 1
0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 1 1

Triple repetition:

1 1 1

In general, matrix multiplication works like this: introduce the numbers which are the elements of the vector being multiplied along the left side of the matrix, starting with the first one at the top; for each element of the corresponding row, calculate the product of the number introduced from the side with the number in the matrix; then, after all the products have been calculated, total them down the columns to obtain the output.

For error-correcting codes illustrated by matrices, it is the bits of the message that are introduced from the left. Then, an AND is performed (rather than a multiplication) between them and the bits of the matrix, and the results are accumulated by an XOR going down the columns (rather than addition).

Illustrating this in the case of a parity bit applied to a nybble:

        1   0   0   0   1
  1 -->  1   0   0   0   1

        0   1   0   0   1
  0 -->  0   0   0   0   0

        0   0   1   0   1
  1 -->  0   0   1   0   1

        0   0   0   1   1
  1 -->  0   0   0   1   1
        -------------------
         1   0   1   1   1

As this is a code applied to a binary signal, all arithmetic is done modulo 2 (hence, 1+1+1 = 1 instead of 3).
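
The rule just illustrated is simple enough to express directly in a few lines of code. The following C sketch is my own illustration of the AND-and-XOR procedure, not something from the original text; it reproduces the parity-bit-on-a-nybble example worked out above.

#include <stdio.h>

/* Multiply a message vector by a code matrix over GF(2):
   AND each message bit with its row of the matrix, then XOR the partial
   results down the columns (addition modulo 2). */
void encode_gf2(const int *msg, int rows, const int *matrix, int cols, int *out) {
    for (int j = 0; j < cols; j++) out[j] = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            out[j] ^= msg[i] & matrix[i * cols + j];
}

int main(void) {
    int matrix[4 * 5] = {        /* the parity-bit matrix for a nybble, as above */
        1,0,0,0,1,
        0,1,0,0,1,
        0,0,1,0,1,
        0,0,0,1,1
    };
    int msg[4] = {1, 0, 1, 1}, out[5];
    encode_gf2(msg, 4, matrix, 5, out);
    for (int j = 0; j < 5; j++) printf("%d ", out[j]);   /* prints 1 0 1 1 1 */
    printf("\n");
    return 0;
}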

Using only parity bits can still allow errors to be corrected. One way to do this is to have both a parity bit for each byte of data, and a parity byte that results from the XOR of all the data bytes. If there is one error, the byte it is in is indicated by the parity bit for that byte, and the bit it is in is indicated by which bit of the final parity byte shows the wrong parity.
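
As a concrete sketch of that rule (the function names here are my own invention, not part of any standard), the following C fragment locates and repairs a single flipped data bit, given the per-byte parity bits and the overall parity byte:

#include <stdio.h>
#include <stdint.h>

static int byte_parity(uint8_t b) {                 /* even parity of one byte */
    int p = 0;
    for (; b; b >>= 1) p ^= b & 1;
    return p;
}

/* Returns 1 if a data bit was corrected, 0 if the data was already intact,
   and -1 if the damage is beyond what one row-and-column check can repair. */
int correct_single_error(uint8_t *data, int n, const int *parity_bits, uint8_t parity_byte) {
    int bad_byte = -1;
    uint8_t column = 0;
    for (int i = 0; i < n; i++) {
        column ^= data[i];
        if (byte_parity(data[i]) != parity_bits[i]) {
            if (bad_byte != -1) return -1;          /* two bytes failed their parity check */
            bad_byte = i;                           /* this byte's parity bit disagrees */
        }
    }
    uint8_t diff = column ^ parity_byte;            /* bit positions whose column parity is wrong */
    if (bad_byte == -1) return 0;                   /* data intact (any error was in the parity byte) */
    if (diff == 0) return 0;                        /* data intact (the error was in a parity bit) */
    if (diff & (diff - 1)) return -1;               /* more than one column failed */
    data[bad_byte] ^= diff;                         /* flip the bit at the failing row and column */
    return 1;
}

int main(void) {
    uint8_t data[4] = { 0x12, 0x34, 0x56, 0x78 };
    int parity_bits[4];
    uint8_t parity_byte = 0;
    for (int i = 0; i < 4; i++) { parity_bits[i] = byte_parity(data[i]); parity_byte ^= data[i]; }
    data[2] ^= 0x10;                                /* one bit flipped in transit */
    printf("%d, byte 2 = %02X\n", correct_single_error(data, 4, parity_bits, parity_byte), data[2]);
    return 0;
}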

A simple way to obtain row and column parity for a sequence of characters on a computer would be to add three parity characters to a line of 49 7-bit characters, each parity character formed by XORing the successive characters into an accumulator that is rotated by a different amount after each character is XORed in.

This would allow one single-bit error in a line to be found and therefore corrected, as follows:

33333X3-----*------*------*------*------*------*- ==?
----2-13333*3*----2-1----2-1----2-1----2-1----2-1 ===
1--2---1--2---*33*3331--2---1--2---1--2---1--2--- ?==
-12-----12-----12----3**3333-12-----12-----12---- ===
-21-----21-----21-----21----3**3333-21-----21---- ===
2--1---2--1---2--1---2--1---2--1---*33*3332--1--- =?=
----1-2----1-2----1------1-2----1-2----1-23333*3* ===

The X is at the only point where the three lines of bits marked by 1, 2, and 3 all coincide. Note that the asterisks mark points where two of them cross. Assuming that the rotations which follow the XORing in of each byte also take place after the last byte, the question marks indicate where the three parity characters would show the wrong parity if there were an error at the point marked by the X.

It is easier to do this for 7-bit characters because 7 is an odd number, but it can also be done for 8-bit bytes, simply by rotating only one accumulator by one bit for each byte, and not rotating one accumulator at all, as follows:

33313333---1-------1-------1-------1-------1-------1-------1---- ==?
----1---3333*333----1-------1-------1-------1-------1-------1--- ===
22222*2222222*22*****X**22222*2222222*2222222*2222222*2222222*22 =?=
------1-------1-------1-333333*3------1-------1-------1-------1- ===
-------1-------1-------1-------13333333*-------1-------1-------1 ===
1-------1-------1-------1-------1-------*33333331-------1------- ?==
-1-------1-------1-------1-------1-------1------3*333333-1------ ===
--1-------1-------1-------1-------1-------1-------1-----33*33333 ===

However, approaches based only on using parity bits are unsophisticated, and more efficient ways of correcting and detecting multiple errors can be found if more sophisticated codes are used.

Another class of error-correcting codes is the Hamming codes. No longer trivial, these codes are also perfect, which means they add exactly enough redundancy to a group of input bits to detect and correct a certain number of errors.

Note that this perfection applies to one particular situation, where errors occur randomly among bits. Where there is a tendency for bits close together in a message to be in error together, this being called a burst error, other strategies are needed. One common way of dealing with this is simply to apply conventional error-correcting codes to widely-separated bits at equal intervals in a message. This, or the strategy of applying the error-correcting code first, and then spreading out the bits of each output group, is called interleaving.
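
The second strategy amounts to nothing more than writing the encoded bits into a rectangle by rows and reading them out by columns; the following C fragment (a sketch of my own, with the block size left to the caller) shows both directions:

/* Block interleaver: write rows*cols symbols in by rows, read them out by
   columns, so that a burst of adjacent channel errors is spread across many
   different code blocks. */
void interleave(const int *in, int rows, int cols, int *out) {
    int k = 0;
    for (int c = 0; c < cols; c++)
        for (int r = 0; r < rows; r++)
            out[k++] = in[r * cols + c];
}

/* The inverse operation, applied by the receiver before error correction. */
void deinterleave(const int *in, int rows, int cols, int *out) {
    int k = 0;
    for (int c = 0; c < cols; c++)
        for (int r = 0; r < rows; r++)
            out[r * cols + c] = in[k++];
}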

In a Hamming code, extra parity bits are appended after the group of bits being encoded; each bit position in the input is assigned a distinct combination of those parity bits in which two or more of them are equal to 1, and each parity bit is the XOR of the input bits to whose combinations it belongs. Hence, changing one bit in the input always changes at least three bits in the word with error-correction added. Since no two rows in the error-checking part of the matrix are identical, changing two bits in the input still results in a three-bit change at least; thus, the number of bits difference (or the Hamming distance) between possible outputs (valid codewords) is a minimum of three.

A parity bit can be added to a Hamming code; the result is another perfect code, this time with a four-bit change resulting from any change in the input.

The Hamming distance between valid codewords tells us how much error-checking we have. If there is no error-checking added, each different input still gives a different output, for a distance of 1. If the distance is 2, one error can be detected, as is the case for the code that just adds one parity bit. If the distance is 3, as for repeating a bit three times, or the basic Hamming code without a parity bit added, one error can be corrected.

The error-correcting capabilities of an error-correcting code are given by the Hamming distance between codewords, but it is possible to choose how to make use of them. For example, if we repeat every bit three times, we can either choose to accept 011 as standing for 1, hence correcting one error when it occurs, but being fooled if two errors happen, or we can choose to discard it, accepting only 000 and 111. Thus, a Hamming distance of 3 can be used either to correct one error, or to detect two errors. A distance of 4, such as obtained by a Hamming code with a parity bit, or by repeating a bit four times, allows correcting one error and detecting two errors, or detecting up to three errors.

Some Hamming codes are:

1 0 0 0 0 1 1
0 1 0 0 1 0 1
0 0 1 0 1 1 0
0 0 0 1 1 1 1

1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 1 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 1 0 0 0 0 0 0 0 0 0 1 1 0
0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
0 0 0 0 1 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 1 0 0 0 1 0 1 1
0 0 0 0 0 0 0 0 1 0 0 1 1 0 1
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1

Since a Hamming code, as shown, can only correct one error, decoding is simple: the check bits are recomputed from the data bits as received, and XORed with the check bits as received. If the result, the syndrome, is all zeroes, the block is accepted as it stands. If it contains only a single 1, the corresponding check bit was itself in error, and the data bits are correct as received. Otherwise, the syndrome equals the pattern of check bits assigned to exactly one of the data bit positions, and the data bit in that position is inverted.

This, of course, assumes that the transmitted block actually was subjected to no more than one error. When a parity bit is added to the Hamming code, decoding is modified slightly: the overall parity of the received block is examined first. If it is wrong, a single error is assumed, and it is located and corrected as above (a wrong overall parity with a zero syndrome meaning the parity bit itself was in error); if the overall parity is correct, but the syndrome is not zero, then two errors have taken place, and they can be detected, but not corrected.
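
As a concrete illustration of the encoding rule and of this decoding procedure, here is a small C sketch (my own, not taken from any source) built around the first of the Hamming code matrices shown above, the one with four data bits and three check bits:

#include <stdio.h>

/* Check-bit patterns assigned to the four data bits: the last three columns
   of the (7,4) Hamming code matrix shown above. */
static const int P[4][3] = {
    {0,1,1},
    {1,0,1},
    {1,1,0},
    {1,1,1}
};

/* Encode 4 data bits into a 7-bit codeword: data bits first, then check bits. */
void hamming74_encode(const int d[4], int cw[7]) {
    for (int i = 0; i < 4; i++) cw[i] = d[i];
    for (int j = 0; j < 3; j++) {
        int c = 0;
        for (int i = 0; i < 4; i++) c ^= d[i] & P[i][j];   /* AND, then XOR down the column */
        cw[4 + j] = c;
    }
}

/* Correct at most one error in place; returns the position corrected, or -1 if none. */
int hamming74_correct(int cw[7]) {
    int s[3];
    for (int j = 0; j < 3; j++) {                   /* syndrome: recomputed XOR received checks */
        s[j] = cw[4 + j];
        for (int i = 0; i < 4; i++) s[j] ^= cw[i] & P[i][j];
    }
    if (!s[0] && !s[1] && !s[2]) return -1;          /* all zero: accept the block as it stands */
    if (s[0] + s[1] + s[2] == 1) {                   /* a single 1: a check bit itself was hit */
        int k = s[1] ? 1 : (s[2] ? 2 : 0);
        cw[4 + k] ^= 1;
        return 4 + k;
    }
    for (int i = 0; i < 4; i++)                      /* otherwise the syndrome names a data bit */
        if (s[0] == P[i][0] && s[1] == P[i][1] && s[2] == P[i][2]) {
            cw[i] ^= 1;
            return i;
        }
    return -1;                                       /* not reached for a single error */
}

int main(void) {
    int d[4] = {1, 0, 1, 1}, cw[7];
    hamming74_encode(d, cw);
    cw[2] ^= 1;                                      /* inject one error */
    int pos = hamming74_correct(cw);
    printf("corrected position %d: %d%d%d%d\n", pos, cw[0], cw[1], cw[2], cw[3]);
    return 0;
}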

These are just one particular form for each of the three Hamming codes shown. Rearranging the rows or columns, or certain other modifications, will not change the ability of the code to be used to detect or correct errors. Replacing one row of the matrix by the XOR of itself and another row is allowed, because that will result in the matrix generating exactly the same set of codewords, but they will be generated in some cases for different inputs.

There are a number of other codes used for error detection and correction. Examples of codes used in the same kind of form as used with Hamming codes include Hadamard codes and Reed-Muller codes. These codes all add slightly more redundancy than is required for the error-checking they seek to attain.

An example of a Hadamard code looks like this:

1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
1 0 1 0 1 0 1 0 0 1 0 1 0 1 0 1
1 1 0 0 1 1 0 0 0 0 1 1 0 0 1 1
1 0 0 1 1 0 0 1 0 1 1 0 0 1 1 0
1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1
1 0 1 0 0 1 0 1 0 1 0 1 1 0 1 0
1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0
1 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1

These codes are obtained by a procedure similar to that used to create fractal designs:

                       1 1   1 1
   1 1                 1 0   1 0
   1 0  - becomes -->
                       1 1   0 0
                       1 0   0 1

applied as many times as desired, and then the result has its complement appended to it. Mariner 9 used the code of this type that occupies a 32 by 64 matrix. This method only generates Hadamard codes of orders which are powers of 2. Hadamard codes and matrices of many other orders which are multiples of 4 are also known, and it is conjectured, but not yet proven, that one exists for every such order. These Hadamard codes are obtained by other, more difficult methods.
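
The doubling step is easy to mechanize. The following C sketch (my own illustration, not from the original text) starts from a single 1, applies the doubling construction a chosen number of times, and prints each row with its complement appended; with three doublings it reproduces the 8 by 16 example above.

#include <stdio.h>

#define MAX 32                                       /* enough for up to five doublings */

/* One doubling step replaces H by  | H  H |
                                    | H ~H |  (written with 1 and 0 for +1 and -1). */
void hadamard(int n_doublings, int h[MAX][MAX], int *size) {
    int n = 1;
    h[0][0] = 1;
    for (int d = 0; d < n_doublings; d++) {
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++) {
                h[r][c + n]     = h[r][c];           /* top right: a copy            */
                h[r + n][c]     = h[r][c];           /* bottom left: a copy          */
                h[r + n][c + n] = 1 - h[r][c];       /* bottom right: the complement */
            }
        n *= 2;
    }
    *size = n;
}

int main(void) {
    int h[MAX][MAX], n;
    hadamard(3, h, &n);                              /* three doublings: an 8 by 8 matrix */
    for (int r = 0; r < n; r++) {
        for (int c = 0; c < n; c++) printf("%d ", h[r][c]);
        for (int c = 0; c < n; c++) printf("%d ", 1 - h[r][c]);   /* complement appended */
        printf("\n");
    }
    return 0;
}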

Only one other perfect code for binary signals is known, the binary Golay code. It takes a 12-bit input, and adds 11 error-checking bits to it. Like the Hamming codes, an extra parity-check bit can be added, and the code remains perfect.

A modified form of the Golay code with parity bit added, so that the parity bit is no longer explicitly visible, is shown in a book by two of the three co-authors of the Handbook of Applied Cryptography, and an equivalent form, modified by some rearrangement of rows and columns (to obtain a shifting of the cyclic 11 by 11 portion of the error-checking matrix, and to put the row and column of 1 bits in a more conventional position), is shown here:

1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1
0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1
0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 0 1
0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 1 0 1 1
0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 1
0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1
0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1
0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 1 1 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 1
0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0

In this code, the minimum Hamming distance between valid codewords is eight; without the parity bit, it would be seven.

A distance of seven allows either correcting three errors, correcting two errors and detecting two errors, correcting one error and detecting four errors, or detecting six errors.

A distance of eight allows either correcting three errors and detecting one error, correcting two errors and detecting three errors, correcting one error and detecting five errors, or detecting seven errors.

Examine the error-checking portion, the second square half, of the matrix for the Golay code shown above. The right and bottom edges consist of all ones except for a zero where they meet.

The remaining 11 by 11 square consists of the sequence 10110111000 repeated in every row, but shifted one space to the left in each row. This sequence contains exactly six one bits, an even number.

The matrix is symmetric, hence unchanged when flipped around the diagonal running from the top left to the bottom right.

Hence, the fact that every row contains an odd number of 1 bits, that the last row ANDed with any other row produces a row with six one bits, and that any two of the first 11 rows, when combined by an AND, produce a rotated version of one of the following strings:

10110111000 and 01101110001 = 00100110000 (3 bits)
10110111000 and 11011100010 = 10010100000 (3 bits)
10110111000 and 10111000101 = 10110000000 (3 bits)
10110111000 and 01110001011 = 00110001000 (3 bits)
10110111000 and 11100010110 = 10100010000 (3 bits)
10110111000 and 11000101101 = 10000101000 (3 bits)
10110111000 and 10001011011 = 10000011000 (3 bits)
10110111000 and 00010110111 = 00010110000 (3 bits)
10110111000 and 00101101110 = 00100101000 (3 bits)
10110111000 and 01011011100 = 00010011000 (3 bits)

each taken together with a single additional 1 bit from the final column, means that, using modulo 2 arithmetic, the error-checking matrix in the code as represented here is its own inverse. (If it weren't for that extra zero, a different decoding matrix, and a slightly more complicated version of the decoding procedure given below, would be required.) Because of the symmetry around the diagonal, this is true both in the usual convention for matrix multiplication (numbers go in from the top, and come out on the right) and the one used for error-correcting codes (numbers go in from the left, and come out on the bottom). For more information on matrix multiplication, see the section concerning the Hill Cipher.

This helps to make it practical to check a codeword formed by this code, and then transmitted, for errors. The following procedure will find the correct original 12-bit input if there were three or fewer errors in the transmitted 24-bit block, or it will fail if there were four errors. (With more than four errors, of course, it can be fooled.) Call the first twelve received bits the data half, the last twelve the check half, and the error-checking half of the matrix B:

The data half is multiplied by B, and the result is XORed with the check half, producing the syndrome.
If the syndrome contains three or fewer 1 bits, the errors were confined to the check half, and the data half is accepted as received.
Otherwise, if the XOR of the syndrome with some row of B contains two or fewer 1 bits, the bit of the data half corresponding to that row is inverted, and the data half is then accepted.
Otherwise, the syndrome is multiplied by B, producing a second syndrome.
If the second syndrome contains three or fewer 1 bits, it is XORed with the data half, and the result is accepted.
Otherwise, if the XOR of the second syndrome with some row of B contains two or fewer 1 bits, that value is XORed with the data half, and the result is accepted.
If none of these conditions is met, at least four errors have occurred, and the block cannot be corrected.
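
Here is that procedure expressed as a C sketch (my own illustration, not code from any of the sources mentioned), with the data and check halves packed into 12-bit integers, the leftmost bit of each displayed row being the most significant bit:

#include <stdio.h>

/* The 12 by 12 error-checking matrix B of the code shown above (its rightmost
   twelve columns).  B is symmetric and is its own inverse modulo 2. */
static const char *B_rows[12] = {
    "101101110001", "011011100011", "110111000101", "101110001011",
    "011100010111", "111000101101", "110001011011", "100010110111",
    "000101101111", "001011011101", "010110111001", "111111111110"
};

static unsigned B[12];                               /* each row packed into 12 bits */

static int weight(unsigned v) {                      /* number of 1 bits */
    int w = 0;
    for (; v; v >>= 1) w += v & 1;
    return w;
}

/* Multiply a 12-bit row vector by B: AND each bit with its row, XOR the rows. */
static unsigned mul_B(unsigned v) {
    unsigned out = 0;
    for (int i = 0; i < 12; i++)
        if (v & (1u << (11 - i))) out ^= B[i];
    return out;
}

/* Decode a received block given as its data half and check half; returns the
   corrected 12-bit data, or -1 if four or more errors prevent correction. */
int golay_decode(unsigned data, unsigned check) {
    unsigned s = mul_B(data) ^ check;                /* first syndrome */
    if (weight(s) <= 3) return data;                 /* errors confined to the check half */
    for (int i = 0; i < 12; i++)
        if (weight(s ^ B[i]) <= 2) return data ^ (1u << (11 - i));
    unsigned sB = mul_B(s);                          /* second syndrome */
    if (weight(sB) <= 3) return data ^ sB;
    for (int i = 0; i < 12; i++)
        if (weight(sB ^ B[i]) <= 2) return data ^ sB ^ B[i];
    return -1;                                       /* at least four errors */
}

int main(void) {
    for (int i = 0; i < 12; i++)                     /* pack the rows of B */
        for (int j = 0; j < 12; j++)
            B[i] = (B[i] << 1) | (unsigned)(B_rows[i][j] - '0');
    unsigned data = 0x5A3;                           /* an arbitrary 12-bit message */
    unsigned check = mul_B(data);                    /* encoding: check half = data times B */
    printf("decoded: %03X\n", golay_decode(data ^ 0x041, check ^ 0x800));   /* three injected errors */
    return 0;
}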

Another perspective on the binary Golay Code, which I also found understandable, is contained in this thesis. In the form of the Golay Code given above, eleven rows are cyclic, and one special, in the error-checking matrix; in the form discussed in that thesis, all twelve rows are equivalent, but as they relate to the faces of a dodecahedron, there is no way to put them in a cyclic order.

The form of the Golay code discussed there is:

1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1
0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 1 1 1
0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 1 1
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 1 1
0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 0 1
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 1
0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 1 0 1 1 0 0
0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 1 1 0
0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 1 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 1 1 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1

Each of the twelve rows and columns of the error-checking part of the matrix corresponds to a face of the dodecahedron, and it contains a zero for every pair of faces that are next to one another. (A face is not considered to be next to itself.) This is still a Golay code, with the same error-correcting property of a Hamming distance of 8 between codewords, and not only is the error-checking matrix symmetric, but once again it is its own inverse as shown here. Because of the dodecahedral symmetry, once again, it is only necessary to AND one row of the matrix with the eleven other rows to establish that. For example, row 1 shares four bits with rows 2 to 11, and two bits with row 12.

But having an error-checking matrix that is its own inverse is not a necessary property of a Golay code; either example of a Golay code given here would still be a Golay code if the error-checking matrix were reflected left to right, since the error-checking properties would be identical, but then that matrix would not be its own inverse.

This site contains a link to a paper in DVI format giving eight different constructions of the binary Golay code. (Unfortunately, you may have difficulty in viewing documents in DVI format on your system.)

And here is an example of a form of the Golay code, again one that generates 24-bit output, with the parity bit concealed, that may have actually been used in the transmission of information; it is part of an unofficial draft standard for automatically establishing radio links, and thus it may have been taken from previous actual practice or standards:

1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 1 1
0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 1 0 1 1
0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1 0 1 1 0
0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 1 0 0 1
0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 1 0 1
0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 1 1
0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0
0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 1 1 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 1 1 1

The error-checking part of the matrix is not symmetric along the main (upper left to lower right) diagonal, and its transpose is shown as being used in decoding. Also, it is said to have been generated from the polynomial

x^11 + x^9 + x^7 + x^6 + x^5 + x + 1

Each column has seven ones in it, so when multiplied by its transpose, there will be a 1 in every position on the diagonal in the result. Any two distinct columns have either four or two ones in common (as I had to verify by brute force with a short BASIC program), and so the transpose of the error-checking part of the matrix is indeed also the inverse of that part.
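
The same brute-force check takes only a few lines; here is a C sketch of it (mine, not part of the standard), with the error-checking columns copied from the matrix above:

#include <stdio.h>

/* The error-checking half of the matrix shown above, one string per row. */
static const char *rows[12] = {
    "101011100011", "111110010010", "110100101011", "110001110110",
    "110011011001", "011001101101", "001100110111", "101101111000",
    "010110111100", "001011011110", "101110001101", "010111000111"
};

int main(void) {
    /* For every pair of distinct columns, count the rows with a 1 in both;
       any pair not sharing exactly two or four ones is reported. */
    for (int a = 0; a < 12; a++)
        for (int b = a + 1; b < 12; b++) {
            int common = 0;
            for (int r = 0; r < 12; r++)
                common += (rows[r][a] == '1') && (rows[r][b] == '1');
            if (common != 2 && common != 4)
                printf("columns %d and %d share %d ones\n", a, b, common);
        }
    printf("done\n");
    return 0;
}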

Despite the fact that each column and row contains seven ones, this error-checking matrix cannot be produced simply by rearranging the rows and columns of the one produced from the dodecahedron. This can be shown as follows: in the dodecahedral matrix, the columns corresponding to opposite faces can be identified, since they have no zeroes in the same row; any two non-opposite faces must both be adjacent to two other faces; each of those two faces must be adjacent to three faces other than the first two (those two faces may be adjacent to each other if the two non-opposite faces were not adjacent); and two faces from each of those two groups of three must be opposite to two faces from the other group. That last condition is the point at which the test breaks down if one starts with the first two columns of the matrix above.

Error-checking in this case involves the use of the inverse of the error-checking part of the matrix, but otherwise the algorithm is the same as the one given above.

A form of the Golay Code as a (23,12) code, without the parity bit added, is the following, based on the one from the original paper in which Marcel Golay of the Signal Corps described this error-correcting code in 1949.

                 A A A A B B B C C D
                 B C D E C D E D E E

1 0 0     0 0    0 0 0 0 1 1 1 1 1 1   A       1
0 1 0 ... 0 0    0 1 1 1 0 0 0 1 1 1   B       1
0 0 1     0 0    1 0 1 1 0 1 1 0 0 1   C       1
                 1 1 0 1 1 0 1 0 1 0   D       1
                 1 1 1 0 1 1 0 1 0 0   E       1

                 0 1 0 1 0 1 1 1 0 0   ABCED   1
                 0 1 1 0 1 0 1 0 0 1   ABDCE   1
                 0 0 1 1 1 1 0 0 1 0   ABEDC   1
                 1 0 1 0 0 0 1 1 1 0   ACBDE   1
                 1 0 0 1 1 0 0 1 0 1   ACEBD   1
0 0 0 ... 1 0    1 1 1 0 0 1 0 0 1 1   ADCBE   1

0 0 0 ... 0 1    1 1 1 1 1 1 1 1 1 1           0

On one side is the identity matrix, abbreviated in this diagram. Note that the column of all ones except for a zero at the bottom is part of the basic structure of the code, so the column representing the parity bit is hidden elsewhere in the preceding examples. (This seems astounding, as one would think that if one added a parity bit to this code as it stands, the result would be wasteful instead of perfect, given its similarity to that column.)

AB to DE are the various combinations of two different objects from a set of five. The first five rows represent when one of the five objects is not part of that pair of objects.

The next six rows represent when that pair of objects, or its reversal, is not present in one of the six different odd permutations of the five objects when cyclic permutations are ignored.

Then there is the row with all ones except for that single zero in the last column.

The field of error-correcting codes, like the field of data compression, is still a very active one, with new research being done and patents being issued on new methods.

In the 1970s, a set of error-correcting codes called "Fire codes" was developed. One of the simplest of these could be used as a direct replacement for a parity bit, used with each character in many existing systems, providing error correction instead of just the detection of erroneous characters.

It worked like this:

Data   * - - - - - - - *    * - - * - - - - * - - *
bits   - * - - - - - - *    - * - - * - - - * - - *
       - - * - - - - - *    - - * - - * - - * - - *
       - - - * - - - - *    - - - * - - * - * - - *
       - - - - * - - - *    - - - - * - - * * - - *
       - - - - - * - - *    - - - - - * - - X - - *
       - - - - - - * - *    - - - - - - * - * * - *
       - - - - - - - * *    - - - - - - - * * - * *
Parity = = = = = = = = P    = = = = = = = = ! = = !

Each parity bit is the XOR of parity for the data byte with which it is associated and parity for staggered bits from the eight previous bytes. Essentially, it is a clever way to obtain the same results as are provided by the use of row and column parity bits. (After the last byte of data, one extra byte, with the extra parity bit, is needed to provide parity for all the remaining diagonals.)

More complicated forms involved XORing these parity bits with the parity of another group of eight bits, staggered by being from bytes separated by a larger distance.

A more recent and exciting development in error-correcting codes is the class of codes known as "Turbo Codes". These codes are based on the principle of applying two error-correcting codes to the data bits, but with different interleaving. Just as interleaving by itself turns a burst error into scattered individual-bit errors, which error-correcting codes can deal with more easily, using two different interleaving schemes on the data being transmitted reduces to a very small value the chance that any bit will be unrecoverable because of an excessive number of random errors in both of the error-correcting code blocks in which it is found. (The same principle can be applied to continuous error-correcting codes as well as to block codes.)

