[Next] [Up/Previous] [Index]

Base-26 Armor

Since

 2^47 = 140,737,488,355,328 and
26^10 = 141,167,095,653,376

ten letters can represent 47 bits with fairly good efficiency. In fact, the information content of 10 letters from a 26-character alphabet amounts to 47.00439718 bits, less than 1/227th of a bit extra.

However, one would like to avoid performing 48-bit multiplications to do the conversion.

A method devised by IBM for representing three decimal digits in 10 binary bits can illustrate a way in which to do this.

Illustrative Example: Encoding of 3 Digits in 10 Bits

Let ooo (or ppp, qqq) represent a digit from 0 to 7 represented by the following code:

000  0
001  1
010  2
011  3
100  4
101  5
110  6
111  7

and let d (or e, f) represent a digit from 8 to 9 in the following code:

0    8
1    9

then, any three decimal digits can be represented within 10 bits as follows:

0ooopppqqq (3 digits with 3 bits - 1 way)
100dpppqqq (2 digits with 3 bits, 1 digit with 1 bit - 3 ways)
101oooeqqq
110ooopppf
11100deqqq (1 digit with 3 bits, 2 digits with 1 bit - 3 ways)
11101dpppf
11110oooef
1111100def (3 digits with 1 bit - 1 way)

with 24 unused combinations remaining.

This method of encoding was patented by IBM, in U.S. patent 3,842,414. This patent was filed on June 18, 1973, and granted on October 15, 1974.

It may or may not already be obvious, but the table illustrating the decimal to binary code is based on ooo, ppp, and qqq representing the first, second, and third digits respectively (if they are from 0 to 7) and on d, e, and f representing the first, second, and third digits respectively if they are instead either 8 or 9. This same notational convention will be used within the description of the coding that takes 47 bits to ten letters.

Here, we have some binary values left over, so we can go from decimal to binary, but we cannot use this code to represent arbitrary binary data as decimal digits without some modification.

With the lengths that I am choosing, it will be the other way around for letters; we will use all possible binary combinations, and still have a few alphabetic combinations left over.

Coding bits as letters

Because the conversion from 47 bits to 10 letters is very efficient, a direct coding, since it would use a large number of possible codes, would be very complicated. Instead, to derive a coding the description of which would be reasonable in length, I have chosen to describe a method that has a two-layer structure.

Shorter codings: the basis

First, we will represent 14 binary bits as three letters.

aaaa (or bbbb, cccc) will represent a letter from the code:

0000 A   0001 B   0010 C   0011 D   0100 E   0101 F   0110 G   0111 H
1000 I   1001 J   1010 K   1011 L   1100 M   1101 N   1110 O   1111 P

iii (or jjj, kkk) will represent a letter from the code:

000 Q   001 R   010 S   011 T   100 U   101 V   110 W   111 X

and m (or n, o) will represent a letter from the code:

0 Y   1 Z

Then, a coding generating three letters from 14 bits looks like the following:

00aaaabbbbcccc  (4,4,4: 1 way)
010iiibbbbcccc  (3,4,4: 3 ways)
011aaaajjjcccc
100aaaabbbbkkk
1010iiijjjcccc  (3,3,4: 3 ways)
1011iiibbbbkkk
1100aaaajjjkkk
11010iiijjjkkk  (3,3,3: 1 way)
11011mbbbbcccc  (1,4,4: 3 ways)
11100aaaancccc
11101aaaabbbbo
111100mjjjcccc  (1,3,4: 6 ways: 4 used here.)
111101mbbbbkkk
111110iiincccc
111111aaaankkk

This shorter coding of 14 bits as three letters is not quite as efficient as the overall coding of 47 bits to ten letters we will construct using it as a component. 3 times 14 is 42, and so there are five bits left, which don't quite fit in one letter. If the first five bits represent a number from 0 to 25, then they can represent a letter, using the codings shown above, in one of the forms 0aaaa, 10iii, or 1100m. Otherwise, codings which use the combinations of three letters which are not used by the above coding of fourteen bits to three letters are used. These codings don't encode fourteen bits, since only a small fraction of the possible combinations of three letters are left.

Other component codes

Some more codings, using the unused letter combinations, are now constructed for use in producing a coding of 47 bits into ten letters that is manageable to describe, since a direct coding is very long and complicated.

There are still some possible arrangements of three letters left over, although the number is now rather small. 10 bits can still be encoded in those which remain, as follows:

00iiibbbbo  (1,3,4: 6 ways: 2 remaining for use here)
01aaaajjjo  
100mjjjkkk  (1,3,3: 3 ways)
101iiinkkk
110iiijjjo
1110mncccc  (1,1,4: 3 ways: 2 used here)
1111mbbbbo

And 7 bits can still be encoded in the few now remaining:

0aaaano  (1,1,4: 3 ways: 1 remaining for use here)
10mnkkk  (1,1,3: 3 ways: 2 used here)
11mjjjo

The remaining combinations make 5:

iiino

and 3 bits:

mno

The code for 47 bits

To encode 47 bits as ten letters, we now proceed as follows, using the codings that we've constructed so far.

aaaa, iii, or m represent the first of the 10 letters.

The remaining letters are divided into three groups of three, coded using the method above.

For the first group of three, the 14, 10, 7, and 5 bit codings are represented by AAAAAAAAAA, FFFFFF, PPP, and X respectively; for the second, BBBBBBBBBB, GGGGGG, QQQ, and Y; for the third, CCCCCCCCCC, HHHHHH, RRR, and Z. The 3 bit coding is not required. Note that the length of the strings is equal to the length of the coding in bits minus 4, so that the length of the symbols used here are constant.

47 bits become 10 letters as follows:

0aaaaAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
10iiiAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
1100mAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
11010aaaaFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
11011aaaaAAAAAAAAAAGGGGGGCCCCCCCCCC
11100aaaaAAAAAAAAAABBBBBBBBBBHHHHHH
111010iiiFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
111011iiiAAAAAAAAAAGGGGGGCCCCCCCCCC
111100iiiAAAAAAAAAABBBBBBBBBBHHHHHH
11110100mFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
11110101mAAAAAAAAAAGGGGGGCCCCCCCCCC
11110110mAAAAAAAAAABBBBBBBBBBHHHHHH
11110111aaaaPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
11111000aaaaAAAAAAAAAAQQQCCCCCCCCCC
11111001aaaaAAAAAAAAAABBBBBBBBBBRRR
111110100aaaaFFFFFFGGGGGGCCCCCCCCCC (11,11,14, 3 ways)
111110101aaaaFFFFFFBBBBBBBBBBHHHHHH
111110110aaaaAAAAAAAAAAGGGGGGHHHHHH
111110111iiiPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
111111000iiiAAAAAAAAAAQQQCCCCCCCCCC
111111001iiiAAAAAAAAAABBBBBBBBBBRRR
1111110100iiiFFFFFFGGGGGGCCCCCCCCCC (11,11,14, 3 ways)
1111110101iiiFFFFFFBBBBBBBBBBHHHHHH
1111110110iiiAAAAAAAAAAGGGGGGHHHHHH
1111110111aaaaXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways)
1111111000aaaaAAAAAAAAAAYCCCCCCCCCC
1111111001aaaaAAAAAAAAAABBBBBBBBBBZ
11111110101mPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
11111110110mAAAAAAAAAAQQQCCCCCCCCCC
11111110111mAAAAAAAAAABBBBBBBBBBRRR
11111111000iiiXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways)
11111111001iiiAAAAAAAAAAYCCCCCCCCCC
11111111010iiiAAAAAAAAAABBBBBBBBBBZ
111111110110aaaaPPPGGGGGGCCCCCCCCCC (7,11,14, 6 ways)
111111110111aaaaPPPBBBBBBBBBBHHHHHH
111111111000aaaaFFFFFFQQQCCCCCCCCCC
111111111001aaaaAAAAAAAAAAQQQHHHHHH
111111111010aaaaFFFFFFBBBBBBBBBBRRR
111111111011aaaaAAAAAAAAAAGGGGGGRRR
1111111111000aaaaFFFFFFGGGGGGHHHHHH (11,11,11, 1 way)
1111111111001iiiPPPGGGGGGCCCCCCCCCC (7,11,14, 6 ways)
1111111111010iiiPPPBBBBBBBBBBHHHHHH
1111111111011iiiFFFFFFQQQCCCCCCCCCC
1111111111100iiiAAAAAAAAAAQQQHHHHHH
1111111111101iiiFFFFFFBBBBBBBBBBRRR
1111111111110iiiAAAAAAAAAAGGGGGGRRR
1111111111111mXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways, 1 used)

Attempting to produce a direct coding, using the aaaa/iii/m symbols for individual letters, produces a very long list of codes. This nested approach is considerably more manageable.

Here is an example, to help illustrate the above.

The 47-bit string

11100011100011100011100011100011100011100011100

will code as follows:

11100 aaaa (14-bits/3 lt) (14-bits/3 lt) (10 bits )
      0111 00011100011100 01110001110001 1100011100
         H 00aaaabbbbcccc 011aaaajjjcccc 110iiijjjo
                H   B   M       I  X   B      R  WY

to produce HHBMI XBRWY as its representation.


[Next] [Up/Previous] [Index]

Next
Table of Contents
Home Page