[Next] [Up] [Previous] [Index]

Twofish

Developed by Bruce Schneier as a successor to his 64-bit Blowfish block cipher, the Twofish specification has been released, and is available at http://www.counterpane.com//twofish.html in both Adobe Postscript format and Adobe Acrobat .PDF format.

How Twofish Works

Twofish uses 40 32-bit subkeys. The first eight are used for whitening, four at the beginning and four at the end are XORed with the entire block. Each round uses two of the remaining 32 subkeys, and so Twofish has sixteen rounds.

The division of the 128-bit block into four 32-bit quarters is done using the "little-endian" convention, which presumably means the leftmost quarter is the earliest one, but the least significant numerically. I will denote the quarters as Q0 through Q3.

The Twofish Round

A round of Twofish proceeds as follows:

Q3 is rotated one bit left.

Q0, and Q1 rotated left 8 bits, are each submitted to four key-dependent S-boxes with 8-bit inputs and outputs. These key-dependent S-boxes are equivalent to the use of fixed S-boxes with 8-bit inputs and outputs alternating with the XOR of subkey material, and the fixed S-boxes are themselves constructed from smaller S-boxes with four-bit inputs and outputs.

Then, the bytes in each are combined by means of matrix multiplication with the following matrix (called the MDS matrix):

01 EF 5B 5B
5B EF EF 01
EF 5B 01 EF
EF 01 EF 5B

(needless to say, the entries are in hexadecimal notation) over the Galois Field of 2^8 with the modular polynomial x^8 + x^6 + x^5 + x^3 + 1.

This means that when a byte is multiplied by an element of the matrix, instead of actual multiplication being performed, which can be thought of as shifting the input byte, and adding it to the total whenever its last bit is over a 1 bit of the number in the matrix, a similar operation is performed, but an XOR instead of addition is performed.

Thus, in effect, one is multiplying polynomials (one cannot carry, since one doesn't know what x is) with coefficients that are either 0 and 1, handled according to modulo-2 arithmetic.

The result of this "multiplication" is a 16-bit number. It is then reduced to an 8-bit number as follows: the modular polynomial, considered to be the binary string 101101001, is shifted left until its first bit coincides with the first 1 bit of the result. It is then XORed with the number, and shifted right until its first bit coincides with the first remaining 1 bit, and this is repeated until the number is 8 bits long or less.

Finally, the two resulting 32-bit quantities are mixed with each other through a Pseudo-Hadamard Transform; the one formed from Q1 is added to the one formed from Q0, then the one formed from Q0 is added to the one formed from Q1.

Then, the first subkey for the round is added to the one formed from Q0, and the result is XORed to Q2. The second subkey for the round is added to the one formed from Q1, and the result is XORed to Q3.

Finally, Q2 is rotated one bit right, and the two halves of the block are swapped (Q0 is swapped with Q2, and Q1 is swapped with Q3).

As in DES, the halves are not swapped after the last round.

Note that the matrix multiplication, since it involves an XOR (rather than addition) of the individual products, can be precomputed, making the S-box entries 32 bits wide rather than 8.

The Key Schedule

This description of the key schedule is not quite detailed enough to permit implementation at this time, but it outlines all the required steps, and should help in understanding the original documentation.

Key generation begins by deriving three key vectors each half the length of the original key.

The first two are formed by splitting the key into 32-bit parts. Numbering these parts starting from zero, the even-numbered ones become M(e), and the odd-numbered ones become M(o).

The third one is formed by dividing the key into 64-bit parts, and producing one 32-bit part of the key vector by multiplying each 64-bit part by this matrix (called the RS matrix):

01 A4 55 87 5A 58 DB 9E
A4 56 82 F3 1E C6 68 E5
02 A1 FC C1 47 AE 3D 19
A4 55 87 5A 58 DB 9E 03

over the Galois Field GF(2^8) with the modular polynomial x^8 + x^6 + x^3 + x^2 + 1, which means that after an XOR-multiplication the bit pattern 101001101 is used to fit the result back into eight bits this time.

The reason that both fields are referred to simply as "GF(2^8)" is because, except that the same roles will now be performed by different numbers from 0 to 255, no matter which modular polynomial is used, there is really only one Galois Field for any prime power. Changing the modular polynomial changes which number does what, which is an isomorphism, but it does not change the underlying mathematical structure of the field. However, that doesn't mean that changing the modular polynomial serves no purpose here: it is essentially a way to get an S-box for free.

The resulting 32-bit words are then placed in reverse order into the key vector called S.

The fixed S-boxes with eight bit inputs and outputs, called q(0) and q(1), are each constructed from four S-boxes with four bit inputs and outputs. For q(0), the S-boxes are:

Input   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      ------------------------------------------------
 T0     8  1  7 13  6 15  3  2  0 11  5  9 14 12 10  4
 T1    14 12 11  8  1  2  3  5 15  4 10  6  7  0  9 13
 T2    11 10  5 14  6 13  9  0 12  8 15  3  2  4  7  1
 T3    13  7 15  4  1  2  6 14  9 11  3  0  8  5 12 10

And, for q(1), the S-boxes, again given the same designations, are:

Input   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      ------------------------------------------------
 T0     2  8 11 13 15  7  6 14  3  1  9  4  0 10 12  5
 T1     1 14  2 11  4 12  3  7  6 13 10  5 15  9  0  8
 T2     4 12  7  5  1  6  9 10  0 14 13  8  2 11  3 15
 T3    11  9  5  1 12  3 13 14  6  4  7 15  2  0  8 10

Each S-box is formed as follows:

The input byte is split into its most significant and least significant halves, A and B.
The new value of A is the XOR of the old values of A and B.
The new value of B is the XOR of the old value of A, B rotated right one bit, and 8 if the old value of A was odd.
A and B are then replaced by their substitutes in the 16-bit S-boxes T0 and T1 respectively.
The new value of A is the XOR of the old values of A and B.
The new value of B is the XOR of the old value of A, B rotated right one bit, and 8 if the old value of A was odd.
A and B are then replaced by their substitutes in the 16-bit S-boxes T2 and T3 respectively.
A and B are then combined in reverse order to form the result byte.

The four key-dependent S-boxes are formed from the 32-bit elements of the key vector S as follows:

If the key is 256 bits long, the key vector S has four 32-bit elements, S(0), S(1), S(2), and S(3). In that case, the four key-dependent S-boxes are equivalent to performing the operations:

output = q(0)(S(0,0) xor q(1)(S(1,0) xor q(1)(S(2,0) xor q(0)(S(3,0) xor q(1)(input))))
output = q(1)(S(0,1) xor q(1)(S(1,1) xor q(0)(S(2,1) xor q(0)(S(3,1) xor q(0)(input))))
output = q(0)(S(0,2) xor q(0)(S(1,2) xor q(1)(S(2,2) xor q(1)(S(3,2) xor q(0)(input))))
output = q(1)(S(0,3) xor q(0)(S(1,3) xor q(0)(S(2,3) xor q(1)(S(3,3) xor q(1)(input))))

where S(2,1) means byte 1 of word 2 in the key vector S.

When the key is 192 bits long, the S-box definitions are simplified to:

output = q(0)(S(0,0) xor q(1)(S(1,0) xor q(1)(S(2,0) xor q(0)(input)))
output = q(1)(S(0,1) xor q(1)(S(1,1) xor q(0)(S(2,1) xor q(0)(input)))
output = q(0)(S(0,2) xor q(0)(S(1,2) xor q(1)(S(2,2) xor q(1)(input)))
output = q(1)(S(0,3) xor q(0)(S(1,3) xor q(0)(S(2,3) xor q(1)(input)))

and when the key is only 128 bits long, the S-box definitions are simplified to:

output = q(0)(S(0,0) xor q(1)(S(1,0) xor q(1)(input))
output = q(1)(S(0,1) xor q(1)(S(1,1) xor q(0)(input))
output = q(0)(S(0,2) xor q(0)(S(1,2) xor q(1)(input))
output = q(1)(S(0,3) xor q(0)(S(1,3) xor q(0)(input))

The subkeys which are actually added to the function outputs before they are XORed to another quarter of the block are produced by a process very similar to a Twofish round, but with key-dependent S-boxes derived from the key vectors M(e) and M(o) instead of from the key vector S. The inputs are the round number times 2, plus 1 on one side, in all four bytes of a word, and the rotate left step takes place after the function instead of before: an 8-bit rotate left before the PHT, and a 9-bit rotate left after.

[Next] [Up] [Previous] [Index]

Next
Chapter Start
Table of Contents
Main Page