The one time pad and the many time pad vulnerability

The scope of this article is to present the one time pad cipher method and its biggest vulnerability: that of the many time pad.

The one time pad: what it is and how it works

The one time pad is the archetype of the idea of stream cipher. It’s very simple: if you want to make a message unintelligible to an eavesdropper, just change each character of the original message in a way that you can revert, but that looks random to another person.

The way the one time pad works is the following. Suppose \mathcal{M} is the clear-text message you would like to send securely, of length |\mathcal{M}| = s. First, you need to generate a string \mathcal{K} of equal length |\mathcal{K}| = s. Then, you can obtain a cipher-text version of your message by computing the bitwise XOR of the two strings:

    \[\mathcal{M} \oplus \mathcal{K}\]

The best thing is that decoding is just the same as encoding, as the XOR operator has the property that \mathcal{X} \oplus \mathcal{X} = 0 \ \forall X (and that \mathcal{X} \oplus 0 = \mathcal{X} \ \forall \mathcal{X}). The only difference is that the cipher-text is involved in the XOR, rather than the clear-text:

    \[\mathcal{C} \oplus \mathcal{K} = \mathcal{M} \oplus \mathcal{K} \oplus \mathcal{K} = \mathcal{M} \oplus 0 = \mathcal{M}\]

Below is an example of the one time pad encoding achieved with Python, with a made-up pad string.

In the first section, result holds the XOR result. In the second part, the result and one_time_pad variables are XORed together to obtain the original plain-text message again.

It is not difficult to realize that the whole strength of the algorithm lies in the \mathcal{K} pad. Of course, as an attacker, if you can obtain \mathcal{K} in some way, then it is not difficult to get the clear-text message from the ciphered one as well.

One requirement of \mathcal{K} is that it should be of the same size of the message. Otherwise, there are not enough bytes to XOR with, and part of the message is revealed. But the other, more subtle but still fundamental, requirement is that each \mathcal{K} is used only once. Understanding why this second requisite is important is the scope of this article.

The many time pad vulnerability

If the same key is used more than once, bad things will happen and you will go to hell. As soon as you have encoded a message with a given pad, throw it away immediately and for good! The one time pad is so simple that one little bit of carelessness can break it!

Let’s look at what happens if the same key \mathcal{K} is used more than once. Suppose an attacker intercepted some of the cipher-text messages, but did not have the key used to encode them. Could he get something out of them, if they had been encrypted with the same key? It turns out that the key becomes completely useless:

    \[\mathcal{C}_1 = \mathcal{M}_1 \oplus \mathcal{K}\]

    \[\mathcal{C}_2 = \mathcal{M}_2 \oplus \mathcal{K}\]

    \[\mathcal{C}_1 \oplus \mathcal{C}_2 =\mathcal{M}_1 \oplus\mathcal{K} \oplus \mathcal{M}_2 \oplus \mathcal{K} = \mathcal{M}_1 \oplus \mathcal{M}_2\]

where in the last line we have used the fact that the XOR is a commutative operator, and that \mathcal{K} \oplus \mathcal{K} = 0.

But behold! Given only two cipher-text, we retrieved a combination of the original plain-text messages! We may have not recovered the original messages yet, but we don’t have too long a way to go.

For the sake of simplicity, let’s assume that plain-text messages are made up only of letters (both lower and uppercase) and spaces. What is needed just one stroke of genius. The question we now need to ask is: what happens when two characters from the plain-texts are XORed together?

Using the ASCII encoding, we can build the following (symmetric) table:

 \oplus A-Z (65-90) a-z (97-122) Space (32)
A-Z (65-90) \leq 32 \leq 64 \geq 65
a-z (97-122) \leq 64 \leq 32 \geq 65
Space (32) \geq 65 \geq 65 0

The crucial part is the last line of the table. When XORing a space with any of the upper/lowercase letters, a number \geq 65 is obtained.

Imagine to have the three following plaintexts:

and suppose you know they have been encrypted with the same one time pad. As we have seen above, we do not need to consider the key \mathcal{K} as it can be made to vanish if used multiple times. Let’s look at the the messages split in columns:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
\mathcal{M}_0 A t t a c k n o w
\mathcal{M}_1 W i t h d r a w t o m o r r o w
\mathcal{M}_2 C o m e t o d a y

I will treat messages as arrays: the first char of the first message will be \mathcal{M}_0[0], the second char of the second message will be \mathcal{M}_1[1], and so on.

Let’s look at the 7th column, where the first message has a space. What happens if we XOR \mathcal{M}_0[6] with \mathcal{M}_1[6] and \mathcal{M}_2[6]? As expected, we get:

    \[\mathcal{M}_0[6] \oplus \mathcal{M}_1[6] = \mathcal{C}_0[6] \oplus \mathcal{C}_1[6] = 65\]

    \[\mathcal{M}_0[6] \oplus \mathcal{M}_2[6] = \mathcal{C}_0[6] \oplus \mathcal{C}_1[6] = 79\]

So it seems we can pretty confidently assert that \mathcal{M}_0[6] is a space. But since \mathcal{M}_0[6] \oplus \mathcal{M}_1[6] is known, we can easily recover \mathcal{M}_1[6] as well! And, of course, we can do the same with \mathcal{M}_2[6]. Boom!

But the best is yet to come! We know that \mathcal{C}_0 = \mathcal{M}_0 \oplus \mathcal{K}. Thus, we can now recover one char of the key itself:

    \[\mathcal{K}[6] = \mathcal{M}_0[6] \oplus \mathcal{C}_0[6]\]

If we intercept enough ciphertexts, all encoded with the same one time pad key, then it’s likely that there will be enough spaces spread throughout all columns that would allow a fair amount of decryption with this method. Even if that would not be the case, enough spaces would probably be present to assure at least a partial decryption, that may then be completed by hand.

To sum up, the idea of the many time pad is that:

    \[\text{If } \mathcal{M}_t[k] \oplus \mathcal{M}_s[k] \geq 65 \ \forall s \neq t \Rightarrow \mathcal{M}_t[k] \text{ is a space}\]

It is surprising how much one can do with seemingly useless information. It becomes clear now why cryptographic protocols should never leak any information whatsoever!

Python code for many time pad

The many time pad vulnerability is pretty powerful. It is not difficult to understand, and not so difficult to implement. Below is an almost-working Python code that you may draw inspiration from.

Leave a Reply

Your email address will not be published. Required fields are marked *