The scope of this article is to present the one time pad cipher method and its biggest vulnerability: that of the many time pad.
The one time pad: what it is and how it works
The one time pad is the archetype of the idea of stream cipher. It’s very simple: if you want to make a message unintelligible to an eavesdropper, just change each character of the original message in a way that you can revert, but that looks random to another person.
The way the one time pad works is the following. Suppose is the clear-text message you would like to send securely, of length . First, you need to generate a string of equal length . Then, you can obtain a cipher-text version of your message by computing the bitwise XOR of the two strings:
The best thing is that decoding is just the same as encoding, as the XOR operator has the property that (and that ). The only difference is that the cipher-text is involved in the XOR, rather than the clear-text:
Below is an example of the one time pad encoding achieved with Python, with a made-up pad string.
It is not difficult to realize that the whole strength of the algorithm lies in the pad. Of course, as an attacker, if you can obtain in some way, then it is not difficult to get the clear-text message from the ciphered one as well.
One requirement of is that it should be of the same size of the message. Otherwise, there are not enough bytes to XOR with, and part of the message is revealed. But the other, more subtle but still fundamental, requirement is that each is used only once. Understanding why this second requisite is important is the scope of this article.
The many time pad vulnerability
If the same key is used more than once, bad things will happen and you will go to hell. As soon as you have encoded a message with a given pad, throw it away immediately and for good! The one time pad is so simple that one little bit of carelessness can break it!
Let’s look at what happens if the same key is used more than once. Suppose an attacker intercepted some of the cipher-text messages, but did not have the key used to encode them. Could he get something out of them, if they had been encrypted with the same key? It turns out that the key becomes completely useless:
where in the last line we have used the fact that the XOR is a commutative operator, and that .
But behold! Given only two cipher-text, we retrieved a combination of the original plain-text messages! We may have not recovered the original messages yet, but we don’t have too long a way to go.
For the sake of simplicity, let’s assume that plain-text messages are made up only of letters (both lower and uppercase) and spaces. What is needed just one stroke of genius. The question we now need to ask is: what happens when two characters from the plain-texts are XORed together?
Using the ASCII encoding, we can build the following (symmetric) table:
A-Z (65-90) | a-z (97-122) | Space (32) | |
A-Z (65-90) | |||
a-z (97-122) | |||
Space (32) | 0 |
The crucial part is the last line of the table. When XORing a space with any of the upper/lowercase letters, a number is obtained.
Imagine to have the three following plaintexts:
1 2 3 |
Attack now Withdraw tomorrow Come today |
and suppose you know they have been encrypted with the same one time pad. As we have seen above, we do not need to consider the key as it can be made to vanish if used multiple times. Let’s look at the the messages split in columns:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | |
A | t | t | a | c | k | n | o | w | |||||||||
W | i | t | h | d | r | a | w | t | o | m | o | r | r | o | w | ||
C | o | m | e | t | o | d | a | y |
I will treat messages as arrays: the first char of the first message will be , the second char of the second message will be , and so on.
Let’s look at the 7th column, where the first message has a space. What happens if we XOR with and ? As expected, we get:
So it seems we can pretty confidently assert that is a space. But since is known, we can easily recover as well! And, of course, we can do the same with . Boom!
But the best is yet to come! We know that . Thus, we can now recover one char of the key itself:
If we intercept enough ciphertexts, all encoded with the same one time pad key, then it’s likely that there will be enough spaces spread throughout all columns that would allow a fair amount of decryption with this method. Even if that would not be the case, enough spaces would probably be present to assure at least a partial decryption, that may then be completed by hand.
To sum up, the idea of the many time pad is that:
It is surprising how much one can do with seemingly useless information. It becomes clear now why cryptographic protocols should never leak any information whatsoever!
Python code for many time pad
The many time pad vulnerability is pretty powerful. It is not difficult to understand, and not so difficult to implement. Below is an almost-working Python code that you may draw inspiration from.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
key = bytearray(b'?' * max_length) #Key decrypting routine for k in range(max(len(c) for c in ciphertexts)): cts = [c for c in ciphertexts if len(c) > k] #selects all cipher texts long at least k guess = ord('?') #holds guess for current key char for curs1 in range(len(cts)): for curs2 in range(len(cts)): if curs2 == curs1: continue xor = ord(cts[curs1][k]) ^ ord(cts[curs2][k]) if 0 < xor < 65: guess = ord('?') break guess = SPACE if guess == SPACE: #we can use the space to recover one key char and decrypt the whole column! key[k] = ord(cts[curs1][k]) ^ SPACE break #Decode messages for k in range(len(key)): for curs in range(len(cleartexts)): if len(cleartexts[curs]) > k: cleartexts[curs][k] = key[k] ^ ord(ciphertexts[curs][k]) print("\n".join(c.decode('ascii') for c in cleartexts)) |