Brute force a crackme file password with Python

I was to reverse a file for a challenge, MD5 hash 85c9feed0cb0f240a62b1e50d1ab0419.

The challenge was called mio cuggino, purposefully misspelled with two g letters. It asks for three numbers. The challenge led me to a brute force of the password with a Python script, learning how to interact with a subprocess stdin and stdout (SKIP to next section if you don’t care about context but only want the code).

Looking at the assembly with Radare, the first thing it does is to check that the numbers are non-negative and in increasing order. In details, it checks that:

  1. exactly three inputs have been provided;
  2. the first two are non-negative;
  3. the third is bigger than the second;
  4. the second is bigger than the first;
  5. the third is non-negative.

Very good, so the input pattern is three non-negative integers in increasing order. Fine. No clue about what those numbers should be though, yet.

Scroll the assembly just enough to unravel the magic.

A (pointer to) string is loaded into ebx, which contains the following Italian sentence:

Mi ha detto mio cuggino che una volta e’ stato co’ una che poi gli ha scritto sullo specchio benvenuto nell’AIDS, mio cuggino mio cuggino

The assembly basically takes the characters in the string that correspond to the first and second input (for ex, 0 as first input would map to the first char, M) and checks whether they are equal. If this is not satisfied, a Nope message is shown and the binary returns.

If this is satisfied, the same check is repeated with the third input (with the first one, although this doesn’t matter). If this is satisfied as well, a tricky sub.puts_640 function is called (with 5 inputs), and a Uhm message is shown.

Going to looking into that routine is absolutely useless as it’s completely unreadable, and even makes a bunch of additional calls that are further jumbled.

Continue reading

The one time pad and the many time pad vulnerability

The scope of this article is to present the one time pad cipher method and its biggest vulnerability: that of the many time pad.

The one time pad: what it is and how it works

The one time pad is the archetype of the idea of stream cipher. It’s very simple: if you want to make a message unintelligible to an eavesdropper, just change each character of the original message in a way that you can revert, but that looks random to another person.

The way the one time pad works is the following. Suppose \mathcal{M} is the clear-text message you would like to send securely, of length |\mathcal{M}| = s. First, you need to generate a string \mathcal{K} of equal length |\mathcal{K}| = s. Then, you can obtain a cipher-text version of your message by computing the bitwise XOR of the two strings:

    \[\mathcal{M} \oplus \mathcal{K}\]

The best thing is that decoding is just the same as encoding, as the XOR operator has the property that \mathcal{X} \oplus \mathcal{X} = 0 \ \forall X (and that \mathcal{X} \oplus 0 = \mathcal{X} \ \forall \mathcal{X}). The only difference is that the cipher-text is involved in the XOR, rather than the clear-text:

    \[\mathcal{C} \oplus \mathcal{K} = \mathcal{M} \oplus \mathcal{K} \oplus \mathcal{K} = \mathcal{M} \oplus 0 = \mathcal{M}\]

Below is an example of the one time pad encoding achieved with Python, with a made-up pad string.

In the first section, result holds the XOR result. In the second part, the result and one_time_pad variables are XORed together to obtain the original plain-text message again.

It is not difficult to realize that the whole strength of the algorithm lies in the \mathcal{K} pad. Of course, as an attacker, if you can obtain \mathcal{K} in some way, then it is not difficult to get the clear-text message from the ciphered one as well.

Continue reading

Getting started with Binary reverse engineering: an example

For a challenge in a university security class, I was given this file to crack: reverse1. I started with reverse0, which was considerably easier than the second one. In this post I will briefly explain how I tackled reverse1. I provided the files so you can you try on your own and then came back for hints if you are stuck! If you are new to this business, as I relatively am, I advise you to start from reverse0 and crack that first.

Hashes of reverse1 file: 
MD5 – c22c985acb7ca0f373b7279138213158
SHA256 – cd56541a75657630a2a0c23724e55f70e7f4f77300faf18e8228cd2cffe8248e

Disassembling and hoping for the best

The first thing I did was to disassemble the file with Radare to have a look at the code.

The assembly is quite jumbled up, and difficult to analyse all together. A quick look tells us that trying to crack the file just by reversing the assembly is no easy task, and actually a silly idea to begin with. There’s a cycle after the password is read from standard input, then some other instructions, then another cycle… it’s difficult to get what is going on…

Instead, let’s seek the Bad password print section, and see what should happen for the code to jump there. If we are lucky enough, we may find a bunch of final checks that will send over to the Bad password section. If we can find those, we may then look at those bits of assembly to understand how to avoid going there.

Scroll down enough, and down at the bottom I can see the Bad password part, starting at 0x080484f0.

Radare helps in showing two different arrows going into this address. The related comparisons are the following:

Continue reading

A method for calculating pattern relevance in a text

This article aims at presenting a method for computing the relevance of a given string (pattern) in a text. This algorithm is at the core of my WordPress plugin Smart Tag Insert.

First of all, there is a difference between a simple pattern matching and computing text pattern relevance. The question we are trying to address here is the following: I have a string, and I would like to know how much that string is relevant for a specific text. For example, let’s say we have “download music” as the string of which relevance we are interested into. How can we determine how much relevant it is for a specific article?

The simple approach

The easy thing one could try is run a pattern match of “download music” in the article text. That is okay, but suppose the article contained strings like “download the music”, or “download some music”, or “downloading music”, or “download good quality music”. It is clear that, to a human, all these strings are equivalent when trying to understand what the article is about: it is about downloading music, regardless of whether it is good, bad, a lot or little.

A simple pattern match would fall short, because it would exclude all those other strings and make it look like the content is not very much about downloading music, just because “download music” was never found exactly that way.

So the first point we need to acknowledge if we want to try to teach a machine to compute text pattern relevance, is that we need to find a way, at least a rough way, to teach it to grasp the meaning of the content.

Continue reading