A tale in topology – The large clovers meadow

A small tale with a topological soul, with the aim of providing a very high level intuition for the notion of density and dense set in topology.

Smally Open was a cheeky youngster of the Open family who lived in a large meadow. It was a very nice and green meadow, well cared for and abundant in clovers. Smally Open liked it very much because he found it magical. Oh, I wish you had been there as well, my little readers: Smally Open climbed the highest trees, went as far as he could from his home, crawled down below the blades of grass, but there was no way: wherever he was, Smally Open would always see his verdant meadow, as if it moved with him and changed its look. Wherever he looked, Smally Open would always see a large green meadow.

You will wonder, my dears, whether Smally Open got bored to death always seeing the very same meadow anywhere he went… well, I hope you do not believe it was all just alike! The meadow looked all the same if you watched from afar and without attention, but sometimes there was a butterfly here, some other time an amazingly beautiful red leaf there: at any rate, there was always something new to fascinate him.

Anyway, when Smally Open did get bored, there was always a nice way he liked to spend his time: going to bother Mr. D-Dense. He was a pretty weird and extraordinary sir; in fact, he had a lot of heads (how many, you ask? Try to count the stars on a cloudless night: Mr. D-Dense had at least three-times-twice-one-thousand-times-that-number-of-heads!) You should not, however, think of Mr. D-Dense as a weird green martian with several scary heads going around pecking at children’s ears, no! Mr. D-Dense was a very distinguished man, very kind and happy.

Indeed, Mr. D-Dense was the happiness of the meadow in which he lived together with Smally Open, because he was its clovers. But not a few crumpled and frail clovers, oh, do stop with these prejudices towards Mr. D-Dense! No, those were beautiful clovers, big and comfortable, and above all there was one in any part of the meadow you might look. When Smally Open was tired, he would look around and immediately find one of the heads of Mr. D-Dense, upon which he could rest in the sunlight.

Smally Open, playing, would often jump from one blade of grass to another. When he lost his balance and fell into the void, he was always lucky that Mr. D-Dense lived in his same meadow, because were it not for the many heads that caught him, falling on the ground he would at least get a big bruise! But Smally Open would sometimes take advantage of Mr. D-Dense’s kindness: he would jump onto his heads from tree tops on purpose, he would eat on the clovers, dropping crumbs on Mr. D-Dense’s heads, or he would intentionally ruffle all his hair. At any rate, he would really bother him! Mr. D-Dense was a patient man, but one day he had had enough and said: “Enough, I am old and tired: I need a quiet holiday. I am going to leave.” And so he started lifting himself up.

First, Smally Open fell on the ground. Then he heard a great rumble, as if the whole land was shaking, and saw all the clovers moving and lifting, with mountains of soil up in the air, until Mr. D-Dense headed with all his clover-heads towards the horizon.

Continue reading

A/B testing on WordPress: a framework for developers and tutorial

Some months ago, I changed one link in the menu of my website postpaycounter.com. After that, it looked like more people were purchasing products, i.e. the conversion rate had increased. But how could I check whether that was really the case, or whether it was just chance or an impression? Use an A/B test, I told myself!

With an A/B test, half of the users are served one version of the page, the one with the old link, and half of them another version of it, the one with the new link in place. When a sale happens, we may then log that as a success for the kind of page that was used, be it the A version or the B one.

In my case, the two versions of the page simply consisted of two different links in the menu, while I wanted the success to be logged when the user purchased something (I use Easy Digital Downloads to handle purchases).

I could find a bunch of plugins that allowed setting up A/B tests, but they all seemed pretty difficult to customize from a developer’s perspective, and I could already see myself wrestling with someone else’s code that provided tons of features useless to me but made it nearly impossible to interact with Easy Digital Downloads. So I decided to build my own simple implementation, with the aim of tailoring it to developers rather than users who need an interface.

An A/B test implementation example

This is an example of how to use the little framework. To set up a test, you only need to provide two functions:

Continue reading

Numpy histogram density does not sum to 1

During a Computational Vision lab, while comparing histograms, I stumbled upon a peculiar behavior. The histograms’ pairwise kernel matrix – which is just a fancy name for the matrix holding the correlations of the histograms with one another – did not have ones on the diagonal. This means that one histogram was not fully correlated with itself, which is weird.

The comparison metric I was using is the simple histogram intersection one, defined as

    \[K_{hi} = \sum^{d}_{m=1}\min(x_m,y_m)\]
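A minimal reproduction of the behavior (sample data and bin count are made up): with `density=True`, `np.histogram` normalizes the *integral* of the histogram to 1, not the *sum* of its values, so the intersection kernel of a histogram with itself is not 1 unless the histogram is first renormalized to sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(1000)  # 1000 samples in [0, 1)

hist, edges = np.histogram(data, bins=7, density=True)

# density=True normalizes the integral, not the sum:
print(hist.sum())                     # roughly 7, since each bin is ~1/7 wide
print((hist * np.diff(edges)).sum())  # 1.0

# To get K(x, x) = 1 under the intersection kernel,
# renormalize the histogram so its entries sum to 1:
p = hist / hist.sum()
print(np.minimum(p, p).sum())         # 1.0
```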

Continue reading

The Distributional Hypothesis: semantic models in theory and practice

This was the final project for the Data Semantics course at university – A report on distributional semantics and Latent Semantic Analysis.

Here is the nicely-formatted pdf version (with references).

What is the Distributional Hypothesis

When it comes to Distributional Semantics and the Distributional Hypothesis, the slogan is often “You shall know a word by the company it keeps” (J.R. Firth).

The idea of the Distributional Hypothesis is that the distribution of words in a text is related to their meanings. More specifically, the more semantically similar two words are, the more they will tend to show up in similar contexts and with similar distributions. Stating the idea the other way round may be helpful: given two morphemes with different semantic meanings, their distributions are likely to differ.

For example, fire and dog are two words unrelated in their meaning, and in fact they are not often used in the same sentence. On the other hand, the words dog and cat are sometimes seen together, so they may share some aspect of meaning.

Mimicking the way children learn, Distributional Semantics relies on huge text corpora, whose parsing allows one to gather enough information about word distributions to make inferences. These corpora are processed with statistical analysis techniques and linear algebra methods to extract information. This is similar to the way humans learn to use words: by seeing how they are used, i.e. by coming across several examples in which a specific word appears.
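A toy illustration of these techniques (the corpus is made up, and the context window is simply the whole sentence): build a co-occurrence vector for each word, then compare vectors with cosine similarity, so words that appear in similar contexts come out as similar.

```python
import numpy as np

# Tiny made-up corpus
corpus = [
    "the dog chased the cat",
    "the cat chased the dog",
    "the fire burned the house",
]

# Build a word-by-word co-occurrence matrix, counting pairs
# of distinct positions within the same sentence
vocab = sorted({w for s in corpus for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    words = s.split()
    for i, w in enumerate(words):
        for j, v in enumerate(words):
            if i != j:
                counts[index[w], index[v]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# "dog" and "cat" share contexts, so their similarity is higher
# than that of "dog" and "fire"
print(cosine(counts[index["dog"]], counts[index["cat"]]))
print(cosine(counts[index["dog"]], counts[index["fire"]]))
```

Real distributional models work the same way at scale, typically with a fixed-size context window, weighting schemes such as PMI, and dimensionality reduction (as in Latent Semantic Analysis) on the resulting matrix.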

The fundamental difference between human learning and the learning a distributional semantic algorithm could achieve is mostly related to the fact that humans have a concrete, practical experience to rely on. This allows them not only to learn the usage of a word, but to eventually understand its meaning. However, the way word meaning is inferred is still an open research problem in psychology and cognitive science.

Continue reading

Brute force a crackme file password with Python

I had to reverse a file for a challenge, MD5 hash 85c9feed0cb0f240a62b1e50d1ab0419.

The challenge was called mio cuggino (“my cousin”, purposely misspelled with two g’s). It asks for three numbers. The challenge led me to brute-force the password with a Python script, learning how to interact with a subprocess’s stdin and stdout (skip to the next section if you don’t care about the context and only want the code).

Looking at the assembly with Radare, the first thing the binary does is check that the numbers are non-negative and in increasing order. In detail, it checks that:

  1. exactly three inputs have been provided;
  2. the first two are non-negative;
  3. the third is bigger than the second;
  4. the second is bigger than the first;
  5. the third is non-negative.

Very good, so the input pattern is three non-negative integers in increasing order. No clue yet about what those numbers should be, though.

Scrolling through the assembly just a bit further unravels the magic.

A (pointer to) string is loaded into ebx, which contains the following Italian sentence:

Mi ha detto mio cuggino che una volta e’ stato co’ una che poi gli ha scritto sullo specchio benvenuto nell’AIDS, mio cuggino mio cuggino

The assembly basically takes the characters in the string corresponding to the first and second inputs (for example, 0 as the first input would map to the first char, M) and checks whether they are equal. If they are not, a Nope message is shown and the binary returns.

If they are, the same check is repeated with the third input (against the first one, although this doesn’t matter). If this is satisfied as well, a tricky sub.puts_640 function is called (with 5 inputs), and a Uhm message is shown.

Looking into that routine is absolutely useless, as it’s completely unreadable, and it even makes a bunch of additional calls that are further jumbled.
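The checks above already suggest a brute-force strategy: enumerate increasing index triples whose characters in the string are all equal, then feed each one to the binary. A sketch (the binary’s filename in the commented driver is an assumption; the real script would pipe each candidate to the crackme’s stdin and inspect its stdout):

```python
from itertools import combinations

S = ("Mi ha detto mio cuggino che una volta e’ stato co’ una che poi "
     "gli ha scritto sullo specchio benvenuto nell’AIDS, mio cuggino mio cuggino")

# Index triples i < j < k whose characters in S are all equal,
# matching the checks the assembly performs on the three inputs
candidates = [(i, j, k)
              for i, j, k in combinations(range(len(S)), 3)
              if S[i] == S[j] == S[k]]

print(len(candidates))  # the search space to try against the binary

# Hypothetical driver: pipe each candidate to the binary and look
# for any output other than the Nope/Uhm messages.
# import subprocess
# for i, j, k in candidates:
#     p = subprocess.run(["./miocuggino"], input=f"{i} {j} {k}\n",
#                        capture_output=True, text=True)
#     if "Nope" not in p.stdout:
#         print(i, j, k, p.stdout)
```

`combinations` yields indices in increasing order, so the non-negativity and ordering checks from the first part of the binary are satisfied for free.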

Continue reading