This was the final project for the Data Semantics course at university: a report on distributional semantics and Latent Semantic Analysis.
Here is the nicely formatted PDF version (with references).
What is the Distributional Hypothesis
When it comes to Distributional Semantics and the Distributional Hypothesis, the slogan is often “You shall know a word by the company it keeps” (J.R. Firth).
The Distributional Hypothesis states that the distribution of words in a text is related to their meanings. More specifically, the more semantically similar two words are, the more they tend to appear in similar contexts and with similar distributions. Stating the idea the other way round may help: two morphemes with different meanings are likely to have different distributions.
For example, fire and dog are two words with unrelated meanings, and indeed they are rarely used in the same sentence. On the other hand, the words dog and cat are seen together more often, which suggests that they share some aspect of meaning.
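To make the co-occurrence intuition concrete, here is a minimal Python sketch that counts how many sentences contain both words of a pair. The six-sentence corpus and the helper names (`sentence_cooccurrences`, `together`) are hypothetical, made up for illustration, and splitting on whitespace is a simplifying assumption rather than real tokenization.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string is one sentence.
sentences = [
    "the dog chased the cat",
    "the cat and the dog slept",
    "the fire was warm",
    "the dog barked at night",
    "the cat climbed the tree",
    "we lit a fire in the fireplace",
]


def sentence_cooccurrences(sentences):
    """Count, for each unordered word pair, how many sentences contain both words."""
    counts = Counter()
    for sentence in sentences:
        # Use a set so a word repeated in one sentence is counted once.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def together(counts, a, b):
    """Look up a pair regardless of argument order."""
    return counts[tuple(sorted((a, b)))]


counts = sentence_cooccurrences(sentences)
print("dog/cat:", together(counts, "dog", "cat"))    # dog/cat: 2
print("dog/fire:", together(counts, "dog", "fire"))  # dog/fire: 0
```

On this toy corpus dog and cat share two sentences while dog and fire share none. Note that `fireplace` is treated as a separate token from `fire`, a reminder of how crude whitespace splitting is.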
Distributional Semantics relies on huge text corpora: parsing them makes it possible to gather enough information about how words are distributed to draw inferences about their meaning. The corpora are analysed with statistical techniques and linear algebra methods. This mimics the way humans, and children in particular, learn to use words: by seeing how they are used, that is, by coming across many examples in which a specific word appears.
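A common way to turn raw text into something linear algebra can work with is to represent each word by a vector counting the words that appear near it, and then compare these vectors with cosine similarity. The sketch below assumes a symmetric window of two tokens and a hypothetical toy corpus; the names `context_vectors` and `cosine` are illustrative, and this shows the general idea rather than the exact pipeline used in this report.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, one sentence per string.
corpus = [
    "the dog eats food",
    "the cat eats food",
    "the dog drinks water",
    "the cat drinks milk",
    "the fire burns wood",
    "the fire burns coal",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = context_vectors(corpus)
print(round(cosine(vectors["dog"], vectors["cat"]), 3))   # 0.875
print(round(cosine(vectors["dog"], vectors["fire"]), 3))  # 0.447
```

Here dog and cat score 0.875 because they share most of their contexts, while dog and fire score about 0.447, almost entirely because both follow "the". Practical systems usually drop or down-weight such function words (for example with TF-IDF or PMI weighting) before comparing vectors.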
The fundamental difference between human learning and what a distributional semantics algorithm can achieve lies mostly in the fact that humans can rely on concrete, practical experience. This allows them not only to learn how a word is used, but eventually to understand its meaning. How humans actually infer word meaning, however, is still an open research problem in psychology and cognitive science.