On knowing when to stop in software development

One of the very few things I learnt in art class is what the role of Jackson Pollock was in his art. Because, we were asked, what is the role of the artist, if the only thing he does is let paint drop on a canvas? His role is to decide when the work is complete.

This is something we most often overlook in computer science: there comes a time when a project, or a feature, is complete, and any more improvements, any more work put into it is likely to decrease its value and ruin all the good work. Too often we want progress in our applications, without realizing that it’s actually destroying them. Sometimes it’s just better to move on and work on something else. Even if a solution is 10 years old, it doesn’t mean it has to be updated because progress requires it.

Let me present a couple of examples.

The Gutenberg editor in WordPress

WordPress 5 introduced the new Gutenberg editor, a project that has been rated 2 stars out of 5, with a total of around 2000 reviews at the time of writing. It’s a product so buggy and unusable that it is bewildering that it made it into Core, but whatever (in 10 minutes of usage, I found 7 crucial and unreported bugs just 4 months prior to release – see my review).

Let us pause and ponder why it was introduced. Any apologist for Gutenberg will say that it is because the classic editor felt old. It looked so much like Office 2003, and it’s 2018, they say! You see, they say, 15 years in computer science is a huge deal!

But, you see, what is the main purpose of an editor? To write. And for that it must be fit. Gutenberg shifted the focus from writing content to designing a page, effectively forcing progress in the wrong direction. Not much has changed in writing since Office 2003 came around: we still use bold, italic, headings, text alignment and little more. Anything else requires the careful crafting of a designer and the writing of some HTML, as it should be. Nothing else is needed, really, when it comes to writing. But, they say, you cannot even create a table with the classic editor! And I say, that’s right, it should be possible! But that doesn’t require trashing a whole editor and building a cumbersome React-y thing just so that we can have tables, does it?

But, they say, this way you don’t need a designer to design your pages anymore. Of course, people must be really stupid if they have been paying web developers/designers to put up their websites for the last 25 years! So stupid of us! There, instead of hiring a professional photographer to shoot your wedding, just give a compact camera to your uncle, since technology and progress have enabled you to do so. Because it really is just the same. When I was a kid, websites designed with Dreamweaver were looked down on, and anybody who wanted a real site should have hired a professional. Now it looks like everybody can do everything – except that, uhm, they can’t.

Too often the right questions are not asked and carefully considered. Those are the most basic ones: do we really need this thing? How difficult is it to build it? Is it really worth it? What is the impact it will have on users/market? Does it add something really useful and needed without breaking anything else?

Continue reading “On knowing when to stop in software development”

Tasks un-owned are tasks that go forgotten

If you are a tech company, and your people commit code, then you probably have some code review policy. And if you do not, you definitely should: you want to have an extra pair of eyes on the code that goes live. You certainly do not want a mistake to break things. And that is why you do pull requests to contribute to GitHub repos, and why Google employees must have a certain degree of maturity to commit code without review.

But, good an idea as that is, we must be careful to implement it the right way. Just enforcing reviews is not enough. You want to make the time between when the code is sent for review and when it is deployed as short as possible. The longer the review time span is, the more work will be needed when the review comes. That is because:

  1. Whoever wrote the code simply does not have it fresh in their mind anymore. The context switch between the current task and the code they wrote days/weeks ago is just more demanding;
  2. Conflicts are more likely to arise, and then more work is needed in solving them;
  3. Other issues may depend on the code being held for review. Other people may spend (waste) time debugging an issue for which a solution is already available;
  4. If the repo is public, it makes it more difficult for others to jump in and contribute, because they also have to be aware of all the pending code.

The right way to implement a code review system in a tech company

I think the best way to implement a code review system is to:

  1. Assign each code review to a particular member of the team. If everybody owns a task, then nobody does. That is why you want that particular review to be the responsibility of someone specific. An automated system can randomly assign a review to a team member.
  2. Each code review comes with a deadline. That is, code reviews are as important as any other task – basically because every other task often generates a code review at some point, so if we lag on those, nothing gets carried to the end and we are getting no work done at all! We may have different priorities associated with different deadlines, but we want each review to expire at some point (with the longest being a couple of days)!
  3. Team members can turn down their assignments, but only if they have a good reason to. Again, if code does not get reviewed, it cannot go live, and thus the work has been done for nothing. Reviews are important and must be considered as such.
  4. Then just track how it goes: who is turning down most assignments? Is the weight uniformly distributed across the team?
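
The first two points can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Review` record, the `assign_review` and `is_overdue` helpers, the team names and the two-day default are all made up for the example, not taken from any real tool.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Review:
    pull_request: str
    reviewer: str
    deadline: datetime


def assign_review(pull_request, author, team, now, max_days=2, rng=random):
    """Pick one reviewer at random (never the author) and attach a deadline."""
    candidates = [member for member in team if member != author]
    if not candidates:
        raise ValueError("no eligible reviewer on the team")
    reviewer = rng.choice(candidates)
    return Review(pull_request, reviewer, now + timedelta(days=max_days))


def is_overdue(review, now):
    """A review has expired once its deadline has passed."""
    return now > review.deadline


# Hypothetical team and pull request, for illustration only.
team = ["alice", "bob", "carol"]
created = datetime(2019, 1, 7, 9, 0)
review = assign_review("PR-1", author="alice", team=team, now=created)
```

Because exactly one name ends up in `review.reviewer`, there is always someone specific to ask when the deadline passes.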

A/B testing on WordPress: a framework for developers and tutorial

Some months ago, I changed one link in the menu on my website postpaycounter.com. After that, it seemed to me that more people were purchasing products, i.e. the conversion rate had increased. But how to check whether that was really the case, or if it was just an accident/impression? Use an A/B test, I told myself!

With an A/B test, half of the users are served one version of the page, the one with the old link, and half of them another version of it, the one with the new link in place. When a sale happens, we may then log that as a success for the kind of page that was used, be it the A version or the B one.
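
The bookkeeping described above can be sketched as follows. This is a rough Python illustration, not the WordPress framework this post is about: the function names, the hash-based split (which gives roughly, not exactly, half of the users each version) and the visitor ids are all assumptions made for the example.

```python
import hashlib
from collections import Counter

views = Counter()      # how many users were served each version
successes = Counter()  # how many sales were logged for each version


def variant_for(user_id):
    """Deterministically map a user to 'A' or 'B' so repeat visits agree."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def record_view(user_id):
    variant = variant_for(user_id)
    views[variant] += 1
    return variant


def record_success(user_id):
    """Log a sale against whichever version this user was served."""
    successes[variant_for(user_id)] += 1


def conversion_rate(variant):
    return successes[variant] / views[variant] if views[variant] else 0.0


# Hypothetical visitors: four page views, one of which ends in a purchase.
for visitor in ["user-1", "user-2", "user-3", "user-4"]:
    record_view(visitor)
record_success("user-1")
```

Comparing `conversion_rate("A")` with `conversion_rate("B")` over enough visitors is what tells an actual improvement apart from an impression.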

In my case, the two versions of the page simply consisted of two different links in the menu, while I wanted the success to be logged when the user purchased something (I use Easy Digital Downloads to handle purchases).

I could find a bunch of plugins that allowed setting up A/B tests, but they all seemed pretty difficult to customize from a developer perspective, and I was already seeing myself wrestling with someone else’s code that provides tons of features useless to me, but through which it was nearly impossible to interact with Easy Digital Downloads. So I decided to build my own, simple implementation, with the aim of it being tailored to developers rather than users who needed an interface.

An A/B test implementation example

This is an example of how to use the little framework. To set up a test, you only need to provide two functions:

Continue reading “A/B testing on WordPress: a framework for developers and tutorial”

Numpy histogram density does not sum to 1

During a Computational Vision lab, while comparing histograms, I stumbled upon a peculiar behavior. The histograms’ pairwise kernel matrix – which is just a fancy name for the matrix holding the correlations of the histograms with one another – did not have ones on the diagonal. This means that a histogram was not fully correlated with itself, which is weird.

The comparison metric I was using is the simple histogram intersection one, defined as

    \[K_{hi} = \sum^{d}_{m=1}\min(x_m,y_m)\]
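
Since min(x_m, x_m) = x_m, comparing a histogram with itself gives K_hi(x, x) = sum of x_m, so a diagonal entry equals 1 exactly when the bin values sum to 1. A minimal pure-Python sketch of that observation, with made-up bin counts and helper names that are only illustrative:

```python
def histogram_intersection(x, y):
    """K_hi(x, y): sum over bins of the smaller of the two bin values."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def normalize(counts):
    """Rescale raw bin counts so that the bin values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


counts = [2, 5, 3]                                   # made-up raw bin counts
self_raw = histogram_intersection(counts, counts)    # equals sum(counts)
unit = normalize(counts)
self_unit = histogram_intersection(unit, unit)       # equals sum(unit)
```

Here `self_raw` is the total count, while `self_unit` is 1 up to floating-point rounding.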

Continue reading “Numpy histogram density does not sum to 1”

The one time pad and the many time pad vulnerability

The aim of this article is to present the one time pad cipher method and its biggest vulnerability: that of the many time pad.

The one time pad: what it is and how it works

The one time pad is the archetype of the idea of stream cipher. It’s very simple: if you want to make a message unintelligible to an eavesdropper, just change each character of the original message in a way that you can revert, but that looks random to another person.

The way the one time pad works is the following. Suppose \mathcal{M} is the clear-text message you would like to send securely, of length |\mathcal{M}| = s. First, you need to generate a string \mathcal{K} of equal length |\mathcal{K}| = s. Then, you can obtain a cipher-text version of your message by computing the bitwise XOR of the two strings:

    \[\mathcal{C} = \mathcal{M} \oplus \mathcal{K}\]

The best thing is that decoding is just the same as encoding, as the XOR operator has the property that \mathcal{X} \oplus \mathcal{X} = 0 \ \forall \mathcal{X} (and that \mathcal{X} \oplus 0 = \mathcal{X} \ \forall \mathcal{X}). The only difference is that the cipher-text is involved in the XOR, rather than the clear-text:

    \[\mathcal{C} \oplus \mathcal{K} = \mathcal{M} \oplus \mathcal{K} \oplus \mathcal{K} = \mathcal{M} \oplus 0 = \mathcal{M}\]

Below is an example of the one time pad encoding achieved with Python, with a made-up pad string.

In the first section, result holds the XOR result. In the second part, the result and one_time_pad variables are XORed together to obtain the original plain-text message again.
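
A minimal sketch matching that description, assuming a made-up message and a made-up pad of the same length. Only `result` and `one_time_pad` are names taken from the text; `message` and `recovered` are illustrative.

```python
message = b"attack at dawn"        # made-up clear-text, 14 bytes
one_time_pad = b"XVkq2pLm9Zr4Tb"   # made-up pad of equal length; a real pad must be random
assert len(one_time_pad) == len(message)

# First section: XOR each message byte with the matching pad byte.
result = bytes(m ^ k for m, k in zip(message, one_time_pad))

# Second part: XOR the cipher-text with the same pad to get the message back.
recovered = bytes(c ^ k for c, k in zip(result, one_time_pad))
```

The two sections are the same operation applied twice, which is exactly the \mathcal{K} \oplus \mathcal{K} = 0 cancellation shown above.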

It is not difficult to realize that the whole strength of the algorithm lies in the \mathcal{K} pad. Of course, as an attacker, if you can obtain \mathcal{K} in some way, then it is not difficult to get the clear-text message from the ciphered one as well.

Continue reading “The one time pad and the many time pad vulnerability”

Getting started with Binary reverse engineering: an example

For a challenge in a university security class, I was given this file to crack: reverse1. I started with reverse0, which was considerably easier than the second one. In this post I will briefly explain how I tackled reverse1. I have provided the files so you can try on your own and then come back for hints if you are stuck! If you are new to this business, as I relatively am, I advise you to start from reverse0 and crack that first.

Hashes of reverse1 file: 
MD5 – c22c985acb7ca0f373b7279138213158
SHA256 – cd56541a75657630a2a0c23724e55f70e7f4f77300faf18e8228cd2cffe8248e
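
One way to check a downloaded copy against these hashes is sketched below in Python. The helper names are made up for the example, and the path you pass in is wherever you saved the file.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return (md5_hex, sha256_hex) for the file at `path`, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


# Hashes of reverse1 as published above.
EXPECTED_MD5 = "c22c985acb7ca0f373b7279138213158"
EXPECTED_SHA256 = "cd56541a75657630a2a0c23724e55f70e7f4f77300faf18e8228cd2cffe8248e"


def matches_published_hashes(path):
    """True only if both digests agree with the published ones."""
    return file_hashes(path) == (EXPECTED_MD5, EXPECTED_SHA256)
```

Calling `matches_published_hashes("reverse1")` from the download directory should return True for an intact copy.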

Disassembling and hoping for the best

The first thing I did was to disassemble the file with Radare to have a look at the code.

The assembly is quite jumbled up, and difficult to analyse all together. A quick look tells us that trying to crack the file just by reversing the assembly is no easy task, and actually a silly idea to begin with. There’s a cycle after the password is read from standard input, then some other instructions, then another cycle… it’s difficult to get what is going on…

Instead, let’s seek the Bad password print section, and see what should happen for the code to jump there. If we are lucky enough, we may find a bunch of final checks that will send over to the Bad password section. If we can find those, we may then look at those bits of assembly to understand how to avoid going there.

Scroll down enough, and down at the bottom I can see the Bad password part, starting at 0x080484f0.

Radare helps in showing two different arrows going into this address. The related comparisons are the following:

Continue reading “Getting started with Binary reverse engineering: an example”

Does C++ delete operator really free memory?

Well, I have been wondering about this for quite a while now, and I have tried to run some tests to better understand what’s going on under the hood. The standard answer is that after you call delete you should not expect anything good from accessing that memory spot. However, this did not seem enough to me. What is really happening when calling delete(ptr)? Even though there is no standard behavior, what could happen, anyway? Here’s what I’ve found. I’m using g++ on Ubuntu 16.04, so this may play a role in the results.

What I first expected when using the delete operator was that the freed memory would be handed back to the system for usage in other processes. Let me say this does not happen under any of the circumstances I have tried.

Memory released with delete still seems to be allocated to the program that first allocated it with new. I have tried, and there is no memory usage decrease after calling delete. I had a program which allocated around 30MB of lists through new calls, and then released them with subsequent delete calls. What happened is that, looking at the System monitor while the program was running, even after a long sleep following the delete calls, memory consumption by the program was the same. No decrease! This means that delete does not release memory to the system.

In fact, it looks like memory allocated by a program is its forever! However, the point is that, if deallocated, memory can be used again by the same program without having to allocate any more. I tried allocating 15MB, freeing it, and then allocating another 15MB of data afterwards, and the program never used 30MB. System monitor always showed it around 15MB. What I did, with respect to the previous test, was just to change the order in which things happened: half allocation, half deallocation, other half of allocation.

So, apparently memory used by a program can increase, but never shrink.

Continue reading “Does C++ delete operator really free memory?”

Tips and advice on being a freelance in Information Technology

What follows is Javier Silva’s interview with me. The interview is mostly focused on what it is like to be a freelancer in the IT field and how to start as a programmer (and how that may evolve into a business). It was first published on his blog in Spanish. He also did a small review of my Post Pay Counter plugin.

Please, introduce yourself!

I’m Stefano from Italy. I study mathematics, but there are very few things I am not interested in. I am a web developer, a walker, a reader, and an amateur photographer. Those are just the things that take most of my time, but don’t believe I don’t do anything else!

You work as web developer… where did you study it? or how did you learn it?

I’ve never taken any classes on web development or on any IT subject. I have just always been into computers and technology, and by reading/replying on forums, experimenting and following lots of tutorials I have learnt all that I know. When I was 12, at school we were covering divisors, prime numbers, factorization, and the like, and… you know, homework was boring as hell! I was just tired of having to figure out whether a number was prime, or what its divisors were, so I wrote a little script that did it for me. That evolved into writing more complex software and slowly learning to write decent-quality code.

There are a lot of resources out there for people willing to learn. I believe the key is to play around and experiment. And, as always, a lot of practice is important.

What is your point of view about the “programming career”? Is it a competitive profession?

I believe it definitely is. Unless you aim at working within your own city, meeting customers face to face, you really face a lot of competition. If I am hiring someone to develop something for me, and I don’t require that they live in my city, then I can pick anyone from all over the world. And good luck convincing me that you are the best developer, and that I really want you.

Continue reading “Tips and advice on being a freelance in Information Technology”

bbPress – Anonymous Subscriptions

This add-on plugin for bbPress will allow anonymous users to subscribe to topics and get email notifications when a new reply is posted. The notification email includes an unsubscribe link.

bbPress notifications will keep going out to registered users; this plugin only extends them to anonymous posters as well!

Download (it’s free!)

bbPress - Anonymous Subscriptions

A case example with >100% subscription rate

This is vital for support forums, for example. On the Post Pay Counter support forums, I did not want customers to have to sign up: I wanted them to be able to request support in a matter of minutes, without any hassle. I liked the idea of “enter your name and email and you’re done!” But I also felt like they needed to be notified when someone replied to help. It was not compulsory, of course, but I would have wanted it as a customer.

Continue reading “bbPress – Anonymous Subscriptions”

A method for calculating pattern relevance in a text

This article aims at presenting a method for computing the relevance of a given string (pattern) in a text. This algorithm is at the core of my WordPress plugin Smart Tag Insert.

First of all, there is a difference between simple pattern matching and computing text pattern relevance. The question we are trying to address here is the following: I have a string, and I would like to know how relevant that string is for a specific text. For example, let’s say we have “download music” as the string whose relevance we are interested in. How can we determine how relevant it is for a specific article?

The simple approach

The easy thing one could try is to run a pattern match of “download music” in the article text. That is okay, but suppose the article contained strings like “download the music”, or “download some music”, or “downloading music”, or “download good quality music”. It is clear that, to a human, all these strings are equivalent when trying to understand what the article is about: it is about downloading music, regardless of whether it is good, bad, a lot or little.

A simple pattern match would fall short, because it would exclude all those other strings and make it look like the content is not very much about downloading music, just because “download music” was never found exactly that way.

So the first point we need to acknowledge if we want to try to teach a machine to compute text pattern relevance, is that we need to find a way, at least a rough way, to teach it to grasp the meaning of the content.
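
To make the shortfall concrete, here is a rough Python sketch that compares an exact match with a deliberately crude looser one on the four strings above. The looser matcher is only an assumption made for illustration, not the algorithm used in Smart Tag Insert: it lets each pattern word carry a suffix and allows up to `max_gap` other words in between.

```python
import re

PHRASES = [
    "download the music",
    "download some music",
    "downloading music",
    "download good quality music",
]


def exact_matches(pattern, text):
    """Count literal occurrences of the pattern."""
    return text.lower().count(pattern.lower())


def loose_matches(pattern, text, max_gap=3):
    """Count places where the pattern words appear in order, each allowed a
    suffix (download -> downloading) and up to `max_gap` words in between."""
    words = [re.escape(word) for word in pattern.lower().split()]
    gap = r"\W+(?:\w+\W+){0,%d}?" % max_gap
    regex = r"\b" + gap.join(word + r"\w*" for word in words)
    return len(re.findall(regex, text.lower()))


article = ". ".join(PHRASES)
exact = exact_matches("download music", article)
loose = loose_matches("download music", article)
```

On this text the exact match finds nothing while the loose one catches all four variants, which is the gap a relevance measure has to close.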

Continue reading “A method for calculating pattern relevance in a text”