Machine Learning explained by Mango Shopping!

When I read this Quora question: Machine Learning: How do you explain Machine learning and Data Mining to non CS people?, I started wondering what would be an interesting analogy between a common activity that people can easily relate to, and the idea of “learning” algorithms.

Then, it struck me:

Mango Shopping

Suppose you go shopping for mangoes one day. The vendor has laid out a cart full of mangoes. You can handpick the mangoes, the vendor will weigh them, and you pay according to a fixed Rs per Kg rate (typical story in India).


Obviously, you want to pick the sweetest, ripest mangoes for yourself (since you are paying by weight and not by quality). How do you choose the mangoes?

You remember your grandmother saying that bright yellow mangoes are sweeter than pale yellow ones. So you make a simple rule: pick only from the bright yellow mangoes. You check the color of the mangoes, pick the bright yellow ones, pay up, and return home. Happy ending?

Not quite.


Read the complete answer on Quora

EDIT: My answer seems to be getting a lot of attention, and many people who had little idea about machine learning earlier have appreciated the intuitive analogy with Mango Shopping! It reinforces my view that there is a real need for someone to provide simple and intuitive explanations for ML concepts (which I have tried to do in my other answers on Quora).



In layman’s terms, how does Gibbs Sampling work?

This question, which was asked on Quora, reminded me of the tough time I had during my Advanced Machine Learning course, trying to get a hold of Gibbs Sampling. Fortunately, since I seem to have understood the crucial idea behind the process by now, I thought it would be cool to try to simplify it for those who are having sleepless nights because of it.

The Gibbs sampling algorithm is one solution to a basic and important question: How do you sample values from a probability distribution?

Let's look at simple cases first. Suppose your distribution has a single variable X which takes two values:

P(X=0) = 0.5 and P(X=1) = 0.5

How do you sample a value of X? Simple: flip a coin. If it's heads, X=1, else X=0. And if you are a computer program, call rand( ) (or any uniform random number generator of your choice) and if rand( ) > 0.5, then X=1, else X=0.

(Note: we assume rand( ) returns real numbers in the interval [0,1])
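The coin-flip rule above can be sketched in a few lines of Python. This is only an illustration: Python's random.random() stands in for rand( ), and the function name sample_x and the fixed seed are assumptions made here, not part of the original answer.

```python
import random

def sample_x(rng=random):
    """Draw X from P(X=0) = P(X=1) = 0.5 using one uniform random number."""
    # rng.random() plays the role of rand(): a uniform draw from the unit interval.
    return 1 if rng.random() > 0.5 else 0

# Hypothetical usage: draw many samples and check that about half of them are 1.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_x(rng) for _ in range(10_000)]
fraction_of_ones = sum(draws) / len(draws)
print(fraction_of_ones)
```

With enough draws, the printed fraction should settle close to 0.5, which is what it means for the samples to follow the distribution.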

This was a Bernoulli distribution (a binomial distribution with a single trial). What if you have a multinomial distribution?

Read the complete answer on Quora

P.S. 1: Trying to simplify ML concepts through Quora answers seems to be a great way to sharpen my own understanding of them.

P.S. 2: I think a compilation of all the difficult ML concepts, explained in a simple way, would be of great benefit to the world (I don't suppose something like this exists already). Hoping to write such a book some day 🙂

Support Vector Machines: Why is solving in the dual easier than solving in the primal?

Support vector machines are probably the most talked about classifiers. Writing an answer for this Quora question: Why is solving in the dual easier than in the primal, was a good opportunity for me to refresh my understanding of this concept.

Short answer: kernels. Long answer: keeerneeels. 🙂

A more intuitive answer:

The most significant benefit from solving the dual comes when you are using the “Kernel Trick” to classify data that is not linearly separable in the original feature space…
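As a rough illustration of why kernels matter here: the dual touches the data only through dot products, so a kernel function can replace a dot product in a larger feature space without ever building that space. The sketch below checks this for one small case, the degree-2 polynomial kernel on 2-D points. The names phi, dot and poly_kernel and the sample points are assumptions chosen for this example, not taken from the Quora answer.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map for the kernel (x . z)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Same quantity computed in the original 2-D space: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the dot product
implicit = poly_kernel(x, z)     # never leaves the original 2-D space
print(explicit, implicit)
```

Both numbers agree up to floating-point rounding. For richer kernels the explicit feature space can be very large or even infinite-dimensional, while the kernel evaluation stays cheap.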

Read the complete answer on Quora

A Note On The Name Of The Blog

“Prior Wisdom” is a reference to Bayesian Inference.

Bayes Theorem
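For reference, the theorem is usually written as follows, where H stands for a hypothesis and E for the observed evidence (the symbol names are a choice made here for illustration):

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H | E) is the posterior, P(E | H) is the likelihood, P(H) is the prior, and P(E) is the overall probability of the evidence.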

Bayes Theorem states that how well we can make inferences from any event (the posterior probability) depends on what we already know about the world (the prior probability) and what we can say about how our hypotheses affect the evidence (the likelihood). The more accurately our priors depict the real world, the better we can make sense of the problem at hand. From the Wikipedia page on Prior Knowledge For Pattern Recognition:

Prior knowledge refers to all information about the problem available in addition to the training data. However, in this most general form, determining a model from a finite set of samples without prior knowledge is an ill-posed problem, in the sense that a unique model may not exist. […]

The importance of prior knowledge in machine learning is suggested by its role in search and optimization. Loosely, the no free lunch theorem states that all search algorithms have the same average performance over all problems, and thus implies that to gain in performance on a certain application one must use a specialized algorithm that includes some prior knowledge about the problem.

My take is that all of human wisdom can be encoded in the prior. All learning effort is directed towards “knowing all there is to know” through modelling a prior that accurately depicts the nature of the world.

I see this pattern in all spheres of human knowledge (maybe it's just a cognitive bias!), including both the natural and human sciences. Hmm, I need to think about how I can use this prior wisdom. 🙂