What is Machine Learning?

Machine Learning (ML) is, as Tom Mitchell stated, “The study of algorithms that improve their performance P at some task T with experience E”. Another way of looking at it is that we are learning an algorithm that solves an inference problem or a model that describes some data set. We will discuss these two concepts at a basic level below and then introduce some cool things machine learning and deep learning have accomplished to hopefully incite some interest.


Inference means making a decision or prediction about some sort of data, perhaps in a probabilistic sense. For example, I might give you an image and say “is there a cat in this image?” or “what is the probability that there is a cat in this image?”


As you might imagine, it gets more complex and ambiguous when the thing you’re trying to predict is secluded in some way or only represents the idea of the thing you’re trying to predict instead of being the thing itself (e.g. the cat-faced bowl above).

There are perhaps more interesting inference problems where you don’t have complete information, but you know some related information. For example, if I give you the temperature in San Francisco, San Jose, and Fremont, can you predict (infer) the temperature in Palo Alto? Or If I give you the position and velocity of a car at time $t_{1}$, can you tell me the probability the car will be 5 meters north at time $t_{2}$? The output of an inference algorithm can either be a concrete decision (e.g. the image does have a cat in it) or a probability distribution over the set of possible outcomes (e.g. there is a 60% chance that the temperate in Palo Alto is between 50 and 65 degrees Fahrenheit).


Modeling allows us to describe data either qualitatively or numerically. There are typically geometric models, which are ones that try to find geometric structure in data and probabilistic models, which try to find a probability distribution given a bunch of samples of random variables.

An example of a geometric model might be: I have (square-footage, location, number of bedrooms) information for 1000 houses. How can I find some combination of these attributes that still comes close to fitting the data? This boils down to finding a lower-dimensional subspace that comes close to containing all the original data.

Data close to a lower dimensional subspace

Why study ML?

Besides all the buzzwords, machine learning, and indeed artificial intelligence in general, can do some pretty amazing stuff. More than 20 years ago we created AIs that could beat the world’s best chess player, and more recently DeepMind’s AlphaGo beat the world’s best Go player. These are well known examples. As we will learn soon, machine learning provides us with powerful tools to describe data. However, my personal favorite reason for studying machine learning is that it gives us a solid foundation to study more advanced topics such as those typically referred to as deep learning. This umbrella term includes everything from convolutional neural networks to generative adversarial networks to reinforcement learning agents that learn to play hide and seek. Studying ML provides us with mathematical and algorithmic frameworks for analyzing these exciting topics.

I’ll end with a relatively recent video by OpenAI, where agents learn how to coordinate with other agents to play hide and seek.

Credit: OpenAI