Understanding Gradient Descent: The Optimization Algorithm Behind Machine Learning

Taseer Mehboob
Jun 4, 2024 · 4 min read


Gradient descent minimizing the cost (loss) function by taking small steps toward the blue region, where the loss is lowest.

In the world of machine learning and deep learning, optimization plays a crucial role in training models to make accurate predictions. One of the most commonly used optimization algorithms is Gradient Descent. In this story, we will understand gradient descent in depth without getting into the math.

What is Gradient Descent?

Gradient Descent is an optimization algorithm commonly used in machine learning and deep learning to minimize the cost function or loss function (the error between the actual value and the predicted value).

Now that you know what gradient descent is, let’s talk about how it works and the steps involved. We will understand gradient descent with real-life examples.

Understanding Gradient Descent:

To explain gradient descent without getting into the math, I will do my best to make it clear with an example.

Imagine you’re hiking in the mountains, and your goal is to reach the lowest point in a valley. However, it’s foggy, so you can only see the ground immediately around you. To find your way down, you can do the following:

  1. Look Around: Check the slope of the ground at your current position. Is it sloping up, down, or flat?
  2. Determine Direction: Find the direction in which the ground slopes downward the steepest.
  3. Take a Step: Move a small step in that direction.
  4. Repeat: Repeat the process until you reach the bottom of the valley.

In a picture, it looks like this.

Taking small steps in gradient descent

As you can see in the image above, we take small steps toward the lowest point in the valley.

Now let’s understand the same example in machine learning.

Gradient Descent in Machine Learning

In machine learning, we use a similar process to minimize a “cost function” (which measures how well our model is performing; in our hiking example, it’s the error telling us whether each step we take is the right one). Let’s understand how it works, with a small code sketch after the list:

  1. Start with an Initial Guess: Begin with some initial values for the model’s parameters (like weights in a neural network).
  2. Evaluate the Cost Function: Calculate how well the model is doing with the current parameters. This is done by evaluating the cost function, which tells us the error or loss.
  3. Compute the Gradient: Instead of checking the ground’s slope, we compute the gradient, which tells us the direction in which the cost function increases the most. Essentially, it tells us which direction to move our parameters to reduce the cost the fastest.
  4. Update the Parameters: Move the parameters a small step in the opposite direction of the gradient, just like taking a step in the direction that slopes downward the steepest. The step size is controlled by the “learning rate.”
  5. Repeat: Keep repeating the process of computing the gradient and updating the parameters until the cost function is minimized (or until it stops changing significantly).
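
To make these five steps concrete, here is a minimal sketch in Python with NumPy. It fits a single weight w so that w * x approximates y on toy data; the data, learning rate, iteration count, and stopping threshold are all illustrative choices for this story, not a fixed recipe.

```python
import numpy as np

# Toy data: y is roughly 2 * x, so the "right answer" is w close to 2.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100)
y = 2.0 * x + rng.normal(0.0, 0.1, 100)

def cost(w):
    # Step 2: mean squared error between predictions w * x and targets y.
    return np.mean((w * x - y) ** 2)

def gradient(w):
    # Step 3: derivative of the cost with respect to w.
    return np.mean(2.0 * (w * x - y) * x)

w = 0.0                                # Step 1: an initial guess
learning_rate = 0.1
for step in range(500):                # Step 5: repeat
    grad = gradient(w)
    w -= learning_rate * grad          # Step 4: move against the gradient
    if abs(grad) < 1e-6:               # stop once the cost barely changes
        break

print(f"learned w = {w:.3f}, final cost = {cost(w):.5f}")
```

The key line is w -= learning_rate * grad: the minus sign is what makes this a descent, since the gradient points in the direction where the cost increases the fastest.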

Real-world Example

Imagine you have a plant, and you want to determine the best amount of water and sunlight it needs to grow the tallest. You start with a guess, give the plant a certain amount of water and sunlight, and observe its growth (this is like evaluating the cost function).

  1. Observation: You notice that the plant’s growth is not optimal.
  2. Adjust: You tweak the amounts of water and sunlight slightly and see if the plant grows taller.
  3. Direction: If giving it more water improves growth but more sunlight doesn’t, you realize you should increase water but not sunlight (this is like computing the gradient).
  4. Repeat: You continue adjusting the water and sunlight levels, each time improving the plant’s growth until you find the optimal combination.

In this analogy:

  • The plant’s growth is the performance of your model.
  • Water and sunlight are the model’s parameters.
  • Observing the plant’s growth is like evaluating the cost function.
  • Tweaking the amounts based on growth is like computing the gradient and updating the parameters.

By repeatedly adjusting the parameters in the direction that reduces the error (just like finding the right combination of water and sunlight), we gradually improve the model’s performance, eventually reaching an optimal set of parameters.
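
To see the plant analogy in code, the “tweak and observe” idea becomes a numerical (finite-difference) gradient: nudge each parameter slightly, measure how the outcome changes, and move in the direction that helps. The growth function below, with its best values at water = 5 and sunlight = 3, is made up purely for illustration.

```python
import numpy as np

def growth(water, sunlight):
    # Hypothetical growth score that peaks at water=5, sunlight=3.
    return -(water - 5.0) ** 2 - (sunlight - 3.0) ** 2

def cost(params):
    # Gradient descent minimizes, so the cost is the negative growth.
    return -growth(params[0], params[1])

def numerical_gradient(f, params, eps=1e-5):
    # "Tweak and observe": bump each parameter a little and
    # measure how much the cost changes in response.
    grad = np.zeros_like(params)
    for i in range(len(params)):
        bumped = params.copy()
        bumped[i] += eps
        grad[i] = (f(bumped) - f(params)) / eps
    return grad

params = np.array([1.0, 1.0])   # initial guess: [water, sunlight]
learning_rate = 0.1
for _ in range(300):
    params -= learning_rate * numerical_gradient(cost, params)

print(f"best water = {params[0]:.2f}, best sunlight = {params[1]:.2f}")
```

Running this drives the parameters toward water = 5 and sunlight = 3, exactly like repeatedly tweaking the care routine until the plant grows best.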

The steps we walked through in this analogy are:

  1. Initialization
  2. Computing the Gradient
  3. Updating the Parameters
  4. Iterating

These are the core steps taken in Gradient Descent.

Advantages and Disadvantages

Advantages:

  • Simple to understand and implement.
  • Efficient for large datasets (especially with Mini-Batch and Stochastic variants).

Disadvantages:

  • Sensitive to the choice of learning rate.
  • Can get stuck in local minima for non-convex functions.
  • Requires careful tuning of hyperparameters like learning rate and batch size.
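
The first disadvantage is easy to demonstrate with a sketch. Below we minimize f(w) = w², whose gradient is 2w: a modest learning rate converges toward the minimum at 0, while a rate that is too large overshoots and diverges. The two rates are illustrative, not special values.

```python
def descend(learning_rate, steps=10, w=1.0):
    # Plain gradient descent on f(w) = w**2, whose gradient is 2 * w.
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

print(descend(0.1))   # ~0.107: each step shrinks w by a factor of 0.8
print(descend(1.1))   # ~6.19: each step overshoots, growing |w| by 1.2x
```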

Hope this helped you understand it. See you in the next story, about Stochastic Gradient Descent.

ba bye… Taseer Mehboob
