Gradient Boosting Explained


What is Gradient Boosting?
Gradient boosting is a machine learning technique for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Like other boosting methods, it builds the model in a stage-wise fashion, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

It is also known as Gradient Boosted Decision Trees (GBDT), because it is most often used with decision trees as the weak learners.

How does it work?
Gradient Boosting builds an ensemble of many weak learners, each trained to reduce the residuals (errors) left by the learners before it. The weak learners are trained sequentially, and each new learner gradually reduces the loss of the whole system.
This is done using the Gradient Descent method.

So, what is Gradient Descent?
It is an optimization algorithm that finds the parameter values (weights) that minimize a loss function, and therefore the total error of the system.

Say we have a simple linear model: y = ax + b + e, where 'e' is the error term. The loss function measures how large this error is over the data.

Now, the idea is to minimize this error. Say our model starts with a very high error (call this point A on the loss surface) and we want to reach the point where the total error is lowest (point B). To travel from point A to point B, we take a path along which the error descends gradually.

step 1: first initialize the parameters 'a' and 'b' with random weights and calculate the error.
step 2: now calculate the descent, i.e. how the error changes as each weight changes (the gradient). This tells us the direction in which the error decreases.
step 3: now we have reached a new position (say A1) where the error is a little lower than before. At position A1, adjust the weights of 'a' and 'b' again along the gradient, and keep moving in the direction where the error decreases.
step 4: repeat this process until we have found the values of 'a' and 'b' where the error is lowest, and further adjusting those weights does not reduce the error.
step 5: use the new (optimal) weights for prediction.

This is how gradient descent helps in minimizing the error. 
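To make these steps concrete, here is a rough sketch of gradient descent fitting the simple model y = ax + b on some made-up data. The data, learning rate and number of iterations below are illustrative assumptions, not anything from the post.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)   # true a=3, b=2, plus noise

a, b = 0.0, 0.0          # step 1: start from initial parameter values
learning_rate = 0.01

for _ in range(2000):
    y_pred = a * x + b
    error = y_pred - y                    # current errors (residuals)
    # step 2: gradients of the mean squared error w.r.t. 'a' and 'b'
    grad_a = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # step 3: move in the direction that decreases the error
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

# step 5: use the fitted weights for prediction
print(f"estimated a={a:.2f}, b={b:.2f}")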


Now, let's get back to Gradient Boosting and see how it works with the help of gradient descent.

step 1: first train a simple predictor on the data, then train a new model to predict the errors (residuals) of that first model.

step 2: then combine both predictors to form an ensemble model, and again train a new predictor to predict the errors we get from this ensemble.

step 3: repeat this process for as many stages as we wish. In the end we get a strong, robust predictor that greatly reduces the error of the whole system.



This is how Gradient Boosting works: by fitting each new learner to the errors of the previous stage, it steadily reduces the error and gives a large boost in model performance.
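To see the boosting loop itself, here is a rough sketch of gradient boosting for regression with squared error, where every new tree is fit to the residuals of the current ensemble. The synthetic data, tree depth, learning rate and number of trees are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
trees = []
prediction = np.full_like(y, y.mean())   # step 1: start from a simple predictor

for _ in range(100):
    residuals = y - prediction                      # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                          # new learner predicts the errors
    prediction += learning_rate * tree.predict(X)   # step 2: add it to the ensemble
    trees.append(tree)

def predict(X_new, base=y.mean()):
    # combine the base prediction with every tree's residual correction
    return base + learning_rate * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - predict(X)) ** 2))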


Follow my Machine Learning projects on GitHub to see how Gradient Boosting can be implemented using Scikit-Learn and how to tune its parameters to improve performance.
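In the meantime, here is a quick sketch of what using Scikit-Learn's GradientBoostingRegressor looks like; the synthetic dataset and the hyperparameter values are just placeholders, not tuned settings.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200,     # number of boosting stages (weak learners)
    learning_rate=0.05,   # shrinks each tree's contribution
    max_depth=3,          # depth of the individual trees
    random_state=0,
)
model.fit(X_train, y_train)
print("R^2 on test data:", model.score(X_test, y_test))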

