Author: [Alexey Natekin](https://www.linkedin.com/in/natekin/), OpenDataScience founder, Machine Learning Evangelist. Translated and edited by [Olga Daykhovskaya](https://www.linkedin.com/in/odaykhovskaya/), [Anastasia Manokhina](https://www.linkedin.com/in/anastasiamanokhina/), [Egor Polusmak](https://www.linkedin.com/in/egor-polusmak/), and [Yuanyuan Pao](https://www.linkedin.com/in/yuanyuanpao/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

Today we are going to have a look at one of the most popular and practical machine learning algorithms: gradient boosting.

## Article outline

We recommend going over this article in the order described below, but feel free to jump around between sections.

1. [Introduction and history of boosting](#introduction-and-history-of-boosting)
    - [History of Gradient Boosting Machine](#history-of-gbm)
1. [GBM algorithm](#gbm-algorithm)
    - [ML Problem statement](#ml-problem-statement)
    - [Functional gradient descent](#functional-gradient-descent)
    - [Friedman's classic GBM algorithm](#friedmans-classic-gbm-algorithm)
    - [Step-by-step example of the GBM algorithm](#step-by-step-example-how-gbm-works)
1. [Loss functions](#loss-functions)
    - [Regression loss functions](#regression-loss-functions)
    - [Classification loss functions](#classification-loss-functions)
    - [Weights](#weights)
1. [Conclusion](#4conclusion)
1. [Useful resources](#useful-resources)

## 1. Introduction and history of boosting

Almost everyone in machine learning has heard about gradient boosting. Many data scientists keep this algorithm in their toolbox because it tends to yield good results on almost any previously unseen problem.
Furthermore, XGBoost is often the standard recipe for [winning](https://github.com/dmlc/xgboost/blob/master/demo/README.md#usecases) [ML competitions](http://blog.kaggle.com/tag/xgboost/). It is so popular that the idea of stacking XGBoosts has become a meme. Moreover, boosting is an important component in [many recommender systems](https://en.wikipedia.org/wiki/Learning_to_rank#Practical_usage_by_search_engines); sometimes, it is even considered a [brand](https://yandex.com/company/technologies/matrixnet/). Let's look at the history and development of boosting.

Boosting was born out of [the question](http://www.cis.upenn.edu/~mkearns/papers/boostnote.pdf): is it possible to get one strong model from a large number of relatively weak and simple models? By "weak models", we do not mean simple base models like decision trees but models with poor accuracy, where "poor" means only slightly better than random guessing.

[A positive mathematical answer](http://www.cs.princeton.edu/~schapire/papers/strengthofweak.pdf) to this question was found, but it took a few years to develop fully functioning algorithms based on it, such as AdaBoost. These algorithms take a greedy approach: they build a linear combination of simple models (base algorithms) by re-weighting the input data. Each new model (usually a decision tree) is fit with larger weights on the objects that the previous models predicted incorrectly.
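
The re-weighting idea can be sketched in a few lines of NumPy. This is a minimal illustration of discrete AdaBoost with decision stumps, not a reference implementation: it assumes labels in {-1, +1}, the function names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) are hypothetical, and the ten-point toy dataset is made up for this example.

```python
import numpy as np


def fit_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                # Predict `polarity` above the threshold, `-polarity` otherwise.
                pred = np.where(X[:, j] <= thr, -polarity, polarity)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, polarity), err
    return best, best_err


def stump_predict(X, stump):
    j, thr, polarity = stump
    return np.where(X[:, j] <= thr, -polarity, polarity)


def adaboost_fit(X, y, n_rounds=3):
    w = np.full(len(y), 1.0 / len(y))  # start from uniform object weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak model
        pred = stump_predict(X, stump)
        # Misclassified objects (y * pred == -1) get larger weights,
        # correctly classified ones get smaller weights.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(X, ensemble):
    # The final model is a weighted vote, i.e. a linear combination of stumps.
    score = sum(alpha * stump_predict(X, stump) for alpha, stump in ensemble)
    return np.sign(score)


# Toy 1-D data (invented for illustration) that no single stump separates.
X = np.arange(10).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

ensemble = adaboost_fit(X, y, n_rounds=3)
accuracy = np.mean(adaboost_predict(X, ensemble) == y)
print(f"training accuracy after 3 rounds: {accuracy:.2f}")
```

On this toy data, each individual stump is a weak model in the sense above: the best single threshold still misclassifies three of the ten points. The combination does better only because every round concentrates weight on the points the previous stumps got wrong.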