# Topic 5. Bagging and Random Forest
Yet again, both the theory and the practice are exciting. We discuss why the “wisdom of a crowd” idea works for machine learning models, i.e., why an ensemble of several models performs better than any of its individual members. In practice, we try out Random Forest (an ensemble of many decision trees), a “default algorithm” for many tasks. We discuss the numerous advantages of Random Forest in detail, along with its applications. There is no silver bullet though: in some cases, linear models still work better and faster.
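
To make the “wisdom of a crowd” claim concrete, here is a minimal sketch comparing a single decision tree with bagged trees and a Random Forest via cross-validation. It assumes scikit-learn is installed; the dataset and hyperparameters are illustrative choices, not taken from the course materials:

```python
# A minimal sketch of the "wisdom of a crowd" effect, assuming scikit-learn;
# the dataset and hyperparameters here are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=17),
    "bagged trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=17),
        n_estimators=100, random_state=17),
    "random forest": RandomForestClassifier(
        n_estimators=100, random_state=17),
}

# Averaging many decorrelated trees reduces variance, so the two
# ensembles typically score higher than the single tree.
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")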
## Steps in this block
1. Read 3 articles:
“Bagging” (same as a Kaggle Notebook);
“Random Forest” (same as a Kaggle Notebook);
“Feature Importance” (same as a Kaggle Notebook; a short feature importance sketch is given after this list);
2. Watch a video lecture coming in 3 parts:
the theory behind ensembles, bagging, and Random Forest;
classification metrics in machine learning;
a business case where we discuss a real classification task: predicting customer payments;
3. Complete demo assignment 5 (same as a Kaggle Notebook) where you compare logistic regression and Random Forest on a credit scoring problem (a sketch of the general shape of such a comparison is given after this list);
4. Check out the solution (same as a Kaggle Notebook) to the demo assignment (optional);
5. Complete Bonus Assignment 5, where you’ll apply logistic regression and Random Forest in two different tasks, which will deepen your understanding of the application scenarios of these two extremely popular algorithms (optional, available under the Patreon “Bonus Assignments” tier).
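
As a companion to the “Feature Importance” article above: a fitted Random Forest exposes per-feature importances through its `feature_importances_` attribute. Below is a minimal sketch, assuming scikit-learn and its built-in breast cancer dataset (an illustrative choice, not the course’s own example):

```python
# A minimal sketch of Random Forest feature importances, assuming
# scikit-learn; the dataset is illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=17)
forest.fit(data.data, data.target)

# feature_importances_ holds the mean impurity decrease per feature,
# normalized to sum to 1.
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```

And for the demo assignment comparison, the sketch below shows the general shape of such an experiment; it is not the assignment’s solution. The synthetic `make_classification` dataset stands in for the actual credit scoring data, and the class imbalance (`weights`) and ROC AUC metric are assumptions meant to mimic a typical credit scoring setup:

```python
# A hedged sketch of a logistic regression vs. Random Forest comparison;
# the synthetic dataset below stands in for real credit scoring data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=20,
                           weights=[0.9, 0.1], random_state=17)

# Logistic regression needs scaled features; Random Forest does not.
logit = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000, random_state=17))
forest = RandomForestClassifier(n_estimators=100, random_state=17)

# ROC AUC is a common choice for imbalanced problems like credit scoring.
for name, model in [("logit", logit), ("forest", forest)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: ROC AUC = {auc:.3f}")
```

Which model wins depends on the data: with few informative, roughly linear features, logistic regression is often competitive and much faster, which is the trade-off the section’s closing remark refers to.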