# Topic 3. Classification, Decision Trees, and k Nearest Neighbors

![Topic 3 teaser](../../_images/topic3-teaser.png)

Here we delve into machine learning and discuss two simple approaches to the classification problem. In a real project, it's best to start with something simple, and you'd often try out decision trees or nearest neighbors (as well as linear models, the next topic) right after even simpler heuristics. We discuss the pros and cons of trees and nearest neighbors, and we also touch upon the important topics of assessing the quality of model predictions and performing cross-validation. The article is long, but decision trees in particular deserve it: they form the foundation of Random Forest and Gradient Boosting, the two algorithms you'll most likely be using in practice.
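To make the "start simple and cross-validate" advice concrete, here is a minimal sketch (not part of the course materials) that fits both a decision tree and a kNN classifier on one of Sklearn's built-in toy datasets and compares them with 5-fold cross-validation; the dataset, hyperparameter values, and fold count are illustrative choices, not the course's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# A built-in binary classification dataset, used here purely for illustration
X, y = load_breast_cancer(return_X_y=True)

# A shallow tree is a reasonable first baseline; max_depth limits overfitting
tree = DecisionTreeClassifier(max_depth=5, random_state=17)

# kNN is distance-based, so features are scaled before computing neighbors
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=10))

# Cross-validation scores each model on held-out folds, which is the honest
# way to assess prediction quality rather than accuracy on the training set
for name, model in [("decision tree", tree), ("kNN", knn)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```

The pipeline around kNN matters: without scaling, features with large numeric ranges dominate the distance computation, while the tree is insensitive to monotonic feature scaling.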

## Steps in this block

  1. Read the article (same as a Kaggle Notebook);

  2. Watch a video lecture in two parts: the theory behind decision trees (an intuitive explanation) and practice with Sklearn decision trees;

  3. Complete demo assignment 3 (same as a Kaggle Notebook) on decision trees;

  4. Check out the solution (same as a Kaggle Notebook) to the demo assignment (optional);

  5. Complete Bonus Assignment 3, where you'll go through the math of decision trees, practice with Sklearn's implementation, and then implement the algorithm yourself, from scratch (optional; available under the Patreon "Bonus Assignments" tier). A taste of the from-scratch part is sketched after this list.
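As a flavor of what "from scratch" involves, below is a hedged sketch of the two building blocks behind tree splitting as covered in the article: Shannon entropy as the impurity measure and the information gain of a candidate split. The function names and the toy example are this sketch's own, not taken from the bonus assignment.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Entropy reduction from splitting labels y into y[mask] and y[~mask],
    with each child's entropy weighted by its share of the samples."""
    n = len(y)
    left, right = y[mask], y[~mask]
    return entropy(y) - (len(left) / n * entropy(left)
                         + len(right) / n * entropy(right))

# A perfect split of a 50/50 labeled sample yields the maximum gain of 1 bit
y = np.array([0, 0, 1, 1])
mask = np.array([True, True, False, False])
print(information_gain(y, mask))  # 1.0
```

A from-scratch tree builder would repeatedly pick the feature threshold whose boolean mask maximizes this gain, then recurse on each child until the leaves are pure or a depth limit is reached.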