Bonus Assignment 5. Logistic Regression vs. Random Forest

You can purchase a Bonus Assignments pack with the best non-demo versions of mlcourse.ai assignments. Select the “Bonus Assignments” tier on Patreon or a similar tier on Boosty (rus).

  

Details of the deal

mlcourse.ai is still in self-paced mode, but we offer Bonus Assignments with solutions for a contribution of $17/month. The idea is that you contribute for roughly 1-5 months while studying the course materials, though a single contribution is fine and already gives you access to the bonus pack.

Note: the first payment is charged when you join the Patreon tier, and the next payment is charged on the 1st day of the following month, so it’s better to purchase the pack in the first half of the month.

mlcourse.ai is never supposed to go fully monetized (it was created in the wonderful open ODS.ai community and will remain open and free), but these contributions help to cover some operational costs, and Yury also put quite some effort into assembling all the best assignments into one pack. Please note that, unlike the rest of the course content, Bonus Assignments are copyrighted. Informally, Yury’s fine if you share the pack with 2-3 friends, but public sharing of the Bonus Assignments pack is prohibited.


To build a first baseline in a machine learning task, I personally prefer either logistic regression or Random Forest, depending on the problem. The reason is that these algorithms are relatively insensitive to hyperparameter choice and can work fairly well out of the box. Hence, it’s important to understand when logistic regression is the better choice and which tasks are more suitable for Random Forest.

In this assignment, you’ll apply logistic regression and Random Forest to two different tasks: credit scoring and movie review classification. This will help you understand the application scenarios of these two extremely popular algorithms. You’ll also learn the hard way that Random Forest should not be used when the feature space is very high-dimensional.
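
As a rough illustration of what "building a first baseline" with these two algorithms can look like, here is a minimal sketch. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for a real task. The helper name `baseline_scores`, the dataset sizes, and the choice of ROC AUC as the metric are illustrative assumptions, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def baseline_scores(X, y, cv=3, seed=17):
    """Return the mean cross-validated ROC AUC of two near-default baselines.

    Hypothetical helper for illustration; not taken from the assignment.
    """
    models = {
        # max_iter is raised only to avoid convergence warnings on unscaled data.
        "logistic_regression": LogisticRegression(max_iter=1000, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


# Synthetic binary classification data (an assumption, not the assignment's datasets).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=17
)

scores = baseline_scores(X, y)
for name, score in scores.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

Swapping in your own feature matrix and target lets you check which of the two baselines is stronger on a given task before investing in tuning. On sparse, very high-dimensional inputs such as bag-of-words text features, it is also worth timing both models, since training cost can differ considerably.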