Topic 1. Exploratory data analysis with Pandas

Diving into Machine Learning and seeing the math in action is an exciting prospect. However, a large share of the time on real-world projects, around 70-80%, actually goes into preparing and cleaning data. This is where Pandas proves its value; I use it daily in my work. This article outlines the essential Pandas methods for preliminary data analysis. We will then analyze a dataset on telecom customer churn and attempt to predict churn using common sense alone (and Pandas, of course), without training any model. Don’t underestimate the power of such an approach.
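
To give a flavor of what "common sense plus Pandas" can mean, here is a minimal sketch. It is not the article's actual analysis: the tiny table below is made up for illustration, and the column names (`international_plan`, `service_calls`, `churn`) and the rule's threshold are hypothetical assumptions.

```python
import pandas as pd

# Toy, made-up data for illustration only; not the real telecom dataset.
# Column names are hypothetical.
df = pd.DataFrame({
    "international_plan": ["no", "yes", "no", "yes", "no", "no", "yes"],
    "service_calls": [1, 5, 0, 2, 6, 2, 1],
    "churn": [0, 1, 0, 1, 1, 0, 0],
})

# Step 1: look at how the churn rate differs between groups.
churn_by_plan = df.groupby("international_plan")["churn"].mean()
print(churn_by_plan)

# Step 2: turn the observation into a hand-written rule, with no model training.
# The threshold of 3 calls is an arbitrary choice for this toy example.
df["predicted"] = (
    (df["international_plan"] == "yes") | (df["service_calls"] > 3)
).astype(int)

# Step 3: check how often the rule agrees with the actual labels.
accuracy = (df["predicted"] == df["churn"]).mean()
print(f"Rule accuracy on the toy data: {accuracy:.2f}")
```

The rule here was written after looking at the same rows it is scored on, so the accuracy says nothing about unseen customers; the point is only the workflow of grouping, spotting a pattern, and encoding it as a rule.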

Steps in this block

  1. Read the article “Exploratory data analysis with Pandas” (also available as a Kaggle Notebook);

  2. Watch the video lecture “Pandas & Data Analysis” (optional);

  3. Complete demo assignment 1 (also available as a Kaggle Notebook), in which you’ll explore demographic data from the UCI “Adult” dataset;

  4. Check out the solution to the demo assignment (also available as a Kaggle Notebook; optional);

  5. Complete Bonus Assignment 1, in which you’ll analyze the history of the Olympic Games with Pandas (optional, available under the Patreon “Bonus Assignments” tier).