Spring 2019 assignments

  1. Exploratory Data Analysis (EDA) of US flights, nbviewer. Deadline: February 24, 20:59 GMT
  2. In Assignment 2, you’ll be beating baselines in the first two competitions:
    • Part 1. User Identification with Logistic Regression (beating baselines in the “Alice” competition), nbviewer. Deadline: March 10, 20:59 GMT
    • Part 2. Predicting Medium articles popularity with Ridge Regression (beating baselines in the “Medium” competition), nbviewer. Deadline: March 10, 20:59 GMT
    • Providing reproducible solutions (if at least one baseline is beaten); see roadmap. Deadline: March 17, 20:59 GMT
  3. Decision trees, Random Forest, and gradient boosting. Deadline: March 31, 20:59 GMT
    • Part 1. “Decision trees for classification and regression”, nbviewer
    • Part 2. “Random Forest and Logistic Regression in credit scoring and movie reviews classification”, nbviewer
    • Part 3. “Flight delays” competition, Kernel starter

Demo assignments, just for practice:

  1. Exploratory data analysis with Pandas, nbviewer, Kaggle Kernel, solution
  2. Analyzing cardiovascular disease data, nbviewer, Kaggle Kernel, solution
  3. Decision trees with a toy task and the UCI Adult dataset, nbviewer, Kaggle Kernel, solution
  4. Sarcasm detection, Kaggle Kernel, solution. Linear Regression as an optimization problem, nbviewer, Kaggle Kernel
  5. Logistic Regression and Random Forest in the credit scoring problem, nbviewer, Kaggle Kernel
  6. Exploring OLS, Lasso and Random Forest in a regression task, nbviewer, Kaggle Kernel, solution
  7. Unsupervised learning, nbviewer, Kaggle Kernel
  8. Implementing an online regressor, nbviewer, Kaggle Kernel, solution
  9. Time series analysis, nbviewer, Kaggle Kernel, solution
  10. Beating a baseline in a competition, Kaggle Kernel