Prerequisites

The course aims to balance theory and practice, so the prerequisites include:

  • Python
  • Math
  • Software
  • DevOps

Python

Basic skills are required: writing loops, functions, classes, etc. Completing an interactive tutorial such as DataQuest, DataCamp or even Codecademy will suffice. A deeper knowledge of Python is welcome, though, since some tasks require you to implement an ML algorithm from scratch.
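To give a sense of the level expected, here is a hypothetical example of implementing a simple ML algorithm from scratch with nothing more than loops, functions and a class: univariate linear regression fitted by batch gradient descent on the mean squared error. It is an illustration written for this page, not an actual course assignment; the class name LinearRegressionGD and its parameters lr and n_iter are made up.

```python
class LinearRegressionGD:
    """Hypothetical from-scratch model: y ~ w * x + b, fitted by batch gradient descent."""

    def __init__(self, lr=0.05, n_iter=2000):
        self.lr = lr          # learning rate (step size)
        self.n_iter = n_iter  # number of full passes over the data
        self.w = 0.0
        self.b = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        for _ in range(self.n_iter):
            # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
            grad_w = 0.0
            grad_b = 0.0
            for x, y in zip(xs, ys):
                err = self.w * x + self.b - y
                grad_w += 2 * err * x / n
                grad_b += 2 * err / n
            # Update both parameters simultaneously, using the old values
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]


if __name__ == "__main__":
    # Toy data lying exactly on y = 2x + 1
    model = LinearRegressionGD().fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
    print(round(model.w, 3), round(model.b, 3))  # prints: 2.0 1.0
```

With these toy data and this learning rate the updates converge to the exact line; on real data you would typically also watch the loss and tune lr and n_iter.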

Math

Knowledge of basic concepts from calculus, linear algebra, probability theory, and statistics is also required. If you need to catch up, good resources are Part I of the “Deep Learning” book and “Mathematics for Machine Learning”. For a deeper dive, take a look at MIT courses.

Software requirements

Generally, the best option is to install the latest Anaconda 3 distribution: it ships a recent Python together with NumPy, Pandas, Sklearn, Jupyter and many other libraries. However, the course also uses some other packages, for example XGBoost, LightGBM and/or CatBoost, and Vowpal Wabbit. Installing some of them on Windows can be painful.

There are several ways to set up your learning environment:

  • Kaggle Kernels & Azure ML
  • Pip & Anaconda
  • Docker

Kaggle Kernels & Azure ML

The easiest way to start working with the course materials is to visit the mlcourse.ai Kaggle Dataset and fork some Kernels (please keep them private). Your Jupyter notebooks then run live in the browser with the Anaconda libraries available, and almost all the needed datasets are already there. However, uploading other datasets can be tiresome.

Pip & Anaconda

Most Python packages, such as NumPy, Pandas or Sklearn, can be installed manually with pip, the Python package installer. However, the preferred option is to use Anaconda. Additionally, you’ll need XGBoost and Vowpal Wabbit, and possibly LightGBM and CatBoost for competitions.
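A quick way to see which of the packages mentioned above are already available is a short script like the one below. It is a minimal sketch, not part of the course materials: the package list is an assumption and uses import names, which can differ from pip names (for example sklearn for scikit-learn). Vowpal Wabbit is checked both as the Python binding vowpalwabbit and as a vw command-line tool on PATH.

```python
import importlib.util
import shutil

# Assumed import names for the packages mentioned in this section
# (import names can differ from pip names, e.g. sklearn vs scikit-learn).
PACKAGES = ["numpy", "pandas", "sklearn", "xgboost", "lightgbm", "catboost", "vowpalwabbit"]


def check_packages(names=PACKAGES):
    """Map each top-level import name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def vw_cli_available():
    """Return True if a `vw` executable (Vowpal Wabbit CLI) is found on PATH."""
    return shutil.which("vw") is not None


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name:<14} {'OK' if found else 'MISSING'}")
    print(f"{'vw (CLI)':<14} {'OK' if vw_cli_available() else 'MISSING'}")
```

Anything reported as MISSING can then be installed with pip or conda. The check only looks for the packages without importing them, so it stays fast even when heavy libraries are installed.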

Docker

All the necessary software comes preinstalled and is distributed as a Docker image. Instructions:

  • install Docker
  • on Windows, you might need to enable virtualization in the BIOS
  • clone or download the mlcourse.ai repository
  • in a terminal, cd into the mlcourse.ai directory
  • run python run_docker_jupyter.py (the first run might take 5-10 minutes)
  • optionally, add more installations to the Dockerfile, build the Docker image locally (docker build -t <tag_name> .) and run python run_docker_jupyter.py -t <tag_name>
  • open localhost:4545 in your browser
  • run all cells in the notebook check_docker.ipynb to make sure all the libraries are installed and work correctly

Docker images and containers typically take up a lot of disk space, so these commands are useful for cleaning up:

  • docker ps – list running containers (add -a to include stopped ones)
  • docker stop $(docker ps -a -q) – stop all containers
  • docker rm $(docker ps -a -q) – remove all containers
  • docker images – list all Docker images
  • docker rmi <image_id> – remove a Docker image

Docker documentation is full of concise and clear examples.

Jupyter

Regardless of the environment (pip, Kaggle Kernels/Azure or Docker), you’ll work with Jupyter notebooks. If you are new to them, take a look at jupyter.org. In a nutshell, a notebook mixes code, graphics, Markdown, LaTeX, etc. in a single environment, which makes it well suited to sharing your work and ideas, prototyping, and working with educational materials.

To start working with the course materials (i.e. Jupyter notebooks), download or clone this repo and run jupyter-notebook from the downloaded mlcourse.ai directory.

DevOps

Apart from setting up the environment, it’s highly recommended that you familiarize yourself with GitHub and bash, and, of course, with Docker if you choose it to set up your environment.