The course aims to strike a balance between theory and practice, so the prerequisites cover both programming and math:
Basic Python skills are required: writing loops, functions, classes, etc. Completing an interactive tutorial such as DataQuest, DataCamp, or even Codecademy will suffice. However, a deeper dive into Python is appreciated, since in some tasks you will have to implement an ML algorithm from scratch.
Knowledge of basic concepts from calculus, linear algebra, probability theory, and statistics is also required. If you need to catch up, good resources are Part I of the “Deep Learning” book and “Mathematics for Machine Learning”. For a deeper dive, take a look at MIT courses.
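To give a rough sense of the Python level mentioned above, here is a minimal sketch of an ML algorithm written from scratch using only loops, functions, and a class: a k-nearest-neighbours classifier. The class name `KNNClassifier` and its `fit`/`predict` methods are illustrative choices (they mimic the Sklearn convention), not actual course code; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


class KNNClassifier:
    """Illustrative k-nearest-neighbours classifier in pure Python (not course code)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN is a "lazy" learner: fitting just stores the training data
        self.X_train = [list(x) for x in X]
        self.y_train = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(x) for x in X]

    def _predict_one(self, x):
        # Euclidean distance from x to every training point, paired with its label
        pairs = [(math.dist(x, xi), yi) for xi, yi in zip(self.X_train, self.y_train)]
        pairs.sort(key=lambda pair: pair[0])
        # Majority vote among the labels of the k closest points
        nearest_labels = [label for _, label in pairs[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["a", "a", "a", "b", "b", "b"]
clf = KNNClassifier(k=3).fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # prints ['a', 'b']
```

Tasks in the course may of course ask for different algorithms; the point is only that plain loops, functions, and classes are enough to express one.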
Generally, installing the latest Anaconda 3 distribution is the best option (it contains the latest Python with Jupyter and lots of other libraries). However, some other packages are also used, Vowpal Wabbit to name one, and installing some of them on Windows might be painful.
You have several alternatives for setting up your learning environment:
- Kaggle Kernels & Azure ML
- Pip & Anaconda
- Docker
Kaggle Kernels & Azure ML
The easiest way to start working with the course materials is to visit the Kaggle Dataset mlcourse.ai and fork some Kernels (please keep them private). All your Jupyter notebooks, with the Anaconda stack, run live in your browser. Almost all the needed datasets are there as well. However, uploading other datasets can be tedious.
Pip & Anaconda
Most Python packages, like Sklearn, can be installed manually with pip, the Python package installer. However, the preferred option is to use Anaconda. Additionally, you’ll need Vowpal Wabbit and (maybe) CatBoost for competitions.
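After installing, a quick way to see what is available is a short script like the one below. It is a sketch: the import names (`sklearn`, `vowpalwabbit`, `catboost`) and the `vw` executable name are assumptions about how these packages are installed, and `check_packages` is a hypothetical helper, not part of the course repository.

```python
import importlib.util
import shutil
import sys

# Import names of the packages mentioned above (assumed). Note that pip/conda
# package names can differ, e.g. Sklearn is installed as "scikit-learn".
PACKAGES = ["sklearn", "vowpalwabbit", "catboost"]


def check_packages(names):
    """Map each top-level import name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:<14} {'OK' if found else 'missing'}")
    # Vowpal Wabbit is also distributed as a command-line tool named `vw`
    print(f"{'vw (CLI)':<14} {'OK' if shutil.which('vw') else 'missing'}")
```

`importlib.util.find_spec` only locates a module without importing it, so the check is fast and does not trigger heavy library initialization.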
Docker
All the necessary software is already installed and distributed in the form of a Docker container. Instructions:
- install Docker
- on Windows, you might need to enable virtualization in the BIOS
- clone or download the mlcourse.ai repository
- cd in the terminal into the cloned repository directory and run python run_docker_jupyter.py. The first time, it might take 5-10 minutes
- optionally, you can add more installations to the Dockerfile, build the Docker image locally (docker build -t <tag_name> .) and then run python run_docker_jupyter.py -t <tag_name>
- go to
- execute all cells in the notebook check_docker.ipynb to make sure all the libraries are installed and work fine.
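If you rebuild the image often, the command-line steps above can be scripted. The following is a minimal sketch, assuming it is run from the repository root; `docker_commands`, `run_all`, and the tag `my-mlcourse` are hypothetical names, and only the `docker build -t` and `run_docker_jupyter.py -t` invocations come from the steps above.

```python
import subprocess
import sys


def docker_commands(tag=None):
    """Commands for the Docker steps above; run them from the repository root.

    Without a tag, only the launcher script is run. With a tag, a local image is
    first built from the (possibly extended) Dockerfile and passed to the script.
    """
    # sys.executable is the current Python interpreter, standing in for `python`
    launcher = [sys.executable, "run_docker_jupyter.py"]
    if tag is None:
        return [launcher]
    return [["docker", "build", "-t", tag, "."], launcher + ["-t", tag]]


def run_all(commands, dry_run=True):
    """Print each command; actually execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, e.g. a broken Dockerfile
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(docker_commands("my-mlcourse"))  # dry run: only prints the commands
```

Calling `run_all(docker_commands("my-mlcourse"), dry_run=False)` executes the steps for real. If the launcher keeps the container in the foreground, `subprocess.run` blocks until it exits, so run it from a terminal you can leave open.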
Docker containers and images typically take up a lot of disk space, so the following commands for managing them are useful:
- docker ps – list running containers (docker ps -a lists all of them, including stopped ones)
- docker stop $(docker ps -a -q) – stop all containers
- docker rm $(docker ps -a -q) – remove all containers
- docker images – list all Docker images
- docker rmi <image_id> – remove a Docker image
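The two $(docker ps -a -q) one-liners rely on shell command substitution, which is not available in every Windows shell. A minimal Python sketch of the same cleanup, which also skips the case where there are no containers at all, could look like this; the helper names are hypothetical and it assumes the docker CLI is on the PATH.

```python
import subprocess


def all_container_ids():
    """Python equivalent of `docker ps -a -q`: IDs of all containers, running or stopped."""
    result = subprocess.run(
        ["docker", "ps", "-a", "-q"], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def cleanup_commands(ids):
    """Commands equivalent to `docker stop $(...)` followed by `docker rm $(...)`."""
    if not ids:
        # With no containers, `docker stop` would get no arguments and fail
        return []
    return [["docker", "stop", *ids], ["docker", "rm", *ids]]


def remove_all_containers():
    """Stop and remove every container; images are left untouched."""
    for cmd in cleanup_commands(all_container_ids()):
        subprocess.run(cmd, check=True)
```

Keeping the command construction in `cleanup_commands` separate from execution makes it easy to inspect what would run before calling `remove_all_containers()`.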
Docker documentation is full of concise and clear examples.
Regardless of the environment (pip, Kaggle Kernels/Azure, or Docker), you’ll work with Jupyter notebooks. If you are new to them, take a look at jupyter.org. In a nutshell, a notebook is a way of mixing code, graphics, Markdown, LaTeX, etc. in a single development environment. This makes it well suited for sharing your work and ideas, for prototyping, and for working with educational materials.
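Under the hood, a notebook is a JSON file (the .ipynb format) holding a list of cells of different types. The sketch below builds a minimal two-cell notebook (Markdown with LaTeX, then code) using only the standard library. It hand-writes the nbformat 4.4 structure purely for illustration; in practice Jupyter, or the nbformat package, produces these files. `make_notebook` and `example.ipynb` are hypothetical names.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat 4.4 notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also store their outputs and execution counter
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = make_notebook([
    ("markdown", "# Linear model\nPrediction: $\\hat{y} = w x + b$"),
    ("code", "w, b = 2.0, 1.0\nprint(w * 3 + b)"),
])

if __name__ == "__main__":
    # The resulting file can be opened with jupyter-notebook like any other notebook
    with open("example.ipynb", "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
```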
To start working with the course materials (i.e. Jupyter notebooks), download or clone this repo and run jupyter-notebook from the downloaded mlcourse.ai directory.
Apart from installing the environment, it’s highly recommended that you familiarize yourself with GitHub and bash, and, of course, with Docker if you choose that way of setting up your environment.