Here we cover:
Git, bash, and all¶
Apart from installing the environment, it’s highly recommended that you familiarize yourself with git, GitHub and bash. Learn git branching and GitHowTo are nice interactive tutorials to grasp the basics of git. As for bash, it’s very rewarding to be familiar with UNIX and command-line utils like sort, etc. These utilities have been optimized continuously over several decades of UNIX existence, and many basic operations can be done very efficiently with them: counting the number of lines in a file, replacing one expression with another in all files in a folder, etc.
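The two operations just mentioned can be sketched as follows. This is a minimal illustration only: the temporary directory, the file names and the strings "foo" and "baz" are hypothetical and are created purely for the demo.

```shell
# Hypothetical demo folder and files, created only for this illustration.
demo_dir=$(mktemp -d)
printf 'foo one\nfoo two\nbar three\n' > "$demo_dir/a.txt"
printf 'nothing here\nfoo four\n' > "$demo_dir/b.txt"

# Count the number of lines in a file.
# Reading from stdin makes wc print only the number; tr strips padding some systems add.
line_count=$(wc -l < "$demo_dir/a.txt" | tr -d ' ')
echo "a.txt has $line_count lines"   # prints: a.txt has 3 lines

# Replace "foo" with "baz" in all .txt files in the folder.
# "-i.bak" edits in place and keeps a backup copy; this spelling works with GNU and BSD sed.
for f in "$demo_dir"/*.txt; do
    sed -i.bak 's/foo/baz/g' "$f"
done

# Count the lines that now contain "baz" across the edited files.
baz_count=$(cat "$demo_dir"/*.txt | grep -c 'baz')
echo "$baz_count lines contain baz"  # prints: 3 lines contain baz
```

The backup files (a.txt.bak, b.txt.bak) keep the original content, which is handy while experimenting; the whole demo folder can be deleted afterwards with rm -rf "$demo_dir".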
Setting the environment¶
You’ve got several alternatives to set up your learning environment:
Kaggle Notebooks or Azure ML, i.e. avoid local configurations and just use the browser
Pip & Anaconda or Poetry
Kaggle Notebooks or Azure ML¶
The easiest way to start working with course materials (no local software installations needed) is to visit Kaggle Dataset mlcourse.ai and fork some Notebooks (better to keep them private). All your Jupyter notebooks with Anaconda are live and running in your browser. Almost all needed datasets are there as well. However, uploading other datasets might be tiresome.
Pip & Anaconda or Poetry¶
Most Python packages like Sklearn can be installed manually with pip, the Python package installer, e.g. pip install numpy. Additionally, you’ll need Vowpal Wabbit, and (maybe) CatBoost for competitions.
The Anaconda 3 distribution is one of the best options as it already contains the latest Python with Jupyter and lots of other libraries. However, the course also uses some packages that are not included, Vowpal Wabbit among them. In addition, the Graphviz library must be installed. Installing some of these on Windows might be painful.
Poetry is an alternative Python package and dependency manager. It can be installed with:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
Installing dependencies from the
This will install the required packages. For the rest, please refer to the Poetry docs.
The recommended way of working with course materials is running Jupyter notebooks. If you are new to this, take a look at jupyter.org. In a nutshell, this is a way of mixing code, graphics, Markdown, LaTeX, etc. in a single development environment. It is well suited for sharing your work and ideas, for prototyping, and for working with educational materials.
To start working with the course materials (i.e. Jupyter notebooks):
install Jupyter; how to do this depends on how you set up the environment in the previous step
download/clone the course repo
run jupyter-notebook from the downloaded directory mlcourse.ai.
this opens http://localhost:8888/tree (8888 is the default port) in your browser, from there you can run Jupyter notebooks in the
check the Jupyter docs and the interactive demo (“try classic notebook”) to get your hands dirty with Jupyter
Note: a Jupyter book is not to be confused with Jupyter Notebooks.
The mlcourse.ai website now renders a Jupyter book. A strong advantage of this type of content is that it is a book with executable content: the pages you see are not static but are regenerated with each build of the book by running all the Python code. This also guarantees (well, if the book is frequently rebuilt, say, through a CI/CD process) that the book shows working Python code.
jupyter-book build mlcourse_ai_jupyter_book
Note: this may take a long time, about an hour. To play around with a toy example, check how a template Jupyter Book is created.
Then, open the HTML file located at