Translated by Ivan Zakharov, ML enthusiast.

All content is distributed under the Creative Commons CC BY-NC-SA 4.0 license.

**Fill cells marked with "Your code here" and submit your answers to the questions through the web form.**

In [1]:

```
import os

import pandas as pd
import requests
from plotly import __version__
from plotly import graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

print(__version__)  # need 1.9.0 or greater
init_notebook_mode(connected = True)


def plotly_df(df, title = ''):
    # Draw every column of df as a separate line on one interactive chart
    data = []
    for column in df.columns:
        trace = go.Scatter(
            x = df.index,
            y = df[column],
            mode = 'lines',
            name = column
        )
        data.append(trace)
    layout = dict(title = title)
    fig = dict(data = data, layout = layout)
    iplot(fig, show_link=False)
```

First, read the data in as a `dataframe`. Today we will predict the number of views of the Machine Learning wiki page. I downloaded the data using the Wikipediatrend library for `R`.

In [2]:

```
df = pd.read_csv('../../data/wiki_machine_learning.csv', sep = ' ')
df = df[df['count'] != 0].copy()  # copy so later column assignments do not act on a view
df.head()
```

Out[2]:

In [3]:

```
df.shape
```

Out[3]:

In [4]:

```
df.date = pd.to_datetime(df.date)
```

In [5]:

```
plotly_df(df.set_index('date')[['count']])
```

We will build a forecast with the `Facebook Prophet` library. To evaluate the quality of the model, we hold out the last 30 days from the training sample.

In [6]:

```
from fbprophet import Prophet  # since Prophet 1.0 the package is named prophet: from prophet import Prophet
```

In [7]:

```
predictions = 30
df = df[['date', 'count']]
df.columns = ['ds', 'y']
train_df = df[:-predictions].copy()
```

In [8]:

```
# Your code here
```

**Question 1:** What is the predicted number of views of the wiki page on January 20? Round to the nearest integer.

- 4947
- 3833
- 5229
- 2744

Estimate the quality of the prediction on the last 30 points.

In [9]:

```
# Your code here
```

**Question 2:** What is MAPE equal to?

- 38.38
- 42.42
- 5.39
- 65.91

**Question 3:** What is MAE equal to?

- 355
- 4007
- 713
- 903
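
As a point of reference for Questions 2 and 3, here is a minimal sketch of the two metrics on made-up numbers. The helper names `mae` and `mape` and the toy arrays are assumptions for illustration only and are not part of the assignment data. MAPE is expressed in percent here, which appears to match the scale of the answer options.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must not contain zeros)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))


# Toy values, not taken from the wiki data
y_true = [100, 200, 400]
y_pred = [110, 180, 400]

mae_value = mae(y_true, y_pred)
mape_value = mape(y_true, y_pred)
print(f"MAE = {mae_value:.2f}, MAPE = {mape_value:.2f}%")
```

In the assignment, the same functions would be applied to the last 30 actual values and the matching 30 forecast values.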

In [10]:

```
%matplotlib inline
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
```

**Question 4:** Let's check the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?

- Series is stationary, p_value = 0.107
- Series is not stationary, p_value = 0.107
- Series is stationary, p_value = 0.001
- Series is not stationary, p_value = 0.001

In [11]:

```
# Your code here
```

**Question 5:** Next, we turn to building a SARIMAX model (`sm.tsa.statespace.SARIMAX`). Which of the listed parameter sets is best for the SARIMAX model according to the `AIC` criterion?

- D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1
- D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1
- D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1
- D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1

In [26]:

```
# Your code here
```