## Open Machine Learning Course

Author: Mariya Mansurova, analyst and developer on the Yandex.Metrics team.
Translated by Ivan Zakharov, ML enthusiast.

# Assignment #9 (demo)

## Time series analysis

Fill in the cells marked with "Your code here" and submit your answers to the questions through the web form.

In [1]:
import pandas as pd
import os

from plotly import __version__
from plotly.offline import init_notebook_mode, iplot
from plotly import graph_objs as go
import requests

print(__version__) # need 1.9.0 or greater

init_notebook_mode(connected = True)

def plotly_df(df, title = ''):
    # Build one line trace per column, using the dataframe index as the x axis
    data = []

    for column in df.columns:
        trace = go.Scatter(
            x = df.index,
            y = df[column],
            mode = 'lines',
            name = column
        )
        data.append(trace)

    layout = dict(title = title)
    fig = dict(data = data, layout = layout)
    # Render the figure inline in the notebook
    iplot(fig)

2.7.0


## Data preparation

First, read the data into a dataframe. Today we will predict the daily number of views of the Machine Learning wiki page. I downloaded the data using the Wikipediatrend library for R.

In [2]:
df = pd.read_csv('../../data/wiki_machine_learning.csv', sep = ' ')
# Drop days with no recorded views
df = df[df['count'] != 0]
df.head()

Out[2]:
| | date | count | lang | page | rank | month | title |
|---|---|---|---|---|---|---|---|
| 81 | 2015-01-01 | 1414 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 80 | 2015-01-02 | 1920 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 79 | 2015-01-03 | 1338 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 78 | 2015-01-04 | 1404 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 77 | 2015-01-05 | 2264 | en | Machine_learning | 8708 | 201501 | Machine_learning |

In [3]:
df.shape

Out[3]:
(383, 7)
In [4]:
df.date = pd.to_datetime(df.date)

In [5]:
plotly_df(df.set_index('date')[['count']])


We will build a forecast with Facebook Prophet, a simple forecasting library. To evaluate the quality of the model, we hold out the last 30 days from the training sample.

In [6]:
from fbprophet import Prophet

In [7]:
predictions = 30

df = df[['date', 'count']]
df.columns = ['ds', 'y']
train_df = df[:-predictions].copy()
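
The hold-out split and the forecasting step described above can be sketched as follows. This is a minimal sketch, not the reference solution: `split_holdout` and `prophet_forecast` are hypothetical helper names, and the Prophet import is done lazily so the split helper can be used on its own. Newer releases of the library are published as `prophet` rather than `fbprophet`; the sketch keeps the name this notebook imports.

```python
def split_holdout(rows, horizon):
    """Split a time-ordered sequence into (train, holdout).

    The last `horizon` rows are held out. Works on lists and on
    pandas DataFrames, since both support positional slicing.
    """
    if horizon <= 0 or horizon >= len(rows):
        raise ValueError("horizon must be between 1 and len(rows) - 1")
    return rows[:-horizon], rows[-horizon:]


def prophet_forecast(train_df, horizon):
    """Fit Prophet on a dataframe with columns `ds` and `y`.

    Returns a forecast covering the training dates plus `horizon`
    further days (hypothetical helper, assumes daily data).
    """
    from fbprophet import Prophet  # lazy import; same package as the notebook

    model = Prophet()
    model.fit(train_df)
    future = model.make_future_dataframe(periods=horizon)
    # The result includes `ds`, `yhat`, `yhat_lower` and `yhat_upper`
    return model.predict(future)
```

With the variables defined above, `prophet_forecast(train_df, predictions)` would return one row per date. Because days with zero views were dropped earlier, it is probably safer to match forecast rows to the held-out rows on `ds` than by position.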

In [8]:
# Your code here


Question 1: What is the prediction of the number of views of the wiki page on January 20? Round to the nearest integer.

• 4947
• 3833
• 5229
• 2744

Estimate the quality of the prediction on the last 30 points, which were held out from training.

In [9]:
# Your code here


Question 2: What is the MAPE equal to?

• 38.38
• 42.42
• 5.39
• 65.91

Question 3: What is the MAE equal to?

• 355
• 4007
• 713
• 903
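
For reference, the two metrics asked about above can be written in a few lines of plain Python. This is a minimal sketch using the standard definitions: MAE is the mean of the absolute errors, and MAPE is the mean of the absolute errors relative to the actual values, expressed in percent. The function names `mae` and `mape` are illustrative, and MAPE assumes that no actual value is zero (which holds here, since zero-count days were dropped).

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)
```

Both functions accept any iterables of numbers, so they can be applied to the `y` column of the held-out rows and the matching `yhat` values of a forecast.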

## Predicting with ARIMA

In [10]:
%matplotlib inline
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm


Question 4: Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?

• Series is stationary, p_value = 0.107
• Series is not stationary, p_value = 0.107
• Series is stationary, p_value = 0.001
• Series is not stationary, p_value = 0.001
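
The null hypothesis of the Dickey-Fuller test is that the series has a unit root, that is, it is not stationary. A small p-value therefore speaks for stationarity. The sketch below shows that decision rule; the helper names and the 0.05 significance level are assumptions for illustration, and the statsmodels import is lazy so the rule itself can be used without it.

```python
def adf_p_value(series):
    """Return the p-value of the augmented Dickey-Fuller test.

    `adfuller` returns a tuple whose second element is the p-value.
    """
    import statsmodels.api as sm  # lazy import; the notebook imports it as `sm`

    return sm.tsa.stattools.adfuller(series)[1]


def is_stationary(p_value, alpha=0.05):
    """Reject the unit-root null when the p-value is below `alpha`.

    `alpha = 0.05` is a conventional choice, not one fixed by the assignment.
    """
    return p_value < alpha
```

For example, `is_stationary(adf_p_value(train_df['y']))` would apply the rule to the training series.
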
In [11]:
# Your code here


Question 5: Next, we turn to the construction of the SARIMAX model (sm.tsa.statespace.SARIMAX). What is the best set of parameters (among those listed) for the SARIMAX model according to the AIC criterion?

• D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1
• D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1
• D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1
• D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1
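
Choosing by AIC means fitting each candidate and keeping the one with the lowest AIC value. One way to organize that comparison is sketched below. The helper names are hypothetical, the parameter tuples are ordered as `(p, d, q, P, D, Q)`, and the weekly seasonal period of 7 is an assumption that the assignment text does not state.

```python
def sarimax_aic(series, params, seasonal_period=7):
    """Fit one SARIMAX model and return its AIC.

    `params` is (p, d, q, P, D, Q); `seasonal_period=7` is an assumed
    weekly seasonality.
    """
    import statsmodels.api as sm  # lazy import

    p, d, q, P, D, Q = params
    model = sm.tsa.statespace.SARIMAX(
        series,
        order=(p, d, q),
        seasonal_order=(P, D, Q, seasonal_period),
    )
    return model.fit(disp=False).aic


def select_by_aic(candidates, score):
    """Return (best_params, best_aic) for the lowest score.

    Candidates whose fit raises ValueError are skipped.
    """
    best_params, best_aic = None, float("inf")
    for params in candidates:
        try:
            aic = score(params)
        except ValueError:
            continue
        if aic < best_aic:
            best_params, best_aic = params, aic
    return best_params, best_aic


# The four options listed in the question, as (p, d, q, P, D, Q)
candidates = [
    (1, 0, 2, 3, 1, 0),
    (1, 1, 2, 3, 2, 1),
    (1, 1, 2, 3, 1, 1),
    (1, 0, 2, 3, 0, 0),
]
```

A call such as `select_by_aic(candidates, lambda params: sarimax_aic(train_df['y'], params))` would then compare the listed options. Fitting models with three seasonal autoregressive terms can take a while.
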
In [26]:
# Your code here