Assignment #8 (demo). Implementation of online regressor
Author: Yury Kashnitsky. Translated by Sergey Oreshkov. This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.
Same assignment as a Kaggle Notebook + solution.
Here we’ll implement a regressor trained with stochastic gradient descent (SGD). Fill in the missing code. If you do everything right, you’ll pass a simple embedded test.
Linear regression and Stochastic Gradient Descent
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.metrics import log_loss, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
Implement the SGDRegressor class. Specification:
- The class inherits from sklearn.base.BaseEstimator.
- The constructor takes the parameters eta, the gradient step (\(10^{-3}\) by default), and n_epochs, the number of passes over the dataset (3 by default).
- The constructor also creates the mse_ and weights_ lists to track the mean squared error and the weight vector during gradient descent iterations.
- The class has fit and predict methods.
- The fit method takes a matrix X and a vector y (numpy.array objects) as parameters, adds a column of ones to X on the left side, initializes the weight vector w with zeros, and then makes n_epochs iterations of weight updates (you may refer to this article for details). For every iteration, it logs the mean squared error and the weight vector w in the corresponding lists created in the constructor.
- Additionally, the fit method creates the w_ attribute to store the weights that produce the minimal mean squared error.
- The fit method returns the current instance of the SGDRegressor class, i.e. self.
- The predict method takes a matrix X, adds a column of ones on the left side, and returns the prediction vector, using the weight vector w_ created by the fit method.
class SGDRegressor(BaseEstimator):
    # your code here
    def __init__(self):
        pass

    def fit(self, X, y):
        pass

    def predict(self, X):
        pass
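For orientation, here is a minimal sketch of one way the specification above could be met. It is not the reference solution: the class name SketchSGDRegressor, the synthetic data, and the choice to log after every single-object update are assumptions made for illustration, and the constant factor 2 from the squared-error gradient is folded into eta.

```python
import numpy as np
from sklearn.base import BaseEstimator


class SketchSGDRegressor(BaseEstimator):
    """Illustrative per-object SGD for linear regression (hypothetical name)."""

    def __init__(self, eta=1e-3, n_epochs=3):
        self.eta = eta
        self.n_epochs = n_epochs
        self.mse_ = []
        self.weights_ = []

    def fit(self, X, y):
        # Add a column of ones on the left so that w[0] acts as the intercept.
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        w = np.zeros(X.shape[1])
        for _ in range(self.n_epochs):
            for i in range(X.shape[0]):
                # Step against the gradient of the squared error on object i
                # (assumption: the factor 2 is absorbed into eta).
                residual = y[i] - X[i].dot(w)
                w = w + self.eta * residual * X[i]
                # Log the weights and the MSE over the whole training set.
                self.weights_.append(w)
                self.mse_.append(np.mean((y - X.dot(w)) ** 2))
        # Keep the logged weights with the lowest training MSE.
        self.w_ = self.weights_[int(np.argmin(self.mse_))]
        return self

    def predict(self, X):
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        return X.dot(self.w_)


# Synthetic check: y = 3 + 2x plus small noise (made-up data).
rng = np.random.default_rng(17)
x_demo = rng.normal(size=(500, 1))
y_demo = 3.0 + 2.0 * x_demo[:, 0] + rng.normal(scale=0.1, size=500)
sketch_model = SketchSGDRegressor(eta=0.01, n_epochs=20).fit(x_demo, y_demo)
print(sketch_model.w_)
```

The larger eta and n_epochs in the check are chosen only because the synthetic set is small; with few objects, the default values leave too few updates for the weights to settle.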
Let’s test out the algorithm on height/weight data. We will predict heights (in inches) based on weights (in lbs).
# For Jupyter Book, the data is read from GitHub. If you have cloned
# https://github.com/Yorko/mlcourse.ai, you can point DATA_PATH to the data/
# folder in the root of your local copy instead, to save Internet traffic.
DATA_PATH = "https://raw.githubusercontent.com/Yorko/mlcourse.ai/main/data/"
data_demo = pd.read_csv(DATA_PATH + "weights_heights.csv")
plt.scatter(data_demo["Weight"], data_demo["Height"])
plt.xlabel("Weight (lbs)")
plt.ylabel("Height (Inch)")
plt.grid();
X, y = data_demo["Weight"].values, data_demo["Height"].values
Perform train/test split and scale data.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=17
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.reshape([-1, 1]))
X_valid_scaled = scaler.transform(X_valid.reshape([-1, 1]))
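A small sketch of what the scaling step does, using made-up numbers rather than the height/weight data. The names demo_scaler, demo_train and demo_valid are hypothetical. The scaler learns the mean and standard deviation from the training part only and reuses them for the validation part, so the validation values are not centered on their own mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values, shaped as a single-feature column.
demo_train = np.array([100.0, 120.0, 140.0]).reshape(-1, 1)
demo_valid = np.array([110.0, 150.0]).reshape(-1, 1)

demo_scaler = StandardScaler()
demo_train_scaled = demo_scaler.fit_transform(demo_train)  # learns mean and std
demo_valid_scaled = demo_scaler.transform(demo_valid)      # reuses train statistics

print(demo_train_scaled.mean(), demo_train_scaled.std())
print(demo_valid_scaled.ravel())
```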
Train the SGDRegressor you implemented on the (X_train_scaled, y_train) data. Leave the default parameter values for now.
# your code here
Plot the training process: the mean squared error as a function of the SGD iteration number.
# your code here
Print the minimum value of the mean squared error and the corresponding best weight vector.
# your code here
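A minimal sketch of picking the best logged iteration, assuming the MSE values and weight vectors were appended in lockstep so that index k in one list matches index k in the other. The lists below are made-up placeholders, not results from the height/weight data.

```python
import numpy as np

# Placeholder logs standing in for the mse_ and weights_ lists.
mse_log = [4.0, 2.5, 2.9, 2.7]
weights_log = [
    np.array([0.0, 0.0]),
    np.array([1.0, 0.5]),
    np.array([1.2, 0.3]),
    np.array([1.1, 0.4]),
]

best_idx = int(np.argmin(mse_log))  # position of the smallest logged MSE
best_mse = mse_log[best_idx]
best_w = weights_log[best_idx]
print(best_mse, best_w)
```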
Plot how the model weights (\(w_0\) and \(w_1\)) change during training.
# your code here
Make predictions for the hold-out set (X_valid_scaled, y_valid) and compute the MSE.
# your code here
sgd_holdout_mse = 10
Do the same with the LinearRegression class from sklearn.linear_model. Evaluate the MSE on the hold-out set.
# your code here
linreg_holdout_mse = 9
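As a rough illustration of the baseline, the sketch below fits LinearRegression and computes a hold-out MSE on synthetic data generated as y = 3 + 2x plus small noise. The data and all variable names here are assumptions for the example. In the assignment, the same two calls would be applied to the scaled height/weight arrays.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic train and hold-out parts (made-up data).
rng = np.random.default_rng(17)
x_tr = rng.normal(size=(200, 1))
y_tr = 3.0 + 2.0 * x_tr[:, 0] + rng.normal(scale=0.1, size=200)
x_ho = rng.normal(size=(100, 1))
y_ho = 3.0 + 2.0 * x_ho[:, 0] + rng.normal(scale=0.1, size=100)

# Fit on the training part, evaluate on the hold-out part.
baseline = LinearRegression().fit(x_tr, y_tr)
baseline_holdout_mse = mean_squared_error(y_ho, baseline.predict(x_ho))
print(baseline.intercept_, baseline.coef_, baseline_holdout_mse)
```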
try:
    assert (sgd_holdout_mse - linreg_holdout_mse) < 1e-4
    print("Correct!")
except AssertionError:
    print(
        "Something's not good.\n Linreg's holdout MSE: {}"
        "\n SGD's holdout MSE: {}".format(linreg_holdout_mse, sgd_holdout_mse)
    )
Something's not good.
Linreg's holdout MSE: 9
SGD's holdout MSE: 10