In the first lab, you implemented a simple linear regression algorithm. Despite its simplicity, this algorithm is quite powerful and can fulfill many of your ML needs. It is not surprising, then, that ML researchers have come up with all sorts of tricks to make it even more effective. In this lab, we have a look at a number of them. However, let us revise the basic model first.
For the sake of reproducibility, let us set the seed of the random generator:
import numpy as np
# make numpy randomisation predictable
np.random.seed(0)
Then, let us import the Boston house price dataset included in the scikit library:
from sklearn.datasets import load_boston
# load dataset
boston = load_boston()
And split the dataset in two: 80% training set and 20% test set. It sounds familiar, doesn't it?
# partition the dataset into training and test sets
rnd_indices = np.random.permutation(boston.data.shape[0])
train_size = int(boston.data.shape[0] * 0.8)
train_indices = rnd_indices[:train_size]
test_indices = rnd_indices[train_size:]
train_data = boston.data[train_indices, :]
test_data = boston.data[test_indices, :]
train_target = boston.target[train_indices]
test_target = boston.target[test_indices]
Exercise 1.1. As a quick warm-up, modify the following code to plot the fifth and sixth features against the target.
import matplotlib.pyplot as plt
plt.figure()
plt.plot(train_data[:,0], train_target, "*")
# Solution
import matplotlib.pyplot as plt
plt.figure()
plt.subplot(1, 2, 1)
plt.plot(train_data[:, 4], train_target, "*")
plt.xlabel('Nitric oxides concentration\n(parts per 10 million)')
plt.ylabel('Median price ($1000)')
plt.subplot(1, 2, 2)
plt.plot(train_data[:, 5], train_target, "*")
plt.xlabel('Average number of rooms per dwelling')
plt.ylabel('Median price ($1000)')
plt.show()
Finally, it is time to train our linear regression model:
from sklearn.linear_model import LinearRegression
# fit a linear regressor
lin_reg = LinearRegression()
lin_reg.fit(train_data, train_target)
Exercise 1.2. Modify the following code to print both the training and the testing RMSE.
train_predict = lin_reg.predict(train_data)
train_rmse = np.sqrt(((train_target - train_predict) ** 2).mean())
print(train_rmse)
# Solution
from sklearn.metrics import mean_squared_error
from math import sqrt
test_predict = lin_reg.predict(test_data)
test_rmse = sqrt(mean_squared_error(test_target, test_predict))
print('Train RMSE =', train_rmse, ', Test RMSE =', test_rmse)
The first trick we look at is a data cleaning technique, and quite a general one. Before feeding the training set to your favourite ML algorithm, it is good practice to normalise the input features. This means scaling them so that their values fall more or less in the same range. Among others, the scikit library offers two scaling methods: min-max scaling and standard scaling.
Exercise 2.1. What do these methods do (explain in 1-2 sentences)? Hint: just google the name of the method and browse the scikit documentation!
Solution Min-max scaling, or MinMaxScaler in the Scikit-learn library, transforms features by scaling each feature to a given range; see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler. Standard scaling, or StandardScaler in the Scikit-learn library, transforms features to have zero mean and unit variance; see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html. Have a look at the tutorial https://scikit-learn.org/stable/modules/preprocessing.html, since pre-processing is an important step before applying any Machine Learning algorithm.
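For reference, min-max scaling is used in exactly the same way as the standard scaling shown below; here is a minimal sketch (not part of the original lab) on the training/test split from above:
from sklearn.preprocessing import MinMaxScaler
# scale each feature to the [0, 1] range, using statistics computed on the training set only
minmax_scaler = MinMaxScaler()
minmax_scaler.fit(train_data)
minmax_train_data = minmax_scaler.transform(train_data)
minmax_test_data = minmax_scaler.transform(test_data)
# after the transformation, every training feature lies between 0 and 1
print(minmax_train_data.min(axis=0), minmax_train_data.max(axis=0))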
Let us pick the standard scaling method and write some sample code:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_data)
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)
Exercise 2.2. Modify the code above to print the eleventh feature before and after the transformation.
# Solution
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_data)
scaled_train_data = scaler.transform(train_data)
scaled_test_data = scaler.transform(test_data)
print('Eleventh feature before scaling:', train_data[:, 10])
print('Eleventh feature after scaling:', scaled_train_data[:, 10])
Exercise 2.3. Retrain the vanilla linear regressor on the scaled training set and make predictions on the (scaled) test set. Print the RMSE on the training and test sets.
# Solution
lin_reg = LinearRegression()
lin_reg.fit(scaled_train_data, train_target)
train_predict = lin_reg.predict(scaled_train_data)
test_predict = lin_reg.predict(scaled_test_data)
train_rmse = np.sqrt(((train_target - train_predict) ** 2).mean())
test_rmse = sqrt(mean_squared_error(test_target, test_predict))
print('Train RMSE = ', train_rmse, ', Test RMSE =', test_rmse)
ML algorithms are affected by noise and outliers. On top of that, the model we are using might be too powerful for the problem at hand, and end up overfitting. In order to avoid this, we can constrain the result of our learning effort, and avoid choosing extreme values for the weights of our model. This is called regularisation and finds many applications across the ML spectrum.
Here, we look at two alternatives for our linear model: Ridge regression and Lasso regression.
Coding them in scikit is quite easy. Here is an example:
from sklearn.linear_model import Ridge, Lasso
# fit a ridge regressor
alpha_ridge = 1
ridge_reg = Ridge(alpha=alpha_ridge, solver="cholesky")
ridge_reg.fit(train_data, train_target)
# fit a lasso regressor
alpha_lasso = 1
lasso_reg = Lasso(alpha=alpha_lasso)
lasso_reg.fit(train_data, train_target)
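One way to see the regularisation at work is to compare the size of the learned weights. The quick check below is not part of the original lab; it refits a plain linear regressor (under the hypothetical name plain_reg) on the same raw features and prints the total absolute weight of each model:
# fit an unregularised linear regressor on the same raw features for comparison
plain_reg = LinearRegression()
plain_reg.fit(train_data, train_target)
# print the total absolute weight learned by each model
print('Linear |w| sum =', np.abs(plain_reg.coef_).sum())
print('Ridge  |w| sum =', np.abs(ridge_reg.coef_).sum())
print('Lasso  |w| sum =', np.abs(lasso_reg.coef_).sum())
# Lasso tends to drive some weights exactly to zero
print('Lasso weights set to zero:', int((lasso_reg.coef_ == 0).sum()))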
Exercise 3.1. Compare the training and test RMSE for the Ridge, Lasso and vanilla linear regressors on the Boston house price dataset (no feature scaling).
# Solution
lin_reg = LinearRegression()
lin_reg.fit(train_data, train_target)
train_lin_predict = lin_reg.predict(train_data)
train_ridge_predict = ridge_reg.predict(train_data)
train_lasso_predict = lasso_reg.predict(train_data)
test_lin_predict = lin_reg.predict(test_data)
test_ridge_predict = ridge_reg.predict(test_data)
test_lasso_predict = lasso_reg.predict(test_data)
train_lin_rmse = sqrt(mean_squared_error(train_target, train_lin_predict))
train_ridge_rmse = sqrt(mean_squared_error(train_target, train_ridge_predict))
train_lasso_rmse = sqrt(mean_squared_error(train_target, train_lasso_predict))
test_lin_rmse = sqrt(mean_squared_error(test_target, test_lin_predict))
test_ridge_rmse = sqrt(mean_squared_error(test_target, test_ridge_predict))
test_lasso_rmse = sqrt(mean_squared_error(test_target, test_lasso_predict))
print('Linear Train RMSE =', train_lin_rmse, ', Linear Test RMSE =', test_lin_rmse, ', Ridge Train RMSE =', train_ridge_rmse, ', Ridge Test RMSE =', test_ridge_rmse, ', Lasso Train RMSE =', train_lasso_rmse, ', Lasso Test RMSE =', test_lasso_rmse)
Exercise 3.2. Do the results change much if we scale the input features beforehand (compare the RMSE in both cases)?
# Solution
lin_reg = LinearRegression()
lin_reg.fit(scaled_train_data, train_target)
ridge_reg = Ridge(alpha=1, solver="cholesky")
ridge_reg.fit(scaled_train_data, train_target)
lasso_reg = Lasso(alpha=1)
lasso_reg.fit(scaled_train_data, train_target)
train_lin_predict = lin_reg.predict(scaled_train_data)
train_ridge_predict = ridge_reg.predict(scaled_train_data)
train_lasso_predict = lasso_reg.predict(scaled_train_data)
test_lin_predict = lin_reg.predict(scaled_test_data)
test_ridge_predict = ridge_reg.predict(scaled_test_data)
test_lasso_predict = lasso_reg.predict(scaled_test_data)
train_lin_rmse = sqrt(mean_squared_error(train_target, train_lin_predict))
train_ridge_rmse = sqrt(mean_squared_error(train_target, train_ridge_predict))
train_lasso_rmse = sqrt(mean_squared_error(train_target, train_lasso_predict))
test_lin_rmse = sqrt(mean_squared_error(test_target, test_lin_predict))
test_ridge_rmse = sqrt(mean_squared_error(test_target, test_ridge_predict))
test_lasso_rmse = sqrt(mean_squared_error(test_target, test_lasso_predict))
print('Linear Train RMSE =', train_lin_rmse, ', Linear Test RMSE =', test_lin_rmse, ', Ridge Train RMSE =', train_ridge_rmse, ', Ridge Test RMSE =', test_ridge_rmse, ', Lasso Train RMSE =', train_lasso_rmse, ', Lasso Test RMSE =', test_lasso_rmse)
Exercise 3.3. What are the best values of alpha_ridge and alpha_lasso (for simplicity use the test set as a validation set)?
# Solution (obtained by using GridSearch)
from sklearn.model_selection import GridSearchCV
param_grid = {'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]}
ridge_reg = Ridge(solver="cholesky")
grid_search = GridSearchCV(ridge_reg, param_grid)
grid_search.fit(train_data, train_target)
grid_search.best_params_
# Solution (obtained by using GridSearch)
param_grid = {'alpha': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]}
lasso_reg = Lasso()
grid_search = GridSearchCV(lasso_reg, param_grid)
grid_search.fit(train_data, train_target)
grid_search.best_params_
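Note that GridSearchCV selects alpha by cross-validation on the training set rather than on the test set. Following the exercise statement literally, one can instead sweep alpha by hand and keep the value with the lowest test RMSE; a sketch (not part of the original lab), shown here for Ridge and applicable to Lasso by swapping the regressor:
# sweep alpha and keep the value with the lowest RMSE on the test set (used as a validation set)
best_alpha, best_rmse = None, float('inf')
for a in np.arange(0.1, 2.1, 0.1):
    reg = Ridge(alpha=a, solver="cholesky")
    reg.fit(train_data, train_target)
    rmse = sqrt(mean_squared_error(test_target, reg.predict(test_data)))
    if rmse < best_rmse:
        best_alpha, best_rmse = a, rmse
print('Best alpha_ridge =', best_alpha, ', validation RMSE =', best_rmse)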
Support Vector Machines (SVM) can be used not only for classification but also for regression! Furthermore, they already provide an implicit way of regularising the result by changing the width epsilon of the insensitive margin (the tube around the prediction inside which errors are ignored).
Exercise 4.1. Modify the code below to train a SVM regressor on the Boston house price dataset. Print the training and test RMSE.
# fit a support vector machine regressor
epsilon_svm = 1
svm_reg = LinearSVR(epsilon=epsilon_svm)
svm_reg.fit(train_data, train_target)
# Solution
from sklearn.svm import LinearSVR
epsilon_svm = 1
svm_reg = LinearSVR(epsilon=epsilon_svm)
svm_reg.fit(train_data, train_target)
train_svm_predict = svm_reg.predict(train_data)
test_svm_predict = svm_reg.predict(test_data)
train_svm_rmse = sqrt(mean_squared_error(train_target, train_svm_predict))
test_svm_rmse = sqrt(mean_squared_error(test_target, test_svm_predict))
print('Train RMSE =', train_svm_rmse, ', Test RMSE =', test_svm_rmse)
Exercise 4.2. What value of epsilon gives you the best performance (for simplicity use the test set as a validation set)?
# Solution (obtained by using GridSearch)
param_grid = {'epsilon': [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]}
svm_reg = LinearSVR(max_iter=100000)
grid_search = GridSearchCV(svm_reg, param_grid)
grid_search.fit(train_data, train_target)
grid_search.best_params_
Linear regression is all well and good, but sometimes the dataset requires a non-linear model. In this regard, the ML literature offers quite a range of non-linear regression algorithms. Here we look at the simplest one, polynomial regression.
Before implementing the algorithm, let us create a synthetic dataset:
n = 100
data = 12 * np.random.rand(n, 1) - 3.9
target = 0.09 * (data**3) + 0.3 * (data**2) - 4.1 * data - 2.4 + 4.79 * np.random.randn(n, 1)
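The code later in this section reuses the train_data, train_target, test_data and test_target names on this synthetic dataset. Presumably it is split 80/20 in the same way as the Boston data; a minimal sketch of that split, assuming this is the intent:
# partition the synthetic dataset into training and test sets (80/20), as before
rnd_indices = np.random.permutation(n)
train_size = int(n * 0.8)
train_data = data[rnd_indices[:train_size], :]
test_data = data[rnd_indices[train_size:], :]
train_target = target[rnd_indices[:train_size]]
test_target = target[rnd_indices[train_size:]]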
Exercise 5.1. Plot the dataset.
# Solution
import matplotlib.pyplot as plt
plt.figure()
plt.plot(data, target, "*")
Exercise 5.2. Why would a linear model be a poor choice in this case (explain in 1-2 sentences)?
Solution Clearly, there is no straight line that can capture the trend of this generated data.
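To make this concrete, one can fit a straight line to the synthetic training data and look at the result; a quick sketch (assuming the split above, with the hypothetical name straight_reg):
# fit a straight line to the one-dimensional synthetic data and plot it against the points
straight_reg = LinearRegression()
straight_reg.fit(train_data, train_target)
print('Straight-line train RMSE =', sqrt(mean_squared_error(train_target, straight_reg.predict(train_data))))
plt.figure()
plt.plot(train_data, train_target, "*", train_data, straight_reg.predict(train_data), ".")
plt.show()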
The idea behind polynomial regression is to expand the number of input features. We do so by taking the existing ones and multiplying them by one another. More formally, we add polynomial terms (powers and products) of the given input features up to a chosen degree.
Fortunately, the scikit package allows us to implement this in a few lines of code:
# fit a quadratic regressor
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
poly_train_data = poly_features.fit_transform(train_data)
lin_reg = LinearRegression()
lin_reg.fit(poly_train_data, train_target)
The code above creates quadratic features, i.e. polynomials of degree two. If we need an even more powerful model, we can choose higher degrees. Of course, this creates an increasingly large number of extra features and exposes us to overfitting. However, sometimes it is worth the effort.
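As a quick aside (not in the original lab), the growth in the number of generated features can be checked directly; on the 13 features of the Boston dataset, for example:
# count the polynomial features generated for increasing degrees
for degree in (1, 2, 3, 4):
    expanded = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(boston.data)
    print('degree', degree, '->', expanded.shape[1], 'features')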
Exercise 5.3. Which line of the code above do you need to change to implement cubic regression (report the modified line)?
Solution The line poly_features = PolynomialFeatures(degree=2, include_bias=False) (changing degree=2 to degree=3).
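For completeness, the cubic variant could be written with separate (hypothetical) names, so that it does not overwrite the quadratic model used in the next exercise:
# fit a cubic regressor
cubic_features = PolynomialFeatures(degree=3, include_bias=False)
cubic_train_data = cubic_features.fit_transform(train_data)
cubic_reg = LinearRegression()
cubic_reg.fit(cubic_train_data, train_target)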
Exercise 5.4. Train both Ridge and Lasso regressions and compare the training and test RMSE with the vanilla linear regression shown above.
# Solution
from sklearn.linear_model import Ridge, Lasso
ridge_reg = Ridge(alpha=1, solver="cholesky")
lasso_reg = Lasso(alpha=1)
ridge_reg.fit(poly_train_data, train_target)
lasso_reg.fit(poly_train_data, train_target)
poly_test_data = poly_features.transform(test_data)
train_lin_predict = lin_reg.predict(poly_train_data)
train_ridge_predict = ridge_reg.predict(poly_train_data)
train_lasso_predict = lasso_reg.predict(poly_train_data)
test_lin_predict = lin_reg.predict(poly_test_data)
test_ridge_predict = ridge_reg.predict(poly_test_data)
test_lasso_predict = lasso_reg.predict(poly_test_data)
train_lin_rmse = sqrt(mean_squared_error(train_target, train_lin_predict))
train_ridge_rmse = sqrt(mean_squared_error(train_target, train_ridge_predict))
train_lasso_rmse = sqrt(mean_squared_error(train_target, train_lasso_predict))
test_lin_rmse = sqrt(mean_squared_error(test_target, test_lin_predict))
test_ridge_rmse = sqrt(mean_squared_error(test_target, test_ridge_predict))
test_lasso_rmse = sqrt(mean_squared_error(test_target, test_lasso_predict))
print('Linear Train RMSE =', train_lin_rmse, ', Linear Test RMSE =', test_lin_rmse, ', Ridge Train RMSE =', train_ridge_rmse, ', Ridge Test RMSE =', test_ridge_rmse, ', Lasso Train RMSE =', train_lasso_rmse, ', Lasso Test RMSE =', test_lasso_rmse)
Finally, we can modify our SVM code to do polynomial regression. Notice that the SVM takes as input the regular features and manipulates them with a polynomial kernel:
from sklearn.svm import SVR
svm_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_reg.fit(train_data, np.ravel(train_target))
Exercise 5.5. Plot the original dataset as in Exercise 5.1, but add the predictions of the four models (vanilla linear, Ridge, Lasso and SVM). Hint: you can modify the code below and produce four separate plots.
# Solution
train_rmse = sqrt(mean_squared_error(train_target, train_lin_predict))
test_rmse = sqrt(mean_squared_error(test_target, test_lin_predict))
plt.figure()
plt.plot(train_data, train_target, "*", train_data, train_lin_predict, ".", test_data, test_target, "*", test_data, test_lin_predict, ".")
print('Train RMSE =', train_rmse, ', Test RMSE =', test_rmse)
# Solution
train_rmse = sqrt(mean_squared_error(train_target, train_ridge_predict))
test_rmse = sqrt(mean_squared_error(test_target, test_ridge_predict))
plt.figure()
plt.plot(train_data, train_target, "*", train_data, train_ridge_predict, ".", test_data, test_target, "*", test_data, test_ridge_predict, ".")
print('Train RMSE =', train_rmse, ', Test RMSE =', test_rmse)
# Solution
train_rmse = sqrt(mean_squared_error(train_target, train_lasso_predict))
test_rmse = sqrt(mean_squared_error(test_target, test_lasso_predict))
plt.figure()
plt.plot(train_data, train_target, "*", train_data, train_lasso_predict, ".", test_data, test_target, "*", test_data, test_lasso_predict, ".")
print('Train RMSE =', train_rmse, ', Test RMSE =', test_rmse)
# Solution
train_svm_predict = svm_reg.predict(train_data)
test_svm_predict = svm_reg.predict(test_data)
train_rmse = sqrt(mean_squared_error(train_target, train_svm_predict))
test_rmse = sqrt(mean_squared_error(test_target, test_svm_predict))
plt.figure()
plt.plot(train_data, train_target, "*", train_data, train_svm_predict, ".", test_data, test_target, "*", test_data, test_svm_predict, ".")
print('Train RMSE =', train_rmse, ', Test RMSE =', test_rmse)