1. Motivated by the use of CNNs on the MNIST dataset, you will now try a CNN on Sign Language MNIST. In this task, you are given a dataset of hand gestures, each labelled with an American Sign Language letter (except J and Z, which require hand motion). The dataset can be retrieved from either https://drive.google.com/file/d/1zkX8oQ74JFcJ7Gli6M_qnnGo41tMeZQI/view?usp=sharing or the original source on Kaggle: https://www.kaggle.com/datamunge/sign-language-mnist/download. Its format is very similar to the classic MNIST: 28 x 28 pixel images, and each label is a number representing a letter from A to Z (excluding 9=J and 25=Z). It is already partitioned into a training set and a testing set.
1.(a) Implement a CNN and train it on the training set with the Gradient Descent optimiser. You are free to choose the structure and the hyperparameters yourself. However, please write a short description of them and a justification of your choices.
1.(b) Now replace Gradient Descent with a Stochastic Gradient Descent (SGD) optimiser. Demonstrate how much the CNN improves, and justify your demonstration technique --- why did you demonstrate it in that way? (You must also explain whether there is any change to the CNN's structure.)
1.(c) Finally, replace SGD with the Adam optimiser and redo the previous subtask. Is it better to use Adam?
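A note on how (a) and (b) are realised below: in Keras the same 'sgd' optimiser acts as plain (full-batch) Gradient Descent or as minibatch SGD depending on the batch_size passed to model.fit. The sketch below (using the model, X_train and y_train built later in this notebook) is only meant to show where the two subtasks differ:
# Sketch only: in practice the model would be re-created between the two runs.
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# 1.(a) full-batch Gradient Descent: one weight update per epoch over the whole training set
model.fit(X_train, y_train, epochs=10, batch_size=len(X_train))
# 1.(b) Stochastic (minibatch) Gradient Descent: many small updates per epoch
model.fit(X_train, y_train, epochs=10, batch_size=32)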
sign_letters = [chr(x) for x in range(65,91)]
print(sign_letters)
# Import and prepare dataset
import numpy as np
import pandas as pd
test = pd.read_csv("./test/sign_mnist_test.csv")
train = pd.read_csv("./train/sign_mnist_train.csv")
total_pixels = 28 * 28
X_train = train.loc[:, "pixel1":].to_numpy().astype(np.float32) / 255.0
y_train = train.loc[:, "label"].to_numpy().astype(np.int32)
X_test = test.loc[:, "pixel1":].to_numpy().astype(np.float32) / 255.0
y_test = test.loc[:, "label"].to_numpy().astype(np.int32)
X_train = X_train.reshape(X_train.shape[0],28,28,1)
X_test = X_test.reshape(X_test.shape[0],28,28,1)
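As a quick sanity check (not required by the task), the label values can be inspected to confirm that 9 (J) and 25 (Z) never occur, so the labels run from 0 to 24 and 25 output units are enough for the classifier built below:
# Labels present in the training set; 9 (J) and 25 (Z) should be absent.
print(np.unique(y_train))
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)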
# Verify the data
import matplotlib.pyplot as plt
image_index = 10
one_image = X_train[image_index].reshape(28, 28)
plt.title("Label {}".format(sign_letters[train.loc[image_index, "label"]]))
plt.imshow(one_image, cmap="gray", vmin=0, vmax=1.0)
plt.show()
# Build a CNN
import tensorflow as tf
from tensorflow.keras import layers, models
height = 28
width = 28
channels = 1
conv1_fmaps = 32
conv1_ksize = 3
conv1_stride = 1
conv1_pad = 'same'
conv2_fmaps = 64
conv2_ksize = 3
conv2_stride = 2
conv2_pad = 'same'
n_fc = 64
n_outputs = 25
model = models.Sequential()
model.add(layers.Conv2D(filters=conv1_fmaps, kernel_size=conv1_ksize, strides=conv1_stride,
padding=conv1_pad, activation='relu', input_shape=(height, width, channels)))
model.add(layers.Conv2D(filters=conv2_fmaps, kernel_size=conv2_ksize, strides=conv2_stride,
padding=conv2_pad, activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
model.add(layers.Flatten())
model.add(layers.Dense(units=n_fc, activation='relu'))
model.add(layers.Dense(units=n_outputs))
model.add(layers.Softmax())
model.summary()
# Compile and train CNN
from tensorflow.keras import losses
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
The accuracy on the testing set of the CNN trained with the SGD optimiser is reported above. You can easily switch the optimizer at compilation time by passing optimizer='adam'.
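For 1.(c), a minimal sketch of that switch (the model is cloned so that Adam does not start from the SGD-trained weights; model_adam is just an illustrative name):
# Re-create the same architecture with fresh weights, compile with Adam, train and compare.
model_adam = tf.keras.models.clone_model(model)
model_adam.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model_adam.fit(X_train, y_train, epochs=10)
adam_loss, adam_acc = model_adam.evaluate(X_test, y_test)
print("SGD test accuracy:", test_acc, " Adam test accuracy:", adam_acc)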
2. In this task, you have to do time series prediction using RNNs. In particular, we aim to predict internet traffic data (the dataset itself can be found here - https://secure.ecs.soton.ac.uk/noteswiki/images/internet-traffic-data-in-bits-fr.xlsx - or you can find the link to this dataset on the module web site). It contains internet traffic data (in bits) from an academic backbone network in the UK, collected between 19 November 2004 and 27 January 2005 at five-minute intervals.
# Get the data
import numpy as np
import pandas as pd
data = pd.read_excel("internet-traffic-data-in-bits-fr.xlsx", usecols=[0, 1])  # keep only the time and bits columns; ignore the trailing null column
data = data.iloc[:-2]  # discard the two irrelevant rows at the end
data.columns = ["time", "bits"] # rename columns
data.head(10)
# Have a visual look on the data
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 10))
plt.plot(data["bits"])
plt.show()
t_min, t_max = 0, len(data)
t_min, t_max
# preprocessing
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
print(scaler.fit(data["bits"].values.reshape(-1, 1)))
normalized_data = scaler.transform(data["bits"].values.reshape(-1, 1)).flatten()
plt.figure(figsize=(20, 10))
plt.plot(normalized_data)
plt.show()
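An aside that is useful later: predictions made on the normalised series can be mapped back to bits with the fitted scaler, which is how an RMSE in the original units can be reported. A minimal illustration (the slice taken here is arbitrary):
# Map a few normalised values back to the original scale (bits) with the fitted scaler.
restored = scaler.inverse_transform(normalized_data[:5].reshape(-1, 1)).flatten()
print(restored)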
2.(a) Implement an RNN and train it on the provided data. You are free to set the hyperparameters yourself. However, please write a short description justifying your choices.
2.(b) Once you have trained the RNN, use the Stochastic Gradient Descent technique to improve the RMSE, and plot the results to demonstrate how SGD works (again, you are free to choose how you demonstrate this - however, you need to explain why you chose that way).
2.(c) Finally, replace SGD with Adam and redo the previous subtask. Is it better to use Adam? Explain your answer (hint: compare the number of steps, how long it takes to converge, etc.).
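For 2.(b) and 2.(c), one natural way to compare the optimisers is to build the same graph twice, changing only the optimiser line, and record the training RMSE at fixed intervals so the two convergence curves can be plotted together. The only line that differs is sketched here (learning_rate and loss are the ones defined in the cells below):
# 2.(b): plain SGD on the minibatches drawn by next_batch
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# 2.(c): Adam (this is the variant actually used in the cells below)
# optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)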
import tensorflow as tf
def reset_graph(seed=42):
tf.reset_default_graph()
tf.set_random_seed(seed)
np.random.seed(seed)
def next_batch(batch_size, n_steps):
    """
    Returns a batch of `batch_size` windows of length `n_steps`, sampled at random
    from the normalised series, together with the same windows shifted one step
    into the future (the targets).
    """
    t0 = np.random.randint(low=t_min, high=t_max - n_steps, size=(batch_size, 1), dtype=np.int32)
    Ts = t0 + np.arange(0, n_steps + 1, dtype=np.int32)
    ys = normalized_data[Ts]
    # return X's and Y's
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)
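A quick illustration of the shapes next_batch produces (Xb and yb are just illustrative names):
# Inputs and targets both have shape (batch_size, n_steps, 1); the targets are shifted one step ahead.
Xb, yb = next_batch(batch_size=4, n_steps=20)
print(Xb.shape, yb.shape)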
t = np.arange(t_min, t_max)
n_steps = 20
# a training instance
t_instance = np.arange(12, 12+n_steps+1)
plt.figure(figsize=(11,4))
plt.subplot(121)
plt.title("The whole time series", fontsize=14)
# plot all the data
plt.plot(t,normalized_data[t], label=r"Traffic Data")
# plot only the training set
plt.plot(t_instance[:-1], normalized_data[t_instance[:-1]], "b-", linewidth=3, label="A training instance")
plt.legend(loc="lower left", fontsize=14)
plt.axis([0, 100, 0, 1])
plt.xlabel("Time")
plt.ylabel("Bits")
plt.subplot(122)
plt.title("A training instance", fontsize=14)
plt.plot(t_instance[:-1], normalized_data[t_instance[:-1]], "bo", markersize=10, label="instance")
# notice that targets are shifted by one time step into the future
plt.plot(t_instance[1:], normalized_data[t_instance[1:]], "r*", markersize=10, label="target")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()
reset_graph()
n_steps = 50
n_inputs = 1
n_neurons = 100
n_outputs = 1
n_layers = 3
learning_rate = 0.001
n_iterations = 1000
batch_size = 50
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
rnn_layers = [tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu)
              for layer in range(n_layers)]
multi_layer_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
# wrap the stacked cells so the top layer's n_neurons outputs are projected to a single value per time step
cell = tf.contrib.rnn.OutputProjectionWrapper(multi_layer_cell, output_size=n_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
loss = tf.sqrt(tf.losses.mean_squared_error(labels=y, predictions=outputs))  # RMSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
debug = False
with tf.Session() as sess:
init.run()
for iteration in range(n_iterations):
X_batch, y_batch = next_batch(batch_size, n_steps)
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
if debug or iteration % 100 == 0:
            rmse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tRMSE:", rmse)
saver.save(sess, "./my_time_series_model")
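To produce the convergence comparison asked for in 2.(b)/(c), here is a sketch of the same loop with the RMSE recorded every 100 iterations and plotted afterwards (rmse_history is an illustrative name, and the graph above would be rebuilt once per optimiser):
# Record the training RMSE periodically so SGD and Adam runs can be plotted on the same axes.
rmse_history = []
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            rmse_history.append(loss.eval(feed_dict={X: X_batch, y: y_batch}))
plt.plot(np.arange(0, n_iterations, 100), rmse_history)
plt.xlabel("Iteration")
plt.ylabel("Training RMSE")
plt.show()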
t_instance = np.arange(12, 12+n_steps+1)
pred_all = [0 for i in range(n_steps)]
with tf.Session() as sess:
saver.restore(sess, "./my_time_series_model")
#init.run()
X_new = normalized_data[np.array(t_instance[:-1].reshape(-1, n_steps, n_inputs))]
y_pred = sess.run(outputs, feed_dict={X: X_new})
for i in range(0, len(normalized_data)-n_steps):
if i % 2000 == 0:
print(i)
r = np.arange(i, i + n_steps)
# print(np.array(r.reshape(-1, n_steps, n_inputs)))
X_new = normalized_data[np.array(r.reshape(-1, n_steps, n_inputs))]
pred = sess.run(outputs, feed_dict={X: X_new})
# print(pred)
        new_inst = pred[0, -1, 0]  # the last time step's output: the prediction for the step just after the window
pred_all.append(new_inst)
len(pred_all), len(normalized_data)
plt.figure(figsize=(20, 10))
plt.title("Testing the model", fontsize=14)
plt.plot(t_instance[:-1], normalized_data[t_instance[:-1]], "bo", markersize=10, label="instance")
plt.plot(t_instance[1:], normalized_data[t_instance[1:]], "r*", markersize=10, label="target")
plt.plot(t_instance[1:], y_pred[0,:,0], "c.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()
plt.figure(figsize=(20, 10))
plt.plot(normalized_data[-500:])
plt.plot(pred_all[-500:], linewidth=3)
plt.show()
from sklearn.metrics import mean_squared_error
import math
print("RMSE", mean_squared_error(normalized_data[:(len(pred_all))], pred_all))