from coursenotes.nb_03 import get_data
import torch.nn.functional as F
import matplotlib as mpl
from torch import nn
import math
Impractical Deep Learning for Coders Lesson 2, Minibatch Training
We get to write the training loop!
Initial Setup
Setup Data
This is based on the previous notebook's setup; see there for explanations.
mpl.rcParams['image.cmap'] = 'gray'
x_train, y_train, x_valid, y_valid = get_data()
n, m = x_train.shape
c = y_train.max() + 1
nh = 50
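A quick look at what we got. The concrete numbers are an assumption on my part, based on the MNIST data used in the previous notebooks (50,000 training images of 28*28 = 784 pixels, 10 digit classes):

n, m, c, nh
(50000, 784, tensor(10), 50)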
class BasicModel(nn.Module):
    "Basic fully connected model"
    def __init__(self, n_in, num_hidden, n_out):
        super().__init__()
        self.layers = [
            nn.Linear(n_in, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, n_out)
        ]
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
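Worth noting (my addition, not from the lesson): because the layers live in a plain Python list, nn.Module never registers them, so model.parameters() would come back empty; and overriding __call__ directly skips nn.Module's hook machinery. A minimal sketch of the registered, idiomatic variant, assuming we later want an optimizer to see the parameters:

class BasicModelRegistered(nn.Module):
    "Same model, with layers that nn.Module can actually track"
    def __init__(self, n_in, num_hidden, n_out):
        super().__init__()
        # nn.ModuleList registers each layer, so its parameters
        # show up in model.parameters()
        self.layers = nn.ModuleList([
            nn.Linear(n_in, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, n_out),
        ])
    def forward(self, x):  # forward, not __call__, keeps hooks working
        for layer in self.layers:
            x = layer(x)
        return x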
model = BasicModel(m, nh, 10)
pred = model(x_train)
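The model is untrained, but we can already sanity-check the output shape: one row of 10 raw scores (one per class) for each input image. The batch size below assumes the 50,000-image training set:

pred.shape
torch.Size([50000, 10])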
A loss function: Cross Entropy Loss
def log_softmax(x):
    return (x.exp() / (x.exp().sum(-1, keepdim=True))).log()
Log softmax is simply taking the exponential of x, dividing it by the sum of all the exponentials, and then taking the log of that result.
We take the log because negative log likelihood expects log probabilities as input; it does not apply the log itself.
log_preds = log_softmax(pred)
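As a quick check (my addition, not from the lesson), this should match PyTorch's built-in F.log_softmax up to floating point error. Also, dividing exponentials can overflow for large activations; a numerically safer equivalent uses logsumexp:

import torch

torch.allclose(log_preds, F.log_softmax(pred, dim=-1)) # expect True

def log_softmax_stable(x):
    # x - logsumexp(x) equals log(exp(x) / sum(exp(x))), but logsumexp
    # applies the max-subtraction trick internally to avoid overflow
    return x - x.logsumexp(-1, keepdim=True)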
We can then calculate the log likelihood, which is equal to:
(classARight * log10(classAProb)) + (classBRight * log10(classBProb)) + ...
For example, assume two classes as above, with predicted probabilities of 0.98 and 0.02 respectively. The right answer is class 0 (cat).
is_cat = 1 # One hot encoded label
is_dog = 0 # OHE label
preds = 0.98 # Softmaxed prediction for cat
log_pred_cat = math.log10(preds) # Take log base 10
log_pred_dog = math.log10(1-preds) # Take log base 10
nll = -((is_cat * log_pred_cat) + (is_dog * log_pred_dog)) # Follow the above, and make it negative
nll
0.00877392430750515
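One caveat (my note, not from the lesson): the example uses log base 10 for readability, but PyTorch's loss functions use the natural log, so the built-in loss gives a proportionally larger number (scaled by ln(10), about 2.303):

nll_natural = -math.log(preds) # natural log, as PyTorch uses
nll_natural # approximately 0.0202, i.e. nll * math.log(10)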
We can compute this faster by first finding the location of the 1 (since there is only a single one per row), then using that index to pick out the matching log probability directly:
y_train[:3]
tensor([5, 0, 4])
log_preds[[0,1,2], [5,0,4]]
tensor([-2.4597, -2.3251, -2.1119], grad_fn=<IndexBackward0>)
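Wrapping that indexing trick into a function gives us negative log likelihood. A minimal sketch; averaging over the batch is an assumption on my part, matching F.nll_loss's default reduction:

def nll(log_preds, target):
    # Pick each row's log probability at its target index,
    # then negate and average over the batch
    return -log_preds[range(target.shape[0]), target].mean()

loss = nll(log_preds, y_train)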