Impractical Deep Learning for Coders Lesson 2, Minibatch Training

We get to write the training loop!
from coursenotes.nb_03 import get_data

import torch.nn.functional as F
import matplotlib as mpl

from torch import nn
import math

Initial Setup

Setup Data

This is based on the previous notebook's setup; see that notebook for the full explanations.

mpl.rcParams['image.cmap'] = 'gray'

x_train,y_train,x_valid,y_valid = get_data()

n,m = x_train.shape  # n: number of training examples, m: number of input features
c = y_train.max()+1  # number of classes
nh = 50              # number of hidden units

class BasicModel(nn.Module):
    "Basic fully connected model"
    def __init__(self, n_in, num_hidden, n_out):
        super().__init__()
        # Two linear layers with a ReLU in between, kept in a plain Python list
        self.layers = [
            nn.Linear(n_in, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, n_out)
        ]

    def __call__(self, x):
        # Pass the input through each layer in turn
        for layer in self.layers:
            x = layer(x)
        return x

model = BasicModel(m, nh, 10)
pred = model(x_train)
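
As a quick sanity check (an extra step, not strictly needed), pred should contain one row per training example and one score for each of the 10 classes:

pred.shape  # expected: torch.Size([n, 10])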

A loss function: Cross Entropy Loss

def log_softmax(x):
    return (x.exp() / 
            (x.exp().sum(-1, keepdim=True))
           ).log()

Log softmax simply takes the exponential of x, divides it by the sum of all the exponentials along the last dimension, and then takes the log of that result.

We take the log because the negative log likelihood loss expects log probabilities as its input, not raw probabilities.

log_preds = log_softmax(pred)
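
This should match PyTorch's built-in F.log_softmax; a quick check (added here just for illustration):

import torch
torch.allclose(log_preds, F.log_softmax(pred, dim=-1))  # should be True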

We can then calculate log likelihood, which is equal to:

(classARight * log10(classAProb)) + (classBRight * log10(classBProb))...

For example, assume two classes as above, with predicted probabilities of 0.98 and 0.02 respectively. The correct answer is the first class (cat).

is_cat = 1 # One hot encoded label
is_dog = 0 # OHE label
preds = 0.98 # Softmaxed predictions
log_pred_cat = math.log10(preds) # Take log base 10
log_pred_dog = math.log10(1-preds) # Take log base 10

nll = -((is_cat * log_pred_cat) + (is_dog * log_pred_dog)); nll # Follow the above, and make it negative
0.00877392430750515
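
Note that PyTorch's loss functions use the natural log rather than log base 10; redoing the same calculation with math.log (shown here just for comparison) gives a slightly different number:

log_pred_cat = math.log(preds) # Natural log this time
log_pred_dog = math.log(1-preds)
nll = -((is_cat * log_pred_cat) + (is_dog * log_pred_dog)); nll # Roughly 0.0202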

We can make this faster: since each one-hot target has a single 1, we can find the index of the correct class and use it to pick out the corresponding log probability directly (integer array indexing).

y_train[:3]
tensor([5, 0, 4])
log_preds[[0,1,2], [5,0,4]]
tensor([-2.4597, -2.3251, -2.1119], grad_fn=<IndexBackward0>)
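
Putting this together, a minimal sketch of a negative log likelihood loss using that indexing (assuming the targets are class indices, as in y_train) could look like:

def nll(log_probs, target):
    # Select the log probability of the correct class for each row, then average and negate
    return -log_probs[range(target.shape[0]), target].mean()

loss = nll(log_preds, y_train)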