Define the general fastai optimizer and variants

For the official fastai documentation, see the Optimizer documentation. These are minimal docs that simply bring in the source code and related tests to ensure that minimal functionality is met.

detuplify_pg[source]

detuplify_pg(d)

set_item_pg[source]

set_item_pg(pg, k, v)
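As a rough illustration, the sketch below shows the behaviour these two helpers are assumed to have: detuplify_pg flattens tuple-valued hyperparameters (such as betas) from a param group into individual keys, and set_item_pg writes a value back into the right slot of the original tuple. The exact key naming is an assumption, not taken from the source.

pg = {'params': [], 'lr': 1e-2, 'betas': (0.9, 0.99)}

hypers = detuplify_pg(pg)
# expected to look roughly like {'lr': 0.01, 'betas__0': 0.9, 'betas__1': 0.99}

pg = set_item_pg(pg, 'betas__0', 0.95)
# pg['betas'] should now be roughly (0.95, 0.99)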

class OptimWrapper[source]

OptimWrapper(opt, hp_map=None) :: _BaseOptimizer

Common functionality between Optimizer and OptimWrapper
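A minimal sketch of the idea, assuming the fastai.optimizer import path and the set_hyper method inherited from _BaseOptimizer: wrap an already-built PyTorch optimizer so it exposes fastai's hyperparameter interface.

from torch import nn, optim
from fastai.optimizer import OptimWrapper  # assumed import path

net = nn.Linear(2, 1)
opt = OptimWrapper(optim.AdamW(net.parameters(), lr=1e-3))
opt.set_hyper('lr', 1e-2)  # hyperparameters can then be read and scheduled like fastai's own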

OptimWrapper Examples

Below are some examples of using OptimWrapper with PyTorch optimizers:

Adam[source]

Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

Convenience function to make an Adam optimizer compatible with Learner

@delegates(optim.Adam)
def Adam(params, **kwargs): 
    "Convience function to make an Adam optimizer compatable with `Learner`"
    return OptimWrapper(optim.Adam(params, **kwargs))

SGD[source]

SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

Convenience function to make an SGD optimizer compatible with Learner

@delegates(optim.SGD)
def SGD(params, **kwargs):
    "Convience function to make a SGD optimizer compatable with `Learner`"
    return OptimWrapper(optim.SGD(params, **kwargs))
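These wrappers are intended to be passed to a Learner as opt_func, just like fastai's native optimizers; a hedged usage sketch (dls, model, and the loss function are placeholders):

learn = Learner(dls, model, loss_func=F.mse_loss, opt_func=Adam)  # dls/model are placeholders
learn.fit(1, lr=1e-3)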

Differential Learning Rates and Groups with PyTorch Optimizers

Out of the box, OptimWrapper is not able to use param groups and differential learning rates the way fastai can. Below are the necessary helper functions, along with a short tutorial.

params[source]

params(m)

Return all parameters of m
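A quick check, assuming params simply collects a module's parameters into a list:

from torch import nn

lin = nn.Linear(2, 1)
assert len(params(lin)) == 2  # weight and bias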

convert_params[source]

convert_params(o:list)

Converts o into PyTorch-compatible param groups

o should be a list of layer groups to be split in the optimizer

Example:

def splitter(m): return convert_params([[m.a], [m.b]])

Where m is a model defined as:

class RegModel(Module):
    def __init__(self): self.a,self.b = nn.Parameter(torch.randn(1)),nn.Parameter(torch.randn(1))
    def forward(self, x): return x*self.a + self.b

def _mock_train(m, x, y, opt):
    "Run a small training loop over `x`/`y` with the given optimizer"
    m.train()
    for i in range(0, 100, 25):
        z = m(x[i:i+25])
        loss = F.mse_loss(z, y[i:i+25])
        loss.backward()
        opt.step()
        opt.zero_grad()
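Putting the pieces together, here is a hedged end-to-end sketch of differential learning rates with a plain PyTorch optimizer; the data, the learning rates, and the assumption that convert_params returns standard {'params': ...} dicts are for illustration only.

import torch
import torch.nn.functional as F
from torch import optim

m = RegModel()
x = torch.randn(100, 1)
y = 2 * x + 1                          # made-up regression target

pgs = convert_params([[m.a], [m.b]])   # assumed to yield [{'params': [...]}, {'params': [...]}]
opt = optim.SGD(pgs, lr=1e-2)
opt.param_groups[0]['lr'] = 1e-3       # lower learning rate for the first group

_mock_train(m, x, y, opt)

# With fastai, the same split is typically handed to Learner via the splitter argument
# (e.g. Learner(dls, m, opt_func=SGD, splitter=splitter)) so lr=slice(...) can spread
# learning rates across the groups.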