For the actual fastai documentation, you should go to the Learner documentation. These are minimal docs that simply bring in the source code and related tests to ensure that minimal functionality is met.
You probably want to jump directly to the definition of Learner.
class _A:
    def __init__(self, a): self.a = a
    @contextmanager
    def a_changed(self, v): return replacing_yield(self, 'a', v)
a = _A(42)
with a.a_changed(32):
    test_eq(a.a, 32)
test_eq(a.a, 42)
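The test above relies on the attribute being restored once the `with` block exits. A minimal sketch of a helper with that behaviour, assuming it swaps the attribute and restores it in a `finally` clause (`temporarily_set` and `Box` are hypothetical names used only for illustration, not the library's implementation):

```python
from contextlib import contextmanager

@contextmanager
def temporarily_set(obj, attr, value):
    """Set obj.attr to value inside the with block, then restore the old value."""
    old = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        # Runs on normal exit and also when the block raises
        setattr(obj, attr, old)

class Box:
    # Hypothetical holder class, standing in for _A in the test above
    def __init__(self, a): self.a = a

box = Box(42)
with temporarily_set(box, 'a', 32):
    inside = box.a   # value seen inside the block
after = box.a        # value seen once the block has exited
```

Restoring in `finally` means the original value comes back even if the body of the `with` block fails.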
See the class Metric below for more information.
file can be a Path object, a string or an opened file object. pickle_protocol is passed along to torch.save
file can be a Path object, a string or an opened file object. If a device is passed, the model is loaded on it, otherwise it's loaded on the CPU.
If strict is True, the file must contain weights for exactly the parameter keys in model; if strict is False, only the keys that are in the saved model are loaded in model.
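The difference between the two modes can be sketched with plain dictionaries standing in for state dicts. This is an illustration under that assumption only; `load_weights` is a hypothetical helper, not a fastai or PyTorch function:

```python
def load_weights(model_state, saved_state, strict=True):
    """Copy saved values into model_state (plain dicts standing in for state dicts)."""
    missing = sorted(set(model_state) - set(saved_state))
    unexpected = sorted(set(saved_state) - set(model_state))
    if strict and (missing or unexpected):
        # Strict mode: the two key sets must match exactly
        raise KeyError(f"missing keys: {missing}, unexpected keys: {unexpected}")
    for key in model_state:
        if key in saved_state:
            # Non-strict mode: only keys present in the saved file are loaded
            model_state[key] = saved_state[key]
    return missing

model_state = {'linear.weight': 0.0, 'linear.bias': 0.0}
saved_state = {'linear.weight': 1.5}

# The bias is absent from the saved file, so it keeps its current value
missing = load_weights(model_state, saved_state, strict=False)
```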
opt_func will be used to create an optimizer when Learner.fit is called, with lr as a default learning rate. splitter is a function that takes self.model and returns a list of parameter groups (or just one parameter group if there are no different parameter groups). The default is trainable_params, which returns all trainable parameters of the model.
cbs is one or a list of Callbacks to pass to the Learner. Callbacks are used for every tweak of the training loop. Each Callback is registered as an attribute of Learner, named after its class converted from camel case to snake case (for instance, TrainEvalCallback becomes train_eval). At creation, all the callbacks in defaults.callbacks (TrainEvalCallback, Recorder and ProgressCallback) are associated to the Learner.
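As a rough illustration of the attribute naming, here is a sketch that derives a snake case name from a callback class name. `cb_attr_name` is a hypothetical helper, and the rule of dropping a trailing `Callback` is an assumption based on the `train_eval` and `tst` attribute names used in the tests on this page:

```python
import re

def cb_attr_name(cls_name):
    """Hypothetical helper: derive the attribute name from a callback class name."""
    # Drop a trailing 'Callback', unless that is the whole name
    base = re.sub(r'Callback$', '', cls_name) or cls_name
    # Insert '_' before each capital that follows a lowercase letter or digit
    return re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', base).lower()

names = [cb_attr_name(n) for n in
         ('TrainEvalCallback', 'Recorder', 'ProgressCallback', 'TstCallback')]
```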
metrics is an optional list of metrics, that can be either functions or Metrics (see below).
path and model_dir are used to save and/or load models. Often path will be inferred from dls, but you can override it or pass a Path object to model_dir. Make sure you can write in path/model_dir!
wd is the default weight decay used when training the model; moms, the default momentums used in Learner.fit_one_cycle. wd_bn_bias controls if weight decay is applied to BatchNorm layers and bias.
Lastly, train_bn controls if BatchNorm layers are trained even when they are supposed to be frozen according to the splitter. Our empirical experiments have shown that it's the best behavior for those layers in transfer learning.
You can use regular PyTorch functionality for most of the arguments of the Learner, although the experience will be smoother with pure fastai objects and you will be able to use the full functionality of the library. The expectation is that the training loop will work smoothly even if you did not use fastai end to end. What you might lose are interpretation objects or showing functionality. The list below explains how to use plain PyTorch objects for all the arguments and what you might lose.
- The most important is opt_func. If you are not using a fastai optimizer, you will need to write a function that wraps your PyTorch optimizer in an OptimWrapper. See the optimizer module for more details. This is to ensure the library's schedulers/freeze API work with your code.
- dls is a DataLoaders object, that you can create from standard PyTorch dataloaders. By doing so, you will lose all showing functionality like show_batch/show_results. You can check the data block API or the mid-level data API tutorial to learn how to use fastai to gather your data!
- model is a standard PyTorch model. You can use any one you like, just make sure it accepts the number of inputs you have in your DataLoaders and returns as many outputs as you have targets.
- loss_func can be any loss function you like. It needs to be one of fastai's if you want to use Learner.predict or Learner.get_preds, or you will have to implement special methods (see more details after the BaseLoss documentation).
Now let's look at the main thing the Learner class implements: the training loop.
Uses lr and wd if they are provided, otherwise uses the default values given by the lr and wd attributes of Learner.
All the examples use synth_learner which is a simple Learner training a linear regression model.
learn = synth_learner(lr=0.1)
learn(_before_epoch)
learn.model = learn.model.cpu()
xb,yb = learn.dls.one_batch()
init_loss = learn.loss_func(learn.model(xb), yb)
learn.fit(10)
xb,yb = learn.dls.one_batch()
final_loss = learn.loss_func(learn.model(xb), yb)
assert final_loss < init_loss, (final_loss,init_loss)
from fastai.optimizer import SGD
from functools import partial
This is an internal method called by Learner.fit. If passed, i is the index of this iteration in the epoch. In training mode, this does a full training step on the batch (compute predictions, loss, gradients, update the model parameters and zero the gradients). In validation mode, it stops at the loss computation. Training or validation is controlled internally by the TrainEvalCallback through the training attribute.
Nothing is returned, but the attributes x, y, pred, loss of the Learner are set with the proper values:
b = learn.dls.one_batch()
learn.one_batch(0, b)
test_eq(learn.x, b[0])
test_eq(learn.y, b[1])
out = learn.model(learn.x)
test_eq(learn.pred, out)
test_eq(learn.loss, learn.loss_func(out, b[1]))
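To make the two modes concrete, here is a pure-Python sketch of one step on a 1-D linear regression. It is an illustration only, assuming mean squared error and plain gradient descent; `one_batch_sketch` is a hypothetical function, not the library's method:

```python
def one_batch_sketch(params, batch, lr=0.1, training=True):
    """One step on the model y = a*x + b with mean squared error."""
    xs, ys = batch
    n = len(xs)
    # Predictions and loss are computed in both modes
    preds = [params['a'] * x + params['b'] for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    if not training:
        # Validation mode stops at the loss computation
        return preds, loss
    # Gradients of the mean squared error with respect to a and b
    grad_a = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    # Parameter update; the local gradients are discarded afterwards,
    # which plays the role of zeroing them
    params['a'] -= lr * grad_a
    params['b'] -= lr * grad_b
    return preds, loss

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2*x + 1
params = {'a': 0.0, 'b': 0.0}

_, init_loss = one_batch_sketch(params, (xs, ys), training=False)
for _ in range(50):
    one_batch_sketch(params, (xs, ys), lr=0.1, training=True)
_, final_loss = one_batch_sketch(params, (xs, ys), training=False)
```

As in the `learn.fit(10)` test earlier on this page, the loss after training should be lower than the initial one.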
This method is called internally to create the optimizer, the hyper-parameters are then adjusted by what you pass to Learner.fit or your particular schedulers (see callback.schedule).
learn = synth_learner(n_train=5, cbs=VerboseCallback())
assert learn.opt is None
learn.create_opt()
assert learn.opt is not None
test_eq(learn.opt.hypers[0]['lr'], learn.lr)
We only describe the basic functionality linked to Callbacks here. To learn more about Callbacks and how to write them, check the callback.core module documentation.
Let's first see how the Callbacks become attributes of Learner:
class TstCallback(Callback):
    def batch_begin(self): self.learn.a = self.a + 1
tst_learn = synth_learner()
test_eq(len(tst_learn.cbs), 1)
assert isinstance(tst_learn.cbs[0], TrainEvalCallback)
assert hasattr(tst_learn, ('train_eval'))
tst_learn = synth_learner(cbs=TstCallback())
test_eq(len(tst_learn.cbs), 2)
assert isinstance(tst_learn.cbs[1], TstCallback)
assert hasattr(tst_learn, ('tst'))
This is how the Callbacks are called internally. For instance, a VerboseCallback just prints the event names (which can be useful for debugging):
learn = synth_learner(cbs=VerboseCallback())
learn('after_fit')
learn = synth_learner()
learn.add_cb(TestTrainEvalCallback())
test_eq(len(learn.cbs), 2)
assert isinstance(learn.cbs[1], TestTrainEvalCallback)
test_eq(learn.train_eval.learn, learn)
learn.add_cbs([TestTrainEvalCallback(), TestTrainEvalCallback()])
test_eq(len(learn.cbs), 4)
learn = synth_learner()
test_eq(len(learn.cbs), 1)
with learn.added_cbs(TestTrainEvalCallback()):
    test_eq(len(learn.cbs), 2)
By order, we mean using the internal ordering of the Callbacks (see callback.core for more information on how it works).
learn = synth_learner()
learn.add_cb(TestTrainEvalCallback())
learn.ordered_cbs('before_fit')
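A minimal sketch of such an ordering, assuming each callback carries a numeric `order` attribute and that only callbacks defining the requested event are returned. The classes and `ordered_for_event` are hypothetical, and the library may apply additional ordering rules described in callback.core:

```python
class Cb:
    order = 0  # hypothetical default priority

class EarlyCb(Cb):
    order = -10
    def before_fit(self): pass

class LateCb(Cb):
    order = 50
    def before_fit(self): pass

class BatchOnlyCb(Cb):
    # Defines no before_fit, so it is skipped for that event
    def before_batch(self): pass

def ordered_for_event(cbs, event):
    # sorted is stable: callbacks with equal order keep their registration order
    return [cb for cb in sorted(cbs, key=lambda cb: cb.order) if hasattr(cb, event)]

cbs = [LateCb(), BatchOnlyCb(), EarlyCb()]
names = [type(cb).__name__ for cb in ordered_for_event(cbs, 'before_fit')]
```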
learn = synth_learner()
learn.add_cb(TestTrainEvalCallback())
cb = learn.cbs[1]
learn.remove_cb(learn.cbs[1])
test_eq(len(learn.cbs), 1)
assert cb.learn is None
assert not getattr(learn,'test_train_eval',None)
cb can simply be the class of the Callback we want to remove (in which case all instances of that callback are removed).
learn = synth_learner()
learn.add_cbs([TestTrainEvalCallback(), TestTrainEvalCallback()])
learn.remove_cb(TestTrainEvalCallback)
test_eq(len(learn.cbs), 1)
assert not getattr(learn,'test_train_eval',None)
Elements of cbs can either be types of callbacks or actual callbacks of the Learner.
learn = synth_learner()
learn.add_cbs([TestTrainEvalCallback() for _ in range(3)])
cb = learn.cbs[1]
learn.remove_cbs(learn.cbs[1:])
test_eq(len(learn.cbs), 1)
Elements of cbs can either be types of callbacks or actual callbacks of the Learner.
learn = synth_learner()
learn.add_cb(TestTrainEvalCallback())
with learn.removed_cbs(learn.cbs[1]):
    test_eq(len(learn.cbs), 1)
test_eq(len(learn.cbs), 2)
At each step, callbacks are shown in order, which can help debugging.
learn = synth_learner()
learn.show_training_loop()
In order to change the data passed to your model, you will generally want to hook into the before_batch event, like so:
class TstCallback(Callback):
    def before_batch(self):
        self.learn.xb = self.xb + 1000
        self.learn.yb = self.yb - 1000
Since that is so common, we provide the before_batch_cb decorator to make it easier.
@before_batch_cb
def cb(self, xb, yb): return xb+1000,yb-1000
file can be a Path, a string or a buffer. pickle_protocol is passed along to torch.save.
file can be a Path, a string or a buffer. Use device to load the model/optimizer state on a device different from the one it was saved.
import tempfile
with tempfile.TemporaryDirectory() as d:
    learn = synth_learner(path=d)
    learn.fit(1)
    #Test save created a file
    learn.save('tmp')
    assert (Path(d)/'models/tmp.pth').exists()
    #Test load did load the model
    learn1 = synth_learner(path=d)
    learn1 = learn1.load('tmp')
    test_eq(learn.a, learn1.a)
    test_eq(learn.b, learn1.b)
    test_eq(learn.opt.state_dict(), learn1.opt.state_dict())
The Learner is saved in self.path/fname, using pickle_protocol. Note that serialization in Python saves the names of functions, not the code itself. Therefore, any custom code you have for models, data transformations, loss functions, etc. should be put in a module that you import in your training environment before exporting, and in your deployment environment before loading it.
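The point about names can be checked with the standard `pickle` module alone. This small sketch uses `math.sqrt` as a stand-in for a custom function:

```python
import math
import pickle

# Pickling a function stores a reference to it (module and name), not its code
payload = pickle.dumps(math.sqrt)
has_module_name = b'math' in payload
has_function_name = b'sqrt' in payload

# Unpickling looks the name up again in the importable module
restored = pickle.loads(payload)
same_object = restored is math.sqrt
```

Because loading performs that lookup, the module that defines your custom code has to be importable under the same name in the environment that loads the file.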
load_learner requires all your custom code be in the exact same place as when exporting your Learner (the main script, or the module you imported it from).
fastai provides to_detach, which by default detaches tensor gradients and gathers (calling maybe_gather) tensors from all ranks if running in distributed data parallel (DDP) mode.
When running in DDP mode all ranks need to have the same batch size, and DistributedDL takes care of padding batches as needed; however when gathering all tensors (e.g. for calculating metrics, inference, etc.) we need to discard the padded items. DistributedDL provides a method to_detach that removes padding appropriately.
Calling to_detach_from_dl with learn as a learner will attempt to find a to_detach method in the learner's last used DataLoader dl and use that one if found, otherwise it will resort to the vanilla to_detach.
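The lookup with fallback described above can be sketched as follows. All classes here are hypothetical stand-ins, lists stand in for tensors, and the real functions take more arguments than shown:

```python
def plain_detach(batch):
    # Stand-in for the vanilla behaviour: return the items unchanged
    return list(batch)

class PaddedLoader:
    """Hypothetical loader whose last batch was padded with n_pad extra items."""
    def __init__(self, n_pad): self.n_pad = n_pad
    def to_detach(self, batch):
        # Discard the padded items at the end of the batch
        return list(batch)[:len(batch) - self.n_pad]

class PlainLoader:
    pass  # defines no to_detach method

class FakeLearner:
    def __init__(self, dl): self.dl = dl

def detach_from_dl(learn, batch):
    # Prefer the loader's own to_detach, otherwise fall back to the plain one
    dl = getattr(learn, 'dl', None)
    fn = getattr(dl, 'to_detach', None)
    return fn(batch) if fn is not None else plain_detach(batch)

trimmed = detach_from_dl(FakeLearner(PaddedLoader(2)), [1, 2, 3, 4, 5])
untouched = detach_from_dl(FakeLearner(PlainLoader()), [1, 2, 3, 4, 5])
```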
Metrics can be simple averages (like accuracy) but sometimes their computation is a little bit more complex and can't be averaged over batches (like precision or recall), which is why we need a special class for them. For simple functions that can be computed as averages over batches, we can use the class AvgMetric, otherwise you'll need to implement the following methods.
If your Metric has state depending on tensors, don't forget to store it on the CPU to avoid any potential memory leaks.
learn = synth_learner()
tst = AvgMetric(lambda x,y: (x-y).abs().mean())
t,u = torch.randn(100),torch.randn(100)
tst.reset()
for i in range(0,100,25):
    learn.pred,learn.yb = t[i:i+25],(u[i:i+25],)
    tst.accumulate(learn)
test_close(tst.value, (t-u).abs().mean())
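The reset/accumulate/value pattern exercised above can be sketched in plain Python. The assumption here is that each batch value is weighted by its batch size, so that unequal batches still give the true average; `RunningAverage` is a hypothetical class, not the library's:

```python
class RunningAverage:
    """Hypothetical average-over-batches metric."""
    def reset(self):
        self.total, self.count = 0.0, 0
    def accumulate(self, batch_value, batch_size):
        # Weight each batch by its size so unequal batches are handled correctly
        self.total += batch_value * batch_size
        self.count += batch_size
    @property
    def value(self):
        return self.total / self.count if self.count else None

values = [1.0, 2.0, 3.0, 4.0, 10.0]
batches = [values[:4], values[4:]]   # batch sizes 4 and 1

metric = RunningAverage()
metric.reset()
for b in batches:
    metric.accumulate(sum(b) / len(b), len(b))

weighted = metric.value                                      # true mean of all values
naive = sum(sum(b) / len(b) for b in batches) / len(batches)  # mean of batch means
```

The naive mean of batch means differs from the true mean as soon as the batches have different sizes, which is why the sketch tracks a count alongside the total.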
tst = AvgLoss()
t = torch.randn(100)
tst.reset()
for i in range(0,100,25):
    learn.yb,learn.loss = t[i:i+25],t[i:i+25].mean()
    tst.accumulate(learn)
test_close(tst.value, t.mean())
tst = AvgSmoothLoss()
t = torch.randn(100)
tst.reset()
val = tensor(0.)
for i in range(4):
    learn.loss = t[i*25:(i+1)*25].mean()
    tst.accumulate(learn)
    val = val*0.98 + t[i*25:(i+1)*25].mean()*(1-0.98)
    test_close(val/(1-0.98**(i+1)), tst.value)
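The test above checks an exponentially weighted average with a debiasing term. The same arithmetic in plain Python, as a sketch (`smooth_losses` is a hypothetical helper; 0.98 is the weight used in the test):

```python
def smooth_losses(losses, beta=0.98):
    """Return the debiased exponentially weighted average after each loss."""
    val, out = 0.0, []
    for count, loss in enumerate(losses, start=1):
        # Running weighted average, starting from zero
        val = beta * val + (1 - beta) * loss
        # Dividing by (1 - beta**count) corrects the bias of the zero start
        out.append(val / (1 - beta ** count))
    return out

smoothed = smooth_losses([2.0, 2.0, 2.0])
first_only = smooth_losses([5.0, 1.0])[0]
```

Without the correction, the early values would be pulled towards zero; with it, a constant loss gives a constant smoothed value and the first smoothed value equals the first loss.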
def metric_value_fn(): return 5e-3
vm = ValueMetric(metric_value_fn, 'custom_value_metric')
test_eq(vm.value, 5e-3)
test_eq(vm.name, 'custom_value_metric')
vm = ValueMetric(metric_value_fn)
test_eq(vm.name, 'metric_value_fn')
By default, metrics are computed on the validation set only, although that can be changed by adjusting train_metrics and valid_metrics. beta is the weight used to compute the exponentially weighted average of the losses (which gives the smooth_loss attribute to Learner).
The logger attribute of a Learner determines what happens to those metrics. By default, it just prints them:
import torch.nn.functional as F
def tst_metric(out, targ): return F.mse_loss(out, targ)
learn = synth_learner(n_train=5, metrics=tst_metric)
# pat = r"[tensor\(\d.\d*\), tensor\(\d.\d*\), tensor\(\d.\d*\), 'dd:dd']"
pat = r"\[\d, \d+.\d+, \d+.\d+, \d+.\d+, '\d\d:\d\d'\]"
test_stdout(lambda: learn.fit(1), pat, regex=True)
learn = synth_learner(n_train=5, metrics=tst_metric)
res = learn.validate()
test_eq(res[0], res[1])
x,y = learn.dls.valid_ds.tensors
test_close(res[0], F.mse_loss(learn.model(x), y), 1e-3)
with_decoded will also return the decoded predictions using the decodes function of the loss function (if it exists). For instance, fastai's CrossEntropyLossFlat takes the argmax of predictions in its decodes.
Depending on the loss_func attribute of Learner, an activation function will be picked automatically so that the predictions make sense. For instance if the loss is a case of cross-entropy, a softmax will be applied, or if the loss is binary cross entropy with logits, a sigmoid will be applied. If you want to make sure a certain activation function is applied, you can pass it with act.
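One way such a selection could work is sketched below, assuming the loss object exposes an `activation` attribute and that an explicit `act` takes precedence. The loss classes and `pick_activation` are hypothetical, and plain lists stand in for tensors:

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(xs):
    return [1 / (1 + math.exp(-x)) for x in xs]

def identity(xs):
    return list(xs)

class CrossEntropyLike:
    activation = staticmethod(softmax)   # hypothetical cross-entropy style loss

class BCEWithLogitsLike:
    activation = staticmethod(sigmoid)   # hypothetical binary cross-entropy style loss

class PlainLoss:
    pass                                 # no activation attribute

def pick_activation(loss_func, act=None):
    # An explicit act wins; otherwise use the loss's activation, else identity
    if act is not None:
        return act
    return getattr(loss_func, 'activation', identity)

probs = pick_activation(CrossEntropyLike())([1.0, 2.0, 3.0])
half = pick_activation(BCEWithLogitsLike())([0.0])
raw = pick_activation(PlainLoss())([1.0, 2.0])
forced = pick_activation(CrossEntropyLike(), act=sigmoid)([0.0])
```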
save_preds and save_targs should be used when your predictions are too big to fit all in memory. Give a Path object that points to a folder where the predictions and targets will be saved.
concat_dim is the batch dimension, where all the tensors will be concatenated.
inner is an internal attribute that tells get_preds it's called internally, inside another training loop, to avoid recursion errors.
If you want to use with_loss=True on a custom loss function, make sure you have implemented a reduction attribute that supports 'none'.
learn = synth_learner(n_train=5, metrics=tst_metric)
preds,targs = learn.get_preds()
x,y = learn.dls.valid_ds.tensors
test_eq(targs, y)
test_close(preds, learn.model(x))
preds,targs = learn.get_preds(act = torch.sigmoid)
test_eq(targs, y)
test_close(preds, torch.sigmoid(learn.model(x)))
It returns a tuple of three elements with, in reverse order,
- the prediction from the model, potentially passed through the activation of the loss function (if it has one)
- the decoded prediction, using the potential decodes method from it
- the fully decoded prediction, using the transforms used to build the Datasets/DataLoaders
rm_type_tfms is a deprecated argument that should not be used and will be removed in a future version. with_input will add the decoded inputs to the result.
To use predict, you should use the entire fastai DataBlock API, as predict will not work with raw PyTorch DataLoaders (and in turn, this sublibrary).
In practice, we get the predictions n times with the transforms of the training set and average those. The final predictions are (1-beta) multiplied by this average + beta multiplied by the predictions obtained with the transforms of the dataset. Set beta to None to get a tuple of the predictions and tta results. You can also use the maximum of all predictions instead of an average by setting use_max=True.
If you want to use new transforms, you can pass them with item_tfms and batch_tfms.
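The blending arithmetic described for test-time augmentation can be sketched in plain Python. `combine_tta` is a hypothetical helper with lists standing in for tensors; the order of the tuple returned when `beta` is None, and taking the maximum over the plain predictions as well, are assumptions of this sketch:

```python
def combine_tta(aug_preds, plain_preds, beta, use_max=False):
    """Blend n augmented prediction lists with the plain predictions."""
    n = len(aug_preds)
    if use_max:
        # Element-wise maximum over the plain and all augmented predictions
        return [max(vals) for vals in zip(plain_preds, *aug_preds)]
    # Element-wise average of the n augmented predictions
    avg = [sum(vals) / n for vals in zip(*aug_preds)]
    if beta is None:
        # Return both parts without blending them
        return avg, plain_preds
    # (1 - beta) * average of augmented + beta * plain
    return [(1 - beta) * a + beta * p for a, p in zip(avg, plain_preds)]

aug_preds = [[0.2, 0.8], [0.4, 0.6]]   # two augmented passes
plain_preds = [0.6, 0.4]

blended = combine_tta(aug_preds, plain_preds, beta=0.5)
maxed = combine_tta(aug_preds, plain_preds, beta=0.5, use_max=True)
parts = combine_tta(aug_preds, plain_preds, beta=None)
```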
To use tta, you need to use the entire fastai DataBlock API; as a result, it is unsupported in this sublibrary.