!pip install git+https://github.com/fastai/fastai
!pip install git+https://github.com/fastai/fastcore
Collecting git+https://github.com/fastai/fastai
  Cloning https://github.com/fastai/fastai to /tmp/pip-req-build-7c5crln7
  ...
Building wheels for collected packages: fastai
Successfully built fastai
Installing collected packages: fastcore, fastai
  Found existing installation: fastai 1.0.61
    Uninstalling fastai-1.0.61:
      Successfully uninstalled fastai-1.0.61
Successfully installed fastai-2.0.14 fastcore-1.0.15
Collecting git+https://github.com/fastai/fastcore
  Cloning https://github.com/fastai/fastcore to /tmp/pip-req-build-7fykshnj
Requirement already satisfied (use --upgrade to upgrade): fastcore==1.0.15 from git+https://github.com/fastai/fastcore in /usr/local/lib/python3.6/dist-packages
  ...
Successfully built fastcore

This blog is also a Jupyter notebook, available to run from the top down, with code snippets you can run in any environment. Below are the versions of fastai and fastcore I am running at the time of writing:

  • fastai: 2.0.14

  • fastcore: 1.0.15


Note: this blog is the result of joint efforts between myself and Juvian on the forums

CAMVID Benchmarks, Can't We Just Use the Code from Class?

In the fastai course, we are walked through the CamVid dataset: semantic segmentation from a car's point of view. Ideally, we would then like to compare our results against the current state-of-the-art benchmarks.

However! In its current state, this cannot be done.

Why, you might ask? Recent benchmarks have adopted a few "weird" changes and made the dataset slightly easier, and comparing against them is not as straightforward as you would hope either (more on this at the end).

The Metric

First, the reported metrics are different. Instead of accuracy, the mean Intersection over Union (mIOU) is reported, along with individual per-class IOUs.
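
As a quick reference (a minimal illustrative sketch, not the benchmark code; pred and targ here are hypothetical integer mask tensors), a per-class IOU is intersection over union for that class, and mIOU is simply the mean over the 11 classes:

import torch

def class_iou(pred, targ, class_index, void_index=11):
    "Illustrative per-class IOU: correctly predicted pixels of a class / all pixels touching that class"
    intersec = ((pred == targ) & (targ == class_index)).sum().item()
    union = (((pred == class_index) | (targ == class_index)) & (targ != void_index)).sum().item()
    return intersec / union if union else float('nan')

# mIOU is then the mean of the per-class IOUs over the 11 non-void classes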

The Number of Classes

In the original fastai version of the dataset, 31 classes are present, plus an additional void class that is ignored in the resulting benchmarks.

Researchers have since changed this class distribution to 11 total classes: Building, Tree, Sky, Car, Sign, Road, Pedestrian, Fence, Pole, Sidewalk, and Cyclist, plus a twelfth void class that is, again, not taken into account.

This change allows a higher mIOU to be reported without the rarely-seen classes skewing the results. So if you were running mIOU on the class notebooks, getting ~20%, and wondering why it didn't line up, this is why!

The Splits

When we train with fastai's split, we wind up mixing the benchmark evaluation data into the training data! Not something we want at all! The train/validation/test split in most papers is 367/101/233. That is correct: there are over twice as many test images as validation images.

The SegNet Version

This version has images and labels at 360x480 pixels, half the size of fastai's source dataset, but with labels using the 11 classes. What differs from paper to paper, however, is how the dataset is used, which can lead to issues. Let's look at the current options and their pros/cons:

Using the SegNet Dataset

If we decide to use only this dataset, there is not much room for fastai's tricks (such as progressive resizing and pre-sizing). That being said, there are papers which use this. If you look at the CamVid leaderboard, however, you'll notice the best such model sits at 8th. So what's next?

Well, what is the SOTA we're comparing against, then?

The benchmark below cannot be compared against directly, but to the extent that we do, we will focus on the models that use an ImageNet backbone:

(CamVid benchmark leaderboard table)

Using the fastai Images with Smaller Labels

fastai uses the high-quality 720x960 images and labels, so it would make sense to train on those images and use the smaller 11-class masks as the labels, which is what the top benchmarks do.

The Issue

There is a very big issue with this though, which Jeremy pointed out to us while we were discussing these new benchmark approaches: simply upscaling the labels, without any adjustment to the fastai images, is "weird" on its own. Instead, we resize the images back down to 360x480 before upsampling them, so that images and labels go through the same scaling. This winds up increasing the final accuracy.
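
To make that concrete, here is a minimal PIL sketch of the idea for a single image/mask pair (the file names are hypothetical; the fastai transforms we build later do the equivalent on the fly):

from PIL import Image

img = Image.open('camvid_image.png')   # hypothetical 960x720 fastai image (width x height)
msk = Image.open('segnet_label.png')   # hypothetical 480x360 SegNet label

img_small = img.resize((480, 360), Image.BILINEAR)        # downscale the image to the label's resolution
img_big   = img_small.resize((960, 720), Image.BILINEAR)  # then upsample, so image and label share the same scaling
msk_big   = msk.resize((960, 720), Image.NEAREST)         # nearest-neighbour keeps label values intact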

Can We Train Now?

Okay, enough talking, can we see some code to back up your claims?

Sure, let's do it! To visualize what we will be doing, throughout this blog we will be:

  1. Downloading a different dataset
  2. Making a DataBlock which pre-sizes our images to the proper size
  3. Making a unet_learner which:
    • Uses a pretrained ResNet34 backbone
    • Uses the ranger optimizer function
    • Compares the use of ReLU and Mish in the head
    • Uses both IOU and mIOU metrics to properly allow us to benchmark the results
  4. Making a test_dl with the proper test set to evaluate on.

Downloading the Dataset

The dataset currently lives in the SegNet-Tutorial repository, so we will go ahead and clone it and make it our working directory:

!git clone https://github.com/alexgkendall/SegNet-Tutorial.git
%cd SegNet-Tutorial/
Cloning into 'SegNet-Tutorial'...
remote: Enumerating objects: 2785, done.
remote: Total 2785 (delta 0), reused 0 (delta 0), pack-reused 2785
Receiving objects: 100% (2785/2785), 340.84 MiB | 26.65 MiB/s, done.
Resolving deltas: 100% (81/81), done.
/content/SegNet-Tutorial

Now we still want to use fastai's input images, so we'll go ahead and pull their CAMVID dataset. First let's import fastai's vision module:

from fastai.vision.all import *

Then grab the data:

path_i = untar_data(URLs.CAMVID)

Let's see how both datasets are formatted:

path_l = Path('')
path_i.ls()
(#4) [Path('/root/.fastai/data/camvid/codes.txt'),Path('/root/.fastai/data/camvid/images'),Path('/root/.fastai/data/camvid/labels'),Path('/root/.fastai/data/camvid/valid.txt')]
path_l.ls()
(#9) [Path('.gitattributes'),Path('Models'),Path('Scripts'),Path('README.md'),Path('docker'),Path('CamVid'),Path('.gitignore'),Path('Example_Models'),Path('.git')]

So we can see that fastai has the usual images and labels folders, while we can't quite tell where the annotations are in the second dataset. Let's narrow down to the CamVid folder:

path_l = path_l/'CamVid'
path_l.ls()
(#9) [Path('CamVid/train.txt'),Path('CamVid/train'),Path('CamVid/test.txt'),Path('CamVid/val.txt'),Path('CamVid/testannot'),Path('CamVid/test'),Path('CamVid/trainannot'),Path('CamVid/valannot'),Path('CamVid/val')]

And this looks like a better-organized dataset! The three folders we care about are trainannot, valannot, and testannot, as these are where the labels live.
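
If you want to sanity-check the split sizes mentioned earlier, counting the images in each split folder should line up with the 367/101/233 reported in the papers:

for split in ['train', 'val', 'test']:
    print(split, len(get_image_files(path_l/split)))
# expected: train 367, val 101, test 233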

DataBlock

Given how the data is split up, fastai doesn't currently have a splitter that works along those lines; the closest is GrandparentSplitter. We'll write something similar called FolderSplitter, which accepts names for the train and validation folders:

def _folder_idxs(items, name):
    "Indices of `items` whose immediate parent folder is named `name` (accepts a single name or a list)"
    def _inner(items, name): return mask2idxs(Path(o).parents[0].name == name for o in items)
    return [i for n in L(name) for i in _inner(items, n)]

def FolderSplitter(train_name='train', valid_name='valid'):
    "Split `items` by their immediate parent folder, using `train_name` and `valid_name`"
    def _inner(o):
        return _folder_idxs(o, train_name), _folder_idxs(o, valid_name)
    return _inner
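
As a quick illustration (with made-up file names), the splitter simply returns two lists of indices, one per folder name:

items = L(Path('CamVid/train/0001TP_006690.png'), Path('CamVid/val/0016E5_07959.png'))
FolderSplitter(valid_name='val')(items)
# -> ([0], [1]): the first item lands in the training split, the second in validation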

Next we need a way to get our x images, since they live in a different location than our labels. We can use a custom function to do so:

def get_x(o): return path_i/'images'/o.name

Finally we need a get_y that will use that same filename to go grab our working masks:

def get_mask(o): return o.parent.parent/(o.parent.name + 'annot')/o.name
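
To see what these two functions do, take a hypothetical file from the SegNet train folder: get_x swaps it for the matching full-quality fastai image (assuming, as we do throughout, that file names match across the two datasets), while get_mask points at the 11-class annotation:

o = path_l/'train'/'0001TP_006690.png'   # a hypothetical item from get_image_files
get_x(o)     # -> Path('/root/.fastai/data/camvid/images/0001TP_006690.png')
get_mask(o)  # -> Path('CamVid/trainannot/0001TP_006690.png')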

We have almost all the pieces for our dataset now. We'll use fastai's progressive resizing when training, and pass in a set of codes for our dataset:

codes = ['Sky', 'Building', 'Pole', 'Road', 'Pavement', 'Tree', 'SignSymbol', 'Fence', 'Car', 'Pedestrian', 'Bicyclist', 'Unlabelled']
half, full = (360, 480), (720, 960)

Now for those transforms. As mentioned earlier, we downscale and then upscale the images, so the same upscaling is applied to both labels and images, even though the images start from a higher quality. Since we want to train small first, we'll resize back down to the half size in the batch transforms, as well as normalize our inputs:

item_tfms = [Resize(half), Resize(full)]
batch_tfms = [*aug_transforms(size=half), Normalize.from_stats(*imagenet_stats)]

And with this we can now build the DataBlock and DataLoaders:

camvid = DataBlock(blocks=(ImageBlock, MaskBlock(codes=codes)),
                   get_items=get_image_files,
                   splitter=FolderSplitter(valid_name='val'),
                   get_x=get_x,
                   get_y=get_mask,
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms)

We'll call .summary() to make sure our images and masks do come out at the half size:

camvid.summary(path_l)

We can see the final input and mask size is (360,480), which is what we want! Let's go ahead and make them DataLoaders:

dls = camvid.dataloaders(path_l, bs=4)

Since we have a void class, the c attribute in our DataLoaders needs to be one less than the number of codes:

dls.c = len(codes) - 1

Metrics

Juvian was the one to bring this next part to life! We want class-wise IOUs as well as an mIOU, both defined below:

class IOU(AvgMetric):
    "Intersection over Union metric for a single class"
    def __init__(self, class_index, class_label, axis, ignore_index=-1): store_attr('axis,class_index,class_label,ignore_index')
    def accumulate(self, learn):
        pred, targ = learn.pred.argmax(dim=self.axis), learn.y
        # correctly predicted pixels of this class, and all pixels involving this class (excluding void)
        intersec = ((pred == targ) & (targ == self.class_index)).sum().item()
        union = (((pred == self.class_index) | (targ == self.class_index)) & (targ != self.ignore_index)).sum().item()
        if union: self.total += intersec
        self.count += union

    @property
    def name(self): return self.class_label

from sklearn.metrics import confusion_matrix

class MIOU(AvgMetric):
    "Mean Intersection over Union Metric"
    def __init__(self, classes, axis): store_attr()

    def accumulate(self, learn):
        pred, targ = learn.pred.argmax(dim=self.axis).cpu(), learn.y.cpu()
        pred, targ = pred.flatten().numpy(), targ.flatten().numpy()
        self.total += confusion_matrix(targ, pred, labels=range(self.classes))

    @property
    def value(self): 
        conf_matrix = self.total
        per_class_TP = np.diagonal(conf_matrix).astype(float)
        per_class_FP = conf_matrix.sum(axis=0) - per_class_TP
        per_class_FN = conf_matrix.sum(axis=1) - per_class_TP
        iou_index = per_class_TP / (per_class_TP + per_class_FP + per_class_FN)
        iou_index = np.nan_to_num(iou_index)
        mean_iou_index = (np.mean(iou_index))    

        return mean_iou_index

    @property
    def name(self): return 'miou'

With our metric classes defined, let's combine them. We'll want an mIOU, as well as an IOU for each of the 11 classes:

metrics = [MIOU(11, axis=1)]

Note: we do not need to pass in an ignore_index here, as any label values larger than 10 are left out of the confusion matrix.

And now let's declare our IOUs. Since there are so many, we'll just build them in a loop using our codes:

for x in range(11): metrics.append(IOU(x, codes[x], axis=1, ignore_index=11))
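
At this point metrics holds twelve objects, the mIOU plus one IOU per class, each reporting under its class name during training:

[m.name for m in metrics]
# ['miou', 'Sky', 'Building', 'Pole', 'Road', 'Pavement', 'Tree',
#  'SignSymbol', 'Fence', 'Car', 'Pedestrian', 'Bicyclist']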

With this we can finally move over to our model and training:

The Model and Training

For the model we will use a pretrained ResNet34 backbone with Mish in the head of the Dynamic Unet:

config = unet_config(self_attention=False, act_cls=Mish)

Our optimizer will be ranger:

opt_func = ranger

And finally, since we have an ignore_index, we need to pass it into our loss function as well; otherwise we will trigger a "CUDA error: device-side assert triggered":

loss_func = CrossEntropyLossFlat(ignore_index=11, axis=1)
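
As a tiny illustration of what ignore_index buys us (made-up tensors, not part of the training code), pixels labelled 11 (void) are simply skipped by the loss, even though the model only outputs 11 classes:

preds = torch.randn(2, 11, 4, 4)            # (batch, classes, height, width)
targs = torch.randint(0, 12, (2, 4, 4))     # values 0-10 are classes, 11 marks void pixels
loss_func(preds, targs)                     # runs without a device-side assert; the 11s contribute nothing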

Now let's pass this all into unet_learner:

learn = unet_learner(dls, resnet34, metrics=metrics, opt_func=opt_func, 
                     loss_func=loss_func, config=config)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.cache/torch/hub/checkpoints/resnet34-333f7ec4.pth

Phase 1

We'll find a good learning rate, then fit for ten epochs frozen, using GradientAccumulation to help with stability, before unfreezing and training for a few more:

learn.lr_find()
SuggestedLRs(lr_min=0.0007585775572806596, lr_steep=0.0010000000474974513)

A good learning rate is around 2e-3, so we'll train with that using fit_flat_cos, since the ranger optimizer should be paired with it:

lr = 2e-3
learn.fit_flat_cos(10, slice(lr), cbs=[GradientAccumulation(n_acc=16)])
epoch train_loss valid_loss miou Sky Building Pole Road Pavement Tree SignSymbol Fence Car Pedestrian Bicyclist time
0 1.380507 0.860166 0.269197 0.839513 0.644756 0.000020 0.730014 0.010095 0.692669 0.000000 0.000000 0.044094 0.000000 0.000000 00:38
1 0.697614 0.569771 0.381895 0.889391 0.811681 0.000020 0.823444 0.581527 0.781517 0.000000 0.000621 0.312639 0.000000 0.000000 00:36
2 0.467255 0.442739 0.403631 0.878348 0.775515 0.000122 0.910412 0.655963 0.808596 0.001485 0.005501 0.395828 0.004051 0.004118 00:36
3 0.359350 0.333798 0.473028 0.943389 0.848014 0.000081 0.945536 0.777093 0.788921 0.008712 0.007907 0.553949 0.206558 0.123142 00:35
4 0.303314 0.260455 0.576267 0.941747 0.850498 0.000243 0.952665 0.813468 0.889507 0.100332 0.357541 0.781372 0.222650 0.428913 00:35
5 0.273420 0.270565 0.585087 0.932999 0.864418 0.000081 0.949140 0.818339 0.869284 0.293988 0.269792 0.811271 0.216093 0.410552 00:35
6 0.254735 0.236601 0.634471 0.936278 0.870673 0.003281 0.964194 0.855702 0.889725 0.300149 0.482891 0.841558 0.286659 0.548068 00:35
7 0.240327 0.270188 0.596441 0.937764 0.871333 0.032959 0.955953 0.837018 0.822873 0.273233 0.048920 0.846606 0.319242 0.614951 00:35
8 0.209995 0.195687 0.673547 0.946722 0.886908 0.060189 0.963652 0.848611 0.892291 0.436009 0.496328 0.864719 0.363641 0.649950 00:35
9 0.185206 0.197379 0.667369 0.946318 0.889515 0.076916 0.966370 0.858471 0.891145 0.420854 0.474111 0.864551 0.342494 0.610310 00:35

Next we'll unfreeze and train for 12 more epochs, using discriminative learning rates across the layer groups:

lrs = slice(lr/400, lr/4)
learn.unfreeze()
learn.fit_flat_cos(12, lrs, cbs=[GradientAccumulation(n_acc=16)])
epoch train_loss valid_loss miou Sky Building Pole Road Pavement Tree SignSymbol Fence Car Pedestrian Bicyclist time
0 0.182300 0.191817 0.679859 0.944626 0.883056 0.093460 0.968520 0.857086 0.903970 0.411324 0.541308 0.858053 0.379852 0.637194 00:35
1 0.171663 0.193553 0.688322 0.941426 0.889005 0.109457 0.966605 0.853181 0.892943 0.453734 0.501025 0.844362 0.418895 0.700908 00:35
2 0.164871 0.225098 0.661339 0.944716 0.888159 0.106461 0.966710 0.855766 0.872609 0.494205 0.322309 0.859968 0.357163 0.606668 00:35
3 0.163544 0.197305 0.690802 0.944984 0.891899 0.105461 0.964988 0.862533 0.880129 0.444691 0.444886 0.822281 0.502733 0.734242 00:35
4 0.157475 0.183090 0.700796 0.937403 0.889857 0.104603 0.966317 0.860737 0.905106 0.476430 0.573778 0.816515 0.466653 0.711353 00:34
5 0.149104 0.195971 0.682122 0.945185 0.888626 0.116620 0.963612 0.859883 0.897999 0.347324 0.413924 0.842594 0.493171 0.734406 00:35
6 0.146632 0.229591 0.673781 0.943697 0.891223 0.150029 0.968885 0.864813 0.850287 0.487822 0.321995 0.824384 0.412684 0.695772 00:35
7 0.140522 0.190752 0.684680 0.948050 0.900241 0.082288 0.963449 0.861177 0.885507 0.323323 0.454499 0.867523 0.501653 0.743767 00:35
8 0.134812 0.162649 0.720961 0.946529 0.903660 0.142304 0.971249 0.882453 0.899834 0.526134 0.572753 0.840024 0.488276 0.757354 00:35
9 0.129773 0.167973 0.710287 0.943785 0.902817 0.139235 0.971278 0.887299 0.901242 0.471411 0.524803 0.834308 0.491301 0.745677 00:35
10 0.122608 0.160006 0.733646 0.946728 0.905341 0.141303 0.968730 0.872145 0.903727 0.495692 0.603912 0.844581 0.579497 0.808447 00:35
11 0.117691 0.160273 0.733083 0.947101 0.906433 0.146316 0.969303 0.876323 0.902350 0.521031 0.582046 0.861685 0.554041 0.797289 00:35

We'll save this model away and quickly check how it does on our test set:

learn.save("360")
Path('models/360.pth')
fnames = get_image_files(path_l/'test')
test_dl = learn.dls.test_dl(fnames, with_labels=True)
metrics = learn.validate(dl=test_dl)[1:]
names = list(map(lambda x: x.name, learn.metrics))
for value, metric in zip(metrics, names):
  print(metric, value)
miou 0.6513576513430401
Sky 0.9236697535182652
Building 0.8305953126939783
Pole 0.23508068238078056
Road 0.945941641771041
Pavement 0.8307263165520995
Tree 0.7624501080720603
SignSymbol 0.43377705211681206
Fence 0.3818599814936706
Car 0.8266845071526181
Pedestrian 0.5090933599937093
Bicyclist 0.4850554490284065

We can see a starting mIOU of 65%, almost matching the mid-tier performers. Let's see if we can take it further by using the full-sized images.

Phase 2

First let's free up our memory:

del learn
torch.cuda.empty_cache()
import gc
gc.collect()
11257

We'll adjust our transforms to keep the full-sized images instead:

item_tfms = [Resize(half), Resize(full)]
batch_tfms = [*aug_transforms(size=full), Normalize.from_stats(*imagenet_stats)]

And rebuild the DataBlock and DataLoaders from there:

camvid = DataBlock(blocks=(ImageBlock, MaskBlock(codes=codes)),
                   get_items=get_image_files,
                   splitter=FolderSplitter(valid_name='val'),
                   get_x=get_x,
                   get_y=get_mask,
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms)

dls = camvid.dataloaders(path_l, bs=2)
dls.c = len(codes) - 1

We'll need to re-declare our metrics, as the current instances still hold state from our last training session:

metrics = [MIOU(11, axis=1)]
for x in range(11): metrics.append(IOU(x, codes[x], axis=1, ignore_index=11))

And now let's train:

learn = unet_learner(dls, resnet34, metrics=metrics, opt_func=opt_func,
                     config=config, loss_func=loss_func)
learn.load('360');
learn.freeze()

lr = 1e-3
learn.fine_tune(12, lr, cbs=[GradientAccumulation(n_acc=16), EarlyStoppingCallback()])
epoch train_loss valid_loss miou Sky Building Pole Road Pavement Tree SignSymbol Fence Car Pedestrian Bicyclist time
0 0.211583 0.217368 0.711367 0.949347 0.888438 0.112724 0.925386 0.750803 0.893861 0.535350 0.629848 0.865719 0.491392 0.782169 02:00
epoch train_loss valid_loss miou Sky Building Pole Road Pavement Tree SignSymbol Fence Car Pedestrian Bicyclist time
0 0.180150 0.165417 0.734545 0.940707 0.910473 0.077289 0.952566 0.826502 0.915739 0.597755 0.683396 0.874197 0.522924 0.778450 02:01
1 0.163459 0.167075 0.727612 0.941061 0.910744 0.090534 0.958094 0.846227 0.907565 0.588291 0.661547 0.872477 0.487174 0.740012 02:03
No improvement since epoch 0: early stopping

Let's check its final IOUs:

fnames = get_image_files(path_l/'test')
test_dl = learn.dls.test_dl(fnames, with_labels=True)
metrics = learn.validate(dl=test_dl)[1:]
names = list(map(lambda x: x.name, learn.metrics))
for value, metric in zip(metrics, names):
  print(metric, value)
miou 0.6408294816140846
Sky 0.9200597102142184
Building 0.829854500316895
Pole 0.25428639814379245
Road 0.8889084399152402
Pavement 0.6907592069297879
Tree 0.7675178133912139
SignSymbol 0.46596359844043667
Fence 0.3685350595915968
Car 0.7864697614577011
Pedestrian 0.5871856373610818
Bicyclist 0.48958417199296694

Results and Discussion

At first we tried a standard Unet without any special tricks, and we got a test mIOU of around 59%. From this baseline we tried applying Self-Attention, Label Smoothing, and the Mish activation function (as the default is ReLU).

What we found is that simply applying Mish boosted that 59% to around 64% (with our highest run reaching roughly 67% mIOU). Note that Mish was only applied to the head of the model, not in the ResNet backbone.

Self-Attention did not seem to help; it brought the mIOU down to 62% even when training with the Mish activation function.

Applying Label Smoothing led to noticeably different per-class results. While the mIOU was not as high as with a plain Mish model, the distribution of the individual IOUs changed.

When applying the proper pre-sizing technique demonstrated here, we saw a boost of 10% mIOU, supporting the idea that simply blowing up your masks to match the original image resolution, without scaling the images the same way, diminishes the value inside of them.

Conclusions

What conclusions can we actually draw from this study? Not as many as you would think, and the reason lies in current issues in academia. Right now there are three different datasets being used:

  • fastai images with SegNet masks
  • SegNet images and masks
  • fastai images and labels while ignoring all the other classes

Well... who is right then? Technically 2 and 3 are right, but the three cannot be compared equally. Remember that benchmark table I showed earlier? If you go and read the papers, each one uses one of the three approaches listed here.

So... what can we make of this?

There is one direct conclusion we can make: using Mish in the head of our Dynamic Unet boosts the mIOU by 5%. So it is absolutely worth trying and using with your projects.

Where do we go from here?

A much more consistent dataset is CityScapes. It's for research only, and you must upload your test-set predictions to its website; it's essentially a Kaggle competition for researchers, a format I believe works much better. Researchers compare how they perform on both the validation set and the test set. This is certainly an easier benchmark for folks to tackle with the fastai UNet, so hopefully one day someone will try it and see how it does!