Research Blog

How to Train Your ResNet 6: Weight Decay

December 19, 2018

We learn more about the influence of weight decay on training and uncover an unexpected relation to LARS.

How to Train Your ResNet

September 24, 2018

The introduction to a series of posts investigating how to train Residual networks efficiently on the CIFAR10 image classification dataset. By the fourth post, we can train to the 94% accuracy threshold of the DAWNBench competition in 79 seconds on a single V100 GPU.

How to Train Your ResNet 1: Baseline

September 24, 2018

We establish a baseline for training a Residual network to 94% test accuracy on CIFAR10, which takes 297s on a single V100 GPU.

How to Train Your ResNet 2: Mini-batches

September 24, 2018

We investigate the effects of mini-batch size on training and use larger batches to reduce training time to 256s.

How to Train Your ResNet 3: Regularisation

September 24, 2018

We identify a performance bottleneck and add regularisation to reduce the training time further to 154s.

How to Train Your ResNet 4: Architecture

September 24, 2018

We search for more efficient network architectures and find a 9-layer network that trains in 79s.

How to Train Your ResNet 5: Hyperparameters

November 28, 2018

We develop some heuristics for hyperparameter tuning.


September 20, 2018

Are GPUs a good target for speech synthesis? Is Baidu's GPU implementation of WaveNet the best you can do on a GPU? We run some tests, discuss latency, and find out.