Notes on Transfer Learning in Caffe

This is a straightforward, practical walkthrough:

https://github.com/NVIDIA/DIGITS/tree/master/examples/fine-tuning

One trick I’ve learned from somewhere (can’t find the link, unfortunately), which is a break from the above tutorial: simply reduce the base learning rate by an order of magnitude when transferring, while simultaneously setting the “new” layers’ lr_mult an order of magnitude higher than the rest of the network.

so, to initially train (a prototxt sketch follows the list):

  1. base learning rate = 0.01
  2. every layer's lr_mult = 1

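For concreteness, here is a minimal sketch of what that looks like in Caffe's prototxt files (layer names like "fc7" are illustrative, not taken from the tutorial):

    # solver.prototxt -- initial training
    base_lr: 0.01

    # train_val.prototxt -- every learnable layer keeps the default multiplier
    layer {
      name: "fc7"
      type: "InnerProduct"
      bottom: "fc6"
      top: "fc7"
      param { lr_mult: 1 }  # weights: effective lr = base_lr * lr_mult
      param { lr_mult: 1 }  # biases (Caffe reference nets often use 2 here)
      inner_product_param { num_output: 4096 }
    }
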
to transfer learn (again, sketch below):

  1. base learning rate = 0.001
  2. each pre-existing layer's lr_mult = 1
  3. new layers' lr_mult = 10

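And the corresponding transfer-learning sketch. The key detail is that the replacement layer gets a new name ("fc8_new" is a placeholder), so Caffe randomly re-initializes it instead of copying weights from the pretrained model:

    # solver.prototxt -- transfer learning
    base_lr: 0.001  # one order of magnitude lower

    # train_val.prototxt -- pre-trained layers keep their names and lr_mult
    layer {
      name: "fc7"  # same name => weights copied from the .caffemodel
      type: "InnerProduct"
      bottom: "fc6"
      top: "fc7"
      param { lr_mult: 1 }
      param { lr_mult: 1 }
      inner_product_param { num_output: 4096 }
    }

    # replaced classifier: new name => randomly initialized, learns 10x faster
    layer {
      name: "fc8_new"
      type: "InnerProduct"
      bottom: "fc7"
      top: "fc8_new"
      param { lr_mult: 10 }
      param { lr_mult: 10 }
      inner_product_param { num_output: 20 }  # e.g. 20 classes in the new task
    }
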
This saves a lot of file editing and allows small adjustments to the existing layers while focusing the bulk of the learning on the new layers, yet it still resists overfitting on small datasets.
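To actually launch the fine-tuning run, the pretrained weights are loaded by layer name via the -weights flag (the paths here are placeholders):

    caffe train -solver solver.prototxt -weights bvlc_reference_caffenet.caffemodel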

This is pretty informative:

https://cs231n.github.io/transfer-learning/
And if I remember correctly, this is the original transfer-learning paper:

http://www.datascienceassn.org/sites/default/files/How%20Transferable%20are%20Features%20in%20Deep%20Neural%20Networks.pdf
