Friday, June 19, 2020

With enough data and/or fine tuning, simpler models are as good as more complex models

This is an age-old issue that seems to repeat itself in every field. There are a couple of recent papers published criticising the race to beat SOTA.

This recent paper demonstrates that older and simpler model perform as well as newer models as long as they get enough data to train.

This has some interesting impact on production systems. As if you already have a good enough model, throwing more data at it can help achieve close to SOTA result.
Which means that you won't have to build from scratch a new model to keep up with SOTA in your production system. You just need to collect more data as the system run and retrain your model once in a while.
Also, less complex models tend to have shorter Inference time in production. Which would be a quite crucial component as well that gets impacted by model complexity.

In another recent paper, the authors look at Metric learning papers from the past four years and demonstrate that the performance claims over the old method (often more than double) are mainly due to the lack of tuning.
Most of the time the authors of the SOTA beating algorithm show two evaluations. One where they finetune their algorithm on the test set and compare against the off the shelf tuning SOTA algorithm.

"Our results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another"

"...this brings into question the results of other cutting edge papers not covered in our experiments. It also raises doubts about the value of the hand-wavy theoretical explanations in metric learning papers."
This happens time and time again across the industry and academia: perf benchmark of CPU Intel vs AMD, GPU Nvidia vs ATI, Network, Storage, etc....
This can be due to lack of knowledge, time, integrity, etc..

To conclude, be careful, the latest shiny model might note the best one for your production. If you spend enough time and data on older models you might achieve the same performance at lower inference cost.
Obviously, this assumes that you already have the best practice when it comes to model monitoring in production :)

No comments :

Post a comment