Double descent

In statistics and machine learning, double descent is the phenomenon in which a statistical model with a small number of parameters and a model with an extremely large number of parameters both achieve low test error, while a model whose number of parameters is roughly equal to the number of training data points has high test error.[2] It was identified around 2018, when researchers sought to reconcile the bias-variance tradeoff of classical statistics, which predicts that models with too many parameters generalize poorly, with the empirical observation by machine learning practitioners in the 2010s that larger models tend to perform better.[3][4] The scaling behavior of double descent has been found to follow a broken neural scaling law[5] functional form.

An example of the double descent phenomenon in a two-layer neural network: as the ratio of parameters to data points is increased, the test error first falls, then rises, then falls again.[1] The vertical line marks the boundary between the underparameterized regime (more data points than parameters) and the overparameterized regime (more parameters than data points).
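
The shape of this curve can be reproduced with a small numerical experiment. The sketch below is illustrative only and is not taken from the cited papers: it assumes NumPy, fits random Fourier feature models of increasing size to a fixed noisy training set by minimum-norm least squares, and reports the test error; the target function, noise level, and feature counts are arbitrary choices.

    # Illustrative sketch (assumed setup, not from the cited papers):
    # double descent in minimum-norm least-squares regression on
    # random Fourier features of increasing dimension.
    import numpy as np

    rng = np.random.default_rng(0)

    n_train, n_test = 40, 500                  # fixed training set size
    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)

    def target(x):
        return np.sin(2 * np.pi * x)

    y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
    y_test = target(x_test)

    def features(x, w, b):
        # Random Fourier features with fixed frequencies w and phases b
        return np.cos(np.outer(x, w) + b)

    for p in [5, 10, 20, 40, 80, 160, 640]:    # number of parameters
        w = rng.normal(0.0, 5.0, p)
        b = rng.uniform(0.0, 2 * np.pi, p)
        phi_train = features(x_train, w, b)
        phi_test = features(x_test, w, b)
        # Minimum-norm least-squares fit; when p > n_train this
        # interpolates the training data exactly.
        beta = np.linalg.pinv(phi_train) @ y_train
        test_mse = np.mean((phi_test @ beta - y_test) ** 2)
        print(f"p = {p:4d}   p/n = {p / n_train:5.2f}   test MSE = {test_mse:.3f}")

In typical runs the test error rises as the number of features approaches the number of training points and falls again well past that interpolation threshold, mirroring the curve described above.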

References

  1. Schaeffer, Rylan; Khona, Mikail; Robertson, Zachary; Boopathy, Akhilan; Pistunova, Kateryna; Rocks, Jason W.; Fiete, Ila Rani; Koyejo, Oluwasanmi (2023-03-24). "Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle". arXiv:2303.14151v1 [cs.LG].
  2. "Deep Double Descent". OpenAI. 2019-12-05. Retrieved 2022-08-12.
  3. evhub (2019-12-05). "Understanding "Deep Double Descent"". LessWrong.
  4. Belkin, Mikhail; Hsu, Daniel; Ma, Siyuan; Mandal, Soumik (2019-08-06). "Reconciling modern machine learning practice and the bias-variance trade-off". Proceedings of the National Academy of Sciences. 116 (32): 15849–15854. arXiv:1812.11118. doi:10.1073/pnas.1903070116. ISSN 0027-8424. PMC 6689936. PMID 31341078.
  5. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
