GLOSSARY ENTRY (DERIVED FROM QUESTION BELOW)

16:05 Jul 22, 2020
Spanish to English translations [PRO]
Mathematics & Statistics
Selected response from: Francois Boye (United States), Local time: 07:57
Summary of answers provided

4   exponentially decaying average of past squared gradientss
4   an exponentially decreasing average of squared past gradients
exponentially decaying average of past squared gradientss

Explanation:
I've found this, though I have no idea what it means!

"Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients s like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients v, similar to momentum. Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface."
https://towardsdatascience.com/optimisation-algorithm-adapti...

Note added at 17 mins (2020-07-22 16:23:26 GMT):
Oops! 'Gradients' should only have one 's'.

Note added at 2 hrs (2020-07-22 18:59:14 GMT):
"The naive way to do the windowed accumulation of squared gradients is simply to accumulate the last w squared gradients. However, storing and updating the w previous squared gradients is not efficient, especially when the parameter to be updated is very large, which in deep learning could mean millions of parameters. Instead, the author of Adadelta implements the accumulation as an exponentially decaying average of the squared gradients, denoted 𝔼[g²]. This local accumulation at timestep t is computed by 𝔼[g²]_t = ρ·𝔼[g²]_{t−1} + (1 − ρ)·g²_t."
https://medium.com/konvergen/continuing-on-adaptive-method-a...

"4.6 Adam. Adaptive Moment Estimation (Adam) [10] is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum."
https://arxiv.org/pdf/1609.04747.pdf

"RMSprop. Root Mean Squared Propagation (RMSprop) is very close to Adagrad, except that it does not accumulate the sum of the gradients but an exponentially decaying average. This decaying average is realized by combining the Momentum algorithm and the Adagrad algorithm, with a new term."
https://mlfromscratch.com/optimizers-explained/#/

"Adam. Adam stands for Adaptive Moment Estimation. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum."
https://www.kaggle.com/residentmario/keras-optimizers
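To make the term concrete, here is a minimal NumPy sketch of the accumulation the quotes describe: an exponentially decaying average of past squared gradients, used RMSprop/Adadelta-style to scale the step size. The function and variable names (rmsprop_step, avg_sq_grad, rho) are illustrative, not taken from the cited sources.

import numpy as np

def rmsprop_step(param, grad, avg_sq_grad, lr=0.001, rho=0.9, eps=1e-8):
    # Exponentially decaying average of past squared gradients:
    #   E[g^2]_t = rho * E[g^2]_{t-1} + (1 - rho) * g_t^2
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Divide the step by the root of that average (the "RMS"), so
    # parameters with persistently large gradients take smaller steps.
    param = param - lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return param, avg_sq_grad

# Example: one update on a two-parameter vector.
param = np.array([1.0, -2.0])
avg_sq_grad = np.zeros_like(param)
grad = np.array([0.5, 0.1])
param, avg_sq_grad = rmsprop_step(param, grad, avg_sq_grad)

Unlike Adagrad's raw sum of squared gradients, the decaying average forgets old gradients at rate rho, which keeps the effective learning rate from shrinking toward zero over a long run.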
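A companion sketch of Adam's two decaying averages as described in the quotes above: m for past gradients (momentum-like) and v for past squared gradients, with the standard bias correction. Again an illustration, not the authors' code; the helper name adam_step is an assumption, while the default hyperparameters are the commonly cited ones.

import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: exponentially decaying average of past gradients (momentum-like).
    m = beta1 * m + (1.0 - beta1) * grad
    # v: exponentially decaying average of past squared gradients.
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction: both averages start at zero, so early estimates
    # are scaled up (t is the 1-based step count).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Example: first update step (t = 1).
param = np.array([1.0, -2.0])
m = np.zeros_like(param)
v = np.zeros_like(param)
param, m, v = adam_step(param, np.array([0.5, 0.1]), m, v, t=1)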
an exponentially decreasing average of squared past gradients
8 hrs   confidence: 4