exponencialmente decreciente de los gradientes cuadrados pasados

English translation: an exponentially decreasing average of squared past gradients

GLOSSARY ENTRY (DERIVED FROM QUESTION BELOW)
Spanish term or phrase: exponencialmente decreciente de los gradientes cuadrados pasados
English translation: an exponentially decreasing average of squared past gradients

16:05 Jul 22, 2020
    The asker opted for community grading. The question was closed on 2020-07-26 15:54:12 based on peer agreement (or, if there were too few peer comments, asker preference).


Spanish to English translations [PRO]
Mathematics & Statistics
Spanish term or phrase: exponencialmente decreciente de los gradientes cuadrados pasados
Context: También almacena un promedio exponencialmente decreciente de los gradientes cuadrados pasados similar a RMSprop. [English: It also stores an exponentially decreasing average of squared past gradients, similar to RMSprop.]
Robert Copeland
United States
Local time: 07:57
an exponentially decreasing average of squared past gradients
Explanation:
A decreasing average is an average that decreases over time; that decrease is exponential if it follows an exponential function.

https://en.wikipedia.org/wiki/Gradient


Selected response from:

Francois Boye
United States
Local time: 07:57
Grading comment
Selected automatically based on peer agreement.
4 KudoZ points were awarded for this answer



Summary of answers provided
4  exponentially decaying average of past squared gradientss
Helena Chavarria
4  an exponentially decreasing average of squared past gradients
Francois Boye


  

Answers


17 mins   confidence: Answerer confidence 4/5
exponentially decaying average of past squared gradientss


Explanation:
I've found this, though I have no idea what it means!

Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients s like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients v, similar to momentum. Whereas momentum can be seen as a ball running down a slope, Adam behaves like a heavy ball with friction, which thus prefers flat minima in the error surface.

https://towardsdatascience.com/optimisation-algorithm-adapti...
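
To make the quoted description concrete, here is a minimal sketch of those two running averages in plain Python. This is not taken from the article above; the function name, default values and toy example are illustrative assumptions.

    import math

    def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # exponentially decaying average of past gradients (the momentum-like term)
        m = beta1 * m + (1 - beta1) * grad
        # exponentially decaying average of past squared gradients (as in RMSprop/Adadelta)
        v = beta2 * v + (1 - beta2) * grad ** 2
        # bias correction, because m and v start at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

    # toy usage: minimise f(theta) = theta**2
    theta, m, v = 1.0, 0.0, 0.0
    for t in range(1, 101):
        theta, m, v = adam_step(theta, 2 * theta, m, v, t)

The point of the two beta factors is that each old gradient's weight shrinks geometrically at every step, which is exactly the "exponentially decaying/decreasing" behaviour being discussed.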

--------------------------------------------------
Note added at 17 mins (2020-07-22 16:23:26 GMT)
--------------------------------------------------

Oops! 'Gradients' should only have one 's'.

--------------------------------------------------
Note added at 2 hrs (2020-07-22 18:59:14 GMT)
--------------------------------------------------

The naive way to do the windowed accumulation of squared gradients is simply by accumulating the last w squared gradients. However, storing and updating the w previous squared gradients is not efficient, especially when the parameter to be updated is very large, which in deep learning could become millions of parameters. Instead, the author of Adadelta implements the accumulation as an exponentially decaying average of the squared gradients, which is denoted by 𝔼[g²]. This local accumulation at timestep 𝑡 is computed by

https://medium.com/konvergen/continuing-on-adaptive-method-a...
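
For reference, the recurrence that the quoted passage breaks off before is the standard running-average update (reconstructed here from the surrounding description, not copied from the article):

    𝔼[g²]_t = ρ · 𝔼[g²]_{t−1} + (1 − ρ) · g²_t

where ρ is the decay rate; this recurrence is what makes the accumulation an exponentially decaying average rather than a plain sum.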

4.6 Adam
Adaptive Moment Estimation (Adam) [10] is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum:

https://arxiv.org/pdf/1609.04747.pdf
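
The two running averages written v_t and m_t above are, in the usual Adam notation (stated here from the commonly published formulation rather than quoted verbatim from the PDF):

    m_t = β₁ · m_{t−1} + (1 − β₁) · g_t
    v_t = β₂ · v_{t−1} + (1 − β₂) · g_t²

Each update keeps a fraction of the previous average and mixes in the newest (squared) gradient, which is the "exponentially decaying average" the Spanish sentence describes.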

RMSprop
Root Mean Squared Propagation (RMSprop) is very close to Adagrad, except that it does not provide the sum of the gradients, but instead an exponentially decaying average. This decaying average is realized through combining the Momentum algorithm and Adagrad algorithm, with a new term.

https://mlfromscratch.com/optimizers-explained/#/
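
A minimal sketch of that decaying average as RMSprop applies it (the decay rate 0.9 and the other defaults are typical values, not taken from the linked post):

    import math

    def rmsprop_step(theta, grad, avg_sq, lr=0.01, gamma=0.9, eps=1e-8):
        # exponentially decaying average of squared gradients
        avg_sq = gamma * avg_sq + (1 - gamma) * grad ** 2
        # scale the step by the root of that running average
        return theta - lr * grad / math.sqrt(avg_sq + eps), avg_sq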

Adam
Adam stands for Adaptive Moment Estimation. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum.

https://www.kaggle.com/residentmario/keras-optimizers
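
In practice these optimizers are usually selected with a one-line constructor. A minimal Keras sketch (the tiny model is illustrative and not taken from the linked notebook):

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(1),
    ])
    # Adam keeps both decaying averages discussed above; RMSprop keeps only the squared-gradient one.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
        loss="mse",
    )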

Helena Chavarria
Spain
Local time: 12:57
Native speaker of: English
PRO pts in category: 12

Peer comments on this answer (and responses from the answerer)
agree  philgoddard: These terms may look difficult to a non-statistician, and I'm not one, but they're fairly easy to guess and Google.
7 mins
  -> Cheers, Phil :-)

disagree  Francois Boye: a gradient does not decay; instead it increases/decreases
1 hr
  -> As I have mentioned, I'm definitely no expert. I suggest you contact the authors of the papers/articles I've used to illustrate my answer. Thank you for your much-appreciated opinion.

8 hrs   confidence: Answerer confidence 4/5
an exponentially decreasing average of squared past gradients


Explanation:
A decreasing average is an average that decreases over time; that decrease is exponential if it follows an exponential function.

https://en.wikipedia.org/wiki/Gradient




Francois Boye
United States
Local time: 07:57
Specializes in field
Native speaker of: French
PRO pts in category: 72
Grading comment
Selected automatically based on peer agreement.





