Machine Learning | Artificial Intelligence
LoveOxygenProducers
β’
1y ago
β’
95%
[R] Unraveling the Mysteries: Why is AdamW Often Superior to Adam+L2 in Practice?
Hello, ML enthusiasts! ππ€ We analyzed rotational equilibria in our latest work, ROTATIONAL EQUILIBRIUM: HOW WEIGHT DECAY BALANCES LEARNING ACROSS NEURAL NETWORKS
π‘ Our Findings: Balanced average rotational updates (effective learning rate) across all network components may play a key role in the effectiveness of AdamW.
π ROTATIONAL EQUILIBRIUM: HOW WEIGHT DECAY BALANCES LEARNING ACROSS NEURAL NETWORKS
Looking forward to hearing your thoughts! Letβs discuss more about this fascinating topic together!
Comments 3