Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
The recent success of Shampoo in the AlgoPerf contest has reignited interest in optimization algorithms for training neural networks, particularly those based on Kronecker factorization. Shampoo's performance is impressive, but it relies on heuristics such as learning rate grafting and stale preconditioning, which complicate the algorithm and add hyperparameters that must be tuned. The paper investigates these heuristics by decomposing Shampoo's preconditioner. Knowing why the heuristics are needed is a prerequisite for simplifying or replacing them, which matters to researchers and developers trying to make neural network training more efficient.
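To make the two heuristics concrete, here is a minimal NumPy sketch for a single weight matrix. It assumes the textbook Shampoo update, in which left and right Kronecker-factor statistics are accumulated and the gradient is preconditioned by their inverse fourth roots. It also assumes Adagrad as the grafting method, which is one common choice and not necessarily the one used in AlgoPerf. The names `ShampooSketch`, `matrix_inverse_root` and `precond_every` are hypothetical and illustrative only. This is not the contest implementation and not the paper's proposed method.

```python
import numpy as np


def matrix_inverse_root(a, p, eps=1e-6):
    """Return a^(-1/p) for a symmetric PSD matrix via eigendecomposition."""
    # Damping keeps eigenvalues strictly positive before taking the root.
    eigvals, eigvecs = np.linalg.eigh(a + eps * np.eye(a.shape[0]))
    eigvals = np.maximum(eigvals, eps)
    # V diag(lambda^(-1/p)) V^T
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


class ShampooSketch:
    """Hypothetical single-matrix Shampoo with Adagrad grafting and stale roots."""

    def __init__(self, shape, lr=0.1, precond_every=10, eps=1e-6):
        m, n = shape
        self.lr = lr
        self.precond_every = precond_every  # assumed name: root refresh interval
        self.eps = eps
        self.L = np.zeros((m, m))        # left factor statistics, sum of G G^T
        self.R = np.zeros((n, n))        # right factor statistics, sum of G^T G
        self.adagrad = np.zeros(shape)   # diagonal accumulator for grafting
        self.L_root = np.eye(m)          # cached, possibly stale, L^(-1/4)
        self.R_root = np.eye(n)          # cached, possibly stale, R^(-1/4)
        self.step_count = 0

    def step(self, w, g):
        # Statistics are accumulated on every step.
        self.L += g @ g.T
        self.R += g.T @ g
        self.adagrad += g * g
        # Stale preconditioning: the expensive inverse roots are
        # recomputed only every `precond_every` steps and reused otherwise.
        if self.step_count % self.precond_every == 0:
            self.L_root = matrix_inverse_root(self.L, 4, self.eps)
            self.R_root = matrix_inverse_root(self.R, 4, self.eps)
        self.step_count += 1

        shampoo_dir = self.L_root @ g @ self.R_root
        adagrad_dir = g / (np.sqrt(self.adagrad) + self.eps)
        # Grafting: keep Shampoo's direction but rescale it to the
        # Frobenius norm of the Adagrad update for this layer.
        shampoo_norm = np.linalg.norm(shampoo_dir)
        scale = np.linalg.norm(adagrad_dir) / shampoo_norm if shampoo_norm > 0 else 0.0
        return w - self.lr * scale * shampoo_dir


if __name__ == "__main__":
    # Toy problem: minimize 0.5 * ||W - T||_F^2, whose gradient is W - T.
    rng = np.random.default_rng(0)
    target = rng.standard_normal((4, 3))
    w = np.zeros((4, 3))
    opt = ShampooSketch(w.shape, lr=0.1, precond_every=10)
    for _ in range(100):
        w = opt.step(w, w - target)
    print("final loss:", 0.5 * np.sum((w - target) ** 2))
```

In the sketch, `precond_every` and the choice of grafting method are exactly the kind of extra hyperparameters the summary refers to. Setting `precond_every=1` removes staleness at the cost of an eigendecomposition on every step. Dropping the `scale` factor removes grafting, leaving the step size to the raw Shampoo direction.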
— Curated by the World Pulse Now AI Editorial System



