Add muon and flash-muon optimizer

#39537

kadirnar opened 14 days ago

Feature request

Muon: https://github.com/KellerJordan/Muon
Flash-Muon: https://github.com/nil0x9/flash-muon
Paper: https://arxiv.org/pdf/2502.16982
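For reviewers unfamiliar with Muon, here is a minimal NumPy sketch of its core update. It keeps a momentum buffer for each 2-D weight matrix, approximately orthogonalizes the (Nesterov) momentum with a quintic Newton-Schulz iteration, and applies the result as the step. The coefficients (3.4445, -4.7750, 2.0315) and the five iterations follow the KellerJordan/Muon reference code. The function names, default hyperparameters, and the `sqrt(max(1, rows/cols))` scale factor are assumptions for illustration. This is not the PyTorch implementation that would be added to transformers.

```python
import numpy as np

# Quintic Newton-Schulz coefficients as used in the KellerJordan/Muon reference code.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Map G = U S V^T to roughly U S' V^T, with S' near 1 for non-tiny singular values."""
    a, b, c = NS_COEFFS
    transposed = g.shape[0] > g.shape[1]
    x = g.T if transposed else g  # iterate on the wide orientation (rows <= cols)
    # Frobenius normalization bounds every singular value by 1.
    x = x / (np.linalg.norm(x) + eps)
    for _ in range(steps):
        gram = x @ x.T
        poly = b * gram + c * (gram @ gram)
        x = a * x + poly @ x  # applies a*s + b*s^3 + c*s^5 to each singular value s
    return x.T if transposed else x


def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95,
              weight_decay=0.0, nesterov=True):
    """One Muon-style update of a 2-D weight matrix (modifies param and momentum_buf in place)."""
    # Momentum as an exponential moving average: buf <- beta*buf + (1-beta)*grad
    momentum_buf *= beta
    momentum_buf += (1 - beta) * grad
    # Nesterov variant: blend the fresh gradient with the updated buffer.
    update = (1 - beta) * grad + beta * momentum_buf if nesterov else momentum_buf
    ortho = newton_schulz_orthogonalize(update)
    # Assumed shape-dependent scale; implementations differ on this factor.
    ortho *= max(1.0, param.shape[0] / param.shape[1]) ** 0.5
    param *= 1 - lr * weight_decay  # decoupled weight decay
    param -= lr * ortho
```

Muon is meant for hidden 2-D weight matrices. The Muon README recommends a standard optimizer such as AdamW for embeddings, the output layer, and 1-D parameters like biases and gains, so an integration would likely need to split parameters into two groups.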

Motivation

Muon is an effective optimizer that can further accelerate LLM training. The team behind the linked paper has recently demonstrated its effectiveness in the LLMs they released.


Your contribution

I would like to add this optimizer to the transformers library.