Add Muon and Flash-Muon optimizers
Issue Details
Feature request
Muon: https://github.com/KellerJordan/Muon
Flash-Muon: https://github.com/nil0x9/flash-muon
Paper: https://arxiv.org/pdf/2502.16982
Motivation
Muon is an effective optimizer that can further accelerate LLM training. The authors of the paper linked above recently demonstrated its value in the LLMs they released.
Your contribution
I want to add this optimizer to the transformers library.