thought loops, need temperature adjustment for chat models

#4321

Issue Details

bug-unconfirmed

JasonW25 opened 20 days ago
Using version V0.30.0.

I am primarily running Qwen3 at various parameter sizes (4B, 8B, 14B, and 32B), including 4-bit quantized versions. The problem is very bad with Polaris-4B-Preview at q8.

Hardware is a 5060 Ti and a 3060. I swap between running the models on a single GPU and split across both, depending on what is training on my cards. I don't think the hardware is the issue, though.

It seems like the model temperature needs to be higher, and there doesn't appear to be any way to change it unless I fork the project.
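To illustrate the kind of knob I mean: a minimal sketch, assuming an OpenAI-style chat completions request body (the field names and model identifier here are my assumptions, not necessarily this project's actual API). The values are the samplers I'd want to pass, based on what I believe the Qwen3 model card recommends for thinking mode (temperature 0.6, top_p 0.95, top_k 20):

```python
import json

# Sketch of the request body I would want to be able to send.
# Field names follow the OpenAI chat completions convention; whether
# this project accepts them is an assumption on my part.
payload = {
    "model": "qwen3-32b",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "hello"}],
    # Sampler settings I believe Qwen recommends for Qwen3 thinking
    # mode; a nonzero presence_penalty is one common way to discourage
    # the |||| repetition loops described below.
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "presence_penalty": 1.0,
}

print(json.dumps(payload, indent=2))
```

Even just exposing `temperature` in the config would help; the rest could stay at defaults.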

The lower-parameter models seem to be a little worse, but I still have issues with qwen3-32B.

Polaris forms a thought loop of roughly 500 tokens almost every time I have tried it.

Qwen3's loops are typically much smaller. I have had it get stuck spamming `||||||||||||||||` and `...`. It seems to happen more often on the smaller Qwen models, but I have also had it happen on qwen3-32B.