Hacker News

Could these quantized models make MTP (Multi-Token Prediction) significantly faster when used as drafters for larger regular Gemma 4 models?
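One way to reason about this question is the standard speculative-decoding cost model from Leviathan et al. (2023). Quantizing a drafter lowers its relative cost `c`, but if it also lowers the acceptance rate `alpha`, the net gain can shrink or vanish. This is a minimal sketch that assumes each drafted token is accepted independently with probability `alpha`. The `alpha` and `c` values in the usage lines are made up for illustration and were not measured on Gemma 4.

```python
def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target verification pass.

    Assumes each of the gamma drafted tokens is accepted independently
    with probability alpha (the simplification used by Leviathan et al.).
    """
    if alpha >= 1.0:
        # Every draft is accepted, plus the target's bonus token.
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime speedup over plain target-only decoding.

    c is the cost of one draft step relative to one target step, so a
    round costs gamma * c (drafting) + 1 (one target pass).
    """
    return expected_tokens_per_round(alpha, gamma) / (gamma * c + 1)


# Hypothetical numbers, not measurements:
print(f"{expected_speedup(0.80, 4, 0.050):.2f}")  # full-precision drafter: 2.80
print(f"{expected_speedup(0.75, 4, 0.025):.2f}")  # quantized, cheaper but less accurate: 2.77
```

With these made-up numbers, the quantized drafter comes out slightly slower despite being twice as cheap, because fewer of its tokens survive verification. If `alpha` stayed at 0.80, the cheaper drafter would come out ahead.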



Google already released specialized drafters for Gemma 4.

The E2B ones? Or what do you mean by specialized drafters?

They have "-assistant" in the name, e.g. https://huggingface.co/google/gemma-4-31B-it-assistant

Thanks

The “-assistant” models released by Google are specialised tiny MTP draft models :)
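For readers unfamiliar with how a draft model speeds things up, here is a toy sketch of generic greedy draft-and-verify (speculative) decoding. The two "models" are hypothetical stand-ins: plain functions that return the argmax next token. A real target verifies all drafted positions in one batched forward pass, and that is where the saving comes from. Whether Gemma 4's -assistant drafters use exactly this acceptance rule is an assumption; the thread does not confirm it.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, gamma: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding.

    Returns the generated sequence and the number of verification rounds.
    A real system would run each round as a single batched target pass.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        # 1. The cheap drafter proposes gamma tokens autoregressively.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(out + proposal))
        # 2. The target checks each proposed position (batched in practice).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token and end the round.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All gamma drafts accepted: the same target pass yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    # Trim any overshoot past max_new.
    return out[: len(prompt) + max_new], rounds


# Hypothetical toy models: the target follows x -> (3x + 1) mod 10,
# and the drafter agrees with it except right after token 7.
def toy_target(p: List[int]) -> int:
    return (p[-1] * 3 + 1) % 10


def toy_draft(p: List[int]) -> int:
    return 0 if p[-1] == 7 else (p[-1] * 3 + 1) % 10


seq, rounds = speculative_decode(toy_target, toy_draft, [1], max_new=20)
assert seq == greedy_decode(toy_target, [1], max_new=20)
print(rounds)  # 4 rounds for 20 tokens: the drafter always agrees here
seq, rounds = speculative_decode(toy_target, toy_draft, [2], max_new=10)
print(rounds)  # 5 rounds for 10 tokens: the drafter misses after every 7
```

Every kept token is one the target itself would have chosen, so the output matches plain greedy decoding. Only the number of target rounds changes, and that depends on how often the drafter agrees with the target.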

31B-it-assistant is what enables MTP.



