Could these quantized models make MTP (Multi-Token Prediction) significantly fas... | Hacker News


somewhatrandom9 2 days ago | on: Gemma 4 QAT models: Optimizing compression for mob...

Could these quantized models make MTP (Multi-Token Prediction) significantly faster when used as drafters for larger regular Gemma 4 models?

dist-epoch 2 days ago

Google already released specialized drafters for Gemma 4.

Havoc 1 day ago

The E2B ones? Or what do you mean by specialized drafters?

int_19h 1 day ago

They have -assistant in the name, e.g. https://huggingface.co/google/gemma-4-31B-it-assistant

Havoc 1 day ago

Thanks

girvo 1 day ago

The “-assistant” models released by Google are specialised tiny MTP draft models :)

gemma-4-31B-it-assistant is what enables MTP.
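For anyone wondering why a small drafter speeds things up at all, here is a rough toy sketch of the general greedy draft-and-verify loop. It is not Gemma's actual MTP implementation and uses no real model: `target`, `draft`, and `speculative_decode` are hypothetical names, and both "models" are plain functions over integer tokens. The point is only that the output matches what the big model would produce alone, while the big model runs fewer passes.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft proposes k tokens; target keeps the longest prefix it agrees with.

    Returns the tokens and the number of verification rounds. In a real system
    each round is a single batched forward pass of the large model; here it is
    emulated with one target call per position.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap drafter guesses k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every guessed position.
        rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # All k guesses were right, so the target's next token comes for free.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new_tokens], rounds


# Toy models (assumptions for illustration): the target counts upward mod 10,
# and the drafter agrees except right after a 2.
def target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def draft(prefix: List[int]) -> int:
    return 9 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


baseline = greedy_decode(target, [0], 12)
spec_tokens, spec_rounds = speculative_decode(target, draft, [0], 12, k=4)
print(spec_tokens == baseline, spec_rounds)
```

The speedup depends on how often the drafter agrees with the target and how cheap the drafter is per token, which is why a drafter built for the job matters more than raw size.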
