Is there a way to run these Omni models on a Macbook quantized via GGUF or MLX? ... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		terhechte 69 days ago \| parent \| context \| favorite \| on: Qwen3-Omni-Flash-2025-12-01：a next-generation nati... Is there a way to run these Omni models on a Macbook quantized via GGUF or MLX? I know I can run it in LMStudio or Llama.cpp but they don't have streaming microphone support or streaming webcam support. Qwen usually provides example code in Python that requires Cuda and a non-quantized model. I wonder if there is by now a good open source project to support this use case?

tgtweak 69 days ago | [–]

You can probably follow the vLLM instructions for omni here, then use the included voice demo html to interface with it:

https://github.com/QwenLM/Qwen3-Omni#vllm-usage

https://github.com/QwenLM/Qwen3-Omni?tab=readme-ov-file#laun...

mobilio 69 days ago | [–]

Yes - there is a way: https://github.com/ggml-org/whisper.cpp

novaray 69 days ago | [–]

Whisper and Qwen Omni models have completely different architectures as far as I know

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact