
Those are LLMs with an extra modality bolted to them.

Which is good - that it works this well speaks to the generality of autoregressive transformers, and the "reasoning over image data" progress in things like Qwen3-VL is very impressive. It's a good capability to have. But it's not a separate thing from the LLM breakthrough at all.
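For a rough picture of what "an extra modality bolted to" an LLM means, here is a minimal toy sketch (hypothetical names and dims, PyTorch, not any real model's architecture, and certainly not Qwen3-VL's): a vision encoder's patch features are projected into the LLM's token embedding space and simply placed in the same sequence as the text tokens, and the transformer predicts next tokens over the combined sequence as usual.

    # Toy sketch of the "extra modality bolted on" pattern.
    # Everything here is a stand-in; real VLMs differ in every detail.
    import torch
    import torch.nn as nn

    class ToyVLM(nn.Module):
        def __init__(self, vision_dim=64, llm_dim=128, vocab_size=1000):
            super().__init__()
            self.vision_encoder = nn.Linear(vision_dim, vision_dim)  # stand-in for a ViT
            self.projector = nn.Linear(vision_dim, llm_dim)          # the "bolt": maps image features into token space
            self.token_embed = nn.Embedding(vocab_size, llm_dim)
            self.backbone = nn.TransformerEncoder(                   # stand-in for a causal decoder-only LLM
                nn.TransformerEncoderLayer(llm_dim, nhead=4, batch_first=True),
                num_layers=2,
            )
            self.lm_head = nn.Linear(llm_dim, vocab_size)

        def forward(self, image_patches, text_ids):
            # image_patches: (B, num_patches, vision_dim); text_ids: (B, seq_len)
            vis = self.projector(self.vision_encoder(image_patches))  # (B, P, llm_dim)
            txt = self.token_embed(text_ids)                          # (B, T, llm_dim)
            seq = torch.cat([vis, txt], dim=1)        # image "tokens" live in the same sequence as text
            return self.lm_head(self.backbone(seq))   # next-token logits, same autoregressive setup as a plain LLM

    logits = ToyVLM()(torch.randn(1, 16, 64), torch.randint(0, 1000, (1, 8)))

The point of the sketch is that the autoregressive core is unchanged; the projector (plus multimodal training) is what grafts the new modality on.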

Even the more specialized real-time robotics AIs often have a bag of transformers backed by an actual LLM.




