
The pace of commoditization in image generation is wild. Every 3-4 months the SOTA shifts, and last quarter's breakthrough becomes a commodity API.

What's interesting is that the bottleneck is no longer the model — it's the person directing it. Knowing what to ask for and recognizing when the output is good enough matters more than which model you use. Same pattern we're seeing in code generation.




SOTA shifts, yes. But the average person doing the work has been very happy with SDXL-based models. And that was released two years ago.

The fight right now, outside of API SOTA, is over which model will replace SDXL as the “community preference.”

It’s now a three-way race between Flux2 Klein, Z-Image, and Qwen2.


There is a decent chance there will be no clear consensus... Maybe people making custom LoRAs etc. should publish for the three most common models. Or maybe the tooling will make switching models in a workflow painless, as has kind of happened with LLMs.

Could be. Which isn’t too bad a scenario if LoRA makers release for several models across the popular platforms at once.

This year I will have a direct need for diffusion models at work, so I’m keeping an eye on this for sure.


I'm happy the models are becoming a commodity, but we still have a long way to go.

I want the ability to lean into any image and tweak it like clay.

I've been building open source software to orchestrate the frontier editing models (skip to halfway down), but it would be nice if the models were built around the software manipulation workflows:

https://getartcraft.com/news/world-models-for-film


PLEASE STOP POSTING AI GENERATED COMMENTS


