Hacker News

Speaking of CLIP, I'm always troubled that the next CLIP might not get released, as both OpenAI and Google are shifting into competition mode. Sad to think there might be a more advanced version of CLIP already sitting in a secret vault somewhere.

Edit: I'm not referring to a CLIP-2 but any advance on the same level of importance as CLIP.



The biggest CLIP models we know of are open source.

If a company has a bigger CLIP model, they haven't even reported it.

Also, OpenAI for a while had a proprietary CLIP model that was bigger than any other available model: the CLIP-H used by DALL-E 2.


As someone who is out of the loop but could use high-quality image embeddings right now: what's the best CLIP model at the moment?


It really depends on what you're trying to achieve. If you want to build a semantic image search, then a small/base model would be fine; I think bigger models tend to leak too much information, which makes the embedding space too difficult to interpret with a simple algorithm like cosine similarity. If you want to condition a generative model, then a bigger model should provide more information about the prompt or the image.
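The semantic-search case can be sketched in a few lines: normalize the embeddings, then rank images by cosine similarity against a text query. A minimal NumPy sketch, assuming the vectors were already produced by a CLIP image/text encoder pair (the random arrays here are just stand-ins for real embeddings):

```python
import numpy as np

# Stand-ins for real data: in practice `image_embs` would come from a CLIP
# image encoder and `query_emb` from the matching text encoder.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(1000, 512)).astype(np.float32)  # one row per image
query_emb = rng.normal(size=512).astype(np.float32)           # one text query

# L2-normalize so a plain dot product equals cosine similarity.
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)
query_emb /= np.linalg.norm(query_emb)

# Similarity of the query against every image, then the top-5 image indices.
sims = image_embs @ query_emb
top5 = np.argsort(-sims)[:5]
print(top5, sims[top5])
```

The same ranking step works regardless of which CLIP checkpoint produced the embeddings, which is why a small/base model is often enough for search.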


SDXL uses OpenCLIP, plus OpenAI's CLIP basically as a backup to help it spell words properly, but I think you could replace the second one.


Stable Diffusion switched to OpenCLIP for Stable Diffusion 2, but it looks like they went back to OpenAI's CLIP for the XL version.

People complained about OpenCLIP not being as good. Hopefully we'll get a better, fully open CLIP model eventually.





