
(Disclaimer: I work on interpretability at Anthropic.)

I wanted to flag that this is an accessible blog post and that there's a link to the paper ( https://transformer-circuits.pub/2025/introspection/index.ht... ) at the top. The paper explores this in more detail and with more rigor.


