Transformers for software engineers (nelhage.com)
175 points by datkinson on April 2, 2022 | hide | past | favorite | 20 comments


Thanks for this writeup! Software engineers should definitely learn about Transformers. They represent a significant step-change in deep learning and have proven useful in multiple domains outside of natural language, including computer vision and audio.

This opens up the world of multi-modal models, which represent concepts by combining multiple types of input. The most popular has been OpenAI's CLIP [0].

If anyone's interested, I run a podcast and did an episode on Transformer models and their implications [1]. Check it out!

[0] https://openai.com/blog/clip/ [1] https://www.youtube.com/watch?v=Kb0II5DuDE0


Just a fun technical note:

CLIP would be possible with any language and vision encoder.

It turns out that transformers were the most efficient encoder choice available at the time, but the approach would most likely have given interesting results with a ResNet and some sort of convolutional language encoder. In fact, the paper uses a ResNet as one model for the vision side.
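To make that concrete: at inference time, CLIP-style zero-shot classification doesn't care what architecture produced the embeddings, it's just a nearest-neighbor search in a shared embedding space. A minimal sketch (the vectors here are made-up toy values, not real CLIP embeddings):

```rust
// Zero-shot classification over a shared embedding space: pick the
// text embedding most similar (by cosine similarity) to the image
// embedding. The encoders that produced the vectors are interchangeable.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn normalize(v: &[f32]) -> Vec<f32> {
    let n = dot(v, v).sqrt();
    v.iter().map(|x| x / n).collect()
}

/// Return the index of the label embedding most similar to the image embedding.
fn zero_shot_classify(image: &[f32], labels: &[Vec<f32>]) -> usize {
    let img = normalize(image);
    labels
        .iter()
        .map(|l| dot(&img, &normalize(l)))
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    let image = vec![0.9, 0.1, 0.0];
    let labels = vec![
        vec![1.0, 0.0, 0.0], // e.g. "a photo of a dog"
        vec![0.0, 1.0, 0.0], // e.g. "a photo of a cat"
    ];
    println!("{}", zero_shot_classify(&image, &labels)); // prints 0
}
```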


Yeah, they trained multiple ResNet models. The ViTs (vision transformers) tended to outperform the ResNets, although they evaluate on a wide range of datasets and I believe the ResNet variants are better in some cases. They have also been used for visual salience tasks, as they may have a better representation of positional information in a scene.

I think they do use a modified ResNet with some form of attention in it, but I'm not clear on the specifics, and my understanding is that that's somewhat common now.


The Limitations section of [0] isn't very encouraging.

> While CLIP usually performs well on recognizing common objects, it struggles on more abstract or systematic tasks such as counting the number of objects in an image and on more complex tasks such as predicting how close the nearest car is in a photo. On these two datasets, zero-shot CLIP is only slightly better than random guessing. Zero-shot CLIP also struggles compared to task specific models on very fine-grained classification, such as telling the difference between car models, variants of aircraft, or flower species.

> CLIP also still has poor generalization to images not covered in its pre-training dataset. For instance, although CLIP learns a capable OCR system, when evaluated on handwritten digits from the MNIST dataset, zero-shot CLIP only achieves 88% accuracy, well below the 99.75% of humans on the dataset. Finally, we’ve observed that CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial and error “prompt engineering” to perform well.


This is a really accessible write-up.

Although I agree with the comment elsewhere asking for an "examples are in Rust" note somewhere, to save everyone who doesn't instantly recognize it the 15 seconds of squinting: "is this … C++, or …?"

Also funny that I just shared a Twitter thread on this topic with a coworker.

I was sharing this prof's POV of "wow, it's amazing to think all the hard-won gains I ever made with deep feature engineering and tuning are now useless!"

Sharing in case others find it helpful.

https://twitter.com/moyix/status/1469401502422818823?s=10&t=...

Which references this thread:

https://twitter.com/karpathy/status/1468370605229547522?s=10...


There are millions of "Transformers Explained" blog posts by now. The one I got the most out of is "Transformers from Scratch" by Peter Bloem:

http://peterbloem.nl/blog/transformers


I hoped this would be an explanation of electricity, magnetism, the Lorentz force and induction.


Principles probably 10x more useful than ML gobbledygook. We need more fundamental research.


Curious why you think this. Do you not believe there is some value in automation of certain tasks?


No, he doesn't believe it is useless; he said it was probably 0.1x as useful as something else, so unless he thinks that something else is useless, he considers this knowledge to still be useful.


Here is one that shows how to implement Transformers using Excel.

https://www.youtube.com/watch?v=S9eKuRVigjY


Thanks for sharing this!


It's funny how nobody outside expensive classes at Stanford says anything about the relationship of attention to a (somewhat fuzzy) lookup table. With that in mind, transformers suddenly become easy for software engineers.
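For anyone who hasn't seen that framing: a hash map returns the value for the one exactly-matching key, while attention returns a weighted blend of all values, weighted by how well each key matches the query. A minimal single-query sketch (all vectors are toy data):

```rust
// Attention as a "soft" lookup table: score the query against every key,
// softmax the scores into weights, and return the weighted sum of values.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::MIN, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Single-query attention: a fuzzy lookup over (key, value) pairs.
fn attention(query: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scores: Vec<f32> = keys.iter().map(|k| dot(query, k)).collect();
    let weights = softmax(&scores);
    let dim = values[0].len();
    let mut out = vec![0.0; dim];
    for (w, v) in weights.iter().zip(values) {
        for i in 0..dim {
            out[i] += w * v[i];
        }
    }
    out
}

fn main() {
    // The query matches the first key strongly, so the output is
    // almost exactly the first value -- a near-"hard" lookup.
    let out = attention(
        &[10.0, 0.0],
        &[vec![1.0, 0.0], vec![0.0, 1.0]],
        &[vec![1.0, 0.0], vec![0.0, 1.0]],
    );
    println!("{:?}", out);
}
```

Turn the query/key match into an exact one-hot and you recover an ordinary table lookup; everything in between is the "fuzzy" part.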


Very good content! It's interesting that you use Rust here.


(author here)

I actually did a v0 writeup in Go, but I wanted a language that had a bit more powerful type system and more support for a fluent/functional style in some of the expressions. I was optimizing for what I know and felt was highly expressive; I've since gotten a bunch of feedback from people who are interested in the content but not comfortable reading Rust, so perhaps it wasn't the best choice.


What does the `.0` calling convention mean in Rust? e.g. ``` for (i, r) in right.0.iter().enumerate() { out.0[i] += r; } ```

Aside from that, I thought it was fairly legible. Great write-up by the way. Squashing things into state helps get rid of some of the spookiness created by matrix multiplication and back-propagation. I also really appreciated seeing the explanation on the actual MLP part of the transformer as that is typically assumed to be prior knowledge in other tutorials.
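For readers coming from other tutorials: the MLP (feed-forward) block mentioned above is just two linear maps with a nonlinearity in between, applied to each position independently. A hedged sketch with toy weights (real transformers typically use GELU and learned parameters; ReLU is used here for simplicity):

```rust
// The transformer's per-position feed-forward block:
//   y = W2 * relu(W1 * x)   (biases omitted for brevity)

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn relu(x: f32) -> f32 {
    x.max(0.0)
}

/// Apply the two-layer MLP to a single position's vector.
/// w1 projects up to the hidden dimension, w2 projects back down.
fn mlp(x: &[f32], w1: &[Vec<f32>], w2: &[Vec<f32>]) -> Vec<f32> {
    let hidden: Vec<f32> = w1.iter().map(|row| relu(dot(row, x))).collect();
    w2.iter().map(|row| dot(row, &hidden)).collect()
}

fn main() {
    // 1-d input, 2-d hidden layer, 1-d output, with made-up weights.
    let out = mlp(&[1.0], &[vec![2.0], vec![-1.0]], &[vec![1.0, 1.0]]);
    println!("{:?}", out);
}
```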


It's to access the tuple struct's single field (the newtype pattern). Note that `#[repr(transparent)]` only affects memory layout; to avoid writing `.0` everywhere, the author could have implemented `Deref` or indexing on the wrapper instead.
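To illustrate (the `Logits` type here is a hypothetical stand-in, not one of the author's actual types): a tuple struct with one field is read with `.0`, and implementing `Deref` lets callers skip it.

```rust
use std::ops::Deref;

// Newtype wrapper around a vector, as in the article's Rust code.
struct Logits(Vec<f32>);

// Deref makes the wrapper transparently usable as its inner type.
impl Deref for Logits {
    type Target = Vec<f32>;
    fn deref(&self) -> &Vec<f32> {
        &self.0
    }
}

fn main() {
    let l = Logits(vec![1.0, 2.0]);
    // Explicit field access, as in the article:
    assert_eq!(l.0.len(), 2);
    // With Deref, the `.0` disappears:
    assert_eq!(l.len(), 2);
}
```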


You should mention that you're using Rust. I got distracted trying to figure out what language you were using.


Python would be awesome!


I think it’s better to use other languages for pseudocode. Rust is not the easiest language to read and understand for people who have never used it.



