
I spent way too many hours writing this all today, but I wanted to get this pushed out for others to learn from. There is a ton of detail in this notes file[0] that Claude Code helped me assemble.

If anybody has any suggestions or questions, shoot! It's 4am though so I'll be back in a bit. These CVEs are quite brutal.

0: https://github.com/freeqaz/react2shell/blob/master/EXPLOIT_N...


Unfortunately not. It's still very broken, and next year it will be worse for a ton of people. I got AI to write a short answer for you:

> Short version: Obamacare never turned into “free primary care for everyone,” it was just a bunch of rules and subsidies bolted onto the same old private-insurance maze. It helped at the margins (more people covered, protections for pre-existing conditions), but premiums/deductibles can still go nuclear if you’re in the wrong income bracket, state, or employer situation. From an EU/Poland perspective it’s not a public health system at all, just a slightly nerfed market where you still get to roll the dice every year.


There is also a tradeoff around vocabulary size (how many entries exist in the token -> embedding lookup table) that informs the current shape of tokenizers and LLMs. (Below is my semi-armchair stance, but you can read more in depth here[0][1].)

If you tokenized at the character level ('a' -> embedding) then your vocabulary would be small, but you'd need many more tokens to represent most content. (And attention cost scales non-linearly with context length, roughly n^2.) It would also be a bit more 'fuzzy' in terms of teaching the LLM what a specific token should 'mean'. The letter 'a' appears in a _lot_ of different words, so it's more ambiguous for the LLM.
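
A quick way to feel that tradeoff, sketched with tiktoken (the counts in the comments are approximate, so run it yourself):

    # Character-level 'tokenization' vs a real BPE tokenizer (cl100k_base).
    # Tiny vocab -> long sequences; ~100k vocab -> short sequences.
    import tiktoken

    text = "Tokenization trades vocabulary size against sequence length."

    enc = tiktoken.get_encoding("cl100k_base")
    print("BPE vocab size:  ", enc.n_vocab)            # ~100k entries
    print("BPE tokens:      ", len(enc.encode(text)))  # around ten
    print("Character tokens:", len(list(text)))        # one per character, ~60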

On the flip side: what if you had one entry in the tokenizer's vocabulary for every word that exists? Well, that would be far more than the ~100k entries used by popular LLMs, and it has computational costs: when you calculate the probability of the 'next' token via softmax, you have to score every entry in the vocabulary, and the embedding and output layers of the LLM grow along with it (more memory + compute for every generated token, basically).
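
To make that cost concrete, here's a toy numpy sketch (the dimensions are made up, not from any particular model): the output projection is hidden_dim x vocab_size, and the softmax has to normalize over every vocabulary entry for every generated token.

    # Toy illustration: output-layer parameter count and softmax work both
    # grow linearly with vocabulary size. All dimensions are invented.
    import numpy as np

    hidden_dim = 4096  # hypothetical model width
    for vocab_size in (256, 100_000, 1_000_000):
        unembed_params = hidden_dim * vocab_size   # output projection weights
        logits = np.random.randn(vocab_size)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                       # softmax over the whole vocab
        print(f"vocab={vocab_size:>9,}  unembedding params={unembed_params:>13,}")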

Additionally, you run into a new problem: 'rare tokens'. Basically, with a huge vocabulary, some tokens appear only a handful of times in the training data, and the model is never able to imbue them with enough meaning for them to _help_ during inference. (A specific example being somebody's username on the internet.)

Fun fact: These rare tokens, often called 'Glitch Tokens'[2], have been used for all sorts of shenanigans[3] as humans learn to break these models. (That's where my interest comes from, as somebody who works in AI security.)
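
If you want to poke at one yourself (assumes tiktoken; the 'single token' claim comes from the linked LessWrong write-up, so treat it as reported rather than something I've re-verified):

    # ' SolidGoldMagikarp' is reportedly a single, rarely-trained token in the
    # GPT-2-era vocabulary (r50k_base) but splits into ordinary sub-words in
    # newer vocabularies like cl100k_base.
    import tiktoken

    for name in ("r50k_base", "cl100k_base"):
        enc = tiktoken.get_encoding(name)
        ids = enc.encode(" SolidGoldMagikarp")
        print(name, ids, [enc.decode([i]) for i in ids])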

As LLMs have improved, models have pushed towards the largest vocabulary they can get away with without hurting performance. This is about where my knowledge on the subject ends, but there have been many analyses done to try to compute the optimal vocabulary size. (See the links below)

One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8) or directly against the final layers of state in a small LLM (trying to use a small LLM to 'grok' the meaning and hoist it into a more dense, almost compressed latent space that the large LLM can understand).

It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...
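
(For the curious, here's a rough sketch of what that local summarizer could look like as an MCP server. It uses the official MCP Python SDK's FastMCP helper, but summarize_with_local_model is a made-up stand-in for whatever small model you actually run locally.)

    # Hypothetical sketch only: an MCP tool that a local model could back.
    from pathlib import Path
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("local-code-summarizer")

    def summarize_with_local_model(text: str) -> str:
        # Stand-in: call your local model here (llama.cpp, Ollama, etc.).
        return text[:200]

    @mcp.tool()
    def vibe_of_files(root: str) -> dict[str, str]:
        """Return a short 'vibe' summary for each source file under root."""
        return {
            str(path): summarize_with_local_model(path.read_text(errors="ignore"))
            for path in Path(root).rglob("*.py")
        }

    if __name__ == "__main__":
        mcp.run()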

This immediately makes the model's inner state (even more) opaque to outside analysis though. It's like why using gRPC as the protocol for your JavaScript front-end sucks: humans can't debug it anymore without extra tooling. JSON is verbose as hell, but it's simple, and I can debug my REST API with just the network inspector; I don't need the underlying Protobuf files to understand what each byte means, the way I would with gRPC. That's a nice property to have when reviewing my ChatGPT logs too :P

Exciting times!

0: https://www.rohan-paul.com/p/tutorial-balancing-vocabulary-s...

1: https://arxiv.org/html/2407.13623v1

2: https://en.wikipedia.org/wiki/Glitch_token

3: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm...


Again, super interesting, thanks!

> One area that I have been spending a lot of time thinking about is what Tokenization looks like if we start trying to represent 'higher order' concepts without using human vocabulary for them. One example being: Tokenizing on LLVM bytecode (to represent code more 'densely' than UTF-8)

I've had similar ideas in the past. High level languages that humans write are designed for humans. What does an "LLM native" programming language look like? And, to your point about protobufs vs JSON, how does a human debug it when the LLM gets stuck?

> It would be cool if Claude Code, when it's talking to the big, non-local model, was able to make an MCP call to a model running on your laptop to say 'hey, go through all of the code and give me the general vibe of each file, then append those tokens to the conversation'. It'd be a lot fewer tokens than just directly uploading all of the code, and it _feels_ like it would be better than uploading chunks of code based on regex like it does today...

That's basically the strategy for Claude's new "Skills" feature, just in a more dynamic/AI-driven way. Claude will do semantic search through YAML frontmatter to determine which skill might be useful in a given context, then load that entire skill file into context to execute it. Your idea here is similar: use a small local model to summarize each file (basically dynamically generating that YAML frontmatter), feed those summaries into the larger model's context, and then it can choose which file(s) it cares about based on that.
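
Roughly that flow, sketched in Python (this is not Anthropic's actual implementation; the keyword-overlap scoring is a naive stand-in for real semantic search):

    # Scan skill files, read only their YAML frontmatter, pick a likely match,
    # then load the whole chosen file into context.
    from pathlib import Path

    def read_frontmatter(path: Path) -> str:
        # Return the YAML frontmatter between the leading '---' fences.
        text = path.read_text()
        return text.split("---", 2)[1] if text.startswith("---") else ""

    def pick_skill(task: str, skills_dir: str) -> str | None:
        best, best_score = None, 0
        for skill in Path(skills_dir).rglob("SKILL.md"):
            fm = read_frontmatter(skill).lower()
            score = sum(word in fm for word in task.lower().split())
            if score > best_score:
                best, best_score = skill, score
        return best.read_text() if best else None  # whole file goes to the model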


Since I'm 5+ years out from my NDA around this stuff, I'll give some high level details here.

Snapchat heavily used Google AppEngine to scale. This was basically a magical Java runtime that would 'hot path split' the monolithic service into lambda-like worker pools. Pretty crazy, but it worked well.

Snapchat leaned very heavily on this though and basically let Google build the tech that allowed them to scale up instead of dealing with that problem internally. At one point, Snap was >70% of all GCP usage. And this was almost all concentrated on ONE Java service. Nuts stuff.

Anyway, eventually Google was no longer happy with supporting this, and the corporate way of breaking up is "hey, we're gonna charge you 10x what you paid last year for this, kay?" (I don't know if it was actually 10x. It was just a LOT more.)

So began the migration towards Kubernetes and AWS EKS. Snap was one of the pilot customers for EKS before it was generally available, iirc. (I helped work on this migration in 2018/2019)

Now, 6+ years later, I don't think Snap heavily uses GCP for traffic unless they migrated back. And this outage basically confirms that :P


That's so interesting to me. I always assumed companies like Google, with "unlimited" dollars, would be happy to eat the cost to keep customers, especially given that GCP usage outside Google's internal services is way smaller compared to Azure and AWS. Also interesting to see that Snapchat had a hacky solution with AppEngine.


These are the best additional bits of information that I can find to share with you if you're curious to read more about Snap and what they did. (They were spending $400m per year on GCP which was famously disclosed in their S-1 when they IPO'd)

0: https://chrpopov.medium.com/scaling-cloud-infrastructure-5c6...

1: https://eng.snap.com/monolith-to-multicloud-microservices-sn...


The "unlimited dollars" come from somewhere after all.

GCP is behind in market share, but has the incredible cheat advantage of simply not being Amazon. Most retailers won't touch Amazon services with a ten-foot pole, so the choice is GCP or Azure. Azure is way more painful for FOSS stacks, so GCP has its own niche with only limited competition.


I’m not sure what you mean by Azure being more painful for FOSS stacks. That is not my experience. Could you elaborate?

However, I have seen many people flee from GCP because Google lacks customer focus, is quick to kill services, and seems not to care about external users, and because people plain don’t trust Google with their code, data, or reputation.


Customers would rather choose Azure. GCP has a bad rep, bad documentation, and bad support compared to AWS and Azure. And with Google killing off products, their trust is damaged.


GCP, as I understand it, is the e-commerce/retail choice for this reason; not being Amazon is the main draw.

Honestly, as a (very small) shareholder in Amazon, I think they should spin off AWS as a separate company. The Amazon brand is holding AWS back.


Absolutely! AWS is worth more as a separate company than being hobbled by the rest of Amazon. YouTube is the same.

Big monopolists do not unlock more stock market value; they hoard it and stifle it.


Google does not give even a singular fuck about keeping their customers. They will happily kill products that are actively in use and cheap to keep running, for... convenience? Streamlining? I don't know, but Google loves to do that.


The engineering manager that was leading the project got promoted and now no longer cares about it.


High-margin companies are always looking to cut the lower-margin parts of their business, regardless of whether those parts are profitable.

The general idea being that you're losing money due to opportunity cost.

Personally, I think you're better off not laying people off and having them keep working on the less (but still) profitable stuff. But I'm not in charge.


Use AeroSpace for window management. No animations. No disabling of security. It just works. https://github.com/nikitabobko/AeroSpace


I think the better comparison, for consumers, is: how fast is LPDDR5 compared to the normal DDR5 attached to your CPU?

Or, to be more specific: what's the speed when your GPU runs out of VRAM and has to read from main memory over the PCI-E bus?

PCI-E 5.0: 64GB/s @ 16x, or 32GB/s @ 8x
2x 48GB (96GB) of DDR5 in an AM5 rig: ~50GB/s

Compare that to the ~300GB/s+ possible with a card like this: it's a lot faster for large 'dense' models. Yes, an NVIDIA 3090 has ~900GB/s of bandwidth, but it only has 24GB, so a card like this Xe3P is still likely to 'win' because of the larger memory available.

Even if it's 1/3rd of the speed of an old NVIDIA card, it's still 6x+ the speed of what you can get in a desktop today.
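
Rough rule of thumb (my own back-of-envelope, not benchmarks): for a dense model, decode speed is roughly memory bandwidth divided by the bytes you have to stream per token, which is about the size of the weights. A toy calculation, assuming a hypothetical ~35GB of quantized weights:

    # Back-of-envelope decode speed for a dense model:
    #   tokens/sec ~= memory bandwidth / bytes streamed per token (~ weight size).
    # All numbers below are illustrative assumptions, not measurements.
    model_bytes = 35e9  # e.g. a ~70B-parameter model at ~4 bits/param

    for name, bandwidth in [
        ("PCI-E 5.0 x16", 64e9),
        ("Dual-channel DDR5 (AM5)", 50e9),
        ("~300GB/s card", 300e9),
        ("NVIDIA 3090 (~900GB/s)", 936e9),
    ]:
        print(f"{name:>24}: ~{bandwidth / model_bytes:5.1f} tokens/sec")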


This doesn’t matter at all if the resulting tokens/sec is still too slow for interactive use.


Recent reputation, yes. But their old reputation was very positive. They made cars that would survive in any condition (which is why they were popular for military uses).

These days, you're in one of two camps: Either you still believe (because you're ignorant or value the Jeep brand more than you value a reliable vehicle) or you've read the recent reviews and steer clear.

Jeep has been duking it out for the bottom of Consumer Reports ratings for a while now, yet they still seem to sell cars. As they continue to betray their loyal customer base though, I imagine this will change. I wish American car companies were better!


I think you’re conflating a few things. Jeeps, as manufactured during World War II, were produced by Ford and Willys. The Jeeps of today, manufactured by Stellantis, carry on the name (and arguably the general shape) but are completely different vehicles.

They “seem” to sell cars? Well, yes. The Wrangler and the Grand Cherokee are consistently near the top of the list of most popular SUVs, year after year.


The point of buying the brand is to conflate reputations.


It's slightly different here. They do seem to have bought a lot of the manufacturing - or at least the cars are still manufactured in the US? Maybe in ex-Chrysler factories?

China buying the MG brand was entirely just for reputation - no connection at all.


The older I get, the less I care to believe the memes that float around when everyone online memes about how horrible some product or brand supposedly is. In fact, the more prevalent the memeing is, the more I assume it's either manufactured or has just reached that critical level of virality where everyone repeats something simply because everyone else says it.

What percentage of people shitting on some brand actually have owned that brand for many years? And also owned other brands for many years, to be able to compare reliability and have any sort of informed opinion on the topic?

Things like Consumer Reports are just small surveys of the opinions of random members of the population about what they think of the brand; there's no connection to any objective reality about how reliable the vehicles actually are.

In the past I've tried to find a single study that actually compares objective reliability of brands. It does not exist. If you Google for it, everything you will find will eventually, at the bottom of it all, link back to the same Consumer Reports study.

I've owned a 2018 Wrangler for 6 years now and put 75k miles on it, many thousands of those in the most remote places in the country, where if it had issues it'd be a 30-mile hike to safety. It's never once let me down in any way. Never once had a major problem. That's all I care about.


Don't forget the third camp who just really like OLD jeeps!

Somewhere in the ballpark of a week ago there was a car show near where I walk my dog (some charity event). Overall not that interesting - there were a lot of flashy lowriders with the crazy hydraulics and stuff - but there was also this really cool Jeep truck-thing from sometime in the 1950s, a Jeep Forward Control[0]. They had pics of it when they first got it: an absolute rusty mess! But goddamn, I'm not even a car guy and I was impressed. Labor of love.

Then my cousin has a more modern Jeep and lemme tell you: not great. I wonder what happened to that company? Garden variety enshittification, or is there an interesting story there?

[0] https://en.wikipedia.org/wiki/Jeep_Forward_Control


Scale AI | Product Security Engineer | TypeScript, Node, Python, AWS | Full-time | Hybrid in San Francisco, CA or NYC or Remote

This is for a team that I'm working closely with and helping grow. We're hiring for somebody with a hybrid SWE and Security background to help scale up the team.

The team is currently one engineer who's got a ton on his plate, so the ideal person for this role is somebody who's interested in learning, digging in deep to fix issues (especially shipping PRs), and helping shape the future roadmap of the team. It is not an analyst role.

The role is primarily targeted at mid-career folks with a few years of experience (2+ years minimum). Those that are Senior/Staff level folks, especially those with a strong SWE background that are curious about security, are encouraged to apply if this role resonates strongly. (I personally transitioned from SWE -> ProdSec, years ago, and several other folks on the broader Security team have non-Security backgrounds too.)

Here is the role: https://scale.com/careers/4602047005

Feel free to apply there, email me your resume (address is on my profile), or add me on LinkedIn[0]. I'll try my best to answer questions and reply to everybody, but sometimes there is so much inbound that it isn't possible. Thanks!

0: https://www.linkedin.com/in/freewortley/


This is not remote.


That's not just them saving it locally to like `~/.claude/conversations`? Feels weird if all conversations are uploaded to the cloud + retained forever.


Ooo - good question. I'm unsure on this one.


The conversation is stored locally. If you run Claude on two computers and try /resume, it won’t find the other sessions.


They are downloaded from the cloud and just never deleted.


I was trying to understand what this project is. It's some sort of open firmware for Canon cameras that you put on the flash card (SD). The home page has info: https://www.magiclantern.fm/


Hi - I'm the current lead dev.

It's not firmware, which is a nice bonus: no risk of a bad ROM flash damaging your camera (only our software!).

We load as normal software from the SD card. The cam is running a variant of uITRON: https://en.wikipedia.org/wiki/ITRON_project


Yes, it's a truly noteworthy project. They exploited Canon cameras by first managing to blink the red charging LED. Then they used the LED blinks to transmit the firmware out. Then they built custom firmware which boots right from the SD card (thus no possibility of breaking the camera). The Magic Lantern firmware, for example, allows many basic cameras to do RAW 4K video recording (with unlimited length) - a feature which is not even in the high-end models. And it has many more features to tinker with.


There's a fun step you're missing - it's not firmware. We toggle on (presumably) engineering functionality already present in Canon code, which allows loading a file from the card as an ARM binary.

We're a normal program, running on their OS, DryOS, a variant of uITRON.

This has the benefit that we never flash the OS, removing a source of risk.

