
Why are there so many parsing-related exploits?


Because people implement parsers in languages that don’t allow direct expression of grammars (e.g. C). To safely implement parsers you must choose either algebraic datatypes or continuation passing, and a lot of programmers choose neither. CPS is annoying in most languages. ADTs are the obvious choice but somehow in 2021 most people are using languages that don’t have them. If you write a parser in Haskell, for example, you’d have to mess up pretty badly and write totally non-idiomatic code to write a parser that crashes at all, let alone crashes in a way that compromises memory safety.
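To make the point concrete, here's a minimal sketch of what "direct expression of a grammar" looks like with ADTs. `Expr` and `eval` are illustrative names, not from any real parser: each grammar production is a variant, and the compiler forces `eval` to handle every case, so there's no forgotten branch that silently reads bad memory.

```rust
// A tiny expression grammar expressed directly as an algebraic datatype.
// Every shape an expression can take is a variant; `match` below must be
// exhaustive, so there is no "unhandled case" path at runtime.
#[derive(Debug)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // (2 + 3) * 4
    let e = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)))),
        Box::new(Expr::Num(4)),
    );
    println!("{}", eval(&e)); // prints 20
}
```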


You can just use a handwritten parser in any memory-safe language.

However, I suspect parsers for big objects like images tend to have more vulnerabilities because developers try to avoid copying data, for performance reasons. Many memory-safe languages have their own performance issues, or make it hard to avoid copying bulk data, so those languages aren't a great fit.

This makes Rust a particularly good choice for writing parsers: Rust is pretty strong at supporting complex data sharing patterns while remaining memory-safe.
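A small sketch of what that data sharing looks like in practice (the `Record` type and "key=value" format are hypothetical): the parsed fields borrow slices of the input buffer instead of copying it, and the lifetime ties them to the buffer, so any use-after-free is a compile error rather than a vulnerability.

```rust
// Zero-copy parsing sketch: fields borrow directly from the input buffer.
// The lifetime 'a means a Record cannot outlive the buffer it points into.
struct Record<'a> {
    key: &'a str,
    value: &'a str,
}

// Hypothetical "key=value" line format; returns None on malformed input
// instead of crashing.
fn parse_record(line: &str) -> Option<Record<'_>> {
    let (key, value) = line.split_once('=')?;
    Some(Record { key, value })
}

fn main() {
    let input = String::from("color=blue");
    let rec = parse_record(&input).expect("well-formed input");
    println!("{} -> {}", rec.key, rec.value);
}
```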


> You can just use a handwritten parser in any memory-safe language.

It’s still probably going to be wrong for a complex grammar, even if it’s not going to literally crash. Bratus et al. did a survey of PDF parsers a few years ago and IIRC all the popular ones were wrong in ways that don’t necessarily correspond to memory errors (like infinite looping).

> This makes Rust a particularly good choice for writing parsers

In particular, Rust has ADTs, which is the main feature I advocate for this.


Even infinite looping is a much better failure mode than "villains seize control of your device".


If your goal is validation (i.e. checking that this is a JPG/PNG) and stripping of EXIF data, it is entirely possible to write your own parser in a managed and safe language in less than 500 lines of code without sacrificing any performance.

Don’t load them into memory, parse them as a stream byte-by-byte in accordance with the standard for the codec, check every offset before seeking, and reject images that don’t conform to the standard.

And of course, a ton of fuzzing to accompany it.
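As a toy illustration of the stream-based approach (assumptions: this only checks the JPEG SOI marker `0xFF 0xD8`; a real validator would walk every marker segment and bounds-check each declared length before seeking, per the spec):

```rust
use std::io::Read;

// Stream-style validation sketch: read a couple of bytes from any Read
// source and decide, without ever loading the whole file into memory.
fn starts_like_jpeg<R: Read>(mut src: R) -> std::io::Result<bool> {
    let mut magic = [0u8; 2];
    // read_exact fails cleanly on truncated input instead of reading junk
    match src.read_exact(&mut magic) {
        Ok(()) => Ok(magic == [0xFF, 0xD8]),
        Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() {
    let jpeg_like: &[u8] = &[0xFF, 0xD8, 0xFF, 0xE0];
    let png_like: &[u8] = &[0x89, b'P', b'N', b'G'];
    assert!(starts_like_jpeg(jpeg_like).unwrap());
    assert!(!starts_like_jpeg(png_like).unwrap());
}
```

Because the function is generic over `Read`, the same code works on a file handle, a network socket, or an in-memory slice, which is also what makes it easy to fuzz.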


The overhead of the stream abstraction is typically a lot greater than the cost of accessing a byte array.

Also, maybe I'm wrong, but when I read "image parsing" I think that actually means "image decoding".


The overhead of stream abstractions is negligible if your goal is security when processing arbitrary input files provided from a zero-trust environment.

In environments where you’re prioritizing performance, I’d still argue streams are likely your best bet when the size of the file to be parsed is not a constant. You wouldn’t want to load 50 large files into RAM in a server environment, let alone on a phone.

If your input buffer is a bunch of tiny 10 KB files and you trust them? Sure, load them into memory and access their indices on the stack. Make sure you reuse the buffer to avoid unnecessary allocations.

If you want parallel processing with zero-allocations then streams with an array pool for their backing buffer are the best bet.
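A sketch of the buffer-reuse idea (a single reused `Vec` standing in for the pooled case; an array pool generalizes this to one checked-out buffer per worker):

```rust
use std::io::Read;

// One allocation serves every source instead of one allocation per file.
// The checksum is just a stand-in for whatever per-byte work the parser does.
fn checksum_all<R: Read>(sources: Vec<R>) -> std::io::Result<u64> {
    let mut buf = vec![0u8; 8192]; // allocated once, reused for every source
    let mut sum: u64 = 0;
    for mut src in sources {
        loop {
            let n = src.read(&mut buf)?;
            if n == 0 {
                break;
            }
            for &b in &buf[..n] {
                sum = sum.wrapping_add(b as u64);
            }
        }
    }
    Ok(sum)
}

fn main() {
    let files = vec![&b"abc"[..], &b"de"[..]];
    // byte values: 97 + 98 + 99 + 100 + 101 = 495
    assert_eq!(checksum_all(files).unwrap(), 495);
}
```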

Not loading arbitrary files into memory will always be safer than doing so.

As for decoding - I believe the functions for validating if an array of bytes is an image should be far removed from the decoding and presentation of those bytes to the frame buffer. You don’t need to decode a JPG to validate that a file is a JPG. It either conforms to the standard or it doesn’t; the pixel data is irrelevant.


> if your goal is security

The goal is never just security.

E.g. for a Web browser like Firefox, the priority has to be matching or beating the competition on speed, THEN being secure. That's just the reality of what users care about. If the goal were just security, we'd all have been using HotJava for the last 24 years.

The goal for Rust was performance plus safety. That's pretty hard to pull off.

> You wouldn’t want to load 50 large files into ram on a server environment let alone a phone.

mmap() works pretty well here.

> As for decoding - I believe the functions for validating if an array of bytes is an image should be far removed from the decoding and presentation of those bytes to the frame buffer. You don’t need to decode a JPG to validate that a file is a JPG. It either conforms to the standard or it doesn’t; the pixel data is irrelevant.

Yeah but in a browser for example you never want to just "validate" an image file, you want to decode it, and separating validation from decoding is just asking for trouble. That is the meaning of "parse, don't validate".


>ADTs are the obvious choice but somehow in 2021 most people are using languages that don’t have them.

Isn't using inheritance to create a hierarchy enough? Why not?

       Node 
   SubNode1    SubNode2
 SubNode1.1      SubNode2.1


Using inheritance in this way is a hack to emulate some of the functionality of ADTs. Grammars are perhaps one of the most salient examples: the various constructors in your type might have no behaviors in common, so adherence to a shared interface is nothing but a vague indication that these types are somehow related. Sealed classes let you recover a little bit more of the functionality.
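To illustrate with a sketch (a cut-down JSON-value type, chosen only as an example): the variants share no common behavior, so a base class would be a nearly empty marker, but as an ADT the `match` must handle every case, and adding a variant is a compile error at every match site rather than a silently unhandled subclass.

```rust
// Variants with nothing behaviorally in common, tied together as one type.
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    Text(String),
}

fn describe(v: &JsonValue) -> String {
    // Exhaustive: deleting or adding a variant breaks the build here.
    match v {
        JsonValue::Null => "null".to_string(),
        JsonValue::Bool(b) => format!("bool {}", b),
        JsonValue::Number(n) => format!("number {}", n),
        JsonValue::Text(s) => format!("string {:?}", s),
    }
}

fn main() {
    println!("{}", describe(&JsonValue::Bool(true)));
}
```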


Well, but from a security standpoint it should be fine?


Because writing a good parser that is fast, secure, and works on a whole bunch of crappy/broken/non-compliant formats is very hard.

Sure, it's easy to write a fast, standards-compliant parser (assuming it's not a format like .psd or a Word document), but it will choke on the multitude of dodgy versions of files out there, causing complaints.

_Most_ users only care about the picture/sound/video/whatever displaying correctly, with security a distant second (they simply expect it to be secure), so the pressure is on to make the parser work with everything.


I get why product people will noisily demand this, but perhaps the idea of parsing even broken files successfully is a large part of the problem. HTML started this way, and it's still terrible.

At an ancient job, we did a lot of scraping content from HTML as the de facto input source. Each content provider had to have custom code written, and it was as bad as you would expect. XML came about, and its biggest advantage was that invalid input was simply invalid, and had to be rejected as such.


My guess: because most parsing uses the stack a lot, and the parsed language often allows arbitrary length inputs, both of which are connected to overflow problems, which in turn can often be exploited.
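A sketch of the stack-depth problem and the usual fix (the `MAX_DEPTH` cap and the parenthesis grammar are illustrative, not from any spec): a recursive-descent parser recurses once per nesting level, so `(((((…` of arbitrary length can overflow the stack unless the depth is bounded explicitly, turning the overflow into an ordinary reportable error.

```rust
const MAX_DEPTH: usize = 64; // illustrative cap, not from any standard

// Recursive-descent parse of nested parentheses; returns the position
// just past the balanced prefix, or an error.
fn parse_nesting(bytes: &[u8], pos: usize, depth: usize) -> Result<usize, &'static str> {
    if depth > MAX_DEPTH {
        return Err("nesting too deep"); // fail cleanly instead of blowing the stack
    }
    match bytes.get(pos) {
        Some(b'(') => {
            let after_inner = parse_nesting(bytes, pos + 1, depth + 1)?;
            if bytes.get(after_inner) == Some(&b')') {
                Ok(after_inner + 1)
            } else {
                Err("unbalanced parentheses")
            }
        }
        _ => Ok(pos), // empty body is fine
    }
}

fn main() {
    assert!(parse_nesting(b"((()))", 0, 0).is_ok());
    // 10,000 levels of nesting: rejected by the depth check, no stack overflow
    let deep: Vec<u8> = std::iter::repeat(b'(').take(10_000).collect();
    assert!(parse_nesting(&deep, 0, 0).is_err());
}
```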


It's more often use-after-free or heap buffer overrun bugs, these days.


Are these parsing exploits connected to JPEGs? I don't know for sure, but I can understand how this could be a can of worms.

The JPEG core format is complicated, but JPEGs in circulation today are simply nightmarish. They can include XMP, EXIF, and ICC data, plus a good dozen other extensions that may actually affect how the JPEG is displayed. For example, knowing whether a JPEG contains CMYK data depends on an Adobe extension which is about twenty years old. These extensions are in use today in images published on the web. So, just to display an image, one needs a parser for the image format, a parser for ICC, a parser for XMP, a parser for that obscure Adobe extension from twenty years ago, and so on. Often, each of them has its own library and represents decades of developer time. It's a lot of space for bugs and exploits.


> Why are there so many parsing-related exploits?

Broadly, because by necessity you're dealing with untrusted inputs in a relatively complex format. The complexity leaves room for bugs in how you handle the input, and the user-supplied data provides an easy way to feed malicious inputs in.


My thinking is this:

1. Exploitation inherently relies on malicious input data.

2. In computer systems, any input data (especially in human-facing systems) is not logically useful to the software until parsed.

3. Thus, a parser is the first software element which systematically interacts with the input data. It is the prime exploitation target.

That, + parsing complex data is just kind of hard to get right. If iMessage was UTF-8 only, this would not be an issue I'm sure.



