*I was shocked at the time that some people actually think Postel's Law is *wron...

Joeri · on April 3, 2011

The main problem with "pure" implementations is that they deny a core aspect of humanity: we make mistakes. The problem is not that parsers have to deal with invalid syntax (that's just a given); it's that they have a notion of invalid syntax at all. It's not that hard to design a spec in such a way that all input will be parsed in a predictable way, maximally extracting semantics. This is what I like about HTML5 parsing: it doesn't have a concept of unparseable input, yet all parsers can implement the same standardized parsing algorithm.
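To make "no concept of unparseable input" concrete, here is a rough sketch using Python's standard `html.parser`. Big caveat: that module is just a tolerant tokenizer, not an implementation of the HTML5 parsing algorithm, so it only illustrates the "never rejects input" half of the idea, not the standardized recovery half. The `TagLogger` class and `parse_leniently` function are names made up for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every event the tokenizer emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_leniently(markup):
    """Feed markup to the tolerant parser and return the recorded events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# The <b> element is never closed, but no exception is raised:
# the parser just reports what it saw and moves on.
for event in parse_leniently("<p>Hello <b>world</p>"):
    print(event)
```

Nothing here decides what the unclosed `<b>` should mean; that is exactly the part a spec with defined error recovery would have to pin down.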

thwarted · on April 3, 2011

they deny a core aspect of humanity: we make mistakes

Exactly. Postel's Law is meant to work around that. One might call it being robust (one might also call "accepting the input and doing something sane rather than trying to guess" robust). There are two holes in it, however: 1) it doesn't encourage people to actually fix their "mistakes", and 2) it encourages exploitation of those who are liberal with their input.

We must be liberal, but not necessarily too liberal, in what we accept. Postel's Law has specific applications. One shouldn't be liberal in their acceptance of tyrants, for example.

Joeri · on April 3, 2011

1. Why do people have to fix their mistakes if automation can solve the problem for them? If we can assume that mistakes will be made, and we can find an automated way to solve those mistakes, then why should we force humans to jump through hoops?

2. Why can't a parser be strictly standardized and liberal with its input at the same time? If the spec provides error recovery behavior, what is wrong with that?

My point is that there's no such thing as too liberal as long as all parsers implement the same exact kind of liberal parsing. Our low-level communication protocols have no concept of invalid input, they can recover from any random burst of garbage input, and we think this is normal. But then at a higher level of communication, like XML, suddenly error recovery is a bad thing? It makes no sense to me.

thwarted · on April 3, 2011

1. Why do people have to fix their mistakes if automation can solve the problem for them?

Postel's Law isn't about automation, it's about where to apply effort. Automatically fixing mistakes at the time of their creation would be great, but just like real life, there are a million ways something can be interpreted wrong after the fact (and thus the wrong "fix" applied) and only one way to interpret it right.

why should we force humans to jump through hoops?

Humans have to jump through hoops to create the robust error recovery. Rather than the effort being evenly distributed among all parties when everyone is conservative on both the production and acceptance side, the producers can be really lazy and the acceptors have to jump through hoops to accept all the lazy people's output. There is no automation here, someone has to write the code that liberally accepts things, which is often a hard task because of the many different ways things can be interpreted when they are not specific and explicit.

2. Why can't a parser be strictly standardized and liberal with its input at the same time? If the spec provides error recovery behavior, what is wrong with that?

A parser that is liberal with its input and provides robust error recovery begets tag soup. The only people who like tag soup are those who want to be lazy when producing it. It's more work, overall, to accept all input and try to figure out what was intended than it is to just say "I can't interpret this" and tell the generators that they need to be more conservative in what they generate (the other part of Postel's Law).

My point is that there's no such thing as too liberal as long as all parsers implement the same exact kind of liberal parsing.

It's not the liberal parsing that is necessarily the problem, it's the second order effect of liberal interpretation. If people can be liberal in their parsing, then they can be liberal in their interpretation, and if we accept that, we have to, as users, accept very little robust interoperability.

Our low-level communication protocols have no concept of invalid input, they can recover from any random burst of garbage input

"Recover"? If you don't do the TCP handshake in very specific ways, not only does no other server talk to you, but you may end up breaking some of the guarantees that TCP is supposed to provide. Random garbage that sets the RST bit in a TCP packet closes the connection, it doesn't "recover" from that.

Now, obviously, I'm not advocating that things should outright crash when given bad input: that's the worst. They should produce decent error messages as soon as possible so the producer can be more conservative in what they generate. Consider serving a web page as application/xhtml+xml, which Firefox (at least back when I was doing a lot of this) would refuse to render, telling you instead where the structure was wrong. By accepting any old ambiguous format, you'd never see this error and you wouldn't know that you weren't being conservative in your output. And since different browsers treated malformed content differently (either accepting it, or not accepting it, or trying to guess and often getting it wrong or different from what other browsers guessed), you end up with a mess where the "liberal" accepting side gets tagged as deficient if it doesn't jump through all the hoops thrown at it.
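A minimal sketch of that "fail early and say where" behavior, assuming Python's standard `xml.etree.ElementTree` as a stand-in for a strict XML parser. This is not what Firefox does internally; `check_well_formed` is a name made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return 'ok', or a message locating the first well-formedness error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return f"error at line {line}, column {column}: {err}"
    return "ok"


print(check_well_formed("<p>Hello <b>world</b></p>"))  # well-formed
print(check_well_formed("<p>Hello <b>world</p>"))      # <b> never closed
```

The second call doesn't guess what was meant; it reports the position of the mismatch so the producer can fix the output.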

tomjen3 · on April 3, 2011

The problem with Postel's Law isn't that we make mistakes; that's a given.

The problem is that it still works. Drive a car too fast around a corner and you are thrown off the road; write invalid XML and you get an error.

Joeri · on April 3, 2011

But if your car can correct your cornering for you, why shouldn't it? Should we disable all our car's electronics just so we can learn to make fewer mistakes the hard way?

InclinedPlane · on April 4, 2011

The problem with Postel's Law in terms of webdev is that there is no reference browser implementation. For the longest time end-user clients were the only tools web devs had. This led to such a huge disconnect between the theoretical standard and the practical standards that even when validators came around, the gap was so large that almost no one bothered with ensuring their HTML was valid.

What would have been helpful, and still would be, is the ability to toggle a browser into "strict" rendering mode during development testing.

prodigal_erik · on April 4, 2011

Every HTML version dating back to http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt had a DTD. Validation was available long before the W3C made theirs trivial to use; the problem was widespread ignorance on the part of authors of early web pages and especially tutorials.