"Plaintext" is ASCII binary that is overwhelmingly English. The reason people li...

utborin · on Feb 25, 2021

Yes, exactly. I love binary protocols/formats. Plain text formats are wasteful, and difficult (or at least annoying) to implement with any consistency. But you really do need a translation layer to make binary formats reasonable to work with as a developer. There are very good reasons why we prefer to work with text: we have a text input device on our computers, and our brains are chock full of words with firmly associated meanings. We don't have a binary input device, nor do we come preloaded with associations between, say, the number 4 and the concepts of "end" and "headers." (0x4 is END_HEADERS in HTTP/2.)

Once you have the tools in place, working with binary formats is as easy as working with plaintext ones.

Of course making these tools takes work. Not much work, but work. And it's the kind of work most people are allergic to: up-front investment for long-term gains. With text you get the instant gratification of all your tools working out of the box.

I don't think I'd go so far as to say that plain text is junk food, but it's close. It definitely clogs arteries. :)

CivBase · on Feb 25, 2021

> "Plaintext" is ASCII binary that is overwhelmingly English.

I don't see any reason why "plaintext" must be limited to ASCII. Many "plaintext" protocols support Unicode, including the ones listed in this article. Some protocols use human language (as you said, overwhelmingly English), but many do not. There is nothing inherent about plaintext which necessitates the use of English.

> The reason people like plaintext is that we have the tooling to see bits of the protocol as it comes over the wire. If we had good tooling for other protocols, then the barrier to entry would be lower as well.

I disagree.

Humans have used text as the most ubiquitous protocol for storing and transferring arbitrary information since ancient times. Some other protocols have been developed for specific purposes (eg traffic symbols, hazard icons, charts, or whatever it is IKEA does in their assembly instructions), but none of have matched text in terms of accessibility or practicality for conveying arbitrary information.

I think your statement misrepresents the relationship between tool quality and the ubiquity of the protocol. Text has, throughout most of recorded human history, been the most useful and effective mechanism for transferring arbitrary information from one human to another. Text isn't so ubiquitous because our tooling for it is good; our tooling for text is good because it is so ubiquitous.

Text is accessible to anyone who can see and is supplemented by other protocols for those who can't (eg braille, spoken language, morse code). It is relatively compact and precise compared to other media like pictures, audio, or video. It is easily extended with additional glyphs and adapted for various languages. There's just nothing that holds a candle to text when it comes to encoding arbitrary information.

waynesonfire · on Feb 25, 2021

There is nothing inherently human readable about plain text. It's still unreadable bits, just like any other binary protocol. The benefits of plain text are the ubiquitous tools that allow us to interact with the format.

It would be interesting to think about what set of tools gives 80% of the plain text benefit. Is it cat? grep? wc? API?. Most programming languages I know of can read a text file and turn it into a string, that's nice. The benefit of this analysis would be that when developing a binary protocol, it will be evident the support tools that need to be developed to provide plenty of value.

I'm not afraid of binary protocols as long as there is tooling to interact with the data. And if those tools are available, I prefer binary protocols for it's efficiency.

CivBase · on Feb 25, 2021

> There is nothing inherently human readable about plain text. It's still unreadable bits, just like any other binary protocol. The benefits of plain text are the ubiquitous tools that allow us to interact with the format.

You seem to have glossed over my whole point about how the ubiquity of text is what drives good tooling for it, not the other way around. Text is not a technology created for computers. It has been a ubiquitous information protocol for millennia.

> I'm not afraid of binary protocols as long as there is tooling to interact with the data. And if those tools are available, I prefer binary protocols for it's efficiency.

I'm not afraid of binary protocols either and there are good reasons to use them. The most common reason is that they can be purpose-built to support much greater information density. However, purpose-built protocols require purpose-built tools and are, by their very nature, limited in application. Therefore, purpose-built protocols will never be as well supported as general-purpose protocols like text.

That isn't to say that purpose-built protocols are never supported well enough to be preferable over text. Images, audio, video, databases, programs, and many other types of information are usually stored in well-supported, purpose-built, binary protocols.

divbzero · on Feb 25, 2021

> I'm not afraid of binary protocols as long as there is tooling to interact with the data.

I agree with this premise but would also note how long it takes for such tooling to become widespread. Even UTF-8 took awhile to become universal — I recall fiddling with it on the command line as recently as Windows 7 (code page 1252 and the like).

kybernetikos · on Feb 25, 2021

> It would be interesting to think about what set of tools gives 80% of the plain text benefit.

My experience with binary protocols is that one of the first tools you write is one that converts it to a text format, and you then receive nearly 100% of the plain text benefit, as long as you can use that tool.

gambler · on Feb 25, 2021

ASCII has a built-in markup language and a processing control protocol that most people aren't even aware of and most tools out there don't support. This is significant. Look at the parts that are used and parts that aren't. What is the difference between them?

colejohnson66 · on Feb 25, 2021

I think the big reason the ASCII C0 characters never took off was because you can’t see or type them.[a] If I’m writing a spreadsheet by hand (like CSV/TSV), I have dedicated keys for the separators (comma and tab keys). I don’t have those for the C0 ones. I don’t even think there’s Alt-### codes for them.

[a]: Regarding “seeing” them, Notepad++ has a nifty feature where it’ll show the control characters’ names in a black box[0]

[0]: https://superuser.com/questions/942074/what-does-stx-soh-and...

rbanffy · on Feb 26, 2021

> Notepad++ has a nifty feature

Most physical terminals had the ability to show hex or control characters instead of/in addition to text.

specialist · on Feb 25, 2021

Heh. I used those control characters to embed a full text editor within AutoCAD on MS-DOS. Back in the day. Mostly because someone bet me it couldn't be done.

Robotbeat · on Feb 25, 2021

I don’t know. Can you tell me? ;)

tdeck · on Feb 25, 2021

I assume the parent is referring to the various control characters like "START OF HEADING", "START OF TEXT", "RECORD SEPARATOR", etc... I haven't seen most of these used for their original control purpose but they date back a long way:

https://ascii.cl/control-characters.htm

hakfoo · on Feb 26, 2021

I've seen them in some vendor-specific data formats in the financial space.

They seem to be from an era when the formatting models were either fixed-width fields, or a serial set of variable width fields delineated by FIELD SEPERATOR, GROUP SEPERATOR, etc.

What both models lacked was a good way to handle optional/sparse fields. If you have a data structure with 40 sub-fields, a JSON, XML or YAML notation can encode "Just subfield 26" pretty efficiently, but the FIELD SEPERATOR model usually involves having dozens of empty fields to space the one you want, and a lot of delicacy if the field layout changes.

TeMPOraL · on Feb 25, 2021

The bits that aren't used don't correspond to printable characters :).

colejohnson66 · on Feb 25, 2021

The “C0” block (U+0000 through U+001F) https://en.wikipedia.org/wiki/C0_and_C1_control_codes

They’re almost never used in practice however.

hamburglar · on Feb 25, 2021

I disagree that tooling makes up for the lack of human readability in a binary protocol. One of the reasons text-based protocols are so convenient to debug is that you can generally still read them when one side is screwing up the protocol. tcpdump: “oh, there’s my problem” Custom analyzer: “protocol error”

hermanradtke · on Feb 25, 2021

Pretend for a moment that HTTP used https://en.wikipedia.org/wiki/Esperanto instead of English. You would need tooling to translate Esperanto to English.

hamburglar · on Feb 25, 2021

Yes. Please feel free to assume that everywhere I say “plain text,” I mean “plain text that is not intentionally obfuscated.” I apologize for not being clear.

wtetzner · on Feb 25, 2021

Would that really cause a problem in determining if the text being sent is well-formed?

Having GET, POST, PUT, etc. and header names be in another language wouldn't prevent you from determining the well-formedness of the text.

andrewmcwatters · on Feb 25, 2021

I don't think we necessarily even need good tooling for other protocols, we just need good binary analysis tooling that visualizes any binary buffer.

I don't know of a single good app that exists for that.

nick__m · on Feb 25, 2021

Wireshark is somewhat usefull if the protocols in the binary blob are supported.

First, the buffer must be converted to ascii hex, then the following procedure is used to import it: https://www.wireshark.org/docs/wsug_html_chunked/ChIOImportS...

wtetzner · on Feb 25, 2021

That's because you need to know the format to know how to interpret it. Otherwise the best you can really do is use a hex editor.

Or are you suggesting a tool that lets you easily specify the binary format? I'm pretty sure there are some that exist.

andrewmcwatters · on Feb 25, 2021

Not really. But you can, for example, guess at it.

Lets say I have a struct I'm looking for, and I know it has a UTF-8 string and a length, presumably an unsigned int.

Using a hex editor alone to visualize blocks of those structs is painful. Enough that it's not pleasant to do. I can do it but, man in 2021, I really want a tool to help visualize that for me.

I can provide the app hints at what I'm looking for so it doesn't have to attempt to coalesce bytes into various data types. Some apps actually do this! But none of them very well or in an attractive way.

m463 · on Feb 26, 2021

Yet plaintext - again and again - stands the test of time.

I think the tooling analogy is like SLR cameras. I thought the camera was the important part of the equation, but it turns out the camera body is tossed out and replaced every couple of years. The lenses are the part that survive.

konjin · on Feb 25, 2021

They are self documenting. Looking at a binary format that starts with 41 41 41 41, is that a string, unsigned it, signed it, float, struct? Who knows?