
Sidenote: Although JSON is very common, I argue EDN is the best data format out there.



Curious: what are the primary advantages you see?

Not GP but I enjoyed reading through some details of EDN here, I hadn't studied it before: https://edn-format.dev/

Yeah, I looked through the GitHub. I've used Clojure before so it seems pretty easy to pick up.

I'm not yolkedgeek but I can give my own answer: EDN has tags. Tags start with `#` and are followed by a symbol (which is a lot like an identifier except that a lot of punctuation is allowed in symbols, because EDN derives from Lisp syntax rules). The `/` character is used for namespacing, and a user-defined tag must use a namespace. The tag meaning is application-defined, but there are a couple standard tags with well-defined meanings:

#inst "1985-04-12T23:20:50.52Z" = an instant / timestamp in RFC 3339 format

#uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6" = a GUID/UUID

More tags could be defined by the standard later, because the entire unprefixed namespace is reserved. But just having a well-defined way to represent timestamps and UUIDs is an immense win over JSON, where you have to somehow know (based on what you were expecting to receive) that this string should be parsed as a timestamp or a GUID.
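To make the JSON side of that concrete, here is a minimal Python sketch using only the standard library. The field names `created` and `id` and the helper `decode_record` are made up for illustration; the point is just that the parser hands back plain strings and the consumer has to know ahead of time which ones to convert.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical payload: the keys "created" and "id" are invented for this example.
payload = (
    '{"created": "1985-04-12T23:20:50.52Z",'
    ' "id": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}'
)

raw = json.loads(payload)
# After parsing, both values are ordinary strings; nothing marks them as special.


def decode_record(record):
    """Convert the fields we *expect* to be a timestamp and a UUID.

    This relies on out-of-band knowledge of the schema. The strptime pattern
    only covers the one UTC "Z" shape used above, not all of RFC 3339.
    """
    out = dict(record)
    out["created"] = datetime.strptime(
        record["created"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    out["id"] = uuid.UUID(record["id"])
    return out


decoded = decode_record(raw)
```

With EDN's `#inst` and `#uuid`, the equivalent of `decode_record` can live in the reader, because the data itself says which values are timestamps and UUIDs.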

Also, user-defined tags will often be used to represent a class:

#myapp/Person {:first "Fred" :last "Mertz"}

Again, no need to know (based on what you were expecting) that this particular object is an instance of Person; the data transfer format tells you what class it is. JSON has to add a field, and what field it is will vary from application to application so it's usually not possible to write a universal parser. One server might generate { "__type": "Person", "first": "Fred", "last": "Mertz" } while another one does { "$$class": "Person", "first": "Fred", "last": "Mertz" }, for example.

EDN also has syntax defined for sets, but that's a smaller win over JSON, because it's not often necessary to declare that something is a set. Still, there are times it's helpful; it's certainly not a bad thing to have a set syntax.

Also, EDN has comments built into the format. Two kinds: a line-based comment (useful for actual comments, e.g. when you use EDN as a config format), and one that comments out the next thing in the file (useful for temporarily commenting out an entire section with a single token, or for removing ONE item temporarily from a list that's all on the same line, where line-based comments are awkward). Because Douglas Crockford didn't envision JSON as being used for config, he forbade comments in JSON, and people have been coming up with competing proposals for putting comments back in ever since. (Thankfully, nearly all the proposals interoperate, because all of them sensibly use JavaScript comment syntax, so it doesn't matter if the file is JSONC or HuJSON or JSON5, the comment syntax is the same).

But the biggest win for EDN is tags, which can convey type information outside the data structure. JSON has to use something inside the data structure to convey type information, and there's always that small chance that the name chosen (__type or $$class or whatever) will collide with a property of the actual object that was supposed to be serialized.
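Here's a small Python sketch of that collision. `TYPE_KEY`, `encode_in_band` and `encode_out_of_band` are hypothetical names, not any real library's convention; the second function only mimics the shape of a tag (type information kept beside the data rather than inside it).

```python
import json

TYPE_KEY = "__type"  # hypothetical in-band convention


def encode_in_band(type_name, fields):
    """Serialize by injecting the type name into the object's own fields."""
    wrapped = dict(fields)
    wrapped[TYPE_KEY] = type_name  # silently overwrites any real "__type" field
    return json.dumps(wrapped)


def encode_out_of_band(tag, fields):
    """Serialize with the type kept outside the object, the way a tag would."""
    return json.dumps({"tag": tag, "value": fields})


# An object that legitimately has a property called "__type".
fields = {"name": "widget", "__type": "user-supplied value"}

in_band = json.loads(encode_in_band("Product", fields))
out_of_band = json.loads(encode_out_of_band("myapp/Product", fields))

# in_band has lost the user's "__type" value; out_of_band["value"] still has it.
```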


I get tags and atoms. It seems like the problem with class serialization is somewhat arbitrary though. It seems like both sides need the object schema ahead of time, in which case the schema can flag how it should be ID'd / tagged.

I also wonder if atoms can be reduced for low-bandwidth transmission. Naively, you could just prepend a lookup table for multiple-use atoms.
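A rough Python sketch of that naive idea, working on already-parsed data (dicts and lists) rather than on EDN text. Everything here is an assumption for illustration: `compress` and `expand` are made-up names, map keys stand in for atoms, and all keys are assumed to be strings so that integer indices can't be confused with real keys.

```python
from collections import Counter


def _count_keys(value, counts):
    """Walk nested dicts/lists and count how often each map key appears."""
    if isinstance(value, dict):
        for key, item in value.items():
            counts[key] += 1
            _count_keys(item, counts)
    elif isinstance(value, list):
        for item in value:
            _count_keys(item, counts)


def compress(value):
    """Replace keys used more than once with an index into a prepended table."""
    counts = Counter()
    _count_keys(value, counts)
    table = sorted(key for key, n in counts.items() if n > 1)
    index = {key: i for i, key in enumerate(table)}

    def rewrite(v):
        if isinstance(v, dict):
            return {index.get(k, k): rewrite(x) for k, x in v.items()}
        if isinstance(v, list):
            return [rewrite(x) for x in v]
        return v

    return {"table": table, "data": rewrite(value)}


def expand(packed):
    """Invert compress(): turn integer keys back into the original strings."""
    table = packed["table"]

    def restore(v):
        if isinstance(v, dict):
            return {
                (table[k] if isinstance(k, int) else k): restore(x)
                for k, x in v.items()
            }
        if isinstance(v, list):
            return [restore(x) for x in v]
        return v

    return restore(packed["data"])


people = [
    {"first": "Fred", "last": "Mertz"},
    {"first": "Ethel", "last": "Mertz", "nickname": "E"},
]
packed = compress(people)
```

In practice a general-purpose compressor on the wire would probably capture most of the same savings, so this is more a demonstration that the idea is workable than a recommendation.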

I guess it seems more like niche, additional features when GGP seemed to be claiming a big step up.


I haven't used EDN, but I know YAML has an equivalent feature, and that has been a security issue in some instances because it deserialized into objects the system wasn't expecting. Perhaps EDN's deserializers have learned from that and don't have that issue?

Haven't used EDN myself but from a read through the docs, I'm pretty sure that on user tags, the deserializer just says "Here's the tag, and here's the object it was tagging" and lets the consuming code decide what to do with the tag. (And on canonical tags like dates and GUIDs, there's no security risk to deserializing them as the recipient language's version of timestamps and UUIDs).

Actually, https://github.com/edn-format/edn says "It is envisioned that a reader implementation will allow clients to register handlers for specific tags. Upon encountering a tag, the reader will first read the next element (which may itself be or comprise other tagged elements), then pass the result to the corresponding handler for further interpretation, and the result of the handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged element yields one value. This value is the value to be returned to the program and is not further interpreted as edn data by the reader."

So if the client is specifying the handlers, then it's up to the client's handler implementation to sanitize the incoming data before instantiating the objects. And since the client supplies the list of handlers, the only tags that will be handled are ones the client was expecting. Assuming sanitizing the incoming data before instantiating objects is done correctly, I don't see any way for that to become a security issue.
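As a sketch of what that handler model could look like on the consuming side, in Python: this is not any real EDN library's API, just the dispatch step the spec describes, applied to a tag and an already-parsed value. `HANDLERS`, `apply_tag`, `TaggedValue` and `person_handler` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first: str
    last: str


@dataclass(frozen=True)
class TaggedValue:
    """Inert fallback for tags the client did not register a handler for."""
    tag: str
    value: object


def person_handler(value):
    """Validate the tagged element before instantiating anything."""
    if not isinstance(value, dict) or set(value) != {"first", "last"}:
        raise ValueError("unexpected shape for myapp/Person")
    if not all(isinstance(v, str) for v in value.values()):
        raise ValueError("myapp/Person fields must be strings")
    return Person(first=value["first"], last=value["last"])


# Only tags listed here ever result in application objects being built.
HANDLERS = {"myapp/Person": person_handler}


def apply_tag(tag, value, handlers=HANDLERS):
    """Dispatch a (tag, already-read element) pair to the client's handler."""
    handler = handlers.get(tag)
    if handler is None:
        # Unknown tag: keep it as plain data instead of constructing anything.
        return TaggedValue(tag, value)
    return handler(value)


fred = apply_tag("myapp/Person", {"first": "Fred", "last": "Mertz"})
unknown = apply_tag("other/Thing", {"cmd": "whatever"})
```

The difference from the YAML problems is that the data never chooses which constructor runs; an unregistered tag just comes back as a `TaggedValue` (a stricter client could raise instead).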



