
A long read. ¯\_(ツ)_/¯ Discussing #JSON, #XML, #Protobuf, #Thrift, #Avro, #MessagePack, #AMQP (-Encoding), and #CSV in the context of #Messaging; some encodings I discuss briefly (and I explain why), others in more depth. "Data Encodings and Layout"
Data Encodings and Layout
Structured data encoding options for Messaging
vasters.com
36 replies and sub-replies as of May 19 2018

Perfect length... great job. Over the years you have time and again written on topics I haven’t had time to research in depth. Thank you.
Good read, but it’s a shame CBOR isn’t mentioned. I’d recommend it well before BSON, imo
What is your opinion on the custom tagging feature of CBOR, Tim?
It’s a great extension point for future specs, though I probably wouldn’t let my team use it. Too much potential for incompatibility.
The text I picked up from my archive is a few years old; thank you for the feedback. I might well amend the post.
An update would be welcome. I've not seen this article before, so it is this evening's light reading.
Alright. I just went through the CBOR spec, and since it uses similar tricks to MsgPack and AMQP, I added it to the write-up; the guidance for MsgPack and CBOR is effectively the same.
The comparison in the CBOR spec also says that it’s mostly equivalent to MsgPack, but it expresses deep concern about MsgPack's lack of extensibility: tools.ietf.org/html/rfc7049#a…
I'm not all that sure that extensibility matters greatly.
Well, that was their opinion. If you’ve ever worked with dates in JSON, you might wish they had thought more about extending the spec. Things come up. Technology changes; goals become irrelevant and are replaced by others. CBOR does this via a very repetitive spec.
Everyone seems to have fixed the JSON number and date-time issues in the newer binary encodings. I doubt there's a wave of new primitive types coming that require special encoding.
That is my concern. Even with system engineering doing a fantastic job of keeping everyone honest I can see chaos, confusion & complexity sneaking into systems.
The work they’re doing on it already is interesting. Dates, GUIDs, typed arrays. I’m sure there’s room for maps with implicit or predictable keys. OTOH BSON is nearly always used for the wrong reasons and provides few benefits over text JSON
This is really good. Thanks!
Missing ASN.1 DER, which is clearly more important for the Internet than all of the above, and works more efficiently.
I see very little demand for ASN.1 DER in messaging/eventing outside of legacy integration scenarios.
And yet it still drives the PKI that underpins HTTPS etc., as well as Kerberos and LDAP. If you see little demand, that comes down to ignorance. As if XML or JSON actually had any technical merit: inefficient on the wire, inefficient to encode/decode, etc.
I’m aware of where it is in use. The examples you quote are obviously not unrelated and I get that it’s foundational for LDAP. The explicit subject of the post is application messaging payloads, and there, the world has largely moved on to modern encodings.
Sure, "modern" encodings that all waste resources. The world has regressed in this case.
This industry goes in circles and often arrives at the same place a second and third time, but then with different priorities. With the massively growing overall solution complexity devs need to deal with, simplicity and tooling alignment is a feature for all building blocks.
Thx for sharing. Just put this on my reading list for the weekend.
No love for ASN.1?
Sign of the times. Read thread branch bottom up. twitter.com/clemensv/statu…
I was only half-serious.
It’s a legitimate half-serious question, though.
Why didn't you go into detail on BSON?
I simply consider MessagePack the superior choice; I mention BSON’s odd approach to supporting arrays. The post isn’t meant to be an exhaustive survey of all formats, but aims to guide.
Thanks for making my wait at passport control Brussels more interesting :)
One thing I wonder about: Event Hubs Capture uses Avro, but you mention that for long-term storage it may not be ideal. Are there alternative capture formats coming for cold but replayable storage?
I think it’s a solid option. If you want to be super safe about being able to read the data in the distant future, plain text remains difficult to top as cold storage “insurance”.
We are converting Avro+JSON (from EventHub Capture) to Parquet using Spark; it's massively smaller and faster to query, and it also contains the schemas.
Hey! What about the ubiquitous ASN.1 DER encoding? 😃
I liked the part about external schemas and related challenges. I find this is something difficult to explain to non-technical people. Thanks for the article!