A long read. ¯\_(ツ)_/¯
Discussing #JSON, #XML, #Protobuf, #Thrift, #Avro, #MessagePack, #AMQP (-Encoding), and #CSV in the context of #Messaging. Some encodings I cover only briefly, and I explain why; others I cover in more depth.
"Data Encodings and Layout"
Alright. I just went through the CBOR spec, and since it uses tricks similar to MsgPack and AMQP, I added it to the writeup; the guidance for MsgPack and CBOR is effectively the same.
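To make the "similar tricks" concrete, here is a minimal sketch that hand-encodes small non-negative integers and short strings the way MessagePack (positive fixint, fixstr) and CBOR (major types 0 and 3) do: the value or length is packed straight into the first byte. The function names `msgpack_small` and `cbor_small` are made up for illustration and cover only these two cases; this is not a real encoder.

```python
def msgpack_small(value):
    """Hand-encode a small int or short string in MessagePack's compact forms."""
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 0x7F:
        # positive fixint: the value itself is the single byte (0x00..0x7f)
        return bytes([value])
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 31:
            # fixstr: 0b101xxxxx, length in the low 5 bits
            return bytes([0xA0 | len(raw)]) + raw
    raise ValueError("outside the compact forms covered by this sketch")


def cbor_small(value):
    """Hand-encode a small int or short string in CBOR's compact forms."""
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 23:
        # major type 0 (unsigned int): values 0..23 live in the initial byte
        return bytes([value])
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) <= 23:
            # major type 3 (text string): 0b011xxxxx, length in the low 5 bits
            return bytes([0x60 | len(raw)]) + raw
    raise ValueError("outside the compact forms covered by this sketch")


print(msgpack_small(5).hex(), cbor_small(5).hex())
print(msgpack_small("hi").hex(), cbor_small("hi").hex())
```

Both formats spend one byte on a small integer and one byte of overhead on a short string; they differ mainly in which bits of that first byte carry the type tag.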
The comparison in the CBOR spec also says that it’s mostly equivalent to MsgPack, but it expresses deep concern about the lack of extensibility in MsgPack: tools.ietf.org/html/rfc7049#a…
Well, that was their opinion. If you’ve ever worked with dates in JSON you might wish they had thought more about extending the spec. Things come up. Technology changes, goals become irrelevant and are replaced by others. CBOR does this via a very repetitive spec.
Everyone seems to have fixed the JSON number and date-time issues in the newer binary encodings. I doubt there's a wave of new primitive types coming that require special encoding.
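For anyone who hasn't hit the JSON number and date-time issues, a small sketch using only Python's standard library. The `parse_int=float` argument is my way of imitating a consumer that reads every JSON number as an IEEE 754 double (as JavaScript does); the field names `id` and `at` are made up.

```python
import json
from datetime import datetime, timezone

# Numbers: JSON text can carry any integer, but a double-based reader
# cannot represent 2**53 + 1 exactly.
big = 2**53 + 1
text = json.dumps({"id": big})
exact = json.loads(text)["id"]                       # Python keeps the int exact
as_double = json.loads(text, parse_int=float)["id"]  # imitates a double-only reader

# Dates: JSON has no date-time type, so the encoder refuses a datetime
# and both sides must agree on a string convention such as ISO 8601.
stamp = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
try:
    json.dumps({"at": stamp})
    date_supported = True
except TypeError:
    date_supported = False

wire = json.dumps({"at": stamp.isoformat()})
back = datetime.fromisoformat(json.loads(wire)["at"])

print(exact, int(as_double), date_supported, back == stamp)
```

The round trip for the date only works because both ends share the ISO 8601 convention out of band; nothing in the JSON itself marks the string as a timestamp.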
That is my concern. Even with system engineering doing a fantastic job of keeping everyone honest I can see chaos, confusion & complexity sneaking into systems.
The work they’re doing on it already is interesting. Dates, GUIDs, typed arrays. I’m sure there’s room for maps with implicit or predictable keys. OTOH BSON is nearly always used for the wrong reasons and provides few benefits over text JSON
And yet it still drives the PKI that underpins https etc., as well as Kerberos and LDAP. If you see little demand, that comes down to ignorance. As if XML or JSON actually have any technical merit: inefficient on the wire, inefficient to encode/decode, etc.
I’m aware of where it is in use. The examples you quote are obviously not unrelated and I get that it’s foundational for LDAP. The explicit subject of the post is application messaging payloads, and there, the world has largely moved on to modern encodings.
This industry goes in circles and often arrives at the same place a second and third time, but then with different priorities. With the massively growing overall solution complexity devs need to deal with, simplicity and tooling alignment are features for all building blocks.
I simply consider MessagePack the superior choice; I mention BSON's odd approach to supporting arrays. The post isn’t meant to be an exhaustive survey of all formats, but aims to guide.
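A rough sketch of what I mean by the array approach, based on my reading of the BSON spec: an array is stored as an embedded document whose keys are the decimal indexes "0", "1", and so on, each written out as a C string. The helpers `bson_int32_array` and `msgpack_small_int_array` are hypothetical names, hand-rolled for this comparison, and only handle small integers.

```python
import struct


def bson_int32_array(values):
    """Encode the value bytes of a BSON array of int32s (an embedded document)."""
    body = b""
    for index, value in enumerate(values):
        key = str(index).encode("ascii") + b"\x00"     # index as a C-string key
        body += b"\x10" + key + struct.pack("<i", value)  # 0x10 = int32 element
    total = 4 + len(body) + 1                          # size prefix + body + terminator
    return struct.pack("<i", total) + body + b"\x00"


def msgpack_small_int_array(values):
    """Encode up to 15 ints in 0..127 as a MessagePack fixarray of positive fixints."""
    if len(values) > 15 or any(not 0 <= v <= 0x7F for v in values):
        raise ValueError("outside the compact forms covered by this sketch")
    return bytes([0x90 | len(values)]) + bytes(values)


print(len(bson_int32_array([1, 2])), len(msgpack_small_int_array([1, 2])))
```

For the two-element array, the BSON form spends bytes on a length prefix, a type tag and a key string per element, and a terminator, while the MessagePack form is one header byte plus one byte per value.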
One thing I wonder about: Event Hubs Capture uses Avro, but you mention that it may not be ideal for the long term. Are there alternatives coming for the capture format for cold but replayable storage?
I think it’s a solid option. If you want to be super safe about being able to read the data in the distant future, plain text remains difficult to top as cold storage “insurance”.
I liked the part about external schemas and related challenges. I find this is something difficult to explain to non-technical people. Thanks for the article!