2012/08/09

XML and Other Complaints About Data Serialization Formats

I love JSON a lot more than XML.  That doesn't say much, considering I've never done a project using either of them for any data serialization ever.  But in the context of data interchange, XML is ubiquitous and oh so crappy.  And it sucks because of the redundancy, for machine and human readers alike.  Of course, JSON's website itself describes these problems as well as anyone, so read what smart people say rather than idiots like myself on why JSON > XML.

But even if it is better, that doesn't mean it is common, and whenever I look at or write C# documentation comments, having to write tags makes me want to punch babies.  Or when I just read HTML, it makes me mad knowing how much wasted bandwidth and effort goes into every </div> in every document on the internet, ever.  Because JSON gets right what very few do - a syntax that says what needs to be said and nothing more.

In JSON, "answer" : 42 is a key-value pair that is analogous to how you write a map in Javascript.  Since JSON is derived from Javascript's object literal syntax, this comes off as somewhat of a duh.  But the comparable key : value pair in XML needs to be something like <answer value="42" /> or <answer>42</answer>, and both use up a superfluous number of characters in a context where they are unnecessary.
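To put rough numbers on that, here is a minimal sketch using only Python's standard library.  The record and the variable names are mine, made up for illustration, and the exact byte counts depend on each serializer's defaults, so treat it as a ballpark rather than a benchmark:

```python
import json
import xml.etree.ElementTree as ET

# The same single key-value pair, serialized three ways.
record = {"answer": 42}

as_json = json.dumps(record)

# XML, element-text style: <answer>42</answer>
elem = ET.Element("answer")
elem.text = "42"
as_xml_text = ET.tostring(elem, encoding="unicode")

# XML, attribute style: an empty <answer> element carrying value="42"
attr = ET.Element("answer", {"value": "42"})
as_xml_attr = ET.tostring(attr, encoding="unicode")

for label, text in [("json", as_json),
                    ("xml text", as_xml_text),
                    ("xml attr", as_xml_attr)]:
    print(f"{label:9} {len(text):3} chars  {text}")
```

Even with the enclosing braces that a complete JSON document needs, the JSON form comes out shorter than either XML spelling, and the gap widens once the tag names get longer than the data.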

That is enough praise for JSON though; on to my issues with it, since I am ranting.  My largest gripe with the format is that by setting out to be noble and very JS compatible, it limits itself to the awful grammar of Javascript, which has been documented by much smarter people as sucking in many ways.  If it didn't suck so bad, things like CoffeeScript would never have arisen in the era of web development - but Javascript is not so named without cause, and Sun's influence on the language led it in many ways to be as "ugly" as it is today.

So my major qualm boils down to this: I want PYON instead of JSON.  I much prefer the syntax of:

collectionFoo:
   doo : 1,
   loo : 2,
   zoo : 3,

To the JSON equivalent:

collectionFoo {
   doo : 1,
   loo : 2,
   zoo : 3,
}


Yes, to save one character.  To use tab indentation to denote the end of a scope rather than curly braces.  Because we already use the tabs in the first place.  It drives me nuts writing code in anything but Python anymore, especially in Ruby, because all these other languages have constructs like braces or end to denote the end of the intended scope, when you already indent the content anyway!  Indenting is best practice and the most readable; it is why the K&R convention arose after the C book came out, because the collective developer community looked at code and said "You know what really makes this stuff easy to read?  Indentation!" so they indented.  And read with indents.  But kept the braces and other nuances because they make compilers easier to write.

And that is a valid argument, if you are writing some 2 week homework assignment compiler and don't want to handle the different conditions for whitespace.  But it takes one integer (or short, or byte - how far do you plan to indent the lines?) to keep count of leading whitespace on each parsed line, and that whitespace can and should denote the scope the line should be considered in.
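Here is roughly what I mean, as a minimal sketch in Python.  PYON is not a real format, so everything here is an assumption on my part: the function name parse_indented is made up, indentation is counted in spaces only, values are assumed to be integers, and a line ending in a bare colon opens a nested scope.  The one counter per line does the work, plus a small stack to remember which scopes are still open:

```python
def parse_indented(text):
    """Parse 'key:' and 'key : value,' lines into nested dicts by indentation."""
    root = {}
    # Stack of (indent_width, container); the root sits below any real indent.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no scope information
        # The one integer per line: how many leading spaces it has.
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().rstrip(",").partition(":")
        key, value = key.strip(), value.strip()
        # A line indented no deeper than an open scope closes that scope.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value == "":
            # 'key:' with nothing after it opens a nested scope.
            child = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = int(value)  # assumption: values are integers
    return root


sample = """collectionFoo:
   doo : 1,
   loo : 2,
   zoo : 3,
"""
print(parse_indented(sample))
# {'collectionFoo': {'doo': 1, 'loo': 2, 'zoo': 3}}
```

A real parser would also have to decide what to do about tabs mixed with spaces and about values that are not integers, which this sketch ignores.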

I also hold the conflicting viewpoint in favor of code minimization, and sympathize with those who just like things the way they are.  So, going against Python syntax, you could keep the {} syntax in place but not make it mandatory: in the absence of braces, default to whitespace significance, but if you have braces, whitespace is ignored.  That is the best of both worlds in my mind, and I feel the time cost is small, since a trivial parser can just do a switch on whether the next significant character after a : in a declaration is a {, a newline, an indentation, or some other non-whitespace character.  Or you can have it so that if the following character is { the block is whitespace ignorant, and if it is : the block is whitespace deterministic.  You can save the collective milliseconds of developers writing the code, and then of the dozens of times it will be read afterward, for the sake of clarity.
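A minimal sketch of that switch, again in Python and again under my own assumptions: parse_hybrid is a made-up name, it works line by line, a scope header ending in { is braced (with or without a colon before the brace), a header ending in a bare colon is indentation-scoped, a closing } sits on its own line, and values are integers:

```python
def parse_hybrid(text):
    """Parse blocks that are either brace-delimited or indentation-delimited."""
    root = {}
    # Each scope is (indent_width, container); indent_width is None for a
    # braced scope, where leading whitespace is ignored.
    stack = [(-1, root)]
    for raw in text.splitlines():
        line = raw.strip().rstrip(",")
        if not line:
            continue
        if line == "}":
            # Drop any indentation scopes opened inside the braces,
            # then close the braced scope itself.
            while len(stack) > 1 and stack[-1][0] is not None:
                stack.pop()
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # Only indentation-delimited scopes react to a dedent.
        while stack[-1][0] is not None and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        braced = line.endswith("{")
        if braced:
            line = line[:-1]
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if braced:
            child = {}
            parent[key] = child
            stack.append((None, child))    # whitespace ignorant
        elif value == "":
            child = {}
            parent[key] = child
            stack.append((indent, child))  # whitespace deterministic
        else:
            parent[key] = int(value)       # assumption: integer values
    return root


mixed = """braced : {
doo : 1,
      loo : 2,
}
indented:
   zoo : 3,
top : 4,
"""
print(parse_hybrid(mixed))
# {'braced': {'doo': 1, 'loo': 2}, 'indented': {'zoo': 3}, 'top': 4}
```

The whole decision lives in the one check on how a scope header ends, which is the point: supporting both styles costs a few extra lines in the parser, not a different parser.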
