Tech rant


I feel a rant coming on.

On XML

Let's start by saying that XML-lovers need their tongues stapled to their kneecaps. There, I've said it. I feel able to say such a thing partly because I believe there's an element of truth in it, and partly because I know that no-one reads the crap that I write here anyway (I've looked at the logs - very depressing. Very depressing indeed...)

The problem with XML, see, is that it's text based. That makes it possible, proponents of the beast would argue, to edit XML documents in a handy, regular old text editor like the one I'm using right now to type this document (It's called Joe, if you're interested).

But the thing is, for any complicated recursive structure a plain old text editor just isn't going to cut it. Try and hand-edit some XML such as that produced by a word processor - you'll quickly get lost in the myriad of angle brackets, the likes of which would get (should they attempt to comprehend them) even a perl programmer dribbling and singing "Humpty Dumpty" while bending over their desk with a carrot stuck up their arse.

Of course perl programmers probably do that on a fairly regular basis anyway, but that's surely the topic for a future rant.

Don't get me wrong: XML does have its uses. It's fairly good for simple configuration files. It's (barely) suitable for Ant build scripts. But beyond these uses, it's a bloated mess that should not be considered. It wastes storage space and eats network bandwidth. It's also inefficient to process: to extract data from a particular element you need to parse the whole file up to that point. And apart from the alleged ability to edit with a text editor, it gives you nothing that a binary format could not.

For what is an XML document but a tree? The nodes are elements and the branches are the text and the contained elements. Why can't we have a simple binary format, structured as a tree, each node tagged with a type and delimited, with another document describing the structure (i.e. the equivalent of a DTD)? It could potentially solve all the efficiency problems of XML while still - and this is the good part - allowing a standard editor (not a text editor, but an editor designed specifically for this new binary tree structure format) to be used to edit all such files.

Portability of such a binary format need not be a serious issue. Any node which contains a number, for example, is simply tagged as such, and the tag can also indicate the number's size (in bytes) and byte order. Floating-point numbers can likewise be tagged as being in one of several different binary formats. You could even still store numbers as text if you wanted to.
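To make the idea concrete, here is a minimal sketch of what such a tagged, delimited tree could look like. Everything specific in it is an assumption made for illustration: the tag values, the 1-byte tag plus 4-byte length header, and the function names are invented here and do not come from any existing format.

```python
import struct

# Hypothetical type tags; no concrete values are defined anywhere, these are made up.
TAG_INT_LE = 1    # signed integer, little-endian
TAG_INT_BE = 2    # signed integer, big-endian
TAG_TEXT = 3      # UTF-8 text
TAG_ELEMENT = 4   # container whose payload is a sequence of child nodes

HEADER = ">BI"    # 1-byte tag followed by a 4-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)


def encode_node(tag, payload):
    """Return one node: header (tag and payload length) followed by the payload."""
    return struct.pack(HEADER, tag, len(payload)) + payload


def encode_int(value, size=4, byteorder="little"):
    """Encode an integer; the tag records the byte order, the length records the size."""
    tag = TAG_INT_LE if byteorder == "little" else TAG_INT_BE
    return encode_node(tag, value.to_bytes(size, byteorder, signed=True))


def encode_text(text):
    return encode_node(TAG_TEXT, text.encode("utf-8"))


def encode_element(children):
    """Encode a container node from already-encoded child nodes."""
    return encode_node(TAG_ELEMENT, b"".join(children))


def decode_node(data, offset=0):
    """Decode the node at offset; return (value, offset just past the node)."""
    tag, length = struct.unpack_from(HEADER, data, offset)
    start = offset + HEADER_SIZE
    end = start + length
    payload = data[start:end]
    if tag == TAG_INT_LE:
        return int.from_bytes(payload, "little", signed=True), end
    if tag == TAG_INT_BE:
        return int.from_bytes(payload, "big", signed=True), end
    if tag == TAG_TEXT:
        return payload.decode("utf-8"), end
    if tag == TAG_ELEMENT:
        children = []
        pos = start
        while pos < end:
            child, pos = decode_node(data, pos)
            children.append(child)
        return children, end
    raise ValueError(f"unknown tag {tag}")


doc = encode_element([
    encode_text("width"),
    encode_int(640, size=2, byteorder="big"),
    encode_element([encode_int(-1)]),
])
value, end = decode_node(doc)
print(value)  # ['width', 640, [-1]]
```

Because every node carries its own length, a reader that is not interested in a subtree can jump straight past it instead of parsing its contents, which is the efficiency point made above.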

The new binary format would then be just as easy to edit as an XML document (even easier, with a well-designed editor). The format would be just as extensible. What is so hard about that? I'm not the first to come up with the idea of Binary XML, not by a long shot. What I don't understand is why it hasn't yet come about.

On HTML and (in particular) CSS

Ugh. Alright, let's start out by saying that the fault is not so much with the standard as with the implementations (i.e. Firefox), but that's still a problem.

Firstly: CSS has one attribute (display) which specifies both the shape in which the contents of an element should be laid out and how the element should be placed within its own parent.

That is, "display: block" causes the element to be treated as a block both in that the contents are laid out in a rectangular box, and in that the element will be placed below previous siblings. "display: inline" on the other hand causes all contained elements which are also inline to flow as if they were part of the parent layout flow.

For instance, here is some text. The bold element is using inline display:

Some sample text some more sample text this is some more sample text, it's long enough to demonstrate inline and finally this is yet more text.

Note that the bold text (also highlighted in yellow to make it easier to see) does not break the layout flow, and secondly that it crosses from one line to the next, ending at the right-hand-side of the first line and beginning again on the left-hand-side of the next.

Here's the same again, but the bold element is a block:

Some sample text some more sample text this is some more sample text, it's long enough to demonstrate inline and finally this is yet more text.

But what if we want to avoid disrupting the flow of the containing element, but to have the contained elements be laid out as a block? CSS 2.0 doesn't let you do that. You can fudge it by stuffing around with the HTML a bit (which is pretty much what CSS was supposed to prevent you from having to do, but let's put that aside for the moment). Specifically, in Firefox 1.0, I have to do the following:

Yes, it's a pain, but it works:

Some sample text some more sample text

this is some more sample text, it's long enough to demonstrate inline
and finally this is yet more text.

CSS 2.1 seems to have support for this (attribute value "inline-block") but I don't know if any browsers support it yet. (In fact, I'm having lots of trouble finding out which version of CSS Firefox is meant to support. According to Wikipedia, Firefox should support CSS2 - but that doesn't explain why it can't handle "inline-table".)

There should be more control over how margins are collapsed. I should be able to put a header inside a table cell and not have a ridiculous amount of white space at the top and bottom. Likewise a header at the top of the page should not have white space above it.

There are plenty more issues. Why can't I position elements relative to some other specified element? It's always relative to the parent element. The order of elements in the document should not dictate the order of layout. I should be able to say, for instance: "this element must be to the right of the previous element, and it must be exactly the same height as that element. Also, it should be the same width as (and should be directly below) some other element which is named... (whatever)."
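As a rough illustration of what such a rule would compute, here is a small sketch. It is not CSS and not how any browser works: the Box type and the place function are hypothetical names, and it assumes that "directly below" only fixes the vertical position, leaving the horizontal position to the "right of the previous element" rule.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0


def place(previous: Box, named: Box) -> Box:
    """Position a new box purely from constraints on two already placed boxes."""
    return Box(
        x=previous.x + previous.width,  # to the right of the previous element
        height=previous.height,         # exactly the same height as that element
        width=named.width,              # same width as the named element
        y=named.y + named.height,       # directly below the named element
    )


header = Box(x=100, y=0, width=300, height=50)
sidebar = Box(x=0, y=50, width=100, height=400)
content = place(previous=sidebar, named=header)
print(content)  # Box(x=100, y=50, width=300, height=400)
```

Note that nothing here depends on where the boxes sit in the document tree; the layout falls out of the stated relationships alone.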

Add a few more constraints like the one just suggested and you could do away completely with the need for "display: table-xxx" and the like, and you'd have generally much more powerful layout capabilities - truly approaching a separation of content from presentation.

As a sidenote: err, cool, someone agrees with me!

On Scripting Languages

Two words, people: Type Safety! It's there to save your ass from stupid mistakes. It's there to tell you, at compile time, that you've stuffed something up. Why, oh ye Perl and Python and other nasty language gurus, do you not understand this?

Answers like: "mumble, mumble, stack trace at run time, mumble, clearly identifies error location, mumble" do not cut it - that argument is a load of shit and I can't see how anyone who claims to be a programmer could spout such drivel. Here's the deal: a stack trace does not tell you the location of the error in a program; it only tells you whereabouts it was that the program borked itself. In non-typesafe languages one cause of such borking is storing an object or data item of an inappropriate type at some point, which later gets accessed as if it were an object of another type (for instance, by accessing a member which exists in the latter type but not the former). Naturally, the location where the dud object was stored is completely different to the location where the borking occurred.
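A small sketch of that situation, with invented names (Customer, register and greet are hypothetical, not taken from any real program): the wrong kind of object goes in at one place, and the failure only shows up somewhere else.

```python
import traceback


class Customer:
    def __init__(self, name):
        self.name = name


registry = {}


def register(key, customer):
    # The real mistake happens here when a caller passes the wrong type,
    # but nothing complains at this point.
    registry[key] = customer


def greet(key):
    # The failure surfaces here, far away from the faulty call.
    return "Hello, " + registry[key].name


register("alice", Customer("Alice"))
register("bob", "Bob")  # bug: a str is stored where a Customer was intended

try:
    greet("bob")
except AttributeError as exc:
    # Collect the function names that appear in the stack trace.
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    print(frames[-1])            # greet
    print("register" in frames)  # False
```

The trace ends in greet and never mentions register, even though the call to register is where the bug actually lives.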

Here's another thought, this one with a clear example. I was editing a Python program the other day because I needed to change a few aspects of its behaviour. I was clearly able to identify the method I needed to modify in the code - but I had no clue as to what data type was being passed in! Which in turn meant I had no idea what fields and methods were accessible. Of course I could probably have used the Python equivalent of reflection to dynamically inspect the arguments and print out such information, but there'd be no guarantee that the same information would always be available, seeing as any type could get passed in!
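Roughly what that inspection looks like, as a sketch using the built-ins type() and dir(). The Invoice and Refund classes and the process function are made up for the example; they stand in for whatever the real program happened to pass around.

```python
def describe(arg):
    """Return the type name and the public member names of arg."""
    members = [name for name in dir(arg) if not name.startswith("_")]
    return type(arg).__name__, members


class Invoice:  # hypothetical type a caller might pass
    def __init__(self):
        self.total = 0

    def pay(self):
        self.total = 0


class Refund:  # another hypothetical type, with different members
    def __init__(self):
        self.amount = 0


def process(item):
    # Nothing in the signature says which of the above (or anything else) arrives.
    return describe(item)


print(process(Invoice()))  # ('Invoice', ['pay', 'total'])
print(process(Refund()))   # ('Refund', ['amount'])
```

The answer depends entirely on what a particular caller passed on a particular run, so the printout describes one call and promises nothing about the next.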

That's what type safety gives you: a guarantee that certain fields and methods will be available to call. And, more than that, a clear warning or error at compile time when you access a member that is not guaranteed to be available!
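For comparison, here is a sketch of what a declared type looks like in Python itself, using the optional type annotations that later versions of the language added. Two assumptions to flag: the interpreter does not enforce annotations at run time, so the check described in the comments relies on running a separate static checker (mypy is one such tool) over the code, and the Customer and greet names are again hypothetical.

```python
class Customer:
    def __init__(self, name: str) -> None:
        self.name = name


def greet(customer: Customer) -> str:
    # The annotation promises that .name exists on whatever is passed in.
    return "Hello, " + customer.name


print(greet(Customer("Alice")))  # Hello, Alice

# A static checker reading the annotations would reject the next line
# before the program is ever run, because a str is not a Customer:
# greet("Bob")
```

Anyone editing greet can now see from the signature alone which fields and methods are available, which is exactly the guarantee described above.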

I'm sure we've all seen a stack trace dumped in the middle of a web page when something in the generating script went wrong. Wouldn't it be better to be able to catch some of those errors at compile time, rather than after deployment?