Breaking news: w3c specs are not the Word of God

Pretentious introduction

In the world of religious loonies, there are two main kinds of nutter. One is the fundamentalist – someone with the attitude that a tract is the work of God, perfect and unquestionable; if it’s mentioned in the Book, it’s beyond doubt no matter how daft. The second sub-genre of nutter is the exegete: someone who believes that extra, unwritten information may be teased out of the text with enough insight and critical reading.

Web Standards evangelist types will recognise both types of adherent to that Holy of Holies: the w3c specs.

Fundamentalism and the definition list

Recently, I had my first ever occasion to use a definition list and, being visually unimaginative, I Googled around to get some ideas on how such a beast might be styled. I was staggered at what I found: people were using the humble <dl> for purposes that seem to me to be miles away semantically from the idea of term and associated definition(s). I’m thoroughly with Andy Budd who writes in his book CSS Mastery:

Many web standards pioneers seized on the fact that definition lists could be used to structurally group a series of related elements and started to use them to create everything from product listing and image galleries, to form and even page layouts. I personally believe that they stretch the implied meaning of definition lists beyond their natural breaking point.

Hear, hear. The problem, however, goes back to the HTML specification of definition lists, which begins entirely sensibly:

Definition lists vary only slightly from other types of lists in that list items consist of two parts: a term and a description. The term is given by the DT element and is restricted to inline content. The description is given with a DD element that contains block-level content.
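
To make that wording concrete, here is a minimal sketch of a definition list used as described, with each DT holding a term and each DD its description. The terms and descriptions are my own placeholders, not taken from the spec.

```html
<dl>
  <!-- DT: the term, inline content only -->
  <dt>Fundamentalist</dt>
  <!-- DD: the description, block-level content allowed -->
  <dd><p>Someone who treats a text as perfect and unquestionable.</p></dd>

  <dt>Exegete</dt>
  <dd><p>Someone who teases unwritten meaning out of a text.</p></dd>
</dl>
```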

So far, so intelligible and semantic. Then, however, author Dave Raggett goes a bit mental and writes,

Another application of DL, for example, is for marking up dialogues, with each DT naming a speaker, and each DD containing his or her words.

WHAT?!?!
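
For comparison, the markup that sentence of the spec invites would look something like this sketch. The speakers and their lines are placeholders of mine, not an example lifted from the spec.

```html
<dl>
  <!-- DT names the speaker, DD holds the words spoken -->
  <dt>First speaker</dt>
  <dd>Is this a definition?</dd>

  <dt>Second speaker</dt>
  <dd>No, it is just something I said.</dd>
</dl>
```

Nothing in it is being defined: each DT is a name and each DD is a quotation.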

Stealing from Tantek, it’s much better semantically to mark up a completely fictional dialogue thus:

<cite>Dave Raggett</cite>
<blockquote>Hey, I've just found this crazy-looking mushroom in a field! I think I'll eat it before completing the spec for definition lists!</blockquote>
<cite>Tim Berners-Lee</cite>
<blockquote>Great plan! Would you also like this lumpy roll-up I found under a chair in the CERN student common room?</blockquote>
(Another example).

Tantek’s markup is semantic (although I think he then goes a step too far and uses an ordered list to contain each citation and blockquote), and he convincingly shows that a dialogue should not be marked up as a definition list, whatever the damn spec says.

Floats: I know they say that, but they really mean this.

Almost the opposite of the fundamentalist who says “if it’s in the spec it must be right”, the exegete declares that the authors meant more than they actually wrote.

Lately, I’ve been hearing people say that floats are not really “meant” for laying out web pages. For example, a commenter on Drew McLellan’s blog writes,

… The CSS 1.0 spec … talks about text wrapping around elements but nothing of multiple columns. I’ve never seen floats for layouts explicitly forbidden, but it’s pretty obvious that those above were the use cases in mind, and it’s hardly straightforward to use it for layout. (Source)

Now, I was never party to Messrs Lie and Bos’ discussions and decisions when they wrote the spec, and there may be other documents that I don’t know about, detailing what they really really meant, but the CSS1 float spec talks only of floating elements (and a div full of navigation etc is still an element). Admittedly, it also says “this property is most often used with inline images, but also applies to text elements” – but that isn’t normative, and even though it was almost certainly true of most pages when the spec was written, it isn’t the case now. I don’t think any exegesis of authorial intention is possible here. And if the authors felt strongly that it was only for simple elements, well, why not say that in the spec?
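
As a minimal sketch of the kind of layout being argued over: a div full of navigation floated left, with the main content sitting beside it. The ids, widths and links are hypothetical, chosen only for illustration.

```html
<style type="text/css">
  /* Hypothetical ids and widths, for this sketch only */
  #nav     { float: left; width: 12em; }
  /* The margin keeps the content clear of the floated column */
  #content { margin-left: 13em; }
</style>

<div id="nav">
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/about/">About</a></li>
  </ul>
</div>
<div id="content">
  <p>Main content sits beside the floated navigation.</p>
</div>
```

The float property is doing exactly what the spec says it does to an element; the element just happens to be a column rather than an image.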

We could go on about holes in the specs: the lamentable lack of detail about the address element, the lack of a figure element to allow a caption and other data to be explicitly associated with an image, the fact that you can’t have lists in paragraphs or headings in lists, the inability to style poetry correctly and the resurrection of layout tables in css. We could mention the horrors of WCAG 2. But let’s not.

Let’s just acknowledge that the author’s intention, even when it can be reasonably gleaned from the specification, is not a Holy relic. As a community, we’ve built up a corpus of best semantic practice – let’s not throw the baby out with the bathwater by rejecting floats for layout, or abusing the definition list as the table was once sorely abused in the name of the Holy Specifications.

18 Responses to “ Breaking news: w3c specs are not the Word of God ”

Comment by Richard Rutter

Well you’re right Bruce, but I will hold my hands up as one of those folks who abuse definition lists. I’ve gone as far as to recommend them for use in forms. I suspect my motivation for the abuse is that dls provide handy hooks for CSS. I’ve also justified the decision on the basis of the example you quoted – I figured if you can use a definition list for dialogues then you can use it for other situations with pairs (eg form label and form control).

And that’s fine – it would rarely harm people’s experience unless really abused. But to take the other extreme – of only using HTML elements and CSS properties in the perceived spirit in which they were written – would surely result in a lot of unstyled websites built in divs.

Comment by Jim

Is there a word for someone who finds the w3c specs ambiguous to the point of being useless? Or am I the only one?

If you use Tantek’s method to mark up plays (and it is very sensible), don’t you run into a situation where some plays are marked up in that manner while others are marked up as definition lists? This then makes it difficult to write a standards-based parser which can read play dialogue from web pages, because you don’t know how they’ll have been marked up.

Of course, what you really need is TEI’s base tag set for drama (http://www.tei-c.org/P4X/DR.html) and an agreed-upon set of rules to transform the TEI elements to HTML. Or the ability to incorporate a TEI marked-up play into a page by extending XHTML, for example.

I think that the aim of an HTML author should be to write clean markup that works across browsers. If you’ve written HTML that makes your page accessible and easy to use, I don’t think you should be too worried about the semantic definitions given by the w3c.

Remember also that those HTML tags were originally introduced before CSS existed, when HTML was a presentational language. You could argue that it’s daft to apply usage examples that were written almost a decade ago to the sorts of web pages that people are writing now.

Comment by Joe Clark

I think it’s just fine and dandy to use definition lists to mark up appositional pairs. Such usage is not prohibited by the spec, which implies such usage is permitted.

http://blog.fawny.org/2004/05/16/ubu/

Besides, you can’t prove that using DL for this purpose is wrong the way you can prove that marking up a navbar in table cells is.

Comment by Jake Archibald

I agree that DLs are abused. I have even seen them used for heading-paragraph, with the DT being used instead of a H1 and the DD used instead of the paragraph.

However, you disagree with its use for forms… I’m not so sure, I think it makes sense for forms.

My DT could be “username:” (wrapped in a label) and the DD would hold the text input. Surely I’m asking the user to define their username, so definition lists are correct.

Any thoughts?

Comment by Ross Bruniges

I think I agree with Joe here – if it is not written anywhere that you should not use a particular element for something, and specs are written in such a way that people can take different meanings from the same definitions, these kinds of ‘disagreements’ are bound to occur.

I think the difference now is that people will actually discuss things and put their ideas out for comment (as is happening here!!) as opposed to just going out and doing things without proper thought or care – hopefully…..

Comment by Bruce

Thanks everyone for joining in and helping me clarify my thoughts.

Joe: you can help me out here, as you’re The Man when it comes to markup. I’m coming round to agreeing with Jake that a form could be a definition list, because the key word here is “definition”. But I can’t see any aspect of definition in appositional pairs. (Apart from the very specialised Literature-undergrad meaning of define (“King Lear’s line ‘Nothing will come of nothing’ defines the plot of the play”), and I think we both agree there’s little literature in the souls of w3c spec authors).

It’s almost as if the authors of the spec noticed that the browser default rendering of a definition list looks a bit like a play script and therefore decided that the element could be used for that. To me, it’s as odd and as incorrect as if the definition of blockquote said, “You can use this to indent anything you want indented”.

Even if the element were called “definition and miscellaneous list”, it still seems to me that Tantek’s suggested markup is a perfectly semantic way to mark up a dialogue. Help me understand the error of my ways, people!

While I agree with Jim that “you run into a situation where some plays are marked up in that manner while others are marked up as definition lists”, that shouldn’t be a barrier to using the correct elements. On the web we have navigation marked up in table cells, as simple adjacent links, with the <menu> element – yet the potential for confusion didn’t stop us standards evangelinas recommending lists, even though that would increase the confusion.

Richard has a good point about handy hooks for styling, and that if people hadn’t stretched the specs there would be a hell of a lot of very plain websites out there. It’s true that the table element was grossly misused for layout, but the Get Out Of Jail Free card is that there was literally no alternative for years. These days, there are perfectly good alternatives if an author needs extra hooks for styles: the div and span exist to group items together when there is no html element that fits the bill, so it’s surely better to use those than (mis)use an existing element, regardless of how (arguably) loosely-defined that element is. Isn’t it?

Thoughts, anyone?

(I confess also to thinking that my religious metaphors are biting me on the arse, as I must sound like one of those medieval priests debating how many angels can dance on a pinhead, while 90% of Web developers are using tables and spacers.)

Comment by Richard Rutter

“Even if the element were called ‘definition and miscellaneous list’, it still seems to me that Tantek’s suggested markup is a perfectly semantic way to mark up a dialogue.”

I think that’s kind of the point: there’s nothing wrong with Tantek’s suggestion and I doubt anyone would hold it against you if you marked up dialogue like that. But perhaps there’s nothing wrong with using a definition list either.

We all know that HTML was designed for marking up scientific documents, rather than for the creation of modern-day web pages; and nothing has changed there. So we are left with some very specific elements, such as definition lists, but missing elements such as navigation lists which, in hindsight, would have been an obvious inclusion.

As such, HTML 4 will always be open to ‘abuse’ or at least ‘rule-bending’ because it is so severely lacking. But if we can defend our markup decisions – if we’re actually thinking that such a thing might be necessary – then it’s not really worth sweating about.

Comment by Johan

How about mark-up and styling decisions related to accessibility issues and search engine optimization?

Comment by Jim

Have a look at this piece about XML vs. Microformats
http://cafe.elharo.com/xml/must-ignore-vs-microformats/
and the example page where XHTML is extended to include tags describing trade shows
http://www.cafeaulait.org/tradeshows.xml
I’ve only just come across it (via Simon Willison) so I haven’t really had time to digest it. This was the idea I had in mind for play scripts though – take an existing XML vocabulary specifically designed to mark up drama (including stage directions, camera angles and so forth) and feed that directly to the browser. As the article says, any modern browser should be capable of handling XML and XSLT.

Comment by Jim

“So we are left with some very specific elements, such as definition lists, but missing elements such as navigation lists which, in hindsight, would have been an obvious inclusion.”

I’m slightly confused by this – HTML did have navigation lists (<menu> and <dir>) in HTML 2 and HTML 3. See http://www.w3.org/MarkUp/html-spec/html-spec_5.html#SEC5.6.3

They were dropped from HTML 4 in favour of UL and OL.

I shall stop being an HTML trainspotter now and go do something productive instead.

Comment by Bruce

Indeed, Jim. I’ve previously lamented the dropping of menu which is much more semantic than a generic ul (and would make it 10 times simpler for Assistive technology and browser vendors to implement a “skip links” function – just ignore anything in a menu). Would love to know why it was dropped…

About Microformats: I haven’t looked at them in great detail, although they seem like a good concept. I did however murmur a prayer for the health of semantics at @media when Tantek advocated marking dates up as abbreviations.

Comment by Jim

I don’t really get Microformats, except that they allow you to do semantic-y sort of stuff within the unsemantic constraints of HTML, without having to learn the complicated bits of XML. So they’re very good for simple things like addresses and calendars.

I don’t see how you would do something complex, like the following, with Microformats. Here the speaker of a speech, sp, is related back to a listing in the cast list, castItem, of the same document.
<!-- in the front matter ... -->
<castList>
<castItem><role id="m2">Menaechmus</role></castItem>
<castItem><role id="pen">Peniculus</role></castItem>
<!-- ... -->
</castList>
<!-- ... -->

<!-- in the text ... -->
<sp who="m2" ><l>Responde, adulescens, quaeso, quid nomen tibist?</l></sp>
<sp who="pen"><l>Etiam derides, quasi nomen non noveris?</l></sp>
<sp who="m2" ><l>Non edepol ego te, quot sciam, umquam ante hunc diem</l>
<l>Vidi neque novi; ...</l></sp>

Again, this is from the base tag set for drama.

Comment by Sarven Capadisli

There is too much room for interpretation within the W3C specifications. Only a handful of people in the community take the extra step to look deep into the inconsistencies or perhaps even incompleteness of the recommendations.

This is one of the main reasons why I wrote: Where are my Web Standards

Whether it’s a dl or a ul has minimal impact in the grand scheme of things. Of course I am not advocating non-compliant markup, or arguing against progress as far as Markup and Stylesheets go, but rather I question the bottom line: how does my action x impact my user?

If only the recommendations were both sound and complete.

Comment by Johan

Is this not a job for XHTML 2.0 and XSLT: adding more tags, and customizing tags through DTDs?

HTML is not the same thing.

Comment by Cecil Ward

A thought-provoking article Bruce. You referred to an example of how to best mark up a play, figures with captions and poetry. It occurs to me that a repository of best-practice markup patterns would be very worthwhile. Do you agree? If so, any ideas about who, how and where?