HTML 5, politics and me

At Saturday’s multipack presentation I gave a bit of a preamble to explain my relationship with HTML 5. It isn’t in the slides, so I’m documenting it here.

I work for Opera, who are heavily involved in the specification process (along with Google, Apple, Mozilla, Microsoft, Adobe and many others). But I’m not part of the spec team in Opera, and writing or analysing specs is not part of my job or skill-set—I’m no spechead (or should that be smeghead?)

My interest is as a standards-lovin’ web developer who until recently was technical lead for a large legal website. Therefore, it’s interesting to me to try to use HTML 5 and see where it helps me and where the pitfalls are, and feed those pitfalls back to the working group before the spec is finalised.

When speaking about it at conferences, I always urge others to get involved: try it out and give feedback. If you don’t vote, you have no right to complain about the government.

But people do ask me my opinion on the politics surrounding the development of the language, so here you are.

Caveats

These are my personal opinions, not my employer’s.

I haven’t spoken about these matters on the mailing list, as I’m not nearly knowledgeable enough about them, so I reserve the right to change my mind in a second if someone argues convincingly. Probably repeatedly.

HTML 5 vs XHTML 2

XHTML 2 seems like a beautifully thought-out spec. But no-one has any interest in implementing it. It’s also unclear to me how it would deal with web applications.

HTML 5 is a kludge. It has to be; it tries to be backwards-compatible while hugely extending the language. Many, many aspects of it annoy me: the small element being in-line only; the revolting use of dt and dd for dialogues (see why it’s bad) and the lack of headings inside lists. But, for better or for worse, it’s the spec that has the traction and implementors behind it, so for pragmatic reasons it’s the one that I’m betting on.

Also, and most importantly, I believe that we need it fast. We need an open standard that simplifies building web apps before proprietary frameworks become the default.

Namespaces

I don’t understand the question. Way over my head. Next!

RDFa and microformats

The debate about RDFa has kicked off again on Sam Ruby’s site.

Making sure that your data is available to machine parsers so they can do something useful with it seems to me a noble goal, as long as it’s easy to do.

I had a look at what is (presumably) the simplest document about RDFa out there, the RDFa primer, and I can’t say I came away any the wiser. The barrier to entry looks too high from where I stand. (But then I am only 5 foot 6 tall.)

On the other hand, RDFa is a W3C standard, and microformats aren’t. Microformats seem to me to be easier to use, but I still think that one reason they aren’t much adopted by “real people” is that they’re still too hard to code.

Perhaps a WordPress blog editor plug-in would help, and Drew McLellan wrote a nice Dreamweaver extension but, for me, if I can’t remember the syntax then it’s too hard.

I’m not in favour of bloating the HTML 5 spec and having an element for every conceivable occasion (var, kbd and samp, I’m looking at you) but I think we could usefully add a couple more microformatty elements to the language.

It seems to me that the useful microformats are based on a trinity of “who, where, when”. So I’m glad that HTML 5 has the time element (although I see no reason why authors shouldn’t be able to mark up BCE dates or dates like “July 2008” or “1935” which are currently disallowed by the spec).
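To make that restriction concrete, here is a minimal sketch. It assumes the draft's datetime attribute takes a full year-month-day date; the dates and sentences are invented purely for illustration.

```html
<!-- A full calendar date: the kind of value the time element accepts -->
<p>The meeting is on <time datetime="2008-12-25">Christmas Day</time>.</p>

<!-- Less precise dates: under the restriction described above these
     cannot be wrapped in a time element, so they stay as plain text -->
<p>The article appeared in July 2008; the building dates from 1935.</p>
```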

I’d also like to see a place element. (Andy Mabbett proposes a location element, but I quite fancy having a geeky t-shirt reading “I’ve got the <time> if you’ve got the <place>”). It would look like this:

<place latitude="52.548" longitude="-1.932">Great Barr</place>

To complete the trinity, a name element could be used. If no attributes are present, the name is assumed to be a single given name and a single family name, so <name>Bruce Lawson</name> would be all you need. But if you want middle names, multi-part family names, nicknames, Russian patronyms etc., you’d have to use attributes such as:

<name given-name="Bruce" middle-name="Agamemnon" family-name="Lawson">Bruce Agamemnon Lawson</name>

<name given-name="Bruce" patronym="Brucemov" family-name="Lawsonski">Bruce Brucemov Lawsonski</name>

<name given-name="Bruce" family-name="Van der Growl">Bruce Van der Growl</name>

<name given-name="Bruce Il" family-name="Kim">Kim Bruce Il</name>

<name given-name="Bruce" nickname="Awesome" family-name="Lawson">Bruce "Awesome" Lawson</name>

It’ll never happen, but it’s what I’d like.

The video element

The spec used to say

User agents should support Ogg Theora video and Ogg Vorbis audio, as well as the Ogg container format

so browsers could display video without plug-ins and without using a patented codec (Ogg is open source).

Nokia and Apple objected, and there is worry about “submarine patents”, so we’re very unlikely to see every browser agreeing on a patent-free codec to use in the near future, and so we’ll need horrors like nesting embed inside object inside video. Or just continue using Flash, of course (here’s a valid way to embed Flash video in HTML 5).
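For the curious, that nesting might look roughly like the sketch below. It is illustrative only: the file names (clip.ogv, player.swf) and dimensions are made up, and the exact attributes and parameters a particular Flash player needs are an assumption.

```html
<!-- Browsers that understand the video element try to play the file -->
<video src="clip.ogv" controls>
  <!-- Browsers that do not understand video ignore the tag
       and render its contents, starting with the object -->
  <object type="application/x-shockwave-flash" data="player.swf"
          width="320" height="240">
    <param name="movie" value="player.swf">
    <!-- Older browsers that only handle embed fall through to this -->
    <embed src="player.swf" type="application/x-shockwave-flash"
           width="320" height="240">
    <!-- Last resort: a plain link to the file -->
    <p><a href="clip.ogv">Download the video</a></p>
  </object>
</video>
```

Note that the nested content only helps browsers with no video support at all; a browser that supports the element but not the codec will not fall back to it.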

Sun Microsystems are developing a new patent-free codec called OMS video precisely to fill this hole. Whether some companies with financial interests in their proprietary media formats will be willing to adopt it is unknown.

Accessibility and HTML 5

alt text

There was a lot of debate last year when HTML 5 proposed to make the alt attribute optional for some images. It was resolved by adding reams of explanatory notes to the spec, but it’s not this spec’s job to define accessibility.

I prefer the approach that Patrick Lauke suggested: the spec should say “Alt text is mandatory, but may be blank. See WCAG for guidance on writing alternate text”.

The summary attribute on tables

In HTML 4, the summary attribute “provides a summary of the table’s purpose and structure for user agents rendering to non-visual media such as speech and Braille”.

I’m not a fan of the summary attribute, having seen it misused hundreds of times in the wild in code like summary="this is a layout table".

But much more importantly, I don’t like invisible elements. If a table is sufficiently complex that its structure needs explaining to a blind person, it also needs explaining to people using screen magnifiers, for whom part of the table may be off-screen at any given time, and to people like me who find complex tables difficult to understand.

So, unless there are useful studies that show that only blind people benefit from this kind of explanatory data and that no other users do, I have no problem with its removal from the HTML 5 spec. It would, of course, still be valid in existing sites which are (I feel safe in assuming) using the HTML 4 or XHTML 1.x DOCTYPEs.

Same with longdesc, as no-one ever uses it and browsers don’t support it.

WAI-ARIA

ARIA should validate as HTML 5. I believe it will happen: Henri Sivonen is working on mapping the two specs.

Other accessibility worries have been resolved: Gez Lemon’s research into nested headers succeeded in getting them added to the spec, and canvas must have accurate fallback content, especially if it is mis-used to display text (the spec used to say “should”).

What’s your favourite political argument? We could talk about spec splitting, but that’s for another day.

14 Responses to “HTML 5, politics and me”

Comment by Joshue O Connor

Hi Bruce,

I disagree with your stance on @summary, as the misuse of the attribute is really not a solid enough reason to remove it from the specification. It /is/ very useful to describe complex data tables for screen reader users, even if it /may/ sometimes be misused.

Its retention in the HTML 5 spec would be more useful, than harmful, to people with disabilities for many reasons, backwards compatibility and so on, rather than some /new/ method to effectively do the same thing.

A long descriptor for complex tables is needed. For sighted users this content could be revealed using scripting if it would be of use to some groups, users with cognitive impairments perhaps. New UAs could also support this behaviour. I agree that the summary="This is a layout table" case is a misuse, but that still does not take away from the attribute’s potential for good.

BTW, longdesc functionality will probably be replaced by the use of the aria-describedby or aria-labelledby properties.

Cheers

Josh

Comment by Bruce

Hi Joshue, I didn’t explain myself very well. My main argument isn’t that summary has been misused, but that explanatory stuff should be available to everyone. (amended article to emphasise the important point).

This is probably a stupid question, but why not replace summary with aria-describedby, just like longdesc?
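A minimal sketch of that idea follows. It assumes aria-describedby may be used on a table and that assistive technology exposes it; the id, the wording and the table contents are invented for illustration.

```html
<!-- The explanation is ordinary visible text, so every reader gets it -->
<p id="table-explained">
  Each row is a product. The columns give its price in each region.
</p>

<!-- aria-describedby points assistive technology at that same paragraph -->
<table aria-describedby="table-explained">
  <tr><th>Product</th><th>North</th><th>South</th></tr>
  <tr><th>Widget</th><td>10</td><td>12</td></tr>
</table>
```

Unlike summary, the description here is not hidden from sighted users, which is the point made in the article about explanatory material being available to everyone.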

Comment by Joshue O Connor

Hi Bruce,

I guess the main reason not to replace @summary using the method is backwards compatibility. @summary is very well supported by older AT and there are many users of version numbers < JAWS 7/6 for example.

So yeah, the aria properties are great and they can be used to do the same thing but @summary already does this for complex data tables so there is no need to introduce a new method IMO. I am a fan of ARIA and using these generic qualities to describe content but in this case there is no need to use them to replace what is already there.

Cheers

Josh

Comment by Andy Mabbett

@Bruce: “location” or “place” – all the same to me, it’s just a label. I reused the former from iCalendar/hCalendar, but as it’s meant for coordinates rather than a text place name, then “geo” (from the same source) might be better. And anything that stops you making bad jokes has to have something in its favour.

I’m with you on the name element, but any sub-division should be based on those that exist in vCard/hCard, for ease of interoperability.

And if we’re having a “name” element, then why not one for marking-up postal addresses and place names, like the “adr” and sub-properties in hCard? (Sadly the element “address” already exists for other purposes.)

Did you see my other comment, on the mailing list:

a more widely-scoped “measurement” element, capable of taking, for example, values of “duration”, “length”, “mass”, “temperature”, etc.; and a value; and perhaps a schema (defaulting to SI), would perhaps be more useful. Use cases are microformats, plus translation, automated conversion, sorting, etc.

As to learning microformats: if I can do it, then so can you. I’d be happy to teach you ;-)

Comment by Bruce

Joshue – “I guess the main reason not to replace @summary using the method is backwards compatibility. @summary is very well supported by older AT and there are many users of version numbers < JAWS 7/6 for example.”

I’ve added a line: summary “would, of course, still be valid in existing sites which are (I feel safe in assuming) using the HTML 4 or XHTML 1.x DOCTYPEs.”

Andy – fixed that link, but your link to your mail to the mailing list is kaput and Google didn’t help me.

I’d redefine the address element; if other elements can be redefined, this can be as well, and I’ll bet you a pint of Old Tummygrowler that most in-the-wild uses of address are for postal addresses. (Opera MAMA will prove me right, I hope.)

Comment by Andy Mabbett

@Bruce: Thanks for fixing the former; the mailing list archives in the second link appear to be borked – hopefully temporarily.

Redefining address makes sense; and you should always do what your MAMA tells you.

Comment by Bruce

Is there a defined scheme for celestial co-ords you could point me to (one that’s compatible with terrestrial co-ordinates as most people know them from mapping software)?

I suspect that it’ll be regarded by the working group as an edge-case, along with BCE dates. But if you’ve a use case, you should mail the working group.

Comment by Jim O'Donnell

RA and dec are the celestial equivalents of lat/long, more or less. Coordinate systems are very complicated in practice, but we wrote a beginner’s guide for our astronomy competition.

Wikipedia publishes celestial coords on its astronomy pages (e.g. http://en.wikipedia.org/wiki/Regulus) and we’re tagging Flickr photos via the helpful people at astrometry.net. Oddly enough, we publish RA in degrees, while Wikipedia uses hours, minutes and seconds. Both are valid units for RA.

Is that worth adding to HTML? I don’t know, but adding a tag that’s wedded to a single coordinate system, WGS84 latitude and longitude, seems like it would be as short-sighted as restricting dates to the modern Gregorian calendar.

Re. BC dates, museums use them online all the time, of course, for antiquities (http://www.nmm.ac.uk/collections/explore/object.cfm?ID=AAA6229). Would it be worth the effort to mark those up as dates in a HTML 5 version of the catalogue record?