Microformats, accessibility, HTML 5 (again)

Microformats are a good idea, but some have accessibility problems because they expose machine data to humans by misusing the abbr element. These problems led to the BBC removing those microformats from their sites.

One such misuse is encoding dates and times in microformats such as hCalendar, hAtom, and hReview. Ultimately, this problem goes away in HTML 5, as that introduces a time element which is obviously better than an abbreviation for marking up dates and times (a tenet of microformats is to “use the most accurately precise semantic XHTML building block for each object”).

So, an example of an HTML 4 based hCalendar microformat (taken from the spec) is

<div class="vevent">
<a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
<span class="summary">Web 2.0 Conference</span>:
<abbr class="dtstart" title="2007-10-05">October 5</abbr>-
<abbr class="dtend" title="2007-10-20">19</abbr>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>

After replacing the abbr element with time and replacing its title attribute with datetime we get

<div class="vevent">
<a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
<span class="summary">Web 2.0 Conference</span>:
<time class="dtstart" datetime="2007-10-05">October 5</time>-
<time class="dtend" datetime="2007-10-20">19</time>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>

You can test how it renders in your fave browsers (and the other ones) on the microformats with time test page.

Replacing the abbr pattern elsewhere

Of course, this only works for dates and times. The flawed abbr pattern is also used to encode locations in microformats such as hCard, hCalendar & ‘geo’.

Here’s an example

Let’s go to <abbr class="geo" title="30.300474;-97.747247">Austin, TX</abbr>

Which, to a screenreader set to expand abbreviations, is an incomprehensible string of numbers (mp3):

“Thirty point three oh oh four seven four semi-colon minus ninety-seven point seven four seven two four seven”

A leading microformatter, Ben Ward, recently proposed an extension to the value-excerption pattern (no, I don’t know what that means, either) which allows this machine-readable information to remain in the DOM while hiding machine data from people.
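As a rough illustration of the idea (not the proposal’s confirmed syntax), the machine value could sit in the title of an empty span placed next to the human-readable text, so there is no abbreviation for a screenreader to expand. The class name value-title and the nesting below are assumptions made for this sketch:

```html
<!-- Sketch only: the class name "value-title" and this nesting are assumptions -->
Let’s go to
<span class="geo">
  <!-- Empty span: the machine-readable value lives in its title attribute -->
  <span class="value-title" title="30.300474;-97.747247"> </span>
  Austin, TX
</span>
```

Whether assistive technologies really stay quiet about a title on an empty span is exactly what needs testing.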

My test page plays nicely with the following screenreaders:

I’m really excited by this as it may be the end of the microformats versus accessibility debates (that I’ve helped stir up). If you have access to assistive technologies, please give it a test.

Problems with this pattern from a microformats perspective might be

I hope, if screenreader and parser testing allows, that the new pattern will be adopted so that those of us who want to use microformats and care about accessibility can use it.

The future of microformats in HTML 5

I had naively thought that many microformats would use the HTML data-* attributes, which are for “embedding custom non-visible data” and thus seem perfect for embedding such information on structural markup.

Any element in HTML 5 can have any number of data-* attributes. The asterisk is a wildcard; you can call them what you want. An example from the spec:

<div class="spaceship" data-id="92432"
data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90">

However, the spec goes on to say

User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.

I was uncertain what this meant, so asked Anne van Kesteren who told me that these attributes are for passing data to scripts that are private to the page, rather than to indicate meaning to external parsers:

It’s so that non-private extensions that need User Agent implementation a) won’t break sites using those names for other purposes and b) get due consideration by the Working Group and a proper name without data-

This is made explicit in an addition to the spec last night by the editor, Ian Hickson:

This is because these attributes are intended for use by the site’s own scripts, and are not a generic extension mechanism for publicly-usable metadata.
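In other words, the intended use looks something like the following: a page’s own script reading back a value that the page’s author put there. This is only a minimal sketch; the id "ship" and the variable names are made up for illustration, and it uses plain getAttribute rather than assuming any newer API is available.

```html
<!-- Sketch only: the id "ship" and the variable names are hypothetical -->
<div id="ship" class="spaceship" data-id="92432" data-shields="50%"></div>

<script>
  // The page's own script reads a value that the page itself supplied.
  var ship = document.getElementById('ship');
  var shields = ship.getAttribute('data-shields'); // the string "50%"
</script>
```

An external parser, by contrast, has no way of knowing what data-shields is supposed to mean on any given site, which is why these attributes are a poor fit for public metadata.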

So, microformats won’t be “rolled up” into HTML 5. I imagine that some of the microformats community will wish to lobby the HTML 5 working group for a proper name for the data they wish to store with an element, so that the data can be parsed reliably without the “hacky” nature of the microformats. There is a process for adding new features to the spec.

Others will want to ignore the caveats that HTML 5 places on using the data-* attributes for publicly-usable metadata and use them anyway.

Perhaps most likely, microformats will continue to use class=, rel=, and the like as they do now. That would be valid HTML 5 and require no changes to specs or parsers.

So it seems that microformats will continue in an HTML 5 world. And, now that there seems to be a will to fix the accessibility problems, I think that’s a good thing.

15 Responses to “ Microformats, accessibility, HTML 5 (again) ”

Comment by Isofarro

Hey Bruce, I would suggest getting a screen magnifier user to test out the microformats examples. I have a recollection of something pixeldiva told me once about screen magnifiers, that they render title attributes as model-like dialogues that require a keypress each time to remove the title attribute contents from the magnified region of the page. This was, as far as I can remember, for any element in the document body that contained a title attribute – a div, a span.

I’m hoping pixeldiva can clarify what situations this occurred in for screen magnifiers – I think we had this chat around January 2007ish.

My takeaway from that is to use title attributes where they add value to the page content. Avoiding duplicating content with these attributes also makes sense. And of course, needless use of title attributes should be discouraged.

I don’t know what the effect on screen magnifiers will be on empty spans with title attributes, that’s not something I’d considered asking about at the time.

Mike.

Comment by Isofarro

“model-like dialogues”, I meant modal-like dialogues. Like a JavaScript alertbox, the one you have to click “OK” every single time it pops up.

Comment by Chris Heilmann

I don’t really get what the issue is here. Class attributes are meant to store machine information according to the W3C specs, yet the microformats community thinks it makes more sense to mis-use attributes and elements with a visual output and _then_ try to fix it with yet another code to possibly hide it from user agents.

The solution would be simple namespacing:

Austin, TX

This could be extended to, for example, woeid:123123, too. Yes, this isn’t a semantically valuable HTML element being used for this, but I dare anyone to show me a human being that answers “ah yeah, 30.300474;-97.747247” when I say Austin, Texas!

Comment by Bruce

I’m with you, Christian. But I don’t much care how the microformats chaps want to mark up their code, as long as it doesn’t cause accessibility problems.

I do think that microformats are over-complicated (from my own experience of trying to mark them up) and that’s a barrier to entry for me — I don’t like to have to go to a book/ website to remind myself of the syntax.

Comment by Drew McLellan

“I do think that microformats are over-complicated (from my own experience of trying to mark them up) and that’s a barrier to entry for me”

Feedback like this is always useful, even more so with specifics.

All the specs are publicly drafted with plenty of opportunity to offer suggestions. We also try to build in authoring shortcuts where they make sense.

What I’m saying is that if you’ve got suggestions of how to make things simpler – please, put them forward! Simplification is always welcomed.

Comment by Bruce

Hi Drew
At the Law Society/ SRA I experimented with your Dreamweaver extension, until I decided not to use microformats because of the potential accessibility problems. (We absolutely *would* have had anguished phone calls if tooltips with geo or ISO time information appeared. In user testing, we found lots of people move the mouse as they read, so would have accidentally hovered over abbr elements).

For me, it’s tricky to remember what takes what tag – is it span or abbr? That could be because I don’t like the abbr element for that use so my mind rejects it? I also find some of the class names to be less than mnemonic; for example, “dtend” rather than “date-end”. (I know that they were inherited from some other spec.)

Comment by Drew McLellan

Don’t want to sidetrack this off topic too much, so I’ll be brief:

“For me, it’s tricky to remember what takes what tag – is it span or abbr?”

Generally the principle is that it doesn’t matter – use whatever markup is suitable for your document. I can’t think of a case where a microformat dictates what markup should be used – even hCard’s ‘url’ doesn’t have to be an A element.

Comment by David Storey

RDFa already has an attribute for data and such. Microformats could just use these in HTML 5 instead of lobbying the HTML 5 group to invent another attribute. It just needs the extended RDFa attributes to be added to HTML 5 (if they haven’t already) just as is the case with WAI-ARIA.

Comment by Jared Smith

Just a quick note that data-* could certainly be used to encapsulate the microformats data for a page script which then puts it where a user agent expects it to be (e.g., moves it from data-* to class, datetime, or whatever). It’s not particularly useful without scripting and doesn’t make as much sense as just putting it in class (or whatever) to begin with, but this is precisely the type of thing data-* is intended for.

Comment by Ben Ward

Well hello.

So, I’ll try to respond to a few things here. First Bruce, thank you for all the work you’ve done lately (and in the past) helping out with the testing. Hearing about successes in JAWS is a huge boost.

Second, whilst I’ve been the one to take the time to present this new pattern, organise the test, do a lot of recent work and so forth, it’s not entirely ‘my’ proposal. The history of it can be traced through various brainstorming pages and various people’s contributions. Even I haven’t done that to identify individuals, but the point is that credit lies with the community, not with me individually.

Re: Christian Heilmann: The proposed semantics, use of empty span, not proposing using class. A few things:

It is, of course, somewhat subjective. As the wiki page for the current test effort notes, we’re pushing HTML4 into things it wasn’t designed to do (parallel representations of information). So we try to work something out that’s at least somewhat graceful and logical, and critically which doesn’t do any harm.

My reasoning behind a title and span, and in this specific case, reasoning against using class is as follows.

I reason that the machine-form of a date, or co-ordinates or whatever is itself content. You don’t want humans to see it, but it’s still content of the page. You could publish just with the machine-form, it would just be suboptimal to read. Since @title is for content, and we seem to have a cross-browser mechanism through which that title can not be exposed to humans, I find that to be a strong structural reason, and one which matches existing web publishing habits in general (using title, rather than inventing a new syntax elsewhere).
A pattern using @title can, if the author chooses, also expose that data to humans. You can use @value-title on a regular span containing text and have a tool-tip if that makes sense to your publishing situation. It’s flexible.
The title element ends up as a sibling to the human form. Again, that structure makes a lot of logical sense to me that the two forms are together as siblings in the DOM tree.

Regarding @class. Christian’s comment that ‘Class attributes are meant to store machine information according to the W3C specs’ is completely incorrect, and is a suggestion that comes up over and over again. So:

The HTML4 spec says, in section 7.5.2 on id and class attributes:

“The class attribute, on the other hand, assigns one or more class names to an element; the element may be said to belong to these classes.”

Below, it says that class may be used for stylesheet selectors, and ‘For general purpose processing by user agents.’

That ‘general purpose’ statement is wildly and regularly misinterpreted as allowing class to contain arbitrary data, even though that use contradicts the ‘belong to these classes’ specification. A data value is not a classification. Further clarifying this, under the spec for the id attribute in the same section above, ‘general purpose processing’ is defined more fully:

‘For general purpose processing by user agents (e.g. for identifying fields when extracting data from HTML pages into a database, translating HTML documents into other formats, etc.).’

Thus, @class is for classifying elements and for identifying fields (which is exactly what microformats does with it). Those things taken together, I don’t see how the spec can be read as permitting data itself to be valid within the class attribute.

Even if you disagree with my assertion that the ‘machine form’ is content, @class is still an invalid place to stick it.

Jared: Bruce already covered this in the article, but the data- attributes in HTML5 are explicitly and clearly defined as not for use by microformats: ‘Specifications intended for user agents must not define these attributes to have any meaningful values.’

Comment by Michael

“tooltips full of barely-comprehensible ISO data”

rubbish!
“2007-10-05” is more meaningful to humans than rubbish like “October 5” … with the latter how would you know what YEAR they are talking about?

I would call the latter “barely comprehensible” rather than the ISO version.

Would you rather people get the date wrong?

In this case I think I’d want to see the tooltip so that I might be able to see the year!

There is a deeper cultural issue that needs to be looked at here … teaching people to write dates CLEARLY.

Comment by Ben Ward

Michael:

Yes, Year-Month-Day form dates are more universally comprehensible than their localised counterparts. And if people want to push that upon the world as a better representation they should certainly go out and do so.

‘Teaching people to write dates clearly’ is an honorable goal, but it is most certainly not the goal of Microformats to force that change on anyone. They exist to serve actual publishing practice as well as we can, and publishing is (and will remain) localised. Aesthetics, local comprehension, tradition, a simple matter of publisher taste are all reasons why Y-M-D is not desired in visible (or aural) presentation.

Furthermore, as soon as a date form becomes a date-time form, you do gain nonsense, namely T00:43:00-0800.

Comment by Bruce

Michael,

You may think that an ISO formatted date with a time is human readable. I don’t.

Also, RTFA. From hAccessibility:

Given a title value of “20070312T1700-06”, JAWS and Window Eyes both try to read an ISO date string never intended to assault human ears:

“Twenty million seventy-thousand three-hundred twelve tee seventeen-hundred dash zero six.”
