I’m trying to wrap my little bonce around HTML5 microdata, not least because Opera 12 pre-alpha has support for it.
I’m quite discouraged: the two articles I’ve read, both by noted brain-boxes (Oli Studholme and Tab Atkins), tell me that it’s easy, but I’m still stuck.
I’m befuddled over itemid:
Sometimes, an item gives information about a topic that has a global identifier. For example, books can be identified by their ISBN number.
Vocabularies (as identified by the itemtype attribute) can be designed such that items get associated with their global identifier in an unambiguous way by expressing the global identifiers as URLs given in an itemid attribute.
The exact meaning of the URLs given in itemid attributes depends on the vocabulary used.
What does this actually mean? How do I know if a particular vocabulary supports global identifiers for items?
In the spec, some Microdata vocabularies are listed. vCard is one, and the spec says “This vocabulary supports global identifiers for items.” The URL defining the itemtype for vCard doesn’t seem to tell me, and the examples in the spec make no use of itemid.
And, because I understand real examples rather than the theoretical, what’s the practical benefit of itemid?
Specifically, what would I gain by using
<div itemscope itemtype="http://vocab.example.net/book" itemid="urn:isbn:0867193719">
Rebekah Brooks' Self-portraits (ISBN 0867193719)
</div>
[example simplified from an example in the spec, which says “The "http://vocab.example.net/book" vocabulary in this example would define that the itemid attribute takes a urn: URL pointing to the ISBN of the book.”]
over Schema.org’s Book schema (which doesn’t seem to use itemid – in fact, schema.org seems to make no mention of it):
<div itemscope itemtype="http://schema.org/Book">
Rebekah Brooks' Self-portraits (ISBN <span itemprop="isbn">0867193719</span>)
</div>
Double points if you can answer the question without baffling me with mentions of SPARQLy OWLs and Don’tologies.
While I was on my holidays there was a storm(ette) about rev=canonical and how it isn’t possible in HTML 5 because rev isn’t in the spec. (Apparently, the answer is to use rel=shortlink instead).
Mark Pilgrim published an article about link relations in HTML 5 with more information about the rel attribute, which I found interesting; I had no idea that relations such as rel=license and rel=author were available to allow auto-discovery of license information, and author details.
So I want to float the idea of rel=accessibility that would allow assistive technologies to discover and offer shortcuts to accessibility information, such as a WCAG 2 conformance claim, or a form to request content in alternate formats (for example).
The reason this would be useful is that links to such pages are generally right down in the footer of the web page. This means that screenreader users have to navigate to the end of the page to find the link, or never know it exists.
Ironically, on sites that really do need a link to accessibility help (because of lack of structure to navigate with or huge amounts of content before the footer), those who need it are unlikely to find the link to the help.
In the “bad old days”, helpful developers would give an accesskey attribute to that link, but accesskeys are generally undiscoverable by a human or a parser, and often conflict with assistive technologies’ command keystrokes.
A standardised way of indicating the related accessibility information would be better, and would not rely on arbitrary keys chosen by a developer.
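As a rough sketch of what the markup might look like: rel=accessibility is only the value being floated here, not a registered link relation, and the /accessibility/ URL is made up for illustration.

```html
<!-- Hypothetical: "accessibility" is a proposed rel value, not a registered one. -->
<head>
  <title>Example page</title>
  <!-- Discoverable up front, without reading to the end of the page -->
  <link rel="accessibility" href="/accessibility/">
</head>
<body>
  <!-- ...page content... -->
  <!-- The visible footer link can carry the same relation -->
  <a rel="accessibility" href="/accessibility/">Accessibility help</a>
</body>
```

An assistive technology that recognised the value could then offer its own shortcut to that URL, in the same way that rel=license and rel=author allow auto-discovery.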
So, should I propose that rel=accessibility be added to the list of values? It looks to be an arduous process; although you don’t need to prove your worth to the HTML 5 gatekeepers, you do have to prove your worth to the microformats gatekeepers.
I thought I’d ask you guys first—is this a good idea?
Microformats are a good idea, but some have accessibility problems because they expose machine data to humans by misusing the abbr element. These problems led to the BBC removing those microformats from their sites.
One such misuse is encoding dates and times in microformats such as hCalendar, hAtom, and hReview. Ultimately, this problem goes away in HTML 5, as that introduces a time element which is obviously better than an abbreviation for marking up dates and times (a tenet of microformats is to “use the most accurately precise semantic XHTML building block for each object”).
So, an example of an HTML 4 based hCalendar microformat (taken from the spec) is
<div class="vevent">
<a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
<span class="summary">Web 2.0 Conference</span>:
<abbr class="dtstart" title="2007-10-05">October 5</abbr>-
<abbr class="dtend" title="2007-10-20">19</abbr>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>
After replacing the abbr element with time and replacing its title attribute with datetime we get
<div class="vevent">
<a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
<span class="summary">Web 2.0 Conference</span>:
<time class="dtstart" datetime="2007-10-05">October 5</time>-
<time class="dtend" datetime="2007-10-20">19</time>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>
You can test how it renders in your fave browsers (and the other ones) on the microformats with time test page.
Replacing the abbr pattern elsewhere
Of course, this only works for dates and times. The flawed abbr pattern is also used to encode locations in microformats such as hCard, hCalendar and ‘geo’.
Here’s an example
Let’s go to <abbr class="geo" title="30.300474;-97.747247">Austin, TX</abbr>
Which, to a screenreader set to expand abbreviations, is an incomprehensible string of numbers (mp3):
“Thirty point three oh oh four seven four semi-colon minus ninety-seven point seven four seven two four seven”
A leading microformatter, Ben Ward, recently proposed an extension to the value-excerption pattern (no, I don’t know what that means, either) which allows this machine-readable information to remain in the DOM while hiding machine data from people.
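A minimal sketch of how the pattern might look for the hCalendar start date, assuming the machine value sits in the title attribute of an empty span that is the first child of the property. The class name on that inner span is an assumption for illustration; check the proposal for the exact name.

```html
<span class="dtstart">
  <!-- Assumed shape: an empty first-child span carries the machine data.
       It has no text content, so there is nothing for a human to read. -->
  <span class="value" title="2007-10-05"></span>
  October 5
</span>
```

Because the inner span is empty, only “October 5” is presented to people, while a parser that understands the pattern can still read the date from the title attribute.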
My test page plays nicely with the following screenreaders:
- JAWS 9 and JAWS 10 on Firefox 3 and IE 7 with high verbosity settings and abbr and acronym set to always be expanded (says Jared Smith)
- NVDA on Firefox 3
- Opera 9.63 and VoiceOver 10.5.5 (thanks Henny)
- Window-Eyes with IE 7 on WinXP (thanks dotjay)
- WinXP Window-Eyes with FF3 and full punc reads human date. It pauses before the .dtstart spans in Mouse mode, but not in Browse mode (thanks dotjay)
- Safari 3.2.1 on OS X Leopard with VoiceOver (thanks dotjay)
- WinXP Window-Eyes with IE 6 and full punc reads human date (thanks dotjay)
I’m really excited by this as it may be the end of the microformats versus accessibility debates (that I’ve helped stir up). If you have access to assistive technologies, please give it a test.
Problems with this pattern from a microformats perspective might be
- The machine data is “hidden” so might more easily fall out of sync with the human data
- The machine data must be the first child of the property. If it isn’t, a parser won’t see it – but it will be trickier to debug because the developer will still see it in the source code
I hope, if screenreader and parser testing allows, that the new pattern will be adopted so that those of us who want to use microformats and care about accessibility can use it.
The future of microformats in HTML 5
I had naively thought that many microformats would use the HTML data-* attributes, which are for “embedding custom non-visible data” and thus seem perfect for embedding such information on structural markup.
Any element in HTML 5 can have any number of data-* attributes. The asterisk is a wildcard; you can call them what you want. An example from the spec:
<div class="spaceship" data-id="92432"
data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90">
However, the spec goes on to say
User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.
I was uncertain what this meant, so I asked Anne van Kesteren, who told me that these attributes are for passing data to scripts that are private to the page, rather than for indicating meaning to external parsers:
It’s so that non-private extensions that need User Agent implementation a) won’t break sites using those names for other purposes and b) get due consideration by the Working Group and a proper name without data-
This was made explicit in an addition to the spec last night by the editor, Ian Hickson:
This is because these attributes are intended for use by the site’s own scripts, and are not a generic extension mechanism for publicly-usable metadata.
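In practice, the intended use looks something like the sketch below, where a page’s own script reads back the data that the same page embedded. The trimmed-down markup and the variable names are made up for illustration, and it uses plain getAttribute rather than anything specific to HTML 5.

```html
<div class="spaceship" data-id="92432" data-shields="50%"></div>
<script>
  // The page's own script reads its own private data back out.
  var ship = document.querySelector(".spaceship");
  var shields = ship.getAttribute("data-shields"); // "50%"
  var shipId = ship.getAttribute("data-id");       // "92432"
</script>
```

The attribute names mean something only to this page and its script; an external parser has no way of knowing what data-shields is supposed to signify.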
So, microformats won’t be “rolled up” into HTML 5. I imagine that some of the microformats community will wish to lobby the HTML 5 working group for a proper name for the data they wish to store with an element, so that the data can be parsed reliably without the “hacky” nature of the microformats. There is a process for adding new features to the spec.
Others will want to ignore the caveats that HTML 5 places on using the data-* attributes for publicly-usable metadata and use them anyway.
Perhaps most likely, microformats will continue to use class=, rel=, and the like as they do now. That would be valid HTML 5 and require no changes to specs or parsers.
So it seems that microformats will continue in an HTML 5 world. And, now that there seems to be a will to fix the accessibility problems, I think that’s a good thing.