Archive for the 'microformats' Category

The practical value of semantic HTML

Bruce’s guide to writing HTML for JavaScript developers

It has come to my attention that many in the web standards gang are feeling grumpy about some Full Stack Developers’ lack of deep knowledge about HTML. One well-intentioned article about 10 things to learn for becoming a solid full-stack JavaScript developer said

As for HTML, there’s not much to learn right away and you can kind of learn as you go, but before making your first templates, know the difference between in-line elements like <span> and how they differ from block ones like <div>. This will save you a huge amount of headache when fiddling with your CSS code.

This riled me too. But, as it’s Consumerfest and goodwill to all is compulsory, I calmed down. And I don’t want to instigate a pile-on of the author of this piece; it’s indicative of an industry trend to regard HTML as a bit of an afterthought, once you’ve done the real work of learning and writing JavaScript. If the importance of good HTML isn’t well-understood by the newer breed of JavaScript developers, then it’s my job as a DOWF (Dull Old Web Fart) to explain it.

Gather round, Fullstack JavaScript Developers – together we’ll make your apps more usable, and my blood pressure lower.

What is ‘good’ HTML?

Firstly, let’s reach a definition of ‘good’ HTML. Many DOWFs used to get very irked about (X)HTML being well-formed: proper closing tags, quoted attributes and the like. Those days are gone. Sure, it’s good practice to validate your HTML, just like you lint your JavaScript (it can catch errors and make your code more maintainable), but browsers are very forgiving.

In fact, part of what we commonly call ‘HTML5’ is the Parsing Algorithm which is like an HTML ninja – incredibly powerful, yet rarely noticed. It ensures that all modern browsers construct the same DOM from the same HTML, regardless of whether the HTML is well-formed or not. It’s the greatest fillip to interoperability we’ve ever seen.

By ‘good’ HTML, I mean semantic HTML, a posh term for choosing the right HTML element for the content. This isn’t a philosophical exercise; it has directly observable practical benefits.

For example, consider the <button> element. Using this gives you some browser behaviour for free:

  • A button is focusssable via the keyboard. I bet you, dear reader, know all the keyboard shortcuts for your IDE; it makes development much faster. Many people use only the keyboard when using a webpage. I do it because I have multiple sclerosis – therefore the fine motor control required to use a mouse can be difficult for me. My neighbour has arthritis, so she prefers to use the keyboard.
  • Buttons can be activated using the space bar or the enter key; you don’t have to remember to listen for these keypresses in your script.
  • Inside a <form>, it doesn’t even need JavaScript to work.

“What’s that?”, you say. “Everyone has JavaScript”. No, they don’t. Most people do, most of the time. But I guarantee you that everyone is without JavaScript sometimes.

Here’s another example: semantically linking a <label> to its associated <input> increases usability for a mouse-user or touch-screen user, because clicking in the label focusses into the input field. See this in action (and how to do it) on MDN.

This might be me, with my MS; it might be you, on a touch-screen device on a bumpy train, trying to check a checkbox. How much easier it is if the hit area also includes the label “uncheck to opt out of cancelling us not sending you spam forever”. (Compare the first and second identical-looking examples in a checkbox demo.)

But the point is that by choosing the right element for the job, you’re getting browser behaviour for free that makes your app more usable to a range of different people.

Invisible browser behaviours

With me so far? Good. The browser behaviours associated with the semantics of <button> and <label> are obvious once you know about them – because you can see them.

Other semantics aren’t so obvious to a sighted developer with a desktop machine and a nice big monitor, but they are incredibly useful for those who do need them. Let’s look at some of those.

HTML5 has some semantics that you can use for indicating regions on a page. For example, <nav>, <main>, <header>, <footer>.

If you wrap your main content – that is, the stuff that isn’t navigation, logo and main header etc – in a <main> tag, a screen reader user can jump immediately to it using a keyboard shortcut. Imagine how useful that is – they don’t have to listen to all the content before it, or tab through it to get to the main meat of your page.

And for people who don’t use a screenreader, that <main> element doesn’t get in the way. It has no default styling at all, so there’s nothing for you to remove. For those that need it, it simply works; for those that don’t need it, it’s entirely transparent.

Similarly, using <nav> for your primary navigation provides screenreader users with a shortcut key to jump to the navigation so they can continue exploring your marvellous site. You were probably going to wrap your navigation in a <div class=”nav”> to position it and style it; why not choose <nav> instead (it’s shorter!) and make your site more usable to the 15% of the world who have a disability?

For more on this, I humbly point you to my 2014 post Should you use HTML5 header and footer?. A survey of screenreader users last year showed that 80% of respondents will use regions to navigate – but they can only do so if you choose to use them instead of wrapping everything in <div>s. Now you know they exist, why wouldn’t you use them?

New types of devices

We’re seeing more and more types of devices connecting to the web, and semantic HTML can help these devices display your content in a more usable way to their owners. And if your site is more usable than your competitors’, you win, and your boss will erect a massive gold statue of you in the office car park. (Trust me, your boss told me. They’ve already ordered the plinth.)

Let’s look at one example, the Apple Watch. Here are some screenshots and excerpts from the transcript of an Apple video introducing watchOS 5:

We’ve brought Reader to watchOS 5 where it automatically activates when following links to text heavy web pages. It’s important to ensure that Reader draws out the key parts of your web page by using semantic markup to reinforce the meaning and purpose of elements in the document. Let’s walk through an example. First, we indicate which parts of the page are the most important by wrapping it in an article tag.

diagram of an article element wrapping content on an Apple Watch

Specifically, enclosing these header elements inside the article ensure that they all appear in Reader. Reader also styles each header element differently depending on the value of its itemprop attribute. Using itemprop, we’re able to ensure that the author, publication date, title, and subheading are prominently featured.

Apple Watch diagram showing how it uses microdata attributes to layout and display information about an article

itemprop is an HTML5 microdata attribute. There are shared vocabularies documented at, which is founded by Google, Microsoft, Yahoo and Yandex. Using vocabularies with microdata can make your pages display better in search results:

Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

(If you plan to put things into microdata, please note that Apple, being Apple, go their own way, and don’t use a vocabulary here. Le sigh. See my article Content needs a publication date! for more. Or view source on this page to see how I’m using microdata on this article.)

Apple WatchOS also optimises display of items wrapped in <figure> elements:

Reader recognizes these tags and preserves their semantic styles. For this image, we use figure and figcaption elements to let the Reader know that the image is associated with the below caption. Reader then positions the image alongside its caption.

Apple Watch diagram showing how it lays out figures and captions if appropriately marked up

You probably know that HTML5 greatly increased the number of different <input> types, for example <input type=”email”> on a mobile device shows a keyboard with the “@” symbol and “.” that are in all email addresses; <input type=”tel”> on a mobile device shows a numeric keypad.

On desktop browsers, where you have a physical keyboard, you may get different User Interface benefits, or built-in validation. You don’t need to build any of this; you simply choose the right semantic that best expresses the meaning of your content, and the browser will choose the best display, depending on the device it’s on.

In WatchOS, input types take up the whole watch screen, so choosing the correct one is highly desirable.

First, choose the appropriate type attribute and element tag for your form controls.
WebKit supports a variety of form control types including passwords, numeric and telephone fields, date, time, and select menus. Choosing the most relevant type attribute allows WebKit to present the most appropriate interface to handle user input.

Secondly, it’s important to note that unlike iOS and macOS, input methods on watchOS require full-screen interaction. Label your form controls or specify aria label or placeholder attributes to provide additional context in the status bar when a full-screen input view is presented.

Apple Watch showing different full-screen keyboards for different email types

I didn’t choose to use <article> and itemprop and input types because I wanted to support the Apple Watch; I did it before the Apple Watch existed, in order that my code is future-proof. By choosing the right semantics now, a machine that I don’t know about yet can understand my content and display it in the best way for its users. You are opting out of this if you only use <div> or <span> because, by definition, they have “no special meaning at all”.


I hope I’ve shown you that choosing the correct HTML isn’t purely an academic exercise. Perhaps you can’t use all the above (and there’s much more that I haven’t discussed) but when you can, please consider whether there’s an HTML element that you could use to describe parts of your content.

For instance, when building components that emit banners and logos, consider wrapping them in <header> rather than <div>. If your styling relies upon classnames, <header class=”header”> will work just as well as <div class=”header”>.

Semantic HTML will give usability benefits to many users, help to future-proof your work, potentially boost your search engine results, and help people with disabilities use your site.

And, best of all, thinking more about your HTML will stop Dull Old Web Farts like me moaning at you.

What’s not to love? Have a splendid holiday season, whatever you celebrate – and here’s to a semantic 2019!


On the talismanic fight between RDFa and microdata

A new fight has broken out in specland, between the supporters of RDFa and supporters of microdata. Observers may be wondering why; both are methods of adding extra markup to existing content in order that machines may better understand the content. Semantic Web proponents (note capital letters) dream of a Web where all content is linked by said machines. Semantic Web sceptics have more humble aspirations of search engines better understanding micro-content (is this string of digits a book ISBN, or a phone number?).

RDFa was part of XHTML 2. It became a W3C standard (or, in their vernacular, a “recommendation”) in 2008. microdata was invented by Ian Hickson as part of HTML5 because he identified deficiencies in RDFa. microdata was subsequently modularised out of W3C HTML5, but microdata is part of HTML5; it validates, whereas RDFa doesn’t.

Note the history. Like football fights that break out because one guy called an opposing team fan’s pint “a pouff”, this isn’t about the actual slight at all; this is about the past, allegiances and alliances; it’s a clash of world views. This is XML versus non-XML; it’s the XHTML 2 gang against the uncouth young turks of HTML5. This is Rangers vs Celtic; it’s Blur vs Oasis; it’s Tiswas vs Swap Shop.

(Added 15:30 GMT: R/e my framing the current debate as a talismanic battle, I should point out that I don’t mean Manu (whom I’ve always found to be courteous, thoughful and a jolly good chap). Neither do I mean Marcos, who isn’t a WHATWG-er. But some of the discourse “cowardice”, “suck metadata and fade, for all I care” on one side, and “TimBL’s RDF temple priests still mad as hell” suggests some, er, partisan feeling going on.)

What follows is the observation of a layman; I’ve not used much structured content, so am not an expert (I once tried to use microformats for events at the Law Society, but their accessibility problems prevented it.)

In my opinion, the primary deficiencies of Classic RDFa are that it’s too hard to write. For professional metadata-ologists it may be simple (but, hey, those guys understand Dublin Core!). The difficulty for me as an HTML wrangler was namespacing, CURIEs, and triples. This is XML land, and most web authors are not particularly adept with XML.

There’s also the problem that in order to use RDFa properly, you needed an xmlns attribute which is separate from the content you’re actually marking up (you don’t anymore in RDFa 1.1, see Manu’s comment). In a world where lots of content is syndicated via machine, or copy and pasted by authors (many of whom don’t really understand what they’re copy and pasting), this leads to breakage as not all of the necessary moving parts get transferred to their new environment. Hixie wrote

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.

microdata solves this problem. It’s also easier to write than Classic RDFa (in my opinion) although I’m still mystified by the itemid attribute. I intend to start using microdata on this site soon (in order to plug the holes left by removal of the HTML5 pubdate attribute).

I’ve been recommending that people use microdata. Its main advantages:

Manu Sporny understood the problem that RDFa is hard to author for those of us who find the best ontology is a don’t-ology. Almost a year ago, he set about simplifying RDFa and came up with RDFa Lite. RDFa Lite greatly simplifies RDFa; in fact, you can search and replace microdata terms with RDFa terms (see his post Mythical Differences: RDFa Lite vs. Microdata).

RDFa has multiple advantages, too:

It seems to me that developers should just choose the one that meets their project’s needs. Need valid code Don’t need “full fat” RDFa, need a JavaScript API? Choose microdata. Care about Facebook, don’t care about a JavaScript API? Use RDFa Lite.

The current fight, however, won’t allow that. The RDFa gang want to stop microdata going further in the standardisation process because RDFa became a Recommendation first, and microdata is quite similar to it. (This is a controversial perspective; see Manu’s comment.)

While I completely understand that two competing standards makes it harder for developers in the short term, I agree with Marcos Caceres (who isn’t a WHATWG/ HTML5 zealot) who counters Manu Sporny’s objection to microdata progressing thus:

I don’t see what it being a “Recommendation” has to do with anything – just because it’s a W3C Recommendation does not mean that RDFa has a monopoly on structured data in HTML. So, just because that spec reached Rec first doesn’t mean that it’s somehow better or preferable to any other future solution (including micro data). That would be like objecting to Javascript because assembler (or punch cards) already meet all the use cases…

I hope you will instead focus your energy on convincing the world that RDFa is the “correct technology” on its own merits and not place your bets on a mostly meaningless label (“Recommendation”) given by some (much loved, but) random standard organisation.

I see no technical reason to favour microdata or RDFa Lite; both do the job. So, developers; which tickles your fancies? RDFa Lite or microdata?

microdata help, please

I’m trying to wrap my little bonce around HTML5 microdata, not least because Opera 12 pre-alpha has support for it.

I’m quite discouraged because the two articles I’ve read tell me that it’s easy, but I’m still stuck (although they are by noted brain-boxes Oli Studholme and Tab Atkins).

I’m befuddled over itemid:

Sometimes, an item gives information about a topic that has a global identifier. For example, books can be identified by their ISBN number.

Vocabularies (as identified by the itemtype attribute) can be designed such that items get associated with their global identifier in an unambiguous way by expressing the global identifiers as URLs given in an itemid attribute.

The exact meaning of the URLs given in itemid attributes depends on the vocabulary used.

What actually does this mean? How do I know if a particular vocabulary supports global identifiers for items?

In the spec, some Microdata vocabularies are listed. vCard is one, and the spec says “This vocabulary supports global identifiers for items.” The URL defining the itemtype for vCard doesn’t seem to tell me, and the examples in the spec make no use of itemid.

And, because I understand real examples rather than the theoretical, what’s the practical benefit of itemid?

Specifically, what would I gain by using

<div itemscope itemtype="" itemid="urn:isbn:0867193719">
Rebekah Brooks' Self-portraits (ISBN 0867193719)

[example simplified from an example in the spec which says “The “” vocabulary in this example would define that the itemid attribute takes a urn: URL pointing to the ISBN of the book.”]

over’s Book schema (which doesn’t seem to use itemid – in fact, seems to make no mention of it):

<div itemscope itemtype="">
Rebekah Brooks' Self-portraits (ISBN <span itemprop="isbn">0867193719</span>)

Double points if you can answer the question without baffling me with mentions of SPARQLy OWLs and Don’tologies.


While I was on my holidays there was a storm(ette) about rev=canonical and how it isn’t possible in HTML 5 because rev isn’t in the spec. (Apparently, the answer is to use rel=shortlink instead).

Mark Pilgrim published an article about link relations in HTML 5 with more information about the rel attribute, which I found interesting; I had no idea that relations such as rel=license and rel=author were available to allow auto-discovery of license information, and author details.

So I want to float the idea of rel=accessibility that would allow assistive technologies to discover and offer shortcuts to accessibility information, such as a WCAG 2 conformance claim, or a form to request content in alternate formats (for example).

The reason this would be useful is that links to such pages are generally right down in the footer of the web pages. This means that, for screenreader users, they have to navigate to the end of the page to find the link, or not know it exists.

Ironically, on sites that really do need a link to accessibility help (because of lack of structure to navigate with or huge amounts of content before the footer), those who need it are unlikely to find the link to the help.

In the “bad old days”, helpful developers would give an accesskey attribute to that link (which are generally undiscoverable to the human or to a parser, and which often conflict with assistive technologies’ command keystrokes).

A standardised way of indicating the related accessibility information would be better and not rely on arbitrary keys chosen by a developer.

So, should I propose that rel=accessibility be added to the list of values? It looks to be an arduous process; although you don’t need to prove your worth to the HTML 5 gatekeepers, you do have to prove your worth to the microformats gatekeepers.

I thought I’d ask you guys first—is this a good idea?

Microformats, accessibility, HTML 5 (again)

Microformats are a good idea, but some have accessibility problems because they expose machine data to humans by misusing the abbr element. These problems led to the BBC removing those microformats from their sites.

One such misuse is encoding dates and times in microformats such as hCalendar, hAtom, and hReview. Ultimately, this problem goes away in HTML 5, as that introduces a time element which is obviously better than an abbreviation for marking up dates and times (a tenet of microformats is to “use the most accurately precise semantic XHTML building block for each object”).

So, an example of an HTML 4 based hCalendar microformat (taken from the spec) is

<div class="vevent">
<a class="url" href=""></a>
<span class="summary">Web 2.0 Conference</span>:
<abbr class="dtstart" title="2007-10-05">October 5</abbr>-
<abbr class="dtend" title="2007-10-20">19</abbr>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>

After replacing the abbr element with time and replacing its title attribute with datetime we get

<div class="vevent">
<a class="url" href=""></a>
<span class="summary">Web 2.0 Conference</span>:
<time class="dtstart" datetime="2007-10-05">October 5</time>-
<time class="dtend" datetime="2007-10-20">19</time>,
at the <span class="location">Argent Hotel, San Francisco, CA</span>

You can test how it renders in your fave browsers (and the other ones) on the microformats with time test page.

Replacing the abbr pattern elsewhere

Of course, this only works for dates and times. Other microformats use the flawed abbr pattern to code locations in microformats such as hCard, hCalendar & ‘geo’.

Here’s an example

Let’s go to <abbr class="geo" title="30.300474;-97.747247">Austin, TX</abbr>

Which, to a screenreader set to expand abbreviations, is an incomprehensible string of numbers (mp3):

“Thirty point three oh oh four seven four semi-colon minus ninety-seven point seven four seven two four seven”

A leading microformatter, Ben Ward, recently proposed an extension to the value-excerption pattern (no, I don’t know what that means, either) which allows this machine-readable information to remain in the DOM while hiding machine date from people.

My test page plays nicely with the following screenreaders:

  • JAWS 9 and JAWS 10 on Firefox 3 and IE 7 with high verbosity settings and abbr and acronym set to always be expanded (says Jared Smith)
  • NVDA on Firefox 3
  • Opera 9.63 and Voicover 10.5.5 (thanks Henny)
  • Window-Eyes with IE 7 on WinXP (thanks dotjay)
  • WinXP Window-Eyes with FF3 and full punc reads human date. It pauses before the .dtstart spans in Mouse mode, but not in Browse mode (thanks dotjay)
  • Safari 3.2.1 on OS X Leopard with VoiceOver (thanks dotjay)
  • WinXP Window-Eyes with IE 6 and full punc reads human date (thanks dotjay)

I’m really excited by this as it may be the end of the microformats versus accessibility debates (that I’ve helped stir up). If you have access to assistive techologies, please give it a test.

Problems with this pattern from a microformats perspective might be

  • The machine data is “hidden” so might more easily fall out of sync with the human data
  • The machine data must be the first child of the property. If it isn’t, a parser won’t see it – but it will be trickier to debug because the developer will still see it in the source code

I hope, if screenreader and parser testing allows, that the new pattern will be adopted so that those of us who want to use microformats and care about accessibility can use it.

The future of microformats in HTML 5

I had naively thought that many microformats would use the HTML data-* attributes, which are for “embedding custom non-visible data” and thus seem perfect for embedding such information on structutal markup.

Any element in HTML 5 can have any number of data-* attributes. The asterisk is a wildcard; you can call them what you want. An example from the spec:

<div class="spaceship" data-id="92432"
data-weapons="laser 2" data-shields="50%"
data-x="30" data-y="10" data-z="90">

However, the spec goes on to say

User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.

I was uncertain what this meant, so asked Anne van Kesteren who told me that these attributes are for passing data to scripts that are private to the page, rather than to indicate meaning to external parsers:

It’s so that non-private extensions that need User Agent implementation a) won’t break sites using those names for other purposes and b) get due consideration by the Working Group and a proper name without data-

This is made explicit in an addition to the spec last night by the editor, Ian Hickson:

This is because these attributes are intended for use by the site’s own scripts, and are not a generic extension mechanism for publicly-usable metadata.

So, microformats won’t be “rolled up” into HTML 5. I imagine that some of the microformats community will wish to lobby the HTML 5 working group for a proper name for the data they wish to store with an element, so that the data can be parsed reliably without the “hacky” nature of the microformats. There is a process for adding new features to the spec.

Others will want to ignore caveats that HTML 5 places on using the data-* attributes for publicly-usable metadata and use them anyway.

Perhaps most likely, microformats will continue to use class=, rel=, and the like as they do now. That would be valid HTML 5 and require no changes to specs or parsers.

So it seems that microformats will continue in an HTML 5 world. And, now that there seems to be a will to fix the accessibility problems, I think that’s a good thing.