Archive for the 'semantics' Category

On the talismanic fight between RDFa and microdata

A new fight has broken out in specland, between the supporters of RDFa and supporters of microdata. Observers may be wondering why; both are methods of adding extra markup to existing content in order that machines may better understand the content. Semantic Web proponents (note capital letters) dream of a Web where all content is linked by said machines. Semantic Web sceptics have more humble aspirations of search engines better understanding micro-content (is this string of digits a book ISBN, or a phone number?).

RDFa was part of XHTML 2. It became a W3C standard (or, in their vernacular, a “recommendation”) in 2008. microdata was invented by Ian Hickson as part of HTML5 because he identified deficiencies in RDFa. microdata was subsequently modularised out of W3C HTML5, but microdata is part of HTML5; it validates, whereas RDFa doesn’t.

Note the history. Like football fights that break out because one guy called an opposing team fan’s pint “a pouff”, this isn’t about the actual slight at all; this is about the past, allegiances and alliances; it’s a clash of world views. This is XML versus non-XML; it’s the XHTML 2 gang against the uncouth young turks of HTML5. This is Rangers vs Celtic; it’s Blur vs Oasis; it’s Tiswas vs Swap Shop.

(Added 15:30 GMT: R/e my framing the current debate as a talismanic battle, I should point out that I don’t mean Manu (whom I’ve always found to be courteous, thoughful and a jolly good chap). Neither do I mean Marcos, who isn’t a WHATWG-er. But some of the discourse “cowardice”, “suck metadata and fade, for all I care” on one side, and “TimBL’s RDF temple priests still mad as hell” suggests some, er, partisan feeling going on.)

What follows is the observation of a layman; I’ve not used much structured content, so am not an expert (I once tried to use microformats for events at the Law Society, but their accessibility problems prevented it.)

In my opinion, the primary deficiencies of Classic RDFa are that it’s too hard to write. For professional metadata-ologists it may be simple (but, hey, those guys understand Dublin Core!). The difficulty for me as an HTML wrangler was namespacing, CURIEs, and triples. This is XML land, and most web authors are not particularly adept with XML.

There’s also the problem that in order to use RDFa properly, you needed an xmlns attribute which is separate from the content you’re actually marking up (you don’t anymore in RDFa 1.1, see Manu’s comment). In a world where lots of content is syndicated via machine, or copy and pasted by authors (many of whom don’t really understand what they’re copy and pasting), this leads to breakage as not all of the necessary moving parts get transferred to their new environment. Hixie wrote

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.

microdata solves this problem. It’s also easier to write than Classic RDFa (in my opinion) although I’m still mystified by the itemid attribute. I intend to start using microdata on this site soon (in order to plug the holes left by removal of the HTML5 pubdate attribute).

I’ve been recommending that people use microdata. Its main advantages:

Manu Sporny understood the problem that RDFa is hard to author for those of us who find the best ontology is a don’t-ology. Almost a year ago, he set about simplifying RDFa and came up with RDFa Lite. RDFa Lite greatly simplifies RDFa; in fact, you can search and replace microdata terms with RDFa terms (see his post Mythical Differences: RDFa Lite vs. Microdata).

RDFa has multiple advantages, too:

It seems to me that developers should just choose the one that meets their project’s needs. Need valid code Don’t need “full fat” RDFa, need a JavaScript API? Choose microdata. Care about Facebook, don’t care about a JavaScript API? Use RDFa Lite.

The current fight, however, won’t allow that. The RDFa gang want to stop microdata going further in the standardisation process because RDFa became a Recommendation first, and microdata is quite similar to it. (This is a controversial perspective; see Manu’s comment.)

While I completely understand that two competing standards makes it harder for developers in the short term, I agree with Marcos Caceres (who isn’t a WHATWG/ HTML5 zealot) who counters Manu Sporny’s objection to microdata progressing thus:

I don’t see what it being a “Recommendation” has to do with anything – just because it’s a W3C Recommendation does not mean that RDFa has a monopoly on structured data in HTML. So, just because that spec reached Rec first doesn’t mean that it’s somehow better or preferable to any other future solution (including micro data). That would be like objecting to Javascript because assembler (or punch cards) already meet all the use cases…

I hope you will instead focus your energy on convincing the world that RDFa is the “correct technology” on its own merits and not place your bets on a mostly meaningless label (“Recommendation”) given by some (much loved, but) random standard organisation.

I see no technical reason to favour microdata or RDFa Lite; both do the job. So, developers; which tickles your fancies? RDFa Lite or microdata?

Reading List

Industry

Misc

Scooby Doo and the proposed HTML5 <content> element

Note: Since writing this, I’ve continued vacillating and now support a <main> element. Why I changed my mind about the <main> element.

Trigger warning: contains disagreement about accessibility.

I’ve been vacillating (ooh err, missus) for two weeks from one opinion to the other regarding a proposed (and rejected) <content> element. This weekend, The Mighty Steve Faulkner wrote an unofficial draft of a <maincontent> element.

Dude, where’s my content?

For a while, people have suggested that HTML add a <content> element that wraps main content, because many websites have something like <div id="content"> surrounding the area that authors identify as their main content, which they then use to position and style that central content area.

Fans of WAI-ARIA also like to hang role="main" on that area, to tell assistive technology where the main content of the page starts. I do this too.

The editor of HTML.next, Ian Hickson, rejected a new <content> element:

What would the element _mean_? If it’s just “the main content”, then that is what the element’s contents would mean even without the element, so really it means the element is meaningless. And in that case, <div> is perfect, since that’s what it is: a grouping element with no meaning.

The primary argument against a special element is that it isn’t necessary, because the beginning of “main content” can be identified by a process of elimination that I call the “Scooby Doo algorithm”: you always know that the person behind the ghost mask will be the sinister janitor of the disused theme park, simply because he’s the only person in the episode who isn’t Fred, Daphne, Velma, Shaggy, or Scooby. (Like most Scooby fans, I’m pretending Scrappy never existed.)

Similarly,the first piece of content that’s not in a <header>, <nav>, <aside>, or <footer> is the beginning of the main content, regardless of whether it’s contained in an <article>, or <div>, or whether it is a direct child of the <body> element.

Authors do need to be able to identify their main content, both for styling (in which case <div> seems to be the most appropriate element) and as a target for “skip links”, in which case, the current <a href=”#main”>Skip nav<a> … <div id=”main”> pattern still does the trick.

It’s worth noting that people often code “skip links” believing it’s required by WCAG 2, but if browsers implemented the Scooby Doo algorithm that is explicitly not the case: “It is not the intent of this Success Criterion to require authors to provide methods that are redundant to functionality provided by the user agent.”

Many assistive technology useragents understand the ARIA role=”main”, so skip links should be unnecessary; ATs can hone in on <div id=”main” role=main> by themselves, even without supporting the Scooby Doo algorithm.

This suggests to me that a new element isn’t required. But…

Paving cowpaths, ease for authors

Chaals (ex-Opera, now Yandex) wrote

To turn the question around, if it is more convenient for authors to identify the main content, and not think about the classification of other parts, should we offer that facility as part of the platform? Or does it make sense to say that only the exhaustive identification of all supplementary content is an appropriate way to mark up a page anyway?

Chaals argues that it makes authoring easier – suddenly you get extra accessibility by just adding one <content> element, rather than adding the other elements that the Scooby Doo algorithm can then exclude. People using CMSs, who only control the textarea that gets lumped in as “main content” and can’t touch the surrounding areas can now add an element, without having to ask others to tweak templates.

But then, they can do this already, by surrounding their content with <div role=”main”> and this already works in assistive technologies.

A flawed argument for a new element is that it paves a cowpath, so should be added to the language. It’s certainly the case that <div id=”main”> and <div id=”content”> are very frequently found in pages – they were #2 and #6 in the most-frequently used ID attributes in the 2008 MAMA: What is the Web made of? report.

But not every cowpath needs paving. If it did, we’d also have a <logo> and a <container> element (#4 and #5 respectively), and we’d be recommending tables for layout. If something can be done automatically, without requiring extra authorial work, shouldn’t that be favoured? In the same way that we like HTML5 form types as they’re baked into the browser, shouldn’t the Scooby Doo algorithm be preferable?

Of course, the Scooby Doo algorithm requires the author to use <header>, <footer> <nav> and <aside> — but if (s)he doesn’t want/ isn’t able to author HTML5, ARIA’s role="main" is there precisely as bridging technology.

There’s also the argument that authors expect there to be a <content> element, so its absence violates the Principle of Least Surprise. But I’m not sure that’s a valid argument. Implementing the Scooby Doo algorithm would mean that pages whose author does nothing for accessibility can be made so that their main content area may be programmatically determined. ARIA exists for pages that aren’t in HTML5, or until the Scooby Doo algorithm is widely supported, and analysis shows that most ARIA is correctly used by authors.

Why add an extra complexity, which is more to go wrong and thus potentially harms accessibility?

Also available:

On the publication of Editor’s draft of the <picture> element

On Tuesday, a w3C Editor’s Draft of a proposal for HTML Responsive Images Extension was published, edited by Matt Marquis and Microsoft’s Adrian Bateman:

This proposal adds new elements and attribute to [HTML5] to enable different sources of images based on browser and display characteristics. The proposal addresses multiple use cases such as images used in responsive web designs and different images needed for high density displays.

It’s a mash-up of the strawman <picture> element I proposed in December last year, and the Hixie-blessed srcset attribute on img proposed by Ted O’Connor of Apple. (See Responsive images: what’s the problem, and how do we fix it? for more background.)

Of course, the publication of an editor’s draft means nothing. There isn’t any indication that any browser manufacturer will implement it. Philip Jagenstadt of Opera (but speaking personally) has already commented

Personally, I don’t think it’s a good idea to reuse the syntax of <video> + <source>, since it invites people to think that they work the same when they in fact are very different. It could be made to work, but the algorithm needs a lot more detail. Disclaimer: This is neither a “yay” nor “nay” from Opera Software, just my POV from working on HTMLMediaElement.

Those eager to bash Hixie and the WHATWG are using the new spec as if it were a cudgel; “this is how you deal with Hixie and WHATWG” says Marc Drummond.

I don’t think that’s productive. What is productive is the debate that this publication will (hopefully) foster. Two different but connected issues need debating. Firstly, the technical debate needs to continue. There are problems with the <picture> element, such as Philip points out, and its verbosity and potential to be a maintainability nightmare, as pointed out by Matt Wilcox.

And we need to work out how all can play a role in specifying the Web. The W3C dropped the ball with XHTML 2 because it was a philosophically elegant, intellectually satisfying specification that bore absolutely no relevance to the real world.

WHATWG picked up that ball, and grew the Web, because they knew what developers wanted: Flash and Silverlight’s application capabilities. The browser manufacturers who made up the early WHATWG knew that if those capabilities weren’t added to browsers, they would die.

But the gatekeepers of WHATWG aren’t jobbing developers or designers. Nothing wrong in that: neither am I any more. But their inability or unwillingness to account for the art direction use case has caused them to drop the ball here.

We need to get everyone’s voices heard: — people like Philip with implementor’s experience, people like Hixie with his encyclopaedic understanding of how specs inter-relate, and also solicit and understand the needs of those who make websites but can’t express their requirements in the language of specs, algorithms and IDL.

I don’t know how we can do this. The CSS Working Group had designers attending as invited experts for a short while, but I’m told that wasn’t particularly useful for either group. A group of designers called the CSS Eleven hailed themselves as “an international group of visual web designers and developers who are committed to helping the W3C’s CSS Working Group to better deliver the tools that are needed to design tomorrow’s web” and then promptly disappeared.

So congratulations to Matt Marquis and the others in the Responsive Images Community Group who have had the staying power to produce a spec for discussion.

Hopefully, this strawman spec can be refined into something workable that gives developers a simple, declarative way to send different-sized, art directed images depending on bandwidth and device characteristics. This means organisations waste less bandwidth, and it’s a faster (and cheaper) Web for consumers — you know, the great unwashed that we make websites for.

Added 8 September: The road to responsive images – useful introduction to actually authoring it in your websites.

Why HTML5 urgently needs an HTML adaptive images mechanism

After the recent kerfuffle about the draft HTML specification for a mechanism for adaptive images, and an excellent compromise suggestion by Florian Rivoal (Opera’s CSS WG rep), the mailing lists have gone quiet again.

(If you don’t know why we need such a mechanism, read Matt “Grrr” Wilcox‘s article Responsive images: what’s the problem, and how do we fix it?.)

We’ve needed such a mechanism for a while, but it’s particularly urgent now. Webkit has implemented a new non-standard CSS function called -webkit-image-set:

selector {
background: image-set(url(foo-lowres.png) 1x,
                      url(foo-highres.png) 2x) center;}

The changeset describes it:

The idea behind the feature is to allow authors to provide multiple variants of the same image at differing resolutions, and to allow the User Agent to choose the resource that is most appropriate at the time. This patch will choose the most appropriate image based on device scale factor.

(Read the email/ specification for -webkit-image-set.)

As far as I can tell, it will be in Safari 6 (OS X 10.8, shipping in July) and Mobile Safari for iOS 6, so we’ll be able to do adaptive background images but not adaptve content images.

Once that happens, I bet it’s 0.03 nanoseconds before developers convert <img src=”foo.blah” alt=”useful alternate text”> into an empty <div>s with adaptive background images using -webkit-image-set and (please!) a fallback background images for non-webkit browsers.

Why wouldn’t they? Retina screens will download massive but beautiful images, while users of lower-res phones save bandwidth by downloading smaller images instead of huge images and then reducing them down. Problem solved.

And, unfortunately, people who don’t download images, or can’t perceive images won’t get any alternate text, as there is no content <img> element any more. Those guys have always been last in the pecking order and there’s no reason to assume they’ll get a better deal this time.

And that’s why some agreement on <picture>, <pic>, <img srcset=”…”> (whatever it is; I don’t much care) needs to happen, and fast.

Update 22 June 2012: WHATWG conversation mysteriously dried, so Matthew Marquis wrote to W3C HTML WG and proposed Florian’s suggestion.

Reading List

Standards

  • Once again, discussion of installable/ packaged web app thingies. Major points:
    • Mozilla doesn’t want to reuse the W3C Widget spec because they want hosted, not installable apps. And the name “widget” is rubbish.
    • Hosted web apps are .. er .. web sites. But wrapped up, with a manifest, they can be monetized in a synergy of micropayment leverage.
    • “Installing” packaged apps can give them greater permissions than web sites. Hixie responded

      The “installation” security model of asking the user up-front to grant trust just doesn’t work because users don’t understand the question, and the “installation” security model of curating apps and trying to determine by empirical examination whether an application is trustworthy or not just doesn’t scale.

  • Two New Proposals to Solve the CSS3 Vendor Prefix Crisis
  • Last week I linked to an article that claimed adherence to WCAG didn’t magically solve all blind users’ problems. Here’s a rebuttal Methodological flaws put question mark on study of the impact of WCAG on user problems. It seems to me obvious that, while WCAG is very useful, it doesn’t replace brain, or testing.

Mobile and devices

Web Biz

Misc

Web Standards Song

“Like A Rounded Corner”, by Bruce and The Standardettes

The best of <time>s

(Article updated to correct some typos noticed by commenters, and clarify some aspects.)

Avid HTML5 watchers will know that the <time> element was dropped from HTML, then re-instated, with more New! Improved! semantics.

As before, you can put anything you like between the opening and closing tags – that’s the human-readable bit. The machine-readable bit is contained within a datetime attribute. Dates are expressed YYYY-MM-DD.

Previously, you could only mark up precise dates. So, 13 November 1905 could be expressed in HTML <time datetime="1905-11-13"> but November 1905 couldn’t be. This is a problem for historians where sometimes the precise date isn’t known.

Now, “fuzzy dates” are possible:

  • <time datetime="1905"> means the year 1905
  • <time datetime="1905-11"> means November 1905
  • <time datetime="11-13"> means 13 November (any year)
  • <time datetime="1905-W21"> means week 21 of 1905

As before, times are expressed using the 24 hour clock. Now, you can separate the date and time with a space rather than a “T” (but you don’t have to). So both of these are valid:

  • <time datetime="1905-11-13T09:00">
  • <time datetime="1905-11-13 09:00">

You can localise times, as before. Appending “Z” to the timezone indicates UTC (a way of saying “GMT” without it being comprehensible to normal people). Otherwise, use an offset:

  • <time datetime="09:00Z"> is 9am, UTC.
  • <time datetime="09:00-05:00"> is 9am in the timezone 5 hours behind UTC.
  • <time datetime="09:00+05:45"> is 9am in Nepal, which is UTC + 5 hours and 45 minutes.

Durations

In New! Improved! HTML5 <time>, you can represent durations, with the prefix “P” for “period”.

The datetime attribute “D” for days, “H” for hours, “M” for minutes and “XQ” for seconds. Just kidding – that’s “S”.

You can separate them with spaces (but you don’t have to). So <time datetime="P4D"> is a duration of 4 days, as is <time datetime="P 4 D">.

Using a “T” after the “P” marker allows you to be more precise: <time datetime="PT23H 9M 2.343S"> is a duration of 23 hours, 9 minutes and 2.345 seconds.

Alternatively, you can use a duration time component.

Whichever you choose, it’s represented internally as a number of seconds. Because of this, you can’t specify a duration in terms of months, because a month isn’t a precise number of seconds; a month can last from 28 to 31 days. Similarly, a year isn’t a precise number of seconds; it’s 12 months and February sometimes has an extra day.

You still can’t represent dates before the Christian era, as years can’t be negative. Neither can you indicate date ranges. To mark up From “21/02/2012 to 25/02/2012″, use two separate <time> elements.

pubdate

The pubdate attribute (a boolean that indicates that this particular date is the publication date of the parent <article> (or, if there is none, the whole document) is currently missing from the W3C version of the spec, but is still current in the WHATWG version. Its status is unclear to me (but I’m still using it).

Update 10 August 2012: in response to a query, I checked again and pubdate is gone from both the WHATWG and W3C specs.

Meet NEWT: New Exciting Web Technologies

As professionals, we need to keep our jargon accurate. That’s why I hate hearing “HTML5″ used as an umbrella term for any web technology, especially when people confuse HTML5 (mark-up and APIs) with CSS 3 (eye-candy). Repeat after me: HTML5 != CSS 3.

So we need a buzzword, as “HTML5 and related technologies” is unwieldy (and inaccurate). I prefer “NEWT” which stands for New Exciting Web Technologies and can thus safely encompass real-HTML5, CSS 3, SVG, XHR2, Geolocation, Web Sockets, WOFF, Web DB, IndexedDB, WebGL and the like.

Because new acronyms need a logo, the talented and generous Rob Winters made a cute newt.

various sizes of cute newt on Flickr

(Other sizes, Transparent-background version.)

Please, use the logo and term in your presentations, and wave goodbye to misery of I.B.E. (Inaccurate Buzzword Embarrassment)!

(Be nice if you attributed Rob, and even me, but no compulsion.)

HTML5 details element, built-in and bolt-on accessibility

Now that the HTML5 spec is settling down, we’re seeing a resurgance of people attempting to cut good bits out. One of the latest targets is the very useful details element. This is an expanding and contracting element that you can use to “hide” more details without JavaScript. It’s not supported anywhere yet, but my glamorous co-author Remy cleverly fakes it on his 2010 Full Frontal conference page. His code is


 <details>
  <summary>Get in touch</summary>
  <p>Interested in sponsoring? Get in touch.</p>
 </details>

If you click on the text inside the summary element (which defaults to the text “details” if absent), the details element expands. Click again, and it contracts.

Anyway, the argument against this element is that it duplicates what you can do already with a div, some JavaScript and some WAI-ARIA information to add role and state information to this widget. That’s true, but it completely misses the point that one of the reasons HTML5 is so interesting to developers is precisely because it greatly simplifies web development, meaning you don’t have to muck around with script and ARIA.

The new HTML5 forms UI widgets and validation (supported in Opera), for example, are entirely superfluous, you could argue, because you can do the same in JavaScript. Why have built-in sliders for input type=range or input type=date or even good old fashioned textarea when you could have one single input element and script everything else?

The answer is because it’s much more hassle to pull in script and keep role and state info with ARIA. It’s much better to build these things into the language, with built-in accessibility because the browser vendors bake in role and state information and hook up the widgets to the operating system.

Expecting developers to reinvent this wheel every time is “bolt-on” accessibility, and we all know that they won’t actually do it. Yes, they can use a library. I’ve seen the whole jQuery library pulled in just to make an expando box behave like details.

I refer you to John Foliot, whose email to the HTML Working Group makes an excellent argument that explains the subject much better than I can. I should add that Foliot is by no means an HTML5 cheerleader and is a great advocate of accessibility and ARIA. His main points are

  • Elements are better than attributes for semantics, javascript, css
  • If something needs to be referenced you should normally make it an element, with an id attribute
  • If something represents a separate entity, make it an element: you might want to add attributes later
  • having a simple element express something instead of requiring a bunch of lines of scripting is way more efficient.
  • moving functionality into the browser natively, instead of relying on scripting which requires additional CPU is an efficient design choice
  • having ‘accessibility’ built-in rather than bolt-on (which is what ARIA is) is a fundamental desire of most of us in the a11y community.

I recommend you read the whole email and his blogpost Thoughts on <details> versus @summary .

Added June 2011: Implementing HTML5 <details>.

HTML 5 is a mess

HTML 5 is a mess”, said John Allsopp.

I agree. It is. It’s also several different kind of messes, not all of which are avoidable or bad. Let’s look at them.

The backwards compatibility mess

The first kind of mess is because it builds on HTML 4. I’m sure if you were building a mark-up language from scratch you would include elements like footer, header and nav (actually, HTML 2 had a menu element for navigation that was deprecated in 4.01).

You probably wouldn’t have loads of computer science oriented elements like kbd,var, samp in preference to the structural elements that people “fake” with classes. Things like tabindex wouldn’t be there, as we all know that if you use properly structured code you don’t need to change the tab order, and accesskey wouldn’t make it because it’s undiscoverable to a user and may conflict with assistive technology. Accessibility would have been part of the design rather than bolted on.

But we know that now; we didn’t know that then. And HTML 5 aims to be compatible with legacy browsers and legacy pages. This page is written in HTML 5, and although your browser doesn’t “understand” it, it it still renders it.

There was a cartoon in the ancient satirical magazine Punch showing a city slicker asking an old rural gentleman for directions to his destination. The rustic says “To get there, I wouldn’t start from here”. That’s where we are with HTML. If we were designing a spec from scratch, it would look much like XHTML 2, which I described elsewhere as “a beautiful specification of philosophical purity that had absolutely no resemblance to the real world”, and which was aborted by the W3C last week.

That’s one reason why HTML 5 a mess. It’s built on a mess.

The process mess

Specifying HTML 5 is probably the most open process the W3C has ever had. Different groups with different interests battle it out over the mailing lists. Competing variant specs are springing up: Manu Sporny is devising a spec that incorporates RDFa in HTML 5, while The Mighty Steve Faulkner is writing one addressing accessibility issues including alt text and summary attributes. I’m hopeful that these will all be merged together, while simultaneusly, parts of the spec are being split out into separate specifications where there are no dependancies, such as Web Storage and Web Database.

This amorphous process is messy. But, in my opinion, infinitely better than the ivory tower of XHTML 2 academia.

The spec mess

Then, of course, there’s the real mess that John Allsopp and Matt Wilcox complain about—imprecise, ambiguous specification or unnecessary restrictions.

Why the restrictions on the time element? Why the overlap between figure and aside?

Why is the nav element so loosely specified? (“Only sections that consist of major navigation blocks are appropriate for the nav element”—define “major”, if you please.) Why can you have nav inside a header and not in a footer?

If these things bother you, you can do something about itemail the Working Group and make your feelings known.

Post-publication clarifications

I should clarify two things after some tweets and a Skype conversation with Lachlan Hunt. The first is in fairness to Hixie: he changed the definition of nav to clarify that there can be more than one per page, and I agreed that the new defintion was an improvement. But, as I’m a self-confessed dimwit, I only realised later that the word “major” leaves ambiguity as well. So that’s my brain parsing slowly, not Hixie’s bad.

Secondly, and most importantly, I haven’t had some kind of damascene conversion to the anti-HTML 5 camp. I firmly believe that it’s the best way forward for an Open Standard that allows us to write richer web pages and applications. And I’m sure that no-one, especially those in the Working Group, believes it’s perfect already. In fact, the Working Group consistently call for developer participation in reviewing the spec.

To reiterate: bitching in blogs (like I’m doing now!) and on twitter is OK, but the best way to be sure that your views are heard is to mail to Working Group.