On the talismanic fight between RDFa and microdata

A new fight has broken out in specland, between the supporters of RDFa and supporters of microdata. Observers may be wondering why; both are methods of adding extra markup to existing content in order that machines may better understand the content. Semantic Web proponents (note capital letters) dream of a Web where all content is linked by said machines. Semantic Web sceptics have more humble aspirations of search engines better understanding micro-content (is this string of digits a book ISBN, or a phone number?).

RDFa was part of XHTML 2. It became a W3C standard (or, in their vernacular, a “recommendation”) in 2008. microdata was invented by Ian Hickson as part of HTML5 because he identified deficiencies in RDFa. microdata was subsequently modularised out of W3C HTML5, but microdata is part of HTML5; it validates, whereas RDFa doesn’t.

Note the history. Like football fights that break out because one guy called an opposing team fan’s pint “a pouff”, this isn’t about the actual slight at all; this is about the past, allegiances and alliances; it’s a clash of world views. This is XML versus non-XML; it’s the XHTML 2 gang against the uncouth young turks of HTML5. This is Rangers vs Celtic; it’s Blur vs Oasis; it’s Tiswas vs Swap Shop.

(Added 15:30 GMT: Re my framing the current debate as a talismanic battle, I should point out that I don’t mean Manu (whom I’ve always found to be courteous, thoughtful and a jolly good chap). Neither do I mean Marcos, who isn’t a WHATWG-er. But some of the discourse (“cowardice”, “suck metadata and fade, for all I care” on one side, and “TimBL’s RDF temple priests still mad as hell” on the other) suggests some, er, partisan feeling going on.)

What follows is the observation of a layman; I’ve not used much structured content, so am not an expert. (I once tried to use microformats for events at the Law Society, but their accessibility problems prevented it.)

In my opinion, the primary deficiency of Classic RDFa is that it’s too hard to write. For professional metadata-ologists it may be simple (but, hey, those guys understand Dublin Core!). The difficulty for me as an HTML wrangler was namespacing, CURIEs, and triples. This is XML land, and most web authors are not particularly adept with XML.

There’s also the problem that in order to use RDFa properly, you needed an xmlns attribute which is separate from the content you’re actually marking up (you don’t anymore in RDFa 1.1, see Manu’s comment). In a world where lots of content is syndicated via machine, or copied and pasted by authors (many of whom don’t really understand what they’re copying and pasting), this leads to breakage as not all of the necessary moving parts get transferred to their new environment. Hixie wrote:

Copy-and-paste of the source becomes very brittle when two separate parts of a document are needed to make sense of the content. Copy-and-paste is how the Web evolved, so I think it is important to keep it functional and easy.
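
To make that brittleness concrete, here is a minimal Python sketch. It is only a toy model of Classic RDFa prefix handling, not a real RDFa parser: it assumes the prefix is declared with an xmlns:* attribute somewhere in the markup and that properties are written as CURIEs such as dc:title. The function name, the regular expressions and the sample markup are all made up for illustration.

```python
import re

# Toy model (an assumption, not a real parser): xmlns:* declarations sit on an
# ancestor element, while CURIEs such as "dc:title" sit on the content itself.
XMLNS_RE = re.compile(r'xmlns:([\w-]+)="([^"]*)"')
PROPERTY_RE = re.compile(r'property="([^"]*)"')


def resolve_properties(markup):
    """Expand each property CURIE to a full IRI, or None if its prefix is undeclared."""
    prefixes = dict(XMLNS_RE.findall(markup))
    resolved = {}
    for curie in PROPERTY_RE.findall(markup):
        prefix, _, reference = curie.partition(":")
        base = prefixes.get(prefix)
        resolved[curie] = base + reference if base is not None else None
    return resolved


# Hypothetical page: the prefix declaration lives on the wrapper element.
page = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<p><span property="dc:title">An Example Book</span></p>'
    '</div>'
)
# What an author gets by copying only the paragraph they care about.
pasted = '<p><span property="dc:title">An Example Book</span></p>'

print(resolve_properties(page))
print(resolve_properties(pasted))
```

In the full page the dc: prefix resolves to a Dublin Core IRI; in the pasted fragment the declaration has been left behind, so the same property can no longer be resolved.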

microdata solves this problem. It’s also easier to write than Classic RDFa (in my opinion) although I’m still mystified by the itemid attribute. I intend to start using microdata on this site soon (in order to plug the holes left by removal of the HTML5 pubdate attribute).

I’ve been recommending that people use microdata. Its main advantages:

Manu Sporny understood the problem that RDFa is hard to author for those of us who find the best ontology is a don’t-ology. Almost a year ago, he set about simplifying RDFa and came up with RDFa Lite. RDFa Lite greatly simplifies RDFa; in fact, you can search and replace microdata terms with RDFa terms (see his post Mythical Differences: RDFa Lite vs. Microdata).
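
A rough sketch of that search and replace, in Python. The attribute mapping is my reading of Manu’s post (itemtype becomes vocab plus typeof, itemscope is dropped, itemprop becomes property, itemid becomes resource), so treat it as an assumption rather than a reference. The function name and the sample markup are hypothetical, and the sketch assumes every itemscope comes with a double-quoted itemtype.

```python
import re


def microdata_to_rdfa_lite(markup):
    """Rewrite microdata attributes as RDFa Lite attributes (simplified sketch)."""

    def split_type(match):
        # itemtype="http://schema.org/Person"
        #   -> vocab="http://schema.org/" typeof="Person"
        iri = match.group(1)
        parts = re.match(r"(.*[/#])([^/#]+)$", iri)
        if parts is None:
            # No obvious vocabulary prefix: keep the full IRI as the type.
            return f'typeof="{iri}"'
        return f'vocab="{parts.group(1)}" typeof="{parts.group(2)}"'

    markup = re.sub(r'itemtype="([^"]+)"', split_type, markup)
    # Assumption: typeof now marks the item, so the bare itemscope can go.
    markup = re.sub(r'\s+itemscope(?:="")?(?=[\s>])', "", markup)
    markup = re.sub(r"\bitemprop=", "property=", markup)
    markup = re.sub(r"\bitemid=", "resource=", markup)
    return markup


# Hypothetical microdata snippet using a schema.org type.
microdata = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Jane Doe</span>'
    '</div>'
)
print(microdata_to_rdfa_lite(microdata))
```

The element structure and the content are untouched; only the attribute names (and the split of the type IRI into vocabulary and term) change.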

RDFa has multiple advantages, too:

It seems to me that developers should just choose the one that meets their project’s needs. Don’t need “full fat” RDFa, need a JavaScript API? Choose microdata. Care about Facebook, don’t care about a JavaScript API? Use RDFa Lite.

The current fight, however, won’t allow that. The RDFa gang want to stop microdata going further in the standardisation process because RDFa became a Recommendation first, and microdata is quite similar to it. (This is a controversial perspective; see Manu’s comment.)

While I completely understand that having two competing standards makes it harder for developers in the short term, I agree with Marcos Caceres (who isn’t a WHATWG/HTML5 zealot) who counters Manu Sporny’s objection to microdata progressing thus:

I don’t see what it being a “Recommendation” has to do with anything – just because it’s a W3C Recommendation does not mean that RDFa has a monopoly on structured data in HTML. So, just because that spec reached Rec first doesn’t mean that it’s somehow better or preferable to any other future solution (including micro data). That would be like objecting to Javascript because assembler (or punch cards) already meet all the use cases…

I hope you will instead focus your energy on convincing the world that RDFa is the “correct technology” on its own merits and not place your bets on a mostly meaningless label (“Recommendation”) given by some (much loved, but) random standard organisation.

I see no technical reason to favour microdata or RDFa Lite; both do the job. So, developers; which tickles your fancies? RDFa Lite or microdata?

13 Responses to “On the talismanic fight between RDFa and microdata”

Comment by Julian Reschke

Unless I’m missing something, neither RDFa nor Microdata “validate” out of the box. In both cases it depends on the validator actually providing support for extension specs.

Comment by Marcos Caceres

@gunnar, you can’t just disparage Microdata as a “one night hack” while arguing from authority. Each must be taken on its own merit (i.e., how well they meet the use cases – and if they do equally well, what do authors prefer or what is being used in the wild).

I personally don’t know or care which is better. But I oppose people abusing the W3C process in anti-competitive ways because RDFa is failing. It makes the RDFa community look like a bunch of sore, bitter, losers… like companies in their death throes who can’t innovate anymore so they use litigation to stop others from innovating.

Comment by Brian

If you look back through the history of proposals and drafts that have been submitted and failed in the wider community, some even gaining multiple interoperable implementations and reaching REC, it seems plain that Manu’s objections are problematic. The wildest successes on the Web are the result of acceptance and adoption by a community before it ever even reaches REC, not a decree by a standards body.

Comment by Julian Reschke

So out of the four advantages of microdata you list, one is specific to Opera, one isn’t an advantage at all (validity), and one is debatable (what does it matter what schema.org is using exactly?).

This leaves us with the JS API, right?

Comment by Bruce

Julian, a specific behaviour in a browser is an advantage. To most sites out there, if a consortium of search engines say they will use a certain technology to enhance results, that matters. Greatly.

Comment by Manu Sporny

You have a number of mis-representations in your post, Bruce. I certainly don’t think you did it on purpose, but it frames the issue in a way that divides the Web community – and that’s a bad thing. The overall gist of your blog post is mis-guided:

this is about the past, allegiances and alliances; it’s a clash of world views. This is XML versus non-XML; it’s the XHTML 2 gang against the uncouth young turks of HTML5.

No, this is not about that… despite that being a great way to frame the issue to intensify the drama surrounding this technical decision. The current RDFa WG does not consist of the XHTML2 old guard, almost all of the group now are folks that love HTML5 and want to see the Web reach its full potential. Here’s the technical argument:

Converting 99% of the Microdata in the wild to RDFa Lite is as simple as a text search and replace of the Microdata attributes with RDFa Lite attributes. Both languages do the same thing in almost exactly the same way. Their main difference is the name of the attributes used. I hope this makes it clear how similar the two languages are. If a W3C spec already exists for a problem domain (the one that RDFa Lite already addresses), the W3C should have a good reason for publishing a spec (Microdata) that does over 90% of everything in almost exactly the same way.

To put this another way. Let’s say I wanted to create a spec called “Pixel” that duplicated over 90% of the functionality that Canvas provides, and adds a .tesselate() feature. I would expect that the W3C would reject the spec and integrate the feature set into Canvas, if it was that important. To think that we’d have Canvas and Pixel as W3C RECs seems fairly ridiculous, but that’s the position we’re going to be in if both RDFa and Microdata go to REC.

On to the errors in your blog post:

but microdata is part of HTML5; it validates, whereas RDFa doesn’t.

Simply not true.

Go here: http://validator.w3.org/nu/

Select “options”, then click on the “Presets” drop-down. As you can see, the first two options in the list support validation of RDFa Lite 1.1 /and/ RDFa 1.1. This support has been in there for some time.

There’s also the problem that in order to use RDFa properly, you need an xmlns attribute

You don’t need the xmlns attribute at all for RDFa 1.1 – that attribute has been deprecated. You also don’t need to use prefixes at all if your use case is simple (like most of the schema.org markup). In fact, usage of xmlns: in RDFa Lite 1.1 makes the HTML5+RDFa Lite 1.1 document non-conforming.

The RDFa gang want to stop microdata going further in the standardisation process because RDFa became a Recommendation first, and microdata is quite similar to it.

No, don’t do that. Don’t lump us all together like we’re one homogeneous group. I know that it’s human nature to categorize and put one group against another, but that’s not how I work… and it’s not how many of the other folks in the Web standards groups work. There are a number of varied opinions on this matter. I made the comment specifically as an individual and member of the HTML Working Group. This does not represent the opinion of other folks in the RDFa Working Group… in fact, some of them are pissed that I said anything at all. This is a controversial topic, and it’s hard enough to have a sane technical discussion about the alternatives without people asserting motives and intentions that you have never had in a public forum. Both you and Marcos have now asserted motives and intentions that are completely baseless and serve no purpose other than to froth the debate with unnecessary conjecture.

There is no “RDFa gang”, and nobody is trying to “stop Microdata”. What I do want to happen is for W3C to have a /rational and logical discussion/ about why they’re publishing two specifications that do almost exactly the same thing in almost exactly the same way. All of us want what’s good for the Web and Web developers, and in order for us to fulfill our charge as stewards of the Web’s architecture, we must have these painful discussions from time to time.

Comment by Gunnar Bittersmann

@Marcos: That’s what I’ve heard. Well, maybe it was not just one night but one weekend over which Hixie wrote microdata into the spec.
And let’s face it: It’s in the WHATWG spec not because it’s the better approach or because it had been implemented already at the time of writing, but simply because microdata was invented by Hixie Himself while RDFa was invented by others. Hardly a good reason.

Comment by Julian Reschke

Bruce,

if I understand correctly, schema.org supports RDFa Lite as well, so there simply is no advantage for either.

What I was trying to point out is that, when you recommended microdata in the past, this may have been based on misconceptions/misinformation/outdated information.

What *would* be interesting is in which way the two formats have advantages/disadvantages *as of today*.

Best regards, Julian

Comment by Bruce

@manu thanks chap. I’ve linked the bits that you find most contentious to your comment so people can see both sides; corrected my error that RDFa doesn’t validate (it didn’t before, I’m certain; glad that’s changed). I added a line at the end “I see no technical reason to favour microdata or RDFa Lite; both do the job” to make it clear that I don’t see microdata as intrinsically “better” than RDFa (or vice versa).

@Julian I was trying to list the advantages of each, but made a mistake in saying only microdata validates; I’ve corrected that. schema.org says it has “experimental” support for RDFa. What that means is unclear to me: experimental because it’s new? It may be removed? If I were still leading a dev team, with all other factors being equal, I’d choose the one that wasn’t “experimentally” supported.

Comment by Stéphane Corlosquet

RDFa is also supported by schema.org (although it’s “experimental” at the moment).

If you read carefully the schema.org data model page which you linked to above, you will note that what is referred to as “experimental” is the reflected system that the schema.org folks use for managing schema.org terms, in other words it’s the document listing all of the schema.org terms in RDFa. Nowhere does it say that the general support for schema.org expressed using RDFa in HTML in the wild is experimental, which is what matters in the context of your blog post.

You got the right link for the first announcement of RDFa support in schema.org: RDFa is also supported by schema.org. A more recent blog post reiterates equal support for microdata and RDFa in schema.org:

Effective immediately, the GoodRelations vocabulary (http://purl.org/goodrelations/) is directly available from within the schema.org site for use with both HTML5 Microdata and RDFa.

In this Google+ thread, Dan Brickley explains how they try to handle microdata and RDFa in an equivalent manner at Google (his first comment). In his second comment, he is expecting more RDFa examples on schema.org over time. That’s in line with what Peter Mika (Yahoo!) and Alexander Shubin (Yandex) announced last week at ISWC during their schema.org update presentation regarding the improvements to the schema.org infrastructure.