Marking up a blog with HTML 5 (part 2)

Further refining the HTML 5 structure

Last month, I replumbed this blog to use HTML 5 for the markup and replaced the basic framework (a completely typical collection of divs to hold headers, footers and sidebars) with new HTML 5 structural elements. Browsers can’t yet do anything useful with those new elements but I showed that, with a bit of coaxing, browsers can be persuaded to style them with CSS, and that JavaScript can access them.

What I didn’t do then – because I needed to bury myself in the specs and do some research – is use HTML 5 to mark up the real guts of the site, to give articles, comments and datestamps real semantics.

This should not be considered a tutorial. It’s an experiment. The specs are ambiguous, so I can’t be sure I’m using every element properly, and there’s not exactly a huge body of examples in the wild to draw from. Therefore, if you disagree with my markup choices please do let me know. Nicely.

You can use my WordPress HTML 5 theme, which is based on the excellent Kubrick theme. I’d be dead chuffed if you’d let me know if you do use it. You’ll probably want to comment out references to plugins (unless you also use them). I don’t support the theme and I’m sorry that my PHP is so shockingly bad.

The blog home page

An interesting thing about a blog homepage is that there are generally the last 5 or so posts, each with a heading, a "body" and data about the post (time, who wrote it, how many comments etc.) and usually a link to another page that has the full blog post (if the homepage just showed an excerpt) and its comments.

HTML 5 has an article element which I use to wrap each story:

The article element represents a section of a page that consists of a composition that forms an independent part of a document, page, or site. This could be a forum post, a magazine or newspaper article, a Web log entry, a user-submitted comment, or any other independent item of content.

Let’s look in more detail at the guts of how I mark up each blogpost.

Anatomy of a blog post

[Diagram of article structure; explanation follows.]

The wrapper is no longer a generic div but an article. Within that is a header, comprising a heading (the title of the blogpost) and then the time of publication, marked up using the time element.

Then there are the pearls of wit and wisdom that constitute each of my posts, marked up as paragraphs, blockquotes etc., which are pulled unchanged out of the database. Following that is data about the blog post (category, how many comments) marked up as a footer and, in the case of pages that show a single blogpost, there are comments expressing undying admiration and love. Finally, there may be navigation from one article to the next.
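
As a rough sketch of the structure just described (the heading level, URLs, dates and text are placeholders chosen for illustration, not lifted from the theme), a single post might look something like this:

```html
<article>
  <!-- Header: the post title plus its publication date -->
  <header>
    <h2><a href="/example-post/">Title of the blog post</a></h2>
    <time datetime="2009-03-02">2 March 2009</time>
  </header>

  <!-- Post body, output unchanged from the database -->
  <p>First paragraph of the post.</p>
  <blockquote><p>A quotation inside the post.</p></blockquote>

  <!-- Data about the post: category and comment count -->
  <footer>
    Posted in <a href="/category/example/">Example category</a>.
    <a href="/example-post/#comments">3 comments</a>
  </footer>

  <!-- Single-post pages only: comments follow here -->
  <!-- Optional: navigation to the previous and next article -->
</article>
```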

Data about the article

Following the content there is some “metadata” about the post: what category it’s in, how many comments there are. I’ve marked this up as footer. I previously used aside which “represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content” but decided that it was too much of a stretch; data about a post is intimately related.

footer is a much better fit: “A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.” I was initially thrown off-course by the presentational name of the element; my use here isn’t at the bottom of the page, or even at the bottom of the article, but it certainly seems to fit the bill – it’s information about its section, containing author name, links to related documents (comments) and the like. There’s no reason that you can’t have more than one footer on a page; the spec’s description says "the footer element represents a footer for the section it applies to" and a page may have any number of sections. The spec also says "Footers don’t necessarily have to appear at the end of a section, though they usually do."

This does, however, raise an interesting question about WAI-ARIA. In the structural redesign, I gave the page’s "main" footer the ARIA attribute role="contentinfo", on the grounds that assistive technology users (or search engines) might wish to jump straight to that information about the page they’re using.

I’m assuming that it would not be helpful for each article’s footer metadata to also have the same ARIA role. Additionally, the ARIA spec says contentinfo is "metadata that applies to the parent document", which I read as meaning the whole web page, not each individual article.
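
To make the distinction concrete, here is a minimal sketch under that assumption: only the page-level footer carries the landmark role (ARIA roles are set with the plain role attribute), while each article’s footer is left without one. The content and URLs are placeholders.

```html
<body>
  <article>
    <h2>Post title</h2>
    <p>Post content.</p>
    <!-- Article-level footer: metadata about this post only, so no landmark role -->
    <footer>
      Posted in <a href="/category/example/">Example category</a>.
    </footer>
  </article>

  <!-- Page-level footer: information about the whole document -->
  <footer role="contentinfo">
    <p>Copyright and contact details for the site.</p>
  </footer>
</body>
```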

Those who know more about ARIA and HTML 5 than I do have suggested that there is an automatic one-to-one correspondence between the HTML 5 footer element and role="contentinfo" (see Comparison of ARIA landmark roles and HTML5 structural elements by Steve Faulkner and ARIA in HTML5 Integration: Document Conformance (Draft) by Henri Sivonen).

Henri’s draft (but not Steve’s) suggests an approximate correspondence between HTML 5’s header and ARIA’s banner role.

If I’m right that assistive technology users expect only one instance of banner and contentinfo (and I’d very much like to discuss this), we might need to revisit these assumptions.

Comments

I’ve marked up comments as articles, too, as the spec says that an article could be “a user-submitted comment”, but nested these inside the parent. These are headed with the date and the time of the comment and name of its author. I tried wrapping these in a header too, but it wouldn’t validate as each header requires at least one heading within it. As the author and time of a comment don’t feel like a heading, and there was not one there before, I’ve left them as plain text. But I’m undecided.
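
A minimal sketch of that nesting, with placeholder names, dates and URLs (the exact wording of the comment heading line is an assumption):

```html
<article>
  <header>
    <h2>Post title</h2>
    <time datetime="2009-03-02">2 March 2009</time>
  </header>
  <p>Post content.</p>

  <!-- Each comment is an article nested inside the post it belongs to -->
  <article>
    <!-- Author and time left as plain text: no header, because there is no heading -->
    Comment by <a href="http://example.com/">A. Commenter</a>,
    <time datetime="2009-03-02T18:45:00+00:00">2 March 2009, 6:45pm</time>
    <p>Comment text.</p>
  </article>
</article>
```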

The original WordPress install had an ordered list of comments, which I’ve removed; some things have an implied order and, now they’re marked up with unambiguously parsable dates and times, it’s trivial to programmatically determine the order. I thought it might be fun to generate numbers with CSS using this code:

article article {counter-increment: comment;}
article article:before {content: counters(comment, ".");}

and, for those using the Opera 10 alpha (or the more-recently released Safari 4 beta), those generated numbers should be styled using the punky demand font from nerfect.com, used as a CSS web font with kind permission from its creator, Mr Walters. (Internet Explorer has been able to embed DRM-ed fonts since version 4, but I was unable to understand how to use its WEFT tool.)
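
The two rules above leave the browser to create the counter implicitly. A slightly more explicit variant, offered only as a sketch and not as what the theme does, resets the counter on each top-level post so that comment numbering starts again for every post. The body > article selector assumes posts are direct children of body, and counter() is swapped in for counters() because only one level of numbering is wanted.

```html
<style>
  /* Assumption: each top-level post opens a fresh counter scope */
  body > article { counter-reset: comment; }
  /* Each nested article (a comment) increments the counter by one */
  article article { counter-increment: comment; }
  /* Print the current value, followed by a full stop and a space */
  article article:before { content: counter(comment) ". "; }
</style>

<article>
  <h2>Post title</h2>
  <article><p>First comment.</p></article>
  <article><p>Second comment.</p></article>
</article>
```

With this markup the two comments should be prefixed with 1 and 2 in browsers that support generated content.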

Times and dates

Most blogs, news sites and the like provide dates of article publication (and dates of events and the like).

Microformats people, the most vocal advocates of marking up dates and times, believe that computer-formatted dates are best for people: their wiki says “the ISO8601 YYYY-MM-DD format for dates is the best choice that is the most accurately readable for the most people worldwide, and thus the most accessible as well”.

I don’t agree (and neither do candidates in my vox pop of non-geeks, my wife, brother and parents). Therefore I’ve used the HTML 5 time element to give a machine parsable date to computers, while giving people a human-readable date. Blog posts get the date, while comments get the date and time.

The spec is quite hard to understand, in my opinion, but the format you use is 2004-02-28T15:19:21+00:00, where T separates the date and the time, and the + (or a -) is the offset from UTC. Dates on their own don’t need a timezone; full datetimes do. Oddly, the spec suggests that if you use a time without a date, you don’t need a timezone either.
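
For illustration, the two forms described above might be written like this (the human-readable wording inside each element is free text and just an example):

```html
<!-- Date only, as used on blog posts: no timezone needed -->
<time datetime="2004-02-28">28 February 2004</time>

<!-- Full date and time, as used on comments: T separates the two parts
     and the offset from UTC is required -->
<time datetime="2004-02-28T15:19:21+00:00">28 February 2004 at 3:19pm</time>
```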

There’s considerable controversy over the time element at the moment. Recently one of the inner circle, Henri Sivonen, wrote that it’s for marking up future events only and not for timestamping blogs or news items: “The expected use cases of hCalendar are mainly transferring future event entries from a Web page into an application like iCal.” This seems very silly to me; if there is a time element, why not allow me to mark up any time or date?

The spec for time does not mention the future event-only restriction: "The time element represents a precise date and/or a time in the proleptic Gregorian calendar" and gives three examples, two of which are about the past and none of which are "future events". (Henri calls this a "spec bug". I’m not picking on Henri, by the way; I have loads of respect for him. It’s just that he has written some of the most quotable quotes since I’ve been looking at this.)

Although the spec doesn’t (currently) limit use of the element, it does limit format to precise dates in "the proleptic Gregorian calendar". This means I can mark up an archive page for "all blog posts today" using time, but not "all July 2008 posts" as that’s not a full YYYY-MM-DD date. Neither can you mark up precise but ancient dates, so the date of Julius Caesar’s assassination, 15 March 44 BC, is not compatible.

It is inconsistent that some dates may be marked up and not others, and it’s a problem that’s going to get worse: we’ll see some dates marked up with HTML 5 and others, such as fuzzy dates and ancient dates, marked up as microformats because they’re not allowed in the official markup scheme, fragmenting such data. There are already search tools that look for dates, such as SearchMonkey and YQL, and there are already historical dates marked up as microformats in Wikipedia and in museum databases.

Henri writes "The time element is meant as a replacement for the microformat abbr design pattern in hCalendar (if the microformat community embraces time; if not, time is pretty much pointless in HTML 5)". Henri is right: time is pretty much pointless in HTML5 if it’s not embraced by the microformats community, but why would they embrace it? It prevents them doing a lot of what they do now, so what do they gain?

I suggest the spec be amended to allow dates like "July 1966" and "3 January 1077" to be compatible with the time element. Restricting it to "future events" is likely to make it a stillborn element, or make that bit of the spec completely irrelevant to its de facto use.

Stay tuned

That’s enough for now. Corrections to the CSS as I notice abominations. More write-up to come about sections and headings. Any comments?

25 Responses to “Marking up a blog with HTML 5 (part 2)”

Comment by J

Great post (article!)! It’s been far too long since I’ve read anything about HTML. My first encounter was when I bought an HTML reference book by Molly. I read it pretty much from cover to cover. Then I fell out of the web and I’m looking for a way to climb back in. HTML 5 might be a second chance to have a play. I know this comment is pretty off topic from what you’re after but I wanted to thank you for giving me a little education on the train to London.

Comment by Bruce

You’ve made my evening. I hope HTML 5 will simplify the web for you. And for me. Lots of juicy functionality out of the box, with no need for JavaScript.

Comment by Jim O'Donnell

I’ve also suggested that HTML5 fully support ISO8601 dates too, since ISO8601 allows BCE dates and dates which have no specific day or month. There’s also a standard (possibly also ISO8601) which supports time periods by separating two dates with a slash, e.g. WWI would be 1914/1918. It would be much easier for markup if HTML accepted all these machine-readable formats.

If they could adopt a calendar, as per TEI, that would be useful too: http://www.tei-c.org/Guidelines/P4/html/ref-DATE.html

Comment by Dominykas

Oh, the temptation to use your HTML5 skin… I mean, do I really need to sleep tonight, when I could be playing around with it?..

And as for time being “future only” – it’s ridiculous. Does it mean that if I blog about, say, that I’m going to a conference X on a date Y, that I have to remove or adjust that post after the conference passes?.. And why, oh why, isn’t “comply with existing, related standards” (e.g. ISO) the first thing the standards developers do?…

Comment by steve faulkner

bruce wrote:
“Henri’s draft (but not Steve’s) suggests an approximate correspondence between HTML 5’s header and ARIA’s banner role.”

There may be an approximate correspondence between header and banner, if there is only one header in a document, but the HTML5 spec clearly implies that there may be a number of headers in a document. Reading the spec it seems to me that header is not meant to signify a container of content at the top of a page (which I read role=”banner” to signify):

“The h1–h6 elements and the header element are headings.”
http://www.w3.org/TR/html5/semantics.html#headings-and-sections

I wouldn’t care much about the exact mapping between ARIA landmark roles and HTML5 elements if the implications of there being a mapping did not mean that use of landmark roles may be non conforming in HTML5.

“Landmarks mostly duplicate HTML5 container elements.

The following roles are unsupported.

Landmarks”
http://hsivonen.iki.fi/aria-html5-bis/ (a later version of the doc of henri’s you link to)

Comment by Jacob Rask

Hm, so you use 2 (or more) header elements on each page.. I guess it does make sense, although the date in my opinion would be more of a “meta post” information, such as author and category, than semantically part of the article’s header? I like that it design-wise is clear on your site when the article was posted though, many people nowadays neither have the date in the URL nor easy to find on the page so you won’t know if it’s outdated.

This HTML5 “series” made me both follow your blog and start experimenting with HTML5 myself, by the way!

Comment by Bruce

Dominykas, I agree. One of the principles of making good rules is that they can be testable. If you validate a page with a next-week event today, it will be valid. But if you re-validate in 2 weeks time, will the validator tell you the page is invalid as the “future event” is now in the past? That would be absurd.

Steve Faulkner, I agree too (sorry if I made it sound like I didn’t). I understand the reason for the ARIA role contentinfo to be like banner in that it appears once per page (although I can see an argument for it being allowed multiple times; what is better for assistive tech users?). So if it’s only allowed once, we can’t have an automatic mapping, and the content author needs to be able to specify those roles, so they need to be added to, or made conformant with, the HTML spec.

Jacob Rask – thanks so much. I’m glad my experiments are useful. I think it’s vital to test-drive the specs and see whether they make sense. Thanks for following them.

I’ve put the meta-information in the header as I think it’s appropriate for a blog where the “who” and the “when” are important to the understanding of a post, as you said. On a corporate site, I always think it’s good to have something on page to say “last updated on .. (date)” and perhaps a name, but that might very well be in the footer as it’s perhaps less vital information.

But it’s a blurry decision I made, and I might very well change it; as I said, this is an experiment rather than a tutorial, so thanks for making me think harder about it.

Comment by Rich Clark

Bruce,

Good summary of what you’ve done. And here we are again with the time element.

It seems pretty ridiculous that it would only be for future events, because as you say in two weeks time it would become invalid therefore people wouldn’t use it.

I only hope that the spec gets revised to include at least dates like those you suggested in your post.

Comment by Patrick H. Lauke

on henri’s comment about *future* dates…I may be wrong (and can’t be arsed to read through the thread), but it may be that he meant “*future* dates” as a shorthand for saying “gregorian calendar”, i.e. not meant for BC and such. because if he did mean it at face value, it’s of course ridiculous. and speaking of ical, i often find myself looking back at previous months in my calendar app (thunderbird+lightning) when I need to check things like “when did I fly over to Germany for that conference” or something…so even the use case he mentions doesn’t preclude past dates, in my opinion.

Comment by Bruce

Pat, Henri said “The time element is meant as a replacement for the microformat abbr design pattern in hCalendar (if the microformat community embraces time; if not, time is pretty much pointless in HTML5). The expected use cases of hCalendar are mainly transferring *future* event entries from a Web page into an application like iCal.”

Henri filed bug 6536 “The spec doesn’t make it clear that the time element is meant to be useful *if* microformats such as hCalendar adopt it in place of the abbr pattern.”

Comment by Dave

“computer-formatted dates are best for people”
I agree with you there, the full UTC date-stamp is illegible, but that is NOT what “YYYY-MM-DD format” is. As a Brit, I would have thought that all of the participants in your straw poll would have been aware of the horrible ambiguity of 06/06/2009 either side of the Atlantic, so unless you are going to write dates in long-hand then there simply isn’t anything better than YYYY-MM-DD.

Comment by John Faulds

I’ve been playing around with divs with HTML element-like class names recently (e.g. .section, .article, .aside) and have been wondering about the proper usage of article and section and was going to ask here but notice you’re not using any section elements in your site. Are they still on the drawing board?

Comment by Bruce

Hi John – I use section elements on the category pages in order to recalculate the computed hierarchy of heading elements (see my article Headings in HTML 5 and accessibility for more information), but haven’t written it up yet as I’m still puzzling over some of the details of how well it sits in the context of this blog.

As far as I can tell, section is a slightly more semantic version of div, and if article is more appropriate (which it is for a page of articles) that should be used in preference.

Comment by zibin

Nice one Bruce!

I am wondering will it be more logical to have section inside article rather than another article to justify a sub section?

This way, section achieves the semantic representation of what we have for H1-H6

article
–section
—-section
–section
/article

Comment by Bruce

You’re right that section is there to work alongside h1 .. h6 (Although I’m retaining the h1.. h6 method as I don’t want to retrospectively amend hundreds of legacy posts).

But an article is the correct markup for “a user-submitted comment”, and I’ve chosen to nest them inside the parent article, so it’s easy to understand which comments relate to which article (this is perfectly valid). But it would be wrong to use article merely to start a new section.

Comment by AJ

Hi Bruce,

I’ve tried to download your HTML5 WordPress theme a couple of times now, from different pages – but when extracted it contains the Easy Contact script – not a WordPress theme at all!

Could you update the file? I’d love to start experimenting!

Thanks