Proposals for changes to lists in HTML 5

One of the things that have long irritated me about HTML is the restriction on what elements are allowed inside lists.

The specs for both HTML 4 and 5 allow only li inside ul and ol, and only dt and dd inside dl definition lists. I’d like to expand that to allow h1-h6, section and div.

I was talking to two of Opera’s Standards reps, Anne van Kesteren and Lachlan Hunt, about this and they suggested that I make a proposal to the HTML 5 working group, with appropriate use cases.

So before I make a tit of myself by putting a flawed proposal to that somewhat grumpy group, I thought I’d do what Eric Meyer did and ask developers at large what you think. Here’s my reasoning, and if you have any more use cases or objections, please let me know.

Allowing headings (h1-h6) in lists

Until recently, I worked for the Law Society and Solicitors Regulation Authority. In such a business, we spent a lot of time marking up rules, regulations and statutes.

In the UK, as in most (all?) other jurisdictions, laws and rules are written with numbered paragraphs. Within those lists are headings that introduce sections. The headings are not part of a list item, but group list items. Check out any of the thousands of examples at the Office of Public Sector Information or the UK Statute Law Database.

Here’s a small but nevertheless real-world example: take a quick look at the Solicitors’ Practising Certificate Regulations 1995 (PDF 34K), which I naturally want to mark up like this:

<ol>…
<h2>Commencement</h2>
<li> These regulations replace the Practising Certificate Regulations 1976 in relation to all practising certificates, and applications for practising certificates, for any period commencing on or after 1st November 1995.</li>
<h2>Requests for information</h2>
<li>In addition to information supplied on any prescribed form under these regulations, solicitors must supply to the Law Society such information as to their practice as solicitors as the Society shall from time to time reasonably require for the purpose of processing applications.</li>
<h2>Replacement date and conditions</h2>
<li>The replacement date for every practising certificate shall be the 31st October following the issue of the applicant’s current practising certificate.</li>
<li>Every practising certificate shall specify its commencement date, its replacement date, and any conditions imposed by the Law Society</li>
…</ol>

You’ll notice that the heading "Replacement date and conditions" is not part of either of the following two items, so is not a child of either li. Instead, it groups (or introduces) them, and therefore, its semantically most appropriate location is as a child of the surrounding ol.

Another way to mark up this document is as a succession of headings and paragraphs, with each paragraph beginning with a hard-coded paragraph number, perhaps surrounded with a span that is styled with display:block; in order to make the number look like a list marker. This spectacularly fails the Bruce Lawson Markup Duck Test, which states that if it looks like a duck, walks like a duck and quacks like a duck, then it is a duck: a list of paragraphs, each beginning with a number indicating the order of the paragraphs, is an ordered list and needs to be marked up as one.
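For comparison, that non-list approach might look something like the sketch below. The class name para-number is made up for illustration, and the paragraph numbers are placeholders rather than the real numbers from the regulations.

```html
<!-- Illustrative only: "para-number" is a hypothetical class name
     and the numbers 1 and 2 are placeholders -->
<style>
  .para-number { display: block; }
</style>

<h2>Commencement</h2>
<p><span class="para-number">1.</span> These regulations replace …</p>

<h2>Requests for information</h2>
<p><span class="para-number">2.</span> In addition to information supplied …</p>
```

Nothing in this markup tells a browser or screen reader that the paragraphs form an ordered sequence; the numbers are just text.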

Take a more complex example, Legal Services Act 2007, paragraphs 203-206. This legislation is a long list of numbered paragraphs, interspersed with headings to group the following paragraphs into sections. Being more complex, this legislation has nested (ordered) sublists, but the same logic and basic structure holds here too:


<ol>
<li><h5>The giving of notices, directions and other documents in electronic form</h5>
<ol>
<li>[subparagraph 1]</li>
<li>[subparagraph 2]</li> …
<li>[subparagraph 8]</li>
</ol>
</li>
<h4>Orders, rules etc</h4>
<li><h5>Orders, regulations and rules</h5>
… lots of subparagraphs …
</li>
<li><h5>Consultation requirements for rules</h5></li>
<li><h5>Parliamentary control of orders and regulations</h5></li>
<h4>Interpretation</h4>
</ol>

A counter-argument is that the whole piece of legislation is an ordered list of sections, each containing a sublist of the paragraphs within that section. That is a legitimate way to look at it, except that the numbered paragraphs would no longer have the correct paragraph numbers auto-generated, as they’d be split into sublists.

Playing with CSS counters wouldn’t help, as different lists are treated as separate entities, so numbering in one list can’t follow on from numbering in another list. To avoid the paragraph immediately below a section heading (the h4 in my code example above) going back to 1, you would have to give its ol a start attribute (or the li itself a value attribute) and hard-code the paragraph number, making a mockery of the idea of automatically generating numbers in ordered lists. Even if it could be faked with CSS counters or a hard-coded start attribute, it shouldn’t be, because that fails the Duck Test, too.
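To make the problem concrete, here is a rough sketch of the "list of sections" structure, reusing the headings from the Practising Certificate example. The start values are assumed for illustration; the point is only that every sublist after the first needs one typed in by hand.

```html
<ol>
  <li>
    <h2>Commencement</h2>
    <ol>
      <li>These regulations replace …</li>
    </ol>
  </li>
  <li>
    <h2>Requests for information</h2>
    <!-- start is hard-coded (assumed value); without it this
         paragraph would be numbered 1 again -->
    <ol start="2">
      <li>In addition to information supplied …</li>
    </ol>
  </li>
  <li>
    <h2>Replacement date and conditions</h2>
    <!-- another hand-maintained number (assumed value) -->
    <ol start="3">
      <li>The replacement date …</li>
      <li>Every practising certificate shall specify …</li>
    </ol>
  </li>
</ol>
```

Insert one paragraph near the top and every later start value has to be edited by hand.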

HTML 5 sections

For HTML 5, it would be ideal if the spec allowed the new section element to be a child of a list. This means that content could be pulled from a CMS into different pages with different heading hierarchies, and the headings would automatically be the correct level within that context. This is an idea from the XHTML 2 spec, which has an unnumbered h element:

Structured headings use the single h element, in combination with the section element to indicate the structure of the document, and the nesting of the sections indicates the importance of the heading. The heading for the section is the one that is a child of the section element.

In HTML 5 this is complicated by backwards compatibility, so any heading element from h1-h6 can be chosen, and the headings and sections algorithm determines what “level” it actually is. (See A Preview of HTML 5 for a more readable discussion of section.)

I’ve marked up the Practising Certificate example as HTML 5 and styled the various different levels of h1s using CSS so you can see a practical example of the usefulness of allowing headings and section to be children of a list.
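The linked example is not reproduced here, but the proposed markup would be along the following lines. This is a sketch of the idea rather than the actual page; it is not valid under the current drafts, and the CSS selectors are just one assumed way of telling the heading levels apart.

```html
<style>
  /* Assumed styling: an h1 in a section inside the list
     is treated as a lower-level heading than the page title */
  h1 { font-size: 2em; }
  ol > section > h1 { font-size: 1.3em; }
</style>

<h1>Solicitors’ Practising Certificate Regulations 1995</h1>
<ol>…
  <section>
    <h1>Commencement</h1>
    <li>These regulations replace …</li>
  </section>
  <section>
    <h1>Requests for information</h1>
    <li>In addition to information supplied …</li>
  </section>
  <section>
    <h1>Replacement date and conditions</h1>
    <li>The replacement date …</li>
    <li>Every practising certificate shall specify …</li>
  </section>
…</ol>
```

Each section carries its own h1, so the same fragment could be dropped into a page with a different heading hierarchy without renumbering the headings.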

Headings in definition lists

An example in a definition list would be similar. Here’s a real-world glossary marked up as a definition list (which is the best way to mark up glossaries, in my opinion, although some favour tables).

A really long alphabetical glossary would be enhanced by dividing it up with headers for each letter of the alphabet, for reasons of scannability, or so an on-the-fly table of contents generator could make a linked table of contents above the glossary.

That could be done by the following (illegal code):

<h1>Glossary</h1>
<dl>
<section>
<h1>A</h1>
<dt>Aardvark</dt>
<dd>Never hurt anybody</dd>
<dt>Allegro</dt>
<dd>The lower limbs of people standing side-by-side</dd>
<dd>The finest car known to man</dd>
</section>
<section>
<h1>B</h1>
<dt>Bee porn</dt>
<dd>See Christian Heilmann, Tom Hughes-Croucher</dd>
</section>
</dl>

You might say that each letter of the alphabet should have its own dl. I contend that a glossary is a single entity, not twenty-six different lists and would reply "Tish and pish, sir. You are a nincompoop, and your words are balderdash, poppycock and gobbledegook."

And I’d be right, and you’d be sorry.

Allowing div as a child of a list

While we’re talking of rules and specifications, I’d like to know why I can’t use div inside a list.

Mostly I’d like to do this so that I could properly style definition lists to look like tables.

You can’t reliably style definition lists at the moment, but you can if you wrap a dt and its associated dds in a div. This is illegal, but works cross-browser already.
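A minimal sketch of that wrapping, reusing entries from the glossary example. The float-based CSS is an assumption about one way to get a table-like layout, not a tested cross-browser recipe, and the class name tabular is hypothetical.

```html
<style>
  /* Each div acts as a "row" holding a term and its definitions */
  dl.tabular div { overflow: hidden; border-top: 1px solid #ccc; }
  dl.tabular dt  { float: left; width: 12em; font-weight: bold; }
  dl.tabular dd  { margin: 0 0 0 13em; }
</style>

<dl class="tabular">
  <div>
    <dt>Aardvark</dt>
    <dd>Never hurt anybody</dd>
  </div>
  <div>
    <dt>Allegro</dt>
    <dd>The lower limbs of people standing side-by-side</dd>
    <dd>The finest car known to man</dd>
  </div>
</dl>
```

Without the div there is no single element that contains a term together with all of its definitions, so there is nothing to hang the row styling on.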

I agree with the HTML 5 gang when they refuse a new grouping di element (presumably "definition item"), saying "This is a styling problem and should be fixed in CSS. There’s no reason to add a grouping element to HTML, as the semantics are already unambiguous."

Yes, there is no reason for a new definition grouping element; we already have a generic grouping element called div. And, yes, it’s true that it’s a problem for CSS, but with all the other stuff on the CSS Working Group’s agenda, they’re unlikely to get round to it soon.

It must be a common problem (the HTML 5 crew cite it as a "frequently asked question") and it can be easily solved using the interoperable, backwardly-compatible method I outlined above.

It also raises a philosophical question: I can understand why there are restrictions on where some elements can go (for example, it would make no semantic sense to allow a list inside an image), but why restrict where an author can put an element that has absolutely no meaning ("The div element represents nothing at all")?

Conclusion

I see the argument against over-complicating a specification, but I think that if a new spec can’t accommodate real-world examples of content then the specification is not in danger of over-complication—rather, it’s currently over-simplistic. HTML 5 has been bravely making itself backwards-compatible and thereby becoming more complicated in some areas (such as the algorithm for working out the importance of headings in sections), so slight extra complication to help developers can also help its adoption.

Thank you for reading this far, dear reader. Am I talking nonsense? Have I missed something obvious? Do tell, before the HTML 5 gang pour sarcasm or scorn on me.

10 Responses to “Proposals for changes to lists in HTML 5”

Comment by JackP

It’s not precisely the same thing, but while you’re on lists, I’d like to be able to specify the value of an ordered list (either a start value or a per-item value).

For example, if I want to quote from a legal document, it’s normally full of numbered paragraphs (an ordered list). However, unless I want to quote all of it, there is no easy way to quote paras 49-53, and mark them up in a correctly numbered ordered list…

Comment by Dave

I’m nodding too, although in a not quite loon-like manner.

I totally agree about the headings; it makes good semantic sense and I would certainly have used it if it was available. Less sure about the div, although I can’t see a reason why it shouldn’t be allowed. I just seem to have managed without it so far.

Comment by Bruce

Hi Jack – you can set a starting value for a list in HTML 5 (and in transitional (X)HTML) using ol start="49".

(For some bonkers reason, it’s deprecated in strict doctypes, but undeprecated in HTML 5)

Alternatively, you can use CSS counters, although that is more complex, abstracted from the markup, and unsupported in IE.

(In an example when you want to quote paras 49-53, it makes sense to allow the content author to specify the starting value in the markup, rather than muck around with inline styles).

HTML 5 also has quite a neat attribute that allows reversed ordered lists (for countdowns and the like)
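As a rough illustration of the three options mentioned here (the class name quoted and the counter name para are made up for the example):

```html
<!-- 1. start attribute: numbering begins at 49 -->
<ol start="49">
  <li>[paragraph 49]</li>
  <li>[paragraph 50]</li>
</ol>

<!-- 2. CSS counters: the starting number lives in the stylesheet.
     The counter is reset to 48 so the first li increments it to 49. -->
<style>
  ol.quoted { list-style: none; counter-reset: para 48; }
  ol.quoted li { counter-increment: para; }
  ol.quoted li:before { content: counter(para) ". "; }
</style>
<ol class="quoted">
  <li>[paragraph 49]</li>
  <li>[paragraph 50]</li>
</ol>

<!-- 3. reversed attribute: the list counts down instead of up -->
<ol reversed>
  <li>[first item shown, highest number]</li>
  <li>[second item shown]</li>
  <li>[last item shown, lowest number]</li>
</ol>
```

With the counter version, the number 48 sits in the CSS rather than next to the quoted content, which is why the start attribute is the more natural fit for quoting a run of paragraphs.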

Comment by Michael

I agree wholeheartedly about the headings. I’ve used nested lists with a heading in the parent to get around this, but it’s clumsy.

Comment by Robert Whittaker

I think I’d be in favour of the changes you’re suggesting, since I can see they’d be useful for many real-life examples.

There are two interesting questions that this raises though:

1/ What’s the difference between an ordered list and a set of numbered paragraphs?

2/ When should the numbers themselves be considered part of the content of a document, rather than part of the styling? (And how should HTML / CSS address this?)

Comment by Matt Wilcox

Hi Bruce,

These all seem like great ideas to me. At the moment the only way to simulate the effect of headers “inside” lists is to use CSS counters to affect the next list, but support for those is flaky at best, and the semantics are actually quite different.

I’d like to be able to do all of the things expressed in your article.

Comment by AlastairC

I’d add that there are problems with the valid current methods:

Nested lists, or even just headings in lists are very difficult for editors to deal with, both at the technical level (how a WYSIWYG interface works), and how end-users think about it.

Putting a heading inside a list doesn’t make sense to end users, and putting block elements in a list item seems difficult for things like TinyMCE to deal with.