HTML 5, microformats and testing accessibility

It’s unsurprising, I suppose, that if a group of like-minded individuals goes into conclave to write a specification, they will be angered and annoyed when that spec is criticised and questioned by outsiders. This is what happened when the microformats and HTML 5 specs came under scrutiny.

Testing shows the gulf between theory and practice

Most microformats adherents seem to agree with an article that James Craig and I wrote, hAccessibility, that pointed out that the current spec’s use of the abbr element is inaccessible to some users. In theory, “Austin, TX” is an abbreviation of “30.300474;-97.747247”. In practice, it doesn’t work (mp3).
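To make the failure concrete, here is a minimal Python sketch of what happens when a screen reader is set to expand abbreviations. The markup string is reconstructed from the example above (the surrounding sentence and the class name are assumptions), and spoken_text is a hypothetical, greatly simplified model of assistive technology behaviour, not how JAWS or any real product is implemented.

```python
from html.parser import HTMLParser


class AbbrSpeaker(HTMLParser):
    """Toy model of a screen reader: collects the text it would announce."""

    def __init__(self, expand_abbr):
        super().__init__()
        self.expand_abbr = expand_abbr  # user setting: read the title instead of the content
        self.spoken = []
        self._skip_depth = 0  # > 0 while inside an <abbr> whose title replaced its text

    def handle_starttag(self, tag, attrs):
        title = dict(attrs).get("title")
        if tag == "abbr" and self.expand_abbr and title:
            self.spoken.append(title)
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "abbr" and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.spoken.append(data)


def spoken_text(markup, expand_abbr):
    """Return the string this toy screen reader would announce."""
    speaker = AbbrSpeaker(expand_abbr)
    speaker.feed(markup)
    speaker.close()
    return "".join(speaker.spoken)


# Hypothetical sentence wrapped around the abbr pattern discussed above.
MARKUP = 'Meet us in <abbr class="geo" title="30.300474;-97.747247">Austin, TX</abbr>.'

print(spoken_text(MARKUP, expand_abbr=False))
print(spoken_text(MARKUP, expand_abbr=True))
```

With expansion switched on, the listener gets a string of digits where the place name should be, which is the gulf between theory and practice in miniature.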

It shocks me that when this flawed idea was originally mooted, a trivial test with a forty-minute demo version of JAWS would have shown it was inaccessible, yet no-one thought to do it. As they say, in theory, theory and practice are the same. In practice, they are not.

Creators of new techniques and specs need accessibility at the heart of their specs, and that means testing.

Goodbye headers, hello again <kbd> and <samp>

A similar problem is happening with the HTML 5 group. Like many people, I haven’t paid much attention to the WHATWG until now, because it was “a loose, unofficial, and open collaboration of Web browser manufacturers and interested parties” with an “invite-only steering committee”. It all seemed highly pie-in-the-sky, and there were other, more immediate tasks at hand, like evangelising accessibility in existing technologies like HTML, or emerging technologies like microformats.

Now that the WHATWG spec is to be the basis of HTML5, a lot more scrutiny is directed at the spec.

The HTML5 spec ruled nothing in and nothing out. The criterion for retaining an element or attribute is usefulness. I personally question whether the computer-science kbd, var and samp elements should go in there (really, when was the last time you used these on a client’s business site?), but that’s not the reason for my rant.

As Roger Johansson recently noted, the HTML5 spec drops two useful attributes from data tables – the headers and summary attributes that WCAG recommends.

Lachlan Hunt, who’s heavily involved in HTML5, wrote,

They haven’t been removed. They just have not been added yet due to lack of evidence to support them.

The headers attribute: I’m aware that this one currently has better support in ATs than the scope attribute does, but for most cases it’s redundant.

If the problem is just associating cells with their headers, we should investigate alternatives that would make it easier, such as defining an algorithm for more accurate implicit association.

That would be better because it increases accessibility, while reducing the requirements on authors. However, that needs research and evidence to determine if it can cover sufficient use cases reliably, which will allow us to figure out if headers is still required.
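As a rough illustration of what “implicit association” could mean, the Python sketch below treats a table as a plain grid and, for a data cell, gathers every header cell above it in the same column and every header cell to its left in the same row. This is an assumed, deliberately naive rule for illustration only (the function name implicit_headers and the sample timetable are hypothetical); it is not the algorithm in any draft of the spec, and it does not model rowspan or colspan at all.

```python
def implicit_headers(grid, row, col):
    """Return header texts for grid[row][col] using a naive same-column/same-row rule.

    grid is a list of rows; each cell is a (kind, text) tuple where kind is
    "th" or "td". Spanning cells are not modelled.
    """
    # Header cells above the target cell, in the same column.
    column_headers = [grid[r][col][1] for r in range(row) if grid[r][col][0] == "th"]
    # Header cells to the left of the target cell, in the same row.
    row_headers = [grid[row][c][1] for c in range(col) if grid[row][c][0] == "th"]
    return column_headers + row_headers


# Hypothetical sample data: a simple, regular table.
TIMETABLE = [
    [("th", ""), ("th", "Monday"), ("th", "Tuesday")],
    [("th", "Morning"), ("td", "Maths"), ("td", "Art")],
    [("th", "Afternoon"), ("td", "History"), ("td", "Music")],
]

print(implicit_headers(TIMETABLE, 2, 1))  # headers for the "History" cell
```

A rule this simple copes with a regular grid. Whether a more elaborate version can cover irregular tables reliably is the open question in the quote above.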

Now, in my opinion, one of the reasons that screenreaders are the Netscape 4 of the assistive technology world is precisely because they try to use heuristics to figure things out, rather than use the specified standards. If, for example, everyone had used the (now-deprecated) menu element for navigation, assistive technologies wouldn’t need to try to guess where content starts, and authors wouldn’t need to code the dreaded “skip links”. But that’s another side issue.

The burden of proof

Testing is vital, particularly at the border of accessibility theory and practice. I wonder, for example, if tabindex and accesskey would have made it to the HTML4 spec if there had been full testing with assistive technology users?

What I really want to know from the HTML5 people is who they think is going to do this research that will provide the evidence that their gang requires before useful attributes are restored to the specification.

The WHATWG spec is funded by big businesses, all of which have millions in the bank. Maybe now the spec is “official”, they will be funding user research with disabled people using assistive technologies. Perhaps they will invite representatives from the manufacturers of the big screenreaders to work with them. They could even fund those representatives, given that assistive technology vendors aren’t anything like as rich as Apple, Opera, Mozilla and Google.

After all, it’s impossible to imagine that they would make arbitrary decisions to remove or retain certain elements, all with unknown accessibility side-effects, and put the burden to prove the usefulness of removed attributes on a small group of volunteers, isn’t it?


Also see Gez Lemon’s more sober article, The HTML Scope/Headers Debate.

25 Responses to “HTML 5, microformats and testing accessibility”

Comment by Nick

I personally question whether the computer-science kbd, var and samp elements should go in there (really, when was the last time you used these on a client’s business site?)

I can’t remember either. But don’t forget that not all HTML documents are written for public business sites. I can think of many instances where these elements are truly useful – academic papers, instructions, help documentation, and so on. A trivial point, but thought I’d mention it all the same.

However, I share your worries. We know little about the actual workings of the working group that will finalise the specification, and the seemingly illogical rationale for their decisions.

Comment by Andy Hume

“We know little about the actual workings of the working group that will finalise the specification, and the seemingly illogical rationale for their decisions.”

Then join the group and help to create a spec that fulfills your requirements as well. There are an awful lot of people criticizing, and not that many people doing anything about it. That’s why I’ve encouraged Roger to stay on in the WG when he was considering leaving. He is one person who does have accessibility at the forefront of his concerns, and that is not common in the WG (why this might be is a totally separate conversation). If he leaves then that’s one less person to advocate accessibility as a core part of the spec.

There is work to be done now on this – it could be a hugely important time for the future of HTML and the web – and I’d rather read that the experts were getting involved and explaining and teaching rather than complaining that not everyone has exactly the same perspective or understanding that they do.

Comment by bruce

Nick – thanks for your point, which isn’t trivial at all. The kbd, samp and var elements are useful to a small group – but are they useful enough to be in html? For example, there are millions of sites with poetry and song lyrics, but no-one suggests a <stanza> or <chorus> element should be introduced; <div class="chorus"> is good enough. Do we really need <kbd> as opposed to a generic <span class="kbd">? This is a great debate that’s raging.

I want to second what Andy says. There’s a lot of noise with the working group, but it’s vital to keep reading it. To me, it’s more important than WCAG 2.

Comment by James Edwards

Well personally, I question both the need and veracity of a new version of HTML when the current version is still so poorly implemented across UAs and ATs.

To take input elements, for example: is it wise to introduce a whole new raft of input types that ATs will need to be able to reconcile? Isn’t it more sensible to find solutions that reduce down to the smallest number? A slider control is just a text input with a changing value – the input method for that value is not a matter of semantics, it’s a matter of UI design, and we already have the tools to implement that design (JavaScript and CSS). Isn’t the introduction of an input of type slider just going to confuse the issue for ATs, by allowing authors to not have to consider the basic semantics of the element they choose?

And how, to take HTML 5 in broader terms, does allowing authors and authoring tools to output any crappy markup they want help interoperability? Defining a standard that is less strict than HTML 4 is, in my opinion, semantic suicide – all the work we’ve done to encourage authors to care about standards and interoperability could be undone overnight.

I wasn’t aware of the loss of headers attributes in tables, but I’m amazed and appalled to hear it. I can’t believe that Lachlan thinks this is a sensible rationale:

“I’m aware that this one currently has better support in ATs than the scope attribute does, but for most cases it’s redundant.”

If all we care about is “most cases” then accessibility doesn’t matter at all. Almost by definition, we’re dealing with edge cases, but that is totally not the point. I agree with you when you say that the use of heuristics is a big part of the problem, and the solution to this problem is a tighter, not a looser, standard, in which heuristics are simply not required.

Comment by Anne van Kesteren

bruce, concluding that from my statement seems a bit like a stretch. It does seem that people get this wrong in example code on the web though. I forgot to note that Lachlan is not talking about heuristics, but talking about defining exact algorithms. Much like the algorithm required to support headers=, only a tad more complicated (but not much). This would also make ATs support the existing usage of scope= better.

I’m not really sure why you guys feel offended by all this. At this point the draft is a proposal. If there’s substantive evidence (and Hixie said he will look into it if nobody else will) to support headers= it will probably be added to the draft. To date, the only examples given are way easier to address with scope= or are done incorrectly (using header= as opposed to headers=).

The WHATWG has an open process and everyone is invited to join and contribute. The steering committee is just there to kick the editor out in case he “misbehaves” and doesn’t have much more of a say otherwise. Everyone’s feedback is taken into account.

Comment by mpt

Bruce, the WhatWG is much less “funded by big business” than the W3C is. Companies pay thousands of euros to the W3C to get a privileged status that barely exists in the WhatWG. To contribute in the WhatWG, all you need do is subscribe to the mailing list.

James, a new version of HTML is necessary, and the current version is poorly implemented, for exactly the same reason. For any UA vendor that wants its UA to be popular, HTML 4.01 is unimplementable. (And for any Web author that wants their site to be popular, XHTML is impractical.)

Comment by bruce

Anne, thanks for coming back; I was concerned that your stopping by to comment solely on a typo meant that you were dismissing the argument. I was wrong, and am grateful for your clarification.

I’m not particularly offended (yet), just concerned that there appears to be a degree of arbitrariness in the way elements are in the spec or not in the spec, and my recent experience shows that it’s very hard to convince conclaves that their spec should be re-considered because of accessibility reasons.

mpt – the WHATWG was “a loose, unofficial, and open collaboration of Web browser manufacturers and interested parties”. Apple, Opera, Mozilla have considerable cash. Just because they’re small compared with Microsoft and Google hardly means they’re tiny and cash-strapped. The WHATWG originally began to advance the interests of the browser manufacturers, and to codify their proprietary innovations (canvas etc). That is A Good Thing.

As they have money to devote to the advancement of the web’s language in order that they can sell sexy new browsers/user agents, they should also devote cash to the real, thorough, methodical testing and involvement of people with disabilities and those who make their assistive technologies.

(I’m not antipathetic to big business, as long as they innovate responsibly.)

Comment by Anne van Kesteren

bruce, the browser vendors started the WHATWG. People from all over the world (there are over 700 subscribers on the list) are taking part in the effort giving input on accessibility, usability, ease of authoring, etc. For instance, the initial version of the canvas element didn’t support fallback content. Because of the “WHATWG process” it became more accessible and gained that feature. As a result, Safari updated their implementation and other browsers implemented it right from the start.

Whether or not elements and attributes make it into the specification is not an arbitrary process by the way, although I suppose it may seem that way if you’re not involved in the process. Decisions on what to include are based on research, such as the multiple studies Ian Hickson did inside Google on markup usage in over three billion documents.

Comment by Bruce

Hi Anne, as Patrick said above, we are on the list and have been trying to become involved in the process.

The research is interesting, but not conclusive. For example, zillions of documents use the font tag. That doesn’t mean it should be retained. Few documents use headers or scope, but that’s because data tables are in the minority (and accessibly-minded coders even more so).

I’d ask of the original WHATWG members (I believe you’re a member; please correct me if I’m mistaken): what assessment was made about accessibility when the decision was made to drop headers and summary from the spec? Which screenreaders were tested? What plans are there for formal engagement with assistive technology vendors, users and the W3C Web Accessibility Initiative?

Comment by Anne van Kesteren

I thought Patrick referred to the HTML WG.

The reason for summary= was that a large number of sites used the attribute incorrectly making it not usable for AT. headers= wasn’t used widely at all and is also much more complex to author than scope= which was used (well, more often anyway). I don’t think we can achieve an accessible web by having special purpose features just for AT-clients as people don’t use them. We can achieve accessibility by making features as accessible as possible by default so that if authors use them they are accessible too.

I’m a contributor to the WHATWG by the way, not an “official” member.

The font element probably has to be retained for WYSIWYG editors. So far we haven’t managed to replace WYSIWYG editors on the web with something better and it seems like we won’t get somewhere better anytime soon. So I guess as long as we have WYSIWYG editors we’ll have the font element too. (Note that having a font element or a styled span element doesn’t make much of a difference.)

Comment by Benjamin Hawkes-Lewis

Bruce asks:

really, when was the last time you used these [var, samp, kbd] on a client’s business site?

Actually, on a financial site I’m working on for my current employer, I used not only headers and summary (for a complex data table that could not be fully articulated using scope), but also kbd (for search query text). I could have used var instead, now that I think about it.

James Edwards writes:

To take input elements, for example: is it wise to introduce a whole new raft of input types that ATs will need to be able to reconcile? Isn’t it more sensible to find solutions that reduce down to the smallest number? A slider control is just a text input with a changing value – the input method for that value is not a matter of semantics, it’s a matter of UI design, and we already have the tools to implement that design (JavaScript and CSS). Isn’t the introduction of an input of type slider just going to confuse the issue for ATs, by allowing authors to not have to consider the basic semantics of the element they choose?

No. On the general point, the closer that HTML controls approximate to the ordinary controls common to graphical user interfaces and exposed by accessibility frameworks like MSAA in a standard fashion (rather than with hacky and often poorly thought-out scripting and styles), the easier it will be for AT-friendly browsers to translate HTML controls into the familiar terms of those accessibility frameworks. On the specific point, such accessibility frameworks tend to include a role for slider controls since a slider is /not/ just a text input with a changing value. Although definitions vary slightly, one is that a slider is a control that allows the user to select or adjust a value in increments from a bounded range of minimum to maximum values (Mapping MSAA and IAccessible2 to ATK).

Anne van Kesteren writes:

The reason for summary= was that a large number of sites used the attribute incorrectly making it not usable for AT.

That’s only one interpretation of the limited available evidence, not a demonstrated fact. The commercial Google Code Survey found that summary was widely used without investigating how. Ian Hickson presented another survey, which he agreed was insufficiently representative and for which I can find no stated methodology, as evidence of widespread misuse.

Now the specified use of summary is wider than the usage recommended by the Web Content Accessibility Guidelines. The actual specification for summary states: This attribute provides a summary of the table’s purpose and structure for user agents rendering to non-visual media such as speech and Braille. Unlike best practice, this wording is not inconsistent with a summary like “This table is for layout” or a summary that duplicates information available elsewhere in the markup. Summaries compliant with the specification but not WCAG may not be ideal for assistive technology, but they are still useable and do not prevent summaries compliant with both from being helpful.

The survey involved a selection of 469 instances of summary. By my count, only 13 summaries were obviously spam. 215 (almost half) of the summaries were null. Now the Web Content Accessibility Guidelines recommend that tables used for layout have either no summary attribute or a null summary attribute. So at least some of those will be correct uses of summary, as will many of the non-uses of summary not included in the survey.

One simply read “summary”. Only 54 simply declared the table to be for layout without providing any further information about its purpose or structure. One did the opposite and declared the table to be a “Data table”. Such summaries are bad practice, but not utterly incompatible with the HTML specification, nor atypical of the widespread abuse of most elements and attributes. The remaining 185 non-null summaries followed the specification by attempting to describe the actual contents or structure of the table, regardless of whether the table was being used for data or layout. Judging by the summary text alone, 50 of these described the content or structure of tables used for laying out the page, navigation, or forms, leaving 135 summaries that could well comply with WCAG if they don’t duplicate information in caption elements. Seeing as most tables on the web are presumably layout tables, 135 (or even half that number) out of a selection of 469 could be thought surprisingly good. And given that visual user agents do not show summaries, the fact that so many summaries at least attempt to provide useful text rather than spam, even if they still fall short of best practice, could suggest an unusual willingness to cater to assistive technology.
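The counts above can be checked with a few lines of Python. The figures are the ones quoted in this comment; the variable names are only labels chosen for this sketch.

```python
# Counts as reported in the comment above (469 summaries sampled).
total = 469
spam = 13
null = 215
literal_summary = 1      # the one that simply read "summary"
layout_only = 54         # declared the table to be for layout, nothing more
data_table_only = 1      # declared the table to be a "Data table"
descriptive = 185        # attempted to describe contents or structure
descriptive_layout = 50  # descriptive, but for layout, navigation or form tables

# The categories should account for every sampled summary.
accounted_for = spam + null + literal_summary + layout_only + data_table_only + descriptive
print(accounted_for == total)

# Descriptive summaries that are not obviously attached to layout tables.
plausibly_wcag = descriptive - descriptive_layout
print(plausibly_wcag)
print(f"{plausibly_wcag / total:.1%} of the sample could comply with WCAG")
print(f"{null / total:.1%} of the sample were null")
```

The categories do sum to the 469 sampled, so the 135 figure follows directly from the stated counts.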

By the way, Bruce, some hints about what markup is permissible in comments would be helpful.

Comment by Ian Hickson

I’m the editor of the WHATWG specs.

WHATWG isn’t funded by big business. The only expense WHATWG has is the hosting of the Web site, and I pay for that out of my own pocket. The people who contribute to WHATWG are mostly volunteers; a few are browser vendor employees, but until recently the WHATWG was mostly under the radar of the management levels of those companies so that doesn’t mean much really. I work for Google, but I had to convince them to pay me to work on the WHATWG stuff, before that I was an Opera employee. (And as the people who know me can tell you, I’d be doing this even if I wasn’t employed.)

Regarding the headers="" attribute, the issue isn’t closed, no decision has yet been made. I intend to do a study leveraging the resources Google puts at my disposal to make an educated decision based on actual research to see if it makes sense to have the feature or not. So to answer the question of who we expect to do the research: we will.

I hope that helps clarify the situation.

Comment by Bruce

Ian, thanks very much for stopping by to clarify. I’m staggered to find out that the browser manufacturers aren’t throwing cash to fund the WHATWG.

Good to see the headers issue isn’t closed.
I reiterate my call for browser manufacturers to fund research with real users and real assistive technologies. They’re not skint!

Comment by Ted Drake

Headers and summary are two attributes that suffer horribly from ignorance. I strongly feel that they’d be used more often if most programmers actually knew they existed.

The table summary is an especially helpful attribute for screen readers.

Here’s an example of a complicated set of tables on Yahoo! Tech that use headers to make a product comparison grid usable for screen readers. It uses the headers and summary attributes.

I built this page originally. It wasn’t easy but I feel the result was worth the effort.

Comment by steve faulkner

Hi bruce,
I joined the HTML WG last week, primarily to comment on the headers issue. I have been doing some testing with JAWS and Window-Eyes to see if the headers attribute is supported: JAWS 6.2 test http://lists.w3.org/Archives/Public/public-html/2007Jun/0072.html and Window-Eyes 6.0 test http://lists.w3.org/Archives/Public/public-html/2007Jun/0114.html

The tests indicate that for complex irregular data tables, JAWS and Window-Eyes make use of id/headers to correctly interpret the headers associated with a data cell, whereas the scope attribute in this case was not up to the job.
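The id/headers mechanism itself is simple to state: each data cell lists the ids of its header cells, and the user agent looks them up. A minimal Python sketch of that lookup follows. The table markup and the resolve_headers helper are hypothetical, written for illustration; this is not the test case from the linked messages, and it says nothing about how the screen readers tested work internally.

```python
from html.parser import HTMLParser


class HeaderResolver(HTMLParser):
    """Collect cell text by id, and remember each td's headers attribute."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}   # id -> text of the cell carrying that id
        self.data_cells = []   # (text, [header ids]) for each <td>, in document order
        self._current = None   # (tag, id, header ids, text fragments) of the open cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            a = dict(attrs)
            self._current = (tag, a.get("id"), (a.get("headers") or "").split(), [])

    def handle_data(self, data):
        if self._current:
            self._current[3].append(data)

    def handle_endtag(self, tag):
        if self._current and tag == self._current[0]:
            kind, cell_id, header_ids, parts = self._current
            text = "".join(parts).strip()
            if cell_id:
                self.text_by_id[cell_id] = text
            if kind == "td":
                self.data_cells.append((text, header_ids))
            self._current = None


def resolve_headers(markup):
    """Map each data cell's text to the texts of the headers it references.

    Cells are keyed by their text, so this toy assumes cell texts are unique.
    """
    parser = HeaderResolver()
    parser.feed(markup)
    parser.close()
    return {
        text: [parser.text_by_id[i] for i in ids if i in parser.text_by_id]
        for text, ids in parser.data_cells
    }


# Hypothetical irregular table: a second group header appears mid-table.
TABLE = """
<table>
  <tr><th id="q1" colspan="2">Quarter 1</th></tr>
  <tr><th id="sales">Sales</th><th id="costs">Costs</th></tr>
  <tr><td headers="q1 sales">120</td><td headers="q1 costs">80</td></tr>
  <tr><th id="q2" colspan="2">Quarter 2</th></tr>
  <tr><td headers="q2 sales">150</td><td headers="q2 costs">90</td></tr>
</table>
"""

print(resolve_headers(TABLE))
```

Because every association is spelled out by the author, nothing has to be inferred from the cell's position in the grid, which is why the attribute matters most for tables like this one.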

Comment by alan bowers

HTML 5 authors follow a double standard. They claim that the spec is driven by the real world and what current browsers are capable of rendering. Then, they completely ignore the capabilities of screen readers and other assistive technology. The Web is not just for Web browsers!

But the biggest problem with HTML 5 is that it does not fix the accessibility flaws of HTML 4, specifically its support for numbered headings that nobody uses correctly. Mr Hickson, don’t be afraid to borrow a little from XHTML 2. Don’t worry, you won’t get contaminated.

Comment by Laura

On May 27, I asked for advice from WAI and the PFWG on the “headers” issue:
http://lists.w3.org/Archives/Public/public-html/2007May/1208.html

This is their resulting statement:
http://lists.w3.org/Archives/Public/public-html/2007Jun/0145.html

Then as the editor requested,
http://lists.w3.org/Archives/Public/public-html/2007Jun/0003.html

I put together an issues page in the HTML 5 working group’s wiki on the subject.
http://esw.w3.org/topic/HTML/IssueTableHeaders

Comment by bruce

Thanks Laura.

I like this sentence from the WAI/ PFWG statement:

Some commentors have suggested that in order to sustain a small language there have to be some screening factors, and frequency of use in the as-is Web is the screening factor to use.

The WAI position on this is roughly “that is like saying that the builder of a high-rise building should decide whether or not to include fire-stairs based on whether the previous buildings at that street address had burned down or not.”