Accessibility of HTML 5 video and audio elements

HTML 5 has the audio and video elements, which let an author add multimedia to their pages in a convenient, intuitive way. The advantage to the consumer is that the files play in the browser without plugins, and because the media is part of the page, it can be manipulated with scripts.

These elements are supported in Safari 3.1, Firefox 3.5 beta and labs builds of Opera, but the bickering over proprietary formats is not yet resolved and no single codec has been agreed upon. (Until we can use the video element, there is a way to embed Flash video on a page with valid HTML 5.)

To anyone who’s used to the horrors of YouTube-style embeds inside object monsters, the syntax is refreshing:

<video src="xxx.yyy" autoplay controls>
<a href="xxx.yyy">Download this video</a>
</video>

The autoplay attribute

autoplay is a boolean attribute that makes the media start playing as soon as it can, without prompting the user. This is a problem for people who work in shared offices and aren’t expecting depressing MIDI versions of classic rock songs to blare out when they follow a link. It is a far bigger problem for people who rely on sound to understand the Web, such as those using a screenreader or talklets, because the sound of the video would drown out all the other content on the page.

This attribute can therefore cause annoyance or even create a barrier, and I wonder whether these negatives were considered when the use cases for it were deemed significant enough to merit its inclusion in the specification.

(Update 9 May: See also “Autoplay is bad for all users”. Hat-tip John Foliot.)

It’s not as if every video content provider is anxious to force its videos to play: some, such as YouTube, do, but others, like Vimeo, don’t. Video sites catering to the gentleman’s leisure movie market (ahem) are similarly split: RedTube and YouPorn do begin automatically; Pornorama doesn’t. (See what I have to go through to research this stuff?!?!)

Update 8 May: I’m now reluctantly accepting that autoplay should stay because, as Simon Pieters points out

Removing the attribute will not make pages stop autoplaying video. Instead they will use script to make videos autoplay, and then it becomes harder for the user to prevent videos from autoplaying. (You could have a pref in the UA to disable autoplay.)
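Simon’s argument is that a preference in the user agent can catch every route to unrequested playback, whereas removing the attribute only pushes authors towards script. Here is a minimal TypeScript sketch of that reasoning. It assumes a hypothetical preference (allowAutoplay) and a hypothetical classification of where a playback request came from. Neither is part of the HTML 5 spec.

```typescript
// Hypothetical classification of how a request to start playback arose.
type PlaybackTrigger =
  | "autoplay-attribute" // the author wrote <video autoplay>
  | "script"             // the page called play() by itself, e.g. on load
  | "user-gesture";      // the user pressed a play control

// Hypothetical user preference, e.g. a setting in the browser.
interface UserPrefs {
  allowAutoplay: boolean;
}

// Only the user's own action bypasses the preference. The attribute and a
// scripted play() are treated alike, so scripting around the attribute
// gains the author nothing.
function mayStartPlayback(trigger: PlaybackTrigger, prefs: UserPrefs): boolean {
  return trigger === "user-gesture" || prefs.allowAutoplay;
}
```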

The controls attribute

controls is a boolean attribute which, if present, means that you want the browser to provide play, pause, stop buttons and the like. If it’s absent, it’s assumed that you are scripting your own controls. The spec for controls says

User agents may make the following features available, however, even when the attribute is absent: … controls to affect playback of the media resource (e.g. play, pause, seeking, and volume controls), but such features should not interfere with the page’s normal rendering. For example, such features could be exposed in the media element’s context menu.
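Scripting your own controls mostly comes down to calling the element’s play() and pause() methods and reading its paused property. Below is a minimal TypeScript sketch of the logic behind a scripted play/pause button. The MediaLike interface and the togglePlayback name are my own illustrative assumptions. I chose them so that a real video or audio element, which has those three members, fits them.

```typescript
// Hypothetical minimal shape of a media element; a real <video> or <audio>
// element (HTMLMediaElement) has these members too.
interface MediaLike {
  paused: boolean;
  play(): void;
  pause(): void;
}

// Toggle playback and return the label the custom button should show next.
function togglePlayback(media: MediaLike): "Play" | "Pause" {
  if (media.paused) {
    media.play();
    return "Pause"; // now playing, so the button should offer Pause
  }
  media.pause();
  return "Play"; // now paused, so the button should offer Play
}
```

In a page, you might wire this to a real button element, for example `button.addEventListener("click", () => { button.textContent = togglePlayback(video); });`. A native button, rather than a clickable div, stays focusable and operable from the keyboard, which matters for exactly the users discussed here.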

Given that autoplaying audio may significantly interfere with the page’s normal rendering in a screenreader, I think that the spec should say that user agents must provide a mechanism to mute or pause media. I’ve written to the working group to suggest this.

Update 8 May: So far, the response has been disappointing, with some pointing out that the operating system can already mute content. The trouble is that muting via the operating system mutes the screenreader as well as the autoplayed media. As a blind developer commented on Twitter, “you’d be pretty unpopular if you did that to me”.

The draft User Agent Accessibility Guidelines (UAAG) 2.0 has guideline 4.9 “Provide control of content that may reduce accessibility” with a success criterion

4.9.7 Stop/Pause/Resume Multimedia: The user can stop, pause, and resume rendered audio and animation content (including video and animated images) that last three or more seconds at their default playback rate. (Level A)

Given that the HTML 5 spec already strays way out into accessibility territory by pronouncing on alt text, and is as much for implementors as for content authors, I think it should confirm what the draft UAAG says.
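The stop/pause/resume control that 4.9.7 asks for can be sketched as follows, together with the difference between muting the page’s media and muting the whole operating system. This is an illustration under assumptions, not anything taken from the HTML 5 spec or the UAAG draft. The PausableMedia interface and the function names are hypothetical. Real HTMLMediaElement objects expose the same members: duration is NaN until metadata loads and Infinity for unbounded streams. In a page, the list might be gathered with `Array.from(document.querySelectorAll<HTMLMediaElement>("audio, video"))`.

```typescript
// Hypothetical minimal shape of a media element; a real HTMLMediaElement
// exposes all of these members.
interface PausableMedia {
  paused: boolean;
  muted: boolean;
  readonly duration: number; // seconds: NaN before metadata, Infinity for streams
  play(): void;
  pause(): void;
}

// The draft UAAG 2.0 SC 4.9.7 covers content lasting three or more seconds.
const MIN_COVERED_SECONDS = 3;

// Unknown length (NaN) is treated as covered, erring towards user control.
function isCovered(media: PausableMedia): boolean {
  return Number.isNaN(media.duration) || media.duration >= MIN_COVERED_SECONDS;
}

// Pause every covered item that is playing, and return what was paused so
// that "resume" restarts only those, not items the user had already stopped.
function pauseAll(items: PausableMedia[]): PausableMedia[] {
  const pausedNow: PausableMedia[] = [];
  for (const media of items) {
    if (!media.paused && isCovered(media)) {
      media.pause();
      pausedNow.push(media);
    }
  }
  return pausedNow;
}

function resumeAll(pausedNow: PausableMedia[]): void {
  for (const media of pausedNow) media.play();
}

// Mute only the page's media. Unlike an operating-system mute, this leaves
// a screenreader's speech audible.
function muteAll(items: PausableMedia[]): void {
  for (const media of items) media.muted = true;
}
```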

Accessibility of content

You’ll notice that in the code example above, the content of the video element is fallback content for browsers that can’t render the new element. In this case, it’s a link where the user can get the video file to view offline. (This is exactly the same model as the content of an iframe in HTML 4.)

The video spec says

Note: In particular, this content is not fallback content intended to address accessibility concerns. To make video content accessible to the blind, deaf, and those with other physical or cognitive disabilities, authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks) into their media streams.

This is radical stuff, but I think it’s the right way to go. These days, content from one source is embedded into countless websites – see Flickr or YouTube. It’s wishful thinking to assume that everyone who wishes to syndicate the content will copy and paste its accessibility information too. We can see this in specs like oEmbed, which is “a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource”, yet its authors don’t think that alternative text was worth including when embedding a picture.

So it’s better for the originator of the content to write or provide the accessibility information just once, and for it to be accessible, through some kind of interface in the browser, wherever the content is embedded. (I expected the spec for controls to say that, whether or not the attribute is present, the user agent must provide a mechanism for accessing the embedded alternative content, but it is silent on this.)

So philosophically, I support the write-once-carry-everywhere model for media accessibility. But how would I author it? I’ve made a few videos with Windows Movie Maker that I’ve uploaded to YouTube, and published my songs in the (proprietary) MP3 format, but I have no idea which formats carry synchronised captioning or transcription information within them, or what authoring tools I can use.

Are there any?

(Related plug: under the banner of, I’m talking about HTML 5 at a free meet-up in London on —the day after @media—with Molly, Remy Sharp, Dean Edwards and hopefully others. Details.)

17 Responses to “Accessibility of HTML 5 video and audio elements”

Comment by patrick h. lauke

my first thought would be SMIL, but – not having looked at this in any detail – i wonder how this tallies up with html5…it’s not video or audio per se, just a container for generic multimedia.

Comment by Martin Kliehm

Hi Bruce,

there is a W3C standard for captioning called DFXP, and it is supported by a number of free captioning tools like MAGpie. These tools have various output formats, including SMIL, which in turn is compatible with DAISY afaik. A while ago I wrote an article about DFXP for captioning YouTube. I think that is the way forward. So in my opinion the HTML 5 authors need to find a way to bolt DFXP onto the video and audio elements rather than reinventing the wheel.


Comment by Joe Clark

DFXP is just Timed Text. It’s great Andrew W.K. is supporting it, but nobody else is and, even with an SMPTE committee working in secret, nobody else is gonna. Even if they do, it won’t solve the Format Wars.

All video formats in common use, and some in uncommon use, can contain closed captions or subtitles. A browser may be too stupid to display them.

Comment by Philip Jägenstedt

@Joe Clark While it may be true that many container formats have one or several ways of embedding subtitles, in very few cases does it actually work reliably. For example, can you embed subtitles in an AVI file and actually have it work on all/most platforms? (I would like the answer to be yes, but I think not.) For Ogg you have Writ, CMML, OGM/SRT and Kate, but none has really emerged as a standard. For MPEG1 I guess it just can’t be done, and for MPEG2 it’s only DVD-like bitmap subtitles. Not sure about QuickTime/MP4. In short, it’s a big mess.

Comment by Martin Kliehm

Good point by @Joe Clark. So there are various competing formats for captions and subtitles. The W3C standard is DFXP, YouTube for example supports Subviewer (.sub) and SubRip (.srt), and of course there are other timestamped formats for synchronizing content, like SMIL and DAISY. Would it be unreasonable to expect a native browser video player to be compatible with several of them?
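The timestamped formats Martin mentions are simple at heart. In SubRip, for instance, each cue has three parts:

* a sequence number
* a `start --> end` line with `HH:MM:SS,mmm` timestamps
* one or more lines of text

Blank lines separate the cues. Below is a minimal sketch of reading such a file and choosing the cue to show at a given playback time. It assumes well-formed input with no positioning extensions after the timestamps. parseSrt, activeCues and toSeconds are hypothetical names, not a browser API.

```typescript
// One subtitle cue read from a SubRip (.srt) file.
interface Cue {
  index: number;
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Convert a SubRip timestamp such as "00:01:02,500" to seconds (62.5).
function toSeconds(stamp: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SubRip timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse a whole file; cues are separated by blank lines.
function parseSrt(source: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = source.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const [indexLine, timingLine, ...textLines] = block.split("\n");
    if (indexLine === undefined || timingLine === undefined) continue; // malformed
    const [start, end] = timingLine.split("-->");
    if (start === undefined || end === undefined) continue; // no "-->" line
    cues.push({
      index: Number(indexLine),
      start: toSeconds(start),
      end: toSeconds(end),
      text: textLines.join("\n"),
    });
  }
  return cues;
}

// The cue(s) a player should be showing at a given playback time.
function activeCues(cues: Cue[], time: number): Cue[] {
  return cues.filter((cue) => cue.start <= time && time < cue.end);
}
```

A native player supporting several formats would need one such reader per format, all producing the same kind of cue list for display.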

Comment by Bruce

Not unreasonable at all, Martin.

I wonder if Sun’s promising OMS format will allow accessibility information to be embedded in it as well?

But are there any authoring tools that I can easily use now to add transcript information to audio and captioning info to videos?

Joe Clark – great to see you here again.

Comment by John Foliot

@Joe Clark – keep up, friend: JW FLV Player and Ohio State U’s ultra-accessible take on the same player support DFXP (and I believe there are moves to make it the native format for the player), as does NCAM’s ccPlayer.

@Aaron Bassett & @Bruce: Silvia Pfeiffer did some great research as part of a Mozilla grant looking at the issue (and especially, Bruce, the difference between in-band and out-of-band ‘transcripts’: in-band for ‘one-file’ downloads and out-of-band for SEO and accessibility within the native browser environment). Good stuff, and documented here:

I know she had some dialogue with the WHAT WG / W3C HTML5 WG (with a tip of the hat to Mark Pilgrim), but I am not sure whether his highness has deemed it worthy of processing.

Comment by Martin Kliehm

Re: Tools. I would try MAGpie and compare it with CaptionTube for YouTube (I can’t paste any URLs because I’m on the iPhone, but they should be easy to google). Comparing software with a web application should give you a good idea of the capabilities of these tools. And while you’re at it, you could try uploading DFXP to YouTube and testing whether it is supported. A commenter on my blog suggested it could work, but nobody has checked it recently. ;-)

Comment by Joshue O Connor

Good write up Bruce.

To me it seems that the big issue (at this stage anyway) is with the accessibility of the player controls (as such) that are built into the browser, more so than the accessibility of the content itself.

The very idea that content can be rendered directly in the browser without the need for third-party plugins is great. The HTML 5 spec therefore needs to provide accessible controls for users to manage that content, and it will have to anticipate and support diverse user needs. Simon Pieters’ comment about dropping autoplay and authors just scripting that behaviour is prescient.

The element seems to go a long way towards achieving this. I can’t (yet, anyway) find anything that jumps out at me as plain wrong. The biggest chestnut is getting users to upload video content that supports the needs of people with disabilities, and this same issue spans multiple elements and APIs in the spec.

Advice should therefore be taken from UAAG, as it provides guidance for vendors. It goes without saying (but I’ll say it anyway) that video should support accessibility as much as possible, of course, and I think so far it looks pretty good.**

** Note my hat is off and I have salt at the ready should I have to dine on it.

Comment by GusZus

I’m trying to find out whether there is a way to prevent the video files from loading before a user requests them. Only the image specified by the poster attribute should load its resource. After all, a 45MB video file is much larger than a 235KB image file, and bandwidth is also an accessibility concern.

Flash can do this by loading just an SWF file at first, and then the video file only on further request.

If you can’t do this with HTML 5 video, you can’t really use it. Of course there could be a user agent preference to behave this way, but I think it should be the default.

Does anyone have information about this? I’m unable to find an answer in the HTML 5 spec.

Comment by Philip Jägenstedt

GusZus, apart from requiring that the document load event be delayed until the first frame is decoded (if there is no poster attribute), the spec leaves it up to the UA when to fetch data and how much. Expect the default behavior to improve as browsers compete to provide the best experience. Still, if browsers are currently doing too poor a job, you could simply use a script to replace an img element with a video element when it is clicked.

Comment by GusZus

Thanks for your comment, Philip Jägenstedt.

I think you’re right, and I found this Mozilla bug report about stopping the download once the video’s readyState reaches HAVE_CURRENT_DATA.

Also, the autobuffer attribute, a hint to download the whole video immediately, only makes sense if the user agent is not expected to always do this by itself.

My testing with Firefox 3.5 shows that this bug seems to be fixed: it doesn’t load the video until I start playback.

But sadly Safari 4 does.