Archive for the 'HTML5' Category

Making accessible tagged PDFs with Prince

Love them or hate them, PDFs are a fact of life for many organisations. If you produce PDFs, you should make them accessible to people with disabilities. With Prince, (twitter) it’s easy to produce accessible, tagged PDFs from semantic HTML, CSS and SVG.

It’s an enduring myth that PDF is an inaccessible format. In 2012, the PDF profile PDF/UA (for ‘Universal Accessibility’) was standardised. It’s the U.S. Library of Congress’ preferred format for page-oriented content and the International Standard for accessible PDF technology, ISO 14289.

Let’s look at how to make accessible PDFs with Prince. Even if you already have Prince installed, grab the latest build (think of it as a stable beta for the next version) and install it; it’s a free license for non-commercial use. Prince is available for Windows, Mac, Linux, Free BSD desktops and wrappers are available for Java, C#/ .NET, ActiveX/COM, PHP, Ruby on Rails and Node/ JavaScript for integrating Prince into websites and applications.

Here’s a trivial HTML file, which I’ve called prince1.html.

<!DOCTYPE html>
<html>
<meta charset=utf-8>
<title>My lovely PDF</title>
<style>
        h1 {color:red;}
        p {color:green;}
</style>
<h1>Lovely heading</h1>
<p>Marvellous paragraph!</p>
</html>

From the command line, type

$ prince prince1.html

Prince has produced prince1.pdf in the same folder. (There are many command line switches to choose the name of the output file, combine files into a single PDF etc., but that’s not relevant here. Windows fans can also use a GUI.)

Using Adobe Acrobat Pro, I can inspect the tag structure of the PDF produced:

Acrobat screenshot: no tags available

As you can see, Acrobat reports “No Tags available”. This is because it’s perfectly legitimate to make inaccessible PDFs – documents intended only for printing, for example. So let’s tell Prince to make a tagged PDF:

$ prince prince1.html --tagged-pdf

Inspecting this file in Acrobat shows the tag structure:

Acrobat screenshot showing tags

Now we can see that under the <Document> tag (PDF’s equivalent of a <body> element), we have an <H1> and a <P>. Yes, PDF tags often —but not always— have the same name as their HTML counterparts. As Adobe says

PDF tags are similar to tags used in HTML to make Web pages more accessible. The World Wide Web Consortium (W3C) did pioneering work with HTML tags to incorporate the document structure that was needed for accessibility as the HTML standard evolved.

However, the fact that the PDF now has structural tags doesn’t mean it’s accessible. Let’s try making a PDF with the PDF-UA profile:

$ prince prince1.html --pdf-profile="PDF/UA-1"

Prince aborts, giving the error “prince: error: PDF/UA-1 requires language specification”. This is because our HTML page is missing the lang attribute on the HTML element, which tells assistive technologies which language the text is written in. This is very important to screen reader users, for example; the pronunciation of the word “six” is very different in English and French.

Unfortunately, this is a very common error on the Web; WebAIM recently analysed the accessibility of the top 1,000,000 home pages and discovered that a whopping 97.8% of home pages had detectable accessibility failures. A missing language specification was the fifth most common error, affecting 33% of sites.

screenshot from webaim showing most common accessibility errors on top million homepages
Image courtesy of webaim.org, © WebAIM, used by kind permission

Let’s fix our web page by amending the HTML element to read <html lang=en>.

Now it princifies without errors. Inspecting it in Acrobat Pro, we see a new <Annot> tag has appeared. Right-clicking on it in the tag inspector reveals it to be the small Prince logo image (that all free licenses generate), with alternate text “This document was created with Prince, a great way of getting web content onto paper”:

Acrobat screenshot with annotation on the Prince logo added with free licenses

This generation of the <Annot> with alternate text, and checking that the document’s language is specified allows us to produce a fully-accessible PDF, which is why we generally advise using the --pdf-profile="PDF/UA-1" command line switch rather than --tagged-pdf.

Adobe maintains a list of Standard PDF tags, most of which can easily be mapped by Prince to HTML counterparts.

Customising Prince’s default mappings

Prince can’t always map HTML directly to PDF tags. This could be because there isn’t a direct counterpart in HTML, or it could be because the source markup has conflicting markup and styling.

Let’s look at the first scenario. HTML has a <main> element, which doesn’t have a one-to-one correspondence with a single PDF tag. On many sites, there is one article per document (a wikipedia entry, for example), and it’s wrapped by a <main> element, or some other element serving to wrap the main content.

Let’s look at the wikipedia article for stegosaurus, because it is the best dinosaur.

We can see from browser developer tools that this article’s content is wrapped with <div id=”bodyContent”>. We can tell Prince to map this to the PDF <Art> tag, defined as “Article element. A self-contained body of text considered to be a single narrative” by adding a declaration in our stylesheet:

#bodyContent { prince-pdf-tag-type: Art; }

On another site, we might want to map the <main> element to <Art>. The same method applies:

Main { prince-pdf-tag-type: Art;}

Different authors’ conventions over the years is one reason why Prince can’t necessarily map everything automatically (although, by default HTML <article> gets mapped to <Art>).

Therefore, in this new build of PrinceXML, much of the mapping of HTML elements to PDF tags has been removed from the logic of Prince, and into the default stylesheet html.css in the style sub-folder. This makes it clearer how Prince maps HTML elements to PDF tags, and allows the author to override or customise it if necessary.

Here is the relevant section of the default mappings:

article { prince-pdf-tag-type: Art }
section { prince-pdf-tag-type: Sect }
blockquote { prince-pdf-tag-type: BlockQuote }
h1 { prince-pdf-tag-type: H1 }
h2 { prince-pdf-tag-type: H2 }
h3 { prince-pdf-tag-type: H3 }
h4 { prince-pdf-tag-type: H4 }
h5 { prince-pdf-tag-type: H5 }
h6 { prince-pdf-tag-type: H6 }
ol { prince-pdf-tag-type: OL }
ul { prince-pdf-tag-type: UL }
li { prince-pdf-tag-type: LI }
dl { prince-pdf-tag-type: DL }
dl > div { prince-pdf-tag-type: DL-Div }
dt { prince-pdf-tag-type: DT }
dd { prince-pdf-tag-type: DD }
figure { prince-pdf-tag-type: Div } /* figure grouper */
figcaption { prince-pdf-tag-type: Caption }
p { prince-pdf-tag-type: P }
q { prince-pdf-tag-type: Quote }
code { prince-pdf-tag-type: Code }
img, input[type="image"] {
prince-pdf-tag-type: Figure;
prince-alt-text: attr(alt);
}
abbr, acronym {
prince-expansion-text: attr(title)
}

There are also two new properties, prince-alt-text and prince-expansion-text, which can be overridden to support the relevant ARIA attributes.

Uncle Hakon shouting at me in Paris
Uncle Håkon shouting at me last month in Paris

Taking our lead from wikipedia again, we might want to produce a PDF table of contents from the ‘Contents’ box. Here is the Contents for the entry about otters (which are the best non-dinosaurs):

screenshot of wikipedia's in-page table of contents

The box is wrapped in an unordered list inside a <div id=”toc”>. To make this into a PDF Table of Contents (<TOC>), I add these lines to Prince’s HTML.css (because obviously I can’t touch the wikipedia source files):

#toc ul {prince-pdf-tag-type: TOC;} /*Table of Contents */
#toc li {prince-pdf-tag-type: TOCI;} /* TOC item */

This produces the following tag structure:

Acrobat screenshot showing PDF table of contents based on the wikipedia table of contents

In one of my personal sites, I use HTML <nav> as the wrapper for my internal navigation, so would use these declaration instead:

nav ul {prince-pdf-tag-type: TOC;}
nav li {prince-pdf-tag-type: TOCI;}

Only internal links are appropriate for a PDF Table of Contents, which is why Prince can’t automatically map <nav> to <TOC> but makes it easy for you to do so, either by editing html.css directly, or by pulling in a supplementary stylesheet.

Mapping when semantic and styling conflict

There are a number of tricky questions when it comes to tagging when markup and style conflict. For example, consider this markup which is used to “fake” a bulleted list visually:


<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<title>My lovely PDF</title>
<style>
div div {display:list-item;
    list-style-type: disc;
    list-style-position: inside;}
</style>
<div>

    <div>One</div>
    <div>Two</div>
    <div>Three</div>

</div>

Browsers render it something like this:

what looks like a bulleted list in a browser

But this merely looks like a bulleted list — it isn’t structurally anything other than three meaningless <div>s. If you need this to be tagged in the output PDF as a list (so a screen reader user can use a keyboard short cut to jump from list to list, for example), you can use these lines of CSS:

body>div {prince-pdf-tag-type: UL;}
div div {prince-pdf-tag-type: LI;}

Prince creates custom OL-L and UL-L tags which are role-mapped to PDF’s list structure tag <L>. Prince also sets the ListNumbering attribute when it can infer it.

Mapping ARIA roles

Often, developers supplement their HTML with ARIA roles. This can be particularly useful when retrofitting legacy markup to be accessible, especially when that markup contains few semantic elements — the usual example is adding role=button to a set of nested <div>s that are styled to look like a button.

Prince does not do anything special with ARIA roles, partly because, as webaim reports,

they are often used to override correct HTML semantics and thus present incorrect information or interactions to screen reader users

But by supplementing Prince’s html.css, an author can map elements with specific ARIA roles to PDF tags. For example, if your webpage has many <div role=”article”> you can map these to pdf <Art> tags thus:

div[role="article"] {prince-pdf-tag-type: Art;}

Conclusion

As with HTML, the more structured and semantic the markup is, the better the output will be. But of course, Prince cannot verify that alternate text is an accurate description of the function of an image. Ultimately claiming that a document meets the PDF/UA-1 profile actually requires some human review, so Prince has to trust that the author has done their part in terms of making the input intelligible. Using Prince, it’s very easy to turn long documents —even whole books— into accessible and attractive PDFs.

Structured data and Google

Domain-specific markup for fun and profit

It doesn’t come as a surprise to Dull Old Web Farts (DOWFs) like me to learn last month that Google gives a search boost to sites that use structured data (as well as rewarding sites for being performant and mobile-friendly). Google has brilliant heuristics for analysing the content of sites, but developers being explicit and marking up their content using subject-specific vocabularies means more robust results.

For the first time (to my knowledge), Google has published some numbers on how structured data affects business. The headlines:

  • Jobrapido’s overall organic traffic grew by 115%, and they have seen a 270% increase in new user registrations from organic traffic
  • After the launch of job posting structured data, Google organic traffic to ZipRecruiter job pages converted at a rate three times higher than organic traffic from other search engines. The Google organic conversion rate on job pages was also more than 4.5 times higher than it had been previously, and the bounce rate for Google visitors to job pages dropped by over 10%.
  • In the month following implementation, Eventbrite saw roughly a 100-percent increase in the typical year-over-year growth of traffic from Google Search
  • Traffic to all Rakuten Recipe pages from search engines soared 2.7 times, and the average session duration was now 1.5 times longer than before.

Impressive, indeed. So how do you do it? For this site, I chose a vocabulary from schema.org:

These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to markup their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

Because this is a blog, I chose the BlogPosting schema, and I use the HTML5 microdata syntax. So each article is marked up like this:

<article itemscope itemtype="http://schema.org/BlogPosting">
  <header>
  <h2 itemprop="headline" id="post-11378">The HTML Treasure Hunt</h2>
  <time itemprop="dateCreated pubdate datePublished" 
    datetime="2019-05-20">Monday 20 May 2019</time>
  </header>
    ...
</article>

The values for the microdata attributes are specified in the schema vocabulary, except the pubdate value on itemprop which isn’t from schema.org, but is required by Apple for WatchOS because, well, Apple likes to be different.

And that’s basically it. All of this, of course, is taken care of by one WordPress template, so it’s automatic.

Metadata partial copy-paste necrosis for misery and loss

One thing puzzles me, however; Google documentation says that Google Search supports structured data in any of three formats: JSON-LD, RDFa and microdata formats, but notes “Google recommends using JSON-LD for structured data whenever possible”.

However, no reason is given for preferring JSON-LD except “Google can read JSON-LD data when it is dynamically injected into the page’s contents, such as by JavaScript code or embedded widgets in your content management system”. I guess this could be an advantage, but one of the other “features” of JSON-LD is, in my opinion, a bug:

The markup is not interleaved with the user-visible text

I strongly feel that metadata that is separated from the user-visible data associated with it highly susceptible to metadata partial copy-paste necrosis. User-visible text is also developer-visible text. When devs copy/ paste that, it’s very easy to forget to copy any associated metadata that’s not interleaved, leading to errors. (And Google will penalise errors: structured data will not show up in search results if “The structured data is not representative of the main content of the page, or is potentially misleading”.)

An example of metadata partial copy-paste necrosis can be seen in the commonly-recommended accessible form pattern:

<label for="my-input">Your name:</label>
<input id="my-input"/>

As Thomas Caspars wrote

I’ve contacted chums in Google to ask why JSON-LD is preferred, but had no reply. (I may go as far as trying to “reach out” next time.)

Andrew wrote

I’m pretty sure Google prefers JSON-LD over microdata because it’s easier for them to stealborrow the data for their own use in that format. When I was working on a screen-scraping project a few years ago, I found that to be the case. Since then, I’ve come to believe that schema.org is really about making it easier for the big guys to profit from data collection instead of helping site owners improve their SEO. But I’m probably just being a conspiracy theorist.

Speculation and conspiracy theories aside, until there’s a clear reason why I should use JSON-LD over interleaved microdata, I’m keeping it as it is.

Google replies

Updated 23 May: Dan Brickley, a Google employee who is Lord of Schema.org, wrote this thread on Twitter:

On Smart TVs

When I was doing developer relations at Opera, I did everything I could to avoid having to go near the Opera TV part of the business – which was basically an app store of HTML5 websites for “Smart” TVs. This was for three reasons: the world of Smart TVs was a world of closed standards. Secondly, as Patrick Lauke wrote, the chips in the early Smart TVs were cheap and crappy which seriously crippled the web experience.

But the main reason was that I felt Smart TVs were a solution looking for problem.

TVs are big, beautiful screens and good speakers, sitting in a social space. No-one wants to surf Facebook on a screen that mum and dad are also watching, especially controlling it with arrows on a remote control unit.

A cheap device like a Chromecast allows people to control the content with a portable device they already have and know how to use – be it a laptop, tablet or phone, and see the movie/ photos on a big screen with family and friends. TVs are meant to be dumb; your phone is smart and a little USB gizmo connects the two.

So why did Opera have a B2B division developing Smart TV offerings? (And after the dismemberment of Opera, it continues as Vewd). The answer: because TV set manufacturers wanted Smart TV, so would throw money at us. But why did the set manufacturers want it?

A news report last week made it clear: it’s about collecting data off users, hitting the network a lot to phone that data back to HQ, then “monetizing” it, in some cases, advertising to viewers. That helps reduce the cost at sale of TVs. As the Verge puts it: Taking the smarts out of smart TVs would make them more expensive. From their interview with TV manufacturer Vizio’s CTO Bill Baxter:

So look, it’s not just about data collection. It’s about post-purchase monetization of the TV.

This is a cutthroat industry. It’s a 6-percent margin industry, right? I mean, you know it’s pretty ruthless. You could say it’s self-inflicted, or you could say there’s a greater strategy going on here, and there is. The greater strategy is I really don’t need to make money off of the TV. I need to cover my cost.

…there are ways to monetize that TV and data is one, but not only the only one. It’s sort of like a business of singles and doubles, it’s not home runs, right? You make a little money here, a little money there. You sell some movies, you sell some TV shows, you sell some ads, you know.

I have a Smart TV (it’s difficult to find a new one that isn’t). I’ve never connected it to the web, for just this reason.

The practical value of semantic HTML

Bruce’s guide to writing HTML for JavaScript developers

It has come to my attention that many in the web standards gang are feeling grumpy about some Full Stack Developers’ lack of deep knowledge about HTML. One well-intentioned article about 10 things to learn for becoming a solid full-stack JavaScript developer said

As for HTML, there’s not much to learn right away and you can kind of learn as you go, but before making your first templates, know the difference between in-line elements like <span> and how they differ from block ones like <div>. This will save you a huge amount of headache when fiddling with your CSS code.

This riled me too. But, as it’s Consumerfest and goodwill to all is compulsory, I calmed down. And I don’t want to instigate a pile-on of the author of this piece; it’s indicative of an industry trend to regard HTML as a bit of an afterthought, once you’ve done the real work of learning and writing JavaScript. If the importance of good HTML isn’t well-understood by the newer breed of JavaScript developers, then it’s my job as a DOWF (Dull Old Web Fart) to explain it.

Gather round, Fullstack JavaScript Developers – together we’ll make your apps more usable, and my blood pressure lower.

What is ‘good’ HTML?

Firstly, let’s reach a definition of ‘good’ HTML. Many DOWFs used to get very irked about (X)HTML being well-formed: proper closing tags, quoted attributes and the like. Those days are gone. Sure, it’s good practice to validate your HTML, just like you lint your JavaScript (it can catch errors and make your code more maintainable), but browsers are very forgiving.

In fact, part of what we commonly call ‘HTML5’ is the Parsing Algorithm which is like an HTML ninja – incredibly powerful, yet rarely noticed. It ensures that all modern browsers construct the same DOM from the same HTML, regardless of whether the HTML is well-formed or not. It’s the greatest fillip to interoperability we’ve ever seen.

By ‘good’ HTML, I mean semantic HTML, a posh term for choosing the right HTML element for the content. This isn’t a philosophical exercise; it has directly observable practical benefits.

For example, consider the <button> element. Using this gives you some browser behaviour for free:

  • A button is focusssable via the keyboard. I bet you, dear reader, know all the keyboard shortcuts for your IDE; it makes development much faster. Many people use only the keyboard when using a webpage. I do it because I have multiple sclerosis – therefore the fine motor control required to use a mouse can be difficult for me. My neighbour has arthritis, so she prefers to use the keyboard.
  • Buttons can be activated using the space bar or the enter key; you don’t have to remember to listen for these keypresses in your script.
  • Inside a <form>, it doesn’t even need JavaScript to work.

“What’s that?”, you say. “Everyone has JavaScript”. No, they don’t. Most people do, most of the time. But I guarantee you that everyone is without JavaScript sometimes.

Here’s another example: semantically linking a <label> to its associated <input> increases usability for a mouse-user or touch-screen user, because clicking in the label focusses into the input field. See this in action (and how to do it) on MDN.

This might be me, with my MS; it might be you, on a touch-screen device on a bumpy train, trying to check a checkbox. How much easier it is if the hit area also includes the label “uncheck to opt out of cancelling us not sending you spam forever”. (Compare the first and second identical-looking examples in a checkbox demo.)

But the point is that by choosing the right element for the job, you’re getting browser behaviour for free that makes your app more usable to a range of different people.

Invisible browser behaviours

With me so far? Good. The browser behaviours associated with the semantics of <button> and <label> are obvious once you know about them – because you can see them.

Other semantics aren’t so obvious to a sighted developer with a desktop machine and a nice big monitor, but they are incredibly useful for those who do need them. Let’s look at some of those.

HTML5 has some semantics that you can use for indicating regions on a page. For example, <nav>, <main>, <header>, <footer>.

If you wrap your main content – that is, the stuff that isn’t navigation, logo and main header etc – in a <main> tag, a screen reader user can jump immediately to it using a keyboard shortcut. Imagine how useful that is – they don’t have to listen to all the content before it, or tab through it to get to the main meat of your page.

And for people who don’t use a screenreader, that <main> element doesn’t get in the way. It has no default styling at all, so there’s nothing for you to remove. For those that need it, it simply works; for those that don’t need it, it’s entirely transparent.

Similarly, using <nav> for your primary navigation provides screenreader users with a shortcut key to jump to the navigation so they can continue exploring your marvellous site. You were probably going to wrap your navigation in a <div class=”nav”> to position it and style it; why not choose <nav> instead (it’s shorter!) and make your site more usable to the 15% of the world who have a disability?

For more on this, I humbly point you to my 2014 post Should you use HTML5 header and footer?. A survey of screenreader users last year showed that 80% of respondents will use regions to navigate – but they can only do so if you choose to use them instead of wrapping everything in <div>s. Now you know they exist, why wouldn’t you use them?

Update: here’s a YouTube video of blind screenreader user Leonie Watson talking through how she navigates this site using the HTML semantics we’ve discussed.

YouTube video

New types of devices

We’re seeing more and more types of devices connecting to the web, and semantic HTML can help these devices display your content in a more usable way to their owners. And if your site is more usable than your competitors’, you win, and your boss will erect a massive gold statue of you in the office car park. (Trust me, your boss told me. They’ve already ordered the plinth.)

Let’s look at one example, the Apple Watch. Here are some screenshots and excerpts from the transcript of an Apple video introducing watchOS 5:

We’ve brought Reader to watchOS 5 where it automatically activates when following links to text heavy web pages. It’s important to ensure that Reader draws out the key parts of your web page by using semantic markup to reinforce the meaning and purpose of elements in the document. Let’s walk through an example. First, we indicate which parts of the page are the most important by wrapping it in an article tag.

diagram of an article element wrapping content on an Apple Watch

Specifically, enclosing these header elements inside the article ensure that they all appear in Reader. Reader also styles each header element differently depending on the value of its itemprop attribute. Using itemprop, we’re able to ensure that the author, publication date, title, and subheading are prominently featured.

Apple Watch diagram showing how it uses microdata attributes to layout and display information about an article

itemprop is an HTML5 microdata attribute. There are shared vocabularies documented at schema.org, which is founded by Google, Microsoft, Yahoo and Yandex. Using schema.org vocabularies with microdata can make your pages display better in search results:

Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

Update: Google published numbers on how using structured data boosts conversions.

(If you plan to put things into microdata, please note that Apple, being Apple, go their own way, and don’t use a schema.org vocabulary here. Le sigh. See my article Content needs a publication date! for more. Or view source on this page to see how I’m using microdata on this article.)

Apple WatchOS also optimises display of items wrapped in <figure> elements:

Reader recognizes these tags and preserves their semantic styles. For this image, we use figure and figcaption elements to let the Reader know that the image is associated with the below caption. Reader then positions the image alongside its caption.

Apple Watch diagram showing how it lays out figures and captions if appropriately marked up

You probably know that HTML5 greatly increased the number of different <input> types, for example <input type=”email”> on a mobile device shows a keyboard with the “@” symbol and “.” that are in all email addresses; <input type=”tel”> on a mobile device shows a numeric keypad.

On desktop browsers, where you have a physical keyboard, you may get different User Interface benefits, or built-in validation. You don’t need to build any of this; you simply choose the right semantic that best expresses the meaning of your content, and the browser will choose the best display, depending on the device it’s on.

In WatchOS, input types take up the whole watch screen, so choosing the correct one is highly desirable.

First, choose the appropriate type attribute and element tag for your form controls.
WebKit supports a variety of form control types including passwords, numeric and telephone fields, date, time, and select menus. Choosing the most relevant type attribute allows WebKit to present the most appropriate interface to handle user input.

Secondly, it’s important to note that unlike iOS and macOS, input methods on watchOS require full-screen interaction. Label your form controls or specify aria label or placeholder attributes to provide additional context in the status bar when a full-screen input view is presented.

Apple Watch showing different full-screen keyboards for different email types

I didn’t choose to use <article> and itemprop and input types because I wanted to support the Apple Watch; I did it before the Apple Watch existed, in order that my code is future-proof. By choosing the right semantics now, a machine that I don’t know about yet can understand my content and display it in the best way for its users. You are opting out of this if you only use <div> or <span> because, by definition, they have “no special meaning at all”.

Summary

I hope I’ve shown you that choosing the correct HTML isn’t purely an academic exercise. Perhaps you can’t use all the above (and there’s much more that I haven’t discussed) but when you can, please consider whether there’s an HTML element that you could use to describe parts of your content.

For instance, when building components that emit banners and logos, consider wrapping them in <header> rather than <div>. If your styling relies upon classnames, <header class=”header”> will work just as well as <div class=”header”>.

Semantic HTML will give usability benefits to many users, help to future-proof your work, potentially boost your search engine results, and help people with disabilities use your site.

And, best of all, thinking more about your HTML will stop Dull Old Web Farts like me moaning at you.

What’s not to love? Have a splendid holiday season, whatever you celebrate – and here’s to a semantic 2019!

More!

Editing the W3C HTML5 spec

On 3 November, the Queen and Uncle Timbo (Sir Tim Berners-Lee to you) came round to Lawson Towers to threaten me with a punch in the face and a karate chop if I didn’t co-edit the W3C HTML5 spec.

So I said yes — they look pretty frightening, I think you’ll agree.

Tim Berners Lee with fist raised, and the Queen making a karate chop gesture

There’s a pretty groovy and diverse team, including Patricia Aas who works on the Vivaldi browser; my friend and ex-Opera Devrel colleague, Shwetank Dixit who lives and works in India for Barrier Break, an accessibility organisation; ex-Opera chum Sangwhan Moon; long-haired digital trouble maker who does “Open Standards at GDS” (Government Digital Service), Terence Eden; Xiaoqian Wu of W3C and Steve Faulkner of The Paciello Group, another accessibility organisation.

I’m currently advising Wix Engineering on web standards. As an independent consultant, I’m not representing Wix, but obviously any relevant lessons we’ve learned by open-sourcing our Stylable CSS for Components pre-processor will be fed back to the W3C Working Groups.

A lot of people have asked me why there are two HTML specs – the Living Standard at WHATWG, and the HTML5.x specs at W3C. What’s the difference? And which should you use? The answer: it depends. (Well, this is web development, after all).

Spec implementors (primarily, browser vendors) implement from the WHATWG spec. The painstakingly algorithmic WHATWG document is exactly what we need so that browsers implement interoperably. But it’s hard for web authors (the million or so normal folk worldwide who write HTML to make websites) to get advice on how best to write good HTML from the WHATWG spec.

I’ve long used Steve Faulkner’s excellent guidance on HTML and ARIA, for example. The WHATWG spec is a future-facing document; lots of ideas are incubated there. The W3C spec is a snapshot of what works interoperably – authors who don’t care much about what may or may not be round the corner, but who need solid advice on what works now may find this spec easier to use.

For example, the WHATWG spec talks of the outlining algorithm, whereby a heading element such as <h1> changes its “level” depending on how it’s nested in <article>, <section> etc. I wish this actually worked, because it’s useful and elegant. However, as the W3C spec says “There are currently no known native implementations of the outline algorithm in graphical browsers or assistive technology user agents” and so advises “Authors should use heading rank (h1-h6) to convey document structure.”

I believe the latter spec is more useful to people writing websites today.

I’ve got lots of friends in WHATWG and believe it when I wrote that the strength of the web over last 7 years has been due to WHATWG. In my opinion, the work that WHATWG did saved the Web from irrelevance, while the W3C went meandering around XML-land instead. I’ve discussed this with some friends in WHATWG and W3C, too, to canvas opinion. They told me HTML needs a cute little mascot, so I’m stepping up to the plate.

I don’t want the W3C and WHATWG specs to diverge in substantive matters, and hope that useful authoring advice we produce can make its way into both specs.

Vive open standards!

On URLs in Progressive Web Apps

I’m writing this as a short commentary on Stuart Langridge’s post The Importance of URLs which you should read (he’s surprisingly clever, although he looks like the antichrist in that lewd hat).

Stuart says

I approve of the Lighthouse team’s idea that you don’t qualify as an add-to-home-screen-able app if you want a URL bar

Opera’s implementation of Progressive Web Apps differs from Chrome’s here (we only take the content layer of Chromium; we implement all the UI ourselves, precisely so we can do our own thing). Regardless of whether the developer has chosen display: standalone or display: fullscreen in order to hide the URL bar, Opera will display it if the app is served over HTTP because we think that the user should know exactly where she is if the app is served over an insecure connection. Similarly, if the user follows a link from your app that goes outside its domain, Opera spawns a new tab and forces display: browser so the URL bar is shown.

But I take Jeremy Keith’s point:

I want people to be able to copy URLs. I want people to be able to hack URLs. I’m not ashamed of my URLs …I’m downright proud.

One of the superpowers of the Web is URLs, and fullscreen progressive web apps hide them (deliberately). After our last PWA meeting with the Chrome team in early February, I was talking about just this with Andreas Bovens, the PM for Opera for Android. We mused about some mechanism (a new gesture?) that would allow the user to see and copy (if they want) the URL of the current page. I’ve already heard of examples when developers are making their own “share this” buttons — and devs re-implementing browser functionality is often a klaxon signalling something is missing from the platform.

When I mentioned our musings on Twitter this morning, Alex Russell said “we’ve been discussing the same.” It is, as Chrome chappie Owen Campbell-Moore said “a difficult UX problem indeed”, which is one reason that Andreas and I parked our discussion. One of Andreas’ ideas is long press on the current page, and then get an option to copy/share the URL of the page you’re currently viewing (this means that a long press is not available as an action for site owners to use on their sites. Probably not a big deal?)

What do you think? How can we best allow the user to see the current URL in a discoverable way?

Web Components, accessibility and the Priority of Constituencies

Gosh, what a snappy title. I’m not expecting a job offer from Buzzfeed any time soon.

Today, Apple sent their consolidated feedback on Web Components to the webapps Working Group. The TL;DR: they like the concept, are “considering significant implementation effort”, but want lots of changes first including removal of subclassing, eg <button is=”my-button”>.

I think this is bad; this construct means existing HTML elements can be progressively enhanced – in the example above, browsers that don’t support components or don’t support JavaScript get a functional HTML <button> element. It also means that, by enhancing existing HTML elements, your components get the default browser behaviour for free – so, in this example, your snazzy my-button element inherits focussability and activation with return or spacebar withut you having to muck about with tabindex or keyboard listeners. (I wrote about this in more detail last year in On the accessibility of web components. Again.)

Apple raised a bug Remove the support for inherting from builtin subclasses of HTMLElement and SVGElement and notes “without this hack accessibility for trivial components is harder as more things have to be done by hand” (why “this hack”? A loaded term). However, it calls for removal because “Subclassing existing elements is hard as implementation-wise identity is both object-based and name / namespace based.”

Implementation is hard. Too hard for the developers at Apple, it appears. So Web developers must faff around adding ARIA and tabindex and keyboard listeners (so most won’t) and the inevitable consequence of making accessibility hard is that assistive technology users will suffer.

HTML has a series of design principles, co-edited by Maciej Stachowiak who sent Apple’s feedback. One of those is called “Priority of Constituencies” which says

In case of conflict, consider users over authors over implementors over specifiers over theoretical purity. In other words costs or difficulties to the user should be given more weight than costs to authors; which in turn should be given more weight than costs to implementors; which should be given more weight than costs to authors of the spec itself, which should be given more weight than those proposing changes for theoretical reasons alone.

Fine words. What changed?

Why we can’t do real responsive images with CSS or JavaScript

I’m writing a talk on <picture>, srcset and friends for Awwwards Conference in Barcelona next month (yes, I know this is unparalleled early preparation; I’m heading for the sunshine for 2 weeks soon). I decided that, before I get on to the main subject, I should address the question “why all this complex new markup? Why not just use CSS or JavaScript?” because it’s invariably asked.

But you might not be able to see me in Catalonia to find out, because tickets are nearly sold out. So here’s the answer.

All browsers have what’s called a preloader. As the browser is munching through the HTML – before it’s even started to construct a DOM – the preloader sees “<img>” and rushes off to fetch the resource before it’s even thought about speculating about considering doing anything about the CSS or JavaScript.

It does this to get images as fast as it can – after all, they can often be pretty big and are one of the things that boosts the perceived performance of a page dramatically. Steve Souders, head honcho of Velocity Conference, bloke who knows loads about site speed, and renowned poet called the preloader “the single biggest performance improvement browsers have ever made” in his sonnet “Shall I compare thee to a summer’s preloader, bae?”

So, by the time the browser gets around to dealing with CSS or script, it may very well have already grabbed an image – or at least downloaded a fair bit. If you try

<img id=thingy src=picture.png alt="a mankini">
…
@media all and (max-width:600px) {
 #thingy {content: url(medium-res.png);}
 }

@media all and (max-width:320px) {
 #thingy {content: url(low-res.png);}
 }

you’ll find the correct image is selected by the media query (assuming your browser supports content on simple selectors without :before or :after pseudo-elements) but you’ll find that the preloader has downloaded the resource pointed to by the <img src> and then the one that the CSS replaces it with is downloaded, too. So you get a double download which is not what you want at all.

Alternatively, you could have an <img> with no src attribute, and then add it in with JavaScript – but then you’re fetching the resource until much later, delaying the loading of the page. Because your browser won’t know the width and height of the image that the JS will select, it can’t leave room for it when laying out the page so you may find that your page gets reflowed and, if the user was reading some textual content, she might find the stuff she’s reading scrolls off the page.

So the only way to beat the preloader is to put all the potential image sources in the HTML and give the browser all the information it needs to make the selection there, too. That’s what the w and x descriptors in srcset are for, and the sizes attribute.

Of course, I’ll explain it with far more panache and mohawk in Barcelona. So why not come along? Go on, you know you want to and I really want to see you again. Because I love you.

Review: JavaScript & JQuery by Jon Duckett

Disclosure stuff: I was sent a free copy of this by the publishers. From 2000-2002 I worked with its author. I currently work with Mathias Bynens, the book’s technical reviewer (but didn’t know this until after reading it).

Don’t tell anyone, but I’m not very confident with my JavaScript – I learned to program in COBOL, FORTRAN, BASIC, 6502, Z80 and DCL which are all procedural, so I was looking forwards to a book that starts at the very beginning and treats me kindly, before clubbing me over the head because I polluted the global namespace and laughing at me for extending my prototypical object literal constructors into my callbacks.

The book’s blurb says “We’ll not only show you how to read and write JavaScript, but we’ll also teach you the basics of computer programming in a simple, visual way” and is the follow-up to Duckett’s hugely successful HTML and CSS Book which sold over 150,000 copies. (In tech book terms, that’s practically Harry Potter; in this industry, 5,000 is a successful book.)

The book looks beautiful. High quality paper, colour images, with real care and attention lavished on the layout and the words. I’m no quivering aethete designer, but I found it pleasurable to read even though it’s a weighty 600 page tome. Each page (or spread) is its own discrete infolump so it’s easy to out down and come back to.

It starts light – defining events, objects, methods and properties, showing the relationship between HTML, CSS and JS, and with a section on Progressive Enhancement (hoorah). However, I was slightly peturbed that the first worked example uses document.write. I can see why you’d do this – it allows you to show something, but without having to muck about appending to the DOM or using getElementById and innerHTML but it didn’t feel particularly good practice (especially as getElementById and innerHTML are introduced soon after, anyway.) In the author’s defence, he does note that this is Considered Naughty.

Elsewhere, we see lots of workarounds and IE-specific aspects of the DOM. I’m comfortable with these being there; we have to live in the real world, and I think that a book that ignores this does a disservice to its readers – it’s right to equip someone to make pages work on IE.wtf or understand what’s happening in older/ inherited scripts.

The book moves briskly after the traditional introduction to loops, variables and other syntax. By page 270 we’re looking at event listeners, including IE5-8, event delegation, mutation events (with a note that mutation observers are coming, but no more than that.)

Chapter 7 begins with jQuery. Again, there are times when jQuery is entirely appropriate. What’s good is that this book teaches JS concepts first, and always keeps the two separate. (I get tired of “JS” tutorials that are actually about jQuery.)

The rest of the book romps through “HTML5” APIs, JSON, common UI widgets and – usefully – debugging. Attention is paid to pointing out what’s standard and what isn’t, what’s vanilla JS and what isn’t. Progressive enhancement, accessibility and separation of concerns is are kept in mind throughout. This is good. You can see the table of contents.

When I’m reading, I often don’t care if I’m reading a paper book or a Kindle. But in this case, I was glad I was reading the full-colour, attractive book. I’ve railed about the shoddy quality and general unattractiveness of most computer books before. When so much information is available on the web, publishers must provide some sort of extra value. This book has it – the information is thought-through, beautifully presented and clearly explained. While I guess that the majority of my regular readers won’t need an explanation of what a loop or variable is, I believe this would be an excellent book for someone wanting to start with JavaScript, and learn it well. It doesn’t cover Promises, Mutation Observers, but I don’t really think they’re right for a beginners’ book, anyway.

Oh – and it made me realise that I’m not nearly as crap at JavaScript as I thought I was. Which is nice.

Ten years of WHATWG

A little over ten years ago, on 4 June 2004, Opera employee Ian “Hixie” Hickson sent a mail titled WHAT open mailing list announcement announcing

a loose, unofficial, and open collaboration of Web browser manufacturers and interested parties. The group aims to develop specifications based on HTML and related technologies to ease the deployment of interoperable Web Applications

Ten years is a long time, especially so in software, but nevertheless, the achievements of WHATWG have been remarkable. Hixie wrote

The working group intends to ensure that all its specifications address backwards compatibility concerns, clearly provide reasonable transition strategies for authors, and specify error handling behavior to ensure interoperability even in the face of documents that do not comply to the letter of the specifications.

Core aspects of the web platform were never adequately specified. XMLHttpRequest, for example, was shipped in IE5 in March 1999, and reverse-engineered and shipped in Firefox, Opera, Safari and iCab, but never actually documented until Anne van Kesteren co-specified it in WHATWG in a Working Draft of 5 April 2006. Anne’s currently working on the Fetch Standard, which defines something as basic as “requests, responses, and the process that binds them” and the Encoding Standard:

While encodings have been defined to some extent, implementations have not always implemented them in the same way, have not always used the same labels, and often differ in dealing with undefined and former proprietary areas of encodings. This specification attempts to fill those gaps so that new implementations do not have to reverse engineer encoding implementations of the market leaders and existing implementations can converge.

Of course, the poster children of WHATWG are the slew of new APIs that “HTML5” brings us – Web Workers, Web Sockets, native video and audio etc etc. There have been mistakes along the way (of course there have, in a decade!). Last year, Hixie told me

My biggest mistake…there are so many to choose from! pushState() is my favourite mistake, for the sheer silliness of ending up with an API that has a useless argument and being forced to keep it because the feature was so desired that people used it on major sites before we were ready to call it done, preventing us from changing lest we break it. postMessage()‘s security model is so poorly designed that it’s had academic papers written about how dumb I was, so that’s a pretty big mistake. (It’s possible to use postMessage() safely. It’s just that the easiest thing to do is not the safe way, so people get it wrong all the time.) The appcache API is another big mistake. It’s the best example of not understanding the problem before designing a solution, and I’m still trying to fix that mess.

But to me, the biggest triumph of WHATWG has been error-handling and interoperability (actually, two sides of the same coin). We’ve moved from a vision of the future where everything was supposed to be XML and browsers were to stop parsing if they met malformed markup, to a present where every browser knows how to construct an identical DOM from the most mangled/tangled HTML. We’re moving to a world where interoperability is paramount, and where specifications are made in the open, in constant consultation with developers (for example, Service Workers, Web Components) based on real use-cases.

I think the existence and the work of WHATWG has secured the viability of the web platform. Happy 10th birthday. And thanks.