A Minimal HTML5 Document

There seems to be confusion about the minimal set of elements that make a valid HTML5 page.

(Amended on prompting from Tab Atkins and Mathias Bynens in comments below.)

The simplest valid document is


<!doctype html>

The title element is required in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.

Assuming you’re writing a web page rather than an HTML email, you need the title element, although technically it can be blank.


<!doctype html>
<title></title>

However, you shouldn’t do that. Failure to specify a character encoding can introduce an obscure but real security vulnerability. So, the simplest valid and secure document looks like this:


<!doctype html>
<meta charset=utf-8>
<title>blah</title>
<p>I'm the content

(You don’t actually need the content, of course, but it’s a pretty rubbish web page without it, and an empty title isn’t much good.)
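
Why the missing html, head and body tags are fine: their start and end tags are optional in the HTML syntax, and the parser supplies the elements itself. As a rough sketch (ignoring whitespace text nodes, and assuming a spec-conforming parser), the document above should produce the same tree as this fully written-out version:

```html
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>blah</title>
</head>
<body>
<p>I'm the content</p>
</body>
</html>
```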

However, for accessibility reasons, you should declare the natural language of the document (English, French, Swahili) on the html element, which means you need that element (note that you don’t need to close it, though):


<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>blah</title>
<p>I'm the content

If you’re planning to use AppCache to enable offline applications, you’ll need the html element as the manifest attribute goes there.
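
A sketch of what that might look like. The manifest filename offline.appcache is a made-up placeholder, not something taken from this article, and since AppCache has since been deprecated in favour of service workers, treat it as illustration only:

```html
<!doctype html>
<html lang=en manifest=offline.appcache>
<!-- offline.appcache is a hypothetical filename; point this at your own manifest -->
<meta charset=utf-8>
<title>blah</title>
<p>I'm the content
```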

Internet Explorer 9 Developer Preview 3 and its antecedents can’t apply CSS to new HTML5 elements without a body element. (Try it without body and with body.)

So if you’re attempting to do that, the smallest valid, secure, screenreader-accessible and stylable-in-IE HTML5 page you can have is


<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>blah</title>
<body>
<p>I'm the content

Just because you can do this doesn’t mean you should, of course. Depending on your colleagues, it could be confusing and thus a maintainability nightmare.

I use the head element, and close those tags that need closing (although I don’t bother with trailing slashes on void elements such as meta).

So the minimal valid, secure, screenreader-accessible and stylable-in-IE HTML5 page (not email) that is easily readable and maintainable (subjective, of course) is probably


<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>blah</title>
</head>
<body>
<p>I'm the content</p>
</body>
</html>

Enjoy.

(PS, I co-wrote a book!)

38 Responses to “A Minimal HTML5 Document”

Comment by John

I guess I’m old-old-skool then (from back before tags started being self-closed in XHTML).

I don’t think it matters either way, but I like to leave off the " /" at the end as it means less code to look and scroll through, plus I think it looks neater.

Good mention about the tag though, what’s up with that? Is it really not needed anymore??

Comment by Adrian Higginbotham

what happens if you don’t specify the language? does it default to Browser language or some such? a means of dropping to OS or Browser lang might be preferable in some cases. an incorrect lang is much much worse than none at all. many’s the time that my screenreader switches to reading english in a french, german, or other accent because it believes what it’s reading is the language specified in the page rather than that which it actually is. of course if Devs didn’t allow incorrect languages to be set it would never happen but they do, so it does.

Comment by Chris Heilmann

This is such a step back we might as well hunch over and grunt again. The whole concept of a HTML document as a parseable document regardless of software that checks it is out of the window that way again.

How would this continue? Do I have to go through the file byte by byte to find the next opening tag bracket?

In terms of maintainability this is just moronic.

Comment by Bruce

Christian and Alastair,

to clarify: I use the head element, and close those tags that need closing (although I don’t bother with the trailing slashes).

I’m not advocating this as best practice, merely correcting misconceptions about what’s actually required by the language, what’s required by browsers and what’s optional.

Adding a caveat to the article.

Comment by Bruce

Adrian

I don’t know what happens with screenreaders if the lang attribute isn’t present. I’d imagine it would use the language that you’ve set the browser as, but that’s out of scope for the markup language.

Comment by Tab Atkins

The <meta charset> isn’t required for a valid document. If you’re transmitting the charset of the page via your response headers, or just sticking with pure ascii-range characters, there’s no need for it.

The actual bare-bones minimal valid page is
<!DOCTYPE html><title></title>. 30 bytes.

Comment by AlastairC

Ha ha! Now my first comment makes me look like a git :-)

I got the intent, and I didn’t mean to say you’d missed something, but it might be good to link to (or show) a *good* base template. Your Google-juice is powerful (young Jedi), I can see many people looking for a starting HTML5 template landing here.

Comment by Rimantas

@Christian:

this is nothing new. I’ve been playing with minimal valid HTML4.01 documents years ago, and that’s mostly the same.
And there is nothing moronic in it.

Comment by patrick h. lauke

@christian

but the parsing algorithm for how to unambiguously interpret the markup into the same DOM across all browsers is documented in painfully dull detail (as opposed to what happened previously, which was the source of incompatibilities with non-validating markup giving funky results in different browsers). so any software that wants to digest html5 can do so by implementing the algorithm and be guaranteed a sane result, no?

Comment by AlastairC

There seem to be two issues here:
- What should be the minimum for a valid document, and
- Whether closing tags should be required.

The post was about the former, but we’re reacting to the latter. (Sorry Bruce)

Although I would assume Patrick is correct (the algorithm is defined), it definitely feels like a step back, as it allows for less ordered (orderly?) markup.

It’s fair enough the spec defines how things work now, but shouldn’t we be promoting well-formed markup? Then in future the bar to create a browser isn’t quite so high.

Also, it’s not just browsers; other things consume HTML (e.g. YQL, WYSIWYG editors, assistive technology), and allowing for ill-formed markup makes their job harder.

Comment by mattur

> “shouldn’t we be promoting well-formed markup”

No, speed is far more important for most websites.

Comment by markc

Speed! That is the point of well formed markup, so browsers can reliably render an incoming stream in a single pass using an XML compliant engine. Ill-formed markup leads to multi-pass quirks mode. That was the whole point of the XHTML “fad” a decade ago and now you want to go back to the rubbish that existed before that!

Comment by Bruce

Alastair said:

There seem to be two issues here:
- What should be the minimum for a valid document, and
- Whether closing tags should be required.

The post was about the former, but we’re reacting to the latter. (Sorry Bruce)

No problem; I deliberately put non-closed elements in there as I figured people would be incensed.

Let’s have it out.

“other things consume HTML (e.g. YQL, WYSIWYG editors, assistive technology)”

Is there any evidence that well-formed XML is better for these? Given that 95% of the Web doesn’t validate, it seems to me that any crawler would have to parse HTML rather than XHTML.

Comment by Bruce

@tabatkins

Thanks. Brainfart from me re the character encoding. (I get it right in our book, though.) Amended the article.

@mathias

Amended the article – didn’t know about title being optional!

Presumably, the smallest valid XHTML5 document is longer, as although DOCTYPE isn’t required, you’d need an opening <html xmlns="http://www.w3.org/1999/xhtml"> and its closing tag.
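
A sketch of such a page, on the assumption that it is served as application/xhtml+xml so that the XML parser is used. Since XML has no optional tags, head, title and body all have to be written out, and every element has to be closed:

```html
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>blah</title></head>
<body><p>I'm the content</p></body>
</html>
```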

Comment by patrick h. lauke

@alastairc

“shouldn’t we be promoting well-formed markup?”

on what grounds, though? is there any advantage, other than readability of the source?

“Then in future the bar to create a browser isn’t quite so high.”

the bar is leveled and low for anybody wishing to create their own browser now, as – compared to the partially grey and wooly areas of html 4 parsing and particularly error correction – the algorithm is clearly defined, fully documented, and freely available without the need to reverse-engineer how some browsers cope with things (and even available, from what i remember, as actual running code examples etc in different languages).

Comment by AlastairC

Patrick, are you really saying that it’s as easy to parse (and build a parser for) HTML which doesn’t close tags? (Or put attributes in quotes, as a related aspect.)

It isn’t just browsers, the advantages (apart from readability of source, which is a valid one), are that it is easier for other parsers as well. As I said above, YQL, editors, assistive tech etc.

Isn’t that why the microformat guys insisted on XHTML?

Comment by Bruce

@Alastair

“other things consume HTML (e.g. YQL, WYSIWYG editors, assistive technology)”

out of interest, is there any evidence that well-formed (eg, closing-tagged) code is better for these?

Given that 95% of the Web doesn’t validate, it seems to me that any crawler would have to parse invalid HTML as well as well-formed XHTML, simply because the overwhelming majority of the Web is the former rather than the latter.

Comment by AlastairC

There is a big difference between something that parses HTML for content, and something that renders it. The level of complexity of a parser is (and should be) much less than a browser, it doesn’t have to worry about CSS/JS for starters.

Browsers have to render dodgy content, because it’s reporting directly to a user, but a parser shouldn’t have to know such complex rules. What you’re talking about is essentially like using regex on HTML.

Christian might be able to speak to YQL, but WYSIWYG editors tend to convert the different forms of tag soup (from each browser that does contentEditable) to XHTML. The server-side microformat processors I’ve used rely on a valid (XHTML) DOM.

Parsing content out of the browser context is hard, things like beautifulsoup.py exist because it’s so hard to get right.

Logically, a well-formed DOM should be easier to read than tag soup, in the same way that this JavaScript is ambiguous:
if ( a === 1 )
    b = 2;
    c = 3;

(I hope that comes out ok!)

I had thought the HTML group was creating this compendium of current behaviours (the spec) because it was necessary. I did not think that authors should be encouraged to use poorly formed markup.

Well specified tag soup is still tag soup.

Comment by Alia

Letter to some people
If you like to get angry about this completely correct and accessible document, then an obsessive–compulsive disorder might also be fun for you.
Eat shi… XHTML!

Comment by Alia

And by the way: I really love it when it’s not announced what will happen to my input.

<!DOCTYPE html>
<html lang=en>
<meta http-equiv=content-type content="text/html;charset=utf-8">
<title>Letter to some people</title>
<p>If you like to get angry about this completely correct and accessible document, then an obsessive–compulsive disorder might also be fun for you.
<p>Eat shi… XHTML!

Comment by bruce

Alia – point taken. I’ve added a list of allowed tags and a note to escape code meant for display.

You could have shortened your character encoding meta tag to meta charset=utf-8.

Comment by mattur

@markc: The “whole point” of XHTML was to be “ready for the future”, for wildly improbable values of “future”.

Parser speed is not a performance bottleneck. Download speed is.

Comment by Tab Atkins

Oh, well if we’re going that far and allowing special-purpose documents like emails in this competition, then the shortest HTML5 document is “” (0 bytes), which is a valid input to @srcdoc.

I’d like to see anyone produce a valid negative-length HTML5 document!

Comment by Tab Atkins

Also, response to markc, comment 14:

Speed! That is the point of well formed markup, so browsers can reliably render an incoming stream in a single pass using an XML compliant engine. Ill-formed markup leads to multi-pass quirks mode. That was the whole point of the XHTML “fad” a decade ago and now you want to go back to the rubbish that existed before that!

I can assure you that parsing XML is, in fact, generally slower than parsing HTML. “XML is faster” is a persistent myth. Just ask any browser dev who’s worked on their browser’s parsing engine.

Comment by LJ

Bruce, I heard that you are able to have a valid website without using . I know no wouldn’t for a huge website. But for a simple blog is the really necessary?

Comment by Simon H

“is there any advantage [to well-formed markup] other than readability of the source?”

Given that if you do this for a living, you’ll always spend significantly more time maintaining code than writing it, isn’t that enough?

Comment by Alia

Bruce, I would love to use @charset, but Lynx doesn’t understand it. And accessibility is my personal obsessive–compulsive disorder. (But I always throw the first stone.)

And I am sorry for being so rude about the missing info about your transforms on replies – maybe I have too few other things to care about. But I really like it now.

Comment by Bruce

@SimonH

- readability of the source is very important. (And completely subjective; I find 6502 Assembler quite readable as it was my first programming language)

@Alia

Interesting; I wouldn’t even know how to get hold of a copy of Lynx (the real one, not the emulator).

Comment by Chris Heilmann

Interesting discussion. To me the most valid point is readability and maintainability of my code. If I rely on a browser engine to make something useful out of this then that doesn’t sound safe to me – I have been fooled by them far too often.

How does this “document” look in a text editor with colour coding? Can I collapse parts of it when I don’t want to be distracted by them? I edit HTML code – if speed is a real concern I write a build script that concatenates, minifies and changes the code to live code. If people really think that a few closing P tags would make their page slower they have not understood gzip on the server.

We’re developers and should be allowed to write and maintain code that is predictable and follows a clean convention. The last example above is totally fine by me and this is how I write my HTML5 except that I put " around the attributes as that helps my colour coding, too.

The whole argument that “less code is better” leads to unmaintainable code. If you want to do speedcoding, enter a 64k intro contest in the demo scene – don’t expect future maintainers to be as excited as you are about the things you do as they will not read up why your code is so short and just add random stuff at the end of it. Want proof of that? Compare any CSS document after it went through a few rounds of maintenance.

Comment by Kees

From your article:

“””The title element is required in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.

Assuming you’re writing a web page rather than an HTML email, you need the title element, although technically it can be blank.”””

That is odd, if you look at the WhatWG specs for the title element itself it says something different:

If it’s reasonable for the Document to have no title, then the title element is probably not required. See the head element’s content model for a description of when the element is required.

Okay that’s just a little bit ambiguous, and in fact I misread it at first to mean the title element is pretty much optional as long as you can make a reasonable argument for the document to have no (need for a) title. But the description in the head element’s content model (that you linked the changelog of) is pretty strict about when it is required or not.

When in doubt, the stricter option is probably the way to go, but it could have been worded more clearly IMO. However the next bit flat-out contradicts you, also from the WhatWG title element section:

The title element must not be empty.

So in your example, your title element actually isn’t empty, but for the wrong reasons :) It’s actually not allowed to be empty.

Sorry if this is nit-picking, but that’s kind of the name of the game when web standards are concerned, heh ;-)