I was explaining the foolproof simplicity of Web Applications 1.0 (or, HTML5) to a colleague the other day. The gist of the explanation was that HTML5 is predicated on Media Type rather than Document Type Definition (DTD).
HTML 4.01 with <meta http-equiv="content-type" content="text/html"> and XHTML 1.0 with <meta http-equiv="content-type" content="text/html"> are both HTML5.
Web Applications 1.0 says as much.
The first such concrete syntax is “HTML5”. This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an “HTML5” document by Web browsers.
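The rule quoted above can be sketched in a few lines. This is only an illustration, not how any particular browser is implemented: the function name processing_mode and the list of XML media types are my own assumptions. The point is that the DTD never enters the decision.

```python
def processing_mode(content_type: str) -> str:
    """Pick a processing mode from a Content-Type value alone.

    Hypothetical helper for illustration; the DOCTYPE/DTD is never consulted.
    """
    # Drop parameters such as "; charset=iso-8859-1" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # Assumed set of XML media types, chosen for this sketch.
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    print(processing_mode("text/html; charset=iso-8859-1"))
    print(processing_mode("application/xhtml+xml"))
```

An HTML 4.01 document and an XHTML 1.0 document both land in the same branch as long as they arrive as text/html.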
A schematic was drawn.

And, the explanation went something like this.
XHTML Media Types [W3C Note 1 August 2002], 3. Recommended Media Type Usage, 3.1. 'text/html' states,
The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is not suitable for XHTML. However, as [RFC2854] says, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.
[XHTML1], Appendix C "HTML Compatibility Guidelines" summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents. The use of 'text/html' for XHTML should be limited for the purpose of rendering on existing HTML user agents, and should be limited to [XHTML1] documents which follow the HTML Compatibility Guidelines. In particular, 'text/html' is not suitable for XHTML Family document types that adds elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].
XHTML documents served as 'text/html' will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see C.11 and C13 of [XHTML1] respectively).
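The remark about well-formedness errors going undetected is easy to see with Python's standard library, used here as a stand-in for a user agent's two code paths. This is a rough sketch: the sample markup and the names TagCollector, parses_as_xml and tags_seen_as_html are invented for the example, and html.parser is only a lenient tokenizer, not a browser's HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; it never rejects a document as ill-formed."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """True if the markup is well-formed XML, False if the XML parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tags_seen_as_html(markup: str) -> list:
    """Feed the markup to the lenient HTML tokenizer and list the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Sample markup with an unclosed <b> and an unclosed <br>: fine as tag soup, fatal as XML.
MARKUP = "<p>An unclosed <b>element<br></p>"

if __name__ == "__main__":
    print(parses_as_xml(MARKUP))
    print(tags_seen_as_html(MARKUP))
```

The same bytes are rejected on the XML path and accepted on the HTML path, which is what the Note warns about.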
And since Media Types are included in the Content Type/Character Set declaration, e.g., <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">, we can find the order of precedence among such declarations by reviewing,
HTML 4.01 Specification, 5 HTML Document Representation, 5.2 Character encodings, 5.2.2 Specifying the character encoding
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
- An HTTP "charset" parameter in a "Content-Type" field.
- A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
- The charset attribute set on an element that designates an external resource.
The server-side configuration takes precedence over the declaration in a site’s META element. [Note: Have you ever had a character encoding mismatch error from the Markup Validation Service? It’s precedence (or, priority).]
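The priority list can be written out as a small lookup. This is a minimal sketch under my own assumptions: the function names charset_param and effective_charset are hypothetical, the parsing of the charset parameter is deliberately naive, and real user agents do considerably more (byte-order marks, sniffing, defaults).

```python
import re
from typing import Optional


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type style value, if any."""
    if not content_type:
        return None
    match = re.search(r"charset\s*=\s*([^\s;]+)", content_type, re.IGNORECASE)
    if not match:
        return None
    # Strip optional quotes around the value and normalise case.
    return match.group(1).strip("\"'").lower()


def effective_charset(
    http_content_type: Optional[str],
    meta_content: Optional[str],
    link_charset: Optional[str],
) -> Optional[str]:
    """Apply the HTML 4.01 section 5.2.2 priorities, highest first."""
    candidates = (
        charset_param(http_content_type),  # 1. HTTP Content-Type header
        charset_param(meta_content),       # 2. META http-equiv declaration
        link_charset.lower() if link_charset else None,  # 3. charset attribute
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return None


if __name__ == "__main__":
    # Server says utf-8, the page's META says iso-8859-1: the server wins.
    print(effective_charset(
        "text/html; charset=utf-8",
        "text/html; charset=iso-8859-1",
        None,
    ))
```

When the header and the META element disagree, the header's value is the one returned, which is the mismatch the validator reports.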
What effect does this have on writing well-formed HTML 4.01 <meta http-equiv="content-type" content="text/html"> or well-formed XHTML 1.0 <meta http-equiv="content-type" content="text/html">? Nothing. Absolutely nothing. Because it’s HTML5 when it’s <meta http-equiv="content-type" content="text/html">. That's exceedingly simple.
In theory, the HTML5 specification could be written so that future User Agents (or, browsers) ignore DTDs and work from MIME/Media/Content Types. I believe Mr. van Kesteren mentioned that.
[Published date: 19 January 2007]

