
HTML and XHTML are Identical in HTML5

I was explaining the foolproof simplicity of Web Applications 1.0 (or HTML5) to a colleague the other day. The point was that HTML5, and XHTML5 too, is predicated on Media Type rather than on a Document Type Definition (DTD).

An HTML 4.01 document with <meta http-equiv="content-type" content="text/html"> and an XHTML 1.0 document with <meta http-equiv="content-type" content="text/html"> are both HTML5.

The Web Applications 1.0 draft says so explicitly:

The first such concrete syntax is “HTML5”. This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an “HTML5” document by Web browsers.
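As a rough illustration of the idea that the media type, not the DTD, decides how a document is processed, here is a minimal Python sketch. The function names `media_type` and `processing_mode`, the set of XML media types, and the returned labels are assumptions made for illustration; browsers make this decision inside their own networking and parsing code, not through any such function.

```python
# A minimal sketch, assuming the decision depends only on the media type.
# `media_type`, `processing_mode` and the labels are hypothetical names.

XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def media_type(content_type: str) -> str:
    """Return the bare, lowercased media type from a Content-Type value."""
    # "text/html; charset=iso-8859-1" -> "text/html"
    return content_type.split(";", 1)[0].strip().lower()


def processing_mode(doctype: str, content_type: str) -> str:
    """Decide how a document is processed. The doctype is accepted but
    deliberately ignored: only the media type matters here."""
    mtype = media_type(content_type)
    if mtype == "text/html":
        return "HTML5 (HTML syntax)"
    if mtype in XML_MEDIA_TYPES:
        return "XHTML5 (XML syntax)"
    return "not HTML"


html401 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
xhtml10 = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

print(processing_mode(html401, "text/html"))              # HTML5 (HTML syntax)
print(processing_mode(xhtml10, "text/html"))              # HTML5 (HTML syntax)
print(processing_mode(xhtml10, "application/xhtml+xml"))  # XHTML5 (XML syntax)
```

Both legacy doctypes land in the same bucket when served as text/html; only switching the media type changes the outcome.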

A schematic was drawn.

[Figure: HTML5 Schematic]

And the explanation went something like this.

XHTML Media Types [W3C Note 1 August 2002], 3. Recommended Media Type Usage, 3.1. 'text/html' states:

The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is not suitable for XHTML. However, as [RFC2854] says, [XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html.

[XHTML1], Appendix C "HTML Compatibility Guidelines" summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents. The use of 'text/html' for XHTML should be limited for the purpose of rendering on existing HTML user agents, and should be limited to [XHTML1] documents which follow the HTML Compatibility Guidelines. In particular, 'text/html' is not suitable for XHTML Family document types that adds elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].

XHTML documents served as 'text/html' will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see C.11 and C13 of [XHTML1] respectively).
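The practical difference the note describes, that well-formedness errors may go undetected when XHTML is processed as HTML, can be shown with Python's standard library. This sketch assumes that `xml.etree.ElementTree` (a strict XML parser) and `html.parser.HTMLParser` (a tolerant HTML tokenizer) are reasonable stand-ins for a browser's XML and HTML processing; neither is how browsers actually parse, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not complain about unclosed tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text: str) -> bool:
    """True if a strict XML parser accepts the markup (XML processing)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text: str) -> list:
    """Tokenize the markup leniently (HTML processing) and list its start tags."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


broken = "<p>one<br>two</p>"    # <br> is never closed: not well-formed XML
fixed = "<p>one<br/>two</p>"    # well-formed

print(parses_as_xml(broken))    # False: mismatched tag is reported
print(parses_as_xml(fixed))     # True
print(html_start_tags(broken))  # ['p', 'br'], no error raised
```

The same unclosed `<br>` that stops the XML parser passes through the HTML tokenizer silently, which is exactly why XHTML served as text/html gets no well-formedness checking.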

And since Media Types are included in the Content-Type/character set declaration, e.g., <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">, the order of precedence among competing declarations can be found in

HTML 4.01 Specification, 5 HTML Document Representation, 5.2 Character encodings, 5.2.2 Specifying the character encoding

To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

  1. An HTTP "charset" parameter in a "Content-Type" field.
  2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
  3. The charset attribute set on an element that designates an external resource.

In other words, a charset sent by the server in the HTTP Content-Type header takes precedence over the declaration in a page's META element. [Note: Have you ever had a character encoding mismatch error from the Markup Validation Service? That's precedence (or priority) at work.]
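The HTML 4.01 priority list boils down to "use the first source that declares a charset". The Python sketch below assumes the three sources have already been extracted as strings (or None when absent); `charset_param` and `resolve_charset` are hypothetical helpers, and the parameter parsing is simplified rather than a full RFC 2045 implementation. It also collects a warning when a lower-priority declaration disagrees with the winning one, roughly the kind of mismatch the Markup Validation Service reports.

```python
from typing import List, Optional, Tuple


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any.
    Simplified: splits on ';' and strips optional double quotes."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower() or None
    return None


def resolve_charset(http_content_type: Optional[str],
                    meta_content: Optional[str],
                    element_charset: Optional[str] = None
                    ) -> Tuple[Optional[str], List[str]]:
    """Apply the HTML 4.01 priorities, highest first, and collect warnings."""
    candidates = [
        ("HTTP Content-Type header", charset_param(http_content_type)),
        ("META http-equiv", charset_param(meta_content)),
        ("charset attribute", element_charset.lower() if element_charset else None),
    ]
    declared = [(source, cs) for source, cs in candidates if cs]
    if not declared:
        return None, []
    winner_source, winner = declared[0]
    warnings = [
        f"{source} says {cs}, but {winner_source} says {winner}; using {winner}"
        for source, cs in declared[1:]
        if cs != winner
    ]
    return winner, warnings


# Server sends UTF-8, page claims ISO-8859-1: the HTTP header wins.
charset, warnings = resolve_charset("text/html; charset=utf-8",
                                    "text/html; charset=iso-8859-1")
print(charset)   # utf-8
print(warnings)
```

When the server sends no charset at all, the META declaration is the highest-priority source left, which is why the META element only "works" on servers that stay silent.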

What effect does this have on writing well-formed HTML 4.01 or well-formed XHTML 1.0 with <meta http-equiv="content-type" content="text/html">? Nothing. Absolutely nothing. Either document is HTML5 when its content type is text/html, as in <meta http-equiv="content-type" content="text/html">. That's exceedingly simple.

In theory, the HTML5 specification could be written so that future user agents (or browsers) ignore DTDs and work from MIME/Media/Content Types. I believe Mr. van Kesteren has mentioned this.

Sean Fraser posted this on January 19, 2007 07:03 AM.



Devon wrote this at January 19, 2007 07:06 PM

The question I have is: how should (or will) browsers interpret it, in quirks mode or standards-compliant mode? That will probably still depend on the DTD. That's the problem nobody's fixing, as far as I can see.

Sean Fraser wrote this at January 19, 2007 08:30 PM

Devon: It has been fixed. HTML5 uses a DocType that references no DTD, i.e., <!DOCTYPE html>. That's it. That DocType declaration currently triggers standards-compliant mode in browsers. [The next article will explain it in greater detail.]

However, since the W3C HTML WG doesn't seem to appreciate the work being done by the WHAT WG, that could all change. [And that will be addressed in a summary article.]


The Elementary Standards: A Compendium of Web Standards, CSS, Linguistics and Search Engine Optimization methodology Copyright ©2005-2007 Sean Fraser. All work is published under a Creative Commons License. All Rights Reserved.
