
They aren’t HTML5 docs in the first place

How things change. These days, I no longer judge a document as conforming or nonconforming by its Document Type Declaration. Documents are conforming or documents are nonconforming. Occasionally, they’re both. A conforming HTML 4.01 document may be a conforming XHTML 1.0 document. Conversely, a nonconforming XHTML 1.0 document may be a conforming HTML5 document.

Ian Hickson offered the following in reply [June 17, 2007] to a W3C HTML WG message, Allow other doctypes.

Conforming HTML4 and XHTML1 docs will not become non-conforming HTML4 and XHTML1 docs. They'll remain conforming HTML4 and XHTML1 docs. They won't be conforming HTML5 docs because they aren't HTML5 docs in the first place. I don't see this as a problem.

HTML 4.01 or XHTML 1.0 document content that the (X)HTML5 Conformance Checking Service finds nonconforming may be made HTML5 conforming and, after DocType replacement, pass. Presently, conforming HTML5 documents that do not include elements undefined by the W3C, e.g., <header>, will pass W3C Markup Validation (except for the various Document Type Declaration and Character Set requirements).

I thought about what constitutes failure and acceptance between HTML 4.01, XHTML 1.0 and HTML5. It’s content.

I did some simple test cases.

HTML 4.01/Strict Passed W3C Markup Validation

The actual markup,


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>HTML5 - HTML 4.01 Strict Passes W3C Markup Validation</title>
</head>
<body>
<p>This is a sentence with an XHTML <code>&lt;br /&gt;</code> in a <br />paragraph.</p>
<p>This is a sentence with an image <img src="http://www.elementary-group-standards.com/images/elementary-theory-rosette.jpg"
     alt="Elementary Rosette" />  with an XHTML <code>&lt;img /&gt;</code> in a paragraph.</p>
</body>
</html>

Result for http://www.elementary-group-standards.com/formaldehyde/html5-html-pass.html - W3C Markup Validator

XHTML 1.0/Strict Passed W3C Markup Validation

The actual markup,


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>HTML5 - XHTML 1.0 Strict Passes Markup W3C Validation</title>
</head>
<body>
<p>This is a sentence with an XHTML <code>&lt;br /&gt;</code> in a <br />paragraph.</p>
<p>This is a sentence with an image <img src="http://www.elementary-group-standards.com/images/elementary-theory-rosette.jpg"
     alt="Elementary Rosette" /> with an XHTML <code>&lt;img /&gt;</code> in a paragraph.</p>
</body>
</html>

Result for http://www.elementary-group-standards.com/formaldehyde/html5-xhtml-pass.html - W3C Markup Validator

HTML5 Passed (X)HTML5 Conformance Checker

The actual markup,


<!DOCTYPE html>
<html>
<head>
<title>HTML5 - HTML5 Passes HTML5 Conformance Checker</title>
</head>
<body>
<p>This is a sentence with an XHTML <code>&lt;br /&gt;</code> in a <br />paragraph.</p>
<p>This is a sentence with an image <img src="http://www.elementary-group-standards.com/images/elementary-theory-rosette.jpg"
     alt="Elementary Rosette" />  with an XHTML <code>&lt;img /&gt;</code> in a paragraph.</p>
</body>
</html>

(X)HTML5 conformance checking results for http://www.elementary-group-standards.com/formaldehyde/html5-html5-pass.html

XHTML well-formed self-closing elements are acceptable in HTML 4.01 (as illustrated in the first test case). However, some things are not what they seem.

HTML 4.01/Strict (Second Test) Failed W3C Markup Validation

The actual markup,


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>HTML5 - HTML 4.01 Strict Fails W3C Markup Validation</title>
</head>
<body>
<p>The Content-Type is <code>&lt;meta http-equiv="content-type" content="text/html; charset=utf-8" /&gt;</code></p>
<p>This is a sentence with an XHTML <code>&lt;br /&gt;</code> in a <br />paragraph.</p>
<p>This is a sentence with an image <img src="http://www.elementary-group-standards.com/images/elementary-theory-rosette.jpg"
     alt="Elementary Rosette" />  with an XHTML <code>&lt;img /&gt;</code> in a paragraph.</p>
</body>
</html>

Result for http://www.elementary-group-standards.com/formaldehyde/html5-html-fail.html - W3C Markup Validator

So. XHTML well-formed self-closing meta elements are not acceptable in HTML 4.01. That’s odd.

XHTML 1.0/Strict (Second Test) Failed W3C Markup Validation

The actual markup,


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>HTML5 - XHTML 1.0 Strict (Second Test) Fails W3C Markup Validation</title>
</head>
<body>
<p>This is a sentence with an HTML <code>&lt;br&gt;</code> in a <br> paragraph.</p>
<p>This is a sentence with an image <img src="http://www.elementary-group-standards.com/images/elementary-theory-rosette.jpg"
     alt="Elementary Rosette" />  with an XHTML <code>&lt;img /&gt;</code> in a paragraph.</p>
</body>
</html>

Result for http://www.elementary-group-standards.com/formaldehyde/html5-xhtml-second-fail.html - W3C Markup Validator

And, HTML empty elements are not acceptable in XHTML 1.0. That makes sense. XHTML well-formedness rules require empty elements to be self-closed, whereas the HTML 4.01 specification makes no mention of well-formedness. That has allowed W3C Quality Assurance to correct their Validation Service so that, at this moment, it accepts self-closed XHTML elements in HTML 4.01. However, the markup from this second test passes the (X)HTML5 Conformance Checker.
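
The well-formedness half of this can be checked mechanically. Below is a minimal Python sketch, assuming only the standard library's XML parser. The helper name `is_well_formed` is hypothetical, and a well-formedness check is not the same thing as the DTD validation that the W3C Markup Validator performs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML.

    This only checks well-formedness (balanced, properly closed tags);
    it does not validate against the XHTML 1.0 DTD.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Self-closed empty element, as in the passing XHTML test case.
closed = "<p>This is a sentence in a <br />paragraph.</p>"
# HTML-style empty element, as in the failing XHTML test case.
unclosed = "<p>This is a sentence in a <br> paragraph.</p>"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

The first call should report True and the second False, because the parser reaches the closing p tag while the br element is still open.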

The above test cases are very simple. Still. They illustrate that it is possible to have content which meets HTML 4.01, XHTML 1.0 and HTML5 validation requirements; and, when the appropriate DocTypes are used, one has three conforming documents. However, for all this theory, it appears HTML5 has an interesting present-day Conformance Checker loophole: some nonconforming XHTML 1.0 will pass as HTML5 whereas nonconforming HTML 4.01 shall not.
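
The DocType replacement itself is a small text substitution. Here is a minimal Python sketch of it, assuming the document begins with its Document Type Declaration. The names `DOCTYPES` and `with_doctype` are hypothetical, and swapping the declaration does not by itself make anything conforming: each variant still has to be run through the matching validator or conformance checker.

```python
import re

# The three Document Type Declarations used in the test cases above.
DOCTYPES = {
    "html4": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
             '"http://www.w3.org/TR/html4/strict.dtd">',
    "xhtml1": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "html5": "<!DOCTYPE html>",
}

# A DOCTYPE declaration at the very start of the document.
_DOCTYPE_RE = re.compile(r"\A\s*<!DOCTYPE[^>]*>", re.IGNORECASE)


def with_doctype(document: str, flavor: str) -> str:
    """Return the document with its leading DOCTYPE swapped for `flavor`."""
    if not _DOCTYPE_RE.match(document):
        raise ValueError("document does not start with a DOCTYPE declaration")
    # A function replacement avoids any backslash processing of the DOCTYPE.
    return _DOCTYPE_RE.sub(lambda _match: DOCTYPES[flavor], document, count=1)


shared = (
    DOCTYPES["html4"]
    + "\n<html>\n<head>\n<title>Test</title>\n</head>\n"
    + "<body>\n<p>Text.</p>\n</body>\n</html>"
)

# One document per flavor, identical apart from the first line.
variants = {flavor: with_doctype(shared, flavor) for flavor in DOCTYPES}
```

Each entry in `variants` can then be submitted to the corresponding service to see whether the shared content conforms under that DocType.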

And, HTML 4.01 Markup Validation accepts XHTML well-formed empty elements.

Things are changing.


Sean Fraser posted this on June 17, 2007 05:44 PM.


Comments

Thomas Broyer wrote this at July 26, 2007 03:03 AM

There's a misunderstanding of the difference in results from html5-html-pass and html5-html-fail:

The former passes but does not mean what you think. Read carefully the error description from the latter. Actually, in SGML, it seems (I don't know SGML subtleties) that the / (solidus) closes the tag, and the > (greater-than sign) following it is then part of the textual content. While it's not a problem with a paragraph, it becomes one in the "head" because textual content is not allowed there.

So: while the former passes validation, extracting textual content would prove you're wrong assuming "XHTML well-formed self-closing elements are acceptable in HTML 4.01". It's not a problem with the "meta" element but really with using "/>" in start-tags.
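
The reading described in this comment can be sketched in code. The following Python is a simplified illustration only, not an SGML parser: the helper name `sgml_reading` and its regular expression are hypothetical, and it assumes the "/>" appears only on empty elements such as br, img and meta. It rewrites each XHTML-style empty tag into what the comment says an SGML-based HTML 4.01 validator effectively sees, namely the tag closed at the solidus followed by a literal ">" character.

```python
import re

# An XHTML-style empty-element tag such as <br /> or <img src="x" />.
# Attribute values may be double-quoted, single-quoted, or unquoted.
_XHTML_EMPTY_TAG = re.compile(
    r"""<([A-Za-z][A-Za-z0-9]*)       # element name
        ((?:\s+[\w:-]+                # attribute name
            (?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>/]+))?  # optional value
        )*)
        \s*/>                         # XHTML-style close
    """,
    re.VERBOSE,
)


def sgml_reading(markup: str) -> str:
    """Show the tag as ended at the solidus, with the '>' left as text."""
    return _XHTML_EMPTY_TAG.sub(r"<\1\2>&gt;", markup)


in_body = "<p>This is a sentence in a <br />paragraph.</p>"
in_head = '<meta http-equiv="content-type" content="text/html; charset=utf-8" />'

print(sgml_reading(in_body))
print(sgml_reading(in_head))
```

In the body, the leftover ">" is ordinary character data inside a paragraph, so validation passes even though the text now contains a stray character. In the head, the same leftover character has nowhere to go, which matches the failure in the second HTML 4.01 test.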


Sean Fraser wrote this at July 26, 2007 07:53 PM
Thomas:

Thank you for the clarification. It's in the fine print, isn't it. I remain hopeful that once the technical aspects of the specification have been clarified into acceptance, common language explanations will follow. I look forward to your daily W3C HTML WG mailing list write-ups.



The Elementary Standards: A Compendium of Web Standards, CSS, Linguistics and Search Engine Optimization methodology Copyright ©2005-2007 Sean Fraser. All work is published under a Creative Commons License. All Rights Reserved.
