(X)HTML Well-Formedness requires Validation

Well-formedness is a common discipline when constructing the markup of websites. That’s what most developers do, isn’t it? So, pages with well-formed elements should be displayed in browsers as web developers intended, right? And all developers need to do is view their pages in a browser. Perhaps not. What about ill-formedness? How are ill-formed documents displayed? The W3C offers a clue.

XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), in section 4.1, “Documents must be well-formed” (under 4, “Differences with HTML 4”), has this intriguingly ominous caveat:

“Although overlapping is illegal in SGML, it is widely tolerated in existing browsers.”

Every web standards advocate has seen ill-formed (X)HTML when peering at source code, but how many have identified it when viewing a web page? None. And, I don’t mean CSS (or, Cascading Style Sheet) errors; those are simple to see. The W3C notes that overlapping is illegal, but what about other sorts of well-formedness errors, e.g., unclosed XHTML elements and nesting irregularities? How would one know that a web page has ill-formed markup by viewing it in a browser if “it is widely tolerated in existing browsers”? One wouldn’t; it’s widely tolerated.
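For contrast, an XML parser tolerates none of this. The following is a minimal sketch in Python, chosen here only for illustration. It assumes the standard library's xml.etree.ElementTree is an acceptable stand-in for a strict XML parser; the helper name well_formedness_error and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Properly nested: <i> is closed before <b>, then reopened.
nested = "<p><b>bold <i>both</i></b><i> italic</i></p>"

# Overlapping: </b> arrives while <i> is still open.
overlapping = "<p><b>bold <i>both</b> italic</i></p>"

print(well_formedness_error(nested))       # no error reported
print(well_formedness_error(overlapping))  # a "mismatched tag" message
```

A browser handed the second string as text/html would quietly render something; the XML parser stops at the first offending tag.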

How widely tolerated is ill-formedness in existing browsers?

Significantly.

I did this simple test.

The (X)HTML,


<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>What's Wrong Here?</title>
<p>First Paragraph
<h1>Header
<dd>
<p>Second Paragraph
<ol>
<li>First Line Item
</h1>
<li>Second Line Item

That’s all, except for one inline style.

It is parsed and rendered as described below, and it is rendered similarly in all browsers. The browsers have taken their default style values (or, settings) for each element and applied them. Well, they tried.

  1. The first <p> defaults to the browsers’ values.
  2. The <h1> defaults.
  3. The <dd> defaults and has indented the elements which follow, as expected (regardless of the missing <dl>).
  4. The second <p> has inherited the <h1> font size as well as being indented by the preceding <dd>. [Note: It has inherited the header’s font size because the <h1> is still open; its closing </h1> does not appear until after the first <li>. The header violates HTML 4.01 syntax and XHTML 1.0 “nesting” requirements.]
  5. The <ol> defaults.
  6. The first <li> has default <ol> enumeration and <dd> indentation. It has inherited the <h1> font size.
  7. The </h1> behaves accordingly. It closes the header element.
  8. The second <li> defaults to an unordered list regardless of the missing <ul> and, since the header was closed, it does not inherit <ol> enumeration, <dd> indentation or <h1> font size.

[Note: If you have heard about “error-handling”, this ill-formed page is an example of what error-handling does.]
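The misplaced </h1> and the unclosed elements in the test page can also be found mechanically. The sketch below, in Python, is one assumed approach and not what any browser or the W3C validator actually does: it uses the standard library's html.parser.HTMLParser to keep a stack of open elements, reports every closing tag that does not match the innermost open element, and then lists whatever is still open at the end. The names NestingChecker, check_nesting and SAMPLE are hypothetical, and SAMPLE is the test page with the DOCTYPE and the <html> attributes trimmed for brevity.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML (XHTML writes them as <br />).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class NestingChecker(HTMLParser):
    """Track open elements on a stack and record XML-style nesting problems."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, line) for every element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_elements.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closed (<br />): nothing to track

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if not self.open_elements:
            self.problems.append(f"line {line}: </{tag}> closes nothing")
            return
        innermost, opened = self.open_elements[-1]
        if innermost == tag:
            self.open_elements.pop()
        else:
            # Leave the stack untouched so later reports stay consistent.
            self.problems.append(
                f"line {line}: </{tag}> found while <{innermost}> "
                f"(line {opened}) is still open"
            )


def check_nesting(markup):
    """Return a list of nesting problems, judged by XML rules."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    for tag, line in checker.open_elements:
        checker.problems.append(f"line {line}: <{tag}> is never closed")
    return checker.problems


SAMPLE = """<html>
<head>
<title>What's Wrong Here?</title>
<p>First Paragraph
<h1>Header
<dd>
<p>Second Paragraph
<ol>
<li>First Line Item
</h1>
<li>Second Line Item
"""

for problem in check_nesting(SAMPLE):
    print(problem)
```

Because the check applies XML rules, it also flags the <p>, <dd> and <li> elements whose end tags HTML 4.01 treats as optional; that strictness is a design choice of this sketch, in keeping with the XHTML 1.0 Strict DOCTYPE the test page declares.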

What a web page appears to be in a browser is not an indication of well-formed markup; some designers believe otherwise. [Elementary aside: What about potential clients?] If you had merely viewed the above example (and, not peered at the source code), it would appear to be an oddly styled CSS page. Validation (or, conformance checking) is required.

The W3C (X)HTML Validation Service can indirectly identify markup well-formedness deficiencies. The error descriptions are not precise but they do offer guidance when reviewing source code.

Ill-formed (X)HTML can (and, often does) affect any page. It doesn’t matter if pages are generated by hand-coding, (X)HTML editors or Content Management Systems (CMS): ill-formed content may strike at any time. Anywhere. And, until an (X)HTML Well-Formedness (or, Semantics) Validator or Conformance Checker is invented, the W3C HTML Validation Service remains the best tool, but only if it’s used. Do not rely on browsers!

Validate!


Sean Fraser posted this on February 26, 2007 12:16 PM.

Comments

zcorpan wrote this at February 26, 2007 02:38 PM

Neither HTML, nor SGML for that matter, has the concept of well-formedness. There is no such thing as well-formed (or ill-formed) HTML.

You will see well-formedness errors quite clearly in browsers if you use an XML MIME type.


Sean Fraser wrote this at February 26, 2007 03:19 PM

zcorpan: True. Technically, HTML does not have well-formedness requirements in the W3C Recommendations. The concept (and, practice), however, has been adopted by those who knowingly use HTML 4.01.

After reading your comments, I revised "HTML" to be "(X)HTML" for clarity. I tend to consider XHTML + text/html as HTML; and, write accordingly.


Voice teacher wrote this at June 7, 2007 04:37 AM

There is no such thing as well-formed HTML? Very funny.


The Elementary Standards: A Compendium of Web Standards, CSS, Linguistics and Search Engine Optimization methodology Copyright ©2005-2007 Sean Fraser. All work is published under a Creative Commons License. All Rights Reserved.
