The differences between HTML and XHTML have been noted with a new list of deprecated items (where one can note that <acronym> has been included). The differences between HTML 4.01 and HTML5 have been noted less so. The following identifies one difference that will affect numerous sites: anchors.
HTML 4.01 and HTML5 each define <a> as an inline element. The difference arises because HTML5 introduces “Significant Text” and “Embedded Content”.
- Significant text
- Significant text, for the purposes of determining the presence of significant inline content, consists of any character other than those falling in the Unicode categories Zs, Zl, Zp, Cc, and Cf. [UNICODE] [Elementary note: That could be “.” or a single one pixel transparent GIF.]
- Embedded content
- Embedded content consists of elements that introduce content from other resources into the document, for example img. Embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.
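The quoted definition of significant text can be tried out directly. What follows is a minimal sketch in Python, assuming the standard `unicodedata` module's general categories match the ones the draft names; the function name `has_significant_text` is hypothetical and not part of any specification or validator.

```python
import unicodedata

# General categories the quoted definition excludes: space, line and
# paragraph separators (Zs, Zl, Zp), control (Cc) and format (Cf) characters.
NON_SIGNIFICANT_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def has_significant_text(text: str) -> bool:
    """Return True if at least one character lies outside the excluded categories."""
    return any(
        unicodedata.category(ch) not in NON_SIGNIFICANT_CATEGORIES
        for ch in text
    )
```

Under this sketch a lone “.” counts as significant text, while spaces, tabs, newlines, and a non-breaking space do not.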
This study was performed after comparing the results for this site’s index page from the (X)HTML5 Conformance Checking Service Technology Preview and from W3C® Unicorn, “The Web’s Universal Conformance Checker - ALPHA Test Version”. The page passed W3C® anchor conformance but failed (X)HTML5’s. A test case was set up and run through the (X)HTML5 Conformance Checking Service.
[Note: The following addresses HTML5, not XHTML5. All HTML 4.01 text/html and XHTML 1.0 text/html documents are considered by Web Applications 1.0 to be HTML5.]
This, then, is what occurred.
<body> <a id="jabberwocky" href="#twasbrillig" title="Snarks"></a> </body>
(X)HTML5 Conformance Errors
http://www.w3.org/1999/xhtml not allowed in this context. Line 8, column 55 in resource http://www.menehune-foundry.com/vague/HTML5-anchor.html
http://www.w3.org/1999/xhtml requires significant inline content but did not have any. Line 8, column 59 in resource http://www.menehune-foundry.com/vague/HTML5-anchor.html
Web Applications 1.0, 3. Semantics and structure of HTML elements, 3.3.3. Kinds of elements
The subsection “Block-level elements” offers,
Block-level elements are used for structural grouping of page content.
There are several kinds of block-level elements:
- Some can only contain other block-level elements:
- Some can only contain inline-level content:
- Some can contain either block-level elements or inline-level content (but not both):
- Finally, some have very specific content models:
The subsection “Inline-level content” offers,
Inline-level content consists of text and various elements to annotate the text, as well as some embedded content (such as images or sound clips).
Inline-level content comes in various types:
- Strictly inline-level content
- Text, embedded content, and elements that annotate the text without introducing structural grouping. For example: img. Elements used in contexts allowing only strictly inline-level content must not contain anything other than strictly inline-level content.
- Structured inline-level elements
- Block-level elements that can also be used as inline-level content. For example:
Some elements are defined to have as a content model significant inline content. This means that at least one descendant of the element must be significant text or embedded content.
Unless an element’s content model explicitly states that it must contain significant inline content, simply having no text nodes and no elements satisfies an element whose content model is some kind of inline content.
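The rule just quoted (at least one descendant must be significant text or embedded content) can be approximated with a small checker. This is a rough sketch only, written in Python with the standard `html.parser` module; it is not how the (X)HTML5 Conformance Checking Service works. The `EMBEDDED` set is an assumed, partial list of embedded content elements, and the names `AnchorChecker` and `empty_anchors` are hypothetical. It looks only at <a> elements and does not test whether the anchor sits in an allowed context.

```python
from html.parser import HTMLParser
import unicodedata

EXCLUDED = {"Zs", "Zl", "Zp", "Cc", "Cf"}
# Assumption: an illustrative subset of embedded content elements.
EMBEDDED = {"img", "object", "embed", "iframe", "audio", "video", "canvas"}


class AnchorChecker(HTMLParser):
    """Collect <a> elements with no significant text and no embedded content."""

    def __init__(self):
        super().__init__()
        self.open_anchors = []   # stack of [label, has_significant_content]
        self.empty_anchors = []  # labels of anchors that failed the check

    def _mark_significant(self):
        # Content counts for every anchor that is currently open.
        for entry in self.open_anchors:
            entry[1] = True

    def handle_starttag(self, tag, attrs):
        if tag in EMBEDDED:
            self._mark_significant()
        if tag == "a":
            attr = dict(attrs)
            label = attr.get("id") or attr.get("href") or "(unnamed)"
            self.open_anchors.append([label, False])

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors:
            label, significant = self.open_anchors.pop()
            if not significant:
                self.empty_anchors.append(label)

    def handle_data(self, data):
        if any(unicodedata.category(c) not in EXCLUDED for c in data):
            self._mark_significant()


def empty_anchors(html: str) -> list:
    """Return labels (id, else href) of anchors lacking significant inline content."""
    checker = AnchorChecker()
    checker.feed(html)
    checker.close()
    return checker.empty_anchors
```

Fed the failing test case, `empty_anchors` reports the `jabberwocky` anchor; an anchor that contains the text “Top” or an img is not reported.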
After reading these passages, it is clear that the test case lacks two things: significant text inside the anchor and an embedding (block-level) element around it.
The corrections are as below.
<body> <p><a id="jabberwocky" href="#twasbrillig" title="Snarks">Top</a></p> </body>
“The document conforms to the machine-checkable conformance requirements for HTML5 (subject to the utter previewness of this service).”
That example doesn’t address practical application. Let’s say that you have a “Skip to Main Content” link at the beginning of the page, a “Return to Main Content” link at the bottom of the page, and an anchor between them.
<body>
  <p id="jumpContent">
    <a href="#mainContent" title="Skip to the Main Content">Skip to the Main Content</a>
  </p>
  [ … ]
  <a id="mainContent"></a>
  <div> [CONTENT] </div>
  [ … ]
  <div id="plynth">
    <p><a href="#mainContent" title="Main Content Start Return">
      <img src="http://www.elementary-group-standards.com/images/fin.jpg" width="64" height="105" alt="Return to the main content above"></a></p>
  </div>
</body>
One could embed an anchor in a <p> and include invisible significant content, e.g., a one-pixel image, but that defeats the purpose, doesn’t it? The anchor id should be set on an appropriate element, e.g., a <div>. It’s web standards semantics. Ids are simply that.
<body> [ … ] <div id="main-content"> [CONTENT] </div> [ … ] </body>
And there’s your anchor. The division’s id is allowed once per page, isn’t it? And since it’s allowed but once, it can be used as a style identifier and as a processing identifier. [Note: See HTML 4.01 Specification, 12 Links, 12.2.3 Anchors with the id attribute, wherein it states, “The id attribute can act as more than just an anchor name (e.g., style sheet selector, processing identifier, etc.).”]
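Because an id doubles as anchor, style hook, and processing identifier only when it is unique on the page, it can be worth checking for repeats. The following is a small sketch under that assumption, again in Python with the standard library; `IdCollector` and `duplicate_ids` are hypothetical names, not part of any validator.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id attribute value seen in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(html: str) -> list:
    """Return the sorted id values that occur more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty result means each id, such as `main-content`, can safely serve as the jump target.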
However, let’s say you have <a id="fir-a9" href="#stuffandnonsense-co-uk" title="Top"></a>, an empty anchor that, through the miracle of CSS, is set over a background image. The anchor would be set in an embedding element, e.g., <p>, and a one-pixel transparent GIF (or PNG) could be inserted, thereby meeting HTML5's significant inline content requirement.
- Web Applications 1.0, 1.4.1. HTML vs XHTML states,
“There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.”
“The first such concrete syntax is “HTML5”. This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an “HTML5” document by Web browsers.”
“The second concrete syntax uses XML, and is known as “XHTML5”. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is processed by an XML processor by Web browsers, and treated as an “XHTML5” document. Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature.”