
(X)HTML5 Character Encoding

If you have read the Web Applications 1.0 specification, you may believe UTF-8 is the sole acceptable character encoding. Or, after reading the WHATWG Wiki’s page on the differences between HTML and XHTML, you may have concluded that (X)HTML5 supports only the UTF-8 character encoding. That isn’t so.

On December 5, 2006, I sent this message:

HTML5/XHTML5 specs _implicitly_ state UTF-8 is required. None of the specs - as far as I can see - state UTF-8 as _explicitly_ required, i.e., USE UTF-8 OR DIE.


“Is UTF-8 the sole acceptable charset for HTML5?”

Ian Hickson replied on December 5, 2006,

“No, any character set can be used. We haven’t quite defined how character encodings work yet, but basically it’ll be whatever IE7 supports today.”

That would be nearly every charset.

Microsoft’s Windows page “Add Search Providers to Internet Explorer 7” states:

“If your search provider doesn’t install correctly, click here to select another character encoding.”

That link then offers the following character encodings:

  • UTF-8
  • Arabic (IBM-864)
  • Arabic (IBM-864-I)
  • Arabic (ISO-8859-6)
  • Arabic (ISO-8859-6-E)
  • Arabic (ISO-8859-6-I)
  • Arabic (Windows-1256)
  • Armenian (ARMSCII-8)
  • Baltic (ISO-8859-13)
  • Baltic (ISO-8859-4)
  • Baltic (Windows-1257)
  • Celtic (ISO-8859-14)
  • Central European (IBM-852)
  • Central European (ISO-8859-2)
  • Central European (Windows-1250)
  • Chinese Traditional (Big5)
  • Chinese Traditional (Big5-HKSCS)
  • Chinese Simplified (GB18030)
  • Chinese Simplified (GB2312)
  • Chinese Simplified (ISO-2022-CN)
  • Cyrillic (IBM-855)
  • Cyrillic (ISO-8859-5)
  • Cyrillic (ISO-IR-111)
  • Cyrillic (KOI8-R)
  • Cyrillic (Windows-1251)
  • Cyrillic/Russian (CP-866)
  • Cyrillic/Ukrainian (KOI8-U)
  • English (US-ASCII)
  • Greek (ISO-8859-7)
  • Greek (Windows-1253)
  • Georgian (GEOSTD8)
  • Hebrew (IBM-862)
  • Hebrew (ISO-8859-8-E)
  • Hebrew (ISO-8859-8-I)
  • Hebrew (Windows-1255)
  • Hebrew Visual (ISO-8859-8)
  • Japanese (EUC-JP)
  • Japanese (ISO-2022-JP)
  • Japanese (Shift_JIS)
  • Korean (EUC-KR)
  • Korean (ISO-2022-KR)
  • Korean (KS_C_5601-1987)
  • Nordic (ISO-8859-10)
  • South European (ISO-8859-3)
  • Romanian (ISO-8859-16)
  • Thai (TIS-620)
  • Turkish (IBM-857)
  • Turkish (ISO-8859-9)
  • Turkish (Windows-1254)
  • Unicode (UTF-7)
  • Vietnamese (VISCII)
  • Vietnamese (Windows-1258)
  • Western (IBM-850)
  • Western (ISO-8859-1)
  • Western (ISO-8859-15)
  • Western (Windows-1252)

That's comprehensive.
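Why the choice of charset still matters, even when any of them is permitted, can be shown with a minimal Python sketch (not part of the original post). It assumes Python’s standard `codecs` registry as a stand-in for a browser’s decoder. The same text becomes different bytes under different encodings from the list above, and bytes decoded with the wrong encoding come out garbled. The helper name `encode_as` is hypothetical, and some labels in the IE7 list may have no standard Python codec; the sketch reports those as `None`.

```python
import codecs

def encode_as(text, label):
    """Hypothetical helper: encode `text` with the codec Python maps to `label`.

    Returns the encoded bytes, or None if Python has no codec for the label
    or the text contains characters the encoding cannot represent.
    """
    try:
        codec = codecs.lookup(label)
    except LookupError:
        return None  # label unknown to Python's codec registry
    try:
        return codec.encode(text)[0]  # CodecInfo.encode returns (bytes, length)
    except UnicodeEncodeError:
        return None  # a character is not representable in this encoding

sample = "café"
for label in ["UTF-8", "ISO-8859-1", "Windows-1252", "US-ASCII", "KOI8-R"]:
    print(f"{label:>12}: {encode_as(sample, label)!r}")

# Decoding UTF-8 bytes as if they were Windows-1252 garbles the text,
# which is why a document has to state which encoding it actually uses.
utf8_bytes = encode_as(sample, "UTF-8")
print(utf8_bytes.decode("windows-1252"))  # prints "cafÃ©"
```

UTF-8 yields `b'caf\xc3\xa9'` while ISO-8859-1 and Windows-1252 both yield `b'caf\xe9'`, and US-ASCII cannot represent “é” at all. Any charset can work only if the page declares it and the reader honors that declaration.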

Sean Fraser posted this on December 14, 2006 04:46 PM.


The Elementary Standards: A Compendium of Web Standards, CSS, Linguistics and Search Engine Optimization methodology Copyright ©2005-2007 Sean Fraser. All work is published under a Creative Commons License. All Rights Reserved.
