The xml: Namespace - xhtml

What are the implications of using the xml: namespace? I'm talking about the difference between xml:lang and lang, and also xml:id and id. Should I prefer using xml: when writing XHTML documents? How compatible is it?

The W3C spec says to use both variants in XHTML documents, noting that the xml:-prefixed one takes precedence.
For xml:lang and lang in particular, read this one-line guideline:
http://www.w3.org/TR/xhtml1/#C_7
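To an XML parser, xml:lang and lang are two different attributes. The xml: prefix is permanently bound to the namespace http://www.w3.org/XML/1998/namespace, while lang is in no namespace. Below is a minimal Python sketch using ElementTree that resolves an element's language the way Appendix C.7 describes: xml:lang takes precedence over lang, and the value is inherited from ancestors. effective_lang is a hypothetical helper. The conflicting values on the first <p> are only there to show the precedence.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace bound to the xml: prefix
XHTML = "{http://www.w3.org/1999/xhtml}"

def effective_lang(root, target):
    """Language of `target`: nearest ancestor-or-self carrying xml:lang or lang,
    with xml:lang preferred on the same element (hypothetical helper)."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        for attr in (f"{{{XML_NS}}}lang", "lang"):
            value = node.get(attr)
            if value is not None:
                return value
        node = parents.get(node)
    return None

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    '<body><p xml:lang="fr" lang="de">Bonjour</p><p>Hello</p></body></html>'
)
first, second = doc.iter(XHTML + "p")
print(sorted(doc.attrib))            # ['lang', '{http://www.w3.org/XML/1998/namespace}lang']
print(effective_lang(doc, first))    # fr  (xml:lang wins over the conflicting lang)
print(effective_lang(doc, second))   # en  (inherited from <html>)
```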

Related

Which output formats support action="passthrough"? (XHTML doesn't - frowny)

We have source documentation in DITA that targets multiple products using the "product" attribute. Our publishing tool supports XHTML input and supports conditionalized output, but 'dita' seems to ignore action="passthrough" for the xhtml target.
What other output formats support action="passthrough"? Maybe I can hack up a temporary workaround. Thanks!
From what I remember, the reasoning was that XHTML-based outputs cannot contain "data-" attributes because they are not part of the XHTML Transitional specification, and passthrough works by emitting such data- attributes. So passthrough works only for the HTML5 output.
The XSLT stylesheet
dita-ot\plugins\org.dita.xhtml\xsl\dita2xhtml-util.xsl
matches all such data- attributes and eliminates them:
<xsl:template match="@*[starts-with(name(), 'data-')]" mode="add-xhtml-ns" priority="10"/>
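For readers who don't speak XSLT: that empty template matches, in the add-xhtml-ns mode, every attribute whose name starts with data- and outputs nothing for it, so the attribute is dropped. Here is a rough Python equivalent using ElementTree. It is not DITA-OT code, and the sample markup and function name are made up.

```python
import xml.etree.ElementTree as ET

def strip_data_attributes(root):
    """Drop every attribute whose name starts with 'data-', mimicking the
    empty XSLT template (an illustrative equivalent, not DITA-OT code)."""
    for elem in root.iter():
        # Collect names first: deleting while iterating a dict is an error.
        for name in [n for n in elem.attrib if n.startswith("data-")]:
            del elem.attrib[name]
    return root

doc = ET.fromstring('<div class="note" data-product="widget"><p data-x="1">Hi</p></div>')
strip_data_attributes(doc)
print(ET.tostring(doc, encoding="unicode"))   # <div class="note"><p>Hi</p></div>
```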
I see you opened an issue on the DITA-OT issue tracker:
https://github.com/dita-ot/dita-ot/issues/2955
I added the same comment on the issue, so maybe we'll discuss it further there with the DITA-OT developers.

Is there any HTML 5 construct that is ONLY supported in the XML serialization?

Does there exist any "thing" (element, DOM manipulation, styling, nesting of elements, attributes, anything of that sort...) that one can do in XHTML 5 but CANNOT do in HTML 5? I remember reading about one such case on the web, but I cannot recall where I saw it.
This is apart from the use of content from external namespaces such as SVG and MathML (which is supported in HTML as well).
For reference, the number of answers to the converse question, "what can you do in HTML 5 that you can't in XHTML 5?", is very large, given the strictness of XHTML. Hence I'm looking for answers to this question.
Yes, for example entity declarations and references to the entities so defined. They are part of XML, so they must be supported in the XML serialization, which is required to follow generic XML rules. Example:
<!DOCTYPE html [
<!ENTITY foo "Hello world">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Demo</title>
</head>
<body>
&foo;
</body>
</html>
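Here is a quick way to see the difference outside a browser, sketched in Python with the standard library. An XML parser (ElementTree) honors the internal DTD subset and expands &foo;. HTML, by contrast, only knows its fixed table of named character references, so html.unescape leaves &foo; alone. This only illustrates the two rule sets. It is not a browser parser.

```python
import html
import xml.etree.ElementTree as ET

XHTML_DOC = """<!DOCTYPE html [
<!ENTITY foo "Hello world">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body>&foo;</body>
</html>"""

# The XML parser reads the internal subset and substitutes the declared text.
root = ET.fromstring(XHTML_DOC)
body = root.find("{http://www.w3.org/1999/xhtml}body")
print(body.text)                  # Hello world

# HTML has no entity declarations: unknown references are left as-is.
print(html.unescape("&foo;"))     # &foo;
print(html.unescape("&eacute;"))  # é
```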
XHTML, being XML, supports xml-stylesheet processing instructions, for XSLT as well as CSS. XSLT can transform the document tree before presentation. XSLT also supports inclusions via document("foo.xml"), which can be used as an XInclude surrogate since no browser supports XInclude right now.
XML parsers enforce well-formedness
XHTML supports namespaces, allowing other XML content (not just SVG and MathML) to be embedded
CDATA sections
.innerHTML, .insertAdjacentHTML() and .createContextualFragment() enforce well-formedness
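The well-formedness point is easy to demonstrate. In this minimal Python sketch, ElementTree (an XML parser) rejects an unclosed <b> with a fatal error. The standard library's html.parser, which is a tolerant tokenizer rather than a full HTML5 tree builder, just keeps going. The helper names are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<p>one <b>two</p>"   # <b> is never closed

def xml_accepts(markup):
    """True if an XML parser (expat via ElementTree) accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TagLogger(HTMLParser):
    """Records start tags; HTMLParser never rejects malformed input."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

print(xml_accepts(BROKEN))   # False: mismatched end tag is a fatal error
logger = TagLogger()
logger.feed(BROKEN)
logger.close()
print(logger.tags)           # ['p', 'b']: parsing simply carries on
```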
The topic is quite interesting in general. For example, an XHTML5 parser is not quite the same as a pure XML parser, as the HTML5 spec mandates a few willful violations of the XML specification, e.g. to support the <template> element.
There are also a handful of cases in which you can have a valid DOM that will throw an error if you try to export it through the XHTML fragment serialization algorithm.
And the HTML fragment serialization algorithm may emit a string that results in a different DOM when parsed again by an HTML parser.
So, basically, no two of the following three are fully isomorphic to one another:
the XHTML5 serialization
the (X)HTML5 DOM
the HTML5 serialization.
In XHTML, you can use self-closing syntax (/>) on non-void elements:
<script src="js.js" />
And void elements can have explicit end tags:
<input></input>
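Parsed as XML, each pair below produces the same tree, as this quick Python check with ElementTree shows (same_tree is a made-up helper). An HTML parser treats them differently. It ignores the "/" on a non-void element such as <script>, so the script stays open and swallows the following markup. It also drops </input> as a parse error.

```python
import xml.etree.ElementTree as ET

def same_tree(a, b):
    """Compare two fragments after an XML round-trip through ElementTree."""
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))

print(same_tree('<script src="js.js" />', '<script src="js.js"></script>'))  # True
print(same_tree('<input></input>', '<input/>'))                              # True
```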
I was able to find what I vaguely remembered in this unofficial Q&A by hsivonen. I'm still looking for other such "features".
[...] In this case, you must avoid constructs that aren’t supported in text/html (e.g. div as a child of p).
Searching further, I found this page (second post from the top):
but basically a p can never enclose a div in HTML (or XHTML served with the mime type text/html). If you are serving XHTML with an XML mime type, you can do this in theory, but the result would not be valid XHTML.
It says that the HTML parser simply doesn't allow the possibility, while the XML parser used for XHTML, which doesn't need to second-guess the code, accepts it, although the result is still invalid.
I decided to test it: I took an application/xhtml+xml page and tried to add a div inside a p using the "Edit as HTML" function of Chrome DevTools. It worked. I copied the source, made the same change, and checked it with validator.nu. It marked the page as invalid, to my slight disappointment.
Trying to add a div in a text/html page in the same manner was impossible. As soon as I exited the "Edit as HTML" mode, it simply moved the div after the p.
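An XML parser applies no content-model rules, so a div inside a p simply becomes a child of the p. The result is well-formed but invalid, exactly as validator.nu reported. Here is a minimal check with Python's ElementTree; the sample markup is made up. An HTML parser, by contrast, closes the open p when it sees <div>, which is what the text/html test showed.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

body = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    '<p>before<div>inside</div>after</p></body>'
)
p = body.find(XHTML + "p")
print([child.tag for child in p])            # ['{http://www.w3.org/1999/xhtml}div']
print(p.text, p.find(XHTML + "div").tail)    # before after
```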

to escape or not to escape: well formed XHTML with diacritics

Say that you have an XHTML document in English, but it has accented characters (e.g. meta name="author" content="José"). Let's say you have no control over the HTTP headers.
Should the characters be replaced with their corresponding named entities (e.g. &aacute;, etc.)?
Should the xml:lang attribute be set to English?
I know I can check the W3C recommendation but I am asking more from a practical point of view.
Should the characters be replaced with their corresponding named entities (e.g. &aacute;, etc.)?
Since you can't control the HTTP headers (and thus the declared character encoding), you should encode everything using ASCII, since it is a safe subset of just about every encoding.
This will require that you use entities for anything that isn't in ASCII. Named ones are preferred (as they are easier for people editing the HTML to handle) but not required.
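Escaping everything outside ASCII is mechanical enough to sketch. The hypothetical to_ascii_markup helper below escapes markup characters first. It then replaces every non-ASCII character with a named reference when HTML defines one (via html.entities.codepoint2name), and with a numeric one otherwise. Keep in mind that XML itself predefines only five named entities (&lt;, &gt;, &amp;, &quot;, &apos;). Named references such as &eacute; therefore rely on the parser knowing the XHTML DTD, whereas numeric references like &#233; work with any XML parser.

```python
import html
from html.entities import codepoint2name

def to_ascii_markup(text, named=True):
    """Escape markup characters, then replace every non-ASCII character with a
    character reference: named if HTML defines one (and named=True), else numeric."""
    out = []
    for ch in html.escape(text, quote=True):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif named and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(to_ascii_markup("José"))                       # Jos&eacute;
print(to_ascii_markup("José", named=False))          # Jos&#233;
print("José".encode("ascii", "xmlcharrefreplace"))   # b'Jos&#233;' (no markup escaping)
```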
Should the doc type and the xml:lang attribute be set to English?
The EN in the Doctype is a reference to the language that the comments in the DTD are written in. The HTML 3.x / 4.x and XHTML 1.x Doctypes must always use EN.
The lang attribute (and additionally the xml:lang attribute) should specify the language that the content is written in. If the content is in English, it should be set to English (en).
Looks like I kind of missed the point, so here's the answer, followed by a rant on encodings.
xml:lang="en" doesn't forbid you from using any character you want; it's only metadata for use by browsers, search engines, accessibility software, etc. If your page is in English, then go ahead and write it.
As for diacritics, HTML supports both writing the character directly and writing the entity, in attributes as well as in text nodes (and possibly in node names too, but I'm not sure; anyway, that's not going to happen with HTML). However, in my opinion it's easier to use UTF-8 everywhere than to escape entities; and there are about four ways to set the encoding of a page, so it's hard to believe that, in a practical case, you can't do it.
From a practical point of view, being a French speaker with diacritics in my first name, I find it a MAJOR annoyance (and markdown won't let me stress MAJOR enough) when websites don't support accented letters. Setting xml:lang to English isn't going to solve this problem.
I recommend that you use UTF-8 because it is backwards-compatible with ASCII and it can encode every UCS character. If you have no control over the HTTP headers, you still have two options: the XML declaration, and the meta tag.
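If I recall correctly, when a document is parsed as XML, the encoding pseudo-attribute in the <?xml?> declaration takes precedence over the meta tag (an HTTP charset header would override both, but you can't set one). This is your first solution, but it's probably not supported by legacy browsers.
<?xml version="1.0" encoding="UTF-8"?>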
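When a document is parsed as XML, the encoding declaration is what tells the parser how to decode the bytes. Without one (and without a byte order mark), XML assumes UTF-8. Here is a small Python check with ElementTree, using an ISO-8859-1-encoded sample made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up sample: the é is stored as the single ISO-8859-1 byte 0xE9.
BODY = '<meta name="author" content="José"/>'.encode("iso-8859-1")

def parse(declaration):
    """Parse BODY preceded by the given (possibly empty) XML declaration."""
    return ET.fromstring(declaration + BODY)

meta = parse(b'<?xml version="1.0" encoding="ISO-8859-1"?>')
print(meta.get("content"))   # José

try:
    parse(b"")               # no declaration: the parser assumes UTF-8
    rejected = False
except ET.ParseError as err:
    rejected = True          # a lone 0xE9 byte is not valid UTF-8
    print("rejected:", err)
```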
Your other option, and by far the better supported one, is to use the meta tag to tell the browser about the encoding. In HTML 4 and earlier, you can use this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
In HTML5, you can use this simpler form:
<meta charset="UTF-8">
Since you use XHTML, you'll want to self-close these (and use the appropriate application/xhtml+xml MIME type in the Content-Type <meta> tag).

Are Custom Attributes OK in XHTML

I understand that according to the HTML specification, it's invalid to add custom attributes to elements. Is this also invalid with XHTML?
I thought XHTML was part of the XML family, and as such was extensible. Being extensible, isn't it ok to use custom attributes?
Dave
Custom attributes won't be considered valid by the standard W3C validators.
You can define your own document type definition (DTD) though. See http://www.alistapart.com/articles/customdtd/ for more information about that.
With the standard document type definition, you can't introduce your own custom attributes.
But, starting with HTML5 you'll be able to introduce your own attributes as long as you prefix them with data-.
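The HTML5 rules for these names are simple enough to sketch. A custom data attribute must start with "data-", have at least one character after the hyphen, and contain no ASCII uppercase letters. Scripts see it through element.dataset, where each hyphen followed by a lowercase letter is dropped and the letter is uppercased. The Python helpers below are hypothetical illustrations of those two rules, and they do not check the XML-compatibility requirement. None of this makes such attributes valid in XHTML 1.x.

```python
import re

def is_custom_data_attribute(name):
    """Rough check of the HTML5 rule: 'data-' prefix, at least one more
    character, no ASCII uppercase (XML-compatibility is not checked here)."""
    return (name.startswith("data-") and len(name) > len("data-")
            and not re.search(r"[A-Z]", name))

def dataset_key(name):
    """Map a data-* attribute name to its element.dataset property name."""
    rest = name[len("data-"):]
    # Each hyphen followed by a lowercase ASCII letter becomes that letter uppercased.
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), rest)

print(is_custom_data_attribute("data-product-id"))  # True
print(is_custom_data_attribute("product"))          # False
print(dataset_key("data-product-id"))               # productId
```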

HTML 5 versus XHTML 1.0 Transitional?

It seems that HTML 5 is going to be supported (partially) by Firefox 3.1 and other browsers. It is adding support for video and audio as tags, but these are new tags that XHTML 1.0 Transitional does not recognize. What is the behavior supposed to be if I use a new HTML 5 tag in a future version of Firefox but use the DTD for XHTML? And what if I mix HTML 5 markup with XHTML 1.0 Trans?
This is getting confusing. Why didn't they just add these tags to XHTML? How do we support both XHTML and HTML 5?
Video on HTML 5: http://www.youtube.com/watch?v=xIxDJof7xxQ
HTML5 is so much easier to write than XHTML 1.0.
You don't have to manually declare the "http://www.w3.org/1999/xhtml" namespace.
You don't have to add type attributes to script and style elements (they default to text/javascript and text/css).
You don't have to use a long doctype where the browser just ignores most of it. You must use <!DOCTYPE html>, which is easy to remember.
You don't have a choice to include or not include a DTD URI in the doctype, and you don't have a choice between transitional and strict. You just have a strict doctype that invokes full standards mode. That way, you don't have to worry about accidentally being in almost standards mode or quirks mode.
The charset declaration is much simpler. It's just <meta charset="utf-8">.
If you find it confusing to write void elements as <name>, you can use <name/>, if you want.
HTML5 has a really good validator at http://validator.nu/. The validator isn't bound by a crappy DTD that can't express all the rules.
You don't have to add //<![CDATA etc. in inline scripts or stylesheets (in certain situations) to validate.
You can use embed if needed.
Just syntax-wise, when you use HTML5, you end up with cleaner, easier to read markup that always invokes standards mode. When you use XHTML 1.0 (served as text/html), you're specifying a bunch of crud (in order to validate against a crappy dtd) that the browser will do automatically.
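The CDATA point matters only when the document is actually parsed as XML. In that case, < and & in an inline script are markup, so the script must be escaped or wrapped in a CDATA section. In text/html, the script element's content is raw text. Here is a minimal Python sketch with ElementTree; the script body is a made-up example.

```python
import xml.etree.ElementTree as ET

SCRIPT = "if (a < b && c) { go(); }"   # '<' and '&' are markup in XML

raw = f"<script>{SCRIPT}</script>"
wrapped = f"<script>//<![CDATA[\n{SCRIPT}\n//]]></script>"

try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print("unwrapped script is well-formed:", raw_ok)   # False

# Inside CDATA the characters are literal; the // hides the markers from JavaScript.
print(ET.fromstring(wrapped).text)
```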
Myths and misconceptions abound in this thread.
XHTML 1.0 is older than HTML 5. It cannot use any new vocabulary. Indeed, its main selling point was that it uses exactly the same vocabulary as HTML 4.01.
There will be no XHTML 1.2 - most probably. And it is not needed. XHTML 5 is the XML serialization of HTML 5. Identical vocabulary, different parsing rules.
HTML has never been treated as true SGML in browsers. No browser has ever implemented an SGML-compliant parser. HTML 5 will make this fact into a rule, and the HTML serialization will follow today's de facto standard. One could perhaps say that it is "SGML-ish".
As has been stated, the DTD serves exactly one purpose IN BROWSERS, and that is to distinguish between standards compliance mode and quirks mode. Thus it affects only styling and scripting. If you are using frames on a page with a strict doctype, they will render just fine. So will <embed> and even <marquee>, even though the latter is an abomination and the former is not in any current standard. It is part of HTML 5, though.
Video and audio can be used regardless of serialization, XML or HTML. They are part of both HTML 5 and XHTML 5. Once the parsing stage is over, a browser will have constructed an internal DOM of the document. That DOM will be, for all practical purposes, the same regardless of serialization. And yes, XHTML sent with text/html is still normal HTML, regardless of doctype.
Well, generally speaking, HTML is SGML and XHTML is expressed in XML. Because of that, XHTML comes with more restrictions (in the form of markup) than HTML does. (SGML-based versus XML-based HTML)
As mentioned on Wikipedia, HTML 5 will also have a XHTML variant (XHTML 5).
Rule of thumb: You should always use valid markup. That also means that you should not use the mentioned <video> or <audio> tags in XHTML 1.0 Transitional, as those are not elements of that specification. If you really need to use those tags (which I highly doubt), then you should make sure that you use the HTML 5/XHTML 5 doctype in order to specify that your document is of that type.
Using HTML 5 or XHTML 5 in the given state of implementation (AFAIK, the standard is not even settled yet, correct?) could be counter-productive, as most users may not see the website rendered correctly anyway.
Edit 2013:
Because of the recent downvotes and since this accepted answer cannot be deleted (by me), I would like to add that the support and standardization process of HTML5 is nowadays totally different from what it was when I wrote this answer five years ago. Since most major browsers support most parts of the HTML5 draft and because a lot of stuff can be fixed with polyfills in older browsers, I mainly use HTML5 now.
You might be looking at the problem the wrong way, because in its "Relationship to XHTML 1.x" section, HTML 5 states:
"This specification is intended to replace XHTML 1.0 as the normative definition of the XML serialization of the HTML vocabulary."
Now that language is controversial (the XHTML 2 WG has disputed it and the HTML WG is trying to resolve the differences...) but that's where we stand right now.
A couple of notes:
HTML 5 includes an XML serialization known as XHTML 5; the spec explains the differences if you're into the nitty-gritty details
HTML is not SGML. Henri Sivonen has done a great write up on the history of HTML parsing
As of this time (it has been a topic of debate several times), there won't be a DTD for HTML/XHTML 5 -- the Conformance Requirements section of the spec explains why a DTD isn't suitable for defining the HTML language. The HTML 5 validator also contains a wealth of information on this topic (including RELAX NG schemas for HTML5)
Keep in mind that doctypes only serve one purpose in browsers: switching between quirks, almost standards and standards mode. Therefore, using <video> and <audio> will work with any doctype declaration. IMO, using an XHTML doctype is quite useless, as every page you send with the text/html MIME type is parsed as (tag-soup) HTML anyway. I suggest using the HTML5 doctype (<!doctype html>), as it is easier to remember and doesn't force you into XML syntax without a reason.
Why didn't they just add these tags to XHTML?
They actually did, there is an XML serialization of HTML 5 (XHTML5). To use this, you have to send your pages with an XML MIME type, such as application/xhtml+xml. This is not (yet) supported by IE, though.
What is the behavior supposed to be if I use a new HTML 5 tag in a future version of Firefox but use the DTD for XHTML?
And what if I mix HTML 5 markup with XHTML 1.0 Trans?
If your markup isn't implemented as part of your chosen DTD - then logically, that markup shouldn't be followed. But browser implementations aren't always strictly logical.
Why didn't they just add these tags to XHTML? How do we support both XHTML and HTML 5?
XHTML is not better than HTML, but it's more suited to some applications. One of the main benefits of XHTML is that it can be transformed into different formats using XSLT. For example, you could use XSLT to automatically transform XHTML into an RSS feed or another XML format.
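Because XHTML is XML, any XML toolchain can read it, not only XSLT. As a stand-in for the XSLT transformation described here, the minimal Python sketch below uses ElementTree instead of XSLT. The page structure (each post as an <h2><a>) and the mapping to RSS items are assumptions. The output also omits channel elements such as link and description that a real RSS 2.0 feed requires.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Blog</title></head>
<body>
  <h2><a href="/posts/1">First post</a></h2>
  <h2><a href="/posts/2">Second post</a></h2>
</body>
</html>"""

def xhtml_to_rss(page):
    """Turn each <h2><a> of an XHTML page into an RSS <item> (hypothetical mapping)."""
    doc = ET.fromstring(page)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = doc.findtext("h:head/h:title", namespaces=XHTML_NS)
    for link in doc.iterfind("h:body/h:h2/h:a", XHTML_NS):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = link.text
        ET.SubElement(item, "link").text = link.get("href")
    return rss

print(ET.tostring(xhtml_to_rss(PAGE), encoding="unicode"))
```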
You don't need to support both formats - weigh up the benefits/drawbacks for each with your project's requirements. HTML 5 probably won't be standard for quite some time.
(X)HTML5 is just the next version. You should be using XHTML 1.1 until XHTML5 is well supported.
You probably should not use the backwards-compatibility SGML profile of HTML5. It makes things harder for scrapers and small parsers.
Your doctype will tell the browser whether you're using HTML5 or XHTML. You can't just shove a tag from one doctype into a document of another doctype and expect it to work.
Without a doctype, it's all just tag soup anyway.
Don't use things like video/audio tags when 99% of people won't be able to view them properly in their browser. For either of these two examples I'd suggest using FLV.
As far as why they don't add it to XHTML... firstly 1.0 isn't the most recent version, 1.1 was released a while ago.
Eventually things get standardized and we'll see these types of tags in both standards, but for now just do what you can to ensure the most amount of people can view your content.

Resources