XHTML still harmful? - xhtml

I'm starting a project where the client has mandated the use of XHTML 1.0 Strict. Now I'm wondering whether the problems described in Sending XHTML as text/html Considered Harmful are still current and whether I should try to convince the client that this (very strongly stated) requirement is counterproductive.
Does Internet explorer handle application/xhtml+xml correctly by now?

IE9 handles application/xhtml+xml, including SVG inside it, one of the main reasons to want to use this media type. (Otherwise, there's relatively little point in using it to date, as you get a bunch of scripting changes, and IE<9 incompatibility, in return for relatively little if any performance gain at the moment.)
I don't agree with Hixie that serving XHTML as text/html has ever been really harmful. Using the HTML-compatibility guidelines, XHTML poses no problems to any browsers since the ancient Netscape 4. Although it doesn't really get you anything on the client-side, it can be helpful to your own page handling workflow if you're working with XML processing tools. And the XML syntax rules, being stricter-but-simpler than HTML, are a good thing to author to; this gives the validator a chance to pick up on errors that are valid constructs in SGML/HTML but which are almost certainly not what you meant. (On the other hand, since the validator won't enforce HTML-compatibility guidelines there are a couple of places where it can let through well-formed-but-troublesome markup, most commonly self-closed <script> tags breaking the whole page.)
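The self-closed <script> pitfall mentioned above is easy to check for mechanically before a page ships. What follows is only a rough sketch, not a validator: the name find_self_closed_scripts is made up for illustration, and the regular expression assumes that no attribute value contains a literal > character.

```python
import re

# Matches a <script ...> start tag that ends in "/>".
# Assumption: attribute values never contain a literal ">".
SELF_CLOSED_SCRIPT = re.compile(r"<script\b[^>]*/>", re.IGNORECASE)


def find_self_closed_scripts(markup: str) -> list:
    """Return every self-closed <script .../> start tag found in the markup."""
    return SELF_CLOSED_SCRIPT.findall(markup)
```

An empty result means no self-closed script tags were spotted; anything else lists the offending tags so they can be rewritten with an explicit </script>.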
Specifically, to answer his points: /> and related SGML issues are only a problem for tools that really believe HTML is SGML, which no browser has ever done. Going forward, the /> syntax is specifically allowed in non-XML HTML5.
Hiding scripts/stylesheets from ‘legacy’ (pre-HTML 3.2!) browsers hasn't been an issue for a decade or so: I came up with the mangled comment hack he (rightly) derides as ridiculous, but it was only an exercise; I never intended anyone to use it except in some strange hypothetical emergency. It's certainly not ‘necessary’ for using embedded scripts and stylesheets in XHTML-as-HTML... a straight //<![CDATA[ hack is enough if you need to be able to include < and & characters, and more commonly you don't even need that.
No-one actually wants to sniff for XHTML-as-HTML and treat it differently, so that whole section is moot. “Sending XHTML 1.1 as text/html is NEVER fine” has been changed by W3C (it now is fine after all), and XHTML 2.0 is dead.
So yes, use XHTML 1.0 Strict, or XHTML 1.1 or XHTML5, if you like. But until IE9 is your baseline browser (and that's not going to be the case for ages), you'll have to stick with text/html.
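If a project wants real XHTML for capable browsers while older Internet Explorer versions keep getting text/html, the media type can be chosen per request from the Accept header. This is a minimal sketch of that idea, not something the answer above prescribes: the name choose_media_type is hypothetical, and the parsing is deliberately simplified rather than a complete implementation of HTTP content negotiation.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick the Content-Type for an XHTML document from an Accept header.

    Only an explicit application/xhtml+xml entry with a non-zero q value
    counts as support. Wildcards such as */* are ignored on purpose,
    because a client that sends them may still be unable to render XHTML.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

The markup itself still has to follow the HTML-compatibility guidelines, since the same document is served under both media types.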

Internet Explorer 9 will handle application/xhtml+xml documents through a tag soup parser.
Internet Explorer 8 and earlier will prompt the user to save the document or open it in another application.
Internet Explorer 6 and newer all have significant market share (although this does depend, to some degree, on your market).
Nothing significant has changed as regards browser support for real XHTML for many years.
It is still far more trouble than it is worth unless you actually use XML parsers in your production chain (in which case, good luck persuading them to output XHTML that meets the HTML Compatibility Guidelines).

This depends on what you mean by "Internet Explorer".
For instance, IE6 is still from something like 2001 (that hasn't changed), and no, it still doesn't handle it correctly.

Over the past year (27th May 2017 - 27th April 2018), the combined share of IE 6, 7, and 8 was 1.72%, according to netmarketshare.
Every other major browser supports real XHTML (i.e. sent with the application/xhtml+xml MIME type). My answer to you is "No, it's not harmful".
As for whether it's advantageous, I would guess it doesn't matter much until you actually grok and comfortably use XML technologies (SVG, MathML, etc.) on the web (yes, HTML syntax also supports them, but it's virtually a hack).
If browser makers put more effort into XML parsers, it could still matter for pure parsing speed.

Related

Is XHTML + SMIL still relevant? Building a rich text editor

Is XHTML+SMIL still relevant given recent standards? All material I can find is 4+ years old.
I am researching options to build my own cross-browser-compatible rich text editor and this is one of the options (offered by Microsoft). Mozilla advocates iFrames, but I've always been told frames are a bad practice and would cause problems across browsers anyway.
I built my own rich editor and use zero frames, frameworks, etc. I'm not sure why you would need or want SMIL for a rich editor. XHTML is completely valid; I serve all my work as application/xhtml+xml and have been updating my web software from XHTML 1.1 to XHTML5, though I consider versionless doctypes invalid, as all software should always explicitly declare which version of its given syntax it uses. I'm guessing you want to animate editing bits, but you could use CSS3 animations and CSS3 transitions just fine and still have support for Internet Explorer; a good long-term plan is to consider Internet Explorer 10 as your minimum browser. So I say keep XHTML and its beautiful strictness, but go about your goals in a more compatible way.

Which standard (HTML/XHTML ) to learn to be ready to use HTML5 when it happens?

I am really new to this so please forgive the basicness of my question...
I want to learn to design websites and I have a program which I am planning to learn (Dreamweaver CS5) using tutorials from Lynda.com. However on the tutorial it says you should have a good grasp of HTML and CSS before starting Dreamweaver.
I looked at the Lynda.com video for HTML but it is all focused on XHTML. http://www.lynda.com/tutorial/47603
Now I am a bit confused. I heard a new standard was coming in (HTML5). If I learn XHTML - does that mean that I will then have to go back at a later date and learn HTML4 so that I can then catch up and learn HTML5 or will I be able to use my XHTML knowledge and add the future HTML5 code to it?
For example there is a Lynda video on HTML5 but the author says you need a knowledge of html before you can watch it.
Do you think the Lynda.com video on XHTML/HTML is a good place to start or do I need to get a book on HTML4 instead?
If you were starting out now would you learn HTML4 or XHTML?
Thanks
XHTML, absolutely.
The last recommended HTML version was 4.x, and it dates from the 1990s.
Learn XHTML as much as possible, and try to use strict versions.
I agree with @Matías, if only because of its strictness, which will likely result in cleaner code in the long run. That said, porting from one HTML version to another shouldn't be too difficult regardless of which one you choose.
I find that when programming the use of XHTML is nice because it allows me to catch errors in my markup at compile time instead of some obscure bug showing itself way later when I modify a page.
The lack of XHTML 1.1 support in IE has been a pain, but there are workarounds such as XSL transformations and the like. IE9 has finally added support.
Once (X)HTML5 support becomes strong in the major browsers, I intend to use XHTML5 in any web projects I do for work. Supporting legacy IE versions will still be a pain, but it will be manageable.
I would learn HTML4.01, but only because I detest XHTML.
It doesn't matter that much, making the port from (X)HTML x.xx to (X)HTML y.yy is not that hard. You'll have a few pitfalls, but that's all.
On the other hand, HTML5 is quite different and you can start learning it already. It's already happening.
Whatever you learn, make sure you learn the Strict version.
Check this out for future proofing: http://blog.twostepmedia.co.uk/css3-still-novelty-or-usable-in-everyday-web-development/
To the O/P, learn the basics of HTML4 and then get straight onto HTML5, you'll be way ahead of the pack and your websites WILL stand out :)
I would personally work on learning HTML5. By the time you are proficient enough to code websites professionally, most of the major browser vendors will have adopted it as the standard.
Remember, web technology moves fast! What's hot today will be obsolete tomorrow, and what's in beta now will be hot tomorrow.
I found this http://headjs.com, a modernizer, here on Stack Overflow, which is used to future-proof web applications. This makes learning and using HTML5 markup a possibility today, so that as browser vendors update their applications, they'll slide right into the HTML5 functionality.
For a brief summary:
HTML 4.01 is the current standard of markup languages for the internet.
XHTML 1.0 was forked off from HTML 4.01. It introduced greater strictness in validation, more XML-like syntax (e.g. <br /> instead of <br>) and XML namespaces for things like MathML (for embedding mathematical equations in pages; very infrequently used). In theory XHTML allowed people to define their own tags, but in practice this never happened. In actuality, the only real differences from HTML 4.01 are the self-closing tags, a different doctype (the header at the top of HTML documents), and a few attributes on the <html> tag.
XHTML 1.1 was a natural progression from XHTML 1.0. It introduced even greater strictness, and enforced things like mime-types for served documents. However, because it declared it was XML instead of HTML, and had to be served to the browser as XML (which Internet Explorer to this day does not support), it never took off.
XHTML 2.0 was a draft recommendation that got scrapped along the way. Consequently, no one uses it.
HTML 5 is the next evolution from HTML 4.01. It adds a lot of new tags, new functionality such as local storage (meaning more web-app type applications are possible), and some other goodies. It comes in two flavours - HTML 5, which uses HTML-style syntax, and XHTML 5, which uses XHTML syntax with self-closing tags (and is not to be confused with XHTML 2, which is dead remember.) It is 'the next big thing' in web markup languages, but is still in draft stage. Some browsers are introducing support for new HTML 5 tags, but legacy browsers have no support.
HTML 5 cannot be safely used in current sites, due to the draft nature of the specification. Some sites are doing so, but those sites can possibly get the whole nature of the language yanked out from under their feet.
HTML 5 is not expected to be a formal recommendation until 2022.
In summary: The current language of the web is HTML 4.01. HTML 5 expands on that greatly, but is not ready for everyday use. And the differences between HTML 4.01 and any flavour of XHTML are minimal at best.
XHTML's main benefit, as Matias said, is its XML compatibility, and also the other way round; I regularly use an XSLT to transform an XML document into XHTML. Although XSLT can output HTML, it's HTML that's compliant with XML anyway.
Strictly speaking, there's no reason you can't write HTML5 that's totally XML compliant; for that reason alone, I'd say go with HTML5, and by writing it so that it IS XML compliant, you also get all the benefits of XHTML.

Which XHTML and CSS validation errors are safe to ignore?

Which types of XHTML and CSS validation errors can safely be ignored? Which ones would do no harm today or tomorrow (assuming we do not touch the XHTML and CSS)?
I mean errors that will not cause any problems with future upgrades of browsers, CSS, and HTML versions, and that merely show up as errors today.
One category I know of is vendor extensions. Are there any other errors or warnings that have no bad effect for users and developers?
If I'm making a site and I get many errors, should I take the time to fix every one? If I try to fix them all, I will have to use JavaScript in place of CSS in some instances.
The XHTML and CSS validators will validate against the corresponding specifications of the W3C standards. Ignoring these means that your page(s) are deviating from those standards.
Web browsers aim to implement these standards, so ignoring a warning is likely to cause issues on at least some browsers. Therefore, you cannot ignore any warning that the validators give.
Also, conformant XHTML and CSS pages are not guaranteed to work in all browsers, as browsers may implement something differently or incorrectly.
Having conformant pages is still a good thing, as most browsers are (for the most part) conformant, and having more conformant pages helps put the onus on the browser implementers. That is, you (as a web page author) need only concern yourself with being standards compliant. If a browser can't handle that, the issue is with the browser, not the web page author.
If you want to be compatible with a large number of browsers, start with the valid conformant page and then add the minimum needed to get it working on other non-conformant browsers. Doing it this way is a lot easier than starting with a non-conformant page and trying to make that work on most browsers.
You should try to avoid all parse errors. If in doubt, try validator.w3.org and use its HTML Tidy option to clean up the code.
Each browser will render and parse XHTML and css differently. Even if it works now it might not work tomorrow.
The only safe answer is "none". The best guarantee you have for future compatibility with all browsers is to stick to the standard and have fully validated XHTML and CSS.

Why are HTML5 and XHTML 2 separate standards?

Is there a reason why these two standards are being developed separately? They seem to be solving the same problem but what are the differences and, if they are to remain separate, what roles are they expected to take in web development in the future?
Browser vendors care a great deal about backwards compatibility. The group speccing XHTML2 didn’t.
Note that XHTML2 isn’t solving all the same problems HTML5 is solving. HTML5 is much broader in scope than XHTML2. HTML5 covers processing models, JavaScript APIs, video, audio, application widgets, etc. but XHTML2 does not.
As for expected roles, representatives from top browser vendors participate in the HTML WG but not in the XHTML2 WG. On the other hand, people showing interest in the “Backplane” are participating in the XHTML2 WG.
See also David Baron’s post about how the W3C works.
This article only answers part of the question. It doesn't explain what the likely roles of the two standards will be in the future:
X/HTML 5 Versus XHTML 2
As for the likely roles, people are saying that:
W3C started work on XHTML 2, throwing away backward-compatibility
Some people didn't like that, and started to define HTML 5
Eventually, W3C were persuaded to adopt HTML 5 as well
Browser vendors seem to be behind HTML 5 (but not XHTML 2)
If browser vendors don't support XHTML 2 then I don't know what its role is. On the other hand XHTML 2 can be more-or-less converted to XHTML 1, e.g. using an XSL transformation, so it seems to me that it would be (much) easier for anyone to support, if they wanted to, than HTML 5 will be.
XHTML2 is dead.
Have a look at the first chapter of HTML5 FOR WEB DESIGNERS by Jeremy Keith which explains superbly the differences in a summarized way.
This is largely an accurate explanation, IMO, but it should be noted that HTML5 isn't fully backwards compatible: new elements like section cannot be styled with CSS even in IE7. Yes, there are JavaScript workarounds, but these aren't sufficient, because not everyone has JavaScript enabled, far from every developer will become aware of them, and not every developer is able to use JavaScript in this way.
HTML 5 has been constructed with backwards compatibility in mind, unlike XHTML 2, which was created in order to break away from restrictions involved with backwards compatibility.
The W3C allowed the XHTML 2 working group to expire, essentially ending development of XHTML 2. HTML 5, with backwards compatibility and new features, will become the doctype of the future.

Is invalid XHTML acceptable?

I've noticed a lot of sites, SO included, use XHTML as their mark-up language and then fail to adhere to the spec. Just browsing the source for SO there are missing closing tags for paragraphs, invalid elements, etc.
So should tools (and developers) use the XHTML doctype if they are going to produce invalid mark up? And should browsers be more firm in their acceptance of poor mark-up?
And before anyone shouts hypocrite, my blog has one piece of invalid mark-up involving the captcha (or it did the last time I checked), which involves styling the noscript tag.
There are many reasons to use valid markup. My favorite is that it allows you to use validation as a form of regression testing, preventing the markup equivalent of "delta rot" from leading to real rendering problems once the errors reach some critical mass. And really, it's just plain sloppy to allow "lazy" errors like typos and mis-nested/unclosed tags to accumulate. Valid markup is one way to identify passionate programmers.
There's also the issue of debugging: valid markup also gives you a stable baseline from which to work on the inevitable cross-browser compatibility woes. No web developer who values his time should begin debugging browser compatibility problems without first ensuring that the markup is at least syntactically valid—and any other invalid markup should have a good reason for being there.
(Incidentally, stackoverflow.com fails both these tests, and suggestions to fix the problems were declined.)
All of that said, to answer your specific question, it's probably not worthwhile to use one of the XHTML doctypes unless you plan to produce valid (or at least well-formed) markup. XHTML's primary advantages are derived from the fact that XHTML is XML, allowing it to be processed and transformed by tools and technologies that work with XML. If you don't plan to make your XHTML well-formed XML, then there's little point in choosing that doctype. The latest HTML 4 spec will probably do everything you need, and it's much more forgiving.
We should always try to make it validate according to standards. We'll be sure that the website will display and work fine on current browsers AND future browsers.
I don't think that, if you specify a doctype, there is any reason not to adhere to this doctype.
Using XHTML makes automated error detection easy: every change can be automatically checked for invalid markup. This prevents errors, especially when using automatically generated content. It is really easy for a web developer using a templating engine (JSP, ASP.NET, StringTemplate, etc.) to copy and paste one closing tag too few or too many. When this is your only error, it can be detected and fixed immediately. I once worked on a site that had 165 validation errors per page, of which 2 or 3 were actual bugs. These were hard to find in the clutter of other errors. Automatic validation would have prevented these errors at the source.
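As a rough illustration of the kind of automated check described above, a well-formedness test can run on every generated page. This sketch uses Python's standard XML parser; the name well_formedness_error is hypothetical. It only checks XML well-formedness, not validity against the XHTML DTD, and because the parser does not load the DTD, named entities such as &nbsp; would be reported as errors too.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup: str):
    """Return None if the markup parses as XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        # The message includes the line and column of the first problem.
        return str(exc)
    return None


GOOD_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>All tags closed.</p></body></html>"
)
# Same page with the closing </p> missing, as after a bad copy and paste.
BAD_PAGE = GOOD_PAGE.replace("</p>", "")

if __name__ == "__main__":
    print(well_formedness_error(GOOD_PAGE))
    print(well_formedness_error(BAD_PAGE))
```

Wired into a build or test run, a check like this flags a single stray or missing closing tag as soon as it is introduced, instead of leaving it to pile up with other errors.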
Needless to say, choosing a standard and sticking to it can only benefit interoperability with other systems (screen scrapers, screen readers, search engines), and I have never come across a situation where a valid semantic XHTML with CSS solution wasn't possible for all major browsers.
Obviously, when working with complex systems, it's not always possible to stick to your doctype, but this is mostly a result of improper communication between the different teams developing different parts of these systems, or, most likely, legacy systems. In the last case it's probably better to isolate these cases and change your doctype accordingly.
It's good to be pragmatic and not adhere to XHTML just because someone said so, regardless of costs, but with current knowledge about CSS and browsers, testing and validation tools, most of the time the benefits are much greater than the costs.
You could say that I have OCD about XHTML validity. I find that most problems with invalid code come from programmers not knowing the difference between HTML and XHTML. I've been writing 100% valid XHTML and CSS for a while now and have never had any major rendering problems across browsers. If you keep everything valid, and don't try anything too exotic CSS-wise, you will save yourself a ton of time in fixes.
I wouldn't use XHTML at all just to save myself the philosophical stress. It's not like any browsers are treating it like XHTML anyway.
Browsers will reject poor mark-up if the page is sent as application/xhtml+xml, but they rarely are. This is fine.
I would be more concerned about things like inline use of CSS and JavaScript with Stack Overflow, just because they make maintenance harder.
Though I believe in striving for valid XHTML and CSS, it's often hard to do for a number of reasons.
First, some of the content could be loaded via AJAX. Sometimes, fragments are not properly inserted into the existing DOM.
The HTML that you are viewing may not have all been produced in the same document. For example, the page could be made up of components, or templates, and then thrown together right before the browser renders it. This isn't an excuse, but you can't assume that the HTML you're seeing was hand coded all at once.
What if some of the code generated by Markdown is invalid? You can't blame Stack Overflow for not producing valid code.
Lastly, the purpose of the DOCTYPE is not to simply say "Hey, I'm using valid code" but it's also to give the browser a heads up what you're trying to do so that it can at least come close to correctly parsing that information.
I don't think that most developers specify a DOCTYPE and then explicitly fail to adhere to it.
While I agree with the sentiment of the "if it renders fine then don't worry about it" statement, it's still good to follow a standard, even though it may not be fully supported right now. You can still use tables for layout, but there are good reasons not to.
No, you should not use XHTML if you can't guarantee well-formedness, and in practice you can't guarantee it if you don't use an XML serializer to generate markup. Read about producing XML.
Well-formedness is the thing that differentiates XHTML from HTML. XHTML with "just one" markup error ceases to be XHTML. It has to be perfect every time.
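To make the serializer advice concrete, here is a minimal sketch that builds a page as a tree and lets the XML serializer write the tags and escape the text, rather than concatenating strings. The helper name build_page is hypothetical, and the sketch leaves out the DOCTYPE and XML declaration that a real XHTML document would normally carry.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(title: str, paragraphs: list) -> str:
    """Serialize a tiny XHTML page; the serializer handles all escaping."""
    # Emit the XHTML namespace as the default namespace (no prefix).
    ET.register_namespace("", XHTML_NS)
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for text in paragraphs:
        # Text is stored as data, so "<" and "&" cannot break the markup.
        ET.SubElement(body, f"{{{XHTML_NS}}}p").text = text
    return ET.tostring(html, encoding="unicode")
```

Because every element is opened and closed by the serializer, user-supplied text containing < or & cannot produce a page that is not well-formed, which is the guarantee that hand-assembled templates cannot give.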
If an "XHTML" site appears to work with some errors, it's because browsers ignore the DOCTYPE and interpret the page as HTML.
See XHTML proxy that forces interpretation of pages as XHTML. Most of the time they fail miserably. This is one of the reasons why the future of XHTML is uncertain and why development of HTML has been resumed.
It depends. I had that issue with my blog where a YouTube video caused invalid XHTML, but it rendered fine. On the other hand, I have a "Valid XHTML" link, and a combination of a "Valid XHTML" claim and invalid XHTML is not professional.
As SO does not claim to be valid, I think it's acceptable. Personally, if I were Jeff, I would be bothered and try to fix it even if it looks good in modern browsers, but some people would rather move on and actually get things done instead of fixing non-existent bugs.
So long as it works in IE, FF, Safari, (insert other browser here) you should be okay. Validation isn't as important as having it render correctly in multiple browsers. Just because it is valid, doesn't mean it'll work in IE properly, for instance.
Run Google Analytics or similar on your site and see what kind of browsers your users are using and then judge which browsers you need to support the most and worry about the less important ones when you have the spare time to do so.
I say, if it renders OK, then it doesn't matter if it's pixel perfect.
It takes a while to get a site up and running the way you want it, going back and making changes is going to change the way the page renders slightly, then you have to fix those problems.
Now, I'm not saying you should build sloppy web pages, but I see no reason to fix what ain't broke. Browsers aren't going to drop support for error correction anytime in the near future.
I don't understand why everyone gets caught up trying to make their websites fit the standard when some browsers still have problems properly rendering standard code. I've been in web design for something like 10 years, and I stopped double coding (read: hacking CSS) and changing stupid stuff just so I could put a button on my site.
I believe that using a <div> will cause you to be invalid regardless, and it gets a bit harder to do any major JavaScript/AJAX without it.
There are so many standards and they are so badly "enforced" or supported that I don't think it matters. Don't get me wrong, I think there should be standards but because they are not enforced, nobody follows them and it's a massive downward spiral.
For 99.999% of the sites out there, it really won't matter. The only time I've had it matter, I ran the HTML input through HTMLTidy to XHTML-ize it, and then ran my processing on it.
Pretty much, it's the old programmer's axiom: trust no input.

Resources