XML in markup, any ideas why? - xhtml

I am doing some research on a Motorola site and came across a ton of weird markup. I wanted to get ideas on why this is used and/or why it is a good idea?
Using this page as an example, you can view the source and see tons of tags like
<title xml:lang="en" lang="en"
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head xml:lang="en">
Any ideas? How does this affect SEO and general best practices? How good or bad is this?
The page seems to explode when I run it through the W3C validator.

Chances are, the page is being built from a number of fragments and processes that don't attempt to maintain valid HTML. With a mixture of knowledge, luck and testing it's possible to build web pages that work fine, even though they're nowhere near valid.
How does it affect SEO? Probably surprisingly little. Search engine parsers have to do the same thing that browsers do, otherwise authors would exploit the differences to serve one set of content to browsers and a different one to search engines.
The success of a search engine depends on matching the search string to pages that display the content the user was looking for. So long as the page displays correctly, whether it uses valid markup or not is of no interest to the user, and therefore of no interest to the search engine.
In terms of general best practices, it scores 0.

The page is just normal XHTML that uses XML's internationalization attribute (xml:lang). I don't think SEO is a goal here.
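To illustrate what those attributes carry, here is a minimal sketch in Python, assuming a made-up page fragment modelled on the tags quoted in the question (the fragment and the `languages` helper are hypothetical, not taken from the Motorola site). It only shows that an XML parser sees `lang` and `xml:lang` as two separate attributes, the second one living in the reserved XML namespace.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification itself.
XML_NS = "http://www.w3.org/XML/1998/namespace"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical fragment modelled on the tags quoted in the question.
page = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
    '<head xml:lang="en"><title xml:lang="en" lang="en">Example</title></head>'
    '<body><p>Hello</p></body>'
    '</html>'
)

root = ET.fromstring(page)


def languages(element):
    """Return (lang, xml:lang) for an element; None where an attribute is absent."""
    return element.get("lang"), element.get(f"{{{XML_NS}}}lang")


html_langs = languages(root)
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
title_langs = languages(title)

print(html_langs, title_langs)
```

Repeating the attribute on every element, as the page does, is redundant but harmless, since language information is inherited from the nearest ancestor that declares it.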

Related

Software or website for validating and correcting for XHTML syntax

I have some jsp pages and html pages.
I would like to format all of them so that they conform to the XHTML standard.
Is there any software or online service that validates XHTML pages and, if possible, auto-corrects them?
There are a variety of validators around but not necessarily autocorrecting.
CSE HTML Validator is one. This will indicate errors in your code by line and make recommendations.
HTML Tidy will autocorrect. I believe it handles XHTML. Back up your files before using it, as the result may not always be what you want.
The W3C runs an online validator at http://validator.w3.org/. This will show you your errors.
Try http://validator.w3.org/. It validates a page against the W3C standards.
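As a rough first pass before running a real validator, you can at least check that a page is well-formed XML. The sketch below is an assumption-laden illustration, not a replacement for the tools above: it uses Python's standard XML parser, the `check_well_formed` helper and the sample strings are made up, and well-formedness is a much weaker property than validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup: str):
    """Return None if the markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical sample pages.
good = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>closed paragraph</p></body></html>'
)
bad = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head>'
    '<body><p>never closed</body></html>'
)

for name, markup in (("good", good), ("bad", bad)):
    problem = check_well_formed(markup)
    print(name, "OK" if problem is None else problem)
```

Pages that use named entities such as `&nbsp;` would need extra handling here, because this parser does not load the XHTML DTD that defines them.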

What is "Extensible" about XHTML?

Why is XHTML called "eXtensible" (the X in XHTML)? Can we, as individual web developers actually extend it?
What separates it from ordinary HTML?
Well, firstly, things have moved on somewhat, and XHTML isn't really a thing anymore. HTML5 isn't parsed as XML, and XHTML 2.0 was of course cancelled.
Despite that, it's possible to use XHTML if you serve it with the application/xhtml+xml MIME type. Just be aware of the shortcomings: any error produces the yellow screen of death, and older versions of IE don't render anything at all.
For a new project, use the HTML5 doctype and serve as text/html. XHTML can be considered as a failure for many reasons.
Anyway, with XHTML you can do things like this:
<!DOCTYPE html SYSTEM "http://example.com/my-xhtml-custom.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xmlns:custom="http://example.com/" xml:lang='en-US'>
Then copy http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, edit it however you like, and host it at the URL referenced in the DOCTYPE.
The W3C has a lot to say about this, specifically:
Don't do this! Documents need to have a meaning as well as correct
syntax. SGML and XML only define syntax. HTML and XHTML define
meaning. If you add elements that aren't defined by a standard, only
you yourself know what they mean. And in 20 or 50 years, even you may
not know it anymore…
Of course, you can experiment, for example to work on future Web
formats, but other than that you should not use proprietary elements.
Nowadays, we thankfully have HTML5 which dropped all this XML stuff (no one was using it and it adds a lot of complexity). It's not extensible in the same way, but that's probably a good thing!
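For what the namespace part of that extensibility looks like to a program, here is a small sketch in Python. The `custom:price` element and its `currency` attribute are invented for illustration, and the example only checks well-formedness, not conformance to a custom DTD.

```python
import xml.etree.ElementTree as ET

# Prefix map used for lookups; "custom" matches the xmlns:custom URL from the answer.
NS = {"x": "http://www.w3.org/1999/xhtml", "custom": "http://example.com/"}

# Hypothetical document mixing XHTML with one made-up custom element.
doc = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:custom="http://example.com/" xml:lang="en-US">
  <head><title>Extended</title></head>
  <body>
    <p>Price: <custom:price currency="USD">9.99</custom:price></p>
  </body>
</html>"""

root = ET.fromstring(doc)

# Custom elements are addressed by namespace, so they cannot clash with XHTML tags.
found = [(p.get("currency"), p.text) for p in root.findall(".//custom:price", NS)]
xhtml_paragraphs = root.findall(".//x:p", NS)

print(found, len(xhtml_paragraphs))
```

An XML tool can pick the custom elements out reliably, but as the W3C quote points out, a browser still has no idea what they mean.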

what is the point of being XHTML compliant?

All modern browsers understand HTML, so what is the point of being XHTML compliant, other than typing more of the characters found on the far right side of the keyboard?
There is no point that I can think of. The W3C has canceled XHTML 2.0, although there is supposed to be an XHTML5, which I guess is HTML5 for masochists. Originally XHTML was going to lead us into the world of "correct" HTML documents, but it generated as many (or more) problems than it ever solved.
We validate against either HTML 4.01 Transitional or HTML5 (to the degree that you can do that). That plus clean CSS gives you about the best you can shoot for.
XHTML was originally supposed to be a "next generation of HTML", as well as a stricter version of HTML (which would cause failures if any error showed up in the page). Due to a variety of loopholes and any number of other issues with XHTML (such as pages serving up the wrong mimetype), hardly any pages are actually XHTML, they're just HTML with some extra characters.
Eventually, HTML5 was proposed and the W3C split into two groups. Then the people working on XHTML 2.0 switched to something better (HTML5), and now everyone is talking about HTML5 taking over everything.
For a longer version (with far more detail), check out this chapter from Dive Into HTML5: http://diveintohtml5.ep.io/past.html
According to http://www.dev-archive.net/articles/xhtml.html, one of the reasons XHTML was created was:
to add the XML ability to extend the language through namespaces. This will make it possible for an author to express more structures and richer semantics than is possible with HTML today. In effect XHTML inherits the possibility of supporting more than one language — instead of extending HTML in a monolithic fashion, XHTML can be extended through modules, where each module define a specific subset of the language. This, theoretically, means extension of the language can be done without the need for a browser upgrade.
XHTML is meant to make the use of XML–based languages in end–user applications such as browsers easy, but can also be used for various data processing and storage purposes in situations where the web is only one of several channels. XHTML take advantage of the extensibility of XML to support multiple namespaces and through them languages.
That article also notes that for most people this won't be useful:
Recommendations
If you don’t have any specific need to deliver XML–based structures to the client, e.g. due to mixing namespaces such as having MathML content in your pages, using Ruby (XHTML 1.1) or techniques such as ACCESS (XHTML 1.2) then consider whether you won’t be better off simply by using HTML 4.01 Strict.
Edit with additional thoughts:
I forgot to mention the point I popped in here to bring up too - XHTML can be more easily manipulated into other languages using XSL transforms.

Which standard (HTML/XHTML ) to learn to be ready to use HTML5 when it happens?

I am really new to this so please forgive the basicness of my question...
I want to learn to design websites and I have a program which I am planning to learn (Dreamweaver CS5) using tutorials from Lynda.com. However on the tutorial it says you should have a good grasp of HTML and CSS before starting Dreamweaver.
I looked at the Lynda.com video for HTML but it is all focused on XHTML. http://www.lynda.com/tutorial/47603
Now I am a bit confused. I heard a new standard was coming in (HTML5). If I learn XHTML - does that mean that I will then have to go back at a later date and learn HTML4 so that I can then catch up and learn HTML5 or will I be able to use my XHTML knowledge and add the future HTML5 code to it?
For example there is a Lynda video on HTML5 but the author says you need a knowledge of html before you can watch it.
Do you think the Lynda.com video on XHTML/HTML is a good place to start or do I need to get a book on HTML4 instead?
If you were starting out now would you learn HTML4 or XHTML?
Thanks
XHTML, absolutely.
The last recommended HTML version was 4.x, and it dates from the 1990s.
Learn XHTML as much as possible, and try to use strict versions.
I agree with @Matías, if only because of its strictness, which will likely result in cleaner code in the long run. That said, porting from one HTML version to another shouldn't be too difficult regardless of which one you choose.
I find that when programming the use of XHTML is nice because it allows me to catch errors in my markup at compile time instead of some obscure bug showing itself way later when I modify a page.
The lack of XHTML 1.1 support in IE has been a pain, but there are workarounds such as XSL transformations and the like. IE9 has finally added support.
Once (X)HTML5 support becomes strong in the major browsers I intend on using XHTML5 in any web projects I do for work. Supporting legacy IE versions will still be a pain, but it will be manageable.
I would learn HTML4.01, but only because I detest XHTML.
It doesn't matter that much, making the port from (X)HTML x.xx to (X)HTML y.yy is not that hard. You'll have a few pitfalls, but that's all.
On the other hand, HTML5 is quite different and you can start learning it already. It's already happening.
Whatever you learn, make sure you learn the Strict version.
Check this out for future proofing: http://blog.twostepmedia.co.uk/css3-still-novelty-or-usable-in-everyday-web-development/
To the O/P, learn the basics of HTML4 and then get straight onto HTML5, you'll be way ahead of the pack and your websites WILL stand out :)
I would personally work on learning HTML5. By the time you get proficient at it to be good enough to professionally code websites, most of the major browser vendors will have adopted it as the standard.
Remember, web technology moves fast! What's hot today will be obsolete tomorrow, and what's in beta now will be hot tomorrow.
I found this http://headjs.com, a modernizer, here on Stack Overflow, which is used to future-proof web applications. This makes learning and using HTML5 markup a possibility today, so that as browser vendors update their applications, they'll slide right into the HTML5 functionality.
For a brief summary:
HTML 4.01 is the current standard of markup languages for the internet.
XHTML 1.0 was forked off from HTML 4.01. It introduced greater strictness in validation, more XML-like syntax (e.g. <br /> instead of <br>) and XML namespaces for things like MathML (for embedding mathematical equations in pages; very infrequently used). In theory XHTML allowed people to define their own tags, but in practice this never happened. In actuality, the only real differences from HTML 4.01 are the self-closing tags, a different doctype (the header at the top of HTML documents), and a few attributes on the <html> tag.
XHTML 1.1 was a natural progression from XHTML 1.0. It introduced even greater strictness, and enforced things like mime-types for served documents. However, because it declared it was XML instead of HTML, and had to be served to the browser as XML (which Internet Explorer to this day does not support), it never took off.
XHTML 2.0 was a draft recommendation that got scrapped along the way. Consequently, no one uses it.
HTML 5 is the next evolution from HTML 4.01. It adds a lot of new tags, new functionality such as local storage (meaning more web-app type applications are possible), and some other goodies. It comes in two flavours - HTML 5, which uses HTML-style syntax, and XHTML 5, which uses XHTML syntax with self-closing tags (and is not to be confused with XHTML 2, which is dead remember.) It is 'the next big thing' in web markup languages, but is still in draft stage. Some browsers are introducing support for new HTML 5 tags, but legacy browsers have no support.
HTML 5 cannot be safely used in current sites, due to the draft nature of the specification. Some sites are doing so, but those sites can possibly get the whole nature of the language yanked out from under their feet.
HTML 5 is not expected to be a formal recommendation until 2022.
In summary: The current language of the web is HTML 4.01. HTML 5 expands on that greatly, but is not ready for everyday use. And the differences between HTML 4.01 and any flavour of XHTML are minimal at best.
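To make the <br /> versus <br> point concrete, here is a small sketch using Python's built-in HTML parser, which is lenient but is not the HTML5 parsing algorithm browsers implement, so take it as an approximation. The `TagCollector` class and the sample strings are made up for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


html_style = start_tags("<p>line one<br>line two</p>")
xhtml_style = start_tags("<p>line one<br />line two</p>")

# Both spellings are reported as the same sequence of start tags.
print(html_style, xhtml_style)
```

An HTML parser treats the two spellings alike, which is why XHTML-style markup served as HTML behaves like ordinary HTML with a few extra characters.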
XHTML's main benefit, as Matias said, is its XML compatibility, and also the other way round; I regularly use an XSLT to transform an XML document into XHTML. Although XSLT can output HTML, it's HTML that's compliant with XML anyway.
Strictly speaking, there's no reason you can't write HTML5 that's totally XML compliant; for that reason alone, I'd say go with HTML5, and by writing it so that it IS XML compliant, you also get all the benefits of XHTML.

What DOCTYPE should I target today?

I'm refactoring a .Net web application that is in
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
Right now the approach is just to aim for the stars and go for the latest doctype simply because it's the latest. I would like to make a wiser choice and target a specific one for good reasons.
There are similar questions existing but the answers might be outdated now.
What is the difference, advantages, disadvantages between standards and quirks mode, what are some quirks I may run into with differently set doctypes?
I have been told that an XHTML doctype is preferable for integrating AJAX, since the UpdatePanel serializes the page and needs an XHTML doctype to do so. To what extent is this true?
And for browser compatibility, in which direction are browsers going in terms of DOCTYPE? Is there a common trend, or do they differ?
HTML5 doctype, which is
<!DOCTYPE html>
XHTML is largely dead as a standard, and never was implemented correctly in most cases.
The new thing is HTML 5.
<!DOCTYPE html> is what you use to specify it. That's it. No DTD name or URL or whatever.
If you're using something that likes XML, like .net, then you might want to use XHTML. But don't do it for any other reason; XHTML never was really popular as a standard, or at least it was almost never used correctly.
Any Doctype:
HTML 4.01 or XHTML 1.0
Strict or Transitional
served as text/html (not application/xhtml+xml) should be OK. There's no such thing as a better doctype; you just have to choose one that fits your needs and then stick to its rules.
Avoid Frameset, but if you have to use it, use the title attribute to describe the role of each frame to a screen reader user (same with iframe, by the way).
Quirks mode (no Doctype) is a PITA, avoid it at all cost. This was OK 8 years ago.
No XML prolog unless you're serving application/xhtml+xml (good luck with that! If you like complicated things when they're not needed, that's your choice).
If you are forced to use attributes that are forbidden in Strict mode (target="_blank" for example), then use Transitional mode: this is why it was created! And please indicate to your users that the link will open in a new page, whether in the text of your link or in its title. This is important from an accessibility point of view.
HTML 5 is the next big thing, and we're waiting for it, but as long as it doesn't work in every browser (I mean IE without JS) it's not advisable to use it in "serious" public sites. Is it even a Draft? What if entire parts of it are rewritten in a couple of months?
My web agency uses it for its website but we won't use it on a client site anytime soon: it's just too soon.
Sidenote: I often see catch phrases like "a modern website in HTML5 and CSS3" implying that CSS3 is made for HTML 5. CSS3 has nothing to do with HTML5 and can already be used, as long as it degrades gracefully on old browsers.
You can design HTML5 with CSS2.1 or HTML4.01 Transitional with the latest CSS3 animations that only work in webkit nightlies, no problem.
Whatever you choose, make sure your MIME type is compatible with your DOCTYPE.
The browser uses the MIME type (the HTTP Content-Type header) to determine how to treat your page. For example, a page with an XHTML 1.1 DOCTYPE served with Content-Type text/html is parsed as HTML.
DOCTYPE is important, but largely irrelevant if the wrong Content-Type is used.
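The decision described above can be sketched in a few lines of Python. This is a simplified model, not how any particular browser is implemented: the `parsing_mode` function and its return values are made up, and real browsers also do content sniffing that is ignored here.

```python
def parsing_mode(content_type_header: str) -> str:
    """Pick a parser from the Content-Type header alone; the DOCTYPE is never consulted."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "other"


# An XHTML DOCTYPE in the page changes nothing: only the header matters.
print(parsing_mode("text/html; charset=utf-8"))
print(parsing_mode("application/xhtml+xml"))
```

Under this model, a page with an XHTML DOCTYPE only gets XML parsing, with its draconian error handling, when the server sends one of the XML MIME types.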
Browsers have never actually used DOCTYPE to determine the markup language of your document (they use HTTP Content-type instead), so which DOCTYPE you chose was never hugely relevant - just as long as you are using a valid DOCTYPE of some description. Whichever you choose is up to you.
If you're writing HTML, <!DOCTYPE html> is the shortest to type, and puts all browsers into standards mode (which is what you want).
If you're writing XHTML, <!DOCTYPE html> is also perfectly legitimate (XHTML actually requires no DOCTYPE at all, as it relies entirely on the HTTP Content-Type, but there's no harm in putting a DOCTYPE in for portability).
Don't use <!doctype html> - while this is technically valid HTML, it's invalid XHTML so will break if you ever try to parse your page as XML.
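The case sensitivity is easy to check with any XML parser. Here is a minimal sketch using Python's standard library; the `parses_as_xml` helper and the sample body are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal XHTML body shared by both test documents.
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head><body /></html>'
)


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is accepted by an XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


upper = parses_as_xml("<!DOCTYPE html>" + BODY)
lower = parses_as_xml("<!doctype html>" + BODY)

# XML keywords are case-sensitive, so only the uppercase form is accepted.
print(upper, lower)
```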
Slightly OT sidenote: Some people here have commented that XHTML is a "dead" standard - this is false. XHTML has been integrated into the upcoming HTML5 spec. The spec is entitled "HTML5: A vocabulary and associated APIs for HTML and XHTML"
See:
http://www.w3.org/TR/html5/the-xhtml-syntax.html
http://html5doctor.com/html-5-xml-xhtml-5/

Resources