Why are some XML URI's recommended by the W3C? - xhtml

I don't know if my question makes sense, but in XML DTD declarations some websites use URIs that point to w3.org. Why is this?

The DOCTYPE declaration at the top of every HTML and XHTML document defines the document type and the elements allowed in the document. It does this by referring to the DTD (Document Type Definition), which is nominally a file on the W3C website, w3.org.
Now in most cases, browsers don't even load the DTD off the W3C website. The HTML 4 DTDs are written in SGML and the XHTML DTDs in XML, but browsers do not fetch or parse either kind when rendering a page. They already know what's in it!
If it helps, think of it as being cached and pre-parsed.
Or to look at it another way, the exact identifier of the DTD is enough for the browser to know what is meant. And since HTML5, naming a DTD is actually optional: the short form <!DOCTYPE html> is enough, and browsers know what you mean without being told which DTD explicitly.
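As a rough analogy for "the browser already knows what's in it", here is a minimal Python sketch that recognises a document type by looking up the DOCTYPE's public identifier in a built-in table instead of downloading anything. The names KNOWN_DOCTYPES, DoctypeSniffer and identify are made up for illustration, and the table is deliberately tiny; real browsers use their own internal lists.

```python
import re
from html.parser import HTMLParser

# Hypothetical built-in table: public identifier -> human-readable name.
# Nothing is fetched from w3.org; the identifier alone is the key.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
}


class DoctypeSniffer(HTMLParser):
    """Remembers the DOCTYPE declaration of a document, if there is one."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE html PUBLIC "..."'
        if decl.lower().startswith("doctype"):
            self.doctype = decl


def identify(markup):
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    if sniffer.doctype is None:
        return "no doctype"
    quoted = re.findall(r'"([^"]*)"', sniffer.doctype)
    if not quoted:
        return "HTML5 short form"  # <!DOCTYPE html> names no DTD at all
    # The first quoted string is the public identifier (or a SYSTEM URL).
    return KNOWN_DOCTYPES.get(quoted[0], "unknown")
```

Calling identify() on a page that starts with the XHTML 1.0 Strict DOCTYPE gives "XHTML 1.0 Strict" without any network access, which is the "cached and pre-parsed" idea in miniature.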
Hope this answers your question.

Related

What is "Extensible" about XHTML?

Why is XHTML called "eXtensible" (the X in XHTML)? Can we, as individual web developers actually extend it?
What separates it from ordinary HTML?
Well, firstly, things have moved on somewhat, and XHTML isn't really a thing anymore. HTML5 isn't parsed as XML, and XHTML 2.0 was of course cancelled.
Despite that, it's possible to use XHTML if you serve it with the application/xhtml+xml MIME type. Just be aware of the shortcomings: any error gives the yellow screen of death, and older IEs don't render anything at all.
For a new project, use the HTML5 doctype and serve as text/html. XHTML can be considered as a failure for many reasons.
Anyway, with XHTML you can do things like this:
<!DOCTYPE html SYSTEM "http://example.com/my-xhtml-custom.dtd">
<html xmlns='http://www.w3.org/1999/xhtml' xmlns:custom="http://example.com/" xml:lang='en-US'>
Then copy http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, edit it how you like, and put it at the URL referenced in the DOCTYPE above.
The W3C has a lot to say about this, specifically:
Don't do this! Documents need to have a meaning as well as correct
syntax. SGML and XML only define syntax. HTML and XHTML define
meaning. If you add elements that aren't defined by a standard, only
you yourself know what they mean. And in 20 or 50 years, even you may
not know it anymore…
Of course, you can experiment, for example to work on future Web
formats, but other than that you should not use proprietary elements.
Nowadays, we thankfully have HTML5 which dropped all this XML stuff (no one was using it and it adds a lot of complexity). It's not extensible in the same way, but that's probably a good thing!

XHTML still harmful?

I'm starting a project where the client has mandated the use of XHTML 1.0 Strict. Now I'm wondering whether the problems described in Sending XHTML as text/html Considered Harmful are still current and whether I should try to convince the client that this (very strongly stated) requirement is counterproductive.
Does Internet explorer handle application/xhtml+xml correctly by now?
IE9 handles application/xhtml+xml, including SVG inside it, one of the main reasons to want to use this media type. (Otherwise, there's relatively little point in using it to date, as you get a bunch of scripting changes, and IE<9 incompatibility, in return for relatively little if any performance gain at the moment.)
I don't agree with Hixie that serving XHTML as text/html has ever been really harmful. Using the HTML-compatibility guidelines, XHTML poses no problems to any browsers since the ancient Netscape 4. Although it doesn't really get you anything on the client-side, it can be helpful to your own page handling workflow if you're working with XML processing tools. And the XML syntax rules, being stricter-but-simpler than HTML, are a good thing to author to; this gives the validator a chance to pick up on errors that are valid constructs in SGML/HTML but which are almost certainly not what you meant. (On the other hand, since the validator won't enforce HTML-compatibility guidelines there are a couple of places where it can let through well-formed-but-troublesome markup, most commonly self-closed <script> tags breaking the whole page.)
Specifically, to answer his points: /> and related SGML issues are only a problem for tools that really believe HTML is SGML, which no browser has ever done. Going forward, the /> syntax is specifically allowed in non-XML HTML5.
Hiding scripts/stylesheets from ‘legacy’ (pre-HTML 3.2!) browsers hasn't been an issue for a decade or so: I came up with the mangled comment hack he (rightly) derides as ridiculous, but it was only an exercise; I never intended anyone to use it except in some strange hypothetical emergency. It's certainly not ‘necessary’ for using embedded scripts and stylesheets in XHTML-as-HTML... a straight //<![CDATA[ hack is enough if you need to be able to include < and & characters, and more commonly you don't even need that.
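To make the //<![CDATA[ point concrete, here is a small Python sketch using the standard library's XML parser as a stand-in for a browser's XML mode. The sample document and the helper name script_text are invented for illustration. It shows that an inline script containing < and & only survives XML parsing when it is wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Sample page (made up): the script body contains "<" and "&&".
WITH_CDATA = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title><script type="text/javascript">
//<![CDATA[
if (a < b && c) { go(); }
//]]>
</script></head><body></body></html>"""

# Same page with the CDATA markers stripped out.
WITHOUT_CDATA = WITH_CDATA.replace("//<![CDATA[", "").replace("//]]>", "")


def script_text(markup):
    """Return the script body as an XML parser sees it, or None if parsing fails."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return root.find(f".//{XHTML_NS}script").text
```

With the markers in place, script_text(WITH_CDATA) returns the JavaScript source intact. Without them the bare < is a well-formedness error and the function returns None. The leading // keeps both markers inside JavaScript comments when the same page is parsed as text/html.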
No-one actually wants to sniff for XHTML-as-HTML and treat it differently, so that whole section is moot. “Sending XHTML 1.1 as text/html is NEVER fine” has been changed by W3C (it now is fine after all), and XHTML 2.0 is dead.
So yes, use XHTML 1.0 Strict, or XHTML 1.1 or XHTML5, if you like. But until IE9 is your baseline browser (and that's not going to be the case for ages), you'll have to stick with text/html.
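If you do want to send application/xhtml+xml to browsers that accept it while still sticking with text/html for the rest, one approach (not part of the answer above, just a commonly used idea) is to decide per request from the Accept header. The sketch below is a simplified illustration: choose_media_type is a hypothetical helper, and the parsing ignores wildcards and does not compare q-values between types.

```python
def choose_media_type(accept_header):
    """Pick the Content-Type for an XHTML page from a request's Accept header.

    Returns "application/xhtml+xml" only when the client lists that type
    explicitly with a non-zero q-value; otherwise falls back to "text/html".
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"
```

A client that only advertises */* gets text/html here, which is the safe default described above.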
Internet Explorer 9 will handle application/xhtml+xml documents through a tag soup parser.
Internet Explorer 8 and earlier will prompt the user to save the document or open it in another application.
Internet Explorer 6 and newer all have significant market share (although this does depend, to some degree, on your market).
Nothing significant has changed as regards browser support for real XHTML for many years.
It is still far more trouble than it is worth unless you actually use XML parsers in your production chain (in which case, good luck persuading them to output XHTML that meets the HTML Compatibility Guidelines).
This depends on what you mean by "Internet Explorer".
For instance, IE6 is still from something like 2001 (that hasn't changed), and no, it still doesn't handle it correctly.
Over the past year (27th May 2017 - 27th April 2018), the combined share of IE 6, 7, and 8 was 1.72% according to netmarketshare.
Every other major browser supports real XHTML (i.e. sent with the application/xhtml+xml MIME type). My answer to you is "No, it's not harmful".
Whether it's advantageous, I would guess it doesn't matter much until you actually grok and use XML technologies (SVG, MathML, etc) on the web comfortably (yes HTML syntax also supports them, but it's virtually a hack).
If browser makers put more effort into XML parsers, it could still matter for pure parsing speed.

What type of XHTML and CSS validations errors are safe to avoid?

Which types of XHTML and CSS validation errors can safely be ignored? Which would do no harm today or tomorrow (assuming we do not touch the XHTML and CSS)?
I mean errors that will not cause problems with future browser upgrades or future CSS and HTML versions, and merely show up as errors today.
One category I know of is vendor extensions. Are there any other errors or warnings that have no bad effect for users or developers?
If I'm making a site and I get many errors, should I spend time fixing every one? If I try to fix them all, I will have to use JavaScript in some places instead of CSS.
The XHTML and CSS validators validate against the corresponding W3C specifications. Ignoring their errors means that your pages deviate from those standards.
Web browsers aim to implement these standards, so ignoring a warning is likely to cause issues on at least some browsers. Therefore, you cannot ignore any warning that the validators give.
Also, XHTML- and CSS-conformant pages are not guaranteed to work on all browsers, as browsers may implement something differently or incorrectly.
Having conformant pages is still a good thing, as most browsers are (for the most part) conformant and having more conformant pages helps put the ownership on the browser implementers. That is, you (as a web page author) need only concern yourself with being standard compliant. If a browser can't handle that, the issue is with the browser, not the web page author.
If you want to be compatible with a large number of browsers, start with the valid conformant page and then add the minimum needed to get it working on other non-conformant browsers. Doing it this way is a lot easier than starting with a non-conformant page and trying to make that work on most browsers.
You should try to avoid all parse errors. If in doubt, try validator.w3.org and use its HTML Tidy option to clean up the code.
Each browser will render and parse XHTML and css differently. Even if it works now it might not work tomorrow.
The only safe answer is "none". The best guarantee you have for future compatibility with all browsers is to stick to the standard and have fully validated XHTML and CSS.

Should I not use those XHTML elements/tags/attributes which will not be in the HTML 5 spec?

If I use XHTML 1.0 Strict currently, then should I not use those XHTML elements/tags and attributes which will not be in the HTML5 spec? E.g. <acronym> and <big>
If you want to be safe yes (and you should want to be safe, it's the....well, safest, route), use the tags appropriate to the DOCTYPE you're using of course.
In practice, will it matter if you use deprecated tags? No, probably not for years to come. I'm not saying do it, just that browsers won't call you on it, at least not at the moment.
HTML5 isn't final yet, so it's not included, but you can view an updated "which tags are in my DTD?" here: http://www.w3schools.com/tags/ref_html_dtd.asp
Note that strictly, HTML5 never uses the term "Deprecated". "Absent" or "Obsolete" are used instead. I believe the reason is that "deprecated" implies that support for them may be removed in the future, whereas that is not currently considered likely by the browser manufacturers, because too many existing pages would break.
Also in at least one case, that of the profile attribute, the reason for making it obsolete is that it is useless, and therefore effort adding it is effort wasted. But if you already have it in your page, effort removing it is effort wasted, so you might as well leave it.
There again, HTML5 is becoming fragmented, and there is a proposal to add the profile attribute back in to HTML as a separate document from the main spec, as is the intention for RDFa and microdata.
In the same way, there are still proposals to remove or change the names of HTML5 elements and attributes which may or may not make it into the final spec.
For all these reasons, I'd say it's far too early to be putting in work to remove HTML5 "Absent" elements and attributes from your HTML documents.
Even if you leave them in there, it's a piece of cake to search and replace them with tags that are accepted in HTML 5.
There's no need to go searching through reams of code to change those tags, but yes, it's a good idea to phase them out. Use <abbr> instead of <acronym> and some form of CSS instead of <big>. The <strong> tag might be more appropriate in some cases if you style it to change the font size, or use one of the heading tags if that's more appropriate.
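The search-and-replace mentioned above can be sketched in a few lines of Python. This is only an illustration for the <acronym> to <abbr> case; the helper name acronym_to_abbr is made up, and a regex like this assumes the markup does not contain the text "<acronym" inside comments, scripts or attribute values.

```python
import re

# Matches the start of an opening or closing <acronym> tag, in any letter case.
# The \b stops it from touching a hypothetical longer name such as <acronyms>.
_ACRONYM_TAG = re.compile(r"<(/?)acronym\b", re.IGNORECASE)


def acronym_to_abbr(markup):
    """Rename <acronym> elements to <abbr>, keeping attributes and content."""
    return _ACRONYM_TAG.sub(r"<\1abbr", markup)
```

For example, acronym_to_abbr('<acronym title="World Wide Web Consortium">W3C</acronym>') gives '<abbr title="World Wide Web Consortium">W3C</abbr>'. Replacing <big> is less mechanical, since the right substitute depends on what the element was used for.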

Is it acceptable for invalid XHTML?

I've noticed a lot of sites, SO included, use XHTML as their mark-up language and then fail to adhere to the spec. Just browsing the source for SO there are missing closing tags for paragraphs, invalid elements, etc.
So should tools (and developers) use the XHTML doctype if they are going to produce invalid mark up? And should browsers be more firm in their acceptance of poor mark-up?
And before anyone shouts hypocrite, my blog has one piece of invalid mark-up involving the CAPTCHA (or it did the last time I checked), which involves styling the noscript tag.
There are many reasons to use valid markup. My favorite is that it allows you to use validation as a form of regression testing, preventing the markup equivalent of "delta rot" from leading to real rendering problems once the errors reach some critical mass. And really, it's just plain sloppy to allow "lazy" errors like typos and mis-nested/unclosed tags to accumulate. Valid markup is one way to identify passionate programmers.
There's also the issue of debugging: valid markup also gives you a stable baseline from which to work on the inevitable cross-browser compatibility woes. No web developer who values his time should begin debugging browser compatibility problems without first ensuring that the markup is at least syntactically valid—and any other invalid markup should have a good reason for being there.
(Incidentally, stackoverflow.com fails both these tests, and suggestions to fix the problems were declined.)
All of that said, to answer your specific question, it's probably not worthwhile to use one of the XHTML doctypes unless you plan to produce valid (or at least well-formed) markup. XHTML's primary advantages are derived from the fact that XHTML is XML, allowing it to be processed and transformed by tools and technologies that work with XML. If you don't plan to make your XHTML well-formed XML, then there's little point in choosing that doctype. The latest HTML 4 spec will probably do everything you need, and it's much more forgiving.
We should always try to make it validate according to standards. We'll be sure that the website will display and work fine on current browsers AND future browsers.
I don't think that, if you specify a doctype, there is any reason not to adhere to this doctype.
Using XHTML makes automated error detection easy: every change can be automatically checked for invalid markup. This prevents errors, especially when using automatically generated content. It is really easy for a web developer using a templating engine (JSP, ASP.NET, StringTemplate, etcetera) to copy/paste one closing tag too few or too many. When this is your only error, it can be detected and fixed immediately. I once worked for a site that had 165 validation errors per page, of which 2 or 3 were actual bugs. These were hard to find in the clutter of other errors. Automatic validation would have prevented these errors at the source.
Needless to say, choosing a standard and sticking to it can only benefit interoperability with other systems (screen scrapers, screen readers, search engines), and I have never come across a situation where a valid semantic XHTML with CSS solution wasn't possible for all major browsers.
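A minimal sketch of such an automated check, assuming Python's standard XML parser is an acceptable first gate: it only tests well-formedness (not validity against the XHTML DTD), and it will also reject named entities such as &nbsp; because no DTD is loaded. The function name well_formedness_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup is well-formed XML, else (line, column, message).

    Intended to run on every generated page, e.g. from a unit test, so that
    a stray or missing closing tag is reported as soon as it is introduced.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        line, column = exc.position  # location reported by the parser
        return line, column, str(exc)
    return None
```

A template that drops a closing tag, such as "<div><p>text</div>", yields an error tuple pointing at the mismatch, while the corrected markup yields None.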
Obviously, when working with complex systems, it's not always possible to stick to your doctype, but this is mostly a result of improper communication between the different teams developing different parts of these systems, or, most likely, legacy systems. In the last case it's probably better to isolate these cases and change your doctype accordingly.
It's good to be pragmatic and not adhere to XHTML just because someone said so, regardless of costs, but with current knowledge about CSS and browsers, testing and validation tools, most of the time the benefits are much greater than the costs.
You can say that I have an OCD about XHTML validity. I find that most of the problems with code not being valid come from programmers not knowing the difference between HTML and XHTML. I've been writing 100% valid XHTML and CSS for a while now and have never had any major rendering problems with other browsers. If you keep everything valid, and don't try anything too exotic CSS-wise, you will save yourself a ton of time in fixes.
I wouldn't use XHTML at all just to save myself the philosophical stress. It's not like any browsers are treating it like XHTML anyway.
Browsers will reject poor mark-up if the page is sent as application/xhtml+xml, but they rarely are. This is fine.
I would be more concerned about things like inline use of CSS and JavaScript with Stack Overflow, just because they make maintenance harder.
Though I believe in striving for valid XHTML and CSS, it's often hard to do for a number of reasons.
First, some of the content could be loaded via AJAX. Sometimes, fragments are not properly inserted into the existing DOM.
The HTML that you are viewing may not have all been produced in the same document. For example, the page could be made up of components, or templates, and then thrown together right before the browser renders it. This isn't an excuse, but you can't assume that the HTML you're seeing was hand coded all at once.
What if some of the code generated by Markdown is invalid? You can't blame Stack Overflow for not producing valid code.
Lastly, the purpose of the DOCTYPE is not to simply say "Hey, I'm using valid code" but it's also to give the browser a heads up what you're trying to do so that it can at least come close to correctly parsing that information.
I don't think that most developers specify a DOCTYPE and then explicitly fail to adhere to it.
While I agree with the sentiment of the "if it renders fine then don't worry about it" statement, it's still good to follow a standard, even though it may not be fully supported right now. You can still use tables for layout, but it is discouraged for a reason.
No, you should not use XHTML if you can't guarantee well-formedness, and in practice you can't guarantee it if you don't use an XML serializer to generate markup. Read about producing XML.
Well-formedness is the thing that differentiates XHTML from HTML. XHTML with "just one" markup error ceases to be XHTML. It has to be perfect every time.
If an "XHTML" site appears to work with some errors, it's because browsers ignore the DOCTYPE and interpret the page as HTML.
See the XHTML proxy that forces interpretation of pages as XHTML. Most of the time they fail miserably. This is one of the reasons why the future of XHTML is uncertain and why development of HTML has been resumed.
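The difference between the two interpretations can be illustrated with Python's standard library, using its lenient HTML parser and its strict XML parser as stand-ins for a browser's text/html and application/xhtml+xml modes. This is only an analogy; the sample markup and the helper names are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# "XHTML" with one error: the first <p> is never closed.
BROKEN = "<html><body><p>first<p>second</p></body></html>"


class TagCollector(HTMLParser):
    """Lenient parse: records every start tag and never complains."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_html(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_as_xml(markup):
    """Strict parse: True only if the whole document is well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Here parse_as_html(BROKEN) happily reports the html, body and two p tags, while parse_as_xml(BROKEN) returns False: the same single error that goes unnoticed in HTML mode stops an XML parser entirely.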
It depends. I had that issue with my blog where a YouTube video caused invalid XHTML, but it rendered fine. On the other hand, I have a "Valid XHTML" link, and a combination of a "Valid XHTML" claim and invalid XHTML is not professional.
As SO does not claim to be valid, I think it's acceptable. Personally, if I were Jeff I would be bothered and try to fix it even if it looks good in modern browsers, but some people would rather just move on and actually get things done instead of fixing non-existent bugs.
So long as it works in IE, FF, Safari, (insert other browser here) you should be okay. Validation isn't as important as having it render correctly in multiple browsers. Just because it is valid, doesn't mean it'll work in IE properly, for instance.
Run Google Analytics or similar on your site and see what kind of browsers your users are using and then judge which browsers you need to support the most and worry about the less important ones when you have the spare time to do so.
I say, if it renders OK, then it doesn't matter if it's pixel perfect.
It takes a while to get a site up and running the way you want it, going back and making changes is going to change the way the page renders slightly, then you have to fix those problems.
Now, I'm not saying you should build sloppy web pages, but I see no reason to fix what ain't broke. Browsers aren't going to drop support for error correction anytime in the near future.
I don't understand why everyone gets caught up trying to make their websites fit the standard when some browsers still have problems properly rendering standard code. I've been in web design for something like 10 years and I stopped double coding (read: hacking CSS) and changing stupid stuff just so I could put a button on my site.
I believe that using a <div> will cause you to be invalid regardless, and it gets a bit harder to do any major JavaScript/AJAX without it.
There are so many standards and they are so badly "enforced" or supported that I don't think it matters. Don't get me wrong, I think there should be standards but because they are not enforced, nobody follows them and it's a massive downward spiral.
For 99.999% of the sites out there, it really won't matter. The only time I've had it matter, I ran the HTML input through HTMLTidy to XHTML-ize it, and then ran my processing on it.
Pretty much, it's the old programmer's axiom: trust no input.

Resources