Why does this code show an error in the W3C validator? - xhtml

Why does this code show the error "character data is not allowed here" in the W3C validator?
<blockquote>all visible objects, man, are but as pasteboard masks.
But in each event -- in the living act, the undoubted
deed -- there, some unknown but still reasoning thing
puts forth the mouldings of its feature from behind
the unreasoning mask. If man will strike, strike
through the mask. All visible objects, man, are but as pasteboard masks.
But in each event -- in the living act, the undoubted
deed -- there, some unknown but still reasoning thing
puts forth the mouldings of its feature from behind
the unreasoning mask. If man will strike, strike
through the mask.</blockquote>
It does not give any error in this validator: http://www.onlinewebcheck.com/

You can't put text directly inside a <blockquote> tag in strict (X)HTML. You have to wrap it in another element such as a <p> tag:
<blockquote>
<p>My text.</p>
</blockquote>

A blockquote is not supposed to directly contain text. You'll need to wrap your text in a single p tag or series of p tags before it'll validate.

Note: To validate a blockquote element as strict HTML/XHTML, the element must contain only other block-level elements, like this:
<blockquote>
<p>Here is a long quotation here is a long quotation</p>
</blockquote>
Source: w3schools.com
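This Python sketch shows exactly which text trips the "character data is not allowed here" error. It uses the standard-library html.parser, and the class and function names are hypothetical. It reports non-whitespace text sitting directly inside a <blockquote>. It checks only this one rule, not the full XHTML Strict content model, so it is no substitute for the validator.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class BlockquoteTextChecker(HTMLParser):
    """Records text that is a direct child of <blockquote> (sketch)."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.problems = []       # (line, column, first 30 chars of the text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching start tag, if there is one.
        if tag in self.open_elements:
            while self.open_elements.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.open_elements[-1] if self.open_elements else None
        if parent == "blockquote" and data.strip():
            line, column = self.getpos()
            self.problems.append((line, column, data.strip()[:30]))


def direct_text_in_blockquote(markup):
    checker = BlockquoteTextChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


bad = "<blockquote>all visible objects, man, are but as pasteboard masks.</blockquote>"
good = "<blockquote>\n<p>all visible objects, man, are but as pasteboard masks.</p>\n</blockquote>"
print(direct_text_in_blockquote(bad))   # one problem reported
print(direct_text_in_blockquote(good))  # []
```

Wrapping the quote in <p>...</p>, as the answers above suggest, makes the report come back empty.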


How to avoid broken thematic sections (e.g. div) in HTML?

I am trying to transfer a text from a printed book into HTML5 while keeping its thematic and page/paragraph/line layout structure exactly as it is. For example, every page of the printed book is a <div> section, e.g. <div class=page id=55>, so that it emulates/represents exactly the page unit of the printed book and also facilitates referencing. I don't care much how the text will be rendered in the browser; that is something I can think about later. I just want the HTML and the browser to "know" the original pagination and layout of the printed book.
The problem is that in the printed book, some paragraphs or even boxes, tables etc span over to the next page. If I translate it to HTML, I do it like this:
<div class=page id=1>
<p>Once upon a time...</p>
...
<p>...and so the bold knight
</div>
<div class=page id=2>
slew the evil dragon.</p>
<p>Text...</p>
...
This is illegal in HTML, as we have a <p> tag being interrupted by a </div> tag, and then a new div element beginning with a plain text, which is closed by a </p> tag.
HTML would expect me to close the first part of the broken paragraph with a </p> and continue with a new <p> tag after the div. I am not doing this because it doesn't correspond to the pagination of the original book, and it would result in the two halves of one paragraph being understood as two separate paragraphs.
So, how can I use legal HTML while maintaining the theoretical page/paragraph/broken paragraph/page break structure and information, or at least make the browser "know" the original pagination? Is there a more appropriate tag or method to emulate the page break while keeping the page number id?
Perhaps something like
<p>...and so the brave knight<some tag(s) that show page 2 begins here>killed the dragon</p>
How about including a tag at the start of each page that designates the page number, instead of wrapping each page in a div? An aside tag seems appropriate for this.
<aside class="page-number" data-page="1">Page 1</aside>
<p>Once upon a time...</p>
<p>...and so the bold knight</p>
<aside class="page-number" data-page="2">Page 2</aside>
<p class="continued">slew the evil dragon.</p>
<p>Text...</p>
If you need to continue a paragraph, you'll have to break it into multiple elements. You can, however, indicate when a paragraph is a continuation of the previous one, for instance with the continued class shown above.
If you really don't want to break the p tag, you could put a span within it that is used only for semantic reasons. Something like this:
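The aside/continued scheme keeps the pagination machine-readable. As a rough illustration, this Python sketch uses the standard-library html.parser; the names are hypothetical, and it assumes paragraphs are not nested. It rebuilds each logical paragraph, and the pages it spans, from that markup:

```python
from html.parser import HTMLParser


class PageTracker(HTMLParser):
    """Records the page on which each <p> starts, using
    <aside class="page-number" data-page="N"> markers (sketch)."""

    def __init__(self):
        super().__init__()
        self.page = None        # page number from the most recent marker
        self.in_paragraph = False
        self.paragraphs = []    # [page, is_continuation, text] per <p>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "aside" and "page-number" in classes:
            self.page = int(attributes["data-page"])
        elif tag == "p":
            self.in_paragraph = True
            self.paragraphs.append([self.page, "continued" in classes, ""])

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:  # the visible "Page N" label is ignored
            self.paragraphs[-1][2] += data


def logical_paragraphs(markup):
    """Join each class="continued" <p> onto the paragraph before it and
    return (first_page, last_page, text) for every logical paragraph."""
    tracker = PageTracker()
    tracker.feed(markup)
    tracker.close()
    result = []
    for page, continued, text in tracker.paragraphs:
        text = text.strip()
        if continued and result:
            first_page, _, previous = result[-1]
            result[-1] = (first_page, page, previous + " " + text)
        else:
            result.append((page, page, text))
    return result


book = """
<aside class="page-number" data-page="1">Page 1</aside>
<p>Once upon a time...</p>
<p>...and so the bold knight</p>
<aside class="page-number" data-page="2">Page 2</aside>
<p class="continued">slew the evil dragon.</p>
<p>Text...</p>
"""
for first, last, text in logical_paragraphs(book):
    print(first, last, text)
```

For the sample, the broken paragraph comes back as a single entry spanning pages 1 to 2.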
<p>...and so the bold knight
<span class="page-marker" aria-hidden="true" data-page="1"></span>
slew the evil dragon.</p>
But this makes somewhat less semantic sense than the previous solution.
Try adding display: inline; to either the CSS style of the class page or the style attribute of each page div.

HTML Tags: Presentational vs Structural

I have found many different views in articles about presentational tags: some people think all tags are presentational, while others do not.
For example, the HTML5 specification does not consider <small> presentational.
In this list of tags (all supported in HTML5), which tags are presentational and which are not?
<abbr>
<address>
<area>
<b>
<bdo>
<blockquote>
<br>
<button>
<cite>
<dd>
<del>
<dfn>
<dl>
<dt>
<em>
<hr>
<i>
<ins>
<kbd>
<map>
<menu>
<pre>
<q>
<samp>
<small>
<span>
<strong>
<sub>
<sup>
<var>
Who decides which HTML tags are presentational and which are not, and how do they make that decision? Is it a particularly large group such as the W3C, or is it based on groups of web developers, i.e. the web community? Also, between the two, which advice should we follow for deciding which tags are presentational?
If a tag is valid according to the W3C in accepted doctypes, then what are the benefits of not using any XHTML tag, from any point of view?
From a user/usability/accessibility point of view:
if we use more HTML tags, then pages will look better without CSS.
From a developer point of view:
if we make use of more of the available HTML tags, then we do not need to use <span class="className">,
which takes more time to write and uses more characters, in both the HTML and the CSS, than a plain tag.
For example:
instead of using:
<span class="boldtext">Some text</span>
.boldtext {font-weight:700}
We can use:
<b>Some text</b>
b {font-weight:700}
It looks cleaner, it is easier to use, it uses fewer characters (which reduces the page size), and it is more readable in the source. It also does not break the rule of content and presentation separation.
We can also do this:
<b class="important">Some text</b>
b.important {font-weight:700}
and whenever we want to change the font weight, in both examples we only need to change the CSS.
If a tag is considered valid by the W3C in their recognized doctypes, then what are the benefits of not using X/HTML presentational tags which are not directly recognized by either the W3C or the HTML specifications?
Can we change any design parameters without changing anything in the HTML? Does this fit within the principle of content and presentation separation?
If any HTML tag breaks the rule of separation, then doesn't the CSS content property break it as well?
See this article.
Why are the height and width attributes of the img element permitted? Don't they break the rule of separation? A good debate on this matter can be found here.
W3C decides the semantics of tags. The specification documents of HTML5 give conditions on the use of the various tags.
HTML5
To continue with your example, there is nothing wrong with using <b> to bold some text unless:
The text being bolded is a single entity already represented by a tag:
Incorrect:
<label for="name"><b>Name:</b></label>
Correct: (Use CSS to style the element)
label { font-weight: bold; }
<label for="name">Name:</label>
The text is being bolded to put added emphasis and weight on a section or words of a block of text.
Incorrect:
<p>HTML has been created to <b>semantically</b> represent documents.</p>
Correct: (Use <strong>)
<p>HTML has been created to <strong>semantically</strong> represent documents.</p>
The following is an example of proper use of the <b> tag:
Correct:
<p>You may <b>logout</b> at any time.</p>
I realize that there doesn't seem to be a lot of difference between the above example and the one using <strong> as the proper example. To simply explain it, the word semantically plays an important role in the sentence and its emphasis is being strengthened by bold font, while logout is simply bolded for presentation purposes.
The following would be an improper usage.
Incorrect:
<p><b>Warning:</b> Following the procedure described below may irreparably damage your equipment.</p>
Correct: (This is used to add strong emphasis, therefore use <strong>)
<p><strong>Warning:</strong> Following the procedure described below may irreparably damage your equipment.</p>
Using <span class="bold"> is a markup smell and simply shouldn't be allowed. The <span> element is used to apply style to inline content when a generic presentational tag (i.e. <b>) doesn't apply; for example, to make some text green:
Incorrect:
<p>You will also be happy to know <span class="bold">ACME Corp</span> is a <span class="eco-green">certified green</span> company.</p>
Correct: (Explanation below)
<p>You will also be happy to know <b>ACME Corp</b> is a <em class="eco-green">certified green</em> company.</p>
The reason you would use <em> rather than <span> for "certified green" is that the green color here adds emphasis to the fact that ACME Corp is a certified green company.
The following would be a good example of the use of a <span> tag:
Correct:
<p>You may press <kbd>CTRL+G</kbd> at any time to change your pen color to <span class="pen-green">green</span>.</p>
In this example, the word green is styled in green simply to reflect the color, not to add any emphasis (<em>) or strong emphasis (<strong>).
The whole distinction between "presentation" elements versus "structure" element is, in my opinion, a matter of common sense, not something defined by W3C or anyone else. :-P
An element that describes what its content is (as opposed to how it should look) is a structure element. Everything else is, by definition, not structural, and therefore a presentation element.
Now, I'll answer the second part of your post. I understand this is a contentious topic, but I'll speak my mind anyway.
Well-made HTML should not concern itself with how it should look. That's the job of the stylesheet. The reason to leave it to the stylesheet is so you can deliver one stylesheet for desktop computers and others for netbooks, smartphones, "dumbphones" (for lack of a better term), Kindles, and (if you care about accessibility, and you should) screen readers.
By using presentation markup in your HTML, you force a certain "look" across all these different types of media, removing the ability of the designer to choose a look that works best for such devices. This is micromanagement of the worst sort, and designers will hate you for it. :-)
To use your example, instead of using <b>, you should ask yourself what the boldness is supposed to express. If you're trying to express a section title, use one of the heading tags (<h1> through <h6>). If you're trying to express strong emphasis, use <strong>. You get the idea. Express the what, not the how; leave the how to the stylesheet designers.
</soapbox>
It's not that presentational elements should be avoided; it's that markup should be as semantic as possible. When designing a document structure, default styling should be considered a secondary effect. If an element is used solely for presentation, it's not semantic, no matter what element is used.
The example usage of <b> isn't semantic, because <b> imparts no meaning. <span class="boldtext"> also isn't semantic. As such, their usage is mixing presentation into the structure.

Correct use of Blockquote, q and cite?

Is this the correct use of Blockquote, q and cite?
<p>
<blockquote>Type HTML in the textarea above, <q>and it will magically appear</q> in the frame below.
</blockquote>
<cite>reference url
</p>
Is the use of blockquote and q semantically correct, or are both presentational elements that should not be used?
Yes. They are not presentational elements — blockquote represents a block quotation, q represents an inline quotation, and cite represents a reference to a name, work, standard, URL, etc.
You do have some validation errors that are fairly common with blockquote. A blockquote element cannot be inside a paragraph, and in HTML4 actually needs to contain paragraphs. The nesting of the p and blockquote elements in your fragment needs to be reversed.
The blockquote element (also the q element) can optionally have a cite attribute to specify a URI where the quote came from. HTML5 says user agents should make that link available to the user, and HTML4 doesn't say anything at all. I would include the URI both in the cite attribute and as an inline link, since browsers don't handle it.
Here's how I would write that fragment, with those revisions in mind:
<blockquote cite="http://stackoverflow.com">
<p>Type HTML in the textarea above, <q>and it will magically
appear</q> in the frame below.</p>
</blockquote>
<p>
<cite><a href="http://stackoverflow.com">reference url</a></cite>
</p>
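The advice to repeat the cite URI as a visible link can be checked mechanically, because browsers do not surface the attribute. Below is a minimal Python sketch. It assumes URIs are compared as exact strings, and CiteLinkChecker and sources_without_visible_link are hypothetical names.

```python
from html.parser import HTMLParser


class CiteLinkChecker(HTMLParser):
    """Collects cite="" URIs on <blockquote>/<q> and the hrefs of <a> links."""

    def __init__(self):
        super().__init__()
        self.cite_uris = []      # machine-readable sources (not shown by browsers)
        self.link_hrefs = set()  # sources a reader can actually click

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("blockquote", "q") and attributes.get("cite"):
            self.cite_uris.append(attributes["cite"])
        elif tag == "a" and attributes.get("href"):
            self.link_hrefs.add(attributes["href"])


def sources_without_visible_link(markup):
    """Return the cite URIs that never appear as an <a href> in the markup."""
    checker = CiteLinkChecker()
    checker.feed(markup)
    checker.close()
    return [uri for uri in checker.cite_uris if uri not in checker.link_hrefs]


quote = ('<blockquote cite="http://stackoverflow.com">'
         '<p>Type HTML in the textarea above.</p></blockquote>')
linked = quote + '<p><cite><a href="http://stackoverflow.com">reference url</a></cite></p>'
print(sources_without_visible_link(quote))   # ['http://stackoverflow.com']
print(sources_without_visible_link(linked))  # []
```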
The other answers on this page are out of date, but the question is still valid.
The q element semantically represents a quotation and is an inline element. It should be used like so (i.e. with no block elements inside it):
<p>
In the words of Charles Bukowski -
<q>An intellectual says a simple thing in a hard way.
An artist says a hard thing in a simple way.</q>
</p>
Another example:
<p>
<q>This is correct, said Hillary.</q> is a quote from the
popular daytime TV drama <cite>When Ian became Hillary</cite>.
</p>
The q element should not be placed inside a blockquote element, as it would be redundant -- both denote a quote.
A blockquote is a block element, allowing other block elements to be placed inside:
<blockquote>
<p>My favorite book is <cite>At Swim-Two-Birds</cite>.</p>
- Mike Smith
</blockquote>
<cite> is an inline element representing the title of a body of work. Since the W3C and WHATWG have now agreed to work together, we have one answer as to what it may contain: The name of a book, a film, a TV show, a game, a song, a play, etc, etc.
It should NOT be a URL or an author's name (a URL can be added with a normal a element and an author is not a piece of work that you're citing).
This is a valid usage:
<figure>
<blockquote>
<p>The truth may be puzzling. It may take some work to grapple with.
It may be counterintuitive. It may contradict deeply held
prejudices. It may not be consonant with what we desperately want to
be true. But our preferences do not determine what's true.</p>
</blockquote>
<figcaption>Carl Sagan, in "<cite>Wonder and Skepticism</cite>", from
the <cite>Skeptical Inquirer</cite> Volume 19, Issue 1 (January-February
1995)</figcaption>
</figure>
You could consider BLOCKQUOTE analogous to a DIV and Q analogous to SPAN.
Recommended usage is to enclose large quotes in BLOCKQUOTE and small, single line or sentence quotes in Q.
<blockquote>
<p>This is a big quote.</p>
<p>This is the second paragraph with a smaller <q>quote</q> inside</p>
</blockquote>
The cite attribute, available on either element, merely points to the source.
Using attributes such as the cite attribute of blockquote or q doesn't make the source easily displayable (without JS or tricky CSS), so it does not address the aim of displaying a reference link easily. It is now conforming to include cite (and/or footer) inside blockquote to specify the source of the quote, either textually or through a URL, as below:
<blockquote>
<p>Beware of bugs in the above code; I have only proved it correct, not tried it.</p>
<cite>Donald Knuth: Notes on the van Emde Boas construction of priority deques: An instructive use of recursion, March 29th, 1977</cite>
</blockquote>
Note that:
cases of cite that are part of the quote contents (not the source reference) are deemed quite rare, and should be handled through a differentiating class on the relevant cite subtag.
Regarding q, it is indeed aimed at inline quotes, but it is more likely to be used outside of blockquotes (quotes within quotes are quite rare).
According to this, "cite" is an attribute of q - and is not well supported at that.
The semantic (and valid) use of the <cite> element is still under debate even if "in HTML5, use of this element to mark a person's name is no longer considered semantically appropriate."
You'll find a very detailed and useful article about "<blockquote>, <q> and <cite>" here:
http://html5doctor.com/blockquote-q-cite/

Extracting text fragment from a HTML body (in .NET)

I have HTML content entered by users via a rich-text editor, so it can be almost anything (except elements that are not supposed to be outside the body tag; no need to worry about "head", doctype, etc.).
An example of this content:
<h1>Header 1</h1>
<p>Some text here</p><p>Some more text here</p>
<div align=right>A link here</div><hr />
<h1>Header 2</h1>
<p>Some text here</p><p>Some more text here</p>
<div align=right>A link here</div><hr />
The trick is, I need to extract the first 100 characters of the text only (HTML tags stripped). I also need to retain the line breaks and not break any word.
So the output for the above will be something like:
Header 1
Some text here
Some more text here
A link here
Header 2
Some text here
Some
It has 98 characters and line breaks are retained. So far, I can strip all the HTML tags using Regex:
Regex.Replace(htmlStr, "<[^>]*>", "")
Then trim the length using Regex as well with:
Regex.Match(textStr, @"^.{1,100}\b").Value
My problem is how to retain the line breaks. I get output like:
Header 1
Some text hereSome more text here
A link here
Header 2
Some text hereSome more text
Notice the sentences running together? Perhaps someone can show me some other ways of solving this problem. Thanks!
Additional info: my purpose is to generate a plain-text synopsis from a bunch of HTML content. I hope this helps clarify the problem.
I think how I would solve this is to look at it as though it were a simple browser. Create a base Tag class, make it abstract with maybe an InnerHTML property and a virtual method PrintElement.
Next, create classes for each HTML tag that you care about and inherit from your base class. Judging from your example, the tags you care most about are h1, p, a, and hr. Implement the PrintElement method such that it returns a string that prints out the element properly based on the InnerHTML (such as the p class' PrintElement would return "\n[InnerHTML]\n").
Next, build a parser that will parse through your HTML and determine which object to create and then add those objects to a queue (a tree would be better, but doesn't look like it's necessary for your purposes).
Finally, go through your queue calling the PrintElement method for each element.
It may be more work than you had planned, but it's a far more robust solution than simply using regex. Should you change your mind in the future and want to show simple styling, it's just a matter of going back and modifying your PrintElement methods.
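Here is a rough Python sketch of this design; the question is about .NET, but the structure carries over to C#. It makes several assumptions:
- The markup is flat, like the example in the question.
- html.parser stands in for the hand-written parser.
- Tag, BlockTag, RuleTag and print_element are hypothetical counterparts of the base class and PrintElement.
- The 100-character cut backs off to the last space or line break, so no word is split.

```python
from html.parser import HTMLParser


class Tag:
    """Hypothetical base class: one element's text plus how to print it."""

    def __init__(self):
        self.inner_text = ""

    def print_element(self):
        raise NotImplementedError


class BlockTag(Tag):
    """h1, p, div: printed on a line of their own."""

    def print_element(self):
        return "\n" + self.inner_text.strip() + "\n"


class RuleTag(Tag):
    """hr: contributes only a line break."""

    def print_element(self):
        return "\n"


TAG_CLASSES = {"h1": BlockTag, "p": BlockTag, "div": BlockTag, "hr": RuleTag}


class QueueBuilder(HTMLParser):
    """Turns the fragment into a queue of Tag objects, in document order."""

    def __init__(self):
        super().__init__()
        self.queue = []
        self.current = None  # block element whose text is being collected

    def handle_starttag(self, tag, attrs):
        cls = TAG_CLASSES.get(tag)
        if cls is None:
            return  # inline/unknown tags (e.g. <a>): their text still goes to self.current
        element = cls()
        self.queue.append(element)
        if cls is BlockTag:
            self.current = element

    def handle_endtag(self, tag):
        if TAG_CLASSES.get(tag) is BlockTag:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:  # text outside a known block is dropped
            self.current.inner_text += data


def plain_text(html):
    builder = QueueBuilder()
    builder.feed(html)
    builder.close()
    printed = "".join(element.print_element() for element in builder.queue)
    # Adjacent blocks produce blank lines; keep a single line break between them.
    return "\n".join(line for line in printed.splitlines() if line.strip())


def synopsis(html, limit=100):
    text = plain_text(html)
    if len(text) <= limit:
        return text
    head = text[:limit + 1]
    # Back off to the last space or line break so no word is cut in half.
    cut = max(head.rfind(" "), head.rfind("\n"))
    return text[:cut].rstrip() if cut > 0 else text[:limit]


example = (
    "<h1>Header 1</h1>\n"
    "<p>Some text here</p><p>Some more text here</p>\n"
    "<div align=right>A link here</div><hr />\n"
    "<h1>Header 2</h1>\n"
    "<p>Some text here</p><p>Some more text here</p>\n"
    "<div align=right>A link here</div><hr />\n"
)
print(synopsis(example))
```

Nested markup (for example text after a <p> but still inside a <div>) would need a proper tree instead of the single "current" element, as the answer itself notes.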
For info, stripping html with a regex is... full of subtle problems. The HTML Agility Pack may be more robust, but still suffers from the words bleeding together:
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
string text = doc.DocumentNode.InnerText;  // words from adjacent elements still run together
One way could be to strip html in three steps:
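using System.Text.RegularExpressions;

htmlStr = Regex.Replace(htmlStr, "<[^/>]*>", "");  // strip opening tags; keep closing tags (</...>) for now
htmlStr = Regex.Replace(htmlStr, "</p>", "\r\n");  // every paragraph end becomes a line break
htmlStr = Regex.Replace(htmlStr, "<[^>]*>", "");   // strip the remaining tags (closing tags, <hr />)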
Well, I need to close this even though I don't have the ideal solution. Since the HTML tags used in my app are very common ones (no tables, lists, etc.) with little or no nesting, what I did is preformat the HTML fragments before saving them after user input:
Remove all line breaks
Add a line break prefix to all block tags (e.g. div, p, hr, h1/2/3/4 etc)
Before extracting them for display as plain text, I use a regex to remove the HTML tags and retain the line breaks. Hardly rocket science, but it works for me.
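The two steps described above might look like this in Python. This is only a sketch: the block-tag list is an assumption based on the examples given. Line breaks are also turned into spaces rather than deleted, so that words split across source lines don't run together.

```python
import re

# Block-level tags that get a line break in front of them (assumed list,
# following the "div, p, hr, h1/2/3/4" examples above).
BLOCK_TAGS = ["div", "p", "hr", "h1", "h2", "h3", "h4", "h5", "h6"]
# "\b" stops <p from also matching <pre>, <param>, etc.
BLOCK_START = re.compile(r"<(?:%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def preformat(html):
    """Step 1, run once when the user's HTML is saved."""
    # Remove existing line breaks (as spaces, so words stay separated).
    html = re.sub(r"[\r\n]+", " ", html)
    # Prefix every opening block tag with a line break.
    return BLOCK_START.sub(lambda m: "\n" + m.group(0), html)


def to_plain_text(preformatted):
    """Step 2, run when the synopsis is displayed."""
    text = re.sub(r"<[^>]*>", "", preformatted)  # strip tags, keep the "\n"s
    lines = (line.strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)
```

Truncating the result to 100 characters without splitting a word can then be done on the plain text, after the tags are gone.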

What tag should be used for short text like "back to top" , "Read more" etc?

What tag should be used for short text like:
Back to top
Read more
Is <p> appropriate, or should something else be used? These are not paragraphs.
Which is more semantic:
<p>Back to top</p>
or
Back to top
or
<div>Back to top</div>
In general you should use the anchor <a> tag.
Nesting an <a> inside a <p> is perfectly valid, but in general the <p> should be reserved for paragraphs of text. Since yours is just a link, the <a> tag alone will probably be the most recommended.
If you want your link to appear as a block element, simply style it with display: block;. The fact that the <a> tag is normally displayed inline is only because it is its default style.
Anchor tag
Back to top
Read more
You can embed an anchor tag inside a block element, something like this:
<p><a href="#">Back to top</a></p>
In HTML 4.01 Strict, inline elements must be enclosed inside block-level elements, so this is the basic approach:
<p><a href="#">Back to top</a></p>
Usually, though, the <a> element is already inside a <div>, so the <p> isn't absolutely necessary. It is more semantically correct, however: it's still a paragraph of text even if there are only a few words in it.
There's no obvious semantic tag for this.
Perhaps you don't really need a tag there at all! Please check for this case.
If your "short texts" are links, then you obviously need <a href=. If you need a CSS style for the text, you can put it into the a tag too.
* If you need a tag for structuring only or to hang CSS styles from, then use <span>.
