Drupal Feeds showing encoded HTML markup

Can't figure out why this is happening, but my RSS feeds are showing HTML encoding in the description field that I need to get rid of:
For example:
<description>&lt;div class="field field-type-text field-field-location"&gt;
I just can't figure out why this would be happening.

That's correct. The content of <description> is supposed to be XML-text-encoded HTML. At least for RSS 2.0; other versions of RSS are notoriously inconsistent and woolly on this matter.
(If it weren't encoded, then only well-formed and namespaced XHTML could go in the element. This approach was not taken, primarily because RSS predates XHTML.)
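To illustrate the encoding described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The HTML fragment and its text are made up for the example. It shows that the markup is escaped when the feed is written and comes back unchanged when a feed reader parses it.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML fragment, modelled on the markup in the question.
html_fragment = '<div class="field field-type-text">Location: Berlin</div>'

item = ET.Element("item")
description = ET.SubElement(item, "description")
description.text = html_fragment  # stored as plain text, not as child elements

# On output, ElementTree escapes < and > so the HTML travels as XML text.
serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# A consumer that parses the XML gets the original HTML string back.
decoded = ET.fromstring(serialized).find("description").text
print(decoded == html_fragment)
```

So the &lt; and &gt; you see in the raw feed are expected; they only show up to end users if something escapes the content a second time or the reader fails to decode it.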

I believe you are suffering from this issue:
http://drupal.org/node/666930
It's a core PHP bug that exists in certain versions of PHP, here's the core bug:
http://bugs.php.net/bug.php?id=45996

Related

What's the difference between XHTML and DHTML?

Reading about both separately, they look like the same thing: HTML + XML + JavaScript.
What's the difference between them? Is there any?

XHTML is a W3C standard, a form of HTML that also strictly conforms to XML.
DHTML is a largely deprecated term (short for 'Dynamic HTML') that was introduced when early static web sites started adding client-side scripting to make pages more 'dynamic'. Nobody really talks in terms of DHTML any more (the term was superseded by 'ajax', 'web 2.0', and 'web app').

No, neither one is HTML + XML + Javascript.
XHTML is HTML (but also XML)
It's just a dialect of HTML that conforms to the syntax rules of XML. Javascript is not part of the XHTML specification (or any HTML specification).
DHTML is HTML + Javascript
It stands for Dynamic HTML, and Javascript adds the dynamic part. The HTML part can also be the dialect XHTML.

XHTML is a dialect that is based on the XML language while DHTML is not a dialect or a language but a collection of other technologies
Both were created to provide additional features and interactivity to HTML
DHTML still uses HTML at its core and is plagued with HTML related problems
XHTML is more streamlined and easier to code with because of its conformance to XML
DHTML is already outdated and has been replaced by other technologies
Take a look at this post:
http://www.differencebetween.net/technology/difference-between-dhtml-and-xhtml/

DHTML is Dynamic HTML: the content of the HTML page is dynamic and changes over time, without a developer having to edit the page again after it has been created.
XHTML is Extensible HTML, which means it also uses XML. Put simply, in XML you can define your own tags and use them in your project or file.

ASP.NET RSS Feed giving style sheet error

Hey, I'm wondering why I am receiving the following error in my RSS feed:
"This XML file does not appear to have any style information associated with it. The document tree is shown below."
From a bit of research I've done, this is because I don't have a stylesheet attached. But I have done plenty of RSS feeds before, and normally they pick up the default look and feel, as in the feed below:
http://www.tbray.org/ongoing/ongoing.atom
I am just wondering why this one is giving the above error?

Most likely, the content-type of your feed is not correct, so IE is treating it as raw XML.
What is the URL of your feed? What browser are you using?
the tbray.org URL returns a
Content-Type: application/atom+xml
What is the content-type header returned by your misbehaving feed?
('wget --save-headers ...' may be useful)
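If you'd rather check the header from a script than with wget, here is a rough Python sketch. The set of "feed" media types is my assumption about what browsers and readers usually recognise, not an exhaustive list, and fetch_content_type is a hypothetical helper that needs network access.

```python
from urllib.request import urlopen

# Assumed list of media types that are commonly treated as feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def media_type(content_type_header: str) -> str:
    """Return the bare media type, lower-cased, without parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def looks_like_feed(content_type_header: str) -> bool:
    """True if the header names one of the assumed feed media types."""
    return media_type(content_type_header) in FEED_TYPES


def fetch_content_type(url: str) -> str:
    """Fetch a URL and return its raw Content-Type header (requires network access)."""
    with urlopen(url) as response:
        return response.headers.get("Content-Type", "")


# Offline examples with literal header values:
print(looks_like_feed("application/atom+xml; charset=utf-8"))
print(looks_like_feed("text/xml"))
```

Call fetch_content_type with your feed's URL and pass the result to looks_like_feed to see whether the server is labelling the response as a feed or as generic XML.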

My guess is that you need to make sure you declare your namespaces.
Take Tim Bray's feed and save it locally as test.htm. Bring it up in Firefox and it will show nicely. Now, if you remove a namespace that's being used, like :thr, the content will disappear. If you remove the base namespace, you'll just get plain text.

HTML5 syntax - HTML vs XHTML

Even with HTML5 being the path forward for HTML we get two options as developers: XHTML syntax and HTML syntax. I've been using XHTML as my main doctype for 5 or so years so I'm very comfortable with it.
But my question is: given that non-XML syntax will be allowed, is there any reason to stick with valid XML syntax? Do you gain anything by going with one over the other, besides preference (compatibility, etc.)? Personally I'll feel a little dirty going back to not closing tags, since closing them is second nature to me now, but would I gain something by going back to HTML syntax?
Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and I'm not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml MIME type), etc.?

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.
Nonsense! The HTML5 spec defines how to parse HTML in a way that is relatively easy to implement, and off-the-shelf parsers are being developed that can be easily integrated into tool chains. It's even possible for an HTML5 parser to be integrated into an XML tool chain in place of an XML parser.
But what you need to understand is that in practice, you're most likely using HTML anyway, even if you think you're using XHTML based on the DOCTYPE. If your content is being served as text/html, instead of application/xhtml+xml or another XML MIME type, then your content will be processed as HTML.
With HTML5, you can choose to use HTML-only syntax, meaning that it is only compatible with being served and processed as text/html and is not well-formed XML. Or you can use XHTML-only syntax, meaning that it is well-formed XML but uses XML features that are not compatible with HTML. Or you can write a polyglot document, which is conforming and compatible with both HTML and XHTML processing (in principle, this is similar to writing XHTML 1.0 that conforms to the Appendix C guidelines).
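The difference between the two kinds of processing can be seen in a small Python sketch. Note the assumptions: html.parser is only a lenient tokenizer standing in for "HTML-style" processing (it is not a full HTML5 parser), and the two markup strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML-only syntax: <br> and <p> are never closed.
html_only = "<div><p>First line<br>Second line</div>"
# XML-compatible syntax: every element is closed.
polyglot_style = "<div><p>First line<br />Second line</p></div>"


class TagCollector(HTMLParser):
    """Record start tags in the order a lenient HTML tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def is_well_formed_xml(markup: str) -> bool:
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(html_only)
collector.close()
print(collector.tags)                      # the lenient parser copes with unclosed tags
print(is_well_formed_xml(html_only))       # the XML parser rejects the same input
print(is_well_formed_xml(polyglot_style))  # the closed-tag version is accepted
```

The same bytes are fine under one kind of processing and a fatal error under the other, which is why the MIME type the page is served with matters more than the DOCTYPE.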

I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml mime-type), etc?
As mentioned in a previous answer, text/html gets parsed as HTML and application/xhtml+xml gets parsed as XML. Thus, you should use the syntax that matches the MIME type you use.
If you are now serving text/html but using XHTML syntax, then you should fix your content to use the HTML5 syntax. You may already be close, since HTML5 allows the XMLesque /> empty element syntax for void elements (elements that are always empty, such as img and br).
If you are now using application/xhtml+xml, IE support would be a reason to switch to text/html and the HTML syntax if you care about supporting IE.
Trying to write polyglot documents that are correct HTML5 and XHTML5 (for serving different MIME types to different browsers with the same payload bytes) is harder than it seems at first sight and not worth the trouble.

The HTML5 draft is very clear about which syntax to use:
use HTML syntax when sending pages as text/html
use XHTML syntax when sending pages as application/xhtml+xml
Reference: http://dev.w3.org/html5/spec/Overview.html#authors-using-xhtml

When using XHTML you can mix it with other XML content, f.e. MathML, SVG or your own proprietary format, by just changing namespace at some point. Also, you can embed XHTML inside other XML documents.
(well, actually MathML and SVG can be used in non-XML HTML5 too, but they are special-cased)

You shouldn't use XHTML to serve content on the Web (or any network including Internet Explorer clients); see Sending XHTML as text/html Considered Harmful for the full rationale.

Most of the benefits of XHTML have failed to materialise. While I wouldn't recommend it for new projects, XHTML served as text/html seems to be quite manageable and widespread, as long as you follow the compatibility guidelines. It probably isn't worthwhile changing any significant projects back to the HTML serialisation.

I like XHTML, because it forces me to write a good page. There are many advantages to XHTML, because browsers parse it faster, and you need to make well formed XML rather than just HTML. Also, you need to serve a page with the MIME Type application/xhtml+xml or you don't get any of the advantages of the X. The only problem with XHTML is that it won't display in IE8 and earlier.

The advantage of XHTML syntax is that it is XML. It can be easily parsed, understood and manipulated. The HTML syntax is a lot harder for clients to work with.
But ultimately, it is just a matter of syntax. Both forms are allowed for HTML5.

Update: I guess my true question is: is there a reason to switch from XHTML to HTML syntax? I've been using XHTML for years and not sure if there is a reason to switch back. Browser compatibility (IE was sometimes finicky with the application/xhtml+xml mime-type), etc?
You have to really consider two things. The language you are writing and the language you are sending. The Web is defined by 3 components:
URI
A resource - Markup Language (document)
A protocol - HTTP (tool for managing information space)
You can write a document with an XML syntax on your desktop such as using XHTML. In this specific environment, if you give the extension ".xhtml" to the filename and open it with your local browser, it will be parsed as XML. If you give the extension ".html" to the filename, it will be parsed as HTML. Basically in your authoring tool, it is XML, but this doesn't matter anymore once you process it with a tool.
On the Web, your resource, identified by a URI, will be sent with a specific MIME type; these days, most people use text/html. The MIME type defines how the client (browser, search engine bot, etc.) must process your document. If you use an XML syntax but send the document as text/html, it will be processed by an HTML parser.
To send your documents over the wire as XML, you have to configure your server to send them as application/xhtml+xml. (Note that IE8 and earlier versions do not understand application/xhtml+xml and will offer to save the file instead.)
The HTML 5 abstract model has been designed so that you can almost write it with either an HTML syntax or an XML syntax in text/html. Almost, because even if you write with an XML syntax (closing empty elements, quotes around attributes, etc.), you will run into trouble on complex pages that use scripting and namespaces, due to the way XML parsers and HTML parsers deal with those.
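As an illustration of the server-side mapping, here is a minimal sketch using Python's built-in http.server. This is only a local test server, not how you would configure Apache, nginx, or IIS; the XhtmlHandler class, the serve function, and the port are names and values I made up for the example.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, mapping extensions to MIME types."""

    # Extend the default map: .xhtml goes out as XML, .html as HTML.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xhtml": "application/xhtml+xml",
        ".html": "text/html",
    }


def serve(port: int = 8000) -> None:
    """Start a local test server on the given (assumed) port; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), XhtmlHandler).serve_forever()
```

Calling serve() and requesting the same document once as page.xhtml and once as page.html lets you watch a browser switch between its XML and HTML parsers.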

2019 UPDATE
The W3C's own words about XHTML:
"A newer specification exists that is recommended for new adoption in place of this specification. New implementations should follow the latest version of the HTML specification."
So, you should use HTML 5.*

Images in RSS feed

Whenever I see images in an RSS feed, they are embedded in CDATA, rather than surrounded by tags.
In my feed, I would like the images to show up without doing that.
Whether in the browser, or a feed reader (Bloglines) or through FeedBurner, the following structure does not show images, although it is valid RSS. Does anyone have experience with this?
<item>
<category>Viewbook</category>
<title>Widget</title>
<description>Learn more about our widgets.</description>
<link>http://www.widget.com/Default.aspx</link>
<image>
<url>http://www.widget.com/images/thumb.gif</url>
<title>Widget</title>
<link>http://www.widget.com/Default.aspx</link>
<description>Learn more about our widgets.</description>
</image>
</item>

On Colonol Sponsz' hint, I researched:
There's no image tag for items, only for the channel. So you have to do it via the CDATA tag.
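For what it's worth, here is a small Python sketch of the CDATA approach using xml.dom.minidom from the standard library. The image URL is the one from the question; the alt text and the surrounding sentence are assumptions for the example.

```python
from xml.dom.minidom import Document, parseString

# HTML to embed in the item description (alt text is an assumed value).
html = ('<img src="http://www.widget.com/images/thumb.gif" alt="Widget" /> '
        'Learn more about our widgets.')

doc = Document()
item = doc.createElement("item")
doc.appendChild(item)
description = doc.createElement("description")
item.appendChild(description)

# A CDATA section carries the markup through the XML layer without escaping.
description.appendChild(doc.createCDATASection(html))
xml_text = item.toxml()
print(xml_text)

# A feed reader that parses the item gets the HTML string back as text.
parsed = parseString(xml_text)
node = parsed.getElementsByTagName("description")[0]
recovered = "".join(child.data for child in node.childNodes)
print(recovered == html)
```

Whether the reader then renders the img tag is up to the reader; the CDATA section only guarantees the HTML arrives intact.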

For completeness: In RSS 2.0, you CAN have a single enclosure inside an item, which per the spec. can be for a single image. However I understand that support among feed aggregators varies. More typically this is used for things like podcasts. The RSS 2.0 standard states:
<enclosure> is an optional sub-element of <item>.
It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.
The url must be an http url.
Note that you must include the size of the item, along with the URL and mime type.
However, as others indicated, including the picture(s) in CDATA is much more common.
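A rough Python sketch of building such an enclosure, with the three required attributes, might look like this. The make_enclosure helper is hypothetical, the 2048-byte length is a made-up value, and the http:// check is my strict reading of "must be an http url".

```python
import xml.etree.ElementTree as ET


def make_enclosure(url: str, length_bytes: int, mime_type: str) -> ET.Element:
    """Build an <enclosure> element with its three required attributes."""
    # Assumption: read "must be an http url" literally and require the http scheme.
    if not url.startswith("http://"):
        raise ValueError("enclosure url must be an http url")
    return ET.Element(
        "enclosure",
        {"url": url, "length": str(length_bytes), "type": mime_type},
    )


item = ET.Element("item")
ET.SubElement(item, "title").text = "Widget"
# 2048 is a placeholder; use the real file size in bytes.
item.append(make_enclosure("http://www.widget.com/images/thumb.gif", 2048, "image/gif"))

print(ET.tostring(item, encoding="unicode"))
```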

I believe you can use <media:content ....> items with good support by most rss readers, it is working flawlessly for us on mailchimp (rss to email newsletter).
See http://kb.mailchimp.com/article/how-can-i-format-the-image-content-in-my-rss-to-email-campaigns
EDIT: Here's a live link: https://blog.mailchimp.com/rss-to-email-enhancement-for-publishers/

You can use the media:content element (spec) within item.
Make sure you declare the MRSS (Media RSS) namespace (the xmlns:media attribute, below) for this element, if it is not declared for the whole RSS feed, as it won't validate otherwise. (E.g., out-of-the-box WordPress.)
<media:content
xmlns:media="http://search.yahoo.com/mrss/"
url="http://www.widget.com/images/thumb.gif"
medium="image"
type="image/gif"
width="150"
height="150" />
This may or may not display as you'd like; you'd have to experiment. Embedding in content is in that way simpler, though this route helps with things like MailChimp integration (h/t this answer) or other custom solutions.
An example implementation for WordPress is in my answer here.
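If you generate the feed in code rather than by hand, the namespace declaration can be handled for you. Below is a minimal Python sketch with xml.etree.ElementTree, not the WordPress implementation mentioned above. The title is a placeholder, and I used image/gif as the type because the example file is a .gif.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
# Ask ElementTree to use the conventional "media" prefix for this namespace.
ET.register_namespace("media", MEDIA_NS)

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Widget"  # placeholder title

# Namespaced element names are written as {namespace-uri}localname.
ET.SubElement(
    item,
    f"{{{MEDIA_NS}}}content",
    {
        "url": "http://www.widget.com/images/thumb.gif",
        "medium": "image",
        "type": "image/gif",
        "width": "150",
        "height": "150",
    },
)

feed_xml = ET.tostring(rss, encoding="unicode")
print(feed_xml)
```

Here the xmlns:media declaration ends up on the root rss element rather than on media:content itself, which is equivalent as far as XML namespaces are concerned.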

Use, e.g.:
<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />
Documentation here

It works with a separate tag, as you said. The problem is the specification of version 2.0.
I know there are feed readers that suppress images for bandwidth reasons.
Source: RSS specification 2.0 via Wikipedia

Are there any tools out there to compare the structure of 2 web pages?

I receive HTML pages from our creative team, and then use those to build aspx pages. One challenge I frequently face is getting the HTML I spit out to match theirs exactly. I almost always end up screwing up the nesting of <div>s between my page and the master pages.
Does anyone know of a tool that will help in this situation -- something that will compare 2 pages and output the structural differences? I can't use a standard diff tool, because IDs change from what I receive from creative, text replaces lorem ipsum, etc..

You can use HTMLTidy to convert the HTML to well-formed XML so you can use XML Diff, as Gulzar suggested.
tidy -asxml index.html

If you output XML-compliant HTML, or at least translate your HTML into XML-compliant form, you could then run an XSL transformation over your output to remove the content and the id attributes. Apply the same transformation to their HTML, and then compare.
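The same "strip content and ids, then compare" idea can be sketched without XSL. The Python below is a simplification under stated assumptions: it uses the lenient html.parser tokenizer, drops all attributes and text (not just ids), and the two sample pages are made up.

```python
import difflib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Skeleton(HTMLParser):
    """Collect an indented outline of tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(self.depth - 1, 0)


def skeleton(html: str) -> list:
    parser = Skeleton()
    parser.feed(html)
    parser.close()
    return parser.lines


def structural_diff(theirs: str, mine: str) -> list:
    """Unified diff of the two tag outlines; an empty list means same structure."""
    return list(difflib.unified_diff(skeleton(theirs), skeleton(mine),
                                     "creative", "aspx", lineterm=""))


creative = '<div id="main"><div id="nav"><p>Lorem ipsum</p></div><div id="body"></div></div>'
mine = '<div id="ctl00_main"><div id="ctl00_nav"><p>Real text</p><div id="ctl00_body"></div></div></div>'

print("\n".join(structural_diff(creative, mine)))
```

In the sample, the second page nests the last div one level too deep, and that shows up as a changed line in the outline even though the ids and text differ everywhere.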

I was thinking on lines of XML Diff since HTML can be represented as an XML Document.
The challenge with HTML is that it might not be always well formed. Found one more here showing how to use XMLDiff class.

A copy of my own answer from here.
What about DaisyDiff (Java and PHP versions available)?
Following features are really nice:
Works with badly formed HTML that can be found "in the wild".
The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
In addition to the default visual diff, HTML source can be diffed coherently.
Provides easy to understand descriptions of the changes.
The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.

WinMerge is a good visual diff program.
