RSS Feed validation issue - rss

I am having an issue with my RSS feed that has not been an issue before. You can find the feed file here.
I understand that a .js isn't a standard RSS enclosure, but it hasn't been a problem for feeds since I started using the tag.
I scoured it for ampersands or anything else that might be causing the hangup, but found nothing. Any idea what could be causing the problem?

You have an unclosed tag, <itunes:category text="Star Wars">. You can use an XML validator to look for errors like this.
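If no online validator is handy, a well-formedness check can be sketched with Python's standard library. The feed fragment below is a made-up stand-in for the real file (which is not reproduced here), and `check_well_formed` is a hypothetical helper name. It only checks that the XML parses, not that it is valid RSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment: itunes:category is opened but never closed.
BROKEN_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example</title>
    <itunes:category text="Star Wars">
  </channel>
</rss>"""

# Same fragment with the element self-closed.
FIXED_FEED = BROKEN_FEED.replace('text="Star Wars">', 'text="Star Wars"/>')


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(check_well_formed(BROKEN_FEED))  # a tuple pointing near the mismatched tag
print(check_well_formed(FIXED_FEED))   # None
```

The parser usually reports the position where it noticed the mismatch (the closing tag of the parent), so the real culprit tends to sit a line or two above the reported one.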

Related

Mediawiki markup on RSS

Is it possible to remove the wiki markup from the RSS feed and show only the article content?
I am using various templates such as infoboxes, and when people click the RSS link it shows all the template markup and other code that readers don't care about. I have been trying to find a good tutorial or other help on how to accomplish this.
As Dereckson says, no, it's not possible. Feeds are just an alternate way to consume recent changes.
The ability to consume recent changes in parsed format is essentially the same feature request as visual diffs (HTML diffs). It will be possible at some point with Parsoid.

What formatting can RSS readers reliably interpret?

I'm making a normal RSS feed for my website. I need to include simple HTML formatting in the description, e.g. paragraphs, line breaks, lists, etc. To do this I need to wrap the description content as CDATA.
The issue with this is that when I validate my feed, the content of the CDATA is ignored. So although the feed validates, I don't actually know if everything is OK or not.
How can I find out what markup will likely be read correctly by the various RSS readers?
Can I use whatever markup I would happily put in a website? How about inline styles? Or is it more like designing HTML emails? Thanks
RSS files are XML-formatted plain text; I think that's the only standard you can rely upon.
I think most syndicators only look like they're handling HTML in RSS, because they simply download the linked article when you choose the header.
If you're looking to embed rich content, then you may well be better investigating Atom instead of RSS.
Have a look at this S/O question: Which is better for encoding HTML for RSS?
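As for why the validator ignores the CDATA content: to an XML parser, a CDATA section is just character data, so the HTML inside is opaque text. A minimal sketch with Python's standard library (the `body` string and item layout are made up for illustration) shows that a CDATA-wrapped description and an entity-escaped one come out as the same string:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Made-up description body with simple HTML formatting.
body = "<p>First paragraph</p><ul><li>One</li><li>Two</li></ul>"

# Two ways of carrying the same HTML inside <description>.
cdata_item = f"<item><description><![CDATA[{body}]]></description></item>"
escaped_item = f"<item><description>{escape(body)}</description></item>"

cdata_text = ET.fromstring(cdata_item).findtext("description")
escaped_text = ET.fromstring(escaped_item).findtext("description")

# Both parse back to the original HTML string.
print(cdata_text == escaped_text == body)  # True
```

So the XML layer cannot tell you whether the markup is acceptable. That depends on how each reader treats the resulting HTML string, and the only XML-level constraint on CDATA is that the content must not contain `]]>`.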

Why do Google News feeds have such strange structure?

I'm trying to incorporate a google news feed in my website (Using the built-in SimplePie functionality of WordPress).
However, the default feed gets rendered in a strange table structure. Sure enough, when I inspect the feed XML, I see that Google News has a whole bunch of table HTML as its 'description' element, complete with embedded styles, etc. (see this example), essentially dictating how the feed must be displayed and not allowing for any effective CSS-based customization.
This seems really dumb. Can anyone help explain what is going on, or at least agree with me that this is just a terrible feed architecture?
Feeds often include HTML tags, as many (most?) readers will handle and use them, and that way the RSS provider can have some nice-looking output in the reader, as you've guessed. (I prefer flagging it as CDATA unless it's proper XHTML, as it's not valid XML/RSS otherwise.) It's perhaps not in the original spirit of RSS, but the Google feed is just an extreme example of common practice. As for your problem, does strip_htmltags help (simplepie.org/wiki/reference/simplepie/strip_htmltags)?
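To illustrate the general idea of discarding a feed's layout markup and keeping only the text, here is a rough Python sketch using the standard library. It is not SimplePie's API. `TextOnly`, `description_to_text`, the `BLOCK_TAGS` set and the sample string are all made up for the example.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so adjacent cells/lines don't run together.
BLOCK_TAGS = {"br", "p", "div", "li", "tr", "td", "th", "table"}


class TextOnly(HTMLParser):
    """Collect text nodes and drop every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def description_to_text(html_fragment):
    parser = TextOnly()
    parser.feed(html_fragment)
    parser.close()
    # Collapse the whitespace runs left behind by the removed tags.
    return " ".join("".join(parser.parts).split())


sample = (
    '<table><tr><td style="font-size:85%">'
    '<a href="http://example.com/">Headline</a><br><b>Source</b> &amp; more'
    "</td></tr></table>"
)
print(description_to_text(sample))  # Headline Source & more
```

Once the description is plain text (or a reduced set of tags), your own CSS can style it however you like.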

How to remove script tags?

On my website, people can post content containing tags and use them to mess with the site.
What I would like to know is how to make the browser treat everything in a file I write to as plain text.
I would like there to be no HTML in certain parts of my website!
All of it is in ASP
Check out http://htmlpurifier.org/. It filters HTML input with a white list of acceptable tags, so you don't get anything undesirable - like iframes, or javascript, etc. Assuming I understand your question correctly.
The general term for what you are asking for is "HTML Sanitization" - There is actually a good discussion of this, along with some code on the StackOverflow Blog.

What's the best way to remove (or ignore) script and form tags in HTML?

I have text stored in SQL as HTML. I'm not guaranteed that this data is well-formed, as users can copy/paste from anywhere into the editor control I'm using, or manually edit the HTML that's generated.
The question is: what's the best way of going about removing or somehow ignoring <script/> and <form/> tags so that, when the user's text is displayed elsewhere in the Web Application, it doesn't disrupt the normal operation of the containing page.
I've toyed with the idea of simply doing a "Find and Replace" for <script>/<form> with <div> (obviously taking into account whitespace and closing tags, if they exist). I'm also open to any way to somehow "ignore" certain tags. For all I know, there could be some built-in way of saying (in HTML, CSS, or JavaScript) "for all elements in <div id="MyContent">, treat <form> and <script> as <div>".
Any help or advice would be greatly appreciated!
In terms of sanitising user input, form and script tags are not the only ones that should be cleaned up.
The best way of doing this job depends a little on what tools you are using. Have a look at these questions:
What’s the best method for sanitizing user input with PHP?
Sanitising user input using Python
It depends on which language you're using. In general, I'd recommend using an HTML parser, constructing a small DOM from the snippet, then nuking unwanted elements. There are many good HTML parsers, especially ones designed to handle real-world, messy HTML. Examples include BeautifulSoup (Python), HTMLParser (Java)... And, since the answer came in while I was typing, what Colin said!
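A rough sketch of the parser-based approach, using only Python's standard `html.parser` rather than any of the libraries named above. `ScriptFormFilter` and `strip_script_and_form` are hypothetical names. It drops `<script>` elements with their contents and turns `<form>` into `<div>`, but it is not a complete sanitizer: event-handler attributes, `javascript:` URLs and other tags pass straight through, so a maintained sanitizing library is still the safer choice.

```python
from html import escape
from html.parser import HTMLParser


class ScriptFormFilter(HTMLParser):
    """Re-emit HTML, dropping <script> elements and turning <form> into <div>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.script_depth = 0  # > 0 while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_depth += 1
            return
        if self.script_depth:
            return
        if tag == "form":
            self.out.append("<div>")  # attributes such as action= are dropped
            return
        # Attribute values arrive unescaped, so escape them again on output.
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.script_depth = max(0, self.script_depth - 1)
        elif not self.script_depth:
            self.out.append("</div>" if tag == "form" else f"</{tag}>")

    def handle_data(self, data):
        if not self.script_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.script_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.script_depth:
            self.out.append(f"&#{name};")


def strip_script_and_form(html_fragment):
    parser = ScriptFormFilter()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


dirty = '<p>Hi</p><script>alert("x")</script><form action="/post"><input name="q"></form>'
print(strip_script_and_form(dirty))  # <p>Hi</p><div><input name="q"></div>
```

Comments and doctype declarations are silently dropped in this sketch because the corresponding handlers are left at their do-nothing defaults.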
Don't try and do it yourself - there are far too many tricks for getting bits of script and general nastiness into a page. Use the Microsoft AntiXSS library - version 3.1 has HTML sanitation built in. You probably want the GetSafeHTMLFragment method, which returns a sanitised chunk of HTML. See my previous answer.
Since you're using .Net I would recommend HtmlAgilityPack as it is easy to work with and works well with malformed HTML.
Though the answers suggested were acceptable, I ended up using a good old regular expression to replace begin and end <script> and <form> tags with <div>'s.
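// Note: this pattern strips every tag (not only <script> and <form>), leaving just the text.
txtStore.Text = Regex.Replace(txtStore.Text, "<.*?>", string.Empty);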
I faced the same problem before, but my scenario was a little different. I was adding content to the page from an AJAX request. The content in the AJAX response was HTML, and it also included script tags. I just wanted the HTML without any script, so I removed all script tags from the AJAX response with jQuery.
jquery-remove-script-tags-from-string
