Why do Google News feeds have such strange structure? - rss

I'm trying to incorporate a Google News feed into my website (using the built-in SimplePie functionality of WordPress).
However, the default feed gets rendered in a strange table structure. Sure enough, when I inspect the feed XML, I see that Google News puts a whole bunch of table HTML in its 'description' element, complete with embedded styles, etc. (see this example). This essentially dictates how the feed must be displayed and rules out any effective CSS-based customization.
This seems really dumb. Can anyone help explain what is going on, or at least agree with me that this is just a terrible feed architecture?

Feeds often include HTML tags, as many (most?) readers will handle and use them, and that way the RSS provider can have some nice-looking output in the reader, as you've guessed. (I prefer flagging it as CDATA unless it's proper XHTML, as it's not valid XML/RSS otherwise.) It's perhaps not in the original spirit of RSS, but the Google feed is just an extreme example of common practice. As for your problem, does strip_htmltags help (simplepie.org/wiki/reference/simplepie/strip_htmltags)?
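To make the tag-stripping idea concrete, here is a minimal sketch in Python using only the standard library. It is not SimplePie's strip_htmltags and does not reproduce its behaviour; it removes every tag rather than a configurable list, and the sample description is a made-up fragment shaped like the table markup the question describes.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are dropped.
        self.parts.append(data)


def strip_tags(fragment):
    """Return the fragment's text with all markup removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical description element content (assumption, not real feed data).
description = (
    '<table border="0"><tr><td style="font-family:arial">'
    '<a href="http://example.com/story"><b>Headline</b></a><br>'
    '<font size="-1">Short summary of the story.</font>'
    "</td></tr></table>"
)
plain = strip_tags(description)
print(plain)
```

Once the description is reduced to plain text, the site's own templates and CSS decide how it looks.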

Related

How to style content coming from a Headless CMS?

Last month I read about Headless CMS for the first time, and I just fell in love with that approach.
But right after, I wondered how I could format or style the content if I ever worked with this technology.
By styling the content, I mean individual words within a title, paragraph and so on; not a whole paragraph, which can obviously be done.
It seems to me that it is impossible, since you only get JSON with no HTML whatsoever; just raw text. So it looks like this is the major downside of consuming content through a Headless CMS from a front-end perspective.
Formatting text is fundamental, especially when dealing with large amounts of content. And I am sure I cannot be the first one concerned about not being able to add some bold or italics to a text to emphasize its important parts.
But I can't find any website discussing this topic, just "how to model the content" and whatnot.
Does no one really care about this?
I would appreciate it if anyone could shed some light on this question.
Diving into the Headless CMS that @RicoHancock pointed out, I've learned that it is completely feasible to store rich text and structured content within JSON that can be converted to HTML following some specifications I wasn't aware of.
In the particular case of DatoCMS, they use a specification called dast.
To learn more about it, visit their docs (the following link contains very illustrative code examples):
https://www.datocms.com/docs/structured-text/dast
Paraphrasing their own words:
Structured Text format adheres to the Unified collective, which offers a big ecosystem of utilities to parse, transform, manipulate, convert and serialize content of any kind.
The "Unified collective" is a collective of free and open source packages to work with content as structured data with plugins. In order to create the syntax trees, Unified uses UNIST nodes.
UNIST is a specification, and stands for "UNiversal Syntax Tree".
More info about the UNIST spec and the Unified ecosystem:
https://github.com/syntax-tree/unist
https://unifiedjs.com/learn/guide/introduction-to-unified/
https://unifiedjs.com/learn/guide/using-unified/
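As a rough illustration of how a JSON syntax tree can carry inline formatting and still be turned into HTML, here is a minimal Python sketch. The node shape (a root with paragraph children, spans with a value and optional marks) is a simplified assumption modelled on that kind of tree, not the full dast specification, and the tag mappings are illustrative choices rather than anything DatoCMS ships.

```python
from html import escape

# Assumed, illustrative mappings (not taken from the dast spec).
MARK_TAGS = {"strong": "strong", "emphasis": "em", "code": "code"}
BLOCK_TAGS = {"root": "div", "paragraph": "p"}


def render(node):
    """Recursively convert a simplified syntax-tree node to an HTML string."""
    if node["type"] == "span":
        html = escape(node["value"])
        # Wrap the text once per mark, e.g. ["strong"] -> <strong>...</strong>.
        for mark in node.get("marks", []):
            tag = MARK_TAGS[mark]
            html = f"<{tag}>{html}</{tag}>"
        return html
    tag = BLOCK_TAGS[node["type"]]
    inner = "".join(render(child) for child in node.get("children", []))
    return f"<{tag}>{inner}</{tag}>"


# Hypothetical content as it might arrive from a CMS API.
document = {
    "type": "root",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {"type": "span", "value": "Formatting is "},
                {"type": "span", "value": "fundamental", "marks": ["strong"]},
                {"type": "span", "value": " & possible."},
            ],
        }
    ],
}
html_output = render(document)
print(html_output)
```

The point is that the emphasis lives in the data (the marks list), while the choice of tags and styles stays with the front end.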
TLDR: Markdown.
The company I work for uses DatoCMS. We have a blog, and each blog post is created in our CMS by our copywriting team. DatoCMS allows us (the developers) to create "blocks" that make up the blog post. We have image blocks and content blocks that are rendered by a template file on our frontend. The content blocks support Markdown, so italics, bold, and links work. When our copywriting/marketing team want to make a new blog post, they go to the CMS, create a new post, add a title, slug, and blocks, and then save.
I don't have much experience with other Headless CMSs, so I'm not sure if Markdown will work there, but I don't know why it wouldn't. Markdown is all over the internet. (In fact, this answer is Markdown XD)

Mediawiki markup on RSS

Is it possible to remove the wiki markup from the RSS feed and only show the article content?
I am using different templates, such as infoboxes, and when people click the RSS link it shows all the template markup and all the unnecessary code that people don't really care about. I have been trying to find a good tutorial or some help on how to accomplish this.
Screenshot
As Dereckson says, no, it's not possible. Feeds are just an alternate way to consume recent changes.
The ability to consume recent changes in parsed format is essentially equivalent to the feature request for visual diffs (HTML diffs). It will be possible at some point with Parsoid.

How to get a site's entire RSS history as XML?

Given a website/blog's RSS feed link, is there any way to get that site's entire RSS history (all its blog posts EVER) in a single XML file?
Is this something that is only possible from the other end (i.e. a site publishes its entire blogroll history as RSS)? In which case, how is this achieved?
Thanks!
S
RSS is just another way of expressing the data. It depends entirely on the site. Even if one site provides a way for you to specify how many items you want (which is unlikely), that won't work on other sites.
Technically speaking, formatting the data in RSS is no different than formatting it in HTML. For example, many sites (including this one), need to represent some sequential data (questions in SO's case) on a page in HTML. To do this, the site will iterate through some data source (like a database), and output HTML so your web browser can render it, until it hits some limit. Knowing that limit is impossible, as it depends on the site. This is exactly what RSS does: it iterates through a data source, spitting out XML as it goes along. Again, knowing the limit is not possible.
Is this something that is only possible from the other end ...? In which case, how is this achieved?
If you can change how your site generates the RSS, simply remove the limit. I know this is vague, but it really depends on the implementation. There are dozens of RSS implementations, all different, and all behaving differently.
So my point is, nothing will work universally, you have to change the site itself to modify that behavior.
You are right there. The site has to publish its entire history, otherwise you can't get it. Doing it on the server side, if you have access to the database, is quite easy. Just dump all the rows as XML. It actually takes effort to filter and limit the XML. How can you do it on blogging platforms? You could use plugins that allow you to do this.
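The "dump all the rows as XML" idea can be sketched as follows in Python with the standard library. The posts list stands in for a database query, and the channel title, URLs and the limit parameter are all hypothetical; a real feed generator would look different on every platform.

```python
import xml.etree.ElementTree as ET


def build_feed(posts, limit=None):
    """Serialize posts as RSS 2.0; limit=None means the entire history."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example blog"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "All posts"
    # The cap most sites apply is just a slice like this one.
    selected = posts if limit is None else posts[:limit]
    for post in selected:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
    return ET.tostring(rss, encoding="unicode")


# Stand-in for rows fetched from the site's database (assumption).
posts = [
    {"title": f"Post {n}", "link": f"http://example.com/{n}"}
    for n in range(1, 51)
]
full_feed = build_feed(posts)               # every post ever published
recent_feed = build_feed(posts, limit=10)   # what a typical feed exposes
```

Seen this way, the limit is a server-side decision, which is why a client cannot ask for more items than the site chooses to emit.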

What formatting can RSS readers reliably interpret?

I'm making a normal RSS feed for my website. I need to include simple HTML formatting in the description, e.g. paragraphs, line breaks, lists, etc. To do this I need to wrap the description content as CDATA.
The issue with this is that when I validate my feed, the content of the CDATA is ignored. So although the feed validates, I don't actually know if everything is OK or not.
How can I find out what markup will likely be read correctly by the various RSS readers?
Can I use whatever markup I would happily put in a website? How about inline styles? Or is it more like designing HTML emails? Thanks
RSS files are XML-formatted plain text; I think that's the only standard you can rely upon.
I think most syndicators only appear to handle HTML in RSS, because they simply download the linked article when you select the headline.
If you're looking to embed rich content, then you may well be better off investigating Atom instead of RSS.
Have a look at this SO question: Which is better for encoding HTML for RSS?
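The reason a validator has nothing to say about the markup inside CDATA can be shown with a small Python sketch (standard library only; the sample body and the cdata helper are illustrative assumptions). To an XML parser, a CDATA section and entity-escaped text both produce the same character data, so the HTML inside is never parsed as markup at the feed level.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical description body with simple HTML formatting.
body = "<p>First paragraph</p><ul><li>One</li><li>Two</li></ul>"


def cdata(text):
    """Wrap text in CDATA; a literal "]]>" is split across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


as_cdata = f"<description>{cdata(body)}</description>"
as_escaped = f"<description>{escape(body)}</description>"

# Both encodings parse back to the identical string.
text_from_cdata = ET.fromstring(as_cdata).text
text_from_escaped = ET.fromstring(as_escaped).text
print(text_from_cdata == text_from_escaped == body)
```

Whether a given reader then renders, filters or strips that HTML is a separate matter that only testing against the readers themselves can settle.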

How can I apply my CSS stylesheet to an RSS feed

On my blog I use some CSS classes which are defined in my stylesheet, but in RSS readers those styles don't show up. I had been searching for class="whatever" and replacing with style="something: something;". But this means whenever I modify my CSS I need to modify my RSS-generating code too, and it doesn't work for a tag which belongs to multiple classes (i.e. class="snapshot accent"). Is there any way to point to my stylesheet from my feed?
The popular RSS readers WILL NOT bother downloading a style sheet, even if you provide one and link to it using <?xml-stylesheet?>.
Many RSS readers simply strip all inline style attributes from your tags. From testing today, I discovered that Outlook 2007 seems to strip out all styles, for example, even if they are inline.
Good RSS readers allow a limited set of inline style attributes. See, for example, this article at Bloglines about what CSS they won't strip. From experimentation, Google Reader seems to pass through certain styles unharmed.
The philosophy of RSS is indeed that the reader is responsible for presentation. Many people think that RSS should be plain text and that CSS in RSS feeds is inappropriate. It's probably not appropriate to impose a different font on your RSS feeds. However, certain types of content (for example, images floated on the left, with captions positioned carefully) require a minimal amount of styling in order to maintain their semantic meaning.
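The manual search-and-replace described in the question (class attributes swapped for inline styles) can be driven from a single table so that multi-class tags work too. This is only a sketch: CLASS_STYLES is a hypothetical table that would have to be kept in sync with the real stylesheet, the regex assumes double-quoted class attributes, and it does not merge with a style attribute that is already present.

```python
import re

# Hypothetical class-to-declaration table mirroring the site's stylesheet.
CLASS_STYLES = {
    "snapshot": "float: left; margin-right: 1em;",
    "accent": "font-style: italic;",
}

CLASS_ATTR = re.compile(r'\s+class="([^"]*)"')


def inline_styles(html):
    """Replace each class attribute with the combined inline declarations."""

    def replace(match):
        names = match.group(1).split()
        declarations = [CLASS_STYLES[n] for n in names if n in CLASS_STYLES]
        if not declarations:
            return ""  # unknown classes are simply dropped
        return ' style="' + " ".join(declarations) + '"'

    return CLASS_ATTR.sub(replace, html)


feed_html = inline_styles('<img class="snapshot accent" src="a.png">')
print(feed_html)
```

Readers that strip inline styles will still discard the result, so this only helps with the readers that pass a limited set of styles through.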
The point of RSS is to be display agnostic. You should not be putting style attributes on your feed.
I found this blog post that describes how to add style to your RSS feed.
Because RSS is (supposed to be) XML, you can use XML stylesheets.
http://www.w3.org/TR/xml-stylesheet/
The purpose of an RSS feed is to allow the easy transmission of content to places outside your site. The whole idea is that the content within the feed is format-free, so that it can be read by any piece of software. The program that is reading your feed is in charge of how to present it visually. For example, if you had a website that read RSS, you would want to parse the feed into HTML and style it that way. However, if you were building a desktop application to read the feed, you would implement the formatting quite differently.
