I want to store the items of an RSS feed in a database. I created a table 'items' where, among other fields (author, guid, ...), I have a field for the body. How big should that be? Is there a limit on the size of the body of an RSS item? If not, what is the average size?
EDIT: as pointed out by random, "Maximum length of RSS description item" already answers my first two questions. The last one remains open: I still wonder if anyone has run tests on the average length of the content of the description tag in an RSS feed.
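One way to get a number for your own feeds is to measure them. The sketch below is only an illustration: it assumes an RSS 2.0 document already loaded as a string, and the function name description_lengths and the sample feed are made up for the example. It counts characters; if the column size is in bytes, measure len(text.encode("utf-8")) instead.

```python
import xml.etree.ElementTree as ET
from statistics import mean


def description_lengths(rss_xml: str) -> list[int]:
    """Return the character length of each item's <description> (0 if missing)."""
    root = ET.fromstring(rss_xml)
    lengths = []
    for item in root.iter("item"):
        text = item.findtext("description") or ""
        lengths.append(len(text))
    return lengths


# Hypothetical sample feed, only here to make the sketch runnable.
SAMPLE = """<rss version="2.0"><channel><title>t</title>
<item><title>a</title><description>short body</description></item>
<item><title>b</title><description><![CDATA[<p>longer body</p>]]></description></item>
<item><title>c</title></item>
</channel></rss>"""

lengths = description_lengths(SAMPLE)
print("lengths:", lengths)
print("average:", mean(lengths), "max:", max(lengths))
```

Running this over the feeds you actually plan to store gives a more useful figure than a global average, since full-text feeds and summary-only feeds differ widely.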
So, I'm working on trying to improve a Mailchimp RSS campaign that was created by one of my coworkers.
The email that gets sent out is a list of posts from different categories in our website.
So to do this the RSS campaign is made up of different FEEDBLOCKS – one FEEDBLOCK for each kind of category on the website. An example of one of the FEEDBLOCKs looks like this (which is pretty standard and basic, I guess):
*|FEEDBLOCK:http://website.com/specific-category/feed|*
*|FEEDITEMS:[$count=2]|*
*|FEEDITEM:TITLE|*
*|END:FEEDITEMS|*
*|END:FEEDBLOCK|*
The thing I want to fix is for the FEEDBLOCK to only show new posts from the past 7 days (the Mailchimp campaign goes out once a week). At the moment, we do this manually by changing the number in the *|FEEDITEMS:[$count=2]|* field. We have to manually count the number of new posts on the website each week and input the new count number so the correct number of new posts are displayed on the email.
I'm pretty new to using RSS feeds and Mailchimp but it seems to me from knowing some basic coding that there should be a way to do this automatically, rather than having to manually change the count number for every FEEDBLOCK before we send out the email to our subscribers.
Can any of you give me any advice on how I can change the code we're using to update the count number automatically?
Thanks in advance!
I'm not entirely sure this is how FEEDBLOCK works, but with the standard RSS merge tags available in Mailchimp, the schedule of the email determines how many items are pulled in. For example, I have an RSS feed scheduled for once a week; with no item-count parameter in the code, Mailchimp pulls in only the articles posted to my feed since the last time the campaign was sent, i.e. in one week it can be anywhere from 3 to 10.
I am trying to do the same thing. FEEDBLOCKS and FEEDITEMS do not act like RSSFEED and RSSITEMS. If you use FEEDBLOCKS the RSS dates entered in the "RSS Feed" step are not used!
This is what I received from MailChimp: "Any FEEDBLOCK tag in a sense will be its own feed so any dates set for the RSS campaign are ignored. FEEDBLOCK's are usually used for non-rss campaigns to display something from another feed. You can only control how many post are shown when using the FEEDBLOCK tag."
Unfortunately I, like you, have only the manual solution! I am not a coder, but I think the RSS feed itself has to be changed. That would mean building an RSS feed that limits entries to a certain date range, i.e., the previous 7 days ending with the previous full day.
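For what it's worth, here is a rough Python sketch of that idea: take the category feed's XML and drop every item whose pubDate is outside the previous 7 full days. It assumes the feed is RSS 2.0 with RFC 822 pubDate values and that you have somewhere to host the filtered copy; the function name keep_previous_days and the sample feed are invented for illustration, and nothing here is a Mailchimp feature.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def keep_previous_days(rss_xml: str, now: datetime, days: int = 7) -> str:
    """Keep only items published in the `days` full days before today (UTC)."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)  # start of today
    start = end - timedelta(days=days)
    for item in list(channel.findall("item")):
        pub = item.findtext("pubDate")
        try:
            published = parsedate_to_datetime(pub) if pub else None
        except (TypeError, ValueError):
            published = None  # unparseable dates are dropped in this sketch
        if published is not None and published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        if published is None or not (start <= published < end):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Hypothetical feed used to exercise the filter.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Category</title>
<item><title>last week</title><pubDate>Tue, 14 Jan 2025 10:00:00 +0000</pubDate></item>
<item><title>too old</title><pubDate>Wed, 01 Jan 2025 10:00:00 +0000</pubDate></item>
<item><title>today</title><pubDate>Wed, 15 Jan 2025 09:00:00 +0000</pubDate></item>
</channel></rss>"""

NOW = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
filtered = keep_previous_days(SAMPLE_FEED, NOW)
kept_titles = [i.findtext("title") for i in ET.fromstring(filtered).iter("item")]
print(kept_titles)
```

The FEEDBLOCK would then point at the filtered feed, with $count set high enough that it never cuts off a busy week.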
I am creating an RSS feed for a website I am working on. I read about RSS and it is pretty simple: It is a specially formatted XML file.
However, I could not find information about the following two questions
Is there a limit to the number of entries/items in an RSS feed? Should I have 10 entries only? Or can I go up to 100 for example? What if I have more entries than 100 per day? What can I do?
Can I have pages with each page displaying 10? So for example, www.example.com/rss/ will give page 1, and www.example.com/rss/2 will give page 2 of RSS, and www.example.com/rss/3 will give page 3, and so on. The reason for this question is the following: If I am restricted to only 10 RSS items, what happens if I have 50 items updated to the site since my last RSS update?
Thanks.
Is there a limit to the number of entries/items in an RSS feed? Should I have 10 entries only? Or can I go up to 100 for example? What if I have more entries than 100 per day? What can I do?
Depends on the version of RSS used. If you're using the UserLand RSS 0.91 spec, for example, the number of items in a channel should be limited to 15, according to the RSS 2.0 spec. If you think of RSS as a format for periodic updates this makes sense, though it can be limiting.
If you look at the jekyll-feed RubyGem, which uses Atom and is deployed on GitHub Pages sites, the number of posts is limited to 10. But you can do whatever you want as long as the spec permits.
For example, if you have more than 100 entries per day you're obviously going to want to increase the number and RSS 2.0 (and maybe Atom?) is fine with that. To signify to the RSS user agent the content is updated with a high frequency you can use the Syndication module to output time:
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
(Don't forget to add the XML namespace when using.)
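If the feed is generated from code, the namespace declaration can be handled by the XML library. The following is a small sketch using Python's standard xml.etree.ElementTree; the build_channel helper and the example.com values are placeholders, not part of any spec.

```python
import xml.etree.ElementTree as ET

SY_NS = "http://purl.org/rss/1.0/modules/syndication/"
# Ask ElementTree to serialize this namespace with the conventional "sy" prefix.
ET.register_namespace("sy", SY_NS)


def build_channel(title: str, link: str, description: str) -> ET.Element:
    """Build a minimal RSS 2.0 tree that advertises an hourly update schedule."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, f"{{{SY_NS}}}updatePeriod").text = "hourly"
    ET.SubElement(channel, f"{{{SY_NS}}}updateFrequency").text = "1"
    return rss


xml_text = ET.tostring(
    build_channel("Example", "https://www.example.com/", "Placeholder feed"),
    encoding="unicode",
)
print(xml_text)
```

The serialized root element carries the xmlns:sy declaration, so the sy: elements resolve correctly for any namespace-aware reader.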
Just keep in mind that if you're using the Content module to output the full text of an article in a CDATA section, you're probably going to want to truncate some text if the sections are large.
Can I have pages with each page displaying 10?
Yes, though if you do you should probably have an RSS for your RSS to indicate the pages and use guid without a URL to ensure they're properly identified and deduped by the feed reader. This is also going to depend on the RSS user agent and what your desired results are.
And just for fun here are some modules for RSS 2.0 which make it very extensible:
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
Only use what you need and drop namespaces you aren't using.
Another option is to look at Atom, which is another format read by all modern readers transparently (no one will notice this is Atom or RSS). Atom has pagination as per this RFC.
Generally, though, pagination is not widely used to say the least... so you probably don't need to bother too much!
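If you do want to support it, here is a sketch of the consumer side, assuming the RFC in question is RFC 5005 (Feed Paging and Archiving), which links pages with rel="next" and rel="previous" on feed-level link elements. The function name and the example.com URLs are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def next_page_url(atom_xml: str) -> Optional[str]:
    """Return the href of the feed-level <link rel="next">, or None on the last page."""
    root = ET.fromstring(atom_xml)
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


# Hypothetical first page of a paged feed.
PAGE_1 = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="https://www.example.com/feed"/>
  <link rel="next" href="https://www.example.com/feed?page=2"/>
  <entry><id>urn:example:1</id><title>First</title></entry>
</feed>"""

print(next_page_url(PAGE_1))
```

A reader that understands paging keeps following the next link until the function returns None; readers that do not simply see the first page.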
Whether you pick RSS or Atom, it's useless to make your feeds "too large". Stick to a small-ish number of items, between 10 and 20, depending on how often you publish items.
Also think about implementing PubSubHubbub which is a fairly simple publish/subscribe protocol which will let anyone interested in your content know that a given feed has updated.
RSS 2.0 Specification:
In RSS 0.91, various elements are restricted to 500 or 100 characters.
There can be no more than 15 <item>s in a 0.91 <channel>. There are no
string-length or XML-level limits in RSS 0.92 and greater. Processors
may impose their own limits, and generators may have preferences that
say no more than a certain number of <item>s can appear in a channel,
or that strings are limited in length.
In RSS 2.0, a provision is made for linking a channel to its
identifier in a cataloging system, using the channel-level category
feature, described above. For example, to link a channel to its
Syndic8 identifier, include a category element as a sub-element of
<channel>, with domain "Syndic8", and value the identifier for your
channel in the Syndic8 database. The appropriate category element for
Scripting News would be <category domain="Syndic8">1765</category>.
An RSS file is primarily used to tell subscribers when there is new content on your site. You would generally set the number of <item>s in your feed to reasonably accommodate the number of pages that change on a regular basis over a certain period.
If you want search engines to know about your pages, then another type of XML file is a better fit: a sitemap.
By RSS 2.0 specification, link, title and description are required elements. In reality though, any of those three can be missing. I read data from multiple feeds and I want to display them in a similar manner, how can I consolidate the data?
To simplify Really Simple Syndication, you can build these fields in the resulting object/table:
link - There are several elements that can contain a link. Other than <link> itself, there is <guid>. If isPermaLink="true" (the default when the attribute is omitted), it is a good link. If isPermaLink="false", the value can still be a link, but it may lead nowhere. There can also be <enclosure> (one or more); however, these link to files or streams, not webpages.
title - If there is no <title>, you can take a piece of <description>, but remove any HTML from it first.
description - If <description> isn't present, leave it empty.
guid - If it's not present, select the first available combination from these:
link-<pubDate>, link-title, link, title-<pubDate>, title, <pubDate>
Be aware that a guid generated this way is not guaranteed to be unique.
pubDate - if you must show some date and it's not present, generate one upon saving.
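The rules above can be sketched as a single normalizing function. This is only an illustration of the fallbacks described in this answer, written in Python against RSS 2.0 item elements; the name normalize_item, the 80-character title cut-off, and the regex-based HTML stripping are my own assumptions, and a real HTML parser would be more robust.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

TAG_RE = re.compile(r"<[^>]+>")  # crude tag stripper, good enough for a sketch


def normalize_item(item: ET.Element, now: datetime) -> dict:
    """Fill in link, title, description, guid and pubDate using the fallbacks above."""
    def text(tag: str) -> str:
        return (item.findtext(tag) or "").strip()

    description = text("description")
    guid, pub_date = text("guid"), text("pubDate")
    guid_el = item.find("guid")
    # isPermaLink defaults to true when the attribute is absent.
    guid_is_link = guid_el is not None and guid_el.get("isPermaLink", "true").lower() == "true"
    link = text("link") or (guid if guid_is_link else "")
    title = text("title") or TAG_RE.sub("", description).strip()[:80]
    if not guid:
        combos = [(link, pub_date), (link, title), (link,),
                  (title, pub_date), (title,), (pub_date,)]
        # First combination whose parts are all present wins.
        guid = next(("-".join(c) for c in combos if all(c)), "")
    return {"link": link, "title": title, "description": description,
            "guid": guid, "pubDate": pub_date or format_datetime(now)}


# Hypothetical items, each missing some of the "required" elements.
SAMPLE = """<rss version="2.0"><channel>
<item><guid>https://example.com/a</guid>
  <description><![CDATA[<p>Hello <b>world</b></p>]]></description></item>
<item><title>T</title><link>https://example.com/b</link></item>
</channel></rss>"""

NOW = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
items = [normalize_item(i, NOW) for i in ET.fromstring(SAMPLE).iter("item")]
for row in items:
    print(row)
```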
I'm generating an RSS feed from my blog. I'm using node-rss. When I make a minor edit to one of the posts listed in the feed, Google Reader lists the item as unread, even though I marked it as read a week ago.
My RSS feed contains title, description, link, guid and pubDate elements for each item. For guid, I'm just using the canonical URL to the item. The pubDate element is the date/time that the entry was first published, rather than the time of the last edit.
The feed itself contains lastBuildDate, which is set to the time that the RSS feed was generated (i.e. when it was requested).
As far as I can tell, there's nothing in the RSS feed that flags the item as being changed. So why does Google Reader think that the item has been updated, and why does it show it as unread again?
Does it look at the content (which has changed)? If so, can I do something in the RSS feed to mark this as a minor update, thus preventing Google Reader from showing it as unread?
If, in Google Reader, you mouse over the date in the top-right of each post, you'll see that it has "Received" and "Published" dates.
"Received" appears to be when the Google Reader server saw the new content, whereas "Published" comes from the feed itself.
Google Reader appears to use the "Received" date to decide whether something is new.
So, to get the correct behaviour:
Don't put anything in the feed that's older than (say) 6 months.
Limit the feed XML to the most-recent 10 or so items.
Of course, the second could imply the first...
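The selection logic is the same whatever generates the feed. The question uses node-rss, so treat the following Python as a language-neutral sketch: it assumes each post is a dict with a timezone-aware "published" datetime, and the names select_feed_entries, max_items and max_age_days are made up for the example (180 days stands in for "say, 6 months").

```python
from datetime import datetime, timedelta, timezone


def select_feed_entries(posts, now, max_items=10, max_age_days=180):
    """Return the newest `max_items` posts that are at most `max_age_days` old."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [p for p in posts if p["published"] >= cutoff]
    fresh.sort(key=lambda p: p["published"], reverse=True)  # newest first
    return fresh[:max_items]


# Hypothetical posts to exercise the function.
NOW = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
POSTS = [
    {"slug": "old", "published": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"slug": "a", "published": datetime(2025, 1, 10, tzinfo=timezone.utc)},
    {"slug": "b", "published": datetime(2025, 1, 12, tzinfo=timezone.utc)},
    {"slug": "c", "published": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]

selected = [p["slug"] for p in select_feed_entries(POSTS, NOW, max_items=2)]
print(selected)
```

An edited post that has already aged out of this window never reappears in the feed, so the reader has nothing to mark as newly received.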
What is the correct response an RSS client should have when it encounters a feed that has multiple items with the same guid/identifier?
Currently in my application, any items that use an existing guid won't be cached or displayed because it believes it already has that item.
In this example feed a lot of items share this id:
tag:blizzard.com,2010-10-22:diablo3:feed:en-us:1
According to w3 when there are duplicate entries in an RSS feed:
Atom Processors MAY choose to display all of them or some subset of them. One typical behavior would be to display only the entry with the latest atom:updated timestamp.
I would go with the spec and display only the entry with the latest updated timestamp. Don't forget to send an email to Blizzard support and have them get their RSS validated - just don't threaten to keep them out of the next raid.
Take care.
I think your app is doing it right. Don't get fancy. If you've already seen an item with that guid, you don't present it a second time. You should contact the webmaster for the feed if possible and alert them to the problem.
Does each item have a unique URL? If so, fall back to using the URL.
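Both suggestions in this thread (keep the entry with the latest updated timestamp, or key on the URL when ids collide) fit in one small helper. This is a sketch in Python, assuming entries have already been parsed into dicts with "id", "link" and a comparable "updated" value; dedupe_latest and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def dedupe_latest(entries, key=lambda e: e["id"]):
    """For each key, keep only the entry with the latest `updated` timestamp."""
    latest = {}
    for entry in entries:
        k = key(entry)
        kept = latest.get(k)
        if kept is None or entry["updated"] > kept["updated"]:
            latest[k] = entry
    return list(latest.values())


def ts(day: int) -> datetime:
    return datetime(2010, 10, day, tzinfo=timezone.utc)


# Hypothetical entries: the first two share an id but have distinct links.
ENTRIES = [
    {"id": "tag:example.com,2010:feed:1", "link": "https://example.com/1", "updated": ts(20)},
    {"id": "tag:example.com,2010:feed:1", "link": "https://example.com/2", "updated": ts(22)},
    {"id": "tag:example.com,2010:feed:3", "link": "https://example.com/3", "updated": ts(21)},
]

by_id = dedupe_latest(ENTRIES)                              # spec-style behaviour
by_link = dedupe_latest(ENTRIES, key=lambda e: e["link"])   # URL fallback
print(len(by_id), len(by_link))
```

Keying on the id follows the spec's "typical behavior"; keying on the link is the workaround for a feed that reuses one id for genuinely different items.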