When implementing an RSS feed, how do you handle its update frequency? - rss

I'm writing an RSS feed. Let's say it's for a list of entries as in a blog.
How do I handle updating the feed? Let's assume that the feed always displays the last 10 entries.
If someone subscribes now, they'll get the last 10 entries (1..10). If 2 new articles are then published, what will their feed reader do? I will then be returning articles (3..12).
Do I have to do any special handling to start from a certain article in the feed, or can I just always return the last 10?

Returning the last n articles will be fine. Because you assign a unique identifier to each article (you do, right?) the feed reader can easily keep track of what it has already seen or not.
The feed reader will probably watch to see how often new articles appear, to help determine how often it checks for new articles.
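To illustrate what happens on the reader's side, here is a minimal sketch of the bookkeeping a feed reader might do. It is not taken from any particular reader: the function name `new_items`, the dictionary layout, and the `post-N` identifiers are all made up for the example.

```python
def new_items(seen_guids, feed_items):
    """Return items whose guid has not been seen yet, and record them as seen."""
    fresh = [item for item in feed_items if item["guid"] not in seen_guids]
    seen_guids.update(item["guid"] for item in fresh)
    return fresh


# Hypothetical data: each poll returns the 10 newest entries of the blog.
first_poll = [{"guid": f"post-{n}"} for n in range(1, 11)]   # entries 1..10
second_poll = [{"guid": f"post-{n}"} for n in range(3, 13)]  # entries 3..12

seen = set()
print(len(new_items(seen, first_poll)))                          # 10
print([item["guid"] for item in new_items(seen, second_poll)])   # ['post-11', 'post-12']
```

The overlap between the two polls is harmless because only the identifiers the reader has not recorded yet are treated as new.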

Related

Automatically including posts from the last seven days inside a FEEDBLOCK for a MailChimp RSS Campaign

So, I'm working on trying to improve a Mailchimp RSS campaign that was created by one of my coworkers.
The email that gets sent out is a list of posts from different categories in our website.
So to do this the RSS campaign is made up of different FEEDBLOCKS – one FEEDBLOCK for each kind of category on the website. An example of one of the FEEDBLOCKs looks like this (which is pretty standard and basic, I guess):
*|FEEDBLOCK:http://website.com/specific-category/feed|*
*|FEEDITEMS:[$count=2]|*
*|FEEDITEM:TITLE|*
*|END:FEEDITEMS|*
*|END:FEEDBLOCK|*
The thing I want to fix is for the FEEDBLOCK to only show new posts from the past 7 days (the Mailchimp campaign goes out once a week). At the moment, we do this manually by changing the number in the *|FEEDITEMS:[$count=2]|* field. We have to manually count the number of new posts on the website each week and input the new count number so the correct number of new posts are displayed on the email.
I'm pretty new to using RSS feeds and Mailchimp but it seems to me from knowing some basic coding that there should be a way to do this automatically, rather than having to manually change the count number for every FEEDBLOCK before we send out the email to our subscribers.
Can any of you give me any advice on how I can change the code we're using to update the count number automatically?
Thanks in advance!
I'm not entirely sure this is how FEEDBLOCK works, but with the standard RSS merge tags available in Mailchimp, the schedule of the email determines how many items are pulled in. For example, I have an RSS feed scheduled for once a week. With no item-count parameter in the code, Mailchimp pulls in only the articles posted to my feed since the last time the campaign was sent, i.e. in one week it can be anywhere from 3 to 10.
I am trying to do the same thing. FEEDBLOCKS and FEEDITEMS do not act like RSSFEED and RSSITEMS. If you use FEEDBLOCKS the RSS dates entered in the "RSS Feed" step are not used!
This is what I received from MailChimp: "Any FEEDBLOCK tag in a sense will be its own feed so any dates set for the RSS campaign are ignored. FEEDBLOCK's are usually used for non-rss campaigns to display something from another feed. You can only control how many post are shown when using the FEEDBLOCK tag."
Unfortunately I, like you, have only the manual solution! I am not a coder, but I think the RSS feed itself has to be changed. That would mean building an RSS feed that limits entries to a certain date range, i.e. the previous 7 days ending with the previous full day.
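For anyone who can run a small script in front of the feed, the date-limited feed suggested above could look roughly like this. It is only a sketch under some assumptions: the function names `parse_pub_date` and `keep_recent_items` are invented here, the feed is plain RSS 2.0 with a pubDate on each item, and the window is a simple rolling 7 days rather than one aligned to full days. Whether a FEEDBLOCK behaves as wanted when pointed at such a feed is not something this sketch verifies.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def parse_pub_date(text):
    """Parse an RSS pubDate string; return None if it is missing or malformed."""
    if not text:
        return None
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None
    # Assumption of this sketch: dates without a usable zone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def keep_recent_items(rss_xml, now, days=7):
    """Return the RSS document with only items published within `days` of `now`."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    cutoff = now - timedelta(days=days)
    for item in channel.findall("item"):  # findall returns a list, so removal is safe
        published = parse_pub_date(item.findtext("pubDate"))
        # Items with no usable date are dropped; that is a choice, not a rule.
        if published is None or published < cutoff:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Old post</title><pubDate>Mon, 01 Jan 2024 09:00:00 +0000</pubDate></item>
<item><title>New post</title><pubDate>Mon, 15 Jan 2024 09:00:00 +0000</pubDate></item>
</channel></rss>"""

filtered = keep_recent_items(SAMPLE, now=datetime(2024, 1, 16, tzinfo=timezone.utc))
print("New post" in filtered, "Old post" in filtered)
```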

How to detect updates in podcast feeds?

I have a large set of podcast feed URLs which I'm periodically polling to check for updates. I'm really struggling to find a robust way to detect if a feed has changed that doesn't have any false positives. I'd like to be able to detect not just if there is a new episode, but also if an existing episode was updated.
RSS and Atom feeds provide pubDate, lastBuildDate or updated elements. However, I'm finding these frequently misused so that the feed is actually inserting the current date time into these fields each request. This makes them difficult to rely on to detect changes.
My next thought was to strip all date information from the podcasts, then MD5 hash the feed contents. I can then compare the feed hashes to detect changes to the feeds.
This seems to work for about 90% of the cases. However, there are still hundreds of podcasts that insert dynamic data into their feeds.
One podcast has the following as their podcast cover art:
http://erikglassman.hipcast.com/albumart/1000.1439649026.jpg
Where 1439649026 is what I assume is a timestamp. This second number changes with each request of their feed.
This is starting to seem like a losing battle. If I can't reliably trust the date fields of a podcast feed, and if some percentage of podcasts insert dynamic data into their feed text, how can I reliably detect changes to a feed in a robust way?
Everything you say is true, so it's not a good idea to try to detect changes at the feed level. Instead, look for them at the item level.
That generally works. If it doesn't, the feed can't be used by anyone, so the source of the feed is likely to have fixed the problem. That's why I think it works so well.
I've been writing feed readers for as long as they have existed. My current product is called River4. It's available as open source under the MIT License, so you can use it as example code for this and other issues.
This is where it checks if an item is new:
https://github.com/scripting/river4/blob/master/river4.js#L1411
That might move around as the code changes, so look for a routine called getItemGuid. It shows you how to get a value that uniquely identifies the item. I use this code for my podcatcher, http://podcatch.com/, and it seems to catch the new items, and doesn't get false positives.
Hope this helps! :-)
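As a rough illustration of item-level checking, here is a sketch in Python. It is not River4's code and does not reproduce getItemGuid. The fallback order (guid, then link, then title), the fields chosen for the fingerprint, and the names `item_key`, `item_fingerprint` and `diff_feed` are all assumptions made for the example. It handles plain RSS 2.0 only, not namespaced Atom.

```python
import hashlib
import xml.etree.ElementTree as ET


def item_key(item):
    """Pick an identifier for the item: guid if present, else link, else title."""
    for tag in ("guid", "link", "title"):
        value = item.findtext(tag)
        if value and value.strip():
            return value.strip()
    return None


def item_fingerprint(item):
    """Hash only fields that describe the episode, ignoring feed-level noise."""
    enclosure = item.find("enclosure")
    parts = [
        item.findtext("title") or "",
        item.findtext("description") or "",
        enclosure.get("url", "") if enclosure is not None else "",
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def diff_feed(rss_xml, known):
    """Compare a feed against `known` ({key: fingerprint}).

    Returns (new_keys, updated_keys) and updates `known` in place.
    """
    new, updated = [], []
    for item in ET.fromstring(rss_xml).iter("item"):
        key = item_key(item)
        if key is None:
            continue  # nothing usable to identify this item
        fingerprint = item_fingerprint(item)
        if key not in known:
            new.append(key)
        elif known[key] != fingerprint:
            updated.append(key)
        known[key] = fingerprint
    return new, updated
```

Because dates and channel-level elements such as cover art never enter the fingerprint, a timestamp that changes on every request does not register as a change, while an edited title or description does.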

What causes Google Reader to think an item is updated in an RSS feed?

I'm generating an RSS feed from my blog. I'm using node-rss. When I make a minor edit to one of the posts listed in the feed, Google Reader lists the item as unread, even though I marked it as read a week ago.
My RSS feed contains title, description, link, guid and pubDate elements for each item. For guid, I'm just using the canonical URL to the item. The pubDate element is the date/time that the entry was first published, rather than the time of the last edit.
The feed itself contains lastBuildDate, which is set to the time that the RSS feed was generated (i.e. when it was requested).
As far as I can tell, there's nothing in the RSS feed that flags the item as being changed. So why does Google Reader think that the item has been updated, and why does it show it as unread again?
Does it look at the content (which has changed)? If so, can I do something in the RSS feed to mark this as a minor update, thus preventing Google Reader from showing it as unread?
If, in Google Reader, you mouse over the date in the top-right of each post, you'll see that it has "Received" and "Published" dates.
"Received" appears to be when the Google Reader server saw the new content, whereas "Published" comes from the feed itself.
Google Reader appears to use the "Received" date to decide whether something is new.
So, to get the correct behaviour:
- Don't put anything in the feed that's older than (say) 6 months.
- Limit the feed XML to the most-recent 10 or so items.
Of course, the second could imply the first...

Questions on building RSS feed

I am building an RSS feed for the first time and I have some simple, direct questions whose answers I was unable to find on the web, at least in a form that was clear to me. Can you help me understand the following:
Which items should I include in RSS generation? Should I always put in all the articles, or what is the criterion when I query my articles for the feed?
What value should I set for pubDate? The specification says "The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.". I do not quite understand how to apply this to my feed. I have new articles daily, should I set the pubDate to let's say 06:00 AM today and update it every day?
lastBuildDate: if I understand this right is the date of the latest updated item?
Which items should I include in RSS generation?
You should have one generic feed with all the new articles you post (for example: news). Additionally, if your website is split into categories, or you have some specific feeds (e.g. a calendar of events), then it's good to create an additional separate RSS feed for each one of them.
What value should I set for pubDate? I do not quite understand how to apply this to my feed. I have new articles daily, should I set the pubDate to let's say 06:00 AM today and update it every day?
Always set pubDate to the time when your news/articles went online. So if you have new articles daily, pubDate should be the date when they were released to the public. Not a random hour in the morning. Not the moment when you started writing them.
lastBuildDate: if I understand this right is the date of the latest updated item?
lastBuildDate is the most recent date when any of the results was posted or modified. Usually you should skip it, especially if your lastBuildDate would simply be the most recent pubDate. It's an optional element.
I use lastBuildDate only for calendar RSS feeds to show when the calendar was updated (as in calendars you not only add new entries but also often edit existing ones).
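A small sketch of the two dates in Python, assuming each article record carries a publication time and a last-modified time. The `articles` list and its field names are hypothetical. RSS 2.0 dates use the RFC 822 style, which the standard library's `email.utils.format_datetime` produces.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(moment):
    """Format a timezone-aware datetime in the RFC 822 style used by RSS 2.0."""
    return format_datetime(moment)


# Hypothetical article records.
articles = [
    {"title": "First",
     "published": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
     "modified": datetime(2024, 1, 17, 8, 0, tzinfo=timezone.utc)},
    {"title": "Second",
     "published": datetime(2024, 1, 16, 14, 0, tzinfo=timezone.utc),
     "modified": datetime(2024, 1, 16, 14, 0, tzinfo=timezone.utc)},
]

# pubDate: when the article actually went online.
print(rss_date(articles[0]["published"]))   # Mon, 15 Jan 2024 09:30:00 +0000

# lastBuildDate: most recent time anything in the feed was posted or edited.
print(rss_date(max(a["modified"] for a in articles)))   # Wed, 17 Jan 2024 08:00:00 +0000
```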
You should include every article, but it's best to provide different feeds for different categories, or even search keywords. You can build it like any dynamic page, with a query string.
pubDate: that's not super important, you can put whatever. I don't think many feed readers use it.
lastBuildDate: theoretically it's the date the content changed. So the date of the latest updated item should work.
Something super important, since people are going to be polling this page (meaning a lot of requests on the page):
- Cache it on your server
- Serve an ETag header and/or a Last-Modified header. That way your server can respond with just a "304 Not Modified" if the client already has the current version in its cache.
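The ETag part could be sketched like this, independent of any web framework. The function name `conditional_response` and the plain-dict headers are assumptions for the example. A real server would compare header names case-insensitively and would usually compute the ETag once, when the cached feed is regenerated.

```python
import hashlib


def conditional_response(feed_body, request_headers):
    """Return (status, headers, body) for a feed request, honoring If-None-Match."""
    # Derive the ETag from the feed bytes so it changes only when the feed does.
    etag = '"' + hashlib.sha256(feed_body).hexdigest() + '"'
    headers = {"ETag": etag, "Content-Type": "application/rss+xml"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""   # client copy is current; send no body
    return 200, headers, feed_body


feed = b"<rss version='2.0'><channel><title>Example</title></channel></rss>"

status, headers, body = conditional_response(feed, {})
print(status)   # 200, first request has no cached copy

status, _, body = conditional_response(feed, {"If-None-Match": headers["ETag"]})
print(status, len(body))   # 304 0
```

Note that a feed which writes the current time into lastBuildDate on every request would get a new ETag each time, so the cached copy should be what gets hashed.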

How to get more Feed items?

How would I get the next page or more results for a feed?
For example, when I go to Security Now feed page, there is no "next" link of any kind and the url parameter of "page=100" does nothing:
http://leoville.tv/podcasts/sn.xml
I get only 1 page of results of about 20 episodes. However my Google Reader can successfully retrieve episodes that are earlier than that.
Indeed, Google Reader caches the items, and it is NOT possible to paginate RSS or Atom feeds (unless they have a rel=next link, which none of them seem to have).
However, we can leverage the existing Google Reader infrastructure, with some work, to retrieve a list of, say 200 items!
Given the above podcast url we retrieve the latest 200 episodes by:
Using the ...google.ca/reader/atom/feed prefix instead of the usual view/feed as can be seen in your google reader.
Appending n=200 as the query parameter.
So we have:
http://www.google.ca/reader/atom/feed/http://leoville.tv/podcasts/sn.xml?hl=en&n=200
There is a very insightful reverse-engineered google-reader API project located at http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
Google reader caches RSS entries. You can't get any more from the actual feed if they don't allow for it.
