How do Mailchimp and similar services detect new RSS items?

I have a page which displays a random post every day, and I have made a custom RSS feed page for it. My question is: how do Mailchimp and similar services (such as IFTTT) that use RSS feeds detect "new items"?
I ask because my RSS feed looks like this:
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title><b>Quote of the Day</b></title>
<atom:link rel="self" href="http://www.mysite.com/qotd/feed-2" type="application/rss+xml"/>
<link>http://mysite.com/qotd/feed-2</link>
<description>Textt</description>
<language>en-us</language>
<pubDate>Wed, 22 Jan 2014 21:30:45 +0000</pubDate>
<lastBuildDate>Wed, 22 Jan 2014 21:30:45 +0000</lastBuildDate>
<item>
<link>http://www.mysite.com/412</link>
<pubDate>Wed, 20 Jan 2014 07:17:45 +0000</pubDate>
<description><![CDATA[]]></description>
<guid>http://www.mysite.com/412</guid>
</item>
</channel>
</rss>
Every 24 hours, the feed updates with a new item. The problem is that there will always be exactly one item in the feed, and its pubDate may be older than the previous item's, depending on which random post it pulls. Will services that use RSS still detect it as a new item?

It's not considered good practice to have only a single item in your feed. It exposes you to the risk of missed updates if a given service polls the feed too infrequently.
The "default" behavior is to poll the feed regularly (every hour, every day, etc.) and compare the <item> elements between two fetches. Consumers will likely use the <guid> to determine whether an item has already been seen.
Additionally, many consuming apps (like IFTTT in your case, though I'm not sure about Mailchimp) support PubSubHubbub, a webhook protocol that tells them when a given feed has been updated. It saves them (and you!) resources because they don't have to poll the feed often and yet always get timely updates.
Most of the time, the <pubDate> does not matter. Feel free to share the actual URL of your feed so we can tell you exactly what's going on.
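The guid comparison described above can be sketched roughly like this in Python. This is only an illustration of the general idea, not how Mailchimp or IFTTT actually implement it; the function name new_items, the in-memory seen set, and the second feed (post 87) are made up for the example.

```python
import xml.etree.ElementTree as ET

def new_items(feed_xml, seen_guids):
    """Return the identifiers of items not seen in any earlier poll."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when no guid is present.
        key = item.findtext("guid") or item.findtext("link")
        if key and key not in seen_guids:
            seen_guids.add(key)
            fresh.append(key)
    return fresh

# Hypothetical single-item feeds, one per day.
FEED_DAY_1 = """<rss version="2.0"><channel><title>Quote of the Day</title>
<item><link>http://www.mysite.com/412</link><guid>http://www.mysite.com/412</guid></item>
</channel></rss>"""
FEED_DAY_2 = FEED_DAY_1.replace("412", "87")

seen = set()
first = new_items(FEED_DAY_1, seen)    # post 412 is new
repeat = new_items(FEED_DAY_1, seen)   # polled again the same day: nothing new
second = new_items(FEED_DAY_2, seen)   # post 87 is new
third = new_items(FEED_DAY_1, seen)    # post 412 drawn again later: not new
```

Note the last call: with a consumer that works this way, a random post that comes up a second time would not be reported as new, because its guid has already been seen.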

That depends on the implementation, but usually a reader compares each incoming item with the items it has stored locally and only displays it if it was not in the feed earlier. So this should be fine (at least, the two RSS feed readers I have used so far worked this way).
However, if you want the items to be displayed in the order they appeared on your website, you should use the current date instead of the item's original date.

There is a helpful troubleshooting guide by Mailchimp. It seems Mailchimp does evaluate the pubDate to decide whether items are newly published. From the page:
Make sure your pubDate tags are set up and populating correctly. If a pubDate is set in another time zone or a day off, MailChimp may not recognize that the items were posted before the next campaign is triggered.
Below is an example of the correct setup for your RSS feed. The pubDate is in English. We pull in the date for any of these tags, in this order: 'pubDate', 'pubdate', 'published', 'created', 'updated', 'date'.
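As a rough illustration of the lookup order quoted above, here is a small Python sketch that picks the first date tag present on an item. Only the tag order comes from the Mailchimp guide; the helper name first_date_text and the sample item are assumptions for the example, and Mailchimp's real parsing may differ.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Tag names in the order quoted from the Mailchimp guide.
DATE_TAGS = ["pubDate", "pubdate", "published", "created", "updated", "date"]

def first_date_text(item):
    """Return (tag, text) for the first date tag found on the item, else (None, None)."""
    for tag in DATE_TAGS:
        text = item.findtext(tag)
        if text and text.strip():
            return tag, text.strip()
    return None, None

# Hypothetical item carrying two date tags; pubDate wins because it is checked first.
item = ET.fromstring(
    "<item>"
    "<updated>Mon, 20 Jan 2014 07:17:45 +0000</updated>"
    "<pubDate>Wed, 22 Jan 2014 21:30:45 +0000</pubDate>"
    "</item>"
)
tag, text = first_date_text(item)
when = parsedate_to_datetime(text)  # parses the RFC 822 style date used by RSS
```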

Related

What causes Google Reader to think an item is updated in an RSS feed?

I'm generating an RSS feed from my blog. I'm using node-rss. When I make a minor edit to one of the posts listed in the feed, Google Reader lists the item as unread, even though I marked it as read a week ago.
My RSS feed contains title, description, link, guid and pubDate elements for each item. For guid, I'm just using the canonical URL to the item. The pubDate element is the date/time that the entry was first published, rather than the time of the last edit.
The feed itself contains lastBuildDate, which is set to the time that the RSS feed was generated (i.e. when it was requested).
As far as I can tell, there's nothing in the RSS feed that flags the item as being changed. So why does Google Reader think that the item has been updated, and why does it show it as unread again?
Does it look at the content (which has changed)? If so, can I do something in the RSS feed to mark this as a minor update, thus preventing Google Reader from showing it as unread?
If, in Google Reader, you mouse over the date in the top-right of each post, you'll see that it has "Received" and "Published" dates.
"Received" appears to be when the Google Reader server saw the new content, whereas "Published" comes from the feed itself.
Google Reader appears to use the "Received" date to decide whether something is new.
So, to get the correct behaviour:
- Don't put anything in the feed that's older than (say) 6 months.
- Limit the feed XML to the most recent 10 or so items.
Of course, the second could imply the first...

Questions on building RSS feed

I am building an RSS feed for the first time and I have some simple, direct questions whose answers I was unable to find on the web, at least not in a form that was clear to me. Can you help me understand the following:
Which items should I include in RSS generation? Should I always put in all the articles, or what is the criterion when I query my articles for the feed?
What value should I set for pubDate? The specification says "The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.". I do not quite understand how to apply this to my feed. I have new articles daily; should I set the pubDate to, let's say, 06:00 AM today and update it every day?
lastBuildDate: if I understand this right is the date of the latest updated item?
Which items should I include in RSS generation?
You should have one generic feed with all the new articles you post (for example: news). Additionally, if your website is split into categories, or you have some specific feeds (e.g. a calendar of events), then it's good to create a separate RSS feed for each of them.
What value should I set for pubDate? I do not quite understand how to apply this to my feed. I have new articles daily; should I set the pubDate to, let's say, 06:00 AM today and update it every day?
Always set pubDate to the time when your news/articles went online. So if you have new articles daily, pubDate should be the date when they were released to the public. Not a random hour in the morning. Not the moment when you started writing them.
lastBuildDate: if I understand this right is the date of the latest updated item?
lastBuildDate is the most recent date when any of the results was posted or modified. Usually you can skip it, especially if your lastBuildDate would simply be the most recent pubDate. It's an optional element.
I use lastBuildDate only for calendar RSS feeds to show when the calendar was updated (as in calendars you not only add new entries but also often edit existing).
Include every article, but ideally provide separate feeds for different categories, or even for search keywords. You can build the feed like any dynamic page, driven by a query string.
pubDate: that's not super important, so you can put whatever you like. I don't think many feed readers use it.
lastBuildDate: theoretically it's the date the content changed, so the date of the latest updated item should work.
Something super important, since people are going to do polling on this page (meaning a lot of requests on the page)
- Cache it on your server
- Serve an ETag header and/or a Last-Modified header. That way your server can respond with just a "not modified" (HTTP 304) if the client already has the current version in its cache.
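To make the ETag / Last-Modified advice concrete, here is a minimal Python sketch of the decision a server could make for a cached feed. It is framework-agnostic and simplified (for instance, it ignores weak ETags and the "*" wildcard); the function name feed_response and the way headers are passed as a plain dict are assumptions for illustration.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def feed_response(body, last_modified_ts, request_headers):
    """Return (status, headers, body) for a cached feed, honoring conditional requests."""
    # Derive the ETag from the feed bytes so it changes whenever the content does.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # When both validators are sent, the ETag check takes precedence.
        unchanged = etag in [t.strip() for t in if_none_match.split(",")]
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            unchanged = False  # unparseable date: just send the full feed
        else:
            unchanged = int(last_modified_ts) <= since
    else:
        unchanged = False
    if unchanged:
        return 304, headers, b""  # client copy is current, no body needed
    return 200, headers, body

# First request has no validators; the second replays the ETag it received.
status, headers, _ = feed_response(b"<rss/>", 1390426245, {})
again = feed_response(b"<rss/>", 1390426245, {"If-None-Match": headers["ETag"]})
```

The first call returns a 200 with the validators attached; the second returns a 304 with an empty body, which is what keeps frequent polling cheap.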

RSS for Future Items

This may be a simple question, but for some reason I don't know this answer. Is it possible to create an RSS feed file that contains contents for an entire year but only publishes the current date and previous date information?
I have a client that wants to do a "this day in history" post. Currently, I am using IFTTT, and created around sixty dated posts for the next two months. Of course, this works -- but it is very labor intensive.
Is it possible to create an RSS feed that you could put all 365 days of data into, but if someone pulls up the feed it only shows today's item and prior days in the feed?
Or is RSS not the proper technology to do this? The reason I am using RSS is for ease of use, and IFTTT will take those RSS feeds and pump them into Facebook and Twitter for automatic status updates for my client.
There are various tools that let you define Facebook and Twitter posts in advance, to be published at a specified date and time in the future. Why not use one of those instead of writing your own?
A quick search for "scheduled twitter post" uncovered Later Bro, Twuffer and twAitter but there must be dozens to choose from.
If you're looking for just posting on Facebook and Twitter, and not an RSS feed as well, I'd follow Matthew's suggestion. If you want an RSS feed, there is a feed for each Twitter feed. But if you want actual RSS, you need to add something in between. An RSS feed is just an XML file; it's not a process. I suggest having a file of some type (maybe RSS, or other XML, or a database table, or even a CSV file) with all the posts and relevant information, including the date. Then add a small script that runs as a cron job (or via IFTTT, if it supports a date as the trigger and running a script as the "then" part) that pulls the day's entry and updates the actual RSS feed. Pretty simple.
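A minimal Python sketch of that "small script" idea might look like the following. It assumes the year's entries are already loaded as (date, title, url) tuples (for example from a CSV file), and the channel title, URLs, the 06:00 UTC publication time and the function name build_feed are all invented for illustration. A cron job would call it once a day and write the result to the feed file.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from email.utils import format_datetime

def build_feed(entries, today, max_items=10):
    """Build RSS XML containing only entries dated today or earlier, newest first."""
    due = sorted((e for e in entries if e[0] <= today), reverse=True)[:max_items]
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "This Day in History"
    ET.SubElement(channel, "link").text = "http://www.example.com/history"
    ET.SubElement(channel, "description").text = "One entry per day"
    for day, title, url in due:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        # Assumed publication time: 06:00 UTC on the entry's own date.
        stamp = dt.datetime.combine(day, dt.time(6, 0), tzinfo=dt.timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(stamp)
    return ET.tostring(rss, encoding="unicode")

# Hypothetical entries; the March 3 entry stays hidden until that day arrives.
entries = [
    (dt.date(2024, 3, 1), "March 1 event", "http://www.example.com/history/0301"),
    (dt.date(2024, 3, 2), "March 2 event", "http://www.example.com/history/0302"),
    (dt.date(2024, 3, 3), "March 3 event", "http://www.example.com/history/0303"),
]
xml_out = build_feed(entries, today=dt.date(2024, 3, 2))
```

Future entries never appear in the generated XML, so consumers such as IFTTT only ever see items up to the current day.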
Here is what I ended up doing
Using the Drupal backend of my website, I created a content type specifically for these posts.
I created individual articles for each day, and used the schedule module to schedule the publish date to the date I wanted.
I created an RSS feed of these posts through Drupal.
I linked the newly created RSS feed to IFTTT.
Created an IFTTT recipe to post the text from the RSS feed to Facebook/Twitter/etc.
It wasn't the best solution, but it worked. I was really trying to do this without having to rely on a third-party such as IFTTT, but never really figured out a good way to do it.

Reading RSS by publication date

I want to build an RSS reader for Twitter RSS feeds (C#, .NET 3.5).
Getting a response from RSS web address and parsing it is very simple. (I did that with XmlDocument.Load("<RSS Feed>")).
The problem is that I need to get RSS items by publication date range.
When loading the application, I want to get all the items since the last time the feeds have been downloaded.
How can I do this?
Does every RSS feed allow that? (Google Reader shows items even from the last year.)
It comes down to two sources of data: what the feed currently provides, and what you have stored.
If the feed is only showing the 10 most recent, for example, there is nothing you can do to get the older data. The feed must provide it.
Google Reader runs a cronjob that checks feeds about every 3 hours. It then stores the items in a database for Google Reader to reference any time it needs.
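The "what the feed currently provides" half can be sketched as a simple client-side filter. The example below is in Python rather than C#, purely as an illustration; the function name items_since and the sample feed are assumptions. It can only return items that are still present in the feed, so older items have to come from your own storage.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def items_since(feed_xml, last_fetch):
    """Return (pubDate, title) pairs for items published after last_fetch, oldest first."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        raw = item.findtext("pubDate")
        if not raw:
            continue  # pubDate is optional in RSS 2.0; such items cannot be filtered by date
        published = parsedate_to_datetime(raw)
        if published.tzinfo is None:
            # Treat dates without a usable offset as UTC (an assumption).
            published = published.replace(tzinfo=dt.timezone.utc)
        if published > last_fetch:
            found.append((published, item.findtext("title", default="")))
    return sorted(found)

# Hypothetical feed with one item before and one after the last download.
SAMPLE = """<rss version="2.0"><channel><title>t</title>
<item><title>Older</title><pubDate>Mon, 20 Jan 2014 07:17:45 +0000</pubDate></item>
<item><title>Newer</title><pubDate>Wed, 22 Jan 2014 21:30:45 +0000</pubDate></item>
</channel></rss>"""
last_fetch = dt.datetime(2014, 1, 21, tzinfo=dt.timezone.utc)
recent = items_since(SAMPLE, last_fetch)
```

In a real reader you would persist last_fetch (and the items themselves) between runs, which is essentially what the Google Reader approach described above does at scale.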

What is the difference between <pubDate> and <lastBuildDate> in RSS?

I have the feeling, in every RSS.xml file, both the pubDate and the lastBuildDate match.
I am sure that this is not always true...
So firstly, what is the difference between the two?
Secondly, do RSS readers sort content by date based on the pubDate or the lastBuildDate?
pubDate:
The original publication date for the channel or item. (optional)
lastBuildDate:
The most recent time the content of the channel was modified. (optional)
Here are some docs for the optional items in the RSS 2.0 spec.
Answers here are all over the place. Some people are getting confused by the fact that item has a pubDate as well. I believe the OP is specifically asking about the difference between lastBuildDate and pubDate at the channel level.
To the best of my understanding of the RSS spec, which is notorious for ambiguous explanations, lastBuildDate would be the last time the feed was created. For example, if you cache a copy of it on your server for some period of time, lastBuildDate would be the time that cached copy was created.
pubDate, on the other hand, seems to be basically the last time any actual content within the feed has changed. For the most part it's pretty much going to be the latest pubDate value from the items in the feed, since generally, the feed content is only changing when some new item gets published. However, it could also be a date when you made some change to the channel, itself, such as changing the channel title, description, etc.
lastBuildDate specifies the last date/time the entry was modified. pubDate specifies the actual publication date/time.
The reason you see these as generally the same is because by the time you get the RSS feed, there hasn't been any edit to the article.
I can't find the RSS spec on this unfortunately, but I am pretty positive that's what they are.
By RSS 2.0 specification, it seems they are roughly equivalent:
lastBuildDate:
The last time the content of the channel changed.
pubDate:
The publication date for the content in the channel. ...
The difference is subtle: they tell us about the method that was used. With <pubDate>, the channel is published manually or at a fixed interval. With <lastBuildDate>, the channel is rebuilt automatically whenever a new article is added to the website and becomes a new item.
While the other answers here do provide some good information, I feel the need to elaborate just a little bit for any future visitors.
pubDate
The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.
lastBuildDate
The last time the content of the channel changed.
So, taking the New York Times as an example again, the <pubDate> is the date the feed was published while the <lastBuildDate> would be the date the content inside the feed changed. In the end, I would view the <pubDate> as the date the feed is published and the <lastBuildDate> as the date any content in the feed was last modified.

Resources