How to get more Feed items? - rss

How would I get the next page or more results for a feed?
For example, when I go to Security Now feed page, there is no "next" link of any kind and the url parameter of "page=100" does nothing:
http://leoville.tv/podcasts/sn.xml
I get only 1 page of results of about 20 episodes. However my Google Reader can successfully retrieve episodes that are earlier than that.

It is true that Google Reader caches the items, and that it is NOT possible to paginate an RSS2, RSS or Atom feed (unless the feed carries a rel=next link, which none of them seem to have).
However, we can leverage the existing Google Reader infrastructure, with some work, to retrieve a list of, say 200 items!
Given the above podcast url we retrieve the latest 200 episodes by:
Using the ...google.ca/reader/atom/feed prefix instead of the usual view/feed as can be seen in your google reader.
Appending n=200 as the query parameter.
So we have:
http://www.google.ca/reader/atom/feed/http://leoville.tv/podcasts/sn.xml?hl=en&n=200
There is a very insightful reverse-engineered google-reader API project located at http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
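As a rough illustration of the two steps above, here is a minimal Python sketch that only builds the URL string and makes no request. The prefix and the hl and n parameters come from the URL quoted above; the function name and its defaults are hypothetical.

```python
from urllib.parse import urlencode

# Prefix copied from the URL quoted in the answer above.
READER_ATOM_PREFIX = "http://www.google.ca/reader/atom/feed/"


def reader_atom_url(feed_url, count=200, lang="en"):
    """Build the Reader Atom URL for feed_url, asking for `count` items.

    Hypothetical helper: it only concatenates strings, nothing is fetched.
    """
    query = urlencode({"hl": lang, "n": count})
    # The feed URL is appended as-is, the same way the answer writes it.
    return f"{READER_ATOM_PREFIX}{feed_url}?{query}"


if __name__ == "__main__":
    print(reader_atom_url("http://leoville.tv/podcasts/sn.xml"))
```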

Google reader caches RSS entries. You can't get any more from the actual feed if they don't allow for it.

Related

how can I get a global rss feed on Gitlab?

In Gitlab there is an RSS news feed button for every project.
http://git.domain.name/userName/projectName/commits/master.atom?private_token=xxxxxxxxxxxxx
Is there a way to retrieve a global RSS feed for all projects related to one user?
Or even better, if there is a way to capture all user's feeds across all projects, that would be cool.
thanks
This is currently not supported.
The closest feature request would be "Team-/groupwide RSS feeds", which asks for RSS feeds covering all events in a team or a group (it does not have enough votes for now).
The global "RSS" shown on demo.gitlab.com is actually an HTML rendering of the event items representing the activity, returned in response to http://demo.gitlab.com/?limit=20&offset=0:
That would be for all public projects, but I don't see it exposed as an RSS feed.
The OP mauro reports in the comments:
the global "RSS" shown in demo.gitlab.com is also showing on my gitlab dashboard, the only thing is that mine is giving a 500 error instead.

Is there a way to get bing and google news search as rss?

Is there a way to get Google and Bing news searches as RSS?
I'd like to have the most recent at the top of the results (ordered by time) if possible.
And show 100 results if that is possible.
I found this for Google, but it only shows 5 items or so, not that great.
https://news.google.com/news/feeds?output=rss&q=politics
I just want to search their news categories and get results as RSS.
You are right that Google no longer publicly shows RSS feeds for News searches. But there's a basic URL format that should continue to work.
If my search topic is "education", my search URL should look like:
https://news.google.com/news/section?cf=all&ned=us&q=education
Replace:
value of ned (probably stands for 'news edition'), i.e. us with the appropriate country code that's relevant to you. E.g. uk (not gb for UK), in (India), etc.
value of q (probably stands for 'search query'), i.e. education with your search term. Combine multiple words with a plus sign, e.g. education+policy (if your search is for 'Education Policy').
Now the RSS feed button in your browser should've become active because an RSS feed is available for the page. It shows 10 latest news items in the feed, not 5. And the RSS feed URL would look like this:
https://news.google.com/news/feeds?cf=all&ned=us&hl=en&q=education&output=rss
Conclusion: RSS feeds aren't available for actual searches, i.e. typing in the search box on Google. You'd have to do it like this.
(PS: Credit goes to the question itself. It would not have occurred to me otherwise.)
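For what it's worth, the feed URL format described above can be assembled like this. It is only a sketch: the parameter names (cf, ned, hl, q, output) are taken from the URLs in this answer, the function name is made up, and nothing here checks whether Google still serves that endpoint.

```python
from urllib.parse import urlencode


def google_news_feed_url(query, edition="us", lang="en"):
    """Build a news feed URL in the format quoted in the answer above.

    Hypothetical helper; `edition` maps to ned and `query` maps to q.
    """
    params = {"cf": "all", "ned": edition, "hl": lang, "q": query, "output": "rss"}
    # urlencode joins multiple words with a plus sign, e.g. education+policy.
    return "https://news.google.com/news/feeds?" + urlencode(params)


if __name__ == "__main__":
    print(google_news_feed_url("education"))
    print(google_news_feed_url("education policy", edition="uk"))
```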
Try adding
&output=rss
https://news.google.com/news/section?cf=all&ned=us&q=education&output=rss
The custom RSS feed is in this format:
https://news.google.com/news/feeds?q={yourquery}&output=rss.
For example, https://news.google.com/news/feeds?q=developer&output=rss returns news about developer.
Your best bet is something like Google Alerts feeds. You can type a query, select the type of data you want, and get it delivered via RSS. Since they support PubSubHubbub for that, you will also get results in realtime.
If you want another dataset feel free to also check Superfeedr's track feeds which will help you get notified in realtime when keywords are matched across any RSS feed.
I think Google disabled the RSS search output in the past few weeks, and I have not found an alternative. The XML output requires a Google paid account. I now have to scrape the HTML for what I want.
Here's the new way to access Google News RSS feeds:
https://news.google.com/news/rss/search/section/q/{yourquery}

What causes Google Reader to think an item is updated in an RSS feed?

I'm generating an RSS feed from my blog. I'm using node-rss. When I make a minor edit to one of the posts listed in the feed, Google Reader lists the item as unread, even though I marked it as read a week ago.
My RSS feed contains title, description, link, guid and pubDate elements for each item. For guid, I'm just using the canonical URL to the item. The pubDate element is the date/time that the entry was first published, rather than the time of the last edit.
The feed itself contains lastBuildDate, which is set to the time that the RSS feed was generated (i.e. when it was requested).
As far as I can tell, there's nothing in the RSS feed that flags the item as being changed. So why does Google Reader think that the item has been updated, and why does it show it as unread again?
Does it look at the content (which has changed)? If so, can I do something in the RSS feed to mark this as a minor update, thus preventing Google Reader from showing it as unread?
If, in Google Reader, you mouse over the date in the top-right of each post, you'll see that it has "Received" and "Published" dates.
"Received" appears to be when the Google Reader server saw the new content, whereas "Published" comes from the feed itself.
Google Reader appears to use the "Received" date to decide whether something is new.
So, to get the correct behaviour:
Don't put anything in the feed that's older than (say) 6 months.
Limit the feed XML to the most-recent 10 or so items.
Of course, the second could imply the first...
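The two limits above could be applied before the feed XML is generated. Here is a minimal sketch, written in Python rather than with node-rss, assuming each post is a dict with a timezone-aware pub_date. The function name, the dict layout and the 180-day reading of "6 months" are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def trim_feed_items(items, now, max_items=10, max_age_days=180):
    """Keep only recent items, newest first, capped at max_items.

    Hypothetical helper; `items` are dicts with a `pub_date` datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    recent = [item for item in items if item["pub_date"] >= cutoff]
    recent.sort(key=lambda item: item["pub_date"], reverse=True)
    return recent[:max_items]


if __name__ == "__main__":
    now = datetime(2013, 1, 1, tzinfo=timezone.utc)
    # Fifteen made-up posts, one every 30 days going back in time.
    posts = [
        {"title": f"post {i}", "pub_date": now - timedelta(days=30 * i)}
        for i in range(15)
    ]
    for post in trim_feed_items(posts, now):
        print(post["title"], post["pub_date"].date())
```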

How to handle non unique item GUIDs/IDs in an RSS feed?

What is the correct response an RSS client should have when it encounters a feed that has multiple items with the same guid/identifier?
Currently in my application, any items that use an existing guid won't be cached or displayed because it believes it already has that item.
In this example feed a lot of items share this id:
tag:blizzard.com,2010-10-22:diablo3:feed:en-us:1
According to w3, when there are duplicate entries in an RSS feed:
Atom Processors MAY choose to display all of them or some subset of them. One typical behavior would be to display only the entry with the latest atom:updated timestamp.
I would go with the spec and display only the entry with the latest updated timestamp. Don't forget to send an email to Blizzard support and have them get their RSS validated - just don't threaten to keep them out of the next raid.
Take care.
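A minimal sketch of that "latest updated wins" behaviour, assuming the entries have already been parsed into dicts with an id and an updated datetime. The dict layout and function name are assumptions, not part of any feed library.

```python
from datetime import datetime, timezone


def keep_latest_per_id(entries):
    """Return one entry per id, choosing the one with the latest `updated`.

    Hypothetical helper; ids keep the order in which they were first seen.
    """
    latest = {}
    for entry in entries:
        current = latest.get(entry["id"])
        if current is None or entry["updated"] > current["updated"]:
            latest[entry["id"]] = entry
    return list(latest.values())


if __name__ == "__main__":
    shared_id = "tag:blizzard.com,2010-10-22:diablo3:feed:en-us:1"
    # Made-up entries that share the id quoted in the question.
    entries = [
        {"id": shared_id, "title": "older", "updated": datetime(2010, 10, 22, tzinfo=timezone.utc)},
        {"id": shared_id, "title": "newer", "updated": datetime(2010, 11, 5, tzinfo=timezone.utc)},
    ]
    print([e["title"] for e in keep_latest_per_id(entries)])
```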
I think your app is doing it right. Don't get fancy. If you've already seen an item with that guid, you don't present it a second time. You should contact the webmaster for the feed if possible and alert them to the problem.
Does each item have a unique URL? If so, fall back to using the URL.
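One way to sketch that fallback is to use the link as the cache key only for items whose guid is shared within the same feed document. The item layout and function name below are assumptions, and the sample links are invented.

```python
from collections import Counter


def cache_keys(items):
    """Pick a cache key per item: the guid, or the link when the guid repeats.

    Hypothetical helper; `items` are dicts with `guid` and optional `link`.
    """
    guid_counts = Counter(item["guid"] for item in items)
    keys = []
    for item in items:
        if guid_counts[item["guid"]] > 1 and item.get("link"):
            keys.append(item["link"])  # guid is not unique, fall back to the URL
        else:
            keys.append(item["guid"])
    return keys


if __name__ == "__main__":
    shared = "tag:blizzard.com,2010-10-22:diablo3:feed:en-us:1"
    items = [
        {"guid": shared, "link": "http://example.com/post-a"},
        {"guid": shared, "link": "http://example.com/post-b"},
        {"guid": "unique-guid", "link": "http://example.com/post-c"},
    ]
    print(cache_keys(items))
```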

Reading RSS by publication date

I want to build an RSS reader for Twitter RSS feeds (C# .NET 3.5).
Getting a response from an RSS web address and parsing it is very simple. (I did that with XmlDocument.Load("<RSS Feed>")).
The problem is that I need to get RSS items by publication date range.
When loading the application, I want to get all the items since the last time the feeds have been downloaded.
How can I do this?
Does every RSS feed allow that? (Google Reader is showing items even from the last year.)
It comes down to two sources of data: what the feed currently provides, and what you have stored.
If the feed is only showing the 10 most recent, for example, there is nothing you can do to get the older data. The feed must provide it.
Google Reader runs a cronjob that checks feeds about every 3 hours. It then stores the items in a database for Google Reader to reference any time it needs.
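On the client side, the "since the last download" part can be sketched by filtering whatever the feed currently provides on pubDate and storing the result yourself. The example below is in Python rather than C#, uses only the standard library, and assumes an RSS 2.0 document with RFC 822 dates. The sample feed and function name are made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

SAMPLE_RSS = """<rss version="2.0"><channel><title>demo</title>
<item><title>old post</title><pubDate>Mon, 01 Mar 2010 12:00:00 +0000</pubDate></item>
<item><title>new post</title><pubDate>Wed, 10 Mar 2010 08:30:00 +0000</pubDate></item>
</channel></rss>"""


def items_since(rss_xml, last_fetch):
    """Return (published, title) pairs for items newer than last_fetch.

    Hypothetical helper; last_fetch must be a timezone-aware datetime.
    Items without a pubDate are skipped.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue
        published = parsedate_to_datetime(pub_date)
        if published > last_fetch:
            fresh.append((published, item.findtext("title")))
    return fresh


if __name__ == "__main__":
    last_fetch = datetime(2010, 3, 5, tzinfo=timezone.utc)
    for published, title in items_since(SAMPLE_RSS, last_fetch):
        print(published.isoformat(), title)
```

This only sees items still present in the feed; anything older has to come from your own stored copy.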

Resources