Quite simply, is there a way to turn this data into an RSS feed?
The data set contains yesterday's Arabic news from several sources. Therefore all data will change after midnight in London.
You can do that with Kimono. Here is the API (login probably required). You can clone the API and make it better.
Related
Is there a way to get Google and Bing news searches as RSS?
I'd like to have the most recent at the top of the results (ordered by time) if possible.
And show 100 results if that is possible.
I found this for Google, but it only shows 5 items or so, not that great.
https://news.google.com/news/feeds?output=rss&q=politics
I just want to search their news categories and get results as RSS.
You are right that Google no longer publicly shows RSS feeds for News searches. But there's a basic URL format that should continue to work.
If my search topic is "education", my search URL should look like:
https://news.google.com/news/section?cf=all&ned=us&q=education
Replace:
- the value of ned (probably short for 'news edition'), us in this example, with the country code relevant to you, e.g. uk (not gb) for the UK, in for India, etc.
- the value of q (probably short for 'search query'), education in this example, with your search term. Join multiple words with a plus sign, e.g. education+policy if your search is for 'Education Policy'.
Now the RSS feed button in your browser should have become active, because an RSS feed is available for the page. The feed shows the 10 latest news items, not 5. The RSS feed URL looks like this:
https://news.google.com/news/feeds?cf=all&ned=us&hl=en&q=education&output=rss
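The substitution described above can be sketched in a few lines of Python. This is only an illustration of the URL format quoted in this answer: the function name build_news_feed_url and its parameter names are made up for the example, and Google may change or retire the endpoint at any time.

```python
from urllib.parse import urlencode

# Base URL copied from the answer above; not guaranteed to keep working.
BASE_URL = "https://news.google.com/news/feeds"


def build_news_feed_url(query, edition="us", language="en"):
    """Build the RSS URL for a Google News topic search (hypothetical helper)."""
    params = {
        "cf": "all",
        "ned": edition,    # news edition, e.g. "us", "uk", "in"
        "hl": language,
        "q": query,        # urlencode turns spaces into "+"
        "output": "rss",
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_news_feed_url("education policy", edition="uk"))
```

Letting urlencode do the escaping avoids joining multi-word queries with plus signs by hand.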
Conclusion: RSS feeds aren't available for actual searches, i.e. typing in the search box on Google. You'd have to do it like this.
(PS: Credit goes to the question itself. It would not have occurred to me otherwise.)
Try adding &output=rss to the URL:
https://news.google.com/news/section?cf=all&ned=us&q=education&output=rss
The custom RSS feed is in this format:
https://news.google.com/news/feeds?q={yourquery}&output=rss
For example, https://news.google.com/news/feeds?q=developer&output=rss returns news about developer.
Your best bet is something like Google Alerts feeds. You can type a query, select the type of data you want, and get it delivered via RSS. Since they support PubSubHubbub for that, you will also get results in real time.
If you want another dataset feel free to also check Superfeedr's track feeds which will help you get notified in realtime when keywords are matched across any RSS feed.
I think Google disabled the RSS search output in the past few weeks, and I have not found an alternative. The XML output requires a Google paid account. I now have to scrape the HTML for what I want.
Here's the new way to access Google News RSS feeds:
https://news.google.com/news/rss/search/section/q/{yourquery}
I am building an RSS feed for the first time and I have some simple, direct questions whose answers I was unable to find on the web, at least not in a form that was clear to me. Can you help me understand the following?
Which items should I include in RSS generation? Should I always include all the articles, or what criteria should I use when I query my articles for the feed?
What value should I set for pubDate? The specification says "The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes.". I do not quite understand how to apply this to my feed. I have new articles daily; should I set the pubDate to, let's say, 06:00 AM today and update it every day?
lastBuildDate: if I understand this right, is this the date of the latest updated item?
Which items should I include in RSS generation?
You should have one generic feed with all the new articles you post (for example: news). Additionally, if your website is split into categories, or you have some specific feeds (e.g. a calendar of events), then it's good to create a separate RSS feed for each of them.
What value should I set for pubDate? I do not quite understand how to apply this to my feed. I have new articles daily, should I set the pubDate to, let's say, 06:00 AM today and update it every day?
Always set pubDate to the time when your news/articles went online. So if you have new articles daily, pubDate should be the date when they were released to the public. Not a random hour in the morning. Not the moment when you started writing them.
lastBuildDate: if I understand this right is the date of the latest updated item?
lastBuildDate is the most recent date when any of the items was posted or modified. Usually you should skip it, especially if your lastBuildDate would simply be the most recent pubDate. It's an optional element.
I use lastBuildDate only for calendar RSS feeds to show when the calendar was updated (as in calendars you not only add new entries but also often edit existing).
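To make the advice on pubDate and lastBuildDate concrete, here is a minimal sketch that builds an RSS 2.0 document with Python's standard library. The function name build_feed and the dictionary keys title, link and published are assumptions chosen for this example; each item's pubDate is the moment the article went online, and lastBuildDate is only written when a value is passed in.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, articles, last_build=None):
    """Return an RSS 2.0 document as a string.

    articles: list of dicts with 'title', 'link' and 'published'
    (a timezone-aware datetime marking when the article went online).
    last_build: optional aware datetime; lastBuildDate is skipped if None.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    if last_build is not None:
        ET.SubElement(channel, "lastBuildDate").text = format_datetime(last_build)

    # Newest articles first.
    for article in sorted(articles, key=lambda a: a["published"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
        ET.SubElement(item, "guid").text = article["link"]
        # RFC 822 style date, as RSS 2.0 expects.
        ET.SubElement(item, "pubDate").text = format_datetime(article["published"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [{
        "title": "Hello",
        "link": "https://example.com/hello",
        "published": datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    }]
    print(build_feed("News", "https://example.com", "All articles", sample))
```

A category feed would call the same function with a filtered list of articles.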
Which items: You should include every article, but it is best to provide different feeds for different categories, or even for search keywords. You can build the feed like any dynamic page, with a query string.
pubDate: That's not super important; you can put whatever. I don't think many feed readers use it.
lastBuildDate: Theoretically it's the date the content changed, so the date of the latest updated item should work.
One thing is super important, since people are going to poll this page (meaning a lot of requests to it):
- Cache it on your server.
- Serve an ETag header and/or a Last-Modified header. That way your server can respond with just a "304 Not Modified" if the client already has the current version in its cache.
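The conditional-request idea can be sketched independently of any web framework. The function below is a hypothetical helper, not part of a real library: it derives an ETag from the feed body and returns 304 with an empty body when the client's If-None-Match value matches exactly. A real server would also handle weak validators, lists of ETags and If-Modified-Since, which this sketch leaves out.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_response(feed_xml, last_modified, if_none_match=None):
    """Return (status, headers, body) for a feed request.

    last_modified must be a UTC-aware datetime.
    if_none_match is the raw If-None-Match header sent by the client, if any.
    """
    # A hash of the content changes whenever the feed changes.
    etag = '"' + hashlib.sha256(feed_xml.encode("utf-8")).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if if_none_match == etag:
        return 304, headers, ""    # client copy is current; send no body
    return 200, headers, feed_xml


if __name__ == "__main__":
    built = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
    status, headers, _ = conditional_response("<rss/>", built)
    print(status, headers["Last-Modified"])
    print(conditional_response("<rss/>", built, headers["ETag"])[0])
```

Because the ETag depends only on the feed content, it pairs naturally with a server-side cache of the generated XML.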
This may be a simple question, but for some reason I don't know this answer. Is it possible to create an RSS feed file that contains contents for an entire year but only publishes the current date and previous date information?
I have a client that wants to do a "this day in history" post. Currently, I am using IFTTT, and created around sixty dated posts for the next two months. Of course, this works -- but it is very labor intensive.
Is it possible to create an RSS feed that you could put all 365 days of data into, but that shows only today's item and prior days when someone pulls up the feed?
Or is RSS not the proper technology for this? The reason I am using RSS is ease of use: IFTTT will take those RSS feeds and pump them into Facebook and Twitter as automatic status updates for my client.
There are various tools that let you define Facebook and Twitter posts in advance, to be published at a specified date and time in the future. Why not use one of those instead of writing your own?
A quick search for "scheduled twitter post" uncovered Later Bro, Twuffer and twAitter but there must be dozens to choose from.
If you're just looking to post on Facebook and Twitter, and don't need an RSS feed as well, I'd follow Matthew's suggestion. If you want an RSS feed, there is a feed for each Twitter feed. But if you want actual RSS, you need to add something in between. An RSS feed is just an XML file; it's not a process. I suggest keeping a file of some type (maybe RSS, other XML, a database table, or even a CSV file) with all the posts and the relevant information, including the date. Then a small script that runs as a cron job (or from IFTTT, if it supports a date as the trigger and running a script as the "then" part) pulls the day's entries and updates the actual RSS feed. Pretty simple.
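As a rough sketch of that small script, the function below reads a year's worth of posts from CSV text and keeps only the entries dated today or earlier. The column names date and text, the YYYY-MM-DD date format and the function name due_posts are assumptions made for this example; a daily cron job would call it and then regenerate the RSS file from the result.

```python
import csv
import io
from datetime import date


def due_posts(csv_text, today):
    """Return rows dated today or earlier, newest first.

    csv_text: CSV content with a header row containing a 'date' column
    in YYYY-MM-DD format (assumed layout for this sketch).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    due = [row for row in rows if date.fromisoformat(row["date"]) <= today]
    # ISO dates sort correctly as plain strings.
    return sorted(due, key=lambda row: row["date"], reverse=True)


if __name__ == "__main__":
    data = "date,text\n2024-01-01,First\n2024-01-02,Second\n2024-01-03,Third\n"
    for post in due_posts(data, date(2024, 1, 2)):
        print(post["date"], post["text"])
```

Future entries stay in the CSV file but never reach the published feed until their date arrives.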
Here is what I ended up doing
Using the Drupal backend of my website, I created a content type specifically for these posts.
I created individual articles for each day, and used the schedule module to schedule the publish date to the date I wanted.
I created an RSS feed of these posts through Drupal.
I linked the newly created RSS feed to IFTTT.
Created an IFTTT recipe to post the text from the RSS feed to Facebook/Twitter/etc.
It wasn't the best solution, but it worked. I was really trying to do this without having to rely on a third-party such as IFTTT, but never really figured out a good way to do it.
When creating an RSS reader, you download the XML-formatted document pointed to by the RSS feed link, and you can parse it manually or use the SyndicationFeed class in the System.ServiceModel.Syndication namespace.
So if we take Scott Guthrie's blog as an example, you download the RSS feed document here, and parse it. My problem is that this document only holds 15 items, yet he has been blogging for a number of years.
Is there a standard or established way of getting the older posts not included in the RSS feed document? Or do you have to find the base address for the blog posts and then parse the pages of the site from there to get them? How do you avoid missing posts on high volume blogs?
With RSS/Atom you can't query older articles.
I built an RSS archival service (https://app.pub.center). All of our data is free to use via REST. We charge money for push notifications.
PubCenter polls its catalog of RSS feeds daily and caches the articles. You can then get these articles back in chronological order. For example:
Page 1 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=1
Page 2 of The Atlantic https://pub.center/feed/02702624d8a4c825dde21af94e9169773454e0c3/articles?limit=10&page=2
As the replies to How Do I Fetch All Old Items on an RSS Feed? already mentioned, a feed may not provide archival data but historical items may be available from another source.
Archive.org’s Wayback Machine has an API to access historical content, including RSS feeds (if their bots have downloaded it). I’ve created the web tool Backfeed that uses this API to regenerate a feed containing concatenated historical items. If you'd like to discuss the implementation in detail please get in touch.
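One way to concatenate historical items, once several archived copies of a feed are in hand, is to merge them and drop duplicates by guid. The sketch below is not Backfeed's actual implementation and does not call the Wayback Machine; it assumes the snapshots are already available as RSS 2.0 strings, and merge_snapshots is a name invented for the example.

```python
import xml.etree.ElementTree as ET


def merge_snapshots(snapshots):
    """Merge items from several RSS 2.0 documents, dropping duplicates.

    snapshots: iterable of XML strings, oldest snapshot first.
    Items are identified by <guid>, falling back to <link>.
    Returns a list of <item> elements in order of first appearance.
    """
    seen = {}
    for xml_text in snapshots:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            key = item.findtext("guid") or item.findtext("link")
            if key:
                # A later snapshot replaces an earlier copy of the same item.
                seen[key] = item
    return list(seen.values())


if __name__ == "__main__":
    old = "<rss><channel><item><guid>a</guid><title>A</title></item></channel></rss>"
    new = "<rss><channel><item><guid>b</guid><title>B</title></item></channel></rss>"
    print([i.findtext("title") for i in merge_snapshots([old, new])])
```

Items without a guid or link are skipped, since there is no reliable way to tell whether they repeat.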
I want to build an RSS reader for Twitter RSS feeds (C#, .NET 3.5).
Getting a response from RSS web address and parsing it is very simple. (I did that with XmlDocument.Load("<RSS Feed>")).
The problem is that I need to get RSS items by publication date range.
When loading the application, I want to get all the items since the last time the feeds have been downloaded.
How can I do this?
Does every RSS feed allow that? (Google Reader shows items even from last year.)
It comes down to two sources of data: what the feed currently provides, and what you have stored.
If the feed is only showing the 10 most recent, for example, there is nothing you can do to get the older data. The feed must provide it.
Google Reader runs a cronjob that checks feeds about every 3 hours. It then stores the items in a database for Google Reader to reference any time it needs.
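The "store what you have seen" approach can be sketched with SQLite from Python's standard library (the question uses C#, but the idea carries over). The table layout and the function names store_new_items and items_between are assumptions for illustration. Each poll inserts only unseen items, and date-range queries then run against the local store instead of the feed.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def store_new_items(conn, feed_xml):
    """Insert feed items not stored yet; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "guid TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    added = 0
    for item in ET.fromstring(feed_xml).iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        pub = item.findtext("pubDate")
        if not guid or not pub:
            continue  # cannot identify or date this item
        # Normalise to UTC ISO 8601 so string comparison orders by time.
        published = parsedate_to_datetime(pub).astimezone(timezone.utc).isoformat()
        cursor = conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?)",
            (guid, item.findtext("title"), published),
        )
        added += cursor.rowcount  # 0 when the guid was already stored
    conn.commit()
    return added


def items_between(conn, start_iso, end_iso):
    """Return (guid, title, published) rows with start <= published < end."""
    return conn.execute(
        "SELECT guid, title, published FROM items "
        "WHERE published >= ? AND published < ? ORDER BY published",
        (start_iso, end_iso),
    ).fetchall()
```

Running store_new_items on a schedule builds up history that the feed itself no longer exposes, which is the only way to answer queries older than the feed's current window.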