Really simple question here:
For a PHP-driven RSS feed, am I just overwriting the same XML file every time I "publish" a new feed thing? and the syndicates it's registered with will pop in from time to time to check that it's new?
Yes. An RSS reader has the URL of the feed and regularly requests the same URL to check for new content.
that's how it works, a simple single xml rss file that gets polled for changes by rss readers
for scalability there there is FeedTree: collaborative RSS and Atom delivery but unlike another well known network program (bittorrent) it hasn't had as much support in readers by default
Essentially, yes. It isn't necessarily a "file" actually stored on disk, but your RSS (or Atom) is just changed to contain the latest items/entries and resides at a particular fixed URL. Clients will fetch it periodically. There are also technologies like PubSubHubbub and pinging for causing updates to get syndicated closer to real-time.
Yes... BUT! There are ways to make the susbcribers life better and also improve your bandwidth :) Implement the PubSubHubbub protocol. It will help any application that wants the content of the feed to be notified as soon as it's available. It'es relatively simple to implement on the publisher side as it only involves a ping.
Related
Reading about RSS leads to many false-informations. I am not quite sure how RSS works. So I have some questions and I hope you dont answer using links-only. There is always another link that claims your link is wrong.
Questions:
If I subscribe to a RSS-Feed the first time, are the feeds from the last 30 years downloaded as a bulk-response may have Gigabytes of data?
Are following requests to a already subscribed RSS-Feed updates to the previous subscription? If yes, how does the server know what messages are already transported to the "client"?
How often are RSS-Feeds downloaded?
Kind regards
You get whatever is currently in the feed. How many entries and how far back that goes is up to the publisher.
No. Each request gets whatever is in the feed at the time.
As often as the client wants to download them. (The format includes options to recommend a frequency but clients may ignore it).
I'm sorry if the title is too generic, but I've been browsing the Internet for one hour and I couldn't find any architectural explanation. I'm totally new both to RSS and Atom protocols, as far as I have understood until now is:
A server publishes documents
Clients subscribe to this server
Clients are notified when the server publishes new documents
Clients consume the documents
It seems like a queueing mechanism (like JMS). What is not clear to me is:
"Clients are notified" is just another way of saying "clients must poll the server to check if there are new messages"?
How does a client know that a message has already been read and that is no longer 'new'? Is this check in charge to the client or to the server?
Can anyone address me to some documentation about that? I've been googling for a while but every search sends me to sites that explain how to use libraries for parsing etc....
Thanx
I think these answer your questions:
How large RSS reader works (netvibes, Google reader...)
How RSS and ATOM inform client about updates? long polling or polling or something else?
RSS 2.0 Specification
https://en.wikipedia.org/wiki/PubSubHubbub
How does a client know that a message has already been read and that
is no longer 'new'?
I think that is specific to the implementation, but for example you could save guids of each fetched <item> and then flag them read as the user reads the items.
I think Janih's answer below is good and you should check all these links.
For more specific details to you questions:
Clients are notified" is just another way of saying "clients must poll
the server to check if there are new messages?
Yes... and no. Yes, polling is the default and yes it's cumbersome. Protocols like PubSubHubbub will help. RSS Feed API services like Superfeedr (which I built!) will do it on your behalf and send you notifications using a webhooks (so you don't have to poll at all!)
Other than HTTP PUT and POST, what other methods can a web application designer use to allow users to upload content (either files or listbox text) from a page of his web app to a remote server?
On the same topic, I was wondering what technology/APIs does a service like Google Docs or Google Drive use? The reason I ask this is: Our Sys Admin has disabled file uploading (via Squid proxy), yet I was able to create and share a document using Google Docs / Google Drive.
Many thanks in advance,
/HS
EDIT Please see the strikeout above.
This depends on the server in question - as the standard set of HTTP commands can be expanded, and some may not be configured/allowed. One of the common commands is "OPTIONS" that ask "what can I do".
But to answer more helpfully: you generally have two main options:
POST (the one you probably want to user as it's nearly always avaiable
GET. You could use GET (but I'm NOT advocating it - just saying you could you it - you should not use a GET to make changes to the server). There are problems with this approach (including size of files, manually handling the encoding etc) but it's possible if you have to go this route.
PUT it often not enabled on servers for security reasons.
More reading: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Edit: if "file uploading" is prevented by proxy, have you tried encoding the POST? i.e. As opposed to sending a multipart POST, try encoding the files yourself into POST string and sending that instead? Or encode the file and split into multiple small posts and piecing them together at the other end?
Google Docs uses a mixture of POST and GET. POST for the updates. Google Drive I don't know.
So, say I'm a journalist, who wants some way of easily posting links to stories I've written that are published to my newspaper's website. Alas, my newspaper's website doesn't offer user-level RSS feeds (user-level anything for journalists, really).
Running a search (I.e., http://www.calgaryherald.com/search/search.html?q=Rininsland) brings up everything I've done in reverse chronological order (albeit with some duplicates; ignore for now, will deal with later). Is there any way I can parse this into an RSS feed?
It seems like Yahoo! Pipes might be an easy way to do this, but I'm open to whatever.
Thanks!
Normally this would be a great use of Yahoo Pipes, but it appears that the search page you cited has a robots.txt file, which Pipes respects. This means that Pipes will not pull data from the page.
For more info: "How do I keep Pipes from accessing my web pages?"
http://pipes.yahoo.com/pipes/docs?doc=troubleshooting#q14
You would have to write a scraper yourself that makes an HTTP request to that URL, parses the response, and writes RSS as output. This could be done in many server-side environments such as PHP, Python, etc.
EDIT: Feedity provides a service to scrape web pages into feeds. Here is a Feedity feed of your search url:
http://feedity.com/rss.aspx/calgaryherald-com/UFJWUVZQ
However, unless you sign up for a subscription ($3.25/mo), this feed will be subject to the following constraints:
Free feeds created
without an account are limited to 5
items and 10 hours update interval.
Free feeds created without an account
are automatically purged from our
system after 30 days of inactivity.
Provided it's just links and a timestamp you want for each article then the Yahoo Pipes Search module will return the latest 10 in it's search index of the Herlad site.
Just learning about this via youtube but could not find answer to my question of how reader knows there is an update.
Is it like a Push in blackberry?
RSS is a file format source and doesn't actually know anything about where it gets the entries from. The answer really is: "how can an http request get only the newest results from a server" and the answer is Conditional GET source. Http also supports Conditional PUT.
This is an article about using this feature of http to specifically support rss hackers.
RSS is a pull technology. The reader re-fetches the RSS feed now and then (for example two times per hour, or more often if the reader learns that it's an often updated feed).
The feed is served through regular HTTP and consists of a simple XML file. It is always fetched from the same URL.
It just check the feed for update regularly.
Recently there is a new protocol called pubsubhubbub to make feed push to the listener. But it requires the publishers support it.
Here is a list of web services support real-time RSS pushing, including Google Reader, Blogger, FeedBurner, FriendFeed, MySpace, etc.
Let's summarize :
Usually, a client knows that an RSS feed has been updated through polling, that is regular pull (HTTP GET request on the feed URL)
Push doesn't exist on the web, at least, not with HTTP until HTML5 websocket is fixed.
However, some blog frameworks like Wordpress, Google and others, now support the pubsubhubbub convention. In this mode, you would "subscribe" to the updates of an RSS flow. The "hub" will call an URL on YOUR site (callback URL) to send you updates : that is a push.
Push or pull, in both cases you still need to write some piece of code to update the RSS list on your site, database or wherever you store/display it.
And, as a side question, it is not necessary to request the whole XML at every pull to see if the content has changed : using a standard that is not linked to RSS, but global to the whole HTTP protocol (etag and last-modified headers), you can know if the RSS page was modified after a given date, and grab the whole XML only if modified.
It's a pull. That's why you have to configure your reader how often it should refresh the feed.