Consuming Atom feeds: how does it work? - rss

I'm sorry if the title is too generic, but I've been browsing the Internet for an hour and couldn't find any architectural explanation. I'm totally new to both the RSS and Atom protocols. What I have understood so far is:
A server publishes documents
Clients subscribe to this server
Clients are notified when the server publishes new documents
Clients consume the documents
It seems like a queueing mechanism (like JMS). What is not clear to me is:
Is "clients are notified" just another way of saying "clients must poll the server to check whether there are new messages"?
How does a client know that a message has already been read and is no longer 'new'? Is this check the responsibility of the client or of the server?
Can anyone point me to some documentation about this? I've been googling for a while, but every search sends me to sites that explain how to use libraries for parsing, etc.
Thanx

I think these answer your questions:
How large RSS reader works (netvibes, Google reader...)
How RSS and ATOM inform client about updates? long polling or polling or something else?
RSS 2.0 Specification
https://en.wikipedia.org/wiki/PubSubHubbub
How does a client know that a message has already been read and that
is no longer 'new'?
I think that is specific to the implementation, but for example you could save the guid of each fetched <item> and then flag items as read as the user reads them.
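A minimal sketch of that idea, assuming the client keeps a set of guids it has already fetched (marking items as read would work the same way with a second set). The sample feed, the `new_items` helper, and the fallback to `<link>` when `<guid>` is missing are all made up for illustration, not taken from any particular reader:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real client would download this from the feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><guid>post-1</guid><title>First post</title></item>
    <item><guid>post-2</guid><title>Second post</title></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (guid, title) pairs not seen before and record their guids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # <guid> is optional in RSS 2.0, so fall back to <link> (an assumption).
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid in seen_guids:
            continue
        seen_guids.add(guid)
        fresh.append((guid, item.findtext("title")))
    return fresh


seen = set()
first_fetch = new_items(SAMPLE_FEED, seen)   # both items are new
second_fetch = new_items(SAMPLE_FEED, seen)  # same feed again: nothing new
```

The server is not involved at all here: it serves the same document to everyone, and the "new or not" bookkeeping lives entirely in the client.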

I think Janih's answer below is good and you should check all these links.
For more specific details on your questions:
"Clients are notified" is just another way of saying "clients must poll
the server to check if there are new messages"?
Yes... and no. Yes, polling is the default, and yes, it's cumbersome. Protocols like PubSubHubbub help. RSS feed API services like Superfeedr (which I built!) will poll on your behalf and send you notifications using webhooks (so you don't have to poll at all!).

Related

TL;DR: Do RSS feeds transport everything in bulk or only updates?

Reading about RSS turns up a lot of misinformation, and I am not quite sure how RSS works. So I have some questions, and I hope you don't answer with links only. There is always another link that claims your link is wrong.
Questions:
If I subscribe to an RSS feed for the first time, are the entries from the last 30 years downloaded as one bulk response, which may amount to gigabytes of data?
Are subsequent requests to an already subscribed RSS feed updates to the previous subscription? If yes, how does the server know which messages have already been sent to the "client"?
How often are RSS feeds downloaded?
Kind regards
You get whatever is currently in the feed. How many entries and how far back that goes is up to the publisher.
No. Each request gets whatever is in the feed at the time.
As often as the client wants to download them. (RSS 2.0 includes optional elements such as <ttl>, <skipHours>, and <skipDays> to recommend a frequency, but clients may ignore them.)
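As a rough illustration of the last point, a client could read the RSS 2.0 `<ttl>` element (a number of minutes) and fall back to its own default when the publisher gives no hint. The 60-minute default, the helper name, and the sample feeds below are assumptions for the sketch, not part of any spec:

```python
import xml.etree.ElementTree as ET

DEFAULT_INTERVAL_MINUTES = 60  # hypothetical client default, not from any spec

# Hypothetical feeds, one with a publisher hint and one without.
WITH_TTL = "<rss version='2.0'><channel><title>t</title><ttl>30</ttl></channel></rss>"
WITHOUT_TTL = "<rss version='2.0'><channel><title>t</title></channel></rss>"


def poll_interval_minutes(feed_xml, default=DEFAULT_INTERVAL_MINUTES):
    """Return the channel's <ttl> in minutes, or the default if missing or invalid."""
    channel = ET.fromstring(feed_xml).find("channel")
    ttl_text = channel.findtext("ttl") if channel is not None else None
    try:
        ttl = int(ttl_text)
    except (TypeError, ValueError):
        # No <ttl> at all (None) or a non-numeric value.
        return default
    return ttl if ttl > 0 else default


hinted = poll_interval_minutes(WITH_TTL)
unhinted = poll_interval_minutes(WITHOUT_TTL)
```

Either way the decision stays with the client; the publisher can only suggest.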

how to show updated data to the users as fast as possible (not real-time)?

Some entity in the database is updated by a backend process. We want to show the updated value to the user on the website, not in real time but as fast as possible.
These are the problems we see with each approach:
Polling :- As we know, there are better techniques than polling, such as SSE and WebSockets.
SSE :- With SSE the connection stays open for a long time (I searched the Internet and found that it uses long polling), which might cause problems as the number of users grows.
WebSockets :- As we need only one-way communication (from server to client), SSE is better than this.
Our Solution
We check the database on every user request and update the value. (This is not very good, as it depends on the user's next request.)
Is this a good approach, is there a better way to do this, or have I misunderstood something about SSE?
Is it fine to use SignalR instead of all this? (Does it have any issues with long-lived connections?)
Thanks.
What you should use depends on your requirements.
Options:
Your clients need the updated information only when they make a request -> go your way.
If you need a solution for different client types (web client, WinForms client, Android client, ...) and you have, for example, different browser types to support: not all browsers support all mechanisms. SignalR was designed to automatically choose the right transport mechanism according to what the client supports --> SignalR is an option. (Read more details here: https://www.asp.net/signalr) It also has options to keep your connection alive.
There are also alternatives like https://pusher.com/ (in short, this is just a queue to which you can send messages and from which you can subscribe to messages). But these services are free only up to a limit, for example a certain data volume.
You can use event-based communication. Whenever there is a change (event) in the backend/database, the server should send a message to the clients.
Your app should register for the respective events and refresh the UI whenever there is an update.
We used Socket.IO for this use case in our apps, and it worked well.
Here is the website: https://socket.io/
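For the one-way, server-to-client case described in the question, the Server-Sent Events wire format is small enough to sketch by hand: the server keeps one HTTP response open and writes text messages to it as changes happen. The helper name `format_sse` and the event name `update` are made up for illustration; a real endpoint would write these strings to a response served as `text/event-stream`:

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" field per line.
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


# What the backend might emit when the entity changes (values are hypothetical).
message = format_sse("price=42", event="update", event_id=7)
```

In the browser, an `EventSource` pointed at that endpoint would receive each message as an event without the page having to poll.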

Web feed RSS and Atom: both inefficient?

As far as I understand, with both web feed formats, RSS and Atom, the client requests content from the server, and does so at periodic intervals. The client checks for updates whether or not there is new content.
Wouldn't it be more efficient the other way round? Let the server announce new updates. In this scenario, it would have to keep track of the clients, and of when each got which update. It would also have to send a message to each one. But still, it looks more efficient if client and server did not communicate when there is no news.
Is there a reason why web-feeds are the way they are?
This model is not inherent to feeds (RSS or Atom) but to HTTP itself, where a client queries a server to get data. At this point, this is the only way in a pure client -> server model to determine whether any new or updated data is available.
Now, in the context of servers querying other servers, PubSubHubbub solves that with webhooks. Basically, when polling any given resource, a server can also "subscribe" by providing a webhook which will be called upon a change or update in the feed. This way the subscriber does not have to poll the feed over and over again.
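To make the "subscribe by providing a webhook" step a bit more concrete, here is a sketch of the form-encoded body a subscriber would POST to a hub, using the `hub.mode`, `hub.topic`, `hub.callback`, and optional `hub.lease_seconds` parameters from the PubSubHubbub/WebSub specs. The URLs and the helper name are placeholders, and nothing is actually sent:

```python
from urllib.parse import urlencode, parse_qs


def subscription_body(topic_url, callback_url, lease_seconds=None):
    """Build the form-encoded body of a hub subscription request."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed to be notified about
        "hub.callback": callback_url,  # the subscriber's webhook
    }
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    return urlencode(params)


# Placeholder URLs for illustration only.
body = subscription_body(
    "https://blog.example.com/feed.xml",
    "https://reader.example.org/webhook",
    lease_seconds=86400,
)
decoded = parse_qs(body)  # round-trip to inspect what would be sent
```

The hub then verifies the callback with a GET request before it starts delivering updates, so a real subscriber also needs an endpoint that answers that verification.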

Is there a way using HTTP to allow the server to update the content in a client browser without client requesting for it?

It is quite easy to update the interface by sending jQuery ajax request and updating with new content. But I need something more specific.
I want the server to send a response to the client without the client having requested it, and to update the content when there is something new on the server. There should be no need to send an ajax request every time: when the server has new data, it sends a response to every client.
Is there any way to do this using HTTP or some specific functionality inside the browser?
Websockets, Comet, HTTP long polling.
This is called server push (you can also find it under the name Comet). Search using these keywords and you will find a bunch of examples, tools, and so on. No special protocol is required for that.
Aaah! You are trying to break the principles of the web :) You see if the web was pure MVC (model-view-controller) the 'server' could actually send messages to the client(s) and ask them to update. The issue is that the server could be load balanced and the same request could be sent to different servers. Now if you were to send a message back to the client you'll have to know who all are connected to the server. Let's say the site is quite popular and you have about 100,000 people connecting to it every day. You'll actually have to store the IPs of each of them to know where on the internet they are located and to be able to "push" them a message.
Caveats:
What if they are no longer browsing your website? You see currently there is no way to log out automatically if you close your browser. The server needs to check after a fixed timeout if you have logged out (or you send a new nonce with every response to prevent the server from doing that check)
What about a system restart/crash etc? You'd lose all the IPs that you were keeping track of and you are back to square one - you have people connected to you but until you receive new requests you can't really "send" them data when they may be expecting it as per your model.
Let's take an example of facebook's news feeds or "Most recent" link close to the top right - sometimes while you are browsing your wall you see the number next to most recent has gone up or a new 'feed' has come to the top of your wall post! It's the client sending periodic requests to the server to find out what was updated rather than the other way round
You see, it keeps it simple and restful. You may feel it's inefficient for the client to "poll" the server to pull the data and you'd prefer push, but the design of the server gets simplified :)
I suggest ajax-pulling is the best way to go - you are distributing computation to the client and keeping it simple (KISS principle :)
Of course you can get around it, the question is, is it worth it?
Hope this helps :)
RFC 6202 might be a good read.
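Long polling, one of the techniques named above and discussed in RFC 6202, boils down to the server holding a request open until it has something to say or a timeout expires. The sketch below simulates only that waiting logic in memory with threads; there is no HTTP in it, and the `UpdateChannel` class and its method names are invented for illustration:

```python
import threading


class UpdateChannel:
    """In-memory stand-in for a server that holds requests until data arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []

    def publish(self, message):
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake every waiting "request"

    def wait_for_updates(self, last_seen, timeout=30.0):
        """Block until messages beyond index last_seen exist, or until timeout."""
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._messages) > last_seen, timeout=timeout
            )
            return self._messages[last_seen:]


channel = UpdateChannel()
# Simulate the backend producing an update shortly after the client starts waiting.
threading.Timer(0.1, channel.publish, args=["value changed"]).start()
received = channel.wait_for_updates(last_seen=0, timeout=5.0)
# A second wait with nothing new returns an empty list once the timeout expires.
empty = channel.wait_for_updates(last_seen=1, timeout=0.05)
```

The client still initiates every exchange, so this stays within plain HTTP; it just gets its answer late instead of getting "nothing new" immediately.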

RSS feed basics - just repeatedly overwriting the same file?

Really simple question here:
For a PHP-driven RSS feed, am I just overwriting the same XML file every time I "publish" a new feed item? And will the syndicators it's registered with pop in from time to time to check whether there is anything new?
Yes. An RSS reader has the URL of the feed and regularly requests the same URL to check for new content.
That's how it works: a single XML RSS file that gets polled for changes by RSS readers.
For scalability there is FeedTree (collaborative RSS and Atom delivery), but unlike another well-known network program (BitTorrent), it hasn't had much support in readers by default.
Essentially, yes. It isn't necessarily a "file" actually stored on disk, but your RSS (or Atom) is just changed to contain the latest items/entries and resides at a particular fixed URL. Clients will fetch it periodically. There are also technologies like PubSubHubbub and pinging for causing updates to get syndicated closer to real-time.
Yes... BUT! There are ways to make subscribers' lives better and also improve your bandwidth usage :) Implement the PubSubHubbub protocol. It will help any application that wants the content of the feed to be notified as soon as it's available. It's relatively simple to implement on the publisher side as it only involves a ping.
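The publisher-side ping can be sketched roughly like this, assuming a hub that accepts the `hub.mode=publish` and `hub.url` form parameters described in the PubSubHubbub spec (check your hub's documentation, since hubs may differ). The question is about PHP, but the request looks the same from any language. Both URLs and the helper name are placeholders, and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, feed_url):
    """Build (but do not send) the POST that tells a hub the feed has changed."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder hub and feed URLs for illustration only.
ping = build_publish_ping(
    "https://hub.example.com/",
    "https://blog.example.com/feed.xml",
)
# Passing `ping` to urllib.request.urlopen would send it; that call is left out here.
```

The feed itself also needs to advertise the hub (a link with rel="hub") so that subscribers know where to subscribe.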
