Disconnected meteor application - meteor

I am interested in creating an application using the Meteor framework that will be disconnected from the network for long periods of time (multiple hours). I believe Meteor stores local data in RAM in a mini-MongoDB JS structure. If the user closes the browser, or refreshes the page, all local changes are lost. It would be nice if local changes were persisted to disk (localStorage? IndexedDB?). Any chance that's coming soon for Meteor?
Related question... how does Meteor deal with document conflicts? In other words, if 2 users edit the same MongoDB JSON doc, how is that conflict resolved? Optimistic locking?

Conflict resolution is "last writer wins".
More specifically, each MongoDB insert/update/remove operation on a client maps to an RPC. RPCs from a given client always play back in order. RPCs from different clients are interleaved on the server without any particular ordering guarantee.
If a client tries to issue RPCs while disconnected, those RPCs queue up until the client reconnects, and then play back to the server in order. When multiple clients are executing offline RPCs, the order they finally run on the server is highly dependent on exactly when each client reconnects.
For some offline mutations like MongoDB's $inc and $addToSet, this model works pretty well as is. But many common modifiers like $set won't behave very well across long disconnects, because the mutation will likely conflict with intervening changes from other clients.
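To make the difference between these modifiers concrete, here is a minimal sketch that replays queued operations from two offline clients against one document. `applyModifier` and `replay` are hypothetical helpers written for this illustration; they model only `$set` and `$inc` on a plain object and are not Meteor's or MongoDB's implementation.

```javascript
// Apply a tiny subset of MongoDB-style modifiers to a plain object.
// Only $set and $inc are modelled; this is an illustration, not minimongo.
function applyModifier(doc, modifier) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(modifier.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(modifier.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

// Replay a list of queued operations against a starting document, in order.
function replay(doc, operations) {
  return operations.reduce((current, op) => applyModifier(current, op), doc);
}

const start = { votes: 10 };

// Both clients saw votes = 10 while offline, and each wants to add one vote.
const incFromA = { $inc: { votes: 1 } };
const incFromB = { $inc: { votes: 1 } };
const setFromA = { $set: { votes: 11 } };
const setFromB = { $set: { votes: 11 } };

// $inc: the result is the same whichever client reconnects first.
const incResult = replay(start, [incFromA, incFromB]);

// $set: the second write overwrites the first, so one vote is silently lost.
const setResult = replay(start, [setFromA, setFromB]);

console.log(incResult.votes, setResult.votes);
```

Because `$inc` describes a change relative to whatever value is current, the replay order does not matter; `$set` bakes in a value computed from stale data, which is why it needs some explicit conflict handling.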
So building "offline" apps is more than persisting the local database. You also need to define RPCs that implement some type of conflict resolution. Eventually we hope to have turnkey packages that implement various resolution schemes.

Related

How efficient is Meteor's DDP at syncing very large collections?

Meteor's DDP protocol works very well for syncing a small collection of data from a server to a browser-based client, which inherently limits the amount of data that is processed.
However, consider a situation where Meteor is being used to sync a large collection from one server to another, or just the DDP protocol itself is used to sync one MongoDB with another.
How efficient is DDP in this case (computationally)? How well does it scale to several clients? Is the limit to performance only bandwidth or will DDP hit some CPU bound as well? What is the largest amount of data that can be reasonably synced over DDP right now? Is DDP just the wrong approach for doing this (see references below)?
Some additional thoughts:
As far as I know, the current version of DDP keeps track of each client's entire collection, so it can't be asymptotically very efficient.
Smart Collections were created to improve the performance of server-to-client syncing of collections. But it's unclear to me whether this improves DDP itself or something else.
See also:
How to implement real-time replication of MongoDB (or CouchDB) to many remote clients
DDP vs Straight MongoDB access for synching large amounts of data
EDIT:
After some empirical experience with this, I have to conclude that the answer is "not very efficient". See https://stackoverflow.com/a/21835534/586086 for an explanation.
Discussions with Meteor devs indicated that this problem will be addressed in the future with a revision of DDP and the publish-subscribe API, whereby the merge box will be removed and clients will handle merging. This will save CPU/memory on the server and allow for much larger datasets to be sent over the wire.
Basically, it is more a matter of what and how you are publishing to the client than of the number of clients. An indexed query is usually handled in O(log N) time, so it is fairly cheap for the server to recompute the result set even if (in the worst case) the whole collection changes. So, on the server side, you can quite quickly get the new result sets to publish to the clients (if they changed from the ones the clients already had).
The real problem (and a common error) comes when you publish everything to the client (as the former autopublish did), so design each publication carefully so that it gives the client only what it is supposed to see. You can either prune the documents by hiding unneeded fields, or reduce the result set sent to the client by creating a publication that takes parameters specific to the scope of data being used.
Data reactivity (a session parameter bound to a publication) should also be handled with care. If, for example, you send a request each time a key is pressed in the search field, you might quickly overload the connection (though this still depends strongly on the size of the set you are publishing). We had to deal with this while building a real estate service on Meteor; with a data set of several gigabytes, it was quite challenging to handle without clogging the pipe with excess data.
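One common way to avoid a request per keystroke is to debounce the input, so the search term (and therefore the subscription) only changes after the user pauses. The sketch below is a generic debounce helper, not something taken from the answer; the injectable `timers` parameter is an assumption added so the behaviour can be checked without a real clock.

```javascript
// Returns a wrapped function that runs `fn` only after calls have stopped
// for `delayMs`. `timers` is injectable (hypothetical parameter) for testing.
function debounce(fn, delayMs, timers) {
  const t = timers || {
    set: (callback, ms) => setTimeout(callback, ms),
    clear: (id) => clearTimeout(id),
  };
  let pending = null;
  return function debounced(...args) {
    // A newer call cancels the one that was still waiting.
    if (pending !== null) t.clear(pending);
    pending = t.set(() => {
      pending = null;
      fn(...args);
    }, delayMs);
  };
}

// Hypothetical browser-side usage: update the reactive search term only
// after 300 ms without a keystroke (the delay is an illustrative value).
// const updateSearch = debounce((term) => Session.set("searchTerm", term), 300);
// searchInput.addEventListener("input", (e) => updateSearch(e.target.value));
```

With this in place, typing a ten-letter word triggers one resubscription instead of ten.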
In terms of bandwidth, DDP is quite good because it supports smart updating of entries (sending only the changed fields instead of the whole document); moving an item is (or will be) supported too (server-side reordering).
Also take a look at this excellent answer concerning huge collections and what is done under the hood.
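As a rough illustration of field-level updates, the sketch below computes which top-level fields changed between two versions of a document. `diffFields` is a hypothetical helper; the `fields`/`cleared` shape is assumed here to resemble what a DDP change message carries, and the comparison via `JSON.stringify` is a simplification that is sensitive to key order in nested objects.

```javascript
// Compute the top-level differences between two versions of a document.
// `fields` holds new or changed values; `cleared` lists removed field names.
function diffFields(oldDoc, newDoc) {
  const fields = {};
  const cleared = [];
  for (const key of Object.keys(newDoc)) {
    // Simplified deep comparison; good enough for plain JSON-like values.
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(newDoc[key])) {
      fields[key] = newDoc[key];
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (!(key in newDoc)) cleared.push(key);
  }
  return { fields, cleared };
}

// Illustrative data only.
const before = { _id: "a1", price: 250000, status: "listed", agent: "kim" };
const after = { _id: "a1", price: 245000, status: "listed" };

const change = diffFields(before, after);
console.log(JSON.stringify(change));
```

Only the changed `price` and the removed `agent` need to travel over the wire; the unchanged `status` does not.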

Meteorjs how much of a real time is it really?

I had my chance to play with this tool for a while now and made a chat application instead of a hello world. My project has 2 meteor applications sharing the same mongo database:
client
operator
when I type a message from the operator console it sometimes takes as much as 7-8 seconds to appear to the subscribed client. So my question is...how much of a real time can I expect from this meteor? Right now I can see better results with other services such as pubnub or pusher.
Should the delay come from the fact that it's 2 applications subscribed to the same db?
P.S. I need 2 applications because the client and operator apps are totally different mostly in design and media libraries (css/jquery plugins etc.) which is the only way I found to make the client app much lighter.
If you use two databases without DDP, your apps are not going to operate in real time. You should either use one complete app or use DDP to relay messages to the other instance (via Meteor.connect).
This is a bit of an issue for the moment if you want to do the subscription on the server, as there isn't really server-to-server DDP support with subscriptions yet. So you need to use the client to make the subscription:
connection = Meteor.connect("http://YourOtherMeteorInstanceUrl");
connection.subscribe("messages");
Instead of
Meteor.subscribe("messages");
in your client app, using, of course, the same subscription names as the corresponding publish functions on the other Meteor instance.
Akshat's answer is good, but there's a bit more explanation of why:
When Meteor is running it adds an observer to the collection, so any changes to data in that collection are immediately reactive. But, if you have two applications writing to the same database (and this is how you are synchronizing data), the observer is not in place. So it's not going to be fully real-time.
However, the server does regularly poll the database for outside changes, hence the 7-8 second delay.
It looks like your applications are designed this way to overcome the limitation Meteor has right now where all client code is delivered to all clients. Fixing this is on the roadmap.
In the meantime, in addition to Akshat's suggestion, I would also recommend using Meteor methods to insert messages. Then, from the client application, use Meteor.call('insertMessage', options ... to add messages via DDP, which will keep the application real-time.
You would also want to separate the databases.

DDP vs Straight MongoDB access for synching large amounts of data

We are building an app in Meteor that will be participating in an education ecosystem.
There are a number of applications (e.g. a GradeBook, a Student Information System, a Reporting System...) that will all need to have their data stores kept in synch with Meteor. The datastore size will be in the hundreds of thousands of documents.
My understanding is that DDP is used to connect "clients" to a Meteor app (by subscribing to feeds when Meteor is pushing data changes and RPC to get the data in to Meteor). And a "client" is generally scoped to a user...so the size of the data set is relatively small compared to the universe of data (a teacher might have access to 100 of the 250K documents).
If I connected a Reporting System (as a "client") to Meteor with DDP, all data in the store would need to be synched...does that mean that every time the Reporting System lost the connection to Meteor, all data would be re-sent from Meteor to the DDP client? (because the Reporting System is interested in ALL the data)...and if that's the case, DDP wouldn't be the way to keep applications in synch, right?...it's meant more for much smaller scoped data sets....and we should probably be interacting directly with Mongo to keep things in synch.
Thanks!
Mike
based on this
http://meteor.com/blog/2012/03/21/introducing-ddp
Distributed Data Protocol. DDP is a standard way to solve the biggest problem facing client-side JavaScript developers: querying a server-side database, sending the results down to the client, and then pushing changes to the client whenever anything changes in the database.
it seems clear that any new DDP client receives all data and then deltas as the data changes.
I would suggest that if your 'client' doesn't need reactivity / realtime updates / 2-way synching, you should pull the data directly from Mongo and avoid the overhead of 'syncing'. For a 'reporting system' this should be perfectly acceptable: grab a bunch of data, generate reports. You shouldn't care about changing data in this context, just a snapshot and reports from that snapshot.
If you do need the more real-time features, DDP is likely worth the overhead and initial setup difficulty.
I think nate's answer is right about what you should do, especially considering the volume of data. And if you need to display a whole lot of data across pages, use a paginated subscription so that you can enjoy the realtime functionality (if you decide to use it) without downloading it all at once. Keep in mind, though, that at the moment the data is sent down like this (for each session, so if the tab is closed and reopened it will be redone):
1 - Connect to DDP Server/Proxy (Long Polling now due to websocket issues with chrome)
2 - Establish a 'subscription'
3 - Fetch all data relevant to subscription (initial download)
4 - Subscription is complete, now the client will 'listen' for changes
5 - Any updates (remove/update/insert, etc) are sent down to the client
There really isn't a sync system at this point where old data is kept offline (in localStorage or IndexedDB or anything) so that step 3 can be avoided and only the changes from that point on would be synced.
With this in mind, if there is a connectivity interruption (e.g. losing connectivity for a short period of time), Meteor will lose its connection to the DDP wire, and when it reconnects it will download everything again as if from scratch.
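The numbered steps above can be sketched as a toy simulation. Nothing here talks to a real server: `subscribe` is a hypothetical function, and the `added`/`ready` message names are assumptions used for illustration rather than a wire-exact trace of the protocol version discussed in the answer.

```javascript
// Toy in-memory "server": returns the messages a client would receive
// after subscribing (steps 2 to 4). Message shapes are illustrative.
function subscribe(collectionName, documents, subId) {
  const messages = documents.map((doc) => ({
    msg: "added",
    collection: collectionName,
    id: doc._id,
    fields: { title: doc.title },
  }));
  // Marks the end of the initial download for this subscription.
  messages.push({ msg: "ready", subs: [subId] });
  return messages;
}

// Illustrative data only.
const reports = [
  { _id: "r1", title: "Attendance" },
  { _id: "r2", title: "Grades" },
  { _id: "r3", title: "Enrollment" },
];

// First session: subscribe, initial download, ready.
const firstSession = subscribe("reports", reports, "sub-1");

// The connection drops and comes back. The client kept nothing on disk,
// so the new session repeats the full initial download.
const secondSession = subscribe("reports", reports, "sub-2");

const resentDocs = secondSession.filter((m) => m.msg === "added").length;
console.log(resentDocs);
```

With three documents the repeat is harmless; with hundreds of thousands, every reconnect costs a full transfer, which is the concern raised in the question.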

flex and data concurrency

I am soon to embark on a medium scale project. Although this isn't a very high priority in my large list of things to do, I have been thinking about how I could effectively handle data concurrency.
I will be using a stateless EJB backend to my flex application.
Ideally I am looking for a simple method to deal with data concurrency. e.g. if data is saved on one interface it is refreshed in another. Or it warns that the data has been changed before saving a new version of the data.
Has anyone any ideas? I am at a loss at the moment. As I mentioned, it's not a high priority, but I would feel a lot better if I had some mechanism to improve the process.
If you are planning on using AMF channels for communication you can use the long polling feature to effectively give your application "push message" type support. Both the BlazeDS and/or GraniteDS data services support this capability for exactly the reasons you mentioned.
Version control systems store a user_id and datetime for every revision. You can use the same method. The client app gets the current revision datetime along with the requested data and saves it. When the app sends changed data, it includes the saved datetime. The server compares the datetime of the last revision with the received datetime and replies to the app accordingly.
The second method is broadcasting messages from the server to clients, but I don't think it's applicable in your case. This method is usually put into practice on a LAN (an environment with a stable connection).
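The revision-datetime check described above is a form of optimistic concurrency control. A minimal sketch follows, written in JavaScript for illustration rather than in the asker's Flex/EJB stack; `store`, `read`, and `save` are hypothetical names, and plain numbers stand in for datetimes.

```javascript
// In-memory stand-in for the server-side store: id -> { data, updatedAt }.
const store = new Map();
store.set("doc1", { data: { title: "Draft" }, updatedAt: 1000 });

// The client reads a record and remembers its revision time.
function read(id) {
  const record = store.get(id);
  return { data: { ...record.data }, updatedAt: record.updatedAt };
}

// The client sends its changes together with the remembered revision time.
// The server accepts the save only if nobody else has saved in between.
function save(id, newData, seenUpdatedAt, now) {
  const record = store.get(id);
  if (record.updatedAt !== seenUpdatedAt) {
    return { ok: false, reason: "conflict", current: record };
  }
  store.set(id, { data: newData, updatedAt: now });
  return { ok: true };
}

// Two users open the same record, then both try to save.
const alice = read("doc1");
const bob = read("doc1");
const aliceSave = save("doc1", { title: "Alice's edit" }, alice.updatedAt, 2000);
const bobSave = save("doc1", { title: "Bob's edit" }, bob.updatedAt, 3000);

console.log(aliceSave.ok, bobSave.ok);
```

The second save is rejected with the current record attached, so the client can warn the user that the data has changed and offer to reload or merge.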

Asp.net chat application using database for message queue

I have developed a chat web application which uses a SqlServer database for exchanging messages.
All clients poll every x seconds to check for new messages.
It is obvious that this approach consumes many resources, and I was wondering if there is a "cheaper" way of doing that.
I use the same approach for "presence": checking who is on.
Without using a browser plugin/extension like Flash or a Java applet, the browser is essentially a one-way communication tool. The request has to be initiated by the browser to fetch data. You cannot 'push' data to the browser.
Many web apps use Ajax polling to simulate a server 'push'. The trick is to balance the frequency and data size against the bandwidth and server resources.
I just did a simple observation of Gmail. It does an HTTP POST poll every 5 seconds. If there's no 'state' change, the response data size is only a few bytes (not including the HTTP headers). Of course, Google has huge server resources and bandwidth, which is why I mention finding a good balance.
That is, "improving user experience vs. server resources". You might need to come up with a creative polling strategy instead of straightforward polling every x seconds.
E.g. if there is no activity from party A, poll every 3 seconds. While party A is typing, poll every 5 seconds. This is just an illustration; you can play around with the numbers, or come up with a more efficient scheme.
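One possible variation on this idea, not taken from the answer, is to back off while nothing is happening and snap back to fast polling as soon as a message arrives. `nextPollInterval` is a hypothetical helper, and the 1 second and 30 second bounds are illustrative assumptions.

```javascript
// Decide how long to wait before the next poll.
// The default bounds are illustrative, not values from the answer.
function nextPollInterval(currentMs, gotNewMessages, minMs = 1000, maxMs = 30000) {
  if (gotNewMessages) return minMs;      // conversation is active: poll fast
  return Math.min(currentMs * 2, maxMs); // idle: back off, up to a ceiling
}

// Simulate five consecutive polls: three empty, one with messages, one empty.
let interval = 1000;
const history = [];
for (const gotMessages of [false, false, false, true, false]) {
  interval = nextPollInterval(interval, gotMessages);
  history.push(interval);
}

console.log(history);
```

An idle chat window ends up polling every 30 seconds instead of every second, while an active conversation stays responsive.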
Lastly, the data exchange. The challenge is to find a way to pass minimum data sizes to convey the same info.
my 2 cents :)
For something like a real-time chat app, I'd recommend a distributed cache with a SQL backing. I happen to like memcached with the Enyim .NET provider, so I'd do something like the following:
User posts message
System writes message to database
System writes message to cache
All users poll cache periodically for new messages
The database backing allows you to preload the cache in the event the cache is cleared or the application restarts, but the functional bits rely on in-memory cache, rather than polling the database.
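The flow above can be sketched with in-memory stand-ins. This is written in JavaScript for illustration rather than ASP.NET; the `database` array and `cache` map are hypothetical substitutes for the SQL table and memcached, and no real memcached or Enyim API is used.

```javascript
const database = [];     // stand-in for the SQL messages table
const cache = new Map(); // stand-in for the distributed cache: roomId -> messages

// Return the cached message list for a room, preloading it from the
// database on a miss (for example after the cache was cleared).
function roomCache(roomId) {
  if (!cache.has(roomId)) {
    cache.set(roomId, database.filter((m) => m.roomId === roomId));
  }
  return cache.get(roomId);
}

// A user posts a message: write it to the database, then to the cache.
function postMessage(roomId, text) {
  const message = { id: database.length + 1, roomId, text };
  const cached = roomCache(roomId); // load existing history before appending
  database.push(message);
  cached.push(message);
  return message;
}

// Clients poll the cache, asking only for messages newer than lastSeenId.
function pollMessages(roomId, lastSeenId) {
  return roomCache(roomId).filter((m) => m.id > lastSeenId);
}

postMessage("lobby", "hello");
postMessage("lobby", "anyone here?");
console.log(pollMessages("lobby", 1).map((m) => m.text));
```

Routine polls never touch the database; it is read only when a room's entry is missing from the cache.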
If you are using SQL Server 2005 you can look at Notification Services, which was designed to allow SQL Server to notify client applications of changes to the database. Granted, this would lock you into SQL Server 2005, as Notification Services was removed in SQL Server 2008.
If you want something a little more scalable, you can put a couple of bit flags on the Users record. When a message for the user comes in, change the new-messages bit to 1. When you read the messages, change it to 0. Do the same for when people sign on and off. That way you are reading a very small field that has a damn good chance of already being in cache.
So the workflow would be to read the bit. If it's 1, go get the messages from the message table. If it's 0, do nothing.
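A minimal sketch of that workflow, again using in-memory stand-ins in JavaScript rather than real SQL tables. `users`, `messages`, `deliver`, and `checkInbox` are hypothetical names; `messageTableReads` is included only to show how often the larger table is touched.

```javascript
// Stand-ins for the Users table (with its flag) and the messages table.
const users = new Map([["bob", { hasNewMessages: false }]]);
const messages = [];
let messageTableReads = 0;

// A message arrives: store it and raise the recipient's flag.
function deliver(toUser, text) {
  messages.push({ toUser, text, read: false });
  users.get(toUser).hasNewMessages = true;
}

// Each poll reads only the flag; the messages table is read only when set.
function checkInbox(userName) {
  const user = users.get(userName);
  if (!user.hasNewMessages) return [];
  messageTableReads += 1;
  const unread = messages.filter((m) => m.toUser === userName && !m.read);
  unread.forEach((m) => { m.read = true; });
  user.hasNewMessages = false; // reset the flag once messages are read
  return unread;
}
```

Polling an empty inbox costs one small flag read and never scans the messages table.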
In ASP.NET 4.0 you can use the Observer pattern with JavaScript objects and arrays, i.e. AJAX JSON calls with jQuery and/or PageMethods.
You are going to always have to hit the database to do analysis on whether there is any data to return or not. The trick will be on making those calls small and only return data when needed.
There are two related solutions built-in to SQL Server 2005 and still available in SQL Server 2008:
1) Service Broker, which allows subscribers to post reads on queues (the RECEIVE command with WAIT..). In your case you would want to send your message through the database by using Service Broker Services fronting these Queues, which could then be picked up by the waiting clients. There's no polling, the waiting clients just get activated when a message is received.
2) Query Notifications, which allow a subscriber to define a query and then receive notifications when the dataset that would result from executing that query changes. Built on Service Broker, Query Notifications are somewhat easier to use, but may also be somewhat less efficient. (Note that Query Notifications and their siblings, Event Notifications, are frequently mistaken for Notification Services (NS), which causes concern because NS was discontinued in SQL Server 2008. However, Query and Event Notifications are still fully available and even enhanced in SQL Server 2008.)

Resources