How to avoid race conditions on cursor.observe?

How to avoid race conditions on cursor.observe? - meteor

Race condiditions
In my Meteor application, I made an observe within a publish, that insert some new data in certain conditions. The point is that sometimes we have duplicated subscriptions, and race condition leads us to duplicate inserted data.
If it is not possible to have "singleton observers":
How can we avoid race conditions and duplicated inserted data on database?
Example:
Meteor.publish("fortuneUpdate", function () {
var selector = {user: this.userId, seen:false};
DailyFortunes.find(selector).observe({
removed: function(doc, beforeIndex){
if(DailyFortunes.find(selector).count()<1)
createDailyFortune(this.userId);
}
});
}
This question has been moved from How cursor.observe works and how to avoid multiple instances running?

According to Tom, it is not possible, for now, to ensure that calls to subscribe that have the same arguments are shared.
So, if you are having the same problem I had, of redundant data created inside observers, I suggest you, as workaround, to:
Create robust indexes that prevent repeted data creating. Compound Keys is probable what you need here.
Treat duplicate key error exceptions inside your observer ignoring race conditions.
example:
Collection.find(selector).observe({
removed: function(document){
try {
// Workaround to avoid race conditions > https://stackoverflow.com/q/13095647/599991
createNewDocument();
} catch (e) {
// XXX string parsing sucks, maybe
// https://jira.mongodb.org/browse/SERVER-3069 will get fixed one day
if (e.name !== 'MongoError') throw e;
var match = e.err.match(/^E11000 duplicate key error index: ([^ ]+)/);
if (!match) throw e;
//if match, just do nothing.
}
self.flush();
}
});

This is an odd pattern. Can you share some example code?
Generally I'd either expect to see mutations in a method, or setting up an observe inside Meteor.startup() on the server. (The latter is tricky if you're running multiple server processes, but so are many other things in a multi process regime. We'll have a better pattern down the line.)
Because it can be arbitrary JS, a publish function has to run once per subscribing client. It may log new subscriptions, set up per-client server state, or vary its behavior based on this.userId or even a random source. For example, consider a subscription that returns 10 randomly selected documents from a DB collection to each subscribed client!
So the place to optimize the case of many clients subscribing to the same data set is at the DB query layer: if a thousand clients are subscribed to the same DB query, we'll just run that underlying query once.

Related

Firebase RTD, atomic "move" ... delete and add from two "tables"?

In Firebase Realtime Database, it's a pretty common transactional thing that you have
"table" A - think of it as "pending"
"table" B - think of it as "results"
Some state happens, and you need to "move" an item from A to B.
So, I certainly mean this would likely be a cloud function doing this.
Obviously, this operation has to be atomic and you have to be guarded against racetrack effects and so on.
So, for item 123456, you have to do three things
read A/123456/
delete A/123456/
write the value to B/123456
all atomically, with a lock.
In short what is the Firebase way to achieve this?
There's already the awesome ref.transaction system, but I don't think it's relevant here.
Perhaps using triggers in a perverted manner?
IDK
Just for anyone googling here, it's worth noting that the mind-boggling new Firestore (it's hard to imagine anything being more mind-boggling than traditional Firebase, but there you have it...), the new Firestore system has built-in .......
This question is about good old traditional Firebase Realtime.

Gustavo's answer allows the update to happen with a single API call, which either complete succeeds or fails. And since it doesn't have to use a transaction, it has much less contention issues. It just loads the value from the key it wants to move, and then writes a single update.
The problem is that somebody might have modified the data in the meantime. So you need to use security rules to catch that situation and reject it. So the recipe becomes:
read the value of the source node
write the value to its new location while deleting the old location in a single update() call
the security rules validate the operation, either accepting or rejecting it
if rejected, the client retries from #1
Doing so essentially reimplements Firebase Database transactions with client-side code and (some admittedly tricky) security rules.
To be able to do this, the update becomes a bit more tricky. Say that we have this structure:
"key1": "value1",
"key2": "value2"
And we want to move value1 from key1 to key3, then Gustavo's approach would send this JSON:
ref.update({
"key1": null,
"key3": "value1"
})
When can easily validate this operation with these rules:
".validate": "
!data.child("key3").exists() &&
!newData.child("key1").exists() &&
newData.child("key3").val() === data.child("key1").val()
"
In words:
There is currently no value in key3.
There is no value in key1 after the update
The new value of key3 is the current value of key1
This works great, but unfortunately means that we're hardcoding key1 and key3 in our rules. To prevent hardcoding them, we can add the keys to our update statement:
ref.update({
_fromKey: "key1",
_toKey: "key3",
key1: null,
key3: "value1"
})
The different is that we added two keys with known names, to indicate the source and destination of the move. Now with this structure we have all the information we need, and we can validate the move with:
".validate": "
!data.child(newData.child('_toKey').val()).exists() &&
!newData.child(newData.child('_fromKey').val()).exists() &&
newData.child(newData.child('_toKey').val()).val() === data.child(newData.child('_fromKey').val()).val()
"
It's a bit longer to read, but each line still means the same as before.
And in the client code we'd do:
function move(from, to) {
ref.child(from).once("value").then(function(snapshot) {
var value = snapshot.val();
updates = {
_fromKey: from,
_toKey: to
};
updates[from] = null;
updates[to] = value;
ref.update(updates).catch(function() {
// the update failed, wait half a second and try again
setTimeout(function() {
move(from, to);
}, 500);
});
}
move ("key1", "key3");
If you feel like playing around with the code for these rules, have a look at: https://jsbin.com/munosih/edit?js,console

There are no "tables" in Realtime Database, so I'll use the term "location" instead to refer to a path that contains some child nodes.
Realtime Database provides no way to atomically transaction on two different locations. When you perform a transaction, you have to choose a single location, and you may only make changes under that single location.
You might think that you could just transact at the root of the database. This is possible, but those transactions may fail in the face of concurrent non-transaction write operations anywhere within the database. It's a requirement that there must be no non-transactional writes anywhere at the location where transactions take place. In other words, if you want to transact at a location, all clients must be transacting there, and no clients may write there without a transaction.
This rule is certainly going to be problematic if you transact at the root of your database, where clients are probably writing data all over the place without transactions. So, if you want perform an atomic "move", you'll either have to make all your clients use transactions all the time at the common root location for the move, or accept that you can't do this truly atomically.

Firebase works with Dictionaries, a.k.a, key-value pair. And to change data in more than one table on the same transaction you can get the base reference, with a dictionary containing "all the instructions", for instance in Swift:
let reference = Database.database().reference() // base reference
let tableADict = ["TableA/SomeID" : NSNull()] // value that will be deleted on table A
let tableBDict = ["TableB/SomeID" : true] // value that will be appended on table B, instead of true you can put another dictionary, containing your values
You should then merge (how to do it here: How do you add a Dictionary of items into another Dictionary) both dictionaries into one, lets call it finalDict,
then you can update those values, and both tables will be updated, deleting from A and "moving to" B
reference.updateChildValues(finalDict) // update everything on the same time with only one transaction, w/o having to wait for one callback to update another table

How to structure data in Firebase to avoid N+1 selects?

Since Firebase security rules cannot be used to filter children, what's the best way to structure data for efficient queries in a basic multi-user application? I've read through several guides, but they seem to break down when scaled past the examples given.
Say you have a basic messaging application like WhatsApp. Users can open chats with other groups of users to send private messages between themselves. Here's my initial idea of how this could be organized in Firebase (a bit similar to this example from the docs):
{
users: {
$uid: {
name: string,
chats: {
$chat_uid : true,
$chat2_uid: true
}
}
},
chats: {
$uid: {
messages: {
message1: 'first message',
message2: 'another message'
}
}
}
}
Firebase permissions could be set up to only let users read chats that are marked true in their user object (and restrict adding arbitrarily to the chats object, etc).
However this layout requires N+1 selects for several common scenarios. For example: to build the home screen, the app has to first retrieve the user's chats object, then make a get request for each thread to get its info. Same thing if a user wants to search their conversations for a specific string: the app has to run a separate request for every chat they have access to in order to see if it matches.
I'm tempted to set up a node.js server to run root-authenticated queries against the chats tree and skip the client-side firebase code altogether. But that's defeating the purpose of Firebase in the first place.
Is there a way to organize data like this using Firebase permissions and avoid the N+1 select problem?

It appears that n+1 queries do not necessarily need to be avoided and that Firebase is engineered specifically to offer good performance when doing n+1 selects, despite being counter-intuitive for developers coming from a relational database background.
An example of n+1 in the Firebase 2.4.2 documentation is followed by a reassuring message:
// List the names of all Mary's groups
var ref = new Firebase("https://docs-examples.firebaseio.com/web/org");
// fetch a list of Mary's groups
ref.child("users/mchen/groups").on('child_added', function(snapshot) {
// for each group, fetch the name and print it
String groupKey = snapshot.key();
ref.child("groups/" + groupKey + "/name").once('value', function(snapshot) {
System.out.println("Mary is a member of this group: " + snapshot.val());
});
});
Is it really okay to look up each record individually? Yes. The Firebase protocol uses web sockets, and the client libraries do a great deal of internal optimization of incoming and outgoing requests. Until we get into tens of thousands of records, this approach is perfectly reasonable. In fact, the time required to download the data (i.e. the byte count) eclipses any other concerns regarding connection overhead.

Meteor Deps.Autorun know when data has been fully fetched

Is there a way to know when data has been initially fully fetched from the server after running Deps.autorun for the first time?
For example:
Deps.autorun(function () {
var data = ItemsCollection.find().fetch();
console.log(data);
});
Initially my console log will show Object { items=[0] } as the data has not yet been fetched from the server. I can handle this first run.
However, the issue is that the function will be rerun whenever data is received which may not be when the full collection has been loaded. For example, I sometimes received Object { items=[12] } quickly followed by Object { items=[13] } (which isn't due to another client changing data).
So - is there a way to know when a full load has taken place for a certain dependent function and all collections within it?

You need to store the subscription handle somewhere and then use the ready method to determine whether the initial data load has been completed.
So if you subscribe to the collection using:
itemSub = Meteor.subscribe('itemcollections', blah blah...)
You can then surround your find and console.log statements with:
if (itemSub.ready()) { ... }
and they will only be executed once the initial dataset has been received.
Note that there are possible ocassions when the collection handle will return ready marginally before some of the items are received if the collection is large and you are dealing with significant latency, but the problem should be very minor. For more on why and how the ready () method actually works, see this.

Meteor.subscribe returns a handle with a reactive ready method, which is set to true when "an initial, complete snapshot of the record set has been sent" (see http://docs.meteor.com/#publish_ready)
Using this information you can design something simple such as :
var waitList=[Meteor.subscribe("firstSub"),Meteor.subscribe("secondSub"),...];
Deps.autorun(function(){
// http://underscorejs.org/#every
var waitListReady=_.every(waitList,function(handle){
return handle.ready();
});
if(waitListReady){
console.log("Every documents sent in publications is now available.");
}
});
Unless you're prototyping a toy project, this is not a solid design and you probably want to use iron-router (http://atmospherejs.com/package/iron-router) which provides great design patterns to address this kind of problems.
In particular, take a moment and have a look at these 3 videos from the main iron-router contributor :
https://www.eventedmind.com/feed/waiting-on-subscriptions
https://www.eventedmind.com/feed/the-reactive-waitlist-data-structure
https://www.eventedmind.com/feed/using-wait-waiton-and-ready-in-routes

Limiting Children of Object in Firebase

I am looking to get back my whole object, but limit one of my children objects.
For example, say you take a chat app like firebase does and you do "rooms".
So you might have
rooms: {
mainroom:{
name: something,
otherAttrs: mfasfd,
messages: {
0: {
message: something
},
1: {
message: something else
}
}
}
I may have 300 messages in that mainroom, but I want to limit it to 30 say. This example is basic, but in my actual application my objects are very related so I don't want to denormalize any further.
I could do a mainroom call, and then do another child call off of that, but I am wondering if I would get dinged twice. in the initial call it would load all messages anyways, and then I would load 30 of them with the child call. Was just hoping someone would have a better recommendation.

Start by reading up about denormalization. This is a concept which is enforced in SQL by table structures, but also important in NoSQL, although you're given enough rope to tangle yourself up and have a bad day.
So the first step is to split messages into its own path:
URL/rooms
URL/messages
Now you can grab your meta data and messages separately, and call limit to set the number loaded:
var fbRef = new Firebase(URL);
var roomRef = fbRef.child('rooms/'+roomId);
var chatRef = fbRef.child('messages/'+roomId).limit(30);
In case you're not convinced that these should be split up, you're going to run into this same issue when you want to create a dropdown containing a list of room names (you have to load all your messages in the current data structure, just to get the room names).
For great justice, split meta data and detailed records into their own paths. Otherwise, all your base are belong to bandwidth.

Is there a way to tell meteor a collection is static (will never change)?

On my meteor project users can post events and they have to choose (via an autocomplete) in which city it will take place. I have a full list of french cities and it will never be updated.
I want to use a collection and publish-subscribes based on the input of the autocomplete because I don't want the client to download the full database (5MB). Is there a way, for performance, to tell meteor that this collection is "static"? Or does it make no difference?
Could anyone suggest a different approach?

When you "want to tell the server that a collection is static", I am aware of two potential optimizations:
Don't observe the database using a live query because the data will never change
Don't store the results of this query in the merge box because it doesn't need to be tracked and compared with other data (saving memory and CPU)
(1) is something you can do rather easily by constructing your own publish cursor. However, if any client is observing the same query, I believe Meteor will (at least in the future) optimize for that so it's still just one live query for any number of clients. As for (2), I am not aware of any straightforward way to do this because it could potentially mess up the data merging over multiple publications and subscriptions.
To avoid using a live query, you can manually add data to the publish function instead of returning a cursor, which causes the .observe() function to be called to hook up data to the subscription. Here's a simple example:
Meteor.publish(function() {
var sub = this;
var args = {}; // what you're find()ing
Foo.find(args).forEach(function(document) {
sub.added("client_collection_name", document._id, document);
});
sub.ready();
});
This will cause the data to be added to client_collection_name on the client side, which could have the same name as the collection referenced by Foo, or something different. Be aware that you can do many other things with publications (also, see the link above.)
UPDATE: To resolve issues from (2), which can be potentially very problematic depending on the size of the collection, it's necessary to bypass Meteor altogether. See https://stackoverflow.com/a/21835534/586086 for one way to do it. Another way is to just return the collection fetch()ed as a method call, although this doesn't have the benefits of compression.

From Meteor doc :
"Any change to the collection that changes the documents in a cursor will trigger a recomputation. To disable this behavior, pass {reactive: false} as an option to find."
I think this simple option is the best answer

You don't need to publish your whole collection.
1.Show autocomplete options only after user has inputted first 3 letters - this will narrow your search significantly.
2.Provide no more than 5-10 cities as options - this will keep your recordset really small - thus no need to push 5mb of data to each user.
Your publication should look like this:
Meteor.publish('pub-name', function(userInput){
var firstLetters = new RegExp('^' + userInput);
return Cities.find({name:firstLetters},{limit:10,sort:{name:1}});
});