Firebase RTD, atomic "move" ... delete and add from two "tables"? - firebase

In Firebase Realtime Database, it's a pretty common transactional thing that you have
"table" A - think of it as "pending"
"table" B - think of it as "results"
Some state happens, and you need to "move" an item from A to B.
To be clear, this would most likely be done by a cloud function.
Obviously, this operation has to be atomic, and you have to be guarded against race conditions and so on.
So, for item 123456, you have to do three things
read A/123456/
delete A/123456/
write the value to B/123456
all atomically, with a lock.
In short what is the Firebase way to achieve this?
There's already the awesome ref.transaction system, but I don't think it's relevant here.
Perhaps using triggers in a perverted manner?
IDK
Just for anyone googling here, it's worth noting that the mind-boggling new Firestore (it's hard to imagine anything being more mind-boggling than traditional Firebase, but there you have it...) has built-in .......
This question is about good old traditional Firebase Realtime.

Gustavo's answer allows the update to happen with a single API call, which either completely succeeds or completely fails. And since it doesn't have to use a transaction, it has far fewer contention issues. It just loads the value from the key it wants to move, and then writes a single update.
The problem is that somebody might have modified the data in the meantime. So you need to use security rules to catch that situation and reject it. So the recipe becomes:
read the value of the source node
write the value to its new location while deleting the old location in a single update() call
the security rules validate the operation, either accepting or rejecting it
if rejected, the client retries from #1
Doing so essentially reimplements Firebase Database transactions with client-side code and (some admittedly tricky) security rules.
To be able to do this, the update becomes a bit more tricky. Say that we have this structure:
"key1": "value1",
"key2": "value2"
And we want to move value1 from key1 to key3, then Gustavo's approach would send this JSON:
ref.update({
  "key1": null,
  "key3": "value1"
})
We can easily validate this operation with these rules:
".validate": "
!data.child("key3").exists() &&
!newData.child("key1").exists() &&
newData.child("key3").val() === data.child("key1").val()
"
In words:
There is currently no value in key3.
There is no value in key1 after the update.
The new value of key3 is the current value of key1.
This works great, but unfortunately means that we're hardcoding key1 and key3 in our rules. To prevent hardcoding them, we can add the keys to our update statement:
ref.update({
_fromKey: "key1",
_toKey: "key3",
key1: null,
key3: "value1"
})
The difference is that we added two keys with known names, to indicate the source and destination of the move. With this structure we have all the information we need, and we can validate the move with:
".validate": "
!data.child(newData.child('_toKey').val()).exists() &&
!newData.child(newData.child('_fromKey').val()).exists() &&
newData.child(newData.child('_toKey').val()).val() === data.child(newData.child('_fromKey').val()).val()
"
It's a bit longer to read, but each line still means the same as before.
And in the client code we'd do:
function move(from, to) {
  ref.child(from).once("value").then(function(snapshot) {
    var value = snapshot.val();
    var updates = {
      _fromKey: from,
      _toKey: to
    };
    updates[from] = null;  // delete the old location
    updates[to] = value;   // write the value to its new location
    ref.update(updates).catch(function() {
      // the update failed (e.g. rejected by the rules), wait half a second and try again
      setTimeout(function() {
        move(from, to);
      }, 500);
    });
  });
}
move("key1", "key3");
If you feel like playing around with the code for these rules, have a look at: https://jsbin.com/munosih/edit?js,console

There are no "tables" in Realtime Database, so I'll use the term "location" instead to refer to a path that contains some child nodes.
Realtime Database provides no way to atomically transaction on two different locations. When you perform a transaction, you have to choose a single location, and you may only make changes under that single location.
You might think that you could just transact at the root of the database. This is possible, but those transactions may fail in the face of concurrent non-transaction write operations anywhere within the database. It's a requirement that there must be no non-transactional writes anywhere at the location where transactions take place. In other words, if you want to transact at a location, all clients must be transacting there, and no clients may write there without a transaction.
This rule is certainly going to be problematic if you transact at the root of your database, where clients are probably writing data all over the place without transactions. So, if you want to perform an atomic "move", you'll either have to make all your clients use transactions all the time at the common root location for the move, or accept that you can't do this truly atomically.
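For illustration, here's a minimal sketch of what a root-level transaction for the question's "move" could look like, assuming the namespaced web SDK and the A/pending and B/results locations and id 123456 from the question. Note that this downloads and rewrites the entire root, which is usually impractical:
// Runs the whole "move" inside one root-level transaction.
// Only safe if every other writer also uses transactions at this location.
var rootRef = firebase.database().ref();
rootRef.transaction(function(current) {
  current = current || {};
  var item = current.A && current.A["123456"];
  if (item === undefined || item === null) return current; // nothing to move, leave data as-is
  current.B = current.B || {};
  current.B["123456"] = item;    // write the value to B/123456
  delete current.A["123456"];    // delete A/123456
  return current;
});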

Firebase works with dictionaries, a.k.a. key-value pairs. To change data in more than one "table" in the same operation, you can take the base reference and pass it a dictionary containing "all the instructions", for instance in Swift:
let reference = Database.database().reference() // base reference
let tableADict: [String: Any] = ["TableA/SomeID": NSNull()] // value that will be deleted from table A
let tableBDict: [String: Any] = ["TableB/SomeID": true] // value that will be appended to table B; instead of true you can put another dictionary containing your values
You should then merge both dictionaries into one (how to do it here: How do you add a Dictionary of items into another Dictionary), let's call it finalDict, for instance:
let finalDict = tableADict.merging(tableBDict) { _, new in new } // one possible way to merge (Swift 4+)
Then you can update those values, and both tables will be updated, deleting from A and "moving to" B:
reference.updateChildValues(finalDict) // updates everything at the same time with only one call, without having to wait for one callback to update another table

Related

Concurrent updates in DynamoDB, are there any guarantees?

In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e., "optimistic locking"). I know that. But I was wondering if there are any other cases where I can be sure that concurrent updates to the same item all survive.
For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survive?
A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
I see a version of my second question was already asked here in the past (Are DynamoDB "set" values CDRTs?), but the answer referred to a not-very-clear FAQ entry which doesn't exist any more. What I would most like to see as an answer is official DynamoDB documentation that says how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have real-world experience with such concurrent updates?
I just had the same question and came across this thread. Given that there was no answer I decided to test it myself.
The answer, as far as I can observe, is that as long as you are updating different attributes it will eventually succeed. It does take a little bit longer the more updates I push to the item, so they appear to be written in sequence rather than in parallel.
I also tried updating a single List attribute in parallel and this, as expected, failed: the resulting list once all queries had completed was broken and only had some of the entries pushed to it.
The test I ran was pretty rudimentary and I might be missing something but I believe the conclusion to be correct.
For completeness, here is the script I used, nodejs.
const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();
const key = process.argv[2];
const num = process.argv[3];
run().then(() => {
  console.log('Done');
});
async function run() {
  const p = [];
  for (let i = 0; i < num; i++) {
    p.push(ddb.update({
      TableName: 'concurrency-test',
      Key: {x: key},
      UpdateExpression: 'SET #k = :v',
      ExpressionAttributeValues: {
        ':v': `test-${i}`
      },
      ExpressionAttributeNames: {
        '#k': `k${i}`
      }
    }).promise());
  }
  await Promise.all(p);
  const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
  const item = response.Item;
  console.log('keys', Object.keys(item).length);
}
Run like so:
node index.js {key} {number}
node index.js myKey 10
Timings:
10 updates: ~1.5s
100 updates: ~2s
1000 updates: ~10-20s (fluctuated a lot)
Worth noting is that the metrics show a lot of throttled events, but these are handled internally by the Node.js SDK using exponential backoff, so once the dust settled everything was written as expected.
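For reference, a minimal sketch of how that retry behaviour can be tuned when constructing the client (aws-sdk v2; the numbers here are illustrative, not recommendations):
const aws = require('aws-sdk');
// Pass a pre-configured DynamoDB service to the DocumentClient to raise the
// retry count and the base delay used for the exponential backoff.
const ddb = new aws.DynamoDB.DocumentClient({
  service: new aws.DynamoDB({ maxRetries: 10, retryDelayOptions: { base: 50 } })
});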
Your post contains quite a lot of questions.
There's a note in DynamoDB's manual:
All write requests are applied in the order in which they were received.
I assume that the clients send the requests in the order they were passed through a call.
That should resolve the question of whether there are any guarantees. If you update different properties of an item in several requests, each updating only those properties, it should end up in the expected state (the 'sum' of the distinct changes).
If you, on the other hand, update the whole object, the last one will win.
DynamoDB has #DynamoDbVersion which you can use for optimistic locking to manage concurrent writes of whole objects.
For scenarios like auctions or parallel tick counts (such as "likes"), DynamoDB offers atomic counters.
If you update a list, it depends on whether you use DynamoDB's list type (L), or whether it is just a property and the client translates the list into a String (S). So if you read a property, change it, and write it back, and do that in parallel, the result will be subject to eventual consistency - what you read may not be the latest write. Applied to lists, and several times, you'll end up with some of the elements added and some not (or, better said, added but then overwritten).
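A minimal sketch of those two patterns with the Node.js SDK, reusing the 'concurrency-test' table and key attribute 'x' from the test script above; the 'likes' and 'version' attributes are made up for illustration:
const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();
async function demo() {
  // Atomic counter: ADD is applied server-side, so parallel increments all survive.
  await ddb.update({
    TableName: 'concurrency-test',
    Key: { x: 'myKey' },
    UpdateExpression: 'ADD #likes :inc',
    ExpressionAttributeNames: { '#likes': 'likes' },
    ExpressionAttributeValues: { ':inc': 1 }
  }).promise();
  // Optimistic locking: the write only succeeds if the version we read is still current.
  await ddb.update({
    TableName: 'concurrency-test',
    Key: { x: 'myKey' },
    UpdateExpression: 'SET #p = :p, #v = :next',
    ConditionExpression: '#v = :expected',
    ExpressionAttributeNames: { '#p': 'payload', '#v': 'version' },
    ExpressionAttributeValues: { ':p': 'new value', ':expected': 3, ':next': 4 }
  }).promise();
}
demo().catch(console.error);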

Firebase update on disconnect

I have a node on firebase that lists all the players in the game. This list will update as and when new players join. And when the current user ( me ) disconnects, I would like to remove myself from the list.
As the list will change over time, at the moment I disconnect, I would like to update this list and update firebase.
This is the way I am thinking of doing it, but it doesn't work, as .update doesn't accept a function, only an object. But if I create the object beforehand, by the time .onDisconnect fires it will not be the latest object... How should I go about doing this?
payload.onDisconnect().update( () => {
const withoutMe = state.roomObj
const index = withoutMe.players.indexOf( state.userObj.name )
if ( index > -1 ) {
withoutMe.players.splice( index, 1 )
}
return withoutMe
})
The onDisconnect handler was made for this use-case. But it requires that the data of the write operation is known at the time that you set the onDisconnect. If you think about it, this should make sense: since the onDisconnect write happens after your client is disconnected, the data of that write operation must be known before the disconnect.
It sounds like you're building a so-called presence system: a list that contains a node for each user that is currently online. The Firebase documentation has an example of such a presence system. The key difference from your approach is that in the documentation each user only modifies their own node.
So: when the user comes online, they write a node for themselves. And then when they get disconnected, that node gets removed. Since all users write their node under the same parent, that parent will reflect the users that are online.
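A minimal sketch of that pattern (namespaced web SDK; the /presence path and field names are just examples):
var uid = "some-user-id"; // e.g. from Firebase Auth
var myPresenceRef = firebase.database().ref('presence/' + uid);
var connectedRef = firebase.database().ref('.info/connected');
connectedRef.on('value', function(snapshot) {
  if (snapshot.val() === true) {
    // Queue the server-side cleanup first, then mark ourselves as online.
    // When this client disconnects, the server removes only our own node.
    myPresenceRef.onDisconnect().remove();
    myPresenceRef.set({ name: 'player name', online: true });
  }
});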
The actual implementation is a bit more involved since it deals with some edge cases too. So I recommend you check out the code in the documentation I linked, and use that as the basis for your own similar system.

Should I be further denormalizing? [duplicate]

I've read the Firebase docs on Structuring Data. Data storage is cheap, but the user's time is not. We should optimize for get operations, and write in multiple places.
So then I might store a list node and a list-index node, with some duplicated data between the two, at very least the list name.
I'm using ES6 and promises in my javascript app to handle the async flow, mainly of fetching a ref key from firebase after the first data push.
let addIndexPromise = new Promise((resolve, reject) => {
  let newRef = ref.child('list-index').push(newItem);
  resolve(newRef.key()); // ignore reject() for brevity
});
addIndexPromise.then(key => {
  ref.child('list').child(key).set(newItem);
});
How do I make sure the data stays in sync in all places, knowing my app runs only on the client?
As a sanity check, I put a setTimeout in my promise and closed my browser before it resolved, and indeed my database was no longer consistent, with an extra index saved without a corresponding list.
Any advice?
Great question. I know of three approaches to this, which I'll list below.
I'll take a slightly different example for this, mostly because it allows me to use more concrete terms in the explanation.
Say we have a chat application, where we store two entities: messages and users. In the screen where we show the messages, we also show the name of the user. So to minimize the number of reads, we store the name of the user with each chat message too.
users
  so:209103
    name: "Frank van Puffelen"
    location: "San Francisco, CA"
    questionCount: 12
  so:3648524
    name: "legolandbridge"
    location: "London, Prague, Barcelona"
    questionCount: 4
messages
  -Jabhsay3487
    message: "How to write denormalized data in Firebase"
    user: so:3648524
    username: "legolandbridge"
  -Jabhsay3591
    message: "Great question."
    user: so:209103
    username: "Frank van Puffelen"
  -Jabhsay3595
    message: "I know of three approaches, which I'll list below."
    user: so:209103
    username: "Frank van Puffelen"
So we store the primary copy of the user's profile in the users node. In the message we store the uid (so:209103 and so:3648524) so that we can look up the user. But we also store the user's name in the messages, so that we don't have to look this up for each user when we want to display a list of messages.
So now what happens when I go to the Profile page on the chat service and change my name from "Frank van Puffelen" to just "puf"?
Transactional update
Performing a transactional update is the approach that probably pops to mind for most developers first. We always want the username in messages to match the name in the corresponding profile.
Using multipath writes (added on 20150925)
Since Firebase 2.3 (for JavaScript) and 2.4 (for Android and iOS), you can achieve atomic updates quite easily by using a single multi-path update:
function renameUser(ref, uid, name) {
  var updates = {}; // all paths to be updated and their new values
  updates['users/'+uid+'/name'] = name;
  var query = ref.child('messages').orderByChild('user').equalTo(uid);
  query.once('value', function(snapshot) {
    snapshot.forEach(function(messageSnapshot) {
      updates['messages/'+messageSnapshot.key()+'/username'] = name;
    });
    ref.update(updates);
  });
}
This will send a single update command to Firebase that updates the user's name in their profile and in each message.
Previous atomic approach
So when the user changes the name in their profile:
var ref = new Firebase('https://mychat.firebaseio.com/');
var uid = "so:209103";
var nameInProfileRef = ref.child('users').child(uid).child('name');
nameInProfileRef.transaction(function(currentName) {
  return "puf";
}, function(error, committed, snapshot) {
  if (error) {
    console.log('Transaction failed abnormally!', error);
  } else if (!committed) {
    console.log('Transaction aborted by our code.');
  } else {
    console.log('Name updated in profile, now update it in the messages');
    var query = ref.child('messages').orderByChild('user').equalTo(uid);
    query.on('child_added', function(messageSnapshot) {
      messageSnapshot.ref().update({ username: "puf" });
    });
  }
  console.log("Wilma's data: ", snapshot.val());
}, false /* don't apply the change locally */);
Pretty involved, and the astute reader will notice that I cheat in the handling of the messages. The first cheat is that I never call off() on the listener, and I also don't use a transaction.
If we want to securely do this type of operation from the client, we'd need:
security rules that ensure the names in both places match. But the rules need to allow enough flexibility for them to temporarily be different while we're changing the name. So this turns into a pretty painful two-phase commit scheme:
1. change all username fields for messages by so:209103 to null (some magic value)
2. change the name of user so:209103 to 'puf'
3. change the username in every message by so:209103 that is null to 'puf'
That last step requires a query with an AND of two conditions, which Firebase queries don't support. So we'll end up with an extra property uid_plus_name (with value so:209103_puf) that we can query on.
client-side code that handles all these transitions transactionally.
This type of approach makes my head hurt. And usually that means that I'm doing something wrong. But even if it's the right approach, with a head that hurts I'm way more likely to make coding mistakes. So I prefer to look for a simpler solution.
Eventual consistency
Update (20150925): Firebase released a feature to allow atomic writes to multiple paths. This works similar to approach below, but with a single command. See the updated section above to read how this works.
The second approach depends on splitting the user action ("I want to change my name to 'puf'") from the implications of that action ("We need to update the name in profile so:209103 and in every message that has user = so:209103").
I'd handle the rename in a script that we run on a server. The main method would be something like this:
function renameUser(ref, uid, name) {
  ref.child('users').child(uid).update({ name: name });
  var query = ref.child('messages').orderByChild('user').equalTo(uid);
  query.once('value', function(snapshot) {
    snapshot.forEach(function(messageSnapshot) {
      messageSnapshot.ref().update({ username: name });
    });
  });
}
Once again I take a few shortcuts here, such as using once('value') (which is in general a bad idea for optimal performance with Firebase). But overall the approach is simpler, at the cost of not having all data completely updated at the same time. But eventually the messages will all be updated to match the new value.
Not caring
The third approach is the simplest of all: in many cases you don't really have to update the duplicated data at all. In the example we've used here, you could say that each message recorded the name as I used it at that time. I didn't change my name until just now, so it makes sense that older messages show the name I used at that time. This applies in many cases where the secondary data is transactional in nature. It doesn't apply everywhere of course, but where it applies "not caring" is the simplest approach of all.
Summary
While the above are just broad descriptions of how you could solve this problem and they are definitely not complete, I find that each time I need to fan out duplicate data it comes back to one of these basic approaches.
To add to Frank's great reply: I implemented the eventual consistency approach with a set of Firebase Cloud Functions. The functions get triggered whenever a primary value (e.g. the user's name) gets changed, and then propagate the change to the denormalized fields.
It is not as fast as a transaction, but for many cases it does not need to be.
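A minimal sketch of such a function, using the users/messages structure from Frank's example (the function name is made up, and the trigger shown is the v1-style firebase-functions database trigger):
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
// When a user's name changes, fan the new value out to all of their messages.
exports.propagateUserName = functions.database
  .ref('/users/{uid}/name')
  .onUpdate((change, context) => {
    const newName = change.after.val();
    const uid = context.params.uid;
    const messagesRef = admin.database().ref('messages');
    return messagesRef.orderByChild('user').equalTo(uid).once('value')
      .then((snapshot) => {
        const updates = {};
        snapshot.forEach((messageSnapshot) => {
          updates[messageSnapshot.key + '/username'] = newName;
        });
        return messagesRef.update(updates); // single multi-path update for all messages
      });
  });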

Is there a way to tell meteor a collection is static (will never change)?

On my meteor project users can post events and they have to choose (via an autocomplete) in which city it will take place. I have a full list of french cities and it will never be updated.
I want to use a collection and publish-subscribes based on the input of the autocomplete because I don't want the client to download the full database (5MB). Is there a way, for performance, to tell meteor that this collection is "static"? Or does it make no difference?
Could anyone suggest a different approach?
When you "want to tell the server that a collection is static", I am aware of two potential optimizations:
Don't observe the database using a live query because the data will never change
Don't store the results of this query in the merge box because it doesn't need to be tracked and compared with other data (saving memory and CPU)
(1) is something you can do rather easily by constructing your own publish cursor. However, if any client is observing the same query, I believe Meteor will (at least in the future) optimize for that so it's still just one live query for any number of clients. As for (2), I am not aware of any straightforward way to do this because it could potentially mess up the data merging over multiple publications and subscriptions.
To avoid using a live query, you can manually add data to the publish function instead of returning a cursor, which causes the .observe() function to be called to hook up data to the subscription. Here's a simple example:
Meteor.publish("staticCities", function() { // the publication name here is just an example
  var sub = this;
  var args = {}; // what you're find()ing
  Foo.find(args).forEach(function(document) {
    sub.added("client_collection_name", document._id, document);
  });
  sub.ready();
});
This will cause the data to be added to client_collection_name on the client side, which could have the same name as the collection referenced by Foo, or something different. Be aware that you can do many other things with publications (also, see the link above.)
UPDATE: To resolve issues from (2), which can be potentially very problematic depending on the size of the collection, it's necessary to bypass Meteor altogether. See https://stackoverflow.com/a/21835534/586086 for one way to do it. Another way is to just return the collection fetch()ed as a method call, although this doesn't have the benefits of compression.
From the Meteor docs:
"Any change to the collection that changes the documents in a cursor will trigger a recomputation. To disable this behavior, pass {reactive: false} as an option to find."
I think this simple option is the best answer
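For example (assuming the Cities collection from the question), a one-off, non-reactive read on the client could look like this:
// Changes to Cities will not trigger a recomputation of this result.
var cityNames = Cities.find({}, { reactive: false }).fetch();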
You don't need to publish your whole collection.
1. Show autocomplete options only after the user has input the first 3 letters - this will narrow your search significantly.
2. Provide no more than 5-10 cities as options - this will keep your recordset really small, so there's no need to push 5MB of data to each user.
Your publication should look like this:
Meteor.publish('pub-name', function(userInput) {
  var firstLetters = new RegExp('^' + userInput);
  return Cities.find({name: firstLetters}, {limit: 10, sort: {name: 1}});
});

How to avoid race conditions on cursor.observe?

Race conditions
In my Meteor application, I set up an observe inside a publish that inserts some new data under certain conditions. The problem is that sometimes we have duplicated subscriptions, and race conditions lead to duplicated inserted data.
If it is not possible to have "singleton observers":
How can we avoid race conditions and duplicated inserted data on database?
Example:
Meteor.publish("fortuneUpdate", function () {
var selector = {user: this.userId, seen:false};
DailyFortunes.find(selector).observe({
removed: function(doc, beforeIndex){
if(DailyFortunes.find(selector).count()<1)
createDailyFortune(this.userId);
}
});
}
This question has been moved from How cursor.observe works and how to avoid multiple instances running?
According to Tom, it is not possible, for now, to ensure that calls to subscribe that have the same arguments are shared.
So, if you are having the same problem I had, of redundant data being created inside observers, I suggest the following workarounds:
Create robust unique indexes that prevent repeated data from being created. A compound key is probably what you need here (see the index sketch after the example below).
Treat duplicate key error exceptions inside your observer, ignoring race conditions.
example:
Collection.find(selector).observe({
  removed: function(document) {
    try {
      // Workaround to avoid race conditions > https://stackoverflow.com/q/13095647/599991
      createNewDocument();
    } catch (e) {
      // XXX string parsing sucks, maybe
      // https://jira.mongodb.org/browse/SERVER-3069 will get fixed one day
      if (e.name !== 'MongoError') throw e;
      var match = e.err.match(/^E11000 duplicate key error index: ([^ ]+)/);
      if (!match) throw e;
      // if match, just do nothing.
    }
    self.flush();
  }
});
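For the first suggestion, a minimal sketch of a unique compound index on the server, so that a racing second insert fails with the E11000 error handled above (the field names are just examples; _ensureIndex is the collection API of that Meteor era):
// On the server: one fortune per user per day, enforced by MongoDB itself.
DailyFortunes._ensureIndex({ user: 1, day: 1 }, { unique: true });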
This is an odd pattern. Can you share some example code?
Generally I'd either expect to see mutations in a method, or setting up an observe inside Meteor.startup() on the server. (The latter is tricky if you're running multiple server processes, but so are many other things in a multi process regime. We'll have a better pattern down the line.)
Because it can be arbitrary JS, a publish function has to run once per subscribing client. It may log new subscriptions, set up per-client server state, or vary its behavior based on this.userId or even a random source. For example, consider a subscription that returns 10 randomly selected documents from a DB collection to each subscribed client!
So the place to optimize the case of many clients subscribing to the same data set is at the DB query layer: if a thousand clients are subscribed to the same DB query, we'll just run that underlying query once.
