EmberFire Relationship Persistence - firebase

Using EmberFire, I'm trying to work with related sets of data. In this case, a Campaign has many Players and Players have many Campaigns.
When I want to add a player to a campaign, I understand that I can push the player object to campaign.players, save the player, and then save the campaign. This will update both records so that the relationship is cemented. This works fine.
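Roughly, the flow I'm describing looks like this (a sketch, assuming player and campaign are already-loaded Ember Data records with standard hasMany relationships on both models):

// Push the player onto the campaign's hasMany; Ember Data also updates the inverse.
campaign.get('players').pushObject(player);

// Save the player first, then the campaign; if the second save fails,
// the two sides of the relationship can end up out of sync.
player.save().then(function () {
  return campaign.save();
});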
My question is a hypothetical about how to handle failures when saving one or both records.
For instance, how would you handle a case where saving the player record succeeds (thus adding the corresponding campaign ID to its campaigns field), but saving the campaign then fails (so the player is never added to its players field)? It seems like in this case you'd open yourself up to some very messy data.
I was considering taking a "snapshot" of both records in question and then resetting them to their previous states if one update fails, but this seems like it's going to create some semi-nightmarish code.
Thoughts?

I guess you are using the Realtime Database. If you use the update() method with different paths, "you can perform simultaneous updates to multiple locations in the JSON tree with a single call to update()".
Simultaneous updates made this way are atomic: either all updates succeed or all updates fail.
From the documentation: https://firebase.google.com/docs/database/web/read-and-write#update_specific_fields
So, in your case, you could do something like the following (many variations are possible, as long as the updates object holds all of the simultaneous updates):
var updates = {};
// Write the player under the campaign's players list...
updates['/campaigns/' + campaignKey + '/players/' + newPlayerKey] = playerData;
// ...and the campaign under the player's campaigns list, in the same call.
updates['/players/' + newPlayerKey + '/campaigns/' + campaignKey] = true;
// Both locations succeed or fail together.
return firebase.database().ref().update(updates);

Related

Change default data collected by Firebase/Google analytics

We use Firebase/Google Analytics in our Android app. Each event is saved with a lot of extra information (user-id, device-info, timestamps, user-properties, geographical location, …). The extra info is there by default, but we don't want it to be collected.
We tried 2 things:
1) Update the BigQuery schema
Delete the unwanted columns from BigQuery. Unfortunately, BigQuery creates a new export every day, so we would need to know where those fields are coming from, which is something we don't know.
2) Default parameters within the app
We tried to use default parameters from inside the app, so that the city would always be null. Here is an example with the user's city:
Bundle defaultValues = new Bundle();
defaultValues.putString("geo.city", null);
FirebaseAnalytics.getInstance(ctx).setDefaultEventParameters(defaultValues);
Unfortunately, we still see geo.city populated in our BigQuery data.
Is there a way of changing what is collected by default?
There is no way to disable the geography information: Analytics uses IP addresses to derive the geolocation of a visitor. Updating the BigQuery schema is probably the viable route, but you have to build a system that carries out this cleanup on a daily basis, precisely because the export is created every day.
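For example, a scheduled Cloud Function could rewrite each day's export into a sanitized copy. A sketch, where the dataset name analytics_123456789, the schedule, and the sanitized table name are placeholders:

const functions = require('firebase-functions');
const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

// Once a day, copy yesterday's Analytics export into a table without the geo record.
exports.sanitizeAnalyticsExport = functions.pubsub
  .schedule('every day 05:00')
  .onRun(async () => {
    const yesterday = new Date(Date.now() - 24 * 60 * 60 * 1000);
    const suffix = yesterday.toISOString().slice(0, 10).replace(/-/g, ''); // YYYYMMDD
    const dataset = 'analytics_123456789'; // placeholder: your Analytics export dataset
    const query =
      'CREATE OR REPLACE TABLE `' + dataset + '.events_sanitized_' + suffix + '` AS ' +
      'SELECT * EXCEPT (geo) FROM `' + dataset + '.events_' + suffix + '`';
    await bigquery.query({ query });
  });

Reports would then be built on the sanitized tables rather than on the raw export.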

Prevent more than 1 write a second to a Firestore document when using a counter with cloud function

Background:
I have a Firestore database with a users collection. Each user is a document which contains a contacts collection. Each document in that collection is a single contact.
Since Firestore does not have a "count" feature for all documents, and since I don't want to read all contacts to count how many a user has, I trigger a Cloud Function when a contact is added or deleted, which increments or decrements numberOfContacts in the user document. In order to make the function idempotent, it has to do multiple reads and writes to avoid incrementing the counter more than once if it's called more than once for the same document. This means I need to keep a separate collection of eventIDs that I've already handled so I don't double-count, which in turn requires me to run another function once a month to go through each user and delete all those documents (a lot of reads and some writes).
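For reference, the trigger I'm describing looks roughly like this (a sketch; the eventIds subcollection name is illustrative):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.onContactCreated = functions.firestore
  .document('users/{userId}/contacts/{contactId}')
  .onCreate(async (snap, context) => {
    const db = admin.firestore();
    const userRef = db.collection('users').doc(context.params.userId);
    // One doc per delivered event, so a retried invocation can be detected and skipped.
    const eventRef = userRef.collection('eventIds').doc(context.eventId);

    await db.runTransaction(async (tx) => {
      const eventDoc = await tx.get(eventRef);
      if (eventDoc.exists) {
        return; // already handled this event; don't increment twice
      }
      tx.set(eventRef, { handledAt: admin.firestore.FieldValue.serverTimestamp() });
      tx.update(userRef, {
        numberOfContacts: admin.firestore.FieldValue.increment(1),
      });
    });
  });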
Issue
Now the challenge is that the user can import his/her contacts. So if a user imports 10,000 contacts, this function will get fired 10,000 times in quick succession.
How do I prevent that?
Current approach:
Right now I am adding a field in the contact document that indicates that the addition was part of an import. This tells the cloud function not to increment the counter.
I perform the operation from the client 499 contacts at a time in a transaction, which also increments the count as the 500th write. That way the count stays consistent if something fails halfway.
Is this really the best way? It seems so complicated to just have a count of contacts available. I end up doing multiple reads and writes each time a single contact changes plus I have to run a cleanup function every month.
I keep thinking there's gotta be a simpler way.
For those who are curious, it seems like the approach I am taking is the best approach.
I add a field in the contact document that indicates that the addition was part of an import (bulkAdd = true). This tells the cloud function not to increment the counter.
I have another cloud function add the contacts 200 at a time (I use FieldValue.serverTimestamp(), and that counts as another write, so it's 400 writes). I do this in a batch, and the 401st write in the batch is the counter increment. That way I can bulk-import contacts without having to bombard a single document with writes.
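A rough sketch of that import helper (bulkAdd and numberOfContacts are from my setup; everything else is illustrative):

const admin = require('firebase-admin');

// Writes one chunk of imported contacts in a single batch and bumps the counter
// in the same batch, so the count stays consistent chunk by chunk.
async function writeImportChunk(db, userId, contacts /* up to 200 per call */) {
  const userRef = db.collection('users').doc(userId);
  const batch = db.batch();

  contacts.forEach((contact) => {
    const contactRef = userRef.collection('contacts').doc();
    batch.set(contactRef, {
      ...contact,
      bulkAdd: true, // tells the per-contact trigger not to increment again
      createdAt: admin.firestore.FieldValue.serverTimestamp(),
    });
  });

  // Last write in the batch: bump the counter by the chunk size.
  batch.update(userRef, {
    numberOfContacts: admin.firestore.FieldValue.increment(contacts.length),
  });

  return batch.commit();
}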
Problem with increments
There are duplicate-safe operations like FieldValue.arrayUnion() & FieldValue.arrayRemove(). I wrote a bit about that approach here: Firebase function document.create and user.create triggers firing multiple times
With this approach, your user document contains a special array field with contact IDs. Once a contact is added to the subcollection and your function is triggered, the contact's ID is written to this field. If the function is triggered two or more times for one contact, only one instance of it ends up in the master user doc. The actual count can then be read on the client or with one more function triggered on the user doc update. This is a bit simpler than keeping eventIDs.
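A minimal sketch of that trigger (the contactIds field name is just an example):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.trackContactId = functions.firestore
  .document('users/{userId}/contacts/{contactId}')
  .onCreate(async (snap, context) => {
    const userRef = admin.firestore().collection('users').doc(context.params.userId);
    // arrayUnion() is idempotent: duplicate invocations add the ID only once.
    return userRef.update({
      contactIds: admin.firestore.FieldValue.arrayUnion(context.params.contactId),
    });
  });

The count is then simply contactIds.length, read on the client or in another trigger.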
Problem with importing 10k+ contacts
This is a bit philosophical.
If I got it right, the problem is that a user performs 10k writes. Then these 10k writes trigger 10k functions, which perform an additional 10k writes to the master doc (and the same number of reads if they use the eventIDs document)?
You could make a special subcollection just for importing multiple contacts into your DB. Instead of writing 10k docs to the DB, the client would create a single big document with 10k contact fields, which triggers a cloud function. That function would read it all and make the necessary 10k contact writes plus 1 write to the master doc with all the arrayUnions. You would just need to think about how to prevent the 10k invoked per-contact function writes (adding a special metadata field like your bulkAdd).
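A rough sketch of that import-document idea (the imports subcollection and the contacts array field are assumptions):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.processImport = functions.firestore
  .document('users/{userId}/imports/{importId}')
  .onCreate(async (snap, context) => {
    const db = admin.firestore();
    const userRef = db.collection('users').doc(context.params.userId);
    const contacts = snap.data().contacts || []; // all contacts shipped in one document

    // Write in chunks to stay under the 500-writes-per-batch limit.
    for (let i = 0; i < contacts.length; i += 400) {
      const chunk = contacts.slice(i, i + 400);
      const batch = db.batch();
      const ids = [];
      chunk.forEach((contact) => {
        const ref = userRef.collection('contacts').doc();
        ids.push(ref.id);
        batch.set(ref, { ...contact, bulkAdd: true }); // keeps per-contact triggers quiet
      });
      // One extra write per chunk records the new IDs on the master doc.
      batch.update(userRef, {
        contactIds: admin.firestore.FieldValue.arrayUnion(...ids),
      });
      await batch.commit();
    }
  });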
This is just an opinion.

How to use DynamoDB streams to maintain duplicated data consistency?

From what I understand, one of the use cases of DynamoDB Streams is to maintain/update duplicated data.
Let's say I have a User object, and its name attribute is replicated in many Invoice objects.
When a User edits/updates its name, I will have a Lambda using DynamoDB Streams to then update all Invoices related to this user with the new name.
There could be thousands of Invoices related to this user, so this update could take a while, especially because I will want to do a rate-limited batch_write so that this operation doesn't throttle my table.
The question is: how can my (web) application know that the Lambda has finished updating? For example, I want to show a loading screen to the client using the application until the duplicated data update is done, so that they don't see any outdated information in their browser.
Or is there other ways of rapidly dealing with updating thousands of duplicated data?
Why not capture the output of the Lambda? You can have the Lambda return a success status once all the updates have been persisted to DynamoDB.
Alternatively, the Invoice can keep a reference to the User object instead of storing the exact name, and fetch the name at the time of generating/printing.
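A sketch of that idea: the stream-triggered Lambda fans the new name out to the invoices and then writes a small status item the web application can poll (table and index names are assumptions, and the stream is assumed to include the new image):

const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  for (const record of event.Records) {
    if (record.eventName !== 'MODIFY') continue;

    const user = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    // Find this user's invoices via a GSI (pagination and rate limiting omitted).
    const invoices = await ddb.query({
      TableName: 'Invoices',
      IndexName: 'userId-index',
      KeyConditionExpression: 'userId = :u',
      ExpressionAttributeValues: { ':u': user.userId },
    }).promise();

    // Copy the new name onto each invoice.
    for (const invoice of invoices.Items) {
      await ddb.update({
        TableName: 'Invoices',
        Key: { invoiceId: invoice.invoiceId },
        UpdateExpression: 'SET userName = :n',
        ExpressionAttributeValues: { ':n': user.name },
      }).promise();
    }

    // Mark the fan-out for this user as done; the web app polls this item.
    await ddb.put({
      TableName: 'SyncStatus',
      Item: { userId: user.userId, syncStatus: 'DONE', updatedAt: Date.now() },
    }).promise();
  }
};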

Use transaction to update value at two different nodes

I have two different nodes in my database:
all posts
users
As per the fan-out model, when a user adds a post, it gets written to both all posts and users/uid/posts.
Each post consists of a like button which displays the number of likes.
When a user clicks it, the like count should increase by 1.
According to the docs, we use a transaction for this kind of process.
But the problem with using a transaction is that, as far as I know, it updates only one node.
So my problem is: how do I run this transaction so that both of the nodes mentioned above get updated?
Shall I use the update() method?
What is the way to use a transaction that gets applied at both nodes?
You can push all your logic for updating the database onto the server side with Cloud Functions for Firebase. You can use a database trigger to respond to data being written in the database, then execute some JavaScript to make sure the fan-out finishes correctly. It has the advantage of making sure all the changes happen without depending on the client.
Transactions can't modify data at two different locations at once, but you will still probably want to use them in your client and Cloud Functions to make sure concurrent writes will not have problems.
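A sketch of that combination, in web-SDK JavaScript for illustration: the client runs a transaction on the canonical counter only, and a database trigger mirrors the result into the user's copy (the uid field on the post and the exact paths are assumptions):

// Client: a transaction on the single canonical counter.
firebase.database()
  .ref('/all-posts/' + postId + '/likeCount')
  .transaction(function (current) {
    return (current || 0) + 1;
  });

// Cloud Function: mirror the counter into the author's copy of the post.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.mirrorLikeCount = functions.database
  .ref('/all-posts/{postId}/likeCount')
  .onWrite(async (change, context) => {
    if (!change.after.exists()) return null; // counter was deleted
    const postId = context.params.postId;
    // Assumes each post stores its author's uid at /all-posts/{postId}/uid.
    const uidSnap = await admin.database()
      .ref('/all-posts/' + postId + '/uid')
      .once('value');
    const uid = uidSnap.val();
    if (!uid) return null;
    return admin.database()
      .ref('/users/' + uid + '/posts/' + postId + '/likeCount')
      .set(change.after.val());
  });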

Efficient DynamoDB schema for time series data

We are building a conversation system that will support messages between 2 users (and eventually between 3+ users). Each conversation will have a collection of users who can participate/view the conversation as well as a collection of messages. The UI will display the most recent 10 messages in a specific conversation with the ability to "page" (progressive scrolling?) the messages to view messages further back in time.
The plan is to store conversations and the participants in MSSQL and then only store the messages (which represents the data that has the potential to grow very large) in DynamoDB. The message table would use the conversation ID as the hash key and the message CreateDate as the range key. The conversation ID could be anything at this point (integer, GUID, etc) to ensure an even message distribution across the partitions.
In order to avoid hot partitions one suggestion is to create separate tables for time series data because typically only the most recent data will be accessed. Would this lead to issues when we need to pull back previous messages for a user as they scroll/page because we have to query across multiple tables to piece together a batch of messages?
Is there a different/better approach for storing time series data that may be infrequently accessed, but available quickly?
I guess we can assume that there are many "active" conversations in parallel, right? Meaning - we're not dealing with the case where all the traffic is regarding a single conversation (or a few).
If that's the case, and you're using a random number/GUID as your HASH key, your objects will be spread evenly across the nodes, and as far as I know you shouldn't be afraid of skewness. Since CreateDate is only the RANGE key, all messages for the same conversation will be stored on the same node (based on their ConversationID), so it actually doesn't matter whether you query for the latest 5 records or the earliest 5; in both cases it's a query using the index on CreateDate.
I wouldn't break the data into multiple tables. I don't see what benefit it gives you (considering the previous section) and it will make your administrative life a nightmare (just imagine changing throughput for all tables, or backing them up, or creating a CloudFormation template to create your whole environment).
I would be concerned with the number of messages that will be returned when you pull the history. I guess you'll implement that with a query command using the ConversationID as the HASH key and ordering results by CreateDate descending. In that case, I'd return only the first page of results (I think it returns up to 1MB of data, so depending on the average message length it may or may not be enough) and fetch the next page only if the user keeps scrolling. Otherwise, you might use a lot of your throughput on really long conversations, and anyway the client doesn't really want to get stuck for a long time waiting for megabytes of data to appear on screen.
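With the JavaScript DocumentClient, that query could look roughly like this (the Messages table name is a placeholder; ConversationId and CreateDate follow the schema above):

const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

// Fetch the newest page of messages for a conversation; pass the returned nextKey
// back in as exclusiveStartKey to scroll further back in time.
async function getMessages(conversationId, pageSize, exclusiveStartKey) {
  const params = {
    TableName: 'Messages',
    KeyConditionExpression: 'ConversationId = :c',
    ExpressionAttributeValues: { ':c': conversationId },
    ScanIndexForward: false, // newest first (descending CreateDate range key)
    Limit: pageSize,
  };
  if (exclusiveStartKey) params.ExclusiveStartKey = exclusiveStartKey;

  const result = await ddb.query(params).promise();
  return { messages: result.Items, nextKey: result.LastEvaluatedKey };
}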
Hope this helps
