Periodical MongoDB operations with Meteor - meteor

I am building a voting system with Meteor where items can be up- or downvoted. To sort by voting score more precisely later on, each item holds the fields dailyScore, monthlyScore and alltimeScore, which get incremented or decremented after a vote. I should also mention that both registered and unregistered users can vote once every 24h (there are two "voters" arrays containing the userIds of the registered voters and the IP addresses of the unregistered voters, to keep track of the voters and prevent them from voting more than once a day).
The problem I am facing right now is finding a way to reliably reset
1. the dailyScore every new day (let's say at 00:00 UTC)
2. the monthlyScore every new month (in addition to (1.), obviously)
3. the two voters arrays on a daily basis (at the same point in time as (1.))
My thoughts so far:
I could store a server-side global variable that always contains the lastUpdate date of any collection. Using the onConnection callback, I can check if(currentTime.getDate() != lastUpdate.getDate()) on the server. If true, I can start the operations performing 1.-3. from above.
Using onConnection might be "too heavy".
Is some kind of cron job possible to perform 1.-3. every 24h at 00:00 UTC?
I don't think an onLogin hook is sufficient, because unregistered users can vote as well.
Is there a common pattern or best practice for that? Doing periodic database operations (like every fixed 24h or on every new onConnection on a new day) should be a well-known problem.

The percolate:synced-cron package works quite well for this kind of scheduled job.
Beware: SyncedCron probably won't work as expected on certain shared hosting providers that shut down app instances when they aren't receiving requests (like Heroku's free dyno tier or Meteor's free Galaxy tier).
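As a rough sketch of what such a scheduled job might run, the reset logic can be kept in plain functions and called from the job. The helper names and the two voters-array field names (userVoters, ipVoters) are made up for illustration; only the score fields come from the question.

```javascript
// Hypothetical helper: which resets are due, comparing UTC calendar dates.
function resetsDue(lastUpdate, now) {
  const monthly =
    now.getUTCFullYear() !== lastUpdate.getUTCFullYear() ||
    now.getUTCMonth() !== lastUpdate.getUTCMonth();
  // A new month always implies a new day.
  const daily = monthly || now.getUTCDate() !== lastUpdate.getUTCDate();
  return { daily, monthly };
}

// Hypothetical helper: builds a MongoDB update modifier for the due resets.
// userVoters / ipVoters are assumed names for the two voters arrays.
function buildResetModifier(due) {
  if (!due.daily) return null; // nothing to reset yet
  const $set = { dailyScore: 0, userVoters: [], ipVoters: [] };
  if (due.monthly) $set.monthlyScore = 0;
  return { $set };
}

// Example: last reset on Jan 31, job fires shortly after midnight UTC on Feb 1.
const due = resetsDue(
  new Date(Date.UTC(2024, 0, 31, 0, 0)),
  new Date(Date.UTC(2024, 1, 1, 0, 5))
);
console.log(buildResetModifier(due));
```

Inside the job, the modifier would then be passed to something like Items.update({}, modifier, { multi: true }), where Items is an assumed collection name.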

Related

How to handle offline aggregation using Firestore?

I have been scouring the internet for days for a solution to this problem.
That is, how to handle aggregation when there is no network connection? I have a task management app that looks to aggregate metadata about user tasks. For example, a task can contain tags that are aggregated and shown to the user in a dashboard on a daily basis. This would be easy if the user were always online, since I could use a transaction or a Cloud Function to aggregate, but when the user is offline, the aggregation will appear incorrect until the user restores their network connection.
Aggregation queries are explained here:
https://firebase.google.com/docs/firestore/solutions/aggregation
Which states a limitation:
Offline support - Client-side transactions will fail when the user's
device is offline, which means you need to handle this case in your
app and retry at the appropriate time.
However, there has yet to be any example or documentation on how to 'handle this case'. How would I go about addressing this problem?
Some thoughts:
I could cache the item if a transaction fails. This item would be aggregated on top of the stored aggregation. However, going down this line would mean that I can't take advantage of Firestore's "offline mode", because I'm using my own cache on every write while offline anyway.
I could aggregate on demand. That is, never store the aggregation. This is going to be very heavy on reads, depending on how many tasks a user has. Furthermore, if the aggregation needs to be shared as insights with other users, this option will not work because other users do not have access to the tasks.
I'm at a loss and any help would be appreciated, thanks!
After a lot of research and trial and error I found a solution that can address this problem gracefully.
FieldValue.increment to the rescue.
What FieldValue.increment does is bypass the need for a transaction while respecting Firestore's default offline cache behaviour. It requires calling set or update on the field directly. The drawback is the inability to use 'withConverter' on the collection for type safety. I'm willing to live with that drawback considering how useful FieldValue.increment is.
I've done multiple tests and can confirm that the values can be incremented/decremented multiple times locally while offline. This offline value is reflected in a get or snapshot call to the cache. When the network connection is restored, the values are updated on the server.
The value itself is not stored in the cache; the cache simply stores the "difference" in the FieldValue sentinel until it is time to apply it on the server.
This method only works for incrementing and decrementing values. Storing averages is not possible with this method, because the true total number of items is not known at calculation time when offline.
Instead, the total number of items is stored alongside the total value. The average is then calculated when and as needed. This way the average is always accurate from a local perspective when offline, and it is also accurate when online once the total value and count have been synced.
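A minimal sketch of the total-plus-count idea follows. The field names total and count and both function names are assumptions, and the increment function is passed in so the snippet runs without the SDK; in an app it would be FieldValue.increment (or increment from "firebase/firestore" in the modular SDK).

```javascript
// Builds the payload for set/update: both fields use the increment sentinel,
// so the write also works from the offline cache.
function buildAggregateUpdate(increment, value) {
  return { total: increment(value), count: increment(1) };
}

// The average is derived on read and never stored.
function average(aggregate) {
  return aggregate.count > 0 ? aggregate.total / aggregate.count : 0;
}

// Stand-in for the real sentinel, for demonstration only.
const fakeIncrement = (n) => ({ incrementBy: n });
console.log(buildAggregateUpdate(fakeIncrement, 5));
console.log(average({ total: 10, count: 4 }));
```

Removing an item would use the same payload with negated amounts, i.e. increment(-value) and increment(-1).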

Firestore: Maintaining the count of a collection. Trigger function vs transaction

Let's say I have a collection called persons and another collection called cities with a field population. When a Person is created in a City, I would like to increment the population field in the corresponding city.
I have two options.
1) Create an onCreate trigger function. Find the city document and increment using FieldValue.increment(1).
2) Create an HTTPS callable cloud function to create the person. The cloud function executes a transaction in which the person is created and the population is incremented.
The first one is simpler and I am using it right now. But, I am wondering if there could be cases where the onCreate is not called due to some glitch...
I am thinking of moving to the second option. I am wondering if there are any disadvantages. Do HTTPS callable functions cost more?
The only problem I see with HTTPS callables is that if something fails, you need to handle it on the client side. That would be (at least for me) a bit too much logic for the client side.
What I can recommend, after almost 4 years of experience with exactly that problem, is a solution with a virtual queue. I had a long discussion on that topic here, and even with the Firebase people at the last in-person Google I/O and Firebase Summit.
Our problem was that those glitches did happen, even if only sometimes, and that changes and transactions failed because of too many requests. After trying every official recommendation, like sharded counters etc., we ended up creating a virtual queue: each onCreate just adds an entry to a Firestore or Realtime Database list/collection, and another function runs either via cron or via another trigger (which one doesn't matter). That cloud function handles the entries in the queue one by one, starting again for each of them to avoid timeouts and memory limits. We made sure that one handler/calculation is small enough for a single function invocation to handle.
This method was the only bulletproof one that could handle thousands of new entries per second without issues. The only downside is that it takes more time than a usual trigger, because the entries are calculated one by one. If your calculations are small, you could do them in batches (that is how we started, too).
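The shape of such a virtual queue can be sketched with in-memory stand-ins. The queue array, the cities object, and both function names are assumptions for illustration; in practice the queue would be a Firestore or Realtime Database collection and the worker a scheduled or triggered function.

```javascript
// In-memory stand-ins for the queue collection and the city documents.
const queue = [];
const cities = { berlin: { population: 0 } };

// What each onCreate trigger would do: append an entry and nothing else.
function enqueue(cityId) {
  queue.push({ cityId });
}

// What the worker would do per run: handle exactly one entry, oldest first.
// Returns false when the queue is empty.
function processNext() {
  const entry = queue.shift();
  if (!entry) return false;
  cities[entry.cityId].population += 1;
  return true;
}

// Three persons are created, then the worker drains the queue one by one.
enqueue('berlin');
enqueue('berlin');
enqueue('berlin');
while (processNext()) {}
console.log(cities.berlin.population);
```

Because only the worker touches the counter, concurrent creations never contend on the city document; they only append to the queue.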

Updating database at certain time

I'm looking to make my Firebase Database update at a particular time.
The way it should work is that, for a group, the leader sets a deadline time. The group votes on some stuff. At the deadline time, I would like the database to automatically tabulate the votes and store the response within.
I'm not sure how to set these types of rules for the database without doing a check whenever a member of the group is online and refreshes their feed. Also, this would allow any member to write to the vote-result field, which seems bad when I want it to just be automatic. It seems like there should be an easier way than this, but I just can't find anything.
It seems like the other option would be to set up a separate server that keeps track of all the time frames and sends an update request when the allotted time has passed. But it seems like Firebase should have this built in. I'm sure I'm missing something. Thank you in advance.
EDIT: Here is a more comprehensive look at my use case. I am looking into cron stuff now, as I think it will solve my problem, but I don't know.
1) Leader creates a group and invites friends to it. Event is created in the Firebase database. Group is created with a specific deadline.
2) Before the deadline, leader and friends can vote on certain options. Basically they submit a dictionary to the database with their votes.
3) At the deadline, I either just need to change the state of the group (from voting to closed) or calculate the vote result. Same problem, which is that I don't know how to do it at a certain time w/o using user clients.
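Step 3 could be sketched as a pure function that some trusted server-side process (for example a cron-driven job) runs for each group. The group shape and the function name are assumptions, not something Firebase provides.

```javascript
// Hypothetical shape: group = { state, deadline (ms since epoch), votes: { userId: option } }.
// Returns the group unchanged if it is not due, else a closed copy with a tally.
function closeIfDue(group, now) {
  if (group.state !== 'voting' || now < group.deadline) return group;
  const tally = {};
  for (const option of Object.values(group.votes || {})) {
    tally[option] = (tally[option] || 0) + 1;
  }
  return { ...group, state: 'closed', result: tally };
}

const group = {
  state: 'voting',
  deadline: 1000,
  votes: { alice: 'pizza', bob: 'sushi', carol: 'pizza' },
};
console.log(closeIfDue(group, 999).state);
console.log(closeIfDue(group, 1000).result);
```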

How to mitigate against long startup times in firebase workers when dataset gets large

Firebase has an interesting feature/nuisance where when you listen on a data ref, you get all the data that was ever added to that ref. So, for example, when you listen on 'child_added', you get a replay of all the children that were added to that ref from the beginning of time. We are writing a commenting system with a dataset that looks something like this:
/comments
/sites
/sites/articles
/users
Sites have many articles and articles have many comments and users have many comments.
We want to be able to track all the comments a user makes, so we feel it is wise to put comments in a separate ref rather than partition them by the articles they belong to. We have a backend listener that needs to do things on new comments as they arrive (increment their child counts, adjust a user's stats etc.). My concern is that, after a while, it will take this listener a long time to start up if it has to process a replay of every comment ever made.
I thought about possibly storing comments only in articles and storing references to each comment's siteId/articleId/commentId in the user table so we could still find all the comments for a given user, but this complicates the backend, as it would then probably need to have a separate listener for each site or even each article, which could make it difficult to manage so many listeners.
Imagine if one of these articles is on a very high-traffic site with tens of thousands of articles and thousands of comments per article. Is the scaling answer to somehow keep track of the traffic levels of every site and set up and partition them in a way that they are assigned to different worker processes? And what about the question of startup time and how long it takes to replay all data every time we load up our workers?
Adding on to Frank's answer, here are a couple other possibilities.
Use a queue strategy
Since the workers are really expecting to process one-time events, give them one-time events that they can pull from a queue and delete after they finish processing. This resolves the multiple-worker scenario elegantly and ensures nothing is ever missed because a server was offline.
Utilize a timestamp to reduce backlog
A simple strategy for avoiding backlog during reboot/startup of the workers is to add a timestamp to all of the events and then do something like the following:
var startTime = Date.now() - 60 * 60 * 1000; // an hour ago (Date.now() is in milliseconds)
pathRef.orderByChild('timestamp').startAt( startTime );
Keep track of the last id processed
This only works well with push ids, since formats that do not sort naturally by key will likely become out of order at some point in the future.
When processing records, have your worker keep track of the last record it added by writing that value into Firebase. Then one can use orderByKey().startAt( lastKeyProcessed ) to avoid the backlog. Annoyingly, we then have to discard the first key. However, this is an efficient query, does not cost data storage for an index, and is quick to implement.
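A small sketch of the last-key bookkeeping follows. The wrapper and its parameter names are made up, and it assumes a lastKeyProcessed value already exists from an earlier run.

```javascript
// Wraps a record handler so that the record at lastKeyProcessed is skipped
// (startAt is inclusive) and the newest handled key is remembered.
function makeResumingHandler(lastKeyProcessed, handle, saveLastKey) {
  return function onChildAdded(key, value) {
    if (key === lastKeyProcessed) return; // already handled before the restart
    handle(key, value);
    saveLastKey(key); // in practice: write the key back into Firebase
  };
}

// Simulated replay starting at the last processed key.
const handled = [];
let savedKey = '-Kb';
const onChildAdded = makeResumingHandler(
  savedKey,
  (key) => handled.push(key),
  (key) => { savedKey = key; }
);
['-Kb', '-Kc', '-Kd'].forEach((key) => onChildAdded(key, {}));
console.log(handled, savedKey);

// With the Firebase SDK the handler would be attached roughly like this
// (snapshot accessors differ between SDK versions):
// ref.orderByKey().startAt(lastKeyProcessed)
//    .on('child_added', (snap) => onChildAdded(snap.key, snap.val()));
```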
If you only need to process new comments once, you can put them in a separate list, e.g. newComments vs. comments (the ones that have been processed). Then when you're done processing, move them from newComments to comments.
Alternatively you can keep all comments in a single list like you have today and add a field (e.g. "isNew") to it that you set to true initially. Then you can filter with orderByChild('isNew').equalTo(true) and update({ isNew: false }) once you're done with processing.

'Assigning a player in multiplayer game' firebase example is not very scalable or is it?

In the Firebase example (https://gist.github.com/anantn/4323981), to add a user to the game, we attach the transaction method to playerListRef. Now, every time Firebase attempts to update the data, it will call the callback passed to the transaction method with the list of user IDs of all players. If my game supports thousands of users joining at a time, then every time this method executes, the entire user list will be downloaded and passed, which will be bad.
If this is true, what is the recommended way to assign users then?
This is specifically what Firebase was designed to handle. If your application needs to actually assign player numbers, this example is the way to go. Otherwise, if the players just need to be in the same "game" or "room" without any notion of ordering, you could remove the transaction code to speed things up a bit. The snippet as well as the backend have handled the number of concurrent connections you've mentioned. If you're seeing any specific problems with your code, or behavior with Firebase that appears to be a bug, please contact us at support@firebase.com and we can dig into it.
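For reference, the kind of update function that gets passed to playerListRef.transaction(...) can be sketched as a pure function. This is a simplified illustration, not the gist's exact code, and assignSeat and maxPlayers are made-up names.

```javascript
// Receives the current player list (null if none exists yet) and returns the
// new list to commit. Returning undefined aborts the transaction.
function assignSeat(playerList, userId, maxPlayers) {
  const list = playerList === null ? [] : playerList.slice();
  if (list.includes(userId)) return undefined; // already in the game
  if (list.length >= maxPlayers) return undefined; // game is full
  list.push(userId); // the array index acts as the player number
  return list;
}

console.log(assignSeat(null, 'alice', 2));
console.log(assignSeat(['alice'], 'bob', 2));
console.log(assignSeat(['alice', 'bob'], 'carol', 2));
```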
