I've been trying to integrate react-query with react-virtualized's InfiniteLoader, but I got stuck at a particular point.
Namely, InfiniteLoader's fetchMore function is invoked with an IndexRange ({ startIndex: number, stopIndex: number }). This matters because the batch size can vary, and the initial batch is usually larger than the subsequent ones. Hence it is imperative for the queryFn to have access to the startIndex, stopIndex, or a computed limit. For this integration I've been using the Firebase SDK, but the principle of a cursor is used more widely. Without going into the specifics, Firebase's startAfter collection filter accepts a reference to the last loaded document. Fortunately, the getFetchMore function is invoked with the last batch, and the last document in that batch is the cursor after which the next batch starts. The point is that there is no way to pass both the limit and the last document reference to the query function in order to load the new batch.
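To make the moving parts concrete, here is a minimal sketch of the setup described above (assuming react-query v2's getFetchMore API and the Firebase Web SDK; db, the collection, and the field names are placeholders):

import { useInfiniteQuery } from 'react-query';

// db is assumed to be an initialized Firestore instance (Firebase Web SDK).
const { data, fetchMore } = useInfiniteQuery(
    'items',
    // The query function receives the value returned by getFetchMore as its
    // last argument, but there is no slot for the IndexRange that
    // InfiniteLoader provides.
    async (key, cursor) => {
        let query = db.collection('items').orderBy('createdAt').limit(30); // limit is fixed here
        if (cursor) query = query.startAfter(cursor);
        const snapshot = await query.get();
        return snapshot.docs;
    },
    {
        // Invoked with the last batch; its last document is the cursor
        // after which the next batch starts.
        getFetchMore: (lastBatch) => lastBatch[lastBatch.length - 1],
    }
);

// react-virtualized invokes this with an IndexRange, but fetchMore offers no
// obvious way to forward startIndex/stopIndex to the query function together
// with the cursor.
const loadMoreRows = ({ startIndex, stopIndex }) => fetchMore();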
Is there a workaround?
I have a scheduled function that runs every three minutes.
It is supposed to look on the database (firestore), query the relevant users, send them emails or perform other db actions.
Once it sends an email to a user, it updates the user with a field 'sent_to_today:true'.
If sent_to_today == true, the function won't touch that user for around 24 hours, which is what's intended.
But, because I have many users, and the function is doing a lot of work, by the time it updates the user with sent_to_today: true, another invocation has already reached that user and processed them for sending emails.
This results in some users getting the same email, twice.
What are my options to make sure this doesn't happen?
Data Model (simplified):
users (Collection)
--- userId (document)
--- sent_to_today [Boolean]
--- NextUpdateTime [String representing a Timestamp in ISO format]
When the function runs, if ("Now" >= NextUpdateTime) && (sent_to_today==false), the user is processed, otherwise, they're skipped.
How do I make sure that the user is only processed by one invocation per day, and not many?
As I said, by the time they're processed by one function invocation (which sets "sent_to_today" to true), the next invocation gets to that user and processes them.
Any help in structuring the data better or using any other logical method would be greatly appreciated.
Here is an idea I'm considering:
Each invocation sets a field on a global document (e.g. busy_right_now: true) at the start, and sets it back to false when finished. If a subsequent invocation runs before the current one has finished, it does nothing while busy_right_now is still true.
Option 1.
Do you think the function could be invoked once every ten minutes, rather than every three? If yes, just modify the scheduler, and make sure the 'max instances' attribute is set to 1. As the function timeout is only 540 seconds, 10 minutes (600 seconds) is more than enough to avoid overlapping.
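A minimal sketch of that configuration, assuming the Node.js firebase-functions SDK (the function body is elided):

const functions = require('firebase-functions');

exports.scheduledMailer = functions
    .runWith({ maxInstances: 1, timeoutSeconds: 540 })
    .pubsub.schedule('every 10 minutes')
    .onRun(async (context) => {
        // query the relevant users and send the emails
    });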
Option 2.
When a Firestore document is chosen for processing, the cloud function modifies some attribute - e.g. __state - and sets its value to IN_PROGRESS, for example. When the processing is finished (the email is sent), that attribute value is modified again - to DONE, for example. Thus, if the function picks up a document which has the value IN_PROGRESS in the __state attribute, it simply ignores it and continues to the next one.
The drawback: if the function crashes, there might be documents stuck in the IN_PROGRESS state, and there should be some mechanism to monitor and resolve such cases.
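A sketch of what the 'claim' step could look like using a Firestore transaction (Node.js Admin SDK assumed; the __state values follow the option above; db is the initialized Firestore instance):

async function tryClaim(userRef) {
    // Only one invocation can win this transaction; the others will
    // see IN_PROGRESS (or DONE) and skip the document.
    return db.runTransaction(async (tx) => {
        const snap = await tx.get(userRef);
        const state = snap.get('__state');
        if (state === 'IN_PROGRESS' || state === 'DONE') {
            return false; // already claimed by another invocation
        }
        tx.update(userRef, { __state: 'IN_PROGRESS' });
        return true;
    });
}

// After the email is actually sent:
// await userRef.update({ __state: 'DONE', sent_to_today: true });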
Option 3.
One cloud function runs through the Firestore collection and, for each document that is to be processed, sends a Pub/Sub message which triggers another cloud function. That one works with only one Firestore document. Nevertheless, the 'state machine' control is still required (as in Option 2 above). The benefit of Option 3 is a higher level of specialisation between functions, and many of the 'second' cloud functions can run in parallel.
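A sketch of this layout (the topic name and message payload are made up; functions and db as in the previous sketches):

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

// First function: scans the collection and publishes one message per user.
exports.scanUsers = functions.pubsub.schedule('every 10 minutes').onRun(async () => {
    const due = await db.collection('users').where('sent_to_today', '==', false).get();
    await Promise.all(due.docs.map((doc) =>
        pubsub.topic('process-user').publishMessage({ json: { userId: doc.id } })
    ));
});

// Second function: handles exactly one user per message; many can run in parallel.
exports.processUser = functions.pubsub.topic('process-user').onPublish(async (message) => {
    const { userId } = message.json;
    // claim the document (see Option 2), send the email, then mark it DONE
});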
What would a Cosmos stored procedure look like that would set the PumperID field for every record to a default value?
We need to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have this) and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single-purpose .NET Core console app and use the SDK to query for the items that require changes and perform the updates. I've used this approach to rename properties, for example. This works for any Cosmos database and doesn't require deploying any stored procs or other server-side code.
Ideally, it is designed to be idempotent so it can be run multiple times if several passes are required to catch new data coming in. If the item count is large, one could optionally use the SDK operations to scale up throughput on start and scale back down when finished. For performance run it close to the endpoint on an Azure Virtual Machine or Function.
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure function or VM. See Change Feed Processor to learn more and examples to start with.
With Change Feed you will want to start it to read from the beginning of the container. To do this see Reading Change Feed from the beginning.
Then, within your delegate, you will read each item off the change feed, check its value, and then call ReplaceItemAsync() to write it back if it needs to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
    Console.WriteLine("Started handling changes...");

    foreach (MyType item in changes)
    {
        if (item.PumperID == null)
        {
            item.PumperID = "some value";
            // call ReplaceItemAsync() on the container to persist the change, etc.
        }
    }

    Console.WriteLine("Finished handling changes.");
}
When writing event-based cloud functions for Firebase Firestore, it's common to update fields in the affected document, for example:
When a document in the users collection is updated, a function will trigger. Let's say we want to determine the user's info state and we have a completeInfo: boolean property; the function will have to perform another update, so the trigger will fire again. If we don't use a flag like needsUpdate: boolean to decide whether the function should execute, we will have an infinite loop.
Is there any other way to approach this behavior? Or is the situation a consequence of how the database is designed? How could we avoid ending up in such a scenario?
I have a few common approaches to Cloud Functions that transform the data:
Write the transformed data to a different document than the one that triggers the Cloud Function. This is by far the easiest approach, since there is no additional code needed - and thus I can't make any mistakes in it. It also means there is no additional trigger, so you're not paying for that extra invocation.
Use granular triggers to ensure my Cloud Function only gets called when it needs to actually do some work. For example, many of my functions only need to run when the document gets created, so by using an onCreate trigger I ensure my code only gets run once, even if it then ends up updating the newly created document.
Write the transformed data into the existing document. In that case I make sure to have the checks for whether the transformation is needed in place before I write the actual code for the transformation. I prefer to not add flag fields, but use the existing data for this check.
A recent example is where I update an amount in a document, which then needs to be fanned out to all users:
exports.fanoutAmount = functions.firestore.document('users/{uid}').onWrite((change, context) => {
    if (!change.after.exists) return null; // the document was deleted, nothing to fan out
    let old_amount = change.before.exists && change.before.data().amount ? change.before.data().amount : 0;
    let new_amount = change.after.data().amount;
    if (old_amount !== new_amount) {
        // TODO: fan out to all documents in the collection
    }
    return null;
});
You need to take care to avoid writing a function that triggers itself infinitely. This is not something that Cloud Functions can do for you. Typically you do this by checking within your function if the work was previously done for the document that was modified in a previous invocation. There are several ways to do this, and you will have to implement something that meets your specific use case.
I would take an approach based on execution time; this means that the function for each document will run twice. Each time the document triggers the function, a lastUpdate field holding a timestamp is written, and the function only updates the document if that timestamp is older than some threshold - e.g. 10 seconds.
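A sketch of that guard, with the lastUpdate field and the 10-second window from the description above (admin is the initialized Admin SDK; the actual transformation is elided):

exports.transformUser = functions.firestore.document('users/{uid}').onWrite((change, context) => {
    if (!change.after.exists) return null; // the document was deleted
    const lastUpdate = change.after.get('lastUpdate');
    // Skip the second invocation: this document was written less than 10 seconds ago.
    if (lastUpdate && Date.now() - lastUpdate.toMillis() < 10 * 1000) return null;
    return change.after.ref.update({
        lastUpdate: admin.firestore.FieldValue.serverTimestamp(),
        // ...plus the actual transformation of the document
    });
});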
I'm working on a Firebase Cloud Function that updates some aggregate information on some documents in my DB. It's a very simple function and is simply adding 1 to a total # of documents count. Much like the example function found in the Firestore documentation.
I just noticed that when creating a single new document, the function was invoked twice. See the screenshot below and note that the logged document ID (iDup09btyVNr5fHl6vif) is repeated twice:
After a bit of digging around I found this SO post that says the following:
Delivery of function invocations is not currently guaranteed. As the Cloud Firestore and Cloud Functions integration improves, we plan to guarantee "at least once" delivery. However, this may not always be the case during beta. This may also result in multiple invocations for a single event, so for the highest quality functions ensure that the functions are written to be idempotent.
(From Firestore documentation: Limitations and guarantees)
Which leads me to a problem with their documentation. Cloud Functions, as mentioned above, are meant to be idempotent (in other words, the data they alter should be the same whether the function runs once or multiple times). However, the example function I linked to earlier is (to my eyes) not idempotent:
exports.aggregateRatings = firestore
    .document('restaurants/{restId}/ratings/{ratingId}')
    .onWrite(event => {
        // Get the value of the newly added rating
        var ratingVal = event.data.get('rating');

        // Get a reference to the restaurant
        var restRef = db.collection('restaurants').doc(event.params.restId);

        // Update aggregations in a transaction
        return db.runTransaction(transaction => {
            return transaction.get(restRef).then(restDoc => {
                // Compute the new number of ratings
                var newNumRatings = restDoc.data().numRatings + 1;

                // Compute the new average rating
                var oldRatingTotal = restDoc.data().avgRating * restDoc.data().numRatings;
                var newAvgRating = (oldRatingTotal + ratingVal) / newNumRatings;

                // Update restaurant info
                return transaction.update(restRef, {
                    avgRating: newAvgRating,
                    numRatings: newNumRatings
                });
            });
        });
    });
If the function runs once, the aggregate data is increased as if one rating is added, but if it runs again on the same rating it will increase the aggregate data as if there were two ratings added.
Unless I'm misunderstanding the concept of idempotence, this seems to be a problem.
Does anyone have any ideas of how to increase / decrease aggregate data in Cloud Firestore via Cloud Functions in a way that is idempotent?
(And of course doesn't involve querying every single document the aggregate data is regarding)
Bonus points: Does anyone know if functions will still need to be idempotent after Cloud Firestore is out of beta?
The Cloud Functions documentation gives some guidance on how to make retryable background functions idempotent. The bullet point you're most likely to be interested in here is:
Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.
The event parameter passed to your function has an eventId property on it that is unique, but will be the same when an event is retried. You should use this value to determine whether the action taken for an event has already occurred, so you know to skip the action the second time, if necessary.
As for how exactly to check if an event ID has already been processed by your function, there's a lot of ways to do it, and that's up to you.
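For example, here is a sketch using a dedicated collection to record processed event IDs (the processedEvents collection name is made up, and this uses the newer change/context function signature, where the ID lives on context.eventId):

exports.aggregateRatings = functions.firestore
    .document('restaurants/{restId}/ratings/{ratingId}')
    .onWrite((change, context) => {
        const eventRef = db.collection('processedEvents').doc(context.eventId);
        return db.runTransaction(async (tx) => {
            const seen = await tx.get(eventRef);
            if (seen.exists) return; // this event was already handled; skip it
            // ...update the aggregates here, inside the same transaction...
            tx.set(eventRef, { processedAt: admin.firestore.FieldValue.serverTimestamp() });
        });
    });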
You can always opt out of making your function idempotent if you think it's simply not worthwhile, or it's OK to possibly have incorrect counts in some (probably rare) cases.
I'm trying to understand when I should call Query.close(Object) or Query.closeAll();
From the docs I see that it will "close a query result" and "release all resources associated with it," but what does "close" mean? Does it mean I can't use objects that I get out of a Query after I've called close on them?
I ask because I'm trying to compartmentalize my code with functions like getUser(id), which will create a Query, fetch the User, and destroy the query. If I have to keep the Query around just to access the User, then I can't really do that compartmentalization.
What happens if I don't call close on an object? Will it get collected as garbage? Will I be able to access that object later?
Please let me know if I can further specify my question somehow.
You can't use the query results after you've closed them (i.e. the List object you got back from query.execute). You can still access the objects in the results if you copied them to your own List, or made references to them in your code. Failure to call close can leak memory.
When your query method returns a single object it is easy to simply close the query before returning the single object.
On the other hand, when your query method returns a collection, the query method itself cannot close the query before returning the result, because the query needs to stay open while the caller is iterating through the results.
This puts the responsibility for closing a query that returns a collection on the caller and can introduce leaks if the caller neglects to close the query - I thought there must be a safer way and there is!
Guido, a long-time DataNucleus user, created an 'auto closing' collection facade that wraps the collection returned by JDO's Query.execute method. Usage is extremely simple: wrap the query result inside an instance of the auto closing collection object.
Instead of returning the Query result set like this:
return q.execute();
simply return an 'auto closing' wrapped version of it:
return new JDOQueryResultCollection(q, q.execute());
To the caller it appears like any other Collection but the wrapper keeps a reference to the query that created the collection result and auto closes it when the wrapper is disposed of by the GC.
Guido kindly gave us permission to include his clever auto closing code in our open source exPOJO library. The auto closing classes are completely independent of exPOJO and can be used in isolation. The classes of interest are in the expojo_jdo*.jar file that can be downloaded from:
http://www.expojo.com/
JDOQueryResultCollection is the only class used directly but it does have a few supporting classes.
Simply add the jar to your project and import com.sas.framework.expojo.jdo.JDOQueryResultCollection into any Java file that includes query methods that return results as a collection.
Alternatively, you can extract the source files from the jar and include them in your project. They are:
JDOQueryResultCollection.java
Disposable.java
AutoCloseQueryIterator.java
ReadonlyIterator.java