According to this answer, Firestore references cannot be used for JOIN-like queries, i.e. for retrieving the referencing document and the referenced document in one database round-trip. This can be a performance issue, since network latency costs apply for each database round-trip.
Network latency is only a problem if you are not close to the datacenter, which means that if you do the join server-side, i.e. in the Google datacenter where Firestore runs, then it should not be a problem.
Can we use Firebase Functions to implement this functionality in a generic manner? I'm thinking of a service, implemented in Firebase Functions, which sits between the client and the database. Most queries are just passed to the database (where, orderBy, limit etc. must still be possible), but there should be an additional populate: true query parameter. If this parameter is present and set to true, then the referenced documents are returned as well.
Maybe it will also be necessary to indicate which documents should be populated.
Sure, sounds like you could give it a try. You can still expect to pay for all documents read by whatever queries you use, even if you don't return that many to the client.
If you really have a lot of joins to perform, it's typically better to pre-compute the joins in another collection, and have the client query that instead. Then you have the advantage of the client having a local cache that helps both speed and cost.
But there's nothing stopping you from implementing a function that does this, if that's what you need.
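For illustration, here's a minimal sketch of what such a function could look like, assuming a callable Cloud Function using the Firebase Admin SDK, a request shape with a populate flag, and references stored as DocumentReference fields. The names queryWithPopulate, collection and populate are purely illustrative, not an existing API.

// Minimal sketch: resolve DocumentReference fields server-side, so the
// client gets the referencing and referenced documents in one round-trip.
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

export const queryWithPopulate = functions.https.onCall(async (data) => {
  const { collection, populate } = data; // hypothetical request shape
  const snapshot = await db.collection(collection).limit(50).get();

  return Promise.all(
    snapshot.docs.map(async (doc) => {
      const item: Record<string, unknown> = { id: doc.id, ...doc.data() };
      if (populate) {
        // Replace every DocumentReference field with the referenced document's data.
        for (const [key, value] of Object.entries(item)) {
          if (value instanceof admin.firestore.DocumentReference) {
            const refSnap = await value.get();
            item[key] = { id: refSnap.id, ...refSnap.data() };
          }
        }
      }
      return item;
    })
  );
});

Note that every populated reference is still a separate read behind the scenes, so the billing point above applies: you pay for each referenced document read, even though the client only sees one response.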
Being new to Firestore, I am trying to keep the number of document downloads as small as possible. I figured that I could download documents only once and store them offline. If something is changed in the cloud, download a new copy and replace the offline version of that document (give relevant documents a last-changed timestamp and download when the local version is older). I haven't started on this yet, but I am going to assume something like this must already exist, right?
I'm not sure where to start, and Google isn't giving me many answers, with the exception of enablePersistence() on the FirebaseFirestore instance. I have a feeling that this is not what I am looking for, since it would be weird to artificially turn the network on and off every time I want to check for changes.
Am I missing something or am I about to discover an optimisation solution to this problem?
What you're describing isn't something that's built into Firestore. You're going to have to design and build it using the Firestore capabilities you read about in the documentation. Persistence is enabled by default, but that's not going to solve your problem, which is rather broadly stated here.
The bottom line is that neither Firestore nor its local cache have an understanding of optimizing the download of only documents that have changed, by whatever definition of "change" that you choose. When you query Firestore, it will, by default, always go to the server and download the full set of documents that match your query. If you want to query only the local cache, you can do that as well, but it will not consult the server or be aware of any changes. These capabilities are not sufficient for the optimizations you describe.
If you want to get only document changes since the last time you queried, you're going to have to design your data so that you can make such a query, perhaps by adding a timestamp field to each document, and use that to query for documents that have changed since the last time you made a query. You might also need to manage your own local cache, since the SDK's local cache might not be flexible enough for what you want.
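As a rough sketch of that approach, assuming each document carries an updatedAt field written with a server timestamp and the client remembers the newest value it has seen (the collection name items and the field name updatedAt are placeholders, not anything prescribed by the SDK):

import { initializeApp } from "firebase/app";
import {
  getFirestore, collection, query, where, orderBy, getDocs, Timestamp,
} from "firebase/firestore";

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// Fetch only the documents modified since the last sync point the client remembers.
async function fetchChangedSince(lastSync: Timestamp) {
  const q = query(
    collection(db, "items"),            // placeholder collection
    where("updatedAt", ">", lastSync),  // assumes updatedAt is set with serverTimestamp()
    orderBy("updatedAt", "asc")
  );
  const snapshot = await getDocs(q);
  return snapshot.docs.map((d) => ({ id: d.id, ...d.data() }));
}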
I recommend that you read this blog post that describes in more detail how the local cache actually works, and how you might take advantage of it (or not).
I have been unable to find any documentation that discusses best practices when it comes to managing images using Firebase.
I am considering whether or not to add the filenames of uploaded images to the database.
If they're in the database then I can make one db get request to get the list of existing images. These would be put into the database on fileUploaded events and deleted from the database on delete.
Is it better for me to just do a few file-exists requests directly against Storage and avoid the database overhead? For instance (roughly, using the @google-cloud/storage client, where file.exists() resolves to a [boolean]):

const [hasFavicon] = await bucket.file('storage_path/favicon.png').exists();
if (hasFavicon) {
  // ...
}
const [hasFavicon32] = await bucket.file('storage_path/favicon-32x32.png').exists();
if (hasFavicon32) {
  // ...
}
Or is it better to store those image filenames in the database, assume they exist, and pull them out with a document .get()?
I would like this to be as lightweight as possible. I know in advance the list of filenames we want to know about (they're favicons), so looping and doing .exists() is less code, but perhaps slower than putting the names into the database and pulling them back out.
Any information you have on the efficiency of database document requests versus Storage exists requests (which I would assume are doing something similar behind the scenes anyway) would be appreciated.
Please ask for more information if I'm not clear.
Based on the fact that Firestore clients try to maintain an open socket connection to the database, I'd give the edge to a database get being faster than checking for file existence in Cloud Storage. With Firestore, you're less likely to pay the cost of establishing an SSL connection to the cloud service.
From an architectural point of view, I would save the file names in Firestore and retrieve them from the client directly based on the file name. It's much simpler, and you can add different types of images for different purposes under the same bucket without much thought down the road.
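If you go the database route, a minimal sketch of keeping Firestore in sync with Storage could look like the following, assuming an images collection keyed by file path; the collection name and document shape are illustrative, not prescribed by Firebase.

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();
const db = admin.firestore();

// Record each uploaded file so the client can read one collection
// instead of issuing a Storage exists() call per file.
export const onImageUploaded = functions.storage.object().onFinalize(async (object) => {
  const path = object.name ?? "";
  await db.collection("images").doc(encodeURIComponent(path)).set({
    path,
    contentType: object.contentType ?? null,
    uploadedAt: admin.firestore.FieldValue.serverTimestamp(),
  });
});

// Remove the record when the file is deleted, so the list stays accurate.
export const onImageDeleted = functions.storage.object().onDelete(async (object) => {
  const path = object.name ?? "";
  await db.collection("images").doc(encodeURIComponent(path)).delete();
});

The client then needs only a single collection (or document) get to know which favicons exist.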
Is there a way to automatically move expired documents to blob storage via change feed?
I Googled but found no solution for automatically moving expired documents to blob storage via the change feed option. Is it possible?
There is no built-in functionality for something like that, and the change feed would be of no use in this case.
The change feed processor (which is what the Azure Function trigger uses too) won't notify you of deleted documents, so you can't listen for them.
Your best bet is to write a custom application that does scheduled archiving and then deletes the archived documents.
As stated in the Cosmos DB TTL documentation: when you configure TTL, the system will automatically delete the expired items based on the TTL value, unlike a delete operation that is explicitly issued by the client application.
So it is controlled by the Cosmos DB system, not the client side. You could follow and vote up this feedback to push the progress of Cosmos DB.
To come back to this question, one way I've found that works is to make use of the built-in TTL (let Cosmos DB expire the documents) and to have a backup script that queries for documents that are near their TTL expiry, but with a safe window in case of latency - e.g. I have the window at up to 24 hours.
The main reasoning for this is that issuing deletes as a query not only uses RUs, but quite a lot of them. Even when you slim your indexes down you can still have massive RU usage, whereas letting Cosmos TTL the documents itself induces no extra RU use.
A pattern that I came up with to help is to have my backup script enable the container-level TTL when it starts (doing an initial backup first to ensure no data loss occurs instantly) and to have a try-except-finally, with the finally removing/turning off the TTL to stop it potentially removing data in case my script is down for longer than the window. I'm not yet sure of the performance hit that might occur on large containers when it has to update indexes, but in a logical sense this approach seems to be effective.
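As a rough sketch of the query side of such a backup script, assuming the script knows the container-level TTL and uses a 24-hour safety window (the 7-day TTL, database and container names are illustrative only):

import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!);
const container = client.database("mydb").container("mycontainer");

const TTL_SECONDS = 7 * 24 * 60 * 60;   // container-level TTL (assumed value)
const WINDOW_SECONDS = 24 * 60 * 60;    // safety window before expiry

async function backupExpiringDocuments() {
  const nowSeconds = Math.floor(Date.now() / 1000);
  // A document expires at _ts + TTL, so it is inside the safety window
  // once _ts <= now - (TTL - window).
  const cutoff = nowSeconds - (TTL_SECONDS - WINDOW_SECONDS);
  const querySpec = {
    query: "SELECT * FROM c WHERE c._ts <= @cutoff",
    parameters: [{ name: "@cutoff", value: cutoff }],
  };
  const { resources } = await container.items.query(querySpec).fetchAll();
  // Write `resources` to blob storage here, then let Cosmos DB's TTL
  // delete the originals without spending extra RUs on explicit deletes.
  return resources;
}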
As per the documentation, Firebase Functions are currently supported in only 4 regions: us-central1, us-east1, europe-west1 and asia-northeast1.
That means locations further away would incur more latency, and often that translates to lower performance.
How can this limitation be worked around?
1) Choose the location that is closest to you. You can set up test Cloud Functions in different regions and measure the round-trip latency; only you can discover the specifics for your location.
2) Focus your software architecture on infrastructure that is locally available.
Use the client-side Firestore library directly as much as possible. It supports offline data, queues data to send later if you don't have an internet connection, and caches read data locally - you can't get lower latency than that! So make sure you use Firestore for CRUD operations.
3) Architect to use Cloud Functions for batch and background processing. If any business-logic processing is required, write the data to Firestore (using the client libraries) and have a Firebase Functions trigger do the processing on the write event. Have that trigger update the record with the additional processing and its state; I believe that if you're using the client-side libraries there is a way to have the updated data automatically pushed back to the client (see the sketch after this list).
You also have the bonus benefit of being able to control authorisation with Firestore security rules, whereas Functions run with admin-level access and don't have an equivalent per-user authorisation control.
4) Reduce chatter: minimise the number of Cloud Function calls overall, and ensure your Cloud Functions themselves do more in one go and return more complete data in one go.
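As an example of points 1 and 3 combined, a background function can be pinned to one of the supported regions and triggered by a client-side write; a minimal sketch, where the region, collection name and fields are only illustrative:

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

export const processOrder = functions
  .region("europe-west1") // pick the supported region with the lowest measured latency
  .firestore.document("orders/{orderId}") // hypothetical collection
  .onCreate(async (snapshot) => {
    // Business-logic processing triggered by the client-side write (point 3);
    // the result is written back onto the same document, so a client
    // listening via the Firestore SDK picks it up automatically.
    await snapshot.ref.update({
      processed: true,
      processedAt: admin.firestore.FieldValue.serverTimestamp(),
    });
  });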
I am storing data in DynamoDB as a Map attribute type and implementing a restful PATCH endpoint to modify that data (RFC 6902).
In my validation routine, I am currently NOT making sure that the Map exists before translating the patch into an updating expression and sending it to DynamoDB.
This means that if the Map is not already set in DynamoDB, the update will fail (ValidationException since the document path does not exist).
My Question: is it appropriate/acceptable/OK to rely on DynamoDB rejecting the update in this way, or should I get a copy of the item and reject the patch in my own validation routines?
I have not been able to think of a reason not to allow DynamoDB the pleasure of rejecting the patch (and it saves me a GET call), but it makes me a little nervous to rely on third-party validation like this (although we specify the API version now when accessing AWS, so in theory this should always work...)
This seems like a highly subjective question, yet personally I don't see the benefit of doing your own validation by adding an extra round-trip to DynamoDB when, the way I see it, the worst that can happen is that the update succeeds in DynamoDB. This is no different from treating a local database update that returns success as authoritative.
And if you're worried about long-term backwards compatibility (i.e. the DynamoDB API contract changing from underneath you), then hopefully you've written some functional/integration tests that run periodically, and specifically whenever you update your software dependencies.
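As a sketch of the "let DynamoDB reject it" approach the question describes, assuming a table Items with a map attribute profile and the AWS SDK for JavaScript v3 (the table, key and attribute names are illustrative, and the error-name check is an assumption about how the ValidationException surfaces):

import { DynamoDBClient, UpdateItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

async function applyPatch(id: string, field: string, value: string) {
  try {
    await client.send(new UpdateItemCommand({
      TableName: "Items",
      Key: { id: { S: id } },
      // Fails if the "profile" map does not exist, because the
      // document path in the update expression cannot be resolved.
      UpdateExpression: "SET profile.#f = :v",
      ExpressionAttributeNames: { "#f": field },
      ExpressionAttributeValues: { ":v": { S: value } },
    }));
    return { status: 204 };
  } catch (err: any) {
    if (err.name === "ValidationException") {
      // Translate DynamoDB's rejection into the PATCH error response,
      // instead of pre-reading the item to validate the path ourselves.
      return { status: 422, error: "Target path does not exist" };
    }
    throw err;
  }
}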