Being new to Firestore, I am trying to keep the number of document downloads as small as possible. I figured I could download each document only once and store it offline; if something changes in the cloud, download a new copy and replace the offline version (give relevant documents a last-change timestamp and download when the local version is older). I haven't started on this yet, but I assume something like this must already exist, right?
I'm not sure where to start, and Google isn't giving me many answers, with the exception of enablePersistence() on the FirebaseFirestore instance. I have a feeling this is not what I am looking for, since it would be odd to artificially toggle the network on and off every time I want to check for changes.
Am I missing something, or will I have to build this optimisation myself?
What you're describing isn't something that's built into Firestore. You're going to have to design and build it using the Firestore capabilities described in the documentation. Persistence is enabled by default, but that's not going to solve your problem, which is rather broadly stated here.
The bottom line is that neither Firestore nor its local cache has any notion of downloading only the documents that have changed, by whatever definition of "change" you choose. When you query Firestore, it will by default always go to the server and download the full set of documents that match your query. You can instead query only the local cache, but that will not consult the server or be aware of any changes. Neither capability is sufficient for the optimization you describe.
If you want to get only document changes since the last time you queried, you're going to have to design your data so that you can make such a query, perhaps by adding a timestamp field to each document and querying for documents whose timestamp is later than your last query. You might also need to manage your own local cache, since the SDK's local cache might not be flexible enough for what you want.
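To make the pattern concrete, here is a minimal sketch in Python. An in-memory dict stands in for the server-side collection, and `fetch_changed_since` mimics a Firestore query like `collection.where("updatedAt", ">", last_sync)`; all names (`updatedAt`, `sync`, the document IDs) are hypothetical, not part of any Firestore API:

```python
from datetime import datetime, timezone

# In-memory stand-in for the server-side collection; "updatedAt" is the
# last-change timestamp field suggested above. All names are hypothetical.
server_docs = {
    "a": {"title": "unchanged", "updatedAt": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    "b": {"title": "edited",    "updatedAt": datetime(2024, 6, 1, tzinfo=timezone.utc)},
}

def fetch_changed_since(last_sync):
    # Mirrors a query like collection.where("updatedAt", ">", last_sync):
    # only documents modified after the cursor are downloaded.
    return {doc_id: doc for doc_id, doc in server_docs.items()
            if doc["updatedAt"] > last_sync}

def sync(local_cache, last_sync):
    # Merge changed documents into our own cache and advance the cursor.
    changed = fetch_changed_since(last_sync)
    local_cache.update(changed)
    return max((d["updatedAt"] for d in changed.values()), default=last_sync)

cache = {}
cursor = datetime(2024, 3, 1, tzinfo=timezone.utc)  # time of the previous sync
cursor = sync(cache, cursor)  # downloads only "b", the doc changed after the cursor
```

The key point is that the client keeps its own cursor and its own cache; Firestore only sees an inequality query on the timestamp field, so each sync downloads just the delta.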
I recommend that you read this blog post that describes in more detail how the local cache actually works, and how you might take advantage of it (or not).
Related
This is a rather straightforward question.
Does Firebase Realtime Database guarantee that it handles requests in "first come, first served" order?
When there is a write request followed immediately by a read request, will the read fetch the updated data?
When there is a write-read-write sequence of requests, is the read guaranteed to fetch the data written by the first write?
Suppose there is a write request that could not be performed (due to connection issues). Since Firebase works offline, that change is saved locally. Now, from somewhere else, another write request is made and completes successfully. When the first device comes back online, does it modify the values (since its write arrived last)? Or not (since it was initiated before the latest changes)?
There are a lot of questions in your post, and many of them depend on how you implement the functionality. So it's not nearly as straightforward as you may think.
The best I can do is explain a bit of how the database works in the scenarios you mention. If you run into more questions from there, I recommend implementing the use-case and posting back with an MCVE for each specific question.
Writes from a single client are handled in the order in which that client makes them.
But writes from different clients are handled with a last-write-wins logic. If your use-case requires something else, include a client-side timestamp in the write and use security rules to reject writes that are older than the current state.
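A rough sketch of that rejection logic, simulated in Python (the `writtenAt` field, the function names, and the dict-based "server state" are all illustrative; the inline comment shows roughly what the equivalent security-rule expression would look like, not verbatim rule syntax):

```python
# Simulated server-side state with a client-supplied timestamp, plus the
# check a security rule would perform on each incoming write.
state = {"value": None, "writtenAt": 0}

def try_write(value, client_ts):
    # Accept the write only if its client timestamp is newer than the stored
    # one, mimicking a rule along the lines of:
    #   newData.child('writtenAt').val() > data.child('writtenAt').val()
    if client_ts <= state["writtenAt"]:
        return False              # stale write rejected
    state.update(value=value, writtenAt=client_ts)
    return True

accepted_online = try_write("online edit", client_ts=200)
# An edit made earlier while offline arrives at the server later,
# so plain last-write-wins would clobber the newer value -- the
# timestamp check rejects it instead:
accepted_offline = try_write("offline edit", client_ts=150)
```

This directly addresses the offline scenario in the question: without the timestamp check, the queued offline write would win simply because it arrived last.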
Firebase synchronizes state to the listeners, and not necessarily all (write) events that led to this state. So it is possible (and fairly common) for listeners to not get all state changes that happened, for example if multiple changes to the same state happened while they were offline.
A client reading data that it has itself changed will always see the state including its own changes.
My C++ application needs to support caching of files downloaded from the network. I had started to write a native LRU implementation when someone suggested I look at using SQLite to store an ID, a file blob (typically audio files), and the add/modify datetimes for each entry.
I have a proof of concept working well for the simple case where one client is accessing the local SQLite database file.
However, I also need to support multiple access by different processes in my application as well as support multiple instances of the application - all reading/writing to/from the same database.
I have found a bunch of posts to investigate but I wanted to ask the experts here too - is this a reasonable use case for SQLite and if so, what features/settings should I dig deeper into in order to support my multiple access case.
Thank you.
M.
Most filesystems are in effect databases too, and most store two or more timestamps for each file (e.g. last modification and last access time), which is enough to implement an LRU cache. Using the filesystem directly will make just as efficient use of storage as any DB, and perhaps more so. The filesystem is also already geared toward efficient and relatively safe access by multiple processes (assuming you follow the rules and algorithms for safe concurrent access in a filesystem).
The main advantage of SQLite may be slightly simpler support for sorting the list of records, though at the cost of using a separate query API. Of course, a DB also offers the future ability to store additional descriptive attributes without having to encode them in the filename or in additional file(s).
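If you do go with SQLite, the settings most relevant to the multi-process case are WAL journal mode (concurrent readers alongside one writer) and a busy timeout (contending writers wait instead of failing immediately); both are plain PRAGMAs, so the same calls apply from the C API. A minimal sketch, written in Python's stdlib sqlite3 for brevity, with an illustrative schema and LRU eviction query:

```python
import os, sqlite3, tempfile, time

def open_cache(path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on a locked DB
    conn.execute("""CREATE TABLE IF NOT EXISTS cache (
        id TEXT PRIMARY KEY, blob BLOB, last_access REAL)""")
    return conn

def put(conn, key, blob):
    conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                 (key, blob, time.time()))
    conn.commit()

def get(conn, key):
    # Touch the row so it counts as recently used, then return the blob.
    conn.execute("UPDATE cache SET last_access = ? WHERE id = ?",
                 (time.time(), key))
    conn.commit()
    row = conn.execute("SELECT blob FROM cache WHERE id = ?", (key,)).fetchone()
    return row[0] if row else None

def evict_lru(conn, keep):
    # Drop everything except the `keep` most recently accessed entries.
    conn.execute("""DELETE FROM cache WHERE id NOT IN (
        SELECT id FROM cache ORDER BY last_access DESC LIMIT ?)""", (keep,))
    conn.commit()

db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = open_cache(db_path)
put(conn, "a", b"audio-a"); time.sleep(0.01)   # sleeps just keep the demo's
put(conn, "b", b"audio-b"); time.sleep(0.01)   # timestamps distinct
put(conn, "c", b"audio-c"); time.sleep(0.01)
get(conn, "a")                                  # "a" is now most recent
evict_lru(conn, keep=2)                         # "b" is the LRU entry, evicted
remaining = sorted(r[0] for r in conn.execute("SELECT id FROM cache"))
```

Each process (or application instance) opens its own connection to the same file; SQLite's file locking plus WAL handles the coordination.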
Is there a way to automatically move expired documents to blob storage via change feed?
I googled but found no solution for automatically moving expired documents to blob storage via the change feed option. Is it possible?
There is no built-in functionality for something like that, and the change feed is of no use in this case.
The change feed processor (which is what the Azure Function trigger is using too) won't notify you for deleted documents so you can't listen for them.
Your best bet is to write a custom application that does scheduled archiving and then deletes the archived documents.
As stated in the Cosmos DB TTL documentation: when you configure TTL, the system will automatically delete the expired items based on the TTL value, unlike a delete operation that is explicitly issued by the client application.
So it is controlled by the Cosmos DB system, not the client side. You could follow and vote up this feedback to push the progress of Cosmos DB.
To come back to this question, one way I've found that works is to make use of the in-built TTL (Let CosmosDB expire documents) and to have a backup script that queries for documents that are near the TTL, but with a safe window in case of latency - e.g. I have the window up to 24 hours.
The main reasoning for this is that issuing deletes as a query not only uses RUs, but quite a lot of them. Even when you slim your indexes down you can still have massive RU usage, whereas letting Cosmos TTL the documents itself induces no extra RU use.
A pattern that I came up with to help is to have my backup script enable the container-level TTL when it starts (doing an initial backup first to ensure no data loss occurs instantly) and to have a try-except-finally, with the finally removing/turning off the TTL to stop it potentially removing data in case my script is down for longer than the window. I'm not yet sure of the performance hit that might occur on large containers when it has to update indexes, but in a logical sense this approach seems to be effective.
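A sketch of the "back up before TTL fires" check, in Python with Cosmos-style fields: `_ts` is the document's last-modified time in epoch seconds, and the TTL and window values mirror the ones described above (7 days and 24 hours are illustrative). A real backup script would express the same predicate as a SQL query along the lines of `SELECT * FROM c WHERE c._ts < @now - (@ttl - @window)`:

```python
TTL = 7 * 24 * 3600     # container-level TTL: 7 days (illustrative)
WINDOW = 24 * 3600      # safety window: back up anything within 24h of expiry

def needs_backup(doc, now):
    # A doc expires TTL seconds after its last modification (_ts);
    # flag it once it is within WINDOW seconds of that deadline.
    expires_at = doc["_ts"] + TTL
    return expires_at - now <= WINDOW

now = 1_000_000_000
fresh = {"id": "1", "_ts": now - 3 * 24 * 3600}          # expires in ~4 days
near  = {"id": "2", "_ts": now - 6 * 24 * 3600 - 3600}   # expires in <24h
```

Because the query only reads documents approaching expiry and lets Cosmos perform the actual deletes, the RU cost is limited to the reads, avoiding the expensive explicit delete operations mentioned above.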
Coming from a SQL background, I'm wondering how does one go about doing database migration in firebase?
Assume I have the following data in Firebase {dateFrom: 2015-11-11, timeFrom: 09:00} .... and now the front-end client will store and expects data in the form {dateTimeFrom: 2015-11-11T09:00:00-07:00}. How do I update Firebase such that all dateFrom: xxxx and timeFrom: yyyy pairs are removed and replaced with dateTimeFrom: xxxxyyyy? Thanks.
You have to create your own script that reads the data, transforms it, and writes it back. You may either read one node at a time or read the whole DB if it is not big. You may also decide to leave the logic to your client, transforming the data when it accesses it (if it ever does).
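The transform step of such a script could look like this minimal Python sketch. It assumes a fixed -07:00 UTC offset, as in the question's example; a real migration would decide how to determine each record's timezone:

```python
# Hypothetical transform for the read-transform-write migration script.
OFFSET = "-07:00"   # assumption: every record uses this fixed UTC offset

def migrate(profile):
    # Replace the separate dateFrom/timeFrom fields with a single
    # ISO-8601 dateTimeFrom field; leave already-migrated records alone.
    if "dateFrom" in profile and "timeFrom" in profile:
        date = profile.pop("dateFrom")
        time = profile.pop("timeFrom")
        profile["dateTimeFrom"] = f"{date}T{time}:00{OFFSET}"
    return profile

old = {"dateFrom": "2015-11-11", "timeFrom": "09:00"}
new = migrate(dict(old))
```

The surrounding script would iterate over the nodes (one at a time or in bulk, per the answer above), apply `migrate`, and write each transformed record back.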
I think you are looking for this: https://github.com/kevlened/fireway
I think it is a bad idea to pollute a project with conditionals that update data on the fly.
It is a shame Firestore doesn't implement a process for this, as it is very common and required to keep the app and DB in sync.
FWIW, since I'm using Swift and there isn't a solution like Fireway (that I know of), I've submitted a feature request to the Firebase team that they've accepted as a potential feature.
You can also submit a DB migration feature request to increase the likelihood that they create the feature.
I'm using Firebase to store user profiles. I tried to put the minimum amount of data in each user profile (following the good practices advised in the documentation about structuring data) but as I have more than 220K user profiles, it still represents 150MB when downloading as JSON all user profiles.
And of course, it will grow bigger and bigger as I intend to have a lot more users :)
I can't do queries on those user profiles anymore because each time I do that, I reach 100% Database I/O capacity and thus some other requests, performed by users currently using the app, end up with errors.
I understand that when using queries, Firebase needs to consider all the data in the list and thus read it all from disk. And 150MB of data seems to be too much.
So is there an actual limit before reaching 100% Database I/O capacity? And what is exactly the usefulness of Firebase queries in that case?
If I simply have small amounts of data, I don't really need queries, I could easily download all data. But now that I have a lot of data, I can't use queries anymore, when I need them the most...
The core problem here isn't the query or the size of the data, it's simply the time required to warm the data into memory (i.e. load it from disk) when it's not being frequently queried. It's likely to be only a development issue, as in production this query would likely be a more frequently used asset.
But if the goal is to improve performance on initial load, the only reasonable answer here is to query on less data. 150MB is significant. Try copying a 150MB file between computers over a wireless network and you'll have some idea what it's like to send it over the internet, or to load it into memory from a file server.
A lot here depends on the use case, which you haven't included.
Assuming you have fairly standard search criteria (e.g. you search on email addresses), you can use indices to store email addresses separately to reduce the data set for your query.
/search_by_email/$user_id/<email address>
Now, rather than 50k per record, you have only the bytes to store the email address per record--a much smaller payload to warm into memory.
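A toy illustration of that index structure in Python, with plain dicts standing in for the Firebase nodes (the path, field names, and profile contents are all illustrative; in Firebase you would write the index entry alongside the profile on every update):

```python
# Full profiles live under /users/$user_id -- large payloads.
profiles = {
    "uid_1": {"email": "a@example.com", "bio": "...lots of profile data..."},
    "uid_2": {"email": "b@example.com", "bio": "...lots of profile data..."},
}

# The index under /search_by_email/$user_id holds only the email --
# a few dozen bytes per record instead of ~50k.
search_by_email = {uid: p["email"] for uid, p in profiles.items()}

def find_user(email):
    # Scan (or, in Firebase, query) the small index, then fetch
    # just the one matching full profile.
    for uid, indexed in search_by_email.items():
        if indexed == email:
            return uid, profiles[uid]
    return None

result = find_user("b@example.com")
```

Only the slim index has to be warmed into memory for the search; the 150MB of full profiles stays on disk except for the single record actually fetched.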
Assuming you're looking for robust search capabilities, the best answer is to use a real search engine. For example, enable private backups and export to BigQuery, or go with ElasticSearch (see Flashlight for an example).