For my application it is important to synchronize data between the client and server constantly. I'm thinking of integrating Firestore into my application, but I am wondering how much bandwidth syncing my collection(s) would consume.
As I see it, every time Firestore synchronizes client side and server side, it fetches the entire collection from the server and checks the differences between the two on the client side. This is a crude but effective strategy. However, I can't be pulling in what is currently 100 KB of database data every few seconds.
The application is loaded onto tablets that have a monthly cap of 10 GB of 4G data. I can't exceed this limit and I don't intend to. By my calculations, running this application with a sync every 10 seconds on 1 tablet for 8 hours a day, 31 days a month, will use 8.928 GB of data. This would suffice for now, but what do I do when my collection grows to 200 or 300 KB?
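(Checking the arithmetic: 100 KB per sync × 6 syncs/minute × 60 minutes × 8 hours × 31 days = 8,928 MB ≈ 8.9 GB. At 300 KB per sync the same usage pattern comes to roughly 26.8 GB, well past the 10 GB cap.)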
My question:
How often does Firestore sync data and is there any way I can avoid pulling in the entire collection every single time while still keeping 100% data integrity?
Related
Is it possible to limit the speed at which Google Firestore pushes writes made in an app to the online database?
I'm investigating the feasibility of using Firestore to store a data stream from an IoT device via a mobile device/bluetooth.
The main concern is battery cost: a new data packet arrives roughly every two minutes, and I'm concerned about the additional battery drain that an internet round-trip every two minutes, 24 hrs a day, will cost. I would also want to limit updates to wifi connections only.
It's not important for the data to be available online in real time. However, it is possible for multiple sources to add to the same data stream in a two-way sync (i.e. the online DB and all devices end up with the merged data).
I'm currently handling that myself, but when I saw the offline capability of Firestore I hoped I could get all that functionality for free.
I know we can't directly control offline-mode in Firestore, but is there any way to prevent it from always and immediately pushing write changes online?
The only technical question I can see here has to do with how batch writes operate, and more importantly, cost. Simply put, a batched write of 100 writes costs the same as writing 100 writes individually. Batching is not a way to avoid Firestore's write costs; the same goes for transactions, and for editing a document (that's a write too). If you really want to avoid those costs, you could store the values for the thirty minutes and let the client send the aggregated data in a single document. Though you mentioned you need data to be immediate, so I'm not sure that's an option for you. Of course, this depends on what you interpret "immediate" to mean on your timescale. In my opinion (I know those aren't really allowed here, but it's kind of part of the question), if the data is stored over months or years, 30 minutes is fairly immediate. Either way, batch writes aren't quite the solution I think you're looking for.
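If the aggregate-then-send route does fit your timescale, the client-side buffering is straightforward. A minimal sketch with the Firestore Web SDK (v9 modular API); the stream path, field names, and 30-minute interval are assumptions for illustration:

```typescript
import { initializeApp } from "firebase/app";
import { getFirestore, doc, setDoc, serverTimestamp } from "firebase/firestore";

// Hypothetical config; replace with your project's values.
const app = initializeApp({ projectId: "my-iot-project" });
const db = getFirestore(app);

// Buffer readings in memory instead of writing each one.
const buffer: { value: number; at: number }[] = [];

export function record(value: number): void {
  buffer.push({ value, at: Date.now() });
}

// Flush the whole buffer as ONE document write every 30 minutes:
// one write op instead of ~15 (a reading every 2 minutes).
setInterval(async () => {
  if (buffer.length === 0) return;
  const chunkRef = doc(db, "streams/device42/chunks", String(Date.now()));
  await setDoc(chunkRef, {
    readings: buffer.splice(0), // drain the buffer in place
    uploadedAt: serverTimestamp(),
  });
}, 30 * 60 * 1000);
```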
EDIT: You've updated your question, so I'll update my answer. You can build a local cache system and choose how you update however you wish; that's completely up to you and your own code. Writes aren't really automatic, so if you want to only send a data packet every hour, then send it at that time. You're likely going to want to do this in a transaction if multiple devices will write to the same stream, so one doesn't overwrite the other if they're sending at the same time. Other than that I don't see Firestore being a problem for you.
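For the multi-device case, a hedged sketch of what that transaction might look like with the same Web SDK; the document path and the readings field are made-up names:

```typescript
import { getFirestore, doc, runTransaction } from "firebase/firestore";

const db = getFirestore();

// Append this device's readings to a shared stream document without
// clobbering readings another device may have written concurrently.
async function appendReadings(newReadings: number[]): Promise<void> {
  const streamRef = doc(db, "streams", "shared-stream"); // hypothetical ids
  await runTransaction(db, async (tx) => {
    const snap = await tx.get(streamRef);
    const existing: number[] = snap.exists() ? snap.data().readings ?? [] : [];
    // Firestore retries the transaction automatically if another device
    // wrote to the document between our read and our write.
    tx.set(streamRef, { readings: [...existing, ...newReadings] });
  });
}
```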
What is the meaning of the following?
50 Max Connections, 5 GB Data Transfer, 100 MB Data Storage.
Can anyone explain this to me? Thanks
EDIT - Generous limits for hobbyists
Firebase has now updated the free plan limits
Now you have
100 max connections
10 GB data transfer
1 GB storage
That means that you can have only 50 active users at once, transfer only 5 GB of data within one month, and store only 100 MB of your data.
E.g. you have an online web store: only 50 users can be on it at once, only 100 MB of data (title, price, image of each item) can be stored in the DB, and the 5 GB of transfer means your site will be able to deliver only 5 GB of data to users (i.e. if your page is 1 MB in size, users will be able to load that page only about 5,000 times).
UPD: to check the size of a given page (to see whether 5 GB is enough for you), in Google Chrome right-click anywhere on the page, choose "Inspect Element", and switch to the "Network" tab. Then refresh the page. The status bar at the bottom shows the amount of transferred data (for example, the current Stack Overflow page is about 25 KB).
From the same page where the question was copied/pasted:
What is a concurrent connection?
A connection is a measure of the number of users that are using your app or site simultaneously. It's any open network connection to our servers. This isn't the same as the total number of visitors to your site or the total number of users of your app. In our experience, 1 concurrent corresponds to roughly 1,400 monthly visits.
Our Hacker Plan has a hard limit on the number of connections. All of the paid Firebases, however, are “burstable”, which means there are no hard caps on usage. REST API requests don't count towards your connection limits.
Data transfer refers to the amount of bytes sent back and forth between the client and server. This includes all data sent via listeners--e.g. on('child_added'...)--and read/write ops. This does not include hosted assets like CSS, HTML, and JavaScript files uploaded with firebase deploy
Data storage refers to the amount of persistent data that can live in the database. This also does not include hosted assets like CSS, HTML, and JavaScript files uploaded with firebase deploy
These limits, mentioned and discussed in the answers above, are per project.
The number of free projects is not documented. Since this is an abuse vector, the number of free projects is based on some super secret sauce--i.e. your reputation with Cloud. Somewhere in the range of 5-10 seems to be the norm.
Note also that deleted projects take around a week to be fully removed, and they continue to count against your quota during that time.
I have an application that performs complex queries against what amounts to data organized in a "star schema". The gold-owner keeps adding new "axes" to search on, so performance becomes worse over time. Currently, a search operation, using a stored procedure that does all the work on the SQL server, takes about 2 seconds, which doesn't fit the gold-owner's desire for the code to be interactive (<0.1 sec response time). Looking at the SQL Server query analyzer, the search is IO-bound: 9 table scans of 100,000 records each, followed by brutal joins. Due to the nature of the queries I need to perform and the limitations of SQL, this cannot be improved.
In desperation, I've rewritten the query processor so that it sucks the 100,000 records into a cache at application start, then performs the complex queries against that in-memory copy. Loading all the records from the database takes about 12 seconds. This expensive initial load pays for itself in the rewritten query processor: it now only needs to do a single scan through the records, and gives a response time of 0.02 seconds.
This good news is tainted by the gold-owner's discovery that the 12-second hit for populating the cache is being experienced every hour or so. I'm currently storing the data in the ASP.NET application state, as Application["FactTable"]. It seems the application state is being reset after the ASP.NET application is idle for longer than a dozen minutes or so.
If I move the 100,000 records into the ASP.NET application cache, will I be experiencing these evictions just as often, or can I rely on the data remaining in memory for the fast retrievals for longer periods of time? If the ASP.NET cache is also victim to application resets, what other mechanism should I use? A separate app domain hosting an instance of my database cache comes to mind, but I don't want to go down that route unless my other options are closed off.
I realise that you have a lot of data and processing, and you must have tried a few things to speed this scenario up, but Application State, which is managed by IIS, will be volatile...
Have you thought of running your calculations etc. in another process? I.e. create a Windows service that periodically runs the queries to organise your data and saves that "flat" data to a database cache. When a user requests the data, they just get the last DB-cached results... and you can speed this up further by also holding those results in Application state, which can simply refresh itself if it gets destroyed.
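The original context is ASP.NET/C#, but the pattern itself is stack-agnostic. A rough TypeScript sketch of the idea, where runStarSchemaQuery() is a hypothetical stand-in for the expensive (~12 s) star-schema query:

```typescript
interface FlatRow {
  key: string;
  total: number;
}

let flatCache: FlatRow[] = []; // lives for the whole process lifetime
let lastRefreshed = new Date(0);

async function runStarSchemaQuery(): Promise<FlatRow[]> {
  // Placeholder: run the 9 table scans + joins and flatten the result.
  return [];
}

// Refresh on a schedule, independent of user traffic, so no request
// ever pays the 12-second load penalty.
setInterval(async () => {
  flatCache = await runStarSchemaQuery();
  lastRefreshed = new Date();
}, 5 * 60 * 1000); // every 5 minutes; tune to taste

// Request handlers only ever read the precomputed copy.
export function getReport(): { rows: FlatRow[]; asOf: Date } {
  return { rows: flatCache, asOf: lastRefreshed };
}
```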
We are building an app in Meteor that will be participating in an education ecosystem.
There are a number of applications (e.g. a GradeBook, a Student Information System, a Reporting System...) that will all need to have their data stores kept in sync with Meteor. The datastore size will be in the hundreds of thousands of documents.
My understanding is that DDP is used to connect "clients" to a Meteor app (by subscribing to feeds when Meteor is pushing data changes and RPC to get the data in to Meteor). And a "client" is generally scoped to a user...so the size of the data set is relatively small compared to the universe of data (a teacher might have access to 100 of the 250K documents).
If I connected a Reporting System (as a "client") to Meteor with DDP, all data in the store would need to be synced. Does that mean that every time the Reporting System lost its connection to Meteor, all data would be re-sent from Meteor to the DDP client (because the Reporting System is interested in ALL the data)? And if that's the case, DDP wouldn't be the way to keep the applications in sync, right? It's meant for much smaller scoped data sets, and we should probably interact directly with Mongo to keep things in sync.
Thanks!
Mike
based on this
http://meteor.com/blog/2012/03/21/introducing-ddp
Distributed Data Protocol. DDP is a standard way to solve the biggest problem facing client-side JavaScript developers: querying a server-side database, sending the results down to the client, and then pushing changes to the client whenever anything changes in the database.
it seems clear that any new DDP client receives all the data first and then deltas as the data changes.
I would suggest that if your 'client' doesn't need reactivity / realtime updates / two-way syncing, you should pull the data directly from Mongo and avoid the overhead of 'syncing'. For a 'reporting system' this should be perfectly acceptable: grab a bunch of data, generate reports. You shouldn't care about changing data in this context, just a snapshot and reports from that snapshot.
If you do need the more real-time features, DDP is likely worth the overhead and initial setup difficulty.
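For the snapshot route, a minimal sketch of a direct read with the official Node Mongo driver; the connection string, database, and collection names are placeholders:

```typescript
import { MongoClient } from "mongodb";

// Hypothetical connection string; a Meteor dev app typically exposes
// its Mongo on a local port.
const client = new MongoClient("mongodb://localhost:27017");

// A reporting job only needs a snapshot, so read Mongo directly and
// skip DDP entirely: no initial sync, no live-update overhead.
async function buildReport(): Promise<void> {
  await client.connect();
  const grades = client.db("meteor").collection("grades");
  const snapshot = await grades.find({ term: "2024-fall" }).toArray();
  console.log(`report built from ${snapshot.length} documents`);
  await client.close();
}

buildReport();
```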
I think nate's answer is exactly right, especially considering the volume of data. If you do need to display a whole lot of data, use a paginated subscription (sketched at the end of this answer) so that you can enjoy the realtime functionality, if you decide to use it, without downloading it all at once. Keep in mind though that at the moment the data is sent down like this (for each session, so if the tab is closed and reopened it will be redone):
1 - Connect to DDP Server/Proxy (Long Polling now due to websocket issues with chrome)
2 - Establish a 'subscription'
3 - Fetch all data relevant to subscription (initial download)
4 - Subscription is complete, now the client will 'listen' for changes
5 - Any updates (remove/update/insert, etc) are sent down to the client
There really isn't a sync system at this point where old data is kept offline (in localStorage or IndexedDB or anything) so that step 3 can be avoided and only the changes from that point onward are sent.
With this in mind, if there is a connectivity interruption (e.g. losing connectivity for a short period of time), Meteor will lose its connection to the DDP wire, and when it reconnects it downloads everything again as if from scratch.
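As for the paginated subscription mentioned above, a minimal sketch of what it could look like inside a Meteor app; the "jobs" collection, the publication name, and the page size of 20 are all hypothetical:

```typescript
import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

const Jobs = new Mongo.Collection("jobs"); // hypothetical collection

if (Meteor.isServer) {
  // Publish one page (20 docs) at a time instead of the whole collection.
  Meteor.publish("jobsPage", function (page: number) {
    return Jobs.find({}, { limit: 20, skip: page * 20, sort: { createdAt: -1 } });
  });
}

if (Meteor.isClient) {
  // Only the current page's ~20 documents cross the wire; flipping pages
  // swaps the subscription rather than re-downloading everything.
  const page = 0;
  Meteor.subscribe("jobsPage", page);
}
```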
I have seen several websites that show you a real time update of what's going on in the database. An example could be
A stock ticker website that shows stock prices in real time
Showing data like "What other users are searching for currently.."
I'd assume this would involve some kind of polling mechanism that queries the database every few seconds and renders it on a web page. But the thought scares me when I think about it from the performance standpoint.
In an application I am working on, I need to display the real time status of an operation that a user has submitted. Users wait for the process to be completed. As and when an operation is completed, the status is updated by another process (could be a windows service). Should I query the database every second to get the updated status?
It's not necessarily done in the db. As you suggested, that's expensive. Although the db might be the backing store, a more efficient mechanism usually accompanies the polling operation, such as keeping the real-time status in memory in addition to finally persisting it to the db. You can poll memory much more efficiently than running SELECT status FROM Table every second.
Also, as I mentioned in a comment, in some circumstances you can get a lot of mileage out of forging the appearance of status updates through animations and such, employing estimation, and checking the data source less often.
Edit
(an optimization to use less db resources for real time)
Instead of polling the database per user to check job status every X seconds, slightly alter the behaviour: each time a job is added to the database, read the database once to put metadata about all jobs in the cache. So, for example, the memory cache will reflect 50 users' jobs (1 job each): [user49 ... user3, user2, user1, userCurrent]. (Maybe I should have written it as [job49 ... job2, job1, job_current], but it's the same idea.)
Then individual users' web pages poll that cache, which is always kept current. In this example the db was read just 50 times into the cache (once per job submission). If those 50 users wait an average of 1 minute for job processing and poll for status every second, then the user base polls the cache a total of 50 users x 60 secs = 3,000 times.
That's 50 database reads instead of 3,000 over a period of 50 min (avg. one per min). The cache is always fresh and yet handles the load; it's much less scary than hitting the db every second for each user. You can store other stats and info in the cache to help out with estimation and such. As long as a fresh cache provides more efficiency, it's a viable alternative to massive db hits.
Note: by cache I mean a global store like Application or a 3rd-party solution, not the ASP.NET page cache, which goes out of scope quickly. Caching via ASP.NET's page mechanisms might not suit your situation.
Another note: the db will know when another job record is appended, no matter from where, so a trigger can initiate the cache update.
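A bare-bones sketch of that cache pattern, independent of the ASP.NET specifics; the names and the readStatusFromDb helper are hypothetical:

```typescript
type JobStatus = "queued" | "running" | "done";

// Hypothetical: a single-row read of one job's status from the db.
declare function readStatusFromDb(jobId: string): Promise<JobStatus>;

// Global in-memory cache: one entry per submitted job.
const jobCache = new Map<string, JobStatus>();

// Called once per job change (e.g. from the submit handler or a db
// trigger): a single db read refreshes the cache entry.
export async function onJobChanged(jobId: string): Promise<void> {
  jobCache.set(jobId, await readStatusFromDb(jobId));
}

// Each user's page polls this every second; it never touches the db.
export function pollStatus(jobId: string): JobStatus | undefined {
  return jobCache.get(jobId);
}
```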
Despite a good database solution, so many users polling frequently is likely to create problems with web server connections and you might need a different solution at that level, depending on traffic.
Maybe keep a cache and work against it so you don't hit the database each time the data is modified, and flush to the database every few seconds or minutes or whatever you like.
The problem touches many layers of a web application.
On the client, you either use an iframe whose content auto-refreshes every n seconds using the meta refresh tag (HTML), or JavaScript which is triggered by a timer and updates a named div (AJAX).
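The AJAX variant is only a few lines. A minimal sketch, assuming a hypothetical /api/status endpoint that returns the current status as plain text and a div with id "status":

```typescript
const statusDiv = document.getElementById("status");

// Fetch the latest status and update the div.
async function refreshStatus(): Promise<void> {
  const res = await fetch("/api/status");
  if (res.ok && statusDiv) {
    statusDiv.textContent = await res.text();
  }
}

setInterval(refreshStatus, 5000); // poll every 5 seconds
```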
On the server, you have at least two places to cache your data:
One is in the Application object, where you keep a timestamp of the last update, and refresh the cached data as your refresh interval elapses.
If you want to present data from a database, keep aggregated values or cache relevant data for faster retrieval.