Firebase - preventing malicious user from pulling entire database

I was watching the Firebase doc videos and noticed that in this video: https://www.youtube.com/watch?v=9sOT5VOflvQ&list=PLl-K7zZEsYLn8h1NyU_OV6dX8mBhH2s_L&index=4
at 6:39, Doug mentions that it is possible to limit the number of documents returned by one query, by doing something like:
allow list: if request.query.limit <= 20;
However, he mentions that, although this is beneficial because it prevents you from accidentally executing a very costly set of reads, it still won't prevent malicious users from reading everything in your database by making multiple requests and using pagination to sift through your database. I could envision some sort of infinite while loop in JavaScript that makes this very problematic and costly.
The only way that I could think to solve this problem is by somehow using timestamps perhaps, and saving some information associated with each user which informs the database of when they last made a request. Would it be possible to do this and then access those timestamps in the security rules? Something along the lines of (where the second condition is kind of pseudo-code):
allow list: if request.query.limit <= 20 && get(/databases/$(database)/documents/users/$(request.auth.uid)).data.last-time <= 100;
This seems to me the most feasible way but if anyone else has thoughts on this, they would be much appreciated!

The problem is that you can't update last-time when this query happens. And since you can't update last-time when they read, there is no way to restrict the number of reads a user can perform through this mechanism.
Because of this limitation it is possible to implement a write rate limit in security rules, but not a read rate limit.

Related

Update Firestore with user last active date

I'm looking at writing the date a user was last active to my firestore users table. This information is available in the metadata of the user - lastRefreshTime.
https://firebase.google.com/docs/reference/admin/node/admin.auth.UserMetadata
Has anyone already done this before?
I am looking for an efficient way to do this with minimal writes.
I could run a daily process that checks all users and the dates and updates if changed but wondering if there is a better more efficient way.
How about having each client write it themselves when they go online?
It won't be guaranteed (as malicious users may call the API themselves without writing the value), but it saves you from running an administrative process over a data set whose growth is hard to predict.
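A minimal sketch of that client-side approach, keeping writes to a minimum. It assumes a hypothetical `writeLastActive` callback that performs the actual Firestore write, and a `lastWrittenMs` value that the client remembers between sessions. The 24-hour interval is also an assumption, not something stated above.

```javascript
// Sketch only: `writeLastActive` and the 24-hour interval are hypothetical.
const MIN_WRITE_INTERVAL_MS = 24 * 60 * 60 * 1000;

// Returns the timestamp of the most recent recorded write.
function touchLastActive(lastWrittenMs, nowMs, writeLastActive,
                         minIntervalMs = MIN_WRITE_INTERVAL_MS) {
  // Skip the write if activity was already recorded recently.
  if (lastWrittenMs !== null && nowMs - lastWrittenMs < minIntervalMs) {
    return lastWrittenMs;
  }
  writeLastActive(nowMs); // e.g. update the user's document with the new date
  return nowMs;
}
```

Called once when the client comes online, this produces at most one write per user per interval.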

Firestore rules, atomic writes, and write limits

I have a two-part question regarding Firestore rule evaluations. The parts are related which is why the single question here...
Part I - Atomic write access
Let's say that I have a written rule such as
allow write: if resource.data.claimedBy == null && request.resource.data.claimedBy == request.auth.uid;
The idea is that any user can make a claim to this resource. But, what if this resource is made available to 1000 users all at once and they all jump to make a claim and make the .update() call all at the same time?
Will this be a first-wins scenario, where the field is set for the first user (the winner) and everyone else has their write rejected because the rule fails once the winner's value is present? Or is there any risk whatsoever that a race condition could result, so that for a moment the value was one thing and then became another?
I feel like the rules would prohibit a race condition, but I don't know for certain.
Part II - Write limits
Ok, so this part builds off of the first. Firestore has a write limit of 1 write/second in general for a single document. Let's assume Part I works how I hope and the other 999 users will get a write rejection. Do these rejections count towards the write limit of 1 write/sec because a write was initiated, or do they not count because the rules prohibit an actual write?
Obviously, having all these claim attempts at once count as 1000 writes would be bad for the 1 write/second limit.
I am assuming here, but I believe it would not count toward that limit because my understanding is that the limit is imposed by the nature of the underlying storage mechanics, and the rules prevent going to that layer upon rejection. But also again, I don't know for certain.
Part III - Bonus part
Do writes that are rejected by a rule still count as a "write" as far as billing is concerned? I know a query for a document that does not exist (no documents actually read) still counts as a single read, so I am wondering if writes with regards to rules which prohibit the underlying write works in a similar way and incurs a charge.
Thank you so much!
You should use a transaction to avoid and prevent concurrent writes. The transaction can check if the document was previously claimed, then abort if it was.
If a write was denied by a rule, it doesn't count toward any write limits or billing, as no data in the document was actually changed.
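A rough sketch of such a transaction, assuming `db` exposes `runTransaction(fn)` and the transaction object has `get(ref)` and `update(ref, data)`, in the style of the Firebase Admin SDK for Node.js. The function name `claimResource` is made up for illustration, and the document is assumed to already exist.

```javascript
// Sketch only: `claimResource` is a hypothetical helper. `db`, `ref`, and `uid`
// are supplied by the caller; the document behind `ref` is assumed to exist.
async function claimResource(db, ref, uid) {
  return db.runTransaction(async (tx) => {
    const snap = await tx.get(ref);
    const data = snap.data() || {};
    // Abort without writing if someone else already holds the claim.
    if (data.claimedBy != null) {
      return false;
    }
    tx.update(ref, { claimedBy: uid });
    return true;
  });
}
```

The promise resolves to `true` for the caller whose claim was written and `false` for callers who found the document already claimed.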

Firestore Rules: Allow or limit request to only once every 24h?

Is there a native or efficient way to restrict the user to load a document from a collection only once every 24h?
//Daily Tasks
//User should have only read rights
//User should only be able to read one document every 24h
match /tasks/{documents} {
allow read: if isSignedIn() && request.query.elapsedHours > 24;
}
I was thinking that I might be able to do this using a timestamp in the user document. But this would consume unnecessary writing resources to make a write to the user document with every request for a task document. So before I do it this way, I wanted to find out if anyone had a better approach.
Any ideas? Thanks a lot!
There is no native solution, because security rules can't write back into the database to make a record of the query.
You could instead force access through a backend (such as Cloud Functions) that also records the time of access of the particular authenticated user, and compare against that every time. Note that it will incur an extra document read every call.
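As a sketch of that backend check: the `store` object and its `loadLastAccessMs`, `saveLastAccessMs`, and `loadTask` methods are hypothetical stand-ins for the Firestore reads and writes a Cloud Function would make. It is written synchronously for brevity; real calls would be asynchronous.

```javascript
// Sketch only: `store` and its methods are hypothetical helpers.
const READ_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns the task if the user may read now, otherwise null.
function readDailyTask(uid, nowMs, store) {
  const last = store.loadLastAccessMs(uid); // the extra document read per call
  if (last !== undefined && nowMs - last < READ_WINDOW_MS) {
    return null; // already served within the last 24 hours
  }
  store.saveLastAccessMs(uid, nowMs); // record this access for the next check
  return store.loadTask();
}
```

For this to be effective, the security rules would also have to deny direct client reads of the tasks collection, so that the backend is the only path to the data.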
There is no real "efficient" way to do so, nor a native one at the time of writing. And finding an actual solution to this "problem" won't be easy without further extensions.
There are, however, workarounds such as Cloud Functions for Firebase, which open up new options for getting around various limitations of Firestore.
A native solution would be keeping track somewhere in the database when each user last accessed the document. This would, as you mentioned, create unnecessary reads and writes just for tracking.
I would prefer a caching mechanism on the client and allow the user to execute multiple reads. Don't forget that if the user clears the cache on the device, he has to query the document(s) again and won't get any data at all if you restrict him completely that way.
I think the best approach, given the high number of reads you get, is to cache on the client side and set only a limit control (see the request.query.limit value). This would look something like the following:
match /tasks/{documents} {
  // Allow only signed in users to query multiple documents
  // with a limit explicitly set to less than or equal to 20 documents per read
  allow list: if isSignedIn() && request.query.limit <= 20;
  // Allow single document read to signed in users
  allow get: if isSignedIn();
}
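The client-side caching half could be sketched as below. `fetchTasks` is a hypothetical function that runs the real query, and the time-to-live is chosen by the caller. This is an in-memory illustration only, not a Firestore SDK feature.

```javascript
// Sketch only: `fetchTasks` is a hypothetical function that runs the real query.
// `now` can be overridden, which makes the time-based logic easy to exercise.
function createCachedLoader(fetchTasks, ttlMs, now = () => Date.now()) {
  let cached = null;
  let fetchedAt = 0;
  return function load() {
    if (cached !== null && now() - fetchedAt < ttlMs) {
      return cached; // served from memory, no read against the database
    }
    cached = fetchTasks();
    fetchedAt = now();
    return cached;
  };
}
```

Repeated calls to the returned `load` function within the TTL reuse the first result, so the user can open the task list many times while only the first call costs reads.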

Firestore pricing

There are several questions asked about this topic but I can't find one that answers my question. As described here, there is no clear explanation as to whether the minimum charges are applicable to query.get() or real-time listeners as well. Quoted:
There is a minimum charge of one document read for each query that you perform, even if the query returns no results.
The reason I am asking, even though the answer may seem obvious to some, is the phrase *for each query that you perform* in that statement, which could refer only to a one-time trigger, e.g. with the get() method.
Scenario: If 10 users are listening to changes in a collection with queries, i.e. query.addSnapshotListener(), and a change occurs in one document that matches the query filter of only two users, are the other eight charged one read too?
Database used: Firestore
In this scenario I would say no, the other eight would not be counted as reads because the documents they are listening to have not been updated or have not been added/removed from that collection based on their filters (query params). The reads aren't based on changes to the collection but rather changes to the stream of documents you are specifically listening to. Because that 1 document change was not part of the documents that the other 8 users were listening to then there is no new read for them. However, if that 1 document change led to that document now matching the query filters of those other 8, then yes there would be 8 new reads for those users. Hope that makes sense.
It's also worth noting that having offline persistence enabled via the SDK, along with Firestore's caching, helps limit reads, as does using a singleton Observable that multiple instances in your app subscribe to, as opposed to opening multiple streams of the same query throughout your app. This doesn't apply to the question directly, but it is in the same vein and worth noting.
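To illustrate the shared-stream idea without any particular Observable library, here is a small sketch. `openStream(onData)` is a hypothetical function that starts one underlying listener and returns a function that stops it.

```javascript
// Sketch only: `openStream` is a hypothetical function that starts the one
// real listener and returns an unsubscribe function.
function createSharedListener(openStream) {
  const subscribers = new Set();
  let closeStream = null;
  return function subscribe(callback) {
    subscribers.add(callback);
    if (closeStream === null) {
      // First subscriber: open the single underlying stream and fan out.
      closeStream = openStream((data) => subscribers.forEach((cb) => cb(data)));
    }
    return function unsubscribe() {
      subscribers.delete(callback);
      if (subscribers.size === 0 && closeStream !== null) {
        closeStream(); // last subscriber gone: stop listening
        closeStream = null;
      }
    };
  };
}
```

However many parts of the app subscribe, only one stream is open at a time. Note that a late subscriber in this sketch only receives updates emitted after it subscribed.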

What would be the best way to store the questions and responses for a survey where I need to keep the traffic on the database to a minimum?

Background
I am writing a survey that is going to a large audience. It contains 15 questions and there are five possible answers to each question along with potential comments.
The user can cycle through all 15 questions answering them in any order and is allowed to leave the survey at any point and return to answer the remaining questions.
Once an answer has been attempted on all 15 questions a submit button appears which allows them to submit the questions as final answers. Until that stage all answers are required to be retrievable whenever the user loads the survey page up.
The requirement is that the user only sees one question on a page and 'Previous' and 'Next' buttons allow the user to scroll through the questions.
Requirement
I could request the question each time the user clicks a button and save the current response and so on but that would be a large number of hits to a database that is already heavily used. I don't have the time to procure a new server etc so I have to make do with what I have. Is there any way I can cache the questions on the user machine and/or responses? Obviously I need the response data to be secure and only known to the user so I feel a little bit stuck as for the best way of doing this. Any pointers?
I am prepared to offer a bounty of 100 points on this question if it means I get some good quality discussion and feedback going.
Unless there's a reason for using a database, you could always store the results in flat files on the server itself. It doesn't sound like the data you're storing is relational in any way. Worst comes to worst, you could always insert them back into a relational db as a batch job every night.
Another option would be the application cache. However, if your web server suddenly crashes on you, you risk losing information from there.
You could also store the values in the user's cookies.
Based on my personal experience (serving thousands of short survey pages per second) I suspect your fears are unfounded. Among other reasons, the DBMS will cache such small amounts of data far more efficiently than you can.
I've tested this, loading the questions and answers into an Application-scope collection at start up, and serving them from memory after that - often it made no difference at all.
Your alternative is to send everything at once to the browser, and write it as a javascript application, storing the data in (encrypted) cookies and only hitting the database when the whole thing is done. This is tedious but not difficult.
You have three requirements that need to be balanced:
users must be able to return to their survey at any time
answers entered by users must be saved with the least possible chance of data loss
need to minimize database hits
Any solution that involves caching answers in a volatile place (cookies, session, etc) will increase the risk of data loss. The final solution depends on how you rank the three requirements in importance. If the db issue is at the top, then you will either need to risk data loss, or spend a lot of extra time coding a solution using some temporary storage scheme (like Kevin's flat file idea).
A couple of folks suggested that you may be optimizing prematurely. I suggest you consider that idea first - maybe this whole thing is moot.
However, assuming that your db situation is a real problem, I think your best balance of requirements will be a system that saves answer to the db immediately (to prevent data loss) but carefully manages when you actually have to hit the db.
When the app starts up (or when the first user requests the survey) load the survey and its questions into application cache. If any of the questions have a pick list of possible answers, load these also. You will only have to hit the db once during the application lifetime (or your cache duration) to load survey data.
When a user starts their survey, run a single query to load any existing answers (in case they are a returning user) into an object in session - could be as simple as a List<string>. (If you can somehow identify a new user without having to hit the db, then you can skip this step for new users.)
Use the session answer object along with the survey question object in app cache to populate each page without hitting the db again.
When the user submits an answer, compare it to the session answer object to see if it has changed (she may be just clicking 'next' on a page with a previously entered answer). If the answer is new, or has changed, then save it to the db and to the session answer object.
When the user leaves the survey, you don't need to do anything - everything is saved already.
With this scheme, you hit the db once to load the survey, once for each user when they start (or restart) the survey, and once for each new or modified answer. Probably not as much of a reduction as you were hoping for, but it gives you the best data protection.
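The "save only when changed" step of this scheme might look roughly like the following. `saveToDb` is a hypothetical function that persists a single answer, and `sessionAnswers` is assumed to be a plain object keyed by question id that was loaded when the user started.

```javascript
// Sketch only: `saveToDb` is a hypothetical persistence function.
// Returns true if the database was hit, false if the answer was unchanged.
function submitAnswer(sessionAnswers, questionId, answer, saveToDb) {
  if (sessionAnswers[questionId] === answer) {
    return false; // user just clicked 'Next' on an existing answer: no db hit
  }
  saveToDb(questionId, answer);        // save immediately to avoid data loss
  sessionAnswers[questionId] = answer; // keep the session copy in sync
  return true;
}
```

Paging back and forth through already answered questions then costs no database trips at all.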
If the database trips are a problem, you can cache them in the web server (or wherever your application resides) but it sounds like each answer needs to be recorded as the user goes to the next question.
If the questions and possible answers are identical for everyone, I would definitely cache them in the application layer - this can be stored in the Application object. In any case, you could certainly optimize the database calls to return the results as efficiently as possible - i.e. multiple result sets or a joined result set from a single stored proc. If you don't mind multiple copies for each session (or if there is variation), you can store it in the Session object. Storing it on the client (i.e. a cookie) is not really secure and kind of pointless from a web server-client bandwidth saving perspective.
This sounds a lot like premature optimization to me, though.
Your scenario is a perfect candidate for Predictive Fetch Pattern. I would suggest that you cache all your questions. When the user signs in use the pattern to fetch the first 5 answers (if they have given any answers) and based on their navigation (where their current question is) get the information from the Response object or from the DB.
HTH
Not sure of the languages etc you are using, but most have an application cache. I would store the questions there, and retrieve them from the database and store them when they are not in the cache (when the application recycles).
As for the answers, are the users logging in some how? Is it feasible to save answers in a cookie until all questions are answered?
Edit:
If cookies aren't reliable enough, you could store (in the application cache) a list of queries (inserts/updates) to be executed. They would not be executed until a query limit was reached or certain conditions were met (e.g. execute the query list when a user requests answers that are in the list, execute the list when the application recycles, etc).
Pretty crude, but you get the idea:
if ((function == "get question" && userQuestionIsInQueue) || function == "finish survey")
    execute(Application["querylist"]);
// continue as normal...
if (function == "submit answer")
    if (Application["querylist"] == null)
        Application["querylist"] = newAnswerQuery;
    else
        Application["querylist"] += newAnswerQuery;
You'd also need to add execute(Application["querylist"]) to the recycle event, I believe you can hook it in the global.asax
Edit 2:
I would also combine all database operations for a request into one: if you do have to execute the list and then fetch the answer for the user, do both in the same transaction and save a trip. This is common practice when optimizing.
This is a classic problem to do with maintaining state between pages in a browser-based system. I'm also assuming that we want this data to persist even if the user logs out and comes back later. Here are the options:
With a high availability server we can keep a single collection of 15 answers in memory (not session) for this user (probably not a good idea and not easily load balanced)
We denormalise the 15 answers into 1 row of a sql table
We persist the data on the client using a cookie or localStorage (IE8).
My feeling is that the first two options are probably not what you are looking for, so let's explore the last option.
You could quite simply store the answers in a cookie. There is a small chance that this could get lost, and that the user may log in from another machine, but this may be an acceptable risk. With the latest browsers that support HTML5 (including IE8, as far as I know) you get the benefit of localStorage, which is not as easily deleted as a cookie. You could fall back to cookies if this wasn't available.
Cookies can be encrypted if required.
I would suggest the HTML5 feature called DOM Storage, but since only newer browsers support it, using it could be a problem at this point.
With DOM Storage, you can store data in the user's browser. Since it can store up to 5MB per domain in Mozilla Firefox, Google Chrome, and Opera, and 10MB per storage area in Internet Explorer, you can store answers and question ids in DOM Storage.
With DOM Storage you can reduce not only database hits but server hits as well.
Working with cookies can be a hassle, and a cookie can store only about 4KB, so the easiest way now is to store key-value information in DOM Storage.
You can store key-value information per session as well as locally. When the session ends, the session-based info is wiped from the browser, but locally stored values remain even if the user closes the tab.
Example Code:
<p>
You have viewed this page
<span id="count">an untold number of</span>
time(s).
</p>
<script>
var storage = window.localStorage;
if (!storage.pageLoadCount) storage.pageLoadCount = 0;
storage.pageLoadCount = parseInt(storage.pageLoadCount, 10) + 1;
document.getElementById('count').innerHTML = storage.pageLoadCount;
</script>
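Applied to the survey, the same idea could be sketched as follows. The key name `surveyAnswers` and both function names are made up for illustration. `storage` is any object with `getItem` and `setItem`, such as `window.localStorage`.

```javascript
// Sketch only: the key name and function names are hypothetical.
const ANSWERS_KEY = "surveyAnswers";

// Read all saved answers as an object keyed by question id.
function loadAnswers(storage) {
  const raw = storage.getItem(ANSWERS_KEY);
  return raw ? JSON.parse(raw) : {};
}

// Store one answer and return the updated set of answers.
function saveAnswer(storage, questionId, answer) {
  const answers = loadAnswers(storage);
  answers[questionId] = answer;
  storage.setItem(ANSWERS_KEY, JSON.stringify(answers));
  return answers;
}
```

In a browser you would call `saveAnswer(window.localStorage, "q3", "B")` when the user moves to the next question, and `loadAnswers(window.localStorage)` when the survey page loads.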
You can learn more about DOM Storage from the links below :
https://developer.mozilla.org/en/DOM/Storage
http://en.wikipedia.org/wiki/Web_Storage
http://msdn.microsoft.com/en-us/library/cc197062%28VS.85%29.aspx
do you mean...a cookie?
