Bucket4j API does not provide last consumption time - ConcurrentHashMap

I am using Bucket4j to do some rate limiting in my project.
I have 1M users, and basically one bucket per user.
I keep the buckets in a ConcurrentHashMap<String, Bucket>.
Not all users are connected at the same time, so I need to remove entries from the map that have not been used recently.
I need to periodically check the map and delete the entries whose bucket has not been consumed from in, say, the last 5 minutes.
How can I do it? I know I can build my own data structure, but I was really expecting that Bucket would provide something like the last time a token was consumed.
Is it possible?

Bucket4j does not store the last consumption time.
I would recommend not managing collections of buckets yourself, because CaffeineProxyManager covers your case perfectly; see the sources: https://github.com/bucket4j/bucket4j/blob/8.1/bucket4j-caffeine/src/main/java/io/github/bucket4j/caffeine/CaffeineProxyManager.java
It uses Caffeine to manage the collection of buckets and performs accurate bucket-lifetime calculation. Feel free to ask more clarifying questions in the GitHub discussions: https://github.com/bucket4j/bucket4j/discussions
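
For illustration, a minimal sketch of that approach (assuming Bucket4j 8.1 with the bucket4j-caffeine module on the classpath; the 100-tokens-per-minute limit, cache size, and class name are made up for the example):

import java.time.Duration;

import com.github.benmanes.caffeine.cache.Caffeine;
import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.BucketConfiguration;
import io.github.bucket4j.caffeine.CaffeineProxyManager;

public class PerUserRateLimiter {

    // Caffeine holds the per-user bucket state; CaffeineProxyManager expires an
    // entry once its bucket is fully refilled and has then sat unused for the
    // given duration, so idle users disappear without any manual sweep.
    private final CaffeineProxyManager<String> buckets = new CaffeineProxyManager<>(
            Caffeine.newBuilder().maximumSize(1_000_000),
            Duration.ofMinutes(5));

    // Example limit (made up): 100 tokens per minute per user.
    private final BucketConfiguration configuration = BucketConfiguration.builder()
            .addLimit(Bandwidth.simple(100, Duration.ofMinutes(1)))
            .build();

    public boolean tryConsume(String userId) {
        Bucket bucket = buckets.builder().build(userId, configuration);
        return bucket.tryConsume(1);
    }
}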

Related

How to avoid Firestore document write limit when implementing an Aggregate Query?

I need to keep track of the number of photos I have in a Photos collection. So I want to implement an Aggregate Query as detailed in the linked article.
My plan is to have a Cloud Function that runs whenever a Photo document is created or deleted, and then increment or decrement the aggregate counter as needed.
This will work, but I worry about running into the 1 write/document/second limit. Say that a user adds 10 images in a single import action. That is 10 executions of the Cloud Function in more-or-less the same time, and thus 10 writes to the Aggregate Query document more-or-less at the same time.
Looking around I have seen several mentions (like here) that the 1 write/doc/sec limit is for sustained periods of constant load, not short bursts. That sounds reassuring, but it isn't really reassuring enough to convince an employer that your choice of DB is safe if all you have to go on is that 'some guy said it was OK on Google Groups'. Are there any official sources stating that short write bursts are OK, and if so, how is a 'short burst' defined?
Or are there other ways to maintain an Aggregate Query result document without also subjecting all the aggregated documents to a very restrictive 1 write / second limitation across all the aggregated documents?
If you think that you'll see a sustained write rate of more than once per second, consider dividing the aggregation up in shards. In this scenario you have N aggregation docs, and each client/function picks one at random to write to. Then when a client needs the aggregate, it reads all these subdocuments and adds them up client-side. This approach is quite well explained in the Firebase documentation on distributed counters, and is also the approach used in the distributed counter Firebase Extension.
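As a hedged sketch of that pattern with the Admin SDK inside a Cloud Function (NUM_SHARDS and the collection path are made up; the pattern follows the distributed-counters documentation):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

const NUM_SHARDS = 10; // tune to the expected peak write rate

// Each create/delete event increments one randomly chosen shard, so no
// single document sees more than roughly 1/NUM_SHARDS of the writes.
async function adjustPhotoCount(delta) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS);
  await db.collection('counters/photos/shards').doc(String(shardId))
    .set({ count: admin.firestore.FieldValue.increment(delta) }, { merge: true });
}

// Reading the aggregate means reading and summing every shard.
async function getPhotoCount() {
  const snap = await db.collection('counters/photos/shards').get();
  return snap.docs.reduce((sum, d) => sum + (d.data().count || 0), 0);
}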

Unity + Firebase: is it possible to append data to a key's value, or do I have to retrieve the key's data every time?

I'm a bit worried that I will reach the free data limits of Firebase in a student project.
Basically my question is:
Is it possible to append to the end of the string, instead of retrieving the key and value, appending, and uploading it again?
What I want to achieve:
I have to create statistics of user right/wrong answers for particular questions.
I want to have a key-value pair:
answers: 1r/5w/3r
where each number is the number of guesses a user made and r/w means right/wrong. Whenever a guessing session ends I want to append /numberOfGuesses+RightOrWrongAnswer at the end.
I'm using Unity 2018.
Thank you in advance for all the help!
I don't know how your game is architected or how many people are playing, but I'd be surprised if you hit your free limit on a student project (you can store 1GB and download 10GB). That string is 8 characters; let's assume the worst case: as a UTF-32 string, that would be 32 bytes of data, so you'd have to pull it down about 312 million times to hit the cap (there'll be some overhead, but I can't imagine it being hugely impactful). If you're afraid of being charged, you can opt not to have a credit card on file, to be doubly sure you stay on a student budget.
If you want to reduce the amount of reading/writing though, I might suggest that instead of:
key: <value_string> (so, instead of session_id: "1r/5w/3r")
you structure more like:
key:
- wrong: 5
- right: 3
So have two values nested under your key: one incrementing integer for all the wrong answers, and one incrementing integer for all the right answers.
The mechanism to "append" would be a transaction, and you should use these whether you're mutating a string or counter. Firebase tries to be smart with data usage and offline caching, but you don't get much more control other than that.
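A rough sketch of such a transaction (JavaScript Web SDK shown for brevity; the Unity SDK's RunTransaction works the same way, and the path is made up):

// Atomically increment the "wrong" counter for one question.
var wrongRef = firebase.database().ref('answers/question_42/wrong');
wrongRef.transaction(function (current) {
  return (current || 0) + 1; // current is null the first time
});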
If order really matters, you might want to get cleverer. You'll generally want to work with the abstractions Realtime Database gives you though to maximize any inherent optimizations (it likes to think in terms of JSON documents, so think about your data layout similarly). This may not be as data optimal, but you may want to consider instead using a ledger of some kind (perhaps using ServerValue.Timestamp to record a single right or wrong answer, and having a cloud function listening to sum up the results in the background after a game - this would be especially useful if you plan on having a lot of users trying to write the same key at the same time).
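The ledger idea could look something like this (again JavaScript for brevity; the path is made up):

// Append one row per answer; a Cloud Function can sum these in the background.
firebase.database().ref('answerLog/question_42').push({
  correct: true,
  at: firebase.database.ServerValue.TIMESTAMP, // resolved server-side
});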

Complicated data structuring in Firebase/Firestore

I need an optimal way to store a lot of individual fields in Firestore. Here is the problem:
I get JSON data from some API. It contains a list of users. I need to tell whether those users are active, i.e. have been online in the past n days.
I cannot query each user in the list against Firestore, because there could be hundreds of thousands of users in that list, and therefore hundreds of thousands of queries and reads, which is way too expensive.
As far as I know, there is no way in Firestore to use a list as a map for querying, so that's not an option.
What I initially did was have a cloud function go through and find all the active users maybe once every hour, and place them in firebase realtime database in the structure:
activeUsers {
  uid1: true
  uid2: true
  uid3: true
  etc...
}
and every time I need to check which users are active, I get all fields under activeUsers (which is constrained to a maximum of 100,000 fields, approx 3-5 MB).
Now I was going to use that as my final mechanism, but I just realised that the Firebase Realtime Database charges for the amount of bandwidth used, not the number of reads. Therefore it could get very expensive doing this over and over whenever a user makes this request. And I cannot query every single result from the Realtime Database either: while it does not charge per read (I think), it would be very slow to carry out hundreds of thousands of queries.
Now I have decided to use Cloud Firestore as my final hope, since it charges primarily for the number of reads and writes as opposed to data downloaded and uploaded. I am going to use Cloud Functions again to check for active users every hour, and I'm going to try to figure out the best way to store that data within a few documents. I was thinking 10,000 fields per document with all the active users; then when a user needs to get the active users, they fetch all the documents (about 10 if there are 100,000 total active users) and filter the active users client-side.
So I really have two questions. First, if I do it this way, what is the best way to store that data in Firestore; is it the way I suggested? And second, is there an all-around better way of performing this check of active users against the list returned from the API? Have I got it all wrong?
You could use Firebase Storage to store all the users in a text file, then download that text file every time?
Well, this is three years old, but I'll answer here.
What you have done is not efficient and not a good approach. What I would do is as follows:
Make a separate collection for all active users,
and store each active user's unique field, such as their ID, there.
Then query that collection. Update that collection when needed.
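A minimal sketch of that idea with the Admin SDK (collection and field names are made up):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Hourly job: one document per active user.
async function markActive(uid) {
  await db.collection('activeUsers').doc(uid).set({
    lastSeen: admin.firestore.FieldValue.serverTimestamp(),
  });
}

// Fetch the whole active set in one query; select() with no arguments
// returns only document references, so no field data is transferred.
async function getActiveUserIds() {
  const snap = await db.collection('activeUsers').select().get();
  return snap.docs.map((d) => d.id);
}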

Firebase/NoSQL Database: Update stats realtime or compute them?

Is it better to store a counter and update it transactionally in realtime every time it changes, or to compute it from each possible data source?
For example:
I am building an app that tracks "checkins", similar to FourSquare.
When the user "checks in" to a location I need to track the stats for the individual AND the total for the location over all users.
(This is a simplified version of the app for brevity purposes, the real app tracks much more information).
For example Joe checks into Coffee Place:
Joe/CoffeePlace/+1
Now Fred, Tom and Lisa also check into the Coffee Place:
Fred/CoffeePlace/+1
Tom/CoffeePlace/+1
Lisa/CoffeePlace/+1
The above is necessary to track for the individuals but the best practice for tracking total number of check-ins at Coffee Place is unclear to me.
Which is the right approach?
1. Gather data from each user node and then compute all check-ins -> display total check-ins.
2. Create a file for Coffee Place that updates at each check-in, like:
CoffeePlace/+1+1+1+1
I would imagine that for very large datasets the first approach would be time consuming (not because of the computation, but because of gathering the data from each node). What is best practice here?
In Firebase (and most NoSQL databases) you often store the data for the way that your application consumes it.
So if you are trying to track the number of check-ins at a place, then you should at the very least track the check-ins for each place.
PlaceCheckins
  CoffeePlace
    Fred: true
    Tom: true
    Lisa: true
If you also want to show the places where a specific user has checked in, you'll keep the same data per user (as you already had).
UserCheckins
  Fred:
    CoffeePlace: true
  Tom:
    CoffeePlace: true
  Lisa:
    CoffeePlace: true
This type of data duplication is very normal in Firebase (and again: NoSQL databases in general). Disk space is cheap, the user's time is not. By duplicating the data, you ensure that you can get exactly what you need for a specific screen in your app.
To keep both lists in sync, you'd use multi-location updates or transactions (depending a bit on how you structure the data). A checkin by Jenny into the coffee place could be coded in JavaScript as:
var updates = {};
updates['/PlaceCheckins/CoffeePlace/Jenny'] = true;
updates['/UserCheckins/Jenny/CoffeePlace'] = true;
ref.update(updates);
If you also want to display the count of check-ins (for a specific user of a specific place), you either download the check-ins and tally client-side or you keep a total in the database. If you take the latter approach, have a look at my answer on Is the way the Firebase database quickstart handles counts secure?
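Extending the snippet above, the running total can even ride along in the same fan-out write. This assumes an SDK recent enough to have ServerValue.increment, and the /PlaceCounts path is made up:

var updates = {};
updates['/PlaceCheckins/CoffeePlace/Jenny'] = true;
updates['/UserCheckins/Jenny/CoffeePlace'] = true;
// Server-side atomic increment: no read or transaction required.
updates['/PlaceCounts/CoffeePlace'] = firebase.database.ServerValue.increment(1);
ref.update(updates);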
Finally: read this article for a great (non-Firebase specific) introduction to NoSQL data modeling.

How to mitigate long startup times in Firebase workers when the dataset gets large

Firebase has an interesting feature/nuisance where when you listen on a data ref, you get all the data that was ever added to that ref. So, for example, when you listen on 'child_added', you get a replay of all the children that were added to that ref from the beginning of time. We are writing a commenting system with a dataset that looks something like this:
/comments
/sites
/sites/articles
/users
Sites have many articles, articles have many comments, and users have many comments.
We want to be able to track all the comments a user makes, so we feel it is wise to put comments in a separate ref rather than partition them by the articles they belong to. We have a backend listener that needs to do things on new comments as they arrive (increment their child counts, adjust a user's stats etc.). My concern is that, after a while, it will take this listener a long time to start up if it has to process a replay of every comment ever made.
I thought about possibly storing comments only in articles and storing references to each comment's siteId/articleId/commentId in the user table so we could still find all the comments for a given user, but this complicates the backend, as it would then probably need to have a separate listener for each site or even each article, which could make it difficult to manage so many listeners.
Imagine that one of these articles is on a very high-traffic site with tens of thousands of articles and thousands of comments per article. Is the scaling answer to somehow keep track of the traffic level of every site and partition sites so that they are assigned to different worker processes? And what about the question of startup time and how long it takes to replay all the data every time we load up our workers?
Adding on to Frank's answer, here are a couple other possibilities.
Use a queue strategy
Since the workers are really expecting to process one-time events, give them one-time events which they can pull from a queue and delete after they finish processing. This resolves the multiple-worker scenario elegantly and ensures nothing is ever missed because a server was offline.
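A toy version of that strategy on the Realtime Database might look like this (firebase-queue is the ready-made option; the path, WORKER_ID, and handleTask are all made up):

var tasksRef = firebase.database().ref('queue/tasks');

tasksRef.limitToFirst(10).on('child_added', function (snap) {
  // Claim the task atomically so concurrent workers skip it.
  snap.ref.child('owner').transaction(function (owner) {
    if (owner !== null) return; // already claimed: abort the transaction
    return WORKER_ID;
  }, function (error, committed) {
    if (error || !committed) return;
    Promise.resolve(handleTask(snap.val())).then(function () {
      return snap.ref.remove(); // delete it so the event is never replayed
    });
  });
});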
Utilize a timestamp to reduce backlog
A simple strategy for avoiding backlog during reboot/startup of the workers is to add a timestamp to all of the events and then do something like the following:
var startTime = Date.now() - 3600 * 1000; // an hour ago (Date.now() is in milliseconds)
pathRef.orderByChild('timestamp').startAt(startTime);
Keep track of the last id processed
This only works well with push ids, since formats that do not sort naturally by key will likely become out of order at some point in the future.
When processing records, have your worker keep track of the last record it processed by writing that value into Firebase. Then you can use orderByKey().startAt( lastKeyProcessed ) to avoid the backlog. Annoyingly, you then have to discard the first key, since startAt is inclusive. However, this is an efficient query, does not cost data storage for an index, and is quick to implement.
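Something along these lines (sketch; lastKeyProcessed is whatever the worker last stored, and processRecord is a made-up handler):

var stateRef = firebase.database().ref('workers/lastKeyProcessed'); // made-up path
pathRef.orderByKey().startAt(lastKeyProcessed).on('child_added', function (snap) {
  if (snap.key === lastKeyProcessed) return; // startAt is inclusive; discard the first key
  processRecord(snap.val());
  stateRef.set(snap.key);
});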
If you only need to process new comments once, you can put them in a separate list, e.g. newComments vs. comments (the ones that have been processed). Then when you're done processing, move them from newComments to comments.
Alternatively you can keep all comments in a single list like you have today and add a field (e.g. "isNew") to it that you set to true initially. Then you can filter with orderByChild('isNew').equalTo(true) and update({ isNew: false }) once you're done with processing.
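For example (sketch; commentsRef and processComment are made up):

commentsRef.orderByChild('isNew').equalTo(true).on('child_added', function (snap) {
  processComment(snap.val());
  snap.ref.update({ isNew: false }); // mark as processed
});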
