Best way to overcome Firestore compound query limitations? - firebase

My document has fields distance and duration, and I have to query for documents with less than some distance (let's say 10 km or 100 km) and less than some duration (1 h or 2 h). Obviously, when I tried to run such a query I got the "comparisons must all filter on the same field" error.
My solution was to calculate a boolean field for each of the distance and duration conditions (distanceIsLessThan10Km, distanceIsLessThan100Km, durationIsLessThan1h, durationIsLessThan2h), which lets me use equality filters on both distance and duration.
So now if the user selects a distance of less than 10 km and a duration of less than 2 h, I can run queries like:
.WhereEqualTo("distanceIsLessThan10Km", true)
.WhereEqualTo("durationIsLessThan2h", true)
My question is: is this a good solution, or is there a better way to solve this?

Turning your range conditions into equality checks is indeed a common workaround, and it is feasible when you have a limited set of ranges to check for.
Since your ranges seem to be static, there are very few downsides. In other cases you may need to update the additional fields whenever the underlying data changes, and Cloud Functions are a great tool for that.
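Because the flags are derived data, they have to be recomputed whenever distance or duration changes, which is the job the answer suggests giving to Cloud Functions. A minimal sketch of that derivation in plain Python, assuming the thresholds from the question; `bucket_flags` and `matches` are hypothetical helpers, and `matches` only imitates a chain of equality filters locally rather than calling Firestore:

```python
# Hypothetical thresholds taken from the question: 10 km / 100 km and 1 h / 2 h.
DISTANCE_BUCKETS_KM = (10, 100)
DURATION_BUCKETS_H = (1, 2)


def bucket_flags(distance_km: float, duration_h: float) -> dict:
    """Derive the boolean fields stored next to distance and duration."""
    flags = {}
    for km in DISTANCE_BUCKETS_KM:
        flags[f"distanceIsLessThan{km}Km"] = distance_km < km
    for h in DURATION_BUCKETS_H:
        flags[f"durationIsLessThan{h}h"] = duration_h < h
    return flags


def matches(doc: dict, equalities: dict) -> bool:
    """Local imitation of chained equality filters: every field must match."""
    return all(doc.get(field) == value for field, value in equalities.items())


# Usage: an 8.5 km / 1.5 h document, filtered by "< 10 km" and "< 2 h".
doc = {"distance": 8.5, "duration": 1.5, **bucket_flags(8.5, 1.5)}
print(matches(doc, {"distanceIsLessThan10Km": True, "durationIsLessThan2h": True}))
```

In a Cloud Function, the output of something like `bucket_flags` would be merged into the document on each write; adding a new bucket later means backfilling the existing documents.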

Related

Is there a workaround for the Firebase Query "NOT-IN" Limit to 10?

I saw a similar question here: Is there a workaround for the Firebase Query "IN" Limit to 10?
The problem is that with `in`, taking the union of several batched queries works, but with `not-in` I would need the intersection, so running the batches separately gives me all the documents. Does anyone know how to do this?
As @samthecodingman mentioned, it's hard to provide specific advice without examples or code, but I've had to deal with this a few times and there are a few general strategies you can take:
Restructure your data - You can use up to 100 equality operators, so one possible approach is to store your filters/tags as a map, for example:
{
  id: 1234567890,
  ...
  filters: {
    filter1: true,
    filter2: true,
    filter3: true,
  }
}
If a doc doesn't have a particular tag, you could simply omit it, or you could set it to false, depending on your use case.
Note, however, that you may need to create composite indexes if you want to combine equality operators with inequality operators (see the docs). If you have too many filters, this will get unwieldy quickly.
Query everything and cache locally - As you mentioned, fetching all the data repeatedly can get expensive. But if it doesn't change too often or it isn't critical to get the changes in real time, you can cache it locally and refresh at some interval (hourly or daily, for example).
Implement Full-Text Search - If neither of the previous options works for you, you can always implement full-text search using one of the services Firebase recommends, such as Elastic. These are typically far more efficient for use cases with a high number of tags/filters, but there is an upfront time cost for setup and potentially an ongoing monetary cost if your usage exceeds the free tiers these services offer.
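Applied to the "not-in" problem, the map layout from the first strategy lets you replace `not-in` with one equality filter per excluded tag, provided every document stores every tag explicitly as `true` or `false` (a document that omits the key will not match `== false`). A local sketch in Python; `exclusion_filters`, `get_path` and `apply_filters` are hypothetical helpers that only imitate Firestore's AND-ed equality filters on dotted map paths:

```python
def exclusion_filters(excluded_tags):
    """Turn a 'not-in' over tags into equality filters on a map field.

    Each pair stands for a filter like where("filters.<tag>", "==", False).
    """
    return [(f"filters.{tag}", False) for tag in excluded_tags]


def get_path(doc, dotted):
    """Resolve a dotted field path such as 'filters.filter1' (None if missing)."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def apply_filters(docs, filters):
    """Keep only documents that satisfy every equality filter (logical AND)."""
    return [d for d in docs if all(get_path(d, path) == v for path, v in filters)]


docs = [
    {"id": 1, "filters": {"filter1": True, "filter2": False, "filter3": True}},
    {"id": 2, "filters": {"filter1": False, "filter2": False, "filter3": True}},
    {"id": 3, "filters": {"filter1": False, "filter2": True, "filter3": False}},
]

# "not-in [filter1, filter2]" becomes two equality filters on the map.
print([d["id"] for d in apply_filters(docs, exclusion_filters(["filter1", "filter2"]))])
```

With a real SDK each pair would become a `where` clause on the dotted path, so the number of excluded tags is bounded by how many equality filters a single query may contain.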

How to return x nearest people in flutter with Firebase?

Is it possible to query a certain number of nearest locations from a Cloud Firestore database in Flutter? The geoflutterfire package only appears to allow you to query locations within a radius. Is the only solution to gradually increase the radius until an acceptable number of users is found? That sounds like a very unclean solution. Are there other packages or methods that provide this functionality?
Due to the nature of how the Geo*Fire packages work, they cannot return the X nearest results.
The common pattern:
Start with a reasonable range, client-side order the results on distance, and then return the top X.
If you got too few results, try again with a larger range.
Increasing the range is not as expensive in Firestore as you may think, as the documents for the smaller range will already be in the client-side cache after the first geoquery.
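The widen-and-retry pattern can be sketched as follows. This is plain Python, not Dart or geoflutterfire: `query_within_radius` is a hypothetical in-memory stand-in for a radius geoquery, and the starting radius and doubling factor are arbitrary assumptions. Once a radius yields at least X results, every point outside it is farther away than all of them, so the client-side top X is exact:

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def query_within_radius(points, center, radius_km):
    """Hypothetical stand-in for a geoquery: all points within radius_km."""
    return [p for p in points if haversine_km(center, p["loc"]) <= radius_km]


def nearest(points, center, k, start_radius_km=1.0, max_radius_km=20000.0):
    """Return the k nearest points by widening the radius until enough are found."""
    radius = start_radius_km
    while True:
        hits = query_within_radius(points, center, radius)
        if len(hits) >= k or radius >= max_radius_km:
            break
        radius *= 2  # too few results: try again with a larger range
    hits.sort(key=lambda p: haversine_km(center, p["loc"]))  # client-side ordering
    return hits[:k]


points = [
    {"id": "a", "loc": (0.0, 0.01)},   # about 1.1 km from (0, 0)
    {"id": "b", "loc": (0.0, 0.05)},   # about 5.6 km
    {"id": "c", "loc": (0.0, 1.0)},    # about 111 km
    {"id": "d", "loc": (0.0, 10.0)},   # about 1112 km
]
print([p["id"] for p in nearest(points, (0.0, 0.0), 2)])
```

The `max_radius_km` cap only guarantees termination when fewer than k points exist; with a real backend, each retry's documents for the smaller range would already be in the client-side cache, as the answer notes.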

What happens when the top-k query does not find enough documents to satisfy k constraint?

I am evaluating a top-k range query using NDCG. Given a spatial area and a query keyword, my top-k range query must return k documents in the given area that are textually relevant to the query keyword.
In my scenario, the range query usually finds only one document to return. But I have to compare this query to another one that can find more objects in the given area with the same keyword. This is possible because of an approach I am testing to improve object descriptions.
I can't figure out how to use NDCG to compare these two queries in this scenario. I would like to compare Query A and Query B using NDCG@5 and NDCG@10, but Query A only finds one object. Query A will get a high NDCG value precisely because it finds so few objects (probably 1, the maximum). Query B finds more objects (in my opinion, a better result) but has a lower NDCG value than Query A.
You could consider a different measure, e.g. Recall@10, if you care less about the ranking for your application.
NDCG is a measure designed for web search, where you really want to penalize a system that doesn't return the best item as the topmost result, which is why it discounts each result's gain by its rank position (by a factor of 1/log2(rank + 1)). This makes sense for navigational queries like "stackoverflow": you will look quite bad if you don't return that website first.
It sounds like you are building something a little more sophisticated, where the user cares about many results. Therefore, a more recall-oriented measure (that cares about getting multiple things right more than the ranking) may make more sense.
Regarding Query A's "lower ability to find more objects": I'd also double-check your implementation of NDCG. You always want to divide by the DCG of the ideal ranking, regardless of what actually gets returned. It sounds like Query A returns 1 correct object, while Query B returns more correct objects, but not at high ranks? Either way, Query A's DCG should be divided by the DCG of a perfect ranking, built from all 10, 20, or thousands of "correct" objects. It may be that you just don't have enough judgments, so your "perfect ranking" is too small and you aren't penalizing Query A enough.
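To make the normalization point concrete, here is a minimal NDCG@k sketch in Python, assuming linear gains (some formulations use 2^rel - 1 instead) and the usual 1/log2(rank + 1) discount. The key detail is that the ideal DCG is built from all judged documents, not only the ones a query returned; the document ids and judgments below are made up for illustration:

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i (1-based) divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, judgments, k):
    """NDCG@k where the ideal ranking comes from *all* judged documents.

    ranked_ids: ids returned by the system, best first.
    judgments:  {doc_id: relevance grade} for every judged document,
                whether or not the system returned it.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_dcg = dcg(sorted(judgments.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judgments = {"d1": 1, "d2": 1, "d3": 1, "d4": 1, "d5": 1}
query_a = ["d1"]                          # finds only one relevant object
query_b = ["x", "d1", "d2", "d3", "d4"]   # more relevant objects, lower ranks
print(ndcg_at_k(query_a, judgments, 5), ndcg_at_k(query_b, judgments, 5))
```

With five judged-relevant documents, Query A scores about 0.34 at NDCG@5 and Query B about 0.66. If `d1` were the only judgment, Query A would score 1.0, which is the "too few judgments" effect described above.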

Firebase/GeoFire - Most popular item at location

I am currently evaluating databases to serve as the backend for a mobile application.
Right now I am looking at Firebase, and so far I like it very much.
It is a requirement to be able to fetch the most popular items at a certain location (possibly, in the future, also for a certain time range, which would be an attribute of the item) from the database.
So naturally I stumbled upon GeoFire, which provides location-based queries for Firebase.
Unfortunately, as far as I understand, there is no way to order the results by any attribute other than distance (correct me if I am wrong).
So what do I do if I am not interested in the distance (I only want items within a certain radius, no matter how far from the center) but in the popularity (for simplicity, a number that represents popularity)?
IMPORTANT:
Filtering/sorting on the client side is not an option (or at least the least preferred one), as the result set could grow without bound.
The first version of the application will be for Android, so the Firebase Java client library would be used at first.
Are there possibilities to solve this or is Firebase out of the race and not the right candidate for the job?
There is no way to add an extra condition to the server-side query of Geofire.
The query model of the Firebase database allows filtering only on a single property. Geofire already performs a seemingly impossible feat of filtering on both latitude and longitude. It does this through the magic of Geohashes, which combine latitude and longitude into a single string.
Strictly speaking you could find a way to extend the magic of Geohashes to add a third number into the mix. But while possible, I doubt it's feasible for most of us.
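To illustrate the geohash idea, here is a minimal encoder sketch in Python using the standard bit-interleaving scheme (not GeoFire's own code): it alternately bisects the longitude and latitude ranges and packs every 5 bits into one base32 character. Nearby points share a prefix, which is why a range query on the single hash string can approximate a two-dimensional area (real implementations also query neighboring cells to cover edges), and also why that one ordered property has no room left for a popularity value:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Interleave longitude and latitude bisection bits into one base32 string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, even = [], 0, 0, True  # even bits refine longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# Two points a metre or so apart land in the same 5-character cell.
print(geohash_encode(57.64911, 10.40744, 5), geohash_encode(57.64912, 10.40745, 5))
```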

How to retrieve a row's position within a DynamoDB global secondary index and the total?

I'm implementing a leaderboard backed by DynamoDB and its Global Secondary Indexes, as described in the developer guide: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
But two things that are essential for a leaderboard system are your position within it and the total number of entries, so you can show "#1 of 2000" or similar.
Using the index, the rows are sorted the correct way, and I'd assume these calls would be cheap enough to make, but so far I haven't found a way to do it in the docs. I really hope I don't have to fetch the entire table every time to know where a person is positioned in it, or to count the entire table (although if a count isn't available, it could be calculated at scheduled intervals and stored outside the table).
I know DescribeTable gives you information about the entire table, but I would be applying filters to the range key, so that wouldn't suit this purpose.
I am not aware of any efficient way to get the ranking of a player. The naive way is to query starting from the player with the highest score, move downward, and keep incrementing a counter until you reach the target player. So for the user with the lowest score, you might end up scanning the whole range.
That being said, you can still get the top 100 players (the leaders) with no problem: just query starting from the player with the highest score and set the query limit to 100.
Also, for a given player, you can get the 100 players around them with similar scores. You just need two queries:
query with hashkey="" and rangekey <= their points, ScanIndexForward=false, limit 50
query with hashkey="" and rangekey >= their points, limit 50
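The three access patterns above can be sketched with an in-memory stand-in for a single partition of the index, sorted by points like a range key. The `Leaderboard` class is hypothetical and does not call DynamoDB; in a real Query, `top` corresponds to `ScanIndexForward=False` with `Limit=n`, and `around` to the two bounded range queries:

```python
import bisect


class Leaderboard:
    """Hypothetical in-memory stand-in for one index partition sorted by points."""

    def __init__(self, scores):
        # scores: {player_id: points}; kept sorted ascending like a range key
        self.entries = sorted((pts, pid) for pid, pts in scores.items())
        self.keys = [pts for pts, _ in self.entries]

    def top(self, n):
        """Highest scores first, like ScanIndexForward=False with Limit=n."""
        return [pid for _, pid in list(reversed(self.entries))[:n]]

    def naive_rank(self, player_id):
        """Walk down from the top, counting until the player is reached: O(n)."""
        for rank, (_, pid) in enumerate(reversed(self.entries), start=1):
            if pid == player_id:
                return rank
        raise KeyError(player_id)

    def around(self, points, n=50):
        """Up to n players with <= points (descending) and n with >= points."""
        hi = bisect.bisect_right(self.keys, points)
        lo = bisect.bisect_left(self.keys, points)
        below = [pid for _, pid in reversed(self.entries[max(0, hi - n):hi])]
        above = [pid for _, pid in self.entries[lo:lo + n]]
        return below, above


board = Leaderboard({"a": 10, "b": 30, "c": 20, "d": 40})
print(board.top(2), board.naive_rank("c"), board.around(20, n=2))
```

As in the answer, the player appears in both neighbor lists because both range conditions are inclusive.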
This was exactly the problem we faced when developing our app. These are the two solutions we came up with:
Query your index with ScanIndexForward=false, which gives you the top players (assuming the score/points attribute is your range key), with a limit of 1000. Then apply the linear formula y = mx + b, where x is points and y is rank: take two data points, typically the first and the last value, to solve for m and b. Given a user's points, this yields their rank. It is an approximate rank rather than an exact value, similar to how Google shows an approximate result count rather than the exact value on the first call when you search your mail.
Get all the records and store them in a cache until the next update. This is by far the best and least expensive approach we are using.
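The first of these two solutions, the linear approximation, can be sketched as below under its own assumption that rank falls roughly linearly with points across the fetched range. `fit_rank_line` and `approx_rank` are hypothetical names, and the input is assumed to be the scores from the descending Query, best first:

```python
def fit_rank_line(top_scores):
    """Fit rank = m * points + b through the first and last fetched scores.

    top_scores: points sorted descending (e.g. the first 1000 items of a
    Query with ScanIndexForward=False); top_scores[0] has rank 1.
    """
    x1, y1 = top_scores[0], 1
    x2, y2 = top_scores[-1], len(top_scores)
    if x1 == x2:
        raise ValueError("need two distinct scores to fit a line")
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


def approx_rank(points, m, b):
    """Approximate (not exact) rank for a player with the given points."""
    return max(1, round(m * points + b))


scores = list(range(1000, 0, -10))  # 100 evenly spaced scores: 1000, 990, ..., 10
m, b = fit_rank_line(scores)
print(approx_rank(500, m, b))
```

With evenly spaced scores the estimate is exact, but real score distributions are rarely linear, so the error grows with how much the distribution bends; it is only meant as a cheap display value.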
The beauty of DynamoDB is that it is highly optimized for very specific (and common) use cases. The cost of this optimization is that many other use cases cannot be achieved as easily as with other databases. Unfortunately yours is one of them. That being said, there are perfectly valid and good ways to do this with DynamoDB. I happen to have built an application that has the same requirement as yours.
What you can do is enable DynamoDB Streams on your table and process item update events with a Lambda function. Every time the number of points for a user changes you re-compute their rank and update your item. Even if you use the same scan operation to re-compute the rank, this is still much better, because it moves the bulk of the cost from your read operation to your write operation, which is kind of the point of NoSQL in the first place. This approach also keeps your point updates fast and eventually consistent (the rank will not update immediately, but is guaranteed to update properly unless there's an issue with your Lambda function).
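A sketch of the stream-driven update, assuming the stream is configured with the `NEW_AND_OLD_IMAGES` view type and that items carry hypothetical `userId` and `points` attributes. To keep it runnable, `table` is an in-memory dict standing in for the real table (a real Lambda would read and write through the AWS SDK). Skipping records whose `points` did not change matters in practice: writing the rank back produces a new stream record, and without that check the function would keep triggering itself:

```python
def rank_of(points, all_points):
    """1-based rank: one plus the number of users with strictly more points."""
    return 1 + sum(1 for p in all_points if p > points)


def handler(event, table):
    """Apply point changes from DynamoDB Streams records, then re-rank.

    `table` is a hypothetical in-memory dict {user_id: {"points": int, "rank": int}}
    standing in for the real table.
    """
    changed = False
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        old = record["dynamodb"].get("OldImage", {})
        new = record["dynamodb"]["NewImage"]
        # Rank-only updates (our own writes) leave points unchanged: skip them
        # so the function does not keep re-triggering itself.
        if old.get("points") == new.get("points"):
            continue
        user_id = new["userId"]["S"]
        table.setdefault(user_id, {})["points"] = int(new["points"]["N"])
        changed = True
    if changed:
        # Re-compute every rank, mirroring the "same scan" approach above.
        all_points = [item["points"] for item in table.values()]
        for item in table.values():
            item["rank"] = rank_of(item["points"], all_points)
    return table
```

The write path absorbs the re-ranking cost, so reads only fetch the stored `rank` attribute.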
I recommend starting with this approach and, once you reach scale, optimizing by caching your users by rank in something like Redis (unless you already have experience with Redis and can set it up quickly, in which case you could start there). Pick whatever is simplest first. If you are concerned about your leaderboard changing too often, you can reduce the cost by re-computing only the ranks of the top, say, 100 users, and scheduling another Lambda function to run every few minutes, scan all users, and update all their ranks at once.
