Firebase/GeoFire - Most popular item at location - firebase

I am currently in the evaluation process for a database that should serve as a backend for a mobile application.
Right now I am looking at Firebase, and for now I like it really much.
It is a requirement to have the possibility to fetch the
most popular items
at a certain location
(possibly in the future: additionally for a certain time range that would be an attribute of the item)
from the database.
So naturally I stumbled upon GeoFire that provides location based query possibilities for Firebase.
Unfortunately - at least as far as I understood - there is no possibility to order the results by an attribute other than the distance. (correct me if I am wrong).
So what do I do if I am not interested in the distance (I only want to have items in a certain radius, no matter how far from the center) but in the popularity factor (e.g. for the sake of simplicity a simple number that symbolizes popularity)?
IMPORTANT:
Filtering/Sorting on the client-side is not an option (or at least the least preferred one), as the result set could potentially grow to an infinite amount.
First version of the application will be for android, so the Firebase Java Client Library would be used in the first step.
Are there possibilities to solve this or is Firebase out of the race and not the right candidate for the job?

There is no way to add an extra condition to the server-side query of Geofire.
The query model of the Firebase database allows filtering only on a single property. Geofire already performs a seemingly impossible feat of filtering on both latitude and longitude. It does this through the magic of Geohashes, which combine latitude and longitude into a single string.
Strictly speaking you could find a way to extend the magic of Geohashes to add a third number into the mix. But while possible, I doubt it's feasible for most of us.

Related

How to model data in dynamodb if your access pattern includes many WHERE conditions

I am a bit confused if this is possible in DynamoDB.
I will give an example of SQL and explain how the query could be optimized and then I will try to explain why I am confused on how to model this and how to access the same data in DynamoDB.
This is not company code. Just an example I made up based on pcpartpicker filter.
SELECT * FROM BUILDS
WHERE CPU='Intel' AND 'OVERCLOCKED'='true'
AND Price < 3000
AND GPU='GeForce RTX 3060'
AND ...
From my understanding, SQL will first do a scan on the BUILDS table and then filter out all the builds where CPU is using intel. From this subset, it then does another WHERE clause to filter 'OVERCLOCEKD' = true so on and so forth. Basically, all of the additional WHERE clauses have a smaller number of rows to filter.
One thing we can do to speed up this query is to create an index on these columns. The main increase in performance is reducing the initial scan on the whole table for the first clause that the database looks at. So in the example above instead of scanning the whole db to find builds that are using intel it can quickly retrieve them since it is indexed.
How would you model this data in DynamoDB? I know you can create a bunch of secondary Indexes but instead of letting the engine do the WHERE clause and passing along the result to do the next set of filtering. It seems like you would have to do all of this yourself. For example, we would need to use our secondary index to find all the builds that use intel, overclocked, less than 3000, and using a specific GPU and then we would need to find the intersection ourselves. Is there a better way to map out this access pattern? I am having a hard time figuring out if this is even possible.
EDIT:
I know I could also just use a normal filter but it seems like this would be pretty expensive since it basically brute force search through the table similar to the SQL solution without indexing.
To see what I mean from pcpartpicker here is the link to the site with this page: https://pcpartpicker.com/builds/
People basically select multiple filters so it makes designing for access patterns even harder.
I'd highly recommend going through the various AWS presentations on YouTube...
In particular here's a link to The Iron Triangle of Purpose - PIE Theorem chapter of the AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301) presentation.
DynamoDB provides IE - Infinite Scale and Efficiency.
But you need P - Pattern Flexibility.
You'll need to decide if you need PI or PE.

Combining multiple Firestore queries to get specific results (with pagination)

I am working on small app the allows users to browse items based on various filters they select in the view.
After looking though, the firebase documentation I realised that the sort of compound query that I'm trying to create is not possible since Firestore only supports a single "IN" operator per query. To get around this the docs says to use multiple separate queries and then merge the results on the client side.
https://firebase.google.com/docs/firestore/query-data/queries#query_limitations
Cloud Firestore provides limited support for logical OR queries. The in, and array-contains-any operators support a logical OR of up to 10 equality (==) or array-contains conditions on a single field. For other cases, create a separate query for each OR condition and merge the query results in your app.
I can see how this would work normally but what if I only wanted to show the user ten results per page. How would I implement pagination into this since I don't want to be sending lots of results back to the user each time?
My first thought would be to paginate each separate query and then merge them but then if I'm only getting a small sample back from the db I'm not sure how I would compare and merge them with the other queries on the client side.
Any help would be much appreciated since I'm hoping I don't have to move away from firestore and start over in an SQL db.
Say you want to show 10 results on a page. You will need to get 10 results for each of the subqueries, and then merge the results client-side. You will be overreading quite a bit of data, but that's unfortunately unavoidable in such an implementation.
The (preferred) alternative is usually to find a data model that allows you to implement the use-case with a single query. It is impossible to say generically how to do that, but it typically involves adding a field for the OR condition.
Say you want to get all results where either "fieldA" is "Red" or "fieldB" is "Blue". By adding a field "fieldA_is_Red_or_fieldB_is_Blue", you could then perform a single query on that field. This may seem horribly contrived in this example, but in many use-cases it is more reasonable and may be a good way to implement your OR use-case with a single query.
You could just create a complex where
Take a look at the where property in https://www.npmjs.com/package/firebase-firestore-helper
Disclaimer: I am the creator of this library. It helps to manipulate objects in Firebase Firestore (and adds Cache)
Enjoy!

How to find which kinds are not being used in Google Datastore

There's any way to list the kinds that are not being used in google's datastore by our app engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing an
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up to find ways of datastore providing me useful stats and code something to keep collecting datastore statistics(like data size), during a huge period to have at least a clue of which kinds are not being used and then, only after this research, take a look into our app code to see if the kind Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating for a period. If the Kind bytes size does not change than probably the kind is not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time or were created manually to be used once... You know, like a table in oracle that no one knows what is used for and then if we look into the statistics of that table we would see that this table was only used once 5 years ago. I'm trying to achieve the same in datastore, I want to know which kinds are not being used anymore or were used a while ago, then ask around and backup/delete it if no owner was found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners) but it pays off a significant technical debt that has been incurred and avoids this and probably future problems (e.g. GDPR-like) requirements that will arise in the future.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.

Can I use firebase's GeoFire with Priority in a single query?

I'd like to query my data with a query that will return me everything on a radius (geofire can find that alright) but also within a datetime window.
At the moment I'm storing datetimes as priorities, so it's quite easy to query the array asking for data between 2 priority numbers (corresponding to start/end datetimes).
It's also quite easy to put my data in a GeoFire array and then query it to get the radius.
Can I combine those 2 though? In an easy not too hacky way?
Cheers
Not with a single query. You must either filter client side, or do your query in more than one phase.
This is because:
Each node is limited to a single priority value.
Per this blog post, GeoFire uses the priority to store a geohash, which it uses for the lookups.
The easiest way to deal with this is to do additional filtering client side. If the bandwidth impact starts causing issues, partition your data (e.g. group events by month) and do it in multiple phases.

Is there (or has there been considered) anything like 'merge' or 'batch' setting in Firebase?

In doing a bit more programming with Firebase today, I found myself wishing for a couple of features:
1) Merge set:
Say I have a firebase ref that has the value {a:1,b:2,c:3}.
If I do something like ref.set({a:-1,b:-2}) the new value will (unsurprisingly) be {a:-1,b:-2}.
Instead, imagine ref.mergeSet({a:-1,b:-2}) which would have a result in the value of the ref being {a:-1,b:-2,c:3}.
Now, I realize that I could do something like ref.child("a").set(-1) and ref.child("b").set(-2) to achieve this result, but in at least some cases, I'd prefer to get only a single call to my .on() handler.
This segues into my second idea.
2) Batch set:
In my application I'd like a way to force an arbitrary number of calls to .set to only result in one call to .on in other clients. Something like:
ref.startBatch()
ref.child("a").set(1)
ref.child("b").set(2)
....
ref.endBatch()
In batch mode, .set wouldn't result in a call to .on, instead, the minimal number of calls to .on would all result from calling .endBatch.
I readily admit that these ideas are pretty nascent, and I wouldn't be surprised if they conflict with existing architectural features of Firebase, but I thought I'd share them anyway. I find that I'm having to spend more time ensuring consistency across clients when using Firebase than I expected to.
Thanks again, and keep up the great work.
UPDATE: We've added a new update() method to the Firebase web client and PATCH support to the REST API, which allow you to atomically modify multiple siblings at a particular location, while leaving the other siblings unmodified. This is what you described as "mergeSet" and can be used as follows:
ref.update({a: -1, b: -2});
which will update 'a' and 'b', but leave 'c' unmodified.
OLD ANSWER
Thanks for the detailed feature request! We'd love to hear more about your use case and how these primitives would help you. If you're willing to share more details, email support#firebase.com and we can dig into your scenario.
To answer your question though, the primary reason we don't have these features is related our architecture and the performance / consistency guarantees that we're trying to maintain. Not to go too deep, but if you imagine that your Firebase data is spread across many servers, it's easier for us to have stronger guarantees (atomicity, ordering, etc.) when modifying data that's close in the tree than when modifying data that's far away. So by limiting these guarantees to data that you can replace with a single set() call, we push you in a direction that will perform well with the Firebase architecture.
In some cases, you may be able to get roughly what you want by just reorganizing your tree. For instance, if you know you always want to set 'a' and 'b' together, you could put them under a common 'ab' parent and do ref.child('ab').set({a:-1, b:-2});, which won't affect the 'c' child.
Like I said, we'd love to hear more about your scenario. We're in beta so that we can learn from developers about how they're using the API and where it's falling short! support#firebase.com :-)

Resources