I am working on a small app that allows users to browse items based on various filters they select in the view.
After looking through the Firebase documentation, I realised that the sort of compound query I'm trying to create is not possible, since Firestore only supports a single "in" operator per query. To get around this, the docs say to use multiple separate queries and then merge the results on the client side.
https://firebase.google.com/docs/firestore/query-data/queries#query_limitations
Cloud Firestore provides limited support for logical OR queries. The in and array-contains-any operators support a logical OR of up to 10 equality (==) or array-contains conditions on a single field. For other cases, create a separate query for each OR condition and merge the query results in your app.
I can see how this would work normally, but what if I only wanted to show the user ten results per page? How would I implement pagination, since I don't want to send lots of results back to the user each time?
My first thought would be to paginate each separate query and then merge them, but if I'm only getting a small sample back from the database, I'm not sure how I would compare and merge it with the results of the other queries on the client side.
Any help would be much appreciated, since I'm hoping I don't have to move away from Firestore and start over with an SQL database.
Say you want to show 10 results on a page. You will need to get 10 results for each of the subqueries, and then merge the results client-side. You will be overreading quite a bit of data, but that's unfortunately unavoidable in such an implementation.
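A minimal sketch of that client-side merge, assuming each subquery was run with the same ordering and a limit of the page size (e.g. `.orderBy(...).limit(10)`), and that documents are represented as plain objects with an `id`. The function name `mergePage` and its parameters are hypothetical. It takes the first page from the sorted subquery results, de-duplicates documents that match more than one OR condition, and reports, per subquery, the last document consumed so it can be passed to `startAfter()` for the next page.

```javascript
// Merge one page from several OR-subqueries on the client.
// Assumes every array in subResults is already sorted by `compare`
// (the same total order each subquery used in its orderBy).
function mergePage(subResults, pageSize, compare, keyOf) {
  const idx = subResults.map(() => 0); // read position in each subquery's results
  const seen = new Set();              // keys already placed on this page
  const page = [];

  while (page.length < pageSize) {
    // Pick the subquery whose next unread document sorts first.
    let best = -1;
    for (let i = 0; i < subResults.length; i++) {
      if (idx[i] >= subResults[i].length) continue;
      if (best === -1 || compare(subResults[i][idx[i]], subResults[best][idx[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) break; // every subquery is exhausted

    const doc = subResults[best][idx[best]++];
    const key = keyOf(doc);
    // A document can match several OR conditions; emit it only once.
    if (!seen.has(key)) {
      seen.add(key);
      page.push(doc);
    }
  }

  // Skip already-emitted documents waiting at the front of other subqueries,
  // so they are not returned again on the next page.
  for (let i = 0; i < subResults.length; i++) {
    while (idx[i] < subResults[i].length && seen.has(keyOf(subResults[i][idx[i]]))) {
      idx[i]++;
    }
  }

  // Last consumed document per subquery: pass it to startAfter() for the next
  // page. null means nothing was consumed, so keep that subquery's old cursor.
  const cursors = subResults.map((docs, i) => (idx[i] > 0 ? docs[idx[i] - 1] : null));
  return { page, cursors };
}
```

Unconsumed documents from each subquery are simply fetched again on the next page, which is the overreading mentioned above.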
The (preferred) alternative is usually to find a data model that allows you to implement the use-case with a single query. It is impossible to say generically how to do that, but it typically involves adding a field for the OR condition.
Say you want to get all results where either "fieldA" is "Red" or "fieldB" is "Blue". By adding a field "fieldA_is_Red_or_fieldB_is_Blue", you could then perform a single query on that field. This may seem horribly contrived in this example, but in many use-cases it is more reasonable and may be a good way to implement your OR use-case with a single query.
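A minimal sketch of that denormalization, assuming the OR field is computed in your own code whenever a document is written. The helper name `withOrField` and the collection name `items` are hypothetical; the query in the comment uses the same namespaced `where()` form as elsewhere in this thread.

```javascript
// Store the OR condition as its own boolean field before writing the document,
// so one equality query can replace two separate queries. With this field in
// place, the read becomes something like:
//   db.collection("items").where("fieldA_is_Red_or_fieldB_is_Blue", "==", true).get()
function withOrField(doc) {
  return {
    ...doc,
    fieldA_is_Red_or_fieldB_is_Blue: doc.fieldA === "Red" || doc.fieldB === "Blue",
  };
}
```

The catch is that the derived field must be recomputed every time fieldA or fieldB changes, otherwise the single query returns stale matches.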
You could just create a complex where clause.
Take a look at the where property in https://www.npmjs.com/package/firebase-firestore-helper
Disclaimer: I am the creator of this library. It helps to manipulate objects in Firebase Firestore (and adds caching).
Enjoy!
Related
I was watching this Firebase video, and one thing that wasn't clear to me is why the "||" (OR) operator isn't supported, especially given the way Firebase stores indexes.
It was stated that you have to make separate queries and join the results on the client side instead of on the Firebase side.
Isn't the in operator essentially just a convenience method that acts like multiple OR statements?
https://firebase.blog/posts/2019/11/cloud-firestore-now-supports-in-queries
Firestore indexes are well suited to range queries. For an inequality query, however, even with the indexes in place the backend would still have to scan every document in the collection to produce the results, which hurts performance as the number of documents grows over time.
So, regarding your question about how the "in" operator works on the backend: as mentioned in this thread on the addition of IN queries, they "not only address this performance issue but also support up to 10 equality clauses on the same field with a logical OR". The values passed to an "in" query are compared when searching for documents. This lets you fetch the documents that match your filter criteria, so the operation takes less time than going through each item one by one.
For example, you could do:
// Get all documents in 'foo' where status is open or upcoming
db.collection('foo').where('status','in',['open','upcoming']).get()
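To illustrate the "multiple OR statements" point from the question, here is an in-memory sketch (no Firestore calls; the `docs` arrays and both function names are hypothetical). It shows that an `in` filter on a single field selects the same documents as running one equality query per value and merging the results client-side by document ID.

```javascript
// Equivalent of where(field, "in", values) over an in-memory array.
function whereIn(docs, field, values) {
  return docs.filter((d) => values.includes(d[field]));
}

// One equality "query" per OR branch, merged on the client.
function unionOfEqualityQueries(docs, field, values) {
  const byId = new Map();
  for (const v of values) {
    for (const d of docs.filter((doc) => doc[field] === v)) {
      byId.set(d.id, d); // de-duplicate by document ID while merging
    }
  }
  return [...byId.values()];
}
```

The difference in practice is where the OR is evaluated: with `in`, the backend does it in one query over one field; with separate queries, you pay for each query and do the merge yourself.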
I would also recommend checking the following similar examples:
How to perform compound queries with logical OR
How to make queries on firestore
Firebase database operator working
Firestore IN operator working
Firestore Query limitation
I'm building a fairly simple Next.js app that allows users to browse potentially millions of user profiles by 4-5 key facets (last login date, team, tags, a few other basic fields).
I soon discovered that Firestore won't allow me to run the queries I need to if I wish to support multiple filters. I could get, for example, all the profiles with a matching team, but to filter on all the other facets I'd have to do it on the front end and end up incurring far more document reads than I'd like.
Algolia to the rescue! Works great, but boy is that cost going to rack up. I can't feasibly launch this app with Algolia or I'll have to remortgage my house. I also absolutely do not need text search, just basic faceting, so I really feel like Algolia is overkill here regardless of the cost implications.
I really do not want to start querying Firestore for an insane number of documents and then run the filtering on the client, unless this really is the BEST option for my needs. But I need advice.
Thanks guys!
I soon discovered that Firestore won't allow me to run the queries I need to if I wish to support multiple filters.
That's not true; Firestore definitely allows multiple filters.
I could get, for example, all the profiles with a matching team, but to filter all the other facets I'd have to do it on the front end
You can create a query and chain multiple whereEqualTo() calls, which means you can filter by multiple fields at once. Here is an example:
rootRef.whereEqualTo("lastLoginData", lastLoginData).whereEqualTo("team", nameOfTeam);
You can chain as many functions as you want. That's an example for Android, but you can find the corresponding queries for each platform and programming language. If you also need to order the results, don't forget to create an index.
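For the web (the question mentions Next.js), a minimal sketch of the same idea: chain one equality filter per facet the user actually selected. It assumes the namespaced JS SDK, where `query.where(field, "==", value)` returns a new query, as in `db.collection('foo').where(...)` earlier in this thread. The helper name `applyFacets` and the field names in the usage comment are hypothetical.

```javascript
// Chain one equality filter per selected facet onto a Firestore query.
// Facets left empty (undefined, null, or "") are skipped.
function applyFacets(query, facets) {
  return Object.entries(facets)
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .reduce((q, [field, value]) => q.where(field, "==", value), query);
}

// Usage (hypothetical collection and fields):
// applyFacets(db.collection("profiles"), { team: "design", country: "NZ" }).limit(20).get();
```

Combining these equality filters with a range filter or an orderBy on another field (e.g. last login date) will typically require a composite index, as noted above.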
Algolia to the rescue!
Algolia is useful only if you need full-text search in your app; otherwise, for simple queries, you can use the filters that are currently available.
Getting documents and filtering them on the client is not an option. Always filter the documents on the server and get only the results that you need.
I wanted to get some community consensus on how to achieve the following with the Firebase JS SDK (e.g., in React):
Suppose I have a collection users and I want to paginate the users whose document IDs are not in a subset of IDs (O(100-1000) of them). This subset of excluded IDs is dynamic and depends on the authenticated user.
It seems the not-in query only supports up to 10 entries, so that is out of the question.
It also seems it's not possible to fetch all document IDs and filter on the client side, at least not in the 'firebase' JS SDK.
The only workaround I can think of is to keep a document that stores an array of all users' document IDs, pull that document locally, and perform the filtering/pagination logic locally. The limitation here is that a document can be at most 1 MB, so realistically a single document can store at most O(10K) IDs.
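A minimal sketch of that local workaround, assuming the full ID list has already been read from a single document into an array and the excluded IDs are held in a `Set`. The function name `pageOfIds` and the offset-based paging are illustrative choices, not part of any Firebase API.

```javascript
// Page through a locally held list of user IDs, skipping excluded IDs.
// offset is a position in allIds; nextOffset is where the next page starts.
function pageOfIds(allIds, excluded, pageSize, offset = 0) {
  const ids = [];
  let i = offset;
  while (i < allIds.length && ids.length < pageSize) {
    if (!excluded.has(allIds[i])) ids.push(allIds[i]);
    i++;
  }
  return { ids, nextOffset: i };
}
```

The returned IDs would still have to be fetched as documents, and the 1 MB ceiling on the ID-list document remains the hard limit of this approach.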
Firestore has a special bunch of methods for pagination which may be useful for you. Those are called "query cursors".
You can use them to define the start point startAt() or startAfter() and to define an end point endAt() or endBefore(). Additionally, if needed, those can be combined with limit method.
I strongly encourage you to check this tutorial. It includes a quick video explaining the matter and lots of examples in all popular languages.
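One way to apply query cursors to the excluded-IDs question is to read in batches with `startAfter()` and drop excluded IDs on the client until a page is full. A minimal sketch, assuming a hypothetical async helper `fetchBatch(cursor, limit)`; with the modular JS SDK it might wrap something like `query(usersRef, orderBy(documentId()), startAfter(cursor), limit(n))`, omitting `startAfter` when the cursor is null. Documents are assumed to be plain objects with an `id`.

```javascript
// Fill one page of `pageSize` users while skipping excluded IDs,
// reading cursor-based batches until the page is full or data runs out.
async function nextPage(fetchBatch, excluded, pageSize, cursor = null) {
  const page = [];
  while (page.length < pageSize) {
    const batch = await fetchBatch(cursor, pageSize);
    if (batch.length === 0) break; // no more documents
    for (const doc of batch) {
      cursor = doc.id; // advance past every document actually examined
      if (!excluded.has(doc.id)) page.push(doc);
      if (page.length === pageSize) break;
    }
    if (batch.length < pageSize) break; // a short batch means the end was reached
  }
  return { page, cursor }; // pass `cursor` back in to get the following page
}
```

Excluded documents are still read (and billed), so this trades extra reads for not having to hold the full ID list locally.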
My table is (device, type, value, timestamp), where (device, type, timestamp) is a unique combination (a candidate for a composite key in a non-DynamoDB DBMS).
My queries can filter on any of these three attributes, such as
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. From most of my searches, I believe I'm supposed to use a combination of the three fields (as a single field, device-type-timestamp) as the id. However, the set function of Schema doesn't let me use the object's properties (such as this.device), and for various reasons I cannot do it externally.
The closest I got was (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on.
However, when using a Query, it becomes difficult to fetch results for a particular device and type, since the id (hashKey or rangeKey) keeps missing from the picture.
So the question: how would you do it for this kind of table?
Note that this table is meant to collect data from IoT devices, which each device generates every 5 minutes on average.
I'm curious why you are choosing DynamoDB for this task. Advanced queries like this seem much better suited to a SQL-based database than to a NoSQL database. Because SQL queries are so expressive, this task is, in my experience, a lot easier in SQL databases. So I would encourage you to think about whether DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little. You could add a property, device-type, that combines the device and type values. Then set that as an index, query based on it sorted by the timestamp, and filter out the results that are not greater than the value you want.
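To make the access pattern concrete, here is an in-memory sketch (not DynamoDB or Dynamoose code): a combined device-type value plays the role of the partition (hash) key and timestamp the sort (range) key. The function names and the `-` separator are assumptions for illustration.

```javascript
// Combined key; choose a separator that can never appear in device or type
// values, or "a-b"+"c" and "a"+"b-c" would collide.
function deviceTypeKey(device, type) {
  return `${device}-${type}`;
}

// GET values for (device, type) with timestamp greater than `after`.
function queryValues(items, device, type, after) {
  const key = deviceTypeKey(device, type);
  return items
    .filter((it) => deviceTypeKey(it.device, it.type) === key) // partition-key match
    .filter((it) => it.timestamp > after)                      // sort-key condition
    .sort((a, b) => a.timestamp - b.timestamp)                 // ordered by sort key
    .map((it) => it.value);
}
```

In DynamoDB itself, putting timestamp in the sort key lets the "greater than" condition be evaluated by the key condition rather than by scanning the whole partition.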
You are correct that Dynamoose currently does not pass the entire object into the set function. This is something I'm personally open to exploring. I'm a member of the GitHub project, and if you would like to submit a PR adding that feature, I would be more than happy to help explore that option with you and get it into the codebase.
The other thing you might want to explore is a DynamoDB stream that sets that device-type property whenever an item is added to your DynamoDB table. That would move that logic out of your application code. I'm not sure it's necessary to decouple it to that level for what you are doing, but it might be something you want to explore.
Finally, depending on your setup, you could figure out which property is more unique, device or type, and set up an index on that property. Then query based on it and filter out results with unwanted values of the other property. I'm not sure if that is what you are looking for; it will of course work, but I don't know how many items you will have in your table, and scalability becomes a concern at a certain point. One way to address some of those scalability concerns might be to set a TTL on your items if you know that the timestamp you are querying for is constant or predictable ahead of time.
Overall, there are many ways to achieve what you are looking to do. Without more detail about how many items you have, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc., it's hard to give a good solution. I would highly encourage you to think about whether NoSQL is truly the best way to go. The query you are looking to do seems a LOT more like a SQL query. I'm not saying it's impossible in DynamoDB, but it will require some thought about how you structure your data model.
Considering the opinion of @charlie-fish, I decided to dig into Dynamoose and modify the code to pass the model to the attribute's set function. However, I discovered that the model is already being passed to the attribute's default function. So I changed my Schema to the following:
id:hashKey;default: function(model){ return model.device + "-" + model.type; }
timestamp:rangeKey
For anyone landing on this answer, please note that the default and set functions can access the attribute options and the schema instance via this. However, both of those functions must be regular functions rather than arrow functions.
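A small plain-JavaScript sketch of why the arrow-function restriction exists. It assumes the library invokes the callback with something like `fn.call(context, ...)`; `fakeContext` and `invokeLikeALibrary` are hypothetical stand-ins, not Dynamoose internals.

```javascript
// Stand-in for whatever object a library binds as `this` (hypothetical).
const fakeContext = { name: "id" };

// A regular function receives the `this` chosen by the caller.
const regular = function () {
  return this === fakeContext;
};

// An arrow function keeps the `this` of where it was defined;
// .call() cannot rebind it.
const arrow = () => this === fakeContext;

function invokeLikeALibrary(fn) {
  return fn.call(fakeContext);
}
```

Here `invokeLikeALibrary(regular)` sees the context, while `invokeLikeALibrary(arrow)` does not, which is the same reason a `default` or `set` written as an arrow function cannot use `this`.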
I'm keeping this here as an answer, but I won't accept it for some time, as I want to wait for someone else to come up with a better approach.
I also want to make sure that if a value is passed for the id field, it isn't used. I could use set to ignore the incoming value, but I don't yet know how.
I use OpenTSDB to store my time series data. For each incoming data point, I need to get the 20 preceding data points. But I have a large number of metrics, so I cannot call the OpenTSDB query API that many times. How can I reduce the number of queries to OpenTSDB?
As far as I know you can't aggregate different metrics into one single result. But I would suggest two solutions:
You can put multiple metric queries in one call. If you use the HTTP API endpoint, you can do something like this:
http://otsdb:4242/api/query?start=15m-ago&m=avg:metric1{tag1=a}&m=avg:metric2{tag2=b}
You get the results for all queries with the same start (and end) dates/times. But with multiple metrics, keep in mind that the query will take longer.
Redefine your time series. I don't know any details about your data, but if you're going to store and use data, you should also think about usage: What queries am I going to run? How often? What will be the most common way of accessing the data? And so on.
That's also what the OpenTSDB documentation advises [1]:
Cardinality also affects query speed a great deal, so consider the queries you will be performing frequently and optimize your naming schema for those.
So I would suggest using tags to overcome this issue of multiple metrics. As I mentioned, I don't know your schema, but OpenTSDB is much more powerful with tags; there are many examples and filtering options as well.
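As an illustration of the tags suggestion, a minimal sketch assuming a hypothetical naming scheme where each device currently gets its own metric, such as `sensor.temp.dev42`. It rewrites such a point as the metric `sensor.temp` with a `device` tag, in the JSON shape OpenTSDB's `/api/put` endpoint accepts (`metric`, `timestamp`, `value`, `tags`). After that, a single query such as `m=avg:sensor.temp{device=*}` returns one series per device instead of needing one call per metric.

```javascript
// Hypothetical migration helper: "<metric>.<deviceId>" -> metric + device tag.
function toTaggedPoint(flatMetric, timestamp, value) {
  const cut = flatMetric.lastIndexOf(".");
  if (cut <= 0) {
    throw new Error(`expected "<metric>.<deviceId>", got "${flatMetric}"`);
  }
  return {
    metric: flatMetric.slice(0, cut),
    timestamp,
    value,
    tags: { device: flatMetric.slice(cut + 1) },
  };
}
```

The split rule is only an example; the right tag keys depend on how your metrics are actually named and queried.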
Edit 1:
Since OpenTSDB 2.3 there is also an expression API: http://opentsdb.net/docs/build/html/api_http/query/exp.html
It should let you handle multiple metric queries together (but I've never used it for any query).
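To make the first suggestion (several metric sub-queries in one HTTP call) concrete, a minimal sketch that builds such a `/api/query` URL. The host, metric names, and tags are the placeholders from the example URL above; the helper name `buildMultiMetricQuery` is hypothetical. Each sub-query is one `m` parameter in the `aggregator:metric{tagk=tagv,...}` form.

```javascript
// Build one /api/query request carrying several metric sub-queries,
// instead of one HTTP call per metric.
function buildMultiMetricQuery(baseUrl, start, subQueries) {
  const params = new URLSearchParams({ start });
  for (const q of subQueries) {
    const tags = Object.entries(q.tags || {})
      .map(([k, v]) => `${k}=${v}`)
      .join(",");
    params.append("m", `${q.aggregator}:${q.metric}${tags ? `{${tags}}` : ""}`);
  }
  // URLSearchParams percent-encodes ':', '{', '=' and '}' in the values.
  return `${baseUrl}/api/query?${params.toString()}`;
}
```

For example, `buildMultiMetricQuery("http://otsdb:4242", "15m-ago", [{ aggregator: "avg", metric: "metric1", tags: { tag1: "a" } }, { aggregator: "avg", metric: "metric2", tags: { tag2: "b" } }])` encodes the same two sub-queries as the example URL.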