In Firestore, I'm wondering if there is a way to use a heuristic1 to get all data between two heuristic1 values, but order the results by a heuristic2.
I ask because the documentation at the bottom of these two pages seems slightly contradictory:
https://firebase.google.com/docs/firestore/query-data/order-limit-data
https://firebase.google.com/docs/firestore/query-data/query-cursors
What I want to be able to do is
Ref.startAt(some h1 value).endAt(some second h1 value).orderBy(h2).
I know I'd probably have to index by h1 but even then I'm not sure if there is a way to do this.
Update:
I didn't test this well enough to see that it doesn't produce the desired ordering. The OP asked the question again and got an answer from a Firebase team member:
Because Cloud Firestore doesn't support ordering by a different field than the supplied inequality, you won't be able to sort by name directly from the query. Instead you'd need to sort client-side once you've fetched the data.
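A minimal sketch of that client-side step in plain JavaScript, assuming the documents from the population-range query have already been turned into plain objects (for example with doc.data()); the helper name sortCitiesByName is hypothetical:

```javascript
// Sort documents that were fetched by a population-range query by name instead.
// `cities` is assumed to be an array of plain objects like { name, population }.
function sortCitiesByName(cities) {
  // Copy first so the caller's array (still in query order) is left untouched.
  return [...cities].sort((a, b) => a.name.localeCompare(b.name));
}

const fetched = [
  { name: "Oslo", population: 1500 },
  { name: "Bergen", population: 1200 },
  { name: "Amsterdam", population: 1900 },
];
console.log(sortCitiesByName(fetched).map((c) => c.name));
```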
The API supports the capability you want, although I don't see an example in the documentation that shows it.
The ordering of the query terms is important. Suppose you have a collection of cities and the fields of interest are population (h1) and name (h2). To get the cities with population in range 1000 to 2000, ordered by name, the query would be:
citiesRef.orderBy("population").orderBy("name").startAt(1000).endAt(2000)
This query requires a composite index, which you can create manually in the console. Or as the documentation there indicates, the system will help you:
Instead of defining a composite index manually, run your query in your app code to get a link for generating the required index.
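To see why the update above applies, here is a rough in-memory emulation of that query. It assumes the usual cursor semantics, where a single startAt/endAt value applies to the first orderBy field; the emulateQuery helper is hypothetical and does not call Firestore:

```javascript
// In-memory emulation (not a Firestore call) of
// citiesRef.orderBy("population").orderBy("name").startAt(lo).endAt(hi)
// A single cursor value applies to the first orderBy field (population),
// and name only orders documents that share the same population.
function emulateQuery(cities, lo, hi) {
  return cities
    .filter((c) => c.population >= lo && c.population <= hi)
    .sort((a, b) => a.population - b.population || a.name.localeCompare(b.name));
}

const cities = [
  { name: "Zagreb", population: 1100 },
  { name: "Antwerp", population: 1900 },
  { name: "Bern", population: 1100 },
  { name: "Cork", population: 2500 },
];
console.log(emulateQuery(cities, 1000, 2000).map((c) => c.name));
```

Antwerp comes last even though it is alphabetically first, because it has the highest population in the range; name only breaks the tie between Bern and Zagreb.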
Related
Let's say I have a collection of cars and I want to filter them by price range and by year range. I know that Firestore has strict limitations due to performance reasons, so something like:
db.collection("products")
.where('price','>=', 70000)
.where('price','<=', 90000)
.where('year','>=', 2015)
.where('year','<=', 2018)
will throw an error:
Invalid query. All where filters with an inequality (<, <=, >, or >=) must be on the same field.
So is there any other way to perform this kind of query without local data managing? Maybe some kind of indexing or tricky data organization?
The error message and documentation are quite explicit on this: a Firestore query can only perform range filtering on a single field. Since you're trying to filter ranges on both price and year, that is not possible in a single Firestore query.
There are two common ways around this:
Perform filtering on one field in the query, and on the other field in your client-side code.
Combine the values of the two ranges into a single field in some way that allows your use-case with a single field. This is incredibly non-trivial, and the only successful example of such a combination that I know of is using geohashes to filter on latitude and longitude.
Given the difference in effort between these two, I'd recommend picking the first option.
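A sketch of the first option, assuming price is the field filtered in the query and the returned documents have been converted to plain objects; filterByYear is a hypothetical helper:

```javascript
// `products` is assumed to hold plain objects returned by a price-range query, e.g.
//   const snap = await db.collection("products")
//     .where("price", ">=", 70000).where("price", "<=", 90000).get();
//   const products = snap.docs.map((d) => d.data());
// The year range, which cannot go into the same query, is checked here instead.
function filterByYear(products, minYear, maxYear) {
  return products.filter((p) => p.year >= minYear && p.year <= maxYear);
}

const products = [
  { price: 75000, year: 2014 },
  { price: 80000, year: 2016 },
  { price: 88000, year: 2019 },
];
console.log(filterByYear(products, 2015, 2018));
```

Usually it pays to put the more selective range in the query, so fewer documents are downloaded before the client-side filter runs.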
A third option is to model your data differently, so as to make it easier to implement your use-case. The most direct implementation of this would be to put all products from 2015-2018 into a single collection. Then you could query that collection with db.collection("products-2015-2018").where('price','>=', 70000).where('price','<=', 90000).
A more general alternative would be to store the products in a collection for each year, and then perform 4 queries to get the results you're looking for: one on each of the collections products-2015, products-2016, products-2017, and products-2018.
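A sketch of the per-year variant, assuming the collection names above; yearCollections and mergeByPrice are hypothetical helpers, and the Firestore calls (v8-style web API, as in the question) are only shown in a comment:

```javascript
// Hypothetical naming scheme: one collection per model year.
function yearCollections(fromYear, toYear) {
  const names = [];
  for (let year = fromYear; year <= toYear; year++) {
    names.push(`products-${year}`);
  }
  return names;
}

// Each inner array holds one collection's (already price-filtered) results;
// combine them into a single list ordered by price.
function mergeByPrice(resultsPerCollection) {
  return resultsPerCollection.flat().sort((a, b) => a.price - b.price);
}

// Possible use with Firestore:
//   const snaps = await Promise.all(yearCollections(2015, 2018).map((name) =>
//     db.collection(name).where("price", ">=", 70000).where("price", "<=", 90000).get()));
//   const results = mergeByPrice(snaps.map((snap) => snap.docs.map((d) => d.data())));
console.log(yearCollections(2015, 2018));
```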
I recommend reading the document on compound queries and their limitations, and watching the video on Cloud Firestore queries.
You can't do multiple range queries because of the limitations mentioned here, but at a small cost to the UI you can still achieve this by indexing the year into a category, like this:
db.collection("products")
.where('price','>=', 70000)
.where('price','<=', 90000)
.where('yearCategory','in', ['new', 'old'])
Of course, new and old go out of date, so you can instead group the years into yearCategory values like yr-2014-2017, yr-2017-2020, and so on. The in operator can only take 10 elements per query, so this may give you an idea of how wide a range each category should cover.
You can write to yearCategory during insert or, if you have a large range such as a number of likes, then you'd want another process that polls these data and updates the category.
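A sketch of the year-bucketing idea, assuming non-overlapping four-year buckets starting at 2014 (labels such as yr-2014-2017 and yr-2018-2021). The bucket width, base year and helper names are all assumptions; the 10-value limit is the one quoted above:

```javascript
// Assumed bucketing scheme: non-overlapping BUCKET_WIDTH-year buckets from BASE_YEAR.
const BASE_YEAR = 2014;
const BUCKET_WIDTH = 4;
const MAX_IN_VALUES = 10; // per-query limit for `in` quoted in the answer

// Category to store in the yearCategory field when a product is written.
function yearCategory(year) {
  const offset = Math.floor((year - BASE_YEAR) / BUCKET_WIDTH) * BUCKET_WIDTH;
  const start = BASE_YEAR + offset;
  return `yr-${start}-${start + BUCKET_WIDTH - 1}`;
}

// Categories to pass to .where("yearCategory", "in", ...) for a year range.
function categoriesForRange(minYear, maxYear) {
  const labels = [];
  for (let year = minYear; year <= maxYear; year++) {
    const label = yearCategory(year);
    if (!labels.includes(label)) labels.push(label);
  }
  if (labels.length > MAX_IN_VALUES) {
    throw new Error(`Range needs ${labels.length} categories; 'in' allows ${MAX_IN_VALUES}`);
  }
  return labels;
}

console.log(categoriesForRange(2015, 2018)); // [ 'yr-2014-2017', 'yr-2018-2021' ]
```

A bucket can contain years outside the requested range (here 2014 and 2019 to 2021), so the exact year check still has to happen on the client; that is the small UI cost mentioned above.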
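In Flutter you can do something like this:
final queryList = await db.collection("products").where('price', isGreaterThanOrEqualTo: 70000).get();
// The second bound is applied on the client to the fetched documents.
final filteredDocs = queryList.docs.where((doc) => doc['price'] <= 90000).toList();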
Add more client-side filters as you need. Firestore only lets you apply a limited set of filters in a single query, so fetch the data with what the query supports, then filter the rest out according to your needs.
I am working on a small app that allows users to browse items based on various filters they select in the view.
After looking through the Firebase documentation, I realised that the sort of compound query I'm trying to create is not possible, since Firestore only supports a single "IN" operator per query. To get around this, the docs say to use multiple separate queries and then merge the results on the client side.
https://firebase.google.com/docs/firestore/query-data/queries#query_limitations
Cloud Firestore provides limited support for logical OR queries. The in, and array-contains-any operators support a logical OR of up to 10 equality (==) or array-contains conditions on a single field. For other cases, create a separate query for each OR condition and merge the query results in your app.
I can see how this would work normally but what if I only wanted to show the user ten results per page. How would I implement pagination into this since I don't want to be sending lots of results back to the user each time?
My first thought would be to paginate each separate query and then merge them, but if I'm only getting a small sample back from the db, I'm not sure how I would compare and merge it with the results of the other queries on the client side.
Any help would be much appreciated since I'm hoping I don't have to move away from firestore and start over in an SQL db.
Say you want to show 10 results on a page. You will need to get 10 results for each of the subqueries, and then merge the results client-side. You will be overreading quite a bit of data, but that's unfortunately unavoidable in such an implementation.
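A sketch of merging one page, assuming every sub-query uses the same ordering (a numeric field called sortKey here, with the document id as a tie-breaker) and returned up to pageSize documents; mergePage and the field names are hypothetical. The last document shown could then be used as the startAfter cursor for every sub-query on the next page:

```javascript
// Same order every sub-query is assumed to use: sortKey, then id.
function compareDocs(a, b) {
  if (a.sortKey !== b.sortKey) return a.sortKey - b.sortKey;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// subResults: one array per OR sub-query, each holding up to pageSize docs.
function mergePage(subResults, pageSize) {
  const seen = new Set();
  const page = [];
  for (const doc of subResults.flat().sort(compareDocs)) {
    if (seen.has(doc.id)) continue; // same doc matched more than one OR condition
    seen.add(doc.id);
    page.push(doc);
    if (page.length === pageSize) break;
  }
  // Docs beyond the cursor were overread and are simply fetched again later.
  const cursor = page.length > 0 ? page[page.length - 1] : null;
  return { page, cursor };
}

const redResults = [{ id: "a", sortKey: 1 }, { id: "c", sortKey: 3 }];
const blueResults = [{ id: "b", sortKey: 2 }, { id: "c", sortKey: 3 }];
console.log(mergePage([redResults, blueResults], 2));
```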
The (preferred) alternative is usually to find a data model that allows you to implement the use-case with a single query. It is impossible to say generically how to do that, but it typically involves adding a field for the OR condition.
Say you want to get all results where either "fieldA" is "Red" or "fieldB" is "Blue". By adding a field "fieldA_is_Red_or_fieldB_is_Blue", you could then perform a single query on that field. This may seem horribly contrived in this example, but in many use-cases it is more reasonable and may be a good way to implement your OR use-case with a single query.
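A sketch of maintaining such a field at write time, using the field names from the example above; withOrField is a hypothetical helper and "items" a placeholder collection name:

```javascript
// Hypothetical write-time helper: store the OR condition as its own boolean
// field so that a single equality query can select the matching documents.
function withOrField(doc) {
  return {
    ...doc,
    fieldA_is_Red_or_fieldB_is_Blue: doc.fieldA === "Red" || doc.fieldB === "Blue",
  };
}

// Write the enriched object, then query with e.g.
//   db.collection("items").where("fieldA_is_Red_or_fieldB_is_Blue", "==", true)
console.log(withOrField({ fieldA: "Red", fieldB: "Green" }));
```

The derived field has to be recomputed on every write that touches fieldA or fieldB, otherwise it goes stale.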
You could just create a complex where clause.
Take a look at the where property in https://www.npmjs.com/package/firebase-firestore-helper
Disclaimer: I am the creator of this library. It helps to manipulate objects in Firebase Firestore (and adds Cache)
Enjoy!
I am wondering how to filter a Firestore collection by a date field in the Firebase console, as I can't see any data type other than String, Number, and Boolean.
Please advise if anyone has found a way to filter a Firestore collection based on a date field.
I think it's not possible at the moment. I found this documentation on GCP which is practically the same UI. There is a little bit about filtering, but not many details.
I think this UI is just for support/testing purposes, not for everyday use, so such a feature isn't really needed. It works through the API with no problem (you can check the JS example in this SO question).
If you need this feature you should raise a Feature Request here.
I believe the only thing you can do at this point is apply a filter on your timestamp fields and set the sort order without a condition. The results will show the timestamp field value so it is easy to scan for the time range you are interested in.
Obviously this is not awesome for large document sets where the interesting date is somewhere in the middle, but at least the sort order lets you choose whether to scroll from the first or the last documents by date.
This approach is described in a Firebase Blog post:
Sort and Filter in the Firestore Console
My table is (device, type, value, timestamp), where (device, type, timestamp) makes a unique combination (a candidate for a composite key in a non-DynamoDB DBMS).
My queries can involve any of these three attributes, such as
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. From most of my searches, I believe I'm supposed to use a combination of the three fields (as a single field, device-type-timestamp) as the id. However, the set function of the Schema doesn't let me use the object's properties (such as this.device), and for other reasons I cannot do it externally.
The closest I got (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on..
However, when using a Query, it becomes difficult to fetch the results for a particular device and type, because the id (the hashKey or rangeKey) is never available to query on.
So the question is: how would you design such a table?
Note that this table is meant to gather data from IoT devices, and each device generates a reading every 5 minutes on average.
I'm curious why you are choosing DynamoDB for this task. Advanced queries like this seem much better suited to a SQL-based database than to a NoSQL database. Because SQL queries are so expressive, in my experience this task is a lot easier in SQL databases. So I would encourage you to think about whether DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little bit. You could do something like having a property that is device-type and that will be the device and type values combined. Then set that as an index, and query based on that and sort by the timestamp, and filter out the results that are not greater than the value you want.
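A rough in-memory sketch of that design, assuming a combined device#type attribute (the # separator is my own choice, to avoid ambiguity when device names contain dashes) and numeric timestamps. deviceTypeKey and valuesAfter are hypothetical helpers that stand in for the index query, the sort and the filter; no DynamoDB or Dynamoose calls are made:

```javascript
// Combined attribute that would be indexed, with the timestamp as the sort key.
function deviceTypeKey(device, type) {
  return `${device}#${type}`;
}

// Emulates: query the index for one device#type value, sort by timestamp,
// and keep only readings newer than `since`. Returns the readings' values.
function valuesAfter(items, device, type, since) {
  const key = deviceTypeKey(device, type);
  return items
    .filter((item) => deviceTypeKey(item.device, item.type) === key)
    .filter((item) => item.timestamp > since)
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((item) => item.value);
}

const readings = [
  { device: "d1", type: "temp", value: 20, timestamp: 100 },
  { device: "d1", type: "temp", value: 21, timestamp: 300 },
  { device: "d1", type: "hum", value: 55, timestamp: 300 },
  { device: "d2", type: "temp", value: 19, timestamp: 300 },
  { device: "d1", type: "temp", value: 22, timestamp: 200 },
];
console.log(valuesAfter(readings, "d1", "temp", 150));
```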
You are correct that Dynamoose currently does not pass the entire object into the set function. This is something I'm personally open to exploring. I'm a member of the GitHub project, and if you would like to submit a PR adding that feature, I would be more than happy to help explore that option with you and get it into the codebase.
The other thing you might want to explore is having a DynamoDB stream, that will set that device-type property whenever it gets added to your DynamoDB table. That would abstract that logic out of DynamoDB and your application. I'm not sure if it's necessary for what you are doing to decouple it to that level, but it might be something you want to explore.
Finally, depending on your setup, you could figure out which property will be more unique, device or type, and set up an index on that property. Then just query based on that, and filter out the results of the other property that you don't want. I'm not sure if that is what you are looking for; it will of course work, but I'm not sure how many items you will have in your table, and questions about scalability arise at a certain level. One way to solve some of those scalability questions might be to set the TTL of your items if you know that the timestamp you are querying for is constant, or predictable ahead of time.
Overall there are a lot of ways to achieve what you are looking to do. Without more detail about how many items, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc. it's hard to give a good solution. I would highly encourage you to think about if NoSQL is truly the best way to go. That query you are looking to do seems a LOT more like a SQL query. Not saying it's impossible in DynamoDB, but it will require some thought about how you want to structure your data model, and such.
Considering charlie-fish's opinion, I decided to dig into Dynamoose and modify the code to pass the model to the attribute's set function. However, I discovered that the model is already passed to the attribute's default function. So I changed my Schema to the following:
id:hashKey;default: function(model){ return model.device + "" + model.type; }
timestamp:rangeKey
For anyone landing on this answer, please note that the default and set functions can access the attribute options and the schema instance through the this keyword. However, both functions must be regular functions rather than arrow functions.
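A small plain-JavaScript demonstration of why the arrow-function restriction matters. It does not use Dynamoose: it calls the functions itself with .call() to simulate a caller that binds this to an options object, and attributeOptions is a hypothetical stand-in for whatever Dynamoose actually binds:

```javascript
// Hypothetical stand-in for the object a caller binds as `this`.
const attributeOptions = { separator: "#" };

// A regular function receives whatever `this` the caller binds.
const regularDefault = function (model) {
  return model.device + this.separator + model.type;
};

// An arrow function ignores the bound `this` and keeps the one from its
// enclosing scope, so it never sees attributeOptions.separator.
const arrowDefault = (model) =>
  model.device + (this ? this.separator : undefined) + model.type;

const model = { device: "d1", type: "temp" };
console.log(regularDefault.call(attributeOptions, model)); // d1#temp
console.log(arrowDefault.call(attributeOptions, model));
```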
I'm keeping this here as an answer, but I won't accept it for some time, as I want to wait for someone else to come up with a better approach.
I also want to make sure that if a value is passed for the id field, it isn't stored. I could probably use set to ignore the incoming value, but I don't know how to do that yet.
As I've worked in my personal lab instance of OpenTSDB, I've started to wonder if it is possible to get it to index on tags as well as metric names. My understanding (correction is welcome...) is that OpenTSDB indexes only on metric names. So, suppose I have something like the following, borrowed from the docs:
tsd.hbase.rpcs{type=*,host=tsd1}
My understanding is that tsd.hbase.rpcs is indexed for searching, but that the keys (type=, host=, etc) are not. Is that correct? If so, is there a way to have them be indexed, or some reasonable approximation of it? Thanks.
Yes, you are correct. According to the documentation, OpenTSDB creates keys in the 'tsdb' HBase table of the form
[salt]<metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
When you do a query with a specific tagk and tagv, OpenTSDB can construct the key and look it up. If you have a range of tagks and tagvs, it will look up all the matching rows and either aggregate them or return multiple time series, depending on your query.
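A simplified, string-based picture of that key layout, assuming readable names instead of binary UIDs and tags sorted by tag key name (as I understand it, OpenTSDB sorts tags by tagk UID and aligns the base timestamp to the hour). rowKey and keysForMetric are hypothetical helpers, not OpenTSDB APIs:

```javascript
// Simplified textual stand-in for an OpenTSDB row key:
// metric | base hour | tagk=tagv pairs (sorted).
function rowKey(metric, timestampSec, tags) {
  const baseHour = timestampSec - (timestampSec % 3600);
  const tagPart = Object.keys(tags)
    .sort()
    .map((k) => `${k}=${tags[k]}`);
  return [metric, baseHour, ...tagPart].join("|");
}

// A prefix on the leading metric field narrows the scan; there is no
// equivalent prefix for "every row with host=tsd1" across all metrics.
function keysForMetric(keys, metric) {
  return keys.filter((key) => key.startsWith(`${metric}|`));
}

const keys = [
  rowKey("tsd.hbase.rpcs", 1356998430, { type: "put", host: "tsd1" }),
  rowKey("sys.cpu.user", 1356998430, { host: "tsd1" }),
];
console.log(keysForMetric(keys, "tsd.hbase.rpcs"));
```

Because every key starts with the metric, a scan can be bounded by metric and time, whereas a question about a tag alone has no leading key prefix to narrow it.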
If you are interested in asking questions about tagks, you should use the OpenTSDB search/lookup API; however, this still requires a metric name.
If you want to formulate your question around tagks only, you could consider forwarding your data to Bosun for indexing and using its API:
/api/metric/{tagk}/{tagv}
Returns the metrics that are available for the specified tagk/tagv pair. For example, you can see what metrics are available for host=server01