Does Firestore support something like whereNotEqual?
For example, I need to get exact documents where key "xyz" is missing.
In the Firebase Realtime Database, we could get this by calling .equalTo(null) on a query.
Thanks.
Firestore does not support a direct equivalent of !=. The supported query operators are <, <=, ==, >, and >=, so there is no "whereNotEqual".
You can test whether a field exists at all, because all filters and orderBy clauses implicitly create a filter on whether the field exists. For example, in the Android SDK:
collection.orderBy("name")
would return only those rows that contain a "name" field.
As with explicit comparisons, there is no way to invert this query to return only those rows where the value does not exist.
There are a few workarounds. The most direct replacement is to explicitly store null and then query collection.whereEqualTo("name", null). This is somewhat annoying, though: if you didn't populate the field from the outset, you have to backfill existing data once you want to do this, and if you can't upgrade all your clients you'll need to deploy a function to keep the field populated.
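For example, a minimal sketch of this approach using the Firestore Android SDK (Java); the "items" collection and document ID here are hypothetical:

FirebaseFirestore db = FirebaseFirestore.getInstance();

// Store the null explicitly so the field exists (and is indexed) on every document.
Map<String, Object> data = new HashMap<>();
data.put("name", null);
db.collection("items").document("item1").set(data, SetOptions.merge());

// Later, match exactly the documents whose "name" is null.
db.collection("items").whereEqualTo("name", null).get();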
Another possibility is to observe that a missing field usually indicates that a document is only partially assembled, perhaps because it goes through some state machine or is a sort of union of two non-overlapping types. If you explicitly record the state or type as a discriminant, you can query on that rather than on field non-presence. This works really well when there are only two states/types, but gets messy if there are many states.
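For example, a minimal sketch in the Android SDK with a hypothetical "state" discriminant on an "orders" collection:

FirebaseFirestore.getInstance()
        .collection("orders")
        .whereEqualTo("state", "DRAFT") // e.g. documents still missing later fields
        .get();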
Cloud Firestore now supports whereNotEqualTo in database queries.
Keep in mind that if you have more than one field in your query, you may have to create a composite index in Cloud Firestore.
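For example, a minimal sketch in the Android SDK, assuming a hypothetical "status" field; note that != queries only match documents where the given field exists:

FirebaseFirestore.getInstance()
        .collection("users")
        .whereNotEqualTo("status", "banned")
        .get();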
I've been reading the DynamoDB docs and was unable to understand whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all the entity codes present in the current document, separated by commas. So the example above would look like:
{
  "entities": [
    {"code": "entity1Code", "name": "entity1Name"},
    {"code": "entity2Code", "name": "entity2Name"}
  ],
  "entitiesGlobalSecondaryIndex": "entity1Code,entity2Code"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entity1Code.
Would this be efficient, or does using a Global Secondary Index this way not make sense, so that DynamoDB would simply check the condition against every document, similar to a scan?
Any help is much appreciated, thanks!
The contains operator of a query cannot be run on a partition key. For a query to use any sort of operator (contains, begins_with, >, <, etc.) it must target a range attribute, a.k.a. your sort key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, a GSI is a replication of the table: there is a slight potential for the data in a GSI to lag behind that of the base table. If the query you're running against this GSI isn't very frequent, you're probably safe from that.
However, if you are trying to do this across the entire table at once, it's no better than a scan.
If what you need is for a specific code to return all of its documents at once, you could build a GSI with that code as the PK. If you add a date field as the SK of this GSI, the results would even be time-sorted. Query against that code in that index and you'll get every single one of them.
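A hedged sketch with the AWS SDK for Java v2; the table name, index name, and attribute names ("docs-table", "code-date-index", "entityCode", "createdAt") are all assumptions for illustration:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// Query the GSI whose partition key is the entity code; results come back
// ordered by the "createdAt" sort key.
QueryRequest request = QueryRequest.builder()
        .tableName("docs-table")
        .indexName("code-date-index")
        .keyConditionExpression("entityCode = :code")
        .expressionAttributeValues(
                Map.of(":code", AttributeValue.builder().s("entity1Code").build()))
        .build();
ddb.query(request).items().forEach(System.out::println);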
Since you may have multiple codes, and if there aren't too many per document, you could use a Sparse Index: if a document has an entity with code "AAAA", you also give it an attribute named AAAA (or AAAAflag or something similar) that simply does not exist unless the entities list contains that code. A GSI on this AAAAflag attribute will then contain only the documents that have that entity code and ignore every document where the attribute does not exist. This may work for you if you can also provide a good PK to keep the partitions well balanced and if you don't have too many codes.
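A hedged sketch of writing such a sparse flag, reusing the client and imports from the snippet above with invented names; the point is that the attribute is simply omitted when the code is absent:

Map<String, AttributeValue> item = new HashMap<>();
item.put("pk", AttributeValue.builder().s("DOC#123").build());
// ... the document's other attributes ...
boolean containsAAAA = true; // assumed application-level check of the entities list
if (containsAAAA) {
    // Present only for documents that contain the code, so a GSI keyed on
    // "AAAAflag" stays sparse and holds only those documents.
    item.put("AAAAflag", AttributeValue.builder().s("1").build());
}
ddb.putItem(PutItemRequest.builder().tableName("docs-table").item(item).build());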
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to get all the documents associated with a particular PK, in the interest of keeping the data your code works with concise. A query with a filter expression still reads everything the key condition matches, but only returns what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know whether it contains any entities of X, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
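For instance, a hedged sketch of a single-PK query with a filter expression, again reusing the client and invented names from above; note the filter runs after the read, so consumed read capacity is unchanged:

QueryRequest filtered = QueryRequest.builder()
        .tableName("docs-table")
        .keyConditionExpression("pk = :pk")
        // Applied after the items are read; only matching items are returned.
        .filterExpression("contains(entityCodes, :code)")
        .expressionAttributeValues(Map.of(
                ":pk", AttributeValue.builder().s("DOC#123").build(),
                ":code", AttributeValue.builder().s("entity1Code").build()))
        .build();
ddb.query(filtered);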
If all you need is counts, then you could keep a count attribute on the document, or a meta document in that partition that contains these values and can be queried directly.
Lastly, and I have no idea whether this would work or not: if your entities attribute is a map type, you might very well be able to filter against the entity code, and maybe even with something like entities.code.contains(value) if it were an SK, but I do not know if this is possible.
I am trying to figure out (at this point I think the answer is no) whether it is possible to build an index on a List attribute and query NOT_CONTAINS on that attribute.
Example table:
Tasks
Task_id: string
solved_by: List<String> # stores list of user_ids who previously solved this task.
My query would be:
Get me all the tasks not yet solved by current_user
select * from tasks where tasks.solved_by NOT_CONTAINS current_user_id
Is it possible to do this without full scans? I tried creating an index key of type L, but the AWS CLI errors out saying Member must satisfy enum value set: [B, N, S].
If this is not possible with dynamodb, please suggest what datastore I can use.
Any help is highly appreciated. Thanks!
As you found out, and as the error you got suggests, this is NOT possible.
However, I'd argue that your design could be improved. Storing a potentially unbounded list of entries (users in your case) inside a single item, which is limited to 400 KB, seems dangerous.
If, instead, you stored the fact that a particular user solved a particular task as a separate item (partition key: task_id, sort key: user_id), then you could easily look up whether a user solved a task or not. You could also store additional information about the particular solution or attempts.
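A hedged sketch of what the lookup could then look like with the AWS SDK for Java v2; the table name and key values are assumptions:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// One item per (task, user) solution turns "has this user solved this task?"
// into a cheap key lookup.
boolean solved = ddb.getItem(GetItemRequest.builder()
        .tableName("tasks-table")
        .key(Map.of(
                "task_id", AttributeValue.builder().s("TASK#42").build(),
                "user_id", AttributeValue.builder().s("USER#alice").build()))
        .build())
        .hasItem();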
If you haven't heard of DynamoDB single table design yet, or how to overload indexes, I can recommend looking at
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
https://www.dynamodbbook.com/
Update
I just realised you care about a negation (NOT_CONTAINS), and for those you can't use an index anyway. For the sort key you can only use positive comparisons (=, <, >, <=, >=, between, begins_with): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.KeyConditionExpressions
So you might have to rethink the whole approach, to better pre-process the data stored in DDB, so it's easier to fetch, or pick a different database.
In your original question, you defined your access pattern as
Get me all the tasks not yet solved by current_user
In a later comment, you clarified that the access pattern is
A solver should be shown a task that is not yet solved by them.
which is a slightly different access pattern.
Here's one way you could fetch a task not yet solved by a user.
In this data model, I chose to model Users and Tasks as separate items. Tasks have numerically increasing IDs. Each User item should start with the lastSolved attribute set to 1. Each time you fetch a new Task for a user, you fetch TASK#{lastSolved+1} and increment the lastSolved attribute by 1.
You could probably take a similar approach by using timestamps instead of numbers... anything sortable, really.
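A hedged sketch of that counter pattern with the AWS SDK for Java v2; the table and attribute names are invented, and a real version would need to handle gaps and concurrent updates, e.g. with a conditional write:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();

// 1. Read the user's progress counter.
Map<String, AttributeValue> user = ddb.getItem(GetItemRequest.builder()
        .tableName("app-table")
        .key(Map.of("pk", AttributeValue.builder().s("USER#alice").build()))
        .build()).item();
long next = Long.parseLong(user.get("lastSolved").n()) + 1;

// 2. Fetch the next unsolved task by its numeric ID.
Map<String, AttributeValue> task = ddb.getItem(GetItemRequest.builder()
        .tableName("app-table")
        .key(Map.of("pk", AttributeValue.builder().s("TASK#" + next).build()))
        .build()).item();

// 3. Bump the counter so the next fetch returns the following task.
ddb.updateItem(UpdateItemRequest.builder()
        .tableName("app-table")
        .key(Map.of("pk", AttributeValue.builder().s("USER#alice").build()))
        .updateExpression("SET lastSolved = :n")
        .expressionAttributeValues(Map.of(":n",
                AttributeValue.builder().n(Long.toString(next)).build()))
        .build());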
Based on the new version of the Firebase dependencies in Flutter, there are new filters such as isNotEqualTo, arrayContains, arrayContainsAny, whereIn, and so on.
I'm trying to create a StreamBuilder with some filters; I want to get the documents that match these conditions:
The "uid" field value Does not equal current uid. (isNotEqualTo)
The "school" field value isEqualTo current user school.(isEqualTo)
The "random" field value isGreaterThanOrEqualTo random number that the app generated it (isGreaterThanOrEqualTo)
stream: fireStore
    .collection(USERS_COLLECTION)
    .where(UID_FIELD, isNotEqualTo: me.uid)
    .where(SCHOOL_FIELD, isEqualTo: me.school)
    .where(RANDOM_FIELD, isGreaterThanOrEqualTo: random)
    .limit(10)
    .snapshots(),
Condition 1 (isNotEqualTo) does not work for me, but 2 and 3 work fine; when I add filter 1, the following error shows up:
All where filters with an inequality (<, <=, >, or >=) must be on the
same field. But you have inequality filters on 'FieldPath([uid])' and
'FieldPath([random])'. 'package:cloud_firestore/src/query.dart':
Failed assertion: line 486 pos 18: 'hasInequality == field'
That is a Firestore limitation:
https://firebase.google.com/docs/firestore/query-data/queries
Note the following limitations for != queries:
Only documents where the given field exists can match the query.
You can't combine not-in and != in a compound query.
In a compound query, range (<, <=, >, >=) and != comparisons must all filter on the same field.
So you can either filter using != on the uid, or filter using >= on random (a != filter counts as an inequality, so it must be on the same field as the range filter).
You cannot do both in the same query.
I'd recommend filtering without the uid and letting your repository filter out the current uid later, once you've received the snapshots.
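As a hedged sketch, shown with the Android SDK in Java since the same limitation applies in every SDK; the collection, field names, and placeholder values stand in for me.uid, me.school, and the generated number:

// Hypothetical placeholders for the current user and the generated number.
String myUid = "currentUid";
String mySchool = "currentSchool";
int random = 42;

FirebaseFirestore.getInstance()
        .collection("users")
        .whereEqualTo("school", mySchool)
        .whereGreaterThanOrEqualTo("random", random)
        .limit(10)
        .get()
        .addOnSuccessListener(snapshot -> {
            List<DocumentSnapshot> others = new ArrayList<>();
            for (DocumentSnapshot doc : snapshot.getDocuments()) {
                // Client-side replacement for the rejected isNotEqualTo filter.
                if (!myUid.equals(doc.getString("uid"))) {
                    others.add(doc);
                }
            }
            // hand "others" to the UI / repository layer
        });

One caveat of this design: after client-side filtering you may end up with fewer than the 10 documents the limit requested, so fetch one or two extra if you need a fixed page size.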
Actually, you can filter on multiple fields by creating a composite index.
Cloud Firestore uses composite indexes to support queries not already supported by single-field indexes.
Cloud Firestore does not automatically create composite indexes like it does for single-field indexes because of the large number of possible field combinations. Instead, Cloud Firestore helps you identify and create required composite indexes as you build your app.
If you attempt the query above without first creating the required index, Cloud Firestore returns an error message containing a link you can follow to create the missing index. This happens any time you attempt a query not supported by an index. You can also define and manage composite indexes manually by using the console or by using the Firebase CLI. For more on creating and managing composite indexes, see Managing Indexes.
https://firebase.google.com/docs/firestore/query-data/index-overview#composite_indexes
Also, if in your code you call requireData on your snapshot, you will see in your logs a direct link to create that index in Firebase.
According to Datastore Queries, there is an Operator.IN keyword allowing you to specify multiple query values in a single request.
However, it appears to be absent from gcloud-java-datastore:0.2.2.
What's the workaround to minimize the round-trip time of multiple single requests?
Is there any limitation on how many parallel queries are allowed?
The IN operator is a client-side feature of the Python NDB Client Library, it is not a native Cloud Datastore feature.
Under the covers, the client library splits the query by the IN clause and issues a separate query for each of the values. It then merges all the results together client-side to give you the result.
Since it is a client-side feature, you'll note that other query features cannot really be used with it, such as paging/cursors.
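A hedged sketch of that client-side splitting, written against the current google-cloud-datastore Java client (the builder method names differ slightly in the old gcloud-java 0.2.x line); the kind and property names are invented:

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import java.util.ArrayList;
import java.util.List;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

// Emulate "status IN ('NEW', 'OPEN', 'ASSIGNED')" with one query per value,
// merging the results client-side; the queries could also be issued from
// parallel threads to cut round-trip time.
List<Entity> merged = new ArrayList<>();
for (String status : List.of("NEW", "OPEN", "ASSIGNED")) {
    Query<Entity> q = Query.newEntityQueryBuilder()
            .setKind("Ticket")
            .setFilter(PropertyFilter.eq("status", status))
            .build();
    datastore.run(q).forEachRemaining(merged::add);
}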
Alternative
If you are issuing a static list of values for the IN clause (e.g. 'NEW', 'OPEN', 'ASSIGNED'), consider creating a Boolean field that is set at write time (e.g. 'is_active') and pre-computes the whole IN clause for the entity.
This will perform better and work in client libraries other than NDB.
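A hedged sketch of that write-time pre-calculation, reusing the client from the previous snippet; the kind, property names, and status values are again invented:

import com.google.cloud.datastore.Key;
import java.util.Set;

// Write side: fold the static IN list into one indexed Boolean.
Key key = datastore.newKeyFactory().setKind("Ticket").newKey("ticket-1");
String status = "OPEN";
Entity ticket = Entity.newBuilder(key)
        .set("status", status)
        .set("is_active", Set.of("NEW", "OPEN", "ASSIGNED").contains(status))
        .build();
datastore.put(ticket);

// Read side: one equality query replaces N merged queries, and it works
// with cursors/paging like any other single query.
Query<Entity> active = Query.newEntityQueryBuilder()
        .setKind("Ticket")
        .setFilter(PropertyFilter.eq("is_active", true))
        .build();
datastore.run(active);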
In the updated documentation on Datastore Queries, Operator.IN is not present anymore.
According to the docs, there is no difference between the IN and = operators:
Comparators are either equivalence comparators: =, IN, CONTAINS, = NULL, HAS ANCESTOR, and HAS DESCENDANT, or inequality comparators: <, <=, >, >=, !=, NOT IN.
Notice that the operator = is another name for the IN and CONTAINS operators.
I'm concerned about read performance; I want to know whether leaving an indexed field's value as null is faster than giving it a value.
I have lots of items with a status field. The status can be, "pending", "invalid", "banned", etc...
My typical request is to find the status "ok" (or null). Since null fields are not saved to the Datastore, it is already a win to avoid a "useless" default value that I can replace with null, so I already use less disk space.
But I was wondering: since the Datastore is NoSQL, it doesn't know about the data structure and it doesn't know that a status column is missing. So how does it perform the status = null check for the request?
Does it have to check all the columns of each row trying to find my column, or is there some smarter mechanism?
For example, an index entry (null = entity key) when we pass a column explicitly saying it is null (and if this is the case, does Objectify respect that and keep the field in the list when passing it to the native API if it's null?).
And mainly, which request is more efficient?
The low-level API (and Objectify) stores and indexes nulls if you specify that a field/property should be indexed. In Objectify, you can specify @IgnoreSave(IfNull.class) or @Unindex(IfNull.class) if you want to alter this behavior. You are probably confusing this with documentation for other data access APIs.
Since GAE only allows you to query on indexed fields, your question is really: is it better to index nulls and query for them, or to query for everything and filter out the non-null values?
This is purely a question of sparsity. If the overwhelming majority of your records contain null values, then you're probably better off querying for everything and filtering out the ones you don't want manually. A handful of extra entity reads are probably cheaper than updating and storing an extra index. On the other hand, if null records are a small percentage of your data, then you will certainly want the index.
This indexing dilemma is not unique to GAE. All databases present this question with respect to low-cardinality fields; it's just that they'll do the table scan (testing and skipping rows) for you.
If you really want to fine-tune this behavior, read Objectify's documentation on Partial Indexes.
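For reference, a hedged sketch of what such a conditionally indexed Objectify entity could look like; the class and field names are illustrative:

import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
import com.googlecode.objectify.condition.IfNull;

@Entity
public class Item {
    @Id Long id;

    // Index the status only when it is null, i.e. only the "ok" records
    // enter the index; a query like filter("status", null) can then use it.
    @Index(IfNull.class)
    String status;
}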
null is also treated as a value in the Datastore, and there will be entries for null values in indexes. The Datastore docs say: "Datastore distinguishes between an entity that does not possess a property and one that possesses the property with a null value".
The Datastore will never check all columns or all records. If you have the property indexed, it will fetch records from the index only. If it is not indexed, you cannot query by that property.
In terms of query performance, it should be the same, but you can always profile and check.