How to query NotEqual condition in GQL - google-cloud-datastore

I want to filter out some data and have tried several different criteria.
How do I query with a not-equal condition in GQL?

This isn't possible with Cloud Datastore. See the list of operators here:
https://cloud.google.com/datastore/docs/reference/gql_reference#operators_and_comparisons
Comparators are either equivalence comparators: =, IN, CONTAINS, = NULL, HAS ANCESTOR, and HAS DESCENDANT, or inequality comparators: <, <=, >, and >=.
This is also worth noting:
There is no way to determine whether an entity lacks a value for a property (that is, whether the property has no value). If you use a condition of the form property = NULL, what will occur is a check whether a null value is explicitly stored for that property. Datastore queries that refer to a property will never return entities that don't have a value for that property.
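A small in-memory model can make these rules concrete. It is only a sketch: it assumes a not-equal filter is emulated by merging a < query with a > query (the approach the older NDB client library documents for its != operator), and the entity dicts and the helpers query_eq, query_lt, query_gt and query_not_equal are hypothetical names, not part of any Datastore API.

```python
# Hypothetical in-memory stand-in for a Datastore kind; not a real client API.
ENTITIES = [
    {"id": 1, "status": "NEW"},
    {"id": 2, "status": "DONE"},
    {"id": 3, "status": None},  # null stored explicitly
    {"id": 4},                  # property never set, so it is not indexed
]


def _indexed(entities, prop):
    """Only entities that actually store `prop` appear in its index."""
    return [e for e in entities if prop in e]


def query_eq(entities, prop, value):
    """Equality filter; `value=None` matches only explicitly stored nulls."""
    return [e["id"] for e in _indexed(entities, prop) if e[prop] == value]


def query_lt(entities, prop, value):
    # Simplification: explicit nulls are left out of range scans here; how
    # real Datastore orders null against other types is not modelled.
    return [e["id"] for e in _indexed(entities, prop)
            if e[prop] is not None and e[prop] < value]


def query_gt(entities, prop, value):
    return [e["id"] for e in _indexed(entities, prop)
            if e[prop] is not None and e[prop] > value]


def query_not_equal(entities, prop, value):
    """Emulate != as (prop < value) OR (prop > value), merged client-side."""
    return sorted(set(query_lt(entities, prop, value))
                  | set(query_gt(entities, prop, value)))
```

In this model, entity 4 is never returned by any query on status, which mirrors the quoted rule that queries referring to a property never return entities without a value for it.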

Related

Are built-in index based ancestor queries efficient?

The indexes doc at https://cloud.google.com/datastore/docs/concepts/indexes says that built-in single property indexes can support
Queries using only ancestor and equality filters
Queries using only inequality filters (which are limited to a single property)
Since the built-in index for the property is sorted by the property value, I understand how it supports a single inequality filter. However, how is it able to support an equality filter combined with an ancestor query? Say I have a million rows with the same property value, but the given ancestor condition matches only 100 of those million rows. Would it have to scan all million rows to find the 100 matching ones? I don't think that's the case, as I read somewhere that Cloud Datastore scales with the number of rows in the result set and not the number of rows in the database. So, unless the single property index is internally a multi-column index with the property as the first column and the entity key as the second, I don't see how these ancestor + equality queries can be efficiently supported with built-in single property indexes.
Cloud Datastore built-in indexes are always split into a prefix and a postfix at query time. The prefix portion is the part that remains the same (e.g. equalities or ancestors); the postfix portion is the part that changes (sort order).
Built-in indexes are laid out as:
Kind, PropertyName, PropertyValue, Key
For example, a query: FROM MyKind WHERE A > 1
Would divide the prefix/postfix as:
MyKind,A | range<1, inf>
In the case you're asking about (ancestor with equality), FROM MyKind WHERE __key__ HAS ANCESTOR Key('MyAncestor', 1) AND A = 1 the first part of the prefix is easy:
MyKind,A,1
To understand the ancestor piece, we have to consider that Datastore keys are a hierarchy. In the case of MyKind, the keys might look like: (MyAncestor, 1, MyKind, 345).
This means we can make the prefix for an ancestor + equality query as:
MyKind,A,1,(MyAncestor, 1)
The postfix would then just be all the keys that have (MyAncestor,1) as a prefix and A=1.
This is why you can have an equality with an ancestor using the built-in indexes, but not an inequality with an ancestor.
If you're interested, the video Google I/O 2010 - Next gen queries dives into this in depth.
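The prefix/postfix split described in this answer can be sketched with a sorted list and a binary search. This is an illustrative model only: the tuple layout follows the Kind, PropertyName, PropertyValue, Key order given above, but the sample rows, the scan_prefix helper, and the use of plain Python tuples for key paths are assumptions and say nothing about how Datastore actually encodes its indexes.

```python
from bisect import bisect_left

# Hypothetical rows of the built-in index for property A of MyKind, laid out
# as (kind, property name, property value, key path) and kept sorted.
INDEX = sorted([
    ("MyKind", "A", 1, ("MyAncestor", 1, "MyKind", 345)),
    ("MyKind", "A", 1, ("MyAncestor", 1, "MyKind", 346)),
    ("MyKind", "A", 1, ("MyAncestor", 2, "MyKind", 7)),
    ("MyKind", "A", 1, ("OtherAncestor", 1, "MyKind", 9)),
    ("MyKind", "A", 2, ("MyAncestor", 1, "MyKind", 400)),
])


def scan_prefix(index, kind, prop, value, ancestor):
    """Return keys for `prop = value` under `ancestor` in one contiguous scan."""
    # The prefix (kind, prop, value, ancestor) sorts just before every row
    # that shares it, so a binary search finds the start of the range.
    i = bisect_left(index, (kind, prop, value, ancestor))
    keys = []
    while i < len(index):
        k, p, v, key = index[i]
        # Stop at the first row that no longer shares the prefix.
        if (k, p, v) != (kind, prop, value) or key[:len(ancestor)] != ancestor:
            break
        keys.append(key)
        i += 1
    return keys
```

Because the equality fixes the property value, all rows for one ancestor sit next to each other and the scan touches only matching rows. With an inequality such as A > 1, the value varies across the range, so rows for a given ancestor are no longer contiguous, which matches the restriction stated above.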
According to this documentation "The rows of an index table are sorted first by ancestor and then by property values, in the order specified in the index definition."

How to assert that an array contains an element that matches an expected regexp?

I am trying to assert that an array contains at least one element that matches an expected regular expression, and I am trying to do it in the most by-the-book and the least reinventing-the-wheel way possible.
Ideally I would like to ->assertThat($actualArray, $someConstraintObject), where $someConstraintObject is a PHPUnit\Framework\Constraint\Constraint instance that does the RightThing™, and is only composed from constraints that ship with PHPUnit, without having to write a new constraint class.
One thing I considered which cannot possibly work: a LogicalOr constraint made of RegularExpression constraints. Why can't it work? Because RegularExpression constraint cannot "bind" an expected value; it can only be evaluated for an expected value, and I still have no way of "spreading" the values in an array over that constraint with "or" logic. A LogicalOr of RegularExpressions would allow me to check one actual value against multiple regular expressions, but it won't allow me to check multiple actual values against one regular expression.
How do I?

Cloud Firestore whereNotEqual

Does Firestore support something like whereNotEqual?
For example, I need to get exact documents where key "xyz" is missing.
In Firebase realtime db, we could get it by calling *.equalTo(null).
Thanks.
Firestore does not support a direct equivalent of !=. The supported query operators are <, <=, ==, >, or >= so there's no "whereNotEqual".
You can test if a field exists at all, because all filters and order bys implicitly create a filter on whether or not a field exists. For example, in the Android SDK:
collection.orderBy("name")
would return only those rows that contain a "name" field.
As with explicit comparison there's no way to invert this query to return those rows where a value does not exist.
There are a few work-arounds. The most direct replacement is to explicitly store null then query collection.whereEqualTo("name", null). This is somewhat annoying though because if you don't populate this from the outset you have to backfill existing data once you want to do this. If you can't upgrade all your clients you'll need to deploy a function to keep this field populated.
Another possibility is to observe that missing fields usually indicate that a document is only partially assembled, perhaps because it goes through some state machine or is a sort of union of two non-overlapping types. If you explicitly record the state or type as a discriminant, you can query on that rather than on field non-presence. This works really well when there are only two states/types but gets messy if there are many states.
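The explicit-null workaround can be illustrated with plain dictionaries standing in for documents. This is a rough sketch, not Firestore client code: backfill_null and where_equal_to are hypothetical helpers that only imitate the behaviour described above, where a filter never matches a document that lacks the field.

```python
def backfill_null(documents, field):
    """Store an explicit None for `field` on every document that lacks it."""
    updated = 0
    for doc in documents:
        if field not in doc:
            doc[field] = None
            updated += 1
    return updated


def where_equal_to(documents, field, value):
    """Imitate an equality filter: documents without `field` never match."""
    return [d for d in documents if field in d and d[field] == value]
```

Before the backfill, filtering on None finds nothing, because the field is absent rather than null; after it, the previously incomplete documents become queryable.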
Cloud Firestore now supports whereNotEqualTo in database queries.
Keep in mind if you have more than one field in your query you may have to create a composite index in Cloud Firestore.

Datastore: `SELECT * FROM Entity WHERE property IS NULL` returns no result despite entities without property

I am using Google Cloud Datastore and successfully stored entities.
Now I am trying to query them based on the presence of a property.
I use the "Query by GQL" tab in the Datastore UI: https://console.cloud.google.com/datastore/entities/query/gql
For some entities, I did not specify the property named property when I saved them, so I expect these to be set to NULL.
However, when I query SELECT * FROM Entity WHERE property IS NULL, no result is returned.
This is working as intended:
According to the Datastore documentation:
Null is a value, not the absence of a value.
There is no way to determine whether an entity lacks a value for a property (that is, whether the property has no value). If you use a condition of the form nonexistent = NULL, what will occur is a check whether a null value is explicitly stored for that property. For example, SELECT * FROM Task WHERE nonexistent = NULL will never yield an entity with no value set for property nonexistent.
So to explicitly retrieve entities with property set to NULL, you would have to store the property: NULL for these entities.
What you are trying to achieve is not possible. NULL is not the same as property not present.
But as mentioned by Nick Johnson here,
The workaround would be to provide a default value for the updated property, and query for that value
I haven't tried this personally and I am not sure whether it works, but I suggest you try it.
If it doesn't work, the only other option is to go through every record of the model and update all the existing entities which do not have a value, with a default value.

Operator.IN in Google Datastore

According to Datastore Queries, there is an Operator.IN keyword that allows specifying multiple query values in a single request.
However, it appears to be absent in gcloud-java-datastore:0.2.2.
What's the workaround to minimize the round-trip time of multiple single requests?
Is there any limitation on how many parallel queries are allowed?
The IN operator is a client-side feature of the Python NDB Client Library; it is not a native Cloud Datastore feature.
Under the covers, the client library splits the query by the IN clause and issues a separate query for each of the values. It then merges all the results together client-side to give you the result.
Since it is a client-side feature, you'll note that other query features, such as paging/cursors, cannot really be used with it.
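The split-and-merge behaviour described here can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the NDB implementation: query_in and the run_equality_query callable are hypothetical, entities are plain dicts with a "key" field, and a thread pool is used only to show one way of overlapping the round trips the question asks about.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in(run_equality_query, prop, values):
    """Emulate `prop IN values` with one equality query per value."""
    unique_values = list(dict.fromkeys(values))  # drop duplicates, keep order
    workers = max(1, len(unique_values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Issue the per-value queries concurrently; map() keeps input order.
        result_sets = list(
            pool.map(lambda v: run_equality_query(prop, v), unique_values)
        )
    merged = {}
    for results in result_sets:
        for entity in results:
            # An entity can match several values (e.g. a multi-valued
            # property), so merge by key to avoid returning it twice.
            merged.setdefault(entity["key"], entity)
    return list(merged.values())


# Hypothetical in-memory data and query function, used only to try it out.
TASKS = [
    {"key": 1, "status": "NEW"},
    {"key": 2, "status": "OPEN"},
    {"key": 3, "status": "CLOSED"},
]


def fake_equality_query(prop, value):
    return [t for t in TASKS if t.get(prop) == value]
```

Since each sub-query has its own position in its own index range, a single cursor over the merged result has nothing to point at, which is the paging limitation noted above.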
Alternative
If you are issuing a static list of values for the IN clause (e.g. 'NEW', 'OPEN', 'ASSIGNED'), consider creating a Boolean field that is set at write time (e.g. 'is_active') and pre-calculates the result of the whole IN clause for the entity.
This will perform better and work in client libraries other than NDB.
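As a minimal sketch of this alternative, the flag can be derived just before each write. The prepare_for_write helper, the "status" property name, and the ACTIVE_STATUSES set are assumptions made for the example; only the 'is_active' field name and the three status values come from the answer above.

```python
# Static list that would otherwise appear in the IN clause.
ACTIVE_STATUSES = frozenset({"NEW", "OPEN", "ASSIGNED"})


def prepare_for_write(entity):
    """Return a copy of `entity` with is_active derived from its status."""
    prepared = dict(entity)
    prepared["is_active"] = prepared.get("status") in ACTIVE_STATUSES
    return prepared
```

A reader can then filter on a single equality, is_active = true, instead of fanning out one query per status value.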
In the updated documentation on Datastore Queries, Operator.IN is no longer present.
According to the docs, there is no difference between the IN and = operators:
Comparators are either equivalence comparators: =, IN, CONTAINS, = NULL, HAS ANCESTOR, and HAS DESCENDANT, or inequality comparators: <, <=, >, >=, !=, NOT IN.
Notice that the operator = is another name for the IN and CONTAINS operators.
