Operator.IN in Google Datastore - google-cloud-datastore

According to the Datastore Queries documentation, there is an Operator.IN keyword that allows specifying multiple query values in a single request.
However, it appears to be absent from gcloud-java-datastore:0.2.2.
What is the workaround to minimize the round-trip time of multiple single requests?
Is there any limit on how many parallel queries are allowed?

The IN operator is a client-side feature of the Python NDB Client Library; it is not a native Cloud Datastore feature.
Under the covers, the client library splits the query on the IN clause and issues a separate query for each of the values. It then merges all the results together client-side to give you the result.
Since it is a client-side feature, you'll note that other query features, such as paging/cursors, cannot really be used with it.
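The split-and-merge behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the NDB implementation: run_equality_query is a hypothetical stand-in for one equality query against the datastore (here it only scans an in-memory list), and the thread pool is an assumption about how the sub-queries might be issued in parallel to cut round-trip time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a Datastore kind.
ENTITIES = [
    {"key": 1, "status": "NEW"},
    {"key": 2, "status": "OPEN"},
    {"key": 3, "status": "CLOSED"},
    {"key": 4, "status": "ASSIGNED"},
]


def run_equality_query(prop, value):
    """Hypothetical single '=' query; a real client would call the datastore here."""
    return [e for e in ENTITIES if e.get(prop) == value]


def emulate_in(prop, values, max_workers=4):
    """Emulate `prop IN values` with one equality query per value, merged client-side."""
    unique_values = list(dict.fromkeys(values))  # drop duplicate values, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_sets = list(pool.map(lambda v: run_equality_query(prop, v), unique_values))
    merged = {}
    for results in result_sets:
        for entity in results:
            merged.setdefault(entity["key"], entity)  # de-duplicate by key
    return list(merged.values())


active = emulate_in("status", ["NEW", "OPEN", "ASSIGNED"])
```

Because the merge happens in the client, no single cursor spans the sub-queries, which is why paging does not compose with this approach.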
Alternative
If you are issuing a static list of values for the IN clause (e.g. 'NEW', 'OPEN', 'ASSIGNED'), consider creating a Boolean field that is set at write time (e.g. 'is_active') and precomputes the result of the whole IN clause for the entity.
This will perform better and work in client libraries other than NDB.
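As a rough sketch of the write-time flag, assuming a hypothetical status property and the example value list from above. The entity is a plain dict here rather than a real client-library entity, and prepare_for_write is an illustrative helper name.

```python
# Statuses that the IN clause would have listed (taken from the example above).
ACTIVE_STATUSES = frozenset({"NEW", "OPEN", "ASSIGNED"})


def prepare_for_write(entity):
    """Return a copy of the entity with the precomputed 'is_active' flag set."""
    prepared = dict(entity)
    prepared["is_active"] = prepared.get("status") in ACTIVE_STATUSES
    return prepared


ticket = prepare_for_write({"id": 42, "status": "OPEN"})
closed = prepare_for_write({"id": 43, "status": "CLOSED"})
```

A single equality filter on is_active then replaces the multi-value IN clause; the flag has to be recomputed whenever status changes.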

In the updated documentation on Datastore Queries, Operator.IN is no longer present.

According to the docs, there is no difference between the IN and = operators:
Comparators are either equivalence comparators: =, IN, CONTAINS, = NULL, HAS ANCESTOR, and HAS DESCENDANT, or inequality comparators: <, <=, >, >=, !=, NOT IN.
Notice that the operator = is another name for the IN and CONTAINS operators.

Related

Dynamodb: Index on List attribute and query NOT_CONTAINS

I am trying to figure out (at this point I think the answer is no) whether it is possible to build an index on a List attribute and query NOT_CONTAINS on that attribute.
Example table:
Tasks
Task_id: string
solved_by: List<String> # stores list of user_ids who previously solved this task.
My query would be:
Get me all the tasks not yet solved by current_user
select * from tasks where tasks.solved_by NOT_CONTAINS current_user_id
Is it possible to do this without full scans? I tried creating an attribute of type L, but the AWS CLI errors out saying Member must satisfy enum value set: [B, N, S]
If this is not possible with dynamodb, please suggest what datastore I can use.
Any help is highly appreciated. Thanks!

As you found out, and as the error you got suggests, this is NOT possible.
However, I'd ask whether your design could be improved. Storing a potentially unbounded list of entries (users in your case) inside a single item, which is limited to 400 KB, seems dangerous.
If you instead stored, for each task, the fact that a particular user solved it as a separate item (partition key: task_id, sort key: user_id), then you could easily look up whether a user solved a task or not. You could also store additional information about the particular solution or attempts.
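A minimal sketch of that layout, using an in-memory dict in place of a DynamoDB table so it runs without AWS. The (task_id, user_id) tuple plays the role of the composite primary key; the helper names and the attempts attribute are illustrative assumptions.

```python
# In-memory stand-in for a table keyed by (partition key, sort key).
solutions = {}


def record_solution(task_id, user_id, **details):
    """Store one item per (task, user) pair, like a put with a composite key."""
    solutions[(task_id, user_id)] = {"task_id": task_id, "user_id": user_id, **details}


def has_solved(task_id, user_id):
    """Point lookup on the full key, analogous to a get by partition and sort key."""
    return (task_id, user_id) in solutions


record_solution("task-1", "alice", attempts=2)
record_solution("task-1", "bob")
```

This answers "did this user solve this task?" with a single key lookup, and each item stays small no matter how many users solve a task.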
If you haven't heard of DynamoDB single table design yet, or how to overload indexes, I can recommend looking at
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
https://www.dynamodbbook.com/
Update
I just realised, you care about a negation (NOT_CONTAINS) - for those, you can't use an index anyway. For the sort key you can only use positive comparison (=, <, >, <=, >=, between, begins_with): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.KeyConditionExpressions
So you might have to rethink the whole approach, to better pre-process the data stored in DDB, so it's easier to fetch, or pick a different database.

In your original question, you defined your access pattern as
Get me all the tasks not yet solved by current_user
In a later comment, you clarified that the access pattern is
A solver should be shown a task that is not yet solved by them.
which is a slightly different access pattern.
Here's one way you could fetch a task not yet solved by a user.
In this data model, I chose to model Users and Tasks as separate items. Tasks have numerically increasing IDs. Each User item should start with the lastSolved attribute set to 1. Each time you fetch a new Task for a user, you fetch TASK#{lastSolved+1} and increment the lastSolved attribute by 1.
You could probably take a similar approach by using timestamps instead of numbers... anything sortable, really.
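The counter approach can be sketched like this, with plain dicts standing in for the User and Task items. The key formats USER#... and TASK#..., the helper name, and the choice to leave the counter untouched when no further task exists are assumptions for illustration; against a real table the increment would presumably need to be an atomic update.

```python
# Hypothetical in-memory items; in DynamoDB these would be separate User and Task items.
tasks = {f"TASK#{n}": {"title": f"Task {n}"} for n in range(1, 4)}
users = {"USER#alice": {"lastSolved": 1}}  # starting value as described above


def next_task_for(user_key):
    """Fetch TASK#{lastSolved+1} and advance the counter; None if no task is left."""
    user = users[user_key]
    task_key = f"TASK#{user['lastSolved'] + 1}"
    if task_key not in tasks:
        return None  # leave the counter untouched when there is nothing to serve
    user["lastSolved"] += 1
    return task_key


first = next_task_for("USER#alice")
second = next_task_for("USER#alice")
third = next_task_for("USER#alice")
```

Each call is a single key lookup plus one counter update, so no scan or negated filter is needed.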

How do we apply the ORDER BY and LIMIT,OFFSET Clauses to DAO.fetch(query) in cn1-data-access?

DAO.fetch(query) allows us to get a collection of entities from the SQLite database that meet the query condition. query can be a Map or a String[]. How can we specify ordering with the ORDER BY clause, and how do we apply the LIMIT and OFFSET clauses? Or do we have to fall back to db.execute(query)?

Currently ORDER BY, LIMIT, and OFFSET clauses aren't supported. It wouldn't be hard to add. Please file an RFE.
Alternatively it wouldn't be difficult to add this in your own DAO subclass. You can see how fetch(query) is implemented here.

Do a wildcard query in Datastore GQL

I'm trying out Google Cloud's datastore, and have run into a scenario I can't figure out.
I've got two entities of kind searchterm, both with a searchterm property, one "pink chicken", and the other with "red duck".
I'm attempting to use the GQL select * from searchterm where searchterm contains "chicken"
to retrieve the entity that has the searchterm property of "pink chicken". However, it doesn't seem to allow me to do that.
I have to fully state select * from searchterm where searchterm contains "pink chicken" to get the relevant response.
Does contains in GQL not mean what it implies? Would it be possible for me to perform a GQL query that has a wildcard in it to match strings?
Yes, I checked, that searchterm property IS indexed.
Thanks! :D

Cloud Datastore does not support this kind of query. CONTAINS (or contains) is just another name for equality; it does not search substrings. For cases like yours, use the Search API.
See this quote from the documentation:
Notice that the operator = is another name for the IN and CONTAINS operators. For example, <value> = <property-name> is the same as <value> IN <property-name>, and <property-name> = <value> is the same as <property-name> CONTAINS <value>. Also <property-name> IS NULL is the same as <property-name> = NULL.
As for the fact that Datastore does not support this kind of query, see:
Restrictions on queries
The nature of the index query mechanism imposes certain restrictions on what a query can do. Cloud Datastore queries do not support substring matches, case-insensitive matches, or so-called full-text search. The NOT, OR, and != operators are not natively supported, but some client libraries may add support on top of Cloud Datastore.

You can use '%' in GQL, so try filtering the query for '%chicken%'; it should bring the results you're looking for.

Cloud Firestore whereNotEqual

Does Firestore support something like whereNotEqual?
For example, I need to get exactly the documents where the key "xyz" is missing.
In Firebase realtime db, we could get it by calling *.equalTo(null).
Thanks.

Firestore does not support a direct equivalent of !=. The supported query operators are <, <=, ==, >, or >= so there's no "whereNotEqual".
You can test if a field exists at all, because all filters and order bys implicitly create a filter on whether or not a field exists. For example, in the Android SDK:
collection.orderBy("name")
would return only those rows that contain a "name" field.
As with explicit comparison there's no way to invert this query to return those rows where a value does not exist.
There are a few work-arounds. The most direct replacement is to explicitly store null then query collection.whereEqualTo("name", null). This is somewhat annoying though because if you don't populate this from the outset you have to backfill existing data once you want to do this. If you can't upgrade all your clients you'll need to deploy a function to keep this field populated.
Another possibility is to observe that usually missing fields indicate that a document is only partially assembled perhaps because it goes through some state machine or is a sort of union of two non-overlapping types. If you explicitly record the state or type as a discriminant you can query on that rather than field non-presence. This works really well when there are only two states/types but gets messy if there are many states.

Cloud Firestore now supports whereNotEqualTo in database queries.
Keep in mind if you have more than one field in your query you may have to create a composite index in Cloud Firestore.

Riak search queries via the java client

I am trying to perform queries using the OR operator as following:
MapReduceResult result = riakClient
    .mapReduce("some_bucket", "Name:c1 OR c2")
    .addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true)
    .execute();
I only get the 1st object in the query (where name='c1').
If I change the order of the query (i.e. Name:c2 OR c1) again I get only the first object in query (where name='c2').
Is the OR operator (and other query operators) supported in the Java client?

I got this answer from a Basho engineer, Sean C.:
You either need to group the terms or qualify both of them. Without a field identifier, the search query assumes that the default field is being searched. You can determine how the query will be interpreted by using the 'search-cmd explain' command. Here are two alternative ways to express your query:
Name:c1 OR Name:c2
Name:(c1 OR c2)
Both options worked for me!
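If the terms come from a list, the two query forms above could be built with small helpers like these. This is only a string-building sketch: the function names are hypothetical, and no escaping of special query characters is attempted.

```python
def qualify_terms(field, terms):
    """Prefix every term with the field: Name:c1 OR Name:c2."""
    return " OR ".join(f"{field}:{term}" for term in terms)


def group_terms(field, terms):
    """Group the terms under one field prefix: Name:(c1 OR c2)."""
    return f"{field}:({' OR '.join(terms)})"


qualified = qualify_terms("Name", ["c1", "c2"])
grouped = group_terms("Name", ["c1", "c2"])
```

Either string can then be passed as the query argument in place of "Name:c1 OR c2".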
