Datastore: Is there plan to add GQLQuery support? - google-cloud-datastore

I am using the gcloud-python library for a project that needs to serve the following use case:
Get a batch of entities with a subset of their properties (projection).
gcloud.datastore.api.get_multi() gives me batch get but not projection,
and gcloud.datastore.api.Query() gives me projection but not batch get (like an IN query).
AFAIK, GQLQuery provides both IN queries (batch get) and projections. Is there a plan to support GQL queries in the gcloud-python library? Or is there another way to get batching and projection in a single request?

Currently there is no way to request a subset of an entity's properties. When you have the list of keys that you need, you should use get_multi().
Projection Query Background
In Datastore, projection queries are simply index scans.
For example, suppose you write the query SELECT * FROM MyKind ORDER BY myFirstProp, mySecondProp. This query will execute against an index: Index(MyKind, myFirstProp, mySecondProp). This index may look something like:
myFirstProp | mySecondProp | __key__
------------------------------------
a           | 1            | k1
a           | 2            | k2
b           | 1            | k3
For each result in the index, Datastore then looks up the key associated with that index result. If you do a projection query where you project only myFirstProp or mySecondProp or both, Datastore can avoid doing the random access lookup to find the associated entity for each result. This is generally where you get the large performance gain from using projections -- not from the savings of transporting it over the network.
Likewise, if you know the list of keys that you need, you can look up the keys directly -- there is no need to look in an index first.
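To make the difference concrete, here is a toy Python model of the index shown above. It is only an illustration under simplifying assumptions, not how Datastore is implemented: the INDEX list, the ENTITIES dict, the extra "other" property and both function names are made up for this sketch, and a dict access stands in for the random-access entity lookup.

```python
# Toy model: the index rows from the table above, plus an entity store keyed by __key__.
INDEX = [("a", 1, "k1"), ("a", 2, "k2"), ("b", 1, "k3")]
ENTITIES = {
    "k1": {"myFirstProp": "a", "mySecondProp": 1, "other": "x"},
    "k2": {"myFirstProp": "a", "mySecondProp": 2, "other": "y"},
    "k3": {"myFirstProp": "b", "mySecondProp": 1, "other": "z"},
}


def full_query(index, entities):
    """Scan the index, then fetch every entity by key (one lookup per result)."""
    results, lookups = [], 0
    for _first, _second, key in index:
        results.append(entities[key])  # stands in for the random-access lookup
        lookups += 1
    return results, lookups


def projection_query(index):
    """Answer from the index rows alone; no entity lookups are needed."""
    results = [
        {"myFirstProp": first, "mySecondProp": second}
        for first, second, _key in index
    ]
    return results, 0
```

In this model the full query pays one lookup per index row, while the projection pays none, which is the saving described above.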
IN Operator
In Python GQL (not in the similar Cloud Datastore GQL), there is the IN operator, which allows you to write a query that looks something like:
SELECT * FROM MyKind WHERE myFirstProp IN ['a', 'b'].
However, Datastore does not actually support this query natively. Inside the Python client, this will get converted into disjunctive normal form:
SELECT * FROM MyKind WHERE myFirstProp = 'a'
UNION
SELECT * FROM MyKind WHERE myFirstProp = 'b'
This means for each value inside your IN, you'll be issuing a separate Datastore query.
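The expansion can be sketched in a few lines of Python. This is a rough illustration of the idea, not the client's actual code: expand_in_filter and merge_results are hypothetical names, values are formatted with repr() without any escaping, and each result is assumed to be a dict with a "key" entry.

```python
def expand_in_filter(kind, prop, values):
    """Return one equality query string per value in the IN list."""
    return [f"SELECT * FROM {kind} WHERE {prop} = {value!r}" for value in values]


def merge_results(result_sets):
    """Concatenate the per-value results, keeping the first copy of each key."""
    seen, merged = set(), []
    for results in result_sets:
        for entity in results:
            if entity["key"] not in seen:
                seen.add(entity["key"])
                merged.append(entity)
    return merged
```

An IN list with N values therefore costs N queries plus a merge step on the client.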

Related

Querying an extent directly

Is it possible to query database extents directly without specifying the name of a table?
For example, when I run the command .show database extents, I get a list of extents in the given database. If I pick a specific extent ID from the result, or in general any extent ID belonging to that database, is there a way to query it without reference to a table name?
It's not recommended to take any kind of dependency on an extent ID.
Though your use case isn't clear nor standard, it is possible to run a query as follows:
union *
| where extent_id() == '6810147e-1234-1234-1234-d3649e3d3a83'
| take 10

How to introduce indexing to sqlite query in android?

In my Android application, I use Cursor c = db.rawQuery(query, null); to query data from a local SQLite database, and one of the query strings looks like the following:
SELECT t1.* FROM table t1
WHERE NOT EXISTS (
SELECT 1 FROM table t2
WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
However, the query gets very slow when the database gets huge. I have been looking into adding an index to speed up the query, but so far without much success, so it would be great to have some help here, as it is also hard to find examples of this for Android applications.
You can create a composite index for the columns start_time and stop_time:
CREATE INDEX idx_name ON table_name(start_time, stop_time);
You can read in The SQLite Query Optimizer Overview:
The ON and USING clauses of an inner join are converted into
additional terms of the WHERE clause prior to WHERE clause analysis
...
and:
If an index is created using a statement like this:
CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
Then the index might be used if the initial columns of the index
(columns a, b, and so forth) appear in WHERE clause terms. The initial
columns of the index must be used with the = or IN or IS operators.
The right-most column that is used can employ inequalities.
You may have to uninstall the app from the device so that the db is deleted and rerun to recreate it, or increase the version number of the db so that you can create the index in the onUpgrade() method.
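One way to check whether the index is picked up is EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module against an in-memory database rather than Android, and the events table, its sample rows and the index name idx_events_start_stop are assumptions made for the example. The exact plan text depends on the SQLite version, so the code only collects the plan's detail column.

```python
import sqlite3

# The query from the question, run against a hypothetical "events" table.
QUERY = """
SELECT t1.* FROM events t1
WHERE NOT EXISTS (
    SELECT 1 FROM events t2
    WHERE t2.start_time = t1.start_time AND t2.stop_time > t1.stop_time
)
"""


def build_demo_db():
    """Create an in-memory table with sample rows and the composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, start_time INTEGER, stop_time INTEGER)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, 10, 20), (2, 10, 30), (3, 40, 50)],
    )
    conn.execute("CREATE INDEX idx_events_start_stop ON events(start_time, stop_time)")
    return conn


def query_plan(conn):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

If the index name shows up in the output of query_plan(), the subquery is being answered with an index search instead of a full table scan. On Android the same EXPLAIN QUERY PLAN statement can be passed to rawQuery().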

Cosmos db Order by on 'computed field'

I am trying to select data based on a status which is a string. What I want is that status 'draft' comes first, so I tried this:
SELECT *
FROM c
ORDER BY c.status = "draft" ? 0:1
I get an error:
Unsupported ORDER BY clause. ORDER BY item expression could not be mapped to a document path
I checked the Microsoft site and I see this:
The ORDER BY clause requires that the indexing policy include an index for the fields being sorted. The Azure Cosmos DB query runtime supports sorting against a property name and not against computed properties.
Which I guess makes what I want to do impossible with queries... How could I achieve this? Using a stored procedure?
Edit:
About the stored procedure: thinking about it, that would mean I need to retrieve all the data before ordering, which would be bad since I only take at most 100 values from my database. Is there any way to do it without retrieving all the data first?
Thanks!
ORDER BY item expression could not be mapped to a document path.
Basically, we are told we can only sort on properties of the document, not on derived values. c.status = "draft" ? 0:1 is a derived value.
My idea:
Split the query into two parts: first select c.* from c where c.status = 'draft', then select c.* from c where c.status <> 'draft' order by c.status. Finally, combine the results.
Or you could use the stored procedure you mentioned in your question to process the result of select * from c order by c.status, putting the draft data in front of the rest with an if-else condition.
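The two-query idea could look roughly like this in Python. It is a sketch under assumptions: run_query is a hypothetical callable that takes a query string and returns the matching documents (for example a thin wrapper around whatever SDK you use), and drafts_first is a name invented here. TOP is used so that no more than the requested number of documents is fetched in total.

```python
def drafts_first(run_query, limit=100):
    """Return up to `limit` documents, with status 'draft' ahead of the rest."""
    # First query: drafts only, capped at the overall limit.
    drafts = list(run_query(f"SELECT TOP {limit} * FROM c WHERE c.status = 'draft'"))
    remaining = limit - len(drafts)
    if remaining <= 0:
        return drafts[:limit]
    # Second query: everything else, sorted by status, filling the remaining slots.
    others = list(
        run_query(
            f"SELECT TOP {remaining} * FROM c WHERE c.status <> 'draft' ORDER BY c.status"
        )
    )
    return drafts + others
```

Because the second query only asks for the slots the drafts left open, the "max 100" budget from the question is respected without loading the whole collection.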

Dynamodb query expression

Team,
I have a DynamoDB table with a hash key (userid) and a sort key (ages). Say we want to retrieve, for each hash key (userid), the item with the smallest age. What would the query and filter expression be for the DynamoDB query?
Thanks!
I don't think you can do it in a single query. You would need to do a full table scan. If you have a list of hash keys somewhere, then you can do N queries (in parallel) instead.
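The "N queries in parallel" variant might be sketched as follows. The function names are hypothetical, and query_smallest_age stands for whatever single-key query you already have (it takes one user ID and returns that user's smallest age). A thread pool is one reasonable choice here since the work is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor


def smallest_age_per_user(user_ids, query_smallest_age, max_workers=8):
    """Run one query per hash key in parallel and map each user ID to its result."""
    user_ids = list(user_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with user_ids.
        ages = list(pool.map(query_smallest_age, user_ids))
    return dict(zip(user_ids, ages))
```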
[Update] Here is another possible approach:
Maintain a second table that has just a hash key (userID). This table will contain one record per user, holding that user's smallest age. To achieve that, make sure that every time you update the main table you also update the second one if the new age is less than the age currently stored there. You can use a conditional update for that. The update can be done either by the application itself or by an AWS Lambda function listening to the DynamoDB stream. If you then need the smallest age for each user, you still do a full table scan of the second table, but this scan will only read relevant records, so it will be optimal.
There are two ways to achieve that:
If you don't need this data in real time, you can export your data into other AWS systems, such as EMR or Redshift, and run complex analytics queries there. With these you can write SQL expressions using joins and GROUP BY operators.
You can even perform EMR Hive queries on DynamoDB data, but they perform scans, so it's not very cost efficient.
Another option is use DynamoDB streams. You can maintain a separate table that stores:
Table: MinAges
UserId - primary key
MinAge - regular numeric attribute
On every update/delete/insert in the original table, you can query the minimum age for the updated user and store it in the MinAges table.
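For inserts and updates, a conditional write can keep MinAges current without a read first. The helper below only builds the request parameters. It assumes the UserId and MinAge attribute names from the table layout above, and the function name is made up for this sketch. It does not cover deletes, which would still need the minimum to be recomputed.

```python
def build_min_age_update(user_id, new_age):
    """Build keyword arguments for a conditional write to the MinAges table."""
    return {
        "Key": {"UserId": user_id},
        "UpdateExpression": "SET MinAge = :age",
        # Write only if no minimum is stored yet or the new age is smaller.
        "ConditionExpression": "attribute_not_exists(MinAge) OR MinAge > :age",
        "ExpressionAttributeValues": {":age": new_age},
    }
```

With boto3, these could be passed as table.update_item(**build_min_age_update("u1", 25)). When the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException error, which here simply means the stored minimum is already at least as small.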
Another option is to write something like this:
storeNewAge(userId, newAge)
def smallestAge = getSmallestAgeFor(userId)
storeSmallestAge(userId, smallestAge)
But since DynamoDB does not have native transaction support, it's dangerous to run code like that, because you may end up with inconsistent data. You can use the DynamoDB transactions library, but these transactions are expensive. If you use streams instead, you will have consistent data at a very low price.
For a single hash key, you can do it using ScanIndexForward:
YourEntity requestEntity = new YourEntity();
requestEntity.setHashKey(hashkey);
DynamoDBQueryExpression<YourEntity> queryExpression = new DynamoDBQueryExpression<YourEntity>()
    .withHashKeyValues(requestEntity)
    .withConsistentRead(false);
queryExpression.setIndexName(IndexName); // only if you are querying an index
// true = ascending sort-key order, so the smallest age comes first
// (false would return the largest age instead)
queryExpression.setScanIndexForward(true);
queryExpression.setLimit(1); // read only the first item

Riak search queries via the java client

I am trying to perform queries using the OR operator as follows:
MapReduceResult result = riakClient.
mapReduce("some_bucket", "Name:c1 OR c2").
addMapPhase(new NamedJSFunction("Riak.mapValuesJson"), true).
execute();
I only get the first object in the query (where name='c1').
If I change the order of the query (i.e. Name:c2 OR c1), again I get only the first object in the query (where name='c2').
Is the OR operator (and other query operators) supported in the Java client?
I got this answer from a Basho engineer, Sean C.:
You either need to group the terms or qualify both of them. Without a field identifier, the search query assumes that the default field is being searched. You can determine how the query will be interpreted by using the 'search-cmd explain' command. Here are two alternate ways to express your query:
Name:c1 OR Name:c2
Name:(c1 OR c2)
Both options worked for me!
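If the terms come from user input, the two forms can be generated so that no term falls back to the default field. This is a small Python sketch with hypothetical helper names. It only builds the query strings and does no escaping of special characters.

```python
def grouped_or_query(field, terms):
    """Build 'Field:(t1 OR t2)', grouping all terms under one field."""
    return f"{field}:({' OR '.join(terms)})"


def qualified_or_query(field, terms):
    """Build 'Field:t1 OR Field:t2', qualifying every term separately."""
    return " OR ".join(f"{field}:{term}" for term in terms)
```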
