Google Datastore Index Optimization - google-cloud-datastore

While designing key-only queries to filter Google Datastore entities, I am generating many composite indexes that are subsets of another index. Is it possible to use the same composite index for queries that filter on a subset of the properties already indexed? For example, if I have the following key-only queries, would it be possible to have fewer than three indexes?
Query 1: Entities where a = 1, b = 1, c = 1;
Query 2: Entities where a = 1, b = 1;
Query 3: Entities where a = 1;
Here is a sample of the actual query I am working with:
Query<Key> query = Query.newKeyQueryBuilder()
    .setKind("track")
    .setFilter(CompositeFilter.and(
        PropertyFilter.eq("status", 1),
        PropertyFilter.eq("bpm", 138),
        PropertyFilter.eq("artist", "AVB"),
        PropertyFilter.eq("label", "Armada")))
    .setOrderBy(OrderBy.asc("date"))
    .build();

Datastore can merge smaller indexes to serve queries with more equality filters; see the documentation on index merging. Using this feature, a minimal set of indexes for your set of queries would be something like:
index.yaml
indexes:
- kind: track
  properties:
  - name: artist
  - name: date
- kind: track
  properties:
  - name: bpm
  - name: date
- kind: track
  properties:
  - name: label
  - name: date
- kind: track
  properties:
  - name: status
  - name: date
This supports equality queries on any combination of these properties, sorted by date, as in the example below. Note, however, that index merging has a performance trade-off: when each individual filter matches many entities but their intersection is small, the zigzag merge join Datastore performs can be slow.
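For example, a key-only query filtering on just status and bpm can be served by merging the (status, date) and (bpm, date) indexes above, with no dedicated three-property index. A minimal sketch using the same Java client as the question (client setup is illustrative):

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.CompositeFilter;
import com.google.cloud.datastore.StructuredQuery.OrderBy;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

// Served by merging the (status, date) and (bpm, date) indexes defined above.
Query<Key> query = Query.newKeyQueryBuilder()
        .setKind("track")
        .setFilter(CompositeFilter.and(
                PropertyFilter.eq("status", 1),
                PropertyFilter.eq("bpm", 138)))
        .setOrderBy(OrderBy.asc("date"))
        .build();
QueryResults<Key> keys = datastore.run(query);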

Related

DynamoDB filter if primary key contains value

CURRENTLY
I have a table in DynamoDB with a single attribute - Primary Key - that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I am looking to do 2 searches for #B#C#: (1) an exact match, and (2) a containing match, so I only want these results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient query to run? e.g. QUERY or SCAN
Note:
For (2) I am using the following code, but it is returning all items in the DB:
params = {
  TableName: 'myTable',
  FilterExpression: "contains(#key, :v)",
  ExpressionAttributeNames: { "#key": "PK" },
  ExpressionAttributeValues: { ":v": "#B#C#" }
}
dynamodb.scan(params, callback)
DynamoDB supports two main types of searches: query and scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the query API:
ddbClient.query(
  {
    "TableName": "<YOUR TABLE NAME>",
    "KeyConditionExpression": "#pk = :pk",
    "ExpressionAttributeValues": {
      ":pk": {
        "S": "#B#C#"
      }
    },
    "ExpressionAttributeNames": {
      "#pk": "PK"
    }
  }
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
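For reference, a minimal sketch of that containing-match scan with the AWS SDK for Java v2 (client setup is illustrative; table and attribute names are taken from the question):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();
// The filter expression is applied after items are read, so the scan still
// consumes read capacity for every item in the table.
ScanRequest request = ScanRequest.builder()
        .tableName("myTable")
        .filterExpression("contains(#key, :v)")
        .expressionAttributeNames(Map.of("#key", "PK"))
        .expressionAttributeValues(Map.of(
                ":v", AttributeValue.builder().s("#B#C#").build()))
        .build();
ddb.scan(request).items().forEach(System.out::println);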

Google Datastore Composite Index with __key__ desc not working when queried

I have an index created like this:
ancestor: NONE
indexId: CICAgJiUpoMK
kind: Candidate
projectId: financialplanning-270210
properties:
- direction: ASCENDING
name: activity.lastupdatets
- direction: DESCENDING
name: id
state: READY
but when running the following query:
select activity.lastupdatets from candidate order by activity.lastupdatets asc, __key__ desc
It fails with an error like "...no suitable composite index found..."
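A likely cause: the index is built on the entity property id, while the query sorts by __key__, and the two are not interchangeable. A sketch of an index.yaml entry that should match this query, assuming no additional filters:

indexes:
- kind: Candidate
  properties:
  - name: activity.lastupdatets
  - name: __key__
    direction: desc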

DynamoDB OrderBy operation

Table : Customer
Item: CustomerId, PurchaseType, Name, mobilenumber, price, createdDate
DATA1: cus001,"online","BBBBB","yourmobilenumber",6000,"01/07/2017 01:12:05"
DATA2: cus002,"online","myname","mymobilenumber",500,"10/07/2017 01:12:01"
DATA3: cus003,"online","AAAAA","yourmobilenumber",6000,"10/07/2017 01:12:06"
DATA4: cus004,"online","yourname","yourmobilenumber",1000,"10/07/2017 02:12:06"
DATA5: cus005,"retail","yourname","yourmobilenumber",1000,"10/07/2017 03:12:06"
GSI: price-index[PurchaseType,price]
Query with index "price-index"
condition: PurchaseType = "online" and price > 500
ScanIndex: true
How to get the result based on the following conditions:
PurchaseType = "online"
price > 500
order by Name
You need to create a different GSI:
PurchaseType - partition key of the GSI
Name - sort key of the GSI
Then you can run a Query for the required purchase type, receive the results ordered by Name, and add a filter expression to keep only the items with price > 500, as sketched below.
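A sketch of that query with the AWS SDK for Java v2, assuming the new GSI is named purchase_type-name-index (the index name and client setup are illustrative):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import java.util.Map;

DynamoDbClient ddb = DynamoDbClient.create();
QueryRequest request = QueryRequest.builder()
        .tableName("Customer")
        .indexName("purchase_type-name-index") // hypothetical GSI: PurchaseType (partition), Name (sort)
        .keyConditionExpression("#pt = :pt")
        .filterExpression("#pr > :pr") // runs after the key condition; filtered-out items still consume read capacity
        .expressionAttributeNames(Map.of("#pt", "PurchaseType", "#pr", "price"))
        .expressionAttributeValues(Map.of(
                ":pt", AttributeValue.builder().s("online").build(),
                ":pr", AttributeValue.builder().n("500").build()))
        .scanIndexForward(true) // ascending by the sort key, i.e. by Name
        .build();
ddb.query(request).items().forEach(System.out::println);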

What's the equivalent DynamoDB solution for this MySQL Query?

I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.
Assume I have a MySQL table like this:
CREATE TABLE foo (
id CHAR(64) NOT NULL,
scheduledDelivery DATETIME NOT NULL,
-- ...other columns...
PRIMARY KEY(id),
INDEX schedIndex (scheduledDelivery)
);
Note the secondary index schedIndex, which is supposed to speed up the following query (executed periodically):
SELECT *
FROM foo
WHERE scheduledDelivery <= NOW()
ORDER BY scheduledDelivery ASC
LIMIT 100;
That is: Take the 100 oldest items that are due to be delivered.
With DynamoDB I can use the id column as primary partition key.
However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key". However, (in MySQL terms) I see these problems:
the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
adding id as the unique partition key and using scheduledDelivery as the "sort key" sounds like an (id, scheduledDelivery) secondary index to me, which makes that index practically useless
I understand that MySQL and DynamoDB require different approaches, so what would be an appropriate solution in this case?
It's not possible to serve this query with the table's primary key alone.
However, you can turn it into a Query operation against a global secondary index, which also allows you to sort the results (not possible with a Scan).
You must first create a GSI. Let's name it scheduled_delivery-index.
We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.
fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.
GSI keys do not have to be unique, so don't worry if two items share the same scheduled_delivery value.
You would query the table like this:
var now = Date.now();
//...
{
  TableName: "foo",
  IndexName: "scheduled_delivery-index",
  ExpressionAttributeNames: {
    "#f": "fixed_val",
    "#d": "scheduled_delivery"
  },
  ExpressionAttributeValues: {
    ":f": 1,
    ":d": now
  },
  KeyConditionExpression: "#f = :f and #d <= :d",
  ScanIndexForward: true, // ascending by scheduled_delivery
  Limit: 100 // matches the LIMIT 100 in the original SQL
}

How do you override sequential id allocation in a seed file?

I have the following seed file entry:
shapes = Shape.create([
  {
    id: 1,
    name: 'Rectangle',
    surface_count: 4,
    created_at: Time.now,
    updated_at: Time.now
  },
  {
    id: 3,
    name: 'H-shape',
    surface_count: 12,
    created_at: Time.now,
    updated_at: Time.now
  }
])
When I seed my SQLite database I end up with two rows that have ids one and two, not one and three. This is just a sample. The actual table is much larger. I am trying to get my test environment to match my production environment where the second row was just deleted and the remaining rows were left as-is.
SQLite allows you to override an autoincrement column simply by including the field in your INSERT statement.
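For example (illustrative; Rails maps the Shape model to a shapes table), INSERT INTO shapes (id, name, surface_count) VALUES (3, 'H-shape', 12); stores the row with id 3 rather than the next value in the sequence.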
From your description, it seems likely that your actual INSERT statement does not include id and that id is defined as autoincrement. If you are using a framework to generate the SQL, check how the model is defined there and how the framework treats autoincrement fields. Tracing the actual SQL statement issued during seeding should tell you.
