how is data stored in dynamo db if hash and range key are same in global secondary index - amazon-dynamodb

I am creating a global secondary index in DynamoDB, and I want to know how items are stored if both the hash key and the range key are the same. In what order will they be stored in the table?

If I am reading your question correctly, you are asking about two different objects, each with a unique primary key in the main table, that are projected into a GSI with the same HASH/RANGE key in the GSI.
Example
Main table
Hash: hash_id
Range: range_id
GSI
Hash: gsi_hash_id
Range: gsi_range_id
Data
{
hash_id: 123,
range_id: 'abc',
gsi_hash_id: 'same',
gsi_range_id: 'also_same'
}
{
hash_id: 234,
range_id: 'bcd',
gsi_hash_id: 'same',
gsi_range_id: 'also_same'
}
The short answer: the items are in no particular order.
The long answer: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
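If a stable order is needed anyway, one option is to sort on the client after the query returns. A minimal sketch, assuming the items arrive as plain objects with the attribute names from the example above; `sortByTableKey` is a hypothetical helper, not part of any SDK:

```javascript
// Items as they might come back from a Query on the GSI.
// Both share the same GSI hash/range key, so their relative order is not guaranteed.
const gsiItems = [
  { hash_id: 234, range_id: "bcd", gsi_hash_id: "same", gsi_range_id: "also_same" },
  { hash_id: 123, range_id: "abc", gsi_hash_id: "same", gsi_range_id: "also_same" },
];

// Sort a copy of the items by the base table's primary key:
// hash_id (number) first, then range_id (string) as a tie-breaker.
function sortByTableKey(items) {
  return [...items].sort((a, b) => {
    if (a.hash_id !== b.hash_id) return a.hash_id - b.hash_id;
    if (a.range_id < b.range_id) return -1;
    if (a.range_id > b.range_id) return 1;
    return 0;
  });
}

const ordered = sortByTableKey(gsiItems);
console.log(ordered.map((item) => item.hash_id));
```

The base table's primary key is unique per item, so it works as a deterministic tie-breaker even when every GSI key is identical.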

Related

DynamoDB filter if primary key contains value

CURRENTLY
I have a table in DynamoDB with a single attribute - Primary Key - that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I am looking to do 2 searches for #B#C# (1) exact match, and (2) containing match, and therefore only want results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient query to run? e.g. QUERY or SCAN
Note:
For (2) I am using the following code, but it is returning all items in DB:
params = {
TableName: 'myTable',
FilterExpression: "contains(#key, :v)",
ExpressionAttributeNames: { "#key": "PK" },
ExpressionAttributeValues: { ":v": "#B#C#" }
}
dynamodb.scan(params,callback)
DynamoDB supports two main types of searches: query and scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the query API:
ddbClient.query(
{
"TableName": "<YOUR TABLE NAME>",
"KeyConditionExpression": "#pk = :pk",
"ExpressionAttributeValues": {
":pk": {
"S": "#B#C#"
}
},
"ExpressionAttributeNames": {
"#pk": "PK"
}
}
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
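One thing worth checking with a scan like this: a single Scan call reads at most 1 MB of data, and the filter is applied after the read, so a page can come back with few or no matches while more pages remain. Below is a rough sketch of following `LastEvaluatedKey` until the table is exhausted. The `scanPage` argument is a hypothetical function you supply; with the AWS SDK for JavaScript v2 it could be something like `(p) => documentClient.scan(p).promise()`.

```javascript
// Collect every item whose key attribute contains `fragment`, page by page.
// `scanPage` is a hypothetical injected function:
//   (params) => Promise<{ Items, LastEvaluatedKey }>
async function scanContaining(scanPage, tableName, keyName, fragment) {
  const matches = [];
  let startKey; // undefined on the first call
  do {
    const params = {
      TableName: tableName,
      FilterExpression: "contains(#key, :v)",
      ExpressionAttributeNames: { "#key": keyName },
      ExpressionAttributeValues: { ":v": fragment },
    };
    // Resume where the previous page stopped.
    if (startKey) params.ExclusiveStartKey = startKey;

    const page = await scanPage(params);
    matches.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey; // absent once the scan is complete
  } while (startKey);
  return matches;
}
```

Every page still consumes read capacity for all the items it reads, not only the ones that pass the filter, which is why this gets expensive at scale.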

Is it safe to set a randomly generated alphanumeric string as the primary partition key and sort key in DynamoDB?

Here is the sample JSON which we are planning to insert into a DynamoDB table. As of now we have organizationID as the primary partition key and __id__ as the sort key. Since we will query based on organizationID, we kept it as the primary partition key. Is it a good approach to keep __id__ as the sort key?
{
"__class__": "package",
"__updated__": "2015-10-19T14:30:13Z",
"__created__": "2015-10-19T12:32:28Z",
"transactions": [
{
transaction1
},
{
transaction2
}
],
"carrier": "USPS",
"organizationID": "6406fa6fd32393908125d4d81ec358",
"barcode": "9400110891302408",
"queryString": [
"xxxxxxx",
"YYYY",
"delivered"
],
"deliveredTo": null,
"__id__": "3232d1a045476786fg22dfg32b82209155b32"
}
As a best practice, you can use a timestamp as the sort key for the above data model. One advantage of a timestamp sort key is that you can sort the data for a particular partition key and identify the most recently updated item. This is a very common use case for a sort key.
It doesn't make much sense to keep both the partition key and the sort key as randomly generated values, because you can't use the sort key efficiently (unless I am missing something here).
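To make that concrete, here is a small sketch of the Query parameters for "latest updated item of one organization". It assumes a table keyed by organizationID (partition) and __updated__ (sort), with __updated__ stored as an ISO 8601 UTC string like the one in the sample, which sorts chronologically as text. `latestItemQuery` and the table name are made up for illustration.

```javascript
// Build Query parameters that return the newest item for one organization.
// Assumed schema: partition key organizationID, sort key __updated__ (ISO 8601 UTC string).
function latestItemQuery(tableName, organizationID) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#org = :org",
    ExpressionAttributeNames: { "#org": "organizationID" },
    ExpressionAttributeValues: { ":org": organizationID },
    ScanIndexForward: false, // descending sort-key order, so the newest item comes first
    Limit: 1,                // only the first (newest) item is needed
  };
}

const latestParams = latestItemQuery("packages", "6406fa6fd32393908125d4d81ec358");
console.log(latestParams);
```

Keep in mind that the partition key plus sort key must still be unique per item. If two packages of the same organization can share an __updated__ value, one possible workaround is a sort key that appends __id__ to the timestamp.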

What's the equivalent DynamoDB solution for this MySQL Query?

I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.
Assume I have a MySQL table like this:
CREATE TABLE foo (
id CHAR(64) NOT NULL,
scheduledDelivery DATETIME NOT NULL,
-- ...other columns...
PRIMARY KEY(id),
INDEX schedIndex (scheduledDelivery)
);
Note the secondary Index schedIndex which is supposed to speed-up the following query (which is executed periodically):
SELECT *
FROM foo
WHERE scheduledDelivery <= NOW()
ORDER BY scheduledDelivery ASC
LIMIT 100;
That is: Take the 100 oldest items that are due to be delivered.
With DynamoDB I can use the id column as primary partition key.
However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key". However, (in MySQL words) I see these problems:
the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
adding id as unique partition key and using scheduledDelivery as "sort key" sounds like an (id, scheduledDelivery) secondary index to me, which makes that index practically useless
I understand that MySQL and DynamoDB require different approaches, so what would be an appropriate solution in this case?
It's not possible to avoid a full table scan with this kind of query.
However, you may be able to disguise it as a Query operation, which would allow you to sort the results (not possible with a Scan).
You must first create a GSI. Let's name it scheduled_delivery-index.
We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.
fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.
GSI keys do not have to be unique, so don't worry if there are two duplicated scheduled_delivery values.
You would query the table like this:
var now = Date.now();
//...
{
TableName: "foo",
IndexName: "scheduled_delivery-index",
ExpressionAttributeNames: {
"#f": "fixed_val",
"#d": "scheduled_delivery"
},
ExpressionAttributeValues: {
":f": 1,
":d": now
},
KeyConditionExpression: "#f = :f and #d <= :d",
ScanIndexForward: true
}
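The parameters above do not include the LIMIT 100 from the MySQL query. A sketch of the same request with a limit added, wrapped in a hypothetical `dueItemsQuery` helper. It assumes scheduled_delivery is stored as a number (epoch milliseconds) so that it compares correctly with Date.now():

```javascript
// Build Query parameters for "the N oldest items that are already due".
// Assumes the GSI described above: partition key fixed_val (always 1),
// sort key scheduled_delivery stored as epoch milliseconds.
function dueItemsQuery(nowMs, limit) {
  return {
    TableName: "foo",
    IndexName: "scheduled_delivery-index",
    KeyConditionExpression: "#f = :f and #d <= :d",
    ExpressionAttributeNames: {
      "#f": "fixed_val",
      "#d": "scheduled_delivery",
    },
    ExpressionAttributeValues: {
      ":f": 1,
      ":d": nowMs,
    },
    ScanIndexForward: true, // ascending sort-key order: oldest first
    Limit: limit,           // counterpart of LIMIT 100 in the MySQL query
  };
}

const dueParams = dueItemsQuery(Date.now(), 100);
console.log(dueParams);
```

Limit caps the number of items the query evaluates. Since there is no FilterExpression here, that is also the number of items returned.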

how to form composite unique key in dynamo db

I have a table in dynamo DB with the fields A, B, C, D and E.
The primary key is A(partition key) and B(sort key).
I want to have another unique constraint for C and D together as a composite unique key. In MySQL I would do something like this:
ALTER TABLE YourTable
add CONSTRAINT YourTable_unique UNIQUE (C, D);
I want to do something similar in DynamoDB, so that when I create a new entry with an already matching composite unique key (C and D), it does not allow me to create that entry.
from documentation:
To write an item only if it doesn't already exist,
use PutItem with a conditional expression that uses the
attribute_not_exists function and the name of the table's
partition key
You can't add a constraint on keys other than the table's partition key (you can't add a constraint on a global secondary index key).
@Eyal Ch is right. I handled it at the application level by doing a scan before saving, like this:
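For reference, the conditional write described in the documentation quote looks roughly like this with DocumentClient-style parameters. This is only a sketch: `putIfAbsent` is a hypothetical helper, and the attribute names A and B come from the question.

```javascript
// Build PutItem parameters that only succeed when no item with the same
// primary key (A, B) exists yet. Otherwise the write is rejected with a
// ConditionalCheckFailedException.
function putIfAbsent(tableName, item) {
  return {
    TableName: tableName,
    Item: item,
    ConditionExpression: "attribute_not_exists(#pk)",
    ExpressionAttributeNames: { "#pk": "A" }, // A is the table's partition key
  };
}

const putParams = putIfAbsent("YourTable", { A: "a1", B: "b1", C: "c1", D: "d1", E: "e1" });
console.log(putParams);
```

Note that this protects the (A, B) primary key only. It does nothing for the (C, D) pair.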
DynamoDBScanExpression expr = new DynamoDBScanExpression();
expr.addFilterCondition("C",new Condition()
.withComparisonOperator(ComparisonOperator.EQ)
.withAttributeValueList(new AttributeValue().withS("value of C")));
expr.addFilterCondition("D",new Condition()
.withComparisonOperator(ComparisonOperator.EQ)
.withAttributeValueList(new AttributeValue().withS("value of D")));
List<TableClass> responseList = mapper.scan(TableClass.class, expr);
if (responseList.size() == 0){
// save the record here
}
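One caveat with scan-then-save: two writers can both run the scan, both see no match, and both save. A different approach (not what this answer does) is to write a separate "marker" item keyed by the (C, D) pair in the same transaction as the real item, with an attribute_not_exists condition on the marker. The sketch below only builds the parameters. The marker table name `YourTableUniqueCD` and its partition key `pk` are assumptions for illustration; with the AWS SDK for JavaScript v2 the result could be passed to `documentClient.transactWrite(...)`.

```javascript
// Build TransactWriteItems parameters that enforce uniqueness of (C, D).
// Hypothetical marker table "YourTableUniqueCD" with partition key "pk".
function uniquePutTransaction(item) {
  // JSON-encode the pair so that ("a#b", "c") and ("a", "b#c") cannot collide.
  const marker = { pk: JSON.stringify([item.C, item.D]) };
  return {
    TransactItems: [
      {
        Put: {
          TableName: "YourTableUniqueCD",
          Item: marker,
          // Cancels the whole transaction if this (C, D) pair is already taken.
          ConditionExpression: "attribute_not_exists(#pk)",
          ExpressionAttributeNames: { "#pk": "pk" },
        },
      },
      {
        Put: { TableName: "YourTable", Item: item },
      },
    ],
  };
}

const txParams = uniquePutTransaction({ A: "a1", B: "b1", C: "c1", D: "d1", E: "e1" });
console.log(JSON.stringify(txParams, null, 2));
```

Both writes succeed or fail together, so the marker and the item cannot get out of sync on insert. Deleting an item, or updating its C or D, would need the marker to be removed or replaced in the same way.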

Tips for querying dynamic object fields in Crate

I have a table such as the one at the end of this question. I insert into the peers_array field a dynamically keyed array/object such as:
{
"130":{
"to":5
},
"175":{
"fr":0
},
"188":{
"fr":0
},
"190":{
"to":5
},
"280":{
"fr":4
}
}
I'm looking for advice on how to wildcard query the key field. Such as:
select * from table where peers_array[*]['to'] > 10
In Elasticsearch I can query like this:
peers_array.*.to: >10
My Table:
CREATE TABLE table (
"id" long primary key,
"sourceRouteId" integer,
"rci" integer,
peers_array object(dynamic),
"partition_date" string primary key
) partitioned by (partition_date) with (number_of_replicas = 0, refresh_interval = 5000);
I'm sorry to say, but this is currently not possible. We'll put it on our backlog. Thanks for reporting this use case.
