Update data with filter on DynamoDB with PHP SDK - amazon-dynamodb

I have this DynamoDb table:
ID | customer_id | product_code | date_expire
3 | 12 | TRE65GF | 2023-11-15
5 | 12 | WDD2 | 2023-11-15
4 | 44 | BT4D | 2023-06-23
What is the best way, in DynamoDb, to update the "date_expire" field to all customers with the same customer_id?
For example, I want to set date_expire to "2023-04-17" on all rows with customer_id = "12".
Should I do a scan of the table to extract all the "IDs" and then a WriteRequestBatch?
Or is there a quicker way, like normal sql queries ("update table set field=value where condition=xx")?

If this is a common use-case, then I would suggest creating a GSI with a partition key of customer_id:
customer_id | product_code | date_expire | ID
12 | TRE65GF | 2023-11-15 | 3
12 | WDD2 | 2023-11-15 | 5
44 | BT4D | 2023-06-23 | 4
SELECT * FROM mytable.myindex WHERE customer_id = 12
First you do a Query on the customer_id to give you back all the customer's data; then you have a choice of how to update it:
UpdateItem
Depending on how many items are returned, it may be best to just iterate over them and call UpdateItem on each one. UpdateItem is better than PutItem or BatchWriteItem as it is an upsert and not an overwrite, which means you are less likely to corrupt your data due to conflicts/consistency issues.
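Although the original question asks about the PHP SDK, the request shape is the same across SDKs; here is a minimal sketch in Python of building the low-level UpdateItem parameters (the helper name and table name "mytable" are made up for illustration; the key and attribute names come from the question's table):

```python
def build_update_request(table, item_id, new_expire):
    # Low-level UpdateItem parameters for a single row, keyed by ID.
    # SET is an upsert: it only touches date_expire, nothing else.
    return {
        "TableName": table,
        "Key": {"ID": {"N": str(item_id)}},
        "UpdateExpression": "SET date_expire = :d",
        "ExpressionAttributeValues": {":d": {"S": new_expire}},
    }

req = build_update_request("mytable", 3, "2023-04-17")
```

You would pass one such request per item returned by the Query to your SDK's `updateItem` call.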
BatchWriteItem
If you have a large number of items for a customer, BatchWriteItem may be best for speed, as you can write batches of up to 25 items. But as mentioned above, you are overwriting data, which can be dangerous when all you want to do is update.
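The 25-item batching can be sketched as a plain chunking helper (the item shapes are placeholders; only the 25-per-call limit is from DynamoDB's documented API):

```python
def chunk_put_requests(items, size=25):
    # BatchWriteItem accepts at most 25 put/delete requests per call,
    # so split the full item list into batches of that size.
    batches = [items[i:i + size] for i in range(0, len(items), size)]
    return [
        [{"PutRequest": {"Item": item}} for item in batch]
        for batch in batches
    ]

batches = chunk_put_requests([{"ID": {"N": str(i)}} for i in range(60)])
```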
TransactWriteItems
Transactions give you the ability to update batches of up to 100 items at a time, but the caveat is that the batch is ACID compliant, meaning if one item update fails for any reason, they all fail. However, based on your use-case, this may be what you intend to happen.
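A transactional version can be sketched the same way, grouping Update actions into all-or-nothing batches of up to 100 (the function and table names are illustrative; the 100-item limit is TransactWriteItems' documented maximum):

```python
def build_transactions(table, ids, new_expire, batch=100):
    # Each TransactItems group succeeds or fails as a whole (ACID).
    txs = []
    for i in range(0, len(ids), batch):
        txs.append({
            "TransactItems": [
                {"Update": {
                    "TableName": table,
                    "Key": {"ID": {"N": str(item_id)}},
                    "UpdateExpression": "SET date_expire = :d",
                    "ExpressionAttributeValues": {":d": {"S": new_expire}},
                }}
                for item_id in ids[i:i + batch]
            ]
        })
    return txs

txs = build_transactions("mytable", list(range(150)), "2023-04-17")
```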
Examples
PHP examples are available here.

Related

Dynamodb re-order row priority field values and also make them sequential

I have a table in DynamoDB whose records have a priority field, 1-N.
Records are shown to the user in a form, and the user can update a record's priority, which means I need to change the value of the priority field.
One solution is, when the priority of a record changes, to reorder all the records whose priority is greater than it.
For example, if I change a record's priority from 5 to 10, I need to reorder all records whose priority field is greater than 5.
What do you recommend?
DynamoDB stores all items (records) in order of the table's sort attribute. However, you cannot update a key value; you would need to delete and re-add the item every time you update it.
One way to overcome this is to create a GSI. Depending on the throughput required for your table, you may need to artificially shard the partition key. If you expect to consume less than 1000 WCU per second, you won't need to.
gsipk | gsisk | data
1 | 001 | data
1 | 002 | data
1 | 007 | data
1 | 009 | data
Now to get all the data in order of priority you simply Query your index where gsipk = 1.
You can also Update the order attribute gsisk without having to delete and put an item.
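One refinement, going a little beyond the answer above (which shows gapped keys like 001/002/007/009), is to assign gsisk values with deliberate gaps so a record can later be moved between two neighbours without renumbering everything. A sketch, where the step and zero-pad width are arbitrary choices:

```python
def gapped_keys(n, step=5, width=3):
    # Zero-padded gsisk values with gaps: 005, 010, 015, ...
    # Moving a record between 005 and 010 can reuse e.g. 007
    # instead of rewriting every higher-priority record.
    return [str((i + 1) * step).zfill(width) for i in range(n)]

keys = gapped_keys(4)
```

Zero-padding matters because gsisk sorts lexicographically as a string.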

To query Last 7 days data in DynamoDB

I have my dynamo db table as follows:
HashKey (Date), RangeKey (timestamp)
The DB stores data by day (hash key) and timestamp (range key).
Now I want to query the data of the last 7 days.
Can I do this in one query, or do I need to call DDB 7 times, once for each day? The order of the data does not matter, so can someone suggest an efficient query to do that?
I think you have a few options here.
BatchGetItem - The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key. You could specify all 7 primary keys and fire off a single request.
7 calls to DynamoDB. Not ideal, but it'd get the job done.
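For option 2, generating the seven hash keys is trivial; a sketch, assuming the hash key is an ISO date string (the exact format in the question isn't given):

```python
from datetime import date, timedelta

def last_7_day_keys(today):
    # One hash key per day-partition, newest first; fire one
    # Query per key (or pass them all to a batched read).
    return [(today - timedelta(days=i)).isoformat() for i in range(7)]

keys = last_7_day_keys(date(2021, 2, 14))
```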
Introduce a global secondary index that projects your data into the shape your application needs. For example, you could introduce an attribute that represents an entire week by using a truncated timestamp:
2021-02-08 (represents the week of 2021-02-08T00:00:00 - 2021-02-14T23:59:59)
2021-02-15 (represents the week of 2021-02-15T00:00:00 - 2021-02-21T23:59:59)
I call this a "truncated timestamp" because I am effectively ignoring the HH:MM:SS portion of the timestamp. When you create a new item in DDB, you could introduce a truncated timestamp that represents the week it was inserted. Therefore, all items inserted in the same week will show up in the same item collection in your GSI.
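Computing the truncated timestamp at insert time can be sketched like this (using Monday as the week boundary is an assumption; any consistent rule works):

```python
from datetime import date, timedelta

def truncated_week(d):
    # Truncate a date to the Monday of its week; every item
    # inserted in the same week gets the same GSI key.
    return (d - timedelta(days=d.weekday())).isoformat()
```

All items written during the week of 2021-02-08 would then share the GSI partition key "2021-02-08", so the last week is a single Query on the index.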
Depending on the volume of data you're dealing with, you might also consider separate tables to segregate ranges of data. AWS has an article describing this pattern.

Query latest record for each ID in DynamoDB

We have a table like this:
user_id | video_id | timestamp
1 | 2 | 3
1 | 3 | 4
1 | 3 | 5
2 | 1 | 1
And we need to query the latest timestamp for each video viewed by a specific user.
Currently it's done like this:
response = self.history_table.query(
KeyConditionExpression=Key('user_id').eq(int(user_id)),
IndexName='WatchHistoryByTimestamp',
ScanIndexForward=False,
)
It queries all timestamps for all videos of the specified user, but it puts a huge load on the database, because there can be thousands of timestamps across thousands of videos.
I tried to find a solution on the Internet, but as far as I can see all the SQL solutions use GROUP BY, and DynamoDB has no such feature.
There are 2 ways I know of doing this:
Method 1: GSI (Global Secondary Index)
GroupBy is sort of like a partition in DynamoDB (but not really). Your partition key is currently user_id, I assume, but you want video_id as the partition key and timestamp as the sort key. You can do that by creating a new GSI with partition key video_id and sort key timestamp. This gives you the ability to query the latest timestamp for a given video; the query will use only 1 RCU and be super fast, just add --max-items 1 --page-size 1. But you will need to supply the video_id.
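Method 1's query can be sketched as low-level Query parameters (the index name "VideosByTimestamp" is a placeholder; the attribute names come from the question):

```python
def latest_view_query(video_id):
    # Newest item first, stop after one: with the GSI sorted on
    # timestamp, this returns only the latest view of the video.
    return {
        "IndexName": "VideosByTimestamp",
        "KeyConditionExpression": "video_id = :v",
        "ExpressionAttributeValues": {":v": {"N": str(video_id)}},
        "ScanIndexForward": False,  # descending by sort key (timestamp)
        "Limit": 1,                 # only the latest item
    }

q = latest_view_query(3)
```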
Method 2: Sparse Index
The problem with method 1 is that you need to supply an ID, whereas you might just want a list of videos with their latest timestamps. There are a couple of ways to do this; one I like is a sparse index. If you have an attribute called latest and set it to true only on the item with the latest timestamp, you can create a GSI on that attribute, and only the flagged items will appear in it. Note, though, that you will have to set and unset this value yourself, in a Lambda consuming DynamoDB Streams or in your app.
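The set/unset bookkeeping described above can be sketched as a pure decision function (a hypothetical helper for the stream or app logic; only the attribute name latest comes from the answer):

```python
def latest_flag_updates(prev_latest_ts, new_ts):
    # When a new view arrives, decide which items need their
    # 'latest' flag changed: unset it on the old latest item
    # (if any) and set it on the new one, or do nothing.
    if prev_latest_ts is None or new_ts > prev_latest_ts:
        return {"unset": prev_latest_ts, "set": new_ts}
    return {"unset": None, "set": None}
```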
That does seem weird, but this is how NoSQL works as opposed to SQL. I am battling with this myself on a current project, where I have to use some of these techniques; each time it doesn't quite feel right, but hopefully we'll get used to it.

DynamoDB BatchGetItem vs Query

Currently I use table.query to get items matching a partition key, sorted by the sort key. The new requirement is to handle batch queries: a couple of hundred partition keys to match, hopefully still sorted by sort key within each partition key's results. I found BatchGetItem, which can handle up to 100 items per request, but it seems to do no sorting. Is one "item" here one row in DDB, or all rows under one partition key?
From a performance (query speed) and price perspective, which one should I use? And do I have to sort the results myself if I use BatchGetItem? Ideally I'd like a solution that is fast, cost effective, and sorted by sort key within each partition key, but the first two are top priority and I can do the sorting if I have to. Thanks
Query() is cheaper...
BatchGetItem() runs as individual GetItem() calls, each costing at least 1 RCU (assuming each item is less than 4 KB).
Let's say your items are 100 bytes each: a Query() can return 40 of them for 1 RCU, since Query is billed on the total data read, whereas returning those same 40 via BatchGetItem() will cost 40 RCU.
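The cost difference can be sketched numerically from DynamoDB's documented capacity rules (strongly consistent reads; 1 RCU covers 4 KB, with Query billed on cumulative size and BatchGetItem billed per item):

```python
import math

def batch_get_rcu(n_items, item_kb):
    # BatchGetItem bills each item as its own GetItem:
    # 1 RCU per 4 KB of that item, minimum 1 RCU per item.
    return n_items * max(1, math.ceil(item_kb / 4))

def query_rcu(n_items, item_kb):
    # Query bills on the total data read, rounded up to 4 KB once.
    return max(1, math.ceil(n_items * item_kb / 4))
```

For forty 100-byte items this gives 40 RCU vs 1 RCU; the gap shrinks as items approach 4 KB.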

dynamodb query to select all items that match a set of values

In a dynamo table I would like to query by selecting all items where an attributes value matches one of a set of values. For example my table has a current_status attribute so I would like all items that either have a 'NEW' or 'ASSIGNED' value.
If I apply a GSI to the current_status attribute it looks like I have to do this in two queries? Or instead do a scan?
DynamoDB does not recommend using Scan. Use it only when there is no other option and you have a fairly small amount of data.
You need to use GSIs here, but putting current_status in the PK of the GSI would cause a hot-partition issue.
The right solution is to put a random number in the PK of the GSI, ranging from 0..N-1, where N is the number of partitions, and to put the status in the SK of the GSI along with a timestamp or some unique information to keep the PK-SK pair unique. When you want to query based on current_status, execute N queries in parallel, with the PK ranging over 0..N-1 and SK begins_with current_status. N should be chosen based on the amount of data you have. If each row is less than 4 KB, this parallel query operation consumes N read units without the hot-partition issue. The links below provide detailed information on this:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
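The write-sharding pattern above can be sketched as two small helpers (attribute names gsipk/gsisk and the expression shape are illustrative; the N-way fan-out is the technique from the linked docs):

```python
import random

def shard_pk(n_partitions):
    # On write: pick a random shard 0..N-1 for the GSI partition key.
    return random.randrange(n_partitions)

def shard_queries(n_partitions, status):
    # On read: one Query per shard, SK begins_with the status;
    # run these in parallel and merge the results.
    return [
        {
            "KeyConditionExpression":
                "gsipk = :p AND begins_with(gsisk, :s)",
            "ExpressionAttributeValues": {
                ":p": {"N": str(p)},
                ":s": {"S": status},
            },
        }
        for p in range(n_partitions)
    ]

queries = shard_queries(4, "NEW")
```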
