How to create/update an item in DynamoDB? - amazon-dynamodb

I tried the sample code below to create new items in DynamoDB. Based on the docs for DynamoDB and boto3, the sample code adds the items to DynamoDB in a batch, but just from the code it looks like put_item is being called in each iteration of the for loop. Any thoughts? Also, am I right that for updating items there is no batch operation, and we have to call update_item one at a time?
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')

with table.batch_writer() as writer:
    for item in somelist:
        writer.put_item(Item=item)

Note that you called the put_item() method on the writer object. This writer object is a batch writer - it is a wrapper around the original table object. This wrapper doesn't perform every put_item() request individually. Instead, as its name suggests, the batch writer collects batches of up to 25 writes in memory, and only on the 25th call does it send all 25 writes as one DynamoDB BatchWriteItem request.
Then, at the end of the loop, the writer object is destroyed when the with block ends, and this sends the final partial batch as one last BatchWriteItem request.
As you can see, Python made efficient writing using batches very transparent and easy.

The boto3 batch writer buffers internally and sends each batch automatically. It’s like magic.
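To make that buffering concrete, here is a rough sketch of roughly what the wrapper does with the low-level client. The table name and helper name are placeholders, and note that the low-level client expects typed attribute values (e.g. {'S': 'abc'}) rather than the plain Python values the resource-level batch_writer accepts:

import boto3

client = boto3.client('dynamodb')

def write_in_batches(items, table_name='my-table'):
    # Roughly what table.batch_writer() does for you: chunk the items into
    # groups of 25 and send each group as a single BatchWriteItem request.
    for i in range(0, len(items), 25):
        chunk = items[i:i + 25]
        response = client.batch_write_item(
            RequestItems={
                table_name: [{'PutRequest': {'Item': item}} for item in chunk]
            }
        )
        # A real implementation (and the batch_writer itself) would also
        # retry anything returned in response['UnprocessedItems'].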

Related

CreateBatchWrite with DynamoDBContext Update/Insert C#

I have a list of files that should be inserted or updated in DynamoDB, so I'm doing it this way:
var batch = _dynamoDbContext.CreateBatchWrite<MyEntity>();
batch.AddPutItems(myEntityList);
batch.ExecuteAsync();
This works fine if the DynamoDB table is empty, but sometimes I need to update instead of insert, and then I get the following error:
An item with the same key has already been added. Key: Amazon.DynamoDBv2.DocumentModel.Key
How can I solve it? I need to use a batch operation because of performance.
You can use transactions to do inserts or updates, but they are double the cost; otherwise you will need to update the items one by one.
Here's some more info in a previous post:
DynamoDB Batch Update
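The question uses the .NET SDK, but as a rough illustration of the transaction approach in boto3 terms (the table, key and attribute names below are made up): TransactWriteItems accepts Update actions, and an update creates the item if it does not yet exist, so a transaction of updates behaves like an insert-or-update.

import boto3

client = boto3.client('dynamodb')

# Hypothetical table/attribute names. A transaction is limited to a fixed
# number of actions (originally 25, later raised to 100) and each action
# consumes twice the write capacity of an ordinary write.
client.transact_write_items(
    TransactItems=[
        {
            'Update': {
                'TableName': 'MyEntity',
                'Key': {'id': {'S': 'file-1'}},
                'UpdateExpression': 'SET #c = :c',
                'ExpressionAttributeNames': {'#c': 'content'},
                'ExpressionAttributeValues': {':c': {'S': 'new value'}},
            }
        },
        # ... one entry per item to insert or update
    ]
)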

DynamoDB Stream is showing up both INSERT and UPDATE for a new record insertion

I'm seeing 2 events (INSERT and MODIFY) on my DynamoDB table, which is a global table with one global secondary index.
As a result, the configured trigger (Lambda) is executing 2 times for a single insertion, adding extra processing cost.
I couldn't find any documentation that explains 2 events for a single DB insertion.
Can anyone help me understand it?
Thanks.

DynamoDB Batch Update

Is there any API in DynamoDB to update a batch of items? There is an API to write new items in batches (BatchWriteItem) and update single item using UpdateItem, but is it possible to update multiple items in one call?
There is no batch update item API available in DynamoDB at the moment.
DynamoDB API operations list
I know this is an old question by now, but DynamoDB recently added a Transaction API which supports updates:
Update — Initiates an UpdateItem operation to edit an existing item's attributes or add a new item to the table if it does not already exist. Use this action to add, delete, or update attributes on an existing item conditionally or without a condition.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html
I reached this thread on a similar query; hope this might help.
DynamoDB supports Batch Statement Execution, which is described in the documentation. This works with the client object rather than the resource object. I then used the PartiQL UPDATE statement supported by DynamoDB and described here.
The Python code looks something like this:
client = boto3.client('dynamodb')

# Limit to 25 statements per batch; pick one form of the RETURNING clause
# (the options are [ALL|MODIFIED] [NEW|OLD] *), e.g. RETURNING ALL OLD *.
batch = [
    "UPDATE users SET active='N' WHERE email='<user_email>' RETURNING ALL OLD *;",
    "UPDATE users ...",
]
request_items = [{'Statement': _stat} for _stat in batch]
batch_response = client.batch_execute_statement(Statements=request_items)
This is minimal code. You can use multi-threading to execute multiple batches at once.
With PartiQL you can execute batch inserts and updates just like in SQL.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.multiplestatements.batching.html
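Since each BatchExecuteStatement call accepts at most 25 statements, a simple chunking helper (hypothetical name, minimal sketch) can split a larger list of PartiQL statements into compliant batches:

def execute_partiql_batches(client, statements, chunk_size=25):
    # Split the statements into chunks of at most 25 and run one
    # BatchExecuteStatement call per chunk, collecting the responses.
    responses = []
    for i in range(0, len(statements), chunk_size):
        chunk = statements[i:i + chunk_size]
        responses.append(client.batch_execute_statement(
            Statements=[{'Statement': s} for s in chunk]
        ))
    return responses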
BatchWriteItem cannot update items. To update items, use the UpdateItem action.
The BatchWriteItem operation puts or deletes multiple items in one or more tables.
Reference: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
I use DynamoDBMapper.batchSave(Iterable<? extends Object> objectsToSave) for this purpose.
No, there is no batch update currently. You can use single UpdateItem calls and put a workflow over them, such as AWS SWF or AWS Step Functions.
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html
I use a DynamoDB update trigger; then I made a template that tells me which items I should modify, put them on a queue, and then read the queue messages in order to update the items one by one.
Not exactly a batch update, but I did this in a Python Lambda function just now:
import json
import boto3

client = boto3.client('dynamodb')

def lambda_handler(event, context):
    idList = [
        "id1",
        "id2",
        # ...
        "id100",
    ]
    for itemID in idList:
        response = client.update_item(
            TableName='Your-Table-Name',
            Key={
                'id': {
                    'S': itemID
                }
            },
            UpdateExpression="set expressionToChange=:r",
            ExpressionAttributeValues={
                ':r': {'S': 'New_Value'}
            },
            ReturnValues="UPDATED_NEW")
    return
To get the idList, I downloaded the values in a CSV, copied them into VSCode, and then did a regex find and replace (CMD-F, then click the .* icon), setting
find to .* and replace to "$0",
which basically replaces every line with itself wrapped in quotes and followed by a comma.
So basically before:
id1
id2
id3
...
And after:
"id1",
"id2",
"id3",
...
Just replace "idList = [...]" with your own IDs, and likewise "Your-Table-Name", "expressionToChange" and, lastly, "New_Value".
Also, you will have to give your Lambda function permission to update items in DynamoDB (the "dynamodb:UpdateItem" IAM action) or you will get an error.
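As an alternative to the editor find-and-replace, the ID list can also be built straight from the exported CSV. This is just a sketch that assumes a hypothetical file ids.csv with one ID per row in the first column:

import csv

# Hypothetical file name and layout: one ID per row, first column.
with open('ids.csv', newline='') as f:
    idList = [row[0] for row in csv.reader(f) if row]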

Change the schema of a DynamoDB table: what is the best/recommended way?

What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table so that the primary hash key is now UPI. We want to support, for some time, both the current system, which uses SSN, and the new system, which uses UPI, so we need both columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table using UPI and not SSN as the value for the hash key, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN. If SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits than SSN, then you will never have an overlap.
Lazy migration to the same table, using Streams. When streams becomes generally available, you will be able to turn on a Stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and whenever you detect a change to an item that adds a UPI to an existing person in the Person table, create a Lambda function that removes the person keyed at SSN and add a person with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding an atomic counter-version attribute to the item and conditioning the DeleteItem call on the version attribute.
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person-item in the Person table. Create a stream on Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a lambda function to that stream that writes all the new Persons in a new Person_UPI table when the lambda function detects that a Person with a UPI was changed or when a Person had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in a stream as stream records, so you can do a hot failover to the new Person_UPI table in your application. Reject requests for a few seconds, point your application to the Person_UPI table during that time, and re-enable requests.
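For the race-condition mitigation mentioned in the second approach, the conditional DeleteItem could look roughly like this (the table, key and version attribute names are made up for illustration):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('Person')  # hypothetical table name

def delete_old_item(ssn, expected_version):
    # Delete the SSN-keyed item only if its version attribute still matches
    # the version we read; otherwise another writer got there first.
    try:
        table.delete_item(
            Key={'SSN': ssn},
            ConditionExpression='version = :v',
            ExpressionAttributeValues={':v': expected_version},
        )
    except ClientError as e:
        if e.response['Error']['Code'] != 'ConditionalCheckFailedException':
            raise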
DynamoDB Streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
1. Create a new table (let us call this NewTable), with the desired key structure, LSIs and GSIs.
2. Enable DynamoDB Streams on the original table.
3. Associate a Lambda with the Stream, which pushes each record into NewTable. (This Lambda should trim off the migration flag from Step 5.)
4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has the attributes: Primary Key and Migrated (see Step 5).
5. Scan the GSI created in the previous step (or the entire table) using the filter FilterExpression = "attribute_not_exists(Migrated)" and update each item in the table with a migrate flag (i.e. "Migrated": { "S": "0" }), which sends it to the DynamoDB Stream (using the UpdateItem API, to ensure no data loss occurs). NOTE: You may want to increase write capacity units on the table during the updates.
6. The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable (a sketch of such a handler follows after this answer).
7. Once all items have been migrated, repoint the code to the new table.
8. Remove the original table, and the Lambda function once happy all is good.
Following these steps should ensure you have no data loss and no downtime.
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
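A minimal sketch of the Lambda from Steps 3 and 6, assuming the new table is named NewTable and the flag attribute is Migrated as in the steps above; it copies each inserted or updated item into the new table and drops the flag:

import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.resource('dynamodb')
new_table = dynamodb.Table('NewTable')  # name taken from the steps above
deserializer = TypeDeserializer()

def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'REMOVE':
            continue
        # Requires a stream view type that includes new images.
        image = record['dynamodb']['NewImage']
        item = {k: deserializer.deserialize(v) for k, v in image.items()}
        item.pop('Migrated', None)  # trim off the migration flag
        new_table.put_item(Item=item)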
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
If the changes involve changing the partition key, you can add a new GSI (global secondary index). Moreover, you can always add new columns/attributes to DynamoDB without needing to migrate tables.
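For the GSI route, here is a rough boto3 sketch of adding an index keyed on a new attribute; the table, attribute and index names are placeholders, and a provisioned-capacity table would also need a ProvisionedThroughput block inside Create:

import boto3

client = boto3.client('dynamodb')

# Placeholder names; adjust the key schema and projection to your needs.
client.update_table(
    TableName='Person',
    AttributeDefinitions=[{'AttributeName': 'UPI', 'AttributeType': 'S'}],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'UPI-index',
            'KeySchema': [{'AttributeName': 'UPI', 'KeyType': 'HASH'}],
            'Projection': {'ProjectionType': 'ALL'},
        }
    }],
)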

Why can't I recover from a transaction Rollback?

I'm using Entity Framework 6 and hitting a situation where I can't recover from a rolled back transaction.
I need to loop through a list, and for each item, add some entries to two tables. My code is roughly this:
Dim db = New Data.Context
Try
    For Each item In list
        Using tx = db.Database.BeginTransaction
            'add objects to table 1
            'add objects to table 2
            db.SaveChanges()
            tx.Commit()
        End Using
    Next
Catch ex As Exception
    'record the error
End Try
I would expect that it would loop through the whole list, and add entries whenever SaveChanges succeeds, and log them when it fails.
But whenever the SaveChanges call fails, the transaction rolls back and I move to the next item in the list, and then SaveChanges fails for that one too, with the same error. It's as if the context still has the new objects in it and tries to re-save them the next time through the loop. So, during the rollback process, how can I tell the context to forget about those objects so I can continue to loop?
SaveChanges synchronizes your in-memory objects with the database. You have added objects to the in-memory model. They never go away until you delete them.
Adding an object does not queue an insert. It simply adds an object. Until it has been inserted SaveChanges will try to bring the database to the latest state.
EF is not a CRUD helper that you can queue writes to. It tries to conceptually mirror the database in memory. SaveChanges simply executes the necessary DML for that.
Use one context per row.

Resources