Updating a DynamoDB GSI, old data not updated in new GSI - amazon-dynamodb

I have one table named Ticket:
Ticket {
  id,
  usage,
  affiliationOrganization,
  createdAt,
  ...
}
GSI1:
PartitionKey: usage
SortKey: affiliationOrganization
After some development time, I want to update the SortKey of GSI1:
PartitionKey: usage
SortKey: affiliationOrganization#createdAt
But after modifying the GSI, the old data no longer appears in the index.
Only newly added items are automatically written into the GSI.
(Apologies if my English is unclear.)
I want the old data to be updated into the GSI automatically after the change, so that the index loses no data and queries don't miss any items.

The old data will automatically replicate to the GSI as long as it is eligible, i.e. the item actually contains the index key attributes. Please ensure all your old items have an attribute named affiliationOrganization#createdAt.
Note that this must be a single attribute: DynamoDB does not combine attributes automatically, you must write the composite value yourself.
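For existing items, that typically means a one-off backfill that scans the table and writes the composite attribute. A minimal sketch in Python (boto3), assuming id is the table's only key and that every item already has affiliationOrganization and createdAt; the script itself is an illustration, not part of the question:

import boto3

table = boto3.resource("dynamodb").Table("Ticket")

scan_kwargs = {}
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        # Build the composite value manually; DynamoDB will not concatenate attributes for you.
        composite = f"{item['affiliationOrganization']}#{item['createdAt']}"
        table.update_item(
            Key={"id": item["id"]},
            # The '#' in the attribute name must go through an expression-name alias.
            UpdateExpression="SET #sk = :v",
            ExpressionAttributeNames={"#sk": "affiliationOrganization#createdAt"},
            ExpressionAttributeValues={":v": composite},
        )
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

As soon as an item carries the new attribute, DynamoDB adds it to the GSI automatically; no separate index rebuild is needed.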

Related

How to filter DynamoDb by object property value

I have a DynamoDB table. How should I filter entries in the table where access.role = "ADMIN"?
You would be best served by setting up a Global Secondary Index (GSI). Set the partition key equal to that attribute, and the sort key equal to some other attribute that you can guarantee will be unique. Then use your SDK of choice, or the Query option in the console, select the index, and query for partition_key = ADMIN.
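For example, in Python (boto3) the query against such an index could look like the sketch below; the table name, the index name role-index, and the top-level role attribute are assumptions (a GSI key must be a top-level attribute, so a nested value like access.role would first need to be copied into one):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical table name

# Query the index, not the base table: partition_key = "ADMIN"
resp = table.query(
    IndexName="role-index",  # hypothetical GSI name
    KeyConditionExpression=Key("role").eq("ADMIN"),
)
admins = resp["Items"]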
However, be aware that indexes are a complete replication of the table. DynamoDB is very good at this and relatively fast at it, but there is still the possibility that your index will be briefly out of sync with the actual data (GSIs are eventually consistent). If you are not querying the index very often, you are pretty much fine; if you are calling it very often, you should consider restructuring your table.
DynamoDB is not a SQL database. When setting up a DynamoDB schema you have to consider how you will access your data, i.e. your access patterns. Design your data with the partition key as the value you will already have when looking items up (e.g. I will always have a user ID number) and your sort keys as the individual documents related to that partition key (e.g. a user has a document that is his profile data, a document that is his profile picture URL, a document that is a list of his friends' user numbers, etc.).
Then you use indexes for things like your question, queries that you won't be running very often.

How to future-proof these possible requirement changes (swapping primary key columns) with a DynamoDB table design?

I have the following data structure
item_id String
version String
_id String
data String
_id is simply a UUID to identify the item. There is no need to search for a row by this field yet.
As of now, item_id, an id generated by an external system, is the primary key; i.e. given the item_id, I want to be able to retrieve version, _id and data from the DynamoDB table.
item_id -> (version, _id, data)
Therefore I am setting item_id as the partition key.
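That access pattern is a single-key lookup; as a quick sketch in Python (boto3), where the table name Items and the example id are placeholders:

import boto3

table = boto3.resource("dynamodb").Table("Items")  # hypothetical table name

# item_id is the partition key, so one GetItem returns the whole row.
resp = table.get_item(Key={"item_id": "ext-12345"})
item = resp.get("Item")  # {'item_id': ..., 'version': ..., '_id': ..., 'data': ...}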
I have two questions for future-proofing (evolution of) the above "schema":
In the future, if I want to incorporate version (version number of the item) into the primary key, can I just modify the table and add it to be the partition key?
If I also want to make the data searchable by _id, is it feasible to modify the table to assign _id as the partition key (it is a unique value because it is a UUID) and reassign item_id to be a search key?
I want to avoid creation of new dynamodb table and data migration to create new key structures, because it may lead to down time.
You cannot update primary keys in DynamoDB. From the docs:
You cannot use UpdateItem to update any primary key attributes. Instead, you will need to delete the item, and then use PutItem to create a new item with new attributes.
If you wanted to make data searchable by _id, you could introduce a secondary index with the _id field as the partition key of the index.
For example, if you defined a secondary index on _id, the index would contain the same data as the base table, just presented as a different logical view with _id as the partition key.
DynamoDB doesn't currently have any native versioning functionality, so you'll have to incorporate that into your data model. Fortunately, there's lots of discussion about this use case on the web. AWS has a document of DynamoDB "Best Practices", including an example of versioning.
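Adding such an index does not require a new table. A hedged sketch of the UpdateTable call in Python (boto3), where the table name and index name are placeholders:

import boto3

client = boto3.client("dynamodb")

# Create a GSI keyed on _id; DynamoDB backfills existing items into it automatically.
client.update_table(
    TableName="Items",  # hypothetical table name
    AttributeDefinitions=[{"AttributeName": "_id", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "id-index",  # hypothetical index name
            "KeySchema": [{"AttributeName": "_id", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            # Omit ProvisionedThroughput if the table is in on-demand mode.
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)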

Querying DynamoDb timestamp data in range

I want to migrate my data from DynamoDB to Redshift. I don't want to scan the whole table at once as this might result in throttling.
My Table is as below:
acountId (hash key), lastUpdatedTime.
I thought I could create a GSI on lastUpdatedTime and then query like "give me the data between day1 and day5". The next day I could query "give me the data between day6 and day7".
But even with a GSI, my understanding is that it will scan the whole table, as I won't have any hash key to provide. I just have a range of timestamps to query.
Creating a GSI is indeed the right solution. However, the GSI creation operation might be a bit slow/expensive if you set the GSI to project all attributes. I would recommend creating the GSI on lastUpdatedTime and projecting only the table keys (partition key, plus sort key if you have one) using KEYS_ONLY. Then, when you scan the index, you will only retrieve the item keys, and you can fetch each full item from the base table as you migrate it.
I recommend reading up on GSIs here: https://docs.aws.amazon.com/fr_fr/amazondynamodb/latest/developerguide/GSI.html
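A rough sketch of that pattern in Python (boto3), assuming a KEYS_ONLY GSI named lastUpdatedTime-index already exists, that acountId is the table's only key, and that the table name and timestamp bounds are placeholders:

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

day1, day5 = "2023-01-01", "2023-01-05"  # placeholder bounds
keys = []
scan_kwargs = {
    "IndexName": "lastUpdatedTime-index",  # hypothetical KEYS_ONLY GSI
    "FilterExpression": Attr("lastUpdatedTime").between(day1, day5),
}
while True:
    page = table.scan(**scan_kwargs)
    # KEYS_ONLY projection: each result carries only the table and index keys.
    keys.extend({"acountId": item["acountId"]} for item in page["Items"])
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# Fetch the full items from the base table, 100 keys per batch
# (UnprocessedKeys handling omitted for brevity).
for i in range(0, len(keys), 100):
    resp = dynamodb.batch_get_item(RequestItems={"MyTable": {"Keys": keys[i:i + 100]}})
    full_items = resp["Responses"]["MyTable"]
    # ... hand full_items to the Redshift load ...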

Change the schema of a DynamoDB table: what is the best/recommended way?

What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table, so that now the primary hash key is UPI. We want to support both the current system, which uses SSN, and the new system, which uses UPI, for some time, so we need both columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table using UPI and not SSN as the value for the hash key, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN. If SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits than SSN, you will never have an overlap.
Lazy migration to the same table, using Streams. When Streams become generally available, you will be able to turn on a Stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and whenever you detect a change that adds a UPI to an existing person in the Person table, have a Lambda function remove the person keyed at SSN and add a person with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding an atomic version-counter attribute to the item and conditioning the DeleteItem call on that version attribute (see the sketch after this list).
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person-item in the Person table. Create a stream on Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a lambda function to that stream that writes all the new Persons in a new Person_UPI table when the lambda function detects that a Person with a UPI was changed or when a Person had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in a stream as stream records, so you can do a hot failover to the new Person_UPI table in your application. Reject requests for a few seconds, point your application to the Person_UPI table during that time, and re-enable requests.
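For the race condition in the second approach, the delete of the old SSN-keyed item can be guarded by a condition on the version attribute. A minimal sketch in Python (boto3); table and attribute names follow the example above, and the function itself is hypothetical:

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Person")

def delete_old_item(ssn, expected_version):
    # Delete the SSN-keyed item only if its version has not been bumped concurrently.
    try:
        table.delete_item(
            Key={"SSN": ssn},
            ConditionExpression="#v = :v",
            ExpressionAttributeNames={"#v": "version"},
            ExpressionAttributeValues={":v": expected_version},
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # the item changed in the meantime; re-read and retry
        raise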
DynamoDB streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
Create a new table (let us call this NewTable), with the desired key structure, LSIs, GSIs.
Enable DynamoDB Streams on the original table
Associate a Lambda to the Stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag in Step 5)
[Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has attributes: Primary Key, and Migrated (See Step 5).
Scan the GSI created in the previous step (or entire table) and use the following Filter:
FilterExpression = "attribute_not_exists(Migrated)"
Update each item in the table with a migrate flag (i.e. "Migrated": { "S": "0" }), which sends it to DynamoDB Streams (use the UpdateItem API, to ensure no data loss occurs).
NOTE: You may want to increase write capacity units on the table during the updates.
The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable (a sketch of this Lambda follows after this answer).
Once all items have been migrated, repoint the code to the new table
Remove the original table and the Lambda function once you are happy that all is good.
Following these steps should ensure you have no data loss and no downtime.
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
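A rough sketch of the Lambda in step 3, in Python (boto3), assuming the stream view type includes new images; NewTable and the Migrated flag come from the steps above, everything else is a placeholder:

import boto3

client = boto3.client("dynamodb")

def handler(event, context):
    # Copy every stream record into NewTable, dropping the temporary Migrated flag.
    for record in event["Records"]:
        if record["eventName"] == "REMOVE":
            continue  # mirror deletions here as well if you need them
        # NewImage is already in DynamoDB's typed JSON format, so put_item can take it as-is.
        new_image = dict(record["dynamodb"]["NewImage"])
        new_image.pop("Migrated", None)  # trim off the migration flag (step 5)
        client.put_item(TableName="NewTable", Item=new_image)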
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
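A hedged sketch of what that dual-write could look like in the service's persistence code, in Python (boto3); the Person/Person_UPI names follow the earlier example, and the function itself is hypothetical:

import boto3

dynamodb = boto3.resource("dynamodb")
old_table = dynamodb.Table("Person")      # keyed on SSN
new_table = dynamodb.Table("Person_UPI")  # keyed on UPI

def save_person(person):
    # During the transition, every write goes to both tables.
    old_table.put_item(Item=person)
    if "UPI" in person:
        # Only items that already carry a UPI can live in the new table.
        new_table.put_item(Item=person)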
If the changes involve changing the partition key, you can add a new GSI (global secondary index). Moreover, you can always add new columns/attributes to DynamoDB without needing to migrate tables.

DynamoDB Change Range Key Column

Is it possible to modify the range key column after table creation, such as adding a new column/attribute and assigning it as the range key for the table? I tried searching but can't find any articles about changing the range or hash key.
No, unfortunately it's not possible to change the hash key, range key, or indexes after a table is created in DynamoDB. The DynamoDB UpdateItem API Documentation is clear about the fact that indexes cannot be modified. I can't find a reference anywhere in the docs that explicitly states that the table keys cannot be modified, but at present they cannot be changed.
Note that DynamoDB is schema-less other than the hash and range key, and you can add other attributes to new items with no problems. Unfortunately, if you need to modify either your hash key or range key, you'll have to make a new table and migrate the data.
Edit (January 2014): DynamoDB now has support for on-the-fly global secondary indexes
To change or create an additional sort key, you will need to create a new table and migrate over to it, as neither action can be done on an existing table.