Read most recent records in DynamoDB

I have a table in DynamoDB with the following structure:
UserID (String - Partition Key)
AsOfTimestamp (Number - Sort Key)
Product (String) ... and some other attributes.
I also have an LSI defined like this:
UserID (String - Partition Key)
Product (String - Sort Key)
AsOfTimestamp (Number) ... and other attributes
My problem is, I need to read the most recent records for all Products for a given UserID. For example, a return sample from this query could be:
UserID   Product    AsOfDate
User1    Product1   1617816274
User1    Product2   1617816288
My issue is that if I use the table's index I can get the latest records, but that does not guarantee I will get records for each product; I could, for example, get a lot of User1-Product1 records before I see the most recent User1-Product2.
If I use the LSI, I can get records for a given UserId sorted by product but I will be forced to sort the results by AsOfDate and read the entire result set to get the most recent ones.
Thanks!

If I understand what you're asking for I think you're going to need another record for that data. What you want is a single AsOfDate for each user/product combination, where the AsOfDate is the most recent. If you add a record that is keyed on UserID and Product then you'll have exactly one record for each. To make that work you'll likely need to change your table structure to support the single table design pattern (or store this data in a different table). In a single table design you might have something like this:
pk      sk                              UserID   Product    AsOfDate
User1   Purchase|Product1|1617816274    User1    Product1   1617816274
User1   MostRecent|Product1             User1    Product1   1617816274
User1   Purchase|Product2|1617816274    User1    Product2   1617816274
User1   Purchase|Product2|1617816288    User1    Product2   1617816288
User1   MostRecent|Product2             User1    Product2   1617816288
Then, to get all the most recent records you query for pk = userId and begins_with(sk, 'MostRecent|'). To get the other records you query for pk = userId and begins_with(sk, 'Purchase|'). Your access pattern requirements might have you changing that some, but the idea should be similar to this.
Whenever you do a new "Purchase" you would insert the new row for that, and update the "MostRecent" row as well (you can do that in a transaction if you need to, or use a DynamoDB stream to do that update).
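A rough sketch of what those two pieces could look like with the AWS SDK for Java (v1); the table name purchases and the concrete values are placeholders for illustration, not part of your schema:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.Put;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.TransactWriteItem;
import com.amazonaws.services.dynamodbv2.model.TransactWriteItemsRequest;

import java.util.Map;

public class MostRecentExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // Read the latest record for every product of one user.
        QueryRequest latest = new QueryRequest()
                .withTableName("purchases")  // placeholder table name
                .withKeyConditionExpression("pk = :pk AND begins_with(sk, :prefix)")
                .withExpressionAttributeValues(Map.of(
                        ":pk", new AttributeValue().withS("User1"),
                        ":prefix", new AttributeValue().withS("MostRecent|")));
        ddb.query(latest).getItems().forEach(System.out::println);

        // Record a new purchase and refresh its MostRecent row in one transaction.
        Map<String, AttributeValue> purchase = Map.of(
                "pk", new AttributeValue().withS("User1"),
                "sk", new AttributeValue().withS("Purchase|Product2|1617816288"),
                "UserID", new AttributeValue().withS("User1"),
                "Product", new AttributeValue().withS("Product2"),
                "AsOfDate", new AttributeValue().withN("1617816288"));
        Map<String, AttributeValue> mostRecent = Map.of(
                "pk", new AttributeValue().withS("User1"),
                "sk", new AttributeValue().withS("MostRecent|Product2"),
                "UserID", new AttributeValue().withS("User1"),
                "Product", new AttributeValue().withS("Product2"),
                "AsOfDate", new AttributeValue().withN("1617816288"));
        ddb.transactWriteItems(new TransactWriteItemsRequest().withTransactItems(
                new TransactWriteItem().withPut(new Put().withTableName("purchases").withItem(purchase)),
                new TransactWriteItem().withPut(new Put().withTableName("purchases").withItem(mostRecent))));
    }
}

If you would rather not pay the transaction overhead, a DynamoDB Streams consumer that rewrites the MostRecent row on every new Purchase works too, at the cost of a short window where MostRecent is stale.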

Related

DynamoDB filter items not contains

I have the following problem:
Partition key (pk) and Sort key (sk):
pk sk
1 ITEMS#1
1 ORDERS#1
2 ITEMS#1
3 ITEMS#2
How can I retrieve all pk's that do not contain orders? I have tried the filter:
sk not contains "ORDERS"
But that returns
pk sk
1 ITEMS#1
2 ITEMS#1
3 ITEMS#2
whereas I only want to return pks 2 and 3.
To do this efficiently, you'll have to pre-materialize the orders-or-not fact into your data and structure things in such a way that the fact is appropriately indexed and ready to use.
For example, you can create an item under each PK with an SK of META for metadata about that PK, and on that item keep an attribute such as HasOrders (say 'Y' or 'N') that records whether there are any orders in that item collection. Then you can create a GSI using that attribute as the GSI partition key and very efficiently find all PKs without orders with a Query on HasOrders = 'N'.
You'll have to update the META every time someone places their first order.
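As a sketch with the Java SDK (v1), assuming a table named items, META items created with HasOrders = 'N', and a GSI named HasOrdersIndex whose partition key is the HasOrders attribute (all of those names are placeholders), the first-order update and the "no orders" query might look like this:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

import java.util.Map;

public class MetaFlagExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // When pk 1 gets its first order, flip the flag on its META item.
        ddb.updateItem(new UpdateItemRequest()
                .withTableName("items")  // placeholder table name
                .withKey(Map.of(
                        "pk", new AttributeValue().withS("1"),
                        "sk", new AttributeValue().withS("META")))
                .withUpdateExpression("SET HasOrders = :y")
                .withExpressionAttributeValues(Map.of(":y", new AttributeValue().withS("Y"))));

        // All pks without orders: query the GSI partitioned on HasOrders.
        ddb.query(new QueryRequest()
                .withTableName("items")
                .withIndexName("HasOrdersIndex")  // placeholder GSI name
                .withKeyConditionExpression("HasOrders = :n")
                .withExpressionAttributeValues(Map.of(":n", new AttributeValue().withS("N"))))
                .getItems().forEach(System.out::println);
    }
}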

Single table DynamoDB design tips

I have an old application I am modernizing and bringing to AWS. I will be using DynamoDB for the database and am looking to go with a single table design. This is a multitenant application.
The applications will consist of Organisations, Outlets, Customers & Transactions.
Everything stems from an organization, an organization can have multiple outlets, outlets can have multiple customers and customers can have multiple transactions.
Access patterns are expected to be as follows:
Fetch a customer by its ID
Search for a customer by name or email
Get all customers for a given outlet
Get all transactions for a customer
Get all transactions for an outlet
Get all transactions for an outlet during a given time period (timestamps will be stored with each transaction)
Get all outlets for a given organisation
Get an outlet by its ID
I've been reading into single table designs and utilizing the primary key and sort keys to enable this sort of access but right now I can't quite figure out the table/schema design.
The customer will have the OutletID and OrganisationID attached, so I should always know those IDs.
Data Structure (can be modified)
Organisations:
id
Name
Owner
List of Outlets
createdAt (timestamp)
Outlets:
OrganisationId
Outlet Name
Number of customers
Number of transactions
createdAt (timestamp)
Customers:
id
OrganisationID
OutletID
firstName
lastName
email
total transactions
total spent
createdAt (timestamp)
Transactions:
id
customerID
OrganisationID
OutletID
createdAt (timestamp)
type
value
You're off to a great start by having a thorough understanding of your entities and access patterns! I've taken a stab at modeling these access patterns, but keep in mind this is not the only way to model a solution. Data modeling in DynamoDB is iterative, so it is very likely that this specific design won't fit 100% of your use cases.
With that disclaimer out of the way, let's get into it!
I've modeled your access patterns using a single table named data with two global secondary indexes (GSIs) named GSI1 and GSI2. GSI1 has partition and sort keys named GSI1PK and GSI1SK, and GSI2 has GSI2PK and GSI2SK.
The base table models the following access patterns:
Fetch customer by ID: getItem where PK=CUST#<id> and SK = A
Fetch all transactions for a customer: query where PK=CUST#<id> and SK begins_with TX
Fetch an outlet by ID: getItem where PK=OUT#<id> and SK = A
Fetch all customers for an outlet: query where PK=OUT#<id>#CUST
That last access pattern may require a bit more explanation. I've chosen to model the relationship between outlets and customers using a unique PK/SK pattern where PK is OUT#<id>#CUST and SK is CUST#<id>. When your application records a transaction for a particular customer, it can insert two records into DynamoDB with a batch write operation that performs two writes:
Write a new Transaction into the Customer partition (e.g. PK = CUST#1 and SK = TX#<id>)
Write a new record to the CUSTOMERLIST partition (e.g. PK = OUT#<id>#CUST and SK = CUST#<id>). If this record already exists, DynamoDB will just overwrite it, which is fine for your use case.
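Here is a hedged sketch of that batch write plus the "all transactions for a customer" query against the data table with the Java SDK (v1); the concrete IDs (CUST#1, TX#100, OUT#9) and attribute values are made up for illustration:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.BatchWriteItemRequest;
import com.amazonaws.services.dynamodbv2.model.PutRequest;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

import java.util.List;
import java.util.Map;

public class RecordTransactionExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // 1) The transaction item in the customer's partition.
        WriteRequest txWrite = new WriteRequest().withPutRequest(new PutRequest().withItem(Map.of(
                "PK", new AttributeValue().withS("CUST#1"),
                "SK", new AttributeValue().withS("TX#100"),
                "type", new AttributeValue().withS("purchase"),
                "value", new AttributeValue().withN("42"))));

        // 2) The customer-list item in the outlet's partition (safe to overwrite).
        WriteRequest custListWrite = new WriteRequest().withPutRequest(new PutRequest().withItem(Map.of(
                "PK", new AttributeValue().withS("OUT#9#CUST"),
                "SK", new AttributeValue().withS("CUST#1"))));

        // A real implementation should retry any UnprocessedItems in the result.
        ddb.batchWriteItem(new BatchWriteItemRequest()
                .withRequestItems(Map.of("data", List.of(txWrite, custListWrite))));

        // Fetch all transactions for the customer.
        ddb.query(new QueryRequest()
                .withTableName("data")
                .withKeyConditionExpression("PK = :pk AND begins_with(SK, :tx)")
                .withExpressionAttributeValues(Map.of(
                        ":pk", new AttributeValue().withS("CUST#1"),
                        ":tx", new AttributeValue().withS("TX"))))
                .getItems().forEach(System.out::println);
    }
}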
Moving onto GSI1:
GSI1 supports the following operations:
Fetch outlets by organization: query GSI1 where GSI1PK = ORG#<id>
Fetch transactions by outlet: query GSI1 where GSI1PK = OUT#<id>
Fetch transactions by outlet for a given time period: query GSI1 where GSI1PK = OUT#<id> and GSI1SK between <period1> and <period2>
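For the time-period pattern, the GSI1 query could look roughly like this with the Java SDK (v1), assuming GSI1SK carries the transaction's createdAt timestamp as an ISO-8601 string (adjust the range values to whatever format you actually store):

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;

import java.util.Map;

public class OutletTransactionsByPeriod {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // Transactions for outlet 9 within a time window, sorted by GSI1SK.
        ddb.query(new QueryRequest()
                .withTableName("data")
                .withIndexName("GSI1")
                .withKeyConditionExpression("GSI1PK = :pk AND GSI1SK BETWEEN :from AND :to")
                .withExpressionAttributeValues(Map.of(
                        ":pk", new AttributeValue().withS("OUT#9"),
                        ":from", new AttributeValue().withS("2021-01-01T00:00:00Z"),
                        ":to", new AttributeValue().withS("2021-03-31T23:59:59Z"))))
                .getItems().forEach(System.out::println);
    }
}

The GSI2 queries below follow the same shape, just with GSI2PK/GSI2SK and an ORG# partition key.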
And finally, there's GSI2
GSI2 supports the following operations:
Fetch transactions by organization: query GSI2 where GSI2PK = ORG#<id>
Fetch transactions by organization for a given time period: query GSI2 where GSI2PK = ORG#<id> and GSI2SK between <period1> and <period2>
For your final access pattern, you've asked to support searching for customers by email or name. DynamoDB is really good at finding items by their primary key, but it is not good for search, where fuzzy or partial matches are expected. If you need an exact match on email or name, you could do that in DynamoDB by incorporating the email/name into the primary key of the User item.
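If you do go the exact-match route, one possible shape (purely illustrative; the CUSTEMAIL# key pattern is an assumption, not something prescribed above) is a small lookup item keyed on the email that points back at the customer, fetched with a getItem:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;

import java.util.Map;

public class CustomerByEmailExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // Exact-match lookup item keyed on the email, pointing back at the customer.
        Map<String, AttributeValue> lookup = ddb.getItem(new GetItemRequest()
                .withTableName("data")
                .withKey(Map.of(
                        "PK", new AttributeValue().withS("CUSTEMAIL#jane@example.com"),
                        "SK", new AttributeValue().withS("A"))))
                .getItem();
        System.out.println(lookup);
    }
}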
I hope this gives you some ideas on how to model your access patterns!

Query DynamoDB with Partition key and Sort Key using OR Conditon

I have a requirement to query DynamoDB and get all the records which match a certain criterion. I have a table, say parent_child_table, which has parent_id and child_id as two columns, and I need to query the table with a particular input ID and fetch all the records. For example:
if I query the DB with ID 67899, then I should get both records, i.e. 12345 and 67899.
I was trying to use below methods :
GetItemRequest itemRequest=new GetItemRequest().withTableName("PARENT_CHILD_TABLE").withKey(partitionKey.entrySet().iterator().next(), sortKey.entrySet().iterator().next());
but I am not able to apply an OR operator.
DynamoDB doesn't work like that...
GetItemRequest() can only return a single record.
Query() can return multiple records, but only if you are using a composite primary key (partition key + sort key) and you can only query within a single partition...so all the records to be returned must have the same partition key.
Scan() can return multiple records from any partition, but it does so by always scanning the entire table. Regular use of scan is a bad idea.
Without knowing more it's hard to provide guidance, but consider a schema like so:
partition key sort key
12345 12345
12345 12345#67899
12345 12345#67899#97765
Possibly adding some sort of level indicator in the sort key or just as an attribute.
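With a hierarchical sort key like that, fetching everything at or below a given node is a single Query using begins_with. A sketch with the Java SDK (v1); the attribute names pk/sk are placeholders for whatever you name the keys, and note that you still need to know the top-level parent id to pick the partition:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;

import java.util.Map;

public class ParentChildQueryExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // Everything in the 12345 hierarchy that sits at or below 67899.
        ddb.query(new QueryRequest()
                .withTableName("PARENT_CHILD_TABLE")
                .withKeyConditionExpression("pk = :pk AND begins_with(sk, :prefix)")
                .withExpressionAttributeValues(Map.of(
                        ":pk", new AttributeValue().withS("12345"),
                        ":prefix", new AttributeValue().withS("12345#67899"))))
                .getItems().forEach(System.out::println);
    }
}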

Delete data records - keep joined data

I was not able to find a better title for this.
Branches      Users         Attendance
-----------   -----------   -----------
branchID^     userID^       courseID^
branchName    userName      userID*
              branchID*
Here are my tables. Due to a company restructure I need to delete old branches and the users that belong to them. But when my boss wants to see old Attendance records, he wants to see the old userNames even if they no longer exist.
What's the best practice here? I'm thinking of adding a Disabled column to Branches/Users so they aren't visible on the web page.
A "soft delete" flag is often used to address the requirement to retain both current and logically deleted data. Alternatively, you could move the rows to archive tables for historical reporting.
Having both current and logically deleted rows in the same table is more convenient if you need combined reporting on both. The downside is the presence of the inactive rows can add more overhead for queries of active data only. Much depends on the percentage of inactive rows and the number of rows.
I use this kind of solution:
Make a Log table:
[Log]
ID (bigint IDENTITY(1,1)) PK
Entity_Id (bigint) FK --'Entity' table is list of my tables
Row_Id (bigint) --Is Id of the row of the `Entity`
Kind (int) --0=Create, 1=Modify, 2=Delete, 3=Undelete
actionDate (datetime) --'= GETDATE()'
user_Id (bigint) FK --'User' table is list of users
Now this query gives me the state of the row:
SELECT TOP(1)
    Kind,
    actionDate,
    user_Id
FROM
    [Log]
WHERE
    Entity_Id = @Entity_Id AND
    Row_Id = @Row_Id
ORDER BY
    actionDate DESC
The result is:
0 => Created by `user` in `actionDate`
1 => [Last] Modified by `user` in `actionDate`
2 => [Last] Deleted by `user` in `actionDate`
3 => [Last] Undeleted by `user` in `actionDate`
Note:
If you don't want to clear the whole database, don't delete any rows.
When you do want to delete, do it through a mechanism that respects the relations.

getting the right data model with dynamodb

I am about to create my first DynamoDB table and can't find a proper way to model my requirements. It sounds very basic, but my brain is probably still stuck in the relational database world.
I want to do store something similar like that:
A user can buy a product (once or several times). What I want to store is username, product_id
The only things I need to query later are:
which products have been purchased by user X
how many times were they purchased
First I considered having an item with two attributes: username and product_id. But then I cannot use username as the primary key (a user can buy more than once), nor can I use username + product_id (a user can buy a product several times).
Now I would go for having username, product_id and a counter, with username + product_id as the primary key. However, I would always need to check first whether the product was already purchased and update it, otherwise create a new entry. For getting all products of a user I would create a global secondary index on username.
However, I am not very sure if this is the right way. Any feedback would be great!
There are probably a number of ways to do this and I don't know all of your requirements so I can't guarantee this is the right answer for you but based on your description, this is what I would do.
First, I'm assuming that each order has some sort of unique order number associated with it. I would use this order number as the primary key of the table. I wouldn't use a range key. This would ensure that the constraint that all primary keys be unique is met. In addition, when I write the data to DynamoDB I would also write the username and the product_id as additional attributes.
Next, I would create a Global Secondary Index that uses the username as the primary key and the product_id as the range key. Unlike the primary key of the table, GSI keys do not have to be unique so if a user purchased a particular product more than once, this would be fine. This GSI would allow me to perform queries such as "find all orders by username" or "find all orders where username purchased product_id".
If you also needed to do queries like "find all usernames who purchased product_id" you would need another GSI that used product_id as the primary key and username as the range key.
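A sketch of the "how many times did user X buy product Y" query against that GSI with the Java SDK (v1); the table and index names are placeholders. Dropping the Select.COUNT returns the order items themselves, and querying with only username = :u gives you all products the user has purchased:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.QueryResult;
import com.amazonaws.services.dynamodbv2.model.Select;

import java.util.Map;

public class PurchaseCountExample {
    public static void main(String[] args) {
        AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.defaultClient();

        // How many times did this user buy this product? Count the matching orders on the GSI.
        QueryResult result = ddb.query(new QueryRequest()
                .withTableName("orders")                     // placeholder table name
                .withIndexName("username-product_id-index")  // placeholder GSI name
                .withKeyConditionExpression("username = :u AND product_id = :p")
                .withExpressionAttributeValues(Map.of(
                        ":u", new AttributeValue().withS("alice"),
                        ":p", new AttributeValue().withS("product-42")))
                .withSelect(Select.COUNT));
        System.out.println("Purchase count: " + result.getCount());
    }
}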
