Selecting data in DynamoDB based on values in another field

I have a data dump from the Plaid API in DynamoDB. Each transaction has transaction_id, pending (bool), and pending_transaction_id (basically a foreign key to the older pending transaction that it replaces):
{
  "account_id": "acct1",          // partition key
  "transaction_id": "txn100",     // sort key
  "category_id": "22001000",
  "pending": false,
  "pending_transaction_id": "txn1",
  "amount": 500
},
{
  "account_id": "acct1",
  "transaction_id": "txn1",
  "category_id": "22001000",
  "pending": true,
  "pending_transaction_id": null,
  "amount": 500
}
Is it possible to fetch, in a single query, only the pending transactions that don't have a permanent replacement yet?
In other words, in a relational DB it would be something along the lines of
SELECT * FROM txn WHERE pending = true AND transaction_id NOT IN (SELECT pending_transaction_id FROM txn WHERE pending_transaction_id IS NOT NULL) (or whatever flavor of CTE or LEFT JOIN you prefer).
How do I do this in DynamoDB with a single query?

We can have a GSI here to solve this problem.
PK (pending) | SK (pending_transaction_id) | ..
false | txn1 | ..
true | null | ..
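We can then query the index on the partition key value we need (for example, pending = false) and get our records. Note that GSI key attributes must be of type String, Number, or Binary, so pending would have to be stored as a string (e.g. "false") or a number rather than as a boolean.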
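Points to consider/observe:
1. Since the SK (pending_transaction_id) is missing for pending transactions, no index entry is created for them. This works for us, as we don't need those records. The attribute has to be omitted from the item entirely; a write that sets an index key attribute to null is rejected.
2. We can include the pending = true records in our GSI if required, but that means giving them a placeholder sort key value such as the string "NULL".
3. The advantage of the GSI, as I see it (considering only point 1), is that we only duplicate the records that our query needs.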
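To make the sparse-index idea concrete, here is an in-memory JavaScript sketch (no AWS calls) of what such a GSI would contain and how its contents line up with the question. It assumes the two sample items above plus one hypothetical extra pending item (txn2), omits pending_transaction_id on pending items instead of storing null, and stores pending as a string because index keys cannot be booleans. The helper names sparseIndexEntries and unreplacedPending are hypothetical. The index yields the set of replaced IDs; comparing it with the pending items (done client-side here) gives the pending transactions that have no replacement yet.

```javascript
// In-memory illustration only: models the table items and a sparse GSI
// (PK = pending as a string, SK = pending_transaction_id) without calling DynamoDB.
const items = [
  { account_id: "acct1", transaction_id: "txn100", category_id: "22001000",
    pending: false, pending_transaction_id: "txn1", amount: 500 },
  // pending_transaction_id is omitted (not null), so this item is not indexed.
  { account_id: "acct1", transaction_id: "txn1", category_id: "22001000",
    pending: true, amount: 500 },
  // Hypothetical extra pending transaction that has not been replaced yet.
  { account_id: "acct1", transaction_id: "txn2", category_id: "22001000",
    pending: true, amount: 75 },
];

// A sparse GSI only receives items that carry every index key attribute.
// pending is converted to a string because GSI keys cannot be booleans.
function sparseIndexEntries(tableItems) {
  return tableItems
    .filter((it) => it.pending_transaction_id !== undefined)
    .map((it) => ({ pk: String(it.pending), sk: it.pending_transaction_id, item: it }));
}

// Pending items whose transaction_id never appears as a sort key under pk = "false".
function unreplacedPending(tableItems) {
  const replaced = new Set(
    sparseIndexEntries(tableItems)
      .filter((e) => e.pk === "false")
      .map((e) => e.sk)
  );
  return tableItems.filter((it) => it.pending && !replaced.has(it.transaction_id));
}

console.log(sparseIndexEntries(items).map((e) => e.sk)); // [ 'txn1' ]
console.log(unreplacedPending(items).map((it) => it.transaction_id)); // [ 'txn2' ]
```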

Related

DynamoDB how to use sort key with PartiQL query?

Hello, I'm new to DynamoDB. I have created a table with partition key "pk" and sort key "id".
In the item explorer I can query with the pk and sort key values, and it seems to work.
In the PartiQL editor I run:
SELECT * FROM "dev" WHERE "pk" = 'config' AND "id" = "7b733512cc98445891dcb07dc4299ace"
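and I get the error "Filter Expression can only contain non-primary key attributes: Primary key attribute: id".
I don't know how to specify the sort key in the key condition instead of the filter condition with the WHERE clause.
I found the error: in PartiQL, double quotes delimit identifiers (table and attribute names), while single quotes delimit string values. With the ID in double quotes, it is read as an attribute name, so the comparison on id cannot be used as a key condition and ends up in the filter, which is not allowed for key attributes. The correct query is: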
SELECT * FROM "dev" WHERE "pk" = 'config' AND "id" = '7b733512cc98445891dcb07dc4299ace'
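Another way to sidestep the quoting problem is to bind values with ? placeholders. The sketch below assumes the request shape of DynamoDB's ExecuteStatement API (Statement plus a Parameters list of AttributeValues); buildGetByKeyStatement is a hypothetical helper, and the SDK call appears only in a comment so the block runs without dependencies.

```javascript
// Build an ExecuteStatement request that passes key values as parameters
// instead of quoting them inline. Double quotes are used only for identifiers.
function buildGetByKeyStatement(table, pk, id) {
  return {
    Statement: `SELECT * FROM "${table}" WHERE "pk" = ? AND "id" = ?`,
    // Low-level AttributeValue format, matched to the ? placeholders in order.
    Parameters: [{ S: pk }, { S: id }],
  };
}

const input = buildGetByKeyStatement("dev", "config", "7b733512cc98445891dcb07dc4299ace");
console.log(input.Statement);
// With the AWS SDK for JavaScript v3 this object could be sent as:
//   await client.send(new ExecuteStatementCommand(input));
// (ExecuteStatementCommand comes from "@aws-sdk/client-dynamodb"; not imported here.)
```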

DynamoDB filter if primary key contains value

CURRENTLY
I have a table in DynamoDB with a single attribute, the primary key (PK), that contains unique values.
PK
------
#A#B#C#
#B#C#
#C#D#E#
#BC#
ISSUE
I want to run two searches for #B#C#, (1) an exact match and (2) a containing match, and therefore only want these results:
(1) Exact Match:
#B#C#
(2) Containing Match:
#A#B#C#
#B#C#
Are these 2 searches possible against the primary key?
If so, what is the most efficient operation to run (e.g. Query or Scan)?
Note:
For (2) I am using the following code, but it returns all items in the DB:
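// Assumes `dynamodb` is an AWS.DynamoDB.DocumentClient, so values are plain JS strings
const params = {
  TableName: 'myTable',
  FilterExpression: "contains(#key, :v)",
  ExpressionAttributeNames: { "#key": "PK" },
  ExpressionAttributeValues: { ":v": "#B#C#" }  // the value must be a quoted string
};
dynamodb.scan(params, callback);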
DynamoDB supports two main types of searches: query and scan. The Query operation finds items based on primary key values. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.
If you wanted to find the item with the primary key #B#C#, you would use the Query API:
ddbClient.query(
  {
    "TableName": "<YOUR TABLE NAME>",
    "KeyConditionExpression": "#pk = :pk",
    "ExpressionAttributeValues": {
      ":pk": {
        "S": "#B#C#"
      }
    },
    "ExpressionAttributeNames": {
      "#pk": "PK"
    }
  }
)
For your second access pattern, you'll need to use the scan API because you are searching across the entire table/secondary index.
You can use scan to test if a primary key has a substring using contains. I don't see anything wrong with the format of your scan operation.
Be careful when using scan this way. Because scan will read your entire table to fetch results, you will have a fairly inefficient operation at scale. If this operation is run infrequently, or you are running it against a sparse index, it's probably fine. However, if it's one of your primary access patterns, you may want to reconsider using the scan API for this operation.
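Because a Scan reads the table page by page and applies FilterExpression only after reading each page, a caller has to follow LastEvaluatedKey until it is absent to collect every match. Here is a sketch of that loop. scanPage stands in for something like params => docClient.scan(params).promise() with the AWS SDK v2 DocumentClient; fakeScanPage, TABLE and PAGE_SIZE are hypothetical in-memory stand-ins built from the four keys in the question, with String.prototype.includes playing the role of contains().

```javascript
// Collect every matching item from a Scan by following LastEvaluatedKey.
// `scanPage(params)` must resolve to { Items, LastEvaluatedKey }.
async function scanAll(scanPage, params) {
  const items = [];
  let startKey; // undefined until a page reports a LastEvaluatedKey
  do {
    const request = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
    const page = await scanPage(request);
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

// Hypothetical in-memory stand-in for one Scan page, using the keys from the question.
// Like a real Scan, it reads up to PAGE_SIZE items first and filters afterwards,
// so a page can come back with fewer (or zero) matches.
const TABLE = ["#A#B#C#", "#B#C#", "#C#D#E#", "#BC#"].map((pk) => ({ PK: pk }));
const PAGE_SIZE = 2;

async function fakeScanPage(params) {
  const start = params.ExclusiveStartKey
    ? TABLE.findIndex((it) => it.PK === params.ExclusiveStartKey.PK) + 1
    : 0;
  const read = TABLE.slice(start, start + PAGE_SIZE);
  const needle = params.ExpressionAttributeValues[":v"];
  // includes() plays the role of contains(#key, :v) on the PK attribute.
  const Items = read.filter((it) => it.PK.includes(needle));
  const hasMore = start + PAGE_SIZE < TABLE.length;
  return { Items, LastEvaluatedKey: hasMore ? { PK: read[read.length - 1].PK } : undefined };
}

const containsParams = {
  TableName: "myTable",
  FilterExpression: "contains(#key, :v)",
  ExpressionAttributeNames: { "#key": "PK" },
  ExpressionAttributeValues: { ":v": "#B#C#" },
};

scanAll(fakeScanPage, containsParams).then((items) => {
  console.log(items.map((it) => it.PK)); // [ '#A#B#C#', '#B#C#' ]
});
```

Every page still consumes read capacity for all the items it reads, matched or not, which is why this pattern gets expensive as the table grows.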

Is it safe to set a randomly generated alphanumeric string as the partition key and sort key in DynamoDB?

Here is the sample JSON that we are planning to insert into a DynamoDB table. Currently we have organizationID as the partition key and __id__ as the sort key. Since we will query based on organizationID, we kept it as the partition key. Is it a good approach to keep __id__ as the sort key?
{
"__class__": "package",
"__updated__": "2015-10-19T14:30:13Z",
"__created__": "2015-10-19T12:32:28Z",
"transactions": [
{
transaction1
},
{
transaction2
}
],
"carrier": "USPS",
"organizationID": "6406fa6fd32393908125d4d81ec358",
"barcode": "9400110891302408",
"queryString": [
"xxxxxxx",
"YYYY",
"delivered"
],
"deliveredTo": null,
"__id__": "3232d1a045476786fg22dfg32b82209155b32"
}
Following common practice, you can use a timestamp as the sort key for the above data model. One advantage of a timestamp sort key is that items within a partition are kept sorted by it, so you can easily identify the most recently updated item. This is a very common use case for a sort key.
It doesn't make much sense to make both the partition key and the sort key randomly generated values, because then you can't use the sort key efficiently (unless I'm missing something here).
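ISO-8601 UTC timestamps such as __updated__ sort lexicographically in chronological order, which is what makes them usable as a String sort key. Since the partition key plus sort key must uniquely identify an item, two packages in one organization with the same timestamp would collide, so the sketch below assumes a variant (not part of the answer) that appends __id__ to the timestamp. makeSortKey and latestItem are hypothetical names, the second sample item is made up, and latestItem only mimics in memory what a Query on organizationID with ScanIndexForward: false and Limit: 1 would return.

```javascript
// Hypothetical composite sort key: "<ISO timestamp>#<id>".
function makeSortKey(item) {
  return `${item.__updated__}#${item.__id__}`;
}

// Descending comparison by sort key (plain string order, as for ASCII keys).
function bySortKeyDesc(a, b) {
  const ka = makeSortKey(a);
  const kb = makeSortKey(b);
  return ka < kb ? 1 : ka > kb ? -1 : 0;
}

// Mimics Query(organizationID = :org, ScanIndexForward: false, Limit: 1).
function latestItem(items, organizationID) {
  const inPartition = items.filter((it) => it.organizationID === organizationID);
  inPartition.sort(bySortKeyDesc);
  return inPartition[0];
}

const packages = [
  { organizationID: "6406fa6fd32393908125d4d81ec358",
    __id__: "3232d1a045476786fg22dfg32b82209155b32", __updated__: "2015-10-19T14:30:13Z" },
  // Hypothetical newer package in the same organization.
  { organizationID: "6406fa6fd32393908125d4d81ec358",
    __id__: "pkg-newer", __updated__: "2015-10-20T09:00:00Z" },
];

console.log(latestItem(packages, "6406fa6fd32393908125d4d81ec358").__id__); // pkg-newer
```

Key attributes cannot be modified with UpdateItem, so if the sort key contains the update time, changing it means writing a new item and deleting the old one.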

What's the equivalent DynamoDB solution for this MySQL Query?

I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.
Assume I have a MySQL table like this:
CREATE TABLE foo (
id CHAR(64) NOT NULL,
scheduledDelivery DATETIME NOT NULL,
-- ...other columns...
PRIMARY KEY(id),
INDEX schedIndex (scheduledDelivery)
);
Note the secondary index schedIndex, which is supposed to speed up the following query (executed periodically):
SELECT *
FROM foo
WHERE scheduledDelivery <= NOW()
ORDER BY scheduledDelivery ASC
LIMIT 100;
That is: Take the 100 oldest items that are due to be delivered.
With DynamoDB I can use the id column as primary partition key.
However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key", and (in MySQL terms) I see these problems:
the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
adding id as unique partition key and using scheduledDelivery as "sort key" sounds like an (id, scheduledDelivery) secondary index to me, which makes that index practically useless
I understand that MySQL and DynamoDB require different approaches, so what would be an appropriate solution in this case?
It's not possible to avoid a full table scan with this kind of query.
However, you may be able to disguise it as a Query operation, which would allow you to sort the results (not possible with a Scan).
You must first create a GSI. Let's name it scheduled_delivery-index.
We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.
fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.
GSI keys do not have to be unique, so don't worry if two items share the same scheduled_delivery value.
You would query the index like this:
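// Epoch milliseconds; scheduled_delivery must be stored as a Number in the same unit
const now = Date.now();
const params = {
  TableName: "foo",
  IndexName: "scheduled_delivery-index",
  ExpressionAttributeNames: {
    "#f": "fixed_val",
    "#d": "scheduled_delivery"
  },
  ExpressionAttributeValues: {
    ":f": 1,
    ":d": now
  },
  KeyConditionExpression: "#f = :f and #d <= :d",
  ScanIndexForward: true, // ascending by scheduled_delivery (the default)
  Limit: 100              // equivalent of LIMIT 100
};
// e.g. with a DocumentClient: docClient.query(params, callback);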
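To make the behaviour of that request concrete, here is an in-memory sketch of what the index query returns: only items with fixed_val = 1 and scheduled_delivery <= now, in ascending order, capped at 100 to match LIMIT 100. It assumes scheduled_delivery is stored as a Number of epoch milliseconds so it compares with Date.now(); dueItems, fixedNow and the sample rows are hypothetical.

```javascript
// In-memory model of Query on scheduled_delivery-index:
// partition key fixed_val, sort key scheduled_delivery (Number, epoch ms).
function dueItems(items, now, limit = 100) {
  return items
    .filter((it) => it.fixed_val === 1 && it.scheduled_delivery <= now) // KeyConditionExpression
    .sort((a, b) => a.scheduled_delivery - b.scheduled_delivery)        // ScanIndexForward: true
    .slice(0, limit);                                                    // Limit
}

const fixedNow = Date.UTC(2024, 0, 1); // fixed "now" so the example is repeatable
const DAY = 24 * 60 * 60 * 1000;
const rows = [
  { id: "c", fixed_val: 1, scheduled_delivery: fixedNow - 1 * DAY },
  { id: "a", fixed_val: 1, scheduled_delivery: fixedNow - 3 * DAY },
  { id: "future", fixed_val: 1, scheduled_delivery: fixedNow + DAY },
  { id: "b", fixed_val: 1, scheduled_delivery: fixedNow - 2 * DAY },
];

console.log(dueItems(rows, fixedNow).map((r) => r.id)); // [ 'a', 'b', 'c' ]
```

One design trade-off worth keeping in mind: because every item shares the same fixed_val, all index writes go to a single partition key value, which can become a throughput bottleneck at high write rates.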

Make own like system for various content

There are three types of content in my database: Songs, Albums and Playlists. Albums and Playlists are just collections of songs. I want to let users like each of them. I made a table with the columns
LikeId UserId SongId PlaylistId AlbumId
for storing likes. For example, if a user likes a song, I put the song's id into the SongId column and the user's id into the UserId column; the other columns are null. It works well, but I don't like this solution because it's not normalized.
So I want to ask if there are better solutions for this.
You should just create 3 tables - one for User paired with each of Playlist, Song, and Album. They'd look something like:
CREATE TABLE PlaylistLikes
(
UserID INT NOT NULL,
PlaylistID INT NOT NULL,
PRIMARY KEY (UserID, PlaylistID),
FOREIGN KEY (UserID) REFERENCES Users (UserID),
FOREIGN KEY (PlaylistID) REFERENCES Playlists (PlaylistID)
);
CREATE TABLE SongLikes
(
UserID INT NOT NULL,
SongID INT NOT NULL,
PRIMARY KEY (UserID, SongID),
FOREIGN KEY (UserID) REFERENCES Users (UserID),
FOREIGN KEY (SongID) REFERENCES Songs (SongID)
);
CREATE TABLE AlbumLikes
(
UserID INT NOT NULL,
AlbumID INT NOT NULL,
PRIMARY KEY (UserID, AlbumID),
FOREIGN KEY (UserID) REFERENCES Users (UserID),
FOREIGN KEY (AlbumID) REFERENCES Albums (AlbumID)
);
Here, having both columns in the primary key prevents the user from liking the song/playlist/album more than once (unless you want that to be available - then remove it or maybe keep track of that in a 'number of likes' column).
You should avoid putting all 3 different types of likes in the same table - different tables should be used to represent different things. You want to avoid "One True Lookup Table" - here's one answer detailing why: OTLT
If you want to query against all 3 tables, you can create a view which is the result of a UNION between the 3 tables.
How about
LikeId UserId LikeType TargetId
where LikeType can be "Song", "Playlist" or "Album"?
Your solution is fine. It has the nice feature that you can set up explicit foreign key relationships to the other tables. In addition, you can verify that exactly one of the values is set by adding a check constraint:
check ((case when SongId is null then 0 else 1 end) +
(case when AlbumId is null then 0 else 1 end) +
(case when PlayListId is null then 0 else 1 end)
) = 1
There is some overhead from storing NULL values in the unused columns. This is fairly minimal for three values.
You can even add a computed column to get which value is stored:
WhichId = (case when SongId is not null then 'Song'
when AlbumId is not null then 'Album'
when PlayListId is not null then 'PlayList'
end);
As a glutton for punishment, I would use three tables: UserLikesSongs, UserLikesPlaylists and UserLikesAlbums. Each contains a UserId and an appropriate reference to one of the other tables: Songs, Albums or Playlists.
This also allows adding additional type-specific information. Perhaps Albums will support a favorite track in the future.
You can always use UNION to combine data from the various entity types.
