DynamoDB transactional insert with multiple conditions (PK/SK attribute_not_exists and SK attribute_exists)

I have a table with PK (String) and SK (Integer) - e.g.
PK_id                 SK_version   Data
-----------------------------------------------------
c3d4cfc8-8985-4e5...  1            First version
c3d4cfc8-8985-4e5...  2            Second version
I can do a conditional insert to ensure we don't overwrite the PK/SK pair, using a ConditionExpression (in the Go SDK):
putWriteItem := dynamodb.Put{
    TableName:           aws.String("example_table"),
    Item:                itemMap,
    ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version)"),
}
However, I would also like to ensure that SK_version values are always consecutive, but I don't know how to write the expression. In pseudo-code this is:
putWriteItem := dynamodb.Put{
    TableName: aws.String("example_table"),
    Item:      itemMap,
    // pseudo-code: the final AND clause below is not valid expression syntax
    ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version) AND attribute_exists(SK_version = :SK_prev_version)"),
}
Can someone advise how I can write this?
In SQL I'd do something like:
INSERT INTO example_table (PK_id, SK_version, Data)
SELECT {pk}, {sk}, {data}
WHERE NOT EXISTS (
    SELECT 1
    FROM example_table
    WHERE PK_id = {pk}
    AND SK_version = {sk}
)
AND EXISTS (
    SELECT 1
    FROM example_table
    WHERE PK_id = {pk}
    AND SK_version = {sk} - 1
)
Thanks

A condition check applies to a single item; it cannot span multiple items. In other words, you simply need multiple condition checks. DynamoDB has the TransactWriteItems API, which performs multiple condition checks, along with writes/deletes, atomically. The code below is in Node.js.
// Verify the previous version exists without writing anything.
const previousVersionCheck = {
  TableName: 'example_table',
  Key: {
    PK_id: 'prev_pk_id',
    SK_version: 'prev_sk_version'
  },
  ConditionExpression: 'attribute_exists(PK_id)'
};

// Insert the new version only if it doesn't already exist.
const newVersionPut = {
  TableName: 'example_table',
  Item: {
    // your item data
  },
  ConditionExpression: 'attribute_not_exists(PK_id)'
};

await documentClient.transactWrite({
  TransactItems: [
    { ConditionCheck: previousVersionCheck },
    { Put: newVersionPut }
  ]
}).promise();
The transaction has two operations: one validates the previous version, and the other is a conditional write. If either condition check fails, the whole transaction fails.

You are hitting your head on some of the differences between a SQL and a NoSQL database. DynamoDB is, of course, a NoSQL database. It does not, out of the box, support optimistic locking. I see two straightforward options:
Use a software layer to give you locking on your DynamoDB table. This may or may not be feasible depending on how often updates are made to your table. How fast 'versions' are generated and the maximum time your application can be gated on the lock will likely tell you whether this can work for you. I am not familiar with Go, but the Java API supports this. Again, this isn't a built-in feature of DynamoDB. If there is no Go equivalent, you could use the technique described in the link to 'lock' the table for updates. Generally speaking, locking a NoSQL DB isn't a typical pattern, as it isn't what these databases were created to do (part of which is achieving large scale on unstructured documents, allowing fast access for many consumers at once). A rough Go sketch of this conditional-write style of locking follows after these two options.
Stop using an incrementor to guarantee uniqueness. Typically, incrementors are frowned upon in DynamoDB, partly due to the lack of intrinsic support for them and partly because of how DynamoDB shards: you don't want a lot of similarity between records. Using a UUID will solve the uniqueness problem, but if you are porting an existing application, that means more changes to the code that creates the ID and to the code that reads it (perhaps adding a creation-time field so you can tell which record is newest, or prepending or appending an epoch time to the UUID to do the same). Here is a pertinent link to an SO question explaining why to use UUIDs instead of incrementing integers. A short sketch of the epoch-plus-UUID idea also follows below.
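For option 1, the Java feature alluded to is presumably the DynamoDBMapper's @DynamoDBVersionAttribute, which under the hood is just a conditional write, and you can hand-roll the same pattern with the Go SDK. A minimal sketch, assuming the AWS SDK for Go v1 and a hypothetical table keyed on PK_id alone, with version kept as a plain numeric attribute rather than the sort key:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// optimisticUpdate writes newData only if the stored version still equals
// expectedVersion, bumping the counter in the same request. A concurrent
// writer makes the condition fail instead of silently overwriting.
func optimisticUpdate(db *dynamodb.DynamoDB, id, expectedVersion, newData string) error {
	_, err := db.UpdateItem(&dynamodb.UpdateItemInput{
		TableName: aws.String("example_table"),
		Key: map[string]*dynamodb.AttributeValue{
			"PK_id": {S: aws.String(id)},
		},
		UpdateExpression:    aws.String("SET #d = :d, version = version + :one"),
		ConditionExpression: aws.String("version = :expected"),
		// "Data" is aliased defensively in case it clashes with a reserved word.
		ExpressionAttributeNames: map[string]*string{"#d": aws.String("Data")},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":d":        {S: aws.String(newData)},
			":one":      {N: aws.String("1")},
			":expected": {N: aws.String(expectedVersion)},
		},
	})
	return err // a ConditionalCheckFailedException means another writer got there first
}

func main() {
	db := dynamodb.New(session.Must(session.NewSession()))
	fmt.Println(optimisticUpdate(db, "some-user-id", "1", "second version"))
}

If the condition fails, you retry by re-reading the item and recomputing the update; that read-check-write loop is the whole of optimistic locking.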
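And for option 2, a short sketch of the epoch-plus-UUID idea; the github.com/google/uuid package is an assumption here (any UUID library would do):

package main

import (
	"fmt"
	"time"

	"github.com/google/uuid"
)

func main() {
	// Prefix the creation epoch so you can still tell which record is
	// newest by comparing ids, without coordinating a counter anywhere.
	id := fmt.Sprintf("%d-%s", time.Now().Unix(), uuid.NewString())
	fmt.Println(id) // e.g. 1585733881-c3d4cfc8-...
}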

Based on Hung Tran's answer, here is a Go example:
checkItem := dynamodb.TransactWriteItem{
    ConditionCheck: &dynamodb.ConditionCheck{
        TableName:           aws.String("example_table"),
        ConditionExpression: aws.String("attribute_exists(PK_id) AND attribute_exists(SK_version)"),
        Key: map[string]*dynamodb.AttributeValue{
            "PK_id":      {S: aws.String(id)},      // id is a plain string here
            "SK_version": {N: aws.String(prevVer)}, // prevVer is the previous version, e.g. "1"
        },
    },
}
putItem := dynamodb.TransactWriteItem{
    Put: &dynamodb.Put{
        TableName:           aws.String("example_table"),
        ConditionExpression: aws.String("attribute_not_exists(PK_id) AND attribute_not_exists(SK_version)"),
        Item:                data,
    },
}
writeItems := []*dynamodb.TransactWriteItem{&checkItem, &putItem}
// Don't discard the error: if either condition fails, the whole transaction is canceled.
if _, err := db.TransactWriteItems(&dynamodb.TransactWriteItemsInput{TransactItems: writeItems}); err != nil {
    // handle the error
}
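Worth noting: when the transaction is rejected, the returned error should be a TransactionCanceledException, and its CancellationReasons list tells you which of the two entries failed its condition check, so inspect the error rather than discarding it.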

Related

Find ONLY soft deleted rows with Sequelize

I'm running a database on sequelize and sqlite and I use soft-deletes to basically archive the data.
I'm aware that with .findAll({ paranoid: false }) I can find all rows including the soft-deleted ones. However, I would like to find ONLY the soft-deleted ones.
Is there any way to achieve this? Or is there perhaps a way to do "set operations" with two data results, like finding the relative complement of one in the other?
For this, you can add the following condition to the where clause:
deletedAt: { [Op.not]: null }
For example like this:
const { Op } = require('sequelize'); // Op provides the [Op.not] operator

const projects = await db.Project.findAndCountAll({
  paranoid: false,
  order: [['createdAt', 'DESC']],
  where: { employer_id: null, deletedAt: { [Op.not]: null } },
  limit: parseInt(size),
  offset: (page - 1) * parseInt(size),
});

Create a perl hash from a db select

Having some trouble understanding how to create a Perl hash from a DB select statement.
$sth=$dbh->prepare(qq{select authorid,titleid,title,pubyear from books});
$sth->execute() or die DBI->errstr;
while(@records=$sth->fetchrow_array()) {
    %Books = (%Books, AuthorID => $records[0]);
    %Books = (%Books, TitleID  => $records[1]);
    %Books = (%Books, Title    => $records[2]);
    %Books = (%Books, PubYear  => $records[3]);
    print qq{$records[0]\n};
    print qq{\t$records[1]\n};
    print qq{\t$records[2]\n};
    print qq{\t$records[3]\n};
}
$sth->finish();
while(($key,$value) = each(%Books)) {
    print qq{$key --> $value\n};
}
The print statements work in the first while loop, but I only get the last result in the second key/value loop.
What am I doing wrong here? I'm sure it's something simple. Many thanks.
The OP needs to specify the question better and do some reading on the DBI module.
DBI has a fetchall_hashref call; perhaps the OP could put it to some use.
In the shown code an assignment of a record to a hash with the same keys overwrites the previous one, row after row, and the last one remains. Instead, they should be accumulated in a suitable data structure.
Since there are a fair number of rows (351, we are told), one option is a top-level array, with a hashref for each book:
my @all_books;
while (my @records = $sth->fetchrow_array()) {
    my %book;
    @book{qw(AuthorID TitleID Title PubYear)} = @records;
    push @all_books, \%book;
}
Now we have an array of books, each indexed by the four parameters.
This uses a hash slice to assign multiple key-value pairs to a hash.
Another option is a top-level hash with keys for the four book-related parameters, each holding as its value an arrayref with entries from all records:
my %books;
while (my @records = $sth->fetchrow_array()) {
    push @{$books{AuthorID}}, $records[0];
    push @{$books{TitleID}},  $records[1];
    ...
}
Now one can go through authors/titles/etc, and readily recover the other parameters for each.
Adding some checks is always a good idea when reading from a database.

Azure Cosmos DB (NOT IS_DEFINED OR ) clause with JOIN always evaluates to false

I have a document:
{
  contact: {
    id: '123'
  },
  channels: [
    {
      ... some channel info ...
    }
  ],
  lastUpdatedEpoch: 1583937675
}
And I have the following query, which doesn't return the above document:
SELECT p FROM p JOIN c IN p.channels
WHERE (NOT IS_DEFINED(p.lastUpdatedEpoch) OR p.lastUpdatedEpoch < 1585733881)
AND p.contact.id = '123'
But when I remove the NOT IS_DEFINED check, it correctly returns the document:
SELECT p FROM p JOIN c IN p.channels
WHERE (p.lastUpdatedEpoch < 1585733881)
AND p.contact.id = '123'
I also tried replacing the NOT IS_DEFINED clause with FALSE, and it returns the document:
SELECT p FROM p JOIN c IN p.channels
WHERE (FALSE OR p.lastUpdatedEpoch < 1585733881)
AND p.contact.id = '123'
Also, if I remove the JOIN, the query works as expected and returns the document:
SELECT p FROM p
WHERE (NOT IS_DEFINED(p.lastUpdatedEpoch) OR p.lastUpdatedEpoch < 1585733881)
AND p.contact.id = '123'
To me this behavior is unexpected. When lastUpdatedEpoch is defined, I expect the same result from the first and second query (aside from the fact that NOT IS_DEFINED will cause the index not to be used).
Could someone please explain what's going on here?
I tried to reproduce your issue on my side but failed; the result is as expected for me. (Test sample data and SQL output screenshots omitted.)
It seems that you did not refer to any columns from channels. I suggest you create some simple test data to verify whether your SQL is right, and then compare it with your actual data.
I contacted the Cosmos DB team, and they were able to give some insight into the issue.
A new optimization was recently put in to allow inequality and NotIsDefined expressions to utilize the index. There was an issue with this optimization, and the team has disabled the feature for now. If you are able to observe this issue on your cluster, please contact their support team.

Can't scan on DynamoDB map nested attributes

I'm new to DynamoDB and I'm trying to query a table from javascript using the Dynamoose library. I have a table with a primary partition key of type String called "id" which is basically a long string with a user id. I have a second column in the table called "attributes" which is a DynamoDB map and is used to store arbitrary user attributes (I can't change the schema as this is how a predefined persistence adapter works and I'm stuck working with it for convenience).
This is an example of a record in the table:
Item
  attributes (Map)
    10 (Number): 2
    11 (Number): 4
    12 (Number): 6
    13 (Number): 8
  id (String): YVVVNIL5CB5WXITFTV3JFUBO2IP2C33BY
The numeric field names in the map ("10" through "13") can be interpreted as "week10" through "week13", and the values 2, 4, 6 and 8 are the number of times the application was launched in each of those weeks.
What I need to do is get all user ids of the records that have more than 4 launches in a specific week (e.g. week 12), and I also need to get the list of user ids with a sum of 20 launches across a range of four weeks (e.g. from week 10 to 13).
With Dynamoose I have to use the following model:
dynamoose.model(
  DYNAMO_DB_TABLE_NAME,
  { id: String, attributes: Map },
  { useDocumentTypes: true, saveUnknown: true }
);
(to match the table structure generated by the persistence adapter I'm using).
I assume I will need a DynamoDB "scan" for this rather than a "query". To get started, I tried to fetch the records where week 12 equals 6, to no avail (I get an empty set as result):
const filter = {
  FilterExpression: 'contains(#attributes, :val)',
  ExpressionAttributeNames: {
    '#attributes': 'attributes',
  },
  ExpressionAttributeValues: {
    ':val': { '12': 6 },
  },
};

model.scan(filter).all().exec(function (err, result, lastKey) {
  console.log('query result: ' + JSON.stringify(result));
});
If you don't know Dynamoose but can help with solving this via the AWS SDK, to run a DynamoDB scan directly, that might also be helpful for me.
Thanks!!
Try the following.
const filter = {
  FilterExpression: '#attributes.#12 = :val',
  ExpressionAttributeNames: {
    '#attributes': 'attributes',
    '#12': '12'
  },
  ExpressionAttributeValues: {
    ':val': 6,
  },
};
It sounds like what you are really trying to do is filter the items where attributes.12 = 6, which is what the filter above will do.
contains() can't be used for objects or arrays.
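Since the question also invites a plain AWS SDK approach, here is a minimal sketch of the same scan using the AWS SDK for Go v1 (the SDK used earlier on this page); the table name is a placeholder, and the nested key "12" has to be aliased in ExpressionAttributeNames because document path segments starting with a digit can't appear literally in the expression:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	db := dynamodb.New(session.Must(session.NewSession()))

	input := &dynamodb.ScanInput{
		TableName: aws.String("example_table"), // placeholder table name
		// Use "#attrs.#wk > :val" with ":val" = 4 for the
		// "more than 4 launches in a week" variant.
		FilterExpression: aws.String("#attrs.#wk = :val"),
		ExpressionAttributeNames: map[string]*string{
			"#attrs": aws.String("attributes"),
			"#wk":    aws.String("12"),
		},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":val": {N: aws.String("6")},
		},
		ProjectionExpression: aws.String("id"), // we only need the user ids
	}

	// ScanPages follows LastEvaluatedKey until the whole table has been scanned.
	err := db.ScanPages(input, func(page *dynamodb.ScanOutput, last bool) bool {
		for _, item := range page.Items {
			// id is the partition key, so it is present in every projected item.
			fmt.Println(aws.StringValue(item["id"].S))
		}
		return true // keep paging
	})
	if err != nil {
		log.Fatal(err)
	}
}

One caveat: the four-week sum can't be computed inside a filter expression, so for that requirement you would either sum client-side after scanning the four weekly fields or maintain a precomputed total on the item.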

CouchDB View with 2 Keys

I am looking for a general solution to a problem with CouchDB views.
For example, take a view result like this:
{"total_rows":4,"offset":0,"rows":[
{"id":"1","key":["imported","1"],"value":null},
{"id":"2","key":["imported","2"],"value":null},
{"id":"3","key":["imported","3"],"value":null},
{"id":"4","key":["mapped","4"],"value":null},
{"id":"5,"key":["mapped","5"],"value":null}
]
1) If I want to select only "imported" documents I would use this:
view?startkey=["imported"]&endkey=["imported",{}]
2) If I want to select all imported documents with an id higher than 2:
view?startkey=["imported",2]&endkey=["imported",{}]
3) If I want to select all imported documents with an id between 2 and 4:
view?startkey=["imported",2]&endkey=["imported",4]
My question is: how can I select all rows with an id between 2 and 4, regardless of the first key?
You can try to extend the solution above, but prepend keys with a kind of "emit index" flag like this:
map: function (doc) {
  emit([0, doc.number, doc.category]); // direct order
  emit([1, doc.category, doc.number]); // reverse order
}
so you will be able to request them with
view?startkey=[0, 2]&endkey=[0, 4, {}]
or
view?startkey=[1, 'imported', 2]&endkey=[1, 'imported', 4]
But two different views would be better anyway.
I ran into the same problem a little while ago so I'll explain my solution. Inside of any map function you can have multiple emit() calls. A map function in your case might look like:
function(doc) {
  emit([doc.number, doc.category], null);
  emit([doc.category, doc.number], null);
}
You can also use ?include_docs=true to get the documents back from any of your queries. Then your query to get back rows 2 to 4 would be
view?startkey=[2]&endkey=[4,{}]
You can view the rules for sorting at CouchDB View Collation
