Remove items from list attribute conditionally


I have an item with a list attribute, each element of the list is a map.
"listAttr": [
{"field1":"value1", "field2":"value2", ...}
, {"field1":"value3", "field2":"value4", ...}
....
]
I want to execute a REMOVE statement that would look like this:
REMOVE listAttr[x]
With a condition that
listAttr[x].field1 = :val
I want to remove all elements of the list that match my condition.
I don't want to query the item first to calculate the indexes, because that would create a race between insert and remove (the indexes could be out of sync by the time I execute the REMOVE).
The SQL equivalent would be:
DELETE from listAttr where field1=?
How do I do this ?

My understanding is that this isn't possible in DynamoDB. You can use a PUT operation with optimistic locking to avoid lost updates.
If there are frequent writes to the item that would make this approach infeasible, you could look into "vertical partitioning", where you split a single large item into multiple items. This reduces the risk of race conditions when applications change only part of the item at a time, and it can reduce the cost of reads and writes when the single item is several KB large, because queries can be more targeted.
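As a rough illustration of the optimistic locking idea, here is a minimal Python sketch. It deliberately uses an in-memory stand-in rather than the DynamoDB API: `FakeTable`, `put_if_version`, `remove_matching`, and the `version` attribute are all hypothetical names chosen for the example. The point is only the shape of the loop: read the item, filter the list locally, and write back on the condition that the version has not changed in the meantime.

```python
import copy


class VersionConflict(Exception):
    """Raised when the stored version differs from the one the writer read."""


class FakeTable:
    """In-memory stand-in for a table (hypothetical, NOT the DynamoDB API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return copy.deepcopy(self._items[key])

    def put_if_version(self, key, item, expected_version):
        # Conditional write: succeeds only if nobody changed the item since we read it.
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise VersionConflict(key)
        self._items[key] = copy.deepcopy(item)


def remove_matching(table, key, field, value, max_attempts=5):
    """Remove every list element whose `field` equals `value`, retrying on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)
        read_version = item["version"]
        item["listAttr"] = [e for e in item["listAttr"] if e.get(field) != value]
        item["version"] = read_version + 1
        try:
            table.put_if_version(key, item, read_version)
            return item
        except VersionConflict:
            continue  # someone else wrote first: re-read and try again
    raise RuntimeError("gave up after repeated version conflicts")
```

Because the filter is recomputed from a fresh read on every attempt, a concurrent insert cannot shift the positions out from under the removal; the cost is a retry whenever two writers collide.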

Related

DynamoDB Limit on query

I have a question about Limit on queries/scans in DynamoDB.
My table has 1000 records, and a query over all of them returns 50 values. But if I set a Limit of 5, that doesn't mean the query will return the first 5 matching values; it only means the query examines 5 items in the table (in any order, so they could be very old items or new ones), so it's possible that I get 0 items back. How can I actually get the latest 5 items of a query? I need to set a Limit of 5 (the numbers are examples) because it would be too expensive to query/scan for more items than that.
The query has this input
{
TableName: 'transactionsTable',
IndexName: 'transactionsByUserId',
ProjectionExpression: 'origin, receiver, #valid_status, createdAt, totalAmount',
KeyConditionExpression: 'userId = :userId',
ExpressionAttributeValues: {
':userId': 'user-id',
':payment_gateway': 'payment_gateway'
},
ExpressionAttributeNames: {
'#valid_status': 'status'
},
FilterExpression: '#valid_status = :payment_gateway',
Limit: 5
}
The index of my table is like this:
Should I use a second index or something similar to sort them by the createdAt field? But then, how can I be sure that the query will look at all the items?

if I put a Limit of 5, that doesn't mean that the query will return the first 5 values, it just say that query for 5 Items on the table (in any order, so they could be very old items or new ones), so it's possible that I got 0 items on the query. How can actually get the latest 5 items of a query?
You are correct in your observation, and unfortunately there is no Query option or any other operation that can guarantee 5 items in a single request. To understand why this is the case (it's not just laziness on Amazon's side), consider the following extreme case: you have a huge database with one billion items, and you run a very specific query that has just 5 matching items, making the request you wished for: "give me back 5 items". Such a request would need to read the entire database of a billion items before it could return anything, and the client would surely give up by then. So this is not how DynamoDB's Limit works. It limits the amount of work that DynamoDB needs to do before responding. So if Limit = 100, DynamoDB will internally read 100 items, which takes a bounded amount of time. But you are right that you have no idea whether it will respond with 100 items (if all of them matched the filter) or 0 items (if none of them matched the filter).
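The "limit first, filter second" behaviour described above can be imitated with a few lines of Python. This is only a toy model over a plain list, not the DynamoDB API; `read_page` and `first_n_matches` are hypothetical helpers. It shows why a single page can come back empty and how many pages a client may need to read to collect 5 matches.

```python
def read_page(items, start, limit, predicate):
    """Examine at most `limit` items from `start`, then filter what was examined."""
    page = items[start:start + limit]
    matched = [item for item in page if predicate(item)]
    # Position to continue from, or None when everything has been examined.
    next_start = start + limit if start + limit < len(items) else None
    return matched, next_start


def first_n_matches(items, n, page_limit, predicate):
    """Keep reading pages until `n` matches are collected or the data runs out."""
    results, start, pages = [], 0, 0
    while start is not None and len(results) < n:
        matched, start = read_page(items, start, page_limit, predicate)
        results.extend(matched)
        pages += 1
    return results[:n], pages


# 1000 items, of which every 20th has the status we filter on (50 matches in total).
items = [{"id": i, "status": "payment_gateway" if i % 20 == 19 else "other"}
         for i in range(1000)]
wanted = lambda item: item["status"] == "payment_gateway"

one_page, _ = read_page(items, 0, 5, wanted)      # a single Limit=5 page
five, pages_read = first_n_matches(items, 5, 5, wanted)
```

In this toy data set the single page contains no matches at all, and collecting 5 matches takes many pages, which is the cost the Limit parameter is designed to keep bounded per request.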
So to do what you want to do efficiently, you'll need to think of a different way to model your data - i.e., how to organize the partition and sort keys. There are different ways to do it, each has its own benefits and downsides, you'll need to consider your options for yourself. Since you asked about GSI, I'll give you some hints about how to use that option:
The pattern you are looking for is called filtered data retrieval. As you noted, if you create a GSI with createdAt as the sort key, you can retrieve the newest items first. But you still need to apply a filter, and you still don't know how to stop after 5 filtered results (rather than 5 pre-filtering results). The solution is to ask DynamoDB to put into the GSI, in the first place, only items which pass the filter. In your example, it seems you always use the same filter: "status = payment_gateway". DynamoDB doesn't have an option to run a generic filter function when building the GSI, but it has a different trick up its sleeve to achieve the same thing: any time you set "status = payment_gateway", also set another attribute "status_payment_gateway", and when status is set to something else, delete the "status_payment_gateway" attribute. Now, create the GSI with "status_payment_gateway" as the partition key. DynamoDB will only put items in the GSI if they have this attribute, thereby achieving exactly the filtering you want.
You can also have multiple mutually-exclusive filtering criteria in one GSI by setting the partition key attribute to multiple different values, and you can then do a Query on each of these values separately (using KeyConditionExpression).
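The marker-attribute trick can be sketched in plain Python. Again this is a simulation over dictionaries, not DynamoDB calls: `set_status`, `build_sparse_index`, and `latest` are hypothetical helpers, and combining the marker with createdAt as the sort key is an assumption based on the question. The index only ever contains items that carry the marker, so taking the first 5 entries gives 5 items that already pass the filter.

```python
MARKER = "status_payment_gateway"


def set_status(item, new_status, wanted="payment_gateway"):
    """Update status and keep the marker attribute in sync with it."""
    item["status"] = new_status
    if new_status == wanted:
        item[MARKER] = wanted      # item becomes visible in the index
    else:
        item.pop(MARKER, None)     # item drops out of the index
    return item


def build_sparse_index(items):
    """Only items carrying the marker are indexed; newest createdAt first."""
    indexed = [item for item in items if MARKER in item]
    return sorted(indexed, key=lambda item: item["createdAt"], reverse=True)


def latest(index, limit):
    """Every entry already passes the filter, so a limit now means what we want."""
    return index[:limit]


items = [{"id": i, "createdAt": i} for i in range(10)]
for item in items:
    set_status(item, "payment_gateway" if item["id"] % 2 == 0 else "other")
set_status(items[8], "other")  # a later status change removes the marker

newest_two = latest(build_sparse_index(items), 2)
```

The design choice is to pay a small cost on every write (maintaining the marker) so that reads never have to examine items that would be filtered out.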

Use RocksDB to support key-key-value (RowKey->Containers) by splitting the container

Suppose I have a key/value pair where the value is a logical list of strings to which I can append strings. To avoid rewriting the entire list whenever a single string is appended, I'd use multiple key-value pairs to represent it.
Key -> metadata of the value such as length and subkey format
Key-l1 -> value of item 1 in list
Key-l2 -> value of item 2 in list
Key-ln -> the latest value in the list
I'd override the key comparator in RocksDB so that keys of the form Key-ln are sorted by the Key part first and by ln second (i.e. grouped by Key, and sorted by ln within the same Key). This way, all the list items, along with their root key and metadata, are grouped together in SST files during the initial bulk insert and during later SST compaction.
Appending a new list item becomes: (1) read Key-metadata to get the current list size n; (2) insert Key-l(n+1) with the new value. Deleting a list item works as usual in RocksDB: delete Key-ln and update the metadata.
To ensure consistency, (1) and (2) will be done inside a RocksDB transaction.
Does this design seem OK?
Now, if I want to add another feature, a TTL for the entire key-value (list), I'd use the TTL support already in RocksDB. My understanding is that the removal of expired items happens during compaction. However, such compaction is not done under a transaction, and RocksDB doesn't know that the Key-metadata and Key-ln entries are related. It is entirely possible that there is a time window in which Key->metadata (the root node) has been deleted while the child nodes (Key-ln) have not been deleted yet (or the reverse). If someone reads or updates the list during this window, they will get an inconsistent view of the list for that Key. Is there any remedy for this?
Thanks

You should use the Merge Operator; it's designed for this kind of append-to-value use case. Your design is read-before-write, which carries a performance penalty and should generally be avoided where possible: What's read-before-write in NoSQL?.
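Options options;
// Append each merged operand to the existing value, separated by ','.
options.merge_operator.reset(new StringAppendOperator(','));
DB::Open(options, kDBPath, &db);
...
// Each Merge records an operand without reading the current value.
db->Merge(WriteOptions(), "key", "value1");
db->Merge(WriteOptions(), "key", "value2");
db->Get(ReadOptions(), "key", &result); // returns "value1,value2"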
The above example uses the predefined StringAppendOperator, which simply appends new values at the end. You can define your own MergeOperator to customize the merge operation.
In the backend, the merge operation is performed on the read path (and during compaction, to reduce the number of versions). Details: Merge Operator Implementation.
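To make the "merge happens on the read path and during compaction" point more concrete, here is a small conceptual model in Python. It is not RocksDB code and makes no claim about RocksDB internals beyond what is described above; `MergeStore` and its methods are hypothetical. A merge only records an operand, a read folds the operands into a value, and compaction folds them once so later reads have less to do.

```python
class MergeStore:
    """Toy model of append-style merge operands (hypothetical, not RocksDB)."""

    def __init__(self, delimiter=","):
        self._delimiter = delimiter
        self._base = {}       # fully merged values
        self._operands = {}   # pending merge operands per key

    def merge(self, key, operand):
        # Write path: no read of the existing value, just record the operand.
        self._operands.setdefault(key, []).append(operand)

    def _resolve(self, key):
        parts = []
        if key in self._base:
            parts.append(self._base[key])
        parts.extend(self._operands.get(key, []))
        return self._delimiter.join(parts) if parts else None

    def get(self, key):
        # Read path: fold the base value and all pending operands together.
        return self._resolve(key)

    def compact(self):
        # Fold pending operands into the base value so reads get cheaper.
        for key in list(self._operands):
            self._base[key] = self._resolve(key)
            del self._operands[key]

    def pending(self, key):
        return len(self._operands.get(key, []))


store = MergeStore()
store.merge("key", "value1")
store.merge("key", "value2")
before = store.get("key")
store.compact()
after = store.get("key")
```

The value seen by a reader is the same before and after compaction; only the amount of work done at read time changes.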

getting results for a list of primary keys from dynamodb using table

I have a DynamoDB table from which I fetch a single row in the following way:
private Table myTable;
myTable = dynamoDB.getTable(tableName);
myTable.getItem(new PrimaryKey(primaryKey, primaryKeyValue));
Is there a way for me to retrieve items with a list of primary keys? I see that I can use batchGetItem, but for that I would need to use the AmazonDynamoDB interface. Is there an alternative way using the Table?

To get all items in your table you need to use the Scan operation:
The Scan operation returns one or more items and item attributes by
accessing every item in a table or a secondary index. To have DynamoDB
return fewer items, you can provide a FilterExpression operation.
If the total number of scanned items exceeds the maximum data set size
limit of 1 MB, the scan stops and results are returned to the user as
a LastEvaluatedKey value to continue the scan in a subsequent
operation. The results also include the number of items exceeding the
limit. A scan can result in no table data meeting the filter criteria.
By default it will return all fields, but you can provide a projection expression to get only some fields (ids in your case):
To read data from a table, you use operations such as GetItem, Query,
or Scan. DynamoDB returns all of the item attributes by default. To
get just some, rather than all of the attributes, use a projection
expression.
A projection expression is a string that identifies the attributes you
want. To retrieve a single attribute, specify its name. For multiple
attributes, the names must be comma-separated.
Keep in mind that scans are expensive, since you pay not for items that DynamoDB returns, but for items that DynamoDB reads in the database:
A Scan operation always scans the entire table or secondary index,
then filters out values to provide the desired result, essentially
adding the extra step of removing data from the result set. Avoid
using a Scan operation on a large table or index with a filter that
removes many results, if possible. Also, as a table or index grows,
the Scan operation slows. The Scan operation examines every item for
the requested values, and can use up the provisioned throughput for a
large table or index in a single operation. For faster response times,
design your tables and indexes so that your applications can use Query
instead of Scan. (For tables, you can also consider using the GetItem
and BatchGetItem APIs.).

Reasons why the Table class has no batch get item operation:
The Table class is thread safe
The Table class implements the atomic item operations, such as DeleteItemApi, GetItemApi, PutItemApi, QueryApi, ScanApi, UpdateItemApi
Batch get item needs to deal with multiple items
The most important point is that batch get item can get items from multiple tables
Example code to get items from multiple tables:
The code below gets items from the Movies and post tables
DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
TableKeysAndAttributes movieTableKeyAndAttributes = new TableKeysAndAttributes("Movies").withPrimaryKeys(new PrimaryKey("yearkey",1999 ,"title", "List test title"));
TableKeysAndAttributes postableKeyAndAttributes = new TableKeysAndAttributes("post").withPrimaryKeys(new PrimaryKey("postId", "14"));
BatchGetItemSpec batchGetItemSpec = new BatchGetItemSpec().withTableKeyAndAttributes(movieTableKeyAndAttributes,postableKeyAndAttributes);
BatchGetItemOutcome batchGetItemOutcome = dynamoDB.batchGetItem(batchGetItemSpec);
System.out.println(batchGetItemOutcome.getBatchGetItemResult().getResponses());

EMC Documentum DQL - How to delete repeating attribute

I have a few objects created on my database and I need to delete some of the repeating attributes related to them.
The query I'm trying to run is:
UPDATE gemp1_product objects REMOVE ingredients[1] WHERE (r_object_id = '08015abd8002cd68')
But all I get is the following error message:
Error querying database.
[DM_QUERY_E_UPDATE_INDEX]error: "UPDATE: Unable to REMOVE the attribute ingredients at index 1."
[DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR]warning: "attempt to delete non-existent attribute 88"
Object 08015abd8002cd68 exists and I can see it in the database. Queries like SELECT and DELETE work fine, but I do not want to delete the whole object.

There is no easy way to do this. The reason is that repeating attributes are ordered, to enable multiple repeating attributes to be synchronized for a given object.
Either
set the attribute value to be empty for the given position, and change your code to discard empty attributes, or
use multiple DQL statements to shuffle the order so that the last one becomes empty, or
change your data model, e.g. use a single attribute as a property bag with pre-defined delimiters.
Details (1)
UPDATE gemp1_product OBJECTS SET ingredients[1] = '' WHERE ...
Details (2)
For each index, first find the value at index+1:
SELECT ingredients
FROM gemp1_product
WHERE (i_position*-1)-1 = <index+1>
ENABLE (ROW_BASED)
Use the value in a new query:
UPDATE gemp1_product OBJECTS SET ingredients[1] = '<value_from_above>' WHERE ...
It should also be possible to do this by nesting DQL somehow, but it might not be worth the effort.
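Option (2) above can be pictured with a plain Python list standing in for the repeating attribute. This is only an illustration of the shifting logic, not DQL and not a Documentum API; `remove_by_shifting` is a hypothetical helper. Each assignment in the loop corresponds to one "read the value at index+1, then SET it at index" pair of statements, and the last position ends up empty.

```python
def remove_by_shifting(values, index):
    """Drop the value at `index` by copying each later value one slot down."""
    if not 0 <= index < len(values):
        raise IndexError("no repeating value at position %d" % index)
    shifted = list(values)
    for i in range(index, len(shifted) - 1):
        # One "SELECT value at i+1" followed by one "UPDATE ... SET attr[i] = value".
        shifted[i] = shifted[i + 1]
    shifted[-1] = ""  # the last position is left empty
    return shifted


ingredients = ["flour", "sugar", "salt"]
result = remove_by_shifting(ingredients, 1)
```

The order of the remaining values is preserved, which matters if several repeating attributes are kept in step by position.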

Something is wrong either with your query or with your repository. I think you are mistyping your attribute name or using the wrong index in your UPDATE query.
If you google for DM_OBJECT_W_DELETE_ATTR_POSITION_ERROR you'll find, on this link, a slightly more detailed explanation:
CAUSE: Program executed a DeleteAttr operation that specified a non-existent attribute position (either a negative number or a number larger than the number of attributes in the object).
From this you could guess that the type isn't in a consistent state, or that you are trying to remove an index beyond the end of your repeating attribute, etc. Did you check your repository with the Consistency Checker job and other similar jobs?
As for removing a repeating property (attribute) value with a DQL query, this is not achievable with a single query, since you need to specify the index position, which you don't know at first. Writing a simple script, or doing it manually if there aren't many values to delete, is the way to go.

Dealing with PL/SQL Collections

I have the following declaration for a collection:
TYPE T_TABLE1 IS TABLE OF TABLE_1%ROWTYPE INDEX BY BINARY_INTEGER;
tbl1_u T_TABLE1;
tbl1_i T_TABLE1;
This table will keep growing and, at the end, will be used in a FORALL loop to do an insert or update on TABLE_1.
Now there might be cases where I want to delete a certain element. So I am planning to create a procedure which will take the (unique) KEY and delete the element if that key is found.
PSEUDO CODE
FOR i IN tbl1_u.FIRST..tbl1_u.LAST
LOOP
IF tbl1_u(i).key = key THEN
tbl1_u.DELETE(i);
END IF;
END LOOP;
My questions are:
Once I delete a particular element, will the collection adjust automatically, i.e. will index i be taken by the next element, or will that index remain null/invalid and possibly give me an exception if I use it in a FORALL INSERT/UPDATE?
I don't think that I can pass a TABLE_1%ROWTYPE object to a procedure; do I have to create a record type?
Any other tips on managing collections for bulk delete/update/insert would be appreciated. Remember, I will be dealing with 2 tables: if I am inserting/updating in table_1 then it means I am deleting from table_2, and vice versa.

Given that TABLE_1.KEY is unique you might consider using that as the index to your associative arrays. That way you can delete from the collections using the KEY value, which according to the pseudocode is available when doing the deletions. This would also save you having to iterate through the table to find the KEY you want, as the KEY would be the index - so your "deletion" pseudo-code would become:
tbl1_u.delete(key);
To answer your questions:
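Since you're using associative arrays, when an element is deleted there is no "empty" space in the collection. The indexes of the remaining elements, however, don't actually change, so the deleted index simply no longer exists. Therefore you need to use the collection.PRIOR and collection.NEXT methods to loop through the collection. The same applies to FORALL: over a collection with gaps, use the INDICES OF clause rather than a FIRST..LAST range, which raises an error when it reaches a deleted index. But again, if you use the KEY value as the index you may not need to loop through the collections at all.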
You can pass a TABLE_1%ROWTYPE as a parameter to a PL/SQL procedure or function.
You might want to consider using a MERGE statement, which can handle the inserts and updates in one step. This might allow you to maintain only a single collection. It might be worth looking into.
Share and enjoy.
