Riak $bucket index lists non-existent keys - riak

I use a simple secondary index query in Riak, traversing the keys in a bucket:
http://riak01:8098/buckets/my_bucket/index/$bucket/_?max_results=10
There are 10 keys in the result, as expected, such. However, when I use some of these keys in a KV query, Riak doesn't find the item. This is not caused by this particular key just being removed by another process, if I repeat (both index and KV query) in an hour, the results are the same.
What might be the reason for such behavior? Is there a way to make sure the secondary index is always consistent with the actual bucket contents, i.e. it 2i query returns the key if and only if an item with such key exists in a bucket?

Related

DynamoDB querying Global Secondary Index with non-existant ExclusiveStartKey returns unexpected results

I have a DynamoDB table that has
partition key "idA", sort key "idB"
GSI partition key "idB", sort key "idA"
I am attempting to delete all items with specific "idB", so I query the GSI to get a list of records, but also want to paginate results for scale.
If I was querying against the main index I could probably simply re-run the query with a limit and delete each record but because I need to use the GSI this results in records that have been deleted showing up in subsequent queries due to GSI's eventual consistency, ie a records deletion often does not propogate to the GSI before the next query is invoked.
Another path is to use the LastEvaluatedKey from the previous query response as the ExclusiveStartKey for the next query which should result in only new records being returned, however there is a fair chance the record LastEvaluatedKey points too will no longer exist due to being deleted in the previous iteration.
When that happens weird results are returned, I thought it should work because on a main index if you send a non-existent ExclusiveStartKey Dynamo can still figure out where it should start retrieving records from, due to it's ordering system.
But on the GSI the results often start with the next expected record, but then some records might get skipped, and often the non-existent ExclusiveStartKey query will not return a LastEvaluatedKey, even though not all remaining records have been returned.
I am playing with ideas to handle this strange behaviour:
not deleting the last record return to decrease the chance
ExclusiveStartKey does not exist (and deleting it's record after the
next iteration)
doing an extra check when no LastEvaluatedKey is returned to make sure no records actually remain
But they are messy workarounds.
Anybody understand why the weird results happen, does it happen in any structured way?
Any other advice how to solve the task I am performing?
DynamoDB GSIs behave just like the base table and a LEK is a pointer to a position on a partition, it does not need the item to exist to understand where to start the next iteration.
Ensure you are not corrupting the ESK and that you pass it to the next Query exactly as returned as an LEK, it should include GSI keys as well as base table keys.
If you still see an issue after that, please share code.

How to filter DynamoDb by object property value

I have a DynamoDB table:
How shoul I filter entried in DB table where all keys are: access.role = "ADMIN"?
You would be best served by setting up an Global Index (GSI). You set the Partition Key equal to that attribute, and the Sort Key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partion_key = ADMIN
However. Be aware. Index's are a complete replication of the table. Dynamo is very good at this and relatively fast at doing so, but there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often you are pretty much fine. If you are calling it very often, then you should restructure your table.
Dynamo is not an SQL. When setting up a dynamo schema you have to consider how you will access your data. your Access Patterns. You should design your data with your Partition Key as the data you will have when looking up (Ie: i always will have a user ID number) and your sort keys as the individual documents related to that PK (ie: a user has a document that is his profile data, a document that is his profile picture url, a document that is a list of his friends user numbers, a document that is ... ect)
Then you use Indexs for things like your question that you wont be doing very often.

Fetch last item of the aws dynamodb table

So I wanted to fetch the last item/row of my dynamodb table but i am not finding resources. My primary key is id having series of incremented numbers such as 1,2,3... for each row respectively.
This is my function.
async function readMessage(){
const params = {
TableName: table,
};
return dynamo.getItem(params).promise();
}
I am not sure as to what i should be adding in my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns; if this is something you need perhaps you should consider using this id as Sort Key instead of Primary Key and then query the table in descending order while also limiting the amount of items to 1.
If instead you don't want to change the schema of your items and you don't care about making at least two operations to do this you have two, not optimal options:
If none of your items ever gets deleted, just make a count first and use that information to know what's the latest item that was written.
Alternatively, if you could consider keeping a "special" record in your DynamoDB table that is basically a count that gets always incremented/written when one of your "other" items gets written. Upon retrieval you first retrieve the value of this special record and use this info to retrieve the actual one.
The combination of the partition key and sort key, makes the primary key of your item in the dynamoDB, so their combination must be unique, otherwise the item will be overwritten.
In almost all my use-cases, I select the primary key as an object attribute, like the brand, an email or a class and then, for the sort key I select the TimeStamp. So in this way, you always know the partition key, we need it to retrieve the values and then you can query your dynamoDB by making some filters by the sort key. For more extensive examples using Python, check the AWS page: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, where it shows, how you can query your DynamoDB items.
There is also other ways to define the keys in your Dynamo and for that I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html

AWS DynamoDB Query based on non-primary keys

I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter base on a non-primary key attribute. My table looks like the following
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on the Name, but I think I have to give the key as well from what I know? Apart from that I can use the scan but then I will be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as a primary key
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
There are few designs principals that you can follow while you are using DynamoDB. If you are coming from a relational background, you have already witnessed the query limitations from primary key attributes.
Design your tables, for querying and separating hot and cold data.
Create Indexes for Querying from Non Key attributes (You have two options, Global Secondary Index which you can define at any time and Local Secondary Index which you need to specify at table creation time).
With the Global Secondary Index you can promote any NonKey attribute as the Partition Key for the Index and select another attribute for Sort Key for querying. For Local Secondary Index, you can promote any Non Key attribute as the Sort Key keeping the same Partition Key.
Using Indexes for query is important also to improve the efficiency in using provisioned throughput.
Although having indexes consumes the read throughput from the table, it also saves read through put from in a way that, if you project the right amount of attributes to read, it can give a huge benefit in reading. Check the following example.
Lets say you have a DynamoDB table that has items of 40KB. If you read directly from the table to list 10 items, it consumes 100 Read Throughput Units (For one item 10 Units since one unit can read 4KB and multiply it by 10). If you have an index defined just to project the attributes needed to list which will be having 4KB per item, then it will be consuming only 10 Read Throughput Units(One Unit per item) which makes a huge difference in terms of cost.
With DynamoDB its really important how you define Indexes to optimize for Querying not only from Query capability but also in terms of throughput.
You can not query based non-primary key attribute in Dynamo Db.
If you wanted to still do that you can do it using scan query,but scan is costly operation in DyanmoDB and if table is large, then it will affect performance and not recommended because it will scan each item in table and AWS cost you for all item it scan for that query.
There are two ways to achieve it
Keep Store Id as your PrimaryKey/ Partaion key of Dyanmo DB table and add Name/Location as sort Key (only one as Dyanmo DB accept only one Attribute as sort key by design.
Create Global Secondary Indexes for Querying from Non Key attributes which you are more frequenly required.
There are 3 ways to created GSI in Dyanamo DB, In your case select GSI with option INCLUDE and add Name , Location and store ID in Idex.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.

limit offset, sorting and aggregation challenges in DynamoDB

I am using DynamoDB to store my device events (in JSON format) into table for further analysis and using scan APIs to display the result set on UI, which requires
To define limit offset of records,say 10 records per page, means
result set should be paginated(e.g. page-1 has 0-10 records, page-2
has 11-20 records and so on), i got an API like scanRequest.withLimit(10) but it has different meaning of limit offset, does DynamoDB API comes with support of limit offset?
I also need to sort result set on basis of user input fields like sorting on Date, Serial Number etc, but still didn't get any sorting/order by APIs.
I may look for aggregation e.g. on Device Name, Date etc. which also doesn't seems to be available in DynamoDB.
The above situation led me to think about some others noSQL database solutions, Please assist me on above mentioned issues.
The right way to think about DynamoDB is as a key-value store with support for indexes.
"Amazon DynamoDB supports key-value data structures. Each item (row) is a key-value pair where the primary key is the only required attribute for items in a table and uniquely identifies each item. DynamoDB is schema-less. Each item can have any number of attributes (columns). In addition to querying the primary key, you can query non-primary key attributes using Global Secondary Indexes and Local Secondary Indexes."
https://aws.amazon.com/dynamodb/details/
A table can have 2 types of keys:
Hash Type Primary Key—The primary key is made of one attribute, a
hash attribute. DynamoDB builds an unordered hash index on this
primary key attribute. Each item in the table is uniquely identified
by its hash key value.
Hash and Range Type Primary Key—The primary
key is made of two attributes. The first attribute is the hash
attribute and the second one is the range attribute. DynamoDB builds
an unordered hash index on the hash primary key attribute, and a
sorted range index on the range primary key attribute. Each item in
the table is uniquely identified by the combination of its hash and
range key values. It is possible for two items to have the same hash
key value, but those two items must have different range key values.
What kind of primary key have you set up for your Device Events table? I would suggest that you denormalize your data (i.e. pull specific attributes out of the json) and build additional indexes on those attributes that you want to sort and aggregate on: Date, Serial Number, etc. If I know what kind of primary key you have set up on your table, I can point you in the right direction to build these indices so that you can get what you need via the query method. The scan method will be inefficient for you because it reads every row in the table.
Lastly, with regard to your "limit offset" question, I think that you're looking for the ExclusiveStartKey, which will be returned by DynamoDB in the response to your query.
The ExclusiveStartKey is what will help you do pagination. It's not necessary to depend on the LastEvaluatedKey from the response. You'll get LastEvaluatedKey only if you are getting more than a MB worth data. If LIMIT page size is such that total returned data size is less than 1 MB, you'll not get back LastEvaluatedKey. But that does not stop you from using ExclusiveStartKey as an offset.

Resources