DynamoDB query for multiple hash and range key pairs in a global secondary index - amazon-dynamodb

I have a table that contains SuperGroup, Group and User data where a SuperGroup contains multiple Groups and a Group contains multiple Users. Each of these has a type and uuid attribute, where the type corresponds to what they are.
I have a GSI with the hash key as the type attribute and the range key as the uuid and I need a way to query the table such that I can fetch the relevant data for a list of type and uuid pairs. There will always be exactly one of each type.
Pseudo-example of the query inputs:
query_inputs = [
("SuperGroup", "super-group-uuid"),
("Group", "group-uuid"),
("User", "user-uuid"),
]
Can I do this in a single query? I'd like to avoid a scan, but I'm open to modeling my data differently or creating the index differently, if that can help.
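One possible approach, sketched under assumptions rather than offered as a definitive answer: a Query operation accepts exactly one partition key value, so the three pairs can be fetched with three small Query calls against the index instead of a scan. The table name `entities`, the index name `type-uuid-index`, and both helper names are hypothetical; `query` is expected to behave like the `query` method of a boto3 DynamoDB client.

```python
from typing import Callable, Dict, List, Tuple

TABLE_NAME = "entities"          # hypothetical table name
INDEX_NAME = "type-uuid-index"   # hypothetical GSI name


def build_pair_query(type_value: str, uuid_value: str) -> Dict:
    """Build Query parameters for one (type, uuid) pair on the GSI."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        # Name placeholders avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#t = :t AND #u = :u",
        "ExpressionAttributeNames": {"#t": "type", "#u": "uuid"},
        "ExpressionAttributeValues": {
            ":t": {"S": type_value},
            ":u": {"S": uuid_value},
        },
    }


def fetch_pairs(query: Callable[..., Dict], pairs: List[Tuple[str, str]]) -> List[Dict]:
    """Run one Query per pair and collect all returned items."""
    items: List[Dict] = []
    for type_value, uuid_value in pairs:
        response = query(**build_pair_query(type_value, uuid_value))
        items.extend(response.get("Items", []))
    return items
```

With boto3 this might be called as `fetch_pairs(boto3.client("dynamodb").query, query_inputs)`. Each call reads only the items matching its pair, so the cost stays proportional to the number of pairs rather than the size of the table.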

Related

Dynamodb index with Json attribute

I am referring to a thread creating an index with JSON
I have a column called data in my DynamoDB table. This is in JSON and the structure of this file looks like this:
{
"config": "aasdfds",
"state": "PROCESSED",
"value": "asfdasasdf"
}
The AWS documentation says that I can create an index with the top level JSON attribute. However I don't know how to do this exactly. When I create the index, should I specify the partition key as data.state, then, in my code, use a query with the column data.state with the value set to PROCESSED, or should I create the partition key as data, then, in my code, look for the column data with the value set to state = "PROCESSED" ?
A top-level attribute here means that DynamoDB supports creating an index only on scalar attributes of type String, Number, or Binary.
The JSON attribute is stored as a Document data type, so an index can't be created on it or on values nested inside it.
The key schema for the index. Every attribute in the index key schema
must be a top-level attribute of type String, Number, or Binary. Other
data types, including documents and sets, are not allowed.
Scalar Types – A scalar type can represent exactly one value. The
scalar types are number, string, binary, Boolean, and null.
Document Types – A document type can represent a complex structure
with nested attributes—such as you would find in a JSON document. The
document types are list and map.
Set Types – A set type can represent multiple scalar values. The set
types are string set, number set, and binary set.
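Since a nested value such as data.state cannot be an index key, one workaround is to copy it into a top-level string attribute when the item is written and index that attribute instead. A minimal sketch, assuming items are plain Python dicts; the helper name `promote_state`, the `id` key, and the top-level attribute name `state` are hypothetical and not taken from the question:

```python
import copy
from typing import Any, Dict


def promote_state(item: Dict[str, Any], source: str = "data", field: str = "state") -> Dict[str, Any]:
    """Return a copy of item with item[source][field] duplicated at the top level."""
    promoted = copy.deepcopy(item)
    nested = promoted.get(source, {})
    # Only string values are copied, since an index key must be a scalar.
    if isinstance(nested, dict) and isinstance(nested.get(field), str):
        promoted[field] = nested[field]
    return promoted


item = {
    "id": "item-1",  # hypothetical primary key
    "data": {"config": "aasdfds", "state": "PROCESSED", "value": "asfdasasdf"},
}
indexed_item = promote_state(item)
```

An index whose partition key is the top-level `state` attribute could then be queried for the value PROCESSED.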

limit offset, sorting and aggregation challenges in DynamoDB

I am using DynamoDB to store my device events (in JSON format) in a table for further analysis, and I use the scan APIs to display the result set in a UI, which requires the following:
To define a limit and offset for records, say 10 records per page, so that the result set is paginated (e.g. page 1 has records 1-10, page 2 has records 11-20, and so on). I found an API like scanRequest.withLimit(10), but its limit has a different meaning from limit/offset. Does the DynamoDB API support limit/offset?
I also need to sort the result set on user-selected fields such as Date, Serial Number, etc., but I haven't found any sorting/order-by APIs.
I may also need aggregation, e.g. on Device Name, Date, etc., which also doesn't seem to be available in DynamoDB.
The above situation has led me to consider other NoSQL database solutions. Please assist me with the issues mentioned above.
The right way to think about DynamoDB is as a key-value store with support for indexes.
"Amazon DynamoDB supports key-value data structures. Each item (row) is a key-value pair where the primary key is the only required attribute for items in a table and uniquely identifies each item. DynamoDB is schema-less. Each item can have any number of attributes (columns). In addition to querying the primary key, you can query non-primary key attributes using Global Secondary Indexes and Local Secondary Indexes."
https://aws.amazon.com/dynamodb/details/
A table can have 2 types of keys:
Hash Type Primary Key—The primary key is made of one attribute, a
hash attribute. DynamoDB builds an unordered hash index on this
primary key attribute. Each item in the table is uniquely identified
by its hash key value.
Hash and Range Type Primary Key—The primary
key is made of two attributes. The first attribute is the hash
attribute and the second one is the range attribute. DynamoDB builds
an unordered hash index on the hash primary key attribute, and a
sorted range index on the range primary key attribute. Each item in
the table is uniquely identified by the combination of its hash and
range key values. It is possible for two items to have the same hash
key value, but those two items must have different range key values.
What kind of primary key have you set up for your Device Events table? I would suggest that you denormalize your data (i.e. pull specific attributes out of the json) and build additional indexes on those attributes that you want to sort and aggregate on: Date, Serial Number, etc. If I know what kind of primary key you have set up on your table, I can point you in the right direction to build these indices so that you can get what you need via the query method. The scan method will be inefficient for you because it reads every row in the table.
Lastly, with regard to your "limit offset" question, I think that you're looking for the ExclusiveStartKey request parameter. You normally set it from the LastEvaluatedKey that DynamoDB returns in the response to your previous query.
The ExclusiveStartKey is what will help you do pagination. DynamoDB returns a LastEvaluatedKey whenever it stops before the end of the result set, either because the Limit was reached or because 1 MB of data was processed, and you pass that value as the ExclusiveStartKey of the next request. It's not strictly necessary to depend on the LastEvaluatedKey from the response, though: any valid key of the table (or index) can be supplied as ExclusiveStartKey and used as an offset.
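To make the paging flow concrete, here is a minimal sketch, assuming `scan` behaves like the `scan` method of a boto3 DynamoDB client. The helper `fetch_page`, the table name `device_events`, and the in-memory `fake_scan` stand-in are all hypothetical and exist only so the loop can run without AWS.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fetch_page(
    scan: Callable[..., Dict],
    table_name: str,
    page_size: int,
    start_key: Optional[Dict] = None,
) -> Tuple[List[Dict], Optional[Dict]]:
    """Fetch one page; return (items, key to pass in for the next page)."""
    params = {"TableName": table_name, "Limit": page_size}
    if start_key is not None:
        params["ExclusiveStartKey"] = start_key
    response = scan(**params)
    return response.get("Items", []), response.get("LastEvaluatedKey")


def fake_scan(TableName: str, Limit: int, ExclusiveStartKey: Optional[Dict] = None) -> Dict:
    """In-memory stand-in for a DynamoDB client's scan, for illustration only."""
    rows = [{"id": {"N": str(i)}} for i in range(25)]
    start = 0 if ExclusiveStartKey is None else int(ExclusiveStartKey["id"]["N"]) + 1
    page = rows[start:start + Limit]
    response = {"Items": page}
    if start + Limit < len(rows):
        response["LastEvaluatedKey"] = page[-1]
    return response


# Walk through every page, 10 records at a time.
pages = []
next_key = None
while True:
    items, next_key = fetch_page(fake_scan, "device_events", 10, next_key)
    pages.append(items)
    if next_key is None:
        break
```

There is no numeric offset, so a UI has to remember the key returned for each page in order to move forward. Note also that Limit caps the number of items evaluated, so a page can come back shorter than the limit if a filter is applied.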

Does dynamodb support something like an "in" clause in its queries?

Say I have table of photos and users.
Given I have a list of users I'm following [user1,user2,...] and I want to get a list of photos of people I'm following.
How can I query the table of photos where photo.createdBy in [user1,user2,user3...]
I saw that dynamodb has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in dynamodb?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as a table.
A Query on this index will, of course, only retrieve the items for one createdBy value at a time. To refine the results when more items are returned, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Query or a Scan, but be careful about consuming too many Read Capacity Units when scanning for matching entries, because the filter is applied only after the items have been read.
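Filter expressions do accept an IN operator, so the closest thing to an "in" clause can be sketched as below. This is only an illustration under assumptions: the helper name `build_in_filter` is hypothetical, the values are assumed to be strings, and the resulting parameters would be passed to a Scan, which still reads every item in the table. Key condition expressions do not accept IN, so this cannot replace the partition key condition of a Query.

```python
from typing import Dict, List


def build_in_filter(attribute: str, values: List[str]) -> Dict:
    """Build FilterExpression parameters for `attribute IN (v0, v1, ...)`."""
    if not values:
        raise ValueError("IN needs at least one value")
    placeholders = [f":v{i}" for i in range(len(values))]
    return {
        "FilterExpression": f"#attr IN ({', '.join(placeholders)})",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {
            name: {"S": value} for name, value in zip(placeholders, values)
        },
    }


params = build_in_filter("createdBy", ["user1", "user2", "user3"])
```

For a long list of followed users on a large table, running one Query per user against the createdBy index and merging the results is likely to be cheaper than a filtered Scan.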

DynamoDB 'order by' based on a hash and other column criteria

I'm using DynamoDB. I was reading the documentation for Class QueryRequest, which says literally:
You can narrow the scope of the query by using comparison operators on
the range key value, or on the index key. You can use the
ScanIndexForward parameter to get results in forward or reverse order,
by range key or by index key.
But I need to know whether it is possible to sort my data by another attribute (other than the hash or range key).
Thanks in advance.
You can only Query a Table of type Hash and Range, or a Local Secondary Index or a Global Secondary Index. You have to create the Table with the indexes, so if you haven't then you cannot query on them.
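In practice this means the attribute you want to sort by has to be the range key of the table or of an index; the ScanIndexForward parameter then picks the direction. A rough sketch of the request parameters, assuming a string hash key; the helper name and the table, index, and attribute names used in the example call are hypothetical:

```python
from typing import Dict


def build_sorted_query(
    table: str,
    index: str,
    hash_attr: str,
    hash_value: str,
    descending: bool = False,
) -> Dict:
    """Build Query parameters that return one hash key's items in range-key order."""
    return {
        "TableName": table,
        "IndexName": index,
        "KeyConditionExpression": "#h = :h",
        "ExpressionAttributeNames": {"#h": hash_attr},
        "ExpressionAttributeValues": {":h": {"S": hash_value}},
        # True sorts ascending by the index range key, False descending.
        "ScanIndexForward": not descending,
    }


newest_first = build_sorted_query(
    "events", "device-date-index", "deviceId", "device-1", descending=True
)
```

The resulting dict could be passed to a boto3 client as `client.query(**newest_first)`. Sorting by an attribute that is not a range key of any index would have to be done in application code after the items are fetched.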

DynamoDB ordered list

I'm trying to store a List as a DynamoDB attribute, but I need to be able to retrieve the list in its original order. At the moment the only solution I have come up with is to build a custom hash map by prepending a position key to each value, converting the combined value to a String, and then storing those strings as a list.
e.g. key = position1, value = value1, String to be stored in the DB = "position1#value1"
To use the list I then need to filter, sort, substring and convert back to the original type. It seems like a long way round, but at the moment it's the only solution I can come up with.
Does anybody have any better solutions or ideas?
The List type in the newly added Document Types should help.
Document Data Types
DynamoDB supports List and Map data types, which can be nested to represent complex data structures.
A List type contains an ordered collection of values.
A Map type contains an unordered collection of name-value pairs.
Lists and maps are ideal for storing JSON documents. The List data type is similar to a JSON array, and the Map data type is similar to a JSON object. There are no restrictions on the data types that can be stored in List or Map elements, and the elements do not have to be of the same type.
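As a small illustration of how an ordered list looks in DynamoDB's low-level attribute format, here is a sketch assuming all elements are strings. The helper names are hypothetical; the `L` and `S` type descriptors are the ones DynamoDB uses for List and String values.

```python
from typing import Dict, List


def to_list_attribute(values: List[str]) -> Dict:
    """Encode a Python list of strings as a DynamoDB List attribute value."""
    return {"L": [{"S": value} for value in values]}


def from_list_attribute(attribute: Dict) -> List[str]:
    """Decode a DynamoDB List of strings back into a Python list, order intact."""
    return [element["S"] for element in attribute["L"]]


stored = to_list_attribute(["value1", "value2", "value3"])
restored = from_list_attribute(stored)
```

Because the List type keeps element order, no position prefix such as "position1#value1" is needed.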
I don't believe it is possible to store an ordered list as an attribute, as DynamoDB only supports single-valued and (unordered) set attributes. However, the performance overhead of storing a string of comma-separated values (or some other separator scheme) is probably pretty minimal, given that all the attributes for a row must together be under 64KB (the item size limit when this was written; it has since been raised to 400 KB).
(source: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/DataModel.html)
Add a range attribute to your primary keys.
Composite Primary Key for Range Queries
A composite primary key enables you to specify two attributes in a table that collectively form a unique primary index. All items in the table must have both attributes. One serves as a “hash partition attribute” and the other as a “range attribute.” For example, you might have a “Status Updates” table with a composite primary key composed of “UserID” (hash attribute, used to partition the workload across multiple servers) and a “Time” (range attribute). You could then run a query to fetch either: 1) a particular item uniquely identified by the combination of UserID and Time values; 2) all of the items for a particular hash “bucket” – in this case UserID; or 3) all of the items for a particular UserID within a particular time range. Range queries against “Time” are only supported when the UserID hash bucket is specified.
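The third kind of query described above (all items for one UserID within a time range) could be expressed roughly as follows. This is a sketch under assumptions: the table name `StatusUpdates` and the helper name are hypothetical, UserID is assumed to be a string, and Time is assumed to be stored as a number such as epoch seconds.

```python
from typing import Dict


def build_time_range_query(user_id: str, start: int, end: int) -> Dict:
    """Build Query parameters for one UserID with Time between start and end."""
    return {
        "TableName": "StatusUpdates",  # hypothetical table name
        # BETWEEN is inclusive on both ends; the hash key needs an equality test.
        "KeyConditionExpression": "#uid = :uid AND #time BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#uid": "UserID", "#time": "Time"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":start": {"N": str(start)},
            ":end": {"N": str(end)},
        },
    }


params = build_time_range_query("user-1", 1700000000, 1700003600)
```

Results come back sorted by the range attribute, which is what makes a range key useful for keeping items in order.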

Resources