I'm trying to understand the item size limit in DynamoDB, but I'm not sure what an item actually is.
Looking at the image below, does the 400 KB limit apply to the red rectangle (Primary key/Sort key) or to the green rectangle (only primary key)?
Sorry if the question is stupid but I can't find a definitive answer by myself.
Example
The terminology on your diagram is wrong, which is probably why you're confused.
When people say PK that's the Partition Key, and SK is the Sort Key. Together they make the Primary Key, which is the unique identifier for an item. The SK is actually optional, in which case the Partition Key is also the Primary Key.
An Item is the data put by PutItem and retrieved by GetItem. Basically, a row. The 400 KB limit applies to each individual item, not to an item collection.
An Item Collection is all items sharing the same PK value.
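To make the terms concrete, here is a minimal in-memory sketch in plain JavaScript (no AWS SDK involved). The PK and SK attribute names and all the values are assumptions made up for illustration. Each object stands for one item, and the grouping function shows what an item collection is.

```javascript
// Hypothetical items; the PK/SK attribute names and values are illustrative only.
const items = [
  { PK: "user#1", SK: "profile", name: "Ann" },
  { PK: "user#1", SK: "order#100", total: 25 },
  { PK: "user#2", SK: "profile", name: "Bob" },
];

// The primary key (partition key + sort key) identifies exactly one item.
function primaryKeyOf(item) {
  return `${item.PK}|${item.SK}`;
}

// An item collection is every item that shares one partition key value.
function groupIntoItemCollections(allItems) {
  const collections = new Map();
  for (const item of allItems) {
    if (!collections.has(item.PK)) collections.set(item.PK, []);
    collections.get(item.PK).push(item);
  }
  return collections;
}

const itemCollections = groupIntoItemCollections(items);
```

In this sketch the size limit would be checked against each of the three objects separately, never against the group stored under "user#1".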
In every Cosmos DB repository example I've seen, the id/row key has been generated like this: {partitionKey}:{Guid.newGuid()}. I'm working on a web API where the user won't necessarily have any way of knowing what this random GUID is. But they will know the EmployeeId, ProjectId etc. of the respective object, so I'm wondering if there are any issues with using e.g. EmployeeId as both the partition key and the id?
There's nothing technically wrong with setting the id and the partition key to the same value. However, you will have just one document per partition, and that's bad design IMHO, as all your read queries (e.g. listing all employees) will be cross-partition queries.
One approach could be to set the partition key as the type of the entity (Employee, Project etc.) and then set the id as the unique identifier of the entity (employee id, project id etc.).
To be honest, if you know the partition key AND the item id, you can do a Point read which is the fastest.
We also used to take the approach of using random GUIDs for all item IDs, but this means you will always need to know both this id and the partition key. Sometimes a more functional key as the item ID makes more sense, so think it through!
And remember, an item ID is not globally unique; it is only unique within its partition key.
So you could have two items with the same item ID and different partition key.
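The uniqueness rule can be illustrated with a small in-memory model. This is not the Cosmos DB SDK; the PartitionedStore class and its method names are hypothetical and exist only to show that a document is addressed by the pair (partition key, id).

```javascript
// In-memory model only; this is NOT the Cosmos DB SDK.
class PartitionedStore {
  constructor() {
    // partition key -> Map(id -> document)
    this.partitions = new Map();
  }

  upsert(partitionKey, doc) {
    if (!this.partitions.has(partitionKey)) {
      this.partitions.set(partitionKey, new Map());
    }
    this.partitions.get(partitionKey).set(doc.id, doc);
  }

  // A point read needs both the partition key and the id.
  pointRead(partitionKey, id) {
    const partition = this.partitions.get(partitionKey);
    return partition ? partition.get(id) : undefined;
  }
}

const store = new PartitionedStore();
// Same id "42" in two partitions: both documents coexist.
store.upsert("Employee", { id: "42", name: "Ann" });
store.upsert("Project", { id: "42", title: "Migration" });
```

Using the entity type as the partition key, as suggested above, means a functional id such as the employee id is enough for the caller to do a point read.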
I want to fetch the last item/row of my DynamoDB table, but I can't find any resources on it. My primary key is id, which holds a series of incrementing numbers such as 1, 2, 3... for each row respectively.
This is my function.
async function readMessage() {
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should be adding to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns. If this is something you need, consider using this id as the sort key instead of the partition key, and then query the table in descending order while limiting the number of items to 1.
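A minimal sketch of such a query, assuming the AWS SDK for JavaScript v2 DocumentClient and a redesigned table whose partition key groups the messages and whose sort key is the sequential id. The table name "Messages" and the attribute name "channel" are hypothetical.

```javascript
// Builds parameters for a DocumentClient.query call that returns only the
// item with the highest sort key value inside one partition.
// Table and attribute names passed in are assumptions for illustration.
function buildLatestItemQuery(tableName, partitionKeyName, partitionKeyValue) {
  return {
    TableName: tableName,
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": partitionKeyName },
    ExpressionAttributeValues: { ":pk": partitionKeyValue },
    ScanIndexForward: false, // read the sort key in descending order
    Limit: 1, // stop after the first (that is, the latest) item
  };
}

const latestParams = buildLatestItemQuery("Messages", "channel", "general");
// With a DocumentClient instance (assumed here to be named docClient):
// const { Items } = await docClient.query(latestParams).promise();
```

Note that the low-level client (the one that exposes getItem) expects typed attribute values such as { S: "general" } instead of plain JavaScript values.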
If instead you don't want to change the schema of your items and you don't mind making at least two operations to do this, you have two suboptimal options:
If none of your items ever gets deleted, just make a count first and use that information to work out which item was written last.
Alternatively, you could keep a "special" record in your DynamoDB table that is basically a counter, incremented every time one of your "other" items gets written. Upon retrieval, you first read the value of this special record and then use it to retrieve the actual item.
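The counter-record option could look roughly like this with the SDK v2 DocumentClient. It is a sketch under stated assumptions: the key attribute is a numeric id, real items start at id 1, id 0 is reserved for the counter record, and the counter attribute is called latestId. None of these names come from the question.

```javascript
// Assumption: id 0 is reserved for the counter record; real items use 1, 2, 3...
const COUNTER_ID = 0;

// Parameters for DocumentClient.update: atomically add 1 to the counter.
// ADD creates the attribute with value 1 if it does not exist yet.
function buildCounterIncrement(tableName) {
  return {
    TableName: tableName,
    Key: { id: COUNTER_ID },
    UpdateExpression: "ADD #latest :one",
    ExpressionAttributeNames: { "#latest": "latestId" },
    ExpressionAttributeValues: { ":one": 1 },
    ReturnValues: "UPDATED_NEW",
  };
}

// Parameters for DocumentClient.get: read the counter, or any item by id.
function buildGetById(tableName, id) {
  return { TableName: tableName, Key: { id } };
}

// Extracts the new counter value from the result of the update call.
function latestIdFrom(updateResult) {
  return updateResult.Attributes.latestId;
}
```

A writer would run the update first and use the returned value as the id of the new item; a reader would get the counter record and then get the item whose id equals latestId.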
The combination of the partition key and the sort key makes up the primary key of your item in DynamoDB, so the combination must be unique; otherwise the existing item will be overwritten.
In almost all my use cases, I choose an attribute of the object as the partition key, like the brand, an email or a class, and then I choose the timestamp as the sort key. This way you always know the partition key, which you need to retrieve the values, and you can then query your DynamoDB table by filtering on the sort key. For more extensive examples using Python, check the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how you can query your DynamoDB items.
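As a rough illustration of that pattern, the sketch below builds DocumentClient.query parameters that select one partition and a range of sort key values. The attribute names "email" and "timestamp", and the use of ISO 8601 strings for the timestamps, are assumptions for the example.

```javascript
// Builds DocumentClient.query parameters for one partition key value and a
// time window on the sort key. ISO 8601 strings sort chronologically, which
// is why they are assumed here as the timestamp format.
function buildTimeRangeQuery(tableName, email, fromIso, toIso) {
  return {
    TableName: tableName,
    // BETWEEN is inclusive on both ends.
    KeyConditionExpression: "#pk = :pk AND #ts BETWEEN :from AND :to",
    // Placeholders avoid clashes with DynamoDB reserved words such as TIMESTAMP.
    ExpressionAttributeNames: { "#pk": "email", "#ts": "timestamp" },
    ExpressionAttributeValues: { ":pk": email, ":from": fromIso, ":to": toIso },
  };
}

const rangeParams = buildTimeRangeQuery(
  "Events",
  "ann@example.com",
  "2024-01-01T00:00:00Z",
  "2024-01-31T23:59:59Z"
);
```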
There are also other ways to define the keys in your DynamoDB table; for those I advise you to check https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
Hi, I have created a DynamoDB table, but I get errors when I try to perform table.GetItem with only the username (image attached).
I found that this is a poorly designed table, so I thought of creating a new one. My question is how to set the attributes, local secondary indexes and global secondary indexes for a table with one primary key and 3 search columns.
or
Is it possible to have 3 more search columns (User_email, Username, Usertype) apart from the partition key column (user_ID) in DynamoDB?
For a table with a composite primary key, the GetItem API requires both the partition key and the sort key. However, you can use the Query API with only the partition key value; the sort key is not mandatory for Query.
GetItem rule:
For the primary key, you must provide all of the attributes. For
example, with a simple primary key, you only need to provide a value
for the partition key. For a composite primary key, you must provide
values for both the partition key and the sort key.
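The rule can be expressed as a small check against a table's key schema. This is a sketch, not part of any SDK: missingKeyAttributes is a hypothetical helper, and the schemas use the same KeySchema shape that DescribeTable returns.

```javascript
// Returns the names of key attributes that a GetItem Key object is missing.
// keySchema uses the DynamoDB KeySchema shape:
//   [{ AttributeName, KeyType: "HASH" | "RANGE" }, ...]
function missingKeyAttributes(keySchema, key) {
  return keySchema
    .map((element) => element.AttributeName)
    .filter((name) => !(name in key));
}

// Hypothetical schemas for illustration.
const simpleSchema = [{ AttributeName: "user_ID", KeyType: "HASH" }];
const compositeSchema = [
  { AttributeName: "user_ID", KeyType: "HASH" },
  { AttributeName: "Username", KeyType: "RANGE" },
];
```

With the composite schema, a key that contains only Username (or only user_ID) is incomplete, which is the situation that makes GetItem fail in the question.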
By default, you can define a maximum of 5 local secondary indexes and 20 global secondary indexes per table (the global secondary index quota was 5 when this answer was written).
An LSI is attached to a specific partition key value, whereas a GSI spans all partition key values. Since items having the same partition key value share the same partition in DynamoDB, the "Local" Secondary Index only covers items that are stored together (on the same partition). Thus, the purpose of the LSI is to query items that have the same partition key value but different sort key values. For example, consider a DynamoDB table that tracks Orders for customers, where CustomerId is the partition key.
With a local secondary index, there is a limit on item collection
sizes: For every distinct partition key value, the total sizes of all
table and index items cannot exceed 10 GB. This might constrain the
number of sort keys per partition key value.
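For the table in the question, one possible definition is sketched below as CreateTable parameters. It assumes user_ID stays the only key attribute, that all four attributes are strings, that on-demand billing is acceptable, and that one global secondary index per search column is wanted. The table name and the index names are made up. Local secondary indexes are left out because, as far as I know, they are only available on tables with a composite primary key.

```javascript
// Builds CreateTable parameters: user_ID as partition key plus one GSI
// for each of the three search columns named in the question.
function buildUsersTableDefinition() {
  const searchColumns = ["User_email", "Username", "Usertype"];
  return {
    TableName: "Users", // hypothetical table name
    BillingMode: "PAY_PER_REQUEST", // on-demand, so no ProvisionedThroughput
    // Only key attributes (table or index) are declared here.
    AttributeDefinitions: [
      { AttributeName: "user_ID", AttributeType: "S" },
      ...searchColumns.map((name) => ({ AttributeName: name, AttributeType: "S" })),
    ],
    KeySchema: [{ AttributeName: "user_ID", KeyType: "HASH" }],
    GlobalSecondaryIndexes: searchColumns.map((name) => ({
      IndexName: `${name}-index`, // hypothetical naming scheme
      KeySchema: [{ AttributeName: name, KeyType: "HASH" }],
      Projection: { ProjectionType: "ALL" },
    })),
  };
}

const usersTable = buildUsersTableDefinition();
// With the low-level SDK v2 client: await dynamodb.createTable(usersTable).promise();
```

A lookup by username would then be a Query against the "Username-index" index rather than a GetItem on the table.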
I am using DynamoDB to store my device events (in JSON format) in a table for further analysis, and I use the scan APIs to display the result set in the UI. This requires the following:
Limit/offset pagination of records, say 10 records per page (e.g. page 1 has records 1-10, page 2 has records 11-20, and so on). I found an API like scanRequest.withLimit(10), but its limit has a different meaning from limit/offset. Does the DynamoDB API support limit/offset?
I also need to sort the result set by user-selected fields such as Date, Serial Number, etc., but I haven't found any sorting/ORDER BY APIs.
I may also need aggregation, e.g. on Device Name, Date, etc., which also doesn't seem to be available in DynamoDB.
This situation has led me to consider other NoSQL database solutions. Please advise on the issues above.
The right way to think about DynamoDB is as a key-value store with support for indexes.
"Amazon DynamoDB supports key-value data structures. Each item (row) is a key-value pair where the primary key is the only required attribute for items in a table and uniquely identifies each item. DynamoDB is schema-less. Each item can have any number of attributes (columns). In addition to querying the primary key, you can query non-primary key attributes using Global Secondary Indexes and Local Secondary Indexes."
https://aws.amazon.com/dynamodb/details/
A table can have 2 types of keys:
Hash Type Primary Key: The primary key is made of one attribute, a hash attribute. DynamoDB builds an unordered hash index on this primary key attribute. Each item in the table is uniquely identified by its hash key value.
Hash and Range Type Primary Key: The primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. DynamoDB builds an unordered hash index on the hash primary key attribute, and a sorted range index on the range primary key attribute. Each item in the table is uniquely identified by the combination of its hash and range key values. It is possible for two items to have the same hash key value, but those two items must have different range key values.
What kind of primary key have you set up for your Device Events table? I would suggest that you denormalize your data (i.e. pull specific attributes out of the json) and build additional indexes on those attributes that you want to sort and aggregate on: Date, Serial Number, etc. If I know what kind of primary key you have set up on your table, I can point you in the right direction to build these indices so that you can get what you need via the query method. The scan method will be inefficient for you because it reads every row in the table.
Lastly, with regard to your "limit offset" question, I think that you're looking for ExclusiveStartKey. DynamoDB returns a LastEvaluatedKey in the response to your query, and you pass that value as the ExclusiveStartKey of the next request.
The ExclusiveStartKey is what will help you do pagination. DynamoDB returns LastEvaluatedKey whenever it stops before the end of the result set, either because the Limit was reached or because the page exceeded 1 MB of data. You don't have to depend on LastEvaluatedKey, though: ExclusiveStartKey is just the primary key of the item to start after, so you can build it yourself from the last item of the previous page and use it as an offset.
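A page-by-page fetch can be sketched as follows. The fetchPage helper is hypothetical, and queryFn stands for whatever performs the actual request, for example (params) => docClient.query(params).promise() with the SDK v2 DocumentClient. Injecting it keeps the sketch independent of the SDK.

```javascript
// Fetches one page of results.
// queryFn: async function taking DynamoDB query/scan params and resolving
//          to a response with Items and, optionally, LastEvaluatedKey.
// startKey: the nextKey returned for the previous page, or undefined
//           for the first page.
async function fetchPage(queryFn, baseParams, pageSize, startKey) {
  const params = { ...baseParams, Limit: pageSize };
  if (startKey) {
    params.ExclusiveStartKey = startKey; // resume right after this key
  }
  const result = await queryFn(params);
  return {
    items: result.Items,
    // undefined once DynamoDB has reached the end of the result set
    nextKey: result.LastEvaluatedKey,
  };
}
```

The UI would keep the nextKey of each page it has shown and send it back when the user asks for the following page. Jumping straight to an arbitrary page number is not possible this way, because each page needs the key that ended the previous one.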
I only use an integer primary key ID for its "auto-increment" function.
What if I don't need "auto-increment"? Do I still need a primary key if I don't care about the uniqueness of records?
Example: Lets compare this table:
create table if not exists `table1`
(
    name text primary key,
    tel text,
    address text
);
with this:
create table if not exists `table2`
(
    name text,
    tel text,
    address text
);
table1 declares a primary key and table2 doesn't. Will anything bad happen with table2?
I don't need the record to be unique.
SQLite is a relational database system. So it's all about relations. You build relations between tables on keys.
You can have tables without a primary key; it is not necessary for a table to have a primary key. But you will almost always want a primary key to show what makes a record unique in that table and to build relations.
In your example, what would it mean to have two identical records? They would mean the same person, no? Then how would you count how many persons named Anna are in the database? If you count five, how many of them are unique, and how many are mere duplicates? Such queries can be done properly, but they get overly complicated because of the missing primary key. And how would you build relations, say the cars a person drives? You would have a car table, and then how would you link it to the persons table in your example?
There are cases when you want a table without a primary key. These are usually log tables and the like. They are rare. Whenever you are creating a table without a primary key, ask yourself why this is the case. Maybe you are about to build something messy ;-)
You get auto-incrementing primary keys only when a column is declared as INTEGER PRIMARY KEY; other data types result in plain primary keys.
You are not required to declare a PRIMARY KEY.
But even if you do not do this, there will be some column(s) used to identify and look up records.
The PRIMARY KEY declaration helps to document this, enforces uniqueness, and optimizes lookups through the implicit index.