OK! I just want to know why it is called a "secondary" index. Who is the first?
I am from China, so maybe it is a translation problem. I am just a little curious.
When I read another doc, I think I found the answer.
In that doc I found this:
Improving Data Access with Secondary Indexes
"Amazon DynamoDB provides fast access to items in a table by specifying primary key values. However, many applications might benefit from having one or more secondary (or alternate) keys available, to allow efficient access to data with attributes other than the primary key. To address this, you can create one or more secondary indexes on a table, and issue Query or Scan requests against these indexes."
In this passage we can see the key phrase "secondary (or alternate) keys". In English, 'primary' can mean main or more important, but when the terms are translated into Chinese, the relationship between them is lost.
So, "Who is the first?" I think the primary key is the 'first' that I care about.
All of the above comes from the official docs and my own understanding. If anything is wrong, please correct me.
I'm new to DynamoDB. I have the partition key group_id and the sort key groupid_storeid_sortk.
I want to set up an additional access pattern using group_id and store_addrss_sortk.
Will reusing the partition key in the secondary index have any impact on performance, or would it be better to create a new attribute as the secondary index key, even though it would be duplicate data?
Thank you
It’s fine to use the same partition key attribute again as the PK for the GSI. No problem there.
For the future: You may want to watch some videos on single-table design and start using PK/SK as generic names since you might want to overload what’s inside them for different items. And then you might want GSI1PK/GSI1SK as the GSI keys.
That’s a style thing when you aim for some optimizations single-table design can bring.
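As a rough illustration of reusing the partition key in a GSI, here is a minimal sketch of the keyword arguments one might pass to boto3's `create_table`. The attribute names come from the question; the table name `stores`, the index name `group_address_index`, the string attribute types, and on-demand billing are assumptions made for the example.

```python
# Sketch only: builds the arguments, it does not call AWS.
# "stores" and "group_address_index" are hypothetical names, and all
# key attributes are assumed to be strings ("S").
def build_table_definition():
    return {
        "TableName": "stores",
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "group_id", "AttributeType": "S"},
            {"AttributeName": "groupid_storeid_sortk", "AttributeType": "S"},
            {"AttributeName": "store_addrss_sortk", "AttributeType": "S"},
        ],
        # Base table: group_id + groupid_storeid_sortk
        "KeySchema": [
            {"AttributeName": "group_id", "KeyType": "HASH"},
            {"AttributeName": "groupid_storeid_sortk", "KeyType": "RANGE"},
        ],
        # GSI: same partition key attribute, different sort key
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "group_address_index",
                "KeySchema": [
                    {"AttributeName": "group_id", "KeyType": "HASH"},
                    {"AttributeName": "store_addrss_sortk", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }

table_definition = build_table_definition()
```

With credentials configured, this could be passed as `boto3.client("dynamodb").create_table(**table_definition)`.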
An index is simply another table that you don't have to manage yourself. When you create an index, the service (DynamoDB, for example) creates a new table for you and manages the synchronization of the data between the tables.
In DynamoDB you have two types of secondary indexes, global and local. If you use the same partition key, you can use either of these options. However, you have to define a local secondary index (LSI) when you create the table; you can't add it later. Only global secondary indexes (GSI) can be added after the table has been created. You can read more about it in the DynamoDB documentation.
Regarding performance, you need to consider the cost (read/write capacity) on top of the usual time considerations. Check whether you write to the table heavily and not only read from it, because every write that changes an indexed or projected attribute is also applied to the index. Based on that, you can carefully plan the projection of the data into the new index. Remember that writes are about 10 times more expensive and slower than reads. You can read more about projection best practices here.
Let's say I make a GSI for 'Name' and I have two people in my database who just happen to have the same name:
Tim Cook
Tim Cook
This will fail a uniqueness constraint on insert because of the duplicate values, so we need another approach.
I was thinking about appending a salt to the end of the name values so that the BEGINS_WITH operator can still be used to search and match, but that puts you in a weird position. What do you salt with? How many characters? The longer the salt, the more memory and potentially compute you waste stripping the salt before returning the results to the user. The shorter the salt, the more likely you are to have collisions. After all, there are some incredibly common names out there.
Here's an example of the values salted:
Tim Cook#ABCDEF
Tim Cook#ZYXWVU
This is great: I can now insert both values and create a 'search user by name' endpoint via the BEGINS_WITH('Tim Cook') operation, but it feels weird.
I did a bit of searching though on sorting and searching by names in DynamoDB and didn't come up with anything meaningful on how to proceed from here. Wondering what you guys think.
My final issue is that names are not evenly distributed, so you're inevitably going to have hotter partitions, but I just don't see another way around this, short of exporting the data to another data store, such as a full-text search store, and querying it there.
You can’t insert to a GSI. So your concern is kind of misplaced.
You also can’t GetItem on a GSI, only Query, and that’s because there’s not necessarily one matching item for a given key.
Note: The GSI always projects the primary key over from the base table.
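To make the "Query, not GetItem" point concrete, here is a toy in-memory model (plain Python, not DynamoDB code). It treats the index as a mapping from Name to a list of entries, each carrying the base-table key along. The `user_id` primary key is an assumption; the question doesn't say what the table's key is.

```python
from collections import defaultdict

# Toy model only. "user_id" as the base-table primary key is hypothetical.
base_table = {
    "u1": {"user_id": "u1", "Name": "Tim Cook"},
    "u2": {"user_id": "u2", "Name": "Tim Cook"},
    "u3": {"user_id": "u3", "Name": "Ada Lovelace"},
}

def build_name_index(table):
    """Group items by Name, keeping the base-table key with each entry."""
    index = defaultdict(list)
    for item in table.values():
        index[item["Name"]].append(
            {"Name": item["Name"], "user_id": item["user_id"]}
        )
    return index

def query_name_index(index, name):
    """Like Query on a GSI: returns every entry for the key, possibly several."""
    return index.get(name, [])

name_index = build_name_index(base_table)
matches = query_name_index(name_index, "Tim Cook")
```

Both "Tim Cook" items end up under the same index key, which is why a lookup by name has to return a list rather than a single item.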
You can follow the following schema pattern to achieve your goal:
Partition key: Name
Sort/Range key: createdAt (The creation time of that row)
In this case, if more than one person has the same name, a query returns all of them, automatically sorted by creation time. This schema also gives each item in your table a unique access pattern.
Partition key -> Sort key
Name -> createdAt
Tim Cook -> "HH:mm:ss"
Each row will have a different creation time, which gives every item in the table a unique composite key, provided the timestamp is precise enough that two items with the same name never share it.
For some reason I thought GSIs had the same uniqueness constraint as the table's primary key. That's not the case: you can have duplicates.
In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique.
Source
So a GSI is a perfectly good way to store duplicated information. Not sure this question is helpful now since it came about through ignorance so it might be worth deleting now.
In "The DynamoDB Book" by Alex DeBrie, chapter 13.4 talks about how you can copy a subset of DynamoDB records into a secondary index. Put another way, it shows how you can filter some records so that the secondary index can be used as a sort of SQL GROUP BY.
Where is the official API documentation for this?
Thanks for any help.
The concept you are referring to is a Sparse Index.
AWS wrote an article on the topic. However, I want to point out that this is merely a strategy on how you use the table, not a feature of the API.
When you create a Global Secondary Index, you define a set of attributes that DynamoDB will use to copy your items into the index. You don't do anything special to copy the items into the index yourself, it's something DynamoDB does transparently for you.
If the key attributes of the GSI you've defined don't appear on every item in the table, we call the index a "sparse index". In other words, only a subset of the items in your table will be in that index.
I'm sure Alex did a much better job of explaining this than I have, but it's important to note that this isn't something the API does for you. It's a side effect of which items you include/exclude in the GSI.
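A toy in-memory model may help show the side effect described above (plain Python, not DynamoDB code). The attribute names `order_id`, `status` and `open_since` are hypothetical; the only rule modeled is that an item appears in the index if and only if it carries the index key attribute.

```python
# Toy model of a sparse index. All attribute names are hypothetical.
items = [
    {"order_id": "o1", "status": "OPEN", "open_since": "2024-01-03"},
    {"order_id": "o2", "status": "SHIPPED"},  # no open_since: not indexed
    {"order_id": "o3", "status": "OPEN", "open_since": "2024-01-01"},
]

def build_sparse_index(table_items, index_key):
    """Keep only the items that carry the index key attribute."""
    return [item for item in table_items if index_key in item]

open_orders_index = build_sparse_index(items, "open_since")
```

In this model, "filtering" happens entirely at write time: removing `open_since` from an item is what takes it out of the index.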
If I know the primary keys of the items, which approach is best?
Scan with a FilterExpression using the IN operator
BatchGetItem with all keys in the request parameters
Please recommend a solution in terms of both latency and partition impact.
Probably neither. Of course it all depends on the key schema and the data in the table, but you probably want to create a Global Secondary Index for your most frequently used queries.
Having said that, performing scans is highly discouraged, especially when working with large volumes of data. So if you know the primary keys of the items you're interested in, go for BatchGetItem over doing a scan.
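One practical detail with BatchGetItem is that a single request accepts at most 100 keys, so a longer key list has to be split. Below is a minimal sketch of that chunking, producing `RequestItems` dictionaries in the low-level client format. The table name `Items`, the key attribute `id`, and string-typed keys are assumptions made for the example.

```python
# BatchGetItem accepts at most 100 keys per request.
MAX_KEYS_PER_BATCH = 100

def build_batch_get_requests(table_name, key_name, key_values):
    """Return a list of RequestItems dicts, each holding up to 100 keys.

    Assumes a simple primary key whose values are strings ("S").
    """
    requests = []
    for start in range(0, len(key_values), MAX_KEYS_PER_BATCH):
        chunk = key_values[start:start + MAX_KEYS_PER_BATCH]
        keys = [{key_name: {"S": value}} for value in chunk]
        requests.append({table_name: {"Keys": keys}})
    return requests

# Hypothetical table and key names, 250 keys to look up.
batch_requests = build_batch_get_requests(
    "Items", "id", [f"item-{n}" for n in range(250)]
)
```

Each dictionary would be passed as `RequestItems` to the boto3 client's `batch_get_item`; any `UnprocessedKeys` returned in a response need to be retried.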
I'm assessing whether I can use DynamoDB for our next project. What we are building is quite similar to a blogging platform. Here is a simple table:
Blog Post
ID - primary hash key
Title
DateCreated - primary range key
Votes
I've read enough to know how to do listing (a list of blog posts), paging (using the last fetched index), and getting post details (get a row). I will be sorting using DateCreated, which is my range key.
I'm struggling with how to sort on a secondary index. For example, if we have a column called Votes, how do you do Most Votes? My interpretation is that you can only sort using the range key, which I'm already using.
Update
AWS has just announced general availability of the much anticipated Global Secondary Indexes for Amazon DynamoDB, which are addressing the limitations of Local Secondary Indexes discussed further below:
You can now create indexes and perform lookups using attributes other than the item's primary key. [...]
You can now create up to five Global Secondary Indexes when you create a table, each referencing either a hash key or a hash key and a range key. You can also create up to five Local Secondary Indexes, and you can choose to project some or all of the table's attributes into each of the table’s indexes.
Please refer to the blog post for more details on the choice between these two models.
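As one possible way to apply this to the "Most Votes" question, here is a sketch of the keyword arguments for boto3's `query` against a Global Secondary Index. It assumes a design the question doesn't contain: an index named `VotesIndex` whose hash key is a hypothetical `PostType` attribute holding the same value on every post, with Votes as the range key. The table name `BlogPosts` is also made up.

```python
# Sketch only: builds the arguments, it does not call AWS.
# "BlogPosts", "VotesIndex" and the "PostType" attribute are hypothetical.
def build_most_votes_query(limit):
    return {
        "TableName": "BlogPosts",
        "IndexName": "VotesIndex",
        "KeyConditionExpression": "PostType = :post_type",
        "ExpressionAttributeValues": {":post_type": {"S": "post"}},
        "ScanIndexForward": False,  # descending by the index range key (Votes)
        "Limit": limit,
    }

most_votes_query = build_most_votes_query(10)
```

A single shared hash key value keeps the example short, but it concentrates all index traffic on one partition key, so it may not suit a large or busy table.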
Correction
As rightly pointed out by vartec, I got ahead of myself by adding this information on the day Local Secondary Indexes were announced without properly analyzing the problem at hand, where they are in fact not applicable. Ironically, I stressed just that myself in a later comment on another question:
[...] however, please note that local is a crucial limitation: A local secondary index is a data structure that maintains an alternate range key for a given hash key - while this covers many real world scenarios, it doesn't apply to arbitrary non primary key field queries like those of the question at hand.
Thanks vartec for spotting this error and apologies for being misleading here.
Initial (erroneous) answer
Amazon DynamoDB has just announced Support for Local Secondary Indexes to address your use case:
[...] We call the newest capability Local Secondary Indexes (LSI). While DynamoDB already allows you to perform low-latency queries based on your table’s primary key, even at tremendous scale, LSI will now give you the ability to perform fast queries against other attributes (or columns) in your table. This gives you the ability to perform richer queries while still meeting the low-latency demands of responsive, scalable applications.
See also the introductory blog post Local Secondary Indexes for Amazon DynamoDB for a more detailed explanation.
As usual for AWS, the new functionality is released with a constrained feature set at first, which is going to be expanded over time:
Today, local secondary indexes must be defined at the time you create your DynamoDB tables. In the future, we plan to provide you with an ability to add or drop LSI for existing tables. If you want to equip an existing DynamoDB table to local secondary indexes immediately, you can export the data from your existing table using Elastic Map Reduce, and import it to a new table with LSI. [emphasis mine]
Looks like this isn't possible; you can only sort by the range key.
I'm going to load up the table in memory and sort it in memory.
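For what it's worth, the in-memory approach could look roughly like this once the posts have been read from the table. The sample data is made up and only mirrors the attributes listed in the question; reading the table itself is left out.

```python
import heapq

# Hypothetical posts, shaped like the table in the question.
posts = [
    {"ID": "p1", "Title": "Hello", "DateCreated": "2013-04-01", "Votes": 12},
    {"ID": "p2", "Title": "Indexes", "DateCreated": "2013-04-02", "Votes": 40},
    {"ID": "p3", "Title": "Paging", "DateCreated": "2013-04-03", "Votes": 7},
]

def most_voted(items, limit):
    """Return the `limit` items with the highest Votes, highest first."""
    return heapq.nlargest(limit, items, key=lambda post: post["Votes"])

top_two = most_voted(posts, 2)
```

This only stays practical while the whole table fits comfortably in memory and can be re-read cheaply.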