Say I have table of photos and users.
Given I have a list of users I'm following [user1,user2,...] and I want to get a list of photos of people I'm following.
How can I query the table of photos where photo.createdBy in [user1,user2,user3...]
I saw that dynamodb has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in dynamodb?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as a table.
This will, of course, only retrieve one item. To limit results when returning more items, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Filter or Scan, but be careful of using too many Read Capacity Units when scanning for matching entries.
Related
I have a DynamoDB table:
How shoul I filter entried in DB table where all keys are: access.role = "ADMIN"?
You would be best served by setting up an Global Index (GSI). You set the Partition Key equal to that attribute, and the Sort Key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partion_key = ADMIN
However. Be aware. Index's are a complete replication of the table. Dynamo is very good at this and relatively fast at doing so, but there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often you are pretty much fine. If you are calling it very often, then you should restructure your table.
Dynamo is not an SQL. When setting up a dynamo schema you have to consider how you will access your data. your Access Patterns. You should design your data with your Partition Key as the data you will have when looking up (Ie: i always will have a user ID number) and your sort keys as the individual documents related to that PK (ie: a user has a document that is his profile data, a document that is his profile picture url, a document that is a list of his friends user numbers, a document that is ... ect)
Then you use Indexs for things like your question that you wont be doing very often.
I've been reading a DynamoDB docs and was unable to understand if it does make sense to query on Global Secondary Index with a usage of 'contains' operator.
My problem is as follows: my dynamoDB document has a list of embedded objects, every object has a 'code' field which is unique:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index that would contain all entity.codes that are present in current db document separated by a comma. So the example above would look like:
{
"entities":[
{"code":"entity1Code", "name":"entity1Name"},
{"code":"entity2Code", "name":"entity2Name"}
],
"entitiesGlobalSecondaryIndex":"entityCode1,entityCode2"
}
And then I would like to apply filter expression on entitiesGlobalSecondaryIndex something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient or using global secondary index does not make sense in this way and DynamoDB will simply check the condition against every document which is similar so scan?
Any help is very appreciated,
Thanks
The contains operator of a query cannot be run on a partition Key. In order for a query to use any sort of operators (contains, begins with, > < ect...) you must have a range attributes- aka your Sort Key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are replication of the table - there is a slight potential for the data ina GSI to lag behind that of the master copy. If the query you're doing against this GSI isn't very often, then you're probably safe from that.
However. If you are trying to do this to the entire table at once then it's no better than a scan.
If what you need is a specific Code to return all its documents at once, then you could do a GSI with that as the PK. If you add a date field as the SK of this GSI it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
Since you may have multiple codes, if they aren't too many per document, you maybe could use a Sparse Index - if you have an entity with code "AAAA" then you also have an attribute named AAAA (or AAAAflag or something.) It is always null/does not exist Unless the entities contains that code. If you do a GSI on this AAAflag attribute, it will only contain documents that contain that entity code, and ignore all where this attribute does not exist on a given document. This may work for you if you can also provide a good PK on this to keep the numbers well partitioned and if you don't have too many codes.
Filter expressions by the way are different than all of the above. Filter expressions are run on tbe data that would be returned, after it is already read out of the table. This is useful I'd you have a multi access pattern setup, but don't want a particular call to get all the documents associated with a particular PK - in the interests of keeping the data your code is working with concise. The query with a filter expression still retrieves everything from that query, but only presents what makes it past the filter.
If are only querying against a particular PK at any given time and you want to know if it contains any entities of x, then a Filter expressions would work perfectly. Of course, this is only per PK and not for your entire table.
If all you need is numbers, then you could do a count attribute on the document, or a meta document on that partition that contains these values and could be queried directly.
Lastly, and I have no idea if this would work or not, if your entities attribute is a map type you might very well be able to filter against entities code - and maybe even with entities.code.contains(value) if it was an SK - but I do not know if this is possible or not
select * from tableName where columnName="value";
How can I fetch a similar result in DynamoDB using java, without using primary key as my attribute (Need to group data based on a value for a particular column).
I have gone through articles regarding getbatchitems, QuerySpec but all these require me to pass the primary key.
Can someone give a lead here?
Short answer is you can't. Whenever you use the Query or GetItem operations in DynamoDB you must always supply the table or index primary key.
You have two options:
Perform a Scan operation on the table and filter by columnName="value". However this requires DynamoDB to look at every item in the table so it is likely to be slow and expensive.
Add a Global Secondary Index to your table. This will require you to define a primary key for the index that contains the columnName you want to query
I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter base on a non-primary key attribute. My table looks like the following
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on the Name, but I think I have to give the key as well from what I know? Apart from that I can use the scan but then I will be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as a primary key
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
There are few designs principals that you can follow while you are using DynamoDB. If you are coming from a relational background, you have already witnessed the query limitations from primary key attributes.
Design your tables, for querying and separating hot and cold data.
Create Indexes for Querying from Non Key attributes (You have two options, Global Secondary Index which you can define at any time and Local Secondary Index which you need to specify at table creation time).
With the Global Secondary Index you can promote any NonKey attribute as the Partition Key for the Index and select another attribute for Sort Key for querying. For Local Secondary Index, you can promote any Non Key attribute as the Sort Key keeping the same Partition Key.
Using Indexes for query is important also to improve the efficiency in using provisioned throughput.
Although having indexes consumes the read throughput from the table, it also saves read through put from in a way that, if you project the right amount of attributes to read, it can give a huge benefit in reading. Check the following example.
Lets say you have a DynamoDB table that has items of 40KB. If you read directly from the table to list 10 items, it consumes 100 Read Throughput Units (For one item 10 Units since one unit can read 4KB and multiply it by 10). If you have an index defined just to project the attributes needed to list which will be having 4KB per item, then it will be consuming only 10 Read Throughput Units(One Unit per item) which makes a huge difference in terms of cost.
With DynamoDB its really important how you define Indexes to optimize for Querying not only from Query capability but also in terms of throughput.
You can not query based non-primary key attribute in Dynamo Db.
If you wanted to still do that you can do it using scan query,but scan is costly operation in DyanmoDB and if table is large, then it will affect performance and not recommended because it will scan each item in table and AWS cost you for all item it scan for that query.
There are two ways to achieve it
Keep Store Id as your PrimaryKey/ Partaion key of Dyanmo DB table and add Name/Location as sort Key (only one as Dyanmo DB accept only one Attribute as sort key by design.
Create Global Secondary Indexes for Querying from Non Key attributes which you are more frequenly required.
There are 3 ways to created GSI in Dyanamo DB, In your case select GSI with option INCLUDE and add Name , Location and store ID in Idex.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
I have a dynamodB table with which i fetch a single row in the following way:
private Table myTable;
myTable = dynamoDB.getTable(tableName);
myTable.getItem(new PrimaryKey(primaryKey, primaryKeyValue));
Is there a way for me to retrieve with a list of primary keys? I see that I can use batchGetItem but however for that I will need to use the interface AmazonDynamoDB. Is there an alternative way using the table?
To get all items in your table you need to use Scan operation:
The Scan operation returns one or more items and item attributes by
accessing every item in a table or a secondary index. To have DynamoDB
return fewer items, you can provide a FilterExpression operation.
If the total number of scanned items exceeds the maximum data set size
limit of 1 MB, the scan stops and results are returned to the user as
a LastEvaluatedKey value to continue the scan in a subsequent
operation. The results also include the number of items exceeding the
limit. A scan can result in no table data meeting the filter criteria.
By default it will return all fields, but you can provide a projection expression to get only some fields (ids in your case):
To read data from a table, you use operations such as GetItem, Query,
or Scan. DynamoDB returns all of the item attributes by default. To
get just some, rather than all of the attributes, use a projection
expression.
A projection expression is a string that identifies the attributes you
want. To retrieve a single attribute, specify its name. For multiple
attributes, the names must be comma-separated.
Keep in mind that scans are expensive, since you pay not for items that DynamoDB returns, but for items that DynamoDB reads in the database:
A Scan operation always scans the entire table or secondary index,
then filters out values to provide the desired result, essentially
adding the extra step of removing data from the result set. Avoid
using a Scan operation on a large table or index with a filter that
removes many results, if possible. Also, as a table or index grows,
the Scan operation slows. The Scan operation examines every item for
the requested values, and can use up the provisioned throughput for a
large table or index in a single operation. For faster response times,
design your tables and indexes so that your applications can use Query
instead of Scan. (For tables, you can also consider using the GetItem
and BatchGetItem APIs.).
Reasons for not having batch get item on Table class:-
Table class is Thread safe
Table class implements the atomic operations of items such as DeleteItemApi, GetItemApi, PutItemApi, QueryApi, ScanApi, UpdateItemApi
Batch get item needs to deal with multiple items
The most important point is that Batch get item can get items from multiple tables
Example code to get items from multiple tables:-
The below code get items from Movies and Post table
DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
TableKeysAndAttributes movieTableKeyAndAttributes = new TableKeysAndAttributes("Movies").withPrimaryKeys(new PrimaryKey("yearkey",1999 ,"title", "List test title"));
TableKeysAndAttributes postableKeyAndAttributes = new TableKeysAndAttributes("post").withPrimaryKeys(new PrimaryKey("postId", "14"));
BatchGetItemSpec batchGetItemSpec = new BatchGetItemSpec().withTableKeyAndAttributes(movieTableKeyAndAttributes,postableKeyAndAttributes);
BatchGetItemOutcome batchGetItemOutcome = dynamoDB.batchGetItem(batchGetItemSpec);
System.out.println(batchGetItemOutcome.getBatchGetItemResult().getResponses());