As the title suggests, I need to fetch all records from a DynamoDB table GSI, given that I know the hash key and I know the sort key value that I want to exclude.
The table looks like this:
Id - Primary Key,
AId - GSI hash key,
BId - GSI sort key
I need an efficient query to get records by a query like this
AId = 1 and BId != 2.
DynamoDB doesn't support the <> operator in key conditions on hash and sort keys. It is only available in filter expressions, and those are not allowed on any of the key attributes either.
So what would be the solution here? Scanning is probably not a good idea, unless it would be possible to scan on a partition, but that doesn't seem to be supported either.
So the only solution that is obvious to me at this point is querying by the partition key and then filtering it out client side.
Assuming that your sort key is actually numeric as shown in your example...
Then your best option would be to issue two separate queries:
AId = 1 and BId < 2
AId = 1 and BId > 2
Actually, as I write this...I think it would work regardless of the type of sort key...
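A minimal sketch of the two-query approach above, written as plain Python that only builds the request parameters. The table name "MyTable" and the index name "AId-BId-index" are made up for illustration, and the sort key is assumed to be numeric as in the example.

```python
def not_equal_queries(table_name, index_name, a_id, excluded_b_id):
    """Build two Query requests that together cover BId != excluded_b_id."""
    base = {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical GSI name
        "ExpressionAttributeNames": {"#a": "AId", "#b": "BId"},
        "ExpressionAttributeValues": {
            ":a": {"N": str(a_id)},
            ":b": {"N": str(excluded_b_id)},
        },
    }
    # One request for everything below the excluded value, one for everything above.
    below = dict(base, KeyConditionExpression="#a = :a AND #b < :b")
    above = dict(base, KeyConditionExpression="#a = :a AND #b > :b")
    return [below, above]


requests = not_equal_queries("MyTable", "AId-BId-index", 1, 2)

# With boto3 installed, each request could be sent with something like:
#   boto3.client("dynamodb").query(**request)
```

Each of the two queries may still need to be paged with LastEvaluatedKey if the partition is large, and the results concatenated client side.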
I have a requirement to query DynamoDB and get all the records that match a certain criterion. I have a table, say parent_child_table, which has parent_id and child_id as two columns. I need to query the table with a particular input id and fetch all the matching records. For example:
Now if I query the db with id 67899, then I should get both records, i.e. 12345 and 67899.
I was trying to use the method below:
GetItemRequest itemRequest=new GetItemRequest().withTableName("PARENT_CHILD_TABLE").withKey(partitionKey.entrySet().iterator().next(), sortKey.entrySet().iterator().next());
but I can't find an OR operator.
DynamoDB doesn't work like that...
GetItemRequest() can only return a single record.
Query() can return multiple records, but only if you are using a composite primary key (partition key + sort key) and you can only query within a single partition...so all the records to be returned must have the same partition key.
Scan() can return multiple records from any partition, but it does so by always scanning the entire table. Regular use of scan is a bad idea.
Without knowing more it's hard to provide guidance, but consider a schema like so:
partition key    sort key
12345            12345
12345            12345#67899
12345            12345#67899#97765
Possibly adding some sort of level indicator in the sort key or just as an attribute.
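One way to read the schema above: the sort key holds the path from the root id down to the item, so a subtree can be fetched with a begins_with key condition. The sketch below only builds the request parameters; the attribute names "pk" and "sk" and the "#" separator handling are assumptions, not something the schema above prescribes.

```python
SEPARATOR = "#"


def path_sort_key(ancestor_ids):
    """Join the ids from the root down to the item into one sort key."""
    return SEPARATOR.join(str(item_id) for item_id in ancestor_ids)


def subtree_query(table_name, root_id, ancestor_ids):
    """Build a Query request for an item and everything stored below it."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        # "pk" and "sk" are hypothetical attribute names.
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {
            ":pk": {"S": str(root_id)},
            ":prefix": {"S": path_sort_key(ancestor_ids)},
        },
    }


request = subtree_query("PARENT_CHILD_TABLE", 12345, [12345, 67899])
prefix = request["ExpressionAttributeValues"][":prefix"]["S"]
```

Note that a plain prefix would also match a sibling whose id merely starts with the same digits (for example 678990), so fixed-width ids or a trailing separator may be needed in practice.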
select * from tableName where columnName="value";
How can I fetch a similar result in DynamoDB using Java, without using the primary key as my attribute? (I need to group data based on the value of a particular column.)
I have gone through articles regarding BatchGetItem and QuerySpec, but all of these require me to pass the primary key.
Can someone give a lead here?
Short answer is you can't. Whenever you use the Query or GetItem operations in DynamoDB you must always supply the primary key of the table or index (for Query, at least its partition key).
You have two options:
Perform a Scan operation on the table and filter by columnName="value". However this requires DynamoDB to look at every item in the table so it is likely to be slow and expensive.
Add a Global Secondary Index to your table. This will require you to define a key for the index that contains the columnName you want to query; you can then run a Query against that index.
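The two options above could look roughly like this as request parameters. This is a sketch only: the table name "tableName" and the index name "columnName-index" are placeholders, and the attribute is assumed to be a string.

```python
def scan_request(table_name, attribute, value):
    """Option 1: Scan the whole table and filter the results afterwards."""
    return {
        "TableName": table_name,
        "FilterExpression": "#attr = :val",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":val": {"S": value}},
    }


def gsi_query_request(table_name, index_name, attribute, value):
    """Option 2: Query a GSI whose partition key is the attribute."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical index name
        "KeyConditionExpression": "#attr = :val",
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":val": {"S": value}},
    }


scan_params = scan_request("tableName", "columnName", "value")
query_params = gsi_query_request("tableName", "columnName-index", "columnName", "value")

# With boto3 installed:
#   client = boto3.client("dynamodb")
#   client.scan(**scan_params)    # reads every item, then filters
#   client.query(**query_params)  # reads only the matching index entries
```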
We have a group of related documents all sharing the same partition key. The thinking is simply grouping these up should be a case of querying on the partition key and stitching them together. What am I missing?
So
Select * from c where c.CustomerId = "500"
Would return, say, 3 documents (Address, Sales and Invoices), which all have a property named CustomerId with a value of 500.
I appreciate it's not the primary key and I am purposely omitting a row key.
Perhaps not splitting the documents is the answer, but the different documents have different TTLs, and this would then become problematic, wouldn't it?
CustomerId is the partition key.
The MS docs say this is possible (citing a city = seattle example, where their partition key is city).
So what am I missing? Is this a complete misunderstanding of querying in Cosmos? (I know a partition key is used to break up related data into partitions.) I didn't know this made it an unqueryable aspect.
Also I can query with partition key and rowkey no problem.
EDIT 2:
This works:
SELECT * FROM c WHERE c.CustomerId > "499" AND c.CustomerId < "501"
Ok,
So the range query working was a bit of a lead.
Custom indexing on the collection was causing issues.
At this moment, I have removed the custom indexing entirely and will build back up and then post a more specific answer.
What I did read was that the PartitionKey is implicitly indexed anyway. There was also a custom index on it, so maybe this was causing problems.
Indexing Policies CosmosDB
Maybe I'm not following, but you have to be explicit about the type of the value that you are looking for. These are not the same:
c.CustomerId = "500"
VS
c.CustomerId = 500
One is looking for text and the other for a number. Check how your data is stored; the value in the query has to be the same type if you want the query to match (bearing in mind that CustomerId is the partition key).
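To make the type difference concrete, here is a small sketch that builds the arguments for a parameterized query. It assumes the azure-cosmos Python SDK, whose container.query_items call accepts query, parameters and partition_key keyword arguments; the parameter name @customerId is made up.

```python
import json


def customer_query(customer_id):
    """Build keyword arguments for a single-partition, parameterized query."""
    return {
        "query": "SELECT * FROM c WHERE c.CustomerId = @customerId",
        "parameters": [{"name": "@customerId", "value": customer_id}],
        "partition_key": customer_id,
    }


as_text = customer_query("500")
as_number = customer_query(500)

# The two values serialize differently, so they are different JSON values.
text_json = json.dumps(as_text["parameters"][0]["value"])
number_json = json.dumps(as_number["parameters"][0]["value"])

# With azure-cosmos installed, the query could be run with something like:
#   container.query_items(**as_text)
```

Whichever form is used has to match the way CustomerId was written into the documents.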
Say I had a DynamoDB table:
UserId: S
BookName: S
BorrowedTimestamp: S
HasReturned: B
UserId (partition) and BookName (range) would be keys on the base table.
However I want to query using the other non-key fields e.g. BorrowedTimestamp > 3days and HasReturned is false.
I think I'd need to set up a GSI for this query to work, but it doesn't sound right having a binary field, HasReturned, as the partition key (with BorrowedTimestamp as range key). Is that correct, or am I missing something?
No, you don't need a GSI, but it might be more efficient depending on your circumstances.
Let's take your example of BorrowedTimestamp > 3days. I'm going to assume this is for a particular user, so you have a userid to query.
You could do a query with a KeyConditionExpression on userid, then a FilterExpression of BorrowedTimestamp > 3days. Let's say the user has 10 books and 2 of them have a BorrowedTimestamp > 3days. This query will consume read capacity for all 10 items (call it 10 RCUs, or Read Capacity Units, if each item costs one). That's because a FilterExpression only filters items out of your result set after they have been read; DynamoDB actually read all 10 items in the query.
Now let's say you have a GSI where the partition key is userid and the range key is BorrowedTimestamp. Your KeyConditionExpression could specify both the partition key of the userid and the range key of BorrowedTimestamp > 3days. The result would be exactly the same. However, this time it would only consume capacity for the 2 matching items (2 RCUs under the same assumption), and those RCUs would come from the index capacity, not the table capacity.
Fewer RCUs sounds good, but remember you have to purchase throughput capacity for your primary index and GSI separately. This can be less efficient because you can't share purchased throughput between queries that use your primary key and GSI.
Finally, if you didn't want to specify a userid at all you would use a scan. Scans sometimes don't scale well because they always evaluate every item in the table, but whether it works for you really depends on a lot of things (like how often you will use the scan, how many items you will have in the table, etc.).
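The two query shapes compared above can be sketched as request parameters. Several things here are assumptions: "BorrowedTimestamp > 3days" is read as "borrowed more than three days ago", timestamps are taken to be stored as UTC ISO-8601 strings (which sort chronologically), and the names "Loans" and "UserId-BorrowedTimestamp-index" are invented.

```python
from datetime import datetime, timedelta, timezone


def cutoff_iso(now, days=3):
    """ISO-8601 UTC timestamp for `days` before `now`."""
    return (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")


def _common(table_name, user_id, cutoff):
    return {
        "TableName": table_name,
        "ExpressionAttributeNames": {"#u": "UserId", "#ts": "BorrowedTimestamp"},
        "ExpressionAttributeValues": {":u": {"S": user_id}, ":cutoff": {"S": cutoff}},
    }


def filter_query(table_name, user_id, cutoff):
    """Base-table query: every item for the user is read, then filtered."""
    params = _common(table_name, user_id, cutoff)
    params["KeyConditionExpression"] = "#u = :u"
    params["FilterExpression"] = "#ts < :cutoff"
    return params


def index_query(table_name, index_name, user_id, cutoff):
    """GSI query: the range key condition limits which items are read."""
    params = _common(table_name, user_id, cutoff)
    params["IndexName"] = index_name  # hypothetical GSI name
    params["KeyConditionExpression"] = "#u = :u AND #ts < :cutoff"
    return params


now = datetime(2024, 1, 10, 12, 0, 0, tzinfo=timezone.utc)
cutoff = cutoff_iso(now)
on_table = filter_query("Loans", "user-1", cutoff)
on_index = index_query("Loans", "UserId-BorrowedTimestamp-index", "user-1", cutoff)
```

Both requests return the same items; they differ in how many items DynamoDB has to read to produce them.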
I have 2 LSIs in my table, which has a primary partition key and a primary sort key:
Org-ID - primary partition Key
ClientID- primary sort Key
Gender - LSI
Section - LSI
I have no issue querying the table with one LSI, but how do I specify 2 LSIs in a single query?
var params = {
    TableName: "MyTable",
    IndexNames: ['ClientID-Gender-index', 'ClientID-Section-index'],
    KeyConditionExpression: '#Key1 = :Value1 and #Key2=:Value2 and #Key3=:Value3',
    ExpressionAttributeNames: {
        "#Key1": "Org-ID",
        "#Key2": "Gender",
        "#Key3": "Section"
    },
    ExpressionAttributeValues: {
        ':Value1': "Microsoft",
        ':Value2': "Male",
        ':Value3': "Cloud Computing"
    }
};
Can anyone fix the issue in IndexName(line 3) or KeyConditionExpression(line 4), I'm not sure about it.
Issue
Condition can be of length 1 or 2 only
You can only query a single DynamoDB index at a time. You cannot use multiple indexes in the same query.
A simple alternative is to use a single index and apply a query filter, but this will potentially require a lot of records to be scanned and the filter only reduces the amount of data transferred over the network.
A more advanced alternative is to make a compound key. You would most likely want to use a GSI, rather than an LSI for this use case. By making a single new column that is the string concatenation of Key1, Key2, and Key3 you can use this GSI to search all three keys at the same time. This will make each individual record bigger by repeating data but it allows for a more complex query pattern.
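A rough sketch of the compound-key idea, assuming each item is written with an extra attribute (called "OrgGenderSection" here, a made-up name) that concatenates the three values with a "#" separator, and that a GSI named "OrgGenderSection-index" (also made up) uses that attribute as its partition key.

```python
def compound_key(*parts, separator="#"):
    """Concatenate key parts into the single attribute stored on each item."""
    return separator.join(parts)


def compound_query(table_name, index_name, org_id, gender, section):
    """Build a Query request that matches all three values with one key."""
    return {
        "TableName": table_name,
        "IndexName": index_name,  # hypothetical GSI on the compound attribute
        "KeyConditionExpression": "#ck = :ck",
        "ExpressionAttributeNames": {"#ck": "OrgGenderSection"},
        "ExpressionAttributeValues": {
            ":ck": {"S": compound_key(org_id, gender, section)},
        },
    }


request = compound_query(
    "MyTable", "OrgGenderSection-index", "Microsoft", "Male", "Cloud Computing"
)
key_value = request["ExpressionAttributeValues"][":ck"]["S"]
```

The separator has to be a character that cannot occur in any of the parts, otherwise two different combinations could produce the same compound value.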