Distinct attribute value from Global Secondary Index in DynamoDB - amazon-dynamodb

How can we achieve functionality similar to SQL's DISTINCT keyword in Amazon DynamoDB?

DynamoDB does not support this kind of functionality directly, but you can achieve it in other ways: deduplicate on the client side, or attach a Lambda function to the table's DynamoDB stream that updates another table with the distinct values.
You can find a good answer here: Retrieve distinct values from the hash key - DynamoDB
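The stream-based option could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the stream is configured to include the new image of each item, `client` behaves like `boto3.client("dynamodb")`, and the side table (hypothetical) uses a single key attribute named `value`. The function name and parameters are made up for illustration.

```python
def record_distinct_values(event, client, distinct_table, attribute):
    """Copy each new value of `attribute` into `distinct_table`.

    `event` is a DynamoDB Streams batch as delivered to Lambda.
    `client` is assumed to behave like boto3.client("dynamodb").
    """
    written = []
    for record in event.get("Records", []):
        # REMOVE events carry no new image, so only inserts/updates matter.
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        new_image = record.get("dynamodb", {}).get("NewImage", {})
        value = new_image.get(attribute)  # DynamoDB JSON, e.g. {"S": "books"}
        if value is None:
            continue
        # put_item overwrites an item that has the same key, so the side
        # table ends up holding one item per distinct value.
        client.put_item(TableName=distinct_table, Item={"value": value})
        written.append(value)
    return written
```

Reading the distinct values is then a scan of the (much smaller) side table. Note that this sketch never removes a value, even after the last item carrying it is deleted from the main table.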

Related

Multi partition key search operation in DynamoDB

Is there an operation in the Scan API or the Query API that can look up items in a table with a composite key (pk/sk) by varying only the pk, so that scanning the table is cheaper?
Let me introduce a use case:
Suppose the partition key is the id of a project, and each project contains a huge number of records (sk).
Now I need to answer the query "return all projects". I have no partition key to query with, so I have to perform a scan.
I know that I could create a GSI that solves this problem, but let's assume that this is not the case.
Is there any way to perform a scan that "hops" from one pk to the next, ignoring the sk elements?
In other words, I would collect only the first record of each partition key.
DynamoDB is a NoSQL database, as you already know. It is optimized for lookups, and practices that you are used to from SQL databases or other (low-scale) databases are not always available in DynamoDB.
The concept of a partition key is to store records that belong to the same partition together, sorted by the sort key. The other side of this is that records with different partition keys are stored in other locations. The table is not one long list (or tree) of records that you can scan over.
When you design your schema in a NoSQL database, you need to consider the access pattern to that data. If you need a list of all the projects, you need to maintain an index that will allow it.
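One way to maintain such an index without a GSI is to register every project under a reserved partition key in the same table. The following is a minimal sketch of that idea, not a definitive design: the `PROJECT_INDEX` key, the `pk`/`sk` attribute names and both function names are assumptions, and `client` is assumed to behave like `boto3.client("dynamodb")`.

```python
PROJECT_INDEX_PK = "PROJECT_INDEX"  # hypothetical reserved partition key


def put_record(client, table_name, project_id, record_id, attributes=None):
    """Write a record and register its project under the index partition."""
    item = {"pk": {"S": project_id}, "sk": {"S": record_id}}
    item.update(attributes or {})
    client.put_item(TableName=table_name, Item=item)
    # Overwriting the same index item is harmless, so the index partition
    # keeps exactly one item per project.
    client.put_item(
        TableName=table_name,
        Item={"pk": {"S": PROJECT_INDEX_PK}, "sk": {"S": project_id}},
    )


def list_projects(client, table_name):
    """Return all project ids by querying only the index partition."""
    projects = []
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "pk"},
        "ExpressionAttributeValues": {":pk": {"S": PROJECT_INDEX_PK}},
    }
    while True:
        page = client.query(**kwargs)
        projects.extend(item["sk"]["S"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return projects
        kwargs["ExclusiveStartKey"] = last_key  # continue with the next page
```

"Return all projects" then becomes a Query on one partition instead of a Scan of the whole table. The two writes in `put_record` are not atomic in this sketch; a transactional write would be one way to tighten that.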

Query Cosmos DB for all partitions

I am thinking of creating a reporting tool for a Cosmos DB instance in our system. I want to discover all the partition keys and the number of items stored for each. I think I should be using pkranges but do not seem to be able to find any examples of how this should work. Any suggestions?
You can try to get the list of pkranges using this API.
As of now, there is no API that returns the list of partition keys. There was a feature request at Azure Cosmos DB Improvement Ideas, but it was declined.
So, you can try "SELECT DISTINCT c.partitionKey FROM c" to get the list of distinct partition keys.
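For the item counts the question asks about, one option is to read the partition key of every item and count on the client. The sketch below assumes `container` behaves like a `ContainerProxy` from the `azure-cosmos` Python SDK and that the partition key lives in a property called `partitionKey`, as in the query above; the function name is made up. It reads every item, so on a large container it will be slow and will consume request units accordingly.

```python
from collections import Counter


def count_items_per_partition_key(container, pk_property="partitionKey"):
    """Count items per partition key value with a cross-partition query.

    `pk_property` is interpolated into the query text, so it must be a
    trusted property name, never user input.
    """
    query = f"SELECT VALUE c.{pk_property} FROM c"
    values = container.query_items(
        query=query,
        enable_cross_partition_query=True,
    )
    # Each result is the raw partition key value (string or number).
    return Counter(values)
```

The returned `Counter` maps each partition key value to the number of items that carry it, which is enough to feed a simple report.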

Would a cross-partition query in Cosmos DB be helpful if we know only one part of the partition key?

We are using Cosmos DB for our data storage, and there is a case where I have to do a cross-partition query because I don't know the specific partition key. But I will know a part of it.
To elaborate, my partition key is a combination of multiple strings, let's say A-B,
and let's say I only know A but not B. Is there any way to do wildcard searching on the partition key?
Would that optimize the query, or is it not possible? Depending on the answer, I will decide whether to put A in the partition key at all.
Based on my research and the Partitioning in Azure Cosmos DB documentation, nothing mentions that Cosmos DB partition keys support wildcard searching. Only the indexing policy supports wildcard settings: https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#including-and-excluding-property-paths
So, for your situation, since you don't know B, I'd suggest you consider using A alone as the partition key. Besides, you could vote up this thread: https://github.com/PlagueHO/CosmosDB/issues/153
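With A alone as the partition key, a lookup that knows A stays inside one logical partition, and B becomes an ordinary filter. A minimal sketch, assuming `container` behaves like a `ContainerProxy` from the `azure-cosmos` Python SDK and that B is stored in a property named `b` (both the property name and the function name are hypothetical):

```python
def find_in_partition(container, a_value, b_value):
    """Query a single logical partition (A) and filter on B inside it."""
    return list(
        container.query_items(
            query="SELECT * FROM c WHERE c.b = @b",
            parameters=[{"name": "@b", "value": b_value}],
            # Passing partition_key scopes the query to one partition,
            # so no cross-partition fan-out is needed.
            partition_key=a_value,
        )
    )
```

When B is not known either, the same call without the `WHERE` clause still reads only the partition for A.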

Modeling ecommerce order table - DynamoDB + SNS + SQS

I created a DynamoDB table that stores orders from an ecommerce front end. When a user places an order, it is stored in this table. The table has a primary key (order_id) and two global secondary indexes (email, SSN).
I would like to query by order status too.
So I would like to retrieve all orders with a specific status on a specific date. What is the best way to model this behavior?
Should I create another global secondary index with a sort key?
Yes, you'll need to add another GSI.
This will, however, cost you money. One question that you can ask yourself is, do you really need real-time/low-latency lookups?
If not, then you can consider copying your DynamoDB data to a datastore like Redshift and running your queries there. This:
Might be more cost-efficient, depending on your application.
Will allow you to support a wider variety of query patterns in the future. (Remember, DynamoDB caps the number of GSIs per table, and you've already used 2 of them. The cap was 5 when this answer was written; the default quota is now 20.)
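If you do go with the extra GSI, the query could look roughly like this. It is a sketch under assumptions: the index uses a hypothetical `order_status` attribute as its partition key and a hypothetical `order_date` attribute (a string such as "2024-05-01") as its sort key, and `client` behaves like `boto3.client("dynamodb")`.

```python
def orders_by_status_and_date(client, table_name, index_name, status, order_date):
    """Return all orders with the given status on the given date via a GSI."""
    kwargs = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#status = :status AND #date = :date",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {
            "#status": "order_status",
            "#date": "order_date",
        },
        "ExpressionAttributeValues": {
            ":status": {"S": status},
            ":date": {"S": order_date},
        },
    }
    orders = []
    while True:
        page = client.query(**kwargs)
        orders.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return orders
        kwargs["ExclusiveStartKey"] = last_key  # fetch the next page
```

If the sort key holds a full timestamp rather than a date, a `begins_with(#date, :date)` condition on the sort key would be the usual adjustment.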

DynamoDB: How to find unique hash keys from primary key if its hash-range schema?

I have a DynamoDB table.
It has a primary partition key, IdType (String), and a primary sort key, Id (String).
As it is a hash-range schema, IdType is not unique and one key value can appear multiple times. I need to find all the unique IdType values.
How do we find them? One possible solution is to get all IdType values using Scan, process them client side, and find the unique ones in our own code. But a scan is expensive, and a scan is limited to 1 MB of data per call, so it does not seem feasible: the table already holds more than 1 MB of data and will keep growing.
Is there any other way to do this? Any help would be appreciated.
PS: There are no indexes.
The short answer is no. To query a DynamoDB table you first need the hash key, which rules out every Query-based option.
As far as I know, DynamoDB does not have any built-in feature for finding the unique values of a key.
If you want to achieve this, you can do it by:
1) Scanning the table, as you have mentioned, and filtering at the application level.
2) If your data is not updated frequently, storing the result in a cache and retrieving the desired information from there.
3) Using another AWS service, CloudSearch, to achieve the desired result (you have to pay more).
If you manage to achieve this with another method, please do share it.
Hope that helps.
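On the 1 MB concern: that limit applies to each Scan call, not to the scan as a whole. Each response carries a LastEvaluatedKey that can be passed back as ExclusiveStartKey to fetch the next page. Option 1 could therefore be sketched as below, assuming `client` behaves like `boto3.client("dynamodb")` and the key is a string attribute; the function name is made up.

```python
def unique_partition_keys(client, table_name, key_name="IdType"):
    """Scan the whole table page by page and collect distinct key values."""
    seen = set()
    kwargs = {
        "TableName": table_name,
        # Return only the partition key attribute of each item.
        "ProjectionExpression": "#k",
        "ExpressionAttributeNames": {"#k": key_name},
    }
    while True:
        page = client.scan(**kwargs)
        for item in page.get("Items", []):
            seen.add(item[key_name]["S"])  # assumes a String (S) key
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return seen
        kwargs["ExclusiveStartKey"] = last_key  # resume after the last item
```

The projection keeps the responses small, but the scan still reads every item, so the consumed read capacity grows with the table. That cost is the reason to cache the result (option 2) if the set of IdType values changes rarely.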
