Dynamo DB single table design and query - amazon-dynamodb

I'm new to DynamoDB and scratching my head over how to query my table, and I think I might be overthinking things. I have a simple table of partners with their corresponding developers. I would like to query the partner data based on a given developer id. I've created a global secondary index with the base table's SK as its PK (which lets me query with developer#123), but how would I get the partner item (partner#123) by developer id (developer#123) in this scenario in one single query? I'm only querying through the AWS console for now, so no code examples are available.
Any pointers would be greatly appreciated!

Related

Are multiple dynamoDB queries in a single API request bad practice

I'm trying to create my first DynamoDB based project and I'm having some trouble figuring out the best practices working with a NoSQL database.
My use case currently is storing users and teams. I have a table that has a partition key of either USER#{userId} or TEAM#{teamId}. If the PK is TEAM#{teamId}, I store records with an SK of either TEAM#{teamId} for the team details, or USER#{userId} for the user's details in the team (acceptedInvite, joinDate etc). I also have a GSI based on the userId/email column that allows me to query all the teams a user has been invited to, or the user's team, depending on the value of the acceptedInvite field. Attached are screenshots of the table structure at the moment:
The table
The GSI
In my application I have an access pattern of getting a team's team members, given a user id.
Currently, I'm doing two queries in my lambda function:
Get the user's team by querying the GSI on PK = {userId} and filtering on acceptedInvite = true
Get the team data by querying the table on PK = {teamId} and SK begins_with USER#
This works fine, but I'm concerned that I need to perform two separate DynamoDB calls in my API function.
I'm wondering if there's a better way to represent this access pattern and if multiple dynamoDB calls are actually that bad, since I cannot see another way to do this.
Any kind of feedback is appreciated!
The best way to avoid making two queries like this is to supply the API caller with all the information needed to make a single DynamoDB request. For your case this means supplying the caller with the teamId. You can do this either as part of a list operation response or, if it is the authenticated user, as part of their claims in a JWT.
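Once the caller knows the teamId, the lookup collapses to one Query on the base table. Here is a minimal sketch, assuming the key attributes are named PK and SK as in the question. The table name "teams" is a placeholder that does not come from the original post. The helper only builds the request parameters for the low-level boto3 client and calls nothing itself.

```python
def team_members_query(team_id: str, table_name: str = "teams") -> dict:
    """Build Query parameters for every USER# item in one team's partition.

    Assumptions for illustration only: the table name "teams" and the
    attribute names "PK" and "SK" are placeholders.
    """
    return {
        "TableName": table_name,
        # One partition (the team), narrowed to sort keys that start with USER#.
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"TEAM#{team_id}"},
            ":sk": {"S": "USER#"},
        },
    }


params = team_members_query("42")
print(params["ExpressionAttributeValues"][":pk"])  # {'S': 'TEAM#42'}
```

With credentials configured, the dict could be passed on as boto3.client("dynamodb").query(**params).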

How to transfer subset of dynamoDB records to a secondary index?

In "The DynamoDB Book" by Alex DeBrie, chapter 13.4 talks about how you can transfer a subset of DynamoDB records to a secondary index. Put another way, how you can filter some records so the secondary index can be used as a sort of SQL GROUP BY.
Where is the official API documentation for this?
Thanks for any help.
The concept you are referring to is a Sparse Index.
AWS wrote an article on the topic. However, I want to point out that this is merely a strategy on how you use the table, not a feature of the API.
When you create a Global Secondary Index, you define a set of attributes that DynamoDB will use to copy your items into the index. You don't do anything special to copy the items into the index yourself, it's something DynamoDB does transparently for you.
If the key attributes of the GSI you've defined don't exist on every item in the table, we call the index a "sparse index". In other words, only a subset of the items in your table will be in that index.
I'm sure Alex did a much better job of explaining this than I have, but it's important to note that this isn't something the API does for you. It's a side effect of which items you include/exclude in the GSI.
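The include/exclude behaviour can be illustrated in memory. This is only a sketch of the rule, not a DynamoDB API call: the function and the sample attribute "open_since" are made up for illustration. An item lands in the index only if it carries the index key attribute(s).

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def items_in_gsi(items: List[Item], gsi_pk: str, gsi_sk: str = "") -> List[Item]:
    """Return the items that would appear in a GSI keyed on gsi_pk (and gsi_sk).

    In-memory illustration only: DynamoDB does this itself, copying an item
    into the index only when the item has the index key attribute(s).
    """
    required = [gsi_pk] + ([gsi_sk] if gsi_sk else [])
    return [item for item in items if all(attr in item for attr in required)]


# Hypothetical data: only orders that are still open carry "open_since".
orders = [
    {"order_id": "o1", "open_since": "2024-01-03"},
    {"order_id": "o2"},  # attribute absent, so the item is not in the index
    {"order_id": "o3", "open_since": "2024-01-05"},
]

sparse = items_in_gsi(orders, gsi_pk="open_since")
print([o["order_id"] for o in sparse])  # ['o1', 'o3']
```

In practice you control membership by writing or removing the indexed attribute on the item. There is no separate "copy to index" call.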

How to query from two containers in Cosmos DB (SQL API)

I am new to Cosmos DB. I chose Cosmos DB (Core SQL) and created a database with two containers, say EmployeeContainer and DepartmentContainer. Now I want to query these two containers and fetch employee details with the associated department details. I am stuck at this point and need help.
Below is the structure of my containers.
EmployeeContainer : ID, Name, DepartmentID
DepartmentContainer: ID, Name
Thanks in advance.
Cosmos DB is not a relational database. You do not store different entities in different containers if they are queried together. They are either embedded in other entities or stored as separate rows using a shared partition key with other entities in the same container.
Before you get too far with Cosmos you need to understand how to model and partition data to ensure the best possible performance. I strongly recommend you read the docs on partitioning and specifically read these docs below.
Data modeling in Cosmos DB
Partitioning in Cosmos DB
How to model and partition data - a real world example
And watch Data Modeling in Cosmos DB - What every relational developer should know
It completely depends on the type of data you are trying to model. Generally, it comes down to relationships. 1:1 or 1:few relationships are often best modeled by embedding the related items, especially where the items are queried or updated together. 1:many or many:many relationships are usually better modeled by referencing, where the related items are queried or updated independently.
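Here is a rough sketch of the two shapes, using plain Python dicts in place of Cosmos DB documents. The property names "type" and "departmentId", and the choice of "departmentId" as the shared partition key, are assumptions for illustration and do not come from the question.

```python
# Embedding: the department travels inside the employee document (1:1 or 1:few).
employee_embedded = {
    "id": "e1",
    "name": "Ada",
    "department": {"id": "d1", "name": "Engineering"},
}

# Referencing within one container: both entity types share a partition key
# value (the assumed property "departmentId") and carry a "type" discriminator.
container_items = [
    {"id": "d1", "type": "department", "departmentId": "d1", "name": "Engineering"},
    {"id": "e1", "type": "employee", "departmentId": "d1", "name": "Ada"},
    {"id": "e2", "type": "employee", "departmentId": "d1", "name": "Grace"},
    {"id": "e3", "type": "employee", "departmentId": "d2", "name": "Linus"},
]


def read_partition(items, department_id):
    """Mimic a single-partition read: everything stored under one key value."""
    return [item for item in items if item["departmentId"] == department_id]


rows = read_partition(container_items, "d1")
print(sorted(row["type"] for row in rows))  # ['department', 'employee', 'employee']
```

Against a real container, the second shape would be read with a query along the lines of SELECT * FROM c WHERE c.departmentId = 'd1', which returns the department and its employees from one partition.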
For great talks on these issues check out https://www.gotcosmos.com/conf/ondemand
You can use a subquery.
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#mimic-join-with-external-reference-data
But this may consume a lot of RUs, and only inner joins are supported for now.

DynamoDB schema with unknown PartitionKey

I think this is a pretty straightforward use case for DynamoDB, but I couldn't think of a good solution.
Let's say I have a table like:
OrderId
PaymentId
AmmountPaid
Sometimes I need to query by OrderId so I can get all the payments made for this Order.
Sometimes I need to know which order a paymentId is related to.
It seems to me it would make sense to have OrderId as the PartitionKey. The issue is that I won't know it when I'm querying based on PaymentId.
Is there a better solution than storing a map of PaymentId -> OrderId on another table?
Thanks!
Use Secondary Indexes.
Some applications might need to perform many kinds of queries, using a variety of different attributes as query criteria. To support these requirements, you can create one or more global secondary indexes and issue Query requests against these indexes. To illustrate, consider a table named GameScores that keeps track of users and scores for a mobile gaming application. Each item in GameScores is identified by a partition key (UserId) and a sort key (GameTitle). The following diagram shows how the items in the table would be organized. (Not all of the attributes are shown)
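Applied to the order/payment table from the question, a sketch might look like the following. Assumptions that are not in the original post: the table name "Payments", the index name "PaymentIdIndex", string-typed keys, and PaymentId as the table's sort key so that one order can hold several payments. Both helpers only build request parameters for the low-level boto3 client.

```python
def payments_table_definition(table_name: str = "Payments") -> dict:
    """CreateTable parameters: OrderId + PaymentId table key, GSI on PaymentId."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "OrderId", "AttributeType": "S"},
            {"AttributeName": "PaymentId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "OrderId", "KeyType": "HASH"},
            {"AttributeName": "PaymentId", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "PaymentIdIndex",  # hypothetical index name
                "KeySchema": [{"AttributeName": "PaymentId", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def order_for_payment_query(payment_id: str, table_name: str = "Payments") -> dict:
    """Query parameters that look an item up by PaymentId through the GSI."""
    return {
        "TableName": table_name,
        "IndexName": "PaymentIdIndex",
        "KeyConditionExpression": "PaymentId = :p",
        "ExpressionAttributeValues": {":p": {"S": payment_id}},
    }


print(order_for_payment_query("pay-1")["KeyConditionExpression"])  # PaymentId = :p
```

Querying the base table by OrderId returns all payments for an order. Querying the index by PaymentId returns the item whose OrderId answers the second access pattern, so no separate mapping table is needed.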

Modeling ecommerce order table - DynamoDB + SNS + SQS

I created a DynamoDB table that stores orders from an ecommerce front end. When a user places an order, it is stored in this table. The table has a primary key (order_id) and two global secondary indexes (email, SSN).
I would like to query by order status too.
So I would like to retrieve all orders in a specific status on a specific date. What is the best way to model this behavior?
Make another global secondary index with a sort key?
Yes, you'll need to add another GSI.
This will, however, cost you money. One question that you can ask yourself is, do you really need real-time/low-latency lookups?
If not, then you can consider copying your DynamoDB data to a datastore like Redshift and run your queries on it. This:
Might be more cost-efficient, depending on your application.
Will allow you to support a wider variety of query patterns in the future. (Remember, the number of GSIs per table is limited, with a default quota of 20, and you've already used 2 of them.)
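For the GSI route, here is a minimal sketch of the status-and-date lookup. It assumes a hypothetical index named "status-date-index" with partition key "order_status" and sort key "created_at", where "created_at" holds an ISO-8601 timestamp string. None of these names come from the question. The helper only builds Query parameters for the low-level boto3 client.

```python
def orders_by_status_on_date(status: str, day: str, table_name: str = "orders") -> dict:
    """Build Query parameters for all orders in one status on one calendar day.

    Assumed schema (illustrative): GSI "status-date-index" with partition key
    "order_status" and sort key "created_at" such as "2024-05-01T13:45:00Z".
    """
    return {
        "TableName": table_name,
        "IndexName": "status-date-index",
        # ISO-8601 strings sort chronologically, so a date prefix selects one day.
        "KeyConditionExpression": "order_status = :st AND begins_with(created_at, :day)",
        "ExpressionAttributeValues": {
            ":st": {"S": status},
            ":day": {"S": day},
        },
    }


params = orders_by_status_on_date("SHIPPED", "2024-05-01")
print(params["ExpressionAttributeValues"][":day"])  # {'S': '2024-05-01'}
```

One design note: a status with very many orders concentrates reads and writes on a single index partition, which is worth weighing against the Redshift option above.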

Resources