How do I model this in DynamoDB?


I am testing out DynamoDB for a serverless app I am building. I have successfully modeled all of my application's query patterns except one. I was hoping someone could provide some guidance. Here are the details:
Data Model
There are three simple entities: User (~1K records), Product (~100K), ActionItem (~100/product).
A User has a many-to-many relationship with Product.
A Product has a one-to-many relationship with ActionItem.
The Workflow
There's no concept of "Team" for this app. Instead, a user is assigned a set of products which they (and others) are responsible for managing. The user picks the oldest items from their products' action item list, services the item and then closes it.
The use case I am trying to model is: As a user, show me all action items for products to which I am assigned.
Any help would be greatly appreciated.

Really only two options...
If the list of a user's products fits within DynamoDB's 400 KB item size limit, you could store it in a single item like so:
Hash key: userID
Sort key: "ASSIGNED_PRODUCTS"
Otherwise:
Hash key: userID
Sort key: "#PRODUCT#10001-54502"
The userID above might be the raw user ID or, if you are using a GSI, something like "#USER#user-id".
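A minimal sketch of one reading of the second option, with one item per user/product assignment. It is plain Python with a list standing in for the table. The `USER#` and `PRODUCT#` prefixes, the `PK`/`SK` attribute names, and both helper functions are assumptions made for illustration; against a real table the list filter would be a Query with an equality condition on the hash key and a begins_with condition on the sort key.

```python
PRODUCT_PREFIX = "PRODUCT#"  # hypothetical sort-key prefix


def assignment_item(user_id: str, product_id: str) -> dict:
    """Build one item per (user, product) assignment."""
    return {"PK": f"USER#{user_id}", "SK": f"{PRODUCT_PREFIX}{product_id}"}


def products_for_user(table: list, user_id: str) -> list:
    """Emulate a Query: PK equals the user key, SK begins with the product prefix."""
    pk = f"USER#{user_id}"
    return sorted(
        item["SK"][len(PRODUCT_PREFIX):]
        for item in table
        if item["PK"] == pk and item["SK"].startswith(PRODUCT_PREFIX)
    )


# In-memory stand-in for the table.
table = [
    assignment_item("u1", "10001"),
    assignment_item("u1", "54502"),
    assignment_item("u2", "10001"),
]
```

With the product IDs in hand, the action items for each product would still need their own lookup, for example one query per product.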

Related

What's the best way to store users in DynamoDB so I can get one efficiently, and a related group as well?

I have users for my website that need to log in. In order to do that, I have to check the database for them, by email address or a hash of their email.
Some of my users have an online course in common.
Others are all on the same project.
There are multiple projects and courses.
How might I set up my table so that I can grab individual users, and efficiently query related groups of users?
I'm thinking...
PK = user#mysite
SK = user#email.com
projects = [1,2,3]
courses = [101,202,303]
I can get any user with a get on PK = user#mysite, SK = user#email.com.
But if I query, I have to filter two attributes, and I feel like I'm no longer very efficient.
If I set up users like this on the other hand:
PK = user#email.com
SK = 1#2#3#101#202#303
projects = [1,2,3]
courses = [101,202,303]
Then I can get PK = user#gmail.com and that's unique on its own.
And I can query SK contains 101 for example if I want all the 101 course students.
But I have to maintain this weird #-delimited list of things in the SK string.
Am I thinking about this the right way?

You want to find items which possess a value in an attribute holding a list of values. So do I sometimes! But there is not an index for that.
You can, however, solve this by adding new items to the table.
Your main item would have the email address as both the PK and the SK. It includes attributes listing the courses and projects, and all the other metadata about that user.
For each course, you insert additional items where the course id is the PK and the member emails are the various SKs in that item collection. Same for projects.
Given an email, you can find all about them with a get item. Given a course or project you can find all matching emails with a query against the course or project id. Do a batch get items then if you need all the data about each email.
When someone adds or drops a course or project, you update the main item as well as add/remove the additional indexed items.
Should you want to query by course X and project Y you can pull the matching results to the client and join in the client on email address.
In one of your designs you're proposing a contains against the SK, which is not a supported operator against SKs so that design wouldn't work.
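The layout described above can be sketched in plain Python, with a nested dict standing in for the table (partition key, then sort key). The `course#` and `project#` key prefixes, the sample emails, and the helper names are assumptions for illustration only.

```python
from collections import defaultdict

# In-memory stand-in for the table: partition key -> {sort key: attributes}.
table = defaultdict(dict)


def put_user(email: str, courses: list, projects: list) -> None:
    """Write the main user item plus one membership item per course and project."""
    table[email][email] = {"courses": courses, "projects": projects}
    for course in courses:
        table[f"course#{course}"][email] = {}
    for project in projects:
        table[f"project#{project}"][email] = {}


def members(group_key: str) -> set:
    """Emulate a Query on the group's partition key; the sort keys are member emails."""
    return set(table.get(group_key, {}))


put_user("ann@example.com", [101, 202], [1])
put_user("bob@example.com", [101], [2])

# Course 101 AND project 1: query each group, then join on email in the client.
both = members("course#101") & members("project#1")
```

Adding or dropping a course would update the main item and add or delete the matching membership item, as the answer notes.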

Saving users scores and favorites in Firestore Database

I am working on a small project that uses Firestore as a backend. I will explain the database so that it is clear what I need:
Basically I have a collection that contains a list of documents where each one of them represent a game. For each game I have the name, cover image, info, category, etc.
I also have a collection of the users, where I have the specific UID for each user (retrieved from the auth section), email, etc.
What I want now is to save the score that a user may have in some of these games, as well as the favorite games that the user has saved. What I don't understand is how to create the connection between the users and the games. For example, I thought that I should save the users' scores by creating a collection within each document (game) in the first collection that I mentioned. But when I create this collection with ID "scores" it asks me for the first document, where I have to provide an ID (if not automatic), and then I don't know how to proceed.
I have read also that I would have to create additional collections in the root folder like "favorites" or "scores" specifying the UID of the user but, how do I connect the user UID, the score, and game which the user got that score from?
I hope I explained myself properly. Thanks.

Firstly, I agree with Doug's comment above. The Firestore tutorial videos are a great resource!
In terms of connecting data to your user, you have some options. You can either:
Create sub-collections under each user. Such as /users/{user_id}/favorites. Favorites could be a sub-collection or an array of game_ids depending on your use case.
Store a userID field in the documents in a top level "scores" or "favorites" collection. Then you can query for scores in the /scores collection by adding a where userID == {user_id} clause to your query of the /scores collection.
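A minimal sketch of the second option, using a plain Python dict in place of a top-level "scores" collection. The `gameID` and `score` field names and the composite document ID are assumptions for illustration; the list comprehension stands in for the `where userID == {user_id}` query.

```python
# In-memory stand-in for a top-level "scores" collection: document id -> fields.
scores = {}


def save_score(user_id: str, game_id: str, score: int) -> None:
    """Store one document per (user, game); the composite id is an assumption."""
    scores[f"{user_id}_{game_id}"] = {
        "userID": user_id,
        "gameID": game_id,
        "score": score,
    }


def scores_for_user(user_id: str) -> dict:
    """Emulate `where userID == user_id` and return game id -> score."""
    return {
        doc["gameID"]: doc["score"]
        for doc in scores.values()
        if doc["userID"] == user_id
    }


save_score("uid_a", "game_1", 120)
save_score("uid_a", "game_2", 45)
save_score("uid_b", "game_1", 300)
```

Each score document carries both the user's UID and the game's ID, which is what links the three pieces of data together.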

Cosmos db user id/email as partition key

I have a dilemma about choosing the best (synthetic) value for the partition key for storing user data.
User document has:
- id (guid)
- email (used to login, e.g.)
- profile data
There are 2 main types of queries:
Looking for user by id (most queries)
Looking for user by email (login and some admin queries)
I want to avoid cross partition queries.
If I choose id as the partitionKey (synthetic field), then login queries would be cross-partition.
On the other hand, if I choose email, then it's a problem if the user ever changes their email.
What I am thinking is to introduce a new type within the collection. Something like:
userId: guid,
userEmail: "email1",
partitionKey: "users-mappings"
Then I can have the User document itself as:
id: someguid,
type: "user",
partitionKey: "user_someguid",
profileData: {}
That way, when a user logs in, I first check the mappings type/partition by email, get the guid, and then fetch the actual User document by guid.
Also, this way the email can be changed without affecting partitioning.
Is this a valid approach? Any problems with it? Am I missing something?
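The two-step lookup described above can be sketched in plain Python, with a dict keyed by (partitionKey, id) standing in for the container. Using the email as the mapping document's id, and the helper names, are assumptions for illustration; the partition key values follow the question.

```python
from typing import Optional

# In-memory stand-in for one container: (partitionKey, id) -> document.
container = {}

MAPPINGS_PK = "users-mappings"


def create_user(guid: str, email: str, profile: dict) -> None:
    """Write the user document and its email -> guid mapping document."""
    container[(f"user_{guid}", guid)] = {
        "id": guid,
        "type": "user",
        "profileData": profile,
    }
    # Assumption: the mapping document uses the email as its id.
    container[(MAPPINGS_PK, email)] = {"userId": guid, "userEmail": email}


def find_by_email(email: str) -> Optional[dict]:
    """Two point reads: the mapping by email, then the user document by guid."""
    mapping = container.get((MAPPINGS_PK, email))
    if mapping is None:
        return None
    guid = mapping["userId"]
    return container.get((f"user_{guid}", guid))


def change_email(old: str, new: str) -> None:
    """Only the mapping document moves; the user's partition key is untouched."""
    mapping = container.pop((MAPPINGS_PK, old))
    mapping["userEmail"] = new
    container[(MAPPINGS_PK, new)] = mapping
```

Note that every mapping lives under the single "users-mappings" partition key value, so all login lookups land on that one logical partition.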

Your question does not have a standard answer. In my opinion, your mapping-type solution requires two queries, which is also inefficient. Choosing a partition key is always a matter of balancing pros and cons. Please see the guidance in the official documentation.
Based on your description:
1. Looking for user by id (most queries)
2. Looking for user by email (login and some admin queries)
I suggest you prioritize the most frequent query, that is, lookup by id.
My reasons:
1. The id rarely changes, so it is relatively stable.
2. A session or cookie can be saved after login, so login lookups are far less frequent than lookups by id.
3. The id is your most frequent query condition, so partitioning by id keeps those queries from fanning out across all partitions.
4. If you are concerned about login performance, don't forget to cover the email property in the indexing policy. That can also improve performance.

As you already know, in querying Cosmos DB, Fan-out should be the last option to query, especially on such a high-volume action such as logging in. Plus, the cost in RUs will be significantly higher with large data.
In the Cosmos DB SQL API, one pattern is to use synthetic partition keys. You can compose a synthetic partition key by concatenating the id and the email on write. This pattern works for a myriad of query scenarios providing flexibility.
Something like this:
{
"id": "123",
"email": "joe@abc.com",
"partitionKey": "123-joe@abc.com"
}
Then on read, do something like this:
SELECT s.something
FROM s
WHERE STARTSWITH(s.partitionKey, "123")
OR
ENDSWITH(s.partitionKey, "joe@abc.com")
You can also use SUBSTRING() etc...
With the above approach, you can search for a user either by their id or email and still use the efficiency of a partition key, minimizing your query RU cost and optimizing performance.

Cloud Firestore and data modeling: From RDBMS to No-SQL

I am building an iOS app that is using Cloud Firestore (not Firebase realtime database) as a backend/database.
Google is trying to push new projects towards Cloud Firestore, and to be honest, developers with new projects should opt-in for Firestore (better querying, easier to scale, etc..).
My issue is the same that any relational database developer has when switching to a no-SQL database: data modeling
I have a very simple scenario, that I will first explain how I would configure it using MySQL:
I want to show a list of posts in a table view, and when the user clicks on one post to expand and show more details for that post (let say the user who wrote it). Sounds easy.
In a relational database world, I would create 2 tables: one named "posts" and one named "users". Inside the "posts" table I would have a foreign key indicating the user. Problem solved.
Poor Barry, never had the time to write a post :(
Using this approach, I can easily achieve what I described, and also, if a user updates his/her details, you will only have to change it in one place and you are done.
Let's now switch to Firestore. I like to think of an RDBMS's table names as Firestore's collections and the content/structure of the table as the documents.
In my mind I have 2 possible solutions:
Solution 1:
Follow the same logic as the RDBMS: inside the posts collection, each document should have a key named "userId" and the value should be the documentId of that user. Then by fetching the posts you will know the user. Querying the database a second time will fetch all user related details.
Solution 2:
Data duplication: Each post should have a map (nested object) with a key named "user" and containing any user values you want. By doing this the user data will be attached to every post it writes.
Coming from the normalization realm of RDBMS this sounds scary, but a lot of no-SQL documents encourage duplication(?).
Is this a valid approach?
What happens when a user needs to update his/her email address? How easily you make sure that the email is updated in all places?
The only benefit I see in the second solution is that you can fetch both post and user data in one call.
Is there any other solution for this simple yet very common scenario?
ps: go easy on me, first time no-sql dev.
Thanks in advance.

Use solution 1. Guidance on nesting vs not nesting will depend on the N-to-M relationship of those entities (for example, is it 1 to many, many to many?).
If you believe you will never access an entity without accessing its 'parent', nesting may be appropriate. In Firestore (or other document-based NoSQL databases), you should decide whether to nest that entity directly in the document or in a subcollection based on the expected size of that nested entity. For example, messages in a chat should be a subcollection, as they may in total exceed the maximum document size.
Mongo, a leading noSQL db, provides some guides here
Firestore also provided docs
Hope this helps

@christostsang I would suggest a combination of option 1 and option 2. I like to duplicate data for the view layer and reference the user_id as you suggested.
For example, you will usually show a post and the created_by or author_name with the post. Rather than having to pay additional money and cycles for the user query, you could store both the user_id and the user_name in the document.
One model you could use is an object/map in Firestore. Here is an example model for you to consider:
posts = {
id: xxx,
title: xxx,
body: xxx,
likes: 4,
user: {refId: xxx123, name: "John Doe"}
}
users = {
id: xxx,
name: xxx,
email: xxx,
}
Now when you retrieve the posts document(s) you also have the user/author name included. This makes things easy on a postList page where you might show posts from many different users/authors without needing to query each user to retrieve their name. When a user clicks on a post and you want to show additional user/author information, like their email, you can perform the query for that one user on the postView page. FYI: you will need to consider changes that users make to their name, and whether you will update all posts to reflect the name change.
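The name-change concern can be sketched in plain Python, with dicts standing in for the two collections. The document IDs, sample values, and the `rename_user` helper are assumptions for illustration; the `user: {refId, name}` shape follows the model above.

```python
# In-memory stand-ins for the "users" and "posts" collections.
users = {
    "xxx123": {"name": "John Doe", "email": "john@example.com"},
    "yyy456": {"name": "Jane Roe", "email": "jane@example.com"},
}
posts = {
    "p1": {"title": "First", "likes": 4, "user": {"refId": "xxx123", "name": "John Doe"}},
    "p2": {"title": "Second", "likes": 0, "user": {"refId": "yyy456", "name": "Jane Roe"}},
    "p3": {"title": "Third", "likes": 1, "user": {"refId": "xxx123", "name": "John Doe"}},
}


def rename_user(user_id: str, new_name: str) -> int:
    """Update the user document, then fan the new name out to that user's posts.

    Returns the number of posts touched.
    """
    users[user_id]["name"] = new_name
    updated = 0
    for post in posts.values():
        if post["user"]["refId"] == user_id:
            post["user"]["name"] = new_name
            updated += 1
    return updated
```

The email is stored only on the user document, so an email change touches one place; only the duplicated name needs the fan-out.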

Firebase query for bi-directional link

I'm designing a chat app much like Facebook Messenger. My two current root nodes are chats and users. A user has an associated list of chats users/user/chats, and the chats are added by autoID in the chats node chats/a151jl1j6. That node stores information such as a list of the messages, time of the last message, if someone is typing, etc.
What I'm struggling with is where to define which two users are in the chat. Originally, I put a reference to the other user as the value of the chatId key in the users/user/chats node, but I thought that was a bad idea in case I ever wanted group chats.
What seems more logical is to have a chats/chat/members node in which I define userId: true, user2id: true. My issue with this is how to efficiently query it. For example, if the user is going to create a new chat with a user, we want to check if a chat already exists between them. I'm not sure how to do the query of "Find chat where members contains currentUserId and friendUserId" or if this is an efficient denormalized way of doing things.
Any hints?

Although the idea of having ids in the format id1---||---id2 definitely gets the job done, it may not scale if you expect to have large groups and you have to account for id2---||---id1 comparisons which also gets more complicated when you have more people in a conversation. You should go with that if you don't need to worry about large groups.
I'd actually go with using the autoId chats/a151jl1j6 since you get it for free. The recommended way to structure the data is to make the autoId the key in the other nodes with related child objects. So chats/a151jl1j6 would contain the conversation metadata, members/a151jl1j6 would contain the members in that conversation, messages/a151jl1j6 would contain the messages and so on.
"chats": {
  "a151jl1j6": {}
},
"members": {
  "a151jl1j6": {
    "user1": true,
    "user2": true
  }
},
"messages": {
  "a151jl1j6": {}
}
The part where this gets a little "inefficient" is querying for conversations that include both user1 and user2. The recommended way is to create an index of conversations for each user and then query the members data.
"user1":{
"chats":{
"a151jl1j6":true
}
}
This is a trade-off when it comes to querying relationships with a flattened data structure. The queries are fast since you are only dealing with a subset of the data, but you end up with a lot of duplicate data that need to be accounted for when you are modifying/deleting i.e. when the user leaves the chat conversation, you have to update multiple structures.
Reference: https://firebase.google.com/docs/database/ios/structure-data#flatten_data_structures
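A minimal sketch of that lookup in plain Python, with dicts standing in for the per-user chat index and the members node. The second chat ID, the extra users, and the `find_direct_chat` helper are assumptions for illustration.

```python
from typing import Optional

# Per-user index of chat ids (users/<uid>/chats).
user_chats = {
    "user1": {"a151jl1j6": True, "b272km2k7": True},
    "user2": {"a151jl1j6": True},
    "user3": {"b272km2k7": True},
}

# Members of each chat (members/<chatId>).
members = {
    "a151jl1j6": {"user1": True, "user2": True},
    "b272km2k7": {"user1": True, "user3": True, "user4": True},
}


def find_direct_chat(current_user: str, friend: str) -> Optional[str]:
    """Walk the current user's chat index and return the chat whose members
    are exactly the two given users, or None if there is none."""
    for chat_id in user_chats.get(current_user, {}):
        if set(members.get(chat_id, {})) == {current_user, friend}:
            return chat_id
    return None
```

Only the current user's own chats are scanned, so the cost grows with the number of chats that user is in rather than with the total number of chats.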

I remember I had a similar issue some time ago. Here is how I solved it:
user 1 has a unique ID id1
user 2 has a unique ID id2
Instead of adding a new chat by autoId chats/a151jl1j6, the ID of the chat was id1---||---id2 (super-original human-readable delimiter)
(which is exactly what you've originally suggested)
Originally, I put a reference to the other user as the value of the chatId key in the users/user/chats node, but I thought that was a bad idea in case I ever wanted group chats.
There is a saying: https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it
There might be a limit on how many user IDs can live in the path, but you can always hash the value...
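A minimal sketch of building such an ID in Python. Sorting the user IDs first is an assumption added here so that id1/id2 and id2/id1 map to the same chat; the choice of SHA-256 and the function names are likewise illustrative only.

```python
import hashlib

DELIMITER = "---||---"


def pair_chat_id(user_a: str, user_b: str) -> str:
    """Build a readable chat id that does not depend on argument order."""
    first, second = sorted((user_a, user_b))
    return f"{first}{DELIMITER}{second}"


def hashed_chat_id(*user_ids: str) -> str:
    """Build a fixed-length chat id for any number of members by hashing
    the sorted, delimited list of user ids."""
    joined = DELIMITER.join(sorted(user_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because the ID is derived from the members, checking whether a chat already exists becomes a direct read of chats/<id> rather than a query.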
