How to filter list value in DynamoDB - amazon-dynamodb

I am new to AWS DynamoDB and NoSQL.
I am lost on how to filter on a list value in DynamoDB.
Let's say that I have 2 items in this table.
[
{
"id": 1,
"title": "Robots in Music",
"topics": ["Robots", "Violin"]
},
{
"id": 2,
"title": "Where are good places to see stars",
"topics": ["Robots", "Stars"]
}
]
I want to filter using the topics attribute.
For example:
User wants to get the item with topics having "Robots".
User gets item with id 1 and 2
User wants to get item with topics having "Stars".
Then user gets item with id 2.
I searched the internet and found that I can use 'QueryFilter' with 'contains'.
However, I understand that 'contains' ends up scanning the whole table, and DynamoDB returns at most 1 MB of data per request. That means the request has to be repeated, and it would cost far more than using a single index.
Is there any way to use GSI and filter the list effectively?

Unfortunately, you cannot index a list type or any other type of nested attribute, so your use case would require you to Scan the entire table to find which items contain a particular topic. Taking the four lines of your example in order:
1. Would require a Scan.
2. A GetItem if the user wants just id 1, or a BatchGetItem if the user wants both id 1 and id 2.
3. Same as 1.
4. GetItem.
If your use case requires searching nested attributes, you can consider using a relational database or something more flexible like OpenSearch.
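A minimal sketch of what such a Scan loop could look like, in TypeScript. The ScanInput and ScanOutput types are local stand-ins that mirror the fields of a DynamoDB document-client Scan request and response, and ScanFn is a hypothetical wrapper around the real SDK call, so none of these names come from the question. The filter expression contains(topics, :topic) is applied after each page is read, which is why the loop has to follow LastEvaluatedKey through the whole table.

```typescript
// Local stand-in types so the sketch is self-contained (assumed shapes).
type Item = Record<string, unknown>;

interface ScanInput {
  TableName: string;
  FilterExpression: string;
  ExpressionAttributeValues: Record<string, unknown>;
  ExclusiveStartKey?: Item;
}

interface ScanOutput {
  Items?: Item[];
  LastEvaluatedKey?: Item;
}

// Hypothetical: in a real application this would wrap the SDK's Scan call.
type ScanFn = (input: ScanInput) => Promise<ScanOutput>;

// Reads the table page by page and keeps items whose `topics` list contains `topic`.
// Every item is still read (and billed), even if the filter discards it.
async function scanByTopic(scan: ScanFn, tableName: string, topic: string): Promise<Item[]> {
  const matches: Item[] = [];
  let startKey: Item | undefined = undefined;
  do {
    const page: ScanOutput = await scan({
      TableName: tableName,
      FilterExpression: "contains(topics, :topic)",
      ExpressionAttributeValues: { ":topic": topic },
      ExclusiveStartKey: startKey,
    });
    matches.push(...(page.Items ?? []));
    // A missing LastEvaluatedKey means the last page has been reached.
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined);
  return matches;
}
```

Because the cost grows with the size of the table rather than with the number of matches, this is only a reasonable fit for small tables or infrequent lookups.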

Related

How should we model many to many relationships in dynamodb when aiming for single table design

Quick high level background:
DynamoDB with single table design
OpenSearch for full text search
DynamoDB Stream which indexes into OpenSearch on DynamoDB Create/Update/Delete via Lambda
The single table design approach has been working well for us so far, but we also haven't really had many-to-many relationships to deal with. However a new relationship we recently needed to account for is Tags for Entry objects:
interface Entries {
readonly id: string
readonly title: string
readonly tags: Tag[]
}
interface Tags {
readonly id: string
readonly name: string
}
We want to try and stick to a single query/read to retrieve a list of Entries or a single Entry, but also want to find a good balance between simple reads and the effort of managing updates.
A few ways we've considered storing the data:
Store all tag data in the Entry
{
"id": "asdf1234",
"title": "Entry Title",
"tags": [
{
"id": "1234asdf",
"name": "stack"
},
{
"id": "4321hjkl",
"name": "over"
},
{
"id": "7657gdfg",
"name": "flow"
}
]
}
This approach makes reads easy, but updates become a pain - anytime a tag is updated, we would need to find a way to find all Entries that reference that tag and then update it.
Store only the tag ids in the Entry
{
"id": "asdf1234",
"title": "Entry Title",
"tags": ["1234asdf", "4321hjkl", "7657gdfg"]
}
With this approach, no updates would be required when a Tag is updated, but now we have to do multiple reads to return the full data - we would need to query each Tag by id to retrieve its data before returning the full content back to the client.
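As a rough illustration of the extra reads this option implies, the sketch below collects the distinct tag ids across a page of Entries and splits them into batches. It assumes the tags would be fetched with BatchGetItem, which accepts at most 100 keys per request; EntryRecord, BATCH_GET_LIMIT and tagIdBatches are hypothetical names used only for this example.

```typescript
// Assumed shape: an Entry that stores only tag ids.
interface EntryRecord {
  readonly id: string;
  readonly tags: string[];
}

// BatchGetItem accepts at most 100 keys per request.
const BATCH_GET_LIMIT = 100;

// Returns the distinct tag ids referenced by `entries`, grouped into batches
// that each fit into one batch read.
function tagIdBatches(entries: EntryRecord[], batchSize: number = BATCH_GET_LIMIT): string[][] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const entry of entries) {
    for (const tagId of entry.tags) {
      if (!seen.has(tagId)) {
        seen.add(tagId);
        unique.push(tagId);
      }
    }
  }
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize));
  }
  return batches;
}
```

Deduplicating first means a list of Entries that share tags costs one extra batch read in the common case, rather than one read per tag per Entry.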
Store only the tag ids in the Entry but use OpenSearch to query and get data
This option, similar to the one above, would store only the tag ids on the Entry, but our stream Lambda would include all Tag data in the Entry document that is indexed on the search side. An update to a Tag would still require querying and updating each affected Entry individually (in search), so the question is whether it's more cost effective to just do it in DynamoDB.
This scenario presents an interesting uni-directional flow:
writes go straight to DynamoDB
DynamoDB stream -> Lambda -> transform the data -> index in OpenSearch
reads are exclusively done via OpenSearch
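The transformation step in that flow could be sketched as below. This is only an illustration under assumptions: StoredEntry (tag ids only), TagDoc and SearchEntry (tags expanded) are hypothetical types modelled on the interfaces above, and the tag lookup is passed in as a Map rather than read from DynamoDB.

```typescript
interface TagDoc {
  readonly id: string;
  readonly name: string;
}

// Assumed shape of the item stored in DynamoDB: tag ids only.
interface StoredEntry {
  readonly id: string;
  readonly title: string;
  readonly tags: string[];
}

// Assumed shape of the document indexed in search: full tag data.
interface SearchEntry {
  readonly id: string;
  readonly title: string;
  readonly tags: TagDoc[];
}

// Builds the denormalized search document; tag ids with no known tag are skipped.
function toSearchDocument(entry: StoredEntry, tagsById: Map<string, TagDoc>): SearchEntry {
  const tags: TagDoc[] = [];
  for (const tagId of entry.tags) {
    const tag = tagsById.get(tagId);
    if (tag !== undefined) tags.push(tag);
  }
  return { id: entry.id, title: entry.title, tags };
}

// When a tag changes, these are the entries whose search documents need re-indexing.
function entriesAffectedByTag(entries: StoredEntry[], tagId: string): StoredEntry[] {
  return entries.filter((entry) => entry.tags.indexOf(tagId) !== -1);
}
```

With this split, a Tag rename leaves the DynamoDB Entries untouched and only re-runs toSearchDocument for the affected Entries.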
The overall question is: how do applications using NoSQL with single table design handle these many-to-many scenarios? Is using the uni-directional flow stated above a good idea, and is it worth it?
Things to consider:
our application leans more heavily on the read side
our application will also utilize search capability quite heavily
Tag updates will not be often

How to combine multiple firebase docs to get a combined result?

In my firebase db I have 3 collections:
Users
{user_id}: {name: "John Smith"}
Items
{item_id}: {value: 12345}
Actions
{action_id}: {action: "example", user: {user_id}, items:{item_id}}
Basically, instead of storing the Users and Items under the Actions Collection, I just keep an ID. But now I need a list of all actions and this also needs info from the Users and Items Collection. How can I efficiently query firebase so I can get a result that looks like this:
{
action: "example",
user: {
name: "John Smith"
},
item: {
value: 12345
}
}
Unfortunately, there is no such thing in Firebase or similar databases. Basically, you are looking for a traditional join, which is not a recommended thing to do in a NoSQL database.
If you want to do it in Firebase, you will need to:
Get the element you are looking for from your main collection, Actions in this case.
Then make another call to the Items collection where item_id == action.item_id.
Then assign the result: actions["Item"] = item_gotten.
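The steps above can be sketched as a small client-side join. This is a hedged illustration, not Firebase API code: ReadDoc is a hypothetical function standing in for whatever single-document read the app already uses, and the document shapes are taken from the question. The sketch also looks up the user in the same way as the item.

```typescript
// Shapes taken from the question; actions store ids only.
interface ActionDoc {
  action: string;
  user: string;
  items: string;
}
interface UserDoc {
  name: string;
}
interface ItemDoc {
  value: number;
}
interface CombinedAction {
  action: string;
  user: UserDoc;
  item: ItemDoc;
}

// Hypothetical reader: resolves to the document, or undefined if it does not exist.
type ReadDoc = (collection: string, id: string) => Promise<unknown>;

async function loadCombinedAction(read: ReadDoc, actionId: string): Promise<CombinedAction | undefined> {
  // Step 1: read the action itself.
  const action = (await read("Actions", actionId)) as ActionDoc | undefined;
  if (action === undefined) return undefined;
  // Step 2: read the referenced user and item in parallel.
  const [user, item] = await Promise.all([
    read("Users", action.user) as Promise<UserDoc | undefined>,
    read("Items", action.items) as Promise<ItemDoc | undefined>,
  ]);
  if (user === undefined || item === undefined) return undefined;
  // Step 3: assemble the combined result.
  return { action: action.action, user, item };
}
```

Each action costs three reads this way, so listing many actions multiplies the number of calls accordingly.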
As I said, this is not the recommended approach. When you use a NoSQL database you are usually expected to have a denormalized structure: your application saves the whole Item both inside the Action JSON and in the Items collection. Yes, you will have duplicate data, but that is fine for this kind of model. You also shouldn't expect a specific object to change very often. If you are handling a large volume of changes, you could be using the wrong kind of database.
For aggregation queries reference, you might check: https://firebase.google.com/docs/firestore/solutions/aggregation

Cosmos DB simple ID field for end user

Is it possible to include a user friendly ID field into cosmos db documents? This doesn't need to override the default id field that generates when adding a document but can be a custom one that is simple for an end user to know and search for.
Example document, the ref field is what I want to generate as a simple human readable identifier.
{
"id": "57275754475457-5445444-44420478",
"ref": "45H7GI",
"userId": "48412",
"whenCreated": "D2021-11-09T21:56:31.630",
"tenantId": "5566HH"
}
I'm looking at building a ticketing system and would like a simple ID that can be sent to a user, and that the user can reference when updating or searching for a ticket.
Any help with this would be appreciated.
For your own purposes, you can choose to either use id (which is guaranteed to be unique within a partition) or your own property (such as ref as you defined in your example). For any property other than id, you'd need to add a unique-key constraint when creating the container (and at that point, ref would be unique within any partition, just like id).
It's really your choice whether you store your custom IDs in id or ref. Just know that, if you ever want to do a direct read (instead of a query), you can only do a direct read against an id, not against any other property.
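For the ref value itself, here is a minimal sketch of one way to generate a short, human-readable code such as "45H7GI". Everything in it is an assumption for illustration: the six-character length, the 0-9/A-Z alphabet, and the isTaken callback, which stands in for a lookup against existing documents. Math.random is used for brevity and is not cryptographically secure.

```typescript
const REF_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// `random` is injectable so the function can be tested; it defaults to Math.random.
function generateRef(length: number = 6, random: () => number = Math.random): string {
  let ref = "";
  for (let i = 0; i < length; i++) {
    ref += REF_ALPHABET.charAt(Math.floor(random() * REF_ALPHABET.length));
  }
  return ref;
}

// Retries until `isTaken` reports a free value. A unique-key constraint on the
// container would still be the final guard against duplicates.
function generateUnusedRef(isTaken: (ref: string) => boolean, maxAttempts: number = 5): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = generateRef();
    if (!isTaken(candidate)) return candidate;
  }
  throw new Error("Could not generate an unused ref");
}
```

Six characters from a 36-character alphabet give roughly 2.2 billion combinations, so collisions are rare but still possible, which is why the retry and the constraint both matter.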

Organizing a Cloud Firestore database

I can't manage to determine what is the better way of organizing my database for my app :
My users can create items identified by a unique ID.
The queries I need :
- Query 1: Get all the items created by a user
- Query 2 : From the UID of an item, get its creator
My database is organized as follows :
Users database
user1 : {
item1_uid,
item2_uid
},
user2 : {
item3_uid
}
Items database
item1_uid : {
title,
description
},
item2_uid : {
title,
description
},
item3_uid : {
title,
description
}
Query 1 is quite simple, but for query 2 I need to parse the whole users database and list all the item IDs to see whether the one I am looking for is there. It works right now, but I'm afraid that it will slow down requests as the database grows.
Should I add in the items data a row with the user id ? If yes the query will be simpler but I heard that I am not supposed to have twice the same data in the database because it can lead to conflicts when adding or removing items.
Should I add in the items data a row with the user id ?
Yes, this is a very common approach in the NoSQL world and is called denormalization. Denormalization is described, in this "famous" post about NoSQL data modeling, as "copying of the same data into multiple documents in order to simplify/optimize query processing or to fit the user’s data into a particular data model". In other words, the main driver of your data model design is the queries you plan to execute.
More concretely you could have an extra field in your item documents, which contain the ID of the creator. You could even have another one with, e.g., the name of the creator: This way, in one query, you can display the items and their creators.
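A small in-memory sketch of how both queries become simple once each item carries its creator. The field names creatorId and creatorName are assumptions for illustration, and a Map stands in for the items collection. In Firestore itself, query 1 would be an equality filter on creatorId and query 2 a single document read.

```typescript
// Assumed shape: each item document carries its creator's id and name.
interface OwnedItem {
  title: string;
  description: string;
  creatorId: string;
  creatorName: string;
}

// Query 1: ids of all items created by a user.
function itemIdsCreatedBy(items: Map<string, OwnedItem>, userId: string): string[] {
  const ids: string[] = [];
  items.forEach((item, itemId) => {
    if (item.creatorId === userId) ids.push(itemId);
  });
  return ids;
}

// Query 2: creator of a given item, read straight from the item itself.
function creatorOf(items: Map<string, OwnedItem>, itemId: string): string | undefined {
  const item = items.get(itemId);
  return item === undefined ? undefined : item.creatorId;
}
```

Neither query needs to touch the users collection any more, which is the point of duplicating the creator on the item.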
Now, for maintaining these different documents in sync (for example, if you change the name of one user, you want it to be updated in the corresponding items), you can either use a Batched Write to modify several documents in one atomic operation, or rely on one or more Cloud Functions that would detect the changes of the user documents and reflect them in the item documents.

firebase realtime schema design

I have two sets of entities in my Firebase Realtime Database schema, called orders and customers.
So far I was not actually relating them in my app, only showing them as related. The current schema looks like:
{
"orders" : [
{"id" : 1, "name": "abc", "price": 200, "customer": "vik"}
],
"customers" : [
{"cust_id" : "10", "name" : "vik", "type": "existing"}
]
}
So I have an orders list page showing all the orders in a table, which I get by requesting /orders.json.
But really, instead of having the customer name directly in the orders, I should have a cust_id attribute, as that is the key.
That naturally makes it a standard relational schema, where I am free to change customer attributes without worrying about mismatches in orders.
However, the downside I see right away is that if I have, say, 20 orders to show in the order list table, then instead of 1 I will end up firing 21 REST calls (1 to get the order list and 20 to fetch the customer name for each order).
What are the recommendations or standards around this?
Firebase is a NoSQL database. So the rules of normalization that you know from relational databases don't necessarily apply.
For example: having the customer name in each order is actually quite normal. It saves having to do a client-side join for each customer record, significantly simplifying the code and improving the speed of the operation. But of course it comes at the cost of having to store data multiple times (quite normal in NoSQL databases), and having to consider if/how you update the duplicated data in case of updates of the customer record.
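One way to handle updates to the duplicated data is to collect every affected path into a single object and write it in one multi-location update. The sketch below only builds that object. It assumes each order stores both cust_id and the duplicated customer name, and that customers are keyed by cust_id, neither of which is true of the schema in the question as written. OrderRecord and buildCustomerRenameUpdates are hypothetical names.

```typescript
// Assumed shape: the order keeps the customer key next to the duplicated name.
interface OrderRecord {
  id: number;
  name: string;
  price: number;
  cust_id: string;
  customer: string;
}

// Builds a path -> value map that renames a customer everywhere the name is stored.
function buildCustomerRenameUpdates(
  orders: Record<string, OrderRecord>,
  custId: string,
  newName: string
): Record<string, string> {
  const updates: Record<string, string> = {};
  // The customer record itself (assumes customers are keyed by cust_id).
  updates[`customers/${custId}/name`] = newName;
  // Every order that duplicates this customer's name.
  for (const orderKey of Object.keys(orders)) {
    if (orders[orderKey].cust_id === custId) {
      updates[`orders/${orderKey}/customer`] = newName;
    }
  }
  return updates;
}
```

Passing the resulting object to a single update() call on the database root lets all the paths change together, so the order list never shows a mix of old and new names.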
I recommend reading NoSQL data modeling, watching Firebase for SQL developers, and reading my answer on keeping denormalized data up to date.
