Query dynamodb for text search? - amazon-dynamodb

I am looking to optimize my DynamoDB operations, i.e. replace scans with queries so that data is fetched faster.
Table Data:
itemId  itemName  itemOwners
hash1   abc       [user1, user2]
hash2   abcd      [user1, user3]
hash3   xyz       [user2, user3]
I need to search for items by item name.
Right now, we scan the whole table.
// Scan the whole table and keep items whose name contains the search value
const getItems = {
  TableName: ItemsTable,
  FilterExpression: 'contains (#itemName, :searchValue)',
  ExpressionAttributeNames: { '#itemName': 'itemName' },
  ExpressionAttributeValues: { ':searchValue': searchValue },
};
const items = await docClient.scan(getItems).promise();
We then filter the results to keep only the items whose itemOwners contains the userId of the searching user.
Is there a better way of doing this search query with DynamoDB?

There isn't a way to do a contains in DynamoDB without it being a filter condition. DynamoDB is not really designed for full-text search. However, there is a way to get some degree of text search capability out of DynamoDB. I'm not suggesting you should, but you can. Basically, you create a record for each word/item combination you want to include in your search. This doesn't allow partial word matches, but it is a way to get full word matching. That, of course, requires pre-processing all your data to make it available for search. If you decide to go this route, I would recommend using DynamoDB Streams to manage updating the search data. Every time an item is inserted, updated, or deleted, you can run a Lambda function to update the search records for that item. Again, I'm not suggesting you should do this, but you can.
I would recommend investigating CloudSearch as an alternative to this.
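As a rough illustration of the word-per-record idea, here is a minimal sketch. The search table name, its key layout (partition key `word`, sort key `itemId`), and the helper names are assumptions made for this example, not something from the question. The tokenizer only handles ASCII letters and digits.

```javascript
// Hypothetical search table: partition key `word`, sort key `itemId`.
const SEARCH_TABLE = 'ItemSearch'; // assumed name, not from the question

// Split an item name into lowercase whole words (ASCII only), dropping duplicates.
function tokenize(itemName) {
  const words = itemName.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return [...new Set(words)];
}

// One search record per word/item combination.
function buildSearchRecords(item) {
  return tokenize(item.itemName).map((word) => ({
    word,
    itemId: item.itemId,
    itemName: item.itemName,
    itemOwners: item.itemOwners,
  }));
}

// Query input for a whole-word lookup (DocumentClient-style parameters).
function buildWordQuery(searchWord) {
  return {
    TableName: SEARCH_TABLE,
    KeyConditionExpression: '#word = :word',
    ExpressionAttributeNames: { '#word': 'word' },
    ExpressionAttributeValues: { ':word': searchWord.toLowerCase() },
  };
}
```

A stream-triggered Lambda could call something like `buildSearchRecords` for each changed item and write or delete the resulting records. The lookup then becomes a Query on the word instead of a Scan, at the cost of matching whole words only.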

Related

Using "OR" and selection by another table model in DynamoDB

I will explain my question with a concrete example. I have a single DynamoDB table (for this example). The table consists of two models:
- user: {
    firstname
    lastname
    placeId
    typeId
  }
// List of favourites for each user
- userFavourites {
    userId
    favouriteId
    favouriteType
  }
I would like to efficiently find users by the following rule:
placeId = 'XXX' OR typeId = 'YYY' OR the user has any favourite with favouriteId: 'ZZZ' and favouriteType: "Dog" OR the user has any favourite with favouriteType: "Cat"
I'm using onetable for communication with dynamo: https://doc.onetable.io/start/quick-tour/
Is it possible to do this kind of selection in DynamoDB (with multiple ORs and selection by items from another model in the same table), with everything combined in one rule?
To be efficient with your reads you must do a GetItem or Query, which means you have to provide the partition key for the item. That in turn means you cannot do an OR with the native APIs.
You can however do an OR using PartiQL ExecuteStatement where you can say:
SELECT * FROM MYTABLE WHERE PARTITIONKEY IN [1,2,3]
Again this is only useful when it's the partition key.
If you are looking for an OR on multiple different values, then I suggest using a more suitable database with more flexible query capabilities, as doing so with DynamoDB would result in a full table Scan each time you need a single row/item.
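To show what the PartiQL statement above might look like when built programmatically, here is a small sketch. It assumes the bracketed IN list also accepts `?` placeholders and that the partition key is a string, so parameters are passed in low-level AttributeValue form. The function name is hypothetical.

```javascript
// Build an ExecuteStatement input that ORs over several partition key values.
// Assumption: string partition keys, sent as low-level AttributeValue parameters.
function buildPartitionKeyInStatement(tableName, keyName, values) {
  if (values.length === 0) {
    throw new Error('at least one partition key value is required');
  }
  const placeholders = values.map(() => '?').join(', ');
  return {
    Statement: `SELECT * FROM "${tableName}" WHERE "${keyName}" IN [${placeholders}]`,
    Parameters: values.map((value) => ({ S: String(value) })),
  };
}
```

Using parameters rather than string concatenation keeps the key values out of the statement text. As noted above, this only stays efficient when the IN list is on the partition key.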

Query item detail based on previous query result DynamoDB

I'm still new to DynamoDB. How do I query something based on a previous query result?
This is what my table looks like:
I want to query for the list of project info for a user.
From my first query, the result for USER#001 is [PROJECT#001, PROJECT#002].
Then I want to get a list of project details based on the first query.
How do I make a "nested" query? Or is there any way I can query more efficiently?
*The table structure is fixed; I can't change it.
You can't. That's not the way DDB works...
All you could do is a BatchGetItem() with the pk & sk for the two projects.
Or if you don't happen to know the SK, you'd need to make two individual Query() calls with the pk only.
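The BatchGetItem route could be sketched as follows. The `pk`/`sk` attribute names come from the answer; the table name, the sort key value, and the function name are placeholders, since the actual table layout is not shown.

```javascript
const MAX_KEYS_PER_BATCH = 100; // BatchGetItem accepts at most 100 keys per call

// Turn the project IDs from the first query into BatchGetItem inputs.
// The sort key value is a hypothetical placeholder supplied by the caller.
function buildBatchGetInputs(tableName, projectIds, sortKeyValue) {
  const keys = projectIds.map((projectId) => ({ pk: projectId, sk: sortKeyValue }));
  const inputs = [];
  for (let i = 0; i < keys.length; i += MAX_KEYS_PER_BATCH) {
    inputs.push({
      RequestItems: { [tableName]: { Keys: keys.slice(i, i + MAX_KEYS_PER_BATCH) } },
    });
  }
  return inputs;
}
```

Each input would then be passed to a call such as `docClient.batchGet(input).promise()`. Any `UnprocessedKeys` in the response need to be retried.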

DynamoDB Limit on query

I have a question about Limit on queries/scans in DynamoDB.
My table has 1000 records, and a query over all of them returns 50 values. If I set a Limit of 5, that doesn't mean the query will return the first 5 values; it just means the query examines 5 items in the table (in any order, so they could be very old items or new ones), so it's possible to get 0 items back. How can I actually get the latest 5 items of a query? I need to set a Limit of 5 (the numbers are examples) because it would be too expensive to query/scan for more items than that.
The query has this input
{
  TableName: 'transactionsTable',
  IndexName: 'transactionsByUserId',
  ProjectionExpression: 'origin, receiver, #valid_status, createdAt, totalAmount',
  KeyConditionExpression: 'userId = :userId',
  ExpressionAttributeValues: {
    ':userId': 'user-id',
    ':payment_gateway': 'payment_gateway'
  },
  ExpressionAttributeNames: {
    '#valid_status': 'status'
  },
  FilterExpression: '#valid_status = :payment_gateway',
  Limit: 5
}
The index of my table is like this:
Should I use a second index or something similar to sort the items by the createdAt field? And if so, how can I be sure that the query will look at all the items?
if I put a Limit of 5, that doesn't mean that the query will return the first 5 values, it just say that query for 5 Items on the table (in any order, so they could be very old items or new ones), so it's possible that I got 0 items on the query. How can actually get the latest 5 items of a query?
You are correct in your observation, and unfortunately there is no Query option or any other operation that can guarantee 5 items in a single request. To understand why this is the case (it's not just laziness on Amazon's side), consider the following extreme case: you have a huge database with one billion items, but run a very specific query that has just 5 matching items, and you make the request you wished for: "give me back 5 items". Such a request would need to read the entire database of a billion items before it could return anything, and the client would surely give up by then. So this is not how DynamoDB's Limit works. It limits the amount of work that DynamoDB needs to do before responding. So if Limit = 100, DynamoDB will internally read 100 items, which takes a bounded amount of time. But you are right that you have no idea whether it will respond with 100 items (if all of them matched the filter) or 0 items (if none of them matched the filter).
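To make the Limit semantics concrete, here is a small in-memory model. It is not the DynamoDB API: `simulateQueryPage` and `collectMatches` are made-up names, and `nextIndex` merely stands in for the `LastEvaluatedKey` that a real paginated Query returns. It only illustrates that the limit caps the items read and that the filter runs afterwards.

```javascript
// In-memory model of one Query page: Limit caps the items READ, the filter runs afterwards.
function simulateQueryPage(items, { limit, filter, startIndex = 0 }) {
  const read = items.slice(startIndex, startIndex + limit);
  const matched = read.filter(filter);
  const nextIndex = startIndex + read.length;
  return {
    Items: matched,
    Count: matched.length,
    ScannedCount: read.length,
    // Stand-in for LastEvaluatedKey: undefined once everything has been read.
    nextIndex: nextIndex < items.length ? nextIndex : undefined,
  };
}

// Keep requesting pages until `wanted` matches are collected or the data runs out.
function collectMatches(items, filter, wanted, pageLimit) {
  const found = [];
  let startIndex = 0;
  let pagesRead = 0;
  while (startIndex !== undefined && found.length < wanted) {
    const page = simulateQueryPage(items, { limit: pageLimit, filter, startIndex });
    found.push(...page.Items);
    startIndex = page.nextIndex;
    pagesRead += 1;
  }
  return { items: found.slice(0, wanted), pagesRead };
}
```

With 1000 items of which every twentieth matches, a single page with a limit of 5 can easily come back empty, and collecting 5 matches takes many pages. That repeated reading is the cost the data-modelling advice in this answer tries to avoid.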
So to do what you want to do efficiently, you'll need to think of a different way to model your data - i.e., how to organize the partition and sort keys. There are different ways to do it, each has its own benefits and downsides, you'll need to consider your options for yourself. Since you asked about GSI, I'll give you some hints about how to use that option:
The pattern you are looking for is called filtered data retrieval. As you noted, if you create a GSI with createdAt as the sort key, you can retrieve the newest items first. But you still need to apply a filter, and you still don't know how to stop after 5 filtered results (rather than 5 pre-filtering results). The solution is to ask DynamoDB to put into the GSI, in the first place, only the items that pass the filter. In your example, it seems you always use the same filter: "status = payment_gateway". DynamoDB doesn't have an option to run a generic filter function when building the GSI, but it has a different trick up its sleeve to achieve the same thing: any time you set "status = payment_gateway", also set another attribute "status_payment_gateway", and when status is set to something else, delete the "status_payment_gateway" attribute. Now, create the GSI with "status_payment_gateway" as the partition key. DynamoDB will only put items in the GSI if they have this attribute, thereby achieving exactly the filtering you want.
You can also have multiple mutually-exclusive filtering criteria in one GSI by setting the partition key attribute to multiple different values, and you can then do a Query on each of these values separately (using KeyConditionExpression).
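The sparse-index trick might look roughly like this on the application side. The attribute name comes from the answer. Everything else is an assumption for illustration: storing the userId as the attribute's value so the GSI can be queried per user, the index name, and createdAt being the GSI sort key.

```javascript
const SPARSE_ATTR = 'status_payment_gateway'; // attribute name from the answer

// Return a copy of the item with the sparse-index attribute set or removed.
// Assumption: the attribute stores the userId so the GSI can be queried per user.
function applySparseIndexAttribute(item) {
  const { [SPARSE_ATTR]: _ignored, ...rest } = item;
  return item.status === 'payment_gateway'
    ? { ...rest, [SPARSE_ATTR]: item.userId }
    : rest;
}

// Query input for the newest `limit` items. The index name is hypothetical,
// and the GSI is assumed to use createdAt as its sort key.
function buildLatestQuery(userId, limit) {
  return {
    TableName: 'transactionsTable',
    IndexName: 'paymentGatewayByCreatedAt',
    KeyConditionExpression: '#pk = :userId',
    ExpressionAttributeNames: { '#pk': SPARSE_ATTR },
    ExpressionAttributeValues: { ':userId': userId },
    ScanIndexForward: false, // descending sort-key order, newest first
    Limit: limit,
  };
}
```

Because only matching items exist in such an index, no FilterExpression is needed, so a Limit of 5 returns up to 5 matching items rather than 5 items read before filtering.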

DynamoDB sub item filter using .Net Core API

First of all, I have a table structure like this:
Users: {
  UserId
  Name
  Email
  SubTable1: [{
    Column-111
    Column-112
  },
  {
    Column-121
    Column-122
  }]
  SubTable2: [{
    Column-211
    Column-212
  },
  {
    Column-221
    Column-222
  }]
}
As I am new to DynamoDB, I have a couple of questions about this:
1. Can I create a structure like this?
2. Can we set a primary key for subtables?
3. Luckily, I found a DynamoDB helper class to do some operations on my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But I don't know how to fetch only a particular subtable.
4. Can we fetch only specific columns from my main table? I also need suggestions for subtables.
Note: I am using .NET Core with C# to communicate with DynamoDB.
Can I create structure like this?
Yes
Can we set primary key for subtables?
No, a hash key can be set only on top-level scalar attributes (String, Number, etc.).
Luckily, I found DynamoDB helper class to do some operations into my DB.
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But, don't know how to fetch only particular subtable
When you say subtables, I assume you are referring to the array (List) attributes in the sample table above. To fetch data from a DynamoDB table with the Query API, you need the hash key. If you don't have the hash key, you can use the Scan API, which scans the entire table and is a costly operation.
A GSI (Global Secondary Index) can be created to avoid the scan operation. However, it can be created on scalar attributes only; a GSI can't be created on an array attribute.
The other option is to redesign the table to match your query access pattern.
Can we fetch only specific columns from my main table? Also need suggestion for subtables
Yes, you can fetch specific columns using ProjectionExpression. This way you get only the required attributes in the result set
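A ProjectionExpression request could be assembled along these lines. The sketch is in JavaScript rather than C#, on the assumption that the same request fields are available in the .NET SDK. The function name is hypothetical, and only top-level attributes are handled.

```javascript
// Build a GetItem input that returns only the listed top-level attributes.
// Placeholders (#a0, #a1, ...) sidestep reserved words such as "Name".
function buildProjectedGet(tableName, key, attributes) {
  const names = {};
  const placeholders = attributes.map((attribute, index) => {
    const placeholder = `#a${index}`;
    names[placeholder] = attribute;
    return placeholder;
  });
  return {
    TableName: tableName,
    Key: key,
    ProjectionExpression: placeholders.join(', '),
    ExpressionAttributeNames: names,
  };
}
```

For example, `buildProjectedGet('Users', { UserId: 'u1' }, ['Name', 'SubTable1'])` asks for just the Name and the first "subtable" of one user. The whole item is still read on the server side, so this reduces the size of the response but not the read cost.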

How to setup Tables in DynamoDB to be able to query a string set/List?

Currently, I have a table in DynamoDB with this schema:
ID : PostID (Hash Key)
Location : LocationID (Range Key)
Tags: List of strings
Now I want to be able to query the Tags list of each item in the table and find which Posts have given tags. How do I do that? Reading over the DynamoDB docs, I figure that I will have to set up another table, since a Query cannot search for specific strings in a List/String Set, but I was not able to figure out how to structure the table.
For example, item:
PostID: Guid
Location: someLocationString
Tags: ["Queen", "Royalty", "England"]
Now I want to query for items that have a tag called "Queen", and I want the result to be the set of items that have that tag (PostID + Location).
What would be the optimal structure? And how would I maintain the table when adding new tags?
If I understand the requirements correctly you can go with a table:
Hash: Tag
Range: PostId#LocationId
That way you can query on Tag and get all the pairs.
Keeping both tables in sync is not optimal, but it all depends on how important atomicity is to you. Perhaps look at DynamoDB Transactions, or do some "CAS" inserts.
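One possible shape for the tag table and a transactional write that keeps it in sync is sketched below. The key layout follows the answer (hash: Tag, range: PostId#LocationId). The table names, the `PostLocation` attribute name, and the function names are assumptions for illustration.

```javascript
// Table names are hypothetical; key layout follows the answer (Hash: Tag, Range: PostId#LocationId).
const POSTS_TABLE = 'Posts';
const TAGS_TABLE = 'PostTags';

// One transactional write covering the post and one tag-index item per distinct tag.
function buildPostTransaction(post) {
  const tagItems = [...new Set(post.Tags)].map((tag) => ({
    Put: {
      TableName: TAGS_TABLE,
      Item: { Tag: tag, PostLocation: `${post.ID}#${post.Location}` },
    },
  }));
  return {
    TransactItems: [{ Put: { TableName: POSTS_TABLE, Item: post } }, ...tagItems],
  };
}

// Query input returning every PostId#LocationId pair for a tag.
function buildTagQuery(tag) {
  return {
    TableName: TAGS_TABLE,
    KeyConditionExpression: '#tag = :tag',
    ExpressionAttributeNames: { '#tag': 'Tag' },
    ExpressionAttributeValues: { ':tag': tag },
  };
}
```

The transaction input would go to a call such as `docClient.transactWrite(...).promise()`. A transaction can contain only a limited number of actions, so posts with very many tags would need a different write strategy.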

Resources