DynamoDB - query based on multiple columns

I have a requirement to find all users in a table that have the same Id, Email or Phone.
Right now the data looks like this:
Id //hash
Market //sort
Email //gsi
Phone //gsi
I want to be able to do a query and say:
Get all items that have matching Id, email or phone.
From the docs it seems that you can only do a single query based on keys or one index. And it seems that even if I were to combine phone and email into one column and put a GSI on that column, I would still be limited to a begins_with filter expression. Is this correct? Are there any alternatives?

it seems that you can only do a single query based on keys or one index
Yes.
if I was to combine phone and email into one [GSI] I would still be limited to a begins_with filter expression, is this correct?
Essentially, yes. Query constraints apply equally to indexes and the table keys. You must specify one-and-only-one Partition Key value, and optionally a range of Sort Key values.
Are there any alternatives?
Overload the Partition Key and denormalise the data. Redefine the Partition Key column (renamed PK) to hold Id, Email and Phone values. Each record is (fully or partially) repeated 3 times, each time with a different PK type.
PK             Market  Id    More fields
Id-1           A       Id-1  foo
zaphod@42.com  A       Id-1  foo or blank
13015552572    A       Id-1  foo or blank
Querying PK = <something> AND Market > "" will return any matching id, email or phone number value.
If justified by your query patterns, repeat all fields 3x. Alternatively, use a hit on a truncated email/phone record to identify the Id, then query other fields using the Id.
There are different flavours of this pattern. For instance, you could also overload the Sort Key column (renamed to SK) with the Id value for Email and Phone records, which would permit multiple Ids per email/phone.
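A minimal sketch of this overloaded-PK pattern, using a plain Python list as a stand-in for the table (function names like put_user and find_by_any are illustrative, not any SDK API):

```python
# Simulate the overloaded-PK pattern: each user is written three times,
# once per lookup value (Id, Email, Phone), all carrying the same Id attribute.

def put_user(table, user_id, email, phone, market, **fields):
    """Write one logical user as three physical items with different PK values."""
    for pk in (user_id, email, phone):
        table.append({"PK": pk, "Market": market, "Id": user_id, **fields})

def find_by_any(table, value):
    """Equivalent of Query(PK = :value): matches on Id, email, or phone alike."""
    return [item for item in table if item["PK"] == value]

table = []
put_user(table, "Id-1", "zaphod@42.com", "13015552572", "A", name="Zaphod")

# Any of the three identifiers resolves to the same underlying Id.
assert find_by_any(table, "zaphod@42.com")[0]["Id"] == "Id-1"
assert find_by_any(table, "13015552572")[0]["Id"] == "Id-1"
```

The trade-off is the same one named above: writes are tripled (and must be kept consistent, e.g. with a transaction), in exchange for a single cheap Query on any identifier.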

Related

How to query and order on two separate sort keys in DynamoDB?

GROUPS
userID: string
groupID: string
lastActive: number
birthday: number
Assume I have a DynamoDB table called GROUPS which stores items with these attributes. The table records which users are joined to which groups. Users can be in multiple groups at the same time. Therefore, the composite primary key would most commonly be:
partition key: userID
sort key: groupID
However, if I wanted to query for all users in a specific group, within a specific birthday range, sorted by lastActive, is this possible and if so what index would I need to create?
Could I synthesize lastActive and userID to create a synthetic sort key, like so:
GROUPS
groupID: string
lastActiveUserID: string (i.e. "20201230T09:45:59-abc123")
birthday: number
This would make for a different composite primary key, where the partition key is groupID and the sort key is lastActiveUserID, sorting the participants by when they were last active, plus a secondary index to filter by birthday?
As written, no this isn't possible.
within a specific birthday range
implies sk_birthday between :start and :end
sorted by lastActive
implies lastActive as a sort key.
which are mutually exclusive...I can't devise a sort key that would be able to contain both values in a usable format.
You could have a Global Secondary Index with a hash key of group-id and lastActive as a sort key, then filter on birthday. But that only affects the data returned; it doesn't affect the data read nor the cost to read that data. Additionally, since DDB only reads 1MB of data at a time, you'd have to call it repeatedly in a loop if it's possible a given group has more than 1MB worth of members.
Also, when your index has a different partition (hash) key than your table, that is a global secondary index (GSI). If your index has the same partition key but a different sort key than the table, that can be done with a local secondary index (LSI).
However, for any given query you can only use the table or a single index; you can't use multiple indexes at the same time.
Now, having said all that, what exactly do you mean by "specific birthday range"? If the range in question is a defined period (by month, by week), perhaps you could have a GSI where the hash key is "group-id#birthday-period" and the sort key is lastActive.
So for instance, "give me GROUPA birthdays for next month"
Query(hs = "GROUPA#NOVEMBER")
But if you wanted November and December, you'd have to make two queries and combine & sort the results yourself.
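A sketch of that composite hash key and the client-side merge, with an in-memory list standing in for the GSI (the key layout and attribute names are illustrative):

```python
# "group-id#birthday-period" composite hash key, with lastActive as the sort key.

def gsi_key(group_id, birthday_period):
    return f"{group_id}#{birthday_period}"

def query(index, hash_key):
    """Equivalent of Query(hk = :hash_key); results come back sorted by lastActive."""
    return sorted((i for i in index if i["hk"] == hash_key),
                  key=lambda i: i["lastActive"])

index = [
    {"hk": gsi_key("GROUPA", "NOVEMBER"), "userID": "abc123", "lastActive": 5},
    {"hk": gsi_key("GROUPA", "DECEMBER"), "userID": "def456", "lastActive": 3},
    {"hk": gsi_key("GROUPA", "NOVEMBER"), "userID": "ghi789", "lastActive": 1},
]

# "Give me GROUPA birthdays for next month":
november = query(index, gsi_key("GROUPA", "NOVEMBER"))

# November and December require two queries, merged and re-sorted client-side:
both = sorted(query(index, gsi_key("GROUPA", "NOVEMBER")) +
              query(index, gsi_key("GROUPA", "DECEMBER")),
              key=lambda i: i["lastActive"])
```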
Effective and efficient use of DDB means avoiding Scan() and avoiding the use of filterExpressions that you know will throw away lots of the data read.

Query DynamoDB with Partition Key and Sort Key using OR Condition

I have a requirement to query DynamoDB and get all the records which match a certain criterion. I have a table, say parent_child_table, which has parent_id and child_id as two columns; now I need to query the table with a particular input id and fetch all the records. For example, if I query the db with id 67899, then I should get both records, i.e. 12345 and 67899.
I was trying to use below methods :
GetItemRequest itemRequest = new GetItemRequest()
        .withTableName("PARENT_CHILD_TABLE")
        .withKey(partitionKey.entrySet().iterator().next(),
                 sortKey.entrySet().iterator().next());
but there is no OR operator.
DynamoDB doesn't work like that...
GetItemRequest() can only return a single record.
Query() can return multiple records, but only if you are using a composite primary key (partition key + sort key) and you can only query within a single partition...so all the records to be returned must have the same partition key.
Scan() can return multiple records from any partition, but it does so by always scanning the entire table. Regular use of scan is a bad idea.
Without knowing more it's hard to provide guidance, but consider a schema like so:
partition key   sort key
12345           12345
12345           12345#67899
12345           12345#67899#97765
Possibly adding some sort of level indicator in the sort key or just as an attribute.
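The hierarchical sort-key idea can be sketched in a few lines, with a Python list standing in for the table (the sample ids come from the schema above; query_begins_with is an illustrative name):

```python
# One partition per root parent; child paths are encoded in the sort key,
# so a subtree is retrieved with a begins_with condition on the sort key.

items = [
    {"pk": "12345", "sk": "12345"},
    {"pk": "12345", "sk": "12345#67899"},
    {"pk": "12345", "sk": "12345#67899#97765"},
]

def query_begins_with(table, pk, sk_prefix):
    """Equivalent of Query(pk = :pk AND begins_with(sk, :prefix))."""
    return [i for i in table if i["pk"] == pk and i["sk"].startswith(sk_prefix)]

# All records at or below 67899 under parent 12345:
subtree = query_begins_with(items, "12345", "12345#67899")
```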

getting the right data model with dynamodb

I am about to create my first DynamoDB table and can't find a proper way to model my requirements. It sounds very basic, but probably my brain is still too deep in the relational-database world.
I want to store something like this:
A user can buy a product (once or several times). What I want to store is username and product_id.
The only things I need to query later are:
which products have been purchased by user X
how many times were they purchased
First I considered having an item with two attributes: username and product_id. But then I cannot use username as primary key (a user can buy more than once), nor can I use username + product_id (a user can buy a product several times).
Now I would go for having username, product_id, counter and taking username + product_id as primary key. However, I will always need to check first if a product was already purchased and update it, otherwise create a new entry. For getting all products of a user I would create a global secondary index on username.
However, I am not very sure if this is the right way. Any feedback would be great!
There are probably a number of ways to do this and I don't know all of your requirements so I can't guarantee this is the right answer for you but based on your description, this is what I would do.
First, I'm assuming that each order has some sort of unique order number associated with it. I would use this order number as the primary key of the table. I wouldn't use a range key. This would ensure that the constraint that all primary keys be unique is met. In addition, when I write the data to DynamoDB I would also write the username and the product_id as additional attributes.
Next, I would create a Global Secondary Index that uses the username as the primary key and the product_id as the range key. Unlike the primary key of the table, GSI keys do not have to be unique so if a user purchased a particular product more than once, this would be fine. This GSI would allow me to perform queries such as "find all orders by username" or "find all orders where username purchased product_id".
If you also needed to do queries like "find all usernames who purchased product_id" you would need another GSI that used product_id as the primary key and username as the range key.
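A sketch of the suggested model, with a dict standing in for the table and a filter standing in for the username GSI (the order ids and names are made up for illustration):

```python
# Orders keyed by a unique order id; the (username, product_id) GSI is
# simulated by scanning the in-memory dict.

from collections import Counter

orders = {}  # primary "table": order_id -> item

def put_order(order_id, username, product_id):
    orders[order_id] = {"order_id": order_id,
                        "username": username,
                        "product_id": product_id}

def products_for_user(username):
    """Equivalent of querying the GSI with hash key = username:
    which products did the user buy, and how many times each?"""
    return Counter(o["product_id"] for o in orders.values()
                   if o["username"] == username)

put_order("o-1", "alice", "p-42")
put_order("o-2", "alice", "p-42")
put_order("o-3", "alice", "p-7")

# alice bought p-42 twice and p-7 once:
counts = products_for_user("alice")
```

Because every order is its own item, "how many times was it purchased" falls out of the query result, with no read-before-write counter maintenance.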

Insert or ignore every column

I have a problem with a sqlite command.
I have a table with three columns: Id, user, number.
The Id is auto-incremented. Now, if I put a user and a number into my list, my app should check whether such a user with this number already exists. The problem is, if I use a standard "insert or ignore" command, the Id column is not fixed, so I will get a new entry every time.
So is it possible to compare just two of the three columns for equality?
Or do I have to use a temporary list where only two columns exist?
The INSERT OR IGNORE statement ignores the new record if it would violate a UNIQUE constraint.
Such a constraint is created implicitly for the PRIMARY KEY, but you can also create one explicitly for any other columns:
CREATE TABLE MyTable (
ID integer PRIMARY KEY,
User text,
Number number,
UNIQUE (User, Number)
);
You shouldn't use INSERT OR IGNORE unless you are specifying the key, which you aren't (and, in my opinion, never should when your key is an identity/auto-number column).
Based on User and Number making a record in your table unique, you don't need the id column and your primary key should be user,number.
If for some reason you don't want to do that, bearing in mind that in that case you are saying User, Number is not your uniqueness constraint, then something like
INSERT INTO MyTable (User, Number)
SELECT 10, 15
WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE User = 10 AND Number = 15);
would do the job. (SQLite has no standalone IF statement, so the conditional insert is written as INSERT ... SELECT ... WHERE NOT EXISTS.) You may still need to quote or escape your column names.
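The UNIQUE-constraint approach from the accepted answer can be verified end to end with Python's built-in sqlite3 module and an in-memory database:

```python
# INSERT OR IGNORE with an explicit UNIQUE (User, Number) constraint:
# the duplicate row is silently skipped, so no new auto-assigned Id appears.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MyTable (
        ID      INTEGER PRIMARY KEY,
        User    TEXT,
        Number  NUMBER,
        UNIQUE (User, Number)
    )
""")

conn.execute("INSERT OR IGNORE INTO MyTable (User, Number) VALUES (?, ?)",
             ("alice", 15))
conn.execute("INSERT OR IGNORE INTO MyTable (User, Number) VALUES (?, ?)",
             ("alice", 15))  # ignored: violates UNIQUE (User, Number)

(count,) = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()
```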

How to design DynamoDB table to facilitate searching by time ranges, and deleting by unique ID

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with composite key (S:id, N:timestamp). However, when I come to query it I realise that since my id is unique, and I can't do a wildcard search on id, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp), where customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be hash (s: unique_doc_id) whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI for requirement 3, and the BatchWriteItem API call to do a batch delete for requirement 4.
This solution doesn't require (1) - all the customers can stay in the same table (Heads up - There is a limit-before-contact-us of 256 tables per account)
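A sketch of that design with an in-memory list standing in for the table (attribute and function names are illustrative):

```python
# Partition key customer_id, sort key unique_id, plus an LSI on timestamp
# for time-ordered reads. A Python list stands in for the table.

table = []

def put_doc(customer_id, unique_id, timestamp, body):
    table.append({"customer_id": customer_id, "unique_id": unique_id,
                  "timestamp": timestamp, "body": body})

def get_by_time(customer_id, limit):
    """Equivalent of querying the timestamp LSI, ascending, with a Limit."""
    docs = [d for d in table if d["customer_id"] == customer_id]
    return sorted(docs, key=lambda d: d["timestamp"])[:limit]

def delete_doc(customer_id, unique_id):
    """Equivalent of DeleteItem on the full key (customer_id, unique_id)."""
    table[:] = [d for d in table
                if not (d["customer_id"] == customer_id
                        and d["unique_id"] == unique_id)]

put_doc("cust-1", "doc-a", 300, "x")
put_doc("cust-1", "doc-b", 100, "y")
put_doc("cust-1", "doc-c", 200, "z")

oldest_two = get_by_time("cust-1", 2)   # doc-b, then doc-c
delete_doc("cust-1", "doc-b")
```

Note that deleting by unique_id alone works here because the full primary key is (customer_id, unique_id); your application needs both halves of the key to issue the delete.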
