How to design a DynamoDB table to facilitate searching by time range and deleting by unique ID - amazon-dynamodb

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirements:
1. There must be a unique table per customer.
2. Insert documents into the table (each doc has a unique ID and a timestamp).
3. Get X number of documents based on timestamp (ordered ascending).
4. Delete individual documents based on unique ID.
So far I have created a table with a composite key (S:id, N:timestamp). However, when I came to query it, I realised that because my id is unique and I can't do a wildcard search on it, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be a hash key (s:unique_doc_id), which I will use to delete items.
Does this sound like the correct solution? Thank you in advance.

You can satisfy the requirements like this:
Your primary key will be hash key customer_id and range key unique_id. This ensures every item in the table has a distinct key.
You will also have a timestamp attribute with a Local Secondary Index on it.
Use the LSI for requirement 3 (a Query limited to X items, in ascending timestamp order), and the BatchWriteItem API call to delete items in bulk for requirement 4 (DeleteItem handles a single document).
This solution doesn't need one table per customer (requirement 1): all the customers can stay in the same table. (Heads up: at the time of writing there was a default limit of 256 tables per account, beyond which you had to contact AWS.)
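The design above can be sketched as request parameters for boto3's low-level DynamoDB client. This is a minimal sketch under assumptions: the table name Documents, the index name by_timestamp, on-demand billing and the ALL projection are hypothetical choices. Each function only builds a dict meant for the matching client call (client.create_table(**...), client.query(**...), client.batch_write_item(**...)), so it runs without AWS credentials.

```python
# Hypothetical names; pass each dict to boto3.client("dynamodb").<call>(**params).
TABLE = "Documents"        # hypothetical: one table shared by all customers
LSI_NAME = "by_timestamp"  # hypothetical LSI name


def create_table_params():
    """CreateTable: key (customer_id, unique_id) plus an LSI on timestamp."""
    return {
        "TableName": TABLE,
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "unique_id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "unique_id", "KeyType": "RANGE"},
        ],
        # An LSI must be declared at creation time and shares the partition key.
        "LocalSecondaryIndexes": [{
            "IndexName": LSI_NAME,
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "timestamp", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        "BillingMode": "PAY_PER_REQUEST",
    }


def oldest_documents_params(customer_id, limit, since=None):
    """Query for requirement 3: up to `limit` documents, oldest first."""
    condition = "customer_id = :c"
    values = {":c": {"S": customer_id}}
    params = {
        "TableName": TABLE,
        "IndexName": LSI_NAME,
        "ScanIndexForward": True,  # ascending order of the index sort key
        "Limit": limit,
    }
    if since is not None:
        # "timestamp" is a DynamoDB reserved word, so it needs a #placeholder.
        condition += " AND #ts >= :since"
        values[":since"] = {"N": str(since)}
        params["ExpressionAttributeNames"] = {"#ts": "timestamp"}
    params["KeyConditionExpression"] = condition
    params["ExpressionAttributeValues"] = values
    return params


def delete_batches(customer_id, unique_ids):
    """BatchWriteItem requests for requirement 4, at most 25 deletes each."""
    batches = []
    for start in range(0, len(unique_ids), 25):
        chunk = unique_ids[start:start + 25]
        batches.append({
            "RequestItems": {
                TABLE: [
                    {"DeleteRequest": {"Key": {
                        "customer_id": {"S": customer_id},
                        "unique_id": {"S": uid},
                    }}}
                    for uid in chunk
                ]
            }
        })
    return batches
```

BatchWriteItem accepts at most 25 requests per call and may return UnprocessedItems that should be retried. For a single document, client.delete_item with both key attributes is enough. One trade-off of the LSI: a table with local secondary indexes limits each partition key value (here, each customer) to 10 GB of data.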

Related

Can I create multiple DynamoDB tables with secondary indexes concurrently?

I am confused by the API documentation of CreateTable from DynamoDB. I need to create multiple tables with a secondary index. From the API: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/dynamodb/DynamoDbClient.html#createTable-software.amazon.awssdk.services.dynamodb.model.CreateTableRequest-
If you want to create multiple tables with secondary indexes on them, you must create the tables sequentially. Only one table with secondary indexes can be in the CREATING state at any given time.
and
Up to 500 simultaneous table operations are allowed per account. These operations include CreateTable, UpdateTable, DeleteTable, UpdateTimeToLive, RestoreTableFromBackup, and RestoreTableToPointInTime.
The only exception is when you are creating a table with one or more secondary indexes. You can have up to 250 such requests running at a time;
Can I create only one table with a secondary index at a time, or 250 at the same time?
If I create multiple tables sequentially without waiting for the ACTIVE state, does that already count as concurrent creation?
Must I wait for the ACTIVE state for every table if I create multiple tables with secondary indexes?
An individual account can only be running one "Create Index" action at a time, no matter how many tables you have.
To understand this, it may help to understand what an index is. An index is a copy of the table's data (every attribute, or only the projected ones), but with a different partition and sort key. So if your original table has a PK of userId and an SK of sort_key, you could create an index whose partition key is sort_key and whose sort key is userId, creating an inverted index. This is a common practice in DynamoDB, because Queries in DynamoDB must know the PK: with a UserID you can access all data for a given user, and if you want all users who have a particular tag, you can store an item for each such user whose SK is something like TAG#ThisTag, then query the inverted index with pk = TAG#ThisTag to get back a list of UserIds.
While CreateIndex is running on a given table, no other actions can be run on it: it won't accept changes to the data or configuration that would cause a fault or mismatch in the copying process. This is one of the reasons a given account is limited to only one create-index operation at a time.
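To stay within the strictest reading of the quoted documentation (only one table with secondary indexes in the CREATING state at a time), you can create the tables one after another and wait for each to become ACTIVE before starting the next. A minimal sketch, assuming client is a boto3 DynamoDB client and each entry of table_requests is a CreateTable parameter dict; create_tables_sequentially is a hypothetical helper. boto3's table_exists waiter polls DescribeTable until the table reports ACTIVE.

```python
def create_tables_sequentially(client, table_requests):
    """Create tables one at a time, waiting for each to become ACTIVE.

    `client` is assumed to be boto3.client("dynamodb"); `table_requests`
    is a list of CreateTable parameter dicts (hypothetical input).
    """
    waiter = client.get_waiter("table_exists")  # polls DescribeTable
    created = []
    for request in table_requests:
        client.create_table(**request)
        # Block until DescribeTable reports this table as ACTIVE, so that
        # at most one table is ever in the CREATING state.
        waiter.wait(TableName=request["TableName"])
        created.append(request["TableName"])
    return created
```

This trades speed for safety: the total time is the sum of the individual creation times, but no request can be rejected for exceeding the concurrent-creation limit.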
As a slight aside, if I may: if you have a single account with multiple DynamoDB tables all for the same product, you may want to rethink your database strategy. A single DynamoDB table can hold many different kinds of data if you set up your PK and SK as generic fields (i.e., pk and sk as the attribute names); no item in your table has to have the same attributes as any other. When accessing data, each partition key is exactly what its name says: a partition of data, and only that partition is accessed when a query is made against that PK. For example, if you have 100 items with a PK of USER#1 and 100 items with a PK of USER#2 and you query against USER#1, you only access those 100 items; the rest are ignored by the Query and never touched. This lets you, in effect, have multiple "tables" in a single DynamoDB table by giving them different partition key prefixes.
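The single-table layout and the inverted index described above can be sketched as Query parameters for boto3's low-level client. Assumptions: the generic attribute names pk and sk, the USER#/TAG# prefixes from the answer, and a hypothetical GSI named inverted whose partition key is the table's sk attribute and whose sort key is pk.

```python
def items_for_user_params(table, user_id, sk_prefix=None):
    """Query one partition (pk = "USER#<id>"); other partitions are not read."""
    params = {
        "TableName": table,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": f"USER#{user_id}"}},
    }
    if sk_prefix is not None:
        # Narrow the partition to sort keys with a prefix, e.g. every "TAG#" item.
        params["KeyConditionExpression"] += " AND begins_with(sk, :prefix)"
        params["ExpressionAttributeValues"][":prefix"] = {"S": sk_prefix}
    return params


def users_with_tag_params(table, tag, index_name="inverted"):
    """Query the inverted index: its partition key is the table's sk attribute."""
    return {
        "TableName": table,
        "IndexName": index_name,  # hypothetical GSI (partition key sk, sort key pk)
        "KeyConditionExpression": "sk = :tag",
        "ExpressionAttributeValues": {":tag": {"S": f"TAG#{tag}"}},
    }
```

Each result item from the index carries its pk attribute, which is the user's key, so the list of users with a tag comes straight out of one Query.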

Is there any way to retrieve items from a DynamoDB table by applying a filter condition on the primary key?

I am trying to fetch items from a DynamoDB table with a condition on the primary key, and I don't have any other values. I just know that some of the records in the table have a different primary key pattern (for example, the key contains a hyphen) and others don't. How do I achieve this in a simple way? Do I need to Scan the complete table, get the results and filter the desired records?
Something like SELECT * FROM Student WHERE Id LIKE '%-%', as we do in SQL.
You will need to do a scan and filter. If the table has a lot of items it could be a slow and expensive process.
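A minimal sketch of that Scan, assuming a boto3 DynamoDB client and the Student table and Id attribute from the question; scan_ids_containing is a hypothetical helper. DynamoDB has no LIKE, but the contains() function in a FilterExpression matches a substring. The filter is applied after items are read, so the whole table is still read (and billed), and results arrive in pages that must be followed through LastEvaluatedKey and ExclusiveStartKey.

```python
def scan_ids_containing(client, table="Student", needle="-"):
    """Scan the whole table, keeping items whose Id contains `needle`.

    The filter runs after items are read, so every item still consumes
    read capacity; only the response gets smaller.
    """
    params = {
        "TableName": table,
        "FilterExpression": "contains(#id, :needle)",
        "ExpressionAttributeNames": {"#id": "Id"},
        "ExpressionAttributeValues": {":needle": {"S": needle}},
    }
    matches = []
    while True:
        page = client.scan(**params)
        matches.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no more pages to read
            return matches
        params["ExclusiveStartKey"] = last_key  # resume where the page stopped
```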

How best to perform a query on primary partition key only, for a table which has both partition key and sort key?

OK, I have a table with a primary partition key (Employee ID) and a sort key (Project ID). Now I want a list of all the projects an employee works on. I also want a list of all the employees working on a project. The relationship is many-to-many. I have created a schema in AppSync (GraphQL). AppSync created the required queries and mutations for the type (EmployeeProjects). Now ListEmployeeProjects takes a filter input with different attributes. My question is: when I do the two searches on Employee ID or Project ID only, will it be a complete table scan? How efficient will that be? If it is a table scan, can I reduce the time complexity by creating indexes (GSI or LSI)? The end product will have a huge amount of data, so I cannot test the app with such data beforehand. My project works fine, but I am worried about the problems that might arise later on with a lot of data. Can someone please help?
You don't need to (and should not) perform a Scan for this.
To get all of the projects an employee is working on, you just need to perform a Query on the base table, specifying employee ID as the partition key.
To get all of the employees on a project, you should create a GSI on the table. The partition key should be project ID and sort key should be employee ID. Then perform a Query on the GSI, using partition key of project ID.
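The two access patterns above, written as boto3 low-level request dicts rather than AppSync resolver templates. Assumptions: the table name EmployeeProjects, the attribute names EmployeeID and ProjectID (both strings), the index name ProjectIndex, the KEYS_ONLY projection and on-demand billing are all hypothetical.

```python
TABLE = "EmployeeProjects"  # hypothetical table name
GSI = "ProjectIndex"        # hypothetical index name


def add_project_index_params():
    """UpdateTable request adding a GSI keyed (ProjectID, EmployeeID).

    Assumes on-demand billing; a provisioned table would also need a
    ProvisionedThroughput entry inside "Create".
    """
    return {
        "TableName": TABLE,
        # Key attributes of the new index must be defined here.
        "AttributeDefinitions": [
            {"AttributeName": "ProjectID", "AttributeType": "S"},
            {"AttributeName": "EmployeeID", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": GSI,
                "KeySchema": [
                    {"AttributeName": "ProjectID", "KeyType": "HASH"},
                    {"AttributeName": "EmployeeID", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }],
    }


def projects_of_employee_params(employee_id):
    """Query the base table: every item in the employee's partition."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "EmployeeID = :e",
        "ExpressionAttributeValues": {":e": {"S": employee_id}},
    }


def employees_on_project_params(project_id):
    """Query the GSI: every item in the project's partition."""
    return {
        "TableName": TABLE,
        "IndexName": GSI,
        "KeyConditionExpression": "ProjectID = :p",
        "ExpressionAttributeValues": {":p": {"S": project_id}},
    }
```

Both are Queries, so their cost grows with the number of items in one partition, not with the size of the table.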
In order to model this correctly, you will probably want three tables:
Employee Table
Project Table
Employee-Project reference table (i.e. just two attributes of employee ID and project ID)

AWS DynamoDB Query based on non-primary keys

I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter based on a non-primary-key attribute? My table looks like the following:
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on Name, but from what I know I have to provide the key as well? Apart from that, I could use a Scan, but then I would be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as its partition key
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
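The two options above, as boto3 request dicts for the Store table from the question. The GSI name NameIndex is hypothetical. Name is a DynamoDB reserved word, so both versions refer to it through the #n placeholder. The expressions are identical; the difference is that the Query reads only the matching partition of the index, while the Scan reads every item and drops the non-matches afterwards.

```python
def stores_by_name_query(name, index_name="NameIndex"):
    """Option 1: Query a GSI whose partition key is Name (hypothetical index)."""
    return {
        "TableName": "Store",
        "IndexName": index_name,
        # NAME is a reserved word in DynamoDB expressions, hence #n.
        "KeyConditionExpression": "#n = :name",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {":name": {"S": name}},
    }


def stores_by_name_scan(name):
    """Option 2: Scan + Filter; every item is read, then non-matches dropped."""
    return {
        "TableName": "Store",
        "FilterExpression": "#n = :name",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {":name": {"S": name}},
    }
```

Pass the first to client.query(**...) and the second to client.scan(**...); the Scan version also needs the pagination loop that any full-table Scan requires.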
There are a few design principles you can follow when using DynamoDB. If you are coming from a relational background, you have already run into the query limitations imposed by primary key attributes.
Design your tables for querying, and separate hot and cold data.
Create indexes for querying on non-key attributes. You have two options: a Global Secondary Index, which you can define at any time, and a Local Secondary Index, which you need to specify at table creation time.
With a Global Secondary Index you can promote any non-key attribute to be the index's partition key and select another attribute as its sort key for querying. With a Local Secondary Index, you can promote any non-key attribute to be the sort key while keeping the same partition key.
Using Indexes for query is important also to improve the efficiency in using provisioned throughput.
Although indexes consume extra throughput and storage to maintain, they can also save read throughput: if you project just the attributes you need, reads from the index can be much cheaper. Consider the following example.
Let's say you have a DynamoDB table whose items are 40 KB each. If you read directly from the table to list 10 items, it consumes 100 read capacity units (each item needs 10 units, since one unit reads up to 4 KB with strong consistency, and 10 items × 10 units = 100). If you have an index that projects only the attributes needed for the listing, so that each index item is 4 KB, it will consume only 10 read capacity units (one unit per item), which makes a huge difference in cost.
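The arithmetic above as a small function. This is a sketch assuming items of equal size, and query_read_units is a hypothetical helper. For a Query, DynamoDB adds up the sizes of all returned items and rounds the total up to the next 4 KB, which gives the same numbers as the per-item calculation in this example. Eventually consistent reads cost half as much.

```python
import math


def query_read_units(item_size_kb, item_count, strongly_consistent=True):
    """Read capacity units consumed by one Query returning equal-sized items.

    The total size is rounded up to the next 4 KB; one unit covers 4 KB of
    strongly consistent reads, and eventually consistent reads cost half as
    much (so that result may be fractional).
    """
    total_kb = item_size_kb * item_count
    units = math.ceil(total_kb / 4)
    return units if strongly_consistent else units / 2


# The example from the answer: 10 items from the table vs. from the index.
print(query_read_units(40, 10))  # → 100 (full 40 KB items from the table)
print(query_read_units(4, 10))   # → 10 (4 KB projected items from the index)
```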
With DynamoDB, it's really important how you define indexes: optimize them not only for query capability but also for throughput.
You cannot query on a non-primary-key attribute in DynamoDB.
If you still want to do that, you can use a Scan, but Scan is a costly operation in DynamoDB. If the table is large it will hurt performance, and it is not recommended: it reads every item in the table, and AWS charges you for every item it reads.
There are two ways to achieve it
Keep Store Id as the primary key (partition key) of the DynamoDB table and add Name or Location as the sort key (only one of them, since DynamoDB accepts only one attribute as the sort key by design).
Create Global Secondary Indexes for querying on the non-key attributes you need most frequently.
There are three projection types for a GSI in DynamoDB. In your case, choose the INCLUDE option and add Name, Location and Store ID to the index.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
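A minimal sketch of that index as an UpdateTable request for boto3's low-level client. The index name NameIndex and the on-demand billing assumption are hypothetical. The table's key (Id) and the index's key (Name) are always projected into a GSI, so only Location has to be listed in NonKeyAttributes.

```python
def name_index_definition():
    """GSI keyed by Name that also carries Location (INCLUDE projection)."""
    return {
        "IndexName": "NameIndex",  # hypothetical index name
        "KeySchema": [{"AttributeName": "Name", "KeyType": "HASH"}],
        "Projection": {
            "ProjectionType": "INCLUDE",
            # Id (table key) and Name (index key) are projected automatically.
            "NonKeyAttributes": ["Location"],
        },
    }


def add_name_index_params():
    """UpdateTable request adding the GSI to the Store table.

    Assumes on-demand billing; a provisioned table would also need a
    ProvisionedThroughput entry inside "Create".
    """
    return {
        "TableName": "Store",
        # The new index's key attribute must be defined here.
        "AttributeDefinitions": [{"AttributeName": "Name", "AttributeType": "S"}],
        "GlobalSecondaryIndexUpdates": [{"Create": name_index_definition()}],
    }
```

INCLUDE sits between the other two options: the index stays smaller than with ALL, yet a Query on it can return Location without a second read from the table.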

How to make values unique in Cassandra

I want to add a unique constraint in Cassandra.
I want every value in each of my columns to be unique within my column family.
For example:
name-rahul
phone-123
address-abc
Now I want to make sure that none of the values rahul, 123 and abc can be inserted again in another row. Searching on DataStax, I found that I can achieve this for the partition key with IF NOT EXISTS, but I can't find a way to make all 3 values unique.
That means if
name- jacob
phone-123
address-qwe
this should also not be inserted into my database, because its phone column has the same value as the name-rahul row shown above.
The short answer is that constraints of any type are not supported in Cassandra. They are simply too expensive, as they must involve multiple nodes, which defeats the purpose of having eventual consistency in the first place. If you needed to make a single column unique, there could be a solution, but not for several unique columns. For the same reason, there is no isolation and no consistency (the C and I of ACID). If you really need this type of enforcement with Cassandra, you will need to build some kind of synchronization layer in your application that intercepts all requests to the database and makes sure the values are unique and all constraints are enforced. But this won't have anything to do with Cassandra.
I know this is an old question and the existing answer is correct (you can't do constraints in C*), but you can solve the problem using batched creates. Create one or more additional tables, each with the constrained column as the primary key and then batch the creates, which is an atomic operation. If any of those column values already exist the entire batch will fail. For example if the table is named Foo, also create Foo_by_Name (primary key Name), Foo_by_Phone (primary key Phone), and Foo_by_Address (primary key Address) tables. Then when you want to add a row, create a batch with all 4 tables. You can either duplicate all of the columns in each table (handy if you want to fetch by Name, Phone, or Address), or you can have a single column of just the Name, Phone, or Address.
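The lookup-table idea can be illustrated with an in-memory, single-process sketch: one dict per unique column (standing in for Foo_by_Name, Foo_by_Phone and Foo_by_Address), where every value is checked before anything is written. UniqueRegistry is a hypothetical class that only models the all-or-nothing behaviour the approach needs; it is not Cassandra code. In real Cassandra, a batch of plain INSERTs behaves as upserts and does not fail on existing keys, and conditional statements (IF NOT EXISTS) in a batch must all target the same partition of a single table, so a cross-table check like this cannot be done in one batch.

```python
class UniqueRegistry:
    """In-memory stand-in for Foo plus one lookup table per unique column.

    Each lookup dict maps a column value to the key of the Foo row that
    owns it (hypothetical layout modelled on Foo_by_Name and friends).
    """

    def __init__(self, unique_columns):
        self.rows = {}  # Foo: row key -> row
        self.lookups = {col: {} for col in unique_columns}

    def insert(self, key, row):
        """All-or-nothing: check every lookup first, then write everything."""
        if key in self.rows:
            return False
        for col, table in self.lookups.items():
            if row[col] in table:
                return False  # value already taken; nothing has been written
        for col, table in self.lookups.items():
            table[row[col]] = key
        self.rows[key] = row
        return True


registry = UniqueRegistry(["name", "phone", "address"])
registry.insert(1, {"name": "rahul", "phone": "123", "address": "abc"})
# Rejected: phone 123 already belongs to row 1.
registry.insert(2, {"name": "jacob", "phone": "123", "address": "qwe"})
```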
