I'm a little lost with DynamoDB table definition and KeySchema. Here's what I want to achieve:
I'm creating a table to store reporting information. This reporting will be in the following format:
itemId, accountId, date, typeOfMetric, metric1, metric2, metric3
At the moment I expect typeOfMetric to be monthlyReport or dailyData, for example. accountId is for users who are grouped into accounts, so each account can access its own data.
Typically I'm thinking of querying the table this way:
get all items with accountId=123 and typeOfMetric=daily
get one item with accountId=123 and typeOfMetric=daily and date=2021-11-15
And I'm a little lost with the KeySchema and the indexes I should create. Any help is very welcome!
We can choose accountId as the partition key and date as the sort key. This lets us query over a date range.
I guess this will cater to your requests for typeOfMetric=daily.
And if we are looking for monthly data, we can query over the date range covering that month.
If this doesn't fit your use case, let us know.
I ended up doing the following:
accountId -> partition key
SK -> sort key
This allows items like:
accountId REPORTDAY#2021-01-21 stats1, stats2
accountId REPORTMONTH#2021-01 stats1, stats2
and querying them with begins_with.
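The sort-key layout above can be sketched as a pair of small helpers. This is only an illustration: the table name "reports" and the helper names are assumptions, and the dictionaries follow the shape of the low-level DynamoDB Query request rather than calling any client.

```python
def report_sort_key(kind, period):
    """Build a sort key such as 'REPORTDAY#2021-01-21' (kind and period are plain strings)."""
    return f"{kind}#{period}"


def reports_request(table_name, account_id, sk_prefix):
    """Build Query parameters for one account's items whose SK starts with sk_prefix."""
    return {
        "TableName": table_name,  # hypothetical table name supplied by the caller
        "KeyConditionExpression": "accountId = :account AND begins_with(SK, :prefix)",
        "ExpressionAttributeValues": {
            ":account": {"S": account_id},
            ":prefix": {"S": sk_prefix},
        },
    }


# All daily reports for account 123.
all_daily = reports_request("reports", "123", "REPORTDAY#")

# The daily report for 2021-11-15: the prefix is the complete sort key here.
one_day = reports_request("reports", "123", report_sort_key("REPORTDAY", "2021-11-15"))
```

A prefix such as "REPORTDAY#2021-11" would return every daily report for November 2021 with the same request shape. When the full sort key is known, a GetItem call on (accountId, SK) is probably the more direct choice.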
I want to store and retrieve data from a DynamoDB table.
My data (an item = a review a user gave on a feature of an app) has the following attributes:
user string
feature string
appVersion string
timestamp string
rate int
description string
There are multiple features, on multiple versions of the app, and a user can give multiple reviews on these features. So I would like to use (user, appVersion, feature, timestamp) as a primary key.
But it does not seem to be possible to use that many attributes in a primary key in DynamoDB.
The first solution I implemented is to use user as a Partition Key, and a hash of (appVersion, feature, timestamp) as a Sort Key (in a new field named reviewID).
My problem is that I want to retrieve an item for a given user, feature, and appVersion without knowing the timestamp value (say, the item with the latest timestamp, or the list of all items matching the 3 fields).
Without knowing the timestamp, I can't build the Sort Key necessary to retrieve my item. But if I remove the timestamp from the Sort Key, I will not be able to store multiple items having the same (user, appVersion, feature).
What would be the proper way to handle this use case?
I am thinking about using a hash of (user, appVersion, feature) as a Partition Key, and the timestamp as a Sort Key. Would this be a correct solution?
Put the timestamp at the end of your SK and then when you Query the data you use begins_with on the SK.
PK SK
UserID appVersion#feature#timestamp
This will allow you to query the data flexibly. For example, to get all of a user's votes for a specific appVersion:
SELECT * FROM Mytable WHERE PK = 'x' AND begins_with(SK, '{VERSION ID}')
This is done using a Query command.
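To make the begins_with idea concrete, here is a minimal in-memory sketch that mimics what such a Query returns. The sample items, the names REVIEWS, query_begins_with and latest_review, and the ISO 8601 timestamp format are all assumptions for illustration; nothing here talks to DynamoDB.

```python
# Hypothetical items laid out as PK = user, SK = appVersion#feature#timestamp.
REVIEWS = [
    {"PK": "alice", "SK": "1.2.0#search#2023-01-05T10:00:00Z", "rate": 3},
    {"PK": "alice", "SK": "1.2.0#search#2023-02-01T09:30:00Z", "rate": 5},
    {"PK": "alice", "SK": "1.2.0#export#2023-01-20T08:00:00Z", "rate": 4},
    {"PK": "alice", "SK": "1.20.0#search#2023-03-01T12:00:00Z", "rate": 2},
    {"PK": "bob", "SK": "1.2.0#search#2023-02-10T11:00:00Z", "rate": 1},
]


def query_begins_with(items, pk, prefix, newest_first=False):
    """Return the items of one partition whose SK starts with prefix, ordered by SK."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(prefix)]
    return sorted(matches, key=lambda i: i["SK"], reverse=newest_first)


def latest_review(items, user, app_version, feature):
    """Return the most recent review for (user, appVersion, feature), or None."""
    # The trailing '#' keeps version '1.2.0' from also matching '1.2.0-beta' style keys.
    prefix = f"{app_version}#{feature}#"
    found = query_begins_with(items, user, prefix, newest_first=True)
    return found[0] if found else None


all_for_version = query_begins_with(REVIEWS, "alice", "1.2.0#")
newest = latest_review(REVIEWS, "alice", "1.2.0", "search")
```

Against a real table, the "latest" case would typically be a Query with ScanIndexForward set to false and Limit set to 1. That relies on the timestamps sorting correctly as strings, which holds for a fixed-width format such as ISO 8601 in a single time zone.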
The answer from Lee Hannigan will work, I like it.
However, keep in mind that accessing a PK is very fast because it's hash-based.
I am thinking about using a hash of (user, appVersion, feature) as a
Partition Key, and the timestamp as a Sort Key, would this be a
correct solution?
This might also work. The table would look like this:
PK SK
User#{User}#AppVersion#{appVersion}#Feature#{feature} TimeStamp#{timestamp}
If you always know the user, appVersion, and feature, this will be more efficient: the PK is resolved by hash, while the SK lookup is O(log N).
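A rough sketch of this second layout, assuming string attributes named PK and SK as in the table above. The helper names and the table name "reviews" are hypothetical, and the request dictionary only mirrors the shape of a low-level Query call.

```python
def review_partition_key(user, app_version, feature):
    """Compose the partition key from the three values that are always known."""
    return f"User#{user}#AppVersion#{app_version}#Feature#{feature}"


def review_keys(user, app_version, feature, timestamp):
    """Return the full primary key (PK and SK) for one review item."""
    return {
        "PK": review_partition_key(user, app_version, feature),
        "SK": f"TimeStamp#{timestamp}",
    }


def latest_review_request(table_name, user, app_version, feature):
    """Build Query parameters that ask for the newest review in one partition."""
    return {
        "TableName": table_name,  # hypothetical table name
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": review_partition_key(user, app_version, feature)},
        },
        "ScanIndexForward": False,  # descending sort-key order, newest first
        "Limit": 1,
    }


keys = review_keys("alice", "1.2.0", "search", "2023-02-01T09:30:00Z")
request = latest_review_request("reviews", "alice", "1.2.0", "search")
```

The trade-off is that this layout can no longer answer "all reviews by this user" with a single Query, since the user alone is not a complete partition key.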
One way:
HASH string "modelName": "user"
RANGE string "id": "b0d5be50-4fae-11ed-981f-dbffcc56c88a"
The UUID itself can be used as a timestamp.
When searching, you could search using a reverse index.
Another way:
HASH string "modelName": "user"
RANGE string "createdAt": "2019-10-12T07:20:50.52Z"
For createdAt, use the RFC 3339 time format.
When searching, you could search using a reverse index.
Put down on paper what you need and you'll find other ways to manage the HASH/RANGE indexes.
Given a DynamoDB table that looks similar to:
sessionId: String
deviceType: String (mobile/tablet/computer/...)
networkType: String (wifi/ethernet/3g/4g/...)
There may be some other fields.
I need to be able to look up a session id given the other parameters. SQLish:
SELECT sessionId WHERE deviceType="Mobile"
SELECT sessionId WHERE networkType in (wifi, ethernet) AND deviceType="Tablet"
But from what I understand, querying in DynamoDB always requires the partition key (sessionId).
Is there an alternative layout to this table that will allow for better querying? We're still in setup phase, so it can be changed.
To be efficient and cost-effective, I suggest you create 2 Global Secondary Indexes (GSIs). Their PKs will be "deviceType" and "networkType" respectively. For the SK, I don't have enough information to suggest anything. There is no need to project all attributes, because you only want to retrieve sessionId, which is projected by default because it is the table's PK.
To sum up the data model:
        PK            Attributes
Table:  sessionId     deviceType, networkType, ...
GSI_1:  deviceType    sessionId, networkType, ...
GSI_2:  networkType   sessionId, deviceType, ...
For example, when querying GSI_1, you'll use PK="Mobile" to retrieve all related sessionIds.
Doing it this way is really fast and cost-effective, as opposed to a scan.
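As a sketch of how the two indexes and the two example lookups might be expressed, the helpers below build plain dictionaries in the shape of the CreateTable and Query parameters. The helper names and the table name "sessions" are assumptions. Note one deviation from "keys only": GSI_1 includes networkType so that the second lookup can filter on it.

```python
def gsi_definition(index_name, partition_key, extra_attributes=()):
    """Describe a GSI keyed on one attribute; table keys are always projected."""
    projection = {"ProjectionType": "KEYS_ONLY"}
    if extra_attributes:
        projection = {
            "ProjectionType": "INCLUDE",
            "NonKeyAttributes": list(extra_attributes),
        }
    return {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "Projection": projection,
    }


def sessions_request(table_name, device_type, network_types=()):
    """Query GSI_1 by deviceType, optionally filtering on networkType."""
    request = {
        "TableName": table_name,  # hypothetical table name
        "IndexName": "GSI_1",
        "KeyConditionExpression": "deviceType = :device",
        "ExpressionAttributeValues": {":device": {"S": device_type}},
        "ProjectionExpression": "sessionId",
    }
    if network_types:
        names = [f":net{i}" for i in range(len(network_types))]
        request["FilterExpression"] = f"networkType IN ({', '.join(names)})"
        for name, value in zip(names, network_types):
            request["ExpressionAttributeValues"][name] = {"S": value}
    return request


indexes = [
    gsi_definition("GSI_1", "deviceType", extra_attributes=["networkType"]),
    gsi_definition("GSI_2", "networkType"),
]
mobile = sessions_request("sessions", "Mobile")
tablet_wired = sessions_request("sessions", "Tablet", ["wifi", "ethernet"])
```

The filter is applied after the index partition has been read, so the second lookup still consumes read capacity for every Tablet item. If that combination is queried often, a sort key on networkType for GSI_1 may be worth considering.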
As part of migrating from SQL to DynamoDB, I am trying to create a DynamoDB table. The UI allows users to search based on 4 attributes: start date, end date, name of event, and source of event.
The table has 6 attributes; the above four are a subset, the others being priority and location. The query described above makes it mandatory to search on all four values. What's the best way to store the information in DynamoDB so that querying by start date and end date is fairly easy?
I thought of creating a GSI with start date as the hash key and end date as the range key, and a GSI on the other two attributes?
In short:
My table in DynamoDB will have 6 attributes
EventName, Location, StartDate, EndDate, Priority and Source.
Query will have 4 mandatory attributes
StartDate, EndDate, Source and Event Name.
Thanks for the help.
You can use greater than/less than comparison operators as part of your query http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
So you could try to build a table with schema:
(EventName (hashKey), "StartDate-EndDate" (sortKey), other attributes)
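In this case the sort key is a combination of start and end date. DynamoDB orders string sort keys by their UTF-8 bytes, so a range condition on the sort key is decided by the leading start-date part; the end-date part is only compared when the start dates are equal. Let's assume your sort key looks like "73644-75223": a condition such as BETWEEN "73000-" AND "74000-" would match it, whereas >= "73000-" AND <= "73000-76000" would not, because "73644" sorts after "73000". Narrowing on the end date would then need a filter on the separate EndDate attribute.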
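The ordering behaviour of such a combined "start-end" key can be checked with ordinary string comparison. This is a small sketch with hypothetical helper names; Python compares strings by code point, which agrees with byte order for the ASCII digits and hyphen used here.

```python
def combined_sort_key(start, end):
    """Join start and end into one sort key; both parts should be fixed width."""
    return f"{start}-{end}"


def between(key, low, high):
    """Mimic a BETWEEN condition on a string sort key (inclusive on both ends)."""
    return low <= key <= high


key = combined_sort_key(73644, 75223)

# The range is decided by the start part: 73000 <= 73644 < 74000.
matches_start_range = between(key, "73000-", "74000-")

# The upper bound "73000-76000" fails, because "73644" already sorts after "73000".
matches_mixed_bounds = between(key, "73000-", "73000-76000")
```

Here matches_start_range is True and matches_mixed_bounds is False, which suggests the combined key works well for "start date within a range" but not for constraining both dates in one key condition.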
Additionally, you could create a GSI on your table for each of your remaining attributes that need to be read via query. You then could project data into your index that you want to fetch with the query. In contrast to LSI, queries from GSI do not fetch attributes that are not projected. Be aware of the additional costs (read/write) involved by using GSI (and LSI)... and the additional memory required by data projections...
Hope it helps.
I have a table in dynamodb with following attributes
id -> hashkey
eventname->rangekey
startdate
enddate
locationname
locationtype
cost
I want to query the db based on 4 values: eventname, locationname, startdate and enddate.
What would be the best way to do it?
If I create a GSI, then I can only do it based on two attributes.
I have experienced similar problems as well. In many cases, it can be helpful to construct combined keys. For example, you could create a GSI with partition key "EVENT+LOCATION" and range key "START-END".
If this doesn't help, can you specify your queries in greater detail and give an example?
Cheers,
Fabian
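The combined-key suggestion above could look roughly like this when writing and querying items. The attribute names eventLocation and startEnd, the index name and the table name are assumptions made for the sketch, as is the compact YYYYMMDD date format.

```python
def combined_keys(item):
    """Derive the two GSI key attributes from an item's plain attributes."""
    return {
        "eventLocation": f"{item['eventname']}+{item['locationname']}",
        "startEnd": f"{item['startdate']}-{item['enddate']}",
    }


def event_request(table_name, eventname, locationname, startdate, enddate):
    """Build Query parameters that match all four values exactly on the GSI."""
    keys = combined_keys({
        "eventname": eventname,
        "locationname": locationname,
        "startdate": startdate,
        "enddate": enddate,
    })
    return {
        "TableName": table_name,  # hypothetical table name
        "IndexName": "eventLocation-startEnd-index",  # hypothetical index name
        "KeyConditionExpression": "eventLocation = :el AND startEnd = :se",
        "ExpressionAttributeValues": {
            ":el": {"S": keys["eventLocation"]},
            ":se": {"S": keys["startEnd"]},
        },
    }


item = {
    "id": "42",
    "eventname": "concert",
    "locationname": "arena",
    "startdate": "20211115",
    "enddate": "20211120",
}
item.update(combined_keys(item))  # store the derived attributes alongside the originals
request = event_request("events", "concert", "arena", "20211115", "20211120")
```

The derived attributes have to be kept in sync whenever any of the four source values changes, which is the main cost of this approach.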
I have a custom Invoice object with a lookup relationship to Accounts.
I'm trying to query the database to get the total number of invoices for the accounts where Connection_Date__c has a value (Connection_Date__c is a custom field on the Account object).
How can I do this? The query I'm using gives me only the number of accounts but not the number of invoices.
SELECT Name,(SELECT name FROM Invoices__r) FROM Account WHERE Connection_Date__c != null
In SOQL, it's almost always easier to write queries that are driven from the child rather than the parent. This is the opposite of SQL.
Try a query that matches this pattern:
SELECT Count() FROM ChildTable WHERE ChildTable.parentField != Null
SELECT (Parent_Api_Name_In_Child_Object),
       COUNT(Id),
       (Child_Relationship_Name__r.Parent_Fields...)
FROM (Child_Object_Api_Name)
GROUP BY (Parent_Api_Name_In_Child_Object,
          Parent Fields with API Names)
HAVING COUNT(Id) {>, <, =} (optional)
This is a query-based answer.
Let me know in case of any questions