I have a table with the following keys:
PKey (Partition Key)
SKey (Sort Key)
I want to store holidays for a range of countries, and answer the following 2 questions:
give me all holidays in 2020 between a range of months
give me all holidays in 2020 between a range of months for a specific country
Currently the PKey holds the year and the SKey the month and day (12-31). I've added Country (ISO) to the SKey, however I can't answer both questions. Is there another design that would work? I'm trying to avoid a GSI.
Mark
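Have you considered a Local Secondary Index (LSI) instead of a GSI? An LSI has no separately provisioned throughput; it shares the table's capacity, although the indexed items still consume storage and write capacity. Note that an LSI can only be defined when the table is created.
You could have the table sort key be "Country"-"Month"-"Day"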
Then your second access requirement could be serviced by
Query(pkey = "2020", skey between "GB-10-01" and "GB-12-31")
Then your LSI sort key would just be "Month"-"Day" and could service your first access requirement.
Query(index='MYLSI', pkey = "2020", skey between "10-01" and "12-31")
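As a rough sketch, the two queries above could be written as parameter dictionaries for the low-level DynamoDB Query API (for example, passed to a boto3 client as `client.query(**params)`). The table name `Holidays` and the LSI sort key attribute name `MonthDay` are assumptions made for illustration; `PKey`, `SKey` and `MYLSI` come from the question and answer.

```python
def country_query(year, country, start_md, end_md, table="Holidays"):
    """Holidays in `year` for one country, between two month-day values."""
    return {
        "TableName": table,  # hypothetical table name
        "KeyConditionExpression": "PKey = :y AND SKey BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":y": {"S": year},
            ":lo": {"S": f"{country}-{start_md}"},  # e.g. "GB-10-01"
            ":hi": {"S": f"{country}-{end_md}"},    # e.g. "GB-12-31"
        },
    }


def all_countries_query(year, start_md, end_md, table="Holidays", index="MYLSI"):
    """Holidays in `year` for every country, served by the LSI."""
    return {
        "TableName": table,
        "IndexName": index,
        # "MonthDay" is an assumed name for the LSI sort key attribute.
        "KeyConditionExpression": "PKey = :y AND MonthDay BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":y": {"S": year},
            ":lo": {"S": start_md},
            ":hi": {"S": end_md},
        },
    }


params = country_query("2020", "GB", "10-01", "12-31")
```

Zero-padding the month and day ("01", not "1") matters here, because BETWEEN compares the sort key values as strings.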
I have a table which contains, say, 3 columns (companyname, years, roc). I want to find the companyname values which satisfy the condition below.
For the years 2010 to 2020, roc must be greater than 10 in every year. That is, if roc < 10 in any year between 2010 and 2020, that company should be excluded. A companyname should be shown only if roc > 10 in each year.
Filter the table for the years that you want and aggregate with a condition in the HAVING clause for the column roc:
SELECT companyname
FROM tablename
WHERE years BETWEEN 2010 AND 2020
GROUP BY companyname
HAVING MIN(roc) > 10;
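To see how this query behaves, here is a small sketch that runs it against an in-memory SQLite database. The sample rows and company names are made up for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (companyname TEXT, years INTEGER, roc INTEGER)")

# Made-up sample data for illustration only.
rows = [
    ("Acme", 2010, 12), ("Acme", 2015, 15), ("Acme", 2020, 11),
    ("Globex", 2010, 20), ("Globex", 2015, 8), ("Globex", 2020, 30),
    ("Initech", 2005, 5), ("Initech", 2012, 14),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)

query = """
    SELECT companyname
    FROM tablename
    WHERE years BETWEEN 2010 AND 2020
    GROUP BY companyname
    HAVING MIN(roc) > 10
"""
result = sorted(row[0] for row in conn.execute(query))
print(result)
```

Globex is dropped because of its 2015 value. Initech is kept even though it has only one row in the range, so if a company must also have a row for every year, an extra condition such as a count of distinct years would be needed.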
Okay, I think I understand your problem now. The main issue here is that company names are duplicated within a single table, which violates normalization principles and makes querying harder.
So the best thing you could do is break this single table into two: COMPANY and COMPANY_ROC.
The COMPANY table will only have companyID (PRIMARY KEY) and companyName.
The COMPANY_ROC table will have roc, years, and companyID (a FOREIGN KEY referencing the PRIMARY KEY of COMPANY).
CREATE TABLE COMPANY(
companyID int PRIMARY KEY NOT NULL,
companyName varchar(255)
);
CREATE TABLE COMPANY_ROC(
companyID int,
roc int,
years int,
FOREIGN KEY(companyID) REFERENCES COMPANY(companyID)
);
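So when querying, you can use an INNER JOIN as follows. Note that the condition on roc has to be applied per company in a HAVING clause; putting roc > 10 in the WHERE clause would return companies that exceed 10 in any single year rather than in every year.
SELECT COMPANY.companyName
FROM COMPANY
INNER JOIN COMPANY_ROC ON COMPANY.companyID = COMPANY_ROC.companyID
WHERE COMPANY_ROC.years BETWEEN 2010 AND 2020
GROUP BY COMPANY.companyID, COMPANY.companyName
HAVING MIN(COMPANY_ROC.roc) > 10;
My query might have some issues as I didn't test it, so treat it as an illustration of the idea and give it a try. Breaking the table into two and using primary and foreign keys is the key to easy querying :)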
GROUPS
userID: string
groupID: string
lastActive: number
birthday: number
Assume I have a DynamoDB table called GROUPS which stores items with these attributes. The table records which users are joined to which groups. Users can be in multiple groups at the same time. Therefore, the composite primary key would most-commonly be:
partition key: userID
sort key: groupID
However, if I wanted to query for all users in a specific group, within a specific birthday range, sorted by lastActive, is this possible and if so what index would I need to create?
Could I synthesize lastActive and userID to create a synthetic sort key, like so:
GROUPS
groupID: string
lastActiveUserID: string (i.e. "20201230T09:45:59-abc123")
birthday: number
Which would make for a different composite primary key where the partition key is groupID and the sort key is lastActiveUserID, which would sort the participants by when they were last active, and then a secondary index to filter by birthday?
As written, no, this isn't possible.
"within a specific birthday range" implies a key condition such as sk_birthday between :start and :end
"sorted by lastActive" implies lastActive as the sort key.
These two requirements are mutually exclusive. I can't devise a sort key that would be able to contain both values in a usable format.
You could have a Global Secondary Index with groupID as the hash key and lastActive as the sort key, then filter on birthday. But the filter only affects the data returned; it doesn't affect the data read nor the cost to read that data. Additionally, since DDB only reads 1MB of data per Query call, you'd have to call it repeatedly in a loop if it's possible that a given group has more than 1MB worth of members.
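The loop mentioned above could look roughly like the following sketch. `query_all` is a hypothetical helper name; `query_fn` stands for any callable with the shape of a DynamoDB Query call (for example a boto3 client's `query` method), which returns `Items` and, when more data remains, a `LastEvaluatedKey` to pass back as `ExclusiveStartKey`.

```python
def query_all(query_fn, **params):
    """Collect the items from every page of a DynamoDB Query.

    query_fn is assumed to behave like a DynamoDB Query call: it returns a
    dict with "Items" and, if more pages remain, "LastEvaluatedKey".
    """
    items = []
    while True:
        page = query_fn(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next call where the previous page stopped.
        params["ExclusiveStartKey"] = last_key
```

Every page is still read and billed before the filter on birthday is applied, so this only hides the paging; it does not make the query cheaper.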
Also, when your index has a different partition (hash) key than your table, that is a global secondary index (GSI). If your index has the same partition key but a different sort key than the table, that can be done with a local secondary index (LSI).
However, for any given query, you can only use the table or a single index. You can't use multiple indexes at the same time.
Now, having said all that, what exactly do you mean by "specific birthday range"? If the range in question is a defined period (by month, by week), perhaps you could have a GSI where the hash key is "groupID#birthday-period" and the sort key is lastActive.
So for instance, "give me GROUPA birthdays for next month"
Query(hashkey = "GROUPA#NOVEMBER")
But if you wanted November and December, you'd have to make two queries and combine & sort the results yourself.
Effective and efficient use of DDB means avoiding Scan() and avoiding the use of filterExpressions that you know will throw away lots of the data read.
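As an illustration of the "two queries, then combine and sort" step, here is a minimal sketch. The helper names `period_key` and `merge_by_last_active` are hypothetical, and it assumes each query result is a list of items already sorted by lastActive in ascending order, as a Query on an index with lastActive as the sort key would return them by default.

```python
import heapq


def period_key(group_id, period):
    """Build the GSI hash key, e.g. "GROUPA#NOVEMBER"."""
    return f"{group_id}#{period}"


def merge_by_last_active(*results):
    """Merge several result lists, each already sorted by lastActive."""
    return list(heapq.merge(*results, key=lambda item: item["lastActive"]))


# Stand-ins for the results of two separate queries (made-up data).
november = [{"userID": "a", "lastActive": 1}, {"userID": "c", "lastActive": 5}]
december = [{"userID": "b", "lastActive": 3}]
combined = merge_by_last_active(november, december)
```

Because each input is already sorted, the merge does not need to re-sort everything, which helps when the per-period results are large.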
I was looking at DynamoDB to store some data because it looked like it might be a cost effective solution, but after a bit of research I think that it might not fit my use case because I am unable to find relevant unique values for partition and sort keys.
My data are a series of records of natural events for various species of plants e.g. the date and location that someone noticed a Beech tree's leaves appearing.
{
"species": "Beech",
"event": "Budburst",
"year": 2015,
"season": "Spring",
"date": "12/04/2015",
"latitude": "0.00000",
"longitude": "40.000"
}
The main query for the application would be to obtain all of the data for a certain species for a certain event in a certain year:
Endpoint: events/:species/:event-type/:year
This is likely to return a few thousand events which can then be shown in a map.
If this were MongoDB, then I might create an index on a composite field of species+eventType+year. It wouldn't be a unique index, but at least only the few thousand results would be scanned rather than the whole table, so it wouldn't be too bad.
I'm not sure how to achieve the same thing in DynamoDB, though, or even if it is possible, because the partition key, or the partition + sort key combination seems to have to be unique.
Is the only way to make this work to have an incrementing unique event id for the partition key, and then have the species+eventType+year string as the sort key?
If there are any other patterns I'd be grateful to hear about them.
Thanks for reading.
You could do something like this:
{
"species+event+year": "BeechBudburst2015",
"eventId": "1111-2222-3333-4444",
"species": "Beech",
"event": "Budburst",
"year": 2015,
"season": "Spring",
"date": "12/04/2015",
"latitude": "0.00000",
"longitude": "40.000"
}
Create a UUID for each event. This is good practice anyway, there should always be something you can uniquely identify an event with.
As you've already identified, create a composite attribute of species+event+year.
Make species+event+year your partition key and eventId (the UUID) your range key.
When you do a Query, just provide the partition key, which will give you all events for a particular species and event type in a certain year.
If you wanted to use Get item to retrieve an individual event, you would need to specify both the partition and range key.
This design is highly optimised for getting species+event+year. If there are other queries you want to optimise, you might consider making eventId the table's partition key, which I think would be a more common design. Then create a GSI for each optimised query (e.g. a GSI with partition key species+event+year). Note that GSI partition keys do not need to be unique, so there would be no need to set a range key to make each item unique. The downside to using GSIs is that you have to provision them separately (i.e. it costs you more money).
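A minimal sketch of building such an item is shown below. The helper name `make_event_item` is hypothetical, and the "#" delimiter between the parts of the composite key is an assumption added here to keep the parts unambiguous; the example item above simply concatenates them.

```python
import uuid


def make_event_item(species, event, year, **attrs):
    """Build an item with a composite partition key and a UUID range key."""
    item = {
        # Assumed "#" delimiter; the original example concatenates directly.
        "species+event+year": f"{species}#{event}#{year}",
        "eventId": str(uuid.uuid4()),  # unique per event
        "species": species,
        "event": event,
        "year": year,
    }
    item.update(attrs)
    return item


item = make_event_item(
    "Beech", "Budburst", 2015,
    season="Spring", date="12/04/2015", latitude="0.00000", longitude="40.000",
)
```

All events for one species, event type and year share the same partition key, so a single Query on that key returns them together, while eventId keeps each item unique.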
It sounds like a natural primary key would be species as the hash key and eventType+timeStamp as the sort key. (Use ISO-8601 for the timestamp so that you can query using the begins_with function in your KeyConditionExpression.)
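As a sketch of that sort key, the helpers below build an eventType+timestamp value and the prefix that would be passed to begins_with. The function names and the "#" delimiter are assumptions, as is reading the question's date "12/04/2015" as day/month/year.

```python
from datetime import datetime


def event_sort_key(event_type, date_ddmmyyyy):
    """Build a sort key such as "Budburst#2015-04-12"."""
    # Assumes the source date is written as day/month/year.
    iso_date = datetime.strptime(date_ddmmyyyy, "%d/%m/%Y").date().isoformat()
    return f"{event_type}#{iso_date}"


def year_prefix(event_type, year):
    """Prefix for begins_with, matching every event of that type in a year."""
    return f"{event_type}#{year}"


key = event_sort_key("Budburst", "12/04/2015")
prefix = year_prefix("Budburst", 2015)
```

ISO-8601 puts the year first, so every key for a given event type and year shares the same prefix, and the keys also sort chronologically as plain strings.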
If it's possible that there's more than one event for a given species and event type at the same time, or if you simply lack precise time stamps for the events, then you can use a UUID as the hash key, and create a GSI with species as the hash key and eventType+year as the sort key, or even with species+eventType+year as the hash key, since keys do not have to be unique in a GSI.
Also, here is a helpful related question which asks, "How to query DynamoDB by date (range key), with no obvious hash key?"
I'm looking for some advice on a DynamoDB table design to store telemetry data streaming in from thousands of sensor hubs. The sensor hubs send up to 15,000 messages per day each, containing the following:
timestamp (unix time)
station_id (uuid)
sensor_type (string)
sensor_data (json)
I've looked into best practices for storing time series data, and will adopt a table partitioning strategy, where a new "hot data" table is created each month (with RCUs and WCUs adjusted accordingly for older "cooler" tables).
What I'm not sure about is picking a suitable hash key and sort key, as well as setting up indexes, etc.
The majority of the queries will be: give me messages where station_id = "foo" and sensor_type = "bar", and timestamp is between x and y.
At a guess, I'm assuming I would use station_id as the hash key and timestamp as the sort key, but how do I query for messages with a particular sensor_type without resorting to filters? Would it be best to combine station_id and sensor_type as the hash key?
Judging from the query example that you've provided, I would create the following table:
stationId_sensorType (String, partition key) - a combined attribute that contains the concatenated values of the station id and the sensor type
timestamp (Number, range key) - a UNIX timestamp that you can use to sort by time or to find only the records with timestamps in a given range.
This will allow you to get all values for a pair of (stationId, sensorType).
You can also store stationId and sensorType as separate fields in your items and then you can create GSI on them to support other queries, like, get all values for a stationId.
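The main query could then be expressed roughly as below, as parameters for the low-level DynamoDB Query API. The function name, the table name `telemetry_2020_12` and the "#" delimiter in the combined key are assumptions for illustration. Because `timestamp` is a DynamoDB reserved word, the sketch refers to it through an expression attribute name.

```python
def sensor_query(station_id, sensor_type, start_ts, end_ts,
                 table="telemetry_2020_12"):
    """Messages for one station and sensor type within a time range."""
    return {
        "TableName": table,  # hypothetical monthly table name
        "KeyConditionExpression":
            "stationId_sensorType = :pk AND #ts BETWEEN :lo AND :hi",
        # "timestamp" is a reserved word, so it is aliased as #ts.
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{station_id}#{sensor_type}"},  # assumed delimiter
            ":lo": {"N": str(start_ts)},  # numbers are sent as strings
            ":hi": {"N": str(end_ts)},
        },
    }


params = sensor_query("foo", "bar", 1606780800, 1609459199)
```

With one table per month, a time range that spans a month boundary would need one such query per table, with the results concatenated afterwards.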
I have a school which has many classes/grades (1-10), and each class has many students. I need to store each student's record on a yearly basis so that I can partition the data better. So it's basically Class -> N years -> N students. How do I model this problem to store it in DynamoDB?
In NoSQL, the design depends on the Query Access Pattern (QAP). As you have not mentioned the QAP, i.e. how you would like to retrieve the data, I have assumed a typical scenario and provided the design below.
Table : Student
Partition Key : Student Id
Sort Key : year
Other attributes: Student name, class etc.
The year is defined as sort key because a student can study in multiple grades (1-10) during different years. For eg,
2010 - He/She could be on grade 5
2011 - He/She could be on grade 6
If you would like to get all the student ids for a particular year, you can create a GSI (Global Secondary Index) on the year field.
Partition Key for the index : year
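Put together, the table and index described above might be declared as in the following sketch, written as parameters for the DynamoDB CreateTable API (for example `client.create_table(**student_table)` with boto3). The attribute name `studentId`, the index name `year-index` and the on-demand billing mode are assumptions for illustration.

```python
student_table = {
    "TableName": "Student",
    # Only key attributes are declared; name, class etc. are schemaless.
    "AttributeDefinitions": [
        {"AttributeName": "studentId", "AttributeType": "S"},
        {"AttributeName": "year", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "studentId", "KeyType": "HASH"},  # partition key
        {"AttributeName": "year", "KeyType": "RANGE"},      # sort key
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "year-index",  # hypothetical index name
            "KeySchema": [{"AttributeName": "year", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # Assumed on-demand billing, so no provisioned throughput is listed.
    "BillingMode": "PAY_PER_REQUEST",
}
```

A Query on the index with a given year then returns every student record for that year, while a Query on the table with a studentId returns that student's records across years.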
If you have any other access pattern, please update the question so that we can discuss the answer for that particular query access pattern (QAP).