DynamoDB primary key for date range and user id - amazon-dynamodb

I'm still trying to wrap my head around primary key selection in DynamoDB. My current structure is the following, where userId is HASH and sort is RANGE.
userId
sort
event
1
2021-01-18#u2d3-f3d5-s22d-3f52
...
1
2021-01-08#f1d3-s30x-s22d-w2d3
...
2
2021-02-21#s2d2-u2d3-230s-3f52
...
2
2021-02-13#w2d3-e5d5-w2d3-3f52
...
1
2021-01-19#f2d4-f3d5-s22d-3f52
...
1
2020-12-13#f3d5-e5d5-s22d-w2d3
...
2
2020-11-11#e5d5-u2d3-s22d-0j32
...
What I want to achieve is to query all events for a particular user between date A and date B. I have tested a few of solutions that all work, like
Figure out a closest common begins_with for the range I want. If date A is 2019-02-01 and date B is 2021-01-03, then it would be userId = 1 and begins_with (sort, 20), which would return everything from the twenty-first century.
Loop through all months between date A and date B and do a bunch of small queries like userId = 1 and begins_with (sort, 2021-01), then concat the results afterwards.
They all work but have their drawbacks. I'm also a bit unsure of when I'm just complicating things to the point where a scan might actually be worth it instead. Being able to use between would of course be the best option, but I need to put the unique #guid at the end of the range key in order to make each primary key unique.
Am I approaching this the wrong way?

I created a little demo app to show how this works.
You can just use the between condition, because it uses byte-order to implement the between condition. The idea is that you use the regular starting date A and convert it to a string as the beginning of the range. Then you add a day to your end, convert it to string and use that as the end.
The script creates this table (it will look different when you run it):
PK | SK
------------------------------------------------------
demo | 2021-02-26#a4d0f5f3-588a-49d9-8eaa-a3e2f9436ade
demo | 2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
demo | 2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
demo | 2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
demo | 2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
demo | 2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
demo | 2021-03-04#ee30b1ce-3910-4a59-9e62-09f051b0dc72
demo | 2021-03-05#f0e2405f-6ce9-4fcb-a798-394f7a2f9490
demo | 2021-03-06#bcf76e07-7582-4fe3-8ffd-14f450e60120
demo | 2021-03-07#58d01231-a58d-4c23-b1ed-e525ba102b80
And when I run this function to select the items between two given dates, it returns the result below:
def select_in_date_range(pk: str, start: datetime, end: datetime):
table = boto3.resource("dynamodb").Table(TABLE_NAME)
start = start.isoformat()[:10]
end = (end + timedelta(days=1)).isoformat()[:10]
print(f"Requesting all items starting at {start} and ending before {end}")
result = table.query(
KeyConditionExpression=\
conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
)
print("Got these items")
for item in result["Items"]:
print(f"PK={item['PK']}, SK={item['SK']}")
Requesting all items starting at 2021-02-27 and ending before 2021-03-04
Got these items
PK=demo, SK=2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
PK=demo, SK=2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
PK=demo, SK=2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
PK=demo, SK=2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
PK=demo, SK=2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
Full script to try it yourself.
import uuid
from datetime import datetime, timedelta
import boto3
import boto3.dynamodb.conditions as conditions
TABLE_NAME = "sorting-test"
def create_table():
ddb = boto3.client("dynamodb")
ddb.create_table(
AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}, {"AttributeName": "SK", "AttributeType": "S"}],
TableName=TABLE_NAME,
KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}, {"AttributeName": "SK", "KeyType": "RANGE"}],
BillingMode="PAY_PER_REQUEST"
)
def create_sample_data():
pk = "demo"
amount_of_events = 10
table = boto3.resource("dynamodb").Table(TABLE_NAME)
start_date = datetime.now()
increment = timedelta(days=1)
print("PK | SK")
print("------------------------------------------------------")
for i in range(amount_of_events):
date = start_date.isoformat()[:10]
unique_id = str(uuid.uuid4())
sk = f"{date}#{unique_id}"
print(f"{pk} | {sk}")
start_date += increment
table.put_item(Item={"PK": pk, "SK": sk})
def select_in_date_range(pk: str, start: datetime, end: datetime):
table = boto3.resource("dynamodb").Table(TABLE_NAME)
start = start.isoformat()[:10]
end = (end + timedelta(days=1)).isoformat()[:10]
print(f"Requesting all items starting at {start} and ending before {end}")
result = table.query(
KeyConditionExpression=\
conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
)
print("Got these items")
for item in result["Items"]:
print(f"PK={item['PK']}, SK={item['SK']}")
def main():
pass
# create_table()
# create_sample_data()
start = datetime.now() + timedelta(days=1)
end = datetime.now() + timedelta(days=5)
select_in_date_range("demo",start, end)
if __name__ == "__main__":
main()

Related

How to create data table using dynamic dates functions like now() or ago()?

Is there a way to create a data table using dynamic dates function like now() or ago()? I am not sure it's possible or just an issue of finding the right syntax.
Works
let Logs = datatable(timestamp: datetime) [
datetime(2022-12-02 2:00:00.00),
datetime(2022-12-02 6:00:00.00),
];
Does not work
let Logs = datatable(timestamp: datetime) [
now(),
datetime(now()- ago(3h)),
];
Thoughts?
datatable accept only literals (no expressions, no functions).
You can use the union operator with multiple print statements:
union
(print 1, now())
,(print 2, ago(3h))
| project-rename id = print_0, timestamp = print_1
id
timestamp
2
2023-02-15T05:29:23.2203689Z
1
2023-02-15T08:29:23.2203689Z
Fiddle
P.S.
datetime(now()- ago(3h)) is wrong.
datetime accept only literal.
now() - ago(3h) is equal to 3h.
Use ago(3h) or now() - 3h

table counterpart of column_ifexists()

We do have a function column_ifexists() which refers to a certain column if it exists, otherwise it refers to another option if we provide. Is there a similar function for table? I want to refer to a table and run some logic against it in the query , if the table exists , but if it doesn't exist, there shouldn't be a failure -- it should simply return no data.
e.g.
table_ifexists('sometable') | ...<logic>...
Please note that the fields referenced in the query should be defined in the dummy table, otherwise in case of non-existing table, the query will yield an exception
Failed to resolve scalar expression named '...'
In the following example these fields are StartTime, EndTime & EventType
Table exists
let requested_table = "StormEvents";
let dummy_table = datatable (StartTime:datetime, EndTime:datetime, EventType:string)[];
union isfuzzy=true table(requested_table), dummy_table
| where EndTime - StartTime > 30d
| summarize count() by EventType
EventType
count_
Drought
1635
Flood
20
Heat
14
Wildfire
4
Fiddle
Table does not exist
let requested_table = "StormEventsXXX";
let dummy_table = datatable (StartTime:datetime, EndTime:datetime, EventType:string)[];
union isfuzzy=true table(requested_table), dummy_table
| where EndTime - StartTime > 30d
| summarize count() by EventType
EventType
count_
Fiddle

Kusto Query Dynamic sort Order

I have started working on Azure Data Explorer( Kusto) recently.
My requirement to make sorting order of Kusto table in dynamic way.
// Variable declaration
let SortColumn ="run_date";
let OrderBy="desc";
// Actual Code
tblOleMeasurments
| take 10
|distinct column1,column2,column3,run_date
|order by SortColumn OrderBy
Here My code working fine till Sortcolumn but when I tried to add [OrderBy] after [SortColumn] kusto gives me error .
My requirement here is to pass Asc/desc value from Variable [OrderBy].
Kindly assist here with workarounds and solutions which help me .
The sort column and order cannot be an expression, it must be a literal ("asc" or "desc"). If you want to pass the sort column and sort order as a variable, create a union instead where the filter on the variables results with the desired outcome. Here is an example:
let OrderBy = "desc";
let sortColumn = "run_date";
let Query = tblOleMeasurments | take 10 |distinct column1,column2,column3,run_date;
union
(Query | where OrderBy == "desc" and sortColumn == "run_date" | order by run_date desc),
(Query | where OrderBy == "asc" and sortColumn == "run_date" | order by run_date asc)
The number of union legs would be the product of the number of candidate sort columns times two (the two sort order options).
An alternative would be sorting by a calculated column, which is based on your sort_order and sort_column. The example below works for numeric columns
let T = range x from 1 to 5 step 1 | extend y = -10 * x;
let sort_order = "asc";
let sort_column = "y";
T
| order by column_ifexists(sort_column, "") * case(sort_order == "asc", -1, 1)

query with max and second factor [duplicate]

I have:
TABLE MESSAGES
message_id | conversation_id | from_user | timestamp | message
I want:
1. SELECT * WHERE from_user <> id
2. GROUP BY conversation_id
3. SELECT in every group row with MAX(timestamp) **(if there are two same timestamps in a group use second factor as highest message_id)** !!!
4. then results SORT BY timestamp
to have result:
2|145|xxx|10000|message
6|1743|yyy|999|message
7|14|bbb|899|message
with eliminated
1|145|xxx|10000|message <- has same timestamp(10000) as message(2) belongs to the same conversation(145) but message id is lowest
5|1743|me|1200|message <- has message_from == me
example group with same timestamp
i want from this group row 3 but i get row 2 from query
SELECT max(message_timestamp), message_id, message_text, message_conversationId
FROM MESSAGES
WHERE message_from <> 'me'
GROUP BY message_conversationId
ORDER by message_Timestamp DESC
what is on my mind to do union from message_id & timestamp and then get max???
Your query is based on non-standard use of GROUP BY (I think SQLite allows that only for compatibility with MySQL) and I'm not at all sure that it will produce determinate results all the time.
Plus it uses MAX() on concatenated columns. Unless you somehow ensure that the two (concatenated) columns have fixed widths, the results will not be accurate for that reason as well.
I would write the query like this:
SELECT
m.message_timestamp,
m.message_id,
m.message_text,
m.message_conversationId
FROM
( SELECT message_conversationId -- for every conversation
FROM messages as m
WHERE message_from <> 'me'
GROUP BY message_conversationId
) AS mc
JOIN
messages AS m -- join to the messages
ON m.message_id =
( SELECT mi.message_id -- and find one message id
FROM messages AS mi
WHERE mi.message_conversationId -- for that conversation
= mc.message_conversationId
AND mi.message_from <> 'me'
ORDER BY mi.message_timestamp DESC, -- according to the
mi.message_id DESC -- specified order
LIMIT 1 -- (this is the one part)
) ;
Try below sql to achieve your purpose by group by twice.
select m.*
from
Messages m
-- 3. and then joining to get wanted output columns
inner join
(
--2. then selecting from this max timestamp - and removing duplicates
select conversation_id, max(timestamp), message_id
from
(
-- 1. first select max message_id in remainings after the removal of duplicates from mix of cv_id & timestamp
select conversation_id, timestamp, max(message_id) message_id
from Messages
where message <> 'me'
group by conversation_id, timestamp
) max_mid
group by conversation_id
) max_mid_ts on max_mid_ts.message_id = m.message_id
order by m.message_id;
http://goo.gl/MyZjyU
ok it was more simple than I thought:
basically to change select from:
max(message_timestamp)
to:
max(message_timestamp || message_id)
or max(message_timestamp + message_id)
so it will search for max on concatenation of timestamp and message_id
ps. after a digging - it's working only if message id is growing with timestamp ( order of insertion is preserved )
edit:
edit2 :
so why it works ?
SELECT max(message_timestamp+message_id), message_timestamp, message_id, message_conversationId, message_from,message_text
FROM MESSAGES
WHERE message_conversationId = 1521521
AND message_from <> 'me'
ORDER by message_Timestamp DESC

Math with MySQL timestamps in groovy

I need to subtract one timestamp from another in groovy and get the number of hours that has passed. The timestamps are coming from MySQL. When I do simple math I get numbers of days rounded off to zero integers.
endDate - startDate
gives rounded integer
I want a result of 2.35 hours, etc.
You can use groovy.time.TimeCategory like so:
// create two timestamps
import java.sql.Timestamp
def dates = ['2012/08/03 09:00', '2012/08/03 10:30']
def (Timestamp a, Timestamp b) = dates.collect {
new Timestamp( Date.parse( 'yyyy/MM/dd hh:mm', it ).time )
}
// Then compare them with TimeCategory
use(groovy.time.TimeCategory) {
def duration = b - a
println "${duration.days}d ${duration.hours}h ${duration.minutes}m"
}
Which will print:
0d 1h 30m

Resources