What is the recommended way of saving durations in Firestore? - firebase

I am trying to save a duration in Firestore. For example, a user has been working on an exercise for 4 minutes and 44 seconds. What is the recommended data type for this information?
It would be nice if the Firestore increment operator would work on the used data type. I thought about using a Timestamp, but in the Firebase SDK, the increment method expects a number, and a Timestamp is not a number.

What is the recommended way of saving durations in Firestore?
Because you are talking about a duration, the solution would be to store it as a number, in particular as an integer, which is a supported data type. In your example, 4 minutes and 44 seconds will be stored as:
284
And this is because 4*60 + 44 = 284.
If you want to get back the number of minutes and seconds, simply reverse the above algorithm.
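For example, in plain Kotlin (nothing Firestore-specific), the conversion in both directions might look like this:
fun toTotalSeconds(minutes: Long, seconds: Long): Long = minutes * 60 + seconds

fun fromTotalSeconds(total: Long): Pair<Long, Long> = total / 60 to total % 60

fun main() {
    val stored = toTotalSeconds(4, 44)                // 284
    val (minutes, seconds) = fromTotalSeconds(stored) // 4 and 44
    println("$minutes min $seconds s")                // prints "4 min 44 s"
}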
It would be nice if the Firestore increment operator would work on the used data type.
Firestore already provides such an option. If you want to add, for example, an hour to an existing duration, you can use:
FieldValue.increment(3600) // 60 * 60
In code it will be as simple as:
val db = Firebase.firestore
db.collection(collName).document(docId).update("field", FieldValue.increment(3600))
If you want to decrement by an hour, pass a negative value:
FieldValue.increment(-3600)
And in code:
db.collection(collName).document(docId).update("field", FieldValue.increment(-3600))
I thought about using a Timestamp
A timestamp is not a duration. Firestore timestamps represent a single point in time, with components as small as nanoseconds: a Timestamp object encodes the year, month, day, hour, minutes, seconds, milliseconds, and nanoseconds. This will not solve your problem in any way.

Related

When I load the datetime column into Snowflake I see 10- and 11-hour differences

I'm trying to combine two columns with date and time data in the ETL tool and load them into Snowflake. When I load the data as datetime, there are 10- and 11-hour differences. For reference, here are the current Snowflake time and my local time:
select current_timestamp >> '2022-11-15 23:46:47.318 -0800'
My current hour is now >> '2022-11-16 10:46:47.318'
The screenshot (not included here) shows the columns involved. STAGE_DATE is the merged version of VBRK_FKDAT and VBRK_ERZET; this is the date I want to see. I combine VBRK_FKDAT and VBRK_ERZET in Data Services and send the result to Snowflake as INVOICEDATE, and by then it is already 10 hours off. When I take the difference between INVOICEDATE and STAGE_DATE I get HOURDIFF, and it is randomly 10 or 11 hours. I'm trying to understand the problem.
Thank you for your interest.
What you've explained is most probably related to the current setting of the TIMEZONE parameter:
Default: America/Los_Angeles
Timestamps are stored internally in UTC but are displayed to the user based on the TIMEZONE parameter value for the session/user/account. The 10- vs 11-hour gap matches the America/Los_Angeles offset from your local UTC+3 time with and without daylight saving time.
For more information have a look at the Snowflake documentation on the TIMEZONE parameter.

What is the best way to detect anomalies in exception count from logs in Azure

I have an ASP.NET application deployed in Azure. It generates plenty of logs, some of which are exceptions. I already have a query in the Log Analytics workspace that picks up exceptions from the logs.
I would like to know what is the best and/or cheapest way to detect anomalies in the exception count over a time period.
For example, if the average number of exceptions per hour is N (based on information collected over the past month or so), and the average goes above N+20 at any time (checked every hour or so), then I need to be notified.
N would change dynamically based on the trend.
I would like to know what is the best and/or cheapest way to detect anomalies in the exception count over a time period.
Yes, you can achieve this with the following steps:
Store the average value in a stored query result in Azure, using the .set stored_query_result command. There are some limitations on how the result is kept; refer to the Microsoft documentation for detailed information.
Note: a stored query result is available for only 24 hours.
The workaround follows:
1. Set the stored query result:
// Here I am using a stored query result to keep the average trace count per 5-hour bin
.set stored_query_result average <|
traces
| summarize events = count() by bin(timestamp, 5h)
| summarize avg(events)
2. Once the query result is set, you can use the stored value in another KQL query (it remains available for up to 24 hours):
// Retrieve the stored query result
stored_query_result(<StoredQueryResultName>) |
// the rest of the query follows, as per your need
3. Schedule the alert.

To query the last 7 days of data in DynamoDB

I have my DynamoDB table as follows:
HashKey (Date), RangeKey (timestamp)
The DB stores the data for each day (hash key) and timestamp (range key).
Now I want to query the data of the last 7 days.
Can I do this in one query, or do I need to call the DB 7 times, once for each day? The order of the data does not matter, so can someone suggest an efficient query to do that?
I think you have a few options here.
BatchGetItem - The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key. You could specify all 7 primary keys and fire off a single request.
7 calls to DynamoDB. Not ideal, but it'd get the job done.
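If you go this route, each of those calls can be a Query keyed only on the hash key, so one call per day returns all of that day's items. A minimal sketch, assuming the AWS SDK for Java v2 used from Kotlin and a string hash-key attribute named Date (the client and table name are placeholders):
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.QueryRequest
import java.time.LocalDate

// One Query per day for the past 7 days; each Query returns every item
// (all range-key timestamps) stored under that day's hash key.
fun lastSevenDays(client: DynamoDbClient, tableName: String): List<Map<String, AttributeValue>> =
    (0L until 7L).flatMap { daysAgo ->
        val day = LocalDate.now().minusDays(daysAgo).toString() // e.g. "2021-02-08"
        val request = QueryRequest.builder()
            .tableName(tableName)
            .keyConditionExpression("#d = :d")
            .expressionAttributeNames(mapOf("#d" to "Date")) // "Date" is a DynamoDB reserved word
            .expressionAttributeValues(mapOf(":d" to AttributeValue.builder().s(day).build()))
            .build()
        client.query(request).items()
    }
(Pagination via LastEvaluatedKey is ignored here for brevity.)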
Introduce a global secondary index that projects your data into the shape your application needs. For example, you could introduce an attribute that represents an entire week by using a truncated timestamp:
2021-02-08 (represents the week of 02/08/21T00:00:00 - 02/14/21T23:59:59)
2021-02-15 (represents the week of 02/15/21T00:00:00 - 02/21/21T23:59:59)
I call this a "truncated timestamp" because I am effectively ignoring the HH:MM:SS portion of the timestamp. When you create a new item in DDB, you could introduce a truncated timestamp that represents the week it was inserted. Therefore, all items inserted in the same week will show up in the same item collection in your GSI.
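As a sketch of how such a truncated timestamp could be derived (plain Kotlin with java.time; snapping to the Monday of the week is just one possible convention):
import java.time.DayOfWeek
import java.time.LocalDate
import java.time.temporal.TemporalAdjusters

// Snap any date to the first day (Monday) of its week; all items created in the
// same week then share this value as the GSI partition key.
fun weekOf(date: LocalDate): String =
    date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)).toString()

// weekOf(LocalDate.parse("2021-02-10")) == "2021-02-08"
// weekOf(LocalDate.parse("2021-02-17")) == "2021-02-15"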
Depending on the volume of data you're dealing with, you might also consider separate tables to segregate ranges of data. AWS has an article describing this pattern.

Querying dynamo by time intervals

I'm new to DynamoDB and would like some help on how to best structure things, and whether it's the right tool for the job.
Let's say I have thousands of users signed up to receive messages. They can choose to receive messages every half hour, hour, couple of hours or every 4 hours. So essentially there is a schedule attribute for each user's message. Users can also specify a time window for when they receive these messages, e.g. 09:00 - 17:00 and also toggle an active state.
I want to be able to easily get the messages to send to the various users at the right time.
If done in SQL this would be really easy, with something like:
Select * from UserMessageSchedules
where
now() > startTime
and now() < endTime
and userIsActive
and schedule = 'hourly'
But I'm struggling to do something similar in DynamoDB. At first I thought I'd have the following schema:
userId (partition key)
messageId (sort key)
schedule (one of half_hour, hour, two_hours, four_hours)
startTime_userId
endTime
I'd create a Global Secondary Index with the 'schedule' attribute being the partition key, and startTime + userId being the sort key.
I could then easily query for messages that need sending after a startTime.
But I'd still have to check endTime > now() within my lambda. Also, I'd be reading most of the table, which seems inefficient and may lead to throughput issues?
And with the limited number of schedules, would I get hot partitions on the GSI?
So I then thought rather than sending messages from a table designed to store users preferences, I could process this table when an entry is made/edited and populate a toSend table, which would look like this:
timeSlot (pk) timeSlot_messageId (sk)
00:30 00:30_Message1_Id
00:30 00:30_Message2_Id
01:00 01:00_Message1_Id
Finding the messages to send at a certain time would be nice and fast, as I'd just query on the timeSlot.
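Roughly, that lookup would be a single key-condition query; a sketch with the AWS SDK for Java v2 from Kotlin (the toSend table name and attribute names are just the ones from the layout above):
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.QueryRequest

// Fetch everything scheduled for one half-hour slot, e.g. "00:30".
fun messagesForSlot(client: DynamoDbClient, slot: String): List<Map<String, AttributeValue>> =
    client.query(
        QueryRequest.builder()
            .tableName("toSend")
            .keyConditionExpression("#slot = :slot")
            .expressionAttributeNames(mapOf("#slot" to "timeSlot"))
            .expressionAttributeValues(mapOf(":slot" to AttributeValue.builder().s(slot).build()))
            .build()
    ).items()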
But again I'm worried about hot spots and throughput. Is it OK for each partition to have thousands of rows and for just that partition to be read? Are there any other problems with this approach?
Another possibility would be to have different tables (rather than partitions) for each half hour when something could be sent
e.g. toSendAt_00:30, toSendAt_01:00, toSendAt_01:30
and these would have the messageId as the primary key and would contain the data needing to be sent. I'd just scan the table. Is this overkill?
Rather than doing big reads of data every half an hour, would I be better off duplicating the data into Elasticsearch and querying that?
Thanks!

Recommended Schema for DynamoDB calendar/event like structure

I'm pretty new to DynamoDB design and trying to get the correct schema for my application. In this app different users will enter various attributes about their day. For example "User X, March 1st 12:00-2:00, Tired". There could be multiple entries for a given time, or overlapping times (e.g. tired from 12-2 and eating lunch from 12-1).
I'll need to query based on user and time ranges. Common queries:
Give me all the "actions" for user X between time t1 and t2
Give me all the start times for action Z for user X
My initial thought was that the partition key would be userid and the range key the start time, but that won't work because of duplicate start times, right?
A second thought:
UserID - Partition Key
StartTime - RangeKey
Action - JSON document of all actions for that start time
[{ "action": "Lunch", "endTime": "1pm" }, { "action": "Tired", "endTime": "2pm" }]
Any recommendation on a proper schema?
This doesn't really have one single solution, and you will need to evaluate multiple options depending on your use case: how much data you have, how often you query, by which fields, etc.
But one good solution is to partition your schema like this.
Generated UUID as partition key
UserID
Start time (in Unix epoch time or ISO 8601 format)
Advantages
Can handle multiple time zones
Can easily query by userID and start date (you will need a secondary index with partition key userID and sort key start time; see the sketch at the end of this answer)
A more even distribution of your data across DynamoDB partitions, and fewer hot keys, because of the randomly generated partition key
Disadvantages
More data for every item because of the UUID (+16 bytes)
Additional cost for the new secondary index; note that scanning the data in the table is generally much more expensive than having a secondary index
This is pretty close to your initial thought; to give a more precise answer we would need a lot more information about how many writes and reads you are planning and what kind of queries you will need.
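For instance, the first common query ("all actions for user X between t1 and t2") against that secondary index could look roughly like this, using the AWS SDK for Java v2 from Kotlin; the table name, index name and ISO 8601 string times are assumptions:
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.QueryRequest

// Query the assumed GSI (partition key UserID, sort key StartTime stored as ISO 8601 strings).
fun actionsForUser(client: DynamoDbClient, userId: String, from: String, to: String): List<Map<String, AttributeValue>> =
    client.query(
        QueryRequest.builder()
            .tableName("UserDays")               // hypothetical table name
            .indexName("UserID-StartTime-index") // hypothetical GSI name
            .keyConditionExpression("#uid = :u AND #st BETWEEN :t1 AND :t2")
            .expressionAttributeNames(mapOf("#uid" to "UserID", "#st" to "StartTime"))
            .expressionAttributeValues(mapOf(
                ":u" to AttributeValue.builder().s(userId).build(),
                ":t1" to AttributeValue.builder().s(from).build(),
                ":t2" to AttributeValue.builder().s(to).build()))
            .build()
    ).items()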
You are right that UserID as partition key and StartTime as range key would be the obvious choice, if it weren't for your overlapping activities.
I would consider going for
UserID - Partition Key
StartTime + uuid - RangeKey
StartTime - Plain old attribute
Datetimes in DynamoDB just get stored as strings anyway. So the idea here is that you have StartTime + some uuid as your range key, which gives you a table sortable by datetime while also ensuring you have unique primary keys. You could then store the StartTime in a separate attribute, or have a function for adding/removing the uuid from the StartTime + uuid attribute.
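A small Kotlin sketch of that idea (the '#' separator and the ISO 8601 Instant format are just one possible convention):
import java.time.Instant
import java.util.UUID

// Build a unique but still time-sortable range key, e.g. "2021-03-01T12:00:00Z#0b7e3c...".
fun rangeKeyFor(startTime: Instant): String = "$startTime#${UUID.randomUUID()}"

// Recover the plain StartTime from the composite range key.
fun startTimeOf(rangeKey: String): Instant = Instant.parse(rangeKey.substringBefore('#'))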
