How do I convert a duration from minutes to seconds? After the step below, I get the result for Duration in minutes, and I want to convert it to seconds, for example 00:04:19 to 259 seconds.
| extend duration = ((EndTime - StartTime)/60)
| summarize duration= avg(duration) by EndTime
Thanks
See: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/datetime-timespan-arithmetic
For example:
print timespan(01:23:45) / 1s
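The same divide-by-a-unit idea can be sketched outside Kusto. The Python snippet below is only an illustration (the helper name `to_seconds` is made up for this example): dividing one `timedelta` by a one-second `timedelta` yields a plain number of seconds, which mirrors `timespan / 1s`.

```python
from datetime import timedelta


def to_seconds(span: timedelta) -> float:
    # Dividing a duration by a one-second duration yields a plain number,
    # analogous to `timespan / 1s` in Kusto.
    return span / timedelta(seconds=1)


# 00:04:19 from the question
print(to_seconds(timedelta(minutes=4, seconds=19)))  # 259.0
```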
I am very new to Kusto queries and I have one that gives me the proper data, which I export to Excel to manage. My only problem is that, right now, I only care about yesterday and today, in two separate sheets. I can manually change the datetime values, but I would like to be able to just refresh the data and have it pull the newest numbers.
It sounds pretty simple, but I cannot figure out how to specify the exact time window I want. It has to run from 2 AM on day 1 until 1:59 AM on day 2.
Thanks
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= make_datetime(2023,2,15,2,0,0) and TelemetryLocalTimeStamp < make_datetime(2023,2,16,01,59,0)
| where NumberOfBinPresentations >0
Use ago(), now(), startofday() and some datetime arithmetic.
// Sample data generation. Not part of the solution.
let ['Telemetry.WorkStation'] = materialize(range i from 1 to 1000000 step 1 | extend NexusUid = "08463c7b-fe37-43b6-a0d2-237472b9774d", TelemetryLocalTimeStamp = ago(2d * rand()));
// Solution starts here.
['Telemetry.WorkStation']
| where NexusUid == "08463c7b-fe37-43b6-a0d2-237472b9774d"
| where TelemetryLocalTimeStamp >= startofday(ago(1d)) + 2h
and TelemetryLocalTimeStamp < startofday(now()) + 2h
| summarize count(), min(TelemetryLocalTimeStamp), max(TelemetryLocalTimeStamp)
count_ | min_TelemetryLocalTimeStamp  | max_TelemetryLocalTimeStamp
500539 | 2023-02-15T02:00:00.0162031Z | 2023-02-16T01:59:59.8883692Z
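For readers who want to check the window arithmetic outside Kusto, here is a small Python sketch of the same idea: take the start of today, step back one day, and shift both ends by two hours. The function name `report_window` is hypothetical, and time zones are ignored for simplicity.

```python
from datetime import datetime, timedelta
from typing import Tuple


def report_window(now: datetime) -> Tuple[datetime, datetime]:
    # Equivalent of startofday(now()) in the query above.
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # 2 AM yesterday (inclusive) up to 2 AM today (exclusive).
    start = start_of_today - timedelta(days=1) + timedelta(hours=2)
    end = start_of_today + timedelta(hours=2)
    return start, end


start, end = report_window(datetime(2023, 2, 16, 10, 30))
print(start)  # 2023-02-15 02:00:00
print(end)    # 2023-02-16 02:00:00
```

Using a half-open interval (`>= start` and `< end`) avoids having to spell out 01:59:59.999 as the upper bound.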
I am trying to process website login session data by each user. I am reading an S3 session log file into an RDD. The data looks something like this.
----------------------------------------
User | Site | Session start | Session end
---------------------------------------
Joe |Waterloo| 9/21/19 3:04 AM |9/21/19 3:18 AM
Stacy|Kirkwood| 8/4/19 3:06 PM |8/4/19 3:54 PM
John |Waterloo| 9/21/19 8:48 AM |9/21/19 9:05 AM
Stacy|Kirkwood| 8/4/19 4:16 PM |8/4/19 5:41 PM
...
...
I want to find out how many users were logged in each second of the hour on a given day.
Example: I might be processing this data for 9/21/19 only. So, I would need to remove all other records and then SUM user sessions for each second of the hour for all 24 hours of 9/21/19. The output should possibly be 24 rows, one for each hour of 9/21/19, and then counts for each second of the day (yikes, second-by-second data!).
Is this possible in PySpark using either RDDs or DataFrames?
(Apologies for the tardiness in building the grid.)
Thanks
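My dataset:
from pyspark.sql import functions as F
from pyspark.sql.functions import udf, when
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

# Sample sessions; `spark` is the active SparkSession.
data = [['Joe', 'Waterloo', '9/21/19 3:04 AM', '9/21/19 3:18 AM'],
        ['Stacy', 'Kirkwood', '8/4/19 3:06 PM', '8/4/19 3:54 PM'],
        ['John', 'Waterloo', '9/21/19 8:48 AM', '9/21/19 9:05 AM'],
        ['Stacy', 'Kirkwood', '9/21/19 4:06 PM', '9/21/19 4:54 PM'],
        ['Mo', 'Hashmi', '9/21/19 1:06 PM', '9/21/19 5:54 PM'],
        ['Murti', 'Hash', '9/21/19 1:00 PM', '9/21/19 3:00 PM'],
        ['Floo', 'Shmi', '9/21/19 9:10 PM', '9/21/19 11:54 PM']]
# All columns are read as strings; the timestamps are parsed in the next step.
cSchema = StructType([StructField("User", StringType()),
                      StructField("Site", StringType()),
                      StructField("Sesh-Start", StringType()),
                      StructField("Sesh-End", StringType())])
df = spark.createDataFrame(data, schema=cSchema)
display(df)  # Databricks notebook helper; use df.show() elsewhere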
Parse the timestamps:
# "M/d/yy h:mm a" matches values such as "9/21/19 3:04 AM" (two-digit year, 12-hour clock),
# so the year is parsed directly instead of being patched with a '20yy' output format.
df1 = (df.withColumn("Start", F.to_timestamp("Sesh-Start", "M/d/yy h:mm a"))
       .withColumn("End", F.to_timestamp("Sesh-End", "M/d/yy h:mm a"))
       .drop("Sesh-Start", "Sesh-End"))
build and register udf, for multiple hours per person
def yo(a, b):
    # Return every clock hour touched by a session that starts at a and ends at b.
    from datetime import datetime
    d1 = datetime.strptime(str(a), '%Y-%m-%d %H:%M:%S')
    d2 = datetime.strptime(str(b), '%Y-%m-%d %H:%M:%S')
    y = []
    if d1.hour == d2.hour:
        y.append(d1.hour)
    else:
        for i in range(d1.hour, d2.hour + 1):
            y.append(i)
    return y
rng = udf(yo, ArrayType(IntegerType()))
explode list of hours into column
df2=df1.withColumn("new", rng(F.col("Start"),F.col("End"))).withColumn("new1",F.explode("new")).drop("new")
get seconds for each hour
df3=df2.withColumn("Seconds", when(F.hour("Start")==F.hour("End"), F.col("End").cast('long') - F.col("Start").cast('long'))
.when(F.hour("Start")==F.col("new1"), 3600-F.minute("Start")*60)
.when(F.hour("End")==F.col("new1"), F.minute("End")*60)
.otherwise(3600))
create temp view and query it
df3.createOrReplaceTempView("final")
display(spark.sql("Select new1, sum(Seconds) from final group by new1 order by new1"))
The answer by Lennart could be more performant because it uses a join to get all the different hours, whereas I use a UDF, which could be slower. My code will work for a user who is online for any number of hours. My data contained only the required day, so you could add a day filter to limit your query to the day in question.
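Try this:
Initialize the filter.
# Assumed imports; note that sum here is the Spark aggregate and shadows Python's built-in.
from pyspark.sql.functions import col, count, hour, lit, minute, second, sum, to_date, to_timestamp, when
filter = to_date(lit("2019-09-21"))
startFilter = to_timestamp(lit("2019-09-21 00:00:00.000"))
endFilter = to_timestamp(lit("2019-09-21 23:59:59.999"))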
Generate range (0 .. 23).
hours = spark.range(24)  # keep it as a DataFrame (no collect()) so it can be joined below
Get actual user sessions that match the filter.
df = sessions.alias("s") \
    .where((filter >= to_date(col("s.start"))) & (filter <= to_date(col("s.end")))) \
    .select(col("s.user"), \
        when(col("s.start") < startFilter, startFilter).otherwise(col("s.start")).alias("start"), \
        when(col("s.end") > endFilter, endFilter).otherwise(col("s.end")).alias("end"))
Combine match user sessions with range of hours.
df2 = df.join(hours, hours.id.between(hour(df.start), hour(df.end)), 'inner') \
.select(df.user, hours.id.alias("hour"), \
(when(hour(df.end) > hours.id, 3600).otherwise(minute(df.end) * 60 + second(df.end)) - \
when(hour(df.start) < hours.id, 0).otherwise(minute(df.start) * 60 + second(df.start))).alias("seconds"))
Generate summary: calculate users count and sum of seconds for each hour of sessions.
df2.groupBy(df2.hour)\
.agg(count(df2.user).alias("user counts"), \
sum(df2.seconds).alias("seconds")) \
.show()
Hope this helps.
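To sanity-check the per-hour arithmetic without a Spark cluster, the overlap logic can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `seconds_per_hour` is made up, and the session is assumed to start and end on the same day (a multi-day session would fold repeated hours together).

```python
from datetime import datetime, timedelta
from typing import Dict


def seconds_per_hour(start: datetime, end: datetime) -> Dict[int, int]:
    """Split one same-day session into seconds spent in each clock hour."""
    result: Dict[int, int] = {}
    cursor = start
    while cursor < end:
        # End of the clock hour that contains `cursor`.
        next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        piece_end = min(end, next_hour)
        seconds = int((piece_end - cursor).total_seconds())
        result[cursor.hour] = result.get(cursor.hour, 0) + seconds
        cursor = piece_end
    return result


# Mo's session from the sample data: 1:06 PM to 5:54 PM on 9/21/19
print(seconds_per_hour(datetime(2019, 9, 21, 13, 6), datetime(2019, 9, 21, 17, 54)))
# {13: 3240, 14: 3600, 15: 3600, 16: 3600, 17: 3240}
```

Summing these dictionaries over all sessions gives the same per-hour totals that the grouped Spark query is meant to produce.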
I am having trouble figuring out how to calculate duration of a time variable
Any thoughts on how to tackle this?
A military time value encoded as an integer hhmm can be processed by converting the number to a SAS time value and then computing the delta under certain assumptions.
data sleep_log;
input name $ boots_down boots_up;
datalines;
Joe 2000 0600 slept over midnight
Joe 1000 1230 slept into lunch
Joe 1630 1700 30 winks
Joe 0100 0100 out cold!
run;
data sleep_data;
set sleep_log;
down = hms(
int(boots_down / 100) /* extract hours */
, mod(boots_down , 100) /* extract minutes */
, 0 /* seconds not logged, use zero */
);
up = hms(
int(boots_up / 100) /* extract hours */
, mod(boots_up , 100) /* extract minutes */
, 0 /* seconds not logged, use zero */
);
* SAS time values are linear and simple arithmetic can apply;
if up <= down
then delta = '24:00't + up - down; /* presume roll over midnight */
else delta = up - down;
format down up delta time5.;
run;
A more robust log would also record the day, eliminating presumptions and providing a proper time dimension.
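The midnight roll-over assumption used in the data step can be checked with a short Python sketch. The function names below are hypothetical; the logic simply mirrors the SAS code: split hhmm into hours and minutes, then add 24 hours whenever the wake-up time is not later than the bedtime.

```python
from datetime import timedelta


def hhmm_to_timedelta(hhmm: int) -> timedelta:
    # 2030 -> 20 hours, 30 minutes (seconds are not logged, so they stay zero).
    return timedelta(hours=hhmm // 100, minutes=hhmm % 100)


def sleep_duration(boots_down: int, boots_up: int) -> timedelta:
    down = hhmm_to_timedelta(boots_down)
    up = hhmm_to_timedelta(boots_up)
    if up <= down:
        # Presume the sleep rolled over midnight, as in the SAS data step.
        return timedelta(hours=24) + up - down
    return up - down


print(sleep_duration(2000, 600))   # 10:00:00
print(sleep_duration(1630, 1700))  # 0:30:00
```

Note that identical times (0100 to 0100) are treated as a full 24 hours, which is the same presumption the SAS code makes.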
You can extract the Hours and Minutes from your numeric military time HHMM , then create a SAS time using HMS() function.
Extract Hours: Divide your HHMM by 100 and save as integer to get hours,
Extract Minutes: get the Remainder (MOD) of HHMM by 100 to get the minutes,
Create a new time variable using HMS(Hour,Minute,Second),
Create a new Datetime for each using DHMS(date,hour,minute,second)
Full Code:
data have;
input sleep awake date_s date_w;
informat date_s date9. date_w date9.;
format sleep z4. awake z4. date_s date9. date_w date9.;
datalines;
2300 0500 12feb2018 13feb2018
2000 0300 11feb2018 12feb2018
0530 1230 10feb2018 10feb2018
;
run;
data want;
set have;
new_sleep_time=hms(int(sleep/100),int(mod(sleep,100)),0);
new_awake_time=hms(int(awake/100),int(mod(awake,100)),0);
dt_awake=dhms(date_w,hour(new_awake_time),minute(new_awake_time),0);
dt_sleep=dhms(date_s,hour(new_sleep_time),minute(new_sleep_time),0);
diff=dt_awake-dt_sleep;
keep new_sleep_time new_awake_time dt_awake dt_sleep diff;
format new_sleep_time time8. new_awake_time time8. diff time8. dt_awake datetime21. dt_sleep datetime21.;
run;
Output:
new_sleep_time=23:00:00 new_awake_time=5:00:00 diff=6:00:00 dt_awake=13FEB2018:05:00:00 dt_sleep=12FEB2018:23:00:00
new_sleep_time=20:00:00 new_awake_time=3:00:00 diff=7:00:00 dt_awake=12FEB2018:03:00:00 dt_sleep=11FEB2018:20:00:00
new_sleep_time=5:30:00 new_awake_time=12:30:00 diff=7:00:00 dt_awake=10FEB2018:12:30:00 dt_sleep=10FEB2018:05:30:00
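When the dates are recorded, as in this answer, no roll-over presumption is needed. A minimal Python sketch of the same DHMS-style combination (the helper name `to_datetime` is made up for illustration):

```python
from datetime import date, datetime, time, timedelta


def to_datetime(day: date, hhmm: int) -> datetime:
    # Combine a calendar date with an hhmm integer, like DHMS(date, hour, minute, 0).
    return datetime.combine(day, time(hour=hhmm // 100, minute=hhmm % 100))


# First row of the sample data: asleep 2300 on 12feb2018, awake 0500 on 13feb2018
asleep = to_datetime(date(2018, 2, 12), 2300)
awake = to_datetime(date(2018, 2, 13), 500)
print(awake - asleep)  # 6:00:00
```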
I have a data frame that looks like this:
date timestamp transfer ID IP Address Username Encryption File Bytes Speed DateTimeStamp
1 20160525 08:22:06.838 F798256B 10.199.194.38:57708 wei2dt - "" 264 "1.62 seconds (1.30 kilobits/sec)" 20160525 08:22:06.838
2 20160525 08:28:26.920 F798256C 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.tmp" 69 "0.29 seconds (1.93 kilobits/sec)" 20160525 08:28:26.920
3 20160525 08:28:26.923 F798256D 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.met" 0 "Unable to stat isi_audit_log.dmp-sv.met: No such file or directory" 20160525 08:28:26.923
4 20160525 08:28:26.933 F798256E 10.19.105.15:57708 wei2dt - "CG0009 1364_GT_report.txt" 34 "0.01 seconds (34.0 kilobits/sec)" 20160525 08:28:26.933
I want to count the number of users (usernames) that were online at a certain time. Essentially, I want to check every five minutes or so how many users were active. I need to use the DateTimestamp column to create my intervals and utilize it as a condition to count the number of distinct users at that period of time. I've tried using a while loop to do something of the sort, but it did not work. Are there any suggestions on how I should go about this?
With dplyr
df %>% mutate(timeInt=cut(DateTimeStamp,breaks="5 min")) %>%
group_by(timeInt) %>% summarise(numberUniqueUsers=length(unique(Username)))
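The bucketing idea can also be sketched in plain Python, which may help when checking the counts by hand. Assumptions to note: the function name `users_per_interval` and the user `other_user` are invented for this example, `minutes` is assumed to divide 60 evenly, and buckets are aligned to clock boundaries (08:20, 08:25, ...), whereas `cut()` starts its breaks from the earliest timestamp.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def users_per_interval(rows: Iterable[Tuple[str, str]], minutes: int = 5) -> Dict[datetime, int]:
    seen: Dict[datetime, Set[str]] = defaultdict(set)
    for stamp, user in rows:
        ts = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
        # Floor the timestamp to the start of its interval.
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        seen[bucket].add(user)
    # Count distinct users per interval, ordered by interval start.
    return {bucket: len(users) for bucket, users in sorted(seen.items())}


rows = [
    ("20160525 08:22:06.838", "wei2dt"),
    ("20160525 08:28:26.920", "wei2dt"),
    ("20160525 08:28:26.923", "wei2dt"),
    ("20160525 08:28:26.933", "other_user"),  # hypothetical second user
]
for bucket, n in users_per_interval(rows).items():
    print(bucket, n)
```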
I'm trying to get the days, hours and minutes of 9000000 milliseconds, but moment.js is returning 0 days. I'm using Format plugin for the Moment Duration object. https://github.com/jsmreese/moment-duration-format
moment.duration(9000000, "milliseconds").format("dd:hh:mm");
returns "02:30"
How did I get 9000000?
var ms = moment.duration({
days: 1,
hours: 2,
minutes: 30,
})
console.log(ms._milliseconds);
// 9000000
Sounds like HumanizeDuration.js is what you are looking for:
humanizeDuration(97320000) // '1 day, 3 hours, 2 minutes'
Here is the github link:
https://github.com/EvanHahn/HumanizeDuration.js
One day is 1000 x 60 x 60 x 24 = 86'400'000 milliseconds.
Of course 9 million milliseconds is 0 days.
9'000'000 / (1000 x 60 x 60) = 2.5h = 2 hours 30 min
I hope I know how to use a calculator.
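The arithmetic can be double-checked with a few lines of Python (the function name `split_milliseconds` is made up for illustration). It also shows that 1 day, 2 hours and 30 minutes is 95'400'000 ms, not 9'000'000. As far as I can tell, `_milliseconds` is an internal field that holds only the sub-day part of a Moment duration, which would explain the number in the question; `asMilliseconds()` is the documented way to get the full length. Treat that as an assumption to verify against the Moment docs.

```python
from typing import Tuple

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE
MS_PER_DAY = 24 * MS_PER_HOUR  # 86,400,000


def split_milliseconds(ms: int) -> Tuple[int, int, int]:
    # Break a millisecond count into whole days, hours and minutes.
    days, rest = divmod(ms, MS_PER_DAY)
    hours, rest = divmod(rest, MS_PER_HOUR)
    minutes = rest // MS_PER_MINUTE
    return days, hours, minutes


print(split_milliseconds(9_000_000))   # (0, 2, 30)
print(split_milliseconds(95_400_000))  # (1, 2, 30)
```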