Sorry, the following question is probably easy to answer, but I'm just stuck.
I have been querying a value at 5-minute intervals for a year, which I now want to export from Graphite to a CSV file.
http://192.168.1.1/graphite/render/?target=summarize(icinga2.XXXMON1.services.www_XXX_de_-_Besucher_online.check_user_xxx.perfdata.Besucher.value, '5min', 'total')&from=20200101&format=csv
The output looks like this:
2021-01-19 15:00:00,7961.666666666667
2021-01-19 15:05:00,
2021-01-19 15:10:00,
2021-01-19 15:15:00,7635.0
2021-01-19 15:20:00,
2021-01-19 15:25:00,
2021-01-19 15:30:00,7447.333333333333
2021-01-19 15:35:00,
2021-01-19 15:40:00,
2021-01-19 15:45:00,7446.0
2021-01-19 15:50:00,
2021-01-19 15:55:00,
2021-01-19 16:00:00,
What function can I use to get the values in 5-minute increments? If I limit the query range to one day, I get the desired data:
2021-01-19 15:00:00,8032.0
2021-01-19 15:05:00,8032.0
2021-01-19 15:10:00,7821.0
2021-01-19 15:15:00,7821.0
2021-01-19 15:20:00,7542.0
2021-01-19 15:25:00,7542.0
2021-01-19 15:30:00,7448.0
2021-01-19 15:35:00,7448.0
2021-01-19 15:40:00,7446.0
2021-01-19 15:45:00,7446.0
2021-01-19 15:50:00,7581.0
2021-01-19 15:55:00,7581.0
Thank you very much for the help!
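A note on the gaps (this is an assumption, not something visible from the query alone): values appearing only every 15 minutes suggest the one-year range is being served from a coarser Whisper archive, since retention in storage-schemas.conf commonly rolls 5-minute points up to, e.g., 15-minute points after some weeks. If so, summarizing at the archive's native step returns a value on every row. The '15min' interval below is illustrative and should be matched to your actual retention:

```
http://192.168.1.1/graphite/render/?target=summarize(icinga2.XXXMON1.services.www_XXX_de_-_Besucher_online.check_user_xxx.perfdata.Besucher.value, '15min', 'avg')&from=20200101&format=csv
```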
I am working on a data frame that has a column using the "mm/dd/yy HH:MM" date-time format for some observations and the "yyyy/mm/dd HH:MM:SS" format for others. This inconsistency, of course, results in errors or NA returns in my code. How can I unify the whole column so my calculations are not interrupted?
Update step by step:
We have a data frame with four columns, each of character type (you can check this with str(df)).
To change all four columns from character to datetime we use mutate(across(1:4, ...)).
What we want is that in each of columns 1:4 the character type is converted to datetime.
This can be done with the parse_date_time() function from the lubridate package.
Here we use ~ to indicate an anonymous function.
The . refers to each of columns 1-4.
Most important is the argument c("ymd_HMS", "mdy_HM"), which gives the order of the different date formats to try!
We could use parse_date_time() from the lubridate package. Important is the argument c("ymd_HMS", "mdy_HM"); here you define which formats occur in the data:
Note to use HMS (uppercase), because:
hms, hm and ms usage is defunct, please use HMS, HM or MS instead. Deprecated in version '1.5.6'.
library(dplyr)
library(lubridate)
df %>%
mutate(across(1:4, ~parse_date_time(., c("ymd_HMS", "mdy_HM"))))
started_at ended_at started_at_1 ended_at_1
<dttm> <dttm> <dttm> <dttm>
1 2021-10-29 17:42:36 2021-10-29 18:00:23 2021-06-13 11:40:00 2021-06-13 12:02:00
2 2021-10-01 15:06:10 2021-10-01 15:09:23 2021-06-27 16:26:00 2021-06-27 16:39:00
3 2021-10-28 23:02:53 2021-10-28 23:07:11 2021-06-10 20:06:00 2021-06-10 20:28:00
4 2021-10-17 00:58:17 2021-10-17 01:02:08 2021-06-11 15:54:00 2021-06-11 16:11:00
5 2021-10-27 18:29:34 2021-10-27 18:34:48 2021-06-05 14:09:00 2021-06-05 14:42:00
6 2021-10-17 13:30:21 2021-10-17 13:35:26 2021-06-05 14:14:00 2021-06-05 14:37:00
7 2021-10-04 19:59:28 2021-10-04 21:06:24 2021-06-16 19:05:00 2021-06-16 19:16:00
8 2021-10-10 00:27:09 2021-10-10 00:39:58 2021-06-23 20:29:00 2021-06-23 20:43:00
data:
structure(list(started_at = c("2021-10-29 17:42:36", "2021-10-01 15:06:10",
"2021-10-28 23:02:53", "2021-10-17 00:58:17", "2021-10-27 18:29:34",
"2021-10-17 13:30:21", "2021-10-04 19:59:28", "2021-10-10 00:27:09"
), ended_at = c("2021-10-29 18:00:23", "2021-10-01 15:09:23",
"2021-10-28 23:07:11", "2021-10-17 01:02:08", "2021-10-27 18:34:48",
"2021-10-17 13:35:26", "2021-10-04 21:06:24", "2021-10-10 00:39:58"
), started_at_1 = c("6/13/21 11:40", "6/27/21 16:26", "6/10/21 20:06",
"6/11/21 15:54", "6/5/21 14:09", "6/5/21 14:14", "6/16/21 19:05",
"6/23/21 20:29"), ended_at_1 = c("6/13/21 12:02", "6/27/21 16:39",
"6/10/21 20:28", "6/11/21 16:11", "6/5/21 14:42", "6/5/21 14:37",
"6/16/21 19:16", "6/23/21 20:43")), class = "data.frame", row.names = c(NA,
-8L))
I have a data.frame with patient.time.in and patient.time.out, which indicate when patients are due to see a clinician and how long they will take. There are 3 clinicians available: c("Ian", "Dan", "Anita")
patient.time.in <-
c("09:00:00","09:03:00",
"09:30:00","09:38:00",
"10:00:00","10:30:00",
"11:00:00","11:05:00",
"12:00:00","12:30:00",
"14:30:00","15:30:00")
patient.date.in <- "2022/03/29"
appointment.length <- c(runif(n=NROW(patient.time.in),min=10,max=90))
patient.infection <- sample(c("C","P","NA"),replace = TRUE,prob = c(1/2,1/3,1-1/2-1/3),size = NROW(patient.time.in))# c("P","P","NA","P","C","NA","C","P","NA","NA","C","P")
patient.roster <- data.frame(
ID=1:12,
patient.time.in=lubridate::ymd_hms(paste(patient.date.in,patient.time.in)),
patient.time.out=lubridate::ymd_hms(paste(patient.date.in,patient.time.in))+lubridate::minutes(round(appointment.length)),
patient.infection=patient.infection,
seen.yet=rep("No"),
binary.seen.yet=0,
room=0)
How can I allocate the clinicians based on whether they are free?
So far I have:
patient.roster %>%
mutate(clinician = case_when(patient.time.in > lag(patient.time.out,1)~ "Ian",
TRUE~"Dan")) %>%
mutate(clinician = case_when(row_number()!=1 & clinician=="Ian" & patient.time.in > lag(patient.time.out,1)~ "Dan",
TRUE~"Anita")) %>%
select(patient.time.in, patient.time.out, clinician)
Expected output (coincidentally, the assignment is Ian, Dan, Anita repeated):
patient.time.in patient.time.out clinician
<dttm> <dttm> <chr>
1 2022-03-29 09:00:00 2022-03-29 10:02:00 Ian
2 2022-03-29 09:03:00 2022-03-29 10:21:00 Dan
3 2022-03-29 09:30:00 2022-03-29 10:53:00 Anita
4 2022-03-29 09:38:00 2022-03-29 10:45:00 Ian
5 2022-03-29 10:00:00 2022-03-29 11:06:00 Dan
6 2022-03-29 10:30:00 2022-03-29 11:34:00 Anita
7 2022-03-29 11:00:00 2022-03-29 12:27:00 Ian
8 2022-03-29 11:05:00 2022-03-29 12:21:00 Dan
9 2022-03-29 12:00:00 2022-03-29 12:18:00 Anita
10 2022-03-29 12:30:00 2022-03-29 13:28:00 Ian
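One generic greedy rule for this kind of allocation (a sketch for illustration, not the asker's exact expected output, whose in/out times differ from the sample data): each patient goes to the first clinician in roster order who is already free; if nobody is free, to whoever becomes free soonest. In Python, with times as minutes since midnight:

```python
def allocate(patients, clinicians):
    """patients: list of (time_in, time_out) in minutes since midnight,
    sorted by time_in. Returns one clinician name per patient."""
    free_at = {c: 0 for c in clinicians}
    plan = []
    for t_in, t_out in patients:
        # prefer a clinician who is already free (roster order breaks ties);
        # otherwise take whoever becomes free soonest
        c = min(clinicians, key=lambda name: (free_at[name] > t_in, free_at[name]))
        free_at[c] = t_out
        plan.append(c)
    return plan
```

The same logic translates to R with a loop over rows, since each assignment depends on all earlier ones, which is why a pure lag()-based case_when() struggles here.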
I have a dataset where new data are recorded at a fixed interval (3-4 minutes). Every 8 records (rows) form one set of data (CC_01->04 and DC_01->04) that I want to stamp with the previous half-hour.
For this I use the floor date function of lubridate that works perfectly:
lubridate::floor_date(data$Date_IV, "30 minutes")
However, sometimes the eighth record starts after the beginning of the next half-hour, so floor_date() stamps it with this new half-hour. But I would like it to be stamped with the previous one (as part of the same set).
Therefore I'm looking for a way to check when this eighth value differs from the previous 7, and correct it if needed.
An example:
Label Date_IV Obs. Exp_Flux Floor_date
1 CC_01 2021-07-08 12:38:00 1 -0.290000 2021-07-08 12:30:00
2 DC_01 2021-07-08 12:42:00 2 3.830000 2021-07-08 12:30:00
3 CC_02 2021-07-08 12:45:00 3 -0.527937 2021-07-08 12:30:00
4 DC_02 2021-07-08 12:49:00 4 2.260000 2021-07-08 12:30:00
5 CC_03 2021-07-08 12:52:00 5 -0.743471 2021-07-08 12:30:00
6 DC_03 2021-07-08 12:55:00 6 2.230000 2021-07-08 12:30:00
7 CC_04 2021-07-08 12:59:00 7 -1.510000 2021-07-08 12:30:00
8 DC_04 2021-07-08 13:02:00 8 1.820000 2021-07-08 13:00:00
9 CC_01 2021-07-08 13:05:00 9 -0.190000 2021-07-08 13:00:00
10 DC_01 2021-07-08 13:08:00 10 3.750000 2021-07-08 13:00:00
11 CC_02 2021-07-08 13:11:00 11 -0.423572 2021-07-08 13:00:00
12 DC_02 2021-07-08 13:14:00 12 2.230000 2021-07-08 13:00:00
13 CC_03 2021-07-08 13:18:00 13 -0.635882 2021-07-08 13:00:00
14 DC_03 2021-07-08 13:22:00 14 2.670000 2021-07-08 13:00:00
15 CC_04 2021-07-08 13:25:00 15 -1.440000 2021-07-08 13:00:00
16 DC_04 2021-07-08 13:29:00 16 1.860000 2021-07-08 13:00:00
In my example, the first 8 lines should be stamped with 12:30:00. The function works for the first 7, but the eighth is stamped with 13:00:00 because the record was made at 13:02:00.
This situation doesn't arise for the second measurement set (lines 9->16): the last measurement started before the next half-hour, so all eight are stamped with 13:00:00, which is correct. Nothing to correct there.
These measurements are repeated many times, so I cannot fix them by hand.
I hope this makes sense.
Thanks in advance for your help,
Adrien
You can create a group for every 8 rows, or start a new group every time CC_01 occurs (whichever is more appropriate for your data), and take the floor_date() of the first value in each group.
library(dplyr)
library(lubridate)
data %>%
group_by(grp = ceiling(Obs/8)) %>%
#Or increment the group value at every occurrence of CC_01
#group_by(grp = cumsum(Label == 'CC_01')) %>%
mutate(Floor_date = floor_date(first(Date_IV), '30 minutes')) %>%
ungroup
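The grouping idea itself is language-agnostic. As a quick illustration (sketched in Python, with the question's timestamps), every block of 8 readings takes the half-hour floor of its first timestamp:

```python
from datetime import datetime, timedelta

def floor_30(dt):
    """Round a datetime down to the previous half-hour."""
    return dt - timedelta(minutes=dt.minute % 30,
                          seconds=dt.second,
                          microseconds=dt.microsecond)

def stamp(times):
    """Stamp every block of 8 readings with the half-hour floor
    of the block's FIRST timestamp."""
    out = []
    for i in range(0, len(times), 8):
        block = times[i:i + 8]
        out += [floor_30(block[0])] * len(block)
    return out
```

Because only the first timestamp of each block is floored, an eighth reading that slips past the half-hour boundary (13:02 in the example) still inherits the block's 12:30 stamp.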
Is there a way to calculate a "task time" for working hours only? Working hours are 8 to 5, Monday through Friday. Using datediff() gives the raw elapsed time, not the expected business-hours result.
Sample task times:
df %>%
select(v_v_initiated,v_v_complete)
v_v_initiated v_v_complete
1 2020-04-23 14:13:52.0000000 2020-04-23 16:04:28.0000000
2 2020-11-10 11:48:53.0000000 2020-11-10 13:12:31.0000000
3 2020-10-20 16:03:39.0000000 2020-10-20 16:25:16.0000000
4 2020-04-02 13:43:54.0000000 2020-04-02 14:14:45.0000000
5 2020-07-09 08:52:54.0000000 2020-07-23 09:18:29.0000000
6 2020-06-09 14:56:33.0000000 2020-06-10 07:44:17.0000000
7 2020-09-17 15:11:39.0000000 2020-09-17 15:13:41.0000000
8 2020-10-28 14:08:20.0000000 2020-10-28 14:07:35.0000000
9 2020-04-21 12:55:36.0000000 2020-04-27 12:56:17.0000000
10 2020-11-06 11:02:03.0000000 2020-11-06 11:02:30.0000000
11 2020-02-17 12:29:21.0000000 2020-02-18 12:52:23.0000000
12 2020-08-25 15:25:46.0000000 2020-08-26 10:18:26.0000000
13 2020-02-19 15:05:28.0000000 2020-02-20 09:43:48.0000000
14 2020-09-23 21:19:41.0000000 2020-09-24 14:52:21.0000000
15 2020-07-01 14:20:11.0000000 2020-07-01 14:20:59.0000000
16 2020-05-01 15:22:58.0000000 2020-05-01 16:32:35.0000000
17 2020-06-29 13:10:58.0000000 2020-06-30 13:53:29.0000000
18 2020-06-16 12:56:54.0000000 2020-06-16 14:27:15.0000000
19 2020-03-27 11:02:29.0000000 2020-03-30 15:18:51.0000000
20 2020-04-08 07:38:01.0000000 2020-04-08 07:52:35.0000000
21 2020-07-30 09:32:42.0000000 2020-07-30 10:32:28.0000000
22 2020-06-17 14:03:31.0000000 2020-07-10 15:38:03.0000000
23 2020-04-24 10:41:27.0000000 2020-04-29 13:07:05.0000000
24 2020-08-26 10:41:10.0000000 2020-08-26 12:55:23.0000000
25 2020-10-26 18:11:16.0000000 2020-10-27 16:10:39.0000000
26 2020-01-08 11:12:49.0000000 2020-01-09 09:18:37.0000000
27 2020-04-17 11:40:10.0000000 2020-04-17 15:51:21.0000000
28 2020-02-11 10:38:21.0000000 2020-02-11 10:33:54.0000000
29 2020-03-23 12:10:21.0000000 2020-03-23 12:33:06.0000000
30 2020-06-02 12:44:00.0000000 2020-06-03 08:28:05.0000000
31 2020-04-13 09:30:31.0000000 2020-04-13 13:16:55.0000000
32 2020-04-07 17:36:02.0000000 2020-04-07 17:36:44.0000000
33 2020-01-15 12:24:42.0000000 2020-01-15 12:25:00.0000000
34 2020-08-18 08:55:58.0000000 2020-08-18 09:02:34.0000000
35 2020-07-06 14:10:23.0000000 2020-07-07 10:28:05.0000000
36 2020-03-25 15:03:20.0000000 2020-03-31 14:17:43.0000000
37 2020-01-29 12:58:33.0000000 2020-02-14 09:53:06.0000000
38 2020-02-07 15:11:21.0000000 2020-02-10 09:13:53.0000000
39 2020-07-27 17:51:13.0000000 2020-07-29 11:52:51.0000000
40 2020-09-02 11:43:02.0000000 2020-09-02 13:10:46.0000000
41 2020-07-22 11:04:50.0000000 2020-07-22 11:12:34.0000000
42 2020-06-29 13:57:17.0000000 2020-06-30 07:34:55.0000000
43 2020-07-21 10:46:58.0000000 2020-07-21 16:15:59.0000000
44 2020-05-27 07:38:46.0000000 2020-05-27 07:51:24.0000000
45 2020-07-14 10:33:49.0000000 2020-07-14 11:38:28.0000000
46 2020-06-04 16:59:09.0000000 2020-06-09 10:49:20.0000000
You could adapt another function that calculates business hours for a time interval (such as this one).
First, create a sequence of dates from start to end, and filter it to include only weekdays.
Next, create time intervals using the business hours of interest (in this case, "08:00" to "17:00").
Determine how much of each day business hours overlap with your times. This way, if a time starts at "09:05", that time will be used for the start of the day, and not "08:00".
Finally, sum up the time intervals and determine the number of business days (assuming a 9-hour day), plus the remainder hours and minutes.
If you want to apply this function to rows in a data frame, you could use mapply as in:
df$business_hours <- mapply(calc_bus_hours, df$start_date, df$end_date)
Hope this is helpful.
library(lubridate)
library(dplyr)
calc_bus_hours <- function(start, end) {
my_dates <- seq.Date(as.Date(start), as.Date(end), by = "day")
my_dates <- my_dates[!weekdays(my_dates) %in% c("Saturday", "Sunday")]
my_intervals <- interval(ymd_hm(paste(my_dates, "08:00"), tz = "UTC"), ymd_hm(paste(my_dates, "17:00"), tz = "UTC"))
int_start(my_intervals[1]) <- pmax(pmin(start, int_end(my_intervals[1])), int_start(my_intervals[1]))
int_end(my_intervals[length(my_intervals)]) <- pmax(pmin(end, int_end(my_intervals[length(my_intervals)])),
int_start(my_intervals[length(my_intervals)]))
total_time <- sum(time_length(my_intervals, "minutes"))
total_days <- total_time %/% (9 * 60)
total_hours <- total_time %% (9 * 60) %/% 60
total_minutes <- total_time - (total_days * 9 * 60) - (total_hours * 60)
paste(total_days, "days,", total_hours, "hours,", total_minutes, "minutes")
}
calc_bus_hours(as.POSIXct("11/4/2020 9:05", format = "%m/%d/%Y %H:%M", tz = "UTC"),
as.POSIXct("11/9/2020 11:25", format = "%m/%d/%Y %H:%M", tz = "UTC"))
[1] "3 days, 2 hours, 20 minutes"
Edit: As mentioned by @DPH, this is more complex with holidays and partial holidays.
You could create a data frame of holidays and indicate times open, allowing for partial holidays (e.g., Christmas Eve from 8:00 AM to Noon).
Here is a modified function that should give comparable results.
library(lubridate)
library(dplyr)
holiday_df <- data.frame(
date = as.Date(c("2020-12-24", "2020-12-25", "2020-12-31", "2020-01-01")),
start = c("08:00", "08:00", "08:00", "08:00"),
end = c("12:00", "08:00", "08:00", "08:00")
)
calc_bus_hours <- function(start, end) {
my_dates <- seq.Date(as.Date(start), as.Date(end), by = "day")
my_dates_df <- data.frame(
date = my_dates[!weekdays(my_dates) %in% c("Saturday", "Sunday")],
start = "08:00",
end = "17:00"
)
all_dates <- union_all(
inner_join(my_dates_df["date"], holiday_df),
anti_join(my_dates_df, holiday_df["date"])
) %>%
arrange(date)
my_intervals <- interval(ymd_hm(paste(all_dates$date, all_dates$start), tz = "UTC"),
ymd_hm(paste(all_dates$date, all_dates$end), tz = "UTC"))
int_start(my_intervals[1]) <- pmax(pmin(start, int_end(my_intervals[1])), int_start(my_intervals[1]))
int_end(my_intervals[length(my_intervals)]) <- pmax(pmin(end, int_end(my_intervals[length(my_intervals)])),
int_start(my_intervals[length(my_intervals)]))
total_time <- sum(time_length(my_intervals, "minutes"))
total_days <- total_time %/% (9 * 60)
total_hours <- total_time %% (9 * 60) %/% 60
total_minutes <- total_time - (total_days * 9 * 60) - (total_hours * 60)
paste(total_days, "days,", total_hours, "hours,", total_minutes, "minutes")
}
I have a data frame containing sleep data in several increments, with a column for the start and a column for the end of each sleep period.
For some rows, the starting time is on the previous day and the end time is on the next day.
What I would like to do is to separate such rows into two rows, where the first row contains the starting time till 23:59:59, and the second row 00:00:00 till the end time.
For example:
# A tibble: 6 x 3
sleepdatestarttime sleepdateendtime sleepstage
<dttm> <dttm> <chr>
1 2018-03-02 23:31:00 2018-03-02 23:54:00 rem
2 2018-03-02 23:54:00 2018-03-02 23:55:00 light
3 2018-03-02 23:55:00 2018-03-03 00:02:00 wake
4 2018-03-03 00:02:00 2018-03-03 00:03:30 light
5 2018-03-03 00:03:30 2018-03-03 00:23:30 deep
6 2018-03-03 00:23:30 2018-03-03 02:58:00 light
and the desired output is:
# A tibble: 6 x 3
sleepdatestarttime sleepdateendtime sleepstage
<dttm> <dttm> <chr>
1 2018-03-02 23:31:00 2018-03-02 23:54:00 rem
2 2018-03-02 23:54:00 2018-03-02 23:55:00 light
3 2018-03-02 23:55:00 2018-03-02 23:59:59 wake
4 2018-03-03 00:00:00 2018-03-03 00:01:59 wake
5 2018-03-03 00:02:00 2018-03-03 00:03:30 light
6 2018-03-03 00:03:30 2018-03-03 00:23:30 deep
7 2018-03-03 00:23:30 2018-03-03 02:58:00 light
A dplyr solution would be very helpful.
Here is a possible solution using just base R rather than dplyr. I converted all times to UTC to avoid issues with time-zone conversions. (See a related answer: change time zone in R without it returning to original time zone.)
Note this solution re-sorts the entire data frame by sleepdatestarttime, so if there are multiple people on the same day, the order() call on the last line needs modification.
df<-read.table(header=TRUE, text="sleepdatestarttime sleepdateendtime sleepstage
'2018-03-02 23:31:00' '2018-03-02 23:54:00' rem
'2018-03-02 23:54:00' '2018-03-02 23:55:00' light
'2018-03-02 23:55:00' '2018-03-03 00:02:00' wake
'2018-03-03 00:02:00' '2018-03-03 00:03:30' light
'2018-03-03 00:03:30' '2018-03-03 00:23:30' deep
'2018-03-03 00:23:30' '2018-03-03 02:58:00' light")
df$sleepdatestarttime<-as.POSIXct(as.character(df$sleepdatestarttime), tz="UTC")
df$sleepdateendtime<-as.POSIXct(as.character(df$sleepdateendtime), tz="UTC")
#find rows across days
rows<-which(as.Date(df$sleepdatestarttime) !=as.Date(df$sleepdateendtime))
#create the new rows
nstart<-data.frame(sleepdatestarttime= df$sleepdatestarttime[rows],
sleepdateendtime= as.POSIXct(paste(as.Date(df$sleepdatestarttime[rows]), "23:59:59"), tz="UTC"),
sleepstage=df$sleepstage[rows])
nend<-data.frame(sleepdatestarttime= as.POSIXct(paste(as.Date(df$sleepdateendtime[rows]), "00:00:00"), tz="UTC"),
sleepdateendtime= df$sleepdateendtime[rows],
sleepstage=df$sleepstage[rows])
#substitute in the new start rows
df[rows,]<-nstart
#tack on the new ending rows
df<-rbind(df, nend)
#resort the dataframe
df<-df[order(df$sleepdatestarttime ),]
This is a common problem in genomics. The IRanges package on Bioconductor has the findOverlaps() function for this purpose; foverlaps() is its data.table counterpart, which is used here. AFAIK, there is no dplyr equivalent available.
First we need to create a vector of day start and end times. The call to foverlaps() returns every overlapping (row, day) pair. Finally, the start and end times are clipped to the day boundaries to match the expected result.
library(data.table)
library(lubridate)
day_seq <- setDT(df)[, .(day_start = seq(
floor_date(min(sleepdatestarttime), "day"),
ceiling_date(max(sleepdateendtime), "day"), "day"))][
, day_end := day_start + days(1)]
setkey(day_seq, day_start, day_end)
foverlaps(
df, day_seq, by.x = c("sleepdatestarttime", "sleepdateendtime"), nomatch = 0L)[
, `:=`(sleepdatestarttime = pmax(sleepdatestarttime, day_start),
sleepdateendtime = pmin(sleepdateendtime, day_end - seconds(1)))][
, c("day_start", "day_end") := NULL][]
i sleepdatestarttime sleepdateendtime sleepstage
1: 1 2018-03-02 23:31:00 2018-03-02 23:54:00 rem
2: 2 2018-03-02 23:54:00 2018-03-02 23:55:00 light
3: 3 2018-03-02 23:55:00 2018-03-02 23:59:59 wake
4: 3 2018-03-03 00:00:00 2018-03-03 00:02:00 wake
5: 4 2018-03-03 00:02:00 2018-03-03 00:03:30 light
6: 5 2018-03-03 00:03:30 2018-03-03 00:23:30 deep
7: 6 2018-03-03 00:23:30 2018-03-03 02:58:00 light
Data
df <- readr::read_table("i sleepdatestarttime sleepdateendtime sleepstage
1 2018-03-02 23:31:00 2018-03-02 23:54:00 rem
2 2018-03-02 23:54:00 2018-03-02 23:55:00 light
3 2018-03-02 23:55:00 2018-03-03 00:02:00 wake
4 2018-03-03 00:02:00 2018-03-03 00:03:30 light
5 2018-03-03 00:03:30 2018-03-03 00:23:30 deep
6 2018-03-03 00:23:30 2018-03-03 02:58:00 light")
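Stripped of the R specifics, the splitting rule both answers implement can be sketched (in Python, for illustration) as: an interval that crosses midnight becomes two intervals, the first ending at 23:59:59 and the second starting at 00:00:00.

```python
from datetime import datetime

def split_at_midnight(start, end):
    """Split an interval that crosses one midnight into two intervals.
    Intervals contained within a single day are returned unchanged."""
    if start.date() == end.date():
        return [(start, end)]
    # last second of the start day: 23:59:59
    last_second = datetime.combine(start.date(), datetime.max.time()).replace(microsecond=0)
    # first instant of the end day: 00:00:00
    next_midnight = datetime.combine(end.date(), datetime.min.time())
    return [(start, last_second), (next_midnight, end)]
```

Spans covering several full days would need a loop over the intermediate dates; the question's data only ever crosses a single midnight, which is all this sketch handles.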