I have a df attached and I would like to create a loop that would apply a specific sequence (set by the user in R) based on conditions in column "x9". I would like to be able to set the sequence myself so I can try different sequences for this data frame, I will explain more below.
I have a df of losses and wins for an algorithm. On the first instance of a win I want to take the value in "x9" and divide it by the sequence value. I want to keep iterating through the sequence values until a loss is achieved. Once a loss is achieved the sequence will restart, when "x9" <0 to be specific.
I would like to create the two columns in my example "Risk Control" and "Sequence". Ideally I would like the function to iterate through the entire data frame so I can compare the column "x9" to "Risk Control".
Sample Data:
structure(list(x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), x2 = c("2016.01.04 01:05",
"2016.01.04 01:12", "2016.01.04 01:13", "2016.01.04 01:17", "2016.01.04 01:20",
"2016.01.04 01:23", "2016.01.04 01:25", "2016.01.04 01:30", "2016.01.04 01:31",
"2016.01.04 01:59"), x3 = c("buy", "close", "buy", "close", "buy",
"close", "buy", "t/p", "buy", "close"), x4 = c(1, 1, 2, 2, 3,
3, 4, 4, 5, 5), x5 = c(8.46, 8.46, 8.6, 8.6, 8.69, 8.69, 8.83,
8.83, 9, 9), x6 = c(1.58873, 1.58955, 1.5887, 1.58924, 1.58862,
1.58946, 1.58802, 1.58902, 1.58822, 1.58899), x7 = c(1.57873,
1.57873, 1.5787, 1.5787, 1.57862, 1.57862, 1.57802, 1.57802,
1.57822, 1.57822), x8 = c(1.58973, 1.58973, 1.5897, 1.5897, 1.58962,
1.58962, 1.58902, 1.58902, 1.58922, 1.58922), x9 = c(0, 478.69,
0, 320.45, 0, 503.7, 0, 609.3, 0, 478.19), x10 = c(30000, 30478.69,
30478.69, 30799.14, 30799.14, 31302.84, 31302.84, 31912.14, 31912.14,
32390.33), `Risk Control` = c(NA, 478.69, NA, 320.45, NA, 251.85,
NA, 304.65, NA, 159.3966667), ...12 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Sequence = c(NA, 1, NA, 1, NA, 2, NA, 2, NA,
3)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
), spec = structure(list(cols = list(x1 = structure(list(), class = c("collector_double",
"collector")), x2 = structure(list(), class = c("collector_character",
"collector")), x3 = structure(list(), class = c("collector_character",
"collector")), x4 = structure(list(), class = c("collector_double",
"collector")), x5 = structure(list(), class = c("collector_double",
"collector")), x6 = structure(list(), class = c("collector_double",
"collector")), x7 = structure(list(), class = c("collector_double",
"collector")), x8 = structure(list(), class = c("collector_double",
"collector")), x9 = structure(list(), class = c("collector_double",
"collector")), x10 = structure(list(), class = c("collector_double",
"collector")), `Risk Control` = structure(list(), class = c("collector_double",
"collector")), ...12 = structure(list(), class = c("collector_logical",
"collector")), Sequence = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"))
In short I need assistance in:
1.Constructing a sequence to apply to my df, would like to be able to alter this sequence to try different sequences;
2.Take values in "x9" and create a new column that would apply the sequence values set. The sequence is taking the value in "x9" and dividing it by the sequence number
3.Construct a loop to iterate through the entire df to apply this over all of the values of the dataframe.
In the example above I have manually created "Risk Control" and the sample "Sequence". The sequence in the example is 1,1,2,2,3,3,4. The sequence in the sample uses each number twice before iterating to the next number. Once a loss is achieved in "x9" the sequence restarts.
I would appreciate any help with this function and loop. Thank you
Starting with input data only (not desired columns)
df1 <- df %>% select(1:10)
Reducing this data to only data with x9 not zero
This may not be intended and the user may prefer to key off an x3 event, but hopefully is illustrative.
df1 <- df1 %>% filter(x9 != 0)
Initiate seq column and insert dummy data.
df1$seq <- c(1, NA, 1, NA, NA)
Fill in, thanks to Allan Cameron for this answer to my post link
df1$seq <- unlist(sapply(diff(c(which(!is.na(df1$seq)), nrow(df1) + 1)), seq))
Apply user's rule 2:
df1$risk_control <- df1$x9 / df1$seq
# A tibble: 5 x 12
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 seq risk_control
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 2 2016.01.04 0~ close 1 8.46 1.59 1.58 1.59 479. 30479. 1 479.
2 4 2016.01.04 0~ close 2 8.6 1.59 1.58 1.59 320. 30799. 2 160.
3 6 2016.01.04 0~ close 3 8.69 1.59 1.58 1.59 504. 31303. 1 504.
4 8 2016.01.04 0~ t/p 4 8.83 1.59 1.58 1.59 609. 31912. 2 305.
5 10 2016.01.04 0~ close 5 9 1.59 1.58 1.59 478. 32390. 3 159.
Recombining this with the original data can be performed if desired with:
df2 <- dplyr::left_join(df[, -c(11:13)], df1)
# A tibble: 10 x 12
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 seq risk_control
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 1 2016.01.04 ~ buy 1 8.46 1.59 1.58 1.59 0 30000 NA NA
2 2 2016.01.04 ~ close 1 8.46 1.59 1.58 1.59 479. 30479. 1 479.
3 3 2016.01.04 ~ buy 2 8.6 1.59 1.58 1.59 0 30479. NA NA
4 4 2016.01.04 ~ close 2 8.6 1.59 1.58 1.59 320. 30799. 2 160.
5 5 2016.01.04 ~ buy 3 8.69 1.59 1.58 1.59 0 30799. NA NA
6 6 2016.01.04 ~ close 3 8.69 1.59 1.58 1.59 504. 31303. 1 504.
7 7 2016.01.04 ~ buy 4 8.83 1.59 1.58 1.59 0 31303. NA NA
8 8 2016.01.04 ~ t/p 4 8.83 1.59 1.58 1.59 609. 31912. 2 305.
9 9 2016.01.04 ~ buy 5 9 1.59 1.58 1.59 0 31912. NA NA
10 10 2016.01.04 ~ close 5 9 1.59 1.58 1.59 478. 32390. 3 159.
Related
I want to pick up rows of which time data is between multiple intervals.
The data frame is like this:
dputs
structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), score_time = c("2022/09/01 9:00:00", "2022/09/02 18:00:00",
"2022/09/03 12:00:00", NA, NA, "2022/09/15 18:00:00", "2022/09/18 20:00:00",
NA, NA, NA), score = c(243, 232, 319, NA, NA, 436, 310, NA, NA,
NA), treatment_start = c(NA, NA, NA, "2022/09/02 8:00:00", "2022/09/03 11:00:00",
NA, NA, "2022/09/15 8:00:00", "2022/09/16 14:00:00", "2022/09/16 23:00:00"
), treatment_end = c(NA, NA, NA, "2022/09/02 22:00:00", "2022/09/09 12:00:00",
NA, NA, "2022/09/16 2:00:00", "2022/09/16 22:00:00", "2022/09/17 0:00:00"
)), row.names = c(NA, -10L), spec = structure(list(cols = list(
ID = structure(list(), class = c("collector_character", "collector"
)), score_time = structure(list(), class = c("collector_character",
"collector")), score = structure(list(), class = c("collector_double",
"collector")), treatment_start = structure(list(), class = c("collector_character",
"collector")), treatment_end = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x6000000190b0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
ID score_time score treatment_start treatment_end
<chr> <chr> <dbl> <chr> <chr>
1 A 2022/09/01 9:00:00 243 NA NA
2 A 2022/09/02 18:00:00 232 NA NA
3 A 2022/09/03 12:00:00 319 NA NA
4 A NA NA 2022/09/02 8:00:00 2022/09/02 22:00:00
5 A NA NA 2022/09/03 11:00:00 2022/09/09 12:00:00
6 B 2022/09/15 18:00:00 436 NA NA
7 B 2022/09/18 20:00:00 310 NA NA
8 B NA NA 2022/09/15 8:00:00 2022/09/16 2:00:00
9 B NA NA 2022/09/16 14:00:00 2022/09/16 22:00:00
10 B NA NA 2022/09/16 23:00:00 2022/09/17 0:00:00
Multiple score values are given for each ID with the measurement time.
And each ID has more than one information of treatment duration shown by start and end time.
My target is score values that are measured during treatment periods.
I tried with the package lubridate and tidyverse to mutate intervals but could not apply "%in%" method.
Here is my attempt until putting intervals in the same rows with score values.
data %>%
mutate(trt_interval = interval(start = treatment_start, end = treatment_end)) %>%
group_by(ID) %>%
mutate(num = row_number()) %>%
pivot_wider(names_from = num, names_prefix = "intvl", values_from = trt_interval) %>%
fill(c(intvl1:last_col()), .direction = "up")
Desired output is like this.
(The first score of A and the last score of B dismissed because their score_time are out of interval.)
ID score
<chr> <dbl>
1 A 232
2 A 319
3 B 436
I want to know the smarter way to put data in a row and how to apply "%in%" for multiple intervals.
Sorry that the question is not qualified and include multiple steps but any advices will be a great help for me.
Hi I would first create two seperate data frames. One for the scores and one for the intervalls. Then would I join them both and filter the score that are within an treatment intervall.
data_score <- data %>%
filter(!is.na(score_time)) %>%
select(-starts_with("treat")) %>%
mutate(score_time = ymd_hms(score_time))
data_score
data_interval <- data %>%
filter(is.na(score_time)) %>%
select(ID,starts_with("treat")) %>%
mutate(trt_interval = interval(start = treatment_start, end = treatment_end))
data_score %>%
inner_join(
data_interval
) %>%
filter(
lubridate::`%within%`(score_time,trt_interval )
)
Hope this helps!!
This question already has answers here:
Calculate difference between values in consecutive rows by group
(4 answers)
Closed 1 year ago.
I have the following dataframe structure
Company Name stock price Date
HP 1.2 10/05/2020
HP 1.4 11/05/2020
APPL 1.1 05/03/2020
APPL 1.2 06/03/2020
FB 5 15/08/2020
FB 5.2 16/08/2020
FB 5.3 17/08/2020
...and so on for multiple companies and their stock prices for different dates.
I wish to calculate daily returns for each stock and I am trying to figure out the code to iterate this dataframe for each company. I.e. when we are done with APPL we start again over for FB by setting the first row to N/A since we don't have returns to compare with, and so on as shown below.
Company Name stock price Date Daily Returns
HP 1.2 10/05/2020 N/A
HP 1.4 11/05/2020 0.2
APPL 1.1 05/03/2020 N/A
APPL 1.2 06/03/2020 0.1
FB 5 15/08/2020 N/A
FB 5.2 16/08/2020 0.2
FB 5.3 17/08/2020 0.1
Is there a more efficient solution to tackle this than extracting a list of unique company names and then cycling through each of them to perform this calculation?
You should use dplyr for this kind of tasks:
library(dplyr)
df %>%
arrange(Company_Name, Date) %>%
group_by(Company_Name) %>%
mutate(Daily_Returns = stock_price - lag(stock_price)) %>%
ungroup()
This returns
Company_Name stock_price Date Daily_Returns
<chr> <dbl> <chr> <dbl>
1 HP 1.2 10/05/2020 NA
2 HP 1.4 11/05/2020 0.2
3 APPL 1.1 05/03/2020 NA
4 APPL 1.2 06/03/2020 0.100
5 FB 5 15/08/2020 NA
6 FB 5.2 16/08/2020 0.200
7 FB 5.3 17/08/2020 0.100
First we order the data by Company_Name and Date
Then we group it by Company_Name, so every calculation starts over again for a new company
Then we calculate the daily returns by substracting the former day (here we use lag)
Data
structure(list(Company_Name = c("HP", "HP", "APPL", "APPL", "FB",
"FB", "FB"), stock_price = c(1.2, 1.4, 1.1, 1.2, 5, 5.2, 5.3),
Date = c("10/05/2020", "11/05/2020", "05/03/2020", "06/03/2020",
"15/08/2020", "16/08/2020", "17/08/2020")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), spec = structure(list(
cols = list(Company_Name = structure(list(), class = c("collector_character",
"collector")), stock_price = structure(list(), class = c("collector_double",
"collector")), Date = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
I apologize if this is a duplicate question, I couldn't seem to find anything quite like this.
I have some data that I am cleaning and I need to fill missing values. Data looks like this, with dput below. Decimals were removed in print, but included in dput.
> print(tbl_df(df), n=26)
# A tibble: 26 x 6
Year Trial Group1 Group2 Group3 Group4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Year1 2 346588. 156266 34806. NA
2 Year1 3 342573 NA 34652. 292001.
3 Year1 5 286285. 129257. 29645. 252786.
4 Year1 7 234410. NA 24536. NA
5 Year1 9 184733. 82944. NA 170653
6 Year1 10 NA 81419. 19461 167273.
7 Year1 11 169620. 74688. 18065 155442
8 Year1 14 107652 48381. 11941. 100076
9 Year1 15 88440 39807 10123. 83137
10 Year1 17 NA 31608 7926 64551.
11 Year1 18 63622 29236 7444. 58848.
12 Year1 22 14143. 6366. 1683. 10889.
13 Year2 22 279904 102271 28221. 138804.
14 Year2 25 200386 78628. 21942 NA
15 Year2 26 157182. NA 18099. 91963.
16 Year2 28 121122. 54538 14532. 76422
17 Year2 30 25899. 16773 489. NA
18 Year2 32 112091. 51219. 11298. 71655.
19 Year2 33 108756 49311. 10589. 70167
20 Year2 34 NA 49127. NA 69195.
21 Year2 36 104827 42651. 8568. 63580.
22 Year2 38 44849 14114 2302. 11652
23 Year2 40 104407. 42545 6240 63318.
24 Year2 41 99059. 38423 6766. 58017
25 Year2 42 NA 40432. NA 57932.
26 Year2 44 49119. 8796. 4769. 11233.
dput(df)
structure(list(Year = c("Year1", "Year1", "Year1", "Year1", "Year1",
"Year1", "Year1", "Year1", "Year1", "Year1", "Year1", "Year1",
"Year2", "Year2", "Year2", "Year2", "Year2", "Year2", "Year2",
"Year2", "Year2", "Year2", "Year2", "Year2", "Year2", "Year2"
), Trial = c(2, 3, 5, 7, 9, 10, 11, 14, 15, 17, 18, 22, 22, 25,
26, 28, 30, 32, 33, 34, 36, 38, 40, 41, 42, 44), Group1 = c(346587.6667,
342573, 286285.3333, 234409.6667, 184733.3333, NA, 169620.3333,
107652, 88440, NA, 63622, 14143.33333, 279904, 200386, 157182.3333,
121122.3333, 25899.33333, 112090.6667, 108756, NA, 104827, 44849,
104407.3333, 99058.66667, NA, 49119.33333), Group2 = c(156266,
NA, 129257.3333, NA, 82943.66667, 81419.33333, 74688.33333, 48381.33333,
39807, 31608, 29236, 6365.666667, 102271, 78628.33333, NA, 54538,
16773, 51218.66667, 49311.33333, 49127.33333, 42650.66667, 14114,
42545, 38423, 40432.33333, 8795.666667), Group3 = c(34805.66667,
34651.66667, 29644.66667, 24535.66667, NA, 19461, 18065, 11941.33333,
10123.33333, 7926, 7444.333333, 1683.333333, 28221.33333, 21942,
18099.33333, 14532.33333, 489.3333333, 11297.66667, 10588.66667,
NA, 8567.666667, 2302.333333, 6240, 6765.666667, NA, 4769.333333
), Group4 = c(NA, 292000.6667, 252785.6667, NA, 170653, 167273.3333,
155442, 100076, 83137, 64551.33333, 58847.66667, 10888.66667,
138803.6667, NA, 91963.33333, 76422, NA, 71655.33333, 70167,
69195.33333, 63579.66667, 11652, 63317.66667, 58017, 57932.33333,
11232.66667)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -26L), spec = structure(list(cols = list(
Year = structure(list(), class = c("collector_character",
"collector")), Trial = structure(list(), class = c("collector_double",
"collector")), Group1 = structure(list(), class = c("collector_double",
"collector")), Group2 = structure(list(), class = c("collector_double",
"collector")), Group3 = structure(list(), class = c("collector_double",
"collector")), Group4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Basically, I need to fill the na values with the previous trial (trials in descending order). So for example, I need to fill row 6, column 3 with the data from row 6, column 4.
But that's not all. I need to create a row for days with missing trials, and then fill those rows with the last trial as well. This is the thing I'm getting hung up on. Is there a way to accomplish both of these?
For example, I need to change tail(df) from A to B.
A.
Year Trial Group1 Group2 Group3 Group4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Year2 40 104407. 42545 6240 63318.
2 Year2 41 99059. 38423 6766. 58017
3 Year2 42 NA 40432. NA 57932.
4 Year2 44 49119. 8796. 4769. 11233.
B.
Year Trial Group1 Group2 Group3 Group4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Year2 40 104407. 42545 6240 63318.
2 Year2 41 99059. 38423 6766. 58017
3 Year2 42 49119. 40432. 4769. 57932.
4 Year2 43 49119. 40432. 4769. 57932.
5 Year2 44 49119. 8796. 4769. 11233.
You can use complete and fill with .direction = 'up'
library(dplyr)
library(tidyr)
df %>%
group_by(Year) %>%
complete(Trial = min(Trial):max(Trial)) %>%
fill(starts_with('Group'), .direction = 'up') %>%
ungroup
# A tibble: 44 x 6
# Year Trial Group1 Group2 Group3 Group4
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Year1 2 346588. 156266 34806. 292001.
# 2 Year1 3 342573 129257. 34652. 292001.
# 3 Year1 4 286285. 129257. 29645. 252786.
# 4 Year1 5 286285. 129257. 29645. 252786.
# 5 Year1 6 234410. 82944. 24536. 170653
# 6 Year1 7 234410. 82944. 24536. 170653
# 7 Year1 8 184733. 82944. 19461 170653
# 8 Year1 9 184733. 82944. 19461 170653
# 9 Year1 10 169620. 81419. 19461 167273.
#10 Year1 11 169620. 74688. 18065 155442
# … with 34 more rows
I have a dataframe containing n rows and m columns. Each row is an individual and each column is information on that individual.
df
id age income
1 18 12
2 24 24
3 36 12
4 18 24
. . .
. . .
. . .
I also have a matrix rXcshowing age buckets in each row and income buckets in each column and each element of the matrix is the % of people for each income-age bucket.
matrix age\income
12 24 36 .....
18 0.15 0.12 0.11 ....
24 0.12 0.6 0.2 ...
36 0.02 0.16 0.16 ...
. ..................
. ..................
For each individual in the dataframe, I need to find the right element of the matrix given the age and income bucket of the individual.
The desired output should look like this
df2
id age income y
1 18 12 0.15
2 24 24 0.6
3 36 12 0.02
4 18 24 0.12
. . .
. . .
. . .
I tried with a series of IFs inside a loop (like in the example):
for (i in 1:length(df$x)) {
workingset <- df[i,]
if(workingset$age==18){
temp<-marix[1,]
workingset$y <- ifelse(workingset$income<12, temp[1], ifelse(workingset$income<24,temp[2],ifelse,temp[3])
}else if(workingset$age==24){
temp<-marix[2,]
workingset$y <- ifelse(workingset$income<12, temp[1], ifelse(workingset$income<24,temp[2],ifelse,temp[3])
}else if{
...
}
if(i==1){
df2 <- workingset
}else{
df2<- rbind(df2, workingset)
}
}
This code works, but it takes too long. Is there a way do this job efficiently?
Assuming your data looks exactly like shown you could use dplyr and tidyr.
First convert your matrix (I name it my_mat) into a data.frame
my_mat %>%
as.data.frame() %>%
mutate(age=rownames(.)) %>%
pivot_longer(cols=-age, names_to="income", values_to="y") %>%
mutate(across(where(is.character), as.numeric))
returns
# A tibble: 9 x 3
age income y
<dbl> <dbl> <dbl>
1 18 12 0.15
2 18 24 0.12
3 18 36 0.11
4 24 12 0.12
5 24 24 0.6
6 24 36 0.2
7 36 12 0.02
8 36 24 0.16
9 36 36 0.16
This can be left joined with your data.frame df, so in one go:
my_mat %>%
as.data.frame() %>%
mutate(age=rownames(.)) %>%
pivot_longer(cols=-age, names_to="income", values_to="y") %>%
mutate(across(where(is.character), as.numeric)) %>%
left_join(df, ., by=c("age", "income"))
gives you
# A tibble: 4 x 4
id age income y
<dbl> <dbl> <dbl> <dbl>
1 1 18 12 0.15
2 2 24 24 0.6
3 3 36 12 0.02
4 4 18 24 0.12
Data
my_mat <- structure(c(0.15, 0.12, 0.02, 0.12, 0.6, 0.16, 0.11, 0.2, 0.16
), .Dim = c(3L, 3L), .Dimnames = list(c("18", "24", "36"), c("12",
"24", "36")))
df <- structure(list(id = c(1, 2, 3, 4), age = c(18, 24, 36, 18), income = c(12,
24, 12, 24)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), spec = structure(list(cols = list(
id = structure(list(), class = c("collector_double", "collector"
)), age = structure(list(), class = c("collector_double",
"collector")), income = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
I have a df with timesheet data and am looking for an easy way to summarize it. My data looks like df1 and I want to summarize it as df2. I am having a hard time devising a way to create the increments and allocate hours across them. The tricky part is allocating the hours that span across dates, ID 1 and 3, for example.
df1
ID Garage Unit_Name START_DATE_TIME END_DATE_TIME
<chr> <chr> <chr> <dttm> <dttm>
1 A Truck 1/26/2015 21:00 1/27/2015 7:00
2 B Truck 5/13/2015 6:00 5/13/2015 16:00
3 C Car 8/21/2015 21:00 8/22/2015 7:00
6 C Car 8/21/2015 11:00 8/21/2015 21:00
structure(list(ID = c("<chr>", "1", "2", "3", "6", NA, NA, NA,
NA, NA, NA), Garage = c("<chr>", "A", "B", "C", "C", NA, NA,
NA, NA, NA, NA), Unit_Name = c("<chr>", "Truck", "Truck", "Car",
"Car", NA, NA, NA, NA, NA, NA), START_DATE_TIME = c("<dttm>",
"1/26/2015 21:00", "5/13/2015 6:00", "8/21/2015 21:00", "8/21/2015 11:00",
NA, NA, NA, NA, NA, NA), END_DATE_TIME = c("<dttm>", "1/27/2015 7:00",
"5/13/2015 16:00", "8/22/2015 7:00", "8/21/2015 21:00", NA, NA,
NA, NA, NA, NA)), .Names = c("ID", "Garage", "Unit_Name", "START_DATE_TIME",
"END_DATE_TIME"), row.names = c(NA, -11L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
ID = structure(list(), class = c("collector_character", "collector"
)), Garage = structure(list(), class = c("collector_character",
"collector")), Unit_Name = structure(list(), class = c("collector_character",
"collector")), START_DATE_TIME = structure(list(), class = c("collector_character",
"collector")), END_DATE_TIME = structure(list(), class = c("collector_character",
"collector"))), .Names = c("ID", "Garage", "Unit_Name", "START_DATE_TIME",
"END_DATE_TIME")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df2
Garage Unit_Name Date Increment Hours
<chr> <chr> <dttm> <chr> <dbl>
A Truck 1/26/2015 18:01-00:00 3
A Truck 1/27/2015 00:01-6:00 6
A Truck 1/27/2015 6:01-12:00 1
B Truck 5/13/2015 6:01-12:00 6
B Truck 5/13/2015 12:01-18:00 4
C Car 8/21/2015 6:01-12:00 1
C Car 8/21/2015 12:01-18:00 6
C Car 8/21/2015 18:01-00:00 6
C Car 8/22/2015 00:01-6:00 6
C Car 8/23/2015 6:01-12:00 1
library(tidyverse)
library(lubridate)
times=c("00:00","06:00","12:00","18:00")
times1=c("00:01","06:01","12:01","18:01")
df1%>%
group_by(Garage,Unit_Name)%>%
mutate(size=n())%>%
summarise(START_DATE_TIME=min(START_DATE_TIME),
END_DATE_TIME=max(END_DATE_TIME))%>%
mutate(S=mdy_hm(START_DATE_TIME),
b=floor(hour(S)/24*4)+1,
m=ymd_hm(paste(format(S,"%F"),get("times",.GlobalEnv)[b])),
n=ymd_hm(paste(format(S,"%F"),get("times",.GlobalEnv)[(b+1)%%4%>%replace(.,.==0,4)]))%>%
if_else(m>.,.+days(1),.),
rem=as.numeric(mdy_hm(END_DATE_TIME)-n),
HOURS=list(as.numeric(c(n-S,rep(6,rem%/%6),rem%%6))))%>%
unnest()%>%
mutate(Date=S+hours(cumsum(lag(HOURS,default = 0))),
b=floor(hour(Date)/24*4)+1,
increament=paste0(get("times1",.GlobalEnv)[b],"-",
get("times",.GlobalEnv)[replace(d<-(b+1)%%4,d==0,4)]),
Date=as.Date(Date))%>%
select(Garage,Date,HOURS,increament)
Groups: Garage [3]
Garage Date HOURS increament
<chr> <date> <dbl> <chr>
1 A 2015-01-26 3. 18:01-00:00
2 A 2015-01-27 6. 00:01-06:00
3 A 2015-01-27 1. 06:01-12:00
4 B 2015-05-13 6. 06:01-12:00
5 B 2015-05-13 4. 12:01-18:00
6 C 2015-08-21 1. 06:01-12:00
7 C 2015-08-21 6. 12:01-18:00
8 C 2015-08-21 6. 18:01-00:00
9 C 2015-08-22 6. 00:01-06:00
10 C 2015-08-22 1. 06:01-12:00