Subsetting 10-day intervals with an overlapping date - r

I have an annual data set that I would like to break into 10-day intervals. For example, I would like to subset 2010-12-26 to 2011-01-04 and create a home range using the x and y values for those dates, then take the next 9 days plus one overlapping date shared with the previous subset; in this case the overlapping date would be 2011-01-04 (giving 2011-01-04 to 2011-01-13). Is there a good way to do this?
#Example dataset
library(lubridate)
date <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = date,
                 x = runif(752, min = 60000, max = 80000),
                 y = runif(752, min = 800000, max = 900000))
> df
date x y
1 2010-12-26 73649.16 894525.6
2 2010-12-27 69005.21 898233.7
3 2010-12-28 64982.90 873692.6
4 2010-12-29 64592.93 841055.2
5 2010-12-30 60475.99 854524.3
6 2010-12-31 79206.43 879468.2
7 2011-01-01 76692.40 830569.6
8 2011-01-02 70378.51 834338.2
9 2011-01-03 74977.73 820568.0
10 2011-01-04 63023.47 899482.3
11 2011-01-05 77046.80 886369.0
12 2011-01-06 68751.91 841074.7
13 2011-01-07 65471.34 888525.3
14 2011-01-08 61138.68 855039.5
15 2011-01-09 65660.66 880227.2
16 2011-01-10 75526.36 838478.6
17 2011-01-11 64485.74 808947.7
18 2011-01-12 61405.69 887784.1
19 2011-01-13 70561.86 847634.7
20 2011-01-14 69234.98 840012.1
21 2011-01-15 75539.43 817132.5
22 2011-01-16 74227.28 839230.4
23 2011-01-17 74548.59 855006.3
24 2011-01-18 72020.71 815036.7
25 2011-01-19 70814.50 883029.6
26 2011-01-20 76924.65 817289.5
27 2011-01-21 60556.21 807427.2
Thank you for your time.

What about this?
res <- lapply(
  seq(0, nrow(df), by = 10),
  function(k) df[max(k, 1):min(k + 10, nrow(df)), ]
)
which gives
> head(res)
[[1]]
date x y
1 2010-12-26 63748.27 856758.7
2 2010-12-27 73774.90 860222.6
3 2010-12-28 68893.24 804194.7
4 2010-12-29 79791.86 810624.5
5 2010-12-30 60073.50 809016.0
6 2010-12-31 74020.15 883304.9
7 2011-01-01 67144.95 889235.3
8 2011-01-02 67205.20 810514.2
9 2011-01-03 68518.68 882730.7
10 2011-01-04 70442.87 892934.1
[[2]]
date x y
10 2011-01-04 70442.87 892934.1
11 2011-01-05 65466.26 855725.2
12 2011-01-06 70034.79 879770.8
13 2011-01-07 60195.42 888653.4
14 2011-01-08 65208.12 883176.8
15 2011-01-09 63040.52 821902.3
16 2011-01-10 62302.66 815025.1
17 2011-01-11 77662.53 829474.5
18 2011-01-12 64802.65 809961.7
19 2011-01-13 71812.61 810755.1
20 2011-01-14 63086.30 820029.9
[[3]]
date x y
20 2011-01-14 63086.30 820029.9
21 2011-01-15 75548.71 806966.7
22 2011-01-16 68572.89 847679.0
23 2011-01-17 71408.65 889490.2
24 2011-01-18 73507.84 815559.7
25 2011-01-19 76854.50 899108.6
26 2011-01-20 79138.08 858537.1
27 2011-01-21 73960.14 898957.3
28 2011-01-22 75048.41 864425.6
29 2011-01-23 61059.20 857558.3
30 2011-01-24 67455.03 853017.1
[[4]]
date x y
30 2011-01-24 67455.03 853017.1
31 2011-01-25 72727.70 891708.8
32 2011-01-26 73230.11 836404.6
33 2011-01-27 67719.05 815528.3
34 2011-01-28 65139.66 826289.8
35 2011-01-29 65145.94 818736.4
36 2011-01-30 74206.03 839014.2
37 2011-01-31 77259.35 855653.0
38 2011-02-01 77809.65 836912.6
39 2011-02-02 62744.02 831549.0
40 2011-02-03 79594.93 873313.6
[[5]]
date x y
40 2011-02-03 79594.93 873313.6
41 2011-02-04 78942.86 825001.1
42 2011-02-05 61346.88 871578.5
43 2011-02-06 68526.18 863300.7
44 2011-02-07 76920.15 844180.0
45 2011-02-08 73023.08 823092.4
46 2011-02-09 64287.09 804682.7
47 2011-02-10 71377.16 829219.8
48 2011-02-11 68930.80 814626.6
49 2011-02-12 70780.95 831549.8
50 2011-02-13 73740.99 895868.0
[[6]]
date x y
50 2011-02-13 73740.99 895868.0
51 2011-02-14 79846.05 844586.6
52 2011-02-15 66559.60 835943.0
53 2011-02-16 68522.99 837633.2
54 2011-02-17 65898.75 891364.4
55 2011-02-18 73809.44 842797.9
56 2011-02-19 73336.53 821166.5
57 2011-02-20 72780.91 883200.6
58 2011-02-21 73240.81 864142.2
59 2011-02-22 78855.11 868599.6
60 2011-02-23 69236.04 845566.6
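In the output above, every window after the first holds 11 rows (for example 2011-01-04 to 2011-01-14), one more than the 2011-01-04 to 2011-01-13 span asked for in the question. Below is a minimal sketch of a variant that keeps each window at 10 rows with a single shared date. It assumes one row per date, sorted in ascending order; the names width, step, starts and windows are illustrative and not part of the answer above.

```r
# Example data in the same shape as the question (base R only)
set.seed(1)
df <- data.frame(
  date = seq(as.Date("2010-12-26"), as.Date("2013-01-15"), by = "day"),
  x = runif(752, min = 60000, max = 80000),
  y = runif(752, min = 800000, max = 900000)
)

width <- 10          # rows (days) per window
step  <- width - 1   # advance 9 rows so consecutive windows share one date

# Start rows 1, 10, 19, ...; stopping at nrow(df) - 1 avoids a final
# window that would contain only the shared row
starts  <- seq(1, nrow(df) - 1, by = step)
windows <- lapply(starts, function(s) df[s:min(s + width - 1, nrow(df)), ])

range(windows[[1]]$date)   # first and last date of the first window
range(windows[[2]]$date)   # the second window starts on the first window's last date
```

Each element of windows can then be passed to whatever home-range function is being used; the last window may be shorter than 10 rows.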

An alternative solution using the dplyr package, applicable when you want groups of n dates instead of groups of 10. We assume one row per date, as in your example. Note that these groups do not overlap: each date falls in exactly one group.
library(lubridate)
dt <- seq(dmy("26-12-2010"), dmy("15-01-2013"), by = "days")
df <- data.frame(date = dt,
x = runif(752, min = 60000, max = 80000),
y = runif(752, min = 800000, max = 900000))
library(dplyr)
n <- 10
df |>
  arrange(date) |>
  mutate(id = 0:(nrow(df) - 1),
         group = id %/% n + 1) |>
  group_by(group) |>
  group_split() |>
  head(n = 2)
#> [[1]]
#> # A tibble: 10 x 5
#> date x y id group
#> <date> <dbl> <dbl> <int> <dbl>
#> 1 2010-12-26 70488. 884674. 0 1
#> 2 2010-12-27 74133. 888636. 1 1
#> 3 2010-12-28 66635. 838681. 2 1
#> 4 2010-12-29 67931. 808998. 3 1
#> 5 2010-12-30 68032. 868329. 4 1
#> 6 2010-12-31 76891. 826684. 5 1
#> 7 2011-01-01 70793. 890401. 6 1
#> 8 2011-01-02 60427. 846447. 7 1
#> 9 2011-01-03 69902. 886152. 8 1
#> 10 2011-01-04 64253. 859245. 9 1
#>
#> [[2]]
#> # A tibble: 10 x 5
#> date x y id group
#> <date> <dbl> <dbl> <int> <dbl>
#> 1 2011-01-05 74260. 844636. 10 2
#> 2 2011-01-06 75631. 807722. 11 2
#> 3 2011-01-07 74443. 840540. 12 2
#> 4 2011-01-08 78903. 811777. 13 2
#> 5 2011-01-09 78531. 894333. 14 2
#> 6 2011-01-10 79310. 812625. 15 2
#> 7 2011-01-11 71701. 801691. 16 2
#> 8 2011-01-12 63254. 854752. 17 2
#> 9 2011-01-13 72813. 837910. 18 2
#> 10 2011-01-14 62718. 877568. 19 2
Created on 2021-07-05 by the reprex package (v2.0.0)

Related

Using lag function to find the last value for a specific individual

I'm trying to create a column in my spreadsheet that takes the last recorded value (IC) for a specific individual (by the Datetime column) and populates it into a column (LIC) for the current event.
A sub-sample of my data looks like this (actual dataset has 4949 rows and 37 individuals):
> head(ACdatas.scale)
Date Datetime ID.2 IC LIC
1 2019-05-25 2019-05-25 11:57 139 High NA
2 2019-06-09 2019-06-09 19:42 139 Low NA
3 2019-07-05 2019-07-05 20:12 139 Medium NA
4 2019-07-27 2019-07-27 17:27 152 Low NA
5 2019-08-04 2019-08-04 9:13 152 Medium NA
6 2019-08-04 2019-08-04 16:18 139 Medium NA
I would like to be able to populate the last value from the IC column into the current LIC column for the current event (see below)
> head(ACdatas.scale)
Date Datetime ID.2 IC LIC
1 2019-05-25 2019-05-25 11:57 139 High NA
2 2019-06-09 2019-06-09 19:42 139 Low High
3 2019-07-05 2019-07-05 20:12 139 Medium Low
4 2019-07-27 2019-07-27 17:27 152 Low NA
5 2019-08-04 2019-08-04 9:13 152 Medium Low
6 2019-08-04 2019-08-04 16:18 139 Medium Medium
I've tried the following code:
ACdatas.scale <- ACdatas.scale %>%
  arrange(ID.2, Datetime) %>%
  group_by(ID.2) %>%
  mutate(LIC = lag(IC))
This worked some of the time, but when I checked back through the data, it seemed to have a problem when the date switched: it could accurately populate the field within the same day, but not when the previous event was on a previous day. Just to make it super confusing, it only had issues with some of the day switches, and not all! Help please!!
Sample data,
dat <- data.frame(id=c(rep("A",5),rep("B",5)), IC=c(1:5,11:15))
dplyr
library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(LIC = lag(IC)) %>%
  ungroup()
# # A tibble: 10 x 3
# id IC LIC
# <chr> <int> <int>
# 1 A 1 NA
# 2 A 2 1
# 3 A 3 2
# 4 A 4 3
# 5 A 5 4
# 6 B 11 NA
# 7 B 12 11
# 8 B 13 12
# 9 B 14 13
# 10 B 15 14
data.table
library(data.table)
as.data.table(dat)[, LIC := shift(IC, type = "lag"), by = .(id)][]
# id IC LIC
# <char> <int> <int>
# 1: A 1 NA
# 2: A 2 1
# 3: A 3 2
# 4: A 4 3
# 5: A 5 4
# 6: B 11 NA
# 7: B 12 11
# 8: B 13 12
# 9: B 14 13
# 10: B 15 14
base R
dat$LIC <- ave(dat$IC, dat$id, FUN = function(z) c(NA, z[-length(z)]))
dat
# id IC LIC
# 1 A 1 NA
# 2 A 2 1
# 3 A 3 2
# 4 A 4 3
# 5 A 5 4
# 6 B 11 NA
# 7 B 12 11
# 8 B 13 12
# 9 B 14 13
# 10 B 15 14
By using your data:
mydat <- structure(list(Date = structure(c(18041, 18056, 18082,
18104, 18112, 18112),
class = "Date"),
Datetime = structure(c(1558760220,1560084120,
1562332320, 1564223220,
1564884780, 1564910280),
class = c("POSIXct","POSIXt"),
tzone = ""),
ID.2 = c(139, 139, 139, 152, 152, 139),
IC = c("High", "Low", "Medium", "Low", "Medium", "Medium"),
LIC = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, -6L),
class = "data.frame")
mydat %>% arrange(Datetime) %>% group_by(ID.2) %>% mutate(LIC = lag(IC))
# A tibble: 6 x 5
# Groups: ID.2 [2]
Date Datetime ID.2 IC LIC
<date> <dttm> <dbl> <chr> <chr>
1 2019-05-25 2019-05-25 11:57:00 139 High NA
2 2019-06-09 2019-06-09 19:42:00 139 Low High
3 2019-07-05 2019-07-05 20:12:00 139 Medium Low
4 2019-07-27 2019-07-27 17:27:00 152 Low NA
5 2019-08-04 2019-08-04 09:13:00 152 Medium Low
6 2019-08-04 2019-08-04 16:18:00 139 Medium Medium
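One possible cause of the intermittent problem described in the question, offered only as an assumption since the column type is not shown: if Datetime is stored as text with unpadded hours, then "2019-08-04 9:13" sorts after "2019-08-04 16:18", so arrange() puts some events in the wrong order before lag() runs. A minimal sketch with hypothetical data (the events data frame and its values are made up for illustration) that parses the column to POSIXct before sorting:

```r
library(dplyr)

# Hypothetical data: Datetime kept as character, hours not zero-padded
events <- data.frame(
  ID.2     = c(152, 152, 152),
  Datetime = c("2019-08-04 9:13", "2019-08-04 16:18", "2019-07-27 17:27"),
  IC       = c("Medium", "High", "Low")
)

fixed <- events %>%
  # Parse to a real date-time so sorting is chronological, not alphabetical
  mutate(Datetime = as.POSIXct(Datetime, format = "%Y-%m-%d %H:%M", tz = "UTC")) %>%
  arrange(ID.2, Datetime) %>%
  group_by(ID.2) %>%
  mutate(LIC = lag(IC)) %>%
  ungroup()

fixed
```

Running str() on the original data would show whether the column is character or POSIXct; if it is already POSIXct, this is not the cause.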

How to print a date when the input is number of days since 01-01-60?

I received a set of dates, but it turns out that time is reported in days since 01-01-1960 in this specific data set.
D_INDDTO
1 20758
2 20856
3 21062
4 19740
5 21222
6 21203
The specific date of interest for Patient 1 is 20758 days since 01-01-60
I want to create a new covariate u$date containing the specific date of interest in %d%m%y format. I tried
library(tidyverse)
u %>% mutate(date=as.date(D_INDDTO,origin="1960-01-01")
But that did not solve it.
u <- structure(list(D_INDDTO = c(20758, 20856, 21062, 19740, 21222,
21203, 20976, 20895, 18656, 18746)), row.names = c(NA, 10L), class = "data.frame")
Try this:
#Code 1
u %>% mutate(date=as.Date("1960-01-01")+D_INDDTO)
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 2
u %>% mutate(date=as.Date(D_INDDTO,origin="1960-01-01"))
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 3
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d%m%y'))
Output:
D_INDDTO date
1 20758 311016
2 20856 060217
3 21062 310817
4 19740 170114
5 21222 070218
6 21203 190118
7 20976 060617
8 20895 170317
9 18656 290111
10 18746 290411
If more customization is required:
#Code 4
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d-%m-%Y'))
Output:
D_INDDTO date
1 20758 31-10-2016
2 20856 06-02-2017
3 21062 31-08-2017
4 19740 17-01-2014
5 21222 07-02-2018
6 21203 19-01-2018
7 20976 06-06-2017
8 20895 17-03-2017
9 18656 29-01-2011
10 18746 29-04-2011
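A likely reason the attempt in the question failed: R is case-sensitive and the base function is spelled as.Date, not as.date, and the original call was also missing its closing parenthesis. The same conversion can be sketched in base R without the pipe; only the first three values from the question are used here:

```r
# Days since 1960-01-01, first three values from the question
u <- data.frame(D_INDDTO = c(20758, 20856, 21062))

# Numeric day counts need an explicit origin to become Date values
u$date <- as.Date(u$D_INDDTO, origin = "1960-01-01")

# format() returns character strings, so keep the Date column for arithmetic
u$date_dmy <- format(u$date, "%d%m%y")
u
```

Keeping a real Date column alongside the formatted string is a design choice: the string is convenient for display, but sorting and date arithmetic only behave correctly on the Date.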

A running sum for daily data that resets when month turns

I have a 2-column table (tibble), made up of a date object and a numeric variable. There is at most one entry per day, but not every day has an entry (i.e. the date is a natural primary key). I am attempting to compute a running sum of the numeric column along the dates, with the running sum resetting when the month turns (the data is sorted by ascending date). I have replicated the result I want below.
Date score monthly.running.sum
10/2/2019 7 7
10/9/2019 6 13
10/16/2019 12 25
10/23/2019 2 27
10/30/2019 13 40
11/6/2019 2 2
11/13/2019 4 6
11/20/2019 15 21
11/27/2019 16 37
12/4/2019 4 4
12/11/2019 24 28
12/18/2019 28 56
12/25/2019 8 64
1/1/2020 1 1
1/8/2020 15 16
1/15/2020 9 25
1/22/2020 8 33
It looks like the package "runner" is possibly suited to this, but I don't really understand how to instruct it. I know I could use a join operation plus a group_by using dplyr to do this, but the data set is very large and doing so would be wildly inefficient. I could also manually iterate through the list with a loop, but that seems inelegant. The last option I can think of is selecting out a unique vector of yearmon objects, cutting the original list into many shorter lists and running a plain cumsum on each, but that also feels suboptimal. I am sure this is not the first time someone has had to do this, and given how many tools there are in the tidyverse, I think I just need help finding the right one. The reason I am looking for a tool instead of using one of the methods described above (which would take less time than writing this post) is that this code needs to be very readable by an audience that is less comfortable with code.
We can also use data.table
library(data.table)
setDT(df)[, Date := as.IDate(Date, "%m/%d/%Y")
          ][, monthly.running.sum := cumsum(score), by = format(Date, "%Y-%m")][]
# Date score monthly.running.sum
# 1: 2019-10-02 7 7
# 2: 2019-10-09 6 13
# 3: 2019-10-16 12 25
# 4: 2019-10-23 2 27
# 5: 2019-10-30 13 40
# 6: 2019-11-06 2 2
# 7: 2019-11-13 4 6
# 8: 2019-11-20 15 21
# 9: 2019-11-27 16 37
#10: 2019-12-04 4 4
#11: 2019-12-11 24 28
#12: 2019-12-18 28 56
#13: 2019-12-25 8 64
#14: 2020-01-01 1 1
#15: 2020-01-08 15 16
#16: 2020-01-15 9 25
#17: 2020-01-22 8 33
data
df <- structure(list(Date = c("10/2/2019", "10/9/2019", "10/16/2019",
"10/23/2019", "10/30/2019", "11/6/2019", "11/13/2019", "11/20/2019",
"11/27/2019", "12/4/2019", "12/11/2019", "12/18/2019", "12/25/2019",
"1/1/2020", "1/8/2020", "1/15/2020", "1/22/2020"), score = c(7L,
6L, 12L, 2L, 13L, 2L, 4L, 15L, 16L, 4L, 24L, 28L, 8L, 1L, 15L,
9L, 8L)), row.names = c(NA, -17L), class = "data.frame")
Using lubridate, you can extract the month and year values from the date, group_by those values and then perform the cumulative sum as follows:
library(lubridate)
library(dplyr)
df %>%
  mutate(Month = month(mdy(Date)),
         Year = year(mdy(Date))) %>%
  group_by(Month, Year) %>%
  mutate(SUM = cumsum(score))
# A tibble: 17 x 6
# Groups: Month, Year [4]
Date score monthly.running.sum Month Year SUM
<chr> <int> <int> <int> <int> <int>
1 10/2/2019 7 7 10 2019 7
2 10/9/2019 6 13 10 2019 13
3 10/16/2019 12 25 10 2019 25
4 10/23/2019 2 27 10 2019 27
5 10/30/2019 13 40 10 2019 40
6 11/6/2019 2 2 11 2019 2
7 11/13/2019 4 6 11 2019 6
8 11/20/2019 15 21 11 2019 21
9 11/27/2019 16 37 11 2019 37
10 12/4/2019 4 4 12 2019 4
11 12/11/2019 24 28 12 2019 28
12 12/18/2019 28 56 12 2019 56
13 12/25/2019 8 64 12 2019 64
14 1/1/2020 1 1 1 2020 1
15 1/8/2020 15 16 1 2020 16
16 1/15/2020 9 25 1 2020 25
17 1/22/2020 8 33 1 2020 33
An alternative is to use the floor_date function to convert each date to the first day of its month and then calculate the cumulative sum:
library(lubridate)
library(dplyr)
df %>% mutate(Floor = floor_date(mdy(Date), unit = "month")) %>%
group_by(Floor) %>%
mutate(SUM = cumsum(score))
# A tibble: 17 x 5
# Groups: Floor [4]
Date score monthly.running.sum Floor SUM
<chr> <int> <int> <date> <int>
1 10/2/2019 7 7 2019-10-01 7
2 10/9/2019 6 13 2019-10-01 13
3 10/16/2019 12 25 2019-10-01 25
4 10/23/2019 2 27 2019-10-01 27
5 10/30/2019 13 40 2019-10-01 40
6 11/6/2019 2 2 2019-11-01 2
7 11/13/2019 4 6 2019-11-01 6
8 11/20/2019 15 21 2019-11-01 21
9 11/27/2019 16 37 2019-11-01 37
10 12/4/2019 4 4 2019-12-01 4
11 12/11/2019 24 28 2019-12-01 28
12 12/18/2019 28 56 2019-12-01 56
13 12/25/2019 8 64 2019-12-01 64
14 1/1/2020 1 1 2020-01-01 1
15 1/8/2020 15 16 2020-01-01 16
16 1/15/2020 9 25 2020-01-01 25
17 1/22/2020 8 33 2020-01-01 33
A base R alternative:
df$Date <- as.Date(df$Date, "%m/%d/%Y")
df$monthly.running.sum <- with(df, ave(score, format(Date, "%Y-%m"), FUN = cumsum))
df
# Date score monthly.running.sum
#1 2019-10-02 7 7
#2 2019-10-09 6 13
#3 2019-10-16 12 25
#4 2019-10-23 2 27
#5 2019-10-30 13 40
#6 2019-11-06 2 2
#7 2019-11-13 4 6
#8 2019-11-20 15 21
#9 2019-11-27 16 37
#10 2019-12-04 4 4
#11 2019-12-11 24 28
#12 2019-12-18 28 56
#13 2019-12-25 8 64
#14 2020-01-01 1 1
#15 2020-01-08 15 16
#16 2020-01-15 9 25
#17 2020-01-22 8 33
The yearmon class represents year/month objects, so just convert the dates to yearmon and accumulate by them using this one-liner (df is the data frame shown in the data section above, with Date still stored as character):
library(zoo)
transform(df, run.sum = ave(score, as.yearmon(Date, "%m/%d/%Y"), FUN = cumsum))
giving:
Date score run.sum
1 10/2/2019 7 7
2 10/9/2019 6 13
3 10/16/2019 12 25
4 10/23/2019 2 27
5 10/30/2019 13 40
6 11/6/2019 2 2
7 11/13/2019 4 6
8 11/20/2019 15 21
9 11/27/2019 16 37
10 12/4/2019 4 4
11 12/11/2019 24 28
12 12/18/2019 28 56
13 12/25/2019 8 64
14 1/1/2020 1 1
15 1/8/2020 15 16
16 1/15/2020 9 25
17 1/22/2020 8 33

How can I fill missing data points in R for a given dataframe

I have a dataframe which contains dates, products and amounts. However, product b does not appear on every date; on the dates where it is missing, I would like a row for it with an NA or 0 amount. Is this possible?
Summary_Date <-
as.Date(c("2017-01-31",
"2017-02-28",
"2017-03-31",
"2017-03-31",
"2017-04-30",
"2017-05-31",
"2017-05-31",
"2017-06-30"))
Product <-
as.character(c("a","a","a","b","a","a","b","a"))
Amounts <-
as.numeric(c(10,10,10,20,10,10,20,10))
df <- data.frame(Summary_Date,Product,Amounts)
Regards,
Aksel
You can use tidyr:
> library(tidyr)
> complete(data = df,Summary_Date,Product)
# A tibble: 12 x 3
Summary_Date Product Amounts
<date> <fctr> <dbl>
1 2017-01-31 a 10
2 2017-01-31 b NA
3 2017-02-28 a 10
4 2017-02-28 b NA
5 2017-03-31 a 10
6 2017-03-31 b 20
7 2017-04-30 a 10
8 2017-04-30 b NA
9 2017-05-31 a 10
10 2017-05-31 b 20
11 2017-06-30 a 10
12 2017-06-30 b NA
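The question also mentions a 0 balance as an acceptable result. A short sketch using the fill argument of complete(), on a trimmed copy of the question's data (the name filled is illustrative):

```r
library(tidyr)

# Trimmed copy of the question's data: product b appears on one date only
df <- data.frame(
  Summary_Date = as.Date(c("2017-01-31", "2017-02-28", "2017-03-31", "2017-03-31")),
  Product      = c("a", "a", "a", "b"),
  Amounts      = c(10, 10, 10, 20)
)

# fill replaces the NA that complete() would otherwise insert for new rows
filled <- complete(df, Summary_Date, Product, fill = list(Amounts = 0))
filled
```

Whether 0 or NA is the better choice depends on the downstream use: 0 keeps sums simple, while NA keeps "no record" distinguishable from a true zero amount.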

Subsets in subsets without looping in R

This is my starting data:
days <- c("01.01.2018","01.01.2018","01.01.2018","01.01.2018",
"02.01.2018","02.01.2018","02.01.2018","02.01.2018",
"03.01.2018","03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","08:00:00","12:00:00","16:00:00",
"00:00:00","08:00:00","12:00:00","16:00:00",
"00:00:00","08:00:00","12:00:00","16:00:00")
a <- c(10,12,11,14,
12,22,24,20,
11,8,13,16)
b <- c(18,22,26,21,
2,6,7,5,
27,31,29,26)
c <- a-b
d <- c(10,10,10,10,
20,20,20,20,
30,30,30,30)
df <- data.frame(days,time,a,b,c,d)
So df will come out as:
days time a b c d
1 01.01.2018 00:00:00 10 18 -8 10
2 01.01.2018 08:00:00 12 22 -10 10
3 01.01.2018 12:00:00 11 26 -15 10
4 01.01.2018 16:00:00 14 21 -7 10
5 02.01.2018 00:00:00 12 2 10 20
6 02.01.2018 08:00:00 22 6 16 20
7 02.01.2018 12:00:00 24 7 17 20
8 02.01.2018 16:00:00 20 5 15 20
9 03.01.2018 00:00:00 11 27 -16 30
10 03.01.2018 08:00:00 8 31 -23 30
11 03.01.2018 12:00:00 13 29 -16 30
12 03.01.2018 16:00:00 16 26 -10 30
In this data frame, I'd like to do the following for each day:
1. find the first c value < -10
2. add the corresponding d value to every c value from the one found in step 1 through the last c value of the day
This is what I've come up with:
ndays <- unique(df$days)
for(i in 1:length(ndays)) {
  if(!is.na(df[(df$days == ndays[i] & df$c <= -10),]$c[1]))
  {
    df[(df$days == ndays[i] & df$c <= -10),]$c <- df[(df$days == ndays[i] & df$c <= -10),]$c + df[(df$days == ndays[i] & df$c <= -10),]$d
  }
}
Output will be:
days time a b c d
1 01.01.2018 00:00:00 10 18 -8 10
2 01.01.2018 08:00:00 12 22 0 10
3 01.01.2018 12:00:00 11 26 -5 10
4 01.01.2018 16:00:00 14 21 -7 10
5 02.01.2018 00:00:00 12 2 10 20
6 02.01.2018 08:00:00 22 6 16 20
7 02.01.2018 12:00:00 24 7 17 20
8 02.01.2018 16:00:00 20 5 15 20
9 03.01.2018 00:00:00 11 27 14 30
10 03.01.2018 08:00:00 8 31 7 30
11 03.01.2018 12:00:00 13 29 14 30
12 03.01.2018 16:00:00 16 26 20 30
The problem is that I'd rather not use a for loop, since it is slow, and the loop is not adding d through to the end of the day: df$c[4] should be 3.
Here is a solution using dplyr and lubridate. I'm not 100% sure I understand what you want to do, but I think it should help you solve your problem.
days <- c("01.01.2018","01.01.2018","01.01.2018","01.01.2018",
"02.01.2018","02.01.2018","02.01.2018","02.01.2018",
"03.01.2018","03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","08:00:00","12:00:00","16:00:00",
"00:00:00","08:00:00","12:00:00","16:00:00",
"00:00:00","08:00:00","12:00:00","16:00:00")
a <- c(10,12,11,14,
12,22,24,20,
11,8,13,16)
b <- c(18,22,26,21,
2,6,7,5,
27,31,29,26)
c <- a-b
d <- c(10,10,10,10,
20,20,20,20,
30,30,30,30)
df <- data.frame(days,time,a,b,c,d)
By creating a variable yday with lubridate functions, you can then group by this variable. Within each group, the cumulative maximum (cummax) of the is_below flag marks every row from the first value below -10 onward.
library(lubridate)
library(dplyr)
df %>%
mutate(yday = day(dmy(days))) %>%
mutate(is_below = c < -10) %>%
group_by(yday) %>%
mutate(to_add = cummax(is_below)) %>%
mutate(c = if_else(to_add == 1, true = c + d, false = c))
#> # A tibble: 12 x 9
#> # Groups: yday [3]
#> days time a b c d yday is_below to_add
#> <fctr> <fctr> <dbl> <dbl> <dbl> <dbl> <int> <lgl> <int>
#> 1 01.01.2018 00:00:00 10 18 -8 10 1 FALSE 0
#> 2 01.01.2018 08:00:00 12 22 -10 10 1 FALSE 0
#> 3 01.01.2018 12:00:00 11 26 -5 10 1 TRUE 1
#> 4 01.01.2018 16:00:00 14 21 3 10 1 FALSE 1
#> 5 02.01.2018 00:00:00 12 2 10 20 2 FALSE 0
#> 6 02.01.2018 08:00:00 22 6 16 20 2 FALSE 0
#> 7 02.01.2018 12:00:00 24 7 17 20 2 FALSE 0
#> 8 02.01.2018 16:00:00 20 5 15 20 2 FALSE 0
#> 9 03.01.2018 00:00:00 11 27 14 30 3 TRUE 1
#> 10 03.01.2018 08:00:00 8 31 7 30 3 TRUE 1
#> 11 03.01.2018 12:00:00 13 29 14 30 3 TRUE 1
#> 12 03.01.2018 16:00:00 16 26 20 30 3 FALSE 1
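The same flag-then-carry-forward idea can be sketched in base R with ave(), in case dplyr is not available. This is an illustration under the same reading of the question (strictly c < -10), using a reduced copy of the data; the names toy, flag and c_new are illustrative.

```r
# Reduced copy of the question's data: only the columns the rule needs
toy <- data.frame(
  days = rep(c("01.01.2018", "02.01.2018", "03.01.2018"), each = 4),
  c    = c(-8, -10, -15, -7, 10, 16, 17, 15, -16, -23, -16, -10),
  d    = rep(c(10, 20, 30), each = 4)
)

# Within each day, cummax() turns the 0/1 indicator on at the first
# c < -10 and keeps it on for the rest of that day
flag <- ave(as.integer(toy$c < -10), toy$days, FUN = cummax) == 1

# Add d only on the flagged rows
toy$c_new <- ifelse(flag, toy$c + toy$d, toy$c)
toy
```

Grouping on the full days string rather than on the day of the month keeps days from different months apart, which matters once the data spans more than one month.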

Resources