I have a dataframe with a lot of time series:
1 0:03 B 1
2 0:05 A 1
3 0:05 A 1
4 0:05 B 1
5 0:10 A 1
6 0:10 B 1
7 0:14 B 1
8 0:18 A 1
9 0:20 A 1
10 0:23 B 1
11 0:30 A 1
I want to group the time series into 6-minute intervals and count the frequency of A and B:
1 0:06 A 2
2 0:06 B 2
3 0:12 A 1
4 0:12 B 1
5 0:18 A 1
6 0:18 B 1
7 0:24 A 1
8 0:24 B 1
9 0:30 A 1
Also, the class of the time series is character. What should I do?
Here's an approach: convert the times to POSIXct, cut them into 6-minute intervals, then count.
First, specify the full date-time (year, month, day, hour, minute, and second) of your data. An unambiguous timestamp will also help when scaling to larger datasets.
library(tidyverse)
library(lubridate)
# sample data
d <- data.frame(t = paste0("2019-06-02 ",
c("0:03","0:06","0:09","0:12","0:15",
"0:18","0:21","0:24","0:27","0:30"),
":00"),
        g = c("A","A","B","B","B")) # g is recycled to match the 10 times
d$t <- ymd_hms(d$t) # convert to POSIXct with `lubridate::ymd_hms()`
If you check the class of your new date column, you will see it is "POSIXct".
> class(d$t)
[1] "POSIXct" "POSIXt"
Now that the data is in "POSIXct", you can cut it into 6-minute intervals. We will add this new grouping factor as a new column called tc.
d$tc <- cut(d$t, breaks = "6 min")
d
t g tc
1 2019-06-02 00:03:00 A 2019-06-02 00:03:00
2 2019-06-02 00:06:00 A 2019-06-02 00:03:00
3 2019-06-02 00:09:00 B 2019-06-02 00:09:00
4 2019-06-02 00:12:00 B 2019-06-02 00:09:00
5 2019-06-02 00:15:00 B 2019-06-02 00:15:00
6 2019-06-02 00:18:00 A 2019-06-02 00:15:00
7 2019-06-02 00:21:00 A 2019-06-02 00:21:00
8 2019-06-02 00:24:00 B 2019-06-02 00:21:00
9 2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00
Now you can group_by this new interval (tc) and your grouping column (g), and count the frequency of occurrences. Getting the frequency of observations in a group is a fairly common operation, so dplyr provides count() for this:
count(d, g, tc)
# A tibble: 7 x 3
g tc n
<fct> <fct> <int>
1 A 2019-06-02 00:03:00 2
2 A 2019-06-02 00:15:00 1
3 A 2019-06-02 00:21:00 1
4 B 2019-06-02 00:09:00 2
5 B 2019-06-02 00:15:00 1
6 B 2019-06-02 00:21:00 1
7 B 2019-06-02 00:27:00 2
If you run ?dplyr::count() in the console, you'll see that count(d, g, tc) is simply a wrapper for group_by(d, g, tc) %>% summarise(n = n()).
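You can verify the equivalence yourself; the grouped version below should return the same seven rows (a quick sketch using the d built above):

library(dplyr)

# same result as count(d, g, tc): group, then tally with n()
d %>%
  group_by(g, tc) %>%
  summarise(n = n()) %>%
  ungroup()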
According to the sample dataset, the time series is given as time-of-day, i.e., without a date.
The data.table package has the ITime class, a time-of-day class stored as the integer number of seconds in the day. With data.table, we can use a rolling join to map each time to the upper limit of its 6-minute interval (right-closed intervals):
library(data.table)
# coerce from character to class ITime
setDT(ts)[, time := as.ITime(time)]
# create sequence of breaks
breaks <- as.ITime(seq(as.ITime("0:00"), as.ITime("23:59:59"), as.ITime("0:06")))
# rolling join and aggregate
ts[, CJ(breaks, group, unique = TRUE)
][ts, on = .(group, breaks = time), roll = -Inf, .(x.breaks, group)
][, .N, by = .(upper = x.breaks, group)]
which returns
upper group N
1: 00:06:00 B 2
2: 00:06:00 A 2
3: 00:12:00 A 1
4: 00:12:00 B 1
5: 00:18:00 B 1
6: 00:18:00 A 1
7: 00:24:00 A 1
8: 00:24:00 B 1
9: 00:30:00 A 1
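As a plausibility check (a sketch, not the rolling join itself; it assumes as.ITime() accepts plain seconds since midnight), the same right-closed bucketing can be expressed by rounding each time up to the next 360-second boundary:

# round each ITime (integer seconds) up to the next 6-minute boundary,
# then count occurrences per interval and group
ts[, .N, by = .(upper = as.ITime(360 * ceiling(as.integer(time) / 360)), group)]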
Addendum
If the direction of the rolling join is changed (roll = +Inf instead of roll = -Inf), we get left-closed intervals:
ts[, CJ(breaks, group, unique = TRUE)
][ts, on = .(group, breaks = time), roll = +Inf, .(x.breaks, group)
][, .N, by = .(lower = x.breaks, group)]
which changes the result significantly:
lower group N
1: 00:00:00 B 2
2: 00:00:00 A 2
3: 00:06:00 A 1
4: 00:06:00 B 1
5: 00:12:00 B 1
6: 00:18:00 A 2
7: 00:18:00 B 1
8: 00:30:00 A 1
Data
library(data.table)
ts <- fread("
1 0:03 B 1
2 0:05 A 1
3 0:05 A 1
4 0:05 B 1
5 0:10 A 1
6 0:10 B 1
7 0:14 B 1
8 0:18 A 1
9 0:20 A 1
10 0:23 B 1
11 0:30 A 1"
, header = FALSE
, col.names = c("rn", "time", "group", "value"))
Related
I want to create a variable with the number of the day a participant took a survey (first day, second day, third day, etc.).
The issue is that there are participants that took the survey after midnight.
For example, this is what it looks like:
Id  date
1   08/03/2020 08:17
1   08/03/2020 12:01
1   08/04/2020 15:08
1   08/04/2020 22:16
2   07/03/2020 08:10
2   07/03/2020 12:03
2   07/04/2020 15:07
2   07/05/2020 00:16
3   08/22/2020 09:17
3   08/23/2020 11:04
3   08/24/2020 00:01
4   10/03/2020 08:37
4   10/03/2020 11:13
4   10/04/2020 15:20
4   10/04/2020 23:05
This is what I want:
Id  date              day
1   08/03/2020 08:17  1
1   08/03/2020 12:01  1
1   08/04/2020 15:08  2
1   08/04/2020 22:16  2
2   07/03/2020 08:10  1
2   07/03/2020 12:03  1
2   07/04/2020 15:07  2
2   07/05/2020 00:16  2
3   08/22/2020 09:17  1
3   08/23/2020 11:04  2
3   08/24/2020 00:01  2
4   10/03/2020 08:37  1
4   10/03/2020 11:13  1
4   10/04/2020 15:20  2
4   10/04/2020 23:05  2
How can I create the day variable so that participants who took the survey shortly after midnight still belong to the previous day?
I tried the code here, but I have issues with participants taking surveys after midnight.
Please check the code below.
code
library(dplyr)
library(tidyr) # for fill()

data2 <- data %>%
mutate(date2 = as.Date(date, format = "%m/%d/%Y %H:%M")) %>%
group_by(id) %>%
mutate(row = row_number(),
date3 = as.Date(ifelse(row == 1, date2, NA), origin = "1970-01-01")) %>%
fill(date3) %>%
ungroup() %>%
mutate(diff = as.numeric(date2 - date3 + 1)) %>%
select(-date2, -date3, -row)
output (note that the after-midnight survey in row 8 is still assigned day 3 here, not day 2 as desired):
#> id date diff
#> 1 1 08/03/2020 08:17 1
#> 2 1 08/03/2020 12:01 1
#> 3 1 08/04/2020 15:08 2
#> 4 1 08/04/2020 22:16 2
#> 5 2 07/03/2020 08:10 1
#> 6 2 07/03/2020 12:03 1
#> 7 2 07/04/2020 15:07 2
#> 8 2 07/05/2020 00:16 3
Here is one approach that explicitly shows the dates considered. First, make sure your date is in POSIXct format, as suggested in the comments (if not done already). Then, if the hour is less than 2 (midnight to 2 AM), subtract 1 from the date so that survey_date reflects the day before; if the hour is not less than 2, just keep the date. The timezone argument tz is set to "" (the local timezone) to avoid ambiguity. Finally, after grouping by Id, subtract the first survey_date from each survey_date to get the number of days since the first survey. You can use as.numeric to make this column numeric if desired.
Note: if you want to just note consecutive days taken the survey (and ignore gaps in days between surveys) you can substitute for the last line:
mutate(day = cumsum(survey_date != lag(survey_date, default = first(survey_date))) + 1)
This will increase day by 1 every new survey_date found for a given Id.
library(tidyverse)
library(lubridate)
df %>%
mutate(date = as.POSIXct(date, format = "%m/%d/%Y %H:%M", tz = "")) %>%
mutate(survey_date = if_else(hour(date) < 2,
as.Date(date, format = "%Y-%m-%d", tz = "") - 1,
as.Date(date, format = "%Y-%m-%d", tz = ""))) %>%
group_by(Id) %>%
mutate(day = survey_date - first(survey_date) + 1)
Output
Id date survey_date day
<int> <dttm> <date> <drtn>
1 1 2020-08-03 08:17:00 2020-08-03 1 days
2 1 2020-08-03 12:01:00 2020-08-03 1 days
3 1 2020-08-04 15:08:00 2020-08-04 2 days
4 1 2020-08-04 22:16:00 2020-08-04 2 days
5 2 2020-07-03 08:10:00 2020-07-03 1 days
6 2 2020-07-03 12:03:00 2020-07-03 1 days
7 2 2020-07-04 15:07:00 2020-07-04 2 days
8 2 2020-07-05 00:16:00 2020-07-04 2 days
9 3 2020-08-22 09:17:00 2020-08-22 1 days
10 3 2020-08-23 11:04:00 2020-08-23 2 days
11 3 2020-08-24 00:01:00 2020-08-23 2 days
12 4 2020-10-03 08:37:00 2020-10-03 1 days
13 4 2020-10-03 11:13:00 2020-10-03 1 days
14 4 2020-10-04 15:20:00 2020-10-04 2 days
15 4 2020-10-04 23:05:00 2020-10-04 2 days
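For the consecutive-numbering variant mentioned in the note above, a sketch using dense_rank() in place of the cumsum()/lag() line (it assumes the same 2 AM cutoff and the df from the question):

library(tidyverse)
library(lubridate)

df %>%
  mutate(date = as.POSIXct(date, format = "%m/%d/%Y %H:%M", tz = "")) %>%
  # shift surveys taken before 2 AM back to the previous day, as above
  mutate(survey_date = as.Date(date, tz = "") - (hour(date) < 2)) %>%
  group_by(Id) %>%
  # rank distinct survey dates 1, 2, 3, ... per participant, ignoring gaps
  mutate(day = dense_rank(survey_date)) %>%
  ungroup()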
I am trying to create a new column that assigns a unique value to an observation (row) only IF the recorded observation occurs a specific amount of time after the last observation (see data frame).
Context:
I set up a camera trap to observe which species visit a particular plot; every visit by a species should get a unique visitID. The actual database contains more complexity, but this is the main problem I have.
library(lubridate) # for ymd_hm()

new.df <- data.frame(
species = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
visit.time = c(seq(ymd_hm('2015-01-01 00:00'), ymd_hm('2015-01-01 00:10'), by = '2 mins'),
seq(ymd_hm('2015-01-01 00:00'), ymd_hm('2015-01-01 00:10'), by = '2 mins'))
)
> new.df
species visit.time
1 A 2015-01-01 00:00:00
2 A 2015-01-01 00:02:00
3 A 2015-01-01 00:04:00
4 A 2015-01-01 00:06:00
5 A 2015-01-01 00:08:00
6 A 2015-01-01 00:10:00
7 B 2015-01-01 00:00:00
8 B 2015-01-01 00:02:00
9 B 2015-01-01 00:04:00
10 B 2015-01-01 00:06:00
11 B 2015-01-01 00:08:00
12 B 2015-01-01 00:10:00
I would like to create a new column called "visitID" that records each species' visits. However, I only want to assign a unique number if the visit occurred at least 2 minutes after the previously recorded visit:
> new.df
species visit.time visitID
1 A 2015-01-01 00:00:00 1
2 A 2015-01-01 00:02:00 -
3 A 2015-01-01 00:04:00 2
4 A 2015-01-01 00:06:00 -
5 A 2015-01-01 00:08:00 3
6 A 2015-01-01 00:10:00 -
7 B 2015-01-01 00:00:00 1
8 B 2015-01-01 00:02:00 -
9 B 2015-01-01 00:04:00 2
10 B 2015-01-01 00:06:00 -
11 B 2015-01-01 00:08:00 3
12 B 2015-01-01 00:10:00 -
where - is just an NA
I would usually try dplyr::mutate with conditional ifelse terms; the problem is I do not know how to account for the time elapsed since the previous visit.
Please let me know if there are more details I could provide. Thanks!
From your desired output, it seems you want a new ID when the time difference between the current visit and the last visit that received a new ID exceeds 2 minutes. In that case, we can use a cumulative sum that resets at a certain threshold. I've used the function from this answer: dplyr / R cumulative sum with reset
library(dplyr)
library(purrr) # for accumulate()

sum_reset_at <- function(thresh) {
  function(x) {
    # keep adding until the running sum exceeds thresh, then restart from the new value
    accumulate(x, ~ if_else(.x > thresh, .y, .x + .y))
  }
}
new.df <- new.df %>%
group_by(species) %>% # group df by species
arrange(species, visit.time) %>% # sort the data
mutate(
time.elapsed = as.numeric(difftime(visit.time, lag(visit.time), units = "mins")), # calculate time difference in minutes
time.elapsed = ifelse(is.na(time.elapsed), 0, time.elapsed), # replace NAs at first entries with 0s
time.elapsed.cum = sum_reset_at(2)(time.elapsed), # cumulative sum that resets once the value is strictly greater than (not equal to) two
newID = ifelse(time.elapsed.cum > 2, TRUE, FALSE), # build logical vector that marks the position where a new ID starts
visitID = cumsum(newID) + 1, # generate visit IDs
visitID = replace(visitID, duplicated(visitID), NA) # keep only first entry of an id, replace rest with NA
)
Output:
> new.df
# A tibble: 12 x 6
# Groups: species [2]
species visit.time time.elapsed time.elapsed.cum newID visitID
<fct> <dttm> <dbl> <dbl> <lgl> <dbl>
1 A 2015-01-01 00:00:00 0 0 FALSE 1
2 A 2015-01-01 00:02:00 2 2 FALSE NA
3 A 2015-01-01 00:04:00 2 4 TRUE 2
4 A 2015-01-01 00:06:00 2 2 FALSE NA
5 A 2015-01-01 00:08:00 2 4 TRUE 3
6 A 2015-01-01 00:10:00 2 2 FALSE NA
7 B 2015-01-01 00:00:00 0 0 FALSE 1
8 B 2015-01-01 00:02:00 2 2 FALSE NA
9 B 2015-01-01 00:04:00 2 4 TRUE 2
10 B 2015-01-01 00:06:00 2 2 FALSE NA
11 B 2015-01-01 00:08:00 2 4 TRUE 3
12 B 2015-01-01 00:10:00 2 2 FALSE NA
So basically we sum up the time differences until they exceed two minutes, at which point the sum resets to the current difference (not to zero). Where this cumulative sum is greater than two, we need to add a new ID. We do this by adding a logical vector and building the cumsum of that vector (because TRUE = 1 and FALSE = 0). Lastly, we replace the duplicated IDs within each group to get the output you specified. We can drop the columns you don't need:
> new.df %>% select(-c(time.elapsed, time.elapsed.cum, newID))
# A tibble: 12 x 3
# Groups: species [2]
species visit.time visitID
<fct> <dttm> <dbl>
1 A 2015-01-01 00:00:00 1
2 A 2015-01-01 00:02:00 NA
3 A 2015-01-01 00:04:00 2
4 A 2015-01-01 00:06:00 NA
5 A 2015-01-01 00:08:00 3
6 A 2015-01-01 00:10:00 NA
7 B 2015-01-01 00:00:00 1
8 B 2015-01-01 00:02:00 NA
9 B 2015-01-01 00:04:00 2
10 B 2015-01-01 00:06:00 NA
11 B 2015-01-01 00:08:00 3
12 B 2015-01-01 00:10:00 NA
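To see the resetting sum in isolation, here is what sum_reset_at() does on a small vector (the leading 0 mimics the first time.elapsed entry):

sum_reset_at(2)(c(0, 2, 2, 2, 2, 2))
#> [1] 0 2 4 2 4 2   # the sum restarts from the new value once it exceeds 2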
You can compute the differences using diff(). Just make sure to prepend a 2 to each group of species, i.e. c(2, diff(visit.time) / 60), so that the first visit for each species always gets an ID and the vector matches the group length (R will throw an error otherwise).
The only criterion you've given for visitID is that the values for each species are unique, not that they are consecutive, so I'll assume that 1 5 6 is just as valid as 1 2 3. This simplifies things quite a bit:
library(dplyr)
df %>%
group_by(species) %>%
mutate(tdiff = c(2, diff(visit.time) / 60),
visitID = seq_along(species),
visitID = ifelse(tdiff >= 2, visitID, NA)
)
Which will return the following data frame:
# A tibble: 12 x 4
# Groups: species [2]
species visit.time tdiff visitID
<fct> <dttm> <dbl> <int>
1 A 2015-01-01 00:02:10 2 1
2 A 2015-01-01 00:03:00 0.833 NA
3 A 2015-01-01 00:03:10 0.167 NA
4 A 2015-01-01 00:04:00 0.833 NA
5 A 2015-01-01 00:07:40 3.67 5
6 A 2015-01-01 00:09:40 2 6
7 B 2015-01-01 00:00:40 2 1
8 B 2015-01-01 00:01:10 0.5 NA
9 B 2015-01-01 00:04:10 3 3
10 B 2015-01-01 00:05:40 1.5 NA
11 B 2015-01-01 00:09:40 4 5
12 B 2015-01-01 00:09:50 0.167 NA
Note that I've used a modified dataset because the differences between the times in the example you provide are all == 2.
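If consecutive IDs are preferred after all, one sketch replaces the row index with a cumulative count of qualifying visits:

library(dplyr)

df %>%
  group_by(species) %>%
  mutate(tdiff = c(2, diff(visit.time) / 60),
         # count only rows that qualify for an ID, so the numbering is gapless
         visitID = ifelse(tdiff >= 2, cumsum(tdiff >= 2), NA))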
Data:
df <- structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
visit.time = structure(c(1420070530, 1420070580, 1420070590,
1420070640, 1420070860, 1420070980, 1420070440, 1420070470,
1420070650, 1420070740, 1420070980, 1420070990), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA,
-12L))
So I basically have a while-loop function that writes 1's in the "algorithm_column" based on the highest percentages in the "percent" column, until a certain total percentage is reached (90% or so). The remaining rows that are not taken into account get a value of 0 in the "algorithm_column" (Create while loop function that takes next largest value untill condition is met).
Based on what the loop function found, I want to show the min and max times of the column "timeinterval" (the min is where the 1's start and the max is the last row with a 1; the 0's are out of scope), and then create a time interval from this.
So with the following data, I want to compute, in another column, say "total_time", the span from the min time 09:00 (where the 1's start in the algorithm_column) until 11:15, which gives an interval of 02:15 hours in the "total_time" column.
algorithm
# pc4 timeinterval stops percent idgroup algorithm_column
#1 5464 08:45:00 1 1.3889 1 0
#2 5464 09:00:00 5 6.9444 2 1
#3 5464 09:15:00 8 11.1111 3 1
#4 5464 09:30:00 7 9.7222 4 1
#5 5464 09:45:00 5 6.9444 5 1
#6 5464 10:00:00 10 13.8889 6 1
#7 5464 10:15:00 6 8.3333 7 1
#8 5464 10:30:00 4 5.5556 8 1
#9 5464 10:45:00 7 9.7222 9 1
#10 5464 11:00:00 6 8.3333 10 1
#11 5464 11:15:00 5 6.9444 11 1
#12 5464 11:30:00 8 11.1111 12 0
I have multiple pc4 groups, so it should look at every group and calculate a total_time for each group respectively.
I got this function, but I'm not sure it is what I need.
test <- function(x) {
  ind <- x[["algorithm_column"]] == 0   # was x[["algorithm$algorithm_column"]], which is not a column name
  Mx <- max(x[["timeinterval"]][ind], na.rm = TRUE)
  ind <- x[["algorithm_column"]] == 1
  Mn <- min(x[["timeinterval"]][ind], na.rm = TRUE)
  list(Mn, Mx)  ## or return(list(Mn, Mx))
}
test(algorithm)
Here is a dplyr solution.
library(dplyr)
algorithm %>%
mutate(tmp = cumsum(c(0, diff(algorithm_column) != 0))) %>%
filter(algorithm_column == 1) %>%
group_by(pc4, tmp) %>%
summarise(first = first(timeinterval),
last = last(timeinterval)) %>%
select(-tmp)
## A tibble: 1 x 3
## Groups: pc4 [1]
# pc4 first last
# <int> <fct> <fct>
#1 5464 09:00:00 11:15:00
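To turn those boundaries into the requested total_time (e.g. 02:15 hours), one hedged follow-up, assuming the result above is stored in a variable, say res, and that timeinterval parses as %H:%M:%S:

res %>%
  mutate(total_time = difftime(as.POSIXct(as.character(last),  format = "%H:%M:%S"),
                               as.POSIXct(as.character(first), format = "%H:%M:%S"),
                               units = "hours"))
# gives 2.25 hours for pc4 5464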
Data.
algorithm <- read.table(text = "
pc4 timeinterval stops percent idgroup algorithm_column
1 5464 08:45:00 1 1.3889 1 0
2 5464 09:00:00 5 6.9444 2 1
3 5464 09:15:00 8 11.1111 3 1
4 5464 09:30:00 7 9.7222 4 1
5 5464 09:45:00 5 6.9444 5 1
6 5464 10:00:00 10 13.8889 6 1
7 5464 10:15:00 6 8.3333 7 1
8 5464 10:30:00 4 5.5556 8 1
9 5464 10:45:00 7 9.7222 9 1
10 5464 11:00:00 6 8.3333 10 1
11 5464 11:15:00 5 6.9444 11 1
12 5464 11:30:00 8 11.1111 12 0
", header = TRUE)
I have data
dt <- data.table(time=as.POSIXct(c("2018-01-01 01:01:00","2018-01-01 01:05:00","2018-01-01 01:01:00")), y=c(1,10,9))
> dt
time y
1: 2018-01-01 01:01:00 1
2: 2018-01-01 01:05:00 10
3: 2018-01-01 01:01:00 9
and I would like to aggregate by time. Usually, I would do
dt[,list(sum=sum(y),count=.N), by="time"]
time sum count
1: 2018-01-01 01:01:00 10 2
2: 2018-01-01 01:05:00 10 1
but this time, I would also like to get zero values for the minutes in between, i.e.,
time sum count
1: 2018-01-01 01:01:00 10 2
2: 2018-01-01 01:02:00 0 0
3: 2018-01-01 01:03:00 0 0
4: 2018-01-01 01:04:00 0 0
5: 2018-01-01 01:05:00 10 1
Could this be done, for example, using an external vector
times <- seq(from=min(dt$time),to=max(dt$time),by="mins")
that can be fed to the data.table function as a grouping variable?
You would typically do this with a join (either before or after the aggregation). For example:
dt <- dt[J(times), on = "time"]
dt[,list(sum=sum(y, na.rm = TRUE), count= sum(!is.na(y))), by=time]
# time sum count
#1: 2018-01-01 01:01:00 10 2
#2: 2018-01-01 01:02:00 0 0
#3: 2018-01-01 01:03:00 0 0
#4: 2018-01-01 01:04:00 0 0
#5: 2018-01-01 01:05:00 10 1
Or in a "piped" version:
dt[J(times), on = "time"][
, .(sum = sum(y, na.rm = TRUE), count= sum(!is.na(y))),
by = time]
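As a side note, starting from the original dt, the aggregation can also happen inside the join itself via by = .EACHI (a sketch):

# group by each row of i (the times vector) while joining
dt[J(times), .(sum = sum(y, na.rm = TRUE), count = sum(!is.na(y))),
   on = "time", by = .EACHI]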
I am looking to run a cumulative sum at every row for values that occur in two columns before and after that point. In this case I have the volume of 2 incident types at every given minute over two days. I want to create a column which adds all the incidents that occurred before and after each row, by type. SUMIF from Excel comes to mind, but I'm not sure how to port that over to R:
EDIT: ADDED set.seed and easier numbers
I have the following data set:
set.seed(42)
master_min =
setDT(
data.frame(master_min = seq(
from=as.POSIXct("2016-1-1 0:00", tz="America/New_York"),
to=as.POSIXct("2016-1-2 23:00", tz="America/New_York"),
by="min"
))
)
incident1= round(runif(2821, min=0, max=10))
incident2= round(runif(2821, min=0, max=10))
master_min = head(cbind(master_min, incident1, incident2), 5)
How do I essentially compute the following logic:
for each row, sum all the incident1s that occurred before that row's timestamp and all the incident2s that occurred after that row's timestamp? A data.table solution would be great, or dplyr, as I am working with a large dataset. Below is a before and after for the data:
BEFORE:
master_min incident1 incident2
1: 2016-01-01 00:00:00 9 6
2: 2016-01-01 00:01:00 9 5
3: 2016-01-01 00:02:00 3 5
4: 2016-01-01 00:03:00 8 6
5: 2016-01-01 00:04:00 6 9
AFTER THE CALCULATION:
master_min incident1 incident2 new_column
1: 2016-01-01 00:00:00 9 6 25
2: 2016-01-01 00:01:00 9 5 29
3: 2016-01-01 00:02:00 3 5 33
4: 2016-01-01 00:03:00 8 6 30
5: 2016-01-01 00:04:00 6 9 29
If I understand correctly:
# Cumsum of incident1, without current row:
master_min$sum1 <- cumsum(master_min$incident1) - master_min$incident1
# Reverse cumsum of incident2, without current row:
master_min$sum2 <- rev(cumsum(rev(master_min$incident2))) - master_min$incident2
# Your new column:
master_min$new_column <- master_min$sum1 + master_min$sum2
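Since the question mentions data.table and master_min is already one (via setDT), the same exclusive sums fit in a single := call (a sketch):

# exclusive running sum before (incident1) plus exclusive reverse sum after (incident2)
master_min[, new_column := cumsum(incident1) - incident1 +
               rev(cumsum(rev(incident2))) - incident2]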
Update
The following two lines can also do the job; note that, unlike the snippet above, sum1 here includes the current row's incident1 (subtract incident1 if you need the strictly-before sum):
master_min$sum1 <- cumsum(master_min$incident1)
master_min$sum2 <- sum(master_min$incident2) - cumsum(master_min$incident2)
I rewrote the example a bit to show a more comprehensive structure:
library(data.table)
master_min <-
setDT(
data.frame(master_min = seq(
from=as.POSIXct("2016-1-1 0:00", tz="America/New_York"),
to=as.POSIXct("2016-1-1 0:09", tz="America/New_York"),
by="min"
))
)
set.seed(2)
incident1= as.integer(runif(10, min=0, max=10))
incident2= as.integer(runif(10, min=0, max=10))
master_min = cbind(master_min, incident1, incident2)
Now master_min looks like this
> master_min
master_min incident1 incident2
1: 2016-01-01 00:00:00 1 5
2: 2016-01-01 00:01:00 7 2
3: 2016-01-01 00:02:00 5 7
4: 2016-01-01 00:03:00 1 1
5: 2016-01-01 00:04:00 9 4
6: 2016-01-01 00:05:00 9 8
7: 2016-01-01 00:06:00 1 9
8: 2016-01-01 00:07:00 8 2
9: 2016-01-01 00:08:00 4 4
10: 2016-01-01 00:09:00 5 0
Apply transformations
master_min$sum1 <- cumsum(master_min$incident1)
master_min$sum2 <- sum(master_min$incident2) - cumsum(master_min$incident2)
Results
> master_min
master_min incident1 incident2 sum1 sum2
1: 2016-01-01 00:00:00 1 5 1 37
2: 2016-01-01 00:01:00 7 2 8 35
3: 2016-01-01 00:02:00 5 7 13 28
4: 2016-01-01 00:03:00 1 1 14 27
5: 2016-01-01 00:04:00 9 4 23 23
6: 2016-01-01 00:05:00 9 8 32 15
7: 2016-01-01 00:06:00 1 9 33 6
8: 2016-01-01 00:07:00 8 2 41 4
9: 2016-01-01 00:08:00 4 4 45 0
10: 2016-01-01 00:09:00 5 0 50 0
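To combine these into the asker's new_column, subtract the current row's incident1, since sum1 above includes it (a sketch):

master_min[, new_column := sum1 - incident1 + sum2]
# rows 1-3 of the table above give 37, 36, 36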