I have data where consecutive runs of zero are separated by runs of non-zero values. I want to create a counter for the runs of zero in the column 'SOG'.
For the first sequence of 0 in SOG, set the counter in column Stops to 1. For the second run of zeros, set 'Stops' to 2, and so on.
SOG Stops
--- -----
4 0
4 0
0 1
0 1
0 1
3 0
4 0
5 0
0 2
0 2
1 0
2 0
0 3
0 3
0 3
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
df$Stops <- with(df, cumsum(c(0, diff(!SOG)) > 0) * !SOG)
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
Using dplyr:
df <- df %>% mutate(Stops = ifelse(SOG == 0, yes = cumsum(c(0, diff(!SOG) > 0)), no = 0))
#[1] 0 1 1 1 0 0 0 2 2 0 0 3 3 3
EDIT: As an aside to those of us who are still beginners, many of the answers to this question make use of logicals (i.e. TRUE, FALSE). ! before a numeric variable like SOG tests whether the value is 0 and assigns TRUE if it is, and FALSE otherwise.
SOG  # the example vector used in this answer
#[1] 4 0 0 0 3 4 5 0 0 1 2 0 0 0
diff() takes the difference between each value and the one before it. Note that the result has one fewer element than SOG, since the first element has no predecessor from which to compute a difference. For logicals, diff(!SOG) gives 1 where the value changes from FALSE to TRUE (TRUE - FALSE = 1), -1 where it changes from TRUE to FALSE (FALSE - TRUE = -1), and 0 otherwise.
diff(SOG)
#[1] -4 0 0 3 1 1 -5 0 1 1 -2 0 0
diff(!SOG)
#[1] 1 0 0 -1 0 0 1 0 -1 0 1 0 0
So cumsum(diff(!SOG) > 0) counts only the FALSE-to-TRUE changes, i.e. the points where a run of zeros begins:
cumsum(diff(!SOG) > 0)
#[1] 1 1 1 1 1 1 2 2 2 2 3 3 3
But since the vector of differences is one element shorter than SOG, we prepend an element:
cumsum(c(0, diff(!SOG) > 0)) #Or cumsum( c(0, diff(!SOG)) > 0 )
#[1] 0 1 1 1 1 1 1 2 2 2 2 3 3 3
Then either "multiply" that vector by !SOG as in @akrun's answer or use ifelse(). If a particular element of SOG == 0, we use the corresponding element from cumsum(c(0, diff(!SOG) > 0)); if it isn't 0, we assign 0.
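The steps above can be collected into a small helper. This is only a minimal sketch: the name count_zero_runs is hypothetical and does not come from any package, and it assumes the input has no NA values. Unlike the c(0, ...) padding above, it pads the logical vector with FALSE before calling diff(), so a run of zeros that starts at the very first position is counted as well.

```r
# Hypothetical helper: label each run of zeros with its ordinal, 0 elsewhere.
count_zero_runs <- function(x) {
  is_zero <- x == 0                       # same as !x for numeric input
  # A run starts where is_zero switches from FALSE to TRUE. Prepending FALSE
  # keeps the result as long as x and catches a run that starts at position 1.
  starts <- diff(c(FALSE, is_zero)) > 0
  ifelse(is_zero, cumsum(starts), 0)
}

SOG <- c(4, 4, 0, 0, 0, 3, 4, 5, 0, 0, 1, 2, 0, 0, 0)
count_zero_runs(SOG)
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
```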
A one-liner with rle would be -
df <- data.frame(SOG = c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0))
df <- transform(df, Stops = with(rle(SOG == 0), rep(cumsum(values) * values, lengths)))
# SOG Stops
#1 4 0
#2 4 0
#3 0 1
#4 0 1
#5 0 1
#6 3 0
#7 4 0
#8 5 0
#9 0 2
#10 0 2
#11 1 0
#12 2 0
#13 0 3
#14 0 3
#15 0 3
Say I have a df:
df <- data.frame(flag = c(rep(0, 20)),
include = c(rep(1, 20)))
df[c(4,8,16), ]$flag <- 1
flag include
1 0 1
2 0 1
3 0 1
4 1 1
5 0 1
6 0 1
7 0 1
8 1 1
9 0 1
10 0 1
11 0 1
12 0 1
13 0 1
14 0 1
15 0 1
16 1 1
17 0 1
18 0 1
19 0 1
20 0 1
What I wish to do is change the include flag to 0 if the row is within +/- two rows of a row where flag == 1. The result would look like:
flag include
1 0 1
2 0 0
3 0 0
4 1 1
5 0 0
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 0
15 0 0
16 1 1
17 0 0
18 0 0
19 0 1
20 0 1
I've thought of some 'innovative' (read: inefficient and over complicated) ways to do it but was thinking there must be a simple way I'm overlooking.
Would be nice if the answer was such that I could generalize this to +/- n rows, since I have a lot more data and would be looking to potentially search within +/- 10 rows...
Another option with data.table:
library(data.table)
n = 2
# find the row number where flag is one
flag_one = which(df$flag == 1)
# find the index where include needs to be updated
idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
# update include in place
setDT(df)[idx[idx >= 1 & idx <= nrow(df)], include := 0][]
# or as @Frank commented the last step with base R would be
# df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
# flag include
# 1: 0 1
# 2: 0 0
# 3: 0 0
# 4: 1 1
# 5: 0 0
# 6: 0 0
# 7: 0 0
# 8: 1 1
# 9: 0 0
#10: 0 0
#11: 0 1
#12: 0 1
#13: 0 1
#14: 0 0
#15: 0 0
#16: 1 1
#17: 0 0
#18: 0 0
#19: 0 1
#20: 0 1
Put in a function:
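update_n <- function(df, n) {
  # rows where flag is one
  flag_one = which(df$flag == 1)
  # rows within n of a flagged row, excluding the flagged rows themselves
  idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
  # keep only row numbers that exist in df
  df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
  df
}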
There must be a simpler way, but the first approach I could think of uses sapply and which:
df$include[sapply(which(df$flag == 1) , function(x) c(x-2, x-1, x+1, x+2))] <- 0
# flag include
#1 0 1
#2 0 0
#3 0 0
#4 1 1
#5 0 0
#6 0 0
#7 0 0
#8 1 1
#9 0 0
#10 0 0
#11 0 1
#12 0 1
#13 0 1
#14 0 0
#15 0 0
#16 1 1
#17 0 0
#18 0 0
#19 0 1
#20 0 1
We first find all the indices where flag is 1, build the required sequence of row numbers around each of them, and set include to 0 at those positions.
For variable n we can do
n = 2
df$include[sapply(which(df$flag == 1),function(x) setdiff(seq(x-n, x+n),x))] <- 0
replace(x = df$include,
list = sapply(1:NROW(df), function(i)
any(df$flag[c(max(1, i-2):max(1, i-1),
min(i+1, NROW(df)):min(i+2, NROW(df)))] == 1)), values = 0)
# [1] 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 1 0 0 1 1
For n rows,
replace(x = df$include,
list = sapply(1:NROW(df), function(i)
any(df$flag[c(max(1, i-n):max(1, i-1),
min(i+1, NROW(df)):min(i+n, NROW(df)))] == 1)), values = 0)
Another way is to use zoo::rollapply. To determine if a row is within +/- two rows of a row where flag == 1, we check if the maximum flag in a window is 1.
We need rollapply rather than rollmax because we need to specify partial = T.
is_within_flag_window <- function(flag, n) {
  zoo::rollapply(flag, width = (2 * n) + 1, partial = TRUE, FUN = max) == 1
}
library(dplyr)
df %>%
  mutate(include = ifelse(flag == 1, 1,
                   ifelse(is_within_flag_window(flag, 2), 0,
                          include)))  # final branch is assumed: keep the existing value
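For readers without zoo, the same window test can be sketched in base R. The helper name within_window is hypothetical; it checks, for each row, whether any flag equals 1 in the rows from i - n to i + n, truncating the window at both ends of the vector in the same spirit as partial = TRUE.

```r
# Hypothetical base-R helper: TRUE where a flag of 1 lies within n rows.
within_window <- function(flag, n) {
  len <- length(flag)
  vapply(seq_along(flag), function(i) {
    # Truncate the window at the ends of the vector
    any(flag[max(1, i - n):min(len, i + n)] == 1)
  }, logical(1))
}

flag <- rep(0, 20)
flag[c(4, 8, 16)] <- 1
which(within_window(flag, 2))
# [1]  2  3  4  5  6  7  8  9 10 14 15 16 17 18
```

The flagged rows themselves are inside their own window, so they still need the flag == 1 check before include is set to 0.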
Use which and outer.
df$include[outer(which(df$flag==1), -2:2, `+`)] <- 0
Because the offsets -2:2 include 0, the flagged rows themselves are set to 0 as well, and the same happens whenever two flags lie within two rows of each other. Restore those rows to 1 afterwards; this step is essential whenever the windows overlap.
df$include[which(df$flag==1)] <- 1
flag include
1 0 1
2 0 0
3 0 0
4 1 1
5 0 0
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 0
15 0 0
16 1 1
17 0 0
18 0 0
19 0 1
20 0 1
If flag = 1 within one or two rows of the beginning or end of the dataset, some of the computed row numbers fall outside 1..nrow(df) and the assignment can fail. Clamp them to the valid range first:
## assign i for convenience/readability
i <- pmax(1, pmin(nrow(df), outer(which(df$flag==1), -2:2, `+`)))
df$include[i] <- 0
Then restore the 1s as before:
df$include[which(df$flag==1)] <- 1
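As an alternative to clamping and then restoring, the out-of-range row numbers and the flagged rows can simply be dropped before assigning. The following is a minimal sketch for a general window of n rows; the function name mask_near_flags is hypothetical, and the example data deliberately puts flags near both ends of the data frame.

```r
# Hypothetical helper: set include to 0 within n rows of a flagged row,
# leaving the flagged rows themselves untouched.
mask_near_flags <- function(df, n) {
  flagged <- which(df$flag == 1)
  # All candidate row numbers flagged - n, ..., flagged + n
  idx <- as.vector(outer(flagged, -n:n, `+`))
  # Drop row numbers outside the data, then the flagged rows themselves
  idx <- setdiff(idx[idx >= 1 & idx <= nrow(df)], flagged)
  df$include[idx] <- 0
  df
}

df <- data.frame(flag = rep(0, 20), include = rep(1, 20))
df$flag[c(1, 8, 19)] <- 1          # flags close to both ends
mask_near_flags(df, 2)$include
# [1] 1 0 0 1 1 0 0 1 0 0 1 1 1 1 1 1 0 0 1 0
```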
If I have the following vector:
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
how can I calculate the cumulative sum for all of the consecutive 1's, resetting each time I hit a 0?
So, the desired output would look like this:
> y
[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
This works:
unlist(lapply(rle(x)$lengths, FUN = function(z) 1:z)) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
It relies pretty heavily on your special case of only having 1s and 0s, but for that case it works great! Even better, with @nicola's suggested improvements:
sequence(rle(x)$lengths) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
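If the vector is not limited to 1s and 0s, one possible sketch is to group by a counter that increases at every zero and take the cumulative sum within each group. This is an illustration only, not something proposed in the answers here; the example vector is made up and is assumed to contain no NA values.

```r
# Cumulative sum that restarts at every zero; not limited to 0/1 input.
x <- c(2, 3, 0, 0, 5, 1, 0, 4)
# cumsum(x == 0) increases at each zero, so every zero opens a new group
# and the values that follow it are accumulated within that group.
ave(x, cumsum(x == 0), FUN = cumsum)
# [1] 2 5 0 0 5 6 0 4
```

For a vector of 1s and 0s this gives the same result as sequence(rle(x)$lengths) * x.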
I read this post about how to split a vector, and used splitAt2 by @Calimo.
So it's like this:
splitAt2 <- function(x, pos) {
  out <- list()
  # segment boundaries: start of x, every split position, one past the end
  pos2 <- c(1, pos, length(x)+1)
  for (i in seq_along(pos2[-1])) {
    out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
  }
  return(out)
}
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
where_split = which(x == 0)
x_split = splitAt2(x, where_split)
unlist(sapply(x_split, cumsum))
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Here is another option, using rleid from data.table:
library(data.table)
ave(x, rleid(x), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Or without any packages
ave(x, cumsum(c(TRUE, x[-1]!= x[-length(x)])), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
I have a table in R. How do I set every value that is greater than or equal to a certain number to 1, and all other values to 0? For example, if my special number were 4, every value of 4 or more in my table would become 1 and the rest would become 0. So this table:
a b c d e
Bill 1 2 3 4 5
Susan 4 1 5 4 2
Malcolm 4 5 6 2 1
Reese 0 0 2 3 8
would turn into:
a b c d e
Bill 0 0 0 1 1
Susan 1 0 1 1 0
Malcolm 1 1 1 0 0
Reese 0 0 0 0 1
We can create a logical matrix of TRUE/FALSE and convert it to binary format by using +
+(df1 >= 4)
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
Just to be clear, when we do the >=, it creates a logical matrix of TRUE/FALSE
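df1 >= 4
#            a     b     c     d     e
#Bill    FALSE FALSE FALSE  TRUE  TRUE
#Susan    TRUE FALSE  TRUE  TRUE FALSE
#Malcolm  TRUE  TRUE  TRUE FALSE FALSE
#Reese   FALSE FALSE FALSE FALSE  TRUE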
But the OP wanted this converted to 1/0. There are many ways to do this by coercing TRUE/FALSE to binary form. One option is
(df1>=4) + 0L
Or simply putting a + in front will do the coercion
+(df1>=4)
According to ?TRUE
Logical vectors are coerced to integer vectors in contexts where a
numerical value is required, with ‘TRUE’ being mapped to ‘1L’,
‘FALSE’ to ‘0L’ and ‘NA’ to ‘NA_integer_’.
We could also wrap with as.integer, but the output will be a vector
as.integer(df1>=4)
#[1] 0 1 1 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1
If we assign the output back to the original dataset, we can change that dataset and keep its structure
df1[] <- as.integer(df1>=4)
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
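To try these variants side by side, the table from the question can be rebuilt as a data frame. The construction of df1 below is an assumption about how the data is stored, and threshold is a hypothetical name for the "special number".

```r
# df1 is reconstructed from the table in the question (an assumption).
df1 <- data.frame(a = c(1, 4, 4, 0), b = c(2, 1, 5, 0), c = c(3, 5, 6, 2),
                  d = c(4, 4, 2, 3), e = c(5, 2, 1, 8),
                  row.names = c("Bill", "Susan", "Malcolm", "Reese"))
threshold <- 4   # hypothetical name for the cut-off value

m1 <- +(df1 >= threshold)        # unary plus coerces logical to integer
m2 <- (df1 >= threshold) + 0L    # adding 0L does the same
all(m1 == m2)
# [1] TRUE
m1["Susan", ]
# a b c d e
# 1 0 1 1 0
```

Both results are matrices that keep the row and column names; only as.integer() drops them.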
This problem is very similar to Consecutive value after column value change in R
So for
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
the difference is that now I'd like to count how many groups of SOG there are. For example:
SOG Trips
--- -----
4 1
4 1
0 0
0 0
0 0
3 2
4 2
5 2
0 0
0 0
1 3
2 3
0 0
0 0
0 0
Assuming you mean a "group of SOG" is a set of consecutive non-zero SOG values, i.e. starts with a non-zero SOG value and ends with a non-zero SOG value (not necessarily the same value):
Trips <- ifelse(SOG>0, cumsum(c(SOG[1]>0, diff(SOG>0)) == 1), 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
This is one option:
replace(cumsum(c(SOG[1], abs(diff(SOG))) == SOG & SOG != 0), SOG == 0, 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
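The rle one-liner from the linked zero-run question carries over by testing for non-zero instead of zero. A minimal sketch, assuming SOG has no NA values:

```r
SOG <- c(4, 4, 0, 0, 0, 3, 4, 5, 0, 0, 1, 2, 0, 0, 0)
# rle(SOG != 0) has one entry per run of non-zero (TRUE) or zero (FALSE) values.
# cumsum(values) numbers the TRUE runs; multiplying by values zeroes the others.
Trips <- with(rle(SOG != 0), rep(cumsum(values) * values, lengths))
Trips
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
```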
You can try my TrueSeq function from my GitHub-only "SOfun" package.
Usage would be:
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
To get the inverse, just negate the as.logical step:
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
I have a vector as follows:
a<- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)
Here we can see that there are a lot of duplicate elements, i.e. repeated values.
I want code that replaces every consecutive duplicate with 0, except for the first element of each run. The result I require is
a<- c(1,0,0,2,3,2,0,0,0,1,0,0,0,0,2,3,4,0,1,0)
I've tried
#which gives
[1] 1 2 3 0 4
You can create a lagged series and compare it with the original:
> a
[1] 1 1 1 2 3 2 2 2 2 1 0 0 0 0 2 3 4 4 1 1
> ifelse(a == c(a[1]-1,a[(1:length(a)-1)]) , 0 , a)
[1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
replace(a, duplicated(c(0, cumsum(abs(diff(a))))), 0)
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
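A shorter variant of the same idea is to mark where each run starts and multiply. This is a sketch under the assumption that a is numeric and has no NA values; run_start is a hypothetical name.

```r
a <- c(1, 1, 1, 2, 3, 2, 2, 2, 2, 1, 0, 0, 0, 0, 2, 3, 4, 4, 1, 1)
# TRUE where a value starts a new run (the first element always does)
run_start <- c(TRUE, diff(a) != 0)
a * run_start
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
```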