I have data where consecutive runs of zero are separated by runs of non-zero values. I want to create a counter for the runs of zero in the column 'SOG'.
For the first sequence of 0 in SOG, set the counter in column Stops to 1. For the second run of zeros, set 'Stops' to 2, and so on.
SOG Stops
--- -----
4 0
4 0
0 1
0 1
0 1
3 0
4 0
5 0
0 2
0 2
1 0
2 0
0 3
0 3
0 3
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
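If the SOG values sit in a data frame (as in the question's table), the result can be assigned straight back as the Stops column; a small sketch, with df assumed to hold the SOG column:
df <- data.frame(SOG = SOG)
df$Stops <- inverse.rle(tmp)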
Try
df$stops <- with(df, cumsum(c(0, diff(!SOG)) > 0) * !SOG)
df$stops
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
Using dplyr:
library(dplyr)
df <- df %>% mutate(Stops = ifelse(SOG == 0, yes = cumsum(c(0, diff(!SOG) > 0)), no = 0))
df$Stops
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
EDIT: As an aside to those of us who are still beginners, many of the answers to this question make use of logicals (i.e. TRUE, FALSE). ! before a numeric variable like SOG tests whether the value is 0 and assigns TRUE if it is, and FALSE otherwise.
SOG
#[1] 4 4 0 0 0 3 4 5 0 0 1 2 0 0 0
!SOG
#[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
#[13] TRUE TRUE TRUE
diff() takes the difference between each value and the one before it. Note that the result has one less element than SOG, since the first element has no earlier value to subtract. When applied to logicals, diff(!SOG) produces 1 where a FALSE is followed by a TRUE (TRUE - FALSE = 1), -1 where a TRUE is followed by a FALSE, and 0 otherwise.
diff(SOG)
#[1] 0 -4 0 0 3 1 1 -5 0 1 1 -2 0 0
diff(!SOG)
#[1] 0 1 0 0 -1 0 0 1 0 -1 0 1 0 0
So cumsum(diff(!SOG) > 0) counts only the FALSE-to-TRUE changes, i.e. the points where a run of zeros begins:
cumsum(diff(!SOG) > 0)
#[1] 0 1 1 1 1 1 1 2 2 2 2 3 3 3
But since the vector of differences is one element shorter than SOG, we can prepend an element:
cumsum(c(0, diff(!SOG) > 0)) #Or cumsum( c(0, diff(!SOG)) > 0 )
#[1] 0 0 1 1 1 1 1 1 2 2 2 2 3 3 3
Then either "multiply" that vector by !SOG, as in #akrun's answer, or use the ifelse() command: if a particular element of SOG == 0, we take the corresponding element from cumsum(c(0, diff(!SOG) > 0)); if it isn't 0, we assign 0.
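Putting those steps together in one place (the helper name count_zero_runs is just for illustration, not from any of the answers):
count_zero_runs <- function(x) {
  # increment at each non-zero -> zero transition, then blank out the non-zero positions
  cumsum(c(0, diff(!x)) > 0) * !x
}
count_zero_runs(SOG)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3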
A one-liner with rle would be -
df <- data.frame(SOG = c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0))
df <- transform(df, Stops = with(rle(SOG == 0), rep(cumsum(values) * values, lengths)))
df
# SOG Stops
#1 4 0
#2 4 0
#3 0 1
#4 0 1
#5 0 1
#6 3 0
#7 4 0
#8 5 0
#9 0 2
#10 0 2
#11 1 0
#12 2 0
#13 0 3
#14 0 3
#15 0 3
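To see the intermediate pieces the one-liner builds on: rle(SOG == 0) gives one entry per run, cumsum(values) numbers the zero runs, multiplying by values blanks the non-zero runs, and rep() expands everything back to the original length.
with(rle(SOG == 0), cumsum(values) * values)
#[1] 0 1 0 2 0 3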
Related question:
Say I have a df:
df <- data.frame(flag = c(rep(0, 20)),
                 include = c(rep(1, 20)))
df[c(4,8,16), ]$flag <- 1
df
flag include
1 0 1
2 0 1
3 0 1
4 1 1
5 0 1
6 0 1
7 0 1
8 1 1
9 0 1
10 0 1
11 0 1
12 0 1
13 0 1
14 0 1
15 0 1
16 1 1
17 0 1
18 0 1
19 0 1
20 0 1
What I wish to do is change the include flag to 0 if the row is within +/- two rows of a row where flag == 1. The result would look like:
flag include
1 0 1
2 0 0
3 0 0
4 1 1
5 0 0
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 0
15 0 0
16 1 1
17 0 0
18 0 0
19 0 1
20 0 1
I've thought of some 'innovative' (read: inefficient and overcomplicated) ways to do it, but I suspect there must be a simple way I'm overlooking.
Would be nice if the answer was such that I could generalize this to +/- n rows, since I have a lot more data and would be looking to potentially search within +/- 10 rows...
Another option with data.table:
library(data.table)
n = 2
# find the row number where flag is one
flag_one = which(df$flag == 1)
# find the index where include needs to be updated
idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
# update include in place
setDT(df)[idx[idx >= 1 & idx <= nrow(df)], include := 0][]
# or as #Frank commented the last step with base R would be
# df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
# flag include
# 1: 0 1
# 2: 0 0
# 3: 0 0
# 4: 1 1
# 5: 0 0
# 6: 0 0
# 7: 0 0
# 8: 1 1
# 9: 0 0
#10: 0 0
#11: 0 1
#12: 0 1
#13: 0 1
#14: 0 0
#15: 0 0
#16: 1 1
#17: 0 0
#18: 0 0
#19: 0 1
#20: 0 1
Put in a function:
update_n <- function(df, n) {
  flag_one = which(df$flag == 1)
  idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
  df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
  df
}
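A quick usage sketch, assuming df is the original data.frame from the question:
update_n(df, 2)$include
# [1] 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 1 0 0 1 1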
There must be a simpler way, but the first approach I could think of uses sapply and which
df$include[sapply(which(df$flag == 1), function(x) c(x-2, x-1, x+1, x+2))] <- 0
df
# flag include
#1 0 1
#2 0 0
#3 0 0
#4 1 1
#5 0 0
#6 0 0
#7 0 0
#8 1 1
#9 0 0
#10 0 0
#11 0 1
#12 0 1
#13 0 1
#14 0 0
#15 0 0
#16 1 1
#17 0 0
#18 0 0
#19 0 1
#20 0 1
We first find all the indices where flag is 1, then build the required sequence of positions around each of them and set those positions of include to 0.
For variable n we can do
n = 2
df$include[sapply(which(df$flag == 1),function(x) setdiff(seq(x-n, x+n),x))] <- 0
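One thing to watch: these indices are not clamped, so a flag in the first or last n rows would produce positions outside the data. A minimal sketch that drops the out-of-range positions (an addition, not part of the original answer):
n <- 2
idx <- unlist(lapply(which(df$flag == 1), function(x) setdiff(seq(x - n, x + n), x)))
df$include[idx[idx >= 1 & idx <= nrow(df)]] <- 0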
Another base R option uses replace(), marking each row whose neighbours within two rows contain a flag:
replace(x = df$include,
        list = sapply(1:NROW(df), function(i)
          any(df$flag[c(max(1, i-2):max(1, i-1),
                        min(i+1, NROW(df)):min(i+2, NROW(df)))] == 1)),
        values = 0)
# [1] 1 0 0 1 0 0 0 1 0 0 1 1 1 0 0 1 0 0 1 1
For n rows,
replace(x = df$include,
        list = sapply(1:NROW(df), function(i)
          any(df$flag[c(max(1, i-n):max(1, i-1),
                        min(i+1, NROW(df)):min(i+n, NROW(df)))] == 1)),
        values = 0)
Another way is to use zoo::rollapply. To determine if a row is within +/- two rows of a row where flag == 1, we check if the maximum flag in a window is 1.
We need rollapply rather than rollmax because we need to specify partial = TRUE.
is_within_flag_window <- function(flag, n) {
  zoo::rollapply(flag, width = (2 * n) + 1, partial = TRUE, FUN = max) == 1
}
df %>%
  mutate(include = ifelse(flag == 1, 1,
                          ifelse(is_within_flag_window(flag, 2), 0, 1)))
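A quick check of the helper on its own (df as in the question): it flags the rows whose +/- 2 window contains a flag, including the flag rows themselves, which is why the outer ifelse(flag == 1, 1, ...) puts those back to 1.
which(is_within_flag_window(df$flag, 2))
# [1]  2  3  4  5  6  7  8  9 10 14 15 16 17 18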
Use which and outer.
df$include[outer(which(df$flag==1), -2:2, `+`)] <- 0
Because the offsets -2:2 include 0, the rows where flag == 1 are themselves overwritten, so restore them. This step also matters when two flag rows fall within each other's window, since one flag row would otherwise be zeroed by its neighbour's range.
df$include[which(df$flag==1)] <- 1
flag include
1 0 1
2 0 0
3 0 0
4 1 1
5 0 0
6 0 0
7 0 0
8 1 1
9 0 0
10 0 0
11 0 1
12 0 1
13 0 1
14 0 0
15 0 0
16 1 1
17 0 0
18 0 0
19 0 1
20 0 1
If flag = 1 occurs within one or two rows of the beginning or end of the dataset, the outer() call produces indices outside the data and R will throw errors. Use this instead:
## assign i for convenience/readability
i <- pmax(1, pmin(nrow(df), outer(which(df$flag==1), -2:2, `+`)))
df$include[i] <- 0
Then restore the 1s as before.
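Putting it together and generalizing to +/- n rows only requires swapping -2:2 for -n:n (a sketch; n = 10 is just the example window from the question):
n <- 10
i <- pmax(1, pmin(nrow(df), outer(which(df$flag == 1), -n:n, `+`)))
df$include[i] <- 0
df$include[which(df$flag == 1)] <- 1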
This question is essentially a duplicate of "Cumulative sum for positive numbers only".
If I have the following vector:
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
how can I calculate the cumulative sum for all of the consecutive 1's, resetting each time I hit a 0?
So, the desired output would look like this:
> y
[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
This works:
unlist(lapply(rle(x)$lengths, FUN = function(z) 1:z)) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
It relies pretty heavily on your special case of only having 1s and 0s, but for that case it works great! Even better, with #nicola's suggested improvements:
sequence(rle(x)$lengths) * x
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
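As an aside, if the vector ever contained non-zero values other than 1, multiplying by x would scale the counter by those values; multiplying by (x != 0) instead keeps a pure position-within-run counter (a variation for illustration, not part of the original answer):
sequence(rle(x)$lengths) * (x != 0)
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3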
I read this post about how to split a vector, and used the splitAt2 function by #Calimo.
So it's like this:
splitAt2 <- function(x, pos) {
  out <- list()
  pos2 <- c(1, pos, length(x) + 1)
  for (i in seq_along(pos2[-1])) {
    out[[i]] <- x[pos2[i]:(pos2[i + 1] - 1)]
  }
  return(out)
}
x = c(1,1,1,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1)
where_split = which(x == 0)
x_split = splitAt2(x, where_split)
unlist(sapply(x_split, cumsum))
# [1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Here is another option
library(data.table)
ave(x, rleid(x), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
Or without any packages
ave(x, cumsum(c(TRUE, x[-1]!= x[-length(x)])), FUN=seq_along)*x
#[1] 1 2 3 0 0 0 0 1 2 0 0 1 2 3 0 0 1 2 3 4 0 0 0 0 1 2 3
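The cumsum(c(TRUE, x[-1] != x[-length(x)])) piece is just a base R stand-in for data.table::rleid(): it starts a new group id every time the value changes.
cumsum(c(TRUE, x[-1] != x[-length(x)]))
# [1] 1 1 1 2 2 2 2 3 3 4 4 5 5 5 6 6 7 7 7 7 8 8 8 8 9 9 9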
I have a table in R. How do I set every value that is greater than or equal to a certain number to 1, and the rest of the values to 0? For example, if my special number were 4, then every value that is 4 or above in my table would become 1 and the rest would become 0, so this table:
a b c d e
Bill 1 2 3 4 5
Susan 4 1 5 4 2
Malcolm 4 5 6 2 1
Reese 0 0 2 3 8
Would Turn Into
a b c d e
Bill 0 0 0 1 1
Susan 1 0 1 1 0
Malcolm 1 1 1 0 0
Reese 0 0 0 0 1
We can create a logical matrix of TRUE/FALSE and convert it to binary (1/0) format by using the unary +
+(df1>=4)
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
Just to be clear, when we do the >=, it creates a logical matrix of TRUE/FALSE
df1 >=4
# a b c d e
#Bill FALSE FALSE FALSE TRUE TRUE
#Susan TRUE FALSE TRUE TRUE FALSE
#Malcolm TRUE TRUE TRUE FALSE FALSE
#Reese FALSE FALSE FALSE FALSE TRUE
But the OP wanted this converted to 1/0. There are many ways to coerce TRUE/FALSE to binary form. One option is
(df1>=4) + 0L
Or
(df1>=4)*1L
Or simply prefixing with + will do the coercion
+(df1>=4)
According to ?TRUE:
Logical vectors are coerced to integer vectors in contexts where a numerical value is required, with ‘TRUE’ being mapped to ‘1L’, ‘FALSE’ to ‘0L’ and ‘NA’ to ‘NA_integer_’.
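A one-line illustration of that coercion rule:
c(TRUE, FALSE, NA) + 0L
#[1]  1  0 NA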
We could also wrap with as.integer, but the output will be a vector
as.integer(df1>=4)
#[1] 0 1 1 0 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1
If we assign the output back to the original dataset, we can change that dataset and keep its structure
df1[] <- as.integer(df1>=4)
df1
# a b c d e
#Bill 0 0 0 1 1
#Susan 1 0 1 1 0
#Malcolm 1 1 1 0 0
#Reese 0 0 0 0 1
This problem is very similar to Consecutive value after column value change in R
So for
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
the difference is that now I'd like to count how many groups of SOG there are. For example:
SOG Trips
--- -----
4 1
4 1
0 0
0 0
0 0
3 2
4 2
5 2
0 0
0 0
1 3
2 3
0 0
0 0
0 0
Anyone?
Assuming you mean that a "group of SOG" is a run of consecutive non-zero SOG values, i.e. it starts with a non-zero value and ends with a non-zero value (not necessarily the same one):
Trips <- ifelse(SOG>0, cumsum(c(SOG[1]>0, diff(SOG>0)) == 1), 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
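To see what the cumsum is counting, the inner expression marks the first element of each non-zero run:
c(SOG[1] > 0, diff(SOG > 0)) == 1
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE
#[13] FALSE FALSE FALSE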
This is one option:
replace(cumsum(c(SOG[1], abs(diff(SOG))) == SOG & SOG != 0), SOG == 0, 0)
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
You can try my TrueSeq function from my GitHub-only "SOfun" package.
Usage would be:
library(SOfun)
TrueSeq(as.logical(SOG))
# [1] 1 1 0 0 0 2 2 2 0 0 3 3 0 0 0
To get the inverse, just negate the as.logical step:
TrueSeq(!as.logical(SOG))
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
I have a vector as below:
a<- c(1,1,1,2,3,2,2,2,2,1,0,0,0,0,2,3,4,4,1,1)
Here we can see that there are a lot of duplicate elements, i.e. values that are repeated consecutively. I want code that replaces every consecutive duplicate with 0, keeping only the first element of each run. The result I require is
a<- c(1,0,0,2,3,2,0,0,0,1,0,0,0,0,2,3,4,0,1,0)
I've tried
unique(a)
#which gives
[1] 1 2 3 0 4
You can create a lagged series and compare:
> a
[1] 1 1 1 2 3 2 2 2 2 1 0 0 0 0 2 3 4 4 1 1
> ifelse(a == c(a[1]-1,a[(1:length(a)-1)]) , 0 , a)
[1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0
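The same comparison can be written with an explicit shift instead of the a[1] - 1 trick (an alternative sketch, not from the original answer):
replace(a, c(FALSE, a[-1] == a[-length(a)]), 0)
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0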
Another option: cumsum(abs(diff(a))) only stays flat where consecutive values repeat, so duplicated() on that non-decreasing sequence flags exactly the repeats (the leading 0 pads it back to the length of a):
replace(a, duplicated(c(0, cumsum(abs(diff(a))))), 0)
# [1] 1 0 0 2 3 2 0 0 0 1 0 0 0 0 2 3 4 0 1 0