Determining the size of runs of identical adjacent values in a vector - r

I have a vector made of 0s and 1s. It refers to hourly met data, with 0 = no rain and 1 = a rain event during the corresponding hour.
The objective is to determine the duration of all rain events, i.e. the length of each block of 1s in the vector.
Is there anything better than a loop screening all values and neighbours one by one?
Thanks in advance for your help.
All the best,
Vincent

As @joran suggests, rle is what you want.
hourly.rain <- c(0, 0, 1, 1, 0, 1, 0, 0, 1, 1)
with(rle(hourly.rain), lengths[values == 1])
#[1] 2 1 2
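For reference, here is what the rle() object looks like for this vector (a quick sketch):

```r
hourly.rain <- c(0, 0, 1, 1, 0, 1, 0, 0, 1, 1)
r <- rle(hourly.rain)
r$lengths  # 2 2 1 1 2 2  (length of each run)
r$values   # 0 1 0 1 0 1  (value of each run)
```

Subsetting lengths by values == 1 therefore picks out exactly the rain-event durations.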
If you want to observe an inter-event time, say 2 hours (i.e., events separated by 2 hours or less are considered the same event), you can also use rle to replace the 0s within the inter-event period with 1s.
inter.event <- 2
hourly.rain <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1)
with(rle(hourly.rain), {
  # fwd.lag/bkwd.lag equal `values` except in the last/first position, so
  # fwd.lag == bkwd.lag holds for every interior run but fails for the runs
  # at either end of the vector (which are not between two rain events)
  fwd.lag <- c(head(values, -1), 1)
  bkwd.lag <- c(1, tail(values, -1))
  replace.vals <- values == 0 & lengths <= inter.event & fwd.lag == bkwd.lag
  rep(replace(values, replace.vals, 1), lengths)
})
# [1] 0 0 0 1 1 1 1 0 0 0 1 1 1 1
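If you then want the durations of the merged events, you can feed the result back into rle(). A sketch combining the two steps:

```r
inter.event <- 2
hourly.rain <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1)

# step 1: fill in 0-gaps of at most `inter.event` hours (as in the answer above)
merged <- with(rle(hourly.rain), {
  fwd.lag <- c(head(values, -1), 1)
  bkwd.lag <- c(1, tail(values, -1))
  replace.vals <- values == 0 & lengths <= inter.event & fwd.lag == bkwd.lag
  rep(replace(values, replace.vals, 1), lengths)
})

# step 2: a second rle pass gives the duration of each merged event
with(rle(merged), lengths[values == 1])
# [1] 4 4
```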

Related

Count the max number of ones in a vector

I am working on the following task.
Suppose that I have the vector
(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
I need to extract the following info:
the maximum number of sets of consecutive zeros
the mean number of consecutive zeros.
For instance, in the previous vector
the maximum is 3, because I have 000, 00 and 0.
Then the mean number of zeros is 2.
I am asking because I need to do the same with several observations; I plan to implement this inside an apply function.
We could use rle for this. As there are only binary values, we can apply rle to the entire vector, then extract the lengths that correspond to 0 (!values returns TRUE for 0 and FALSE otherwise).
out <- with(rle(v1), lengths[!values])
And get the length and the mean from the output
> length(out)
[1] 3
> mean(out)
[1] 2
data
v1 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
You can try another option using regmatches
> v <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
> s <- paste0(v, collapse = "")
> zeros <- unlist(regmatches(s, gregexpr("0+", s)))
> length(zeros)
[1] 3
> mean(nchar(zeros))
[1] 2
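One edge case worth checking before wrapping either approach in apply (this is an assumption about your data, not part of the original question): if a vector contains no zeros at all, lengths[!values] is empty, so length() is 0 and mean() returns NaN:

```r
# hypothetical input with no zero runs
v2 <- c(1, 1, 1)
out2 <- with(rle(v2), lengths[!values])
length(out2)  # 0
mean(out2)    # NaN -- you may want to special-case this
```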

R: randomly sample a nonzero element in a vector and replace other elements with 0

Suppose I have a vector
vec <- c(0, 1, 0, 0, 0, 1, 1, 1, 1, 2)
How do I randomly sample a nonzero element and turn the other elements into 0?
Suppose the element sampled was vec[2], then the resulting vector would be
vec <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
I know that I can sample the index of one nonzero element with sample(which(vec != 0), 1), but I am not sure how to proceed from there. Thanks!
You can try the code below
> replace(0 * vec, sample(which(vec != 0), 1), 1)
[1] 0 0 0 0 0 0 0 1 0 0
where
which returns the indices of the non-zero values,
sample picks one random index, and
replace sets the value at that index to 1.
Watch out for sample's behavior if which returns only 1 value:
> vec <- c(rep(0, 9), 1)
> sample(which(vec != 0), 1)
[1] 4
This preserves the vector's value (instead of turning it into 1) and guards against vectors with only one nonzero element, using rep to guarantee that sample gets a vector with more than one element:
vec[-sample(rep(which(vec != 0), 2), 1)] <- 0
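Putting both ideas together, a small helper (keep_one is a hypothetical name, not from the answers above) that keeps one randomly chosen nonzero element at its original value, zeroes everything else, and is safe for vectors with a single nonzero element:

```r
keep_one <- function(vec) {
  idx <- which(vec != 0)
  # rep() guards against length-1 idx: sample(10, 1) would draw from 1:10
  keep <- sample(rep(idx, 2), 1)
  replace(numeric(length(vec)), keep, vec[keep])
}

vec <- c(0, 1, 0, 0, 0, 1, 1, 1, 1, 2)
keep_one(vec)  # e.g. 0 0 0 0 0 0 0 0 0 2 if the last element is drawn
```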

Calculate cumulative prevalence of carriage of resistant bugs

I have recently started using R. When working on carriage of problematic bacteria, I encountered one problem that I hope somebody could help solve. Apologies if the question is on the easy side.
I want to calculate the cumulative proportion of people who get colonized by the problem bug at various time points (a, b, c), as shown in the dataset "df" below. "0" means a negative test, "1" means a positive test for the resistant bug, and "NA" means the test was not done at that time point. The result should be as described in "x": if the person ever tests positive at any time point (a, b, c), he should have the value "1" in x. If all his tests were negative, he should have the value "0", and if he never had a test done, the value should be "NA". Is there a good way to calculate this "x" automatically?
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
df <- cbind(a, b, c)
df
x <- c(0, 1, 1, 0, 1, 1, 1, 0,NA,0)
df <- cbind(df, x)
df
I tried to create the x-variable using ifelse, but get problems with missing values. For instance, using the following expression:
y <- ifelse(a==1 | b==1 | c==1, 1, ifelse(a==0 | b==0 | c==0, 0, NA))
df <- cbind(df, y)
df
... the resultant column erroneously gets "NA" in rows 1 and 10. When a row contains a combination of 0 and NA, the result should be 0, not NA.
You can use rowSums :
cols <- c('a', 'b', 'c')
+(rowSums(df[, cols], na.rm = TRUE) > 0) * NA^+(rowSums(!is.na(df[, cols])) == 0)
#[1] 0 1 1 0 1 1 1 0 NA 0
This gives the same result as the x shown above; however, it might be difficult to understand.
Here is a simple alternative using apply :
apply(df[, cols], 1, function(x) if(all(is.na(x))) NA else +(any(x == 1, na.rm = TRUE)))
#[1] 0 1 1 0 1 1 1 0 NA 0
This returns NA if all the values in the row are NA; otherwise it checks whether any value in the row is 1.
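If you want something in between (vectorised like the rowSums trick but more readable than the NA^ exponent), the same logic can be written out explicitly. A sketch:

```r
a <- c(0, 0, 1, 0, 0, 1, 0, 0, NA, NA)
b <- c(0, 0, 1, 0, 1, NA, 0, 0, NA, 0)
c <- c(NA, 1, 0, 0, 0, 1, 1, 0, NA, 0)
df <- cbind(a, b, c)
cols <- c('a', 'b', 'c')

any.pos <- rowSums(df[, cols] == 1, na.rm = TRUE) > 0  # any positive test?
all.na  <- rowSums(!is.na(df[, cols])) == 0            # never tested at all?
ifelse(all.na, NA, +any.pos)
# [1] 0 1 1 0 1 1 1 0 NA 0
```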

The diff function for vectors [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I am trying out RStudio myself and have a question.
I have a vector
vec <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
I want to write a function that does the following: if the distance between two subsequences of 1s is less than 5, it shows 0; if it is more than 5, it shows 1.
So, if looking at
vec <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
the output is going to be:
0 0 1
I understand how I can find the positions of the 1s:
function_start_of_seq <- function(x) {
  one_pos <- which(rle(x)$values == 1)
}
And I know that I need to use the diff function and cumsum, but I don't know how...
Perhaps an approach based on the 0s rather than the 1s is more appropriate. In the lengths of the rle() output you can check which distances (i.e. the number of 0s between the 1s) exceed 5, then convert the logical result into 0-1 with as.numeric() at the end.
fun1 <- function(x) {
  null_pos <- which(rle(x)$values == 0)
  tf <- rle(x)$lengths[null_pos] > 5
  return(as.numeric(tf))
}
> fun1(vec)
[1] 0 0 1
Does that make sense?
In case you want a one-liner, just do
> as.numeric(rle(vec)$lengths[which(rle(vec)$values == 0)] > 5)
[1] 0 0 1
The part which(rle(vec)$values == 0) selects the positions of the 0-runs, i.e. the gaps between the sequences of 1s; the comparison > 5 then checks which of those gaps are longer than 5.
as.numeric() then "translates" the logical output into the 0-1 form you desire.
An uncool, non-obfuscated, only-calling-rle-once, no-use-of-which answer:
vec <- c(1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
r <- rle(vec)
r
## Run Length Encoding
## lengths: int [1:7] 1 2 2 4 1 6 1
## values : num [1:7] 1 0 1 0 1 0 1
So it seems the distance between the 1 sequences is what you're after. We'll assume you know you always have 0's and 1's.
r$values == 0 will return a vector of TRUE or FALSE, the result of each positional evaluation. We can use that directly to index r$lengths.
rl <- r$lengths[r$values == 0]
rl
## [1] 2 4 6
Since it's just 0 and 1, we don't need a double; integers will do just fine:
as.integer(rl > 5)
## [1] 0 0 1
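One caveat that none of the answers above handle (an assumption about possible inputs, not part of the original question): if the vector starts or ends with 0s, those runs are not gaps *between* two 1-sequences, yet they would still be counted. A sketch that drops leading/trailing zero runs first (gaps_over_5 is a hypothetical name):

```r
# assumes a 0/1 input vector, like the answers above
gaps_over_5 <- function(x) {
  r <- rle(x)
  zero <- which(r$values == 0)
  # a zero run at the very start or end is not between two 1-sequences
  zero <- zero[zero != 1 & zero != length(r$values)]
  as.integer(r$lengths[zero] > 5)
}

gaps_over_5(c(1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1))
# [1] 0 0 1
gaps_over_5(c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0))
# [1] 1
```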

Vectorize loop with repeating indices

I have a vector of indices that contains repeating values:
IN <- c(1, 1, 2, 2, 3, 4, 5)
I would like to use these indices to subtract two vectors:
ST <- c(0, 0, 0, 0, 0, 0, 0)
SB <- c(1, 1, 1, 1, 1, 1, 1)
However, I would like to do the subtractions in "order", such that each subtraction at a repeated index "builds off" the result of the previous one. I would like to end up with a vector FN that looks like this:
c(-2, -2, -1, -1, -1, 0, 0)
This is easy enough to do in a for loop:
for(i in seq_along(IN)){
ST[IN[i]] <- ST[IN[i]] - SB[IN[i]]
}
But I need to run this loop many times on long vectors and this can take many hours. Is there any way to vectorize this task and avoid a for loop? Maybe using a data.table technique?
Sure, with data.table, it's
library(data.table)
DT = data.table(ST)
mDT = data.table(IN, SB)[, .(sub = sum(SB)), by=.(w = IN)]
DT[mDT$w, ST := ST - mDT$sub ]
DT
#    ST
# 1: -2
# 2: -2
# 3: -1
# 4: -1
# 5: -1
# 6:  0
# 7:  0
Or with base R:
w = sort(unique(IN))
ST[w] <- ST[w] - tapply(SB, IN, FUN = sum)
# [1] -2 -2 -1 -1 -1 0 0
Here is an option using aggregate in base R:
ag <- aggregate(.~IN, data.frame(IN, ST[IN]-SB[IN]), sum)
replace(ST, ag[,1], ag[,2])
#[1] -2 -2 -1 -1 -1 0 0
OR using xtabs:
d <- as.data.frame(xtabs(B ~ A, data.frame(A = IN, B = ST[IN] - SB[IN])))
replace(ST, d[,1], d[,2])
#[1] -2 -2 -1 -1 -1  0  0
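Another fully vectorised base-R option (a sketch; like the loop, it assumes each occurrence of index j subtracts the fixed amount SB[j]): count how often each index appears with tabulate() and subtract in one step:

```r
IN <- c(1, 1, 2, 2, 3, 4, 5)
ST <- c(0, 0, 0, 0, 0, 0, 0)
SB <- c(1, 1, 1, 1, 1, 1, 1)

n <- tabulate(IN, nbins = length(ST))  # how many times each index occurs
FN <- ST - n * SB
FN
# [1] -2 -2 -1 -1 -1  0  0
```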
