Is there an efficient way to calculate the length of portions of a vector that repeat a specified value?
For instance, I want to calculate the length of rainless periods along a vector of daily rainfall values:
daily_rainfall=c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
Besides the obvious but clunky approach of looping through the vector, is there a cleaner way to get the desired answer of
rainless_period_length=c(3, 2)
given the vector above?
R has a built-in function for this, rle ("run-length encoding"):
daily_rainfall <- c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
runs <- rle(daily_rainfall)
rainless_period_length <- runs$lengths[runs$values == 0]
rainless_period_length
output:
[1] 3 2
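To see why this works, it can help to look at what rle returns. Here is a minimal sketch on the same vector; the name restored is only for illustration:

```r
daily_rainfall <- c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
runs <- rle(daily_rainfall)

# rle returns two parallel vectors: the value of each run and how long it lasts
runs$values   # 15 2 0 3 0 10
runs$lengths  # 1 1 3 2 2 1

# inverse.rle rebuilds the original vector from the encoding
restored <- inverse.rle(runs)
```

Note that runs$values == 0 is an exact comparison, which suits rainfall recorded as exact zeros. If "rainless" should mean "below some small threshold", one option is to apply the threshold to the vector before calling rle.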
I am working on the following task.
Suppose that I have the following vector:
(1,1,0,0,0,1,1,1,1,0,0,1,1,1,0)
I need to extract the following information:
the number of runs of consecutive zeros
the mean length of those runs.
For instance, in the vector above
the number of runs is 3, because I have 000, 00 and 0.
The mean run length is then 2.
I need to do the same for several observations, so I plan to implement this inside an apply function.
We could use rle for this. As the vector contains only binary values, we can apply rle to the entire vector and then extract the lengths that correspond to 0 (!values is TRUE where the value is 0 and FALSE otherwise):
out <- with(rle(v1), lengths[!values])
Then get the number of runs and their mean length from the output:
> length(out)
[1] 3
> mean(out)
[1] 2
data
v1 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
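Since the question mentions doing this for several observations inside apply, here is one possible sketch. It assumes the observations are stored as the rows of a matrix. The names zero_run_stats, obs and stats are made up for illustration, and the second row is invented example data:

```r
# Hypothetical helper: number of zero runs and their mean length for one vector
zero_run_stats <- function(v) {
  runs <- rle(v)
  zero_lengths <- runs$lengths[runs$values == 0]
  # mean() of an empty vector is NaN, so a row without zeros gives NaN here
  c(n_runs = length(zero_lengths), mean_length = mean(zero_lengths))
}

# Assumed layout: one observation per row
obs <- rbind(
  c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0),
  c(0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1)
)

# apply returns one column per row of obs, so transpose to get one row per observation
stats <- t(apply(obs, 1, zero_run_stats))
stats
```

The first row of stats reproduces the values above (3 runs, mean length 2). The second row has runs of length 2 and 1, so it gives 2 runs with mean length 1.5.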
Another option is to use regmatches (this relies on every element being a single character once the vector is pasted into a string):
> v <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
> s <- paste0(v, collapse = "")
> zeros <- unlist(regmatches(s, gregexpr("0+", s)))
> length(zeros)
[1] 3
> mean(nchar(zeros))
[1] 2
In this tutorial, I tried to use another method for converting categorical variables to factor.
In the article, the following method is used.
library(MASS)
library(rpart)
cols <- c('low', 'race', 'smoke', 'ht', 'ui')
birthwt[cols] <- lapply(birthwt[cols], as.factor)
and I replaced the last line with
birthwt[cols] <- as.factor((birthwt[cols]))
but the result is all NA.
What is wrong with that?
as.factor((birthwt[cols])) calls as.factor on a list of 5 vectors. If you do that, R will interpret each of those 5 vectors as the levels, and the column headers as the labels, of a factor variable, which is clearly not what you want:
> as.factor(birthwt[cols])
low race smoke ht ui
<NA> <NA> <NA> <NA> <NA>
5 Levels: c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) ...
> labels(as.factor(birthwt[cols]))
[1] "low" "race" "smoke" "ht" "ui"
lapply iterates over a list, calling the function as.factor on each of the vectors separately in that list. You need to do this to convert each variable separately into a factor, rather than attempting to convert the entire list into a single factor, which is what as.factor(birthwt[cols]) does.
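The column-by-column conversion can be checked on a small example. This sketch uses a made-up data frame (df and its values are illustrative, not the birthwt data) so that it runs without loading MASS:

```r
# Toy data frame standing in for birthwt (values are invented)
df <- data.frame(
  low   = c(0, 1, 0),
  smoke = c(1, 1, 0),
  bwt   = c(2500, 3100, 2900)
)
cols <- c("low", "smoke")

# lapply calls as.factor once per column and returns a list of factors,
# which is assigned back column by column
df[cols] <- lapply(df[cols], as.factor)

# Only the selected columns are factors now
sapply(df, is.factor)
```

The last line should report TRUE for low and smoke and FALSE for bwt, which confirms that each selected column was converted on its own.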
So, I have a vector full of 1s and 0s. I need to plot a graph that starts at (0, 0) and rises by 1 for every 1 in the vector and dips by 1 for every 0 in the vector. For example, if my vector is [ 1, 1, 1, 0, 1, 0, 1, 1 ], I should get a line through the heights 0, 1, 2, 3, 2, 3, 2, 3, 4.
I thought about creating another vector that would hold the sum of the first i elements of the original vector at index i (from the example: [ 1, 2, 3, 3, 4, 4, 5, 6 ]) but that would not account for the dips at 0s. Also, I cannot use loops to solve this.
I would convert the zeros to -1, add a zero at the very beginning to make sure it starts from [0,0] and then plot the cumulative sum:
#starting vec
myvec <- c(1, 1, 1, 0, 1, 0, 1, 1)
#convert 0 to -1
myvec[myvec == 0] <- -1
#add a zero at the beginning to make sure it starts from [0,0]
myvec <- c(0, myvec)
#plot cumulative sum against x = 0, 1, 2, ... so the line starts at (0, 0)
plot(seq_along(myvec) - 1, cumsum(myvec), type = 'l')
#points(seq_along(myvec) - 1, cumsum(myvec)) - if you also want the points on top of the line
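As a variation on the same idea, the steps can be computed without overwriting the original vector, because 2 * v - 1 maps 1 to +1 and 0 to -1. This is only a sketch; the names v, steps and heights are made up here:

```r
v <- c(1, 1, 1, 0, 1, 0, 1, 1)

# 2 * v - 1 turns each 1 into +1 and each 0 into -1, with no loop
steps <- 2 * v - 1

# Prepend 0 so the path starts at height 0, then accumulate
heights <- cumsum(c(0, steps))
heights

# To draw it starting at (0, 0):
# plot(0:length(v), heights, type = "l")
```

For this vector heights comes out as 0, 1, 2, 3, 2, 3, 2, 3, 4, and v itself is left unchanged.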
I have a vector containing 0s and 1s. I want to return the maximum number of times 1 appears consecutively. For example, if x is the input vector
x <-c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1)
Expected Output: 3
My attempt:
I'm using function rle to do this job. Here is my sample code:
x<-c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1)
y<-rle(x)
max_repeat <-max(y$lengths)
In this scenario, I get output as 4 (corresponding to 0 instead of 1). I tried to use tapply to access the complete output of rle, but I am not able to extract out the maximum repeat corresponding to value 1.
out <-tapply(y$lengths, y$values, max)
This is what I get for out:
0 1
4 3
When I look at the structure of out, it is " int [1:2(1d)] 4 3". I do not have much experience with this type of object. I need to extract the value corresponding to 1, i.e. 3. Any help will be appreciated!
Thanks
You can try this:
x<-c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1)
y<-rle(x)
max(y$lengths[y$values==1])
# 3
If you want information about an object (here y), you can use the function str, which shows what the object contains.
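The tapply result from the question can also be used directly. It is a one-dimensional array whose names are the distinct values converted to character, so it can be indexed by the name "1". A short sketch (max_ones is just an illustrative name):

```r
x <- c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1)
y <- rle(x)

# One maximum run length per distinct value, named "0" and "1"
out <- tapply(y$lengths, y$values, max)

# Index by name (a character string), not by the number 1
max_ones <- out[["1"]]
max_ones
```

Indexing with the number 1 instead would pick the first element, which is the result for the value 0.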
I solved it like this:
x <-c(0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1)
y=rle(x)
max(y$lengths[y$values==1])
Hope it meets your expectations.
Simple question: I've got two vectors of 0's and 1's, a and b. The b vector has as many entries as there are 1's in a. I would like to replace the 1's in a with the entries from b. Of course I can do this in a for loop, but is there a nice vectorized way to do this?
From
a <- c(0, 1, 1, 0, 1)
b <- c(1, 0, 1)
create
c <- c(0, 1, 0, 0, 1)
This is pretty simple: a[a == 1] <- b
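Since that assignment modifies a in place, here is a small sketch that works on a copy when the original vector is still needed. The name result is made up here, and is used instead of c to avoid clashing with the c() function:

```r
a <- c(0, 1, 1, 0, 1)
b <- c(1, 0, 1)

# Copy first so that a keeps its original values
result <- a

# a == 1 marks the positions to fill; b supplies the values in order
result[a == 1] <- b
result
```

This gives 0 1 0 0 1. It relies on b having exactly as many entries as there are 1's in a, as stated in the question.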