Extract first continuous sequence in vector - r

I have a vector:
as <- c(1,2,3,4,5,9)
I need to extract the first continuous sequence in the vector, starting at index 1, so that the output is the following:
1 2 3 4 5
Is there a smart function for doing this, or do I have to do something not so elegant like this:
a <- c(1,2,3,4,5,9)
is_continuous <- c()
# stop one short of the end so that a[i + 1] is never NA
for (i in seq_len(length(a) - 1)) {
  if (a[i + 1] - a[i] == 1) {
    is_continuous <- c(is_continuous, i)
  } else {
    break
  }
}
continuous_numbers <- c()
if (length(is_continuous) > 0 && is_continuous[1] == 1) {
  is_continuous <- c(is_continuous, length(is_continuous) + 1)
  continuous_numbers <- a[is_continuous]
}
It does the trick, but I would expect that there is a function that can already do this.

It isn't clear whether you need the indices of the continuous sequence only when it starts at index 1, or the first such sequence, whatever its starting index.
In both cases, you need to start by checking the difference between adjacent elements:
d_as <- diff(as)
If you need the first sequence only if it starts at index 1:
if(d_as[1]==1) 1:(rle(d_as)$lengths[1]+1) else NULL
# [1] 1 2 3 4 5
rle gives the length and the value of each run of identical consecutive values, so the first run length of d_as tells how many steps of 1 follow the first element.
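To see what rle returns here, a small check on the question's vector (r_d is a name chosen only for this illustration):

```r
as <- c(1, 2, 3, 4, 5, 9)
d_as <- diff(as)      # differences between neighbours: 1 1 1 1 4
r_d <- rle(d_as)      # runs of identical differences
r_d$lengths           # 4 1: four steps of 1, then one step of 4
r_d$values            # 1 4
# the first run is made of 1s, so the sequence covers indices 1 .. lengths[1] + 1
1:(r_d$lengths[1] + 1)
```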
If you need the first continuous sequence, whatever the starting index is:
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
Examples (for the second option; note that the result is a vector of indices into as, not the values themselves):
as <- c(1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
#[1] 1 2 3 4 5
as <- c(4,3,1,2,3,4,5,9)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 3 4 5 6 7
as <- c(1, 2, 3, 6, 7, 8)
d_as <- diff(as)
rle_d_as <- rle(d_as)
which(d_as==1)[1]+(0:(rle_d_as$lengths[rle_d_as$values==1][1]))
# [1] 1 2 3
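If you want the values rather than the positions, a minimal sketch that wraps the second option and indexes back into the vector. first_run_values is a hypothetical name, and the sketch assumes the vector contains no NAs:

```r
# Hypothetical helper: values of the first run where each element is 1 more than the previous one.
first_run_values <- function(v) {
  d <- diff(v)
  if (!any(d == 1)) return(v[0])            # no consecutive pair at all: empty result
  runs <- rle(d)
  start <- which(d == 1)[1]                 # position where the first run begins
  len <- runs$lengths[runs$values == 1][1]  # number of steps of 1 in that run
  v[start + 0:len]
}
first_run_values(c(4, 3, 1, 2, 3, 4, 5, 9))
#[1] 1 2 3 4 5
first_run_values(c(1, 2, 3, 6, 7, 8))
#[1] 1 2 3
```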

A simple way to capture the sequence is to take the diff of your vector and grab all elements whose diff is 1, plus the element right after the last of them, i.e.
d1<- which(diff(as) == 1)
as[c(d1, d1[length(d1)]+1)]
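To see why this one-liner assumes a single sequence, here it is on a small made-up vector with two runs: the positions from both runs get mixed together, and only one "next element" is appended.

```r
as <- c(1, 2, 3, 7, 8)
d1 <- which(diff(as) == 1)     # 1 2 4: steps of 1 from both runs
as[c(d1, d1[length(d1)] + 1)]  # the 3 that ends the first run is lost
#[1] 1 2 7 8
```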
NOTE
This will only work if you have a single sequence in your vector. To make it more general, I'd suggest creating a function such as:
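get_seq <- function(vec){
  # positions i where vec[i + 1] - vec[i] == 1
  d1 <- which(diff(vec) == 1)
  if(all(diff(d1) == 1)){
    # a single run of positions: add the element after the last one
    return(c(d1, d1[length(d1)]+1))
  }else{
    # several runs: group consecutive positions and keep the first group
    d2 <- split(d1, cumsum(c(1, diff(d1) != 1)))[[1]]
    return(c(d2, d2[length(d2)]+1))
  }
}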
# testing it (get_seq returns positions; use vec[get_seq(vec)] to get the values)
as <- c(3, 5, 1, 2, 3, 4, 9, 7, 5, 4, 5, 6, 7, 8)
get_seq(as)
#[1] 3 4 5 6
as <- c(8, 9, 10, 11, 1, 2, 3, 4, 7, 8, 9, 10)
get_seq(as)
#[1] 1 2 3 4
as <- c(1, 2, 3, 4, 5, 6, 11)
get_seq(as)
#[1] 1 2 3 4 5 6
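The grouping step inside get_seq is a common idiom: cumsum over "the gap is not 1" produces a group id that increases at every break. A small illustration using the positions where diff(as) == 1 for the first test vector:

```r
d1 <- c(3L, 4L, 5L, 10L, 11L, 12L, 13L)  # positions of steps of 1 in the first test vector
breaks <- c(1, diff(d1) != 1)            # 1 marks the start of a new group
grp <- cumsum(breaks)                    # 1 1 1 2 2 2 2
split(d1, grp)[[1]]                      # first group of consecutive positions
#[1] 3 4 5
```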

Related

How to validate a condition in a for loop

I am studying R and data science. In an exercise, I need to check whether each number in a vector is even.
My code:
vetor <- list(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
for (i in vetor) {
  if (i %% 2 == 0) {
    print(i)
  }
}
But the result is a warning message:
Warning message:
In if (i%%2 == 0) { :
  the condition has length > 1 and only the first element will be used
What I need is to check whether each element of the list is even and, if it is, print it.
In R, how can I do it?
The list wrapper is not needed:
vetor <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Running the OP's code:
for (i in vetor) {
  if (i %% 2 == 0) {
    print(i)
  }
}
#[1] 2
#[1] 4
#[1] 6
#[1] 8
#[1] 10
%% and == are vectorized operations, so we don't need a loop:
vetor[vetor %% 2 == 0]
#[1] 2 4 6 8 10
When we wrap the vector with list, we get a list of length 1 whose only element is the whole vector. The for loop in R is a for-each loop, not the traditional counter-controlled, three-part loop, so i takes each element of the list in turn; here that means i is the whole vetor vector, in a single iteration.
Because if/else expects a single TRUE/FALSE value and not a vector of length greater than 1, this results in the warning message (since R 4.2.0 it is an error instead).
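If you do want a counter-controlled loop as in other languages, loop over the positions with seq_along and index into the vector. A minimal sketch (evens is just an illustrative name):

```r
vetor <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
evens <- c()
# idx runs over the positions 1, 2, ..., length(vetor)
for (idx in seq_along(vetor)) {
  if (vetor[idx] %% 2 == 0) {      # vetor[idx] is a single number
    evens <- c(evens, vetor[idx])
  }
}
evens   # the even numbers 2, 4, 6, 8, 10
```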
Or, if we want to store it in a list with each element of length 1, use as.list:
vetor <- as.list(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
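With as.list, each list element is a single number, so the OP's loop works unchanged. A quick check that collects the matches instead of printing them (found is an illustrative name):

```r
vetor <- as.list(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
length(vetor)   # 10 elements, each of length 1
found <- c()
for (i in vetor) {
  # i is now one number, so the condition is a single TRUE/FALSE
  if (i %% 2 == 0) {
    found <- c(found, i)
  }
}
found
```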
Let's break down your code and dig into each step to see what happened ...
You should notice that vetor is a list, i.e.,
> vetor
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
In this case, the loop variable i takes the only element of vetor, which is the whole numeric vector, as can be seen from
> for (i in vetor) {
+ str(i)
+ }
num [1:10] 1 2 3 4 5 6 7 8 9 10
Therefore, when you have the condition i %% 2 == 0, you are actually running
> for (i in vetor) {
+ print(i %% 2 == 0)
+ }
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
which is not the single logical value that an if ... else ... statement expects as its condition. That is why you got the warning.
For a workaround, you can refer to @akrun's answer, which should help you a lot.

finding values in a range in r and sum the number of values

I have the following data:
c(1, 2, 4, 5, 1, 8, 9)
I set a lower bound l = 2 and an upper bound u = 6.
I want to find all the values in the range (3,7)
How can I do this?
In base R, we can use comparison operators to create a logical vector and use it to subset the original vector:
x[x > 2 & x <= 6]
#[1] 3 5 6
Or, using a for loop: initialize an empty vector, loop through the elements of 'x', and if a value is greater than 2 and at most 6, append it to that vector.
v1 <- c()
for(i in x) {
  if(i > 2 && i <= 6) v1 <- c(v1, i)
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
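The title also asks for the number of values in the range. Because the comparison gives a logical vector, sum() counts its TRUEs. A sketch using the l and u bounds from the question and the data above:

```r
x <- c(3, 5, 6, 8, 1, 2, 1)
l <- 2
u <- 6
in_range <- x > l & x <= u   # TRUE where l < x <= u
x[in_range]                  # the values: 3 5 6
sum(in_range)                # how many values: 3
sum(x[in_range])             # their total: 14
```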

How to automatically move from e.g. x[1] to x[2]

I have a random vector (of numbers 1:5) of length 20. I need to count the number of runs of length 1 (i.e. a number that is not followed by the same number), 2 (i.e. 2 consecutive equal numbers), 3 and 4.
I'm trying to write a function that takes x[1] and x[2] and compares them; if they are the same, it adds 1 to a counting variable. After that, x[1] becomes x[2] and x[2] should become x[3], so it keeps on repeating. How do I make x[2] change to x[3] without assigning it again? Sorry if that doesn't make much sense.
This is my first day learning R, so please simplify as much as you can so I understand, lol.
{
  startingnumber <- x[1]
  nextnumber <- x[2]
  count <- 0
  repeat {
    if (startingnumber == nextnumber) {
      count <- count + 1
      startingnumber <- nextnumber
      nextnumber <- x[3]
    } else {
      if (startingnumber != nextnumber) {
        break
        ........
      }
    }
  }
}
As mentioned in the comments, using table() on the rle() lengths is probably the most concise solution.
E.g.:
x <- c(3, 1, 1, 3, 4, 5, 3, 1, 5, 4, 2, 4, 2, 3, 2, 3, 2, 4, 5, 4)
table(rle(x)$lengths)
# 1 2
# 18 1
# or
v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
table(rle(v)$lengths)
# 1 2 3
# 6 4 2
In the first example there are 18 singles and one double (the two 1s near the beginning), for a total of 1*18 + 2*1 = 20 values.
In the second example there are 6 singles, 4 doubles, and 2 triples, giving a total of 1*6 + 2*4 + 3*2 = 20 values.
But if computational speed is of more importance than concise code, we can do better, as both table() and rle() do computations internally that we don't really need. Instead we can assemble a function that only does the bare minimum.
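runlengths <- function(x) {
  n <- length(x)
  r <- which(x[-1] != x[-n])       # positions where the value changes (end of each run)
  rl <- diff(c(0, r, n))           # length of every run
  rlu <- sort(unique(rl))          # distinct run lengths, sorted
  rlt <- tabulate(match(rl, rlu))  # how many runs have each length
  names(rlt) <- rlu
  as.table(rlt)
}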
runlengths(x)
# 1 2
# 18 1
runlengths(v)
# 1 2 3
# 6 4 2
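To see what runlengths does step by step, here are its intermediate values on a short made-up vector:

```r
x <- c(7, 7, 3, 5, 5, 5)
n <- length(x)
x[-1] != x[-n]              # each element vs. the next: FALSE TRUE TRUE FALSE FALSE
r <- which(x[-1] != x[-n])  # runs end at positions 2 and 3 (the last run ends at n)
rl <- diff(c(0, r, n))      # run lengths: 2 1 3
rle(x)$lengths              # the same lengths, computed by rle
```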
Bonus:
You already know that you can compare individual elements of a vector like this
x[1] == x[2]
x[2] == x[3]
but did you know that you can compare vectors with each other, and that you can select multiple elements from a vector by specifying multiple indices? Together, that means that instead of writing
x[1] == x[2]
x[2] == x[3]
.
.
.
x[18] == x[19]
x[19] == x[20]
we can write
x[1:19] == x[2:20]
# Or even
x[-length(x)] == x[-1]
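To answer the literal question: in a for loop, the index does the moving for you, so x[i] and x[i + 1] automatically become x[2] and x[3] on the next iteration. A minimal sketch that records each run length this way; count_runs_loop is a hypothetical name, and it assumes a non-empty vector:

```r
# Hypothetical helper: tabulate run lengths by walking through the vector once.
count_runs_loop <- function(x) {
  stopifnot(length(x) > 0)                    # assumes a non-empty vector
  run_lengths <- c()
  current <- 1                                # length of the run we are currently in
  for (i in seq_len(length(x) - 1)) {         # i becomes 1, 2, ..., n - 1 automatically
    if (x[i] == x[i + 1]) {
      current <- current + 1                  # same value: the run continues
    } else {
      run_lengths <- c(run_lengths, current)  # value changes: the run ends at i
      current <- 1
    }
  }
  run_lengths <- c(run_lengths, current)      # close the final run
  table(run_lengths)
}
v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
count_runs_loop(v)
```

It gives the same counts as table(rle(v)$lengths), just more slowly.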

Applying a Logical Calculation to Two Vectors and Returning the Result in a Third Vector

I'm fairly new to R and am having trouble implementing something that should be very basic. Can someone point me in the right direction?
I need to apply a logical calculation based on the values of two vectors and return the value of that function in a third vector.
I want to do this in a user defined function so I can easily apply this in several other areas of the algorithm and make modifications to the implementation with ease.
Here's what I have tried, but I cannot get this implementation to work. I believe it is because I cannot send vectors as parameters to this function.
calcSignal <- function(fVector, sVector) {
  if(!is.numeric(fVector) || !is.numeric(sVector)) {
    0
  }
  else if (fVector > sVector) {
    1
  }
  else if (fVector < sVector) {
    -1
  }
  else {
    0 # is equal case
  }
}
# set up data frame
df <- data.frame(x=c("NA", 2, 9, 7, 0, 5), y=c(4, 1, 5, 9, 0, "NA"))
# call function
df$z <- calcSignal(df$x, df$y)
I want the output to be a vector with the following values, but I am not implementing the function correctly.
[0,1,1,-1,0,0]
Can someone help explain how to implement this function to properly perform the logic outlined?
I appreciate your assistance!
There are some misunderstandings in your code:
In R, "NA" in quotes is a character string (strings are called character in R). The correct form is NA without quotes.
It is worth noting that, before R 4.0.0, data.frame automatically converted character columns to factors; this can be disabled with data.frame(..., stringsAsFactors = FALSE), which has been the default since R 4.0.0.
Each column of a data.frame has a type, not each element. So when a column contains numbers and NA, the class of that column is numeric, and is.numeric returns TRUE even for the NA elements. is.na will do the job.
|| only compares the first element of each vector (and since R 4.3.0 it is an error to give it longer vectors); | does an elementwise comparison.
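A few quick checks that illustrate these points (the vector f is just an example):

```r
f <- c(NA, 2, 9)
is.numeric(f)                     # TRUE: one answer for the whole vector, NA included
is.na(f)                          # TRUE FALSE FALSE: one answer per element
is.na("NA")                       # FALSE: the quoted "NA" is an ordinary string
class(c("NA", 2))                 # "character": one string makes the whole vector character
c(TRUE, FALSE) | c(FALSE, FALSE)  # TRUE FALSE: elementwise OR
```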
Now let's implement what you wanted:
Implementation 1:
#set up data frame
df <- data.frame(x=c(NA, 2, 9, 7, 0, 5), y=c(4, 1, 5, 9, 0, NA))
calcSignal <- function(f,s){
  if(is.na(f) | is.na(s))
    return(0)
  else if(f>s)
    return(1)
  else if(f<s)
    return(-1)
  else
    return(0)
}
df$z = mapply(calcSignal, df$x, df$y, SIMPLIFY = TRUE)
To run a function element-wise over two or more vectors, we can use mapply.
Implementation 2
Not much different from the previous one, but here the function is easier to use because it takes the whole vectors.
#set up data frame
df <- data.frame(x=c(NA, 2, 9, 7, 0, 5), y=c(4, 1, 5, 9, 0, NA))
calcSignal <- function(fVector, sVector) {
  res = mapply(function(f,s){
    if(is.na(f) | is.na(s))
      return(0)
    else if(f>s)
      return(1)
    else if(f<s)
      return(-1)
    else
      return(0)
  },fVector,sVector,SIMPLIFY = TRUE)
  return(res)
}
df$z = calcSignal(df$x,df$y)
Implementation 3 (vectorized)
This one is much better, because it is vectorized and therefore much faster:
calcSignal <- function(fVector, sVector) {
  res = rep(0,length(fVector))
  res[fVector>sVector] = 1
  res[fVector<sVector] = -1
  # This line isn't strictly necessary: comparisons with NA give NA indices, which the assignments above skip. It's just for clarity
  res[(is.na(fVector) | is.na(sVector))] = 0
  return(res)
}
df$z = calcSignal(df$x,df$y)
Output:
> df
x y z
1 NA 4 0
2 2 1 1
3 9 5 1
4 7 9 -1
5 0 0 0
6 5 NA 0
No need for a loop, as ?sign has your back:
# fixing the "NA" issue:
df <- data.frame(x=c(NA, 2, 9, 7, 0, 5), y=c(4, 1, 5, 9, 0, NA))
s <- sign(df$x - df$y)
s[is.na(s)] <- 0
s
#[1] 0 1 1 -1 0 0
ifelse is another handy function, although it is less elegant here than sign:
df <- data.frame(x=c(NA, 2, 9, 7, 0, 5), y=c(4, 1, 5, 9, 0, NA))
cs <- function(x, y){
  a <- x > y
  b <- x < y
  out <- ifelse(a, 1, ifelse(b, -1, 0))
  ifelse(is.na(out), 0, out)
}
cs(df$x, df$y)
#[1] 0 1 1 -1 0 0

Variable sample upper value in R

I have the following matrix
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <- data.frame(m)
I am trying to create a random number in each row whose upper limit depends on a value in that row (in this case, a number between 0 and each row's value of Z).
I currently have:
samp <- function(x){
  sample(0:1, 1, replace = TRUE)
}
m$randoms <- apply(m, 1, samp)
which works well, applying the sample function independently to each row, but I always get an error when I try to make the upper limit in sample depend on the row. I thought I could do something like this:
samp <- function(x){
  sample(0:m$Z, 1, replace = TRUE)
}
m$randoms <- apply(m, 1, samp)
but I guess that was wishful thinking.
Ultimately I want the result:
Y Z randoms
2 5 4
4 7 7
3 9 3
5 3 1
1 7 6
Any ideas?
The following will sample from 0 to x$Y for each row, and store the result in randoms:
x$randoms <- sapply(x$Y + 1, sample, 1) - 1
Explanation:
The sapply takes each value in x$Y separately (let's call this y), and calls sample(y + 1, 1) on it.
Note that sample(y + 1, 1) samples 1 random integer from the range 1:(y + 1), because a single number n >= 1 passed to sample is treated as 1:n. Since you want a number from 0 to y rather than 1 to y + 1, we subtract 1 at the end.
Also, just pointing out: there is no need for replace = TRUE here, because you are only sampling one value, so it doesn't matter whether it gets replaced or not.
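The original apply() attempt can also be made to work by reading the bound from the row that apply passes in, rather than from the whole column m$Z. A sketch, assuming m is the data frame with columns Y and Z from the question:

```r
m <- data.frame(Y = c(2, 4, 3, 5, 1), Z = c(5, 7, 9, 3, 7))
set.seed(1)   # only for reproducibility
# row is one row of m as a named numeric vector, so row[["Z"]] is that row's bound
m$randoms <- apply(m, 1, function(row) sample(0:row[["Z"]], 1))
m
```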
Based on @mathematical.coffee's suggestion and my edited example, this is the slick final result:
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow=5, ncol=2)
colnames(m) = c("Y","Z")
m <- data.frame(m)
# sample one integer from 0..Z in each row
m$randoms <- sapply(m$Z + 1, sample, 1) - 1
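A vectorized alternative swaps in runif instead of calling sample once per row: draw a uniform number in [0, Z + 1) for every row and round it down, which gives each integer 0..Z with equal probability. A sketch, not the approach used above:

```r
m <- data.frame(Y = c(2, 4, 3, 5, 1), Z = c(5, 7, 9, 3, 7))
set.seed(1)   # only for reproducibility
# runif recycles max over the draws, so each row gets its own upper bound Z + 1
m$randoms <- floor(runif(nrow(m), min = 0, max = m$Z + 1))
m
```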
