Longest conditional run in a vector in R

Given a single vector, I would like to find the longest run that meets this condition: counting stops at the first run of x > 1 that is at least 5 elements long.
For example, take the vector X:
X <- c(2,3,4,0,1,0,0,0,3,2,2,0,3,3,3,3,3,0,0,0)
My desired run has length 12; it ends where the first run of x > 1 that is at least 5 numbers long begins.
I know my question is not phrased in the most elegant way, but I think I have explained it sufficiently.

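Maybe you are looking for this:
with(rle(X > 1), {
  # length of the longest run of x > 1 with at least 5 elements
  val <- max(lengths[values & lengths >= 5])
  # index of the run just before that run
  inds <- which(values & lengths == val) - 1
  # total number of elements up to the end of the preceding run
  cumsum(lengths)[inds]
})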
#[1] 12
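The answer above picks the longest run of x > 1 with at least 5 elements, while the question asks for the first one; for X the two coincide. A minimal sketch that looks for the first qualifying run instead, and returns 0 if that run starts at position 1 (where cumsum(lengths)[0] would give an empty result). `run_before_first` and its `threshold` and `min_len` arguments are hypothetical names, and returning NA when no run qualifies is an assumption:

```r
# Hypothetical helper: number of elements before the FIRST run of
# x > threshold that is at least min_len long; NA if there is no such run.
run_before_first <- function(x, threshold = 1, min_len = 5) {
  r <- rle(x > threshold)                        # runs of TRUE/FALSE
  hit <- which(r$values & r$lengths >= min_len)  # qualifying runs
  if (length(hit) == 0) return(NA_integer_)
  # total length of all runs that come before the first qualifying one
  sum(r$lengths[seq_len(hit[1] - 1)])
}

X <- c(2,3,4,0,1,0,0,0,3,2,2,0,3,3,3,3,3,0,0,0)
run_before_first(X)
#[1] 12
```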

Related

sampling bug in R? [duplicate]

This question already has answers here:
Sample from vector of varying length (including 1)
(4 answers)
Closed 4 years ago.
I am trying to sample one element out of a numeric vector.
When the length of the vector > 1, the result is one of the numbers of the vector, as expected. However when the vector contains one element, it samples a number between 0 and this single number.
For example:
sample(c(100, 1000), 1)
results in either 100 or 1000, however
sample(c(100), 1)
results in different numbers smaller than 100.
What is going on?
Have a look at the Details of the sample function:
"If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x"
This is (unfortunately) expected behavior. See ?sample. The first line of the Details section:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x). See the examples.
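A quick way to see this convenience feature in action: repeatedly "sampling" from the length-one vector c(100) yields values spread over 1:100 rather than always 100. The seed and the number of draws are arbitrary choices for illustration:

```r
# sample() treats a single number >= 1 as the population 1:x
set.seed(1)
draws <- replicate(1000, sample(c(100), 1))
all(draws %in% 1:100)       # TRUE: every draw comes from 1:100
length(unique(draws)) > 1   # TRUE: the draws are not all equal to 100
```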
Luckily the Examples section provides a suggested fix:
# sample()'s surprise -- example
x <- 1:10
sample(x[x > 8]) # length 2
sample(x[x > 9]) # oops -- length 10!
sample(x[x > 10]) # length 0
## safer version:
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x[x > 8]) # length 2
resample(x[x > 9]) # length 1
resample(x[x > 10]) # length 0
You could, of course, also just use an if statement:
sampled_x = if (length(my_x) == 1) my_x else sample(my_x, size = 1)
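The resample() idea can also be wrapped in a small function that always treats x as the population and fails with a clear message on an empty vector, which neither sample() nor the if statement handles gracefully. A sketch; `draw_from` is a hypothetical name, and stopping on empty input is a design assumption:

```r
# Hypothetical helper: draw `size` elements from x, always treating x as
# the population (never as an upper bound), and fail clearly when empty.
draw_from <- function(x, size = 1, ...) {
  if (length(x) == 0) stop("x is empty; nothing to sample from")
  x[sample.int(length(x), size = size, ...)]
}

draw_from(100)            # always 100
draw_from(c(100, 1000))   # either 100 or 1000
```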

How to find if two or more consecutive elements of a vector are equal in R

I want to find a way to determine whether two or more consecutive elements of a vector are equal.
For example, in the vector x=c(1,1,1,2,3,1,3), the first, second and third elements are equal.
With the following command, I can determine whether a vector, say y, contains two or more consecutive elements equal to 2 or 3:
all(rle(y)$lengths[which( rle(y)$values==2 | rle(y)$values==3 )]==1)
Is there a faster way?
EDIT
Let's say we have the vector z=c(1,1,2,1,2,2,3,2,3,3).
I want a vector with three elements as output. The first element refers to the value 1, the second to 2 and the third to 3. Each element of the output is 1 if z contains two or more consecutive elements equal to that value, and 0 otherwise. So the output for the vector z will be (1,1,1).
For the vector w=c(1,1,2,3,2,3,1) the output will be 1,0,0, since only the value 1 appears in two consecutive positions (the first and second elements of w).
I'm not entirely sure I understand your question, as it could be worded better. The first part just asks how to find whether consecutive elements in a vector are equal. The answer is to use the diff() function combined with a check for a difference of zero:
z <- c(1,1,2,1,2,2,3,2,3,3)
sort(unique(z[which(diff(z) == 0)]))
# [1] 1 2 3
w <- c(1,1,2,3,2,3,1)
sort(unique(w[which(diff(w) == 0)]))
# [1] 1
But the example in your edit seems to imply you want to check for repeated values in a vector that contains only the integers 1, 2 and 3. Your output will always be X, Y, Z, where
X is 1 if there is at least one "1" repeated, else 0
Y is 2 if there is at least one "2" repeated, else 0
Z is 3 if there is at least one "3" repeated, else 0
Is this correct?
If so, see the following
continuously <- function(x){
s <- sort(unique(x[which(diff(x) == 0)]))
output <- c(0,0,0)
output[s] <- s
return(output)
}
continuously(z)
# [1] 1 2 3
continuously(w)
# [1] 1 0 0
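If the desired output is the 0/1 form from the question's edit rather than the repeated values themselves, a sketch based on rle() (which the question already uses) could look like this. `has_consecutive_run` and its `targets` argument are hypothetical names, with `targets` defaulting to 1:3 as in the examples:

```r
# Hypothetical helper: 1 if a value occurs in at least two consecutive
# positions of x, else 0, reported in the order given by `targets`.
has_consecutive_run <- function(x, targets = 1:3) {
  r <- rle(x)                           # runs of equal consecutive values
  repeated <- r$values[r$lengths >= 2]  # values forming a run of length >= 2
  as.integer(targets %in% repeated)
}

has_consecutive_run(c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3))  # 1 1 1
has_consecutive_run(c(1, 1, 2, 3, 2, 3, 1))           # 1 0 0
```

Because the result is indexed by `targets` rather than by the positions of the values, it does not depend on the order in which values first appear in x.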
Assuming your series is z=c(1,1,2,1,2,2,3,2,3,3), you can do:
(unique(z[c(FALSE, diff(z) == 0)]) >= 0)+0
which returns 1 1 1.
When you run the same command on your other sequence,
w=c(1,1,2,3,2,3,1)
then (unique(w[c(FALSE, diff(w) == 0)]) >= 0)+0 returns 1.
For output in the exact form 1,1,1 or 1,0,0, you may also try this:
(unique(z[c(FALSE, diff(z) == 0)]) == unique(z))+0 #1,1,1 for z and 1,0,0 for w
This relies on recycling and on the order in which values first appear, so it gives the expected output for these two vectors but is not a general solution.
Logic:
diff() takes the difference between each item and the one before it. Since there is always one fewer difference than there are items, I prepend FALSE so that the logical vector lines up with the original sequence. Subsetting the sequence with it keeps the items whose difference from the previous item is zero. Finally, comparing with >= 0 turns those values into TRUEs, and adding 0 converts them to 1s (any other condition that is always true would work as well).
This assumes your sequence has no negative numbers.
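To make the alignment step concrete, this traces the intermediate values for z; the padded logical vector marks each element that equals its predecessor:

```r
z <- c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3)
d <- diff(z)               # length(z) - 1 differences: z[i+1] - z[i]
keep <- c(FALSE, d == 0)   # padded to length(z); TRUE where z[i] == z[i-1]
which(keep)                # 2 6 10
z[keep]                    # 1 2 3
```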

How to append first and last element in each vector (of a different length) of a list in R, without making the code slow?

I am new to R and Stack Overflow, so if I am missing something, please let me know. I am working on simulating nonhomogeneous processes to understand neural behavior better. My code works in such a way that I have, say, 20 trials lasting 2 seconds each (each trial representing a spike train). The list SpikeTimes then contains 20 vectors, each holding the time stamps of the spikes that occurred in that particular trial. (For example, SpikeTimes[1] looks like 0.002250802 0.053934034...1.971574170 2.023090384, which means that in the first spike train, spikes occurred at 0.002250802, 0.053934034 and so on. I don't know why it also produces time stamps beyond my time limit of 2 seconds, but I will work on that later.) My code looks like this:
nTrials=20
t_max=2.0000000
LambdaInv<- function(x){ifelse( x< 15, x/30,
ifelse(x >= 15 & x < 38, ((x-15)/46)+0.5,
ifelse(x >= 38 & x <53, ((x-38)/30)+1.0,
ifelse(x >= 53 & x< 67.4, ((x-53)/72)+1.5,
((x-67.4)/30)+1.7))))}
t = 0
s = 0
X = numeric(0)
NonHomoSpikes <- function(t_max){
while(t <= t_max){
u <- runif(1)
s <- s-log(u)
t <- LambdaInv(s)
X <- c(X,t)
}
return(X)
}
SpikeTimes <- lapply(1:nTrials, function(x) NonHomoSpikes(2))
My problem is that for each vector in the list SpikeTimes, which gives the time stamps of spikes, I also want to include the beginning (0) and the end (2) of the spike train. So I want to modify the list so that each vector has 0 as its first entry and 2 as its last entry.
My SpikeTimes[1] would then look like 0 0.002250802 0.053934034...1.971574170 2, and the other SpikeTimes[i] would look similar. I tried SpikeTimes <- c(0, SpikeTimes) to insert 0 at the beginning, but it only gave the list 21 elements instead of 20, with the vector 0 as the first element (I understand why that happened). How can I do this without making my code slow? I am a newbie in R, and reading up on the internet hasn't helped with this particular problem. I would appreciate any input.
Solution: change the return statement of NonHomoSpikes to return(c(0, X, 2)).
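To stop the loop 'manually' once t exceeds t_max, so that no spike beyond the window is recorded (as described in the comments):
NonHomoSpikes <- function(t_max){
  t <- 0            # current spike time
  s <- 0            # cumulative unit-rate arrival time
  X <- numeric(0)   # spike times collected so far
  while(t <= t_max){
    u <- runif(1)
    s <- s - log(u)
    t <- LambdaInv(s)
    if(t > t_max) break     # drop the spike that overshoots the window
    X <- c(X, t)
  }
  return(c(0, X, t_max))    # pad with the start and end of the train
}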
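If the list has already been built, the same padding can be applied afterwards with lapply(), which modifies each vector individually instead of adding a new element to the list (the reason c(0, SpikeTimes) produced 21 elements). A minimal sketch: `pad_trains` is a hypothetical helper, the toy spike times are made up, and dropping times beyond t_max is an assumption based on the question's remark about overshooting spikes:

```r
# Hypothetical helper: add the start (0) and end (t_max) of the window to
# every spike train in a list, dropping any spike time beyond t_max.
pad_trains <- function(trains, t_max = 2) {
  lapply(trains, function(v) c(0, v[v <= t_max], t_max))
}

# Made-up example data standing in for SpikeTimes
trains <- list(c(0.1, 0.7, 1.9, 2.02), c(0.5, 1.2))
padded <- pad_trains(trains)
padded[[1]]     # 0.0 0.1 0.7 1.9 2.0
length(padded)  # 2: still one vector per trial
```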

Check if numbers in a vector are alternating in R

I need to check whether the first number of a vector is smaller than the second, the second greater than the third, and so on. So far I can calculate the differences between the numbers of a vector like this:
n <- sample(3) # e.g. n = c(1, 3, 2)
diff(n) # outputs 2 -1
I need to check whether the first difference is positive, the second negative, and so on. The problem is that the program must do this for a vector of any length n. How can I implement this?
Since it is not very clear what I am trying to do, here is a better example:
Let v be the vector c(1,2,4,3).
I need to check whether the first number of the vector is smaller than the second, the second greater than the third, and the third smaller than the fourth.
So I need to check whether 1 < 2 > 4 < 3 (this vector would not meet the requirements). Every number will be > 0 and is guaranteed to appear only once.
The check needs to generalize to any natural number n > 0.
v <- c(1, 2, 4, 3)
all(sign(diff(v)) == c(1, -1))
# [1] FALSE
# Warning message:
# In sign(diff(v)) == c(1, -1) :
# longer object length is not a multiple of shorter object length
We can safely ignore the warning message, since we make deliberate use of "recycling" (which means c(1, -1) is implicitly repeated to match the length of sign(diff(v))).
Edit: taking @digEmAll's comment into account, if you want to allow a negative difference rather than a positive one at the start of the sequence, then this naive change should do it:
diffs <- sign(diff(v))
all(diffs == c(1, -1)) || all(diffs == c(-1, 1))
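The recycling warning can be avoided by building the +1/-1 pattern at exactly the needed length with rep_len(). A sketch; `is_zigzag` and its `start_up` flag are hypothetical names:

```r
# Hypothetical helper: TRUE if the signs of consecutive differences
# alternate, starting with an increase (start_up = TRUE) or a decrease.
is_zigzag <- function(v, start_up = TRUE) {
  d <- sign(diff(v))
  # pattern has exactly length(d) elements, so no recycling warning
  pattern <- rep_len(if (start_up) c(1, -1) else c(-1, 1), length(d))
  all(d == pattern)
}

is_zigzag(c(1, 2, 4, 3))                    # FALSE
is_zigzag(c(1, 3, 2, 4))                    # TRUE
is_zigzag(c(3, 1, 4, 2), start_up = FALSE)  # TRUE
```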
If we only need to check that the differences within consecutive non-overlapping pairs (n[1], n[2]), (n[3], n[4]), ... alternate between positive and negative, then
all(rle(as.vector(tapply(n, as.integer(gl(length(n),
2, length(n))), FUN = diff)))$lengths==1)
#[1] TRUE
Also, as @digEmAll commented, a variation of my initial response checks every consecutive difference (it returns FALSE for the n below, as the question requires):
all(rle(sign(diff(n)) > 0)$lengths == 1)
data
n <- c(1, 2, 4, 3)

Removing zero lines from dataframe yields dataframe of zero lines

I have a script that has a bunch of quality control checksums and it got caught on a dataset that had no need to remove any samples (rows) due to quality control. However, this script gave me an unexpected result of a dataframe with zero rows. With example data, why does this work:
data(iris)
##get rid of those pesky factors
iris$Species <- NULL
med <- which(iris[, 1] < 4.9)
medtemp <- iris[-med, ]
dim(medtemp)
[1] 134 4
but this returns a dataframe of zero rows:
small <- which(iris[, 1] < 4.0)
smalltemp <- iris[-small, ]
dim(smalltemp)
[1] 0 4
As does this:
x <- 0
zerotemp <- iris[-x, ]
dim(zerotemp)
[1] 0 4
It seems that the smalltemp dataframe should be the same size as iris since there are no rows to remove at all. Why is this?
Copied verbatim from Patrick Burns's R Inferno p. 41 (I hope this constitutes "fair use" -- if someone objects I'll remove it)
negative nothing is something
> x2 <- 1:4
> x2[-which(x2 == 3)]
[1] 1 2 4
The command above returns all of the values in x2 not equal to 3.
> x2[-which(x2 == 5)]
numeric(0)
The hope is that the above command returns all of x2 since no elements are
equal to 5. Reality will dash that hope. Instead it returns a vector of length
zero.
There is a subtle difference between the two following statements:
x[]
x[numeric(0)]
Subtle difference in the input, but no subtlety in the difference in the output.
There are at least three possible solutions for the original problem.
out <- which(x2 == 5)
if(length(out)) x2[-out] else x2
Another solution is to use logical subscripts:
x2[!(x2 %in% 5)]
Or you can, in a sense, work backwards:
x2[ setdiff(seq_along(x2), which(x2 == 5)) ]
Could it be that in your second example, small evaluates to 0?
Taking the zeroth element of a vector will always return the empty vector:
> foo <- 1:3
> foo
[1] 1 2 3
> foo[0]
integer(0)
>
Instead of using which to get your indices, I would use a boolean vector and negate it. That way you can do this:
small <- iris[, 1] < 4.0
smalltemp <- iris[!small, ]
dim(smalltemp)
[1] 150 4
EDIT: I don't think a negative index of 0 (as in your case) is allowed since there is no 0th index and thus R can't exclude that index from your selection. Negative indexing can be interpreted as: "give me back all rows except those with these indices".
It is because of the rules of what to do with an index that is zero. Only strictly positive or strictly negative indices are allowed. As [0] returns nothing, and
R> -0 == 0
[1] TRUE
Hence you get nothing where you expected it to drop nothing.
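A short check of these rules on a plain vector (the results follow the indexing rules documented in ?Extract):

```r
x <- 1:3
x[0]              # integer(0): the zero index selects nothing
x[c(-1, 0)]       # 2 3: a zero mixed in with negative indices is ignored
x[-integer(0)]    # integer(0): -integer(0) is still integer(0)
identical(-0, 0)  # TRUE
```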
The integer(0) issue is treated as indexing by a NULL, and this is documented to work as if indexing by 0, hence the same behaviour.
This is discussed in the R Language Definition manual
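Applied to the question's data frame, the `if (length(out)) ... else ...` guard from the R Inferno excerpt can be wrapped in a small function. A sketch using the built-in iris data with all five columns; `drop_rows` is a hypothetical name:

```r
# Hypothetical helper: remove the rows listed in `rows`, but return df
# unchanged when `rows` is empty (avoiding the df[-integer(0), ] trap).
drop_rows <- function(df, rows) {
  if (length(rows) > 0) df[-rows, , drop = FALSE] else df
}

small <- which(iris[, 1] < 4.0)  # integer(0): no row qualifies
dim(drop_rows(iris, small))      # 150 5
med <- which(iris[, 1] < 4.9)
nrow(drop_rows(iris, med)) == sum(iris[, 1] >= 4.9)  # TRUE
```

The `drop = FALSE` keeps the result a data frame even if df has a single column.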
