Identify position range values within a vector - r

I was wondering whether it is possible to identify the positions of a range of values that meet a condition: here, the longest run of consecutive values lower than 3.
For instance,
x <- c(4, 1, 2, 1, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 1, 4, 1, 1)
Desired output:
c(7:15)
I suspect split() and rle() could be useful here, but any help is welcome.

You could run rle() on x < 3 and pick the longest run whose value is TRUE (restricting the maximum to TRUE runs, so a longer run of values >= 3 cannot hide it). The run's last position is the cumulative sum of the run lengths up to and including it, and its first position is that end minus the run's length plus one. Finally, build a sequence between the two.
rl <- rle(x < 3)
w <- which.max(rl$lengths * rl$values)  # index of the longest TRUE run
last_pos <- cumsum(rl$lengths)[w]       # last position of that run
seq.int(last_pos - rl$lengths[w] + 1, last_pos)
# [1] 7 8 9 10 11 12 13 14 15
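Since the question mentions split(), here is an alternative sketch that works on positions instead of run lengths: take the indices where the condition holds, start a new group whenever the indices stop being consecutive, and keep the longest group. It assumes at least one value meets the condition, and ties go to the first longest run because which.max() returns the first maximum.

```r
x <- c(4, 1, 2, 1, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 1, 4, 1, 1)

idx <- which(x < 3)                          # positions meeting the condition
run_id <- cumsum(c(1, diff(idx) != 1))       # new id whenever positions stop being consecutive
runs <- split(idx, run_id)                   # list of consecutive-position groups
longest <- runs[[which.max(lengths(runs))]]  # first longest group
longest
# 7 8 9 10 11 12 13 14 15
```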

Related

R sapply new evaluation of function for each incidence in array

Thanks to lots of help, I've got an expression that substitutes a value drawn with rbinom() into a vector when certain conditions are met. My problem is that it always substitutes the same value, i.e. it does not make a new draw for each element that meets the conditions. I think I just need to wrap it in an sapply() call but haven't got the syntax right. MWE:
arr1 <- c(8, 2, 5, 2, 3, 2, 2, 2, 8, 2, 4)
arr2 <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)
arr1
arr1[Reduce("&", list(arr1 == 2, arr2 ==1))] <- rbinom(1,1,0.5) * 2
arr1
[1] 8 2 5 0 3 0 0 0 8 0 4
I had hoped it would change some of the values but not others, i.e. make a new draw for each matching element. Is this a good application of purrr::modify2()? Thanks, J
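You probably want one draw per selected element. rbinom(1, 1, 0.5) returns a single value, which is then recycled to every selected position; ask for sum(inds) draws instead: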
inds <- arr1 == 2 & arr2 == 1
arr1[inds] <- rbinom(sum(inds), 1, 0.5) * 2
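A small sketch contrasting the two assignments side by side. The seed is arbitrary and only makes the run reproducible, so the particular 0/2 pattern it produces is not meaningful; the point is that the first version always writes one shared value, while the second draws independently for each selected slot.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
arr1 <- c(8, 2, 5, 2, 3, 2, 2, 2, 8, 2, 4)
arr2 <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)
inds <- arr1 == 2 & arr2 == 1

# One draw, recycled to every selected slot: all replacements are identical
one_draw <- arr1
one_draw[inds] <- rbinom(1, 1, 0.5) * 2

# One independent draw per selected slot
per_slot <- arr1
per_slot[inds] <- rbinom(sum(inds), 1, 0.5) * 2

length(unique(one_draw[inds]))  # always 1
```

No purrr is needed here; asking rbinom() for the right number of draws is enough.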

r sequence problem - max number of changes in a given sequence

Can somebody help me understand a CS problem?
The problem is HackerRank's New Year Chaos (rollercoaster queue) problem.
I have a queue:
queue <- seq(from = 1, to = 5)
1 2 3 4 5
A person can bribe the person directly in front of them to swap places, but can do so at most 2 times. Thus a queue sequence might look like:
Ride: 1, 2, 3, 4, 5 # Original queue
Ride: 1, 2, 3, 5, 4 # 5 bribes number 4
Ride: 1, 2, 5, 3, 4 # 5 bribes number 3 and thus runs out of bribes and cannot move further (it does not state in the problem if 3 can "re-bribe" 5 so I assume they cannot).
Ride: 2, 1, 5, 3, 4 # 2 bribes number 1
So, given the input c(1, 2, 3, 4, 5), what is the minimum number of swaps needed to reach the final queue c(2, 1, 5, 3, 4)?
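To make the bribing rule concrete, here is a sketch that replays the three bribes listed above. bribe() is a hypothetical helper (not part of the problem statement) that swaps the person at position i with the person directly in front of them; it shows that three adjacent swaps are enough to reach c(2, 1, 5, 3, 4).

```r
# Hypothetical helper: the person at position i bribes the person at i - 1,
# so the two swap places
bribe <- function(q, i) {
  q[c(i - 1, i)] <- q[c(i, i - 1)]
  q
}

q <- 1:5
q <- bribe(q, 5)  # 5 bribes 4 -> 1 2 3 5 4
q <- bribe(q, 4)  # 5 bribes 3 -> 1 2 5 3 4
q <- bribe(q, 2)  # 2 bribes 1 -> 2 1 5 3 4
q
# 2 1 5 3 4
```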
Python code from here:
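def minimumBribes(q):
    moves = 0
    for pos, val in enumerate(q):
        # val started at index val - 1; moving more than 2 places forward is impossible
        if (val - 1) - pos > 2:
            return "Too chaotic"
        # anyone who overtook val now sits at index val - 2 or later
        for j in range(max(0, val - 2), pos):
            if q[j] > val:
                moves += 1
    return moves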
I am trying to re-create this in R and understand the solution.
Here's one way, I think:
minimumBribes <- function(final_q) {
change <- final_q - seq_along(final_q)
if(any(change > 2)) return("Too chaotic!")
sum(change[change > 0])
}
minimumBribes(c(2, 1, 5, 3, 4))
[1] 3
Explanation -
initial_q <- 1:5
final_q <- c(2, 1, 5, 3, 4)
# calculate change in position; +ve is gain and -ve is loss
change <- final_q - initial_q
change
[1]  1 -1  2 -1 -1
# it is clear that if some gained x posn combined then other(s) lost x posn combined
# i.e. sum of posn gains and losses will always be 0
# therefore, to get min total swaps, simply add either gains or losses
# which in a way implies the most direct path from initial_q to final_q
sum(change[change > 0])
[1] 3
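The displacement shortcut gives the right answer for this example, but it can undercount when someone is bribed past and later bribes forward again. For c(1, 2, 5, 3, 7, 8, 6, 4) the positive displacements sum to 6, while 7 bribes are needed (4 is passed by 5, 7, 8 and 6; 3 by 5; 6 by 7 and 8). A sketch of a line-by-line R port of the Python solution above, which counts the bribes directly; minimum_bribes() is just an illustrative name, and the indices are shifted to R's 1-based positions.

```r
minimum_bribes <- function(q) {
  moves <- 0
  for (pos in seq_along(q)) {
    val <- q[pos]
    # nobody can end up more than 2 places ahead of where they started
    if (val - pos > 2) return("Too chaotic")
    # anyone who overtook val started behind it and moved at most 2 places,
    # so they now sit at position val - 1 or later
    from <- max(1, val - 1)
    if (from < pos) {
      moves <- moves + sum(q[from:(pos - 1)] > val)
    }
  }
  moves
}

minimum_bribes(c(2, 1, 5, 3, 4))           # 3
minimum_bribes(c(1, 2, 5, 3, 7, 8, 6, 4))  # 7
minimum_bribes(c(2, 5, 1, 3, 4))           # "Too chaotic"
```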

Variable sample upper value in R

I have the following matrix
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow = 5, ncol = 2)
colnames(m) <- c("Y", "Z")
m <- data.frame(m)
I am trying to generate a random integer for each row, where the upper limit depends on a value in that row (in this case, each row's value of Z).
I currently have:
samp<-function(x){
sample(0:1,1,replace = TRUE)}
m$randoms <- apply(m, 1, samp)
which works well, applying the sample function independently to each row, but I always get an error when I try to make the range passed to sample() depend on the row. I thought I could do something like this:
samp<-function(x){
sample(0:m$Z,1,replace = TRUE)}
m$randoms <- apply(m, 1, samp)
but I guess that was wishful thinking.
Ultimately I want the result:
Y Z randoms
2 5 4
4 7 7
3 9 3
5 3 1
1 7 6
Any ideas?
The following will sample from 0 to m$Z for each row and store the result in randoms:
m$randoms <- sapply(m$Z + 1, sample, 1) - 1
Explanation:
The sapply takes each value in m$Z separately (let's call it y) and calls sample(y + 1, 1) on it.
Note that (e.g.) sample(y+1, 1) will sample 1 random integer from the range 1:(y+1). Since you want a number from 0 to y rather than 1 to y + 1, we subtract 1 at the end.
Also, just pointing out - no need for replace=T here because you are only sampling one value anyway, so it doesn't matter whether it gets replaced or not.
Based on @mathematical.coffee's suggestion and my edited example, this is the slick final result:
m <- matrix(c(2, 4, 3, 5, 1, 5, 7, 9, 3, 7), nrow = 5, ncol = 2)
colnames(m) <- c("Y", "Z")
m <- data.frame(m)
# one draw from 0:Z per row
m$randoms <- sapply(m$Z + 1, sample, 1) - 1
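Two equivalent ways to write the same per-row draw, as a sketch (the data frame is built directly with data.frame() so the block is self-contained). The first spells out the 0:z range explicitly; the second is a swapped-in vectorized technique, not from the answers above: runif() returns values strictly between 0 and 1, so floor(u * (Z + 1)) is a uniform integer from 0 to Z for every row at once, with no loop.

```r
m <- data.frame(Y = c(2, 4, 3, 5, 1), Z = c(5, 7, 9, 3, 7))

# Per-row draw with an explicit 0:z range
m$randoms <- sapply(m$Z, function(z) sample(0:z, 1))

# Vectorized alternative: one uniform integer in 0..Z per row
m$randoms2 <- floor(runif(nrow(m)) * (m$Z + 1))
m
```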

R: How to find max length sequence between two values in a vector?

Let's say I want to find the length of the longest run of consecutive non-zero numbers in a vector in R.
Example: for c(0, 2, 3, 0, 5) it should return 2.
The solution I came up with is as follows:
A1 <- c(1, 1, 0,1,1,1)
length <- NULL
B <- rle(A1 == 0)
C <- B$lengths
D <- B$values
for (i in 1:length(C)) {
  if (D[i] == FALSE) { length[i] <- C[i] }
}
length <- length[!is.na(length)]
max(length)
[1] 3
How can I find the longest sequence of non-zero numbers in a vector in R?
We could use rle. A == 0 gives a logical vector, and rle() computes the lengths and values of runs of equal adjacent elements. Extract the lengths of the runs whose value is FALSE (the non-zero elements) and take the max, after dropping the first and last runs so that non-zero runs at the start or end of the vector are not counted.
A <- c(0, 2, 3, 0, 5)
max(with(rle(A == 0), lengths[-c(1, length(lengths))][
  !values[-c(1, length(values))]]))
#[1] 2
Another example
A1 <- c(1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1)
max(with(rle(A1==0), lengths[-c(1, length(lengths))][
!values[-c(1, length(values))]]))
#[1] 4
Or
indx <- A1==0
max(with(rle(A1[which(indx)[1L] : tail(which(indx),1)]==0),
lengths[!values]))
#[1] 4
Update
Based on the new info, maybe you can try:
A1 <- c(1, 1, 0,1,1,1)
max(with(rle(A1==0), lengths[!values]))
#[1] 3
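If the code should also handle vectors with no non-zero values at all (where max() of an empty vector returns -Inf with a warning), the Update version can be wrapped in a small function. longest_nonzero_run() is a hypothetical name for this sketch, and it assumes the input has no NA values.

```r
longest_nonzero_run <- function(v) {
  r <- rle(v != 0)                # runs of non-zero (TRUE) and zero (FALSE)
  if (!any(r$values)) return(0L)  # no non-zero run: avoid max() of an empty vector
  max(r$lengths[r$values])
}

longest_nonzero_run(c(0, 2, 3, 0, 5))     # 2
longest_nonzero_run(c(1, 1, 0, 1, 1, 1))  # 3
longest_nonzero_run(c(0, 0, 0))           # 0
```

Unlike the first version above, this counts non-zero runs at the start and end of the vector too.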

ifelse statement in R

I'm looking at a gene in 10 people. The gene has two alleles, say a and b, and each allele comes in 3 forms: type 2, 3 or 4.
a <- c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b <- c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)
I wish to code a variable that tells me how many type 4 alleles the person has: 0, 1, or 2.
var <- ifelse(a==4 & b==4, 2, 0)
The code above doesn't work since I didn't account for the individuals who have just one copy of the type 4 allele. I feel like I might need 2 ifelse statements that work simultaneously?
EDIT: You don't actually need ifelse or any operations beyond addition and equality comparison:
var <- (a == 4) + (b == 4)
If you're set on ifelse, this can be done with
var <- ifelse(a == 4, 1, 0) + ifelse(b == 4, 1, 0)
However, I prefer the following solution using apply. The following will give you three cases, the result being the number of 4's the person has (assuming each row is a person).
a = c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b = c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)
d <- cbind(a,b)
apply(d, 1, function(x) {sum(x == 4)})
For this operation, I first combined the two vectors into a matrix, since that makes applying the function easier. In R, when data are all the same type it is often convenient to combine them into a matrix (or data frame) and then apply a function to each row or column. Note that apply() still calls the function once per row, so for large data the vectorized (a == 4) + (b == 4) above will be faster.
To understand the output, consider what happens to the first row of d.
> d[1, ]
a b
2 4
> d[1, ] == 4
a b
FALSE TRUE
Logical values are coerced to integers under addition (FALSE is 0, TRUE is 1), so
> FALSE + TRUE
[1] 1
It doesn't matter whether the 4 came from a or b, so we end up with three cases: 0, 1, and 2, depending on the number of 4's.
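The row-wise apply() can also be collapsed into one vectorized call, sketched here with the same a and b vectors: comparing the whole matrix to 4 gives a logical matrix, and rowSums() counts the TRUE values for each person.

```r
a <- c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b <- c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)
d <- cbind(a, b)

# d == 4 is a logical matrix; rowSums counts the TRUEs in each row
n_type4 <- rowSums(d == 4)
n_type4
# 1 0 0 0 1 0 0 1 2 1
```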
