How to count the number of times a pattern changes? - r

I have a vector created by simulating a continuous-time Markov chain. The vector represents one path the chain may take. Simulating 20 steps, we could have:
Xt <- c(5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0 ,0)
Further, the chain can step down 1 at a time or jump from any state (5, 4, 3, 2, 1) straight to 0. So another simulation could be:
Xt <- c(5, 5, 5, 5, 5, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
I want to count the number of times the simulated chain jumps to another state (that is, when the value in the vector changes) within a given interval. For example:
For the first vector, the number of jumps within the first 10 elements is 2 (jumps from 5 to 4 and from 4 to 3). For the second vector, the number of jumps within the last 10 elements is 0 (the last 10 elements are all 0).
So I would like to count the number of jumps (the number of times the pattern changes). I tried using toString(Xt) and then matching a regex, but nothing worked. Any ideas?

You can use diff for this, which computes the difference between adjacent elements of a vector. Count the differences that are not equal to zero to get the total number of times the value changes.
First 10:
sum(diff(Xt[1:10])!=0)
[1] 2
Last 10:
sum(diff(tail(Xt, 10)) != 0)
[1] 0

It seems that simply counting the number of times the difference is not zero delivers the desired result:
Xt <- c(5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0 ,0)
sum(diff(Xt) != 0)
If the goal is to write a function that takes a vector and a starting position, it could be done like this:
jump_in_next_10 <- function(x, start) {
  # count value changes among the 10 elements beginning at position start
  sum(diff(x[start:(start + 9)]) != 0)
}
jump_in_next_10(Xt, 3)
#[1] 2
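A small generalisation of the idea above, as a sketch rather than a definitive implementation: the helper name count_jumps and its from/to arguments are made up for illustration. It also cross-checks the diff result against rle, since a vector made of k runs has k - 1 changes.

```r
# Hypothetical helper: count value changes in x[from:to] (both ends inclusive).
count_jumps <- function(x, from = 1, to = length(x)) {
  window <- x[from:to]
  sum(diff(window) != 0)
}

Xt1 <- c(5, 5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0, 0)
Xt2 <- c(5, 5, 5, 5, 5, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

count_jumps(Xt1, 1, 10)    # first 10 elements of the first path
count_jumps(Xt2, 11, 20)   # last 10 elements of the second path

# Cross-check: the number of runs minus one equals the number of changes.
length(rle(Xt1)$lengths) - 1 == count_jumps(Xt1)
```

Passing the window bounds explicitly avoids the easy off-by-one in expressions like (length(x) - 10):length(x), which selects 11 elements rather than 10.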

Related

R sapply new evaluation of function for each incidence in array

Thanks to lots of help, I've got an expression that substitutes the value from a rbinom into a vector, when certain conditions are met. My problem is that it always substitutes the same value, i.e. does not do a new evaluation for each instance of the conditions being met. I think I just need to wrap it in a sapply statement but haven't got the syntax correct. MWE:
arr1 <- c(8, 2, 5, 2, 3, 2, 2, 2, 8, 2, 4)
arr2 <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)
arr1
arr1[Reduce("&", list(arr1 == 2, arr2 ==1))] <- rbinom(1,1,0.5) * 2
arr1
[1] 8 2 5 0 3 0 0 0 8 0 4
I would have hoped that it changed some of the values but not others, so evaluated the result again for each instance. Is this a good application of purrr::modify2 ? Thx. J
You probably meant to use:
inds <- arr1 == 2 & arr2 == 1
arr1[inds] <- rbinom(sum(inds), 1, 0.5) * 2
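To spell out why this helps: rbinom(1, 1, 0.5) returns a single number, and R recycles that one number into every selected position. Asking rbinom for sum(inds) values gives one independent draw per position. A minimal sketch using the vectors from the question (set.seed and the names n, draws and res are only there for illustration):

```r
arr1 <- c(8, 2, 5, 2, 3, 2, 2, 2, 8, 2, 4)
arr2 <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0)

inds <- arr1 == 2 & arr2 == 1   # positions that meet both conditions
n <- sum(inds)                  # how many replacement values are needed

set.seed(1)                     # only so the sketch is reproducible
draws <- rbinom(n, 1, 0.5) * 2  # one independent draw per position: each is 0 or 2

res <- arr1
res[inds] <- draws              # positions outside inds keep their old value
res
```

No sapply or purrr::modify2 is needed here, because rbinom is already vectorised over the number of draws.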

r sequence problem - max number of changes in a given sequence

Can somebody help me understand a CS problem?
The problem is the New York Time Rollercoaster problem.
I have a queue:
queue <- seq(from = 1, to = 5)
1 2 3 4 5
A person can bribe another person who is ahead of them in the queue, but at most 2 times. Thus a queue sequence might look like:
Ride: 1, 2, 3, 4, 5 # Original queue
Ride: 1, 2, 3, 5, 4 # 5 bribes number 4
Ride: 1, 2, 5, 3, 4 # 5 bribes number 3 and thus runs out of bribes and cannot move further (it does not state in the problem if 3 can "re-bribe" 5 so I assume they cannot).
Ride: 2, 1, 5, 3, 4 # 2 bribes number 1
So given the input c(1, 2, 3, 4, 5), what is the minimum number of swaps it would take to get to the final output c(2, 1, 5, 3, 4)?
Python code from here:
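def minimumBribes(q):
    moves = 0
    for pos, val in enumerate(q):
        # val started at index val - 1; nobody can move more than 2 places forward
        if (val - 1) - pos > 2:
            return "Too chaotic"
        # count people with a larger number who now stand ahead of val
        # (range here; the Python 2 original used xrange)
        for j in range(max(0, val - 2), pos):
            if q[j] > val:
                moves += 1
    return moves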
I am trying to re-create this in R and understand the solution.
Here's a way I think -
minimumBribes <- function(final_q) {
  change <- final_q - seq_along(final_q)
  if (any(change > 2)) return("Too chaotic!")
  sum(change[change > 0])
}
minimumBribes(final_q = c(2, 1, 5, 3, 4))
[1] 3
Explanation -
initial_q <- 1:5
final_q <- c(2, 1, 5, 3, 4)
# calculate change in position; +ve is gain and -ve is loss
change <- final_q - initial_q
[1] 1 -1 2 -1 -1
# it is clear that if some gained x posn combined then other(s) lost x posn combined
# i.e. sum of posn gains and losses will always be 0
# therefore, to get min total swaps, simply add either gains or losses
# which in a way implies the most direct path from initial_q to final_q
sum(change[change > 0])
[1] 3
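For comparison, here is a sketch of a more literal R port of the Python loop, shifted to R's 1-based indexing (the name minimum_bribes_port is made up for illustration). For each person it counts how many larger numbers now stand ahead of them. The two approaches agree on the example above, but the displacement sum can undercount when someone both bribes and is bribed: for c(1, 2, 5, 3, 7, 8, 6, 4) the positive displacements add up to 6, while the loop counts 7.

```r
# Hypothetical name; a direct port of the Python loop using 1-based positions.
minimum_bribes_port <- function(q) {
  moves <- 0
  for (pos in seq_along(q)) {
    val <- q[pos]
    # val started at position val; moving more than 2 places forward is impossible
    if (val - pos > 2) return("Too chaotic")
    # anyone who bribed val can be no further ahead than one place before val's start
    lo <- max(1, val - 1)
    if (lo < pos) {
      moves <- moves + sum(q[lo:(pos - 1)] > val)
    }
  }
  moves
}

minimum_bribes_port(c(2, 1, 5, 3, 4))
minimum_bribes_port(c(1, 2, 5, 3, 7, 8, 6, 4))
minimum_bribes_port(c(2, 5, 1, 3, 4))
```

The guard lo < pos matters in R because lo:(pos - 1) would otherwise count downwards instead of producing an empty range, as Python's range does.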

Arithmetic operation (Difference) between a vector and a Matrix in R

I am facing an issue when I try to calculate the difference between a vector and a matrix in R; it gives me weird results.
matrix <- matrix(1:10, nrow = 10, ncol = 10, byrow = TRUE)
list1 <- seq(1:10)
diff <- list1-matrix
Below is the output that I am trying to achieve but it does not work. Please let me know what I am doing wrong.
We can do this in multiple ways:
1) Transpose the matrix, subtract the vector, and then transpose back
t(t(matrix)- list1)
2) Replicate the vector so that the lengths match, then take the difference
matrix - list1[col(matrix)]
3) Using sweep
sweep(matrix, 2, list1, `-`)
What you need to understand is that R stores matrices in column-major order. This means that a matrix is a sequence of numbers ordered by column, with each column read from top to bottom.
So when you subtract a vector from a matrix you can imagine it as an operation with 4 steps:
1) the matrix is turned into a sequence of numbers (a vector);
2) your "list1" is expanded (repeated) so that its length matches the length of the vector constructed from the matrix;
3) the two are subtracted element by element;
4) the numbers are assembled into matrix form again.
In your case:
1) matrix is turned into a sequence of numbers (by columns) giving you:
1, 1, 1 ,1 ... 1, 2, 2, 2, 2, ... 2, ... ... 10, 10, 10, 10, ... 10
2) Your second vector is repeated until it matches the first one:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1 ,2 ,3 ,4 ,5 ,6 ,7 ,8, 9, 10, ... 1, 2, 3, ... 10
3) Now both of them have the same length so they are subtracted from each other:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, ..., -9, -8, -7, ... 0
4) The result is transformed into a matrix again giving you the matrix that you see.
In order to obtain your desired result you have to take this into account. One way is to transpose your matrix, do the subtraction, and transpose it back again:
t(t(matrix) - list1)
This works because after the transpose t the rows are turned into columns.
Another approach is to construct a matrix out of your list1 vector:
matrix2 <- matrix(list1, nrow=10, ncol=10, byrow=TRUE)
And then do the subtraction:
diff <- matrix - matrix2
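A small check of the recycling behaviour on a 2 x 3 matrix, as a sketch with made-up values (m, v and expected are illustrative names, not from the question). It shows that plain subtraction pairs the vector with the column-major sequence, and that the transpose, col() and sweep approaches all give the same per-column result.

```r
m <- matrix(1:3, nrow = 2, ncol = 3, byrow = TRUE)  # every row is 1 2 3
v <- c(10, 20, 30)                                  # one value per column

as.vector(m)   # column-major storage: 1 1 2 2 3 3

# Plain subtraction recycles v along that sequence, so values are not matched to columns.
naive <- m - v

# Three ways to subtract v[j] from every element of column j.
by_transpose <- t(t(m) - v)
by_col_index <- m - v[col(m)]
by_sweep     <- sweep(m, 2, v, `-`)

expected <- matrix(c(-9, -18, -27), nrow = 2, ncol = 3, byrow = TRUE)
isTRUE(all.equal(by_transpose, expected))
```

Using a non-square matrix makes the mismatch easier to spot, since with a 10 x 10 matrix and a length-10 vector the shapes line up either way and R gives no warning.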

How to only get value from data.frame in R?

I am trying to calculate the probabilities for 4 dice being thrown in R. I am nearly finished; I just want to know how I can access ONLY the value in a specific row of my test1 data frame. If I write rowSums(test1[1,]) it gives me both the index AND the sum, but I only want the sum, so that I can store how many possibilities there are to get, e.g., a 4 with 4 dice.
Here's the important part of the code.
wurf1 <- c(1, 2, 3, 4, 5, 6)
wurf2 <- c(1, 2, 3, 4, 5, 6)
wurf3 <- c(1, 2, 3, 4, 5, 6)
wurf4 <- c(1, 2, 3, 4, 5, 6)
test1 <- data.frame(expand.grid(wurf1, wurf2, wurf3, wurf4))
rowSums(test1[1,]) #this gives me:
1
4 #because the sum of the values in index 1 = 4
Thank you for your help in advance.
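One possible way to get just the number, offered as a sketch: rowSums returns a named vector, so the row name is printed above the value. unname() drops that name, and sum() over the single row avoids it altogether. The counts table below is an assumption about what "how many possibilities" is meant to capture.

```r
wurf <- 1:6
test1 <- expand.grid(wurf, wurf, wurf, wurf)   # all 6^4 outcomes of 4 dice

unname(rowSums(test1[1, ]))   # the sum without the row name
sum(test1[1, ])               # same value, computed directly

# Number of outcomes for each possible total (4 to 24).
counts <- table(rowSums(test1))
counts[["4"]]                 # outcomes whose total is 4
```

Dividing counts by nrow(test1) would then give the probability of each total.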

ifelse statement in R

I'm looking at a gene in 10 people. And this gene has two alleles, say a and b. And each allele has 3 forms: type 2, 3 or 4.
a <- c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b <- c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)
I wish to code a variable that tells me how many type 4 alleles the person has: 0, 1, or 2.
var <- ifelse(a==4 & b==4, 2, 0)
The code above doesn't work since I didn't account for the individuals who have just one copy of the type 4 allele. I feel like I might need 2 ifelse statements that work simultaneously?
EDIT: You don't actually need ifelse or any fancy operations other than plus and equal to.
var <- (a == 4) + (b == 4)
If you're set on ifelse, this can be done with
var <- ifelse(a == 4, 1, 0) + ifelse(b == 4, 1, 0)
However, I prefer the following solution using apply. The following will give you three cases, the result being the number of 4's the person has (assuming each row is a person).
a = c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b = c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)
d <- cbind(a,b)
apply(d, 1, function(x) {sum(x == 4)})
For this operation, I first combined the two vectors into a matrix since it makes applying the function easier. In R, generally if data are the same type it is easier (and faster for the computer) to combine the data into a matrix/data frame/etc., then create a function to be performed on each row/column/etc.
To understand the output, consider what happens to the first row of d.
> d[1, ]
a b
2 4
> d[1, ] == 4
a b
FALSE TRUE
Booleans are interpreted as integers under addition, so
> FALSE + TRUE
[1] 1
It doesn't seem to matter whether the 4 came from a or b, so we end up with three cases: 0, 1, and 2, depending on the number of 4's.
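As a quick sanity check, the approaches above can be run side by side on the data from the question. This is only a sketch; the by_* names are illustrative, and rowSums is an extra vectorised variant that is not part of the answer above.

```r
a <- c(2, 2, 2, 2, 3, 3, 3, 2, 4, 3)
b <- c(4, 2, 3, 2, 4, 2, 3, 4, 4, 4)

by_sum    <- (a == 4) + (b == 4)                            # logicals added as 0/1
by_ifelse <- ifelse(a == 4, 1, 0) + ifelse(b == 4, 1, 0)
by_apply  <- apply(cbind(a, b), 1, function(x) sum(x == 4))
by_rowsum <- rowSums(cbind(a, b) == 4)                      # vectorised alternative to apply

by_sum
all(by_sum == by_ifelse, by_sum == by_apply, by_sum == by_rowsum)
```

All four give one count per person, so the choice between them is mostly a matter of readability.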

Resources