Find P(X<Y<Z) in R

I want to find P(X<Y<Z) in R. For each value z_i, I want to check whether the conditions are satisfied. I demonstrate the problem below, using the ifelse function in R. I don't know how to put multiple conditions within ifelse: when I type ifelse(z[i]>y>x, 1, 0), I get errors. I want to know how to include this.
x = c(1,1)
y = c(2,2)
z = c(3,3)
value = NULL
n1 = length(x)
n2 = length(y)
n3 = length(z)
for(i in 1: length(z)){
value[i] = sum (ifelse(z[i]>y & z[i]> x & y > x, 1, 0))
}
value
The desired output should be 4 4. But the above code gives 2 2. Thanks in advance.
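A minimal sketch of one way to get 4 4, assuming the goal is to count, for each z_i, every pair (x_j, y_k) with x_j < y_k < z_i. R does not allow chained comparisons such as z[i]>y>x (they are a syntax error), so each condition has to be written on its own. The loop above combines them with y > x, which compares x and y element by element and therefore only sees the pairs (x_1, y_1) and (x_2, y_2); building the full x-by-y comparison table with outer() checks all of them. The names xy and zi are just for illustration.

```r
x <- c(1, 1)
y <- c(2, 2)
z <- c(3, 3)

# xy[j, k] is TRUE when x[j] < y[k]; computed once and reused for every z value
xy <- outer(x, y, "<")

# For each z[i], keep only the columns k with y[k] < z[i] and count the TRUE cells,
# i.e. the number of (x[j], y[k]) pairs with x[j] < y[k] < z[i]
value <- vapply(z, function(zi) sum(xy[, y < zi, drop = FALSE]), integer(1))
value
#> [1] 4 4

# Dividing the total count by the number of (x, y, z) triples gives the
# empirical proportion of triples with X < Y < Z
sum(value) / (length(x) * length(y) * length(z))
#> [1] 1
```

The drop = FALSE keeps xy a matrix even when no y[k] is below z[i], so the count is simply 0 in that case.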

Related

R Limit matrix in random walk

I have the following code for a random walk, in which I start from i and add up cumulatively for each line.
However, I need to stop the random walk on each line once it is absorbed. One way I thought of doing this would be to find, for each line, the index j where the value is first less than or equal to 0 or greater than or equal to t, and replace everything from that index onward with null.
simulate_binomial = function(cenarios, rodadas, p){
return(matrix(data=rbinom(cenarios*rodadas, 1, p), nrow=cenarios, ncol=rodadas))
}
i = 2
t = 10
p = 0.8
max_walk = 100
samples = simulate_binomial(1000, max_walk, p)
samples[samples==0] = -1
walk = t(apply(cbind(i, samples), 1, cumsum))
walk1 = apply(walk, 1, function(x) (which((x <= 0) | (x >= t))[1]))
So walk1 holds, for each line, the first index at which the value is less than or equal to zero or greater than or equal to t. However, I don't know how to assign null from this index onward in the line.
I want to assign null so that I can plot precisely, without this null part, and see the effect of ruin on each line / "scenario".
Can anyone help me, please?
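You can change your last apply to the following. R vectors cannot contain NULL, so NA is used instead; plotting functions such as plot() and matplot() leave NA values out, which gives the truncated lines you want: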
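walk1 <- t(apply(walk, 1, function(x) {
  inds <- which((x <= 0) | (x >= t))[1]  # first index where the walk is absorbed (NA if never)
  if (!is.na(inds) && inds < length(x)) {
    x[(inds+1):length(x)] <- NA  # blank out everything after absorption
  }
  x
}))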
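A minimal self-contained sketch of the same idea, with explicit guards for two edge cases: a walk that is never absorbed (which() finds nothing, so the index is NA) is left unchanged, and a walk absorbed at its very last step is not extended. The helper name truncate_after_hit and the parameter names lower and upper are hypothetical; upper is used instead of t so the variable does not shadow the name of the transpose function t(). Because matplot() does not draw NA points, each line stops where its walk was absorbed.

```r
# Hypothetical helper: keep a walk up to and including the first step at which it
# reaches `lower` or `upper`, and set every later step to NA
truncate_after_hit <- function(x, lower, upper) {
  hit <- which(x <= lower | x >= upper)[1]  # NA if the walk is never absorbed
  if (!is.na(hit)) x[seq_along(x) > hit] <- NA
  x
}

set.seed(1)
start <- 2; upper <- 10; p <- 0.8
n_walks <- 5; n_steps <- 100

# +1 with probability p, -1 otherwise, one row per walk
steps <- matrix(ifelse(rbinom(n_walks * n_steps, 1, p) == 1, 1, -1), nrow = n_walks)
walk <- t(apply(cbind(start, steps), 1, cumsum))

# apply() returns one column per row of `walk`, hence the outer t()
walk_trunc <- t(apply(walk, 1, truncate_after_hit, lower = 0, upper = upper))

# matplot() plots the columns of its argument and skips NA values
matplot(t(walk_trunc), type = "l", lty = 1, xlab = "step", ylab = "position")
```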

Listing all integer compositions in R

What I'm looking for is basically an R version of the answer to this question: Generating all permutation of numbers that sums up to N. First, the answer uses Java, which I have a really hard time reading. Second, the code uses a "deque", which I can't figure out how to implement in R.
I have found several algorithms to do this, but they have all been written in programming languages using structures not available in R, such as deques, heaps or list comprehensions.
What I actually need is a way of finding all the vectors v of length N-1 where:
sum(v * 1:(N-1)) == N
and I think I can manage that myself if only I find a way of obtaining all the ordered integer partitions.
For example, for N = 4 all the ordered integer partitions using the numbers 1 to N-1 are:
1+1+1+1
1+1+2
1+3
2+2
What I effectively need is output of either the form:
c(1,1,1,1)
c(1,1,2)
c(1,3)
c(2,2)
Or of the form:
c(4,0,0)
c(2,1,0)
c(1,0,1)
c(0,2,0)
since I should be able to convert the former format to the latter by myself. Any hint as to how to approach this problem using R would be greatly appreciated. The latter format is exactly the vectors v such that sum(v * 1:3) is 4.
EDIT:
My own attempt:
rek = function(mat, id1, id2){
if(id1 + id2 != length(mat) + 1){ #If next state not absorbing
mat[id1] = mat[id1] - 1
mat[id2] = mat[id2] - 1
mat[id1+id2] = mat[id1+id2] + 1
out = mat
id = which(mat > 0)
for(i in id){
for(j in id[id>=i]){
if(j == i & mat[i] == 1){
next
}
out = rbind(out, rek(mat,i,j))
}
}
return(out)
}
}
n = 4  # number of individuals (N in the text)
start = c(n, rep(0, n-2))
states = rbind(start, rek(start, 1, 1))
states = states[!duplicated(states), ] #only unique states.
This is incredibly inefficient. E.g., when n = 11, my states has over 120,000 rows before removing duplicates, which leaves only 55 rows.
EDIT 2:
Using the parts() function described below I came up with:
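n = 4  # number of individuals (N in the text)
states = rep(0, n-1)  # placeholder first row, removed at the end
temp = partitions::parts(n)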
temp = t(temp)
for(i in 1:length(temp[,1])){
row = temp[i,]
if(any(row>(n-1))){#if absorbing state
next
}
counts = plyr::count(row[row>0])
newrow = rep(0,n-1)
id = counts$x
numbs = counts$freq
newrow[id] = numbs
states = rbind(states, newrow)
}
states = states[-1,]#removing the first row, added manually
which gives me exactly the vectors v such that sum(v * 1:(N-1)) is N.
If anyone is interested, this is to be used within coalescent theory, as a way to describe the possible relations between N individuals, omitting the case where all are related. As an example with N = 4:
(4, 0, 0) -- No individuals are related
(2, 1, 0) -- Two individuals are related, the rest are not
(0, 2, 0) -- The individuals are pair-wise related
(1, 0, 1) -- Three individuals are related, the other individual is not.
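As a base R alternative to the plyr::count() step in EDIT 2, tabulate() counts how often each part 1, ..., N-1 occurs in a partition and pads with zeros, which converts the first output format directly into the second. The list parts_list below is typed in by hand from the N = 4 example and is only a stand-in for whatever partition generator you use.

```r
N <- 4

# The ordered partitions of N = 4 into parts 1..N-1, as listed in the question
parts_list <- list(c(1, 1, 1, 1), c(1, 1, 2), c(1, 3), c(2, 2))

# tabulate(p, nbins) returns the counts of 1, 2, ..., nbins in p, so each
# partition becomes a count vector v of length N - 1 (one row per partition)
v <- t(vapply(parts_list, tabulate, integer(N - 1), nbins = N - 1))
v

# Every row satisfies sum(v * 1:(N-1)) == N
v %*% seq_len(N - 1)
```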
Hopefully parts() from the partitions package helps:
library(partitions)
N <- 4
res <- unique(lapply(asplit(parts(N),2),function(x) sort(x[x>0])))[-1]
which gives
> res
[[1]]
[1] 1 3
[[2]]
[1] 2 2
[[3]]
[1] 1 1 2
[[4]]
[1] 1 1 1 1
If you would like to write a custom base R function, here is a recursive version
f <- function(n, vhead = n, v = c()) {
if (n == 0) return(list(v))
unlist(lapply(seq_len(min(n, vhead)), function(k) f(n - k, k, c(k,v))), recursive = FALSE)
}
then we can run
res <- Filter(function(x) length(x)>1,f(N))
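For small N, the count vectors v can also be found by brute force, without generating partitions first: each count v[k] lies between 0 and N %/% k, so expand.grid() can list every candidate vector and keep those with sum(v * 1:(N-1)) == N. This is only a sketch; the number of candidates grows very quickly with N, so it is practical for small N only. The names ranges, grid and keep are illustrative.

```r
N <- 4

# Candidate values for each count v[k]: part k can occur at most N %/% k times
ranges <- lapply(seq_len(N - 1), function(k) 0:(N %/% k))

# Every combination of counts, one per row (the first column varies fastest)
grid <- as.matrix(expand.grid(ranges))

# Keep the rows whose weighted sum sum(v * 1:(N-1)) equals N
keep <- as.vector(grid %*% seq_len(N - 1)) == N
v <- unname(grid[keep, , drop = FALSE])
v
```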

Check if each number in a vector is between some numbers in another vector in R

Say I have two vectors, A and B. A has 15 elements and B has 28 elements.
A = c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120)
B = c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107)
I want a 14 by 27 matrix Z whose (i, j) entry is 1 if (B_j, B_{j+1}] overlaps with (A_i, A_{i+1}].
For instance, the (3,4) entry of Z would be 1 since (29,31] and (29,30] overlap, with 30 as a common number. Is there a fast way to compute this?
I have the following code:
Z = matrix(0, length(A)-1, length(B)-1)
for (i in 1:(length(A)-1)){
nn = which(B > A[i] & B <= A[(i+1)])
if (length(nn)>0){
Z[i,(nn-1)] = 1}}
It works well, but my A and B vectors often contain 30,000+ elements and it is incredibly slow; even creating the matrix Z takes a long time. Can anyone help with this?
Ideally, there is a vectorized solution to this, or a well-written function from a package that makes this a piece of cake.
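Here's an option using outer() comparisons. As noted in the comments, the matrices can get big, so you'll have to see whether the speed improvement is worth it. res3[i, j] is TRUE when B_j falls in (A_i, A_{i+1}], so res3 has one column per element of B; dropping its first column (res3[, -1]) lines it up with Z: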
res1 <- outer(A, B, FUN = function(A, B){B > A})
res2 <- outer(A, B, FUN = function(A, B){B <= A})
dim(res1); dim(res2)
res3 <- (res1[-nrow(res1),] + res2[-1,]) == 2
image(res3)
dim(res3)
op <- par(mfcol=c(1,2))
image(Z, main="Z")
image(res3, main="res3")
par(op)
If closed intervals [B_j, B_{j+1}] and [A_i, A_{i+1}] are OK for you as well, you could use foverlaps() from data.table:
library(data.table)  # for data.table(), shift() and foverlaps()
A <- as.integer(c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120))
B <- as.integer(c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107))
DT_A <- data.table(A0 = A, A1 = shift(A, type = "lead"), key=c("A0", "A1"))[-length(A)]
DT_B <- data.table(B0 = B, B1 = shift(B, type = "lead"), key=c("B0", "B1"))[-length(B)]
ind_true <- foverlaps(DT_A, DT_B, type="any", mult="all", which=TRUE)[!is.na(yid)]
mat <- matrix(0, length(A)-1, length(B)-1)
mat[cbind(ind_true$xid, ind_true$yid)] = 1  # matrix indexing sets only the matched (xid, yid) pairs
This answer uses matrix indexing together with a fast replacement for expand.grid(). Pair each vector with its shifted copy to get two-column matrices of interval endpoints, build every (i, j) index pair with the expanded grid, and index into both matrices with a function that does a simple Boolean check for overlap of closed intervals. The result is returned as a matrix.
overlap = function(id,x1,x2){
idA = id[,1]
idB = id[,2]
o = (x1[idA,1] >= x2[idB,1] & x1[idA,1] <= x2[idB,2]) | (x1[idA,2] >= x2[idB,1] & x1[idA,2] <= x2[idB,2]) |
(x1[idA,1] <= x2[idB,1] & x1[idA,2] >= x2[idB,1]) | (x1[idA,1] <= x2[idB,2] & x1[idA,2] >= x2[idB,2])
matrix(o,nrow=nrow(x1))
}
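A = c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120)
nA = cbind(A[-length(A)], A[-1])  # rows are (A[i], A[i+1]); same as dplyr's cbind(lag(A), A)[-1,], while stats::lag() does not shift a plain vector
B = c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107)
nB = cbind(B[-length(B)], B[-1])  # rows are (B[j], B[j+1])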
expand.grid.jc <- function(seq1,seq2) {
cbind(Var1 = rep.int(seq1, length(seq2)),
Var2 = rep.int(seq2, rep.int(length(seq1),length(seq2))))
}
ids = expand.grid.jc(1:nrow(nA),1:nrow(nB))
overlap(ids,nA,nB)
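A vectorized sketch that follows the definition in the question literally, assuming true overlap of the half-open intervals is what's wanted: two non-empty intervals (l1, r1] and (l2, r2] share a point exactly when l1 < r2 and l2 < r1. Empty intervals such as (26, 26], which come from repeated values in B, are excluded explicitly. Unlike checking only whether an endpoint of B falls inside (A_i, A_{i+1}], this also flags B intervals that contain an A interval entirely, e.g. (49, 80] and (51, 59]. Like the outer() answer, it builds full (length(A) - 1) x (length(B) - 1) matrices, so with 30,000+ elements memory will be the limiting factor. The names a_lo, a_hi, b_lo, b_hi and overlaps are illustrative.

```r
A <- c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120)
B <- c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107)

# Endpoints of the half-open intervals (A[i], A[i+1]] and (B[j], B[j+1]]
a_lo <- head(A, -1); a_hi <- A[-1]
b_lo <- head(B, -1); b_hi <- B[-1]

# (l1, r1] and (l2, r2] overlap iff l1 < r2 and l2 < r1, provided both are non-empty
overlaps <- outer(a_lo, b_hi, "<") &     # A[i] < B[j+1]
  outer(a_hi, b_lo, ">") &               # B[j] < A[i+1]
  outer(a_lo < a_hi, b_lo < b_hi, "&")   # both intervals non-empty
Z <- 1 * overlaps

dim(Z)    # 14 27
Z[3, 4]   # 1: (29, 31] and (29, 30] share (29, 30]
```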

Merging two vectors at random in R

I have two vectors x and y, where x is longer than y. For example (x is all zeros here, but that need not be the case):
x = rep(0,20)
y = c(2,3,-1,-1)
What I want to accomplish is to overlay copies of y onto x at random positions. In the above example, x might then look like
0,0,2,3,-1,-1,0,0,0,0,2,3,-1,-1,...
Basically, I'll step through each value in x, pull a random number, and if that random number is less than some threshold, I want to overlay y for the next 4 places in x unless I've reached the end of x. Would any of the apply functions help? Thanks much in advance.
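A minimal sketch of the loop described above, written as a hypothetical helper overlay_random(x, y, threshold). A while loop fits better than an apply function here, because the next position to look at depends on whether y was just written. As an assumption about the end of x, y is only placed where it fits completely; x keeps its original values everywhere else.

```r
# Hypothetical helper: walk through x; at each position draw u ~ Uniform(0, 1),
# and if u < threshold (and y still fits), write y into the next length(y) places
overlay_random <- function(x, y, threshold) {
  n <- length(x)
  m <- length(y)
  i <- 1
  while (i <= n - m + 1) {          # y no longer fits after this position
    if (runif(1) < threshold) {
      x[i:(i + m - 1)] <- y         # overlay y starting at position i
      i <- i + m                    # continue right after the overlaid block
    } else {
      i <- i + 1
    }
  }
  x
}

set.seed(123)
x <- rep(0, 20)
y <- c(2, 3, -1, -1)
overlay_random(x, y, threshold = 0.3)
```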
A simple way of doing it would be to choose points at random (the same length as x) from the two vectors combined:
sample(c(x, y), length(x), replace = TRUE)
If you want to introduce some probability into it, you could do something like:
p <- c(rep(2, each = length(x)), rep(1, each = length(y)))
sample(c(x, y), length(x), prob = p, replace = TRUE)
This is saying that an x point is twice as likely to be chosen over a y point (change the 2 and 1 in p accordingly for different probabilities).
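Short answer: yes :-). Write a function along these lines:
ranx <- runif(length(x) - length(y) + 1)  # one draw per possible start position
# some loop or apply func over j ...
if (ranx[j] < threshold) x[j:(j + length(y) - 1)] <- y  # note the parentheses: j:j+length(y) parses as (j:j)+length(y)
# and make sure to stop the loop at length(x) - length(y) + 1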
Something like the following worked for me.
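i = 1
while (i <= length(x) - length(y) + 1) {  # stop once y no longer fits in x
  p.rand = runif(1, 0, 1)
  if (p.rand < prob[i]) {
    x[i:(i + length(y) - 1)] = y  # overlay y on the next length(y) places
    i = i + length(y)             # resume right after the overlaid block
  } else {
    i = i + 1
  }
}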
where prob is a probability vector with one entry per position of x.

Actual values of the maximum value in each interval

Let's assume
x = c(1, 2, 3.5, 4, 6, 7.5, 8, 9, 10, 11.5, 12)
y = c(2.5, 6.5)
I = split(x, findInterval(x, y))
f = function(I$'i', x) {
d = pmax(outer(x, I$'i', "-"), 0)
colSums(d - d^2/2)
}
I want to evaluate f(I$'i', x) at each value of each interval, and then find which actual value in I$'i' gives the maximum of f(I$'i', x) in that interval. For example, if there are three intervals, my result should be three values of x: the ones at which f(I$'i', x) is largest in each interval. How can I find these values?
Note also that the vector y changes in each iteration of my code.
I wrote this code, but I cannot find the actual values that give the maximum in each interval:
for(i in 0:length(I)-1){
max.value = I$'i'[which.max(f(I$'i', x))]
}
and I got this error:
Error in pmax(outer(x, I, "-"), 0) :
cannot mix 0-length vectors with others
The problem is attempting to index the ith element of the list. Doing I$'i' is trying to get the element of the list corresponding to the string 'i', which doesn't exist:
> i <- 1
> I$'i'
NULL
To fix this, index the list with the [[...]] notation, which selects elements by position (so I[[1]] is the same as I$'0'):
> i <- 1
> I[[i]]
[1] 1 2
> I$'0' # to illustrate the indexing
[1] 1 2
Assuming that f is just meant to take a vector (rather than an index into I), its definition should be something like:
f = function(vec, x) {
d = pmax(outer(x, vec, "-"), 0)
colSums(d - d^2/2)
}
And the loop like:
for (i in seq_along(I)) {
max.value = I[[i]][which.max(f(I[[i]], x))]
}
Note that you can iterate directly over the elements of a list; you don't need to index each one individually, so we could also do:
for (vec in I) {
max.value = vec[which.max(f(vec, x))]
}
(Also, you might want something slightly different from what you have, since max.value is overwritten in each iteration of the loop.)
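One way to keep every result instead of overwriting max.value is to let sapply() collect one value per interval. This sketch reuses the vector version of f from this answer and wraps the split and the search in a hypothetical helper max_per_interval(x, y), so it can simply be called again whenever y changes. Because split() names the list elements after the findInterval() codes ("0", "1", "2" here), the result is named by interval.

```r
x <- c(1, 2, 3.5, 4, 6, 7.5, 8, 9, 10, 11.5, 12)
y <- c(2.5, 6.5)

f <- function(vec, x) {
  d <- pmax(outer(x, vec, "-"), 0)
  colSums(d - d^2/2)
}

# Hypothetical helper: split x by the break points y and return, for each
# interval, the element of that interval at which f is largest
# (which.max() picks the first one in case of ties)
max_per_interval <- function(x, y) {
  I <- split(x, findInterval(x, y))
  sapply(I, function(vec) vec[which.max(f(vec, x))])
}

max_per_interval(x, y)
```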
