Generate random numbers in R satisfying constraints

I need help with code that generates random numbers subject to a constraint.
Specifically, I am trying to simulate random numbers ALFA and BETA from, respectively, a Normal and a Gamma distribution such that ALFA - BETA < 1.
Here is what I have written but it does not work at all.
set.seed(42)
n <- 0
repeat {
n <- n + 1
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1)
alfa[n] <- a
beta[n] <- b
l = length(alfa)
if (l == 10000) break
}

Due to vectorization, it will be faster to generate the numbers "all at once" rather than in a loop:
set.seed(42)
N = 1e5
a = rnorm(N, 10, 2)
b = rgamma(N, 8, 1)
d = a - b
alfa = a[d < 1]
beta = b[d < 1]
length(alfa)
# [1] 36436
This generated 100,000 candidates, of which 36,436 (about 36%) met your criterion. If you want n samples, try setting N = 4 * n: that will probably generate more than enough, and you can keep the first n.
Your loop has two problems: (a) you need curly braces to enclose multiple lines after an if statement, and (b) you are using n as an attempt counter, but it should be a success counter. As written, your loop will only stop if the 10000th attempt is a success. Move n <- n + 1 inside the if statement to fix it (alfa and beta also need to exist before you assign into them):
set.seed(42)
n <- 0
alfa = numeric(0)
beta = numeric(0)
repeat {
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1) {
n <- n + 1
alfa[n] <- a
beta[n] <- b
l = length(alfa)
if (l == 500) break
}
}
But the first way is better. Because this loop "grows" alfa and beta one element at a time and draws the numbers one at a time, it takes longer to generate 500 numbers than the vectorized code above takes to generate 30,000.
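If you need exactly n accepted pairs, the vectorized idea can be wrapped in a small function that draws candidates in batches until enough have passed. This is only a sketch: sample_constrained and its oversample argument are made-up names, and the factor of 4 is simply the rule of thumb suggested above.

```r
# Hypothetical helper (not from the original answer): vectorized rejection
# sampling in batches, keeping only pairs with a - b < 1.
sample_constrained <- function(n, oversample = 4) {
  alfa <- numeric(0)
  beta <- numeric(0)
  while (length(alfa) < n) {
    N <- oversample * (n - length(alfa))  # candidates for this batch
    a <- rnorm(N, mean = 10, sd = 2)
    b <- rgamma(N, shape = 8, rate = 1)
    keep <- (a - b) < 1                   # the constraint ALFA - BETA < 1
    alfa <- c(alfa, a[keep])
    beta <- c(beta, b[keep])
  }
  # Keep only the first n accepted pairs
  data.frame(alfa = alfa[seq_len(n)], beta = beta[seq_len(n)])
}

set.seed(42)
res <- sample_constrained(10000)
nrow(res)                     # 10000
all(res$alfa - res$beta < 1)  # TRUE
```

The vectors only grow once per batch rather than once per draw, so the cost of growing them is negligible here.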

As commented by @Gregor Thomas, your attempt fails because the curly braces enclosing the body of the if statement are missing. If you would like to skip {} for the if control, you can try the code below
set.seed(42)
r <- list()
repeat {
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1) r[[length(r)+1]] <- cbind(alfa = a, beta = b)
if (length(r) == 100000) break
}
r <- do.call(rbind,r)
such that
> head(r)
alfa beta
[1,] 9.787751 12.210648
[2,] 9.810682 14.046190
[3,] 9.874572 11.499204
[4,] 6.473674 8.812951
[5,] 8.720010 8.799160
[6,] 11.409675 10.602608

Related

Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero

In an earlier question (R: Logical Conditions Not Being Respected), I learned how to make the following simulation :
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many pairs of random numbers had to be generated before Step 1 was completed
Step 3: Repeat Step 1 and Step 2 100 times
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- 1
while(a < 12 | b < 12) {
a <- rnorm(1, 10, 1)
b <- rnorm(1, 10, 1)
i <- i + 1
}
x <- c(a,b,i)
res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
Question: Now, I am trying to make a slight modification to the above code - Instead of "a" and "b" being produced separately, I want them to be produced "together" (in math terms: "a" and "b" were being produced from two independent univariate normal distributions, now I want them to come from a bivariate normal distribution).
I tried to modify this code myself:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- 1
while(e_i$X1 < 12 | e_i$X2 < 12) {
e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
e_i$i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)
But this is producing the following error:
Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero
If I understand your code correctly you are trying to see how many samples occur before both values are >=12 and doing that for 100 trials? This is the approach I would take:
library(MASS)
for(i in 1:100){
n <- 1
while(any((x <- mvrnorm(1, mu=c(10,10), Sigma=diag(0.5, nrow=2)+0.5))<12)) n <- n+1
if(i==1) res <- data.frame("a"=x[1], "b"=x[2], n)
else res <- rbind(res, data.frame("a"=x[1], "b"=x[2], n))
}
Here I am assigning the result of mvrnorm() to x within the while() condition. In that same condition, any() checks whether either value is less than 12. If that evaluates to TRUE, n (the counter) is increased and the process is repeated. Once it evaluates to FALSE, meaning both values are at least 12, the values are appended to your data.frame and the for-loop moves on to the next trial.
Regarding your code, the mvrnorm() function is returning a vector, not a matrix, when n=1 so both values go into a single variable in the data.frame:
data.frame(mvrnorm(n = 1, c(10,10), Sigma))
Returns:
mvrnorm.n...1..c.10..10...Sigma.
1 9.148089
2 10.605546
The matrix() function within your data.frame() calls, along with some tweaks to your use of i, will fix your code:
library(MASS)
Sigma = matrix(
c(1,0.5, 0.5, 1), # the data elements
nrow=2, # number of rows
ncol=2, # number of columns
byrow = TRUE) # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:10){
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- 1
while(e_i$X1[1] < 12 | e_i$X2[1] < 12) {
e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
i <- i + 1
}
x <- c(e_i$X1, e_i$X2 ,i)
res <- rbind(res, x)
}
res = data.frame(res)
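Since mvrnorm() returns a plain numeric vector when n = 1, another option is to skip the data.frame inside the loop and index the vector by position. The sketch below assumes the same mean vector and Sigma as above; one_trial is a hypothetical helper name, and only 10 trials are run to keep it quick.

```r
library(MASS)

Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

# Hypothetical helper: draw bivariate normal pairs until both components
# are at least `threshold`, and report the pair and the number of draws.
one_trial <- function(mu = c(10, 10), threshold = 12) {
  i <- 0
  repeat {
    i <- i + 1
    x <- mvrnorm(n = 1, mu = mu, Sigma = Sigma)  # numeric vector of length 2
    if (all(x >= threshold)) break
  }
  c(a = x[[1]], b = x[[2]], i = i)
}

set.seed(1)
# replicate() returns a 3 x 10 matrix; transpose so each row is one trial
res <- as.data.frame(t(replicate(10, one_trial())))
head(res)
```

Building the result once with replicate() also avoids calling rbind() on every iteration.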

Iteration of a recurrence solution in R

I'm given a question in R language to find the 30th term of the recurrence relation x(n) = 2*x(n-1) - x(n-2), where x(1) = 0 and x(2) = 1. I know the answer is 29 from mathematical deduction. But as a newbie to R, I'm slightly confused by how to make things work here. The following is my code:
loop <- function(n){
a <- 0
b <- 1
for (i in 1:30){
a <- b
b <- 2*b - a
}
return(a)
}
loop(30)
I'm returned 1 as a result, which is way off.
In case you're wondering why this looks Python-ish, I've mostly only been exposed to Python programming thus far (I'm new to programming in general). I've tried to check out all the syntax in R, but I suppose my logic is quite fixed by Python. Can someone help me out in this case? In addition, does R have any resources like PythonTutor to help visualise the code execution logic?
Thank you!
I guess what you need might be something like below
loop <- function(n){
if (n<=2) return(n-1)
a <- 0
b <- 1
for (i in 3:n){
a_new <- b
b <- 2*b - a
a <- a_new
}
return(b)
}
then
> loop(30)
[1] 29
If you need a recursion version, below is one realization
loop <- function(n) {
if (n<=2) return(n-1)
2*loop(n-1)-loop(n-2)
}
which also gives
> loop(30)
[1] 29
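For reference, the loop in the question fails because a <- b runs before b <- 2*b - a, so b is recomputed as 2*b - b and never changes. The plain recursive version also recomputes the same terms many times. A sketch that avoids both issues by storing every term in a vector is shown below; x_terms is a made-up name.

```r
# Hypothetical helper: return x(1), ..., x(n) for
# x(n) = 2*x(n-1) - x(n-2), with x(1) = 0 and x(2) = 1.
x_terms <- function(n) {
  x <- numeric(max(n, 2))
  x[1] <- 0
  x[2] <- 1
  # seq_len(n)[-(1:2)] is 3, 4, ..., n and is empty when n <= 2
  for (k in seq_len(n)[-(1:2)]) {
    x[k] <- 2 * x[k - 1] - x[k - 2]
  }
  x[seq_len(n)]
}

x_terms(30)[30]
# [1] 29
```

Because each term is written to its own slot, no earlier value is overwritten before it is used.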
You can solve it another couple of ways.
Solve the linear homogeneous recurrence relation, let
x(n) = r^n
plugging into the recurrence relation, you get the quadratic
r^n-2*r^(n-1)+r^(n-2) = 0
, i.e.,
r^2-2*r+1=0
, i.e.,
r = 1, 1
leading to general solution
x(n) = c1 * 1^n + c2 * n * 1^n = c1 + n * c2
and with x(1) = 0 and x(2) = 1, you get c2 = 1, c1 = -1, s.t.,
x(n) = n - 1
=> x(30) = 29
Hence, R code to compute x(n) as a function of n is trivial, as shown below:
x <- function(n) {
return (n-1)
}
x(30)
#29
Use matrix powers (first find the following matrix A from the recurrence relation):
(The matrix A has a repeated eigenvalue, 1, whose algebraic multiplicity (2) exceeds its geometric multiplicity (1), so its eigenvector matrix is singular and A is not diagonalizable. Otherwise you could use the spectral decomposition yourself for fast computation of matrix powers. Here we use the expm package, as shown below.)
library(expm)
A <- matrix(c(2,1,-1,0), nrow=2)
A %^% 29 %*% c(1,0) # [x(31) x(30)]T = A^29.[x(2) x(1)]T
# [,1]
# [1,] 30 # x(31)
# [2,] 29 # x(30)
# compute x(n)
x <- function(n) {
(A %^% (n-1) %*% c(1,0))[2]
}
x(30)
# 29
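If installing expm is not an option, the same matrix power can be computed in base R. The sketch below uses exponentiation by squaring; mat_pow is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: M^k for a square matrix M and integer k >= 0,
# using exponentiation by squaring (about log2(k) squarings).
mat_pow <- function(M, k) {
  result <- diag(nrow(M))  # identity matrix
  while (k > 0) {
    if (k %% 2 == 1) result <- result %*% M
    M <- M %*% M
    k <- k %/% 2
  }
  result
}

A <- matrix(c(2, 1, -1, 0), nrow = 2)
(mat_pow(A, 29) %*% c(1, 0))[2]  # x(30)
# [1] 29
```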
You're not using the variable you're iterating on in the loop, so nothing is updating.
loop <- function(n){
a <- 0
b <- 1
for (i in 1:30){
a <- b
b <- 2*i - a
}
return(a)
}
You could define a recursive function.
f <- function(x, n) {
n <- 1:n
r <- function(n) {
if (length(n) == 2) x[2]
else r({
x <<- c(x[2], 2*x[2] - x[1])
n[-1]
})
}
r(n)
}
x <- c(0, 1)
f(x, 30)
# [1] 29

Manual simulation of Markov Chain in R

Consider the Markov chain with state space S = {1, 2}, transition matrix P with rows (1/2, 1/2) and (0, 1),
and initial distribution α = (1/2, 1/2).
Simulate 5 steps of the Markov chain (that is, simulate X0, X1, . . . , X5). Repeat the simulation 100 times. Use the results of your simulations to solve the following problems.
Estimate P(X1 = 1|X0 = 1). Compare your result with the exact probability.
My solution:
# returns Xn
func2 <- function(alpha1, mat1, n1)
{
xn <- alpha1 %*% matrixpower(mat1, n1+1)
return (xn)
}
alpha <- c(0.5, 0.5)
mat <- matrix(c(0.5, 0.5, 0, 1), nrow=2, ncol=2)
n <- 10
for (variable in 1:100)
{
print(func2(alpha, mat, n))
}
What is the difference if I run this code once or 100 times (as is said in the problem-statement)?
How can I find the conditional probability from here on?
Let
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # Different than yours
be the initial distribution and the transition matrix. Your func2 only finds n-th step distribution, which isn't needed, and doesn't simulate anything. Instead we may use
chainSim <- function(alpha, mat, n) {
out <- numeric(n)
out[1] <- sample(1:2, 1, prob = alpha)
for(i in 2:n)
out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
out
}
where out[1] is generated using only the initial distribution and then for subsequent terms we use the transition matrix.
Then we have
set.seed(1)
# Doing once
chainSim(alpha, mat, 1 + 5)
# [1] 2 2 2 2 2 2
so that the chain initiated at 2 and got stuck there due to the specified transition probabilities.
Doing it for 100 times we have
# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 100)
rowMeans(sim - 1)
# [1] 0.52 0.78 0.87 0.94 0.99 1.00
where the last line shows how often we ended up in state 2 rather than 1. That gives one (out of many) reasons why 100 repetitions are more informative: we got stuck at state 2 doing just a single simulation, while repeating it for 100 times we explored more possible paths.
Then the conditional probability can be found with
mean(sim[2, sim[1, ] == 1] == 1)
# [1] 0.4583333
while the true probability is 0.5 (given by the upper left entry of the transition matrix).
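To compare the simulated frequencies with exact values, the distribution of each Xk can be obtained by repeatedly multiplying the initial distribution by the transition matrix. The sketch below assumes the alpha and mat defined above; the names dist and exact are only for illustration.

```r
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2)

# Row k of `exact` holds the distribution of X_{k-1}
exact <- matrix(NA_real_, nrow = 6, ncol = 2,
                dimnames = list(paste0("X", 0:5), c("state1", "state2")))
dist <- alpha
for (k in 1:6) {
  exact[k, ] <- dist
  dist <- dist %*% mat  # one step of the chain: row vector times P
}

exact[, "state2"]
#       X0       X1       X2       X3       X4       X5
# 0.500000 0.750000 0.875000 0.937500 0.968750 0.984375

mat[1, 1]  # exact P(X1 = 1 | X0 = 1)
# [1] 0.5
```

These exact probabilities of being in state 2 are the quantities that rowMeans(sim - 1) estimates from the 100 simulated paths.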

Cut integer into equally sized integers and assign to vector

Let's assume an integer x. I want to split this quantity into n mostly equal chunks and save the values in a vector. E.g. if x = 10 and n = 4 then the resulting vector would be:
(3,3,2,2)
and if n = 3:
(4,3,3)
Note: The order of the resulting vector does not matter
While this will create a (probably unnecessarily) large object when x is large, it is still pretty quick:
x <- 10
n <- 4
tabulate(cut(1:x, n))
#[1] 3 2 2 3
On a decent modern machine dividing 10M records into 100K groups, it takes only 5 seconds:
x <- 1e7
n <- 1e5
system.time(tabulate(cut(1:x, n)))
# user system elapsed
# 5.07 0.06 5.13
Here are some solutions.
1) lpSolve Solve this integer linear program. It should be fast even for large x (but not if n is also large). I also tried it for x = 10,000 and n = 3 and it returned the solution immediately.
For example, for n = 4 and x = 10 it corresponds to
min x4 - x1 such that 0 <= x1 <= x2 <= x3 <= x4 and
x1 + x2 + x3 + x4 = 10 and
x1, x2, x3, x4 are all integer
The R code is:
library(lpSolve)
x <- 10
n <- 4
D <- diag(n)
mat <- (col(D) - row(D) == 1) - D
mat[n, ] <- 1
obj <- replace(numeric(n), c(1, n), c(-1, 1))
dir <- replace(rep(">=", n), n, "=")
rhs <- replace(numeric(n), n, x)
result <- lp("min", obj, mat, dir, rhs, all.int = TRUE)
result$solution
## [1] 2 2 3 3
and if we repeat the above with n = 3 we get:
## [1] 3 3 4
2) lpSolveAPI The lpSolveAPI package's interface to lpSolve supports a sparse matrix specification which may reduce storage if n is large although it may still be slow if n is sufficiently large. Rewriting (1) using this package we have:
library(lpSolveAPI)
x <- 10
n <- 4
mod <- make.lp(n, n)
set.type(mod, 1:n, "integer")
set.objfn(mod, c(-1, 1), c(1, n))
for(i in 2:n) add.constraint(mod, c(-1, 1), ">=", 0, c(i-1, i))
add.constraint(mod, rep(1, n), "=", x)
solve(mod)
get.variables(mod)
## [1] 2 2 3 3
3) Greedy Heuristic This alternative uses no packages. It starts with a candidate solution having n-1 values of x/n rounded down and one remaining value. On each iteration it tries to improve the current solution by subtracting one from the largest values and adding 1 to the same number of smallest values. It stops when it can make no further improvement in the objective, diff(range(soln)).
Note that x <- 1e7 and n <- 1e5 is quite an easy problem to solve since n divides evenly into x. In particular, system.time(tabulate(cut(...))) reports 18 sec on my machine, and for the same problem the code below takes 0.06 seconds as it gets the answer after 1 iteration.
For x <- 1e7 and n <- 1e5-1, system.time(tabulate(cut(...))) reports 16 seconds on my machine, and for the same problem the code below takes 4 seconds, finishing after 100 iterations.
In the example below, taken from the question, 10/4 rounded down is 2 so it starts out with c(2, 2, 2, 4). On the first iteration it gets c(2, 2, 3, 3). On the second iteration it cannot get any improvement and so returns the answer.
x <- 10
n <- 4
a <- x %/% n
soln <- replace(rep(a, n), n, x - (n-1)*a)
obj <- diff(range(soln))
iter <- 0
while(TRUE) {
iter <- iter + 1
soln_new <- soln
mx <- which(soln == max(soln))
ix <- seq_along(mx)
soln_new[ix] <- soln_new[ix] + 1
soln_new[mx] <- soln_new[mx] - 1
soln_new <- sort(soln_new)
obj_new <- diff(range(soln_new))
if (obj_new >= obj) break
soln <- soln_new
obj <- obj_new
}
iter
## [1] 2
soln
## [1] 2 2 3 3
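Since the order of the result does not matter, the split can also be written in closed form with integer division: every chunk gets x %/% n, and the first x %% n chunks get one extra. This is a sketch that is not taken from the answers above; split_evenly is a made-up name.

```r
# Hypothetical helper: split integer x into n chunks whose sizes differ
# by at most 1, with no loop or optimisation.
split_evenly <- function(x, n) {
  base  <- x %/% n  # size every chunk gets
  extra <- x %% n   # number of chunks that get one more
  rep(base, n) + (seq_len(n) <= extra)
}

split_evenly(10, 4)
# [1] 3 3 2 2
split_evenly(10, 3)
# [1] 4 3 3
```

It only allocates a vector of length n, so it stays cheap for the large x and n used in the timings above.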

Fastest way to do this double summation?

What is the fastest way to compute this double sum in R, i.e. the sum over i and j of (x[i,j] - mu)^2 / (omega[i]^2 * sigma[j]^2)?
This is what I have so far
ans = 0
for (i in 1:dimx[1]){
for (j in 1:dimx[2]){
ans = ans + ((x[i,j] - parameters$mu)^2)/(parameters$omega_2[i]*parameters$sigma_2[j])
}
}
where omega_2, and sigma_2 are omega^2 and sigma^2 respectively.
Nothing fancy:
# sample data
m <- matrix(1:20, 4)
sigma <- 1:ncol(m)
omega <- 1:nrow(m)
mu <- 2
sum(((m - mu) / outer(omega, sigma))^2)
Usually it is quite easy to vectorize this kind of operation. In this case, though, it is a bit trickier when the number of rows n is not equal to the number of columns m, and also because of the double summation. But here is how we can proceed:
# n = 3, m = 2
xs <- cbind(1:3, 4:6)
omegas <- 1:3
sigmas <- 1:2
mu <- 3
sum((t((xs - mu) / omegas) / sigmas)^2)
# [1] 5
Here we use recycling three times and t() to divide appropriate elements by sigmas.
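To check a vectorized version against the original double loop, both can be run on small made-up data. In the sketch below, x, mu, omega_2 and sigma_2 are arbitrary illustrative values, with omega_2 and sigma_2 holding the squared parameters as in the question.

```r
set.seed(1)
x <- matrix(rnorm(12), nrow = 4)  # 4 x 3 example data
mu <- 0.5
omega_2 <- c(1, 2, 3, 4)          # one value per row (omega^2)
sigma_2 <- c(0.5, 1, 2)           # one value per column (sigma^2)

# Double loop, as in the question
loop_sum <- 0
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    loop_sum <- loop_sum + (x[i, j] - mu)^2 / (omega_2[i] * sigma_2[j])
  }
}

# Vectorized: outer() builds the matrix of products omega_2[i] * sigma_2[j]
vec_sum <- sum((x - mu)^2 / outer(omega_2, sigma_2))

all.equal(loop_sum, vec_sum)
# [1] TRUE
```

Because the parameters here are already squared, they are divided out after squaring the numerator, rather than inside the square as in the answers above.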
