Is it possible to solve the following problem in R?
In particular, I want to find the values of a1 and a2 minimizing the loss below:
> n <- 1000
> x <- rnorm(n, 1, 1)
> e <- rnorm(n, 0, 1)
> d <- a1+a2*x+e < 0
> loss <- (mean(d) - 0.5) + (mean((a1 + a2*x + e)[d == 0]) - 2)
That is, I want to find the values of a1 and a2 that make mean(d) and mean((a1+a2*x+e)[d == 0]) as close as possible to 0.5 and 2, respectively.
(The target values 0.5 and 2 are just placeholders.)
Use optim with a function f that computes the loss. p is the vector of parameters, i.e. p[1] is your a1 and p[2] your a2. Square each deviation so that the loss is zero exactly when both targets are met; a sum of signed deviations has no such minimum. Use reasonable starting values when calling optim with your function.
f <- \(p) {
  y <- p[1] + p[2]*x + e
  d <- y < 0
  # squared deviations from the targets; !d selects the cases with d == 0
  (mean(d) - 0.5)^2 + (mean(y[!d]) - 2)^2
}
res <- optim(c(0, 0), f)
res$par  ## a1 and a2
Note that d is already boolean.
If you get different results with different starting values, the loss surface probably has several local minima.
Data:
n <- 1e3; set.seed(42); x <- rnorm(n, 1, 1); e <- rnorm(n, 0, 1)
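To check how sensitive the result is to the starting values, one option is to run optim from several starts and compare the minima. The sketch below assumes a squared-deviation loss (the question only says "as close as possible") and treats d == 0 as !d; the names sq_loss, starts and fits and the particular starting values are made up for illustration.

```r
set.seed(42)
n <- 1e3
x <- rnorm(n, 1, 1)
e <- rnorm(n, 0, 1)

# Squared-deviation loss (an assumption, not the loss from the question)
sq_loss <- function(p) {
  y <- p[1] + p[2] * x + e
  d <- y < 0
  if (all(d)) return(1e6)  # no cases with d == FALSE: return a large penalty
  (mean(d) - 0.5)^2 + (mean(y[!d]) - 2)^2
}

# Hypothetical starting values for (a1, a2)
starts <- list(c(0, 0), c(-1, 1), c(1, -1), c(-2, 2))
fits <- lapply(starts, function(s) optim(s, sq_loss))

# Minimum reached from each start; keep the best fit
values <- sapply(fits, function(fit) fit$value)
best <- fits[[which.min(values)]]
best$par
```

If the entries of values differ noticeably, the starts ended in different local minima, and the fit with the smallest value is the one to keep.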
I want to maximize the following function subject to the given constraints.
-p1*log(p1) - p3*log(p3) - p5*log(p5)
subject to p1 + p3 + p5 = 1
and p1 + 3*p3 + 5*p5 = 3.5
p1, p3 and p5 all lie between 0 and 1 [they are probabilities].
My question is: how do I solve this in R? From what I saw, constrOptim() is one of the functions commonly used to solve this type of problem. However, I could not figure it out.
Any help is appreciated.
Package Rsolnp uses Lagrange multipliers to solve non-linear problems with equality constraints. Here is how it would be set up. f returns the negative of the objective because solnp minimizes. eps is a small positive lower bound that keeps the logarithms from producing NaN values.
library(Rsolnp)
f <-function(X) {
x <- X[1]
y <- X[2]
z <- X[3]
res <- -x*log(x) - y*log(y) - z*log(z)
-res
}
eq_f <- function(X){
x <- X[1]
y <- X[2]
z <- X[3]
c(
x + y + z,
x + 3*y + 5*z
)
}
eps <- .Machine$double.eps*10^2
X0 <- c(0.1, 0.1, 0.1)
sol <- solnp(
pars = X0,
fun = f,
eqfun = eq_f,
eqB = c(1, 3.5),
LB = rep(eps, 3),
UB = rep(1, 3)
)
#
#Iter: 1 fn: -1.0512 Pars: 0.21624 0.31752 0.46624
#Iter: 2 fn: -1.0512 Pars: 0.21624 0.31752 0.46624
#solnp--> Completed in 2 iterations
sol$convergence
#[1] 0
sol$pars
#[1] 0.2162396 0.3175208 0.4662396
sol$values
#[1] 0.000000 -1.051173 -1.051173
The last value of sol$values is the function value at the optimal parameters. Since f returns the negative of the objective, the maximum itself is 1.051173.
We can check that the constraints are met.
sum(sol$pars)
#[1] 1
sum(sol$pars*c(1, 3, 5))
#[1] 3.5
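As a cross-check in base R: the two equality constraints leave only one free variable. Subtracting the first constraint from the second gives p3 = 1.25 - 2*p5, and then p1 = p5 - 0.25, with all three non-negative when p5 lies between 0.25 and 0.625. The sketch below maximizes over p5 alone with optimize; probs and entropy are helper names introduced here.

```r
# p1 and p3 expressed through p5 using the two equality constraints
probs <- function(p5) c(p5 - 0.25, 1.25 - 2 * p5, p5)

# Objective from the question as a function of p5 only
entropy <- function(p5) {
  p <- probs(p5)
  -sum(p * log(p))
}

# One-dimensional search over the feasible range of p5
opt <- optimize(entropy, interval = c(0.25, 0.625), maximum = TRUE)
p <- probs(opt$maximum)
p                      # p1, p3, p5
sum(p)                 # first constraint
sum(p * c(1, 3, 5))    # second constraint
```

The result should agree with sol$pars up to the tolerance of optimize.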
I need help with a code to generate random numbers according to constraints.
Specifically, I am trying to simulate random numbers ALFA and BETA from, respectively, a Normal and a Gamma distribution such that ALFA - BETA < 1.
Here is what I have written but it does not work at all.
set.seed(42)
n <- 0
repeat {
n <- n + 1
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1)
alfa[n] <- a
beta[n] <- b
l = length(alfa)
if (l == 10000) break
}
Due to vectorization, it will be faster to generate the numbers "all at once" rather than in a loop:
set.seed(42)
N = 1e5
a = rnorm(N, 10, 2)
b = rgamma(N, 8, 1)
d = a - b
alfa = a[d < 1]
beta = b[d < 1]
length(alfa)
# [1] 36436
This generated 100,000 candidates, 36,436 of which met your criterion. If you want n samples, try setting N = 4 * n; you will probably generate more than enough, and you can keep the first n.
Your loop has two problems: (a) you need curly braces to enclose multiple lines after an if statement; (b) you are using n as an attempt counter, but it should be a success counter. As written, your loop will only stop if the 10000th attempt is a success. Move n <- n + 1 inside the if statement to fix this (the version below also creates alfa and beta before the loop):
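If exactly n pairs are needed, the oversampling idea can be wrapped in a small helper that keeps drawing vectorized batches until enough candidates have been accepted. This is only a sketch; draw_pairs and its oversample argument are names invented here.

```r
# Hypothetical helper: return exactly n pairs with alfa - beta < 1
draw_pairs <- function(n, oversample = 4) {
  alfa <- numeric(0)
  beta <- numeric(0)
  while (length(alfa) < n) {
    # Draw a batch sized from the number of pairs still missing
    N <- oversample * (n - length(alfa))
    a <- rnorm(N, 10, 2)
    b <- rgamma(N, 8, 1)
    keep <- a - b < 1
    alfa <- c(alfa, a[keep])
    beta <- c(beta, b[keep])
  }
  # Keep only the first n accepted pairs
  data.frame(alfa = alfa[seq_len(n)], beta = beta[seq_len(n)])
}

set.seed(42)
res <- draw_pairs(10000)
nrow(res)
all(res$alfa - res$beta < 1)
```

The while loop normally runs once or twice, so the vectors are grown only a handful of times rather than once per draw.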
set.seed(42)
n <- 0
alfa = numeric(0)
beta = numeric(0)
repeat {
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1) {
n <- n + 1
alfa[n] <- a
beta[n] <- b
l = length(alfa)
if (l == 500) break
}
}
But the first way is better... due to "growing" alfa and beta in the loop, and generating numbers one at a time, this method takes longer to generate 500 numbers than the code above takes to generate 30,000.
As commented by @Gregor Thomas, your attempt fails because of the missing curly braces around the body of the if statement. If you would like to skip {} for the if, you can try the code below
set.seed(42)
r <- list()
repeat {
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1) r[[length(r)+1]] <- cbind(alfa = a, beta = b)
if (length(r) == 100000) break
}
r <- do.call(rbind,r)
such that
> head(r)
alfa beta
[1,] 9.787751 12.210648
[2,] 9.810682 14.046190
[3,] 9.874572 11.499204
[4,] 6.473674 8.812951
[5,] 8.720010 8.799160
[6,] 11.409675 10.602608
Consider the Markov chain with state space S = {1, 2}, transition matrix
P =
| 1/2  1/2 |
|  0    1  |
and initial distribution α = (1/2, 1/2).
Simulate 5 steps of the Markov chain (that is, simulate X0, X1, ..., X5). Repeat the simulation 100 times. Use the results of your simulations to solve the following problems.
Estimate P(X1 = 1|X0 = 1). Compare your result with the exact probability.
My solution:
# returns Xn
func2 <- function(alpha1, mat1, n1)
{
xn <- alpha1 %*% matrixpower(mat1, n1+1)
return (xn)
}
alpha <- c(0.5, 0.5)
mat <- matrix(c(0.5, 0.5, 0, 1), nrow=2, ncol=2)
n <- 10
for (variable in 1:100)
{
print(func2(alpha, mat, n))
}
What is the difference if I run this code once or 100 times (as is said in the problem-statement)?
How can I find the conditional probability from here on?
Let
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # Different than yours
be the initial distribution and the transition matrix. Your func2 only computes the n-step distribution, which isn't needed here, and doesn't simulate anything. Instead we may use
chainSim <- function(alpha, mat, n) {
out <- numeric(n)
out[1] <- sample(1:2, 1, prob = alpha)
for(i in 2:n)
out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
out
}
where out[1] is generated using only the initial distribution and then for subsequent terms we use the transition matrix.
Then we have
set.seed(1)
# Doing once
chainSim(alpha, mat, 1 + 5)
# [1] 2 2 2 2 2 2
so that the chain initiated at 2 and got stuck there due to the specified transition probabilities.
Doing it for 100 times we have
# Doing 100 times
sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 100)
rowMeans(sim - 1)
# [1] 0.52 0.78 0.87 0.94 0.99 1.00
where each entry of the last line is the share of the 100 runs that were in state 2 at X0, X1, ..., X5. That gives one (out of many) reasons why 100 repetitions are more informative: a single simulation got stuck at state 2, while repeating it 100 times explores more possible paths.
Then the conditional probability can be found with
mean(sim[2, sim[1, ] == 1] == 1)
# [1] 0.4583333
while the true probability is 0.5 (given by the upper left entry of the transition matrix).
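The same simulations can also be used to estimate every transition probability at once, by tabulating all observed (current state, next state) pairs instead of only the first step. The following is a sketch of that idea; from, to, counts and est are names introduced here, and the loop uses seq_len(n)[-1] so that it is skipped when n is 1.

```r
alpha <- c(1, 1) / 2
mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2)

chainSim <- function(alpha, mat, n) {
  out <- numeric(n)
  out[1] <- sample(1:2, 1, prob = alpha)
  for (i in seq_len(n)[-1])
    out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
  out
}

set.seed(1)
sim <- replicate(100, chainSim(alpha, mat, 1 + 5))  # 6 x 100 matrix of states

# Pair each state with its successor within the same run
from <- factor(sim[-nrow(sim), ], levels = 1:2)
to <- factor(sim[-1, ], levels = 1:2)
counts <- table(from, to)

# Row-wise relative frequencies estimate the transition matrix
est <- prop.table(counts, 1)
est
```

Using all five transitions of every run gives more observations for the first row than conditioning on X0 alone. The second row comes out as exactly (0, 1), because state 2 is never left.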
I generated a 600 length sample using:
x <- rnorm(600, mean = 30, sd = 10)
and then made another 600 length list using:
y = ((x-30)/10)
and my plan is to have if statements that test whether y is bigger than 1.96 and whether -y is smaller than -1.96. If this happens, a variable a should be incremented by 1; if it doesn't, a variable b should be incremented by 1.
I have tried the following things:
a = 0
b = 0
ifelse(y > 1.96, inc(a) <- 1, inc(b) <- 1)
ifelse(-y < -1.96, inc(a) <- 1, inc(b) <- 1)
Error in inc(a) <- 1 : could not find function "inc<-"
Error in inc(b) <- 1 : could not find function "inc<-"
ifelse(y > 1.96, '+'(a) <- 1, '+'(b) <- 1)
ifelse(-y < -1.96, '+'(a) <- 1, '+'(b) <- 1)
Error in +a <- 1 : could not find function "+<-"
Error in +b <- 1 : could not find function "+<-"
if (y > 1.96) {
a = a + 1
}
if (-y < -1.96) {
a = a + 1
} else{
b = b + 1
}
Warning message:
In if (y > 1.96) { :
the condition has length > 1 and only the first element will be used
ifelse(y > 1.96, a <- a + 1, b <- b + 1)
ifelse(-y < -1.96, a <- a + 1, b <- b + 1)
This almost worked but it gave me an output of all 1's and:
a
# [1] 1
b
# [1] 1
So how else would I increment the variables or what am I doing wrong?
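Avoid using ifelse() for this kind of computation. ifelse(y > 1.96, a <- a + 1, b <- b + 1) evaluates each assignment at most once per call, not once per element, so the counters never reflect the number of matching elements. It's better to rely on R's vectorization (there is no "scalar" in R, only vectors): build a vector of TRUE/FALSE values and then sum over the TRUE values.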
# Make the data
set.seed(1337)
x = rnorm(600, mean = 30, sd = 10)
y = ((x-30)/10)
# Get Indicators (T/F)
v1 = y > 1.96
v2 = y < -1.96
# Sum over the TRUE cases
a = sum(v1) + sum(v2)
# Handle the FALSE cases without re-summing: each value is tested twice.
b = 2*length(y) - a
Also, here we opt to use y < -1.96 (rather than -y < -1.96, which is the same test as y > 1.96) to get an appropriate two-sided count.
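If each value should be counted once (inside or outside the interval) rather than once per test, a single indicator on abs(y) is enough. This is a sketch under that assumption, so b here is the number of values within the bounds and differs from 2*length(y) - a; outside is a name introduced for illustration.

```r
set.seed(1337)
x <- rnorm(600, mean = 30, sd = 10)
y <- (x - 30) / 10

# TRUE where a value lies beyond +/- 1.96
outside <- abs(y) > 1.96

a <- sum(outside)    # values outside the interval
b <- sum(!outside)   # values inside the interval
mean(outside)        # observed share outside; about 0.05 is expected for standard normal y
```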
I have below code in Matlab:
alpha=5.5; beta=3.1; a=0; b=1; c=2.5;
X=0; Y=c; % Initial values
while Y > gamma(alpha+beta)/gamma(alpha)/gamma(beta)...
* X.^(alpha-1).* (1-X).^(beta-1);
U=rand; V=rand; X=a+(b-a)*U; Y=c*V;
end;
X;
and I want to convert to R. This is my attempt (something is wrong in operators):
alpha <- 5.5
beta <- 3 .1
a <- 0
b <- 1
c <- 2.5
X <- 0
Y <- c
while(Y > gamma(alpha+beta)/gamma(alpha)/gamma(beta))
{
*X.^(alpha-1).*(1-X).^(beta-1) # incorrect line
U=runif(1, 0, 1)
V=runif(1, 0, 1)
X=a+(b-a)*U
Y=c*V
}
print (X)
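The beta had an invalid assignment (3 .1), and the second half of the Matlab condition has to stay inside the while (...) parentheses, with the element-wise operators .^ and .* written as ^ and *: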
alpha <- 5.5
beta <- 3.1
a <- 0
b <- 1
c <- 2.5
X <- 0
Y <- c
while (Y > gamma(alpha + beta)/gamma(alpha)/gamma(beta) * X^(alpha - 1) * (1 - X)^(beta - 1)) {
U <- runif(1, 0, 1)
V <- runif(1, 0, 1)
X <- a + (b - a) * U
Y <- c * V
}
Check your code to make sure you are doing what you think you are doing: no assignment was being done on the line after the while, and the loop kept exiting immediately for me (running without error) and printing 0 for X.
EDIT:
Runs correctly now thanks to the nice spot by @Khashaa
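To draw more than one value, the loop can be wrapped in a function. The sketch below is one way to do that, not the original author's code: rbeta_reject is a made-up name, the proposal is fixed to uniform on (0, 1), and dbeta(X, alpha, beta) stands in for the gamma expression, which is the same Beta density.

```r
# Hypothetical wrapper: n draws from Beta(alpha, beta) by rejection sampling.
# c must be an upper bound on the density (2.5 is the value used above).
rbeta_reject <- function(n, alpha = 5.5, beta = 3.1, c = 2.5) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      X <- runif(1)        # candidate on (0, 1)
      Y <- c * runif(1)    # uniform height under the bounding box
      if (Y <= dbeta(X, alpha, beta)) break  # accept points under the density
    }
    out[i] <- X
  }
  out
}

set.seed(1)
s <- rbeta_reject(2000)
mean(s)            # sample mean
5.5 / (5.5 + 3.1)  # theoretical mean of Beta(5.5, 3.1)
```

The two printed values should be close, and hist(s, freq = FALSE) with curve(dbeta(x, 5.5, 3.1), add = TRUE) gives a visual check.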