How to set NonConvex = 2 in Gurobi in R?

I get this error when I run the MWE code below. Does anyone know how to resolve this? thanks!
Error: Error 10020: Q matrix is not positive semi-definite (PSD). Set NonConvex parameter to 2 to solve model.
MWE:
library(gurobi)
library(Matrix)
model <- list()
#optimization problem:
# max x + y
# s.t.
# -x + y <= 0
# x^2 - y^2 <= 10
# 0 <= x <= 20
# 0 <= y <= 20
model$obj <- c(1,1)
model$A <- matrix(c(-1,1), nrow=1, byrow=T) # for LHS of linear constraint: -x + y <= 0
model$rhs <- c(0) # for RHS of linear constraint: -x + y <= 0
model$ub[1] = 20 # x <= 20
model$ub[2] = 20 # y <= 20
model$sense <- c('<')
# non-convex quadratic constraint: x^2 - y^2 <= 10
qc1 <- list()
qc1$Qc <- spMatrix(2, 2, c(1, 2), c(1, 2), c(1.0, -1.0))
qc1$rhs <- 10
model$quadcon <- list(qc1)
#the QC constraint is a non-convex quadratic constraint, so set NonConvex = 2
model$params <- list(NonConvex=2)
gurobi_write(model, 'quadtest.lp')
result <- gurobi(model) # THIS IS WHERE I GET THE ERROR ABOVE
print(result$objval)
print(result$x)

Never mind: I see that I can fix this by not putting params in the model list, and instead passing them as the second argument to the gurobi() call, as follows:
params <- list(NonConvex=2)
result <- gurobi(model, params)

Floating-point error in R's fligner.test function?

I have noticed that the statistical test fligner.test from the R stats package gives different results after a simple linear transformation of the data, even though this shouldn't be the case.
Here an example (the difference for the original dataset is much more dramatic):
g <- factor(rep(1:2, each=6))
x1 <- c(2,2,6,6,1,4,5,3,5,6,5,5)
x2 <- (x1-1)/5 #> cor(x1,x2) [1] 1
fligner.test(x1,g) # chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner.test(x2,g) # chi-squared = 4.8148, df = 1, p-value = 0.02822
Looking at the function code, I have noticed that the median centering might be causing the issue:
x1 <- x1 - tapply(x1,g,median)[g]
x2 <- x2 - tapply(x2,g,median)[g]
unique(abs(x1)) # 1 3 2 0
unique(abs(x2)) # 0.2 0.6 0.4 0.2 0.0 <- repeated 0.2
Is this a known issue, and how should this inconsistency be resolved?
I think your analysis is correct here. In your example the problem ultimately occurs because (0.8 - 0.6) == 0.2 is FALSE unless rounded to 15 decimal places. You should file a bug report, since this is avoidable.
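You can see the floating-point issue directly at the console (a quick illustration, separate from fligner.test itself):
(0.8 - 0.6) == 0.2            # FALSE: binary floating point cannot represent these exactly
(0.8 - 0.6) - 0.2             # tiny nonzero remainder, on the order of 1e-17
round(0.8 - 0.6, 15) == 0.2   # TRUE once rounded to 15 decimal places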
If you are desperate in the meantime, you can adapt stats:::fligner.test.default by applying a tiny bit of rounding at the median-centering stage to remove the floating-point inequalities:
fligner <- function(x, g, ...)
{
    if (is.list(x)) {
        if (length(x) < 2L)
            stop("'x' must be a list with at least 2 elements")
        DNAME <- deparse1(substitute(x))
        x <- lapply(x, function(u) u <- u[complete.cases(u)])
        k <- length(x)
        l <- lengths(x)
        if (any(l == 0))
            stop("all groups must contain data")
        g <- factor(rep(1:k, l))
        x <- unlist(x)
    }
    else {
        if (length(x) != length(g))
            stop("'x' and 'g' must have the same length")
        DNAME <- paste(deparse1(substitute(x)), "and",
                       deparse1(substitute(g)))
        OK <- complete.cases(x, g)
        x <- x[OK]
        g <- g[OK]
        g <- factor(g)
        k <- nlevels(g)
        if (k < 2)
            stop("all observations are in the same group")
    }
    n <- length(x)
    if (n < 2)
        stop("not enough observations")
    ## only change from stats:::fligner.test.default: round after median
    ## centering so floating-point noise does not break ties in abs(x)
    x <- round(x - tapply(x, g, median)[g], 15)
    a <- qnorm((1 + rank(abs(x))/(n + 1))/2)
    a <- a - mean(a)
    v <- sum(a^2)/(n - 1)
    a <- split(a, g)
    STATISTIC <- sum(lengths(a) * vapply(a, mean, 0)^2)/v
    PARAMETER <- k - 1
    PVAL <- pchisq(STATISTIC, PARAMETER, lower.tail = FALSE)
    names(STATISTIC) <- "Fligner-Killeen:med chi-squared"
    names(PARAMETER) <- "df"
    METHOD <- "Fligner-Killeen test of homogeneity of variances"
    RVAL <- list(statistic = STATISTIC, parameter = PARAMETER,
                 p.value = PVAL, method = METHOD, data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}
This now returns the correct result for both your vectors:
fligner(x1,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x1 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858
fligner(x2,g)
#>
#> Fligner-Killeen test of homogeneity of variances
#>
#> data: x2 and g
#> Fligner-Killeen:med chi-squared = 4.2794, df = 1, p-value = 0.03858

Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero

In an earlier question (R: Logical Conditions Not Being Respected), I learned how to make the following simulation:
Step 1: Keep generating two random numbers "a" and "b" until both "a" and "b" are greater than 12
Step 2: Track how many draws had to be generated for Step 1 to be completed
Step 3: Repeat Step 1 and Step 2 100 times
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
  a <- rnorm(1, 10, 1)
  b <- rnorm(1, 10, 1)
  i <- 1
  while(a < 12 | b < 12) {
    a <- rnorm(1, 10, 1)
    b <- rnorm(1, 10, 1)
    i <- i + 1
  }
  x <- c(a, b, i)
  res <- rbind(res, x)
}
head(res)
[,1] [,2] [,3]
x 12.14232 12.08977 399
x 12.27158 12.01319 1695
x 12.57345 12.42135 302
x 12.07494 12.64841 600
x 12.03210 12.07949 82
x 12.34006 12.00365 782
Question: Now I am trying to make a slight modification to the above code. Instead of "a" and "b" being produced separately, I want them to be produced together (in math terms: "a" and "b" were drawn from two independent univariate normal distributions; now I want them to come from a bivariate normal distribution).
I tried to modify this code myself:
library(MASS)
Sigma = matrix(
  c(1, 0.5, 0.5, 1), # the data elements
  nrow = 2,          # number of rows
  ncol = 2,          # number of columns
  byrow = TRUE)      # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:100){
  e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
  e_i$i <- 1
  while(e_i$X1 < 12 | e_i$X2 < 12) {
    e_i = data.frame(mvrnorm(n = 1, c(10,10), Sigma))
    e_i$i <- i + 1
  }
  x <- c(e_i$X1, e_i$X2, i)
  res <- rbind(res, x)
}
res = data.frame(res)
But this is producing the following error:
Error in while (e_i$X1 < 12 | e_i$X2 < 12) { : argument is of length zero
If I understand your code correctly, you are trying to see how many samples are needed before both values are >= 12, and doing that for 100 trials? This is the approach I would take:
library(MASS)
for(i in 1:100){
  n <- 1
  while(any((x <- mvrnorm(1, mu=c(10,10), Sigma=diag(0.5, nrow=2)+0.5)) < 12)) n <- n+1
  if(i==1) res <- data.frame("a"=x[1], "b"=x[2], n)
  else res <- rbind(res, data.frame("a"=x[1], "b"=x[2], n))
}
Here I am assigning the result of mvrnorm() to x within the while() call. In that same call, any() checks whether either value is less than 12. While that evaluates to TRUE, n (the counter) is increased and a new pair is drawn. Once it evaluates to FALSE (both values are at least 12), the values are appended to your data.frame and control returns to the start of the for-loop.
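If the assignment-inside-the-condition idiom is unfamiliar, here is a minimal standalone illustration using plain rnorm() (univariate draws, just to show the pattern):
set.seed(1)
n <- 1
# keep drawing pairs until both values are >= 12, counting attempts in n
while (any((x <- rnorm(2, mean = 10)) < 12)) n <- n + 1
x  # the accepted pair, both components >= 12
n  # number of pairs that had to be drawn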
Regarding your code, the mvrnorm() function returns a vector, not a matrix, when n = 1, so both values end up in a single column of the data.frame:
data.frame(mvrnorm(n = 1, c(10,10), Sigma))
Returns:
mvrnorm.n...1..c.10..10...Sigma.
1 9.148089
2 10.605546
Wrapping the mvrnorm() call in matrix() within your data.frame() calls, along with some tweaks to your use of i, will fix your code:
library(MASS)
Sigma = matrix(
  c(1, 0.5, 0.5, 1), # the data elements
  nrow = 2,          # number of rows
  ncol = 2,          # number of columns
  byrow = TRUE)      # fill matrix by rows
res <- matrix(0, nrow = 0, ncol = 3)
for (j in 1:10){
  e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
  i <- 1
  while(e_i$X1[1] < 12 | e_i$X2[1] < 12) {
    e_i = data.frame(matrix(mvrnorm(n = 1, c(10,10), Sigma), ncol=2))
    i <- i + 1
  }
  x <- c(e_i$X1, e_i$X2, i)
  res <- rbind(res, x)
}
res = data.frame(res)

Solving an equation in R with given constraints

I want to maximize the following function subject to the given constraints.
-p1*log(p1) - p3*log(p3) - p5*log(p5)
subject to p1 + p3 + p5 = 1
and p1 + 3*p3 + 5*p5 = 3.5
p1, p3, and p5 all lie between 0 and 1 [they are probabilities].
My question is how do I solve this in R? From what I saw, constrOptim() is one of the functions commonly used to solve this type of problem. However, I could not figure it out.
Any help is appreciated.
Package Rsolnp uses Lagrange multipliers to solve non-linear problems with equality constraints. Here is how the problem would be set up; eps keeps the logarithms from producing NaN at the lower bound.
library(Rsolnp)
# objective: solnp() minimizes, so return the negative of the entropy
f <- function(X) {
  x <- X[1]
  y <- X[2]
  z <- X[3]
  res <- -x*log(x) - y*log(y) - z*log(z)
  -res
}
# left-hand sides of the two equality constraints
eq_f <- function(X){
  x <- X[1]
  y <- X[2]
  z <- X[3]
  c(
    x + y + z,
    x + 3*y + 5*z
  )
}
eps <- .Machine$double.eps*10^2
X0 <- c(0.1, 0.1, 0.1)
sol <- solnp(
  pars = X0,
  fun = f,
  eqfun = eq_f,
  eqB = c(1, 3.5),
  LB = rep(eps, 3),
  UB = rep(1, 3)
)
#
#Iter: 1 fn: -1.0512 Pars: 0.21624 0.31752 0.46624
#Iter: 2 fn: -1.0512 Pars: 0.21624 0.31752 0.46624
#solnp--> Completed in 2 iterations
sol$convergence
#[1] 0
sol$pars
#[1] 0.2162396 0.3175208 0.4662396
sol$values
#[1] 0.000000 -1.051173 -1.051173
The last value of sol$values is the function value at the optimal parameters; since f returns the negated entropy, the maximum of the original objective is about 1.0512.
We can check that the constraints are met.
sum(sol$pars)
#[1] 1
sum(sol$pars*c(1, 3, 5))
#[1] 3.5
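As an independent check, the two equality constraints let you eliminate two variables (they imply p1 = p5 - 0.25 and p3 = 1.25 - 2*p5), so the problem reduces to a one-dimensional maximization that base R's optimize() can handle. This is just a verification sketch, not a replacement for solnp:
entropy_p5 <- function(p5) {
  p1 <- p5 - 0.25        # from p1 + p3 + p5 = 1 and p1 + 3*p3 + 5*p5 = 3.5
  p3 <- 1.25 - 2 * p5
  -p1 * log(p1) - p3 * log(p3) - p5 * log(p5)
}
# feasible range is 0.25 <= p5 <= 0.625; stay slightly inside to avoid log(0)
optimize(entropy_p5, interval = c(0.2501, 0.6249), maximum = TRUE)
# $maximum should be close to 0.4662 and $objective close to 1.0512, matching solnp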

Generate random numbers in R satisfying constraints

I need help with code to generate random numbers according to constraints.
Specifically, I am trying to simulate random numbers ALFA and BETA from, respectively, a Normal and a Gamma distribution such that ALFA - BETA < 1.
Here is what I have written but it does not work at all.
set.seed(42)
n <- 0
repeat {
n <- n + 1
a <- rnorm(1, 10, 2)
b <- rgamma(1, 8, 1)
d <- a - b
if (d < 1)
alfa[n] <- a
beta[n] <- b
l = length(alfa)
if (l == 10000) break
}
Due to vectorization, it will be faster to generate the numbers "all at once" rather than in a loop:
set.seed(42)
N = 1e5
a = rnorm(N, 10, 2)
b = rgamma(N, 8, 1)
d = a - b
alfa = a[d < 1]
beta = b[d < 1]
length(alfa)
# [1] 36436
This generated 100,000 candidates, 36,436 of which met your criteria. If you want to generate n samples, try setting N = 4 * n; you'll probably generate more than enough, and you can keep the first n.
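For example, if you need exactly n = 10,000 accepted pairs, a sketch of the oversample-and-trim approach (the factor of 4 is just a safety margin based on the roughly 36% acceptance rate seen above):
set.seed(42)
n <- 10000
N <- 4 * n
a <- rnorm(N, 10, 2)
b <- rgamma(N, 8, 1)
stopifnot(sum(a - b < 1) >= n)   # almost certain with this margin; regenerate if not
keep <- which(a - b < 1)[1:n]    # indices of the first n accepted pairs
alfa <- a[keep]
beta <- b[keep]
length(alfa)
# [1] 10000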
Your loop has two problems: (a) you need curly braces to enclose multiple lines after an if statement, and (b) you are using n as an attempt counter when it should be a success counter. As written, your loop will only stop if the 10,000th attempt happens to be a success. Move n <- n + 1 inside the if statement to fix it:
set.seed(42)
n <- 0
alfa = numeric(0)
beta = numeric(0)
repeat {
  a <- rnorm(1, 10, 2)
  b <- rgamma(1, 8, 1)
  d <- a - b
  if (d < 1) {
    n <- n + 1
    alfa[n] <- a
    beta[n] <- b
    l = length(alfa)
    if (l == 500) break
  }
}
But the first way is better... due to "growing" alfa and beta in the loop, and generating numbers one at a time, this method takes longer to generate 500 numbers than the code above takes to generate 30,000.
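If you want to check that claim on your own machine, a rough timing sketch (exact numbers will vary):
# vectorized: ~36,000 accepted values from 100,000 candidates
system.time({
  a <- rnorm(1e5, 10, 2)
  b <- rgamma(1e5, 8, 1)
  alfa <- a[a - b < 1]
})
# one-at-a-time loop that grows alfa until it holds 500 accepted values
system.time({
  alfa <- numeric(0); n <- 0
  while (n < 500) {
    a <- rnorm(1, 10, 2)
    b <- rgamma(1, 8, 1)
    if (a - b < 1) { n <- n + 1; alfa[n] <- a }
  }
})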
As commented by @Gregor Thomas, your attempt fails because of the missing curly braces around the body of the if statement. If you would like to skip {} in the if control, you can try the code below:
set.seed(42)
r <- list()
repeat {
  a <- rnorm(1, 10, 2)
  b <- rgamma(1, 8, 1)
  d <- a - b
  if (d < 1) r[[length(r) + 1]] <- cbind(alfa = a, beta = b)
  if (length(r) == 100000) break
}
r <- do.call(rbind,r)
such that
> head(r)
alfa beta
[1,] 9.787751 12.210648
[2,] 9.810682 14.046190
[3,] 9.874572 11.499204
[4,] 6.473674 8.812951
[5,] 8.720010 8.799160
[6,] 11.409675 10.602608

How can I modify my code to include a loop?

I am trying to create a function that examines how variables with different distributions influence OLS results. I have created two DVs (y1 and y2) but would like to expand this to include five or so. I am trying to change my code to include a loop so I do not have to copy and paste this multiple times, but I am not having much luck. Any suggestions would be greatly appreciated.
library(psych)
library(arm)
library(plyr)
library(fBasics)
regsim <- function(iter, n) {
  ek1 <- rnorm(n, 0, 1)
  ek2 <- rnorm(n, 0, 5)
  x <- rnorm(n, 0, .5)
  y1 <- .3*x + ek1
  y2 <- .3*x + ek2
  #y1
  lm1 <- lm(y1 ~ x)
  bhat1 <- coef(lm1)[2]
  sehat1 <- se.coef(lm1)[2]
  skewy1 <- skew(y1)
  stdevy1 <- stdev(y1)
  #y2
  lm2 <- lm(y2 ~ x)
  bhat2 <- coef(lm2)[2]
  sehat2 <- se.coef(lm2)[2]
  skewy2 <- skew(y2)
  stdevy2 <- stdev(y2)
  results <- c(bhat1, sehat1, stdevy1, skewy1,
               bhat2, sehat2, stdevy2, skewy2)
  names(results) <- c('b1', 'se1', 'sdy1', 'skewy1',
                      'b2', 'se2', 'sdy2', 'skewy2')
  return(results)
}
iter <- 1000
n <- 500
results <- NULL
sims <- ldply(1:iter, regsim, n)
sims$n <- n
results <- rbind(results, sims)
Another option...
regsim <- function(n = 100, num.y = 5, sd = c(1:5)){
  if(length(sd) != num.y){
    stop('length of sd must match number of dependent vars')
  } else {
    ldply(1:num.y, function(x){
      e <- rnorm(n, 0, sd = sd[x])
      x <- rnorm(n, 0, 5)
      y <- 0.3*x + e
      out <- lm(y ~ x)
      b1 <- coef(out)[2]
      int <- coef(out)[1]
      data.frame(b1 = b1, int = int)
    })
  }
}
regsim(num.y=10,sd=c(1:10))
b1 int
1 0.30817303 0.0781049
2 0.38681600 -0.3359067
3 0.24560773 -0.0277561
4 0.08032659 0.1877233
5 0.39873955 -0.6027522
6 0.21729930 0.7384340
7 0.33761456 -0.1053028
8 0.26502006 -0.1851552
9 0.15452261 -1.6334873
10 -0.10496863 -0.3225169
This will allow you to specify the number of dependent variables and the SD for each error term. You can then use replicate to repeat the function for the desired number of replications.
replicate(10,regsim(),simplify = F)
[[1]]
b1 int
1 0.3047779 -0.01984306
2 0.3133198 -0.20458410
3 0.2833979 -0.25307502
4 0.3066878 -0.03235019
5 0.1374949 0.10958616
[[2]]
b1 int
1 0.2902103 -0.12683502
2 0.3499006 0.06691437
3 0.1949797 -0.14371830
4 0.2358269 0.53117467
5 0.2869511 0.16281380
[[3]]
b1 int
1 0.2952211 0.05905549
2 0.2367774 0.02862166
3 0.0896778 -0.08467935
4 0.2352622 -0.20835837
5 0.3149963 0.07042032
[[4]]
b1 int
1 0.2946468 -0.08266406
2 0.3322577 0.17558135
3 0.2200087 -0.25778150
4 0.1822915 0.34962679
5 0.2442479 0.34433656
[[5]]
b1 int
1 0.2882853 0.12677506
2 0.3455534 -0.27885958
3 0.2981193 0.04598347
4 0.3380173 0.05243198
5 0.2148643 -0.09631672
[[6]]
b1 int
1 0.2962269 0.03743759
2 0.2979327 -0.12830803
3 0.3352781 -0.03935422
4 0.2584965 -0.05924351
5 0.2856802 0.03430055
[[7]]
b1 int
1 0.2968077 -0.10300109
2 0.2954560 0.25979902
3 0.3276077 -0.07001758
4 0.1825841 0.13508932
5 0.4302788 -0.13951914
[[8]]
b1 int
1 0.2992147 0.02084806
2 0.2765976 0.07277813
3 0.2469616 0.44580403
4 0.2601966 -0.09849855
5 0.2679183 0.50501652
[[9]]
b1 int
1 0.2963905 0.03308366
2 0.3356783 -0.06080088
3 0.3199835 0.22533444
4 0.3546083 -0.26909478
5 0.3536241 -0.19795094
[[10]]
b1 int
1 0.3100336 -0.05228032
2 0.4076447 -0.18715063
3 0.3436858 -0.37518649
4 0.4569368 -0.09114672
5 0.3255668 -0.18738138
How about this:
n <- 1000
x <- rnorm(n, 0, .5)
fun_reg <- function(n, ek_mu, ek_sd, x){
  s <- list() # list to collect results for output
  ek <- rnorm(n, ek_mu, ek_sd)
  y <- .3*x + ek
  m <- lm(y ~ x)
  s$bhat <- coef(m)[2]
  s$sehat <- arm::se.coef(m)[2]
  s$skewy <- psych::skew(y)
  s$stdevy <- fBasics::stdev(y)
  return(s)
}
purrr::map_dfr(c(1, 5, 10, 20, 50), ~fun_reg(n, 0, ., x))
Edit:
This now uses 500 observations each, and the regression is repeated with 1000 draws for each value of the standard deviation. A variable ek_sd has been added to the final output to record which standard deviation each row was produced with. Note that x is not redrawn for each iteration, but I'm not entirely sure that that is what you want. If you want x to be redrawn at each iteration, move it inside the function.
n <- 500
x <- rnorm(n, 0, .5)
fun_reg <- function(n, ek_mu, ek_sd, x){
  s <- list()
  ek <- rnorm(n, ek_mu, ek_sd)
  y <- .3*x + ek
  m <- lm(y ~ x)
  s$ek_sd <- ek_sd
  s$bhat <- coef(m)[2]
  s$sehat <- arm::se.coef(m)[2]
  s$skewy <- psych::skew(y)
  s$stdevy <- fBasics::stdev(y)
  return(s)
}
intr <- unlist(lapply(c(1, 5, 10, 20, 50), rep, 1000))
purrr::map_dfr(intr, ~fun_reg(n, 0, ., x))
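If you then want to see how the slope estimates behave for each error standard deviation (an assumption about what you are after), you can summarise the combined output, for example with base aggregate():
sims <- purrr::map_dfr(intr, ~fun_reg(n, 0, ., x))
# mean slope estimate and mean estimated standard error per error SD
aggregate(cbind(bhat, sehat) ~ ek_sd, data = sims, FUN = mean)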
This reduces the package reliance to just psych::skew and an optional ggplot2 call:
library(psych)
regsim <- function(n, eks) {
  x <- rnorm(n, 0, .5)
  ek <- sapply(eks, function(x) rnorm(n, 0, x))
  y <- 0.3 * x + ek
  lms <- lm(y ~ x)
  data.frame(b_hat = lms[['coefficients']][2, ],
             int = lms[['coefficients']][1, ],
             skew_y = psych::skew(y),
             se_hat = unlist(lapply(summary(lms), function(lst) lst[[4]][2, 2]), use.names = FALSE),
             sd_y = apply(y, 2, sd),
             sd_eks = eks)
}
iter <- 1000
n <- 500
eks_sd = c(1, 5)
# do the simulations and make them into a nice data.frame
sims <- replicate(iter, regsim(n, eks_sd), simplify = FALSE)
results <- do.call(rbind, sims)
#next parts are optional
results$iter_id <- rep(seq_len(iter), each = length(eks_sd))
tibble::as_tibble(results)
# Random graph because everyone loves graphs
library(ggplot2)
ggplot(results, aes(x = iter_id, y = int)) + geom_point() + facet_grid(vars(sd_eks))
The main thing is that lm() accepts a matrix on the left-hand side of the formula and fits one regression per column. That's why we create a matrix of ek using sapply().
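Here is a minimal standalone illustration of that behaviour with toy data (nothing here is specific to the simulation above):
set.seed(1)
x <- rnorm(10, 0, 0.5)
Y <- cbind(y1 = 0.3 * x + rnorm(10),          # small error SD
           y2 = 0.3 * x + rnorm(10, 0, 5))    # large error SD
fit <- lm(Y ~ x)                              # one fit per column of Y
coef(fit)  # 2 x 2 matrix: rows (Intercept) and x, one column per response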
