R optimization: minimize quadratic function

I have a task to minimize 2 similar quadratic functions for beta.
For example:
K <- matrix(data = c(1, 2, 1, 2, 1,
                     2, 16, 2, 1, 2,
                     1, 2, 8, 2, 1,
                     2, 1, 2, 16, 2,
                     1, 2, 1, 2, 32),
            nrow = 5, ncol = 5, byrow = TRUE)
k <- c(-2, 4, 12, 0, 2)
First, the unconstrained task:
minimize over beta:  (1/2) * beta' K beta - k' beta + lambda * |sum(beta) - n_s|
where lambda is the multiplier for the penalty function.
I am using optim to solve this problem:
minimize <- function(beta) {
  # n_s must already be defined (it is set to 5 further below)
  value <- (1/2) * (t(beta) %*% K %*% beta) - t(k) %*% beta + 0.1 * abs(sum(beta) - n_s)  # lambda = 0.1
  return(value)
}
betta2 = optim(c(0, 0, 0, 0, 0), minimize)
betta2$par
Which method in optim should I use for better and faster optimization?
Second, the constrained task: minimize the same quadratic form, (1/2) * beta' K beta - k' beta, subject to a condition in which n_s and epsilon are known constants.
In other words, I have constrained and unconstrained minimization tasks.
I know I can find a solution to the second task using solve.QP, but the constraints there can only be linear.
For example, I can minimize my second task using solve.QP:
betta = solve.QP(K, k, K, k, meq = 0, factorized = FALSE)
But I can't add my special constraints there (and I don't really like this solution):
A <- matrix(1:1, ncol = 5)
n_s = 5
epsilon = 0.1
betta = solve.QP(K, k, A, c(-(n_s*epsilon + n_s)), meq = 0, factorized = FALSE) # gets an error: Amat and dvec are incompatible!
How could I correctly use optim (or maybe another function) to solve my constrained optimization problem (with my special constraint)? Which method should I choose for it?
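As a sketch of how the constrained task could be handed to solve.QP, assuming the condition is |sum(beta) - n_s| <= epsilon (an assumption, since the condition is not written out above): solve.QP expects Amat to have one column per constraint, with nrow equal to length(beta), which is also why the call above fails with "Amat and dvec are incompatible". A two-sided condition then becomes two inequality columns.
library(quadprog)

n_s     <- 5
epsilon <- 0.1

# solve.QP minimizes (1/2) b' D b - d' b  subject to  t(Amat) %*% b >= bvec,
# so each constraint is a COLUMN of Amat with nrow = length(beta) = 5.
Amat <- cbind(rep(1, 5),    #  sum(beta) >= n_s - epsilon
              rep(-1, 5))   # -sum(beta) >= -(n_s + epsilon)
bvec <- c(n_s - epsilon, -(n_s + epsilon))

betta <- solve.QP(Dmat = K, dvec = k, Amat = Amat, bvec = bvec, meq = 0)
betta$solution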


What is the difference between the two functions below, what does each of them actually do, and why aren't they the same?

polyEval should evaluate a polynomial. For example, polyEval(x = 2, a = c(2, 3, 1)) is supposed to return 12, whereas polyEval(1, c(0, 1, 0, 1, 0, 1)) is supposed to return 3. This is the case for the first polyEval function, but why isn't it for the second? What exactly is the second one doing wrong? In my understanding, they should work the same.
polyEval <- function(x, a) {
  n <- 1:length(a)
  return(sum(c(a[n] * x^(n - 1))))
}

polyEval <- function(x, a) {
  sum(a * (x^(0:(length(a)))))
}
Both functions should raise x to the powers 0, 1, 2, ..., length(a) - 1, multiply the resulting vector element-wise by a, and sum. In the second one, length(a) should be length(a) - 1: as written, 0:(length(a)) produces length(a) + 1 exponents, so a is recycled to match the longer vector and an extra term is added to the sum.
polyEval2 <- function(x, a) {
  sum(a * (x^(0:(length(a) - 1))))
}
polyEval2(2, c(2, 3, 1))
## [1] 12
polyEval2(1, c(0, 1, 0, 1, 0, 1))
## [1] 3
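For comparison (a quick check, not part of the original answer), this is what the uncorrected second version actually computes, because the exponent vector has one element more than a:
polyEval_buggy <- function(x, a) {
  sum(a * (x^(0:(length(a)))))   # length(a) + 1 exponents, so a gets recycled
}
polyEval_buggy(2, c(2, 3, 1))    # a is recycled to c(2, 3, 1, 2): 2 + 6 + 4 + 16
## [1] 28  (plus a warning that the longer object length is not a multiple of the shorter)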

Getting (maybe manually) confidence interval of fits after using multi-way clustering package (multiwayvcov)

I am interested in plotting fits with confidence intervals after using the two-way clustering package multiwayvcov.
Here is my reproducible data.
rm(list=ls(all=TRUE))
library(lmtest)
library(multiwayvcov)
dv<-c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0)
int1<-c(0.0123, 0.3428, 0.2091, 0.8325, 0.7113, 0.7401, 0.6009, 0.5062, 0.4841, 0.8912, 0.3850, 0.2463, 0.0625, 0.5374, 0.1984)
int2<-c(0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0)
cont<-c(3, 1, 2, 4, 6, 7, 1, 4, 3, 2, 4, 3, 6, 1, 3)
cluster1<-c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
cluster2<-c(1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 1, 2, 3, 1, 2)
mydata<-as.data.frame(cbind(dv, int1, int2, cont, cluster1, cluster2))
This is my non-clustered model:
result_lm <- lm(dv~int1+int2+cont,data=mydata)
To get clustered results using "cluster1" and "cluster2", I use functions from the lmtest and multiwayvcov packages as follows.
cluster_vcov<-cluster.vcov(result_lm, ~cluster1+cluster2)
result_2c<-coeftest(result_lm, cluster_vcov)
Here, "cluster_vcov" is just a variance-covariance matrix and "result_2c" is just an atomic vector. Thus, I am not able to use "predict" function to plot fits on a new dataset ("datagrid") such as
grid <- seq(0, 1, .2)
datagrid <- data.frame(int1 = rep(grid, 2),
                       int2 = c(rep(0, length(grid)),
                                rep(1, length(grid))))
datagrid$cont <- mean(mydata$cont, na.rm = TRUE)
Before moving to what I have done, here is something similar to what I would like to have eventually.
library(ggplot2)
fits <- predict(result_lm, newdata = datagrid, interval = "confidence")
plotdata <- data.frame(fits, datagrid)
plotdata$int2 <- plotdata$int2 == 1
ggplot(plotdata, aes(x = int1, y = fit, ymin = lwr, ymax = upr, color = int2)) +
  geom_line(aes(linetype = int2)) +
  geom_ribbon(alpha = .2) +
  theme(legend.position = "none") +
  scale_color_manual(values = c("red", "darkgreen")) +
  scale_linetype_manual(values = c("dashed", "solid"))
The result is a plot of the two fitted lines with confidence ribbons (figure not shown).
To address the problem that "result_2c" does not give an object that can be used directly with "predict", I decided to construct the data myself as follows.
d_twc_result <- data.frame(matrix(0, nrow = 4, ncol = 4))
colnames(d_twc_result) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames(d_twc_result) <- c("(Intercept)", "int1", "int2", "cont")
for (j in 1:4) {
  for (i in 1:4) {
    d_twc_result[i, j] <- result_2c[i, j]
  }
}
Then, using "d_twc_result$Estimate", I generate a vector corresponding to the "fits" one would get from running "predict".
fits <- c(1:12)
for (i in 1:12) {
  fits[i] <- d_twc_result$Estimate[1] +
    d_twc_result$Estimate[2] * datagrid$int1[i] +
    d_twc_result$Estimate[3] * datagrid$int2[i] +
    d_twc_result$Estimate[4] * datagrid$cont[i]
}
Yet I was still not able to construct vectors for "lwr" and "upr", which require residuals or standard errors. What I am actually stuck on is that it seems impossible to get residuals or standard errors, because there are no observations of "dv" in the dataset "datagrid".
Nevertheless, "predict" works with the dataset "datagrid", so I guess I am misunderstanding how "predict" works, or the concept of a fit.
I would highly appreciate any help getting "lwr" and "upr" (or a correction, if my understanding of the concept of a fit is wrong). Thanks in advance for any comments.
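For completeness, here is a minimal sketch (not part of the original question) of how "lwr" and "upr" could be computed by hand, assuming the goal is the usual confidence interval for the linear predictor, fit +/- t * se, with the clustered variance-covariance matrix plugged in. The key point is that the standard error of a fitted value needs only the design matrix for "datagrid" and the coefficient covariance matrix, not new observations of "dv".
# Sketch: confidence intervals for the fits using the clustered vcov matrix.
X     <- model.matrix(~ int1 + int2 + cont, data = datagrid)  # design matrix for the grid
beta  <- coef(result_lm)                      # point estimates are unchanged by clustering
fit   <- as.vector(X %*% beta)
se    <- sqrt(diag(X %*% cluster_vcov %*% t(X)))  # se of each fitted value
tcrit <- qt(0.975, df = result_lm$df.residual)

plotdata <- data.frame(datagrid,
                       fit = fit,
                       lwr = fit - tcrit * se,
                       upr = fit + tcrit * se)
# plotdata can now be passed to the same ggplot() call as above.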

Error in fitdist with gamma distribution

Below is my code:
library(fitdistrplus)
s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8, 3, 15, 3, 9, 22, 0, 4, 10, 1, 9, 10, 11,
       2, 8, 2, 6, 0, 15, 0, 2, 11, 0, 6, 3, 5, 0, 7, 6, 0, 7, 1, 0, 6, 4, 1, 3, 5,
       2, 6, 0, 10, 6, 4, 1, 17, 0, 1, 0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1,
       3, 2, 9, 4, 4, 1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)
o <- fitdist(s, "gamma", method = "mle")
summary(o)
plot(o)
and the error says:
Error in fitdist(s, "gamma", method = "mle") : the function mle
failed to estimate the parameters,
with the error code 100
The Gamma distribution doesn't allow zero values (the likelihood evaluates to zero, and the log-likelihood to minus infinity, for a response of 0) unless the shape parameter is exactly 1.0 (i.e., an exponential distribution - see below) ... that's a statistical/mathematical problem, not a programming problem. You're going to have to find something sensible to do about the zero values. Depending on what makes sense for your application, you could (for example)
choose a different distribution to test (e.g. pick a censoring point and fit a censored Gamma, or fit a zero-inflated Gamma distribution, or ...)
exclude the zero values (fitdist(s[s>0], ...))
set the zero values to some sensible non-zero value (fitdist(replace(s, which(s==0), 0.1), ...))
Which (if any) of these is best depends on your application; a short illustration of the zero problem follows this list.
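As that illustration (an added check, not part of the original answer): the gamma density at a zero observation is 0 when shape > 1, finite only when shape = 1 (the exponential case), and infinite when shape < 1, so any zero in the data drives the log-likelihood to -Inf unless shape is exactly 1.
dgamma(0, shape = 1.5, scale = 3)  # 0    -> log-density is -Inf
dgamma(0, shape = 1.0, scale = 3)  # 1/3  -> finite (the exponential special case)
dgamma(0, shape = 0.5, scale = 3)  # Inf  -> density diverges at zero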
@Sandipan Dey's first answer (leaving the zeros in the data set) appears to make sense, but in fact it gets stuck at a shape parameter equal to 1.
o <- fitdist(s, "exp", method = "mle")
gives the same answer as @Sandipan's code (except that it estimates rate = 0.2161572, the inverse of the scale parameter 4.626262 estimated for the Gamma distribution - this is just a change of parameterization). If you choose to fit an exponential instead of a Gamma, that's fine - but you should do it on purpose, not by accident ...
To illustrate that the zeros-included fit may not be working as expected, I'll construct my own negative log-likelihood function and display the likelihood surface for each case.
mfun <- function(sh, sc, dd = s) {
  -sum(dgamma(dd, shape = sh, scale = sc, log = TRUE))
}
library(emdbook) ## for curve3d() helper function
Zeros-included surface:
cc1 <- curve3d(mfun(x, y),
               ## set up "shape" limits so we evaluate
               ## exactly shape = 1.000 ...
               xlim = c(0.55, 3.55),
               n = c(41, 41),
               ylim = c(2, 5),
               sys3d = "none")
png("gammazero1.png")
with(cc1, image(x, y, z))
dev.off()
In this case the surface is only defined at shape=1 (i.e. an exponential distribution); the white regions represent infinite log-likelihoods. It's not that shape=1 is the best fit, it's that it's the only fit ...
Zeros-excluded surface:
cc2 <- curve3d(mfun(x, y, dd = s[s > 0]),
               ## set up "shape" limits so we evaluate
               ## exactly shape = 1.000 ...
               xlim = c(0.55, 3.55),
               n = c(41, 41),
               ylim = c(2, 5),
               sys3d = "none")
png("gammazero2.png")
with(cc2, image(x, y, z))
with(cc2, contour(x, y, z, add = TRUE))
abline(v = 1.0, lwd = 2, lty = 2)
dev.off()
Just provide initial values for the gamma distribution parameters (scale, shape) to be used by mle via optim, and also lower bounds for the parameters, and it should work:
o <- fitdist(s, "gamma", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood
#Parameters :
# estimate Std. Error
#scale 4.626262 NA
#shape 1.000000 NA
#Loglikelihood: -250.6432 AIC: 505.2864 BIC: 510.4766
As per the comments by @Ben Bolker, we may want to exclude the zero points first:
o <- fitdist(s[s!=0], "gamma", method = "mle", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood
#Parameters :
# estimate Std. Error
#scale 3.401208 NA
#shape 1.622378 NA
#Loglikelihood: -219.6761 AIC: 443.3523 BIC: 448.19

Manually coded Poisson log likelihood function returns a different result from glm for interactive models

I've coded my own Poisson likelihood function, but it returns values that are significantly different from glm for a model with an interaction on one specific data set. Note that the function produces exactly the same results as glm for all other data I've tried, as well as for the model without the interaction on this data set.
> # Log likelihood function
> llpoi = function(X, y){
+ # Ensures X is a matrix
+ if(class(X) != "matrix") X = as.matrix(X)
+ # Ensures there's a constant
+ if(sum(X[, 1]) != nrow(X)) X = cbind(1, X)
+ # A useful scalar that I'll need below
+ k = ncol(X)
+ ## Function to be maximized
+ FUN = function(par, X, y){
+ # beta hat -- the parameter we're trying to estimate
+ betahat = par[1:k]
+ # mu hat -- the systematic component
+ muhat = X %*% betahat
+ # Log likelihood function
+ sum(muhat * y - exp(muhat))
+ }
+ # Optimizing
+ opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1), method = "BFGS", hessian = T)
+ # Results, including getting the SEs from the hessian
+ cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian))))
+ }
>
> # Defining inputs
> y = c(2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, 2, 4, 3, 3, 3, 1, 3, 0, 2, 1, 2, 4, 1, 2, 0, 2, 1, 2, 1, 4, 1, 2, 0)
> x1 = c(8, 1, 0, 3, 3, 3, 5, 4, 0.4, 1.5, 2, 1, 1, 7, 2, 3, 0, 2, 1.5, 5, 1, 4, 5.5, 6, 3, 3, 2, 0.5, 5, 10, 3, 22, 20, 3, 20, 10, 15, 25, 15, 6, 3.5, 5, 18, 2, 15.0, 16, 24)
> x2 = c(12, 12, 12, 16, 12, 12, 12, 12, 12, 12, 12, 12, 9, 9, 12, 9, 12, 12, 9, 16, 9, 6, 12, 9, 9, 12, 12, 12, 12, 14, 14, 14, 9, 12, 9, 12, 3, 12, 9, 6, 12, 12, 12, 12, 12, 12, 9)
>
> # Results
> withmyfun = llpoi(cbind(x1, x2, x1 * x2), y)
> round(withmyfun, 2)
[,1] [,2]
[1,] 0.96 0.90
[2,] -0.05 0.09
[3,] -0.02 0.08
[4,] 0.00 0.01
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson")
> round(summary(withglm)$coef[, 1:2], 2)
Estimate Std. Error
(Intercept) 1.08 0.90
x1 -0.07 0.09
x2 -0.03 0.08
x1:x2 0.00 0.01
Is this something data specific? Is it inherent to the optimization process, which will eventually diverge more significantly from glm, and I just got unlucky with this data? Is it a function of using method = "BFGS" for optim?
By rescaling the right-hand side variables, the outcome improves a lot.
> library(data.table)
> tmp = data.frame(y, x1, x2)  # collect the example vectors into a data set (assumed; not shown in the original)
> setDT(tmp)
> tmp[, x1 := scale(x1)][, x2 := scale(x2)]
>
>
> withmyfun = with(tmp, llpoi(cbind(x1, x2, x1 * x2), y))
> withmyfun
[,1] [,2]
[1,] 0.57076392 0.1124637
[2,] -0.19620040 0.1278070
[3,] -0.01509032 0.1169019
[4,] 0.05636459 0.1380611
>
> withglm = glm(y ~ x1 + x2 + x1 * x2, family = "poisson", data = tmp)
> summary(withglm)$coef[, 1:2]
Estimate Std. Error
(Intercept) 0.57075132 0.1124641
x1 -0.19618199 0.1278061
x2 -0.01507467 0.1169034
x1:x2 0.05636934 0.1380621
>
So my recommendation is to have a procedure inside llpoi that normalizes the variables before calling optim and rescales the estimates back before the function returns. Your example data have a very large range, which results in very small coefficient estimates. The problem gets worse because the likelihood surface is relatively flat, due to the insignificant variables.
Note: you can get very close output from the version below, except for the intercept. What I meant by standardizing is something like this:
llpoi = function(X, y){
  # Ensure X is a matrix (is.matrix avoids the length-2 class() of matrices in R >= 4.0)
  if (!is.matrix(X)) X = as.matrix(X)
  # Ensure there's a constant
  if (sum(X[, 1]) != nrow(X)) X = cbind(1, X)
  # Column means and sds of the predictors, used to standardize X and to
  # rescale the estimates at the end (the intercept column is left alone)
  avgs <- c(0, apply(X[, 2:ncol(X)], 2, mean))
  sds  <- c(1, apply(X[, 2:ncol(X)], 2, sd))
  X <- t((t(X) - avgs) / sds)
  k = ncol(X)
  ## Function to be maximized
  FUN = function(par, X, y){
    # beta hat -- the parameter we're trying to estimate
    betahat = par[1:k]
    # mu hat -- the systematic component
    muhat = X %*% betahat
    # Log likelihood (up to a constant not involving beta)
    sum(muhat * y - exp(muhat))
  }
  # Optimizing
  opt = optim(rep(0, k), fn = FUN, y = y, X = X, control = list(fnscale = -1),
              method = "BFGS", hessian = TRUE)
  # Results, including the SEs from the hessian, rescaled back to the original units
  cbind(opt$par, sqrt(diag(solve(-1 * opt$hessian)))) / sds
}
After much research, I learned that the two results differ because glm.fit, the workhorse behind glm, optimizes the function via a Newton-Raphson-type method (iteratively reweighted least squares), while I used BFGS in my llpoi function. BFGS is faster but less precise. The two results will be very similar in most cases, but may differ more significantly when the likelihood surface is too flat or has too many maxima, as correctly pointed out by amatsuo_net, because the climbing algorithm used by BFGS can get stuck.
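A further thing one can try (an addition, not from the original answers) is to tighten optim's stopping rule inside llpoi: for BFGS, optim defaults to maxit = 100 and reltol = sqrt(.Machine$double.eps) (roughly 1.5e-8 relative change in the objective), which can stop the search early on a flat likelihood surface. The snippet below is a drop-in replacement for the optim() call inside llpoi (k, FUN, X, y are the objects already defined in that function).
opt = optim(rep(0, k), fn = FUN, y = y, X = X,
            method = "BFGS", hessian = TRUE,
            control = list(fnscale = -1,
                           reltol = 1e-12,   # default is ~1.5e-8
                           maxit  = 1000))   # default is 100 for BFGS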

How does one calculate LD50 from a glmer?

I am analyzing a data set where ~10 individuals are exposed to a set treatment (Time) and mortality is recorded (Alive, Dead). glmer was used to model the data because Treatments were blocked (Trial).
From the following model I want to predict the Time at which 50% of individuals die.
library(lme4)
Trial <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
Time <- c(2, 6, 9, 12, 15, 18, 21, 24, 1, 2, 3, 4, 5, 6, 1.5, 3, 4.5, 6, 39)
Alive <- c(10, 0, 0, 0, 0, 0, 0, 0, 6, 2, 8, 1, 0, 0, 4, 6, 1, 2, 0)
Dead <- c(0, 10, 6, 10, 10, 10, 7, 10, 0, 8, 1, 9, 10, 10, 5, 0, 8, 6, 10)
ostrinaA.glmm<- glmer(cbind(Alive, Dead)~Time+(1|Trial), family = binomial(link="logit"))
summary(ostrinaA.glmm)
If I were simply modelling using glm, the dose.p function from MASS could be used. From a different forum I found generalized code for a dose.p.glmm from Bill Pikounis. It is as follows:
dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
  eta <- obj$family$linkfun(p)
  b <- fixef(obj)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
I'm new to coding and need help adjusting this code for my model. My attempt is as follows:
dose.p.glmm <- function(ostrinaA.glmm, cf = 1:2, p = 0.5) {
  eta <- ostrinaA.glmm$family$linkfun(p)
  b <- fixef(ostrinaA.glmm)[cf]
  x.p <- (eta - b[1L])/b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd <- -cbind(1, x.p)/b[2L]
  SE <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = SE, p = p)
  class(res) <- "glm.dose"
  res
}
dose.p.glmm(ostrinaA.glmm, cf=1:2, p=0.5)
Error in ostrinaA.glmm$family : $ operator not defined for this S4 class
Any assistance adjusting this code for my model would be greatly appreciated.
At a quick glance I would think replacing
eta <- obj$family$linkfun(p)
with
f <- family(obj)
eta <- f$linkfun(p)
should do the trick.
You also need to replace the res <- ... line with
res <- structure(x.p, SE = matrix(SE), p = p)
This is rather obscure, but it is necessary because the print.glm.dose method (from the MASS package) automatically tries to cbind() some stuff together. This fails if SE is a fancy matrix from the Matrix package rather than a vanilla matrix from base R: matrix() does the conversion.
If you are very new to coding, you might not realize that you don't have to change the obj variable name in the code you've copied to ostrinaA.glmm. In other words, Pikounis's code should work perfectly well with only the two modifications I suggested above.
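Putting the two changes together, here is a sketch of the full function (everything except the two modified lines is Bill Pikounis's code from the question):
dose.p.glmm <- function(obj, cf = 1:2, p = 0.5) {
  f   <- family(obj)                 # works for merMod objects, unlike obj$family
  eta <- f$linkfun(p)
  b   <- fixef(obj)[cf]
  x.p <- (eta - b[1L]) / b[2L]
  names(x.p) <- paste("p = ", format(p), ":", sep = "")
  pd  <- -cbind(1, x.p) / b[2L]
  SE  <- sqrt(((pd %*% vcov(obj)[cf, cf]) * pd) %*% c(1, 1))
  res <- structure(x.p, SE = matrix(SE), p = p)   # matrix() converts the Matrix-class SE
  class(res) <- "glm.dose"
  res
}
dose.p.glmm(ostrinaA.glmm, cf = 1:2, p = 0.5)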
