I am using R {fExtremes} to find best parameters of GEV distribution for my data (a vector). but get the following error message
Error in solve.default(fit$hessian) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0
I traced back to fit$hessian, found my hessian matrix is a sigular matrix, all of the elements are 0s. The source code (https://github.com/cran/fExtremes/blob/master/R/GevFit.R) of gevFit() shows fit$hessian is calculated by optim(). The output parameters are exactly the same value as the initial parameters. I am wondering what could be the problems of my data that cause this problem? I copied my code here
> min(sample);
[1] 5.240909
> max(sample)
[1] 175.8677
> length(sample)
[1] 6789
> mean(sample)
[1] 78.04107
>para<-gevFit(sample, type = "mle")
Error in solve.default(fit$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
fit = optim(theta, .gumLLH, hessian = TRUE, ..., tmp = data)
> fit
$par
xi -0.3129225
mu 72.5542497
beta 16.4450897
$value
[1] 1e+06
$counts
function gradient
4 NA
$convergence
[1] 0
$message
NULL
$hessian
xi mu beta
xi 0 0 0
mu 0 0 0
beta 0 0 0
I updated my dataset on google docs:
https://docs.google.com/spreadsheets/d/1IRRpjmdrrJPhNmfiLism_P0efV_Ot4HlEsa6kwMnljc/edit?usp=sharing
This is going to be a long story, possibly more suited to https://stats.stackexchange.com/.
====== Part 1 -- The problem ======
This is the sequence generating the error:
library(fExtremes)
samp <- read.csv("optimdata.csv")[ ,2]
## does not converge
para <- gevFit(samp, type = "mle")
We are facing the typical cause of lack-of-convergence when using optim() and friends: inadequate starting values for the optimisation.
To see what goes wrong, let us use the PWM estimator (http://arxiv.org/abs/1310.3222); this consists of an analytical formula, hence it does not incur into convergence problems, since it makes no use of optim():
para <- gevFit(samp, type = "pwm")
fitpwm<- attr(para, "fit")
fitpwm$par.ests
The estimated tail parameter xi is negative, corresponding to a bounded upper tail; in fact the fitted distribution displays even more "upper tail boundedness" than the sample data, as you can see from the "leveling off" of the quantile-quantile graph at the right:
qqgevplot <- function(samp, params){
probs <- seq(0.1,0.99,by=0.01)
qqempir <- quantile(samp, probs)
qqtheor <- qgev(probs, xi=params["xi"], mu=params["mu"], beta=params["beta"])
rang <- range(qqempir,qqtheor)
plot(qqempir, qqtheor, xlim=rang, ylim=rang,
xlab="empirical", ylab="theoretical",
main="Quantile-quantile plot")
abline(a=0,b=1, col=2)
}
qqgevplot(samp, fitpwm$par.ests)
For xi<0.5 the MLE estimator is not regular (http://arxiv.org/abs/1301.5611): the value of -0.46 estimated by PWM for xi is very close to that. Now the PWM estimates are used internally by gevFit() as starting values for optim(): you can see this if you print out the code for the function gevFit():
print(gevFit)
print(.gevFit)
print(.gevmleFit)
The starting value for optim is theta, obtained by PWM. For the specific data at hand, this starting value is not adequate, in that it leads to non-convergence of optim().
====== Part 2 -- solutions? ======
Solution 1 is to use para <- gevFit(samp, type = "pwm") as above. If you'd like to use ML, then you have to specify good starting values for optim(). Unfortunately, the fExtremes package does not make it easy to do so. You can then re-define your own version of .gevmleFit to include those, e.g.
.gevmleFit <- function (data, block = NA, start.param, ...)
{
data = as.numeric(data)
n = length(data)
if(missing(start.param)){
theta = .gevpwmFit(data)$par.ests
}else{
theta = start.param
}
fit = optim(theta, .gevLLH, hessian = TRUE, ..., tmp = data)
if (fit$convergence)
warning("optimization may not have succeeded")
par.ests = fit$par
varcov = solve(fit$hessian)
par.ses = sqrt(diag(varcov))
ans = list(n = n, data = data, par.ests = par.ests, par.ses = par.ses,
varcov = varcov, converged = fit$convergence, nllh.final = fit$value)
class(ans) = "gev"
ans
}
## diverges, just as above
.gevmleFit(samp)
## diverges, just as above
startp <- fitpwm$par.ests
.gevmleFit(samp, start.param=startp)
## converges
startp <- structure(c(-0.1, 1, 1), names=names(fitpwm$par.ests))
.gevmleFit(samp, start.param=startp)$par.ests
Now check this out: the beta estimated by PWM is 0.1245; by changing this to a tiny amount, the MLE is made to converge:
startp <- fitpwm$par.ests
startp["beta"]
startp["beta"] <- 0.13
.gevmleFit(samp, start.param=startp)$par.ests
This hopefully clearly illustrates that to blindly optim()ise works until it doesn't and might then turn into a quite delicate endeavour indeed. For this reason, it might be useful to leave this reply here, rather than to migrate to CrossValidated.
Related
My question relates to the use of R for the derivation of maximum likelihood estimates of parameters when a probability distributions is expressed in the form of an infinite sum, such as the one below due to Rao, Girija et al.
I wanted to see if I could reproduce the maximum likelihood estimates obtained by these authors (who used Matlab, rather than R) when the model is applied to a given set of data. My attempt is given below, although this throws up several warnings that "longer object length is not a multiple of shorter object length". I know why I am getting this warning, but I do not know how to remedy it. How can I edit my code to overcome this?
Also, is there a better way to handle infinite sums? Here I'm just using an arbitrary large number for n (1000).
library(bbmle)
svec <- list(c=1,lambda=1)
x <- scan(textConnection("0.1396263 0.1570796 0.2268928 0.2268928 0.2443461 0.3141593 0.3839724 0.4712389 0.5235988 0.5934119 0.6632251 0.6632251 0.6981317 0.7679449 0.7853982 0.8203047 0.8377580 0.8377580 0.8377580 0.8377580 0.8726646 0.9250245 0.9773844 0.9948377 1.0122910 1.0122910 1.0646508 1.0995574 1.1170107 1.1170107 1.1170107 1.1344640 1.1344640 1.1868239 1.2217305 1.2740904 1.3613568 1.3613568 1.3613568 1.4486233 1.4486233 1.5358897 1.5358897 1.5358897 1.5707963 1.6057029 1.6057029 1.6231562 1.6580628 1.6755161 1.7104227 1.7453293 1.7976891 1.8500490 1.9722221 2.0594885 2.4085544 2.6703538 2.6703538 2.7052603 3.5604717 3.7524579 3.8920842 3.9444441 4.1364303 4.1538836 4.2411501 4.2586034 4.3633231 4.3807764 4.4854962 4.6774824 4.9741884 5.5676003 5.9864793 6.1086524"))
dL <- function(x, c,lambda,n = 1000,log=TRUE) {
k <- 0:n
r <- log(sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))))
if (log) return(r) else return(exp(r))
}
dat <- data.frame(x)
m1 <- mle2( x ~ dL(c,lambda),
data=dat,
start=svec,
control=list(parscale=unlist(svec)),
method="L-BFGS-B",
lower=c(0,0)
)
I suggest starting out with that algorithm and making a density function that can be tested for proper behavior by integrating over its range of definition, (c(0, 2*pi). You are calling it a "probability function" but that is a term that I associate with CDF's rather than density distributions (PDF's):
dL <- function(x, c=1,lambda=1,n = 1000, log=FALSE) {
k <- 0:n
r <- sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda)))
if (log) {log(r) }
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(I think you were trying to pack too much into your log-likelihood function so I decide to break apart the steps.)
When I ran that version I got a warning message from the final mle2 step that I didn't like and I thought it might be the case that this density function was occasionally returning negative values, so this was my final version:
dL <- function(x, c=1,lambda=1,n = 1000) {
k <- 0:n
r <- max( sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))), 0.00000001)
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
#------------------------
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9009665 1.1372237
Log-likelihood: -116.96
(The warning and the warning-free LL numbers were the same.)
So I guess I think you were attempting to pack too much into your definition of a log-likelihood function and got tripped up somewhere. There should have been two summations, one for the density approximation and a second one for the summation of the log-likelihood. The numbers in those summations would have been different, hence the error you were seeing. Unpacking the steps allowed success at least to the extent of not throwing errors. I'm not sure what that density represents and cannot verify correctness.
As far as the question of whether there is a better way to approximate an infinite series, the answer hinges on what is known about the rate of convergence of the partial sums, and whether you can set up a tolerance value to compare successive values and stop calculations after a smaller number of terms.
When I look at the density, it makes me wonder if it applies to some scattering process:
curve(vdL(x, c=.9, lambda=1.137), 0.00001, 2*pi)
You can examine the speed of convergence by looking at the ratios of successive terms. Here's a function that does that for the first 10 terms at an arbitrary x:
> ratios <- function(x, c=1, lambda=1) {lambda*c*(x+2*(1:11)*pi)^(-c-1)*(exp(-(x+2*(1:10)*pi)^(-c))^(lambda))/lambda*c*(x+2*(0:10)*pi)^(-c-1)*(exp(-(x+2*(0:10)*pi)^(-c))^(lambda)) }
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.05)
[1] 1.755301e-08 1.235632e-04 1.541082e-05 4.024074e-06 1.482741e-06 6.686497e-07 3.445688e-07 1.952358e-07
[9] 1.187626e-07 7.634088e-08 4.443193e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
That looks like pretty rapid convergence to me, so I'm guessing that you could use only the first 20 terms and get similar results. With 20 terms the results look like:
> integrate(vdL, 0,2*pi)
0.9924498 with absolute error < 9.3e-06
> (m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9542066 1.1098169
Log-likelihood: -117.83
Since you never attempt to interpret a LL in isolation but rather look at differences, I'm guessing that the minor difference will not affect your inferences adversely.
A related question is "using fitdist from fitdistplus with binomial distribution
". fitdistrplus::fitdist is a function that takes univariate data and starting guess for parameters. To fit binomial and betabinomial data, while univariate, the size is also needed. If the size is fixed for every datum, then the aforementioned link has the fix needed. However if the sizes vary and a vector needs to be passed, I'm unsure how to get a properly functioning call.
opt_one in the code below was the solution that was offered in the aforementioned linked post -- that is, the cluster size is known and fixed. For opt_one, I incorrectly specify fix.arg=list(size=125) (in essence making every element of N=125) and this is close enough and the code runs. However, the cluster sizes in N actually vary. I try to specify this in opt_two and get an error. Any thoughts would be appreciated.
library(fitdistrplus)
library(VGAM)
set.seed(123)
N <- 100 + rbinom(1000,25,0.9)
Y <- rbetabinom.ab(rep(1,length(N)), N, 1, 2)
head(cbind(Y,N))
opt_one <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=125),
start=list(shape1=1,shape2=1)
)
opt_one
Which gives:
> head(cbind(Y,N))
Y N
[1,] 67 123
[2,] 14 121
[3,] 15 123
[4,] 42 121
[5,] 86 120
[6,] 28 125
> opt_one <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=125),
+ start=list(shape1=1,shape2=1)
+ )
Warning messages:
1: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The dbetabinom.ab function should return a zero-length vector when input has length zero
2: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The pbetabinom.ab function should return a zero-length vector when input has length zero
> opt_one
Fitting of the distribution ' betabinom.ab ' by maximum likelihood
Parameters:
estimate Std. Error
shape1 0.9694054 0.04132912
shape2 2.1337839 0.10108720
Fixed parameters:
value
size 125
Not, bad, as the shape1 and shape2 parameters were 1 and 2, respectively, as specified when we created Y. Here's option 2:
opt_two <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=N),
start=list(shape1=1,shape2=1)
)
Which gives an error:
> opt_two <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=N),
+ start=list(shape1=1,shape2=1)
+ )
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'fix.arg' must specify names which are arguments to 'distr'.
An Attempt after initial posting (thanks to Dean Follmann)
I know I can code my own betabinomial likelihood (opt_three, presented below), but would really like to use the tools provided with having a fitdist object -- that is, to have opt_two working.
library(Rfast)
loglik <-function(parm){
A<-parm[1];B<-parm[2]
-sum( Lgamma(A+B) - Lgamma(A)- Lgamma(B) + Lgamma(Y+A) + Lgamma(N-Y+B) - Lgamma(N+A+B) )
}
opt_three <- optim(c(1,1),loglik, method = "L-BFGS-B", lower=c(0,0))
opt_three
Which gives:
> opt_three
$par
[1] 0.9525161 2.0262342
$value
[1] 61805.54
$counts
function gradient
7 7
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Also related is Ben Bolker's answer using mle2. The fitdist solution remains at large.
Look at example 4 of the ?fitdistrplus::fitdist() help page:
# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view
# dedicated to probability distributions
#
dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))
fitgumbel <- fitdist(serving, "gumbel", start=list(a=10, b=10))
summary(fitgumbel)
plot(fitgumbel)
and then -- feeling inspired and informed because you actually RTM -- make your own [dpq] functions with N specified:
dbbspecifiedsize <- function(x, a, b) dbetabinom.ab(x, size=N, shape1=a, shape2=b)
pbbspecifiedsize <- function(q, a, b) pbetabinom.ab(q, size=N, shape1=a, shape2=b)
qbbspecifiedsize <- function(p, a, b) qbetabinom.ab(p, size=N, shape1=a, shape2=b)
opt_four <-
fitdist(data=Y,
distr="bbspecifiedsize",
start=list(a=1,b=1)
)
opt_four
which gives:
> opt_four
Fitting of the distribution ' bbspecifiedsize ' by maximum likelihood
Parameters:
estimate Std. Error
a 0.9526875 0.04058396
b 2.0261339 0.09576709
which is quite similar to the estimates of opt_three and is a fitdist object.
I'm setting up a simple QP to test the ROI R package. However, the package isn't able to give a wrong solution to a simple toy problem when it is unconstrained.
Example,
# Maximize -1/2 x^2 + x, no constraints
> x <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = V_bound(lb = -Inf, ub = Inf, nobj = 1))
> x
> ROI Optimization Problem:
Maximize a quadratic objective function of length 1 with
- 1 continuous objective variable,
subject to
- 0 constraints
- 1 lower and 0 upper non-standard variable bounds.
Looks good. But when I solve the problem I get,
> sol <- ROI_solve(x, solver = 'qpoases')
> sol
No optimal solution found.
The solver message was: Initialisation failed! QP could not be solved!
The objective value is: 0.000000e+00
> sol$solution
[1] 0
> sol$objval
[1] 0
> sol$status
$code
[1] 1
$msg
solver qpoases
code 36
symbol RET_INIT_FAILED_HOTSTART
message Initialisation failed! QP could not be solved!
roi_code 1
Thats odd. On a related note, when I use the quadprog solver, I'm able to get the unconstrained solution (= 1), however I've had to switch from using quadprog to qpoases for other reasons.
Any help is much appreciated.
EDIT:
Oddly enough this works,
> ROI_solve(OP(Q_objective(as.matrix(-1), 1), maximum = TRUE), solver = 'qpoases')
No optimal solution found.
The solver message was: Initialisation failed! QP could not be solved!
The objective value is: 0.000000e+00
> ROI_solve(OP(Q_objective(as.matrix(1), -1), maximum = FALSE), solver = 'qpoases')
Optimal solution found.
The objective value is: -5.000000e-01
This different results result from a different value set in the parameter hessian_type if the argument hessian_type is not provided by the
user ROI.plugin.qpoases chooses the hessian_type. And due to a
bug in the choosing of the hessian type for the first example it choose
for the first example the hessian_type = 6 (unknown) and for the second
the correct hessian_type = 1 identity.
x1 <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = NULL)
s1 <- ROI_solve(x1, solver = 'qpoases', hessian_type = 1L)
solution(s1)
x1 <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = NULL)
s1 <- ROI_solve(x1, solver = 'qpoases', hessian_type = 6L)
solution(s1)
This is fixed in the new version and the new version is on the way to CRAN.
I have the following equation:
(1)
Rp,t+1= the return of the portfolio .. fr = free risk rate.. rt+1: the return of a strategy.
beta has the following expression:
beta=x0+x1*A+ x2*B+ x3*C+x4*D (Estimated using Generalized Method of Moments (GMM) (1).
A,B,C and D are the risk factors related to rt+1.
My objective is to find the optimal values of x1, x2, x3 and x4 that maximize the utility function of the investor.
with U(Rp,t+1;x) is the investor utility
x is the vector of parameters to maximize
Zt presents the 4 risk factors.
The code is:
ret<-cbind(ret) #ret= rt+1
factors<-cbind(A,B,C,D)
func<-function(x,ret,factors) {
df <- data.frame(A=factors$A*x[1],B=factors$B*x[2],C=factors$C*x[3], D=factors$D*x[4])
H<-as.matrix(factors)
HH<-matrix(H,179,4)
m <- gmm(ret~., data=df, HH)
b<- coef(m)
beta<- b[1]+b[2]*factors$A+b[3]*factors$B+b[4]*factors$C+b[5]*D
beta=cbind(beta)
r=RF+beta*ret #equation (1)
#Annual Sharpe ratio of the portfolio
averp<-mean(r)*12
sigmap<-sqrt(12)*sd(r)
Sharpe<-averp/sigmap
#Calculating utility
u<-1/nrow(r)*sum((1+r)^(1-5)/(1-5))
obj<-u
result <- list(obj=obj,u=u,beta=beta,r=r,averp=averp,sigmap=sigmap,Sharpe=Sharpe)
return(result)
}
#Catching the obj from the function
Final<-function(x,ret,factors){
bra<-func(x,ret,factors)
#print(bra$obj)
return(-bra$obj)
}
p<-optim(par = c(0,1,2,3),Final,method="Nelder-Mead",ret=ret,factors=factors)
bra<-func(x=p$par,ret=ret,factors=factors)
When I run the code, I get the following errors:
for p -->
Error in solve.default(crossprod(hm, xm), crossprod(hm, ym)) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
for bra -->
Error in is.data.frame(x) :
(list) object cannot be coerced to type 'double'
I am would be veryy grateful if you could help me on this ! Thank you
I would write a unit test for your func function. You can use browser() to step through it.
Put ret and factors into a data frame. df <- data.frame(ret, A=factors$A*x[1], ...)
Then run m <- gmm(ret~., data=df); beta <- coef(m)
I am using maximum-likelihood optimization in Stan, but unfortunately the optimizing() function doesn't report standard errors:
> MLb4c <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits)
STAN OPTIMIZATION COMMAND (LBFGS)
init = user
save_iterations = 1
init_alpha = 0.001
tol_obj = 1e-012
tol_grad = 1e-008
tol_param = 1e-008
tol_rel_obj = 10000
tol_rel_grad = 1e+007
history_size = 5
seed = 292156286
initial log joint probability = -4038.66
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
13 -2772.49 9.21091e-005 0.0135987 0.07606 0.9845 15
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
> t2 <- proc.time()
> print(t2 - t1)
user system elapsed
0.11 0.19 0.74
>
> MLb4c
$par
psi alpha beta
0.9495000 0.4350983 -0.2016895
$value
[1] -2772.489
> summary(MLb4c)
Length Class Mode
par 3 -none- numeric
value 1 -none- numeric
How do I get the standard errors of the estimates (or confidence interval - quantiles), and possibly p-values?
EDIT: I did as kindly advised by #Ben Goodrich:
> MLb4cH <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits, hessian = TRUE)
> sqrt(diag(solve(-MLb4cH$hessian)))
psi alpha beta
0.21138314 0.03251696 0.03270493
But these "unconstrained" standard errors seem to be very different from the real ones - here as is the output from bayesian fitting using stan():
> print(outb4c, dig = 5)
Inference for Stan model: tmp_stan_model.
3 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=750.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha 0.43594 0.00127 0.03103 0.37426 0.41578 0.43592 0.45633 0.49915 594 1.00176
beta -0.20262 0.00170 0.03167 -0.26640 -0.22290 -0.20242 -0.18290 -0.13501 345 1.00402
psi 0.94905 0.00047 0.01005 0.92821 0.94308 0.94991 0.95656 0.96632 448 1.00083
lp__ -2776.94451 0.06594 1.15674 -2780.07437 -2777.50643 -2776.67139 -2776.09064 -2775.61263 308 1.01220
You can specify the hessian = TRUE argument to the optimizing function, which will return the Hessian as part of the list of output. Thus, you can obtain estimated standard errors via sqrt(diag(solve(-MLb4c$hessian))); however those standard errors pertain to the estimates in the unconstrained space. To obtain estimated standard errors for the parameters in the constrained space, you could either use the delta method or draw many times from a multivariate normal distribution whose mean vector is MLb4c$par and whose variance-covariance is solve(-MLb4c$hessian), convert those draws to the constrained space with the constrain_pars function, and estimate the standard deviation of each column.
Here is some R code you could adapt to your case
# 1: Compile and save a model (make sure to pass the data here)
model <- stan(file="model.stan", data=c("N","K","X","y"), chains = 0, iter = 0)
# 2: Fit that model
fit <- optimizing(object=get_stanmodel(model), as_vector = FALSE,
data=c("N","K","X","y"), hessian = TRUE)
# 3: Extract the vector theta_hat and the Hessian for the unconstrained parameters
theta_hat <- unlist(fit$par)
upars <- unconstrain_pars(linear, relist(theta_hat, fit$par))
Hessian <- fit$hessian
# 4: Extract the Cholesky decomposition of the (negative) Hessian and invert
R <- chol(-Hessian)
V <- chol2inv(R)
rownames(V) <- colnames(V) <- colnames(Hessian)
# 5: Produce a matrix with some specified number of simulation draws from a multinormal
SIMS <- 1000
len <- length(theta_hat)
unconstrained <- upars + t(chol(V)) %*%
matrix(rnorm(SIMS * len), nrow = len, ncol = SIMS)
theta_sims <- t(apply(unconstrained, 2, FUN = function(upars) {
unlist(constrain_pars(linear, upars))
}))
# 6: Produce estimated standard errors for the constrained parameters
se <- apply(theta_sims, 2, sd)