What is the current convergence criterion of glmnet? - r

I have attempted to reproduce the results of glmnet using the convergence criterion described in equations 1 and 2, or equivalently in Appendix 0 of the vignette (page 34): https://cran.r-project.org/web/packages/glmnet/vignettes/glmnet.pdf
equation1
equation2
Considering that each observation has a weight of 1, this gives me:
delta[i]=crossprod(X[, i], X[, i])* (beta_last[i] - beta_new[i])**2
Then I check if max(delta)>=eps, as described in the vignette
Using this criterion, I do not get the same number of iterations as glmnet (often off by one or two), which leads me to believe that the vignette is out of date. Incidentally, the convergence criterion of the glmnet algorithm in the Gaussian case seems to have changed several times over the last few years.
Do you know what criterion is used to determine the convergence of the algorithm?
Thanks in advance for your help.

glmnet rescales the weights to sum to 1 before starting the fit, so you're missing a 1/n factor in the definition of delta[i]. With that fix, this is the criterion used in the current version of glmnet (4.1-3) and also in version 4.1-2. Keep in mind that there may be other differences, such as active set/strong set handling, that you may not be implementing in exactly the same way as glmnet does; these can also affect the number of coordinate descent iterations you observe.
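As a minimal sketch of the corrected check, assuming unit observation weights rescaled to sum to 1 (so each weight is 1/n) and the threshold comparison described in the vignette. The function name max_weighted_change, the toy data and the value of eps are made up for illustration, and active-set/strong-set handling is ignored entirely:

```r
# Largest weighted squared coefficient change over all columns of X.
# With weights rescaled to sum to 1, sum_i w_i * x_ij^2 = crossprod(x_j) / n.
max_weighted_change <- function(X, beta_last, beta_new) {
  n <- nrow(X)
  delta <- colSums(X^2) / n * (beta_last - beta_new)^2
  max(delta)
}

# Toy data (hypothetical values).
X <- cbind(c(1, 1, 1, 1), c(2, 0, 2, 0))
beta_last <- c(0.5, 1.0)
beta_new  <- c(0.0, 0.9)
eps <- 1e-7  # illustrative threshold only, not necessarily glmnet's setting

change <- max_weighted_change(X, beta_last, beta_new)
converged <- change < eps
```

Here the first column contributes (4/4) * 0.5^2 and the second (8/4) * 0.1^2, so the maximum comes from the first column and the loop would continue.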

Why can a Strauss-hardcore model have a gamma bigger than 1?

The spatstat book states clearly that a Strauss model is invalid with a gamma bigger than 1, and that is what I observe:
multiple.Strauss<-ppm(P1a4.multiple~1, Strauss(r=51),method='ho')
#Warning message:
#Fitted model is invalid - cannot be simulated
As the L(r) function has a trough first, I refitted the data as a Strauss-hardcore model:
Mo.hybrid<-Hybrid(H=Hardcore(),S=Strauss(51))
multiple.hybrid<-ppm(P1a4.multiple~1,Mo.hybrid,method='ho')
#Hard core distance: 12.65963
#Fitted S interaction parameter gamma: 2.7466492
It is interesting to see that the model fitted successfully, with a gamma > 1!
I want to know whether gamma in the Strauss-hardcore model has the same meaning as in the Strauss model, and could therefore be used as an indicator of aggregation.
Yes, the interpretation is similar and indicates some aggregation behaviour. The model with gamma>1 may be less intuitive to understand: Say the hardcore distance is r=12 and the Strauss interaction distance is R=50. Then you say that pairs of points within distance 12 of each other are heavily penalized (not permitted at all) while pairs of points separated by between 12 and 50 are encouraged (have a higher probability of occurring than at random). Pairs of points separated by more than 50 do not change the baseline probability (complete randomness).
Simulations from the StraussHardcore model often show strange aggregation behavior, but it may be suitable for your data.
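The three distance regimes described above can be written as a pairwise interaction factor. This is only a sketch in base R, not spatstat code: the function name strauss_hardcore_factor is hypothetical, the numbers are the rounded values from the example (hard core 12, Strauss range 50, gamma about 2.75), and the treatment of distances exactly on a boundary is an assumption:

```r
# Multiplicative contribution of one pair of points at distance d:
#   0     if the pair is closer than the hard core distance (forbidden),
#   gamma if the pair is between the hard core and the Strauss range,
#   1     if the pair is farther apart than the Strauss range (no interaction).
strauss_hardcore_factor <- function(d, hc, R, gamma) {
  ifelse(d < hc, 0, ifelse(d <= R, gamma, 1))
}

d <- c(5, 30, 80)  # example pair distances
f <- strauss_hardcore_factor(d, hc = 12, R = 50, gamma = 2.75)
```

With gamma > 1 the middle regime multiplies the density by a factor above 1, which is why mid-range pairs are encouraged, while the hard core keeps the number of points in any bounded region finite.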

bnlearn::bn.fit difference and calculation of methods "mle" and "bayes"

I am trying to understand the differences between the two methods bayes and mle in the bn.fit function of the package bnlearn.
I know about the debate between the frequentist and the Bayesian approach to understanding probabilities. On a theoretical level, I suppose the maximum likelihood estimate mle is a simple frequentist approach that sets the probabilities to the relative frequencies. But what calculations are done to get the bayes estimate? I have already checked the bnlearn documentation, the description of the bn.fit function and some application examples, but nowhere is there a real description of what is happening.
I also tried to understand the function in R by first checking out bnlearn::bn.fit, leading to bnlearn:::bn.fit.backend, leading to bnlearn:::smartSapply but then I got stuck.
Some help would be really appreciated as I use the package for academic work and therefore I should be able to explain what happens.
Bayesian parameter estimation in bnlearn::bn.fit applies to discrete variables. The key is the optional iss argument: "the imaginary sample size used by the bayes method to estimate the conditional probability tables (CPTs) associated with discrete nodes".
So, for a binary root node X in some network, the bayes option in bnlearn::bn.fit returns (Nx + iss / cptsize) / (N + iss) as the probability of X = x, where N is your number of samples, Nx the number of samples with X = x, and cptsize the size of the CPT of X; in this case cptsize = 2. The relevant code is in the bnlearn:::bn.fit.backend.discrete function, in particular the line: tab = tab + extra.args$iss/prod(dim(tab))
Thus, iss / cptsize is the number of imaginary observations for each entry in a CPT, as opposed to N, the number of 'real' observations. With iss = 0 you would be getting a maximum likelihood estimate, as you would have no prior imaginary observations.
The higher iss with respect to N, the stronger the effect of the prior on your posterior parameter estimates. With a fixed iss and a growing N, the Bayesian estimator and the maximum likelihood estimator converge to the same value.
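To make the formula concrete, here is a small worked calculation in base R. It does not call bnlearn; the helper name bayes_estimate and the counts are hypothetical, and it simply evaluates (Nx + iss / cptsize) / (N + iss) for a binary root node:

```r
# Posterior estimate of P(X = x) as described above.
bayes_estimate <- function(Nx, N, iss, cptsize) {
  (Nx + iss / cptsize) / (N + iss)
}

# Hypothetical counts: 3 of 10 samples have X = x, binary node (cptsize = 2).
mle   <- bayes_estimate(Nx = 3, N = 10, iss = 0, cptsize = 2)  # 3 / 10
bayes <- bayes_estimate(Nx = 3, N = 10, iss = 2, cptsize = 2)  # 4 / 12

# Same relative frequency with 100 times more data and the same iss.
bayes_large <- bayes_estimate(Nx = 300, N = 1000, iss = 2, cptsize = 2)  # 301 / 1002
```

With iss = 0 the result is the plain relative frequency, and with iss fixed at 2 the estimate moves from 4/12 back toward 0.3 as N grows, which illustrates the convergence of the two estimators.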
A common rule of thumb is to use a small non-zero iss so that you avoid zero entries in the CPTs, corresponding to combinations that were not observed in the data. Such zero entries could then result in a network which generalizes poorly, such as some early versions of the Pathfinder system.
For more details on Bayesian parameter estimation you can have a look at the book by Koller and Friedman. I suppose many other Bayesian network books also cover the topic.

McDonald's omega: warnings in R

I'm computing omega for several different scales, and I get different warning messages for different scales with different omega functions in R. My questions concern how to interpret these warnings and whether it is safe to report the resulting omega statistics.
When I'm using the following function from the article "From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation"
ci.reliability(subscale1, interval.type="bca", B=1000)
I get these warnings:
1: In lav_object_post_check(lavobject) :
lavaan WARNING: some estimated variances are negative
2: In lav_object_post_check(lavobject) :
lavaan WARNING: observed variable error term matrix (theta) is not positive definite; use inspect(fit,"theta") to investigate.
And it can be many of them!
What do they mean?
I still receive omega statistics; can they be interpreted or not?
When I use the function:
psych::omega(subscale1)
I get this warning:
Warning message:
In GPFoblq(L, Tmat = Tmat, normalize = normalize, eps = eps, maxit = maxit, :
convergence not obtained in GPFoblq. 1000 iterations used.
Again,
What does it mean; and can I use the omega-statistics that I get?
Note that these warnings appear on different subscales; so one subscale can be computed using one of the function but not the other and vice versa.
EDIT: If it helps: Subscale1 encompasses 4 items; the sample includes N>300. Also, I can run a CFA analysis on these 4 items in lavaan (Chi2=11.8, p<.001; CFI=0.98; RMSEA=0.123).
The article to which you are referring seems to be Dunn, Baguley and Brunsden, British Journal of Psychology (2014), 105, 399–412. The omega coefficient they discuss is actually what Rick Zinbarg and I refer to as omega_total. (McDonald developed two omega coefficients, which has led to this confusion.)
You are having problems using omega in my psych package. The omega function in psych is meant to find omega_hierarchical as well as omega_total. Thus, it tries (by default) to extract three lower-level factors and then, in turn, factor the resulting correlations of those factors. However, with only 4 variables in your subscale, it cannot find a meaningful 3-factor solution. You can specify that you want to find two factors:
omega(subscale1,2)
and it will work. However, omega_h is not particularly meaningful for 4 items.
Contrary to the suggestion that sample size is the cause, the problem is actually due to the number of items.
I think you might find the tutorial for finding omega_h using psych helpful:
http://personality-project.org/r/psych/HowTo/R_for_omega.pdf

R function for Likelihood

I'm trying to analyze the reliability of repairable systems using growth models.
I have already fitted a Crow-AMSAA model, but I wonder whether there is any package or code in R for fitting a Generalized Renewal Process (Kijima Model I or Model II) and finding its parameters Beta, Lambda (or alpha) and q (or some other model for the mean cumulative function, MCF).
Equation number 15 of this article gives an expression for the log-likelihood.
I tried to create the function like this:
likelihood.G1 <- function(theta, x) {
  # x is a vector with the failure times, theta is the vector of parameters
  a <- theta[1]  # Alpha
  b <- theta[2]  # Beta
  q <- theta[3]  # q
  logl2 <- log(b / a)  # First part of the equation
  for (i in 1:length(x)) {
    logl2 <- logl2 + (b - 1) * log(x[i] / (a * (1 + q)^(i - 1))) - (x[i] / (a * (1 + q)^(i - 1)))^b
  }
  return(-logl2)  # Negative of the log-likelihood
}
Then I use an optimization routine to minimize -log(L):
theta=c(0.5,1.2,0.8) #Start parameters (lambda,beta,q)
nlm(likelihood.G1,theta, x=Data)
Or also
optim(theta,likelihood.G1,method="BFGS",x=Data)
However, there seems to be some mistake, since the parameters it returns make no sense.
Any ideas about what I'm doing wrong?
Thanks
Looking at equation (16) of the paper you reference and comparing it with your code, it looks like you are missing one term in the for loop. Each data point seems to contribute three terms to the log-likelihood, but inside the loop your code has only two (not counting the updating term).
Specifically, your code does not include the 4th term in equation (16), nor the 7th term, and so on. This is at least one error in the code. A further consideration is that α and β are constrained to be greater than zero, and I am not sure whether the solver you are using respects this constraint.
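One common way to respect positivity constraints with an unconstrained solver is to optimize over the logarithms of the parameters. The sketch below illustrates only that idea: it uses a plain two-parameter Weibull negative log-likelihood as a stand-in (not the Kijima model and not equation (16) of the paper), and the function name, simulated data and starting values are all made up for illustration:

```r
# Stand-in model: ordinary Weibull negative log-likelihood (NOT the Kijima GRP).
# The optimizer works on log(alpha) and log(beta), so exp() keeps both positive.
negloglik_log <- function(log_theta, x) {
  a <- exp(log_theta[1])  # scale (alpha) > 0 by construction
  b <- exp(log_theta[2])  # shape (beta)  > 0 by construction
  -sum(dweibull(x, shape = b, scale = a, log = TRUE))
}

set.seed(1)
x <- rweibull(200, shape = 1.5, scale = 2)  # simulated failure times

fit <- optim(c(0, 0), negloglik_log, x = x, method = "BFGS")
est <- exp(fit$par)  # back-transform to (alpha, beta)
```

An alternative with the same purpose is optim(..., method = "L-BFGS-B", lower = ...), which takes box constraints directly but requires the objective to stay finite everywhere it is evaluated.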

Why is rmvnorm() function returning "In sqrt(ev$values) : NaNs produced", what is this error and how can it be corrected or avoided?

I am working with financial/economic data, in case you are wondering about the large size of some of the coefficients below. My general question concerns simulating the coefficient estimates from a linear random effects model in R. I am attempting to generate a random sample of beta coefficients using the model coefficients and the variance-covariance (VCOV) matrix from the same model. My question is: why am I receiving the warning below about sqrt(ev$values) when using the rmvnorm() function from the mvtnorm package, and how can I deal with it?
#Example call: lmer model with random effects by YEAR
#mlm<-lmer(DV~V1+V2+V3+V2*V3+V4+V5+V6+V7+V8+V9+V10+V11+(1|YEAR), data=dat)
#Note: 5 years (5 random effects total)
#LMER call yields the following information:
coef<-as.matrix(c(-28037800,0.8368619,2816347,8681918,-414002.6,371010.7,-26580.84,80.17909,271.417,-239.1172,3.463785,-828326))
sigma<-as.matrix(rbind(c(1834279134971.21,-415.95,-114036304870.57,-162630699769.14,-23984428143.44,-94539802675.96,
-4666823087.67,-93751.98,1735816.34,-1592542.75,3618.67,14526547722.87),
c(-415.95,0.00,41.69,94.17,-8.94,-22.11,-0.55,0.00,0.00,0.00,0.00,-7.97),
c(-114036304870.57,41.69,12186704885.94,12656728536.44,-227877587.40,-2267464778.61,
-4318868.82,8909.65,-355608.46,338303.72,-321.78,-1393244913.64),
c(-162630699769.14,94.17,12656728536.44,33599776473.37,542843422.84,4678344700.91,-27441015.29,
12106.86,-225140.89,246828.39,-593.79,-2445378925.66),
c(-23984428143.44,-8.94,-227877587.40,542843422.84,32114305557.09,-624207176.98,-23072090.09,
2051.16,51800.37,-49815.41,-163.76,2452174.23),
c(-94539802675.96,-22.11,-2267464778.61,4678344700.91,-624207176.98,603769409172.72,90275299.55,
9267.90,208538.76,-209180.69,-304.18,-7519167.05),
c(-4666823087.67,-0.55,-4318868.82,-27441015.29,-23072090.09,90275299.55,82486186.42,-100.73,
15112.56,-15119.40,-1.34,-2476672.62),
c(-93751.98,0.00,8909.65,12106.86,2051.16,9267.90,-100.73,2.54,8.73,-10.15,-0.01,-1507.62),
c(1735816.34,0.00,-355608.46,-225140.89,51800.37,208538.76,15112.56,8.73,527.85,-535.53,-0.01,21968.29),
c(-1592542.75,0.00,338303.72,246828.39,-49815.41,-209180.69,-15119.40,-10.15,-535.53,545.26,0.01,-23262.72),
c(3618.67,0.00,-321.78,-593.79,-163.76,-304.18,-1.34,-0.01,-0.01,0.01,0.01,42.90),
c(14526547722.87,-7.97,-1393244913.64,-2445378925.66,2452174.23,-7519167.05,-2476672.62,-1507.62,21968.29,
-23262.72,42.90,229188496.83)))
#Error begins here:
betas<-rmvnorm(n=1000, mean=coef, sigma=sigma)
#rmvnorm breaks, Error returned:
Warning message: In sqrt(ev$values) : NaNs produced
When I Google the search string "rmvnorm, Warning message: In sqrt(ev$values) : NaNs produced", I find the following:
http://www.nickfieller.staff.shef.ac.uk/sheff-only/mvatasksols6-9.pdf states on page 4 that this error indicates "negative eigen values". However, I have no idea, conceptually or practically, what a negative eigenvalue is or why one would be produced in this instance.
The second search result, http://www.r-tutor.com/r-introduction/basic-data-types/complex2, indicates that this error arises from an attempt to take the square root of -1, which is "not a complex value" (you cannot take the square root of -1).
The question remains, what is going on here with the random generation of the betas, and how can this be corrected?
sessionInfo() R version 3.0.2 (2013-09-25) Platform:
x86_64-apple-darwin10.8.0 (64-bit)
Using the following packages/versions
mvtnorm_0.9-9994,
lme4_1.1-5,
Rcpp_0.10.3,
Matrix_1.1-2-2,
lattice_0.20-23
You have a huge range of scales in your eigenvalues:
range(eigen(sigma)$values)
## [1] -1.005407e-05 1.863477e+12
I prefer to use mvrnorm from the MASS package, just because it comes installed automatically with R. It also appears to be more robust:
set.seed(1001)
m <- MASS::mvrnorm(n=1000, mu=coef, Sigma=sigma) ## works fine
edit: OP points out that using method="svd" with rmvnorm also works.
If you print the code for MASS::mvrnorm, or debug(MASS::mvrnorm) and step through it, you see that it uses
if (!all(ev >= -tol * abs(ev[1L]))) stop("'Sigma' is not positive definite")
(where ev is the vector of eigenvalues, in decreasing order, so ev[1] is the largest eigenvalue) to decide on the positive definiteness of the variance-covariance matrix. In this case ev[1L] is about 2e12, tol is 1e-6, so this would allow negative eigenvalues up to a magnitude of about 2e6. In this case the minimum eigenvalue is -1e-5, well within tolerance.
Farther down MASS::mvrnorm uses pmax(ev,0) -- that is, if it has decided that the eigenvalues are not below tolerance (i.e. it didn't fail the test above), it just truncates the negative values to zero, which should be fine for practical purposes.
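The tolerance test and the truncation step can be reproduced on a small example. This is a base-R sketch of the logic quoted above, not the MASS source itself: the 3 x 3 matrix S is a hypothetical stand-in for the 12 x 12 sigma, built to have one tiny negative eigenvalue next to a very large one:

```r
# Build a symmetric matrix with eigenvalues 1e6, 1 and -1e-6 (hypothetical values).
Q <- qr.Q(qr(matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 4), 3, 3)))  # an orthogonal basis
S <- Q %*% diag(c(1e6, 1, -1e-6)) %*% t(Q)
S <- (S + t(S)) / 2  # remove rounding asymmetry

e  <- eigen(S, symmetric = TRUE)
ev <- e$values  # sorted in decreasing order, so ev[1L] is the largest

# Taking sqrt() of the raw eigenvalues reproduces the NaN warning from the question.
has_nan <- suppressWarnings(any(is.nan(sqrt(ev))))

# Relative tolerance test of the same form as the line quoted above.
tol <- 1e-6
within_tol <- all(ev >= -tol * abs(ev[1L]))

# Truncate the small negative eigenvalue to zero before taking square roots.
ev_clipped <- pmax(ev, 0)
root <- e$vectors %*% diag(sqrt(ev_clipped))  # root %*% t(root) approximates S
```

Because the threshold is relative to the largest eigenvalue, a negative eigenvalue of magnitude 1e-6 next to one of 1e6 passes the test and is simply set to zero.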
If you insisted on using rmvnorm you could use Matrix::nearPD, which tries to force the matrix to be positive definite -- it returns a list which contains (among other things) the eigenvalues and the "positive-definite-ified" matrix:
m <- Matrix::nearPD(sigma)
range(m$eigenvalues)
## [1] 1.863477e+04 1.863477e+12
The eigenvalues computed from the matrix are not quite identical -- nearPD and eigen use slightly different algorithms -- but they're very close.
range(eigen(m$mat)$values)
## [1] 1.861280e+04 1.863477e+12
More generally:
Part of the reason for the huge range of eigenvalues might be predictor variables that are scaled very differently. It might be a good idea to scale your input data, if possible, to make the variances more similar to each other (this will make all of your numerical computations more stable); you can always rescale the values once you've generated them.
It's also the case that when matrices are very close to singular (i.e. some eigenvalues are very close to zero), small numerical differences can change the sign of the eigenvalues. In particular, if you copy and paste the values, you might lose some precision and cause this problem. Using dput(vcov(fit)), or saving the matrix to a file with saveRDS(), to keep the variance-covariance matrix at full precision is safer.
If you have no idea what "positive definite" means, you might want to read up about it. The Wikipedia articles on covariance matrices and positive definite matrices might be a little too technical for you to start with; this question on StackExchange is closer, but still a little technical. The next entry on my Google journey was this one, which looks about right.
