I am currently trying to fit a linear model to count data, where the errors follow a Poisson distribution. Namely, I would like to minimize the following
where i indexes the samples. β is a vector of m coefficients and x consists of m independent (explanatory) variables. The coefficients in β should sum to 1 and each should be larger than 0.
I am using R and I tried the package glmc without much success. The only example in the documentation just confuses me, as I don't see how the constraint matrix Amat enforces a constraint on the coefficients. Is there any other example I could look at, or another package?
I also tried solving this analytically with medium success.
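In case it helps to see what I mean, here is a direct-optimization sketch using Rsolnp::solnp, which supports the equality and bound constraints; it assumes a count vector y, an n × m matrix X of explanatory variables, and that the objective is the Poisson negative log-likelihood of the identity-link mean Xβ (all of these are my illustrative assumptions):

library(Rsolnp)

# Poisson negative log-likelihood (up to a constant) for an identity-link
# linear mean. X (n x m) and y are assumed to exist; mu must stay positive,
# which the lower bound on beta helps ensure when X is non-negative.
negll <- function(beta) {
  mu <- as.vector(X %*% beta)
  sum(mu - y * log(mu))
}

m   <- ncol(X)
fit <- solnp(pars  = rep(1/m, m),               # start on the simplex
             fun   = negll,
             eqfun = function(beta) sum(beta),  # equality constraint:
             eqB   = 1,                         #   sum(beta) == 1
             LB    = rep(1e-8, m))              # each coefficient > 0
fit$pars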
Any help is appreciated!
kind regards, Lena
I read the glmmTMB package vignettes (https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html) and have the following questions:
In the vignette, they fit the model using
glmmTMB(y~ar1(times+0|group),data=dat0)
and mention that times+0 corresponds to "a design matrix Z linking observation vector y (rows) with a random effects vector u (columns)".
What's the meaning of +0? Is there any difference with (times|group), (times+1|group) and (1|group)?
Is there a comprehensive summary of the syntax for the covariance structures?
If I want to fit a negative binomial model where the outcome y_{ij} is generated by the R call rnbinom(mu = x_{ij}*beta + b_i + e_{ij}, size = 1), with i the group index, j the individual index, b_i ~ N(0,1) and e_{ij} ~ N(0,1), would the following code correctly specify the model?
dat <- data.frame(y,x,group)
glmmTMB(y~x+(1|group),data=dat,family=nbinom2)
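For concreteness, this is a minimal simulation sketch of the data-generating process I describe above (the sample sizes and beta are illustrative assumptions; because the mean here is on the identity scale it can go negative, so it is truncated just for illustration):

set.seed(1)
n_group <- 50; n_per <- 10; beta <- 0.5
group <- rep(seq_len(n_group), each = n_per)
x <- rnorm(n_group * n_per)
b <- rnorm(n_group)                      # b_i ~ N(0, 1)
e <- rnorm(n_group * n_per)              # e_{ij} ~ N(0, 1)
mu <- x * beta + b[group] + e            # identity-scale mean from above
y <- rnbinom(n_group * n_per, mu = pmax(mu, 1e-3), size = 1)  # mu must be > 0
dat <- data.frame(y, x, group)

Note that the glmmTMB fit above models the mean on a log link (the default for nbinom2) and has no observation-level term corresponding to e_{ij}, so I am not sure it matches this identity-scale process exactly.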
Any suggestions and help are appreciated. Thanks in advance for your help!
I'm working on a classification problem (predicting three classes) and I'm comparing SVM against Random Forest in R.
For evaluation and comparison I want to calculate the bias and variance of the models. I've looked up the two terms in many machine learning books and I'd say I understand the meaning of variance and bias (the easiest explanation being the bullseye diagram), but I can't really figure out how to apply them in my case.
Let's say I predict the results for a test set with 4 SVM-models that were trained with 4 different training sets. Each time I get a total error (meaning all wrong predictions/all predictions).
Do I then get the bias for SVM by calculating this:

bias = (error_1 + error_2 + error_3 + error_4) / 4
which would mean that the bias is more or less the mean of the errors?
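To make the setup concrete, here is a minimal sketch of the procedure I describe, assuming a data frame dat with a factor column class and using e1071 for the SVM (all names are my illustrative assumptions):

library(e1071)
set.seed(42)
# total error of 4 SVMs trained on 4 different random training sets
errs <- replicate(4, {
  idx  <- sample(nrow(dat), size = 0.75 * nrow(dat))   # training rows
  fit  <- svm(class ~ ., data = dat[idx, ])
  pred <- predict(fit, newdata = dat[-idx, ])
  mean(pred != dat$class[-idx])    # wrong predictions / all predictions
})
mean(errs)                         # the "mean of the errors" from above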
I hope you can help me with a not-too-complicated formula, because I've already seen many of them.
My response variable is Y_ijk, corresponding to the recovery time of
patient i (i = 1, ..., I)
with treatment j (j = 1, ..., J)
measured at time k (k = 1, ..., K).
I would like to fit the following model:

Y_ijk = μ + α_j + b_ik + u_ijk

where:
μ is a global fixed intercept
αj is a fixed effect for the treatment
b_ik is a random effect with the following covariance structure: denoting by b_i the K-dimensional vector of effects for patient i, its variance-covariance matrix R has the AR(1) structure

R = σ_b² ×
| 1        ρ        ρ²       ...  ρ^(K-1) |
| ρ        1        ρ        ...  ρ^(K-2) |
| ...      ...      ...      ...  ...     |
| ρ^(K-1)  ρ^(K-2)  ρ^(K-3)  ...  1       |

u_ijk is the usual error term with variance σ²
Consider the following line of command:
lme(recovery ~ treatment, method = "REML", random = ~ 1 | patient, correlation = corAR1(form = ~ time | patient), data = data)
Several questions:
What does this correlation argument correspond to? The covariance structure of what? Is it the variance-covariance matrix I defined as R?
Does the line actually do what I would like to?
If not, what does it do?
If not, is there a way to do what I would like to?
Thank you in advance!
First, you use the command lme; I will assume you mean lme() from the nlme package, because (a) lme isn't available in base R or in lme4, and (b) correlation isn't an argument in lme4.
Second, the documentation for nlme's lme describes the correlation argument like this:
an optional corStruct object describing the within-group correlation
structure. See the documentation of corClasses for a description of
the available corStruct classes. Defaults to NULL, corresponding to no
within-group correlations.
and in corClasses it says
corAR1 autoregressive process of order 1.
So the answers to your first two questions appear to be "Yes".
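Putting that together, a sketch of your call with corAR1 constructed the way nlme expects (variable names taken from your question):

library(nlme)
fit <- lme(recovery ~ treatment,
           random      = ~ 1 | patient,
           correlation = corAR1(form = ~ time | patient),  # AR(1) within patient
           method      = "REML",
           data        = data)
summary(fit)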
I have two types of individuals, say M and F, each described by six variables (forming a 6D space S). I would like to identify the regions of S where the densities of M and F differ most. I first tried a binomial logistic model linking F/M to the six variables, but the result of this GLM is very hard to interpret (in part due to the numerous significant interaction terms). I am therefore considering a "spatial" analysis in which I would separately estimate the density of the M and F individuals everywhere in S, then calculate the difference in densities. I would then manually look for the largest difference in densities and extract the values of the six variables there.
I found the function sm.density in the package sm, which can estimate densities in a 3D space, but I found nothing for a space with n > 3. Do you know of something in R that can do this? Alternatively, would you have a more elegant method to answer my first question (second sentence)?
Thanks a lot in advance for your help.
The function kde in the ks package performs kernel density estimation for multivariate data with dimensions ranging from 1 to 6.
The pdfCluster and np packages provide functions to perform kernel density estimation in higher dimensions.
If you prefer parametric techniques, you can look at R packages for Gaussian mixture estimation such as mclust or mixtools.
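For instance, a minimal sketch with ks::kde, assuming M6 and F6 are n × 6 numeric matrices holding the six variables for the M and F individuals (the names are illustrative, and the default plug-in bandwidth can be slow in six dimensions):

library(ks)
pts    <- rbind(M6, F6)                  # evaluate densities at the observed points
dens.m <- kde(x = M6, eval.points = pts)$estimate
dens.f <- kde(x = F6, eval.points = pts)$estimate
d      <- dens.m - dens.f                # density difference over S
pts[which.max(abs(d)), ]                 # coordinates of the largest difference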
The ability to do this with GLM models may be constrained both by the interpretability issues you already encountered and by numerical stability issues. Furthermore, you don't describe the GLM models, so it's not possible to see whether you considered non-linearity. If you have lots of data, you might consider using 2D crossed spline terms. (These are not really density estimates.) If I were doing initial exploration with the facilities in the rms/Hmisc packages, in five dimensions it might look like:
library(rms)
dd <- datadist(dat)
options(datadist="dd")
big.mod <- lrm( MF ~ ( rcs(var1, 3) +  # `lrm` is logistic regression in rms
                       rcs(var2, 3) +
                       rcs(var3, 3) +
                       rcs(var4, 3) +
                       rcs(var5, 3) )^2,  # all 2-way interactions
                data = dat,
                maxit = 50)  # these fits may take a long time
bplot( Predict(big.mod, var1, var2, n=10) )
That should show the simultaneous functional form of var1's and var2's contributions to the "5-dimensional" model estimates, evaluated at 10 points each and at the median values of the three other variables.
I would first like to say that I understand that calculating an R^2 value for a non-linear regression isn't exactly correct or a valid thing to do.
However, I'm in a transition period, moving most of our work from SigmaPlot over to R, and for our non-linear (concentration-response) models colleagues are used to seeing an R^2 value associated with the model to estimate goodness of fit.
SigmaPlot calculates the R^2 as 1 - (residual SS / total SS), but in R I can't seem to extract the total SS (the residual SS is reported in summary).
Any help in getting this to work would be greatly appreciated as I try to move us toward a better estimator of goodness of fit.
Cheers.
Instead of extracting the total SS, I've just calculated it:
test.mdl <- nls(ctrl.adj ~ a/(1 + (conc.calc/x0)^b),   # three-parameter concentration-response curve
                data = dataSet,
                start = list(a = 100, b = 10, x0 = 40), trace = TRUE)
1 - (deviance(test.mdl)/sum((dataSet$ctrl.adj - mean(dataSet$ctrl.adj))^2))
I get the same R^2 as with SigmaPlot, so all should be good.
The total variation in y is (n-1)*var(y), and the variation not explained by your model is sum(residuals(fit)^2), so do something like:
1 - sum(residuals(fit)^2) / ((n-1)*var(y))
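A quick self-contained check, on made-up data (all names and values here are illustrative), that this agrees with the deviance-based formula above:

set.seed(1)
n    <- 50
conc <- seq(1, 100, length.out = n)
y    <- 100/(1 + (conc/40)^2) + rnorm(n, sd = 3)     # toy concentration-response data
fit  <- nls(y ~ a/(1 + (conc/x0)^b), start = list(a = 100, b = 2, x0 = 40))
1 - deviance(fit)/sum((y - mean(y))^2)               # 1 - residual SS / total SS
1 - sum(residuals(fit)^2)/((n - 1)*var(y))           # same value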