I'm having a problem with the anova function in the rms package:
require(rms)
getHdata(prostate)
mod1<-cph(Surv(dtime,status!="Alive")~stage+rx+age+wt,data=prostate,x=T,y=T)
mod2<-cph(Surv(dtime,status!="Alive")~stage+rx+rcs(age,4)+wt,data=prostate,x=T,y=T)
anova(mod1)
anova(mod2)
Everything works fine, but when I try to compare the models for the impact of non-linearity in age:
anova(mod1,mod2)
I get
Error in anova.rms(mod1, mod2) : factor names not in design: mod2
What does this mean? What can I do to circumvent it?
//M
You should be able to use the output of anova(mod2) as one way to assess the significance, but the best answer would be to compare the -2*log(likelihood) statistics. The anova.rms function is not designed to take two model fits; the second and subsequent unnamed arguments are assumed to be names of terms within the model rather than fit objects.
(Note that with rcs terms you will not see the sum of individual terms equal the full model chi-square values. I have asked Harrell about this and he says to do the cross-model comparisons "by hand".)
This comparison is done using lrtest (per Misha's comment).
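A minimal sketch of that by-hand comparison, reusing the mod1 and mod2 fits above (mod1, with linear age, is nested in mod2, which uses rcs(age, 4)):
lrtest(mod1, mod2)  # likelihood-ratio test for the added non-linear spline terms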
I have human data from different ages and genders. After using integration with Seurat, how would I best control for these confounding factors during differential gene expression analysis? I see the option latent.vars in the FindMarkers function. Can I give latent.vars = c("Age", "gender") to account for both together, or can I only use one at a time?
Is there an alternative package that does this test better?
Thanks in advance!
You can use that argument, but what it means is that you are shifting to a glm-based model rather than the default Wilcoxon test. You can also see this in the help page (?FindMarkers):
latent.vars: Variables to test, used only when ‘test.use’ is one of
'LR', 'negbinom', 'poisson', or 'MAST'
You can see how the glm is called in the source code, under GLMDETest. Basically, these two covariates are included in the glm to account for their effects on the dependent variable. What is also important is how you treat the covariate age in this case: would it be categorical or continuous? This could affect your results.
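As a rough sketch, assuming your integrated Seurat object is called seu and the metadata columns are literally named "Age" and "gender" (the ident values here are placeholders):
library(Seurat)
markers <- FindMarkers(seu, ident.1 = "cluster1", ident.2 = "cluster2",
                       test.use = "LR",                    # switch to a glm-based test so latent.vars is used
                       latent.vars = c("Age", "gender"))   # adjust for both covariates at once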
m1 <- lm(AmountSpent~Catalogs*Salary,data=d)
summary(m1)
m2<-lm(AmountSpent~Catalogs+Salary,data=d)
summary(m2)
anova(m2,m1,test="Chisq")
The output is as follows
What is the better model according to the test? Is the order in which we insert the models into the call important? Please explain the statistical concept behind this test.
The chi-square test assesses the statistical significance of the reduction in the residual sum of squares between the nested linear models. From your R output you can see that adding a term to the regression resulted in a statistically better model (the one with the lower RSS value, Model 2).
It is usual to start the comparison with the simpler model and then add terms; however, the docs also mention that the order is up to the user.
You should take a look at the docs of the anova.lm function here.
For comparing models that are not nested, use the AIC or BIC criteria instead.
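For example, with the two fits above you can compare the information criteria directly (lower is better):
AIC(m1, m2)
BIC(m1, m2)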
I'm making a PLS model using the packages "pls" and "ChemometricswithR". I'm able to fit the model, but I have a problem. I did a leave-one-out validation, and if I ask for the coefficients I can see only one equation (I suppose the average of all the equations developed in the leave-one-out validation).
Is there a way to see all the n equations (where n is the number of observations in my matrix) with all the slope coefficients?
This is the model I used:
mod2 <- plsr(SH_uve~matrix_uve, ncomp=11, data=dataset_uve, validation="LOO", jackknife=TRUE)
This would be easier to answer if you gave more information, e.g. how you are calling the functions. Based on what you said you are doing, I'm assuming you are using the functions crossval() and PCA() from the packages "pls" and "ChemometricswithR" respectively. I'm not familiar with these functions, but the documentation states that for coefficients: "(only if jackknife is TRUE) an array with the jackknifed regression coefficients. The dimensions correspond to the predictors, responses, number of components, and segments, respectively". So I would say make sure jackknife=TRUE and that you are specifying the correct number of segments in crossval(). If you are using different functions you should edit your question and add in the relevant information.
OK, I found the solution.
The model I used is:
mod2 <- plsr(SH_uve~matrix_uve, ncomp=11, data=dataset_uve, validation="LOO", jackknife=TRUE)
The coefficient matrix is inside the mod2 object. I extracted it with the command:
coefficients <- mod2$validation$coefficients[,,11,]
and I obtained the coefficient matrix for all the equations used in the leave-one-out cross-validation.
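As a quick check (based on the documentation quoted above, and assuming the same mod2 fit), you can confirm the layout of that array before subsetting:
dim(mod2$validation$coefficients)  # predictors x responses x components x segments
coef_all <- mod2$validation$coefficients[,,11,]  # one set of slopes per leave-one-out segment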
I want to use robust limma on my microarray data and R's user guide says rlm is the correct function to use according to:
http://rss.acs.unt.edu/Rdoc/library/limma/html/mrlm.html
I currently have:
lmFit(ExpressionMatrix, design, method = "robust", na.omit=T)
I can see that I chose the method to be robust. Does that mean that rlm will be called by this lmFit? And if I want it not to be robust, what method should I use?
The help page says:
The function mrlm is used if method="robust".
And then goes on:
If method="ls", then gls.series is used if a correlation structure has been specified, i.e., if ndups>1 or block is non-null and correlation is different from zero. If method="ls" and there is no correlation structure, lm.series is used.
If you follow the links from the help page for lmFit (06.LinearModels), you will find:
Fitting Models
The main function for model fitting is lmFit. This is the recommended interface for most users. lmFit produces a fitted model object of class MArrayLM containing coefficients, standard errors and residual standard errors for each gene. lmFit calls one of the following three functions to do the actual computations:
lm.series
  Straightforward least squares fitting of a linear model for each gene.
mrlm
  An alternative to lm.series using robust regression as implemented by the rlm function in the MASS package.
gls.series
  Generalized least squares taking into account correlations between duplicate spots (i.e., replicate spots on the same array) or related arrays. The function duplicateCorrelation is used to estimate the inter-duplicate or inter-block correlation before using gls.series.
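So yes, with method = "robust" the fit is routed through mrlm (and hence MASS::rlm). If you do not want a robust fit, use the default least-squares method. A minimal sketch, reusing the ExpressionMatrix and design objects from your call:
fit_robust <- lmFit(ExpressionMatrix, design, method = "robust")  # robust fit via mrlm / MASS::rlm
fit_ls <- lmFit(ExpressionMatrix, design, method = "ls")          # default: lm.series (or gls.series with a correlation structure)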
I am fitting a model to presence-absence data and I would like to check whether the random factor is significant or not.
To do this, one should compare a GLMM with a GLM and check with the LR test which one fits best, if I understand correctly.
But if I perform anova(glm, glmm), I get an Analysis of Deviance table and no output that compares the models.
How do I get the output that I desire, i.e. a comparison of both models?
Thanks in advance,
Koen
Somewhere you got the wrong impression about using anova() for this. Below, re was fit using glmmPQL() from the MASS package and fe was fit using glm() from base R:
> anova(re,fe)
#Error in anova.glmmPQL(re, fe) : 'anova' is not available for PQL fits
That message appears to be the sole reason anova.glmmPQL() was created.
See this thread for verification and vague explanation:
https://stat.ethz.ch/pipermail/r-help/2002-July/022987.html
Simply put, anova does not work with glmmPQL fits; you need to fit the model with glmer from the lme4 package to be able to use anova.
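A rough sketch of that comparison, assuming a hypothetical presence/absence data frame d with a predictor x and a grouping factor site (recent lme4 versions allow mixing glmer and glm fits in anova()):
library(lme4)
fit_glmm <- glmer(presence ~ x + (1 | site), family = binomial, data = d)  # with random intercept
fit_glm <- glm(presence ~ x, family = binomial, data = d)                  # without the random effect
anova(fit_glmm, fit_glm)  # likelihood-ratio style comparison of the two fits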