What is the difference between lmFit and rlm in R?

I want to use robust limma on my microarray data, and the limma documentation says rlm is the correct function to use, according to:
http://rss.acs.unt.edu/Rdoc/library/limma/html/mrlm.html
I currently have:
lmFit(ExpressionMatrix, design, method = "robust", na.omit=T)
I can see that I chose the method to be robust. Does that mean that rlm will be called by this lmFit? And if I do not want a robust fit, what method should I use?

The help page says:
The function mrlm is used if method="robust".
And then goes on:
If method="ls", then gls.series is used if a correlation structure has been specified, i.e., if ndups>1 or block is non-null and correlation is different from zero. If method="ls" and there is no correlation structure, lm.series is used.

If you follow the links from the help page for lmFit (06.LinearModels), you find:
Fitting Models
The main function for model fitting is lmFit. This is recommended
interface for most users. lmFit produces a fitted model object of
class MArrayLM containing coefficients, standard errors and residual
standard errors for each gene. lmFit calls one of the following three
functions to do the actual computations:
lm.series
Straightforward least squares fitting of a linear model for
each gene.
mrlm
An alternative to lm.series using robust regression as
implemented by the rlm function in the MASS package.
gls.series
Generalized least squares taking into account correlations
between duplicate spots (i.e., replicate spots on the same array) or
related arrays. The function duplicateCorrelation is used to estimate
the inter-duplicate or inter-block correlation before using
gls.series.
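As a rough illustration of the distinction quoted above, the sketch below fits each gene by ordinary least squares and by MASS::rlm, which is the idea behind method="ls" and method="robust" respectively. The simulated matrix, the group labels, and all object names are assumptions made for illustration, and the lmFit calls at the end run only if limma is installed.

```r
library(MASS)  # ships with R; provides rlm()

set.seed(1)
# Simulated stand-in for an expression matrix: 5 genes x 6 arrays (hypothetical data)
expr <- matrix(rnorm(30), nrow = 5, dimnames = list(paste0("gene", 1:5), NULL))
expr[1, 6] <- 10                                   # one outlying array for gene1
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Ordinary least squares per gene: the idea behind method = "ls" (lm.series)
ls_coef <- t(apply(expr, 1, function(y) lm.fit(design, y)$coefficients))

# Robust M-estimation per gene via MASS::rlm: the idea behind method = "robust" (mrlm)
rob_coef <- t(apply(expr, 1, function(y) coef(rlm(design, y, maxit = 50))))

# The same two choices through limma itself, if it is installed
if (requireNamespace("limma", quietly = TRUE)) {
  fit_ls  <- limma::lmFit(expr, design, method = "ls")
  fit_rob <- limma::lmFit(expr, design, method = "robust")
}
```

So method="ls" is the non-robust choice, and method="robust" is the one that ends up in rlm.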

Related

Equivalent of nlcom (Stata) in R? Nonlinear transformations of regression coefficients

I would like to perform a nonlinear transformation of a regression coefficient. For example:
, or
.
Stata has a convenient implementation, nlcom, that employs the delta method to estimate standard errors and the corresponding confidence intervals. I understand that a simple transformation as posted can be done by directly addressing the coefficient of interest from the model. However, if we are interested in the ratio of several linear and nonlinear combinations, what would be an efficient method to produce confidence bounds on such a transformation, particularly when the coefficients have a full covariance matrix estimated along with their standard errors?
To answer my own question, I discovered the msm package, which accommodates my request nicely with the function deltamethod(). UCLA has a really nice write-up of this method, so I am providing the link for anyone who might have a similar need:
Using the delta method for nonlinear transformations of regression coefficients.
The deltaMethod() function from package car also accomplishes the same, providing as its output the estimate, its standard error and 95% confidence interval.
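To make the delta method concrete, here is a minimal sketch that computes the standard error of a ratio of two coefficients by hand and, if msm is installed, checks it against deltamethod(). The mtcars model and the particular ratio are illustrative assumptions, not taken from the question.

```r
# Illustrative model on a built-in data set (an assumption, not the asker's data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- unname(coef(fit))   # order: intercept, wt, hp
V <- vcov(fit)

# Transformation of interest: g(b) = b_wt / b_hp
est <- b[2] / b[3]

# Gradient of g with respect to (intercept, wt, hp)
grad <- c(0, 1 / b[3], -b[2] / b[3]^2)

# Delta-method standard error: sqrt(grad' V grad), then a normal-theory 95% interval
se <- sqrt(drop(t(grad) %*% V %*% grad))
ci <- est + c(-1, 1) * qnorm(0.975) * se

# The same standard error from msm, where x1, x2, x3 name the parameters in order
if (requireNamespace("msm", quietly = TRUE)) {
  se_msm <- msm::deltamethod(~ x2 / x3, b, V)
}
```

The hand calculation needs the gradient written out explicitly; deltamethod() derives it symbolically from the formula, which is what makes it convenient for more involved transformations.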

R: robust package -- lmRob how to find the psi function used in the calculations

I am using lmRob.
require(robust)
stack.rob.int <- lmRob(Loss ~.*., data = stack.dat)
That works fine, but I was wondering how I could obtain the psi function that lmRob uses in the actual fitting. Thanks in advance for any help!
If I were to use the lmrob function in robustbase, is it possible to change the psi function by subtracting a constant from it? I am trying to implement the bootstrap as per Lahiri (Annals of Statistics, 1992), where the way to keep the bootstrap valid is to replace psi() with the original psi() minus the mean of the residuals while fitting the bootstrap for the robust linear model.
So, there is no way to access the psi function directly for robust::lmRob().
Simply put, lmRob() calls lmRob.fit() (or lmRob.wfit() if you supply weights), which in turn calls lmRob.fit.compute(). That function sets initial values for a Fortran routine depending on whether lmRob.control() is set to "bisquare" or "optimal".
Consequently, if you need access to the psi functions, you may wish to use robustbase, as it provides easy access to many psi functions (cf. the biweights).
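For a sense of what such a psi function looks like, the sketch below writes Tukey's bisquare (biweight) psi out by hand and, if robustbase is installed, compares it with Mpsi(). The tuning constant 4.685 is the conventional 95%-efficiency value and is an assumption here, not something read out of lmRob.

```r
# Tukey bisquare psi written by hand: x * (1 - (x/cc)^2)^2 inside [-cc, cc], 0 outside
psi_bisquare <- function(x, cc = 4.685) {
  ifelse(abs(x) <= cc, x * (1 - (x / cc)^2)^2, 0)
}

x <- seq(-6, 6, by = 0.5)
psi_hand <- psi_bisquare(x)

# robustbase exposes the same family through Mpsi()
if (requireNamespace("robustbase", quietly = TRUE)) {
  psi_rb <- robustbase::Mpsi(x, cc = 4.685, psi = "bisquare")
}
```

Because the function is plain R, a shifted version (psi minus a constant) is a one-line wrapper around it, which is not possible with the Fortran-backed lmRob.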
Edit 1
Regarding:
psi function evaluated at the residuals in lmRob
No. The details of what is available after running lmRob are documented in lmRob.object, accessible via ?lmRob.object. Regarding residuals, the following components are available in the lmRob object:
residuals: the residual vector corresponding to the estimates returned in coefficients.
T.residuals: the residual vector corresponding to the estimates returned in T.coefficients.
M.weights: the robust estimate weights corresponding to the final MM-estimates in coefficients, if applies.
T.M.weights: the robust estimate weights corresponding to the initial S-estimates in T.coefficients, if applies.
Regarding
what does "optimal" do in lmRob?
Optimal refers to the following psi function:
sign(x) * (-(phi'(|x|) + c) / phi(|x|))
For other traditional psi functions, you may wish to look at robustbase's vignette
or a robust textbook.

PLS coefficients with r

I'm making a PLS model using packages "pls" and "ChemometricswithR". I'm able to perform the model but I have a problem. I did a leave-one-out validation and if I ask for the coefficients I can see only an equation (I suppose the average of all the equations developed in leave one out validation).
Is there a way to see all the "n" equations (where n is the number of the observations in my matrix) with all the slopes coefficients?
This is the model I used: mod2<-plsr(SH_uve~matrix_uve,ncomp=11, data=dataset_uve, validation="LOO",jackknife = TRUE)
This would be easier to answer if you gave more information, such as how you are calling the functions. Based on what you said, I'm assuming you are using the functions crossval() and PCA() from the packages "pls" and "ChemometricswithR" respectively. I'm not familiar with these functions, but the documentation states that, for coefficients: "(only if jackknife is TRUE) an array with the jackknifed regression coefficients. The dimensions correspond to the predictors, responses, number of components, and segments, respectively". So I would make sure that jackknife=TRUE and that you are specifying the correct number of segments in crossval(). If you are using different functions, you should edit your question and add the relevant information.
OK, I found the solution.
The model I used is:
mod2<-plsr(SH_uve~matrix_uve,ncomp=11,data=dataset_uve,validation="LOO",jackknife = TRUE)
The jackknifed coefficients are stored inside the mod2 object. I extracted them with the command:
coefficients<-mod2$validation$coefficients[,,11,]
This gave me the coefficient matrix for all the equations used in the leave-one-out cross-validation.
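The layout of that array can be checked on a small simulated example. This is only a sketch: the data, the dimensions (20 observations, 5 predictors, 3 components), and the object names are all made up for illustration, and the code runs only if the pls package is installed.

```r
if (requireNamespace("pls", quietly = TRUE)) {
  set.seed(1)
  n <- 20; p <- 5
  # Hypothetical predictor matrix and response
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
  y <- drop(X %*% c(1, -1, 0.5, 0, 0)) + rnorm(n, sd = 0.1)
  dat <- data.frame(y = y, X = I(X))

  mod <- pls::plsr(y ~ X, ncomp = 3, data = dat,
                   validation = "LOO", jackknife = TRUE)

  # Dimensions: predictors x responses x components x LOO segments
  jack <- mod$validation$coefficients
  print(dim(jack))

  # One column of slopes per left-out observation, for the 3-component model
  loo_coef <- jack[, 1, 3, ]
}
```

Each column of loo_coef then corresponds to the model fitted with one observation left out.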

When to choose nls() over loess()?

If I have some (x,y) data, I can easily draw straight-line through it, e.g.
f=glm(y~x)
plot(x,y)
lines(x,f$fitted.values)
But for curvy data I want a curvy line. It seems loess() can be used:
f=loess(y~x)
plot(x,y)
lines(x,f$fitted)
This question has evolved as I've typed and researched it. I started off wanting a simple function to fit curvy data (where I know nothing about the data), and wanting to understand how to use nls() or optim() to do that. That was what everyone seemed to be suggesting in similar questions I found. But now that I have stumbled upon loess() I'm happy. So, now my question is: why would someone choose to use nls or optim instead of loess (or smooth.spline)? Using the toolbox analogy, is nls a screwdriver and loess a power-screwdriver (meaning I'd almost always choose the latter as it does the same thing but with less of my effort)? Or is nls a flat-head screwdriver and loess a cross-head screwdriver (meaning loess is a better fit for some problems, but for others it simply won't do the job)?
For reference, here is the play data I was using that loess gives satisfactory results for:
x=1:40
y=(sin(x/5)*3)+runif(x)
And:
x=1:40
y=exp(jitter(x,factor=30)^0.5)
Sadly, it does less well on this:
x=1:400
y=(sin(x/20)*3)+runif(x)
Can nls(), or any other function or library, cope with both this and the previous exp example, without being given a hint (i.e. without being told it is a sine wave)?
UPDATE: Some useful pages on the same theme on stackoverflow:
Goodness of fit functions in R
How to fit a smooth curve to my data in R?
smooth.spline "out of the box" gives good results on my 1st and 3rd examples, but terrible (it just joins the dots) on the 2nd example. However f=smooth.spline(x,y,spar=0.5) is good on all three.
UPDATE #2: gam() (from mgcv package) is great so far: it gives a similar result to loess() when that was better, and a similar result to smooth.spline() when that was better. And all without hints or extra parameters. The docs were so far over my head I felt like I was squinting at a plane flying overhead; but a bit of trial and error found:
#f=gam(y~x) #Works just like glm(). I.e. pointless
f=gam(y~s(x)) #This is what you want
plot(x,y)
lines(x,f$fitted)
Nonlinear-least squares is a means of fitting a model that is non-linear in the parameters. By fitting a model, I mean there is some a priori specified form for the relationship between the response and the covariates, with some unknown parameters that are to be estimated. As the model is non-linear in these parameters NLS is a means to estimate values for those coefficients by minimising a least-squares criterion in an iterative fashion.
LOESS was developed as a means of smoothing scatterplots. It has a much less well-defined concept of a "model" that is fitted (IIRC there is no "model"). LOESS works by trying to identify the pattern in the relationship between response and covariates without the user having to specify what form that relationship takes. LOESS works out the relationship from the data themselves.
These are two fundamentally different ideas. If you know the data should follow a particular model then you should fit that model using NLS. You could always compare the two fits (NLS vs LOESS) to see if there is systematic variation from the presumed model etc - but that would show up in the NLS residuals.
Instead of LOESS, you might consider Generalized Additive Models (GAMs) fitted via gam() in recommended package mgcv. These models can be viewed as a penalised regression problem but allow for the fitted smooth functions to be estimated from the data like they are in LOESS. GAM extends GLM to allow smooth, arbitrary functions of covariates.
loess() is non-parametric, meaning you don't get a set of coefficients you can use later - it's not a model, just a fit line. nls() will give you coefficients you could use to build an equation and predict values with a different but similar data set - you can create a model with nls().
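The difference shows up clearly on the first play data set from the question. In the sketch below, nls() is told the sine form and estimates its parameters, while loess() is given only the data; the starting values and the extrapolation range are assumptions chosen for illustration.

```r
set.seed(42)
x <- 1:40
y <- sin(x / 5) * 3 + runif(length(x))

# nls(): the functional form is specified up front; only a, b and c are estimated
fit_nls <- nls(y ~ a * sin(x / b) + c, start = list(a = 2, b = 5, c = 0))
print(coef(fit_nls))

# loess(): no functional form is given, so there are no coefficients to report
fit_lo <- loess(y ~ x)

# Predicting beyond the observed range
new_x <- data.frame(x = 41:45)
pred_nls <- predict(fit_nls, newdata = new_x)   # follows the fitted sine curve
pred_lo  <- predict(fit_lo,  newdata = new_x)   # NA: default loess does not extrapolate
```

The nls fit yields an equation that can be reused on other data, whereas the loess fit is tied to the range of x it was built from.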

anova.rms problem with rcs() terms

I'm having a problem with the anova function in the rms package:
require(rms)
getHdata(prostate)
mod1<-cph(Surv(dtime,status!="Alive")~stage+rx+age+wt,data=prostate,x=T,y=T)
mod2<-cph(Surv(dtime,status!="Alive")~stage+rx+rcs(age,4)+wt,data=prostate,x=T,y=T)
anova(mod1)
anova(mod2)
Everything works all right, but when I try to compare the models to assess the impact of non-linearity in age:
anova(mod1,mod2)
I get
Error in anova.rms(mod1, mod2) : factor names not in design: mod2
What does this mean? What can I do to circumvent it?
//M
You should be able to use the output of anova(mod2) as one way to assess the significance but the best answer would be to compare the -2*log(likelihood) statistics. The anova.rms function is not designed to take two model fits. The second and subsequent unnamed arguments are assumed to be names of terms within the model rather than fit objects.
(Note that with rcs terms you will not see the sum of individual terms equal the full model chi-square values. I have asked Harrell about this and he says to do the cross-model comparisons "by hand".)
This comparison is done using lrtest (per Misha's comment).
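The "by hand" comparison of -2*log(likelihood) can be sketched with packages that ship with R. Note the substitutions: this uses coxph() from survival instead of cph(), a natural spline ns() as a stand-in for rcs(), and the lung data set instead of prostate, all of which are assumptions made so that the example is self-contained. For cph fits, lrtest(mod1, mod2) from rms performs this kind of comparison directly.

```r
library(survival)  # recommended package shipped with R; provides coxph() and lung
library(splines)   # provides ns(), used here as a stand-in for rcs()

# Nested models: age entered linearly vs. age entered as a spline
fit_lin <- coxph(Surv(time, status) ~ age + sex, data = lung)
fit_spl <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = lung)

# Likelihood ratio statistic: difference in -2 * log partial likelihood
lr_stat <- 2 * (fit_spl$loglik[2] - fit_lin$loglik[2])

# Degrees of freedom: number of extra coefficients in the larger model
df <- length(coef(fit_spl)) - length(coef(fit_lin))

p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
print(c(LR = lr_stat, df = df, p = p_value))
```

A small p-value would suggest that the non-linear terms for age improve the fit over the linear term alone.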
