Regression code - r

How do I create a customized function in R that fits all multiple linear regression models from the given data with number of variables specified by the user? The function looks like this:
BodyFat.lm <- lm(PercentBodyFat ~ ., data = BodyFat)
This fits a single model using all the variables. I want a function where the user specifies the number of variables, like
(my.data = BodyFat, n = 2)

You should be able to do what you want with dredge in the MuMIn package. Perhaps something like this:
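library(MuMIn)
options(na.action = "na.fail")  # dredge() refuses a global model fitted with na.omit
BodyFat.lm <- lm(PercentBodyFat ~ ., data = BodyFat)  # refit after setting na.action
BodyFat.lm.2 <- dredge(BodyFat.lm, m.lim = c(2, 2))   # m.lim replaces the older m.min/m.max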

A useful resource that shows a possible solution is the post by Mark Heckmann on calculating all possible linear regression models for a given set of predictors. As the author points out, you can take a few approaches:
1) Write a lot of code (he does this), following a repetitive, step-by-step analysis approach
2) Make use of a specialized package. The author suggests the packages leaps and meifly, but notes that both seem to have some drawbacks. Note that you can see specific code and more information on Hadley Wickham's meifly package here: https://github.com/hadley/meifly/
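For readers who would rather not depend on a package, the brute-force route can be sketched in base R. This is only a minimal sketch: the helper name fit_all_subsets and its response argument are hypothetical (the question only specifies my.data and n), and mtcars stands in for the BodyFat data, which is not shown in the thread.

```r
# Hypothetical helper: fit every linear model that uses exactly n predictors.
fit_all_subsets <- function(my.data, response, n) {
  predictors <- setdiff(names(my.data), response)
  # All combinations of n predictor names, one character vector per model
  combos <- combn(predictors, n, simplify = FALSE)
  fits <- lapply(combos, function(vars) {
    lm(reformulate(vars, response = response), data = my.data)
  })
  # Label each fit by its right-hand side, e.g. "wt + hp"
  names(fits) <- vapply(combos, paste, character(1), collapse = " + ")
  fits
}

# Illustration on a built-in dataset (an assumption, not the asker's data)
fits <- fit_all_subsets(mtcars[, c("mpg", "wt", "hp", "disp")], "mpg", n = 2)
print(names(fits))
print(sapply(fits, AIC))
```

The function returns a named list of lm objects, so the usual tools (summary, AIC, anova) can be applied to each element with lapply or sapply.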

Related

R, mitools::MIcombine, what is the reason for no p-values?

I am currently running a simple linear regression model with 5 multiply imputed datasets in R.
E.g. model <- with(imp, lm(outcome ~ exposure))
To pool the summary estimates I could use the command summary(mitools::MIcombine(model)) from the mitools package. However, this does not give results for p-values. I could also use the command summary(pool(model)) from the mice package and this does give results for p-values.
Because of this, I am wondering if there is a specific reason why MIcombine does not produce p-values?
After looking through the documentation, I could not find a particular reason why the mitools library doesn't provide p-values. That said, the package's focus is on imputation, not model results.
However, you don't need either of these packages to see your results, including the per-model p-values. I started writing this as a comment but decided to include the code. In case you weren't aware, you can use base R's summary on each fitted model. I realize that the output of mice is comparative, as is that of mitools, but I thought this was important enough to mention as well.
If the output of your call is model, then this will work.
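library(tidyverse)  # map() comes from purrr
# with() on a mice imputation object returns a "mira" object whose fits live in $analyses;
# if model is already a plain list of fits, use map(model, summary) instead
map(model$analyses, summary)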
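If pooled p-values are wanted without relying on either package's summary, they can be computed by hand from Rubin's rules. The sketch below assumes a plain list of lm fits, one per imputed dataset. The helper name pool_with_p is hypothetical, and the five noisy copies of mtcars are only stand-ins for real imputed datasets. It uses the classic degrees-of-freedom formula from Rubin (1987), so results may differ slightly from software that applies a small-sample adjustment.

```r
# Hypothetical helper: pool a list of lm fits with Rubin's rules and add p-values.
pool_with_p <- function(fits) {
  m    <- length(fits)
  est  <- do.call(cbind, lapply(fits, coef))                       # coefficients x imputations
  vars <- do.call(cbind, lapply(fits, function(f) diag(vcov(f))))  # squared standard errors
  qbar <- rowMeans(est)                            # pooled estimate
  w    <- rowMeans(vars)                           # within-imputation variance
  b    <- apply(est, 1, var)                       # between-imputation variance
  tot  <- w + (1 + 1 / m) * b                      # total variance
  df   <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2  # Rubin (1987) degrees of freedom
  tval <- qbar / sqrt(tot)
  data.frame(estimate = qbar, se = sqrt(tot), df = df,
             p.value = 2 * pt(-abs(tval), df))
}

# Stand-in "imputed" datasets: mtcars with small noise added to wt (illustration only)
set.seed(1)
fake_imps <- lapply(1:5, function(i) {
  transform(mtcars, wt = wt + rnorm(nrow(mtcars), sd = 0.1))
})
fits   <- lapply(fake_imps, function(d) lm(mpg ~ wt, data = d))
pooled <- pool_with_p(fits)
print(pooled)
```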

Bioassay dose response fitting with heteroscedastic data

I am using the drc package in R to fit dose response curves (4-param logistic: LL.4) for biological assays. The data I collect is typically heteroscedastic (example image below). I am looking for ways to account for this when calling drm. I have found three possibilities that seem promising:
1) Use the type="Poisson" argument to drm. However, over- and under-dispersion are probable for many assays, so this isn't likely to be a general solution.
2) Follow drm with a call to boxcox (the drc method for drm fits). This seems to be more general and could work.
3) Use the "varPower" transform that used to be implemented in drc.multdrc and in drc.drm before it was commented out (search for "varPower" in the drm source). I could un-comment those sections to restore the varPower functionality.
My questions are, what is the most accepted way to handle this? Also, does anyone know why varPower variance handling was removed from the drc package?
Example code:
# Naive method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
# Poisson method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params, type = "Poisson")
# Box-Cox method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
a2 <- boxcox(a)
Example Data:
I found the answer to this question in this paper by the authors of the drc package. In the paper they comment:
Weights may be used for addressing variance heterogeneity in the
response. However, the transform-both-sides approach should be
preferred over using often very imprecisely determined weights
The "transform-both-sides" approach refers to applying the drc boxcox function to the fitted model (the Box-Cox method in the original question's code).
Further advice was provided in a personal communication with one of the authors of the drc package. He advised that presently, the medrc R package is better suited for dose response analysis in R.

How to fit a restricted VAR model in R?

I am trying to understand how to fit a VAR model that is restricted rather than general.
I understand that a general VAR(1) model can be fitted with the "vars" package from CRAN.
For example, suppose y is a 10-by-2 matrix. After loading the vars package, I did this:
y <- df[, 1:2] # df is a data frame with a lot of columns (only the first two matter here)
VARselect(y, lag.max=10, type="const")
summary(fitHilda <- VAR(y, p=1, type="const"))
This works fine if no restrictions are placed on the coefficients. However, I would like to fit this restricted VAR model in R.
How can I do so?
Please refer me to a page if you know of one. If anything is unclear from your perspective, please do not downvote; let me know what it is and I will try to clarify it as best I can.
Thank you very much in advance.
I was not able to find a way to impose restrictions exactly the way I wanted. However, I found a workaround, as follows.
First, choose the number of lags using an information criterion:
VARselect(y, lag.max=10, type="const")
This gives the lag length, which was one in my case. Then fit a VAR(1) model to your data (y in my case):
t=VAR(y, p=1, type="const")
Viewing the summary shows that some of the coefficients may be statistically insignificant:
summary(t)
Then run the restrict function from the 'vars' package:
t1=restrict(t, method = "ser", thresh = 2.0, resmat = NULL)
This re-estimates the VAR with zero restrictions imposed by significance: regressors whose absolute t-value falls below thresh are dropped and each equation is re-estimated.
To see the result, run:
summary(t1)
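If the zero restrictions are known in advance rather than chosen by significance, restrict also accepts method = "manual" together with a 0/1 matrix in resmat. The sketch below is only an illustration under stated assumptions: it uses two series from the Canada dataset that ships with vars in place of the asker's y, and the restriction pattern is hypothetical because the restricted model from the question is not shown.

```r
library(vars)  # assumed to be installed from CRAN

data(Canada)                      # example data shipped with vars
y <- Canada[, c("e", "prod")]     # two series standing in for the asker's y
fit <- VAR(y, p = 1, type = "const")

# One row per equation (e, prod); columns follow the regressors: e.l1, prod.l1, const.
# 1 keeps a coefficient, 0 forces it to zero. This pattern is purely illustrative:
# here prod.l1 is excluded from the equation for e.
resmat <- matrix(c(1, 0, 1,
                   1, 1, 1),
                 nrow = 2, ncol = 3, byrow = TRUE)

fit_r <- restrict(fit, method = "manual", resmat = resmat)
print(Bcoef(fit_r))   # coefficient matrix of the restricted model
```

For a VAR(p) with K series and a constant, resmat needs K rows and K*p + 1 columns, in the same order as the regressors shown by summary(fit).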

Function to plot model with one variable varying and others constant

It's simple, but I can't remember what this procedure is called, so I was not able to find the function for it. I want to explore the effects and gradients of a simple lm() model by plotting the response against one variable at a time, with the others held constant.
Can anybody tell me which function to use to do so? I seem to remember it's a function generating several plots, or something like this. It could be something akin to sensitivity analysis... Sorry for the beginner question.
Thank you in advance!
The car package has a lot of utilities for analyzing regression models. This sounds like a component+residual plot (or partial residuals plot).
library(car) # for crPlots(...)
fit <- lm(mpg~wt+hp+disp, mtcars)
crPlots(fit)
As noted in the comments, termplot(...) does basically the same thing.
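For completeness, here is a base R sketch of both ideas, using the same mtcars model as above. The first part calls termplot with partial residuals. The second part does by hand what the question describes: it varies one predictor over its range while holding the others at their means. Holding the other predictors at the mean is an assumption made here; any fixed reference value would work.

```r
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Built-in route: one panel per term, with partial residuals and standard-error bands
op <- par(mfrow = c(1, 3))
termplot(fit, partial.resid = TRUE, se = TRUE)
par(op)

# Manual route: vary wt over its observed range, hold hp and disp at their means
grid <- data.frame(
  wt   = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50),
  hp   = mean(mtcars$hp),
  disp = mean(mtcars$disp)
)
grid$mpg_hat <- predict(fit, newdata = grid)
plot(mpg_hat ~ wt, data = grid, type = "l",
     xlab = "wt", ylab = "Predicted mpg (hp, disp at their means)")
```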

How do you perform a goodness of link test for a generalized linear model in R?

I'm working on fitting a generalized linear model in R (using glm()) for some data that has two predictors in a full factorial design. I'm confident that the gamma family is the right error distribution to use, but I'm not sure which link function to use, so I'd like to test all possible link functions against one another. Of course, I can do this manually by fitting a separate model for each link function and then comparing deviances, but I imagine there is an R function that will do this and compile the results. I have searched CRAN, SO, Cross Validated, and the web. The closest function I found was clm2, but I do not believe I want a cumulative link model, based on my understanding of what clm's are.
My current model looks like this:
CO2_med_glm_alf_gamma <- glm(flux_median_mod_CO2~PercentH2OGrav+
I(PercentH2OGrav^2)+Min_Dist+
I(Min_Dist^2)+PercentH2OGrav*Min_Dist,
data = NC_alf_DF,
family=Gamma(link="inverse"))
How do I code this model into an R function that will do such a 'goodness-of-link' test?
(As far as the statistical validity of such a test goes, this discussion as well as a discussion with a stats post-doc led me to believe that it is valid to compare AIC or deviances between generalized linear models that are identical except for having different link functions)
This is not "all possible links", it's testing against a specified class of links, but there is a goodness-of-link test by Pregibon that is implemented in the LDdiag package. It's not on CRAN, but you can install it from the archives via
devtools::install_version("LDdiag","0.1")
The example given (not that exciting) is
library(MASS)    # provides the quine data
library(LDdiag)  # provides pregibon()
quine$Days <- ifelse(quine$Days==0, 1, quine$Days)
ex <- glm(Days ~ ., family = Gamma(link="log"), data = quine)
pregibon(ex)
The Pregibon family of link functions is implemented in the glmx package. As pointed out by Achim Zeileis in the comments, the package provides various parametric link functions and supports general estimation and inference based on such parametric links (or more generally parametric families). For a worked example of how this can be employed for a variety of goodness-of-link assessments, see example("WECO", package = "glmx"). This replicates the analyses from two papers by Koenker and Yoon (see below).
This example might be useful too.
Koenker R (2006). “Parametric Links for Binary Response.” R News, 6(4), 32--34; link to page with supplementary materials.
Koenker R, Yoon J (2009). “Parametric Links for Binary Choice Models: A Fisherian-Bayesian Colloquy.” Journal of Econometrics, 152, 120--130; PDF.
I have learned that the dredge function (MuMIn package) can be used to perform goodness-of-link tests on glms, lms, etc. More generally it is a model selection function, but it allows for a good deal of customization. In this case, you can use the varying option to compare models fit with different link functions. See the worked Beetle example in the MuMIn documentation for details.
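The manual comparison mentioned in the question can also be wrapped in a few lines of base R. This is only a sketch of that idea, not a formal goodness-of-link test. The helper name compare_links is hypothetical, and the mtcars model is a stand-in for the asker's data. Some links may fail to converge on other datasets, in which case starting values would be needed.

```r
# Hypothetical helper: refit the same Gamma GLM under each link and tabulate AIC and deviance.
compare_links <- function(formula, data, links = c("inverse", "identity", "log")) {
  fits <- lapply(links, function(l) {
    glm(formula, family = Gamma(link = l), data = data)
  })
  data.frame(
    link      = links,
    AIC       = sapply(fits, AIC),
    deviance  = sapply(fits, deviance),
    converged = sapply(fits, function(f) f$converged)
  )
}

# Illustration on a built-in dataset with a strictly positive response (an assumption)
link_table <- compare_links(mpg ~ wt + hp, data = mtcars)
print(link_table[order(link_table$AIC), ])
```

Because the models differ only in the link, they have the same number of parameters, so ranking by AIC and ranking by log-likelihood give the same order.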

Resources