Error comparing linear mixed effects models - r

I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA

To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this. The reproducible example below uses lm and update; the same idea should carry over to lmer objects, although there the model frame is retrieved with model.frame(object) rather than object$model:
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE){
update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you're only interested in testing a single variable (e.g. Group2), then perhaps Anova() or linearHypothesis() from the car package would work as well for this use case.
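For reference, a minimal sketch of such a single-term test on the mtcars example above, assuming the car package is installed. Both functions work from the one fitted model, so no second fit on a matching subset is needed. The object names aov_tab and lh are only illustrative.

```r
library(car)  # assumed to be installed; provides Anova() and linearHypothesis()

# Same toy data as above: put one NA on each "diagonal" cell
data(mtcars)
for (i in seq_len(ncol(mtcars))) mtcars[i, i] <- NA

xa <- lm(mpg ~ cyl + disp, data = mtcars)

# Type II tests for every term in the model
aov_tab <- Anova(xa)
print(aov_tab)

# Test the single restriction "coefficient of cyl equals zero"
lh <- linearHypothesis(xa, "cyl = 0")
print(lh)
```

For a single coefficient in an lm fit, the F test from linearHypothesis() is the same comparison that anova(xa, xb) performs above.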
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"

Fit Resp.model first, then use Resp.model@frame as data argument.
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame,REML=FALSE)
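An equivalent route, sketched here under assumptions, is to build one complete-case data frame up front and fit both models to it. The example uses the sleepstudy data that ships with lme4; the Noise covariate and its missing values are made up purely for illustration. Passing the same data object to both fits also avoids any complaint from anova() about the models referring to different data.

```r
library(lme4)  # assumed to be installed; sleepstudy ships with it

# Toy data: add a made-up covariate with a few missing values
set.seed(1)
d <- sleepstudy
d$Noise <- rnorm(nrow(d))
d$Noise[c(3, 40, 97)] <- NA

f_full <- Reaction ~ Days + Noise + (1 | Subject)
f_null <- Reaction ~ Days + (1 | Subject)

# Keep only rows that are complete for every variable in the larger model
d_cc <- d[complete.cases(d[, all.vars(f_full)]), ]

full <- lmer(f_full, data = d_cc, REML = FALSE)
null <- lmer(f_null, data = d_cc, REML = FALSE)

nobs(null) == nobs(full)  # both fits now use the same rows
cmp <- anova(null, full)  # likelihood-ratio test for Noise
print(cmp)
```

With the data from the question, the same pattern would use mydata and the Response, Group1, Group2, Gender, Age, BMI and Subject columns in place of the toy names.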

Related

linearHypothesis equivalent for ols command (rms package) in R

I am trying to use "linearHypothesis" function from "car" package to test coefficients of a model estimated with "ols" from "rms" package. The function works with "lrm" objects but not with "ols" objects. Have you got any alternatives? I know that using "lm" would sort the issue but I want to use "ols" since it is easier getting clustered standard errors there.
You can use glht from the multcomp package.
library(rms)
library(multcomp)
d <- datadist(swiss); options(datadist="d")
fit <- ols(Fertility ~ ., data = swiss)
summary(fit)
test <- glht(fit, linfct = "Agriculture = 0")
summary(test)
# Fit: ols(formula = Fertility ~ ., data = swiss, x = TRUE)
#
# Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
# Agriculture == 0 -0.1721 0.0703 -2.448 0.0144 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

How to call columns for cbind using column names stored in a vector (attempting to perform pairwise comparison of columns with manova in R)

I am trying to perform a pairwise manova analysis where I loop through all the possible pairs of my columns. I think this is best communicated with an example:
varList <- colnames(iris)
m1 <- manova(cbind(varList[1], varList[2]) ~ Species, data = iris)
# Error in model.frame.default(formula = cbind(varList[1], varList[2]) ~ :
# variable lengths differ (found for 'Species')
m2 <- manova(cbind(noquote(varList[1]), noquote(varList[2])) ~ Species,
data = iris)
# Error in model.frame.default(formula = cbind(noquote(varList[1]), noquote(varList[2])) ~ :
# variable lengths differ (found for 'Species')
m3 <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
m4 <- manova(cbind(iris[ ,1], iris[ ,3]) ~ Species, data = iris)
summary(m3)
# Df Pillai approx F num Df den Df Pr(>F)
# Species 2 0.9885 71.829 4 294 < 2.2e-16 ***
# Residuals 147
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R.version.string
# [1] "R version 3.4.2 (2017-09-28)"
RStudio.Version()$version
# [1] ‘1.1.383’
I think this is mostly a matter of referring to column names stored in a vector inside my cbind() call. I saw something about using parentheses in this question here, but can't get that to work for my case. I can call the columns by their number (see m4), but I'd prefer to use column names if possible.
You need to wrap each of the entries from the vector that you are calling with eval(as.symbol()), so that R looks up the column of that name instead of treating the string itself as data.
So this should work:
m1 <- manova(cbind(eval(as.symbol(varList[1])), eval(as.symbol(varList[2]))) ~ Species, data = iris)
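Another option, sketched here as an alternative rather than as the answer's own method, is to build the formula as text from the stored names and convert it with as.formula(). That also makes the pairwise loop straightforward. The names pairs and fits are only illustrative.

```r
varList <- colnames(iris)

# Build "cbind(Sepal.Length, Sepal.Width) ~ Species" from the stored names
f <- as.formula(sprintf("cbind(%s, %s) ~ Species", varList[1], varList[2]))
m1 <- manova(f, data = iris)
summary(m1)

# The same idea extended to every pair of the four numeric columns
pairs <- combn(varList[1:4], 2, simplify = FALSE)
fits <- lapply(pairs, function(p) {
  f <- as.formula(sprintf("cbind(%s, %s) ~ Species", p[1], p[2]))
  manova(f, data = iris)
})
names(fits) <- vapply(pairs, paste, character(1), collapse = " + ")
```

Each element of fits is an ordinary manova object, so lapply(fits, summary) would give one table per pair.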

Non-numeric argument to binary operator R NLS package

I am using nls() in R to perform a nonlinear fit. I have specified my independent variable as follows:
t <- seq(1,7)
and my dependent variables as P <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
I then have tried:
m <- nls(P ~ 1 / (c + q*exp(-b*t))^(1/v)),
but every time I get:
"Error in c + q * exp(-b * t) : non-numeric argument to binary
operator"
Every one of my variables is numeric. Any ideas?
Thanks!
You have more than one problem in your script. The main issue is that you should avoid names that R already uses: t() is the matrix transpose, c() combines values into a vector, and q() quits the session. nls() will not treat them as parameters to estimate, because they are already defined; c and q are found as functions, which is why c + q * exp(-b * t) fails with "non-numeric argument to binary operator". I recommend using more meaningful and less dangerous names such as Coef1, Coef2, …
The second problem is that you are trying to fit a model with 4 parameters to a dataset with only 7 observations. This may lead to singular gradients and other convergence problems.
For the sake of the argument, I have reduced your model to three parameters and changed some names:
Time <- seq(1,7)
Prob <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
plot(Time, Prob)
And now we perform the nls() fit:
Fit <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)))
X <- data.frame(Time = seq(0, 7, length.out = 100))
Y <- predict(object = Fit, newdata = X)
lines(X$Time, Y)
And a summary of the results:
summary(Fit)
# Formula: Prob ~ 1/(Coef1 + Coef2 * exp(-Coef3 * Time))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# Coef1 1.00778 0.06113 16.487 7.92e-05 ***
# Coef2 23.43349 14.42378 1.625 0.1796
# Coef3 1.04899 0.21892 4.792 0.0087 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.06644 on 4 degrees of freedom
#
# Number of iterations to convergence: 12
# Achieved convergence tolerance: 3.04e-06
I know it is not exactly what you wanted, but I hope it helps.
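The fit above relies on nls() guessing starting values and warning about it. A minimal variant with an explicit start list is sketched below; naming the parameters there makes it unambiguous which symbols are estimated. The starting values of 1 are an assumption chosen to match what nls() falls back to when none are given.

```r
Time <- seq(1, 7)
Prob <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)

# Every name listed in `start` is a parameter; everything else must be data.
# The values 1, 1, 1 are an assumption matching nls()'s default initialization.
Fit <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)),
           start = list(Coef1 = 1, Coef2 = 1, Coef3 = 1))
coef(Fit)
```

For harder problems, better starting values (for example read off a plot of the data) usually matter more than the parameter names.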

Extract standard errors from tsls output

I want to extract the standard errors from the output of the tsls command in the sem R package.
Using some generic code as an example:
fit = tsls(Y ~ X, ~Z)
summary(fit)
The summary function outputs several things besides the regression estimates (e.g., model formulas, summary of the residuals).
I want an equivalent to fit$coef that outputs standard errors. But that doesn't seem to be an option. All the code used to do the equivalent for glm and lm output doesn't seem to work here. Is there any way to hack the output?
Sometimes it takes a little bit of digging to find where these values are coming from. The best place to look, if you don't get any clues from str(fit), would be to look at what summary.tsls is doing.
With some help from getAnywhere("summary.tsls"), we see:
getAnywhere("summary.tsls")
# A single object matching ‘summary.tsls’ was found
# It was found in the following places
# registered S3 method for summary from namespace sem
# namespace:sem
# with value
#
# function (object, digits = getOption("digits"), ...)
# {
# ###
# ### \\\SNIP///
# ###
# std.errors <- sqrt(diag(object$V))
# ###
# ### \\\SNIP///
# ###
# }
# <bytecode: 0x503c530>
# <environment: namespace:sem>
So, to get the value you are looking for, you need to calculate it yourself with:
sqrt(diag(fit$V))
A reproducible example:
library(sem)
fit <- tsls(Q ~ P + D, ~ D + F + A, data=Kmenta)
summary(fit)
#
# 2SLS Estimates
#
# Model Formula: Q ~ P + D
#
# Instruments: ~D + F + A
#
# Residuals:
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -3.4300 -1.2430 -0.1895 0.0000 1.5760 2.4920
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 94.63330387 7.92083831 11.94738 1.0762e-09 ***
# P -0.24355654 0.09648429 -2.52431 0.021832 *
# D 0.31399179 0.04694366 6.68869 3.8109e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.9663207 on 17 degrees of freedom
sqrt(diag(fit$V))
# (Intercept) P D
# 7.92083831 0.09648429 0.04694366
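The same relationship, standard errors as the square roots of the diagonal of the coefficient covariance matrix, can be checked on an ordinary lm fit, where vcov() plays the role of fit$V. This is a base-R illustration only, not tsls-specific code; the model mpg ~ wt + hp is an arbitrary choice.

```r
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

se_manual  <- sqrt(diag(vcov(fit_lm)))               # from the covariance matrix
se_summary <- coef(summary(fit_lm))[, "Std. Error"]  # as printed by summary()

all.equal(se_manual, se_summary)
```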

Format custom summary output to match with ANOVA output in R

I'm new to R and am working on an assignment: creating an R package that mimics the ANOVA table. I have written all the functions the assignment requires. They calculate the correct values, but I can't make the result display the way the table from R's built-in anova() function does. This is my summary.oneway function:
summary.oneway <- function(object, ...){
#model <- oneway(object)
fval <- object$FValue
TAB <- list(t(object$AOV), "Mean Sq."= rbind(object$MSB, object$MSW),
"F Value" = fval, p.value = object$p.value)
res <- list(call=object$call, onewayAnova = TAB)
class(res) <- "summary.oneway"
res
}
This is the output:
Analysis of Variance:
oneway.formula(formula = coag ~ diet, data = coagdata)
[[1]]
Sum of Squares Deg. of Freedom
diet 228 3
Residual 112 20
$`Mean Sq.`
1
[1,] 76.0
[2,] 5.6
$`F Value`
1
13.57143
$p.value
1
4.658471e-05
Actual ANOVA output:
Analysis of Variance Table
Response: coag
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 228 76.0 13.571 4.658e-05 ***
Residuals 20 112 5.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How can I achieve this format? Where and what am I missing?
Thank you so much for your help.
Kuni
The anova output is produced by the print method print.anova; you may want to take a look at methods(print) and specifically stats:::print.anova.
You will most likely want to write your own print method:
print.oneway <- function(object, ...) {
foo
bar
}
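One way to fill in such a method, sketched here under assumptions, is to collect the pieces into a data frame with the column names anova() uses and give it the class "anova", so that the existing print.anova does the formatting. The helper as_anova_table and its arguments are hypothetical; the numbers are the ones from the coag ~ diet output in the question.

```r
# Hypothetical helper: assemble the pieces into an "anova"-classed data frame
as_anova_table <- function(terms, df, ss, fval, pval, response) {
  tab <- data.frame(
    "Df"      = df,
    "Sum Sq"  = ss,
    "Mean Sq" = ss / df,
    "F value" = c(fval, NA),   # no F or p-value on the residual row
    "Pr(>F)"  = c(pval, NA),
    row.names = terms,
    check.names = FALSE
  )
  structure(tab,
            heading = c("Analysis of Variance Table\n",
                        paste0("Response: ", response)),
            class = c("anova", "data.frame"))
}

# Values taken from the coag ~ diet output shown in the question
tab <- as_anova_table(terms = c("diet", "Residuals"),
                      df = c(3, 20), ss = c(228, 112),
                      fval = 13.57143, pval = 4.658471e-05,
                      response = "coag")
print(tab)  # dispatches to stats:::print.anova
```

Since the object returned by summary.oneway has class "summary.oneway", the method that R dispatches to when printing it would be named print.summary.oneway; it could build this table from the object's fields and call print() on it.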
