How to get residuals from a repeated measures ANOVA model in R

Normally, with aov() you can get the residuals after calling summary() on the model.
But how can I get the residuals for a repeated measures ANOVA, where the formula is different?
## as a test, not particularly sensible statistically
npk.aovE <- aov(yield ~ N*P*K + Error(block), npk)
npk.aovE
summary(npk.aovE)
Error: block
Df Sum Sq Mean Sq F value Pr(>F)
N:P:K 1 37.0 37.00 0.483 0.525
Residuals 4 306.3 76.57
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
N 1 189.28 189.28 12.259 0.00437 **
P 1 8.40 8.40 0.544 0.47490
K 1 95.20 95.20 6.166 0.02880 *
N:P 1 21.28 21.28 1.378 0.26317
N:K 1 33.14 33.14 2.146 0.16865
P:K 1 0.48 0.48 0.031 0.86275
Residuals 12 185.29 15.44
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The intuitive summary(npk.aovE)$residuals returns NULL...
Can anyone help me with this?

Look at the output of
> names(npk.aovE)
and try
> npk.aovE$residuals
EDIT: I apologize, I read your example way too quickly. What I suggested is not possible with multilevel models fitted with aov(). Try the following:
> npk.pr <- proj(npk.aovE)
> npk.pr[[3]][, "Residuals"]
Here's a simpler reproducible example anyone can mess around with if they run into the same issue:
x1 <- gl(8, 4)
block <- gl(2, 16)
y <- as.numeric(x1) + rnorm(length(x1))
d <- data.frame(block, x1, y)
m <- aov(y ~ x1 + Error(block), d)
m.pr <- proj(m)
m.pr[[3]][, "Residuals"]

The other option is with lme:
require(MASS) ## for oats data set
require(nlme) ## for lme()
require(multcomp) ## for multiple comparison stuff
Aov.mod <- aov(Y ~ N * V + Error(B/V), data = oats)
aov.out.pr <- proj(Aov.mod)
the_residuals <- aov.out.pr[[3]][, "Residuals"]
Lme.mod <- lme(Y ~ N * V, random = ~1 | B/V, data = oats)
the_residuals <- residuals(Lme.mod)
The original example came without the interaction (Lme.mod <- lme(Y ~ N + V, random = ~1 | B/V, data = oats)), but it also seems to work with the interaction (and produces different results, so it is doing something).
And that's it...
...but for completeness:
1 - The summaries of the model
summary(Aov.mod)
anova(Lme.mod)
2 - The Tukey test with repeated measures ANOVA (3 hours looking for this!!). It does raise a warning when there is an interaction (* instead of +), but it seems safe to ignore it. Notice that V and N are factors inside the formula.
summary(Lme.mod)
summary(glht(Lme.mod, linfct=mcp(V="Tukey")))
summary(glht(Lme.mod, linfct=mcp(N="Tukey")))
3 - The normality and homoscedasticity plots
par(mfrow=c(1,2)) # two plots side by side
aov.out.pr <- proj(Aov.mod)
#oats$resi <- aov.out.pr[[3]][, "Residuals"]
oats$resi <- residuals(Lme.mod)
qqnorm(oats$resi, main="Normal Q-Q") # A quantile normal plot - good for checking normality
qqline(oats$resi)
boxplot(resi ~ interaction(N,V), main="Homoscedasticity",
xlab = "Code Categories", ylab = "Residuals", border = "white",
data=oats)
points(resi ~ interaction(N,V), pch = 1,
main="Homoscedasticity", data=oats)


R: testing whether a coefficient is equal across the different equations in a multivariate regression (using linearHypothesis())?

I have a question about how to compare coefficients in a multivariate regression in R.
I conducted a survey in which I measured three different attitudes (scale variables). My goal is to estimate whether some characteristics of the respondents (age, gender, education and ideological position) can explain their (positive/negative) attitudes.
I was advised to conduct a multivariate multiple regression instead of three separate univariate multiple regressions. The code of my multivariate model is:
MMR <- lm(cbind(Attitude_1, Attitude_2, Attitude_3) ~
Age + Gender + Education + Ideological_position,
data = survey)
summary(MMR)
What I am trying to do next is to test whether the coefficient of, let's say, 'Gender' differs significantly across the three individual models.
I found very clear instructions for how to do this in Stata (https://stats.idre.ucla.edu/stata/dae/multivariate-regression-analysis/), but I don't have a license, so I have to find an alternative in R. I know a similar question has been asked here before (R - Testing equivalence of coefficients in multivariate multiple regression), but the answer was that there is no package (or function) in R for this purpose. Because that answer was provided a few years back, I was wondering whether some new packages or functions have been implemented in the meantime.
More precisely, I was wondering whether I can use the linearHypothesis() function (https://www.rdocumentation.org/packages/car/versions/3.0-11/topics/linearHypothesis). I already know that this function allows me to test, for instance, whether the coefficient of Gender equals the coefficient of Education:
linearHypothesis(MMR, "GenderFemale = EducationHigh-educated")
Can I also use this function to test whether the coefficient of Gender in the equation modelling Attitude_1 equals the coefficient of Gender in the equation modelling Attitude_2 or Attitude_3?
Any help would be greatly appreciated!
Since the model presented in the question is not reproducible (the input is missing), let us use this model instead.
fm0 <- lm(cbind(cyl, mpg) ~ wt + hp, mtcars)
We will discuss two approaches using as our linear hypothesis that the intercepts of the cyl and mpg groups are the same, that the wt slopes are the same and the hp slopes are the same.
1) Mean/Variance
In this approach we base the entire comparison only on the coefficients and their variance covariance matrix.
library(car)
v <- vcov(fm0)
co <- setNames(c(coef(fm0)), rownames(v))
h1 <- c("cyl:(Intercept) = mpg:(Intercept)", "cyl:wt = mpg:wt", "cyl:hp = mpg:hp")
linearHypothesis(NULL, h1, coef. = co, vcov. = v)
giving:
Linear hypothesis test
Hypothesis:
cyl:(Intercept) - mpg:(Intercept) = 0
cyl:wt - mpg:wt = 0
cyl:hp - mpg:hp = 0
Model 1: restricted model
Model 2: structure(list(), class = "formula", .Environment = <environment>)
Note: Coefficient covariance matrix supplied.
Df Chisq Pr(>Chisq)
1
2 3 878.53 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
To explain what linearHypothesis is doing, note that in this case the hypothesis matrix is L <- t(c(1, -1)) %x% diag(3). Given v, as a large-sample approximation L %*% co is distributed as N(0, L %*% v %*% t(L)) under the null hypothesis, hence t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co is distributed as chi-squared with nrow(L) degrees of freedom.
L <- t(c(1, -1)) %x% diag(3)
nrow(L) # degrees of freedom
SSH <- t(L %*% co) %*% solve(L %*% v %*% t(L)) %*% L %*% co # chisq
p <- pchisq(SSH, nrow(L), lower.tail = FALSE) # p value
2) Long form model
With this approach (which is not equivalent to the first one shown above) we convert mtcars from wide to long form, mt2. We show how to do that using reshape or pivot_longer at the end, but for now we just form it explicitly. Define lhs as the 32x2 matrix on the left-hand side of the fm0 formula, i.e. cbind(cyl, mpg). Note that its column names are c("cyl", "mpg"). Stringing out lhs column by column into a vector of length 64 (the cyl column followed by the mpg column) gives our new dependent variable y. We also form a grouping variable g., the same length as y, which indicates which column of lhs each element of y came from.
With mt2 defined we can form fm1. In forming fm1 we use a weight vector w based on the fm0 sigma values to reflect the fact that the two groups, cyl and mpg, have different values of sigma, given by the vector sigma(fm0).
We show below that the fm0 and fm1 models have the same coefficients and then run linearHypothesis.
library(car)
lhs <- fm0$model[[1]]
g. <- colnames(lhs)[col(lhs)]
y <- c(lhs)
mt2 <- with(mtcars, data.frame(wt, hp, g., y))
w <- 1 / sigma(fm0)[g.]^2
fm1 <- lm(y ~ g./(wt + hp) + 0, mt2, weights = w)
# note coefficient names
variable.names(fm1)
## [1] "g.cyl" "g.mpg" "g.cyl:wt" "g.mpg:wt" "g.cyl:hp" "g.mpg:hp"
# check that fm0 and fm1 have same coefs
all.equal(c(t(coef(fm0))), coef(fm1), check.attributes = FALSE)
## [1] TRUE
h2 <- c("g.mpg = g.cyl", "g.mpg:wt = g.cyl:wt", "g.mpg:hp = g.cyl:hp")
linearHypothesis(fm1, h2)
giving:
Linear hypothesis test
Hypothesis:
- g.cyl + g.mpg = 0
- g.cyl:wt + g.mpg:wt = 0
- g.cyl:hp + g.mpg:hp = 0
Model 1: restricted model
Model 2: y ~ g./(wt + hp) + 0
Res.Df RSS Df Sum of Sq F Pr(>F)
1 61 1095.8
2 58 58.0 3 1037.8 345.95 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If L is the hypothesis matrix (the same as L in (1) except that the columns are reordered), q is its number of rows, and n is the number of rows of mt2, then SSH/q is distributed F(q, n-q-1), so we have:
n <- nrow(mt2)
L <- diag(3) %x% t(c(1, -1)) # note difference from (1)
q <- nrow(L)
SSH <- t(L %*% coef(fm1)) %*% solve(L %*% vcov(fm1) %*% t(L)) %*% L %*% coef(fm1)
SSH/q # F value
pf(SSH/q, q, n-q-1, lower.tail = FALSE) # p value
3) anova
An alternative to linearHypothesis is to define the reduced model and then compare the two models using anova. mt2 and w are from above. No packages are used.
fm2 <- lm(y ~ hp + wt, mt2, weights = w)
anova(fm2, fm1)
giving:
Analysis of Variance Table
Model 1: y ~ hp + wt
Model 2: y ~ g./(wt + hp) + 0
Res.Df RSS Df Sum of Sq F Pr(>F)
1 61 1095.8
2 58 58.0 3 1037.8 345.95 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Alternate wide to long calculation
An alternate way to form mt2 is by reshaping mtcars from wide form to long form using reshape.
mt2a <- mtcars |>
reshape(dir = "long", varying = list(colnames(lhs)), v.names = "y",
timevar = "g.", times = colnames(lhs)) |>
subset(select = c("wt", "hp", "g.", "y"))
or using the tidyverse (which gives the rows in a different order, but that should not matter as long as mt2b is used consistently in forming fm1 and w):
library(dplyr)
library(tidyr)
mt2b <- mtcars %>%
select(mpg, cyl, wt, hp) %>%
pivot_longer(all_of(colnames(lhs)), names_to = "g.", values_to = "y")
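Coming back to the model in the question: approach (1) can be applied directly to MMR to test whether the Gender coefficient is the same across the three attitude equations. This is only a sketch; the coefficient names below (e.g. Attitude_1:GenderFemale) are assumptions, so check rownames(vcov(MMR)) for the actual labels.
library(car)
v <- vcov(MMR)
co <- setNames(c(coef(MMR)), rownames(v))
h <- c("Attitude_1:GenderFemale = Attitude_2:GenderFemale",
       "Attitude_1:GenderFemale = Attitude_3:GenderFemale")
linearHypothesis(NULL, h, coef. = co, vcov. = v)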

Non-numeric argument to binary operator R NLS package

I am using the nls() function in R to perform a nonlinear fit. I have specified my independent variable as follows:
t <- seq(1,7)
and my dependent variable as P <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
I then tried:
m <- nls(P ~ 1 / (c + q*exp(-b*t))^(1/v))
but every time I get:
"Error in c + q * exp(-b * t) : non-numeric argument to binary
operator"
Every one of my variables is numeric. Any ideas?
Thanks!
You have more than one problem in your script. The main issue is that you should never use names that are already used by R: t is the matrix transpose, c is the usual way to create vectors, and q is the quit instruction. nls() will not try to fit them, as they are already defined. I recommend using more meaningful and less dangerous names such as Coef1, Coef2, …
The second problem is that you are trying to fit a model with 4 parameters to a dataset with only 7 data points... This may yield singularities and other problems.
For the sake of the argument, I have reduced your model to three parameters and changed some names:
Time <- seq(1,7)
Prob <- c(0.0246, 0.2735, 0.5697, 0.6715, 0.8655, 0.9614, 1)
plot(Time, Prob)
And now we perform the nls() fit:
Fit <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)))
X <- data.frame(Time = seq(0, 7, length.out = 100))
Y <- predict(object = Fit, newdata = X)
lines(X$Time, Y)
And a summary of the results:
summary(Fit)
# Formula: Prob ~ 1/(Coef1 + Coef2 * exp(-Coef3 * Time))
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# Coef1 1.00778 0.06113 16.487 7.92e-05 ***
# Coef2 23.43349 14.42378 1.625 0.1796
# Coef3 1.04899 0.21892 4.792 0.0087 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.06644 on 4 degrees of freedom
#
# Number of iterations to convergence: 12
# Achieved convergence tolerance: 3.04e-06
I know it is not exactly what you wanted, but I hope it helps.
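One small note (my addition, not part of the original answer): when no start argument is given, nls() warns that it is initializing the parameters to 1. You can avoid the warning by supplying explicit starting values; the values below are rough eyeball guesses, not taken from the answer above.
Fit2 <- nls(Prob ~ 1 / (Coef1 + Coef2 * exp(-Coef3 * Time)),
            start = list(Coef1 = 1, Coef2 = 20, Coef3 = 1))
coef(Fit2) # should land on essentially the same estimates as Fit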

F-score and standardized Beta for heteroscedasticity-corrected covariance matrix (hccm) in R

I have multiple regression models which failed Breusch-Pagan tests, and so I've recalculated the variance using a heteroscedasticity-corrected covariance matrix, like this: coeftest(lm.model,vcov=hccm(lm.model)). coeftest() is from the lmtest package, while hccm() is from the car package.
I'd like to provide F-scores and standardized betas, but am not sure how to do this, because the output looks like this...
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.000261 0.038824 0.01 0.995
age 0.004410 0.041614 0.11 0.916
exercise -0.044727 0.023621 -1.89 0.059 .
tR -0.038375 0.037531 -1.02 0.307
allele1_num 0.013671 0.038017 0.36 0.719
tR:allele1_num -0.010077 0.038926 -0.26 0.796
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Any advice on how to report these so they are as consistent as possible with the standard summary() and Anova() output from R and car, and the function std_beta() from the sjmisc package?
In case anyone else has this question, here was my solution. It is not particularly elegant, but it works.
I simply used the std_beta() function as a template and swapped in the standard errors derived from the heteroscedasticity-corrected covariance matrix.
# This is taken from the std_beta() function from the sjmisc package.
# =====================================
b <- coef(lm.model) # same estimates as in summary()
b <- b[-1] # drop the intercept
fit.data <- as.data.frame(stats::model.matrix(lm.model)) # same model matrix
fit.data <- fit.data[, -1] # drop the intercept column
fit.data <- as.data.frame(sapply(fit.data, function(x) if (is.factor(x))
to_value(x, keep.labels = F)
else x))
sx <- sapply(fit.data, sd, na.rm = T)
sy <- sapply(as.data.frame(lm.model$model)[1], sd, na.rm = T)
beta <- b * sx/sy
se <-coeftest(lm.model,vcov=hccm(lm.model))[,2] # ** USE HCCM covariance for SE **
se <- se[-1]
beta.se <- se * sx/sy
data.frame(beta = beta, ci.low = (beta - beta.se *
1.96), ci.hi = (beta + beta.se * 1.96))
For the F-scores, I just squared the t-values.
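As an aside (my addition, not part of the original answer), car::Anova() can also produce heteroscedasticity-corrected F tests directly through its white.adjust argument, which may be a simpler route if you only need the F statistics:
library(car)
Anova(lm.model, white.adjust = "hc3") # F tests using an HC3 covariance matrix, the same family as hccm()'s default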
I hope this saves someone some time.

Univariate non-linear optimisation in R

I'm trying to find a solution in R that performs similarly to MATLAB's trust-region-reflective algorithm. This question has been asked before, but the author was asked to provide a reproducible example. I couldn't comment there, so the only option was to post a new question. Here's my example:
x <- c(5000,5000,5000,5000,2500,2500,2500,2500,1250,1250,1250,1250,625,625, 625,625,312,312,312,312,156,156,156,156)
y <- c(0.209065186,0.208338898,0.211886104,0.209638321,0.112064803,0.110535275,0.111748670,0.111208841,0.060416469,0.059098975,0.059274827,0.060859512,0.032178730,0.033190833,0.031621743,0.032345817,0.017983939,0.016632180,0.018468540,0.019513489,0.011490089,0.011076365,0.009282322,0.012309134)
Since initial parameter values are the central issue, I tried the 'nls2' package, which uses a brute-force algorithm to find good starting parameters. Even with that, nls() and nlsLM() cannot reach convergence. Here's some basic code for this:
library('nls2'); library('minpack.lm')
fo <- y ~ I(A * (x ^ B) + C)
sA <- seq(-2,1,len=10) # range of parameter values
sB <- seq(-1,1,len=10)
sC <- seq(-1,1,len=10)
st1 <- expand.grid(A=sA,B=sB,C=sC)
mod1 <- nls2(fo,start=st1,algorithm="brute-force")
fit_ <- nls(fo,start=coef(mod1)) # basic nls
# or nls.lm
fit_ <- nlsLM(fo, start=coef(mod1),algorithm = "LM")
MATLAB produced:
a = 7.593e-05 (6.451e-05, 8.736e-05)
b = 0.9289 (0.9116, 0.9462)
c = 0.002553 (0.001333, 0.003772)
Goodness of fit:
SSE: 2.173e-05
R-square: 0.9998
Adjusted R-square: 0.9998
RMSE: 0.001017
and yes, using these parameter values, R also produced the solution.
Question: how can I obtain this in R without using MATLAB?
After looking at the plotted data, I have no problem guessing suitable starting values:
plot(y ~ x)
The data lie almost on a straight line through 0, so good starting values for B and C are 1 and 0, respectively. Then you only need to guesstimate the slope of that straight line. Of course, you could also use lm(y ~ x) to find starting values for A and C (see the sketch at the end of this answer).
fo <- y ~ A * (x ^ B) + C
DF <- data.frame(x, y)
fit <- nls(fo, start = list(A = 0.001, B = 1, C = 0), data = DF)
summary(fit)
#Formula: y ~ A * (x^B) + C
#
#Parameters:
# Estimate Std. Error t value Pr(>|t|)
#A 7.593e-05 5.495e-06 13.820 5.17e-12 ***
#B 9.289e-01 8.317e-03 111.692 < 2e-16 ***
#C 2.552e-03 5.866e-04 4.351 0.000281 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.001017 on 21 degrees of freedom
#
#Number of iterations to convergence: 5
#Achieved convergence tolerance: 9.084e-07
lines(seq(min(x), max(x), length.out = 100),
predict(fit, newdata = data.frame(x = seq(min(x), max(x), length.out = 100))),
col = "blue")

Successively removing predictor variable from formula

I have a model formula in the form of
model.all <- lme(Response ~ A + B + C)
I would like to update this model by successively removing a predictor variable from the model, so I would end up with 3 models, specifically:
mod.1 <- lme(Response ~ B + C) ; mod.2 <- lme(Response ~ A + C) ; mod.3 <- lme(Response ~ A + B)
I am thinking of a loop; I am aware of the update() function, but I have too many predictor variables to change the code manually.
Any suggestions would be appreciated.
I would use combn on this occasion; see the example below:
Example Data
Response <- runif(100)
A <- runif(100)
B <- runif(100)
C <- runif(100)
Solution
a <- c('A','B','C') #the names of your variables
b <- as.data.frame(combn(a,2)) #two-way combinations of those using combn
#create the formula for each model
my_forms <- sapply(b, function(x) paste('Response ~ ', paste(x,collapse=' + ')))
> my_forms #the formulas that will be used in the model
V1 V2 V3
"Response ~ A + B" "Response ~ A + C" "Response ~ B + C"
#run each model
my_models <- lapply(my_forms, function(x) lm(as.formula(x)))
Output
> summary(my_models[[1]])
Call:
lm(formula = as.formula(x))
Residuals:
Min 1Q Median 3Q Max
-0.48146 -0.20745 -0.00247 0.24263 0.58341
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.32415 0.08232 3.938 0.000155 ***
A 0.25404 0.09890 2.569 0.011733 *
B 0.07955 0.10129 0.785 0.434141
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2828 on 97 degrees of freedom
Multiple R-squared: 0.06507, Adjusted R-squared: 0.04579
F-statistic: 3.375 on 2 and 97 DF, p-value: 0.03827
As you can see, each model is saved as a list element in my_models. I find this quite easy to set up and run.
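If you need lme() rather than lm(), the same formula vector can be reused. The sketch below is only illustrative: the random-effects term and data frame are hypothetical placeholders, since the original question did not show them.
library(nlme)
# 'Group' and 'my_data' are placeholders; substitute the grouping factor and data
# frame that model.all actually uses.
my_lme_models <- lapply(my_forms, function(f)
  lme(fixed = as.formula(f), random = ~ 1 | Group, data = my_data))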
