Unable to get Residuals for AOV in R

I have a dataframe in R.
This is part of the head of the DF:
Sujet  T  E  O  P  meanTR
    1  1  0  0  0    0.97
    1  1  0  0  0    1.44
    2  0  1  0  1    0.94
Sujet: from 1 to 12
T, E, O, P: 1 or 0
meanTR: numeric
I want to get the ANOVA table, so I tried this:
model_all <- aov(meanTR ~ E*O*P + Error(Sujet/E*O*P), data = df)
After that, I want to extract the residuals of my model to plot them, so I tried this:
res <- residuals(model_all)  # returns NULL
So I found people online suggesting this solution:
model_all.pr <- proj(model_all)
res <- model_all.pr[[3]][, "Residuals"]
But this returns a "subscript out of bounds" error.
res <- model_all.pr[[3]]["Residuals"]
And this returns NA.
I don't know what I'm doing wrong, and I'm really confused.
Any help would be appreciated.
The main goal is to be able to run this:
plot(res)
qqnorm(res)

With aov(), you'll get a top-level $residuals component for some fits but not others.
For example, with a simple model like the following, you can access the residuals directly (use str() to see the structure of an object, including which components can be accessed):
fit1 <- aov(Sepal.Length ~ Sepal.Width, data=iris)
str(fit1$residuals)
## Named num [1:150] -0.644 -0.956 -1.111 -1.234 -0.722 ...
## - attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
But with the more complex model specification you're using (i.e. with an explicit/custom Error() term), there are separate residual values inside each of the top-level components, one per error stratum:
fit2 <- aov(Sepal.Length ~ Sepal.Width + Error(Species/Sepal.Length), data=iris)
fit2$residuals # NULL
names(fit2)
## [1] "(Intercept)" "Species" "Sepal.Length:Species" "Within"
fit2$Species$residuals
## 2 3
## -1.136219 5.179749
str(fit2$Within$residuals)
## Named num [1:144] -1.83e-15 -2.49e-15 -1.90e-15 -2.55e-15 -2.89e-15 ...
## - attr(*, "names")= chr [1:144] "7" "8" "9" "10" ...
## ...
## ...
I haven't thought through the statistics enough to say exactly why, but it is reasonable: with an Error() term, aov() fits a multistratum model, so the residuals are stored within each error stratum rather than at the top level.
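Applied to the model in the question, a minimal sketch (assuming the model_all fit from above; the stratum names depend on your Error() term, so check names() first):
names(model_all)                   # list the error strata for this fit
res <- model_all$Within$residuals  # residuals from the Within stratum
plot(res)                          # the residual plot you asked for
qqnorm(res); qqline(res)           # normal Q-Q plot with reference line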
Hope that helps!

Related

Specifying linear model in R without an intercept with contrasts

I am trying to run a linear model in R that does not specify an intercept. The reason is to eventually calculate the sum of squares reduced when an intercept is added. However, I am receiving different results when specifying this model using built-in factor contrasts versus explicitly stating the contrast values (i.e., -.5 and .5).
More specifically, using contrasts() results in a model with 2 terms (no intercept) while explicitly stating the contrast values via a column vector results in the correct model (no intercept and 1 term specifying the contrast).
group <- rep(c("c", "t"), each = 5)
group_cont <- rep(c(-.5, .5), each = 5)
var1 <- runif(10)
var2 <- runif(10)
test_data <- data.frame(
  group = factor(group),
  group_cont = group_cont,
  y = var1,
  x = var2
)
contrasts(test_data$group) <- cbind(grp = c(-.5, .5))
summary(lm(y ~ 1 + group, data = test_data)) # full model
summary(lm(y ~ 0 + group, data = test_data)) # weird results
summary(lm(y ~ 0 + group_cont, data = test_data)) # expected
Is there a way to specify a linear model without an intercept, but still use contrasts() to specify the contrast?
lm() asks for a data frame and column names as inputs. When you use contrasts(), you are assigning an attribute to the column in your data frame, which you can inspect directly using the contrasts() function or attr(). However, you are not changing the data type itself. Using your example above:
> str(test_data)
'data.frame': 10 obs. of 4 variables:
$ group : Factor w/ 2 levels "c","t": 1 1 1 1 1 2 2 2 2 2 #### still a factor ####
..- attr(*, "contrasts")= num [1:2, 1] -0.5 0.5 #### NOTE The contrast attribute ####
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr "c" "t"
.. .. ..$ : chr "grp"
$ group_cont: num -0.5 -0.5 -0.5 -0.5 -0.5 0.5 0.5 0.5 0.5 0.5
$ y : num 0.161 0.518 0.417 0.335 0.301 ...
$ x : num 0.34 0.729 0.766 0.629 0.191 ...
> attr(test_data$group, "contrasts")
grp
c -0.5
t 0.5
So an attribute was added, but the type is still a factor, and lm() treats it like a factor, giving you a coefficient for each level. Moreover, providing contrasts() or calling attr() inside lm() will throw an error. Depending on what you want the end result to look like, you may need to explore a different package like contrast. There is also a contrasts argument in lm(), but I am not 100% sure it is what you are really looking for; see ?lm for more on that.
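One workaround, sketched under the assumption that you just want the single contrast-coded column with no intercept: let model.matrix() apply the stored contrasts and pull the resulting column out yourself (model.matrix() names it "groupgrp", the factor name pasted to the contrast column name):
mm <- model.matrix(~ group, data = test_data)  # applies the stored contrasts
test_data$grp <- mm[, "groupgrp"]              # the contrast-coded numeric column
summary(lm(y ~ 0 + grp, data = test_data))     # should match the group_cont model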

How do you obtain confusion matrix for glmnet Multinomial logistic regression?

I fit a multinomial logistic regression model and I'd like to obtain a confusion matrix to compute the accuracy.
library("glmnet")
x=data.matrix(train[-1])
y= data.matrix(train[1])
x_test=data.matrix(test[-1])
y_test=unlist(test[1])
fit.glm=glmnet(x,y,family="multinomial",alpha = 1, type.multinomial = "grouped")
cvfit=cv.glmnet(x, y, family="multinomial", type.multinomial = "grouped", parallel = TRUE)
y_predict=unlist(predict(cvfit, newx = x_test, s = "lambda.min", type = "class"))
and then, to calculate the confusion matrix, I use the caret library:
library("lattice")
library("ggplot2")
library("caret")
confusionMatrix(data=y_predict,reference=y_test)
I am getting this error, which I do not know how to solve:
Error in confusionMatrix.default(data = y_predict, reference = y_test) :
  The data must contain some levels that overlap the reference.
Here is the str() of y_predict and y_test; they might be helpful:
str(y_predict)
chr [1:301, 1] "6" "2" "7" "9" "3" "2" "3" "6" "6" "8" "6" "5" "6" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "1"
str(y_test)
Factor w/ 10 levels "accessory","activation",..: 6 8 2 9 3 2 3 5 10 8 ...
- attr(*, "names")= chr [1:301] "category1" "category2" "category3" "category4" ...
I use unlist() to avoid getting this error: Error: x must be atomic for 'sort.list'
It would make sense to keep track of your labels, use them to convert the results from glmnet back to labels, and then apply the confusion matrix. I use the iris dataset, which has 3 labels:
idx = sample(nrow(iris),100)
train = iris[idx,]
test = iris[-idx,]
We convert the response into a numeric:
x = data.matrix(train[,-5])
y = as.numeric(train[,5]) - 1
x_test = data.matrix(test[,-5])
y_test = as.numeric(test[,5]) - 1
The fit is a bit different here; we get back the probabilities:
cvfit=cv.glmnet(x, y, family="multinomial")
y_predict=predict(cvfit, newx = x_test, s = "lambda.min", type = "response")
In this example the response is the column Species; in yours it will be test[,1]:
ref_labels = test$Species
pred_labels = levels(test$Species)[max.col(y_predict[,,1])]
caret::confusionMatrix(table(pred_labels,ref_labels))
Confusion Matrix and Statistics

            ref_labels
pred_labels  setosa versicolor virginica
  setosa         20          0         0
  versicolor      0         12         0
  virginica       0          0        18
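If you prefer type = "class", the same label bookkeeping works; a sketch under the iris coding above (y runs 0..2, so we add 1 to index the factor levels; your own y may be encoded differently, so check that first):
pred_class <- predict(cvfit, newx = x_test, s = "lambda.min", type = "class")
pred_labels <- factor(levels(test$Species)[as.numeric(pred_class) + 1],
                      levels = levels(test$Species))  # map codes back to labels
caret::confusionMatrix(table(pred_labels, test$Species))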

R - Extract ns spline object from lmer model and predict on new data

I'm looking to predict 'terms', especially ns splines, from an lmer model. I've replicated the problem with the mtcars dataset (technically poor example, but works to get the point across).
Here is what I'm trying to do with a linear model:
data(mtcars)
library(splines)  # needed for ns()
mtcarsmodel <- lm(wt ~ ns(drat, 2) + hp + as.factor(gear), data = mtcars)
summary(mtcarsmodel)
coef(mtcarsmodel)
test <- predict(mtcarsmodel, type = "terms")
Perfect. However, there is no equivalent 'terms' option for lmer predict (unresolved issue here).
library(lme4)  # needed for lmer()
mtcarsmodellmer <- lmer(wt ~ ns(drat, 2) + (hp|as.factor(gear)), data = mtcars)
summary(mtcarsmodellmer)
coef(mtcarsmodellmer)
ranef(mtcarsmodellmer)
Given there is no equivalent predict(..., type = "terms") function, I was going to extract the fixed and random coefficients above and apply them to the mtcars data, but I have no idea how to extract an ns spline object from a model and 'predict' it on some new data. The same goes for a poly-transformed variable, e.g. poly(drat, 2); extra kudos if you can get this as well.
It is not difficult to do it yourself.
library(lme4)
library(splines)
X <- with(mtcars, ns(drat, 2)) ## design matrix for splines (without intercept)
## head(X)
# 1 2
#[1,] 0.5778474 -0.1560021
#[2,] 0.5778474 -0.1560021
#[3,] 0.5738625 -0.1792162
#[4,] 0.2334329 -0.1440232
#[5,] 0.2808520 -0.1704002
#[6,] 0.0000000 0.0000000
## str(X)
# ns [1:32, 1:2] 0.578 0.578 0.574 0.233 0.281 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "1" "2"
# - attr(*, "degree")= int 3
# - attr(*, "knots")= Named num 3.7
# ..- attr(*, "names")= chr "50%"
# - attr(*, "Boundary.knots")= num [1:2] 2.76 4.93
# - attr(*, "intercept")= logi FALSE
# - attr(*, "class")= chr [1:3] "ns" "basis" "matrix"
fit <- lmer(wt ~ X + (hp|gear), data= mtcars)
beta <- coef(fit)
#$gear
# hp (Intercept) X1 X2
#3 0.010614406 2.455403 -2.167337 -0.9246454
#4 0.014601363 2.455403 -2.167337 -0.9246454
#5 0.006342761 2.455403 -2.167337 -0.9246454
#
#attr(,"class")
#[1] "coef.mer"
If we want to predict the ns term, just do
## use `predict.ns`; read `?predict.ns`
x0 <- seq(1, 5, by = 0.2) ## example `newx`
Xp <- predict(X, newx = x0) ## prediction matrix
b <- with(beta$gear, c(X1[1], X2[1])) ## coefficients for spline
y <- Xp %*% b ## predicted mean
plot(x0, y, type = "l")
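The question also asked about poly(); the same pattern works there, since stats provides a predict method for "poly" objects too (note it takes newdata rather than newx). A minimal sketch:
P <- with(mtcars, poly(drat, 2))  # orthogonal polynomial basis
Pp <- predict(P, newdata = x0)    # evaluate the same basis at new x values
## then multiply Pp by the corresponding fitted coefficients, exactly as with Xp above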

How to plot MASS::qda scores

From this question, I was wondering whether it's possible to extract the quadratic discriminant analysis (QDA) scores and reuse them afterwards, like PCA scores.
## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1) ## remove this line if you want it to be pseudo random
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
## c s v
## 22 23 30
Using the QDA here
z <- qda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
## get the whole prediction object
pred <- predict(z)
## show first few sample scores on LDs
Here, you can see that it's not working:
head(pred$x)
# NULL
plot(LD2 ~ LD1, data = pred$x)
# Error in eval(expr, envir, enclos) : object 'LD2' not found
NOTE: Too long/formatted for a comment. NOT AN ANSWER
You may want to try the rrcov package:
library(rrcov)
z <- QdaCov(Sp ~ ., Iris[train,], prior = c(1,1,1)/3)
pred <- predict(z)
str(pred)
## Formal class 'PredictQda' [package "rrcov"] with 4 slots
## ..# classification: Factor w/ 3 levels "c","s","v": 2 2 2 1 3 2 2 1 3 2 ...
## ..# posterior : num [1:41, 1:3] 5.84e-45 5.28e-50 1.16e-25 1.00 1.48e-03 ...
## ..# x : num [1:41, 1:3] -97.15 -109.44 -54.03 2.9 -3.37 ...
## ..# ct : 'table' int [1:3, 1:3] 13 0 1 0 16 0 0 0 11
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ Actual : chr [1:3] "c" "s" "v"
## .. .. ..$ Predicted: chr [1:3] "c" "s" "v"
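Since PredictQda is an S4 object, the scores can be pulled from the x slot with @ and plotted; a minimal sketch (which pair of score columns is most meaningful is up to you):
scores <- pred@x                            # the score matrix shown in str() above
plot(scores[, 1], scores[, 2],
     col = as.integer(pred@classification), # colour points by predicted class
     xlab = "Score 1", ylab = "Score 2")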
It also has robust PCA methods that may be useful.
Unfortunately, not every model in R conforms to the same object structure/API, and QDA is not a linear model, so it is unlikely to conform to the structure of linear-model fits.
There's an example of how to visualize qda results here: http://ramhiser.com/2013/07/02/a-brief-look-at-mixture-discriminant-analysis/
And, you can do:
library(klaR)
partimat(Sp ~ ., data=Iris, method="qda", subset=train)
for a partition plot of the qda results.

Create function to automatically create dataset from summary(fit <- lm(y ~ x1 + x2 + … + xn))

This question is closely related to my previous question. The only difference is that instead of plotting the data, I want the raw data behind the fit. I tried to solve it myself following the last answer but still got stuck.
So I want to retrieve, from the fit of a linear regression, the independent variables, the fitted values, the residuals and the standardized residuals.
I will use the example, which was kindly created by Brian Diggs. So thank you.
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100, 4, 5), x3 = rnorm(100, 8, 27),
                  x4 = rnorm(100, -6, 0.1), t = (1:100) + runif(100, -2, 2))
dat <- transform(dat, y = x1 + 4*x2 + 3.6*x3 + 4.7*x4 + rnorm(100, 3, 50))
fit <- lm(y~x1+x2+x3+x4, data=dat) # fit
dat$resid <- residuals(fit)
vars <- names(coef(fit))[-1]
At the next step I am stuck, as before. I am trying to get only the variables which are used in the regression and bind them to a new dataset. I attempted the following, but it does not work; this step is wrong. I can bind the residuals and fitted values, but not the variables used.
fit.data <- cbind(predict(fit),as.name(names(coef(fit))[2]))
Any help is really appreciated. Yes, still teaching R myself.
You cannot cbind() together things which are not of corresponding dimension; for that purpose you need a list. You will also want to work with the fit object, since the summary object does not have the fitted values (and maybe would not work with rstandard(), but I'm not sure about that).
mod.results <- list(vars = names(coef(fit))[-1],
                    fitted.values = fit$fitted.values,
                    residuals = residuals(fit),
                    std.resid = rstandard(fit))
Putting it in a function is trivial:
> extr.res <- function(fit) {
    list(vars = names(coef(fit)),
         fitted.values = fit$fitted.values,
         residuals = residuals(fit),
         std.resid = rstandard(fit))
  }
> str(extr.res(fit))
List of 4
$ vars : chr [1:5] "(Intercept)" "x1" "x2" "x3" ...
$ fitted.values: Named num [1:100] -36.19 31.4 -2.59 -130.03 -1.12 ...
..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
$ residuals : Named num [1:100] -71.6 -21.2 -50.7 19 -58.5 ...
..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
$ std.resid : Named num [1:100] -1.608 -0.487 -1.175 0.435 -1.297 ...
..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
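For the other half of the goal, binding the independent variables themselves to the fitted values and residuals, a hedged sketch using the model frame that lm() stores on the fit object by default (and the vars vector defined earlier):
fit.data <- cbind(fit$model[vars],       # just the predictors used in the fit
                  fitted = fitted(fit),
                  resid = residuals(fit),
                  std.resid = rstandard(fit))
head(fit.data)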
