I have a dataset called datamoth where survival is the response variable and treatment is a variable that can be treated as either categorical or quantitative. The dataset looks as follows:
survival <- c(17,22,26,20,11,14,37,26,24,11,11,16,8,5,12,3,5,4,14,8,4,6,3,3,10,13,5,7,3,3)
treatment <- c(3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,9,9,12,12,12,12,12,12,21,21,21,21,21,21)
days <- c(3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,9,9,12,12,12,12,12,12,21,21,21,21,21,21)
datamoth <- data.frame(survival, treatment, days)
So, I can fit a linear regression model considering treatment as categorical, like this:
lmod<-lm(survival ~ factor(treatment), datamoth)
My question is how to fit a linear regression model with treatment as a categorical variable while also including treatment as a quantitative confounding variable.
I have figured out something like this:
model <- lm(survival ~ factor(treatment) + factor(treatment)*days, data = datamoth)
summary(model)
Call:
lm(formula = survival ~ factor(treatment) + factor(treatment) *
days, data = datamoth)
Residuals:
Min 1Q Median 3Q Max
-9.833 -3.333 -1.167 3.167 16.167
Coefficients: (5 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.333 2.435 7.530 6.96e-08 ***
factor(treatment)6 2.500 3.443 0.726 0.47454
factor(treatment)9 -12.167 3.443 -3.534 0.00162 **
factor(treatment)12 -12.000 3.443 -3.485 0.00183 **
factor(treatment)21 -11.500 3.443 -3.340 0.00263 **
days NA NA NA NA
factor(treatment)6:days NA NA NA NA
factor(treatment)9:days NA NA NA NA
factor(treatment)12:days NA NA NA NA
factor(treatment)21:days NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.964 on 25 degrees of freedom
Multiple R-squared: 0.5869, Adjusted R-squared: 0.5208
F-statistic: 8.879 on 4 and 25 DF, p-value: 0.0001324
But obviously this code does not work, because the two variables are collinear.
Does anyone know how to fix it? Any help will be appreciated.
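The collinearity can be checked directly, and one possible way around it is sketched below. This is only a sketch, assuming the real aim is to ask whether a straight-line trend in treatment is adequate compared with a separate mean per treatment level. The names fit_linear, fit_factor and fit_both are made up for illustration. Because the numeric column is an exact function of the factor levels, adding it to the factor model cannot raise the rank, but the linear model is nested in the factor model, so the two can be compared with an F test.

```r
survival  <- c(17,22,26,20,11,14,37,26,24,11,11,16,8,5,12,3,5,4,14,8,4,6,3,3,10,13,5,7,3,3)
treatment <- c(3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,9,9,12,12,12,12,12,12,21,21,21,21,21,21)
datamoth  <- data.frame(survival, treatment)

# Straight-line trend in treatment: intercept + slope
fit_linear <- lm(survival ~ treatment, data = datamoth)

# One mean per treatment level: intercept + 4 dummies
fit_factor <- lm(survival ~ factor(treatment), data = datamoth)

# Numeric treatment added on top of the factor: its column lies in the
# span of the intercept and the dummies, so lm() reports NA for it
fit_both <- lm(survival ~ factor(treatment) + treatment, data = datamoth)
coef(fit_both)
fit_both$rank   # same rank as fit_factor

# F test for departure from linearity (linear model nested in factor model)
anova(fit_linear, fit_factor)
```

If the F test gives no evidence against linearity, the simpler numeric model may be enough. Otherwise the factor model already captures everything the numeric version could.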
I am fitting a mixed model using glmmLasso in R using the command:
glmmLasso(fix = Activity ~ Novelty + Valence + ROI + Novelty:Valence +
Novelty:ROI + Valence:ROI + Novelty:Valence:ROI, rnd = list(Subject = ~1),
data = KNov, lambda = 195, switch.NR = F, final.re = TRUE)
To give you a sense of the data, the output of head(KNov) is:
Subject Activity ROI Novelty Valence Side STAIt
1 202 -0.4312944 H N E L -0.2993321
2 202 -0.6742497 H N N L -0.2993321
3 202 -1.0914216 H R E L -0.2993321
4 202 -0.6296091 H R N L -0.2993321
5 202 -0.6023507 H N E R -0.2993321
6 202 -1.1554196 H N N R -0.2993321
(I used KNov$Subject <- factor(KNov$Subject) to have Subject read as a categorical variable)
Activity is a measure of brain activity, Novelty and Valence are categorical variables coding the type of stimulus used to elicit the response and ROI is a categorical variable coding three regions of the brain that we have sampled this activity from. Subject is an ID number for the individuals the data was sampled from (n=94).
The output for glmmLasso is:
Fixed Effects:
Coefficients:
Estimate StdErr z.value p.value
(Intercept) 0.232193 0.066398 3.4970 0.0004705 ***
NoveltyR -0.190878 0.042333 -4.5089 6.516e-06 ***
ValenceN -0.164214 NA NA NA
ROIB 0.000000 NA NA NA
ROIH 0.000000 NA NA NA
NoveltyR:ValenceN 0.064523 0.077290 0.8348 0.4038189
NoveltyR:ROIB 0.000000 NA NA NA
NoveltyR:ROIH 0.000000 NA NA NA
ValenceN:ROIB -0.424670 0.069561 -6.1050 1.028e-09 ***
ValenceN:ROIH 0.000000 NA NA NA
NoveltyR:ValenceN:ROIB 0.000000 NA NA NA
NoveltyR:ValenceN:ROIH 0.000000 NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Random Effects:
StdDev:
Subject
Subject 0.6069078
I would like to get a p-value for the effect of valence. My first thought was that the p-value for valence was not included because it was non-significant and only included in the model because it is part of the significant ValenceN:ROIB interaction. However, NoveltyR:ValenceN was also non-significant, and a p-value is given for that. I would like a p-value for valence even if it is non-significant, as this analysis is going to be used for a paper, and I prefer to report actual p-values rather than p>.05.
The problem here is most likely due to a "reduced rank set of predictors", i.e. you have a lot of combinations where there are either no entries or where some smaller subset of entries is sufficient to unambiguously predict the rest of the values. I suggest you run this code and notice that you get zero cells.
with(KNov, table(Novelty,
                 Valence,
                 ROI,
                 interaction(Novelty, Valence)))
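To illustrate what a zero cell does to the design matrix, here is a minimal sketch on made-up data (KNov itself is not available here, so the data frame toy and its factors A and B are purely hypothetical). It compares the number of columns in the model matrix with its numerical rank. A rank smaller than the column count means some coefficients cannot be estimated.

```r
# Hypothetical 2 x 2 layout with 3 replicates per cell
toy <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), rep = 1:3)

# Remove every observation in the a2/b2 cell to create an empty cell
toy <- subset(toy, !(A == "a2" & B == "b2"))

# The cross-tabulation shows the zero cell
cell_counts <- xtabs(~ A + B, data = toy)
cell_counts

# Design matrix for main effects plus interaction
X <- model.matrix(~ A * B, data = toy)

ncol(X)      # number of coefficients the formula asks for
qr(X)$rank   # number of coefficients the data can support
```

The same comparison could be run on the real data by swapping in the formula and data frame from the question.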
Please be patient with me. I'm new to this site.
I am modeling turtle nest survival using the coxph() function and have run into a confusing problem with an interaction term between species and nest cages. I have nests from 3 species of turtles (7, 10, and 111 nests per species).
There are nest cages on all nests for species 1 (7 nests).
There are no nest cages on any of the nests for species 2 (10 nests).
There are nest cages on about half of the nests for species 3 (111 nests).
Here is my model with the summary output:
S<-Surv(time, event)
n8<-coxph(S~species:cage, data=nesta1)
Warning message:
In coxph(S ~ species:cage, data = nesta1) :
X matrix deemed to be singular; variable 1 5 6
summary(n8)
Call:
coxph(formula = S ~ species:cage, data = nesta1)
n= 128, number of events= 73
coef exp(coef) se(coef) z Pr(>|z|)
species1:cageN NA NA 0.0000 NA NA
species2:cageN 1.2399 3.4554 0.3965 3.128 0.00176 **
species3:cageN 0.5511 1.7351 0.2664 2.068 0.03860 *
species1:cageY -0.1054 0.8999 0.6145 -0.172 0.86379
species2:cageY NA NA 0.0000 NA NA
species3:cageY NA NA 0.0000 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
species1:cageN NA NA NA NA
species2:cageN 3.4554 0.2894 1.5887 7.515
species3:cageN 1.7351 0.5763 1.0293 2.925
species1:cageY 0.8999 1.1112 0.2698 3.001
species2:cageY NA NA NA NA
species3:cageY NA NA NA NA
Concordance= 0.61 (se = 0.038 )
Rsquare= 0.079 (max possible= 0.993 )
Likelihood ratio test= 10.57 on 3 df, p=0.01426
Wald test = 11.36 on 3 df, p=0.009908
Score (logrank) test = 12.22 on 3 df, p=0.006672
I understand that I would have singularities for species 1 and 2, but not for species 3. Why would the "species3:cageY" line be singular when there are species 3 nests with nest cages on them?
Is it ok to include species 1 and 2 even though they have those singularities?
Edit: I cannot find any errors in my data. I have decimal numbers for the time variable for a few nests, but that doesn't seem to be a problem for species 3 nests without a nest cage. For species 3, I have the full range of time values for nests with and without a nest cage and I have both true and false events for nests with and without a nest cage.
Edit:
with( nesta1, table(event, species, cage))
, , cage = N
species
event 1 2 3
0 0 1 24
1 0 9 38
, , cage = Y
species
event 1 2 3
0 4 0 26
1 3 0 23
Edit 2: I understand that interaction-only models are not very useful, but the interaction term results behave the same way whether I have other main effects in the model or not. I've removed the other main effects to simplify this question.
Thank you!
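A sketch that may help explain the pattern of NAs: it rebuilds only the species-by-cage layout from the cell counts in the table above (event 0 and event 1 added together) and inspects the design matrix that a species:cage formula produces. The objects cells, toy and X are illustrative names, not part of the original analysis. Two of the six indicator columns are all zero (the empty cells), and the four remaining columns add up to a constant. A Cox model has no intercept because the baseline hazard absorbs it, so one of those four is redundant as well, and it appears that coxph() treats the last one, species3:cageY, as the reference level.

```r
# Cell sizes taken from the table in the question (events 0 and 1 combined)
cells <- data.frame(
  species = factor(c(1, 2, 3, 3)),
  cage    = factor(c("Y", "N", "N", "Y")),
  n       = c(7, 10, 62, 49)
)

# Expand to one row per nest
toy <- cells[rep(seq_len(nrow(cells)), cells$n), c("species", "cage")]

# Interaction-only design: intercept plus one indicator per species/cage cell
X <- model.matrix(~ species:cage, data = toy)

colSums(X)   # indicator columns for the empty cells sum to zero
qr(X)$rank   # intercept plus the number of estimable contrasts
```

Under this reading, the coefficients for species2:cageN, species3:cageN and species1:cageY are log hazard ratios relative to caged species 3 nests, which matches the 3 degrees of freedom in the tests.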
The study randomized participants by Source (Expert vs Attractive) and by Argument (Strong vs Weak), and participants were categorized by Monitor type (High vs Low). I want to test the significance of the main effects, the two-way interactions, and the three-way interaction in the following data frame. Specifically:
Main effects = Self-Monitors (High vs. Low), Argument (Strong vs. Weak), Source (Attractive vs. Expert)
Two-way interactions = Self-Monitors*Argument, Self-Monitors*Source, Argument*Source
Three-way interaction = Self-Monitors*Argument*Source
This is the code:
data<-data.frame(Monitor=c(rep("High.Self.Monitors", 24),rep("Low.Self.Monitors", 24)),
Argument=c(rep("Strong", 24), rep("Weak", 24), rep("Strong", 24), rep("Weak", 24)),
Source=c(rep("Expert",12),rep("Attractive",12),rep("Expert",12),rep("Attractive",12),
rep("Expert",12),rep("Attractive",12),rep("Expert",12),rep("Attractive",12)),
Response=c(4,3,4,5,2,5,4,6,3,4,5,4,4,4,2,3,5,3,2,3,4,3,2,4,3,5,3,2,6,4,4,3,5,3,2,3,5,5,7,5,6,4,3,5,6,7,7,6,
3,5,5,4,3,2,1,5,3,4,3,4,5,4,3,2,4,6,2,4,4,3,4,3,5,6,4,7,6,7,5,6,4,6,7,5,6,4,4,2,4,5,4,3,4,2,3,4))
data$Monitor<-as.factor(data$Monitor)
data$Argument<-as.factor(data$Argument)
data$Source<-as.factor(data$Source)
I'd like to obtain the main effects, as well as all two-way interactions and the three-way interaction. However, if I type in anova(lm(Response ~ Monitor*Argument*Source, data=data)) I obtain:
Analysis of Variance Table
Response: Response
Df Sum Sq Mean Sq F value Pr(>F)
Monitor 1 24.000 24.0000 13.5322 0.0003947 ***
Source 1 0.667 0.6667 0.3759 0.5413218
Monitor:Source 1 0.667 0.6667 0.3759 0.5413218
Residuals 92 163.167 1.7736
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
and if I enter summary(lm(Response ~ Monitor*Argument*Source, data=data)) I obtain:
Call:
lm.default(formula = Response ~ Monitor * Argument * Source,
data = data)
Residuals:
Min 1Q Median 3Q Max
-2.7917 -0.7917 0.2083 1.2083 2.5417
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4583 0.2718 12.722 < 2e-16 ***
MonitorLow.Self.Monitors 1.1667 0.3844 3.035 0.00313 **
ArgumentWeak NA NA NA NA
SourceExpert 0.3333 0.3844 0.867 0.38817
MonitorLow.Self.Monitors:ArgumentWeak NA NA NA NA
MonitorLow.Self.Monitors:SourceExpert -0.3333 0.5437 -0.613 0.54132
ArgumentWeak:SourceExpert NA NA NA NA
MonitorLow.Self.Monitors:ArgumentWeak:SourceExpert NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.332 on 92 degrees of freedom
Multiple R-squared: 0.1344, Adjusted R-squared: 0.1062
F-statistic: 4.761 on 3 and 92 DF, p-value: 0.00394
Any thoughts or ideas?
Your data isn't as well randomized as you say. To estimate a three-way interaction you'd need a group of subjects with the "Low", "Strong" and "Expert" combination of levels of the three factors. You do not have such a group.
Look at:
table(data[,1:3])
For example.
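Here is a sketch of what that table reveals, using only the factor columns from the question. The Monitor vector has 48 entries while the other columns have 96, so data.frame() recycles it, and Monitor ends up perfectly aligned with Argument. The second data frame, fixed, is only a guess at the intended layout (48 High followed by 48 Low, Strong and Weak crossed within each), so adjust it to match how the responses were actually recorded.

```r
# Factor columns exactly as built in the question (Monitor is recycled 48 -> 96)
design <- data.frame(
  Monitor  = c(rep("High.Self.Monitors", 24), rep("Low.Self.Monitors", 24)),
  Argument = c(rep("Strong", 24), rep("Weak", 24), rep("Strong", 24), rep("Weak", 24)),
  Source   = rep(rep(c("Expert", "Attractive"), each = 12), times = 4)
)

# High is always Strong and Low is always Weak: the two factors are aliased
mon_by_arg <- table(design$Monitor, design$Argument)
mon_by_arg

# Assumed intended layout: every combination of the three factors occurs
fixed <- data.frame(
  Monitor  = rep(c("High.Self.Monitors", "Low.Self.Monitors"), each = 48),
  Argument = rep(rep(c("Strong", "Weak"), each = 24), times = 2),
  Source   = rep(rep(c("Expert", "Attractive"), each = 12), times = 4)
)
table(fixed)
```

With a fully crossed layout like fixed, all eight cells are populated and every main effect and interaction in Monitor*Argument*Source becomes estimable.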
I wonder how to extract the Multivariate Tests: Site portion from the output of fm1 in the following MWE.
library(car)
fm1 <- summary(Anova(lm(cbind(Al, Fe, Mg, Ca, Na) ~ Site, data=Pottery)))
fm1
Type II MANOVA Tests:
Sum of squares and products for error:
Al Fe Mg Ca Na
Al 48.2881429 7.08007143 0.60801429 0.10647143 0.58895714
Fe 7.0800714 10.95084571 0.52705714 -0.15519429 0.06675857
Mg 0.6080143 0.52705714 15.42961143 0.43537714 0.02761571
Ca 0.1064714 -0.15519429 0.43537714 0.05148571 0.01007857
Na 0.5889571 0.06675857 0.02761571 0.01007857 0.19929286
------------------------------------------
Term: Site
Sum of squares and products for the hypothesis:
Al Fe Mg Ca Na
Al 175.610319 -149.295533 -130.809707 -5.8891637 -5.3722648
Fe -149.295533 134.221616 117.745035 4.8217866 5.3259491
Mg -130.809707 117.745035 103.350527 4.2091613 4.7105458
Ca -5.889164 4.821787 4.209161 0.2047027 0.1547830
Na -5.372265 5.325949 4.710546 0.1547830 0.2582456
Multivariate Tests: Site
Df test stat approx F num Df den Df Pr(>F)
Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05 ***
Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12 ***
Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 < 2.22e-16 ***
Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I also couldn't find how to extract the table of tests but as a workaround you can calculate the results by running the Anova command over all test types.
However, the print method, print.Anova.mlm, does not return the results, so it needs to be tweaked a little.
library(car)
# create new print function
outtests <- car:::print.Anova.mlm
# allow the function to return the results and disable print
body(outtests)[[16]] <- quote(invisible(tests))
body(outtests)[[15]] <- NULL
# Now run the regression
mod <- lm(cbind(Al, Fe, Mg, Ca, Na) ~ Site, data=Pottery)
# Run the Anova over all tests
tab <- lapply(c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
function(i) outtests(Anova(mod, test.statistic=i)))
tab <- do.call(rbind, tab)
row.names(tab) <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")
tab
# Type II MANOVA Tests: Pillai test statistic
# Df test stat approx F num Df den Df Pr(>F)
#Pillai 3 1.554 4.298 15 60.000 2.413e-05 ***
#Wilks 3 0.012 13.089 15 50.091 1.840e-12 ***
#Hotelling-Lawley 3 35.439 39.376 15 50.000 < 2.2e-16 ***
#Roy 3 34.161 136.644 5 20.000 9.444e-15 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As the output table is of class anova and data.frame you can use xtable on it.
xtable::xtable(tab)
fm1$multivariate.tests gets you to the Site portion of the fm1 output.
Then you could use a combination of cat and capture.output for nice printing, or just capture.output for a character vector.
> cat(capture.output(fm1$multivariate.tests)[18:26], sep = "\n")
#
# Multivariate Tests: Site
# Df test stat approx F num Df den Df Pr(>F)
# Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05 ***
# Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12 ***
# Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 < 2.22e-16 ***
# Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Update: From the result of
unlist(fm1$multivariate.tests, recursive = FALSE)
it doesn't look like the results are easily accessible as numeric values. So, as you requested, here is what it took to manipulate the results into a matrix. Having done this and then seen user20650's answer, I recommend you follow that suggestion and get the values via an ANOVA table.
co <- capture.output(fm1$multivariate.tests)[20:24]
s <- strsplit(gsub("([*]+$)|[<]", "", co[-1]), "\\s+")
dc <- do.call(rbind, lapply(s, function(x) as.numeric(x[-1])))
row.names(dc) <- sapply(s, "[", 1)
s2 <- strsplit(co[1], " ")[[1]]
s2 <- s2[nzchar(s2)]
s3 <- s2[-c(1, length(s2))]
colnames(dc) <- c(s2[1], paste(s3[c(TRUE, FALSE)], s3[c(FALSE, TRUE)]), s2[10])
dc
# Df test stat approx F num Df den Df Pr(>F)
# Pillai 3 1.55394 4.29839 15 60.00000 2.4129e-05
# Wilks 3 0.01230 13.08854 15 50.09147 1.8404e-12
# Hotelling-Lawley 3 35.43875 39.37639 15 50.00000 2.2200e-16
# Roy 3 34.16111 136.64446 5 20.00000 9.4435e-15
If anyone feels like improving my second code chunk, feel free.