Anova in R to fetch the significant codes - r

Code:
pred_model = anova(m1, m2, test="Chisq")
pred_model
Results:
Analysis of Variance Table
Model 1: male_birth ~ male_death + female_birth + female_death
Model 2: male_birth ~ male_death + female_birth
Res.Df RSS Df Sum of Sq Pr(>Chi)
1 48 3.4883
2 49 3.4951 -1 -0.0068189 0.7594
I am new to R, can anyone suggest how to fetch the significant codes for the model

The summary function will often return a matrix with a column of p-values. In this case, however, the result of anova is returned as a dataframe which has been further "class()-ed" as an "anova"-object (so that it can have its own print-method). Run the first example in ?lm and then:
> lm.D9 <- lm(weight ~ group)
> lm.0 <- lm(weight ~ 1)
> anova(lm.D9,lm.0)
Analysis of Variance Table
Model 1: weight ~ group
Model 2: weight ~ 1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18 8.7292
2 19 9.4175 -1 -0.68821 1.4191 0.249
> str( anova(lm.D9,lm.0) )
Classes ‘anova’ and 'data.frame': 2 obs. of 6 variables:
$ Res.Df : num 18 19
$ RSS : num 8.73 9.42
$ Df : num NA -1
$ Sum of Sq: num NA -0.688
$ F : num NA 1.42
$ Pr(>F) : num NA 0.249
- attr(*, "heading")= chr "Analysis of Variance Table\n" "Model 1: weight ~ group\nModel 2: weight ~ 1"
So you want the second value in the column named that is named" 'Pr(>F)'
anova(lm.D9,lm.0)$'Pr(>F)'[2]
[1] 0.2490232

Related

Error in MEEM(object, conLin, control$niterEM) in lme function

I'm trying to apply the lme function to my data, but the model gives follow message:
mod.1 = lme(lon ~ sex + month2 + bat + sex*month2, random=~1|id, method="ML", data = AA_patch_GLM, na.action=na.exclude)
Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
dput for data, copy from https://pastebin.com/tv3NvChR (too large to include here)
str(AA_patch_GLM)
'data.frame': 2005 obs. of 12 variables:
$ lon : num -25.3 -25.4 -25.4 -25.4 -25.4 ...
$ lat : num -51.9 -51.9 -52 -52 -52 ...
$ id : Factor w/ 12 levels "24641.05","24642.03",..: 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ bat : int -3442 -3364 -3462 -3216 -3216 -2643 -2812 -2307 -2131 -2131 ...
$ year : chr "2005" "2005" "2005" "2005" ...
$ month : chr "12" "12" "12" "12" ...
$ patch_id: Factor w/ 45 levels "111870.17_1",..: 34 34 34 34 34 34 34 34 34 34 ...
$ YMD : Date, format: "2005-12-30" "2005-12-31" "2005-12-31" ...
$ month2 : Ord.factor w/ 7 levels "January"<"February"<..: 7 7 7 7 7 1 1 1 1 1 ...
$ lonsc : num [1:2005, 1] -0.209 -0.213 -0.215 -0.219 -0.222 ...
$ batsc : num [1:2005, 1] 0.131 0.179 0.118 0.271 0.271 ...
What's the problem?
I saw a solution applying the lme4::lmer function, but there is another option to continue to use lme function?
The problem is that you have collinear combinations of predictors. In particular, here are some diagnostics:
## construct the fixed-effect model matrix for your problem
X <- model.matrix(~ sex + month2 + bat + sex*month2, data = AA_patch_GLM)
lc <- caret::findLinearCombos(X)
colnames(X)[lc$linearCombos[[1]]]
## [1] "sexM:month2^6" "(Intercept)" "sexM" "month2.L"
## [5] "month2.C" "month2^4" "month2^5" "month2^6"
## [9] "sexM:month2.L" "sexM:month2.C" "sexM:month2^4" "sexM:month2^5"
This is in a weird order, but it suggests that the sex × month interaction is causing problems. Indeed:
with(AA_patch_GLM, table(sex, month2))
## sex January February March April May June December
## F 367 276 317 204 43 0 6
## M 131 93 90 120 124 75 159
shows that you're missing data for one sex/month combination (i.e., no females were sampled in June).
You can:
construct the sex/month interaction yourself (data$SM <- with(data, interaction(sex, month2, drop = TRUE))) and use ~ SM + bat — but then you'll have to sort out main effects and interactions yourself (ugh)
construct the model matrix by hand (as above), drop the redundant column(s), then include all the resulting columns in the model:
d2 <- with(AA_patch_GLM,
data.frame(lon,
as.data.frame(X),
id))
## drop linearly dependent column
## note data.frame() has "sanitized" variable names (:, ^ both converted to .)
d2 <- d2[names(d2) != "sexM.month2.6"]
lme(reformulate(colnames(d2)[2:15], response = "lon"),
random=~1|id, method="ML", data = d2)
Again, the results will be uglier than the simpler version of the model.
use a patched version of nlme (I submitted a patch here but it hasn't been considered)
remotes::install_github("bbolker/nlme")

C-index for each treatment arm with variable-treatment interaction

I have difficulty calculating the C-index (UnoC with survAUC R package) for each treatment arm to assess the variable-treatment interaction.
I have a database with 4 explanatory variables X1, X2, X3, X4, as follows:
> str(data)
'data.frame': 1000 obs. of 7 variables:
$ X1 : num -0.578 0.351 0.759 -0.858 -1.022 ...
$ X2 : num -0.7897 0.0339 -1.608 -1.1642 -0.0787 ...
$ X3 : num -0.1561 -0.7147 -0.8229 -0.1519 -0.0318 ...
$ X4 : num 1.4161 -0.0688 -0.155 -0.1571 -0.649 ...
$ TRT : num 0 0 0 0 0 0 0 1 0 1 ...
$ time: num 6.52 2.15 3 1.31 1.56 ...
$ stat: num 1 1 1 1 1 1 1 1 1 1 ...
The variable X4 interacts with the treatment variable and I don't have censored data.
I would like to calculate the C-index (UnoC) for each treatment arm. I expect the C-index to be equal to 0.5 in the control arm and much higher in the experimental arm.
But, I get almost the same value for both arms!
Can anyone confirm that: if I have a strong interaction between a variable and the treatment, the C-index in the experimental arm is high and in the control arm = 0.5?
Here is my attempt:
TR <- data[1:500,]
TE <- data[501:1000,]
s <- Surv(TR$time, TR$stat)
sNew <- Surv(TE$time, TE$stat)
train.fit <- coxph(Surv(time, stat) ~ X4, data=TR)
lpnew <- predict(train.fit, newdata=TE)
# The C-index for each treatment arm
UnoC(Surv.rsp = s[TR$TRT == 1], Surv.rsp.new = sNew[TE$TRT == 1], lpnew = lpnew[TE$TRT == 1])
[1] 0.7577109
UnoC(Surv.rsp = s[TR$TRT == 0], Surv.rsp.new = sNew[TE$TRT == 0], lpnew = -lpnew[TE$TRT == 0])
[1] 0.7295202
Thank you for your Help

R: Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family")

I tried to use glm for estimate soccer teams strengths.
# data is dataframe (structure on bottom).
model <- glm(Goals ~ Home + Team + Opponent, family=poisson(link=log), data=data)
but get the error:
Error in if (any(y < 0)) stop("negative values not allowed for the 'Poisson' family") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(y, 0) : ‘<’ not meaningful for factors
data:
> data
Team Opponent Goals Home
1 5a51f2589d39c31899cce9d9 5a51f2579d39c31899cce9ce 3 1
2 5a51f2579d39c31899cce9ce 5a51f2589d39c31899cce9d9 0 0
3 5a51f2589d39c31899cce9da 5a51f2579d39c31899cce9cd 3 1
4 5a51f2579d39c31899cce9cd 5a51f2589d39c31899cce9da 0 0
> is.factor(data$Goals)
[1] TRUE
From the "details" section of documentation for glm() function:
A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response.
So you want to make sure your Goals column is numeric:
df <- data.frame( Team= c("5a51f2589d39c31899cce9d9", "5a51f2579d39c31899cce9ce", "5a51f2589d39c31899cce9da", "5a51f2579d39c31899cce9cd"),
Opponent=c("5a51f2579d39c31899cce9ce", "5a51f2589d39c31899cce9d9", "5a51f2579d39c31899cce9cd", "5a51f2589d39c31899cce9da "),
Goals=c(3,0,3,0),
Home=c(1,0,1,0))
str(df)
#'data.frame': 4 obs. of 4 variables:
# $ Team : Factor w/ 4 levels "5a51f2579d39c31899cce9cd",..: 3 2 4 1
# $ Opponent: Factor w/ 4 levels "5a51f2579d39c31899cce9cd",..: 2 3 1 4
# $ Goals : num 3 0 3 0
# $ Home : num 1 0 1 0
model <- glm(Goals ~ Home + Team + Opponent, family=poisson(link=log), data=df)
Then here is the output:
> model
Call: glm(formula = Goals ~ Home + Team + Opponent, family = poisson(link = log),
data = df)
Coefficients:
(Intercept) Home Team5a51f2579d39c31899cce9ce
-2.330e+01 2.440e+01 -3.089e-14
Team5a51f2589d39c31899cce9d9 Team5a51f2589d39c31899cce9da Opponent5a51f2579d39c31899cce9ce
-6.725e-15 NA NA
Opponent5a51f2589d39c31899cce9d9 Opponent5a51f2589d39c31899cce9da
NA NA
Degrees of Freedom: 3 Total (i.e. Null); 0 Residual
Null Deviance: 8.318
Residual Deviance: 3.033e-10 AIC: 13.98

How to get p-value from a summary anova test in R? [duplicate]

This question already has answers here:
Extract p-value from aov
(7 answers)
Closed 2 years ago.
I have this R code
As = rnorm(5, mean = 0, sd = 5)
Bs = rnorm(5, mean = 0, sd = 5)
Cs = rnorm(5, mean = 0, sd = 5)
dat = data.frame(factor = c("A","A","A","A","A","B","B","B","B","B",
"C","C","C","C","C"),
response = c(As, Bs, Cs))
summary(aov(response ~ factor, data = dat))
That return this result
> summary(aov(response ~ factor, data = dat))
Df Sum Sq Mean Sq F value Pr(>F)
factor 2 36.08 18.04 0.807 0.469
Residuals 12 268.22 22.
I would like to access to the value Pr(>F) that is 0.469. How can I do that?
If you take a look at the structrure of your result (see ?str function) you'll realize about how to use indexing operator ([) in order to get Pr(>F).
> str(result)
List of 1
$ :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables:
..$ Df : num [1:2] 2 12
..$ Sum Sq : num [1:2] 35.7 261.3
..$ Mean Sq: num [1:2] 17.9 21.8
..$ F value: num [1:2] 0.82 NA
..$ Pr(>F) : num [1:2] 0.464 NA
- attr(*, "class")= chr [1:2] "summary.aov" "listof"
You can do it this way
> result[[1]][["Pr(>F)"]][1]
[1] 0.4636054
or
> result[[1]][[5]][1]
[1] 0.4636054
s<-summary(aov(response ~ factor, data = dat))
unlist(s)['Pr(>F)1']
Pr(>F)1
0.1125695

How do I extract lmer fixed effects by observation?

I have a lme object, constructed from some repeated measures nutrient intake data (two 24-hour intake periods per RespondentID):
Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1). I would also like to collect the result of the fixed effects by RespondentID. coef(Male.lme1) does not provide exactly what I need, as I show below.
> summary(Male.lme1)
Linear mixed model fit by REML
Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID)
Data: Male.Data
AIC BIC logLik deviance REMLdev
9994 10039 -4990 9952 9980
Random effects:
Groups Name Variance Std.Dev.
RespondentID (Intercept) 0.19408 0.44055
Residual 0.37491 0.61230
Number of obs: 4498, groups: RespondentID, 2249
Fixed effects:
Estimate Std. Error t value
(Intercept) 13.98016 0.03405 410.6
AgeFactor4to8 0.50572 0.04084 12.4
AgeFactor9to13 0.94329 0.04159 22.7
AgeFactor14to18 1.30654 0.04312 30.3
IntakeDayDay2Intake -0.13871 0.01809 -7.7
Correlation of Fixed Effects:
(Intr) AgFc48 AgF913 AF1418
AgeFactr4t8 -0.775
AgeFctr9t13 -0.761 0.634
AgFctr14t18 -0.734 0.612 0.601
IntkDyDy2In -0.266 0.000 0.000 0.000
I have appended the fitted results to my data, head(Male.Data) shows
NutrientID RespondentID Gender Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY lmefits
2 267 100020 1 12 0.4952835 Day1Intake 12145.852 9to13 15.61196 15.22633
7 267 100419 1 14 0.3632839 Day1Intake 9591.953 14to18 15.01444 15.31373
8 267 100459 1 11 0.4952835 Day1Intake 7838.713 9to13 14.51458 15.00062
12 267 101138 1 15 1.3258785 Day1Intake 11113.266 14to18 15.38541 15.75337
14 267 101214 1 6 2.1198688 Day1Intake 7150.133 4to8 14.29022 14.32658
18 267 101389 1 5 2.1198688 Day1Intake 5091.528 4to8 13.47928 14.58117
The first couple of lines from coef(Male.lme1) are:
$RespondentID
(Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake
100020 14.28304 0.5057221 0.9432941 1.306542 -0.1387098
100419 14.00719 0.5057221 0.9432941 1.306542 -0.1387098
100459 14.05732 0.5057221 0.9432941 1.306542 -0.1387098
101138 14.44682 0.5057221 0.9432941 1.306542 -0.1387098
101214 13.82086 0.5057221 0.9432941 1.306542 -0.1387098
101389 14.07545 0.5057221 0.9432941 1.306542 -0.1387098
To demonstrate how the coef results relate to the fitted estimates in Male.Data (which were grabbed using Male.Data$lmefits <- fitted(Male.lme1), for the first RespondentID, who has the AgeFactor level 9-13:
- the fitted value is 15.22633, which equals - from the coeffs - (Intercept) + (AgeFactor9-13) = 14.28304 + 0.9432941
Is there a clever command for me to use that will do want I want automatically, which is to extract the fixed effect estimate for each subject, or am I faced with a series of if statements trying to apply the correct AgeFactor level to each subject to get the correct fixed effect estimate, after deducting the random effect contribution off the Intercept?
Update, apologies, was trying to cut down on the output I was providing and forgot about str(). Output is:
>str(Male.Data)
'data.frame': 4498 obs. of 11 variables:
$ NutrientID : int 267 267 267 267 267 267 267 267 267 267 ...
$ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Gender : int 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ BodyWeight : num 51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
$ lmefits : num 15.2 15.3 15 15.8 14.3 ...
The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.
I have been doing horrible ifelse statements sinced I posted, so will try out your suggestion immediately. :)
Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)
AgeLevels <- length(unique(Male.Data$AgeFactor))
Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] +
c(0,fixef(Male.lme1)[2:AgeLevels])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18", "19to30","31to50","51to70","71Plus") )] +
c(0,fixef(Male.lme1)[(AgeLevels+1)])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )])
names(Temp) <- c("FxdEffct")
Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:
y = Xb + Zu + e
where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
# Xb
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu
ran <- t(as.matrix(getME(fm1,'Zt'))) %*% unlist(ranef(fm1))
# Xb + Zu
fixran <- fix + ran
I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:
> head(cbind(fix, ran, fixran, fitted(fm1)))
[,1] [,2] [,3] [,4]
[1,] 251.4051 2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950
# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE
# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE
nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
Reaction ~ Days | Subject, data = sleepstudy,
panel = function(x, y, ...){
panel.points(x, y, type='b', col='blue')
panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
},
key = legend
)
It is going to be something like this (although you really should have given us the results of str(Male.Data) because model output does not tell us the factor levels for the baseline values:)
#First look at the coefficients
fixef(Male.lme2)
#Then do the calculations
fixef(Male.lme2)[`(Intercept)`] +
c(0,fixef(Male.lme2)[2:4])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] +
c(0,fixef(Male.lme2)[5])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]
You are basically running the original data through a match function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)
EDIT: I just noticed that you put a "-1" in the formula so perhaps all of your AgeFactor terms are listed in the output and you can tale out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.

Resources