By default, summary() of an lm fit tests whether each coefficient equals zero. My question is very basic: I want to know how to test whether the slope coefficient equals some non-zero value. One approach could be to use confint(), but this does not provide a p-value. I also wonder how to do a one-sided test with lm.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
summary(lm.D9)
Call:
lm(formula = weight ~ group)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4938 0.0685 0.2462 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
groupTrt -0.3710 0.3114 -1.191 0.249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
confint(lm.D9)
2.5 % 97.5 %
(Intercept) 4.56934 5.4946602
groupTrt -1.02530 0.2833003
Thanks for your time and effort.
As @power says, you can do it by hand.
Here is an example:
> est <- summary.lm(lm.D9)$coef[2, 1]
> se <- summary.lm(lm.D9)$coef[2, 2]
> df <- summary.lm(lm.D9)$df[2]
>
> m <- 0
> 2 * pt(-abs((est-m)/se), df)
[1] 0.2490232
>
> m <- 0.2
> 2 * pt(-abs((est-m)/se), df)
[1] 0.08332659
A one-sided test uses just one tail of the t distribution rather than doubling it; see the update below.
UPDATES
Here is an example of the two-sided and one-sided probabilities:
> m <- 0.2
>
> # two-sided probability
> 2 * pt(-abs((est-m)/se), df)
[1] 0.08332659
>
> # one-sided, upper tail (Ha: coefficient greater than 0.2)
> pt((est-m)/se, df, lower.tail = FALSE)
[1] 0.9583367
>
> # one-sided, lower tail (Ha: coefficient less than 0.2)
> pt((est-m)/se, df, lower.tail = TRUE)
[1] 0.0416633
Note that the upper and lower one-sided probabilities sum exactly to 1.
Use the linearHypothesis function from the car package. For instance, you can check whether the coefficient of groupTrt equals -1 using:
linearHypothesis(lm.D9, "groupTrt = -1")
Linear hypothesis test
Hypothesis:
groupTrt = - 1
Model 1: restricted model
Model 2: weight ~ group
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 10.7075
2 18 8.7292 1 1.9782 4.0791 0.05856 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The smatr package has a slope.test() function, which tests a fitted slope against a specified value and can use OLS.
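For example, a rough sketch with made-up data (I have not verified the exact argument names, so check ?slope.test before relying on this; test.value and method are the arguments I am assuming exist):
library(smatr)
set.seed(1)
x <- rnorm(50)
y <- 0.8 * x + rnorm(50)
slope.test(y, x, test.value = 1, method = "OLS")   # H0: OLS slope = 1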
In addition to all the other good answers, you could use an offset. It's a little trickier with categorical predictors, because you need to know the coding.
lm(weight~group+offset(1*(group=="Trt")))
The 1 * here is unnecessary but is put in to emphasize that you are testing against the hypothesis that the difference is 1 (if you want to test against a hypothesis of a difference of d, then use d * (group == "Trt")).
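For the data above, a small sketch testing H0: difference = -1 via an offset (lm.D9.off is just my name for the refit); the groupTrt p-value from this fit should match the linearHypothesis result shown earlier (about 0.059):
lm.D9.off <- lm(weight ~ group + offset(-1 * (group == "Trt")))
summary(lm.D9.off)$coefficients["groupTrt", ]   # tests whether the difference equals -1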
You can use t.test to do this for your data. The mu parameter sets the hypothesized value for the difference of group means (note that with the formula interface the difference is computed as Ctl minus Trt, the opposite sign of the groupTrt coefficient in lm). The alternative parameter lets you choose between one- and two-sided tests.
t.test(weight~group,var.equal=TRUE)
Two Sample t-test
data: weight by group
t = 1.1913, df = 18, p-value = 0.249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
t.test(weight~group,var.equal=TRUE,mu=-1)
Two Sample t-test
data: weight by group
t = 4.4022, df = 18, p-value = 0.0003438
alternative hypothesis: true difference in means is not equal to -1
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
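For the one-sided versions, set alternative (remember that the difference here is Ctl minus Trt):
t.test(weight ~ group, var.equal = TRUE, mu = -1, alternative = "greater")   # Ha: mean(Ctl) - mean(Trt) > -1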
Code up your own test. You know the estimated coefficient and you know the standard error, so you can construct your own test statistic.
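A minimal sketch of what that could look like (coef_test is just an illustrative name, not an existing function); it reproduces the hand calculation from the first answer:
coef_test <- function(fit, j, m = 0, alternative = c("two.sided", "less", "greater")) {
  # Wald t-test of H0: beta_j = m for a fitted lm
  alternative <- match.arg(alternative)
  est  <- coef(summary(fit))[j, "Estimate"]
  se   <- coef(summary(fit))[j, "Std. Error"]
  tval <- (est - m) / se
  df   <- df.residual(fit)
  switch(alternative,
         two.sided = 2 * pt(-abs(tval), df),
         less      = pt(tval, df),                       # Ha: beta_j < m
         greater   = pt(tval, df, lower.tail = FALSE))   # Ha: beta_j > m
}
coef_test(lm.D9, "groupTrt", m = 0.2)   # ~0.083, as in the hand calculation above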
Related
For Y = % of population with income below poverty level and X = per capita income of the population, I have constructed a Box-Cox plot and found that lambda = 0.02020:
bc <- boxcox(lm(Percent_below_poverty_level ~ Per_capita_income, data=tidy.CDI), plotit=T)
bc$x[which.max(bc$y)] # gives lambda
Now I want to fit a simple linear regression using the transformed data, so I've entered this code
transform <- lm((Percent_below_poverty_level**0.02020) ~ (Per_capita_income**0.02020))
transform
But all I get is the error message
'Error in terms.formula(formula, data = data) : invalid power in formula'. What is my mistake?
You could use bcPower() from the car package.
## make sure you do install.packages("car") if you haven't already
library(car)
data(Prestige)
p <- powerTransform(prestige ~ income + education + type ,
data=Prestige,
family="bcPower")
summary(p)
# bcPower Transformation to Normality
# Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
# Y1 1.3052 1 0.9408 1.6696
#
# Likelihood ratio test that transformation parameter is equal to 0
# (log transformation)
# LRT df pval
# LR test, lambda = (0) 41.67724 1 1.0765e-10
#
# Likelihood ratio test that no transformation is needed
# LRT df pval
# LR test, lambda = (1) 2.623915 1 0.10526
mod <- lm(bcPower(prestige, 1.3052) ~ income + education + type, data=Prestige)
summary(mod)
#
# Call:
# lm(formula = bcPower(prestige, 1.3052) ~ income + education +
# type, data = Prestige)
#
# Residuals:
# Min 1Q Median 3Q Max
# -44.843 -13.102 0.287 15.073 62.889
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -3.736e+01 1.639e+01 -2.279 0.0250 *
# income 3.363e-03 6.928e-04 4.854 4.87e-06 ***
# education 1.205e+01 2.009e+00 5.999 3.78e-08 ***
# typeprof 2.027e+01 1.213e+01 1.672 0.0979 .
# typewc -1.078e+01 7.884e+00 -1.368 0.1746
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 22.25 on 93 degrees of freedom
# (4 observations deleted due to missingness)
# Multiple R-squared: 0.8492, Adjusted R-squared: 0.8427
# F-statistic: 131 on 4 and 93 DF, p-value: < 2.2e-16
Powers (more often represented by ^ than ** in R, FWIW) have a special meaning inside formulas [they represent interactions among variables rather than mathematical operations]. So if you did want to power-transform both sides of your equation you would use the I() or "as-is" operator:
I(Percent_below_poverty_level^0.02020) ~ I(Per_capita_income^0.02020)
However, I think you should do what @DaveArmstrong suggested anyway, for two reasons (a sketch combining them is below):
it's only the response (left-hand side) variable that gets transformed, not the predictor
the Box-Cox transformation is actually (y^lambda - 1)/lambda (although the shift and scale might not matter for your results)
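Putting those two points together, a minimal sketch of the corrected call for your variables (tidy.CDI and the column names come from your question; 0.0202 is the lambda you reported):
lambda <- 0.0202
# transform only the response; I() makes ^ arithmetic instead of a formula operator
fit.bc <- lm(I((Percent_below_poverty_level^lambda - 1) / lambda) ~ Per_capita_income,
             data = tidy.CDI)
summary(fit.bc)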
I have a dataset plink.raw and I am testing whether a marker CN00020133 (which has three levels: 0, 1, 2) is associated with phenotype5. I want to compare 0 vs 1 and 2 vs 1 using a GLM or Fisher's exact test:
table(plink.raw$phenotype5,plink.raw$CN00020133)
1 0 2
0 3559 0 7
1 14806 54 123
Tested using a GLM, I can see the p-value for 0 vs 1 is 0.912894.
plink.raw$CN00020133 <- factor(plink.raw$CN00020133, levels=c("1","0","2"))
univariate=glm(phenotype5 ~ relevel(CN00020133,ref ="1"), family = binomial, data = plink.raw)
summary(univariate)
Call:
glm(formula = phenotype5 ~ relevel(CN00020133, ref = "1"),
family = binomial, data = plink.raw)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4173 0.6564 0.6564 0.6564 0.6564
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.42555 0.01867 76.361 < 2e-16 ***
relevel(CN00020133, ref = "1")0 13.14051 120.12616 0.109 0.912894
relevel(CN00020133, ref = "1")2 1.44072 0.38902 3.703 0.000213 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 18158 on 18548 degrees of freedom
Residual deviance: 18114 on 18546 degrees of freedom
AIC: 18120
Number of Fisher Scoring iterations: 13
But if I test it using Fisher's exact test, the p-value for 0 vs 1 is 4.618e-06.
Convictions <- matrix(c(0, 54, 3559, 14806), nrow = 2,dimnames = list(c("control", "case"),c("del/dup", "normal_copy")))
fisher.test(Convictions, alternative = "less")
Fisher's Exact Test for Count Data
data: Convictions
p-value = 9.048e-06
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.0000000 0.2374411
sample estimates:
odds ratio 0
You have a case of complete separation, which is where the Hauck-Donner effect occurs. If CN00020133 == 0, all of the observations have phenotype 1, and this makes it hard to estimate the standard error of the coefficient. There's a fair amount of material on it, for example Alexej's blog, a post by Brian Ripley, and Ben Bolker's notes.
If you need to test for significance of the effect of "1", one solution is to use the likelihood ratio test:
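# rebuild the 0 vs 1 comparison directly from the counts in the contingency table above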
df = rbind(
data.frame(phenotype=rep(0:1,c(3559,14806)),CN00020133="1"),
data.frame(phenotype=rep(0:1,c(0,54)),CN00020133="0")
)
anova(glm(phenotype ~ CN00020133,data=df,family=binomial),test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: phenotype
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 18418 18082
CN00020133 1 23.227 18417 18059 1.44e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This gives you a different p-value from the Fisher test because it is based on a different distribution (binomial vs. hypergeometric), but either way you can conclude there is an added effect of "1" using "0" as reference.
There is an implementation of Firth's logistic regression in R that you can try, though I must say I am not very familiar with it:
library("logistf")
logistf(phenotype ~ CN00020133, data = df)
I am analysing whether the effects of x_t on y_t differ during and after a specific time period.
I am trying to regress the following model in R using lm():
y_t = b_0 + [b_1(1-D_t) + b_2 D_t]x_t
where D_t is a dummy variable with the value 1 over the time period and 0 otherwise.
Is it possible to use lm() for this formula?
observationNumber <- 1:80
obsFactor <- cut(observationNumber, breaks = c(0,55,81), right =F)
fit <- lm(y ~ x * obsFactor)
For example:
y = runif(80)
x = rnorm(80) + c(rep(0,54), rep(1, 26))
fit <- lm(y ~ x * obsFactor)
summary(fit)
Call:
lm(formula = y ~ x * obsFactor)
Residuals:
Min 1Q Median 3Q Max
-0.48375 -0.29655 0.05957 0.22797 0.49617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.50959 0.04253 11.983 <2e-16 ***
x -0.02492 0.04194 -0.594 0.554
obsFactor[55,81) -0.06357 0.09593 -0.663 0.510
x:obsFactor[55,81) 0.07120 0.07371 0.966 0.337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3116 on 76 degrees of freedom
Multiple R-squared: 0.01303, Adjusted R-squared: -0.02593
F-statistic: 0.3345 on 3 and 76 DF, p-value: 0.8004
obsFactor[55,81) is the dummy $D_t$: it is zero if observationNumber < 55 and one if it is greater than or equal to 55. In this parameterisation the intercept is your $b_0$ and the coefficient of x is your $b_1$; x:obsFactor[55,81) is the product of the dummy and $x_t$, and its coefficient is the change in slope $b_2 - b_1$, so $b_2$ is the sum of the x and x:obsFactor[55,81) coefficients.
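If you would rather have $b_1$ and $b_2$ reported directly instead of recovering $b_2$ as a sum, a sketch of the equivalent fit (D is just the dummy written out by hand; fit2 is my name):
D <- as.numeric(observationNumber >= 55)
fit2 <- lm(y ~ I(x * (1 - D)) + I(x * D))   # y = b0 + b1*(1-D)*x + b2*D*x
summary(fit2)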
I'm trying to estimate the parameters of a zero-inflated Conway-Maxwell-Poisson mixed model. I don't understand why the glmmTMB function gives roughly half the true value for the non-zero (conditional) part while giving good estimates for the zero part and the dispersion part.
E.g. the actual value of the intercept is 2.5 and I'm getting 1.21; for sexfemale the actual value is 1.2 and I'm getting 0.548342.
Please help me out with this situation. Thank you.
#--------Simulation from ZICOMP mix lambda---------
library(COMPoissonReg)
library(glmmTMB)
set.seed(123)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF_CMP <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF_CMP)
Z <- model.matrix(~ 1, data = DF_CMP)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF_CMP)
betas <- c(2.5 , 1.2 , 2.3, -1.5) # fixed effects coefficients non-zero part
shape <- 2
gammas <- c(-1.5, 0.9) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
# we simulate random effects
b <- rnorm(n, sd = sqrt(D11))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF_CMP$id,drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas)
DF_CMP$CMP_y <- rzicmp(n * K, lambda = exp(eta_y), nu = shape, p = plogis(eta_zi))
hist(DF_CMP$CMP_y)
#------ estimation -------------
CMPzicmpm0 = glmmTMB(CMP_y~ sex*time + (1|id) , zi= ~ sex, data = DF_CMP, family=compois)
summary(CMPzicmpm0)
> summary(CMPzicmpm0)
Family: compois ( log )
Formula: CMP_y ~ sex * time + (1 | id)
Zero inflation: ~sex
Data: DF_CMP
AIC BIC logLik deviance df.resid
4586.2 4623.7 -2285.1 4570.2 792
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
id (Intercept) 0.1328 0.3644
Number of obs: 800, groups: id, 100
Overdispersion parameter for compois family (): 0.557
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.217269 0.054297 22.42 < 2e-16 ***
sexfemale 0.548342 0.079830 6.87 6.47e-12 ***
time 1.151549 0.004384 262.70 < 2e-16 ***
sexfemale:time -0.735348 0.009247 -79.52 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.6291 0.1373 -11.866 < 2e-16 ***
sexfemale 0.9977 0.1729 5.771 7.89e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am performing an ANCOVA to test the relationship between body size (covariate, logLCC) and different head measures (response variable, logLP) in each sex (categorical variable, sexo).
I obtained the slopes for each sex from the lm and I would like to compare them to 1. More specifically, I would like to know whether the slopes are significantly greater than, less than, or equal to 1, as each case would have a different biological meaning for the allometric relationships.
Here is my code:
#Modelling my lm#
> lm.logLP.sexo.adu<-lm(logLP~logLCC*sexo, data=ADU)
> anova(lm.logLP.sexo.adu)
Analysis of Variance Table
Response: logLP
Df Sum Sq Mean Sq F value Pr(>F)
logLCC 1 3.8727 3.8727 3407.208 < 2.2e-16 ***
sexo 1 0.6926 0.6926 609.386 < 2.2e-16 ***
logLCC:sexo 1 0.0396 0.0396 34.829 7.563e-09 ***
Residuals 409 0.4649 0.0011
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Obtaining slopes#
> lm.logLP.sexo.adu$coefficients
(Intercept) logLCC sexoM logLCC:sexoM
-0.1008891 0.6725818 -1.0058962 0.2633595
> lm.logLP.sexo.adu1<-lstrends(lm.logLP.sexo.adu,"sexo",var="logLCC")
> lm.logLP.sexo.adu1
sexo logLCC.trend SE df lower.CL upper.CL
H 0.6725818 0.03020017 409 0.6132149 0.7319487
M 0.9359413 0.03285353 409 0.8713585 1.0005241
Confidence level used: 0.95
#Comparing slopes#
> pairs(lm.logLP.sexo.adu1)
contrast estimate SE df t.ratio p.value
H - M -0.2633595 0.04462515 409 -5.902 <.0001
#Checking whether the slopes are different than 1#
#Computes Summary with statistics
> s1<-summary(lm.logLP.sexo.adu)
> s1
Call:
lm(formula = logLP ~ logLCC * sexo, data = ADU)
Residuals:
Min 1Q Median 3Q Max
-0.13728 -0.02202 -0.00109 0.01880 0.12468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10089 0.12497 -0.807 0.42
logLCC 0.67258 0.03020 22.271 < 2e-16 ***
sexoM -1.00590 0.18700 -5.379 1.26e-07 ***
logLCC:sexoM 0.26336 0.04463 5.902 7.56e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03371 on 409 degrees of freedom
Multiple R-squared: 0.9083, Adjusted R-squared: 0.9076
F-statistic: 1350 on 3 and 409 DF, p-value: < 2.2e-16
#Computes the t statistic for H0: slope = 1. The estimated coefficients and their standard errors are in s1$coefficients
> t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
#Calculates the two-tailed probability
> pval<- 2 * pt(abs(t1), df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
> print(pval)
[1] 3.037231e-24
I saw this whole process in several threads here, but all I can conclude from it is that my slopes are simply different from 1.
How could I check that they are greater or smaller than 1?
EDITED
Solved!
#performs one-sided test of H0: slope >= 1 (Ha: slope < 1)
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
#performs one-sided test of H0: slope <= 1 (Ha: slope > 1)
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = TRUE)
Also, tests should be performed in single-sex models.
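Alternatively, if lstrends comes from the lsmeans/emmeans family, its test() method should let you test both slopes against 1 at once, including one-sided versions; something like this (the null and side argument names are from memory, so check ?test before relying on it):
test(lm.logLP.sexo.adu1, null = 1)               # two-sided H0: slope = 1 for each sex
test(lm.logLP.sexo.adu1, null = 1, side = "<")   # Ha: slope < 1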
How could I check that they are greater or smaller than 1?
As in this post, this post, and your own question, you can perform a Wald test, which you compute as
t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
Alternatively, use the vcov and coef functions to make the code more readable (the slope is the second coefficient, and the standard error is the square root of the corresponding diagonal element of the covariance matrix):
fit <- lm.logLP.sexo.adu
t1 <- (1 - coef(fit)[2]) / sqrt(vcov(fit)[2, 2])
The Wald test gives you a t statistic that can be used for either a two-sided or a one-sided test. Thus, you can drop the abs() and set the lower.tail argument according to which tail you want to test in.
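For example, using t1 and fit from above:
pt(t1, df.residual(fit), lower.tail = FALSE)   # Ha: slope < 1
pt(t1, df.residual(fit), lower.tail = TRUE)    # Ha: slope > 1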