I am performing an ANCOVA so as to test what is the relationship between body size (covariate, logLCC) and different head measures (response variable, logLP) in each sex (cathegorical variable, sexo).
I got the slopes for each sex in the lm and I would like to compare them to 1. More specifically, I would like to know if the slopes are significantly higher or less than 1, or if they are equal to 1, as this would have different biological meanings in their allometric relationships.
Here is my code:
#Modelling my lm#
> lm.logLP.sexo.adu<-lm(logLP~logLCC*sexo, data=ADU)
> anova(lm.logLP.sexo.adu)
Analysis of Variance Table
Response: logLP
Df Sum Sq Mean Sq F value Pr(>F)
logLCC 1 3.8727 3.8727 3407.208 < 2.2e-16 ***
sexo 1 0.6926 0.6926 609.386 < 2.2e-16 ***
logLCC:sexo 1 0.0396 0.0396 34.829 7.563e-09 ***
Residuals 409 0.4649 0.0011
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Obtaining slopes#
> lm.logLP.sexo.adu$coefficients
(Intercept) logLCC sexoM logLCC:sexoM
-0.1008891 0.6725818 -1.0058962 0.2633595
> lm.logLP.sexo.adu1<-lstrends(lm.logLP.sexo.adu,"sexo",var="logLCC")
> lm.logLP.sexo.adu1
sexo logLCC.trend SE df lower.CL upper.CL
H 0.6725818 0.03020017 409 0.6132149 0.7319487
M 0.9359413 0.03285353 409 0.8713585 1.0005241
Confidence level used: 0.95
#Comparing slopes#
> pairs(lm.logLP.sexo.adu1)
contrast estimate SE df t.ratio p.value
H - M -0.2633595 0.04462515 409 -5.902 <.0001
#Checking whether the slopes are different than 1#
#Computes Summary with statistics
> s1<-summary(lm.logLP.sexo.adu)
> s1
Call:
lm(formula = logLP ~ logLCC * sexo, data = ADU)
Residuals:
Min 1Q Median 3Q Max
-0.13728 -0.02202 -0.00109 0.01880 0.12468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10089 0.12497 -0.807 0.42
logLCC 0.67258 0.03020 22.271 < 2e-16 ***
sexoM -1.00590 0.18700 -5.379 1.26e-07 ***
logLCC:sexoM 0.26336 0.04463 5.902 7.56e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03371 on 409 degrees of freedom
Multiple R-squared: 0.9083, Adjusted R-squared: 0.9076
F-statistic: 1350 on 3 and 409 DF, p-value: < 2.2e-16
#Computes t-student H0: intercept=1. The estimation of coefficients and their s.d. are in s1$coefficients
> t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
#Calculates two tailed probability
> pval<- 2 * pt(abs(t1), df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
> print(pval)
[1] 3.037231e-24
I saw this whole process in several threads here. But all that I can understand is that my slopes are just different from 1.
How could I check that they are greater or smaller than 1?
EDITED
Solved!
#performs one-side test H0=slope bigger than 1
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
#performs one-side test H0=slope smaller than 1
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = TRUE)
Also, tests should be performed in single-sex models.
How could I check that they are greater or smaller than 1?
As in this post, this post, and as your in question, you can make Wald test which you compute by
t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
Alternatively, use the vcov and coef function to make the code more readable
fit <- lm.logLP.sexo.adu
t1<-(1-coef(fit)[1])/vcov(fit)[1, 1]
The Wald test gives you t-statistics which can be used to make both a two-sided or one-sided test. Thus, you can drop the abs and set the lower.tail argument according to which tail you want to test in.
Related
I am working on adjusting urine sodium by urine creatinine and age in order to use the adjusted variable in further analysis.
How do I create a new variable with the adjusted version of the data?? Do I divide NA24 by creatinine and age? Do I multiply them? Please help.
I ran a linear model as follows, but not sure what to do with the information:
Call:
lm(formula = PRENA24 ~ PRECR24mmol * PREALD, data = c1.3)
Residuals:
Min 1Q Median 3Q Max
-228.439 -43.024 -5.215 37.790 274.414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.84482 29.60684 2.258 0.02423 *
PRECR24mmol 7.00565 2.10989 3.320 0.00094 ***
PREALD -0.66555 0.60912 -1.093 0.27488
PRECR24mmol:PREALD 0.06335 0.04392 1.442 0.14962
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 65.94 on 798 degrees of freedom
Multiple R-squared: 0.2963, Adjusted R-squared: 0.2937
F-statistic: 112 on 3 and 798 DF, p-value: < 2.2e-16
I need to adjust the PRENA24 value and I want to make a new column with these values (i.e. PRENA24.ADJ).
I know the following is incorrect, but I am not sure what else to do with the information from the linear model. The post lab data is separated by treatment type as well.
c1 <- c1.3 %>%
mutate(PRENA24.ADJ = (PRENA24-66.84482+(7.00565*PRECR24mmol)+(-0.66555*PREALD)))
c2 <- c1 %>%
mutate(NA24.ADJ = (NA24-24.59443+(10.54905*CR24mmol)+(0.58894*ALD)))
data("hprice2")
reg1 <- lm(price ~ rooms + crime + nox, hprice2)
summary(reg1)
Call:
lm(formula = price ~ rooms + crime + nox, data = hprice2)
Residuals:
Min 1Q Median 3Q Max
-18311 -3218 -772 2418 39164
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19371.47 3250.94 -5.959 4.79e-09 ***
rooms 7933.18 407.87 19.450 < 2e-16 ***
crime -199.70 35.05 -5.697 2.08e-08 ***
nox -1306.06 266.14 -4.907 1.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6103 on 502 degrees of freedom
Multiple R-squared: 0.5634, Adjusted R-squared: 0.5608
F-statistic: 215.9 on 3 and 502 DF, p-value: < 2.2e-16
Question 1.
Run two alternative (two-sided) t-tests for: H0: B1 = 8000
predict(reg1, data.frame(rooms=8000, crime = -199.70, nox = -1306.06), interval = .99)
Report your t-statistic and whether you reject or fail to reject the null at 90, 95, and/or 99 percent confidence levels.
I suppose by beta1 you mean rooms in this case. Your t.test in the summary is tested against beta0 = 0, you can see from wiki:
so using the example of nox:
tstat = (-1306.06 - 0)/266.14
[2] -4.907417
And p.value is
2*pt(-abs(tstat),502)
[2] 1.251945e-06
the null hypothesis in your case will be 8000 and you test rooms = 8000:
tstat = (7933.18 - 8000)/407.87
2*pt(-abs(tstat),502)
You can also use linearHypothesis from cars to do the above:
library(car)
linearHypothesis(reg1, c("rooms = 8000"))
I have a dateset plink.raw and I am testing if a maker CN00020133 (has three level 0, 1, 2) is associated with phenotype5. I want to compare 0 vs 1 and 2 vs 1 using GLM or fisher extract test:
table(plink.raw$phenotype5,plink.raw$CN00020133)
1 0 2
0 3559 0 7
1 14806 54 123
tested using GLM, I can see the p value for 0 vs 1 is 0.912894.
plink.raw$CN00020133 <- factor(plink.raw$CN00020133, levels=c("1","0","2"))
univariate=glm(phenotype5 ~ relevel(CN00020133,ref ="1"), family = binomial, data = plink.raw)
summary(univariate)
Call:
glm(formula = phenotype5 ~ relevel(CN00020133, ref = "1"),
family = binomial, data = plink.raw)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.4173 0.6564 0.6564 0.6564 0.6564
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.42555 0.01867 76.361 < 2e-16 ***
relevel(CN00020133, ref = "1")0 13.14051 120.12616 0.109 0.912894
relevel(CN00020133, ref = "1")2 1.44072 0.38902 3.703 0.000213 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 18158 on 18548 degrees of freedom
Residual deviance: 18114 on 18546 degrees of freedom
AIC: 18120
Number of Fisher Scoring iterations: 13
But if I tested it using fisher exact test, p value for 0 vs 1 is 4.618e-06.
Convictions <- matrix(c(0, 54, 3559, 14806), nrow = 2,dimnames = list(c("control", "case"),c("del/dup", "normal_copy")))
fisher.test(Convictions, alternative = "less")
Fisher's Exact Test for Count Data
data: Convictions
p-value = 9.048e-06
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
0.0000000 0.2374411
sample estimates:
odds ratio 0
You have a case of complete separation, also known as The Hauck-Donner effect occurs. If you CN00020133==0, all of them have 1 as phenotype, and this makes it hard to estimate the standard error of the coefficient. There's a fair amount of material on it, for example Alexej's blog, post by Brian Ripley, Ben Bolker's notes.
If you need to test for significance of the effect of "1", one solution is to use the likelihood ratio test:
df = rbind(
data.frame(phenotype=rep(0:1,c(3559,14806)),CN00020133="1"),
data.frame(phenotype=rep(0:1,c(0,54)),CN00020133="0")
)
anova(glm(phenotype ~ CN00020133,data=df,family=binomial),test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: phenotype
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 18418 18082
CN00020133 1 23.227 18417 18059 1.44e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This gives you different p-value compared to fisher test because it's a different distribution (binomial vs hyper-geometric). But more or less you can conclude there is an added effect of "1" using "0" as reference.
There is a implementation of Firth's logistic regression in R, you can try and see but I must say I am not very familiar with this:
library("logistf")
logistf(phenotype ~ CN00020133,data=df,family=binomial)
I'm taking some models that were built in EViews and putting them into R. I'm having trouble replicating EViews' MA function.
I tried using the lag of the regressions residuals, but this isn't quite the same. I've seen some mentions that this is an ARIMA regression.. Is there no way to replicate MA from EViews in an lm regression?
For example in R:
set.seed(2)
a = data.frame(a = 1:6,
b = runif(6, 0.0, 1.0),
c = runif(6, 0.0, 1.0))
fit_C = lm(c ~ a + b, data = a)
a$C.pred = predict.lm(fit_C, a)
a$C.resid = a$c - a$C.pred
fit_C = lm(c ~ a + b + lag(C.resid, 1), data = a)
summary(fit_C)
Outputs:
Call:
lm(formula = c ~ a + b + lag(C.resid, 1), data = a)
Residuals:
1 2 3 4 5 6
-1.779e-17 -1.131e-17 5.474e-17 -5.218e-18 -1.959e-17 -8.320e-19
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.327e-01 4.279e-17 1.011e+16 <2e-16 ***
a -3.998e-02 1.353e-17 -2.954e+15 <2e-16 ***
b 2.889e-01 7.278e-17 3.969e+15 <2e-16 ***
lag(C.resid, 1) 1.000e+00 8.241e-17 1.213e+16 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.389e-17 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 5.444e+31 on 3 and 2 DF, p-value: < 2.2e-16
Where in EViews:
Dependent Variable: C01
Method: Least Squares
Date: 09/18/18 Time: 10:24
Sample: 1 6
Included observations: 6
Convergence achieved after 9 iterations
MA Backcast: 0
Variable Coefficient Std. Error t-Statistic Prob.
C 0.892941 0.147320 6.061254 0.0262
A -0.101365 0.041651 -2.433684 0.1354
B 0.063370 0.257874 0.245740 0.8288
MA(1) -0.982901 0.058536 -16.79134 0.0035
R-squared 0.933603 Mean dependent var 0.462030
Adjusted R-squared 0.834008 S.D. dependent var 0.250812
S.E. of regression 0.102186 Akaike info criterion -1.489321
Sum squared resid 0.020884 Schwarz criterion -1.628148
Log likelihood 8.467963 Hannan-Quinn criter. -2.045057
F-statistic 9.373951 Durbin-Watson stat 2.907407
Prob(F-statistic) 0.097923
Inverted MA Roots .98
How do I replicate the MA(1) variable in R?
By default lm summary test slope coefficient equal to zero. My question is very basic. I want to know how to test slope coefficient equal to non-zero value. One approach could be to use confint but this does not provide p-value. I also wonder how to do one-sided test with lm.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
summary(lm.D9)
Call:
lm(formula = weight ~ group)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4938 0.0685 0.2462 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
groupTrt -0.3710 0.3114 -1.191 0.249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
confint(lm.D9)
2.5 % 97.5 %
(Intercept) 4.56934 5.4946602
groupTrt -1.02530 0.2833003
Thanks for your time and effort.
as #power says, you can do by your hand.
here is an example:
> est <- summary.lm(lm.D9)$coef[2, 1]
> se <- summary.lm(lm.D9)$coef[2, 2]
> df <- summary.lm(lm.D9)$df[2]
>
> m <- 0
> 2 * abs(pt((est-m)/se, df))
[1] 0.2490232
>
> m <- 0.2
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
and you can do one-side test by omitting 2*.
UPDATES
here is an example of two-side and one-side probability:
> m <- 0.2
>
> # two-side probability
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
>
> # one-side, upper (i.e., greater than 0.2)
> pt((est-m)/se, df, lower.tail = FALSE)
[1] 0.9583367
>
> # one-side, lower (i.e., less than 0.2)
> pt((est-m)/se, df, lower.tail = TRUE)
[1] 0.0416633
note that sum of upper and lower probabilities is exactly 1.
Use the linearHypothesis function from car package. For instance, you can check if the coefficient of groupTrt equals -1 using.
linearHypothesis(lm.D9, "groupTrt = -1")
Linear hypothesis test
Hypothesis:
groupTrt = - 1
Model 1: restricted model
Model 2: weight ~ group
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 10.7075
2 18 8.7292 1 1.9782 4.0791 0.05856 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The smatr package has a slope.test() function with which you can use OLS.
In addition to all the other good answers, you could use an offset. It's a little trickier with categorical predictors, because you need to know the coding.
lm(weight~group+offset(1*(group=="Trt")))
The 1* here is unnecessary but is put in to emphasize that you are testing against the hypothesis that the difference is 1 (if you want to test against a hypothesis of a difference of d, then use d*(group=="Trt")
You can use t.test to do this for your data. The mu parameter sets the hypothesis for the difference of group means. The alternative parameter lets you choose between one and two-sided tests.
t.test(weight~group,var.equal=TRUE)
Two Sample t-test
data: weight by group
t = 1.1913, df = 18, p-value = 0.249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
t.test(weight~group,var.equal=TRUE,mu=-1)
Two Sample t-test
data: weight by group
t = 4.4022, df = 18, p-value = 0.0003438
alternative hypothesis: true difference in means is not equal to -1
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
Code up your own test. You know the estimated coeffiecient and you know the standard error. You could construct your own test stat.