How to extract p-values from sumurca object? - r

I'd like to extract the p-values from the summary output of ur.za in package urca.
library(urca)
data(nporg)
gnp <- na.omit(nporg[, "gnp.r"])
za.gnp <- ur.za(gnp, model="both", lag=2)
summary(za.gnp)
> summary(za.gnp)
################################
# Zivot-Andrews Unit Root Test #
################################
Call:
lm(formula = testmat)
Residuals:
Min 1Q Median 3Q Max
-39.753 -9.413 2.138 9.934 22.977
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.49068 10.25301 2.096 0.04096 *
y.l1 0.77341 0.05896 13.118 < 2e-16 ***
trend 1.19804 0.66346 1.806 0.07675 .
y.dl1 0.39699 0.12608 3.149 0.00272 **
y.dl2 0.10503 0.13401 0.784 0.43676
du -25.44710 9.20734 -2.764 0.00788 **
dt 2.11456 0.84179 2.512 0.01515 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 13.72 on 52 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.9948, Adjusted R-squared: 0.9942
F-statistic: 1651 on 6 and 52 DF, p-value: < 2.2e-16
Teststatistic: -3.8431
Critical values: 0.01= -5.57 0.05= -5.08 0.1= -4.82
Potential break point at position: 21
All methods I found for lm summary objects don't seem to work here. And I've spent quite some time searching through str(summary(za.gnp)) to no avail. Any hints on where to look?

Objects of class ur.za are S4 objects, which behave differently from S3 objects such as those produced by lm. One difference is the concept of slots, which are accessed with the @ operator.
The object returned by summary(za.gnp) has a pval slot, but its value is NULL.
summary(za.gnp)@pval
NULL
However, it also has a testreg slot, which contains an lm object with the test results that you can use to obtain the p-values in the usual way:
coef(summary(summary(za.gnp)@testreg))[,"Pr(>|t|)"]
(Intercept) y.l1 trend y.dl1 y.dl2 du
4.096351e-02 4.007914e-18 7.674887e-02 2.716223e-03 4.367588e-01 7.884201e-03
dt
1.514797e-02
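A minimal sketch of the slot-based approach, assuming the urca package is installed. Depending on the urca version, the testreg slot may hold either an lm fit or an already summarised summary.lm object, so the sketch checks the class before calling summary(). Note that these are ordinary t-test p-values for the regression coefficients; the Zivot-Andrews statistic itself is compared against the critical values stored in the teststat and cval slots rather than reported with a p-value.

```r
library(urca)

data(nporg)
gnp <- na.omit(nporg[, "gnp.r"])
za.gnp <- ur.za(gnp, model = "both", lag = 2)

# List the slots available on the summary object
slotNames(summary(za.gnp))

# The test regression lives in the testreg slot; summarise it only if needed
reg <- summary(za.gnp)@testreg
if (!inherits(reg, "summary.lm")) reg <- summary(reg)

# Coefficient-level p-values as a named numeric vector
pvals <- coef(reg)[, "Pr(>|t|)"]
print(pvals)

# Test statistic and critical values of the unit root test itself
za.gnp@teststat
za.gnp@cval
```

slotNames() is often quicker than reading through str() when looking for where an S4 object keeps a particular result.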

Related

How to get adjusted dependent variable

I am working on adjusting urine sodium by urine creatinine and age in order to use the adjusted variable in further analysis.
How do I create a new variable with the adjusted version of the data?? Do I divide NA24 by creatinine and age? Do I multiply them? Please help.
I ran a linear model as follows, but not sure what to do with the information:
Call:
lm(formula = PRENA24 ~ PRECR24mmol * PREALD, data = c1.3)
Residuals:
Min 1Q Median 3Q Max
-228.439 -43.024 -5.215 37.790 274.414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.84482 29.60684 2.258 0.02423 *
PRECR24mmol 7.00565 2.10989 3.320 0.00094 ***
PREALD -0.66555 0.60912 -1.093 0.27488
PRECR24mmol:PREALD 0.06335 0.04392 1.442 0.14962
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 65.94 on 798 degrees of freedom
Multiple R-squared: 0.2963, Adjusted R-squared: 0.2937
F-statistic: 112 on 3 and 798 DF, p-value: < 2.2e-16
I need to adjust the PRENA24 value and I want to make a new column with these values (i.e. PRENA24.ADJ).
I know the following is incorrect, but I am not sure what else to do with the information from the linear model. The post lab data is separated by treatment type as well.
c1 <- c1.3 %>%
mutate(PRENA24.ADJ = (PRENA24-66.84482+(7.00565*PRECR24mmol)+(-0.66555*PREALD)))
c2 <- c1 %>%
mutate(NA24.ADJ = (NA24-24.59443+(10.54905*CR24mmol)+(0.58894*ALD)))

How do I construct a line of code to test the conditional mean difference in votes by region, controlling for other variables in the model?

Call:
lm(formula = votes ~ redist + party + region, data = prob2)
Residuals:
Min 1Q Median 3Q Max
-56.824 -15.175 0.333 13.903 55.549
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -49.4762 16.7739 -2.950 0.00339 **
redist 1.3881 0.2792 4.972 1.02e-06 ***
party 37.2574 2.1062 17.689 < 2e-16 ***
regionmidwest -9.2403 2.8632 -3.227 0.00136 **
regionsouth -23.4173 3.1394 -7.459 6.45e-13 ***
regionwest 5.3285 3.8537 1.383 0.16761
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.87 on 364 degrees of freedom
Multiple R-squared: 0.5756, Adjusted R-squared: 0.5698
F-statistic: 98.74 on 5 and 364 DF, p-value: < 2.2e-16
I have to conduct a joint hypothesis test and initially tried the following: linearHypothesis(q2, c("votes = region")), since I called the regression above "q2". However, I got an error saying: The hypothesis "votes = region" is not well formed: contains bad coefficient/variable names.
Appreciate any help!
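One possible way to set this up, sketched under assumptions: the hypothesis has to be written in terms of the coefficient names shown in the summary (regionmidwest, regionsouth, regionwest), not the variable names votes or region. The prob2 data are not available here, so the sketch simulates a hypothetical stand-in with the same column names and an assumed baseline level "east"; its numbers are not the asker's results. The base R comparison of nested models gives the joint F-test, and the car call is only run if that package is installed.

```r
set.seed(1)
n <- 200

# Hypothetical stand-in for prob2 (simulated; not the original data)
prob2 <- data.frame(
  redist = runif(n, 40, 80),
  party  = rbinom(n, 1, 0.5),
  region = factor(sample(c("east", "midwest", "south", "west"), n, replace = TRUE))
)
shift <- c(east = 0, midwest = -9, south = -23, west = 5)
prob2$votes <- 1.4 * prob2$redist + 37 * prob2$party +
  unname(shift[as.character(prob2$region)]) + rnorm(n, sd = 19)

# Full model and the restricted model without region
q2      <- lm(votes ~ redist + party + region, data = prob2)
q2_null <- lm(votes ~ redist + party, data = prob2)

# Joint F-test that all region coefficients are zero (base R)
joint <- anova(q2_null, q2)
print(joint)

# The same joint hypothesis with car, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::linearHypothesis(
    q2, c("regionmidwest = 0", "regionsouth = 0", "regionwest = 0")
  ))
}
```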

variable lengths differ in R using lm

I get a nasty surprise when running lm in R:
variable lengths differ (found for 'returnsandp')
I run the following model:
# regress apple price return on s&p price return
attach(NewSetSlide.ex)
resultr = lm(returnapple ~ returnsandp)
summary(resultr)
It cannot get any more simple than that, but for some reason, I get the error above.
I checked that the length of returnapple & returnsandp is exactly the same. So what on earth is going on, please?
The data.frame in question:
NewSetSlide.ex <- structure(list(returnapple = c(0.1412251, 0.07665801, 0.02560235,
0.09638143, 0.06384145, 0.05163189, -0.1076969, 0.05121892, 0.06428114,
0.09939652, 0.07271771, 0.06923432, 0.02873109, 0.0721757, -0.0121841,
0.07196034, 0.1012038, -0.06786657, 0.06142434, 0.09644931, -0.02754909,
0.005786519, 0.04099078, -0.03320592, -0.03292676, -0.06908485,
-0.01878077, 0.08340874, -0.01004186, -0.1064195, -0.07524236,
-0.006677446, 0.133327, -0.139921, 0.06528701, -0.036831, 0.09006266,
0.01813659, 0.07127628, 0.004334296, -0.02659846, 0.05333548,
0.04774654, 0.1288835, 0.05323629, -0.00006978558, 0.0634182,
-0.0533224, 0.03270362, 0.1026693, -0.05655361, 0.09680779, 0.01662336,
-0.01170586, -0.01063646, 0.0638476, -0.0542103, -0.01501973,
0.1307637, -0.005598485, 0.02798327, 0.1962269, 0.006725292,
0), returnsandp = c(0.1159772758, 0.007614392, 0.1104467964,
0.0359706698, 0.0152313579, 0.0331342721, 0.0189951476, 0.0330947526,
0.0749868297, -0.0124064592, 0.0323295771, -0.0303030364, 0.0113188732,
0.0101582303, -0.0151743475, 0.0174258083, -0.0088341409, -0.0092159647,
-0.0388593467, 0.0134979946, 0.0054655738, -0.05935645, 0.0174692125,
-0.0164511628, 0.1063320628, -0.0034796438, -0.0000602649, -0.0151122528,
0.0223743915, 0.0740851449, 0.0086287811, -0.0028700134, -0.0045942764,
0.0540510532, 0.0121340172, -0.0048475787, -0.0119945162, -0.034724078,
0.0425088143, 0.0650615875, 0.0450610926, 0.0023665278, 0.0714892769,
0.052793919, -0.0141481377, 0.0502292875, 0.0141095206, -0.0586828306,
0.071192607, -0.0854386059, 0.05472933, 0.0214771911, -0.0282882713,
0.1317668962, 0.0369236189, 0.0263898652, -0.0114502121, 0.0060341972,
0.0479144906, 0.0482236974, 0.0349588397, -0.0241661652, -0.2176304161,
-0.0853488645)), class = "data.frame", row.names = c(NA, -64L))
Based on @Dave2e's comment.
It is better to pass data = NewSetSlide.ex inside the lm call to avoid naming conflicts, and you should avoid using attach altogether. See below (the NewSetSlide.ex data frame is taken from the question above):
resultr <- lm(returnapple ~ returnsandp, data = NewSetSlide.ex)
summary(resultr)
Output:
Call:
lm(formula = returnapple ~ returnsandp, data = NewSetSlide.ex)
Residuals:
Min 1Q Median 3Q Max
-0.166599181 -0.041838291 0.003778841 0.044034591 0.166774667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.028595156 0.008677931 3.29516 0.0016294 **
returnsandp -0.035466006 0.160976847 -0.22032 0.8263478
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0672294 on 62 degrees of freedom
Multiple R-squared: 0.0007822871, Adjusted R-squared: -0.01533413
F-statistic: 0.04853977 on 1 and 62 DF, p-value: 0.8263478
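One plausible way the original error can arise (an assumption, since the asker's workspace is not shown) is a leftover variable with the same name as a column but a different length, which masks the attached copy. The sketch below reproduces that situation with a small hypothetical data frame, not the data from the question, and then shows that passing data= avoids it.

```r
# Small hypothetical data frame (not the data from the question)
df <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1), x = c(1, 2, 3, 4, 5))

# A stale variable in the workspace with the same name but a different length
x <- c(10, 20)

# After attach(), lm() still finds the workspace copy of x before the column
attach(df)
err <- tryCatch(lm(y ~ x), error = function(e) conditionMessage(e))
detach(df)
print(err)

# With data=, lm() looks for the variables in the data frame first
fit <- lm(y ~ x, data = df)
print(coef(fit))
```

Running ls() or rm() on the stray variable would also clear the conflict, but the data= form does not depend on the state of the workspace.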

Wald test on regression coefficients of factorial variable in R

I'm a newbie in R and I have this fitted model:
> mqo_reg_g <- lm(G ~ factor(year), data = data)
> summary(mqo_reg_g)
Call:
lm(formula = G ~ factor(year), data = data)
Residuals:
Min 1Q Median 3Q Max
-0.11134 -0.06793 -0.04239 0.01324 0.85213
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.111339 0.005253 21.197 < 2e-16 ***
factor(year)2002 -0.015388 0.007428 -2.071 0.038418 *
factor(year)2006 -0.016980 0.007428 -2.286 0.022343 *
factor(year)2010 -0.024432 0.007496 -3.259 0.001131 **
factor(year)2014 -0.025750 0.007436 -3.463 0.000543 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.119 on 2540 degrees of freedom
Multiple R-squared: 0.005952, Adjusted R-squared: 0.004387
F-statistic: 3.802 on 4 and 2540 DF, p-value: 0.004361
I want to test the difference between the coefficients of factor(year)2002 and the Intercept; factor(year)2006 and factor(year)2002; and so on.
In Stata, I know people use the test command, which performs Wald tests on the parameters of the fitted model, but I couldn't find how to do this in R.
How can I do it?
Thanks!
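A minimal sketch of one way to do this in base R, under stated assumptions. With treatment coding, each factor(year) coefficient is already the difference from the baseline year, so its t-test in the summary covers that comparison. For two non-baseline years, a Wald test on the difference of coefficients can be built from coef() and vcov(). The data below are simulated as a hypothetical stand-in (the baseline year 1998 is an assumption), so the numbers are illustrative only.

```r
set.seed(42)

# Hypothetical stand-in for the asker's data (simulated; baseline 1998 is assumed)
years <- c(1998, 2002, 2006, 2010, 2014)
data <- data.frame(year = rep(years, each = 100))
data$G <- 0.11 - 0.005 * match(data$year, years) + rnorm(nrow(data), sd = 0.12)

mqo_reg_g <- lm(G ~ factor(year), data = data)

# Contrast vector that picks out beta_2006 - beta_2002
b <- coef(mqo_reg_g)
V <- vcov(mqo_reg_g)
k <- setNames(numeric(length(b)), names(b))
k["factor(year)2006"] <-  1
k["factor(year)2002"] <- -1

# Wald (t) test of H0: beta_2006 - beta_2002 = 0
est  <- sum(k * b)
se   <- sqrt(drop(t(k) %*% V %*% k))
tval <- est / se
pval <- 2 * pt(abs(tval), df = df.residual(mqo_reg_g), lower.tail = FALSE)
print(c(estimate = est, se = se, t = tval, p = pval))
```

Changing the two non-zero entries of k gives the other pairwise comparisons (2010 versus 2006, and so on).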

Different results of lm with same dataset written in two different languages (English and Korean)

The results of lm differ when applied to two versions of the same dataset (numeric plus categorical variables): in one the categorical values are written in Korean, in the other in English. Apart from the categorical variables, the numeric variables are exactly the same. What could explain the difference in the results?
#data
df3 <- repmis::source_DropboxData("df3_v0.1.csv","gg30a74n4ew3zzg",header = TRUE)
#the one written in korean
out1<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out1)
#the one written in eng
df3$SANJI[df3$SANJI=="전북"]<-"JB"
df3$SANJI[df3$SANJI=="충북"]<-"CHB"
df3$SANJI[df3$SANJI=="경북"]<-"KB"
df3$SANJI[df3$SANJI=="전남"]<-"JN"
df3$SANJI2[df3$SANJI2=="고창"]<-"Gochang"
df3$SANJI2[df3$SANJI2=="괴산"]<-"Goesan"
df3$SANJI2[df3$SANJI2=="단양"]<-"Danyang"
df3$SANJI2[df3$SANJI2=="봉화"]<-"Fenghua"
df3$SANJI2[df3$SANJI2=="신안"]<-"Sinan"
df3$SANJI2[df3$SANJI2=="안동"]<-"Andong"
df3$SANJI2[df3$SANJI2=="영광"]<-"younggang"
df3$SANJI2[df3$SANJI2=="영양"]<-"youngyang"
df3$SANJI2[df3$SANJI2=="영주"]<-"youngju"
df3$SANJI2[df3$SANJI2=="예천"]<-"Yecheon"
df3$SANJI2[df3$SANJI2=="의성"]<-"Yusaeng"
df3$SANJI2[df3$SANJI2=="제천"]<-"Jechon"
df3$SANJI2[df3$SANJI2=="진안"]<-"Jinan"
df3$SANJI2[df3$SANJI2=="청송"]<-"Changsong"
df3$SANJI2[df3$SANJI2=="해남"]<-"Haenam"
out2<-lm(YD~SANJI+TAmin8+TMINup18do6+typ_rain6+DTD9,data=df3)
summary(out2)
#the one written in korean
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 +
# DTD9, data = df3)
#Residuals:
# Min 1Q Median 3Q Max
#-98.836 -23.173 -2.261 22.626 111.367
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 970.33251 84.12479 11.534 < 2e-16 ***
#SANJI전남 -33.75664 12.53277 -2.693 0.008158 **
#SANJI전북 -44.17939 11.22274 -3.937 0.000144 ***
#SANJI충북 -44.09285 9.16736 -4.810 4.74e-06 ***
#TAmin8 -25.56618 3.36053 -7.608 9.37e-12 ***
#TMINup18do6 4.58052 0.96528 4.745 6.19e-06 ***
#typ_rain6 -0.19754 0.02862 -6.903 3.23e-10 ***
#DTD9 -16.15975 2.65128 -6.095 1.59e-08 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared: 0.58, Adjusted R-squared: 0.5538
#F-statistic: 22.1 on 7 and 112 DF, p-value: < 2.2e-16
#the one written in eng
#Call:
#lm(formula = YD ~ SANJI + TAmin8 + TMINup18do6 + typ_rain6 +
# DTD9, data = df3)
#Residuals:
# Min 1Q Median 3Q Max
#-98.836 -23.173 -2.261 22.626 111.367
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 926.23966 84.32621 10.984 < 2e-16 ***
#SANJIJB -0.08654 12.32752 -0.007 0.994
#SANJIJN 10.33620 13.09434 0.789 0.432
#SANJIKB 44.09285 9.16736 4.810 4.74e-06 ***
#TAmin8 -25.56618 3.36053 -7.608 9.37e-12 ***
#TMINup18do6 4.58052 0.96528 4.745 6.19e-06 ***
#typ_rain6 -0.19754 0.02862 -6.903 3.23e-10 ***
#DTD9 -16.15975 2.65128 -6.095 1.59e-08 ***
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 37.2 on 112 degrees of freedom
#Multiple R-squared: 0.58, Adjusted R-squared: 0.5538
#F-statistic: 22.1 on 7 and 112 DF, p-value: < 2.2e-16
Your overall model fits are the same; you just have different reference classes for your factor (SANJI). A different reference level also changes the intercept, but it won't change the estimates for your continuous covariates.
You can use relevel() to force a particular reference class (assuming SANJI is already a factor), or explicitly create the factor with factor() and a levels= argument. Otherwise the levels are sorted alphabetically by default, and they may not sort the same way in the two languages.
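A short sketch of both options on simulated data (the data frame d and its columns are hypothetical, not the df3 from the question). It fits the same model with the default reference level and with "KB" forced as the reference, and checks that the slope of the continuous covariate and the fitted values do not change.

```r
set.seed(7)

# Hypothetical data with a four-level region variable (simulated)
d <- data.frame(
  region = sample(c("JB", "CHB", "KB", "JN"), 120, replace = TRUE),
  x      = rnorm(120)
)
d$y <- 2 * d$x + ifelse(d$region == "KB", 5, 0) + rnorm(120)

# Default: levels are sorted, so the first sorted level is the reference
fit_default <- lm(y ~ factor(region) + x, data = d)

# Option 1: set the level order explicitly, with "KB" first
d$region_kb <- factor(d$region, levels = c("KB", "CHB", "JB", "JN"))
fit_kb <- lm(y ~ region_kb + x, data = d)

# Option 2: relevel() an existing factor
d$region_kb2 <- relevel(factor(d$region), ref = "KB")
fit_kb2 <- lm(y ~ region_kb2 + x, data = d)

# The slope of x and the fitted values agree; only intercept and dummies shift
print(coef(fit_default)["x"])
print(coef(fit_kb)["x"])
print(all.equal(fitted(fit_default), fitted(fit_kb)))
```

Fixing the level order before translating the labels keeps the reference class the same in both versions of the data.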
