How to extract p-value in var package?

How can we extract the p-value in the vars package? When we call summary(var), where 'var' is the name of the VAR model, we see a p-value at the bottom of the results, but how can we extract this value?
For example:
library(vars)
library(quantmod)  # provides getSymbols() and periodReturn()
symbols=c('^N225','^FTSE','^GSPC')
getSymbols(symbols,src='yahoo', from="2003-04-28", to="2007-10-29")
period="daily"
A1=periodReturn(N225$N225.Adjusted,period=period)
B1=periodReturn(FTSE$FTSE.Adjusted,period=period)
C1=periodReturn(GSPC$GSPC.Adjusted,period=period)
datap_1<-cbind(A1,B1,C1)
datap_1<-na.omit(datap_1)
datap_1<-(datap_1)^2
vardatap_3<-VAR(datap_1,p=3,type="none")
summary(vardatap_3)
After summary(vardatap_3) we can see the p-value, like this:
VAR Estimation Results:
=========================
Endogenous variables: N225, FTSE, SP500
Deterministic variables: none
Sample size: 1055
Log Likelihood: 23637.848
Roots of the characteristic polynomial:
0.8639 0.6224 0.6224 0.5711 0.5711 0.5471 0.5471 0.4683 0.4683
Call:
VAR(y = datap_1, p = 3, type = "none")
Estimation results for equation N225:
=====================================
N225 = N225.l1 + FTSE.l1 + SP500.l1 + N225.l2 + FTSE.l2 + SP500.l2 + N225.l3 + FTSE.l3 + SP500.l3
Estimate Std. Error t value Pr(>|t|)
N225.l1 0.03436 0.03116 1.103 0.270
FTSE.l1 0.47025 0.06633 7.089 2.48e-12 ***
SP500.l1 0.60717 0.07512 8.083 1.74e-15 ***
N225.l2 0.14938 0.03057 4.886 1.19e-06 ***
FTSE.l2 -0.05440 0.06744 -0.807 0.420
SP500.l2 -0.09024 0.07782 -1.160 0.246
N225.l3 0.16809 0.02924 5.749 1.18e-08 ***
FTSE.l3 0.04480 0.06597 0.679 0.497
SP500.l3 -0.01007 0.07941 -0.127 0.899
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0002397 on 1046 degrees of freedom
Multiple R-Squared: 0.3099, Adjusted R-squared: 0.304
F-statistic: 52.2 on 9 and 1046 DF, p-value: < 2.2e-16
At the end of this output, we see that the p-value is less than 2.2e-16.
When I run this code:
lapply(coef(vardatap_3), "[", , "Pr(>|t|)")
the output is:
$N225
N225.l1 FTSE.l1 SP500.l1 N225.l2 FTSE.l2 SP500.l2
2.703965e-01 2.479333e-12 1.738649e-15 1.189843e-06 4.201011e-01 2.464906e-01
N225.l3 FTSE.l3 SP500.l3
1.177588e-08 4.971743e-01 8.990626e-01
$FTSE
N225.l1 FTSE.l1 SP500.l1 N225.l2 FTSE.l2 SP500.l2
8.849041e-01 2.730359e-09 3.415860e-10 8.673114e-01 5.232037e-02 2.887330e-10
N225.l3 FTSE.l3 SP500.l3
8.698535e-02 6.215429e-15 9.290871e-02
$SP500
N225.l1 FTSE.l1 SP500.l1 N225.l2 FTSE.l2 SP500.l2
2.431252e-01 3.928462e-02 4.362288e-02 1.007840e-01 1.141799e-01 8.819460e-03
N225.l3 FTSE.l3 SP500.l3
1.129084e-03 1.426315e-01 1.307562e-06
but these are the p-values of the individual coefficients, not the p-value of the F-statistic. How can I get that value?

If fit is the object returned by the VAR function, you can use
lapply(coef(fit), "[", , "Pr(>|t|)")
to generate a list of vectors of the p-values.
If you want the p-value of each equation's overall F-test (the value printed at the bottom of each summary block), you can compute it from the stored F-statistic:
sapply(summary(fit)$varresult, function(x) {
tmp <- x[["fstatistic"]]
pf(tmp[1], tmp[2], tmp[3], lower.tail = FALSE)
})
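As a self-contained illustration of the second snippet, here is a minimal sketch on the Canada data set that ships with vars. The object name fit, the lag order p = 2 and type = "const" are assumptions made for the example, not taken from the question.

```r
library(vars)

data(Canada)                      # example data shipped with vars
fit <- VAR(Canada, p = 2, type = "const")

# Each element of summary(fit)$varresult is an lm summary; its
# "fstatistic" entry holds the F value and both degrees of freedom.
f_pvalues <- sapply(summary(fit)$varresult, function(eq) {
  f <- eq[["fstatistic"]]
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
})

print(f_pvalues)                  # one overall p-value per equation
```

summary() prints very small p-values as "< 2.2e-16", so the computed number can be much smaller than the printed bound, or even exactly 0.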

Related

model selection using StepAIC; how can I see other models besides my final model?

I am conducting model selection on my dataset using the package MASS and the function stepAIC. This is the current code I am using:
mod <- lm(Distance~DiffAge + DiffR + DiffSize + DiffRep + DiffSeason +
Diff.Bkp + Diff.Fzp + Diff.AO + Diff.Aow +
Diff.Lag.NAOw + Diff.Lag.NAO + Diff.Lag.AO + Diff.Lag.Aow, data=data,
na.action="na.exclude")
library(MASS)
step.model<-stepAIC(mod, direction = "both",
trace = FALSE)
summary(step.model)
this gives me the following output:
Call:
lm(formula = Distance ~ Diff.Lag.NAOw + Diff.Lag.AO + DiffSeason,
data = data, na.action = "na.exclude")
Residuals:
Min 1Q Median 3Q Max
-146.984 -48.397 -9.533 42.169 194.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 77.944 20.247 3.850 0.000184 ***
Diff.Lag.NAOw 11.868 6.261 1.896 0.060209 .
Diff.Lag.AO 24.696 17.475 1.413 0.159947
DiffSeasonEW-LW 41.891 18.607 2.251 0.026014 *
DiffSeasonLW-LW 22.863 20.791 1.100 0.273465
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 67.2 on 132 degrees of freedom
Multiple R-squared: 0.06031, Adjusted R-squared: 0.03183
F-statistic: 2.118 on 4 and 132 DF, p-value: 0.08209
If I am reading this right, the output only shows me the top model (Let me know if this is incorrect!). I would like to see the other, lower-ranked models as well, with their accompanying AIC scores.
Any suggestions on how I can achieve this? Should I modify my code in any way?
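One possible approach, sketched here under assumptions: stepAIC accepts a keep argument, a function that is called with each model on the search path and its AIC, and the returned object also carries an anova component describing the steps. The example uses the built-in mtcars data and a hypothetical helper record_step rather than the data from the question.

```r
library(MASS)

# Illustrative full model on mtcars (not the asker's data)
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# keep = is called once per model on the search path with the fitted
# model and its AIC; here we record the formula and the AIC.
record_step <- function(model, aic) {
  list(formula = paste(deparse(formula(model)), collapse = " "),
       aic = aic)
}

step_model <- stepAIC(full, direction = "both", trace = FALSE,
                      keep = record_step)

# $anova summarises which term was added or dropped at each step
print(step_model$anova)

# $keep is a list-matrix: one row per recorded item, one column per step
path <- data.frame(
  formula = unlist(step_model$keep["formula", ]),
  aic     = unlist(step_model$keep["aic", ])
)
print(path)
```

This records only the model chosen at each step. Setting trace = TRUE additionally prints, for every step, the table of candidate additions and deletions with their AIC values.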

Marginal Effects of conditional logit model in R using, "clogit," function

I am trying to figure out how to calculate the marginal effects of my model using the "clogit" function in the survival package. The margins package does not seem to work with this type of model, but does work with "multinom" and "mclogit". However, I am investigating the effects of choice characteristics, not individual characteristics, so it needs to be a conditional logit model. The mclogit function works with the margins package, but its results are very different from the results using the clogit function. Why is that? Any help calculating the marginal effects from the clogit function would be greatly appreciated.
mclogit output:
Call:
mclogit(formula = cbind(selected, caseID) ~ SysTEM + OWN + cost +
ENVIRON + NEIGH + save, data = atl)
Estimate Std. Error z value Pr(>|z|)
SysTEM 0.139965 0.025758 5.434 5.51e-08 ***
OWN 0.008931 0.026375 0.339 0.735
cost -0.103012 0.004215 -24.439 < 2e-16 ***
ENVIRON 0.675341 0.037104 18.201 < 2e-16 ***
NEIGH 0.419054 0.031958 13.112 < 2e-16 ***
save 0.532825 0.023399 22.771 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 18380
Residual Deviance: 16670
Number of Fisher Scoring iterations: 4
Number of observations: 8364
clogit output:
Call:
coxph(formula = Surv(rep(1, 25092L), selected) ~ SysTEM + OWN +
cost + ENVIRON + NEIGH + save + strata(caseID), data = atl,
method = "exact")
n= 25092, number of events= 8364
coef exp(coef) se(coef) z Pr(>|z|)
SysTEM 0.133184 1.142461 0.034165 3.898 9.69e-05 ***
OWN -0.015884 0.984241 0.036346 -0.437 0.662
cost -0.179833 0.835410 0.005543 -32.442 < 2e-16 ***
ENVIRON 1.186329 3.275036 0.049558 23.938 < 2e-16 ***
NEIGH 0.658657 1.932195 0.042063 15.659 < 2e-16 ***
save 0.970051 2.638079 0.031352 30.941 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
SysTEM 1.1425 0.8753 1.0685 1.2216
OWN 0.9842 1.0160 0.9166 1.0569
cost 0.8354 1.1970 0.8264 0.8445
ENVIRON 3.2750 0.3053 2.9719 3.6091
NEIGH 1.9322 0.5175 1.7793 2.0982
save 2.6381 0.3791 2.4809 2.8053
Concordance= 0.701 (se = 0.004 )
Rsquare= 0.103 (max possible= 0.688 )
Likelihood ratio test= 2740 on 6 df, p=<2e-16
Wald test = 2465 on 6 df, p=<2e-16
Score (logrank) test = 2784 on 6 df, p=<2e-16
margins output for mclogit
margins(model2A)
SysTEM OWN cost ENVIRON NEIGH save
0.001944 0.000124 -0.001431 0.00938 0.00582 0.0074
margins output for clogit
margins(model2A)
Error in match.arg(type) :
'arg' should be one of “risk”, “expected”, “lp”

How to remove insignificant variables in caret

I'm using a glm model with 10-fold cross-validation from the caret package. I'd like to remove the non-significant variables from the model, for example TX_RESP_Q108B, TX_RESP_Q108C, TX_RESP_Q065.Q, etc.
Input
tc <- trainControl("cv", 10, savePredictions = T, classProbs = T)
fit1 <- train(response~., data = my_data, method = "glm", family = "binomial", trControl = tc)
Output
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3268 -0.6676 -0.5238 -0.3620 3.0841
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.507239 0.171328 -20.471 < 2e-16 ***
ID_TURNO2 -0.412723 0.024630 -16.757 < 2e-16 ***
ID_TURNO3 -2.089175 0.234093 -8.925 < 2e-16 ***
TX_RESP_Q089B 0.452723 0.047093 9.613 < 2e-16 ***
TX_RESP_Q108B -0.001968 0.330746 -0.006 0.99525
TX_RESP_Q108C 0.220013 0.222716 0.988 0.32322
TX_RESP_Q108D 0.371279 0.178481 2.080 0.03751 *
TX_RESP_Q108E 0.115137 0.164007 0.702 0.48266
TX_RESP_Q108F 0.301288 0.162907 1.849 0.06439 .
TX_RESP_Q079B 0.410358 0.027035 15.179 < 2e-16 ***
TX_RESP_Q005.L 0.558060 0.047220 11.818 < 2e-16 ***
TX_RESP_Q005.Q 0.090621 0.036774 2.464 0.01373 *
TX_RESP_Q005.C -0.144208 0.029758 -4.846 1.26e-06 ***
`TX_RESP_Q005^4` -0.015192 0.023936 -0.635 0.52563
TX_RESP_Q078B 0.400272 0.046226 8.659 < 2e-16 ***
TX_RESP_Q009B 0.035596 0.131552 0.271 0.78671
TX_RESP_Q009C -0.072846 0.150077 -0.485 0.62740
TX_RESP_Q009D 0.270092 0.028457 9.491 < 2e-16 ***
TX_RESP_Q009E 0.184751 0.033084 5.584 2.35e-08 ***
TX_RESP_Q009F 0.118077 0.066456 1.777 0.07561 .
TX_RESP_Q070B 0.483582 0.026233 18.434 < 2e-16 ***
TX_RESP_Q013.L -0.142476 0.069901 -2.038 0.04152 *
TX_RESP_Q013.Q -0.096416 0.056399 -1.710 0.08735 .
TX_RESP_Q013.C 0.075822 0.052626 1.441 0.14965
`TX_RESP_Q013^4` 0.089282 0.049417 1.807 0.07081 .
`TX_RESP_Q013^5` -0.039960 0.041490 -0.963 0.33550
`TX_RESP_Q013^6` 0.084890 0.031894 2.662 0.00778 **
TX_RESP_Q048.L 0.569328 0.063179 9.011 < 2e-16 ***
TX_RESP_Q048.Q -0.016526 0.057944 -0.285 0.77548
TX_RESP_Q048.C 0.112465 0.052459 2.144 0.03204 *
TX_RESP_Q107B 0.167977 0.275342 0.610 0.54182
TX_RESP_Q107C -0.084265 0.198600 -0.424 0.67135
TX_RESP_Q107D 0.017702 0.159386 0.111 0.91156
TX_RESP_Q107E 0.225705 0.146949 1.536 0.12455
TX_RESP_Q107F 0.253538 0.146698 1.728 0.08394 .
TX_RESP_Q062.L -0.027898 0.104988 -0.266 0.79045
TX_RESP_Q062.Q 0.054648 0.076644 0.713 0.47584
TX_RESP_Q062.C -0.021481 0.044358 -0.484 0.62819
TX_RESP_Q045B 0.348216 0.071149 4.894 9.87e-07 ***
TX_RESP_Q045C 0.118404 0.071593 1.654 0.09816 .
TX_RESP_Q045D -0.067446 0.077291 -0.873 0.38287
TX_RESP_Q058B 0.073366 0.076204 0.963 0.33567
TX_RESP_Q058C 0.095275 0.081153 1.174 0.24039
TX_RESP_Q058D 0.167319 0.085421 1.959 0.05014 .
TX_RESP_Q059B -0.206194 0.103281 -1.996 0.04589 *
TX_RESP_Q059C -0.185812 0.105676 -1.758 0.07869 .
TX_RESP_Q059D -0.098488 0.108455 -0.908 0.36383
TX_RESP_Q060B 0.273180 0.060671 4.503 6.71e-06 ***
TX_RESP_Q060C 0.368747 0.063615 5.797 6.77e-09 ***
TX_RESP_Q060D 0.396086 0.067710 5.850 4.92e-09 ***
TX_RESP_Q061B 0.066926 0.087237 0.767 0.44298
TX_RESP_Q061C -0.006212 0.092005 -0.068 0.94617
TX_RESP_Q061D 0.012422 0.096713 0.128 0.89780
TX_RESP_Q063.L 0.024938 0.098261 0.254 0.79965
TX_RESP_Q063.Q 0.070767 0.071592 0.988 0.32291
TX_RESP_Q063.C -0.040853 0.042945 -0.951 0.34146
TX_RESP_Q065.L -0.039285 0.051561 -0.762 0.44612
TX_RESP_Q065.Q 0.019015 0.036056 0.527 0.59792
TX_RESP_Q065.C 0.025068 0.025339 0.989 0.32252
TX_RESP_Q067B -0.075401 0.093803 -0.804 0.42150
TX_RESP_Q067C -0.108572 0.093832 -1.157 0.24724
TX_RESP_Q067D -0.038972 0.094387 -0.413 0.67969
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 52275 on 56144 degrees of freedom
Residual deviance: 48930 on 56083 degrees of freedom
AIC: 49054
Number of Fisher Scoring iterations: 6
I tried (for example)
f2 <- update(f~., TURMA_PROFICIENTE~ID_TURNO + TX_RESP_Q089B + TX_RESP_Q108D)
fit2 <- train(f2, data = dados, method = "glm", family = "binomial", trControl = tc)
# Error message
Error in eval(expr, envir, enclos) : object 'TX_RESP_Q089B' not found
This might be a bad idea (as described). You probably should not look at a single model fit, figure out what is significant using the same data, and eliminate terms. It is kind of a self-fulfilling prophecy.
You do seem to have a lot of data, which helps minimize the risk, but I would instead use caret's sbf function to do the feature selection inside of cross-validation to remove the features. This will help protect you from selection bias and give you honest estimates of performance.
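A minimal sketch of what selection by filter with sbf might look like, under stated assumptions: it uses two classes of the built-in iris data with numeric predictors instead of the survey data from the question, and the built-in caretSBF helper functions with method = "glm" passed through to train. As far as I know, the default caretSBF scoring expects numeric predictors, so factor columns like those in the question would probably need to be encoded first or handled with custom score and filter functions, which is not shown here.

```r
library(caret)

set.seed(1)

# Illustrative two-class data with numeric predictors (not the asker's data)
two_class <- droplevels(subset(iris, Species != "setosa"))
x <- two_class[, 1:4]
y <- two_class$Species

# caretSBF scores each predictor inside every resample, keeps those that
# pass the filter, and fits the model named by `method` through train()
ctrl <- sbfControl(functions = caretSBF, method = "cv", number = 10)

sbf_fit <- sbf(x, y, sbfControl = ctrl, method = "glm")

print(sbf_fit)          # resampled performance and filter summary
predictors(sbf_fit)     # predictors retained in the final fit
```

Because the filter is re-run inside each fold, the reported performance reflects the selection step as well as the model fit.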

How to extract one element from summary(VAR(b, p = 3, type = "const")) in R?

After running summary(VAR(b, p = 3, type = "const")), I get the result shown at the end of this question.
I would like to store only one value rather than all the results: the Estimate for diff_gov.l3 (0.37446).
I suppose that my code should start with summary(VAR(b, p = 3, type = "const"))$SOMETHING, but I do not know how to address the position of diff_gov.l3 in the Estimate column (0.37446).
Do you have any suggestion for how to address the value I need (summary(VAR(b, p = 3, type = "const"))$SOMETHING), or is there a way to see the position of every element in the result of summary()?
RESULT:
summary(VAR(b, p = 3, type = "const"))
VAR Estimation Results:
=========================
Endogenous variables: diff_gov, diff_hh
Deterministic variables: const
Sample size: 47
Log Likelihood: 64.057
Roots of the characteristic polynomial:
0.7848 0.7848 0.7722 0.693 0.693 0.5438
Call:
VAR(y = b, p = 3, type = "const")
Estimation results for equation diff_gov:
=========================================
diff_gov = diff_gov.l1 + diff_hh.l1 + diff_gov.l2 + diff_hh.l2 + diff_gov.l3 + diff_hh.l3 + const
Estimate Std. Error t value Pr(>|t|)
diff_gov.l1 0.18760 0.15514 1.209 0.23366
diff_hh.l1 0.06080 0.04760 1.277 0.20889
diff_gov.l2 -0.35682 0.19484 -1.831 0.07450 .
diff_hh.l2 0.14308 0.04650 3.077 0.00376 **
diff_gov.l3 0.37446 0.18893 1.982 0.05438 .
diff_hh.l3 0.02682 0.05061 0.530 0.59910
const 0.02261 0.02707 0.835 0.40849
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07446 on 40 degrees of freedom
Multiple R-Squared: 0.3626, Adjusted R-squared: 0.267
F-statistic: 3.792 on 6 and 40 DF, p-value: 0.004396
Estimation results for equation diff_hh:
========================================
diff_hh = diff_gov.l1 + diff_hh.l1 + diff_gov.l2 + diff_hh.l2 + diff_gov.l3 + diff_hh.l3 + const
Estimate Std. Error t value Pr(>|t|)
diff_gov.l1 1.403165 0.501766 2.796 0.0079 **
diff_hh.l1 0.007256 0.153957 0.047 0.9626
diff_gov.l2 -1.548307 0.630178 -2.457 0.0184 *
diff_hh.l2 0.057511 0.150394 0.382 0.7042
diff_gov.l3 1.294856 0.611078 2.119 0.0404 *
diff_hh.l3 -0.238964 0.163701 -1.460 0.1522
const 0.212912 0.087541 2.432 0.0196 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2408 on 40 degrees of freedom
Multiple R-Squared: 0.2527, Adjusted R-squared: 0.1406
F-statistic: 2.254 on 6 and 40 DF, p-value: 0.05745
Covariance matrix of residuals:
diff_gov diff_hh
diff_gov 0.005544 0.003397
diff_hh 0.003397 0.057993
Correlation matrix of residuals:
diff_gov diff_hh
diff_gov 1.0000 0.1894
diff_hh 0.1894 1.0000
When you're interested in the structure of an object, so that you can extract particular elements, you can use str(x), e.g. str(summary(VAR(b, p = 3, type = "const"))).
For varest objects produced by vars::VAR, you can access the coefficients, p-values, etc. for a particular equation by accessing summary(m)$varresult$eq$coefficients, where m is the varest object, and eq is the variable of interest (diff_gov in your case).
Here's an example:
library(vars)
data(Canada)
m <- VAR(Canada, p = 2, type = "none")
summary(m)$varresult$e$coefficients
# Estimate Std. Error t value Pr(>|t|)
# e.l1 1.62046761 0.15483879 10.4655148 3.011899e-16
# prod.l1 0.17973134 0.06295812 2.8547760 5.584192e-03
# rw.l1 -0.04425592 0.05652496 -0.7829448 4.361581e-01
# U.l1 0.11310425 0.19947288 0.5670157 5.724195e-01
# e.l2 -0.64815156 0.15207587 -4.2620276 5.893211e-05
# prod.l2 -0.11683270 0.06797209 -1.7188334 8.982575e-02
# rw.l2 0.04475537 0.05472427 0.8178341 4.160775e-01
# U.l2 -0.06581206 0.19724901 -0.3336496 7.395876e-01
The resulting object is a matrix that can be subsetted normally. For example, to extract the estimate associated with rw.l1, you can use:
summary(m)$varresult$e$coefficients['rw.l1', 'Estimate']
[1] -0.04425592
So, in your case, you want something like this:
m <- VAR(b, p = 3, type = "const")
summary(m)$varresult$diff_gov$coefficients['diff_gov.l3', 'Estimate']
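To make the str() advice concrete, here is a small sketch on the same Canada example that lists the available components and then reaches the same estimate by two routes. The second route relies on coef() returning, for a VAR fit, a list of per-equation coefficient tables; the variable names s, via_summary and via_coef are just illustrative.

```r
library(vars)

data(Canada)
m <- VAR(Canada, p = 2, type = "none")
s <- summary(m)

names(s)                           # top-level components of the summary
names(s$varresult)                 # one entry per equation
str(s$varresult$e, max.level = 1)  # what a single equation summary holds

# Two routes to the same number
via_summary <- s$varresult$e$coefficients["rw.l1", "Estimate"]
via_coef    <- coef(m)$e["rw.l1", "Estimate"]
all.equal(via_summary, via_coef)
```

Storing summary(m) in a variable first avoids recomputing the summary each time a value is extracted.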

How to extract adjusted R squared in vars package?

This question is closely related to the question "How to extract p-value in var package?"
I would just like to extract the adjusted R-squared from the vars package.
Even though there is a similar question, I don't know how to modify its answer to get the adjusted R-squared. Please help.
I followed the previous example:
library(vars)
library(quantmod)  # provides getSymbols() and periodReturn()
symbols=c('^N225','^FTSE','^GSPC')
getSymbols(symbols,src='yahoo', from="2003-04-28", to="2007-10-29")
period="daily"
A1=periodReturn(N225$N225.Adjusted,period=period)
B1=periodReturn(FTSE$FTSE.Adjusted,period=period)
C1=periodReturn(GSPC$GSPC.Adjusted,period=period)
datap_1<-cbind(A1,B1,C1)
datap_1<-na.omit(datap_1)
datap_1<-(datap_1)^2
vardatap_3<-VAR(datap_1,p=3,type="none")
summary(vardatap_3)
The summary then looks like this:
VAR Estimation Results:
=========================
Endogenous variables: N225, FTSE, SP500
Deterministic variables: none
Sample size: 1055
Log Likelihood: 23637.848
Roots of the characteristic polynomial:
0.8639 0.6224 0.6224 0.5711 0.5711 0.5471 0.5471 0.4683 0.4683
Call:
VAR(y = datap_1, p = 3, type = "none")
Estimation results for equation N225:
=====================================
N225 = N225.l1 + FTSE.l1 + SP500.l1 + N225.l2 + FTSE.l2 + SP500.l2 + N225.l3 + FTSE.l3 + SP500.l3
Estimate Std. Error t value Pr(>|t|)
N225.l1 0.03436 0.03116 1.103 0.270
FTSE.l1 0.47025 0.06633 7.089 2.48e-12 ***
SP500.l1 0.60717 0.07512 8.083 1.74e-15 ***
N225.l2 0.14938 0.03057 4.886 1.19e-06 ***
FTSE.l2 -0.05440 0.06744 -0.807 0.420
SP500.l2 -0.09024 0.07782 -1.160 0.246
N225.l3 0.16809 0.02924 5.749 1.18e-08 ***
FTSE.l3 0.04480 0.06597 0.679 0.497
SP500.l3 -0.01007 0.07941 -0.127 0.899
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0002397 on 1046 degrees of freedom
Multiple R-Squared: 0.3099, Adjusted R-squared: 0.304
F-statistic: 52.2 on 9 and 1046 DF, p-value: < 2.2e-16
Adjusted R-squared values are available in the varresult element of the object returned by summary(). varresult holds one regression summary per equation (here, one per daily return series).
> lapply(summary(vardatap_3)$varresult, "[", "adj.r.squared")
$daily.returns
$daily.returns$adj.r.squared
[1] 0.3039812
$daily.returns.1
$daily.returns.1$adj.r.squared
[1] 0.3201587
$daily.returns.2
$daily.returns.2$adj.r.squared
[1] 0.1972104
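If a plain named vector is more convenient than a nested list, sapply can be used instead of lapply. The following is a minimal sketch on the Canada data set that ships with vars; the object names and the choice of p = 2 and type = "const" are assumptions for illustration only.

```r
library(vars)

data(Canada)
fit <- VAR(Canada, p = 2, type = "const")

eq_summaries <- summary(fit)$varresult

# One value per equation, named after the endogenous variable
adj_r2 <- sapply(eq_summaries, function(eq) eq$adj.r.squared)
r2     <- sapply(eq_summaries, function(eq) eq$r.squared)

print(adj_r2)
print(r2)
```

The same pattern works for any other scalar stored in the equation summaries, such as sigma (the residual standard error).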
