R, ggplot2, jtools: Johnson-Neyman plot error

I am desperately trying to make a Johnson-Neyman plot for the following interaction:
nxsc_20 <-lm(meandec20 ~ centered_nep*centered_selfcontrol + factor(study), data = allstudies_wide)
I get the following output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.28264 0.02260 12.508 < 2e-16 ***
centered_nep 0.08998 0.01192 7.551 7.41e-14 ***
centered_selfcontrol 0.01894 0.01021 1.856 0.06364 .
factor(study)2 0.03462 0.02531 1.368 0.17146
factor(study)3 0.35767 0.02635 13.573 < 2e-16 ***
factor(study)4 0.33224 0.03709 8.956 < 2e-16 ***
centered_nep:centered_selfcontrol 0.03706 0.01300 2.850 0.00443 **
Now I try to make a JN-Plot
johnson_neyman(nxsc_20, meandec20, centered_selfcontrol, alpha = 0.05, plot = TRUE)
and I get the following error:
Error in vmat[pred, pred] : subscript out of bounds
Can anybody help me with this?
Thank you so much!

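The pred = argument is not for the response variable. It is for the focal predictor whose conditional slope is shown on the y-axis of the plot, and modx = is the moderator on the x-axis. Because meandec20 is not a coefficient of the model, indexing the coefficient variance-covariance matrix with it (vmat[pred, pred]) goes out of bounds. This will work: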
library(interactions)
# simulated stand-in for the original data
allstudies_wide <- data.frame(meandec20 = rnorm(500), centered_nep = runif(500),
                              centered_selfcontrol = runif(500),
                              study = sample(1:4, 500, replace = TRUE))
nxsc_20 <- lm(meandec20 ~ centered_nep * centered_selfcontrol + factor(study),
              data = allstudies_wide)
# pred: predictor whose slope is plotted; modx: the moderator
johnson_neyman(model = nxsc_20, pred = centered_nep, modx = centered_selfcontrol,
               alpha = 0.05, plot = TRUE)
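To see why passing the response as pred fails, here is a minimal sketch of the computation a Johnson-Neyman analysis rests on. It uses simulated data and hypothetical variable names nep, sc and y. The slope of the focal predictor at moderator value m is b_pred + b_int * m. Its standard error comes from the coefficient variance-covariance matrix. The region boundaries are the roots of a quadratic. This is the textbook formula, not the interactions package's code. The response never appears among the row names of vcov(), which is what makes vmat[pred, pred] fail.

```r
set.seed(123)
n <- 500
d <- data.frame(nep = rnorm(n), sc = rnorm(n))
d$y <- 0.1 + 0.1 * d$nep + 0.02 * d$sc + 0.2 * d$nep * d$sc + rnorm(n)
fit <- lm(y ~ nep * sc, data = d)

b <- coef(fit)
V <- vcov(fit)
"y" %in% rownames(V)   # FALSE: the response is not a coefficient, so V["y", "y"] fails

# Conditional slope of nep at moderator value m, and its standard error
slope <- function(m) b[["nep"]] + b[["nep:sc"]] * m
se    <- function(m) sqrt(V["nep", "nep"] + m^2 * V["nep:sc", "nep:sc"] +
                          2 * m * V["nep", "nep:sc"])

# JN boundaries: values of m where |slope / se| equals the critical t,
# i.e. the roots of qa * m^2 + qb * m + qc = 0
tc <- qt(0.975, df.residual(fit))   # alpha = 0.05, two-sided
qa <- b[["nep:sc"]]^2 - tc^2 * V["nep:sc", "nep:sc"]
qb <- 2 * (b[["nep"]] * b[["nep:sc"]] - tc^2 * V["nep", "nep:sc"])
qc <- b[["nep"]]^2 - tc^2 * V["nep", "nep"]
disc <- qb^2 - 4 * qa * qc
bounds <- if (disc >= 0) sort((-qb + c(-1, 1) * sqrt(disc)) / (2 * qa)) else numeric(0)
bounds
```

If qa > 0, the slope is significant outside the two bounds. If qa < 0, it is significant between them.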

Related

R - Fixed-effects regression "plm" vs "lm + as.factor()": interpretation of R and R-Squared

I understand from this question that the coefficients are the same whether we use an lm regression with as.factor() or a plm regression with fixed effects.
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
# (Intercept) -0.0522691 0.1422052 -0.368 0.7132
# a 1.9982165 0.0101501 196.866 <2e-16 ***
# b -1.4787359 0.0101666 -145.450 <2e-16 ***
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# a 1.998217 0.010150 196.87 < 2.2e-16 ***
# b -1.478736 0.010167 -145.45 < 2.2e-16 ***
library(lfe)
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# a 1.99822 0.01015 196.9 <2e-16 ***
# b -1.47874 0.01017 -145.4 <2e-16 ***
However, the R and R-squared differ significantly. Which one is correct, and how does the interpretation change between the two models? In my case, the R-squared is much larger for the plm specification and is even negative for the lm + factor one.
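The sketch below shows why the two R-squared values measure different things. It reuses the simulated df from the question, with a seed added. By the Frisch-Waugh-Lovell theorem, regressing two-way demeaned y on demeaned a and b gives the same slopes and the same residuals as the dummy-variable lm. What changes is the total sum of squares in the R-squared denominator. The within version uses only the variation left after removing the fixed effects, not the variation around the grand mean. The demeaning formula is exact only for a balanced panel like this one. That plm's within R-squared is computed this way is an assumption here, so compare against summary(model.b).

```r
set.seed(1)
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)

# Two-way within transformation: x_it - mean_i - mean_t + grand mean
# (exact only because this panel is balanced)
demean2 <- function(x, g1, g2) x - ave(x, g1) - ave(x, g2) + mean(x)

w <- with(df, data.frame(y = demean2(y, region, year),
                         a = demean2(a, region, year),
                         b = demean2(b, region, year)))

fit_dummy  <- lm(y ~ a + b + factor(year) + factor(region), data = df)
fit_within <- lm(y ~ a + b - 1, data = w)  # no intercept: demeaned data has mean 0

# Same slopes and same residuals ...
coef(fit_dummy)[c("a", "b")]
coef(fit_within)

# ... but R-squared = 1 - SSR / SST uses a different SST in each model
r2_dummy  <- summary(fit_dummy)$r.squared   # SST around the grand mean of y
r2_within <- summary(fit_within)$r.squared  # SST of the demeaned y
c(dummy = r2_dummy, within = r2_within)
```

The within SST can never exceed the total SST, so with identical residuals the within R-squared is always the smaller of the two.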

Predict a value using the "output equation" of a heckit-model (sampleSelection)

I estimate a Heckman selection model using the heckit() function from the sampleSelection package.
The model looks as follows:
library(sampleSelection)
Heckman <- heckit(AgencyTRACE ~ SizeCat + log(Amt_Issued) + log(daysfromissuance) + log(daystomaturity) + EoW + dMon + EoM + VIX_95_Dummy + quarter,
                  Avg_Spread_Choi ~ SizeCat + log(Amt_Issued) + log(daysfromissuance) + log(daystomaturity) + VIX_95_Dummy + TresholdHYIG_II,
                  data = heckmandata, method = "2step")
The summary generates a probit selection equation and an outcome equation - see below:
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2019085 observations (1915401 censored and 103684 observed)
26 free parameters (df = 2019060)
Probit selection equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.038164 0.043275 0.882 0.378
SizeCat2 0.201571 0.003378 59.672 < 2e-16 ***
SizeCat3 0.318331 0.008436 37.733 < 2e-16 ***
log(Amt_Issued) -0.099472 0.001825 -54.496 < 2e-16 ***
log(daysfromissuance) 0.079691 0.001606 49.613 < 2e-16 ***
log(daystomaturity) -0.036434 0.001514 -24.066 < 2e-16 ***
EoW 0.021169 0.003945 5.366 8.04e-08 ***
dMon -0.003409 0.003852 -0.885 0.376
EoM 0.008937 0.007000 1.277 0.202
VIX_95_Dummy1 0.088558 0.006521 13.580 < 2e-16 ***
quarter2019.2 -0.092681 0.005202 -17.817 < 2e-16 ***
quarter2019.3 -0.117021 0.005182 -22.581 < 2e-16 ***
quarter2019.4 -0.059833 0.005253 -11.389 < 2e-16 ***
quarter2020.1 -0.005230 0.004943 -1.058 0.290
quarter2020.2 0.073175 0.005080 14.406 < 2e-16 ***
Outcome equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.29436 6.26019 7.395 1.41e-13 ***
SizeCat2 -25.63433 0.79836 -32.109 < 2e-16 ***
SizeCat3 -34.25275 1.48030 -23.139 < 2e-16 ***
log(Amt_Issued) -0.38051 0.39506 -0.963 0.33547
log(daysfromissuance) 0.02452 0.34197 0.072 0.94283
log(daystomaturity) 7.92338 0.24498 32.343 < 2e-16 ***
VIX_95_Dummy1 -2.34875 0.89133 -2.635 0.00841 **
TresholdHYIG_II1 10.36993 1.07267 9.667 < 2e-16 ***
Multiple R-Squared:0.0406, Adjusted R-Squared:0.0405
Error terms:
Estimate Std. Error t value Pr(>|t|)
invMillsRatio -23.8204 3.6910 -6.454 1.09e-10 ***
sigma 68.5011 NA NA NA
rho -0.3477 NA NA NA
Now I'd like to estimate a value using the outcome equation. I'd like to predict Spread_Choi_All using the following data:
newdata = data.frame(SizeCat=as.factor(1),
Amt_Issued=50*1000000,
daysfromissuance=5*365,
daystomaturity=5*365,
VIX_95_Dummy=as.factor(0),
TresholdHYIG_II=as.factor(0))
SizeCat is a categorical/factor variable with the value 1, 2 or 3.
I have tried various ways, e.g.
predict(Heckman, part = "outcome", newdata = newdata)
I aim to predict a value (with the data from newdata) using the outcome equation, including the invMillsRatio. Is there a way to predict a value from the outcome equation?
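One way to see what "including the invMillsRatio" means is to do the two steps by hand. This minimal sketch uses simulated data. The variables z1, x1, s and y are hypothetical and not taken from heckmandata.

- The unconditional prediction from the outcome equation is x'β.
- The prediction conditional on the observation being selected adds β_λ·λ(z'γ), where λ = φ/Φ is the inverse Mills ratio evaluated at the selection index z'γ.

The conditional prediction therefore also needs the selection-equation regressors for the new observation. This is the textbook two-step estimator, not sampleSelection's internal code. The step-2 lm standard errors are not corrected for the estimated IMR regressor, but the point predictions are unaffected.

```r
set.seed(42)
n  <- 5000
z1 <- rnorm(n)                        # appears only in the selection equation
x1 <- rnorm(n)
u  <- rnorm(n)
e  <- 0.5 * u + sqrt(0.75) * rnorm(n) # errors correlated across the two equations
s  <- (0.3 + z1 + 0.5 * x1 + u) > 0   # outcome observed only if selected
y  <- ifelse(s, 1 + 2 * x1 + e, NA)
dat <- data.frame(s, y, z1, x1)

# Step 1: probit selection equation, then inverse Mills ratio at z'gamma
sel <- glm(as.numeric(s) ~ z1 + x1, family = binomial(link = "probit"), data = dat)
zg  <- predict(sel, type = "link")
dat$imr <- dnorm(zg) / pnorm(zg)

# Step 2: outcome equation on the selected rows, IMR as an extra regressor
out <- lm(y ~ x1 + imr, data = dat, subset = s)
b   <- coef(out)

# Predictions for one new observation
new    <- data.frame(z1 = 0.2, x1 = 1)
xb     <- b[["(Intercept)"]] + b[["x1"]] * new$x1      # E[y | x]
zg_new <- predict(sel, newdata = new, type = "link")
lambda <- dnorm(zg_new) / pnorm(zg_new)
cond   <- xb + b[["imr"]] * lambda                     # E[y | x, selected]
c(unconditional = xb, conditional = unname(cond))
```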

Change Y intercept in Poisson GLM R

Background: I have the following data that I run a glm function on:
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
#make into df
df = data.frame(location, count)
#poisson
summary(glm(count ~ location, family=poisson))
Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6376 0.1622 22.424 < 2e-16 ***
locationBos 0.4055 0.2094 1.936 0.0529 .
locationDH 1.4744 0.1798 8.199 2.43e-16 ***
Problem: I would like to change the (Intercept) so that all my values are relative to Bos.
I looked at the questions "Change reference group using glm with binomial family" and "How to force R to use a specified factor level as reference in a regression?". I tried their method, but it did not work, and I am not sure why.
Tried:
df1 <- within(df, location <- relevel(location, ref = 1))
#poisson
summary(glm(count ~ location, family=poisson, data = df1))
Desired Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) ...
locationBeth ...
locationDH ...
Question: How do I solve this problem?
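I think your problem is that you are modifying the data frame but not using it in your model. Use the data argument so that glm() uses the data frame instead of the vectors in your workspace. Also, ref = 1 picks the first level, which is "Beth" (levels are sorted alphabetically). That is the reference level you already had, so give the level by name instead.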
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
# make into df
df = data.frame(location, count)
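Note that location by itself is a character vector. Before R 4.0.0, data.frame() converted it to a factor by default. Since R 4.0.0 it stays a character vector (stringsAsFactors = FALSE), and relevel() only works on factors. Wrapping the column in factor() works in both cases, and relevel() then sets the reference level.
df$location = relevel(factor(df$location), ref = "Bos") # set Bos as reference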
summary(glm(count ~ location, family=poisson, data = df))
# Call:
# glm(formula = count ~ location, family = poisson, data = df)
# ...
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 4.0431 0.1325 30.524 < 2e-16 ***
# locationBeth -0.4055 0.2094 -1.936 0.0529 .
# locationDH 1.0689 0.1535 6.963 3.33e-12 ***
# ...
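Here is a minimal check using the data from the question. With one count per location, this Poisson model is saturated. After releveling, each coefficient is therefore just a log ratio of counts relative to Bos, and the intercept is log(57).

```r
location <- c("DH", "Bos", "Beth")
count <- c(166, 57, 38)
df <- data.frame(location = factor(location), count = count)
df$location <- relevel(df$location, ref = "Bos")
levels(df$location)   # "Bos" "Beth" "DH"

fit <- glm(count ~ location, family = poisson, data = df)

# Saturated model: coefficients are log count ratios relative to Bos
by_hand <- c(log(57), log(38 / 57), log(166 / 57))
cbind(glm = coef(fit), by_hand = by_hand)
```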

Include the name of the dependent variable in a model summary retrospectively

I have a list named "mylist" that contains gam outputs. The summary of the first element is:
> summary(mylist[[1]][[1]])
Family: quasipoisson
Link function: log
Formula:
cardva ~ s(trend, k = 11 * 6, fx = T, bs = "cr") + s(temp_01, k = 6, fx = F, bs = "cr") + rh_01 + as.factor(dow) + s(fluepi, k = 4, fx = F, bs = "cr") + as.factor(holiday) + Lag(pm1010, 0)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1584139 0.0331388 95.309 < 2e-16 ***
rh_01 0.0005441 0.0004024 1.352 0.17639
as.factor(dow)2 0.0356757 0.0127979 2.788 0.00533 **
as.factor(dow)3 0.0388823 0.0128057 3.036 0.00241 **
as.factor(dow)4 0.0107302 0.0129014 0.832 0.40561
as.factor(dow)5 0.0243382 0.0128705 1.891 0.05867 .
as.factor(dow)6 0.0277954 0.0128360 2.165 0.03040 *
as.factor(dow)7 0.0275593 0.0127373 2.164 0.03053 *
as.factor(holiday)1 0.0444349 0.0147219 3.018 0.00255 **
Lag(pm1010, 0) -0.0010816 0.0042891 -0.252 0.80091
After unlisting the list I have extracted the coefficients of the linear terms for the first list:
> head(plist)
[[1]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1584139271 0.0331388386 95.3085280 0.000000000
rh_01 0.0005441175 0.0004024202 1.3521128 0.176392590
as.factor(dow)2 0.0356757100 0.0127979429 2.7876128 0.005327293
as.factor(dow)3 0.0388823055 0.0128056733 3.0363343 0.002405504
as.factor(dow)4 0.0107302325 0.0129013816 0.8317119 0.405606249
as.factor(dow)5 0.0243382447 0.0128704711 1.8910143 0.058672841
as.factor(dow)6 0.0277953708 0.0128359850 2.1654256 0.030396240
as.factor(dow)7 0.0275592574 0.0127372874 2.1636677 0.030531063
as.factor(holiday)1 0.0444348611 0.0147218816 3.0182868 0.002553265
Lag(pm1010, 0) -0.0010816252 0.0042890866 -0.2521808 0.800910389
My question is: is it possible to include the name of the dependent variable (in this example cardva) as part of plist?
What I want to achieve is (output deliberately reduced)
cardva Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1584139271 0.0331388386 95.3085280 0.000000000
rh_01 0.0005441175 0.0004024202 1.3521128 0.176392590
as.factor(dow)2 0.0356757100 0.0127979429 2.7876128 0.005327293
or
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1584139271 0.0331388386 95.3085280 0.000000000
rh_01 0.0005441175 0.0004024202 1.3521128 0.176392590
as.factor(dow)7 0.0275592574 0.0127372874 2.1636677 0.030531063
as.factor(holiday)1 0.0444348611 0.0147218816 3.0182868 0.002553265
cardva_Lag(pm1010, 0) -0.0010816252 0.0042890866 -0.2521808 0.800910389
Two options. First, name the elements of the list; they are then printed with that name:
names(plist)[1] <- 'cardva'
plist[1]
$cardva
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1584139271 0.0331388386 95.3085280 0.000000000
rh_01 0.0005441175 0.0004024202 1.3521128 0.176392590
as.factor(dow)2 0.0356757100 0.0127979429 2.7876128 0.005327293
as.factor(dow)3 0.0388823055 0.0128056733 3.0363343 0.002405504
as.factor(dow)4 0.0107302325 0.0129013816 0.8317119 0.405606249
as.factor(dow)5 0.0243382447 0.0128704711 1.8910143 0.058672841
as.factor(dow)6 0.0277953708 0.0128359850 2.1654256 0.030396240
as.factor(dow)7 0.0275592574 0.0127372874 2.1636677 0.030531063
as.factor(holiday)1 0.0444348611 0.0147218816 3.0182868 0.002553265
Lag(pm1010, 0) -0.0010816252 0.0042890866 -0.2521808 0.800910389
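Or prefix the row name of the last coefficient with the dependent variable, and write the modified table back into the list:
temp <- plist[[1]]
rownames(temp)[nrow(temp)] <- paste0("cardva_", rownames(temp)[nrow(temp)])
plist[[1]] <- temp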
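If mylist holds many models, the names can be taken from each model's formula instead of being typed by hand. This minimal sketch uses lm fits on simulated data as a stand-in; the second response respva and the data are hypothetical. formula() should also work on mgcv gam objects. For a gam, the parametric coefficient table is, as far as I know, summary(m)$p.table rather than $coefficients.

```r
# Hypothetical stand-in for mylist: fitted models with different responses
set.seed(1)
d <- data.frame(x = rnorm(50))
d$cardva <- 3 + 0.5 * d$x + rnorm(50)
d$respva <- 1 - 0.2 * d$x + rnorm(50)   # hypothetical second outcome
models <- list(lm(cardva ~ x, data = d), lm(respva ~ x, data = d))

# Coefficient tables, like plist
plist <- lapply(models, function(m) summary(m)$coefficients)

# Left-hand side of each model formula = name of the dependent variable
responses <- vapply(models, function(m) deparse(formula(m)[[2]]), character(1))

# Option 1: name the list elements
names(plist) <- responses

# Option 2: prefix every row name with its response
plist2 <- Map(function(tab, r) {
  rownames(tab) <- paste0(r, "_", rownames(tab))
  tab
}, plist, responses)
```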

How to read an item from summary in R

I am using an APARCH/GARCH model (package "fGarch"). I want to read, and use later, quantities such as the AIC and the t-values of the coefficients shown in the summary of the model fit. How can I do this?
m3<-(garchFit(~arma(1,0)+aparch(1,1), cond.dist= "sged" ,data=t2, trace=FALSE))
summary(m3)
Title:
GARCH Modelling
Call:
garchFit(formula = ~arma(1, 0) + aparch(1, 1), data = t2, cond.dist = "sged",
trace = FALSE)
Mean and Variance Equation:
data ~ arma(1, 0) + aparch(1, 1)
[data = t2]
Conditional Distribution:
sged
Coefficient(s):
mu ar1 omega alpha1 gamma1 beta1 delta skew shape
0.00063936 0.07745422 0.00116542 0.24170185 0.19179650 0.74430731 1.11902269 1.06401615 1.23013925
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 0.0006394 0.0004789 1.335 0.181828
ar1 0.0774542 0.0256070 3.025 0.002489 **
omega 0.0011654 0.0003097 3.763 0.000168 ***
alpha1 0.2417019 0.0368264 6.563 5.26e-11 ***
gamma1 0.1917965 0.0699436 2.742 0.006104 **
beta1 0.7443073 0.0383066 19.430 < 2e-16 ***
delta 1.1190227 0.2569665 4.355 1.33e-05 ***
skew 1.0640162 0.0295095 36.057 < 2e-16 ***
shape 1.2301392 0.0592616 20.758 < 2e-16 ***
Information Criterion Statistics:
AIC BIC SIC HQIC
-4.835325 -4.803583 -4.835395 -4.823503
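I think you'll have to extract those from the object returned by garchFit, not from its summary. garchFit returns an S4 object, so start by looking at its slots:
> slotNames(m3)
(attributes(m3) shows the same information.) Then you can access, for example, the t-values stored in the fit slot by doing
> m3@fit$tval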
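garchFit() returns an S4 object of class fGARCH. Its components are slots, read with @, and the fit slot is itself a list, read with $. A # starts a comment in R, so m3#fit$tval would simply print m3.

fGarch may not be installed, so this sketch uses a hypothetical toy class that mimics that layout, filled with a few values from the output above. With fGarch itself, names(m3@fit) should list what is available. I believe this includes tval, matcoef (the coefficient table) and ics (AIC, BIC, SIC, HQIC), but that is an assumption to check on your version.

```r
library(methods)

# Hypothetical toy class mimicking the fGARCH layout: an S4 object whose
# "fit" slot is an ordinary list
setClass("toyFit", slots = c(fit = "list"))

obj <- new("toyFit", fit = list(
  tval = c(mu = 1.335, ar1 = 3.025, omega = 3.763),
  ics  = c(AIC = -4.835325, BIC = -4.803583)
))

slotNames(obj)          # "fit"
names(obj@fit)          # "tval" "ics"
obj@fit$tval            # slot with @, then list element with $
aic <- obj@fit$ics[["AIC"]]
```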
