R fGarch: presample matrix for garchSpec()

I am trying to specify a GARCH model with the function fGarch::garchSpec(), and I need a specific presample. The manual defines it as:
presample: a numeric three column matrix with start values for the
series, for the innovations, and for the conditional
variances.
But I am pretty sure that this is not the correct order. After reading the manuals and the code of the functions garchFit, garchSpec, and garchSim, I am still quite confused.
The question is: how exactly do I build the presample matrix?

You don't need to set the presample argument. garchSpec() will supply a "good" guess, and for estimating the parameters it will not matter. If you want to simulate data, I would just make sure the burn-in, n.start, is large enough.
Let's look at an example:
library(fGarch)
## First we simulate some data without setting presample:
# we set up the model by spec:
set.seed(911)
spec <- garchSpec(model = list(mu = 0.02, omega = 0.05, alpha = 0.2, beta = 0.75))
# then simulate our GARCH(1,1) model:
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
And estimates:
> garchFit(~ garch(1, 1), data = garchSim)
Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)
mu      -0.02196     0.05800   -0.379   0.7049
omega    0.03567     0.02681    1.331   0.1833
alpha1   0.12074     0.04952    2.438   0.0148 *
beta1    0.84527     0.05597   15.103   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
 -265.8417   normalized: -1.329209
Let us now try to add a very extreme presample. In the above model (and with this seed) the presample was:
> spec@presample
Presample:
  time          z h    y
1    0 -0.4324072 1 0.02
Now we replace it with c(100, 0.1, 0.1). Since my model is a GARCH(1,1) without any ARMA part, I only need to set 3 values, as described in the documentation, ?garchSpec. After updating spec we estimate the same model:
set.seed(911)
spec@presample <- matrix(c(100, 0.1, 0.1), ncol = 3)
garchFit(~ garch(1, 1), data = garchSim)
with the same output:
Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)
mu      -0.02196     0.05800   -0.379   0.7049
omega    0.03567     0.02681    1.331   0.1833
alpha1   0.12074     0.04952    2.438   0.0148 *
beta1    0.84527     0.05597   15.103   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
 -265.8417   normalized: -1.329209
The likelihood and estimates are identical, but notice what happens when we simulate with the new spec:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
The extreme presample we supplied has messed up our nice simulation. But by increasing the burn-in, n.start, we get:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 100)
plot(garchSim)
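Back to the original question about the column order: note that the presample slot printed above is labelled time, z, h, y. That labelling suggests (contrary to a literal reading of the manual) that the three columns you supply are interpreted as z (the innovations), h (the conditional variances) and y (the series values). If in doubt, a minimal sketch to inspect what a given spec actually holds, reusing the spec from above:
spec@presample                                        # printed with the labels z, h, y
spec@presample <- matrix(c(100, 0.1, 0.1), ncol = 3)  # assign a presample ...
spec@presample                                        # ... and read it back to confirm what was stored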

Related

nls model not showing y0 in the output, only a and b

I am trying to get the nonlinear formula with y0, a, and b for my curve so I can plot it on my graph. The summary(nls.mod) output does not show the y0 that I need to plot the curve, and I am not sure why, as I have tried everything. The code is below:
# BH this version of plot is used for diagnostic plots for
# BH residuals of a linear model, i.e. using lm.
plot(mdl3 <- lm(ETR ~ wp_Mpa + I(wp_Mpa^2) + I(wp_Mpa^3), data = dat3))
prd <- data.frame(x = seq(-4, 0, by = 0.5))
result <- prd
result$mdl3 <- predict(mdl3, newdata = prd)
# BH use nls to fit this model y0+a*exp(b*x)
nls.mod <- nls(ETR ~ y0 + a*exp(b*wp_Mpa), start=c(a = -4, b = 0), data=dat3.no_na)
summary(nls.mod)
and here is the output:
Formula: ETR ~ y0 + a * exp(b * wp_Mpa)
Parameters:
  Estimate Std. Error t value Pr(>|t|)
a 85.85515    8.62005   9.960   <2e-16 ***
b  0.14157    0.07444   1.902   0.0593 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 58.49 on 134 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.515e-06
As you can see, for some reason only a and b show up, but y0 is supposed to appear above them.
I tried reassigning the variables, and that just kept giving the same output. I contacted a stats consultant, and they just said I needed to change the variables, but that still didn't work.
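A likely explanation, as a sketch (dat3.no_na and the a/b start values come from the question; the y0 start value of 0 is my own arbitrary assumption): nls() only estimates the parameters that are named in start. Any other symbol in the formula is looked up in data and then in the workspace and treated as a fixed constant, which is why no row for y0 appears in the summary; y0 must already exist as an object in the workspace, otherwise nls() would have stopped with "object 'y0' not found". Adding y0 to start should make it show up:
# sketch: name y0 in `start` so nls() treats it as a parameter to estimate
nls.mod <- nls(ETR ~ y0 + a * exp(b * wp_Mpa),
               start = c(y0 = 0, a = -4, b = 0),  # y0 start value is a guess
               data = dat3.no_na)
summary(nls.mod)  # the Parameters table should now include a row for y0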

Difference between glm with sandwich package and glmrob for Poisson distribution

I'm having some doubts with glmrob (package: robustbase). I want to use glmrob to get the same results I was getting with glm + sandwich.
I was writing:
p_3 <- glm(formula = var1 ~ var2,
           family = poisson(link = log),
           data = p3,
           na.action = na.omit)
coeftest(p_3, vcov = sandwich)
Both variables are categorical. var1 has two categories and var2 has four.
Now I'm trying to use glmrob to get everything in the same step:
p_2 <- glmrob(formula = var1 ~ var2,
              family = poisson(link = log),
              data = p3,
              na.action = na.omit,
              method = "Mqle",
              control = glmrobMqle.control(tcc = 1.2))
summary(p_2) and summary(p_3) don't yield the same results, so I think I need to make some changes to these two lines: method = "Mqle", control = glmrobMqle.control(tcc = 1.2), but I don't really know which ones.
Maybe I have to use method = "MT", as it works for Poisson models, but I'm not sure.
The outputs:
With glmrob:
summary(p_2)
Call: glmrob(formula = dummy28_n ~ coexistencia, family = poisson(link = log), data = p3, na.action = na.omit, method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
Coefficients:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                                -1.7020     0.5947  -2.862  0.00421 **
coexistenciaPobreza energética              1.1627     0.9176   1.267  0.20514
coexistenciaInseguridad residencial         0.7671     0.6930   1.107  0.26830
coexistenciaCoexistencia de inseguridades   1.3688     0.6087   2.249  0.02453 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
143 weights are ~= 1. The remaining 3 ones are
    71    124    145
0.6266 0.6266 0.6266
Number of observations: 146
Fitted by method ‘Mqle’ (in 8 iterations)
(Dispersion parameter for poisson family taken to be 1)
No deviance values available
Algorithmic parameters:
   acc    tcc
0.0001 1.2000
 maxit
    50
test.acc
  "coef"
with glm and sandwich:
coef <- coeftest(p_3, vcov = sandwich)
coef
z test of coefficients:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                               -1.79176    0.52705 -3.3996 0.0006748 ***
coexistenciaPobreza energética             1.09861    0.72648  1.5122 0.1304744
coexistenciaInseguridad residencial        0.69315    0.60093  1.1535 0.2487189
coexistenciaCoexistencia de inseguridades  1.32972    0.53259  2.4967 0.0125349 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
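A note on why these two approaches will generally not agree (a sketch using the question's variable names): glm() plus sandwich keeps the classical maximum-likelihood coefficients and only replaces their standard errors with robust ones after the fit, whereas glmrob() re-estimates the coefficients themselves, downweighting observations with large residuals (tcc controls how aggressively). Both the estimates and the standard errors can therefore differ, and no setting of method or tcc will in general reproduce the sandwich results exactly:
library(sandwich)
library(lmtest)
library(robustbase)

# (1) classical ML fit; only the covariance matrix is robustified afterwards
p_ml <- glm(var1 ~ var2, family = poisson(link = log), data = p3)
coeftest(p_ml, vcov = sandwich)

# (2) robust (re-weighted) estimation; the coefficients themselves change
p_rob <- glmrob(var1 ~ var2, family = poisson(link = log), data = p3,
                method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
summary(p_rob)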

How to calculate by hand standard errors and t statistics of minpack.lm::nlsLM?

Let's consider this code as an example:
a <- 10
b <- 2
c <- 1.05
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
x <- runif(100)
data <- data.frame(X = x, Y = a + b/c * (((1 - x)^-c) - 1))
fit_sp <- minpack.lm::nlsLM(formula = Y ~ a + b/c * (((1 - X)^-c) - 1),
                            data = data, start = c(a = 0, b = 0.1, c = 0.01),
                            control = nls.control(maxiter = 1000),
                            lower = c(0.0001, 0.0001, 0.0001))
fit_sp
Nonlinear regression model
  model: Y ~ a + b/c * (((1 - X)^-c) - 1)
   data: data
    a     b     c
10.00  2.00  1.05
 residual sum-of-squares: 1.507e-26
Number of iterations to convergence: 13
Achieved convergence tolerance: 1.49e-08
summary(fit_sp)
Formula: Y ~ a + b/c * (((1 - X)^-c) - 1)
Parameters:
   Estimate Std. Error   t value Pr(>|t|)
a 1.000e+01  1.516e-15 6.598e+15   <2e-16 ***
b 2.000e+00  4.372e-16 4.574e+15   <2e-16 ***
c 1.050e+00  5.319e-17 1.974e+16   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.246e-14 on 97 degrees of freedom
Number of iterations to convergence: 13
Achieved convergence tolerance: 1.49e-08
I know that nonlinear least squares finds the coefficients that minimize the sum of squared residuals. But how can I obtain, by hand, the standard errors and t-statistics of the parameter estimates?
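A sketch of the by-hand computation (reusing fit_sp and data from above; note that this example fit is essentially exact, so the residual sum of squares is at machine-precision level): for nonlinear least squares the covariance of the estimates is approximated by sigma^2 * (J'J)^(-1), where J is the Jacobian of the model function evaluated at the estimates and sigma^2 = RSS / (n - p):
# by-hand standard errors and t statistics for an nls/nlsLM fit
J      <- fit_sp$m$gradient()             # n x p Jacobian at the solution
rss    <- sum(residuals(fit_sp)^2)        # residual sum of squares
n      <- nrow(data)                      # number of observations
p      <- length(coef(fit_sp))            # number of parameters
sigma2 <- rss / (n - p)                   # residual variance estimate
covmat <- sigma2 * solve(t(J) %*% J)      # approximate covariance of estimates
se     <- sqrt(diag(covmat))              # standard errors
tval   <- coef(fit_sp) / se               # t statistics (H0: parameter = 0)
pval   <- 2 * pt(-abs(tval), df = n - p)  # two-sided p-values
cbind(Estimate = coef(fit_sp), `Std. Error` = se,
      `t value` = tval, `Pr(>|t|)` = pval)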

How is the estimated degrees of freedom in a GAM determined?

I was working on my GAM model:
y <- c(0.0000943615, 0.0074918919, 0.0157332851, 0.0783308615,
       0.1546375803, 0.5558444681, 0.8583806898, 0.9617216854,
       0.9848004112, 0.9964662546)
x <- log(c(0.05, 0.1, 0.15, 0.2, 0.4, 0.8, 1.6, 3.2, 4.5, 6.4))
fit.gam <- mgcv::gam(y ~ s(x,k=-1, bs="cr"))
summary(fit.gam)
coef(fit.gam)
The model summary tells me that the edf of s(x) is 6.893 with p-value = 0.0017:
> summary(fit.gam)
Family: gaussian
Link function: identity
Formula:
y ~ s(x, k = -1, bs = "cr")
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.46135    0.00629   73.34 0.000126 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
       edf Ref.df     F p-value
s(x) 6.893  7.902 585.7  0.0017 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.998   Deviance explained = 100%
GCV = 0.0018783   Scale est. = 0.0003957   n = 10
The model still contains 9 coefficients for s(x):
> coef(fit.gam)
(Intercept) s(x).1 s(x).2 s(x).3 s(x).4 s(x).5 s(x).6
0.4613501 -0.3450787 -0.3229509 -0.2895761 -0.1783854 0.1976228 0.5040469
s(x).7 s(x).8 s(x).9
0.6135856 0.6338979 0.6470116
My question: I understand that the GAM penalizes the variable x to some extent, so that the estimated degrees of freedom of s(x) is 6.893 < 9, but from the coefficients of s(x) it is hard for me to tell which basis functions are penalized. How should I understand the relationship between the edf and the coefficients of s(x)? Thanks!
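One way to see the connection, as a sketch: mgcv does not zero out individual basis coefficients; the smoothing penalty shrinks them jointly, so you cannot read the penalization off single coefficients. What the fitted gam object does expose is a per-coefficient effective-degrees-of-freedom vector, whose entries sum to the reported edf:
fit.gam$edf          # one edf value per coefficient (intercept + 9 basis functions)
sum(fit.gam$edf)     # total model edf, here about 7.893
sum(fit.gam$edf[-1]) # the smooth's share, here about 6.893 as in the summary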

Variable Selection with mgcv

Is there a way of automating variable selection for a GAM in R, similar to step? I've read the documentation of step.gam and selection.gam, but I have yet to see an answer with code that works. Additionally, I've tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.
I've theorized that I could create a step model and then use those variables to create the GAM, but that does not seem computationally efficient.
Example:
library(mgcv)
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
                  pred1 = rnorm(100, 10, 1),
                  pred2 = rnorm(100, 0, 1),
                  pred3 = rnorm(100, 0, 1),
                  pred4 = rnorm(100, 0, 1))
model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
             data = dat, method = "REML", select = TRUE)
summary(model)
#Family: gaussian
#Link function: identity
#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)
#Parametric coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.02267    0.08426   0.269    0.788
#Approximate significance of smooth terms:
#            edf Ref.df     F p-value
#s(pred1) 0.8770      9 0.212  0.1174
#s(pred2) 1.8613      9 0.638  0.0374 *
#s(pred3) 0.5439      9 0.133  0.1406
#s(pred4) 0.4504      9 0.091  0.1775
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R-sq.(adj) = 0.0887 Deviance explained = 12.3%
#-REML = 129.06 Scale est. = 0.70996 n = 100
Marra and Wood (2011, Computational Statistics and Data Analysis 55: 2372-2387) compare various approaches for feature selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. This can be activated in mgcv::gam() with the select = TRUE argument, or by using a shrinkage basis such as bs = "ts" (the shrinkage version of the thin plate basis, which does not need select = TRUE), as in any of the following variations:
model <- gam(rsp ~ s(pred1, bs = "ts") + s(pred2, bs = "ts") + s(pred3, bs = "ts") + s(pred4, bs = "ts"),
             data = dat, method = "REML")
model <- gam(rsp ~ s(pred1, bs = "cr") + s(pred2, bs = "cr") + s(pred3, bs = "cr") + s(pred4, bs = "cr"),
             data = dat, method = "REML", select = TRUE)
model <- gam(rsp ~ s(pred1, bs = "cc") + s(pred2, bs = "cc") + s(pred3, bs = "cc") + s(pred4, bs = "cc"),
             data = dat, method = "REML", select = TRUE)
model <- gam(rsp ~ s(pred1, bs = "tp") + s(pred2, bs = "tp") + s(pred3, bs = "tp") + s(pred4, bs = "tp"),
             data = dat, method = "REML", select = TRUE)
In addition to specifying select = TRUE in your call to function gam, you can increase the value of argument gamma to get stronger penalization. For example, we generate some data:
library("mgcv")
set.seed(2)
dat <- gamSim(1, n=400, dist="normal", scale=5)
## Gu & Wahba 4 term additive model
We fit a GAM with 'standard' penalization and variable selection:
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML")
summary(b)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.890      0.246   32.07   <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
##         edf Ref.df      F  p-value
## s(x0) 1.363  1.640  0.804   0.3174
## s(x1) 1.681  2.088 11.309 1.35e-05 ***
## s(x2) 5.931  7.086 16.240  < 2e-16 ***
## s(x3) 1.002  1.004  4.102   0.0435 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.253   Deviance explained = 27.1%
## -REML = 1212.5   Scale est. = 24.206   n = 400
par(mfrow = c(2, 2))
plot(b)
We fit a GAM with stronger penalization and variable selection:
b2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML", select = TRUE, gamma = 7)
summary(b2)
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   7.8898     0.2604    30.3   <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
##             edf Ref.df     F p-value
## s(x0) 5.330e-05      9 0.000  0.1868
## s(x1) 5.427e-01      9 0.967 7.4e-05 ***
## s(x2) 1.549e+00      9 6.210 < 2e-16 ***
## s(x3) 6.155e-05      9 0.000  0.0812 .
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.163   Deviance explained = 16.7%
## -REML = 179.46   Scale est. = 27.115   n = 400
plot(b2)
According to the documentation, increasing the value of gamma produces smoother models, because it multiplies the effective degrees of freedom in the GCV or UBRE/AIC criterion.
A possible downside is that all non-linear effects will be shrunk towards linear effects, and all linear effects towards zero. This is what we observe in the plots and output above: with the higher value of gamma, some effects are practically penalized out (edf values close to 0, F-values of 0), while the other effects are closer to linear (edf values closer to 1).
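To make that comparison concrete, here is a small sketch tabulating the smooth-term edf values of the two fits side by side (assuming the edf component returned by summary.gam, with b and b2 from above):
# per-smooth edf under standard vs. strong (gamma = 7) penalization
data.frame(term         = c("s(x0)", "s(x1)", "s(x2)", "s(x3)"),
           edf_standard = round(summary(b)$edf, 3),
           edf_gamma7   = round(summary(b2)$edf, 3))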
