I'm having some doubts about glmrob (package: robustbase). I want to use glmrob to get the same results as I was getting with glm + sandwich.
I was writing:
library(sandwich)
library(lmtest)  # provides coeftest()

p_3 <- glm(formula = var1 ~ var2,
           family = poisson(link = log),
           data = p3,
           na.action = na.omit)
coeftest(p_3, vcov = sandwich)
Both variables are categorical. var1 has two categories and var2 has four.
Now I'm trying to use glmrob to get everything in the same step:
p_2 <- glmrob(formula = var1 ~ var2,
              family = poisson(link = log),
              data = p3,
              na.action = na.omit,
              method = "Mqle",
              control = glmrobMqle.control(tcc = 1.2))
summary(p_2) and summary(p_3) don't yield the same results, so I think I need to change these two lines: method = "Mqle", control = glmrobMqle.control(tcc = 1.2), but I don't really know how.
Maybe I have to use method = "MT", as it works for Poisson models, but I'm not sure.
The outputs:
With glmrob:
summary(p_2)
Call: glmrob(formula = dummy28_n ~ coexistencia, family = poisson(link = log), data = p3, na.action = na.omit, method = "Mqle", control = glmrobMqle.control(tcc = 1.2))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7020 0.5947 -2.862 0.00421 **
coexistenciaPobreza energética 1.1627 0.9176 1.267 0.20514
coexistenciaInseguridad residencial 0.7671 0.6930 1.107 0.26830
coexistenciaCoexistencia de inseguridades 1.3688 0.6087 2.249 0.02453 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Robustness weights w.r * w.x:
143 weights are ~= 1. The remaining 3 ones are
71 124 145
0.6266 0.6266 0.6266
Number of observations: 146
Fitted by method ‘Mqle’ (in 8 iterations)
(Dispersion parameter for poisson family taken to be 1)
No deviance values available
Algorithmic parameters:
   acc    tcc  maxit
0.0001 1.2000     50
test.acc
  "coef"
With glm and sandwich:
coef <- coeftest(p_3, vcov = sandwich)
coef
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.79176 0.52705 -3.3996 0.0006748 ***
coexistenciaPobreza energética 1.09861 0.72648 1.5122 0.1304744
coexistenciaInseguridad residencial 0.69315 0.60093 1.1535 0.2487189
coexistenciaCoexistencia de inseguridades 1.32972 0.53259 2.4967 0.0125349 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
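A note on why the two outputs differ (my explanation, with a sketch that is not from the original post): glmrob() fits a different, outlier-downweighting estimator, so both its coefficients and its standard errors can legitimately differ from glm(); glm() + sandwich keeps the maximum-likelihood coefficients and only replaces the covariance matrix, so matching the two outputs exactly is not to be expected. If the goal is just the one-step convenience, a small wrapper around glm() is enough; robust_poisson is a hypothetical helper name, not a robustbase function:
library(sandwich)
library(lmtest)
# Hypothetical one-step wrapper: keep the ML fit from glm() and
# only swap in the sandwich covariance for the coefficient tests.
robust_poisson <- function(formula, data) {
  fit <- glm(formula, family = poisson(link = log), data = data,
             na.action = na.omit)
  coeftest(fit, vcov = sandwich)
}
robust_poisson(var1 ~ var2, data = p3)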
I am trying to get the nonlinear formula of y0, a, and b for my curve so I can plot it on my graph. The summary(nls.mod) output does not show the y0 that I need to plot the curve, and I am not sure why, as I have tried everything. The code is below:
# BH this version of plot is used for diagnostic plots for
# BH residuals of a linear model, i.e. using lm.
plot(mdl3 <- lm(ETR ~ wp_Mpa + I(wp_Mpa^2) + I(wp_Mpa^3), data = dat3))
prd <- data.frame(wp_Mpa = seq(-4, 0, by = 0.5))
result <- prd
result$mdl3 <- predict(mdl3, newdata = prd)
# BH use nls to fit this model y0+a*exp(b*x)
nls.mod <- nls(ETR ~ y0 + a*exp(b*wp_Mpa), start=c(a = -4, b = 0), data=dat3.no_na)
summary(nls.mod)
and here is the output:
Formula: ETR ~ y0 + a * exp(b * wp_Mpa)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 85.85515 8.62005 9.960 <2e-16 ***
b 0.14157 0.07444 1.902 0.0593 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 58.49 on 134 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.515e-06
As you can see, for some reason only a and b show up, but y0 is supposed to appear above them.
I tried reassigning the variables and that just continued to give me the same output. I contacted a stats consultant, and they just said I needed to change the variables, but it still didn't work.
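For what it's worth, a likely cause (my sketch, not from the original post): nls() only estimates the parameters named in its start list, so with start = c(a = -4, b = 0) the symbol y0 is looked up in the calling environment as a fixed constant instead of being fitted. Adding it to start should make it appear in the summary; the start value of 0 below is only a placeholder guess:
# Include y0 in `start` so nls() treats it as a parameter to estimate
nls.mod <- nls(ETR ~ y0 + a * exp(b * wp_Mpa),
               start = c(y0 = 0, a = -4, b = 0),
               data = dat3.no_na)
summary(nls.mod)  # should now report y0, a, and b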
I'm using the multcomp package to generate contrasts for a geeglm (binomial(link = "logit")) model in R. I fit the geeglm model with the following script:
library(geepack)
u1 <- geeglm(outcome ~ px_race_jama, id = npi_gp,
             family = binomial(link = "logit"), data = mf)
summary(u1)
Call:
geeglm(formula = outcome ~ px_race_jama, family = binomial(link = "logit"),
data = mf, id = npi_gp)
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -0.4671 0.1541 9.19 0.0024 **
px_race_jama1 0.0959 0.1155 0.69 0.4067
px_race_jama2 -0.0293 0.1503 0.04 0.8453
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 1 0.0506
Correlation: Structure = independence
Number of clusters: 83 Maximum cluster size: 792
To get the contrasts for the model, I run this script:
library(multcomp)
glht(u1, mcp(px_race_jama = "Tukey"))
I receive the error:
Error in match.arg(type) :
'arg' should be one of “pearson”, “working”, “response”
Error in modelparm.default(model, ...) :
no ‘vcov’ method for ‘model’ found!
Alternatively, I have tried creating a contrast matrix:
contrast.matrix <- rbind(
`Other-Black` = c(0, -1, 1))
comps <- glht(u1, contrast.matrix)
summary(comps)
However, I receive the same error. Any help on how to correctly generate the contrasts would be greatly appreciated.
Respectfully,
Jdukes
Something like this?
library(dplyr)  # for %>% and mutate()

contrast.matrix <- matrix(c(0, -1, 1,
                            0, 1, -1), nrow = 2, byrow = TRUE)

contrasts_geeglm <- function(fit, model_matrix, vcov_type = "robust") {
  # Use the robust (sandwich) or the naive covariance matrix from the geese fit
  vcov_gee <- if (vcov_type == "robust") fit$geese$vbeta else fit$geese$vbeta.naiv
  # Point estimates of the contrasts: L %*% beta
  contrast_est <- coef(fit) %*% t(model_matrix)
  # Standard errors: square roots of the diagonal of L V L'
  contrast_se <- sqrt(diag(model_matrix %*% vcov_gee %*% t(model_matrix)))
  data.frame(Estimate = contrast_est[1, ], SE = contrast_se) %>%
    mutate(LCI = Estimate - 1.96 * SE,
           UCI = Estimate + 1.96 * SE)
}

contrasts_geeglm(u1, contrast.matrix, vcov_type = "robust")
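A separate workaround for the original glht() error (a sketch on my part, not from the answer above): modelparm.default() in multcomp only needs working coef() and vcov() methods, so defining a vcov() method for geeglm objects that returns the robust covariance may be enough to let glht() run directly:
library(multcomp)
# Hypothetical shim: expose the robust (sandwich) covariance through
# the vcov() generic, which multcomp looks for.
vcov.geeglm <- function(object, ...) {
  vc <- object$geese$vbeta
  rownames(vc) <- colnames(vc) <- names(coef(object))
  vc
}
summary(glht(u1, mcp(px_race_jama = "Tukey")))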
I produced the below graph using ggplot2.
# pd is assumed to be a position object (e.g. position_dodge()) defined earlier
PlotEchi = ggplot(data = Echinoidea,
                  aes(x = Year, y = mean, group = aspect, linetype = aspect, shape = aspect)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .025, position = pd) +
  geom_point(position = pd, size = 2) +
  geom_smooth(method = "gam", formula = y ~ s(x, k = 3), se = F, size = 0.5, colour = "black") +
  xlab("") +
  ylab("Abundance (mean +/- SE)") +
  facet_wrap(~ species, scales = "free", ncol = 1) +
  scale_y_continuous(limits = c(0, max(Echinoidea$mean + Echinoidea$se))) +
  scale_x_continuous(limits = c(min(Echinoidea$Year) - 0.125, max(Echinoidea$Year) + 0.125))
What I would like to do is easily retrieve the adjusted R-square for each of the fitted lines without doing an individual mgcv::gam for each plotted line using model<-gam(df, formula = y~s(x1)....). Any ideas?
This is not really possible, because ggplot2 throws away the fitted object. You can see this in the ggplot2 source (the predictdf.glm function patched below).
1. Solving the problem by patching ggplot2
One ugly workaround is to patch the ggplot2 code on the fly so that it prints out the results. You can do this as follows. The initial assignment throws an error, but things work anyway. To undo the patch, just restart your R session.
library(ggplot2)
# assignInNamespace patches `predictdf.glm` from ggplot2 and adds
# a line that prints the summary of the model. For some reason, this
# creates an error, but things work nonetheless.
assignInNamespace("predictdf.glm", function(model, xseq, se, level) {
pred <- stats::predict(model, newdata = data.frame(x = xseq), se.fit = se,
type = "link")
print(summary(model)) # this is the line I added
if (se) {
std <- stats::qnorm(level / 2 + 0.5)
data.frame(
x = xseq,
y = model$family$linkinv(as.vector(pred$fit)),
ymin = model$family$linkinv(as.vector(pred$fit - std * pred$se.fit)),
ymax = model$family$linkinv(as.vector(pred$fit + std * pred$se.fit)),
se = as.vector(pred$se.fit)
)
} else {
data.frame(x = xseq, y = model$family$linkinv(as.vector(pred)))
}
}, "ggplot2")
Now we can make a plot with the patched ggplot2:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point() + geom_smooth(se = F, method = "gam", formula = y ~ s(x, bs = "cs"))
Console output:
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4280 0.0365 93.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.546 9 5.947 5.64e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.536 Deviance explained = 55.1%
GCV = 0.070196 Scale est. = 0.066622 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.77000 0.03797 72.96 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.564 9 1.961 8.42e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.268 Deviance explained = 29.1%
GCV = 0.075969 Scale est. = 0.072074 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.97400 0.04102 72.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.279 9 1.229 0.001 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.191 Deviance explained = 21.2%
GCV = 0.088147 Scale est. = 0.08413 n = 50
Note: I do not recommend this approach.
2. Solving the problem by fitting models via tidyverse
I think it's better to just run your models separately. Doing so is quite easy with tidyverse and broom, so I'm not sure why you wouldn't want to do it.
library(tidyverse)
library(broom)
iris %>%
  nest(data = -Species) %>%
  mutate(fit = map(data, ~ mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)
# Species R.square df logLik AIC BIC deviance df.residual
# 1 setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460 47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909 47.43638
# 3 virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793 47.72143
As you can see, the extracted R squared values are exactly the same in both cases.
I am trying to specify a GARCH model with the function fGarch::garchSpec() and I need a specific presample. As defined in the manual:
presample: a numeric three column matrix with start values for the
series, for the innovations, and for the conditional
variances.
But I am pretty sure that this is not the correct order. After reading the manuals and code for the functions garchFit, garchSpec, and garchSim, I am still quite confused.
The question is: how exactly do I build the presample matrix?
You don't need to set the presample argument. garchSpec() will supply a "good" guess, and for estimating the parameters it will not matter. If you want to simulate data, then I would just make sure the burn-in, n.start, is large enough.
Let's look at an example:
library(fGarch)
## First we simulate some data without setting presample:
# we set up the model by spec:
set.seed(911)
spec <- garchSpec(model = list(mu = 0.02, omega = 0.05, alpha = 0.2, beta = 0.75))
# then simulate our GARCH(1,1) model:
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
And estimates:
> garchFit(~ garch(1, 1), data = garchSim)
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.02196 0.05800 -0.379 0.7049
omega 0.03567 0.02681 1.331 0.1833
alpha1 0.12074 0.04952 2.438 0.0148 *
beta1 0.84527 0.05597 15.103 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
-265.8417 normalized: -1.329209
Let us now try to add a very extreme presample. In the above model (and with this seed) the presample was:
> spec@presample
Presample:
time z h y
1 0 -0.4324072 1 0.02
Now we replace it with c(100, 0.1, 0.1), i.e. an extreme start value for the innovations (judging by the printed slot above, the columns are z, h, and y: innovations, conditional variances, and the series). Since my model is a GARCH(1,1) without any ARMA part, I only need to set 3 values, as described in the documentation ?garchSpec. After updating spec we estimate the same model:
set.seed(911)
spec@presample <- matrix(c(100, 0.1, 0.1), ncol = 3)
garchFit(~ garch(1, 1), data = garchSim)
with the same output:
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -0.02196 0.05800 -0.379 0.7049
omega 0.03567 0.02681 1.331 0.1833
alpha1 0.12074 0.04952 2.438 0.0148 *
beta1 0.84527 0.05597 15.103 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Log Likelihood:
-265.8417 normalized: -1.329209
The likelihood and estimates are identical, but notice when we simulate with the new spec:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 1)
plot(garchSim)
The extreme initial sample we supplied messed up our nice simulation. But by increasing the burn-in, n.start, we get:
set.seed(911)
garchSim <- garchSim(spec, n = 200, n.start = 100)
plot(garchSim)
Is there a way of automating variable selection of a GAM in R, similar to step? I've read the documentation of step.gam and selection.gam, but I've yet to see an answer with code that works. Additionally, I've tried method = "REML" and select = TRUE, but neither removes insignificant variables from the model.
I've theorized that I could create a step model and then use those variables to create the GAM, but that does not seem computationally efficient.
Example:
library(mgcv)
set.seed(0)
dat <- data.frame(rsp = rnorm(100, 0, 1),
pred1 = rnorm(100, 10, 1),
pred2 = rnorm(100, 0, 1),
pred3 = rnorm(100, 0, 1),
pred4 = rnorm(100, 0, 1))
model <- gam(rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4),
data = dat, method = "REML", select = TRUE)
summary(model)
#Family: gaussian
#Link function: identity
#Formula:
#rsp ~ s(pred1) + s(pred2) + s(pred3) + s(pred4)
#Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02267 0.08426 0.269 0.788
#Approximate significance of smooth terms:
# edf Ref.df F p-value
#s(pred1) 0.8770 9 0.212 0.1174
#s(pred2) 1.8613 9 0.638 0.0374 *
#s(pred3) 0.5439 9 0.133 0.1406
#s(pred4) 0.4504 9 0.091 0.1775
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R-sq.(adj) = 0.0887 Deviance explained = 12.3%
#-REML = 129.06 Scale est. = 0.70996 n = 100
Marra and Wood (2011, Computational Statistics and Data Analysis 55: 2372-2387) compared various approaches to feature selection in GAMs. They concluded that an additional penalty term in the smoothness selection procedure gave the best results. This can be activated in mgcv::gam() with the select = TRUE argument, or via the shrinkage bases (bs = "ts" or bs = "cs"); for example, any of the following variations:
model <- gam(rsp ~ s(pred1,bs="ts") + s(pred2,bs="ts") + s(pred3,bs="ts") + s(pred4,bs="ts"), data = dat, method = "REML")
model <- gam(rsp ~ s(pred1,bs="cr") + s(pred2,bs="cr") + s(pred3,bs="cr") + s(pred4,bs="cr"),
data = dat, method = "REML",select=T)
model <- gam(rsp ~ s(pred1,bs="cc") + s(pred2,bs="cc") + s(pred3,bs="cc") + s(pred4,bs="cc"),
data = dat, method = "REML")
model <- gam(rsp ~ s(pred1,bs="tp") + s(pred2,bs="tp") + s(pred3,bs="tp") + s(pred4,bs="tp"), data = dat, method = "REML")
In addition to specifying select = TRUE in your call to gam(), you can increase the value of the gamma argument to get stronger penalization. For example, we generate some data:
library("mgcv")
set.seed(2)
dat <- gamSim(1, n=400, dist="normal", scale=5)
## Gu & Wahba 4 term additive model
We first fit a GAM with 'standard' penalization and no extra selection penalty:
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML")
summary(b)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.890 0.246 32.07 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x0) 1.363 1.640 0.804 0.3174
## s(x1) 1.681 2.088 11.309 1.35e-05 ***
## s(x2) 5.931 7.086 16.240 < 2e-16 ***
## s(x3) 1.002 1.004 4.102 0.0435 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.253 Deviance explained = 27.1%
## -REML = 1212.5 Scale est. = 24.206 n = 400
par(mfrow = c(2, 2))
plot(b)
We fit a GAM with stronger penalization and variable selection:
b2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data=dat, method = "REML", select = TRUE, gamma = 7)
summary(b2)
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0) + s(x1) + s(x2) + s(x3)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.8898 0.2604 30.3 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x0) 5.330e-05 9 0.000 0.1868
## s(x1) 5.427e-01 9 0.967 7.4e-05 ***
## s(x2) 1.549e+00 9 6.210 < 2e-16 ***
## s(x3) 6.155e-05 9 0.000 0.0812 .
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## R-sq.(adj) = 0.163 Deviance explained = 16.7%
## -REML = 179.46 Scale est. = 27.115 n = 400
plot(b2)
According to the documentation, increasing the value of gamma produces smoother models, because it multiplies the effective degrees of freedom in the GCV or UBRE/AIC criterion.
A possible downside is thus that all non-linear effects will be shrunk towards linear effects, and all linear effects will be shrunk towards zero. This is also what we observe in the plots and output above: with a higher value of gamma, some effects are practically penalized out (edf values close to 0, F-values of 0), while the other effects are closer to linear (edf values closer to 1).
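To make the gamma mechanism concrete (my paraphrase of the mgcv documentation, not part of the original answer): for GCV-based smoothness selection the score is roughly GCV = n * D / (n - gamma * EDF)^2, where D is the deviance and EDF the effective degrees of freedom, so gamma > 1 makes each effective degree of freedom cost more and favours smoother fits. A quick way to see the effect on the fits above:
sum(b$edf)   # total EDF of the default fit (gamma = 1)
sum(b2$edf)  # total EDF with gamma = 7 -- much smaller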