I have a question concerning the function coeftest(). I am trying to figure out where I could get an R-squared for its results. I fitted a standard multiple linear regression as follows:
Wetterstation.lm <- lm(temp~t+temp_auto+dum.jan+dum.feb+dum.mar+dum.apr+dum.may+dum.jun+dum.aug+dum.sep+dum.oct+dum.nov+dum.dec+
dum.jan*t+dum.feb*t+dum.mar*t+dum.apr*t+dum.may*t+dum.jun*t+dum.aug*t+dum.sep*t+dum.oct*t+dum.nov*t+dum.dec*t)
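(For reference, an equivalent and more compact specification would use a single month factor; this is only a sketch, assuming a factor month with July as its reference level, matching the omitted dum.jul above:)
# sketch, assuming `month` is a factor with July as its reference level;
# t * month expands to t + month + t:month, matching the dummies above
Wetterstation.lm <- lm(temp ~ t * month + temp_auto)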
Beforehand I defined each of these variables separately, and my results were the following:
> summary(Wetterstation.lm)
Call:
lm(formula = temp ~ t + temp_auto + dum.jan + dum.feb + dum.mar +
dum.apr + dum.may + dum.jun + dum.aug + dum.sep + dum.oct +
dum.nov + dum.dec + dum.jan * t + dum.feb * t + dum.mar *
t + dum.apr * t + dum.may * t + dum.jun * t + dum.aug * t +
dum.sep * t + dum.oct * t + dum.nov * t + dum.dec * t)
Residuals:
Min 1Q Median 3Q Max
-10.9564 -1.3214 0.0731 1.3621 9.9312
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.236e+00 9.597e-02 33.714 < 2e-16 ***
t 1.206e-05 3.744e-06 3.221 0.00128 **
temp_auto 8.333e-01 2.929e-03 284.503 < 2e-16 ***
dum.jan -3.550e+00 1.252e-01 -28.360 < 2e-16 ***
dum.feb -3.191e+00 1.258e-01 -25.367 < 2e-16 ***
dum.mar -2.374e+00 1.181e-01 -20.105 < 2e-16 ***
dum.apr -1.582e+00 1.142e-01 -13.851 < 2e-16 ***
dum.may -7.528e-01 1.106e-01 -6.809 9.99e-12 ***
dum.jun -3.283e-01 1.106e-01 -2.968 0.00300 **
dum.aug -2.144e-01 1.094e-01 -1.960 0.05005 .
dum.sep -8.009e-01 1.103e-01 -7.260 3.96e-13 ***
dum.oct -1.752e+00 1.123e-01 -15.596 < 2e-16 ***
dum.nov -2.622e+00 1.181e-01 -22.198 < 2e-16 ***
dum.dec -3.287e+00 1.226e-01 -26.808 < 2e-16 ***
t:dum.jan 2.626e-06 5.277e-06 0.498 0.61877
t:dum.feb 2.479e-06 5.404e-06 0.459 0.64642
t:dum.mar 1.671e-06 5.277e-06 0.317 0.75145
t:dum.apr 1.357e-06 5.320e-06 0.255 0.79872
t:dum.may -3.173e-06 5.276e-06 -0.601 0.54756
t:dum.jun 2.481e-06 5.320e-06 0.466 0.64098
t:dum.aug 5.998e-07 5.298e-06 0.113 0.90985
t:dum.sep -5.997e-06 5.321e-06 -1.127 0.25968
t:dum.oct -5.860e-06 5.277e-06 -1.110 0.26683
t:dum.nov -4.215e-06 5.320e-06 -0.792 0.42820
t:dum.dec -2.526e-06 5.277e-06 -0.479 0.63217
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.12 on 35744 degrees of freedom
Multiple R-squared: 0.9348, Adjusted R-squared: 0.9348
F-statistic: 2.136e+04 on 24 and 35744 DF, p-value: < 2.2e-16
Then I tried to adjust for heteroskedasticity and autocorrelation using coeftest() with vcovHAC, as follows:
library("lmtest")
library("sandwich")
Wetterstation.lm.HAC <- coeftest(Wetterstation.lm, vcov = vcovHAC)
This gave the following results:
> Wetterstation.lm.HAC
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2356e+00 7.8816e-02 41.0529 < 2.2e-16 ***
t 1.2059e-05 2.7864e-06 4.3280 1.509e-05 ***
temp_auto 8.3334e-01 2.9798e-03 279.6659 < 2.2e-16 ***
dum.jan -3.5505e+00 1.1843e-01 -29.9789 < 2.2e-16 ***
dum.feb -3.1909e+00 1.2296e-01 -25.9507 < 2.2e-16 ***
dum.mar -2.3741e+00 1.0890e-01 -21.8002 < 2.2e-16 ***
dum.apr -1.5821e+00 9.5952e-02 -16.4881 < 2.2e-16 ***
dum.may -7.5282e-01 8.8987e-02 -8.4599 < 2.2e-16 ***
dum.jun -3.2826e-01 8.2271e-02 -3.9899 6.622e-05 ***
dum.aug -2.1440e-01 7.7966e-02 -2.7499 0.005964 **
dum.sep -8.0094e-01 8.4456e-02 -9.4835 < 2.2e-16 ***
dum.oct -1.7519e+00 9.2919e-02 -18.8538 < 2.2e-16 ***
dum.nov -2.6224e+00 1.0028e-01 -26.1504 < 2.2e-16 ***
dum.dec -3.2873e+00 1.1393e-01 -28.8546 < 2.2e-16 ***
t:dum.jan 2.6256e-06 5.2429e-06 0.5008 0.616517
t:dum.feb 2.4790e-06 5.5284e-06 0.4484 0.653850
t:dum.mar 1.6713e-06 4.8632e-06 0.3437 0.731107
t:dum.apr 1.3567e-06 4.5670e-06 0.2971 0.766423
t:dum.may -3.1734e-06 4.2970e-06 -0.7385 0.460209
t:dum.jun 2.4809e-06 4.1490e-06 0.5979 0.549880
t:dum.aug 5.9983e-07 4.0379e-06 0.1485 0.881910
t:dum.sep -5.9975e-06 4.1675e-06 -1.4391 0.150125
t:dum.oct -5.8595e-06 4.4635e-06 -1.3128 0.189265
t:dum.nov -4.2151e-06 4.6555e-06 -0.9054 0.365263
t:dum.dec -2.5257e-06 4.9871e-06 -0.5065 0.612539
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But as I want to add the R-squared to a table that summarizes my results, I cannot figure out how to get it. Is there anyone who could help with this issue and tell me where I could get this information? Maybe I am just too dumb, but as I am quite new to R I would be happy for any help I can get.
Thank you very much in advance.
Simple answer: no, there is not, and there is also no reason to expect one. The coeftest() function works with the values of your fitted model: via coef() it takes the model's coefficients.
It would only be possible to extract the R² if the function were designed to compute it, and lmtest::coeftest() returns nothing but the coefficient table, so there is no R² to extract.
To sum this up: lmtest::coeftest() reuses the fit from lm(), so the R² does not change.
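If you just need the R² for your summary table, you can pull it from the underlying lm fit; a minimal sketch:
# the R-squared belongs to the lm fit, not to coeftest()
s <- summary(Wetterstation.lm)
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared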
Background on the standard errors: lm() uses a slightly different method to calculate them. In the source code you will find:
R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])  # (X'X)^-1 from the R factor of the QR decomposition
se <- sqrt(diag(R))                         # standard errors from its diagonal
So this brings us to the following: lm() relies on the Cholesky decomposition (which I believe R uses by default here).
coeftest(), meanwhile, takes the sqrt() of the diagonal of the variance-covariance matrix (see here); with vcovHAC, that matrix also accounts for the autocorrelation. vcov() extracts the variance-covariance matrix of a given model (such as an lm fit):
se <- vcov(x)         # variance-covariance matrix of the fitted model
se <- sqrt(diag(se))  # standard errors from its diagonal
I personally think lm() uses the Cholesky decomposition for speed, since you do not have to invert the matrix explicitly, and that the writers of the lmtest package were not aware of this. But this is just a guess.
Since both packages calculate the t values from the coefficient estimates and the standard errors like this:
tval <- as.vector(est)/se
and the p-value is based on the tval:
2*pt(abs(tval), rdf, lower.tail = FALSE)
all the differences are due to the differently estimated standard errors. Please be aware that the coefficient estimates themselves are identical, because coeftest() simply inherits them.
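As a sanity check, the HAC column can be reproduced by hand along exactly these lines; a short sketch:
# reproduce the coeftest(..., vcov = vcovHAC) table manually
est  <- coef(Wetterstation.lm)
se   <- sqrt(diag(vcovHAC(Wetterstation.lm)))
tval <- est / se
pval <- 2 * pt(abs(tval), df.residual(Wetterstation.lm), lower.tail = FALSE)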
Related
I am trying to figure out how to calculate the marginal effects of my model using the clogit() function in the survival package. The margins package does not seem to work with this type of model, but it does work with multinom and mclogit. However, I am investigating the effects of choice characteristics, not individual characteristics, so it needs to be a conditional logit model. The mclogit function works with the margins package, but its results are wildly different from the results using the clogit function; why is that? Any help calculating the marginal effects from the clogit function would be greatly appreciated.
mclogit output:
Call:
mclogit(formula = cbind(selected, caseID) ~ SysTEM + OWN + cost +
ENVIRON + NEIGH + save, data = atl)
Estimate Std. Error z value Pr(>|z|)
SysTEM 0.139965 0.025758 5.434 5.51e-08 ***
OWN 0.008931 0.026375 0.339 0.735
cost -0.103012 0.004215 -24.439 < 2e-16 ***
ENVIRON 0.675341 0.037104 18.201 < 2e-16 ***
NEIGH 0.419054 0.031958 13.112 < 2e-16 ***
save 0.532825 0.023399 22.771 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 18380
Residual Deviance: 16670
Number of Fisher Scoring iterations: 4
Number of observations: 8364
clogit output:
Call:
coxph(formula = Surv(rep(1, 25092L), selected) ~ SysTEM + OWN +
cost + ENVIRON + NEIGH + save + strata(caseID), data = atl,
method = "exact")
n= 25092, number of events= 8364
coef exp(coef) se(coef) z Pr(>|z|)
SysTEM 0.133184 1.142461 0.034165 3.898 9.69e-05 ***
OWN -0.015884 0.984241 0.036346 -0.437 0.662
cost -0.179833 0.835410 0.005543 -32.442 < 2e-16 ***
ENVIRON 1.186329 3.275036 0.049558 23.938 < 2e-16 ***
NEIGH 0.658657 1.932195 0.042063 15.659 < 2e-16 ***
save 0.970051 2.638079 0.031352 30.941 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
SysTEM 1.1425 0.8753 1.0685 1.2216
OWN 0.9842 1.0160 0.9166 1.0569
cost 0.8354 1.1970 0.8264 0.8445
ENVIRON 3.2750 0.3053 2.9719 3.6091
NEIGH 1.9322 0.5175 1.7793 2.0982
save 2.6381 0.3791 2.4809 2.8053
Concordance= 0.701 (se = 0.004 )
Rsquare= 0.103 (max possible= 0.688 )
Likelihood ratio test= 2740 on 6 df, p=<2e-16
Wald test = 2465 on 6 df, p=<2e-16
Score (logrank) test = 2784 on 6 df, p=<2e-16
margins output for mclogit
margins(model2A)
SysTEM OWN cost ENVIRON NEIGH save
0.001944 0.000124 -0.001431 0.00938 0.00582 0.0074
margins output for clogit
margins(model2A)
Error in match.arg(type) :
'arg' should be one of “risk”, “expected”, “lp”
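(For reference, the average marginal effects can in principle be assembled by hand from within-stratum choice probabilities; the sketch below only illustrates the standard conditional-logit formula under that assumption and is not a verified replacement for margins:)
# sketch: softmax choice probabilities within each caseID stratum,
# then AME = mean of p * (1 - p) * beta for each regressor
lp  <- predict(model2A, type = "lp")
p   <- ave(exp(lp), atl$caseID, FUN = function(x) x / sum(x))
ame <- sapply(coef(model2A), function(b) mean(p * (1 - p) * b))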
I estimate a Heckman selection model using heckit() from the sampleSelection package.
The model looks as follows:
library(sampleSelection)
Heckman <- heckit(AgencyTRACE ~ SizeCat + log(Amt_Issued) + log(daysfromissuance) + log(daystomaturity) + EoW + dMon + EoM + VIX_95_Dummy + quarter,
                  Avg_Spread_Choi ~ SizeCat + log(Amt_Issued) + log(daysfromissuance) + log(daystomaturity) + VIX_95_Dummy + TresholdHYIG_II,
                  data = heckmandata, method = "2step")
The summary generates a probit selection equation and an outcome equation - see below:
Tobit 2 model (sample selection model)
2-step Heckman / heckit estimation
2019085 observations (1915401 censored and 103684 observed)
26 free parameters (df = 2019060)
Probit selection equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.038164 0.043275 0.882 0.378
SizeCat2 0.201571 0.003378 59.672 < 2e-16 ***
SizeCat3 0.318331 0.008436 37.733 < 2e-16 ***
log(Amt_Issued) -0.099472 0.001825 -54.496 < 2e-16 ***
log(daysfromissuance) 0.079691 0.001606 49.613 < 2e-16 ***
log(daystomaturity) -0.036434 0.001514 -24.066 < 2e-16 ***
EoW 0.021169 0.003945 5.366 8.04e-08 ***
dMon -0.003409 0.003852 -0.885 0.376
EoM 0.008937 0.007000 1.277 0.202
VIX_95_Dummy1 0.088558 0.006521 13.580 < 2e-16 ***
quarter2019.2 -0.092681 0.005202 -17.817 < 2e-16 ***
quarter2019.3 -0.117021 0.005182 -22.581 < 2e-16 ***
quarter2019.4 -0.059833 0.005253 -11.389 < 2e-16 ***
quarter2020.1 -0.005230 0.004943 -1.058 0.290
quarter2020.2 0.073175 0.005080 14.406 < 2e-16 ***
Outcome equation:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.29436 6.26019 7.395 1.41e-13 ***
SizeCat2 -25.63433 0.79836 -32.109 < 2e-16 ***
SizeCat3 -34.25275 1.48030 -23.139 < 2e-16 ***
log(Amt_Issued) -0.38051 0.39506 -0.963 0.33547
log(daysfromissuance) 0.02452 0.34197 0.072 0.94283
log(daystomaturity) 7.92338 0.24498 32.343 < 2e-16 ***
VIX_95_Dummy1 -2.34875 0.89133 -2.635 0.00841 **
TresholdHYIG_II1 10.36993 1.07267 9.667 < 2e-16 ***
Multiple R-Squared:0.0406, Adjusted R-Squared:0.0405
Error terms:
Estimate Std. Error t value Pr(>|t|)
invMillsRatio -23.8204 3.6910 -6.454 1.09e-10 ***
sigma 68.5011 NA NA NA
rho -0.3477 NA NA NA
Now I'd like to estimate a value from the outcome equation; specifically, I'd like to predict Spread_Choi_All using the following data:
newdata <- data.frame(SizeCat = as.factor(1),
                      Amt_Issued = 50 * 1000000,
                      daysfromissuance = 5 * 365,
                      daystomaturity = 5 * 365,
                      VIX_95_Dummy = as.factor(0),
                      TresholdHYIG_II = as.factor(0))
SizeCat is a categorical/factor variable with the value 1, 2 or 3.
I have tried various ways, e.g.
predict(Heckman, part ="outcome", newdata = newdata)
I aim to predict a value (with the data from newdata) using the outcome equation, including the invMillsRatio term. Is there a way to predict a value from the outcome equation?
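(For what it's worth, the conditional prediction can also be assembled by hand from the two equations; this is only a rough sketch, assuming coef(Heckman) returns the parameters in the printed order (15 selection, 8 outcome, then invMillsRatio, sigma, rho), and that xs and xo are hand-built numeric regressor vectors for one new case:)
# hypothetical manual prediction for one new observation
gamma <- coef(Heckman)[1:15]                              # selection coefficients
beta  <- coef(Heckman)[16:23]                             # outcome coefficients
imr   <- dnorm(sum(xs * gamma)) / pnorm(sum(xs * gamma))  # inverse Mills ratio
sum(xo * beta) + coef(Heckman)["invMillsRatio"] * imr     # E[y | selected]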
I'm trying to fit a polynomial function (for learning reasons) to the BostonHousing data with the mlr package.
I'm unable to figure out how to provide a custom formula to the lm() call that is used; in particular, I want to apply a polynomial to one of the input variables (for testing purposes).
How can this best be achieved?
library(mlr)
data("BostonHousing", package = "mlbench")
regr.task <- makeRegrTask(data = BostonHousing, target = "medv")
regr.learner <- makeLearner("regr.lm")
# I would like to specify the formula used by "regr.lm" myself, how can this be achieved?
regr.train <- train(regr.learner, regr.task)
lm.results <- getLearnerModel(regr.train)
summary(lm.results)
Call:
stats::lm(formula = f, data = d)
Residuals:
Min 1Q Median 3Q Max
-15.595 -2.730 -0.518 1.777 26.199
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***
crim -1.080e-01 3.286e-02 -3.287 0.001087 **
zn 4.642e-02 1.373e-02 3.382 0.000778 ***
indus 2.056e-02 6.150e-02 0.334 0.738288
chas1 2.687e+00 8.616e-01 3.118 0.001925 **
nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***
rm 3.810e+00 4.179e-01 9.116 < 2e-16 ***
age 6.922e-04 1.321e-02 0.052 0.958229
dis -1.476e+00 1.995e-01 -7.398 6.01e-13 ***
rad 3.060e-01 6.635e-02 4.613 5.07e-06 ***
tax -1.233e-02 3.760e-03 -3.280 0.001112 **
ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 ***
b 9.312e-03 2.686e-03 3.467 0.000573 ***
lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16
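One workaround (a sketch, assuming regr.lm in mlr always fits target ~ . on the task data) is to create the polynomial term as an extra column before building the task, so the default formula picks it up:
# hand-made quadratic term added before task creation
bh <- BostonHousing
bh$lstat2 <- bh$lstat^2
regr.task2  <- makeRegrTask(data = bh, target = "medv")
regr.train2 <- train(regr.learner, regr.task2)
summary(getLearnerModel(regr.train2))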
I am trying to see in practice what was explained here about what happens to the coefficients once the labels are switched, but I am not getting what is expected. Here is my attempt:
I am using the natality public-use data given as an example in "Practical Data Science with R", where the outcome is a logical variable that classifies newborn babies as atRisk, with levels FALSE and TRUE.
load(url("https://github.com/WinVector/zmPDSwR/raw/master/CDC/NatalRiskData.rData"))
train <- sdata[sdata$ORIGRANDGROUP<=5,]
test <- sdata[sdata$ORIGRANDGROUP>5,]
complications <- c("ULD_MECO","ULD_PRECIP","ULD_BREECH")
riskfactors <- c("URF_DIAB", "URF_CHYPER", "URF_PHYPER",
"URF_ECLAM")
y <- "atRisk"
x <- c("PWGT", "UPREVIS", "CIG_REC", "GESTREC3", "DPLURAL", complications, riskfactors)
fmla <- paste(y, paste(x, collapse="+"), sep="~")
model <- glm(fmla, data=train, family=binomial(link="logit"))
summary(model)
This results in the following coefficients:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.412189 0.289352 -15.249 < 2e-16 ***
PWGT 0.003762 0.001487 2.530 0.011417 *
UPREVIS -0.063289 0.015252 -4.150 3.33e-05 ***
CIG_RECTRUE 0.313169 0.187230 1.673 0.094398 .
GESTREC3< 37 weeks 1.545183 0.140795 10.975 < 2e-16 ***
DPLURALtriplet or higher 1.394193 0.498866 2.795 0.005194 **
DPLURALtwin 0.312319 0.241088 1.295 0.195163
ULD_MECOTRUE 0.818426 0.235798 3.471 0.000519 ***
ULD_PRECIPTRUE 0.191720 0.357680 0.536 0.591951
ULD_BREECHTRUE 0.749237 0.178129 4.206 2.60e-05 ***
URF_DIABTRUE -0.346467 0.287514 -1.205 0.228187
URF_CHYPERTRUE 0.560025 0.389678 1.437 0.150676
URF_PHYPERTRUE 0.161599 0.250003 0.646 0.518029
URF_ECLAMTRUE 0.498064 0.776948 0.641 0.521489
OK, now let us switch the labels in our atRisk variable:
sdata$atRisk <- factor(sdata$atRisk)
levels(sdata$atRisk) <- c("TRUE", "FALSE")
and re-run the above analysis, where I expect to see a change in the signs of the reported coefficients; however, I get exactly the same coefficients:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.412189 0.289352 -15.249 < 2e-16 ***
PWGT 0.003762 0.001487 2.530 0.011417 *
UPREVIS -0.063289 0.015252 -4.150 3.33e-05 ***
CIG_RECTRUE 0.313169 0.187230 1.673 0.094398 .
GESTREC3< 37 weeks 1.545183 0.140795 10.975 < 2e-16 ***
DPLURALtriplet or higher 1.394193 0.498866 2.795 0.005194 **
DPLURALtwin 0.312319 0.241088 1.295 0.195163
ULD_MECOTRUE 0.818426 0.235798 3.471 0.000519 ***
ULD_PRECIPTRUE 0.191720 0.357680 0.536 0.591951
ULD_BREECHTRUE 0.749237 0.178129 4.206 2.60e-05 ***
URF_DIABTRUE -0.346467 0.287514 -1.205 0.228187
URF_CHYPERTRUE 0.560025 0.389678 1.437 0.150676
URF_PHYPERTRUE 0.161599 0.250003 0.646 0.518029
URF_ECLAMTRUE 0.498064 0.776948 0.641 0.521489
What am I doing wrong here? Can you please help?
It's because you set train <- sdata[sdata$ORIGRANDGROUP<=5,] first and only then change sdata$atRisk <- factor(sdata$atRisk), but your model uses the train dataset, whose levels DID NOT get changed.
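A minimal sketch of the fix: flip the factor levels before splitting, so train inherits them, then refit:
# reorder the levels first (FALSE becomes the modelled "success" outcome),
# then rebuild train and refit; the signs should now flip
sdata$atRisk <- factor(sdata$atRisk, levels = c("TRUE", "FALSE"))
train <- sdata[sdata$ORIGRANDGROUP <= 5, ]
model <- glm(fmla, data = train, family = binomial(link = "logit"))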
Alternatively, without touching the levels, you can negate the response directly:
y <- "!atRisk"
x <- c("PWGT", "UPREVIS", "CIG_REC", "GESTREC3", "DPLURAL", complications, riskfactors)
fmla <- paste(y, paste(x, collapse="+"), sep="~")
model <- glm(fmla, data=train, family=binomial(link="logit"))
Call:
glm(formula = fmla, family = binomial(link = "logit"), data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.2641 0.1358 0.1511 0.1818 0.9732
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.412189 0.289352 15.249 < 2e-16 ***
PWGT -0.003762 0.001487 -2.530 0.011417 *
UPREVIS 0.063289 0.015252 4.150 3.33e-05 ***
CIG_RECTRUE -0.313169 0.187230 -1.673 0.094398 .
GESTREC3< 37 weeks -1.545183 0.140795 -10.975 < 2e-16 ***
DPLURALtriplet or higher -1.394193 0.498866 -2.795 0.005194 **
DPLURALtwin -0.312319 0.241088 -1.295 0.195163
ULD_MECOTRUE -0.818426 0.235798 -3.471 0.000519 ***
ULD_PRECIPTRUE -0.191720 0.357680 -0.536 0.591951
ULD_BREECHTRUE -0.749237 0.178129 -4.206 2.60e-05 ***
URF_DIABTRUE 0.346467 0.287514 1.205 0.228187
URF_CHYPERTRUE -0.560025 0.389678 -1.437 0.150676
URF_PHYPERTRUE -0.161599 0.250003 -0.646 0.518029
URF_ECLAMTRUE -0.498064 0.776948 -0.641 0.521489
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2698.7 on 14211 degrees of freedom
Residual deviance: 2463.0 on 14198 degrees of freedom
AIC: 2491
Number of Fisher Scoring iterations: 7
I am trying to find heteroskedasticity-robust standard errors in R, and most solutions I find use the coeftest and sandwich packages. However, when I use those packages, they seem to produce strange results (everything is far too significant). Both my professor and I agree that the results don't look right. Could someone please tell me where my mistake is? Am I using the right package? Does the package have a bug in it? What should I use instead? Or can you reproduce the same results in Stata?
(The data is CPS data from 2010 to 2014, March samples. I created a MySQL database to hold the data and am using the survey package to help analyze it.)
Thank you in advance. (I have abridged the code somewhat to make it easier to read; let me know if you need to see more.)
>male.nat.reg <- svyglm(log(HOURWAGE) ~ AGE + I(AGE^2) + ... + OVERWORK, subset(fwyrnat2010.design, FEMALE == 0))
>summary(male.nat.reg)
Call:
NextMethod(formula = "svyglm", design)
Survey design:
subset(fwyrnat2010.design, FEMALE == 0)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.599e+00 6.069e-02 26.350 < 2e-16 ***
AGE 4.030e-02 3.358e-03 12.000 < 2e-16 ***
I(AGE^2) -4.131e-04 4.489e-05 -9.204 9.97e-16 ***
NOHSDEG -1.730e-01 1.281e-02 -13.510 < 2e-16 ***
ASSOC 1.138e-01 1.256e-02 9.060 2.22e-15 ***
SOMECOLL 5.003e-02 9.445e-03 5.298 5.11e-07 ***
BACHELOR 2.148e-01 1.437e-02 14.948 < 2e-16 ***
GRADUATE 3.353e-01 3.405e-02 9.848 < 2e-16 ***
INMETRO 3.879e-02 9.225e-03 4.205 4.93e-05 ***
NCHILDOLD 1.374e-02 4.197e-03 3.273 0.001376 **
NCHILDYOUNG 2.334e-02 6.186e-03 3.774 0.000247 ***
NOTWHITE -5.026e-02 8.583e-03 -5.856 3.92e-08 ***
MARRIED -8.226e-03 1.531e-02 -0.537 0.592018
NEVERMARRIED -4.644e-02 1.584e-02 -2.932 0.004009 **
NOTCITIZEN -6.759e-02 1.574e-02 -4.295 3.47e-05 ***
STUDENT -1.231e-01 1.975e-02 -6.231 6.52e-09 ***
VET 3.336e-02 1.751e-02 1.905 0.059091 .
INUNION 2.366e-01 1.271e-02 18.614 < 2e-16 ***
PROFOCC 2.559e-01 1.661e-02 15.413 < 2e-16 ***
TSAOCC 9.997e-02 1.266e-02 7.896 1.27e-12 ***
FFFOCC 2.076e-02 2.610e-02 0.795 0.427859
PRODOCC 2.164e-01 1.281e-02 16.890 < 2e-16 ***
LABOROCC 6.074e-02 1.253e-02 4.850 3.60e-06 ***
AFFIND 6.834e-02 2.941e-02 2.324 0.021755 *
MININGIND 3.034e-01 3.082e-02 9.846 < 2e-16 ***
CONSTIND 1.451e-01 1.524e-02 9.524 < 2e-16 ***
MANUFIND 1.109e-01 1.393e-02 7.963 8.80e-13 ***
UTILIND 1.422e-01 1.516e-02 9.379 3.78e-16 ***
WHOLESALEIND 2.884e-02 1.766e-02 1.633 0.104910
FININD 6.215e-02 2.084e-02 2.983 0.003436 **
BUSREPIND 6.588e-02 1.755e-02 3.753 0.000266 ***
SERVICEIND 5.412e-02 2.403e-02 2.252 0.026058 *
ENTERTAININD -1.192e-01 3.060e-02 -3.896 0.000159 ***
PROFIND 1.536e-01 1.854e-02 8.285 1.55e-13 ***
OVERWORK 6.738e-02 1.007e-02 6.693 6.59e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.1367476)
Number of Fisher Scoring iterations: 2
>coeftest(male.nat.reg, vcov = vcovHC(male.nat.reg, type = 'HC0'))
z test of coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.5992e+00 9.7176e-08 16456481 < 2.2e-16 ***
AGE 4.0296e-02 5.4766e-09 7357823 < 2.2e-16 ***
I(AGE^2) -4.1314e-04 7.3222e-11 -5642330 < 2.2e-16 ***
NOHSDEG -1.7305e-01 1.4431e-08 -11991482 < 2.2e-16 ***
ASSOC 1.1378e-01 1.4248e-08 7985751 < 2.2e-16 ***
SOMECOLL 5.0035e-02 9.9689e-09 5019088 < 2.2e-16 ***
BACHELOR 2.1476e-01 2.0588e-08 10430993 < 2.2e-16 ***
GRADUATE 3.3533e-01 8.3327e-08 4024301 < 2.2e-16 ***
INMETRO 3.8790e-02 8.9666e-09 4326013 < 2.2e-16 ***
NCHILDOLD 1.3738e-02 5.2244e-09 2629554 < 2.2e-16 ***
NCHILDYOUNG 2.3344e-02 5.5405e-09 4213300 < 2.2e-16 ***
NOTWHITE -5.0261e-02 1.0150e-08 -4951908 < 2.2e-16 ***
MARRIED -8.2263e-03 1.8867e-08 -436026 < 2.2e-16 ***
NEVERMARRIED -4.6440e-02 1.7847e-08 -2602096 < 2.2e-16 ***
NOTCITIZEN -6.7594e-02 2.4446e-08 -2765080 < 2.2e-16 ***
STUDENT -1.2306e-01 3.2514e-08 -3785014 < 2.2e-16 ***
VET 3.3356e-02 3.0996e-08 1076125 < 2.2e-16 ***
INUNION 2.3659e-01 1.7786e-08 13301699 < 2.2e-16 ***
PROFOCC 2.5594e-01 2.2177e-08 11540563 < 2.2e-16 ***
TSAOCC 9.9971e-02 1.6707e-08 5983922 < 2.2e-16 ***
FFFOCC 2.0762e-02 2.3625e-08 878801 < 2.2e-16 ***
PRODOCC 2.1638e-01 1.3602e-08 15907683 < 2.2e-16 ***
LABOROCC 6.0741e-02 1.3445e-08 4517854 < 2.2e-16 ***
AFFIND 6.8342e-02 3.2895e-08 2077563 < 2.2e-16 ***
MININGIND 3.0343e-01 3.2948e-08 9209326 < 2.2e-16 ***
CONSTIND 1.4512e-01 2.1871e-08 6635457 < 2.2e-16 ***
MANUFIND 1.1094e-01 1.9636e-08 5649569 < 2.2e-16 ***
UTILIND 1.4216e-01 2.0930e-08 6792029 < 2.2e-16 ***
WHOLESALEIND 2.8842e-02 1.8662e-08 1545525 < 2.2e-16 ***
FININD 6.2147e-02 2.8214e-08 2202691 < 2.2e-16 ***
BUSREPIND 6.5883e-02 2.7866e-08 2364269 < 2.2e-16 ***
SERVICEIND 5.4118e-02 2.4758e-08 2185907 < 2.2e-16 ***
ENTERTAININD -1.1922e-01 2.9474e-08 -4044852 < 2.2e-16 ***
PROFIND 1.5364e-01 3.0132e-08 5098879 < 2.2e-16 ***
OVERWORK 6.7376e-02 1.0981e-08 6135525 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The sandwich package is object-oriented and essentially relies on two methods being available: estfun() and bread(); see the package vignettes for more details. For objects of class svyglm these methods are not available, but as svyglm objects inherit from glm, the glm methods are found and used. I suspect this leads to incorrect results in the survey context, possibly off by a weighting factor or similar. I'm not familiar enough with the survey package to provide a workaround; the survey maintainer might be able to say more. Hope that helps.
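(A hedged addendum rather than a definitive fix: svyglm's own variance estimate is already design-based and robust, so one option is simply to reuse the model's own covariance matrix, which reproduces the sensible summary() standard errors:)
library(lmtest)
coeftest(male.nat.reg, vcov = vcov(male.nat.reg))  # svyglm's design-based vcov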