Incorrect number of dimensions in forecasting regression model - r

I am using regression model and forecasting its data.
I have the following code:
y <- M3[[1909]]$x
data_ts <- window(y, start=1987, end = 1991-.1)
fit <- tslm(data_ts ~ trend + season)
summary(fit)
It works until now and while forecasting,
plot(forecast(fit, h=18, level=c(80,90,95,99)))
It gives the following error:
Error in `[.default`(X, , piv, drop = FALSE) :
incorrect number of dimensions
Appreciate your help.

This works for me using the current CRAN version (8.15) of the forecast package:
library(forecast)
library(Mcomp)
y <- M3[[1909]]$x
data_ts <- window(y, start=1987, end = 1991-.1)
fit <- tslm(data_ts ~ trend + season)
summary(fit)
#>
#> Call:
#> tslm(formula = data_ts ~ trend + season)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -204.81 -73.66 -11.44 69.99 368.96
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4438.403 65.006 68.277 < 2e-16 ***
#> trend 2.402 1.323 1.815 0.07828 .
#> season2 43.298 84.788 0.511 0.61289
#> season3 598.145 84.819 7.052 3.84e-08 ***
#> season4 499.993 84.870 5.891 1.19e-06 ***
#> season5 673.940 84.942 7.934 3.05e-09 ***
#> season6 604.988 85.035 7.115 3.20e-08 ***
#> season7 571.785 85.148 6.715 1.03e-07 ***
#> season8 695.533 85.282 8.156 1.64e-09 ***
#> season9 176.930 85.436 2.071 0.04603 *
#> season10 656.028 85.610 7.663 6.58e-09 ***
#> season11 -260.875 85.804 -3.040 0.00453 **
#> season12 -887.062 91.809 -9.662 2.79e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 119.9 on 34 degrees of freedom
#> Multiple R-squared: 0.949, Adjusted R-squared: 0.931
#> F-statistic: 52.74 on 12 and 34 DF, p-value: < 2.2e-16
plot(forecast(fit, h=18, level=c(80,90,95,99)))
Created on 2022-01-02 by the reprex package (v2.0.1)
Perhaps you're loading some other packages that are over-writing forecast().

Related

How to fit beta-binomial model on proportional data (not counts) in gamlss

I want to fit beta-binomial regression. I don't have counts but proportions that I want to fit. Here's example:
library(dplyr)
library(gamlss)
df <- tibble(
cluster = LETTERS[1:20]
) |>
mutate(
p = rbeta(n(), 1, 1),
n = as.integer(3 * runif(n()))
)
fit <- gamlss(
p ~ log(n),
weights = n,
data = df,
family = BB(mu.link='identity')
)
I get error:
Error in while (abs(olddv - dv) > cc && itn < cyc) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warnings look like:
In dbinom(x, size = bd, prob = mu, log = log) : non-integer x = 0.834502
Note that I DON'T want to rounded number of successes such as mutate(y = round(p * n)).
The help file for the BB() family suggests that the dependent variable is expected to be a two-column matrix of the numbers of successes and failures. If you've got p (the probability of success) and you've got n (the number of trials), then you can make both the number of successes k=floor(p*n) and the number of failures notk = n-k. Then, you can do as I did below.
library(dplyr)
library(gamlss)
df <- tibble(
cluster = LETTERS[1:20]
) |>
mutate(
p = rbeta(n(), 1, 1),
n = as.integer(100 * runif(n()))
)
df <- df %>%
mutate(k = floor(p*n),
notk = n-k)
fit <- gamlss(
cbind(k, notk) ~ cluster,
data = df,
family = BB(mu.link='logit')
)
#> ******************************************************************
#> Family: c("BB", "Beta Binomial")
#>
#> Call:
#> gamlss(formula = cbind(k, notk) ~ cluster, family = BB(mu.link = "logit"),
#> data = df)
#>
#> Fitting method: RS()
#>
#> ------------------------------------------------------------------
#> Mu link function: logit
#> Mu Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 8.130e-01 3.836e-01 2.119 0.03406 *
#> clusterB -8.130e-01 2.036e+00 -0.399 0.68973
#> clusterC -3.686e+01 1.000e+05 0.000 0.99971
#> clusterD -2.970e+00 6.922e-01 -4.291 1.78e-05 ***
#> clusterE 3.618e-01 4.843e-01 0.747 0.45508
#> clusterF -3.381e-01 5.317e-01 -0.636 0.52479
#> clusterG -3.569e+00 6.506e-01 -5.485 4.13e-08 ***
#> clusterH -1.118e+00 4.356e-01 -2.566 0.01030 *
#> clusterI -1.712e+00 4.453e-01 -3.845 0.00012 ***
#> clusterJ 1.825e+00 6.315e-01 2.889 0.00386 **
#> clusterK -3.686e+01 1.000e+05 0.000 0.99971
#> clusterL -5.247e-01 4.602e-01 -1.140 0.25419
#> clusterM 1.439e+00 7.167e-01 2.008 0.04464 *
#> clusterN 9.161e-02 4.721e-01 0.194 0.84613
#> clusterO -2.405e+00 1.000e+05 0.000 0.99998
#> clusterP 3.034e-01 5.583e-01 0.543 0.58686
#> clusterQ -1.523e+00 5.389e-01 -2.826 0.00471 **
#> clusterR -2.498e+00 6.208e-01 -4.024 5.73e-05 ***
#> clusterS 1.006e+00 5.268e-01 1.910 0.05619 .
#> clusterT -6.228e-02 4.688e-01 -0.133 0.89433
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> Sigma link function: log
#> Sigma Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -36.034363 0.005137 -7014 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> ------------------------------------------------------------------
#> No. of observations in the fit: 20
#> Degrees of Freedom for the fit: 21
#> Residual Deg. of Freedom: -1
#> at cycle: 9
#>
#> Global Deviance: 69.49748
#> AIC: 111.4975
#> SBC: 132.4079
#> ******************************************************************
Created on 2023-01-20 by the reprex package (v2.0.1)

Can I parallelize a function which by default returns a list? (in R)

Can I parallelize a function which by default returns a list? (in R) I have tried with the parLapply function of the parallel package, but I did not succeed.
Yes, parLapply can return a list of lists. If the function called by parLapply returns a list that's what you get.
library(parallel)
# data
data(mtcars)
# function - model fit on bootstrapped samples from the Boston dataset
model_fit <- function(x) {
n <- nrow(mtcars)
i <- sample(n, n, replace = TRUE)
fit <- lm(mpg ~ ., data = mtcars[i, ])
fit
}
# detect the number of cores
n.cores <- detectCores() - 2L
# Repl bootstrap replicates
Repl <- 100L
cl <- makeCluster(n.cores)
clusterExport(cl, "mtcars")
model_list <- parLapply(cl, 1:Repl, model_fit)
stopCluster(cl)
# a list of class "lm"
summary(model_list[[1]])
#>
#> Call:
#> lm(formula = mpg ~ ., data = mtcars[i, ])
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.86272 -0.80106 -0.08815 0.68233 2.79325
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -19.353679 20.403857 -0.949 0.353648
#> cyl -9.469350 1.921317 -4.929 7.10e-05 ***
#> disp 0.008405 0.014370 0.585 0.564876
#> hp 0.037101 0.020395 1.819 0.083185 .
#> drat 9.573765 2.289820 4.181 0.000421 ***
#> wt -2.543876 2.111203 -1.205 0.241630
#> qsec 3.360474 1.091440 3.079 0.005692 **
#> vs -32.223824 6.698522 -4.811 9.39e-05 ***
#> am -26.326478 5.534103 -4.757 0.000107 ***
#> gear 11.903213 2.420963 4.917 7.30e-05 ***
#> carb -4.022473 1.047176 -3.841 0.000949 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.745 on 21 degrees of freedom
#> Multiple R-squared: 0.9437, Adjusted R-squared: 0.9169
#> F-statistic: 35.2 on 10 and 21 DF, p-value: 7.204e-11
Created on 2022-09-02 by the reprex package (v2.0.1)

Extract confidence interval for both values of binary variable for glm()?

I want to analyze the relation between whether someone smoked or not and the number of drinks of alcohol.
The reproducible data set:
smoking_status
alcohol_drinks
1
2
0
5
1
2
0
1
1
0
1
0
0
0
1
9
1
6
1
5
I have used glm() to analyse this relation:
glm <- glm(smoking_status ~ alcohol_drinks, data = data, family = binomial)
summary(glm)
confint(glm)
Using the above I'm able to extract the p-value and the confidence interval for the entire set.
However, I would like to extract the confidence interval for each smoking status, so that I can produce this results table:
Alcohol drinks, mean (95%CI)
p-values
Smokers
X (X - X)
0.492
Non-smokers
X (X - X)
How can I produce this?
First of all, the response alcohol_drinks is not binary, a logistic regression is out of the question. Since the response is counts data, I will fit a Poisson model.
To have confidence intervals for each binary value of smoking_status, coerce to factor and fit a model without an intercept.
x <- 'smoking_status alcohol_drinks
1 2
0 5
1 2
0 1
1 0
1 0
0 0
1 9
1 6
1 5'
df1 <- read.table(textConnection(x), header = TRUE)
pois_fit <- glm(alcohol_drinks ~ 0 + factor(smoking_status), data = df1, family = poisson(link = "log"))
summary(pois_fit)
#>
#> Call:
#> glm(formula = alcohol_drinks ~ 0 + factor(smoking_status), family = poisson(link = "log"),
#> data = df1)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.6186 -1.7093 -0.8104 1.1389 2.4957
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> factor(smoking_status)0 0.6931 0.4082 1.698 0.0895 .
#> factor(smoking_status)1 1.2321 0.2041 6.036 1.58e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 58.785 on 10 degrees of freedom
#> Residual deviance: 31.324 on 8 degrees of freedom
#> AIC: 57.224
#>
#> Number of Fisher Scoring iterations: 5
confint(pois_fit)
#> Waiting for profiling to be done...
#> 2.5 % 97.5 %
#> factor(smoking_status)0 -0.2295933 1.399304
#> factor(smoking_status)1 0.8034829 1.607200
#>
exp(confint(pois_fit))
#> Waiting for profiling to be done...
#> 2.5 % 97.5 %
#> factor(smoking_status)0 0.7948568 4.052378
#> factor(smoking_status)1 2.2333058 4.988822
Created on 2022-06-04 by the reprex package (v2.0.1)
Edit
The edit to the question states that the problem was reversed, what is asked is to find out the effect of alcohol drinking on smoking status. And with a binary response, individuals can be smokers or not, a logistic regression is a possible model.
bin_fit <- glm(smoking_status ~ alcohol_drinks, data = df1, family = binomial(link = "logit"))
summary(bin_fit)
#>
#> Call:
#> glm(formula = smoking_status ~ alcohol_drinks, family = binomial(link = "logit"),
#> data = df1)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.7491 -0.8722 0.6705 0.8896 1.0339
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.3474 0.9513 0.365 0.715
#> alcohol_drinks 0.1877 0.2730 0.687 0.492
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 12.217 on 9 degrees of freedom
#> Residual deviance: 11.682 on 8 degrees of freedom
#> AIC: 15.682
#>
#> Number of Fisher Scoring iterations: 4
# Odds ratios
exp(coef(bin_fit))
#> (Intercept) alcohol_drinks
#> 1.415412 1.206413
exp(confint(bin_fit))
#> Waiting for profiling to be done...
#> 2.5 % 97.5 %
#> (Intercept) 0.2146432 11.167555
#> alcohol_drinks 0.7464740 2.417211
Created on 2022-06-05 by the reprex package (v2.0.1)
Another way to conduct a logistic regression is to regress the cumulative counts of smokers on increasing numbers of alcoholic drinks. In order to do this, the data must be sorted by alcohol_drinks, so I will create a second data set, df2. Code inspired this in this RPubs post.
df2 <- df1[order(df1$alcohol_drinks), ]
Total <- sum(df2$smoking_status)
df2$smoking_status <- cumsum(df2$smoking_status)
fit <- glm(cbind(smoking_status, Total - smoking_status) ~ alcohol_drinks, data = df2, family = binomial())
summary(fit)
#>
#> Call:
#> glm(formula = cbind(smoking_status, Total - smoking_status) ~
#> alcohol_drinks, family = binomial(), data = df2)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.9714 -0.2152 0.1369 0.2942 0.8975
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.1671 0.3988 -2.927 0.003428 **
#> alcohol_drinks 0.4437 0.1168 3.798 0.000146 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 23.3150 on 9 degrees of freedom
#> Residual deviance: 3.0294 on 8 degrees of freedom
#> AIC: 27.226
#>
#> Number of Fisher Scoring iterations: 4
# Odds ratios
exp(coef(fit))
#> (Intercept) alcohol_drinks
#> 0.3112572 1.5584905
exp(confint(fit))
#> Waiting for profiling to be done...
#> 2.5 % 97.5 %
#> (Intercept) 0.1355188 0.6569898
#> alcohol_drinks 1.2629254 2.0053079
plot(smoking_status/Total ~ alcohol_drinks,
data = df2,
xlab = "Alcoholic Drinks",
ylab = "Proportion of Smokers")
lines(df2$alcohol_drinks, fit$fitted, type="l", col="red")
title(main = "Alcohol and Smoking")
Created on 2022-06-05 by the reprex package (v2.0.1)

How to automatically inserting default key in prais winsten estimation?

I got an error message saying
Error in prais_winsten(agvprc.lm1, data = data) :
argument "index" is missing, with no default
How can I avoid inputting all the features
agvprc.lm1=lm(log(avgprc) ~ mon+tues+wed+thurs+t+wave2+wave3)
summary(agvprc.lm1)
agvprc.lm.pw = prais_winsten(agvprc.lm1, data=data)
summary(agvprc.lm.pw)
Not sure if I've understood the question correctly, but to avoid the "Error in prais_winsten(agvprc.lm1, data = data) : argument "index" is missing, with no default" error you need to provide an 'index' to the function (which is a character variable of "ID" and "time"). Using the inbuilt mtcars dataset as an example, with "cyl" as "time":
library(tidyverse)
#install.packages("prais")
library(prais)
#> Loading required package: sandwich
#> Loading required package: pcse
#>
#> Attaching package: 'pcse'
#> The following object is masked from 'package:sandwich':
#>
#> vcovPC
ggplot(mtcars, aes(x = cyl, y = mpg, group = cyl)) +
geom_boxplot() +
geom_jitter(aes(color = hp), width = 0.2)
agvprc.lm1 <- lm(log(mpg) ~ cyl + hp, data = mtcars)
summary(agvprc.lm1)
#>
#> Call:
#> lm(formula = log(mpg) ~ cyl + hp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.35699 -0.09882 0.01111 0.11948 0.24118
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.7829495 0.1062183 35.615 < 2e-16 ***
#> cyl -0.1072513 0.0279213 -3.841 0.000615 ***
#> hp -0.0011031 0.0007273 -1.517 0.140147
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1538 on 29 degrees of freedom
#> Multiple R-squared: 0.7503, Adjusted R-squared: 0.7331
#> F-statistic: 43.57 on 2 and 29 DF, p-value: 1.83e-09
agvprc.lm.pw <- prais_winsten(formula = agvprc.lm1,
data = mtcars,
index = c("hp", "cyl"))
#> Iteration 0: rho = 0
#> Iteration 1: rho = 0.6985
#> Iteration 2: rho = 0.7309
#> Iteration 3: rho = 0.7285
#> Iteration 4: rho = 0.7287
#> Iteration 5: rho = 0.7287
#> Iteration 6: rho = 0.7287
#> Iteration 7: rho = 0.7287
summary(agvprc.lm.pw)
#>
#> Call:
#> prais_winsten(formula = agvprc.lm1, data = mtcars, index = c("hp",
#> "cyl"))
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.33844 -0.08166 0.03109 0.13612 0.25811
#>
#> AR(1) coefficient rho after 7 iterations: 0.7287
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.7643405 0.1116876 33.704 <2e-16 ***
#> cyl -0.1061198 0.0298161 -3.559 0.0013 **
#> hp -0.0011470 0.0007706 -1.489 0.1474
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1077 on 29 degrees of freedom
#> Multiple R-squared: 0.973, Adjusted R-squared: 0.9712
#> F-statistic: 523.3 on 2 and 29 DF, p-value: < 2.2e-16
#>
#> Durbin-Watson statistic (original): 0.1278
#> Durbin-Watson statistic (transformed): 0.4019
Created on 2022-02-28 by the reprex package (v2.0.1)
# To present the index without having to write out all of the variables
# perhaps you could use:
agvprc.lm.pw <- prais_winsten(formula = agvprc.lm1,
data = mtcars,
index = names(agvprc.lm1$coefficients)[3:2])
summary(agvprc.lm.pw)
#>
#> Call:
#> prais_winsten(formula = agvprc.lm1, data = mtcars, index = names(agvprc.lm1$coefficients)[3:2])
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.33844 -0.08166 0.03109 0.13612 0.25811
#>
#> AR(1) coefficient rho after 7 iterations: 0.7287
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.7643405 0.1116876 33.704 <2e-16 ***
#> cyl -0.1061198 0.0298161 -3.559 0.0013 **
#> hp -0.0011470 0.0007706 -1.489 0.1474
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.1077 on 29 degrees of freedom
#> Multiple R-squared: 0.973, Adjusted R-squared: 0.9712
#> F-statistic: 523.3 on 2 and 29 DF, p-value: < 2.2e-16
#>
#> Durbin-Watson statistic (original): 0.1278
#> Durbin-Watson statistic (transformed): 0.4019
NB. there are a number of assumptions here that may not apply to your actual data; with more information, such as a minimal reproducible example, you will likely get a better/more informed answer on stackoverflow

Cluster standard errors for Heckman selection model in R

I am trying to obtain clustered standard errors for a Heckman selection model given the output from the "sampleSelection" package (selection command).
For replication, I am using the examples given in the STATA documentation (see Examples 1 & 2 on pages 7 & 9 - http://www.stata.com/manuals13/rheckman.pdf).
In R, I obtain the results from Example 1 as follows:
install.packages("readstata13")
library(readstata13)
install.packages("sampleSelection")
library(sampleSelection)
## Read STATA data
dat <- data.table(read.dta13("http://www.stata-press.com/data/r13/womenwk.dta"))
## Summary statistics
summary(dat[,.(age, education, married, children, wage, county)])
## Define indicator whether wage variable is defined
dat[, lfp := !is.na(wage)]
## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age, outcome = wage ~ education + age, data = dat)
## Results Example 1
summary(heckmanML)
## STATA command Example 2: heckman wage educ age, select(married children educ age) vce(cluster county)
## <<stuck here>>
Any ideas how I can replicate the last command using the vce(cluster) option? I tried to play around with cluster.vcov from the multiwayvcov package but got stuck with the following error:
cluster.vcov(heckmanML, eval(heckmanML$call$data)[,county])
Error in `[<-.data.frame`(`*tmp*`, i, "K", value = numeric(0)) : replacement has length zero
I adapted code from Mahmoud Arai. I fiddled with the degrees-of-freedom to match the output from the Stata manual and I replace the variance-covariance matrix stored with the fitted model so that the summary output matches Stata's output (see second set of output below).
library(haven)
library(dplyr, warn.conflicts = FALSE)
library(sampleSelection)
#> Loading required package: maxLik
#> Loading required package: miscTools
#>
#> Please cite the 'maxLik' package as:
#> Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
#>
#> If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
#> https://r-forge.r-project.org/projects/maxlik/
library(sandwich)
## Read STATA data
dat <-
read_stata("http://www.stata-press.com/data/r13/womenwk.dta") %>%
## Define indicator whether wage variable is defined
mutate(lfp = !is.na(wage))
## STATA command Example 1: heckman wage educ age, select(married children educ age)
heckmanML <- selection(selection = lfp ~ married + children + education + age,
outcome = wage ~ education + age, data = dat)
## Results Example 1
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.491015 0.189340 -13.156 < 2e-16 ***
#> married 0.445171 0.067395 6.605 5.07e-11 ***
#> children 0.438707 0.027783 15.791 < 2e-16 ***
#> education 0.055732 0.010735 5.192 2.30e-07 ***
#> age 0.036510 0.004153 8.790 < 2e-16 ***
#> Outcome equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.48579 1.07704 0.451 0.652
#> education 0.98995 0.05326 18.588 <2e-16 ***
#> age 0.21313 0.02060 10.345 <2e-16 ***
#> Error terms:
#> Estimate Std. Error t value Pr(>|t|)
#> sigma 6.00479 0.16572 36.23 <2e-16 ***
#> rho 0.70350 0.05123 13.73 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
vcovCL <- function(fm, cluster) {
M <- length(unique(cluster))
N <- length(cluster)
dfc <- M/(M-1)
u <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum))
dfc * sandwich(fm, meat=crossprod(u)/N)
}
heckmanML[["vcovAll"]] <- vcovCL(heckmanML, dat$county)
summary(heckmanML)
#> --------------------------------------------
#> Tobit 2 model (sample selection model)
#> Maximum Likelihood estimation
#> Newton-Raphson maximisation, 3 iterations
#> Return code 8: successive function values within relative tolerance limit (reltol)
#> Log-Likelihood: -5178.304
#> 2000 observations (657 censored and 1343 observed)
#> 10 free parameters (df = 1990)
#> Probit selection equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -2.491015 0.115330 -21.599 < 2e-16 ***
#> married 0.445171 0.073147 6.086 1.39e-09 ***
#> children 0.438707 0.031239 14.044 < 2e-16 ***
#> education 0.055732 0.011004 5.065 4.47e-07 ***
#> age 0.036510 0.004038 9.042 < 2e-16 ***
#> Outcome equation:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.48579 1.30210 0.373 0.709
#> education 0.98995 0.06001 16.498 <2e-16 ***
#> age 0.21313 0.02099 10.151 <2e-16 ***
#> Error terms:
#> Estimate Std. Error t value Pr(>|t|)
#> sigma 6.00479 0.15520 38.691 <2e-16 ***
#> rho 0.70350 0.07088 9.925 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------
Created on 2021-05-08 by the reprex package (v2.0.0)

Resources