How to calculate panel bootsrapped standard errors with R? - r

I recently changed from STATA to R and somehow struggles to find some corresponding commands. I would like to get panel bootsrapped standard errors from a Fixed Effect model using the plm library as described here here for STATA users:
My questions concern the approach in general (whether boot is the appropriate library or the library(meboot)
)
How to solve for that particular error using boot:
First get some panel data:
library(plm)
data(EmplUK) # from plm library
test<-function(data, i) coef(plm(wage~emp+sector,data = data[i,],
index=c("firm","year"),model="within"))
Second:
library(boot)
boot<-boot(EmplUK, test, R = 100)
> boot<-boot(EmplUK, test, R = 100)
duplicate couples (time-id)
Error in pdim.default(index[[1]], index[[2]]) :
Called from: top level
Browse[1]>

For some reason , boot will pass an index ( original here) to plm with duplicated values. You should remove all duplicated values and assert that the index is unique before passing it to plm.
test <- function(data,original) {
coef(plm(wage~emp+sector,data = data[unique(original),],
index=c("firm","year"),model="within"))
}
boot(EmplUK, test, R = 100)
## ORDINARY NONPARAMETRIC BOOTSTRAP
## Call:
## boot(data = EmplUK, statistic = test, R = 100)
## Bootstrap Statistics :
## original bias std. error
## t1* -0.1198127 -0.01255009 0.05269375

Related

error when using causalweights package in R

I was trying to estimate a causal effect using inverse probability weighting from the causalweightspackage. However, I keep running into the following error message:
Error in model.frame.default(formula = d ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
I want to estimate the causal effect taking into consideration a matrix including multiple control variables. When using a single control from the data-set, R manages to generate an estimate, but when I try to use the matrix including all my control variables, I receive the above-mentioned error message.
My code is as follows and appears to generate estimates when using a single control instead of my predefined matrix of multiple controls as observable in the following code:
attach(data_clean2)
controls <- cbind(marits_1, nationality1, mother_tongue1, educ1,
lastj_fct1, child_subsidies, contr_2y,
unempl_r, gdp_gr, insured_earn)
ipw_atet <- treatweight(y = duration_ue2, # take initial data
d = treatment,
x = controls,
ATET = TRUE, # if = FALSE, estimates ATE (default)
trim = (1-pscore_max0),
boot = 2)
Has anyone encountered similar problems and found a solution?
Thanks in advance

ggcoef_model error when two random intercepts

When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts in GGally I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).`
I have tinkered with figuring out the issue and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy and then feeding the data frame into GGally:ggcoef() with no avail. Any suggestions?
# Example with built-in randu data set
data(randu)
randu$A <- factor(rep(c(1,2), 200))
randu$B <- factor(rep(c(1,2,3,4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 +x|A) + (1|B), family="gaussian", data=randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB#ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.

Unused argument error when building a Confusion Matrix in R

I am currently trying to run Logistic Regression model on my DF.
While I was creating a new modelframe with the actual and predicted values i get get the following error message.
Error
Error in confusionMatrix(as.factor(log_class), lgtest$Satisfaction, positive = "satisfied") :
unused argument (positive = "satisfied")
This is my model:
#### Logistic regression model
log_model = glm(Satisfaction~., data = lgtrain, family = "binomial")
summary(log_model)
log_preds = predict(log_model, lgtest[,1:22], type = "response")
head(log_preds)
log_class = array(c(99))
for (i in 1:length(log_preds)){
if(log_preds[i]>0.5){
log_class[i]="satisfied"}else{log_class[i]="neutral or dissatisfied"}}
### Creating a new modelframe containing the actual and predicted values.
log_result = data.frame(Actual = lgtest$Satisfaction, Prediction = log_class)
lgtest$Satisfaction = factor(lgtest$Satisfaction, c(1,0),labels=c("satisfied","neutral or dissatisfied"))
lgtest
confusionMatrix(log_class, log_preds, threshold = 0.5) ####this works
mr1 = confusionMatrix(as.factor(log_class),lgtest$Satisfaction, positive = "satisfied") ## this is the line that causes the error
I had same problem. I typed "?confusionMatrix" and take this output:
Help on topic 'confusionMatrix' was found in the following packages:
confusionMatrix
(in package InformationValue in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Create a confusion matrix
(in package caret in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
Confusion Matrix
(in package ModelMetrics in library /home/beyza/R/x86_64-pc-linux-gnu-library/3.6)
As we can understand from here, since it is in more than one package, we need to specify which package we want to use.
So I typed code with "caret::confusionMatrix(...)" and it worked!
This is how we can write the code to get rid of argument error when building a confusion matrix in R
caret::confusionMatrix(
data = new_tree_predict$predicted,
reference = new_tree_predict$actual,
positive = "True"
)

How to obtain coefficients' p-values from a nested random effect model using lmeresampler

I estimated a mixed effect model with a nested random effect structure (participants were in different groups) with the lmer command of the lme4 package.
mixed.model <- lmer(ln.v ~ treatment*level+age+income+(1 | group/participant),data=data)
Then I bootstrapped the bootstrap command from the lmeresampler package because of the nested structure. I used the semi-parametric bootstrap.
boot.mixed.model <- bootstrap(model = mixed.model, type = "cgr", fn = extractor, B = 10000, resample=c(data$group,data$participant))
I can obtain bootsrapped confidence intervals via boot.ci (package boot) but in addition I want to report the coefficients' p-values. The output of the bootstrapped model boot.mixed.model provides only the bias and the standard error:
Bootstrap Statistics :
original bias std. error
t1* 0.658442415 -7.060056e-02 2.34685668
t2* -0.452128438 -2.755208e-03 0.17041300
…
What is the best way to calculate the p-values based on these values?
I am unaware of the package called lmeresampler, and it seems to have been removed from cran due to compatibility issues (failed cran checks).
Also, the question does not include data and extractor is not defined, so the example is not reproducible. However the output is the same as you would get by using the bootMer function from lme4 so produce and example using the inbuilt function.
Basically this follows the example from the help(bootMer) page, but expanded for the specific problem. If the object returend by the lmeresampler package is similar, it will contain the objects used.
Reproducible example
library(lme4)
data(Dyestuff, package = "lme4")
fm01ML <- lmer(Yield ~ 1|Batch, Dyestuff, REML = FALSE)
Now the bootMer function simply requires a function, which outputs a vector of interesting parameters.
StatFun <- function(merMod){
pars <- getME(merMod, c("fixef", "theta", "sigma"))
c(beta = pars$fixef, theta = unname(pars$theta * pars$sigma), sigma = pars$sigma) ### <<== Error corrected
}
We can perform our bootstrapping by using the bootMer, which also contains parametric options in type (i suggest reading the details in the help(bootMer) page for more information)
boo01 <- bootMer(fm01ML, StatFun, nsim = 100, seed = 101)
Now for more precise p-values, I'd advice p-values greater closer to 1000 but for time reasons it might not be feasible in every circumstance.
Regardless the output is stored in a matrix t, which we can use to perform a simple Kolmogorov-supremum test:
H0 <- c(0, 0, 0)
Test <- sweep(abs(boo01$t), 2, H0, "-") <= H0 ###<<=== Error corrected
pVals <- colSums(Test)/nrow(Test)
print(pVals)
#output#
beta.(Intercept) theta sigma
0.00 0.12 0.00

Error when bootstrapping large n with boot package (error: integer overflow)

Why can I not bootstrap a statistic with large n using the boot package? Although, 150,000 obs is not large, so I don't know why this isn't working.
Example
library(boot)
bs <- boot(rnorm(150000), sum, R = 1000)
bs
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = rnorm(150000), statistic = sum, R = 1000)
Bootstrap Statistics :
WARNING: All values of t1* are NA
Error Message
In statistic(data, i[r, ], ...) : integer overflow - use
sum(as.numeric(.))
You're not using boot() as documented (which is, admittedly, surprisingly complex). From ?boot:
In all other cases ‘statistic’
must take at least two arguments. The first argument passed
will always be the original data. The second will be a
vector of indices, frequencies or weights which define the
bootstrap sample.
I think you want:
bsum <- function(x,i) sum(x[i])
bs <- boot(rnorm(150000), bsum, R = 1000)
I haven't taken the time to figure out what boot() is actually doing in your case - almost certainly not what you want though.

Resources