Values of bootstrap statistics in R

I want to get the values of the bootstrap statistics (original, bias and std. error) into a separate list, but I cannot figure out how to do that.
Here's an example:
> library(boot)
> set.seed(123)
> mean.fun <- function(data, idx) { mean(data[idx]) }
> data <- boot(data=rnorm(100), statistic=mean.fun, R=999)
> names(data)
[1] "t0" "t" "R" "data"
[5] "seed" "statistic" "sim" "call"
[9] "stype" "strata" "weights"
> data
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = rnorm(100), statistic = mean.fun, R = 999)
Bootstrap Statistics :
original bias std. error
t1* 0.09040591 0.004751773 0.08823615
Now, instead of text, I want the actual values. Apparently data$t0 is the "original", but I don't see how to get the values for bias and std. error.
Also, since typing a function name prints its code, I typed boot in R, copied a snippet from the source code, and tried to search for it in my local R installation, but couldn't find anything. Why not? Shouldn't R read that source code from local storage?

The std. error and bias are not stored as part of the boot object; they are calculated on the fly when the object is printed (see: https://stat.ethz.ch/pipermail/r-help/2011-July/284660.html)
In your case, try:
mean(data$t) - data$t0
sd(data$t)
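If you want all three columns at once, as actual numbers, you can recompute them from the stored replicates. The sketch below builds a data frame with one row per statistic (one per column of the replicate matrix b$t). As far as I know this matches how print.boot derives the printed "bias" and "std. error" columns for an ordinary bootstrap; the names b and boot_summary are just illustrative.

```r
library(boot)

set.seed(123)
mean.fun <- function(data, idx) mean(data[idx])
b <- boot(data = rnorm(100), statistic = mean.fun, R = 999)

# b$t0: statistic(s) on the original data
# b$t:  R x k matrix of bootstrap replicates (one column per statistic)
boot_summary <- data.frame(
  original  = b$t0,
  bias      = apply(b$t, 2, mean, na.rm = TRUE) - b$t0,  # mean of replicates minus original
  std.error = apply(b$t, 2, sd, na.rm = TRUE)            # spread of the replicates
)
boot_summary
```

Because the result is a plain data frame, boot_summary$bias or boot_summary[["std.error"]] give you bare numbers with no printed text attached.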

Related

Trying to create a data frame for mlogit and keep running into this error: Error in names(data)[ix] : invalid subscript type 'language'

I am trying to use this data set (https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Person-Survey-2019/6bqn-qdwq) to create a multinomial logit (MNL) model, but every time I try to transform my original data frame like this
nydata_df = dfidx(nydata, shape="wide",choice="work_mode",varying = sort)
I get this error:
Error in names(data)[ix] : invalid subscript type 'language'
I'm unclear about what is causing this error. I think something is wrong with dplyr, but I am not sure.
According to this vignette from the mlogit package, the varying argument should be used to specify which variables should be "lengthened" when converting a data frame from wide to long format using dfidx. Are you actively trying to lengthen your data frame (in the style of dplyr::pivot_longer())?
If you aren't, I don't believe that you need the varying argument (see ?stats::reshape for more info on varying). If you do want to use the varying argument, you should name specific variables rather than just sort (example1, example2). Additionally, when I run your models, I don't get a NaN for McFadden's R2, the p-value, or the chi-square test. Are your packages fully up to date?
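To make the meaning of varying concrete, here is a minimal sketch using base R's stats::reshape() rather than dfidx. The data frame wide and its columns (cost.car, cost.bus, choice) are hypothetical; dfidx's own argument handling may differ, so this only illustrates the concept of listing the alternative-specific columns that get stacked into one long column.

```r
# Hypothetical wide data: one row per person, one cost column per alternative
wide <- data.frame(
  id       = 1:3,
  choice   = c("car", "bus", "car"),
  cost.car = c(5, 6, 7),
  cost.bus = c(2, 3, 2)
)

# varying names the columns to lengthen; with sep = "." reshape() splits
# "cost.car" into the variable name "cost" and the time/alternative value "car"
long <- reshape(
  wide,
  direction = "long",
  varying   = c("cost.car", "cost.bus"),
  sep       = ".",
  timevar   = "alt",
  idvar     = "id"
)
long
```

Each person now has one row per alternative, with a single cost column. Passing something that is not a set of column names (such as a bare function name) gives reshape nothing meaningful to split.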
library(dfidx)
library(mlogit)
library(performance) # to extract McFadden's R2 easily
packageVersion("dfidx")
#> [1] '0.0.5'
packageVersion("mlogit")
#> [1] '1.1.1'
packageVersion("dplyr")
#> [1] '1.0.10'
# currently running RStudio Version 2022.7.2.576
nydata <- read.csv(url("https://data.cityofnewyork.us/api/views/6bqn-qdwq/rows.csv?accessType=DOWNLOAD"))
nydata_df <- dfidx(data = nydata,
shape = "wide",
choice = "work_mode")
m <- mlogit(work_mode ~ 1, nydata_df)
#summary(m)
r2_mcfadden(m)
#> McFadden's R2
#> 1.110223e-16
m3 <- mlogit(work_mode ~ 1 | harassment_mode + age, nydata_df)
#summary(m3)
r2_mcfadden(m3)
#> McFadden's R2
#> 0.03410362

How to hide conflicting packages when a method is called in R

I have a problem with conflicting packages.
I use the arfima and forecast packages, which both have methods with identical names (like AIC, BIC, etc.).
So when I try to run code, the wrong methods get called: R tries to use the method from the most recently loaded package. Yes, I know I can call exported methods with "::". The main problem is the internal code of these methods: they call their own package's methods in the function body without "::". For example, when I try to run this code:
require(arfima);
require(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
It gives these errors:
> fit <- arfima::arfima(x);
Note: autoweed is ON. It is possible, but not likely,
that unique modes may be lost.
Beginning the fits with 2 starting values.
> summary(fit);
Error in AIC.logLik(logl) :unrealized type (29) in 'eval'
This code works well without the forecast package loaded (I ran it in a clean R session):
require(arfima);
#require(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
# Note: autoweed is ON. It is possible, but not likely,
# that unique modes may be lost.
# Beginning the fits with 2 starting values.
#
# summary(fit);
#
# Call:
#
# arfima::arfima(z = x)
#
#
# Mode 1 Coefficients:
# Estimate Std. Error Th. Std. Err. z-value Pr(>|z|)
# d.f -0.0208667 0.0770519 0.0779679 -0.27081 0.78653
# Fitted mean -0.0432115 0.0845518 NA -0.51107 0.60930
# sigma^2 estimated as 0.851957; Log-likelihood = 8.51214; AIC = -11.0243; BIC = 282.976
#
# Numerical Correlations of Coefficients:
# d.f Fitted mean
# d.f 1.00 -0.09
# Fitted mean -0.09 1.00
#
# Theoretical Correlations of Coefficients:
# d.f
# d.f 1.00
#
# Expected Fisher Information Matrix of Coefficients:
# d.f
# d.f 1.65
So, is it possible to hide a package for part of the executed code? Something like this:
require(arfima);
#require(forecast);
hide(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
unhide(forecast);
or like this:
require(arfima);
#require(forecast);
used(arfima);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
unused(arfima);
Perhaps you should update all your package installations and retry. I don't get any error with your code using a newly installed binary copy of arfima. AIC is generic, and when you look at the loaded methods you do see one for objects of class "arfima", so using the "::" operator should not be needed:
> methods(AIC)
[1] AIC.arfima* AIC.default* AIC.logLik*
see '?methods' for accessing help and source code
There is a detach() function, but it is often not enough to completely set aside a package unless its unload parameter is set to TRUE, and in this case it may be overkill, since your diagnosis appears to be incorrect. The following shows that this strategy is feasible, though I still suspect it is unnecessary:
require(arfima);
require(forecast);
detach(package:forecast, unload = TRUE)  # detach forecast and unload its namespace
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
library(forecast)  # reattach forecast afterwards
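If you still want something close to the hide()/unhide() idea from the question, a minimal sketch is a wrapper around detach() and library(). with_package_hidden() is a hypothetical helper, not a function from base R or any package. It relies on R's lazy evaluation: expr is only evaluated after the package has been detached. Assumptions and caveats: if another loaded namespace imports the package, unloading should fail with a warning and the package is only detached; and reattaching puts the package back at position 2 of the search path, which may differ from where it was.

```r
# Hypothetical helper: temporarily detach an attached package while `expr`
# is evaluated, then reattach it (even if evaluating `expr` throws an error).
with_package_hidden <- function(pkg, expr, unload = TRUE) {
  search_name <- paste0("package:", pkg)
  if (!(search_name %in% search())) {
    return(expr)  # package not attached: nothing to hide, just evaluate
  }
  detach(search_name, character.only = TRUE, unload = unload)
  on.exit(library(pkg, character.only = TRUE), add = TRUE)
  expr  # the promise is forced here, after the detach
}

# Usage for the question's case (requires arfima and forecast):
# library(arfima); library(forecast)
# fit <- arfima::arfima(rnorm(100))
# s <- with_package_hidden("forecast", summary(fit))
```

A function taking an expression fits R better than paired hide()/unhide() calls, because on.exit() guarantees the package comes back even when the hidden code fails.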
This shows that there are class specific methods for summary so there should be no errors caused by package confusion there:
> methods(summary)
[1] summary,ANY-method summary,DBIObject-method
[3] summary,quantmod-method summary.aov
[5] summary.aovlist* summary.arfima*
[7] summary.Arima* summary.arma*
[9] summary.aspell* summary.check_packages_in_dir*
[11] summary.connection summary.data.frame
[13] summary.Date summary.default
[15] summary.ecdf* summary.ets*
[17] summary.factor summary.forecast*
[... remaining output snipped ...]
Furthermore, you are incorrect in thinking there is a forecast::BIC. With forecast and arfima loaded, we see only:
> methods(BIC)
[1] BIC.arfima*
And there is no suggestion on the forecast package's index page that either AIC or BIC is defined within that package. So any object created by the forecast functions would be handled by AIC.default and would be expected to fail with BIC.
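To check which package actually supplies the S3 method that will be dispatched, you can look up the method and ask for the name of its enclosing environment. method_home() below is a hypothetical helper built on utils::getS3method(), which also finds registered methods that are not exported (the ones marked with * in methods() output).

```r
# Hypothetical helper: which namespace supplies the S3 method generic.class?
method_home <- function(generic, class) {
  f <- utils::getS3method(generic, class, optional = TRUE)
  if (is.null(f)) return(NA_character_)  # no such method found
  environmentName(environment(f))
}

method_home("AIC", "default")   # "stats"
method_home("AIC", "logLik")
# With arfima installed and loaded:
# method_home("summary", "arfima")
```

If the AIC and summary methods for arfima objects report "arfima" as their home, loading forecast cannot be redirecting those calls.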

Error message in R: if (nomZ %in% coded) { : argument is of length zero

I'm very new to R (and Stack Overflow). I've been trying to conduct a simple slopes analysis for my continuous-by-dichotomous interaction regression model using lmres and simpleSlope from the pequod package.
My variables:
SLS - continuous DV
csibdiff - continuous predictor (I already centered this variable manually with separate code)
culture - dichotomous moderator
newmod<-lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <-simpleSlope(newmod, pred="csibdiff", mod1="culture")
However, after running the simpleSlope function, I get this error message:
Error in if (nomZ %in% coded) { : argument is of length zero
I don't understand the nomZ part, but I assume something is wrong with my variables. What does this mean? I don't have anything named nomZ in my data at all. None of my variables are NULL (I checked them with is.null()), and I don't seem to have accidentally deleted the contents of any variable (I checked with table()).
If anyone can also suggest another function or package for simple slope analysis, I'd appreciate it. I've been stuck on this problem for a few days now.
EDIT: I subsetted the relevant variables into a csv file.
https://www.dropbox.com/s/6j82ky457ctepkz/sibdat2.csv?dl=0
tl;dr it looks like the authors of the package were thinking primarily about continuous moderators; if you specify mod1="cultureEuropean" (i.e. to match the name of the corresponding parameter in the output) the function returns an answer (I have no idea if it's sensible or not ...)
It would be a service to the community to let the maintainers of the pequod package (maintainer("pequod")) know about this issue ...
Read data and replicate error:
sibdat2 <- read.csv("sibdat2.csv")
library(pequod)
newmod <- lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="culture")
Check the data:
summary(sibdat2)
We do have some NA values in csibdiff, so try removing these ...
sibdat2B <- na.omit(sibdat2)
But that doesn't actually help (same error as before).
Plot the data to check for other strangeness:
library(ggplot2); theme_set(theme_bw())
ggplot(sibdat2B, aes(csibdiff, SLS, colour = culture)) +
  stat_sum(aes(size = factor(after_stat(n)))) +  # after_stat(n) replaces the deprecated ..n.. notation
  geom_smooth(method = "lm")
There's not much going on here, but nothing obviously wrong either ...
Use traceback() to see approximately where the problem is:
traceback()
3: simple.slope(object, pred, mod1, mod2, coded)
2: simpleSlope.default(newmod, pred = "csibdiff", mod1 = "culture")
1: simpleSlope(newmod, pred = "csibdiff", mod1 = "culture")
We could use options(error=recover) to jump right to the scene of the crime, but let's try step-by-step debugging instead ...
debug(pequod:::simple.slope)
As we go through we can see this:
nomZ <- names(regr$coef)[pos_mod]
nomZ ## character(0)
Looking a bit farther back, we can see that pos_mod is also a zero-length integer. Farther back still, we see that the code searches the parameter names (the row names of the variance-covariance matrix) for the name of the moderator ... but it's not there.
debug: pos_pred_mod1 <- fI + grep(paste0("\\b", mod1, "\\b"), jj[(fI + 1):(fI + fII)])
Browse[2]> pos_mod
## integer(0)
Browse[2]> jj[1:fI]
## [[1]]
## [1] "(Intercept)"
##
## [[2]]
## [1] "csibdiff"
##
## [[3]]
## [1] "cultureEuropean"
Browse[2]> mod1
## [1] "culture"
The solution is to tell simpleSlope to look for a variable that is there ...
(newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="cultureEuropean"))
## Simple Slope:
## simple slope standard error t-value p.value
## Low cultureEuropean (-1 SD) -0.2720128 0.2264635 -1.201133 0.2336911
## High cultureEuropean (+1 SD) 0.2149291 0.1668690 1.288011 0.2019241
We do get some warnings about NaNs produced; you'll have to dig further yourself to see whether you need to worry about them.
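The mismatch comes from how R names coefficients for factors: with the default treatment contrasts, the coefficient is named after the variable plus the non-reference level, so a word-boundary search for the bare variable name finds nothing. The minimal base R sketch below reproduces this without pequod. The data frame toy, the model fit and the levels "Asian"/"European" are hypothetical, made up for illustration.

```r
set.seed(1)
toy <- data.frame(
  x       = rnorm(40),
  culture = factor(rep(c("Asian", "European"), each = 20))
)
toy$y <- 0.5 * toy$x + rnorm(40)

fit <- lm(y ~ x * culture, data = toy)
coef_names <- names(coef(fit))
coef_names
# "(Intercept)" "x" "cultureEuropean" "x:cultureEuropean"

# The same kind of word-boundary search that simple.slope performs:
grep("\\bculture\\b", coef_names)          # integer(0): "culture" is followed by "E", not a boundary
grep("\\bcultureEuropean\\b", coef_names)  # finds the main effect and the interaction
```

This is why passing mod1 = "cultureEuropean" lets simpleSlope find the moderator.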

How to calculate panel bootstrapped standard errors with R?

I recently switched from Stata to R and am struggling to find some corresponding commands. I would like to get panel bootstrapped standard errors from a fixed-effects model using the plm library, as described here for Stata users.
My questions concern the approach in general (whether boot or meboot is the appropriate library) and how to solve the particular error below when using boot:
First get some panel data:
library(plm)
data(EmplUK) # from plm library
test<-function(data, i) coef(plm(wage~emp+sector,data = data[i,],
index=c("firm","year"),model="within"))
Second:
library(boot)
boot<-boot(EmplUK, test, R = 100)
> boot<-boot(EmplUK, test, R = 100)
duplicate couples (time-id)
Error in pdim.default(index[[1]], index[[2]]) :
Called from: top level
Browse[1]>
boot resamples rows with replacement, so the index it passes to the statistic (original here) contains duplicated values, and plm rejects the resulting duplicate (firm, year) pairs. You should remove the duplicated values so that the index is unique before passing the data to plm.
test <- function(data,original) {
coef(plm(wage~emp+sector,data = data[unique(original),],
index=c("firm","year"),model="within"))
}
boot(EmplUK, test, R = 100)
## ORDINARY NONPARAMETRIC BOOTSTRAP
## Call:
## boot(data = EmplUK, statistic = test, R = 100)
## Bootstrap Statistics :
## original bias std. error
## t1* -0.1198127 -0.01255009 0.05269375
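Note that dropping duplicates turns each replicate into a subsample of the distinct rows rather than a with-replacement resample, which may distort the bootstrap standard errors. A common alternative for panel data is to resample whole firms (a cluster or block bootstrap) and give each repeated draw a new firm id, so duplicates become separate panels instead of clashing (firm, year) pairs. The sketch below assumes simulated data: panel, cluster_stat and all the numbers are hypothetical. It uses base lm() with firm dummies, which gives the same slope as the within estimator. With plm installed, you could instead call plm(..., index = c("firm", "year"), model = "within") on the relabeled data.

```r
library(boot)

# Hypothetical balanced panel standing in for EmplUK
set.seed(42)
n_firms <- 30; n_years <- 5
panel <- data.frame(
  firm = rep(seq_len(n_firms), each = n_years),
  year = rep(seq_len(n_years), times = n_firms)
)
firm_effect <- rnorm(n_firms)[panel$firm]
panel$emp  <- rnorm(nrow(panel)) + firm_effect
panel$wage <- 2 - 0.3 * panel$emp + firm_effect + rnorm(nrow(panel))

# boot() resamples the vector of firm ids; idx indexes firms, not rows
cluster_stat <- function(firm_ids, idx, panel) {
  pieces <- lapply(seq_along(idx), function(k) {
    rows <- panel[panel$firm == firm_ids[idx[k]], ]
    rows$firm <- k  # relabel so a firm drawn twice becomes two distinct panels
    rows
  })
  d <- do.call(rbind, pieces)
  # within (fixed-effects) slope via the equivalent dummy-variable regression
  coef(lm(wage ~ emp + factor(firm), data = d))[["emp"]]
}

# extra named arguments (panel) are passed on to the statistic
b <- boot(unique(panel$firm), cluster_stat, R = 200, panel = panel)
c(original = b$t0, bias = mean(b$t) - b$t0, std.error = sd(b$t))
```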

How to retrieve value of a function in R

When calling a function in R, how can I retrieve the resulting values? For example, I used the roc function and need to extract the AUC value and its CI (0.6693 and 0.6196-0.7191, respectively, in the following example).
> roc(tmpData[,lenCnames], fitted(model), ci=TRUE)
Call:
roc.default(response = tmpData[, lenCnames], predictor = fitted(model), ci = TRUE)
Data: fitted(model) in 127 controls (tmpData[, lenCnames] 0) < 3248 cases (tmpData[, lenCnames] 1).
Area under the curve: 0.6693
95% CI: 0.6196-0.7191 (DeLong)
If I store the result as z (z <- roc(...)), I can fetch these values, but only together with their associated text:
> z$auc
Area under the curve: 0.6693
> z$ci
95% CI: 0.6196-0.7191 (DeLong)
Is there a way to get only the values and not the text?
I do know how to get them using a regular expression or strsplit(), but I suspect there is a more direct way to access these values.
It's helpful to use reproducible examples when asking a question. It's also best to name the package you're asking about ("pROC"), since it is not part of base R. pROC has functions that extract auc and ci.auc objects from the roc object.
> library("pROC")
> data(aSAH)
# Basic example
> z <- roc(aSAH$outcome, aSAH$s100b,
+     levels = c("Good", "Poor"))
# Examining the class of the 'auc' output shows that it is also of class 'numeric'
> class(auc(z))
[1] "auc" "numeric"
# Calling 'as.numeric' extracts the value
> as.numeric(auc(z))
[1] 0.7313686
# Calling 'as.numeric' on the 'ci.auc' object extracts three values (lower bound, AUC, upper bound)
> as.numeric(ci(z))
[1] 0.6301182 0.7313686 0.8326189
# The ones we want are the 1st and 3rd
> as.numeric(ci(z))[c(1,3)]
[1] 0.6301182 0.8326189
Using the functions str, class, and attributes will often help you figure out how to get what you want out of an object.
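The same inspect-then-strip approach works on many result objects. As a minimal base R illustration, using t.test() instead of pROC and simulated data, the confidence interval is stored as a numeric vector with an extra "conf.level" attribute. as.numeric() drops the attribute and unname() drops the label, leaving bare numbers.

```r
set.seed(1)
tt <- t.test(rnorm(30, mean = 1))

class(tt)                 # "htest"
str(tt$conf.int)          # numeric vector carrying a "conf.level" attribute
attributes(tt$conf.int)

ci_values <- as.numeric(tt$conf.int)  # just the lower and upper bounds
est <- unname(tt$estimate)            # drop the "mean of x" name
ci_values
est
```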
