I have the following code for a simple bootstrapping example.
X <- runif(100)
errors <- rexp(length(X))-1
Y <- 1 + 2*X + errors
ols.mod <- lm(Y ~ X)
boot <- Boot(ols.mod, method = "case", R = 1000)
The above code works if executed as individual lines. However, when I wrap the code in a function as below:
test_func <- function() {
X <- runif(100)
errors <- rexp(length(X))-1
Y <- 1 + 2*X + errors
ols.mod <- lm(Y ~ X)
boot <- Boot(ols.mod, method = "case", R = 1000)
}
test_func()
Executing this yields an error as below:
Error in eval(predvars, data, env): object 'Y' not found
Any ideas on why this is happening? Thank you!
This works for me.
doBootstrapping <- function(n = 100, r = 1000) {
xy <- data.frame(X = runif(n = n))
errors <- rexp(length(xy$X))-1
xy$Y <- 1 + 2*xy$X + errors
ols.mod <- lm(Y ~ X, data = xy)
boot <- Boot(ols.mod, method = "case", R = r)
}
out <- doBootstrapping(n = 100, r = 1000)
> str(out)
List of 11
$ t0 : Named num [1:2] 0.986 1.953
..- attr(*, "names")= chr [1:2] "(Intercept)" "X"
$ t : num [1:1000, 1:2] 1.188 1.073 1.083 1.127 0.964 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "(Intercept)" "X"
$ R : num 1000
$ data :'data.frame': 100 obs. of 1 variable:
..$ .zero: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
$ seed : int [1:626] 403 334 599478801 441938554 -1355732193 -409499586 -248637084 -2048103023 20568410 1088692573 ...
$ statistic:function (data, indices, .fn)
$ sim : chr "ordinary"
$ call : language boot::boot(data = dd, statistic = boot.f, R = R, .fn = f, parallel = parallel_env, ncpus = ncores)
$ stype : chr "i"
$ strata : num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
$ weights : num [1:100] 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...
- attr(*, "class")= chr "boot"
- attr(*, "boot_type")= chr "boot"
Related
I have this code which works for list [[1]] and list of list [[200]]:
SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]][[200]],
shap_contrib = shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0),
X_train = as.matrix(TrainTestData[[1]]$XTrain[[200]])
#top_n = 4
)
I can simply replace out the [[200]] for [[300]], [[400]] etc. and obtain a new data structure (the shap.prep function comes from the shapforxgboost package.
xgb.mod[[1]][[200]] is a single xgboost model
shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0) is a data frame with the following structure.
> str(shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0))
'data.frame': 2190 obs. of 29 variables:
$ holiday : num -0.276 -0.347 -0.284 -0.356 -0.197 ...
$ temp : num 0.35 0.25 0.144 0.227 0.16 ...
$ wind : num -0.116 -0.187 -0.25 -0.265 -0.135 ...
$ humidity : num -0.021 0.0125 -0.037 0.016 -0.0196 ...
$ barometer : num -0.0191742 -0.0000462 0.0444956 -0.0148842 -0.0551703 ...
$ weekday : num -0.00421 -0.00937 0.0012 -0.01194 -0.00931 ...
$ weekend : num 0 0 0 0 0 0 0 0 0 0 ...
$ workday_on_holiday : num -0.00949 -0.00949 -0.00885 -0.00949 -0.00885 ...
$ weekend_on_holiday : num 0 0 0 0 0 0 0 0 0 0 ...
$ protocol_active : num 0 0 0 0 0 0 0 0 0 0 ...
$ text_fog : num 0.00714 0.00714 0.00783 0.00783 0.00772 ...
$ text_light_rain : num -0.000364 -0.000364 -0.000364 -0.000364 -0.000364 ...
$ text_mostly_cloudy : num -0.0013 -0.0013 -0.0013 -0.0013 -0.0013 ...
$ text_passing_clouds : num 0.00135 0.00152 0.00363 0.00152 0.00345 ...
$ text_rain : num -0.0000682 -0.0000682 -0.0000682 -0.0000682 -0.0000682 ...
$ text_scattered_clouds: num -0.0941 -0.0832 -0.1497 -0.0813 -0.0965 ...
$ text_sunny : num 0.000635 0.007435 0.009286 0.007435 0.007009 ...
$ month_1 : num 0.045 0.0503 0.062 0.062 0.0484 ...
$ month_2 : num 0.0602 0.0529 0.0526 0.0529 0.1008 ...
$ month_3 : num 0.0467 0.0348 0.0333 0.0348 0.0467 ...
$ month_4 : num -0.03439 -0.03439 -0.00777 -0.03439 -0.00164 ...
$ month_5 : num -0.02191 -0.02191 -0.00836 -0.02026 -0.01533 ...
$ month_6 : num -0.05498 -0.00637 -0.04769 -0.05101 -0.05155 ...
$ month_7 : num -0.1302 -0.1126 -0.0878 -0.0963 -0.1535 ...
$ month_8 : num -0.0418 -0.051 -0.0727 -0.0437 -0.0957 ...
$ month_9 : num 0.164 0.185 0.141 0.193 0.122 ...
$ month_10 : num 0.206 0.251 0.243 0.251 0.211 ...
$ month_11 : num 0.0929 0.0744 0.0302 0.0568 0.0961 ...
$ month_12 : num 0.059 0.0608 0.0806 0.0608 0.0788 ...
Finally as.matrix(TrainTestData[[1]]$XTrain[[200]]) is a dgcMatrix which I convert to a simple matrix using as.matrix() which has structure:
> str(as.matrix(TrainTestData[[1]]$XTrain[[200]]))
num [1:2190, 1:29] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:29] "holiday" "temp" "wind" "humidity" ...
I have 3 pieces of data I would like to apply the shap.prep function to.
The desired output would be to have a list (or list of lists) where the shap.prep function has been applied. The function requires 3 inputs shap.prep(xgb_model = NULL, shap_contrib = NULL, X_train,
top_n = NULL) which is what I am providing.
How can I use imap correctly to pass all three objects to the shap_prep function and obtain lists as my output?
It's difficult for me to give some dput() data since the I am not sure if its possible to dput() a trained XGBoost model.
EDIT:
I am adding the closest thing I can get to a reproducible example.
data(iris)
df <- split(iris, iris$Species) # I just want to create some lists here
library(xgboost)
library(SHAPforxgboost)
dtrainFunction <- function(i){
dt = xgb.DMatrix(data = data.matrix(i[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]), label = i$Species)
}
dtrain <- map(df, dtrainFunction) # I just apply the dtrainFunction which just puts each list into an xgb.DMatrix
xgb.mod <- map(dtrain, ~xgboost(data = .x, nround = 20)) # Apply the xgboost model to each list
# could not get this part of the code to work but it's not important. I manually put the results into a list below.
# shap_values_function <- function(j){
# map2(
# .x = xgb.mod[[j]],
# .y = df[[j]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
# ~shap.values(xgb_model = .x, X_train = as.matrix(.y))
# )
# }
#
# shap_values_results <- lapply(seq(1:3), shap_values_function)
# Here I manually put the results into a list which are lists of shap.values
shap_values_results <- list(
shap.values(xgb_model = xgb.mod[[1]], X_train = as.matrix(df[[1]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
shap.values(xgb_model = xgb.mod[[2]], X_train = as.matrix(df[[2]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
shap.values(xgb_model = xgb.mod[[3]], X_train = as.matrix(df[[3]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]))
)
# Something is wrong here which is something to do with shap_contrib and BIAS0
SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]],
shap_contrib = shap_values_results[[1]]$shap_score[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
X_train = as.matrix(df[[1]])
#top_n = 4
)
shap.plot.summary(data_long = SHAP_Prep_data)
My actual code for the SHAP_Prep_data data is slightly different to the one above since I have list of lists.
EDIT 2:
I have tried the following which throws an error:
SHAP_Prep_data <- pmap(
list(
.x = xgb.model[[1]],
.y = shap_values_results[[1]],
.z = TrainTestData[[1]]$XTrain
), ~shap.prep(
xgb_model = .x,
shap_contrib = .y,
X_train = as.matrix(.z))
)
Error in as.matrix(.z) : object '.z' not found
EDIT 3:
When I apply the function on the iris data example:
SHAP_Prep_data <- pmap(
list(
.x = xgb.mod,
.y = shap_values_results,
.z = dtrain
), ~shap.prep(
xgb_model = .x,
shap_contrib = .y,
X_train = as.matrix(.z))
)
Error in as.matrix(.z) : object '.z' not found
EDIT 4:
I want to be able to access the $shap_score data which is created from the shap.values function used earlier (and also remove the column BIAS0 in the data from the following line).
shap_contrib = shap_values_results[[1]][[1300]]$shap_score %>% select(-BIAS0)
So would another map be needed here? or should I extract the shap_score data earlier in the function and remove the BIAS0 column there so that I can just call NEWDATA_shap_score[[1]][[1300]]?
The issue is that
str(as.matrix(df[[1]]))
#chr [1:50, 1:5] "5.1" "4.9" "4.7" "4.6" "5.0" "5.4" "4.6" "5.0" "4.4" "4.9" "5.4" "4.8" "4.8" "4.3" "5.8" "5.7" "5.4" "5.1" "5.7" "5.1" ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:50] "1" "2" "3" "4" ...
# ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...
returns a character matrix as the last column is a character column. Remove the last column and then do the conversion
out <- shap.prep(xgb_model = xgb.mod[[1]],
shap_contrib = shap_values_results[[1]]$shap_score[,
c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
X_train = as.matrix(df[[1]][-5]) ###
#top_n = 4
)
Regarding using this with pmap
out2 <- pmap(list( xgb.model[[1]],
shap_values_results[[1]],
TrainTestData[[1]]$XTrain),
~shap.prep(
xgb_model = ..1,
shap_contrib = ..2$shap_score %>% select(-BIAS0),
X_train = as.matrix(..3)))
If we also want to apply this on the list of lists
pmap(list(xgb.model,
shap_values_results,
TrainTestData), ~
pmap(list(..1, ..2, ..3$xTrain), ~
shap.prep(xgb_model = ..1,
shap_contrib = ..2$shap_score %>% select(-BIAS0),
X_train = as.matrix(..3))))
It's hard to say without a reproducible example, but it sounds like you want pmap rather than imap
a <- list(letters[1:3])
b <- list(letters[4:6])
c <- list(letters[7:9])
purrr::pmap(list(a,b,c), function (x,y,z) paste(x, y, z))
#> [[1]]
#> [1] "a d g" "b e h" "c f i"
Created on 2020-01-08 by the reprex package (v0.3.0)
I have undertaken ARIMA modelling using the auto.arima function for 91 models. The outputs are sitting in a list of lists.
The structure of the outputs for one model looks like the following:
List of 19
$ coef : Named num [1:8] -3.17e-01 -3.78e-01 -8.02e-01 -5.39e+04 -1.33e+05 ...
..- attr(*, "names")= chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ sigma2 : num 6.37e+10
$ var.coef : num [1:8, 1:8] 1.84e-02 8.90e-03 -7.69e-03 -8.80e+02 2.83e+03 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
.. ..$ : chr [1:8] "ar1" "ar2" "ma1" "Price.Diff" ...
$ mask : logi [1:8] TRUE TRUE TRUE TRUE TRUE TRUE ...
$ loglik : num -1189
$ aic : num 2395
$ arma : int [1:7] 2 1 0 0 1 1 0
$ residuals: Time-Series [1:87] from 1 to 87: 1810 -59503 263294 240970 94842 ...
$ call : language auto.arima(y = x[, 2], stepwise = FALSE, approximation = FALSE, xreg = x[, 3:ncol(x)], x = list(x = c(1856264.57,| __truncated__ ...
$ series : chr "x[, 2]"
$ code : int 0
$ n.cond : int 0
$ nobs : int 86
$ model :List of 10
..$ phi : num [1:2] -0.317 -0.378
..$ theta: num -0.802
..$ Delta: num 1
..$ Z : num [1:3] 1 0 1
..$ a : num [1:3] -599787 284456 1887763
..$ P : num [1:3, 1:3] 0.00 0.00 -4.47e-23 0.00 3.33e-16 ...
..$ T : num [1:3, 1:3] -0.317 -0.378 1 1 0 ...
..$ V : num [1:3, 1:3] 1 -0.802 0 -0.802 0.643 ...
..$ h : num 0
..$ Pn : num [1:3, 1:3] 1.00 -8.02e-01 -1.83e-23 -8.02e-01 6.43e-01 ...
$ bic : num 2417
$ aicc : num 2398
$ xreg : Time-Series [1:87, 1:5] from 1 to 87: -0.866 -0.466 -1.383 -0.999 -0.383 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "Price.Diff" "Easter" "Christmas" "High.Week" ...
$ x : Time-Series [1:87] from 1 to 87: 1856265 1393925 2200962 2209996 2161707 ...
$ fitted : Time-Series [1:87] from 1 to 87: 1854455 1453429 1937668 1969026 2066864 ...
- attr(*, "class")= chr [1:3] "ARIMA" "forecast_ARIMA" "Arima"
When printed the output looks as follows:
Series: x[, 2]
Regression with ARIMA(2,1,1) errors
Coefficients:
ar1 ar2 ma1 Price.Diff Easter Christmas High.Week Low.Week
-0.3170 -0.3777 -0.8017 -53931.11 -133187.55 -53541.62 -347146.59 216202.71
s.e. 0.1356 0.1319 0.1069 28195.33 68789.25 23396.62 -74115.78 66881.15
sigma^2 estimated as 6.374e+10: log likelihood=-1188.69
AIC=2395.38 AICc=2397.75 BIC=2417.47
I have written the following to export my models to text file format:
# export model outputs to newly created folder
for(i in 1:length(ts_outputs)){
sink(paste0(names(ts_outputs[i]), ".txt"))
print(ts_outputs[i])
sink()
}
This works, to view the model outputs themselves, however I need to be able to import the model outputs back into R to use them to forecast out my time series' forward.
I am assuming that I need to put them back into the original structure once re-imported.
Is there a certain package that has already been written to do this?
Are text files the way to go for the original exporting?
I believe the following is the source code from the forecast package which writes the outputs (https://rdrr.io/github/ttnsdcn/forecast-package/src/R/arima.R):
if (length(x$coef) > 0) {
cat("\nCoefficients:\n")
coef <- round(x$coef, digits=digits)
if (se && nrow(x$var.coef)) {
ses <- rep(0, length(coef))
ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits=digits)
coef <- matrix(coef, 1, dimnames=list(NULL, names(coef)))
coef <- rbind(coef, s.e.=ses)
}
print.default(coef, print.gap=2)
}
cm <- x$call$method
if (is.null(cm) || cm != "CSS")
{
cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": log likelihood=", format(round(x$loglik, 2)),"\n",sep="")
npar <- length(x$coef) + 1
nstar <- length(x$residuals) - x$arma[6] - x$arma[7]*x$arma[5]
bic <- x$aic + npar*(log(nstar) - 2)
aicc <- x$aic + 2*npar*(nstar/(nstar-npar-1) - 1)
cat("AIC=", format(round(x$aic, 2)), sep="")
cat(" AICc=", format(round(aicc, 2)), sep="")
cat(" BIC=", format(round(bic, 2)), "\n",sep="")
}
else cat("\nsigma^2 estimated as ", format(x$sigma2, digits=digits),
": part log likelihood=", format(round(x$loglik, 2)),
"\n", sep="")
invisible(x)
}
Appreciate any direction/advice.
I'm using MLR package and I stumbled on a problem with an S4 object. More specifically it's the slot name that causes the trouble. I'm looking for a way to change the slot's name, not the value.
Here's a reproducible code example that generates the object in question:
lrn1 = makeLearner("classif.lda", predict.type = "prob")
lrn2 = makeLearner("classif.ksvm", predict.type = "prob")
lrns = list(lrn1, lrn2)
rdesc.outer = makeResampleDesc("CV", iters = 5)
ms = list(auc, mmce)
bmr = benchmark(lrns, tasks = sonar.task, resampling = rdesc.outer,
measures = ms, show.info = FALSE)
preds = getBMRPredictions(bmr, drop = TRUE)
ROCRpreds = lapply(preds, asROCRPrediction)
ROCRperfs = lapply(ROCRpreds, function(x) ROCR::performance(x, "tpr", "fpr"))
The object is made of two lists and I need to change the name slots in both of them. Instead of x.values and y.values the names should be x and y respectively.
str(ROCRperfs$classif.lda)
Formal class 'performance' [package "ROCR"] with 6 slots
..# x.name : chr "False positive rate"
..# y.name : chr "True positive rate"
..# alpha.name : chr "Cutoff"
..# x.values :List of 5
.. ..$ : num [1:43] 0 0 0 0 0 ...
.. ..$ : num [1:42] 0 0 0 0.0526 0.0526 ...
.. ..$ : num [1:42] 0 0 0 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
.. ..$ : num [1:43] 0 0 0.0476 0.0476 0.0476 ...
.. ..$ : num [1:43] 0 0 0 0 0 ...
..# y.values :List of 5
.. ..$ : num [1:43] 0 0.0417 0.0833 0.125 0.1667 ...
.. ..$ : num [1:42] 0 0.0455 0.0909 0.0909 0.1364 ...
.. ..$ : num [1:42] 0 0.0476 0.0952 0.0952 0.1429 ...
.. ..$ : num [1:43] 0 0.0476 0.0476 0.0952 0.1429 ...
.. ..$ : num [1:43] 0 0.0435 0.087 0.1304 0.1739 ...
..# alpha.values:List of 5
.. ..$ : num [1:43] Inf 1 1 1 1 ...
.. ..$ : num [1:42] Inf 1 1 1 0.999 ...
.. ..$ : num [1:42] Inf 1 1 1 1 ...
.. ..$ : num [1:43] Inf 1 1 0.999 0.999 ...
.. ..$ : num [1:43] Inf 1 1 1 1 ...
As I'm beginner to OOP in R all I could was to print the slot with slot().
The bottom line is that all I want to do with the object in question is to plot is as follows:
plot(ROCRperfs[[1]], col = "blue", avg = "vertical", spread.estimate = "stderror",
show.spread.at = seq(0.1, 0.8, 0.1), plotCI.col = "blue", plotCI.lwd = 2, lwd = 2)
You cannot change the structure of an S4 class once it's defined. This is a feature, not a bug. By imposing restrictions on what can be done, S4 reduces the chance of bugs creeping into your code.
For example, consider what might happen if you changed the slotnames in the object to x and y, and then passed the object to a function that's expecting x.values and y.values. By not allowing you to make this change, S4 rules out the possibility that code down the line will be given an object whose structure they can't handle.
For your use case, you can just plot the x.values and y.values slots individually:
plot(ROCRperfs[[1]]#x.values, ROCRperfs[[1]]#y.values,
col = "blue", avg = "vertical", spread.estimate = "stderror",
show.spread.at = seq(0.1, 0.8, 0.1), plotCI.col = "blue",
plotCI.lwd = 2, lwd = 2))
I need a faster way of doing linear regression than the lm() method. I found that lm.fit() is quite a bit faster but I'm wondering how to use the results. For example using this code:
x = 1:5
y = 5:1
regr = lm.fit(as.matrix(x), y)
str(regr)
Outputs:
List of 8
$ coefficients : Named num 0.636
..- attr(*, "names")= chr "x1"
$ residuals : num [1:5] 4.364 2.727 1.091 -0.545 -2.182
$ effects : Named num [1:5] -4.719 1.69 -0.465 -2.619 -4.774
..- attr(*, "names")= chr [1:5] "x1" "" "" "" ...
$ rank : int 1
$ fitted.values: num [1:5] 0.636 1.273 1.909 2.545 3.182
$ assign : NULL
$ qr :List of 5
..$ qr : num [1:5, 1] -7.416 0.27 0.405 0.539 0.674
..$ qraux: num 1.13
..$ pivot: int 1
..$ tol : num 1e-07
..$ rank : int 1
..- attr(*, "class")= chr "qr"
$ df.residual : int 4
I'm expecting intercept = 6 and slope = -1 but the result above doesn't contain anyhing near that. Also, does lm.fit() output r squared?
lm.fit allows to do things much more manually, so, as #MrFlick commented, we must include the intercept manually as well using cbind(1, x) as the design matrix. The R^2 is not provided but we may easily compute it:
x <- 1:5
y <- 5:1 + rnorm(5)
regr <- lm.fit(cbind(1, x), y)
regr$coef
# x
# 5.2044349 -0.5535963
1 - var(regr$residuals) / var(y) # R^2
# [1] 0.3557227
1 - var(regr$residuals) / var(y) * (length(y) - 1) / regr$df.residual # Adj. R^2
# [1] 0.1409636
I tried to find the subset but it's showing error as :
I am performing Data Envelopment Analysis using Benchmarking Package in R.
Although I saw similar Question were asked before but it didn't help me .
Update :Structure and Summary of Database
I am performing DEA for V6 and V7.
I guess you need
Large.Cap$V1[e_crs$eff > 0.85]
Using a reproducible example from ?dea
library(Benchmarking)
x <- matrix(c(100,200,300,500,100,200,600),ncol=1)
y <- matrix(c(75,100,300,400,25,50,400),ncol=1)
Large.Cap <- data.frame(v1= LETTERS[1:7], v2= 1:7)
e_crs <- dea(x, y, RTS='crs', ORIENTATION='in')
e_crs
#[1] 0.7500 0.5000 1.0000 0.8000 0.2500 0.2500 0.6667
The e_crs object is a list
str(e_crs)
#List of 12
# $ eff : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ lambda : num [1:7, 1:7] 0 0 0 0 0 0 0 0 0 0 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:7] "L1" "L2" "L3" "L4" ...
# $ objval : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ RTS : chr "crs"
# $ primal : NULL
# $ dual : NULL
# $ ux : NULL
# $ vy : NULL
# $ gamma :function (x)
# $ ORIENTATION: chr "in"
# $ TRANSPOSE : logi FALSE
# $ param : NULL
# - attr(*, "class")= chr "Farrell"
We extract the 'eff' list element from 'e_crs' to subset the 'v1' column in 'Large.Cap' dataset.
droplevels(Large.Cap$v1[e_crs$eff > 0.85])
#[1] C
#Levels: C