Using imap_dfr to apply a function to 3 lists with nested maps? - r

I have this code, which works for the list element [[1]] and its nested element [[200]]:
SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]][[200]],
shap_contrib = shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0),
X_train = as.matrix(TrainTestData[[1]]$XTrain[[200]])
#top_n = 4
)
I can simply replace the [[200]] with [[300]], [[400]], etc. and obtain a new data structure (the shap.prep function comes from the SHAPforxgboost package).
xgb.mod[[1]][[200]] is a single xgboost model
shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0) is a data frame with the following structure.
> str(shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0))
'data.frame': 2190 obs. of 29 variables:
$ holiday : num -0.276 -0.347 -0.284 -0.356 -0.197 ...
$ temp : num 0.35 0.25 0.144 0.227 0.16 ...
$ wind : num -0.116 -0.187 -0.25 -0.265 -0.135 ...
$ humidity : num -0.021 0.0125 -0.037 0.016 -0.0196 ...
$ barometer : num -0.0191742 -0.0000462 0.0444956 -0.0148842 -0.0551703 ...
$ weekday : num -0.00421 -0.00937 0.0012 -0.01194 -0.00931 ...
$ weekend : num 0 0 0 0 0 0 0 0 0 0 ...
$ workday_on_holiday : num -0.00949 -0.00949 -0.00885 -0.00949 -0.00885 ...
$ weekend_on_holiday : num 0 0 0 0 0 0 0 0 0 0 ...
$ protocol_active : num 0 0 0 0 0 0 0 0 0 0 ...
$ text_fog : num 0.00714 0.00714 0.00783 0.00783 0.00772 ...
$ text_light_rain : num -0.000364 -0.000364 -0.000364 -0.000364 -0.000364 ...
$ text_mostly_cloudy : num -0.0013 -0.0013 -0.0013 -0.0013 -0.0013 ...
$ text_passing_clouds : num 0.00135 0.00152 0.00363 0.00152 0.00345 ...
$ text_rain : num -0.0000682 -0.0000682 -0.0000682 -0.0000682 -0.0000682 ...
$ text_scattered_clouds: num -0.0941 -0.0832 -0.1497 -0.0813 -0.0965 ...
$ text_sunny : num 0.000635 0.007435 0.009286 0.007435 0.007009 ...
$ month_1 : num 0.045 0.0503 0.062 0.062 0.0484 ...
$ month_2 : num 0.0602 0.0529 0.0526 0.0529 0.1008 ...
$ month_3 : num 0.0467 0.0348 0.0333 0.0348 0.0467 ...
$ month_4 : num -0.03439 -0.03439 -0.00777 -0.03439 -0.00164 ...
$ month_5 : num -0.02191 -0.02191 -0.00836 -0.02026 -0.01533 ...
$ month_6 : num -0.05498 -0.00637 -0.04769 -0.05101 -0.05155 ...
$ month_7 : num -0.1302 -0.1126 -0.0878 -0.0963 -0.1535 ...
$ month_8 : num -0.0418 -0.051 -0.0727 -0.0437 -0.0957 ...
$ month_9 : num 0.164 0.185 0.141 0.193 0.122 ...
$ month_10 : num 0.206 0.251 0.243 0.251 0.211 ...
$ month_11 : num 0.0929 0.0744 0.0302 0.0568 0.0961 ...
$ month_12 : num 0.059 0.0608 0.0806 0.0608 0.0788 ...
Finally, TrainTestData[[1]]$XTrain[[200]] is a dgCMatrix, which I convert to a dense matrix using as.matrix(). The result has this structure:
> str(as.matrix(TrainTestData[[1]]$XTrain[[200]]))
num [1:2190, 1:29] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:29] "holiday" "temp" "wind" "humidity" ...
I have 3 pieces of data I would like to apply the shap.prep function to.
The desired output is a list (or list of lists) holding the result of shap.prep for each element. The function signature is shap.prep(xgb_model = NULL, shap_contrib = NULL, X_train, top_n = NULL), and I am providing the first three inputs.
How can I use imap correctly to pass all three objects to the shap.prep function and get lists as my output?
It's difficult for me to provide dput() data, since I am not sure whether it's possible to dput() a trained XGBoost model.
EDIT:
I am adding the closest thing I can get to a reproducible example.
data(iris)
df <- split(iris, iris$Species) # I just want to create some lists here
library(xgboost)
library(SHAPforxgboost)
library(purrr)  # map(), pmap()
dtrainFunction <- function(i){
dt = xgb.DMatrix(data = data.matrix(i[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]), label = i$Species)
}
dtrain <- map(df, dtrainFunction) # I just apply the dtrainFunction which just puts each list into an xgb.DMatrix
xgb.mod <- map(dtrain, ~xgboost(data = .x, nround = 20)) # Apply the xgboost model to each list
# could not get this part of the code to work but it's not important. I manually put the results into a list below.
# shap_values_function <- function(j){
# map2(
# .x = xgb.mod[[j]],
# .y = df[[j]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
# ~shap.values(xgb_model = .x, X_train = as.matrix(.y))
# )
# }
#
# shap_values_results <- lapply(seq(1:3), shap_values_function)
# Here I manually put the results into a list which are lists of shap.values
shap_values_results <- list(
shap.values(xgb_model = xgb.mod[[1]], X_train = as.matrix(df[[1]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
shap.values(xgb_model = xgb.mod[[2]], X_train = as.matrix(df[[2]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
shap.values(xgb_model = xgb.mod[[3]], X_train = as.matrix(df[[3]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]))
)
# Something is wrong here which is something to do with shap_contrib and BIAS0
SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]],
shap_contrib = shap_values_results[[1]]$shap_score[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
X_train = as.matrix(df[[1]])
#top_n = 4
)
shap.plot.summary(data_long = SHAP_Prep_data)
My actual code for SHAP_Prep_data is slightly different from the one above, since I have a list of lists.
EDIT 2:
I have tried the following, which throws an error:
SHAP_Prep_data <- pmap(
list(
.x = xgb.model[[1]],
.y = shap_values_results[[1]],
.z = TrainTestData[[1]]$XTrain
), ~shap.prep(
xgb_model = .x,
shap_contrib = .y,
X_train = as.matrix(.z))
)
Error in as.matrix(.z) : object '.z' not found
EDIT 3:
When I apply the function on the iris data example:
SHAP_Prep_data <- pmap(
list(
.x = xgb.mod,
.y = shap_values_results,
.z = dtrain
), ~shap.prep(
xgb_model = .x,
shap_contrib = .y,
X_train = as.matrix(.z))
)
Error in as.matrix(.z) : object '.z' not found
EDIT 4:
I want to be able to access the $shap_score data created by the shap.values function used earlier (and also drop the BIAS0 column, as in the following line):
shap_contrib = shap_values_results[[1]][[1300]]$shap_score %>% select(-BIAS0)
Would another map be needed here, or should I extract the shap_score data earlier and remove the BIAS0 column there, so that I can simply call NEWDATA_shap_score[[1]][[1300]]?

The issue is that
str(as.matrix(df[[1]]))
#chr [1:50, 1:5] "5.1" "4.9" "4.7" "4.6" "5.0" "5.4" "4.6" "5.0" "4.4" "4.9" "5.4" "4.8" "4.8" "4.3" "5.8" "5.7" "5.4" "5.1" "5.7" "5.1" ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:50] "1" "2" "3" "4" ...
# ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...
returns a character matrix, because the last column (Species) is a factor and as.matrix() then coerces every column to character. Remove the last column before the conversion:
out <- shap.prep(xgb_model = xgb.mod[[1]],
shap_contrib = shap_values_results[[1]]$shap_score[,
c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
X_train = as.matrix(df[[1]][-5]) ###
#top_n = 4
)
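The coercion can be checked with iris alone, no model needed. In this small sketch (features and setosa are illustrative names), as.matrix() on a data frame that still contains the Species factor returns a character matrix, while selecting the four feature columns by name gives a numeric one. Selecting by name also fixes the column order to match the SHAP columns, whereas [-5] relies on Species being the fifth column.

```r
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
setosa <- split(iris, iris$Species)[[1]]

# The factor column forces every cell of the matrix to character
m_all <- as.matrix(setosa)
# Keeping only the numeric feature columns (by name) gives a numeric matrix
m_num <- as.matrix(setosa[, features])

typeof(m_all)  # "character"
typeof(m_num)  # "double"
```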
Regarding using this with pmap:
out2 <- pmap(list( xgb.model[[1]],
shap_values_results[[1]],
TrainTestData[[1]]$XTrain),
~shap.prep(
xgb_model = ..1,
shap_contrib = ..2$shap_score %>% select(-BIAS0),
X_train = as.matrix(..3)))
If we also want to apply this over the whole list of lists, nest a second pmap inside the first:
pmap(list(xgb.model,
shap_values_results,
TrainTestData), ~
pmap(list(..1, ..2, ..3$XTrain), ~
shap.prep(xgb_model = ..1,
shap_contrib = ..2$shap_score %>% select(-BIAS0),
X_train = as.matrix(..3))))
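About the errors in EDIT 2 and EDIT 3: with the ~ shorthand, pmap() only provides .x, .y and the positional ..1, ..2, ..3, so naming a list element .z does not create a .z variable. Writing the lambdas as ordinary functions with explicit arguments sidesteps that. This sketch checks the nested pattern without xgboost, assuming a hypothetical stand-in fake_prep() in place of shap.prep() and small toy lists shaped like xgb.model, shap_values_results and TrainTestData.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for shap.prep(): it only records what it received,
# so the nesting can be checked without training any xgboost models.
fake_prep <- function(xgb_model, shap_contrib, X_train) {
  list(model = xgb_model, shap_cols = names(shap_contrib), dims = dim(X_train))
}

# Toy objects: 2 outer elements, each with 2 inner ones (instead of [[200]], [[300]], ...)
xgb_model_toy <- list(list("m1a", "m1b"), list("m2a", "m2b"))
one_shap <- list(shap_score = data.frame(temp = 1:3, wind = 4:6, BIAS0 = 0))
shap_toy <- list(list(one_shap, one_shap), list(one_shap, one_shap))
one_train <- list(XTrain = list(matrix(0, 3, 2), matrix(0, 3, 2)))
train_toy <- list(one_train, one_train)

out <- pmap(list(xgb_model_toy, shap_toy, train_toy), function(models, shaps, tt) {
  # inner level: walk the per-model lists in parallel
  pmap(list(models, shaps, tt$XTrain), function(mod, sv, x) {
    fake_prep(xgb_model = mod,
              shap_contrib = sv$shap_score %>% select(-BIAS0),
              X_train = as.matrix(x))
  })
})

out[[2]][[1]]$model      # "m2a"
out[[1]][[2]]$shap_cols  # "temp" "wind"
```

For the real data, swap fake_prep() for shap.prep() and the toy lists for xgb.model, shap_values_results and TrainTestData; the pipeline itself should not need to change.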

It's hard to say without a reproducible example, but it sounds like you want pmap rather than imap:
a <- list(letters[1:3])
b <- list(letters[4:6])
c <- list(letters[7:9])
purrr::pmap(list(a,b,c), function (x,y,z) paste(x, y, z))
#> [[1]]
#> [1] "a d g" "b e h" "c f i"
Created on 2020-01-08 by the reprex package (v0.3.0)
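To make the distinction concrete: imap(x, f) is shorthand for map2(x, names(x), f) (or map2(x, seq_along(x), f) when x has no names), so it walks a single list plus its index. pmap() is the function that walks three or more lists in parallel. A minimal comparison:

```r
library(purrr)

x <- list(a = 1, b = 2)
# imap(): one list, with each element's name (or index) available as .y
imap(x, ~ paste(.y, .x))
# returns list(a = "a 1", b = "b 2")

# pmap(): any number of lists, walked element by element
pmap(list(1:2, 3:4, 5:6), function(p, q, r) p + q + r)
# returns list(9L, 12L)
```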

Related

Shuffle Chunks of a Data Frame in R

I have a data frame of 90 rows and 90 columns.
My goal is to divide this data frame into 9 pieces, each with 30 rows and 30 columns. Then, I would like to randomly shuffle these 9 pieces for a new data frame.
I'm not sure how to approach this, as I'm fairly new to R.
Any help is appreciated!
Use sample on a list of vectors, then unlist and subset:
If I understand correctly, you're trying to do something like one of those 3x3 sliding picture puzzles: split the df into chunks of 30 and then 'shuffle' those chunks. This will do that:
### Create a 90 by 90 data frame
set.seed(2)
df <- as.data.frame(array(runif(90*90), dim = c(90,90)))
### Function to randomise subsetting
rand_dims <- function(){
myDims <- list(1:30, 31:60, 61:90)
unlist(sample(myDims))
}
### Shuffle 'chunks' of df
df[rand_dims(), rand_dims()] -> shuffled
### Checking the leading row/col names
### Shows we've succeeded
colnames(shuffled)[1:5]
#> [1] "V61" "V62" "V63" "V64" "V65"
rownames(shuffled)[1:5]
#> [1] "31" "32" "33" "34" "35"
Created on 2021-01-21 by the reprex package (v0.3.0)
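Perhaps you can try the following code using split() + sample(). It randomly permutes the row and column indices, splits each permutation into three groups of 30, and extracts the resulting chunks: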
set.seed(1)
mat <- matrix(data = rnorm(90 * 90), ncol = 90, nrow = 90)
rrnd <- split(sample(seq(nrow(mat))), gl(3, nrow(mat) / 3))
crnd <- split(sample(seq(ncol(mat))), gl(3, ncol(mat) / 3))
mat.shuffle <- c()
for (i in rrnd) {
for (j in crnd) {
mat.shuffle <- c(mat.shuffle, list(mat[i, j]))
}
}
Note that mat.shuffle is a list of 9 chunks. If you want to assemble them back into a 90x90 matrix, run the code below afterwards:
out <- do.call(
rbind,
lapply(
split(
mat.shuffle,
gl(3, length(mat.shuffle) / 3)
),
function(x) do.call(cbind, x)
)
)
and you will see
> str(out)
num [1:90, 1:90] -0.639 -0.984 -0.108 -0.939 1.743 ...
> str(mat.shuffle)
List of 9
$ : num [1:30, 1:30] -0.639 -0.984 -0.108 -0.939 1.743 ...
$ : num [1:30, 1:30] -0.718 -0.154 -0.29 -0.796 1.7 ...
$ : num [1:30, 1:30] -0.886 0.887 -0.381 0.371 0.327 ...
$ : num [1:30, 1:30] -0.4555 -0.901 0.0312 -3.0223 0.8688 ...
$ : num [1:30, 1:30] 0.705 1.559 -1.278 -0.676 1.933 ...
$ : num [1:30, 1:30] -0.1916 -0.98649 -1.65769 0.00104 -0.19544 ...
$ : num [1:30, 1:30] 1.668 -0.869 1.469 -0.222 -1.869 ...
$ : num [1:30, 1:30] 0.9931 0.4116 1.995 -0.4456 -0.0692 ...
$ : num [1:30, 1:30] -1.462 -0.206 1.197 -1.611 -1.708 ...
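The two answers read "shuffle the 9 pieces" differently. rand_dims() permutes the three row blocks and the three column blocks, so pieces that started in the same block row stay together (3! x 3! = 36 possible layouts), while the split() + sample() version builds its chunks from randomly chosen rows and columns rather than the original contiguous 30x30 blocks. If the goal is to treat each 30x30 piece as an independent tile and permute all 9, a minimal sketch could look like this (shuffle_tiles() is a hypothetical helper, working on a matrix):

```r
shuffle_tiles <- function(m, k = 3) {
  stopifnot(nrow(m) %% k == 0, ncol(m) %% k == 0)
  rs <- nrow(m) %/% k  # tile height
  cs <- ncol(m) %/% k  # tile width
  # one row per tile slot: its grid row r and grid column c
  grid <- expand.grid(r = seq_len(k), c = seq_len(k))
  rows_of <- function(t) (grid$r[t] - 1) * rs + seq_len(rs)
  cols_of <- function(t) (grid$c[t] - 1) * cs + seq_len(cs)
  # cut the matrix into k * k tiles, then put them in random order
  tiles <- lapply(seq_len(nrow(grid)),
                  function(t) m[rows_of(t), cols_of(t), drop = FALSE])
  tiles <- tiles[sample(length(tiles))]
  # write the shuffled tiles back into the slots
  out <- m
  for (t in seq_len(nrow(grid))) out[rows_of(t), cols_of(t)] <- tiles[[t]]
  dimnames(out) <- NULL  # old names no longer describe the new positions
  out
}

set.seed(2)
df <- as.data.frame(array(runif(90 * 90), dim = c(90, 90)))
shuffled <- as.data.frame(shuffle_tiles(as.matrix(df)))
```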

R Boot function yields error while wrapped in a function

I have the following code for a simple bootstrapping example.
library(car)  # provides Boot()
X <- runif(100)
errors <- rexp(length(X))-1
Y <- 1 + 2*X + errors
ols.mod <- lm(Y ~ X)
boot <- Boot(ols.mod, method = "case", R = 1000)
The above code works if executed as individual lines. However, when I wrap the code in a function as below:
test_func <- function() {
X <- runif(100)
errors <- rexp(length(X))-1
Y <- 1 + 2*X + errors
ols.mod <- lm(Y ~ X)
boot <- Boot(ols.mod, method = "case", R = 1000)
}
test_func()
Executing this yields an error as below:
Error in eval(predvars, data, env): object 'Y' not found
Any ideas on why this is happening? Thank you!
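This works for me once X and Y are stored in a data frame that is passed to lm() through the data argument:
library(car)  # provides Boot()
doBootstrapping <- function(n = 100, r = 1000) {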
xy <- data.frame(X = runif(n = n))
errors <- rexp(length(xy$X))-1
xy$Y <- 1 + 2*xy$X + errors
ols.mod <- lm(Y ~ X, data = xy)
boot <- Boot(ols.mod, method = "case", R = r)
}
out <- doBootstrapping(n = 100, r = 1000)
> str(out)
List of 11
$ t0 : Named num [1:2] 0.986 1.953
..- attr(*, "names")= chr [1:2] "(Intercept)" "X"
$ t : num [1:1000, 1:2] 1.188 1.073 1.083 1.127 0.964 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "(Intercept)" "X"
$ R : num 1000
$ data :'data.frame': 100 obs. of 1 variable:
..$ .zero: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
$ seed : int [1:626] 403 334 599478801 441938554 -1355732193 -409499586 -248637084 -2048103023 20568410 1088692573 ...
$ statistic:function (data, indices, .fn)
$ sim : chr "ordinary"
$ call : language boot::boot(data = dd, statistic = boot.f, R = R, .fn = f, parallel = parallel_env, ncpus = ncores)
$ stype : chr "i"
$ strata : num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
$ weights : num [1:100] 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...
- attr(*, "class")= chr "boot"
- attr(*, "boot_type")= chr "boot"
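For intuition about what method = "case" resamples, here is a base-R sketch of a case (pairs) bootstrap. It is not car's implementation, and case_boot() is a hypothetical helper. Because the data frame is passed explicitly through data =, lm() finds X and Y in it regardless of where the function is called from.

```r
# Case resampling by hand: resample rows with replacement, refit, keep coefficients
case_boot <- function(dat, formula, R = 1000) {
  t(replicate(R, {
    idx <- sample(nrow(dat), replace = TRUE)
    coef(lm(formula, data = dat[idx, , drop = FALSE]))
  }))
}

set.seed(1)
xy <- data.frame(X = runif(100))
xy$Y <- 1 + 2 * xy$X + rexp(100) - 1
bs <- case_boot(xy, Y ~ X, R = 200)  # one row per resample
apply(bs, 2, sd)                     # bootstrap standard errors of (Intercept) and X
```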

transform a complicated list to a data frame

Sorry if this question has been asked before, but I am totally new and I tried what was suggested in other posts here with no result. For example, I tried:
do.call("rbind", lapply(MET1, as.data.frame))
But it said:
Error in as.data.frame.default(X[[i]], ...) : cannot coerce class ‘"mixEM"’ to a data.frame
I have this list of lists (MET1) with 7 elements (shown below), and I want to transform it into a simplified data frame. Each row of the data frame should be one of the elements, and I only need the lambda, mu and sigma values. So basically something like this:
LAMBDA1 LAMBDA2 MU1 MU2 SIGMA1 SIGMA2
0102-A451 0.822 0.178 1711 10850 249 14986
0102-A453 0.813 0.187 1491 4031 108 6877
...
My list of lists is:
str(MET1)
List of 7
$ 0102-A451:List of 9
..$ x : num [1:178] 2088 1579 1638 1507 1862 ...
..$ lambda : num [1:2] 0.822 0.178
..$ mu : num [1:2] 1711 10850
..$ sigma : num [1:2] 249 14986
..$ loglik : num -1440
..$ posterior : num [1:178, 1:2] 0.991 0.997 0.997 0.996 0.996 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "comp.1" "comp.2"
..$ all.loglik: num [1:16] -1703 -1518 -1472 -1450 -1442 ...
..$ restarts : num 0
..$ ft : chr "normalmixEM"
..- attr(*, "class")= chr "mixEM"
$ 0102-A453:List of 9
..$ x : num [1:663] 1414 1506 1399 1423 1421 ...
..$ lambda : num [1:2] 0.813 0.187
..$ mu : num [1:2] 1491 4031
..$ sigma : num [1:2] 108 6877
..$ loglik : num -4847
..$ posterior : num [1:663, 1:2] 0.996 0.997 0.995 0.996 0.996 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "comp.1" "comp.2"
..$ all.loglik: num [1:29] -5760 -4983 -4883 -4861 -4853 ...
..$ restarts : num 0
..$ ft : chr "normalmixEM"
..- attr(*, "class")= chr "mixEM"
...
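Here's a tidyverse solution. First we extract the relevant subsets of the lists and convert them to tibbles (also adding row numbers). Then we bind the tibbles and do standard tidyr gymnastics (with the unnamed example data below, id is just the position 1, 2; with the real, named MET1 it holds names such as 0102-A451):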
library(tidyverse)
MET1 %>%
map_dfr(~as_tibble(.x[c("lambda","mu","sigma")]) %>% rownames_to_column,
.id="id") %>%
gather(,,-rowname,-id) %>%
unite(key,key,rowname) %>%
spread(key,value)
# # A tibble: 2 x 7
# id lambda_1 lambda_2 mu_1 mu_2 sigma_1 sigma_2
# * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 0.822 0.178 1711 10850 249 14986
# 2 2 0.813 0.187 1491 4031 108 6877
data:
MET1 <- list(
list(lambda = c(0.822, 0.178),
mu = c(1711, 10850),
sigma = c(249, 14986),
something_else="whatever"),
list(lambda = c(0.813, 0.187),
mu = c(1491, 4031),
sigma = c(108, 6877),
something_else="whatever")
)
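gather() and spread() are superseded in current tidyr. A sketch of the same reshape with pivot_wider(), assuming tidyr 1.0 or later and using named example data (as in the other answer) so that id holds the real element names:

```r
library(dplyr)
library(purrr)
library(tidyr)

MET1 <- list(
  `0102-A451` = list(lambda = c(0.822, 0.178), mu = c(1711, 10850),
                     sigma = c(249, 14986), something_else = "whatever"),
  `0102-A453` = list(lambda = c(0.813, 0.187), mu = c(1491, 4031),
                     sigma = c(108, 6877), something_else = "whatever")
)

wide <- MET1 %>%
  # one row per (element, mixture component); .id keeps the list names
  map_dfr(~ tibble(comp = seq_along(.x$lambda),
                   lambda = .x$lambda, mu = .x$mu, sigma = .x$sigma),
          .id = "id") %>%
  # spread the components into lambda_1, lambda_2, mu_1, ... columns
  pivot_wider(names_from = comp, values_from = c(lambda, mu, sigma))
```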
Overview
Use lapply() to transform each list within MET1 into a wide data frame. Then use rbind() within do.call() to collapse all the resulting data frames into one. Thank you @Moody_Mudskipper for creating the reproducible data.
# load data
MET1 <-
list(
`0102-A451` = list(
lambda = c(0.822, 0.178),
mu = c(1711, 10850),
sigma = c(249, 14986),
something_else = "whatever"
)
, `0102-A453` = list(
lambda = c(0.813, 0.187),
mu = c(1491, 4031),
sigma = c(108, 6877),
something_else = "whatever"
)
)
# Transform MET1
# so that each list is a wide data frame
MET1 <-
lapply(
X = MET1
, FUN = function( i )
data.frame(
LAMBDA_1 = i[["lambda"]][1]
, LAMBDA_2 = i[["lambda"]][2]
, MU_1 = i[["mu"]][1]
, MU_2 = i[["mu"]][2]
, SIGMA_1 = i[["sigma"]][1]
, SIGMA_2 = i[["sigma"]][2]
)
)
# now transform MET1
# into one data frame
# one row for each data frame within MET1
MET1 <-
do.call(
what = "rbind"
, args = MET1
)
# view results
MET1
# LAMBDA_1 LAMBDA_2 MU_1 MU_2 SIGMA_1
# 0102-A451 0.822 0.178 1711 10850 249
# 0102-A453 0.813 0.187 1491 4031 108
# SIGMA_2
# 0102-A451 14986
# 0102-A453 6877
# end of script #

dplyr fails to summarize

My data frame is a time series with several variables and looks like this:
Date prcpmm TmaxC TminC
1/1/1966 0 7.8 0
1/2/1966 0.8 8.3 -1.1
1/3/1966 0 2.8 -5
1/4/1966 0 4.4 -5.6
1/5/1966 0 5.6 -8.9
1/6/1966 0 2.2 -1.7
1/7/1966 0.5 0.6 -12.8
1/8/1966 0 -6.1 -15
1/9/1966 0 3.9 -8.3
1/10/1966 0 5 -8.9
1/11/1966 0 -5 -12.2
1/12/1966 1 -3.3 -11.1
1/13/1966 1.3 2.2 -6.1
The data run from 1966 to 2005; the input file is available here: https://drive.google.com/open?id=0B09Gj4L5FPQWeGt0Wl94OWVlelE
I tried the code below:
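library(dplyr)      # group_by(), summarise(), %>%
library(lubridate)  # day(), month(), year()
library(trend)      # mk.test()
Adrian <- read.csv("~/Adrian.csv")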
Adrian <- cbind(Adrian, weatherGen = rep("ObservedAdrian", nrow(Adrian)))
dateObj <- as.POSIXct(Adrian$Date, format = "%m/%d/%Y")
Adrian$Date <- dateObj
Adrian$day <- day(dateObj)
Adrian$month <- month(dateObj)
Adrian$year <- year(dateObj)
Adrian$prcpmm[Adrian$prcpmm==-99.00]<- NA
Adrian$TmaxC[Adrian$TmaxC==-99.00]<- NA
Adrian$TminC[Adrian$TminC==-99.00]<- NA
for (i in 1:length(Adrian$month)) {
if (Adrian$month[i] %in% 3:5) {
Adrian$season[i] = "spring"
Adrian$yearNew[i] = Adrian$year[i]
} else if (Adrian$month[i] %in% 6:8) {
Adrian$season[i] = "summer"
Adrian$yearNew[i] = Adrian$year[i]
} else if (Adrian$month[i] %in% 9:11) {
Adrian$season[i] = "fall"
Adrian$yearNew[i] = Adrian$year[i]
} else{
Adrian$season[i] = "winter"
Adrian$yearNew[i] = Adrian$year[i] + 1
}
}
Adrian$season <- ordered(Adrian$season, levels = c("winter", "spring",
"summer", "fall"))
Everything up to this point ran without error.
Observed_everyseason_pVal <- Adrian %>% group_by(yearNew, season) %>%
summarise(pvalMK = mk.test(ts(prcpmm))$pvalue[1], SMK =
mk.test(ts(prcpmm))$Sg[1] )
Observed_everyseason_pVal1 <- Adrian %>% group_by(season) %>%
summarise(pvalMK = mk.test(ts(prcpmm))$pvalue[1], SMK =
mk.test(ts(prcpmm))$Sg[1])
Of the two statements above, the grouping step on its own:
Observed_everyseason_pVal <- Adrian %>% group_by(yearNew, season)
works without error, but when I run the full pipeline
Observed_everyseason_pVal <- Adrian %>% group_by(yearNew, season) %>%
summarise(pvalMK = mk.test(ts(prcpmm))$pvalue[1], SMK =
mk.test(ts(prcpmm))$Sg[1] )
it produces this error:
Error in summarise_impl(.data, dots) : Column pvalMK is of unsupported type NULL
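A side note on the season loop in the question, independent of the summarise() error: it can be replaced by a vectorised helper (assign_season() is a hypothetical name). One assumption is flagged here. If the intent is the usual December-January-February winter, only December should move to the following year; the loop as written adds 1 to January and February as well, so January 1967 ends up in winter 1968. The sketch shifts only December; if the loop's behaviour is what you actually want, use year + (month %in% c(12, 1, 2)) instead.

```r
# Hypothetical helper: vectorised replacement for the season/yearNew loop
assign_season <- function(month, year) {
  # month %% 12 maps December to 0, so winter = {0, 1, 2}, spring = 3:5,
  # summer = 6:8, fall = 9:11 (cut() intervals are right-closed by default)
  season <- cut(month %% 12, breaks = c(-1, 2, 5, 8, 11),
                labels = c("winter", "spring", "summer", "fall"),
                ordered_result = TRUE)
  # assumption: only December counts toward the following year's winter
  data.frame(season = season, yearNew = year + (month == 12))
}

# Usage on the question's data:
# Adrian[c("season", "yearNew")] <- assign_season(Adrian$month, Adrian$year)
```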
If you run str(mk.test(ts(Adrian$prcpmm))), you get the following output:
List of 9
$ data.name : chr "ts(Adrian$prcpmm)"
$ p.value : num 0.259
$ statistic : Named num 1.13
..- attr(*, "names")= chr "z"
$ null.value : Named num 0
..- attr(*, "names")= chr "S"
$ parameter : Named int 13
..- attr(*, "names")= chr "n"
$ estimates : Named num [1:3] 16 176.67 0.28
..- attr(*, "names")= chr [1:3] "S" "varS" "tau"
$ alternative: chr "two.sided"
$ method : chr "Mann-Kendall trend test"
$ pvalg : num 0.259
- attr(*, "class")= chr "htest"
Therefore, the element you referenced, mk.test(ts(prcpmm))$pvalue[1], doesn't exist: the list only has p.value (and pvalg), and because $ only partially matches name prefixes, $pvalue returns NULL. Likewise, there is no $Sg element at all; the Mann-Kendall S statistic is stored in estimates["S"]. A NULL summary is exactly what the error reports: Error in summarise_impl(.data, dots) : Column pvalMK is of unsupported type NULL.
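A corrected version of the summarise() call, run here on a small made-up data frame so that it is self-contained (toy is hypothetical and stands in for Adrian). It assumes mk.test() comes from the trend package, which matches the htest structure shown above.

```r
library(dplyr)
library(trend)

# Toy stand-in for the Adrian data (made-up values, not the real file)
set.seed(1)
toy <- data.frame(season = rep(c("winter", "summer"), each = 20),
                  prcpmm = round(rexp(40), 1))

res_mk <- toy %>%
  group_by(season) %>%
  summarise(pvalMK = mk.test(ts(prcpmm))$p.value,               # p.value, not pvalue
            SMK    = unname(mk.test(ts(prcpmm))$estimates["S"]))  # S lives in estimates
```

If prcpmm still contains the NA values created from the -99 codes, check whether mk.test() accepts them; dropping them with na.omit() before calling ts() is one option.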

R: perform parameter sweep and collect results in long data frame

I am looking for the right R idiom to run a function over a set of parameters and create a long data frame from the results. Imagine that you have the following toy function:
fun <- function(sd, mean, foobar = "foobar") {
list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
Now you want to run fun over different values of sd and mean:
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
I want to run fun for the parameters in each row of pars, and collect the results in a data frame with columns sd, mean, pos, value. Here is a rather clumsy solution:
set.seed(42)
## Run fun
res <- lapply(seq_len(nrow(pars)), function(x) {
do.call(fun, as.list(pars[x, ]))
})
## Select the result we need
res <- lapply(res, "[[", "random")
## Make it a single data frame
res <- do.call(rbind, res)
## Together with the parameters
res <- as.data.frame(cbind(sd = par_sd, mean = par_mean, res))
colnames(res) <- c("sd", "mean", 1:10)
## Make it a long data frame
res <- reshape2::melt(res, id.vars=c("sd", "mean"),
variable.name = "pos", value.name="value")
## Done
res[1:5,]
#> sd mean pos value
#> 1 1 0 1 2.37095845
#> 2 2 0 1 3.60973931
#> 3 3 0 1 0.08008422
#> 4 4 0 1 2.82180049
#> 5 5 0 1 2.02999300
Is there a simpler way to do this? Does anyone know a package that does things like this? My quick search did not turn up anything useful.
If you're willing to amend fun() to return a data.frame, I find the most elegant solution is plyr's mdply.
library(plyr)  # provides mdply()
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
results = mdply(pars, fun, foobar = "stuff")
str(results)
mapply would seem a good fit:
> str(with(pars, mapply(fun, sd=sd, mean=mean) ) )
List of 30
$ : num [1:10] 3.16 2.28 2.84 1.49 3.43 ...
$ : chr "foobar"
$ : num [1:10] 3.429 0.157 0.583 1.542 6.485 ...
$ : chr "foobar"
$ : num [1:10] -4.56 -1.51 -1.33 7.16 3.21 ...
$ : chr "foobar"
$ : num [1:10] -2.275 2.225 4.196 0.962 15.739 ...
$ : chr "foobar"
$ : num [1:10] 6.23 10.08 2.85 6.81 4.51 ...
$ : chr "foobar"
$ : num [1:10] 1.65 3.15 5.62 5.91 6.14 ...
$ : chr "foobar"
$ : num [1:10] 4.26 1.95 7.33 2.72 6.29 ...
$ : chr "foobar"
$ : num [1:10] 7.53 6.74 3.6 6.43 3.08 ...
$ : chr "foobar"
$ : num [1:10] -0.4181 -0.0584 5.5812 1.038 8.2482 ...
$ : chr "foobar"
$ : num [1:10] 0.2377 4.8557 5.2177 -0.0706 2.0434 ...
$ : chr "foobar"
$ : num [1:10] 2.95 4.3 5.26 8.58 5.81 ...
$ : chr "foobar"
$ : num [1:10] -0.85 4.83 8.19 5.17 6.58 ...
$ : chr "foobar"
$ : num [1:10] 3.59 11.46 6.29 6.57 2.97 ...
$ : chr "foobar"
$ : num [1:10] 0.117 3.142 10.473 10.196 5.56 ...
$ : chr "foobar"
$ : num [1:10] 13.03 2.64 -1.07 5.29 1.97 ...
$ : chr "foobar"
- attr(*, "dim")= int [1:2] 2 15
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "random" "foobar"
..$ : NULL
By default mapply() attempts to simplify its result. To keep each call's result as a separate object (the output below comes from the data.frame-returning version of fun), turn that off with SIMPLIFY = FALSE:
> str(with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) )
List of 15
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 1.08 0.68 3.16 3.38 5.96 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 0.0927 5.1506 -1.0109 2.7136 2.1263 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] -0.331 2.9 -1.705 5.471 4.712 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
snipped
And if you need them in one stacked dataframe, it's just:
> str(do.call( rbind, with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
'data.frame': 150 obs. of 2 variables:
$ random: num 1 3.34 2.5 4.72 4.25 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
If you want each row labeled with its sd and mean values, just modify the constructor function like this:
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10,
sd=sd, mean=mean, foobar = foobar)
}
str(do.call( rbind, with(pars,
mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
#---------------
'data.frame': 150 obs. of 4 variables:
$ random: num 1.42 1.13 3.73 4.5 5.63 ...
$ sd : int 1 1 1 1 1 1 1 1 1 1 ...
$ mean : int 0 0 0 0 0 0 0 0 0 0 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
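If you would rather not modify fun() at all, a base-R sketch with Map() can build the long sd/mean/pos/value layout asked for in the question directly (sweep_long() is a hypothetical helper name). With the same seed the values line up with the question's version; rows are just ordered by parameter row instead of by pos.

```r
fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

sweep_long <- function(pars, f) {
  # one small long-format data frame per parameter row
  pieces <- Map(function(sd, mean) {
    r <- f(sd = sd, mean = mean)$random
    data.frame(sd = sd, mean = mean, pos = seq_along(r), value = r)
  }, pars$sd, pars$mean)
  do.call(rbind, pieces)
}

set.seed(42)
res <- sweep_long(pars, fun)
head(res)
```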
