I have a nested list, density_subset_list. It contains 6 lists, each of which contains another 3 lists of density data, e.g.
density_subset_list <- list(A = list(all_density, p1_density, p2_density),
                            B = list(all_density, p1_density, p2_density),
                            ...)   # and so on for the remaining four
I would like the overall y range.
Here is my attempt.
for (i in 1:length(INTlist)) {
  y <- unlist(lapply(density_subset_list[[i]], function(d) range(d$y)))
  yall <- c(y, yall)
}
range(yall)
It doesn't seem to be working.
Any help is appreciated
Thanks
str(density_subset_list)
List of 6
$ STRexp :List of 3
..$ all:List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 2.17e-14 3.64e-14 5.99e-14 9.64e-14 1.62e-13 ...
.. ..$ bw : num 0.047
.. ..$ n : int 1127
.. ..$ call : language density.default(x = x$corr, from = min(Sa14_scoreCorr$corr), to = max(Sa14_scoreCorr$corr), na.rm = T)
.. ..$ data.name: chr "x$corr"
.. ..$ has.na : logi FALSE
.. ..- attr(*, "class")= chr "density"
..$ Kan:List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 2.60e-08 3.42e-08 4.50e-08 5.88e-08 7.62e-08 ...
.. ..$ bw : num 0.0649
.. ..$ n : int 287
.. ..$ call : language density.default(x = x$corr, from = min(Sa14_scoreCorr$corr), to = max(Sa14_scoreCorr$corr), na.rm = T)
.. ..$ data.name: chr "x$corr"
.. ..$ has.na : logi FALSE
.. ..- attr(*, "class")= chr "density"
..$ Cm :List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 3.88e-08 4.79e-08 5.94e-08 7.38e-08 9.10e-08 ...
You're very close, but a for loop isn't needed:
set.seed(100)
dat <- list(STRexp = list(list(), list(), list()),
            all    = list(y = sample(100, 50)),
            Kan    = list(y = sample(10, 3)),
            Cm     = list(y = sample(1000, 100)))
str(dat)
## List of 4
## $ STRexp:List of 3
## ..$ : list()
## ..$ : list()
## ..$ : list()
## $ all :List of 1
## ..$ y: int [1:50] 31 26 55 6 45 46 77 35 51 16 ...
## $ Kan :List of 1
## ..$ y: int [1:3] 4 2 9
## $ Cm :List of 1
## ..$ y: int [1:100] 275 591 253 124 229 595 211 461 642 952 ...
# this will get the range of each "y" then get the overall range
range(unlist(lapply(names(dat)[-1], function(x) range(dat[[x]]$y))))
## [1] 2 991
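For the nested structure in the question (6 outer elements, each holding 3 density objects), the same idea just needs one more level of lapply(); a minimal sketch, assuming every inner element is a density object with a y component:
# overall y range across all 6 x 3 density objects
range(unlist(lapply(density_subset_list,
                    function(grp) lapply(grp, function(d) range(d$y)))))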
Related
I have the following object, which contains a lot of nested lists, and I need to convert all of them into data frames using R...
glimpse(pickle_data)
List of 32
$ 2020-02-01:List of 11
..$ model :List of 6
.. ..$ : num [1:88, 1:100] 0.00487 0.13977 -0.07648 0.18417 -0.1105 ...
.. ..$ : num [1:25, 1:100] -0.186 0.0703 0.1479 0.0321 0.1185 ...
.. ..$ : num [1:100(1d)] 0.0119 0.0457 0.023 0.0295 0.0115 ...
.. ..$ : num [1:25, 1:132] -0.024 0.0756 -0.0724 -0.1112 -0.1974 ...
.. ..$ : num [1:33, 1:132] 0.1904 0.1275 0.0684 0.1707 0.017 ...
.. ..$ : num [1:132(1d)] 0.0434 0.0636 0.0444 0.0329 0.0393 ...
..$ X_train : num [1:2494, 1:13, 1:88] 0.0676 0.0697 0.0717 0.0753 0.0783 ...
..$ X_test : num [1:3180, 1:13, 1:88] 0.0676 0.0697 0.0717 0.0753 0.0783 ...
..$ df_input : feat__price__mean_22_days ... feat__us_area_harvested__slope_66_days
ds ...
2010-01-01 NaN ... NaN
2010-01-04 NaN ... NaN
2010-01-05 NaN ... NaN
2010-01-06 NaN ... NaN
2010-01-07 NaN ... NaN
... ... ... ...
2022-09-13 0.699482 ... 0.157974
2022-09-14 0.705994 ... 0.163528
2022-09-15 0.713729 ... 0.171177
2022-09-16 0.722944 ... 0.176913
2022-09-19 0.728798 ... 0.184181
[3317 rows x 88 columns]
..$ index_test :DatetimeIndex(['2010-07-13', '2010-07-14', '2010-07-15', '2010-07-16',
'2010-07-19', '2010-07-20', '2010-07-21', '2010-07-22',
'2010-07-23', '2010-07-26',
...
'2022-09-06', '2022-09-07', '2022-09-08', '2022-09-09',
'2022-09-12', '2022-09-13', '2022-09-14', '2022-09-15',
'2022-09-16', '2022-09-19'],
dtype='datetime64[ns]', name='ds', length=3180, freq=None)
..$ window : int 12
..$ shift_coeff : int 4
..$ target_frequency : int 6
..$ nb_of_months_to_predict : int 9
..$ spine_params :List of 3
.. ..$ month_duration: int 22
.. ..$ week_duration : int 5
.. ..$ min_periods : int 7
..$ neural_network_hyperparams:List of 10
.. ..$ number_of_neurons_first_layer: int 25
.. ..$ l1_regularization : int 0
.. ..$ l2_regularization : int 0
.. ..$ decrease_reg_deep_layers : logi TRUE
.. ..$ decrease_reg_factor : int 10
.. ..$ dropout_rate : num 0.1
.. ..$ use_dropout : logi TRUE
.. ..$ initial_learning_rate : num 0.000151
.. ..$ output_layer_cell : chr "lstm"
.. ..$ gradient_initialization : int 42
$ 2020-03-01:List of 11
..$ model :List of 6
Is it possible to extract a data frame for the first list, and for all the lists nested inside it, and so on until the end of this object, using R?
I was wondering if there was a way to retrieve the data from a model built from the BART package in R?
It seems to be possible using other bart packages, such as dbarts... but I can't seem to find a way to get the original data back from a BART model. For example, if I create some data and run a BART and dbarts model, like so:
library(BART)
library(dbarts)
# create data
df <- data.frame(
  x = runif(100),
  y = runif(100),
  z = runif(100)
)
# create BART
BARTmodel <- wbart(x.train = df[, 1:2],
                   y.train = df[, 3])
# create dbarts
DBARTSmodel <- bart(x.train = df[, 1:2],
                    y.train = df[, 3],
                    keeptrees = TRUE)
Using the keeptrees option in dbarts allows me to retrieve the data using:
# retrieve data from dbarts
DBARTSmodel$fit$data@x
However, there doesn't seem to be any type of similar option when using BART. Is it even possible to retrieve the data from a BART model?
The Value: section of ?wbart suggests it doesn't return the input as part of the output, and none of the function arguments for wbart suggest that this can be changed.
Furthermore, if you look at the output of str, you can see that it's not present.
library(BART)
library(dbarts)
# create data
df <- data.frame(
  x = runif(100),
  y = runif(100),
  z = runif(100)
)
# create BART
BARTmodel <- wbart(x.train = df[, 1:2],
                   y.train = df[, 3])
# create dbarts
DBARTSmodel <- bart(x.train = df[, 1:2],
                    y.train = df[, 3],
                    keeptrees = TRUE)
str(BARTmodel)
#> List of 13
#> $ sigma : num [1:1100] 0.258 0.262 0.295 0.278 0.273 ...
#> $ yhat.train.mean: num [1:100] 0.584 0.457 0.505 0.54 0.403 ...
#> $ yhat.train : num [1:1000, 1:100] 0.673 0.62 0.433 0.711 0.634 ...
#> $ yhat.test.mean : num(0)
#> $ yhat.test : num[1:1000, 0 ]
#> $ varcount : int [1:1000, 1:2] 109 114 111 118 115 114 115 110 114 117 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "x" "y"
#> $ varprob : num [1:1000, 1:2] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "x" "y"
#> $ treedraws :List of 2
#> ..$ cutpoints:List of 2
#> .. ..$ x: num [1:100] 0.0147 0.0245 0.0343 0.0442 0.054 ...
#> .. ..$ y: num [1:100] 0.0395 0.0491 0.0586 0.0681 0.0776 ...
#> ..$ trees : chr "1000 200 2\n1\n1 0 0 0.01185590432\n3\n1 1 30 -0.01530736435\n2 0 0 0.01064412946\n3 0 0 0.02413784284\n3\n1 0 "| __truncated__
#> $ proc.time : 'proc_time' Named num [1:5] 1.406 0.008 1.415 0 0
#> ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
#> $ mu : num 0.501
#> $ varcount.mean : Named num [1:2] 115 110
#> ..- attr(*, "names")= chr [1:2] "x" "y"
#> $ varprob.mean : Named num [1:2] 0.5 0.5
#> ..- attr(*, "names")= chr [1:2] "x" "y"
#> $ rm.const : int [1:2] 1 2
#> - attr(*, "class")= chr "wbart"
Whereas the output of str() for the bart output, while long, does contain the input:
str(DBARTSmodel)
#> List of 11
#> $ call : language bart(x.train = df[, 1:2], y.train = df[, 3], keeptrees = TRUE)
#> $ first.sigma : num [1:100] 0.289 0.311 0.268 0.253 0.242 ...
#> $ sigma : num [1:1000] 0.288 0.307 0.248 0.257 0.293 ...
#> $ sigest : num 0.295
#> $ yhat.train : num [1:1000, 1:100] 0.715 0.677 0.508 0.51 0.827 ...
#> $ yhat.train.mean: num [1:100] 0.583 0.456 0.504 0.544 0.404 ...
#> $ yhat.test : NULL
#> $ yhat.test.mean : NULL
#> $ varcount : int [1:1000, 1:2] 128 118 120 142 130 145 145 150 138 138 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "x" "y"
#> $ y : num [1:100] 0.8489 0.0817 0.4371 0.8566 0.0878 ...
#> $ fit :Reference class 'dbartsSampler' [package "dbarts"] with 5 fields
#> ..$ pointer:<externalptr>
#> ..$ control:Formal class 'dbartsControl' [package "dbarts"] with 18 slots
#> .. .. ..# binary : logi FALSE
#> .. .. ..# verbose : logi TRUE
#> .. .. ..# keepTrainingFits: logi TRUE
#> .. .. ..# useQuantiles : logi FALSE
#> .. .. ..# keepTrees : logi TRUE
#> .. .. ..# n.samples : int 1000
#> .. .. ..# n.burn : int 100
#> .. .. ..# n.trees : int 200
#> .. .. ..# n.chains : int 1
#> .. .. ..# n.threads : int 1
#> .. .. ..# n.thin : int 1
#> .. .. ..# printEvery : int 100
#> .. .. ..# printCutoffs : int 0
#> .. .. ..# rngKind : chr "default"
#> .. .. ..# rngNormalKind : chr "default"
#> .. .. ..# rngSeed : int NA
#> .. .. ..# updateState : logi TRUE
#> .. .. ..# call : language bart(x.train = df[, 1:2], y.train = df[, 3], keeptrees = TRUE)
#> ..$ model :Formal class 'dbartsModel' [package "dbarts"] with 9 slots
#> .. .. ..# p.birth_death : num 0.5
#> .. .. ..# p.swap : num 0.1
#> .. .. ..# p.change : num 0.4
#> .. .. ..# p.birth : num 0.5
#> .. .. ..# node.scale : num 0.5
#> .. .. ..# tree.prior :Formal class 'dbartsCGMPrior' [package "dbarts"] with 3 slots
#> .. .. .. .. ..# power : num 2
#> .. .. .. .. ..# base : num 0.95
#> .. .. .. .. ..# splitProbabilities: num(0)
#> .. .. ..# node.prior :Formal class 'dbartsNormalPrior' [package "dbarts"] with 0 slots
#> list()
#> .. .. ..# node.hyperprior:Formal class 'dbartsFixedHyperprior' [package "dbarts"] with 1 slot
#> .. .. .. .. ..# k: num 2
#> .. .. ..# resid.prior :Formal class 'dbartsChiSqPrior' [package "dbarts"] with 2 slots
#> .. .. .. .. ..# df : num 3
#> .. .. .. .. ..# quantile: num 0.9
#> ..$ data :Formal class 'dbartsData' [package "dbarts"] with 10 slots
#> .. .. ..# y : num [1:100] 0.8489 0.0817 0.4371 0.8566 0.0878 ...
#> .. .. ..# x : num [1:100, 1:2] 0.152 0.666 0.967 0.248 0.668 ...
#> .. .. .. ..- attr(*, "dimnames")=List of 2
#> .. .. .. .. ..$ : NULL
#> .. .. .. .. ..$ : chr [1:2] "x" "y"
#> .. .. .. ..- attr(*, "drop")=List of 2
#> .. .. .. .. ..$ x: logi FALSE
#> .. .. .. .. ..$ y: logi FALSE
#> .. .. .. ..- attr(*, "term.labels")= chr [1:2] "x" "y"
#> .. .. ..# varTypes : int [1:2] 0 0
#> .. .. ..# x.test : NULL
#> .. .. ..# weights : NULL
#> .. .. ..# offset : NULL
#> .. .. ..# offset.test : NULL
#> .. .. ..# n.cuts : int [1:2] 100 100
#> .. .. ..# sigma : num 0.295
#> .. .. ..# testUsesRegularOffset: logi NA
#> ..$ state :List of 1
#> .. ..$ :Formal class 'dbartsState' [package "dbarts"] with 6 slots
#> .. .. .. ..# trees : int [1:1055] 0 18 -1 0 49 -1 -1 0 60 -1 ...
#> .. .. .. ..# treeFits : num [1:100, 1:200] -0.02252 0.00931 0.00931 0.02688 0.00931 ...
#> .. .. .. ..# savedTrees: int [1:2340360] 0 797997482 1070928224 1 -402902351 1070268808 -1 -1094651769 -1081938039 -1 ...
#> .. .. .. ..# sigma : num 0.297
#> .. .. .. ..# k : num 2
#> .. .. .. ..# rng.state : int [1:18] 0 1078575104 0 1078575104 -1657977906 1075613906 0 1078558720 277209871 -1068236140 ...
#> .. ..- attr(*, "runningTime")= num 0.477
#> .. ..- attr(*, "currentNumSamples")= int 1000
#> .. ..- attr(*, "currentSampleNum")= int 0
#> .. ..- attr(*, "numCuts")= int [1:2] 100 100
#> .. ..- attr(*, "cutPoints")=List of 2
#> .. .. ..$ : num [1:100] 0.0147 0.0245 0.0343 0.0442 0.054 ...
#> .. .. ..$ : num [1:100] 0.0395 0.0491 0.0586 0.0681 0.0776 ...
#> ..and 40 methods, of which 26 are possibly relevant:
#> .. copy#envRefClass, getLatents, getPointer, getTrees, initialize, plotTree,
#> .. predict, printTrees, run, sampleNodeParametersFromPrior,
#> .. sampleTreesFromPrior, setControl, setCutPoints, setData, setModel,
#> .. setOffset, setPredictor, setResponse, setSigma, setState, setTestOffset,
#> .. setTestPredictor, setTestPredictorAndOffset, setWeights,
#> .. show#envRefClass, storeState
#> - attr(*, "class")= chr "bart"
You can achieve what you are looking for with the bartModelMatrix() function from the BART package.
This function determines the number of cutpoints needed for each column.
That way, you end up with as many columns as there are variables in your df.
In your example you are only interested in x and y, so you only need the first and second columns of the matrix returned by bartModelMatrix().
So, for the example you gave:
# create data
df <- data.frame(
  x = runif(100),
  y = runif(100),
  z = runif(100)
)
# create BART
BARTmodel <- wbart(x.train = df[, 1:2],
                   y.train = df[, 3])
# create dbarts
DBARTSmodel <- bart(x.train = df[, 1:2],
                    y.train = df[, 3],
                    keeptrees = TRUE)
BARTmatrix <- bartModelMatrix(df)
BARTmatrix <- BARTmatrix[,1:2]
BARTmatrix == DBARTSmodel$fit$data@x
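If you just want a single TRUE/FALSE rather than the elementwise comparison above, an optional check (using the same objects as in the snippet):
# TRUE if every value rebuilt by bartModelMatrix() matches the matrix stored in the dbarts fit
all(BARTmatrix == DBARTSmodel$fit$data@x)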
Hope that helped you
I have done many bayesian models using the MCMCglmm package in R, like this one:
model <- MCMCglmm(scale(lifespan) ~ scale(weight) * scale(littersize),
                  random = ~ idv(DNA1) + idv(DNA2),
                  data = df,
                  family = "gaussian",
                  prior = prior1,
                  thin = 50,
                  burnin = 5000,
                  nitt = 50000,
                  verbose = F)
summary(model)
                  post.mean  l-95% CI  u-95% CI  eff.samp   pMCMC
(Intercept)        11.23327     8.368  13.73756      6228  <2e-04 ***
weight             -1.63770    -2.059  -1.23457      6600  <2e-04 ***
littersize          0.40960     0.024   0.80305      6600  0.0415 *
weight:littersize  -0.33411    -0.635  -0.04406      5912  0.0248 *
I would like to plot the resulting interaction (weight:littersize) with ggeffects or sjPlots packages, like this:
plot_model(model,
           type = "int",
           terms = c("scale(lifespan)", "scale(weight)", "scale(littersize)"),
           mdrt.values = "meansd",
           ppd = TRUE)
But I obtain the next output:
`scale(weight)` was not found in model terms. Maybe misspelled?
`scale(littersize)` was not found in model terms. Maybe misspelled?
Error in terms.default(model) : no terms component nor attribute
In addition: Warning messages:
1: Some model terms could not be found in model data. You probably need to load the data into the environment.
2: Some model terms could not be found in model data. You probably need to load the data into the environment.
The data is already loaded. I tried writing the terms differently, without the scale(x) wrapper, and also changed the model so that the term names match, but I am still getting this error message. I am also open to plotting this interaction with different packages.
My model str(model) is:
>str(model)
List of 20
$ Sol : 'mcmc' num [1:6600, 1:4] -0.814 1.215 -2.119 -0.125 -1.648 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:4] "(Intercept)" "scale(weight)" "scale(littersize)" "scale(weight):scale(littersize)"
..- attr(*, "mcpar")= num [1:3] 7e+04 4e+05 5e+01
$ Lambda : NULL
$ VCV : 'mcmc' num [1:6600, 1:3] 1.094 0.693 1.58 0.645 1.161 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "phylo." "haplo." "units"
..- attr(*, "mcpar")= num [1:3] 7e+04 4e+05 5e+01
$ CP : NULL
$ Liab : NULL
$ Fixed :List of 3
..$ formula:Class 'formula' language scale(lifespan) ~ scale(weight) * scale(littersize)
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ nfl : int 4
..$ nll : num 0
$ Random :List of 5
..$ formula:Class 'formula' language ~idv(phylo) + idv(haplo)
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ nfl : num [1:2] 1 1
..$ nrl : int [1:2] 92 92
..$ nat : num [1:2] 0 0
..$ nrt : int [1:2] 1 1
$ Residual :List of 6
..$ formula :Class 'formula' language ~units
.. .. ..- attr(*, ".Environment")=<environment: 0x0000025ba05f8938>
..$ nfl : num 1
..$ nrl : int 92
..$ nrt : int 1
..$ family : chr "gaussian"
..$ original.family: chr "gaussian"
$ Deviance : 'mcmc' num [1:6600] -262.6 -137.3 -203.6 -83.6 -29.1 ...
..- attr(*, "mcpar")= num [1:3] 7e+04 4e+05 5e+01
$ DIC : num -158
$ X :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..# i : int [1:368] 0 1 2 3 4 5 6 7 8 9 ...
.. ..# p : int [1:5] 0 92 184 276 368
.. ..# Dim : int [1:2] 92 4
.. ..# Dimnames:List of 2
.. .. ..$ : chr [1:92] "1.1" "2.1" "3.1" "4.1" ...
.. .. ..$ : chr [1:4] "(Intercept)" "scale(weight)" "scale(littersize)" "scale(weight):scale(littersize)"
.. ..# x : num [1:368] 1 1 1 1 1 1 1 1 1 1 ...
.. ..# factors : list()
$ Z :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..# i : int [1:16928] 0 1 2 3 4 5 6 7 8 9 ...
.. ..# p : int [1:185] 0 92 184 276 368 460 552 644 736 828 ...
.. ..# Dim : int [1:2] 92 184
.. ..# Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:184] "phylo1.NA.1" "phylo2.NA.1" "phylo3.NA.1" "phylo4.NA.1" ...
.. ..# x : num [1:16928] 0.4726 0.0869 0.1053 0.087 0.1349 ...
.. ..# factors : list()
$ ZR :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..# i : int [1:92] 0 1 2 3 4 5 6 7 8 9 ...
.. ..# p : int [1:93] 0 1 2 3 4 5 6 7 8 9 ...
.. ..# Dim : int [1:2] 92 92
.. ..# Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:92] "units.1" "units.2" "units.3" "units.4" ...
.. ..# x : num [1:92] 1 1 1 1 1 1 1 1 1 1 ...
.. ..# factors : list()
$ XL : NULL
$ ginverse : NULL
$ error.term : int [1:92] 1 1 1 1 1 1 1 1 1 1 ...
$ family : chr [1:92] "gaussian" "gaussian" "gaussian" "gaussian" ...
$ Tune : num [1, 1] 1
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "1"
.. ..$ : chr "1"
$ meta : logi FALSE
$ y.additional: num [1:92, 1:2] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "class")= chr "MCMCglmm"
Thank you.
Try scaling your variables before fitting the model, e.g.
df$lifespan <- as.vector(scale(df$lifespan))
Or better, use effectsize::standardize(), which does not create a matrix for a one-dimensional vector when scaling your variables:
df <- effectsize::standardize(df, select = c("lifespan", "weight", "littersize"))
Then you can call your model like this:
model <- MCMCglmm(lifespan ~ weight * littersize,
                  random = ~ idv(DNA1) + idv(DNA2),
                  data = df,
                  family = "gaussian",
                  prior = prior1,
                  thin = 50,
                  burnin = 5000,
                  nitt = 50000,
                  verbose = F)
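If that refit works, the plot_model() call from the question should then be able to find the terms under their plain names; an untested sketch, keeping the sjPlot arguments from the question but dropping the scale() wrappers and the terms argument (with type = "int", plot_model picks up the interaction terms itself):
library(sjPlot)

# interaction plot of weight x littersize from the refitted model
plot_model(model,
           type = "int",
           mdrt.values = "meansd",
           ppd = TRUE)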
Does this work?
I'm unable to call plot() on a randomForest object that I loaded from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf) directly.
Other function calls on rf work just fine.
EDIT:
See output of str(rf)
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
I have created a multiply imputed (MI) data set using the mice package, with 7 imputed data sets:
imputeddata <- mice(distress_tibmi, m=7)
the structure of my data is now:
..$ id : num [1:342] 4 8 10 11 23 32 40 47 48 56 ...
..$ diagnosis : Factor w/ 2 levels "psychosis","bpd": 1 1 1 1 1 1 1 1 1 1 ...
..$ gender : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 1 1 ...
..$ distress.time : Factor w/ 2 levels "baseline","post": 1 1 1 1 1 1 1 1 1 1 ...
..$ distress.score: num [1:342] -2.436 -1.242 0.251 -1.54 0.549 ...
..$ depression : num [1:342] 0.332 0.542 1.172 -0.298 1.172 ...
..$ anxiety : num [1:342] -1.898 -0.687 0.87 -0.687 1.043 ...
..$ choice : num [1:342] 6.73 2.18 2 6.45 3.55 ...
$ imp :List of 8
..$ id :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ diagnosis :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ gender :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ distress.time :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ distress.score:'data.frame': 59 obs. of 7 variables:
.. ..$ 1: num [1:59] -0.6808 -0.6448 -1.658 -0.0293 -0.3463 ...
.. ..$ 2: num [1:59] 1.2736 0.2507 -0.0478 -0.6448 1.2736 ...
.. ..$ 3: num [1:59] -0.681 0.848 -1.658 1.274 0.251 ...
.. ..$ 4: num [1:59] -1.3322 -0.0478 -0.6808 -0.355 -2.4358 ...
.. ..$ 5: num [1:59] -1.3322 -0.355 -4.8239 -0.6448 -0.0293 ...
.. ..$ 6: num [1:59] -1.3322 0.5493 -0.0293 -2.6352 0.8478 ...
.. ..$ 7: num [1:59] 0.5493 0.2507 1.1463 -0.0478 1.2736 ...
..$ depression :'data.frame': 24 obs. of 7 variables:
.. ..$ 1: num [1:24] -0.0882 -0.5084 -1.2966 0.542 -2.1891 ...
.. ..$ 2: num [1:24] 0.332 0.255 1.592 0.752 0.945 ...
.. ..$ 3: num [1:24] -2.159 0.332 -0.262 0.962 1.382 ...
.. ..$ 4: num [1:24] -0.2621 -0.0897 -1.7689 1.1172 0.7724 ...
.. ..$ 5: num [1:24] 0.122 -2.159 -2.399 1.462 -2.189 ...
.. ..$ 6: num [1:24] -0.298 -0.434 -0.607 1.172 0.962 ...
.. ..$ 7: num [1:24] 0.6 1.29 1.635 0.542 0.428 ...
..$ anxiety :'data.frame': 10 obs. of 7 variables:
.. ..$ 1: num [1:10] 0.909 -1.379 1.389 -1.268 -0.598 ...
.. ..$ 2: num [1:10] 1.0433 -1.3789 -0.0955 -0.7655 -0.598 ...
.. ..$ 3: num [1:10] 1.0771 -1.8979 -0.0955 -0.5138 0.0052 ...
.. ..$ 4: num [1:10] -0.598 -1.603 0.9095 -2.608 -0.0955 ...
.. ..$ 5: num [1:10] 0.742 0.2395 -1.7249 -2.1055 -0.0955 ...
.. ..$ 6: num [1:10] 1.412 -0.86 1.389 -2.608 0.575 ...
.. ..$ 7: num [1:10] 1.245 -1.033 0.909 0.909 -1.033 ...
..$ choice :'data.frame': 22 obs. of 7 variables:
.. ..$ 1: num [1:22] 4.55 3.91 7.09 4.27 3.55 ...
.. ..$ 2: num [1:22] 8.09 5.09 5.36 4.91 4.45 ...
.. ..$ 3: num [1:22] 4.27 7.09 3.91 3.91 7.09 ...
.. ..$ 4: num [1:22] 5.82 6.27 7 6.82 4.73 ...
.. ..$ 5: num [1:22] 6.18 5.36 5.36 3.18 3.18 ...
.. ..$ 6: num [1:22] 6.18 6.73 4.73 4.73 5 ...
.. ..$ 7: num [1:22] 5.45 7.09 7.45 3.18 4.91 ...
$ m : num 7
$ where : logi [1:342, 1:8] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:342] "1" "2" "3" "4" ...
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ blocks :List of 8
..$ id : chr "id"
..$ diagnosis : chr "diagnosis"
..$ gender : chr "gender"
..$ distress.time : chr "distress.time"
..$ distress.score: chr "distress.score"
..$ depression : chr "depression"
..$ anxiety : chr "anxiety"
..$ choice : chr "choice"
..- attr(*, "calltype")= Named chr [1:8] "type" "type" "type" "type" ...
.. ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ call : language mice(data = distress_tibmi, m = 7)
$ nmis : Named int [1:8] 0 0 0 0 59 24 10 22
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ method : Named chr [1:8] "" "" "" "" ...
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ predictorMatrix: num [1:8, 1:8] 0 1 1 1 1 1 1 1 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ visitSequence : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ formulas :List of 8
..$ id :Class 'formula' language id ~ 0 + diagnosis + gender + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ diagnosis :Class 'formula' language diagnosis ~ 0 + id + gender + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ gender :Class 'formula' language gender ~ 0 + id + diagnosis + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ distress.time :Class 'formula' language distress.time ~ 0 + id + diagnosis + gender + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ distress.score:Class 'formula' language distress.score ~ 0 + id + diagnosis + gender + distress.time + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ depression :Class 'formula' language depression ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ anxiety :Class 'formula' language anxiety ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ choice :Class 'formula' language choice ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
$ post : Named chr [1:8] "" "" "" "" ...
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ blots :List of 8
..$ id : list()
..$ diagnosis : list()
..$ gender : list()
..$ distress.time : list()
..$ distress.score: list()
..$ depression : list()
..$ anxiety : list()
..$ choice : list()
$ seed : logi NA
$ iteration : num 5
$ lastSeedValue : int [1:626] 10403 331 -1243825859 461242975 2057104913 -837414599 -54045022 1529270132 -105270003 -1459771035 ...
$ chainMean : num [1:8, 1:5, 1:7] NaN NaN NaN NaN -0.727 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
$ chainVar : num [1:8, 1:5, 1:7] NA NA NA NA 2.26 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
$ loggedEvents : NULL
$ version :Classes 'package_version', 'numeric_version' hidden list of 1
..$ : int [1:3] 3 9 0
$ date : Date[1:1], format: ...
- attr(*, "class")= chr "mids"
A summary of the data before imputation:
id diagnosis gender
Min. : 1.00 psychosis:250 female:196
1st Qu.: 76.75 bpd : 92 male :146
Median :198.00
Mean :215.66
3rd Qu.:337.00
Max. :514.00
distress.time distress.score depression
baseline:171 Min. :-4.8239 Min. :-2.39920
post :171 1st Qu.:-0.6808 1st Qu.:-0.76410
Median :-0.0293 Median : 0.08280
Mean :-0.3083 Mean :-0.06085
3rd Qu.: 0.6221 3rd Qu.: 0.77240
Max. : 1.2736 Max. : 1.80690
NA's :59 NA's :24
anxiety choice
Min. :-2.6080 Min. :0.0909
1st Qu.:-0.9330 1st Qu.:2.4545
Median :-0.0955 Median :4.0454
Mean :-0.1397 Mean :3.8903
3rd Qu.: 0.8702 3rd Qu.:5.1136
Max. : 1.7471 Max. :8.0909
NA's :10 NA's :22
The imputed values for distress.score (first column is the row index; the remaining seven columns are imputations 1-7):
21 -0.6808 1.2736 -0.6808 -1.3322 -1.3322 -1.3322 0.5493
34 -0.6448 0.2507 0.8478 -0.0478 -0.3550 0.5493 0.2507
48 -1.6580 -0.0478 -1.6580 -0.6808 -4.8239 -0.0293 1.1463
141 -0.0293 -0.6448 1.2736 -0.3550 -0.6448 -2.6352 -0.0478
143 -0.3463 1.2736 0.2507 -2.4358 -0.0293 0.8478 1.2736
180 1.1463 -1.0065 -2.3094 -3.6124 -0.6448 -1.5403 -1.0065
181 -0.0293 -0.6808 -0.6808 -3.9381 -0.3463 -1.3322 0.2964
182 1.2736 -0.3463 0.9479 -0.0478 0.9479 -0.3463 1.1463
197 -0.3550 -0.0293 -0.6808 -0.3550 -1.3322 -4.8239 -0.6448
208 0.6221 0.2507 -0.6808 -0.3550 -0.6448 0.6221 -0.6448
(rows 1-10 of 59 shown)
I created an lm with the imputed data set and summarised it using pool():
distressmodel <- with(data = imputeddata, expr = lm(distress.score ~ distress.time * diagnosis))
summary(mice::pool(distressmodel), conf.int = TRUE, conf.level = 0.95 )
However, now I want to get the type 3 F values for the model, but this code is not working:
car::Anova(mice::pool(distressmodel), type = 3)
it produces this error message:
Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('mipo', 'data.frame')"
I also want to get the marginal effects of the model (e.g. the effects within only one level of the grouping variable, which is diagnosis), which I have done successfully in my complete-case analysis, but this code:
summary(margins(distressmodel, data = subset(imputeddata, diagnosis == "bpd", type = "response")))
produces this error
Error in subset_datlist(datlist = x, subset = subset, select = select, : object 'diagnosis' not found
Does anyone have advice on how to alter the code, or another way to get car::Anova() or margins() to work with an MI data set (preferably with pooled results)?
The with(data, expr) procedure can be used to apply statistical tests/models to a multiply imputed data set, and the results can then be pooled (into a mipo object) only if the fitted models allow extracting the estimates with the coef() method and a variance-covariance matrix with vcov(). The latter does not seem to work for the car::Anova() call you used.
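To see where that requirement bites, you can poke at the objects from the question (an illustrative check only, using the distressmodel and imputeddata objects defined above):
# each per-imputation fit inside the mira object is an ordinary lm,
# so coef() and vcov() both work on it
fit1 <- distressmodel$analyses[[1]]
coef(fit1)
vcov(fit1)

# the pooled result, however, is a mipo object with no vcov() method,
# which is what car::Anova() stumbles over
pooled <- mice::pool(distressmodel)
class(pooled)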
Fortunately, there is the miceadds package, which offers procedures to conduct and pool additional statistical tests. miceadds::mi.anova seems to do exactly what you want:
miceadds::mi.anova(imputeddata, distress.score ~ distress.time * diagnosis, type=3)
I am not sure, however, about the marginal effects. In general, you can do a bit more coding and apply any statistical procedure to each imputed sample separately. Then you can pool it using the pool.scalar function. This method also gives you within-imputation, between-imputation, and total variance estimates for your pooled statistic. (And with that you can conduct a basic t-test for difference from 0, if you want.)
This approach relies on normal distribution of statistics – or on them being transformable to a normally distributed metric. (Stef van Buuren gives a list of statistics that can easily be transformed, pooled, and back-transformed here, see Table 5.2) So it should be possible for the marginal means you want, right?
I do not know the margins function you use (what package is it from?). But, if you want to get the marginal means and pool them yourself, this is the approach:
# transform your mids object into a long-format data frame
imputed_l <- mice::complete(imputeddata, action = "long")
nimp <- imputeddata$m   # number of imputations, for convenience

# create vectors to hold the marginal effects and their SEs from all seven imputations
mm_all   <- vector("numeric", nimp)
mmse_all <- mm_all

# get marginal means and SEs for each imputation
for (i in 1:nimp) {
  mm_all[i]   <- Expression_producing_marginal_mean(..., data = subset(imputed_l, .imp == i))
  mmse_all[i] <- Expression_producing_SE(..., data = subset(imputed_l, .imp == i))
}

# pool them (the U argument should be variances, so square the SEs)
mm_pool <- pool.scalar(Q = mm_all, U = mmse_all^2, n = nrow(imputed_l) / nimp)
mm_pool$qbar    # marginal mean aggregated across imputations
sqrt(mm_pool$t) # SE of the marginal mean (based on within- and between-imputation variance)
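To make that skeleton concrete, here is the same pattern with a plain per-group mean standing in for the marginal effect; the variable and group names come from the question, but the choice of statistic is only an illustration:
library(mice)

imputed_l <- mice::complete(imputeddata, action = "long")
nimp <- imputeddata$m

mm_all   <- numeric(nimp)
mmse_all <- numeric(nimp)

for (i in 1:nimp) {
  # imputed sample i, restricted to the bpd group
  d_i <- subset(imputed_l, .imp == i & diagnosis == "bpd")
  mm_all[i]   <- mean(d_i$distress.score)
  mmse_all[i] <- sd(d_i$distress.score) / sqrt(nrow(d_i))  # SE of the mean
}

mm_pool <- pool.scalar(Q = mm_all, U = mmse_all^2, n = nrow(imputed_l) / nimp)
mm_pool$qbar     # pooled estimate
sqrt(mm_pool$t)  # pooled standard error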