Rearrangement of list data in R

I have a list like
a=
:x1
..$y: chr [1:100] "da" "da" "dw" "dw"...........
..$z: num [1:100] 1 3 7 10 14 15 16...........
:x2
..$y chr [1:150] "sdd" "gtr" "fr" "sw"........
..$z num [1:150] 1 2 3 7 10 15 16............
I want to create a list that splits the current list so that the z vector (along with the matching y values) is divided into the ranges 1:10, 11:20, 21:30, ...
For example:
a1=
:list1
..$x1
.. ..$y: chr [1:10] "da" "da"...........
.. ..$z: num [1:10] 1 3 7 10 ...........
..$x2
.. ..$y chr [1:10] "sdd" "gtr"........
.. ..$z num [1:10] 1 2 3 7 10............
:list2
..$x1
.. ..$y: chr [1:10] "des" "ded"...........
.. ..$z: num [1:10] 14 15 16...........
..$x2
.. ..$y chr [1:10] "dwd" "ded"........
.. ..$z num [1:10] 15 16............
:list3
..$x1
.. ..$y: chr [1:10] "ded" "sa"...........
.. ..$z: num [1:10] 21 24 27...........
..$x2
.. ..$y chr [1:10] "dww" "dw"........
.. ..$z num [1:10] 24 27 30............
I tried a for loop, but it throws errors.
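A minimal base-R sketch of the requested split. It assumes z holds positive values and that y and z in each element have the same length. The toy list `a` is made up to mirror the structure in the question, and each output list is named after its bin (list1 for 1:10, list2 for 11:20, and so on):

```r
# Hypothetical sample data mirroring the structure in the question
a <- list(
  x1 = list(y = c("da", "da", "dw", "dw", "des", "ded", "ded", "sa"),
            z = c(1, 3, 7, 10, 14, 15, 21, 24)),
  x2 = list(y = c("sdd", "gtr", "fr", "sw", "dwd", "ded", "dww", "dw"),
            z = c(1, 2, 3, 7, 10, 15, 24, 27))
)

# Bin index for each z value: 1..10 -> 1, 11..20 -> 2, 21..30 -> 3, ...
bin_of <- function(z) ceiling(z / 10)

# Every bin that occurs in at least one element
bins <- sort(unique(unlist(lapply(a, function(el) bin_of(el$z)))))

# For each bin, subset y and z of every element to the values in that bin
a1 <- lapply(bins, function(b) {
  lapply(a, function(el) {
    keep <- bin_of(el$z) == b
    list(y = el$y[keep], z = el$z[keep])
  })
})
names(a1) <- paste0("list", bins)

str(a1$list1)
```

An element with no values in a given bin ends up with empty y and z vectors in that bin rather than being dropped.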

Related

Export Hmisc::describe output to Excel/CSV

Is there any way to export this output to a CSV file instead of typing it in manually?
Below is the output of the Hmisc describe function:
library(Hmisc) # Hmisc describe
> Hmisc::describe(data)
data
3 Variables 6 Observations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
ID
n missing distinct Info Mean Gmd
6 0 3 0.857 112.2 1.267
Value 110 112 113
Frequency 1 2 3
Proportion 0.167 0.333 0.500
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Date
n missing distinct
6 0 3
Value 23/04/2018 24/04/2018 25/04/2018
Frequency 3 2 1
Proportion 0.500 0.333 0.167
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Revenue
n missing distinct Info Mean Gmd
6 0 6 1 74 17.2
lowest : 51 65 70 85 86, highest: 65 70 85 86 87
Value 51 65 70 85 86 87
Frequency 1 1 1 1 1 1
Proportion 0.167 0.167 0.167 0.167 0.167 0.167
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Dataset:
> data
ID Date Revenue
1 113 23/04/2018 51
2 113 23/04/2018 87
3 113 23/04/2018 70
4 112 24/04/2018 85
5 112 24/04/2018 65
6 110 25/04/2018 86
I doubt writing it to CSV would be helpful. Try writing it to a text file instead:
cat(capture.output(Hmisc::describe(data)), file = 'result.txt', sep = '\n')
This is probably not going to be easy. You could use capture.output, but you would then need to parse each section differently depending on its class and counts. You could also assign the result to an object and work with that, but again there will be a variety of formats:
obj <- describe(iris)
str(obj)
# iris is the canonical example data frame, but even it doesn't cover all the cases.
List of 5
$ Sepal.Length:List of 6
..$ descript: chr "Sepal.Length"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "35" "0.998" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:35] 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 5.1 5.2 ...
.. ..$ frequency: num [1:35(1d)] 1 3 1 4 2 5 6 10 9 4 ...
..$ extremes: Named num [1:10] 4.3 4.4 4.5 4.6 4.7 7.3 7.4 7.6 7.7 7.9
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Sepal.Width :List of 6
..$ descript: chr "Sepal.Width"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "23" "0.992" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:23] 2 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 ...
.. ..$ frequency: num [1:23(1d)] 1 3 4 3 8 5 9 14 10 26 ...
..$ extremes: Named num [1:10] 2 2.2 2.3 2.4 2.5 3.9 4 4.1 4.2 4.4
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Petal.Length:List of 6
..$ descript: chr "Petal.Length"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "43" "0.998" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:43] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.9 3 ...
.. ..$ frequency: num [1:43(1d)] 1 1 2 7 13 13 7 4 2 1 ...
..$ extremes: Named num [1:10] 1 1.1 1.2 1.3 1.4 6.3 6.4 6.6 6.7 6.9
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Petal.Width :List of 6
..$ descript: chr "Petal.Width"
..$ units : NULL
..$ format : NULL
..$ counts : Named chr [1:13] "150" "0" "22" "0.99" ...
.. ..- attr(*, "names")= chr [1:13] "n" "missing" "distinct" "Info" ...
..$ values :List of 2
.. ..$ value : num [1:22] 0.1 0.2 0.3 0.4 0.5 0.6 1 1.1 1.2 1.3 ...
.. ..$ frequency: num [1:22(1d)] 5 29 7 7 1 1 7 3 5 13 ...
..$ extremes: Named num [1:10] 0.1 0.2 0.3 0.4 0.5 2.1 2.2 2.3 2.4 2.5
.. ..- attr(*, "names")= chr [1:10] "L1" "L2" "L3" "L4" ...
..- attr(*, "class")= chr "describe"
$ Species :List of 5
..$ descript: chr "Species"
..$ units : NULL
..$ format : NULL
..$ counts : Named num [1:3] 150 0 3
.. ..- attr(*, "names")= chr [1:3] "n" "missing" "distinct"
..$ values :List of 2
.. ..$ value : chr [1:3] "setosa" "versicolor" "virginica"
.. ..$ frequency: num [1:3(1d)] 50 50 50
..- attr(*, "class")= chr "describe"
- attr(*, "descript")= chr "iris"
- attr(*, "dimensions")= int [1:2] 150 5
- attr(*, "class")= chr "describe"
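Building on the str() output above, here is a sketch that keeps only the counts component of each variable (n, missing, distinct, and for numeric variables Info, Mean, Gmd and the quantiles), puts one variable per row, and writes that to CSV. It assumes the counts are all you need (the value/frequency tables are dropped), and the file name describe_counts.csv is a placeholder:

```r
library(Hmisc)

obj <- describe(iris)
desc_list <- unclass(obj)  # plain list with one element per variable

# Union of all statistic names; numeric variables carry more statistics than
# factors, so statistics a variable lacks are filled with NA
all_stats <- unique(unlist(lapply(desc_list, function(v) names(v$counts))))

rows <- lapply(desc_list, function(v) {
  out <- setNames(rep(NA_character_, length(all_stats)), all_stats)
  out[names(v$counts)] <- as.character(v$counts)  # counts may be chr or num
  out
})

summary_df <- data.frame(variable = names(desc_list),
                         do.call(rbind, rows),
                         check.names = FALSE, row.names = NULL)

out_file <- "describe_counts.csv"  # placeholder path
write.csv(summary_df, out_file, row.names = FALSE)
```

The values stay as the formatted strings that describe() stores, which keeps them identical to the printed output.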

R Unable to plot loaded randomForest object

I'm unable to call plot() (which dispatches to plot.randomForest()) on a randomForest object loaded from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
Here is the output of str(rf):
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
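One detail in the str(rf) output: for a regression forest, plot.randomForest() draws the error curve stored in the mse component, and the listing above has no mse (or rsq) component, so matplot() receives NULL; the error message is what as.matrix(NULL) produces. The help page for randomForest::combine() states that the mse and rsq components of a combined forest are NULL, which is one way such an object can arise (whether that is what happened here is an assumption). A small sketch to check for this, using mtcars as stand-in data and a hypothetical helper has_error_curve():

```r
library(randomForest)

# Hypothetical helper: does the object carry the error curve plot() draws?
has_error_curve <- function(fit) {
  if (fit$type == "regression") !is.null(fit$mse) else !is.null(fit$err.rate)
}

set.seed(42)
rf_a <- randomForest(mpg ~ ., data = mtcars, ntree = 30)
rf_b <- randomForest(mpg ~ ., data = mtcars, ntree = 30)
rf_comb <- combine(rf_a, rf_b)  # merged forest with 60 trees

has_error_curve(rf_a)     # a single fitted forest keeps its mse curve
has_error_curve(rf_comb)  # combine() drops mse/rsq, so plot() has nothing to draw
# has_error_curve(rf)     # run this on the object loaded from rf.RData
```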

How to get type 3 F values and marginal effects using a multiple imputation data set (generated from MICE)

I have created an MI data set using the mice package with 7 imputed data sets:
imputeddata <- mice(distress_tibmi, m=7)
The structure of my data is now:
..$ id : num [1:342] 4 8 10 11 23 32 40 47 48 56 ...
..$ diagnosis : Factor w/ 2 levels "psychosis","bpd": 1 1 1 1 1 1 1 1 1 1 ...
..$ gender : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 1 1 1 ...
..$ distress.time : Factor w/ 2 levels "baseline","post": 1 1 1 1 1 1 1 1 1 1 ...
..$ distress.score: num [1:342] -2.436 -1.242 0.251 -1.54 0.549 ...
..$ depression : num [1:342] 0.332 0.542 1.172 -0.298 1.172 ...
..$ anxiety : num [1:342] -1.898 -0.687 0.87 -0.687 1.043 ...
..$ choice : num [1:342] 6.73 2.18 2 6.45 3.55 ...
$ imp :List of 8
..$ id :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ diagnosis :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ gender :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ distress.time :'data.frame': 0 obs. of 7 variables:
.. ..$ 1: logi(0)
.. ..$ 2: logi(0)
.. ..$ 3: logi(0)
.. ..$ 4: logi(0)
.. ..$ 5: logi(0)
.. ..$ 6: logi(0)
.. ..$ 7: logi(0)
..$ distress.score:'data.frame': 59 obs. of 7 variables:
.. ..$ 1: num [1:59] -0.6808 -0.6448 -1.658 -0.0293 -0.3463 ...
.. ..$ 2: num [1:59] 1.2736 0.2507 -0.0478 -0.6448 1.2736 ...
.. ..$ 3: num [1:59] -0.681 0.848 -1.658 1.274 0.251 ...
.. ..$ 4: num [1:59] -1.3322 -0.0478 -0.6808 -0.355 -2.4358 ...
.. ..$ 5: num [1:59] -1.3322 -0.355 -4.8239 -0.6448 -0.0293 ...
.. ..$ 6: num [1:59] -1.3322 0.5493 -0.0293 -2.6352 0.8478 ...
.. ..$ 7: num [1:59] 0.5493 0.2507 1.1463 -0.0478 1.2736 ...
..$ depression :'data.frame': 24 obs. of 7 variables:
.. ..$ 1: num [1:24] -0.0882 -0.5084 -1.2966 0.542 -2.1891 ...
.. ..$ 2: num [1:24] 0.332 0.255 1.592 0.752 0.945 ...
.. ..$ 3: num [1:24] -2.159 0.332 -0.262 0.962 1.382 ...
.. ..$ 4: num [1:24] -0.2621 -0.0897 -1.7689 1.1172 0.7724 ...
.. ..$ 5: num [1:24] 0.122 -2.159 -2.399 1.462 -2.189 ...
.. ..$ 6: num [1:24] -0.298 -0.434 -0.607 1.172 0.962 ...
.. ..$ 7: num [1:24] 0.6 1.29 1.635 0.542 0.428 ...
..$ anxiety :'data.frame': 10 obs. of 7 variables:
.. ..$ 1: num [1:10] 0.909 -1.379 1.389 -1.268 -0.598 ...
.. ..$ 2: num [1:10] 1.0433 -1.3789 -0.0955 -0.7655 -0.598 ...
.. ..$ 3: num [1:10] 1.0771 -1.8979 -0.0955 -0.5138 0.0052 ...
.. ..$ 4: num [1:10] -0.598 -1.603 0.9095 -2.608 -0.0955 ...
.. ..$ 5: num [1:10] 0.742 0.2395 -1.7249 -2.1055 -0.0955 ...
.. ..$ 6: num [1:10] 1.412 -0.86 1.389 -2.608 0.575 ...
.. ..$ 7: num [1:10] 1.245 -1.033 0.909 0.909 -1.033 ...
..$ choice :'data.frame': 22 obs. of 7 variables:
.. ..$ 1: num [1:22] 4.55 3.91 7.09 4.27 3.55 ...
.. ..$ 2: num [1:22] 8.09 5.09 5.36 4.91 4.45 ...
.. ..$ 3: num [1:22] 4.27 7.09 3.91 3.91 7.09 ...
.. ..$ 4: num [1:22] 5.82 6.27 7 6.82 4.73 ...
.. ..$ 5: num [1:22] 6.18 5.36 5.36 3.18 3.18 ...
.. ..$ 6: num [1:22] 6.18 6.73 4.73 4.73 5 ...
.. ..$ 7: num [1:22] 5.45 7.09 7.45 3.18 4.91 ...
$ m : num 7
$ where : logi [1:342, 1:8] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:342] "1" "2" "3" "4" ...
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ blocks :List of 8
..$ id : chr "id"
..$ diagnosis : chr "diagnosis"
..$ gender : chr "gender"
..$ distress.time : chr "distress.time"
..$ distress.score: chr "distress.score"
..$ depression : chr "depression"
..$ anxiety : chr "anxiety"
..$ choice : chr "choice"
..- attr(*, "calltype")= Named chr [1:8] "type" "type" "type" "type" ...
.. ..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ call : language mice(data = distress_tibmi, m = 7)
$ nmis : Named int [1:8] 0 0 0 0 59 24 10 22
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ method : Named chr [1:8] "" "" "" "" ...
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ predictorMatrix: num [1:8, 1:8] 0 1 1 1 1 1 1 1 1 0 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ visitSequence : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ formulas :List of 8
..$ id :Class 'formula' language id ~ 0 + diagnosis + gender + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ diagnosis :Class 'formula' language diagnosis ~ 0 + id + gender + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ gender :Class 'formula' language gender ~ 0 + id + diagnosis + distress.time + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ distress.time :Class 'formula' language distress.time ~ 0 + id + diagnosis + gender + distress.score + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ distress.score:Class 'formula' language distress.score ~ 0 + id + diagnosis + gender + distress.time + depression + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ depression :Class 'formula' language depression ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ anxiety :Class 'formula' language anxiety ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
..$ choice :Class 'formula' language choice ~ 0 + id + diagnosis + gender + distress.time + distress.score + ...
.. .. ..- attr(*, ".Environment")=<environment: 0x7ff907cd9d00>
$ post : Named chr [1:8] "" "" "" "" ...
..- attr(*, "names")= chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
$ blots :List of 8
..$ id : list()
..$ diagnosis : list()
..$ gender : list()
..$ distress.time : list()
..$ distress.score: list()
..$ depression : list()
..$ anxiety : list()
..$ choice : list()
$ seed : logi NA
$ iteration : num 5
$ lastSeedValue : int [1:626] 10403 331 -1243825859 461242975 2057104913 -837414599 -54045022 1529270132 -105270003 -1459771035 ...
$ chainMean : num [1:8, 1:5, 1:7] NaN NaN NaN NaN -0.727 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
$ chainVar : num [1:8, 1:5, 1:7] NA NA NA NA 2.26 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:8] "id" "diagnosis" "gender" "distress.time" ...
.. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..$ : chr [1:7] "Chain 1" "Chain 2" "Chain 3" "Chain 4" ...
$ loggedEvents : NULL
$ version :Classes 'package_version', 'numeric_version' hidden list of 1
..$ : int [1:3] 3 9 0
$ date : Date[1:1], format: ...
- attr(*, "class")= chr "mids"
id diagnosis gender
Min. : 1.00 psychosis:250 female:196
1st Qu.: 76.75 bpd : 92 male :146
Median :198.00
Mean :215.66
3rd Qu.:337.00
Max. :514.00
distress.time distress.score depression
baseline:171 Min. :-4.8239 Min. :-2.39920
post :171 1st Qu.:-0.6808 1st Qu.:-0.76410
Median :-0.0293 Median : 0.08280
Mean :-0.3083 Mean :-0.06085
3rd Qu.: 0.6221 3rd Qu.: 0.77240
Max. : 1.2736 Max. : 1.80690
NA's :59 NA's :24
anxiety choice
Min. :-2.6080 Min. :0.0909
1st Qu.:-0.9330 1st Qu.:2.4545
Median :-0.0955 Median :4.0454
Mean :-0.1397 Mean :3.8903
3rd Qu.: 0.8702 3rd Qu.:5.1136
Max. : 1.7471 Max. :8.0909
NA's :10 NA's :22
           1       2       3       4       5       6       7
21   -0.6808  1.2736 -0.6808 -1.3322 -1.3322 -1.3322  0.5493
34   -0.6448  0.2507  0.8478 -0.0478 -0.3550  0.5493  0.2507
48   -1.6580 -0.0478 -1.6580 -0.6808 -4.8239 -0.0293  1.1463
141  -0.0293 -0.6448  1.2736 -0.3550 -0.6448 -2.6352 -0.0478
143  -0.3463  1.2736  0.2507 -2.4358 -0.0293  0.8478  1.2736
180   1.1463 -1.0065 -2.3094 -3.6124 -0.6448 -1.5403 -1.0065
181  -0.0293 -0.6808 -0.6808 -3.9381 -0.3463 -1.3322  0.2964
182   1.2736 -0.3463  0.9479 -0.0478  0.9479 -0.3463  1.1463
197  -0.3550 -0.0293 -0.6808 -0.3550 -1.3322 -4.8239 -0.6448
208   0.6221  0.2507 -0.6808 -0.3550 -0.6448  0.6221 -0.6448
(first 10 of 59 rows)
I fitted an lm to the imputed data set and summarised it using pool():
distressmodel <- with(data = imputeddata, exp = lm(distress.score ~ distress.time * diagnosis))
summary(mice::pool(distressmodel), conf.int = TRUE, conf.level = 0.95 )
However, I now want to get the type III F values for the model, but this code does not work:
car::Anova(mice::pool(distressmodel), type = 3)
It produces this error message:
Error in UseMethod("vcov") : no applicable method for 'vcov' applied to an object of class "c('mipo', 'data.frame')"
I also want to get the marginal effects of the model (e.g. the effects for only one level of the grouping variable, diagnosis), which I have done successfully in my complete-case analysis, but this code:
summary(margins(distressmodel, data = subset(imputeddata, diagnosis == "bpd", type = "response")))
produces this error:
Error in subset_datlist(datlist = x, subset = subset, select = select, : object 'diagnosis' not found
Does anyone have advice on altering the code, or another way to get car::Anova or margins() to work with an MI data set (preferably pooling the results)?
The with(data, expr) procedure can be used to apply statistical tests/models to multiply imputed data, and the results can then be pooled, only if the models allow extracting the estimates with the coef method and a variance-covariance matrix with vcov. The latter is what fails here: car::Anova calls vcov on the object you pass it, and there is no vcov method for the pooled result (class mipo), as the error message says.
Fortunately, there is the miceadds package, which offers procedures to conduct and pool additional statistical tests. miceadds::mi.anova seems to do exactly what you want:
miceadds::mi.anova(imputeddata, distress.score ~ distress.time * diagnosis, type=3)
I am not sure, however, about the marginal effects. In general, you can do a bit more coding and apply any statistical procedure to each imputed data set separately, then pool the results with the pool.scalar function. This also gives you within-imputation, between-imputation, and total variance estimates for your pooled statistic (and with those you can conduct a basic t-test for a difference from 0, if you want).
This approach relies on the statistics being normally distributed, or transformable to a normally distributed metric (Stef van Buuren gives a list of statistics that can easily be transformed, pooled, and back-transformed here; see Table 5.2). So it should be possible for the marginal means you want.
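For reference, these are Rubin's rules, which pool.scalar() applies by default. With m imputations, per-imputation estimates $\hat{Q}_i$ and their variances $U_i$ (squared standard errors):

$$\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad \bar{U} = \frac{1}{m}\sum_{i=1}^{m}U_i, \qquad B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2, \qquad T = \bar{U} + \left(1+\frac{1}{m}\right)B$$

These correspond to the qbar, ubar, b and t elements of the list pool.scalar() returns. The test statistic for a difference from 0 is $\bar{Q}/\sqrt{T}$, referred to a t distribution with the returned df as degrees of freedom.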
I do not know the margins function you use (which package is it from?), but if you want to get the marginal means and pool them yourself, this is the approach:
library(mice)
# transform your mids object into a long-format data frame
imputed_l <- mice::complete(imputeddata, action = "long")
nimp <- imputeddata$m  # number of imputations (stored on the mids object)
# create vectors to hold the marginal means and their SEs from all imputations
mm_all <- vector("numeric", nimp)
mmse_all <- mm_all
# get marginal means and SEs for each imputation
# (Expression_producing_marginal_mean / Expression_producing_SE are placeholders
#  for whatever computes your statistic and its SE)
for (i in seq_len(nimp)) {
  imp_i <- subset(imputed_l, .imp == i)  # rows belonging to imputation i
  mm_all[i] <- Expression_producing_marginal_mean(..., data = imp_i)
  mmse_all[i] <- Expression_producing_SE(..., data = imp_i)
}
# pool them (the U argument expects variances, so square the SEs)
mm_pool <- pool.scalar(Q = mm_all, U = mmse_all^2, n = nrow(imputed_l) / nimp)
mm_pool$qbar     # marginal mean aggregated across imputations
sqrt(mm_pool$t)  # SE of the marginal mean (within- plus between-imputation variance)
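To make the pattern above concrete, here is a runnable sketch using the nhanes example data that ships with mice. The mean BMI of age group 1 stands in for the marginal mean; the statistic, m = 5 and the seed are arbitrary choices for illustration:

```r
library(mice)

# nhanes ships with mice; create 5 imputations
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)
imputed_l <- complete(imp, action = "long")
nimp <- imp$m

est <- numeric(nimp)  # per-imputation estimates
se  <- numeric(nimp)  # per-imputation standard errors
for (i in seq_len(nimp)) {
  d <- imputed_l[imputed_l$.imp == i, ]
  x <- d$bmi[d$age == 1]  # stand-in statistic: mean BMI in age group 1
  est[i] <- mean(x)
  se[i]  <- sd(x) / sqrt(length(x))
}

# Pool with Rubin's rules; U takes variances, so square the SEs
pooled <- pool.scalar(Q = est, U = se^2, n = sum(nhanes$age == 1))
pooled$qbar                            # pooled estimate
sqrt(pooled$t)                         # pooled SE
t_stat <- pooled$qbar / sqrt(pooled$t)
2 * pt(-abs(t_stat), df = pooled$df)   # two-sided p-value for a difference from 0
```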

Error 'duplicate subscripts for columns' on using CreateTableOne

I was trying to run CreateTableOne() from the tableone package on my dataset m.dataaaaaa using the following code:
CreateTableOne(vars =Vars,strata = "ejecfraclesstha40_gps", factorVars =Catvars, data = m.dataaaaaa, test = T)
But I got the following error:
Error in [<-.data.frame(x, i, value = value) :
  duplicate subscripts for columns
In addition: Warning message:
In ModuleReturnVarsExist(vars, data) :
  The data frame does not have: ejecfraclesstha40 Dropped
The structure of the data is shown below (only partly, as it is a big database):
str(m.dataaaaaa)
Classes ‘data.table’ and 'data.frame': 194 obs. of 203 variables:
$ ejecfraclesstha40_gps : num 1 0 1 0 0 0 1 1 1 0 ...
$ Serial.ID : num 2 3 4 7 10 14 17 20 23 24 ...
..- attr(*, "format.spss")= chr "F4.0"
$ Serial.ID_matched.EF.cohort.Ivan1.to.2 : num 2 NA 4 NA NA NA 17 20 23 NA ...
..- attr(*, "format.spss")= chr "F8.0"
$ ps..matched.EF.cohort.Ivan1.to.2 : num 0.138 NA 0.19 NA NA NA 0.176 0.286 0.152 NA ...
..- attr(*, "format.spss")= chr "F8.3"
$ psweight1.to.2 : num 1 NA 1 NA NA NA 1 1 1 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_ID1.to.2 : num 483 NA 763 NA NA NA 180 176 239 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_cases_in_control1.to.2 : num 2 NA 2 NA NA NA 2 2 2 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ ejecfrac_4gps : num 1 3 1 3 3 3 1 1 1 3 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<35%" "EF=35 - <40%" "EF=40 - <=50" "EF>50%"
$ ejecfrac_4gps30 : num 1 4 1 3 3 4 1 1 1 4 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<=30%" "EF>30 - 39%" "EF=40 - 49%" "EF>=50%"
$ renisch : num 29 31 23 18 48 19 10 29 17 13 ...
..- attr(*, "label")= chr "renal + visceral ischemic time"
..- attr(*, "format.spss")= chr "F3.0"
..- attr(*, "display_width")= int 12
$ totxct : num 46 31 55 46 48 19 54 29 17 37 ...
..- attr(*, "label")= chr "total cross-clamp time"
..- attr(*, "format.spss")= chr "F4.0"
..- attr(*, "display_width")= int 12
The original database was read from SPSS into R.
My main problem is with this error :
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
Any advice will be greatly appreciated.
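The warning already says that ejecfraclesstha40 is not a column of the data. A few quick base-R checks show whether the variable lists contain names that are missing from the data or repeated, and whether the data itself has duplicated column names. check_table_one_inputs() is a hypothetical helper, and whether any of these is the cause of the duplicate subscripts error here is an assumption to verify, not a diagnosis:

```r
# Hypothetical helper: sanity-check inputs before calling tableone::CreateTableOne()
check_table_one_inputs <- function(data, vars, strata, factor_vars = character()) {
  requested <- c(vars, strata, factor_vars)
  list(
    missing_from_data  = setdiff(requested, names(data)),
    duplicated_in_vars = unique(vars[duplicated(vars)]),
    duplicated_columns = unique(names(data)[duplicated(names(data))])
  )
}

# On the data from the question:
# check_table_one_inputs(m.dataaaaaa, Vars, "ejecfraclesstha40_gps", Catvars)

# Toy example with deliberate problems
toy <- data.frame(a = 1:3, a = 4:6, g = c(0, 1, 0), check.names = FALSE)
toy_report <- check_table_one_inputs(toy, vars = c("a", "a", "zz"), strata = "g")
str(toy_report)
```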

How to manipulate data.frame object in different list more elegantly?

I have lists of data.frame objects that are the output of a function I implemented. I want to build new lists that group together the matching data.frame objects from the different lists. I tried several ways to get my expected output, but none of them was very elegant. Is there a useful trick or an elegant, efficient way to do this manipulation?
Here is a minimal example:
savedList <- list(
foo_saved = data.frame(v1=c(1,6,16), v2=c(4,12,23)),
bar_saved = data.frame(v1=c(7,19,31), v2=c(16,28,41)),
cat_saved = data.frame(v1=c(5,13,26), v2=c(11,21,42))
)
dropedList <- list(
foo_droped = data.frame(v1=c(4,9,20), v2=c(7,15,29)),
bar_droped = data.frame(v1=c(14,26,35), v2=c(21,30,47)),
cat_droped = data.frame(v1=c(18,29,39), v2=c(25,36,48))
)
This is my expected output:
foo <- list(
foo_saved = data.frame(v1=c(1,6,16), v2=c(4,12,23)),
foo_droped = data.frame(v1=c(4,9,20), v2=c(7,15,29))
)
bar <- list(
bar_saved = data.frame(v1=c(7,19,31), v2=c(16,28,41)),
bar_droped = data.frame(v1=c(14,26,35), v2=c(21,30,47))
)
cat <- list(
cat_saved = data.frame(v1=c(5,13,26), v2=c(11,21,42)),
cat_droped = data.frame(v1=c(18,29,39), v2=c(25,36,48))
)
I tried some existing solutions, but I am not satisfied with them. How can I get my desired output easily? Is there an efficient, compatible solution for this? Thanks a lot.
You could combine the two lists, then split on the common part of the names. split() is not the most efficient function ever, but the code for this is very simple.
x <- c(savedList, dropedList)
split(x, sub("_.*", "", names(x)))
This gives the following:
List of 3
$ bar:List of 2
..$ bar_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 7 19 31
.. ..$ v2: num [1:3] 16 28 41
..$ bar_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 14 26 35
.. ..$ v2: num [1:3] 21 30 47
$ cat:List of 2
..$ cat_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 5 13 26
.. ..$ v2: num [1:3] 11 21 42
..$ cat_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 18 29 39
.. ..$ v2: num [1:3] 25 36 48
$ foo:List of 2
..$ foo_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 1 6 16
.. ..$ v2: num [1:3] 4 12 23
..$ foo_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 4 9 20
.. ..$ v2: num [1:3] 7 15 29
You can use mapply for this; it iterates through both lists in parallel and makes a list from each pair of items:
res <- mapply(list, savedList, dropedList, SIMPLIFY = FALSE)
str(res)
List of 3
$ foo_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 1 6 16
.. ..$ v2: num [1:3] 4 12 23
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 4 9 20
.. ..$ v2: num [1:3] 7 15 29
$ bar_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 7 19 31
.. ..$ v2: num [1:3] 16 28 41
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 14 26 35
.. ..$ v2: num [1:3] 21 30 47
$ cat_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 5 13 26
.. ..$ v2: num [1:3] 11 21 42
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 18 29 39
.. ..$ v2: num [1:3] 25 36 48
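Note that mapply() pairs elements by position, and the result keeps only the names of savedList, while the expected output is keyed by prefix with named inner elements. A base-R sketch that aligns the two lists by the part of the name before the first underscore (the same idea as the split() answer) and names everything as in the expected output; the helper prefix() is hypothetical:

```r
savedList <- list(
  foo_saved = data.frame(v1 = c(1, 6, 16), v2 = c(4, 12, 23)),
  bar_saved = data.frame(v1 = c(7, 19, 31), v2 = c(16, 28, 41)),
  cat_saved = data.frame(v1 = c(5, 13, 26), v2 = c(11, 21, 42))
)
dropedList <- list(
  foo_droped = data.frame(v1 = c(4, 9, 20), v2 = c(7, 15, 29)),
  bar_droped = data.frame(v1 = c(14, 26, 35), v2 = c(21, 30, 47)),
  cat_droped = data.frame(v1 = c(18, 29, 39), v2 = c(25, 36, 48))
)

# Hypothetical helper: the part of each name before the first underscore
prefix <- function(x) sub("_.*", "", names(x))

# Reorder dropedList so element i has the same prefix as savedList[[i]]
droped_aligned <- dropedList[match(prefix(savedList), prefix(dropedList))]

# Pair the aligned elements and keep their original names
res <- Map(function(s, d, s_name, d_name) setNames(list(s, d), c(s_name, d_name)),
           savedList, droped_aligned, names(savedList), names(droped_aligned))
names(res) <- prefix(savedList)

str(res$foo)
```

res$foo, res$bar and res$cat then correspond to the expected foo, bar and cat, and matching on the prefix means the two input lists do not need to be in the same order.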
