Here's some simple code using dplyr and tidyr to group and spread data from the mtcars data set.
library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
mtcars.df <- mtcars %>%
  group_by(disp, cyl) %>%
  summarise(Qty = n())
mtcars.spread <- mtcars.df %>%
  spread(cyl, Qty)
str(mtcars.spread)
When you look at the structure of the 'mtcars.spread' tibble, you'll notice that the '4' and '6' cylinder variables are listed as integers, while the '8' cylinder variable has all of this attached to it:
attr(*, "vars")= chr "disp"
attr(*, "drop")= logi TRUE
attr(*, "indices")=List of 27
Where did I go wrong? Am I supposed to ungroup along the way after using the group_by command?
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 27 obs. of 4 variables:
$ disp: num 71.1 75.7 78.7 79 95.1 ...
$ 4 : int 1 1 1 1 1 1 1 1 1 1 ...
$ 6 : int NA NA NA NA NA NA NA NA NA NA ...
$ 8 : int NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "vars")= chr "disp"
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 27
..$ : int 0
..$ : int 1
..$ : int 2
..$ : int 3
..$ : int 4
..$ : int 5
..$ : int 6
..$ : int 7
..$ : int 8
..$ : int 9
..$ : int 10
..$ : int 11
..$ : int 12
..$ : int 13
..$ : int 14
..$ : int 15
..$ : int 16
..$ : int 17
..$ : int 18
..$ : int 19
..$ : int 20
..$ : int 21
..$ : int 22
..$ : int 23
..$ : int 24
..$ : int 25
..$ : int 26
- attr(*, "group_sizes")= int 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "biggest_group_size")= int 1
- attr(*, "labels")='data.frame': 27 obs. of 1 variable:
..$ disp: num 71.1 75.7 78.7 79 95.1 ...
..- attr(*, "vars")= chr "disp"
..- attr(*, "drop")= logi TRUE
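For what it's worth, the attributes in that output appear to belong to the data frame as a whole (note the 'grouped_df' class on the first line), not to the '8' column; str() just prints them after the last column. summarise() peels off only the last grouping variable, so the result is still grouped by disp. A minimal sketch, assuming dplyr and tidyr are installed, that ungroups before spreading:

```r
library(dplyr)
library(tidyr)

# summarise() drops only the last grouping level (cyl),
# so mtcars.df is still grouped by disp at this point.
mtcars.df <- mtcars %>%
  group_by(disp, cyl) %>%
  summarise(Qty = n())
still_grouped <- inherits(mtcars.df, "grouped_df")

# ungroup() removes the remaining grouping before reshaping.
mtcars.spread <- mtcars.df %>%
  ungroup() %>%
  spread(cyl, Qty)

str(mtcars.spread)
```

With the grouping removed, str() should no longer list the "vars", "indices" and related attributes after the last column.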
I am using the ConsensusClusterPlus package in R to cluster my omic data. I want to use my clusters for regression. Is there a way to create composite scores if, say, I reduce 1000 genes to 7 clusters and use those 7 clusters for regression?
I tried to look at the structure of the cluster result in R.
results = ConsensusClusterPlus(d1,maxK=maxK,reps=1000,pItem=0.8,pFeature=1, title=title,clusterAlg="hc",distance="pearson",seed=1262118388.71279,plot="png")
icl = calcICL(results,title=title,plot="png")
str(results[[7]])
List of 5
$ consensusMatrix: num [1:40, 1:40] 1 0.689 0.976 1 1 ...
$ consensusTree :List of 7
..$ merge : int [1:39, 1:2] -1 -5 -7 -8 -9 -10 -11 -12 -13 -14 ...
..$ height : num [1:39] 0 0 0 0 0 0 0 0 0 0 ...
..$ order : int [1:40] 40 34 35 28 6 32 22 18 21 19 ...
..$ labels : NULL
..$ method : chr "average"
..$ call : language hclust(d = as.dist(1 - fm), method = finalLinkage)
..$ dist.method: NULL
..- attr(*, "class")= chr "hclust"
$ consensusClass : Named int [1:40] 1 1 1 1 1 2 1 1 1 1 ...
..- attr(*, "names")= chr [1:40] "CAR 12:0" "CAR 12:1" "CAR 13:0" "CAR 14:0" ...
$ ml : num [1:40, 1:40] 1 0.689 0.976 1 1 ...
$ clrs :List of 3
..$ : chr [1:40] "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" ...
..$ : num 8
..$ : chr [1:7] "#A6CEE3" "#FB9A99" "#FF7F00" "#FDBF6F" ...
How can I find composite scores?
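One possible approach, sketched here with made-up data: take the cluster label of each clustered item from consensusClass and average the items within each cluster, giving one score per cluster for every observation. This assumes the clustered items are the columns of d1 (the names in consensusClass, such as "CAR 12:0", suggest that) and that a simple mean is an acceptable composite; the matrix and labels below are hypothetical stand-ins, not the real data.

```r
set.seed(1)

# Hypothetical stand-in for d1: 10 observations (rows) x 6 clustered items (columns).
d1 <- matrix(rnorm(60), nrow = 10,
             dimnames = list(paste0("obs", 1:10), paste0("item", 1:6)))

# Hypothetical stand-in for results[[k]]$consensusClass: a named integer vector.
cls <- c(item1 = 1L, item2 = 1L, item3 = 2L, item4 = 2L, item5 = 3L, item6 = 3L)

# Align the labels with the column order, then average the columns within each cluster.
cls <- cls[colnames(d1)]
composite <- sapply(split(colnames(d1), cls),
                    function(cols) rowMeans(d1[, cols, drop = FALSE]))
colnames(composite) <- paste0("cluster", colnames(composite))

head(composite)
```

The columns of composite could then serve as predictors in a regression. Other summaries, such as the first principal component of each cluster, would slot into the same place as rowMeans().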
I'm unable to call plot() on a randomForest object that was loaded from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
See output of str(rf)
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
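One thing that stands out in the str() output above: the list has no mse element. As far as I recall, the plot method for a regression forest draws the per-tree mse component, so a missing mse would reach the plotting code as NULL. The sketch below does not use the real object or the randomForest package; rf_like is a hypothetical plain list that only mimics the missing component, and the "expected" names are an assumption about what a regression fit usually carries.

```r
# Hypothetical mock of the object from the question: a regression fit without `mse`.
rf_like <- list(type = "regression", ntree = 60, predicted = c(-1141, -1767))

# Components a regression fit would usually carry for plotting (assumption).
expected <- c("mse", "rsq")
missing_parts <- setdiff(expected, names(rf_like))
missing_parts

# Turning NULL into a matrix fails; capture the message instead of stopping.
msg <- tryCatch(as.matrix(rf_like$mse), error = function(e) conditionMessage(e))
msg
```

If the real rf shows the same gap (is.null(rf$mse) is TRUE), the saved object was probably stripped or modified before it was written to the RData file, and refitting or re-saving the complete object would be the thing to try.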
I was trying to run CreateTableOne from the tableone package on my dataset, m.dataaaaaa, using the following code:
CreateTableOne(vars = Vars, strata = "ejecfraclesstha40_gps", factorVars = Catvars, data = m.dataaaaaa, test = TRUE)
But I got the following error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
In addition: Warning message:
In ModuleReturnVarsExist(vars, data) :
  The data frame does not have: ejecfraclesstha40 Dropped
Part of the structure of the data is shown below, as it is a big database:
str(m.dataaaaaa)
Classes ‘data.table’ and 'data.frame': 194 obs. of 203 variables:
$ ejecfraclesstha40_gps : num 1 0 1 0 0 0 1 1 1 0 ...
$ Serial.ID : num 2 3 4 7 10 14 17 20 23 24 ...
..- attr(*, "format.spss")= chr "F4.0"
$ Serial.ID_matched.EF.cohort.Ivan1.to.2 : num 2 NA 4 NA NA NA 17 20 23 NA ...
..- attr(*, "format.spss")= chr "F8.0"
$ ps..matched.EF.cohort.Ivan1.to.2 : num 0.138 NA 0.19 NA NA NA 0.176 0.286 0.152 NA ...
..- attr(*, "format.spss")= chr "F8.3"
$ psweight1.to.2 : num 1 NA 1 NA NA NA 1 1 1 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_ID1.to.2 : num 483 NA 763 NA NA NA 180 176 239 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_cases_in_control1.to.2 : num 2 NA 2 NA NA NA 2 2 2 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ ejecfrac_4gps : num 1 3 1 3 3 3 1 1 1 3 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<35%" "EF=35 - <40%" "EF=40 - <=50" "EF>50%"
$ ejecfrac_4gps30 : num 1 4 1 3 3 4 1 1 1 4 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<=30%" "EF>30 - 39%" "EF=40 - 49%" "EF>=50%"
$ renisch : num 29 31 23 18 48 19 10 29 17 13 ...
..- attr(*, "label")= chr "renal + visceral ischemic time"
..- attr(*, "format.spss")= chr "F3.0"
..- attr(*, "display_width")= int 12
$ totxct : num 46 31 55 46 48 19 54 29 17 37 ...
..- attr(*, "label")= chr "total cross-clamp time"
..- attr(*, "format.spss")= chr "F4.0"
..- attr(*, "display_width")= int 12
The original database was read from SPSS into R.
My main problem is with this error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
Any advice will be greatly appreciated.
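A quick diagnostic that may help narrow this down, sketched with made-up data: "duplicate subscripts for columns" usually means the same column name is being assigned more than once, and the warning says one requested name is not in the data at all. The Vars vector and the small data frame below are hypothetical; only the column names are borrowed from the str() output above.

```r
# Hypothetical miniature of the data; only the column names echo the question.
dat <- data.frame(ejecfraclesstha40_gps = c(1, 0, 1),
                  renisch = c(29, 31, 23),
                  totxct = c(46, 31, 55))

# Hypothetical variable list with one repeated name and one name not in the data.
Vars <- c("renisch", "totxct", "renisch", "ejecfraclesstha40")

dup_vars     <- unique(Vars[duplicated(Vars)])              # names listed more than once
missing_vars <- setdiff(Vars, names(dat))                   # names the data frame lacks
dup_cols     <- unique(names(dat)[duplicated(names(dat))])  # repeated column names

# A cleaned list: each name once, and only names that exist in the data.
clean_vars <- intersect(unique(Vars), names(dat))
```

Running the same three checks on the real Vars, Catvars and names(m.dataaaaaa) should show whether a repeated or misspelled name is behind the error. Converting the data.table with as.data.frame() before the call is another thing worth trying, though that is a guess.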
I have two lists of data.frame objects that are the output of a function I implemented. I want to build new lists in which the matching data.frame objects from the two lists are put together. I tried several ways to get my expected output, but none of them is very elegant. Does anyone know a neat, efficient way to do this?
This is a mini example:
savedList <- list(
  foo_saved = data.frame(v1=c(1,6,16), v2=c(4,12,23)),
  bar_saved = data.frame(v1=c(7,19,31), v2=c(16,28,41)),
  cat_saved = data.frame(v1=c(5,13,26), v2=c(11,21,42))
)
dropedList <- list(
  foo_droped = data.frame(v1=c(4,9,20), v2=c(7,15,29)),
  bar_droped = data.frame(v1=c(14,26,35), v2=c(21,30,47)),
  cat_droped = data.frame(v1=c(18,29,39), v2=c(25,36,48))
)
This is my expected output:
foo <- list(
  foo_saved = data.frame(v1=c(1,6,16), v2=c(4,12,23)),
  foo_droped = data.frame(v1=c(4,9,20), v2=c(7,15,29))
)
bar <- list(
  bar_saved = data.frame(v1=c(7,19,31), v2=c(16,28,41)),
  bar_droped = data.frame(v1=c(14,26,35), v2=c(21,30,47))
)
cat <- list(
  cat_saved = data.frame(v1=c(5,13,26), v2=c(11,21,42)),
  cat_droped = data.frame(v1=c(18,29,39), v2=c(25,36,48))
)
I tried some existing solutions, but I am not satisfied with them. How can I get my desired output easily? Is there an efficient solution for this? Thanks a lot.
You could combine the two lists, then split on the common part of the names. split() is not the most efficient function ever, but the code for this is very simple.
x <- c(savedList, dropedList)
split(x, sub("_.*", "", names(x)))
This gives the following:
List of 3
$ bar:List of 2
..$ bar_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 7 19 31
.. ..$ v2: num [1:3] 16 28 41
..$ bar_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 14 26 35
.. ..$ v2: num [1:3] 21 30 47
$ cat:List of 2
..$ cat_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 5 13 26
.. ..$ v2: num [1:3] 11 21 42
..$ cat_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 18 29 39
.. ..$ v2: num [1:3] 25 36 48
$ foo:List of 2
..$ foo_saved :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 1 6 16
.. ..$ v2: num [1:3] 4 12 23
..$ foo_droped:'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 4 9 20
.. ..$ v2: num [1:3] 7 15 29
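If separate objects named foo, bar and cat are really wanted, as in the expected output, the split result can be unpacked with list2env(). A minimal sketch, reusing the lists from the question; the environment name out is just an illustration, and a fresh environment is used so that nothing in the workspace (for example an object called cat) gets overwritten:

```r
savedList <- list(
  foo_saved = data.frame(v1 = c(1, 6, 16), v2 = c(4, 12, 23)),
  bar_saved = data.frame(v1 = c(7, 19, 31), v2 = c(16, 28, 41)),
  cat_saved = data.frame(v1 = c(5, 13, 26), v2 = c(11, 21, 42))
)
dropedList <- list(
  foo_droped = data.frame(v1 = c(4, 9, 20), v2 = c(7, 15, 29)),
  bar_droped = data.frame(v1 = c(14, 26, 35), v2 = c(21, 30, 47)),
  cat_droped = data.frame(v1 = c(18, 29, 39), v2 = c(25, 36, 48))
)

# Combine, then group by the part of each name before the underscore.
x <- c(savedList, dropedList)
grouped <- split(x, sub("_.*", "", names(x)))

# Unpack each group into its own object inside a fresh environment.
out <- new.env()
list2env(grouped, envir = out)

names(out$foo)
```

Passing envir = globalenv() instead would create foo, bar and cat directly in the workspace.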
You can use mapply for this. It iterates through both lists in parallel and makes a list from each pair of items:
res <- mapply(list, savedList, dropedList, SIMPLIFY = FALSE)
str(res)
List of 3
$ foo_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 1 6 16
.. ..$ v2: num [1:3] 4 12 23
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 4 9 20
.. ..$ v2: num [1:3] 7 15 29
$ bar_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 7 19 31
.. ..$ v2: num [1:3] 16 28 41
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 14 26 35
.. ..$ v2: num [1:3] 21 30 47
$ cat_saved:List of 2
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 5 13 26
.. ..$ v2: num [1:3] 11 21 42
..$ :'data.frame': 3 obs. of 2 variables:
.. ..$ v1: num [1:3] 18 29 39
.. ..$ v2: num [1:3] 25 36 48
I have a nested list, density_subset_list.
It contains 6 lists, each of which contains another 3 lists of density data, e.g.
dsl <- list(A = list(all_density, p1_density, p2_density), B = list(all_density, p1_density, p2_density))
etc.
I would like the overall y range.
Here is my attempt.
for (i in 1:length(INTlist)){
  y <- unlist(lapply(density_subset_list[[i]], function(d) range(d$y)))
  yall <- c(y, yall)
}
range(yall)
It doesn't seem to be working.
Any help is appreciated
Thanks
str(density_subset_list)
List of 6
$ STRexp :List of 3
..$ all:List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 2.17e-14 3.64e-14 5.99e-14 9.64e-14 1.62e-13 ...
.. ..$ bw : num 0.047
.. ..$ n : int 1127
.. ..$ call : language density.default(x = x$corr, from = min(Sa14_scoreCorr$corr), to = max(Sa14_scoreCorr$corr), na.rm = T)
.. ..$ data.name: chr "x$corr"
.. ..$ has.na : logi FALSE
.. ..- attr(*, "class")= chr "density"
..$ Kan:List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 2.60e-08 3.42e-08 4.50e-08 5.88e-08 7.62e-08 ...
.. ..$ bw : num 0.0649
.. ..$ n : int 287
.. ..$ call : language density.default(x = x$corr, from = min(Sa14_scoreCorr$corr), to = max(Sa14_scoreCorr$corr), na.rm = T)
.. ..$ data.name: chr "x$corr"
.. ..$ has.na : logi FALSE
.. ..- attr(*, "class")= chr "density"
..$ Cm :List of 7
.. ..$ x : num [1:512] -0.712 -0.708 -0.705 -0.702 -0.698 ...
.. ..$ y : num [1:512] 3.88e-08 4.79e-08 5.94e-08 7.38e-08 9.10e-08 ...
You're very close, but a for loop isn't needed:
set.seed(100)
dat <- list(STRexp=list(list(), list(), list()),
            all=list(y=sample(100, 50)),
            Kan=list(y=sample(10,3)),
            Cm=list(y=sample(1000, 100)))
str(dat)
## List of 4
## $ STRexp:List of 3
## ..$ : list()
## ..$ : list()
## ..$ : list()
## $ all :List of 1
## ..$ y: int [1:50] 31 26 55 6 45 46 77 35 51 16 ...
## $ Kan :List of 1
## ..$ y: int [1:3] 4 2 9
## $ Cm :List of 1
## ..$ y: int [1:100] 275 591 253 124 229 595 211 461 642 952 ...
# this will get the range of each "y" then get the overall range
range(unlist(lapply(names(dat)[-1], function(x) range(dat[[x]]$y))))
## [1] 2 991
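The example above uses a flattened structure. For the two-level nesting shown in the question (a list of groups, each holding several density objects), the same idea can be sketched with one nested lapply/sapply. The data here are simulated, and the second group name (EXPexp) and the helper make_group() are hypothetical:

```r
set.seed(1)

# Hypothetical helper: one group of three density objects, as in the question.
make_group <- function() {
  list(all = density(rnorm(100)),
       Kan = density(rnorm(30)),
       Cm  = density(rnorm(50)))
}

# Two groups stand in for the six in the real density_subset_list.
density_subset_list <- list(STRexp = make_group(), EXPexp = make_group())

# For each group, take the y range of every density; then take the range of all of them.
y_ranges <- lapply(density_subset_list,
                   function(group) sapply(group, function(d) range(d$y)))
overall <- range(unlist(y_ranges))
overall
```

Here y_ranges keeps the per-density minimum and maximum (one 2 x 3 matrix per group), which is handy for checking, and overall is the single range that could be passed as ylim when plotting all curves together.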