Error 'duplicate subscripts for columns' on using CreateTableOne - r

I am trying to build a summary table with CreateTableOne from the tableone package for my dataset, m.dataaaaaa, using the following code:
CreateTableOne(vars = Vars, strata = "ejecfraclesstha40_gps", factorVars = Catvars, data = m.dataaaaaa, test = TRUE)
But I get the following error and warning:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
In addition: Warning message:
In ModuleReturnVarsExist(vars, data) : The data frame does not have: ejecfraclesstha40 Dropped
The data set is large, so only part of its structure is shown below:
str(m.dataaaaaa)
Classes ‘data.table’ and 'data.frame': 194 obs. of 203 variables:
$ ejecfraclesstha40_gps : num 1 0 1 0 0 0 1 1 1 0 ...
$ Serial.ID : num 2 3 4 7 10 14 17 20 23 24 ...
..- attr(*, "format.spss")= chr "F4.0"
$ Serial.ID_matched.EF.cohort.Ivan1.to.2 : num 2 NA 4 NA NA NA 17 20 23 NA ...
..- attr(*, "format.spss")= chr "F8.0"
$ ps..matched.EF.cohort.Ivan1.to.2 : num 0.138 NA 0.19 NA NA NA 0.176 0.286 0.152 NA ...
..- attr(*, "format.spss")= chr "F8.3"
$ psweight1.to.2 : num 1 NA 1 NA NA NA 1 1 1 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_ID1.to.2 : num 483 NA 763 NA NA NA 180 176 239 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_cases_in_control1.to.2 : num 2 NA 2 NA NA NA 2 2 2 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ ejecfrac_4gps : num 1 3 1 3 3 3 1 1 1 3 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<35%" "EF=35 - <40%" "EF=40 - <=50" "EF>50%"
$ ejecfrac_4gps30 : num 1 4 1 3 3 4 1 1 1 4 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<=30%" "EF>30 - 39%" "EF=40 - 49%" "EF>=50%"
$ renisch : num 29 31 23 18 48 19 10 29 17 13 ...
..- attr(*, "label")= chr "renal + visceral ischemic time"
..- attr(*, "format.spss")= chr "F3.0"
..- attr(*, "display_width")= int 12
$ totxct : num 46 31 55 46 48 19 54 29 17 37 ...
..- attr(*, "label")= chr "total cross-clamp time"
..- attr(*, "format.spss")= chr "F4.0"
..- attr(*, "display_width")= int 12
The original database was read from SPSS into R.
My main problem is this error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
Any advice will be greatly appreciated.
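The warning already hints at one likely cause: a name in Vars that is not a column of the data. The error itself is what base R raises when the same column name appears twice on the left-hand side of a data frame assignment, so a name repeated in Vars or Catvars is another candidate. A minimal sketch of both checks, assuming toy stand-ins (dat, Vars and Catvars below are hypothetical, since the real vectors are not shown in the question):

```r
# Toy stand-ins: the real data, Vars and Catvars are not shown in the question.
dat <- data.frame(
  grp = c(1, 0, 1),
  age = c(60, 71, 58),
  sex = c("m", "f", "m")
)
Vars    <- c("age", "sex", "sex", "ejecfrac")  # one duplicate, one missing name
Catvars <- c("sex", "sex")                     # duplicated name

# Names requested but absent from the data (the kind of thing the warning reports)
missing_vars <- setdiff(Vars, names(dat))

# Names listed more than once
dup_vars    <- unique(Vars[duplicated(Vars)])
dup_catvars <- unique(Catvars[duplicated(Catvars)])

# Assigning to the same column twice triggers the base R error;
# the message is captured instead of stopping the script
msg <- tryCatch(
  {
    dat[Catvars] <- lapply(dat[Catvars], factor)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(missing_vars)
print(dup_vars)
print(msg)
```

If either check returns names, cleaning the vectors first, for example with intersect(unique(Vars), names(dat)), may be enough to get past the error.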

Related

R Unable to plot loaded randomForest object

I'm unable to call plot() on a randomForest object after loading it from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
Here is the output of str(rf):
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
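One thing the str() output suggests checking: a regression forest fitted by randomForest() normally carries mse and rsq components, and neither appears among the 15 elements listed. As far as I know, the plot method for a regression forest draws the mse component, so a missing one would be passed on as NULL. A minimal sketch, using a hypothetical mock list (rf_like) in place of the real object, of how to check for this and of how base R reacts when NULL is coerced to a matrix:

```r
# Hypothetical mock of the loaded object: like str(rf), it has no 'mse' element.
rf_like <- list(type = "regression", ntree = 60, predicted = c(-1141, -1767))

# Components a regression forest is expected to carry for plotting
expected <- c("mse", "rsq")
absent   <- setdiff(expected, names(rf_like))
print(absent)

# Coercing NULL to a matrix fails in base R; capture the message instead of stopping
msg <- tryCatch(as.matrix(rf_like$mse), error = function(e) conditionMessage(e))
print(msg)
```

If the components really are absent from the saved object, refitting the model or re-saving the complete object seems the safer route than patching the list by hand.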

prcomp error: "undefined columns selected"

I am trying to run a PCA on a data set of labeled numeric data. I want to include only certain columns (6 to 78) in the PCA, but I get an error (a syntax problem?).
Here's the code:
cytokines.pca <- prcomp(PICHCytokines[,c(6:78)], center = TRUE, scale. = TRUE)
summary(cytokines.pca)
The error is:
Error in [.data.frame(data, , c(6:78)) : undefined columns selected
Here's the structure of my data frame:
str(PICHCytokines)
'data.frame': 106 obs. of 69 variables:
$ Record.ID : Factor w/ 106 levels "FA001","FA007",..: 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "label")= chr "Record ID"
$ Event.Name : Factor w/ 2 levels "Enrollment and Admission",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Event Name"
$ Time.since.trauma: 'labelled' num 0.717 7.717 1.383 0.817 2.85 ...
..- attr(*, "label")= chr "Time since trauma"
$ Batch.Number : 'labelled' int 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Batch Number"
$ Plate.Number : 'labelled' int 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Plate Number"
$ FASL.MFI : 'labelled' num 748 295 256 333 275 ...
..- attr(*, "label")= chr "FASL MFI"
$ TGFA.MFI : 'labelled' num 122 64.2 96 126 94.8 ...
..- attr(*, "label")= chr "TGFA MFI"
$ MIP1A.MFI : 'labelled' num 1611 142 158 339 168 ...
..- attr(*, "label")= chr "MIP1A MFI"
$ IL27.MFI : 'labelled' num 139.2 40 63 52.5 63.2 ...
..- attr(*, "label")= chr "IL27 MFI"
$ IL1B.MFI : 'labelled' num 68 38.2 77.5 46 70.8 ...
..- attr(*, "label")= chr "IL1B MFI"
$ IL2.MFI : 'labelled' num 159 61.5 120.8 79.5 117.2 ...
..- attr(*, "label")= chr "IL2 MFI"
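The str() output reports 69 variables, while the call asks for columns 6 to 78, and indexing past the last column is what raises "undefined columns selected". A minimal sketch on a toy data frame (toy and wanted are hypothetical names) that reproduces the message and then restricts the requested range to columns that exist:

```r
# Toy data frame with 5 columns standing in for the 69-column original
toy <- data.frame(id = 1:4, a = c(1.5, 2, 3.5, 4), b = c(2, 1, 4, 3),
                  c = c(9, 7, 8, 6), d = c(0.1, 0.4, 0.2, 0.3))

wanted <- 3:8  # reaches past ncol(toy), like 6:78 on a 69-column frame

# Capture the error message instead of stopping
msg <- tryCatch(toy[, wanted], error = function(e) conditionMessage(e))
print(msg)

# Keep only the indices that actually exist, then run the PCA
valid <- wanted[wanted <= ncol(toy)]
pca <- prcomp(toy[, valid], center = TRUE, scale. = TRUE)
print(valid)
summary(pca)
```

Checking ncol() against the largest requested index before subsetting is a cheap guard; selecting columns by name is another option when the layout of the data may change.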

dplyr Group - Do I need to then Ungroup

Here's some simple code using dplyr (with spread() from tidyr) to group and spread data from the mtcars data set.
library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
mtcars.df <- mtcars %>%
  group_by(disp, cyl) %>%
  summarise(Qty = n())
mtcars.spread <- mtcars.df %>%
  spread(cyl, Qty)
str(mtcars.spread)
When you look at the structure of the 'mtcars.spread' tibble, you'll notice that the '4' and '6' cylinder variables are listed as integers, while the '8' cylinder variable seems to have all this babble
attr(*, "vars")= chr "disp"
attr(*, "drop")= logi TRUE
attr(*, "indices")=List of 27
attached to it. Where did I go wrong? Am I supposed to ungroup along the way after using the group_by command?
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 27 obs. of 4 variables:
$ disp: num 71.1 75.7 78.7 79 95.1 ...
$ 4 : int 1 1 1 1 1 1 1 1 1 1 ...
$ 6 : int NA NA NA NA NA NA NA NA NA NA ...
$ 8 : int NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "vars")= chr "disp"
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 27
..$ : int 0
..$ : int 1
..$ : int 2
..$ : int 3
..$ : int 4
..$ : int 5
..$ : int 6
..$ : int 7
..$ : int 8
..$ : int 9
..$ : int 10
..$ : int 11
..$ : int 12
..$ : int 13
..$ : int 14
..$ : int 15
..$ : int 16
..$ : int 17
..$ : int 18
..$ : int 19
..$ : int 20
..$ : int 21
..$ : int 22
..$ : int 23
..$ : int 24
..$ : int 25
..$ : int 26
- attr(*, "group_sizes")= int 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "biggest_group_size")= int 1
- attr(*, "labels")='data.frame': 27 obs. of 1 variable:
..$ disp: num 71.1 75.7 78.7 79 95.1 ...
..- attr(*, "vars")= chr "disp"
..- attr(*, "drop")= logi TRUE
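Two things may help in reading this output. The attr lines printed after the last column belong to the data frame as a whole, not to the '8' column; str() simply lists them at the end. And summarise() peels off only the last grouping variable, so the result here is still grouped by disp. A minimal sketch, assuming only dplyr is attached (counts and flat are illustrative names):

```r
library(dplyr)

# Same summary as in the question; the result is still grouped by disp
counts <- mtcars %>%
  group_by(disp, cyl) %>%
  summarise(Qty = n())

group_vars(counts)   # remaining grouping variables

# ungroup() drops the grouping metadata and returns a plain tibble
flat <- ungroup(counts)
group_vars(flat)
is_grouped_df(flat)
```

Calling ungroup() before spreading should give a result without the grouping attributes, although leaving the data grouped is harmless if later steps do not depend on it.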

aggregate function only returns one value with no function applied

I have some data that looks something like this:
# Date Time Temp Intensity Coupler Attached Host Connected Stopped End Of File
1 05/28/15 06:00:00.0 20.329 893.4
2 05/28/15 07:00:00.0 21.76 5 511.1
3 05/28/15 08:00:00.0 36.946 79 911.6
4 05/28/15 09:00:00.0 40.761 60 622.6
5 05/28/15 10:00:00.0 41.225 24 800.2
6 05/28/15 11:00:00.0 29.853 14 466.8
7 05/28/15 12:00:00.0 26.195 5 511.1
8 05/28/15 13:00:00.0 28.06 9 300.1
9 05/28/15 14:00:00.0 27.468 6 544.5
10 05/28/15 15:00:00.0 26.879 4 133.4
11 05/28/15 16:00:00.0 26 2 238.9
12 05/28/15 17:00:00.0 25.513 1 173.3
13 05/28/15 18:00:00.0 24.738 75.3
14 05/28/15 19:00:00.0 24.062 0
15 05/28/15 20:00:00.0 23.773 0
16 05/28/15 21:00:00.0 23.292 0
17 05/28/15 22:00:00.0 22.812 0
18 05/28/15 23:00:00.0 22.429 0
19 05/29/15 00:00:00.0 22.046 0
20 05/29/15 01:00:00.0 21.76 0
21 05/29/15 02:00:00.0 21.473 0
22 05/29/15 03:00:00.0 21.091 0
23 05/29/15 04:00:00.0 20.901 0
24 05/29/15 05:00:00.0 20.615 0
25 05/29/15 06:00:00.0 20.901 1 894.5
26 05/29/15 07:00:00.0 22.525 8 611.2
27 05/29/15 08:00:00.0 29.652 42 711.4
28 05/29/15 09:00:00.0 36.079 22 44.6
29 05/29/15 10:00:00.0 39.729 77 156.1
30 05/29/15 11:00:00.0 31.37 19 289
31 05/29/15 12:00:00.0 32.086 7 233.4
I am attempting to use the aggregate function to get average temperatures at each time point. I use this function:
aggregate(x=trap7u$Temp,by=list(trap7u$Time),FUN=mean)
This gives the following output:
Group.1 x
1 06:00:00 NA
R does not return any errors, just the single row above. I have tried casting the columns to different types and removing NAs, but the result is the same.
str(trap7u)
returns:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1770 obs. of 9 variables:
$ # : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date : chr "05/28/15" "05/28/15" "05/28/15" "05/28/15" ...
$ Time :Classes 'hms', 'difftime' atomic [1:1770] 21600 25200 28800 32400 36000 39600 43200 46800 50400 54000 ...
.. ..- attr(*, "units")= chr "secs"
$ Temp : num 20.3 21.8 36.9 40.8 41.2 ...
$ Intensity : num 893 5 79 60 24 ...
$ Coupler Attached: num NA 511 912 623 800 ...
$ Host Connected : chr NA NA NA NA ...
$ Stopped : chr NA NA NA NA ...
$ End Of File : chr NA NA NA NA ...
- attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 5 variables:
..$ row : int 1769
..$ col : chr "Coupler Attached"
..$ expected: chr "a double"
..$ actual : chr "Logged"
..$ file : chr "'~/Desktop/bioinformatic_work/HOBO_files_complete/hobo_files/2015-AUG-offload/trap7u_10733861_150809.csv'"
- attr(*, "spec")=List of 2
..$ cols :List of 9
.. ..$ # : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Date : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Time :List of 1
.. .. ..$ format: chr ""
.. .. ..- attr(*, "class")= chr "collector_time" "collector"
.. ..$ Temp : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Intensity : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Coupler Attached: list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Host Connected : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Stopped : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ End Of File : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"
What I want is the mean Temp value for each time. How can I accomplish this?
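One thing that may be worth trying, given that str() shows Time as an hms/difftime column: group on a plain numeric version of the time and pass na.rm = TRUE through to mean(). This is a sketch on made-up readings (toy is hypothetical), not a diagnosis of the original data:

```r
# Made-up readings: two time points (seconds since midnight), one missing Temp
toy <- data.frame(
  Time = as.difftime(c(21600, 25200, 21600, 25200), units = "secs"),
  Temp = c(20, 22, 21, NA)
)

# Group on the underlying seconds rather than on the difftime object itself;
# na.rm = TRUE is forwarded to mean()
res <- aggregate(
  x     = toy$Temp,
  by    = list(Time = as.numeric(toy$Time)),
  FUN   = mean,
  na.rm = TRUE
)
print(res)
```

The result has one row per distinct time, with the mean in column x. Grouping on as.character(Time) instead would keep the labels readable as clock times.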

How Can I Quickly Inspect Built-in Data Sets (PSA)?

One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data(), however, is frustrating because it provides no information about the structure of each data set.
How can I quickly view the structure of available data sets?
The following function may help:
dataStr <- function(fun = function(x) TRUE) {
  str(
    Filter(
      fun,
      Filter(
        Negate(is.null),
        # Names that cannot be found come back as NULL and are dropped
        mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
      )
    )
  )
}
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...
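The key step inside dataStr is mget() with a NULL fallback: names that cannot be retrieved come back as NULL and are then filtered out. A small sketch of that step alone, using one real data set name and one deliberately nonexistent name (no_such_dataset_xyz is made up):

```r
# One real built-in data set and one name that does not exist
candidates <- c("BOD", "no_such_dataset_xyz")

# Look each name up along the search path; missing names yield NULL
found <- mget(candidates, envir = globalenv(), inherits = TRUE,
              ifnotfound = list(NULL))

# Drop the NULL placeholders, as dataStr does with Filter(Negate(is.null), ...)
found <- Filter(Negate(is.null), found)
names(found)
str(found)
```

Only BOD survives the filter, which is why dataStr can be run over everything data() lists without stopping on entries it cannot load directly.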

Resources