I am trying to run a PCA on a matrix of labeled numeric data. I want to include only certain columns (6-78) in the PCA, but I get an error (a syntax problem?).
Here's the code:
cytokines.pca <- prcomp(PICHCytokines[,c(6:78)], center = TRUE, scale. = TRUE)
summary(cytokines.pca)
The error is:
Error in [.data.frame(data, , c(6:78)) : undefined columns selected
Here's the structure of my data frame:
str(PICHCytokines)
'data.frame': 106 obs. of 69 variables:
$ Record.ID : Factor w/ 106 levels "FA001","FA007",..: 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "label")= chr "Record ID"
$ Event.Name : Factor w/ 2 levels "Enrollment and Admission",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Event Name"
$ Time.since.trauma: 'labelled' num 0.717 7.717 1.383 0.817 2.85 ...
..- attr(*, "label")= chr "Time since trauma"
$ Batch.Number : 'labelled' int 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Batch Number"
$ Plate.Number : 'labelled' int 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "label")= chr "Plate Number"
$ FASL.MFI : 'labelled' num 748 295 256 333 275 ...
..- attr(*, "label")= chr "FASL MFI"
$ TGFA.MFI : 'labelled' num 122 64.2 96 126 94.8 ...
..- attr(*, "label")= chr "TGFA MFI"
$ MIP1A.MFI : 'labelled' num 1611 142 158 339 168 ...
..- attr(*, "label")= chr "MIP1A MFI"
$ IL27.MFI : 'labelled' num 139.2 40 63 52.5 63.2 ...
..- attr(*, "label")= chr "IL27 MFI"
$ IL1B.MFI : 'labelled' num 68 38.2 77.5 46 70.8 ...
..- attr(*, "label")= chr "IL1B MFI"
$ IL2.MFI : 'labelled' num 159 61.5 120.8 79.5 117.2 ...
..- attr(*, "label")= chr "IL2 MFI"
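One thing the str() output above suggests: the data frame has 69 variables, so the range 6:78 asks for columns that do not exist, which on its own is enough to trigger "undefined columns selected". Below is a minimal sketch of selecting the columns by name instead of by position. It assumes the cytokine columns are the ones whose names end in .MFI (a guess from the columns shown), and toy is a hypothetical stand-in data frame, not the real PICHCytokines.

```r
# Hypothetical stand-in for PICHCytokines; only the column layout matters here
set.seed(1)
toy <- data.frame(
  Record.ID    = factor(sprintf("FA%03d", 1:10)),
  Batch.Number = 1L,
  FASL.MFI     = rnorm(10, mean = 300, sd = 50),
  TGFA.MFI     = rnorm(10, mean = 100, sd = 20),
  IL2.MFI      = rnorm(10, mean = 120, sd = 30)
)

# Asking for columns beyond ncol(toy) reproduces the error from the question
res <- try(toy[, 3:8], silent = TRUE)
inherits(res, "try-error")

# Select by name pattern instead of by position (assumed naming convention)
mfi_cols <- grep("\\.MFI$", names(toy), value = TRUE)

toy.pca <- prcomp(toy[, mfi_cols], center = TRUE, scale. = TRUE)
summary(toy.pca)
```

If positions are preferred, 6:ncol(PICHCytokines) would at least stay within the columns that exist.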
This is my first time performing KDE in R for anomaly detection on data with more than 5 variables.
As far as I know, KDE can be applied to multidimensional data, but I couldn't find examples that use data with more than 5 dimensions.
My data has 5 variables, 'age', 'trestbps', 'chol', 'thalach', and 'oldpeak', as shown below.
'data.frame': 176 obs. of 5 variables:
$ age : int 30 50 50 50 50 60 50 40 50 40 ...
$ trestbps: int 130 130 130 130 130 130 130 130 130 130 ...
$ chol : int 198 245 221 288 205 309 240 243 289 250 ...
$ thalach : int 130 166 164 159 184 131 154 152 124 179 ...
$ oldpeak : num 1.6 2.4 0 0.2 0 1.8 0.6 0 1 0 ...
I ran KDE on this data with the approach below, but I'm not sure whether the approach is correct or whether the result is valid.
evpts <- do.call(expand.grid, lapply(df3, quantile, prob = c(0.1,.25,.5,.75,.9)))
hat2 <- kde(df3, eval.points = evpts)
> str(hat2)
List of 9
$ x : num [1:176, 1:5] 30 50 50 50 50 60 50 40 50 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ eval.points:'data.frame': 3125 obs. of 5 variables:
..$ age : Named num [1:3125] 40 40 50 60 60 40 40 50 60 60 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
..$ trestbps: Named num [1:3125] 108 108 108 108 108 112 112 112 112 112 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ chol : Named num [1:3125] 194 194 194 194 194 194 194 194 194 194 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ thalach : Named num [1:3125] 114 114 114 114 114 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..$ oldpeak : Named num [1:3125] 0 0 0 0 0 0 0 0 0 0 ...
.. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:5] 5 5 5 5 5
.. .. ..- attr(*, "names")= chr [1:5] "age" "trestbps" "chol" "thalach" ...
.. ..$ dimnames:List of 5
.. .. ..$ age : chr [1:5] "age=40" "age=40" "age=50" "age=60" ...
.. .. ..$ trestbps: chr [1:5] "trestbps=108" "trestbps=112" "trestbps=120" "trestbps=128" ...
.. .. ..$ chol : chr [1:5] "chol=194.00" "chol=211.00" "chol=244.00" "chol=283.75" ...
.. .. ..$ thalach : chr [1:5] "thalach=113.50" "thalach=128.25" "thalach=150.00" "thalach=164.00" ...
.. .. ..$ oldpeak : chr [1:5] "oldpeak=0.0" "oldpeak=0.0" "oldpeak=0.8" "oldpeak=1.8" ...
$ estimate : Named num [1:3125] 5.64e-12 5.64e-12 2.85e-09 7.76e-10 7.76e-10 ...
..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
$ H : num [1:5, 1:5] 6.972 0.866 5.065 -6.541 0.189 ...
$ gridded : logi FALSE
$ binned : logi FALSE
$ names : chr [1:5] "age" "trestbps" "chol" "thalach" ...
$ w : num [1:176] 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "kde"
- attr(*, "class")= chr "kde"
If this is not the right approach, could you please help me find the correct one?
Thank you for your support.
I'm unable to plot a randomForest object with plot() (that is, plot.randomForest()) after loading it from an RData file.
library("randomForest")
load("rf.RData")
plot(rf)
I get the error:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
I get the same error when I call randomForest:::plot.randomForest(rf).
Other function calls on rf work just fine.
EDIT:
See output of str(rf)
str(rf)
List of 15
$ call : language randomForest(x = data[, match("feat1", names(data)):match("feat_n", names(data))], y = data[, match("my_y", n| __truncated__ ...
$ type : chr "regression"
$ predicted : Named num [1:723012] -1141 -1767 -1577 NA -1399 ...
..- attr(*, "names")= chr [1:723012] "1" "2" "3" "4" ...
$ oob.times : int [1:723012] 3 4 6 3 2 3 2 6 7 5 ...
$ importance : num [1:150, 1:2] 6172 928 6367 5754 1013 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
.. ..$ : chr [1:2] "%IncMSE" "IncNodePurity"
$ importanceSD : Named num [1:150] 400.9 96.7 500.1 428.9 194.8 ...
..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
$ localImportance: NULL
$ proximity : NULL
$ ntree : num 60
$ mtry : num 10
$ forest :List of 11
..$ ndbigtree : int [1:60] 392021 392219 392563 392845 393321 392853 392157 392709 393223 392679 ...
..$ nodestatus : num [1:393623, 1:60] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 ...
..$ leftDaughter : num [1:393623, 1:60] 2 4 6 8 10 12 14 16 18 20 ...
..$ rightDaughter: num [1:393623, 1:60] 3 5 7 9 11 13 15 17 19 21 ...
..$ nodepred : num [1:393623, 1:60] -8.15 -31.38 5.62 -59.87 -16.06 ...
..$ bestvar : num [1:393623, 1:60] 118 57 82 77 65 148 39 39 12 77 ...
..$ xbestsplit : num [1:393623, 1:60] 1.08e+02 -8.26e+08 -2.50 8.55e+03 1.20e+04 ...
..$ ncat : Named int [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "names")= chr [1:150] "feat1" "feat2" "feat3" "feat4" ...
..$ nrnodes : int 393623
..$ ntree : num 60
..$ xlevels :List of 150
.. ..$ feat1 : num 0
.. ..$ feat2 : num 0
.. ..$ feat3 : num 0
.. ..$ feat4 : num 0
.. ..$ featn : num 0
.. .. [list output truncated]
$ coefs : NULL
$ y : num [1:723012] -1885 -1918 -1585 -1838 -2035 ...
$ test : NULL
$ inbag : NULL
- attr(*, "class")= chr "randomForest"
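Worth noting in the str(rf) output above: the list has no mse (or rsq) component. As far as I can tell, plot.randomForest() for a regression forest plots x$mse against the number of trees, so a NULL there would fail when it gets converted to a matrix. The base-R sketch below reproduces the same message with a hypothetical stand-in list (rf_like is not a real randomForest object, and the link to plot.randomForest() is my assumption, not something confirmed by the question).

```r
# Hypothetical stand-in with the same gap as the rf object: no mse component
rf_like <- list(type = "regression", ntree = 60, mse = NULL)

# A quick check that could be run on the real object as is.null(rf$mse)
has_mse <- !is.null(rf_like$mse)
has_mse

# Converting NULL to a matrix raises the error seen in the question
res <- try(as.matrix(rf_like$mse), silent = TRUE)
msg <- conditionMessage(attr(res, "condition"))
msg
```

If is.null(rf$mse) is TRUE on the real object, the per-tree error was not stored in (or was stripped from) the saved forest, and there is nothing for plot() to draw.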
I was trying to run CreateTableOne from the tableone package on my dataset, m.dataaaaaa, using the following code:
CreateTableOne(vars = Vars, strata = "ejecfraclesstha40_gps", factorVars = Catvars, data = m.dataaaaaa, test = TRUE)
But I got the following error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
In addition: Warning message:
In ModuleReturnVarsExist(vars, data) : The data frame does not have: ejecfraclesstha40 Dropped
Part of the structure of the data is shown below, as it is a big database:
str(m.dataaaaaa)
Classes ‘data.table’ and 'data.frame': 194 obs. of 203 variables:
$ ejecfraclesstha40_gps : num 1 0 1 0 0 0 1 1 1 0 ...
$ Serial.ID : num 2 3 4 7 10 14 17 20 23 24 ...
..- attr(*, "format.spss")= chr "F4.0"
$ Serial.ID_matched.EF.cohort.Ivan1.to.2 : num 2 NA 4 NA NA NA 17 20 23 NA ...
..- attr(*, "format.spss")= chr "F8.0"
$ ps..matched.EF.cohort.Ivan1.to.2 : num 0.138 NA 0.19 NA NA NA 0.176 0.286 0.152 NA ...
..- attr(*, "format.spss")= chr "F8.3"
$ psweight1.to.2 : num 1 NA 1 NA NA NA 1 1 1 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_ID1.to.2 : num 483 NA 763 NA NA NA 180 176 239 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ matched_cases_in_control1.to.2 : num 2 NA 2 NA NA NA 2 2 2 NA ...
..- attr(*, "format.spss")= chr "F8.2"
$ ejecfrac_4gps : num 1 3 1 3 3 3 1 1 1 3 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<35%" "EF=35 - <40%" "EF=40 - <=50" "EF>50%"
$ ejecfrac_4gps30 : num 1 4 1 3 3 4 1 1 1 4 ...
..- attr(*, "format.spss")= chr "F8.2"
..- attr(*, "labels")= Named num 1 2 3 4
.. ..- attr(*, "names")= chr "EF<=30%" "EF>30 - 39%" "EF=40 - 49%" "EF>=50%"
$ renisch : num 29 31 23 18 48 19 10 29 17 13 ...
..- attr(*, "label")= chr "renal + visceral ischemic time"
..- attr(*, "format.spss")= chr "F3.0"
..- attr(*, "display_width")= int 12
$ totxct : num 46 31 55 46 48 19 54 29 17 37 ...
..- attr(*, "label")= chr "total cross-clamp time"
..- attr(*, "format.spss")= chr "F4.0"
..- attr(*, "display_width")= int 12
The original database was read from SPSS into R.
My main problem is with this error:
Error in [<-.data.frame(x, i, value = value) : duplicate subscripts for columns
Any advice will be greatly appreciated.
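One possible cause, going by the two messages together: the variable list contains a name that is not in the data (ejecfraclesstha40, per the warning) and perhaps a repeated name, which could lead to "duplicate subscripts for columns". Below is a base-R sketch of the checks I would run first. The contents of Vars and data_names are hypothetical examples, since the real Vars was not shown; on the real data, data_names would be names(m.dataaaaaa).

```r
# Hypothetical variable list with one repeated and one misspelled name
Vars <- c("renisch", "totxct", "ejecfraclesstha40", "renisch")

# Hypothetical subset of the column names shown in str(m.dataaaaaa)
data_names <- c("ejecfraclesstha40_gps", "Serial.ID", "renisch", "totxct")

# Names listed more than once in Vars
dup_vars <- unique(Vars[duplicated(Vars)])

# Names in Vars that are not columns of the data
missing_vars <- setdiff(Vars, data_names)

# Duplicated column names in the data itself (0 means none)
dup_cols <- anyDuplicated(data_names)

# A cleaned list: unique names that actually exist in the data
clean_vars <- intersect(unique(Vars), data_names)
clean_vars
```

The same checks apply to Catvars, and the cleaned vectors can then be passed as vars and factorVars.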
I have some data that looks something like this:
# Date Time Temp Intensity Coupler Attached Host Connected Stopped End Of File
1 05/28/15 06:00:00.0 20.329 893.4
2 05/28/15 07:00:00.0 21.76 5 511.1
3 05/28/15 08:00:00.0 36.946 79 911.6
4 05/28/15 09:00:00.0 40.761 60 622.6
5 05/28/15 10:00:00.0 41.225 24 800.2
6 05/28/15 11:00:00.0 29.853 14 466.8
7 05/28/15 12:00:00.0 26.195 5 511.1
8 05/28/15 13:00:00.0 28.06 9 300.1
9 05/28/15 14:00:00.0 27.468 6 544.5
10 05/28/15 15:00:00.0 26.879 4 133.4
11 05/28/15 16:00:00.0 26 2 238.9
12 05/28/15 17:00:00.0 25.513 1 173.3
13 05/28/15 18:00:00.0 24.738 75.3
14 05/28/15 19:00:00.0 24.062 0
15 05/28/15 20:00:00.0 23.773 0
16 05/28/15 21:00:00.0 23.292 0
17 05/28/15 22:00:00.0 22.812 0
18 05/28/15 23:00:00.0 22.429 0
19 05/29/15 00:00:00.0 22.046 0
20 05/29/15 01:00:00.0 21.76 0
21 05/29/15 02:00:00.0 21.473 0
22 05/29/15 03:00:00.0 21.091 0
23 05/29/15 04:00:00.0 20.901 0
24 05/29/15 05:00:00.0 20.615 0
25 05/29/15 06:00:00.0 20.901 1 894.5
26 05/29/15 07:00:00.0 22.525 8 611.2
27 05/29/15 08:00:00.0 29.652 42 711.4
28 05/29/15 09:00:00.0 36.079 22 44.6
29 05/29/15 10:00:00.0 39.729 77 156.1
30 05/29/15 11:00:00.0 31.37 19 289
31 05/29/15 12:00:00.0 32.086 7 233.4
I am attempting to use the aggregate function to get average temperatures at each time point. I use this function:
aggregate(x = trap7u$Temp, by = list(trap7u$Time), FUN = mean)
This gives the following output:
Group.1 x
1 06:00:00 NA
R does not return any errors, just the single row above. I have tried casting the columns to different types, as well as removing any NAs, but I get the same result.
str(trap7u)
returns:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1770 obs. of 9 variables:
$ # : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date : chr "05/28/15" "05/28/15" "05/28/15" "05/28/15" ...
$ Time :Classes 'hms', 'difftime' atomic [1:1770] 21600 25200 28800 32400 36000 39600 43200 46800 50400 54000 ...
.. ..- attr(*, "units")= chr "secs"
$ Temp : num 20.3 21.8 36.9 40.8 41.2 ...
$ Intensity : num 893 5 79 60 24 ...
$ Coupler Attached: num NA 511 912 623 800 ...
$ Host Connected : chr NA NA NA NA ...
$ Stopped : chr NA NA NA NA ...
$ End Of File : chr NA NA NA NA ...
- attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 5 variables:
..$ row : int 1769
..$ col : chr "Coupler Attached"
..$ expected: chr "a double"
..$ actual : chr "Logged"
..$ file : chr "'~/Desktop/bioinformatic_work/HOBO_files_complete/hobo_files/2015-AUG-offload/trap7u_10733861_150809.csv'"
- attr(*, "spec")=List of 2
..$ cols :List of 9
.. ..$ # : list()
.. .. ..- attr(*, "class")= chr "collector_integer" "collector"
.. ..$ Date : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Time :List of 1
.. .. ..$ format: chr ""
.. .. ..- attr(*, "class")= chr "collector_time" "collector"
.. ..$ Temp : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Intensity : list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Coupler Attached: list()
.. .. ..- attr(*, "class")= chr "collector_double" "collector"
.. ..$ Host Connected : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ Stopped : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
.. ..$ End Of File : list()
.. .. ..- attr(*, "class")= chr "collector_character" "collector"
..$ default: list()
.. ..- attr(*, "class")= chr "collector_guess" "collector"
..- attr(*, "class")= chr "col_spec"
What I am trying to get is the mean Temp value for each time. How can I accomplish this?
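I can't tell from the output alone why only one group comes back, but a workaround worth trying is to group on a plain character key instead of the hms/difftime column, and to pass na.rm = TRUE to mean(). The sketch below uses a toy data frame built from a few Temp values in the table above. The names trap and TimeKey are hypothetical, and Time is a base difftime in seconds standing in for the hms column.

```r
# Toy data: 06:00, 07:00 and 08:00 on two days, Time stored as seconds
trap <- data.frame(
  Time = as.difftime(rep(c(21600, 25200, 28800), times = 2), units = "secs"),
  Temp = c(20.329, 21.76, 36.946, 20.901, 22.525, 29.652)
)

# Build a plain "HH:MM:SS" character key from the seconds since midnight
trap$TimeKey <- format(
  as.POSIXct(as.numeric(trap$Time), origin = "1970-01-01", tz = "UTC"),
  "%H:%M:%S"
)

# Mean temperature per time of day; na.rm = TRUE is passed on to mean()
means <- aggregate(Temp ~ TimeKey, data = trap, FUN = mean, na.rm = TRUE)
means
```

On the real data the same two steps would apply to trap7u, with as.numeric(trap7u$Time) giving the seconds.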
One of the best ways to make a question reproducible is to use one of the built-in data sets. Using data(), however, is frustrating because no information about the structure of the data sets is provided.
How can I quickly view the structure of available data sets?
The following function may help:
dataStr <- function(fun = function(x) TRUE)
  str(
    Filter(
      fun,
      Filter(
        Negate(is.null),
        mget(data()$results[, "Item"], inherits = TRUE, ifnotfound = list(NULL))
      )
    )
  )
It accepts a filtering function, applies it to all the data sets, and prints out the structure of the matching data sets. For example, if we're looking for matrices:
> dataStr(is.matrix)
List of 8
$ WorldPhones : num [1:7, 1:7] 45939 60423 64721 68484 71799 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "1951" "1956" "1957" "1958" ...
.. ..$ : chr [1:7] "N.Amer" "Europe" "Asia" "S.Amer" ...
$ occupationalStatus : 'table' int [1:8, 1:8] 50 16 12 11 2 12 0 0 19 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ origin : chr [1:8] "1" "2" "3" "4" ...
.. ..$ destination: chr [1:8] "1" "2" "3" "4" ...
$ volcano : num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
--- 5 entries omitted ---
Or for data frames (also omitting entries):
> dataStr(is.data.frame)
List of 42
$ BOD :'data.frame': 6 obs. of 2 variables:
..$ Time : num [1:6] 1 2 3 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..- attr(*, "reference")= chr "A1.4, p. 270"
$ CO2 :Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 84 obs. of 5 variables:
..$ Plant : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
..$ Type : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
..$ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
..$ conc : num [1:84] 95 175 250 350 500 675 1000 95 175 250 ...
..$ uptake : num [1:84] 16 30.4 34.8 37.2 35.3 39.2 39.7 13.6 27.3 37.1 ...
--- 40 entries omitted ---
Or even for simple vectors:
> dataStr(function(x) is.atomic(x) && is.vector(x) && !is.ts(x))
List of 4
$ euro : Named num [1:11] 13.76 40.34 1.96 166.39 5.95 ...
..- attr(*, "names")= chr [1:11] "ATS" "BEF" "DEM" "ESP" ...
$ islands: Named num [1:48] 11506 5500 16988 2968 16 ...
..- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
$ precip : Named num [1:70] 67 54.7 7 48.5 14 17.2 20.7 13 43.4 40.2 ...
..- attr(*, "names")= chr [1:70] "Mobile" "Juneau" "Phoenix" "Little Rock" ...
$ rivers : num [1:141] 735 320 325 392 524 ...