I have a data set that I'm trying to run a glm regression on, but it stores age group, race, and comorbidity class as character columns. I would like to convert those columns to numeric codes so the regression can accept them. Using the data below, I want to recode TBI.iracecat2 into TBI.irace2 (Hispanic = 1, Black = 2, White = 3, other = 4), and do the same with age (18-28 = 1, 29-46 = 2, 47-64 = 3, > 64 = 4) and with NISS (0-10 = 1, 11-20 = 2, 21-30 = 3, 31-40 = 4, 41-50 = 5, 51-60 = 6, 61-70 = 7, > 70 = 8).
A sample of the data is below:
TBI.crani = c(0, 0, 0, 0, 0, 0), TBI.vte = c(0,
0, 0, 0, 0, 0), TBI.FEMALE = c(0, 0, 1, 0, 1, 0), TBI.iracecat2 = c("Whites",
"Whites", "Whites", "Hispanics", "Whites", "Blacks"), TBI.agecat = c("Age 47-64",
"Age 29-46", "Age > 64", "Age 29-46", "Age 18-28", "Age 18-28"
), TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30",
"NISS 11-20", "NISS 11-20", "NISS 0-10"), TBI.LOS = c(5, 8, 1,
3, 19, 1), TBI.hospitalteach = c(0, 0, 1, 1, 1, 1), TBI.largebedsize = c(1,
1, 1, 1, 1, 1), TBI.CM_ALCOHOL = c(0, 0, 0, 1, 0, 0), TBI.CM_ANEMDEF = c(0,
0, 0, 0, 0, 0), TBI.CM_BLDLOSS = c(0, 0, 0, 0, 0, 0), TBI.CM_CHF = c(1,
0, 0, 0, 0, 0), TBI.CM_CHRNLUNG = c(0, 0, 0, 0, 0, 0), TBI.CM_COAG = c(0,
0, 0, 0, 1, 0), TBI.CM_HYPOTHY = c(0, 0, 0, 0, 0, 0), TBI.CM_LYTES = c(0,
0, 0, 0, 0, 0), TBI.CM_METS = c(0, 0, 0, 0, 0, 0), TBI.CM_NEURO = c(0,
0, 0, 0, 0, 0), TBI.CM_OBESE = c(0, 0, 0, 0, 0, 0), TBI.CM_PARA = c(0,
0, 0, 0, 0, 0), TBI.CM_PSYCH = c(0, 1, 0, 0, 0, 0), TBI.CM_TUMOR = c(0,
0, 0, 0, 0, 0), TBI.CM_WGHTLOSS = c(0, 0, 0, 0, 0, 0), TBI.UTI = c(0,
0, 0, 0, 0, 0), TBI.pneumonia = c(0, 0, 0, 0, 0, 0), TBI.AMI = c(0,
0, 0, 0, 0, 0), TBI.sepsis = c(0, 0, 0, 0, 0, 0), TBI.arrest = c(0,
0, 0, 0, 0, 0), TBI.spineinjury = c(0, 0, 0, 0, 0, 0), TBI.legfracture = c(0,
0, 0, 0, 0, 0), TBI_time_to_surg.NEW = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
6L), class = "data.frame")
A small tip: provide a sample data set that is just big enough to illustrate your question.
library(data.table)
# took a small sample and changed one value to Asian
dt <- data.table(
TBI.FEMALE = c(0, 0, 1, 0, 1, 0),
TBI.iracecat2 = c("Whites", "Whites", "Asian", "Hispanics", "Whites", "Blacks")
)
# define race groups; Asian is deliberately left undefined
convert_race <- c("Hispanics" = 1, "Blacks" = 2, "Whites" = 3)
# look up each label in the named vector; undefined labels come back as NA
dt[, TBI.irace2 := unname(convert_race[TBI.iracecat2])]
# every label that was not defined becomes "other" (4)
dt[is.na(TBI.irace2), TBI.irace2 := 4]
dt
# TBI.FEMALE TBI.iracecat2 TBI.irace2
# 1: 0 Whites 3
# 2: 0 Whites 3
# 3: 1 Asian 4
# 4: 0 Hispanics 1
# 5: 1 Whites 3
# 6: 0 Blacks 2
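The same lookup idea can be sketched for the age and NISS columns in base R. This is only a sketch under some assumptions: the new column names TBI.age2 and TBI.niss2 are made up for illustration, and the NISS labels above "NISS 21-30" (including "NISS > 70") are guessed from the pattern in the sample, so check them against unique() of the real column first. Labels that are not in the lookup vector come back as NA.

```r
# Sample rows copied from the question's data
df <- data.frame(
  TBI.agecat = c("Age 47-64", "Age 29-46", "Age > 64",
                 "Age 29-46", "Age 18-28", "Age 18-28"),
  TBI.nisscategory = c("NISS 21-30", "NISS 11-20", "NISS 21-30",
                       "NISS 11-20", "NISS 11-20", "NISS 0-10"),
  stringsAsFactors = FALSE
)

# Labels in the order of the desired codes (position 1 gets code 1, and so on).
# Assumption: labels not present in the sample follow the same spelling pattern.
age_levels  <- c("Age 18-28", "Age 29-46", "Age 47-64", "Age > 64")
niss_levels <- c("NISS 0-10", "NISS 11-20", "NISS 21-30", "NISS 31-40",
                 "NISS 41-50", "NISS 51-60", "NISS 61-70", "NISS > 70")

# match() returns the position of each label in the lookup vector
df$TBI.age2  <- match(df$TBI.agecat, age_levels)
df$TBI.niss2 <- match(df$TBI.nisscategory, niss_levels)
df
```

Note that glm also accepts character or factor columns directly and expands them into indicator variables, whereas a single numeric code treats the categories as equally spaced; which of the two is appropriate depends on the analysis.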
I am trying to run an NMDS on some data, using the metaMDS function in the R vegan package. I've managed to run it with a similar dataframe, but for some reason I'm getting the following error with this one:
>Error in cmdscale(dist, k = k) : NA values not allowed in 'd'
In addition: Warning messages:
1: In distfun(comm, method = distance, ...) :
you have empty rows: their dissimilarities may be meaningless in method “bray”
2: In distfun(comm, method = distance, ...) : missing values in results
As it's a large dataframe, I've put it into a Google sheet here
For context, the rows are samples and the columns are genes, with the value indicating the level of the gene in the sample.
With the NMDS, I want to see how similar the samples are, and as far as I understand the data is set up correctly for that.
So I tried running the following:
library(vegan)
NMDS <- metaMDS(NMDS, distance="bray")
where NMDS is the dataframe. This is where I get the above error, and I'm not sure what I've done wrong.
This also happens after I run the following code:
NMDS[is.na(NMDS)] = 0
Any ideas where I'm going wrong?
dput:
structure(list(X1 = c(0, 0, 0, 0, 0, 0), X2 = c(0, 0, 0, 0, 0,
0), X3 = c(0, 0, 0, 0, 0, 0), X4 = c(0, 0, 0, 0, 0, 0), X5 = c(0,
0, 0, 0, 0, 0), X6 = c(0, 28, 161, 688, 0, 0), X7 = c(0, 3, 14,
0, 0, 0), X8 = c(0, 0, 0, 0, 0, 0), X9 = c(3, 0, 2, 2, 0, 0),
X10 = c(12, 78, 602, 303, 900, 0), X11 = c(0, 52, 856, 28,
191, 0), X12 = c(0, 51, 12, 1, 0, 0), X13 = c(0, 0, 0, 0,
0, 0), X14 = c(0, 0, 2, 0, 0, 0), X15 = c(5, 17, 46, 39,
9, 0), X16 = c(5255, 1531, 6790, 3302, 5084, 0), X17 = c(0,
0, 0, 0, 0, 0), X18 = c(0, 0, 15, 0, 0, 0), X19 = c(0, 0,
0, 0, 0, 0), X20 = c(0, 0, 0, 0, 0, 0), X21 = c(0, 0, 0,
0, 0, 0), X22 = c(0, 0, 0, 0, 0, 0), X23 = c(0, 0, 0, 0,
0, 0), X24 = c(0, 0, 44, 0, 0, 0), X25 = c(0, 0, 0, 0, 0,
0), X26 = c(0, 6, 24, 185, 0, 0), X27 = c(0, 0, 0, 0, 0,
0), X28 = c(0, 0, 13, 0, 0, 0), X29 = c(0, 0, 0, 0, 0, 0),
X30 = c(0, 0, 0, 7, 0, 0), X31 = c(0, 0, 0, 0, 0, 0), X32 = c(0,
0, 0, 0, 0, 0), X33 = c(0, 0, 1, 2, 0, 0), X34 = c(0, 0,
0, 0, 0, 0), X35 = c(0, 0, 0, 0, 0, 0), X36 = c(0, 2, 0,
0, 0, 0), X37 = c(0, 0, 0, 0, 0, 0), X38 = c(0, 0, 0, 0,
0, 0), X39 = c(0, 0, 0, 0, 0, 0), X40 = c(0, 0, 0, 0, 0,
0), X41 = c(0, 0, 0, 0, 0, 0), X42 = c(0, 0, 0, 0, 0, 0),
X43 = c(0, 0, 0, 0, 0, 0), X44 = c(0, 0, 0, 0, 0, 0), X45 = c(0,
0, 0, 1, 0, 0), X46 = c(0, 0, 0, 63, 0, 0), X47 = c(0, 0,
0, 0, 0, 0), X48 = c(0, 0, 0, 0, 0, 0), X49 = c(0, 0, 0,
0, 0, 0), X50 = c(0, 0, 0, 0, 0, 0), X51 = c(0, 0, 0, 0,
0, 0), X52 = c(0, 0, 0, 0, 0, 0), X53 = c(0, 0, 0, 1, 0,
0), X54 = c(0, 0, 0, 0, 0, 0), X55 = c(0, 0, 0, 1, 0, 0),
X56 = c(0, 0, 0, 0, 0, 0), X57 = c(0, 0, 3, 0, 0, 0), X58 = c(0,
0, 0, 0, 0, 0), X59 = c(0, 0, 0, 0, 0, 0), X60 = c(0, 0,
0, 0, 0, 0), X61 = c(0, 0, 44, 0, 0, 0), X62 = c(0, 0, 15,
0, 0, 0), X63 = c(0, 0, 347, 0, 0, 0), X64 = c(0, 0, 0, 0,
0, 0), X65 = c(0, 0, 0, 5, 0, 0), X66 = c(0, 0, 0, 0, 0,
0), X67 = c(1, 8, 2, 11, 6, 0), X68 = c(0, 26, 0, 0, 0, 0
), X69 = c(0, 0, 0, 8, 0, 0), X70 = c(0, 0, 0, 13, 0, 0),
X71 = c(0, 0, 0, 0, 0, 0), X72 = c(0, 2, 0, 0, 0, 0), X73 = c(0,
0, 0, 0, 0, 0), X74 = c(341, 74, 0, 0, 0, 0), X75 = c(4,
6, 10, 17, 13, 0), X76 = c(0, 0, 0, 0, 0, 0), X77 = c(0,
0, 0, 0, 0, 0), X78 = c(0, 0, 0, 6, 0, 0), X79 = c(0, 0,
0, 0, 0, 0), X80 = c(0, 0, 0, 0, 0, 0), X81 = c(403, 86,
0, 0, 0, 0), X82 = c(20, 95, 54, 0, 0, 0), X83 = c(0, 2,
0, 1, 0, 0), X84 = c(0, 0, 3, 1, 0, 0), X85 = c(0, 0, 0,
0, 0, 0), X86 = c(40, 132, 39, 0, 1, 0), X87 = c(0, 0, 0,
0, 0, 0), X88 = c(0, 0, 0, 0, 0, 0), X89 = c(0, 0, 0, 0,
0, 0), X90 = c(0, 0, 0, 0, 0, 0), X91 = c(0, 0, 0, 0, 0,
0), X92 = c(0, 7, 0, 0, 0, 0), X93 = c(0, 0, 0, 0, 0, 0),
X94 = c(0, 0, 0, 0, 0, 0), X95 = c(0, 0, 0, 0, 0, 0), X96 = c(0,
0, 0, 0, 0, 0), X97 = c(0, 0, 0, 0, 0, 0), X98 = c(0, 0,
0, 0, 0, 0), X99 = c(0, 0, 0, 0, 0, 0), X100 = c(0, 0, 0,
0, 0, 0), X101 = c(0, 0, 0, 0, 0, 0), X102 = c(0, 8, 0, 1,
0, 0), X103 = c(0, 0, 0, 0, 0, 0), X104 = c(0, 0, 0, 0, 0,
0), X105 = c(0, 0, 0, 0, 0, 0), X106 = c(0, 0, 0, 0, 0, 0
), X107 = c(0, 0, 0, 0, 0, 0), X108 = c(0, 0, 0, 0, 0, 0),
X109 = c(0, 0, 0, 0, 0, 0), X110 = c(0, 0, 0, 0, 0, 0), X111 = c(0,
0, 0, 0, 0, 0), X112 = c(15, 47, 0, 1, 0, 0), X113 = c(0,
0, 0, 0, 0, 0), X114 = c(0, 0, 0, 0, 0, 0), X115 = c(0, 0,
0, 2, 0, 0), X116 = c(43, 0, 0, 1, 1, 0), X117 = c(0, 0,
0, 0, 0, 0), X118 = c(0, 0, 0, 0, 0, 0), X119 = c(0, 0, 0,
0, 0, 0), X120 = c(387, 0, 0, 0, 0, 0), X121 = c(0, 0, 0,
0, 0, 0), X122 = c(342, 1, 0, 72, 0, 0), X123 = c(0, 0, 0,
0, 0, 0), X124 = c(0, 0, 0, 76, 0, 0), X125 = c(0, 0, 0,
0, 0, 0), X126 = c(0, 0, 0, 0, 0, 0), X127 = c(0, 2, 0, 0,
0, 0), X128 = c(0, 0, 0, 0, 0, 0), X129 = c(0, 0, 0, 0, 0,
0), X130 = c(0, 0, 0, 0, 0, 0), X131 = c(0, 0, 0, 0, 0, 0
), X132 = c(0, 0, 0, 0, 0, 0), X133 = c(0, 0, 0, 0, 0, 0),
X134 = c(0, 0, 0, 11, 0, 0), X135 = c(13, 108, 0, 129, 192,
0), X136 = c(0, 0, 0, 0, 0, 0), X137 = c(18, 129, 0, 23,
0, 0), X138 = c(0, 0, 0, 32, 7, 0), X139 = c(1, 0, 0, 10,
0, 0), X140 = c(0, 0, 0, 3, 0, 0), X141 = c(0, 0, 0, 0, 0,
0), X142 = c(0, 0, 0, 14, 0, 0), X143 = c(0, 0, 0, 0, 0,
0), X144 = c(16, 74, 71, 0, 0, 0), X145 = c(0, 0, 0, 0, 392,
0), X146 = c(0, 24, 224, 1, 0, 0), X147 = c(0, 19, 224, 1,
0, 0), X148 = c(0, 13, 253, 0, 0, 0), X149 = c(49, 17, 17,
0, 0, 0), X150 = c(133, 70, 74, 0, 0, 0), X151 = c(0, 0,
0, 0, 0, 0), X152 = c(0, 0, 0, 0, 0, 0), X153 = c(0, 0, 0,
0, 0, 0), X154 = c(0, 0, 0, 0, 0, 0), X155 = c(0, 0, 0, 0,
0, 0), X156 = c(0, 1, 0, 0, 0, 0), X157 = c(0, 0, 0, 0, 0,
0), X158 = c(0, 0, 0, 22, 0, 0), X159 = c(0, 0, 0, 0, 0,
0), X160 = c(0, 0, 0, 10, 0, 0), X161 = c(0, 0, 0, 106, 0,
0), X162 = c(148, 27, 85, 0, 0, 0), X163 = c(0, 0, 0, 0,
0, 0), X164 = c(0, 0, 0, 0, 0, 0), X165 = c(0, 10, 0, 0,
0, 0), X166 = c(0, 5, 0, 0, 0, 0), X167 = c(0, 0, 0, 0, 0,
0), X168 = c(1, 0, 0, 0, 0, 0), X169 = c(0, 7, 0, 0, 0, 0
), X170 = c(0, 0, 0, 2, 0, 0), X171 = c(0, 0, 0, 0, 0, 0),
X172 = c(0, 0, 0, 0, 0, 0), X173 = c(0, 0, 0, 0, 0, 0), X174 = c(0,
0, 0, 0, 0, 0), X175 = c(0, 0, 0, 2, 0, 0), X176 = c(0, 0,
0, 0, 0, 0), X177 = c(0, 0, 0, 212, 0, 0), X178 = c(0, 1,
0, 0, 0, 0), X179 = c(0, 0, 0, 0, 0, 0), X180 = c(0, 0, 0,
0, 0, 0), X181 = c(0, 0, 0, 0, 0, 0), X182 = c(0, 0, 0, 0,
0, 0), X183 = c(0, 0, 0, 0, 0, 0), X184 = c(0, 0, 0, 0, 0,
0), X185 = c(0, 9, 0, 0, 0, 0), X186 = c(0, 0, 0, 0, 0, 0
), X187 = c(0, 0, 0, 0, 0, 0), X188 = c(0, 0, 0, 0, 0, 0),
X189 = c(0, 0, 0, 0, 0, 0), X190 = c(475, 108, 329, 14, 57,
0), X191 = c(0, 0, 8, 0, 0, 0), X192 = c(0, 0, 0, 0, 0, 0
), X193 = c(0, 0, 0, 0, 0, 0), X194 = c(0, 0, 0, 0, 0, 0),
X195 = c(0, 0, 0, 0, 0, 0), X196 = c(0, 0, 0, 0, 0, 0), X197 = c(0,
0, 0, 0, 0, 0), X198 = c(0, 0, 2, 0, 0, 0), X199 = c(0, 0,
0, 0, 0, 0), X200 = c(0, 0, 0, 0, 0, 0), X201 = c(0, 27,
647, 1, 0, 0), X202 = c(0, 0, 0, 0, 0, 0), X203 = c(0, 0,
0, 0, 0, 0), X204 = c(0, 0, 0, 0, 0, 0), X205 = c(251, 41,
58, 0, 1, 0), X206 = c(0, 0, 0, 0, 0, 0), X207 = c(0, 0,
0, 0, 0, 0), X208 = c(0, 0, 0, 0, 0, 0), X209 = c(0, 0, 0,
0, 0, 0), X210 = c(0, 0, 0, 0, 0, 0), X211 = c(0, 0, 0, 0,
0, 0), X212 = c(0, 0, 0, 0, 0, 0), X213 = c(0, 0, 0, 0, 0,
0), X214 = c(0, 0, 0, 0, 0, 0), X215 = c(0, 0, 0, 0, 0, 0
), X216 = c(0, 0, 0, 0, 0, 0), X217 = c(0, 0, 0, 0, 0, 0),
X218 = c(0, 0, 0, 0, 0, 0), X219 = c(0, 0, 0, 0, 0, 0), X220 = c(0,
0, 0, 0, 0, 0), X221 = c(0, 0, 0, 0, 0, 0), X222 = c(0, 0,
0, 0, 0, 0), X223 = c(0, 0, 0, 0, 0, 0), X224 = c(2, 0, 0,
0, 0, 0), X225 = c(0, 0, 0, 0, 0, 0), X226 = c(0, 0, 0, 0,
0, 0), X227 = c(0, 0, 0, 0, 0, 0), X228 = c(0, 0, 0, 0, 0,
0), X229 = c(0, 0, 0, 0, 0, 0), X230 = c(0, 0, 0, 0, 0, 0
), X231 = c(1, 0, 0, 0, 0, 0), X232 = c(0, 0, 0, 0, 0, 0),
X233 = c(0, 0, 0, 0, 0, 0), X234 = c(0, 0, 0, 0, 0, 0), X235 = c(0,
0, 0, 0, 0, 0), X236 = c(0, 0, 0, 0, 0, 0), X237 = c(0, 0,
0, 0, 0, 0), X238 = c(0, 0, 0, 0, 0, 0), X239 = c(0, 0, 0,
0, 0, 0), X240 = c(1, 0, 0, 0, 0, 0), X241 = c(445, 90, 0,
0, 1, 0), X242 = c(1, 70, 0, 0, 0, 0), X243 = c(23, 154,
11, 0, 0, 0), X244 = c(0, 0, 1, 0, 0, 0), X245 = c(174, 250,
192, 6, 0, 0), X246 = c(0, 2, 0, 1, 0, 0), X247 = c(0, 0,
0, 0, 0, 0), X248 = c(0, 0, 0, 0, 0, 0), X249 = c(29, 73,
20, 0, 0, 0), X250 = c(0, 99, 0, 0, 0, 0), X251 = c(20, 66,
4, 0, 0, 0), X252 = c(265, 48, 191, 0, 1, 0), X253 = c(112,
59, 0, 0, 0, 0), X254 = c(0, 3, 3, 0, 0, 0), X255 = c(0,
1, 0, 0, 0, 0), X256 = c(0, 0, 0, 0, 0, 0), X257 = c(0, 2,
0, 0, 0, 0), X258 = c(0, 0, 0, 0, 0, 0), X259 = c(86, 44,
69, 0, 0, 0), X260 = c(0, 0, 0, 0, 0, 0), X261 = c(13, 27,
0, 0, 1, 0), X262 = c(0, 5, 0, 0, 0, 0), X263 = c(0, 0, 0,
0, 0, 0), X264 = c(0, 0, 0, 0, 0, 0), X265 = c(0, 0, 0, 0,
0, 0), X266 = c(0, 0, 0, 0, 0, 0), X267 = c(0, 1, 0, 0, 0,
0), X268 = c(0, 0, 0, 0, 0, 0), X269 = c(0, 0, 0, 0, 0, 0
), X270 = c(0, 0, 0, 0, 0, 0), X271 = c(0, 0, 0, 4, 0, 0),
X272 = c(0, 0, 0, 0, 0, 0), X273 = c(0, 0, 0, 0, 0, 0), X274 = c(0,
0, 0, 0, 0, 0), X275 = c(291, 200, 115, 0, 0, 0), X276 = c(0,
5, 0, 0, 0, 0), X277 = c(0, 0, 0, 0, 0, 0), X278 = c(0, 5,
0, 5, 0, 0), X279 = c(0, 3, 2, 6, 0, 0), X280 = c(0, 0, 28,
0, 0, 0), X281 = c(0, 1, 0, 0, 0, 0), X282 = c(0, 8, 1, 5,
0, 0), X283 = c(0, 3, 0, 1, 0, 0), X284 = c(0, 0, 17, 0,
0, 0), X285 = c(0, 3, 0, 0, 0, 0), X286 = c(0, 0, 0, 0, 0,
0), X287 = c(0, 1, 1, 4, 0, 0), X288 = c(0, 0, 0, 0, 0, 0
), X289 = c(0, 2, 0, 0, 0, 0), X290 = c(0, 0, 0, 0, 0, 0),
X291 = c(0, 0, 0, 0, 0, 0), X292 = c(0, 0, 0, 4, 0, 0), X293 = c(0,
0, 0, 0, 0, 0), X294 = c(38, 10, 72, 0, 0, 0), X295 = c(0,
58, 0, 0, 0, 0), X296 = c(0, 20, 0, 0, 0, 0), X297 = c(69,
4, 39, 0, 1, 0), X298 = c(0, 15, 304, 3, 0, 0), X299 = c(0,
0, 0, 0, 0, 0), X300 = c(0, 6, 0, 0, 0, 0), X301 = c(0, 1,
0, 0, 0, 0), X302 = c(51, 28, 13, 0, 0, 0), X303 = c(96,
149, 28, 0, 0, 0), X304 = c(34, 25, 24, 0, 0, 0), X305 = c(0,
3, 1, 0, 0, 0), X306 = c(0, 3, 7, 0, 0, 0), X307 = c(0, 4,
0, 0, 0, 0), X308 = c(0, 0, 0, 0, 0, 0), X309 = c(0, 0, 35,
1, 0, 0), X310 = c(262, 9, 137, 0, 0, 0), X311 = c(3, 15,
0, 2, 9, 0), X312 = c(445, 139, 353, 48, 16, 0), X313 = c(0,
0, 0, 0, 0, 0), X314 = c(0, 0, 0, 0, 0, 0), X315 = c(0, 0,
0, 0, 0, 0), X316 = c(0, 0, 0, 0, 0, 0), X317 = c(0, 0, 0,
0, 0, 0), X318 = c(0, 0, 0, 0, 0, 0), X319 = c(0, 0, 0, 0,
0, 0), X320 = c(62, 138, 36, 0, 0, 0), X321 = c(3, 0, 0,
0, 0, 0), X322 = c(0, 0, 0, 0, 0, 0), X323 = c(0, 13, 0,
0, 0, 0), X324 = c(0, 0, 0, 0, 0, 0), X325 = c(142, 0, 104,
0, 0, 0), X326 = c(0, 2, 0, 0, 0, 0), X327 = c(56, 35, 101,
0, 0, 0), X328 = c(0, 0, 0, 10, 0, 0), X329 = c(0, 0, 0,
0, 0, 0), X330 = c(0, 2, 0, 0, 0, 0), X331 = c(259, 27, 107,
0, 2, 0), X332 = c(0, 0, 0, 0, 0, 0), X333 = c(0, 7, 0, 0,
0, 0), X334 = c(0, 0, 0, 0, 0, 0), X335 = c(98, 39, 95, 0,
0, 0), X336 = c(0, 0, 1, 0, 0, 0), X337 = c(0, 0, 0, 0, 0,
0), X338 = c(141, 28, 85, 0, 0, 0), X339 = c(15, 14, 20,
0, 0, 0), X340 = c(0, 6, 0, 0, 0, 0), X341 = c(0, 0, 0, 0,
0, 0), X342 = c(0, 2, 0, 0, 0, 0), X343 = c(0, 0, 0, 0, 0,
0), X344 = c(0, 0, 0, 0, 0, 0), X345 = c(0, 10, 232, 0, 0,
0), X346 = c(0, 4, 0, 0, 0, 0), X347 = c(0, 0, 0, 0, 0, 0
), X348 = c(0, 0, 0, 0, 0, 0), X349 = c(0, 0, 0, 0, 0, 0),
X350 = c(0, 0, 0, 0, 0, 0), X351 = c(0, 0, 0, 0, 0, 0), X352 = c(0,
0, 0, 0, 0, 0), X353 = c(0, 0, 0, 0, 4, 0), X354 = c(0, 0,
0, 0, 0, 0), X355 = c(0, 0, 0, 0, 1, 0), X356 = c(0, 0, 0,
0, 0, 0), X357 = c(0, 0, 0, 0, 0, 0), X358 = c(0, 0, 0, 0,
0, 0), X359 = c(0, 0, 0, 0, 0, 0), X360 = c(0, 0, 0, 0, 0,
0), X361 = c(0, 0, 0, 0, 0, 0), X362 = c(0, 0, 0, 0, 0, 0
), X363 = c(0, 0, 0, 0, 0, 0), X364 = c(0, 0, 0, 0, 2, 0),
X365 = c(0, 0, 0, 0, 0, 0), X366 = c(0, 0, 0, 0, 0, 0), X367 = c(0,
0, 0, 0, 0, 0), X368 = c(0, 0, 0, 0, 0, 0), X369 = c(0, 0,
0, 17, 0, 0), X370 = c(0, 0, 0, 0, 0, 0), X371 = c(0, 0,
0, 0, 0, 0), X372 = c(0, 0, 0, 0, 0, 0), X373 = c(0, 0, 0,
0, 0, 0), X374 = c(0, 0, 0, 0, 0, 0), X375 = c(0, 0, 0, 0,
0, 0), X376 = c(0, 0, 1, 0, 0, 0), X377 = c(0, 0, 0, 0, 0,
0), X378 = c(0, 0, 0, 0, 0, 0), X379 = c(0, 0, 0, 0, 0, 0
), X380 = c(0, 0, 0, 0, 0, 0), X381 = c(0, 0, 0, 0, 0, 0),
X382 = c(0, 0, 0, 0, 0, 0), X383 = c(0, 51, 0, 0, 0, 0),
X384 = c(0, 0, 0, 0, 0, 0), X385 = c(7, 0, 0, 11, 1, 0),
X386 = c(0, 0, 0, 0, 0, 0), X387 = c(0, 0, 1, 0, 0, 0), X388 = c(0,
0, 0, 0, 0, 0), X389 = c(0, 0, 0, 0, 0, 0), X390 = c(0, 5,
0, 0, 0, 0), X391 = c(0, 0, 0, 0, 0, 0), X392 = c(0, 0, 0,
0, 0, 0), X393 = c(2, 16, 0, 0, 0, 0), X394 = c(0, 6, 88,
0, 0, 0), X395 = c(0, 14, 136, 1, 0, 0), X396 = c(0, 41,
350, 2, 0, 0), X397 = c(0, 0, 0, 0, 0, 0), X398 = c(20, 413,
0, 12, 3, 0), X399 = c(0, 0, 0, 0, 0, 0), X400 = c(0, 3,
0, 0, 0, 0), X401 = c(0, 0, 0, 0, 0, 0), X402 = c(0, 2, 0,
0, 0, 0), X403 = c(0, 2, 0, 0, 0, 0), X404 = c(0, 0, 0, 0,
0, 0), X405 = c(0, 0, 0, 0, 0, 0), X406 = c(0, 0, 0, 0, 0,
0), X407 = c(0, 0, 39, 1, 0, 0), X408 = c(10, 73, 31, 0,
0, 0), X409 = c(0, 11, 0, 0, 0, 0), X410 = c(68, 58, 66,
1, 0, 0), X411 = c(4, 32, 3, 0, 0, 0), X412 = c(8, 66, 39,
0, 0, 0), X413 = c(0, 0, 0, 0, 0, 0), X414 = c(2, 53, 7,
0, 0, 0), X415 = c(120, 90, 109, 0, 0, 0), X416 = c(0, 80,
0, 0, 0, 0), X417 = c(62, 79, 24, 0, 0, 0), X418 = c(58,
156, 30, 0, 0, 0), X419 = c(72, 138, 50, 2, 0, 0), X420 = c(0,
0, 0, 0, 0, 0), X421 = c(0, 0, 0, 0, 0, 0), X422 = c(36,
143, 43, 0, 0, 0), X423 = c(0, 0, 0, 0, 0, 0), X424 = c(0,
0, 0, 0, 0, 0), X425 = c(0, 5, 0, 0, 0, 0), X426 = c(12,
109, 0, 18, 26, 0), X427 = c(0, 0, 0, 0, 0, 0), X428 = c(0,
0, 0, 0, 0, 0), X429 = c(0, 3, 0, 0, 0, 0), X430 = c(0, 0,
362, 0, 0, 0), X431 = c(0, 0, 0, 0, 0, 0), X432 = c(0, 0,
685, 0, 0, 0), X433 = c(0, 0, 0, 0, 0, 0), X434 = c(0, 0,
0, 0, 0, 0), X435 = c(0, 0, 0, 0, 0, 0), X436 = c(0, 0, 0,
0, 0, 0), X437 = c(0, 0, 15, 8, 0, 0), X438 = c(0, 0, 184,
0, 0, 0), X439 = c(0, 0, 0, 0, 0, 0), X440 = c(0, 0, 0, 0,
0, 0), X441 = c(0, 0, 0, 0, 0, 0), X442 = c(0, 0, 0, 0, 0,
0), X443 = c(0, 0, 0, 0, 0, 0), X444 = c(0, 6, 0, 0, 0, 0
), X445 = c(0, 0, 0, 0, 0, 0), X446 = c(0, 1, 1, 4, 0, 0),
X447 = c(0, 3, 0, 0, 0, 0), X448 = c(0, 1, 0, 0, 0, 0), X449 = c(616,
28, 368, 0, 0, 0), X450 = c(0, 0, 1, 0, 0, 0), X451 = c(4098,
2120, 3788, 2663, 3524, 0), X452 = c(0, 0, 0, 0, 0, 0), X453 = c(0,
66, 0, 0, 0, 0), X454 = c(0, 9, 0, 0, 0, 0), X455 = c(0,
1, 0, 0, 0, 0), X456 = c(0, 5, 0, 0, 0, 0), X457 = c(57,
111, 36, 0, 0, 0), X458 = c(0, 0, 0, 0, 0, 0), X459 = c(0,
54, 68, 0, 0, 0), X460 = c(0, 0, 0, 0, 0, 0), X461 = c(0,
0, 0, 0, 0, 0), X462 = c(0, 0, 0, 0, 0, 0), X463 = c(0, 0,
0, 0, 0, 0), X464 = c(0, 0, 0, 0, 0, 0), X465 = c(0, 0, 0,
0, 0, 0), X466 = c(0, 0, 0, 0, 0, 0), X467 = c(0, 1, 0, 2,
0, 0), X468 = c(48, 79, 52, 0, 0, 0), X469 = c(24, 244, 178,
0, 0, 0), X470 = c(24, 28, 13, 0, 0, 0), X471 = c(0, 0, 0,
0, 0, 0), X472 = c(96, 52, 45, 0, 0, 0), X473 = c(0, 0, 0,
102, 0, 0), X474 = c(196, 82, 130, 0, 0, 0), X475 = c(106,
30, 33, 0, 0, 0), X476 = c(12, 21, 22, 0, 0, 0), X477 = c(0,
0, 0, 0, 172, 0), X478 = c(0, 28, 280, 0, 0, 0), X479 = c(0,
27, 310, 0, 0, 0), X480 = c(0, 32, 366, 0, 0, 0), X481 = c(0,
7, 0, 0, 0, 0), X482 = c(0, 22, 0, 0, 0, 0), X483 = c(0,
1, 0, 0, 0, 0), X484 = c(0, 13, 0, 0, 0, 0), X485 = c(0,
2, 0, 0, 0, 0), X486 = c(0, 16, 0, 0, 0, 0), X487 = c(0,
6, 0, 0, 0, 0), X488 = c(0, 8, 0, 0, 0, 0), X489 = c(0, 20,
0, 0, 0, 0), X490 = c(0, 3, 0, 0, 0, 0), X491 = c(0, 14,
0, 0, 0, 0), X492 = c(0, 4, 0, 0, 0, 0), X493 = c(0, 2, 0,
0, 0, 0), X494 = c(0, 5, 0, 0, 0, 0), X495 = c(0, 1, 0, 0,
0, 0), X496 = c(0, 4, 0, 0, 0, 0), X497 = c(0, 15, 0, 0,
0, 0), X498 = c(0, 0, 0, 0, 0, 0), X499 = c(0, 7, 0, 0, 0,
0), X500 = c(0, 13, 0, 0, 0, 0), X501 = c(0, 11, 0, 0, 0,
0), X502 = c(0, 7, 0, 0, 0, 0), X503 = c(0, 4, 0, 0, 0, 0
), X504 = c(0, 0, 0, 0, 0, 0), X505 = c(0, 7, 0, 0, 0, 0),
X506 = c(0, 1, 0, 0, 0, 0), X507 = c(0, 1, 0, 0, 0, 0), X508 = c(0,
0, 0, 1, 0, 0), X509 = c(0, 6, 0, 0, 0, 0), X510 = c(0, 0,
0, 0, 0, 0), X511 = c(0, 2, 0, 0, 0, 0), X512 = c(0, 1, 0,
0, 0, 0), X513 = c(0, 14, 0, 0, 0, 0), X514 = c(0, 3, 0,
0, 0, 0), X515 = c(237, 171, 188, 0, 0, 0), X516 = c(291,
222, 163, 0, 0, 0), X517 = c(5, 36, 9, 0, 0, 0), X518 = c(5,
102, 0, 0, 0, 0), X519 = c(0, 0, 0, 0, 0, 0), X520 = c(0,
0, 0, 0, 0, 0), X521 = c(0, 0, 0, 0, 0, 0), X522 = c(96,
69, 109, 0, 0, 0), X523 = c(236, 0, 118, 0, 1, 0), X524 = c(0,
44, 0, 0, 0, 0), X525 = c(0, 0, 0, 0, 0, 0), X526 = c(0,
0, 0, 0, 0, 0), X527 = c(0, 0, 0, 0, 0, 0), X528 = c(0, 0,
0, 0, 0, 0), X529 = c(0, 62, 15, 0, 0, 0), X530 = c(4, 183,
16, 0, 0, 0), X531 = c(3, 187, 19, 0, 0, 0), X532 = c(197,
79, 64, 0, 0, 0), X533 = c(27, 255, 25, 0, 0, 0), X534 = c(0,
2, 0, 0, 0, 0), X535 = c(0, 20, 0, 0, 0, 0), X536 = c(0,
1, 0, 0, 0, 0), X537 = c(0, 10, 0, 0, 0, 0), X538 = c(0,
1, 0, 0, 0, 0), X539 = c(0, 4, 0, 0, 0, 0), X540 = c(0, 0,
0, 0, 0, 0), X541 = c(0, 6, 0, 0, 0, 0), X542 = c(0, 1, 0,
0, 0, 0), X543 = c(0, 12, 113, 0, 0, 0), X544 = c(0, 77,
990, 0, 0, 0), X545 = c(6, 27, 14, 0, 0, 0), X546 = c(0,
0, 0, 0, 0, 0), X547 = c(0, 0, 0, 0, 0, 0), X548 = c(0, 0,
0, 0, 0, 0), X549 = c(0, 0, 0, 0, 0, 0), X550 = c(0, 0, 0,
0, 0, 0), X551 = c(0, 0, 0, 0, 0, 0), X552 = c(0, 0, 0, 0,
0, 0), X553 = c(301, 0, 0, 0, 0, 0), X554 = c(444, 148, 305,
0, 0, 0), X555 = c(0, 0, 0, 0, 0, 0), X556 = c(0, 2, 2, 0,
0, 0), X557 = c(0, 0, 0, 0, 0, 0), X558 = c(0, 1, 0, 0, 0,
0), X559 = c(0, 0, 0, 0, 0, 0), X560 = c(0, 0, 0, 0, 0, 0
), X561 = c(0, 3, 4, 6, 1, 0), X562 = c(120, 77, 26, 0, 0,
0), X563 = c(0, 3, 628, 0, 0, 0), X564 = c(709, 104, 0, 0,
0, 0), X565 = c(0, 0, 0, 0, 0, 0), X566 = c(95, 59, 581,
175, 1219, 0), X567 = c(0, 0, 0, 0, 13, 0), X568 = c(26,
7, 0, 26, 39, 0), X569 = c(18, 33, 0, 35, 36, 0), X570 = c(0,
2, 41, 39, 1, 0), X571 = c(0, 8, 47, 97, 1, 0), X572 = c(216,
291, 52, 279, 688, 0), X573 = c(198, 504, 0, 5, 0, 0), X574 = c(0,
0, 0, 0, 0, 0), X575 = c(110, 102, 895, 254, 1682, 0), X576 = c(1,
2, 0, 0, 0, 0), X577 = c(10, 18, 0, 0, 0, 0), X578 = c(8,
40, 0, 0, 0, 0), X579 = c(0, 0, 0, 0, 0, 0), X580 = c(0,
0, 0, 0, 0, 0), X581 = c(0, 0, 0, 0, 0, 0), X582 = c(0, 0,
0, 0, 0, 0), X583 = c(0, 0, 216, 0, 0, 0), X584 = c(0, 0,
0, 0, 0, 0), X585 = c(0, 0, 0, 0, 0, 0), X586 = c(0, 0, 0,
0, 0, 0), X587 = c(0, 0, 0, 0, 0, 0), X588 = c(0, 0, 0, 0,
0, 0), X589 = c(0, 0, 0, 0, 0, 0), X590 = c(0, 0, 0, 0, 0,
0), X591 = c(31, 32, 0, 52, 213, 0), X592 = c(0, 0, 12, 0,
0, 0), X593 = c(0, 0, 0, 0, 0, 0), X594 = c(28, 77, 21, 0,
0, 0), X595 = c(0, 0, 0, 0, 0, 0), X596 = c(0, 0, 0, 0, 0,
0)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
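Some rows in NMDS contain only 0 values, which is what the first warning ("you have empty rows") is pointing at. Bray-Curtis dissimilarity divides by the combined totals of the two rows, so it is undefined (0/0) between empty rows; that leaves missing values in the distance matrix, which cmdscale then rejects. Replacing NA with 0 does not help, because the problem is the all-zero rows rather than NA values in the data.
You can remove the rows where every value is 0 using dplyr: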
library(dplyr)
NMDS <- NMDS %>%
filter_all(any_vars(. != 0))
NMDS <- metaMDS(NMDS, distance="bray")
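For reference, the same filtering can be sketched in base R, together with a hand-rolled check of why empty rows are a problem. The toy matrix comm and the helper bray() are made up for illustration (bray() is a plain restatement of the Bray-Curtis formula, not vegan's implementation), and the rowSums test assumes all values are non-negative, as counts are.

```r
# Toy community matrix: rows are samples, columns are genes; rows 3 and 4 are empty
comm <- data.frame(g1 = c(3, 0, 0, 0),
                   g2 = c(12, 78, 0, 0),
                   g3 = c(0, 52, 0, 0))

# Bray-Curtis dissimilarity between two rows: sum(|x - y|) / sum(x + y)
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Between two empty rows this is 0 / 0, which R evaluates to NaN
empty_pair <- bray(unlist(comm[3, ]), unlist(comm[4, ]))

# Keep only the samples whose total is positive (valid for non-negative data)
keep <- rowSums(comm) > 0
comm_nonempty <- comm[keep, , drop = FALSE]
```

The filtered data frame can then be passed to metaMDS in the same way as above.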
I have some data which looks like:
# A tibble: 50 x 28
sanchinarro date holiday weekday weekend workday_on_holi… weekend_on_holi… protocol_active
<dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -1.01 2010-01-01 1 1 0 1 0 0
2 0.832 2010-01-02 0 0 1 0 0 0
3 1.29 2010-01-03 0 0 1 0 0 0
4 1.04 2010-01-04 0 1 0 0 0 0
5 0.526 2010-01-05 0 1 0 0 0 0
6 -0.292 2010-01-06 1 1 0 1 0 0
7 -0.394 2010-01-07 0 1 0 0 0 0
8 -0.547 2010-01-08 0 1 0 0 0 0
9 -0.139 2010-01-09 0 0 1 0 0 0
10 0.628 2010-01-10 0 0 1 0 0 0
I want to run xgb.cv on the first 40 rows and validate it on the final 10 rows.
I tried the following:
library(xgboost)
library(dplyr)
X_Val <- ddd %>% select(-c(1:2))
Y_Val <- ddd %>% select(c(1)) %>% pull()
dVal <- xgb.DMatrix(data = as.matrix(X_Val), label = as.numeric(Y_Val))
xgb.cv(data = dVal, nround = 30, folds = NA, params = list(eta = 0.1, max_depth = 5))
which gives me this error:
Error in xgb.cv(data = dVal, nround = 30, folds = NA, eta = 0.1,
max_depth = 5) : 'folds' must be a list with 2 or more elements
that are vectors of indices for each CV-fold
How can I run a simple xgb.cv on the first 40 rows and test it on the last 10 rows?
I eventually want to apply a grid search with a list of parameters and save the results in a list. Since I am dealing with time series data, I do not want to shuffle the folds; I just want a simple 40:10 train and test split.
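To make the intended workflow concrete, here is a rough base R sketch of a chronological 40:10 split combined with a parameter grid whose results are saved in a list. Everything in it is illustrative: toy stands in for ddd, and fit_and_score() is a hypothetical placeholder that only predicts the training mean, so that the sketch runs without extra packages. In practice its body would be replaced by a real model fit using the given parameters.

```r
# Toy stand-in for the real data: 50 time-ordered rows, response in column y
set.seed(1)
toy <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rbinom(50, 1, 0.5))

# Chronological split: no shuffling, validation rows all come after training rows
train_idx <- 1:40
val_idx   <- 41:50

# Parameter grid: one row per combination of eta and max_depth
grid <- expand.grid(eta = c(0.05, 0.1), max_depth = c(3, 5))

# Hypothetical placeholder: swap the body for a real model trained with `params`.
# Here it ignores `params` and predicts the training mean.
fit_and_score <- function(train, val, params) {
  pred <- rep(mean(train$y), nrow(val))
  sqrt(mean((val$y - pred)^2))  # RMSE on the held-out rows
}

# One list element per parameter combination, holding the settings and the score
results <- lapply(seq_len(nrow(grid)), function(i) {
  params <- as.list(grid[i, ])
  list(params = params,
       rmse = fit_and_score(toy[train_idx, ], toy[val_idx, ], params))
})
```

Keeping the split as two index vectors means the same indices can be reused for every parameter combination, so all scores refer to the same 10 held-out rows.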
Data:
ddd <- structure(list(sanchinarro = c(-1.00742964973274, 0.832453587904369,
1.29242439731365, 1.03688505875294, 0.525806381631517, -0.291919501762755,
-0.394135237187039, -0.547458840323464, -0.138595898626329, 0.628022117055801,
1.19020866188936, 1.5990716035865, 1.5990716035865, -0.70078244345989,
2.11015028070792, 1.95682667757149, 0.985777191040795, 0.883561455616511,
0.985777191040795, 0.270267043070807, 2.51901322240505, 2.41679748698077,
0.372482778495091, -0.291919501762755, -0.905213914308458, -0.905213914308458,
-0.649674575747748, 1.2413165296015, 1.54796373587436, -0.70078244345989,
-0.905213914308458, -0.0363801632020448, 1.54796373587436, 2.00793454528363,
1.54796373587436, -0.445243104899181, -0.445243104899181, 1.03688505875294,
0.628022117055801, -0.496350972611323, 0.168051307646523, -0.649674575747748,
0.0658355722222391, -1.00742964973274, -0.291919501762755, 0.0147277045100972,
0.168051307646523, -0.189703766338471, 0.219159175358665, 0.679129984767943
), date = structure(c(14610, 14611, 14612, 14613, 14614, 14615,
14616, 14617, 14618, 14619, 14620, 14621, 14622, 14623, 14624,
14625, 14626, 14627, 14628, 14629, 14630, 14631, 14632, 14633,
14634, 14635, 14636, 14637, 14638, 14639, 14640, 14641, 14642,
14643, 14644, 14645, 14646, 14647, 14648, 14649, 14650, 14651,
14652, 14653, 14654, 14655, 14656, 14657, 14658, 14659), class = "Date"),
holiday = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(1,
0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1), weekend = c(0, 1, 1, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 0), workday_on_holiday = c(1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0,
1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1), text_clear = c(0, 0, 0,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1), text_fog = c(0, 1, 0, 1, 1, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 0), text_partly_cloudy = c(0, 1, 0, 0, 0,
1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), text_partly_sunny = c(1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1), text_passing_clouds = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1), text_scattered_clouds = c(1, 1,
0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1), text_sunny = c(0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1), month_1 = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1), month_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_9 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_10 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-50L))
EDIT: List data:
The final data comes in the form of lists.
datalst <- list(structure(list(sanchinarro = c(-1.00742964973274, 0.832453587904369,
1.29242439731365, 1.03688505875294, 0.525806381631517, -0.291919501762755,
-0.394135237187039, -0.547458840323464, -0.138595898626329, 0.628022117055801,
1.19020866188936, 1.5990716035865, 1.5990716035865, -0.70078244345989
), date = structure(c(14610, 14611, 14612, 14613, 14614, 14615,
14616, 14617, 14618, 14619, 14620, 14621, 14622, 14623), class = "Date"),
holiday = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(1,
0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1), weekend = c(0, 1,
1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0), workday_on_holiday = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0,
1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0), text_clear = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0), text_fog = c(0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0), text_partly_cloudy = c(0,
1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0), text_partly_sunny = c(1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1), text_passing_clouds = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(1,
1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1), text_sunny = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0), month_1 = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L)), structure(list(sanchinarro = c(0.832179838392013, 1.29225734336885,
1.03665872949283, 0.525461501740789, -0.292454062662475, -0.394693508212883,
-0.548052676538495, -0.139094894336863, 0.627700947291197, 1.19001789781844,
1.59897568002007, 1.59897568002007, -0.701411844864107, 2.11017290777211
), date = structure(c(14611, 14612, 14613, 14614, 14615, 14616,
14617, 14618, 14619, 14620, 14621, 14622, 14623, 14624), class = "Date"),
holiday = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(0,
0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1), weekend = c(1, 1,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0), workday_on_holiday = c(0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(1,
0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_clear = c(0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1), text_fog = c(1, 0,
1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0), text_partly_cloudy = c(1,
0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0), text_partly_sunny = c(1,
1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0), text_passing_clouds = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(1,
0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0), text_sunny = c(0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1), month_1 = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L)), structure(list(sanchinarro = c(1.29293502084952, 1.03729933727253,
0.526027970118536, -0.292006217327851, -0.394260490758649, -0.547641900904846,
-0.138624807181653, 0.628282243549334, 1.19068074741873, 1.59969784114192,
1.59969784114192, -0.701023311051044, 2.11096920829591, 1.95758779814971
), date = structure(c(14612, 14613, 14614, 14615, 14616, 14617,
14618, 14619, 14620, 14621, 14622, 14623, 14624, 14625), class = "Date"),
holiday = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekday = c(0,
1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0), weekend = c(1, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1), workday_on_holiday = c(0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), weekend_on_holiday = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), protocol_active = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), text_broken_clouds = c(0,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), text_clear = c(0,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0), text_fog = c(0, 1,
1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0), text_partly_cloudy = c(0,
0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0), text_partly_sunny = c(1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1), text_passing_clouds = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), text_scattered_clouds = c(0,
0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0), text_sunny = c(0,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0), month_1 = c(1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), month_2 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_3 = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), month_4 = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), month_5 = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), month_6 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), month_7 = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), month_8 = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), month_9 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), month_10 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), month_11 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0), month_12 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L)))
EDIT:
I think this gives me what I am after, but I need to double/triple check it. (If you see any errors, please let me know.)
splt <- 0.80 * nrow(ddd)
ddd[c(1:splt), "id"] = 1
ddd$id[is.na(ddd$id)] = 2
fold.ids <- unique(ddd$id)
custom.folds <- vector("list", length(fold.ids))
i <- 1
for( id in fold.ids){
custom.folds[[i]] <- which( ddd$id %in% id )
i <- i+1
}
custom.folds
cv <- xgb.cv(params = list(eta = 0.1, max_depth = 5), dVal, nround = 10, folds = custom.folds, prediction = TRUE)
cv$evaluation_log
I now need to find a way to apply this to all 3 lists in the "new" added data.
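One way to apply the same 80/20 fold construction to every data frame in a list is to wrap it in a small function and call it with lapply. This is only a minimal sketch: the list name `lst`, the toy data, and the helper `make_folds` are assumptions for illustration and are not part of xgboost.

```r
# Hypothetical helper: the first 80% of rows form fold 1, the rest fold 2.
make_folds <- function(df, frac = 0.80) {
  n <- nrow(df)
  cut <- floor(frac * n)                 # last row index of the first block
  id <- ifelse(seq_len(n) <= cut, 1L, 2L)
  unname(split(seq_len(n), id))          # list of row-index vectors, one per fold
}

# Toy stand-in for the real list of data frames (assumption for illustration).
lst <- list(data.frame(x = 1:14), data.frame(x = 1:14), data.frame(x = 1:14))

all.folds <- lapply(lst, make_folds)
all.folds[[1]]
```

Each element of `all.folds` has the same shape as `custom.folds` above, so `all.folds[[k]]` could be passed as `folds` to xgb.cv together with the DMatrix built from the k-th data frame.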
First, you should split the data into dtrain (the first 40 rows) and dval (the last 10 rows). Second, you need xgb.train rather than xgb.cv.
So your code should be modified to something like this:
library(xgboost)
library(dplyr)
# your code regarding ddd
X <- ddd %>% select(-c(1:2))              # features: everything except the first two columns
Y <- ddd %>% select(c(1)) %>% pull()      # target: first column, as a plain vector
# Y is a vector, so it takes a single index: Y[1:40], not Y[1:40, ]
dtrain <- xgb.DMatrix(data = as.matrix(X[1:40, ]), label = as.numeric(Y[1:40]))
dval <- xgb.DMatrix(data = as.matrix(X[41:50, ]), label = as.numeric(Y[41:50]))
watchlist <- list(train = dtrain, val = dval)
model <- xgb.train(data = dtrain, watchlist = watchlist, nrounds = 30, eta = 0.1, max_depth = 5)
IMHO, with only 40+10 rows and such sparse features, there is little hope of obtaining good results with XGBoost.
I feel like this question has been asked before, but I can't seem to find an answer to it. Maybe my title is too vague, so feel free to change it.
I have one data frame, a, with ids that correspond to column names in data frame b. Both data frames are simplified versions of much larger data frames.
Here is data frame a:
a <- structure(list(V1 = structure(c(4L, 5L, 1L, 2L, 3L), .Label = c("GEN[D00105].GT",
"GEN[D00151].GT", "GEN[D00188].GT", "GEN[D86396].GT", "GEN[D86397].GT"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")
Here is data frame b:
b <- structure(list(`GEN[D01104].GT` = c(0, 0, 0, 0, 1, 0, 0, 2, 0,
1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 0), `GEN[D01312].GT` = c(1, 0,
2, 2, 0, 0, 0, 0, 0, 1, 1, 0, 0, 2, 0, 0, 2, 0, 0, 0), `GEN[D01878].GT` = c(0,
0, 0, 2, 0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0), `GEN[D01882].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0), `GEN[D01952].GT` = c(0,
0, 1, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0), `GEN[D01953].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 0, 0, 2, 0), `GEN[D02053].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0), `GEN[D00316].GT` = c(0,
0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 2, 0, 0), `GEN[D01827].GT` = c(0,
0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0), `GEN[D01881].GT` = c(0,
0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0, 2, 0, 2, 0), `GEN[D02044].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0), `GEN[D02085].GT` = c(0,
0, 0, 2, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0), `GEN[D02204].GT` = c(0,
0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0), `GEN[D02276].GT` = c(0,
0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0), `GEN[D02297].GT` = c(0,
0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0), `GEN[D02335].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2, 0, 0), `GEN[D02397].GT` = c(0,
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0), `GEN[D00856].GT` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0), `GEN[D00426].GT` = c(0,
0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0), `GEN[D02139].GT` = c(0,
0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0), `GEN[D02168].GT` = c(0,
0, 2, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0)), row.names = c(NA,
-20L), class = "data.frame")
I want to use the ids from data frame a to sum, row by row, the columns in data frame b whose names match those ids, if that makes sense.
In the past, I just did something like
b$affected.samples <- (b$`GEN[D86396].GT` + b$`GEN[D86397].GT` + b$`GEN[D00105].GT` + b$`GEN[D00151].GT` + b$`GEN[D00188].GT`)
which got annoying and took too much time, so I moved over to
b$affected.samples <- rowSums(b[,c(1:5)])
That isn't too bad for this example, but in my large data set the samples can be all over the place, and it's starting to take too much time to find where everything is. I was hoping there is a way to just use data frame a to sum the correct columns in data frame b.
Hopefully, I gave all the information you need! Let me know if you have any questions.
Thanks in advance!!
Extract the 'V1' column as a character vector, use it to select the columns of 'b' (assuming these column names are found in 'b'), and take the rowSums:
rowSums( b[as.character(a$V1)], na.rm = TRUE)
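If some ids in `a` might not be present among the column names of `b`, indexing with `b[as.character(a$V1)]` stops with an "undefined columns selected" error. A minimal sketch that keeps only the ids actually found in `b`; the toy data frames and the column names `s1`, `s2`, `s3`, `s9` are assumptions for illustration:

```r
# Toy data (assumption): ids in `a` are stored as a factor, as in the question.
a <- data.frame(V1 = factor(c("s1", "s3", "s9")))   # "s9" is not a column of b
b <- data.frame(s1 = c(0, 1, 2), s2 = c(1, 1, 1), s3 = c(2, 0, 1))

ids <- intersect(as.character(a$V1), names(b))      # keep only ids found in b
b$affected.samples <- rowSums(b[ids], na.rm = TRUE) # row-wise sum of s1 and s3
b$affected.samples
```

Comparing `ids` with `as.character(a$V1)` (for example with `setdiff`) shows which ids were dropped, which may be worth checking before trusting the sums.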
I have a data.frame like this:
> dput(head(dat))
structure(list(`Gene name` = c("at1g01050", "at1g01080", "at1g01090",
"at1g01220", "at1g01320", "at1g01420"), `1_1` = c(0, 0, 0, 0,
0, 0), `1_2` = c(0, 0, 0, 0, 0, 0), `1_3` = c(0, 2.2266502274762,
0, 0, 0, 0), `1_4` = c(0, 1.42835007256373, 0, 0, 0, 0), `1_5` = c(0,
1, 0, 0, 0, 0.680307288653971), `1_6` = c(0, 0.974694551708235,
0.0703315834738149, 0, 0, 1.5411058346636), `1_7` = c(1, 1.06166030205396,
0, 0, 0, 0), `1_8` = c(1, 1.07309874414745, 0.129442847788922,
0, 0, 0), `1_9` = c(1.83566164452602, 0.770848509662441, 1.16522133036595,
1.02360016370994, 0, 0), `1_10` = c(0, 0, 0.96367393959757, 0,
0, 0), `1_11` = c(0, 1, 1.459452636222, 0, 0.992067202742928,
0), `1_12` = c(0, 0, 0.670100384155585, 0, 0.461601636474094,
0), `1_13` = c(0, 0, 1.43074917909221, 0, 1.35246977730244, 0
), `1_14` = c(0, 0, 1.13052717277684, 0, 1.27971261718285, 0),
`1_15` = c(0, 0, 0, 0, 0, 0), `1_16` = c(0, 0, 1.02186950513655,
0, 0.937805171752374, 0), `1_17` = c(0, 0, 0, 0, 1.82226410514639,
0), `1_18` = c(0, 0, 1.2057581396188, 0, 1, 0), `1_19` = c(0,
0, 2.54080080087007, 0, 1.74014162763125, 0), `1_20` = c(0,
0, 0, 0, 0, 0), `1_21` = c(0, 0, 1.85335086627868, 0, 2.93605031878879,
0), `1_22` = c(0, 0, 0, 0, 0, 0), `1_23` = c(0, 0, 0, 0,
0, 0), `1_24` = c(0, 0.59685787388353, 4.74450895485671,
0, 1.64665192735547, 0), `1_25` = c(0, 0, 0, 0, 0, 0), `1_26` = c(0,
0, 0, 0, 0, 0), `1_27` = c(0, 1.70324142554566, 0, 0, 0,
0), `1_28` = c(0, 4.02915818089525, 0, 0, 0, 0), `1_29` = c(0,
1.10050253348262, 0, 0, 0, 1.78705663080963), `1_30` = c(0,
0, 0, 0, 0, 0), `1_31` = c(0.525193634811661, 1.19203674964562,
0, 0, 0, 0), `1_32` = c(0.949695564218912, 0.511935958918944,
0.698256748091399, 0.924419021307232, 0, 0), `1_33` = c(1,
0.392202418854686, 0.981531026331928, 1, 0, 0), `1_34` = c(0,
0, 1.04480642952605, 0, 0, 0), `1_35` = c(0.875709646300199,
0.416787083481068, 0.910412293707794, 0, 0.931813162802324,
0), `1_36` = c(0.235817844851986, 0, 0.695496044366791, 0,
0, 0), `1_37` = c(0, 0, 0, 0, 0, 0), `1_38` = c(0, 0, 0,
0, 0, 0), `1_39` = c(0, 0, 0, 0, 0, 0), `1_40` = c(0, 0.426301584359177,
1.05916031917965, 0, 1.11716924423855, 0), `1_41` = c(0,
0, 0, 0, 0, 0), `1_42` = c(0, 0, 0, 0, 0, 0), `1_43` = c(0,
0, 0, 0, 0, 0), `1_44` = c(0, 0.817605484758179, 1, 0, 1,
0), `1_45` = c(0, 0, 0, 0, 1.83706702696725, 0), `1_46` = c(0,
0, 0, 0, 0, 0), `1_48` = c(0, 0, 0, 0, 0, 0), `1_49` = c(0,
0, 0, 0, 0, 0), `1_50` = c(0, 0, 0, 0, 0, 0), `1_51` = c(0,
0.822966241998042, 0, 0, 0, 0), `1_52` = c(0, 1.38548267401525,
0, 0, 0, 0), `1_53` = c(0, 0.693090058304095, 0, 0, 0, 1.200664746484
), `1_54` = c(0, 7.58136662752864, 0, 0, 0, 0), `1_55` = c(0.519878111919004,
0.530809413647805, 0.343274113384907, 0, 0, 0), `1_56` = c(1.24511715957891,
0.545097856366912, 0.397440073804376, 0, 0, 0), `1_57` = c(1.26748496499576,
0.502893153188496, 1, 1.09278985531586, 0, 0), `1_58` = c(0.696198684496234,
0.68197003689249, 1.30108437738319, 0.778091049180591, 0.533017938104689,
0), `1_59` = c(1.15255606344999, 0.294294436704185, 1.07862692616479,
1, 0.250091116406616, 0), `1_60` = c(1.95634163405497, 0,
1.1602014253913, 0, 0, 0), `1_61` = c(1.09287167009628, 0,
2.05939536537347, 1.08165521287259, 0.68027384701565, 0),
`1_62` = c(0.791776166968497, 0, 0.846107162142824, 0, 0.77013323652256,
0), `1_63` = c(0.378787010943447, 0.391876271945063, 0.623223753921758,
0, 0.651918444771296, 0), `1_64` = c(0.189585762007804, 0.361452381684218,
0.799519726870751, 0, 1.06818683719768, 0), `1_65` = c(0,
0, 2.5212953775211, 0, 0, 0), `1_66` = c(0, 0, 0, 0, 0, 0
), `1_67` = c(0, 0, 0, 0, 2.44827717262786, 0), `1_68` = c(0,
0, 0, 0, 0, 0), `1_69` = c(0, 0, 0, 0, 0, 0), `1_70` = c(0,
0, 2.36142611074334, 0, 2.391093649557, 0), `1_71` = c(0,
0, 0.35565044656798, 0, 0, 0), `1_72` = c(0, 0, 5.86951313801941,
0, 0, 0)), .Names = c("Gene name", "1_1", "1_2", "1_3", "1_4",
"1_5", "1_6", "1_7", "1_8", "1_9", "1_10", "1_11", "1_12", "1_13",
"1_14", "1_15", "1_16", "1_17", "1_18", "1_19", "1_20", "1_21",
"1_22", "1_23", "1_24", "1_25", "1_26", "1_27", "1_28", "1_29",
"1_30", "1_31", "1_32", "1_33", "1_34", "1_35", "1_36", "1_37",
"1_38", "1_39", "1_40", "1_41", "1_42", "1_43", "1_44", "1_45",
"1_46", "1_48", "1_49", "1_50", "1_51", "1_52", "1_53", "1_54",
"1_55", "1_56", "1_57", "1_58", "1_59", "1_60", "1_61", "1_62",
"1_63", "1_64", "1_65", "1_66", "1_67", "1_68", "1_69", "1_70",
"1_71", "1_72"), row.names = c(NA, 6L), class = "data.frame")
This is the code I use to calculate the mean of the 3 replicates in the data frame:
## Calculating the mean of 3 "replicates"
ind <- c(1, 25, 49)
dat2 <- dat[-1]
tbl_end <- cbind(dat[1], sapply(0:23, function(i) rowMeans(dat2[ind+i])))
This is the error I get:
Error in `[.data.frame`(dat2, ind + i) : undefined columns selected
Called from: eval(substitute(browser(skipCalls = pos), list(pos = 9 - frame)),
envir = sys.frame(frame))
I have 71 columns of results, but there should be 72 (24 fractions times 3 replicates), so one column is missing. I have no idea why, but I have to work around it. There is no 1_47, which should be grouped with 1_23 and 1_71. Do you have any idea how I can edit my function so that it ignores fraction 1_47 and still gives the mean of 1_23 and 1_71?
Why not just add in a dummy column for 1_47? That will make your data more regular and make it much easier to extract the indexes you need. To do this, try:
# `1_47` needs backticks because it is not a syntactically valid R name
dat2 <- cbind(dat[1:47], `1_47` = rep(NA, nrow(dat)), dat[48:72])
ind <- c(1, 25, 49)
tbl_end <- cbind(dat[1], sapply(0:23, function(i) rowMeans(dat2[ind + i + 1], na.rm = TRUE)))
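An alternative that avoids padding is to group the columns by the fraction number encoded in their names, so a missing replicate is simply left out of that fraction's mean. A minimal sketch, assuming the column names follow the `1_<n>` pattern shown above and that replicates of the same fraction are a fixed number of columns apart (24 in the real data); the toy data and the names `num`, `frac`, and `n_frac` are illustrative only:

```r
# Toy data (assumption): 2 fractions x 3 replicates, with column 1_4 missing.
dat <- data.frame(gene = c("g1", "g2"),
                  `1_1` = c(1, 2), `1_2` = c(0, 4), `1_3` = c(3, 2),
                  `1_5` = c(5, 6), `1_6` = c(2, 8),
                  check.names = FALSE)
n_frac <- 2                                         # 24 in the real data

num  <- as.integer(sub("^1_", "", names(dat)[-1]))  # 1, 2, 3, 5, 6
frac <- (num - 1) %% n_frac + 1                     # fraction each column belongs to

# Mean over whichever replicate columns exist for each fraction.
means <- sapply(split(names(dat)[-1], frac),
                function(cols) rowMeans(dat[cols]))
tbl_end <- cbind(dat[1], means)
tbl_end
```

Because the grouping is derived from the names rather than from column positions, it should not matter where the gap is or how many columns are missing.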