The majority of information I can find on side-by-side stacked barplots deals with cases where the same set of side-by-side bars is repeated for each category along the x-axis - see: 1, 2, 3, 4, 5, 6. These examples use either base R's barplot() with beside = TRUE or ggplot2 with position = "dodge".
I have a more complex example which I believe will require faceting like these two examples: 7 & 8.
Quick background (for those interested in the why?):
I'm trying to assess the efficiency of a proteomics protocol that enriches for chromatin by comparing its proportion of nuclear proteins to the proportion found in the core/whole proteome experiments for 4 cell lines. To do this I used The Human Protein Atlas to annotate proteins by their subcellular location and compared nuclear proteins from the chromatin-enrichment data to those from the whole proteome data. However, the chromatin-enrichment protocol was 1D-shotgun while the whole proteome data was 2D-shotgun with 50 fractions. In layman's terms this means the whole/core proteome data comes from a more expensive experiment done at higher coverage. Therefore, it wouldn't make sense to look at absolute numbers, because the overall number of proteins found would be higher in the whole proteome pull-downs (see figure: absolute protein comparison sketch). To circumvent this issue I divided by the total number of proteins found in each pull-down to get the relative proportion of proteins from each subcellular location.
Using these relative proportions I've produced a stacked barplot of the data in my gist with the following code:
df1 <- read.csv("data.csv")                     # Load data.frame of the data
df2 <- melt(df1,                                # Reshape the data from
            id.vars = "subcellular_location",   # wide format into long format
            variable.name = "cell_line",        # (i.e. tidy data)
            value.name = "relative_proportion")
For some reason this didn't change the variable name or value name (headers): they are still called "variable" and "value". So I had to rename the column headers as follows:
names(df2) <- c("subcellular_location", "cell_line", "relative_proportion")
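A possible explanation, offered as an assumption since the question doesn't show which package was attached: variable.name and value.name are arguments of melt() from reshape2, while as far as I know the older reshape package's melt() calls the first one variable_name, so the arguments above may simply have been ignored. The sketch below uses a made-up toy table (not the gist data) and calls reshape2::melt() explicitly to show the renaming working in one step:

```r
# Toy wide table standing in for df1 (values are made up for illustration)
wide <- data.frame(
  subcellular_location = c("cytosol", "nucleus"),
  HAP1_P5242 = c(0.6, 0.4),
  K562_P535  = c(0.7, 0.3)
)

# Calling melt() through the reshape2 namespace avoids picking up a
# different melt() from another attached package
long <- reshape2::melt(
  wide,
  id.vars       = "subcellular_location",
  variable.name = "cell_line",
  value.name    = "relative_proportion"
)

names(long)
# "subcellular_location" "cell_line" "relative_proportion"
```

If this matches your situation, the separate names(df2) <- ... step becomes unnecessary.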
As there are many subcellular locations, I needed to add custom colors; furthermore, I grouped similar locations by color (e.g. nuclear in blue).
library(ggplot2)

# y must match the column name set above (relative_proportion)
p <- ggplot() +
  geom_bar(aes(x = cell_line, y = relative_proportion, fill = subcellular_location),
           data = df2, stat = "identity")
p +
  coord_flip() +
  scale_fill_manual(values = c("#bd5db0","#9ae17c", "#be0024", "#7388ff", "#c456b7",
                               "#8ed470", "#7ec361", "#7d7304", "#f87a00", "#d543c7",
                               "#bead47", "#d148c3", "#da8836", "#e28504", "#d93eca",
                               "#c720b9", "#bc07ae", "#a40098", "#9a008e", "#e8d448",
                               "#104ed7", "#2c4ecc", "#00428c", "#393c6d", "#173b8f",
                               "#3f4c96", "#9ba2f5", "#727bcc", "#e59c5f", "#790000",
                               "#045d00", "#f9ad6f"))
See image here: stacked barplot
The core proteome pull-downs are highlighted in yellow. Ideally what I would like to do is facet this barplot into 4 sections - one for each cell line. I followed the instructions from reference 7 for faceting but am getting an error.
First I split my dataframe into 4 separate tidy dataframes (e.g. below):
K562 <- read.csv("K562-relative.csv")
K562 <- melt(K562, id.vars = "subcellular_location") # Reshape the data into tidy form
names(K562) <- c("subcellular_location", "cell_line", "relative_proportion")
etc.
Then I created a vector for cell line:
cell <- sample(c("HAP1","K562","A673","MDS"))
When I try the following code I get an error:
ref_by_cell <- data.frame(HAP1 = HAP1, K562 = K562, A673 = A673, MDS = MDS, cell = cell)
Error in data.frame(HAP1 = HAP1, K562 = K562, A673 = A673, MDS = MDS,
arguments imply differing number of rows: 576, 544, 64, 4
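The error arises because data.frame() binds its arguments side by side as columns, so every argument needs the same number of rows (here they have 576, 544, 64 and 4). For faceting, the tables need to be stacked by rows instead, with one extra column saying which cell line each row belongs to. A minimal sketch of that idea, using small made-up tables in place of the real HAP1 and K562 data frames:

```r
library(ggplot2)

# Toy stand-ins for the per-cell-line long tables (made-up values)
HAP1 <- data.frame(
  subcellular_location = c("cytosol", "nucleus", "cytosol", "nucleus"),
  cell_line            = c("HAP1_a", "HAP1_a", "HAP1_b", "HAP1_b"),
  relative_proportion  = c(0.6, 0.4, 0.5, 0.5)
)
K562 <- data.frame(
  subcellular_location = c("cytosol", "nucleus"),
  cell_line            = c("K562_a", "K562_a"),
  relative_proportion  = c(0.7, 0.3)
)

# Label each table with its cell line, then stack the rows
HAP1$cell <- "HAP1"
K562$cell <- "K562"
ref_by_cell <- rbind(HAP1, K562)

# One facet row per cell line; free scales drop unused bars in each panel
p <- ggplot(ref_by_cell,
            aes(x = cell_line, y = relative_proportion, fill = subcellular_location)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(cell ~ ., scales = "free_y", space = "free_y")
```

Because rbind() only requires matching column names, the tables can have any number of rows.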
I would appreciate any help with faceting or alternative ideas for displaying this information.
Thank you!
I'm not entirely sure what you want, but if you want to facet by the first part of each cell_line value...
library(dplyr)    # for %>% and mutate()
library(ggplot2)

# add faceting variable to df2: everything before the first "." or "_"
df2 <- df2 %>%
  mutate(cell = stringi::stri_extract_first_regex(cell_line, "^[^\\.|_]+"))
# facet by cell, specifying free scales / space on the y-axis
ggplot(data = df2,
       aes(x = cell_line, y = relative_proportion, fill = subcellular_location)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_grid(cell~., scales = "free_y", space = "free_y") +
  scale_fill_manual(values = c("#bd5db0","#9ae17c", "#be0024", "#7388ff", "#c456b7",
                               "#8ed470", "#7ec361", "#7d7304", "#f87a00", "#d543c7",
                               "#bead47", "#d148c3", "#da8836", "#e28504", "#d93eca",
                               "#c720b9", "#bc07ae", "#a40098", "#9a008e", "#e8d448",
                               "#104ed7", "#2c4ecc", "#00428c", "#393c6d", "#173b8f",
                               "#3f4c96", "#9ba2f5", "#727bcc", "#e59c5f", "#790000",
                               "#045d00", "#f9ad6f")) +
  theme_bw() +
  theme(strip.text.y = element_text(angle = 0))
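To check what the faceting variable ends up as, here is a small base R sketch of the same extraction using sub() instead of stringi. This is an equivalent I'm substituting for illustration, not what the code above runs. The column names are taken from the question's data:

```r
# A few column names taken from the question's data
cell_line <- c("HAP1_P5242", "HAP1.wt_P8255.1", "MDS.L_P7246.1",
               "A673_P6591", "K562_P535")

# Drop everything from the first "." or "_" onwards
cell <- sub("[._].*$", "", cell_line)
cell
# "HAP1" "HAP1" "MDS" "A673" "K562"
```

Note that the wt/kd variants of HAP1 all collapse into a single "HAP1" facet, which may or may not be what you want.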
Data (copied from your gist link; next time please use dput so that others can reproduce your example more easily):
> dput(df1)
structure(list(subcellular_location = c("actinFilaments", "aggresome",
"cellJunctions", "centrosome", "cytokineticBridge", "cytoplasmicBodies",
"cytosol", "endoplasmicReticulum", "endosome", "focalAdhesion",
"golgiApparatus", "intermediateFilaments", "lipidDroplets", "lysosomes",
"microtubuleEnds", "microtubuleOrganizingCenter", "microtubules",
"midbodyRing", "midbody", "mitochondria", "mitoticSpindle", "nuclearBodies",
"nuclearMembrane", "nuclearSpeckles", "nucleliFibrallar", "nucleoli",
"nucleoplasm", "nucleus", "peroxisomes", "plasmaMembrane", "rodsAndRings",
"vesicles"), HAP1_P5242 = c(0.009581882, 0.000338753, 0.011033682,
0.015824623, 0.003774681, 0.00232288, 0.227013163, 0.024535424,
0.001258227, 0.005807201, 0.04229578, 0.008710801, 0.0014518,
0.001064654, 0.00029036, 0.006484708, 0.013646922, 0.000483933,
0.001064654, 0.063637244, 0.00087108, 0.02303523, 0.013646922,
0.024535424, 0.013259775, 0.054587689, 0.195509098, 0.101480836,
0.00174216, 0.058072009, 0.000822687, 0.071815718), HAP1.wt_P8255.1 = c(0.0176,
0, 0.0032, 0.0096, 0, 0.0032, 0.3664, 0.0912, 0.008, 0.0032,
0.0128, 0, 0, 0.0064, 0, 0.0032, 0.0288, 0, 0, 0.0528, 0, 0.0128,
0.0048, 0.0096, 0, 0.0496, 0.1552, 0.0576, 0, 0.064, 0.0016,
0.0384), HAP1.wt_P8255.2 = c(0.013179572, 0, 0, 0.008237232,
0, 0.004942339, 0.36738056, 0.098846788, 0.003294893, 0.003294893,
0.016474465, 0.001647446, 0, 0.004942339, 0, 0.003294893, 0.029654036,
0, 0, 0.05107084, 0, 0.009884679, 0.004942339, 0.011532125, 0,
0.044481054, 0.154859967, 0.05601318, 0, 0.064250412, 0.001647446,
0.046128501), HAP1.wt_P8254.1 = c(0.012841091, 0, 0, 0.006420546,
0.001605136, 0.004815409, 0.362760835, 0.08988764, 0.001605136,
0.004815409, 0.017656501, 0.003210273, 0, 0.003210273, 0, 0.004815409,
0.032102729, 0, 0, 0.04975923, 0, 0.011235955, 0.003210273, 0.011235955,
0, 0.04975923, 0.160513644, 0.060995185, 0, 0.069020867, 0.001605136,
0.036918138), HAP1.wt_P8254.2 = c(0.015873016, 0, 0, 0.00952381,
0.001587302, 0.004761905, 0.357142857, 0.103174603, 0.003174603,
0.003174603, 0.014285714, 0.001587302, 0.001587302, 0.003174603,
0, 0.003174603, 0.03015873, 0, 0, 0.055555556, 0, 0.012698413,
0.006349206, 0.012698413, 0, 0.050793651, 0.152380952, 0.063492063,
0, 0.057142857, 0.001587302, 0.034920635), HAP1.kd_P8253.1 = c(0,
0, 0, 0, 0, 0, 0.270270271, 0.027027028, 0, 0, 0.027027028, 0,
0, 0, 0, 0, 0.054054053, 0, 0, 0, 0, 0, 0, 0.054054053, 0, 0.054054053,
0.405405405, 0.027027028, 0, 0, 0, 0.081081081), HAP1.kd_P8253.2 = c(0.021381579,
0, 0.003289474, 0.013157895, 0, 0.003289474, 0.368421053, 0.100328947,
0.004934211, 0.003289474, 0.013157895, 0.003289474, 0, 0.006578947,
0, 0.001644737, 0.027960526, 0, 0, 0.046052632, 0, 0.011513158,
0.004934211, 0.009868421, 0, 0.050986842, 0.15131579, 0.050986842,
0, 0.065789474, 0.001644737, 0.036184211), HAP1.kd_P8252.1 = c(0.018518518,
0, 0.00308642, 0.010802469, 0, 0.00462963, 0.354938272, 0.092592593,
0.00617284, 0.00462963, 0.018518518, 0, 0.00154321, 0.00462963,
0, 0.00617284, 0.026234568, 0, 0, 0.043209877, 0, 0.015432099,
0.00308642, 0.015432099, 0, 0.049382716, 0.154320988, 0.061728395,
0, 0.063271605, 0.00154321, 0.040123457), HAP1.kd_P8252.2 = c(0.012965964,
0, 0, 0.011345219, 0.001620746, 0.003241491, 0.367909238, 0.095623987,
0.003241491, 0.004862237, 0.017828201, 0.003241491, 0, 0.004862237,
0, 0.003241491, 0.030794165, 0, 0, 0.051863857, 0, 0.016207455,
0.003241491, 0.009724473, 0, 0.04376013, 0.1636953, 0.055105348,
0, 0.064829822, 0.001620746, 0.02917342), HAP1.kd_P8249.1 = c(0.010309278,
0.001718213, 0, 0.006872852, 0, 0.005154639, 0.197594502, 0.091065292,
0.001718213, 0, 0.013745704, 0.005154639, 0.001718213, 0.001718213,
0, 0, 0.027491409, 0, 0, 0.054982818, 0, 0.017182131, 0.013745704,
0.060137457, 0, 0.082474227, 0.240549828, 0.094501718, 0, 0.04467354,
0, 0.027491409), HAP1.kd_P8249.2 = c(0.010752688, 0, 0, 0.007168459,
0, 0.003584229, 0.20609319, 0.084229391, 0.001792115, 0, 0.007168459,
0.005376344, 0, 0.001792115, 0, 0, 0.03046595, 0, 0, 0.069892473,
0, 0.019713262, 0.014336918, 0.064516129, 0, 0.08781362, 0.224014337,
0.096774194, 0, 0.039426523, 0, 0.025089606), HAP1.kd_P8248.1 = c(0.007207207,
0, 0.001801802, 0.007207207, 0, 0.003603604, 0.198198198, 0.099099099,
0, 0, 0.009009009, 0.007207207, 0, 0, 0, 0.001801802, 0.025225225,
0, 0, 0.061261261, 0, 0.021621622, 0.016216216, 0.068468468,
0, 0.079279279, 0.234234234, 0.093693694, 0.001801802, 0.028828829,
0, 0.034234234), HAP1.kd_P8248.2 = c(0.005272408, 0.001757469,
0, 0.008787346, 0, 0.005272408, 0.202108963, 0.09314587, 0, 0,
0.014059754, 0.005272408, 0, 0, 0, 0.001757469, 0.029876977,
0, 0, 0.056239016, 0, 0.021089631, 0.014059754, 0.065026362,
0, 0.086115993, 0.228471002, 0.094903339, 0.001757469, 0.036906854,
0, 0.028119508), HAP1.wt_P8247.1 = c(0.016333938, 0, 0, 0.001814882,
0, 0.005444646, 0.197822141, 0.09800363, 0.001814882, 0, 0.007259528,
0.005444646, 0, 0.001814882, 0, 0.001814882, 0.030852995, 0,
0, 0.061705989, 0, 0.021778584, 0.012704174, 0.065335753, 0,
0.087114338, 0.234119782, 0.096188748, 0, 0.029038113, 0, 0.023593466
), HAP1.wt_P8247.2 = c(0.011173184, 0, 0, 0.003724395, 0, 0.003724395,
0.197392924, 0.098696462, 0.001862197, 0, 0.009310987, 0.005586592,
0, 0.001862197, 0, 0.001862197, 0.029795158, 0, 0, 0.059590317,
0, 0.018621974, 0.013035382, 0.067039106, 0, 0.08566108, 0.240223464,
0.096834264, 0, 0.029795158, 0, 0.024208566), HAP1.wt_P8246.1 = c(0.008880995,
0, 0, 0.005328597, 0.003552398, 0.003552398, 0.195381883, 0.090586146,
0, 0, 0.005328597, 0.008880995, 0, 0, 0, 0.001776199, 0.030195382,
0, 0, 0.051509769, 0, 0.023090586, 0.01598579, 0.063943162, 0,
0.097690941, 0.245115453, 0.097690941, 0, 0.026642984, 0, 0.024866785
), HAP1.wt_P8246.2 = c(0.009025271, 0, 0.001805054, 0.005415162,
0, 0.003610108, 0.19133574, 0.088447653, 0, 0.001805054, 0.012635379,
0.007220217, 0, 0, 0, 0, 0.028880866, 0, 0, 0.048736462, 0, 0.019855596,
0.027075812, 0.066787004, 0, 0.084837545, 0.241877256, 0.09566787,
0, 0.028880866, 0, 0.036101083), HAP1_P7964.1 = c(0.010040907,
0, 0.007437709, 0.017106731, 0.002975084, 0.003346969, 0.211230941,
0.040535515, 0.002603198, 0.005950167, 0.023056898, 0.00818148,
0.001115656, 0.002231313, 0.000743771, 0.005950167, 0.014503533,
0, 0.000743771, 0.065451841, 0.001115656, 0.023056898, 0.018966158,
0.031610264, 0, 0.065451841, 0.223875046, 0.105243585, 0.002603198,
0.051692079, 0.001487542, 0.051692079), MDS_P7246 = c(0.008080031,
0.000384763, 0.005386687, 0.012889573, 0.002885725, 0.002500962,
0.204116968, 0.035013467, 0.002116199, 0.00461716, 0.030973451,
0.008272412, 0.001539053, 0.001539053, 0.000192382, 0.003270489,
0.01250481, 0.000192382, 0.000961908, 0.082724125, 0.000577145,
0.025971528, 0.018661023, 0.030011543, 0.013851481, 0.065217391,
0.214313197, 0.108310889, 0.002116199, 0.048095421, 0.000577145,
0.052135437), MDS.L_P7246.1 = c(0.008308003, 0.000202634, 0.006079027,
0.013373858, 0.003039513, 0.002228976, 0.207294805, 0.036068891,
0.002026342, 0.004660587, 0.030800401, 0.008308003, 0.001621074,
0.001621074, 0.000202634, 0.003039513, 0.012563322, 0.000202634,
0.001013171, 0.081458956, 0.000405268, 0.026950351, 0.018034445,
0.030395133, 1.34e-07, 0.065450852, 0.218642321, 0.109017209,
0.002228976, 0.050050652, 0.000810537, 0.053900702), A673_P6591 = c(0.01081944,
0.000354736, 0.008158922, 0.013125222, 0.003015254, 0.003015254,
0.202554097, 0.035118836, 0.002128414, 0.006207875, 0.036183044,
0.006917347, 0.000886839, 0.001596311, 0.000709471, 0.004788932,
0.013657325, 0.000177368, 0.001064207, 0.069882937, 0.000709471,
0.025186236, 0.015253636, 0.029265697, 0.013125222, 0.06385243,
0.207875133, 0.106598084, 0.002305782, 0.056580348, 0.000354736,
0.058531394), A673_P6591.1 = c(0.011204482, 0.000186741, 0.008403361,
0.01363212, 0.003174603, 0.002614379, 0.203361345, 0.036414566,
0.002054155, 0.006162465, 0.036788049, 0.006722689, 0.000933707,
0.001680672, 0.000746965, 0.004668534, 0.01363212, 0.000186741,
0.001120448, 0.069467787, 0.000560224, 0.025957049, 0.014752568,
0.029505135, 0, 0.064239029, 0.212885154, 0.108496732, 0.002427638,
0.05751634, 0.000373483, 0.060130719), K562_P535 = c(0.008616975,
0.000143616, 0.007755278, 0.011202068, 0.002441476, 0.003303174,
0.278471923, 0.038776389, 0.00229786, 0.006031883, 0.033031739,
0.00689358, 0.003159558, 0.00229786, 0.000287233, 0.004164871,
0.012638231, 0.000287233, 0.000574465, 0.090621858, 0.000574465,
0.015941405, 0.009478673, 0.021255206, 0.009909522, 0.047536981,
0.181243717, 0.083871894, 0.002441476, 0.055292259, 0.00114893,
0.0583082), K562_P5494.1 = c(0.008692853, 0.000321957, 0.008692853,
0.012395364, 0.002736639, 0.002736639, 0.212813909, 0.032356729,
0.001448809, 0.004990341, 0.033000644, 0.007405023, 0.001448809,
0.001448809, 0.000160979, 0.004990341, 0.013039279, 0, 0.001126851,
0.074050225, 0.000643915, 0.027849324, 0.01545396, 0.029459111,
0, 0.065035415, 0.216355441, 0.111719253, 0.002092724, 0.051996137,
0.000804894, 0.054732775), K562_P5464.1 = c(0.009412153, 0.000495376,
0.008256275, 0.013705416, 0.002476882, 0.002476882, 0.20673712,
0.032529723, 0.001486129, 0.004788639, 0.034180978, 0.007595773,
0.001155878, 0.001321004, 0.000330251, 0.005284016, 0.012714663,
0.000330251, 0.000990753, 0.073811096, 0.000660502, 0.02823646,
0.016017173, 0.029722589, 0, 0.06489432, 0.217635403, 0.110634082,
0.002146631, 0.052179657, 0.000825627, 0.056968296), K562_P5359.1 = c(0.00740349,
0, 0.005288207, 0.005288207, 0.001057641, 0.003172924, 0.225806452,
0.063987308, 0.003172924, 0.004230566, 0.022739291, 0.005817028,
0.002644104, 0.002644104, 0, 0.003701745, 0.013749339, 0, 0.000528821,
0.099947118, 0, 0.015864622, 0.020095188, 0.037546272, 0, 0.080909572,
0.196192491, 0.090957166, 0.002115283, 0.040719196, 0.000528821,
0.043892121), K562_P5359.2 = c(0.007903056, 0, 0.004741834, 0.006322445,
0.001580611, 0.002107482, 0.223393045, 0.062170706, 0.002634352,
0.004741834, 0.023709168, 0.005795574, 0.002634352, 0.002634352,
0, 0.003688093, 0.014752371, 0, 0, 0.103266596, 0, 0.017386723,
0.021601686, 0.036354057, 0, 0.079030558, 0.192834563, 0.090621707,
0.002107482, 0.042676502, 0.00052687, 0.044783983), K562_P5358.1 = c(0.007462687,
0, 0.00533049, 0.005863539, 0.001599147, 0.003731343, 0.229744136,
0.064498934, 0.003198294, 0.004264392, 0.024520256, 0.005863539,
0.003198294, 0.002132196, 0, 0.003731343, 0.015458422, 0, 0,
0.101812367, 0, 0.015991471, 0.019189765, 0.036247335, 0, 0.077292111,
0.191364606, 0.087953092, 0.002132196, 0.041577825, 0.000533049,
0.045309168), K562_P5358.2 = c(0.006546645, 0, 0.005455537, 0.007637752,
0.001636661, 0.003273322, 0.225859247, 0.063829787, 0.003273322,
0.003818876, 0.024549918, 0.007092199, 0.002182215, 0.002727769,
0, 0.003818876, 0.015275505, 0, 0, 0.106382979, 0, 0.016912166,
0.01745772, 0.038188762, 0, 0.074195308, 0.195853792, 0.089470813,
0.001636661, 0.040370977, 0.000545554, 0.042007638), K562_P5357.1 = c(0.007057546,
0, 0.004885993, 0.00597177, 0.001085776, 0.003257329, 0.231813246,
0.06514658, 0.003257329, 0.004343105, 0.024972856, 0.00597177,
0.003257329, 0.002171553, 0, 0.003800217, 0.014115092, 0, 0,
0.118892508, 0, 0.01194354, 0.016829533, 0.030944625, 0, 0.07383279,
0.184039088, 0.086862106, 0.002171553, 0.049402823, 0.000542888,
0.043431053), K562_P5357.2 = c(0.008086253, 0, 0.003773585, 0.006469003,
0.001617251, 0.003234501, 0.23180593, 0.063072776, 0.003773585,
0.003234501, 0.023180593, 0.005929919, 0.003234501, 0.002695418,
0, 0.003773585, 0.014555256, 0, 0, 0.116442049, 0, 0.01509434,
0.017789757, 0.035579515, 0, 0.071698113, 0.189757412, 0.085714286,
0.002156334, 0.044743935, 0.000539084, 0.042048518), K562_P5356.1 = c(0.006292906,
0, 0.005148741, 0.004576659, 0.001716247, 0.002860412, 0.215675057,
0.070366133, 0.003432494, 0.003432494, 0.025743707, 0.005720824,
0.003432494, 0.002860412, 0, 0.004576659, 0.016018307, 0, 0,
0.127002288, 0, 0.01201373, 0.016590389, 0.03375286, 0, 0.076659039,
0.183638444, 0.086956522, 0.001716247, 0.044622426, 0.000572082,
0.044622426), K562_P5356.2 = c(0.00755814, 0, 0.004069767, 0.004651163,
0.001744186, 0.002325581, 0.21627907, 0.070930233, 0.002906977,
0.004069767, 0.025, 0.005813953, 0.003488372, 0.002906977, 0,
0.004069767, 0.015697674, 0, 0, 0.125581395, 0, 0.013953488,
0.015697674, 0.035465116, 0, 0.073837209, 0.190697674, 0.08372093,
0.001744186, 0.045348837, 0.000581395, 0.041860465), K562_P5355.1 = c(0.009320175,
0, 0.003289474, 0.007675439, 0.001096491, 0.002741228, 0.285087719,
0.059210526, 0.003837719, 0.006030702, 0.023026316, 0.003837719,
0.003837719, 0.003289474, 0, 0.003289474, 0.016995614, 0.000548246,
0, 0.094298246, 0, 0.014254386, 0.012061404, 0.026315789, 0,
0.052631579, 0.175438596, 0.08497807, 0.001096491, 0.057017544,
0.000548246, 0.048245614), K562_P5355.2 = c(0.008210181, 0, 0.004378763,
0.009304871, 0.001094691, 0.002189382, 0.280788177, 0.056376574,
0.003284072, 0.005473454, 0.024630542, 0.004378763, 0.003284072,
0.003284072, 0, 0.004378763, 0.016967707, 0, 0, 0.100164204,
0, 0.014778325, 0.012588944, 0.028461959, 0, 0.053639847, 0.172961138,
0.084838533, 0.001642036, 0.054187192, 0.000547345, 0.048166393
), K562_P5269.1 = c(0.007308161, 0, 0.003045067, 0.007917174,
0.001218027, 0.00365408, 0.228989038, 0.071863581, 0.00365408,
0.004263094, 0.017661389, 0.007308161, 0.002436054, 0.003045067,
0, 0.002436054, 0.017052375, 0, 0, 0.107186358, 0, 0.015834348,
0.020097442, 0.033495737, 0, 0.085261876, 0.18453106, 0.091961023,
0.002436054, 0.040194884, 0.001218027, 0.03593179), K562_P5269.2 = c(0.006234414,
0, 0.00436409, 0.006234414, 0.002493766, 0.002493766, 0.224438903,
0.073566085, 0.003117207, 0.002493766, 0.018703242, 0.006857855,
0.002493766, 0.003117207, 0, 0.001246883, 0.015586035, 0, 0,
0.109725686, 0, 0.015586035, 0.018703242, 0.034289277, 0, 0.082294264,
0.195760598, 0.092892768, 0.003117207, 0.039276808, 0.000623441,
0.034289277), K562_P5268.1 = c(0.004635762, 0, 0.00397351, 0.007284768,
0.001986755, 0.002649007, 0.214569536, 0.071523179, 0.00397351,
0.002649007, 0.01986755, 0.007284768, 0.003311258, 0.003311258,
0, 0.003311258, 0.016556291, 0, 0, 0.104635762, 0, 0.017880795,
0.018543046, 0.039735099, 0, 0.090066225, 0.195364238, 0.091390728,
0.001986755, 0.039735099, 0.000662252, 0.033112583), K562_P5268.2 = c(0.005242464,
0, 0.002621232, 0.00655308, 0.002621232, 0.00327654, 0.216251638,
0.070117955, 0.00327654, 0.00327654, 0.020969856, 0.008519004,
0.002621232, 0.001965924, 0, 0.002621232, 0.018348624, 0, 0,
0.108781127, 0, 0.015727392, 0.020314548, 0.040629096, 0, 0.087811271,
0.190039319, 0.090432503, 0.001965924, 0.040629096, 0.000655308,
0.034731324)), .Names = c("subcellular_location", "HAP1_P5242",
"HAP1.wt_P8255.1", "HAP1.wt_P8255.2", "HAP1.wt_P8254.1", "HAP1.wt_P8254.2",
"HAP1.kd_P8253.1", "HAP1.kd_P8253.2", "HAP1.kd_P8252.1", "HAP1.kd_P8252.2",
"HAP1.kd_P8249.1", "HAP1.kd_P8249.2", "HAP1.kd_P8248.1", "HAP1.kd_P8248.2",
"HAP1.wt_P8247.1", "HAP1.wt_P8247.2", "HAP1.wt_P8246.1", "HAP1.wt_P8246.2",
"HAP1_P7964.1", "MDS_P7246", "MDS.L_P7246.1", "A673_P6591", "A673_P6591.1",
"K562_P535", "K562_P5494.1", "K562_P5464.1", "K562_P5359.1",
"K562_P5359.2", "K562_P5358.1", "K562_P5358.2", "K562_P5357.1",
"K562_P5357.2", "K562_P5356.1", "K562_P5356.2", "K562_P5355.1",
"K562_P5355.2", "K562_P5269.1", "K562_P5269.2", "K562_P5268.1",
"K562_P5268.2"), class = "data.frame", row.names = c(NA, -32L
))
In my work I'm trying to find which genes usually occur together. I set up some experiments and am now trying to analyze the data. I have already written a script for analyzing it, but it's still not enough.
What I want to do this time is to analyze a couple of tables and establish which genes are usually together, i.e. in the same cluster.
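To make "in the same cluster" concrete, here is a rough sketch under the assumption that two genes count as together when they carry an identical Clusters label within one table. The toy table and gene names (geneA ... geneE) are hypothetical and only mimic the shape of the real tables:

```r
# Toy table: row names are gene ids, Clusters is a comma-separated label
tbl <- data.frame(
  Clusters  = c("10", "19,20", "19,20", "3,4", "19,20"),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  stringsAsFactors = FALSE
)

# Group gene ids by their cluster label
genes_by_cluster <- split(rownames(tbl), tbl$Clusters)

# Keep only labels shared by at least two genes
shared <- Filter(function(g) length(g) > 1, genes_by_cluster)
shared
```

Repeating this per table and intersecting the resulting groups would be one way to see which genes stay together across experiments.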
That's my data:
First table:
> dput(tbl_col_clu1[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `20` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `52.5` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `81` = c(0, 0, 0, 0,
0, 0, 0.64209043, 0, 0, 0, 0, 0, 0, 0, 0.636411741, 0.183490041,
0, 0, 0, 0), `110` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `140.5` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `189` = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.84958569, 0, 0, 0, 0, 0), `222.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0.37119221, 0, 0, 0, 1, 0, 0, 0, 0,
0), `278` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `340` = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `397` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `453.5` = c(0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `529` = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `580` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `630.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `683.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `735.5` = c(0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `784` = c(0,
0, 0, 0, 0, 0, 0, 0.399952462, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0.959211661, 1), `832` = c(0, 0.1266780707, 0, 0, 0, 0, 0, 0.2132893016,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0.959211661, 1), `882.5` = c(0,
0.12667807, 0, 0, 0, 1, 0, 0.08480435, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0.70163097), `926.5` = c(0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), `973` = c(0, 0.12621196, 0,
0, 0, 0, 0, 0.11813646, 0, 0, 0, 1, 0, 0, 0.59389934, 1, 0, 0,
0, 0), `1108` = c(0, 0.092444384, 0, 0, 0, 0, 0, 0.115758222,
0, 0, 0, 0.925835779, 0, 0, 1, 0.303482426, 0.848464317, 0, 0,
0), `1200` = c(0, 0.120055749, 0, 1, 0, 0, 0, 0.150055416, 0,
0, 0, 0.558015841, 0, 0, 0.796949668, 0.276321753, 1, 0, 0, 0
), Clusters = structure(c(1L, 64L, 45L, 102L, 11L, 77L, 170L,
55L, 59L, 316L, 316L, 98L, 90L, 77L, 232L, 178L, 101L, 50L, 51L,
51L), .Label = c("10", "10,13,15", "10,15", "10,15,16", "10,20,21,22,23,24",
"10,22,23,24", "11", "11,12,13,14,15", "11,12,13,14,15,16", "11,12,13,14,15,16,17",
"12", "12,13", "12,13,14", "12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17",
"12,13,14,15,16,17,18,19,20,21,22,23,24", "12,13,15", "12,13,17",
"13", "13,14", "13,14,15", "13,14,15,16", "13,14,15,16,17", "13,15",
"13,15,16,17", "14", "14,15", "14,15,16", "14,15,16,17", "14,15,16,17,18,19,20,21,22,23,24",
"14,19", "15", "15,16", "15,16,17", "15,16,17,18,19,20,21,22,23,24",
"15,16,17,19,20,21,22,23,24", "15,17", "15,17,24", "15,22,23,24",
"15,23", "15,24", "16", "16,17", "17", "17,18,19,20", "17,18,19,20,21,22,23,24",
"17,21,22,23,24", "18", "18,19", "18,19,20", "18,19,20,21", "18,19,20,21,22",
"18,19,20,21,22,23", "18,19,20,21,22,23,24", "18,19,21", "18,19,22,23",
"18,20", "19", "19,20", "19,20,21", "19,20,21,22", "19,20,21,22,23",
"19,20,21,22,23,24", "19,20,22", "19,20,22,23", "19,20,22,23,24",
"19,20,23", "19,21", "19,22", "19,23", "19,24", "2", "2,18,19,20",
"2,19,20", "2,3,4", "20", "20,21", "20,21,22", "20,21,22,23",
"20,21,22,23,24", "20,21,23", "20,22", "20,22,23", "20,22,23,24",
"20,22,24", "20,23", "20,23,24", "20,24", "21", "21,22", "21,22,23",
"21,22,23,24", "21,23,24", "21,24", "22", "22,23", "22,23,24",
"22,24", "23", "23,24", "24", "3", "3,10", "3,18,19,20", "3,18,19,20,21,22,23,24",
"3,19,20", "3,19,20,21", "3,19,20,22,23,24", "3,20,21,22,23,24",
"3,20,22,23,24", "3,21,23,24", "3,22,23,24", "3,22,24", "3,23",
"3,23,24", "3,24", "3,4", "3,4,10", "3,4,18,19", "3,4,18,19,20",
"3,4,18,19,20,21,22,23", "3,4,18,19,20,21,22,23,24", "3,4,19,20,21",
"3,4,21", "3,4,21,22,23", "3,4,21,22,23,24", "3,4,22,23", "3,4,22,23,24",
"3,4,22,24", "3,4,23,24", "3,4,24", "3,4,5", "3,4,5,10", "3,4,5,10,23,24",
"3,4,5,20", "3,4,5,22,23,24", "3,4,5,23,24", "3,4,5,24", "3,4,5,6",
"3,4,5,6,10", "3,4,5,6,20,22,23,24", "3,4,5,6,7", "3,4,5,6,7,10",
"3,4,5,6,7,24", "3,4,5,6,7,8", "3,4,5,6,7,8,10", "3,4,5,6,7,8,10,13",
"3,4,5,6,7,8,10,22,23,24", "3,4,5,6,7,8,12", "3,4,5,6,7,8,15",
"3,4,5,6,7,8,18,19,20,21,22,23,24", "3,4,5,6,7,8,22,23,24", "3,4,5,6,7,8,9,10",
"3,4,5,6,7,8,9,10,11,12", "3,4,5,6,7,8,9,10,11,12,13,14,15",
"3,4,5,6,7,8,9,10,11,12,13,14,15,16,17", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,4,5,6,7,8,9,10,11,14,15", "3,4,5,6,7,8,9,10,19,20,21,22,23,24",
"3,4,5,6,7,8,9,10,22,23,24", "3,4,6", "3,4,6,7,20,21,22,23,24",
"3,4,7", "3,4,7,8", "3,5,6,7,8", "3,5,8", "3,7", "3,7,19,20,22,23",
"4", "4,10", "4,10,24", "4,18,19,20", "4,19,20", "4,20,21,22",
"4,20,21,22,23,24", "4,20,22,23,24", "4,22,23,24", "4,23,24",
"4,24", "4,5", "4,5,10", "4,5,10,21", "4,5,10,23,24", "4,5,19,20,21,22,23",
"4,5,19,20,22,23,24", "4,5,20,21,22,23,24", "4,5,20,22,23,24",
"4,5,22,23,24", "4,5,24", "4,5,6", "4,5,6,10", "4,5,6,10,20,22,23,24",
"4,5,6,19", "4,5,6,22,23,24", "4,5,6,7", "4,5,6,7,10", "4,5,6,7,19,20,21,22,23,24",
"4,5,6,7,22,23,24", "4,5,6,7,8", "4,5,6,7,8,10", "4,5,6,7,8,10,19,20,21,22,23,24",
"4,5,6,7,8,10,20,21,22,23,24", "4,5,6,7,8,10,21,22,23,24", "4,5,6,7,8,10,22,23,24",
"4,5,6,7,8,10,23,24", "4,5,6,7,8,15", "4,5,6,7,8,17,18,19,20,21,22,23,24",
"4,5,6,7,8,19,20", "4,5,6,7,8,19,20,21,22,23,24", "4,5,6,7,8,20,21,22,23,24",
"4,5,6,7,8,21,22,23,24", "4,5,6,7,8,22,23,24", "4,5,6,7,8,9,10",
"4,5,6,7,8,9,10,11,12", "4,5,6,7,8,9,10,11,12,13,14,15", "4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "4,5,6,7,8,9,10,12,13",
"4,5,6,7,8,9,14,15,16", "4,5,7,9", "4,5,8,22", "4,6", "4,6,7,22,23,24",
"4,6,7,23,24", "4,6,7,8,15,17", "4,6,7,8,23,24", "4,7", "4,7,20,21",
"4,7,21,22,23,24", "4,7,8", "4,7,8,22,23,24", "5", "5,10", "5,17",
"5,18,19,20,21,22,23", "5,19,20,21,22,23,24", "5,20", "5,22,23,24",
"5,24", "5,6", "5,6,10", "5,6,7", "5,6,7,10", "5,6,7,10,19",
"5,6,7,22,23,24", "5,6,7,8", "5,6,7,8,10", "5,6,7,8,10,15", "5,6,7,8,10,22,23,24",
"5,6,7,8,15", "5,6,7,8,18,19,20,21,22,23,24", "5,6,7,8,21,22,23,24",
"5,6,7,8,22,23,24", "5,6,7,8,9", "5,6,7,8,9,10", "5,6,7,8,9,10,11,12,13",
"5,6,7,8,9,10,11,12,13,14,15", "5,6,7,8,9,12", "5,6,7,8,9,13",
"5,7", "5,7,8", "5,8", "6", "6,10", "6,21,22,23", "6,22", "6,22,23,24",
"6,7", "6,7,10,17", "6,7,22,23,24", "6,7,23,24", "6,7,24", "6,7,8",
"6,7,8,10", "6,7,8,13,14,15,16,17", "6,7,8,15", "6,7,8,19,20",
"6,7,8,20,21,22,23,24", "6,7,8,21,22,23,24", "6,7,8,23,24", "6,7,8,9",
"6,7,8,9,10", "6,7,8,9,10,11,12", "6,7,8,9,10,11,12,13,14,15,16,17",
"6,7,8,9,10,15,16", "6,7,8,9,10,18,19,20,21,22,23,24", "6,7,8,9,15",
"6,8", "7", "7,15", "7,15,17", "7,16,18,21", "7,17", "7,19,20",
"7,19,20,21,22", "7,20,21,22,23,24", "7,20,22,23,24", "7,22,23,24",
"7,24", "7,8", "7,8,10", "7,8,10,22,23,24", "7,8,13,15", "7,8,14",
"7,8,15", "7,8,15,16", "7,8,15,23", "7,8,20", "7,8,22", "7,8,23",
"7,8,9", "7,8,9,10", "7,8,9,13", "7,8,9,15,16,17", "8", "8,10",
"8,15", "8,17", "8,22", "8,24", "8,9", "8,9,10", "9", "9,10,11,12,13,14,15,16,17"
), class = "factor")), .Names = c("10", "20", "52.5", "81", "110",
"140.5", "189", "222.5", "278", "340", "397", "453.5", "529",
"580", "630.5", "683.5", "735.5", "784", "832", "882.5", "926.5",
"973", "1108", "1200", "Clusters"), row.names = c("at1g01050.1",
"at1g01080.1", "at1g01090.1", "at1g01220.1", "at1g01320.2", "at1g01420.1",
"at1g01710.1", "at1g01800.1", "at1g01920.2", "at1g01940.1", "at1g01960.1",
"at1g02020.2", "at1g02100.2", "at1g02140.1", "at1g02150.1", "at1g02500.2",
"at1g02560.1", "at1g02880.3", "at1g02920.1", "at1g02930.2"), class = "data.frame")
Second table:
> dput(tbl_col_clu2[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `20` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `52.5` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `81` = c(0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `110` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `140.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `189` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `222.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `278` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), `340` = c(0,
0, 0, 0, 0, 0, 0.583163048, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
1, 0.218194067), `397` = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0.63953839, 0, 1, 0, 0, 0, 1), `453.5` = c(0, 0.66069369,
0, 0, 0, 1, 0.57541627, 1, 1, 0, 0, 0, 1, 0.64615661, 0, 0.45209671,
0, 0, 0, 0.17022498), `529` = c(0, 0.521435654, 0, 0, 1, 0, 0.175996209,
0, 0, 0, 1, 0, 0, 0, 0, 0.886059888, 0, 0, 0, 0.17022498), `580` = c(0,
0.437291195, 0, 0, 1, 0, 0.20731698, 0, 0, 0, 1, 0, 0, 0, 0,
0.719755907, 0, 0, 0, 0.033248127), `630.5` = c(0, 0.52204783,
0, 0, 0, 0, 0.48815538, 0, 0, 0, 0, 1, 0, 0, 0, 0.82709638, 0,
0, 0, 0.09539534), `683.5` = c(0, 0.52429838, 0, 0, 0, 0, 0.59605685,
0, 0, 0, 0, 0, 0, 0, 0, 0.27845748, 0.28224351, 0, 0, 0), `735.5` = c(1,
0.3768651, 0, 1, 0, 0, 0.51381348, 0, 0, 0, 0, 0, 0, 0, 0, 0.39914361,
0.22206677, 0, 0, 0), `784` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0), `832` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.16189002, 0, 0, 0), `882.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `926.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), `973` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.86100786, 0, 0, 0, 0,
0), `1108` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `1200` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), Clusters = structure(c(168L, 32L, 246L,
168L, 81L, 44L, 8L, 44L, 27L, 318L, 81L, 132L, 15L, 3L, 219L,
32L, 156L, 318L, 1L, 6L), .Label = c("10", "10,11", "10,11,12",
"10,11,12,13", "10,11,12,13,14", "10,11,12,13,14,15", "10,11,12,13,14,15,16",
"10,11,12,13,14,15,16,17", "10,11,12,13,14,15,16,17,18,19", "10,11,12,13,14,15,16,17,18,19,20",
"10,11,12,13,14,15,16,17,18,19,20,21", "10,11,12,13,14,16", "10,11,12,13,15,16,17,18,19,20,21",
"10,11,12,13,19", "10,12", "10,12,13", "10,12,13,14", "10,12,13,14,15",
"10,12,13,14,15,16,17", "10,12,13,15", "10,12,21", "10,13", "10,13,14",
"10,17,18", "10,20", "11", "11,12", "11,12,13", "11,12,13,14",
"11,12,13,14,15", "11,12,13,14,15,16", "11,12,13,14,15,16,17",
"11,12,13,14,15,16,17,18,19", "11,12,13,14,15,16,17,18,19,20",
"11,12,13,14,15,16,17,18,19,20,21,22,23", "11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"11,12,13,14,15,16,17,18,19,21,22", "11,12,13,14,15,16,18", "11,12,13,17,18,19",
"11,12,14", "11,13", "11,13,14,15,16", "11,15", "12", "12,13",
"12,13,14", "12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17",
"12,13,14,15,16,17,18", "12,13,14,15,16,17,18,19", "12,13,14,15,16,17,18,19,20",
"12,13,14,15,16,17,18,19,20,21", "12,13,14,15,16,17,18,19,20,21,22",
"12,13,14,15,16,17,18,19,20,21,22,23", "12,13,14,15,16,17,18,19,20,21,22,23,24",
"12,13,14,15,16,17,18,19,23,24", "12,13,14,15,16,17,19", "12,13,14,15,16,17,19,20,21",
"12,13,14,15,16,17,21", "12,13,14,15,16,18", "12,13,14,15,17",
"12,13,14,16,17,19", "12,13,14,18", "12,13,15", "12,13,16", "12,13,16,17,18,19",
"12,13,16,19", "12,13,17", "12,13,21,22,23", "12,14", "12,14,15",
"12,14,15,16", "12,14,15,17,19", "12,15", "12,15,16,17", "12,16,17",
"12,20", "12,21,23", "13", "13,14", "13,14,15", "13,14,15,16",
"13,14,15,16,17", "13,14,15,16,17,18", "13,14,15,16,17,18,19",
"13,14,15,16,17,18,19,20", "13,14,15,16,17,18,19,20,21", "13,14,15,16,17,18,19,20,21,22",
"13,14,15,16,17,18,19,20,21,22,23", "13,14,15,16,17,18,19,20,21,22,23,24",
"13,14,15,16,17,18,19,21", "13,14,15,16,17,18,19,21,22,23", "13,14,15,16,17,19",
"13,14,15,16,17,21", "13,14,15,16,18,23", "13,14,17", "13,14,19,20,21,22,23",
"13,14,23,24", "13,15", "13,15,16", "13,15,16,18,19", "13,15,17",
"13,16,17", "13,17", "13,17,19", "13,19", "13,21", "14", "14,15",
"14,15,16", "14,15,16,17", "14,15,16,17,18", "14,15,16,17,18,19",
"14,15,16,17,18,19,20", "14,15,16,17,18,19,20,21", "14,15,16,17,18,19,20,21,22",
"14,15,16,17,18,19,20,21,22,23", "14,15,16,17,18,19,20,21,22,23,24",
"14,15,16,17,18,19,20,22,23,24", "14,15,16,17,19", "14,15,16,17,19,20",
"14,15,16,17,19,20,21", "14,15,16,17,22", "14,15,16,19", "14,15,17",
"14,15,19", "14,17", "14,17,18,19", "14,19", "14,21", "15", "15,16",
"15,16,17", "15,16,17,18", "15,16,17,18,19", "15,16,17,18,19,20",
"15,16,17,18,19,20,21", "15,16,17,18,19,20,21,22,23", "15,16,17,18,19,20,21,22,23,24",
"15,16,17,19", "15,16,17,19,20,21", "15,16,17,19,24", "15,16,17,20,21",
"15,16,17,21", "15,16,17,23", "15,16,18,19", "15,16,19,20", "15,17",
"15,18,19,20", "15,18,19,20,21", "15,19", "16", "16,17", "16,17,18",
"16,17,18,19", "16,17,18,19,20", "16,17,18,19,20,21", "16,17,18,19,20,21,22",
"16,17,18,19,20,21,22,23", "16,17,18,19,20,21,22,23,24", "16,17,19",
"16,17,19,20", "16,17,19,20,21", "16,17,19,21", "16,17,23", "16,19",
"17", "17,18", "17,18,19", "17,18,19,20", "17,18,19,20,21", "17,18,19,20,21,22",
"17,18,19,20,21,22,23", "17,18,19,20,21,22,23,24", "17,18,19,21",
"17,19", "17,19,20", "17,19,20,21", "17,19,20,21,22,23,24", "17,19,23",
"17,20,21", "17,20,21,23", "17,21,22", "17,23", "17,24", "18",
"18,19", "18,19,20", "18,19,20,21", "18,19,20,21,22", "18,19,20,21,22,23",
"18,19,20,21,22,23,24", "18,19,20,21,23", "18,20", "19", "19,20",
"19,20,21", "19,20,21,22", "19,20,21,22,23", "19,20,21,22,23,24",
"19,20,21,23,24", "19,20,22", "19,21", "19,22", "19,23", "2",
"2,17", "2,3,4,5,6", "2,3,4,5,6,7", "20", "20,21", "20,21,22",
"20,21,22,23", "20,21,22,23,24", "20,21,23", "20,21,23,24", "21",
"21,22", "21,22,23", "21,22,23,24", "21,23", "22", "22,23", "22,23,24",
"23", "23,24", "24", "3", "3,23,24", "3,4", "3,4,23,24", "3,4,5",
"3,4,5,6", "3,4,5,6,13,14,15,16,17,18,19,20,21,22,23,24", "3,4,5,6,7",
"3,4,5,6,7,8,9", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,4,5,6,7,8,9,20,21,22,23,24", "3,4,5,6,7,8,9,21,22,23,24",
"3,4,5,6,8,9", "3,4,5,7,8,9,15,16,17,18,19,20,21,22,23", "3,4,6,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,8,9,10,11,12,13,14,15,16,17,18,19,20", "4", "4,17,18,19,20,21,22,23,24",
"4,19,20,21,22,23,24", "4,21", "4,22,23,24", "4,5,17,18,19,20,21,22,23,24",
"4,5,21,22,23,24", "4,5,6", "4,5,6,22,23,24", "4,5,6,7,8,9",
"4,5,6,7,8,9,10", "4,5,6,7,8,9,10,15,16,17,18,19,20,21,22,23,24",
"4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24", "4,5,6,7,8,9,13",
"4,5,6,7,8,9,14,15,16,17,18,19,20,21,22,23,24", "4,5,6,7,8,9,17,18,19,20,21,22,23,24",
"4,5,6,7,8,9,19,20,21,22,23,24", "4,5,6,7,8,9,19,23,24", "4,5,6,7,8,9,23,24",
"4,5,7,8,9", "4,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24",
"4,8,9,23,24", "5", "5,22,23", "5,6", "5,6,15,16,17,18,19,20,21,22,23,24",
"5,6,19,20,21,22,23,24", "5,6,24", "5,6,7", "5,6,7,8", "5,6,7,8,19,20,21,22,23,24",
"5,6,7,8,9", "5,6,7,8,9,10,11,12,13", "5,6,7,8,9,10,11,12,13,14,15,16,17",
"5,6,7,8,9,15,23,24", "5,6,9", "5,7", "5,8,9", "6", "6,15,16,17,18,19,20,21,22,23,24",
"6,19,20,21,22,23,24", "6,20,21,22,23,24", "6,21,22,23,24", "6,7",
"6,7,8", "6,7,8,9", "6,7,8,9,15,16,17,18,19,20,21,22,23,24",
"6,7,8,9,23,24", "6,7,9", "6,8,15,16,17,18,19,20,21,22,23", "6,8,9",
"6,9", "7", "7,14,24", "7,8,9", "7,8,9,10,11,12,13,14,15", "7,8,9,20,21,22,23,24",
"7,8,9,23,24", "7,9", "7,9,10", "8", "8,19,20,21", "8,19,20,21,22,23,24",
"8,9", "8,9,10,11,12,13,14,15,16,17", "8,9,10,17,18,19,20,21,22",
"8,9,12,13,14,15,16,17,18,19", "8,9,14,15,16,17,18,19,20,21,22,23,24",
"8,9,15,16,17,18,19,20,21,22", "8,9,19", "8,9,19,20,21,22,23",
"8,9,21,22", "9", "9,10", "9,10,11,12,13,14", "9,10,11,12,13,14,15,16",
"9,10,11,12,13,14,15,16,17", "9,10,11,12,13,14,15,16,17,18,19",
"9,10,11,12,13,14,15,16,17,18,19,20,21", "9,10,11,12,13,14,15,16,17,18,19,20,21,22,23",
"9,10,11,12,13,14,15,16,17,19", "9,12", "9,12,13", "9,12,13,14",
"9,13", "9,13,14,15", "9,13,14,15,16,17", "9,13,14,15,18", "9,14",
"9,14,15,16", "9,15", "9,15,16,17", "9,16", "9,16,17,18,19,21,22",
"9,16,17,19", "9,17", "9,17,18", "9,19", "9,19,20", "9,19,20,21",
"9,19,21", "9,20", "9,20,21", "9,20,21,22", "9,21", "9,22", "9,23"
), class = "factor")), .Names = c("10", "20", "52.5", "81", "110",
"140.5", "189", "222.5", "278", "340", "397", "453.5", "529",
"580", "630.5", "683.5", "735.5", "784", "832", "882.5", "926.5",
"973", "1108", "1200", "Clusters"), row.names = c("at1g01050.1",
"at1g01080.1", "at1g01090.1", "at1g01220.1", "at1g01420.1", "at1g01470.1",
"at1g01800.1", "at1g01910.5", "at1g01920.2", "at1g01980.1", "at1g02020.2",
"at1g02100.2", "at1g02130.1", "at1g02140.1", "at1g02150.1", "at1g02500.2",
"at1g02560.1", "at1g02780.1", "at1g02880.3", "at1g02920.1"), class = "data.frame")
Third Table:
> dput(tbl_col_clu3[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `33.95` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `58.66` = c(0, 0, 0, 0, 0.328143363,
0.552139556, 0.495919686, 0, 0, 0, 0, 0, 0, 0, 0, 0.416266322,
0.886125103, 1, 1, 0), `84.42` = c(0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0.327004551, 0, 0, 0, 0.956778355, 1, 0.175277617, 0.240402438,
0), `110.21` = c(0, 0, 0, 0, 0, 0.151581882, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0.091367379, 0.029316359, 0, 0), `134.16` = c(0.190968551,
0, 0, 0, 0, 0.164736594, 0, 0, 0, 0, 0, 0.650199285, 0, 0, 0,
0, 0.097800974, 0.007393484, 0, 0), `164.69` = c(0.5342874459,
0, 0.3619993464, 0, 0, 0.1891527151, 0, 0, 0, 0, 0, 0.4926963182,
0, 0, 0, 0, 0, 0, 0, 0), `199.1` = c(0.866134859, 0, 0.405387979,
0, 0, 0.274468991, 0, 0, 0, 0, 0, 0.352737127, 0.170514318, 0,
0, 0, 0, 0, 0, 0), `234.35` = c(1, 0, 0.446118481, 0, 0, 0.338427523,
0, 0, 0, 0, 0, 0.204601923, 0.343919727, 0, 0, 0, 0, 0, 0, 0),
`257.19` = c(0.732231652, 0, 0.666653103, 0, 0, 0.403078017,
0, 0, 0, 0, 0, 0.315665123, 1, 0, 0, 0, 0, 0, 0, 0), `361.84` = c(0.660960044,
0, 1, 0, 0, 0.202578329, 0, 0, 0, 0, 0, 0.320183046, 0.424361453,
0, 0, 0, 0, 0, 0, 0), `432.74` = c(0.47961801, 0, 0.48323321,
0, 0, 0.25926071, 0, 0, 0, 0, 0, 0.36362413, 0.43039587,
0, 0, 0, 0, 0, 0, 0), `506.34` = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0.22943212, 0.19354376, 0, 0, 0, 0, 0, 0, 0), `581.46` = c(0,
0.52783556, 0, 1, 0, 0, 0, 0.64407392, 0, 0.70701938, 0,
0.2596209, 0.29757967, 0, 0, 0, 0, 0, 0, 0), `651.71` = c(0,
0.32678969, 0, 0.36428195, 0, 0, 0, 0.64951761, 0, 0.80866933,
1, 0.18614028, 0.21567888, 0.32813633, 0, 0, 0, 0, 0, 0),
`732.59` = c(0, 0.229023369, 0, 0.312832425, 0, 0, 0, 0.696041374,
0, 0.590471454, 0, 0.108699479, 0.187935709, 0.275177957,
0, 0, 0, 0, 0, 0.243080694), `817.56` = c(0, 0.25668583,
0, 0.4003249, 0, 0, 0, 0.53376606, 0, 0.85524485, 0, 0.22539659,
0.27977127, 0.55089774, 0, 0, 0, 0, 0, 1), `896.24` = c(0,
0.31675535, 0, 0.50882005, 0, 0, 0, 0.74705458, 0.12936306,
1, 0, 0.1949139, 0.21957859, 0.75063327, 0, 0, 0, 0, 0, 0.63346358
), `971.77` = c(0, 0.27811949, 0, 0.48419038, 0, 0, 0, 0.8563439,
0.39897143, 0.84491933, 0, 0.13935282, 0.17670128, 0.84111004,
0, 0, 0, 0, 0, 0), `1038.91` = c(0, 1, 0, 0.52506752, 0,
0, 0, 1, 1, 0.85617714, 0, 0.13507463, 0, 1, 0, 0, 0, 0,
0, 0), Clusters = structure(c(222L, 88L, 237L, 88L, 145L,
155L, 143L, 88L, 122L, 88L, 97L, 180L, 260L, 102L, 186L,
145L, 149L, 149L, 145L, 106L), .Label = c("10", "10,11",
"10,11,12", "10,11,12,13", "10,11,12,13,14", "10,11,12,13,14,15",
"10,11,12,13,14,15,16", "10,11,12,13,14,15,16,17,18", "10,11,12,13,14,15,16,17,18,19",
"10,11,12,13,14,15,16,17,18,19,20", "10,11,12,14", "10,11,12,14,15",
"10,11,12,14,15,16", "10,11,12,14,15,16,17,18", "10,11,12,14,15,16,17,18,19",
"10,11,12,14,15,16,17,18,19,20", "10,11,12,14,15,17,18,19",
"10,11,12,15,16,17", "10,11,14", "10,11,15", "10,11,15,16,17",
"10,11,16", "10,11,17", "10,11,20", "10,12", "10,14,15,16",
"10,14,15,16,17,18,19", "10,15", "10,15,16", "10,15,16,18",
"10,16,19", "10,18,19,20", "10,19", "10,19,20", "10,20",
"11", "11,12", "11,12,13", "11,12,13,14", "11,12,13,14,15",
"11,12,13,14,15,16", "11,12,13,14,15,16,17,18", "11,12,13,14,15,16,17,18,19",
"11,12,13,14,15,16,17,18,19,20", "11,12,13,14,15,16,18,19",
"11,12,14,15", "11,12,14,15,16,17", "11,12,14,15,16,17,18",
"11,12,14,15,16,17,18,19", "11,12,14,15,16,17,18,19,20",
"11,12,18", "11,12,19", "11,12,20", "12", "12,13", "12,13,14",
"12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17,18",
"12,13,14,15,16,17,18,19,20", "12,14", "12,14,15", "12,14,15,16",
"12,14,15,16,17", "12,14,15,16,17,18", "12,14,15,16,17,18,19",
"12,14,15,16,17,18,19,20", "12,14,15,16,20", "12,14,15,18,19,20",
"12,15", "12,16", "12,16,17,18", "12,18,19,20", "12,19,20",
"12,20", "13", "13,14", "13,14,15", "13,14,15,16,17,18,19,20",
"13,16", "13,20", "14", "14,15", "14,15,16", "14,15,16,17",
"14,15,16,17,18", "14,15,16,17,18,19", "14,15,16,17,18,19,20",
"14,15,16,18", "14,15,17", "14,15,18", "14,16", "14,16,17",
"14,16,17,18,19,20", "14,18,19,20", "14,19", "15", "15,16",
"15,16,17", "15,16,17,18", "15,16,17,18,19", "15,16,17,18,19,20",
"15,20", "16", "16,17", "16,17,18", "16,17,18,19", "16,17,18,19,20",
"16,17,18,20", "16,17,19", "16,18,19,20", "16,19,20", "17",
"17,18", "17,18,19", "17,18,19,20", "17,18,20", "17,19,20",
"17,20", "18", "18,19", "18,19,20", "19", "19,20", "2", "2,19,20",
"2,3", "2,3,4", "2,3,4,5", "2,3,4,5,11", "2,3,4,5,6", "2,3,4,5,6,7,8",
"2,3,4,5,6,7,8,11,12", "2,3,4,5,6,7,8,9", "2,3,4,5,6,7,8,9,10",
"2,3,4,5,6,7,8,9,10,11", "2,3,4,5,6,7,8,9,10,11,12", "2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"2,4", "2,5", "2,5,6,7", "20", "3", "3,18", "3,4", "3,4,10",
"3,4,20", "3,4,5", "3,4,5,6", "3,4,5,6,7", "3,4,5,6,7,8",
"3,4,5,6,7,8,9", "3,4,5,6,7,8,9,10", "3,4,5,6,7,8,9,10,11",
"3,4,5,6,7,8,9,10,11,12", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"3,4,8", "3,4,8,9", "3,5", "3,7", "3,9", "4", "4,5", "4,5,12,13",
"4,5,16", "4,5,6", "4,5,6,16,17,18,19,20", "4,5,6,20", "4,5,6,7",
"4,5,6,7,8", "4,5,6,7,8,10,11", "4,5,6,7,8,9", "4,5,6,7,8,9,10",
"4,5,6,7,8,9,10,11", "4,5,6,7,8,9,10,11,12", "4,5,6,7,8,9,10,11,12,13,14,15",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19", "4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"4,5,6,7,8,9,10,11,12,14,15,16,17,18,19,20", "4,5,6,7,8,9,16,17",
"4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20", "4,6,7", "4,7,13",
"5", "5,11,12,14,15,16,17,18,19", "5,14", "5,14,15,16", "5,16,19",
"5,17,18,19,20", "5,18", "5,6", "5,6,7", "5,6,7,10", "5,6,7,8",
"5,6,7,8,10", "5,6,7,8,9", "5,6,7,8,9,10", "5,6,7,8,9,10,11",
"5,6,7,8,9,10,11,12", "5,6,7,8,9,10,11,12,13", "5,6,7,8,9,10,11,12,13,14",
"5,6,7,8,9,10,11,12,13,14,15,16", "5,6,7,8,9,10,11,12,13,14,15,16,17,18",
"5,6,7,8,9,10,11,12,13,14,15,16,17,18,19", "5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"5,6,7,8,9,16,17,18,19,20", "5,6,8", "5,7,8,9,10", "5,7,8,9,10,14,15,16,17,18",
"5,8", "6", "6,7", "6,7,16", "6,7,8", "6,7,8,10,11,12,15,16,17,18",
"6,7,8,19", "6,7,8,9", "6,7,8,9,10", "6,7,8,9,10,11", "6,7,8,9,10,11,12",
"6,7,8,9,10,11,12,13,14", "6,7,8,9,10,11,12,13,14,15,16,17",
"6,7,8,9,10,11,12,13,14,15,16,17,18,19", "6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"6,7,8,9,10,11,12,14,15,16", "6,7,8,9,10,18,19", "7", "7,10,11,14,15",
"7,12", "7,8", "7,8,12", "7,8,9", "7,8,9,10", "7,8,9,10,11",
"7,8,9,10,11,12", "7,8,9,10,11,12,13", "7,8,9,10,11,12,13,14,15,16",
"7,8,9,10,11,12,13,14,15,16,17,18", "7,8,9,10,11,12,13,14,15,16,17,18,19",
"7,8,9,10,11,12,13,14,15,16,17,18,19,20", "7,8,9,10,11,12,14,15,16,17,18,19",
"7,8,9,10,11,12,14,15,16,17,18,19,20", "7,8,9,10,12,15,16,17,18",
"7,9,10,11,12,13,14,15,16,17,18,19,20", "8", "8,10", "8,10,20",
"8,14,15,16,17,18,19,20", "8,16,17", "8,9", "8,9,10", "8,9,10,11",
"8,9,10,11,12", "8,9,10,11,12,13,14", "8,9,10,11,12,13,14,15",
"8,9,10,11,12,13,14,15,16", "8,9,10,11,12,13,14,15,16,17,18",
"8,9,10,11,12,13,14,15,16,17,18,19", "8,9,10,11,12,13,14,15,16,17,18,19,20",
"8,9,10,11,12,14,15,16", "8,9,10,11,12,14,15,16,17,18,19,20",
"8,9,10,14,15,16,17,18,19,20", "8,9,17", "9", "9,10", "9,10,11",
"9,10,11,12", "9,10,11,12,13,14,15,16,17", "9,10,11,12,13,14,15,16,17,18",
"9,10,11,12,13,14,15,16,17,18,19", "9,10,11,12,13,14,15,16,17,18,19,20",
"9,10,11,12,14,15,16", "9,10,11,12,14,15,16,17,18", "9,10,11,12,14,15,16,17,18,19",
"9,10,11,12,14,15,16,17,18,19,20", "9,10,11,12,16,17,18,19,20",
"9,10,11,14,15,16,17", "9,10,12,14,15,16,17", "9,10,14,15",
"9,11,12", "9,11,12,14", "9,12,14", "9,20"), class = "factor")), .Names = c("10",
"33.95", "58.66", "84.42", "110.21", "134.16", "164.69", "199.1",
"234.35", "257.19", "361.84", "432.74", "506.34", "581.46", "651.71",
"732.59", "817.56", "896.24", "971.77", "1038.91", "Clusters"
), row.names = c("at1g01050.1", "at1g01080.1", "at1g01090.1",
"at1g01320.2", "at1g01470.1", "at1g01800.1", "at1g01910.5", "at1g01960.1",
"at1g01980.1", "at1g02150.1", "at1g02470.1", "at1g02500.2", "at1g02560.1",
"at1g02780.1", "at1g02816.1", "at1g02880.2", "at1g02920.1", "at1g02930.2",
"at1g03030.1", "at1g03090.2"), class = "data.frame")
The last column (Clusters) and the row.names are what matter here. The Clusters column says in which columns we can find any abundance for that gene. It doesn't matter to me exactly which cluster a gene is in, only which genes occur together with it.
Let's use an example:
Those genes belong to the same cluster (cluster 5) in data1.
at1g09640.1
at1g07250.1
at1g08200.1
at1g09300.2 ##
at1g09490.2 ## Those
at1g09760.1 ##
at1g09780.1
If we analyze another data set (data2), we can see that some of those genes are found together again. It may be a different cluster (cluster 20, say), but they are together, and that is what matters most to me.
at1g02880.3
at1g01220.1
at1g09300.2 ##
at1g09490.2 ## Those
at1g09760.1 ##
at1g02130.1
I have about 15 similar data sets, and I would like to be able to ask R: show me the genes that are found together in 15 of 15 data sets, or in 13 of 15 data sets, and so on.
Any ideas?
First, you need to turn those comma-delimited lists into columns; it is much easier to work with them that way. Then, you want to find which genes have matching columns. Finally, you can aggregate to get totals of how many times each gene matches each other gene.
Note that every pair of genes will appear in both orders, and each gene will also be matched with itself. Also, the "Clusters" column will tell you how many times the two genes were in the exact same set of clusters.
This will run in O(n^2) time, meaning that doubling the number of genes analyzed will quadruple the time. My quick timing tests estimate it would take 15 hours on my computer to do 15 data frames of 2300 rows.
library(plyr)
frame_list <- list(tbl_col_clu1, tbl_col_clu2, tbl_col_clu3)
turn_numbers_into_columns <- function(x) {
  # Creates a data.frame that has the group numbers as columns
  x[, strsplit(x$Clusters, ",")[[1]]] <- 1
  return(x)
}
get_comparison <- function(current_table) {
  # Creates a comparison data frame for a single input table
  simplified_frame <- data.frame(
    "gene" = row.names(current_table),
    "Clusters" = as.character(current_table$Clusters),
    stringsAsFactors = FALSE)
  split_f <- adply(simplified_frame, 1, turn_numbers_into_columns)
  # This is the slow line
  comparison_frame <- ddply(split_f, "gene", function(x) {
    ddply(split_f, "gene", function(y) {
      output <- as.data.frame(x == y)
      output$gene <- x$gene
      output$gene2 <- y$gene
      return(output)
    })
  })
  return(comparison_frame)
}
combined_frame <- ldply(frame_list, get_comparison)
sum_frame <- aggregate(
  combined_frame[, !(names(combined_frame) %in% c("gene", "gene2"))],
  by = combined_frame[, c("gene", "gene2")],
  FUN = sum,
  na.rm = TRUE)
View(sum_frame)
If you consistently had the same set of genes and groupings in every data set, you could turn everything into arrays, which run faster than data frames, cutting your time by a factor of about six. The part that runs very slowly would be replaced with something like this. It returns 3-dimensional arrays that you could add together.
comparison_frame <- aaply(split_f, 1, function(x) {
  print(x)
  output <- aaply(split_f, 1, function(y) {
    output <- array(x == y, c(1, length(x)))
    return(output)
  })
  return(output)
})
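If only the exact-match count is needed (what the "Clusters" column of sum_frame reports), a cheaper sketch is to compare the Clusters strings directly with outer() and accumulate one gene-by-gene matrix over all tables. This is a simplification and not the per-cluster comparison above. The toy tables t1 to t3, the helper make_tbl and the threshold min_sets are made up for illustration; in practice frame_list would hold the real tbl_col_clu tables.

```r
# Hypothetical stand-ins for tbl_col_clu1..3: only row names and Clusters matter
make_tbl <- function(genes, clusters) {
  data.frame(Clusters = factor(clusters), row.names = genes)
}
t1 <- make_tbl(c("g1", "g2", "g3", "g4"), c("5", "5", "5", "7,8"))
t2 <- make_tbl(c("g1", "g2", "g3", "g5"), c("20", "20", "3", "20"))
t3 <- make_tbl(c("g1", "g2", "g4"), c("1,2", "1,2", "9"))
frame_list <- list(t1, t2, t3)

# One counter cell per gene pair, over the union of genes in all tables
all_genes <- sort(unique(unlist(lapply(frame_list, rownames))))
together <- matrix(0L, length(all_genes), length(all_genes),
                   dimnames = list(all_genes, all_genes))

for (tbl in frame_list) {
  cl <- as.character(tbl$Clusters)
  same <- outer(cl, cl, "==")  # TRUE where two genes have the same Clusters string
  g <- rownames(tbl)
  together[g, g] <- together[g, g] + same
}

# Pairs found together in at least min_sets data sets (each pair listed once)
min_sets <- 3
hits <- which(together >= min_sets & upper.tri(together), arr.ind = TRUE)
pairs <- data.frame(gene = all_genes[hits[, "row"]],
                    gene2 = all_genes[hits[, "col"]],
                    n_sets = together[hits],
                    stringsAsFactors = FALSE)
print(pairs)
```

The work is still quadratic in the number of genes, but each table needs only a single vectorised comparison, and genes missing from a table simply contribute nothing for that table.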
Throw them into SPMF with the Apriori or FPGrowth algorithm. SPMF expects its input as a file of comma-separated sequences of integers (you may have to convert your data), with each sequence on a separate line:
1,2,4,10
3,2,1,11,12
2,5,14,5
You invoke it like this:
java -jar spmf.jar run FPGrowth sequences.txt output.txt 35% 90%
The first number is the minimum support (how many sets must contain your group for it to be considered a group). SPMF contains several different algorithms; you can try them to see which one fits you best.
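One possible way to do the conversion from R, as a rough sketch: give every gene an integer id and write one line per cluster listing the ids of the genes in it. Treating clusters as lines and genes as items is an assumption, not something stated above. The toy clusters vector and the output path are made up, and the comma separator simply follows the format described above, so check it against the SPMF documentation before relying on it.

```r
# Hypothetical toy input: gene name -> Clusters string from one data set
clusters <- c(g1 = "5,6", g2 = "5", g3 = "6,7")

# The input format is integers only, so give every gene an integer id
gene_id <- setNames(seq_along(clusters), names(clusters))

# Long table with one row per (gene, cluster) membership
membership <- strsplit(clusters, ",")
long <- data.frame(
  gene = rep(names(membership), lengths(membership)),
  cluster = unlist(membership, use.names = FALSE),
  stringsAsFactors = FALSE)

# One line per cluster, listing the ids of the genes found in it
spmf_lines <- tapply(gene_id[long$gene], long$cluster,
                     function(ids) paste(sort(ids), collapse = ","))

# Written to a temporary directory here; the file name is arbitrary
out_file <- file.path(tempdir(), "sequences.txt")
writeLines(as.vector(spmf_lines), out_file)
```

With several data sets, the same steps would be repeated per table using one shared gene_id mapping, and the lines appended to the same file. Keep gene_id around to translate the integers in the output back into gene names.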