Plot rows with similar name on the same graph - r

This is the data I would like to plot:
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
`34` = c(0, 0, 0, 0, 0, 0, 0, 0, 547725, 0),
`59` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
`84` = c(0, 0, 0, 8173070.8, 0, 0, 0, 0, 0, 0),
`110` = c(0, 0, 0, 20302893.6, 0, 0, 0, 0, 0, 0),
`134` = c(0, 0, 0, 13696077.5, 0, 0, 0, 0, 0, 0),
`165` = c(1024325, 0, 0, 10486165.5, 0, 0, 0, 0, 0, 0),
`199` = c(1183267.5, 0, 0, 6015700, 0, 0, 0, 0, 0, 0),
`234` = c(1771708.3, 0, 0, 3384495.8, 3384495.8, 0, 0, 0, 0, 1144700),
`257` = c(2007712.3, 0, 0, 0, 6980230.6, 0, 0, 0, 0, 0),
`362` = c(3339118.9, 0, 0, 0, 7280030.6, 1119625, 0, 0, 0, 0),
`433` = c(973797.9, 0, 0, 0, 6230170, 1497625, 0, 0, 0, 0),
`506` = c(0, 0, 0, 0, 12905925, 0, 0, 0, 0, 0),
`581` = c(0, 2140050, 0, 0, 4560645.8, 0, 3170133.3, 0, 0, 0),
`652` = c(0, 639437.7, 639437.7, 0, 2349711.3, 0, 902318.3, 902318.3, 0, 0),
`733` = c(0, 0, 1397257.5, 0, 2274710, 0, 0, 1414458.3, 0, 0),
`818` = c(0, 0, 742731.8, 0, 2953550, 0, 0, 563876.7, 0, 0),
`896` = c(0, 0, 714654.7, 0, 1199563.3, 0, 0, 561000, 0, 0),
`972` = c(0, 0, 434271.5, 0, 1358225, 0, 0, 0, 0, 0),
`1039` = c(0, 0, 227435, 0, 934840, 0, 0, 0, 0, 0)),
.Names = c("10", "34", "59", "84", "110", "134", "165", "199", "234", "257", "362", "433", "506", "581", "652", "733", "818", "896", "972", "1039"),
row.names = c("Mark121_1", "Mark121_2", "Mark121_3", "Mark143_1", "Mark143_2", "Mark152_1", "Mark152_2", "Mark152_3", "Mark444_1", "Mark444_2"),
class = "data.frame")
I would like to put the lines for rows that differ only in the number after the _ (underscore) on the same plot. Each line needs a different color. I was thinking about using matplot, but I have no idea how to select the rows with similar strings.
In simple words, I would like to have lines for:
Mark121_*
Mark143_*
Mark152_*
Mark444_*
on the same graph. That means 4 different graphs, each containing multiple lines.

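This solution uses "dplyr", "tidyr" (for gather), "ggplot2" and "purrr". There is a large difference in scale, so I plot log10(value); you might not want that (note that log10(0) is -Inf).
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
# Reshape to long format, then derive the group from the part of Name before "_"
df2 <- df %>% mutate(Name = rownames(.)) %>%
  gather(key = period, value = value, -Name) %>%
  mutate(person = sub("_.*$", "", Name), period = as.numeric(period))
df2 %>% ggplot(., aes(x = period, y = log10(value), colour = Name, group = Name)) +
  geom_line() + facet_wrap(~person)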
Edit: Additional request
In order to plot each figure individually
# This saves the figures as a list of plot objects
FiguresList <- unique(df2$person) %>% map(function(P) {
  df2 %>% filter(person == P) %>%
    ggplot(., aes(x = period, y = log10(value), colour = Name, group = Name)) +
    geom_line()
})
FiguresList[[1]]
# This saves each plot as a PDF named after the person, e.g. "Mark121.pdf"
unique(df2$person) %>% walk(function(P) {
  p <- df2 %>% filter(person == P) %>%
    ggplot(., aes(x = period, y = log10(value), colour = Name, group = Name)) +
    geom_line()
  # Pass the plot explicitly instead of relying on ggsave's last-plot default
  ggsave(paste0(P, ".pdf"), plot = p)
})
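For the matplot route mentioned in the question, a minimal base-R sketch might look like the following. It assumes the group is everything before the underscore in the row name. The toy data frame and the names prefix, groups and x are made up for illustration and are not part of the original answer.

```r
# Toy data with the same row-name pattern as the question (values are made up)
df <- data.frame(`10` = c(1, 2, 3, 4), `34` = c(2, 3, 4, 5), `59` = c(3, 1, 5, 2),
                 check.names = FALSE,
                 row.names = c("Mark121_1", "Mark121_2", "Mark143_1", "Mark143_2"))

# Strip everything from the underscore onwards to get the group prefix
prefix <- sub("_.*$", "", rownames(df))

# Split the row names by prefix: one character vector of row names per group
groups <- split(rownames(df), prefix)

# Column names are the x positions
x <- as.numeric(colnames(df))

# One panel per group, one coloured line per row in that group
op <- par(mfrow = c(1, length(groups)))
for (g in names(groups)) {
  rows <- groups[[g]]
  # matplot draws one line per column, so transpose the selected rows
  y <- t(as.matrix(df[rows, , drop = FALSE]))
  matplot(x, y, type = "l", lty = 1, col = seq_along(rows),
          xlab = "period", ylab = "value", main = g)
  legend("topright", legend = rows, col = seq_along(rows), lty = 1)
}
par(op)
```

The same split() call should work on the full data from the question, giving four groups (Mark121, Mark143, Mark152, Mark444).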

Related

How can I make this replacement of values based on order more computationally efficient in R? [duplicate]

I have a df of 32 columns and just under a million rows. The columns are POINTID (individual id), first (the year an event first happened), and then 30 columns of years with binary occurrence data. I would like the first occurrence in each row (currently stored as a 1, same as all other occurrences) to be changed to a 2, so that I can differentiate between the first event and repeat events. I've tried doing this with the tidyverse, but even then it is taking forever. I can't tell if my code is just wrong or if it's not computationally efficient enough. I tested it on a smaller dataset and it seemed to work in the long format but not the wide, so I'm thinking it's an efficiency issue, because the table generated by pivot_longer is about 35 million rows long.
Can anyone help me understand why this isn't working or how to do it in a way that computes faster?
classifications %>%
pivot_longer(-c(1,32),names_to="Years", values_to="Present")%>%
group_by(POINTID)%>%
mutate(Present=replace(Present, Years==first, 2))
A reduced version of my DF is below:
> dput(classifications)
structure(list(POINTID = 2:11, first = structure(c(33L, 33L,
33L, 33L, 1L, 33L, 33L, 1L, 1L, 36L), .Label = c("X1985", "X1986",
"X1987", "X1988", "X1989", "X1990", "X1991", "X1992", "X1993",
"X1994", "X1995", "X1996", "X1997", "X1998", "X1999", "X2000",
"X2001", "X2002", "X2003", "X2004", "X2005", "X2006", "X2007",
"X2008", "X2009", "X2010", "X2011", "X2012", "X2013", "X2014",
"X2015", "X2016", "X2017", "X2018", "X2019", "X2020"), class = "factor"),
X1990 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1991 = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), X1992 = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0), X1993 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1994 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X1995 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X1996 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X1997 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X1998 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X1999 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2000 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2001 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2002 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2003 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2004 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2005 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2006 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2007 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2008 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2009 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2010 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2011 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2012 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2013 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2014 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), X2015 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2016 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2017 = c(1, 1, 1, 1, 0, 1, 1, 0, 0, 0), X2018 = c(1,
0, 0, 0, 0, 0, 0, 0, 0, 0), X2019 = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0), X2020 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)), row.names = c(NA,
10L), class = "data.frame")
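You can do this while keeping the data in wide format, using vectorised row/column subsetting. We get the column index for each row with match, pair it with the row index in a two-column matrix, and use that matrix to address one cell per row.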
mat <- cbind(1:nrow(classifications),
match(classifications$first, names(classifications)))
classifications[mat] <- 2
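To make the matrix-indexing idea concrete, here is a small sketch on a made-up data frame (the values below are hypothetical, not the asker's data). In the sample from the question, some rows have a first value such as X1985 for which no column exists, and match returns NA for those. As a precaution, and as an assumption about the desired behaviour, this sketch drops such rows before indexing.

```r
# Hypothetical miniature of the question's data: an id, the first-event year,
# and three yearly 0/1 columns
classifications <- data.frame(
  POINTID = 1:4,
  first   = c("X2017", "X2018", "X1985", "X2019"),
  X2017   = c(1, 0, 0, 0),
  X2018   = c(1, 1, 0, 0),
  X2019   = c(0, 1, 0, 1)
)

# Column position of each row's first-event year (NA if there is no such column)
col_idx <- match(classifications$first, names(classifications))

# Keep only the rows whose first-event year is present among the columns
ok  <- !is.na(col_idx)
mat <- cbind(which(ok), col_idx[ok])

# Each row of the two-column matrix addresses one (row, column) cell
classifications[mat] <- 2
```

Because only one cell per row is touched and nothing is reshaped, this avoids building the long table entirely.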

Side-by-side stacked barplot with facetting using ggplot2

The majority of information I can find on side-by-side stacked barplots deals with instances where some variable (the number of side-by-side bars) is repeated for each variable along the x-axis - see: 1, 2, 3, 4, 5, 6. In these cases they use ggplot with besides=TRUE.
I have a more complex example which I believe will require faceting like these two examples: 7 & 8.
Quick background (for those interested in the why?):
I'm trying to assess the efficiency of a proteomics protocol that enriches for chromatin by comparing it to the proportion of nuclear proteins found in the core/whole proteome experiments for 4 cell lines. To do this I used The Human Protein Atlas to annotate proteins by their subcellular location, and then compared the nuclear proteins from the chromatin enrichment with those from the whole proteome. However, the chromatin-enrichment protocol was 1D-shotgun, while the whole proteome data was 2D-shotgun with 50 fractions. In layman's terms, this means the whole/core proteome data comes from a more expensive experiment done at higher coverage. It therefore wouldn't make sense to compare absolute counts, because the overall number of proteins found is higher in the whole proteome pull-downs (see figure: absolute protein comparison sketch). To circumvent this issue I divided by the total number of proteins found in each pull-down to get the relative proportion of proteins from each subcellular location.
Using these relative proportions I've produced a stacked barplot of the following data in my gist with the following code:
df1 <- read.csv("data.csv") # Load data.frame of the data
df2 <- melt(df1, # Reshape the data from
id.vars = "subcellular_location", # wide format into long format
variable.name = "cell_line", # (i.e. tidy data)
value.name = "relative_proportion")
For some reason this didn't change the variable name or value name (headers) - they are called "variable" and "value" still? So I had to rename column headers via the following.
names(df2) <- c("subcellular_location", "cell_line", "relative_proportion")
As there are many subcellular locations I needed to custom add colors, furthermore I grouped them by similar locations (e.g. nuclear in blue).
p <- ggplot() +
  geom_bar(aes(x = cell_line, y = relative_proportion, fill = subcellular_location),
           data = df2, stat = "identity")
p +
coord_flip() +
scale_fill_manual(values = c("#bd5db0","#9ae17c", "#be0024", "#7388ff", "#c456b7",
"#8ed470", "#7ec361", "#7d7304", "#f87a00", "#d543c7",
"#bead47", "#d148c3", "#da8836", "#e28504", "#d93eca",
"#c720b9", "#bc07ae", "#a40098", "#9a008e", "#e8d448",
"#104ed7", "#2c4ecc", "#00428c", "#393c6d", "#173b8f",
"#3f4c96", "#9ba2f5", "#727bcc", "#e59c5f", "#790000",
"#045d00", "#f9ad6f"))
See image here: stacked barplot
The core proteome pull-downs are highlighted in yellow. Ideally what I would like to do is facet this barplot into 4 sections - one for each cell line. I followed the instructions from reference 7 for faceting but am getting an error.
First I split my dataframe into 4 separate tidy dataframes (e.g. below):
K562 <- read.csv("K562-relative.csv")
K562 <- melt(K562, id.vars = "subcellular_location") # Reshape the data into tidy form
names(K562) <- c("subcellular_location", "cell_line", "relative_proportion")
etc.
Then I created a vector for cell line:
cell <- sample(c("HAP1","K562","A673","MDS"))
When I try the following code I get an error:
ref_by_cell <- data.frame(HAP1 = HAP1, K562 = K562, A673 = A673, MDS = MDS, cell = cell)
Error in data.frame(HAP1 = HAP1, K562 = K562, A673 = A673, MDS = MDS,
arguments imply differing number of rows: 576, 544, 64, 4
I would appreciate any help with faceting or alternative ideas for displaying this information.
Thank you!
I'm not entirely sure what you want, but if you want to facet by the first part of each cell_line value...
library(dplyr)
library(ggplot2)
# add faceting variable to df2: the part of cell_line before the first "." or "_"
df2 <- df2 %>%
  mutate(cell = stringi::stri_extract_first_regex(cell_line, "^[^._]+"))
# facet by cell, specifying free scales / space on the y-axis
ggplot(data = df2,
aes(x = cell_line, y = relative_proportion, fill = subcellular_location)) +
geom_bar(stat = "identity") +
coord_flip() +
facet_grid(cell~., scales = "free_y", space = "free_y") +
scale_fill_manual(values = c("#bd5db0","#9ae17c", "#be0024", "#7388ff", "#c456b7",
"#8ed470", "#7ec361", "#7d7304", "#f87a00", "#d543c7",
"#bead47", "#d148c3", "#da8836", "#e28504", "#d93eca",
"#c720b9", "#bc07ae", "#a40098", "#9a008e", "#e8d448",
"#104ed7", "#2c4ecc", "#00428c", "#393c6d", "#173b8f",
"#3f4c96", "#9ba2f5", "#727bcc", "#e59c5f", "#790000",
"#045d00", "#f9ad6f")) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0))
Data (copied from your gist link; next time please use dput so that others can reproduce your example more easily):
> dput(df1)
structure(list(subcellular_location = c("actinFilaments", "aggresome",
"cellJunctions", "centrosome", "cytokineticBridge", "cytoplasmicBodies",
"cytosol", "endoplasmicReticulum", "endosome", "focalAdhesion",
"golgiApparatus", "intermediateFilaments", "lipidDroplets", "lysosomes",
"microtubuleEnds", "microtubuleOrganizingCenter", "microtubules",
"midbodyRing", "midbody", "mitochondria", "mitoticSpindle", "nuclearBodies",
"nuclearMembrane", "nuclearSpeckles", "nucleliFibrallar", "nucleoli",
"nucleoplasm", "nucleus", "peroxisomes", "plasmaMembrane", "rodsAndRings",
"vesicles"), HAP1_P5242 = c(0.009581882, 0.000338753, 0.011033682,
0.015824623, 0.003774681, 0.00232288, 0.227013163, 0.024535424,
0.001258227, 0.005807201, 0.04229578, 0.008710801, 0.0014518,
0.001064654, 0.00029036, 0.006484708, 0.013646922, 0.000483933,
0.001064654, 0.063637244, 0.00087108, 0.02303523, 0.013646922,
0.024535424, 0.013259775, 0.054587689, 0.195509098, 0.101480836,
0.00174216, 0.058072009, 0.000822687, 0.071815718), HAP1.wt_P8255.1 = c(0.0176,
0, 0.0032, 0.0096, 0, 0.0032, 0.3664, 0.0912, 0.008, 0.0032,
0.0128, 0, 0, 0.0064, 0, 0.0032, 0.0288, 0, 0, 0.0528, 0, 0.0128,
0.0048, 0.0096, 0, 0.0496, 0.1552, 0.0576, 0, 0.064, 0.0016,
0.0384), HAP1.wt_P8255.2 = c(0.013179572, 0, 0, 0.008237232,
0, 0.004942339, 0.36738056, 0.098846788, 0.003294893, 0.003294893,
0.016474465, 0.001647446, 0, 0.004942339, 0, 0.003294893, 0.029654036,
0, 0, 0.05107084, 0, 0.009884679, 0.004942339, 0.011532125, 0,
0.044481054, 0.154859967, 0.05601318, 0, 0.064250412, 0.001647446,
0.046128501), HAP1.wt_P8254.1 = c(0.012841091, 0, 0, 0.006420546,
0.001605136, 0.004815409, 0.362760835, 0.08988764, 0.001605136,
0.004815409, 0.017656501, 0.003210273, 0, 0.003210273, 0, 0.004815409,
0.032102729, 0, 0, 0.04975923, 0, 0.011235955, 0.003210273, 0.011235955,
0, 0.04975923, 0.160513644, 0.060995185, 0, 0.069020867, 0.001605136,
0.036918138), HAP1.wt_P8254.2 = c(0.015873016, 0, 0, 0.00952381,
0.001587302, 0.004761905, 0.357142857, 0.103174603, 0.003174603,
0.003174603, 0.014285714, 0.001587302, 0.001587302, 0.003174603,
0, 0.003174603, 0.03015873, 0, 0, 0.055555556, 0, 0.012698413,
0.006349206, 0.012698413, 0, 0.050793651, 0.152380952, 0.063492063,
0, 0.057142857, 0.001587302, 0.034920635), HAP1.kd_P8253.1 = c(0,
0, 0, 0, 0, 0, 0.270270271, 0.027027028, 0, 0, 0.027027028, 0,
0, 0, 0, 0, 0.054054053, 0, 0, 0, 0, 0, 0, 0.054054053, 0, 0.054054053,
0.405405405, 0.027027028, 0, 0, 0, 0.081081081), HAP1.kd_P8253.2 = c(0.021381579,
0, 0.003289474, 0.013157895, 0, 0.003289474, 0.368421053, 0.100328947,
0.004934211, 0.003289474, 0.013157895, 0.003289474, 0, 0.006578947,
0, 0.001644737, 0.027960526, 0, 0, 0.046052632, 0, 0.011513158,
0.004934211, 0.009868421, 0, 0.050986842, 0.15131579, 0.050986842,
0, 0.065789474, 0.001644737, 0.036184211), HAP1.kd_P8252.1 = c(0.018518518,
0, 0.00308642, 0.010802469, 0, 0.00462963, 0.354938272, 0.092592593,
0.00617284, 0.00462963, 0.018518518, 0, 0.00154321, 0.00462963,
0, 0.00617284, 0.026234568, 0, 0, 0.043209877, 0, 0.015432099,
0.00308642, 0.015432099, 0, 0.049382716, 0.154320988, 0.061728395,
0, 0.063271605, 0.00154321, 0.040123457), HAP1.kd_P8252.2 = c(0.012965964,
0, 0, 0.011345219, 0.001620746, 0.003241491, 0.367909238, 0.095623987,
0.003241491, 0.004862237, 0.017828201, 0.003241491, 0, 0.004862237,
0, 0.003241491, 0.030794165, 0, 0, 0.051863857, 0, 0.016207455,
0.003241491, 0.009724473, 0, 0.04376013, 0.1636953, 0.055105348,
0, 0.064829822, 0.001620746, 0.02917342), HAP1.kd_P8249.1 = c(0.010309278,
0.001718213, 0, 0.006872852, 0, 0.005154639, 0.197594502, 0.091065292,
0.001718213, 0, 0.013745704, 0.005154639, 0.001718213, 0.001718213,
0, 0, 0.027491409, 0, 0, 0.054982818, 0, 0.017182131, 0.013745704,
0.060137457, 0, 0.082474227, 0.240549828, 0.094501718, 0, 0.04467354,
0, 0.027491409), HAP1.kd_P8249.2 = c(0.010752688, 0, 0, 0.007168459,
0, 0.003584229, 0.20609319, 0.084229391, 0.001792115, 0, 0.007168459,
0.005376344, 0, 0.001792115, 0, 0, 0.03046595, 0, 0, 0.069892473,
0, 0.019713262, 0.014336918, 0.064516129, 0, 0.08781362, 0.224014337,
0.096774194, 0, 0.039426523, 0, 0.025089606), HAP1.kd_P8248.1 = c(0.007207207,
0, 0.001801802, 0.007207207, 0, 0.003603604, 0.198198198, 0.099099099,
0, 0, 0.009009009, 0.007207207, 0, 0, 0, 0.001801802, 0.025225225,
0, 0, 0.061261261, 0, 0.021621622, 0.016216216, 0.068468468,
0, 0.079279279, 0.234234234, 0.093693694, 0.001801802, 0.028828829,
0, 0.034234234), HAP1.kd_P8248.2 = c(0.005272408, 0.001757469,
0, 0.008787346, 0, 0.005272408, 0.202108963, 0.09314587, 0, 0,
0.014059754, 0.005272408, 0, 0, 0, 0.001757469, 0.029876977,
0, 0, 0.056239016, 0, 0.021089631, 0.014059754, 0.065026362,
0, 0.086115993, 0.228471002, 0.094903339, 0.001757469, 0.036906854,
0, 0.028119508), HAP1.wt_P8247.1 = c(0.016333938, 0, 0, 0.001814882,
0, 0.005444646, 0.197822141, 0.09800363, 0.001814882, 0, 0.007259528,
0.005444646, 0, 0.001814882, 0, 0.001814882, 0.030852995, 0,
0, 0.061705989, 0, 0.021778584, 0.012704174, 0.065335753, 0,
0.087114338, 0.234119782, 0.096188748, 0, 0.029038113, 0, 0.023593466
), HAP1.wt_P8247.2 = c(0.011173184, 0, 0, 0.003724395, 0, 0.003724395,
0.197392924, 0.098696462, 0.001862197, 0, 0.009310987, 0.005586592,
0, 0.001862197, 0, 0.001862197, 0.029795158, 0, 0, 0.059590317,
0, 0.018621974, 0.013035382, 0.067039106, 0, 0.08566108, 0.240223464,
0.096834264, 0, 0.029795158, 0, 0.024208566), HAP1.wt_P8246.1 = c(0.008880995,
0, 0, 0.005328597, 0.003552398, 0.003552398, 0.195381883, 0.090586146,
0, 0, 0.005328597, 0.008880995, 0, 0, 0, 0.001776199, 0.030195382,
0, 0, 0.051509769, 0, 0.023090586, 0.01598579, 0.063943162, 0,
0.097690941, 0.245115453, 0.097690941, 0, 0.026642984, 0, 0.024866785
), HAP1.wt_P8246.2 = c(0.009025271, 0, 0.001805054, 0.005415162,
0, 0.003610108, 0.19133574, 0.088447653, 0, 0.001805054, 0.012635379,
0.007220217, 0, 0, 0, 0, 0.028880866, 0, 0, 0.048736462, 0, 0.019855596,
0.027075812, 0.066787004, 0, 0.084837545, 0.241877256, 0.09566787,
0, 0.028880866, 0, 0.036101083), HAP1_P7964.1 = c(0.010040907,
0, 0.007437709, 0.017106731, 0.002975084, 0.003346969, 0.211230941,
0.040535515, 0.002603198, 0.005950167, 0.023056898, 0.00818148,
0.001115656, 0.002231313, 0.000743771, 0.005950167, 0.014503533,
0, 0.000743771, 0.065451841, 0.001115656, 0.023056898, 0.018966158,
0.031610264, 0, 0.065451841, 0.223875046, 0.105243585, 0.002603198,
0.051692079, 0.001487542, 0.051692079), MDS_P7246 = c(0.008080031,
0.000384763, 0.005386687, 0.012889573, 0.002885725, 0.002500962,
0.204116968, 0.035013467, 0.002116199, 0.00461716, 0.030973451,
0.008272412, 0.001539053, 0.001539053, 0.000192382, 0.003270489,
0.01250481, 0.000192382, 0.000961908, 0.082724125, 0.000577145,
0.025971528, 0.018661023, 0.030011543, 0.013851481, 0.065217391,
0.214313197, 0.108310889, 0.002116199, 0.048095421, 0.000577145,
0.052135437), MDS.L_P7246.1 = c(0.008308003, 0.000202634, 0.006079027,
0.013373858, 0.003039513, 0.002228976, 0.207294805, 0.036068891,
0.002026342, 0.004660587, 0.030800401, 0.008308003, 0.001621074,
0.001621074, 0.000202634, 0.003039513, 0.012563322, 0.000202634,
0.001013171, 0.081458956, 0.000405268, 0.026950351, 0.018034445,
0.030395133, 1.34e-07, 0.065450852, 0.218642321, 0.109017209,
0.002228976, 0.050050652, 0.000810537, 0.053900702), A673_P6591 = c(0.01081944,
0.000354736, 0.008158922, 0.013125222, 0.003015254, 0.003015254,
0.202554097, 0.035118836, 0.002128414, 0.006207875, 0.036183044,
0.006917347, 0.000886839, 0.001596311, 0.000709471, 0.004788932,
0.013657325, 0.000177368, 0.001064207, 0.069882937, 0.000709471,
0.025186236, 0.015253636, 0.029265697, 0.013125222, 0.06385243,
0.207875133, 0.106598084, 0.002305782, 0.056580348, 0.000354736,
0.058531394), A673_P6591.1 = c(0.011204482, 0.000186741, 0.008403361,
0.01363212, 0.003174603, 0.002614379, 0.203361345, 0.036414566,
0.002054155, 0.006162465, 0.036788049, 0.006722689, 0.000933707,
0.001680672, 0.000746965, 0.004668534, 0.01363212, 0.000186741,
0.001120448, 0.069467787, 0.000560224, 0.025957049, 0.014752568,
0.029505135, 0, 0.064239029, 0.212885154, 0.108496732, 0.002427638,
0.05751634, 0.000373483, 0.060130719), K562_P535 = c(0.008616975,
0.000143616, 0.007755278, 0.011202068, 0.002441476, 0.003303174,
0.278471923, 0.038776389, 0.00229786, 0.006031883, 0.033031739,
0.00689358, 0.003159558, 0.00229786, 0.000287233, 0.004164871,
0.012638231, 0.000287233, 0.000574465, 0.090621858, 0.000574465,
0.015941405, 0.009478673, 0.021255206, 0.009909522, 0.047536981,
0.181243717, 0.083871894, 0.002441476, 0.055292259, 0.00114893,
0.0583082), K562_P5494.1 = c(0.008692853, 0.000321957, 0.008692853,
0.012395364, 0.002736639, 0.002736639, 0.212813909, 0.032356729,
0.001448809, 0.004990341, 0.033000644, 0.007405023, 0.001448809,
0.001448809, 0.000160979, 0.004990341, 0.013039279, 0, 0.001126851,
0.074050225, 0.000643915, 0.027849324, 0.01545396, 0.029459111,
0, 0.065035415, 0.216355441, 0.111719253, 0.002092724, 0.051996137,
0.000804894, 0.054732775), K562_P5464.1 = c(0.009412153, 0.000495376,
0.008256275, 0.013705416, 0.002476882, 0.002476882, 0.20673712,
0.032529723, 0.001486129, 0.004788639, 0.034180978, 0.007595773,
0.001155878, 0.001321004, 0.000330251, 0.005284016, 0.012714663,
0.000330251, 0.000990753, 0.073811096, 0.000660502, 0.02823646,
0.016017173, 0.029722589, 0, 0.06489432, 0.217635403, 0.110634082,
0.002146631, 0.052179657, 0.000825627, 0.056968296), K562_P5359.1 = c(0.00740349,
0, 0.005288207, 0.005288207, 0.001057641, 0.003172924, 0.225806452,
0.063987308, 0.003172924, 0.004230566, 0.022739291, 0.005817028,
0.002644104, 0.002644104, 0, 0.003701745, 0.013749339, 0, 0.000528821,
0.099947118, 0, 0.015864622, 0.020095188, 0.037546272, 0, 0.080909572,
0.196192491, 0.090957166, 0.002115283, 0.040719196, 0.000528821,
0.043892121), K562_P5359.2 = c(0.007903056, 0, 0.004741834, 0.006322445,
0.001580611, 0.002107482, 0.223393045, 0.062170706, 0.002634352,
0.004741834, 0.023709168, 0.005795574, 0.002634352, 0.002634352,
0, 0.003688093, 0.014752371, 0, 0, 0.103266596, 0, 0.017386723,
0.021601686, 0.036354057, 0, 0.079030558, 0.192834563, 0.090621707,
0.002107482, 0.042676502, 0.00052687, 0.044783983), K562_P5358.1 = c(0.007462687,
0, 0.00533049, 0.005863539, 0.001599147, 0.003731343, 0.229744136,
0.064498934, 0.003198294, 0.004264392, 0.024520256, 0.005863539,
0.003198294, 0.002132196, 0, 0.003731343, 0.015458422, 0, 0,
0.101812367, 0, 0.015991471, 0.019189765, 0.036247335, 0, 0.077292111,
0.191364606, 0.087953092, 0.002132196, 0.041577825, 0.000533049,
0.045309168), K562_P5358.2 = c(0.006546645, 0, 0.005455537, 0.007637752,
0.001636661, 0.003273322, 0.225859247, 0.063829787, 0.003273322,
0.003818876, 0.024549918, 0.007092199, 0.002182215, 0.002727769,
0, 0.003818876, 0.015275505, 0, 0, 0.106382979, 0, 0.016912166,
0.01745772, 0.038188762, 0, 0.074195308, 0.195853792, 0.089470813,
0.001636661, 0.040370977, 0.000545554, 0.042007638), K562_P5357.1 = c(0.007057546,
0, 0.004885993, 0.00597177, 0.001085776, 0.003257329, 0.231813246,
0.06514658, 0.003257329, 0.004343105, 0.024972856, 0.00597177,
0.003257329, 0.002171553, 0, 0.003800217, 0.014115092, 0, 0,
0.118892508, 0, 0.01194354, 0.016829533, 0.030944625, 0, 0.07383279,
0.184039088, 0.086862106, 0.002171553, 0.049402823, 0.000542888,
0.043431053), K562_P5357.2 = c(0.008086253, 0, 0.003773585, 0.006469003,
0.001617251, 0.003234501, 0.23180593, 0.063072776, 0.003773585,
0.003234501, 0.023180593, 0.005929919, 0.003234501, 0.002695418,
0, 0.003773585, 0.014555256, 0, 0, 0.116442049, 0, 0.01509434,
0.017789757, 0.035579515, 0, 0.071698113, 0.189757412, 0.085714286,
0.002156334, 0.044743935, 0.000539084, 0.042048518), K562_P5356.1 = c(0.006292906,
0, 0.005148741, 0.004576659, 0.001716247, 0.002860412, 0.215675057,
0.070366133, 0.003432494, 0.003432494, 0.025743707, 0.005720824,
0.003432494, 0.002860412, 0, 0.004576659, 0.016018307, 0, 0,
0.127002288, 0, 0.01201373, 0.016590389, 0.03375286, 0, 0.076659039,
0.183638444, 0.086956522, 0.001716247, 0.044622426, 0.000572082,
0.044622426), K562_P5356.2 = c(0.00755814, 0, 0.004069767, 0.004651163,
0.001744186, 0.002325581, 0.21627907, 0.070930233, 0.002906977,
0.004069767, 0.025, 0.005813953, 0.003488372, 0.002906977, 0,
0.004069767, 0.015697674, 0, 0, 0.125581395, 0, 0.013953488,
0.015697674, 0.035465116, 0, 0.073837209, 0.190697674, 0.08372093,
0.001744186, 0.045348837, 0.000581395, 0.041860465), K562_P5355.1 = c(0.009320175,
0, 0.003289474, 0.007675439, 0.001096491, 0.002741228, 0.285087719,
0.059210526, 0.003837719, 0.006030702, 0.023026316, 0.003837719,
0.003837719, 0.003289474, 0, 0.003289474, 0.016995614, 0.000548246,
0, 0.094298246, 0, 0.014254386, 0.012061404, 0.026315789, 0,
0.052631579, 0.175438596, 0.08497807, 0.001096491, 0.057017544,
0.000548246, 0.048245614), K562_P5355.2 = c(0.008210181, 0, 0.004378763,
0.009304871, 0.001094691, 0.002189382, 0.280788177, 0.056376574,
0.003284072, 0.005473454, 0.024630542, 0.004378763, 0.003284072,
0.003284072, 0, 0.004378763, 0.016967707, 0, 0, 0.100164204,
0, 0.014778325, 0.012588944, 0.028461959, 0, 0.053639847, 0.172961138,
0.084838533, 0.001642036, 0.054187192, 0.000547345, 0.048166393
), K562_P5269.1 = c(0.007308161, 0, 0.003045067, 0.007917174,
0.001218027, 0.00365408, 0.228989038, 0.071863581, 0.00365408,
0.004263094, 0.017661389, 0.007308161, 0.002436054, 0.003045067,
0, 0.002436054, 0.017052375, 0, 0, 0.107186358, 0, 0.015834348,
0.020097442, 0.033495737, 0, 0.085261876, 0.18453106, 0.091961023,
0.002436054, 0.040194884, 0.001218027, 0.03593179), K562_P5269.2 = c(0.006234414,
0, 0.00436409, 0.006234414, 0.002493766, 0.002493766, 0.224438903,
0.073566085, 0.003117207, 0.002493766, 0.018703242, 0.006857855,
0.002493766, 0.003117207, 0, 0.001246883, 0.015586035, 0, 0,
0.109725686, 0, 0.015586035, 0.018703242, 0.034289277, 0, 0.082294264,
0.195760598, 0.092892768, 0.003117207, 0.039276808, 0.000623441,
0.034289277), K562_P5268.1 = c(0.004635762, 0, 0.00397351, 0.007284768,
0.001986755, 0.002649007, 0.214569536, 0.071523179, 0.00397351,
0.002649007, 0.01986755, 0.007284768, 0.003311258, 0.003311258,
0, 0.003311258, 0.016556291, 0, 0, 0.104635762, 0, 0.017880795,
0.018543046, 0.039735099, 0, 0.090066225, 0.195364238, 0.091390728,
0.001986755, 0.039735099, 0.000662252, 0.033112583), K562_P5268.2 = c(0.005242464,
0, 0.002621232, 0.00655308, 0.002621232, 0.00327654, 0.216251638,
0.070117955, 0.00327654, 0.00327654, 0.020969856, 0.008519004,
0.002621232, 0.001965924, 0, 0.002621232, 0.018348624, 0, 0,
0.108781127, 0, 0.015727392, 0.020314548, 0.040629096, 0, 0.087811271,
0.190039319, 0.090432503, 0.001965924, 0.040629096, 0.000655308,
0.034731324)), .Names = c("subcellular_location", "HAP1_P5242",
"HAP1.wt_P8255.1", "HAP1.wt_P8255.2", "HAP1.wt_P8254.1", "HAP1.wt_P8254.2",
"HAP1.kd_P8253.1", "HAP1.kd_P8253.2", "HAP1.kd_P8252.1", "HAP1.kd_P8252.2",
"HAP1.kd_P8249.1", "HAP1.kd_P8249.2", "HAP1.kd_P8248.1", "HAP1.kd_P8248.2",
"HAP1.wt_P8247.1", "HAP1.wt_P8247.2", "HAP1.wt_P8246.1", "HAP1.wt_P8246.2",
"HAP1_P7964.1", "MDS_P7246", "MDS.L_P7246.1", "A673_P6591", "A673_P6591.1",
"K562_P535", "K562_P5494.1", "K562_P5464.1", "K562_P5359.1",
"K562_P5359.2", "K562_P5358.1", "K562_P5358.2", "K562_P5357.1",
"K562_P5357.2", "K562_P5356.1", "K562_P5356.2", "K562_P5355.1",
"K562_P5355.2", "K562_P5269.1", "K562_P5269.2", "K562_P5268.1",
"K562_P5268.2"), class = "data.frame", row.names = c(NA, -32L
))
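The faceting variable used in the answer above is the part of each cell_line value before the first "." or "_". If stringi is not available, a base-R sketch of the same extraction could look like this. The vector below holds a handful of column names taken from the data, and the variable name cell follows the answer.

```r
# A few of the column names from the data above
cell_line <- c("HAP1_P5242", "HAP1.wt_P8255.1", "MDS.L_P7246.1",
               "A673_P6591", "K562_P5359.2")

# Drop everything from the first "." or "_" onwards, keeping the cell line prefix
cell <- sub("[._].*$", "", cell_line)

# Number of samples that would land in each facet
table(cell)
```

On the real data this would be applied to df2$cell_line, e.g. df2$cell <- sub("[._].*$", "", df2$cell_line).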

Calculate euclidean distance between profiles stored in data frame. Using one row as a reference

I have a data frame like the one below:
> dput(data)
structure(list(`28` = c(0, 0, 0, 0, 0, 0), `38` = c(0, 0, 0,
0, 0, 0), `45` = c(0, 0, 0, 0, 0, 0), `53` = c(0, 0, 0, 0, 0,
0), `60` = c(0, 0, 0, 0, 0, 0), `78` = c(0, 0, 0, 0, 0, 0), `116` = c(0,
0, 0, 0, 0, 0.983309489747258), `145` = c(0, 0, 0, 0, 0, 1),
`189` = c(0, 1, 0.560384508734634, 0, 0, 0.875695437927198
), `223` = c(0, 0.988158197286733, 1, 0, 0, 0.492500108379937
), `281` = c(1, 0.677856978615774, 0.448525741750624, 0,
0.362088745790311, 0.180474270603026), `362` = c(0.79151704397606,
0.763278914693033, 0.35864682503004, 1, 1, 0.114178985852806
), `440` = c(0.662841530054645, 0.818636468153598, 0.448488769756909,
0, 0.448447503793346, 0), `524` = c(0, 0.638192687974247,
0, 0, 0, 0), `634` = c(0, 0, 0, 0, 0, 0), `759` = c(0, 0,
0, 0, 0, 0), `848` = c(0, 0, 0, 0, 0, 0), `979` = c(0, 0,
0, 0, 0, 0), `1120` = c(0, 0, 0, 0, 0, 0), `1248` = c(0,
0, 0, 0, 0, 0)), .Names = c("28", "38", "45", "53", "60",
"78", "116", "145", "189", "223", "281", "362", "440", "524",
"634", "759", "848", "979", "1120", "1248"), row.names = c("Mark",
"Gregg", "Tim", "Oscar", "Tom", "Matthew"
), class = "data.frame")
I would like to calculate the Euclidean distance between the profiles in this data, using Tim as the reference. The results can be stored in an additional column.
Mark to Tim
Gregg to Tim
Oscar to Tim
etc.
You can use the dist function (which actually computes the distances between all pairs of profiles):
m <- as.matrix(data)
distances <- as.matrix(dist(m, method = "euclidean", upper = TRUE, diag = TRUE))
> distances['Mark','Tim']
[1] 1.36069
> distances['Gregg','Tim']
[1] 0.9767401
> distances['Oscar','Tim']
[1] 1.458658
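If only the distances to the reference row are needed, stored in an extra column as the question asks, one possible sketch is to subtract the reference profile from every row and take the row-wise Euclidean norm. The toy data and the column name dist_to_Tim are made up for illustration.

```r
# Toy profiles (made-up values); "Tim" is the reference row
data <- data.frame(a = c(0, 3, 1), b = c(0, 4, 1),
                   row.names = c("Tim", "Mark", "Gregg"))

m   <- as.matrix(data)
ref <- m["Tim", ]

# Subtract the reference profile from every row, then take the Euclidean norm
dist_to_ref <- sqrt(rowSums(sweep(m, 2, ref)^2))

# Store the result as an additional column
data$dist_to_Tim <- unname(dist_to_ref)

# Cross-check against the full distance matrix from dist()
full <- as.matrix(dist(m, method = "euclidean"))
all.equal(unname(full[, "Tim"]), data$dist_to_Tim)
```

This computes one distance per row instead of the full pairwise matrix, which may matter when there are many profiles.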

Grouping elements from different data

In my work I'm trying to find which genes usually occur together. I set up some experiments and am now trying to analyze the data. I have already written a script for analyzing it, but it's still not enough.
What I want to do this time is analyze a couple of tables and establish which genes are usually together, i.e. in the same cluster.
That's my data:
First table:
> dput(tbl_col_clu1[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `20` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `52.5` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `81` = c(0, 0, 0, 0,
0, 0, 0.64209043, 0, 0, 0, 0, 0, 0, 0, 0.636411741, 0.183490041,
0, 0, 0, 0), `110` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `140.5` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `189` = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.84958569, 0, 0, 0, 0, 0), `222.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0.37119221, 0, 0, 0, 1, 0, 0, 0, 0,
0), `278` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `340` = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `397` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `453.5` = c(0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `529` = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `580` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `630.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `683.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `735.5` = c(0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `784` = c(0,
0, 0, 0, 0, 0, 0, 0.399952462, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0.959211661, 1), `832` = c(0, 0.1266780707, 0, 0, 0, 0, 0, 0.2132893016,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0.959211661, 1), `882.5` = c(0,
0.12667807, 0, 0, 0, 1, 0, 0.08480435, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0.70163097), `926.5` = c(0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), `973` = c(0, 0.12621196, 0,
0, 0, 0, 0, 0.11813646, 0, 0, 0, 1, 0, 0, 0.59389934, 1, 0, 0,
0, 0), `1108` = c(0, 0.092444384, 0, 0, 0, 0, 0, 0.115758222,
0, 0, 0, 0.925835779, 0, 0, 1, 0.303482426, 0.848464317, 0, 0,
0), `1200` = c(0, 0.120055749, 0, 1, 0, 0, 0, 0.150055416, 0,
0, 0, 0.558015841, 0, 0, 0.796949668, 0.276321753, 1, 0, 0, 0
), Clusters = structure(c(1L, 64L, 45L, 102L, 11L, 77L, 170L,
55L, 59L, 316L, 316L, 98L, 90L, 77L, 232L, 178L, 101L, 50L, 51L,
51L), .Label = c("10", "10,13,15", "10,15", "10,15,16", "10,20,21,22,23,24",
"10,22,23,24", "11", "11,12,13,14,15", "11,12,13,14,15,16", "11,12,13,14,15,16,17",
"12", "12,13", "12,13,14", "12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17",
"12,13,14,15,16,17,18,19,20,21,22,23,24", "12,13,15", "12,13,17",
"13", "13,14", "13,14,15", "13,14,15,16", "13,14,15,16,17", "13,15",
"13,15,16,17", "14", "14,15", "14,15,16", "14,15,16,17", "14,15,16,17,18,19,20,21,22,23,24",
"14,19", "15", "15,16", "15,16,17", "15,16,17,18,19,20,21,22,23,24",
"15,16,17,19,20,21,22,23,24", "15,17", "15,17,24", "15,22,23,24",
"15,23", "15,24", "16", "16,17", "17", "17,18,19,20", "17,18,19,20,21,22,23,24",
"17,21,22,23,24", "18", "18,19", "18,19,20", "18,19,20,21", "18,19,20,21,22",
"18,19,20,21,22,23", "18,19,20,21,22,23,24", "18,19,21", "18,19,22,23",
"18,20", "19", "19,20", "19,20,21", "19,20,21,22", "19,20,21,22,23",
"19,20,21,22,23,24", "19,20,22", "19,20,22,23", "19,20,22,23,24",
"19,20,23", "19,21", "19,22", "19,23", "19,24", "2", "2,18,19,20",
"2,19,20", "2,3,4", "20", "20,21", "20,21,22", "20,21,22,23",
"20,21,22,23,24", "20,21,23", "20,22", "20,22,23", "20,22,23,24",
"20,22,24", "20,23", "20,23,24", "20,24", "21", "21,22", "21,22,23",
"21,22,23,24", "21,23,24", "21,24", "22", "22,23", "22,23,24",
"22,24", "23", "23,24", "24", "3", "3,10", "3,18,19,20", "3,18,19,20,21,22,23,24",
"3,19,20", "3,19,20,21", "3,19,20,22,23,24", "3,20,21,22,23,24",
"3,20,22,23,24", "3,21,23,24", "3,22,23,24", "3,22,24", "3,23",
"3,23,24", "3,24", "3,4", "3,4,10", "3,4,18,19", "3,4,18,19,20",
"3,4,18,19,20,21,22,23", "3,4,18,19,20,21,22,23,24", "3,4,19,20,21",
"3,4,21", "3,4,21,22,23", "3,4,21,22,23,24", "3,4,22,23", "3,4,22,23,24",
"3,4,22,24", "3,4,23,24", "3,4,24", "3,4,5", "3,4,5,10", "3,4,5,10,23,24",
"3,4,5,20", "3,4,5,22,23,24", "3,4,5,23,24", "3,4,5,24", "3,4,5,6",
"3,4,5,6,10", "3,4,5,6,20,22,23,24", "3,4,5,6,7", "3,4,5,6,7,10",
"3,4,5,6,7,24", "3,4,5,6,7,8", "3,4,5,6,7,8,10", "3,4,5,6,7,8,10,13",
"3,4,5,6,7,8,10,22,23,24", "3,4,5,6,7,8,12", "3,4,5,6,7,8,15",
"3,4,5,6,7,8,18,19,20,21,22,23,24", "3,4,5,6,7,8,22,23,24", "3,4,5,6,7,8,9,10",
"3,4,5,6,7,8,9,10,11,12", "3,4,5,6,7,8,9,10,11,12,13,14,15",
"3,4,5,6,7,8,9,10,11,12,13,14,15,16,17", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,4,5,6,7,8,9,10,11,14,15", "3,4,5,6,7,8,9,10,19,20,21,22,23,24",
"3,4,5,6,7,8,9,10,22,23,24", "3,4,6", "3,4,6,7,20,21,22,23,24",
"3,4,7", "3,4,7,8", "3,5,6,7,8", "3,5,8", "3,7", "3,7,19,20,22,23",
"4", "4,10", "4,10,24", "4,18,19,20", "4,19,20", "4,20,21,22",
"4,20,21,22,23,24", "4,20,22,23,24", "4,22,23,24", "4,23,24",
"4,24", "4,5", "4,5,10", "4,5,10,21", "4,5,10,23,24", "4,5,19,20,21,22,23",
"4,5,19,20,22,23,24", "4,5,20,21,22,23,24", "4,5,20,22,23,24",
"4,5,22,23,24", "4,5,24", "4,5,6", "4,5,6,10", "4,5,6,10,20,22,23,24",
"4,5,6,19", "4,5,6,22,23,24", "4,5,6,7", "4,5,6,7,10", "4,5,6,7,19,20,21,22,23,24",
"4,5,6,7,22,23,24", "4,5,6,7,8", "4,5,6,7,8,10", "4,5,6,7,8,10,19,20,21,22,23,24",
"4,5,6,7,8,10,20,21,22,23,24", "4,5,6,7,8,10,21,22,23,24", "4,5,6,7,8,10,22,23,24",
"4,5,6,7,8,10,23,24", "4,5,6,7,8,15", "4,5,6,7,8,17,18,19,20,21,22,23,24",
"4,5,6,7,8,19,20", "4,5,6,7,8,19,20,21,22,23,24", "4,5,6,7,8,20,21,22,23,24",
"4,5,6,7,8,21,22,23,24", "4,5,6,7,8,22,23,24", "4,5,6,7,8,9,10",
"4,5,6,7,8,9,10,11,12", "4,5,6,7,8,9,10,11,12,13,14,15", "4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "4,5,6,7,8,9,10,12,13",
"4,5,6,7,8,9,14,15,16", "4,5,7,9", "4,5,8,22", "4,6", "4,6,7,22,23,24",
"4,6,7,23,24", "4,6,7,8,15,17", "4,6,7,8,23,24", "4,7", "4,7,20,21",
"4,7,21,22,23,24", "4,7,8", "4,7,8,22,23,24", "5", "5,10", "5,17",
"5,18,19,20,21,22,23", "5,19,20,21,22,23,24", "5,20", "5,22,23,24",
"5,24", "5,6", "5,6,10", "5,6,7", "5,6,7,10", "5,6,7,10,19",
"5,6,7,22,23,24", "5,6,7,8", "5,6,7,8,10", "5,6,7,8,10,15", "5,6,7,8,10,22,23,24",
"5,6,7,8,15", "5,6,7,8,18,19,20,21,22,23,24", "5,6,7,8,21,22,23,24",
"5,6,7,8,22,23,24", "5,6,7,8,9", "5,6,7,8,9,10", "5,6,7,8,9,10,11,12,13",
"5,6,7,8,9,10,11,12,13,14,15", "5,6,7,8,9,12", "5,6,7,8,9,13",
"5,7", "5,7,8", "5,8", "6", "6,10", "6,21,22,23", "6,22", "6,22,23,24",
"6,7", "6,7,10,17", "6,7,22,23,24", "6,7,23,24", "6,7,24", "6,7,8",
"6,7,8,10", "6,7,8,13,14,15,16,17", "6,7,8,15", "6,7,8,19,20",
"6,7,8,20,21,22,23,24", "6,7,8,21,22,23,24", "6,7,8,23,24", "6,7,8,9",
"6,7,8,9,10", "6,7,8,9,10,11,12", "6,7,8,9,10,11,12,13,14,15,16,17",
"6,7,8,9,10,15,16", "6,7,8,9,10,18,19,20,21,22,23,24", "6,7,8,9,15",
"6,8", "7", "7,15", "7,15,17", "7,16,18,21", "7,17", "7,19,20",
"7,19,20,21,22", "7,20,21,22,23,24", "7,20,22,23,24", "7,22,23,24",
"7,24", "7,8", "7,8,10", "7,8,10,22,23,24", "7,8,13,15", "7,8,14",
"7,8,15", "7,8,15,16", "7,8,15,23", "7,8,20", "7,8,22", "7,8,23",
"7,8,9", "7,8,9,10", "7,8,9,13", "7,8,9,15,16,17", "8", "8,10",
"8,15", "8,17", "8,22", "8,24", "8,9", "8,9,10", "9", "9,10,11,12,13,14,15,16,17"
), class = "factor")), .Names = c("10", "20", "52.5", "81", "110",
"140.5", "189", "222.5", "278", "340", "397", "453.5", "529",
"580", "630.5", "683.5", "735.5", "784", "832", "882.5", "926.5",
"973", "1108", "1200", "Clusters"), row.names = c("at1g01050.1",
"at1g01080.1", "at1g01090.1", "at1g01220.1", "at1g01320.2", "at1g01420.1",
"at1g01710.1", "at1g01800.1", "at1g01920.2", "at1g01940.1", "at1g01960.1",
"at1g02020.2", "at1g02100.2", "at1g02140.1", "at1g02150.1", "at1g02500.2",
"at1g02560.1", "at1g02880.3", "at1g02920.1", "at1g02930.2"), class = "data.frame")
Second table:
> dput(tbl_col_clu2[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `20` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `52.5` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `81` = c(0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `110` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `140.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `189` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `222.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `278` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), `340` = c(0,
0, 0, 0, 0, 0, 0.583163048, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0,
1, 0.218194067), `397` = c(0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0.63953839, 0, 1, 0, 0, 0, 1), `453.5` = c(0, 0.66069369,
0, 0, 0, 1, 0.57541627, 1, 1, 0, 0, 0, 1, 0.64615661, 0, 0.45209671,
0, 0, 0, 0.17022498), `529` = c(0, 0.521435654, 0, 0, 1, 0, 0.175996209,
0, 0, 0, 1, 0, 0, 0, 0, 0.886059888, 0, 0, 0, 0.17022498), `580` = c(0,
0.437291195, 0, 0, 1, 0, 0.20731698, 0, 0, 0, 1, 0, 0, 0, 0,
0.719755907, 0, 0, 0, 0.033248127), `630.5` = c(0, 0.52204783,
0, 0, 0, 0, 0.48815538, 0, 0, 0, 0, 1, 0, 0, 0, 0.82709638, 0,
0, 0, 0.09539534), `683.5` = c(0, 0.52429838, 0, 0, 0, 0, 0.59605685,
0, 0, 0, 0, 0, 0, 0, 0, 0.27845748, 0.28224351, 0, 0, 0), `735.5` = c(1,
0.3768651, 0, 1, 0, 0, 0.51381348, 0, 0, 0, 0, 0, 0, 0, 0, 0.39914361,
0.22206677, 0, 0, 0), `784` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0), `832` = c(0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.16189002, 0, 0, 0), `882.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `926.5` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), `973` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.86100786, 0, 0, 0, 0,
0), `1108` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), `1200` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), Clusters = structure(c(168L, 32L, 246L,
168L, 81L, 44L, 8L, 44L, 27L, 318L, 81L, 132L, 15L, 3L, 219L,
32L, 156L, 318L, 1L, 6L), .Label = c("10", "10,11", "10,11,12",
"10,11,12,13", "10,11,12,13,14", "10,11,12,13,14,15", "10,11,12,13,14,15,16",
"10,11,12,13,14,15,16,17", "10,11,12,13,14,15,16,17,18,19", "10,11,12,13,14,15,16,17,18,19,20",
"10,11,12,13,14,15,16,17,18,19,20,21", "10,11,12,13,14,16", "10,11,12,13,15,16,17,18,19,20,21",
"10,11,12,13,19", "10,12", "10,12,13", "10,12,13,14", "10,12,13,14,15",
"10,12,13,14,15,16,17", "10,12,13,15", "10,12,21", "10,13", "10,13,14",
"10,17,18", "10,20", "11", "11,12", "11,12,13", "11,12,13,14",
"11,12,13,14,15", "11,12,13,14,15,16", "11,12,13,14,15,16,17",
"11,12,13,14,15,16,17,18,19", "11,12,13,14,15,16,17,18,19,20",
"11,12,13,14,15,16,17,18,19,20,21,22,23", "11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"11,12,13,14,15,16,17,18,19,21,22", "11,12,13,14,15,16,18", "11,12,13,17,18,19",
"11,12,14", "11,13", "11,13,14,15,16", "11,15", "12", "12,13",
"12,13,14", "12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17",
"12,13,14,15,16,17,18", "12,13,14,15,16,17,18,19", "12,13,14,15,16,17,18,19,20",
"12,13,14,15,16,17,18,19,20,21", "12,13,14,15,16,17,18,19,20,21,22",
"12,13,14,15,16,17,18,19,20,21,22,23", "12,13,14,15,16,17,18,19,20,21,22,23,24",
"12,13,14,15,16,17,18,19,23,24", "12,13,14,15,16,17,19", "12,13,14,15,16,17,19,20,21",
"12,13,14,15,16,17,21", "12,13,14,15,16,18", "12,13,14,15,17",
"12,13,14,16,17,19", "12,13,14,18", "12,13,15", "12,13,16", "12,13,16,17,18,19",
"12,13,16,19", "12,13,17", "12,13,21,22,23", "12,14", "12,14,15",
"12,14,15,16", "12,14,15,17,19", "12,15", "12,15,16,17", "12,16,17",
"12,20", "12,21,23", "13", "13,14", "13,14,15", "13,14,15,16",
"13,14,15,16,17", "13,14,15,16,17,18", "13,14,15,16,17,18,19",
"13,14,15,16,17,18,19,20", "13,14,15,16,17,18,19,20,21", "13,14,15,16,17,18,19,20,21,22",
"13,14,15,16,17,18,19,20,21,22,23", "13,14,15,16,17,18,19,20,21,22,23,24",
"13,14,15,16,17,18,19,21", "13,14,15,16,17,18,19,21,22,23", "13,14,15,16,17,19",
"13,14,15,16,17,21", "13,14,15,16,18,23", "13,14,17", "13,14,19,20,21,22,23",
"13,14,23,24", "13,15", "13,15,16", "13,15,16,18,19", "13,15,17",
"13,16,17", "13,17", "13,17,19", "13,19", "13,21", "14", "14,15",
"14,15,16", "14,15,16,17", "14,15,16,17,18", "14,15,16,17,18,19",
"14,15,16,17,18,19,20", "14,15,16,17,18,19,20,21", "14,15,16,17,18,19,20,21,22",
"14,15,16,17,18,19,20,21,22,23", "14,15,16,17,18,19,20,21,22,23,24",
"14,15,16,17,18,19,20,22,23,24", "14,15,16,17,19", "14,15,16,17,19,20",
"14,15,16,17,19,20,21", "14,15,16,17,22", "14,15,16,19", "14,15,17",
"14,15,19", "14,17", "14,17,18,19", "14,19", "14,21", "15", "15,16",
"15,16,17", "15,16,17,18", "15,16,17,18,19", "15,16,17,18,19,20",
"15,16,17,18,19,20,21", "15,16,17,18,19,20,21,22,23", "15,16,17,18,19,20,21,22,23,24",
"15,16,17,19", "15,16,17,19,20,21", "15,16,17,19,24", "15,16,17,20,21",
"15,16,17,21", "15,16,17,23", "15,16,18,19", "15,16,19,20", "15,17",
"15,18,19,20", "15,18,19,20,21", "15,19", "16", "16,17", "16,17,18",
"16,17,18,19", "16,17,18,19,20", "16,17,18,19,20,21", "16,17,18,19,20,21,22",
"16,17,18,19,20,21,22,23", "16,17,18,19,20,21,22,23,24", "16,17,19",
"16,17,19,20", "16,17,19,20,21", "16,17,19,21", "16,17,23", "16,19",
"17", "17,18", "17,18,19", "17,18,19,20", "17,18,19,20,21", "17,18,19,20,21,22",
"17,18,19,20,21,22,23", "17,18,19,20,21,22,23,24", "17,18,19,21",
"17,19", "17,19,20", "17,19,20,21", "17,19,20,21,22,23,24", "17,19,23",
"17,20,21", "17,20,21,23", "17,21,22", "17,23", "17,24", "18",
"18,19", "18,19,20", "18,19,20,21", "18,19,20,21,22", "18,19,20,21,22,23",
"18,19,20,21,22,23,24", "18,19,20,21,23", "18,20", "19", "19,20",
"19,20,21", "19,20,21,22", "19,20,21,22,23", "19,20,21,22,23,24",
"19,20,21,23,24", "19,20,22", "19,21", "19,22", "19,23", "2",
"2,17", "2,3,4,5,6", "2,3,4,5,6,7", "20", "20,21", "20,21,22",
"20,21,22,23", "20,21,22,23,24", "20,21,23", "20,21,23,24", "21",
"21,22", "21,22,23", "21,22,23,24", "21,23", "22", "22,23", "22,23,24",
"23", "23,24", "24", "3", "3,23,24", "3,4", "3,4,23,24", "3,4,5",
"3,4,5,6", "3,4,5,6,13,14,15,16,17,18,19,20,21,22,23,24", "3,4,5,6,7",
"3,4,5,6,7,8,9", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,4,5,6,7,8,9,20,21,22,23,24", "3,4,5,6,7,8,9,21,22,23,24",
"3,4,5,6,8,9", "3,4,5,7,8,9,15,16,17,18,19,20,21,22,23", "3,4,6,12,13,14,15,16,17,18,19,20,21,22,23,24",
"3,8,9,10,11,12,13,14,15,16,17,18,19,20", "4", "4,17,18,19,20,21,22,23,24",
"4,19,20,21,22,23,24", "4,21", "4,22,23,24", "4,5,17,18,19,20,21,22,23,24",
"4,5,21,22,23,24", "4,5,6", "4,5,6,22,23,24", "4,5,6,7,8,9",
"4,5,6,7,8,9,10", "4,5,6,7,8,9,10,15,16,17,18,19,20,21,22,23,24",
"4,5,6,7,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24", "4,5,6,7,8,9,13",
"4,5,6,7,8,9,14,15,16,17,18,19,20,21,22,23,24", "4,5,6,7,8,9,17,18,19,20,21,22,23,24",
"4,5,6,7,8,9,19,20,21,22,23,24", "4,5,6,7,8,9,19,23,24", "4,5,6,7,8,9,23,24",
"4,5,7,8,9", "4,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24",
"4,8,9,23,24", "5", "5,22,23", "5,6", "5,6,15,16,17,18,19,20,21,22,23,24",
"5,6,19,20,21,22,23,24", "5,6,24", "5,6,7", "5,6,7,8", "5,6,7,8,19,20,21,22,23,24",
"5,6,7,8,9", "5,6,7,8,9,10,11,12,13", "5,6,7,8,9,10,11,12,13,14,15,16,17",
"5,6,7,8,9,15,23,24", "5,6,9", "5,7", "5,8,9", "6", "6,15,16,17,18,19,20,21,22,23,24",
"6,19,20,21,22,23,24", "6,20,21,22,23,24", "6,21,22,23,24", "6,7",
"6,7,8", "6,7,8,9", "6,7,8,9,15,16,17,18,19,20,21,22,23,24",
"6,7,8,9,23,24", "6,7,9", "6,8,15,16,17,18,19,20,21,22,23", "6,8,9",
"6,9", "7", "7,14,24", "7,8,9", "7,8,9,10,11,12,13,14,15", "7,8,9,20,21,22,23,24",
"7,8,9,23,24", "7,9", "7,9,10", "8", "8,19,20,21", "8,19,20,21,22,23,24",
"8,9", "8,9,10,11,12,13,14,15,16,17", "8,9,10,17,18,19,20,21,22",
"8,9,12,13,14,15,16,17,18,19", "8,9,14,15,16,17,18,19,20,21,22,23,24",
"8,9,15,16,17,18,19,20,21,22", "8,9,19", "8,9,19,20,21,22,23",
"8,9,21,22", "9", "9,10", "9,10,11,12,13,14", "9,10,11,12,13,14,15,16",
"9,10,11,12,13,14,15,16,17", "9,10,11,12,13,14,15,16,17,18,19",
"9,10,11,12,13,14,15,16,17,18,19,20,21", "9,10,11,12,13,14,15,16,17,18,19,20,21,22,23",
"9,10,11,12,13,14,15,16,17,19", "9,12", "9,12,13", "9,12,13,14",
"9,13", "9,13,14,15", "9,13,14,15,16,17", "9,13,14,15,18", "9,14",
"9,14,15,16", "9,15", "9,15,16,17", "9,16", "9,16,17,18,19,21,22",
"9,16,17,19", "9,17", "9,17,18", "9,19", "9,19,20", "9,19,20,21",
"9,19,21", "9,20", "9,20,21", "9,20,21,22", "9,21", "9,22", "9,23"
), class = "factor")), .Names = c("10", "20", "52.5", "81", "110",
"140.5", "189", "222.5", "278", "340", "397", "453.5", "529",
"580", "630.5", "683.5", "735.5", "784", "832", "882.5", "926.5",
"973", "1108", "1200", "Clusters"), row.names = c("at1g01050.1",
"at1g01080.1", "at1g01090.1", "at1g01220.1", "at1g01420.1", "at1g01470.1",
"at1g01800.1", "at1g01910.5", "at1g01920.2", "at1g01980.1", "at1g02020.2",
"at1g02100.2", "at1g02130.1", "at1g02140.1", "at1g02150.1", "at1g02500.2",
"at1g02560.1", "at1g02780.1", "at1g02880.3", "at1g02920.1"), class = "data.frame")
Third Table:
> dput(tbl_col_clu3[1:20,])
structure(list(`10` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0), `33.95` = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `58.66` = c(0, 0, 0, 0, 0.328143363,
0.552139556, 0.495919686, 0, 0, 0, 0, 0, 0, 0, 0, 0.416266322,
0.886125103, 1, 1, 0), `84.42` = c(0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0.327004551, 0, 0, 0, 0.956778355, 1, 0.175277617, 0.240402438,
0), `110.21` = c(0, 0, 0, 0, 0, 0.151581882, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0.091367379, 0.029316359, 0, 0), `134.16` = c(0.190968551,
0, 0, 0, 0, 0.164736594, 0, 0, 0, 0, 0, 0.650199285, 0, 0, 0,
0, 0.097800974, 0.007393484, 0, 0), `164.69` = c(0.5342874459,
0, 0.3619993464, 0, 0, 0.1891527151, 0, 0, 0, 0, 0, 0.4926963182,
0, 0, 0, 0, 0, 0, 0, 0), `199.1` = c(0.866134859, 0, 0.405387979,
0, 0, 0.274468991, 0, 0, 0, 0, 0, 0.352737127, 0.170514318, 0,
0, 0, 0, 0, 0, 0), `234.35` = c(1, 0, 0.446118481, 0, 0, 0.338427523,
0, 0, 0, 0, 0, 0.204601923, 0.343919727, 0, 0, 0, 0, 0, 0, 0),
`257.19` = c(0.732231652, 0, 0.666653103, 0, 0, 0.403078017,
0, 0, 0, 0, 0, 0.315665123, 1, 0, 0, 0, 0, 0, 0, 0), `361.84` = c(0.660960044,
0, 1, 0, 0, 0.202578329, 0, 0, 0, 0, 0, 0.320183046, 0.424361453,
0, 0, 0, 0, 0, 0, 0), `432.74` = c(0.47961801, 0, 0.48323321,
0, 0, 0.25926071, 0, 0, 0, 0, 0, 0.36362413, 0.43039587,
0, 0, 0, 0, 0, 0, 0), `506.34` = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0.22943212, 0.19354376, 0, 0, 0, 0, 0, 0, 0), `581.46` = c(0,
0.52783556, 0, 1, 0, 0, 0, 0.64407392, 0, 0.70701938, 0,
0.2596209, 0.29757967, 0, 0, 0, 0, 0, 0, 0), `651.71` = c(0,
0.32678969, 0, 0.36428195, 0, 0, 0, 0.64951761, 0, 0.80866933,
1, 0.18614028, 0.21567888, 0.32813633, 0, 0, 0, 0, 0, 0),
`732.59` = c(0, 0.229023369, 0, 0.312832425, 0, 0, 0, 0.696041374,
0, 0.590471454, 0, 0.108699479, 0.187935709, 0.275177957,
0, 0, 0, 0, 0, 0.243080694), `817.56` = c(0, 0.25668583,
0, 0.4003249, 0, 0, 0, 0.53376606, 0, 0.85524485, 0, 0.22539659,
0.27977127, 0.55089774, 0, 0, 0, 0, 0, 1), `896.24` = c(0,
0.31675535, 0, 0.50882005, 0, 0, 0, 0.74705458, 0.12936306,
1, 0, 0.1949139, 0.21957859, 0.75063327, 0, 0, 0, 0, 0, 0.63346358
), `971.77` = c(0, 0.27811949, 0, 0.48419038, 0, 0, 0, 0.8563439,
0.39897143, 0.84491933, 0, 0.13935282, 0.17670128, 0.84111004,
0, 0, 0, 0, 0, 0), `1038.91` = c(0, 1, 0, 0.52506752, 0,
0, 0, 1, 1, 0.85617714, 0, 0.13507463, 0, 1, 0, 0, 0, 0,
0, 0), Clusters = structure(c(222L, 88L, 237L, 88L, 145L,
155L, 143L, 88L, 122L, 88L, 97L, 180L, 260L, 102L, 186L,
145L, 149L, 149L, 145L, 106L), .Label = c("10", "10,11",
"10,11,12", "10,11,12,13", "10,11,12,13,14", "10,11,12,13,14,15",
"10,11,12,13,14,15,16", "10,11,12,13,14,15,16,17,18", "10,11,12,13,14,15,16,17,18,19",
"10,11,12,13,14,15,16,17,18,19,20", "10,11,12,14", "10,11,12,14,15",
"10,11,12,14,15,16", "10,11,12,14,15,16,17,18", "10,11,12,14,15,16,17,18,19",
"10,11,12,14,15,16,17,18,19,20", "10,11,12,14,15,17,18,19",
"10,11,12,15,16,17", "10,11,14", "10,11,15", "10,11,15,16,17",
"10,11,16", "10,11,17", "10,11,20", "10,12", "10,14,15,16",
"10,14,15,16,17,18,19", "10,15", "10,15,16", "10,15,16,18",
"10,16,19", "10,18,19,20", "10,19", "10,19,20", "10,20",
"11", "11,12", "11,12,13", "11,12,13,14", "11,12,13,14,15",
"11,12,13,14,15,16", "11,12,13,14,15,16,17,18", "11,12,13,14,15,16,17,18,19",
"11,12,13,14,15,16,17,18,19,20", "11,12,13,14,15,16,18,19",
"11,12,14,15", "11,12,14,15,16,17", "11,12,14,15,16,17,18",
"11,12,14,15,16,17,18,19", "11,12,14,15,16,17,18,19,20",
"11,12,18", "11,12,19", "11,12,20", "12", "12,13", "12,13,14",
"12,13,14,15", "12,13,14,15,16", "12,13,14,15,16,17,18",
"12,13,14,15,16,17,18,19,20", "12,14", "12,14,15", "12,14,15,16",
"12,14,15,16,17", "12,14,15,16,17,18", "12,14,15,16,17,18,19",
"12,14,15,16,17,18,19,20", "12,14,15,16,20", "12,14,15,18,19,20",
"12,15", "12,16", "12,16,17,18", "12,18,19,20", "12,19,20",
"12,20", "13", "13,14", "13,14,15", "13,14,15,16,17,18,19,20",
"13,16", "13,20", "14", "14,15", "14,15,16", "14,15,16,17",
"14,15,16,17,18", "14,15,16,17,18,19", "14,15,16,17,18,19,20",
"14,15,16,18", "14,15,17", "14,15,18", "14,16", "14,16,17",
"14,16,17,18,19,20", "14,18,19,20", "14,19", "15", "15,16",
"15,16,17", "15,16,17,18", "15,16,17,18,19", "15,16,17,18,19,20",
"15,20", "16", "16,17", "16,17,18", "16,17,18,19", "16,17,18,19,20",
"16,17,18,20", "16,17,19", "16,18,19,20", "16,19,20", "17",
"17,18", "17,18,19", "17,18,19,20", "17,18,20", "17,19,20",
"17,20", "18", "18,19", "18,19,20", "19", "19,20", "2", "2,19,20",
"2,3", "2,3,4", "2,3,4,5", "2,3,4,5,11", "2,3,4,5,6", "2,3,4,5,6,7,8",
"2,3,4,5,6,7,8,11,12", "2,3,4,5,6,7,8,9", "2,3,4,5,6,7,8,9,10",
"2,3,4,5,6,7,8,9,10,11", "2,3,4,5,6,7,8,9,10,11,12", "2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"2,4", "2,5", "2,5,6,7", "20", "3", "3,18", "3,4", "3,4,10",
"3,4,20", "3,4,5", "3,4,5,6", "3,4,5,6,7", "3,4,5,6,7,8",
"3,4,5,6,7,8,9", "3,4,5,6,7,8,9,10", "3,4,5,6,7,8,9,10,11",
"3,4,5,6,7,8,9,10,11,12", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"3,4,8", "3,4,8,9", "3,5", "3,7", "3,9", "4", "4,5", "4,5,12,13",
"4,5,16", "4,5,6", "4,5,6,16,17,18,19,20", "4,5,6,20", "4,5,6,7",
"4,5,6,7,8", "4,5,6,7,8,10,11", "4,5,6,7,8,9", "4,5,6,7,8,9,10",
"4,5,6,7,8,9,10,11", "4,5,6,7,8,9,10,11,12", "4,5,6,7,8,9,10,11,12,13,14,15",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19", "4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"4,5,6,7,8,9,10,11,12,14,15,16,17,18,19,20", "4,5,6,7,8,9,16,17",
"4,5,7,8,9,10,11,12,13,14,15,16,17,18,19,20", "4,6,7", "4,7,13",
"5", "5,11,12,14,15,16,17,18,19", "5,14", "5,14,15,16", "5,16,19",
"5,17,18,19,20", "5,18", "5,6", "5,6,7", "5,6,7,10", "5,6,7,8",
"5,6,7,8,10", "5,6,7,8,9", "5,6,7,8,9,10", "5,6,7,8,9,10,11",
"5,6,7,8,9,10,11,12", "5,6,7,8,9,10,11,12,13", "5,6,7,8,9,10,11,12,13,14",
"5,6,7,8,9,10,11,12,13,14,15,16", "5,6,7,8,9,10,11,12,13,14,15,16,17,18",
"5,6,7,8,9,10,11,12,13,14,15,16,17,18,19", "5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"5,6,7,8,9,16,17,18,19,20", "5,6,8", "5,7,8,9,10", "5,7,8,9,10,14,15,16,17,18",
"5,8", "6", "6,7", "6,7,16", "6,7,8", "6,7,8,10,11,12,15,16,17,18",
"6,7,8,19", "6,7,8,9", "6,7,8,9,10", "6,7,8,9,10,11", "6,7,8,9,10,11,12",
"6,7,8,9,10,11,12,13,14", "6,7,8,9,10,11,12,13,14,15,16,17",
"6,7,8,9,10,11,12,13,14,15,16,17,18,19", "6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
"6,7,8,9,10,11,12,14,15,16", "6,7,8,9,10,18,19", "7", "7,10,11,14,15",
"7,12", "7,8", "7,8,12", "7,8,9", "7,8,9,10", "7,8,9,10,11",
"7,8,9,10,11,12", "7,8,9,10,11,12,13", "7,8,9,10,11,12,13,14,15,16",
"7,8,9,10,11,12,13,14,15,16,17,18", "7,8,9,10,11,12,13,14,15,16,17,18,19",
"7,8,9,10,11,12,13,14,15,16,17,18,19,20", "7,8,9,10,11,12,14,15,16,17,18,19",
"7,8,9,10,11,12,14,15,16,17,18,19,20", "7,8,9,10,12,15,16,17,18",
"7,9,10,11,12,13,14,15,16,17,18,19,20", "8", "8,10", "8,10,20",
"8,14,15,16,17,18,19,20", "8,16,17", "8,9", "8,9,10", "8,9,10,11",
"8,9,10,11,12", "8,9,10,11,12,13,14", "8,9,10,11,12,13,14,15",
"8,9,10,11,12,13,14,15,16", "8,9,10,11,12,13,14,15,16,17,18",
"8,9,10,11,12,13,14,15,16,17,18,19", "8,9,10,11,12,13,14,15,16,17,18,19,20",
"8,9,10,11,12,14,15,16", "8,9,10,11,12,14,15,16,17,18,19,20",
"8,9,10,14,15,16,17,18,19,20", "8,9,17", "9", "9,10", "9,10,11",
"9,10,11,12", "9,10,11,12,13,14,15,16,17", "9,10,11,12,13,14,15,16,17,18",
"9,10,11,12,13,14,15,16,17,18,19", "9,10,11,12,13,14,15,16,17,18,19,20",
"9,10,11,12,14,15,16", "9,10,11,12,14,15,16,17,18", "9,10,11,12,14,15,16,17,18,19",
"9,10,11,12,14,15,16,17,18,19,20", "9,10,11,12,16,17,18,19,20",
"9,10,11,14,15,16,17", "9,10,12,14,15,16,17", "9,10,14,15",
"9,11,12", "9,11,12,14", "9,12,14", "9,20"), class = "factor")), .Names = c("10",
"33.95", "58.66", "84.42", "110.21", "134.16", "164.69", "199.1",
"234.35", "257.19", "361.84", "432.74", "506.34", "581.46", "651.71",
"732.59", "817.56", "896.24", "971.77", "1038.91", "Clusters"
), row.names = c("at1g01050.1", "at1g01080.1", "at1g01090.1",
"at1g01320.2", "at1g01470.1", "at1g01800.1", "at1g01910.5", "at1g01960.1",
"at1g01980.1", "at1g02150.1", "at1g02470.1", "at1g02500.2", "at1g02560.1",
"at1g02780.1", "at1g02816.1", "at1g02880.2", "at1g02920.1", "at1g02930.2",
"at1g03030.1", "at1g03090.2"), class = "data.frame")
The last column (Clusters) and the row names are what matter here. The Clusters column says in which columns any abundance was found for that gene. I don't care exactly which cluster a gene is in, only which genes occur together with it.
Let's use an example:
These genes belong to the same cluster (cluster 5) in data1.
at1g09640.1
at1g07250.1
at1g08200.1
at1g09300.2 ##
at1g09490.2 ## Those
at1g09760.1 ##
at1g09780.1
If we analyze another data set (data2), we can see that some of those genes are found together again. They may be in a different cluster (cluster 20, say), but they are together, and that is what matters most to me.
at1g02880.3
at1g01220.1
at1g09300.2 ##
at1g09490.2 ## Those
at1g09760.1 ##
at1g02130.1
I have about 15 similar data sets, and I would like to be able to ask R: show me the genes that are found together in 15 of 15 data sets, or 13 of 15 data sets, and so on.
Any ideas?
First, you need to turn those comma-delimited lists into columns; they are much easier to work with that way. Then, you want to find which genes have matching columns. Finally, you can aggregate to get totals of how many times each gene matches every other gene.
Note that you will have both orders of genes, as well as genes matched with themselves. Also, the "Clusters" column will tell you how many times they were in the same exact set of clusters.
This will run in O(n^2) time, meaning that doubling the number of genes analyzed will quadruple the time. My quick timing tests estimate it would take 15 hours on my computer to do 15 data frames of 2300 rows.
library(plyr)

frame_list <- list(tbl_col_clu1, tbl_col_clu2, tbl_col_clu3)

turn_numbers_into_columns <- function(x) {
    # Creates a data.frame that has the group numbers as columns
    x[, strsplit(x$Clusters, ",")[[1]]] <- 1
    return(x)
}

get_comparison <- function(current_table) {
    # Creates a comparison data frame for a single input table
    simplified_frame <- data.frame(
        "gene" = row.names(current_table),
        "Clusters" = as.character(current_table$Clusters),
        stringsAsFactors = FALSE)
    split_f <- adply(simplified_frame, 1, turn_numbers_into_columns)
    # This is the slow line
    comparison_frame <- ddply(split_f, "gene", function(x) {
        ddply(split_f, "gene", function(y) {
            output <- as.data.frame(x == y)
            output$gene <- x$gene
            output$gene2 <- y$gene
            return(output)
        })
    })
    return(comparison_frame)
}

combined_frame <- ldply(frame_list, get_comparison)
sum_frame <- aggregate(
    combined_frame[, !(names(combined_frame) %in% c("gene", "gene2"))],
    by = combined_frame[, c("gene", "gene2")],
    FUN = sum,
    na.rm = TRUE)
View(sum_frame)
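To get from sum_frame to "genes found together in at least N data sets", one option is to drop the self-matches and the duplicate orderings and then filter on the Clusters count. This is only a sketch: the toy sum_frame below stands in for the real one, its values are made up, and min_sets is a hypothetical threshold name.

```r
# Toy stand-in for sum_frame: one row per ordered gene pair. The Clusters
# column counts the data sets in which both genes had the identical
# cluster string (values are invented for illustration).
sum_frame <- data.frame(
  gene     = c("g1", "g1", "g2", "g2", "g1", "g3", "g2", "g3", "g3"),
  gene2    = c("g1", "g2", "g1", "g2", "g3", "g1", "g3", "g2", "g3"),
  Clusters = c(3, 3, 3, 3, 1, 1, 2, 2, 3),
  stringsAsFactors = FALSE
)

min_sets <- 2  # hypothetical threshold: together in at least 2 data sets

# gene < gene2 removes self-matches and keeps one of the two orderings
pairs <- sum_frame[sum_frame$gene < sum_frame$gene2 &
                   sum_frame$Clusters >= min_sets, ]
pairs <- pairs[order(-pairs$Clusters), c("gene", "gene2", "Clusters")]
print(pairs)
```

With 15 data sets you would set min_sets to 15, 13, and so on.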
If you had consistently the same set of genes and groupings, you could turn everything into arrays, which run faster than data frames, cutting your time by a factor of about six. The part that runs very slowly would be replaced with something like this. It returns 3-dimensional arrays that you could add together.
comparison_frame <- aaply(split_f, 1, function(x) {
    print(x)
    output <- aaply(split_f, 1, function(y) {
        output <- array(x == y, c(1, length(x)))
        return(output)
    })
    return(output)
})
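The "arrays you can add together" idea can also be sketched in base R without plyr. The version below is an assumption-laden simplification, not the code above: it only checks whether two genes carry the identical Clusters string (it does not compare the individual cluster numbers), and the helper names make_tbl and same_cluster and the toy tables are invented for illustration.

```r
# Toy tables: row names are genes, Clusters is the comma-separated string.
make_tbl <- function(genes, clusters) {
  data.frame(Clusters = clusters, row.names = genes, stringsAsFactors = FALSE)
}
tables <- list(
  make_tbl(c("g1", "g2", "g3"), c("4,5", "4,5", "7")),
  make_tbl(c("g1", "g2", "g3"), c("20",  "20",  "20")),
  make_tbl(c("g1", "g2", "g4"), c("3",   "3,4", "3"))
)

# Use the union of genes so tables with different gene sets still line up.
all_genes <- sort(unique(unlist(lapply(tables, rownames))))

# One genes-by-genes 0/1 matrix per table: 1 when both genes are present
# in that table and have the identical Clusters string.
same_cluster <- function(tbl) {
  m <- matrix(0L, length(all_genes), length(all_genes),
              dimnames = list(all_genes, all_genes))
  cl <- as.character(tbl$Clusters)
  m[rownames(tbl), rownames(tbl)] <- outer(cl, cl, "==") * 1L
  m
}

# Adding the matrices counts, per gene pair, the tables where they matched.
together <- Reduce(`+`, lapply(tables, same_cluster))
print(together)
```

The work is still quadratic in the number of genes, but outer() does the comparison in one vectorized call per table instead of one ddply call per gene pair.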
Throw them into SPMF and use the Apriori or FPGrowth algorithm. SPMF expects its input as a file of comma-separated sequences of integers (you may have to convert your data), with each sequence on a separate line:
1,2,4,10
3,2,1,11,12
2,5,14,5
You invoke it like this:
java -jar spmf.jar run FPGrowth sequences.txt output.txt 35% 90%
The first number is the minimal support (how many sets must contain your group for it to count as a group). SPMF contains many different algorithms; you can try them to see which one fits you best.
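Converting the data could look roughly like the sketch below. Everything here is an assumption: the toy datasets list, the gene-to-integer mapping, the choice of one line per group of genes sharing a Clusters string, and the separator, which is kept as a parameter because the format is taken from the description above and should be checked against the SPMF documentation.

```r
# Toy input: per data set, a named character vector gene -> Clusters string.
datasets <- list(
  c(g1 = "5",  g2 = "5",  g3 = "7"),
  c(g1 = "20", g2 = "20", g3 = "20")
)

# Map gene names to positive integers (hypothetical ID scheme); keep
# gene_id around to translate the results back to gene names.
genes <- sort(unique(unlist(lapply(datasets, names))))
gene_id <- setNames(seq_along(genes), genes)

# One line per (data set, cluster string): the IDs of the genes sharing it.
to_lines <- function(cl, sep = ",") {
  groups <- split(names(cl), cl)
  vapply(groups, function(g) paste(sort(gene_id[g]), collapse = sep),
         character(1), USE.NAMES = FALSE)
}
lines <- unlist(lapply(datasets, to_lines))

out_file <- file.path(tempdir(), "sequences.txt")
writeLines(lines, out_file)
print(lines)
```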

Plotting all of the rows in different graph - data frame

Probably the code is very simple, but I have never tried plotting in R.
I would like a line plot for every row, with each plot on a separate graph.
The values in my data range from 0 to 1. A value of one is the maximum of the plot; in some cases there may be several maxima in a single row. I would like a PDF file as the output.
Data:
> dput(head(tbl_end))
structure(list(`NA` = structure(1:6, .Label = c("AT1G01050",
"AT1G01080", "AT1G01090", "AT1G01220", "AT1G01320", "AT1G01420",
"ATCG00800", "ATCG00810", "ATCG00820", "ATCG01090", "ATCG01110",
"ATCG01120", "ATCG01240", "ATCG01300", "ATCG01310", "ATMG01190"
), class = "factor"), `10` = c(0, 0, 0, 0, 0, 0), `20` = c(0,
0, 0, 0, 0, 0), `52.5` = c(0, 1, 0, 0, 0, 0), `81` = c(0, 0.660693687777888,
0, 0, 0, 0), `110` = c(0, 0.521435654491704, 0, 0, 0, 1), `140.5` = c(0,
0.437291194705566, 0, 0, 0, 1), `189` = c(0, 0.52204783488213,
0, 0, 0, 0), `222.5` = c(0, 0.524298383907171, 0, 0, 0, 0), `278` = c(1,
0.376865096972469, 0, 1, 0, 0), `340` = c(0, 0, 0, 0, 0, 0),
`397` = c(0, 0, 0, 0, 0, 0), `453.5` = c(0, 0, 0, 0, 0, 0
), `529` = c(0, 0, 0, 0, 0, 0), `580` = c(0, 0, 0, 0, 0,
0), `630.5` = c(0, 0, 0, 0, 0, 0), `683.5` = c(0, 0, 0, 0,
0, 0), `735.5` = c(0, 0, 0, 0, 0, 0), `784` = c(0, 0, 0.476101907006443,
0, 0, 0), `832` = c(0, 0, 1, 0, 0, 0), `882.5` = c(0, 0,
0, 0, 0, 0), `926.5` = c(0, 0, 0, 0, 1, 0), `973` = c(0,
0, 0, 0, 0, 0), `1108` = c(0, 0, 0, 0, 0, 0), `1200` = c(0,
0, 0, 0, 0, 0)), .Names = c(NA, "10", "20", "52.5", "81",
"110", "140.5", "189", "222.5", "278", "340", "397", "453.5",
"529", "580", "630.5", "683.5", "735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"), row.names = c(NA, 6L), class = "data.frame")
It would be great to have the name of the row at the top of each page in the PDF.
Here's an example using your dputed data:
# open the pdf file
pdf(file='myfile.pdf')

# since I don't know what values should be on the X axis,
# I'm just using values from 1 to number of y-values
x <- 1:(ncol(tbl_end)-1)

for(i in 1:nrow(tbl_end)){
    # plot onto a new pdf page
    plot(x=x,y=tbl_end[i,-1],type='b',main=tbl_end[i,1],xlab='X',ylab='Y')
}

# close the pdf file
dev.off()
If you want to change the style (e.g. lines without the little circles etc.) of the plot, have a look at the documentation.
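If the column names are meant to be the X positions (an assumption on my part, since they look numeric), a variant could use them directly and draw plain lines. The toy tbl_end, the id column name and the output path below are invented for illustration.

```r
# Toy stand-in for tbl_end: the first column holds the row label, the
# remaining column names are numeric positions stored as strings.
tbl_end <- data.frame(
  id = c("AT1G01050", "AT1G01080"),
  `10` = c(0, 0), `52.5` = c(0, 1), `81` = c(0, 0.66), `278` = c(1, 0.38),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Column names (minus the label column) become the X values.
x <- as.numeric(names(tbl_end)[-1])

out_file <- file.path(tempdir(), "rows.pdf")
pdf(file = out_file)
for (i in seq_len(nrow(tbl_end))) {
  # one page per row; type = "l" draws lines without point markers
  plot(x, unlist(tbl_end[i, -1]), type = "l", ylim = c(0, 1),
       main = as.character(tbl_end[i, 1]), xlab = "X", ylab = "Y")
}
dev.off()
```

Fixing ylim to c(0, 1) keeps the pages comparable, since the data are scaled so that each row's maximum is one.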

Resources