There are some issues with my ggplot chart that I can't seem to fix.
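library(pheatmap)
# as you may guess from the file name, I have provided this matrix below
# row.names = 1 turns the first (unnamed) column of cell barcodes into row names
vis.matrix <- read.csv("csvfileprovidedbelow.csv", row.names = 1)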
# setting up annotation_row
cell_df <- data.frame ("Cells" = c(rep("Putative Engram Cell", 10), rep("Random Cell",10))
)
rownames(cell_df) <- rownames(vis.matrix)
cell_df$Cells <- as.factor(cell_df$Cells)
#setting up colors
newCols <- colorRampPalette(grDevices::rainbow(length(unique(cell_df$Cells))))
annoCol <- c("2AFE00", "ACACAC") # green and grey
names(annoCol) <- levels(cell_df$Cells)
annoCol <- list(category = annoCol)
color=colorRampPalette(c("navy", "white", "red"))(50)
#plotting
pheatmap(vis.matrix,cluster_rows = F, cluster_cols=F, annotation_row = cell_df,
annotation_names_col = F, scale = "column", color = color,
annotation_row_colors = annoCol,
show_rownames = F)
Result
For some reason the cells are not the colors I selected; you can look those colors up here: https://www.color-hex.com/
I don't know why ggplot is ignoring the input I'm giving it. I would also like to remove the word "Cells" beside the color bars on the graph; it's unnecessary because the legend already explains what it is.
Variables as CSVs for reproduction (copy and paste!)
vis.matrix is here:
"","LINGO1","ARC","INHBA","BDNF","MAPK4","ADGRL3","PTGS2","CHGB","BRINP1","KCNK1"
"P57_CATCGGGCATGTCGAT",-0.368245729284319,3.47987283505039,2.94634318740768,5.57309275653906,1.28904872906168,5.3650511213102,-0.368245729284319,2.25850383984707,4.60363764575367,-0.368245729284319
"P57_GAAGCAGGTAAAGGAG",-0.384074162377759,4.36118508997518,3.70326968156081,4.89874111968957,1.65959775959153,4.36118508997518,-0.384074162377759,-0.384074162377759,4.89874111968957,2.85506919772029
"P57_TGACTTTTCTTTACAC",-0.357194851773428,2.40812492004642,3.13225019258772,5.67855340720666,-0.357194851773428,3.13225019258772,-0.357194851773428,4.87697271476829,1.38752767040715,-0.357194851773428
"P57_CTAGAGTGTCCGACGT",1.50110424640379,3.34315724311024,2.57863617381809,6.67240079339861,3.34315724311024,3.93616585502151,-0.340948750302666,1.50110424640379,5.77821885172796,3.34315724311024
"P57_CCTTACGTCCAAGTAC",-0.381478022176755,4.73256922534426,2.17554560158375,6.70465771162764,1.23182426263886,3.36449387848259,-0.381478022176755,2.17554560158375,4.45842883227008,3.36449387848259
"P57_ATCCGAAGTGTGACCC",2.60172319423431,1.50562420175544,-0.36816940232616,5.57161579079479,1.50562420175544,3.37941780583703,-0.36816940232616,3.37941780583703,4.47551679831591,3.98264461101114
"P57_TCCACACAGCTCCTCT",-0.364903374339472,2.59101007342497,2.59101007342497,5.23001785519025,-0.364903374339472,3.36504411201368,-0.364903374339472,1.5000703688371,1.5000703688371,-0.364903374339472
"P57_CTGAAGTGTGCTTCTC",-0.384690873645543,3.35025193111807,2.83241374986762,4.71429931551947,3.35025193111807,3.35025193111807,-0.384690873645543,3.35025193111807,2.16480422093696,2.16480422093696
"P57_CTGATAGAGAATCTCC",1.6886646742164,2.87694996247181,-0.342722443403036,7.39148929746973,1.6886646742164,5.75143890945527,-0.342722443403036,5.75143890945527,4.37401237658979,-0.342722443403036
"P57_GGAGCAACATACAGCT",-0.351186802480077,1.4651606822983,1.4651606822983,5.40649850082577,-0.351186802480077,4.34400333395122,-0.351186802480077,1.4651606822983,5.09785565185506,1.4651606822983
"A57_CGTCTACCAGACGCAA",-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319,3.72717582194343,-0.229651158962319,-0.229651158962319,-0.229651158962319,-0.229651158962319
"P57_GTTCGGGCAATGGACG",-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484,4.26241026631276,-0.269219507178484,-0.269219507178484,-0.269219507178484,-0.269219507178484
"P56_GGTATTGTCATGTCTT",-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939,5.06808977241301,-0.294887130864939,-0.294887130864939,-0.294887130864939,-0.294887130864939
"A67_AAATGCCAGATAGTCA",4.03836820795661,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977,-0.211281061058977
"P76_CCCTGATAGAGGACTC",-0.507269585219581,-0.507269585219581,-0.507269585219581,1.90264065061749,-0.507269585219581,4.86614536666517,-0.507269585219581,1.40253909173334,-0.507269585219581,0.697685532698955
"P56_GATCGATTCCGTCAAA",2.00727896845415,-0.313514850319463,-0.313514850319463,2.00727896845415,-0.313514850319463,3.36485632434217,-0.313514850319463,-0.313514850319463,-0.313514850319463,-0.313514850319463
"P57_GCTGCAGCATAGGATA",2.32839123926114,-0.289105834618761,-0.289105834618761,-0.289105834618761,-0.289105834618761,2.32839123926114,-0.289105834618761,-0.289105834618761,-0.289105834618761,4.94588831314104
"P82_AGGATAACATAGGTTC",1.39699437520094,-0.501641808549684,0.696264250985952,1.39699437520094,-0.501641808549684,4.49353661848721,-0.501641808549684,-0.501641808549684,1.89417031052159,-0.501641808549684
"P82_CCAAGCGTCCGGCTTT",-0.328980171926236,-0.328980171926236,-0.328980171926236,4.08682708745919,-0.328980171926236,1.87892345776647,-0.328980171926236,-0.328980171926236,4.08682708745919,-0.328980171926236
"P57_CAGCGACCATGTCCTC",-0.316475979591103,-0.316475979591103,-0.316475979591103,2.18079240270816,-0.316475979591103,6.13886914288907,-0.316475979591103,2.18079240270816,-0.316475979591103,4.67806078500742
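pheatmap is not ggplot; it is drawn using grid graphics.
Three things keep your colors from being applied: the argument is called annotation_colors (not annotation_row_colors), the list element must be named after the annotation column (Cells, not category), and hex colors need a leading "#". To hide the "Cells" label next to the annotation bar, set annotation_names_row = FALSE; annotation_names_col only affects column annotations. You would pass the color specification as follows:
pheatmap(vis.matrix, cluster_rows = FALSE, cluster_cols = FALSE, annotation_row = cell_df,
annotation_names_row = FALSE, scale = "column", color = color,
annotation_colors = list(Cells = c("Putative Engram Cell" = "#2AFE00",
"Random Cell" = "#ACACAC")),
show_rownames = FALSE)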
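Part of the difference from the code in the question is the leading "#" on the hex codes. As a minimal sketch in base R, you can check whether R accepts a color string before handing it to pheatmap. Note that is_valid_color is a hypothetical helper written for this illustration, not part of pheatmap.

```r
# Hypothetical helper (not part of pheatmap): TRUE where R can parse the string as a color
is_valid_color <- function(x) {
  vapply(x, function(col) {
    tryCatch({
      grDevices::col2rgb(col)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

# Hex codes are only recognised with the leading "#"
valid <- is_valid_color(c("2AFE00", "#2AFE00", "#ACACAC"))
print(valid)

# Named vector of colors, keyed by the factor levels, wrapped in a list
# whose element name matches the annotation column ("Cells")
cell_levels <- c("Putative Engram Cell", "Random Cell")
anno_colors <- list(Cells = setNames(c("#2AFE00", "#ACACAC"), cell_levels))
str(anno_colors)
```

A list built this way has the shape that annotation_colors expects: one named color vector per annotation column.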
I am trying to draw a Venn diagram with four logical variables. I have tried many different R packages, but I have run into problems with each of them. So far, I have achieved the best result with the ggvenn package. However, the problem is that it computes the intersection percentages from the observations included in the diagram rather than from all observations in the data.
Below is an example Venn diagram and its code to illustrate the problem. So my question is: is there some way to display the percentages relative to the total number of observations in the data? For instance, in the diagram below the intersection of ABCD consists of 45 observations, and thus the correct proportion would be 4.5% (i.e. 45/1000) instead of 4.7%.
I would really appreciate it if someone could help me out with this.
library(ggvenn)
a <- sample(x = c(TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
b <- sample(x = c(TRUE,FALSE, FALSE), size = 1000, replace = TRUE)
c <- sample(x = c(TRUE, FALSE, FALSE, FALSE), size = 1000, replace = TRUE)
d <- sample(x = c(TRUE, TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
df <- tibble(values = c(1:1000), A=a, B=b, C=c, D=d)
ggvenn(df,
fill_color = c("black", "grey70", "grey80", "grey90"),
show_percentage = TRUE,
digits = 1,
text_size = 2.5)
A quick fix for this problem would be to use the latest GitHub version of {ggvenn}:
remove.packages("ggvenn")
devtools::install_github("yanlinlin82/ggvenn")
Then, by default, a percentage is also shown for the observations that fall outside A/B/C/D, which gives you the percentages you're looking for.
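If you want to check the numbers by hand, the two denominators can be compared directly. The sketch below is base R only; the seed and the variable names are assumptions made for illustration, so the counts will not match the 45 from the question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- 1000
df <- data.frame(
  A = sample(c(TRUE, TRUE, FALSE), n, replace = TRUE),
  B = sample(c(TRUE, FALSE, FALSE), n, replace = TRUE),
  C = sample(c(TRUE, FALSE, FALSE, FALSE), n, replace = TRUE),
  D = sample(c(TRUE, TRUE, TRUE, FALSE), n, replace = TRUE)
)

in_all  <- df$A & df$B & df$C & df$D      # inside every set
in_none <- !(df$A | df$B | df$C | df$D)   # outside every set

# Denominator = all observations in the data
pct_of_total <- 100 * sum(in_all) / nrow(df)
# Denominator = only observations that fall inside at least one set
pct_of_shown <- 100 * sum(in_all) / sum(!in_none)

round(c(total = pct_of_total, shown = pct_of_shown), 1)
```

The second value can never be smaller than the first, because observations outside all four sets are dropped from its denominator.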
I made the heatmap using the code below:
library(pheatmap)
library(dplyr)
data = data.frame(matrix(runif(10*10), ncol=10))
data$sample = rep(c("tumour", "normal"), 5)
data$subject.ID = paste('Subject', 1:10)
data = data %>% arrange(sample)
# for row annotation
my_sample_col = data %>% select(sample)
rownames(my_sample_col) = data$subject.ID
# data matrix
mat = as.matrix(data %>% select(-sample, -subject.ID))
rownames(mat) = data$subject.ID
pheatmap(mat,
scale='row',
annotation_row = my_sample_col,
annotation_names_row=F,
cluster_rows = FALSE,
cluster_cols = FALSE,
show_colnames = FALSE,
show_rownames = FALSE)
I want to put a gap between row 5 and row 6, to separate the heatmap according to my row annotation.
In the pheatmap function, the argument gaps_row seems to do the job. Its documentation says:
vector of row indices that show shere to put gaps into heatmap. Used only if the rows are not clustered.
I'm not sure how to implement that. Can someone help me with this? Thanks a lot.
I would recommend using the ComplexHeatmap package (website; Gu et al., 2016). You can install it with devtools::install_github("jokergoo/ComplexHeatmap").
It has more functionality, but you also have to invest more time (e.g., you have to build the row annotation and scale the matrix yourself).
library(ComplexHeatmap)
# Create annotation for rows
my_sample_col_ano <- rowAnnotation(sample = my_sample_col$sample,
show_annotation_name = FALSE)
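# Scale original matrix row-wise; t(scale(t(x))) keeps the row and column names,
# whereas t(apply(mat, 1, scale)) would drop the column names
matS <- t(scale(t(mat)))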
# Plot heatmap
Heatmap(matS,
# Remove name from fill legend
name = "",
# Keep original row/col order
row_order = rownames(matS), column_order = colnames(matS),
# Add left annotation (legend with tumor/normal)
left_annotation = my_sample_col_ano,
# ACTUAL SPLIT by sample group
row_split = my_sample_col$sample,
show_row_names = FALSE, show_column_names = FALSE,
show_row_dend = FALSE, show_column_dend = FALSE,
row_title = NULL)
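Row-wise scaling is the step that pheatmap does for you with scale = 'row'. As a minimal sketch on a made-up 3 x 4 matrix (toy data, not the matrix above), two common ways of writing it give the same numbers but differ in whether the column names survive. That matters whenever columns are referred to by name.

```r
set.seed(1)  # arbitrary seed for a reproducible toy matrix
toy <- matrix(runif(12), nrow = 3,
              dimnames = list(paste("Subject", 1:3), paste0("X", 1:4)))

scaled_apply <- t(apply(toy, 1, scale))  # column names are lost
scaled_idiom <- t(scale(t(toy)))         # row and column names are kept

colnames(scaled_apply)
colnames(scaled_idiom)

# After scaling, each row has mean 0 and standard deviation 1
round(rowMeans(scaled_idiom), 10)
apply(scaled_idiom, 1, sd)
```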
If you want to stay with the original pheatmap, pass gaps_row the row index after which the gap should appear, i.e. the size of your first group (normal):
pheatmap(mat,
scale='row',
gaps_row = 5,
annotation_row = my_sample_col,
annotation_names_row=F,
cluster_rows = FALSE,
cluster_cols = FALSE,
show_colnames = FALSE,
show_rownames = FALSE)
If you have more than two groups, then instead of hardcoding a numeric value for gaps_row (i.e., gaps_row = 5) you can pass this snippet: head(as.numeric(cumsum(table(my_sample_col$sample))), -1).
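As a sketch of how that snippet behaves with three groups (the group labels and sizes below are made up for illustration), using only base R:

```r
# Made-up annotation: rows already sorted by group, in factor-level order
sample_groups <- factor(
  rep(c("normal", "tumour", "metastasis"), times = c(5, 3, 2)),
  levels = c("normal", "tumour", "metastasis")
)

# Cumulative group sizes, dropping the last one (no gap after the final row)
gaps_from_table <- head(as.numeric(cumsum(table(sample_groups))), -1)

# Alternative that follows the actual row order instead of the level order
gaps_from_rle <- head(cumsum(rle(as.character(sample_groups))$lengths), -1)

gaps_from_table  # 5 8
gaps_from_rle    # 5 8
```

Either vector can be passed as gaps_row. The table() version assumes the rows are sorted in the same order as the factor levels, while the rle() version only assumes that rows of the same group are adjacent.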
I'm learning how to use R, and for an exercise I'm using an experimental ExpressionSet from Bioconductor: http://bioconductor.org/packages/release/data/experiment/html/leukemiasEset.html
As simple as that: my heat map doesn't display the color key, and the main title appears chopped off on the left side. I had given up on the missing color key after searching the web and not really finding a solution. But then I added the title, and since it doesn't display properly either, I thought that maybe both things are due to the same reason (unknown to me)...
Here is my code:
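library(gplots)         # heatmap.2
library(Biobase)        # exprs
library(leukemiasEset)  # example ExpressionSet
data(leukemiasEset)
difexp <- exprs(leukemiasEset)[which(adjPval < 0.01), c(1:12, 25:36)]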
heatmap.2(difexp,
trace = "none",
cexCol = 0.6,
ColSideColors = as.character(as.numeric(factor(leukemiasEset$LeukemiaType[c(1:12, 25:36)]))),
main = "Differentially expressed genes in ALL and CLL samples",
cex.main = 1.5,
key = TRUE,
keysize = 1.5,
density.info = "histogram")
Where:
pvalues <- c()
for(i in 1:nrow(exprs(leukemiasEset))) {
R <- t.test(exprs(leukemiasEset)[i, leukemiasEset$LeukemiaType == "ALL"],
exprs(leukemiasEset)[i, leukemiasEset$LeukemiaType == "CLL"],
var.equal = TRUE)
pvalues <- c(pvalues, R$p.value)
}
adjPval <- p.adjust(pvalues, method = "fdr")
And this is how it looks:
Since I am a beginner, I think this is probably an "easy peasy" for the experts... thank you very much in advance!
Use "\n" to insert a newline in your title and put half of it on a second line.
heatmap.2(difexp,
trace = "none",
cexCol = 0.6,
ColSideColors = as.character(as.numeric(factor(leukemiasEset$LeukemiaType[c(1:12, 25:36)]))),
main = "Differentially expressed genes\nin ALL and CLL samples",
cex.main = 1.5,
key = TRUE,
keysize = 1.5,
density.info = "histogram")
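If you would rather not pick the break point by hand, base R's strwrap() can choose it for you. This is only a sketch: the width of 35 is an arbitrary choice that you would tune to your plotting device, and plot_title and wrapped_title are names made up for the example.

```r
plot_title <- "Differentially expressed genes in ALL and CLL samples"

# Break the title into lines shorter than `width` characters and
# rejoin them with newlines so that the result can be passed as `main`
wrapped_title <- paste(strwrap(plot_title, width = 35), collapse = "\n")

cat(wrapped_title, "\n")
```

You would then pass main = wrapped_title to heatmap.2 instead of the literal string.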