Venn Diagram in R to show character labels - r

I'm very new to R and am trying to create a 4-set Venn diagram from data that contains only categorical variables.
For instance, given the data below, I want the Venn diagram to show the word "hello" at the intersection of A and B instead of the counts and percentages. How do I do that? I ran ggVennDiagram(x) after creating the list below, and it gave me the counts instead of the actual data labels.
x=list()
x$A=as.character(c("hello"))
x$B=as.character(c("hello", "how", "are"))
x$C=as.character(c("how", "You"))
x$D=as.character(c("me", "her", "they"))

The regions cannot be labelled with their items by default. Here is a workaround that overwrites the labels by hand.
(modified from Venn Diagram with Item labels)
library(VennDiagram)
library(stringr)
library(purrr)
# Generate plot
v <- venn.diagram(x,
fill = c("orange", "blue", "red", "green"),
filename=NULL)
# Calculate the items that fall in each overlap region
overlaps <- calculate.overlap(x)
overlaps <- overlaps[str_sort(names(overlaps), numeric = TRUE)] # sort the names by their numeric part
# Overwrite the region labels inside the plot object v
# For this 4-set diagram the label elements start at overlap index + 8. For other set counts (3, 5, 6, 7, ...) you have to find the offset yourself
walk2(seq(overlaps) +8, seq(overlaps),
function(x,y) {v[[x]]$label <<- paste0(overlaps[[y]], collapse = "\n")})
# Draw plot
grid.draw(v)
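
To check which items should end up in which region, independently of any plotting package, the memberships can be computed in base R. This is only a sketch for verification; the names items, region and region_items are introduced here for illustration and are not part of VennDiagram.

```r
# The list from the question
x <- list(
  A = c("hello"),
  B = c("hello", "how", "are"),
  C = c("how", "You"),
  D = c("me", "her", "they")
)

# Every distinct item across all sets
items <- unique(unlist(x))

# For each item, build a key naming exactly the sets that contain it, e.g. "A&B"
region <- vapply(items, function(item) {
  in_set <- vapply(x, function(s) item %in% s, logical(1))
  paste(names(x)[in_set], collapse = "&")
}, character(1))

# Group the items by region key
region_items <- split(items, region)
print(region_items)
```

Each element of region_items holds the items that belong to exactly that combination of sets, which is what the labels in the diagram are expected to show.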

Related

Plot single block of color in R

I am trying to figure out how to plot a single block of color in R. I am trying to visualize a region of a genome with color. I am starting with a matrix that has 1 row and 6049 columns.
l1_canon <- matrix( nrow = 1, ncol = 6049, data = "_" )
Next, I have blocks that differentiate major regions of this element:
l1_canon[,1:909] <- "5' UTR"
l1_canon[,910:1923] <- "ORF1"
l1_canon[,1990:5814] <- "ORF2"
l1_canon[,49:420] <- "CPG"
l1_canon[,5815:6049] <- "3' UTR"
l1_canon[,211:225] <- "RXRA::VDR"
I have assigned colors to the different categories:
l1_colors <- list()
l1_colors[["5' UTR"]] <- "#26A064" # "#ea0064"
l1_colors[["ORF1"]] <- "#3095C7" # "#008a3f"
l1_colors[["ORF2"]] <- "#CA6BAA" # "#116eff"
l1_colors[["CPG"]] <- "#B38241" # "#cf00dc"
l1_colors[["3' UTR"]] <- "#CCCCCC" # "#dddddd"
l1_colors[["RXRA::VDR"]] <- "#FFFFFF"
l1_colors[["_"]] <- "#000000"
But I can't figure out how to plot this. I am looking for something like the color ramp functions in R , and have been trying to adapt the code unsuccessfully.
I tried assigning colors like so
for ( i in l1_canon ){
l1_color <- l1_colors[ l1_canon ]
}
and using it in the code that was used to generate the color ramp plots, but I am getting errors. I am aware that having 6000+ columns is going to make this weird visually, but, it's what I need! I am hoping I can make the individual color blocks small enough to fit on a screen. Eventually, this bar is going to be annotation above another image.
TY for your help! :)
I don't fully understand what you want, but you could use ggplot2 as follows:
library(ggplot2)
# Find the run lengths of the regions
rle1 = rle(l1_canon[1,])
# Turn the run lengths into a data frame
df=data.frame(lengths=rle1$lengths, V=rle1$values)
# Align the colours with the regions (look up by name, not by factor code)
df$color <- unlist(l1_colors)[as.character(df$V)]
# Plot a single stacked bar on its side with no annotation
ggplot(df, aes(x=1,group=seq_along(V),label=V, fill=color,y=lengths)) +
geom_bar(stat="identity",color="black")+
scale_fill_identity() +
theme_void() +
coord_flip()+
scale_y_reverse()
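
The same run-length idea also works without ggplot2. Below is a base-graphics sketch, offered as an alternative rather than as the answerer's method; the colours are stored in a named vector here instead of a list, and starts and ends are names introduced for illustration.

```r
# Rebuild the annotation row from the question
l1_canon <- matrix(nrow = 1, ncol = 6049, data = "_")
l1_canon[, 1:909]     <- "5' UTR"
l1_canon[, 910:1923]  <- "ORF1"
l1_canon[, 1990:5814] <- "ORF2"
l1_canon[, 49:420]    <- "CPG"
l1_canon[, 5815:6049] <- "3' UTR"
l1_canon[, 211:225]   <- "RXRA::VDR"

# Colours as a named character vector (assumption: same values as in the question)
l1_colors <- c("5' UTR" = "#26A064", "ORF1" = "#3095C7", "ORF2" = "#CA6BAA",
               "CPG" = "#B38241", "3' UTR" = "#CCCCCC",
               "RXRA::VDR" = "#FFFFFF", "_" = "#000000")

# Collapse the 6049 cells into contiguous runs of the same label
runs   <- rle(l1_canon[1, ])
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths

# Draw one rectangle per run on an otherwise empty plot
plot.new()
plot.window(xlim = c(0, ncol(l1_canon)), ylim = c(0, 1))
rect(starts, 0, ends, 1, col = l1_colors[runs$values], border = NA)
```

Because later assignments overwrite earlier ones, the CPG and RXRA::VDR blocks split the 5' UTR into separate runs, so the bar has more segments than there are categories.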

Base R Choropleth: colors aren't being applied to the map according to the order of the interval/breaks which makes the map hard to read

I created a choropleth with base R but I'm struggling with the colors. First, the colors don't follow the same order as the intervals and second, two of the intervals are using the same color, all of which makes the graph hard to read. This happens regardless of how many colors I use. It also doesn't matter whether I'm using brewer.pal or base colors. Here is a map with its respective legend illustrating the issue.
Below are the statements that I use to create the graph once data has been downloaded:
#Relevant packages:
library(dplyr)
library(RColorBrewer)
library(rgdal)
#create colors vector
pop_colors <- brewer.pal(8,"Purples")
#create breaks/intervals
pop_breaks <- c(0,20000,40000,60000,80000,100000,120000)
#apply breaks to population
cuts <- cut(cal_pop$Pop2016, pop_breaks, dig.lab = 6)
#create a vector with colors by population according to the interval they belong to:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
#Create choropleth
plot(cal_pop,col = color_breaks, main = "Calgary Population (2016)")
#create legend
legend("topleft", fill = color_breaks, legend = levels(cuts), title = "Population")
I used readOGR() command to read the shape file, which I'm linking here in case anybody is interested in taking a look at the data.
I'd appreciate any advice you could give me.
Thanks!
Your error is in this line:
color_breaks <- pop_colors[findInterval(cal_pop$Pop2016,vec = pop_breaks)]
I can't read your data file, so I'll use a built-in shapefile instead (the sids data shipped with the maptools package, read with readOGR from rgdal).
library(rgdal)
nc <- readOGR(system.file("shapes/", package="maptools"), "sids")
str(nc@data)
colors <- brewer.pal(8,"Purples")
#create breaks/intervals
sid_breaks <- c(0,2,4,6,8,10,12,20,60)
#apply breaks to the SID counts
sid_cuts <- cut(nc$SID79, sid_breaks, dig.lab = 6, include.lowest=TRUE)
#create a vector with one color per polygon according to the interval its count belongs to:
sid_colors <- colors[sid_cuts]
#Create choropleth
par(mar=c(0,0,0,0))
plot(nc, col = sid_colors)
legend("bottomleft", fill = colors, legend = levels(sid_cuts), ncol=2, title = "SID (1979)", bty="n")
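
Note that the code above colours the polygons and the legend from two different vectors: the map gets one colour per polygon, while the legend gets one colour per interval. In the question, the per-polygon vector color_breaks was also passed to legend(fill = ...), which would explain the shuffled and repeated legend colours. A minimal base R sketch, using made-up population values and a colorRampPalette ramp as a stand-in for brewer.pal (both are assumptions for illustration):

```r
# Hypothetical population values, one per polygon
pop <- c(15000, 95000, 42000, 61000, 8000, 118000)
pop_breaks <- c(0, 20000, 40000, 60000, 80000, 100000, 120000)

# One colour per interval: 7 breaks give 6 intervals
palette_cols <- colorRampPalette(c("#F2F0F7", "#54278F"))(length(pop_breaks) - 1)

# Assign each value to its interval
cuts <- cut(pop, pop_breaks, dig.lab = 6, include.lowest = TRUE)

# One colour per polygon, looked up through the interval index
polygon_cols <- palette_cols[as.integer(cuts)]

# One label per interval, in the same order as palette_cols
legend_labels <- levels(cuts)

print(data.frame(pop, interval = cuts, colour = polygon_cols))
```

With these vectors, polygon_cols would go to plot(col = ...) and the pair palette_cols and legend_labels to legend(fill = ..., legend = ...), so the legend order always follows the breaks.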

How to combine state distribution plot and separate legend in traminer?

Plotting several clusters using seqdplot in TraMineR can make the legend messy, especially in combination with numerous states. This calls for additional options for modifying the legend which is available with the function seqlegend. However, I have a hard time combining a state distribution plot (seqdplot) with a separate modified legend (seqlegend). Ideally one wants to plot the clusters (e.g. 9) without a legend and then add the separate legend in the available bottom right row, but instead the separate legend is generating a new plot window. Can anyone help?
Here's an example using the biofam data. With the data I use in my own research the legend becomes much more messy since I have 11 states.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = F)
#Separate legend
seqlegend(biofam.seq, title = "States", ncol = 2)
#Combine state distribution plot and separate legend
#??
Thank you.
The seqplot function does not let you control the number of columns of the legend, nor does it let you add a legend title. So you have to compose the plot yourself by generating a separate plot for each group with the legend disabled and adding the legend afterwards. Here is how you can do that:
cluster9 <- factor(cluster9)
levc <- levels(cluster9)
lev <- length(levc)
par(mfrow=c(5,2))
for (i in 1:lev)
seqdplot(biofam.seq[cluster9 == levc[i],], border=NA, main=levc[i], with.legend=FALSE)
seqlegend(biofam.seq, ncol=4, cex = 1.2, title='States')
Update, Oct 1, 2018
Since TraMineR V 2.0-9, the seqplot family of functions supports (when applicable) the argument ncol to control the number of columns in the legend. To add a title to the legend, you still have to proceed as shown above.
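
The manual layout used in this answer relies on par(mfrow = ...) reserving one grid cell per group plus one spare cell for the legend. The sketch below shows the same idea with base graphics only; the data, state names and colours are invented placeholders, and barplot() merely stands in for seqdplot().

```r
# Hypothetical stand-ins for the real sequence data
set.seed(1)
n_groups    <- 9
state_names <- c("a", "b", "c", "d")
state_cols  <- c("grey80", "steelblue", "orange", "firebrick")

# One cell per group plus one for the legend, arranged in two columns
n_cols <- 2
n_rows <- ceiling((n_groups + 1) / n_cols)

old_par <- par(mfrow = c(n_rows, n_cols), mar = c(2, 2, 2, 1))
for (g in seq_len(n_groups)) {
  shares <- prop.table(runif(length(state_names)))  # random shares summing to 1
  barplot(shares, col = state_cols, main = paste("Group", g))
}

# The legend is drawn in its own empty panel, filling the remaining cell
plot.new()
legend("center", legend = state_names, fill = state_cols,
       ncol = 2, title = "States", bty = "n")
par(old_par)
```

With 9 groups this gives a 5 x 2 grid, which matches the par(mfrow=c(5,2)) call in the answer.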
AFAIK seqlegend() doesn't work when the other plots you are drawing use the group argument. In your case the only thing seqlegend() adds is the title "States". If you want a legend whose contents you can customize, you can get that by providing the alphabet and state labels used in your analysis.
The package's website has several walkthroughs and guides covering the various options.
#Data
library(TraMineR)
library(WeightedCluster)
data(biofam)
## Generate alphabet and states
alphabet <- 0:7
states <- letters[seq_along(alphabet)]
biofam.seq <- seqdef(biofam[501:600, 10:25], states = states, alphabet = alphabet)
#OM distances
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE")
#9 clusters
wardCluster <- hclust(as.dist(biofam.om), method = "ward.D2")
cluster9 <- cutree(wardCluster, k = 9)
#State distribution plot
seqdplot(biofam.seq, group = cluster9, with.legend = TRUE)

How can I extract the matrix derived from a heatmap created with gplots after hierarchical clustering?

I am making a heatmap, but I can't assign the result to a variable to check it before plotting; RStudio plots it automatically. I would like to get the list of row names in the order of the heatmap. I'm not sure if this is possible. I'm using this code:
hm <- heatmap.2( assay(vsd)[ topVarGenes, ], scale="row",
trace="none", dendrogram="both",
col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255),
ColSideColors = c(Controle="gray", Col1.7G2="darkgreen", JG="blue", Mix="orange")[
colData(vsd)$condition ] )
You can assign the plot to an object. The plot will still be drawn in the plot window, however, you'll also get a list with all the data for each plot element. Then you just need to extract the desired plot elements from the list. For example:
library(gplots)
p = heatmap.2(as.matrix(mtcars), dendrogram="both", scale="row")
p is a list with all the elements of the plot.
p # Outputs all the data in the list; lots of output to the console
str(p) # Structure of p; also lots of output to the console
names(p) # Names of all the list elements
p$rowInd # Ordering of the data rows
p$carpet # The heatmap values
You'll see all the other values associated with the dendrogram and the heatmap if you explore the list elements.
For others out there, a more complete way to capture a matrix representation of the heatmap created by gplots:
matrix_map <- p$carpet
matrix_map <- t(matrix_map)
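
To get the row names in heatmap order, which is what the question asks for, the rowInd element can be used as an index into the original row names. The sketch below uses base R's stats::heatmap() as a stand-in so that it runs without gplots; it also returns a rowInd element, and the assumption here is that the object p returned by heatmap.2 can be indexed the same way.

```r
m <- as.matrix(mtcars)

# Draws the plot and returns (invisibly) a list that includes rowInd and colInd
hm <- heatmap(m, scale = "row")

# rowInd lists the original row numbers in plotting order, which runs bottom to top
rows_bottom_to_top <- rownames(m)[hm$rowInd]

# Reverse to read the labels from the top of the heatmap downwards
rows_top_to_bottom <- rev(rows_bottom_to_top)
print(rows_top_to_bottom)
```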

Heatmap with categorical variables and with phylogenetic tree in R

:)
I have a question and did not find any answer by personal search.
I would like to make a heatmap with categorical variables (a bit like this one: heatmap-like plot, but for categorical variables ), and I would like to add on the left side a phylogenetic tree (like this one : how to create a heatmap with a fixed external hierarchical cluster ). The ideal would be to adapt the second one since it looks much prettier! ;)
Here is my data:
a newick-formatted phylogenetic tree, with 3 species, let's say:
((1,2),3);
a data frame:
x<-c("species 1","species 2","species 3")
y<-c("A","A","C")
z<-c("A","B","A")
df<- data.frame(x,y,z)
(with A, B and C being the categorical variables, for instance in my case presence/absence/duplicated gene).
Would you know how to do it?
Many thanks in advance!
EDIT: I would like to be able to choose the color of each of the categories in the heatmap, not a classic gradation. Let's say A=green, B=yellow, C=red
I actually figured it out by myself. For those that are interested, here is my script:
#load packages
library("ape")
library(gplots)
#retrieve tree in newick format with three species
mytree <- read.tree("sometreewith3species.tre")
mytree_brlen <- compute.brlen(mytree, method="Grafen") #so that branches have all same length
#turn the phylo tree to a dendrogram object
hc <- as.hclust(mytree_brlen) #Compulsory step as as.dendrogram doesn't have a method for phylo objects.
dend <- as.dendrogram(hc)
plot(dend, horiz=TRUE) #check dendrogram face
#create a matrix with values of each category for each species
a<-mytree_brlen$tip.label #tip labels of the tree, used as row names
b<-c("gene1","gene2")
list<-list(a,b)
values<-c(1,2,1,1,3,2) #some values for the categories (1=A, 2=B, 3=C)
mat <- matrix(values,nrow=3, dimnames=list) #Some random data to plot
#plot the heatmap
heatmap.2(mat, Rowv=dend, Colv=NA, dendrogram='row',col =
colorRampPalette(c("red","green","yellow"))(3),
sepwidth=c(0.01,0.02),sepcolor="black",colsep=1:ncol(mat),rowsep=1:nrow(mat),
key=FALSE,trace="none",
cexRow=2,cexCol=2,srtCol=45,
margins=c(10,10),
main="Gene presence, absence and duplication in three species")
#legend of heatmap
par(lend=2) # square line ends for the color legend
legend("topright", # location of the legend on the heatmap plot
legend = c("gene absence", "1 copy of the gene", "2 copies"), # category labels
col = c("red", "green", "yellow"), # color key
lty= 1, # line style
lwd = 15 # line width
)
and here is the resulting figure :)
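
The script above types the numeric codes in by hand. As a sketch of how they could instead be derived from the data frame in the question, each category can be mapped to a fixed integer code and colour. The names category_colors and codes are introduced here for illustration, and the colour assignment follows the EDIT in the question (A=green, B=yellow, C=red).

```r
# Data frame from the question
df <- data.frame(
  x = c("species 1", "species 2", "species 3"),
  y = c("A", "A", "C"),
  z = c("A", "B", "A")
)

# Fixed colour per category; the order of the names defines the integer codes
category_colors <- c(A = "green", B = "yellow", C = "red")

# Convert each categorical column to integer codes (A = 1, B = 2, C = 3)
codes <- vapply(
  df[c("y", "z")],
  function(column) as.integer(factor(column, levels = names(category_colors))),
  integer(nrow(df))
)
rownames(codes) <- df$x
print(codes)
```

The codes matrix can then take the place of mat in the heatmap.2 call, with unname(category_colors) as the col argument; the row names would need to match the tip labels of the tree.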
I am trying to use the same syntax and the R packages ape, gplots and RColorBrewer to make a heatmap whose column dendrogram is essentially a species tree.
But I am unable to proceed beyond reading in my tre file. I get various errors when trying to perform any of the following operations on the tree that was read in:
a) plot, or
b) compute.brlen, and
c) plot, after collapse.singles, looks totally mangled in terms of species tree topology
I suspect there is something wrong with my tre input, but I'm not sure what. Would you happen to understand what is wrong and how I could fix it? Thank you!
(((((((((((((Mt3.5v5, Mt4.0v1), Car), (((Pvu186, Pvu218), (Gma109, Gma189)), Cca))), (((Ppe139, Mdo196), Fve226), Csa122)), ((((((((Ath167, Aly107), Cru183), (Bra197, Tha173)), Cpa113), (Gra221, Tca233)), (Csi154, (Ccl165, Ccl182))), ((Mes147, Rco119),(Lus200, (Ptr156, Ptr210)))), Egr201)), Vvi145), ((Stu206, Sly225), Mgu140)), Aco195), (((Sbi79, Zma181),(Sit164, Pvi202)), (Osa193, Bdi192))), Smo91), Ppa152), (((Cre169, Vca199), Csu227), ((Mpu228, Mpu229), Olu231)));
