Forcing ggplot to display graphs with no data? - r

I am working on displaying some UMAPs from Seurat object HVvU01temp using the following:
library(Seurat)
# Use the resolution-0.9 clusters as cell identities
Idents(HVvU01temp) <- HVvU01temp@meta.data$integrated_snn_res.0.9
# Keep only the cells from cluster 0
temptemp <- HVvU01temp[,HVvU01temp@meta.data$integrated_snn_res.0.9=="0"]
# Re-run UMAP on the selected cells using the first 100 PCs
temp <- RunUMAP(temptemp, reduction = "pca", dims = 1:100)
my_levels <- c("PBMC_HV","PBMC_UD1","PBMC_UD2","PBMC_UD3","PBMC_UD4","PBMC_UD5","PBMC_UD6","PBMC_UD7","CSF_HV","CSF_UD1","CSF_UD2","CSF_UD3","CSF_UD4","CSF_UD5")
# Fix the panel order by storing the labels as a factor
temp@meta.data$compartment.class2 <- factor(x=temp@meta.data$compartment.class2, levels=my_levels)
DimPlot(temp,group.by="compartment.class2",split.by="compartment.class2",ncol=8,cols=c("red","blue","orchid","tomato","springgreen","steelblue","violet","darkred","orange","purple","navy","sienna","lightgoldenrod","hotpink4"))
I am selecting all cells from the original cluster 0, re-running UMAP on the selected cells, and then generating a DimPlot split by the compartment.class2 label. I should have 14 UMAPs arranged in 2 rows of 8 columns (8 UMAPs on top, 6 on the bottom), such that PBMC_HV is directly above CSF_HV, PBMC_UD1 is above CSF_UD1, and so on.
However, some compartment.class2 labels have no cells in cluster 0 (for example, PBMC_UD2 and CSF_UD1), so they are omitted from the split UMAP and my matched PBMC and CSF panels are no longer aligned.
Is there a way for me to force the plot to leave an empty space where these UMAPs would have been so my matched PBMC and CSF plots are aligned properly?
[Figure: split UMAP plot of the samples PBMC_HV through PBMC_UD7 and CSF_HV through CSF_UD5]
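One possible approach, sketched here with plain ggplot2 rather than Seurat: facet_wrap() drops factor levels that have no rows unless drop = FALSE is set, in which case the empty levels are kept as blank panels. The UMAP coordinates below are random stand-in data (hypothetical), with PBMC_UD2 and CSF_UD1 left empty on purpose to mimic cluster 0.

```r
library(ggplot2)

# All 14 sample labels, in the order the panels should appear
my_levels <- c("PBMC_HV", paste0("PBMC_UD", 1:7),
               "CSF_HV",  paste0("CSF_UD", 1:5))

# Hypothetical stand-in for the UMAP embedding: random coordinates, with
# PBMC_UD2 and CSF_UD1 deliberately absent
set.seed(1)
present <- setdiff(my_levels, c("PBMC_UD2", "CSF_UD1"))
umap_df <- data.frame(
  UMAP_1 = rnorm(20 * length(present)),
  UMAP_2 = rnorm(20 * length(present)),
  compartment.class2 = rep(present, each = 20)
)
# Store all 14 levels, including the two that have no cells
umap_df$compartment.class2 <- factor(umap_df$compartment.class2, levels = my_levels)

p <- ggplot(umap_df, aes(UMAP_1, UMAP_2, colour = compartment.class2)) +
  geom_point(size = 0.5) +
  # drop = FALSE keeps unused levels as empty panels, so with ncol = 8
  # each CSF panel lands directly under its PBMC counterpart
  facet_wrap(~ compartment.class2, ncol = 8, drop = FALSE) +
  theme(legend.position = "none")
p
```

With the Seurat objects from the question, DimPlot() returns a ggplot object, so appending + facet_wrap(~ compartment.class2, ncol = 8, drop = FALSE) to the DimPlot() call may give the same layout. That assumes the plot data keeps compartment.class2 as a factor with all 14 levels; I have not verified how DimPlot() builds its split panels internally.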

Related

Make multiple histograms at once and save them

I would like to make histograms of columns 5-34 of my data set and save them for reference. This is what I have, and the error 'x must be numeric' keeps coming up, even though all of these columns contain numeric data.
dput(longbroca)
histograms = c()
GBhistograms = c()
RThistograms = c()
for (i in 5:34){
hist(longbroca)
hist(GBlongbroca)
hist(RTlongbroca)
histograms = c(histograms, hist(longbroca[,5:34]))
GBhistograms = c(GBhistograms, hist(GBlongbroca[,5:34]))
RThistograms = c(RThistograms, hist(RTlongbroca[,5:34]))
}
#reproducible
fakerow1 <- c(100,80,60,40,20)
fakerow2 <- c(100,80,60,40,20)
fakedata = rbind(fakerow1,fakerow2)
colnames(fakedata) = c('ant1','ant2','ant3','ant4','ant5')
You cannot plot all of the columns with a single hist() call: hist() expects a numeric vector, and a data frame is not one. That is why you are getting the error message. Your code also keeps only the list returned by hist() (the data used to draw each histogram), not the plotted histograms themselves. If you actually want to save the plotted histograms, you need to plot them to a file device (e.g. pdf).
We can use the iris dataset, which comes with R (data(iris)), as example data. Its first 4 columns are numeric. If you just want the histogram data for columns 1 through 4:
# R will plot all four but you will only see the last one.
histograms <- lapply(iris[, 1:4], hist)
The variable histograms is a list of four elements, one per column. Each element is an object of class "histogram" with 6 components, which are documented on the manual page for the function (?hist).
# To plot one of the histograms with a title and x-axis label:
lbl <- names(histograms)
plot(histograms[[1]], main=lbl[1], xlab=lbl[1])
# To plot them all
pdf("histograms.pdf")
lapply(1:4, function(x) plot(histograms[[x]], main=lbl[x], xlab=lbl[x]))
dev.off()
The file "histograms.pdf" will have all four histograms, one per page.
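To apply the same idea to a range of columns such as 5-34, the pattern can be wrapped in a small helper. save_histograms() is a hypothetical name, not part of base R; this sketch draws each selected column with hist() onto its own PDF page and returns the histogram objects, demonstrated on iris columns 1 through 4.

```r
# Hypothetical helper: one histogram per selected column, one PDF page each;
# returns the underlying "histogram" objects invisibly
save_histograms <- function(df, cols, file) {
  nms <- names(df)[cols]
  pdf(file)
  # Close the PDF device even if hist() fails on a non-numeric column
  on.exit(dev.off())
  res <- lapply(nms, function(nm) hist(df[[nm]], main = nm, xlab = nm))
  names(res) <- nms
  invisible(res)
}

out_file <- file.path(tempdir(), "histograms.pdf")
h <- save_histograms(iris, 1:4, out_file)
names(h)       # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
names(h[[1]])  # "breaks" "counts" "density" "mids" "xname" "equidist"
```

For the question's data this would be save_histograms(longbroca, 5:34, "longbroca_histograms.pdf"), assuming those columns are numeric. Selecting columns with df[[nm]] passes a plain vector to hist(), which avoids the 'x must be numeric' error.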

Add a dendrogram to a plotly::subplot figure

Since this post hasn't received a response, I tried generating it myself using R's plotly.
What I'm trying to do is plot several homologous genomic DNA segments, which are essentially horizontally laid out boxes that represent genes, and to their left a phylogenetic tree that represents the evolutionary relationships between the species of the respective genomes.
The genes belong to several groups, not all are represented in every genome.
Here is the list of data.frames that represent the genomic DNA segments:
dna.segs.list <- list(data.frame(name=c(paste0("B.",1:3),paste0("C.",1:3)),y=0.2,width=0.75,group=c(rep("B",3),rep("C",3)),stringsAsFactors=F),
data.frame(name=c(paste0("A.",1:2),paste0("C.",1:3)),y=0.2,width=0.75,group=c(rep("A",2),rep("C",3)),stringsAsFactors=F),
data.frame(name=c(paste0("A.",1:2),"B.1"),y=0.2,width=0.75,group=c(rep("A",2),"B"),stringsAsFactors=F),
data.frame(name=c(paste0("B.",1:3),paste0("C.",1:3)),y=0.2,width=0.75,group=c(rep("B",3),rep("C",3)),stringsAsFactors=F),
data.frame(name=paste0("A.",1:3),y=0.2,width=0.75,group=rep("A",3),stringsAsFactors=F))
Here's how I create a single plot of all of them:
library(plotly)  # also re-exports the %>% pipe used below
x.range <- c(-1,9)
dna.segs.plot.list <- lapply(1:length(dna.segs.list),function(s){
dna.seg.df <- dna.segs.list[[s]]
dna.seg.df$group <- factor(dna.seg.df$group,levels=c("A","B","C"))
dna.seg.plot <- plotly::plot_ly(dna.seg.df,showlegend=s==1) %>%
plotly::add_bars(x=~name,y=~y,width=~width,color=~group,colors=c("red","blue","green")) %>%
plotly::layout(legend=list(x=1,y=0)) %>%
plotly::layout(xaxis=list(title=NA,zeroline=F,tickangle=45,range=x.range),yaxis=list(title=NA,zeroline=F,showgrid=F,range=c(0,1),showticklabels=F))
return(dna.seg.plot)
})
dna.segs.plot <- plotly::subplot(dna.segs.plot.list,shareX = F,nrows = length(dna.segs.plot.list))
Which gives:
One problem is already visible here: I need to customize the legend so that it is shown only once (otherwise it repeats for each genome) but still includes all gene groups.
Then I create the phylogenetic tree and convert it to a ggplot object so that I can add it to the dna.segs.plot:
library(ggplot2)  # ggplot(), annotate(), theme() etc.
tree.obj <- ape::read.tree(text="(((species1:0.08,species2:0.075):0.028,(species3:0.06,species4:0.06):0.05):0.0055,species5:0.1);")
tree.dend <- dendextend::as.ggdend(phylogram::as.dendrogram.phylo(tree.obj))
leaf.heights <- dplyr::filter(tree.dend$nodes,!is.na(leaf))$height
leaf.xs <- dplyr::filter(tree.dend$nodes,!is.na(leaf))$x
leaf.seqments.idx <- which(tree.dend$segments$yend %in% leaf.heights & tree.dend$segments$x %in% leaf.xs)
tree.dend$segments$yend[leaf.seqments.idx] <- max(tree.dend$segments$yend[leaf.seqments.idx])
tree.dend$segments$col[leaf.seqments.idx] <- "black"
tree.dend$labels$y <- max(tree.dend$segments$yend[leaf.seqments.idx])
tree.dend$labels$x <- tree.dend$segments$x[leaf.seqments.idx]
tree.dend$labels$col <- "black"
tree.dend$segments$lwd <- 0.5
tree.ggdend <- ggplot(tree.dend,labels=F,horiz=T)+guides(fill="none")+coord_flip()+annotate("text",size=4.5,hjust=0,x=tree.dend$labels$x,y=tree.dend$labels$y,label=tree.dend$labels$label)+labs(x="",y="")+theme_minimal()+
theme(axis.text=element_blank(),axis.ticks=element_blank(),panel.grid=element_blank(),legend.position="none",legend.text=element_blank(),legend.background=element_blank(),legend.key=element_blank())
And finally, to combine the two I use:
dna.segs.tree.plot <- plotly::subplot(tree.ggdend,plotly::plotly_empty(),dna.segs.plot %>% plotly::layout(showlegend=T),nrows=1,margin=c(0,0,0,0),widths=c(0.39,0.02,0.59))
Which gives me:
Which is close to what I want, but the issues I need help with are:
Aligning the tips of the tree with the DNA segments
Keeping the tree labels from being run over by the branches, as they are now
Removing the ---(black,solid,1) and (NA,1) entries from the legend (I'm assuming they are added by the tree)
Fixing the legend issue described above, i.e. getting it to show all groups
Thanks

Table output of hierarchical clustering dendrogram in R

I have produced a dendrogram in R through hierarchical clustering analysis. I have 310 individuals that have been classified into 1 of 3 groups (my cut off, k, looks to be 3) based on 4 criteria. I have plotted the dendrogram, with the labels I want. But I am hoping to extract the results into a table which will be easier for me to use for further statistical work. I have manually gone through the small text on my dendrogram, but have found an error in my work, so I would like R to create the table for me to verify my work.
I have tried a few options from other websites, and from one entry on Stack Overflow, but have not been successful. Ideally, I would like the extraction to produce output in this format:
Columns: Individual ID, clustering group label (1-3), with one row for each of my 310 individuals
Here is what I have tried:
leaf.order <- matrix(data=NA, ncol=2, nrow=nrow(residency2), dimnames=list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hc.complete2$labels[hc.complete2$order]
Which gives error:
Error in leaf.order[, 2] <- hc.complete2$labels[hc.complete2$order] : number of items to replace is not a multiple of replacement length
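The error means the right-hand side did not supply one value per row of leaf.order, which suggests hc.complete2$labels does not have one entry per row of residency2 (an assumption, since the full code isn't shown). A common way to build the requested table is cutree(), which cuts an hclust tree into k groups and returns one cluster number per observation in the original row order. This sketch uses the built-in USArrests data as a stand-in; with your objects you would substitute hc.complete2 for hc and your individual IDs for the row names.

```r
# USArrests ships with R; its row names play the role of individual IDs
arrests <- scale(USArrests)
hc <- hclust(dist(arrests), method = "complete")

# One cluster number (1-3) per observation, in the original row order
groups <- cutree(hc, k = 3)

cluster_table <- data.frame(
  individual_id = rownames(USArrests),
  cluster = unname(groups)
)

# Optional: each observation's left-to-right position among the leaves
cluster_table$leaf_position <- match(seq_len(nrow(arrests)), hc$order)

head(cluster_table)
table(cluster_table$cluster)  # group sizes
```

Because cutree() works from the tree itself, the table does not depend on reading the small labels off the plot; write.csv(cluster_table, ...) would then give a file for further analysis.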

Color of the Diagonal in a Heatmap

I'm trying to interpret a heatmap I created with the following code:
csv <- read.csv("test.csv")
aggdata <-aggregate(csv[-1], list(csv[[1]]), sum)
row.names(aggdata) <- aggdata$Group.1
aggdata[["Group.1"]] = NULL
aggdata_matrix <- as.matrix(aggdata)
cor.mat <- cor(t(aggdata_matrix))
heatmap(cor.mat, Rowv=NA, Colv=NA)
The diagonal shows each aggregated group's correlation with itself, so e.g. SPORTS should be identical to SPORTS and thus white. The same holds for POLITICS and HISTORY.
However, I don't understand why this isn't the case for ART. As you can see in the left corner, its cell is not the same color as the rest of the diagonal.
Why is this the case?
This is my example data:
doc1,word1,word2,word3,word4,word5,word6,word7,word8,word9,word10
POLITICS,8,1,3,8,5,0,0,3,4,4
SPORTS,4,5,3,4,2,5,3,3,0,7
HISTORY,3,0,4,3,0,3,8,3,3,1
SPORTS,5,7,3,8,6,4,5,6,3,4
ART,5,4,3,0,7,7,6,2,6,6
POLITICS,2,2,5,5,6,2,0,2,2,6
SPORTS,4,0,6,8,6,7,8,0,8,7
HISTORY,1,7,5,0,1,4,2,1,1,7
ART,0,8,3,3,8,6,3,1,3,6
SPORTS,6,7,3,2,6,7,2,1,1,7
POLITICS,8,0,2,7,0,2,6,5,3,1
POLITICS,7,0,4,2,0,3,8,1,1,3
The problem, which can be found quickly by stepping through the execution of heatmap (issue the command debug(heatmap) first), is that heatmap standardizes the rows by default (scale = "row"). Turn off this unwanted behavior by including scale="none" as an argument to heatmap.
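The effect can be checked numerically with the question's own data, inlined below so the snippet runs on its own. The diagonal of cor.mat is all 1, while the row-standardized matrix (each row centred and divided by its standard deviation, which is what scale = "row" does) generally gives a different diagonal value per row, because each row has its own mean and spread.

```r
# Example data from the question, inlined
csv <- read.csv(text = "doc1,word1,word2,word3,word4,word5,word6,word7,word8,word9,word10
POLITICS,8,1,3,8,5,0,0,3,4,4
SPORTS,4,5,3,4,2,5,3,3,0,7
HISTORY,3,0,4,3,0,3,8,3,3,1
SPORTS,5,7,3,8,6,4,5,6,3,4
ART,5,4,3,0,7,7,6,2,6,6
POLITICS,2,2,5,5,6,2,0,2,2,6
SPORTS,4,0,6,8,6,7,8,0,8,7
HISTORY,1,7,5,0,1,4,2,1,1,7
ART,0,8,3,3,8,6,3,1,3,6
SPORTS,6,7,3,2,6,7,2,1,1,7
POLITICS,8,0,2,7,0,2,6,5,3,1
POLITICS,7,0,4,2,0,3,8,1,1,3")

aggdata <- aggregate(csv[-1], list(csv[[1]]), sum)
row.names(aggdata) <- aggdata$Group.1
aggdata$Group.1 <- NULL
cor.mat <- cor(t(as.matrix(aggdata)))

# Raw correlations: every diagonal entry is 1
diag(cor.mat)

# Roughly what heatmap() colours by default: rows centred and scaled
row_scaled <- t(scale(t(cor.mat)))
diag(row_scaled)

# Colour the raw correlations instead
heatmap(cor.mat, Rowv = NA, Colv = NA, scale = "none")
```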

Displaying TraMineR (R) dendrograms in text/table format

I use the following R code to generate a dendrogram (see attached picture) with labels based on TraMineR sequences:
library(TraMineR)
library(cluster)
clusterward <- agnes(twitter.om, diss = TRUE, method = "ward")
plot(clusterward, which.plots = 2, labels=colnames(twitter_sequences))
The full code (including dataset) can be found here.
As informative as the dendrogram is graphically, it would be handy to get the same information in text and/or table format. If I inspect any component of the object clusterward (created by agnes), such as "order" or "merge", everything is labeled with numbers rather than the names I get from colnames(twitter_sequences). Also, I don't see how I can output the groupings shown graphically in the dendrogram.
To summarize: how can I get the cluster output in text/table format, with the labels properly displayed, using R and ideally the TraMineR/cluster packages?
The question concerns the cluster package. The help page for the agnes.object returned by agnes (see http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/agnes.object.html) states that this object contains an order.lab component "similar to order, but containing observation labels instead of observation numbers. This component is only available if the original observations were labelled."
The dissimilarity matrix (twitter.om in your case) produced by TraMineR does not currently retain the sequence labels as row and column names. To get the order.lab component, you have to manually assign the sequence labels as both the rownames and colnames of your twitter.om matrix. I illustrate this here with the mvad data provided by the TraMineR package.
library(TraMineR)
data(mvad)
## attaching row labels
rownames(mvad) <- paste("seq",rownames(mvad),sep="")
mvad.seq <- seqdef(mvad[17:86])
## computing the dissimilarity matrix
dist.om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")
## assigning row and column labels
rownames(dist.om) <- rownames(mvad)
colnames(dist.om) <- rownames(mvad)
dist.om[1:6,1:6]
## Hierarchical clustering with agnes
library(cluster)
cward <- agnes(dist.om, diss = TRUE, method = "ward")
## here we can see that cward has an order.lab component
attributes(cward)
That covers getting the order with sequence labels rather than numbers. It is less clear to me which cluster outcome you want in text/table form. From the dendrogram you decide where to cut it, i.e. the number of groups you want, and then cut the dendrogram with cutree, e.g. cl.4 <- cutree(cward, k = 4). The result cl.4 is a vector with the cluster membership of each sequence, and you get the list of members of group 1, for example, with rownames(mvad.seq)[cl.4==1].
Alternatively, you can use the identify method (see ?identify.hclust) to select the groups interactively from the plot, but you need to pass the tree as as.hclust(cward). Here is the code for the example:
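The order.lab and cutree() steps can be combined into a labelled membership table plus one list of member labels per group. This sketch uses the built-in USArrests data instead of the TraMineR sequences: a labelled dist object stands in for dist.om after its row and column names have been set, and the agnes result is converted with as.hclust() before cutting. All object names here are illustrative.

```r
library(cluster)

# Labelled dissimilarities (stand-in for dist.om with row/column names set)
d <- dist(scale(USArrests))
ag <- agnes(d, diss = TRUE, method = "ward")

# Observation labels in dendrogram (left-to-right) order
head(ag$order.lab)

# Cut into 4 groups; memberships come back in the original observation order
cl.4 <- cutree(as.hclust(ag), k = 4)
membership <- data.frame(label = rownames(USArrests), cluster = unname(cl.4))

# One character vector of member labels per group
groups <- split(membership$label, membership$cluster)
groups[["1"]]
```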
## plot the dendrogram
plot(cward, which.plots = 2, labels=FALSE)
## and select the groups manually from the plot
x <- identify(as.hclust(cward)) ## Terminate with second mouse button
## number of groups selected
length(x)
## list of members of the first group
x[[1]]
Hope this helps.
