How to get subclusters from the function aheatmap in R

I'm using aheatmap in the NMF package to construct heat maps of array and RNA-seq data.
I'm trying to extract subclusters of data in the same way that this user was trying to use cutree to extract them from hclust objects.
I can't seem to find an attribute of the aheatmap object that is of type hclust or interpretable by cutree. Any advice would be greatly appreciated.
I've also posted this both here and in BioStar, because it's not entirely an R question and it's not entirely a bioinformatics question.
A few things I've tried:
library(NMF)
#create some data
d <- matrix(rnorm(120),12,10)
# cluster it
heatmp.obj = aheatmap(d)
# define some clusters
mycl <- cutree(heatmp.obj$Rowv, k=2) #this produces an error
mycl <- cutree(heatmp.obj$Colv, k=2) #this produces an error

aheatmap returns a list, two elements of which are type 'dendrogram'. 'dendrogram' can be coerced to type 'hclust' using as.hclust(). For instance,
a = cutree(as.hclust(heatmp.obj$Rowv), k=3)
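To illustrate the coercion step on its own, here is a minimal sketch in base R. It assumes only that you have a 'dendrogram' object in hand; the dendrogram is built directly with hclust() here so that the example does not depend on NMF, and the row names are made up for illustration.

```r
set.seed(42)

# Simulated data with hypothetical row names
d <- matrix(rnorm(120), 12, 10)
rownames(d) <- paste0("row", 1:12)

# Stand-in for heatmp.obj$Rowv: any object of class 'dendrogram'
dend <- as.dendrogram(hclust(dist(d)))

# cutree() needs an hclust object, so coerce first
row_clusters <- cutree(as.hclust(dend), k = 3)

# List the row names that fall in each of the 3 subclusters
print(split(names(row_clusters), row_clusters))
```

The same two calls should apply to the column dendrogram if you want subclusters of samples rather than rows.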

Based on the example in the second link you provided:
d <- matrix(rnorm(120),12,10)
hr <- hclust(dist(d, method="euclidean"), method="complete")
mycl <- cutree(hr, k=2)
heatmp.obj <- aheatmap(d,Rowv=as.dendrogram(hr))
Does that get you closer to your desired answer?
EDIT
Image with aheatmap:
Image with hclust:

Related

Set common y axis limits from a list of ggplots

I am running a function that returns a custom ggplot from an input data (it is in fact a plot with several layers on it). I run the function over several different input data and obtain a list of ggplots.
I want to create a grid with these plots to compare them but they all have different y axes.
I guess what I have to do is extract the maximum and minimum y axes limits from the ggplot list and apply those to each plot in the list.
How can I do that? I guess it's through the use of ggplot_build. Something like this:
test = ggplot_build(plot_list[[1]])
> test$layout$panel_scales_x
[[1]]
<ScaleContinuousPosition>
Range:
Limits: 0 -- 1
I am not familiar with the structure of a ggplot_build object, and maybe this one in particular is not a standard one, as it comes from a "custom" ggplot.
For reference, these plots are created with the gseaplot2 function from the enrichplot package.
I don't know how to "upload" an R object, but if that would help, let me know how to do it.
Thanks!
edit after comments (thanks for your suggestions!)
Here is an example of a gseaplot2 plot. GSEA stands for Gene Set Enrichment Analysis, a technique used in genomic studies. The gseaplot2 function calculates a running average and then plots it, with another bar plot at the bottom.
and here is the grid I create to compare the plots generated from different data:
I would like to have a common scale for the "Running Enrichment Score" part.
I guess I could try to recreate the gseaplot2 function and input all of the datasets and then create the grid by facet_wrap, but I was wondering if there was an easy way of extracting parameters from a plot list.
As a reproducible example (from the enrichplot package):
library(clusterProfiler)
library(enrichplot)  # provides gseaplot2
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]
wpgmtfile <- system.file("extdata/wikipathways-20180810-gmt-Homo_sapiens.gmt", package="clusterProfiler")
wp2gene <- read.gmt(wpgmtfile)
wp2gene <- wp2gene %>% tidyr::separate(term, c("name","version","wpid","org"), "%")
wpid2gene <- wp2gene %>% dplyr::select(wpid, gene) #TERM2GENE
wpid2name <- wp2gene %>% dplyr::select(wpid, name) #TERM2NAME
ewp2 <- GSEA(geneList, TERM2GENE = wpid2gene, TERM2NAME = wpid2name, verbose=FALSE)
gseaplot2(ewp2, geneSetID=1, subplots=1:2)
And this is how I generate the plot list (probably there is a much more elegant way):
library(ggpubr)  # assumed source of ggarrange(plotlist = ...)
plot_list = list()
for(i in 1:3) {
  fig_i = gseaplot2(ewp2,
                    geneSetID=i,
                    subplots=1:2)
  plot_list[[i]] = fig_i
}
ggarrange(plotlist=plot_list)
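For the general idea of pulling y ranges out of a list of plots and applying shared limits, here is a minimal sketch. It assumes every element of the list is a plain ggplot object with a continuous y scale; the toy plots below are hypothetical stand-ins, and the output of gseaplot2 with several subplots may be a composite object that needs its panels handled individually.

```r
library(ggplot2)

# Hypothetical stand-ins for plots returned by a custom plotting function
plot_list <- lapply(1:3, function(i) {
  df <- data.frame(x = 1:10, y = (1:10) * i)
  ggplot(df, aes(x, y)) + geom_line()
})

# Range of the data mapped to the y axis in each plot
y_ranges <- lapply(plot_list, function(p) layer_scales(p)$y$range$range)

# Overall minimum and maximum across all plots
common_ylim <- range(unlist(y_ranges))

# Apply the shared limits; coord_cartesian() zooms without dropping data
plot_list <- lapply(plot_list, function(p) p + coord_cartesian(ylim = common_ylim))
```

Using coord_cartesian() rather than ylim() is a deliberate choice here: ylim() removes points outside the limits before statistics are computed, which can change layers such as smoothers.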

define population level for PCA analysis in adegenet

I want to perform a PCA analysis in adegenet starting from a genepop file without defined populations.
I imported the data like this:
datapop <- read.genepop('tous.gen', ncode=3, quiet = FALSE)
It works, and I can perform a PCA after scaling the data.
But I would like to plot the results / individuals on the PCA axes according to their population of origin using s.class. I have a vcf file with a three-letter code for each individual. I imported it into R:
pops_list <- read.csv('liste_pops.csv', header=FALSE)
But now how can I use it to define population levels in the genind object datapop?
I tried something like this:
setPop(datapop, formula = NULL)
setPop(datapop) <- pops_list
But it doesn't work; even the first line fails with this message:
"Error: formula must be a valid formula object."
And then how should I use it in s.class?
thanks
Didier
Without a working example it is kind of hard to tell but perhaps you can find the solution to your problem here: How to add strata information to a genind
Either way, from your examples and given how the setPop method works, your line setPop(datapop, formula = NULL) would not work because you would not be defining anything. You would actually have to do:
setPop(datapop) <- pops_list
while also guaranteeing that pops_list is a factor with the appropriate format.
I know this is a bit late, but the way to do this is to add pops_list as the strata and then use setPop() to select a certain column:
strata(datapop) <- pops_list
setPop(datapop) <- ~myPop # set the population to the column called "myPop" in the data frame
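Here is a sketch of the strata approach carried through to s.class. It uses the nancycats example data shipped with adegenet in place of the poster's file, and the column name myPop is only an illustrative choice. As far as I can tell, the replacement form of setPop() expects a formula that refers to strata columns, whereas pop(datapop) <- some_factor accepts a vector or factor directly, so that may be the shorter route when there is only one grouping.

```r
library(adegenet)  # also attaches ade4, which provides dudi.pca() and s.class()

# Example genind object standing in for datapop
data(nancycats)

# One row per individual; "myPop" is a hypothetical column name
pops <- data.frame(myPop = as.character(pop(nancycats)))

# Attach the table as strata, then choose the column that defines populations
strata(nancycats) <- pops
setPop(nancycats) <- ~myPop

# Scale allele frequencies (missing values replaced by the mean) and run the PCA
X <- scaleGen(nancycats, NA.method = "mean")
pca1 <- dudi.pca(X, center = FALSE, scale = FALSE, scannf = FALSE, nf = 2)

# Plot individuals on the first two axes, grouped by population
s.class(pca1$li, fac = pop(nancycats))
```

With data read from a headerless CSV, the single column would be called V1 unless renamed, so the formula would be ~V1 in that case.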

How to make R output text details about a dendrogram object?

Please see my previous question for details relating to test data and commands used to create a dendrogram: Using R to cluster based on euclidean distance and a complete linkage metric, too many vectors?
Here is a quick summary of my commands to make the dendrogram:
un_exprs <- as.matrix(read.table("sample.txt", header=TRUE, sep = "\t", row.names = 1, as.is=TRUE))
exprs <- t(un_exprs)
eucl_dist=dist(exprs,method = 'euclidean')
hie_clust=hclust(eucl_dist, method = 'complete')
dend <- as.dendrogram(hie_clust)
plot(dend)
This makes a very nice dendrogram plot. However, let's say this dendrogram has 2 clusters... I want to get a text list of each element belonging to each of the 2 clusters. I'm assuming this is trivial, but I don't have enough experience with R for this to be intuitive. Thanks!
You can compute this from the object returned by hclust with stats::cutree:
cutree(hie_clust,k=2)
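cutree returns a named integer vector of cluster labels, so getting one text list per cluster is a matter of splitting the names by label. A minimal sketch, using simulated data with made-up sample names in place of sample.txt:

```r
set.seed(1)

# Two well-separated groups of 4 samples each (hypothetical data)
exprs <- rbind(matrix(rnorm(20, mean = 0), nrow = 4),
               matrix(rnorm(20, mean = 10), nrow = 4))
rownames(exprs) <- paste0("sample", 1:8)

hie_clust <- hclust(dist(exprs, method = "euclidean"), method = "complete")

# Named vector: names are the samples, values are cluster numbers
cl <- cutree(hie_clust, k = 2)

# One character vector of sample names per cluster
members <- split(names(cl), cl)
print(members)
```

If you want the lists in a file rather than on the console, write.table(data.frame(sample = names(cl), cluster = cl), ...) is one option.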

R programming - Graphic edges too large error while using clustering.plot in EMA package

I'm an R programming beginner and I'm trying to use the clustering.plot function from the R package EMA. My clustering works fine and I can see the results populated as well. However, when I try to generate a heat map using clustering.plot, it gives me the error "Error in plot.new (): graphic edges too large". My code is below:
#Loading library
library(EMA)
library(colonCA)
#Some information about the data
data(colonCA)
summary(colonCA)
class(colonCA) #Expression set
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
#Applying average linkage clustering on colonCA data using Pearson correlation
expr_genes <- genes.selection(expr_mat, thres.num=100)
expr_sample <- clustering(expr_mat[expr_genes,],metric = "pearson",method = "average")
expr_gene <- clustering(data = t(expr_mat[expr_genes,]),metric = "pearson",method = "average")
expr_clust <- clustering.plot(tree = expr_sample,tree.sup=expr_gene,data=expr_mat[expr_genes,],title = "Heat map of clustering",trim.heatmap =1)
I do not get any error when it comes to actually executing the clustering process. Could someone help?
In your example, some of the rownames of expr_mat are very long (max(nchar(rownames(expr_mat))) = 271 characters). The clustering.plot function tries to make a margin large enough for all the names, but because the names are so long, there isn't room for anything else.
The really long names seem to have long stretches of periods in them. One way to condense the names of these genes is to replace runs of 2 or more periods with just one, so I would add in this line
#Extract expression matrix from colonCA
expr_mat <- exprs(colonCA)
rownames(expr_mat) <- gsub("\\.{2,}", ".", rownames(expr_mat))  # collapse runs of 2+ periods into one
Then you can run all the other commands and plot like normal.
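To see what that substitution does before applying it to the real row names, here is a small base R check. The gene-like names are invented for illustration.

```r
# Hypothetical names with long runs of periods
long_names <- c("Hsa.3004....Human.gene..complete.cds", "a....b..c.d")

# \\.{2,} matches two or more consecutive literal periods
short_names <- gsub("\\.{2,}", ".", long_names)

print(short_names)
print(nchar(long_names))   # lengths before
print(nchar(short_names))  # lengths after
```

If the names are still too long for the margins after this, truncating them with substr() is another option, at the cost of possibly creating duplicates.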

'x' is a list, but does not have components 'x' and 'y'

I am trying to plot a ROC curve for a multiclass problem, using the multiclass.roc function from the pROC package, but I get this error:
'x' is a list, but does not have components 'x' and 'y'
What does this error mean? Searching the web didn't help me find an answer. I can print the roc object, but I cannot plot it.
Thank you!
If you call plot on a list l: plot (l), the x coordinates will be taken from l$x and the y coordinates from l$y. Your list doesn't have elements x and y.
You need to call plot (l$your.x.coordinate, l$your.y.coordinate) instead.
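A small base R sketch of that behaviour, with made-up list elements: a list with components x and y plots fine, a list without them triggers the error, and passing the coordinates explicitly avoids it.

```r
pdf(NULL)  # draw to a null device so no plot window or file is needed

# A list with components named x and y can be plotted directly
ok <- list(x = 1:5, y = (1:5)^2)
plot(ok)

# A list with other component names (hypothetical: a and b) cannot
bad <- list(a = 1:5, b = (1:5)^2)
msg <- tryCatch({
  plot(bad)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Passing the coordinates explicitly works
plot(bad$a, bad$b)

dev.off()
```

For an object returned by another package, str(obj) or names(obj) shows which components are available to pass to plot.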
Another (lazy) approach is to simply use the useful library
install.packages('useful')
library(useful)
Example -
wineUrl <- 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
wine <- read.table(wineUrl, header=FALSE, sep=',')
# The file has no header, so no column is named "Cultivar"; column 1 holds the cultivar label, so drop it by position
wine_kmeans <- wine[, -1]
wine_cluster <- kmeans(x=wine_kmeans , centers=3)
plot(wine_cluster, data=wine_kmeans)
