I'm trying to use heatmap3 to visualize my bacterial community data. I have already clustered my samples, and have a dendrogram from hclust that I want to use to order my samples in the heatmap, while showing the dendrogram as well. However, the data file that I am putting into heatmap3 has been collapsed to the genera level, so the structure that I had before isn't immediately apparent to heatmap3, and I therefore don't want heatmap3 to try to do its own clustering.
I've tried using my hclust object as a dendrogram and passing it to Colv, but the order of the samples in my heatmap, for whatever reason, becomes literally random while the dendrogram looks correct. When I extract the order of samples in my dendrogram and re-order my input matrix according to this order (without a dendrogram), I get visible clustering. Essentially what I want is to give heatmap my dendrogram that I made from an hclust function, and have it order the samples in that order WITH the dendrogram above, and then plot the relative abundances in the heatmap. What am I missing?
Unfortunately I don't have data that I'm able to provide, or an illustrative example. If this question is impossible to answer without data and R code, I will do my best to put something together.
You can try the pheatmap package: its cluster_rows and cluster_cols parameters accept hclust objects.
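A minimal sketch of that suggestion, using made-up data (the matrix `mat`, the genus and sample names, and the linkage method are all placeholders, not taken from the question). One assumption worth checking: the columns of the matrix passed to pheatmap should be in the same order as the observations that were given to hclust, because the tree's ordering is applied by position. A mismatch there could plausibly explain a correct-looking dendrogram over a scrambled heatmap.

```r
library(pheatmap)

set.seed(1)
# Hypothetical genus-level abundance table: rows = genera, columns = samples
mat <- matrix(runif(60), nrow = 6,
              dimnames = list(paste0("genus", 1:6), paste0("sample", 1:10)))

# Stand-in for the clustering done beforehand (here simply on the same data)
hc <- hclust(dist(t(mat)), method = "average")

# Make sure the matrix columns follow the order of the hclust input
mat <- mat[, hc$labels]

# Reuse the existing tree for the columns; do not cluster the rows
res <- pheatmap(mat, cluster_rows = FALSE, cluster_cols = hc)
```

The returned object keeps the tree it used in `res$tree_col`, which makes it easy to confirm that no re-clustering took place.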
I'm planning to use patchwork to assemble several ROC curves plotted with pROC. After constructing a pROC plot list (of S3: roc objects) and attempting to use wrap_plots(plots) to assemble, I came across the following error:
Error: Only know how to add ggplots and/or grobs
AFAIK, there may be several solutions:
Coerce S3:roc objects to ggplots. It seems the function fortify does this job for S3 objects generated by precrec package but I don't know if S3:roc objects can be done in the same way. Using ggplot2::fortify I ran into
`data` must be a data frame, or other object coercible by `fortify()`, not an S3 object with class roc.
Use precrec to streamline the conversion instead. What holds back my migration is that I want to print the Youden index point, the confidence intervals of the Youden index point, and the area under the curve (AUC) on the plot. It seems only the pROC package meets all my needs, so I would rather not move on. I would also need to adjust my code to meet precrec's parameter requirements. That is a lot to learn and try, so tutorials and simple code are appreciated.
In any case, my final goal is to assemble all ROC curves programmatically, with automatic annotations. Each ROC curve needs to show its Youden index point, the confidence intervals of the Youden index point, and the area under the curve (AUC) on the plot.
Drawbacks exist in the pROC package, too. The text sizes of the Youden index and confidence interval values are too small for the whole plot once all ROC plots are assembled. I can adjust them by specifying par(cex=<text size>), but there is a risk that the text overlaps the curves or runs out of bounds if it sits too close to the margins. pROC does not reconcile text sizes, curves and text positions automatically. A package that meets all of the demands mentioned above would strongly push me to adopt it for drawing ROC curves. Solutions may therefore vary in my scenario (but please don't recommend editing these curves by hand in a vector graphics editor, because it is time-consuming, error-prone, and cannot keep up with changing demands from different journals). All insights from all perspectives are appreciated.
Have you tried the ggroc function from pROC? It does exactly what you're asking for: it creates a ggplot2 plot (class gg) which you can then manipulate as you wish.
However, I think you are slightly confused:
Coerce S3:roc objects to ggplots. It seems the function fortify does this job for S3 objects generated by precrec package
It makes sense that the precrec package would be able to convert its own objects. However, note that it doesn't generate a ggplot2 object, but a data.frame with the coordinates of the ROC curve (which can then be used as input for ggplot2).
In pROC, this exact operation is done with the coords function, which extracts the coordinates of the ROC curve to a data.frame (and that you can then use as input for ggplot2).
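As a rough sketch of how ggroc, coords and patchwork could fit together, assuming a recent pROC (1.16 or later, where coords returns a data.frame) and using the aSAH example data that ships with pROC. The helper name `make_roc_plot`, the annotation position and the text size are illustrative choices only, and the confidence interval of the Youden point (available through ci.coords) is left out to keep it short.

```r
library(pROC)
library(ggplot2)
library(patchwork)

data(aSAH)  # example data set shipped with pROC

# Hypothetical helper: one annotated ggplot per marker column
make_roc_plot <- function(marker) {
  r <- roc(aSAH$outcome, aSAH[[marker]], quiet = TRUE)

  # Youden-optimal point (first row in case of ties)
  best <- coords(r, "best", best.method = "youden")[1, , drop = FALSE]

  # ci.auc() returns lower bound, AUC, upper bound
  auc_ci <- as.numeric(ci.auc(r))
  label <- sprintf("AUC %.2f (95%% CI %.2f-%.2f)",
                   auc_ci[2], auc_ci[1], auc_ci[3])

  ggroc(r) +
    geom_point(data = best, aes(x = specificity, y = sensitivity),
               inherit.aes = FALSE, colour = "red", size = 2) +
    annotate("text", x = 0.4, y = 0.1, label = label, size = 3) +
    ggtitle(marker)
}

plots <- lapply(c("s100b", "ndka"), make_roc_plot)
combined <- wrap_plots(plots, ncol = 2)
combined
```

Because every element of `plots` is an ordinary ggplot, text size and position can be adjusted per panel with the usual ggplot2 tools instead of par(cex = ...).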
Before k-means clustering for consumer segmentation, I want to identify and delete outliers from my sample. I tried hierarchical clustering with the single linkage algorithm. The problem is that I have a sample with more than 800 cases, and in my plot (single linkage dendrogram) the labels are written on top of each other and are therefore not readable, so it is impossible for me to clearly identify the outliers just by looking at the graph :-/
The paper linked below says you can create boxplots based on the branch distance to identify outliers in a more objective way. I thought that would also be a great way to make the row numbers of the outliers in my dataset readable; however, I am struggling with creating the boxplots.
https://link.springer.com/article/10.1186/s12859-017-1645-5/figures/3
Does anyone know how to write the code to get the boxplots based on the height of the branches?
This is the code I use for clustering and attached you can see the plot
dr_dist<-dist(dr_ma_cluster[,c(148:154)])
hc_dr<-hclust(dr_dist,method = "single") #single linkage
plot(hc_dr,labels=(row.names(dr_ma_cluster)))
This is my failed attempt at the boxplot, as I don't know how to address the branch height
> boxplot(hc_dr)
Error in x[floor(d)] + x[ceiling(d)] :
non-numeric argument for binary operator
> boxplot(hc_dr[,c(148:154)])
Error in hc_dr[, c(148:154)] : Incorrect number of dimensions
And here another way to do the graph (and some automated outlier detection approach), but it makes the readability even worse with large datasets..
Another code to plot the tree, even less readable for large datasets:
Delete outliers automatically of a calculated agglomerative hierarchical clustering data
Thanks for any help!!
boxplot(hc_dr$height) as suggested by StupidWolf was the simple thing I was looking for.
Unfortunately I did not manage to label the outlier dots with the rownames from the original dataframe. Rownames from the branch height table were useless as they were assigned in ascending order.
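One possible way to get from the outlying branch heights back to row names, sketched on simulated data (the matrix `x` and the names `case1`, `out1`, ... are invented for illustration). It assumes that, with single linkage, an outlier joins the tree late and as a single observation: row i of hc$merge describes the merge made at hc$height[i], and negative entries in that row are indices of single observations.

```r
set.seed(42)
# 50 ordinary cases plus two far-away points that play the outliers
x <- rbind(matrix(rnorm(100), ncol = 2), c(15, 15), c(-15, 15))
rownames(x) <- c(paste0("case", 1:50), "out1", "out2")

hc <- hclust(dist(x), method = "single")
boxplot(hc$height)

# Merge heights that the boxplot rule flags as outlying
out_heights <- boxplot.stats(hc$height)$out
late_merges <- which(hc$height %in% out_heights)

# Negative entries in hc$merge are single observations joined at that step
members <- hc$merge[late_merges, , drop = FALSE]
singletons <- -members[members < 0]

# Translate observation indices back to the original row names
outlier_rows <- hc$labels[singletons]
outlier_rows
```

Note that the boxplot rule can also flag moderately late merges of ordinary fringe cases, so the resulting list is a candidate set to inspect rather than a final verdict.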
hang = 0.0001 gave a better look to the dendrogram, but the labels were still unreadable because they still overlapped.
If anyone has a similar problem, check: R Shiny, zoomable dendrogram program
The code given in the answer there was very easy to adapt, resulting in a zoomable dendrogram that makes it easy to identify the relevant cases (the outliers). For details, search for dendextend, as proposed by csgroen.
Together, the boxplot and this tool let me identify the row names of the outliers after single linkage clustering, so that I could delete them before k-means clustering.
I am plotting the survival probability for my dataframe with 8 different groups with this command:
fit2 <- survfit(Surv(time = uptimeDay, event = solved, type = 'right') ~ cluster, data = t2)
plot(fit2,conf.int=F,xlim=c(0, 250),mark.time=c(1,50,100,200),mark=c(1,3,4,2,5,7,6,8,9,10),lwd=1,cex=0.7,lty = 1:11,xlab='Time(days)',ylab='Survival Probability')
the cluster here is a number between 1 and 10.
I would like to know how to automatically set the colors of the curves together with an automatic legend using key of the curves.
Can somebody help me out with this?
I have a function that I use for Kaplan-Meier curves that is based on ggplot2, which will take care of the colors and legends for you. Regrettably, I've not gotten around to packaging it up in any sensible way. But you can download the source code from
https://gist.github.com/nutterb/004ade595ec6932a0c29
And some examples on how to use it from
https://gist.github.com/nutterb/fb19644cc18c4e64d12a
It's not clear what you mean by making this "automatic" and the desire to "use the key of the curves", but perhaps you are asking that the colors of the curves match the legend.
png()
mycols <- c("red", "blue")
# prio.fit is assumed to be a survfit object with two strata
plot(prio.fit, col = mycols)
legend(x = "bottomleft", col = mycols, lty = 1, legend = mycols)
dev.off()
If you want this mated to a dataset and wanted to specify particular colors for your groups, then you will need to provide a dataset so there is something meaningful to use as labels, and be more specific about the coloring schema needed.
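For what it's worth, here is a small sketch of tying colors and legend labels to the groups of a fitted model, using the lung data from the survival package as a stand-in for the asker's data (the grouping variable ph.ecog and the palette are arbitrary choices). The idea is to derive both the colors and the legend entries from the strata of the survfit object, so they stay in sync however many groups there are.

```r
library(survival)

# Stand-in data: lung ships with the survival package
fit <- survfit(Surv(time, status) ~ ph.ecog, data = lung)

# One color per stratum, generated rather than typed by hand
n_groups <- length(fit$strata)
cols <- hcl.colors(n_groups, palette = "Dark 3")

plot(fit, col = cols, lty = 1, conf.int = FALSE,
     xlab = "Time (days)", ylab = "Survival Probability")

# Legend labels come from the strata names, in the same order as the curves
legend("topright", legend = names(fit$strata), col = cols, lty = 1,
       title = "Group")
```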
I'm doing a density comparison in R using the sm package (sm.density.compare). Is there any way I can get back a mathematical description of the graph, or at least a table with a number of points, rather than a plot? I would like to plot the resulting graphs in a different application, but need the data to do so.
Thanks a lot for the help,
culicidae
I am using R with igraph and I have a square matrix with weights. I want to sort it. I thought of using page.rank(g), which gave me a vector and a corresponding value.
library(igraph)
g<-get.matrix()
page.rank(g)$value
page.rank(g)$vector
Now I want to sort using these values and visualize the result in a graph, if possible.
Something similar to the following picture:
How could I do this?
Choose a force-based layout and set the vertex size (vertex.size) to be proportional to the page rank values. See an example on the igraph homepage on how to set the vertex size. (The example uses tkplot, but you can just use plot instead of that.) You can set the vertex labels via the vertex.label argument to plot, and \n is allowed to make multi-line labels.
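A short sketch of that approach, assuming a small random weight matrix in place of the asker's data (the matrix `w`, the vertex names and the size scaling are made up). It uses the current igraph function names page_rank and graph_from_adjacency_matrix rather than the older dotted spellings, and Fruchterman-Reingold as one example of a force-based layout.

```r
library(igraph)

set.seed(1)
# Hypothetical square weight matrix with named rows and columns
w <- matrix(runif(36), nrow = 6,
            dimnames = list(LETTERS[1:6], LETTERS[1:6]))
diag(w) <- 0

g <- graph_from_adjacency_matrix(w, mode = "directed", weighted = TRUE)

# PageRank scores; the "weight" edge attribute is used by default
pr <- page_rank(g)$vector

# Vertices sorted from highest to lowest score
ranked <- sort(pr, decreasing = TRUE)

plot(g,
     layout = layout_with_fr(g),           # force-based layout
     vertex.size = 10 + 100 * pr,          # size grows with the score
     vertex.label = paste0(V(g)$name, "\n", round(pr, 2)),
     edge.arrow.size = 0.3)
```

The `ranked` vector gives the sorted order directly, while the plot conveys the same ranking through vertex size.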