Heatmap generation in R

I need some suggestions regarding the heatmap generation with heatmap.2 in R. I have a matrix for gene expression values which has 15,616 rows and 27 columns for generating the heatmap. The problem is the code I am using is creating the heatmap but the visualization is not good as the matrix size is large. So can you give me suggestions as to how to get a proper heatmap out of such a huge matrix? I am attaching the command I am using to generate the heatmap and the warnings I am receiving. It would be great if someone can help me with the adjustment of the dimensions to generate a proper heatmap.
color <- colorpanel(100,low="blue",mid="white",high="red")
heatmap.2(data4,Rowv="none",col=color,trace='none',
density.info="none",scale="row",labRow=NULL,
lmat=rbind( c(0, 3), c(2,1), c(0,4) ), lhei=c(1.5, 4, 2 ))
Warning message:
In heatmap.2(data4, Rowv = "none", col = color, trace = "none", :
Discrepancy: Rowv is FALSE, while dendrogram is `column'. Omitting row dendogram.
It would be nice to have suggestions regarding a proper visualization of the heatmap with the colour key panel small and the image more distinct on columns which are my conditions and the image shifted towards a bit right as when I am generating it is a bit shifted to left. I am unable to upload the image of the heatmap as I am new to the forum and don't have that privilege. I am unable to judge the appropriate values for the matrix while generating the heatmap.

Your question is a bit too vague to get a detailed answer; however, here are a couple of things to help you out:
Colours. Typically, you want the mid point to be zero. So you probably want to try something like:
breaks = c(seq(min(data4), 0, length.out=128),
seq(0, max(data4), length.out=128))
heatmap.2(..., col=bluered(255), breaks=breaks,...)
Your matrix is too large, so make it smaller. Typically, only "differentially expressed genes" are shown in the heatmap. So select the top 50 genes or so, and plot them.
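As a rough sketch of both suggestions, the snippet below keeps the 50 most variable rows and builds breaks that are symmetric around zero. Note the assumptions: `expr` is simulated data rather than the asker's data4, and row variance is only a stand-in for a real differential-expression ranking. The rows are scaled by hand so that the breaks match the values that are actually plotted.

```r
# Simulated stand-in for an expression matrix (genes x conditions); hypothetical data.
set.seed(1)
expr <- matrix(rnorm(2000 * 27), nrow = 2000, ncol = 27,
               dimnames = list(paste0("gene", 1:2000), paste0("cond", 1:27)))

# Rank genes by variance across conditions and keep the 50 most variable.
# (Assumption: variance is used here in place of a proper DE test.)
row_var <- apply(expr, 1, var)
top <- expr[order(row_var, decreasing = TRUE)[1:50], ]

# Scale each row to z-scores ourselves, so the breaks match what is plotted.
scaled <- t(scale(t(top)))

# 256 breaks symmetric around zero give 255 colour bins with white in the middle.
limit <- max(abs(scaled))
breaks <- seq(-limit, limit, length.out = 256)

# Draw the heatmap only if gplots is installed.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(scaled, col = gplots::bluered(255), breaks = breaks,
                    scale = "none", trace = "none", density.info = "none")
}
```

With only 50 rows the row labels become legible again, so there is less need to suppress them.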

I'm afraid I can't help you resolve your issue with heatmap.2, as I myself found it does not perform overly well with specific tasks and data sets.
I would recommend looking at ComplexHeatmap, which might be better suited to such large volumes of data. It also has extensive supporting documentation on how to use it, which can be found here.
As of writing this, the following command will get you the most up-to-date version from GitHub, though a stable version is available on Bioconductor.
Most Recent (you must use force = TRUE):
library(devtools)
install_github("jokergoo/ComplexHeatmap", force = TRUE)
library(ComplexHeatmap)
Stable:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ComplexHeatmap", version = "3.8")
I know it is sub-optimal to be told to change packages, but it is a more comprehensive tool and is very well explained in the support file. The creator, jokergoo, also appears to be highly attentive. Moreover, I am not sure how effectively or easily heatmap.2 can be coerced into displaying what I assume would be 421,632 data cells. I have just switched to ComplexHeatmap and find it to be very thorough.
I'm going to leave some sample images below to show you the scope of ComplexHeatmap given the amount of data you appear to have.
Simple heatmap example (doesn't have to be flashy):
Complex heatmap example (i):
Complex heatmap example (ii):

Related

How can I create circular cladograms in R?

I want to produce a circular cladogram in R. I was trying out the ape package and could produce something like this:
plot(tree,'f', use.edge.length=F)
Now I am not really happy with how the edges look like here. I tried out the evolview webserver, which got me something like this, which looks much nicer, especially regarding the edges.
Can anyone suggest other R packages or a different approach with the ape package, to get similar results to the evolview tree?
The two main differences that stand out to me are the size of the labels and the relative lengths of the edges.
Label size can be controlled using the cex graphical parameter (using par(), or as a parameter to plot()).
Uniform edge length can be added to the tree by replacing the $edge.length property with a vector of 1s:
par(cex = 0.8) # Shrink text
tree$edge.length <- rep(1, nrow(tree$edge)) # One unit of length per edge
plot(tree, 'f', use.edge.length = TRUE)

How can you create Marginal Histogram Scatterplot using lattice package (not ggplot2)?

Long story short, I am working on an assignment for a data visualization course and the assignment specifies that we have to use the lattice package and that we have to create a marginal histogram scatterplot. (I know that asking about homework questions is frowned upon, but I'm not asking you to write my assignment for me - only asking for guidance or at least a direction to start in).
Our lecture and book don't mention anything about marginal histogram scatterplots and while the lecture shows how to create them using the standard plot function in R as well as how to do it using ggplot2, we are told not to use either. I've never used lattice before, and when I ask for help, I get general responses that aren't helpful at all.
Note: I'm not posting the question or what type of data I have to use as I'm not looking for an answer to the homework here. Just some help on where to begin. You can literally use any data if you want to show an example.
This is definitely a tricky question in lattice as well. There are quite a few compelling reasons why ggplot2 has become one of the more popular packages, although lattice is still extremely powerful. As this is part of a visualization course, I'd assume you are meant to come up with something similar to ggMarginal. For this you'll have to spend some time adjusting margins on your lattice plot.
As a guideline for how I'd solve this question, I found an answer doing the following:
Search Google for "lattice marginal histogram"; the second link is an answer on a help mailing list, which gives an example for a similar problem.
Open R and, following the link, make a small example, e.g.:
data(mtcars)
library(lattice)
scatter <- xyplot(hp ~ mpg, mtcars)
hist <- histogram(~ mpg, mtcars)
plot(scatter, more = TRUE, split = c(1, 2, 1, 2))
plot(hist, more = FALSE, split = c(1, 1, 1, 2))
After getting this far, it comes down to figuring out what is actually happening. The link above suggests looking at ?plot.trellis, and the important question seems to be how we can move our plots around, which is controlled by split. Looking at the documentation (?plot.trellis), we get some help for understanding how to use this argument:
a vector of 4 integers, c(x, y, nx, ny), that says to position the current plot at the x, y position in a regular array of nx by ny plots. (Note: this has origin at top left)
From here we have everything we need to create the marginal plot. If we make this a 2x2 plot, we'd place one histogram at c(1, 1, 2, 2), a scatter plot at c(2, 1, 2, 2) and another histogram at c(2, 2, 2, 2). Of course this is not going to be the best looking marginal plot, for which you'd have to work with the margins or go under the hood and manually set up the plot using the grid package. I'd say that is definitely a bit on the "next level" side of things.
Note:
In the above example I didn't cover how one can rotate one histogram, or how one can create a sideways histogram, if you are seeking to replicate ggMarginal more closely.
In addition, you said you had some problems finding information on this. Another option for finding an answer would have been reading the ?histogram documentation page. There are several examples within this page (and many others) which show how one can manipulate the position of lattice plots.
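The 2x2 arrangement described above can be sketched roughly as follows, using mtcars again. The object names are made up for illustration, and the hp histogram is left unrotated, so this is only a starting point rather than a finished marginal plot.

```r
library(lattice)

# Scatter plot plus one histogram per variable (mtcars ships with R).
scatter  <- xyplot(hp ~ mpg, data = mtcars)
hist_mpg <- histogram(~ mpg, data = mtcars)
hist_hp  <- histogram(~ hp, data = mtcars)

# split = c(x, y, nx, ny): place each plot in one cell of a 2 x 2 grid.
# more = TRUE keeps the page open so the next plot is added to it.
plot(hist_hp,  split = c(1, 1, 2, 2), more = TRUE)   # beside the scatter plot
plot(scatter,  split = c(2, 1, 2, 2), more = TRUE)   # main panel
plot(hist_mpg, split = c(2, 2, 2, 2), more = FALSE)  # below the scatter plot
```

The fourth cell, c(1, 2, 2, 2), is simply left empty. All three cells get the same size, which is why the margins still need work afterwards.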

How to visualize a large network in R?

Network visualizations have become common in science and in practice. But as networks increase in size, common visualizations become less useful. There are simply too many nodes/vertices and links/edges. Often visualization efforts end up producing "hairballs".
Some new approaches have been proposed to overcome this issue, e.g.:
Edge bundling:
http://vis.stanford.edu/papers/divided-edge-bundling or
https://gephi.org/tag/edge-bundling/
Hierarchical edge bundling:
http://graphics.cs.illinois.edu/sites/graphics.dev.engr.illinois.edu/files/edgebundles.pdf
Group Attributes Layout:
http://wiki.cytoscape.org/Cytoscape_3/UserManual
How to make grouped layout in igraph?
I am sure that there are many more approaches. Thus, my question is:
How to overcome the hairball issue, i.e. how to visualize large networks by using R?
Here is some code that simulates an exemplary network:
# Load packages
lapply(c("devtools", "sna", "intergraph", "igraph", "network"), install.packages)
library(devtools)
devtools::install_github("ggobi/ggally")
lapply(c("sna", "intergraph", "GGally", "igraph", "network"),
       require, character.only=TRUE)
# Set up data
set.seed(123)
g <- barabasi.game(1000)
# Plot data
g.plot <- ggnet(g, mode = "fruchtermanreingold")
g.plot
This question is related to Visualizing Undirected Graph That's Too Large for GraphViz?. However, here I am searching not for general software recommendations but for concrete examples (using the data provided above) of techniques that help to make a good visualization of a large network using R (comparable to the examples in this thread: R: Scatterplot with too many points).
Another way to visualize very large networks is with BioFabric (www.BioFabric.org), which uses horizontal lines instead of points to represent the nodes. Edges are then shown using vertical line segments. A quick D3 demo of this technique is shown at: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html.
BioFabric is a Java application, but a simple R version is available at: https://github.com/wjrl/RBioFabric.
Here is a snippet of R code:
# You need 'devtools':
install.packages("devtools")
library(devtools)
# you need igraph:
install.packages("igraph")
library(igraph)
# install and load 'RBioFabric' from GitHub
install_github('wjrl/RBioFabric')
library(RBioFabric)
#
# This is the example provided in the question:
#
set.seed(123)
bfGraph = barabasi.game(1000)
# This example has 1000 nodes, just like the provided example, but it
# adds 6 edges in each step, making for an interesting shape; play
# around with different values.
# bfGraph = barabasi.game(1000, m=6, directed=FALSE)
# Plot it up! For best results, make the PDF in the same
# aspect ratio as the network, though a little extra height
# covers the top labels. Given the size of the network,
# a PDF width of 100 gives us good resolution.
height <- vcount(bfGraph)
width <- ecount(bfGraph)
aspect <- height / width;
plotWidth <- 100.0
plotHeight <- plotWidth * (aspect * 1.2)
pdf("myBioFabricOutput.pdf", width=plotWidth, height=plotHeight)
bioFabric(bfGraph)
dev.off()
Here is a shot of the BioFabric version of the data provided by the questioner, though networks created with values of m > 1 are more interesting. The inset detail shows a close-up of the upper left corner of the network; node BF4 is the highest-degree node in the network, and the default layout is a breadth-first search of the network (ignoring edge directions) starting from that node, with neighboring nodes traversed in order of decreasing node degree. Note that we can immediately see that, for example, about 60% of node BF4's neighbors are degree 1. We can also see from the strict 45-degree lower edge that this 1000-node network has 999 edges, and is therefore a tree.
Full disclosure: BioFabric is a tool that I wrote.
That's an interesting question. I didn't know most of the tools you listed, so thanks. You can add HivePlot to the list. It's a deterministic method that consists of projecting nodes onto a fixed number of axes (usually 2 or 3). Look at the linked page; there are many visual examples.
It works better if you have a categorical nodal attribute in your dataset, so that you can use it to select which axis a node goes to. For instance, when studying the social network of a university: students on one axis, teachers on another and administrative staff on the third. But of course, it can also work with a discretized numerical attribute (eg. young, middle-aged and older people on their respective axes).
Then you need another attribute, and it has to be numerical (or at least ordinal) this time. It is used to determine the position of a node on its axis. You can also use some topological measure, such as degree or transitivity (clustering coefficient).
(source: hiveplot.net)
The fact that the method is deterministic is interesting, because it allows comparing different networks representing distinct (but comparable) systems. For example, you can compare two universities (provided you use the same attributes/measures to determine axes and position). It also allows describing the same network in various ways, by choosing different combinations of attributes/measures to generate the visualization. This is actually the recommended way of visualizing a network, thanks to a so-called hive panel.
Several software packages capable of generating those hive plots are listed on the page I mentioned at the beginning of this post, including implementations in Java and R.
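To make the layout rule concrete, here is a hand-rolled base R sketch (not one of the packages listed on that page). A categorical attribute picks the axis and the node degree sets the distance from the centre. The toy network, the role labels and the variable names are all made up for illustration, and edges are drawn as straight segments rather than the curves a real hive plot uses.

```r
# Hypothetical toy network: 30 nodes, up to 60 random edges (self-loops dropped).
set.seed(42)
n <- 30
edges <- data.frame(from = sample(n, 60, replace = TRUE),
                    to   = sample(n, 60, replace = TRUE))
edges <- edges[edges$from != edges$to, ]

# The categorical attribute decides the axis (three axes, 120 degrees apart).
role <- sample(c("student", "teacher", "staff"), n, replace = TRUE)
axis_angle <- c(student = pi / 2,
                teacher = pi / 2 + 2 * pi / 3,
                staff   = pi / 2 + 4 * pi / 3)

# The numerical measure decides the position along the axis: here, node degree.
degree <- tabulate(c(edges$from, edges$to), nbins = n)
radius <- 0.2 + 0.8 * degree / max(degree)

# Convert polar coordinates (axis angle, radius) to x/y positions.
x <- radius * cos(axis_angle[role])
y <- radius * sin(axis_angle[role])

plot(x, y, asp = 1, pch = 19, axes = FALSE, xlab = "", ylab = "",
     xlim = c(-1, 1), ylim = c(-1, 1))
segments(x[edges$from], y[edges$from], x[edges$to], y[edges$to],
         col = adjustcolor("black", alpha.f = 0.3))
```

Because no random or force-directed step is involved once the attributes are fixed, rerunning the layout on the same attributes always gives the same picture, which is what makes two networks comparable.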
I've been dealing with this problem recently. As a result, I've come up with another solution. Collapse the graph by communities/clusters. This approach is similar to the third option outlined by the OP above. As a word of warning, this approach will work best with undirected graphs. For example:
library(igraph)
set.seed(123)
g <- barabasi.game(1000) %>%
as.undirected()
#Choose your favorite algorithm to find communities. The algorithm below is great for large networks but only works with undirected graphs
c_g <- fastgreedy.community(g)
#Collapse the graph by communities. This insight is due to this post http://stackoverflow.com/questions/35000554/collapsing-graph-by-clusters-in-igraph/35000823#35000823
res_g <- simplify(contract(g, membership(c_g)))
The result of this process is the figure below, where the vertices' names represent community membership.
plot(res_g, margin = -.5)
The above is clearly nicer than this hideous mess:
plot(g, margin = -.5)
To link communities to original vertices you will need something akin to the following
mem <- data.frame(vertices = 1:vcount(g), member = as.numeric(membership(c_g)))
IMO this is a nice approach for two reasons. First, it can in theory deal with a graph of any size. The process of finding communities can be continuously repeated on collapsed graphs. Second, adopting an interactive approach would yield very readable results. For example, one can imagine the user being able to click on a vertex in the collapsed graph to expand that community, revealing all of its original vertices.
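A small extension of the same idea, as a sketch only: size each collapsed vertex by the number of original vertices it contains and keep a lookup from community to members. It uses the newer igraph names (sample_pa, cluster_fast_greedy) for the same operations as above; the variable names are made up for illustration.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)   # preferential attachment, as in the question

# Detect communities, then merge each community into a single vertex.
comm <- cluster_fast_greedy(g)
collapsed <- simplify(contract(g, as.numeric(membership(comm))))

# Size each collapsed vertex by how many original vertices it holds.
comm_size <- as.vector(sizes(comm))
plot(collapsed, vertex.size = 3 + 2 * sqrt(comm_size),
     vertex.label = seq_len(vcount(collapsed)))

# Lookup from community id to the original vertices it contains.
members <- split(seq_len(vcount(g)), as.numeric(membership(comm)))
```

Here `members[["1"]]` would list the original vertices merged into collapsed vertex 1, which is the piece an interactive expand-on-click view would need.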
I have looked around and found no good solution. My approach has been to remove nodes and play with edge transparency. It is more of a design solution than a technical one, but I've been able to plot Gephi-like networks of up to 50,000 edges without much trouble on my laptop.
With your example:
plot(simplify(g),
     vertex.size = 0.01,
     edge.arrow.size = 0.001,
     vertex.label.cex = 0.75,
     vertex.label.color = "black",
     vertex.frame.color = adjustcolor("white", alpha.f = 0),
     vertex.color = adjustcolor("white", alpha.f = 0),
     edge.color = adjustcolor(1, alpha.f = 0.15),
     display.isolates = FALSE,
     vertex.label = ifelse(page_rank(g)$vector > 0.1, "important nodes", NA))
Example of twitter mentions network with 30,000 edges:
Yet another interesting package is networkD3. There are a myriad of ways to represent graphs within this library. In particular, I find forceNetwork an interesting option. It is interactive and therefore allows you to really explore your network. It is great for EDA, but it may be too "wiggly" for final work.
I tried this package. It's very fast.
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("netbiov")
https://www.bioconductor.org/packages/release/bioc/html/netbiov.html
Examples:
https://www.bioconductor.org/packages/release/bioc/vignettes/netbiov/inst/doc/netbiov-intro.pdf

Accessing shape attribute for points when making NVD3 scatterChart with nPlot/rCharts

How do you set the shape attribute for points when building a scatterChart with nplot from rCharts? Point size can be set by providing a column in the input dataframe named "size" but if there's a corresponding "shape" column consisting of strings such as "square" or "cross" the resulting graph still has the default circle points. New to R and NVD3 so I apologize for my lack of vocabulary.
It appears the newest version of nvd3 no longer works the same way as the old version. See for example. The screenshot shows shapes, and the data has shape:, but only circles are rendered in the actual chart. Also, the tests do not produce anything other than circles. I glanced at the source, and I could not find where or how to set shape. If you know how to do it with nvd3, I could easily translate it into an rCharts example.
I don't have a reputation of 50, but I'd like to comment.
Line 18 in this NVD3 example(Novus.github) shows how it's currently done. Likewise, all you need to do with the live code(nvd3.org) is uncomment the 'size' line in the data tab.
I attempted making a column in my df named 'shape', and using n1 <- nPlot(x~y, data=df, shape='shape', type='scatterChart'); n1$chart(onlyCircles=FALSE); and a number of other combinations. I've only spent the last two days working with rCharts but have made some exciting progress. I'm giving up on this but found it curious that these two examples weren't mentioned here, so I thought I'd mention them.
I know this question is a bit "ancient" but I faced the same problem and it took me a while to find out how to change the shapes.
I followed the approach in this example for changing the size:
nvd3 scatterPlot with rCharts in R: Vary size of points?
Here is my solution:
library(rCharts)
df=data.frame(x=rep(0:2,3),y=c(rep(1,3),rep(2,3),rep(3,3)),
group=c(rep("a",3),rep("b",3),rep("c",3)),shape=rep("square",9))
p <- nPlot(y~x , group = 'group',data = df, type = 'scatterChart')
#In order to make it more pleasant to look at
p$chart(xDomain=c(-0.5,2.5))
p$chart(yDomain=c(0,4))
p$chart(sizeRange = c(200,200))
# here is the magic
p$chart(scatter.onlyCircles = FALSE)
p$chart(shape = '#! function(d){return d.shape} !#')
p

Heatmap in R, representation of colors and removal of x-axis

I made a heatmap. The code I used is:
heatmap(t(data.matrix(survey)))
I don't need anything on x-axis. In plots, the following command would delete the numbers in x-axis:
xaxt='n'
Also, if I want to add a chart at the top (which tells about the representation of colors - like yellow means lower values and red means higher), how can I do that? I have no idea so I didn't even try. The only thing I can think of is 'scale' but that didn't work.
Lastly, I tried to change the color (green and red) and for that I used:
mycol = c("green","red")
heatmap(t(data.matrix(zscoreplus)),col=mycol)
Unlike 1st pic, there are no colors in between. (1st one had a lot more variety.) What I was trying to get was red, light red, reddish-green, green, dark green, etc...
p.s. For some reason, gplots and heatmap.2 are not installed and R can not find those packages.
Instead of using the basic heatmap() function, you could load the gplots package and use heatmap.2() - in your case same syntax - to get a color key. Let me know if you have any further questions about the heatmap.2() function.
EDIT:
Sorry, didn't read that you cannot install gplots. Is it because of limited admin rights?
Unfortunately, heatmap() is kind of limited regarding the color key.
But for the red -> green problem I have a solution for you. To create your own color palette, try
my_palette <- colorRampPalette(c("red", "green"))(n = 1000)
and then use it as color in your heat map:
heatmap(..., col = my_palette, ...)
How important is clustering in your case? If you don't need clustering, you can use the levelplot() function (from the lattice package, which ships with R), which has a nice color key representation.
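A minimal levelplot() sketch, assuming a small made-up matrix `m` in place of the survey data, shows the color key that heatmap() lacks:

```r
library(lattice)

# Hypothetical survey-like matrix: 10 respondents x 6 questions.
set.seed(7)
m <- matrix(round(runif(60, 1, 5)), nrow = 10,
            dimnames = list(paste0("r", 1:10), paste0("q", 1:6)))

# Gradient with intermediate shades between the two end colours.
pal <- colorRampPalette(c("green", "red"))(100)

# levelplot() draws the matrix with a colour key next to the panel;
# no clustering or dendrograms are involved.
p <- levelplot(m, col.regions = pal, xlab = "", ylab = "",
               scales = list(x = list(rot = 90)))
print(p)
```

Rows of the matrix end up on the x-axis, so transpose with t(m) first if you want them the other way round.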
EDIT2:
Regarding the color "scale" problem: I assumed that you mean something like a legend, according to the description in your first post. So is it something like the screenshot below that you want?
EDIT3
Regarding the x-labels:
Unfortunately, there is no direct option in heatmap.2() to turn those off. Those x-labels are the colnames of the matrix that you read in. With xlabel you would just control the general description of the axis (it is turned off by default). Here is a screenshot that shows what I mean when xlabels is used:
Maybe you could just give your matrix empty ( " " ) colnames. That should help.
On the other hand, I am sorry to ask you this, but this doesn't make sense if you are using clustering. How would you know which is which?
An alternative solution is to simply crop the region or code from the pdf, or svg once you saved the heat map. Shouldn't take more than 5 seconds.
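For the base heatmap() function from the question, the empty-colnames idea can be sketched like this. The matrix `m` is made-up data. The second call uses heatmap()'s labCol argument, which is a different route from the one suggested above.

```r
# Hypothetical data standing in for the survey matrix.
set.seed(3)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("row", 1:8), paste0("col", 1:5)))

# Option 1: blank out the column names before plotting.
m_blank <- m
colnames(m_blank) <- rep("", ncol(m_blank))
heatmap(m_blank)

# Option 2: keep the names in the data but ask heatmap() not to draw them.
heatmap(m, labCol = NA)
```

Option 2 leaves the matrix untouched, so the column names are still there if you need to look up which column ended up where after clustering.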
Regarding your problems to install gplots: You forgot the quotes.
> require(gplots)
Loading required package: gplots
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘gplots’
> install.packages(gplots)
Error in install.packages : object 'gplots' not found
– ayesha malik 8 mins ago
Try
install.packages("gplots", dependencies = TRUE)
