Plotting heatmap with R and clustering - r

Hello everyone. I am trying to plot a heatmap and would like to cluster it so that rows showing a similar pattern are grouped together. The plot does not look good, and I would also like to change the colors. I am a newbie; can anyone tell me how to plot a heatmap with clustering?
My data: data_link
What I tried: I simply log-normalized the data and plotted the graph.
library(ggplot2)
library(reshape2)
mydata=read.table("Test_data", sep="\t", header=TRUE)
melted_cormat <- melt(mydata)
head(melted_cormat)
melted_cormat$new=log2(1+melted_cormat$value)
ggplot(data = melted_cormat, aes(x=variable, y=ID, fill=new)) +
geom_tile()
Is it possible to increase the size of each cell, as in the image below?
image
Please advise.
Thank you

You can make a heatmap from this data, but I don't think it will be a very good way to visualize this much data. You have 287 rows in mydata, which means you will have 287 rows in your plot. This will make the individual rows difficult to make out, and it will make labelling of the y axis impossible.
The other issue is that approximately 99% of your values are under 1000, yet your highest value is almost 6000. That means that the scaling of your fill is going to be extremely uneven. It will be difficult to see much detail in the lower ranges.
If you want to see clustering you could use pheatmap instead of ggplot2, and I would probably do a log transform on the fill scale to reveal the details better. However, the problem with simply having too much data on a single plot persists.
mymatrix <- log(as.matrix(mydata[,-1]))
mymatrix[mymatrix < 0] <- 0
pheatmap::pheatmap(mymatrix)
EDIT
If you plot only the first 10 rows of data, the result looks more clearly like a heatmap:
pheatmap::pheatmap(as.matrix(mydata[1:10,-1]))
Or the first 30 rows:
pheatmap::pheatmap(as.matrix(mydata[1:30,-1]))
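To address the color and cell-size parts of the question, here is a minimal sketch using pheatmap's `color`, `cellwidth` and `cellheight` arguments. The data frame below is a made-up stand-in for `mydata` (an ID column plus numeric columns), and the palette and cell sizes are arbitrary choices, so treat it as a starting point rather than a recipe for your file.

```r
library(pheatmap)

set.seed(42)
# Hypothetical stand-in for mydata: an ID column plus six numeric columns
mydata <- data.frame(
  ID = paste0("row", 1:30),
  matrix(rexp(30 * 6, rate = 1 / 200), ncol = 6,
         dimnames = list(NULL, paste0("S", 1:6)))
)

# log2(1 + x) keeps zeros at zero and compresses the few very large values
mymatrix <- log2(1 + as.matrix(mydata[, -1]))
rownames(mymatrix) <- mydata$ID

res <- pheatmap(
  mymatrix,
  color = colorRampPalette(c("navy", "white", "firebrick3"))(50),
  cluster_rows = TRUE,   # rows with similar patterns end up adjacent
  cluster_cols = TRUE,
  cellwidth = 25,        # cell size in points
  cellheight = 10
)
```

With all 287 rows, a fixed `cellheight` makes the figure very tall, so you would probably also need to save it to a file with a suitable height.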

Related

How to set the height between grid rows in a ggplot line graph (R)?

I'm trying to plot a line graph using the ggplot library in R. The plot itself is fine, but I need to reduce the vertical spacing between the grid rows, because the lines end up far apart.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1)+
xlab("iteraciones") +
ylab("valores")
and this is my current graph:
...the first line is very distant from the second. How can I reduce the height?
Regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
1. Change the scale (e.g. convert the plot to a log scale), although this can make it harder for people to interpret the numbers. This can also change the behavior of each line, not just change the space between the lines. I'm guessing this isn't what you will want, ultimately.
2. Normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
3. As stated above, you can graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
Not recommended:
I know that some graphs have a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2) because it masks the true separation between the data points. If you really do want a gap, I would look at this post: axis.break and ggplot2 or gap.plot? plot may be too complexe
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
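A rough sketch of the normalizing option and the separate-graphs option, assuming a long-format data frame shaped like `chart_data` in the question. The numbers are simulated and the series names "a", "b", "c" are invented for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for chart_data: three series at very different levels
chart_data <- data.frame(
  NRO = rep(1:20, times = 3),
  leyenda = rep(c("a", "b", "c"), each = 20),
  DTF = c(rnorm(20, 800, 20), rnorm(20, 2500, 50), rnorm(20, 3100, 40))
)

# Normalizing: standardize DTF separately within each value of leyenda
chart_data$DTF_z <- ave(chart_data$DTF, chart_data$leyenda,
                        FUN = function(v) (v - mean(v)) / sd(v))

p_norm <- ggplot(chart_data, aes(x = NRO, y = DTF_z, color = leyenda)) +
  geom_line()

# Separate graphs: one panel per series, each with its own y-axis range
p_facet <- ggplot(chart_data, aes(x = NRO, y = DTF, color = leyenda)) +
  geom_line() +
  facet_wrap(~ leyenda, ncol = 1, scales = "free_y")

print(p_norm)
print(p_facet)
```

The standardized version puts all three lines on a common scale but loses the original units; the faceted version keeps the units at the cost of three panels.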
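You could use an axis transformation that maps your data to the screen in a non-linear fashion:
fun_trans <- function(x){
  # Three anchor points: data values (x) and where they should sit on the axis (y)
  d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
  model1 <- lm(y~poly(x,2), data=d)  # forward mapping: data -> axis position
  model2 <- lm(x~poly(y,2), data=d)  # approximate inverse: axis position -> data
  scales::trans_new("fun",
    function(x) as.vector(predict(model1,data.frame(x=x))),
    function(x) as.vector(predict(model2,data.frame(y=x))))
}
# trans = "fun" is resolved to the fun_trans() function defined above
last_plot() + scale_y_continuous(trans = "fun")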

Coloring scatterplot in R based on fold enrichment

I'm very new to R and have tried to search around for an answer to my question, but couldn't find quite what I was looking for (or I just couldn't figure out the right keywords to include!). I think this is a fairly common task in R though, I am just very new.
I have an x vs y scatterplot and I want to color those points for which there is at least a 2-fold enrichment, i.e. where x/y>=2. Since my values are expressed as log2 values, the transformed value needs to be x/y>=4.
I currently have the scatterplot plotted with
plot(log2(counts[,40]), log2(counts[,41]))
where counts is an imported .csv file and 40 & 41 are my columns of interest.
I've also created a column for fold change using
counts$fold<-counts[,41]/counts[,40]
I don't know how to incorporate these two pieces of information... Ultimately I want a graph that looks something like the example here: http://s17.postimg.org/s3k1w8r7j/error_messsage_1.png
where those points that are at least two-fold enriched will be colored in blue.
Any help would be greatly appreciated. Thanks!
Is this what you're looking for:
# Fake data
dat = data.frame(x=runif(100,0,50), y = rnorm(100, 10, 2))
plot(dat$x, dat$y, col=ifelse(dat$x/dat$y > 4, "blue", "red"), pch=16)
The ifelse statement creates a vector of "blue" and "red" (or whatever colors you want) based on the values of dat$x/dat$y and plot uses that to color the points.
This might be helpful if you've never worked with colors in R.
Another option is to use ggplot2 instead of base graphics. Here's an example:
library(ggplot2)
ggplot(dat, aes(x,y, colour=cut(x/y, breaks=c(-1000,4,1000),
labels=c("<=4",">4")))) +
geom_point(size=5) +
labs(colour="x/y")
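One note on the threshold, in case it helps: on log2-transformed values a 2-fold difference corresponds to a difference of 1 between the two log2 values, so the test can be done on the raw ratio and the plot drawn on the log2 scale. Below is a small sketch with simulated counts; the column names `a` and `b` are made up to stand in for columns 40 and 41, and I'm assuming enrichment means column 41 over column 40, as in the `fold` column from the question.

```r
set.seed(1)
# Simulated stand-in for the two count columns (names a and b are hypothetical)
counts <- data.frame(a = rpois(200, 50) + 1, b = rpois(200, 50) + 1)
counts$b[1:20] <- counts$b[1:20] * 3   # make a few points clearly enriched

# At least 2-fold enriched: raw ratio b/a >= 2,
# which is the same as log2(b) - log2(a) >= 1
counts$fold <- counts$b / counts$a
enriched <- counts$fold >= 2

plot(log2(counts$a), log2(counts$b),
     col = ifelse(enriched, "blue", "grey40"), pch = 16,
     xlab = "log2(a)", ylab = "log2(b)")
# Dashed line marks the 2-fold boundary on the log2 scale: y = x + 1
abline(a = 1, b = 1, lty = 2)
```

Points above the dashed line are the ones colored blue.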

How to plot heatmap with multiple categories in a single cell with ggplot2?

How to plot heatmap with multiple categories in a single cell with ggplot2? A heatmap of categorical variables can be drawn with this code:
# data
datf <- data.frame(indv = factor(paste("ID", 1:20),
                                 levels = rev(paste("ID", 1:20))),
                   matrix(sample(LETTERS[1:7], 400, TRUE), ncol = 20))
library(ggplot2)
library(reshape2)
# converting data to long form for ggplot2 use
datf1 <- melt(datf, id.var = 'indv')
ggplot(datf1, aes(variable, indv)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_manual(values = rainbow(7))
The code came from here:
http://rgraphgallery.blogspot.com/2013/04/rg54-heatmap-plot-of-categorical.html
But what about multiple categories in a single cell like this? Is it possible to use triangle or other shape as a cell?
http://postimg.org/image/4dudrv0nz/
copy from biostar as Alex Reynolds suggested.
For those interested, this appears to be Figure 2 from Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia.
I wanted to create a similar plot with ggplot and geom_tile for a bigger collection of genes (a few hundred) but finally decided to use geom_point instead to provide additional information per cell (tile). Also, it looks to me a lot like this plot was generated in Excel or some other spreadsheet software (maybe along those lines https://www.youtube.com/watch?v=0s5OiRMMzuY). The colors in the cells (tiles) do not match those in the legend (suggesting that they have been added separately and not automatically) and there appears to be an erroneous cell (the diagonal separating colors, upper left to lower right, differs from the diagonal in black, lower left to upper right).
Hence, my concluding two cents: Doing this automatically is probably very time-consuming and in my opinion only makes sense if you want to do this repeatedly, e.g., on data that is subject to change or on multiple datasets, and/or if you have a larger collection of genes.
Otherwise, following the instructions in the YouTube video for a rather small number of cells is likely to be more efficient. Or use geom_point (similar to Adding points to a geom_tile layer in ggplot2 or Marking specific tiles in geom_tile() / geom_raster()) to represent information about an additional category (variable).
In any case, should anyone have other suggestions on how to automatically create such a figure, I am more than happy to hear about that.
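One possible way to automate it, offered as a sketch rather than a tested solution for real mutation data: draw each cell as two triangles with geom_polygon, one per category. The grid, the category names `cat1` and `cat2`, and the helper `make_triangles` are all invented for this example.

```r
library(ggplot2)

set.seed(1)
# Hypothetical grid of cells, each carrying two categorical values
cells <- expand.grid(x = 1:5, y = 1:4)
cells$cat1 <- sample(LETTERS[1:3], nrow(cells), TRUE)
cells$cat2 <- sample(LETTERS[1:3], nrow(cells), TRUE)

# Build the three corner points of the upper-left or lower-right
# triangle of every unit cell centred on (x, y)
make_triangles <- function(d, upper) {
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    x <- d$x[i]
    y <- d$y[i]
    if (upper) {
      px <- c(x - 0.5, x - 0.5, x + 0.5)
      py <- c(y - 0.5, y + 0.5, y + 0.5)
    } else {
      px <- c(x - 0.5, x + 0.5, x + 0.5)
      py <- c(y - 0.5, y - 0.5, y + 0.5)
    }
    data.frame(id = paste(i, upper), px = px, py = py,
               value = if (upper) d$cat1[i] else d$cat2[i])
  }))
}

polys <- rbind(make_triangles(cells, TRUE), make_triangles(cells, FALSE))

p <- ggplot(polys, aes(x = px, y = py, group = id, fill = value)) +
  geom_polygon(colour = "white") +
  coord_fixed()
print(p)
```

Cells with only one category could be handled by giving both triangles the same value.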

Clustering and heatmap in R

I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output; however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show many differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap; it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that even remotely looks like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I somehow have to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it allows you to set breaks that assign colors to ranges of values in your heatmap.
For example, if you had 3 colors (blue, white, and red) with the values going from low to high, you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
One thing you are doing in your program is calling hclust on your data and then calling heatmap on it; however, if you look at the heatmap manual page, it states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
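To make the outlier and row-side-colour parts concrete, here is a sketch that caps the scaled values so everything beyond a cutoff falls into the end bins, and passes `RowSideColors` to heatmap.2. The data are simulated, and the cutoff of 2 and k = 4 are arbitrary choices for illustration, not values from your data.

```r
library(gplots)

set.seed(1)
# Simulated stand-in for datamat: 40 objects x 5 features, plus one outlier
datamat <- matrix(rlnorm(40 * 5, meanlog = 3), nrow = 40,
                  dimnames = list(paste0("obj", 1:40), paste0("f", 1:5)))
datamat[1, 1] <- 5000

# Column-wise z-scores of the log values
scaled <- scale(log(datamat))

# Cap at +/- 2 so outliers land in the first or last colour bin
capped <- pmax(pmin(scaled, 2), -2)

# Cluster rows on correlation distance, as in the question
hr <- hclust(as.dist(1 - cor(t(scaled), method = "pearson")),
             method = "complete")
mycl <- cutree(hr, k = 4)
rowcols <- rainbow(4)[mycl]

# 41 break points define 40 colour bins
my_breaks <- seq(-2, 2, length.out = 41)

heatmap.2(capped,
          Rowv = as.dendrogram(hr), Colv = FALSE, dendrogram = "row",
          scale = "none", breaks = my_breaks,
          col = colorpanel(40, "black", "yellow", "green"),
          RowSideColors = rowcols, trace = "none")
```

Because the values are scaled before plotting, `scale = "none"` is used so that heatmap.2 does not rescale them a second time.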

How to make 3D plots with categorical data in R?

I've been trying to create a 3D bar plot based on categorical data, but have not found a way.
It is simple to explain. Consider the following example data (the real example is more complex, but it reduces to this), showing the relative risk of incurring something broken down by income and age, both categorical data.
I want to display this in a 3D bar plot (similar in idea to http://demos.devexpress.com/aspxperiencedemos/NavBar/Images/Charts/ManhattanBar.jpg). I looked at the scatterplot3d package, but it's only for scatter plots and doesn't handle categorical data well. I was able to make a 3d chart, but it shows dots instead of 3d bars. There is no chart type for what I need. I've also tried the rgl package, but no luck either. I've been googling for more than an hour now and haven't found a solution. I have a copy of the ggplot2 - Elegant Graphics for Data Analysis book as well, but ggplot2 doesn't have this kind of chart.
Is there another freeware app I could use? OpenOffice 3.2 doesn't have this chart either.
Thank you for any hints.
Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
I'm not sure how to make a 3d chart in R, but there are other, better ways to represent this data than with a 3d bar chart. 3d charts make interpretation difficult, because the heights of the bars are then skewed by the 3d perspective. In that example chart, it's hard to tell if Wisconsin in 2004 is really higher than Wisconsin 2001, or if that's an effect of the perspective. And if it is higher, how much so?
Since both Age and Income have meaningful orders, it wouldn't be awful to make a line graph. ggplot2 code:
ggplot(data, aes(Age, Risk, color = Income))+
geom_line(aes(group = Income))
Or, you could make a heatmap.
ggplot(data, aes(Age, Income, fill = Risk)) +
geom_tile()
Like the others suggested there are better ways to present this, but this should get you started if you want something similar to what you had.
df <- read.csv(textConnection("Age,Income,Risk
young,high,1
young,medium,1.2
young,low,1.36
adult,high,1
adult,medium,1.12
adult,low,1.23
old,high,1
old,medium,1.03
old,low,1.11
"))
df$Age <- ordered(df$Age, levels=c('young', 'adult', 'old'))
df$Income <- ordered(df$Income, levels=c('low', 'medium', 'high'))
library(rgl)
plot3d(Risk ~ Age|Income, type='h', lwd=10, col=rainbow(3))
This will just produce flat rectangles. For an example of how to create nice-looking bars, see demo(hist3d).
You can find a starting point here but you need to add in more lines and some rectangles to get a plot like you posted.

Resources