I'm trying to generate some plots of log-transformed fold-change data using heatmap.2 (code below).
I'd like to order the rows in the heatmap by the values in the last column (largest to smallest). The rows are being reordered automatically (I'm unsure of the precise calculation used 'under the hood') and, as shown in the image, some clustering is being performed.
sample_data
gid 2hrs 4hrs 6hrs 8hrs
1234 0.5 0.75 0.9 2
2234 0 0 1.5 2
3234 -0.5 0.1 1 3
4234 -0.2 -0.2 0.4 2
5234 -0.5 1.2 1 -0.5
6234 -0.5 1.3 2 -0.3
7234 1 1.2 0.5 2
8234 -1.3 -0.2 2 1.2
9234 0.2 0.2 0.2 1
0123 0.2 0.2 3 0.5
code
data <- read.csv(infile, sep='\t',comment.char="#")
rnames <- data[,1] # assign labels in column 1 to "rnames"
mat_data <- data.matrix(data[,2:ncol(data)]) # transform columns into a matrix
rownames(mat_data) <- rnames # assign row names
# custom palette
my_palette <- colorRampPalette(c("turquoise", "yellow", "red"))(n = 299)
# (optional) defines the color breaks manually for a "skewed" color transition
col_breaks = c(seq(-4,-1,length=100), # for turquoise
seq(-1,1,length=100), # for yellow
seq(1,4,length=100)) # for red
# plot data
heatmap.2(mat_data,
density.info="none", # turns off density plot inside color legend
trace="none", # turns off trace lines inside the heat map
margins =c(12,9), # widens margins around plot
col=my_palette, # use the color palette defined earlier
breaks=col_breaks, # enable color transition at specified limits
dendrogram='none', # draw no dendrograms
Colv=FALSE) # turn off column clustering
Plot
I'm wondering if anyone can suggest either how to turn off reordering so I can reorder my matrix by the last column and force this order to be used, or alternatively hack the heatmap.2 function to do this.
You are not specifying Rowv=FALSE, and by default the rows are reordered. From the heatmap.2 help for the Rowv parameter:
determines if and how the row dendrogram should be reordered. By
default, it is TRUE, which implies dendrogram is computed and
reordered based on row means. If NULL or FALSE, then no dendrogram is
computed and no reordering is done.
So if you want the rows ordered by the last column, you can do:
mat_data <- mat_data[order(mat_data[, ncol(mat_data)], decreasing=TRUE), ]
and then
heatmap.2(mat_data,
density.info="none",
trace="none",
margins =c(12,9),
col=my_palette,
breaks=col_breaks,
dendrogram='none',
Rowv=FALSE,
Colv=FALSE)
You will get the following image :
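As a quick check before plotting, the reordering step can be tried on the sample data from the question using base R only. This is a minimal sketch: the matrix is typed in by hand rather than read from infile, and `sorted` and `last` are illustrative names only.

```r
# Sample data from the question, entered by hand (assumption: no file is read)
mat_data <- matrix(
  c( 0.5,  0.75, 0.9,  2,
     0,    0,    1.5,  2,
    -0.5,  0.1,  1,    3,
    -0.2, -0.2,  0.4,  2,
    -0.5,  1.2,  1,   -0.5,
    -0.5,  1.3,  2,   -0.3,
     1,    1.2,  0.5,  2,
    -1.3, -0.2,  2,    1.2,
     0.2,  0.2,  0.2,  1,
     0.2,  0.2,  3,    0.5),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("1234", "2234", "3234", "4234", "5234",
      "6234", "7234", "8234", "9234", "0123"),
    c("2hrs", "4hrs", "6hrs", "8hrs"))
)

# Sort rows by the last column, largest value first
last <- ncol(mat_data)
sorted <- mat_data[order(mat_data[, last], decreasing = TRUE), ]

# The last column should now be non-increasing from top to bottom
print(sorted[, last])
```

Passing `sorted` to heatmap.2 together with Rowv=FALSE and Colv=FALSE should then keep this row order.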
Related
I am making a volcano plot in R. I have a huge range of pvalues and log2fold changes. I set an xlim and ylim because I want to focus in on the central region of the plot. However, naturally setting my limits excludes some of my data. I would like to have the data outside of my axes limits displayed at my limits. So for example, a fold change of 4 would be displayed as a point just outside of my xlim of 2.
with(mydata, plot(ExpLogRatio, -log10(Expr_p_value), pch=20, main = "Volcano Plot",xlim=c(-2,2),ylim=c(0,40)))
This works but cuts out some of my data points (those with a fold change above 2 or below -2, and those with -log10(p-value) above 40).
if I understand correctly, I'd just use pmin and pmax to limit your values, e.g.:
values = seq(-3, 3, len=21)
pmin(pmax(values, -2), 2)
gives back:
[1] -2.0 -2.0 -2.0 -2.0 -1.8 -1.5 -1.2 -0.9 -0.6 -0.3 0.0 0.3 0.6 0.9 1.2
[16] 1.5 1.8 2.0 2.0 2.0 2.0
i.e. the values have been limited to the range [-2, +2].
applying this to your data, you'd do something like:
with(mydata, {
lratio <- pmin(pmax(ExpLogRatio, -2.1), 2.1)
pch <- ifelse(ExpLogRatio == lratio, 20, 4)
plot(lratio, -log10(Expr_p_value), pch=pch, ylim=c(0, 40))
})
you'll probably want to set xlab and main to set titles, but I've not included that to keep the answer tidier. also extending this to the y-axis would obviously be easy
note I've also changed the plotting point style to indicate which points were truncated
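The extension to the y-axis mentioned above could look roughly like this. It is only a sketch: the data frame is made up for illustration (only the column names come from the question), `clamp` is a hypothetical helper, and the limits 2.1 and 41 are arbitrary values chosen to sit just outside the requested axis limits of 2 and 40.

```r
# Hypothetical example data; only the column names follow the question
mydata <- data.frame(
  ExpLogRatio  = c(-4, -1, 0.5, 3),
  Expr_p_value = c(1e-50, 1e-3, 0.2, 1e-10)
)

# Hypothetical helper: limit x to the closed range [lo, hi]
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)

clamped <- with(mydata, {
  y_raw <- -log10(Expr_p_value)
  x <- clamp(ExpLogRatio, -2.1, 2.1)  # just outside the x-limits of +/-2
  y <- clamp(y_raw, 0, 41)            # just above the y-limit of 40
  data.frame(x = x, y = y,
             truncated = (x != ExpLogRatio) | (y != y_raw))
})

# Truncated points are drawn as crosses (pch 4), all others as dots (pch 20)
plot(clamped$x, clamped$y,
     pch = ifelse(clamped$truncated, 4, 20),
     xlab = "log ratio", ylab = "-log10(p-value)", main = "Volcano Plot")
```

A point is flagged as truncated if either coordinate had to be moved, so a single marker style covers both axes.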
I am not sure how to make an X-Y plot in R.
I have A B C datasets.
A dataset
ID Result
1.1 2
1.2 4
1.3 2.5
1.4 9
B dataset
ID Result
1.1 1
1.2 7
1.3 6
1.4 9
C dataset
ID Result
1.1 0.5
1.2 8
1.3 9
1.4 9
I want to make one plot with x = result A and y = result B, and another plot with x = result A and y = result C.
Then A should be represented by red spots, B by black and C by blue, for example. So the spot for ID 1.1 should be at x=2 and y=1 in red (A) and black (B); the spot at (4, 7) is ID 1.2 in red and black; the spot at (9, 9) is ID 1.4 in red and black.
I tried qqplots but I don't know how to set the X and Y correctly.
Thanks
ggplot2 is an excellent library for producing plots and there are many reference manuals online. Below is an answer to your question using the ggplot approach. The A,B,C data frames are unified into a single frame and the geom_point() for an x-y plot is used. The aes() sets the x and y coordinates (here you seem to seek to plot 'result' as both the x and y, if I understood the question?). The points are scaled by color, which is defined in the data frame as attributes A,B,C. Importantly, this variable must be a factor. The colors are defined by the manual color scale.
library(ggplot2)
dataA <- data.frame(ID=c(1.1,1.2,1.3),result=c(2,4,2.5),index=c(1,2,3),color="A")
dataB <- data.frame(ID=c(1.1,1.2,1.3),result=c(1,7,6),index=c(1,2,3),color="B")
dataC <- data.frame(ID=c(1.1,1.2,1.3),result=c(0.5,8,9),index=c(1,2,3),color="C")
data <- rbind(dataA,dataB,dataC)
data$color <- as.factor(data$color)
ggplot(data) +
geom_point(aes(x=result,y=result,color=color), size=10) + # constant size belongs outside aes()
scale_color_manual(values=c("red", "black", "blue")) +
theme_bw()
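If the intent is instead to plot result B against result A, and result C against result A, one possible reading of the question can be sketched in base R by merging the three tables on ID first. The column names `A`, `B`, `C` and the name `xy` are assumptions made for this example, and the colours are only one interpretation of what was asked.

```r
# The three data sets from the question; IDs kept as strings for exact matching
A <- data.frame(ID = c("1.1", "1.2", "1.3", "1.4"), A = c(2, 4, 2.5, 9))
B <- data.frame(ID = c("1.1", "1.2", "1.3", "1.4"), B = c(1, 7, 6, 9))
C <- data.frame(ID = c("1.1", "1.2", "1.3", "1.4"), C = c(0.5, 8, 9, 9))

# One row per ID with columns ID, A, B, C
xy <- merge(merge(A, B, by = "ID"), C, by = "ID")

# B against A in black, then C against A in blue on the same axes
plot(xy$A, xy$B, col = "black", pch = 19,
     xlab = "Result A", ylab = "Result B or C",
     ylim = range(xy$B, xy$C))
points(xy$A, xy$C, col = "blue", pch = 19)
text(xy$A, xy$B, labels = xy$ID, pos = 3)  # label the B-vs-A points with their ID
legend("topleft", legend = c("B vs A", "C vs A"),
       col = c("black", "blue"), pch = 19)
```

With this layout ID 1.1 lands at (2, 1) for B and ID 1.4 at (9, 9) for both B and C, matching the positions described in the question.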
I have 2 sets of data A and B, each with a y value for x=100, 200, 300. I want to create one graph which shows the difference between these two data sets. This means that for each x, there will be two boxplots (one for data A and one for data B).
for example, this is how the columns are organized in my data.
DataSet A
# x=100 200 300
1 2 3
1.1 2.1 3.1
1.2 2.2 3.2
1 2 3
1.01 2.01 3.01
DataSet B
# x=100 200 300
6 7 9
6.1 7.1 9.1
6.2 7.2 9.2
6 7 9
6.01 7.01 9.01
I was able to get two graphs out of this data using:
set style fill solid 0.25 border -1
set style boxplot outliers pointtype 7
set style data boxplot
set xtics ('100' 1, '200' 2, '300' 3)
plot for [i=1:3] "A.txt" using (i):i notitle
plot for [i=1:3] "B.txt" using (i):i notitle
However, I am facing issues when combining it into one.
Please help.
If you want to have them stacked above each other (in case they don't overlap), then you can just combine the two plots into one with
plot for [i=1:3] "A.txt" using (i):i notitle,\
for [i=1:3] "B.txt" using (i):i notitle
If they can overlap, you may want to put them side-by-side with
set boxwidth 0.3
plot for [i=1:3] "A.txt" using (i-0.15):i notitle,\
for [i=1:3] "B.txt" using (i+0.15):i notitle
Just to give two examples of how you could combine those plots.
How can I make clustered rowstacked bars in gnuplot? I know how to get clustered bars, but
not a cluster of rowstacked bars. Thanks!
Edit: in a cluster, stacked bars should use different colors/patterns as well.
I'm not completely sure how to go about doing this, but, one idea is to make it so that the boxes are touching each other
`set boxwidth 1`
That doesn't quite get you a "clustered" look yet -- To get a clustered look, I think you'd need to insert a row (maybe column) of zeros...(I haven't sorted through that one in my head yet) into your datafile where you want a cluster break.
Of course, you wouldn't need to set the boxwidth either I suppose...clustered just depends on the breaking every once in a while...
If I understand the original post right, it should be easy to accomplish with gnuplot if you can preprocess your data to offset x coordinates of specific data series.
To illustrate the approach I will use the following data in 3 data series:
# impulse.dat
0.9 1
1.9 4
2.9 3
3.9 5


1.0 1
2.0 2
3.0 4
4.0 2


1.1 3
2.1 3
3.1 5
4.1 4
Here each series has x-coordinates shifted by .1. To plot it I choose impulses of width 10.
plot [0:5] [0:6] 'impulse.dat' ind 0 w imp lw 10, \
'impulse.dat' ind 1 w imp lw 10, \
'impulse.dat' ind 2 w imp lw 10
Edit: combining this with Matt's suggestion to use boxes would definitely be better:
set boxwidth 0.1
set style fill solid
plot [0:5] [0:6] 'impulse.dat' ind 0 w boxes,\
'impulse.dat' ind 1 w boxes, \
'impulse.dat' ind 2 w boxes
Following is the picture with impulses.
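The preprocessing step, offsetting the x coordinates of each series, is not shown in the answer. One way it might be done is sketched below in R (any scripting language would do). `offset_series` is a hypothetical helper written for this example; it emits two blank lines between series because that is how gnuplot's index keyword separates data sets.

```r
# Hypothetical helper: shift the x positions of each series so that
# bars belonging to the same x value sit side by side.
offset_series <- function(series, step = 0.1) {
  k <- length(series)
  # Centre the shifts around zero, e.g. -0.1, 0, 0.1 for three series
  shifts <- (seq_len(k) - (k + 1) / 2) * step
  out <- character(0)
  for (j in seq_len(k)) {
    y <- series[[j]]
    out <- c(out, sprintf("%.1f %g", seq_along(y) + shifts[j], y))
    # Two blank lines separate data sets for gnuplot's `index`
    if (j < k) out <- c(out, "", "")
  }
  out
}

# The three series used in the answer
series <- list(c(1, 4, 3, 5), c(1, 2, 4, 2), c(3, 3, 5, 4))
dat_lines <- offset_series(series)

# Print to the console; use writeLines(dat_lines, "impulse.dat") to save
writeLines(dat_lines)
```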
I have a question about the package gplots. I want to use the function heatmap.2 and change the symmetric point of the color key from 0 to 1. Normally, when symkey=TRUE and you use col=redgreen(), a colorbar is created where the colors are assigned like this:
red = -2 to -0.5
black=-0.5 to 0.5
green= 0.5 to 2
Now I want to create a colorbar like this:
red= -1 to 0.8
black= 0.8 to 1.2
green= 1.2 to 3
Is something like this possible?
Thank you!
If you look at the heatmap.2 help file, it looks like you want the breaks argument. From the help file:
breaks (optional) Either a numeric vector indicating the splitting points for binning x into colors, or a integer number of break points to be used, in which case the break points will be spaced equally between min(x) and max(x)
So, you use breaks to specify the cutoff points for each colour. e.g.:
library(gplots)
# make up a bunch of random data from -1, -.9, -.8, ..., 2.9, 3
# 10x10
x = matrix(sample(seq(-1,3,by=.1),100,replace=TRUE),ncol=10)
# plot. We want -1 to 0.8 being red, 0.8 to 1.2 being black, 1.2 to 3 being green.
heatmap.2(x, col=redgreen, breaks=c(-1,0.8,1.2,3))
The crucial bit is the breaks=c(-1,0.8,1.2,3) being your cutoffs.
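To see which colour a given value would fall into with these cutoffs, the binning can be imitated in base R with cut(). This only illustrates the idea of break points; it is not a claim about what heatmap.2 calls internally, and the test values are made up.

```r
# The cutoffs from the answer and the three colours they separate
breaks <- c(-1, 0.8, 1.2, 3)
colours <- c("red", "black", "green")

# Made-up values, one per intended bin
values <- c(-0.5, 1, 2)

# Assign each value to its bin; include.lowest keeps -1 itself in the first bin
bins <- cut(values, breaks = breaks, labels = colours, include.lowest = TRUE)
print(data.frame(value = values, colour = bins))
```

Note that n break points always define n - 1 bins, which is why four cutoffs go with three colours here.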