R language - heat maps - r

I'm trying to build a heat map using R. I have a matrix of values (percentage) like:
0 5 0 0 25 30
0 0 0 10 0 0
10 15 65 65 70
and so on.
What I want is a heat map where equal values (across the whole matrix) are represented by the same colour. Instead, I still get a map where the colour for the zero value in the first row differs from the colour for the zero value in the second row, and so on.
The command I used to build the heat map is:
my_heatmap <- heatmap(my_heat_matrix, Rowv=NA, Colv=NA, col = colors_01, margins=c(5,10))
UPD: Sorry, I found an answer.

I think you should try the scale="none" argument.
A reproducible example would have been helpful ...
z <- outer(1:10,1:10,"+")
heatmap(z,Rowv=NA,Colv=NA)
heatmap(z,Rowv=NA,Colv=NA,scale="none")
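The difference between the two calls is easier to see with numbers. For non-symmetric input, heatmap() defaults to scale="row", which centres and scales each row before values are mapped to colours, so the same raw value can land on a different colour in each row. The following is a minimal Python sketch of that row scaling, applied to the first two rows from the question. The helper name scale_row is made up for illustration, and this only mimics the idea; it is not R's actual implementation.

```python
from statistics import mean, stdev

def scale_row(row):
    """Centre a row on its mean and divide by its sample standard deviation."""
    m, s = mean(row), stdev(row)
    return [(v - m) / s for v in row]

# First two rows of the matrix from the question.
rows = [
    [0, 5, 0, 0, 25, 30],
    [0, 0, 0, 10, 0, 0],
]

scaled = [scale_row(r) for r in rows]

# The raw value 0 ends up at a different scaled value in each row,
# so it is drawn in a different colour unless scaling is switched off.
zero_row1 = scaled[0][0]
zero_row2 = scaled[1][0]
print(round(zero_row1, 3), round(zero_row2, 3))
```

With scale="none" the raw values are used directly, so every zero maps to the same colour.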
heatmap may qualify as the most annoying R graphics function because of its use of layout, which makes it impossible to arrange the plots on the page in any sensible way ... (filled.contour and the plots from the hexbin package share similar annoyances).

Related

Generating a fancy dotplot with R

I really like the dotplot function in the clusterProfiler package. For reasons related to the object that the package creates internally, I cannot replicate this graph with my own data.
So my question is: could someone point me to a similar dotplot package or function for producing the same plot?
It's important to me to show that some of these "biological processes" are present in some clusters (x-axis) and not in others, and that the colour of each dot represents its importance (fold). The size of the dot would be given by an integer.
Here is an example of the data I want to show.
Thanks in advance.
biological process   cluster1-fold   cluster2-fold   cluster3-fold   cluster1-num   cluster2-num   cluster3-num
cell cycle           0               3               5               0              23             24
dna replication      4               2               0               43             22             0
Here is the plot I want to replicate:

gnuplot with 2 x axis from data points

I have a data file, the data for y axis are in the third column. I would like to have the scale given by the first column on the x1 and by the second column on the x2. The standard way would be to:
plot data u 1:3 axes x1y1, data u 2:3 axes x2y1
But that creates two plots, which is something I want to avoid. Of course one could make the above work with colours or with some other dirty tricks, but that makes the whole plot code very cumbersome. Another nice way is to use multiplot as suggested here. But this is not really my goal, as I want to have the real x2 axis.
Another way that came to my mind was to set x2range but that means going to the source file and figuring out the min and max or using some statistics in gnuplot (which feels like a waste of time for such a simple thing).
Is there a simpler and more elegant way than the ones above? (I am especially concerned that the solution be short to write: the plot can consist of several (>5) datasets, and I want to avoid plotting each dataset twice.)
This can be done by telling gnuplot to re-scan the file with the 2nd column as x2 values, but with only invalid y-values for this second plot:
set xtics nomirror
set xrange [:] noextend
set x2tics
set x2range [:] noextend
plot '/tmp/f.gdat' u 1:3 w l, '' u 2:(1/0) ax x2y1
As an example, you can plot this data with Celsius on x and Fahrenheit on x2:
0 32 0
30 86 1
60 140 2
90 194 3
Note that this will only be sensible if column 2 is affinely linked with column 1. If you know the affine relation, using set link is much better.
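Whether the two columns are affinely linked can be checked before plotting. The following is a small Python sketch, using the Celsius/Fahrenheit sample above, that derives the slope and intercept from the first two rows and verifies them against the rest. The function name affine_fit and the tolerance are illustrative assumptions, not part of gnuplot.

```python
def affine_fit(xs, x2s, tol=1e-9):
    """Return (slope, intercept) if x2 = slope*x + intercept holds for every pair, else None."""
    slope = (x2s[1] - x2s[0]) / (xs[1] - xs[0])
    intercept = x2s[0] - slope * xs[0]
    for x, x2 in zip(xs, x2s):
        if abs(slope * x + intercept - x2) > tol:
            return None  # at least one row breaks the affine relation
    return slope, intercept

# Columns 1 and 2 of the sample data above.
celsius = [0, 30, 60, 90]
fahrenheit = [32, 86, 140, 194]

fit = affine_fit(celsius, fahrenheit)
print(fit)
```

For this sample the relation is x2 = 1.8*x + 32, which is the kind of expression that could then be handed to set link x2 (together with its inverse) instead of re-scanning the file.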

Gnuplot histogram 3d

I'm looking for a way to plot histograms in 3d to produce something like this figure http://www.gnuplot.info/demo/surface1.17.png but where each series is a histogram.
I'm using the procedure given here https://stackoverflow.com/a/19596160 and http://www.gnuplotting.org/calculating-histograms/ to produce histograms, and it works perfectly in 2d.
Basically, the commands I use are
hist = 'u (binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart):(1) smooth freq w boxes'
plot 'data.txt' @hist
Now I would just like to add multiple histograms in the same plot, but because they overlap in 2d, I would like to space them out in a 3d plot.
I have tried to do the following command (using above procedure)
hist = 'u (1):(binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart):(1) smooth freq w boxes'
splot 'data.txt' @hist
But gnuplot complains that the z values are undefined.
I don't understand why this would not put a histogram along the value 1 on the x-axis with the bins along the y-axis, and plot the height on the z-axis.
My data is formatted simply in two columns:
Index angle
0 92.046
1 91.331
2 86.604
3 88.446
4 85.384
5 85.975
6 88.566
7 90.575
I have 10 files like this, and since the values in the files are close to each other, they will completely overlap if I plot them all in one 2d histogram. Therefore, I would like to see 10 histograms behind each other in a sort of 3d perspective.
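For reference, the binning expression used in the commands above maps each value to the centre of its bin, and smooth freq then sums the 1s per centre. The following is a minimal Python sketch of that mapping for the angle values listed above. The values binstart = -100 and binwidth = 4 are assumptions taken from the linked example, and the helper name bin_center is made up.

```python
from collections import Counter
from math import floor

def bin_center(value, binstart, binwidth):
    """Centre of the bin that contains value (same expression as in the gnuplot command)."""
    return binwidth * (floor((value - binstart) / binwidth) + 0.5) + binstart

# Angle column of the sample data above.
angles = [92.046, 91.331, 86.604, 88.446, 85.384, 85.975, 88.566, 90.575]

# Assumed settings: binstart = -100, binwidth = 4.
counts = Counter(bin_center(a, -100, 4) for a in angles)
for center in sorted(counts):
    print(center, counts[center])
```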
This second answer is distinct from my first. Whereas the first addresses what the OP was trying to accomplish, this second provides an alternative approach which addresses the underlying problem the OP was trying to overcome.
I have posted an answer that addresses the ability to do this in 3d. However, this isn't usually the best way to do this with multiple histograms like this. A 3d graph like that will be difficult to compare.
We can address the overlap in 2D by staggering the position of the boxes. With default settings, the boxes will spread out to touch. We can turn that off and adjust the position of the boxes to allow more than one histogram on a graph. Remember that the coordinates you supply are the centers of the boxes.
Suppose that I have the data you have provided and this additional data set
Index Angle
0 85.0804
1 92.2482
2 90.0384
3 99.2974
4 87.729
5 94.6049
6 86.703
7 97.9413
We can set the boxwidth to 2 units with set boxwidth 2 (your bins are 4 units wide). Additionally, we will turn on box filling with set style fill solid border lc black.
Then I can issue
plot datafile1 u (binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart):(1) smooth freq w boxes, \
datafile2 u (binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart+1):(1) smooth freq w boxes
The second plot command is identical to the first, except for the +1 after binstart. This will shift this box 1 unit to the right. This produces
Here, the two series are clear. Keeping track of which box is associated with each is easy because of the overlap, but it is not enough to mask the other series.
We can even move them next to each other, with no overlap, by subtracting 1 from the first plot command:
plot datafile1 u (binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart-1):(1) smooth freq w boxes, \
datafile2 u (binwidth*(floor(($2-binstart)/binwidth)+0.5)+binstart+1):(1) smooth freq w boxes
producing
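The arithmetic behind the two variants can be checked directly. The following is a short Python sketch that computes the x-extent of one box per series for a single 4-unit bin with boxwidth 2, first with only the second series shifted by +1, then with shifts of -1 and +1. The bin centre 90 and the helper names are illustrative assumptions.

```python
def box_interval(bin_center, shift, boxwidth):
    """Return the (left, right) x-extent of a box centred at bin_center + shift."""
    center = bin_center + shift
    return center - boxwidth / 2, center + boxwidth / 2

def overlap(a, b):
    """Length of the overlap between two intervals (0 if they only touch)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

bin_center, boxwidth = 90.0, 2.0  # one 4-unit-wide bin, boxes 2 units wide

# First variant: only the second series is shifted, by +1.
partial = overlap(box_interval(bin_center, 0, boxwidth),
                  box_interval(bin_center, +1, boxwidth))

# Second variant: the series are shifted by -1 and +1.
side_by_side = overlap(box_interval(bin_center, -1, boxwidth),
                       box_interval(bin_center, +1, boxwidth))

print(partial, side_by_side)
```

In the first variant the boxes overlap by 1 unit; in the second they touch at the bin centre and together fill the 4-unit bin exactly.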
This first answer is distinct from my second. This answer addresses what the OP was trying to accomplish, whereas the second addresses the underlying problem the OP was trying to overcome.
Gnuplot isn't going to be able to do this on its own, as the relevant styles (boxes and histograms) only work in 2D. You would have to do it using an external program.
For example, using your data and your 2d command (your first command), we get (using your data and the linked values of -100 and 4 for binstart and binwidth)
To draw these boxes on the 3d grid, we will need to use the lines style and have four points for each box: lower left, upper left, upper right, and lower right. We can use the previous command and capture the output to a table, but this will only give the upper center point. We can use an external program to pre-process the data, however. The following Python program, makehist.py, does just that.
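from sys import argv
import re
from math import floor

# Usage: makehist.py FILE BINSTART BINWIDTH
pat = re.compile(r"\s+")
fname = argv[1]
binstart = float(argv[2])
binwidth = float(argv[3])

# Read the whitespace-separated columns, skipping the header line.
with open(fname, "r") as f:
    data = [tuple(map(float, pat.split(x.strip()))) for x in f.readlines()[1:]]

# Count the values (last column) per bin, keyed by bin center.
counts = {}
for x in data:
    bn = binwidth*(floor((x[-1]-binstart)/binwidth)+0.5)+binstart
    if bn not in counts:
        counts[bn] = 0
    counts[bn] += 1

# Four corners per box: lower left, upper left, upper right, lower right.
for x in sorted(counts.keys()):
    count = counts[x]
    print(x-binwidth/2, 0)
    print(x-binwidth/2, count)
    print(x+binwidth/2, count)
    print(x+binwidth/2, 0)

# Two more points draw a line along the bottom of all the boxes.
print(max(counts.keys())+binwidth/2, 0)
print(min(counts.keys())-binwidth/2, 0)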
Essentially, this program does the same thing as the smooth frequency option does, but instead of getting the upper center of each box, we get the four previously mentioned points along with two points to draw a line along the bottom of all the boxes.
Running the following command,
plot "< makehist.py data.txt -100 4" u 1:2 with lines
produces
which looks very similar to the original graph. We can use this in a 3d plot
splot "< makehist.py data.txt -100 4" u (1):1:2 with lines
which produces
This isn't all that pretty, but does lay the histogram out on a 3d plot. The same technique can be used to add multiple data files onto it spread out. For example, with the additional data
Index Angle
0 85.0804
1 92.2482
2 90.0384
3 99.2974
4 87.729
5 94.6049
6 86.703
7 97.9413
We can use
splot "< makehist.py data.txt -100 4" u (1):1:2 with lines, \
"< makehist.py data2.txt -100 4" u (2):1:2 with lines
to produce

Colour-coded 3D Plot in R

I am new to R, so can someone please help with this?
I have a data frame with 4 columns: x, y, z and freq. One row in this frame represents one point in 3D space (x, y, z are the x-, y- and z-coordinates respectively) and its frequency. I want to plot these points and colour them so that the color is decided by the frequency. For example: all points with frequency 0 are blue, between 1 and 5 are red, between 5 and 10 are orange, between 10 and 15 are yellow, and so on. Some points can also have a frequency of 0, and I don't know the range of the frequency in advance. The maximum number of colors to be used is 10. Also, there should be a scale explaining the meaning of the colors.
I have been trying to correct the following code and make it work, but it's just not working:
lev <- levels(factor(t$freq));
n <- as.numeric(lev);
n <- n+1;
plot3d(t$x,t$z,t$z,col=n);
Please help! Thank you.
PS- Please tell the solution using rgl package
PPS - I have been trying to manipulate the col argument of the plot3d function in the rgl package, but I am unable to get the desired result.
I would load package rgl and do
plot3d(x,y,z, col=colors)
That means that you should prepare a vector of color values that is the same length as the x, y, z vectors, so that a color is selected for each (x, y, z) point.
The other part is building that vector. I would try
givecolor = function(freq){
  if(freq < 1) "red"
  else if ....
}
colors = sapply(mydataframe[,"freq"], givecolor)
You just need to build a vector of colors that is the same length as the number of points you are plotting. You then pass this vector as the col argument to the rgl plot3d() function. See this page for a demonstration that uses the iris dataset: http://planspace.org/2013/02/03/pca-3d-visualization-and-clustering-in-r/
First you should select a palette you like and pick the number of colors you want, e.g. palette=rainbow(10). Then use a factor you get from splitting your data 10 ways to set your color from the palette.
See 3d scatterplot in R using rgl plot3d - different size for each data point? for how to effectively split a dataframe by a newly created factor. That question is w.r.t. size, but it also works for color.
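The binning logic itself is independent of rgl. The following Python sketch shows one way to turn frequencies into at most 10 colors when the range is not known in advance: frequency 0 gets its own color, and the remaining values go into equal-width bins that are at least 5 wide and widen automatically if the maximum would need more than 9 bins. The palette, the function name colour_bins and the decision to put a boundary value such as 5 into the lower bin are all assumptions made for illustration.

```python
from math import ceil

# Hypothetical palette of 10 color names; any 10 colors would do.
PALETTE = ["blue", "red", "orange", "yellow", "green",
           "cyan", "purple", "magenta", "brown", "black"]

def colour_bins(freqs, palette=PALETTE, min_width=5):
    """Map each frequency to a color: 0 -> palette[0], then equal-width bins."""
    n_bins = len(palette) - 1  # bins left over for freq > 0
    # Widen the bins if the largest frequency would not fit otherwise.
    width = max(min_width, ceil(max(freqs) / n_bins))
    colours = []
    for f in freqs:
        idx = 0 if f == 0 else min(n_bins, ceil(f / width))
        colours.append(palette[idx])
    return colours, width

freqs = [0, 3, 5, 7, 12, 40]
colours, width = colour_bins(freqs)
print(width, colours)
```

The resulting vector has one entry per point, so the same idea carried over to R would give a vector that can be passed as col to plot3d(), and the bin edges (multiples of width) give the labels for a legend.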

How to generate medoid plots

Hi, I am using the partitioning around medoids algorithm for clustering, via the pam function in the cluster package. I have 4 attributes in the dataset that I clustered, and they seem to give me around 6 clusters. I want to generate a plot of these clusters across those 4 attributes like this [1]: http://www.flickr.com/photos/52099123@N06/7036003411/in/photostream/lightbox/ "Centroid plot"
But the only way I can draw the clustering result is either using a dendrogram or using
plot (data, col = result$clustering) command which seems to generate a plot similar to this
[2]: http://www.flickr.com/photos/52099123@N06/7036003777/in/photostream "pam results".
Although the first image is a centroid plot, I am wondering whether there are any tools available in R to do the same with a medoid plot. Note that it also prints the size of each cluster in the plot. It would be great to know whether any packages or solutions in R facilitate this or, if not, what would be a good starting point for achieving plots similar to the one in image 1.
Thanks
Hi all, I was trying to work through the problem the way joran suggested, but I think I did not understand it correctly and have not done it the way it is supposed to be done. Anyway, this is what I have done so far. This is what the file I tried to cluster looks like:
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155
The following is the pam clustering output:
GRMZM2G181227 1
GRMZM2G146885 2
GRMZM2G139463 2
GRMZM2G015295 2
GRMZM2G111909 2
GRMZM2G078097 3
GRMZM2G450498 3
GRMZM2G413652 2
GRMZM2G090087 2
AC217811.3_FG003 2
Using the above two files I generated a third file that somewhat looks like this and has cluster information in the form of cluster type K1,K2,etc
geneID RPKM-base RPKM-1cm RPKM+4cm RPKMtip Cluster_type
GRMZM2G181227 3.412444267 3.16437442 1.287909035 0.037320722 K1
GRMZM2G146885 14.17287135 11.3577013 2.778514642 2.226818648 K2
GRMZM2G139463 6.866752401 5.373925806 1.388843962 1.062745344 K2
GRMZM2G015295 1349.446347 447.4635291 29.43627879 29.2643755 K2
GRMZM2G111909 47.95903081 27.5256729 1.656555758 0.949824883 K2
GRMZM2G078097 4.433627458 0.928492841 0.063329249 0.034255945 K3
GRMZM2G450498 36.15941083 9.45235616 0.700105077 0.194759794 K3
GRMZM2G413652 25.06985426 15.91342458 5.372151214 3.618914949 K2
GRMZM2G090087 21.00891969 18.02318412 17.49531186 10.74302155 K2
I certainly don't think that this is the file that joran would have wanted me to create but I could not think of anything else thus I ran lattice on the above file using the following code.
clusres<- read.table("clusinput.txt",header=TRUE,sep="\t");
jpeg(filename = "clusplot.jpeg", width = 800, height = 1078,
pointsize = 12, quality = 100, bg = "white",res=100);
parallel(~clusres[2:5]|Cluster_type,clusres,horizontal.axis=FALSE);
dev.off();
and I get a picture like this
Since I want a single line representing the whole cluster at the four different points, this output is wrong. Moreover, I tried playing with lattice, but I cannot figure out how to make it accept the RPKM values as the X coordinate. It always seems to plot many lines against a maximum or minimum value on the Y axis, and I don't understand what that is.
It would be great if anybody could help me out. Sorry if my question still seems absurd to you.
I do not know of any pre-built functions that generate the plot you indicate, which looks to me like a sort of parallel coordinates plot.
But generating such a plot would be a fairly trivial exercise.
Add a column of cluster labels (K1,K2, etc.) to your original data set, based on your clustering algorithm's output.
Use one of the many, many tools in R for aggregating data (plyr, aggregate, etc.) to calculate the relevant summary statistics by cluster on each of the four variables. (You haven't said what the first graph is actually plotting. Mean and sd? Median and MAD?)
Since you want the plots split into six separate panels, or facets, you will probably want to plot the data using either ggplot or lattice, both of which provide excellent support for creating the same plot, split across a single grouping vector (i.e. the clusters in your case).
But that's about as specific as anyone can get, given that you've provided so little information (i.e. no minimal runnable example, as recommended here).
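The labelling and aggregation steps above can be sketched independently of any plotting package. The following Python example uses a few rows from the question's data with their cluster labels and computes one mean profile and the size per cluster. Using the mean is an assumption, since the question does not say which summary statistic the first graph shows; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (cluster label, RPKM-base, RPKM-1cm, RPKM+4cm, RPKMtip): a few rows from the question.
rows = [
    ("K1", 3.412444267, 3.16437442, 1.287909035, 0.037320722),
    ("K3", 4.433627458, 0.928492841, 0.063329249, 0.034255945),
    ("K3", 36.15941083, 9.45235616, 0.700105077, 0.194759794),
    ("K2", 14.17287135, 11.3577013, 2.778514642, 2.226818648),
    ("K2", 6.866752401, 5.373925806, 1.388843962, 1.062745344),
]

# Group the value columns by cluster label.
by_cluster = defaultdict(list)
for label, *values in rows:
    by_cluster[label].append(values)

# One summary profile per cluster: the mean of each of the four variables.
profiles = {
    label: [mean(col) for col in zip(*members)]
    for label, members in by_cluster.items()
}
# Cluster sizes, which the target plot also displays.
sizes = {label: len(members) for label, members in by_cluster.items()}

for label in sorted(profiles):
    print(label, sizes[label], profiles[label])
```

Each profile is a single line across the four variables, which is what would be drawn once per panel.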
How about using clusplot from package cluster with partitioning around medoids? Here is a simple example (from the example section):
require(cluster)
# generate 25 objects, divided into 2 clusters
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
clusplot(pam(x, 2))  # `pam` does the partitioning
