I want to create a discrete legend (not continuous) in pheatmap. So for this code,
m=as.matrix(c(1:100))
breaks=c(1,5,10,50,80,100)
color=c("red","blue","green","yellow","orange")
pheatmap(m,cluster_rows=FALSE, cluster_cols=FALSE,breaks=breaks,color=color)
the legend looks like this:
But I want it to look like this where the size of each rectangle is the same:
Can you point me to the options in pheatmap that will make this possible? I cannot figure it out. Thank you muchly.
Well, the function itself really does not want to accommodate such a legend. There is no way to pass in any combination of arguments to make it discrete as far as I can tell and all the plotting functions it relies on seem to be locked so you can't really adjust their behavior.
But, the good news is that the function uses grid graphics to make the output. We can hack at the grid objects left on the grid tree to remove the legend they drew and draw our own. I've created a function to do this.
changeLegend <- function(breaks, color) {
    tree <- grid.ls(viewports = TRUE, print = FALSE)
    # find the legend viewport (assumed to be the last one pheatmap created)
    legendvp <- tail(grep("GRID.VP", tree$name), 1)
    # get rid of the grobs in this viewport
    drop <- tree$name[grepl(tree$vpPath[legendvp], tree$vpPath) &
        grepl("grob", tree$type)]
    sapply(drop, grid.remove)
    # calculate size/position of labels and boxes (evenly spaced)
    legend_pos <- seq(0, to = 1, length.out = length(breaks))
    brat <- seq(0, to = 1, length.out = length(breaks))
    h <- 1 / (length(breaks) - 1)
    # render the new legend
    seekViewport(tree$name[legendvp])
    grid.rect(x = 0, y = brat[-length(brat)],
        width = unit(10, "bigpts"), height = h, hjust = 0,
        vjust = 0, gp = gpar(fill = color, col = "#FFFFFF00"))
    grid.text(breaks, x = unit(12, "bigpts"),
        y = legend_pos, hjust = 0)
}
Since they didn't really name any of their viewports, I had to make some guesses about which viewport contained which objects. I'm assuming the legend will always be the last viewport and that it will contain two grobs, one for the box of color and one for the text in the legend. I remove those items, and then re-draw a new legend using the breaks and colors passed in. Here's how you would use that function with your sample data:
library(pheatmap)
library(grid)
mm <- as.matrix(c(1:100))
breaks <- c(1,5,10,50,80,100)
colors <- c("red","blue","green","yellow","orange")
pp <- pheatmap(mm, cluster_rows = FALSE, cluster_cols = FALSE,
    breaks = breaks, color = colors, legend = TRUE)
changeLegend(breaks, colors)
And that produces
Because we are hacking at undocumented grid objects, this might not be the most robust method, but it shows off how flexible grid graphics are
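The equal-height layout that changeLegend() draws can be checked on its own, without pheatmap. Here is a minimal sketch using only grid; the helper name legend_geometry is made up for illustration and is not part of pheatmap or grid. It just computes where the boxes and labels go, and draws them on a blank page when run interactively.

```r
library(grid)

# Hypothetical helper: geometry of a discrete legend with equal-height boxes.
# n breaks delimit n - 1 colour boxes; the labels sit on the box edges.
legend_geometry <- function(breaks) {
  n_boxes <- length(breaks) - 1
  list(
    box_y   = (seq_len(n_boxes) - 1) / n_boxes,       # bottom edge of each box (npc)
    box_h   = 1 / n_boxes,                             # common box height (npc)
    label_y = seq(0, 1, length.out = length(breaks))   # one label per break
  )
}

breaks <- c(1, 5, 10, 50, 80, 100)
colors <- c("red", "blue", "green", "yellow", "orange")
geo <- legend_geometry(breaks)

# Draw the legend on its own page when run interactively.
if (interactive()) {
  grid.newpage()
  pushViewport(viewport(width = unit(1, "inches"), height = 0.8))
  grid.rect(x = 0, y = geo$box_y, width = unit(10, "bigpts"),
            height = geo$box_h, hjust = 0, vjust = 0,
            gp = gpar(fill = colors, col = NA))
  grid.text(breaks, x = unit(12, "bigpts"), y = geo$label_y, hjust = 0)
  popViewport()
}
```

Every box gets the same height no matter how unevenly the break values are spaced, which is the difference from the continuous legend pheatmap draws by default.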
I'm using tabplot package to visualize my data set. How is it possible to change the color of the space between the barcharts in tableplot? In the following graph the color of the spaces is white, how can we change it into another color? I marked the spaces with arrows in the attached plot
library(ggplot2)
library(tabplot)
data("diamonds")
tableplot(diamonds)
This question is not easy to answer, but I can share some ideas.
The problem is that tabplot objects are neither ggplot objects nor base plot objects.
So we have to work with the viewport and grid.layout parameters of the graph. I'm not used to that kind of thing, but my solution is:
library(ggplot2)
library(tabplot)
library(grid)
data("diamonds")
tplot <- tableplot(diamonds, plot = FALSE)
grid.rect(gp=gpar(fill="red",col="black", alpha = 1))
plot(tplot, vp = viewport(width = 1, height = 1))
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
neither of which is what I want.
Update:
Following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the actual bin margins used so that they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
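The update to the question asks for bins that line up with a chosen set of breaks. In base graphics, one way to get that is to pass the log10 of the desired edges as the breaks argument, so that the ticks and the bin margins coincide. This is only a sketch: the helper log_hist and the edges vector are made up for illustration, the edges have to be strictly positive, and values outside the edges are dropped because hist() refuses breaks that do not span the data.

```r
# Hypothetical helper: histogram on a log10 scale whose bins match `edges`.
# `edges` must be strictly positive, because log10(0) is -Inf.
log_hist <- function(x, edges) {
  inside <- x >= min(edges) & x <= max(edges)   # hist() needs breaks that span the data
  hist(log10(x[inside]), breaks = log10(edges), plot = FALSE)
}

set.seed(1)
x <- rlnorm(200, sdlog = 2)
edges <- c(0.001, 0.01, 0.1, 1, 10, 100, 1000)
h <- log_hist(x, edges)

# Draw it with one tick per bin edge, labelled on the original scale.
if (interactive()) {
  plot(h, axes = FALSE, main = "Histogram of x", xlab = "x")
  Axis(side = 2)
  Axis(side = 1, at = log10(edges), labels = edges)
}
```

The counts in h$counts are per bin on the original scale, so unevenly spaced edges (for example 1, 1.1, 1.2, ..., 2, 4, 8) can be used the same way, with the caveat that bar areas are then no longer comparable.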
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
breaks <- c(0, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 10000)
major <- c(0.1, 1, 10, 100, 1000, 10000)
H <- hist(log10(x), plot = FALSE)
plot(H$mids, H$counts, type = "n",
    xaxt = "n",
    xlab = "X", ylab = "Counts",
    main = "Histogram of X",
    bg = "lightgrey"
)
abline(v = log10(breaks), col = "lightgrey", lty = 2)
abline(v = log10(major), col = "lightgrey")
abline(h = pretty(H$counts), col = "lightgrey")
plot(H, add = TRUE, freq = TRUE, col = "blue")
# Position of ticks
at <- log10(breaks)
# Creation of the X axis
axis(1, at = at, labels = 10^at)
This is as close as I can get to the ggplot2 look. Making the background grey is not that straightforward, but it is doable if you draw a rectangle the size of your plot region and fill it with grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
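For the grey background, a sketch of the rectangle idea could look like the following. The helper name fill_plot_region is made up; it relies on par("usr"), which returns the limits of the current plot region as c(x1, x2, y1, y2). It assumes linear axes (with a log axis, par("usr") is reported in log10 units and would need converting).

```r
# Hypothetical helper: fill the plot region of the current base-graphics
# plot with a background colour. par("usr") is c(x1, x2, y1, y2).
fill_plot_region <- function(col = "lightgrey") {
  usr <- par("usr")
  rect(usr[1], usr[3], usr[2], usr[4], col = col, border = NA)
  invisible(usr)
}

if (interactive()) {
  x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
  H <- hist(log10(x), plot = FALSE)
  plot(H$mids, H$counts, type = "n", xlab = "log10(X)", ylab = "Counts")
  fill_plot_region()                            # grey panel first
  abline(h = pretty(H$counts), col = "white")   # grid lines over the panel
  plot(H, add = TRUE, col = "blue")             # bars on top
}
```

The order matters: the rectangle has to be drawn after the empty plot is set up but before the grid lines and bars, otherwise it covers them.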
A dynamic graph would also help with this plot. Use the manipulate package from RStudio to make a histogram with a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:
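Since manipulate only works inside RStudio, it may help to see what the two sliders actually select. In this sketch the data vector is made-up sample values and range_counts is a hypothetical helper; note that the slider values index positions in the table of counts, not the data values themselves.

```r
# Made-up sample values standing in for the `data` vector in the answer.
data <- c(1, 1, 2, 2, 2, 3, 5, 5, 8, 40, 40, 900)
data_dist <- table(data)   # counts per distinct value, sorted by value

# Hypothetical helper: the slice of the table that the two sliders select.
range_counts <- function(dist, from, to) {
  stopifnot(from >= 1, to <= length(dist), from <= to)
  dist[from:to]
}

zoomed <- range_counts(data_dist, 1, 4)   # the four smallest distinct values

if (interactive()) barplot(zoomed)
```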
I've got a polar plot which uses geom_smooth(). The smoothed loess line though is very small and rings around the center of the plot. I'd like to "zoom in" so you can see it better.
Using something like scale_y_continuous(limits = c(-.05,.7)) will make the geom_smooth ring bigger, but it will also alter it because it will recompute with the datapoints limited by the limits = c(-.05,.7) argument.
For a Cartesian plot I could use something like coord_cartesian(ylim = c(-.05,.7)) which would clip the chart but not the underlying data. However I can see no way to do this with coord_polar()
I thought there might be a way to do this with grid.clip() in the grid package, but I am not having much luck. Any ideas?
What my plot looks like now, note "higher" red line:
What I'd like to draw:
What I get when I use scale_y_continuous(). Note the "higher" blue line; also, it's still not that big.
I haven't figured out a way to do this directly in coord_polar, but this can be achieved by modifying the ggplot_build object under the hood.
First, here's an attempt to make a plot like yours, using the fake data provided at the bottom of this answer.
library(ggplot2)
plot <- ggplot(data, aes(theta, values, color = series, group = series)) +
geom_smooth() +
scale_x_continuous(breaks = 30*-6:6, limits = c(-180,180)) +
coord_polar(start = pi, clip = "on") # use "off" to extend plot beyond axes
plot
Here, my Y (or r for radius) axis ranges from about -2.4 to 4.3.
We can confirm this by looking at the associated ggplot_build object:
# Create ggplot_build object and look at radius range
plot_build <- ggplot_build(plot)
plot_build[["layout"]][["panel_params"]][[1]][["r.range"]]
# [1] -2.385000 4.337039
If we redefine the range of r and plot that, we get what you're looking for, a close-up of the plot.
# Here we change the 2nd element (max) of r.range from 4.337 to 1
plot_build[["layout"]][["panel_params"]][[1]][["r.range"]][2] <- 1
plot2 <- ggplot_gtable(plot_build)
plot(plot2)
Note, this may not be a perfect solution, since this seems to introduce some image cropping issues that I don't know how to address. I haven't tested to see if those can be overcome using ggsave or perhaps by further modifying the ggplot_build object.
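The reason for editing the built plot instead of setting scale limits can be illustrated outside ggplot2. Scale limits drop out-of-range values before the smoother runs, so the curve is fitted to different data. The sketch below uses base loess() directly with made-up data (geom_smooth picks loess by default for small data sets, but this is an illustration of the effect, not a reproduction of what ggplot2 computes).

```r
set.seed(1)
full <- data.frame(theta = seq(-170, 170, length.out = 60),
                   values = rnorm(60))

# Roughly what limits = c(-0.05, 0.7) does before smoothing: drop the rest.
limited <- subset(full, values >= -0.05 & values <= 0.7)

fit_full    <- loess(values ~ theta, data = full)
fit_limited <- loess(values ~ theta, data = limited)

# Compare both curves at the same theta positions.
at <- data.frame(theta = limited$theta)
difference <- predict(fit_full, at) - predict(fit_limited, at)
max_shift <- max(abs(difference))
```

A non-zero max_shift means the two smoothers disagree, which is the "higher" line seen when limits are set on the scale. Changing r.range only changes how the already-computed curve is mapped to the panel.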
Sample data used above:
set.seed(4.2)
data <- data.frame(
series = as.factor(rep(c(1:2), each = 10)),
theta = rep(seq(from = -170, to = 170, length.out = 10), times = 2),
values = rnorm(20, mean = 0, sd = 1)
)
I use the gplots package to output barplots. I use it inside a for-loop; the rest of the code is omitted for clarity:
library("gplots")
pdf(file = "/Users/Tim/desktop/pgax.pdf", onefile = TRUE, paper = "special")
par(mfrow = c(4,2)) #figures arranged in 4 rows and 2 columns
par(las=2) #perpendicular labels on x-axis
barplot2(expression,ylab = expression(expression),main = graph.header, cex.names =0.85, beside = TRUE, offset = 0, xpd = FALSE,axis.lty = 0, cex.axis = 0.85, plot.ci = TRUE,ci.l = expression - sd.value, ci.u = expression + sd.value, col = colors,width = 1,names.arg = c(etc))
Now when I specify the paper size as A4 and print in two columns, the bars are drawn so that they fill the full space assigned. If I only have a few bars in each graph, the bars are too wide compared to their height. I know I should probably be using xlim and width, and perhaps even the aspect ratio, but I can't get the results I want. An inconvenient way is to specify the height and width of the paper and adjust them manually for the number of bars in each plot, but this doesn't seem appropriate. Does someone know a convenient way to fix the width of the bars in my plots?
All help is much appreciated!
Although bar plots with wide bars can look silly, a wide bar plot with a few narrow bars in it will likely look even sillier. That leaves you specifying the width of the plot (via the width argument to pdf).
It may be prettiest to keep all your plots the same size, in which case you just give width a fixed value. If you do want narrower plots when there are fewer bars, you need a line of code like
plot_width <- 3 + 0.5 * nlevels(x_variable)
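As a sketch of how that width could be fed to pdf(): the helper bar_pdf below is made up for illustration, and it uses base barplot() rather than barplot2() so that it runs without gplots. The constants 3 and 0.5 (in inches) are the ones from the line above and would need tuning for your margins and label sizes.

```r
# Hypothetical helper: open a PDF whose width grows with the number of bars.
bar_pdf <- function(counts, file, height = 4) {
  plot_width <- 3 + 0.5 * length(counts)   # one entry in `counts` per bar
  pdf(file = file, width = plot_width, height = height)
  on.exit(dev.off())                       # always close the device
  barplot(counts, las = 2)
  invisible(plot_width)
}

x_variable <- factor(c("a", "b", "b", "c"))
out <- tempfile(fileext = ".pdf")
w <- bar_pdf(table(x_variable), out)   # 3 levels, so 3 + 0.5 * 3 = 4.5 inches
```

This gives one plot per PDF page size, so it does not combine with par(mfrow = c(4,2)) on a fixed A4 page; there the panel width is fixed by the layout.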