Related
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this:
I want to create a discrete legend (not continuous) in pheatmap. So for this code,
m=as.matrix(c(1:100))
breaks=c(1,5,10,50,80,100)
color=c("red","blue","green","yellow","orange")
pheatmap(m,cluster_rows=FALSE, cluster_cols=FALSE,breaks=breaks,color=color)
the legend looks like this:
But I want it to look like this where the size of each rectangle is the same:
Can you point me to the options in pheatmap that will make this possible? I cannot figure it out. Thank you muchly,
Well, the function itself really does not want to accommodate such a legend. There is no way to pass in any combination of arguments to make it discrete as far as I can tell and all the plotting functions it relies on seem to be locked so you can't really adjust their behavior.
But, the good news is that the function uses grid graphics to make the output. We can hack at the grid objects left on the grid tree to remove the legend they drew and draw our own. I've created a function to do this.
changeLegend<-function(breaks, color) {
tree <- grid.ls(viewport=T, print=F)
#find legend
legendvp <- tail( grep("GRID.VP", tree$name), 1)
#get rid of grobs in this viewport
drop <- tree$name[grepl(tree$vpPath[legendvp],tree$vpPath) &
grepl("grob",tree$type)]
sapply(drop, grid.remove)
#calculate size/position of labels
legend_pos = seq(0,to=1,length.out=length(breaks))
brat = seq(0,to=1,length.out=length(breaks))
h = 1/(length(breaks)-1)
#render legend
seekViewport(tree$name[legendvp])
grid.rect(x = 0, y = brat[-length(brat)],
width = unit(10, "bigpts"), height = h, hjust = 0,
vjust = 0, gp = gpar(fill = color, col = "#FFFFFF00"))
grid.text(breaks, x = unit(12, "bigpts"),
y = legend_pos, hjust = 0,)
}
Since they didn't really name any of their viewports, I had to make some guesses about which viewport contained which objects. I'm assuming the legend will always be the last viewport and that it will contain two globs, one for the box of color and one for the text in the legend. I remove those items, and then re-draw a new legend using the breaks and colors passed in. Here's how you would use that function with your sample
library(pheatmap)
library(grid)
mm <- as.matrix(c(1:100))
breaks <- c(1,5,10,50,80,100)
colors <- c("red","blue","green","yellow","orange")
pp<-pheatmap(mm,cluster_rows=FALSE, cluster_cols=FALSE,
breaks=breaks, color=colors, legend=T)
changeLegend(breaks, colors)
And that produces
Because we are hacking at undocumented grid objects, this might not be the most robust method, but it shows off how flexible grid graphics are
how do I reduce the gap between two guides in one plot. In the example below, the two guides are from a color and size scale and I want to change the gap between the two so that the title 'size' is right below the legend-point for 1. Design-wise, it might not make sense in this example but in my actual application it does.
df=data.frame(x=rnorm(100),y=rnorm(100),color=factor(rbinom(100,1,0.5)),size=runif(100))
ggplot(df,aes(x=x,y=y,color=color,size=size)) + geom_point()
Edit: Here is the plot. I would like to make the gap highlighted by the green line and the arrow smaller.
Now it seems to be possible using the theme parameters:
ggplot(df,aes(x=x,y=y,color=color,size=size)) + geom_point() +
theme(legend.spacing.y = unit(-0.5, "cm"))
You can also try to decrease margins of the legends:
legend.margin = margin(-0.5,0,0,0, unit="cm")
or older
legend.margin=unit(0, "cm")
I tried to play to customize legend or guide parameters but I can't find a solution. I hope give a solution using ggplot2 settings.
Here 2 solutions based on the gtable and grid packages.
for the gtable solution, the code is inspired from this question.
library(gtable)
# Data transformation
data <- ggplot_build(p)
gtable <- ggplot_gtable(data)
# Determining index of legends table
lbox <- which(sapply(gtable$grobs, paste) == "gtable[guide-box]")
# changing the space between the 2 legends: here -0.5 lines
guide <- gtable$grobs[[lbox]]
gtable$grobs[[lbox]]$heights <- unit.c(guide$heights[1:2],
unit(-.5,'lines'), ## you can the GAP here
guide$heights[4:5])
# Plotting
grid.draw(gtable)
Similar using the grid package ( we redraw in the viewport of the legend)
pp <- grid.get('guide',grep=T)
depth <- downViewport(pp$wrapvp$name)
guide <- grid.get('guide',grep=T)
grid.rect(gp=gpar(fill='white'))
guide$heights <- unit.c(guide$heights[1:2],unit(-0.2,'lines'),guide$heights[4],unit(0.1,'lines'))
grid.draw(guide)
upViewport(depth)
I wanted to ask for any general idea about plotting this kind of plot in R which can compare for example the overlaps of different methods listed on the horizontal and vertical side of the plot? Any sample code or something
Many thanks
A ggplot2-example:
# data generation
df <- matrix(runif(25), nrow = 5)
# bring data to long format
require(reshape2)
dfm <- melt(df)
# plot
require(ggplot2)
ggplot(dfm, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = value)) +
geom_text(aes(label = round(value, 2)))
The corrplot package and corrplot function in that package will create plots similar to what you show above, that may do what you want or give you a starting point.
If you want more control then you could plot the colors using the image function, then use the text function to add the numbers. You can either create the margins large enough to place the text in the margins, see the axis function for the common way to add text labels in the margin. Or you could leave enough space internally (maybe use rasterImage instead of image) and use text to do the labelling. Look at the xpd argument to par if you want to add the lines and the grconvertX and grconvertY functions to help with the coordinates of the line segents.
I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log-scale for th histogram.
Yes, i know this means not all bins are of equal size
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000))) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)![alt text][3]
the only problem is that I'd like to match between the scale and the actual bars plotted. There two options for doing that : the one is simply use the actual margins of the plotted bars (how?) then get "ugly" x-axis labels like 1.1754,1.2985 etc. The other, which I prefer, is to control the actual bins margins used so they will match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to workaround above.
Using ggplot2 seems like the most easy option. If you want more control over your axes and your breaks, you can do something like the following :
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
A dynamic graph would also help in this plot. Use the manipulate package from Rstudio to do a dynamic ranged histogram:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use sliders to see the particular distribution in a dynamically selected range like this: