Histogram with spaces using continuous data (width=... doesn't work) - r

I'm trying to plot a histogram using ggplot which has some space between the bars.
This is no problem with discrete data:
b= data.frame(x=sample(LETTERS[1:3],size=50, replace=T))
ggplot(b, aes(x=x)) + geom_bar(width=.3)
However, using continuous data, width seems to have no effect.
a= data.frame(x=rnorm(100))
ggplot(a, aes(x=x, width=.5)) +
geom_bar(width=.3, binwidth=1)
How can a histogram with spaced bars be archived for continuous data?

I think doing this is a really bad idea (and ggplot2 doesn't support it).
Here is one possibility:
breaks <- pretty(range(a$x), n = 6, min.n = 1)
mids <- 0.5 * (breaks[-1L] + breaks[-length(breaks)])
ggplot(a, aes(x = cut(x, breaks = breaks, labels = mids))) +
geom_bar(width=.3)

Related

Weird behavior of ggplot combined with fill and scale_y_log10()

I'm trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient, and log10's them.
Here's the code:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=F)
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
df$cut <- cut(df$val,bins)
df$cut <- factor(df$cut,level=unique(df$cut))
Then,
ggplot(data=df,aes_string(x="val",y="..count..+1",fill="cut"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives:
whereas dropping the fill from the aesthetics:
ggplot(data=df,aes_string(x="val",y="..count..+1"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(cuts))+
scale_fill_manual(values=cut.cols,labels=levels(cuts))+
scale_y_log10()
gives:
Any idea why do the histogram bars differ between the two plots and to make the first one similar to the second one?
The OP is trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient...
The OP has already done the binning (with 10 bins) but is then calling geom_histogram() which does a binning on its own using 30 bins by default (see ?geomhistogram).
When geom_bar() is used instead together with cutinstead of val
ggplot(data = df, aes_string(x = "cut", y = "..count..+1", fill = "cut")) +
geom_bar(show.legend = FALSE) +
scale_color_manual(values = cut.cols, labels = levels(df$cut)) +
scale_fill_manual(values = cut.cols, labels = levels(df$cut)) +
scale_y_log10()
the chart becomes:
Using geom_histogram() with filled bars is less straightforward as can be seen in this and this answer to the question How to fill histogram with color gradient?

How to fill histogram with color gradient?

I have a simple problem. How to plot histogram with ggplot2 with fixed binwidth and filled with rainbow colors (or any other palette)?
Lets say I have a data like that:
myData <- abs(rnorm(1000))
I want to plot histogram, using e.g. binwidth=.1. That however will cause different number of bins, depending on data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew number of bins (e.g. n=15) I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But with changing number of bins I'm kind of stuck on this simple problem.
If you really want the number of bins flexible, here is my little workaround:
library(ggplot2)
gg_b <- ggplot_build(
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
)
nu_bins <- dim(gg_b$data[[1]])[1]
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill = rainbow(nu_bins))
In case the binwidth is fixed, here is an alternative solution which is using the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround but avoids to call geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
As a third alternative, the aggregation can be done outside of ggplot2. Then, geom_col() cam be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
myData2 <- cut(sort(myData), breaks = breaks, by = binwidth)
ggplot() + geom_col(aes(x = head(breaks, -1L),
y = as.integer(table(myData2)),
fill = levels(myData2))) +
ylab("count") + xlab("myData")
Note that breaks is plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise each factor label would be plotted which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending #Stibu's post a little: To align the plots, use gtable (Or see answers to your earlier question)
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit To change heights of plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axis do not align because in the bar plot, the geoms cover the x-axis from 0.5 to 27.5, while in the other plot, the data only ranges from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axex to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alginment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them alltogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())

Making adjustments to a forest plot using ggplot2

I'm trying to create a forest plot in R from meta-analysis results. However, I'm having difficulties adjusting the line thickness & the center points as well as getting rid of the automatic legend and creating my own legend.
#d is a data frame with 4 columns
#d$x gives variable names
#d$y gives center point
#d$ylo gives lower limits
#d$yhi gives upper limits
#data
d <- data.frame(x = toupper(letters[1:10]),
y = rnorm(10, 0, 0.1))
d <- transform(d, ylo = y-1/10, yhi=y+1/10)
d$x <- factor(d$x, levels=rev(d$x)) #Reverse ordering in the way that it's is in the
#function
credplot.gg <- function(d){
require(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi,group=x,colour=x))+
geom_pointrange()+ theme_bw()+ coord_flip()+
guides(color=guide_legend(title="Cohort"))+
geom_hline(aes(x=0),colour = 'red', lty=1)+
xlab('Cohort') + ylab('Beta') + ggtitle('rs6467890_CACNA2D1')
return(p)
}
credplot.gg(d)
The issues that I'm having are:
when insert "size" into ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi, group=x,colour=x), size=1.5) the line and points are extremely large
How do I get rid of the legend that is automatically generated with the plot and how do I create my own legend?
I'm fairly new to r so and any help is gladly appreciated

Align multiple ggplot2 plots with grid

Context
I want to plot two ggplot2 on the same page with the same legend. http://code.google.com/p/gridextra/wiki/arrangeGrob discribes, how to do this. This already looks good. But... In my example I have two plots with the same x-axis and different y-axis. When the range of the the y-axis is at least 10 times higher than of the other plot (e.g. 10000 instead of 1000), ggplot2 (or grid?) does not align the plots correct (see Output below).
Question
How do I also align the left side of the plot, using two different y-axis?
Example Code
x = c(1, 2)
y = c(10, 1000)
data1 = data.frame(x,y)
p1 <- ggplot(data1) + aes(x=x, y=y, colour=x) + geom_line()
y = c(10, 10000)
data2 = data.frame(x,y)
p2 <- ggplot(data2) + aes(x=x, y=y, colour=x) + geom_line()
# Source: http://code.google.com/p/gridextra/wiki/arrangeGrob
leg <- ggplotGrob(p1 + opts(keep="legend_box"))
legend=gTree(children=gList(leg), cl="legendGrob")
widthDetails.legendGrob <- function(x) unit(3, "cm")
grid.arrange(
p1 + opts(legend.position="none"),
p2 + opts(legend.position="none"),
legend=legend, main ="", left = "")
Output
A cleaner way of doing the same thing but in a more generic way is by using the formatter arg:
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(formatter = function(x) format(x, width = 5))
Do the same for your second plot and make sure to set the width >= the widest number you expect across both plots.
1. Using cowplot package:
library(cowplot)
plot_grid(p1, p2, ncol=1, align="v")
2. Using tracks from ggbio package:
Note: There seems to be a bug, x ticks do not align. (tested on 17/03/2016, ggbio_1.18.5)
library(ggbio)
tracks(data1=p1,data2=p2)
If you don't mind a shameless kludge, just add an extra character to the longest label in p1, like this:
p1 <- ggplot(data1) +
aes(x=x, y=y, colour=x) +
geom_line() +
scale_y_continuous(breaks = seq(200, 1000, 200),
labels = c(seq(200, 800, 200), " 1000"))
I have two underlying questions, which I hope you'll forgive if you have your reasons:
1) Why not use the same y axis on both? I feel like that's a more straight-forward approach, and easily achieved in your above example by adding scale_y_continuous(limits = c(0, 10000)) to p1.
2) Is the functionality provided by facet_wrap not adequate here? It's hard to know what your data structure is actually like, but here's a toy example of how I'd do this:
library(ggplot2)
# Maybe your dataset is like this
x <- data.frame(x = c(1, 2),
y1 = c(0, 1000),
y2 = c(0, 10000))
# Molten data makes a lot of things easier in ggplot
x.melt <- melt(x, id.var = "x", measure.var = c("y1", "y2"))
# Plot it - one page, two facets, identical axes (though you could change them),
# one legend
ggplot(x.melt, aes(x = x, y = value, color = x)) +
geom_line() +
facet_wrap( ~ variable, nrow = 2)

Resources