grouping of axis labels ggplot2 - r

I am trying to build a plot with ggplot2 in which the x-axis carries a label for each group of variables. Here is a minimal version of my code:
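library(ggplot2)
library(reshape2) # assumed source of melt(); the original post does not show its library() calls
Bzero <-100*matrix(runif(100),ncol=10,nrow=10)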
B <-99
LNtype <-c(1,1,1,1,2,2,2,3,3,3)
LNnames <-c('grp1','grp2','grp3')
tB <-t(Bzero)/(B+1)
dfB <-data.frame(tB)
dfB$grp <-LNtype
dfB$vid <-1:nrow(tB)
mB0 <- melt(dfB,id.vars=c('grp','vid'))
mB0 <- mB0[order(mB0$grp,mB0$vid),]
gg0 <- ggplot(mB0,aes(x=vid,y=variable))
gg0 <- gg0 + geom_tile(aes(fill = value),colour = "white")
gg0 <- gg0 + scale_fill_gradient(low = "green", high = "red",na.value='white',limits=c(0,1),name='p0i')
gg0 <- gg0 + xlab('Equation')+ylab('Covariate')
Here's the resulting plot:
And here is what I'd like to have:
I have been tinkering with the scale, breaks, and labels to no avail. Even a massive amount of googling did not reveal any plot with that kind of axis. Is there any way to get what I want?

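You can replace the numbers with group names using scale_x_continuous() and setting breaks at the desired positions. Line segments (geom_segment(), or annotate("segment", ...) for fixed positions) add the black lines that group the data.
# annotate() draws fixed segments; vectors inside aes() must match the data length in current ggplot2
gg0 +
  annotate("segment", x = 0.5, xend = 10.5, y = 0.5, yend = 0.5) +
  annotate("segment", x = c(0.5, 4.5, 7.5, 10.5), xend = c(0.5, 4.5, 7.5, 10.5),
           y = 0.5, yend = 1) +
  scale_x_continuous("", breaks = c(2.5, 6, 9), labels = c("Group1", "Group2", "Group3"))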
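The break and boundary positions above are typed in by hand. A minimal sketch, assuming the groups are contiguous runs in LNtype as in the question, of deriving them with base R's rle() (the names runs, ends, starts, mids and bounds are only illustrative):

```r
LNtype <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)

# Lengths of consecutive runs of the same group id
runs <- rle(LNtype)
ends <- cumsum(runs$lengths)         # last column of each group
starts <- ends - runs$lengths + 1    # first column of each group

mids <- (starts + ends) / 2          # label positions for scale_x_continuous(breaks = ...)
bounds <- c(0.5, ends + 0.5)         # x positions of the group separators
```

For the question's data these should reproduce the hand-typed values, so mids can be passed as breaks and bounds as the x and xend of the vertical segments.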

Related

How to change the legend from geom_area to color in geom_line [duplicate]

I have a graph of wind speeds against direction which has a huge number of points, and so am using alpha=I(1/30) in addition to color=month.
Here is a sample of code:
library(RMySQL)
library(ggplot2)
con <- dbConnect(...)
wind <- dbGetQuery(con, "SELECT speed_w/speed_e AS ratio, dir_58 as dir, MONTHNAME(timestamp) AS month, ROUND((speed_w+speed_e)/2) AS speed FROM tablename;");
png("ratio-by-speed.png",height=400,width=1200)
qplot(wind$dir,wind$ratio,ylim=c(0.5,1.5),xlim=c(0,360),color=wind$month,alpha=I(1/30),main="West/East against direction")
dev.off()
This produces a decent graph, however my issue is that the alpha of the legend is 1/30th also, which makes it unreadable. Is there a way I can force the legend to be 1 alpha instead?
Here is an example:
Update: With the release of version 0.9.0, one can now override aesthetic values in the legend using override.aes in the guides function. So if you add something like this to your plot:
+ guides(colour = guide_legend(override.aes = list(alpha = 1)))
that should do it.
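For completeness, here is a small self-contained sketch of that approach. It uses the diamonds dataset that ships with ggplot2 rather than the wind data from the question, and assumes a ggplot2 version of 0.9.0 or later:

```r
library(ggplot2)

# Points are drawn almost transparent, but the legend keys are forced to full opacity
p <- ggplot(diamonds, aes(depth, price, colour = clarity)) +
  geom_point(alpha = 1/10) +
  guides(colour = guide_legend(override.aes = list(alpha = 1)))

print(p)
```

Only the legend keys are affected; the alpha used for the plotted points stays at 1/10.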
I've gotten around this by doing a duplicate call to the geom using an empty subset of the data and using the legend from that call. Unfortunately, it doesn't work if the data frame is actually empty (e.g. as you'd get from subset(diamonds,FALSE)) since ggplot2 seems to treat this case the same as it treats NULL in place of a data frame. But we can get the same effect by taking a subset with only one row and setting it to NaN on one of the plot dimensions, which will prevent it from getting plotted.
Based off Chase's example:
# Alpha parameter washes out legend:
gp <- ggplot() + geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1)
print(gp)
# Full color legend:
dummyData <- diamonds[1, ]
dummyData$price <- NaN
#dummyData <- subset(diamonds, FALSE) # this would be nicer but it doesn't work!
gp <- ggplot() +
  # show.legend replaces the older legend argument in current ggplot2
  geom_point(data=diamonds, aes(depth, price, colour=clarity), alpha=0.1, show.legend=FALSE) +
  geom_point(data=dummyData, aes(depth, price, colour=clarity), alpha=1.0, na.rm=TRUE)
print(gp)
A bit of googling turned up this post which doesn't seem to indicate that ggplot currently supports this option. Others have addressed related problems by using gridExtra and using viewPorts as discussed here.
I'm not that sophisticated, but here's one approach that should give you the desired results. The approach is to plot the geom twice, once without an alpha parameter and outside of the real plotting area. The second geom will include the alpha parameter and suppress the legend. We will then specify the plotting region with xlim and ylim. Given that you have a lot of points, this will roughly double the plotting time, but should give you the effect you are after.
Using the diamonds dataset:
#Alpha parameter washes out legend
ggplot(data = diamonds, aes(depth, price, colour = clarity)) +
geom_point(alpha = 1/10)
#Fully colored legend
ggplot() +
  # show.legend replaces the older legend argument in current ggplot2
  geom_point(data = diamonds, aes(depth, price, colour = clarity), alpha = 1/10, show.legend = FALSE) +
  geom_point(data = diamonds, aes(x = depth - 999999, y = price - 999999, colour = clarity)) +
  xlim(40, 80) + ylim(0, 20000)

How to fill histogram with color gradient?

I have a simple problem. How can I plot a histogram with ggplot2 with a fixed binwidth and filled with rainbow colors (or any other palette)?
Let's say I have data like this:
myData <- abs(rnorm(1000))
I want to plot a histogram using, e.g., binwidth=.1. That, however, gives a different number of bins depending on the data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew the number of bins (e.g. n=15) I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But with a changing number of bins I'm kind of stuck on this simple problem.
If you really want the number of bins flexible, here is my little workaround:
library(ggplot2)
gg_b <- ggplot_build(
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
)
nu_bins <- dim(gg_b$data[[1]])[1]
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill = rainbow(nu_bins))
In case the binwidth is fixed, here is an alternative solution which uses the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround, but it avoids calling geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
As a third alternative, the aggregation can be done outside of ggplot2. Then, geom_col() can be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
myData2 <- cut(sort(myData), breaks = breaks)
ggplot() + geom_col(aes(x = head(breaks, -1L),
y = as.integer(table(myData2)),
fill = levels(myData2))) +
ylab("count") + xlab("myData")
Note that breaks is plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise each factor label would be plotted which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().
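Another option that avoids counting bins up front, offered as a sketch rather than as part of the answers above, is to map fill to the bin position computed by stat_bin. This assumes a ggplot2 version that provides after_stat() (3.3.0 or later); the rainbow(7) palette and the hidden legend are arbitrary choices:

```r
library(ggplot2)

set.seed(1L)
myData <- abs(rnorm(1000))

# after_stat(x) is the bin centre computed by stat_bin, so every bin
# gets its own position on a continuous colour gradient
p <- ggplot() +
  geom_histogram(aes(x = myData, fill = after_stat(x)), binwidth = 0.1) +
  scale_fill_gradientn(colours = rainbow(7), guide = "none")

print(p)
```

Because the colours come from a continuous scale, the plot adapts to however many bins the chosen binwidth produces.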

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from roughly 0.5 to 27.5, while in the other plot the data only ranges from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
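To see where the offset between the two plots comes from, the extent of the bars can be read from the built plot. This is a small sketch using ggplot_build(); bars and bar_range are illustrative names. With the default bar width (90% of the data resolution) the bars should reach beyond the outermost x values, ending slightly inside 0.5 and 27.5, while the points run only from 1 to 27:

```r
library(ggplot2)

other_data <- data.frame(x = 1:27, y = 50:76)
no <- ggplot(other_data, aes(x = x, y = y)) + geom_bar(stat = "identity")

# ggplot_build() exposes the computed rectangle edges of each bar
bars <- ggplot_build(no)$data[[1]]
bar_range <- range(c(bars$xmin, bars$xmax))
print(bar_range)
```

Any common limits that contain this range, such as c(0, 28), give both panels the same x-axis.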
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())

geom_histogram() versus hist(): controlling ranges in geom

Consider the following two plots
library(ggplot2)
set.seed(666)
bigx <- data.frame(x=sample(1:12,50,replace=TRUE))
ggplot(bigx, aes(x=x)) +
  geom_histogram(fill = "red", colour = "black", stat="bin", binwidth=2) +
  ylab("Frequency") +
  xlab("things") +
  ylim(c(0,30))
hist(bigx$x)
Why do I get the overhang above 12 on ggplot? When I play with right = TRUE, this just shifts the overhang to below zero. I want the simple and simply bounded result from hist(), but using ggplot2.
How can I do this?
If your goal is to reproduce the output of hist(...) using ggplot, this will work:
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
binwidth=2, right=TRUE) +
scale_x_continuous(limits=c(0,12),breaks=seq(0,12,2))
Or, more generally, this:
brks <- hist(bigx$x, plot=F)$breaks
ggplot(bigx, aes(x=x)) +
geom_histogram(fill = "red", colour = "black",stat="bin",
breaks=brks, right=TRUE) +
scale_x_continuous(limits=range(brks),breaks=brks)
Evidently, the ggplot default for histograms here is to use left-closed intervals (which is why right=TRUE is set above), whereas the default for hist(...) is right-closed intervals. Also, ggplot uses a different algorithm for calculating the x-axis breaks and limits.
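In more recent ggplot2 releases the right argument has, as far as I know, been replaced by closed ("right" or "left"), so treat the following as a sketch for newer versions rather than as the answer's original code. It reuses the breaks from hist() and puts the bin counts of the two approaches side by side; h, counts_gg and counts_hist are illustrative names:

```r
library(ggplot2)

set.seed(666)
bigx <- data.frame(x = sample(1:12, 50, replace = TRUE))

# Let hist() choose the breaks, without drawing anything
h <- hist(bigx$x, plot = FALSE)
brks <- h$breaks

p <- ggplot(bigx, aes(x = x)) +
  geom_histogram(breaks = brks, closed = "right",
                 fill = "red", colour = "black") +
  scale_x_continuous(breaks = brks)

# Bin counts as computed by ggplot2 and by hist()
counts_gg <- ggplot_build(p)$data[[1]]$count
counts_hist <- h$counts
print(rbind(counts_gg, counts_hist))
```

Passing the breaks explicitly means the bins no longer depend on ggplot2's own choice of origin, which is what produced the overhang in the question.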

How to adjust the ordering of labels in the default legend in ggplot2 so that it corresponds to the order in the data

I am plotting a forest plot in ggplot2 and am having issues with the ordering of the labels in the legend matching the order of the labels in the data set. Here is my code below.
data code
d<-data.frame(x=c("Co-K(W) N=720", "IH-K(W) N=67", "IF-K(W) N=198", "CO-K(B)N=78", "IH-K(B) N=13", "CO=A(W) N=874","D-Sco Ad(W) N=346","DR-Ad (W) N=892","CE_A(W) N=274","CO-Ad(B) N=66","D-So Ad(B) N=215","DR-Ad(B) N=123","CE-Ad(B) N=79"),
y = rnorm(13, 0, 0.1))
d <- transform(d, ylo = y-1/13, yhi=y+1/13)
d$x <- factor(d$x, levels=rev(d$x)) # reverse ordering
forest plot code
credplot.gg <- function(d){
# d is a data frame with 4 columns
# d$x gives variable names
# d$y gives center point
# d$ylo gives lower limits
# d$yhi gives upper limits
require(ggplot2)
p <- ggplot(d, aes(x=x, y=y, ymin=ylo, ymax=yhi, group=x, colour=x)) +
  geom_pointrange(size=1) +
  theme_bw() +
  scale_color_discrete(name="Sample") +
  coord_flip() +
  theme(legend.key=element_rect(fill='cornsilk2')) +
  guides(colour = guide_legend(override.aes = list(size=0.5))) +
  # reference line at y = 0 (drawn vertically after coord_flip())
  geom_hline(yintercept=0, colour = 'red', lty=2) +
  xlab('Cohort') + ylab('CI') + ggtitle('Forest Plot')
return(p)
}
credplot.gg(d)
This is what I get. As you can see, the labels on the y-axis match the order of the labels in the data. However, the legend is not in the same order, and I'm not sure how to correct this. This is my first time creating a plot in ggplot2. Any feedback is much appreciated. Thanks in advance.
Nice plot, especially for a first ggplot! I've not tested it, but I think all you need is to add reverse=TRUE inside your colour's guide_legend() (found this in the Cookbook for R).
If I were to make one more comment, I'd say that ordering your vertical factor by numeric value often makes comparisons easier when alphabetical order isn't particularly meaningful. (Though maybe your alpha order is meaningful.)
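The reverse=TRUE suggestion can be sketched as follows. This is a reduced stand-in for the question's plot, with three made-up rows labelled A, B and C instead of the cohort data, so the numbers are illustrative only:

```r
library(ggplot2)

# Small stand-in for the question's data (labels and values are made up)
d <- data.frame(x = c("A", "B", "C"), y = c(0.1, -0.05, 0.02))
d <- transform(d, ylo = y - 1/13, yhi = y + 1/13)
# Reverse the levels so that "A" ends up on top after coord_flip()
d$x <- factor(d$x, levels = rev(as.character(d$x)))

p <- ggplot(d, aes(x = x, y = y, ymin = ylo, ymax = yhi, colour = x)) +
  geom_pointrange() +
  coord_flip() +
  # reverse = TRUE flips the legend order so it reads A, B, C like the axis
  guides(colour = guide_legend(reverse = TRUE))

print(p)
```

In the question's function, the same argument would go into the existing guide_legend() call next to override.aes.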
