I think I have a tricky case. I'm plotting the evolution of plant disease levels over time using geom_raster: x and y are arbitrary field coordinates, z is the disease level measured at several time points, and I want each date plotted in a different facet.
So far, no problem. Below is a mock dataset and code:
library(ggplot2)
data <- data.frame(month=factor(rep(c("march","april","may","june"), each=100), levels=c("march","april","may","june")),
x=rep(rep(1:10, each=10), 4),
y=rep(rep(1:10, 10), 4),
z=c(rnorm(100, 0.5, 1), rnorm(100, 3, 1.5), rnorm(100, 6, 2), rnorm(100, 9, 1)))
ggplot(data, aes(x=x, y=y, fill=z)) +
geom_raster(color="white") +
scale_fill_gradient2(low="white", high="red", midpoint=mean(range(data$z))) +
scale_x_discrete(limit=1:10, expand = c(0, 0)) +
scale_y_discrete(limit=1:10, expand = c(0, 0)) +
coord_equal() +
facet_wrap(~month)
But what I'd really like, is to have each facet rotated at a certain angle (for example 15°), to reflect the fact that my field is not oriented perfectly according to north (i.e., the top is not North, and bottom is not South).
Is there a way to do this automatically in ggplot2, or with any grid-related tools? Even an automatic way to save individual facets to images, rotate them, and print the rotated images on a new page would be enough for my needs. Here's an example of the image I would like to obtain (facets rotated 15° in an image editor):
http://imgur.com/RYJ3EaR
Here's a way to rotate the facets independently. We create a list containing a separate rotated plot for each level of month, and then use grid.arrange to lay out the four plots together. I've also removed the legend from the individual plots and plotted the legend separately. The code below includes a helper function to extract the legend.
I extract the legend object into the global environment within the lapply function below (not to mention repeating the extraction multiple times). There's probably a better way, but this way was quick.
library(grid) # For editGrob and viewport
library(gridExtra)
# Helper function to extract the legend from a ggplot
# Source: http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Create a list containing a rotated plot for each level of month
pl = lapply(unique(data$month), function(m) {
# Create a plot for the current level of month
p1 = ggplot(data[data$month==m,], aes(x=x, y=y, fill=z)) +
geom_raster(color="white") +
scale_fill_gradient2(low="white", high="red",
limits=c(floor(min(data$z)), ceiling(max(data$z)))) +
scale_x_discrete(limit=1:10, expand = c(0, 0)) +
scale_y_discrete(limit=1:10, expand = c(0, 0)) +
coord_equal() +
facet_wrap(~month)
# Extract legend into global environment
leg <<- g_legend(p1)
# Remove legend from plot
p1 = p1 + guides(fill="none")
# Return rotated plot
editGrob(ggplotGrob(p1), vp=viewport(angle=-20, width=unit(0.85,"npc"),
height=unit(0.85,"npc")))
})
# Lay out the rotated plots and the legend and save to a png file
png("rotated.png", 1100, 1000)
grid.arrange(do.call(arrangeGrob, c(pl, ncol=2)),
leg, ncol=2, widths=c(0.9,0.1))
dev.off()
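The rotation step on its own can be isolated into a small helper. This is only a sketch: rotate_plot and its shrink argument are names made up for illustration, and it assumes that shrinking the viewport to 85% leaves enough room for the rotated corners. In grid, a positive angle rotates anticlockwise.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: wrap a ggplot in a rotated, slightly shrunken viewport
rotate_plot <- function(p, angle = 15, shrink = 0.85) {
  editGrob(
    ggplotGrob(p),
    vp = viewport(angle = angle,
                  width = unit(shrink, "npc"),
                  height = unit(shrink, "npc"))
  )
}

# Small regular grid so geom_raster has something to draw
demo_data <- expand.grid(x = 1:3, y = 1:3)
demo_data$z <- seq_len(nrow(demo_data))

demo_plot <- ggplot(demo_data, aes(x, y, fill = z)) +
  geom_raster() +
  coord_equal()

rotated <- rotate_plot(demo_plot, angle = 15)

grid.newpage()
grid.draw(rotated)
```

The returned object is still a grob, so it can be passed to grid.arrange or arrangeGrob like the elements of pl above.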
I have a rather long timeseries that I want to plot in ggplot, but it's sufficiently long that even using the full width of the page it's barely readable.
What I want to do instead is to divide the plot into 2 (or more, in the general case) panels one on top of each other.
I could do it manually, but that is not only cumbersome, it also makes it hard to give the axes the same scale. Ideally I would like to have something like this:
ggplot(data, aes(time, y)) +
geom_line() +
facet_time(time, n = 2)
And then get something like this:
(This plot was made using facet_wrap(~(year(as.Date(time)) > 2000), ncol = 1, scales = "free_x"), which messes up the x-axis scale, only works for 2 panels, and doesn't work well with geom_smooth().)
Also, ideally it would also handle summary statistics correctly. For example, using the correct data for geom_smooth() (so facetting wouldn't do it, because at the beginning of every facet it would not use the data in the last chunk of the previous one).
Is there a way to do this?
Thank you!
Below I create two separate plots, one for the period 1982-1999 and one for 1999-2016 and then lay them out using grid.arrange from the gridExtra package. The horizontal axes are scaled equivalently in both plots.
I also generate regression lines outside of ggplot using the loess function so that it can be added using geom_line (you can of course use any regression function here, such as lm, gam, splines, etc). With this approach the regression can be run on the entire time series, ensuring continuity of the regression line across the two panels, even though we break the time series into two halves for plotting.
library(dplyr) # For the chaining (%>%) operator
library(purrr) # For the map function
library(gridExtra) # For the grid.arrange function
Function to extract a legend from a ggplot. We'll use this to get one legend across two separate plots.
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Fake data
set.seed(255)
dat = data.frame(time=rep(seq(1982,2016,length.out=500),2),
value= c(arima.sim(list(ar=c(0.4, 0.05, 0.5)), n=500),
arima.sim(list(ar=c(0.3, -0.3, 0.6)), n=500)),
group=rep(c("A","B"), each=500))
Generate smoother lines using loess: We want a separate regression line for each level of group, so we use group_by with the chaining operator from dplyr:
dat = dat %>% group_by(group) %>%
mutate(smooth = predict(loess(value ~ time, span=0.1)))
Create a list of two plots, one for each time period: We use map to create separate plots for each time period and return a list with the two plot objects as elements (you can also use base lapply for this instead of map):
pl = map(list(c(1982,1999), c(1999,2016)),
~ ggplot(dat %>% filter(time >= .x[1], time <= .x[2]),
aes(colour=group)) +
geom_line(aes(time, value), alpha=0.5) +
geom_line(aes(time, smooth), size=1) +
scale_x_continuous(breaks=1982:2016, expand=c(0.01,0)) +
scale_y_continuous(limits=range(dat$value)) +
theme_bw() +
labs(x="", y="", colour="") +
theme(strip.background=element_blank(),
strip.text=element_blank(),
axis.title=element_blank()))
# Extract legend as a separate graphics object
leg = g_legend(pl[[1]])
Finally, we lay out both plots (after removing legends) plus the extracted legend:
grid.arrange(arrangeGrob(grobs=map(pl, function(p) p + guides(colour="none")), ncol=1),
leg, ncol=2, widths=c(10,1), left="Value", bottom="Year")
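For the general case of n panels, the list of time windows passed to map could be generated rather than typed by hand. A minimal sketch in base R, where time_windows is a hypothetical helper (not part of any package) that splits the observed range into n equal-width windows:

```r
# Hypothetical helper: split the range of x into n equal-width windows,
# returned as a list of c(lower, upper) pairs
time_windows <- function(x, n) {
  breaks <- seq(min(x), max(x), length.out = n + 1)
  lapply(seq_len(n), function(i) c(breaks[i], breaks[i + 1]))
}

# With the 1982-2016 range used above and n = 2, this reproduces
# the two windows that were written out manually
windows <- time_windows(c(1982, 2016), n = 2)
```

The result can replace list(c(1982,1999), c(1999,2016)) in the map call. Adjacent windows share their boundary value, just as in the hand-written version.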
You can do this by storing the plot object and then printing it twice, each time adding a coord_cartesian call:
orig_plot <- ggplot(data, aes(time, y)) +
geom_line()
early <- orig_plot + coord_cartesian(xlim = c(1982, 2000))
late <- orig_plot + coord_cartesian(xlim = c(2000, 2016))
That makes sure that both plots use all the data.
To plot them on the same page, use grid (I got this from the ggplot2 book, which is probably around as a pdf somewhere):
library(grid)
vp1 <- viewport(width = 1, height = .5, just = c("center", "bottom"))
vp2 <- viewport(width = 1, height = .5, just = c("center", "top"))
print(early, vp = vp1)
print(late, vp = vp2)
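To see why coord_cartesian matters here, compare it with setting limits on the scale. The sketch below uses made-up toy data and a hypothetical count_x helper; it counts how many x values survive in the built layer data. With coord_cartesian all of them remain, while scale limits censor the values outside the range (under the default out-of-bounds handling).

```r
library(ggplot2)

set.seed(42)
toy <- data.frame(time = seq(1982, 2016, length.out = 200))
toy$y <- cumsum(rnorm(200))

base_plot <- ggplot(toy, aes(time, y)) + geom_line()

# Zooms the view; the data are left untouched
zoomed <- base_plot + coord_cartesian(xlim = c(1982, 2000))
# Sets scale limits; values outside the limits are turned into NA
trimmed <- base_plot + scale_x_continuous(limits = c(1982, 2000))

# Hypothetical helper: number of non-missing x values in the first layer
count_x <- function(p) sum(!is.na(layer_data(p, 1)$x))

n_zoomed <- count_x(zoomed)
n_trimmed <- count_x(trimmed)
```

Any statistic computed by a layer, such as geom_smooth, sees the same data that these counts reflect, which is why the zoomed version keeps smoothers continuous across panels.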
I would like to show, in the same plot, interpolated data and a histogram of the raw data for each predictor. In other threads, like this one, people explain how to add marginal histograms of the same data shown in a scatter plot; in my case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side. I hereby attempted to count them towards the lower and upper end of the plotted range. I also attempted to plot the sample size in numbers somewhere on the plot. However, I failed with both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it in the far right panel but it was difficult (/impossible for me) to adjust. I tried to put it on the main graph (which would anyway probably not be the best solution), but it didn’t work either (it removed the histogram and legend, on the right side and it did not plot all geom_texts). I also tried to add a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
guides(fill="none"),
theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
The code is as follows:
set.seed(123)
d1=data.frame(x=runif(10),y=runif(10),z=runif(10,1,10))
d2=data.frame(x=runif(10),y=runif(10),z=runif(10,100,1000))
ggplot()+geom_point(aes(x,y,size=z),data=d1)+
geom_line(aes(x,y,size=z),data=d2)
And the result is like this:
The points are too small, so I want to change their size with scale_size. However, it seems that both lines and points are affected. So I wonder if there is a way to scale lines and points separately, each with its own legend?
The two ways I can think of are 1) combining two legend grobs or 2) hacking another legend aesthetic. Both of these were mentioned by @Mike Wise in the comments above.
Approach #1: combining 2 separate legends in the same plot using grobs.
I used code from this answer to grab the legend. Baptiste's arrangeGrob vignette is a useful reference.
library(grid); library(gridExtra)
#Function to extract legend grob
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
#Create plots
p1 <- ggplot()+ geom_point(aes(x,y,size=z),data=d1) + scale_size(name = "point")
p2 <- ggplot()+ geom_line(aes(x,y,size=z),data=d2) + scale_size(name = "line")
p3 <- ggplot()+ geom_line(aes(x,y,size=z),data=d2) +
geom_point(aes(x,y, size=z * 100),data=d1) # Combined plot
legend1 <- g_legend(p1)
legend2 <- g_legend(p2)
legend.width <- sum(legend2$width)
gplot <- grid.arrange(p3 +theme(legend.position = "none"), legend1, legend2,
ncol = 2, nrow = 2,
layout_matrix = rbind(c(1,2 ),
c(1,3 )),
widths = unit.c(unit(1, "npc") - legend.width, legend.width))
grid.draw(gplot)
Note for printing: use arrangeGrob() instead of grid.arrange(). I had to use png; grid.draw; dev.off to save the (arrangeGrob) plot.
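The saving pattern from the note can be sketched as follows. The plots and the output path are made up for illustration, and pdf() is used only so the sketch does not depend on PNG support being available; swap in png() as described in the note.

```r
library(ggplot2)
library(grid)
library(gridExtra)

# Two throwaway plots standing in for the real ones
p_left  <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p_right <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# arrangeGrob() builds the layout without drawing it
combined <- arrangeGrob(p_left, p_right, ncol = 2)

# Hypothetical output path: open a device, draw the grob, close the device
out_file <- file.path(tempdir(), "combined.pdf")
pdf(out_file, width = 8, height = 4)
grid.draw(combined)
dev.off()
```

Because arrangeGrob does not draw anything itself, the layout is rendered exactly once, inside the file device.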
Approach #2: hacking another aesthetic legend.
MilanoR has a great post on this, focusing on colour instead of size.
More SO examples: 1) discrete colour and 2) colour gradient.
#Create discrete levels for point sizes (because points will be mapped to fill)
d1$z.bin <- findInterval(d1$z, c(0,2,4,6,8,10), all.inside= TRUE) #Create bins
#Scale the points to the same size as the lines (points * 100).
#Map points to a dummy aesthetic (fill)
#Hack the fill properties.
ggplot()+ geom_line(aes(x,y,size=z),data=d2) +
geom_point(aes(x,y, size=z * 100, fill = as.character(z.bin)),data=d1) +
scale_size("line", range = c(1,5)) +
scale_fill_manual("points", values = rep(1, 10) ,
guide = guide_legend(override.aes =
list(colour = "black",
size = sort(unique(d1$z.bin)) )))
I'm a noob in programming, but you could try this method. As you can see, my code uses points and paths. I define a vector with one entry per path; my lines all have size 1. Then I append the sizes of my points to the end of that vector.
size_vec<-c(rep(1, length(unique(data$Satellite))), 1.4, 4.6, 4.2, 5.5)
plot <- ggplot(data) +
geom_point(aes(x = x_cor, y = y_cor, shape=Type, size=Type)) +
geom_path(aes(x = x_cor, y = y_cor, group = Tour, size=factor(Satellite))) +
scale_size_manual(values = size_vec, guide ='none')
When I compile the following MWE I observe that the maximum point (3,5) is significantly cut/cropped by the margins.
The following example is drastically reduced for simplicity.
In my actual data, the following are all affected when I limit coord_cartesian manually and the corresponding x value sits at the maximum of the x range:
Point symbol
Error bars
Statistical symbols inserted by text annotation
MWE
library(ggplot2)
library("grid")
print("Program started")
n = c(0.1,2, 3, 5)
s = c(0,1, 2, 3)
df = data.frame(n, s)
gg <- ggplot(df, aes(x=s, y=n))
gg <- gg + geom_point(position=position_dodge(width=NULL), size = 1.5)
gg <- gg + geom_line(position=position_dodge(width=NULL))
gg <- gg + coord_cartesian( ylim = c(0, 5), xlim = c((-0.05)*3, 3));
print(gg)
print("Program complete - a graph should be visible.")
To show my data appropriately I would consider using any of the following that are possible (influenced by the observation that the x-axis labels themselves are never cut):
Make the margin transparent so the point isn't cut
unless the point is cut by the plot area and not the margin
Bring the panel with the plot area to the front
unless the point is cut by the plot area and not the margin so order is independent
Use xlim = c((-0.05)*3, (3*0.05)) to extend the axis range but implement some hack to not show the overhanging axis bar after the maximum point of 3?
this is how I had it originally but I was told to remove the overhang after the 3 as it was unacceptable.
Is this what you mean by option 1:
gg <- ggplot(df, aes(x=s, y=n)) +
geom_point(position=position_dodge(width=NULL), size = 3) +
geom_line(position=position_dodge(width=NULL)) +
coord_cartesian(xlim=c(0,3), ylim=c(0,5))
# Turn off clipping, so that point at (3,5) is not clipped by the panel grob
gg1 <- ggplot_gtable(ggplot_build(gg))
gg1$layout$clip[gg1$layout$name=="panel"] <- "off"
grid.draw(gg1)
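Assuming ggplot2 3.0.0 or later, the same effect should be available without editing the gtable, because coord_cartesian accepts a clip argument. A sketch with the data from the question (df_clip and gg_clip are names chosen here):

```r
library(ggplot2)

df_clip <- data.frame(s = c(0, 1, 2, 3), n = c(0.1, 2, 3, 5))

# clip = "off" lets geoms draw outside the panel area
gg_clip <- ggplot(df_clip, aes(x = s, y = n)) +
  geom_point(size = 3) +
  geom_line() +
  coord_cartesian(xlim = c(0, 3), ylim = c(0, 5), clip = "off")

print(gg_clip)
```

With clipping off, anything drawn beyond the panel can overlap the axes or margins, so it is worth checking the output for stray elements.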
I would like two separate plots. I am using them in different frames of a beamer presentation, and I will add one line to the other (eventually, not in the example below). Thus I do not want the presentation to "skip" ("jump"?) from one slide to the next. I would like it to look like the line is being added naturally. I believe the code below shows the problem. It is subtle, but note how the plot area of the second plot is slightly larger than that of the first plot. This happens because of the y-axis label.
library(ggplot2)
dfr1 <- data.frame(
time = 1:10,
value = runif(10)
)
dfr2 <- data.frame(
time = 1:10,
value = runif(10, 1000, 1001)
)
p1 <- ggplot(dfr1, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2))
print(p1)
dev.new()
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() + scale_y_continuous(breaks = NULL) + scale_x_continuous(breaks = NULL) + ylab(".")
print(p2)
I would prefer to not have a hackish solution such as setting the size of the axis label manually or adding spaces on the x-axis (see one reference below), because I will use this technique in several settings and the labels can change at any time (I like reproducibility so want a flexible solution).
I've searched a lot and have found the following:
Specifying ggplot2 panel width
How can I make consistent-width plots in ggplot (with legends)?
https://groups.google.com/forum/#!topic/ggplot2/2MNoYtX8EEY
How can I add variable size y-axis labels in R with ggplot2 without changing the plot width?
They do not work for me, mainly because I need separate plots, so it is not a matter of aligning them vertically on one combined plot as in some of the above solutions.
I haven't tried it, but this might work:
gl <- lapply(list(p1,p2), ggplotGrob)
library(grid)
widths <- do.call(unit.pmax, lapply(gl, "[[", "widths"))
heights <- do.call(unit.pmax, lapply(gl, "[[", "heights"))
lg <- lapply(gl, function(g) {g$widths <- widths; g$heights <- heights; g})
grid.newpage()
grid.draw(lg[[1]])
grid.newpage()
grid.draw(lg[[2]])
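Since the plots are needed as separate files for the slides, the same idea could be combined with a file device. This is a sketch that mirrors the approach above, with made-up plots, labels and file names; pdf() is an assumption here, and any other device would follow the same pattern.

```r
library(ggplot2)
library(grid)

set.seed(1)
frame_a <- ggplot(data.frame(time = 1:10, value = runif(10)),
                  aes(time, value)) +
  geom_line() + ylab("a much longer axis label")
frame_b <- ggplot(data.frame(time = 1:10, value = runif(10, 1000, 1001)),
                  aes(time, value)) +
  geom_line() + ylab(".")

# Convert to gtables and take the element-wise maximum of widths and heights
gl <- lapply(list(frame_a, frame_b), ggplotGrob)
widths  <- do.call(unit.pmax, lapply(gl, `[[`, "widths"))
heights <- do.call(unit.pmax, lapply(gl, `[[`, "heights"))

# Hypothetical output paths, one file per beamer frame
out_files <- file.path(tempdir(), c("frame1.pdf", "frame2.pdf"))
for (i in seq_along(gl)) {
  gl[[i]]$widths  <- widths
  gl[[i]]$heights <- heights
  pdf(out_files[i], width = 5, height = 4)
  grid.draw(gl[[i]])
  dev.off()
}
```

Both files use the same page size and the same layout dimensions, so the panels should land in the same place when the slides are flipped.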
How about using this for p2:
p2 <- ggplot(dfr2, aes(time, value)) + geom_line() +
scale_y_continuous(breaks = NULL) +
scale_x_continuous(breaks = NULL) +
ylab(expression(hat(z)==hat(gamma)[1]*time+hat(gamma)[4]*time^2)) +
theme(axis.title.y=element_text(color=NA))
This has the same label as p1, but the color is NA so it doesn't display. You could also use color="white".