Dividing long time series in multiple panels with ggplot2 - r

I have a rather long timeseries that I want to plot in ggplot, but it's sufficiently long that even using the full width of the page it's barely readable.
What I want to do instead is to divide the plot into 2 (or more, in the general case) panels stacked one on top of the other.
I could do it manually, but that is cumbersome and makes it hard to give the axes the same scale. Ideally I would like to have something like this:
ggplot(data, aes(time, y)) +
geom_line() +
facet_time(time, n = 2)
And then get something like this:
(This plot was made using facet_wrap(~(year(as.Date(time)) > 2000), ncol = 1, scales = "free_x"), which messes up x axis scale, it works only for 2 panels, and doesn't work well with geom_smooth())
Also, ideally it would also handle summary statistics correctly. For example, using the correct data for geom_smooth() (so facetting wouldn't do it, because at the beginning of every facet it would not use the data in the last chunk of the previous one).
Is there a way to do this?
Thank you!

Below I create two separate plots, one for the period 1982-1999 and one for 1999-2016 and then lay them out using grid.arrange from the gridExtra package. The horizontal axes are scaled equivalently in both plots.
I also generate the regression lines outside of ggplot using the loess function so that they can be added using geom_line (you can of course use any regression function here, such as lm, gam, splines, etc). With this approach the regression can be run on the entire time series, ensuring continuity of the regression line across the two panels, even though we break the time series into two halves for plotting.
library(ggplot2) # For plotting
library(dplyr) # For the chaining (%>%) operator
library(purrr) # For the map function
library(gridExtra) # For the grid.arrange function
Function to extract a legend from a ggplot. We'll use this to get one legend across two separate plots.
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Fake data
set.seed(255)
dat = data.frame(time=rep(seq(1982,2016,length.out=500),2),
value= c(arima.sim(list(ar=c(0.4, 0.05, 0.5)), n=500),
arima.sim(list(ar=c(0.3, -0.3, 0.6)), n=500)),
group=rep(c("A","B"), each=500))
Generate smoother lines using loess: We want a separate regression line for each level of group, so we use group_by with the chaining operator from dplyr:
dat = dat %>% group_by(group) %>%
mutate(smooth = predict(loess(value ~ time, span=0.1)))
Create a list of two plots, one for each time period: We use map to create separate plots for each time period and return a list with the two plot objects as elements (you can also use base lapply for this instead of map):
pl = map(list(c(1982,1999), c(1999,2016)),
~ ggplot(dat %>% filter(time >= .x[1], time <= .x[2]),
aes(colour=group)) +
geom_line(aes(time, value), alpha=0.5) +
geom_line(aes(time, smooth), size=1) +
scale_x_continuous(breaks=1982:2016, expand=c(0.01,0)) +
scale_y_continuous(limits=range(dat$value)) +
theme_bw() +
labs(x="", y="", colour="") +
theme(strip.background=element_blank(),
strip.text=element_blank(),
axis.title=element_blank()))
# Extract legend as a separate graphics object
leg = g_legend(pl[[1]])
Finally, we lay out both plots (after removing legends) plus the extracted legend:
# guides(colour="none") replaces the deprecated guides(colour=FALSE)
grid.arrange(arrangeGrob(grobs=map(pl, function(p) p + guides(colour="none")), ncol=1),
leg, ncol=2, widths=c(10,1), left="Value", bottom="Year")
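The question asked for something like facet_time(time, n = 2). The two-panel idea above can be generalized to n panels. The following is only a minimal sketch, assuming equal-width time chunks are acceptable. time_panel is a hypothetical helper (it is not a ggplot2 function) and the simulated data is made up. The smoother is fitted once on the full series before the split, so it stays continuous across panel boundaries.

```r
# Hypothetical helper (not part of ggplot2): assign each time value to one of
# n equal-width chunks, returned as integer codes 1..n.
time_panel <- function(time, n = 2) {
  cut(as.numeric(time), breaks = n, labels = FALSE, include.lowest = TRUE)
}

# Made-up data for illustration
set.seed(1)
d <- data.frame(time = seq(1982, 2016, length.out = 500))
d$value <- as.numeric(arima.sim(list(ar = 0.5), n = 500))

# Fit the smoother on the whole series first, then split into panels
d$smooth <- predict(loess(value ~ time, data = d, span = 0.1))
d$panel <- time_panel(d$time, n = 3)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(time)) +
    geom_line(aes(y = value), alpha = 0.5) +
    geom_line(aes(y = smooth)) +
    # free_x lets each panel show its own chunk; the y scale stays shared
    facet_wrap(~panel, ncol = 1, scales = "free_x") +
    theme(strip.text = element_blank())
  print(p)
}
```

Because the chunks have equal width, the x axes of the panels cover nearly the same span, although the raw line is not drawn across the gap between the last point of one panel and the first point of the next.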

You can do this by storing the plot object, then printing it twice. Each time add an option coord_cartesian:
orig_plot <- ggplot(data, aes(time, y)) +
geom_line()
early <- orig_plot + coord_cartesian(xlim = c(1982, 2000))
late <- orig_plot + coord_cartesian(xlim = c(2000, 2016))
That makes sure that both plots use all the data.
To plot them on the same page, use grid (I got this from the ggplot2 book, which is probably around as a pdf somewhere):
library(grid)
vp1 <- viewport(width = 1, height = .5, just = c("center", "bottom"))
vp2 <- viewport(width = 1, height = .5, just = c("center", "top"))
print(early, vp = vp1)
print(late, vp = vp2)
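A self-contained sketch of the same approach, with geom_smooth added to show that zooming with coord_cartesian keeps the full data set available to the smoother. The data is made up, and the viewports are written with explicit x and y positions (an assumption made for readability, not the answer's exact code) so it is obvious which half of the page each one occupies.

```r
library(grid)

# Top half and bottom half of the page
vp_top <- viewport(x = 0.5, y = 1, width = 1, height = 0.5,
                   just = c("center", "top"))
vp_bottom <- viewport(x = 0.5, y = 0, width = 1, height = 0.5,
                      just = c("center", "bottom"))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(1)
  d <- data.frame(time = seq(1982, 2016, length.out = 300))
  d$y <- cumsum(rnorm(300))  # made-up random walk

  base_plot <- ggplot(d, aes(time, y)) +
    geom_line() +
    geom_smooth(method = "loess", formula = y ~ x, span = 0.2)

  # coord_cartesian zooms without dropping rows, so the smoother in each
  # panel is still computed from the whole series
  grid.newpage()
  print(base_plot + coord_cartesian(xlim = c(1982, 2000)), vp = vp_top)
  print(base_plot + coord_cartesian(xlim = c(2000, 2016)), vp = vp_bottom)
}
```

Setting limits on the x scale instead (for example with xlim()) would remove the out-of-range rows before the smoother is fitted, which is exactly what the question wanted to avoid.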


how to add a fitted distribution to a histogram

I am trying to add a fitted distribution to a histogram, but when I run the code below the density is drawn as just a straight line. How can I get a density curve?
hist(data$price)
lines(density(data$price), lwd = 2, col = "red")
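You are using the graphics function hist, which plots counts by default, so the density curve (whose values are tiny by comparison) ends up flat along the bottom of the plot. Use the MASS function truehist instead, which plots on the density scale:
MASS::truehist(data$price)
lines(density(data$price), lwd = 2, col = "red")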
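For comparison, the same effect can be had without MASS by asking hist for the density scale with freq = FALSE. A minimal sketch, using a made-up price vector as a stand-in for data$price (which was not provided in the question):

```r
set.seed(42)
price <- rlnorm(1000, meanlog = 8, sdlog = 0.5)  # made-up stand-in for data$price

# freq = FALSE puts the bars on the density scale (total bar area = 1),
# which is the same scale that density() returns
h <- hist(price, freq = FALSE, main = "Price", xlab = "price")
d <- density(price)
lines(d, lwd = 2, col = "red")
```

With the default freq = TRUE the bars are counts in the tens or hundreds while the density values are far below 1, which is why the curve looks like a flat line.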
@Chriss gave a good solution: it does produce a density curve on top of the histogram; however, it changes the y-axis so that you only see the density values (losing the count values).
Here is an alternate solution that will place the frequency counts on the left-side y-axis and add density as a right-side y-axis. Tweak code as needed for things like bins, color, etc. I'm using the mtcars data as an example since there was no code or data provided in the question to replicate. In addition to the two libraries used here (ggpubr and cowplot), you may need to use some ggplot functions to better customize these plot options.
Code for this solution was modified from https://www.datanovia.com/en/blog/ggplot-histogram-with-density-curve-in-r-using-secondary-y-axis/
# packages needed
library(ggpubr)
library(cowplot)
# load data (none provided in the original question)
data("mtcars")
# create histogram (I have 10 bins here, but you may need a different amount)
phist <- gghistogram(mtcars, x="hp", bins=10, fill="blue", ylab="Count (blue)") + ggtitle("Car Horsepower Histogram")
# create density plot, removing many plot elements
pdens <- ggdensity(mtcars, x="hp", col="red", size=2, alpha = 0, ylab="Density (red)") +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), position = "right") +
theme_half_open(11, rel_small = 1) +
rremove("x.axis")+
rremove("xlab") +
rremove("x.text") +
rremove("x.ticks") +
rremove("legend")
# overlay and display the plots
aligned_plots <- align_plots(phist, pdens, align="hv", axis="tblr")
ggdraw(aligned_plots[[1]]) + draw_plot(aligned_plots[[2]])
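If installing ggpubr and cowplot is not an option, a similar picture can be sketched in base graphics. This is an alternative technique, not the code from the linked article: the density is rescaled to count units (density × n × bin width) so that it can share the plotting region with the histogram, and a right-hand axis is labelled in the original density units.

```r
hp <- mtcars$hp                          # mtcars ships with base R
h <- hist(hp, breaks = 10, plot = FALSE) # compute the bins without drawing
d <- density(hp)

bin_width <- diff(h$breaks)[1]
to_count <- length(hp) * bin_width       # density * n * bin width = expected count

op <- par(mar = c(5, 4, 4, 4) + 0.1)     # leave room for the right-hand axis
plot(h, col = "lightblue", main = "Car Horsepower Histogram",
     xlab = "hp", ylab = "Count",
     ylim = c(0, max(h$counts, d$y * to_count)))
lines(d$x, d$y * to_count, col = "red", lwd = 2)

# Right-hand axis: tick positions in count units, labels in density units
ticks <- pretty(c(0, max(d$y)))
axis(4, at = ticks * to_count, labels = format(ticks))
mtext("Density", side = 4, line = 3)
par(op)
```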

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show in the same plot interpolated data and a histogram of the raw data of each predictor. I have seen in other threads like this one, people explain how to do marginal histograms of the same data shown in a scatter plot, in this case, the histogram is however based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of the sides of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kind of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side. I hereby attempted to count them towards the lower and upper end of the plotted range. I also attempted to plot the sample size in numbers somewhere on the plot. However, I failed with both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it in the far right panel but it was difficult (/impossible for me) to adjust. I tried to put it on the main graph (which would anyway probably not be the best solution), but it didn’t work either (it removed the histogram and legend, on the right side and it did not plot all geom_texts). I also tried to add a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
guides(fill="none"), # "none" replaces the deprecated fill=FALSE
theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
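For the two follow-up tasks in the question's EDIT (showing values outside the plotted range in a different colour, and printing the sample sizes), one possible sketch is below. clamp_with_flag is a hypothetical helper, the carat vector is made up so that the example runs without the diamonds data, and placing the text with annotate at the panel corner is just one option.

```r
# Hypothetical helper: clamp x to [lower, upper] and record which values
# had to be moved to the edge of the range
clamp_with_flag <- function(x, lower, upper) {
  data.frame(
    value = pmin(pmax(x, lower), upper),
    range = factor(ifelse(x < lower | x > upper, "outside", "within"),
                   levels = c("within", "outside"))
  )
}

set.seed(7)
carat <- rgamma(2000, shape = 2, rate = 2.5)  # made-up stand-in for diamonds$carat
cl <- clamp_with_flag(carat, lower = 1, upper = 4)

# Sample sizes below, within and above the plotted range
n <- c(below  = sum(carat < 1),
       within = sum(carat >= 1 & carat <= 4),
       above  = sum(carat > 4))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cl, aes(value, fill = range)) +
    geom_histogram(binwidth = 0.1, boundary = 1) +
    # x = Inf, y = Inf pins the text to the top-right corner of the panel
    annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
             label = paste0("N = ", n["within"], " (", n["below"],
                            " below, ", n["above"], " above)"))
  print(p)
}
```

Clamped values pile up in the first and last bins and are stacked in their own fill colour, so they remain distinguishable from values that genuinely fall in those bins.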

directlabels: using different positioning methods for different panels in the same plot

I have a two-panel lattice lineplot. I want to use the directlabels package to automatically label the lines in each plot. But I want to use a different positioning method for each plot. Specifically, I want to use the first.bumpup method for the first panel, and the last.bumpup method for the second panel. Here is a minimal example:
library(directlabels)
library(lattice)
myDF <- data.frame(
y = rep(1:4, 2),
x = rep(rep(1:2, 2), 2),
group = rep(c('a', 'b'), each = 2),
panel = rep(1:2, each = 4))
myPlot <- xyplot(y ~ x | panel, groups = group, data = myDF, type = 'l')
direct.label(
p = myPlot,
method = 'first.bumpup')
This code produces a plot in which labels appear on the left-hand side of each panel:
I want labels on the left-hand side of the left-hand panel (as in this example), but on the right-hand side of the right-hand panel. What is the simplest way to produce that sort of figure?
I've checked the advanced examples in the directlabels documentation, and they make me think that it may be possible to use different methods for different panels by creating a custom positioning method or a custom panel. But I cannot quite see how to do it.
I took a crack at this with ggplot2 (only because I know ggplot2 much better than I know lattice). Let me know what you think. Below are two approaches. The first actually doesn't use directlabels. The placement rule is relatively simple, so I just used geom_text for label placement. The second method does use directlabels, but is more complicated.
Place labels using geom_text
library(dplyr) # For chaining operator (%>%)
library(ggplot2)
library(cowplot) # For cowplot theme
ggplot(myDF, aes(x, y, colour=group)) +
geom_line() +
geom_text(data=myDF %>% group_by(panel) %>%
filter(ifelse(panel==1, x==min(x), x==max(x))),
aes(x + 0.07*(panel-mean(panel)), y, label=group)) +
facet_grid(~panel) +
scale_x_continuous(breaks=seq(1,2,0.2)) +
theme_cowplot() +
guides(colour="none")
In the code above, inside geom_text we keep only the lowest x value for the first panel and the highest x value in the second panel and then place the group labels at the y values that pair with the x values. The x + 0.07*(panel-mean(panel)) is just to move the labels slightly away from the ends of the lines.
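To make the filtering step concrete, here is the same selection written in base R, using the myDF from the question. The names x_min, x_max, keep and x_label are introduced only for this illustration.

```r
myDF <- data.frame(
  y = rep(1:4, 2),
  x = rep(rep(1:2, 2), 2),
  group = rep(c("a", "b"), each = 2),
  panel = rep(1:2, each = 4))

# Per-panel minimum and maximum of x, repeated for every row
x_min <- ave(myDF$x, myDF$panel, FUN = min)
x_max <- ave(myDF$x, myDF$panel, FUN = max)

# Panel 1 keeps the left-most points, panel 2 keeps the right-most points
keep <- ifelse(myDF$panel == 1, myDF$x == x_min, myDF$x == x_max)
labels <- myDF[keep, ]

# Nudge: panel 1 labels move left by 0.035, panel 2 labels move right by 0.035
labels$x_label <- labels$x + 0.07 * (labels$panel - mean(labels$panel))
print(labels)
```

The result has one row per line per panel: the labels for panel 1 sit at x = 0.965 and those for panel 2 at x = 2.035.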
Place labels using mapply and directlabels
Here's a more complicated method using directlabels. My approach was to plot each "facet" separately using mapply, so that I could use a different directlabels method for each panel, but then lay the two plots out together as if they were two facets of the same overall plot. If you like the result, maybe you can adapt it to a lattice plot if none of the ggplot2 versions meet your needs.
library(directlabels)
library(ggplot2)
library(gridExtra)
library(cowplot)
pl = mapply(function(pnl, m) {
# Create plot for each level of panel
p = ggplot(myDF[myDF$panel==pnl, ], aes(x, y, colour=group)) +
geom_line() +
facet_grid(~panel) +
scale_x_continuous(breaks=seq(1,2,0.2)) +
theme_cowplot()
# # Tweak margins of panel 1
# if(pnl==1) p = p + theme(plot.margin=unit(rep(0,4),"lines"))
# Remove y-axis title, labels and ticks for panel 2 and tweak margins
if(pnl==2) p = p + theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())
# Add directlabels with different method for each panel
direct.label(p, method=m)
},
pnl=unique(myDF$panel), m=c("first.bumpup", "last.bumpup"), SIMPLIFY=FALSE)
Because I removed the y-axis title, labels, and ticks in panel 2, that panel is wider than panel 1. plot_grid has an align argument that allows us to align the two plots so that they have the same width, but I don't know of a way to get rid of the space between the plots. grid.arrange will also lay out the plots, but we have to adjust the widths manually (you can adjust the widths manually with plot_grid as well).
# Lay out each panel using plot_grid from cowplot package
plot_grid(plotlist=pl, ncol=2, align="v")
# Lay out each panel using grid.arrange from gridExtra package
grid.arrange(grobs=pl, ncol=2, widths=c(10,9))
Here is an adaptation of eipi10's second solution that creates the desired effect in a lattice plot:
library(directlabels)
library(gridExtra)
library(lattice)
myDF <- data.frame(
y = rep(1:4, 2),
x = rep(rep(1:2, 2), 2),
group = rep(c('a', 'b'), each = 2),
panel = rep(1:2, each = 4))
plotFunction <- function(panelNumber, labelMethod) {
myPlot = xyplot(
y ~ x,
groups = group,
data = myDF[myDF$panel==panelNumber, ],
type = 'l')
direct.label(
p = myPlot,
method = labelMethod)
}
panelList = mapply(
FUN = plotFunction,
panelNumber = unique(myDF$panel),
labelMethod = c('first.bumpup', 'last.bumpup'),
SIMPLIFY = FALSE)
grid.arrange(grobs = panelList, ncol = 2)

Rotation of facets in ggplot2

I think I have a tricky case. I'm plotting the evolution of plant disease levels over time, using geom_raster: x and y are arbitrary field coordinates, and z is the disease level measured at several time points, and I want to have each date plotted in a different facet.
So far, no problem. Below is a mock dataset and code:
library(ggplot2)
data <- data.frame(month=factor(rep(c("march","april","may","june"), each=100), levels=c("march","april","may","june")),
x=rep(rep(1:10, each=10), 4),
y=rep(rep(1:10, 10), 4),
z=c(rnorm(100, 0.5, 1), rnorm(100, 3, 1.5), rnorm(100, 6, 2), rnorm(100, 9, 1)))
ggplot(data, aes(x=x, y=y, fill=z)) +
geom_raster(color="white") +
scale_fill_gradient2(low="white", high="red", midpoint=mean(range(data$z))) +
scale_x_discrete(limit=1:10, expand = c(0, 0)) +
scale_y_discrete(limit=1:10, expand = c(0, 0)) +
coord_equal() +
facet_wrap(~month)
But what I'd really like, is to have each facet rotated at a certain angle (for example 15°), to reflect the fact that my field is not oriented perfectly according to north (i.e., the top is not North, and bottom is not South).
Is there a possibility in ggplot2, or any grid-related tools, to do this automatically? Even an automatic way to save individual facets to images, rotate them, and print the rotated images on a new page would be enough for my needs. Here's an example of the image I would like to obtain (facets rotated 15° in an image editor):
http://imgur.com/RYJ3EaR
Here's a way to rotate the facets independently. We create a list containing a separate rotated plot for each level of month, and then use grid.arrange to lay out the four plots together. I've also removed the legend from the individual plots and plotted the legend separately. The code below includes a helper function to extract the legend.
I extract the legend object into the global environment within the lapply function below (not to mention repeating the extraction multiple times). There's probably a better way, but this way was quick.
library(grid) # For editGrob and viewport
library(gridExtra)
# Helper function to extract the legend from a ggplot
# Source: http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
legend
}
# Create a list containing a rotated plot for each level of month
pl = lapply(unique(data$month), function(m) {
# Create a plot for the current level of month
p1 = ggplot(data[data$month==m,], aes(x=x, y=y, fill=z)) +
geom_raster(color="white") +
scale_fill_gradient2(low="white", high="red",
limits=c(floor(min(data$z)), ceiling(max(data$z)))) +
scale_x_discrete(limit=1:10, expand = c(0, 0)) +
scale_y_discrete(limit=1:10, expand = c(0, 0)) +
coord_equal() +
facet_wrap(~month)
# Extract legend into global environment
leg <<- g_legend(p1)
# Remove legend from plot
p1 = p1 + guides(fill="none")
# Return rotated plot
editGrob(ggplotGrob(p1), vp=viewport(angle=-20, width=unit(0.85,"npc"),
height=unit(0.85,"npc")))
})
# Lay out the rotated plots and the legend and save to a png file
png("rotated.png", 1100, 1000)
grid.arrange(do.call(arrangeGrob, c(pl, ncol=2)),
leg, ncol=2, widths=c(0.9,0.1))
dev.off()
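The rotation in the code above comes entirely from the angle argument of grid's viewport. A stripped-down sketch of that mechanism, with a plain rectangle standing in for ggplotGrob(p1) (the 15 degree angle and the 0.7 npc size are arbitrary choices for illustration):

```r
library(grid)

grid.newpage()

# A viewport rotated by 15 degrees; positive angles rotate counter-clockwise,
# so the answer's angle = -20 rotates clockwise
vp <- viewport(angle = 15, width = unit(0.7, "npc"), height = unit(0.7, "npc"))

# Any grob can be drawn inside the rotated viewport by setting its vp slot
box <- editGrob(rectGrob(gp = gpar(fill = "grey90")), vp = vp)
grid.draw(box)
grid.draw(editGrob(textGrob("rotated 15 degrees"), vp = vp))
```

The viewport is made smaller than the page because the corners of a rotated rectangle would otherwise be clipped, which is also why the answer uses width and height of 0.85 npc.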

Date labels overlap when putting multiple ggplot plots on single page

I am trying to put multiple ggplot2 time series plots on a page using the gridExtra package's arrange() function. Unfortunately, I am finding that the x-axis labels get pushed together; it appears that the plot is putting the same number of x-axis labels as a full-page chart, even though my charts only take up 1/4 of a page. Is there a better way to do this? I would prefer not to have to manually set any points, since I will be dealing with a large number of charts that span different date ranges and have different frequencies.
Here is some example code that replicates the problem:
dfm <- data.frame(index=seq(from=as.Date("2000-01-01"), length.out=100, by="year"),
x1=rnorm(100),
x2=rnorm(100))
mydata <- melt(dfm, id="index")
pdf("test.pdf")
plot1 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot2 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot3 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
plot4 <- ggplot(mydata, aes(index, value, color=variable))+geom_line()
arrange(plot1, plot2, plot3, plot4, ncol=2, nrow=2)
dev.off()
either rotate the axis labels
+ opts(axis.text.x=theme_text(angle=45, hjust=1))
Note that opts is deprecated in current versions of ggplot2. This functionality has been moved to theme():
+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
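or dilute the x-axis
+scale_x_datetime(major = "10 years")
The major argument has likewise been removed from current versions of ggplot2; for a Date column such as index, the equivalent is:
+ scale_x_date(date_breaks = "10 years")
To shift the labels automatically, I think the arrange() function would need to be fiddled with (though I'm not sure how).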
I wrote this function to return the proper major axis breaks given that you want some set number of major breaks.
year.range.major <- function(df, column = "index", n = 5){
range <- diff(range(df[,column]))
range.num <- as.numeric(range)
major = max(pretty((range.num/365)/n))
return(paste(major,"years"))
}
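So, instead of always fixing the breaks at 10 years, it'll produce a fixed number of breaks at nice intervals. The function needs the data frame being plotted (here mydata) as its first argument; in current versions of ggplot2, pass the resulting string to date_breaks instead of major.
+scale_x_date(major = year.range.major(mydata))
or
+scale_x_date(major = year.range.major(mydata, n=3))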
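A different way to get a roughly fixed number of breaks, sketched here as an alternative to the function above rather than a replacement for it, is to let base R's pretty() pick the break dates and hand them to scale_x_date. The target of about 4 breaks and the data are assumptions for illustration.

```r
set.seed(1)
dfm <- data.frame(
  index = seq(from = as.Date("2000-01-01"), length.out = 100, by = "year"),
  x1 = rnorm(100))

# pretty() has a method for Date objects: it returns about n "nice" dates
# covering the range of the data
brks <- pretty(dfm$index, n = 4)
print(brks)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dfm, aes(index, x1)) +
    geom_line() +
    scale_x_date(breaks = brks, date_labels = "%Y")
  print(p)
}
```

Because the breaks are computed from each data set's own range, the same code can be reused for charts that span different date ranges.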
