I'd like to create a faceted plot using ggplot2 in which the minimum limit of the y axis will be fixed (say at 0) and the maximum limit will be determined by the data in the facet (as it is when scales="free_y". I was hoping that something like the following would work, but no such luck:
library(plyr)
library(ggplot2)
#Create the underlying data
l <- gl(2, 10, 20, labels=letters[1:2])
x <- rep(1:10, 2)
y <- c(runif(10), runif(10)*100)
df <- data.frame(l=l, x=x, y=y)
#Create a separate data frame to define axis limits
dfLim <- ddply(df, .(l), function(y) max(y$y))
names(dfLim)[2] <- "yMax"
dfLim$yMin <- 0
#Create a plot that works, but has totally free scales
p <- ggplot(df, aes(x=x, y=y)) + geom_point() + facet_wrap(~l, scales="free_y")
#Add y limits defined by the limits dataframe
p + ylim(dfLim$yMin, dfLim$yMax)
It's not too surprising to me that this throws an error (length(lims) == 2 is not TRUE) but I can't think of a strategy to get started on this problem.
In your case, either of the following will work:
p + expand_limits(y=0)
p + aes(ymin=0)
Related
I would like to add decile breaks to the following plot.
ggplot(data = data.frame(x = c(-0, 1)), aes(x)) +
stat_function(fun = dexp, n = 101,
args = list(rate=3)
)
As an example of what I would like to achieve, here is another plot but I would like to find a way to stick to stat_function since I can easily change the underlying distribution and parameters and also the plot is always more smooth.
vals <-rexp(1001,rate=3)
dens <- density(vals)
plot(dens)
df <- data.frame(x=dens$x, y=dens$y)
probs <-seq(0,1,0.1)
quantiles <- quantile(vals, prob=probs)
df$quant <- factor(findInterval(df$x,quantiles))
ggplot(df, aes(x,y)) + geom_line() + geom_ribbon(aes(ymin=0, ymax=y, fill=quant)) +
scale_x_continuous(breaks=quantiles) + scale_fill_brewer(guide="none")
This is a bit hacky, uses pexp with some rounding to calculate which decile each x location is in. It doesn't use stat_function but has the same advantages as you mention.
# Set up range and number of x points
x=seq(0,1,l=1e4)
# Create x and y positions with dexp
df <- data.frame(x=x, y=dexp(x,rate=3))
# Identify quantiles using pexp
df$quant <- scales::number(floor(10*pexp(df$x,rate=3))*10, suffix="%")
# Find x positions for breaks
xpos <- aggregate(df, x~quant, min)
# Plot
ggplot(df, aes(x,y)) +
geom_line() +
geom_ribbon(aes(ymin=0, ymax=y, fill=quant)) +
scale_x_continuous(breaks=xpos$x, labels=xpos$quant)
This question already has answers here:
Align multiple plots in ggplot2 when some have legends and others don't
(6 answers)
Closed 5 years ago.
I'm trying to use ggplot to draw a graph comparing the absolute values of two variables, and also show the ratio between them. Since the ratio is unitless and the values are not, I can't show them on the same y-axis, so I'd like to stack vertically as two separate graphs with aligned x-axes.
Here's what I've got so far:
library(ggplot2)
library(dplyr)
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot absolute values
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control"))
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Arrange the two plots above each other
grid.arrange(plot_values, plot_ratios, ncol=1, nrow=2)
The big problem is that the legend on the right of the first plot makes it a different size. A minor problem is that I'd rather not show the x-axis name and tick marks on the top plot, to avoid clutter and make it clear that they share the same axis.
I've looked at this question and its answers:
Align plot areas in ggplot
Unfortunately, neither answer there works well for me. Faceting doesn't seem a good fit, since I want to have completely different y scales for my two graphs. Manipulating the dimensions returned by ggplot_gtable seems more promising, but I don't know how to get around the fact that the two graphs have a different number of cells. Naively copying that code doesn't seem to change the resulting graph dimensions for my case.
Here's another similar question:
The perils of aligning plots in ggplot
The question itself seems to suggest a good option, but rbind.gtable complains if the tables have different numbers of columns, which is the case here due to the legend. Perhaps there's a way to slot in an extra empty column in the second table? Or a way to suppress the legend in the first graph and then re-add it to the combined graph?
Here's a solution that doesn't require explicit use of grid graphics. It uses facets, and hides the legend entry for "ratio" (using a technique from https://stackoverflow.com/a/21802022).
library(reshape2)
results_long <- melt(results, id.vars="index")
results_long$facet <- ifelse(results_long$variable=="ratio", "ratio", "values")
results_long$facet <- factor(results_long$facet, levels=c("values", "ratio"))
ggplot(results_long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"),
values=c("#1B9E77", "#D95F02", "#7570B3")) +
theme(legend.justification=c(0,1), legend.position=c(0,1)) +
guides(colour=guide_legend(title=NULL)) +
theme(axis.title.y = element_blank())
Try this:
library(ggplot2)
library(gtable)
library(gridExtra)
AlignPlots <- function(...) {
LegendWidth <- function(x) x$grobs[[8]]$grobs[[1]]$widths[[4]]
plots.grobs <- lapply(list(...), ggplotGrob)
max.widths <- do.call(unit.pmax, lapply(plots.grobs, "[[", "widths"))
plots.grobs.eq.widths <- lapply(plots.grobs, function(x) {
x$widths <- max.widths
x
})
legends.widths <- lapply(plots.grobs, LegendWidth)
max.legends.width <- do.call(max, legends.widths)
plots.grobs.eq.widths.aligned <- lapply(plots.grobs.eq.widths, function(x) {
if (is.gtable(x$grobs[[8]])) {
x$grobs[[8]] <- gtable_add_cols(x$grobs[[8]],
unit(abs(diff(c(LegendWidth(x),
max.legends.width))),
"mm"))
}
x
})
plots.grobs.eq.widths.aligned
}
df <- data.frame(x = c(1:5, 1:5),
y = c(1:5, seq.int(5,1)),
type = factor(c(rep_len("t1", 5), rep_len("t2", 5))))
p1.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p1.2 <- ggplot(df, aes(x = x, y = y, colour = type)) + geom_line()
plots1 <- AlignPlots(p1.1, p1.2)
do.call(grid.arrange, plots1)
p2.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p2.2 <- ggplot(df, aes(x = x, y = y)) + geom_line()
plots2 <- AlignPlots(p2.1, p2.2)
do.call(grid.arrange, plots2)
Produces this:
// Based on multiple baptiste's answers
Encouraged by baptiste's comment, here's what I did in the end:
library(ggplot2)
library(dplyr)
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Plot absolute values
remove_x_axis =
theme(
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank())
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control")) +
remove_x_axis
# Arrange the two plots above each other
grob_ratios <- ggplotGrob(plot_ratios)
grob_values <- ggplotGrob(plot_values)
legend_column <- 5
legend_width <- grob_values$widths[legend_column]
grob_ratios <- gtable_add_cols(grob_ratios, legend_width, legend_column-1)
grob_combined <- gtable:::rbind_gtable(grob_values, grob_ratios, "first")
grob_combined <- gtable_add_rows(
grob_combined,unit(-1.2,"cm"), pos=nrow(grob_values))
grid.draw(grob_combined)
(I later realised I didn't even need to extract the legend width, since the size="first" argument to rbind tells it just to have that one override the other.)
It feels a bit messy, but it is exactly the layout I was hoping for.
An alternative & quite easy solution is as follows:
# loading needed packages
library(ggplot2)
library(dplyr)
library(tidyr)
# Prepare some sample data
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# reshape into long format
long <- results %>%
gather(variable, value, -index) %>%
mutate(facet = ifelse(variable=="ratio", "ratio", "values"))
long$facet <- factor(long$facet, levels=c("values", "ratio"))
# create the plot & remove facet labels with theme() elements
ggplot(long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"), values=c("green", "red", "blue")) +
theme(axis.title.y=element_blank(), strip.text=element_blank(), strip.background=element_blank())
which gives:
I want to incrementally build a plot that contains several data series of different lengths. My goal is to be able to control the appearance of each data series,
give them custom names
and to have appropriate legends. My data series are of different lengths, so
I cannot put them in a single dataframe.
In the code below I expect 4 lines: the shortest will be red, the next ones will be blue, green and black respectively
library(ggplot2)
set.seed(12345)
plt <- ggplot()
colors <- c('red', 'blue', 'green', 'black')
for(i in seq(length(colors))) {
x <- seq(1, 2*i)
y <- x * i + rnorm(length(x))
df <- data.frame(x=x, y=y)
plt <- plt + geom_point(aes(x, y), data=df, color=colors[i]) +
geom_line(aes(x, y), data=df, color=colors[i])
}
print(plt)
This is what I get.
How can I give names to the lines and display a legend?
Is there a better way to acheive my goal?
The way to do this is to create a single data frame in long format:
Like this:
library(ggplot2)
set.seed(12345)
colors <- c('red', 'blue', 'green', 'black')
dat <- lapply(seq_along(colors), function(i){
x <- seq(1, 2*i)
data.frame(
series = colors[i],
x = x,
y = x * i + rnorm(length(x))
)}
)
dat <- do.call(rbind, dat)
Now plot
ggplot(dat, aes(x, y, color=series)) + geom_line()
You don't have to use a for-loop and plot each time after constructing the data because they are of unequal lengths. This is why ggplot2 is awesome! You can create a group for each of the dataset. And you can name the line to whatever you want using the same group as it will appear in the legend as such (of course you can change it directly in the legend as well, later, if you wish). Here's what I think you expect:
set.seed(12345)
require(ggplot2)
require(plyr)
# to group your data. change the letters to whatever you want to appear as legend
line_names <- letters[1:4]
# Use plyr to create your x and y for each i and add the group.
dat <- ldply(1:length(colors), function(i) {
x <- seq(1, 2*i)
y <- x * i + rnorm(length(x))
data.frame(x=x, y=y, grp=line_names[i])
})
# just plot here.
ggplot(data = dat, aes(x=x, y=y)) + geom_line(aes(colour=grp)) + geom_point()
I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I can combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
You can use + to combine other plots on the same ggplot object. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
Try this:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()
I'm trying to use ggplot2 to create and label a scatterplot. The variables that I am plotting are both scaled such that the horizontal and the vertical axis are plotted in units of standard deviation (1,2,3,4,...ect from the mean). What I would like to be able to do is label ONLY those elements that are beyond a certain limit of standard deviations from the mean. Ideally, this labeling would be based off of another column of data.
Is there a way to do this?
I've looked through the online manual, but I haven't been able to find anything about defining labels for plotted data.
Help is appreciated!
Thanks!
BEB
Use subsetting:
library(ggplot2)
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- letters[1:10]
ggplot(data=x, aes(a, b, label=lab)) +
geom_point() +
geom_text(data = subset(x, abs(b) > 0.2), vjust=0)
The labeling can be done in the following way:
library("ggplot2")
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- rep("", 10) # create empty labels
x$lab[c(1,3,4,5)] <- LETTERS[1:4] # some labels
ggplot(data=x, aes(x=a, y=b, label=lab)) + geom_point() + geom_text(vjust=0)
Subsetting outside of the ggplot function:
library(ggplot2)
set.seed(1)
x <- data.frame(a = 1:10, b = rnorm(10))
x$lab <- letters[1:10]
x$lab[!(abs(x$b) > 0.5)] <- NA
ggplot(data = x, aes(a, b, label = lab)) +
geom_point() +
geom_text(vjust = 0)
Using qplot:
qplot(a, b, data = x, label = lab, geom = c('point','text'))