Given a collection of ggplot2 line graphs, is it possible to extract the line-to-color mapping for each graph and then change the colors of all the lines across the collection so I can use a single guide in patchwork?
Example:
library("tibble")
library("ggplot2")
library("patchwork")
d1 <-
tribble(
~g, ~x, ~y,
"a", 1, 1,
"a", 2, 2,
"b", 1, 2,
"b", 2, 1
)
p1 <- ggplot(d1, aes(x = x, y = y, group = g)) + geom_line( aes(color=g))
d2 <-
tribble(
~g, ~x, ~y,
"b", 1, 3,
"b", 2, 5,
"c", 1, 5,
"c", 2, 4
)
p2 <-ggplot(d2, aes(x = x, y = y, group = g)) + geom_line( aes(color=g))
p1 + p2 + plot_layout(guides = "collect")
cols <- c("a" = "red", "b" = "blue", "c" = "green")
p1a <- p1 + scale_colour_manual(values = cols)
p2a <- p2 + scale_colour_manual(values = cols)
p1a + p2a + plot_layout(guides = "collect")
In the first plot, p1 and p2 use different color mappings, so they get separate guides.
In the second plot, I specify a color mapping vector post hoc, and patchwork can then combine the guides.
My question is: how can I extract the data programmatically, given just p1 and p2, to build the cols variable?
I see that, in the above:
> p1$mapping$group[[2]]
g
> p1$data$g
[1] "a" "a" "b" "b"
which is an approach that might work in this case, but would break if the group wasn't the color aesthetic in geom_line.
As far as I know, there is no straightforward way to do this. What we can do is write our own function that merges the scales of the input plots. In the code below, we loop through a list of plots, extract and merge their limits, and build a new scale based on the scale of the first plot. (This has not been tested on anything other than this example, and it is unlikely to work for continuous scales.)
merge_scales <- function(..., aesthetic = "colour") {
plotlist <- list(...)
# Extract scales
plotlist <- lapply(plotlist, ggplot_build)
scales <- lapply(
plotlist, function(x) x$plot$scales$get_scales(aesthetic)
)
# Calculate new limits
limits <- lapply(scales, function(x) x$get_limits())
limits <- Reduce(union, limits)
# Copy scale, assign new limits
scale <- scales[[1]]$clone()
scale$limits <- limits
# Reset caches
scale$palette.cache <- NULL
scale$n.breaks.cache <- NULL
scale
}
You should then be able to just slap on this function in your patchwork with the & operator to apply the new scale to all preceding plots.
library("tibble")
library("ggplot2")
library("patchwork")
d1 <-
tribble(
~g, ~x, ~y,
"a", 1, 1,
"a", 2, 2,
"b", 1, 2,
"b", 2, 1
)
p1 <- ggplot(d1, aes(x = x, y = y, group = g)) + geom_line( aes(color=g))
d2 <-
tribble(
~g, ~x, ~y,
"b", 1, 3,
"b", 2, 5,
"c", 1, 5,
"c", 2, 4
)
p2 <-ggplot(d2, aes(x = x, y = y, group = g)) + geom_line( aes(color=g))
p1 + p2 + plot_layout(guides = "collect") & merge_scales(p1, p2)
Created on 2022-08-31 by the reprex package (v2.0.1)
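If a named colour vector like the cols from the question is wanted rather than a merged scale, the same limit-extraction idea can be reused. The following is only a sketch: collect_colours() is a hypothetical helper (not part of ggplot2 or patchwork), it relies on the same internal fields of the built plot as merge_scales(), and it assumes discrete scales and the default hue palette from the scales package.

```r
library(ggplot2)

# Hypothetical helper: gather the discrete limits of one aesthetic across
# several plots and pair each limit with a colour from a palette function.
collect_colours <- function(..., aesthetic = "colour",
                            palette = scales::hue_pal()) {
  built <- lapply(list(...), ggplot_build)
  limits <- lapply(
    built, function(b) b$plot$scales$get_scales(aesthetic)$get_limits()
  )
  limits <- Reduce(union, limits)
  # Named vector: names are the data values, values are the colours
  setNames(palette(length(limits)), limits)
}

d1 <- data.frame(g = c("a", "a", "b", "b"), x = c(1, 2, 1, 2), y = c(1, 2, 2, 1))
d2 <- data.frame(g = c("b", "b", "c", "c"), x = c(1, 2, 1, 2), y = c(3, 5, 5, 4))
p1 <- ggplot(d1, aes(x = x, y = y, group = g)) + geom_line(aes(color = g))
p2 <- ggplot(d2, aes(x = x, y = y, group = g)) + geom_line(aes(color = g))

cols <- collect_colours(p1, p2)
```

The resulting cols can then be passed to scale_colour_manual(values = cols) on each plot, as in the question's second example.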
EDIT
I think the 'scale is already present' warning is mostly unavoidable when we're not allowed to touch p1 and p2 upstream (maybe wrapping everything in suppressWarnings() would help?). What you can do is use a template scale in the function, which might make things a bit easier.
merge_scales <- function(..., template = scale_colour_discrete()) {
plotlist <- list(...)
aesthetic <- template$aesthetics[[1]]
# Extract scales
plotlist <- lapply(plotlist, ggplot_build)
scales <- lapply(
plotlist, function(x) x$plot$scales$get_scales(aesthetic)
)
# Calculate new limits
limits <- lapply(scales, function(x) x$get_limits())
limits <- Reduce(union, limits)
# Copy scale, assign new limits
scale <- template$clone()
scale$limits <- limits
scale
}
Then use it like this:
p1 + p2 + plot_layout(guides = "collect") &
merge_scales(
p1, p2,
template = scale_colour_manual(values = c("blue", "red", "green"))
)
I am trying to draw a line plot with two x variables on the x-axis and one continuous y variable on the y-axis. The counts of x1 and x2 are different. The df looks like the following:
df <- structure(list(val = c(3817,2428,6160,6729,7151,7451,6272,7146,7063,6344,5465,6169,7315,6888,7167,6759,4903,6461,7010,7018,6920,3644,6541,31862,31186,28090,28488,29349,28284,25815,23529,20097,19945,22118), type = c("1wt", "1wt", "3wt", "3wt", "3wt", "5wt", "5wt", "7wt", "7wt", "7wt","10wt","10wt","10wt","15wt","15wt","20wt","20wt","25wt","25wt","25wt","30wt","30wt","30wt","20m","20m","15m","15m","15m","10m","10m","5m", "5m", "5m", "5m"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")), row.names = c(NA, 34L), class = "data.frame")
where the x variables are-
x1 <- factor(df$type, levels = c("1wt", "3wt", "5wt", "7wt", "10wt", "15wt", "20wt", "25wt", "30wt")) and
x2 <- factor(df$type, levels = c("20m", "15m","10m","5m"))
I want separate lines for x1 and x2, with different colors and legends according to df$group, with both on the x-axis and df$val on the y-axis. Could you please help me do this? Thanks in advance.
EDIT: added below
Here's an approach that assumes the intent is to map the span of possible type values from group A against the span of possible values from group B.
Labeling could be added manually, but I don't think there's any simple way to use two categorical x axes together in one plot.
library(dplyr)  # for mutate(), case_when(), and %>%
df2 <- df %>%
mutate(x = case_when(type == "1wt" ~ 0,
type == "3wt" ~ 1,
type == "5wt" ~ 2,
type == "7wt" ~ 3,
type == "10wt" ~ 4,
type == "15wt" ~ 5,
type == "20wt" ~ 6,
type == "25wt" ~ 7,
type == "30wt" ~ 8,
type == "20m" ~ 0/3 * 8,
type == "15m" ~ 1/3 * 8,
type == "10m" ~ 2/3 * 8,
type == "5m" ~ 3/3 * 8))
ggplot(df2, aes(x, val, color = group, group = group)) +
geom_point() +
geom_smooth(method = lm)
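The hand-written case_when() above encodes a simple rule: group A levels sit at 0 to 8 in order, and group B levels are spread evenly over the same span. A minimal base R sketch of that rule, assuming the level orders given in the question (type_to_x() is a hypothetical helper name):

```r
# Level orders as given in the question
lv_a <- c("1wt", "3wt", "5wt", "7wt", "10wt", "15wt", "20wt", "25wt", "30wt")
lv_b <- c("20m", "15m", "10m", "5m")

# Hypothetical helper: position of each type on a shared numeric x axis
type_to_x <- function(type) {
  # Group A levels map to 0, 1, ..., 8
  pos_a <- match(type, lv_a) - 1
  # Group B levels are stretched evenly over the same 0..8 span
  pos_b <- (match(type, lv_b) - 1) / (length(lv_b) - 1) * (length(lv_a) - 1)
  ifelse(is.na(pos_a), pos_b, pos_a)
}

x_demo <- type_to_x(c("1wt", "10wt", "30wt", "20m", "15m", "5m"))
```

With this in place, df$x <- type_to_x(df$type) should give the same positions as the case_when() version, and adding or reordering levels only means editing the two level vectors.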
2nd approach
It sounds like the OP would like to use the type values numerically in some fashion. If they aren't intrinsically linked to each other in the way that's described, I suspect it will be misleading to plot them as if they are. (See here for a discussion of why this is trouble.)
That said, here's how you could do it. First, here's an approach that just uses the numeric portion of type as is. Note that "m", associated with group B, is on the bottom and "wt" is on the top, associated with group A, as in the example added in the OP comment below. I've added colors to the axes to clarify this. It's a little counterintuitive visually, since the points related to the top axis are on the bottom, and vice versa.
library(dplyr)  # for mutate() and %>%
library(readr)  # for parse_number()
df2 <- df %>%
# First, let's take the number used in "type" without adjustment
mutate(x_unadj = parse_number(type))
ggplot(df2, aes(x_unadj, val, color = group, group = group)) +
geom_point() +
geom_smooth(method = lm) + # Feel free to use another smoothing method, but
# it is not obvious to me what would be an improvement.
scale_x_continuous("m", sec.axis = sec_axis(~., name = "wt")) +
theme(axis.text.x.bottom = element_text(color = "#00BFC4"),
axis.title.x.bottom = element_text(color = "#00BFC4"),
axis.text.x.top = element_text(color = "#F8766D"),
axis.title.x.top = element_text(color = "#F8766D"))
If this is not satisfactory, we might reverse the order of both axes using
scale_x_reverse("m", sec.axis = sec_axis(~., name = "wt")) +
Using ggplot2 3.1.0 (from Oct 2018), I could not get the secondary x axis to run in the opposite direction from the primary axis. This example from 2017 no longer seems to work with this version. As of Dec 2018, a proposed fix meant to address this is under review.
So I have compared two groups with a third using a range of inputs. For each of the three groups I have a value and a confidence interval for a range of inputs. For the two comparisons I also have a p-value for that range of inputs. Now I would like to plot all five data series, but use a second axis for the p values.
I am able to do that except for one thing: how do I make sure that R knows which of the plots to assign to the second axis?
This is what it looks like now. The bottom two data series should be scaled up to the Y axis to the right.
ggplot(df) +
geom_pointrange(aes(x=x, ymin=minc, ymax=maxc, y=meanc, color="c")) +
geom_pointrange(aes(x=x, ymin=minb, ymax=maxb, y=meanb, color="b")) +
geom_pointrange(aes(x=x, ymin=mina, ymax=maxa, y=meana, color="a")) +
geom_point(aes(x=x, y=c, color="c")) +
geom_point(aes(x=x, y=b, color="b")) +
scale_y_continuous(sec.axis = sec_axis(~.*0.2))
df is a dataframe whose column names are the variables listed above; each row holds the corresponding datapoints.
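As background for the call above: sec_axis() only draws a relabelled copy of the primary axis; it does not assign or rescale any layer. A layer is "on" the secondary axis only if its y values are transformed by the inverse of the sec_axis() formula before plotting. A minimal sketch of that relationship, using made-up illustrative values (df_demo, ratio, and the two helper functions are assumptions, not the asker's data):

```r
library(ggplot2)

ratio <- 0.2  # secondary axis value = primary axis value * ratio

# Forward and inverse transforms between the two axes
to_secondary <- function(y) y * ratio
to_primary   <- function(p) p / ratio

# Made-up illustrative values, not taken from the question
df_demo <- data.frame(
  x    = 1:5,
  mean = c(0.80, 0.90, 1.00, 1.05, 1.10),
  p    = c(0.16, 0.18, 0.18, 0.20, 0.20)
)

p_demo <- ggplot(df_demo, aes(x = x)) +
  geom_line(aes(y = mean)) +
  # p values are moved into primary-axis units so they line up with the right axis
  geom_point(aes(y = to_primary(p))) +
  scale_y_continuous("Mean", sec.axis = sec_axis(~ . * ratio, name = "P value"))
```

Keeping the ratio in one variable makes it harder for the layer transform and the axis formula to drift apart.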
You can get what you want, staying true to Hadley's canon and the Grammar of Graphics gospel, if you transform your data frame from wide to long and use a different aesthetic (e.g. shape, color, fill) for the means and the CI.
You did not provide a reproducible example, so I employ my own. (Dput at the end of the post)
library(dplyr)  # for mutate(), if_else(), filter(), and %>%
df2 <- df %>%
mutate(CatCI = if_else(is.na(CI), "", Cat)) # Categorical name used to map the CI to its own legend.
ggplot(df2, aes(x = x)) +
geom_pointrange(aes(ymin = min, ymax = max, y = mean, color = Cat), shape = 16) +
geom_point(data = dplyr::filter(df2, !is.na(CI)), ## Drop the rows where CI is NA
aes(y = (CI/0.2), ## Transform the CI's y position to fit the right axis.
fill = CatCI), ## Map a second aesthetic (fill) so it gets its own legend
shape = 25, size = 5, alpha = 0.25 ) + ## I changed shape, size, and fill to help with visualization
scale_y_continuous(sec.axis = sec_axis(~.*0.2, name = "P Value")) +
labs(color = "Linerange\nSinister Axis", fill = "P value\nDexter Axis", y = "Mean")
Result:
Dataframe:
df <- structure(list(Cat = c("a", "b", "c", "a", "b", "c", "a", "b",
"c", "a", "b", "c", "a", "b", "c"), x = c(2, 2, 2, 2.20689655172414,
2.20689655172414, 2.20689655172414, 2.41379310344828, 2.41379310344828,
2.41379310344828, 2.62068965517241, 2.62068965517241, 2.62068965517241,
2.82758620689655, 2.82758620689655, 2.82758620689655), mean = c(0.753611797661977,
0.772340941644911, 0.793970086962944, 0.822424652072316, 0.837015408776649,
0.861417383841253, 0.87023105762465, 0.892894201949377, 0.930096326498796,
0.960862178366363, 0.966600321596147, 0.991206984637544, 1.00714201832596,
1.02025006679944, 1.03650896186786), max = c(0.869753641121797,
0.928067675294351, 0.802815304215019, 0.884750162053761, 1.03609814491961,
0.955909854315582, 1.07113399603486, 1.02170928767791, 1.05504846273091,
1.09491706586801, 1.20235615364205, 1.12035782960649, 1.17387406039167,
1.13909154635088, 1.0581878034897), min = c(0.632638511783381,
0.713943701135991, 0.745868763626567, 0.797491261486603, 0.743382797144923,
0.827693203320894, 0.793417962991821, 0.796917421637021, 0.92942504556723,
0.89124101157585, 0.813058838839382, 0.91701749675892, 0.943744642652422,
0.912869230576973, 0.951734254896252), CI = c(NA, 0.164201137643034,
0.154868406784159, NA, 0.177948094206453, 0.178360305763648,
NA, 0.181862670931493, 0.198447350829814, NA, 0.201541499248143,
0.203737532636542, NA, 0.205196077692786, 0.200992205838595),
CatCI = c("", "b", "c", "", "b", "c", "", "b", "c", "", "b",
"c", "", "b", "c")), .Names = c("Cat", "x", "mean", "max",
"min", "CI", "CatCI"), row.names = c(NA, 15L), class = "data.frame")
I would like to use ggplot to create a barchart, but not aggregate the observations by (categorical) x. For example, here is what I want using the R base plot system:
library(ggplot2)
data <- data.frame(lab = c("a", "b", "b", "c", "a"),
val = c(2, 5, 6, 3, 1))
barplot(data$val, names.arg = data$lab)
and here is what I want:
However, if I use ggplot, this is what I get:
ggplot(data, aes(lab, val)) + geom_bar(stat = "identity")
What is the right way of using ggplot to get the plot I want? Thanks!
You can create a new variable that indexes the rows of lab, use it as x, and then relabel the axis ticks with the lab values.
ggplot(data, aes(as.character(seq_along(lab)), val)) + geom_bar(stat = "identity") +
scale_x_discrete("lab", labels = c("1" = "a", "2" = "b", "3" = "b", "4" = "c", "5" = "a"))
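The hard-coded labels above can also be built from the data. This is a sketch under two assumptions: an extra id column is acceptable, and row order is the desired bar order. Using a factor built from integers keeps the levels in numeric order, whereas as.character() positions would sort alphabetically ("10" before "2") once there are ten or more rows.

```r
library(ggplot2)

data <- data.frame(lab = c("a", "b", "b", "c", "a"),
                   val = c(2, 5, 6, 3, 1))

# One factor level per row, in row order ("1", "2", ..., "5")
data$id <- factor(seq_len(nrow(data)))

# Named vector mapping each id level to the label shown on the axis
labs <- setNames(data$lab, levels(data$id))

p <- ggplot(data, aes(id, val)) +
  geom_col() +
  scale_x_discrete("lab", labels = labs)
```

geom_col() is used here as the shorthand for geom_bar(stat = "identity").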
I have a plot with a discrete x-axis and I want to tweak the extra space on both sides of the scale, making it smaller on the left and bigger on the right, so that the long labels will fit. scale_x_discrete(expand=c(0, 1)) is not my friend here, as it always works on both sides simultaneously. This question is similar but addresses continuous scales.
How can I achieve that?
set.seed(0)
L <- sapply(LETTERS, function(x) paste0(rep(x, 10), collapse=""))
x <- data.frame(label=L[1:24],
g2 = c("a", "b"),
y = rnorm(24))
x$g2 <- as.factor(x$g2)
x$xpos2 <- as.numeric(x$g2) + .25
# two groups
ggplot(x, aes(x=g2, y=y)) +
geom_boxplot(width=.4) +
geom_point(col="blue") +
geom_text(aes(x=xpos2, label=label, hjust=0))
This may be what you're looking for:
library(ggplot2)
set.seed(0)
L <- sapply(LETTERS, function(x) paste0(rep(x, 10), collapse=""))
x <- data.frame(label=L[1:24],
g2 = c("a", "b"),
y = rnorm(24))
x$g2 <- factor(x$g2, levels=c("a", "b", "c"))
x$xpos2 <- as.numeric(x$g2) + .25
# two groups
ggplot(x, aes(x=g2, y=y)) +
geom_boxplot(width=.4) +
geom_point(col="blue") +
geom_text(aes(x=xpos2, label=label, hjust=0)) +
scale_x_discrete(expand=c(0.1,0),
breaks=c("a", "b"),
labels=c("a", "b"),
limits=c("a", "b", "c"), drop=FALSE)
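An alternative that avoids the dummy "c" level: in ggplot2 3.3.0 and later, expansion() accepts separate lower and upper values, so the padding can differ on each side. A minimal sketch with the question's data; the amounts 0.3 and 1.5 are arbitrary choices and would need tuning to the label length and plot width.

```r
library(ggplot2)

set.seed(0)
L <- sapply(LETTERS, function(x) paste0(rep(x, 10), collapse = ""))
x <- data.frame(label = L[1:24],
                g2 = c("a", "b"),
                y = rnorm(24))
x$g2 <- as.factor(x$g2)
x$xpos2 <- as.numeric(x$g2) + .25

# Additive padding in discrete-position units: c(left, right)
pad <- expansion(add = c(0.3, 1.5))

p <- ggplot(x, aes(x = g2, y = y)) +
  geom_boxplot(width = .4) +
  geom_point(col = "blue") +
  geom_text(aes(x = xpos2, label = label), hjust = 0) +
  scale_x_discrete(expand = pad)
```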