How to blend two overlapping graphs with ggplot2 - r

Using ggplot2, how can I blend two graphs? If I graph two sets of data, the second set covers up the first. Is there a way to blend both graphs? I have already set the alpha value as low as I can; any lower and I can't see individual points.
demanalyze <- function(infocode, n = 1){
infoname <- filter(infolookup, column_name == infocode)$description
infocolumn <- as.vector(as.matrix(mydata[infocode]))
ggplot(mydata) +
aes(x = infocolumn) +
ggtitle(infoname) +
xlab(infoname) +
ylab("Fraction of votes each candidate received") +
geom_point(aes(y = sanders_vote_fraction, colour = "Bernie Sanders"), size=I(2)) +#, color = alpha("blue",0.02), size=I(1)) +
stat_smooth(aes(y = sanders_vote_fraction), method = "lm", formula = y ~ poly(x, n), size = 1, color = "darkblue", se = F) +
geom_point(aes(y = clinton_vote_fraction, colour = "Hillary Clinton"), size=I(2)) +#, color = alpha("red",0.02), size=I(1)) +
stat_smooth(aes(y = clinton_vote_fraction), method = "lm", formula = y ~ poly(x, n), size = 1, color = "darkred", se = F) +
scale_colour_manual("",
values = c("Bernie Sanders" = alpha("blue",0.005), "Hillary Clinton" = alpha("red",0.005))
) +
guides(colour = guide_legend(override.aes = list(alpha = 1)))
}
By blend, I mean that if there is a red point and a blue point in the same spot, it should show up as purple.

Looking at the plot, my guess is that the issue is a ton of red stacking on top of each other, blocking the blue below. I think you may need to randomize the layering on the graph, which will require generating a single data.frame. Alternatively, if Hillary+Bernie always equals 1, you may be able to just plot that. If they don't, and you don't want to lose too much information, you could plot just one metric of (Hillary)/(Bernie+Hillary).
Example:
geom_point(aes(y = clinton_vote_fraction / ( clinton_vote_fraction + sanders_vote_fraction)
, colour = "Clinton Share"), size=I(2))
And here is an example with the melting approach:
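library(dplyr)
library(reshape2)
library(ggplot2)
df <-
data.frame(
metric = rnorm(1000)
, Clinton = rnorm(1000, 48, 10)
) %>%
# assumed intent: one noise value per row, centred on 4
# (rnorm(4) returns only four values, which mutate() cannot recycle to 1000 rows)
mutate(Sanders = 100 - Clinton - rnorm(n(), mean = 4))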
meltDF <-
melt(df, "metric"
, variable.name = "Candidate"
, value.name = "Vote Share")
ggplot(meltDF %>%
arrange(sample(1:nrow(.)))
, aes(x = metric
, y = `Vote Share`
, col = Candidate)) +
geom_point(size = 2, alpha = 0.2) +
geom_smooth(se = FALSE, alpha = 1, show.legend = FALSE) +
scale_colour_manual("",
values = c("Clinton" = "darkblue"
, "Sanders" = "red3")
) +
theme_minimal()
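The row shuffling in the example above can also be done in base R. The following is a minimal sketch, assuming a small made-up data frame (`pts` and its columns are hypothetical stand-ins, not the asker's data). Within one layer, geom_point draws points in row order, so shuffling the rows interleaves the two candidates instead of stacking one on top of the other.

```r
# Hypothetical stand-in data: two groups of points in one long data frame
set.seed(1)
pts <- data.frame(
  x = rnorm(10),
  y = rnorm(10),
  Candidate = rep(c("Clinton", "Sanders"), each = 5)
)

# Shuffle the row order; the contents of the rows are unchanged
shuffled <- pts[sample(nrow(pts)), ]
```

Passing `shuffled` to ggplot() in place of `pts` should then give the randomized layering described above.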

Related

ggplot line plot with one group's lines on top

I am making a line plot of several groups and want to create a visualization in which one group's lines are highlighted:
ggplot(df) + geom_line(aes(x=timepoint ,y=var, group = participant_id, color=color)) +
scale_color_identity(labels = c(red = "g1",gray90 = "Other"),guide = "legend")
However, the highlighted group's lines are partially obscured by the other groups' lines.
How can I make these lines always appear on top of the other groups' lines?
The simplest way to do this is to plot the gray and red groups on different layers.
First, let's try to replicate your problem with a dummy data set:
set.seed(1)
df <- data.frame(
participant_id = rep(1:50, each = 25),
timepoint = factor(rep(0:24, 50)),
var = c(replicate(50, runif(1, 50, 200) + runif(25, 0.3, 1.5) *
sin(0:24/(0.6*pi))^2/seq(0.002, 0.005, length = 25))),
color = rep(sample(c("red", "gray90"), 50, TRUE, prob = c(1, 9)), each = 25)
)
Now we apply your plotting code:
library(ggplot2)
ggplot(df) +
geom_line(aes(x=timepoint ,y=var, group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend") +
theme_classic()
This looks broadly similar to your plot. If instead we plot in different layers, we get:
ggplot(df, aes(timepoint, var, group = participant_id)) +
geom_line(data = df[df$color == "gray90",], aes(color = "Other")) +
geom_line(data = df[df$color == "red",], aes(color = "g1")) +
scale_color_manual(values = c(g1 = "red", Other = "gray90")) +
theme_classic()
Created on 2022-06-20 by the reprex package (v2.0.1)
You can use factor releveling to bring the line(s) of interest to the front.
First, let's plot the data as is, with the red line partly hidden by the others.
library(ggplot2)
library(dplyr)
set.seed(13)
df <-
data.frame(timepoint = rep(c(1:100), 20),
participant_id = paste0("p_", sort(rep(c(1:20), 100))),
var = abs(rnorm(2000, 200, 50) - 200),
color = c(rep("red", 100), rep("gray90", 1900)))
ggplot(df) +
geom_line(aes(x = timepoint ,
y = var,
group = participant_id, color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")
Now let's bring p_1 to the front by making it the last factor level.
df %>%
mutate(participant_id = factor(participant_id)) %>%
mutate(participant_id = relevel(participant_id, ref = "p_1")) %>%
mutate(participant_id = factor(participant_id, levels = rev(levels(participant_id)))) %>%
ggplot() +
geom_line(aes(x=timepoint,
y=var,
group = participant_id,
color = color)) +
scale_color_identity(labels = c(red = "g1", gray90 = "Other"),
guide = "legend")
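The releveling step can also be written in base R without reversing the other levels. This is a minimal sketch on a tiny made-up vector (`ids` is hypothetical); it only shows how to move one level to the end, which is what the answer above relies on to draw that group last.

```r
# Hypothetical participant ids, two rows per participant
ids <- paste0("p_", rep(1:3, each = 2))
f <- factor(ids)

# Put "p_1" at the end of the level order, keeping the others as they were
f_last <- factor(f, levels = c(setdiff(levels(f), "p_1"), "p_1"))

levels(f_last)
```

Unlike relevel() followed by rev(), this keeps the relative order of the remaining participants unchanged.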

How to plot geom_point alone plus geom_point with position_dodge

I am struggling with how to plot my real values, stored in the real_values vector, next to the estimated values. My problem is that the estimated values have a range (via geom_errorbar), while for the real values I would like to plot just the point, in black, to the left of each of the 10 estimates.
Here's an example of what I tried:
est_values = rnorm(20)
real_values = rnorm(10)
dat_ex = data.frame(
xvalues = 1:10,
values = est_values,
method = c(rep("A",10),rep("B",10)),
ic_0.025 = c(est_values - rnorm(20,1,0.1)),
ic_0.975 = c(est_values + rnorm(20,1,0.1)))
ggplot(dat_ex) +
#geom_point(aes(x = 1:10, y= real_values), size = 2) +
geom_point(aes(x = xvalues, y= values, group = method, colour = method), position=position_dodge(.9), size = 3) +
geom_errorbar(aes(x = xvalues, y= values, group = method, colour = method,ymin = ic_0.025, ymax = ic_0.975), size = 1.3,position=position_dodge(.9), width = .2)
ggplot generally works best with data in data frames. So we put your real_values in a data frame and plot them in a separate layer, and "nudge" them to the left, as requested:
ggplot(dat_ex) +
geom_point(aes(x = xvalues, y= values, group = method, colour = method), position=position_dodge(.9), size = 3) +
geom_errorbar(aes(x = xvalues, y= values, group = method, colour = method,ymin = ic_0.025, ymax = ic_0.975), size = 1.3,position=position_dodge(.9), width = .2) +
geom_point(
data = data.frame(values = real_values, xvalues = 1:10),
aes(x = xvalues, y = values),
position = position_nudge(x = -.4),
color = "black")
A nicer method might be to put them all in the same data frame. This can simplify the code and will automatically put them in the legend.
library(dplyr)
dat_ex = data.frame(
xvalues = 1:10,
values = real_values,
method = "real"
) %>%
bind_rows(dat_ex) %>%
mutate(method = factor(method, levels = c("real", "A", "B")))
ggplot(dat_ex, aes(x = xvalues, y = values, color = method)) +
geom_point(position=position_dodge(.9), size = 3) +
geom_errorbar(aes(ymin = ic_0.025, ymax = ic_0.975, group = method),
size = 1.3, position=position_dodge(.9), width = .2) +
scale_color_manual(values = c("real" = "black", "A" = "orange", "B" = "blue"))
I would add real_values to your data as another level of method, so they will be dodged along with "A" and "B" (and included in the legend):
library(ggplot2)
dat_ex <- rbind(
dat_ex,
data.frame(
xvalues = 1:10,
values = real_values,
method = "Real",
ic_0.025 = NA_real_,
ic_0.975 = NA_real_
)
)
# arrange so "Real" is on the left
dat_ex$method <- factor(dat_ex$method, levels = c("Real", "A", "B"))
ggplot(dat_ex) +
geom_point(aes(x = xvalues, y= values, group = method, colour = method), position=position_dodge(.9), size = 3) +
geom_errorbar(aes(x = xvalues, y= values, group = method, colour = method,ymin = ic_0.025, ymax = ic_0.975), size = 1.3,position=position_dodge(.9), width = .2) +
scale_colour_manual(values = c("black", "forestgreen", "royalblue"))

How to change the ranges of the left and the right axis in ggplot2?

data
structure(list(VAR1 = 1:25, VAR2 = c(33151.4302619749, 30243.6061009354,
29075.4630823572, 27646.1136405244, 27227.44196157, 25910.8454253342,
26405.0119958585, 26167.0056585366, 25230.4079822407, 24976.0912545877,
25343.7017313494, 26159.0753483146, 26957.9730021768, 27759.9504648796,
29046.5695915796, 26946.8014342613, 28679.717757397, 28511.198514726,
31713.1021393953, 29817.0124623543, 29296.1166847962, 30338.1835634015,
31091.0104433836, 31006.5299473411, 31774.4212233181), VAR3 = c(0.159880385135182,
0.271631841511188, 0.32632879125533, 0.399844578596391, 0.421938705559609,
0.478323940647091, 0.48149816119631, 0.509588233124643, 0.551439379228908,
0.573370058778077, 0.580567765134673, 0.580904582887182, 0.576292211835652,
0.566802717164287, 0.588146883004088, 0.609420847255646, 0.5934221293271,
0.622638198647242, 0.600671162470947, 0.563000351965266, 0.641578758944993,
0.629095088526046, 0.608579189017581, 0.65220556730865, 0.63606478018115
)), class = "data.frame", row.names = c(NA, -25L))
I want to draw a plot with dual y axis with the following code. It seems that the limit of the left axis doesn't match the red line appropriately and I want to stretch the red line with the lowest point being at the bottom of the plot.
ggplot(df, aes(VAR1)) +
geom_point(aes(y = VAR2), size = 3, color = '#D62728') +
geom_line(aes(y = VAR2), size = 1, color = '#D62728', lty = 3) +
geom_point(aes(y = VAR3*5.1e4), size = 3, color = '#1F77B4') +
geom_line(aes(y = VAR3*5.1e4), size = 1, color = '#1F77B4') +
scale_y_continuous(name = 'VAR2', sec.axis = sec_axis(~./5.1e4, name = "VAR3"))
Scale the data differently (and add an intercept, not just a scalar).
Up front, the use of two axes is (intentionally) obfuscating the true relationship between the lines. From this image, one might infer that the two numbers overlap somehow, whereas in reality the ranges are far apart (VAR2 is 25k-33k, VAR3 is 0.16-0.65). I recognize you are overlapping these intentionally; I'm adding this note as a caution that this methodology can introduce visual bias.
zz <- c(22000, 18000)
ggplot(df, aes(VAR1)) +
geom_point(aes(y = VAR2), size = 3, color = '#D62728') +
geom_line(aes(y = VAR2), size = 1, color = '#D62728', lty = 3) +
geom_point(aes(y = zz[1] + VAR3*zz[2]), size = 3, color = '#1F77B4') +
geom_line(aes(y = zz[1] + VAR3*zz[2]), size = 1, color = '#1F77B4') +
scale_y_continuous(name = 'VAR2', sec.axis = sec_axis(~(. - zz[1]) / zz[2], name = "VAR3"))
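The intercept and scale in zz above were picked by hand. One way to derive them from the data is to map the range of VAR3 linearly onto the range of VAR2. The sketch below assumes a small made-up data frame `d` with the same column roles as in the question; `slope`, `intercept`, `to_primary` and `to_secondary` are illustrative names, not part of ggplot2.

```r
# Hypothetical data with the same column roles as in the question
d <- data.frame(
  VAR1 = 1:5,
  VAR2 = c(33000, 30000, 27000, 25000, 31000),
  VAR3 = c(0.16, 0.30, 0.45, 0.58, 0.65)
)

r2 <- range(d$VAR2)  # range of the primary (left-axis) series
r3 <- range(d$VAR3)  # range of the secondary (right-axis) series

# Linear map y = intercept + slope * v that sends range(VAR3) onto range(VAR2)
slope <- diff(r2) / diff(r3)
intercept <- r2[1] - slope * r3[1]

to_primary <- function(v) intercept + slope * v      # use inside aes(y = ...)
to_secondary <- function(y) (y - intercept) / slope  # inverse, for the second axis
```

With these values, `intercept` and `slope` play the roles of zz[1] and zz[2], so the secondary axis would be `sec_axis(~ (. - intercept) / slope, name = "VAR3")`. Matching the two ranges exactly is only one possible choice; adding some padding on either side may read better.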

How to change the text and title of legend in ggplot with several variables

I'm trying to fix my legend text so that the text is representing the appropriate symbols and color. However, I have a lot of variables that I need to include in the legend, and they are all in different columns. Does anyone know a quick way to indicate what the colours and symbol are in the ggplot legend?
Here is some sample code
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
temps$Shape = if_else(temps$Growth.Phase == "exponential", 21,
if_else(temps$Growth.Phase == "stationary", 22, 23))
#Graph code
ggplot(data = temps, aes(x = Temperature, y = "Proportions", shape = factor(Shape))) +
geom_point(aes(y = Carbohydrates),colour = "darkred",
fill = "darkred", size = 3) +
geom_line(aes(y = Carbohydrates), size = 1, col = "darkred") +
geom_point(aes(y = Lipids), colour = "darkblue",
fill = "darkblue", size = 3, col ="darkblue") +
geom_line(aes(y = Lipids), size = 1) +
geom_point(aes(y = Protein), colour = "violet",
fill = "violet", size = 3) +
geom_line(aes(y = Protein), size = 1, col ="violet") +
geom_point(aes(y = DNA.RNA), colour = "darkorange",
fill = "darkorange", size = 3) +
geom_line(aes(y = DNA.RNA), size = 1, col = "darkorange") +
geom_point(aes(y = Chlorophyll), size = 3, colour = "darkgreen",
fill = "darkgreen") +
geom_line(aes(y = Chlorophyll), size = 1, col = "darkgreen") +
labs(x = "Temperature (°C)", y = "Proportion")
This is the image I am getting
But as you can see it's not giving me the correct text in the legend. I would like the symbols to specify which Growth.Phase they are and the colour to specify what column I have plotted (ie. Carbohydrate, Protein etc....). Does anyone know a quick fix?
When I use my own data this is what the graph looks like, please note the lines are going through the same symbols, and are the same colours
I'm not sure whether I got the legend right, but the idea is the same as in @dc37's answer. Your plot can be considerably simplified using pivot_longer:
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
temps_long <- temps %>%
pivot_longer(-c(Temperature, Growth.Phase)) %>%
mutate(
shape = case_when(
Growth.Phase == "exponential" ~ 21,
Growth.Phase == "stationary" ~ 22,
TRUE ~ 23
),
color = case_when(
name == "Carbohydrates" ~ "darkred",
name == "Lipids" ~ "darkblue",
name == "Protein" ~ "violet",
name == "DNA.RNA" ~ "darkorange",
name == "Chlorophyll" ~ "darkgreen",
TRUE ~ NA_character_
),
)
# named color vector
colors <- select(temps_long, name, color) %>%
distinct() %>%
deframe()
# named shape vector
shapes <- select(temps_long, Growth.Phase, shape) %>%
distinct() %>%
deframe()
ggplot(data = temps_long, aes(x = Temperature, y = value, shape = Growth.Phase, color = name, fill = name, group = Temperature)) +
geom_point(size = 3) +
geom_line(size = 1) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
labs(x = "Temperature (C)", y = "Proportion", color = "XXXX") +
guides(fill = "none", shape = guide_legend(override.aes = list(fill = "black")))
Created on 2020-04-04 by the reprex package (v0.3.0)
To make your code simpler and avoid repeating the same line several times, you can transform your data into a longer format and then map the new variables to the color, fill and shape aesthetics in aes.
Then, using scale_color_manual or scale_shape_manual, you can set the appropriate colors and shapes.
To draw lines between the appropriate points, I add a "Rep" column to mimic the presence of replicates in your experiments. Otherwise, geom_line can't decide which points belong together.
library(tidyr)
library(dplyr)
library(ggplot2)
temps %>% mutate(Rep = rep(1:3,each = 3)) %>%
pivot_longer(cols = Carbohydrates:Protein, names_to = "Type", values_to = "proportions") %>%
ggplot(aes(x = Temperature, y = proportions))+
geom_point(aes(fill = Type, shape = Growth.Phase, color = Type), size = 3)+
geom_line(aes( color = Type, group =interaction(Rep, Type)))+
scale_color_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_fill_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_shape_manual(values = c(23,21,22))+
labs(x = "Temperature (°C)", y = "Proportion")
Does this answer your question?

ggplot2's line legends appear "crossed-out"

I'm creating a ggplot with two lines, each from separate geoms. As an example:
df = data.frame(
x.v = seq(0, 1, 0.025),
y.v = runif(41)
)
straight.line = data.frame(
Inter = c(0),
Slope = c(1)
)
p = ggplot() +
geom_point(
mapping = aes(
x = x.v,
y = y.v
),
data = df,
colour = "blue"
) +
geom_smooth(
mapping = aes(
x = x.v,
y = y.v,
colour = "line of best fit"
),
data = df,
method = "lm",
show.legend = NA
) +
geom_abline(
mapping = aes(
intercept = Inter,
slope = Slope,
colour = "y = x"
),
data = straight.line,
show.legend = NA
) +
guides(
fill = "none",
linetype = "none",
shape = "none",
size = "none"
)
This gives the output:
As you can see, the legend has weird diagonal lines through it. An answer to a similar question says this can be fixed by using show.legend = NA. However, as you can see in the code above, I did this and it did not change the result.
Does anybody know what is adding the diagonal lines in the legend and how else I can fix it please? Thanks.
EDIT: It has been asked whether this is a duplicate of this. That may be the answer, but how do I apply it when the answer in the link uses fill and I use colour?
If I try
+ guides(colour = guide_legend(override.aes = list(colour = NULL)))
I get the error
Error in check.length("col") : 'gpar' element 'col' must not be length 0
and if I try
+ guides(colour = guide_legend(override.aes = list(fill = NULL)))
I get the error
Error in `$<-.data.frame`(`*tmp*`, "fill", value = character(0)) :
replacement has 0 rows, data has 1
The following works:
library(ggplot2)
ggplot() +
geom_point(mapping = aes(x = x.v, y = y.v),
data = df, colour = "blue") +
geom_smooth(mapping = aes(x = x.v, y = y.v, colour = "line of best fit"),
data = df, method = "lm", show.legend = NA) +
geom_abline(mapping = aes(intercept = Inter, slope = Slope, colour = "y = x"),
data = straight.line, show.legend = FALSE) +
guides(fill = "none", linetype = "none", shape = "none", size = "none")
The code can be made a little less repetitive, and we can leave out some things (like the guides call):
ggplot(data = df, mapping = aes(x = x.v, y = y.v)) +
geom_point(colour = "blue") +
geom_smooth(aes(colour = "line of best fit"), method = "lm") +
geom_abline(mapping = aes(intercept = Inter, slope = Slope, colour = "y = x"),
data = straight.line, show.legend = FALSE)
Why do we need to use show.legend = FALSE here and not show.legend = NA?
From the documentation:
show.legend
logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display
This means that if we use show.legend = NA for the geom_abline call, this layer is used in the legend. However, we don't want this layer to be drawn in the legend, so we need show.legend = FALSE. You can see that this does not influence which colors are included in the legend, only which layers draw the legend keys.
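The documentation quoted above also mentions a named logical vector. As a minimal sketch under that reading, the geom_abline layer can be excluded from the colour legend only. The data are rebuilt here so the block runs on its own, and `p` is just an illustrative name.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(x.v = seq(0, 1, 0.025), y.v = runif(41))
straight.line <- data.frame(Inter = 0, Slope = 1)

p <- ggplot(df, aes(x = x.v, y = y.v)) +
  geom_point(colour = "blue") +
  geom_smooth(aes(colour = "line of best fit"), method = "lm") +
  geom_abline(
    data = straight.line,
    aes(intercept = Inter, slope = Slope, colour = "y = x"),
    # named vector: keep this layer out of the colour legend only
    show.legend = c(colour = FALSE)
  )
```

Since colour is the only aesthetic mapped in that layer, this should behave like show.legend = FALSE here; the named form is mainly useful when a layer maps several aesthetics.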
Data
set.seed(42) # For reproducibility
df = data.frame(x.v = seq(0, 1, 0.025),
y.v = runif(41))
straight.line = data.frame(Inter = 0, Slope = 1)
