I want to draw two densities, each with a vertical line marking its mean.
I would like two legends: one for the densities and one for the vertical lines.
I tried the code below. However, only one legend appears and its labels are wrong.
Can anyone help me?
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
mean.a <- mean(data$value[data$type == "a"])
mean.b <- mean(data$value[data$type == "b"])
library(ggplot2)
gp <- ggplot(data = data, aes(x = value))
gp <- gp + geom_density(aes(fill = type), color = "black", alpha=0.3, lwd = 1.0, show.legend = TRUE)
gp <- gp + scale_fill_manual(breaks = 1:2, name = "Density", values = c("a" = "green", "b" = "blue"), labels = c("a" = "Density a", "b" = "Density b") )
gp <- gp + geom_vline(aes(color="mean.a", xintercept=mean.a), linetype="solid", size=1.0, show.legend = NA)
gp <- gp + geom_vline(aes(color="mean.b", xintercept=mean.b), linetype="dashed", size=1.0, show.legend = NA)
gp <- gp + scale_color_manual(name = "", values = c("mean.a" = "red", "mean.b" = "darkblue"), labels = c("mean.a" = "Mean.A", "mean.b" = "Mean.B"))
gp <- gp + theme(legend.position="top")
gp
Here are a couple ways to do it. I'm not sure, but I think some of the difficulty comes from having more than one geom_vline and trying to hard-code values in the aes. You're building three scales here: fill for the density curves, and color and linetype for the vertical lines. But you're aiming (correct me if I'm misreading) for two legends.
The easiest way to deal with getting the proper legends is to make a small data frame for the means, rather than individual values for each mean. You can do this easily with dplyr to calculate means for each type.
library(tidyverse)
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50
means <- group_by(data, type) %>%
summarise(mean = mean(value))
means
#> # A tibble: 2 x 2
#> type mean
#> <fct> <dbl>
#> 1 a 50.3
#> 2 b 99.9
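If you would rather not load the tidyverse just for this step, the same per-type means can be sketched in base R. This assumes `aggregate()` from the stats package; the object name `means_base` is made up for illustration and is not part of the original answer.

```r
# Recreate the example data (same seed as above)
set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
                   type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50

# One row per type, holding that type's mean (base R, no dplyr)
means_base <- aggregate(value ~ type, data = data, FUN = mean)

# aggregate() names the summary column after the input, so rename it
names(means_base)[names(means_base) == "value"] <- "mean"
means_base
```

The result is a plain data frame rather than a tibble, but it should be usable as the `data` argument of a layer in the same way.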
Then when you plot, you can make a single geom_vline call, assigning the means data frame and allowing the aesthetics you want—color and linetype—to be scaled based on this data. The trick then is reconciling the names and labels: if you don't set the same legend name and labels for both the color and linetype scales, you'll have two legends for the lines. Set them the same, and you get a single legend for the mean lines.
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means) +
scale_color_manual(values = c("red", "darkblue"), labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_linetype_discrete(labels = c("Mean.A", "Mean.B"), name = NULL) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density")
The second way is to add a step when creating the means data frame that labels the types the way you want them to appear later, i.e. "Mean.A" instead of just "a". Then you don't need to adjust labels, and you can skip the linetype scale (unless you want to set linetypes manually). Finally, remove the legend title for both color and linetype in labs().
means2 <- group_by(data, type) %>%
summarise(mean = mean(value)) %>%
mutate(type = paste("Mean", str_to_upper(type), sep = "."))
means2
#> # A tibble: 2 x 2
#> type mean
#> <chr> <dbl>
#> 1 Mean.A 50.3
#> 2 Mean.B 99.9
ggplot(data, aes(x = value)) +
geom_density(aes(fill = type), alpha = 0.3) +
geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means2) +
scale_color_manual(values = c(Mean.A = "red", Mean.B = "darkblue")) +
scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density") +
labs(color = NULL, linetype = NULL)
Created on 2018-06-05 by the reprex package (v0.2.0).
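If you do want to set the linetypes by hand, for example solid and dashed as in the question, a minimal sketch could add a manual linetype scale keyed by the same names as the color scale. The means are computed with base R here only to keep the block self-contained; that part is my assumption, not the approach shown above.

```r
library(ggplot2)

set.seed(1234)
data <- data.frame(value = rnorm(n = 10000, mean = 50, sd = 20),
                   type = sample(letters[1:2], size = 10000, replace = TRUE))
data$value[data$type == "b"] <- data$value[data$type == "b"] + 50

# Per-type means, relabelled as "Mean.A" / "Mean.B"
means2 <- aggregate(value ~ type, data = data, FUN = mean)
names(means2) <- c("type", "mean")
means2$type <- paste("Mean", toupper(means2$type), sep = ".")

p <- ggplot(data, aes(x = value)) +
  geom_density(aes(fill = type), alpha = 0.3) +
  geom_vline(aes(xintercept = mean, color = type, linetype = type), data = means2) +
  # Named vectors tie each value to a key, independent of ordering
  scale_color_manual(values = c(Mean.A = "red", Mean.B = "darkblue")) +
  scale_linetype_manual(values = c(Mean.A = "solid", Mean.B = "dashed")) +
  scale_fill_manual(values = c(a = "green", b = "blue"), name = "Density") +
  # Same (empty) title on both scales, so they share one legend
  labs(color = NULL, linetype = NULL)
p
```

Because the color and linetype scales share the same keys and the same title, they should still collapse into a single legend for the mean lines.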
This question already has an answer here:
How can I add hatches, stripes or another pattern or texture to a barplot in ggplot?
(1 answer)
Suppose I want to show in a barplot the gene expression results (logFC) based on RNA-seq and q-PCR analysis. My dataset looks like that:
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
My R code:
library(ggplot2)
p <- ggplot(f1, aes(x = geneID, y = logfc, fill = comp, color = exp)) +
geom_bar(stat = "identity", position = position_dodge2(preserve = "single")) +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 1))
I have this output:
I want to add patterns or hatches to the bars corresponding to one of the variables (exp or comp), and to add upper error bars, as shown in the plot below:
Any help, please?
Following the linked answer, it seems quite natural how to extend it to your case. In the example below, I'm using some dummy data structured like the head() data you gave, since the csv link gave me a 404.
library(ggplot2)
library(ggpattern)
#>
#> Attaching package: 'ggpattern'
#> The following objects are masked from 'package:ggplot2':
#>
#> flip_data, flipped_names, gg_dep, has_flipped_aes, remove_missing,
#> should_stop, waiver
# Setting up some dummy data
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
ggplot(f1, aes(x = geneID, y = logfc, fill = comp)) +
geom_col_pattern(
aes(pattern = exp),
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single'),
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_discrete(
guide = guide_legend(override.aes = list(pattern = "none")) # <- hide pattern
)
Created on 2021-04-19 by the reprex package (v1.0.0)
EDIT: if you want to repeat the hatching in the fill legend, you can make an interaction() and then customise a manual fill scale.
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single'),
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-19 by the reprex package (v1.0.0)
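To see why the fill values are repeated with each = 2 and the patterns with rep(..., 3), it can help to look at the level order that interaction() produces. This is a small base R check on the same dummy grid; the names `lv` and `comp_labels` are just for illustration.

```r
f1 <- expand.grid(
  comp = LETTERS[1:3],
  exp = c("qPCR", "RNA-seq"),
  geneID = paste("Gene", 1:4)
)

# Levels of the combined factor: the first argument (exp) varies fastest
lv <- levels(interaction(f1$exp, f1$comp))
lv

# Same label function as in the fill scale: keep the part after the "."
comp_labels <- vapply(strsplit(lv, "\\."), `[`, character(1), 2)
comp_labels
```

So the six fill keys come in pairs per comp (plain, then striped). Note that the split on "." assumes neither variable contains a dot in its values.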
EDIT2: Now with errorbars:
library(ggplot2)
library(ggpattern)
set.seed(42)
f1 <- expand.grid(
comp = LETTERS[1:3],
exp = c("qPCR", "RNA-seq"),
geneID = paste("Gene", 1:4)
)
f1$logfc <- rnorm(nrow(f1))
f1$SE <- runif(nrow(f1), min=0, max=1.5)
ggplot(f1, aes(x = geneID, y = logfc)) +
geom_col_pattern(
aes(pattern = exp,
fill = interaction(exp, comp)), # <- make this an interaction
colour = "black",
pattern_fill = "black",
pattern_angle = 45,
pattern_density = 0.1,
pattern_spacing = 0.01,
position = position_dodge2(preserve = 'single'),
) +
geom_errorbar(
aes(
ymin = logfc,
ymax = logfc + sign(logfc) * SE,
group = interaction(geneID, comp, exp)
),
position = "dodge"
) +
scale_pattern_manual(
values = c("none", "stripe"),
guide = guide_legend(override.aes = list(fill = "grey70")) # <- make lighter
) +
scale_fill_manual(
# Have 3 colours and repeat each twice
values = rep(scales::hue_pal()(3), each = 2),
# Extract the second name after the '.' from the `interaction()` call
labels = function(x) {
vapply(strsplit(x, "\\."), `[`, character(1), 2)
},
# Repeat the pattern over the guide
guide = guide_legend(
override.aes = list(pattern = rep(c("none", "stripe"), 3))
)
)
Created on 2021-04-22 by the reprex package (v1.0.0)
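A note on the errorbar aesthetics above: ymin is pinned to the bar end and ymax is pushed away from zero by SE, so only the outer half of each error bar sticks out, below the bar for negative logFC and above it for positive logFC. A quick check with made-up numbers (not taken from the data):

```r
# Toy values: one negative and one positive bar
logfc <- c(-1.2, 0.8)
SE <- c(0.5, 0.3)

ymin <- logfc                      # error bar starts at the end of the bar
ymax <- logfc + sign(logfc) * SE   # and extends away from zero

data.frame(logfc, SE, ymin, ymax)
```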
I'm interested in visualizing the location of certain words in sentences. Say I have 500 sentences between 3-5 words long and want to visualize the location of word A in each sentence:
Data:
set.seed(123)
w1 <- sample(LETTERS[1:3], 1000, replace = TRUE)
w2 <- sample(LETTERS[1:5], 1000, replace = TRUE)
w3 <- sample(LETTERS[1:6], 1000, replace = TRUE)
w4 <- sample(c(NA,LETTERS[1:7]), 1000, replace = TRUE)
w5 <- sample(c(NA,LETTERS[1:8]), 1000, replace = TRUE)
df <- data.frame(
position = rep(1:5, each = 1000), # position of word in sentence
word = c(w1, w2, w3, w4, w5) # the words in the sentences
)
I can produce the location plot in base R. But the code involves a very slow for loop and does not have the aesthetic qualities of ggplot2. So how can the same type of visualization be produced faster and in ggplot2?
This is the code that produces the location plot in base R:
# Plot dimensions:
x <- rep(1:5, 100)
y <- 1:500
# Plot parameters:
par(mar = c(2, 1.5, 1, 1.5), xpd = TRUE)
# Plot:
plot(y ~ x, type = "n", frame = F, axes = F, ylab="", xlab="",
main="Location of word 'A' in sentences", cex.main=0.9)
axis(1, at=seq(1:5), labels=c("w1", "w2", "w3", "w4", "w5"), cex.axis=0.9)
# Legend:
legend(2.25, 530, c("A", "other", "NA"), fill=c("blue", "orange", "black"),
horiz = T, cex = 0.7, bty = "n", border = "white")
# For loop to print 'A' as color in positions:
for(i in unique(df$position)){
text(i, 1:500, "__________", cex = 1,
col = ifelse(df[df$position==i,]$word=="A", "blue", "orange"))
}
One option is geom_segment with a conditional color aesthetic.
I am using ggh4x for the truncated axis.
library(tidyverse)
library(ggh4x)
# number the sentences within each position (used as the y coordinate)
df <- df %>%
group_by(position) %>%
mutate(index = row_number())
ggplot(df, aes( color = word=="A")) +
geom_segment(aes(x = position-.4, xend = position+.4,
y = index, yend = index),
key_glyph= "rect") +
scale_color_manual(name = NULL,
values=c(`TRUE` = "blue", `FALSE` = "orange"),
labels = c(`TRUE` = "A", `FALSE` = "other"),
na.value="black")+
guides(x = "axis_truncated") +
scale_x_continuous(breaks = 1:5, labels = paste0("w", 1:5))+
theme_classic() +
theme(axis.line.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
plot.title = element_text(hjust = .5),
legend.position = "top") +
labs( y = NULL, x = NULL, title = "Location of A")
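The per-position row index is the only data preparation this plot needs. If you want to avoid dplyr for that step, a possible base R equivalent uses ave(); the toy data frame below is made up to keep the check small.

```r
# Toy data: 3 positions with 4 sentences each (hypothetical)
toy <- data.frame(position = rep(1:3, each = 4),
                  word = c("A", "B", "A", NA, "C", "A", "B", "B", NA, "A", "A", "C"))

# Running row number within each position, like group_by() + row_number()
toy$index <- ave(seq_along(toy$position), toy$position, FUN = seq_along)
toy
```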
Here's an initial attempt. (I'm not quite clear, are you looking to show just the first 500 of the 1000 sentences?)
My approach here is to first summarize the data in terms of contiguous sections that are A / other / NA. This way, the plot area is filled exactly without needing to tweak line thickness, and it should plot more quickly by reducing the number of plotted elements.
library(dplyr)
df_plot <- df %>%
mutate(A_spots = case_when(word == "A" ~ "A",
word != "A" ~ "other",
TRUE ~ "NA")) %>%
group_by(position) %>%
mutate(col_chg = A_spots != lag(A_spots, default = ""),
group_num = cumsum(col_chg)) %>%
ungroup() %>%
count(position, group_num, A_spots)
library(ggplot2)
ggplot(df_plot, aes(position, n, fill = A_spots, group = group_num)) +
geom_col() +
scale_x_continuous(name = NULL, breaks = 1:5, # stolen from @tjebo's answer
labels = paste0("w", 1:5))+
scale_fill_manual(
values = c("A" = "blue","other" = "orange", "NA" = "black")) +
labs(title = "Location of word 'A' in sentences") +
theme_minimal()
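The lag / cumsum / count steps above amount to a run-length encoding within each position. The idea can be seen on a single made-up column with base R's rle(); this is only an illustration of the summarising step, not a replacement for the pipeline.

```r
# One hypothetical column of already-classified words
spots <- c("A", "A", "other", "other", "other", "NA", "A")

# Collapse consecutive repeats into (value, length) pairs
runs <- rle(spots)

# Same shape as df_plot for one position: one row per contiguous block
data.frame(group_num = seq_along(runs$lengths),
           A_spots = runs$values,
           n = runs$lengths)
```

Seven rows collapse to four blocks here, which is why the stacked columns need far fewer plotted elements than one segment per sentence.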
I'm trying to fix my legend text so that the text is representing the appropriate symbols and color. However, I have a lot of variables that I need to include in the legend, and they are all in different columns. Does anyone know a quick way to indicate what the colours and symbol are in the ggplot legend?
Here is some sample code
library(dplyr) # for if_else()
library(ggplot2)
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
temps$Shape = if_else(temps$Growth.Phase == "exponential", 21,
if_else(temps$Growth.Phase == "stationary", 22, 23))
#Graph code
ggplot(data = temps, aes(x = Temperature, y = "Proportions", shape = factor(Shape))) +
geom_point(aes(y = Carbohydrates),colour = "darkred",
fill = "darkred", size = 3) +
geom_line(aes(y = Carbohydrates), size = 1, col = "darkred") +
geom_point(aes(y = Lipids), colour = "darkblue",
fill = "darkblue", size = 3) +
geom_line(aes(y = Lipids), size = 1, col = "darkblue") +
geom_point(aes(y = Protein), colour = "violet",
fill = "violet", size = 3) +
geom_line(aes(y = Protein), size = 1, col ="violet") +
geom_point(aes(y = DNA.RNA), colour = "darkorange",
fill = "darkorange", size = 3) +
geom_line(aes(y = DNA.RNA), size = 1, col = "darkorange") +
geom_point(aes(y = Chlorophyll), size = 3, colour = "darkgreen",
fill = "darkgreen") +
geom_line(aes(y = Chlorophyll), size = 1, col = "darkgreen") +
labs(x = "Temperature (°C)", y = "Proportion")
This is the image I am getting
But as you can see it's not giving me the correct text in the legend. I would like the symbols to specify which Growth.Phase they are and the colour to specify what column I have plotted (ie. Carbohydrate, Protein etc....). Does anyone know a quick fix?
When I use my own data this is what the graph looks like, please note the lines are going through the same symbols, and are the same colours
I'm not sure whether I got the legend right. But the idea is the same as in @dc37's answer. Your plot can be considerably simplified using pivot_longer:
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
temps_long <- temps %>%
pivot_longer(-c(Temperature, Growth.Phase)) %>%
mutate(
shape = case_when(
Growth.Phase == "exponential" ~ 21,
Growth.Phase == "stationary" ~ 22,
TRUE ~ 23
),
color = case_when(
name == "Carbohydrates" ~ "darkred",
name == "Lipids" ~ "darkblue",
name == "Protein" ~ "violet",
name == "DNA.RNA" ~ "darkorange",
name == "Chlorophyll" ~ "darkgreen",
TRUE ~ NA_character_
),
)
# named color vector
colors <- select(temps_long, name, color) %>%
distinct() %>%
deframe()
# named shape vector
shapes <- select(temps_long, Growth.Phase, shape) %>%
distinct() %>%
deframe()
ggplot(data = temps_long, aes(x = Temperature, y = value, shape = Growth.Phase, color = name, fill = name, group = Temperature)) +
geom_point(size = 3) +
geom_line(size = 1) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
labs(x = "Temperature (C)", y = "Proportion", color = "XXXX") +
guides(fill = "none", shape = guide_legend(override.aes = list(fill = "black")))
Created on 2020-04-04 by the reprex package (v0.3.0)
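The named vectors are what let the manual scales match values to data by name rather than by position. If the case_when / deframe round trip feels heavy, they could also be written out directly; a sketch with setNames(), using the same colors and shape codes as above:

```r
# Color per measured component, keyed by the column name
colors <- setNames(
  c("darkred", "darkblue", "violet", "darkorange", "darkgreen"),
  c("Carbohydrates", "Lipids", "Protein", "DNA.RNA", "Chlorophyll")
)

# Point shape per growth phase
shapes <- setNames(c(21, 22, 23), c("exponential", "stationary", "death"))

colors[["Lipids"]]
shapes[["death"]]
```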
To make your code simpler and avoid repeating the same lines several times, you can transform your data into a longer format and then map the new variables to color, fill and shape in your aes.
Then, using scale_color_manual or scale_shape_manual, you can set the appropriate colors and shapes.
To draw lines between the appropriate points, I add a "Rep" column to mimic the presence of replicates in your experiment. Otherwise, geom_line can't decide which points belong together.
library(tidyr)
library(dplyr)
library(ggplot2)
temps %>% mutate(Rep = rep(1:3,each = 3)) %>%
pivot_longer(cols = Carbohydrates:Protein, names_to = "Type", values_to = "proportions") %>%
ggplot(aes(x = Temperature, y = proportions))+
geom_point(aes(fill = Type, shape = Growth.Phase, color = Type), size = 3)+
geom_line(aes( color = Type, group =interaction(Rep, Type)))+
scale_color_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_fill_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_shape_manual(values = c(23,21,22))+
labs(x = "Temperature (°C)", y = "Proportion")
Does it answer your question?
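To see what the Rep column buys you: in the sample data the three temperatures are recycled over nine rows, so numbering the rows in blocks of three gives each replicate exactly one point per temperature. A small check on a stripped-down copy of the data (only the two relevant columns, written out by hand):

```r
# Temperature as it ends up after recycling, plus the replicate id
temps <- data.frame(Temperature = rep(c(15, 25, 35), times = 3),
                    Rep = rep(1:3, each = 3))

# Each replicate should have exactly one row per temperature
tab <- table(temps$Rep, temps$Temperature)
tab
```

With one point per temperature in every Rep / Type group, geom_line has an unambiguous path to draw.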
In this SO answer, user @Crops shows how to add a legend to a ggalt::geom_dumbbell plot. Very nice.
library(ggalt)
df <- data.frame(trt=LETTERS[1:5], l=c(20, 40, 10, 30, 50), r=c(70, 50, 30, 60, 80))
df2 = tidyr::gather(df, group, value, -trt)
ggplot(df, aes(y = trt)) +
geom_point(data = df2, aes(x = value, color = group), size = 3) +
geom_dumbbell(aes(x = l, xend = r), size=3, color="#e3e2e1",
colour_x = "red", colour_xend = "blue",
dot_guide=TRUE, dot_guide_size=0.25) +
theme_bw() +
scale_color_manual(name = "", values = c("red", "blue") )
I want to sort trt descending on r. I tried replacing y = trt with y = reorder(trt, r), but I get an error that object r is not found.
Here is a way where we reorder the factor levels of trt in df and df2 before we plot.
# reorder factor levels
df$trt <- reorder(df$trt, df$r)
df2$trt <- factor(df2$trt, levels = levels(df$trt))
ggplot(df, aes(y = trt)) +
geom_point(data = df2, aes(x = value, color = group), size = 3) +
geom_dumbbell(aes(x = l, xend = r), size=3, color="#e3e2e1",
colour_x = "red", colour_xend = "blue",
dot_guide=TRUE, dot_guide_size=0.25) +
theme_bw() +
scale_color_manual(name = "", values = c("red", "blue") )
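A quick way to check the resulting order without plotting is to look at the factor levels directly. ggplot2 places the first level of a discrete y axis at the bottom, so ascending levels put the largest r at the top; negating r flips that. The sketch below uses only base R.

```r
df <- data.frame(trt = LETTERS[1:5],
                 l = c(20, 40, 10, 30, 50),
                 r = c(70, 50, 30, 60, 80))

# Levels sorted by increasing r (first level is drawn at the bottom)
asc <- levels(reorder(df$trt, df$r))
asc

# Levels sorted by decreasing r
desc <- levels(reorder(df$trt, -df$r))
desc
```

Reordering before plotting also sidesteps the "object r not found" error, which presumably arises because the geom_point layer uses df2, where there is no r column.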
Another option is the dumbbell package:
library(dplyr)
library(ggplot2)
## Reformat data
df3 <- df %>% arrange(r)
df2 <- df %>% mutate(key = "trt")
df2$trt <- factor(df2$trt, levels = df3$trt)
## Plot
dumbbell::dumbbell(df2, id = "trt", column1 = "l", column2 = "r", key = "key",
delt = 1, textsize = 3, lab1 = "l", lab2 = "r", pt_val = 1, pointsize = 3,
pt_alpha = 0.6, arrow = 1, leg = "Add legend title", pval = 2) +
xlim(8, 85) +
facet_wrap(key ~ .)
I added some bells and whistles; you can remove them by toggling the options.
I don't have enough points to embed the image, so here is the link. Hope someone finds it useful.
I have looked through similar questions and feel I have tried everything, but I am still not getting the desired output. I am using the ggplot2 and tidyquant packages to visualize data with two financial trend lines, and I am trying to display a legend that shows the trend line colors on the plot.
data %>%
ggplot(aes(date, price)) +
geom_line() +
geom_ma(ma_fun = SMA, n = 50, size = 1 , col = "red" , show.legend = TRUE)+
geom_ma(ma_fun = SMA, n = 200, size = 1 , col = "blue", show.legend= TRUE)+
theme_tq()
Here you go:
library(tidyquant)
library(ggplot2)
data <- data.frame(date = 1:1000, price = cumsum(rnorm(1000)))
data %>%
ggplot(aes(date, price)) +
geom_line() +
geom_ma(aes(color = 'MA50'), ma_fun = SMA, n = 50, size = 1 ,show.legend = TRUE)+
geom_ma(aes(color = 'MA200'), ma_fun = SMA, n = 200, size = 1 , show.legend = TRUE) +
scale_colour_manual(name = 'Legend',
guide = 'legend',
breaks = c('MA50', 'MA200'), # fix the key order so the labels below line up
values = c('MA50' = 'red',
'MA200' = 'blue'),
labels = c('SMA(50)',
'SMA(200)'))
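One thing to watch with manual scales: unnamed labels are paired with the legend keys by position, and when no breaks are given a discrete scale orders its keys alphabetically, where "MA200" sorts before "MA50". Naming the labels avoids having to think about that order. A small base R check (the names `keys` and `labels` are hypothetical):

```r
keys <- c("MA50", "MA200")

# Alphabetical order puts the 200-day key first
sorted_keys <- sort(keys)
sorted_keys

# Named labels can be looked up by key, whatever the order
labels <- c(MA50 = "SMA(50)", MA200 = "SMA(200)")
labels[sorted_keys]
```

Passing a named vector like this to the labels argument, or giving the keys explicitly via breaks, should keep each color next to the right text.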