This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 4 years ago.
I'm working with the Prosper Loan dataset and I'm trying to show two variables in the same plot using geom_density.
The problem is that when I try to include a legend showing the variable name for the pink area and the variable name for the dark area, it doesn't work.
library(ggplot2)
EstimatedLoss <- c(0.5, 0.2,0.3,0.4,0.8,0.5, 0.2,0.3,0.4,0.8)
EstimatedEffectiveYield <- c(0.10, 0.15,0.18,0.20,0.8,0.15, 0.13,0.22,0.22,0.25)
prosper_loan <- data.frame(EstimatedLoss,EstimatedEffectiveYield)
ggplot(data = prosper_loan) +
geom_density(aes(EstimatedLoss * 100), color = '#e1b582', fill = '#e1b582', alpha = 0.5, show.legend = TRUE ) +
geom_density(aes(EstimatedEffectiveYield * 100), color = '#a2b285',fill = '#a2b285', alpha = 0.7, linetype = 3, size = 1, show.legend = TRUE) +
scale_y_continuous(name = "Density")+
scale_x_continuous(name = "Estimate loss and effective yield in percentage") +
ggtitle('Density from the Estimated loss and effective yield in percentage')
Am I doing anything wrong?
Ideally, your data should be one observation per row (aka "long" data) to properly take advantage of ggplot2. Here's an example of first transforming the data using tidyr::gather. A legend will automatically be added with a fill or color aesthetic.
library(ggplot2)
library(tidyr)
library(magrittr)
EstimatedLoss <- c(0.5, 0.2,0.3,0.4,0.8,0.5, 0.2,0.3,0.4,0.8)
EstimatedEffectiveYield <- c(0.10, 0.15,0.18,0.20,0.8,0.15, 0.13,0.22,0.22,0.25)
prosper_loan <- data.frame(EstimatedLoss, EstimatedEffectiveYield) %>%
gather(key, value, EstimatedLoss:EstimatedEffectiveYield)
ggplot(data = prosper_loan) +
geom_density(aes(value * 100, fill = key, color = key), alpha = 0.5) +
scale_fill_manual(values = c('#e1b582', '#a2b285')) +
scale_color_manual(values = c('#e1b582', '#a2b285')) +
scale_y_continuous(name = "Density")+
scale_x_continuous(name = "Estimate loss and effective yield in percentage") +
ggtitle('Density from the Estimated loss and effective yield in percentage')
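As a side note, tidyr::gather still works, but the tidyr documentation marks it as superseded by pivot_longer. Here is a minimal sketch of the same reshape, assuming a tidyr version that provides pivot_longer. The names wide and prosper_long are made up for illustration, and the key/value column names are kept only so the plotting code above can stay unchanged:

```r
library(tidyr)

EstimatedLoss <- c(0.5, 0.2, 0.3, 0.4, 0.8, 0.5, 0.2, 0.3, 0.4, 0.8)
EstimatedEffectiveYield <- c(0.10, 0.15, 0.18, 0.20, 0.8, 0.15, 0.13, 0.22, 0.22, 0.25)

# Hypothetical name: the original wide data, one column per variable
wide <- data.frame(EstimatedLoss, EstimatedEffectiveYield)

# Stack both columns into one "value" column, with the source
# column name recorded in "key" (one observation per row)
prosper_long <- pivot_longer(
  wide,
  cols = c(EstimatedLoss, EstimatedEffectiveYield),
  names_to = "key",
  values_to = "value"
)
```

Passing prosper_long to ggplot() in place of prosper_loan should give the same plot and legend, since fill = key and color = key still refer to the same column.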
I'm creating a bar chart with a pattern for a subset of the bars, and I want to add error bars.
However, I'm having trouble lining up the error bars with the bars: I want them to appear centered on each bar. How do I do this? Moreover, the legend currently does not clearly distinguish the striped and non-striped bars as corresponding to the not treated and treated groups.
Finally, I'd like to create a version of this plot that stacks adjacent bars (i.e. the bars within each facet_grid panel). Any tips on how to do that would be much appreciated.
The code I'm using is:
library(ggplot2)
library(tidyverse)
library(ggpattern)
models = c("a", "b")
task = c("1","2")
ratios = c(0.3, 0.4)
standard_errors = c(0.02, 0.02)
ymax = ratios + standard_errors
ymin = ratios - standard_errors
colors = c("#F39B7FFF", "#8491B4FF")
df <- data.frame(task = task, ratios = ratios)
df <- df %>% mutate(filler = 1-ratios)
df <- df %>% gather(key = "obs", value = "ratios", -1)
df$upper <- df$ratios + c(standard_errors,standard_errors)
df$models <- c(models,models)
df$lower <- df$ratios - c(standard_errors,standard_errors)
df$col <- c(colors,colors)
df$group <- paste(df$task, df$models, sep="-")
df$treated <- "yes"
df[df$ratios<0.5,]$treated = "no"
p <- ggplot(df, aes(x = group, y = ratios, fill = col, ymin = lower, ymax = upper)) +
stat_summary(aes(pattern=treated),
fun = "mean", position=position_dodge(),
geom = "bar_pattern", pattern_fill="black", colour="black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position=position_dodge(0.9)) +
scale_pattern_manual(values=c("none", "stripe"))+ #edited part
facet_grid(.~task,
scales = "free_x", # Let the x axis vary across facets.
space = "free_x", # Let the width of facets vary and force all bars to have the same width.
switch = "x") + guides(colour = guide_legend(nrow = 1)) +
guides(fill = "none")
p
Here is an option
df %>%
ggplot(aes(x = models, y = ratios)) +
geom_col_pattern(
aes(fill = col, pattern = treated),
pattern_fill = "black",
colour = "black",
pattern_key_scale_factor = 0.2,
position = position_dodge()) +
geom_errorbar(
aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
width = 0.2,
position = position_dodge(0.9)) +
facet_grid(~ task, scales = "free_x") +
scale_pattern_manual(values = c("none", "stripe")) +
scale_fill_identity()
A few comments:
I don't understand the point of creating group. IMO this is unnecessary. TBH, I also don't understand the point of models and task: if task = "1" then models = "a"; if task = "2" then models = "b"; so task and models are redundant as they encode the same thing (whether you call it "1"/"2" or "a"/"b").
The reason why you (originally) didn't see a pattern in the legend is because of the scale factor in the legend key. As per ?scale_col_pattern, you can adjust this with the pattern_key_scale_factor parameter. Here, I've chosen pattern_key_scale_factor = 0.2 but you may want to play with different values.
The reason why the error bars didn't align with the dodged bars was because geom_errorbar didn't know that there are different task-treated combinations. We can fix this by explicitly defining a group aesthetic given by the combination of task & treated values. The reason why you don't need this in geom_col_pattern is because you already allow for different treated values through the pattern aesthetic.
You want to use scale_fill_identity() if you already have actual colour values defined in the data.frame.
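To illustrate the alignment point in isolation, here is a minimal sketch using plain ggplot2 (no ggpattern) and made-up toy data. The names toy, dodge and p_toy are hypothetical. The idea is that both layers get the same grouping and the same dodge width, so the bars and the error bars end up at the same x positions:

```r
library(ggplot2)

# Hypothetical toy data: one x category with two treatment groups
toy <- data.frame(
  model   = c("a", "a"),
  treated = c("no", "yes"),
  ratio   = c(0.3, 0.7),
  se      = c(0.02, 0.02)
)

# One dodge specification shared by both layers
dodge <- position_dodge(width = 0.9)

p_toy <- ggplot(toy, aes(x = model, y = ratio)) +
  # fill = treated implicitly splits the bars into two groups
  geom_col(aes(fill = treated), position = dodge) +
  # the error bars need the same grouping spelled out explicitly
  geom_errorbar(
    aes(ymin = ratio - se, ymax = ratio + se, group = treated),
    width = 0.2, position = dodge
  )

# Dodged x positions of each layer, for comparison
bar_x <- sort(as.numeric(layer_data(p_toy, 1)$x))
err_x <- sort(as.numeric(layer_data(p_toy, 2)$x))
```

If the grouping or the dodge width differs between the two layers, bar_x and err_x will no longer agree, which is the kind of misalignment described in the question.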
I am trying to create a single chart from two bar charts to show the differences in their distributions. I have both charts merging together, and the axis labels are correct. However, I cannot figure out how to get the bars in each section to sit next to each other for comparison instead of overlapping. The data for this chart are two variables within the same data frame. I am relatively new to R and new to ggplot2, so even plotting what I have was a challenge. Please be kind, and I apologize if this question has been answered before.
Here is the code I am using:
Labeled <- ggplot(NULL, aes(lab),position_dodge(.5)) + ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges")+
geom_bar(aes(x=AgeFactor,fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5,width = 0.6) +
geom_bar(aes(x=FdgFactor,fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5,width=.6)+
scale_x_discrete(limits=c("0-6months","7-12months","1-1.99","2-2.99","3-3.99","4-4.99","5-5.99","6-6.99","7-7.99","8-8.99","9-9.99","10-10.99"))+
xlab("Age")+
ylab("")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
scale_fill_discrete(name = "")
and this is the graph it is creating for me:
I really appreciate any insight. This is my first time asking a question on stack too - so I am happy to edit/adjust info as needed.
The issue is that you plot from different columns of your dataset. To dodge your bars, position_dodge requires a column to group the data by. To this end you could reshape your data to long format using e.g. tidyr::pivot_longer, so that your two columns are stacked on top of each other and you get a new column containing the original column (or group) names as categories.
Using some fake random example data. First I replicate your issue with this data and your code:
set.seed(123)
levels <- c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")
Graph <- data.frame(
AgeFactor = sample(levels, 100, replace = TRUE),
FdgFactor = sample(levels, 100, replace = TRUE),
lab = 1:100
)
library(ggplot2)
ggplot(NULL, aes(lab), position_dodge(.5)) +
ggtitle("Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
geom_bar(aes(x = AgeFactor, fill = "Age of Autism Diagnosis"), data = Graph, alpha = 0.5, width = 0.6) +
geom_bar(aes(x = FdgFactor, fill = "Feeding Challenge Onset"), data = Graph, alpha = 0.5, width = .6) +
scale_x_discrete(limits = c("0-6months", "7-12months", "1-1.99", "2-2.99", "3-3.99", "4-4.99", "5-5.99", "6-6.99", "7-7.99", "8-8.99", "9-9.99", "10-10.99")) +
xlab("Age") +
ylab("") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_discrete(name = "")
And now the fix using reshaping. Additionally I simplified your code a bit:
library(tidyr)
library(dplyr)
Graph_long <- Graph %>%
select(AgeFactor, FdgFactor) %>%
pivot_longer(c(AgeFactor, FdgFactor))
ggplot(Graph_long, aes(x = value, fill = name)) +
geom_bar(alpha = 0.5, width = 0.6, position = position_dodge()) +
scale_fill_discrete(labels = c("Age of Autism Diagnosis", "Feeding Challenge Onset")) +
scale_x_discrete(limits = levels) +
labs(x = "Age", y = NULL, fill = NULL, title = "Figure 1. Comparison of Distribution of Age of Diagnosis and Age of Feeding Challenges") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
I'm making a boxplot with the ggplot2 package. However, for some reason, only half of the box is drawn for the "Control" and "Commercial IMD" treatments.
As shown below, when I make the graph with the base "boxplot" function, it is drawn correctly.
mediasCon = tapply(dados$CS, dados$Trat, mean)
boxplot(dados$CS ~ dados$Trat, data = dados, col="gray",
xlab = 'Tratamentos', ylab = 'Espermatozoides - Cabeça Solta')
points(1:3, mediasCon, col = 'Red', pch = 16)
However, when I make the same graph with ggplot2, only half of the box is drawn for the first two treatments. Why is this happening?
Plus, how do I add boxplot "tails" using a ggplot2 function?
library(ggplot2)
ggplot(data=dados, aes(x=Trat, y=CS)) + geom_boxplot(fill=c("#DEEBF7","#2171B5","#034E7B"),color="black") +
xlab('Tratamentos') +
ylab('Espermatozoides - Cabeça Solta') +
stat_summary(fun=mean, colour="black", geom="point",
shape=18, size=5) +
theme(axis.title = element_text(size = 20),
axis.text = element_text(size = 16))
If you look at the help file under ?geom_boxplot you will see:
The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). This differs slightly from the method used by the boxplot() function, and may be apparent with small samples. See boxplot.stats() for more information on how hinge positions are calculated for boxplot().
In your case, the 4 entries for IMD Commercial are c(0, 1, 1, 1), which is certainly a small sample.
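To see the difference on those four values, here is a small base-R sketch. It assumes that ggplot2's hinges correspond to quantile()'s default method (type 7), which is how I read the help text quoted above. The variable names are just for illustration:

```r
x <- c(0, 1, 1, 1)

# Hinges as computed for boxplot() (elements 2 and 4 of the five-number summary)
hinges_boxplot <- boxplot.stats(x)$stats[c(2, 4)]
hinges_boxplot
# 0.5 1.0

# Quartiles with quantile()'s default method (type 7)
hinges_type7 <- unname(quantile(x, c(0.25, 0.75), type = 7))
hinges_type7
# 0.75 1.00

# Quartiles with type = 2, which agree with boxplot() for this sample
hinges_type2 <- unname(quantile(x, c(0.25, 0.75), type = 2))
hinges_type2
# 0.5 1.0
```

With so few points, the lower hinge moves from 0.5 to 0.75 depending on the method, while the median and upper hinge both sit at 1, so the box looks cut in half.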
One way around this is to calculate where you want the hinges to be and pass that data to ggplot, using stat = "identity". This makes the code a bit more complex, but this is often the case when you are trying to modify default behaviour:
library(ggplot2)
library(dplyr)
dados %>%
group_by(Trat) %>%
summarize(median = median(CS), mean = mean(CS),
upper = quantile(CS, 0.75, type = 2),
lower = quantile(CS, 0.25, type = 2),
max = max(CS), min = min(CS)) %>%
ggplot(aes(x = Trat, y = mean, fill = Trat)) +
geom_boxplot(aes(ymin = min, lower = lower,
middle = median, upper = upper, ymax = max),
stat = "identity", color = "black") +
geom_point(size = 3, shape = 21, fill = "red") +
scale_fill_manual(values = c("#DEEBF7","#2171B5","#034E7B")) +
theme_classic() +
xlab('Tratamentos') +
ylab('Espermatozoides - Cabeça Solta')
This is in relation to this question here: Adjust geom_point size so large values are plotted, but do not appear larger in ggplot2?
Specifically, it refers to the response by @Axeman.
I couldn't comment on that question, and so had to ask a new question.
I wish to achieve the "squish"ing for the point size of geom_point, but the option oob = scales::squish doesn't work with scale_size_continuous. I am not sure what I am missing.
Would appreciate any help. Here is the code I tried:
xx = ggplot(pcm, aes(x = variable, y = TF)) +
geom_point(aes(size = value, fill=value), shape = 21) +
scale_size_continuous(range=c(1, 12),
limits = c(-2, 2),
oob = scales::squish)
Further, I want to add that I cannot use scale_size_area as answered by @Axeman because I do not want 0 values to be mapped to points with size 0. The range of my data is approximately -1.7 to +3. I want the smallest size allocated to the lowest negative value. Thanks.
Is it a problem if you handle it with a transformation on value?
Here I just imagined what your data might look like:
pcm <- data.frame(variable = runif(100),
TF = runif(100),
value = runif(100, -1.7, 3))
Here is the plot
library(ggplot2)
ggplot(pcm, aes(x = variable, y = TF)) +
geom_point(aes(size = pmax(pmin(value, 2), -2),
fill = value),
shape = 21) +
labs(size = "value")
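The pmax(pmin(...)) clamp can also be written with scales::squish applied to the data itself rather than passed to the scale. A minimal sketch, assuming the scales package is installed (it is a dependency of ggplot2); the example vector is made up:

```r
library(scales)

# Made-up values, some outside the [-2, 2] limits
value <- c(-3, -1.7, 0, 2.5, 3)

# squish() pulls out-of-range values back to the nearest limit
clamped_squish <- squish(value, range = c(-2, 2))

# Same result with the base-R clamp used in the plot above
clamped_base <- pmax(pmin(value, 2), -2)

clamped_squish
# -2.0 -1.7  0.0  2.0  2.0
```

In the plot, this would mean writing aes(size = scales::squish(value, c(-2, 2))) in place of the pmax/pmin expression.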
Let me know, I may be able to improve my answer.
I would also suggest a couple of tweaks to improve the readability of your chart, but that's up to you.
ggplot(pcm, aes(x = variable, y = TF)) +
geom_point(aes(size = pmax(pmin(value, 2), -2),
colour = value),
alpha = 0.6) +
labs(title = "Dots and colours",
size = "value") +
theme_minimal()
This question already has an answer here:
Changing colour schemes between facets
(1 answer)
Closed 3 years ago.
I am doing a boxplot in ggplot2, but I have been unable to find a way to deal with multiple colors across a 3 x 3 factor design.
This is example code showing what I have been able to do (using this thread as a guide):
library(ggplot2)
data <- data.frame(
value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE)
)
ggplot(data, aes(animals, value)) + geom_boxplot(aes(fill = animals)) +
facet_grid(~region) + scale_fill_brewer()
I am able to use the blue color scale for all of the categories: desert, forest and tundra. You can see the output here.
However, I would like to use a different color scale for each of these categories. For example: a yellow scale for desert, a green scale for forest and a blue scale for tundra. Thanks!
The easiest way is to use alpha for transparency as a dimension, as suggested at the possible dupe. Getting a nice legend for boxplots takes a little extra work; here's a worked example. (Though, since they have x-labels, you could probably just set guide = "none" in the alpha scale; older ggplot2 versions used guide = FALSE, which is now deprecated.)
ggplot(data, aes(animals, value)) +
geom_boxplot(aes(fill = region, alpha = animals)) +
facet_grid( ~ region) +
scale_alpha_discrete(
range = c(0.3, 0.9),
guide = guide_legend(override.aes = list(fill = "black"))) +
scale_fill_manual(values = c("goldenrod2", "forestgreen", "dodgerblue4"))
You can do this in a not-so-elegant way with data manipulation.
library(ggplot2)
library(dplyr)
data <- data.frame(value = sample(1:50),
animals = sample(c("cat","dog","zebra"), 50, replace = TRUE),
region = sample(c("forest","desert","tundra"), 50, replace = TRUE))
data <- data %>%
dplyr::mutate(fill = paste(animals, "-", region))
ggplot(data, aes(animals, value)) +
geom_boxplot(aes(fill = fill), col = "black", show.legend = FALSE) +
facet_grid(~region) +
scale_fill_manual(values = c("gold3", "green3", "blue",
"yellow", "green4", "blue4",
"goldenrod", 'greenyellow', "dodgerblue2"))
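One caveat with the unnamed vector above: the colours are matched to the fill levels in alphabetical order ("cat - desert", "cat - forest", ...), so the mapping silently changes if a level is added or renamed. A small base-R sketch of building a named vector instead; the names animals, regions, shades and fill_values are hypothetical helpers, and the colours are the same ones used above:

```r
animals <- c("cat", "dog", "zebra")
regions <- c("desert", "forest", "tundra")

# One shade per animal (cat, dog, zebra) within each region's colour family
shades <- list(
  desert = c("gold3", "yellow", "goldenrod"),
  forest = c("green3", "green4", "greenyellow"),
  tundra = c("blue", "blue4", "dodgerblue2")
)

# Name each colour with the same "animal - region" label that
# paste(animals, "-", region) produces in the data
fill_values <- unlist(lapply(regions, function(r) {
  setNames(shades[[r]], paste(animals, "-", r))
}))
```

The result can then be passed as scale_fill_manual(values = fill_values), which matches colours to levels by name rather than by position.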