data(mtcars)
library(ggplot2)
ggplot(mtcars, aes(x = reorder(row.names(mtcars), mpg), y = mpg, fill = factor(cyl))) +
geom_bar(stat = "identity")
This plots the bars with solid fills, but what if I want to use the same colors as outlines for some observations and as solid fills for others? For example, if 'am' equals 1 the bar has a solid fill, but if 'am' equals 0 then it is drawn as an outline only, like this sample:
One option to remove the fill based on a logical condition is to change those values to NA.
library(tidyverse)
d <- head(mtcars) %>%
rownames_to_column() %>%
# make a new variable for fill
# note: don't use ifelse on a factor!
mutate(cyl_fill = ifelse(am == 0, NA, cyl),
# now make them factors
# (you can do this inside ggplot, but that is messy)
cyl = factor(cyl),
cyl_fill = factor(cyl_fill, levels = levels(cyl)))
# plot
p <- ggplot(d) +
aes(x = rowname,
y = mpg,
color = cyl,
fill = cyl_fill
) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90))
# change the fill color of NA values
p + scale_fill_discrete(drop=FALSE, na.value="white")
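The warning in the comments about not using ifelse on a factor can be illustrated with a small sketch. The vectors below are made up for illustration and are not part of the answer; the point is only that ifelse() returns the factor's integer codes rather than its labels, which is why the fill variable is built from the raw cyl values first and converted to a factor afterwards.

```r
# Illustrative vectors (hypothetical, not taken from mtcars)
cyl_raw    <- c(4, 6, 8, 6)
cyl_factor <- factor(cyl_raw)
am         <- c(0, 1, 1, 0)

# ifelse() drops the factor attributes, so the result holds the
# underlying integer codes (1, 2, 3) instead of the labels 4, 6, 8
codes <- ifelse(am == 0, NA, cyl_factor)
print(codes)

# Applied to the raw numbers, the real values survive
values <- ifelse(am == 0, NA, cyl_raw)
print(values)
```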
If you want NA fill values to be empty and omitted from the legend:
# omit the fill color of NA values
# note: drop=FALSE is still needed to keep the fill and (outline) color values the same
p + scale_fill_discrete(drop=FALSE, na.translate = F)
You can change the color of the outline in the same way (e.g. cyl_color = ifelse(am != 0, NA, cyl)), but if you want to specify a color like white or black, it will (and should) appear in the legend. You can try to hack your way around these wise defaults by plotting non-aesthetic layers behind your main layers, but it usually gets ugly:
head(mtcars) %>%
rownames_to_column() %>%
mutate(cyl_fill = ifelse(am == 0, NA, cyl),
cyl_color = ifelse(am != 0, NA, cyl),
cyl = factor(cyl),
cyl_fill = factor(cyl_fill, levels = levels(cyl)),
cyl_color = factor(cyl_color, levels = levels(cyl))) %>%
ggplot() +
aes(x = rowname,
y = mpg,
color = cyl_color,
fill = cyl_fill
) +
geom_bar(stat = "identity", color = "black") + # NON-AES LAYER FIRST
geom_bar(stat = "identity") + # Covers up the black except where omitted
theme(axis.text.x = element_text(angle = 90))+
scale_fill_discrete(drop=FALSE, na.translate = F) +
scale_color_discrete(drop=FALSE, na.translate = F)
You could assign the desired colors to each level of the fill and color variables. For example:
library(tidyverse)
mtcars %>%
rownames_to_column() %>%
arrange(mpg) %>%
mutate(rowname=factor(rowname, levels=rowname)) %>%
ggplot(aes(x = rowname, y = mpg, fill = factor(am), colour=factor(cyl))) +
geom_col(size=1) +
scale_fill_manual(values=c("0"="white", "1"="red")) +
scale_color_manual(values=c("4"="blue", "6"="orange", "8"="white")) +
theme_classic() +
theme(axis.text.x=element_text(angle=-90, vjust=0.5, hjust=0))
Maybe we can do this:
library(dplyr)
library(ggplot2)
mtcars %>%
mutate(new = case_when(am == 1 ~ factor(cyl)),
new1 = case_when(am !=1 ~ factor(cyl))) %>%
ggplot(aes(x = reorder(row.names(mtcars), mpg), y = mpg,
fill = new, color = new1)) +
geom_bar(stat = 'identity') +
scale_fill_discrete(na.value= NA) + # similar to Devin Judge-Lord post
theme_classic() +
theme(axis.text.x=element_text(angle=-90, vjust=0.5, hjust=0))
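As a quick check of why this works, here is a minimal sketch (the name flags is only for illustration): case_when() returns NA wherever no condition matches, so every car ends up with exactly one of the two helper columns filled in, and that column drives either the fill or the outline.

```r
library(dplyr)

# Same two helper columns as in the answer above
flags <- mtcars %>%
  mutate(new  = case_when(am == 1 ~ factor(cyl)),
         new1 = case_when(am != 1 ~ factor(cyl)))

# 'new' is NA exactly for the am == 0 cars, 'new1' exactly for the am == 1 cars
table(fill_missing = is.na(flags$new), am = flags$am)
```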
As an example, we can use geom_freqpoly() to examine how hp varies by cyl in the mtcars data.
library(tidyverse)
mtcars %>%
mutate(cyl = as.factor(cyl)) %>%
ggplot() +
aes(x=hp, color=cyl) +
geom_freqpoly(mapping = aes(y = after_stat(ncount)), bins=5)
Using after_stat(ncount), I can make each line be normalized between 0 and 1. However, is there a way to have it so that the sum of all the lines at any point is equal to 1? i.e., at any value of hp, the red, green, and blue lines add to one -- representing the estimated proportion of each cyl type at that value of hp.
This can be achieved with position = "fill", though it looks confusing with lines and is better represented as a filled geom that uses the same statistical transformation as geom_freqpoly:
library(tidyverse)
mtcars %>%
mutate(cyl = as.factor(cyl)) %>%
ggplot() +
aes(x = hp, fill = cyl) +
stat_bin(bins = 5, position = "fill", geom = "area")
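To see what position = "fill" is doing, the same idea can be sketched by hand in base R: bin hp, count cars per bin and cyl, then divide each bin's counts by the bin total. The bin edges below are an assumption for illustration (five equal-width bins across the range of hp) and will not necessarily match the boundaries stat_bin chooses.

```r
# Five equal-width bins over the observed range of hp (illustrative edges)
breaks <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 6)
hp_bin <- cut(mtcars$hp, breaks = breaks, include.lowest = TRUE)

# Cars per bin and cylinder count
counts <- table(hp_bin = hp_bin, cyl = mtcars$cyl)

# Share of each cyl within a bin; every row sums to 1
props <- prop.table(counts, margin = 1)
print(round(props, 2))
print(rowSums(props))
```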
Compare this to the same result using an unfilled geom_freqpoly
mtcars %>%
mutate(cyl = as.factor(cyl)) %>%
ggplot() +
aes(x = hp, color = cyl) +
geom_freqpoly(position = "fill", bins = 5)
I think this is harder to follow.
Another alternative to geom_freqpoly would be geom_density, which permits more visually appealing representations of similar information:
mtcars %>%
mutate(cyl = as.factor(cyl)) %>%
ggplot() +
aes(x = hp, fill = cyl) +
geom_density(position = "fill", alpha = 0.5, color = "white", lwd = 2) +
coord_cartesian(xlim = c(50, 200)) +
scale_fill_brewer(palette = "Set2") +
theme_minimal(base_size = 20) +
labs(y = "Relative density")
Created on 2022-09-05 with reprex v2.0.2
I have a data set like this:
Station;Species;
CamA;SpeciesA
CamA;SpeciesB
CamB;SpeciesA
etc...
I would like to create a stacked bar plot with the camera stations on the x axis and the percentage of each species stacked within each bar. I have tried the following code:
ggplot(data=data, aes(x=Station, y=Species, fill = Species))+ geom_col(position="stack") + theme(axis.text.x =element_text(angle=90)) + labs (x="Cameras", y= NULL, fill ="Species")
And I end up with the following graph:
But clearly I don't have a percentage on the y axis, just the species names, which is, in the end, what I coded for.
How could I have the percentages on the y axis, the cameras on the x axis and the species as a fill?
Thanks !
Using mtcars as example dataset one approach to get a barplot of percentages is to use geom_bar with position = "fill".
library(ggplot2)
library(dplyr)
mtcars2 <- mtcars
mtcars2$cyl = factor(mtcars2$cyl)
mtcars2$gear = factor(mtcars2$gear)
# Use geom_bar with position = "fill"
ggplot(data = mtcars2, aes(x = cyl, fill = gear)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
A second approach would be to manually pre-compute the percentages and make use of geom_col with position="stack".
# Pre-compute percentages
mtcars2_sum <- mtcars2 %>%
count(cyl, gear) %>%
group_by(cyl) %>%
mutate(pct = n / sum(n))
ggplot(data = mtcars2_sum, aes(x = cyl, y = pct, fill = gear)) +
geom_col(position = "stack") +
scale_y_continuous(labels = scales::percent_format()) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Cameras", y = NULL, fill = "Species")
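Applied to data shaped like the question's, the first approach might look like the sketch below. The cams data frame is made up for illustration (the real data were not posted); the key point is that no y aesthetic is mapped, so geom_bar() counts the rows itself and position = "fill" rescales each station's bar to 100%.

```r
library(ggplot2)

# Hypothetical detections, one row per sighting (made-up values)
cams <- data.frame(
  Station = c("CamA", "CamA", "CamA", "CamB", "CamB"),
  Species = c("SpeciesA", "SpeciesB", "SpeciesA", "SpeciesA", "SpeciesB")
)

# geom_bar() counts rows per Station/Species; "fill" scales each bar to 1
p_cams <- ggplot(cams, aes(x = Station, fill = Species)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(x = "Cameras", y = NULL, fill = "Species")

print(p_cams)
```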
Here are two minimal reproducible examples for my question.
In the first one, the x variable is a factor; geom_area does not work there and the output looks like that of geom_segment.
In the second one, I convert the x variable from a factor to an integer; geom_area works, but the axis.text.y labels are not what I want.
Does anyone know how to fix it?
suppressMessages(library(tidyverse))
mtcars %>%
rownames_to_column('index1') %>%
mutate(index1 = index1 %>% as.factor) %>%
mutate(index2 = index1 %>% as.integer) -> df
df %>%
ggplot() +
geom_area(aes(x = index1, y = mpg), color = 'black', fill = 'black') +
coord_flip()
df %>%
ggplot() +
geom_area(aes(x = index2, y = mpg), color = 'black', fill = 'black') +
coord_flip()
Check this solution:
library(tidyverse)
library(wrapr)
df %.>%
ggplot(data = .) +
geom_area(aes(x = index2, y = mpg), color = 'black', fill = 'black') +
coord_flip() +
scale_x_continuous(
breaks = .$index2,
labels = .$index1
)
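For readers who prefer not to depend on wrapr, the same idea can be sketched with the data frame referenced by name. This is a minimal variant, not the answerer's code: it rebuilds df in base R and passes the labels as character strings, keeping the integer positions on the x scale and relabelling them with the original names.

```r
library(ggplot2)

# Rebuild the question's data: a factor label and its integer position
df <- data.frame(index1 = factor(rownames(mtcars)), mpg = mtcars$mpg)
df$index2 <- as.integer(df$index1)

# Continuous x lets geom_area draw one connected area;
# breaks/labels put the car names back on the axis
p_area <- ggplot(df) +
  geom_area(aes(x = index2, y = mpg), color = "black", fill = "black") +
  coord_flip() +
  scale_x_continuous(breaks = df$index2,
                     labels = as.character(df$index1))

print(p_area)
```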
Using mtcars as an example, I've produced some violin plots. I wanted to add two things to this chart:
(1) for each group, list n
(2) for each group, sum a third variable (e.g. wt)
I can do (1) with the geom_text code below although (n) is actually plotted on the x axis rather than off to the side.
But I can't work out how to do (2).
Any help much appreciated!
library(ggplot2)
library(gridExtra)
library(ggthemes)
result <- mtcars
ggplot(result, aes(x = gear, y = drat, group = gear)) +
theme_tufte(base_size = 15) + theme(line=element_blank()) +
geom_violin(fill = "white") +
geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
ylab("drat") +
xlab("gear") +
coord_flip()+
geom_text(stat = "count", aes(label = ..count.., y = ..count..))
You can add both of these annotations by creating them in your dataframe temporarily prior to graphing. Using the dplyr package, you can create two new columns, one with the count for each group, and one with the sum of wt for each group. This can then be piped directly into your ggplot using %>% (alternatively, you could save the new dataset and insert it into ggplot the way you have it). Then with some minor edits to your geom_text call and adding a second one, we can create the plot you want. The code looks like this:
library(ggplot2)
library(gridExtra)
library(ggthemes)
library(magrittr)
library(dplyr)
result <- mtcars
result %>%
group_by(gear) %>%
mutate(count = n(), sum_wt = sum(wt)) %>%
ggplot(aes(x = gear, y = drat, group = gear)) +
theme_tufte(base_size = 15) + theme(line=element_blank()) +
geom_violin(fill = "white") +
geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
ylab("drat") +
xlab("gear") +
coord_flip()+
geom_text(aes(label = paste0("n = ", count),
x = (gear + 0.25),
y = 4.75)) +
geom_text(aes(label = paste0("sum wt = ", sum_wt),
x = (gear - 0.25),
y = 4.75))
The new graph looks like this:
Alternatively, if you create a summary data frame named result_sum, then you can manually add that into the geom_text calls.
result <- mtcars %>%
mutate(gear = factor(as.character(gear)))
result_sum <- result %>%
group_by(gear) %>%
summarise(count = n(), sum_wt = sum(wt))
ggplot(result, aes(x = gear, y = drat, group = gear)) +
theme_tufte(base_size = 15) +
theme(line=element_blank()) +
geom_violin(fill = "white") +
geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
ylab("drat") +
xlab("gear") +
coord_flip()+
geom_text(data = result_sum, aes(label = paste0("n = ", count),
x = (as.numeric(gear) + 0.25),
y = 4.75)) +
geom_text(data = result_sum, aes(label = paste0("sum wt = ", sum_wt),
x = (as.numeric(gear) - 0.25),
y = 4.75))
This gives you this:
The benefit to this second method is that the text isn't bold like in the first graph. The bold effect occurs in the first graph due to the text being printed over itself for all observations in the dataframe.
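The overplotting can be checked directly by comparing row counts, as in this small sketch (the names per_row and per_group are only for illustration): mutate() keeps one row per car, so geom_text draws each label once per car, while summarise() collapses to one row per gear.

```r
library(dplyr)

# First method: group statistics repeated on every row
per_row <- mtcars %>%
  group_by(gear) %>%
  mutate(count = n(), sum_wt = sum(wt)) %>%
  ungroup()

# Second method: one row per gear
per_group <- mtcars %>%
  group_by(gear) %>%
  summarise(count = n(), sum_wt = sum(wt))

nrow(per_row)    # 32: each label is drawn once per car, stacked on itself
nrow(per_group)  # 3: each label is drawn exactly once
```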
Thanks to those who helped. In the end I used the code below, which plots the calculated values. Because one set of classes is text-based, I used vjust to set the vertical offset.
Thanks again!
library(ggplot2)
library(gridExtra)
library(ggthemes)
library(dplyr) # provides %>%, group_by() and summarise() used below
results <- mtcars
results$gear <- as.factor(as.character(results$gear)) #Turn 'gear' to text to simulate classes, then factorise
result_sum <- results %>%
group_by(gear) %>%
summarise(count = n(), sum_wt = sum(wt))
ggplot(results, aes(x = gear, y = drat, group=gear)) +
theme_tufte(base_size = 15) + theme(line=element_blank()) +
geom_violin(fill = "white") +
geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
ylab("drat") +
xlab("gear") +
coord_flip()+
geom_text(data = result_sum, aes(label = paste0("n = ", count), x = (gear), vjust= 0, y = 5.25)) +
geom_text(data = result_sum, aes(label = paste0("sum wt = ", round(sum_wt,0)), x = (gear), vjust= -2, y = 5.25))
I am having trouble drawing "dodged" lines on "dodged" stacked bars.
dt = mtcars %>% group_by(am, cyl) %>% summarise(m = mean(disp))
dt0 = dt[dt$am == 0, ]
dt1 = dt[dt$am == 1, ]
dt0 %>% ggplot(aes(factor(cyl), m, fill = factor(cyl))) + geom_bar(stat = 'identity', position = 'dodge') +
geom_point(data = dt1, aes(factor(cyl), m, colour = factor(cyl)), position=position_dodge(width=0.9), colour = 'black')
What I would like is to draw a line from the top of the stacked bar to the black points of each cyl.
dt0 %>% ggplot(aes(factor(cyl), m, fill = factor(cyl))) + geom_bar(stat = 'identity', position = 'dodge') +
geom_point(data = dt1, aes(factor(cyl), m, colour = factor(cyl)), position=position_dodge(width=0.9), colour = 'black') +
geom_line(data = dt1, aes(factor(cyl), m, colour = factor(cyl), group = 1), position=position_dodge(width=0.9), colour = 'black')
However, position = position_dodge(width = 0.9) doesn't work here.
Any ideas?
This is much easier to accomplish if you reshape your summary data:
library(dplyr) # group_by(), summarise(), %>%
library(tidyr) # spread()
dt <- mtcars %>%
group_by(am, cyl) %>%
summarise(m = mean(disp)) %>%
spread(am, m)
    cyl        0        1
* <dbl>    <dbl>    <dbl>
1     4 135.8667  93.6125
2     6 204.5500 155.0000
3     8 357.6167 326.0000
While "0" and "1" are poor column names, they can still be used in aes() if you quote them in backticks. The calls to position_dodge() also become unnecessary:
dt %>% ggplot(aes(x = factor(cyl), y = `0`, fill = factor(cyl))) +
geom_bar(stat = 'identity') +
geom_point(aes(x = factor(cyl), y = `1`), colour = 'black') +
geom_segment(aes(x = factor(cyl), xend = factor(cyl), y = `0`, yend = `1`))
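As a hedged variant of the same reshaping idea, tidyr's pivot_wider() (the current replacement for spread()) can add a prefix to the new columns so that backticks are not needed. This swaps in a different function from the one used above, and the am_ prefix and the names dt_wide and p_seg are choices made here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# One row per cyl, with the mean disp for am = 0 and am = 1 side by side
dt_wide <- mtcars %>%
  group_by(am, cyl) %>%
  summarise(m = mean(disp), .groups = "drop") %>%
  pivot_wider(names_from = am, values_from = m, names_prefix = "am_")

# Bars for am = 0, points for am = 1, and a segment joining the two
p_seg <- ggplot(dt_wide, aes(x = factor(cyl))) +
  geom_col(aes(y = am_0, fill = factor(cyl))) +
  geom_segment(aes(xend = factor(cyl), y = am_0, yend = am_1)) +
  geom_point(aes(y = am_1), colour = "black")

print(p_seg)
```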