Can't draw a concentric pie chart in R

I have the following data:
Phyla               V4    Fl
<chr>            <dbl> <dbl>
Proteobacteria   88.58 81.43
Firmicutes        7.33 15.34
Actinobacteriota  1.55  1.94
Bacteroidota      2.20  1.25
I want to display the data using a concentric pie chart. I have made a couple of attempts:
mycols <- c("#eee0b1", "#da8a67", "#e63e62", "#0033aa")
ggplot(df, aes(x = 2, y = V4, fill = Phyla)) +
geom_bar(stat = "identity", color = "white") +
coord_polar(theta = "y", start = 0)+
geom_text(aes(y = Fl, label = V4), color = "white")+
scale_y_continuous(breaks=min(df$Fl):max(df$Fl)) +
scale_fill_manual(values = mycols) +
theme_void()+
xlim(0.5, 2.5)
This generates a plot in which only one column is displayed.
The other trial used this:
pie(x=c(88.58,7.33,1.55,2.2),labels="",
col=c("#eee0b1", "#da8a67", "#e63e62", "#0033aa"))
par(new=TRUE)
pie(x=c(81.43,15.34,1.94, 1.25),labels=c("Proteobacteria","Firmicutes","Actinobacteriota", "Bacteroidota"),radius=.5,
col=c("#eee0b1", "#da8a67", "#e63e62", "#0033aa"))
that generates this figure:
I do not know which is easier to fix to generate the concentric pie. I need to include the color legend and label each pie with the category name (V4, Fl) along with adding the values as percentages.

You may try this:
library(tidyverse)  # attaches dplyr (%>%), tidyr (pivot_longer) and ggplot2
df %>%
  pivot_longer(-Phyla, names_to = "type", values_to = "y") %>%
  ggplot(aes(x = type, y = y)) +
  geom_bar(aes(fill = Phyla), stat = "identity",
           color = "white", position = "fill", width = 0.7) +
  coord_polar(theta = "y", start = pi/2) +
  geom_text(aes(y = y, group = Phyla, label = y),
            color = "white", position = position_fill(vjust = 0.5)) +
  geom_text(aes(x = x, y = y, label = type),
            data = data.frame(x = c(2.5, 3.5), y = c(0, 0), type = c("V4", "Fl"))
  ) +
  scale_fill_manual(values = mycols) +
  scale_x_discrete(limits = c(NA, "V4", "Fl")) +
  theme_void()
pivot_longer() reshapes your data from "wide" to "long", so that each column (V4, Fl) can be drawn as its own ring.
position = "fill" in geom_bar() and position_fill() in geom_text() rescale the y values of each column to [0, 1], so that the two rings line up.
vjust = 0.5 in position_fill() places each label in the middle of its slice.
It is a little difficult to label the rings directly using the x-axis text, but you can label them manually using geom_text() with a new data.frame(x = c(2.5, 3.5), y = c(0, 0), type = c("V4", "Fl")).
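To make the rescaling concrete, here is a minimal base-R sketch of what position = "fill" does to the numbers. The names long, share and label are made up for this illustration and are not part of the answer above. The V4 and Fl columns sum to 99.66 and 99.96 rather than exactly 100, so the drawn shares differ slightly from the raw values printed as labels.

```r
# Data from the question
df <- data.frame(
  Phyla = c("Proteobacteria", "Firmicutes", "Actinobacteriota", "Bacteroidota"),
  V4 = c(88.58, 7.33, 1.55, 2.20),
  Fl = c(81.43, 15.34, 1.94, 1.25)
)

# Wide -> long by hand (the shape pivot_longer() produces for these two columns)
long <- data.frame(
  Phyla = rep(df$Phyla, times = 2),
  type  = rep(c("V4", "Fl"), each = nrow(df)),
  y     = c(df$V4, df$Fl)
)

# position = "fill" divides every value by the total of its ring,
# so each ring sums to 1 regardless of the raw column total
long$share <- ave(long$y, long$type, FUN = function(v) v / sum(v))

# Hypothetical percentage labels built from the rescaled shares
long$label <- sprintf("%.1f%%", 100 * long$share)

print(long)
```

If you want percentages on the slices, a column like label could be mapped with aes(label = label) in geom_text() instead of the raw y; that is one option, not the only one.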


Set size line plot with different y axis as addition to a stacked barplot

I would like to plot a stacked barplot with an added line plot that shows the overall set sizes. I can draw the stacked barplot in ggplot2 without problems; the difficulty is the additional line with a different y axis. I'm using a long-format table as input, so there is no 'overall size' column.
Code to reproduce sample table:
df <- data.frame(Sample=c("S1","S2","S3","S4","S5","S6"), A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(setDT(df), id.vars = "Sample", variable.name = "Group")
Head of the table:
Sample Group value
1: S1 A 30
2: S2 A 52
3: S3 A 50
4: S4 A 81
5: S5 A 23
6: S6 A 48
Code to draw stacked barplot:
ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
The line should run across the six stacked bars and show the size of each set, e.g. 57 (A + B + C) for sample S1, and the y-axis labels on the right of the plot should show the set-size range.
You can pass the data set directly to each geom, which lets you use a different data set per layer. Secondary axes are a bit tricky: they have to be a transformation of the primary axis, and the data must be rescaled accordingly. I've used 120 as the scaling factor.
percent <- c("0%", "25%", "50%", "75%", "100%")
set_sizes <- df %>%
rowwise %>%
mutate(Size = sum(A, B, C))
ggplot() +
geom_col(df.melt, mapping = aes(x = Sample, y = value, fill = Group),position = position_fill(reverse = TRUE)) +
geom_line(set_sizes, mapping = aes(x = Sample, y = Size / 120, group = 1)) +
scale_y_continuous(name = "% of Total", labels = percent, sec.axis = sec_axis(~ .*120, name = "Sample Size")) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
scale_x_discrete(limits = unique(df.melt$Sample))
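The arithmetic behind the factor of 120 can be checked on its own. This is only a base-R sketch; sizes, scale_factor, line_y and axis_back are names invented here. The filled bars live on a 0 to 1 axis, so the set sizes are divided by a factor at least as large as the biggest set to fit on that axis, and the secondary axis multiplies by the same factor to show the original units.

```r
# Sample table from the question
df <- data.frame(
  Sample = c("S1", "S2", "S3", "S4", "S5", "S6"),
  A = c(30, 52, 50, 81, 23, 48),
  B = c(12, 20, 15, 22, 30, 14),
  C = rep(15, 6)
)

# Overall set size per sample (A + B + C)
sizes <- rowSums(df[, c("A", "B", "C")])

# Factor used in the answer; it only needs to be >= max(sizes)
scale_factor <- 120

# Forward transform: what geom_line() actually draws on the 0..1 axis
line_y <- sizes / scale_factor

# Inverse transform: what sec_axis(~ . * 120) prints on the right-hand axis
axis_back <- line_y * scale_factor

print(data.frame(Sample = df$Sample, sizes, line_y, axis_back))
```

A smaller factor than max(sizes) would push part of the line above 1, i.e. outside the range covered by the filled bars.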
Alternatively, you can use cowplot to arrange two independent plots on top of each other, e.g.:
suppressMessages(invisible(lapply(c("data.table", "ggplot2", "cowplot"),
require, character.only=TRUE)))
df <- data.table(Sample=c("S1","S2","S3","S4","S5","S6"),
A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(df, id.vars = "Sample", variable.name = "Group")
percent <- paste0(sprintf("%s", seq(0, 100, 25)), "%")
p1 <- ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
p2 <- ggplot(df.melt[, .(value=sum(value)), by="Sample"],
aes(x = Sample, y = value, group=1)) +
geom_line() +
scale_x_discrete(labels = NULL, breaks = NULL) +
labs(x = NULL)
plot_grid(p2, NULL, p1, align="hv", nrow=3, axis='tlbr', rel_heights=c(1, -.28, 4), greedy=FALSE)
Created on 2022-02-20 by the reprex package (v2.0.1)

Include 2nd variable labels on an existing Variable vs sample plot geom_jitter

I have a geom_jitter plot showing Variables between 2 samples, I would like to include the Group-variable parameters on the left of the plot, setting a separation by lines like in the figure below. Thus, Variables are organised by Group.
Here is a reproducible example:
data<- tibble::tibble(
Variable = c("A","B","C","D","E", "F"),
Group = c("Asia","Asia","Europe","Europe","Africa","America"),
sample1 = c(0.38,0.22,0.18,0.12,0.1,0),
sample2 = c(0.23,0.2,0,0.12,0.11,0.15))
library(reshape2)
data2<- melt(data,
id.vars=c("Variable", "Group"),
measure.vars=c("sample1", "sample2"),
variable.name="Sample",
value.name="value")
data2[is.na(data2)] <- 0
library(ggplot2)
ggplot(data2, aes(x = Sample, y = Variable, label=NA)) +
geom_point(aes(size = value, colour = value)) +
geom_text(hjust = 1, size = 2) +
# scale_size(range = c(1,3)) +
theme_bw()+
scale_color_gradient(low = "lightblue", high = "darkblue")
Here is the current output I have:
And this is the format I would like:
To get a polished version of the plot most similar to your ideal plot, you can use facet_grid() plus some theme() customization.
ggplot(data2, aes(x = Sample, y = Variable, label=NA)) +
geom_point(aes(size = value, colour = value)) +
geom_text(hjust = 1, size = 2) +
# scale_size(range = c(1,3)) +
theme_bw()+
scale_color_gradient(low = "lightblue", high = "darkblue") +
facet_grid(Group~., scales = "free", switch = "y") +
theme(strip.placement = "outside",
strip.text.y = element_text(angle = 180),
panel.spacing = unit(0, "cm"))

adding legend to plot with 2 geom points

I have this plot
dat = data.frame(group = rep("A",3),subgroup= c("B","C","D"), value= c(4,5,6),avg = c(4.5,4.5,4.5))
ggplot(dat, aes(x= group, y =value, color = fct_rev(subgroup) ))+
geom_point()+
geom_point(data = dat ,aes(x = group, y = avg), color = "blue",pch = 17, inherit.aes = FALSE)
I need to show 2 legends: one for fct_rev(subgroup), which is already there, and one for "avg", which is missing.
How can I add a legend entry that is a blue triangle (pch 17) with the title "avg"?
Thank you.
Maybe like this?
ggplot(dat, aes(x= group, y =value, color = fct_rev(subgroup) ))+
geom_point()+
geom_point(data = dat ,aes(x = group, y = avg,shape = "Mean"),
color = "blue", inherit.aes = FALSE) +
scale_shape_manual(values = c('Mean' = 17))
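For what it's worth, here is a self-contained variant of the same idea, written as a sketch rather than a definitive fix. It assumes ggplot2 is installed, maps color = subgroup directly (dropping fct_rev() so that forcats is not needed), and uses the string "avg" as the legend label; the object name p is arbitrary. The point is that a constant placed inside aes() is treated like data, so it gets its own scale and legend entry, while color = "blue" outside aes() stays a fixed setting.

```r
library(ggplot2)

# Data from the question
dat <- data.frame(
  group    = rep("A", 3),
  subgroup = c("B", "C", "D"),
  value    = c(4, 5, 6),
  avg      = rep(4.5, 3)
)

p <- ggplot(dat, aes(x = group, y = value)) +
  # First legend: one colour per subgroup
  geom_point(aes(color = subgroup)) +
  # Second legend: the constant "avg" inside aes() creates a shape scale
  geom_point(aes(y = avg, shape = "avg"), color = "blue") +
  # Map that single level to a filled triangle and drop the legend title
  scale_shape_manual(name = NULL, values = c(avg = 17))

print(p)
```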
Using data from original post.
Legends do not work like that in ggplot. Why not add a geom_text at the average? I see that you have a column with the average being repeated. This seems like a bad way to handle the data, but irrelevant right now.
My proposed solution:
ggplot(dat)+
geom_point(aes(x= group, y =value, color = subgroup))+
geom_point(aes(x = group, y = avg), color = "blue",pch = 17, inherit.aes = FALSE) +
geom_text(aes(x=1, y = 4.5), label = "avg", nudge_x = .1)
You could also add a hline to symbolize the average, which would aesthetically look nicer.

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep order of the plot, so converted policy to factor and wanted % to be shown only on the first line.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_segment() and geom_point(). Working with your df and changing the variable references from "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift the text up and avoid overlap
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # use library(scales) nice math instead of pasting
This produces a plot with some overlapping labels. I think you could avoid that if you use just the first letter of party like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
Only label the top issue with names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
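A side note on the diff column from the question, offered as a sketch and not as the cause of the reported error. as.integer() truncates toward zero, so a product that lands just below a whole number because of floating-point rounding could come out one lower. The hard-coded "+" is also kept when Republicans are ahead, and arrange(desc(diff)) sorts the resulting strings as text. One possible alternative in base R, with made-up names gap, diff_label and ord:

```r
# Values from the question
democrats   <- c(0.54, 0.82, 0.30)
republicans <- c(0.23, 0.38, 0.40)

# Round before converting, so floating-point noise cannot drop a point
gap <- as.integer(round((democrats - republicans) * 100))

# "%+d" prints an explicit sign for both positive and negative gaps
diff_label <- sprintf("%+d", gap)

# Order rows by the numeric gap, not by the character label
ord <- order(gap, decreasing = TRUE)

print(data.frame(gap, diff_label)[ord, ])
```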

ggplot2 legend with two different geom_point

I have the following ggplot graph with circles representing the observed data and the crosses the mean for each treatment :
d <- data.frame(Number = rnorm(12,100,20),
Treatment = rep(c("A","B","C", "D"), each = 3))
av <- aggregate(d["Number"], d["Treatment"], mean)
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(shape = 1, size = 6, color = "grey50") +
geom_point(data=av, shape = 4) +
theme_bw()
I would like to add a legend with the exact same symbols on top of the graph, but I'm a bit lost. I used aes() to force the creation of a legend and then tried to modify it with manual scales, but the result is not convincing. I would like to have one grey circle of size 6. This also seems quite complicated for such a basic thing; there is probably an easier solution.
ggplot(data = d, aes(y = Number, x = Treatment)) +
geom_point(aes(shape = "1", size = "6", color = "grey50")) +
geom_point(data=av, aes(shape = "4")) +
theme_bw() +
scale_shape_manual(name = "", values = c(1,4), labels = c("observed values", "mean")) +
scale_size_manual(name = "", values = c(6,1), labels = c("observed values", "mean")) +
scale_color_manual(name = "", values = c("grey50","black"),
labels = c("observed values", "mean")) +
theme(legend.position = "top",
legend.key = element_rect(color = NA))
http://imagizer.imageshack.us/v2/320x240q90/842/4pgj.png
The ggplot2 way would be combining everything into a single data.frame like this:
av$Aggregated <- "mean"
d$Aggregated <- "observed value"
d <- rbind(d, av)
ggplot(data = d, aes(y = Number, x = Treatment,
shape=Aggregated, size=Aggregated, colour=Aggregated)) +
geom_point()
And then customize using manual scales and themes.
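A possible version of that customization, as a sketch only: it assumes ggplot2 is installed, adds set.seed() so the random data is reproducible, and stores the combined table under the made-up name both. The size of 2 for the mean cross is an arbitrary choice. Giving all three manual scales the same (empty) title and the same levels lets ggplot2 merge them into a single legend.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only for reproducibility
d <- data.frame(Number = rnorm(12, 100, 20),
                Treatment = rep(c("A", "B", "C", "D"), each = 3))
av <- aggregate(d["Number"], d["Treatment"], mean)

# Tag each row with its role, then stack both tables (rbind matches by name)
av$Aggregated <- "mean"
d$Aggregated  <- "observed value"
both <- rbind(d, av)

p <- ggplot(both, aes(x = Treatment, y = Number, shape = Aggregated,
                      size = Aggregated, colour = Aggregated)) +
  geom_point() +
  # Named vectors tie each level to its symbol, size and colour
  scale_shape_manual(name = NULL, values = c("observed value" = 1, "mean" = 4)) +
  scale_size_manual(name = NULL, values = c("observed value" = 6, "mean" = 2)) +
  scale_colour_manual(name = NULL,
                      values = c("observed value" = "grey50", "mean" = "black")) +
  theme_bw() +
  theme(legend.position = "top")

print(p)
```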
