How to change colours in multiple stacked bar charts in R?

I am trying to build stacked bar charts. Below are a sample data set and my code.
This is the data set:
group;answer;count;proportion
first;1;67;19
first;2;119;33,7
first;3;6;1,7
first;4;116;32,9
first;5;45;12,7
second;1;102;17,1
second;2;197;33,1
second;3;10;1,7
second;4;232;38,9
second;5;55;9,2
third;1;49;12,9
third;2;143;37,7
third;3;1;0,3
third;4;142;37,5
third;5;44;11,6
fourth;1;45;14,9
fourth;2;93;30,7
fourth;3;3;1
fourth;4;118;38,9
fourth;5;44;14,5
This is the code:
p <- ggplot(sample1, aes(y = proportion, x = group, fill = proportion)) +
geom_bar(position = "stack", stat = "identity") +
facet_grid(~ "") +
theme_minimal()
p1 <- ggpar(p, xlab = F, ylab = F, legend = "", ticks = F)
p1 + geom_text(aes(label = proportion),
position = position_stack(vjust = 0.5),
check_overlap = T,
colour = "white")
This generates the plot well, but I need to manually change the colours of the five categories (the "answer" column in the data set).
However, if I add:
scale_fill_manual(values = c("#E7344E", "#0097BF", "#E7344E", "#0097BF", "#E7344E") )
I get the error: Continuous value supplied to discrete scale.

You are mapping a numeric column (proportion) to the fill aesthetic, so ggplot2 uses a continuous fill scale, which scale_fill_manual cannot handle. To fill the bars by answer, map that column to fill instead. Because answer is also numeric, convert it to a factor so that scale_fill_manual works:
library(ggplot2)
p <- ggplot(sample1, aes(y = proportion, x = group, fill = factor(answer))) +
geom_bar(position = "stack", stat = "identity") +
scale_fill_manual(values = c("#E7344E", "#0097BF", "#E7344E", "#0097BF", "#E7344E")) +
facet_grid(~"") +
theme_minimal()
p1 <- ggpubr::ggpar(p, xlab = F, ylab = F, legend = "", ticks = F)
p1 + geom_text(aes(label = proportion),
position = position_stack(vjust = 0.5),
check_overlap = T,
colour = "white"
)
DATA
sample1 <- structure(list(group = c(
"first", "first", "first", "first",
"first", "second", "second", "second", "second", "second", "third",
"third", "third", "third", "third", "fourth", "fourth", "fourth",
"fourth", "fourth"
), answer = c(
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L
), count = c(
67L,
119L, 6L, 116L, 45L, 102L, 197L, 10L, 232L, 55L, 49L, 143L, 1L,
142L, 44L, 45L, 93L, 3L, 118L, 44L
), proportion = c(
19, 33.7,
1.7, 32.9, 12.7, 17.1, 33.1, 1.7, 38.9, 9.2, 12.9, 37.7, 0.3,
37.5, 11.6, 14.9, 30.7, 1, 38.9, 14.5
)), class = "data.frame", row.names = c(
NA,
-20L
))
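As a small variation on the answer above, the colours can be given as a named vector so that each factor level is tied to a colour explicitly rather than by position. This is only a sketch: the toy data frame and the name answer_cols are made up for illustration, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical toy data; `answer` is numeric, as in the question
toy <- data.frame(
  group = rep(c("first", "second"), each = 3),
  answer = rep(1:3, times = 2),
  proportion = c(20, 50, 30, 10, 60, 30)
)

# The names must match the factor levels of `answer`
answer_cols <- c("1" = "#E7344E", "2" = "#0097BF", "3" = "#E7344E")

# factor(answer) gives a discrete fill scale, so scale_fill_manual() applies
p <- ggplot(toy, aes(x = group, y = proportion, fill = factor(answer))) +
  geom_col() +
  scale_fill_manual(values = answer_cols, name = "answer")
```

Printing p draws the stacked bars; a level that has no entry in the named vector would be drawn in the scale's NA colour, which makes a mismatch easy to spot.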

Related

Add age adjustment to geom_smooth

I need to include an age adjustment in the geom_smooth line I am adding to my ggscatter plot.
My data looks like this:
structure(list(Time = c(0L, 0L, 0L, 0L, 6L, 12L, 18L, 18L, 0L,
12L, 18L, 6L), group = structure(c(1L, 1L, 2L, 2L, 1L, 3L, 3L,
3L, 3L, 4L, 4L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"),
Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8,
67.8, 68.3, 68.8), Average = c(96L, 90L, 94L, 94L, 96L, 96L,
92L, 120L, 114L, 109L, 113L, 103L)), row.names = c(NA, 12L
), class = "data.frame")
What I currently have (the "Average" value depends on age):
ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm')+facet_wrap(~groups)
What I would like to have is something like:
ggscatter(dtable, "Time","Average",conf.int = TRUE)+theme_bw()+
geom_smooth(aes(group=1),method='lm', adjust= ~age)+facet_wrap(~groups)
With the adjustment made at each group's mean age.
Any suggestions?
Here is what I think you are after.
First, we need to fit the more complicated model ourselves, because ggplot does not have functionality for multivariable models (yet):
fit <- lm(Average ~ Time + group + Age, data = tdata)
Then we can use some functionality from the broom package to add the predictions and associated standard errors. With these in hand we can manually build the plot using the geom_line and geom_ribbon geoms
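library(broom)
library(dplyr)
library(ggplot2)
# augment() returns the original columns plus .fitted;
# se_fit = TRUE adds .se.fit (needed in broom >= 0.7.0)
augment(fit, data = tdata, se_fit = TRUE) %>%
ggplot(aes(Time, Average))+
geom_point()+
geom_line(aes(y = .fitted), size = 2, color = "blue")+
# band of roughly +/- 2 standard errors around the fitted values
geom_ribbon(aes(ymin = .fitted - .se.fit*2, ymax = .fitted + .se.fit*2), alpha = .2)+
facet_wrap(~group)+
theme_bw()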
Additionally, if you want to compare pooled and non-pooled estimates:
fit_no_pool <- lm(Average ~ Time + group + Age, data = tdata)
fit_complete_pool <- lm(Average ~ Time + Age, data = tdata)
library(broom)
library(dplyr)
library(ggplot2)
tdata %>%
# Prefix the augment() columns so the two sets of fits do not collide
# (se_fit = TRUE adds .se.fit in broom >= 0.7.0)
bind_cols(augment(fit_no_pool, se_fit = TRUE) %>% setNames(sprintf("no_pool%s", names(.)))) %>%
bind_cols(augment(fit_complete_pool, se_fit = TRUE) %>% setNames(sprintf("pool%s", names(.)))) %>%
ggplot(aes(Time, Average))+
geom_point()+
# Non-Pooled Estimates
geom_line(aes(x = Time, y = no_pool.fitted, color = "Non-Pooled"), size = 2)+
geom_ribbon(aes(ymin = no_pool.fitted - no_pool.se.fit*2,
ymax = no_pool.fitted + no_pool.se.fit*2), alpha = .2)+
# Pooled Estimates
geom_line(aes(x = Time, y = pool.fitted, color = "Pooled"), size = 2)+
geom_ribbon(aes(ymin = pool.fitted - pool.se.fit*2,
ymax = pool.fitted + pool.se.fit*2), alpha = .2)+
facet_wrap(~group)+
# Named values tie each legend label to its line colour
scale_color_manual(name = "Regression",
values = c("Non-Pooled" = "blue", "Pooled" = "orange"))+
theme_bw()
One way to go is to fit your model with Age as an additional predictor, then use predict to get the predicted values with intervals. Append these to your data and plot with ggplot. Since you want to facet by group, it might be worth putting group into the model as well; the steps would be the same.
df <- structure(list(Time = c(0L, 0L, 0L, 0L, 6L, 12L, 18L, 18L, 0L,
12L, 18L, 6L), group = structure(c(1L, 1L, 2L, 2L, 1L, 3L, 3L,
3L, 3L, 4L, 4L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"),
Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8,
67.8, 68.3, 68.8), Average = c(96L, 90L, 94L, 94L, 96L, 96L,
92L, 120L, 114L, 109L, 113L, 103L)), row.names = c(NA, 12L
), class = "data.frame")
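# model adjusted for age
mod <- lm(Average ~ Time + Age, data = df)
# get fitted values with prediction intervals (columns fit, lwr, upr);
# use interval = "confidence" for a confidence band around the fitted line
premod <- predict(mod, interval = "prediction")
# append to data
df2 <- cbind(df, premod)
# add the predictions to a scatter plot
library(ggplot2)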
ggplot(df2) +
geom_point(aes(x=Time,y=Average)) +
geom_line(aes(x=Time, y = fit)) +
geom_ribbon(aes(x = Time,ymin = lwr, ymax = upr), alpha = .1)+
facet_wrap(~group)+
theme_bw()
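The question asks for the line to be adjusted to each group's mean age. One possible reading of that, sketched here under assumptions, is to predict on a grid where Age is held at the group mean, so each facet shows a straight line in Time. The names grid and group_age are made up for this sketch, and the data frame is retyped from the question's dput() output.

```r
library(ggplot2)

df <- data.frame(
  Time = c(0, 0, 0, 0, 6, 12, 18, 18, 0, 12, 18, 6),
  group = factor(c("A", "A", "B", "B", "A", "C", "C", "C", "C", "D", "D", "A")),
  Age = c(77, 70.2, 69.9, 65.7, 66.2, 66.7, 67.2, 67.7, 66.8, 67.8, 68.3, 68.8),
  Average = c(96, 90, 94, 94, 96, 96, 92, 120, 114, 109, 113, 103)
)

# Model adjusted for age
mod <- lm(Average ~ Time + Age, data = df)

# One row per time point and group, with Age fixed at that group's mean
grid <- expand.grid(Time = sort(unique(df$Time)), group = levels(df$group))
group_age <- tapply(df$Age, df$group, mean)
grid$Age <- as.numeric(group_age[as.character(grid$group)])

# Fitted values with a confidence band (columns fit, lwr, upr)
grid <- cbind(grid, predict(mod, newdata = grid, interval = "confidence"))

p <- ggplot() +
  geom_point(data = df, aes(Time, Average)) +
  geom_ribbon(data = grid, aes(Time, ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(data = grid, aes(Time, fit)) +
  facet_wrap(~group) +
  theme_bw()
```

Because the model has no group term, the facets differ only through the mean age plugged in for each group; adding group to the formula, as suggested above, would also let the intercepts differ.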

Adding a ggtree object to already existing ggplot with shared y-axis

I have the following data and plot:
Data:
structure(list(type = c("mut", "mut", "mut", "mut", "mut", "mut",
"mut", "mut", "gene", "gene", "gene", "gene"), gene = c("gyrA",
"gyrA", "gyrB", "gyrB", "parC", "parC", "parE", "parE", "qnrA1",
"qnrA1", "sul3", "sul3"), type2 = c(1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2), id = c("2014-01-7234-1-S", "2015-01-3004-1-S", "2014-01-2992-1-S",
"2016-17-299-1-S", "2015-01-2166-1-S", "2014-01-4651-1-S", "2016-02-514-2-S",
"2016-02-402-2-S", "2016-02-425-2-S", "2015-01-5140-1-S", "2016-02-522-2-S",
"2016-02-739-2-S"), result = c("1", "0", "0", "0", "0", "0",
"1", "1", "0", "0", "0", "1"), species = c("Broiler", "Pig",
"Broiler", "Red fox", "Pig", "Broiler", "Wild bird", "Wild bird",
"Wild bird", "Pig", "Wild bird", "Wild bird"), fillcol = c("Broiler_1",
"Pig_0", "Broiler_0", "Red fox_0", "Pig_0", "Broiler_0", "Wild bird_1",
"Wild bird_1", "Wild bird_0", "Pig_0", "Wild bird_0", "Wild bird_1"
)), row.names = c(NA, -12L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = "gene", drop = TRUE, indices = list(
0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L,
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
gene = c("gyrA", "gyrB", "parC", "parE", "qnrA1", "sul3")), row.names = c(NA,
-6L), class = "data.frame", vars = "gene", drop = TRUE, indices = list(
0:1, 2:3, 4:5, 6:7, 8:9, 10:11), group_sizes = c(2L, 2L,
2L, 2L, 2L, 2L), biggest_group_size = 2L, labels = structure(list(
gene = c("gyrA", "gyrB", "parC", "parE", "qnrA1", "sul3")), row.names = c(NA,
-6L), class = "data.frame", vars = "gene", drop = TRUE)))
Plot:
library(ggplot2)
p1 <- ggplot(test_df, aes(fct_reorder(gene, type2),
factor(id),
fill = fillcol,
alpha = result)) +
geom_tile(color = "white")+
theme_minimal()+
labs(fill = NULL)+
theme(axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.3,
size = 7),
axis.title = element_blank(),
panel.grid = element_blank(),
legend.position = "right")+
guides(alpha = FALSE)+
coord_fixed()
Additionally, I have the following tree object:
structure(list(edge = structure(c(23L, 23L, 22L, 22L, 21L, 21L,
20L, 20L, 19L, 19L, 18L, 18L, 17L, 17L, 16L, 16L, 15L, 15L, 14L,
14L, 13L, 13L, 1L, 3L, 2L, 9L, 22L, 23L, 4L, 5L, 20L, 21L, 11L,
12L, 18L, 19L, 10L, 17L, 8L, 16L, 6L, 7L, 14L, 15L), .Dim = c(22L,
2L)), edge.length = c(2, 2, 0, 0, 2.5, 0.5, 2, 2, 0.75, 0.25,
0.5, 0.5, 2.41666666666667, 0.166666666666667, 3.0625, 0.145833333333333,
3.38888888888889, 0.326388888888889, 3, 3, 0.5, 0.111111111111111
), tip.label = c("2016-02-425-2-S", "2016-02-522-2-S", "2015-01-2166-1-S",
"2016-02-402-2-S", "2016-02-514-2-S", "2016-17-299-1-S", "2016-02-739-2-S",
"2015-01-5140-1-S", "2014-01-2992-1-S", "2014-01-7234-1-S", "2014-01-4651-1-S",
"2015-01-3004-1-S"), Nnode = 11L), class = "phylo", order = "postorder")
Which is plotted like this:
library(ggtree)
p2 <- ggtree(tree)+
geom_treescale()+
geom_tiplab(align = TRUE, linesize = 0, size = 1)+
xlim(0, 4.2)
What I want to do is combine the tree and the first plot, and order the y-axis of the first plot to follow the order of the tips in the tree, so that they match. I have tried some of the solutions here, but I can't seem to produce the same plot with the facet_plot function. Is there a way to identify matching values on the y-axis of both plots and then combine them?
This is how I want it to look (approximately):
We need to arrange the tile plot in the same order as the tree plot and then we need to lay the two plots out so they correspond. The first task is relatively straightforward, but I'm not sure how to do the second without some manual tweaking of the layout.
library(tidyverse)
library(ggtree)
library(grid)
library(gridExtra)
p2 <- ggtree(tree)+
geom_treescale()+
geom_tiplab(align = TRUE, linesize = 0, size = 3)+
xlim(0, 4.2)
Now that we've created the tree plot, let's get the ordering of the y axis programmatically. We can do that using ggplot_build to get the plot structure.
p2b = ggplot_build(p2)
We can look at the data for the plot layout by running p2b$data in the console. This outputs a list of the data frames that represent the plot structure. Looking these over, we can see that the fifth and sixth data frames hold the node labels. We'll use the fifth one (p2b$data[[5]]) and order it by the y column to get a vector of node labels (p2b$data[[5]] %>% arrange(y) %>% pull(label)). Then we'll convert test_df$id to a factor variable with this node ordering.
test_df = test_df %>%
mutate(id = factor(id, levels=p2b$data[[5]] %>% arrange(y) %>% pull(label)))
(As another option, you can get the ordering of the nodes directly from p2 with p2$data %>% filter(isTip) %>% arrange(parent) %>% pull(label))
Now we can generate the tile plot p1 with a node order that corresponds to that of the tree plot.
p1 <- ggplot(test_df, aes(fct_reorder(gene, type2),
factor(id),
fill = fillcol,
alpha = result)) +
geom_tile(color = "white")+
theme_minimal()+
labs(fill = NULL)+
theme(axis.text.x = element_text(angle = 90,
hjust = 1,
vjust = 0.3,
size = 7),
axis.title = element_blank(),
panel.grid = element_blank(),
legend.position = "right")+
guides(alpha = "none")+
coord_fixed()
We can see in the plot below that the labels correspond.
grid.arrange(p2, p1, ncol=2)
Now we need to lay out the two plots with only one set of labels and with the node lines matching up vertically with the tiles. I've done this with some manual tweaking below by creating a nullGrob() (basically a blank space below p1) and adjusting the heights argument to get the alignment. The layout can probably be done programmatically, but that would take some additional grob (graphical object) manipulation.
grid.arrange(p2 + theme(plot.margin=margin(0,-20,0,0)),
arrangeGrob(p1 + theme(axis.text.y=element_blank()),
nullGrob(),
heights=c(0.98,0.02)),
ncol=2)

What is the "N = 1" box in my R geom_bar legend, and how do I remove it?

These are the data:
structure(list(Group.1 = c((name list)
), Group.2 = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Radio", "Video", "Engineering",
"800Mhz", "PSSRP", "Other"), class = "factor"), x = c(93.5, 208.75,
214, 48, 66.33, 71.5, 19.5, 64.75, 17, 39, 30.75, 96.75, 30,
19, 32.5, 12.75, 47.25, 14, 22.25, 12, 3, 128.5, 9.5, 303.2,
290.35, 364.05, 333.25, 11.75, 553.25, 423, 6, 496)), .Names = c("Group.1",
"Group.2", "x"), row.names = c(NA, -32L), class = "data.frame")
Running this plot code:
ggplot(data = HrSums, aes(x = Group.1, y = x, fill = Group.2)) +
geom_bar(stat = "sum", position = position_stack(reverse = TRUE)) +
coord_flip() +
labs(title = "Hours Billed, by Technician and Shop", y = "Hours Billed",
x = "Technician", fill = "Shop")
I get this bar chart:
What is the "n" box, and how do I remove it (only) from the legend?
I believe the n box appears because geom_bar expects to count how many times each combination of Group.1 and Group.2 occurs, but you are supplying a y value in aes instead. geom_bar can use a stat other than counting, but to get sums of values it expects a weight aesthetic. Here are two ways to do this: one uses weight = x in geom_bar, and the other uses dplyr functions to calculate the sums beforehand and supplies them as y.
library(tidyverse)
ggplot(df, aes(x = Group.1, fill = Group.2)) +
geom_bar(aes(weight = x), position = position_stack(reverse = T)) +
coord_flip()
df_sums <- df %>%
group_by(Group.1, Group.2) %>%
summarise(x = sum(x))
ggplot(df_sums, aes(x = Group.1, y = x, fill = Group.2)) +
geom_col(position = position_stack(reverse = T)) +
coord_flip()
If you pass the following as an argument to the layer (for example geom_bar), you'll only see legends for the aesthetics you're expecting:
show.legend = c(
"x" = TRUE,
"y" = TRUE,
"alpha" = FALSE,
"color" = FALSE,
"fill" = TRUE,
"linetype" = FALSE,
"size" = FALSE,
"weight" = FALSE
)
See show.legend argument on ?geom_bar:
show.legend logical. Should this layer be included in the legends?
NA, the default, includes if any aesthetics are mapped. FALSE never
includes, and TRUE always includes. It can also be a named logical
vector to finely select the aesthetics to display.
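To show where that argument goes, here is a minimal sketch with a made-up data frame standing in for HrSums (the column names tech, shop and hours are hypothetical). As far as I can tell, the "n" key comes from stat = "sum", which maps size to the count n, so switching off the size legend for that layer should be enough.

```r
library(ggplot2)

# Hypothetical toy data standing in for HrSums
toy <- data.frame(
  tech = c("a", "a", "b", "b"),
  shop = c("Radio", "Video", "Radio", "Video"),
  hours = c(10, 20, 30, 40)
)

p <- ggplot(toy, aes(x = tech, y = hours, fill = shop)) +
  # The named vector keeps the fill legend and drops the size ("n") one
  geom_bar(stat = "sum", position = position_stack(reverse = TRUE),
           show.legend = c(fill = TRUE, size = FALSE)) +
  coord_flip() +
  labs(y = "Hours Billed", x = "Technician", fill = "Shop")
```

Aesthetics that are not mapped in the layer do not need an entry in the vector, so the shorter form above should behave the same as the full list.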

Reverse order of x axis labels on grouped bar chart after coord_flip()

In the plot shown below I am trying to change the order of the labels on the y axis (x axis before coord_flip()).
I would like 1 to be on top and 16 at the bottom.
levels <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
library(ggplot2)
ggplot(all_Q, aes(x=qid, y=correct_per, fill=group), group=qid) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_discrete(name = "Questions", limits = levels) +
scale_y_continuous(name = "Percent correct") +
coord_flip()
Here is what I have tried so far:
levels <- c(16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1) -> No change.
limits = rev(levels) -> no change
limits = rev(levels(levels)) -> suggestion from this question/answer img below
Reproducible example with a subset of the questions (8,9,10,11)
dput() output:
structure(list(group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), correct_per = c(90.4761904761905, 100, 100, 87.5, 83.3333333333333, 90.9090909090909, 84.6153846153846, 87.5, 80.9523809523809, 88.6363636363636, 100, 70.8333333333333, 63.4146341463415, 76.7441860465116, 76.9230769230769, 62.5), nr_correct = c(38L, 44L, 26L, 21L, 35L, 40L, 22L, 21L, 34L, 39L, 26L, 17L, 26L, 33L, 20L, 15L), nr_incorrect = c(4L, 0L, 0L, 3L, 7L, 4L, 4L, 3L, 8L, 5L, 0L, 7L, 15L, 10L, 6L, 9L), length = c(42L, 44L, 26L, 24L, 42L, 44L, 26L, 24L, 42L, 44L, 26L, 24L, 41L, 43L, 26L, 24L), qid = c("8", "8", "8", "8", "9", "9", "9", "9", "10", "10", "10", "10", "11", "11", "11", "11")), .Names = c("group", "correct_per", "nr_correct", "nr_incorrect", "length", "qid"), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))
save to file
all_Q <- dget(filename)
levels <- c(8,9,10,11)
ggplot(all_Q, aes(x=qid, y=correct_per, fill=group), group=qid) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_discrete(name = "Questions", limits = levels) +
scale_y_continuous(name = "Percent correct") +
coord_flip()
Column all_Q$qid is discrete (a character vector in the dput() output), so the limits, which set the order, must be given as a character vector rather than a numeric one.
library(ggplot2)
# Don't use group as you're already grouping by fill
ggplot(all_Q, aes(qid, correct_per, fill = group)) +
geom_bar(stat = "identity", position = "dodge") +
# Passing character vector from 16 to 1
scale_x_discrete(limits = as.character(16:1)) +
# Don't use function scale_y_* only for naming
# With function labs you can specify all the names
labs(x = "Questions",
y = "Correct, %",
fill = "My fill") +
coord_flip()
It's best to work with factors in a specific order. I'm sure what you want is:
all_Q$qid <- factor(all_Q$qid, levels = c(11, 10, 9, 8))
ggplot(all_Q, aes(x=qid, y=correct_per, fill=group), group=qid) +
geom_bar(stat="identity", position=position_dodge()) +
scale_x_discrete(name = "Questions") +
scale_y_continuous(name = "Percent correct") +
coord_flip()
Yields:
Compare:
all_Q$qid <- factor(all_Q$qid, levels = c(11, 10, 9, 8))
Yields:
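The factor approach can also be written without typing the levels by hand. The sketch below uses a made-up data frame and assumes the ids are stored as character, as in the dput() output; rev_levels is a hypothetical name.

```r
library(ggplot2)

# Hypothetical toy data; qid is character, as in the question
toy <- data.frame(
  qid = rep(c("8", "9", "10", "11"), each = 2),
  group = rep(c("A", "B"), times = 4),
  correct_per = c(90, 100, 83, 91, 81, 89, 63, 77),
  stringsAsFactors = FALSE
)

# Sort the ids numerically (not alphabetically) in decreasing order,
# so the smallest id ends up on top after coord_flip()
rev_levels <- as.character(sort(unique(as.numeric(toy$qid)), decreasing = TRUE))
toy$qid <- factor(toy$qid, levels = rev_levels)

p <- ggplot(toy, aes(x = qid, y = correct_per, fill = group)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = "Questions", y = "Percent correct")
```

Converting to numeric before sorting matters here, because sorting the character ids directly would place "10" and "11" before "8" and "9".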

Reorder according variable of melted dataframe

I have data as follows:
DF=structure(list(experiment = c("BR", "CH", "EP", "IP", "JU", "MA",
"SA", "ST", "SV", "VI"), duration = c(28L, 9L, 20L, 4L, 14L,
30L, 26L, 23L, 17L, 6L), percentage_total_exp = c(47.2222222222222,
51.063829787234, 52.3809523809524, 79.0322580645161, 48.6842105263158,
72.7272727272727, 62.0689655172414, 34.469696969697, 61.1111111111111,
34.8837209302326), nb_reaction = c(29, 29, 14, 11, 40, 11, 14,
14, 23, 18)), .Names = c("experiment", "duration", "percentage_total_exp",
"nb_reaction"), row.names = c(NA, -10L), class = "data.frame")
I melted my data in order to show it in the following ggplot:
meltR=melt(DF)
ggplot(meltR, aes(x=experiment , y = value, group = variable, fill = variable)) + geom_bar(stat = "identity", position="dodge")
Now the problem is that I want three versions of this plot: one ordered by the variable duration, a second by percentage_total_exp, and a third by nb_reaction.
I don't know how to specify this. I tried y = reorder(value, -duration), but duration was not recognized. Is melting a bad idea in this case? How can I do this?
EDIT 2: minimal code to add, because my experiment names are in reality very long:
plots <- lapply(levels(meltR$variable), function(lev) {
meltR$experiment <- factor(meltR$experiment, levels = meltR$experiment[order(meltR$value[meltR$variable == lev])])
ggplot(meltR, aes(x=experiment , y = value, group = variable, fill = variable)) + geom_bar(stat = "identity", position="dodge") + ggtitle(lev) + theme_bw() + theme(axis.text.x = element_text(size=10, angle=45, hjust=1, vjust=1, face="bold"))
})
grid.arrange(grobs = plots)
Thanks a lot
duration is not a column name but a factor level of the column named variable in your example, so it does not work that way. One option is to loop over the three factor levels of variable, reorder the experiments according to value for each level, and then plot. Here's how it could work:
library(ggplot2)
library(reshape2)
library(gridExtra)
DF=structure(list(experiment = c("BR", "CH", "EP", "IP", "JU", "MA",
"SA", "ST", "SV", "VI"), duration = c(28L, 9L, 20L, 4L, 14L,
30L, 26L, 23L, 17L, 6L), percentage_total_exp = c(47.2222222222222,
51.063829787234, 52.3809523809524, 79.0322580645161, 48.6842105263158,
72.7272727272727, 62.0689655172414, 34.469696969697, 61.1111111111111,
34.8837209302326), nb_reaction = c(29, 29, 14, 11, 40, 11, 14,
14, 23, 18)), .Names = c("experiment", "duration", "percentage_total_exp",
"nb_reaction"), row.names = c(NA, -10L), class = "data.frame")
meltR=melt(DF)
plots <- lapply(levels(meltR$variable), function(lev) {
meltR$experiment <- factor(meltR$experiment, levels = meltR$experiment[order(-meltR$value[meltR$variable == lev])])
ggplot(meltR, aes(x=experiment , y = value, group = variable, fill = variable)) + geom_bar(stat = "identity", position="dodge") + ggtitle(lev)
})
grid.arrange(grobs = plots)
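The reordering step inside the loop can be looked at in isolation. This sketch uses a hypothetical three-row subset of the question's data in wide form and the made-up name lev_duration; it shows how order() on the negated values yields the experiment names from largest to smallest, which are then used as factor levels.

```r
# Hypothetical three-row subset of the question's data
DF <- data.frame(
  experiment = c("BR", "CH", "EP"),
  duration = c(28, 9, 20),
  nb_reaction = c(29, 29, 14),
  stringsAsFactors = FALSE
)

# Experiment names sorted by decreasing duration
lev_duration <- DF$experiment[order(-DF$duration)]

# The same names as a factor whose level order drives the x axis
experiment_by_duration <- factor(DF$experiment, levels = lev_duration)
```

Dropping the minus sign gives increasing order instead, which is the only difference between the code in the question's edit and the code in this answer.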
