My question is basically a follow-up to this question. However, the answer there bypasses ggarrange entirely and hands the whole problem to ggplot's faceting instead.
That doesn't work for me because I am already using facets in the sub-plots and cannot use them again.
Here is some example code. How can I give the two plots that are joined with ggarrange the same y-axis range (without setting the limits manually, of course)?
library(dplyr)
library(purrr)
library(ggplot2)
library(ggpubr)
mtcars %>%
  group_split(vs) %>%
  map(~ggplot(., aes(x = mpg, y = wt)) +
        geom_point() +
        facet_grid(rows = vars(am), cols = vars(gear))) %>%
  ggarrange(plotlist = .)
As you can see, the left image's y-axis ranges from 2 to 5, while the right plot's y-axis ranges from 1.5 to 3.5. How can I make them be the same?
I'm once again arguing for abandoning the 'ggarrange' approach, this time in favour of the {patchwork} package, which allows you to apply an operation to all previous plots. In this case, we can use & scale_y_continuous(limits = ...) to set the limits for all plots.
library(ggplot2)
library(dplyr)
library(purrr)
library(patchwork)
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
wrap_plots() &
scale_y_continuous(limits = range(mtcars$wt))
Created on 2022-12-08 by the reprex package (v2.0.0)
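If you would rather keep ggarrange, the same idea of applying one scale to every plot can be sketched with a plain lapply() over the plot list. This is only an illustration: the names y_limits and plots are mine, and it assumes wt is the variable whose range should be shared.

```r
library(ggplot2)
library(ggpubr)

# Shared limits computed once from the full data (assumption: wt is on the y axis).
y_limits <- range(mtcars$wt)

# One faceted plot per level of vs, built with base split() + lapply().
plots <- lapply(split(mtcars, mtcars$vs), function(d) {
  ggplot(d, aes(x = mpg, y = wt)) +
    geom_point() +
    facet_grid(rows = vars(am), cols = vars(gear))
})

# Add the same y scale to every plot, then arrange them as before.
plots <- lapply(plots, function(p) p + scale_y_continuous(limits = y_limits))
ggarrange(plotlist = unname(plots))
```

Because the limits come from the whole dataset, no points fall outside them, so nothing is dropped from either sub-plot.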
One option would be to compute and add the range of your x and y variables to your dataset before splitting, which could then be used to set the limits.
library(dplyr)
library(ggplot2)
library(ggpubr)
library(purrr)
mtcars %>%
mutate(across(c(mpg, wt), list(range = ~list(range(.x))))) %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
scale_x_continuous(limits = .$mpg_range[[1]]) +
scale_y_continuous(limits = .$wt_range[[1]]) +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)
I have 4 variables (A, B, C, D) with a similar pattern across 3 locations. I would like to draw a box plot (variable values as dots on the y-axis, locations on the x-axis), but the variables differ by orders of magnitude. Is there a way of scaling the y-axis so that all variables can be plotted on the same box plot, perhaps differentiated by colour?
Location = c("Washington","Washington","Washington","Washington","Washington","Washington", "Maine","Maine","Maine","Maine","Maine", "Florida","Florida","Florida","Florida","Florida","Florida")
A = c(0.000693156, 0.000677354, 0.000727863, 0.000650822, 0.000908343, 0.001126689, 0.001316292, 0.000975274, 0.00109082, 0.001057585, 0.000927826, 0.000552769, 0.000532546, 0.000559781, 0.000771569, 0.000563436, 0.000551136)
B = c(0.001915388, 0.001936627, 0.001476521, 0.001573681, 0.002584282, 0.00738909, 0.008089839, 0.006616564, 0.00495211, 0.004515925, 0.003791596, 0.000653847, 0.000350701, 0.000559781, 0.001920087, 0.000738206, 0.001077627)
C = c(0.000138966, 0.000104745, 0.000145573, 0.000103305, 5.08255E-05, 0.000361988, 0.000264876, 0.000454172, 0.000277471, 0.000117919, 8.9214E-05, 0.000173727, 0.000108241, 8.54628E-05, 2.35593E-05, 3.1302E-05, 1.12019E-05)
D = c(0.000108829, 0.000135005, 0.000120617, 9.29746E-05, 0.000105561, 9.27596E-05, 0.000121317, 0.000131471, 0.000152503, 0.000128974, 0.000196271, 0.000142141, 0.000147208, 0.00013674, 0.000147246, 0.000185204, 0.000103058)
df = data.frame(Location, A, B, C, D)
And this is what I have tried for two variables as individual graphs
library(ggplot2)
a <- ggplot(df, aes(x=Location, y=A)) +
geom_boxplot()
a + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="red")
b <- ggplot(df, aes(x=Location, y=B)) +
geom_boxplot()
b + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="blue")
Can I merge all 4 variables in 1 graph with a scaled Y-axis?
Can I add a legend only showing "A" and "D"?
If you reshape your data to "long" format, faceting is one option. Note that you must set scales = 'free' in facet_wrap().
library(tidyverse)
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value')
g <- ggplot(data = df.long, aes(x = Location, y = value)) +
geom_boxplot() +
facet_wrap(facets = ~variable, scales = 'free')
print(g)
If you wanted to get everything on one plot, you'd have to rescale the data per group. Here I've normalized each data point to between 0 and 1, relative to its original scale.
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value') %>%
group_by(variable) %>%
mutate(value_norm = value - min(value),
value_norm = value_norm / max(value_norm)
)
g.norm <- ggplot(data = df.long, aes(x = Location, y = value_norm, fill = variable)) +
geom_boxplot()
print(g.norm)
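To make the rescaling step concrete, here is a minimal base R sketch of the same min-max normalisation. The helper name rescale01 and the two input vectors are made up for illustration; they are not part of the data above.

```r
# Hypothetical helper (not from the answer above): min-max rescaling to [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up values on very different scales, for illustration only.
small <- c(2, 4, 6)
large <- c(100, 300, 500)

small_norm <- rescale01(small)
large_norm <- rescale01(large)
print(small_norm)
print(large_norm)
```

Both vectors end up as 0, 0.5, 1, which is why variables of different magnitude become comparable on one axis. The price is that the y-axis no longer shows the original units.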
Try scale_y_log10(). Not the most beautiful plot, but ...
library(ggplot2)
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-Location) %>%
ggplot(aes(x=Location, y=value, color = name)) +
geom_boxplot() +
geom_dotplot(aes(fill = name), color = "black", binaxis='y', dotsize=.5) +
scale_y_log10()
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2020-04-14 by the reprex package (v0.3.0)
I need to use gghighlight in a clustered bar chart in R to highlight a single bar. My code and sample data look like this:
library(tidyr)
library(ggplot2)
library(gghighlight)
dat <- data.frame(country=c('USA','Brazil','Ghana','England','Australia'), Stabbing=c(15,10,9,6,7), Accidents=c(20,25,21,28,15), Suicide=c(3,10,7,8,6))
# melt() comes from reshape2, so call it with its namespace
dat.m <- reshape2::melt(dat, id.vars='country')
dat.g <- gather(dat, type, value, -country)
ggplot(dat.g, aes(type, value)) +
  geom_bar(aes(fill = country), stat = "identity", position = "dodge") +
  gghighlight(type == "Accidents" & country == "Brazil")
But this gives me an awkward plot.
How can I get gghighlight to highlight only one single bar of one group (so combining two conditions for two discrete variables)?
Here are two alternative options for highlighting a single column in this type of plot:
1) make a new variable (named highlight below) and fill by that (and, if you like, use the line colors to color by country)
2) manually annotate the one column you want to highlight with an arrow and/or text (or work out how to automate the positioning, but that would be more involved) - could be an option for one final figure
library(tidyr)
library(ggplot2)
dat <- data.frame(country=c('USA','Brazil','Ghana','England','Australia'),
Stabbing=c(15,10,9,6,7),
Accidents=c(20,25,21,28,15), Suicide=c(3,10,7,8,6))
dat.m <- reshape2::melt(dat, id.vars='country')
dat.g <- gather(dat, type, value, -country)
## set highlighted bar
dat.g$highlight <- dat.g$type == "Accidents" & dat.g$country == "Brazil"
## option 1: use fill to highlight, colour for country
## (alpha must be set in the geom; ggplot() silently ignores it)
ggplot(dat.g, aes(type, value, fill = highlight, colour = country)) +
  geom_bar(stat = "identity", position = "dodge2", size = 1, alpha = .6) +
  scale_fill_manual(values = c("grey20", "red")) +
  guides(fill = "none") +
## option 2: use annotate to manually label a specific column:
annotate(geom = "curve", x = 1.15, y = 30, xend = 1.35, yend = 26,
curvature = .2, arrow = arrow(length = unit(2, "mm"))) +
annotate(geom = "text", x = 1, y = 31, label = "Highlight", hjust = "left")
Created on 2020-03-10 by the reprex package (v0.3.0)
I think gghighlight is not built for this kind of plot, at least not yet. You could file a feature request, although it is a bit unclear whether this visualisation is very helpful. gghighlight always draws everything, which is what produces the "weird" shadows when dodging.
If you want to keep using gghighlight, maybe try faceting, which they suggest in their vignette.
A suggestion - Use facets:
(using mtcars as example)
library(tidyverse)
library(gghighlight)
mtcars2 <- mtcars %>% mutate(cyl = as.character(cyl), gear = as.character(gear))
ggplot(mtcars2, aes(cyl, disp, fill = gear)) +
geom_col() + #no dodge
gghighlight(cyl == "4") + #only one variable
facet_grid(~ gear) #the other variable is here
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...
Created on 2020-03-09 by the reprex package (v0.3.0)
Or, here without gghighlight, in a more traditional subsetting approach.
You need to make a subset of the data that contains a row for each group you want to dodge by, in this case "cyl" and "gear". I replace the irrelevant values with NA; you could also use 0.
library(tidyverse)
mtcars2 <- mtcars %>%
mutate(cyl = as.character(cyl), gear = as.character(gear)) %>%
group_by(cyl, gear) %>%
summarise(disp = mean(disp))
subset_mt <- mtcars2 %>% mutate(highlight = if_else(cyl == '4' & gear == '3', disp, NA_real_))
ggplot() +
geom_col(data = mtcars2, aes(cyl, disp, group = gear), fill = 'grey', alpha = 0.6, position = 'dodge') +
geom_col(data = subset_mt, aes(cyl, highlight, fill = gear), position = 'dodge')
#> Warning: Removed 7 rows containing missing values (geom_col).
Created on 2020-03-10 by the reprex package (v0.3.0)
Given a dataframe with discrete values,
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
I want to make a plot like
However I want to make different color for each layer, say red and green for "a", yellow/blue for "b".
The idea is to reshape your data (define coordinates to draw the rectangles) in order to use geom_rect from ggplot:
library(ggplot2)
library(reshape2)
i = setNames(expand.grid(1:nrow(d),1:ncol(d[-1])),c('x1','y1'))
ggplot(cbind(i,melt(d, id.vars='id')),
aes(xmin=x1, xmax=x1+1, ymin=y1, ymax=y1+1, color=variable, fill=value)) +
geom_rect()
Try geom_tile(). But you need to reshape your data to get exactly the same figure as you presented.
df <- data.frame(id=factor(c(1:6)), a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
library(reshape2)
df <- melt(df, id.vars = "id")
library(ggplot2)
ggplot(aes(x = id, y = variable, fill = value), data = df) + geom_tile()
require("dplyr")
require("tidyr")
require("ggplot2")
d=data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
ggplot(d %>% gather(type, value, a, b, c) %>% mutate(value = paste0(type, value)),
aes(x = id, y = type)) +
geom_tile(aes(fill = value), color = "white") +
scale_fill_manual(values = c("forestgreen", "indianred", "lightgoldenrod1",
"royalblue", "plum1", "plum2", "plum3"))
First we use reshape2 to transform the data from wide to long. Then to get discrete values we use as.factor(value) and finally we use scale_fill_manual to assign the 5 different colours we need. In geom_tile we specify the colour of the tile borders.
library(reshape2)
library(ggplot2)
df <- data.frame(id=1:6, a=c(1,1,1,0,0,0), b=c(0,0,0,1,1,1), c=c(10,20,30,30,10,20))
df <- melt(df, id.vars=c("id"))
ggplot(df, aes(id, variable, fill = as.factor(value))) + geom_tile(colour = "white") +
scale_fill_manual(values = c("lightblue", "steelblue2", "steelblue3", "steelblue4", "darkblue"), name = "Values")+
scale_x_continuous(breaks = 1:6)
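As a quick check on why five colours are needed in the answer above, the distinct fill levels can be listed in base R. This is only a sketch; the names values and fill_levels are mine.

```r
# Same data as in the question.
d <- data.frame(id = 1:6, a = c(1, 1, 1, 0, 0, 0), b = c(0, 0, 0, 1, 1, 1),
                c = c(10, 20, 30, 30, 10, 20))

# Stack the three value columns, as melting to long format would.
values <- unlist(d[c("a", "b", "c")], use.names = FALSE)

# as.factor() turns each distinct number into one discrete level.
fill_levels <- levels(as.factor(values))
print(fill_levels)
```

The levels are 0, 1, 10, 20 and 30, so scale_fill_manual needs one colour for each of the five. Note that columns a and b share the levels 0 and 1, so this approach cannot give them different colours; pasting the column name onto the value, as in the previous answer, is one way around that.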
I'm using ggplot2 and am trying to generate a plot which shows the following data.
df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
str(df)
df
Instead of doing a frequency plot of the variables (see below code), I want to generate a plot of the average values for each x value. So I want to plot the average score at each age level. At age 18 on the x axis, we might have a 3 on the y axis for score. At age 23, we might have an average score of 4.5, and so forth (Edit: average values corrected). This would ideally be represented with a barplot.
ggplot(df, aes(x=factor(age), y=factor(score))) + geom_bar()
Error: stat_count() must not be used with a y aesthetic.
I'm just not sure how to do this in R with ggplot2 and can't seem to find anything on such plots. Statistically, I don't know if the plot I want is even the right thing to do, but that's a different story.
Thanks!
You can use summary functions in ggplot. Here are two ways of achieving the same result:
# Option 1
ggplot(df, aes(x = factor(age), y = score)) +
geom_bar(stat = "summary", fun = "mean")
# Option 2
ggplot(df, aes(x = factor(age), y = score)) +
stat_summary(fun = "mean", geom = "bar")
Versions of ggplot2 before 3.3.0 use fun.y instead of fun:
ggplot(df, aes(x = factor(age), y = score)) +
stat_summary(fun.y = "mean", geom = "bar")
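To cross-check the bar heights that the summary stat should draw, the per-age means can be reproduced in base R. This is just a sanity check under the question's data; the name mean_by_age is mine.

```r
# Same data as in the question.
df <- data.frame(score = c(4, 2, 3, 5, 7, 6, 5, 6, 4, 2, 3, 5, 4, 8),
                 age   = c(18, 18, 23, 50, 19, 39, 19, 23, 22, 22, 40, 35, 22, 16))

# Mean score for each distinct age; the names of the result are the ages.
mean_by_age <- tapply(df$score, df$age, mean)
print(mean_by_age)
```

For example, age 18 has scores 4 and 2, giving a mean of 3, and age 23 has scores 3 and 6, giving 4.5, which matches the values described in the question.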
If I understood you right, you could try something like this:
library(plyr)
library(ggplot2)
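# summarise each age group to its mean score, then draw the values as bars
ggplot(ddply(df, .(age), summarise, score = mean(score)), aes(x=factor(age), y=score)) + geom_bar(stat = "identity")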
You can also use aggregate() in base R instead of loading another package.
temp = aggregate(list(score = df$score), list(age = factor(df$age)), mean)
# geom_col() plots the precomputed means; geom_bar() would try to count rows and fail
ggplot(temp, aes(x = age, y = score)) + geom_col()
Another option is to group by the x values with group_by() and compute "mean_score" per "age" with summarise(), using dplyr to do it all in one pipe. You can then use geom_col instead of geom_bar. Here is a reproducible example:
df=data.frame(score=c(4,2,3,5,7,6,5,6,4,2,3,5,4,8),
age=c(18,18,23,50,19,39,19,23,22,22,40,35,22,16))
library(dplyr)
library(ggplot2)
df %>%
group_by(age) %>%
summarise(mean_score = mean(score)) %>%
ggplot(aes(x = factor(age), y = mean_score)) +
geom_col() +
labs(x = "Age", y = "Mean score")
Created on 2022-08-26 with reprex v2.0.2