ggplot geom_boxplot color and group variables - r

I'm trying to make a straightforward boxplot in ggplot. I'm not sure how to get both a grouping variable and a color/fill variable. I've tried gather(), but that doesn't seem to work. Any thoughts?
library(tidyverse)
# Does not work
mtcars %>%
as_tibble() %>%
ggplot(aes(factor(gear),
mpg,
group = vs)) +
geom_boxplot(aes(fill = as.factor(gear)))
# Does not work either
mtcars %>%
as_tibble() %>%
select(gear, mpg, vs) %>%
gather(key, value, -vs) %>%
ggplot(aes(key,
value)) +
geom_boxplot(aes(color = vs))

I'm not sure this is your intended output (gear as x-axis and fill), but here's a working example:
mtcars %>%
ggplot(
aes(
x = factor(gear),
y = mpg,
color = factor(vs),
fill = factor(gear)
)
) + geom_boxplot()
I've found being explicit when declaring your aesthetic mappings can be helpful when learning ggplot2.

Alternatively:
mtcars %>%
as_tibble() %>%
group_by(vs) %>%
ggplot(aes(factor(gear),
mpg,
fill=as.factor(gear))) +
geom_boxplot()
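If the goal is one box per gear/vs combination while the fill still follows gear, a minimal sketch along these lines may help. It assumes that is the intended output, and the object name p_box is only illustrative. The group aesthetic is set explicitly with interaction() instead of relying on ggplot2 to infer it from the discrete aesthetics.

```r
library(ggplot2)

# One box per gear/vs combination; fill still follows gear.
# interaction() builds the combined grouping factor explicitly.
p_box <- ggplot(mtcars,
                aes(x = factor(gear),
                    y = mpg,
                    group = interaction(gear, vs),
                    fill = factor(gear))) +
  geom_boxplot()

p_box
```

Note that group_by() on the data has no effect on how ggplot2 groups the boxes; only the aesthetics do.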

Related

grouped dataframe - groupings as facets in a ggplot?

Some data
grp_diamonds <- diamonds %>%
group_by(cut) %>%
group_split
grp_diamonds[[1]] %>%
ggplot(aes(x = carat, y = price)) +
geom_point()
This returns a plot for grp_diamonds[[1]]
But grp_diamonds is actually a list of 5 dataframes since I used group_split() earlier.
Is there a clever way to automatically use the groups as facets?
Yes, in this example you could just do this:
diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point() +
facet_wrap(vars(cut))
But I wondered if there was a way to automatically facet based on existing groupings?
Making use of dplyr::groups, dplyr::vars and !!!, one option would be:
library(dplyr)
library(ggplot2)
grp_diamonds <- diamonds %>%
group_by(cut, color)
grp_diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point() +
facet_wrap(facets = vars(!!!groups(grp_diamonds)))
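The same idea can be wrapped in a small helper so the grouping only has to be named once. This is a sketch: facet_by_groups is a hypothetical function, not part of dplyr or ggplot2, and it assumes the data frame passed in is the same grouped one that is being plotted.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not part of dplyr or ggplot2): build a facet
# specification from whatever grouping a data frame already carries.
facet_by_groups <- function(df, ...) {
  grp <- dplyr::groups(df)
  if (length(grp) == 0) {
    stop("`df` has no grouping variables to facet by.")
  }
  # Splice the grouping symbols into vars(); extra arguments go to facet_wrap()
  facet_wrap(facets = vars(!!!grp), ...)
}

grp_diamonds <- diamonds %>%
  group_by(cut)

p_facets <- grp_diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  facet_by_groups(grp_diamonds)
```

The grouped data frame still has to be passed to the helper explicitly, because a layer added with + cannot see the data that was piped into ggplot().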

Add 2 additional lines to a ggplot

I have successfully plotted my dat below. However, I want to do the following data transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then ADD 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)
I'm not sure the second part of your question makes sense to me. But from the description, one way is to simply add a layer in which you alter the data and aes arguments, as I do in the example below (using mtcars as example data):
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
geom_point(aes(col = factor(cyl))) +
stat_ellipse(aes(col = factor(cyl))) +
# Add a line connecting the ellipse centers
geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
# Add smooth for.. what? I don't understand this part of the question.
geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. What do you mean by the second line (geom_smooth), and why do you want it? Shifting the x-axis for the smoother makes no sense to me. I also took the liberty of changing the first part: instead of a smoother, I added the group means (the centers of the ellipses) to the plot as single points and connected them using geom_line.
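For reference, a literal reading of the question could be sketched as follows, again on mtcars, with hp standing in for x and cyl for groups. This is only one interpretation: the linear fit (method = "lm") and the vertical placement of the stars at the group mean of y are assumptions, and the names dat2, centers and p_lines are illustrative.

```r
library(dplyr)
library(ggplot2)

# Group-mean centering, mirroring the transformation in the question
# (hp plays the role of x, cyl the role of groups).
dat2 <- mtcars %>%
  group_by(cyl) %>%
  mutate(x_cm  = mean(hp),      # group mean of x
         x_cwc = hp - x_cm) %>% # x centered within its group
  ungroup()

# One row per group, used to draw the star markers.
# Placing the stars at the group mean of y is an assumption.
centers <- dat2 %>%
  group_by(cyl) %>%
  summarize(x_cm = mean(hp), y_m = mean(mpg), .groups = "drop")

p_lines <- ggplot(dat2, aes(x = hp, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  stat_ellipse(aes(color = factor(cyl))) +
  # Line 1: y regressed on the group means of x
  geom_smooth(aes(x = x_cm), method = "lm", formula = y ~ x,
              se = FALSE, color = "black") +
  # Line 2: y regressed on the group-centered x
  geom_smooth(aes(x = x_cwc), method = "lm", formula = y ~ x,
              se = FALSE, color = "grey40", linetype = "dashed") +
  # shape 8 is an asterisk-like star
  geom_point(data = centers, aes(x = x_cm, y = y_m),
             shape = 8, size = 4)
```

Because x_cwc is centered at zero, the second line sits around x = 0 and stretches the x-axis well to the left of the raw data.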

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to pass position_dodge() to geom_text to match the position of the boxes, and also set its data argument so that each count is drawn only once:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
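A variation on the answers above is to build the label data with dplyr::count(), which returns one row per cyl/am/gear combination directly. This is a sketch rather than a definitive fix: the dodge width of 0.75 is an assumed value that may need tuning, and the names mt_fct, counts and p_counts are illustrative.

```r
library(dplyr)
library(ggplot2)

# Convert the discrete variables to factors up front
mt_fct <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per cyl/am/gear combination, so each label is drawn once
counts <- mt_fct %>%
  count(cyl, am, gear, name = "N")

p_counts <- ggplot(mt_fct, aes(x = am, y = wt, fill = gear)) +
  geom_boxplot() +
  geom_text(data = counts,
            aes(y = 6, label = N, group = gear),
            # 0.75 is an assumed width; adjust if labels drift off the boxes
            position = position_dodge(width = 0.75)) +
  facet_grid(. ~ cyl)
```

Keeping the counts in their own data frame also makes it easy to check them before plotting.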

Error: Mapping should be created with `aes() or `aes_()`

df %>%
group_by(state_name) %>%
summarise(TotalPopulation = mean(population_total)) %>%
ggplot(data=df, aes(x= state_name, y=TotalPopulation)) +
geom_bar("stat=identity")
I am getting the error
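Error: Mapping should be created with `aes() or `aes_()`.
There are 2 issues. First, you don't need the data argument, as the data is already piped in. With data=df given by name, the piped-in data frame is matched to the next argument, mapping, which is what triggers the error. Second, as @rpolicastro says, only 'identity' should be in quotes: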
df %>%
group_by(state_name) %>%
summarise(TotalPopulation = mean(population_total)) %>%
ggplot(aes(x= state_name, y=TotalPopulation)) +
geom_bar(stat="identity")
Let's re-create the problem with mtcars:
mtcars %>%
group_by(cyl) %>%
summarise(TotalMpg = mean(mpg)) %>%
ggplot(data=mtcars, aes(x= cyl, y=TotalMpg)) +
geom_bar("stat=identity")
Error: Mapping should be created with `aes() or `aes_()`
There are 2 points to fix:
you pass your data with the pipe, so data=mtcars should be removed
the quotes should go around the value only: geom_bar(stat="identity") instead of geom_bar("stat=identity")
The following code produces the plot:
mtcars %>%
group_by(cyl) %>%
summarise(TotalMpg = mean(mpg)) %>%
ggplot(aes(x= cyl, y=TotalMpg)) +
geom_bar(stat="identity")
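As a side note, geom_col() draws bars whose heights are the y values in the data, so it can stand in for geom_bar(stat="identity"). A minimal sketch on the same mtcars summary (the names mean_mpg and p_col are illustrative):

```r
library(dplyr)
library(ggplot2)

mean_mpg <- mtcars %>%
  group_by(cyl) %>%
  summarise(TotalMpg = mean(mpg))

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity")
p_col <- mean_mpg %>%
  ggplot(aes(x = cyl, y = TotalMpg)) +
  geom_col()

p_col
```

This avoids the stat argument entirely, which removes one place where the quoting can go wrong.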

Using purrr to plot a mean line on a list of ggplot2 objects

I am playing around with purrr and the methods outlined in this post.
My goal here is to use purrr::map2 (or some variant) to apply a function (mean in this case) by a group (cyl), and then create some plots that use the result of that function. Or, put another way, I want to add a vertical line for the mean on each of these plots using the mean_list list column, all within a dplyr chain. Is this possible?
library(tidyverse)
mt_list <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(mean_list = map2(data, cyl, ~mean(.$disp))) %>%
mutate(plot = map2(data, cyl, ~ggplot(data = .x) +
geom_point(aes(y = drat, x = disp)) #+
#geom_vline(data = mean_list ,aes(xintercept) Unsure about this step
))
This is an example of one type of plot I'm after but this seems like a silly way to do this when the whole point is to have everything contained within a nice tibble like mt_list
mt_list$plot[[1]] +
geom_vline(aes(xintercept = mt_list$mean_list[[1]]))
This can be done by passing mean_list as the second argument to map2 rather than cyl, then using xintercept = .y in your geom_vline.
mt_list <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(mean_list = map(data, ~mean(.$disp))) %>%
mutate(plot = map2(data, mean_list, ~ ggplot(data = .x) +
geom_point(aes(y = drat, x = disp)) +
geom_vline(xintercept = .y)
))
Note that for this particular use case, you can also avoid having to create mean_list at all by using aes(xintercept = mean(disp)):
mt_list <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(plot = map(data, ~ ggplot(data = .) +
geom_point(aes(y = drat, x = disp)) +
geom_vline(aes(xintercept = mean(disp)))))
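If more than two columns are needed inside the plotting function, for example to put the cyl value in the title, purrr::pmap() is one variant to consider. The sketch below assumes that use case; the column name mean_disp and the argument names d, m and k are illustrative, not from the original post.

```r
library(tidyverse)

mt_list <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(mean_disp = map_dbl(data, ~ mean(.x$disp)),
         # pmap() iterates over data, mean_disp and cyl in parallel
         plot = pmap(list(data, mean_disp, cyl),
                     function(d, m, k) {
                       ggplot(d, aes(x = disp, y = drat)) +
                         geom_point() +
                         geom_vline(xintercept = m) +
                         labs(title = paste("cyl =", k))
                     }))

# Print every stored plot; walk() is used only for its side effect
walk(mt_list$plot, print)
```

Using map_dbl() keeps the means as a plain numeric column instead of a list column, which makes the tibble easier to inspect.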
