plot variable position in dodged bar chart - r

I have a dodged bar chart generated from the mtcars dataset showing the count of cars for each combination of gear and cyl. I'd like to mark a particular make of car on the chart (the attached chart illustrates this better). The code does what it is told (the points are placed in line with the x-axis label for cyl), but I'd like the points to align with the bar for the matching number of gears instead. Any ideas please?
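library(dplyr)    # %>%, select(), mutate(), group_by(), tally(), add_rownames()
library(ggplot2)  # ggplot(), geom_bar(), geom_point()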
carsraw <- mtcars
cars <- mtcars %>%
select(cyl,gear) %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
group_by(cyl,gear) %>%
tally() %>%
rename(count=n)
Hornet <- mtcars %>%
add_rownames("model") %>%
filter(model %in% c("Hornet 4 Drive","Hornet Sportabout")) %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
mutate(count=7) %>%
select(model,cyl,gear,count)
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5)+
aes(x=cyl,y=count,fill=gear))

You could try using position_nudge in this particular case, although it is a bit of a hack:
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5, position = position_nudge(-0.25))+
aes(x=cyl,y=count,fill=gear))
Edit
Given the comments by the OP, here is an alternative that changes the preparation of the data a little. What I am doing here is recording the existence of the model in each group as a Hornet column in the summarise call (and setting it to NA for (cyl, gear) pairs that don't include the specific model). Additionally, you have to use group = gear and position = position_dodge(1):
library(tidyverse)
cars <- mtcars %>%
rownames_to_column("model") %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
group_by(cyl,gear) %>%
summarise(
count = n(),
Hornet = ifelse(
any(model %in% c("Hornet 4 Drive","Hornet Sportabout")),
7,
NA
)
) %>% ungroup()
cars %>% ggplot() +
geom_col(
aes(
x = cyl,
y = count,
fill = gear
),
position = "dodge"
) +
geom_point(
aes(
x = cyl,
y = Hornet,
group = gear
),
na.rm = TRUE,
position = position_dodge(1),
size = 5
)
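One detail may be worth checking in the answer above: with position = "dodge", the bars are spread over ggplot2's default bar width of 0.9, while the points are dodged over a width of 1, so the points can sit very slightly off the bar centres. Here is a minimal sketch that shares a single dodge object between both layers. The width of 0.9 is an assumption based on the default bar width (adjust it if you set width yourself), and the names hornets and dodge are only illustrative.

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Models to highlight (same two models as in the question)
hornets <- c("Hornet 4 Drive", "Hornet Sportabout")

cars <- mtcars %>%
  rownames_to_column("model") %>%
  mutate(cyl = factor(cyl), gear = factor(gear)) %>%
  group_by(cyl, gear) %>%
  summarise(
    count  = n(),
    # 7 is the arbitrary marker height used in the question; NA means "no marker"
    Hornet = if (any(model %in% hornets)) 7 else NA_real_,
    .groups = "drop"
  )

# Assumption: 0.9 matches the default bar width, so bars and points share offsets
dodge <- position_dodge(width = 0.9)

p <- ggplot(cars, aes(x = cyl, group = gear)) +
  geom_col(aes(y = count, fill = gear), position = dodge) +
  geom_point(aes(y = Hornet), position = dodge, size = 5, na.rm = TRUE)
p
```

Because both layers use the same grouping and the same dodge width, each point should land on the centre of its own bar rather than near it.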

Related

Create bar graph-find the average mpg by the number of gears

mtcars %>%
group_by(gear, mpg) %>%
summarise(m = mean(mpg)) %>%
ggplot(aes(x = mpg, y = gear)) +
geom_bar(stat = "count")
I cannot figure out how to create a bar graph with the average mpg by the number of gears.
Is that what you need?
packages
library(dplyr)
library(ggplot2)
Average mpg (m) by the number of gears
mtcars %>%
group_by(gear) %>%
summarise(m = mean(mpg)) %>%
ungroup() %>%
ggplot(aes(y = m, x = gear)) +
geom_bar(stat = "identity")
First, we get the mean of mpg by gear. To do that, you want to group by gear (just gear. You don't need to group by mpg as well).
Ungroup, so you have a unified dataset.
Now you want to plot the mean you created (m) by gear. You can choose which of them goes where. In this case, I put gear on the x-axis and the mean of mpg on the y-axis.
Given you have specific values for the mean, you don't have to count all the values. Just plot the specific value you have there. Thus, use stat = "identity" instead of stat = "count"
Now you can play with colors using fill argument in aes and change the titles and axis labels.
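As a rough illustration of the last two points, the sketch below uses geom_col(), which is shorthand for geom_bar(stat = "identity"), and adds a fill colour plus labels. The label texts and the name avg_mpg are only placeholders, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# One row per gear, holding the mean mpg
avg_mpg <- mtcars %>%
  group_by(gear) %>%
  summarise(m = mean(mpg))

# factor(gear) gives a discrete axis and a discrete fill scale
ggplot(avg_mpg, aes(x = factor(gear), y = m, fill = factor(gear))) +
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(
    x = "Number of gears",
    y = "Average mpg",
    fill = "Gears",
    title = "Average mpg by number of gears"
  )
```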
output
In base R (i.e. without additional libraries) you might do
with(mtcars, tapply(mpg, gear, mean)) |>
barplot(xlab='gear', ylab='mean mpg', col=4, main='My plot')

How to order facets by variable in ggplot2?

Suppose I have a graph like this:
library(tidyverse)
df <- mtcars %>%
group_by(cyl, gear) %>%
summarise(hp_mean = mean(hp))
ggplot(df, aes(x = gear, y = hp_mean)) +
geom_point(size = 2.12, colour = "black") +
theme_bw() +
facet_wrap(vars(cyl))
and would like to arrange the order of facets, according to the hp_mean value for gear=3. E.g. the facet with cyl=8 should be first as hp_mean for gear=3 is 194 which is the highest.
Any ideas?
All help is much appreciated!
Might not be the tidiest answer out there but you could:
extract the level of hp when gear == 3 to create a variable to order by (hp_gear3)
use forcats::fct_reorder() to reorder by the mean of this value across gear (from group_by() command)
use .desc = TRUE to put in descending order
plot using stat_summary to do the mean calculation for you
mtcars %>%
group_by(gear) %>%
mutate(hp_gear3 = ifelse(gear == 3, hp, NA),
cyl = fct_reorder(factor(cyl),
hp_gear3,
mean,
na.rm = TRUE,
.desc = TRUE)) %>%
ggplot(aes(gear, hp)) +
stat_summary(fun = mean) +
facet_wrap(~cyl)
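If you would rather keep the summarised df from the question, a possible alternative is to work out the facet order explicitly and set it as the factor levels. This is only a sketch: cyl_order is an illustrative name, and it assumes every cyl has a row for gear == 3 (true for mtcars), since any cyl missing from the levels would turn into NA.

```r
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(hp_mean = mean(hp), .groups = "drop")

# cyl values sorted by their hp_mean at gear == 3, highest first
cyl_order <- df %>%
  filter(gear == 3) %>%
  arrange(desc(hp_mean)) %>%
  pull(cyl)

# facet_wrap() lays out panels in factor-level order
df <- df %>% mutate(cyl = factor(cyl, levels = cyl_order))

ggplot(df, aes(x = gear, y = hp_mean)) +
  geom_point(size = 2.12, colour = "black") +
  theme_bw() +
  facet_wrap(vars(cyl))
```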

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to give position_dodge() inside geom_text to match the position of the boxes, and also define the data argument to get the distinct values of observations:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
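The filtering step suggested above (one label per cyl/am/gear combination) can also be sketched with dplyr::count(), which builds the label table directly. The names cars and labels are illustrative, and the dodge width of 0.8 is simply carried over from the answer above, so treat it as a value to tune by eye rather than a guaranteed match for the box positions.

```r
library(dplyr)
library(ggplot2)

cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per cyl/am/gear combination, with N observations in each
labels <- cars %>% count(cyl, am, gear, name = "N")

ggplot(cars, aes(x = am, y = wt, fill = gear,
                 group = interaction(gear, am))) +
  facet_grid(. ~ cyl) +
  geom_boxplot() +
  # Separate data for the text layer, so each label is drawn exactly once
  geom_text(data = labels,
            aes(y = 6, label = N),
            position = position_dodge(width = 0.8))
```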

Ordering in geom_dotplot with 2 variables

I am using dotplot to do an analysis with two y variables per x variable. I'd like to arrange the chart so that it descends by one of the y variables. I used the reorder() function in the aes() and it reorders it slightly, but not entirely. Chart 1 is what it looks like before, and chart 2 is what it looks like after I use reorder().
Chart 1:
Chart 2:
Here's the code:
answers %>%
ggplot(aes(x = reorder(locale, -percent) , y = percent, fill = box)) +
geom_dotplot(binaxis='y',
stackdir='center',
dotsize = 1,
binwidth = 0.01) +
geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi), width = .5, position = position_dodge(0))
And this is what the "answers" df looks like. The two variables being plotted per locale are in the "box" column - there's a top_box and bottom_box row for each locale:
As pointed out in the comments, you do not provide any data, but I think I have an idea of where you're going wrong.
Here is some example data. I'm going to use a modified mtcars for the example where we will look at the min and max weight of the cars by make.
library(tidyverse)
df <- mtcars %>% rownames_to_column() %>%
select(car = rowname, wt) %>%
mutate(car = gsub("\\s.*?$", "", car)) %>%
group_by(car) %>%
mutate(n = n()) %>%
filter(n > 1) %>%
arrange(car,wt) %>%
filter(row_number() == max(row_number()) | row_number() == min(row_number())) %>%
select(-n) %>%
ungroup() %>%
mutate(stat = rep(c("min", "max"), nrow(.)/2)) %>%
spread(stat, wt)
print(df)
# car max min
# Fiat 2.2 1.94
# Hornet 3.44 3.22
# Mazda 2.88 2.62
# Merc 4.07 3.15
# Toyota 2.46 1.84
Here is what the plot for that would look like:
df %>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
Now let's talk about what you're trying to do. You say that you would like to order by descending on one of your variables.
df %>%
arrange(-max)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+
geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
or
df %>%
arrange(-min)%>%
mutate(car = factor(car, levels = car))%>%
ggplot(aes(x = car))+geom_point(aes(y = max), color = "red")+
geom_point(aes(y = min), color = "blue")
I think the key here is that you want to arrange the data and then set the factor levels to get the desired output. If your data is not a factor, then ggplot will use alphabetical order. You may need to spread your data in order to use the exact method outlined above.
Update
You could do this without spreading your data, by arranging with two variables.
Here we will modify the data above to long format
df2 <- df %>% gather(measure, value, -car)
Which plots like this
df2 %>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
and then we can arrange without spreading
df2 %>%
arrange(-value, measure) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
or for descending by min
df2 %>%
arrange(desc(measure), -value) %>%
mutate(car = factor(car, levels = unique(car)))%>%
ggplot(aes(x = car, y = value, color = measure))+
geom_point()
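Another way to express the same idea on long data, shown here only as a sketch, is to compute the level order from the rows of a single measure and then apply it to the whole data frame. The data frame below is typed in by hand from the rounded values printed earlier, and car_order is an illustrative name.

```r
library(dplyr)
library(ggplot2)

# Long-format data, using the rounded values printed above
df2 <- data.frame(
  car     = rep(c("Fiat", "Hornet", "Mazda", "Merc", "Toyota"), times = 2),
  measure = rep(c("max", "min"), each = 5),
  value   = c(2.20, 3.44, 2.88, 4.07, 2.46,   # max
              1.94, 3.22, 2.62, 3.15, 1.84),  # min
  stringsAsFactors = FALSE
)

# Order the cars by one measure only (here: descending max)
car_order <- df2 %>%
  filter(measure == "max") %>%
  arrange(desc(value)) %>%
  pull(car)

df2 %>%
  mutate(car = factor(car, levels = car_order)) %>%
  ggplot(aes(x = car, y = value, color = measure)) +
  geom_point()
```

Swapping "max" for "min" in the filter() call orders the axis by the other measure. This also suggests why reorder(locale, -percent) only partly worked in the question: reorder() summarises all rows of each locale (by default with the mean), so both box values feed into the ordering.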

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in :
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and we'll map it to color. We also add rows (using complete) for missing combinations of am,vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
group_by(vs=factor(vs), cyl=factor(cyl), am=factor(am)) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ungroup() %>%
complete(am, cyl, nesting(vs)) %>%
ggplot() +
geom_linerangeh(aes(y = am, colour=vs, xmin = lo, xmax = hi),
position = position_dodgev(height = 0.5)) +
facet_wrap(~cyl, ncol = 1) +
theme_bw()
