Create a bar graph of the average mpg by the number of gears in R

mtcars %>%
group_by(gear, mpg) %>%
summarise(m = mean(mpg)) %>%
ggplot(aes(x = mpg, y = gear)) +
geom_bar(stat = "count")
I cannot figure out how to create a bar graph with the average mpg by the number of gears.

Is that what you need?
# packages
library(dplyr)
library(ggplot2)
# Average mpg (m) by the number of gears
mtcars %>%
  group_by(gear) %>%
  summarise(m = mean(mpg)) %>%
  ungroup() %>%
  ggplot(aes(y = m, x = gear)) +
  geom_bar(stat = "identity")
First, we get the mean of mpg by gear. To do that, you want to group by gear (just gear. You don't need to group by mpg as well).
Ungroup, so you have a unified dataset.
Now you want to plot the mean you created (m) by gear. You can choose which of them goes where. In this case, I put gear on the x-axis and the mean of mpg on the y-axis.
Given you have specific values for the mean, you don't have to count all the values. Just plot the specific value you have there. Thus, use stat = "identity" instead of stat = "count"
Now you can play with colors using fill argument in aes and change the titles and axis labels.
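As a minimal sketch of that last step, the same summary can be drawn with geom_col(), which is equivalent to geom_bar(stat = "identity"), with fill mapped to gear and the labels set through labs(). The object name avg_mpg, the title, and the label wording are illustrative choices, not part of the answer above.

```r
library(dplyr)
library(ggplot2)

# Mean mpg for each number of gears (one row per gear)
avg_mpg <- mtcars %>%
  group_by(gear) %>%
  summarise(m = mean(mpg))

# factor(gear) makes gear discrete, so each bar gets its own fill colour
ggplot(avg_mpg, aes(x = factor(gear), y = m, fill = factor(gear))) +
  geom_col() +
  labs(title = "Average mpg by number of gears",
       x = "Number of gears", y = "Average mpg", fill = "Gears")
```

Moving fill outside aes() (for example geom_col(fill = "steelblue")) would give every bar the same colour instead.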

In base R (i.e. without additional libraries) you might do
with(mtcars, tapply(mpg, gear, mean)) |>
  barplot(xlab='gear', ylab='mean mpg', col=4, main='My plot')

Related

Filling bar colours with the mean of another continuous variable in ggplot2 histograms

I have a dataset at the municipality level. I would like to draw a histogram of a given variable and, at the same time, fill the bars with another continuous variable (using a color gradient). This is because I believe the municipalities with low values of the variable I am plotting the histogram for have very different population size (on average) when comparing with the municipalities that are in the upper end of the distribution.
Using the mtcars data, say I would like to plot the distribution of mpg and fill the bars with a continuous color to represent the mean of the variable wt for each of the histogram bars. I typed the code below, but I don't know how to make the fill option take the average of wt. I would want a legend to show up with a color gradient to indicate whether the mean value of wt for each histogram bar is low, medium, or high in relative terms.
mtcars %>%
ggplot(aes(x=mpg, fill=wt)) +
geom_histogram()
If you want a genuine histogram you need to transform your data to do this by summarizing it first, and plot with geom_col rather than geom_histogram. The base R function hist will help you here to generate the breaks and midpoints:
library(ggplot2)
library(dplyr)
mtcars %>%
mutate(mpg = cut(x = mpg,
breaks = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$breaks,
labels = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$mids)) %>%
group_by(mpg) %>%
summarize(n = n(), wt = mean(wt)) %>%
ggplot(aes(x = as.numeric(as.character(mpg)), y = n, fill = wt)) +
scale_x_continuous(limits = c(0, 40), name = "mpg") +
geom_col(width = 10) +
theme_bw()
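To see what hist() contributes here, a small sketch that computes the bins once and inspects them may help. The names h and binned are illustrative, and the values in the comments are for mtcars with breaks at 0, 10, 20, 30, 40.

```r
library(dplyr)

# Compute the histogram bins once, without drawing anything
h <- hist(mtcars$mpg, breaks = 0:4 * 10, plot = FALSE)
h$breaks  # bin edges: 0 10 20 30 40
h$mids    # bin midpoints: 5 15 25 35
h$counts  # cars per bin: 0 18 10 4

# Same bins via cut(); the empty (0,10] bin produces no row after summarising
binned <- mtcars %>%
  mutate(bin = cut(mpg, breaks = h$breaks, labels = h$mids)) %>%
  group_by(bin) %>%
  summarize(n = n(), wt = mean(wt))
binned
```

The counts from hist() match the n column, which is why plotting n with geom_col at the midpoints reproduces the histogram while leaving fill free for the mean of wt.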
It is not exactly a histogram, but it is the closest I could think of for your problem:
library(tidyverse)
mtcars %>%
#Create breaks for mpg, where this sequence is just an example
mutate(mpg_cut = cut(mpg,seq(10,35,5))) %>%
#Count and mean of wt by mpg_cut
group_by(mpg_cut) %>%
summarise(
n = n(),
wt = mean(wt)
) %>%
ggplot(aes(x=mpg_cut, fill=wt)) +
#Bar plot
geom_col(aes(y = n), width = 1)

How to order facets by variable in ggplot2?

Suppose I have a graph like this:
library(tidyverse)
df <- mtcars %>%
group_by(cyl, gear) %>%
summarise(hp_mean = mean(hp))
ggplot(df, aes(x = gear, y = hp_mean)) +
geom_point(size = 2.12, colour = "black") +
theme_bw() +
facet_wrap(vars(cyl))
and would like to arrange the order of facets, according to the hp_mean value for gear=3. E.g. the facet with cyl=8 should be first as hp_mean for gear=3 is 194 which is the highest.
Any ideas?
All help is much appreciated!
Might not be the tidiest answer out there but you could:
extract the level of hp when gear == 3 to create a variable to order by (hp_gear3)
use forcats::fct_reorder() to reorder by the mean of this value across gear (from group_by() command)
use .desc = TRUE to put in descending order
plot using stat_summary to do the mean calculation for you
mtcars %>%
group_by(gear) %>%
mutate(hp_gear3 = ifelse(gear == 3, hp, NA),
cyl = fct_reorder(factor(cyl),
hp_gear3,
mean,
na.rm = TRUE,
.desc = TRUE)) %>%
ggplot(aes(gear, hp)) +
stat_summary(fun = mean) +
facet_wrap(~cyl)
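An alternative sketch, not the approach in the answer above: compute the facet order explicitly from the summarised data and set the factor levels by hand. The helper name cyl_order is hypothetical, and this assumes every cyl value has at least one car with gear == 3 (true for mtcars).

```r
library(dplyr)
library(ggplot2)

df <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(hp_mean = mean(hp), .groups = "drop")

# cyl values sorted by their mean hp at gear == 3, highest first
cyl_order <- df %>%
  filter(gear == 3) %>%
  arrange(desc(hp_mean)) %>%
  pull(cyl)

# Facets follow the factor level order
df <- df %>% mutate(cyl = factor(cyl, levels = cyl_order))

ggplot(df, aes(x = gear, y = hp_mean)) +
  geom_point(size = 2.12, colour = "black") +
  theme_bw() +
  facet_wrap(vars(cyl))
```

Setting the levels directly keeps the ordering rule visible in one place, at the cost of an extra intermediate object.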

ggplot barplot mean values on graph

I want to create a barplot with 2 factors and 1 continuous variable for y.
My code is (it is based on the built-in dataset mtcars):
data(mtcars)
x=mtcars
library(ggplot2)
ggplot(x,aes(x=factor(carb), y=mpg, fill=factor(carb))) +
  geom_bar(stat="summary",fun.y="mean") +
  labs(title="Barplot of Average MPG per Carbon category per # of Cylinders", y="Mean MPG",x="Carbon Category") +
  facet_grid(.~factor(cyl)) +
  geom_text(aes(label=mpg),vjust=3)
My goal is to have (and show) the average MPG value per carbon category, per cylinder category. Is my code correct?
The main problem is, I just want the mean value shown on each bar, not all values for this combination of factor values.
For example:
subset(x,c(x$carb==3 & x$cyl==8)) returns 3 different values for MPG, and the graph shows all these three!
You can try
library(tidyverse)
mtcars %>%
group_by(carb, cyl) %>%
summarise(AverageMpg = mean(mpg)) %>%
ggplot(aes(factor(carb), AverageMpg, label=AverageMpg, fill=factor(carb))) +
geom_col() +
geom_text(nudge_y = 0.5) +
facet_grid(~cyl, scales = "free_x", space = "free_x")
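The unrounded means can produce long labels. A possible variation, sketched here with an illustrative label column and an arbitrary vjust value, rounds each mean to one decimal before plotting:

```r
library(dplyr)
library(ggplot2)

avg <- mtcars %>%
  group_by(carb, cyl) %>%
  summarise(AverageMpg = mean(mpg), .groups = "drop") %>%
  mutate(label = round(AverageMpg, 1))  # one rounded label per bar

ggplot(avg, aes(factor(carb), AverageMpg, fill = factor(carb))) +
  geom_col() +
  geom_text(aes(label = label), vjust = -0.5) +  # place text just above each bar
  facet_grid(~cyl, scales = "free_x", space = "free_x")
```

Because the data are summarised first, each carb and cyl combination contributes exactly one row and therefore one label. For carb == 3 and cyl == 8, the three mpg values 16.4, 17.3 and 15.2 collapse to a single label of 16.3.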
If I understand correctly, I suppose this is what you're trying to achieve.
data(mtcars)
library(tidyverse)
mtcars %>%
group_by(carb, cyl) %>%
summarise(AverageMpg = mean(mpg)) %>%
ungroup() %>%
mutate(carb = factor(carb)) %>%
ggplot(mapping = aes(x=carb, y=AverageMpg, fill=carb)) +
geom_col() +
scale_y_continuous(name = "Mean MPG") +
scale_x_discrete("Carbon Category") +
labs(title="Barplot of Average MPG per Carbon category per # of Cylinders") +
facet_grid(.~cyl)

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in :
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and map it to colour. We also add rows (using complete) for missing combinations of am, vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
group_by(vs=factor(vs), cyl=factor(cyl), am=factor(am)) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ungroup() %>%
complete(am, cyl, nesting(vs)) %>%
ggplot() +
geom_linerangeh(aes(y = am, colour=vs, xmin = lo, xmax = hi),
position = position_dodgev(height = 0.5)) +
facet_wrap(~cyl, ncol = 1) +
theme_bw()
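As a hedged alternative, assuming ggplot2 3.3.0 or later (where geom_linerange accepts xmin and xmax with a discrete y), the same idea can be sketched without ggstance by using position_dodge. The object name ranges and the dodge width are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

ranges <- mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarize(lo = min(mpg), hi = max(mpg), .groups = "drop") %>%
  complete(am, cyl, vs)  # add NA rows for missing combinations

# Horizontal ranges: y is discrete, xmin/xmax give the extent,
# and colour = vs supplies the sub-groups that get dodged
ggplot(ranges) +
  geom_linerange(aes(y = am, xmin = lo, xmax = hi, colour = vs),
                 position = position_dodge(width = 0.5), na.rm = TRUE) +
  facet_wrap(~cyl, ncol = 1) +
  theme_bw()
```

As in the ggstance version, the dodging comes from colour splitting each am level into sub-groups; without such a mapping there is nothing to dodge.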

plot variable position in dodged bar chart

I have a dodged bar chart generated from the mtcars dataset showing a histogram of gear vs cyl. I'd like to mark a particular make of car on the chart (the attached chart illustrates this better). The code is doing what it should (points are placed in line with the x-axis label cyl), but I'd like the points to align with the bar for the correct number of gears instead. Any ideas please?
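library(tidyverse)  # provides %>%, dplyr verbs, ggplot2 and rownames_to_column()
carsraw <- mtcars
# Count of cars for each cyl/gear combination
cars <- mtcars %>%
  select(cyl,gear) %>%
  mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
  group_by(cyl,gear) %>%
  tally() %>%
  rename(count=n)
# The two models to mark; rownames_to_column() replaces the deprecated add_rownames()
Hornet <- mtcars %>%
  rownames_to_column("model") %>%
  filter(model %in% c("Hornet 4 Drive","Hornet Sportabout")) %>%
  mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
  mutate(count=7) %>%
  select(model,cyl,gear,count)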
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5)+
aes(x=cyl,y=count,fill=gear))
You could try using position_nudge in this particular case, although it is a bit of a hack:
(ggplot(data=cars)+
aes(x=cyl,y=count,fill=gear) +
geom_bar( position = "dodge", stat="identity")+
geom_point(data=Hornet,size=5, position = position_nudge(-0.25))+
aes(x=cyl,y=count,fill=gear))
Edit
Given the comments by the OP, here is an alternative that changes the preparation of the data a little. What I am doing here is recording the existence of the model in the group as a Hornet column in the summarise function (and adding NA for (cyl, gear) pairs that don't include the specific model). Additionally, you have to use group = gear and position = position_dodge(1):
library(tidyverse)
cars <- mtcars %>%
rownames_to_column("model") %>%
mutate(cyl=as.factor(cyl),gear=as.factor(gear)) %>%
group_by(cyl,gear) %>%
summarise(
count = n(),
Hornet = ifelse(
any(model %in% c("Hornet 4 Drive","Hornet Sportabout")),
7,
NA
)
) %>% ungroup()
cars %>% ggplot() +
geom_col(
aes(
x = cyl,
y = count,
fill = gear
),
position = "dodge"
) +
geom_point(
aes(
x = cyl,
y = Hornet,
group = gear
),
na.rm = TRUE,
position = position_dodge(1),
size = 5
)

Resources