The code I have is as follows:
library(dplyr)
library(ggplot2)
library(ggmosaic)
mtcars_tab <- mtcars %>% count(cyl, gear) # counts the number of gear / cylinder combinations
mtcars_tab %>%
ggplot() +
geom_mosaic(aes(x = product(gear), fill = cyl, weight = n), divider = mosaic("v")) +
xlab("Gear") +
ylab("cyl") +
ggtitle("Distribution of Gears and Cylinders in the mtcars data")
I would like to annotate each rectangle of the mosaic plot with its count n (preferably centered).
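One possible route, sketched here without ggmosaic: compute the rectangle corners yourself with dplyr and draw them with geom_rect() and geom_text(). This is a swapped-in technique, not a geom_mosaic() feature. The names widths, rects and p_mosaic are made up for the example, and the layout (column width by gear total, cells stacked by cyl inside each gear) is my assumption about what the plot above is meant to show. It assumes dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)
library(ggplot2)

mtcars_tab <- mtcars %>% count(cyl, gear)

# Horizontal extent of each gear column, proportional to the gear total
widths <- mtcars_tab %>%
  group_by(gear) %>%
  summarise(gear_n = sum(n), .groups = "drop") %>%
  arrange(gear) %>%
  mutate(xmax = cumsum(gear_n) / sum(gear_n),
         xmin = xmax - gear_n / sum(gear_n))

# Vertical extent of each cyl cell inside its gear column
rects <- mtcars_tab %>%
  arrange(gear, cyl) %>%
  group_by(gear) %>%
  mutate(ymax = cumsum(n) / sum(n),
         ymin = ymax - n / sum(n)) %>%
  ungroup() %>%
  left_join(widths, by = "gear")

# Draw the cells and put n at the center of each one
p_mosaic <- ggplot(rects) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = factor(cyl)), colour = "white") +
  geom_text(aes(x = (xmin + xmax) / 2, y = (ymin + ymax) / 2, label = n)) +
  labs(x = "Gear", y = "cyl", fill = "cyl",
       title = "Distribution of Gears and Cylinders in the mtcars data")
p_mosaic
```

Both axes are proportions in this sketch, so the gear values would still need to be added as axis labels (for example with scale_x_continuous and the column midpoints as breaks).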
I have a dataset at the municipality level. I would like to draw a histogram of a given variable and, at the same time, fill the bars with another continuous variable (using a color gradient). This is because I believe the municipalities with low values of the variable I am plotting have, on average, very different population sizes from the municipalities at the upper end of the distribution.
Using the mtcars data, say I would like to plot the distribution of mpg and fill each bar with a continuous color representing the mean of wt within that bar. I typed the code below, but I don't know how to make the fill aesthetic take the average of wt. I would also like a legend with a color gradient showing whether the mean wt of each bar is low, medium, or high in relative terms.
mtcars %>%
ggplot(aes(x=mpg, fill=wt)) +
geom_histogram()
If you want a genuine histogram, you need to summarize your data first and plot it with geom_col rather than geom_histogram. The base R function hist can generate the breaks and midpoints for you:
library(ggplot2)
library(dplyr)
mtcars %>%
mutate(mpg = cut(x = mpg,
breaks = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$breaks,
labels = hist(mpg, breaks = 0:4 * 10, plot = FALSE)$mids)) %>%
group_by(mpg) %>%
summarize(n = n(), wt = mean(wt)) %>%
ggplot(aes(x = as.numeric(as.character(mpg)), y = n, fill = wt)) +
scale_x_continuous(limits = c(0, 40), name = "mpg") +
geom_col(width = 10) +
theme_bw()
It is not exactly a histogram, but it was the closest I could think of for your problem:
library(tidyverse)
mtcars %>%
#Create breaks for mpg, where this sequence is just an example
mutate(mpg_cut = cut(mpg,seq(10,35,5))) %>%
#Count and mean of wt by mpg_cut
group_by(mpg_cut) %>%
summarise(
n = n(),
wt = mean(wt)
) %>%
ggplot(aes(x=mpg_cut, fill=wt)) +
#Bar plot
geom_col(aes(y = n), width = 1)
I would like to take a faceted histogram and add text on each plot indicating the total number of observations in that facet. So for carb = 1 the total count would be 7, carb = 2 the total count would be 10 etc.
p <- ggplot(mtcars, aes(x = mpg, stat = "count",fill=as.factor(carb))) + geom_histogram(bins = 8)
p <- p + facet_grid(as.factor(carb) ~ .)
p
I can do this with the table function but for more complex faceting how can I do it quickly?
You can try this. It may not be optimal, because you have to define the x and y positions of the label yourself (x is set in Labels, and y is set to 3 in geom_text()), but it may help:
#Other
library(tidyverse)
#Create similar data for labels
Labels <- mtcars %>% group_by(carb) %>% summarise(N=paste0('Number is: ',n()))
#X position
Labels$mpg <- 25
#Plot
ggplot(mtcars, aes(x = mpg, fill = as.factor(carb))) + geom_histogram(bins = 8) +
geom_text(data = Labels,aes(x=mpg,y=3,label=N))+facet_grid(as.factor(carb) ~ .)
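If hard-coding the label position is the concern, one variant worth trying is to map x and y to Inf, which pins the text to the top-right corner of every panel regardless of the axis ranges. This is only a sketch of that idea: the hjust and vjust values are guesses that may need tuning, and p_facets is a name made up for the example.

```r
library(dplyr)
library(ggplot2)

# One label row per facet; carb must be present so each label lands in its own panel
Labels <- mtcars %>%
  group_by(carb) %>%
  summarise(N = paste0("Number is: ", n()))

p_facets <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(fill = as.factor(carb)), bins = 8) +
  # Inf pins the text to the top-right corner; hjust/vjust pull it back inside the panel
  geom_text(data = Labels, aes(x = Inf, y = Inf, label = N),
            hjust = 1.1, vjust = 1.5) +
  facet_grid(as.factor(carb) ~ .)
p_facets
```

For more complex faceting, the same pattern should carry over as long as the label data frame is grouped by every faceting variable.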
Consider the column "disp" in mtcars. I am trying to divide disp into intervals so that I can count the number of observations in each interval. After doing this, I want to plot the results with ggplot's geom_line.
This is what I have tried:
library (tidyverse)
library (ggplot2)
a1 <- mtcars %>% arrange(desc(disp)) %>%
mutate(counts = cut_interval(disp, length = 5)) %>% group_by(counts) %>% mutate(nn = n())
a2 <- a1 %>% select(counts,nn) %>% unique()
ggplot(a2, aes(counts, nn)) +
geom_point(shape = 16, size = 1, show.legend = FALSE) +
theme_bw()
I get the intervals I need in a2. I can use it to draw a scatterplot, but there is no proper scale on the x axis. Is there any way to use these intervals to get a continuous scale and draw a line plot of counts vs nn?
mtcars %>% ggplot(aes(x = disp)) + geom_histogram(binwidth = 1) + theme_bw()
Thanks so much Rui Barradas! I just needed a count plot so no need of doing extra stuff.
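If a line over the binned counts is wanted instead of bars, geom_freqpoly() does the binning on a continuous disp axis and joins the counts with a line. This is a minimal sketch; the bin width of 5 is taken from the cut_interval(disp, length = 5) call in the question, and p_line and binned are names made up for the example.

```r
library(ggplot2)

# Bin disp into intervals of width 5 and join the bin counts with a line
p_line <- ggplot(mtcars, aes(x = disp)) +
  geom_freqpoly(binwidth = 5) +
  theme_bw()
p_line

# The binned counts can be pulled out if the numbers themselves are needed
binned <- layer_data(p_line, 1)
```

The bin edges ggplot2 chooses need not coincide exactly with those from cut_interval(), so individual counts may differ slightly from a2.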
I want to create a barplot with 2 factors and 1 continuous variable for y.
My code is (it is based on the built-in dataset mtcars):
data(mtcars)
x=mtcars
library(ggplot2)
ggplot(x,aes(x=factor(carb), y=mpg, fill=factor(carb))) +
geom_bar(stat="summary",fun.y="mean") +
labs(title="Barplot of Average MPG per Carbon category per # of Cylinders", y="Mean MPG",x="Carbon Category") +
facet_grid(.~factor(cyl)) +
geom_text(aes(label=mpg),vjust=3)
My goal is to have (and show) the average MPG value per carbon category, per cylinder category. Is my code correct?
The main problem is, I just want the mean value shown on each bar, not all values for this combination of factor values.
For example:
subset(x,c(x$carb==3 & x$cyl==8)) returns 3 different values for MPG, and the graph shows all these three!
You can try
library(tidyverse)
mtcars %>%
group_by(carb, cyl) %>%
summarise(AverageMpg = mean(mpg)) %>%
ggplot(aes(factor(carb), AverageMpg, label=AverageMpg, fill=factor(carb))) +
geom_col() +
geom_text(nudge_y = 0.5) +
facet_grid(~cyl, scales = "free_x", space = "free_x")
If I understand correctly, I suppose this is what you're trying to achieve.
data(mtcars)
library(tidyverse)
mtcars %>%
group_by(carb, cyl) %>%
summarise(AverageMpg = mean(mpg)) %>%
ungroup() %>%
mutate(carb = factor(carb)) %>%
ggplot(mapping = aes(x=carb, y=AverageMpg, fill=carb)) +
geom_col() +
scale_y_continuous(name = "Mean MPG") +
scale_x_discrete("Carbon Category") +
labs(title="Barplot of Average MPG per Carbon category per # of Cylinders") +
facet_grid(.~cyl)
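To get the mean printed on each bar without summarising the data beforehand, one option is to let stat_summary() compute the label as well, so the text layer sees one value per bar instead of one per car. This is a sketch that assumes ggplot2 3.3.0 or later (for after_stat() and the fun argument); rounding to one decimal and the vjust offset are my choices, and p_means is a name made up for the example.

```r
library(ggplot2)

p_means <- ggplot(mtcars, aes(x = factor(carb), y = mpg, fill = factor(carb))) +
  # Bars: mean mpg per carb within each cyl facet
  geom_bar(stat = "summary", fun = "mean") +
  # Labels: the same summary, so exactly one number is drawn per bar
  stat_summary(aes(label = round(after_stat(y), 1)),
               fun = "mean", geom = "text", vjust = -0.5) +
  labs(title = "Barplot of Average MPG per Carbon category per # of Cylinders",
       y = "Mean MPG", x = "Carbon Category") +
  facet_grid(. ~ cyl)
p_means
```

For carb 3 and cyl 8, the three cars (16.4, 17.3 and 15.2 mpg) then collapse to a single label of 16.3.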
I had a heat map with a gradient for which I wanted to label the legend at specific percentages.
# example data, apologies for the kludginess
library('ggplot2'); library('scales'); require('dplyr');
as.data.frame(with(mtcars, table(gear, cyl))) %>%
group_by(cyl) %>%
mutate(pct_of_cyl_class = Freq / sum(Freq)) %>%
ggplot(. ,aes(cyl, gear)) +
geom_tile(aes(fill=pct_of_cyl_class)) +
scale_fill_gradient(low='yellow',high='brown', name='% of Cyl. Group') +
geom_text(aes(label=percent(pct_of_cyl_class))) +
xlab('Cylinder Class') + ylab('Gears') +
ggtitle('Gear Frequency by Cylinder Class') + theme_minimal()
I needed to set breaks and labels in scale_fill_gradient().
+ scale_fill_gradient(low='yellow',high='brown',
name='% of Cyl. Group',
breaks = 0.25*0:4, labels = percent(0.25*0:4) ) # <-
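Put together, the full plot might look like the sketch below. Two things here are my additions and not part of the original fix: limits = c(0, 1), since breaks that fall outside the scale limits are not drawn (the largest share in this data is about 86%, so the 100% break would otherwise be missing), and passing the percent function itself as labels in place of a pre-formatted vector. heat_dat and p_heat are names made up for the example.

```r
library(ggplot2)
library(scales)
library(dplyr)

# Share of each gear count within every cylinder class
heat_dat <- as.data.frame(with(mtcars, table(gear, cyl))) %>%
  group_by(cyl) %>%
  mutate(pct_of_cyl_class = Freq / sum(Freq)) %>%
  ungroup()

p_heat <- ggplot(heat_dat, aes(cyl, gear)) +
  geom_tile(aes(fill = pct_of_cyl_class)) +
  geom_text(aes(label = percent(pct_of_cyl_class))) +
  scale_fill_gradient(low = "yellow", high = "brown",
                      name = "% of Cyl. Group",
                      limits = c(0, 1),      # assumption: show the full 0-100% range
                      breaks = 0.25 * 0:4,
                      labels = percent) +    # format each break as a percentage
  xlab("Cylinder Class") + ylab("Gears") +
  ggtitle("Gear Frequency by Cylinder Class") +
  theme_minimal()
p_heat
```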