Combined box-violin plot not aligned [duplicate] - r

This question already has an answer here:
Align violin plots with dodged box plots
(1 answer)
Closed 7 years ago.
I want to plot a distribution across two factors using a violin plot with a boxplot inside each violin. The result can look really good, but only when the two layers line up.
library(ggplot2)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)
p <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_violin() +
  geom_boxplot(width = 0.1) +
  theme(legend.position = "none")
ggsave(filename = "Violinboxplot.png", plot = p, height = 6, width = 4)
However, in the result the boxplots cluster around each dose tick on the x-axis instead of sitting inside their violins. How can I shift them to the center of the violin plots?

There is an answer to this question here:
how to align violin plots with boxplots
You can use the position argument to shift the graph elements as needed. Passing the same position_dodge() object to both geoms shifts them by the same amount:
dodge <- position_dodge(width = 0.5)
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.1, position = dodge) +
  theme(legend.position = "none")
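One way to confirm the alignment without inspecting the image is to compare the x positions that each layer ends up with. This is only a minimal sketch, assuming ggplot2 is installed; `layer_data()` is ggplot2's accessor for a layer's computed data, while the names `tg`, `p`, `violin_x`, and `box_x` are illustrative and not part of the original answer.

```r
library(ggplot2)

# Work on a copy so the built-in dataset is left untouched
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# One shared dodge object, so both layers are shifted by the same amount
dodge <- position_dodge(width = 0.5)

p <- ggplot(tg, aes(x = dose, y = len, fill = supp)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.1, position = dodge)

# Centre of every violin (layer 1) and every box (layer 2) after dodging
violin_x <- sort(unique(as.numeric(layer_data(p, 1)$x)))
box_x    <- sort(unique(as.numeric(layer_data(p, 2)$x)))

# TRUE means each box is centred on its violin
all.equal(violin_x, box_x)
```

If the two layers were given different dodge widths, the two vectors would differ, which is the misalignment described in the question.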

Related

ggplot2 - why does changing axis scale affect summary statistics of variables? [duplicate]

This question already has an answer here:
R ggplot boxplot: change y-axis limit
(1 answer)
Closed last month.
I have the following data:
x <- data.frame('myvar'=c(10,10,9,9,8,8, runif(100)), 'mygroup' = c(rep('a', 26), rep('b', 80)))
I want to describe the data using a box-and-whiskers plot in ggplot2. I have also included the mean using a stat_summary.
library(ggplot2)
ggplot(x, aes(x = myvar, y = mygroup)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = 'point', shape = 20, color = 'red', fill = 'red')
This is fine, but in some of my graphs the outliers are so large that it is hard to make sense of the overall distribution. In these cases, I have cut the x-axis:
ggplot(x, aes(x = myvar, y = mygroup)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = 'point', shape = 20, color = 'red', fill = 'red') +
  scale_x_continuous(limits = c(0, 5))
Note that the means (and medians?) are now calculated using only the subset of data that is visible on the graph. Is there a ggplot way to include the outlier observations in the calculation but drop them from the visualisation?
My desired output would be a graph with x limits at c(0,5) and a red dot at 2.48 for group mygroup='a'.
scale_x_continuous() removes the points that fall outside the limits before any statistics are computed, so the mean and the boxplot are based only on the remaining data. Use coord_cartesian() instead to "zoom in" without removing any data:
ggplot(x, aes(x = myvar, y = mygroup)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = 'point', shape = 20, color = 'red', fill = 'red') +
  coord_cartesian(xlim = c(0, 5))
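The difference between the two approaches can also be checked numerically. The sketch below assumes ggplot2 3.3.0 or later (for the `fun` argument and horizontal `stat_summary()`); the fixed seed and the names `base`, `clipped`, `zoomed`, and `true_mean_a` are additions for illustration only. It uses `layer_data()` to read off where the mean point would be drawn for group 'a' under each approach.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the comparison is reproducible
x <- data.frame(myvar = c(10, 10, 9, 9, 8, 8, runif(100)),
                mygroup = c(rep("a", 26), rep("b", 80)))

base <- ggplot(x, aes(x = myvar, y = mygroup)) +
  stat_summary(fun = mean, geom = "point")

# Scale limits: values above 5 are dropped before the mean is computed
# (ggplot2 warns about the removed rows, silenced here)
clipped <- suppressWarnings(
  layer_data(base + scale_x_continuous(limits = c(0, 5)))$x
)

# Coord limits: the mean uses every row, only the view is cropped
zoomed <- layer_data(base + coord_cartesian(xlim = c(0, 5)))$x

# Mean of group 'a' computed directly from the full data
true_mean_a <- mean(x$myvar[x$mygroup == "a"])

c(clipped = as.numeric(clipped[1]),
  zoomed = as.numeric(zoomed[1]),
  full_data = true_mean_a)
```

The `zoomed` value should match the mean computed from the full data, while the `clipped` value reflects only the observations at or below 5.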

ggplot2 different facet width for categorical x-axis [duplicate]

This question already has an answer here:
different size facets proportional of x axis on ggplot 2 r
(1 answer)
Closed 5 years ago.
I am plotting different facets of categorical data:
df <- as.data.frame(as.factor(c("A","B","C","D","E","F")))
names(df) <- "Xvar"
df$Yvar <- c(2,1,4,5,3,7)
df$facet <- c(rep("facet 1",2),rep("facet 2",4))
ggplot(df, aes(x = Xvar, y = Yvar, group = 1)) +
  geom_line() +
  facet_wrap(~facet, scales = "free_x")
How can I make it such that facet 1 consisting of only two categories is half the size of facet 2 containing four categories? I.e. that the width of each facet is proportional to the number of categorical x-axis data points? I tried scales="free_x" to no avail.
If you're willing to use facet_grid instead of facet_wrap, you can do this with the space parameter.
ggplot(df, aes(x = Xvar, y = Yvar, group = 1)) +
  geom_line() +
  facet_grid(~facet, scales = "free_x", space = "free_x")

ggplot geom_jitter behind (multiple) geom_boxplot [duplicate]

This question already has answers here:
ggplot2 - jitter and position dodge together
(2 answers)
Closed 5 years ago.
I use the following code:
data(mtcars)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(aes(colour = factor(gear)), width = 0.1) +
  geom_boxplot(aes(fill = factor(gear)), alpha = 0.6)
In the result, the points are jittered around the center of each cyl group rather than dodged by gear. I want the colored dots from geom_jitter directly behind the corresponding(!) boxplot. Is there a way to do it?
The solution is position_jitterdodge(), as mentioned by aosmith and in the question he linked.
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(gear), colour = factor(gear))) +
  geom_point(position = position_jitterdodge()) +
  geom_boxplot(alpha = 0.6)

Adjusting graph bar height with ggplot2 [duplicate]

This question already has an answer here:
Losing the grey margin padding in a ggplot
(1 answer)
Closed 7 years ago.
df2 <- iris[c(5,1)]
df3 <- aggregate(df2$Sepal.Length, list(df2$Species), mean)
names(df3) <- c("x","y")
ggplot(df3, aes(x, y)) +
  geom_bar(aes(fill = x), stat = "identity") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank())
I have successfully removed axis tick marks and labels from this plot. I am trying to get rid of the blank grey space underneath the bars. Zero should be the lower bound for the chart. I've been searching unsuccessfully for an adjustment function to either pull the bars down or to cut off the bottom grey portion. The hope is that ggplots are not hardwired with the extra space underneath.
Add the following elements to your code (inspired by this Q & A):
theme_classic()
scale_x_discrete(expand=c(0,0))
scale_y_continuous(expand=c(0,0))
Instead of theme_classic(), you can also use theme_bw(), which adds horizontal and vertical grid lines to the plot.
Your code should then look like this:
ggplot(df3, aes(x, y)) +
  geom_bar(aes(fill = x), stat = "identity") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank())
This removes the padding around the plotting area, so the bars start directly at zero.
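As a variation, the padding can be removed only below the bars while keeping a little headroom above them. This is a sketch rather than part of the original answer: it assumes ggplot2 3.3.0 or later, where the `expansion()` helper is available, and the 5% upper padding is an arbitrary choice. The last two lines read the y range of the built panel to check that it starts at zero.

```r
library(ggplot2)

# Mean sepal length per species, named as in the question
df3 <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
names(df3) <- c("x", "y")

p <- ggplot(df3, aes(x, y)) +
  geom_bar(aes(fill = x), stat = "identity") +
  # No padding below the bars, 5% padding above them
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  theme_classic() +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank())

# y range actually used by the panel; the first element is the lower bound
y_range <- ggplot_build(p)$layout$panel_params[[1]]$y.range
y_range[1]
```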

Fit curve to histogram ggplot [duplicate]

This question already has answers here:
"Density" curve overlay on histogram where vertical axis is frequency (aka count) or relative frequency?
(3 answers)
Closed 7 years ago.
I know that I can fit a density curve to my histogram in ggplot in the following way.
df = data.frame(x=rnorm(100))
ggplot(df, aes(x=x, y=..density..)) + geom_histogram() + geom_density()
However, I want my y-axis to show frequency (counts) instead of density, while retaining a curve that fits the distribution. How do I do that?
Depending on your goals, something like this may work by just scaling the density curve using multiplication:
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(y=..density..*10))
or
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(y=..count../10))
Choose other values (instead of 10) if you want to scale things differently.
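If a principled factor is preferred over a hand-picked one, the curve on the count scale equals the density multiplied by the number of observations and the histogram bin width. The following is a minimal sketch, assuming ggplot2 3.3.0 or later for `after_stat()` (the newer spelling of the `..density..` notation); the bin width `bw`, the seed, and the object names are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for reproducibility
df <- data.frame(x = rnorm(100))
bw <- 0.5    # assumption: bin width chosen for illustration

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = bw) +
  # stat_density's `count` is density * n, so this is density * n * binwidth
  geom_density(aes(y = after_stat(count * bw)))

# Inspect the computed curve (layer 2) to confirm the scaling
d <- layer_data(p, 2)
all.equal(d$y, d$density * nrow(df) * bw)
```

Setting `binwidth` explicitly matters here, because the scaling factor has to match the width of the bins that the histogram actually uses.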
Edit:
Since you are defining your scaling factor in the global environment (assumed here to be a variable named n), you can pass it in through aes:
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(n=n, y=..density..*n))
# or
ggplot(df, aes(x=x, n=n)) + geom_histogram() + geom_density(aes(y=..density..*n))
or another, less nice way using get:
ggplot(df, aes(x = x)) +
  geom_histogram() +
  geom_density(aes(y = ..density.. * get("n", pos = .GlobalEnv)))
