Scatter plot and boxplot overlay - r

Based on the previous post "ggplot boxplots with scatterplot overlay" (same variables), I would like one boxplot for each day of the week instead of two, with the scatter points overlaid on it in different colours.
My code is:
library(ggplot2)
# Box plot for day-of-week effect
plot1 <- ggplot(data = dodgers, aes(x = ordered_day_of_week, y = Attend)) + geom_boxplot()
# Scatter plot, points coloured by Bobblehead (a fixed size belongs outside aes())
plot2 <- ggplot(dodgers, aes(x = ordered_month, y = Attend, colour = Bobblehead)) + geom_point(size = 1.5)
# Box plot with scatter plot overlay
plot3 <- ggplot(data = dodgers, aes(x = ordered_day_of_week, y = Attend, colour = Bobblehead)) + geom_boxplot() + geom_point()
The resulting plots (images not included) were: 1) the scatter plot, 2) the boxplot, and 3) the combined plot.

Put color= inside the aes() of geom_point() and remove it from the aes() in ggplot(). Aesthetics mapped in ggplot() are inherited by every layer, so a discrete colour there also splits geom_boxplot() into one box per colour level. You could also use position_dodge() to separate the points.
Example with the mtcars data, as the OP didn't provide data:
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot() +
  geom_point(aes(color = factor(am)), position = position_dodge(width = 0.5))
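Applied to the question's own layout, a minimal sketch might look like the following. Because the OP's dodgers data isn't available, `dodgers_demo` is a hypothetical stand-in with invented values that only mimics the column names `ordered_day_of_week`, `Attend` and `Bobblehead`; `outlier.shape = NA` is an extra choice, not part of the answer above.

```r
library(ggplot2)

# Hypothetical stand-in for the OP's dodgers data (values are invented):
# 12 games per weekday, an attendance figure and a YES/NO Bobblehead flag.
set.seed(42)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
dodgers_demo <- data.frame(
  ordered_day_of_week = factor(rep(days, times = 12), levels = days),
  Attend = round(rnorm(84, mean = 41000, sd = 7000)),
  Bobblehead = factor(sample(c("NO", "YES"), 84, replace = TRUE,
                             prob = c(0.85, 0.15)))
)

# colour is mapped only in geom_point(), so geom_boxplot() draws one box per day.
# outlier.shape = NA hides the boxplot's own outlier markers, which would
# otherwise duplicate raw points that are already drawn on top.
p <- ggplot(dodgers_demo, aes(x = ordered_day_of_week, y = Attend)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(aes(colour = Bobblehead), size = 1.5,
             position = position_dodge(width = 0.5))
print(p)
```

Calling `layer_data(p, 1)` should return one row per box (seven here), which is a quick way to confirm that the colour mapping no longer splits the boxes.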

Related

Showing discrete variable in a single geom_sina/violin plot

I'm trying to create a plot with geom_sina and geom_violin where all data points are plotted together (as one violin shape) and are coloured by a factor.
However, when I specify ggplot(mtcars, aes(x = "", y = mpg, fill = am)), the plot is split according to the factor, which is what I'd like to avoid (plot 1). The closest I've come is treating the factor as a continuous variable (plot 2). But then the legend displays a "fill" bar and not the discrete factor levels I'd like.
So, if possible, I'd like the plot to stop splitting by colour when using a factor, or to override the legend to show discrete values when using numerics.
Any help is much appreciated :)
[Images: plot 1, plot 2]
Maybe this is what you are looking for. Using the group aesthetic, you can override the default grouping that a discrete fill, colour or other aesthetic creates:
Note: As you want the points to be coloured, I switched to the color aesthetic.
library(ggplot2)
library(ggforce)
ggplot(mtcars, aes(x = "", y = mpg)) +
geom_violin() +
geom_sina(aes(color = factor(am), group = 1))
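The effect of `group = 1` can be seen without ggforce by inspecting the computed layer data. This sketch swaps `geom_sina()` for base ggplot2's `geom_jitter()` (an assumption, so only ggplot2 is needed) and compares the number of violin groups with and without the override; setting `colour = "black"` on the violin is an added choice so that only the points carry the am colours.

```r
library(ggplot2)

# Colour mapped in ggplot() is inherited by geom_violin(), so the violin is
# split into one group (and one shape) per level of am.
split_plot <- ggplot(mtcars, aes(x = "", y = mpg, colour = factor(am))) +
  geom_violin()

# group = 1 puts every row into a single group for the violin; its colour is
# set to a constant so only the jittered points are coloured by am.
single_plot <- ggplot(mtcars, aes(x = "", y = mpg, colour = factor(am))) +
  geom_violin(aes(group = 1), colour = "black") +
  geom_jitter(width = 0.1, height = 0)
print(single_plot)

length(unique(layer_data(split_plot, 1)$group))   # 2 groups: split violin
length(unique(layer_data(single_plot, 1)$group))  # 1 group: single violin
```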

Overlay density plot to each existing facet wrapped density plot in ggplot2?

I have a data frame with ~37,000 rows that contains 'name' as a string and 'UTCDateTime' as POSIXct, and I am using it to produce a facet-wrapped density plot of time grouped by name:
I also have a separate density plot of POSIXct datetime data from an entirely different data frame:
I want to overlay this second density plot on each individual facet of the first density plot. Is there a way to do that? More generally, if I have facet-wrapped plots of any kind and another plot of the same type but with different data, how do I overlay it on every facet?
This should in theory be as simple as leaving the column you're facetting by out of the second data frame: a layer whose data lacks the facetting variable is drawn in every panel. Example below:
library(ggplot2)
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions)) +
facet_wrap(~ Species)
Created on 2020-08-12 by the reprex package (v0.3.0)
EDIT: To get the densities for the two types of data on the same scale, you can use the computed variable scaled (each density divided by its maximum) via after_stat()*:
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(y = after_stat(scaled),
fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions,
y = after_stat(scaled))) +
facet_wrap(~ Species)
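Both claims above (the faithful layer appears in every panel, and `scaled` makes every curve peak at 1) can be checked with `layer_data()`, which returns the computed data behind each layer. This is a small sketch reusing the answer's plot; the variable names `iris_layer` and `faithful_layer` are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width)) +
  geom_density(aes(y = after_stat(scaled), fill = Species)) +
  geom_density(data = faithful, aes(x = eruptions, y = after_stat(scaled))) +
  facet_wrap(~ Species)

# Computed data for each layer: one row per evaluation point of each density.
iris_layer <- layer_data(p, 1)
faithful_layer <- layer_data(p, 2)

# faithful has no Species column, so its density is repeated in all 3 panels.
length(unique(faithful_layer$PANEL))  # 3

# scaled divides each density by its own maximum, so every curve peaks at 1.
tapply(iris_layer$y, iris_layer$group, max)
tapply(faithful_layer$y, faithful_layer$PANEL, max)
```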
* Before ggplot2 v3.3.0, use stat(variable) or ..variable.. instead.

In ggplot, using a numeric variable like a factor to create multiple plots, but using the numeric values to control spacing

If I make a data frame like this:
d1 <- data.frame(class=rep(c("A", "B", "C"), each=100),
value=c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,2,1)))
I can easily make a violin plot with a separate violin for each class:
ggplot(d1, aes(x=class, y=value)) + geom_violin()
But if I make a data frame and plot like this, with numeric values:
d2 <- data.frame(timepoint=rep(c(1, 2, 3.5), each=100),
value=c(rnorm(100,0,1), rnorm(100,1,1), rnorm(100,2,1)))
ggplot(d2, aes(x=timepoint, y=value)) + geom_violin()
I just get a single violin plot.
I could do factor(timepoint):
ggplot(d2, aes(x=factor(timepoint), y=value)) + geom_violin()
but then I get a plot with equal spacing. What I want is a plot where the third violin is farther to the right, since it is at time=3.5. That is, where the spacing reflects the actual values of timepoint.
I realize this isn't specific to violin plots, it could be a boxplot or any other kind of plot. Is there a way to do what I want?
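As requested, add group=timepoint to your set of aesthetics. This draws one violin per distinct timepoint while x stays continuous, so the spacing follows the actual values: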
ggplot(d2, aes(x=timepoint, y=value, group=timepoint)) + geom_violin()
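Since the question notes the same issue applies to boxplots, here is a sketch of the identical `group` trick with `geom_boxplot()`. The `set.seed()` call, `width = 0.5` and the `scale_x_continuous()` breaks are added choices, not part of the answer; the breaks simply label the three timepoints on the axis.

```r
library(ggplot2)

set.seed(123)
d2 <- data.frame(timepoint = rep(c(1, 2, 3.5), each = 100),
                 value = c(rnorm(100, 0, 1), rnorm(100, 1, 1), rnorm(100, 2, 1)))

# group = timepoint gives one box per distinct value while x stays continuous,
# so the gap between 2 and 3.5 is drawn wider than the gap between 1 and 2.
p <- ggplot(d2, aes(x = timepoint, y = value, group = timepoint)) +
  geom_boxplot(width = 0.5) +
  scale_x_continuous(breaks = c(1, 2, 3.5))
print(p)

# Box centres sit at the actual timepoint values: 1, 2 and 3.5.
layer_data(p, 1)$x
```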

how to improve ggplot?

I have a boxplot made in R with ggplot2, using the code below.
I want to label each boxplot as in the sample plot I'm aiming for (image not included).
I have calculated a p-value of 0.06 for the first group, egg1. I would like to add this text to the plot as shown in the sample plot. How can I do that?
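ggplot(testdata) +
  geom_boxplot(aes(x = variable, y = value, color = as.factor(classification)))
You can use annotate() to add text to your boxplot. On a discrete x-axis the boxes sit at positions 1, 2, ..., so x = 1 places the label over the first box, and y is the height in data units:
ggplot(testdata) +
  geom_boxplot(aes(x = variable, y = value, color = as.factor(classification))) +
  annotate(geom = "text", x = 1, y = 6, label = "p = 0.06")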
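For one label per box, a common alternative to repeated annotate() calls is a small label data frame passed to geom_text(). The sketch below is hypothetical throughout: `testdata` is invented long-format data (the OP's isn't shown), and a Welch two-sample `t.test()` between the two classification levels is assumed as the test, since the question doesn't say how its p-value was computed.

```r
library(ggplot2)

# Hypothetical long-format data standing in for the OP's testdata.
set.seed(1)
testdata <- data.frame(
  variable = rep(c("egg1", "egg2", "egg3"), each = 20),
  value = rnorm(60, mean = 4),
  classification = rep(c(0, 1), times = 30)
)

# One p-value per variable (assumed test: Welch t-test of value by classification).
pvals <- sapply(split(testdata, testdata$variable), function(d)
  t.test(value ~ classification, data = d)$p.value)

# One label row per box, placed a little above that box's highest value.
labels <- data.frame(
  variable = names(pvals),
  y = as.vector(tapply(testdata$value, testdata$variable, max)[names(pvals)]) + 0.3,
  label = paste("p =", signif(pvals, 2))
)

p <- ggplot(testdata) +
  geom_boxplot(aes(x = variable, y = value, color = as.factor(classification))) +
  geom_text(data = labels, aes(x = variable, y = y, label = label))
print(p)
```

Keeping the labels in their own data frame means they update along with the data, instead of relying on hard-coded x and y positions.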

another variation on facet_wrap ordering

I've got about 20 facets on a geom_line ggplot2 plot with an overlaid geom_rect based on time-series data, all arranged with facet_wrap. I constantly need to update my plots, and the order of my facets may need to change daily.
My question is: is it possible to order my facets using the time-series data in geom_rect? I.e., make the first facet the one with the earliest geom_rect shaded area, and so on?
Here is my code; the x-axis is date, the y-axis is incidence3, and the plot is faceted by geo:
library(ggplot2)
ggplot() +
  geom_rect(data = total,
            aes(xmin = as.Date(xmin),
                xmax = as.Date(xmax),
                ymin = -Inf,
                ymax = Inf),
            fill = "lightblue",
            alpha = 0.3) +
  ylab("incidence") + xlab("time") +
  geom_line(data = total, aes(x = as.Date(date), y = incidence3)) +
  # Only one facet_wrap() is needed: a second call replaces the first.
  facet_wrap(~geo, ncol = 2, scales = "free_y")
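Facets are laid out in the order of the facetting factor's levels, so one possible approach is to reorder the levels of geo by each geo's earliest shaded-period start before plotting. The sketch below uses a hypothetical `total` with invented values and only the column names from the question (`geo`, `date`, `incidence3`, `xmin`, `xmax`); the separate `rects` table, one rectangle per geo, is an added choice to avoid stacking one semi-transparent rectangle per row.

```r
library(ggplot2)

# Hypothetical stand-in for the OP's `total`: daily incidence per geo plus one
# shaded period (xmin, xmax) per geo. All values are invented.
set.seed(7)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 60)
geos <- c("north", "south", "east", "west")
start <- as.Date(c("2020-02-10", "2020-01-15", "2020-02-01", "2020-01-05"))
total <- data.frame(
  geo = rep(geos, each = length(dates)),
  date = rep(dates, times = length(geos)),
  incidence3 = rpois(length(dates) * length(geos), lambda = 10),
  xmin = rep(start, each = length(dates)),
  xmax = rep(start + 10, each = length(dates))
)

# Order factor levels (and therefore facets) by each geo's earliest xmin.
total$geo <- reorder(factor(total$geo), as.numeric(total$xmin), FUN = min)

# One rectangle per geo instead of one per row.
rects <- unique(total[, c("geo", "xmin", "xmax")])

p <- ggplot(total) +
  geom_rect(data = rects, aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            fill = "lightblue", alpha = 0.3) +
  geom_line(aes(x = date, y = incidence3)) +
  labs(x = "time", y = "incidence") +
  facet_wrap(~ geo, ncol = 2, scales = "free_y")
print(p)

levels(total$geo)  # "west" "south" "east" "north"
```

Because the ordering is recomputed from the data on every run, the facet order follows the shaded periods automatically as the data are updated.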
