Converting boxplots to densities in ggplot2 in R

I have the following ggplot2 plot:
ggplot(iris) + geom_boxplot(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
I would like to instead plot this as horizontal density plots or histograms, meaning have density line plots for each species or histograms instead of boxplots. This does not do the trick:
> ggplot(iris) + geom_density(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
Error in eval(expr, envir, enclos) : object 'y' not found
For simplicity I used Species as both the x variable and the fill, but in my actual data the x axis represents one set of conditions and the fill represents another, though that should not matter for plotting purposes. I'm trying to make it so the x axis represents different conditions for which the value y is plotted as a density/histogram instead of boxplots.
Edit: this is better illustrated with a dataset that has two factor-like variables. In the mpg dataset, I want to make a density plot for each manufacturer, plotting the distribution of displ for each cyl value. The x-axis (which is vertical in flipped coordinates) represents each manufacturer, and the value being histogrammed is displ, but for each manufacturer I want as many histograms as there are cyl values for that manufacturer. Hope this is clearer. I know that this doesn't work because y= expects counts:
ggplot(mpg, aes(x=manufacturer, fill=cyl, y=displ)) +
geom_density(position="identity") + coord_flip()
The closest I get is:
ggplot(mpg, aes(x=displ, fill=cyl)) +
geom_density(position="identity") + facet_grid(manufacturer ~ .)
But I don't want different grids, I'd like them to be different entries in the same plot like in the histogram case.

Something like this? For both histograms and density plots, the y value (a count or a density) is computed by ggplot2 from the x variable, so you don't map y yourself. Map x = Petal.Length; its frequency (for the given binwidth) or its density is then drawn on the y-axis. Add fill=Species alongside x=Petal.Length to colour by Species.
For histogram:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_histogram(position="identity") + coord_flip()
For density:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_density(position="identity") + coord_flip()
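To see why y is not mapped by hand, here is a minimal base-R sketch (not part of the original answer) that estimates the curves directly with density(). This is roughly the kind of computation geom_density() performs on x for each group. The names dens and areas are only illustrative.

```r
# Estimate one density curve per species from the x values alone.
dens <- lapply(split(iris$Petal.Length, iris$Species), density)

# Each result holds an evaluation grid ($x) and the estimated density ($y).
# Trapezoidal area under each curve; it should be close to 1.
areas <- sapply(dens, function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
})
print(round(areas, 3))
```

The height of each curve comes entirely from the data in x, which is why supplying y= to geom_density() fails.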
Edit: Maybe you're looking for faceting?
ggplot(mpg, aes(x=displ, fill=factor(cyl))) +
geom_density(position="identity") +
facet_wrap( ~ manufacturer, ncol=3)
Edit: Since you don't want faceting, the only other way I can think of is to create a separate group by pasting manufacturer and cyl together:
dd <- mpg
dd$grp <- factor(paste(dd$manufacturer, dd$cyl))
ggplot(dd, aes(x=displ)) +
geom_density(aes(fill=grp), position="identity")
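Another option, which the answer above does not cover, might be geom_violin(): it draws one mirrored density per category along a shared axis, so manufacturer can stay on the x axis and cyl on the fill. This is only a sketch under that assumption; scale = "width" is my choice here so that small groups stay visible, and groups with fewer than two observations are dropped by ggplot2 with a warning.

```r
library(ggplot2)

# One violin (mirrored density of displ) per manufacturer/cyl combination,
# dodged side by side within each manufacturer.
p <- ggplot(mpg, aes(x = manufacturer, y = displ, fill = factor(cyl))) +
  geom_violin(scale = "width", position = position_dodge(width = 0.9)) +
  coord_flip()

print(p)
```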

Related

Overlay density plot to each existing facet wrapped density plot in ggplot2?

I have a dataframe with ~37000 rows that contains 'name' in string format and 'UTCDateTime' in posixct format and am using it to produce a facet wrapped density plot of time grouped by the names:
I also have a separate density plot of posixct datetime data from an entirely different dataframe:
I want to overlay this second density plot on each individual facet_wrapped plot in the first density plot. Is there a way to do that? In general, if I have plots of any kind that are facet wrapped and another plot of the same type but different data that I want to overlay on each facet of the facet wrap, how do I do so?
This should in theory be as simple as not having the column that you're facetting by in the second dataframe. Example below:
library(ggplot2)
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions)) +
facet_wrap(~ Species)
Created on 2020-08-12 by the reprex package (v0.3.0)
EDIT: To get the densities on the same scale for the two types of data, you can use the computed variables using after_stat()*:
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(y = after_stat(scaled),
fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions,
y = after_stat(scaled))) +
facet_wrap(~ Species)
* Prior to ggplot2 v3.3.0, use stat(variable) or ..variable.. instead.
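A small sketch of what the scaled variable holds, assuming ggplot2 v3.3.0 or later: it is the density divided by its maximum, so every curve peaks at 1 regardless of the units of x. The object names p and d are illustrative only.

```r
library(ggplot2)

p <- ggplot(faithful, aes(eruptions)) +
  geom_density(aes(y = after_stat(scaled)))

# layer_data() returns the values computed by the stat for the first layer.
d <- layer_data(p)

# The raw density depends on the data's units; scaled always tops out at 1.
print(c(max_density = max(d$density), max_scaled = max(d$scaled)))
```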

ggplot scatter creating uniform points

I am trying to make a scatter plot with ggplot to show time watching TV on x axis and immigrant sentiment on y axis.
The code I am using is
ggplot(totalTV,
aes(x = dfnew.TV.watching..total.time.on.average.weekday,
y = dfnew.Immigrant.Sentiment)) +
geom_point()
I am getting this output
My table is so, with first variable being character, and subsequent two being numeric:
Any idea on how to produce a representative scatter of the outcome?
Cheers
Here are some examples using the mtcars dataset.
library(ggplot2)
# Original
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_point()
# Jitter
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_jitter(width = .2) # Control spread with width
# Violin plot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin()
# Boxplot
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_boxplot()
# Remember that different geoms can be combined
ggplot(mtcars,aes(factor(cyl),mpg)) +
geom_violin() +
geom_jitter(width = .2)
# Or something more exotic ala Raincloud-plots
# https://micahallen.org/2018/03/15/introducing-raincloud-plots/
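If both variables take only a few distinct values, points stack exactly on top of each other and the scatter looks like a uniform grid. One more alternative, not in the answer above, is geom_count(), which sizes each point by the number of observations at that location. A minimal sketch with mtcars, using cyl and gear as stand-ins for the discrete variables:

```r
library(ggplot2)

# Point size reflects how many rows share each cyl/gear combination.
p <- ggplot(mtcars, aes(factor(cyl), factor(gear))) +
  geom_count()

print(p)
```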

Restricting the x being counted in a histogram

library(alr4)
par(mfrow = c(2,2))
ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
I would like to create 4 histograms from the data set walleye. I would like the histograms to be for the length of the walleye. The four histograms should each have their own age for counting. I would like to restrict the ages from 1 to 4. How can I do that with ggplot?
If I understand what you are trying to do correctly, this should help:
library(alr4)
library(ggplot2)
ggplot(subset(walleye, age<5), aes(x=length)) + geom_histogram() + facet_grid(~age)
This way you are only plotting the subset of the data where age is 1-4, and you are actually plotting histograms of length.
You could try this too (adding another line of code on top of your code):
library(alr4)
library(ggplot2)
p <- ggplot(walleye, aes(x= age)) + geom_histogram() + facet_grid(~age)
p %+% subset(walleye, age %in% 1:4)
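The two filters used above, age<5 and age %in% 1:4, select the same rows only if every age is a positive whole number. A small sketch with a made-up data frame (fish is hypothetical, not the walleye data) shows where they differ:

```r
# Hypothetical stand-in for walleye, including an age of 0 and a missing age.
fish <- data.frame(age    = c(1, 2, 3, 4, 5, 6, 0, NA),
                   length = c(210, 260, 310, 350, 380, 400, 120, 300))

a <- subset(fish, age < 5)       # also keeps the age-0 row
b <- subset(fish, age %in% 1:4)  # keeps exactly ages 1, 2, 3 and 4

print(a$age)
print(b$age)
```

In both cases subset() drops the row with the missing age.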

How to use free scales but keep a fixed reference point in ggplot?

I am trying to create a plot with facets. Each facet should have its own scale, but for ease of visualization I would like each facet to show a fixed y point. Is this possible with ggplot?
This is an example using the mtcars dataset. I plot the weight (wt) as a function of the number of miles per gallon (mpg). The facets represent the number of cylinders of each car. As you can see, I would like the y scales to vary across facets, but still have a reference point (3, in the example) at the same height across facets. Any suggestions?
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
[EDIT: in my actual data, the fixed reference point should be at y = 0. I used y = 3 in the example above because 0 didn't make sense for the range of the data points in the example]
It's unclear where the line should be, so let's assume the middle of each panel. You could compute the limits outside ggplot and add a dummy layer to set the scales:
library(ggplot2)
library(plyr)
# data frame where 3 is the middle
# 3 = (min + max) /2
dummy <- ddply(mtcars, "cyl", summarise,
min = 6 - max(wt),
max = 6 - min(wt))
ggplot(mtcars, aes(mpg, wt)) + geom_point() +
geom_blank(data=dummy, aes(y=min, x=Inf)) +
geom_blank(data=dummy, aes(y=max, x=Inf)) +
geom_hline (yintercept=3, colour="red", lty=6, lwd=1) +
facet_wrap( ~ cyl, scales = "free_y")
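The 6 in the code above is just 2 * 3: each dummy limit is the data range mirrored around the reference value. A base-R sketch of the same idea for an arbitrary reference, for instance ref <- 0 as in the question's edit; the names ref, rng, lo, hi and mid are illustrative:

```r
ref <- 3  # the value that should sit at mid-height in every facet

# Per-facet data range of wt: one row per cyl, columns are min and max.
rng <- do.call(rbind, lapply(split(mtcars$wt, mtcars$cyl), range))

# Mirror the range around ref, as the dummy data frame does.
lo <- 2 * ref - rng[, 2]
hi <- 2 * ref - rng[, 1]

# The panel limits are the union of the data range and its mirror image,
# so their midpoint is ref in every facet.
mid <- (pmin(lo, rng[, 1]) + pmax(hi, rng[, 2])) / 2
print(mid)
```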

Plotting continuous and discrete series in ggplot with facet

I have data that plots over time with four different variables. I would like to combine them in one plot using facet_grid, where each variable gets its own sub-plot. The following code resembles my data and the way I'm presenting it:
require(ggplot2)
require(reshape2)
subm <- melt(economics, id='date', c('psavert','uempmed','unemploy'))
mcsm <- melt(data.frame(date=economics$date, q=quarters(economics$date)), id='date')
mcsm$value <- factor(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line() +
facet_grid(variable~., scale='free_y') +
geom_step(data=mcsm, aes(date, value)) +
scale_y_discrete(breaks=levels(mcsm$value))
If I leave out scale_y_discrete, R complains that I'm trying to combine a discrete value with a continuous scale. If I include scale_y_discrete, my continuous series lose their scale.
Is there any neat way of solving this issue, i.e. getting all scales correct? I also see that the legend is alphabetically sorted; can I change that so the legend is ordered in the same order as the sub-plots?
The problem is that value is numeric (continuous) in the data frame subm but a factor (discrete) in mcsm. You can't use the same scale for continuous and discrete values, so with scale_y_discrete() you get y values only for the last facet (the discrete one). It is also not possible to use two scale_y...() functions in one plot.
My approach would be to convert the mcsm values to numeric (saved as value2) and plot those; the quarters are then plotted as 1, 2, 3 and 4. To fix the legend order, use scale_color_discrete() and supply breaks= in the order you need.
mcsm$value2<-as.numeric(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
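For reference, a short sketch of what as.numeric() does to a factor: it returns the integer level codes, not the labels. That works here because the levels Q1 to Q4 sort in calendar order. The dates below are made up for illustration.

```r
d <- as.Date(c("2020-01-15", "2020-04-15", "2020-07-15", "2020-10-15"))

q <- factor(quarters(d))  # levels: Q1, Q2, Q3, Q4
codes <- as.numeric(q)    # level codes 1, 2, 3, 4
print(codes)

# Caution: the codes are level positions, not the values in the labels.
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                # 1 2 1
print(as.numeric(as.character(f)))  # 10 20 10
```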
UPDATE - solution using grobs
Another approach is to use grobs and the gridExtra library to draw your data as separate plots.
First, save the plot with all legends and data (code as above) as object p. Then use ggplot_build() and ggplot_gtable() to convert the plot to a grob object gp. Extract from gp only the part that draws the legend (saved as gp.leg); in this case it is list element number 17.
library(gridExtra)
p<-ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
gp<-ggplot_gtable(ggplot_build(p))
gp.leg<-gp$grobs[[17]]
Make two new plots, p1 and p2: the first plots the data of subm and the second only the data of mcsm. Use scale_color_manual() to set the same colors as used for plot p. For the first plot, remove the x axis title, texts and ticks, and with plot.margin= set the lower margin to a negative number. For the second plot, change the upper margin to a negative number. facet_grid() should be used for both plots to get the faceted look.
p1 <- ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(0.5,0.5,-0.25,0.5), "lines"),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
scale_color_manual(values=c("#F8766D","#00BFC4","#C77CFF"),guide="none")
p2 <- ggplot(data=mcsm, aes(date, value,group=1,col=variable)) + geom_step() +
facet_grid(variable~., scale='free_y')+
theme(plot.margin = unit(c(-0.25,0.5,0.5,0.5), "lines"))+ylab("")+
scale_color_manual(values="#7CAE00",guide="none")
Save both plots p1 and p2 as grob objects and then set for both plots the same widths.
gp1 <- ggplot_gtable(ggplot_build(p1))
gp2 <- ggplot_gtable(ggplot_build(p2))
maxWidth = grid::unit.pmax(gp1$widths[2:3],gp2$widths[2:3])
gp1$widths[2:3] <- as.list(maxWidth)
gp2$widths[2:3] <- as.list(maxWidth)
With functions grid.arrange() and arrangeGrob() arrange both plots and legend in one plot.
grid.arrange(arrangeGrob(arrangeGrob(gp1,gp2,heights=c(3/4,1/4),ncol=1),
gp.leg,widths=c(7/8,1/8),ncol=2))
