Q: Display grouped and combined boxplots in a single plot in R

I am trying to display grouped boxplot and combined boxplot into one plot. Take the iris data for instance:
library(ggplot2)
data(iris)
p1 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
p1
I want to compare the overall distribution with the distribution within each category. Is there a way to display a boxplot of all samples to the left of these three grouped boxplots?
Thanks in advance.

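You can rbind a new version of iris, in which Species equals "All" for every row, to iris before piping to ggplot. rbind() appends "All" as a new factor level at the end, so relevel() is used to move it to the left:
library(dplyr)
library(ggplot2)
p1 <- iris %>%
  rbind(iris %>% mutate(Species = 'All')) %>%
  mutate(Species = relevel(Species, ref = 'All')) %>%  # put "All" on the left
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
p1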

Yes, you can just create a column for all species as follows:
library(dplyr)
library(ggplot2)
iris <- iris %>% mutate(all = "All Species")
p1 <- ggplot(iris) +
  geom_boxplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(x = all, y = Sepal.Length))
p1
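If ggplot2 is not a requirement, base R's boxplot() (a different tool from the ggplot2 answers above) gives the same comparison without reshaping: it accepts a named list of numeric vectors and draws one box per element in list order, so putting the full sample first places it on the left. A minimal sketch; the names groups and stats are illustrative:

```r
# Base R alternative (not ggplot2): one box per list element, in list order
groups <- c(
  list(All = iris$Sepal.Length),          # all 150 samples, leftmost box
  split(iris$Sepal.Length, iris$Species)  # one vector per species
)

# plot = FALSE returns the box statistics without drawing anything
stats <- boxplot(groups, plot = FALSE)

# Draw it: "All" on the left, then setosa, versicolor, virginica
boxplot(groups, ylab = "Sepal.Length")
```

The order of the boxes is simply the order of the list, so no factor releveling is needed.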

Related

ggplot(data = df1) with added ggMarginal (data = df2)

I aim to create a ggplot with Date along the x axis and jump height along the y axis. Put simply, for one athlete within a large group of athletes, this lets the reader see improvements in jump height over time.
Additionally, I would like to add a ggMarginal(type = "density") to this plot, showing the distribution of all athletes' jump heights. The reader can then interpret the performance of the primary athlete in relation to the group distribution.
For the sake of a reproducible example, the iris data frame will work.
```
library(dplyr)
library(ggplot2)
library(ggExtra)
df1 <- iris %>%
  filter(Species == "setosa")
df2 <- iris
# I have tried the following, but a variety of errors have occurred:
ggplot(NULL, aes(x=Sepal.Length, y=Sepal.Width))+
geom_point(data=df1, size=2)+
ggMarginal(data = df2, aes(x=Sepal.Length, y=Sepal.Width), type="density", margins = "y", size = 6)
```
Although this data frame is quite different from mine, with the iris data set I aim to plot x = Sepal.Length, y = Sepal.Width for the setosa species (df1), and then use ggMarginal to show the distribution of Sepal.Width on the y axis for all species (df2).
I hope this makes sense!
Thank you for your time and expertise
As far as I can tell from the docs, you can't specify a separate data frame for ggMarginal: either you pass it a plot to which the marginal plot is added, or you provide the data directly to ggMarginal.
But one option to achieve your desired result would be to create your density plot as a separate plot and glue it to your main plot via patchwork:
library(ggplot2)
library(patchwork)
df1 <- subset(iris, Species == "setosa")
df2 <- iris
p1 <- ggplot(df1, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(size = 2)
p2 <- ggplot(df2, aes(y = Sepal.Width)) +
geom_density() +
theme_void()
p1 + p2 +
plot_layout(widths = c(6, 1))
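One thing to watch: p1 contains only setosa, so by default its y axis covers a narrower Sepal.Width range than the density panel built from all of iris, and the density curve will not line up with the scatter's y axis. A minimal sketch that gives both panels the same y limits (y_lims and combined are illustrative names):

```r
library(ggplot2)
library(patchwork)

df1 <- subset(iris, Species == "setosa")
df2 <- iris

# One y range for both panels, taken from the full data (df2 contains df1)
y_lims <- range(df2$Sepal.Width)

p1 <- ggplot(df1, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2) +
  scale_y_continuous(limits = y_lims)

p2 <- ggplot(df2, aes(y = Sepal.Width)) +
  geom_density() +
  scale_y_continuous(limits = y_lims) +
  theme_void()

# patchwork places the panels side by side; shared limits keep them aligned
combined <- p1 + p2 + plot_layout(widths = c(6, 1))
print(combined)
```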

How to plot marginal distribution of each attribute?

I am trying to plot the marginal distributions of each attribute c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") for each of the three "Species" of iris. Essentially, for each "Species" I need 4 marginal distribution plots. I tried to use the ks package but cannot seem to split them up into separate species.
I used the following:
library(ks)
library(rgl)
library(misc3d)
s <- levels(iris$Species)
fhat <- kde(x = iris[iris$Species == s[1], "Sepal.Length"])  # Sepal.Length of setosa
plot(fhat, cont = 50, xlab = "Sepal length", main = "Setosa")
Is there a way to put this in a loop to produce the 12 plots required? How do I plot it for 2 dimensions?
Using ggplot2 you can arrange all densities in one plot. To do so, first pivot the data into long format, then facet by variable and species:
library(tidyverse)
iris %>%
pivot_longer(Sepal.Length:Petal.Width) %>%
ggplot() +
geom_density(aes(x = value)) +
facet_wrap(~ name + Species, scales = "free")
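The question also asked for an explicit loop and a 2-D version. A minimal base R sketch of both, swapping ks::kde for stats::density() (1-D) and MASS::kde2d() (2-D), so default bandwidths differ from those of ks; the names dens and dens2d are illustrative:

```r
# 1-D marginal densities: 3 species x 4 variables = 12 plots
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
species <- levels(iris$Species)

dens <- list()  # density objects, keyed "species.variable"
op <- par(mfrow = c(length(species), length(vars)), mar = c(4, 4, 2, 1))
for (s in species) {
  for (v in vars) {
    d <- density(iris[iris$Species == s, v])
    dens[[paste(s, v, sep = ".")]] <- d
    plot(d, main = s, xlab = v)
  }
}
par(op)

# 2-D density of the two sepal measurements, one contour panel per species
library(MASS)  # kde2d() comes from MASS, a recommended package shipped with R
dens2d <- list()
op <- par(mfrow = c(1, length(species)))
for (s in species) {
  sub <- iris[iris$Species == s, ]
  k <- kde2d(sub$Sepal.Length, sub$Sepal.Width, n = 50)  # 50 x 50 grid
  dens2d[[s]] <- k
  contour(k, main = s, xlab = "Sepal.Length", ylab = "Sepal.Width")
}
par(op)
```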

ggplot geom_boxplot and plotting last value with geom_point

I'm new to R. I am trying to plot the last value of each variable in a data frame on top of a boxplot. I tried the following, without success:
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(iris, aes(x=unique(iris$Species), y=tail(iris,n=1)))
Thanks, Bill
One approach is to compute the last value for each species first and then add it as a separate point layer:
library(tidyverse)
iris1 <- iris %>%
group_by(Species) %>%
summarise(LastVal = last(Sepal.Length))
ggplot(iris, aes(x=Species,y=Sepal.Length)) +
geom_boxplot() +
geom_point(data = iris1, aes(x = Species, y = LastVal))
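The same idea works without dplyr. A minimal sketch using base R's aggregate() to take the last Sepal.Length of each species, drawn in red so it stands out; last_vals and p are illustrative names, and "last" means last in the data's current row order:

```r
library(ggplot2)

# Last Sepal.Length within each species, in the current row order
last_vals <- aggregate(Sepal.Length ~ Species, data = iris,
                       FUN = function(v) v[length(v)])

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # inherits x/y from ggplot(); last_vals has the same column names
  geom_point(data = last_vals, colour = "red", size = 3)
print(p)
```

Because last_vals keeps the original column names, the point layer needs no aes() of its own.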

plot selected columns using ggplot2

I would like to create multiple separate plots. So far I have the following code, but I don't want the final column of my dataset, because it makes ggplot2 plot the x-variable against itself:
library(ggplot2)
library(reshape)
d <- read.table("C:/Users/trinh/Desktop/Book1.csv", header = FALSE, sep = ",", skip = 24)
t <- c(0.25, 1, 2, 3, 4, 6, 8, 10)
d2 <- d[, 3:13]     # removing unwanted columns
d2 <- cbind(d2, t)  # adding x-variable
df <- melt(d2, id = 't')
ggplot(data = df, aes(y = value, x = t)) + geom_point(shape = 1) +
  geom_smooth(method = 'lm', se = FALSE) + facet_grid(. ~ variable)
I tried adding
data=subset(df,df[,3:12])
but I don't think I am writing it correctly. Please advise. Thanks.
Here's how you could do it, using data(iris) as an example:
(i) plot with all variables
library(ggplot2)
df <- reshape2::melt(iris, id = "Species")
ggplot(df, aes(y = value, x = Species)) + geom_point() + facet_wrap(~ variable)
(ii) plot without "Petal.Width"
library(dplyr)
df2 <- df %>% filter(variable != "Petal.Width")
ggplot(df2, aes(y=value, x=Species)) + geom_point() + facet_wrap(~ variable)
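Alternatively, drop the unwanted column before reshaping, so it never becomes a facet in the first place; with the asker's own data the analogous step is selecting columns before calling melt(). A sketch using base R's stack() in place of reshape2::melt() (long and p are illustrative names):

```r
library(ggplot2)

# stack() is base R's wide-to-long reshaper; it returns columns `values` and `ind`.
# select = drops Species (re-added below) and the unwanted Petal.Width column.
long <- stack(iris, select = -c(Species, Petal.Width))

# stack() concatenates the selected columns one after another, so the
# grouping column is repeated once per stacked column
long$Species <- rep(iris$Species, times = nlevels(long$ind))

p <- ggplot(long, aes(x = Species, y = values)) +
  geom_point() +
  facet_wrap(~ ind)
print(p)
```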

scatter plot of same variable across different conditions with ggplot facet_grid?

I'd like to correlate the same column of a data frame across rows that belong to different groups. For example, in the iris data frame, I'd like to make three scatter plots comparing Petal.Length of virginica with that of versicolor, setosa with virginica, and versicolor with setosa. I want it to look just like a normal facet_grid or facet_wrap plot. For example, I can do:
ggplot(iris) + geom_point(aes(x=Petal.Length, y=Petal.Length)) + facet_grid(~Species)
This is not what I want, since it plots the Petal.Length of each species against itself. I want the plot to look like this, except that I hand-code which species is compared with which other species. How can this be done in ggplot? Thanks.
Your question seems to be about comparing a single variable measured on many individuals that fall into multiple categories. Given your example using the iris dataset, a scatterplot is probably not a useful visualization.
Here I offer several univariate visualizations available in ggplot2. I hope one of these is helpful:
library(ggplot2)
plot_1 = ggplot(iris, aes(x=Petal.Length, colour=Species)) +
geom_density() +
labs(title="Density plots")
plot_2 = ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_histogram(colour="grey30", binwidth=0.15) +
facet_grid(Species ~ .) +
labs(title="Histograms")
plot_3 = ggplot(iris, aes(y=Petal.Length, x=Species)) +
geom_point(aes(colour=Species),
position=position_jitter(width=0.05, height=0.05)) +
geom_boxplot(fill=NA, outlier.colour=NA) +
labs(title="Boxplots")
plot_4 = ggplot(iris, aes(y=Petal.Length, x=Species, fill=Species)) +
geom_dotplot(binaxis="y", stackdir="center", binwidth=0.15) +
labs(title="Dot plots")
library(gridExtra)
part_1 = arrangeGrob(plot_1, plot_2, heights=c(0.4, 0.6))
part_2 = arrangeGrob(plot_3, plot_4, nrow=2)
parts_12 = arrangeGrob(part_1, part_2, ncol=2, widths=c(0.6, 0.4))
ggsave(file="plots.png", parts_12, height=6, width=10, units="in")
It is better to group the data first. I'd do something like this:
# get Petal.Length for each species separately
df1 <- subset(iris, Species == "virginica", select=c(Petal.Length, Species))
df2 <- subset(iris, Species == "versicolor", select=c(Petal.Length, Species))
df3 <- subset(iris, Species == "setosa", select=c(Petal.Length, Species))
# construct species 1 vs 2, 2 vs 3 and 3 vs 1 data
df <- data.frame(x=c(df1$Petal.Length, df2$Petal.Length, df3$Petal.Length),
y = c(df2$Petal.Length, df3$Petal.Length, df1$Petal.Length),
grp = rep(c("virginica.versicolor", "versicolor.setosa", "setosa.virginica"), each=50))
df$grp <- factor(df$grp)
# plot
require(ggplot2)
ggplot(data = df, aes(x = x, y = y)) + geom_point(aes(colour=grp)) + facet_wrap( ~ grp)
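If you would rather not write out each subset by hand, combn() can generate every species pair. A sketch under the same assumption as the answer above, namely that flowers are paired by their position within each species (a pairing that carries no real meaning in iris); pairs, df and plt are illustrative names:

```r
library(ggplot2)

# All unordered species pairs: setosa/versicolor, setosa/virginica, versicolor/virginica
pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

# For each pair, line up the i-th flower of one species with the i-th flower
# of the other species (positional pairing, as in the answer above)
df <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    x   = iris$Petal.Length[iris$Species == p[1]],
    y   = iris$Petal.Length[iris$Species == p[2]],
    grp = paste(p[1], "vs", p[2])
  )
}))

plt <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ grp) +
  labs(x = "Petal.Length (first species)", y = "Petal.Length (second species)")
print(plt)
```

To compare only hand-picked pairs, replace pairs with an explicit list such as list(c("virginica", "versicolor"), c("setosa", "virginica")).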
This results in three facets, one scatter plot per species pair, with points coloured by pair.
