I'd like to correlate the same column of a dataframe for points with distinct row values. For example, in the iris dataframe, I'd like to make three scatter plots comparing Petal.Length of virginica with that of versicolor, setosa with virginica and versicolor with setosa. I want it to appear just like a normal facet_grid or facet_wrap plot. For example, I can do:
ggplot(iris) + geom_point(aes(x=Petal.Length, y=Petal.Length)) + facet_grid(~Species)
This is not what I want, since it's plotting Petal.Length of each species against itself, but I want the plot to appear like this, except where I handcode which species to compare to what other species. How can this be done in ggplot? Thanks.
Your question seems to be about comparing a single variable measured on many individuals that fall into multiple categories. Given your example using the iris dataset, a scatterplot is probably not a useful visualization.
Here I offer several univariate visualizations available in ggplot2. I hope one of these is helpful:
library(ggplot2)
plot_1 = ggplot(iris, aes(x=Petal.Length, colour=Species)) +
  geom_density() +
  labs(title="Density plots")
plot_2 = ggplot(iris, aes(x=Petal.Length, fill=Species)) +
  geom_histogram(colour="grey30", binwidth=0.15) +
  facet_grid(Species ~ .) +
  labs(title="Histograms")
plot_3 = ggplot(iris, aes(y=Petal.Length, x=Species)) +
  geom_point(aes(colour=Species),
             position=position_jitter(width=0.05, height=0.05)) +
  geom_boxplot(fill=NA, outlier.colour=NA) +
  labs(title="Boxplots")
plot_4 = ggplot(iris, aes(y=Petal.Length, x=Species, fill=Species)) +
  geom_dotplot(binaxis="y", stackdir="center", binwidth=0.15) +
  labs(title="Dot plots")
library(gridExtra)
part_1 = arrangeGrob(plot_1, plot_2, heights=c(0.4, 0.6))
part_2 = arrangeGrob(plot_3, plot_4, nrow=2)
parts_12 = arrangeGrob(part_1, part_2, ncol=2, widths=c(0.6, 0.4))
ggsave(filename="plots.png", plot=parts_12, height=6, width=10, units="in")
It is better to group the data first. I'd do something like this:
# get Petal.Length for each species separately
df1 <- subset(iris, Species == "virginica", select=c(Petal.Length, Species))
df2 <- subset(iris, Species == "versicolor", select=c(Petal.Length, Species))
df3 <- subset(iris, Species == "setosa", select=c(Petal.Length, Species))
# construct species 1 vs 2, 2 vs 3 and 3 vs 1 data
df <- data.frame(x = c(df1$Petal.Length, df2$Petal.Length, df3$Petal.Length),
                 y = c(df2$Petal.Length, df3$Petal.Length, df1$Petal.Length),
                 grp = rep(c("virginica.versicolor", "versicolor.setosa", "setosa.virginica"), each=50))
df$grp <- factor(df$grp)
# plot
library(ggplot2)
ggplot(data = df, aes(x = x, y = y)) + geom_point(aes(colour=grp)) + facet_wrap( ~ grp)
This results in a three-panel plot, one panel for each species pair.
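The same construction can be sketched more compactly. This is a minimal sketch, not the code from the answer above: `by_species`, `species_pairs`, `pair_df` and `pair_plot` are names made up here, and it relies on every species having the same number of rows (50 in iris). Note that points are paired purely by row position within each species, so the pairing itself carries no meaning beyond that.

```r
library(ggplot2)

# Split Petal.Length into one vector per species
by_species <- split(iris$Petal.Length, iris$Species)

# Hand-pick which species goes on x and which on y for each panel
species_pairs <- list(c("virginica", "versicolor"),
                      c("versicolor", "setosa"),
                      c("setosa", "virginica"))

# Build one long data frame with a panel label per pair
pair_df <- do.call(rbind, lapply(species_pairs, function(p) {
  data.frame(x = by_species[[p[1]]],
             y = by_species[[p[2]]],
             grp = paste(p[1], "vs", p[2]))
}))

pair_plot <- ggplot(pair_df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ grp)
```

Adding or removing a comparison is then just a matter of editing `species_pairs`; print `pair_plot` to draw it.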
I aim to create a ggplot with Date along the x axis, and jump height along the y axis. Simplistically, for 1 athlete in a large group of athletes, this will allow the reader to see improvements in jump height over time.
Additionally, I would like to add a ggMarginal(type = "density") to this plot. Here, I aim to plot the distribution of all athlete jump heights. As a result, the reader can interpret the performance of the primary athlete in relationship to the group distribution.
For the sake of a reproducible example, the iris data frame will work.
'''
library(dplyr)
library(ggplot2)
library(ggExtra)
df1 <- iris %>%
  filter(Species == "setosa")
df2 <- iris
#I have tried as follows, but a variety of errors have occurred:
ggplot(NULL, aes(x=Sepal.Length, y=Sepal.Width))+
  geom_point(data=df1, size=2)+
  ggMarginal(data = df2, aes(x=Sepal.Length, y=Sepal.Width), type="density", margins = "y", size = 6)
'''
Although this data frame differs substantially from mine, in terms of the iris data set I aim to plot x = Sepal.Length, y = Sepal.Width for the setosa species (df1), and then use ggMarginal to show the distribution of Sepal.Width on the y axis for all the species (df2).
I hope this makes sense!
Thank you for your time and expertise
As far as I can tell from the docs, you can't specify a separate data frame for ggMarginal. Either you specify a plot to which you want to add a marginal plot, or you provide the data directly to ggMarginal.
But one option to achieve your desired result would be to create your density plot as a separate plot and glue it to your main plot via patchwork:
library(ggplot2)
library(patchwork)
df1 <- subset(iris, Species == "setosa")
df2 <- iris
p1 <- ggplot(df1, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2)
p2 <- ggplot(df2, aes(y = Sepal.Width)) +
  geom_density() +
  theme_void()
p1 + p2 +
  plot_layout(widths = c(6, 1))
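One thing to watch with this approach: the two panels compute their y ranges independently, and here p1 only contains setosa while p2 contains all species, so the density may not line up with the points. A possible fix, sketched here as an assumption rather than as part of the answer above, is to give both panels the same y limits (`y_limits` and `combined` are names introduced for illustration):

```r
library(ggplot2)
library(patchwork)

df1 <- subset(iris, Species == "setosa")
df2 <- iris

# Use one y range for both panels so the density lines up with the points
y_limits <- range(df2$Sepal.Width)

p1 <- ggplot(df1, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 2) +
  coord_cartesian(ylim = y_limits)

p2 <- ggplot(df2, aes(y = Sepal.Width)) +
  geom_density() +
  coord_cartesian(ylim = y_limits) +
  theme_void()

combined <- p1 + p2 + plot_layout(widths = c(6, 1))
```

Print `combined` to draw the result. Using coord_cartesian() rather than scale limits keeps all observations in the density estimate.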
I am using the R programming language. Using the following link (https://bio304-class.github.io/bio304-book/introduction-to-ggplot2.html) , I made these two plots for the iris dataset:
library(ggplot2)
library(cowplot)
data(iris)
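#graph1
setosa.only <- subset(iris, Species == "setosa")
# sepal.labels comes from the linked book and is not defined in this post;
# the definition below is an assumed stand-in so that the code runs
sepal.labels <- labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)")
setosa.sepals <- ggplot(setosa.only,
                        mapping = aes(x = Sepal.Length, y = Sepal.Width))
graph1 = setosa.sepals + geom_point() + sepal.labels
#graph2
graph2 = setosa.sepals +
  geom_density2d() +
  sepal.labels + labs(subtitle = "I. setosa data only")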
cowplot::plot_grid(graph1, graph2, labels = "AUTO")
My question: is it possible to combine both of these graphs together into 1 single plot?
So that it looks something like this? (I tried to draw this by hand):
Thanks
You can add geom_density2d() after geom_point() :
library(ggplot2)
ggplot(setosa.only,
       mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density2d()
I'm interested in creating a scatter plot using ggplot2 in R where one regression line is fitted to all of the points, the points are distinguished by a grouping variable with 2 levels, and 2 further regression lines, one per level of the grouping variable, are also present.
I want to combine the 1 overall regression line from this graph:
With the 2 grouping-variable-specific regression lines from this graph:
Is this possible? If so, how?
Thanks in advance
# creates data for scatter plot
## dataset of interest
iris
## for iris
colnames(iris)
### creates dataset with just cases where iris$Species == setosa or versicolor
#### unique values for iris$Species
unique(iris$Species)
#### loads tidyverse package
library(tidyverse)
##### filters dataset with just cases where iris$Species == setosa or versicolor
iris__setosa_or_versicolor <- iris %>% filter(Species != "virginica")
##### turns iris__setosa_or_versicolor to dataframe
iris__setosa_or_versicolor <- data.frame(iris__setosa_or_versicolor)
##### unique values for iris__setosa_or_versicolor$Species
unique(iris__setosa_or_versicolor$Species)
## creates scatter plot
### loads ggplot2
library(ggplot2)
### Basic scatter plot separated by Species with regression lines
scatter_plot__sepal_length_x_sepal_width__points_is_species <-
  ggplot(iris__setosa_or_versicolor,
         aes(x=Sepal.Length, y=Sepal.Width, color=Species, shape=Species)) +
  geom_point() +
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE) +
  labs(title="Scatter plot of Sepal.Length X Sepal.Width with dots as Species\n where Species is setosa or versicolor, with regression lines for each of the Species variable levels",
       x="Sepal.Length", y = "Sepal.Width") +
  scale_colour_manual(values = c("#ff0000","#0000ff"))
scatter_plot__sepal_length_x_sepal_width__points_is_species
### Basic scatter plot with regression line added for all data, and point differentiated by grouping variable
scatter_plot__sepal_length_x_sepal_width__points_is_species <-
  ggplot(iris__setosa_or_versicolor, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(aes(col=Species)) +
  geom_smooth(method=lm, se=FALSE, color="green") +
  labs(title="Scatter plot of Sepal.Length X Sepal.Width with dots as Species where\n Species is setosa or versicolor but not differentiated by species",
       x="Sepal.Length", y = "Sepal.Width") +
  scale_colour_manual(values = c("#ff0000","#0000ff"))
scatter_plot__sepal_length_x_sepal_width__points_is_species
Is this what you are trying to do?
ggplot(iris__setosa_or_versicolor, aes(x=Sepal.Length, y=Sepal.Width)) +
  geom_point(aes(col=Species)) +
  geom_smooth(method=lm, se=FALSE, color="green") +
  geom_smooth(aes(col = Species), method=lm, se=FALSE, fullrange=TRUE) +
  labs(title="Scatter plot of Sepal.Length X Sepal.Width with dots as Species where\n Species is setosa or versicolor but not differentiated by species", x="Sepal.Length", y = "Sepal.Width") +
  scale_colour_manual(values = c("#ff0000","#0000ff"))
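For a version that runs without the question's setup code, here is a self-contained sketch of the same layering idea. The smoothing layer with no colour mapping is fitted to all points, while the layer with colour mapped to Species is fitted once per group. `two_species` and `combined_lines` are names chosen here, and `formula = y ~ x` is only added to spell out the default for a straight-line fit.

```r
library(ggplot2)

# Keep setosa and versicolor only, dropping the unused virginica level
two_species <- droplevels(subset(iris, Species != "virginica"))

combined_lines <- ggplot(two_species, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species)) +
  # one line fitted to all points (no grouping aesthetic on this layer)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "green") +
  # one line per species (the colour mapping creates the groups)
  geom_smooth(aes(colour = Species), method = "lm", formula = y ~ x, se = FALSE) +
  scale_colour_manual(values = c("#ff0000", "#0000ff"))
```

Print `combined_lines` to draw the plot with all three lines.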
I am trying to display grouped boxplots and a combined boxplot in one plot. Take the iris data for instance:
library(ggplot2)
data(iris)
p1 <- ggplot(iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot()
p1
I am trying to compare the overall distribution with the distribution within each category. So is there a way to display a boxplot of all samples on the left of these three grouped boxplots?
Thanks in advance.
You can rbind a new version of iris, where Species equals "All" for all rows, to iris before piping to ggplot
library(dplyr)
library(ggplot2)
p1 <- iris %>%
  rbind(iris %>% mutate(Species = 'All')) %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
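Because the new "All" level is appended after the existing species levels, the overall box may end up on the right rather than on the left as the question asks. If it should come first, one option is to set the factor levels explicitly. A base-R sketch (the names `iris_all`, `iris_chr`, `combined` and `p_all_first` are introduced here for illustration):

```r
library(ggplot2)

# Copy of iris where every row is labelled "All"
iris_all <- iris
iris_all$Species <- "All"

# Copy of iris with Species as plain text, so both pieces stack cleanly
iris_chr <- iris
iris_chr$Species <- as.character(iris$Species)

combined <- rbind(iris_all, iris_chr)

# Listing "All" first in the levels puts its box leftmost on the x axis
combined$Species <- factor(combined$Species,
                           levels = c("All", levels(iris$Species)))

p_all_first <- ggplot(combined, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()
```

Print `p_all_first` to draw the four boxes, with the overall distribution first.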
Yes, you can just create a column for all species as follows:
iris = iris %>% mutate(all = "All Species")
p1 <- ggplot(iris) +
  geom_boxplot(aes(x=Species, y=Sepal.Length)) +
  geom_boxplot(aes(x=all, y=Sepal.Length))
p1
I have a data frame in R called x that has hundreds of rows. Each row is a person. I have two variables, Height, which is continuous, and Country, which is a factor. I want to plot a smoothed histogram of all of the heights of the individuals. I want to stratify it by Country. I know that I can do that with the following code:
library(ggplot2)
ggplot(x, aes(x=Height, colour = (Country == "USA"))) + geom_density()
This plots everyone from the USA as one color (true) and everyone from any other country as the other color (false). However, what I would really like to do is plot everyone from the USA in one color and everyone from Oman, Nigeria, and Switzerland as the other color. How would I adapt my code to do this?
I used the iris data as a stand-in for illustration, with Species playing the role of Country:
head(iris)
table(iris$Species)
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
                      ifelse(df$Species == "virginica", "red", ""))
library(ggplot2)
p <- ggplot(df, aes(x = Sepal.Length, colour = (Species == "setosa")))
p + geom_density() # Your example
# Now let's choose the other created column
p <- ggplot(df, aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
Edit: to get rid of the "countries" that you don't want in the plot, just subset them out of the data frame you use in the plot. (Note that the legend labels don't exactly match the colours drawn, but that can be changed within the data frame itself.)
p <- ggplot(df[df$Species2 %in% c("blue", "red"),], aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
And to overlay the lines just take out the facet_wrap:
p + geom_density()
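Translated back to the question's own variables, the idea might look like the sketch below. The data frame here is simulated and purely hypothetical, since the asker's `x` isn't available; only the column names Height and Country come from the question. `others`, `x_sub`, `Group` and `height_plot` are names made up for this example.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data frame `x` (simulated values)
set.seed(1)
x <- data.frame(
  Height  = rnorm(500, mean = 170, sd = 10),
  Country = factor(sample(c("USA", "Oman", "Nigeria", "Switzerland", "Japan"),
                          500, replace = TRUE))
)

others <- c("Oman", "Nigeria", "Switzerland")

# Keep only the countries of interest, then label each row by group
x_sub <- x[x$Country %in% c("USA", others), ]
x_sub$Group <- ifelse(x_sub$Country == "USA", "USA", "Oman/Nigeria/Switzerland")

# One density curve per group, coloured by the new Group column
height_plot <- ggplot(x_sub, aes(x = Height, colour = Group)) +
  geom_density()
```

Rows from any other country (Japan in this made-up data) are dropped before plotting, so they contribute to neither curve.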
I enjoyed working through the excellent answer above. Here are my mods.
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
                      ifelse(df$Species == "virginica", "red", ""))
homes2006 <- df
names(homes2006)[names(homes2006)=="Species"] <- "ownership"
homes2006a <- as.data.frame(sapply(homes2006, gsub,
                                   pattern = "setosa", replacement = "renters"))
homes2006b <- as.data.frame(sapply(homes2006a, gsub, pattern = "virginica",
                                   replacement = "home-owners"))
homes2006c <- as.data.frame(sapply(homes2006b, gsub, pattern = "versicolor",
                                   replacement = "home-owners"))
## sapply(..., gsub) coerces every column to character (factor in R < 4.0),
## so go through as.character() to recover the values rather than factor codes
homes2006c[,1] <- as.numeric(as.character(homes2006c[,1]))
library(ggplot2)
p <- ggplot(homes2006c, aes(x = Sepal.Length,
                            colour = (ownership == "home-owners")))
p + ylab("number of households") +
  xlab("monthly income (NIS)") +
  ggtitle("income distribution by home ownership") +
  geom_density()
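A shorter route to the same kind of plot, offered as a sketch rather than a correction: recoding only the ownership column leaves the numeric columns untouched, so no conversion back to numeric is needed. `homes` and `income_plot` are names made up for this example.

```r
library(ggplot2)

homes <- iris
names(homes)[names(homes) == "Species"] <- "ownership"

# setosa becomes "renters"; the other two species become "home-owners"
homes$ownership <- ifelse(homes$ownership == "setosa", "renters", "home-owners")

income_plot <- ggplot(homes, aes(x = Sepal.Length,
                                 colour = (ownership == "home-owners"))) +
  geom_density() +
  xlab("monthly income (NIS)") +
  ggtitle("income distribution by home ownership")
```

Print `income_plot` to draw it. Sepal.Length stays numeric throughout, which can be confirmed with `is.numeric(homes$Sepal.Length)`.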