How to jitter only one categorical variable - r

How do I retain one variable (a single point) to not be jittered while keeping the jitter on the other categorical variable in ggplot?
Here is the code I am currently using and what the output looks like:
# load ggplot2
library(ggplot2)
library(hrbrthemes)
# A basic scatterplot with color depending on Species
p <- ggplot(dt, aes(x=Type, y=y, color=Type)) +
geom_jitter(shape=22,
alpha=0.5,
size=2) +
geom_hline(yintercept=c(1.4, 28.7, 2.65, 14.9)) +
labs(y = 'ng/g lipid', title = 'PCB 99')
# Log base 10 scale
p + scale_y_continuous(trans = 'log10')
enter image description here

One option would be to split your data in categories (or (single) observations) which you want to be displayed jittered and not to be jittered. The first set of data could then be passed to geom_jitter while for the second you could use geom_point.
Using iris as example data:
library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
geom_jitter(
data = iris[!iris$Species == "setosa", ],
shape = 22,
alpha = 0.5,
size = 2
) +
geom_point(
data = iris[iris$Species == "setosa", ],
alpha = 0.5,
size = 2
)

Related

Creating a legend with shapes using ggplot2

I have created the following code for a graph in which four fitted lines and corresponding points are plotted. I have problems with the legend. For some reason I cannot find a way to assign the different shapes of the points to a variable name. Also, the colours do not line up with the actual colours in the graph.
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
g <- ggplot(df, aes(x=x), shape="shape") +
geom_smooth(aes(y=y1), colour="red", method="auto", se=FALSE) + geom_point(aes(y=y1),shape=14) +
geom_smooth(aes(y=y2), colour="blue", method="auto", se=FALSE) + geom_point(aes(y=y2),shape=8) +
geom_smooth(aes(y=y3), colour="green", method="auto", se=FALSE) + geom_point(aes(y=y3),shape=6) +
geom_smooth(aes(y=y4), colour="yellow", method="auto", se=FALSE) + geom_point(aes(y=y4),shape=2) +
ylab("x") + xlab("y") + labs(title="overview")
geom_line(aes(y=1000), linetype = "dashed")
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5)) +
scale_shape_binned(name="Value g", values=c(y1="14",y2="8",y3="6",y4="2"))
print(g)
I am wondering why the colours don't match up and how I can construct such a legend that it is clear which shape corresponds to which variable name.
While you can add the legend manually via scale_shape_manual, perhaps the adequate solution would be to reshape your data (try using tidyr::pivot_longer() on y1:y4 variables), and then assigning the resulting variable to the shape aesthetic (you can then manually set the colors to your liking). You would then need to use a single geom_point() and geom_smooth() instead of four of each.
Also, you're missing a reproducible example (what are the values of x?) and your code emits some warnings while trying to perform loess smoothing (because there's fewer data points than need to perform it).
Update (2021-12-12)
Here's a reproducible example in which we reshape the original data and feed it to ggplot using its aes() function to automatically plot different geom_point and geom_smooth for each "y group". I made up the values for the x variable.
library(ggplot2)
library(tidyr)
x <- 1:6
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
data2 <- df %>%
pivot_longer(y1:y4, names_to = "group", values_to = "y")
ggplot(data2, aes(x, y, color = group, shape = group)) +
geom_point(size = 3) + # increased size for increased visibility
geom_smooth(method = "auto", se = FALSE)
Run the code line by line in RStudio and use it to inspect data2. I think it'll make more sense here's the resulting output:
Another update
Freek19, in your second example you'll need to specify both the shape and color scales manually, so that ggplot2 considers them to be the same, like so:
library(ggplot2)
data <- ... # from your previous example
ggplot(data, aes(x, y, shape = group, color = group)) +
geom_smooth() +
geom_point(size = 3) +
scale_shape_manual("Program type", values=c(1, 2, 3,4,5)) +
scale_color_manual("Program type", values=c(1, 2, 3,4,5))
Hope this helps.
I managed to get close to what I want, using:
library(ggplot2)
data <- data.frame(x = c(0,0.02,0.04,0.06,0.08,0.1),
y = c(1400,1200,1100,1000,910,850, #y1
1300,1130,1010,970,890,840, #y2
1200,1080,980,950,880,820, #y3
1100,1050,960,930,830,810, #y4
1050,1000,950,920,810,800), #y5
group = rep(c("5%","6%","7%","8%","9%"), each = 6))
data
Values <- ggplot(data, aes(x, y, shape = group, color = group)) + # Create line plot with default colors
geom_smooth(aes(color=group)) + geom_point(aes(shape=group),size=3) +
scale_shape_manual(values=c(1, 2, 3,4,5))+
geom_line(aes(y=1000), linetype = "dashed") +
ylab("V(c)") + xlab("c") + labs(title="Valuation")+
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5))+
labs(group="Program Type")
Values
I am only stuck with 2 legends. I want to change both name, because otherwise they overlap. However I am not sure how to do this.

Plotting a vertical normal distribution next to a box plot in R

I'm trying to plot box plots with normal distribution of the underlying data next to the plots in a vertical format like this:
This is what I currently have graphed from an excel sheet uploaded to R:
And the code associated with them:
set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape=20, fill="gray", color = "gray") +
geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"),
outlier.color = "NA") +
theme_hc()
Is this possible in ggplot2 or R in general? Or is the only way this would be feasible is through something like OrignLab (where the first picture came from)?
You can do something similar to your example plot with the gghalves package:
library(gghalves)
n=0.02
ggplot(iris, aes(Species, Sepal.Length)) +
geom_half_boxplot(center=TRUE, errorbar.draw=FALSE,
width=0.5, nudge=n) +
geom_half_violin(side="r", nudge=n) +
geom_half_dotplot(dotsize=0.5, alpha=0.3, fill="red",
position=position_nudge(x=n, y=0)) +
theme_hc()
There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:
set.seed(12345)
X8_17_20_R_20_60 <- data.frame(
Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
Now we create a little data frame of normal distributions based on the parameters taken from each group:
df <- do.call(rbind, mapply( function(d, n) {
y <- seq(min(d), max(d), length.out = 1000)
data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
}, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))
Finally, we draw your plot and add a geom_path with the new data.
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) +
geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
geom_boxplot(fill="NA", aes(color = Type), outlier.color = "NA") +
scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
geom_path(data = df, aes(x = x, y = y, group = z), size = 1) +
theme_hc()
Created on 2020-08-21 by the reprex package (v0.3.0)

ggplot boxplot + jitter plot showing random sampling of data

I'd like to use ggplot to generate a series of boxplots derived from all data within a dataset, but then with jittered points showing a random sampling of the respective data (e.g., 100 data points) to avoid over-plotting (there are thousands of data points). Can anyone please help me with the code for this? The basic framework I have now is below, but I don't know what if any arguments can be added to draw a random sampling of data to display as the jittered points. Thanks for any help.
ggplot(datafile, aes(x=factor(var1), y=var2, fill=var3)) + geom_jitter(size=0.1, position=position_jitter(width=0.3, height=0.2)) + geom_boxplot(alpha=0.5) + facet_grid(.~var3) + theme_bw() + scale_fil_manual(values=c("red", "green", "blue")
You could take a random subset of your data using dplyr:
library(dplyr)
library(ggplot)
ggplot(data = datafile, aes(x = factor(var1), y = var2, fill = var3)) +
geom_jitter(
# use random subset of data
data = datafile %>% group_by(var1) %>% sample_n(100),
aes(x = factor(var1), y = var2, fill = var3)),
size = 0.1,
position = position_jitter(width = 0.3, height = 0.2)) +
geom_boxplot(alpha = 0.5) +
facet_grid(.~var3) +
theme_bw() +
scale_fill_manual(values = c("red", "green", "blue")

Color outliers multiple factors in boxplot

Let's say I have the following data frame:
library(ggplot2)
set.seed(101)
n=10
df<- data.frame(delta=rep(rep(c(0.1,0.2,0.3),each=3),n), metric=rep(rep(c('P','R','C'),3),n),value=rnorm(9*n, 0.0, 1.0))
My goal is to do a boxplot by multiple factors:
p<- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)))
The output is:
So far so good, but if I do:
p+ geom_point(aes(color = factor(metric)))
I get:
I do not know what it is doing. My goal is to color the outliers as it is done here. Note that this solution changes the inside color of the boxes to white and set the border to different colors. I want to keep the same color of the boxes while having the outliers inherit those colors. I want to know how to make the outliers get the same colors from their respective boxplots.
Do you want just to change the outliers' colour ? If so, you can do it easily by drawing boxplot twice.
p <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(colour=factor(metric))) +
geom_boxplot(aes(fill=factor(metric)), outlier.colour = NA)
# outlier.shape = 21 # if you want a boarder
[EDITED]
colss <- c(P="firebrick3",R="skyblue", C="mediumseagreen")
p + scale_colour_manual(values = colss) + # outliers colours
scale_fill_manual(values = colss) # boxes colours
# the development version (2.1.0.9001)'s geom_boxplot() has an argument outlier.fill,
# so I guess under code would return the similar output in the near future.
p2 <- ggplot(data = df, aes(x = factor(delta), y = value)) +
geom_boxplot(aes(fill=factor(metric)), outlier.shape = 21, outlier.colour = NA)
Maybe this:
ggplot(data = df, aes(x = as.factor(delta), y = value,fill=as.factor(metric))) +
geom_boxplot(outlier.size = 1)+ geom_point(pch = 21,position=position_jitterdodge(jitter.width=0))

overlay colored boxplots on parallel coordinate plots with faceting in ggplot2

I have the following example.
require(ggplot2)
# Example Data
x <- data.frame(var1=rnorm(800,0,1),
var2=rnorm(800,0,1),
var3=rnorm(800,0,1),
type=factor(rep(c("x", "y"), length.out=800)),
set=factor(rep(c("A","B","C","D"), each=200))
)
Now, I would like to plot (thin) parallel coordinate plots of these lines, with points for each of the variable values. I would like to overlay a boxplot (each of a different color for each method) on these parallel coordinate plots at the variables values. On top of this, I would like to facet for the groups and types, say using set~type. Is this possible to do using ggplot2?
Any suggestions? Thanks!
You need to put data in long format first. I didn't put in points, since the graph is already cluttered enough, but you can do so by adding a geom_point.
require(tidyr)
x$id <- 1:nrow(x)
x2 <- gather(x, var, value, var1:var3)
Boxplots
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_boxplot(aes(fill = var), alpha = 0.5) +
facet_grid(set ~ type) +
theme_bw()
Or perhaps violins
Replacing the boxplots with violins looks pretty cool as well.
ggplot(x2, aes(var, value)) +
geom_line(aes(group = id), size = 0.05, alpha = 0.3) +
geom_violin(aes(fill = var), col = NA, alpha = 0.6) +
facet_grid(set ~ type) +
theme_bw()

Resources