Plot jitter points for certain boxplots only - r

I have five files with data in matrix form, and I am plotting them with geom_boxplot. Each boxplot corresponds to one file.
For only certain files, say div1, div3 and div5, I want to overlay the data points on the boxplot. I can add data points with geom_jitter, but to do so I had to separate the plots with data points from the boxplot-only plots.
Because I want to preserve the plotting order of the files (div0, div1, etc.), I could not plot data points for only certain boxplots.
How can I overlay data points on only certain boxplots and not all of them?
files <- c(div0, div1, div2, div3, div4, div5)
p1 <- ggplot(moltenNew, aes(x = L1, y = value, colour = L1)) +
  ylim(0.3, 0.8) +
  geom_boxplot() +
  facet_wrap(~variable, nrow = 1) +
  scale_x_discrete(limits = basename(files), labels = basename(files))

You could use subset:
set.seed(1)
moltenNew <- rbind(
data.frame(value = rnorm(20, 50, 20), L1 = gl(2, 10), variable = 1),
data.frame(value = rnorm(20, 100, 100), L1 = gl(2, 10), variable = 2),
data.frame(value = rnorm(20, 75, 10), L1 = gl(2, 10), variable = 3)
)
moltenNew
library(ggplot2)
ggplot(moltenNew, aes(x = L1, y = value, colour = L1)) +
  geom_boxplot() +
  facet_wrap(~variable, nrow = 1, scales = "free_y") +
  # Draw jittered points only for the rows where variable == 2
  geom_point(data = subset(moltenNew, variable == 2),
             position = position_jitter(width = .2))
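
The same idea can be applied to the x categories rather than the facets, which is closer to what the question asks. The following is a minimal sketch under assumptions: the data frame `toy`, its group names div0 to div5, and the vector `with_points` are made up for illustration and stand in for the asker's `moltenNew` and file names.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for the asker's data: six groups named div0..div5
groups <- paste0("div", 0:5)
toy <- data.frame(
  L1    = factor(rep(groups, each = 20), levels = groups),
  value = rnorm(120, mean = 0.55, sd = 0.08)
)

# Only these groups get points overlaid (assumed selection)
with_points <- c("div1", "div3", "div5")
toy_points  <- subset(toy, L1 %in% with_points)

p <- ggplot(toy, aes(x = L1, y = value, colour = L1)) +
  geom_boxplot(outlier.shape = NA) +
  # The point layer sees only the selected groups; the x order is still
  # set by the factor levels used in the boxplot layer
  geom_jitter(data = toy_points, width = 0.2, height = 0)
p
```

Because every layer shares the same factor levels for `L1`, the boxplots stay in the order div0 to div5 even though the point layer contains only three of the groups.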

Related

Adding three reference lines to a ggplot2 box plot works, but why can't I add two reference lines?

I have an example data frame:
#Libraries
library(tidyverse)
#Create example data
ex <- data.frame(id = 1:300,
event = rep(c("f", "s", "t"), 100),
x = rnorm(300, 50, 20))
I need to make one plot with three horizontal reference lines and one plot with two horizontal reference lines.
The plot with three horizontal reference lines works:
## Example Plot One (Three Reference Lines) ##
#Create boxplot function
ex_triple_plot <- function(data){
#Create datasets for references
References <- data.frame( x = c(-Inf, Inf, -Inf, Inf, -Inf, Inf),
y = c(60, 50, 40),
References = factor(c(60, 50, 40),
labels = c("High",
"Medium",
"Low")))
#Create Plots
ggplot(data, aes(x = event, y = x)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha=0.6) +
geom_line(aes( x, y, color = References), References) +
scale_color_manual(values=c("red", "red", "red"))}
#Plot
ex_triple_plot(ex)
But when I try to make a plot with two horizontal reference lines, the reference lines do not show up:
## Example Plot Two (Two Reference Lines) ##
#Create boxplot function
ex_double_plot <- function(data){
#Create datasets for references
References <- data.frame( x = c(-Inf, Inf, -Inf, Inf),
y = c(60, 40),
References = factor(c(60, 40),
labels = c("High",
"Low")))
#Create Plots
ggplot(data, aes(x = event, y = x)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha=0.6) +
geom_line(aes( x, y, color = References), References) +
scale_color_manual(values=c("red", "red"))}
#Plot
ex_double_plot(ex)
Does anybody know what I'm doing wrong here?
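
One likely explanation, offered as a sketch rather than a confirmed answer: data.frame() recycles the shorter `y` vector against `x`. With six x values and three y values, each y ends up paired with both -Inf and Inf, so every line has two endpoints. With four x values and two y values, y = 60 is only ever paired with -Inf and y = 40 only with Inf, so neither line has any length. The code below prints the recycled data and then draws the lines with geom_hline(), which needs only one row per line. The names `refs_bad` and `refs` are made up for illustration.

```r
library(ggplot2)

# Rebuild the two-line reference data the same way as in ex_double_plot()
refs_bad <- data.frame(x = c(-Inf, Inf, -Inf, Inf),
                       y = c(60, 40))
# y is recycled as 60, 40, 60, 40: y = 60 only meets x = -Inf,
# and y = 40 only meets x = Inf
print(refs_bad)

# Alternative sketch: one row per reference line, drawn with geom_hline()
refs <- data.frame(
  y = c(60, 40),
  References = factor(c("High", "Low"), levels = c("High", "Low"))
)

set.seed(42)
ex <- data.frame(id = 1:300,
                 event = rep(c("f", "s", "t"), 100),
                 x = rnorm(300, 50, 20))

p <- ggplot(ex, aes(x = event, y = x)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(alpha = 0.6) +
  geom_hline(data = refs, aes(yintercept = y, color = References)) +
  scale_color_manual(values = c(High = "red", Low = "red"))
p
```

Building the factor from the label strings with explicit levels also avoids relying on how factor() orders the numeric values 60 and 40 before applying the labels.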

How to produce a similar plot? [R]

The authors of this paper (https://www.sciencedirect.com/science/article/pii/S0092867415006418) mention in their supplementary file that these figures were produced in Matlab. Because I lack proficiency in Matlab, the time to learn it, and a license, I am trying to replicate the figure below (Figure 2 of the paper, specifically Figure 2A on the left) in R:
Any suggestions? What is this plot called more generally?
Thank you!
To me it looks like a classic point plot! You can reproduce this kind of plot in R with ggplot:
# Fake data frame with x/y coordinates, type of data (for the colouring), p-value (for the size), and panel
df <- data.frame(
  x = rep(1:20, 10),
  y = rnorm(200, mean = 0, sd = 2),
  type = rep(rep(LETTERS[1:5], each = 4), 10),
  pvalue = sample(0:50, size = 200, replace = TRUE) / 1000,
  # 50 labels per panel, shuffled across the 200 rows
  panel = sample(rep(paste0("panel", 1:4), each = 50), 200, replace = FALSE))
# plot
library(ggplot2)
ggplot(df, aes(x, y * x, color = type, size = pvalue)) +
  geom_hline(yintercept = 0) +
  geom_point() +
  facet_wrap(~panel, ncol = 2)
ggsave("demo.png")

How do I jitter just the outliers in a ggplot boxplot?

I am using geom_jitter() for a boxplot with ggplot. I noticed it adds a point for every record on top of the boxplot, instead of jittering just the points that represent outliers.
This is demonstrated by this code.
data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20), rnorm(300, mean = 90, sd = 5)))
names(data) <- "blapatybloo"
data %>% ggplot(aes("column", blapatybloo)) + geom_boxplot() + geom_jitter(alpha=.1)
How do I apply geom_jitter to only the outlier points of the boxplot, without overplotting the rest of the records?
Create a new column to determine if a data point is an outlier or not.
Then overlay the points onto the boxplot.
library(dplyr)
library(ggplot2)
data <- as.data.frame(c(rnorm(10000, mean = 10, sd = 20),
                        rnorm(300, mean = 90, sd = 5)))
names(data) <- "blapatybloo"
# A boxplot outlier lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
data <- data %>%
  mutate(outlier = blapatybloo > quantile(blapatybloo, 0.75) +
           IQR(blapatybloo) * 1.5 | blapatybloo < quantile(blapatybloo, 0.25) -
           IQR(blapatybloo) * 1.5)
data %>%
  ggplot(aes("column", blapatybloo)) +
  # Hide the built-in outlier points, then draw only the flagged rows, jittered
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = function(x) dplyr::filter(x, outlier),
             position = "jitter")
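
With more than one box, the outlier flag has to be computed per group, because each box has its own quartiles. Below is a minimal sketch assuming a hypothetical data frame `toy` with columns `grp` and `val`. The helper `is_outlier()` is made up for illustration and applies the usual boxplot rule (below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR).

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical two-group data; the column names grp and val are assumptions
toy <- data.frame(
  grp = rep(c("a", "b"), each = 500),
  val = c(rnorm(500, 10, 5), rnorm(500, 40, 15))
)

# TRUE for values outside the 1.5 * IQR fences around the quartiles
is_outlier <- function(v, coef = 1.5) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  v < q[1] - coef * iqr | v > q[2] + coef * iqr
}

# Compute the flag within each group, not over the whole column
toy <- toy %>%
  group_by(grp) %>%
  mutate(outlier = is_outlier(val)) %>%
  ungroup()

p <- ggplot(toy, aes(grp, val)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(data = function(d) filter(d, outlier), width = 0.2, height = 0)
p
```

Setting `height = 0` keeps the jitter horizontal only, so the outliers stay at their true y values.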

How to get different colors related to treatment for boxplot and violin plot (ggplot / using geom_split_violin) that are plotted in one?

I am trying to show a boxplot and a violin plot in one.
I can fill the boxplot and the violin plot with colours based on the treatment, but I don't want them in exactly the same colour; I'd prefer the fill of either the violin plot or the boxplot to be lighter.
Also, I can get the outlines of the boxplot in different colours by adding col=TM to the aes of geom_boxplot, but then I cannot choose these colours, or don't know how to (they are automatically pink and blue).
BACKGROUND:
I am working with a data set that looks something like this:
TM   yax     X     Zscore
Org  zscore  zhfa  -1.72
Org  zscore  zfwa  -0.12
I am plotting the z-scores by X (zhfa, etc.) per treatment (TM).
#Colours
ocean = c('#BBDED6' , '#61C0BF' , '#FAE3D9' , '#FFB6B9' )
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore,fill=TM)) +
geom_split_violin(col="white", fill="white") +
geom_boxplot(alpha = 1, width=0.3, aes(fill=TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12),legend.position="top") +
stat_compare_means(method="t.test", label.y=2.8, label.x=0.3, size=3) +
scale_fill_manual(values=ocean, labels=c("Mineral fertilizer", "Organic fertilizer"))
Now only half of the violin plot is filled white, not both halves (which would already be better). If I plot geom_split_violin() without those arguments, it gets exactly the same colours as the boxplot.
Furthermore, the violin half for zhfa should be on the left side, but it gets switched and is displayed on the right side, even though it matches the data of the organic (left) boxplot.
The graph now:
I don't know if it can be solved by adding something related to the scale_fill_manual or if this is an impossible request
Sample Data:
data <- data.frame(TM = c(rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5),rep("org", 5), rep("min", 5)),
Zscore = runif(30,-2,2),
X = c(rep("zwfa", 10), rep("zhfa", 10), rep("zbfa", 10)))
You can add an additional column to your data that is the same structure as TM but different values, then scale the fill:
Begin solution:
library(dplyr)   # for %>% and mutate()
library(ggplot2)
library(ggpubr)  # for stat_compare_means()
# geom_split_violin() is not part of ggplot2; it must already be defined in your session
# Derive the violin fill key from TM: "org" -> "orgM", "min" -> "minM"
data <- data %>% mutate(TMm = paste0(TM, "M"))
# Colours, named by level so that each fill is matched explicitly
ocean <- c(min = '#BBDED6', minM = '#FAE3D9', org = '#61C0BF', orgM = '#FFFFFF')
## Plot ##
z <- ggplot(data = data, aes(x = X, y = Zscore, fill = TM)) +
  geom_split_violin(mapping = aes(fill = TMm)) +
  geom_boxplot(alpha = 1, width = 0.3, aes(fill = TM), position = position_dodge(width = 0.3))
z + theme(axis.text = element_text(size = 12), legend.position = "top") +
  stat_compare_means(method = "t.test", label.y = 2.8, label.x = 0.3, size = 3) +
  scale_fill_manual(breaks = c("org", "min"), values = ocean, labels = c("Organic fertilizer", "Mineral fertilizer"))
In your data you may have to change breaks = c("org", "min") to whatever you call the factor levels in the TM variable
Or if you want the whole violin plot white:
ocean <- c(min = '#BBDED6', minM = '#FFFFFF', org = '#61C0BF', orgM = '#FFFFFF')
New Plot:
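
If the violin halves should be a lighter shade of the boxplot colours rather than hand-picked values, the lighter colours can be derived in base R. This is a sketch under assumptions: `lighten()` is a made-up helper (not part of ggplot2), the blend amount of 0.6 is arbitrary, and the names minM and orgM follow the values of the TMm column used in the answer.

```r
# Blend each colour toward white; amount = 0 keeps it, amount = 1 gives white
lighten <- function(cols, amount = 0.5) {
  stopifnot(amount >= 0, amount <= 1)
  m <- grDevices::col2rgb(cols)          # 3 x n matrix of 0-255 channels
  m <- m + (255 - m) * amount
  out <- grDevices::rgb(m[1, ], m[2, ], m[3, ], maxColorValue = 255)
  names(out) <- names(cols)
  out
}

# Boxplot fills, named by the levels of TM
box_cols <- c(min = "#BBDED6", org = "#61C0BF")

# Violin fills: lighter versions, named by the levels of TMm
violin_cols <- lighten(box_cols, 0.6)
names(violin_cols) <- paste0(names(box_cols), "M")

ocean <- c(box_cols, violin_cols)
print(ocean)
```

The resulting named vector can be passed as `values = ocean` to scale_fill_manual(), so each level is matched by name rather than by position.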

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How can I add dummy values on the x-axis in ggplot2?
My data contain the values 0, 2, 4, 6, 12, 14, 18, 22 and 26, which I have plotted on the x-axis. Is there a way to add the remaining even numbers, for which there is no data in the table? This would leave the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, 8, 10, 12, etc. come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The trick is to create the x variable as a factor with all the levels you need and to specify drop = FALSE in the scale.
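
To see why the explicit levels matter, the factor can be inspected on its own, without any plotting. This is a small base R sketch; the variable names `x_obs`, `f_default` and `f_full` are made up for illustration.

```r
# Observed x values, as listed in the question
x_obs <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Default factor: only the observed values become levels
f_default <- factor(x_obs)

# Explicit levels: every even number from 0 to 26, in numeric order,
# including those with no observations
f_full <- factor(x_obs, levels = seq(0, 26, by = 2))

print(levels(f_default))
print(levels(f_full))

# Unused levels are kept with a count of zero; with drop = FALSE the
# discrete scale reserves a slot on the axis for each of them
print(table(f_full))
```

Since the level order is fixed when the factor is created, the empty positions land between their numeric neighbours instead of at the end of the axis.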
