Use position_jitterdodge without mapping aesthetic - r

I would like to produce a plot like the one obtained with the code below. However, I would like to dodge by "replicate" without actually mapping it to an aesthetic, because I would like to keep fill and colour free for other variables.
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
ggplot(data = dataset, aes(x = sample, y = value, fill = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F)
I had hoped that using group = replicate instead of fill = replicate would work, but it doesn't. I can imagine a workaround, for example using alpha = replicate as the aesthetic and setting scale_alpha_manual(values = c(1, 1)) so that both replicates stay fully opaque, but I don't find this solution ideal and would like to keep all aesthetics other than x and y available for further use.
ggplot(data = dataset, aes(x = sample, y = value, alpha = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F) +
scale_alpha_manual(values = c(1, 1))
The plot that I expect to get is:
I hope my question makes sense. Any hints?
Best,
Yvan

You could unite the sample and replicate columns and use that as the x-axis, injecting a 'Placeholder' value for spacing between samples.
library(tidyverse)
set.seed(20181101)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
dataset %>%
bind_rows({
#create a dummy placeholder to allow for spacing between samples
data.frame(sample = unique(dataset$sample),
replicate = rep("Placeholder", length(unique(dataset$sample))),
stringsAsFactors = FALSE)
}) %>%
#unite the sample & replicate columns, and use it as the new x-axis
unite(sample_replicate, sample, replicate, remove = FALSE) %>%
ggplot(aes(x = sample_replicate, y = value, color = replicate)) +
geom_jitter() +
#only have x-axis labels for each sample
scale_x_discrete(breaks = paste0("Sample", 1:length(unique(dataset$sample)), "_B"),
labels = paste0("Sample ", 1:length(unique(dataset$sample)))) +
labs(x = "Sample") +
#don't show the Placeholder value in the legend
scale_color_discrete(breaks = c("A", "B"))
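Another way to dodge without spending an aesthetic is to compute the dodged x positions by hand and only jitter in the plot. The following is a minimal sketch under stated assumptions, not a drop-in replacement for position_jitterdodge: the column x_pos, the offsets (a quarter of the dodge width on either side) and the jitter width are illustrative choices, and the replicates are assumed to be exactly "A" and "B".

```r
library(ggplot2)

set.seed(20181101)
dataset <- data.frame(
  sample = rep(c("Sample1", "Sample2", "Sample3", "Sample4"), each = 25),
  replicate = sample(c("A", "B"), size = 100, replace = TRUE),
  value = rnorm(n = 100, mean = 0, sd = 10)
)

# Assumption: two replicates, shifted symmetrically around each sample position
sample_levels <- sort(unique(dataset$sample))
dodge_width <- 0.75
offsets <- c(A = -1, B = 1) * dodge_width / 4

# Hypothetical helper column: numeric sample index plus the per-replicate offset
dataset$x_pos <- match(dataset$sample, sample_levels) +
  unname(offsets[dataset$replicate])

p <- ggplot(dataset, aes(x = x_pos, y = value)) +
  # jitter horizontally only, so the y values stay untouched
  geom_point(position = position_jitter(width = 0.075, height = 0, seed = 1)) +
  # relabel the numeric axis with the original sample names
  scale_x_continuous(breaks = seq_along(sample_levels), labels = sample_levels) +
  labs(x = "sample")
```

Because replicate is no longer mapped to anything, fill, colour, alpha and the other aesthetics remain free for other variables.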

Related

Changing the size of a point in a plot based on another value

I have a data frame that has the following columns:
library(dplyr)
library(ggplot2)
mig_tend <- rep(c("Resident","Migrant", "Unknown"), 100)
pred_observed <- runif(length(mig_tend), min = -0.04270965, max = 0.01783518)
weighted_ <- runif(length(mig_tend), min = 3.648399e-07, max = 0.002123505)
scaled_PDSI <- runif(length(mig_tend), min = -0.842694, max = 1.957527)
pdsi <- runif(length(4), min = -2, max = 2)
y <- runif(length(4), min = -0.00613618, max = -0.002790441)
df1 <- data.frame(mig_tend = as.factor(mig_tend),
pred_observed = pred_observed,
weighted = weighted_,
scaled = scaled_PDSI
)
I am trying to create a plot where the size of each point varies with the weighted value for that point. I would like a point to be smaller when its weighted value is larger, and larger when its weighted value is smaller.
reprex <- df1 %>%
# Plot against PDSI
ggplot(aes(x = scaled, col = as.factor(mig_tend))) +
# geom_hline(yintercept = 0, linetype = 2, col = "grey") +
# add in predictions
geom_point(aes(y = pred_observed, size = weighted)) +
scale_size_continuous(range = c(0.01, 0.2)) +
labs(col = "", fill = "", size = "Weights",
y = "Predicted_Observation",
x = "PDSI") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
legend.position = "right")
The simplest approach might be to use the negative value, and then tweak the name and labels to undo that:
geom_point(aes(y = pred_observed, size = -weighted)) +
scale_size_continuous(range = c(0.01, 1.2),
labels = ~scales::comma(x = -.x),
name = "weighted") +
Alternatively, you could use the reciprocal, though this might be too extreme for your data because it strongly emphasizes the smaller values. In the example below, this is emphasized even more by using scale_size_area, but the upside is that the size values map more directly to our perception: in this case it shows how much you'd have to "dilute" one unit to get to the underlying number.
geom_point(aes(y = pred_observed, size = 1/weighted)) +
scale_size_area(max_size = 10,
breaks = c(10000, 100000, 500000),
labels = ~paste0("1/", scales::comma(x = .x)),
name = "weighted") +
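For reference, the negative-value approach can be put together into a complete, runnable plot. This is a sketch under assumptions: the data are regenerated with the same ranges as in the question, the size range c(0.5, 3) is an arbitrary choice, and a plain base R labelling function stands in for scales::comma.

```r
library(ggplot2)

set.seed(1)
df1 <- data.frame(
  mig_tend = factor(rep(c("Resident", "Migrant", "Unknown"), 100)),
  pred_observed = runif(300, min = -0.04270965, max = 0.01783518),
  weighted = runif(300, min = 3.648399e-07, max = 0.002123505),
  scaled = runif(300, min = -0.842694, max = 1.957527)
)

p <- ggplot(df1, aes(x = scaled, y = pred_observed, colour = mig_tend)) +
  # map the negated weight so that larger weights give smaller points
  geom_point(aes(size = -weighted)) +
  scale_size_continuous(
    range = c(0.5, 3),
    # flip the sign back in the legend labels
    labels = function(x) format(-x, scientific = FALSE),
    name = "Weights"
  ) +
  labs(colour = "", x = "PDSI", y = "Predicted_Observation") +
  theme_bw()
```

The point with the smallest weight ends up with the largest size, while the legend still shows the original positive weights.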

Overlay two plots from different dataframes in R

I would like to overlay two ggplots from different data sources. I don't think a left_join will work, because the data frames have different lengths and joining them could change the underlying plots.
library(tidyverse)
set.seed(123)
player_df <- tibble(name = rep(c("A","B","C","D"), each = 10, times = 1),
pos = rep(c("DEF","DEF","MID","MID"), each = 10, times = 1),
load = c(rnorm(10, mean = 200, sd = 100),
rnorm(10, mean = 300, sd = 50),
rnorm(10, mean = 400, sd = 100),
rnorm(10, mean = 500, sd = 50)))
p1 <- player_df %>%
ggplot(aes(x = load, y = name)) +
geom_point()
pos_df <- tibble(pos = rep(c("DEF","MID"), each = 30, times = 1),
load = (c(rnorm(30, mean = 250, sd = 100),
rnorm(30, mean = 350, sd = 100))))
p2 <- pos_df %>%
ggplot(aes(x = load, y = pos)) +
geom_boxplot()
p1
p2
# add p2 to every p1 player plot by pos
I would like p1 to have the corresponding p2 - by pos - appear behind it. So... add the matching p2 boxplot to each p1 scatterplot.
It's not really advisable to attempt to superimpose two plots on each other. A ggplot is made of layers already, so usually it's just a case of superimposing one geom on another. This can be difficult if (as in your case) one of the axes has different labels. However, with a little work it is possible to wrangle your data so that it all sits on a single plot. In your case, you could do something like:
levs <- c("A", "DEF", "B", "C", "MID", "D")
ggplot(within(pos_df, pos <- factor(pos, levs)), aes(x = load, y = pos)) +
geom_boxplot(width = 2.3) +
geom_point(data = within(player_df, pos <- factor(name, levs))) +
scale_y_discrete(limits = c("A", "DEF", "B", " ", "C", "MID", "D"))
I dug into ggplot a bit and re-engineered a boxplot piece by piece.
# manually calculate stats that are used in boxplots
pos_df_summary <- pos_df %>%
group_by(pos, .drop = FALSE) %>%
summarise(min = fivenum(load)[1],
Q1 = fivenum(load)[2],
median = fivenum(load)[3],
Q3 = fivenum(load)[4],
max = fivenum(load)[5]
)
# add the boxplot data to each player
joined_df <- player_df %>%
left_join(., pos_df_summary, by = "pos") %>%
distinct(name, .keep_all = TRUE)
# plot
ggplot(data = NULL, aes(group = name)) +
# create the line from min to max
geom_segment(data = joined_df, aes(y = name, yend = name, x=min, xend=max), color="black") +
#create the box with median line
geom_crossbar(data = joined_df,
aes(y = name, xmin = Q1, xmax = Q3, x = median, fill = "NA"),
color = "black",
fatten = 1) +
scale_fill_manual(values = "white") +
# add the points from the player_df
geom_point(data = player_df,
aes(x = load, y = name, group=name),
color = "red",
show.legend=FALSE) +
theme(legend.position = "none")
There may be some extraneous code in here as I cobbled it from some other resources. Specifically, I'm not sure what the aes(group = name) in the ggplot() call does exactly.

Need to change the legend to decimal values

I am using this code to generate boxplots. I want the legend to be continuous, not discrete. Also, the boxplot colour needs to be different for each value of SOP (which is in increments of 0.5).
brk1 <- seq(from = 0.5, to = 5.5, by = .5)
ggplot(data = data, aes(x = SOP, y = Chance.of.Admit)) +
geom_boxplot(aes(fill = SOP, group = SOP)) +
scale_x_continuous(breaks = brk1)
In the current plot the fill is a continuous variable, so each SOP value already has a different color, though the differences are hard for the human eye to detect. If you want more distinct colors, you can try a different type in scale_fill_continuous, such as type = "viridis".
library(dplyr)
library(ggplot2)
# Create a random sample data
data <- tibble(
SOP = sample(seq(1, 5, by = 0.5), size = 1000, replace = TRUE),
Chance.of.Admit = runif(1000, min = 0, max = 1)
)
brk1 <- seq(from = 0.5, to = 5.5, by = .5)
# Using scale_fill_continuous with breaks option
ggplot(data = data, aes(x = SOP, y = Chance.of.Admit)) +
geom_boxplot(aes(fill = SOP, group = SOP)) +
scale_x_continuous(breaks = brk1) +
scale_fill_continuous(breaks = brk1,
type = "viridis")
I understand you don't want a discrete legend, though it may be worth a look:
# Create the color scale corresponding to the SOP values
color_scales_fn <- colorRampPalette(c("#173f5f", "#20639b", "#3caea3",
"#f6d55c", "#ed553b"))
sop_list <- sort(unique(data$SOP))
manual_color <- color_scales_fn(length(sop_list))
names(manual_color) <- sop_list
# Using scale_fill_manual with the new color palette
ggplot(data = data, aes(x = SOP, y = Chance.of.Admit)) +
geom_boxplot(aes(fill = as.character(SOP), group = SOP)) +
scale_x_continuous(breaks = brk1) +
scale_fill_manual(values = manual_color, name = "SOP")
Created on 2021-04-02 by the reprex package (v1.0.0)

How to: Two horizontal Barplots "on top of eachother"

I have a data frame with: Fail [3,3,3,1] and Pass [50,40,50,10]
I just want to make a barplot of Fail and Pass
b_f <- barplot(dat_record$Fail[1], horiz = TRUE, ylab = "FAIL", las = 2, col = "red", xlim = c(0,200))
b_p <- barplot(dat_record$Pass[2], horiz = TRUE, ylab = "PASS", las = 2, col = "green", xlim = c(0,200))
How can I put these two barplots on top of each other in one graphic/diagram, like this:
And a second question:
How can I do this properly with ggplot2? I tried it out, but I always failed with:
ggplot(dat_failpass, aes = (x = fail, fill = "red")+
geom_bar(position = "dodge")+
coord_flip()
Can someone answer these two questions or give me any tips? I'm new to this.
Thank you.
Since you want just the first value of each of the "Fail" and "Pass" vectors, this code chunk should plot what you want:
library(ggplot2)
fail = c(3, 3, 3, 1)
pass = c(50, 40, 50, 10)
df = data.frame(value = c(fail[1], pass[1]), label = c('Fail', 'Pass'))
ggplot(df, aes(x = label, y = value)) +
geom_bar(stat = 'identity', position = 'stack') +
coord_flip() +
labs(y = 'Count') +
theme(axis.title.y = element_blank())
Let us know if this solution solved your problem.
With your data in this format, here is the code for the plot:
library(tidyverse)
#Data
df <- structure(list(Fail = c(3, 3, 3, 1), Pass = c(50, 40, 50, 10)), class = "data.frame", row.names = c(NA,
-4L))
Code:
#Reshape and plot
df %>% pivot_longer(cols = everything()) %>%
#Plot
ggplot(aes(x=name,y=value))+
geom_bar(stat = 'identity',fill='gray')+
coord_flip()+
theme_bw()

ggplot color lines with consistent scale adding new color for each new line

I'm using ggplot to plot some data:
## sample data
dat = data.frame(group = rep(letters[1:5], 10),
idx = rep(1:length(letters[1:5]), each = 10))
dat$value = cumsum(cumsum(sample(c(-1, 1), nrow(dat), TRUE)))
ggplot(dat) +
geom_path(aes(x = idx, y = value, color = group, group = group)) +
viridis::scale_color_viridis(option = 'magma', discrete = T)
## add more groups
dat = data.frame(group = rep(letters[1:10], 10),
idx = rep(1:length(letters[1:10]), each = 10))
dat$value = cumsum(cumsum(sample(c(-1, 1), nrow(dat), TRUE)))
## replot
ggplot(dat) +
geom_path(aes(x = idx, y = value, color = group, group = group)) +
viridis::scale_color_viridis(option = 'magma', discrete = T)
My issue with this is that the minimum and maximum colors are the same in both plots, and the colors in between are adjusted to fit.
Is there any way to use this color scale (or a similar one) but always have the second color be the same, i.e. so that the first five colors are the same in both graphs?
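One possible direction, sketched here under a loud assumption: the full set of groups that may ever appear is known in advance (here letters[1:10]). In that case a named vector of colours can be built once and passed to scale_colour_manual, so each group keeps its colour no matter how many groups a given plot contains. The helper names make_data, plot_groups and group_colours are hypothetical and only serve the illustration.

```r
library(ggplot2)

# Assumption: every group that can occur is listed here up front
all_groups <- letters[1:10]

# One fixed magma colour per group, stored as a named vector
group_colours <- setNames(viridisLite::magma(length(all_groups)), all_groups)

# Hypothetical helper: sample data in the same shape as in the question
make_data <- function(groups) {
  dat <- data.frame(group = rep(groups, 10),
                    idx = rep(seq_along(groups), each = 10))
  dat$value <- cumsum(cumsum(sample(c(-1, 1), nrow(dat), TRUE)))
  dat
}

# Hypothetical helper: same plot as in the question, but with fixed colours
plot_groups <- function(dat) {
  ggplot(dat) +
    geom_path(aes(x = idx, y = value, colour = group, group = group)) +
    scale_colour_manual(values = group_colours)
}

set.seed(42)
p_five <- plot_groups(make_data(letters[1:5]))
p_ten <- plot_groups(make_data(letters[1:10]))
```

With this setup the five-group plot uses only the first five colours of the ten-colour palette, so it no longer spans the full magma range; that is the trade-off for keeping colours stable across plots.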
