I am trying to find a clean approach for combined scatter and line plots in ggplot2 with an appropriate legend. The following works in principle, but produces warnings:
library("ggplot2")
library("dplyr")
## 2 data sets, one for the lines, one for the points
tbl <- tibble(
f = rep(letters[1:2], each = 10),
x = rep(1:10, 2),
y = c(1e-4 * exp(1:10), log(1:10))
)
obs <- tibble(
f = rep("c", 5),
x = seq(2, 10, 2),
y = log(seq(2, 10, 2)) + rnorm(5, sd = 0.1)
)
rbind(tbl, obs) %>%
ggplot(aes(x, y, color = f, linetype = f)) +
geom_line(show.legend = TRUE) +
geom_point(show.legend = TRUE, aes(shape = f), size = 3) +
scale_linetype_manual(values=c("solid", "solid", "blank")) +
scale_shape_manual(values=c(NA, NA, 16))
but I would like to get rid of the warnings and write something like:
scale_shape_manual(values=c("none", "none", "circle"))
Is there already a "none" or "empty" shape code? Several answers have been suggested on SO in the past, but I wonder whether there is a current canonical way.
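As far as I know, ggplot2 has no named "none" shape, and NA is the usual way to draw no point. The warnings are ggplot2 reporting the rows of groups a and b that it drops because their shape is NA. A minimal sketch, assuming those dropped rows are the only warning source in your setup, keeps NA and passes na.rm = TRUE to geom_point(). The data mirrors the question, but uses data.frame() instead of tibble() and a fixed seed.

```r
library(ggplot2)

set.seed(1)
# Lines for groups a and b, noisy observations for group c (as in the question)
tbl <- data.frame(f = rep(letters[1:2], each = 10),
                  x = rep(1:10, 2),
                  y = c(1e-4 * exp(1:10), log(1:10)))
obs <- data.frame(f = "c",
                  x = seq(2, 10, 2),
                  y = log(seq(2, 10, 2)) + rnorm(5, sd = 0.1))

p <- ggplot(rbind(tbl, obs), aes(x, y, color = f, linetype = f)) +
  geom_line() +
  # shape NA means "no point"; na.rm = TRUE drops those rows silently
  geom_point(aes(shape = f), size = 3, na.rm = TRUE) +
  scale_linetype_manual(values = c("solid", "solid", "blank")) +
  scale_shape_manual(values = c(NA, NA, 16))
print(p)
```

The legend is unchanged: groups a and b still show a line and no point, group c shows a point and no line.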
I would like to make heatmaps using the following data:
dt <- data.frame(
h = rep(LETTERS[1:7], 7),
j = c(rep("A", 7), rep("B", 7), rep("C", 7), rep("D", 7), rep("E", 7), rep("F", 7), rep("G", 7)),
Red = runif(7, 0, 1),
Yellow = runif(7, 0, 1),
Green = runif(7, 0, 1),
Blue = runif(7, 0, 1),
Black = runif(7, 0, 1)
)
For each heatmap, the x and y axes are the first two columns of dt (h and j). The values that fill each heatmap come from one of the remaining columns, e.g., Red, Yellow, ...
I adapted an example to produce the following code:
loop = function(df, x_var, y_var, f_var) {
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
geom_tile(color = "black") +
scale_fill_gradient(low = "white", high = "blue") +
geom_text(aes(label = .data[[f_var]]), color = "black", size = 4) +
coord_fixed() +
theme_minimal() +
labs(x = "",
y = "",
fill = "R", # Want the legend title to be each of the column names that are looped
title = .data[[f_var]])
ggsave(a, file = paste0("heatmap_", f_var,".png"), device = png, width = 15, height = 15, units = "cm")
}
plot_list <- colnames(dt)[-1] %>%
map( ~ loop(df = dt,
x_var = colnames(dt)[1],
y_var = colnames(dt)[2],
f_var = .x))
# view all plots individually (not shown)
plot_list
Problems I encountered when running this code:
Error: Discrete value supplied to continuous scale
The ggsave step didn't work. I would like to save each plot under the name of the corresponding column.
There are a few minor issues with your code. You get the first error because you included the second column of your dataset, j, in the loop; it is categorical (i.e. discrete), so it cannot be mapped to the continuous fill scale. Second, title = .data[[f_var]] will not work. Simply use title = f_var to add the variable name as the title; fill = f_var does the same for the legend title. Finally, you are trying to save an object called a, which is not defined in your code. You have to assign your plot to a variable a, and I added return(a) so that the function returns the plot:
set.seed(123)
library(ggplot2)
library(purrr)
loop = function(df, x_var, y_var, f_var) {
a <- ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
geom_tile(color = "black") +
scale_fill_gradient(low = "white", high = "blue") +
geom_text(aes(label = .data[[f_var]]), color = "black", size = 4) +
coord_fixed() +
theme_minimal() +
labs(x = "",
y = "",
fill = f_var, # use the looped column name as the legend title
title = f_var)
ggsave(a, file = paste0("heatmap_", f_var,".png"), device = png, width = 15, height = 15, units = "cm")
return(a)
}
plot_list <- colnames(dt)[-c(1, 2)] %>%
map( ~ loop(df = dt,
x_var = colnames(dt)[1],
y_var = colnames(dt)[2],
f_var = .x))
# view all plots individually (not shown)
plot_list[c(1, 5)]
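A variant that keeps building the plots separate from saving them, and names the list by column so that plot_list$Red works, could look like the sketch below. make_heatmap, the rounded labels, and the tempdir() output folder are illustrative choices, not part of the answer above.

```r
library(ggplot2)
library(purrr)

set.seed(123)
# Same layout as dt in the question: h varies fastest, j in blocks of 7
dt <- data.frame(
  h = rep(LETTERS[1:7], 7),
  j = rep(LETTERS[1:7], each = 7),
  Red = runif(7, 0, 1),
  Yellow = runif(7, 0, 1),
  Green = runif(7, 0, 1),
  Blue = runif(7, 0, 1),
  Black = runif(7, 0, 1)
)

# Hypothetical helper: builds one heatmap and has no side effects
make_heatmap <- function(df, x_var, y_var, f_var) {
  ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]], fill = .data[[f_var]])) +
    geom_tile(color = "black") +
    scale_fill_gradient(low = "white", high = "blue") +
    # rounded labels keep the tiles readable
    geom_text(aes(label = round(.data[[f_var]], 2)), color = "black", size = 4) +
    coord_fixed() +
    theme_minimal() +
    labs(x = "", y = "", fill = f_var, title = f_var)
}

# Only the numeric columns go to the continuous fill scale
fill_cols <- setdiff(colnames(dt), c("h", "j"))
plot_list <- setNames(map(fill_cols, ~ make_heatmap(dt, "h", "j", .x)), fill_cols)

# Save each plot under its column name (.y is the list name inside iwalk)
out_dir <- tempdir()
iwalk(plot_list, ~ ggsave(file.path(out_dir, paste0("heatmap_", .y, ".png")),
                          plot = .x, width = 15, height = 15, units = "cm"))
```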
I would like to visualize Vargha & Delaney's A in ggplot for educational purposes.
A is an effect size for comparing ordinal data from two groups. It is based on comparing each data point with every data point of the other group, where each comparison goes upward, downward, or sideways (equal).
For this, I would like to be able to show all upward, downward, and equal comparisons of data points in different colors. For an example of what I'm looking for, check out this rough scribble
For reproducibility, here is some data to try it with:
library(tidyverse)
data_VD <- tibble(
A = c(1, 2, 3, 6),
B = c(1, 3, 7, 9)
)
For reference to how A is calculated, see https://journals.sagepub.com/doi/10.3102/10769986025002101, though it shouldn't be necessary for creating the plot.
You could do:
library(tidyverse)
long_dat <- data_VD %>%
{expand.grid(A = .$A, B = .$B)} %>%
mutate(change = factor(sign(B - A)))
ggplot(pivot_longer(data_VD, everything()), aes(x = name, y = value)) +
geom_segment(data = long_dat, linewidth = 1.5, # ggplot2 >= 3.4.0; use size = 1.5 in older versions
aes(x = 'A', xend = 'B', y = A, yend = B, color = change)) +
geom_point(size = 4) +
scale_color_manual(values = c('#ed1e26', '#fff205', '#26b24f')) +
theme_classic(base_size = 20) +
scale_y_continuous(breaks = 1:10) +
labs(x = '', y = '') +
theme(legend.position = 'none')
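The pairwise comparisons that colour the segments are also all you need for the statistic itself. A minimal base-R sketch, assuming the usual form A = P(B > A) + 0.5 * P(B = A), with group B taken as the "first" group (the direction is an arbitrary choice here); vd_a and comp_grid are hypothetical names, not from any package:

```r
data_VD <- data.frame(A = c(1, 2, 3, 6), B = c(1, 3, 7, 9))

# Every A-versus-B pair, like long_dat in the answer above
comp_grid <- expand.grid(A = data_VD$A, B = data_VD$B)
# -1 = downward, 0 = sideways (tie), 1 = upward
comparison <- sign(comp_grid$B - comp_grid$A)
print(table(comparison))

# Hypothetical helper: share of upward comparisons, ties counted as half
vd_a <- function(a, b) {
  cmp <- sign(outer(b, a, "-"))
  (sum(cmp > 0) + 0.5 * sum(cmp == 0)) / length(cmp)
}
print(vd_a(data_VD$A, data_VD$B))
```

For this data there are 10 upward, 2 sideways, and 4 downward comparisons, so A = (10 + 0.5 * 2) / 16 = 0.6875.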
I would like to produce a plot like the one obtained with the code below. However, I would like to dodge by "replicate" without actually mapping it to an aesthetic, because I want to use fill and colour for other variables.
library(tidyverse)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
ggplot(data = dataset, aes(x = sample, y = value, fill = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = FALSE)
I had hoped that using group = replicate instead of fill = replicate would work, but it doesn't. I can imagine a workaround, for example mapping alpha = replicate and setting scale_alpha_manual(values = c(1, 1)) so that both replicates look identical, but I don't find this solution ideal: I would like to keep all aesthetics other than x and y available for further use.
ggplot(data = dataset, aes(x = sample, y = value, alpha = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = FALSE) +
scale_alpha_manual(values = c(1, 1))
The plot that I expect to get is:
I hope my question makes sense. Any hints?
Best,
Yvan
You could unite the sample and replicate columns and use that as the x-axis, injecting a 'Placeholder' value for spacing between samples.
library(tidyverse)
set.seed(20181101)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
dataset %>%
bind_rows({
#create a dummy placeholder to allow for spacing between samples
data.frame(sample = unique(dataset$sample),
replicate = rep("Placeholder", length(unique(dataset$sample))),
stringsAsFactors = FALSE)
}) %>%
#unite the sample & replicate columns, and use it as the new x-axis
unite(sample_replicate, sample, replicate, remove = FALSE) %>%
ggplot(aes(x = sample_replicate, y = value, color = replicate)) +
geom_jitter() +
#only have x-axis labels for each sample
scale_x_discrete(breaks = paste0("Sample", seq_along(unique(dataset$sample)), "_B"),
labels = paste0("Sample ", seq_along(unique(dataset$sample)))) +
labs(x = "Sample") +
#don't show the Placeholder value in the legend
scale_color_discrete(breaks = c("A", "B"))
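If the goal is only to keep every aesthetic free, another workaround is to do the dodge by hand: compute a numeric x position per sample, shift each replicate by a quarter of the dodge width (where position_dodge() would centre two groups with width = 0.75), and then jitter only horizontally. This is a sketch of a different technique from position_jitterdodge(), and x_pos and dodge_width are illustrative names.

```r
library(ggplot2)

set.seed(20181101)
dataset <- data.frame(
  sample = rep(c("Sample1", "Sample2", "Sample3", "Sample4"), each = 25),
  replicate = sample(c("A", "B"), size = 100, replace = TRUE),
  value = rnorm(100, mean = 0, sd = 10)
)

dodge_width <- 0.75
sample_levels <- levels(factor(dataset$sample))
# One integer slot per sample; A sits left of centre, B right of centre
dataset$x_pos <- as.numeric(factor(dataset$sample)) +
  ifelse(dataset$replicate == "A", -dodge_width / 4, dodge_width / 4)

p <- ggplot(dataset, aes(x = x_pos, y = value)) +
  # jitter horizontally only, staying within each replicate's slot
  geom_point(position = position_jitter(width = 0.05, height = 0)) +
  scale_x_continuous(breaks = seq_along(sample_levels), labels = sample_levels) +
  labs(x = "sample")
print(p)
```

Because the dodge is now part of x_pos, colour, fill, shape, and alpha all remain available for other mappings.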
Sample data
set.seed(123)
par(mfrow = c(1,2))
dat <- data.frame(years = rep(1980:2014, each = 8), x = sample(1000:2000, 35*8, replace = TRUE))
boxplot(dat$x ~ dat$years, ylim = c(500, 4000))
I have another dataset that has a single value for some selected years:
ref.dat <- data.frame(years = c(1991:1995, 2001:2008), x = sample(1000:2000, 13, replace = TRUE))
plot(ref.dat$years, ref.dat$x, type = "b")
How can I add the line plot on top of the boxplot?
With ggplot2 you could do this:
library(ggplot2)
ggplot(dat, aes(x = years, y = x)) +
geom_boxplot(data = dat, aes(group = years)) +
geom_line(data = ref.dat, colour = "red") +
geom_point(data = ref.dat, colour = "red", shape = 1) +
coord_cartesian(ylim = c(500, 4000)) +
theme_bw()
The trick here is to work out where boxplot() places the boxes on the x-axis. You have 35 boxes, and they are drawn at the x-coordinates 1, 2, 3, ..., 35, i.e. at year - 1979. With that, you can add the line with lines() as usual.
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
x = sample(1000:2000, 35*8, replace = TRUE))
boxplot(dat$x ~ dat$years, ylim = c(500, 2500))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
x = sample(1000:2000, 13, replace = TRUE))
lines(ref.dat$years-1979, ref.dat$x, type = "b", pch=20)
The points were a bit hard to see, so I changed the point style to pch = 20. I also used a smaller y-axis range to leave less blank space.
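Hard-coding year - 1979 only works while every year from 1980 onward has its own box. A slightly more general sketch looks up each reference year among the box levels with match(), which also returns NA (and so leaves a gap in the line) for a year that has no box. ref_pos and box_levels are illustrative names.

```r
set.seed(123)
dat <- data.frame(years = rep(1980:2014, each = 8),
                  x = sample(1000:2000, 35 * 8, replace = TRUE))
ref.dat <- data.frame(years = c(1991:1995, 2001:2008),
                      x = sample(1000:2000, 13, replace = TRUE))

# boxplot() draws one box per factor level, at x = 1, 2, ..., nlevels
box_levels <- levels(factor(dat$years))
# Position of each reference year among the boxes, instead of a fixed offset
ref_pos <- match(as.character(ref.dat$years), box_levels)

boxplot(x ~ years, data = dat, ylim = c(500, 2500))
lines(ref_pos, ref.dat$x, type = "b", pch = 20, col = "red")
```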
I'm having a problem making the symbols in the legend of my plot match those in the plot itself.
Suppose the data has four columns, like this:
data = data.frame(x = sample(1:10, 10, replace=TRUE), y = sample(1:10, 10, replace=TRUE),
Rank = sample(1:10, 10, replace = TRUE), Quantified = factor(sample(1:2, 10, replace = TRUE))
)
I would like points to be different sizes (distinguished by 'Rank') and represented by different symbols (crosses and open circles, distinguished by 'Quantified').
My code is:
ggplot(data, aes(x = x, y = y)) +
geom_point(aes(size = Rank, shape = Quantified)) +
scale_shape_manual("Quantified", labels = c("Yes", "No"), values = c(1, 4)
)
The symbols in the plot are as I want them.
My problem is that I would like the circles in the top (Rank) legend to be unfilled, as they are in the plot.
I've tried a variety of commands in different parts of the code (e.g., fill = "white") but nothing seems to work quite right.
Any suggestions?
Now that I'm sure it's what you want:
library(ggplot2)
ggplot(data, aes(x = x, y = y)) +
geom_point(aes(size = Rank, shape = Quantified)) +
scale_shape_manual("Quantified", labels = c("Yes", "No"), values = c(1, 4)) +
guides(size = guide_legend(override.aes = list(shape = 1)))