Connecting points from two datasets with lines in ggplot2 in R

I want to connect datapoints from two datasets with a vertical line. The points that should be connected vertically have the same identifier (V), but I was hoping to keep the datasets separate.
Here is my figure so far:
d1 <- data.frame (V = c("A", "B", "C", "D", "E", "F", "G", "H"),
O = c(9,2.5,7,8,7,6,7,7.5),
S = c(6,5,3,5,3,4,5,6))
d2 <- data.frame (V = c("A", "B", "C", "D"),
O = c(10,3,7.5,8.2),
S = c(6,5,3,5))
scaleFUN <- function(x) sprintf("%.0f", x)
p<-ggplot(data=d1, aes(x=S, y=O), group=factor(V), shape=V) +
geom_point(size = 5, aes(fill=V),pch=21, alpha = 0.35)+
theme_bw()+
geom_point(data = d2, size=5, aes(fill=V), pch=22,colour="black")+
theme(legend.title=element_blank())+
xlab(expression(italic("S"))) + theme(text = element_text(size=25))+
ylab(expression(italic("O")))+ theme(text = element_text(size=25))+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
theme(axis.text.y=element_text(angle=90, hjust=1))+
theme(legend.position="none") # remove legend
print(p)
So the final figure would look something like this:
Can I do this with geom_line() without combining datasets (so the other formatting can be separate for each dataset)?

As bouncyball pointed out, you can use a separate data set (merged from d1 and d2) with geom_segment.
See the following:
library(ggplot2)
ggplot(data = d1, aes(x = S, y = O)) +
  geom_point(size = 5, aes(fill = V), pch = 21, alpha = 0.35) +
  geom_point(data = d2, size = 5, aes(fill = V), pch = 22, colour = "black") +
  # merge() keeps only identifiers present in both data sets;
  # shared column names get the suffixes .x (d1) and .y (d2)
  geom_segment(data = merge(d1, d2, by = 'V'),
               aes(x = S.x, xend = S.y, y = O.x, yend = O.y)) +
  guides(fill = "none")  # hide the fill legend
Which yields:
You can add your themes also.
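To see what geom_segment receives, it may help to inspect the merged data frame on its own. The sketch below uses only base R and the d1 and d2 from the question; the name seg is introduced here purely for illustration.

```r
# d1 and d2 as defined in the question
d1 <- data.frame(V = c("A", "B", "C", "D", "E", "F", "G", "H"),
                 O = c(9, 2.5, 7, 8, 7, 6, 7, 7.5),
                 S = c(6, 5, 3, 5, 3, 4, 5, 6))
d2 <- data.frame(V = c("A", "B", "C", "D"),
                 O = c(10, 3, 7.5, 8.2),
                 S = c(6, 5, 3, 5))

# Inner join on the identifier: only V values present in both data sets remain.
# Non-key columns that share a name get the suffixes .x (from d1) and .y (from d2).
seg <- merge(d1, d2, by = "V")

print(names(seg))
print(seg)
```

In this data S.x and S.y are equal for every shared identifier, so each segment comes out vertical. d1 and d2 themselves are not modified, so the two point layers can still be styled separately.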

Related

Alluvial plot with 2 different sources but a converging/shared variable [R]

I have experience with making alluvial plots using the ggalluvial package. However, I have run into an issue where I am trying to create an alluvial plot with two different sources that converge onto one variable.
Here is example data:
library(dplyr)
library(ggplot2)
library(ggalluvial)
data <- data.frame(
unique_alluvium_entires = seq(1:10),
label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)
here is the code I use to make the plot
#prep the data
data <- data %>%
group_by(shared_label) %>%
mutate(freq = n())
data <- reshape2::melt(data, id.vars = c("unique_alluvium_entires", "freq"))
data$variable <- factor(data$variable, levels = c("label_1", "shared_label", "label_2"))
#ggplot
ggplot(data,
aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
y = freq, fill = value, label = value)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow() +
geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
geom_text(stat = "stratum", size = 4) +
theme_void() +
theme(
axis.text.x = element_text(size = 12, face = "bold")
)
(apparently I cannot embed images yet)
As you can see, I can remove the NA values, but the shared_label does not properly "stack". Each unique row should stack on top of each other in the shared_label column. This would also fix the sizing issue so that they are equal size along the y axis.
Any ideas how to fix this? I have tried ggsankey but the same issue arises and I cannot remove NA values. Any tips are greatly appreciated!
This plot is the expected result of the "flow" statistical transformation, which is the default for the "flow" graphical object. (That is, geom_flow() = geom_flow(stat = "flow").) It looks like what you want is to specify the "alluvium" statistical transformation instead. Below I've used all your code but only copied and edited the ggplot() call.
#ggplot
ggplot(data,
aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
y = freq, fill = value, label = value)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow(stat = "alluvium") + # <-- specify alternate stat
geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
geom_text(stat = "stratum", size = 4) +
theme_void() +
theme(
axis.text.x = element_text(size = 12, face = "bold")
)
#> Warning: Removed 2 rows containing missing values (geom_text).
Created on 2021-12-10 by the reprex package (v2.0.1)
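As background for the stacking problem, it may help to look at how the example data is distributed across the three axes. The following is a small base R sketch, not part of the original answer; the names axes, non_missing and freq_by_label are introduced here for illustration only.

```r
# Example data from the question
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)

# Number of alluvia that actually have a value on each axis
axes <- c("label_1", "shared_label", "label_2")
non_missing <- sapply(data[axes], function(col) sum(!is.na(col)))
print(non_missing)

# Rows per shared_label; this is what the group_by()/mutate(freq = n()) step counts
freq_by_label <- table(data$shared_label)
print(freq_by_label)
```

Every alluvium has a value on the shared axis, but only half of them have one on each outer axis, which is why the two sources meet only in the middle column.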

changing legend of faceted boxplot in ggplot2 to have groups with similar names inside

This question builds on an earlier question, but is in the context of faceted boxplots.
So, I have the following code:
set.seed(20210714)
dd <- data.frame(Method = rep(c("A", "B", "C"), each = 60), Pattern = rep(c("X", "Y", "Z"), times = 30), X1 = runif(180), Complexity = rep(c("High", "Low"), times = 90), nsim = rep(rep(1:10, times = 9), each = 2), n = 10)
dd1 <- data.frame(Method = rep(c("A", "B", "C"), each = 60), Pattern = rep(c("X", "Y", "Z"), times = 30), X1 = runif(180), Complexity = rep(c("High", "Low"), times = 90), nsim = rep(rep(1:10, times = 9), each = 2), n = 5)
dd <- rbind(dd, dd1)
library(ggplot2)
# create dummy dataframe.
dummy.df <- dd
dummy.df[nrow(dd) + 1:2,"Pattern"] <- unique(dd$Pattern)[-3]
dummy.df[nrow(dd) + 1:2,"Method"] <- "ZZZ"
dummy.df[nrow(dd) + 1:2,"Complexity"] <- c("High","Low")
dummy.df$dummy <- interaction(dummy.df$Method,dummy.df$Pattern)
ggplot(dummy.df, aes(x = dummy, y = X1, fill = Method)) +
geom_boxplot(aes(fill = Method)) +
facet_grid(~Complexity) +
theme_light() +
theme(legend.position = 'bottom') +
guides(fill = guide_legend(nrow=1)) +
geom_line(aes(x = dummy,
group=interaction(Pattern,nsim)),
size = 0.35, alpha = 0.35, colour = I("#525252")) +
geom_point(aes(x = dummy,
group=interaction(Pattern,nsim)),
size = 0.35, alpha = 0.25, colour = I("#525252")) +
scale_x_discrete(labels = c("","X", "", "", "", "Y", "", "", "", "Z","","")) +
xlab("Pattern") +
scale_fill_brewer(breaks=c("A", "B", "C"), type="qual", palette="Paired")
dummy.df <- dd
dummy.df[nrow(dd) + 1:2,"Pattern"] <- unique(dd$Pattern)[-3]
dummy.df[nrow(dd) + 1:2,"Method"] <- "ZZZ"
dummy.df[nrow(dd) + 1:2,"Complexity"] <- c("High","Low")
dummy.df$dummy <- interaction(dummy.df$Method,dummy.df$Pattern)
dummy.df$fill <- interaction(dummy.df$Method, dummy.df$n)
dummy.df$dummy <- interaction(dummy.df$fill, dummy.df$Pattern)
dummy.df$dummy <- factor(dummy.df$dummy, levels = levels(dummy.df$dummy)[-c(4, 12, 20, 24)])
dummy.df$dummy[361:362] <- "A.10.Z" ## dummy variables to get rid of NAs
theme_set(theme_bw(base_size = 14))
ggplot(dummy.df, aes(x = dummy, y = X1, fill = fill)) +
geom_boxplot(aes(fill = fill),lwd=0.1,outlier.size = 0.01) +
facet_grid(~Complexity) +
theme(legend.position = 'bottom') +
guides(fill = guide_legend(nrow=1)) +
geom_line(aes(x = dummy,
group=interaction(Pattern,nsim,n)),
size = 0.35, alpha = 0.35, colour = I("#525252")) +
geom_point(aes(x = dummy,
group=interaction(Pattern,nsim,n)),
size = 0.35, alpha = 0.25, colour = I("#525252")) +
scale_x_discrete(labels = c("X", "Y", "Z"), breaks = paste("A.10.", c("X", "Y", "Z"), sep = ""),drop=FALSE) +
xlab("Pattern") +
scale_fill_brewer(breaks= levels(dummy.df$fill)[-c(4,8)], type="qual", palette="Paired")
This yields the following plot.
All is well, except for the legend. I would like the dark colors to be in a first group titled "n=5" on the left, with "A", "B", "C" for the three dark colors, and the light colors in a second group titled "n=10" on the right, with "A", "B", "C" for the three light colors, similar to the solution in the earlier question mentioned above.
What I cannot figure out is how to call the boxplot twice to mimic the solution there.
Is there a way to do this? Please feel free to let me know if the question is not clear.
Thanks again, in advance, for any help!
Adapting my answer on your former question this could be achieved like so:
library(ggplot2)
fill <- levels(dummy.df$fill)[-c(4,8)]
fill <- sort(fill)
labels <- gsub("\\.\\d+", "", fill)
labels <- setNames(labels, fill)
colors <- scales::brewer_pal(type="qual", palette="Paired")(6)
colors <- setNames(colors, fill)
library(ggnewscale)
ggplot(dummy.df, aes(x = dummy, y = X1, fill = fill)) +
geom_boxplot(aes(fill = fill), lwd=0.1,outlier.size = 0.01) +
scale_fill_manual(name = "n = 5", breaks= fill[grepl("5$", fill)], labels = labels[grepl("5$", fill)], values = colors,
guide = guide_legend(title.position = "left", order = 1)) +
new_scale_fill() +
geom_boxplot(aes(fill = fill), lwd=0.1,outlier.size = 0.01) +
scale_fill_manual(name = "n = 10", breaks = fill[grepl("10$", fill)], labels = labels[grepl("10$", fill)], values = colors,
guide = guide_legend(title.position = "left", order = 2)) +
facet_grid(~Complexity) +
theme(legend.position = 'bottom') +
guides(fill = guide_legend(nrow=1)) +
geom_line(aes(x = dummy,
group=interaction(Pattern,nsim,n)),
size = 0.35, alpha = 0.35, colour = I("#525252")) +
geom_point(aes(x = dummy,
group=interaction(Pattern,nsim,n)),
size = 0.35, alpha = 0.25, colour = I("#525252")) +
scale_x_discrete(labels = c("X", "Y", "Z"), breaks = paste("A.10.", c("X", "Y", "Z"), sep = ""),drop=FALSE) +
xlab("Pattern")
#> Warning: Removed 2 rows containing non-finite values (new_stat_boxplot).
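The part of this answer that splits one fill variable into two legends is the construction of the breaks and labels vectors. The base R sketch below reproduces just that step on a hard-coded vector of the six fill levels (an assumption standing in for levels(dummy.df$fill)[-c(4,8)]); breaks_5 and breaks_10 are names introduced here for illustration.

```r
# The six Method.n combinations that remain after dropping the dummy "ZZZ" levels
fill <- sort(c("A.5", "B.5", "C.5", "A.10", "B.10", "C.10"))

# Strip the ".5" / ".10" suffix so each legend shows only the method name,
# and keep the full level as the vector name for lookup
labels <- setNames(gsub("\\.\\d+", "", fill), fill)

# Each scale_fill_manual() call gets only the breaks of its own group
breaks_5  <- fill[grepl("5$", fill)]
breaks_10 <- fill[grepl("10$", fill)]

print(labels[breaks_5])
print(labels[breaks_10])
```

Both scales receive the full set of colors through values, but because each one lists only its own breaks, each legend shows just three keys.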

ggplot custom legend instead of default

I've searched and tried a bunch of suggestions from related questions to display a custom legend instead of the default one in a grouped scatter ggplot.
For instance, let's say I have a df like this one:
df = data.frame(id = c("A", "A", "B", "C", "C", "C"),
value = c(1,2,1,2,3,4),
ref = c(1.5, 1.5, 1, 2,2,2),
min = c(0.5, 0.5, 1,2,2,2))
and I want to display the values of each id as round dots, but also put the reference values and minimum values for each id as a differently shaped dot, as follows:
p = ggplot(data = df) +
geom_point(aes(x = id, y = value, color = factor(id)), shape = 19, size = 6) +
geom_point(aes(x = id, y = ref, color = factor(id)), shape = 0, size = 8) +
geom_point(aes(x = id, y = min, color = factor(id)), shape = 2, size = 8) +
xlab("") +
ylab("Value")
#print(p)
Now all is fine, but my legend doesn't add anything to the interpretation of the plot, as the X axis and colors are enough to understand it. I know I can remove the legend via theme(legend.position = "none").
Instead, I would like to have a legend of what the actual shapes of each dot represent (e.g., filled round dot = value, triangle = min, square = ref).
Among other things, I tried manually setting the scale values via scale_fill_manual, and something along these lines:
override.shape = shapes$shape
override.linetype = shapes$pch
guides(colour = guide_legend(override.aes = list(shape = override.shape, linetype = override.linetype)))...
....
I've also tried making a secondary plot, but not display it, using something suggested in one of the links pasted above:
shapes = data.frame(shape = c("value", "reference", "minimum"), pch = c(19,0,2), col = c("gray", "gray", "gray"))
p2 = ggplot(shapes, aes(shape, pch)) + geom_point()
#print(p2)
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
legend <- g_legend(p2)
library(gridExtra)
pp <- arrangeGrob(p1 ,legend,
widths=c(5/4, 1/4),
ncol = 2)
but then I get the error:
> legend <- g_legend(p2)
Error in tmp$grobs[[leg]] :
attempt to select less than one element in get1index
for which I did not find a working solution.. so yeah.. any suggestion on how I could only show a legend related to the different dot shapes would be welcome.
Thank you
You can manually build a shape legend using scale_shape_manual:
library(ggplot2)
ggplot(data = df) +
geom_point(aes(x = id, y = value, color = factor(id), shape = 'value'), size = 6) +
geom_point(aes(x = id, y = ref, color = factor(id), shape = 'ref'), size = 8) +
geom_point(aes(x = id, y = min, color = factor(id), shape = 'min'), size = 8) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
xlab("") +
ylab("Value")
Created on 2020-04-15 by the reprex package (v0.3.0)
But a better way to do this would be to reshape the df to a long format, and map each aes to a variable:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-id) %>%
ggplot() +
geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) +
xlab("") +
ylab("Value")
Created on 2020-04-15 by the reprex package (v0.3.0)
To remove the legend for the color use guide_none():
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-id) %>%
ggplot() +
geom_point(aes(x = id, y = value, color = factor(id), shape = name, size = name)) +
scale_shape_manual(values = c('value' = 19, 'ref' = 0, 'min' = 2)) +
scale_size_manual(values = c('value' = 6, 'ref' = 8, 'min' = 8)) +
guides(color = guide_none()) +
xlab("") +
ylab("Value")
Created on 2020-04-16 by the reprex package (v0.3.0)
Data:
df = data.frame(id = c("A", "A", "B", "C", "C", "C"),
value = c(1,2,1,2,3,4),
ref = c(1.5, 1.5, 1, 2,2,2),
min = c(0.5, 0.5, 1,2,2,2))
You can tidy your data first using tidyr, and then map the aes shape to the new variable
library(tidyr)
df2 <- pivot_longer(df, -id)
ggplot(data = df2) +
geom_point(aes(x = id, y = value, shape = name), size = 6) +
xlab("") +
ylab("Value")
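Both answers rely on reshaping df to long format so that a single column says which kind of value each point is. For readers without tidyr, here is a rough base R sketch of the same idea using stack() instead of pivot_longer(); the row order differs from what pivot_longer() returns, and the name long is introduced here for illustration.

```r
df <- data.frame(id = c("A", "A", "B", "C", "C", "C"),
                 value = c(1, 2, 1, 2, 3, 4),
                 ref = c(1.5, 1.5, 1, 2, 2, 2),
                 min = c(0.5, 0.5, 1, 2, 2, 2))

measure_cols <- c("value", "ref", "min")

# stack() puts the three columns underneath each other and records the
# source column in a factor called "ind"
stacked <- stack(df[measure_cols])

long <- data.frame(id = rep(df$id, times = length(measure_cols)),
                   name = stacked$ind,
                   value = stacked$values)

print(head(long))
print(levels(long$name))
```

Mapping shape = name on this long data frame gives one legend key per kind of value, which is what the shape legend in the answers is built from.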

legend does not follow ordered bars in R

My plot has two problems:
(1) the group bars are not ordered as I want them to be: I would like them to appear in the order entered, and (2) in the legend the order appears as V, E, B, whereas within the groups it appears as B, E, V. I can reverse the legend; however, what I would really like is to change the order of the bars within each group to V, E, B.
library(ggplot2)
df2 <- data.frame(supp = rep(c("V","E","B"), each=5),
s = rep(c("C3","C1", "C2","C5","C6"), 3),
len = c(1,2,3,4,5,6,8,4,4,3,9,7,6,8,5))
p <- ggplot(data = df2, aes(x = s, y = len, fill = supp)) +
geom_bar(stat = "identity", color = "black", position = position_dodge())
p + scale_fill_brewer(palette = "Blues", guide = guide_legend(reverse = TRUE)) +
scale_x_discrete(limits = rev(levels(df2$s)))
You need to change df2$supp from character to factor and specify the levels as you want them to appear.
See modified code below. Also, check out this link for even more detail about how to control the colour of your variables so they are consistent.
library(ggplot2)
df2 <- data.frame(supp = rep(c("V","E","B"), each=5),
s = rep(c("C3","C1", "C2","C5","C6"), 3),
len = c(1,2,3,4,5,6,8,4,4,3,9,7,6,8,5))
df2$supp <- factor(df2$supp,
levels = c("V", "E", "B"))
p <- ggplot(data = df2, aes(x = s, y = len, fill = supp)) +
geom_bar(stat="identity", color="black", position=position_dodge())
p + scale_fill_brewer(palette="Blues", guide = guide_legend(reverse=TRUE)) +
scale_x_discrete(limits = rev(levels(df2$s)))
Data
df2 <- data.frame(supp = rep(c("V", "E", "B"), each = 5),
s = rep(c("C3", "C1", "C2", "C5", "C6"), 3),
len = c(1, 2, 3, 4, 5, 6, 8, 4, 4, 3, 9, 7, 6, 8, 5))
Adjustment
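In R versions before 4.0.0, data.frame() converts strings to factors by default (stringsAsFactors = TRUE), so you need to set the variable types you want explicitly. From R 4.0.0 on, strings stay character, and only the factor() call below is strictly needed.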
df2$s <- as.character(df2$s)
df2$supp <- factor(df2$supp, levels = c("V", "E", "B"))
Plot
ggplot(data = df2, aes(x = s, y = len, fill = supp)) +
geom_bar(stat = "identity", color = "black", position = position_dodge()) +
scale_fill_brewer(palette = "Blues", direction = -1)
Here you don't need to use additional guide_legend() and scale_x_discrete() to change order. It will be more concise.
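The ordering in both answers comes down to factor levels, which ggplot2 uses for discrete scales. A short base R sketch of the behaviour, using the vectors from the question (the variable names below are introduced for illustration):

```r
supp <- rep(c("V", "E", "B"), each = 5)
s <- rep(c("C3", "C1", "C2", "C5", "C6"), 3)

# Default: levels are sorted, so the bars and legend follow B, E, V
default_levels <- levels(factor(supp))

# Explicit levels give the order V, E, B
explicit_levels <- levels(factor(supp, levels = c("V", "E", "B")))

# unique() keeps the order of first appearance, which is one way to get
# the x axis "in the order entered"
entered_levels <- levels(factor(s, levels = unique(s)))

print(default_levels)
print(explicit_levels)
print(entered_levels)
```

Applying the last line to df2$s before plotting (for example df2$s <- factor(df2$s, levels = unique(df2$s))) should address the first part of the question, which the answers above do not cover directly.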

geom_point with different legend for fill and shape

Hmmm, maybe it's the temperature, or once again I do not see the obvious ...
Here is my code:
library(ggplot2)
p <- ggplot()
p <- p + geom_point(aes(x = 1, y=1,bg = "I", group = "B"),pch = 21, size = 20, color=NA)
p <- p + geom_point(aes(x = 1, y=1.125,bg = "I", group = "B" ),pch = 22, size = 20, color=NA)
p <- p + geom_point(aes(x = 0.75, y=1.125,bg = "II", group = "A" ),pch = 22, size = 20, color=NA)
p <- p + geom_point(aes(x = 0.85, y=1.125,bg = "III", group = "A" ),pch = 22, size = 20, color=NA)
p <- p + scale_fill_manual(values= c("darkred", "darkblue", "darkgreen"), guide=guide_legend(override.aes = list(shape = 23)))
#p <- p + scale_fill_manual(values= c("darkred", "darkblue", "darkgreen"), guide=guide_legend(inherit.aes = FALSE))
p <- p + scale_shape_manual(labels = c("circle", "rectangle"),values = c(21, 22))
p
What I'm trying to achieve are basically two legends, one that reflects the color, in this example there are three different colors ("I", "II", and "III"), and two different types of shapes "rectangle" and "circle", there will never be more than these two different shapes.
Unfortunately there are some additional constraints ... I can't use the aesthetic color due to the fact that I'm also using geom_segment to somehow connect the shapes, and that is the second constraint, I have to use ggplot2.
But I'm not able to produce these two legends; any help is appreciated ...
Why don't you store all your points in a data frame? It suits perfectly:
df <- data.frame(x = c(1, 1, 0.75, 0.85),
y = c(1, 1.125, 1.125, 1.125),
nr = c("I", "I", "II", "III"),
sh = c("B", "B", "A", "A"))
And now you can easily see the required mapping:
ggplot(df, aes(x, y, fill = nr, shape = sh)) +
geom_point(size = 20, color = NA) +
scale_shape_manual(labels = c("circle", "rectangle"), values = c(21, 22),
guide = guide_legend(override.aes = list(colour = 1))) +
scale_fill_manual(values = c("darkred", "darkblue", "darkgreen"),
guide = guide_legend(override.aes = list(shape = 23)))
