Colouring nodes using ggraph and tidygraph in R?

I recently asked this question about how to colour nodes by variable. And the code works great. However, I'm back trying to colour the terminal nodes separately. For example, if I create some data, then turn them into tidygraph objects and plot them using ggraph then I get something like this:
library(tidygraph)
library(ggraph)
library(gridExtra)
pal = colorspace::sequential_hcl(palette = "Purples 3", n = 100)
# create some data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
label = c("x4", 5, 2),
value = c(10, 5, 2))
nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
label = c("x4", "x2", 2, 1, "x1", 13, 7),
value = c(10, 8, 2, 1, 10, 13, 7))
edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
to = c(2, 3, 4, 5, 6, 7))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges1)
# put into list
myList <- list(tg, tg_1)
# set colours for variables
nodenames <- unique(na.omit(unlist(lapply(myList, .%>%activate(nodes) %>% pull(name) ))))
nodecolors <- setNames(scales::hue_pal(c(0,360)+15, 100, 64, 0, 1)(length(nodenames)), nodenames)
nodecolors
# plot function
plotFun <- function(List, colors=NULL){
plot <- ggraph(List, "partition") +
geom_node_tile(aes(fill = name), size = 0.25) +
geom_node_label(aes(label = label, color = name)) +
scale_y_reverse() +
theme_void() +
theme(legend.position = "none")
if (!is.null(colors)) {
plot <- plot + scale_fill_manual(values=colors) +
scale_fill_manual(values=colors, na.value= 'grey40')
}
plot
}
# create grid of plots
allPlots <- lapply(myList, plotFun, colors=nodecolors)
n <- length(allPlots)
nRow <- floor(sqrt(n))
do.call("grid.arrange", c(allPlots, nrow = nRow))
As you can see, the named nodes are all coloured correctly, but the terminal nodes are coloured grey. I am trying to colour the terminal nodes by the corresponding value in the value column of the data. I have tried altering the scale_fill_manual call, but I can't seem to get it to work.
Any suggestions as to how I could do this?

If I understand correctly, you want to apply a different colour mapping to
the terminal nodes, mapping value to colour rather than name, and using
a different colour scale altogether. ggplot2 doesn’t support that directly,
but you can use e.g. ggnewscale to apply a different scale for the rest
of the plot.
I simplified your example a bit to focus on the new scale application:
library(tidygraph)
library(ggraph)
nodes <- data.frame(
name = c("x4", "x2", NA, NA, "x1", NA, NA),
label = c("x4", "x2", 2, 1, "x1", 13, 7),
value = c(10, 8, 2, 1, 10, 13, 7)
)
edges <- data.frame(
from = c(1, 2, 2, 1, 5, 5),
to = c(2, 3, 4, 5, 6, 7)
)
tg <- tbl_graph(nodes = nodes, edges = edges)
ggraph(tg, "partition") +
geom_node_tile(aes(fill = name)) +
geom_node_label(aes(label = label, color = name)) +
# Apply different colour/fill scales to terminal nodes
ggnewscale::new_scale_fill() +
ggnewscale::new_scale_color() +
geom_node_tile(
data = . %>% filter(is.na(name)),
aes(fill = value)
) +
geom_node_label(
data = . %>% filter(is.na(name)),
aes(label = label, color = value)
)
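To apply the same idea to the list of graphs from the question, the two-scale approach can be wrapped in a plotting function. The following is only a sketch under some assumptions: the function name plotFun2 and its arguments are made up for illustration, the named colours are copied from the nodecolors output of the earlier question, and the terminal-node gradient reuses the question's "Purples 3" palette with limits shared across plots so that equal values get equal colours. Note that the manual fill scale has to be added before ggnewscale::new_scale_fill(), otherwise it would apply to the second fill layer instead.

```r
library(tidygraph)
library(ggraph)

# Example data from the question
nodes <- data.frame(name = c("x4", NA, NA),
                    label = c("x4", 5, 2),
                    value = c(10, 5, 2))
nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
                     label = c("x4", "x2", 2, 1, "x1", 13, 7),
                     value = c(10, 8, 2, 1, 10, 13, 7))
edges <- data.frame(from = c(1, 1), to = c(2, 3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                     to = c(2, 3, 4, 5, 6, 7))
myList <- list(tbl_graph(nodes = nodes, edges = edges),
               tbl_graph(nodes = nodes1, edges = edges1))

# Assumption: fixed colours for named nodes, gradient palette for terminal nodes
nodecolors <- c(x4 = "#F5736A", x2 = "#00B734", x1 = "#5E99FF")
pal <- colorspace::sequential_hcl(palette = "Purples 3", n = 100)

# Shared limits for the gradient, taken from the terminal nodes of all graphs
terminal_values <- unlist(lapply(myList, function(g) {
  vals <- g %>% activate(nodes) %>% pull(value)
  nms <- g %>% activate(nodes) %>% pull(name)
  vals[is.na(nms)]
}))
value_limits <- range(terminal_values)

# Hypothetical helper: discrete fill for named nodes, gradient for terminal nodes
plotFun2 <- function(graph, colors, pal, limits) {
  ggraph(graph, "partition") +
    geom_node_tile(aes(fill = name), size = 0.25) +
    scale_fill_manual(values = colors, na.value = "grey40") +
    # everything added after this line uses a fresh fill scale
    ggnewscale::new_scale_fill() +
    geom_node_tile(data = . %>% filter(is.na(name)),
                   aes(fill = value), size = 0.25) +
    scale_fill_gradientn(colours = pal, limits = limits) +
    geom_node_label(aes(label = label)) +
    scale_y_reverse() +
    theme_void() +
    theme(legend.position = "none")
}

allPlots <- lapply(myList, plotFun2, colors = nodecolors,
                   pal = pal, limits = value_limits)
```

The resulting list can be passed to grid.arrange exactly as in the question.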

Related

automatically adding brackets or braces to ggplot and plotly objects

I want to automate adding brackets/braces to a ggplot object and then convert it to plotly using ggplotly.
library(tidyverse)
library(plotly)
#devtools::install_github("NicolasH2/ggbrace")
library(ggbrace)
set.seed(10)
mydata <- data.frame(xx = c(1:10), yy = sample(0:10, 10))
my_bracket_data <- data.frame(start = c(1, 4, 6),
end = c(3, 5, 9),
info = c("first", "second", "third"),
y_bottom = rep(11, 3),
y_top = rep(12, 3))
I can do this by using geom_brace but it involves manually typing it out for each brace rather than using the data that is already stored in a data.frame:
plot_geom_brace <- ggplot(data = mydata, aes(xx, yy)) +
geom_line(size = 1.5) +
geom_brace(aes(x = c(1, 3), y = c(11, 12), label = "first"), inherit.data = F, labelsize = 5) +
geom_brace(aes(x = c(4, 5), y = c(11, 12), label = "second"), inherit.data = F, labelsize = 5) +
geom_brace(aes(x = c(6, 9), y = c(11, 12), label = "third"), inherit.data = F, labelsize = 5)
plot_geom_brace
ggplotly(plot_geom_brace)
Is there a way that I can do this without repeatedly writing geom_brace layers for each brace (and instead access the data from my_bracket_data directly)?
As an aside this can be automated using geom_bracket but this is not supported by plotly yet.
library(ggpubr)
plot_geom_bracket <- ggplot(data = mydata, aes(xx, yy)) +
geom_line(size = 1.5) +
geom_bracket(xmin = my_bracket_data$start,
xmax = my_bracket_data$end,
y.position = rep(11, 3),
label = my_bracket_data$info,
inherit.aes = FALSE)
plot_geom_bracket
ggplotly(plot_geom_bracket)
# Warning message:
# In geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
# geom_GeomBracket() has yet to be implemented in plotly.
# If you'd like to see this geom implemented,
# Please open an issue with your example code at
# https://github.com/ropensci/plotly/issues
Any suggestions?
Thanks
One option is to create two data frames from your bracket data. The first is for geom_brace: convert the data to a longer format with pivot_longer so that a single geom_brace layer draws all three braces, grouped by info through aes. The second is for the labels: a small summary table with the x and y positions per group of info. Here is some reproducible code:
library(tidyverse)
library(plotly)
#devtools::install_github("NicolasH2/ggbrace")
library(ggbrace)
set.seed(10)
mydata <- data.frame(xx = c(1:10), yy = sample(0:10, 10))
my_bracket_data <- data.frame(start = c(1, 4, 6),
end = c(3, 5, 9),
info = c("first", "second", "third"),
y_bottom = rep(11, 3),
y_top = rep(12, 3))
# Data for geom_brace
my_bracket_data_long <- my_bracket_data %>%
pivot_longer(cols = c(start, end), values_to = "x_value", names_to = "x_names") %>%
pivot_longer(cols = c(y_bottom, y_top), values_to = "y_value", names_to = "y_names")
# data for labels braces
my_bracket_data_labels <- my_bracket_data_long %>%
group_by(info) %>%
summarise(x_pos = mean(x_value),
y_pos = mean(y_value) + 1)
# plot
plot_geom_brace <- ggplot(data = mydata, aes(xx, yy)) +
geom_line(size = 1.5) +
geom_brace(data = my_bracket_data_long, aes(x = x_value, y = y_value, group = info)) +
geom_text(data = my_bracket_data_labels, aes(x = x_pos, y = y_pos, group = info, label = info))
ggplotly(plot_geom_brace)
Created on 2023-01-07 with reprex v2.0.2
Special ggplot2 extension packages like ggpubr usually don't play well with conversion to plotly objects.
If you don't want to type out each geom_brace call, you could loop over the rows of the data frame, build the geom statement with paste, and add it to the existing plot object using eval:
m<- data.frame(s = c(1, 4, 6), ## = my_bracket_data
e = c(3, 5, 9),
i = c("first", "second", "third"),
y_b = rep(11, 3),
y_t = rep(12, 3))
p<- ggplot(data = mydata, aes(xx, yy)) + geom_line(size = 1.5)
for (i in 1:NROW(m)) {
# build the geom_brace() call as text; the label has to be quoted inside the string
input = paste('geom_brace(aes(x = c(',m[i,]$s,',',m[i,]$e,'), y = c(',m[i,]$y_b,',',
m[i,]$y_t,'), label = "',m[i,]$i,'"), inherit.data = F, labelsize = 5)',sep='')
p = p + eval(parse(text=input))
}
p
However, this is more of a hacky solution, but that's what R code tends to become if you mix different styles, for example apply functions with tidyr syntax (or in this case ggplot, which could be seen as an ancestor of tidyr) and more programming-style approaches (for, while, functions; yes, you can program in R), and also want the whole thing converted automatically to JavaScript (i.e. plotly). It's a beautiful mess.
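A possible alternative to building code as text: ggplot2 accepts a list of layers on the right-hand side of +, so the braces can be generated with lapply over the rows of my_bracket_data. This is a sketch, not a tested recipe; the geom_brace arguments (inherit.data, labelsize, and vectors inside aes) are simply copied from the question and assume the same ggbrace version that was used there.

```r
library(ggplot2)
library(ggbrace) # assumed installed from NicolasH2/ggbrace, as in the question

set.seed(10)
mydata <- data.frame(xx = c(1:10), yy = sample(0:10, 10))
my_bracket_data <- data.frame(start = c(1, 4, 6),
                              end = c(3, 5, 9),
                              info = c("first", "second", "third"),
                              y_bottom = rep(11, 3),
                              y_top = rep(12, 3))

# One geom_brace layer per row; each function call has its own environment,
# so every layer keeps the values of its own row
brace_layers <- lapply(seq_len(nrow(my_bracket_data)), function(i) {
  row <- my_bracket_data[i, ]
  geom_brace(aes(x = c(row$start, row$end),
                 y = c(row$y_bottom, row$y_top),
                 label = row$info),
             inherit.data = FALSE, labelsize = 5)
})

# Adding a list appends all of its layers to the plot
p <- ggplot(data = mydata, aes(xx, yy)) +
  geom_line(size = 1.5) +
  brace_layers
```

Whether ggplotly(p) renders the brace labels depends on the plotly and ggbrace versions, so that step is left out here.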

Loop violin plots of controls with clinical case data points overlaid (ggplot)

New to posting on here. Apologies if I miss including something needed to solve my situation.
I have a matched case-control design where three 'younger' clinical cases have been age-matched to a 'younger' control group, and three 'older' cases have been matched to an 'older' control group. I am plotting the control group distribution in a violin plot and overlaying the corresponding matched cases as data points.
I have a lot of variables and I would like to loop through them to minimise error and increase efficiency. I have had a go at writing the code for the loop but I am not sure what to do with the fact that I have two types of plots (violin and point) and two data frames (controls and cases) involved.
Here is the code I have for the plots:
#fake data
cases <- data.frame(
id = factor(1:6),
strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
id = 7:23,
strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
age = c(rep("young", 9), rep("old", 8))
)
#plots
ggplot(data = controls, aes(strange_stories_ToM_mean, age)) +
geom_violin(
trim = FALSE,
alpha = 0.2,
draw_quantiles = c(0.25, 0.5, 0.75),
fill = "gray90"
) +
geom_point(
data = cases,
aes(colour = id, shape = id), # map color/shape to individual cases
size = 5,
show.legend = FALSE
) +
scale_shape_manual(values=c(16, 17, 15, 16, 17, 15)) +
scale_colour_manual(values=c("deeppink1","indianred3", "blueviolet", "springgreen3", "chartreuse2", "darkgreen")) +
scale_size_manual(values=c(5, 4, 5, 5, 4, 5)) +
theme_classic()
ggsave("strange_stories_ToM_mean.svg", width = 8, height = 8, units = "cm")
I looked at using 'for' and created a list to loop through (what I have is below) but I came unstuck at where the list should be incorporated when two data frames are being used and two plots...could lapply be best?
variables <- list() # Create empty list
for(i in seq_len(ncol(FTD_data))) { # Using for-loop to add all columns to list
variables[[i]] <- FTD_data[ , i]
}
names(variables) <- colnames(FTD_data) #rename list elements with variable names from df
for (i in variables)
{CODE TO PLOT INSERT HERE}
One approach to achieve your desired result would be to put your plotting code inside a function which takes one argument, the name of the column to plot. The only change I made to your plotting code is to replace the hardcoded strange_stories_ToM_mean by .data[[col]] to tell ggplot I want to plot the data column whose name is stored in col.
Also, instead of using a for loop I would recommend to use lapply when using ggplot2:
library(ggplot2)
plot_fun <- function(col) {
ggplot(data = controls, aes(.data[[col]], age)) +
geom_violin(
trim = FALSE,
alpha = 0.2,
draw_quantiles = c(0.25, 0.5, 0.75),
fill = "gray90"
) +
geom_point(
data = cases,
aes(colour = id, shape = id),
size = 5,
show.legend = FALSE
) +
scale_shape_manual(values=c(16, 17, 15, 16, 17, 15)) +
scale_colour_manual(values=c("deeppink1","indianred3", "blueviolet", "springgreen3", "chartreuse2", "darkgreen")) +
scale_size_manual(values=c(5, 4, 5, 5, 4, 5)) +
theme_classic()
}
cols_to_plot <- names(controls)[!names(controls) %in% c("id", "age")]
names(cols_to_plot) <- cols_to_plot
p <- lapply(cols_to_plot, plot_fun)
lapply(cols_to_plot, function(x) ggsave(paste0(x, ".svg"), plot = p[[x]], width = 8, height = 8, units = "cm"))
#> $strange_stories_ToM_mean
#> [1] "strange_stories_ToM_mean.svg"
#>
#> $strange_stories_ToM_median
#> [1] "strange_stories_ToM_median.svg"
p
#> $strange_stories_ToM_mean
#>
#> $strange_stories_ToM_median
DATA
set.seed(123)
cases <- data.frame(
id = factor(1:6),
strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
strange_stories_ToM_median = sample(6:8, 6, replace = TRUE),
age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
id = 7:23,
strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
strange_stories_ToM_median = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
age = c(rep("young", 9), rep("old", 8))
)
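One reason lapply is often the safer choice here: aes() does not evaluate its arguments when the plot object is created, only when the plot is built or printed. In a for loop at the top level, every stored plot therefore looks up the loop variable after the loop has finished and sees its last value. The toy data below (columns a, b, y) is invented purely to illustrate this; it is not part of the question.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(a = 1:3, b = 4:6, y = 1:3)

# for loop: both plots refer to the same variable `col`,
# which is "b" once the loop has finished
plots_for <- list()
for (col in c("a", "b")) {
  plots_for[[col]] <- ggplot(toy, aes(.data[[col]], y)) + geom_point()
}

# lapply: each call has its own environment with its own `col`
plots_apply <- lapply(c(a = "a", b = "b"), function(col) {
  ggplot(toy, aes(.data[[col]], y)) + geom_point()
})

# x values that actually get drawn for the plot stored under "a"
x_for <- layer_data(plots_for$a)$x
x_apply <- layer_data(plots_apply$a)$x
```

Here x_for holds the values of column b even though the plot was stored under "a", while x_apply holds the values of column a. Wrapping the plotting code in a function, as in plot_fun above, avoids the problem.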

Colouring by variable when using tidy graph in R?

I am trying to come up with a way to consistently colour multiple tidygraph plots. Right now, the issue is that when I plot multiple plots to the screen at once, ggraph chooses the colours for each plot independently, so the same variable can get a different colour in each plot. Hopefully my example below will explain the issue.
To begin, I create some data, turn them into tidygraph objects, and put them together into a list:
library(tidygraph)
library(ggraph)
library(gridExtra)
# create some data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
label = c("x4", 5, 2))
nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
label = c("x4", "x2", 2, 1, "x1", 2, 7))
edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
to = c(2, 3, 4, 5, 6, 7))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges1)
# put into list
myList <- list(tg, tg_1)
Then I have a plotting function that allows me to display all the plots at once. I do this using grid.arrange from the gridExtra package, like so:
plotFun <- function(List){
ggraph(List, "partition") +
geom_node_tile(aes(fill = name), size = 0.25) +
geom_node_label(aes(label = label, color = name)) +
scale_y_reverse() +
theme_void() +
theme(legend.position = "none")
}
# Display all plots
allPlots <- lapply(myList, plotFun)
n <- length(allPlots)
nRow <- floor(sqrt(n))
do.call("grid.arrange", c(allPlots, nrow = nRow))
This will produce something like this:
As you can see, it colours by the variable label for each individual plot. This results in the same variable label being coloured differently in each plot. For example, x4 in the first plot is red and in the second plot is blue.
I'm trying to find a way to make the colours for the variable's label consistent across all plots. Maybe using grid.arrange isn't the best solution!?
Any help is appreciated.
Since each plot doesn't know anything about the other plots, it's best to assign colors yourself. First you can extract all the node names and assign each of them a color:
nodenames <- unique(na.omit(unlist(lapply(myList, .%>%activate(nodes) %>% pull(name) ))))
nodecolors <- setNames(scales::hue_pal(c(0,360)+15, 100, 64, 0, 1)(length(nodenames)), nodenames)
nodecolors
# x4 x2 x1
# "#F5736A" "#00B734" "#5E99FF"
We use scales::hue_pal to get the "default" ggplot colors but you could use whatever you like. Then we just need to customize the color/fill scales for the plots with these colors.
plotFun <- function(List, colors=NULL){
plot <- ggraph(List, "partition") +
geom_node_tile(aes(fill = name), size = 0.25) +
geom_node_label(aes(label = label, color = name)) +
scale_y_reverse() +
theme_void() +
theme(legend.position = "none")
if (!is.null(colors)) {
plot <- plot + scale_fill_manual(values=colors) +
scale_color_manual(values=colors, na.value="grey")
}
plot
}
allPlots <- lapply(myList, plotFun, colors=nodecolors)
n <- length(allPlots)
nRow <- floor(sqrt(n))
do.call("grid.arrange", c(allPlots, nrow = nRow))
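The reason this works is that scale_fill_manual and scale_color_manual match a named values vector to the data by name, not by position. A small sketch of the lookup, using plain character vectors in place of the graphs (node_sets is a stand-in made up for this example) and the default arguments of scales::hue_pal rather than the ones above:

```r
# Stand-in for the name columns of the two graphs
node_sets <- list(c("x4", NA, NA),
                  c("x4", "x2", NA, NA, "x1", NA, NA))

# Unique non-missing names across all graphs, in order of first appearance
all_names <- unlist(node_sets)
nodenames <- unique(all_names[!is.na(all_names)])

# One colour per name, stored as a named vector
nodecolors <- setNames(scales::hue_pal()(length(nodenames)), nodenames)

# Each plot picks its colours by name, so x4 gets the same colour everywhere
names_plot1 <- node_sets[[1]][!is.na(node_sets[[1]])]
names_plot2 <- node_sets[[2]][!is.na(node_sets[[2]])]
colors_plot1 <- nodecolors[names_plot1]
colors_plot2 <- nodecolors[names_plot2]
```

A plot that only contains x4 still uses the colour assigned to x4 in the full set, which is what keeps the panels consistent.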

Equivalent of gganimate::transition_events on plotly

In R, using gganimate, one can make an animated plot where events appear and disappear with time. For example:
library(lubridate)
library(gganimate)
df=data.frame(
x=c(1,2,3,4),
y=c(1,2,3,4),
start=c(1,2,3,4),
end=c(5,6,7,8),
en=as_date(1),
ex=as_date(1))
ggplot(data=df, aes(x=x,y=y))+
geom_point()+
gganimate::transition_events(
start=start,
end=end,
enter_length = as.numeric(en),
exit_length = as.numeric(ex))
This produces a plot in which points appear according to column "start" and disappear according to column "end".
I wonder if there is an easy way to achieve the same with plotly (preferably using ggplotly()), getting a slider to move along the time.
Here is an example using ggplotly. The result however isn't exactly the same:
library(plotly)
library(lubridate)
df = data.frame(
x = c(1, 2, 3, 4),
y = c(1, 2, 3, 4),
start = c(1, 2, 3, 4),
end = c(5, 6, 7, 8),
en = as_date(1),
ex = as_date(1)
)
frame_list <- Map(seq, from = df$start, to = df$end)
DF <- data.frame(x = rep(df$x, times = lengths(frame_list)),
y = rep(df$y, times = lengths(frame_list)),
frame = unlist(frame_list))
p <- ggplot(DF, aes(x, y)) +
geom_point(aes(size = y, frame = frame))
fig <- ggplotly(p)
fig <- fig %>%
animation_opts(
frame = 0,
easing = "linear",
redraw = FALSE,
mode = "immediate"
)
fig
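The key step above is the expansion of each event into one row per frame in which it is visible. The following base R sketch repeats just that step, without plotly, and counts how many points each frame contains; points_per_frame is a name introduced only for this illustration.

```r
df <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(1, 2, 3, 4),
  start = c(1, 2, 3, 4),
  end = c(5, 6, 7, 8)
)

# One integer sequence start:end per event
frame_list <- Map(seq, from = df$start, to = df$end)

# Repeat each event's coordinates once per frame it is visible in
DF <- data.frame(
  x = rep(df$x, times = lengths(frame_list)),
  y = rep(df$y, times = lengths(frame_list)),
  frame = unlist(frame_list)
)

# Number of visible points in each frame
points_per_frame <- table(DF$frame)
```

Each event spans five frames, so DF has 20 rows; frame 1 contains only the first point, and frames 4 and 5 contain all four.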

R - Overlay multiple least squares plots with colour coding

I'm trying to visualize some data that looks like this
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12 ,24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))
What I'm trying to do is plot the least squares line (and points) for each group on the same plot. And I'd like the colour of the lines and points to correspond to the group.
All I've been able to do is plot the points according to their group
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10))
But I have no idea how to add in the lines as well and make their colours correspond to the points that they are fitting.
I'd really appreciate any help with this. It's turning out to be so much harder than I thought it would be.
You can simply add a geom_smooth layer to your plot
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10)) +
geom_smooth(method="lm",se=FALSE)
method="lm" specifies that you want a linear model
se=FALSE to avoid plotting confidence intervals
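Because colour = group is mapped in the top-level aes, geom_smooth fits one model per group. If you also want the numbers behind those lines, a sketch with base R is to fit lm separately on each group and collect the coefficients (coefs is a name chosen here for illustration):

```r
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12, 24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))

# Fit y ~ x separately for each group
fits <- lapply(split(df, df$group), function(d) coef(lm(y ~ x, data = d)))

# One row per group with columns "(Intercept)" and "x" (the slope)
coefs <- do.call(rbind, fits)
coefs
```

For group1, which has only two points, the fitted line passes exactly through both of them.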
