Using text annotations with plotly::subplot - r

I have data I'd like to plot the distribution density of. The data are from three groups, where for each there are three states, each with a probability, and these probabilities sum to 1.
I'm trying to use R's plotly to plot, for each group, the density of the probabilities, color coded by state, and add some text annotation to each such group plot. Finally I'm trying to combine all of these group plots using plotly::subplot.
Here's the code to generate the data and a list of group plots:
library(dplyr)
library(reshape2)
library(plotly)
set.seed(1)
plot.list <- lapply(1:3,function(g){
if(g == 1){
show.legend <- T
} else{
show.legend <- F
}
df <- data.frame(id=LETTERS,t(sapply(1:length(LETTERS),function(x){
probs <- runif(3,0,1)
return(probs/sum(probs))
}))) %>% dplyr::rename(S1=X1,S2=X2,S3=X3) %>%
reshape2::melt() %>% dplyr::rename(state=variable,probability=value)
df$state <- factor(df$state,levels=c("S1","S2","S3"))
density.df <- do.call(rbind,lapply(levels(df$state),function(s){
dens <- density(dplyr::filter(df,state == s)$probability)
return(data.frame(x=dens$x,y=dens$y,state=s,stringsAsFactors=F))
}))
density.df$state <- factor(density.df$state,levels=levels(df$state))
dens.plot <- plot_ly(x=~density.df$x,y=~density.df$y,type='scatter',mode='lines',color=~density.df$state,showlegend=show.legend) %>%
layout(xaxis=list(title="Probability",zeroline=F),yaxis=list(title="Count",zeroline=F)) %>%
add_annotations(x=0.75,y="top",text=paste0("text: ",g))
if(show.legend) dens.plot <- dens.plot %>% add_annotations(text="State",xref="paper",yref="paper",x=1.02,xanchor="left",y=1.02,yanchor="top",legendtitle=T,showarrow=F)
return(dens.plot)
})
Note that I'm only adding the legend to the first group so it appears only once in the final grouped plot (there's probably a more elegant way of achieving that).
And here's the plotly::subplot command I'm using:
subplot(plot.list,nrows=3,shareX=T,shareY=T,titleX=T,titleY=T)
Which gives:
As you can see the text annotation is stuck at "top" of the first plot rather than at the top of each individual plot.
Any idea how I can get each annotation to be located at the top of its corresponding subplot?

Preamble. For reasons that are not entirely obvious to me (but relating to how values for annotations are scaled when running subplot), annotations seem to go awry with vertically stacked subplots. To see this, run the MWE at https://plot.ly/r/text-and-annotations/#subplot-annotations, but change
subplot(p1, p2, titleX = TRUE, titleY = TRUE)
to
subplot(p1, p2, titleX = TRUE, titleY = TRUE, nrows = 2)
In the vertically stacked version, the annotations are not where we would expect them to be. To achieve your desired outcome would require some post-processing of the subplot output. Now, on to your main question.
First, in add_annotations, add xref and yref arguments that correspond to each subplot. In each element of plot.list, I also add an additional element y_anno to keep track of where we would like the annotation to go (at the maximum value of the densities in each subplot).
plot.list <- lapply(1:3,function(g){
if(g == 1){
show.legend <- T
} else{
show.legend <- F
}
df <- data.frame(id=LETTERS,t(sapply(1:length(LETTERS),function(x){
probs <- runif(3,0,1)
return(probs/sum(probs))
}))) %>% dplyr::rename(S1=X1,S2=X2,S3=X3) %>%
reshape2::melt() %>% dplyr::rename(state=variable,probability=value)
df$state <- factor(df$state,levels=c("S1","S2","S3"))
density.df <- do.call(rbind,lapply(levels(df$state),function(s){
dens <- density(dplyr::filter(df,state == s)$probability)
return(data.frame(x=dens$x,y=dens$y,state=s,stringsAsFactors=F))
}))
density.df$state <- factor(density.df$state,levels=levels(df$state))
dens.plot <- plot_ly(x=~density.df$x,
y=~density.df$y,
type='scatter',
mode='lines',
color=~density.df$state,
showlegend=show.legend) %>%
layout(xaxis=list(title="Probability",zeroline=F),yaxis=list(title="Count",zeroline=F)) %>%
add_annotations(x = 0.75,
y = max(density.df$y),
text = paste0("text: ", g),
xref = paste0("x", g), # add this
yref = paste0("y", g), # add this
ax = 0,
ay = 0)
if(show.legend) dens.plot <- dens.plot %>% add_annotations(text="State",xref="paper",yref="paper",x=1.02,xanchor="left",y=1.02,yanchor="top",legendtitle=T,showarrow=F)
dens.plot$y_anno <- max(density.df$y) # add this
return(dens.plot)
})
Now if we run subplot(plot.list,nrows=3,shareX=T,shareY=T,titleX=T,titleY=T), the text will be in each subplot, but not at the top (due to the phenomenon I described in the preamble). To fix this, we can post-process the subplot output:
p <- subplot(plot.list, nrows = 3,shareX = T,shareY = T,titleX = T,titleY = T)
for (i in seq_along(plot.list)) {
for (j in seq_along(p$x$layout$annotations)) {
if (p$x$layout$annotations[[j]]$yref == paste0("y", i))
p$x$layout$annotations[[j]]$y <- plot.list[[i]]$y_anno
}
}
Now p gives us a plot that is close to what we want.
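The post-processing loop above can be tried without rendering anything. What follows is a minimal sketch on a plain R list that only mimics the p$x$layout$annotations structure. The mock object mock_p, the helper reset_annotation_y and all numeric values are assumptions made for illustration; they are not plotly output.

```r
# Mock of the relevant part of a subplot object: a list of annotations,
# each with a yref (the axis it belongs to) and a y position.
# NOTE: mock_p and its values are hypothetical stand-ins, not plotly output.
mock_p <- list(x = list(layout = list(annotations = list(
  list(text = "text: 1", yref = "y1",    y = 0.2),
  list(text = "State",   yref = "paper", y = 1.02),
  list(text = "text: 2", yref = "y2",    y = 0.5),
  list(text = "text: 3", yref = "y3",    y = 0.9)
))))

# Desired y position per subplot (in the answer: max(density.df$y))
y_anno <- c(2.1, 1.8, 2.4)

# Hypothetical helper: for every annotation whose yref matches subplot i,
# overwrite y with the stored target value; other annotations are untouched.
reset_annotation_y <- function(p, y_anno) {
  for (i in seq_along(y_anno)) {
    for (j in seq_along(p$x$layout$annotations)) {
      if (identical(p$x$layout$annotations[[j]]$yref, paste0("y", i))) {
        p$x$layout$annotations[[j]]$y <- y_anno[i]
      }
    }
  }
  p
}

fixed <- reset_annotation_y(mock_p, y_anno)
new_y <- vapply(fixed$x$layout$annotations, function(a) a$y, numeric(1))
new_y
```

Using identical() instead of == in the condition keeps the loop from failing on an annotation that has no yref at all, since comparing NULL with == gives a zero-length result that if() rejects.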

R: How to customize Sankey plot in ggplotly?

I have sales data by year and model, visualized as a Sankey chart. I am struggling with two issues:
First, I need model B to always sit at the bottom of the chart, regardless of its value over the years.
Second, when I re-visualize the ggplot via ggplotly, the hover does not show sales or years.
code:
df <- data.frame (model = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
sales = c(450,678,456,344,984,456,234,244,655,789,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,6666,7777,8888,1233,1255,5677,3411,2344,6122,4533))
install.packages("remotes")
#remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(tidyverse)
plot <- ggplot(df, aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16)
ggplotly(plot)
I'm absolutely certain that there is a better way, but it took me a while to get it working. I think this is what you were looking for.
I started with the ggplot and ggplotly objects that you have here. The primary purpose of this initial plot is to capture the colors. (I could have captured them a few different ways, but this was already done for me in your plot.)
Update: I've modified the two elements you requested.
library(ggsankey)
library(tidyverse)
library(plotly)
# df from the question is unchanged
# visualize the original
(plot <- ggplot(df,
aes(Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(plot) -> plp
plp
#-------- colors --------
# collect the 10 colors
cols <- map_dfr(1:10, function(k){
nm <- plp$x$data[[k]]$name
filler <- plp$x$data[[k]]$fillcolor
c(nm = nm, filler = filler)
})
Then I divided the contents of the B model into 10 groups to ensure it was always the smallest bump. This allowed me to collect the stacked values for all of the other models, which is needed to push B to the bottom.
#-------------- splitting B -------------
df1 <- df %>% filter(model != "B") %>%
arrange(Year, sales)
df2 <- df %>% filter(model == "B") %>% # this gets used further down
arrange(Year)
# split B into 10 groups - keep on the bottom, then join the groups
# make the groups
ng <- vector(length = 10)
invisible(
map(1:10,
function(i) {
ng[i] <<- rep("B", i) %>% paste0(collapse = "")
})
)
# add values for these groups by year
df4 <- data.frame(Year = rep(unique(df$Year), each = 10),
model = rep(ng, length(unique(df$Year))),
sales = rep(df2$sales/10, each = 10))
df5 <- rbind(df1, df4)
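The ten labels for the split B groups ("B", "BB", and so on) are built above with map() and <<-. As a sketch of an alternative, base R's strrep() is vectorised over its times argument and produces the same labels in one call. The names ng_loop and ng_alt are introduced here for illustration only.

```r
# Labels built the same way as in the answer: "B" repeated i times
ng_loop <- vapply(1:10,
                  function(i) paste0(rep("B", i), collapse = ""),
                  character(1))

# Alternative sketch: strrep() repeats a string, vectorised over `times`
ng_alt <- strrep("B", 1:10)

# Both approaches give the same character vector
identical(ng_loop, ng_alt)
```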
Recreate the Sankey bump with 10 subsections of model B. Everything that follows works with this plot.
#-------------- plotly after dividing B -------------
(nplt <- ggplot(df5, aes(x = Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(nplt) -> plt
plt
Create a Sankey bump with JUST B, to capture data that represents model B at the bottom. Use this data to substitute all of the traces that represent B in the object plt. The colors get fixed here, as well. (The original 10 colors from the first plot.) Lastly, the hoverinfo gets removed. That will get fixed next.
#-------------- get values for B at the bottom -------------
df %>% filter(model == "B") %>%
ggplot(aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16) -> bplt
ggplotly(bplt) -> bplotly
bplotly
#------- take divided B and remove all but one trace for B --------
# xx <- plt$x$data
# plt$x$data <- xx[c(1:2, 12:length(xx))] # keep only one B trace
#---------------- adjustments to plt's build --------------------
# change out data for the B trace, add the right colors
wh <- vector(length = 0)
invisible(
map(1:length(plt$x$data),
function(j) {
nm <- plt$x$data[[j]]$name
plt$x$data[[j]]$hoverinfo <<- "none"
plt$x$data[[j]]$fillcolor <<- unlist(cols[cols$nm == nm, "filler"],
use.names = F)
if(str_detect(nm, "^B$")){
plt$x$data[[j]]$x <<- bplotly$x$data[[1]]$x
plt$x$data[[j]]$y <<- bplotly$x$data[[1]]$y
}
if(str_detect(nm, "BB")) {
wh[length(wh) + 1] <<- j # list of unnecessary traces (extra B groups)
}
})
)
#----- take divided B and remove all but one trace for B ------
plt$x$data <- plt$x$data[-c(wh)] # <------ I forgot this line in the last update
# visualize Sankey bump with B at the bottom
plt
The Plotly object is basically 10 globs of color, there is no separation between years in the background. So if you add a tooltip to this as it is, there can be only one...
To get the tooltips you're looking for, I created another trace (well, 10, actually—1 for each model). In order to get the right values (because the sales data isn't in the 50K range), I used the data in plt to create a new data frame.
#--------------- collect values for hovertext positions ----------
x <- plt$x$data[[1]]$x
inds <- which(x %in% 2015:2020, arr.ind = T)
yrs <- x[inds]
tellMe <- invisible(
map(1:length(plt$x$data),
function(m) {
y <- plt$x$data[[m]]$y
y[inds]
}) %>% setNames(sort(unique(df$model))) %>% # changed from LETTERS[1:10]
as.data.frame() %>%
mutate(yr = yrs %>% as.integer()) %>%
pivot_longer(names_to = "model", values_to = "sales",
cols = sort(unique(df$model))) %>%
distinct() %>%
group_by(yr, model) %>%
summarise(val = mean(sales)) %>%
left_join(df, by = c("yr" = "Year", "model" = "model")) %>%
as.data.frame() # drop groups
)
#-------------- create data trace for hovertext --------------
plot_ly(tellMe, x = ~yr, y = ~val, split = ~model,
customdata = ~sales, text = ~model,
line = list(width = .01, shape = "spline", smoothing = 1.3),
hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}<extra></extra>",
type = "scatter", mode = "lines", showlegend = F) -> pp2
pp2
If you look at the plot here, it looks blank. That's because of how small the lines are. This is intentional. You don't want lines on your graph.
Fix the colors, so that the hoverlabel background colors match the legend colors.
# change colors to match sankey
pp2 <- plotly_build(pp2)
invisible(
map(1:10,
function(z) {
nm <- pp2$x$data[[z]]$name
# collect and assign the color
cr <- unlist(cols[cols$nm == nm, "filler"], use.names = F)
pp2$x$data[[z]]$line$color <<- cr
})
)
Using subplot here didn't work. Plotly gave me an error when I tried adding a trace, whether all at once or even one for each model. So I forced the traces together.
#-------------- consolidate the traces (subplot won't work) -----------
# collect data one more time!
dx <- plt$x$data
yx <- pp2$x$data
yx <- append(yx, dx) # put plt on top
# replace data
plt$x$data <- yx
# lines are small, increase the distance searched for matches
plt %>% layout(hoverdistance = 40)
The final product:

Change point plotting order in ggnet

I am working with a large network and wish to highlight certain nodes. I would like these nodes to plot on top of a dense network. They are currently identified by a certain color. Here is some simple example code.
library(network)
library(GGally)
# make a random network
x <- c(0,1,0,1,1,1,0,1,0,1,0,1)
seed <- c(10,25,40,34,1,35,6,3,14,5,23,3)
net <- data.frame(matrix(nrow = 12, ncol = 12))
for (i in 1:12) {
set.seed(seed[i])
net[i] <- sample(x)
}
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 25,color = color,label = TRUE)
I have really exaggerated the dot size here to make them overlap. Is there a way I can get the "yes" points to always plot on top of the "no" points?
EDIT: Added "labels" for clarity.
Yes, there is! Your color vector first denotes the "yes" and then the "no", which seems to determine the plotting order. Assuming you have more than "yes" or "no", you could try converting the color vector to a factor and setting its levels. Then you can sort the order of your "yes"s and "no"s:
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
factor_color <- sort(factor(color, levels = c("no", "yes")))
ggnet2(net, size = 100, color = factor_color)
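To see what the sort() call does to the vector, here is a minimal base R sketch. The level order c("no", "yes") makes "no" sort first, so the "yes" entries end up last. Note that sorting reorders the vector itself, so its positions no longer line up one to one with the original node order.

```r
color <- c("yes", "yes", "no", "no", "no", "no",
           "no", "no", "no", "no", "no", "no")

# Levels define the sort order: "no" comes before "yes"
factor_color <- sort(factor(color, levels = c("no", "yes")))

as.character(factor_color)

# Positions of "yes" before and after sorting
yes_before <- which(color == "yes")
yes_after  <- which(factor_color == "yes")
yes_before
yes_after
```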
EDIT 1
As per your comment, I cannot think of a (more) elegant solution, but this works for me:
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 100, color = color, label = TRUE)
final_build <- ggplot2::ggplot_build(final)
# Extract the geom_point data and find which elements have 'yes'
yes_index <- which(color == "yes")
label_data <- final_build$data[[2]]
yes_coordinates_label <- cbind(label_data[yes_index,], label = names(net)[yes_index])
final +
geom_point(data = yes_coordinates_label, aes(x = x, y = y),
size = 100, color = first(yes_coordinates_label$colour)) +
geom_text(data = yes_coordinates_label, aes(x = x, y = y, label = label))
The idea is to plot the dots with geom_point() again but only for the dots which are "yes".
EDIT 2
I couldn't help but think of another solution without plotting the points again. It is possible to retrieve the plot information using ggplot_build() and then to reorder the hierarchy of the points drawn; the datapoints which come first are drawn first. Hence doing the following will work:
library(tidyverse)
# Find the index of the GeomPoint layer
geom_types <- final$layers %>% map("geom") %>% map(class)
GeomPoint_ind <- which(sapply(geom_types, function(x) "GeomPoint" %in% x))
# Retrieve plot information
final_build <- ggplot2::ggplot_build(final)
df <- final_build$data[[GeomPoint_ind]]
# Set the indices which you would like to have on top and modify the ggplot_build object.
yes_index <- which(color == "yes")
final_build$data[[GeomPoint_ind]] <- rbind(df[-yes_index,], df[yes_index,])
# Convert to plot object and plot
new_final <- ggplot_gtable(final_build)
plot(new_final)
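The reordering step in EDIT 2 relies only on row order: rows that come first in the layer data are drawn first. Here is a minimal base R sketch of that rbind() idiom on a toy data frame. The object toy_points and its columns are made up for illustration and are not taken from ggplot_build().

```r
# Toy stand-in for a layer's data: one row per point
toy_points <- data.frame(
  x      = c(0.1, 0.4, 0.5, 0.9),
  y      = c(0.2, 0.3, 0.8, 0.6),
  colour = c("yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Rows to move to the end so they are drawn last
yes_index <- which(toy_points$colour == "yes")

# Everything else first, highlighted rows last
reordered <- rbind(toy_points[-yes_index, ], toy_points[yes_index, ])
reordered$colour
```

If yes_index is empty, the negative index drops every row, so it is worth guarding for that case (for example with if (length(yes_index) > 0)) before reordering.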

R plotly Issues with hovering text in a trace loop

Following this post and this answer I have an additional question:
library(plotly)
# Create data
dat=data.frame(group = factor(rep(LETTERS[1:4], each=10)), my_x = rep(1:10, 4), my_y = rnorm(40))
str(dat)
# Let's do a first plot
p<-plot_ly(dat)
# Add a trace for each group using a loop
for(i in 1:length(levels(dat$group))){
subs <- subset(dat, group == levels(dat$group)[i])
p<-add_trace(p = p,
data = subs,
y=~my_y,
x=~my_x ,
name=levels(dat$group)[i],
type="scatter",
mode="markers+lines",
hoverinfo="text",
text=~paste0(levels(dat$group)[i], ": x=", round(my_x, 2), "y=", round(my_y, 2)))
}
p
Can anybody tell me why, when I hover over the data points, each label shows the correct x and y values but they are all labelled 'D:', while the legend shows that the lines represent A, B, C and D? I would like the hover text to be labelled correctly.
It could be an issue with the use of ~ in text. Try creating the text from the subs data separately and then pass it to add_trace:
p <- plot_ly()
lvls <- levels(dat$group)
for(i in seq_along(lvls)){
subs <- droplevels(subset(dat, group == lvls[i]))
text1 <- with(subs, paste0(lvls[i], ": x=", round(my_x, 2), "y=", round(my_y, 2)))
p <- add_trace(p,
data = subs,
x = ~my_x,
y = ~my_y,
name = lvls[i],
type = 'scatter',
mode = 'markers+lines',
hoverinfo='text',
text=text1)
}
p
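One plausible explanation for the 'D:' labels, assuming plotly evaluates the ~ formula only when the plot is built rather than inside the loop: a formula stores a reference to its environment, not the value of i at that moment, so every formula sees the final value of i. A minimal base R sketch of that behaviour, with no plotly involved (lazy and eager are names chosen for illustration):

```r
lvls <- c("A", "B", "C", "D")

lazy  <- vector("list", length(lvls))  # formulas, evaluated later
eager <- character(length(lvls))       # strings, evaluated immediately

for (i in seq_along(lvls)) {
  lazy[[i]] <- ~ lvls[i]   # captures the environment, not the current i
  eager[i]  <- lvls[i]     # captures the current value
}

# Evaluate the right-hand side of each formula after the loop has finished
lazy_values <- vapply(lazy,
                      function(f) eval(f[[2]], environment(f)),
                      character(1))

lazy_values
eager
```

Building the text vector inside the loop, as text1 does above, corresponds to the eager case.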

Drawing rectangles around specified labels in a dendrogram with 'dendextend'

I'm currently constructing a dendrogram and I'm using 'dendextend' to tweak the look of it.
I've been able to do everything I want to (labelling leaves and highlighting branches of my chosen clusters), except drawing rectangles around pre-defined clusters.
My data (which can be sourced from this file: Barra_IBS_example.matrix) was clustered with 'pvclust', so 'pvrect' draws the rects in the correct position, but it cuts the labels (see image below), so I want to reproduce it with 'rect.dendrogram', however, I can't figure out how to tell the function to use the clustering data from 'pvclust'.
This is the code I'm using:
idnames <- dimnames(ibs_mat)[[1]]
ibs.pv <- pvclust(ibs_mat, nboot=1000)
ibs.clust <- pvpick(ibs.pv, alpha=0.95)
names(ibs.clust$clusters) <- paste0("Cluster", 1:length(ibs.clust$clusters))
# Choose a colour palette
pal <- brewer.pal(length(ibs.clust$clusters), "Paired")
# Transform the list to a dataframe
ibs_meta <- bind_rows(lapply(names(ibs.clust$clusters),
function(l) data.frame(Cluster=l, Sample = ibs.clust$clusters[[l]])))
# Add the rest of the non-clustered samples (and assign them as Cluster0), add colour to each cluster
ibs_table <- ibs_meta %>%
rbind(., data.frame(Cluster = "Cluster0",
Sample = idnames[!idnames %in% .$Sample])) %>%
mutate(Cluster_int=as.numeric(sub("Cluster", "", Cluster))) %>%
mutate(Cluster_col=ifelse(Cluster_int==0, "#000000",
pal[Cluster_int])) %>%
.[match(ibs.pv$hclust$labels[ibs.pv$hclust$order], .$Sample),]
hcd <- as.dendrogram(ibs.pv) %>%
#pvclust_show_signif(ibs.pv, show_type = "lwd", signif_value = c(2, 1),alpha=0.25) %>%
set("leaves_pch", ifelse(ibs_table$Cluster_int>0,19,18)) %>% # node point type
set("leaves_cex", 1) %>% # node point size
set("leaves_col", ibs_table$Cluster_col) %>% #node point color
branches_attr_by_labels(ibs_meta$Sample, TF_values = c(2, Inf), attr = c("lwd")) %>% # change branch width
# rect.dendrogram(k=12, cluster = ibs_table$Cluster_int, border = 8, lty = 5, lwd = 1.5,
# lower_rect = 0) %>% # add rectangles around clusters
plot(main="Barramundi samples IBS based clustering")
pvrect(ibs.pv, alpha=0.95, lwd=1.5)
Many thanks, Ido
ok, this took more work than I had hoped, but I got a solution for you.
I created a new function called pvrect2 and just pushed it to the latest version of dendextend on github. Here is a self contained example demonstrating the solution:
devtools::install_github('talgalili/dendextend')
library(pvclust)
library(dendextend)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist="cor", method.hclust="average", nboot=10)
par(mar = c(9,2.5,2,0))
dend <- as.dendrogram(result)
dend %>%
pvclust_show_signif(result, signif_value = c(3,.5)) %>%
pvclust_show_signif(result, signif_value = c("black", "grey"), show_type = "col") %>%
plot(main = "Cluster dendrogram with AU/BP values (%)")
# pvrect(result, alpha=0.95)
pvrect2(result, alpha=0.95)
text(result, alpha=0.95)

How can I create subplots in plotly using R where each subplot is two traces

Here is a toy example I have got stuck on
library(plotly)
library(dplyr)
# construct data.frame
df <- tibble(x=c(3,2,3,5,5,5,2),y=c("a","a","a","b","b","b","b"))
# construct data.frame of last y values
latest <- df %>%
group_by(y) %>%
slice(n())
# plot for one value of y (NB not sure why value for 3 appears?)
p <- plot_ly() %>%
add_histogram(data=subset(df,y=="b"),x= ~x) %>%
add_histogram(data=subset(latest,y=="b"),x= ~x,marker=list(color="red")) %>%
layout(barmode="overlay",showlegend=FALSE,title= ~y)
p
How can I set these up as subplots, one for each unique value of y? In the real-world example, I would have 20 different y's, so I would ideally loop or apply the code. In addition, it would be good to set a standard x scale of, say, c(1:10) and to have, for example, 2 rows.
TIA
build a list containing each of the plots
set the bin sizes manually for the histograms, otherwise the automatic selection will choose different bins for each of the traces within a plot (making it look strange, as in your example where the bars of each trace are different widths)
use subplot to put it all together
add titles to individual subplots using a list of annotations, as explained here
Like this:
N = nlevels(factor(df$y))
plot_list = vector("list", N)
lab_list = vector("list", N)
for (i in 1:N) {
this_y = levels(factor(df$y))[i]
p <- plot_ly() %>%
add_trace(type="histogram", data=subset(df,y==this_y), x = ~x, marker=list(color="blue"),
autobinx=F, xbins=list(start=0.5, end=6.5, size=1)) %>%
add_trace(type="histogram", data=subset(latest,y==this_y), x = ~x, marker=list(color="red"),
autobinx=F, xbins=list(start=0.5, end=6.5, size=1)) %>%
layout(barmode="overlay", showlegend=FALSE)
plot_list[[i]] = p
titlex = 0.5
titley = c(1.05, 0.45)[i]
lab_list[[i]] = list(x=titlex, y=titley, text=this_y,
showarrow=F, xref='paper', yref='paper', font=list(size=18))
}
subplot(plot_list, nrows = 2) %>%
layout(annotations = lab_list)
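The point about fixed bins can be checked in base R. With hist(), passing the same breaks to both data sets produces counts on the same bins. The breaks below mirror xbins = list(start = 0.5, end = 6.5, size = 1) from the answer; this is only a sketch of the binning idea, not of how plotly computes its bins.

```r
# Data for y == "b" from the question, and its last value
all_b    <- c(5, 5, 5, 2)
latest_b <- 2

# Shared bin edges: 0.5, 1.5, ..., 6.5 (one bin per integer 1..6)
breaks <- seq(0.5, 6.5, by = 1)

counts_all    <- hist(all_b,    breaks = breaks, plot = FALSE)$counts
counts_latest <- hist(latest_b, breaks = breaks, plot = FALSE)$counts

# One column per bin, so the two traces can be compared directly
rbind(bin = 1:6, all = counts_all, latest = counts_latest)
```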
