I have sales data by year and model, which I visualize with a Sankey bump chart. I am struggling with two issues:
First, I need model B to always sit at the bottom of the chart, regardless of its value over the years.
Second, when I convert the ggplot with ggplotly, the hover text does not show sales or years.
Code:
df <- data.frame (model = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
sales = c(450,678,456,344,984,456,234,244,655,789,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,6666,7777,8888,1233,1255,5677,3411,2344,6122,4533))
install.packages("remotes")
#remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(tidyverse)
library(plotly) # needed for ggplotly()
plot <- ggplot(df, aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16)
ggplotly(plot)
I'm absolutely certain that there is a better way, but it took me a while to get it working. I think this is what you were looking for.
I started with the ggplot and ggplotly objects that you have here. The primary purpose of this initial plot is to capture the colors. (I could have captured them a few different ways, but this was already done for me in your plot.)
Update: I've modified the two elements you requested.
library(ggsankey)
library(tidyverse)
library(plotly)
# df from the question is unchanged
# visualize the original
(plot <- ggplot(df,
aes(Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(plot) -> plp
plp
#-------- colors --------
# collect the 10 colors
cols <- map_dfr(1:10, function(k){
nm <- plp$x$data[[k]]$name
filler <- plp$x$data[[k]]$fillcolor
c(nm = nm, filler = filler)
})
Then I divided the contents of the B model into 10 groups to ensure it was always the smallest bump. This allowed me to collect the stacked values for all of the other models, which is needed to push B to the bottom.
#-------------- splitting B -------------
df1 <- df %>% filter(model != "B") %>%
arrange(Year, sales)
df2 <- df %>% filter(model == "B") %>% # this gets used further down
arrange(Year)
# split B into 10 groups - keep on the bottom, then join the groups
# make the groups
ng <- vector(length = 10)
invisible(
map(1:10,
function(i) {
ng[i] <<- rep("B", i) %>% paste0(collapse = "")
})
)
# add values for these groups by year
df4 <- data.frame(Year = rep(unique(df$Year), each = 10),
model = rep(ng, length(unique(df$Year))),
sales = rep(df2$sales/10, each = 10))
df5 <- rbind(df1, df4)
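As a quick sanity check on the split above, here is a minimal base-R sketch showing that dividing each year's B value into ten equal parts leaves the yearly totals unchanged. The B values are copied from the question's data; `strrep()` is used as a shorter way to build the "B", "BB", ... labels, which is a substitution for illustration and not what the code above does.

```r
# B's yearly sales, copied from the question's data
b <- data.frame(Year = 2015:2020,
                sales = c(678, 567, 235, 223, 2222, 7777))

# ten labels: "B", "BB", ..., "BBBBBBBBBB"
ng <- strrep("B", 1:10)

# one row per (year, label), each carrying a tenth of that year's B sales
b_split <- data.frame(Year  = rep(b$Year, each = 10),
                      model = rep(ng, times = nrow(b)),
                      sales = rep(b$sales / 10, each = 10))

# summing the ten parts per year recovers the original B sales
totals <- tapply(b_split$sales, b_split$Year, sum)
```

Because every part is a tenth of B, no single part is likely to outrank another model in the bump ordering, which is what the split relies on.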
Recreate the Sankey bump with 10 subsections of model B. Everything that follows works with this plot.
#-------------- plotly after dividing B -------------
(nplt <- ggplot(df5, aes(x = Year, node = model, fill = model, value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial",
color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16))
ggplotly(nplt) -> plt
plt
Create a Sankey bump with JUST B, to capture data that represents model B at the bottom. Use this data to substitute all of the traces that represent B in the object plt. The colors get fixed here, as well. (The original 10 colors from the first plot.) Lastly, the hoverinfo gets removed. That will get fixed next.
#-------------- get values for B at the bottom -------------
df %>% filter(model == "B") %>%
ggplot(aes(x = Year,
node = model,
fill = model,
value = sales)) +
geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
scale_fill_viridis_d(option = "A", alpha = .8) +
theme_sankey_bump(base_size = 16) -> bplt
ggplotly(bplt) -> bplotly
bplotly
#------- take divided B and remove all but one trace for B --------
# xx <- plt$x$data
# plt$x$data <- xx[c(1:2, 12:length(xx))] # keep only one B trace
#---------------- adjustments to plt's build --------------------
# change out data for the B trace, add the right colors
wh <- vector(length = 0)
invisible(
map(1:length(plt$x$data),
function(j) {
nm <- plt$x$data[[j]]$name
plt$x$data[[j]]$hoverinfo <<- "none"
plt$x$data[[j]]$fillcolor <<- unlist(cols[cols$nm == nm, "filler"],
use.names = F)
if(str_detect(nm, "^B$")){
plt$x$data[[j]]$x <<- bplotly$x$data[[1]]$x
plt$x$data[[j]]$y <<- bplotly$x$data[[1]]$y
}
if(str_detect(nm, "BB")) {
wh[length(wh) + 1] <<- j # list of unnecessary traces (extra B groups)
}
})
)
#----- take divided B and remove all but one trace for B ------
plt$x$data <- plt$x$data[-c(wh)] # <------ this line was missing from the previous update
# visualize Sankey bump with B at the bottom
plt
The Plotly object is basically 10 globs of color; there is no separation between years in the background. So if you add a tooltip to this as it is, there can be only one...
To get the tooltips you're looking for, I created another trace (well, 10, actually: one for each model). The y values in the plot are stacked positions that run up to roughly 50K, not the raw sales figures, so I used the data in plt to create a new data frame that pairs each position with the actual sales.
#--------------- collect values for hovertext positions ----------
x <- plt$x$data[[1]]$x
inds <- which(x %in% 2015:2020, arr.ind = T)
yrs <- x[inds]
tellMe <- invisible(
map(1:length(plt$x$data),
function(m) {
y <- plt$x$data[[m]]$y
y[inds]
}) %>% setNames(sort(unique(df$model))) %>% # changed from LETTERS[1:10]
as.data.frame() %>%
mutate(yr = yrs %>% as.integer()) %>%
pivot_longer(names_to = "model", values_to = "sales",
cols = sort(unique(df$model))) %>%
distinct() %>%
group_by(yr, model) %>%
summarise(val = mean(sales)) %>%
left_join(df, by = c("yr" = "Year", "model" = "model")) %>%
as.data.frame() # drop groups
)
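The position lookup above can be illustrated with a small base-R sketch on synthetic data. It assumes (this is an assumption about the trace, not something verified here) that each filled area is stored as a closed outline, so every year appears once on the lower edge and once on the upper edge; averaging the two y values then gives the middle of the band, which is where the invisible hover line is placed.

```r
# synthetic outline: forward along the lower edge, back along the upper edge
x_fwd <- seq(2015, 2017, by = 0.25)
x <- c(x_fwd, rev(x_fwd))
y <- c(rep(100, length(x_fwd)),   # lower edge of the band
       rep(300, length(x_fwd)))   # upper edge of the band

# keep only the points that sit exactly on a year
inds <- which(x %in% 2015:2017)

# mean of lower and upper edge per year = vertical middle of the band
mid <- tapply(y[inds], x[inds], mean)
```

The values 100 and 300 are made up; in the answer the y values come from plt$x$data.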
#-------------- create data trace for hovertext --------------
plot_ly(tellMe, x = ~yr, y = ~val, split = ~model,
customdata = ~sales, text = ~model,
line = list(width = .01, shape = "spline", smoothing = 1.3),
hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}<extra></extra>",
type = "scatter", mode = "lines", showlegend = F) -> pp2
pp2
If you look at the plot here, it looks blank. That's because of how small the lines are. This is intentional. You don't want lines on your graph.
Fix the colors, so that the hoverlabel background colors match the legend colors.
# change colors to match sankey
pp2 <- plotly_build(pp2)
invisible(
map(1:10,
function(z) {
nm <- pp2$x$data[[z]]$name
# collect and assign the color
cr <- unlist(cols[cols$nm == nm, "filler"], use.names = F)
pp2$x$data[[z]]$line$color <<- cr
})
)
Using subplot here didn't work. Plotly gave me an error when I tried adding a trace, whether all at once or even one for each model. So I forced the traces together.
#-------------- consolidate the traces (subplot won't work) -----------
# collect data one more time!
dx <- plt$x$data
yx <- pp2$x$data
yx <- append(yx, dx) # put plt on top
# replace data
plt$x$data <- yx
# lines are small, increase the distance searched for matches
plt %>% layout(hoverdistance = 40)
I am working with a large network and wish to highlight certain nodes. I would like these nodes to plot on top of a dense network. They are currently identified by a certain color. Here is some simple example code.
library(network)
library(GGally)
# make a random network
x <- c(0,1,0,1,1,1,0,1,0,1,0,1)
seed <- c(10,25,40,34,1,35,6,3,14,5,23,3)
net <- data.frame(matrix(nrow = 12, ncol = 12))
for (i in 1:12) {
set.seed(seed[i])
net[i] <- sample(x)
}
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 25,color = color,label = TRUE)
I have really exaggerated the dot size here to make them overlap. Is there a way I can get the "yes" points to always plot on top of the "no" points?
EDIT: Added "labels" for clarity.
Yes, there is! Your color vector lists the "yes" values first and then the "no" values, which seems to determine the plotting order. Assuming you have more than just "yes" or "no", you could try converting the color vector to a factor and setting its levels. Then you can sort the order of your "yes"s and "no"s:
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
factor_color <- sort(factor(color, levels = c("no", "yes")))
ggnet2(net, size = 100, color = factor_color)
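To see what the factor-and-sort step does on its own, here is a minimal base-R sketch with a shortened, made-up color vector. It only shows how the levels control the sort order; how ggnet2 then uses the vector is not reproduced here.

```r
color <- c("yes", "yes", "no", "no", "no")

# the order of the levels decides the sort order: "no" comes before "yes"
factor_color <- sort(factor(color, levels = c("no", "yes")))
sorted <- as.character(factor_color)

# sorting moves the labels: element i may no longer describe node i
moved <- sorted != color
```

Note that sorting also shifts the labels away from their original positions, which matters if the vector is matched to nodes by position.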
EDIT 1
As per your comment, I cannot think of a (more) elegant solution, but this works for me:
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 100, color = color, label = TRUE)
final_build <- ggplot2::ggplot_build(final)
# Extract the geom_point data and find which elements have 'yes'
yes_index <- which(color == "yes")
label_data <- final_build$data[[2]]
yes_coordinates_label <- cbind(label_data[yes_index,], label = names(net)[yes_index])
final +
geom_point(data = yes_coordinates_label, aes(x = x, y = y),
size = 100, color = yes_coordinates_label$colour[1]) + # base-R indexing, so dplyr::first() is not needed
geom_text(data = yes_coordinates_label, aes(x = x, y = y, label = label))
The idea is to plot the dots with geom_point() again but only for the dots which are "yes".
EDIT 2
I couldn't help but think of another solution without plotting the points again. It is possible to retrieve the plot information using ggplot_build() and then to reorder the hierarchy of the points drawn; the datapoints which come first are drawn first. Hence doing the following will work:
library(tidyverse)
# Find the index of the GeomPoint layer
geom_types <- final$layers %>% map("geom") %>% map(class)
GeomPoint_ind <- which(sapply(geom_types, function(x) "GeomPoint" %in% x))
# Retrieve plot information
final_build <- ggplot2::ggplot_build(final)
df <- final_build$data[[GeomPoint_ind]]
# Set the indices which you would like to have on top and modify the ggplot_build object.
yes_index <- which(color == "yes")
final_build$data[[GeomPoint_ind]] <- rbind(df[-yes_index,], df[yes_index,]) # write back to the layer that was read
# Convert to plot object and plot
new_final <- ggplot_gtable(final_build)
plot(new_final)
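The reordering itself is plain data-frame manipulation. A minimal sketch on a made-up table of points (the `id` and `colour` columns here are illustrative, not the real ggplot_build output):

```r
pts <- data.frame(id = 1:5,
                  colour = c("red", "red", "grey", "grey", "grey"))

# rows to draw last, i.e. on top
yes_index <- which(pts$colour == "red")

# everything else first, then the highlighted rows
reordered <- rbind(pts[-yes_index, ], pts[yes_index, ])
draw_order <- reordered$id
```

One caveat: if `yes_index` is empty, `pts[-yes_index, ]` returns zero rows rather than all of them, so it may be worth guarding that case before reordering.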
Following this post and this answer I have an additional question:
library(plotly)
# Create data
dat=data.frame(group = factor(rep(LETTERS[1:4], each=10)), my_x = rep(1:10, 4), my_y = rnorm(40))
str(dat)
# Let's do a first plot
p<-plot_ly(dat)
# Add a trace for each group using a loop
for(i in 1:length(levels(dat$group))){
subs <- subset(dat, group == levels(dat$group)[i])
p<-add_trace(p = p,
data = subs,
y=~my_y,
x=~my_x ,
name=levels(dat$group)[i],
type="scatter",
mode="markers+lines",
hoverinfo="text",
text=~paste0(levels(dat$group)[i], ": x=", round(my_x, 2), "y=", round(my_y, 2)))
}
p
Can anybody tell me why, when I hover over the data points, each label shows the correct x and y values but they are all labelled 'D:', even though the legend shows that the lines represent A, B, C and D? I would like the hover text to be labelled correctly.
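The likely cause is the use of ~ in text: a formula is not evaluated when add_trace() is called but later, when the plot is built. By then the loop has finished and i points at the last level, so every trace is labelled 'D'. Try creating the text from the 'subs' data beforehand and passing the resulting character vector to add_trace: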
p <- plot_ly()
lvls <- levels(dat$group)
for(i in seq_along(lvls)){
subs <- droplevels(subset(dat, group == lvls[i]))
text1 <- with(subs, paste0(lvls[i], ": x=", round(my_x, 2), "y=", round(my_y, 2)))
p <- add_trace(p,
data = subs,
x = ~my_x,
y = ~my_y,
name = lvls[i],
type = 'scatter',
mode = 'markers+lines',
hoverinfo='text',
text=text1)
}
p
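The difference between the two versions comes down to when the label is computed. The sketch below uses only base R and does not reproduce plotly's internals; it just shows that a formula created in a loop keeps the expression and its environment, so evaluating it after the loop sees the final value of `i`, while a value computed inside the loop is fixed immediately.

```r
labels <- c("A", "B", "C", "D")
fs <- vector("list", length(labels))
eager <- character(length(labels))

for (i in seq_along(labels)) {
  fs[[i]] <- ~ labels[i]     # stores the expression, not its value
  eager[i] <- labels[i]      # stores the value right away
}

# evaluate each stored expression after the loop has finished
late <- vapply(fs, function(f) eval(f[[2]], environment(f)), character(1))
```

Here `late` contains the last label four times, while `eager` keeps one label per iteration, which mirrors the 'D:' hover text versus the corrected text1 approach.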
I'm currently constructing a dendrogram and I'm using 'dendextend' to tweak the look of it.
I've been able to do everything I want to (labelling leaves and highlighting branches of my chosen clusters), except drawing rectangles around pre-defined clusters.
My data (which can be sourced from this file: Barra_IBS_example.matrix) was clustered with 'pvclust', so 'pvrect' draws the rectangles in the correct positions, but it cuts through the labels (see image below). I want to reproduce it with 'rect.dendrogram' instead, but I can't figure out how to tell the function to use the clustering data from 'pvclust'.
This is the code I'm using:
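library(pvclust)      # pvclust(), pvpick(), pvrect()
library(dendextend)   # set(), branches_attr_by_labels()
library(RColorBrewer) # brewer.pal()
library(dplyr)        # bind_rows(), mutate(), %>%
# ibs_mat is the matrix read from Barra_IBS_example.matrix (loading step not shown)
idnames <- dimnames(ibs_mat)[[1]]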
ibs.pv <- pvclust(ibs_mat, nboot=1000)
ibs.clust <- pvpick(ibs.pv, alpha=0.95)
names(ibs.clust$clusters) <- paste0("Cluster", 1:length(ibs.clust$clusters))
# Choose a colour palette
pal <- brewer.pal(length(ibs.clust$clusters), "Paired")
# Transform the list to a dataframe
ibs_meta <- bind_rows(lapply(names(ibs.clust$clusters),
function(l) data.frame(Cluster=l, Sample = ibs.clust$clusters[[l]])))
# Add the rest of the non-clustered samples (and assign them as Cluster0), add colour to each cluster
ibs_table <- ibs_meta %>%
rbind(., data.frame(Cluster = "Cluster0",
Sample = idnames[!idnames %in% .$Sample])) %>%
mutate(Cluster_int=as.numeric(sub("Cluster", "", Cluster))) %>%
mutate(Cluster_col=ifelse(Cluster_int==0, "#000000",
pal[Cluster_int])) %>%
.[match(ibs.pv$hclust$labels[ibs.pv$hclust$order], .$Sample),]
hcd <- as.dendrogram(ibs.pv) %>%
#pvclust_show_signif(ibs.pv, show_type = "lwd", signif_value = c(2, 1),alpha=0.25) %>%
set("leaves_pch", ifelse(ibs_table$Cluster_int>0,19,18)) %>% # node point type
set("leaves_cex", 1) %>% # node point size
set("leaves_col", ibs_table$Cluster_col) %>% #node point color
branches_attr_by_labels(ibs_meta$Sample, TF_values = c(2, Inf), attr = c("lwd")) %>% # change branch width
# rect.dendrogram(k=12, cluster = ibs_table$Cluster_int, border = 8, lty = 5, lwd = 1.5,
# lower_rect = 0) %>% # add rectangles around clusters
plot(main="Barramundi samples IBS based clustering")
pvrect(ibs.pv, alpha=0.95, lwd=1.5)
Many thanks, Ido
OK, this took more work than I had hoped, but I have a solution for you.
I created a new function called pvrect2 and just pushed it to the latest version of dendextend on GitHub. Here is a self-contained example demonstrating the solution:
devtools::install_github('talgalili/dendextend')
library(pvclust)
library(dendextend)
data(lung) # 916 genes for 73 subjects
set.seed(13134)
result <- pvclust(lung[, 1:20], method.dist="cor", method.hclust="average", nboot=10)
par(mar = c(9,2.5,2,0))
dend <- as.dendrogram(result)
dend %>%
pvclust_show_signif(result, signif_value = c(3,.5)) %>%
pvclust_show_signif(result, signif_value = c("black", "grey"), show_type = "col") %>%
plot(main = "Cluster dendrogram with AU/BP values (%)")
# pvrect(result, alpha=0.95)
pvrect2(result, alpha=0.95)
text(result, alpha=0.95)
Here is a toy example I have got stuck on
library(plotly)
library(dplyr)
# construct data.frame
df <- tibble(x=c(3,2,3,5,5,5,2),y=c("a","a","a","b","b","b","b"))
# construct data.frame of last y values
latest <- df %>%
group_by(y) %>%
slice(n())
# plot for one value of y (NB not sure why value for 3 appears?)
p <- plot_ly() %>%
add_histogram(data=subset(df,y=="b"),x= ~x) %>%
add_histogram(data=subset(latest,y=="b"),x= ~x,marker=list(color="red")) %>%
layout(barmode="overlay",showlegend=FALSE,title= ~y)
p
How can I set these up as subplots, one for each unique value of y? In the real-world example I would have 20 different y values, so ideally I would loop or apply the code. In addition, it would be good to set a standard x scale of, say, c(1:10) and to have, for example, 2 rows.
TIA
Build a list containing each of the plots.
Set the bin sizes manually for the histograms; otherwise the automatic selection will choose different bins for each of the traces within a plot (making it look strange, as in your example, where the bars of each trace have different widths).
Use subplot to put it all together.
Add titles to the individual subplots using a list of annotations, as explained here.
Like this:
N = nlevels(factor(df$y))
plot_list = vector("list", N)
lab_list = vector("list", N)
for (i in 1:N) {
this_y = levels(factor(df$y))[i]
p <- plot_ly() %>%
# x = ~x: plotly 4+ maps data columns through formulas; a bare x would look for a variable named x
add_trace(type="histogram", data=subset(df,y==this_y), x=~x, marker=list(color="blue"),
autobinx=FALSE, xbins=list(start=0.5, end=6.5, size=1)) %>%
add_trace(type="histogram", data=subset(latest,y==this_y), x=~x, marker=list(color="red"),
autobinx=FALSE, xbins=list(start=0.5, end=6.5, size=1)) %>%
layout(barmode="overlay", showlegend=FALSE)
plot_list[[i]] = p
titlex = 0.5
titley = c(1.05, 0.45)[i]
lab_list[[i]] = list(x=titlex, y=titley, text=this_y,
showarrow=F, xref='paper', yref='paper', font=list(size=18))
}
subplot(plot_list, nrows = 2) %>%
layout(annotations = lab_list)
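To see why fixing the bins helps, the sketch below bins the question's y == "b" values with the same edges (0.5 to 6.5 in steps of 1) using base R's hist(). It is only an illustration of shared bin edges, not of how plotly bins internally.

```r
x_all    <- c(5, 5, 5, 2)   # all x values where y == "b" in the toy data
x_latest <- 2               # the last x value for y == "b"

# same edges as xbins = list(start = 0.5, end = 6.5, size = 1)
breaks <- seq(0.5, 6.5, by = 1)

# counts per integer-centred bin (1, 2, ..., 6), without drawing anything
counts_all    <- hist(x_all,    breaks = breaks, plot = FALSE)$counts
counts_latest <- hist(x_latest, breaks = breaks, plot = FALSE)$counts
```

With identical edges, both traces put the value 2 into the same bin, so the red bar overlays the blue one at the same position and width.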