R Highcharter mosaic / marimekko chart - r

I tried to create a marimekko chart in R Highcharter, following this example:
http://jsfiddle.net/highcharts/h2np93k1/
I cannot seem to get the sortIndex of the treemap to work. My code is as follows:
parentid <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5)
sortIndex <- c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4)
child <- c("Alpha", "Alpha", "Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta", "Beta", "Beta")
childid <- c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200)
colorid <- c(100, 100, 100, 100, 100, 200, 200, 200, 200, 200)
parent <- c("Parent 1", "Parent 2", "Parent 3", "Parent 4", "Parent 5", "Parent 1", "Parent 2", "Parent 3", "Parent 4", "Parent 5")
value <- c(10, 60, 70, 20, 90, 50, 30, 10, 90, 10)
data <- data.frame(parentid, sortIndex, child, childid, colorid, parent, value)
hctreemap2(data, group_vars=c("parentid", "childid"),
size_var="value",
color_var="colorid",
layoutAlgorithm='stripes',
alternateStartingDirection = T,
stacking="percent",
levelIsConstant = F,
sortIndex=sortIndex,
levels = list(
list(level=1, dataLabels = list(enabled=T, align='left', verticalAlign='top'), borderWidth=3),
list(level=2, dataLabels = list(enabled=T))))
Does anyone have any ideas?

I realize this question was asked several years ago, and Highcharter has changed since it was written.
This answer may not have worked in 2018, but it does work now.
At some point hctreemap and hctreemap2 were deprecated. The instructions from Highcharter are to use data_to_hierarchical to prepare the data and then use either hchart() or highchart() to create the treemap. However, this method will strip the sortIndex, so I don't think that's the route you would want to go.
Instead, I prepared the data by formatting it as it is in the JS link you provided and then graphed it.
The data:
library(highcharter)
library(dplyr) # for %>% and mutate()
library(purrr) # for map()
# collect hc colors (like they did in the example)
colrs <- getOption("highcharter.color_palette")
# two data frames, one for each level
# id isn't as important until you go beyond 2 levels
pars <- data.frame(id = unique(data$parent),sortIndex = unique(data$sortIndex))
kids <- data.frame(
name = data$child, parent = data$parent, sortIndex = data$sortIndex,
value = data$value, color = data$colorid/100) %>% mutate(color = colrs[color])
# this assumes data is already sorted by sort order*
newData <- list() # for storing the data as hierarchical
invisible(map(1:nrow(pars),
function(j) {
p <- pars[j, ]$id # collect par id to find children
k <- kids[kids$parent == p, ] # isolate applicable children
pl <- list_parse(pars[j, ]) # make row a list
kl <- list_parse(k) # make each child row a list
newData <<- append(newData, pl) # add parent
newData <<- append(newData, kl) # add that parent's children
}))
Now the data is ready for plotting.
hchart(newData, type ="treemap", layoutAlgorithm = "stripes",
alternateStartingDirection = T)
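The structure that the loop above builds is a flat list in which every parent point is followed by its children. A minimal base R sketch of the same idea, without `<<-` or extra packages, might look like the following. The helper name `rows_to_points` is made up for illustration, and the colour column is left out to keep the example short.

```r
# Same toy data as in the question (only the columns used here)
data <- data.frame(
  parent    = rep(paste("Parent", 1:5), times = 2),
  child     = rep(c("Alpha", "Beta"), each = 5),
  sortIndex = rep(0:4, times = 2),
  value     = c(10, 60, 70, 20, 90, 50, 30, 10, 90, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn each row of a data frame into a plain named list
rows_to_points <- function(df) {
  lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))
}

# One row per parent and one row per child
pars <- unique(data.frame(id = data$parent, sortIndex = data$sortIndex,
                          stringsAsFactors = FALSE))
kids <- data.frame(name = data$child, parent = data$parent,
                   sortIndex = data$sortIndex, value = data$value,
                   stringsAsFactors = FALSE)

# For every parent: the parent point followed by that parent's children
newData <- do.call(c, lapply(seq_len(nrow(pars)), function(j) {
  c(rows_to_points(pars[j, , drop = FALSE]),
    rows_to_points(kids[kids$parent == pars$id[j], , drop = FALSE]))
}))

# Inspect the first parent and its two children
str(newData[1:3], max.level = 2)
```

The resulting list has the same shape as the one produced by the map() loop, so it should be usable in the same hchart() call.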

Related

labeling values over the link and nodes in Sankey Diagram with networkD3 [duplicate]

Background
I was trying to create a Sankey graph like the following figure. Specifically, I want an output where the values (10, 20, 30, 40) are shown on the paths (from one node to another node).
How Did I Try?
At first, I tried using the Plotly library of Python. However, I have read somewhere that it is not possible to show the values on the links or paths of a Sankey graph in Plotly (for Python). Later, I switched to R (also for some other reasons), where I think more resources are available. However, I am facing the same problem here. I have checked many tutorials (e.g., this one) and Q&As (e.g., 1, 2, 3) on SO which are in R. Still, I could not find a tutorial or resource where the values are displayed on the paths!
My Question
How can I display the values on the links/paths of a Sankey graph in R?
Note: This and this question on SO seem to be similar. However, I failed to understand how to incorporate those into my code.
Example Code (collected from here)
# install.packages('networkD3')
library(networkD3)
nodes = data.frame("name" =
c("Node A", # Node 0
"Node B", # Node 1
"Node C", # Node 2
"Node D"))# Node 3
links = as.data.frame(matrix(c(
0, 1, 10, # Each row represents a link. The first number
0, 2, 20, # represents the node being connected from,
1, 3, 30, # the second number represents the node connected to.
2, 3, 40),# The third number is the value of the link
byrow = TRUE, ncol = 3))
names(links) = c("source", "target", "value")
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
fontSize= 50, nodeWidth = 30)
This can be achieved by injecting custom JavaScript code when it's rendered using htmlwidgets::onRender(). The example below will initially position the link labels appropriately, but if the nodes are manually moved, the link labels will not automatically update accordingly. To achieve that, you would probably have to also override the default dragmove behaviour.
library(htmlwidgets)
library(networkD3)
nodes <-
data.frame(
name = c("Node A", "Node B", "Node C", "Node D")
)
links <-
data.frame(
source = c(0, 0, 1, 2),
target = c(1, 2, 3, 3),
value = c(10, 20, 30, 40)
)
p <- sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
fontSize= 20, nodeWidth = 30)
htmlwidgets::onRender(p, '
function(el) {
var nodeWidth = this.sankey.nodeWidth();
var links = this.sankey.links();
links.forEach((d, i) => {
var startX = d.source.x + nodeWidth;
var endX = d.target.x;
var startY = d.source.y + d.sy + d.dy / 2;
var endY = d.target.y + d.ty + d.dy / 2;
d3.select(el).select("svg g")
.append("text")
.attr("text-anchor", "middle")
.attr("alignment-baseline", "middle")
.attr("x", startX + ((endX - startX) / 2))
.attr("y", startY + ((endY - startY) / 2))
.text(d.value);
})
}
')

Save multiple ggplots with different layout matrices

Currently I'm creating multiple plots with regional data and save them to a PDF file. This works without problems, thanks to an SO post I've found (use grid.arrange over multiple pages or marrangeGrob with a layout_matrix).
This is my code so far:
library(ggplot2)
library(gridExtra)
library(dplyr)
data <- data.frame(
region = c("region 1", "region 2", "region 3", rep("region 4", 2), rep("region 5", 2)),
countries = c("country 1", "country 2", "country 3", "country 4", "country 5", "country 6", "country 7"),
dummydata1 = c(rep(1, 7)),
dummydata2 = c(rep(2, 7))
)
criterias <- list()
criterias[[ 'region_1' ]] <- data %>% filter(region == 'region 1')
criterias[[ 'region_2' ]] <- data %>% filter(region == 'region 2')
criterias[[ 'region_3' ]] <- data %>% filter(region == 'region 3')
criterias[[ 'region_4' ]] <- data %>% filter(region == 'region 4')
criterias[[ 'region_5' ]] <- data %>% filter(region == 'region 5')
# This layout matrix should be used for the regional plots
# Never mind the strange numbering; some plots came later
# and it was easier to modify the matrix than all the other functions.
regionLayout <- rbind(
c(1,1,1,1,1,2),
c(NA,NA,3,3,NA,NA), # NA marks an empty cell; NULL would be dropped by c()
c(9,9,4,4,10,10),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7),
c(6,6,6,7,7,7)
)
# This is just a dummy function
# The actual function creates several plots based on the real data
createRegionalPlots <- function (data, region) {
examplePlots <- list(ggplot() + ggtitle('Title (ggtext = plot 1)'),
ggplot() + ggtitle('Month (ggtext = plot 2)'),
ggplot() + ggtitle('Plot 1 (tile = 3)'),
ggplot() + ggtitle('Plot 2 (tile = 4)'),
ggplot() + ggtitle('Plot 3 (geom_bar = 5)'),
ggplot() + ggtitle('Plot 4 (geom_bar = 6)'),
ggplot() + ggtitle('Plot 5 (tile = 7)'),
ggplot() + ggtitle('Plot 6 (tile = 8)'))
}
# Found in https://stackoverflow.com/questions/43491685/
preparePage <- function(plots,layoutMatrix) {
# pdf(file = NULL) #invisible
par(mar=(c(5,5,5,5)))
plotsPerPage <- length(unique(na.omit(c(layoutMatrix))))
ml <- lapply(1:ceiling(length(plots)/plotsPerPage), function(page_IND){
ind <- (1 + ((page_IND - 1) * plotsPerPage )) : (page_IND * plotsPerPage)
grid.arrange(grobs = plots[ind], layout_matrix = layoutMatrix)
})
return(marrangeGrob(grobs=ml,nrow=1,ncol=1,top=NULL))
# dev.off() #invisible
}
# Here I'm running through all regions
regionalPlotList <- list()
for (region in names(criterias)) {
regionData <- criterias[[region]]
regionalPlots <- createRegionalPlots(data = regionData, region = region)
regionalPlotList <- do.call(c, list(regionalPlotList, regionalPlots))
}
# This leaves me with a list of 40 plots (5 regions x 8 plots)
allPlots <- preparePage(regionalPlotList, regionLayout)
ggsave("example.pdf",width = 297, height = 210, units = "mm", plot = allPlots)
As said, this works perfectly and leaves me (using the current data) with a five-page report, one page per region, with the required layout.
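The page-splitting arithmetic inside preparePage can be checked in isolation. The sketch below is a base R illustration, not part of the original code: it assigns plot indices to pages using the numbers from this example (40 plots, 8 distinct ids in the layout matrix). Unlike the `ind` formula above, it never indexes past the end of the list when the plot count is not a multiple of the page size.

```r
n_plots        <- 40  # 5 regions x 8 plots, as in the example
plots_per_page <- 8   # number of distinct plot ids in the layout matrix

# Assign each plot index to a page; the last page may hold fewer plots
pages <- split(seq_len(n_plots), ceiling(seq_len(n_plots) / plots_per_page))

length(pages)  # number of pages
pages[[2]]     # plot indices that end up on the second page
```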
I have now been asked to add additional per-country plots at the end of the regional report, and these pages should have a different layout (and different plots).
Overestimating myself (and my knowledge of R and ggplot2) once again, I thought of this as an easy job (which it probably is for everyone else, but I'm stuck).
So, I've created a list of new criteria and a function, including a new layout:
createCountryPlots <- function(data, country) {
exampleCountryPlots <- list(ggplot() + ggtitle('Title (ggtext = plot 1)'),
ggplot() + ggtitle('Month (ggtext = plot 2)'),
ggplot() + ggtitle('Plot 1 (bar = 3)'),
ggplot() + ggtitle('Plot 2 (pie = 4)'),
ggplot() + ggtitle('Plot 3 (geom_bar = 5)'),
ggplot() + ggtitle('Plot 4 (geom_bar = 6)')
)
}
countryLayout = rbind(
c(1, 1, 1, 1, 1, 2),
c(3, 3, 3, 4, 4, 4),
c(3, 3, 3, 4, 4, 4),
c(3, 3, 3, 4, 4, 4),
c(5, 5, 5, 6, 6, 6),
c(5, 5, 5, 6, 6, 6),
c(5, 5, 5, 6, 6, 6)
)
# prepare the data per country
countryCriterias <- list()
countryCriterias[[ 'country_1' ]] <- data %>% filter(countries == 'country 1')
countryCriterias[[ 'country_2' ]] <- data %>% filter(countries == 'country 2')
# Running through all selected countries
countryPlotList <- list()
for (country in names(countryCriterias)) {
countryData <- countryCriterias[[country]]
countryPlots <- createCountryPlots(data = countryData, country = country)
countryPlotList <- do.call(c, list(countryPlotList, countryPlots))
}
countryPlots <- preparePage(countryPlotList, countryLayout)
# Just saving the country plots works perfectly again
ggsave("example.pdf",width = 297, height = 210, units = "mm", plot = countryPlots)
Saving these plots in a separate file works without any problems, but I'm currently stuck on how to combine them into one single PDF while respecting the different layouts the pages should have.
I've tried several possibilities (i.e. grid.arrange and arrangeGrob etc.), but I haven't been able to combine the plots into a single file.
Could anyone please enlighten me?
Edit:
Sorry, if I didn't make myself clear enough. This would be the result I should have at the end.
Thanks to the hint by @teunbrand to have a look at the patchwork package, I've found a solution to my problem.
It's in general almost the same as before, but instead of trying to arrange the plots first and then saving them, I "print" them directly to a pdf in the for-loop.
# defining the layouts (simplified)
regionLayout <- "
AAAAAB
##CC##
DDEEFF
GGGHHH
GGGHHH"
countryLayout <- "
AAAAAB
CCCCDD
CCCCDD
EEEEFF
EEEEFF
"
library(patchwork) # provides wrap_plots()
# opening pdf
pdf('example5.pdf', pagecentre = FALSE, width = 29.7/2.54, height = 21/2.54)
par(mar = c(5, 5, 5, 5), oma = c(1, 1, 1, 1))
for (region in names(criterias)) {
regionData <- criterias[[region]]
regionalPlots <- createRegionalPlots(data = regionData, region = region)
# as regionalPlots is a list of plots, I'm using wrap_plots, which can take a dynamic
# number of plots
print(wrap_plots(regionalPlots, design = regionLayout))
}
# then the same for the country plots, with a different layout
countryPlotList <- list()
for (country in names(countryCriterias)) {
countryData <- countryCriterias[[country]]
countryPlots <- createCountryPlots(data = countryData, country = country)
print(wrap_plots(countryPlots, design = countryLayout))
}
dev.off()
And at the end I have my PDF with separate layouts...
Thank you all for your help!!!
PS: It took me a while to find out why the PDF was always empty, before I realized that wrap_plots just arranges the plots but does not print them. As said, I'm relatively new to R (did I mention that?)

"Nested" barplots, with multiple levels of grouping

How can I group bars in a barplot by a third variable?
I would like to achieve this in base R, without, for example, ggplot2, as in this related question. In another related question the groups of groups are labeled, but not (visually) grouped – as in my example above –, making the plot difficult to read.
Sample data:
groups = c("A", "B")
choices = c("orange", "apple", "beer")
supergroups = c("fruits", "non-fruits")
dat <- data.frame(
group = rep(groups, c(93, 94)),
choice = factor(c(
rep(choices, c(51, 30, 12)),
rep(choices, c(47, 29, 18))
),
levels = choices
),
supergroup = c(
rep(supergroups, c(81, 12)),
rep(supergroups, c(76, 18))
)
)
barplot(table(dat), beside = TRUE)
Which returns the error:
Error in barplot.default(table(dat), beside = TRUE) :
'height' must be a vector or a matrix
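The error occurs because table(dat) on three columns is a three-dimensional array, while barplot() only accepts a vector or a matrix. One possible base R approach, offered as a sketch and not taken from the original thread, is to collapse the table to a choice-by-group matrix and then draw the bars from a plain vector, using a per-bar `space` vector to open small gaps between supergroups and larger gaps between groups. The gap sizes and colours below are arbitrary choices.

```r
groups      <- c("A", "B")
choices     <- c("orange", "apple", "beer")
supergroups <- c("fruits", "non-fruits")
dat <- data.frame(
  group  = rep(groups, c(93, 94)),
  choice = factor(c(rep(choices, c(51, 30, 12)), rep(choices, c(47, 29, 18))),
                  levels = choices),
  supergroup = c(rep(supergroups, c(81, 12)), rep(supergroups, c(76, 18)))
)

dim(table(dat))  # three dimensions, hence the error from barplot()

# Collapse to a choice x group matrix; the supergroup is implied by the choice
tab2 <- table(dat$choice, dat$group)

# One height per bar, ordered group by group
heights <- as.vector(tab2)

# Space before each bar: none within a supergroup, medium between
# supergroups, large between groups (values picked by eye)
gaps <- c(0.5, 0, 0.5,  # group A: orange, apple | beer
          1.5, 0, 0.5)  # group B: orange, apple | beer

mids <- barplot(heights, space = gaps,
                names.arg = rep(rownames(tab2), times = ncol(tab2)),
                col = rep(c("orange", "tomato", "grey60"), times = ncol(tab2)))

# Label each cluster of three bars with its group, centred under the cluster
group_of_bar <- rep(colnames(tab2), each = nrow(tab2))
mtext(colnames(tab2), side = 1, line = 2.5,
      at = as.vector(tapply(mids, group_of_bar, mean)))
```

Because `space` is given per bar, the same pattern extends to more choices or supergroups by editing the `gaps` vector.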

Subgroups in R timevis

Utilizing the grouping feature in the excellent R timevis package is well documented and examples are provided in the help page of timevis::timevis().
The documentation also says that it is possible to define subgroups, which
"Groups all items within a group per subgroup, and positions them on the same height instead of stacking them on top of each other."
I am having trouble understanding how to use this feature. For example, in the example below, I would expect that "event 1" and "event 2" are defined as their own subgroups and hence they would be positioned on the same height. However, this is not the case.
timedata <- data.frame(
id = 1:6,
start = Sys.Date() + c(1, - 10, 4, 20, -10, 10),
end = c(rep(as.Date(NA), 4), Sys.Date(), Sys.Date() + 20),
group = c(1,1,1,2,2,2),
content = c("event 1", "event 2", "event 2", "event 1", "range 1", "range 1"),
subgroup = c("1.1", "1.2", "1.2", "2.1", "2.2", "2.2")
)
groups <- data.frame(id = c(1,2), content = c("g1", "g2"))
timevis::timevis(data =timedata, groups = groups)
The result of the example code. The definition of subgroups is unsuccessful.
How to correctly utilize the subgroups feature?
I'm working through the subgroup and subgroupOrder functions myself, and wanted to share a couple of tips. The code below should achieve overlaying the events on top of each other, as opposed to stacking them. Note the addition of stack = FALSE in the options list().
The other place to look is at the JS documentation: http://visjs.org/docs/timeline/
timedata <- data.frame(
id = 1:6,
start = Sys.Date() + c(1, - 10, 4, 20, -10, 10),
end = c(rep(as.Date(NA), 4), Sys.Date(), Sys.Date() + 20),
group = c(1,1,1,2,2,2),
content = c("event 1", "event 2", "event 2", "event 1", "range 1", "range 1"),
subgroup = c("1.1", "1.2", "1.2", "2.1", "2.2", "2.2")
)
groups <- data.frame(id = c(1,2), content = c("g1", "g2"))
timevis::timevis(data =timedata, groups = groups, options = list(stack = FALSE))
Produces this output,
Not sure if that's exactly what you're trying to achieve, but just a response. Hope you've made some progress otherwise!

a single ColorRamp palette yields different color schemes in two plotly scatterplots

As the title says, a single colorRampPalette palette, when used in two different scatterplots built with plotly, gives two slightly different-looking colorbars (note the middle part), even though the upper and lower bounds of the corresponding data sets are manually set to be identical in both plots.
I'd like to make the plots visually comparable, and for that I'd obviously have to have identical colorbars. Is there a way to do that?
Here's the code:
myxaxis <- list(range = c(16, 44), dtick=2, gridwidth = 1, title = "Length of carbon chain") #setting the visible area of x axis
myyaxis <- list(range = c(0, 8), gridwidth = 1, title = "No. of double bonds") #setting the visible area of y axis
mycolors <- colorRampPalette(c('green', 'red', 'black'))(n = 100) #creating a 100-colour palette with grDevices::colorRampPalette
ch_new1 <- cbind.data.frame(c('PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'PA', 'upper bound', 'lower bound'), c(4.571087, 6.522441, 6.522441, 5.081869, 4.471815, 5.744834, 7.329796, 5.472866, 5.744834, 1, 1), c(10.52337, 16.75454, 16.0976, 16.47356, 18.94973, 17.46351, 10.97607, 18.11186, 11.64033, 0.2085327, 71.18021), c(32, 34, 34, 36, 36, 36, 38, 38, 38, 100, 100), c(1, 1, 2, 2, 3, 4, 4, 5, 6, 100, 100), c(0.4128963, 16.68394, 26.52718, 23.50851, 16.02339, 3.971546, 6.854153, 3.24342, 2.774968, 1, 1)) #the first dataset
colnames(ch_new1) <- c('Species', 'log_inversed_pval','fold_difference', 'N_of_carbons','N_of_double_bonds', 'rel_abund')
d <- plot_ly(ch_new1, x=~N_of_carbons, y=~N_of_double_bonds, text = ~paste('Percent of total', Species, '=', round(rel_abund, 0)), size=~rel_abund, color=~fold_difference, colors = mycolors)%>% #producing the scatter plot
layout(
xaxis = myxaxis,
yaxis = myyaxis,
title = paste('PA', '2b')
)%>%
colorbar(title="Fold difference", ypad=20)
export(d)
ch_new2 <- cbind.data.frame(c('LPC', 'LPC', 'LPC', 'lower limit', 'upper limit'), c(7.329796, 7.329796, 5.081869, 1, 1), c(2.952345, 5.042931, 3.700331, 0.2085327, 71.18021), c(18, 20, 22, 100, 100), c(0, 3, 5, 100, 100), c(82.87528, 13.56943, 3.555281, 1, 1)) #the second dataset
colnames(ch_new2) <- c('Species', 'log_inversed_pval','fold_difference', 'N_of_carbons','N_of_double_bonds', 'rel_abund')
d <- plot_ly(ch_new2, x=~N_of_carbons, y=~N_of_double_bonds, text = ~paste('Percent of total', Species, '=', round(rel_abund, 0)), size=~rel_abund, color=~fold_difference, colors = mycolors)%>% #creating the second scatterplot
layout(
xaxis = myxaxis,
yaxis = myyaxis,
title = 'LPC' # originally paste(unique(ch$Species)[i], fraction), which uses variables from the author's loop
)%>%
colorbar(title="Fold difference", ypad=20)
export(d)
chart #1 with bright red middle
chart #2 with dim red middle
I've solved the problem on my own.
It turns out that adding one or several "anchoring" dummy points placed beyond the margins of the plot (so they are not shown) makes the plot colorbars almost identical.
The initial dataset
ch_new1 <- cbind.data.frame(c(...)) #the first dataset
should be appended with anchoring dummy points:
ch_new1 <- cbind.data.frame(c(...)) #the first dataset
ch_new1 <- rbind(ch_new1, list('middle anchor point', 1, 50, 100, 100, 1))
ch_new1 <- rbind(ch_new1, list('quarter anchor point', 1, 25, 100, 100, 1))
tl;dr anchor the variable responsible for colorbar to multiple reference points (10, 20, 30, 40, 50, ...)
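The anchor rows can also be generated in one step instead of one rbind() per point. The sketch below assumes the column names from the question and uses a shortened stand-in for ch_new1. The anchor spacing (every 10 units of fold_difference) is an arbitrary choice for illustration.

```r
# Small stand-in for ch_new1 with the same column names as in the question
ch_new1 <- data.frame(
  Species           = c("PA", "PA", "lower bound", "upper bound"),
  log_inversed_pval = c(4.571087, 6.522441, 1, 1),
  fold_difference   = c(10.52337, 16.75454, 0.2085327, 71.18021),
  N_of_carbons      = c(32, 34, 100, 100),
  N_of_double_bonds = c(1, 1, 100, 100),
  rel_abund         = c(0.4128963, 16.68394, 1, 1),
  stringsAsFactors  = FALSE
)

# Anchor values for the colour variable (assumed spacing: every 10 units)
anchor_values <- seq(10, 70, by = 10)

anchors <- data.frame(
  Species           = paste("anchor", anchor_values),
  log_inversed_pval = 1,
  fold_difference   = anchor_values,
  N_of_carbons      = 100,  # outside the visible x range c(16, 44)
  N_of_double_bonds = 100,  # outside the visible y range c(0, 8)
  rel_abund         = 1,
  stringsAsFactors  = FALSE
)

# Append all anchors at once; they are never drawn inside the axis ranges
ch_new1 <- rbind(ch_new1, anchors)
```

Appending the same `anchors` data frame to every dataset keeps the set of reference points identical across plots.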
