Multiple series in a Highcharter (R) stacked bar chart

After going through the highcharter package documentation, visiting JBKunst's website, and looking into list.parse2(), I still cannot solve the problem. The problem is as follows: I am looking to chart multiple series from a data.frame in a stacked bar chart, and there can be anywhere from 10 to 30 series. For now the series have been defined as below, but there has to be an easier way, for example passing a list or a melted data.frame to hc_series, similar to what can be done with ggplot2.
Below is the code with dummy data:
library(highcharter)
mydata <- data.frame(A = runif(10),
                     B = runif(10),
                     C = runif(10))
highchart() %>%
  hc_chart(type = "column") %>%
  hc_title(text = "MyGraph") %>%
  hc_yAxis(title = list(text = "Weights")) %>%
  hc_plotOptions(column = list(
    dataLabels = list(enabled = FALSE),
    stacking = "normal",
    enableMouseTracking = FALSE)
  ) %>%
  hc_series(list(name = "A", data = mydata$A),
            list(name = "B", data = mydata$B),
            list(name = "C", data = mydata$C))
Which produces this chart:

In my opinion, a good approach to adding multiple series is hc_add_series_list (of course, you can also use a for loop), which needs a list of series (a series is, for example, list(name = "A", data = mydata$A)).
As you said, you need to melt/gather the data; you can use the tidyr package:
library(tidyr)
mynewdata <- gather(mydata)
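If you would rather stay in base R, stack() gives an equivalent long format. This is a swapped-in alternative to tidyr::gather(), not what the answer uses; note that the resulting columns are called values and ind instead of value and key.

```r
# Dummy data as in the question (seed added so the example is reproducible)
set.seed(42)
mydata <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# Base R alternative to tidyr::gather(): one row per (series, value) pair
long <- stack(mydata)

# 'values' holds the numbers, 'ind' is a factor naming the source column
str(long)
```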
Then you need to group the data by the key column to create the data for each key/series. Here you can use the dplyr package:
library(dplyr)
mynewdata2 <- mynewdata %>%
  # rename key to name so the series label appears in the legend
  group_by(name = key) %>%
  # the data in this case is simply the .$value column
  do(data = .$value)
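The resulting data frame has two columns; the second one (data) is a list-column holding the ten values of each series.
Now you need this information as a list, so we parse it with list.parse3 instead of list.parse2, because list.parse3 preserves names such as name and data. (In more recent highcharter versions these functions are called list_parse and list_parse2.)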
series <- list.parse3(mynewdata2)
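For readers without dplyr or list.parse3(), here is a base R sketch that builds the same kind of named series list from long data, using split() and Map() in place of group_by()/do() plus list.parse3(). It assumes the long format produced by stack() (columns values and ind).

```r
set.seed(42)
mydata <- data.frame(A = runif(10), B = runif(10), C = runif(10))
long <- stack(mydata)

# Split the values by series name: a named list of numeric vectors
by_series <- split(long$values, long$ind)

# Turn each piece into list(name = ..., data = ...), the series shape described above
series <- unname(Map(function(nm, vals) list(name = nm, data = vals),
                     names(by_series), by_series))

str(series[[1]])
```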
So finally change:
hc_series(list(name="A",data=mydata$A),
list(name="B",data=mydata$B),
list(name="C",data=mydata$C))
by:
hc_add_series_list(series)
Hope this is clear.
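Since mydata is already wide (one column per series), the reshaping can also be skipped: a minimal sketch that builds the series list straight from the columns with lapply(), which is one way of "passing a list" as the question hoped. It works the same for 10 to 30 columns.

```r
set.seed(42)
mydata <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# One list(name = ..., data = ...) per column, in column order
series <- lapply(names(mydata), function(nm) list(name = nm, data = mydata[[nm]]))

# Same content as the hand-written hc_series() arguments for A, B and C
vapply(series, function(s) s$name, character(1))
```

The result can then go to hc_add_series_list(series) in the same pipeline as before (not run here, since that step needs highcharter).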

Related

Highcharter Sankey diagram with repeated "to" and "from" node names

I am trying to visualise migration data with a Sankey diagram, in which names of nodes will be repeated between the "from" and "to" columns of the data frame.
Unfortunately, highcharter merges nodes that share a name into a single node, which makes the edges go back and forth:
# import and prepare the data
flows <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv",
header = TRUE,
check.names = FALSE)
flows$from <- rownames(flows)
library(tidyr)
flows <- flows %>%
pivot_longer(-from, names_to = "to", values_to = "weight")
# visualise
library(highcharter)
hchart(flows, "sankey")
How would one force the nodes to be placed on two separate columns, while keeping the same colour for each area/continent?
I have used the workaround of renaming the "to" nodes so they don't share names (e.g. prepending "to " to each of them), but I would like to keep the same names and have the colours match.
# extra data preparation step for partial workaround
flows$to <- paste("to", flows$to)
I had the same trouble and it was very frustrating. The only approach that worked relatively well for me was, following your idea, adding white space before the names in the "to" column, like this:
data %>% data_to_sankey() %>% mutate(to = paste(" ", to)) %>% hchart(type = "sankey")
I hope this can help you.
Thank you!
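Another possible route, sketched under assumptions: give the "to" nodes distinct ids but the same displayed name and colour, using the nodes option of the Highcharts sankey series (each node entry accepts id, name and color). The code below only prepares the data in base R; the small flows table and the "to " id prefix are hypothetical, and how the nodes list is handed to highcharter (for example as an extra nodes argument to hc_add_series(..., type = "sankey")) is an untested assumption.

```r
# Hypothetical miniature of the migration table (from, to, weight)
flows <- data.frame(
  from   = c("Africa", "Africa", "Europe", "Europe"),
  to     = c("Africa", "Europe", "Africa", "Europe"),
  weight = c(5, 3, 2, 7),
  stringsAsFactors = FALSE
)

# One colour per region, shared by its source node and its target node
regions <- sort(unique(c(flows$from, flows$to)))
region_cols <- setNames(hcl.colors(length(regions)), regions)

# Links point at a distinct target id ("to " prefix), so nodes are not merged
links <- data.frame(from = flows$from, to = paste("to", flows$to),
                    weight = flows$weight, stringsAsFactors = FALSE)

# Node options: unique ids, but the displayed name is the plain region name
make_node <- function(id, region) {
  list(id = id, name = region, color = unname(region_cols[region]))
}
nodes <- c(
  lapply(regions, function(r) make_node(r, r)),
  lapply(regions, function(r) make_node(paste("to", r), r))
)
```

With the real data, flows would be the pivoted table from the question; the design choice is that uniqueness lives in id while the labels and colours come from the shared region name.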

make multiple separate stacked barplots from one data frame

I would like to create multiple grouped and stacked barplots from several data frames and export the plots to a single PDF file.
I have several data frames with the same format but varying values. For each data frame I would like to create multiple stacked and grouped bar plots. Ideally, the bar plots of the same group from the different data frames should be placed next to each other and share the same y-axis range (to make the data frames easy to compare visually).
Here is an example of what my data looks like:
data1 <- data.frame(group = c('A','A','A','A','B','B','B','B','C','C','C','C'),
                    Year = c('2012','2013','2014','2015','2012','2013','2014','2015','2012','2013','2014','2015'),
                    Fruit = c(5,3,6,3,5,4,2,2,3,4,6,2),
                    Vegetables = c(3,6,1,4,8,9,43,2,1,5,0,1),
                    Rice = c(20,23,53,12,45,5,23,12,32,41,54,32))
data2 <- data.frame(group = c('A','A','A','A','B','B','B','B','C','C','C','C'),
                    Year = c('2012','2013','2014','2015','2012','2013','2014','2015','2012','2013','2014','2015'),
                    Fruit = c(2,4,5,2,3,9,4,7,5,7,4,7),
                    Vegetables = c(9,7,8,7,4,3,0,0,2,3,5,6),
                    Rice = c(23,12,32,41,54,32,20,23,53,12,45,5))
I started by formatting the tables like this:
library(tidyr)
data1 <- pivot_longer(data1, cols = 3:5, names_to = 'Type', values_to = 'value')
data2 <- pivot_longer(data2, cols = 3:5, names_to = 'Type', values_to = 'value')
My attempts to use ggplot to create the desired PDF have failed so far. I tried several approaches but could not get close to the desired PDF. I found instructions on how to create several plots from one data frame, or grouped plots, or stacked plots, but never the combination of all three.
If possible, the PDF for this example should look like this:
In total 6 plots: the left 3 plots for data1, the right 3 plots for data2; group A in row 1, group B in row 2, group C in row 3 (if possible, the same y-axis range within one row/group)
All bar plots: x-axis = years, y-axis = value; one stacked bar per year, with colors matching Type (Fruit, Vegetables, Rice)
Group name per row
Data source (data1, data2) per column
Legend with Types (Fruit, Vegetables, Rice)
Q1. Is something like this possible, or would one have to create two PDFs (one for each data frame, here data1 and data2)?
Q2. Is it possible to write the code so that it automatically adjusts the number of plots to the data frames, adjusts the PDF size, and starts a new page when necessary? (In reality I have 5 data frames and 13 groups, but this may change over time.)
I know this is quite difficult code to write. I have already spent two working days on it, which is why I am now asking for help here. I will try again tomorrow and post any progress here.
Thank you very much for any suggestions
This code should produce the desired plot (or at least something very close). The two critical steps are: 1) joining all the data frames into a single one with bind_rows, and 2) using facet_grid to define the layout panels according to two variables (group and id).
library(tidyverse)
# Combine the data
# id column contains the number of the data frame the data comes from
df <- bind_rows(data1, data2, .id = "id")
p <- df %>%
  # Change to long format; the value columns are now 4:6 because id was added
  pivot_longer(cols = 4:6,
               names_to = 'Type',
               values_to = 'value') %>%
  # Do plot
  ggplot(aes(x = Year,
             y = value,
             fill = Type)) +
  # stacked columns
  geom_col() +
  # Facets by two factors, groups and data source (id)
  facet_grid(group ~ id)
# Save plot to pdf
ggsave("my_plot.pdf",
       plot = p,
       device = "pdf",
       height = 15,
       width = 20,
       units = "cm",
       dpi = 300)
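The answer does not cover Q2 (more groups than fit on one page). As a swapped-in base R alternative to facet_grid(), here is a sketch that draws one row per group and one column per data frame, starts a new PDF page every groups_per_page groups, and shares the y-axis limit within each row. The helper name plot_pages(), the generated data frames and the page sizes are hypothetical, and it assumes every group appears in every data frame (wide format, as in the question).

```r
# Hypothetical helper: multi-page PDF, rows = groups, columns = data frames
plot_pages <- function(dfs, file, groups_per_page = 3,
                       types = c("Fruit", "Vegetables", "Rice")) {
  groups <- sort(unique(unlist(lapply(dfs, function(d) d$group))))
  # Chunk the groups so that each chunk fills one page
  pages <- split(groups, ceiling(seq_along(groups) / groups_per_page))
  fill_cols <- hcl.colors(length(types))

  # Page size grows with the number of data frames (columns) and rows per page
  pdf(file, width = 4 * length(dfs), height = 3 * groups_per_page)
  on.exit(dev.off())
  for (pg in pages) {
    # Resetting mfrow makes the next plot start on a fresh page
    par(mfrow = c(groups_per_page, length(dfs)), mar = c(3, 3, 2, 1))
    for (g in pg) {
      # Same y-axis limit for every panel in this row (group)
      ymax <- max(vapply(dfs, function(d) max(rowSums(d[d$group == g, types])),
                         numeric(1)))
      for (nm in names(dfs)) {
        d <- dfs[[nm]][dfs[[nm]]$group == g, ]
        m <- t(as.matrix(d[, types]))   # rows = types, columns = years
        colnames(m) <- d$Year
        barplot(m, col = fill_cols, ylim = c(0, ymax),
                main = paste(nm, "- group", g), legend.text = TRUE,
                args.legend = list(x = "topleft", cex = 0.6))
      }
    }
  }
  invisible(length(pages))
}

# Usage with two generated data frames in the question's wide format
make_df <- function(seed) {
  set.seed(seed)
  data.frame(group = rep(c("A", "B", "C"), each = 4),
             Year = rep(c("2012", "2013", "2014", "2015"), 3),
             Fruit = sample(1:10, 12, TRUE),
             Vegetables = sample(1:10, 12, TRUE),
             Rice = sample(10:50, 12, TRUE),
             stringsAsFactors = FALSE)
}
out <- tempfile(fileext = ".pdf")
n_pages <- plot_pages(list(data1 = make_df(1), data2 = make_df(2)), out,
                      groups_per_page = 2)
```

With 5 data frames and 13 groups, passing all five in the named list and keeping groups_per_page at a readable value would give ceiling(13 / groups_per_page) pages.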

Is there a modification or another function to plot two numerical variables against one string variable?

I have a data set like this one: Names of mutations and two numerical variables representing values in two conditions (CIP and TIG):
I was able to plot one variable (e.g. CIP) in these mutation using the following code:
(The data frame is named Dotchart2.)
dotchart(Dotchart2$`CIP resistance`,
         labels = rownames(Dotchart2), pch = 16, cex = 1, pt.cex = 2)
This appeared as follows:
Since I am comparing CIP vs TIG, I would like the same figure but with a second dot for TIG on each mutation (i.e. on each horizontal mutation line there would be two dots of different colors, one for the CIP value and one for the TIG value). It should look like this figure, for instance.
Could anyone provide simple code for this?
I think you'll find your answer here.
In the link provided, @JoshO'Brien creates a dot plot with lattice:
autos_data <- read.table("~/Documents/R/test.txt", header=F)
library(lattice)
dotplot(V1~V2, data=autos_data)
This documentation does a thorough job of explaining and detailing graph styles (graph_type), data graphing (formula), and the data source (data=), resulting in the following:
library(lattice)
# general form: graph_type(formula, data = ...), e.g. dotplot(V1 ~ V2, data = autos_data)
To do this easily in lattice or ggplot2 you first need to convert your data to long format. I don't have a data set handy in the right format, so I took the famous iris data set and converted it to a wide-format data set called iris_wide (see code at the bottom). I'm using tidyverse here: all of this can also be done in base R.
(To understand what's going on here you should definitely examine the iris_wide and iris_long objects.)
convert from wide to long format
library(tidyverse)
iris_long <- iris_wide %>%
pivot_longer(cols=-id,names_to="species",values_to="value")
lattice version
lattice::dotplot(id ~ value, data = iris_long, groups = species, pch = 16,
                 auto.key = TRUE)
ggplot version
ggplot(iris_long, aes(value,id,colour=species))+geom_point()
convert iris data from long to wide
To match your example, I'm selecting only two categories (species) and one variable (sepal length)
iris_wide <- (iris
%>% filter(Species %in% c("setosa","virginica"))
%>% select(Sepal.Length, Species)
%>% group_by(Species)
%>% mutate(id=seq(n()))
%>% pivot_wider(names_from=Species, values_from=Sepal.Length)
%>% head(10)
%>% mutate(id=LETTERS[seq(n())])
)
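Since the question started from base R's dotchart(), it can also be done there without reshaping: draw the CIP values with dotchart() and add the TIG values on the same rows with points(). Without groups, dotchart() places the i-th value at height i, which is why seq_len() lines them up. The data frame below is a hypothetical stand-in for Dotchart2 (made-up values and row names), and the column name TIG resistance is assumed by analogy with CIP resistance.

```r
# Hypothetical stand-in for Dotchart2: mutations as row names, one column per condition
Dotchart2 <- data.frame(`CIP resistance` = c(2.1, 3.5, 1.2, 4.0),
                        `TIG resistance` = c(1.0, 2.8, 2.5, 3.1),
                        row.names = paste0("mutation_", 1:4),
                        check.names = FALSE)

out <- tempfile(fileext = ".pdf")
pdf(out)
# Shared x-range so both conditions fit on the axis
x_range <- range(Dotchart2$`CIP resistance`, Dotchart2$`TIG resistance`)
dotchart(Dotchart2$`CIP resistance`, labels = rownames(Dotchart2),
         pch = 16, pt.cex = 2, color = "steelblue", xlim = x_range,
         xlab = "Resistance")
# Second dot on each mutation row (row i sits at y = i)
points(Dotchart2$`TIG resistance`, seq_len(nrow(Dotchart2)),
       pch = 16, cex = 2, col = "tomato")
legend("bottomright", legend = c("CIP", "TIG"), pch = 16,
       col = c("steelblue", "tomato"))
dev.off()
```

Drop the pdf()/dev.off() lines to draw in the interactive plot window instead.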

Plotting only 1 hourly datapoint (1 per day) alongside hourly points (24 per day) in R Studio

I am a bit stuck with some code. Of course I would appreciate a piece of code that solves my dilemma, but I am also grateful for hints on how to sort it out.
Here goes:
First of all, I installed the packages (ggplot2, lubridate, and openxlsx)
The relevant part:
I extract a file from the website of an Italian gas TSO:
Storico_G1 <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",sheet = "Storico_G+1", startRow = 1, colNames = TRUE)
Then I created a data frame with the variables I want to keep:
Storico_G1_df <- data.frame(Storico_G1$pubblicazione, Storico_G1$IMMESSO, Storico_G1$`SBILANCIAMENTO.ATTESO.DEL.SISTEMA.(SAS)`)
Then change the time format:
Storico_G1_df$pubblicazione <- ymd_h(Storico_G1_df$Storico_G1.pubblicazione)
Now the struggle begins. In this example I would like to chart the two time series with two different y-axes because their ranges are very different. This is not really a problem as such, because I can achieve it with the melt function and ggplot. However, since there are NAs in one column, I don't know how to work around that. In the incomplete (SAS) column I mainly care about the data point at 16:00, so ideally I would have hourly points on one chart and only one data point per day (at 16:00) on the second chart. I attached an unrelated example picture of the chart style I mean. However, in the attached chart both panels have equally many data points, so it works fine.
Grateful for any hints.
Take care
library(lubridate)
library(ggplot2)
library(openxlsx)
library(dplyr)
# Use na.strings: NAs appear to be coded in several ways in this dataset
storico.xl <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",
sheet = "Storico_G+1", startRow = 1,
colNames = TRUE,
na.strings = c("NA","N.D.","N.D"))
# Select and rename the unwieldy column names
storico.g1 <- data.frame(storico.xl) %>%
select(pubblicazione, IMMESSO, SBILANCIAMENTO.ATTESO.DEL.SISTEMA..SAS.)
names(storico.g1) <- c("date_hour","immesso","sads")
# the date column is in ymd_h format
storico.g1 <- storico.g1 %>% mutate(date_hour = ymd_h(date_hour))
#Not sure exactly what you want to plot, but here is each point by hour
ggplot(storico.g1, aes(x= date_hour, y = immesso)) + geom_line()
# For each day you can group; this needs the date part of date_hour
# You can check there are 24 points per day in the count column
# then feed the new columns into ggplot
storico.g1 %>%
  group_by(date = as.Date(date_hour)) %>%
  summarise(count = n(),
            daily.immesso = sum(immesso, na.rm = TRUE)) %>%
  ggplot(aes(x = date, y = daily.immesso)) + geom_line()
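The answer does not yet pick out the 16:00 SAS point. A sketch of one way to do it, under assumptions: keep the full hourly series for immesso, keep only the non-missing 16:00 rows for sads, and draw them in two stacked base R panels so each gets its own y-axis. Column names follow the answer's storico.g1 (date_hour, immesso, sads); the synthetic two-day data frame is hypothetical and only stands in for the downloaded file.

```r
# Hypothetical stand-in for storico.g1: two days of hourly rows (UTC)
date_hour <- seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
storico.g1 <- data.frame(date_hour = date_hour,
                         immesso = 200 + 10 * sin(seq_along(date_hour) / 4),
                         sads = NA_real_)
# Only a few hours carry a SAS value, as in the real data
storico.g1$sads[c(10, 17, 34, 41)] <- c(-3, -5, 2, 4)

# Keep one SAS point per day: the 16:00 row, when it is not NA
hour <- as.integer(format(storico.g1$date_hour, "%H", tz = "UTC"))
sas_daily <- storico.g1[hour == 16 & !is.na(storico.g1$sads), c("date_hour", "sads")]

# Two panels sharing the time range, each with its own y scale
out <- tempfile(fileext = ".pdf")
pdf(out)
par(mfrow = c(2, 1), mar = c(4, 4, 1, 1))
x_lim <- as.numeric(range(storico.g1$date_hour))
plot(immesso ~ date_hour, data = storico.g1, type = "l", xlim = x_lim,
     xlab = "", ylab = "immesso (hourly)")
plot(sads ~ date_hour, data = sas_daily, type = "b", pch = 16, xlim = x_lim,
     xlab = "time", ylab = "SAS at 16:00")
dev.off()
```

In ggplot2, a comparable layout could come from stacking both series in long format and using facet_wrap(~ series, ncol = 1, scales = "free_y").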

Is it possible to combine density plots of two separate variables with ggvis

I feel like I've searched everywhere for this. Essentially, I have time series data of multiple numeric variables, and I want to create one single plot that shows the density functions of two or more variables.
So essentially I have:
df %>% ggvis(~y1) %>% layer_densities()
df %>% ggvis(~y2) %>% layer_densities()
but if I do something like:
df %>% ggvis(~y1) %>% layer_densities() %>% layer_densities(~y2)
I get the following error:
Error in eval(expr, envir, enclos) : object 'y2' not found
I feel like this shouldn't be too difficult, but I can't figure it out. I don't think I am supposed to use group_by, because these are two separate variables with no shared factors or characteristics. Any help would be appreciated.
You can work around this by reshaping your dataset so that a grouping variable is in one column and the values of both columns you want to plot are in another. I do this with melt from reshape2.
library(reshape2)
df2 = melt(df, measure.vars = c("y1", "y2"))
Once you do that you can use group_by to get a separate density layer for each variable.
df2 %>% group_by(variable) %>%
ggvis(~value) %>%
layer_densities()
In ggplot2, with the same long-format df2, you can map colour = variable in aes() together with geom_density() to get one density per variable.
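For comparison, base R can overlay the two densities without any reshaping: compute each with density() and add the second curve with lines(). The data frame df with columns y1 and y2 is simulated here as a hypothetical stand-in.

```r
set.seed(1)
# Hypothetical data with two numeric variables
df <- data.frame(y1 = rnorm(200, mean = 0), y2 = rnorm(200, mean = 2, sd = 1.5))

d1 <- density(df$y1)
d2 <- density(df$y2)

out <- tempfile(fileext = ".pdf")
pdf(out)
# Axis limits wide enough for both curves
plot(d1, xlim = range(d1$x, d2$x), ylim = range(0, d1$y, d2$y),
     main = "Densities of y1 and y2", col = "steelblue")
lines(d2, col = "tomato")
legend("topright", legend = c("y1", "y2"), col = c("steelblue", "tomato"), lty = 1)
dev.off()
```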
