I am trying to visualise migration data with a Sankey diagram, in which names of nodes will be repeated between the "from" and "to" columns of the data frame.
Unfortunately, highcharter merges nodes that share a name into a single node, so the edges loop back and forth:
# import and prepare the data
flows <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv",
header = TRUE,
check.names = FALSE)
flows$from <- rownames(flows)
library(tidyr)
flows <- flows %>%
pivot_longer(-from, names_to = "to", values_to = "weight")
# visualise
library(highcharter)
hchart(flows, "sankey")
How would one force the nodes to be placed on two separate columns, while keeping the same colour for each area/continent?
I have used the workaround of renaming the "to" nodes so they don't share names (e.g. prepending "to " to each of them), but I would like to keep the same names and have the colours match.
# extra data preparation step for partial workaround
flows$to <- paste("to", flows$to)
I had the same trouble and it was very frustrating. The only approach that worked reasonably well for me was, following your idea, prepending white space to the names in the "to" column, so they look the same but are treated as separate nodes:
library(dplyr)  # for mutate()
data %>% data_to_sankey() %>% mutate(to = paste(" ", to)) %>% hchart(type = "sankey")
I hope this can help you.
Thank you!
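A sketch of a workaround that keeps the original labels and matching colours, assuming Highcharts' sankey `nodes` option (per-node `id`, `name`, `color` and `column`) is passed through unchanged by highcharter: give source and target nodes different ids (the `_src`/`_dst` suffixes and the `make_node()` helper are hypothetical choices), then declare each node once with its shared display name and a colour looked up by area. The toy `flows` data below is made up; swap in the real one.

```r
library(highcharter)

# Toy migration flows (hypothetical data, same from/to/weight layout as `flows`)
flows <- data.frame(from   = c("Africa", "Africa", "Europe", "Asia"),
                    to     = c("Europe", "Africa", "Asia",   "Europe"),
                    weight = c(3, 1, 2, 4),
                    stringsAsFactors = FALSE)

areas <- sort(unique(c(flows$from, flows$to)))
# One colour per area, reused on both sides of the diagram
pal <- setNames(hcl.colors(length(areas)), areas)

# Distinct ids so a source node can never be merged with a target node
links <- data.frame(from   = paste0(flows$from, "_src"),
                    to     = paste0(flows$to, "_dst"),
                    weight = flows$weight,
                    stringsAsFactors = FALSE)

# Hypothetical helper: unique id, shared display name, area colour, fixed column
make_node <- function(area, suffix, column) {
  list(id = paste0(area, suffix), name = area,
       color = unname(pal[area]), column = column)
}
# Only declare nodes that actually occur on each side
nodes <- c(lapply(unique(flows$from), make_node, suffix = "_src", column = 0),
           lapply(unique(flows$to),   make_node, suffix = "_dst", column = 1))

hc <- hc_series(highchart(),
                list(type = "sankey",
                     data = list_parse(links),  # one list(from, to, weight) per row
                     nodes = nodes))
hc
```

Because the label comes from `name` rather than `id`, both columns show "Africa", "Europe", etc., while the ids keep them apart.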
In my dataset I have two columns, named part_1 and part_2, that contain several numerical values.
I am required to create a graph that shows how the average varies in the two parts.
I think that the best way is to create a barplot with a bar for each part, but I'm not sure about it.
First, I created two new columns that contain the mean values for the two parts in each row:
averages <- my_data %>% mutate(avg_part1=mean(part_1,na.rm=T)) %>% mutate(avg_part2=mean(part_2,na.rm=T))
Then, I inserted the values in two new variables:
avg_part1 <- averages %>% slice(1) %>% pull(avg_part1)
avg_part2 <- averages %>% slice(1) %>% pull(avg_part2)
To create the plot I did:
to_graph <- c("First part" = avg_part1, "Second part" = avg_part2)
barplot(to_graph)
And I obtained the graph I wanted, but it doesn't look very nice.
I feel like this process is too complex and I may be able to do everything in a couple of lines and without creating so many new variables. Do you have any suggestions?
Also, I would prefer to create the graph with ggplot because it makes it easier to improve the design, but I don't really know how to do it.
Thanks!
Using ggplot:
library(ggplot2)
library(dplyr)
my_data %>%
  stack(select = c(part_1, part_2)) %>%   # long format: columns `values` and `ind`
  ggplot(aes(x = ind, y = values)) +
  geom_bar(stat = "summary", fun = mean)  # bar height = mean of each part
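An alternative sketch computes the two means explicitly first, which makes the plotted numbers easy to inspect and drops the intermediate variables. The `my_data` values below are made up for illustration; `summarise(across())`, `pivot_longer()` and `geom_col()` are standard dplyr/tidyr/ggplot2 functions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for my_data
my_data <- data.frame(part_1 = c(4, 7, NA, 5),
                      part_2 = c(6, 8, 9, 7))

# One row per part with its mean, ignoring missing values
avgs <- my_data %>%
  summarise(across(c(part_1, part_2), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "part", values_to = "average")

# geom_col() draws the precomputed heights as they are
p <- ggplot(avgs, aes(x = part, y = average)) +
  geom_col() +
  labs(x = NULL, y = "Average")
p

# Base-R equivalent in one line:
# barplot(colMeans(my_data[c("part_1", "part_2")], na.rm = TRUE))
```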
I am trying to streamline the process of auditing chemistry laboratory data. When an analyte is not detected, I need to change the recorded result to a value equal to 1/2 of the level of detection (LOD) for the analytical method. I have the LODs in another dataframe to be used as a lookup table.
I have multiple columns representing data from different analytical tests, each with its own unique LOD. Here's an example of the type of data I am working with:
library(tidyverse)
dat <- tibble("Lab_ID" = as.character(seq(1,10,1)),
"Tributary" = c('sawmill','paint', 'herring', 'water',
'paint', 'sawmill', 'bolt', 'water',
'herring', 'sawmill'),
"date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
"TP" = c(1.5,15.7,-2.3,7.6,0.1,45.6,12.2,-0.1,22.2,0.6),
"TN" = c(100.3,56.2,-10.5,0.4,-0.3,11.0,45.8,256.0,12.2,144.0),
"DOC" = c(56.0,120.3,-10.5,0.2,14.6,489.3,0.3,14.4,54.6,88.8))
dat
detect_level <- tibble("Parameter" = c('TP', 'TN', 'DOC'),
'LOD' = c(0.6, 11, 0.3)) %>%
mutate(halfLOD=LOD/2)
detect_level
I have pored over multiple other questions with a similar theme:
Change values in multiple columns of a dataframe using a lookup table
R - Match values from multiple columns in a data.frame to a lookup table.
Replace values in multiple columns using different thresholds
and gotten to a point where I have pivoted the data and split it out into a list of dataframes that are specific analytes:
dat %>%
pivot_longer(cols = c('TP','TN','DOC')) %>%
arrange(name) %>%
split(.$name)
I have tried to apply a function using map(), but I cannot figure out how to integrate the values from the lookup table (detect_level) into my code. If someone could help me continue this pipe, or finish the process to achieve a final product dat2 that should look like this, I would appreciate it:
dat2 <- tibble("Lab_ID" = as.character(seq(1,10,1)),
"Tributary" = c('sawmill','paint', 'herring', 'water',
'paint', 'sawmill', 'bolt', 'water',
'herring', 'sawmill'),
"date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
"TP" = c(1.5,15.7,0.3,7.6,0.3,45.6,12.2,0.3,22.2,0.6),
"TN" = c(100.3,56.2,5.5,5.5,5.5,11.0,45.8,256.0,12.2,144.0),
"DOC" = c(56.0,120.3,0.15,0.15,14.6,489.3,0.3,14.4,54.6,88.8))
dat2
Another possibility comes from the closest similar question I have found:
Lookup multiple column from a single table
Here's a snippet of code that I adapted from that question. However, if you run it you will see that wherever a value is not found in detect_level, NA is returned. It also does not appear to work for $TN or $DOC, even where the $LOD value from detect_level is present.
dat %>%
mutate(across(all_of(unique(detect_level$Parameter)),
~ {i1 <- detect_level$Parameter == cur_column()
detect_level$LOD[i1][match(., detect_level$LOD)]}))
I am not comfortable at all with the purrr-style syntax here and have only adapted this code from the linked question, so if an answerer chooses this direction, I would appreciate brief code comments explaining what is happening "under the hood".
Thank you in advance!
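Perhaps this helps. This version raises every value below the LOD to the LOD itself:
library(dplyr)
dat %>%
  # run the lambda on every analyte column listed in the lookup table
  mutate(across(all_of(detect_level$Parameter),
                # cur_column() is the current column's name; match() finds its row in
                # detect_level, and pmax() keeps the larger of each value and that LOD
                ~ pmax(., detect_level$LOD[match(cur_column(), detect_level$Parameter)])))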
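For the updated case (values below the LOD are replaced with half the LOD, giving dat2):
dat %>%
  mutate(across(all_of(detect_level$Parameter),
                ~ {
                  # row of detect_level that belongs to the current column
                  i <- match(cur_column(), detect_level$Parameter)
                  # values below that column's LOD are replaced by its halfLOD
                  replace(., . < detect_level$LOD[i], detect_level$halfLOD[i])
                }))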
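To continue the pivot_longer() pipe from the question instead, one option is to join the lookup table onto the long data, so every row carries its own LOD, apply the rule once, and pivot back. This is a sketch on a small hypothetical subset of `dat` with the same column layout.

```r
library(dplyr)
library(tidyr)

# Small hypothetical subset of `dat` (same column layout)
dat <- tibble(Lab_ID = as.character(1:3),
              TP  = c(1.5, -2.3, 0.6),
              TN  = c(100.3, 0.4, 11.0),
              DOC = c(0.2, 56.0, 0.3))
detect_level <- tibble(Parameter = c("TP", "TN", "DOC"),
                       LOD = c(0.6, 11, 0.3)) %>%
  mutate(halfLOD = LOD / 2)

dat2 <- dat %>%
  # 1. long format: one row per sample x analyte
  pivot_longer(cols = all_of(detect_level$Parameter),
               names_to = "Parameter", values_to = "value") %>%
  # 2. attach each analyte's LOD and halfLOD by name
  left_join(detect_level, by = "Parameter") %>%
  # 3. below the LOD -> half the LOD; otherwise keep the measured value
  mutate(value = if_else(value < LOD, halfLOD, value)) %>%
  # 4. drop the helper columns and go back to one column per analyte
  select(-LOD, -halfLOD) %>%
  pivot_wider(names_from = Parameter, values_from = value)
dat2
```

A value exactly equal to the LOD is kept, matching the expected dat2 (e.g. TP = 0.6 stays 0.6).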
I want to create a graph from a data frame with multiple data columns, where all of the columns contain vertices, like this:
example data
If two vertices are found in a row together, then they should be connected in the graph. In my example, vertex "Case no. 3" should be connected to the following vertices: "case no. 1", "Jon", "case no. 5", "Bill" (NA should be ignored).
Thanks in advance!
Your question is really about manipulating the raw data, because you need to construct your edge list correctly. An edge list needs two columns: the sender of the link (column 1) and the receiver of the link (column 2). Self-directed links are allowed (e.g., from 'a' to 'a'). Any other columns are always treated as attributes of the links.
Your example shows 3 columns of vertices: that is not a valid edge list as it stands. So
you'll have to construct a valid edge list by manipulating the data (see below).
Then, you tell igraph which data frame is your edge list and construct the graph, as in this answer and/or this one (sorry for the shameless self-promotion).
In order to construct a valid edgelist from the example you provide, with tidyverse tools and the %>% operator:
library(dplyr)
# ↓ SAMPLE DATA (colnames are different from the ones you provided) ↓
raw_data <- data.frame(case_no = c(1, 2, 3, 4),
                       related_case = c(3, 5, 5, NA),
                       received_by = c("Jon", "Wendy", "Jon", NA),
                       packed_by = c(NA, "Wendy", "Bill", NA))
# ↓ First series of links: case → receiver ↓
edges_list <- raw_data %>%
  select(FROM = case_no, TO = received_by) %>%
  mutate(TYPE = 'Received')
# ↓ APPEND THE SECOND SERIES OF LINKS: case → packer ↓
edges_list <- raw_data %>%
  select(FROM = case_no, TO = packed_by) %>%
  mutate(TYPE = 'Packed') %>%
  rbind(edges_list)
# ↓ APPEND THE THIRD SERIES OF LINKS: case → related case ↓
edges_list <- raw_data %>%
  transmute(FROM = case_no, TO = as.character(related_case), TYPE = 'Related') %>%
  rbind(edges_list)
edges_list <- na.omit(edges_list) # ← REMOVE ROWS WITH A MISSING RECEIVER
# graph_from_data_frame() takes the FIRST TWO columns as the edges; TYPE becomes an edge attribute
edges_list %>% igraph::graph_from_data_frame(directed = TRUE) %>%
  igraph::plot.igraph() # CREATE YOUR GRAPH
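A more compact way to build an edge list that links each case to every vertex in its row is to reshape all the "receiver" columns at once with tidyr::pivot_longer(). This sketch assumes every column other than case_no holds a vertex, and it builds an undirected graph because the question only asks which vertices are connected.

```r
library(dplyr)
library(tidyr)
library(igraph)

raw_data <- data.frame(case_no = c(1, 2, 3, 4),
                       related_case = c(3, 5, 5, NA),
                       received_by = c("Jon", "Wendy", "Jon", NA),
                       packed_by = c(NA, "Wendy", "Bill", NA))

edges_list <- raw_data %>%
  # vertex ids must share one type, so turn case numbers into text
  mutate(across(everything(), as.character)) %>%
  # one row per (case, other column) pair; NA cells are dropped
  pivot_longer(-case_no, names_to = "TYPE", values_to = "TO",
               values_drop_na = TRUE) %>%
  # FROM and TO first: graph_from_data_frame() uses the first two columns as edges
  select(FROM = case_no, TO, TYPE)

g <- graph_from_data_frame(edges_list, directed = FALSE)
plot(g)
```

Case 4 has no non-missing links, so it drops out of the graph; pass a `vertices` data frame to graph_from_data_frame() if isolated cases should still appear.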
I've been working on this for a while to no avail. I'm using statnet to create some networks in R from survey data. The way the networks were measured in the survey allowed respondents to list network contacts who were not themselves included in the survey. As it turned out, most of the listed contacts were surveyed and only a few were not. I'm trying to map colors to nodes based on other survey responses.
This is a reproducible example of my issue. I want to label the nodes that have available attributes with their attribute, and label those without as 'unknown', NA or ''.
install.packages('statnet')
library(statnet)
mydata <- data.frame(
src=c('bob','sue','tom','john','sheena'),
trg=c('tom','billy','billy','bob','chris'),
vary_1=c(1,2,2,3,1)
)
net_1 <- network(mydata[1:2])
##### My attempt using dplyr to create labels ####
# it doesn't work
labs <- mydata %>%
mutate(flag = .[,1] %in% .[,2]) %>%
gather(key,value,-flag,-vary_1) %>%
mutate(i=ifelse(.$key=='trg',.$vary_1==NA,.$vary_1)) %>%
select(value) %>%
unique() %>%
.[,1] #### I think this approach is something close
set.seed(123)
gplot(net_1,vertex.cex = degree(net_1),
label=labs) #labels using the labs created above
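A sketch of one way to build the labels, assuming the attribute is known only for respondents (the `src` column): look each vertex name up in `mydata$src` with match(), which returns NA for contacts who were never surveyed, then turn those NAs into "unknown". Matching on network.vertex.names() keeps the labels in the network's own vertex order; the colour palette is an arbitrary illustrative choice.

```r
library(network)  # network objects (part of statnet)
library(sna)      # gplot() and degree() (part of statnet)

mydata <- data.frame(
  src = c('bob', 'sue', 'tom', 'john', 'sheena'),
  trg = c('tom', 'billy', 'billy', 'bob', 'chris'),
  vary_1 = c(1, 2, 2, 3, 1)
)
net_1 <- network(mydata[1:2])

# Vertex names in the order the network stores them
vnames <- network.vertex.names(net_1)
# Look each vertex up among the surveyed respondents; non-respondents get NA
vary <- mydata$vary_1[match(vnames, mydata$src)]

labs <- ifelse(is.na(vary), "unknown", as.character(vary))
# Colour by attribute value, grey for vertices with no survey data
cols <- ifelse(is.na(vary), "grey80", c("tomato", "steelblue", "gold")[vary])

set.seed(123)
gplot(net_1, vertex.cex = degree(net_1), label = labs,
      displaylabels = TRUE, vertex.col = cols)
```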
After going through the highcharter package documentation, visiting JBKunst's website, and looking into list.parse2(), I still cannot solve the problem. The problem is as follows: I'm looking to chart multiple series from a data.frame in a stacked bar chart, and there can be anywhere from 10 to 30 series. For now, the series are defined as below, but there has to be an easier way, for example passing a list or a melted data.frame to the function hc_series, similar to what can be done with ggplot2.
Below is the code with dummy data:
mydata <- data.frame(A = runif(10),
                     B = runif(10),
                     C = runif(10))
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "MyGraph") %>%
hc_yAxis(title = list(text = "Weights")) %>%
hc_plotOptions(column = list(
dataLabels = list(enabled = FALSE),
stacking = "normal",
enableMouseTracking = FALSE)
) %>%
hc_series(list(name="A",data=mydata$A),
list(name="B",data=mydata$B),
list(name="C",data=mydata$C))
Which produces this chart:
A good approach to add multiple series, in my opinion, is to use hc_add_series_list (of course you can also use a for loop), which needs a list of series (a series is, for example, list(name = "A", data = mydata$A)).
As you said, you need to melt/gather the data, you can use tidyr package:
mynewdata <- gather(mydata)
Then you'll need to group the data by key argument to create the data for each key/series. Here you can use dplyr package:
mynewdata2 <- mynewdata %>%
# we change the key to name to have the label in the legend
group_by(name = key) %>%
# the data in this case is simple, is just .$value column
do(data = .$value)
This data frame contains two columns, name and data; each row of the data column holds the ten values for that series.
Now you need this information as a list, so parse it with list.parse3 instead of list.parse2, because list.parse3 preserves names such as name and data. (In newer highcharter versions these helpers were renamed; list_parse() is the counterpart of list.parse3.)
series <- list.parse3(mynewdata2)
So finally change:
hc_series(list(name="A",data=mydata$A),
list(name="B",data=mydata$B),
list(name="C",data=mydata$C))
by:
hc_add_series_list(series)
Hope this is clear.
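If the data is already one column per series, the list that hc_add_series_list expects can also be built directly with base R's lapply(), with no reshaping step. A sketch under that assumption, reusing the chart options from the question:

```r
library(highcharter)
library(magrittr)  # for %>%

set.seed(42)
mydata <- data.frame(A = runif(10), B = runif(10), C = runif(10))

# One list(name = ..., data = ...) per column, however many columns there are
series <- lapply(names(mydata), function(col) list(name = col, data = mydata[[col]]))

hc <- highchart() %>%
  hc_chart(type = "column") %>%
  hc_title(text = "MyGraph") %>%
  hc_yAxis(title = list(text = "Weights")) %>%
  hc_plotOptions(column = list(dataLabels = list(enabled = FALSE),
                               stacking = "normal",
                               enableMouseTracking = FALSE)) %>%
  hc_add_series_list(series)
hc
```

With 10 to 30 columns nothing changes: the lapply() call produces one series per column automatically.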