R Circular Chord Plots

I'm learning how to create circular plots in R, similar to Circos.
I'm using the circlize package to draw links between origin and destination pairs based on whether the flight was OB, inbound (IB) or RETURN. The logic of the data doesn't really matter; it's just a toy example.
I have gotten the plot to work with the code below, which follows this logic:
Take my data, combine destination column with the flight type
Convert to a matrix and feed the origin and the new column into circlize
Reference
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
direc = c("IB","OB","RETURN","DOM","OB","DOM","IB","RETURN","IB")
mydf = data.frame(orig, dest, direc)
# Add a column that combines the dest and direction together
mydf <- mydf %>%
mutate(key = paste(dest,direc)) %>%
select (orig, key)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by supplying the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
I like the plot a lot but would like to change it a little. For example, FI (Finland) appears three times on the diagram: FI IB, FI OB and FI. I would like to combine them all under FI if possible and distinguish between the three types of flights using a colour scheme, arrows, or an additional track that acts as an umbrella for IB, OB and RETURN flights.
So for Example,
FI OB would be placed in FI but have a one way arrow to GB to signify OB
FI IB would be placed in FI but have a one way arrow into FI
FI RETURN (if it exists) would have a double headed arrow
Can anyone help? Has anyone seen anything similar done before?
The end result should show each country on the plot only once, so that someone can see very quickly which countries have the most flights.
I have tried following other posts, but I'm afraid I get lost when they move on to the more advanced material.
Thank you very much for your time

First, I think there is a duplicated record (IE-FI-IB) in your data.
I will first attach the code and figure and then explain a little bit.
df = data.frame(orig, dest, direc, stringsAsFactors = FALSE)
df = unique(df)
col = c("IB" = "red",
"OB" = "blue",
"RETURN" = "orange",
"DOM" = "green")
directional = c("IB" = -1,
"OB" = 1,
"RETURN" = 2,
"DOM" = 0)
diffHeight = c("IB" = -0.04,
"OB" = 0.04,
"RETURN" = 0,
"DOM" = 0)
chordDiagram(df[1:2], col = col[df[[3]]], directional = directional[df[[3]]],
direction.type = c("arrows+diffHeight"),
diffHeight = diffHeight[df[[3]]])
legend("bottomleft", pch = 15, legend = names(col), col = col)
First, you need the development version of circlize, which you can
install with
devtools::install_github("jokergoo/circlize")
In this new version, chordDiagram() accepts a data frame as input and can draw two-headed arrows for links (added just after reading your post :)).
In the code above, col, directional, direction.type and diffHeight can all be given as vectors whose elements correspond to the rows of df.
When the directional argument of chordDiagram() is set to 2, the corresponding link has two directions. If direction.type also contains arrows, the link is drawn with a two-headed arrow.
Because these arguments are matched to the rows of df, if you want to show the direction of a single link both by an arrow and by an offset of the roots, you need to merge the two options into a single string, as in "arrows+diffHeight" in the example code.
By default, links run from the first column to the second column. In your case IB means the reverse direction, so we need to set diffHeight to a negative value to reverse the default direction.
Finally, I see that you have links that start and end in the same sector (ES-ES-DOM and US-US-DOM). You can use the self.link argument to control how such self-links are drawn; self.link is set to 1 in the following figure.
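The per-row lookup that the code above relies on can be inspected without drawing anything. This is a minimal sketch in base R that reuses the toy data and the col and directional vectors from the answer; the column names link_col and link_dir are only illustrative and are not part of circlize.

```r
# Toy data from the question, de-duplicated as in the answer.
orig  <- c("IE", "GB", "US", "ES", "FI", "US", "IE", "IE", "GB")
dest  <- c("FI", "FI", "ES", "ES", "US", "US", "FI", "US", "IE")
direc <- c("IB", "OB", "RETURN", "DOM", "OB", "DOM", "IB", "RETURN", "IB")
df <- unique(data.frame(orig, dest, direc, stringsAsFactors = FALSE))

# Named vectors keyed by flight type, as in the answer.
col <- c(IB = "red", OB = "blue", RETURN = "orange", DOM = "green")
directional <- c(IB = -1, OB = 1, RETURN = 2, DOM = 0)

# Indexing a named vector by the type column yields one value per row of df.
# link_col and link_dir are illustrative names for inspection only.
per_row <- data.frame(df,
                      link_col = unname(col[df$direc]),
                      link_dir = unname(directional[df$direc]))
print(per_row)
```

Each row of per_row shows which colour and direction code the matching link would receive, which makes it easy to check the mapping before calling chordDiagram().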

Do you need the arrows at all? The color coding in the graph already tells the from/to story: each link takes the color of its FROM country and arrives at the TO country in that color, and if FROM == TO the link returns to its own base in its own color (see US or ES, for example).
library(dplyr)
library(circlize)
# Create Fake Flight Information in a table
orig = c("IE","GB","US","ES","FI","US","IE","IE","GB")
dest = c("FI","FI","ES","ES","US","US","FI","US","IE")
mydf = data.frame(orig, dest)
# Create a Binary Matrix Based on mydf
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
# create the objects you want to link from to in your diagram
from <- rownames(mymat)
to <- colnames(mymat)
# Create Diagram by supplying the matrix
par(mar = c(1, 1, 1, 1))
chordDiagram(mymat, order = sort(union(from, to)), directional = TRUE)
circos.clear()
By the way, there is also an offset difference at the edge that tells whether a link end is FROM (wider edge) or TO (smaller edge).
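To see what chordDiagram() actually receives here, it can help to print the matrix first. The sketch below rebuilds mymat from the same toy vectors in base R; note that the cells are counts rather than strictly binary values, since IE to FI occurs twice in the toy data.

```r
# Same toy origin/destination vectors as in the answer.
orig <- c("IE", "GB", "US", "ES", "FI", "US", "IE", "IE", "GB")
dest <- c("FI", "FI", "ES", "ES", "US", "US", "FI", "US", "IE")
mydf <- data.frame(orig, dest)

# Cross-tabulate origins (rows) against destinations (columns).
mymat <- data.matrix(as.data.frame.matrix(table(mydf)))
print(mymat)

# Row sums are flights leaving each country, column sums are flights arriving.
print(rowSums(mymat))
print(colSums(mymat))
```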


Highlight a Specific Section of a Treemap using Treemap package in R

I am using the treemap package in R to highlight the number of COVID outbreaks in different settings. I am making a number of different reports using R Markdown. Each one describes a different type of setting, and I would like to highlight that setting in the treemap for each report, showing what proportion of total outbreaks occur in the setting in question. For example, I am currently working on the K-12 school report and would like to highlight the box representing that category in the figure.
I was previously using an exploded donut pie chart; however, there were too many subcategories and the graph became hard to read.
I am picturing a way to change the label or border on one specific box, ie. put a yellow border around the box or make the label yellow. I found a way to do both these things for all the boxes but not just one specific box. I made this image using the snipping tool to further illustrate what the desired outcome might look like. The code to generate the treemap can be found in the link below. It looks like this:
# library
library(treemap)
# Build Dataset
group <- c(rep("group-1",4),rep("group-2",2),rep("group-3",3))
subgroup <- paste("subgroup" , c(1,2,3,4,1,2,1,2,3), sep="-")
value <- c(13,5,22,12,11,7,3,1,23)
data <- data.frame(group,subgroup,value)
# treemap
treemap(data,
index=c("group","subgroup"),
vSize="value",
type="index"
)
This is the most straightforward information I can find about the package, this is where I took the sample image and code from: https://www.r-graph-gallery.com/236-custom-your-treemap.html
It looks like the treemap package doesn't have a built-in way to do this. But we can hack it by using the data frame returned by treemap() and adding a rectangle to the appropriate viewport.
# Plot the treemap and save the data used for plotting.
t = treemap(data,
index = c("group", "subgroup"),
vSize = "value",
type = "index"
)
# Add a rectangle around subgroup-2.
library(grid)
library(dplyr)
with(
# t$tm is a data frame with one row per rectangle. Filter to the group we
# want to highlight.
t$tm %>%
filter(group == "group-1",
subgroup == "subgroup-2"),
{
# Use grid.rect to add a rectangle on top of the treemap.
grid.rect(x = x0 + (w / 2),
y = y0 + (h / 2),
width = w,
height = h,
gp = gpar(col = "yellow", fill = NA, lwd = 4),
vp = "data")
}
)
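The geometry in the grid.rect() call can be checked on its own. This is a small base R sketch, assuming a data frame with the same x0, y0, w and h columns that the answer reads from t$tm; the mock data frame tm_mock and the helper rect_spec() are hypothetical and only illustrate how the lower-left corner plus half the size gives the centre that grid.rect() expects by default.

```r
# Hypothetical stand-in for t$tm: one row per rectangle, with the
# lower-left corner (x0, y0) and the size (w, h) of each box.
tm_mock <- data.frame(
  group    = c("group-1", "group-1"),
  subgroup = c("subgroup-1", "subgroup-2"),
  x0 = c(0.0, 0.4), y0 = c(0.5, 0.5),
  w  = c(0.4, 0.2), h  = c(0.5, 0.5),
  stringsAsFactors = FALSE
)

# Illustrative helper: returns the centre and size of one box.
rect_spec <- function(tm, group, subgroup) {
  row <- tm[tm$group == group & tm$subgroup == subgroup, ]
  stopifnot(nrow(row) == 1)  # exactly one box should match
  list(x = row$x0 + row$w / 2,
       y = row$y0 + row$h / 2,
       width = row$w,
       height = row$h)
}

spec <- rect_spec(tm_mock, "group-1", "subgroup-2")
str(spec)
```

Wrapping the lookup this way would let each report pass a different group and subgroup without repeating the filtering code.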

R: Sorting Sectors in Chord Diagram based on Width?

I'm using Chord Diagrams in R (via the Circlize/Circos packages) to visualize name associations in a dataset. I was able to generate the Chord Diagram (as shown below):
However, I don't know how to sort each sector (or each name) based on its respective width (e.g.: In the lower half of the Chord Diagram, I would like to arrange the sectors in descending order like this: N/A would be placed first, followed by Dean, Aaron, Malcolm, ... Jay). Is there a specific circos function that would allow me to do this?
Here's my code:
library(circlize)
setwd("C:/Users/Main/Desktop/")
data <- read.table('./r_test.txt',header = FALSE,sep = '\t')
chordDiagram(data,annotationTrack="grid",grid.col =
c("springgreen","coral","indianred","violet",
"greenyellow","cyan","purple","firebrick",
"gold","darkblue","red","magenta",
"orangered","brown","blueviolet","darkgoldenrod",
"aquamarine","khaki"),preAllocateTracks=list(track.height = link.sort =
TRUE,link.decreasing = TRUE)
circos.trackPlotRegion(track.index = 1, panel.fun = function(x, y) {
xlim = get.cell.meta.data("xlim")
xplot = get.cell.meta.data("xplot")
ylim = get.cell.meta.data("ylim")
sector.name = get.cell.meta.data("sector.index")
circos.text(mean(xlim), ylim[1], sector.name, facing = " niceFacing = TRUE,
adj = c(0, .75),cex=2)
},bg.border = NA)
The data file is a tab-delimited .txt file with names in the first 2 columns (there are 10 names in each column along with "Other" and "N/A" in the columns; the third column is a frequency count).
It depends on the order of the data you input.
Reorder the rows with data[order, ], where order is a vector of the names in the order that you'd like, and then draw the diagram the same way.
Here is a very useful resource that I have used: https://jokergoo.github.io/circlize_book/book/the-chorddiagram-function.html
Good luck!
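One way to derive such an ordering from the data is to total the frequency column per name and sort the names by that total. The sketch below is only an illustration under stated assumptions: the links data frame is made up, and it assumes the names in the first column do not also occur in the second. It passes the result through the order argument of chordDiagram(), the same argument used in the first question on this page.

```r
# Made-up link table: two name columns plus a frequency count.
links <- data.frame(
  from  = c("A", "A", "B", "C"),
  to    = c("X", "Y", "X", "Y"),
  value = c(5, 1, 2, 7),
  stringsAsFactors = FALSE
)

# Total width of each sector on either side of the diagram.
width_from <- vapply(split(links$value, links$from), sum, numeric(1))
width_to   <- vapply(split(links$value, links$to), sum, numeric(1))

# Widest sectors first within each side.
sector_order <- c(names(sort(width_from, decreasing = TRUE)),
                  names(sort(width_to, decreasing = TRUE)))
print(sector_order)

# Draw only in an interactive session with circlize installed.
if (interactive() && requireNamespace("circlize", quietly = TRUE)) {
  circlize::chordDiagram(links, order = sector_order)
  circlize::circos.clear()
}
```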

Create nested similarly-named sectors using circlize in R

I'm trying to use the circlize package to create a circos plot in which the outer track has unique sector names (10 names), and within each unique sector, there are 2 categories of file types. The two categories are the same for each of the 10 names (end-goal is to show, through directional links, which files were combined when converting file types).
Here is a simplified version of my code so far, which produces the larger track along with an inner track that shows what I am trying to do (but the "FileType1" and "FileType2" should be in two separate sectors to allow links to and from each).
library(circlize)
fileFrom <- paste0("Category", LETTERS[1:10])
f1 = factor(fileFrom)
circos.initialize(factors = f1, xlim = c(0,1))
# create main track (10 categories)
circos.track(ylim = c(0,1),
panel.fun = function(x, y) {
sector.index = get.cell.meta.data("sector.index")
xcenter = get.cell.meta.data("xcenter")
ycenter = get.cell.meta.data("ycenter")
circos.text(xcenter, ycenter,
sector.index,
niceFacing = TRUE,
cex = 1.3,
facing = "bending.inside")
}
)
# create ICARTT/netCDF track
f2 <- factor(rep(c("ICARTT","netCDF"), 5)) # list of labels
circos.track(ylim = c(0, 1), factors = f1, track.height=0.1,
panel.fun = function(x, y) {
name = "FileType1 FileType2"
xcenter = get.cell.meta.data("xcenter")
ycenter = get.cell.meta.data("ycenter")
circos.text(xcenter, ycenter,
niceFacing = TRUE,
labels=name,
cex=0.6,
facing = "bending.inside")}
)
The second track only accepts factors that already exist, so I tried initializing the plot with all 12 categories and only calling the ones relevant to each track, but that left holes in the plot.
I am not sure whether there can be "true" sectors for more than one track, so I tried making either the outer or the inner track a "highlight" (based on this question), but it seems the identical factor names are tripping me up (ending up with a plot of only two sectors).
I also considered combining two separate plots, mentioned in section 6.3 of the circlize book, but I still wouldn't know how to create separate sectors with the same name. I am also not sure how to specify the link sources and destinations (sector.numeric.index maybe?)
Thanks in advance for any help.

Error plotting Kohonen maps in R?

I was reading through this blog post on R-bloggers and I'm confused by the last section of the code and can't figure it out.
http://www.r-bloggers.com/self-organising-maps-for-customer-segmentation-using-r/
I've attempted to recreate this with my own data. I have 5 variables that follow an exponential distribution with 2755 points.
I am fine with and can plot the map that it generates:
plot(som_model, type="codes")
The section of the code I don't understand is the:
var <- 1
var_unscaled <- aggregate(as.numeric(training[,var]),by=list(som_model$unit.classif),FUN = mean, simplify=TRUE)[,2]
plot(som_model, type = "property", property=var_unscaled, main = names(training)[var], palette.name=coolBlueHotRed)
As I understand it, this section of the code is supposed to plot one of the variables over the map to see what it looks like, but this is where I run into problems. When I run this section of the code I get the warning:
Warning message:
In bgcolors[!is.na(showcolors)] <- bgcol[showcolors[!is.na(showcolors)]] :
number of items to replace is not a multiple of replacement length
and it produces the plot:
Which somehow just doesn't look right...
Now what I think it has come down to is the way the aggregate function has re-ordered the data. The length of var_unscaled is 789 and the length of som_model$data, training[,var] and unit.classif are all of length 2755. I tried plotting the aggregated data, the result was no warning but an unintelligible graph (as expected).
Now I think it has done this because unit.classif has a lot of repeated numbers inside it and that's why it has reduced in size.
The question is, do I worry about the warning? Is it producing an accurate graph? What exactly is the "Property"'s section looking for in the plot command? Is there a different way I could "Aggregate" the data?
I think that you have to create the color palette function first. If you add the definition
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
and then try to get a plot, for example
plot(som_model, type = "count", palette.name = coolBlueHotRed)
the plot is drawn successfully.
This link can help you: http://rgm3.lab.nig.ac.jp/RGM/R_rdfile?f=kohonen/man/plot.kohonen.Rd&d=R_CC
I think that not all of the cells on your map have points inside.
You have a 30 by 30 map and about 2700 points. On average that's about 3 points per cell. With high probability some cells have more than 3 points and some cells are empty.
The code in the post on R-bloggers works well when all of the cells have points inside.
To make it work on your data, try changing this part:
var <- 1
var_unscaled <- aggregate(as.numeric(training[, var]), by = list(som_model$unit.classif), FUN = mean, simplify = TRUE)[, 2]
plot(som_model, type = "property", property = var_unscaled, main = names(training)[var], palette.name = coolBlueHotRed)
with this one:
var <- 1
var_unscaled <- aggregate(as.numeric(data.temp[, data.classes][, var]),
by = list(som_model$unit.classif),
FUN = mean,
simplify = T)
v_u <- rep(0, max(var_unscaled$Group.1))
v_u[var_unscaled$Group.1] <- var_unscaled$x
plot(som_model,
type = "property",
property = v_u,
main = colnames(data.temp[, data.classes])[var],
palette.name = coolBlueHotRed)
Hope it helps.
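The padding step can be tried on toy numbers without kohonen. This is a hedged sketch in base R: pad_unit_means() is a hypothetical helper, and unlike the code above it takes the total number of map units as an explicit argument rather than using the largest unit id, since the highest-numbered units could be empty too. With a real model, the unit count would have to come from the model object (for example the number of rows of its grid coordinates, which is an assumption about the object's structure).

```r
# Hypothetical helper: mean of `values` per map unit, with a fill value
# for units that received no observations.
pad_unit_means <- function(values, unit_classif, n_units, fill = 0) {
  agg <- aggregate(values, by = list(unit = unit_classif), FUN = mean)
  out <- rep(fill, n_units)   # one slot per map unit
  out[agg$unit] <- agg$x      # fill in only the units that have data
  out
}

# Toy example: 6 units, of which units 2, 4 and 6 are empty.
values       <- c(2, 4, 1, 2, 3, 10)
unit_classif <- c(1, 1, 3, 3, 3, 5)
property <- pad_unit_means(values, unit_classif, n_units = 6)
print(property)
```

The result has one entry per unit, which is the length the property plot needs in order to avoid the replacement-length warning.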
Just add these functions to your script:
coolBlueHotRed <- function(n, alpha = 1) {rainbow(n, end=4/6, alpha=alpha)[n:1]}
pretty_palette <- c("#1f77b4","#ff7f0e","#2ca02c", "#d62728","#9467bd","#8c564b","#e377c2")

spplot() - make color.key look nice

I'm afraid I have a spplot() question again.
I want the colors in my spplot() to represent absolute values, not automatic values as spplot does it by default.
I achieve this by making a factor out of the variable I want to draw (using the command cut()). This works very fine, but the color-key doesn't look good at all.
See it yourself:
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
meuse.grid$random[meuse.grid$random < 0] <- 0
meuse.grid$random[meuse.grid$random > 10] <- 10
# making a factor out of meuse.grid$ random to have absolute values plotted
meuse.grid$random <- cut(meuse.grid$random, seq(0, 10, 0.1))
spplot(meuse.grid, c("random"), col.regions = rainbow(100, start = 4/6, end = 1))
How can I have the color.key on the right look good - I'd like to have fewer ticks and fewer labels (maybe just one label on each extreme of the color.key)
Thank you in advance!
[edit]
To make clear what I mean with absolute values: Imagine a map where I want to display the sea height. Seaheight = 0 (which is the min-value) should always be displayed blue. Seaheight = 10 (which, just for the sake of the example, is the max-value) should always be displayed red. Even if there is no sea on the regions displayed on the map, this shouldn't change.
I achieve this with the cut() command in my example. So this part works fine.
THIS IS WHAT MY QUESTION IS ABOUT
What I don't like is the color description on the right side. There are 100 ticks and each tick has a label. I want fewer ticks and fewer labels.
The way to go is using the attribute colorkey. For example:
## labels
labelat = c(1, 2, 3, 4, 5)
labeltext = c("one", "two", "three", "four", "five")
## plot
spplot(meuse.grid,
c("random"),
col.regions = rainbow(100, start = 4/6, end = 1),
colorkey = list(
labels=list(
at = labelat,
labels = labeltext
)
)
)
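For the case in the question, where only the two extremes should be labelled, the label vectors could be built from the breaks themselves. This is a sketch in base R only; it assumes, as the example above suggests, that key positions for a factor run from 1 to the number of levels, which is worth verifying on the actual plot.

```r
# Same breaks as in the question: 100 intervals between 0 and 10.
breaks <- seq(0, 10, 0.1)

# Random values clamped to [0, 10], then cut into classes.
set.seed(1)
x <- pmin(pmax(rnorm(500, 7, 2), 0), 10)
classes <- cut(x, breaks)

# One key entry per level, which is why the default key is so crowded.
print(nlevels(classes))

# Label only the first and last class with the extreme break values.
labelat   <- c(1, nlevels(classes))
labeltext <- as.character(range(breaks))

# These would then be passed to spplot() as
# colorkey = list(labels = list(at = labelat, labels = labeltext))
print(data.frame(labelat, labeltext))
```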
First, it's not at all clear what you want here. There are many ways to make the color.key look "nice", and the first step is to understand the data being passed to spplot and what is being asked of it. cut() produces fully formatted interval labels like (2.3, 5.34], which need to be handled differently: increasing the margins of the plot, specific formatting and spacing for the labels, and so on. That may not be what you ultimately want.
Perhaps you just want integer values, rounded from the input values?
library(sp)
data(meuse.grid)
gridded(meuse.grid) = ~x+y
meuse.grid$random <- rnorm(nrow(meuse.grid), 7, 2)
# Round the values (or trunc(), ceiling(), floor() them ...)
meuse.grid$rclass <- round(meuse.grid$random)
spplot(meuse.grid, c("rclass"), col.regions = rainbow(100, start = 4/6, end = 1))
