Create a Space Adjacency Matrix in ggplot - r

I wish to create a Space Adjacency Matrix in R, preferably using ggplot/tidyverse for consistency with other scripts, but I am open to other solutions.
Here is what I am looking for.
A Space Adjacency Matrix is used in interior and architectural design to illustrate relationships (adjacencies) between spaces within a building.
Each space in the building has a relationship (or lack of relationship) to every other space.
The input data is likely formatted similarly to this:
rel.ABCD <- data.frame(
id = c(1,2,3,4,5,6),
x1 = c("A","A","A","B","B","C"),
x2 = c("B","C","D","C","D","D"),
relation = c(1,2,1,3,2,1)
)
rel.ABCD
#> id x1 x2 relation
#> 1 1 A B 1
#> 2 2 A C 2
#> 3 3 A D 1
#> 4 4 B C 3
#> 5 5 B D 2
#> 6 6 C D 1
Created on 2022-04-11 by the reprex package (v2.0.1)
Four spaces (A, B, C and D) exist in the example above. Each has a type of relation with the other spaces. Spaces A and B (id 1) have a relation type of 1, spaces B and C (id 4) have a relation type of 3, etc.
In some ways, the Space Adjacency Matrix is similar to a correlation table (in format, not function): matching lists of entities intersect, and the value for the relationship is shown at the intersection. The difference is that instead of labels existing on both the x-axis and the y-axis, they exist on the y-axis only, like in the example below from vectorworks.net.
The relation is displayed at the intersecting grid for each room pair (e.g. dispatch and office have a relationship type 5, storage and shop have a relationship type 1, etc.). The relation is typically depicted as an icon, number or fill colour.
How can I generate this graph?

I don't know of any package that implements this. But it is good to keep in mind that you can basically plot anything in ggplot2, as long as you can translate what you're plotting to polygons. That said, here is how you can translate this particular problem to polygons.
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.3
rel.ABCD <- data.frame(
id = c(1,2,3,4,5,6),
x1 = c("A","A","A","B","B","C"),
x2 = c("B","C","D","C","D","D"),
relation = c(1,2,1,3,2,1)
)
# Encode categorical values as numeric
union <- with(rel.ABCD, union(x1, x2))
rel.ABCD <- transform(
rel.ABCD,
x = match(x1, union),
y = match(x2, union)
)
# Expand observation to rectangle polygons
new <- rel.ABCD[rep(seq_len(nrow(rel.ABCD)), each = 4),]
xpand <- c(-1, -1, 1, 1) * 0.5
ypand <- c(-1, 1, 1, -1) * 0.5
new <- transform(
new,
x = x + xpand,
y = y + ypand
)
# Rotate coordinates 45 degrees
rotmat <- matrix(c(-0.5, 0.5, 0.5, 0.5), ncol = 2)
new[, c("x", "y")] <- t(rotmat %*% t(as.matrix(new[, c("x", "y")])))
# Plot
ggplot(new, aes(x, y, group = id)) +
geom_polygon(aes(fill = factor(relation))) +
scale_y_continuous(breaks = seq_along(union),
labels = union) +
coord_equal()
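To see what the matrix in the "rotate" step does to the grid, here is a minimal sketch that applies it to a few cell centres by hand. The `centres` matrix is made up for illustration (pairs A-B, A-D and C-D with A = 1, ..., D = 4). Note that the matrix is not a pure rotation: it also mirrors and shrinks the grid, mapping a cell at (x, y) to ((y - x) / 2, (x + y) / 2).

```r
# Same matrix as in the answer; column-major fill gives
# [ -0.5  0.5 ]
# [  0.5  0.5 ]
rotmat <- matrix(c(-0.5, 0.5, 0.5, 0.5), ncol = 2)

# Hypothetical cell centres for the pairs (A,B), (A,D), (C,D)
centres <- rbind(c(1, 2), c(1, 4), c(3, 4))

# Apply the transform exactly as the answer does
rotated <- t(rotmat %*% t(centres))
rotated
# Column 1 is (y - x) / 2: how far the cell sits from the label axis.
# Column 2 is (x + y) / 2: halfway between the two labels' heights.
```

This is why `breaks = seq_along(union)` lines up with the diamonds: the cell for spaces i and j ends up at height (i + j) / 2, midway between label i and label j.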
If you want the labels more like your example, you can also coerce these to polygons.
# Make dataframe for labels
labels <- data.frame(
label = union,
x = 0, y = seq_along(union)
)
# Write offset for label polygons
size <- 1
xoffset <- c(-size, -size, 0, 0.5, 0)
yoffset <- c(-0.5, 0.5, 0.5, 0, -0.5)
# Expand every label to a polygon
labels <- labels[rep(seq_len(nrow(labels)), each = 5), ]
labels <- transform(
labels,
x = x + xoffset,
y = y + yoffset
)
ggplot(new, aes(x, y)) +
geom_polygon(aes(fill = factor(relation), group = id),
colour = "black") +
geom_polygon(data = labels, aes(group = label),
colour = "black", fill = NA) +
annotate(
"text",
x = 0, y = seq_along(union), label = union,
hjust = 1
) +
coord_equal() +
guides(x = "none", y = "none")
Created on 2022-04-11 by the reprex package (v2.0.1)

Related

ggplot2 heatmap with tile height and width as aes()

I'm trying to create a heat map for an OD matrix, but I wanted to scale the rows and columns by certain weights. Since these weights are constant across each category I would expect the plot would keep the rows and columns structure.
# Tidy OD matrix
df <- data.frame (origin = c(rep("A", 3), rep("B", 3),rep("C", 3)),
destination = rep(c("A","B","C"),3),
value = c(0, 1, 10, 5, 0, 11, 15, 6, 0))
# Weights
wdf <- data.frame(region = c("A","B","C"),
w = c(1,2,3))
# Add weights to the data.
library(dplyr) # provides %>% and rename()
plot_df <- df %>%
merge(wdf %>% rename(w_origin = w), by.x = 'origin', by.y = 'region') %>%
merge(wdf %>% rename(w_destination = w), by.x = 'destination', by.y = 'region')
Here's what the data looks like:
> plot_df
destination origin value w_origin w_destination
1 A A 0 1 1
2 A C 15 3 1
3 A B 5 2 1
4 B A 1 1 2
5 B B 0 2 2
6 B C 6 3 2
7 C B 11 2 3
8 C A 10 1 3
9 C C 0 3 3
However, when passing the weights as width and height in the aes() I get this:
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
It seems to be working for the size of the columns (width), but not quite, because the proportions are not right. And the rows are all over the place and not aligned.
I'm only using geom_tile because I could pass height and width as aesthetics, but I accept other suggestions.
The issue is that your tiles are overlapping. Although you can pass the width and height as aesthetics, geom_tile will not adjust the x and y positions of the tiles for you. As you are mapping a discrete variable on x and y, your tiles are centred on an equidistant grid, in your case at positions 1, 2 and 3. The tiles are then drawn at these positions with the specified width and height.
This could be easily seen by adding some transparency to your plot:
library(ggplot2)
library(dplyr)
ggplot(plot_df,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black", alpha = .2)
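The overlap can also be checked numerically. The sketch below assumes that the three discrete levels are centred at the integer positions 1, 2 and 3 (the usual placement for a discrete position scale in ggplot2) and computes the horizontal extent of each column of tiles from the weights in the question.

```r
# Weights from the question
w <- c(A = 1, B = 2, C = 3)

# Assumed tile centres for a discrete scale: 1, 2, 3
pos <- seq_along(w)

# Each tile spans centre +/- width / 2
extent <- data.frame(region = names(w),
                     xmin = pos - w / 2,
                     xmax = pos + w / 2)
extent

# A column overlaps its left neighbour if it starts before the neighbour ends
overlaps <- extent$xmin[-1] < extent$xmax[-nrow(extent)]
overlaps
```

Both comparisons come out TRUE here: B starts at 1 while A only ends at 1.5, and C starts at 1.5 while B ends at 3.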
To achieve your desired result you have to manually compute the x and y positions according to the desired widths and heights, so that the boxes no longer overlap. To this end you could switch to a continuous scale and set the desired breaks and labels via scale_x_continuous and scale_y_continuous:
breaks <- wdf %>%
mutate(cumw = cumsum(w),
pos = .5 * (cumw + lag(cumw, default = 0))) %>%
select(region, pos)
plot_df <- plot_df %>%
left_join(breaks, by = c("origin" = "region")) %>%
rename(y = pos) %>%
left_join(breaks, by = c("destination" = "region")) %>%
rename(x = pos)
ggplot(plot_df,
aes(x = x,
y = y)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value), color = "black") +
scale_x_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1)) +
scale_y_continuous(breaks = breaks$pos, labels = breaks$region, expand = c(0, 0.1))
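As a quick sanity check of the midpoint formula used for `pos`, here is a base R sketch with the same weights. It only reproduces the arithmetic (no dplyr), and the variable names `xmin` and `xmax` are introduced here for illustration.

```r
w <- c(1, 2, 3)

# Right edge of each tile if tiles are laid side by side from 0
cumw <- cumsum(w)

# Left edge is the previous right edge (0 for the first tile),
# so the centre is the mean of the two edges
left <- c(0, head(cumw, -1))
pos <- 0.5 * (cumw + left)
pos

# Tiles centred at pos with width w now touch without overlapping
xmin <- pos - w / 2
xmax <- pos + w / 2
rbind(xmin, xmax)
```

Each tile's `xmin` equals the previous tile's `xmax`, which is exactly the no-overlap condition.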
So I think I have a partial solution for you. After playing around with geom_tile, it appears that the order of your data frame matters when you are using height and width.
Here is some example code I came up with based on yours (run your code first). I converted your data frame to a tibble (available through dplyr) to make it easier to sort by a column.
# Converted your dataframe to a tibble dataframe
plot_df_tibble = tibble(plot_df)
# Sorted your dataframe by your w_origin column:
plot_df_tibble2 = plot_df_tibble[order(plot_df_tibble$w_origin),]
# Plotted the sorted data frame:
ggplot(plot_df_tibble2,
aes(x = destination,
y = origin)) +
geom_tile(
aes(
width = w_destination,
height = w_origin,
fill = value),
color = 'black')
And got this plot:
Link to image I made
I should note that if you run the converted tibble before you sort that you get the same plot you posted.
It seems like the height and width arguments may not be fully developed for this portion of geom_tile, as I feel that the order of the data frame should not matter.
Cheers

Is there a way to make a line plot that connects empirical pairs of words with ggplot2?

I'm not sure what the correct name for this type of plot would be, but let's say we have a list of names (or letters here): data <- data.frame(letters[1:10])
Let's also say that we want to illustrate which of these names are connected based on some empirical decision, so we have a list of observations we want to connect in a plot like the following (done in PowerPoint):
Can this be done in ggplot?
Yes, it can be done in ggplot.
Let's start by setting up a data frame of letters, with associated positions on the x and y axis of a plot. We'll make the x values 1 and 2 (though this is arbitrary), and the y values 1:10 (also arbitrary, as long as they are evenly spaced)
labels <- data.frame(x = c(rep(1, 10), rep(2, 10)),
y = rep(1:10, 2),
labs = rep(LETTERS[10:1], 2),
stringsAsFactors = FALSE)
Now we also need some way of deciding which letters will be joined. Let's do this by having a simple data frame of "left" and "right" values, where each row describes which two letters will be joined:
set.seed(69)
joins <- data.frame(left = sample(LETTERS[1:10], 6, TRUE),
right = sample(LETTERS[1:10], 6, TRUE),
stringsAsFactors = FALSE)
joins
#> left right
#> 1 A G
#> 2 B B
#> 3 H J
#> 4 G D
#> 5 G J
#> 6 F B
Now we can assign start and end x and y co-ordinates for the lines by matching the letters in these two columns to the columns in our labels data frame:
joins$x <- rep(1.05, nrow(joins))
joins$xend <- rep(1.9, nrow(joins))
joins$y <- labels$y[match(joins$left, labels$labs)]
joins$yend <- labels$y[match(joins$right, labels$labs)]
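The lookup relies on `match()` returning the index of the first occurrence of each letter, which is then used to pick the corresponding y value. A minimal sketch with the same label layout (the vector `wanted` is made up for illustration):

```r
# Same layout as the labels data frame: J at y = 1 up to A at y = 10
labs <- LETTERS[10:1]
y <- 1:10

# Letters we want coordinates for
wanted <- c("A", "G", "J")

# match() gives the position of each letter in labs;
# indexing y with it gives the letter's height on the plot
idx <- match(wanted, labs)
y[idx]
```

Because `labels` repeats every letter once per column, `match()` always finds the left-hand copy first. That is harmless here, since both copies of a letter share the same y value.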
This just leaves the plot. We want to get rid of all the axes, titles and legends so we use theme_void:
library(ggplot2)
ggplot(labels, aes(x, y)) +
geom_text(aes(label = labs), size = 8) +
geom_segment(data = joins, aes(xend = xend, yend = yend, color = left),
arrow = arrow(type = "closed", length = unit(0.02, "npc"))) +
coord_cartesian(xlim = c(0.5, 2.5)) +
theme_void() +
theme(legend.position = "none")
Created on 2020-07-10 by the reprex package (v0.3.0)
This solution could be tidied up, but gives a start using geom_segment
library(tidyverse)
tibble(x0 = 0, x1 = 1, y0 = sample(letters[1:10]), y1 = sample(letters[1:10])) %>%
mutate(y0 = factor(y0, levels = rev(letters[1:10])),
y1 = factor(y1, levels = rev(letters[1:10]))) %>%
ggplot(aes(x = x0, xend = x1, y = y0, yend = y1)) +
geom_segment(arrow = arrow(length = unit(0.03, "npc"))) +
geom_text(aes(x = x1, y = y1, label = y1), nudge_x = 0.01)

R question: How to mimic this graph to show the percent change of cells in health and disease?

I am interested in showing how the proportions of cells change from health to disease. I wanted to show a 'flow' from health to disease rather than just have two separate stacked bar charts, but I'm unsure if this type of visualization has a name and I have not been able to find many examples online. I would like to do this in R. It's almost a mix between a sankey diagram and a chord diagram.
I was hoping some of you would have some ideas on which packages I could use to achieve this in R.
As Ian Campbell points out in the comments, this is called an alluvial plot, and you can probably get quite close with the ggalluvial package. However, it is possible to get a near-identical recreation of your plot using just geom_ribbon and geom_text from ggplot2.
It's a bit tricky to do, though. First we need a way of producing those nice smooth curves that go from one side to the other. The following function takes the starting and ending levels (as numbers between 0 and 1). It also takes an optional width argument for widening or narrowing the columns on either side:
ribbon_line <- function(p1, p2, width = 10, len = 100)
{
if (width > 50) width <- 50
if (width < 0) width <- 0
if (p1 < 0) p1 <- 0
if (p1 > 1) p1 <- 1
if (p2 < 0) p2 <- 0
if (p2 > 1) p2 <- 1
yvals <- c(p1, p1, pnorm(seq(-2.5, 2.5, length.out = len)) * (p2 - p1) + p1, p2, p2)
xvals <- c(0, seq(width, 100 - width, length.out = len + 2), 100)
list(x = xvals, y = yvals)
}
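The smooth part of the curve comes from the normal CDF, `pnorm()`, evaluated between -2.5 and 2.5 and rescaled to run from `p1` to `p2`. The sketch below isolates just that step, with made-up levels 0.2 and 0.7, to show that the S-curve rises monotonically and stops just short of both end levels.

```r
len <- 100
p1 <- 0.2  # hypothetical starting level
p2 <- 0.7  # hypothetical ending level

# pnorm() over [-2.5, 2.5] runs from about 0.006 to about 0.994
s <- pnorm(seq(-2.5, 2.5, length.out = len))

# Rescale so the curve runs from (nearly) p1 to (nearly) p2
curve_y <- s * (p2 - p1) + p1

range(curve_y)
all(diff(curve_y) > 0)  # strictly increasing
```

The small remaining gap at each end is closed by the flat `p1, p1` and `p2, p2` segments that `ribbon_line()` adds on either side.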
Now we need a way of combining two lines into a data frame with co-ordinates we can plot:
ribbon_df <- function(uppers, lowers, group, width = 10)
{
data.frame(x = ribbon_line(uppers[1], uppers[2], width)$x,
ymax = ribbon_line(uppers[1], uppers[2], width)$y,
ymin = ribbon_line(lowers[1], lowers[2], width)$y,
group = group, stringsAsFactors = FALSE)
}
Next, we need a method of taking a simple input and turning it into a group of these ribbons, plus left and right columns, plus text labels:
multi_ribbons <- function(left_bottom, right_bottom, left_top, right_top,
groups, width = 10)
{
if (length(left_bottom) != length(right_bottom) |
length(left_bottom) != length(left_top) |
length(left_top) != length(right_top))
stop("Left and right columns different length")
if (length(groups) != length(left_bottom))
stop("Group length has to be same length as columns")
d <- lapply(seq_along(groups), function(i) {
ribbon_df(c(left_top[i], right_top[i]),
c(left_bottom[i], right_bottom[i]),
groups[i], width)})
left_cols <- lapply(d, function(x) x[1:2,])
right_cols <- lapply(d, function(x) x[nrow(x) - 1:0,])
res <- list( left = do.call(rbind, left_cols),
right = do.call(rbind, right_cols),
bands = do.call(rbind, d))
text_y <- c((res$left$ymax + res$left$ymin)/2,
(res$right$ymax + res$right$ymin)/2)
text_x <- c(rep(width / 2, length(res$left$x)),
rep(100 - width/2, length(res$left$x)))
text_labels <- paste0(round(c(res$left$ymax - res$left$ymin,
res$right$ymax - res$right$ymin), 3) * 100, "%")
res$text <- data.frame(x = text_x, y = text_y, labels = text_labels)
res
}
Finally, we want a way of taking our data as a simple pair of factor vectors and using the above functions to plot them:
library(ggplot2)
alluvial <- function(yvar, xvar, width = 20)
{
tab <- table(yvar, xvar)
x_labs <- rownames(tab)
y_labs <- colnames(tab)
left <- tab[1,]/sum(tab[1,])
left <- cumsum(sort(left))
right <- tab[2,]/sum(tab[2,])
right <- cumsum(sort(right))
left_lower <- c(0, left[-length(left)])
names(left_lower) <- names(left)
right_lower <- c(0, right[-length(right)])
names(right_lower) <- names(right)
right <- right[match(names(left), names(right))]
right_lower <- right_lower[match(names(left), names(right_lower))]
df_list <- multi_ribbons(left_lower, right_lower, left, right,
names(left), width = width)
ggplot(df_list$bands, aes(x = x, ymin = ymin, ymax = ymax, fill = group)) +
geom_ribbon(alpha = 0.5) +
geom_ribbon(alpha = 1, data = df_list$left) +
geom_ribbon(alpha = 1, data = df_list$right) +
geom_text(data = df_list$text, inherit.aes = FALSE, colour = "white",
aes(x = x, y = y, label = labels), size = 8) +
geom_text(data = data.frame(x = c(width / 2, 100 - width /2), y = c(1.05, 1.05),
labels = factor(x_labs, levels = x_labs)),
inherit.aes = FALSE,
mapping = aes(x = x, y = y, label = labels), size = 12) +
geom_text(data = data.frame(x = rep(-5, length(y_labs)),
y = unique(df_list$text$y[1:(nrow(df_list$text)/2)]),
labs = unique(df_list$bands$group)),
mapping = aes(x = x, y = y, colour = labs, label = labs),
inherit.aes = FALSE, size = 8, hjust = 1) +
scale_fill_manual(values = c("#e64b35", "#806249", "#00a087", "#3c5488")) +
scale_colour_manual(values = c("#e64b35", "#806249", "#00a087", "#3c5488")) +
coord_cartesian(xlim = c(-15, 101)) +
theme_void() + theme(legend.position = "none")
}
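The first half of `alluvial()` turns one row of counts into stacked band boundaries. Here is a minimal sketch of that step alone, using invented counts (the numbers are not from the question's data):

```r
# Hypothetical cell counts for one condition
counts <- c(Immune = 50, Fibroblast = 25, Endothelial = 15, Epithelial = 10)

# Proportions, sorted so the smallest band is at the bottom
prop <- counts / sum(counts)
upper <- cumsum(sort(prop))

# Each band starts where the previous one ended
lower <- c(0, upper[-length(upper)])
names(lower) <- names(upper)

data.frame(group = names(upper), lower = lower, upper = upper,
           row.names = NULL)
```

The same calculation is done for the second row, and the `match()` calls then reorder the right-hand boundaries so that each group's left and right bands are paired up.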
So, if your data frame is in a format like this:
head(df, 20)
#> condition variable
#> 110 Disease Immune
#> 149 Disease Fibroblast
#> 133 Disease Immune
#> 184 Disease Endothelial
#> 137 Disease Immune
#> 200 Disease Endothelial
#> 30 Health Immune
#> 11 Health Immune
#> 63 Health Fibroblast
#> 88 Health Endothelial
#> 42 Health Fibroblast
#> 38 Health Fibroblast
#> 106 Disease Immune
#> 139 Disease Immune
#> 6 Health Epithelial
#> 21 Health Immune
#> 27 Health Immune
#> 181 Disease Endothelial
#> 95 Health Endothelial
#> 108 Disease Immune
You can just do:
alluvial(df$condition, df$variable)
To get the above plot, or, for something more random:
set.seed(69)
alluvial(sample(c(TRUE, FALSE), 200, replace = TRUE),
sample(LETTERS[1:4], 200, replace = TRUE))
If you want more than four colour or fill levels, you can remove or adjust the scale_colour_manual and scale_fill_manual calls, to get, for example:
set.seed(69)
alluvial(sample(c(TRUE, FALSE), 200, replace = TRUE),
sample(LETTERS[1:20], 200, replace = TRUE))

How to draw the following graph in R? And what are these types of graphs called?

I am trying to present the following data
x <- factor(c(1,2,3,4,5))
x
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
value <- c(10,5,7,4,12)
value
[1] 10 5 7 4 12
y <- data.frame(x, value)
y
x value
1 1 10
2 2 5
3 3 7
4 4 4
5 5 12
I want to convert the above information into the following graphical representation
What is this type of graph called? I checked out the dot plot, but that only stacks vertically.
This solution plots sets of three bar graphs faceted by x. The height of the bars within each set is determined using the quotient and remainder from dividing value by 3. Horizontal spacing is provided by natural geom spacing. Vertical spacing is created using white gridlines.
library(ggplot2)
library(reshape2)
Data
dataset <- data.frame('x' = 1:5, 'value' = c(10, 5, 7, 4, 12))
Since every value is supposed to be represented by three bars, we will add 3 columns to the dataset and distribute the magnitude of the value among them using integer division:
dataset[, c('col1', 'col2', 'col3')] <- floor(dataset$value / 3)
r <- dataset$value %% 3
dataset[r == 1, 'col1'] <- dataset[r == 1, 'col1'] + 1
dataset[r == 2, c('col1', 'col2')] <- dataset[r == 2, c('col1', 'col2')] + 1
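For reference, the same distribution can be written in a vectorised form. This is only a sketch of the arithmetic, not a replacement for the code above; the matrix name `bars` is introduced here for illustration.

```r
value <- c(10, 5, 7, 4, 12)

q <- value %/% 3  # height every bar gets
r <- value %% 3   # leftover units: 0, 1 or 2

# The first bar gets one extra unit if r >= 1, the second if r == 2
bars <- cbind(col1 = q + (r >= 1),
              col2 = q + (r >= 2),
              col3 = q)
bars

# Each row adds up to the original value
rowSums(bars)
```

For example, 10 is split into 4, 3, 3 and 5 into 2, 2, 1.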
Now, we will melt the dataframe for the purposes of plotting:
dataset <- melt(dataset, id.vars = c('x', 'value'))
colnames(dataset)[4] <- 'magnitude' # avoiding colnames conflict
dataset$variable <- as.character(dataset$variable) # column ordering within a facet
Plot
First, we will make a regular bar graph. We can move facet labels to the bottom of the plot area using the switch parameter.
plt <- ggplot(data = dataset)
plt <- plt + geom_col(aes(x=variable, y = magnitude), fill = 'black')
plt <- plt + facet_grid(.~x, switch="both")
Then we will use theme_minimal() and add a few tweaks to the parameters that govern the appearance of gridlines. Specifically, we will make sure that minor XY gridlines and major X gridlines are blank, whereas major Y gridlines are white and plotted on top of the data.
plt <- plt + theme_minimal()
plt <- plt + theme(panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "white", size = 1.5),
panel.grid.minor = element_blank(),
panel.ontop = TRUE)
We can add value labels using geom_text(). We will only use x values from col2 records such that we're not plotting the value over each bar within each set (col2 happens to be the middle bar).
plt <- plt + geom_text(data = dataset[dataset$variable == 'col2', ],
aes(label = value, x = variable, y = magnitude + 0.5))
plt <- plt + theme(axis.text.x=element_blank()) # removing the 'col' labels
plt + xlab('x') + ylab('value')
The following code will produce a graph similar to the one in the question.
I had to change the data.frame, as yours was not in a form that geom_dotplot can plot. The new variable z$value is a vector of the values 1:5, each repeated as many times as the corresponding entry of value.
library(ggplot2)
value <- c(10, 5, 7, 4, 12)
z <- sapply(value, function(v) c(1, rep(0, v - 1)))
z <- cumsum(unlist(z))
z <- data.frame(value = z)
ggplot(z, aes(x = jitter(value))) +
geom_dotplot() +
xlab("value")
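The `cumsum()` construction above is equivalent to repeating each index with `rep()`, which may be easier to read. A short sketch comparing the two (the names `z_cumsum` and `z_rep` are introduced here):

```r
value <- c(10, 5, 7, 4, 12)

# Construction used in the answer: a 1 marks the start of each group,
# and the running sum turns those marks into group indices
z_cumsum <- cumsum(unlist(sapply(value, function(v) c(1, rep(0, v - 1)))))

# Direct alternative: repeat index i exactly value[i] times
z_rep <- rep(seq_along(value), times = value)

all(z_cumsum == z_rep)
table(z_rep)
```

Both vectors have `sum(value)` elements, and tabulating either one recovers the original counts.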

How to find the coordinates of the labels in `dotchart` in R?

I was wondering if I could find the coordinates (i.e., x, y) of the labels (as circled in blue in the picture below) in a dotchart in R?
y = rnorm(20)
groups = factor( rep(1:2, times = c(5, 15) ) )
dotchart(y, groups = groups)
Update: I'm also asking what the y coordinate of the point exactly between the double-headed arrow in the picture below is (suppose I know the x, but I want the point to sit between the two arrowheads; what y should I use to put the point in that area?):
Looking inside the dotchart function, one can see that group labels are written using mtext:
mtext(glabels, side = 2, line = goffset, at = gpos, adj = 0,
col = "red", las = 2, cex = 1)
where gpos is the vector of positions of the group labels, calculated by:
gpos <- rev(cumsum(rev(tapply(groups, groups, length)) + 2) - 1)
#########
1 2
23 16
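The one-liner for gpos is easier to follow when unrolled. The sketch below uses the same groups as the question (sizes 5 and 15); the intermediate names `n` and `steps` are introduced here for illustration.

```r
groups <- factor(rep(1:2, times = c(5, 15)))

# Number of observations per group: 5 and 15
n <- tapply(groups, groups, length)

# dotchart stacks groups from the bottom, last group first, and leaves
# two extra rows per group (the label row plus a blank row)
steps <- cumsum(rev(n) + 2)

# The label sits one row below the top of its block;
# rev() restores the original group order
gpos <- rev(steps - 1)
gpos
```

With 15 points in group 2 occupying rows 1 to 15, its label lands on row 16, and the 5 points of group 1 occupy rows 18 to 22 with their label on row 23.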
Below we try to print group labels in exactly the same positions (and in red) as the group labels printed by dotchart:
graphics.off()
set.seed(1)
y = rnorm(20)
groups = factor( rep(1:2, times = c(5, 15) ) )
dotchart(y, groups = groups)
glabels <- levels(groups)
linch <- 0
ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
goffset <- 0.4
nmai <- par("mai")
nmai[2L] <- nmai[4L] + max(linch + goffset, ginch) + 0.1
par(mai = nmai)
lheight <- par("csi")
gpos <- rev(cumsum(rev(tapply(groups, groups, length)) + 2) - 1)
ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
goffset <- (max(linch + 0.2, ginch, na.rm = TRUE) + 0.1)/lheight
mtext(glabels, side = 2, line = goffset, at = gpos, adj = 0,
col = "red", las = 2, cex = 1)
EDIT.
Download from this link a modified version of dotchart and save it in your working directory as mydotchart.r
Then type this code:
source("mydotchart.r")
set.seed(1)
y = rnorm(20)
groups = factor( rep(1:2, times = c(5, 15) ) )
mydotchart(y, groups = groups)
The function mydotchart gives the following output:
$gpos
1 2
23 16
$linepos
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 18 19 20 21 22
where gpos is the y-position of group labels and linepos is the vector of y-positions of the horizontal dotted grey lines.
Using linepos it is possible to calculate the position between the double-headed arrow in the question above.
Hope it can help you.
