Create an adjacency matrix from unbalanced trade flow data in R

I have a dataset of bilateral trade flows of dimension 84x244.
How can I balance the dataset into a 244x244 matrix whose rows have the same names and order as the columns?
The matrix is not symmetric.
For example, the matrix resembles:
A B C D
B 0 0 0 1
D 2 0 0 0
and it should look like
A B C D
A 0 0 0 0
B 0 0 0 1
C 0 0 0 0
D 2 0 0 0
With A B C D as row and column names

Here are two methods that ensure the column names and row names are effectively the same, using a default value of 0 for missing rows/columns. These do not assume that the columns are always full; if this is guaranteed, then you can ignore the column-adding portions.
Both start with:
m <- as.matrix(read.table(header=TRUE, text="
A B C D
B 0 0 0 1
D 2 0 0 0"))
First
needrows <- setdiff(colnames(m), rownames(m))
m <- rbind(m, matrix(0, nrow=length(needrows), ncol=ncol(m), dimnames=list(needrows, colnames(m))))
needcols <- setdiff(rownames(m), colnames(m))
m <- cbind(m, matrix(0, nrow=nrow(m), ncol=length(needcols), dimnames=list(rownames(m), needcols)))
m
# A B C D
# B 0 0 0 1
# D 2 0 0 0
# A 0 0 0 0
# C 0 0 0 0
To order the rows the same as the columns, index by the column names. Note that any row names not present in the column names are dropped by this step, though you can keep them with another setdiff if needed.
m[colnames(m),]
# A B C D
# A 0 0 0 0
# B 0 0 0 1
# C 0 0 0 0
# D 2 0 0 0
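As a minimal sketch of the "another setdiff" idea, assuming the column-adding step is skipped and the matrix has a row name that is not a column name (the row "E" and its values below are made up for illustration), the leftover rows can be appended after the column-ordered ones:

```r
# Hypothetical example: row "E" has no matching column
m <- matrix(c(0, 0, 0, 1,
              2, 0, 0, 0,
              0, 3, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("B", "D", "E"), c("A", "B", "C", "D")))

# Add all-zero rows for column names that have no row yet
needrows <- setdiff(colnames(m), rownames(m))
m <- rbind(m, matrix(0, nrow = length(needrows), ncol = ncol(m),
                     dimnames = list(needrows, colnames(m))))

# Column order first, then any row names that are not column names
roworder <- c(colnames(m), setdiff(rownames(m), colnames(m)))
ordered <- m[roworder, , drop = FALSE]
rownames(ordered)
```

The result is not square in this case, because "E" still has no column; that is what the column-adding step in the first method takes care of.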
Second
allnames <- sort(unique(unlist(dimnames(m))))
m2 <- matrix(0, nrow=length(allnames), ncol=length(allnames),
dimnames=list(allnames, allnames))
m2[intersect(rownames(m), allnames), colnames(m)] <-
m[intersect(rownames(m), allnames), colnames(m)]
m2[rownames(m), intersect(colnames(m), allnames)] <-
m[rownames(m), intersect(colnames(m), allnames)]
m2
# A B C D
# A 0 0 0 0
# B 0 0 0 1
# C 0 0 0 0
# D 2 0 0 0

Here is a base R solution. The basic idea is to first construct a square matrix of all zeros whose row names are set to its column names, and then assign values to the rows by row name, i.e.,
M <- `dimnames<-`(matrix(0,nrow = ncol(m),ncol = ncol(m)),
replicate(2,list(colnames(m))))
M[rownames(m),] <- m
such that
> M
A B C D
A 0 0 0 0
B 0 0 0 1
C 0 0 0 0
D 2 0 0 0
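The approaches above can be wrapped into a small reusable function. This is only a sketch: the name square_by_names and its fill argument are hypothetical, not part of any package. It builds the square matrix on the union of column and row names (column order first) and copies the existing cells over by name:

```r
# Hypothetical helper: pad a named matrix to a square matrix on the union of names
square_by_names <- function(m, fill = 0) {
  nms <- union(colnames(m), rownames(m))   # column order first, extra row names last
  out <- matrix(fill, nrow = length(nms), ncol = length(nms),
                dimnames = list(nms, nms))
  out[rownames(m), colnames(m)] <- m       # copy existing cells by name
  out
}

m <- as.matrix(read.table(header = TRUE, text = "
A B C D
B 0 0 0 1
D 2 0 0 0"))

sq <- square_by_names(m)
sq
```

Because the cells are matched by name rather than by position, the order of the rows in the input does not matter.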

Related

How to quantify observation pairs in individuals

I'm looking for a way to count pairs of observations that occur together in the same individual (patient). In this example, each patient has had two different diseases. The pair of diseases "a" and "b" occurs 4 times (in patients "G", "H", "I" and "J"), and the pair "k" and "o" occurs twice (patients "D" and "E" each had both "k" and "o").
Patient_ID<- c("A","A","B","B","C","C","D","D","E","E","F","F",
"G","G","H","H","I","I","J","J")
Disease<-c("v","s","s","v","s","v" ,"k","o","k","o","o","s","a","b",
"a","b","b","a","b","a")
DATA<-data.frame(Patient_ID,Disease)
print(DATA)
Patient_ID Disease
1 A v
2 A s
3 B s
4 B v
5 C s
6 C v
7 D k
8 D o
9 E k
10 E o
11 F o
12 F s
13 G a
14 G b
15 H a
16 H b
17 I b
18 I a
19 J b
20 J a
From these data I would like to generate a table like the one below.
a b k o v s
a 0 4 0 0 0 0
b 4 0 0 0 0 0
k 0 0 0 2 0 0
o 0 0 2 0 0 1
v 0 0 0 0 0 3
s 0 0 0 1 3 0
Then generate a table for only levels that have count above a certain threshold (for example 2) like in the second table (below).
a b v s
a 0 4 0 0
b 4 0 0 0
v 0 0 0 3
s 0 0 3 0
Here is a base R option using table+crossprod, i.e.,
res <- `diag<-`(crossprod(table(DATA)),0)
which gives
> res
Disease
Disease a b k o s v
a 0 4 0 0 0 0
b 4 0 0 0 0 0
k 0 0 0 2 0 0
o 0 0 2 0 1 0
s 0 0 0 1 0 3
v 0 0 0 0 3 0
For the subset by given threshold, you can use
th <- 2
inds <- rowSums(res > th)>0
subset_res <- subset(res,inds,inds)
which gives
> subset_res
Disease
Disease a b s v
a 0 4 0 0
b 4 0 0 0
s 0 0 0 3
v 0 0 3 0
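To see why crossprod works here, a short sketch: table(DATA) is a patient-by-disease incidence matrix X, and crossprod(X) computes t(X) %*% X, so entry (i, j) is the number of patients who had both disease i and disease j. The diagonal holds the number of patients per disease, which is why it is zeroed in the answer above. The names X and co are just illustrative:

```r
Patient_ID <- c("A","A","B","B","C","C","D","D","E","E","F","F",
                "G","G","H","H","I","I","J","J")
Disease <- c("v","s","s","v","s","v","k","o","k","o","o","s","a","b",
             "a","b","b","a","b","a")
DATA <- data.frame(Patient_ID, Disease)

# Patient x disease incidence matrix (1 if the patient had the disease)
X <- unclass(table(DATA))

# t(X) %*% X: co[i, j] counts patients who had both disease i and disease j
co <- crossprod(X)

diag(co)        # patients per disease
co["a", "b"]    # patients who had both "a" and "b"
```

Note that the threshold filter above uses res > th, so a pair whose count equals the threshold exactly is dropped; use >= if the threshold should be inclusive.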
First, use unstack() to reshape Disease into a data frame with 2 columns, one row per patient. Make sure both columns are factors with the same levels; this prevents levels from being dropped in the next step. Then pass the data frame to table() to create a contingency table. In this table, "a & b" and "b & a" are counted separately, so to get the total counts you need tab + t(tab).
pair <- data.frame(t(unstack(DATA, Disease ~ Patient_ID)))
# levels(DATA$Disease) is NULL when Disease is a character column
# (the data.frame default since R 4.0), so derive the levels from the values
pair[] <- lapply(pair, factor, levels = sort(unique(as.character(DATA$Disease))))
tab <- table(pair)
tab + t(tab)
# X2
# X1 a b k o s v
# a 0 4 0 0 0 0
# b 4 0 0 0 0 0
# k 0 0 0 2 0 0
# o 0 0 2 0 1 0
# s 0 0 0 1 0 3
# v 0 0 0 0 3 0

in R: Two Way Match to Matrix

If I have a data.frame
df <- data.frame(DEP=letters[1:5], ARR=letters[11:15], NO=1:5+5)
DEP ARR NO
1 a k 6
2 b l 7
3 c m 8
4 d n 9
5 e o 10
I want to create a matrix of DEP as ROW ID, and ARR as COL ID, and fill in the matrix with the relevant matching NO...
e.g.
k l m n o
a 6 7 8 9 10 ...etc
Each combination is unique.
DEP and ARR are the same vector of names. I have chosen two different sample ones here for clarity.
I am struggling to use match to sort them and fill them into the matrix template I created below:
mat <- matrix(0,nrow(df),nrow(df)); colnames(mat) <- df$ARR; rownames(mat) <- df$DEP;
k l m n o
a 0 0 0 0 0
b 0 0 0 0 0
c 0 0 0 0 0
d 0 0 0 0 0
e 0 0 0 0 0
Is there an efficient way of doing this? Many thanks for all advice!
See ?xtabs:
xtabs(NO ~ ., data=df)
# ARR
#DEP k l m n o
# a 6 0 0 0 0
# b 0 7 0 0 0
# c 0 0 8 0 0
# d 0 0 0 9 0
# e 0 0 0 0 10
If I understood your question correctly, you could use a sparse matrix definition:
library(Matrix)
mat <- spMatrix(length(df$DEP), length(df$ARR),
seq(df$DEP), seq(df$ARR), as.numeric(as.character(df$NO)))
rownames(mat) <- df$DEP
colnames(mat) <- df$ARR
#> as.matrix(mat)
# k l m n o
#a 6 0 0 0 0
#b 0 7 0 0 0
#c 0 0 8 0 0
#d 0 0 0 9 0
#e 0 0 0 0 10
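Since the question mentions struggling with match, here is a minimal base R sketch of that route. It fills a zero template using a two-column matrix of (row, column) positions, so it does not depend on the pairs lying on the diagonal. The variable names rows, cols and idx are illustrative:

```r
df <- data.frame(DEP = letters[1:5], ARR = letters[11:15], NO = 1:5 + 5)

# Zero template with DEP as row names and ARR as column names
rows <- unique(as.character(df$DEP))
cols <- unique(as.character(df$ARR))
mat <- matrix(0, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))

# One (row, column) position per data frame row
idx <- cbind(match(df$DEP, rownames(mat)),
             match(df$ARR, colnames(mat)))

# Indexing with a two-column matrix assigns each NO to its own cell
mat[idx] <- df$NO
mat
```

If DEP and ARR really share one vector of names, as the question says, the same vector can be used for both rows and cols to get a square matrix.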

Populating data from one data.table to another

I have a distance matrix (as data.table) showing pairwise distances between a number of items, but not all items are in the matrix. I need to create a larger data.table that has all the missing items populated. I can do this with matrices fairly easily:
items=c("a", "b", "c", "d")
small_matrix=matrix(c(0, 1, 2, 3), nrow=2, ncol=2,
dimnames=list(c("a", "b"), c("a", "b")))
# create zero matrix of the right size
full_matrix <- matrix(0, ncol=length(items), nrow=length(items),
dimnames=list(items, items))
# populate items from the small matrix
full_matrix[rownames(small_matrix), colnames(small_matrix)] <- small_matrix
full_matrix
# a b c d
# a 0 2 0 0
# b 1 3 0 0
# c 0 0 0 0
# d 0 0 0 0
What is the equivalent of that in data.table? I can create an 'id' column in small_DT and use it as the key, but I'm not sure how to overwrite the items in full_DT that have the same id/column pair.
Let's convert to data.table and keep the row names as an extra column:
dts = as.data.table(small_matrix, keep.rownames = TRUE)
# rn a b
#1: a 0 2
#2: b 1 3
dtf = as.data.table(full_matrix, keep.rownames = TRUE)
# rn a b c d
#1: a 0 0 0 0
#2: b 0 0 0 0
#3: c 0 0 0 0
#4: d 0 0 0 0
Now just join on the rows, and assuming small matrix is always a subset you can do the following:
dtf[dts, names(dts) := dts, on = 'rn']
dtf
# rn a b c d
#1: a 0 2 0 0
#2: b 1 3 0 0
#3: c 0 0 0 0
#4: d 0 0 0 0
The above assumes data.table version 1.9.5+. Otherwise you'll need to set the key first.
Suppose you have these two data.tables:
dt1 = as.data.table(small_matrix)
# a b
#1: 0 2
#2: 1 3
dt2 = as.data.table(full_matrix)
# a b c d
#1: 0 0 0 0
#2: 0 0 0 0
#3: 0 0 0 0
#4: 0 0 0 0
You can't assign into a data.table the way you would with a data.frame or matrix, e.g. by doing:
dt2[rownames(full_matrix) %in% rownames(small_matrix), names(dt1), with=F] <- dt1
This code will raise an error, because to assign new values you need to use the := operator:
dt2[rownames(full_matrix) %in% rownames(small_matrix), names(dt1):=dt1][]
# a b c d
#1: 0 2 0 0
#2: 1 3 0 0
#3: 0 0 0 0
#4: 0 0 0 0

Delete columns from a square matrix that sum to zero along with corresponding rows

I have a binary transition matrix. I want to delete rows associated with columns that sum to zero. For example, if
A B C D E
A 0 0 0 1 0
B 1 0 0 1 0
C 0 0 1 1 0
D 0 0 1 0 0
E 0 0 1 1 0
columns B and E sum to zero. I know how to get rid of the columns like this,
> a.adj=a[,!!colSums(a)]
> a.adj
A C D
A 0 0 1
B 1 0 1
C 0 1 1
D 0 1 0
E 0 1 1
but how can I at the same time delete rows B and E to get
A C D
A 0 0 1
C 0 1 1
D 0 1 0
If the rownames and colnames are in the same order
indx <- !!colSums(a)
a[indx,indx]
# A C D
#A 0 0 1
#C 0 1 1
#D 0 1 0
Use names to select both columns and rows
> ind <- colnames(a[,!!colSums(a)])
> a[ind, ind]
A C D
A 0 0 1
C 0 1 1
D 0 1 0
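A self-contained sketch of the name-based variant, using the matrix from the question. Taking the names from colnames(a) directly, and adding drop = FALSE, keeps the result a matrix even if only one column survives; the variable name keep is illustrative:

```r
a <- matrix(c(0, 0, 0, 1, 0,
              1, 0, 0, 1, 0,
              0, 0, 1, 1, 0,
              0, 0, 1, 0, 0,
              0, 0, 1, 1, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], LETTERS[1:5]))

# Names of the columns whose sum is non-zero
keep <- colnames(a)[colSums(a) != 0]

# Select the same names for rows and columns; drop = FALSE keeps a matrix
a.adj <- a[keep, keep, drop = FALSE]
a.adj
```

Selecting by name also works when the rows and columns are not in the same order, which the logical-index version relies on.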

converting data frame into affiliation network in R

I have a data frame with the following format:
name workplace
a A
b B
c A
d C
e D
....
I would like to convert this data frame into an affiliation network in R with the format
A B C D ...
a 1 0 0 0
b 0 1 0 0
c 1 0 0 0
d 0 0 1 0
e 0 0 0 1
...
and I used the following program:
for (i in 1:nrow(A1)) {
a1[rownames(a1) == A1$name[i],
colnames(a1) == A1$workplace[i]] <- 1
}
where A1 is the data frame, and a1 is the affiliation network. However, since I have a large data frame, the above program runs very slowly. Is there an efficient way that avoids looping in data conversion?
Thank you very much!
If your data frame is called df, just do:
as.data.frame.matrix(table(df))
# A B C D
# a 1 0 0 0
# b 0 1 0 0
# c 1 0 0 0
# d 0 0 1 0
# e 0 0 0 1
Maybe this also helps:
m1 <- model.matrix(~0+workplace, data=dat)
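# model.matrix orders the columns by sorted factor level, so label them with
# the levels rather than with unique(), which follows order of appearance
dimnames(m1) <- list(dat$name, levels(factor(dat$workplace)))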
as.data.frame(m1)
# A B C D
#a 1 0 0 0
#b 0 1 0 0
#c 1 0 0 0
#d 0 0 1 0
#e 0 0 0 1
