Assigning values to an empty adjacency matrix based on matching column values

I have an n x n data set, for example this 5 x 5 one:
ALPHA BETA GAMMA DELTA EPSILON
A B A X 1
B C 3 X 3
C D E Z 4
D A D X 5
E A 2 Z 2
I use column “ALPHA” to create an empty adjacency matrix (Aij) whose rows and columns are labelled by the ALPHA values:
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
I want to set the entries of this adjacency matrix based on column “DELTA”: Aij = 1 if rows i and j have the same DELTA value, and Aij = 0 otherwise (including on the diagonal). The new adjacency matrix should look like the following:
A B C D E
A 0 1 0 1 0
B 1 0 0 1 0
C 0 0 0 0 1
D 1 1 0 0 0
E 0 0 1 0 0
What loop or matching technique can I use to assign the new values?
Thanks.
Phil

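A loop could work. Your example has Aii = 0 on the diagonal, so I subtract an identity matrix at the end:
DELTA<-c("X","X","Z","X","Z")
Adj<-mat.or.vec(nr=length(DELTA), nc=length(DELTA))  # n x n matrix of zeros
for (i in 1:length(DELTA)){
Adj[i,DELTA==DELTA[i]]<-1  # mark every j whose DELTA equals row i's DELTA
}
Adj<-Adj-diag(length(DELTA))  # the loop also set Aii = 1, so zero the diagonal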

You could use outer to compare every pair of DELTA values at once, then zero the diagonal:
res <- +(outer(df1$DELTA, df1$DELTA, FUN='=='))*!diag(dim(df1)[1])
dimnames(res) <- rep(list(df1$ALPHA),2)
res
# A B C D E
#A 0 1 0 1 0
#B 1 0 0 1 0
#C 0 0 0 0 1
#D 1 1 0 0 0
#E 0 0 1 0 0
Or, with sapply (the column names of the result then come from DELTA rather than ALPHA):
sapply(df1$DELTA, `==`, df1$DELTA) - diag(dim(df1)[1])
data
df1 <- structure(list(ALPHA = c("A", "B", "C", "D", "E"), BETA = c("B",
"C", "D", "A", "A"), GAMMA = c("A", "3", "E", "D", "2"), DELTA = c("X",
"X", "Z", "X", "Z"), EPSILON = c(1L, 3L, 4L, 5L, 2L)), .Names = c("ALPHA",
"BETA", "GAMMA", "DELTA", "EPSILON"), class = "data.frame",
row.names = c(NA, -5L))
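The crossprod(table(...)) idea used for a related question further down also fits this problem. A minimal sketch, assuming only the ALPHA and DELTA columns matter and each ALPHA value occurs once: tabulating DELTA against ALPHA gives a group-by-node incidence table, and its crossprod is 1 wherever two nodes share a DELTA value.

```r
# Only the two columns the question uses; values copied from the data above
df1 <- data.frame(ALPHA = c("A", "B", "C", "D", "E"),
                  DELTA = c("X", "X", "Z", "X", "Z"),
                  stringsAsFactors = FALSE)

# Rows = DELTA values, columns = ALPHA labels, entry 1 if that ALPHA has that DELTA
inc <- table(df1$DELTA, df1$ALPHA)

# t(inc) %*% inc: entry (i, j) counts the DELTA values shared by ALPHA i and ALPHA j
res <- crossprod(inc)
diag(res) <- 0  # the question wants Aii = 0
res
```

If an ALPHA label could appear on several rows, the entries would become counts rather than 0/1, and wrapping the result as +(res > 0) would turn it back into an indicator.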

Related

Transform relationship pairs into a matrix

I have a data frame like this. Each id is unique, and type gives the group the id belongs to.
id type
a a1
b a1
c a2
d a3
e a4
f a4
I want to build a matrix like the one below, where an entry is 1 if the two ids belong to the same type and 0 otherwise.
a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
The data frame is large (over 70,000 lines), and I do not know how to do this efficiently in R. Any suggestions would be appreciated.
Here is a base R solution. table(df) is the id-by-type incidence matrix X, and crossprod(t(X)), i.e. X %*% t(X), is 1 exactly when two ids share a type:
M <- crossprod(t(table(df)))
or
M <- crossprod(table(rev(df)))
Either gives:
> M
id
id a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
DATA
df <- structure(list(id = c("a", "b", "c", "d", "e", "f"), type = c("a1",
"a1", "a2", "a3", "a4", "a4")), class = "data.frame", row.names = c(NA,
-6L))
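For the 70,000-row case, note that the dense result is a 70,000 x 70,000 matrix, about 4.9 billion entries (roughly 39 GB as doubles). A sketch using the Matrix package (shipped with R as a recommended package) keeps everything sparse. This swaps table() for a sparse incidence matrix built with sparseMatrix(), and assumes each id appears only once.

```r
library(Matrix)

df <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"),
                 stringsAsFactors = FALSE)

id_f <- factor(df$id)
type_f <- factor(df$type)

# Sparse id-by-type incidence matrix: one nonzero entry per row of df
inc <- sparseMatrix(i = as.integer(id_f), j = as.integer(type_f), x = 1,
                    dims = c(nlevels(id_f), nlevels(type_f)),
                    dimnames = list(levels(id_f), levels(type_f)))

# inc %*% t(inc): 1 where two ids share a type, stored sparsely
M <- tcrossprod(inc)
M
```

The number of stored nonzeros is the sum over types of (group size)^2, so the result stays small unless a single type holds a very large share of the ids.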

Adjacency Matrix from source target dataset

I have a dataset as follows
Var1 Var2 Count
A B 3
A C 4
A D 10
A L 6
I need to create an adjacency matrix to use downstream in a chord diagram, and I am looking for an efficient way to build it. The matrix should be symmetric:
A B C D L
A 0 3 4 10 6
B 3 0 0 0 0
C 4 0 0 0 0
D 10 0 0 0 0
L 6 0 0 0 0
I am looking for a chord-diagram visualization.
Assuming you're talking about just the symmetric matrix generation:
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
Var1 Var2 Count
A B 3
A C 4
A D 10
A L 6')
vars <- sort(unique(unlist(dat[c("Var1","Var2")])))
m <- matrix(0, nr=length(vars), nc=length(vars), dimnames=list(vars,vars))
m[as.matrix(dat[c("Var1","Var2")])] <- m[as.matrix(dat[c("Var2","Var1")])] <- dat$Count
m
# A B C D L
# A 0 3 4 10 6
# B 3 0 0 0 0
# C 4 0 0 0 0
# D 10 0 0 0 0
# L 6 0 0 0 0
Here is an option using xtabs. Convert the first two columns to factors, with the levels in the order wanted in the output. Then use xtabs to get a matrix, and add its transpose to make it symmetric:
dat[1:2] <- lapply(dat[1:2], factor, levels = c("A", "B", "C", "D", "L"))
out <- xtabs(Count ~ Var1 + Var2, dat)
out + t(out)
# Var2
#Var1 A B C D L
# A 0 3 4 10 6
# B 3 0 0 0 0
# C 4 0 0 0 0
# D 10 0 0 0 0
# L 6 0 0 0 0
data
dat <- structure(list(Var1 = c("A", "A", "A", "A"), Var2 = c("B", "C",
"D", "L"), Count = c(3L, 4L, 10L, 6L)), class = "data.frame",
row.names = c(NA, -4L))
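The xtabs answer hard-codes the factor levels. A sketch that derives the node set from both columns instead, assuming the node names are plain strings, and returns a plain numeric matrix rather than an xtabs object:

```r
dat <- data.frame(Var1 = c("A", "A", "A", "A"),
                  Var2 = c("B", "C", "D", "L"),
                  Count = c(3L, 4L, 10L, 6L),
                  stringsAsFactors = FALSE)

# Every node that appears as a source or a target, in sorted order
lvls <- sort(unique(c(dat$Var1, dat$Var2)))
dat$Var1 <- factor(dat$Var1, levels = lvls)
dat$Var2 <- factor(dat$Var2, levels = lvls)

# Directed counts; duplicate (Var1, Var2) pairs are summed by xtabs
out <- xtabs(Count ~ Var1 + Var2, data = dat)

# Symmetrize, then copy into a plain matrix (drops the xtabs class)
adj <- matrix(out + t(out), nrow = length(lvls),
              dimnames = list(lvls, lvls))
adj
```

One caveat of the out + t(out) step: a row with Var1 equal to Var2 would have its Count doubled on the diagonal.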

Loop through a data frame: counting each pairwise combination of values within each ID

I have a dataframe called "df" like this:
ID Value
1 a
1 b
1 c
1 d
3 a
3 b
3 e
3 f
. .
. .
. .
I have a matrix filled with zeros like this:
a b c d e f
a x 0 0 0 0 0
b 0 x 0 0 0 0
c 0 0 x 0 0 0
d 0 0 0 x 0 0
e 0 0 0 0 x 0
f 0 0 0 0 0 x
I would then like to loop through the dataframe something like this:
for each ID, for each value i, for each value j != i, matrix[i,j] += 1
So for each ID, for each combination of values, I would like to raise the value in the matrix by 1, resulting in:
a b c d e f
a x 2 1 1 1 1
b 2 x 1 1 1 1
c 1 1 x 1 0 0
d 1 1 1 x 0 0
e 1 1 0 0 x 1
f 1 1 0 0 1 x
So for example, [a,b] = 2, because this combination of values occurs for two different IDs, while [a,c] = 1, because this combination of values only occurs when ID = 1 and not when ID = 3.
How can I achieve this? I already made a vector containing the unique IDs.
Thanks in advance.
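The easiest approach is to tabulate and take the crossprod: table(df) is an ID-by-Value incidence matrix X, and crossprod(X) = t(X) %*% X counts, for each pair of values, the IDs in which both occur.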
out <- crossprod(table(df))
diag(out) <- NA #replace the diagonals with NA
names(dimnames(out)) <- NULL #set the names of the dimnames as NULL
out
# a b c d e f
#a NA 2 1 1 1 1
#b 2 NA 1 1 1 1
#c 1 1 NA 1 0 0
#d 1 1 1 NA 0 0
#e 1 1 0 0 NA 1
#f 1 1 0 0 1 NA
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L), Value = c("a",
"b", "c", "d", "a", "b", "e", "f")), .Names = c("ID", "Value"
), class = "data.frame", row.names = c(NA, -8L))
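For comparison, the loop described in the question can be written directly in R. It is slower than crossprod but mirrors the pseudocode. This sketch calls unique() on each ID's values, so a value repeated within one ID is counted once (the table/crossprod version would multiply such repeats).

```r
df <- data.frame(ID = c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L),
                 Value = c("a", "b", "c", "d", "a", "b", "e", "f"),
                 stringsAsFactors = FALSE)

vals <- sort(unique(df$Value))
m <- matrix(0, nrow = length(vals), ncol = length(vals),
            dimnames = list(vals, vals))

for (id in unique(df$ID)) {
  v <- unique(df$Value[df$ID == id])  # values present for this ID
  for (i in v) {
    for (j in v) {
      if (i != j) m[i, j] <- m[i, j] + 1  # matrix[i, j] += 1 for j != i
    }
  }
}
diag(m) <- NA  # the question leaves the diagonal undefined ("x")
m
```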

How to create dichotomous variables based on some factors in R?

The initial dataframe is:
Factor1 Factor2 Factor3
A B C
B C NA
A NA NA
B C D
E NA NA
I want to create 5 dichotomous variables from the factor variables above. The new variable A should be 1 if any of Factor1, Factor2 or Factor3 contains A, and 0 otherwise; likewise for B, C, D and E. The new variables should look like this:
A B C D E
1 1 1 0 0
0 1 1 0 0
1 0 0 0 0
0 1 1 1 0
0 0 0 0 1
We can use table for this: repeat the row indices once per column so that they line up with unlist(df1), which stacks the columns, then tabulate row index against value (table() drops NAs by default).
table(rep(1:nrow(df1), ncol(df1)), unlist(df1))
# A B C D E
# 1 1 1 1 0 0
# 2 0 1 1 0 0
# 3 1 0 0 0 0
# 4 0 1 1 1 0
# 5 0 0 0 0 1
If a value can occur more than once in the same row, its count will exceed 1; converting to logical and back to integer gives a 0/1 result:
+(!!table(rep(1:nrow(df1), ncol(df1)), unlist(df1)))
data
df1 <- structure(list(Factor1 = c("A", "B", "A", "B", "E"),
Factor2 = c("B",
"C", NA, "C", NA), Factor3 = c("C", NA, NA, "D", NA)),
.Names = c("Factor1",
"Factor2", "Factor3"), class = "data.frame", row.names = c(NA, -5L))
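If a plain 0/1 matrix is wanted instead of a table (for example, to cbind onto df1), an alternative sketch compares every column with each distinct value and uses rowSums; sort() drops the NAs when collecting the distinct values.

```r
df1 <- data.frame(Factor1 = c("A", "B", "A", "B", "E"),
                  Factor2 = c("B", "C", NA, "C", NA),
                  Factor3 = c("C", NA, NA, "D", NA),
                  stringsAsFactors = FALSE)

# Distinct non-missing values across all columns (sort() drops NA by default)
vals <- sort(unique(unlist(df1, use.names = FALSE)))

# For each value v: 1 if any column in the row equals v, else 0
res <- sapply(vals, function(v) as.integer(rowSums(df1 == v, na.rm = TRUE) > 0))
res
```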

In an NxN matrix give value "True" to pairs from data frame

I have created a 5x5 matrix whose rows and columns have the same names, and a data frame of name pairs:
N <- 5
Names <- letters[1:N]
mat <- matrix(rep(0, N*N), nrow = N, ncol = N, dimnames = list(Names, Names))
a b c d e
a 0 0 0 0 0
b 0 0 0 0 0
c 0 0 0 0 0
d 0 0 0 0 0
e 0 0 0 0 0
The data frame consists of pairs:
col1 col2
1 a c
2 c b
3 d b
4 d e
How can I match these so that col1 refers only to rows of my matrix and col2 only to columns? The example above should produce the following result:
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0
You can use match to build a two-column "key" of the row and column positions that need to be set to 1 (here d1 is the data frame of pairs):
key <- vapply(seq_along(d1),
function(x) match(d1[[x]],
dimnames(mat)[[x]]),  # col1 -> row positions, col2 -> column positions
numeric(nrow(d1)))
Then, use matrix indexing to replace the relevant values.
mat[key] <- 1
mat
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0
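You could also index directly with the pairs converted to a two-column character matrix; R matches the first column against the row names and the second against the column names: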
mat[as.matrix(d1)] <- 1
mat
# a b c d e
#a 0 0 1 0 0
#b 0 0 0 0 0
#c 0 1 0 0 0
#d 0 1 0 0 1
#e 0 0 0 0 0
data
d1 <- structure(list(col1 = c("a", "c", "d", "d"), col2 = c("c", "b",
"b", "e")), .Names = c("col1", "col2"), class = "data.frame",
row.names = c("1", "2", "3", "4"))
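Another option, sketched here under the assumption that every name in d1 also appears in Names, is to skip the pre-allocated matrix and tabulate the pairs directly, fixing the factor levels so that all five names appear as rows and columns. Unlike the assignment approaches, a pair listed twice would give a count of 2.

```r
Names <- letters[1:5]
d1 <- data.frame(col1 = c("a", "c", "d", "d"),
                 col2 = c("c", "b", "b", "e"),
                 stringsAsFactors = FALSE)

# col1 supplies the rows and col2 the columns; fixed levels keep empty rows/columns
tab <- table(factor(d1$col1, levels = Names),
             factor(d1$col2, levels = Names))

# Copy into a plain matrix shaped like mat (drops the table class)
mat2 <- matrix(tab, nrow = length(Names), dimnames = list(Names, Names))
mat2
```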
