Transform relationship pairs into a matrix

I have one data frame like this. The id of each line is unique and the type defines the group of the id.
id type
a a1
b a1
c a2
d a3
e a4
f a4
I want to build a matrix like the one below. The value is 1 if the two ids belong to the same type, and 0 otherwise.
  a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
The data frame is large (over 70 thousand rows), and I do not know how to do this efficiently in R. Any suggestions would be appreciated.

Here is a base R solution: cross-multiply the contingency table of id and type with itself.
M <- crossprod(t(table(df)))
or
M <- crossprod(table(rev(df)))
such that
> M
   id
id  a b c d e f
  a 1 1 0 0 0 0
  b 1 1 0 0 0 0
  c 0 0 1 0 0 0
  d 0 0 0 1 0 0
  e 0 0 0 0 1 1
  f 0 0 0 0 1 1
DATA
df <- structure(list(id = c("a", "b", "c", "d", "e", "f"), type = c("a1",
"a1", "a2", "a3", "a4", "a4")), class = "data.frame", row.names = c(NA,
-6L))
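
Since the question mentions more than 70 thousand rows, a dense id-by-id result would have about 4.9 billion cells, which may not fit in memory. The following is a minimal sketch of the same cross-product idea using a sparse matrix. It assumes the Matrix package is available (it ships with most R installations); the names type_f and X are illustrative and do not come from the answer above.

```r
library(Matrix)  # assumption: the recommended Matrix package is installed

df <- data.frame(id   = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"),
                 stringsAsFactors = FALSE)

type_f <- factor(df$type)

# Sparse id-by-type incidence matrix: exactly one nonzero entry per row.
X <- sparseMatrix(i = seq_len(nrow(df)),
                  j = as.integer(type_f),
                  x = 1,
                  dims = c(nrow(df), nlevels(type_f)),
                  dimnames = list(df$id, levels(type_f)))

# id-by-id co-membership matrix, X %*% t(X), kept in sparse form.
M <- tcrossprod(X)
M
```

The number of stored entries in M is the sum of the squared group sizes, so this only helps when the groups are small relative to the number of ids.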

Related

R - Create matrix from 3 raw vector

I have 3 vectors as the following:
A <- c("A", "B", "C", "D", "E")
B <- c("1/1/1", "1/1/1", "2/1/1", "2/1/1", "3/1/1")
C <- c(1, 1, -1, 1, -1)
and I want to create a matrix like the following using these 3 vectors:
- 1/1/1 2/1/1 3/1/1
A 1 0 0
B 1 0 0
C 0 -1 0
D 0 1 0
E 0 0 -1
where A supplies the row names, B supplies the column names, and C holds the cell values.
Any help would be appreciated.

Use ?xtabs
xtabs(C ~ A+B)
# B
#A 1/1/1 2/1/1 3/1/1
# A 1 0 0
# B 1 0 0
# C 0 -1 0
# D 0 1 0
# E 0 0 -1

You can try:
# build a zero matrix named by the unique values, then fill it by name pairs
m <- array(0, c(length(unique(A)), length(unique(B))),
           list(unique(A), unique(B)))
m[cbind(A, B)] <- C
m
# 1/1/1 2/1/1 3/1/1
#A 1 0 0
#B 1 0 0
#C 0 -1 0
#D 0 1 0
#E 0 0 -1

Another option is acast from reshape2, after putting the vectors in a data.frame:
library(reshape2)
acast(data.frame(A, B, C), A ~ B, value.var = "C", fill = 0)
# 1/1/1 2/1/1 3/1/1
#A 1 0 0
#B 1 0 0
#C 0 -1 0
#D 0 1 0
#E 0 0 -1
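
One difference between these approaches only shows up when the same (A, B) pair occurs more than once, which is not the case in the question's data. The sketch below uses made-up vectors to illustrate it: xtabs adds the values of repeated pairs, while assignment by name pairs keeps only the last value.

```r
# Hypothetical input in which the pair ("A", "x") appears twice.
A <- c("A", "A", "B")
B <- c("x", "x", "y")
C <- c(1, 2, 5)

# xtabs sums C over repeated (A, B) combinations.
summed <- xtabs(C ~ A + B)

# Matrix assignment by name pairs overwrites: the last value wins.
m <- matrix(0, nrow = 2, ncol = 2,
            dimnames = list(unique(A), unique(B)))
m[cbind(A, B)] <- C

summed["A", "x"]  # 3
m["A", "x"]       # 2
```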

Reshape a data frame into a wide shape

The data contains two variables: id and grade. Each id can have multiple records
for each grade.
dat <- data.frame(id = c(1,1,1,2,2,2,2,3,3,4,5,5,5),
grade = c("a", "b", "c", "a", "a", "b", "b", "d", "f", "c", "a", "e", "f"))
I want to reshape the data into a wide shape such that each id has only one record
and each unique grade becomes a single column. The value of each column is either 0 or 1,
depending on the grades for each id.
The final data set looks like:
id a b c d e f
1 1 1 1 0 0 0
2 1 1 0 0 0 0
3 0 0 0 1 0 1
4 0 0 1 0 0 0
5 1 0 0 0 1 1
I tried this, but no luck.
n.dat <- reshape(dat, timevar = "grade",idvar = c("id"),direction = "wide")

You can tabulate the values with table(), test for > 0 to get a logical matrix, and convert that back to numeric with the unary + operator (or, less tersely, by adding 0):
+(table(dat) > 0)
# grade
# id a b c d e f
# 1 1 1 1 0 0 0
# 2 1 1 0 0 0 0
# 3 0 0 0 1 0 1
# 4 0 0 1 0 0 0
# 5 1 0 0 0 1 1
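
The result above is a table rather than a data frame. If a data frame with an explicit id column is needed, as in the question's desired output, one possible follow-up is sketched below. The name wide is illustrative, and the conversion of the row names back to numbers assumes the ids are numeric, as they are here.

```r
dat <- data.frame(id = c(1,1,1,2,2,2,2,3,3,4,5,5,5),
                  grade = c("a", "b", "c", "a", "a", "b", "b",
                            "d", "f", "c", "a", "e", "f"))

# 0/1 indicator table, converted column-by-column into a data frame.
wide <- as.data.frame.matrix(+(table(dat) > 0))

# Move the ids out of the row names into a regular column.
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```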

Populating data from one data.table to another

I have a distance matrix (as data.table) showing pairwise distances between a number of items, but not all items are in the matrix. I need to create a larger data.table that has all the missing items populated. I can do this with matrices fairly easily:
items=c("a", "b", "c", "d")
small_matrix=matrix(c(0, 1, 2, 3), nrow=2, ncol=2,
dimnames=list(c("a", "b"), c("a", "b")))
# create zero matrix of the right size
full_matrix <- matrix(0, ncol=length(items), nrow=length(items),
dimnames=list(items, items))
# populate items from the small matrix
full_matrix[rownames(small_matrix), colnames(small_matrix)] <- small_matrix
full_matrix
# a b c d
# a 0 2 0 0
# b 1 3 0 0
# c 0 0 0 0
# d 0 0 0 0
What is the equivalent of that in data.table? I can create an 'id' column in small_DT and use it as the key, but I'm not sure how to overwrite the entries in full_DT that have the same id/column pair.

Let's convert to data.table and keep the row names as an extra column:
dts = as.data.table(small_matrix, keep.rownames = TRUE)
# rn a b
#1: a 0 2
#2: b 1 3
dtf = as.data.table(full_matrix, keep.rownames = TRUE)
# rn a b c d
#1: a 0 0 0 0
#2: b 0 0 0 0
#3: c 0 0 0 0
#4: d 0 0 0 0
Now just join on the rows. Assuming the small matrix is always a subset of the full one, you can do the following:
dtf[dts, names(dts) := dts, on = 'rn']
dtf
# rn a b c d
#1: a 0 2 0 0
#2: b 1 3 0 0
#3: c 0 0 0 0
#4: d 0 0 0 0
The above assumes data.table version 1.9.5+. On older versions you'll need to set the key first.

Suppose you have these two data.tables:
dt1 = as.data.table(small_matrix)
# a b
#1: 0 2
#2: 1 3
dt2 = as.data.table(full_matrix)
# a b c d
#1: 0 0 0 0
#2: 0 0 0 0
#3: 0 0 0 0
#4: 0 0 0 0
You can't assign into a data.table the way you would with a data.frame or matrix, e.g. by doing:
dt2[rownames(full_matrix) %in% rownames(small_matrix), names(dt1), with = FALSE] <- dt1
This code raises an error, because to assign new values you need to use the := operator:
dt2[rownames(full_matrix) %in% rownames(small_matrix), names(dt1) := dt1][]
# a b c d
#1: 0 2 0 0
#2: 1 3 0 0
#3: 0 0 0 0
#4: 0 0 0 0
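
The row-position approach above relies on the two tables being in the same row order. As a hedged alternative, the sketch below does an update join keyed on the row names and pulls the replacement values from the i. prefixed columns, so only the value columns are overwritten. It assumes a reasonably recent data.table; the helper name cols is illustrative.

```r
library(data.table)

items <- c("a", "b", "c", "d")
small_matrix <- matrix(c(0, 1, 2, 3), nrow = 2, ncol = 2,
                       dimnames = list(c("a", "b"), c("a", "b")))
full_matrix <- matrix(0, nrow = length(items), ncol = length(items),
                      dimnames = list(items, items))

dts <- as.data.table(small_matrix, keep.rownames = TRUE)
dtf <- as.data.table(full_matrix, keep.rownames = TRUE)

# Value columns to copy: everything in the small table except the key.
cols <- setdiff(names(dts), "rn")

# Update join: for rows of dtf matched on rn, overwrite cols with the
# corresponding columns of dts (referred to as i.a, i.b inside the join).
dtf[dts, (cols) := mget(paste0("i.", cols)), on = "rn"]
dtf
```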

Assigning values to an empty adjacency matrix based on matching column values

I have an n x n data set, say a 5 x 5 one:
ALPHA BETA GAMMA DELTA EPSILON
A B A X 1
B C 3 X 3
C D E Z 4
D A D X 5
E A 2 Z 2
I use column “ALPHA” to create an empty adjacency matrix (Aij),
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
I want to reassign Adjacency matrix values to 1 or 0 based on the matched values of column “DELTA” such that, if “DELTA” matches we set Aij=1 and 0 otherwise. That is, we will have a new adjacency matrix that looks like the following,
A B C D E
A 0 1 0 1 0
B 1 0 0 1 0
C 0 0 0 0 1
D 1 1 0 0 0
E 0 0 1 0 0
What loop or matching technique can I use to assign the new values?
Thanks.
Phil

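A loop could work. Your example has 0 on the diagonal (i = j), so I subtract an identity matrix at the end:
DELTA <- c("X", "X", "Z", "X", "Z")
Adj <- mat.or.vec(nr = length(DELTA), nc = length(DELTA))
for (i in seq_along(DELTA)) {
  # mark every column whose DELTA value matches that of row i
  Adj[i, DELTA == DELTA[i]] <- 1
}
# clear the diagonal, where each item trivially matches itself
Adj <- Adj - diag(length(DELTA))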

You could use outer
res <- +(outer(df1$DELTA, df1$DELTA, FUN='=='))*!diag(dim(df1)[1])
dimnames(res) <- rep(list(df1$ALPHA),2)
res
# A B C D E
#A 0 1 0 1 0
#B 1 0 0 1 0
#C 0 0 0 0 1
#D 1 1 0 0 0
#E 0 0 1 0 0
Or
sapply(df1$DELTA, `==`, df1$DELTA) - diag(dim(df1)[1])
data
df1 <- structure(list(ALPHA = c("A", "B", "C", "D", "E"), BETA = c("B",
"C", "D", "A", "A"), GAMMA = c("A", "3", "E", "D", "2"), DELTA = c("X",
"X", "Z", "X", "Z"), EPSILON = c(1L, 3L, 4L, 5L, 2L)), .Names = c("ALPHA",
"BETA", "GAMMA", "DELTA", "EPSILON"), class = "data.frame",
row.names = c(NA, -5L))
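
For readers who find the one-liners above dense, here is a step-by-step sketch of the same outer() idea in base R. It uses plain vectors copied from the question's ALPHA and DELTA columns; the name adj is illustrative.

```r
ALPHA <- c("A", "B", "C", "D", "E")
DELTA <- c("X", "X", "Z", "X", "Z")

# Pairwise comparison of DELTA values; multiplying by 1 turns TRUE/FALSE into 1/0.
adj <- outer(DELTA, DELTA, "==") * 1

# An item always matches itself, so clear the diagonal explicitly.
diag(adj) <- 0

# Label rows and columns with the ALPHA values.
dimnames(adj) <- list(ALPHA, ALPHA)
adj
```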

In an NxN matrix give value "True" to pairs from data frame

I have created a 5x5 matrix where rows and columns have the same names and a data frame with name pairs:
N <- 5
Names <- letters[1:N]
mat <- matrix(rep(0, N*N), nrow = N, ncol = N, dimnames = list(Names, Names))
a b c d e
a 0 0 0 0 0
b 0 0 0 0 0
c 0 0 0 0 0
d 0 0 0 0 0
e 0 0 0 0 0
The data frame then consists of different pairs:
col1 col2
1 a c
2 c b
3 d b
4 d e
How can I match these in so that col1 only refers to rows in my matrix and col2 only to columns? The above should compute to the following result:
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0

You can use match to create a "key" of which combinations need to be replaced with 1, like this (here mydf is assumed to be the data frame of pairs):
key <- vapply(seq_along(mydf),
function(x) match(mydf[[x]],
dimnames(mat)[[x]]),
numeric(nrow(mydf)))
Then, use matrix indexing to replace the relevant values.
mat[key] <- 1
mat
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0

You could also do:
mat[as.matrix(d1)] <- 1
mat
# a b c d e
#a 0 0 1 0 0
#b 0 0 0 0 0
#c 0 1 0 0 0
#d 0 1 0 0 1
#e 0 0 0 0 0
data
d1 <- structure(list(col1 = c("a", "c", "d", "d"), col2 = c("c", "b",
"b", "e")), .Names = c("col1", "col2"), class = "data.frame",
row.names = c("1", "2", "3", "4"))
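
Indexing by a character matrix stops with a "subscript out of bounds" error if any name in the pairs is missing from the matrix dimnames. The sketch below filters such pairs out first. The extra pair ("z", "a") and the names pairs and ok are made up for illustration and are not part of the question's data.

```r
N <- 5
Names <- letters[1:N]
mat <- matrix(0, nrow = N, ncol = N, dimnames = list(Names, Names))

# Hypothetical pairs; the last row refers to "z", which is not in the matrix.
pairs <- data.frame(col1 = c("a", "c", "d", "d", "z"),
                    col2 = c("c", "b", "b", "e", "a"),
                    stringsAsFactors = FALSE)

# Keep only pairs whose row and column names both exist in the matrix.
ok <- pairs$col1 %in% rownames(mat) & pairs$col2 %in% colnames(mat)

# Two-column character matrix indexing: col1 picks the row, col2 the column.
mat[as.matrix(pairs[ok, ])] <- 1
mat
```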
