I have created a 5x5 matrix whose rows and columns share the same names, and a data frame of name pairs:
N <- 5
Names <- letters[1:N]
mat <- matrix(rep(0, N*N), nrow = N, ncol = N, dimnames = list(Names, Names))
a b c d e
a 0 0 0 0 0
b 0 0 0 0 0
c 0 0 0 0 0
d 0 0 0 0 0
e 0 0 0 0 0
The data frame then consists of different pairs:
col1 col2
1 a c
2 c b
3 d b
4 d e
How can I map these pairs onto the matrix so that col1 refers only to rows and col2 only to columns? The pairs above should produce the following result:
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0
You can use match to create a "key" of the cells that need to be replaced with 1 (mydf here is the data frame of name pairs), like this:
key <- vapply(seq_along(mydf),
function(x) match(mydf[[x]],
dimnames(mat)[[x]]),
numeric(nrow(mydf)))
Then, use matrix indexing to replace the relevant values.
mat[key] <- 1
mat
a b c d e
a 0 0 1 0 0
b 0 0 0 0 0
c 0 1 0 0 0
d 0 1 0 0 1
e 0 0 0 0 0
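To see what the key actually holds, here is a minimal self-contained sketch. It assumes mydf is the pairs data frame from the question (the other answer calls it d1) and that its columns are character vectors.

```r
# Assumption: mydf is the data frame of name pairs from the question.
N <- 5
Names <- letters[1:N]
mat <- matrix(0, nrow = N, ncol = N, dimnames = list(Names, Names))
mydf <- data.frame(col1 = c("a", "c", "d", "d"),
                   col2 = c("c", "b", "b", "e"),
                   stringsAsFactors = FALSE)

# Column 1 is matched against the row names, column 2 against the column names.
key <- vapply(seq_along(mydf),
              function(x) match(mydf[[x]], dimnames(mat)[[x]]),
              numeric(nrow(mydf)))
key   # a 4 x 2 matrix of (row position, column position)

# A two-column numeric matrix used as an index selects one cell per row of the key.
mat[key] <- 1
```

Each row of key is one (row, column) position, which is why a single assignment fills all four cells.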
You could also do:
mat[as.matrix(d1)] <- 1
mat
# a b c d e
#a 0 0 1 0 0
#b 0 0 0 0 0
#c 0 1 0 0 0
#d 0 1 0 0 1
#e 0 0 0 0 0
data
d1 <- structure(list(col1 = c("a", "c", "d", "d"), col2 = c("c", "b",
"b", "e")), .Names = c("col1", "col2"), class = "data.frame",
row.names = c("1", "2", "3", "4"))
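This works because a two-column character matrix used as an index is matched against the row and column names. A small sketch of the same idea, with an extra check that every name exists (the check is my addition, not part of the answer above):

```r
mat <- matrix(0, 5, 5, dimnames = list(letters[1:5], letters[1:5]))
d1 <- data.frame(col1 = c("a", "c", "d", "d"),
                 col2 = c("c", "b", "b", "e"),
                 stringsAsFactors = FALSE)

# Character matrix: column 1 holds row names, column 2 holds column names.
idx <- as.matrix(d1)

# Guard (added for illustration): every name must exist in the matching dimension.
known <- idx[, 1] %in% rownames(mat) & idx[, 2] %in% colnames(mat)
stopifnot(all(known))

mat[idx] <- 1
```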
I have one data frame like this. The id of each line is unique and the type defines the group of the id.
id type
a a1
b a1
c a2
d a3
e a4
f a4
I want to make a matrix like the one below. The value should be 1 if the two ids belong to the same type, and 0 otherwise.
a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
The data frame is large (over 70 thousand lines), and I do not know how to do this efficiently in R. Any suggestions would be appreciated.
Here is a base R solution. You can use either
M <- crossprod(t(table(df)))
or
M <- crossprod(table(rev(df)))
such that
> M
id
id a b c d e f
a 1 1 0 0 0 0
b 1 1 0 0 0 0
c 0 0 1 0 0 0
d 0 0 0 1 0 0
e 0 0 0 0 1 1
f 0 0 0 0 1 1
DATA
df <- structure(list(id = c("a", "b", "c", "d", "e", "f"), type = c("a1",
"a1", "a2", "a3", "a4", "a4")), class = "data.frame", row.names = c(NA,
-6L))
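The reason this works: table(df) is a 0/1 incidence table with one row per id and one column per type, so multiplying it by its own transpose counts the types each pair of ids shares. A minimal sketch that cross-checks the result against a direct pairwise comparison (M2 is a name introduced here for illustration):

```r
df <- data.frame(id = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"),
                 stringsAsFactors = FALSE)

inc <- table(df)          # 6 x 4 incidence table: ids in rows, types in columns
M <- crossprod(t(inc))    # same as inc %*% t(inc): shared-type count per id pair

# Cross-check: compare the type column with itself, pair by pair.
M2 <- outer(df$type, df$type, "==") + 0   # adding 0 turns TRUE/FALSE into 1/0
dimnames(M2) <- list(df$id, df$id)
```

Keep in mind that a dense result for 70,000 ids would have roughly 4.9 billion cells, so for data of that size a sparse representation is probably needed; that is not shown here.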
The question "R DataFrame into square (symmetric) matrix" is about transforming a data frame into a symmetric matrix, given the following example:
# data
x <- structure(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
.Dim = c(3L,5L),
.Dimnames = list(c("X", "Y", "Z"), c("A", "B", "C", "D", "E")))
x
# A B C D E
#X 1 1 0 0 0
#Y 1 1 0 0 0
#Z 0 0 0 1 0
# transformation
x <- x %*% t(x)
diag(x) <- 1
x
# X Y Z
#X 1 2 0
#Y 2 1 0
#Z 0 0 1
Now, I'm trying to keep the columns c("A", "B", "C", "D", "E") in the matrix, like this:
# X Y Z A B C D E
#X 1 0 0 1 1 0 0 0
#Y 0 1 0 1 1 0 0 0
#Z 0 0 1 0 0 0 1 0
#A 1 1 0 1 0 0 0 0
#B 1 1 0 0 1 0 0 0
#C 0 0 0 0 0 1 0 0
#D 0 0 1 0 0 0 1 0
#E 0 0 0 0 0 0 0 1
I'm pretty sure there's an easy way to do it similar to the first transformation including each case. Can anyone suggest a solution?
Thank you in advance!
This is just an augmentation, isn't it?
X <- as.matrix(x)
rbind(cbind(diag(nrow(X)), X), cbind(t(X), diag(ncol(X))))
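Spelled out, the augmentation is a 2 x 2 block matrix with identity blocks on the diagonal. A sketch, assuming x is still the original 3 x 5 matrix (before the x %*% t(x) step); the names top, bottom and m are mine:

```r
x <- matrix(c(1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C", "D", "E")))

# Block layout:  [ I_3    x   ]
#                [ t(x)   I_5 ]
top    <- cbind(diag(nrow(x)), x)
bottom <- cbind(t(x), diag(ncol(x)))
m <- rbind(top, bottom)

# cbind() leaves the identity blocks unnamed, so set all names explicitly.
dimnames(m) <- rep(list(c(rownames(x), colnames(x))), 2)
m
```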
Build it step by step (thanks to @李哲源 for pointing out m <- diag(sum(dim(x)))):
m <- diag(sum(dim(x)))
colnames(m) = rownames(m) = c(rownames(x),colnames(x))
m[dimnames(x)[[1]],dimnames(x)[[2]]] <- x
m[dimnames(x)[[2]],dimnames(x)[[1]]] <- t(x)
# X Y Z A B C D E
#X 1 0 0 1 1 0 0 0
#Y 0 1 0 1 1 0 0 0
#Z 0 0 1 0 0 0 1 0
#A 1 1 0 1 0 0 0 0
#B 1 1 0 0 1 0 0 0
#C 0 0 0 0 0 1 0 0
#D 0 0 1 0 0 0 1 0
#E 0 0 0 0 0 0 0 1
You can wrap it into a function and then call it:
makeSymMat <- function(x) {
x <- as.matrix(x)
m <- diag(sum(dim(x)))
colnames(m) = rownames(m) = c(rownames(x),colnames(x))
m[dimnames(x)[[1]],dimnames(x)[[2]]] <- x
m[dimnames(x)[[2]],dimnames(x)[[1]]] <- t(x)
return(m)
}
makeSymMat(x)
I have the following three vectors:
A <- c("A", "B", "C", "D", "E")
B <- c("1/1/1", "1/1/1", "2/1/1", "2/1/1", "3/1/1")
C <- c(1, 1, -1, 1, -1)
and I want to create a matrix like the following using these 3 vectors:
- 1/1/1 2/1/1 3/1/1
A 1 0 0
B 1 0 0
C 0 -1 0
D 0 1 0
E 0 0 -1
where vectors A and B give the row and column names respectively, and C holds the values.
Any help would be appreciated.
Use ?xtabs
xtabs(C ~ A+B)
# B
#A 1/1/1 2/1/1 3/1/1
# A 1 0 0
# B 1 0 0
# C 0 -1 0
# D 0 1 0
# E 0 0 -1
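Note that xtabs returns a table object rather than a plain matrix, and it sums C within each (A, B) combination. If an ordinary matrix is needed, one possible way to strip the extras (the names tab and m are just for illustration):

```r
A <- c("A", "B", "C", "D", "E")
B <- c("1/1/1", "1/1/1", "2/1/1", "2/1/1", "3/1/1")
C <- c(1, 1, -1, 1, -1)

tab <- xtabs(C ~ A + B)   # sums C within each (A, B) combination
class(tab)                # "xtabs" "table"

# Drop the table classes and the stored call to get an ordinary numeric matrix.
m <- unclass(tab)
attr(m, "call") <- NULL
m
```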
You can try:
`[<-`(array(0,c(length(unique(A)),length(unique(B))),
list(unique(A),unique(B))),
cbind(A,B),C)
# 1/1/1 2/1/1 3/1/1
#A 1 0 0
#B 1 0 0
#C 0 -1 0
#D 0 1 0
#E 0 0 -1
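The one-liner is compact but hard to read. The same steps written out, as a sketch (rows, cols and m are names I picked):

```r
A <- c("A", "B", "C", "D", "E")
B <- c("1/1/1", "1/1/1", "2/1/1", "2/1/1", "3/1/1")
C <- c(1, 1, -1, 1, -1)

rows <- unique(A)
cols <- unique(B)

# Start from an all-zero matrix carrying the wanted row and column names.
m <- matrix(0, nrow = length(rows), ncol = length(cols),
            dimnames = list(rows, cols))

# cbind(A, B) is a two-column character matrix: one (row name, column name) per value.
m[cbind(A, B)] <- C
m
```

Unlike xtabs, this does not sum repeated (A, B) pairs; a later value overwrites an earlier one.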
Another option is acast from reshape2, after first creating a data.frame:
library(reshape2)
acast(data.frame(A, B, C), A ~ B, value.var = "C", fill = 0)
# 1/1/1 2/1/1 3/1/1
#A 1 0 0
#B 1 0 0
#C 0 -1 0
#D 0 1 0
#E 0 0 -1
I have an n x n data set, say a 5 x 5 one:
ALPHA BETA GAMMA DELTA EPSILON
A B A X 1
B C 3 X 3
C D E Z 4
D A D X 5
E A 2 Z 2
I use column “ALPHA” to create an empty adjacency matrix (Aij):
A B C D E
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
I want to set the adjacency matrix values to 1 or 0 based on the values in column “DELTA”: if rows i and j have the same “DELTA” value, Aij = 1, and 0 otherwise (the diagonal stays 0). That is, the new adjacency matrix should look like this:
A B C D E
A 0 1 0 1 0
B 1 0 0 1 0
C 0 0 0 0 1
D 1 1 0 0 0
E 0 0 1 0 0
What loop or matching technique can I use to assign the new values?
Thanks.
Phil
A loop could work. Your example has A(i = j) equal to 0, so I subtract an identity matrix at the end:
DELTA<-c("X","X","Z","X","Z")
Adj<-mat.or.vec(nr=length(DELTA), nc=length(DELTA))
for (i in seq_along(DELTA)){
Adj[i,DELTA==DELTA[i]]<-1
}
Adj<-Adj-diag(length(DELTA))
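The loop above leaves the result without row and column names. A variant that carries the ALPHA labels along, as a sketch (ALPHA is typed in by hand here; in practice it would come from the data frame):

```r
ALPHA <- c("A", "B", "C", "D", "E")
DELTA <- c("X", "X", "Z", "X", "Z")
n <- length(DELTA)

Adj <- matrix(0, n, n, dimnames = list(ALPHA, ALPHA))
for (i in seq_len(n)) {
  # In row i, mark every column whose DELTA value equals that of row i.
  Adj[i, DELTA == DELTA[i]] <- 1
}
diag(Adj) <- 0   # a row always matches itself, so clear the diagonal
Adj
```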
You could use outer
res <- +(outer(df1$DELTA, df1$DELTA, FUN='=='))*!diag(dim(df1)[1])
dimnames(res) <- rep(list(df1$ALPHA),2)
res
# A B C D E
#A 0 1 0 1 0
#B 1 0 0 1 0
#C 0 0 0 0 1
#D 1 1 0 0 0
#E 0 0 1 0 0
Or
sapply(df1$DELTA, `==`, df1$DELTA) - diag(dim(df1)[1])
data
df1 <- structure(list(ALPHA = c("A", "B", "C", "D", "E"), BETA = c("B",
"C", "D", "A", "A"), GAMMA = c("A", "3", "E", "D", "2"), DELTA = c("X",
"X", "Z", "X", "Z"), EPSILON = c(1L, 3L, 4L, 5L, 2L)), .Names = c("ALPHA",
"BETA", "GAMMA", "DELTA", "EPSILON"), class = "data.frame",
row.names = c(NA, -5L))
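The outer one-liner packs three steps into one expression. Taken apart, it might look like this (a sketch using only the two columns that matter; same and res are illustrative names):

```r
df1 <- data.frame(ALPHA = c("A", "B", "C", "D", "E"),
                  DELTA = c("X", "X", "Z", "X", "Z"),
                  stringsAsFactors = FALSE)

# Step 1: logical n x n matrix, TRUE where two rows share a DELTA value.
same <- outer(df1$DELTA, df1$DELTA, FUN = "==")

# Step 2: a row trivially matches itself, so switch the diagonal off.
diag(same) <- FALSE

# Step 3: convert TRUE/FALSE to 1/0 and label both dimensions with ALPHA.
res <- same + 0
dimnames(res) <- list(df1$ALPHA, df1$ALPHA)
res
```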
When I use acast in R, the sorting of my data frame gets messed up. Imagine my data.frame looks like this:
V1 V2 V3
1 D Y 0
2 E X 0
3 C N 0
4 B M 0
5 A S 0
Doing acast(dd, V1 ~ V2, value.var="V3", fill = 0) results in an alphabetically ordered matrix, e.g.
M N S X Y
A 0 0 0 0 0
B 0 0 0 0 0
C 0 0 0 0 0
D 0 0 0 0 0
E 0 0 0 0 0
How do I keep the original sorting of the data frame?
Make V1 and V2 into factors, and when you do so, make their levels the order you want. The default ordering when making factors is to sort them, which is why you got the order you did the first time.
d <- data.frame(V1=c("D", "E", "C", "B", "A"), V2=c("Y","X","N","M","S"), V3=0)
d$V1 <- factor(d$V1, levels=unique(d$V1))
d$V2 <- factor(d$V2, levels=unique(d$V2))
> acast(d, V1~V2, value.var="V3", fun.aggregate=sum)
Y X N M S
D 0 0 0 0 0
E 0 0 0 0 0
C 0 0 0 0 0
B 0 0 0 0 0
A 0 0 0 0 0
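The effect of levels = unique(...) can be checked without reshape2. A small base R sketch, assuming character input (f_sorted and f_orig are names made up for the comparison):

```r
v <- c("D", "E", "C", "B", "A")

f_sorted <- factor(v)                      # default: levels are sorted
f_orig   <- factor(v, levels = unique(v))  # levels in order of first appearance

levels(f_sorted)   # "A" "B" "C" "D" "E"
levels(f_orig)     # "D" "E" "C" "B" "A"
```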
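You can also reorder the result after casting by indexing it with the original names (as.character() makes sure factor columns are used as names rather than as integer codes):
m <- acast(dd, V1 ~ V2, value.var="V3", fill = 0)
m[as.character(dd$V1), as.character(dd$V2)]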
Which gives :
Y X N M S
D 0 0 0 0 0
E 0 0 0 0 0
C 0 0 0 0 0
B 0 0 0 0 0
A 0 0 0 0 0
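The reordering step itself does not depend on reshape2. As a rough base R sketch (the matrix m and the two order vectors are made up for illustration), indexing by name rearranges rows and columns in one go:

```r
# Stand-in for the alphabetically sorted acast() result, filled with 1..25
# so that the reordering is visible.
m <- matrix(1:25, 5, 5,
            dimnames = list(c("A", "B", "C", "D", "E"),
                            c("M", "N", "S", "X", "Y")))

row_order <- c("D", "E", "C", "B", "A")   # original order of V1
col_order <- c("Y", "X", "N", "M", "S")   # original order of V2

m2 <- m[row_order, col_order]
m2
```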