R: loop / function to create a matrix for comparison (contrasts)

I have the following type of data, i.e. combinations of factors:
P1 <- c("a", "a", "a", "a", "b", "b", "b", "c", "c", "d")
P2 <- c("a", "b", "c", "d", "b", "c", "d", "c", "d", "d")
myfactors <- data.frame(P1, P2)
P1 P2
1 a a
2 a b
3 a c
4 a d
5 b b
6 b c
7 b d
8 c c
9 c d
10 d d
In the real world the factors might have any number of levels, so I am trying to write a function that can be applied to any number of factor levels. I want to set contrasts for all combinations available in the data set, for example in this data set a-b, a-c, a-d, b-c, b-d, c-d. The contrast rule here, taking "a-b" as an example, is:
if P1 = P2 = a or P1 = P2 = b, the coefficient is -1;
if P1 = a, P2 = b or P1 = b, P2 = a, the coefficient is 2;
otherwise the coefficient is 0.
The output coefficient matrix will look like the following:
P1 P2 a-b a-c a-d b-c b-d c-d
a a -1 -1 -1 0 0 0
a b 2 0 0 0 0 0
a c 0 2 0 0 0 0
a d 0 0 2 0 0 0
b b 1 0 0 -1 -1 0
b c 0 0 0 2 0 0
b d 0 0 0 0 2 0
c c 0 1 0 0 0 -1
c d 0 0 0 -1 0 2
d d 0 0 -1 0 -1 -1
As the function I have in mind is a flexible one, if I apply it to the following dataset,
P1 <- c("CI", "CI", "CI", "CD", "CD", "CK", "CK")
P2 <- c("CI", "CD", "CK", "CD", "CK", "CK", "CI")
mydf2 <- data.frame(P1, P2)
mydf2
P1 P2
1 CI CI
2 CI CD
3 CI CK
4 CD CD
5 CD CK
6 CK CK
7 CK CI
The expected coefficient matrix for this dataframe is:
P1 P2 CI-CD CI-CK CD-CK CK-CI
CI CI -1 -1 0 -1
CI CD 2 0 0 0
CI CK 0 2 0 0
CD CD -1 0 -1 0
CD CK 0 0 2 0
CK CK 0 -1 -1 -1
CK CI 0 0 0 2
I tried several ways but could not come up with a successful program.
EDITS:
(1) I am not testing all possible combinations; only the combinations that actually appear in P1 and P2 are tested.
(2) I intend to develop a solution not only for this instance but one of general application, for example the myfactors dataframe above.

You didn't supply a reason for your particular choice of the 6 ordered combinations of P1 and P2 values, so I just ran through them all:
combos <- cbind( combn(unique(c(P2, P1)), 2), combn(unique(c(P2, P1)), 2)[2:1, ])
combos
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "CI" "CI" "CD" "CD" "CK" "CK"
[2,] "CD" "CK" "CK" "CI" "CI" "CD"
As I went through the logic it seemed more compact to test for conditions 1) and 2) and just use Boolean math to return the results. If both conditions are untrue you get 0. I've checked the entries that do not match yours and I think your construction was wrong in spots. You have 0 in the "CI-CK" column of row 7 and I think the answer by your rules should be 2:
sapply(1:ncol(combos), function(x) with( mydf2,
2*( (P1==combos[1,x] & P2 == combos[2,x]) | (P2==combos[1,x] & P1 == combos[2,x])) -
(P1 == P2 & P1 %in% combos[,x]) ) )
#---------------
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -1 -1 0 -1 -1 0
[2,] 2 0 0 2 0 0
[3,] 0 2 0 0 2 0
[4,] -1 0 -1 -1 0 -1
[5,] 0 0 2 0 0 2
[6,] 0 -1 -1 0 -1 -1
[7,] 0 2 0 0 2 0
#------------------
mydf2[ , 3:8] <- sapply(1:ncol(combos), function(x) with( mydf2,
2*( (P1==combos[1,x] & P2 == combos[2,x]) | (P2==combos[1,x] & P1 == combos[2,x])) -
(P1 == P2 & P1 %in% combos[,x]) ) )
mydf2
#-----------------
P1 P2 CI-CD CI-CK CD-CK CD-CI CK-CI CK-CD
1 CI CI -1 -1 0 -1 -1 0
2 CI CD 2 0 0 2 0 0
3 CI CK 0 2 0 0 2 0
4 CD CD -1 0 -1 -1 0 -1
5 CD CK 0 0 2 0 0 2
6 CK CK 0 -1 -1 0 -1 -1
7 CK CI 0 2 0 0 2 0
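Since the question asks for something of general application, the same logic wraps naturally into a reusable function. The sketch below is my own generalisation of the answer (the name contrast_matrix and the p1/p2 arguments are assumptions); it uses only the unordered pairs, as in the myfactors example, and names the contrast columns from combos (the printed names above presumably came from a separate, unshown assignment):
contrast_matrix <- function(df, p1 = "P1", p2 = "P2") {
  P1 <- as.character(df[[p1]])
  P2 <- as.character(df[[p2]])
  combos <- combn(unique(c(P2, P1)), 2)          # one column per unordered pair
  coefs <- sapply(seq_len(ncol(combos)), function(x)
    2 * ((P1 == combos[1, x] & P2 == combos[2, x]) |
         (P2 == combos[1, x] & P1 == combos[2, x])) -
      (P1 == P2 & P1 %in% combos[, x]))
  colnames(coefs) <- apply(combos, 2, paste, collapse = "-")
  cbind(df, coefs)
}
contrast_matrix(myfactors)   # columns a-b, a-c, ..., c-d
contrast_matrix(mydf2)       # columns CI-CD, CI-CK, CD-CK
If you also want the reversed orderings (as in the answer above), bind combos with combos[2:1, ] before building the coefficients.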

Related

Crosstab of two identical variables in R - reflect in diagonal

I've got a dataset where I'm interested in the frequencies of different pairs emerging, but it doesn't matter in which order the elements occur. For example:
library(janitor)
set.seed(24601)
options <- c("a", "b", "c", "d", "e", "f")
data.frame(x = sample(options, 20, replace = TRUE),
y = sample(options, 20, replace = TRUE)) %>%
tabyl(x, y)
provides me with the output
x a b c d e f
a 1 0 1 0 1 0
b 0 2 0 1 0 0
c 2 0 1 0 0 0
d 0 0 0 0 1 0
e 1 1 2 0 0 3
f 0 0 1 1 0 1
I'd ideally have the top right or bottom left of this table, where the combination of values a and c would be a total of 3. This is the sum of 1 (in the top right) and 2 (in the middle left). And so on for each other pair of values.
I'm sure there must be a simple way to do this, but I can't figure out what it is...
Edited to add (thanks @akrun for the request): ideally I'd like the following output
x   a   b   c   d   e   f
a   1   0   3   0   2   0
b       2   0   1   1   0
c           1   0   2   1
d               0   1   1
e                   0   3
f                       1
We could add the transposed output (excluding the first column) to the original, then replace the upper-triangle values of the 'out' object (subsetting the elements with upper.tri, which returns a logical matrix) with the corresponding elements of that sum, and set the lower-triangle elements to NA
out2 <- out[-1] + t(out[-1])
out[-1][upper.tri(out[-1])] <- out2[upper.tri(out2)]
out[-1][lower.tri(out[-1])] <- NA
-output
out
# x a b c d e f
# a 1 0 3 0 2 0
# b NA 2 0 1 1 0
# c NA NA 1 0 2 1
# d NA NA NA 0 1 1
# e NA NA NA NA 0 3
# f NA NA NA NA NA 1
data
library(janitor)
set.seed(24601)
options <- c("a", "b", "c", "d", "e", "f")
out <- data.frame(x = sample(options, 20, replace = TRUE),
                  y = sample(options, 20, replace = TRUE)) %>%
  tabyl(x, y)
Here is another option, using igraph:
library(igraph)
out[-1] <- get.adjacency(
  graph_from_data_frame(
    get.data.frame(                  # edge list of the directed multigraph
      graph_from_adjacency_matrix(
        as.matrix(out[-1]), "directed"
      )
    ), FALSE                         # rebuild it as an undirected graph
  ),
  type = "upper",                    # keep only the upper triangle
  sparse = FALSE
)
which gives
> out
x a b c d e f
a 1 0 3 0 2 0
b 0 2 0 1 1 0
c 0 0 1 0 2 1
d 0 0 0 0 1 1
e 0 0 0 0 0 3
f 0 0 0 0 0 1

Transform an event list dataframe into an adjacency dataframe

I have a df in which every column represents an event and the cells contain the individuals, like this:
df=data.frame(topic1=c("a", "b","c", "d"), topic2=c("e","f", "g", "a"), topic3=c("b","c","g","h"))
I need to transform it into an adjacency df, like this:
topic1 topic2 topic3
a 1 1 0
b 1 0 1
c 1 0 1
d 1 0 0
e 0 1 0
f 0 1 0
g 0 1 1
h 0 0 1
THX!
Form levs containing the levels in sorted order and then for each column of df determine which levs are in it. This gives a logical matrix which we can convert to numeric using +.
levs <- sort(unique(unlist(df))) # a b c d e f g h
+ sapply(df, function(x) levs %in% x)
giving:
topic1 topic2 topic3
[1,] 1 1 0
[2,] 1 0 1
[3,] 1 0 1
[4,] 1 0 0
[5,] 0 1 0
[6,] 0 1 0
[7,] 0 1 1
[8,] 0 0 1
The last line could be written even more compactly as:
+ sapply(df, `%in%`, x = levs)
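If you also want the individuals as row labels, as in the desired output, you can name the rows of the result; a small addition of my own:
m <- +sapply(df, `%in%`, x = levs)
rownames(m) <- as.character(levs)   # label rows a..h with the individuals
m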

unordered combination and store the result in a matrix in r

Say I have a list (a, b, c). I want to find all the possible combinations of them and store them in a matrix like:
a b c
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[4,] 1 1 0
[5,] 1 0 1
[6,] 0 1 1
[7,] 1 1 1
I don't know how to make it. Thanks for the help!
To do exactly what you want, use permutations in the gtools package. This works as follows:
library(gtools)
m <- permutations(2, 3, v = c(0, 1), repeats.allowed = TRUE)
colnames(m) <- c('a', 'b', 'c')
# delete the all-zero row [0, 0, 0]
m <- m[-1, ]
Yields:
a b c
[1,] 0 0 1
[2,] 0 1 0
[3,] 0 1 1
[4,] 1 0 0
[5,] 1 0 1
[6,] 1 1 0
[7,] 1 1 1
Idea was taken from the comment section under this question:
Generate all combinations of length 2 using 3 letters
My adaptation is not very elegant... but it seems to do the job.
output <- expand.grid(rep(list(c('a', 'b', 'c')), 3))
colnames(output) <- c('a', 'b', 'c')
for (col in colnames(output)) {
  output[, col] <- as.character(output[, col])
  output[, col] <- ifelse(output[, col] == col, 1, 0)
}
output <- output[!duplicated(output), ]
rownames(output) <- NULL
print(output)
# a b c
# 1 1 0 0
# 2 0 0 0
# 3 1 1 0
# 4 0 1 0
# 5 1 0 1
# 6 0 0 1
# 7 1 1 1
# 8 0 1 1
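For what it's worth, the 0/1 indicator rows can also be enumerated directly with expand.grid over 0:1, dropping the all-zero row; a base R sketch of my own (the row order differs from the one shown in the question):
m <- as.matrix(expand.grid(a = 0:1, b = 0:1, c = 0:1))
m <- m[rowSums(m) > 0, ]   # drop the (0, 0, 0) combination
m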

Simplify this grid such that each row and column has 1 value

Example code here:
> temp2
a b c d e f g h
i 1 1 0 0 0 1 0 1
j 0 1 0 0 0 1 0 1
k 0 1 1 0 0 1 1 1
l 0 0 0 0 1 0 0 1
m 0 0 1 1 0 0 1 1
n 0 0 1 1 0 0 1 1
o 0 0 0 1 0 0 1 1
p 0 0 0 0 1 0 0 1
> dput(temp2)
structure(list(a = c(1, 0, 0, 0, 0, 0, 0, 0), b = c(1, 1, 1,
0, 0, 0, 0, 0), c = c(0, 0, 1, 0, 1, 1, 0, 0), d = c(0, 0, 0,
0, 1, 1, 1, 0), e = c(0, 0, 0, 1, 0, 0, 0, 1), f = c(1, 1, 1,
0, 0, 0, 0, 0), g = c(0, 0, 1, 0, 1, 1, 1, 0), h = c(1, 1, 1,
1, 1, 1, 1, 1)), .Names = c("a", "b", "c", "d", "e", "f", "g",
"h"), class = "data.frame", row.names = c("i", "j", "k", "l",
"m", "n", "o", "p"))
I have this 8x8 grid of 1s and 0s. I need to solve for some grid where each row and each column has exactly one 1 and the rest 0s, but the 1 has to be in a place where the original grid has a 1. It's almost like a sudoku question but not exactly. Any thoughts on how to get started?
I would need some function that can do this for a general grid, not simply this specific one. We can assume that there's always a solution grid, given some starting grid.
Thanks!
Edit: a valid solution
> temp3
a b c d e f g h
i 1 0 0 0 0 0 0 0
j 0 1 0 0 0 0 0 0
k 0 0 0 0 0 1 0 0
l 0 0 0 0 1 0 0 0
m 0 0 0 1 0 0 0 0
n 0 0 1 0 0 0 0 0
o 0 0 0 0 0 0 1 0
p 0 0 0 0 0 0 0 1
EDIT2: given that there are at most 8! candidate solutions for any grid, I may attempt a brute force / matching approach.
This can be solved as a transportation problem or as an integer programming problem. We also show a one-line solution using only base R which generates random matrices whose rows and columns each sum to 1, filtering out and returning the ones that also satisfy the additional constraint that each element of the solution matrix be less than or equal to the corresponding element of temp2.
1) transportation problem Using lp.transport in lpSolve we can solve it in one statement:
library(lpSolve)
res <- lp.transport(as.matrix(temp2), "max",
rep("=", 8), rep(1, 8), rep("=", 8), rep(1, 8), integers = 0:1)
res
## Success: the objective function is 8
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
2) integer programming
If X is the solution, we have specified the row and column constraints but have not specified the X <= temp2 constraints, since they will be satisfied automatically: no solution that puts a 1 where temp2 has a 0 can reach the maximum objective of 8.
library(lpSolve)
n <- nrow(temp2)
obj <- unlist(temp2)
const_row <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const_col <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const_row, const_col)
res <- lp("max", obj, const.mat, "=", 1, all.bin = TRUE)
res
## Success: the objective function is 8
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
(Note that we could even have relaxed the problem to a linear programming problem, provided we add 0 <= soln[i, j] <= 1 constraints, since by the same argument that allowed us to omit the soln[i, j] <= temp2[i, j] constraints, the maximization will force the soln elements to be 0 or 1 anyway.)
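A sketch of that LP relaxation (my own illustration, not part of the original answer): drop all.bin and instead bound every variable by 1 with explicit <= constraints.
library(lpSolve)
n <- nrow(temp2)
obj <- unlist(temp2)
const_row <- t(sapply(1:n, function(i) c(row(temp2)) == i))  # each row sums to 1
const_col <- t(sapply(1:n, function(i) c(col(temp2)) == i))  # each col sums to 1
upper     <- diag(n * n)                                     # soln[i, j] <= 1
res <- lp("max", obj,
          rbind(const_row, const_col, upper),
          c(rep("=", 2 * n), rep("<=", n * n)),
          c(rep(1, 2 * n), rep(1, n * n)))
soln <- array(res$solution, dim(temp2))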
2a) This approach is longer but does spell out the X <= temp2 constraints explicitly:
n <- nrow(temp2)
obj <- numeric(n*n)
const1 <- diag(n*n) # soln[i,j] <= temp2[i,j]
const2 <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const3 <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const1, const2, const3)
const.dir <- rep(c("<=", "="), c(n*n, 2*n))
const.rhs <- c(unlist(temp2), rep(1, 2*n))
res <- lp("max", obj, const.mat, const.dir, const.rhs, all.bin = TRUE)
res
## Success: the objective function is 0
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
2b) Note that if X is the solution matrix then in X <= temp2 only the positions of X corresponding to zeros in temp2 actually constrain anything, so we can drop every constraint corresponding to a 1 in temp2 from the (2a) solution. With this change all constraints become equality constraints.
n <- nrow(temp2)
obj <- numeric(n*n)
const1 <- diag(n*n)[unlist(temp2) == 0, ]
const2 <- t(sapply(1:n, function(i) c(row(temp2)) == i)) # each row sums to 1
const3 <- t(sapply(1:n, function(i) c(col(temp2)) == i)) # each col sums to 1
const.mat <- rbind(const1, const2, const3)
const.dir <- "="
const.rhs <- c(numeric(nrow(const1)), rep(1, 2*n))
res <- lp("max", obj, const.mat, const.dir, const.rhs, all.bin = TRUE)
res
## Success: the objective function is 0
soln <- array(res$solution, dim(temp2))
# verify
all(colSums(soln)==1) && all(rowSums(soln)==1) && all(temp2>=soln) && all(soln %in% 0:1)
## [1] TRUE
In fact, we could go further and remove the variables that correspond to zero elements of temp2.
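A sketch of that further reduction (again my own illustration): keep only the variables where temp2 is 1 and impose the row/column sums on those positions alone.
n    <- nrow(temp2)
keep <- unlist(temp2) == 1                                   # variables we retain
const_row <- t(sapply(1:n, function(i) c(row(temp2))[keep] == i))
const_col <- t(sapply(1:n, function(i) c(col(temp2))[keep] == i))
res <- lp("max", numeric(sum(keep)), rbind(const_row, const_col), "=", 1,
          all.bin = TRUE)
soln <- matrix(0, n, n)
soln[keep] <- res$solution                                   # put the kept variables back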
3) r2dtable Here we use r2dtable to generate 10,000 8x8 tables whose rows and columns sum to 1 and then filter them to pick out only those satisfying the X <= temp2 constraints. With temp2 from the question and the random seed shown it found 3 solutions. If with different inputs it finds no solutions, try generating a larger number of random proposals. This approach does not use any packages.
set.seed(123) # for reproducibility
Filter(function(x) all(x <= temp2), r2dtable(10000, rep(1, 8), rep(1, 8)))
giving:
[[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 1 0 0
[3,] 0 1 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 1
[5,] 0 0 0 0 0 0 1 0
[6,] 0 0 1 0 0 0 0 0
[7,] 0 0 0 1 0 0 0 0
[8,] 0 0 0 0 1 0 0 0
[[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 1 0 0
[3,] 0 1 0 0 0 0 0 0
[4,] 0 0 0 0 1 0 0 0
[5,] 0 0 0 1 0 0 0 0
[6,] 0 0 1 0 0 0 0 0
[7,] 0 0 0 0 0 0 1 0
[8,] 0 0 0 0 0 0 0 1
[[3]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0
[3,] 0 0 0 0 0 1 0 0
[4,] 0 0 0 0 1 0 0 0
[5,] 0 0 1 0 0 0 0 0
[6,] 0 0 0 0 0 0 1 0
[7,] 0 0 0 1 0 0 0 0
[8,] 0 0 0 0 0 0 0 1
A brute-force way:
m = as.matrix(temp2)
w = data.frame(which(m == 1, arr.ind = TRUE))
combos = as.matrix(do.call(expand.grid, with(w, split(col, row))))
combos[ apply(combos, 1, function(x) !anyDuplicated(x)), ]
1 2 3 4 5 6 7 8
[1,] 1 6 2 8 7 3 4 5
[2,] 1 2 6 8 7 3 4 5
[3,] 1 6 2 8 3 7 4 5
[4,] 1 2 6 8 3 7 4 5
[5,] 1 6 2 8 4 3 7 5
[6,] 1 2 6 8 4 3 7 5
[7,] 1 6 2 8 3 4 7 5
[8,] 1 2 6 8 3 4 7 5
[9,] 1 6 2 5 7 3 4 8
[10,] 1 2 6 5 7 3 4 8
[11,] 1 6 2 5 3 7 4 8
[12,] 1 2 6 5 3 7 4 8
[13,] 1 6 2 5 4 3 7 8
[14,] 1 2 6 5 4 3 7 8
[15,] 1 6 2 5 3 4 7 8
[16,] 1 2 6 5 3 4 7 8
OP claims to only ever need to handle an 8x8 grid, so I guess this performs well enough. Each row of the result is a solution. The first row says that (1,1), (2,6), (3,2) ... is a solution.
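To turn one of those index rows back into a 0/1 solution matrix, a small helper (my own addition, not part of the answer) does the trick:
row_to_matrix <- function(cols, n = 8) {
  m <- matrix(0, n, n)
  m[cbind(1:n, cols)] <- 1   # a 1 in column cols[i] of row i
  m
}
row_to_matrix(c(1, 6, 2, 8, 7, 3, 4, 5))   # the first solution row above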
A variation using data.table:
library(data.table)
m = as.matrix(temp2)
comboDT = setDT(melt(m))[ value == 1, do.call(CJ, split(Var2, Var1)) ][,
rid := .I ][, melt(.SD, id="rid", variable.name="row", value.name="col")]
setkey(comboDT, rid)
comboDT[ .( comboDT[, !anyDuplicated(col), by=rid][(V1), rid]) ]
This works. Let grid be my grid (temp2 from above, as a matrix); then this will return a grid that works:
# create a random sufficient grid
grid = as.matrix(temp2)   # assumed: the starting grid, as a matrix
counter = 0
while(2 > 1) {
  counter = counter + 1
  if(counter == 10000) {
    break
  }
  rand_grid = matrix(0, nrow = 8, ncol = 8)
  indices_avail = seq(1, 8, by = 1)
  for(i in 1:8) {
    k = sample(indices_avail, 1)
    rand_grid[i, k] = 1
    indices_avail = indices_avail[indices_avail != k]
  }
  if(sum(grid[which(rand_grid == 1)]) == 8) {
    break
  }
  print(counter)
}
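As an aside (my own note), the inner for loop above just builds a random permutation matrix, which can be written in one line:
rand_grid = diag(8)[sample(8), ]   # rows of the identity in random order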
This approach will return all valid combinations. First find all matrix row combinations. Then search through exhaustively. This method would have to be improved if your matrix size increased. One simple improvement would be to run the diag test in parallel.
st <- as.matrix(temp2) # make sure we are working with matrices
## This method will return all possible matrices of combinations;
## in essence, if a row permutation gives diag(st[perm, ]) all 1s
## then you have a valid choice.
## Helper function to build all combinations; there may be a better way to
## do this but it gets the job done
allCombinationsAux <- function(z, nreg, x) {
  if (sum(nreg) > 1) {
    innerLoop <- do.call(rbind, lapply(x[nreg & (z != x)], allCombinationsAux,
                                       nreg & (z != x), x))
    ret <- cbind(z, innerLoop)
  } else {
    ret <- x[nreg]
  }
  ret
}
## Build all of the combinations of possible matrices
x <- seq_len(nrow(st))   # row indices (assumed; not defined in the original excerpt)
combs <- do.call(rbind, lapply(x, function(y) allCombinationsAux(y, y != x, x)))
## iterate through all the possible combinations of matrices, to find out
## which ones have 1s throughout the diag
inds <- which(apply(combs, 1, function(x) sum(diag(st[x, ])) == 8))
lapply(inds, function(x) st[combs[x, ], ])
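A sketch of the parallel diag test mentioned above (my own illustration; mclapply forks, so on Windows substitute parLapply):
library(parallel)
ok <- unlist(mclapply(seq_len(nrow(combs)),
                      function(i) sum(diag(st[combs[i, ], ])) == 8,
                      mc.cores = 2))
lapply(which(ok), function(i) st[combs[i, ], ])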
While there are great answers already here for the brute-force approach and actually using math, just for kicks, here's a version that guesses and checks lags of the non-matching columns. For the example in question, it actually turns out to be quite quick, and as a bonus, you could find a new answer on any particular run! How fun! To the code:
set.seed(47) # remove this to have more fun
mat.in <- as.matrix(temp2) # we'll work in matrices
mat.out <- diag(8) # a starting guess
dimnames(mat.out) <- dimnames(mat.in) # make our answer pretty
iteration <- 1 # for kicks, a loop counter
while (any((mat.out != mat.in)[as.logical(mat.out)])) {
  mat.ref <- mat.out
  mat.out <- mat.out[, sample(8)] # make this deterministic if you like
  inner <- 1 # don't repeat yourself (too much)
  while (any(mat.out != mat.ref) & inner <= 8) {
    mat.ref <- mat.out
    # find non-matching indices and lag those columns
    to.lag <- which((mat.out != mat.in)[as.logical(mat.out)])
    i <- 1:8
    i[to.lag] <- c(to.lag[length(to.lag)], to.lag[-length(to.lag)])
    mat.out <- mat.out[, i]
    cat(inner, " ") # let's see what it does
    inner <- inner + 1
  }
  print(iteration) # whoo, scrolling numbers
  iteration <- iteration + 1
}
## 1 2 3 [1] 1
## 1 2 3 4 5 6 7 8 [1] 2
## 1 2 [1] 3
## 1 2 3 [1] 4
which, for this particular seed returns
mat.out
## a c e g d b f h
## i 1 0 0 0 0 0 0 0
## j 0 0 0 0 0 1 0 0
## k 0 1 0 0 0 0 0 0
## l 0 0 0 0 1 0 0 0
## m 0 0 1 0 0 0 0 0
## n 0 0 0 0 0 0 1 0
## o 0 0 0 1 0 0 0 0
## p 0 0 0 0 0 0 0 1
It could certainly be optimized further, but it's already pretty quick (without the printing, which slows it down):
Unit: microseconds
expr min lq mean median uq max neval
let's guess 137.796 383.6445 838.2327 693.819 1163.08 2510.436 100
running all 100 times in a fraction of a second. It's quite a bit faster than actual guessing (chopping out the inner loop):
Unit: microseconds
expr min lq mean median uq max neval cld
guess smart 148.997 349.916 848.6314 588.162 1085.841 3117.78 100 a
actually guess 322.458 7341.961 31197.1237 20012.969 47677.501 160250.02 100 b
Note, though, that luck plays a role here, and if there are fewer solutions, it will take longer. If there are no solutions, it will run forever. It could, of course, be optimized to avoid such a fate by making sure it doesn't reuse the same starting permutation provided by sample(8) (a good idea regardless, which I deemed superfluous here as it only runs through a handful of permutations each run anyway). Hack away.
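One simple way to implement the "don't reuse a starting permutation" idea (my own sketch; the bookkeeping names are made up):
seen <- character(0)
fresh_shuffle <- function(n = 8) {
  repeat {
    perm <- sample(n)
    key  <- paste(perm, collapse = ",")
    if (!key %in% seen) {
      seen <<- c(seen, key)   # remember it so it is never tried again
      return(perm)
    }
  }
}
# then inside the outer loop: mat.out <- mat.out[, fresh_shuffle()]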

Transforming dataframe into expanded matrix in r

Say I have the following dataframe:
dfx <- data.frame(Var1=c("A", "B", "C", "D", "B", "C", "D", "C", "D", "D"),
Var2=c("E", "E", "E", "E", "A", "A", "A", "B", "B", "C"),
Var1out = c(1,-1,-1,-1,1,-1,-1,1,-1,-1),
Var2out= c(-1,1,1,1,-1,1,1,-1,1,1))
dfx
Var1 Var2 Var1out Var2out
1 A E 1 -1
2 B E -1 1
3 C E -1 1
4 D E -1 1
5 B A 1 -1
6 C A -1 1
7 D A -1 1
8 C B 1 -1
9 D B -1 1
10 D C -1 1
What you see here are 10 rows that correspond to match-ups between players A, B, C, D and E. They play each other once and the winner of each match-up is denoted by a +1 and the loser of each match-up is denoted by a -1 (put into the respective column Player Var1 result in Var1out, Player Var2 result in Var2out).
Desired output.
I wish to transform this dataframe to this output matrix (the order of rows are not important to me, but as you can see each row refers to a unique match-up):
A B C D E
1 1 0 0 0 -1
2 0 -1 0 0 1
3 0 0 -1 0 1
4 0 0 0 -1 1
5 -1 1 0 0 0
6 1 0 -1 0 0
7 1 0 0 -1 0
8 0 -1 1 0 0
9 0 1 0 -1 0
10 0 0 1 -1 0
What I've done:
I managed to make this matrix in a roundabout way. As roundabout ways tend to be slow and less satisfactory, I was wondering if anyone can spot a better way.
I first made sure that my two columns containing players had factor levels that contained every possible player that ever occurs (you'll note for instance that player E never occurs in Var1).
# Making sure Var1 and Var2 have same factor levels
levs <- unique(c(levels(dfx$Var1), levels(dfx$Var2))) #get all possible levels of factors
dfx$Var1 <- factor(dfx$Var1, levels=levs)
dfx$Var2 <- factor(dfx$Var2, levels=levs)
I next split the dataframe into two - one for Var1 and Var1out, and one for Var2 and Var2out:
library(dplyr)
temp.Var1 <- dfx %>% select(Var1, Var1out)
temp.Var2 <- dfx %>% select(Var2, Var2out)
Here I use model.matrix to expand columns by factor level:
mat.Var1<-with(temp.Var1, data.frame(model.matrix(~Var1+0)))
mat.Var2<-with(temp.Var2, data.frame(model.matrix(~Var2+0)))
For each row, I then replace the '1' that indicates the presence of that factor with the correct result, and add the two matrices:
mat1 <- apply(mat.Var1, 2, function(x) ifelse(x==1, x<-temp.Var1$Var1out, x<-0) )
mat2 <- apply(mat.Var2, 2, function(x) ifelse(x==1, x<-temp.Var2$Var2out, x<-0) )
matX <- mat1+mat2
matX
Var1A Var1B Var1C Var1D Var1E
1 1 0 0 0 -1
2 0 -1 0 0 1
3 0 0 -1 0 1
4 0 0 0 -1 1
5 -1 1 0 0 0
6 1 0 -1 0 0
7 1 0 0 -1 0
8 0 -1 1 0 0
9 0 1 0 -1 0
10 0 0 1 -1 0
Although this works, I have a sense that I am probably missing simpler solutions for this problem. Thanks.
Create an empty matrix and use matrix indexing to fill the relevant values in:
cols <- unique(unlist(dfx[1:2]))
M <- matrix(0, nrow = nrow(dfx), ncol = length(cols), dimnames = list(NULL, cols))
M[cbind(sequence(nrow(dfx)), match(dfx$Var1, cols))] <- dfx$Var1out
M[cbind(sequence(nrow(dfx)), match(dfx$Var2, cols))] <- dfx$Var2out
M
# A B C D E
# [1,] 1 0 0 0 -1
# [2,] 0 -1 0 0 1
# [3,] 0 0 -1 0 1
# [4,] 0 0 0 -1 1
# [5,] -1 1 0 0 0
# [6,] 1 0 -1 0 0
# [7,] 1 0 0 -1 0
# [8,] 0 -1 1 0 0
# [9,] 0 1 0 -1 0
# [10,] 0 0 1 -1 0
Another way is to use acast
library(reshape2)
#added `use.names=FALSE` from #Ananda Mahto's comments
dfy <- data.frame(Var=unlist(dfx[,1:2], use.names=FALSE),
VarOut=unlist(dfx[,3:4], use.names=FALSE), indx=1:nrow(dfx))
acast(dfy, indx~Var, value.var="VarOut", fill=0)
# A B C D E
#1 1 0 0 0 -1
#2 0 -1 0 0 1
#3 0 0 -1 0 1
#4 0 0 0 -1 1
#5 -1 1 0 0 0
#6 1 0 -1 0 0
#7 1 0 0 -1 0
#8 0 -1 1 0 0
#9 0 1 0 -1 0
#10 0 0 1 -1 0
Or use spread
library(tidyr)
spread(dfy, Var, VarOut, fill = 0)[, -1]
# A B C D E
#1 1 0 0 0 -1
#2 0 -1 0 0 1
#3 0 0 -1 0 1
#4 0 0 0 -1 1
#5 -1 1 0 0 0
#6 1 0 -1 0 0
#7 1 0 0 -1 0
#8 0 -1 1 0 0
#9 0 1 0 -1 0
#10 0 0 1 -1 0
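Note that spread() has since been superseded in tidyr; with a recent tidyr (an assumption about the version) the pivot_wider() equivalent would be:
library(tidyr)
pivot_wider(dfy, names_from = Var, values_from = VarOut, values_fill = 0)[, -1]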
