Matrix from rows with delimited items in R

I have a database like this, with semicolon-delimited values in each row:
A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8
The number of items differs from row to row. Each item appears only once in a row (no repeats).
I'd like to build a matrix for item-based collaborative filtering. The first column with the letters is dropped and the numbers are transformed like this:
1 2 3 4 5 6 7 8 9
-----------------
1 0 1 0 1 0 1 0 1
1 1 1 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0
0 1 0 1 0 0 0 1 0
Can you please advise me on how to do this?

Here is an option. We read the lines into a character vector, strsplit each one on ";", initialize a matrix of zeros, and then, for each row, assign 1 using an index matrix that pairs the row number with all of that row's column values:
DAT <- readLines(textConnection("A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8"))
DAT.NUM <- lapply(strsplit(DAT, ";"), function(x) as.integer(x[-1]))
RES <- matrix(0L, length(DAT), max(unlist(DAT.NUM)))
for(i in seq_along(DAT)) RES[cbind(i, DAT.NUM[[i]])] <- 1L
Produces:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 1 0 1 0 1 0 1
[2,] 1 1 1 0 0 0 0 0 0
[3,] 1 0 1 0 1 0 0 0 0
[4,] 0 1 0 1 0 0 0 1 0
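The loop above can also be avoided by building one index matrix for all rows at once. This is only a sketch of the same matrix-indexing idea; the names idx and RES2 are made up for illustration, and the data are the same four sample rows.

```r
# Same sample rows as in the question
DAT <- c("A;1;3;5;7;9", "B;1;2;3", "C;1;3;5", "D;2;4;8")
DAT.NUM <- lapply(strsplit(DAT, ";"), function(x) as.integer(x[-1]))

# One (row, column) pair per item: row number i is repeated once per item in row i
idx <- cbind(rep(seq_along(DAT.NUM), lengths(DAT.NUM)), unlist(DAT.NUM))

# Fill all cells in a single assignment
RES2 <- matrix(0L, length(DAT), max(idx[, 2]))
RES2[idx] <- 1L
RES2
```

Since every (row, column) pair is unique here, a single assignment is enough and no counting is needed.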
Alternatively, inspired by @user227710, you can:
t(table(stack(setNames(DAT.NUM, seq_along(DAT.NUM)))))
Which produces (note that there is no column for 6, because no row contains that item):
values
ind 1 2 3 4 5 7 8 9
1 1 0 1 0 1 1 0 1
2 1 1 1 0 0 0 0 0
3 1 0 1 0 1 0 0 0
4 0 1 0 1 0 0 1 0
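If a column is wanted for every item number, including ones that never occur, one possible variation is to pass table a factor with explicit levels. The sketch below assumes the items are the integers 1 to the maximum observed value; items, row_id and TAB are illustrative names.

```r
# The parsed rows from the example (letters already dropped)
DAT.NUM <- list(c(1L, 3L, 5L, 7L, 9L), 1:3, c(1L, 3L, 5L), c(2L, 4L, 8L))

# Assumption: the full item set is 1..max, so unused items still get a column
items <- seq_len(max(unlist(DAT.NUM)))
row_id <- rep(seq_along(DAT.NUM), lengths(DAT.NUM))

TAB <- table(ind = row_id, values = factor(unlist(DAT.NUM), levels = items))
TAB
```

The result is a table object; unclass(TAB) gives a plain integer matrix if one is needed.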

Related

Random matrix with diagonal entries 0's and all other entries are 0's and 1's

I tried using the rbern function in R but I realized that the diagonal entries are not all 0's.
This would be a possible way:
m <- 10
n <- 10
mat <- matrix(sample(0:1,m*n, replace=TRUE),m,n)
diag(mat) <- 0
#> mat
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 1 1 0 1 0 0 0 1 0
# [2,] 1 0 1 1 0 0 1 0 0 1
# [3,] 0 0 0 1 0 1 0 0 1 0
# [4,] 0 1 0 0 0 0 0 0 1 0
# [5,] 0 1 1 1 0 0 1 1 0 0
# [6,] 1 0 1 0 1 0 0 0 1 0
# [7,] 1 1 1 0 0 0 0 0 1 1
# [8,] 1 0 1 1 1 1 1 0 1 1
# [9,] 1 1 1 1 1 1 0 0 0 1
#[10,] 1 0 1 0 1 0 0 0 1 0

Matrix generation in R without loop

I am trying to create a matrix of the following kind in R: the number of rows is equal to n (supplied); in row i, for all i=1:n, the elements at positions n(i-1)+1 through n(i-1)+n inclusive are 1, all other elements are 0.
For example, if n=3, the matrix looks like
1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 1 1 1
Or for n=4:
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Is there any way of constructing this matrix in R, for general n, without using for loops (or any other kind of loop preferably)?
The simplest / most efficient method (in base R) would be ideal.
Solution 1: called with a single number, diag(3) creates the 3 x 3 identity matrix. Repeat each element 3 times and (re-)coerce the result into a matrix, filling by row:
matrix(rep(diag(3), each=3), nrow=3, byrow=TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,] 1 1 1 0 0 0 0 0 0
#> [2,] 0 0 0 1 1 1 0 0 0
#> [3,] 0 0 0 0 0 0 1 1 1
Solution 2: table interprets the two vectors as factors and counts the combinations of their levels. Since each combination only exists once, you get the same result:
table(rep(1:3, each = 3), 1:9)
#>
#> 1 2 3 4 5 6 7 8 9
#> 1 1 1 1 0 0 0 0 0 0
#> 2 0 0 0 1 1 1 0 0 0
#> 3 0 0 0 0 0 0 1 1 1
Created on 2021-02-21 by the reprex package (v1.0.0)
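Both solutions hard-code n = 3. For general n, one loop-free option in base R is a Kronecker product of the identity matrix with a 1 x n row of ones. This is a sketch rather than either answerer's method, and block_rows is a made-up helper name.

```r
# Row i gets ones in columns n*(i-1)+1 .. n*(i-1)+n, zeros elsewhere
block_rows <- function(n) {
  kronecker(diag(n), matrix(1, nrow = 1, ncol = n))
}

block_rows(3)
block_rows(4)
```

Each 1 on the diagonal of diag(n) is expanded into a block of n ones and each 0 into a block of n zeros, which gives the n x n^2 layout described in the question.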

R: Matrix counting matches when 2 teams interacted from schedule with 3 participants per match

I'd like to make some calculations on FIRST robotics teams and need to build, for lack of a better term, a binary interaction matrix that records when two teams were on the same alliance. Each alliance has three teams, so each match adds 9 values to the matrix when considering (i,j), (j,i), and (i,i).
The full data I'm using is here: http://frc-events.firstinspires.org/2016/MOKC/qualifications
But for simplicity, here is an example of 9 teams playing 1 match each.
> data.frame(Team.1=1:3,Team.2=4:6,Team.3=7:9)
Team.1 Team.2 Team.3
1 1 4 7
2 2 5 8
3 3 6 9
The matrix should count each binary interaction, (1,4),(4,7),(3,6),(6,3),(9,9), etc, and will be an N x N matrix, where in the above example N=9. Here's the matrix that represents the above lists:
> matrix(data=c(1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1,
+ 1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1,
+ 1,0,0,1,0,0,1,0,0,
+ 0,1,0,0,1,0,0,1,0,
+ 0,0,1,0,0,1,0,0,1),9,9)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 0 1 0 0 1 0 0
[2,] 0 1 0 0 1 0 0 1 0
[3,] 0 0 1 0 0 1 0 0 1
[4,] 1 0 0 1 0 0 1 0 0
[5,] 0 1 0 0 1 0 0 1 0
[6,] 0 0 1 0 0 1 0 0 1
[7,] 1 0 0 1 0 0 1 0 0
[8,] 0 1 0 0 1 0 0 1 0
[9,] 0 0 1 0 0 1 0 0 1
In the real data, the team numbers are not sequential and look more like 5732, 1345, 3451, etc. There are also more matches per team, so the matrix values would range from 0 to the maximum number of matches any team played. This can be seen in the real data.
Thanks to anyone that can help.
There is probably a more elegant approach, but here is one using data.table.
library(data.table)
dat <- data.table(Team.1=1:3,Team.2=4:6,Team.3=7:9)
#add match ID
dat[,match:=1:.N]
#turn to long
mdat <- melt(dat,id="match",value.name="team")[,variable:=NULL]
#merge with itself
dat2 <- merge(mdat, mdat, by=c("match"),all=T, allow.cartesian = T)
# reshape
dcast(dat2, team.x~team.y, fun.agg=length)
team.x 1 2 3 4 5 6 7 8 9
1: 1 1 0 0 1 0 0 1 0 0
2: 2 0 1 0 0 1 0 0 1 0
3: 3 0 0 1 0 0 1 0 0 1
4: 4 1 0 0 1 0 0 1 0 0
5: 5 0 1 0 0 1 0 0 1 0
6: 6 0 0 1 0 0 1 0 0 1
7: 7 1 0 0 1 0 0 1 0 0
8: 8 0 1 0 0 1 0 0 1 0
9: 9 0 0 1 0 0 1 0 0 1
And, because I can, one in base-R. A case where I think the use of a for-loop is justified (as you keep modifying the same object).
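#use only the team columns: dat also holds the match ID if the data.table code above was run
teams <- as.matrix(as.data.frame(dat)[, c("Team.1", "Team.2", "Team.3")])
#make matrix to put results in (assumes teams are numbered 1..nteams, as in the example)
nteams <- length(unique(as.vector(teams)))
res <- matrix(0, nrow=nteams, ncol=nteams)
#for each match, generate all ordered pairs of its teams (self-pairs included) and add 1 to those cells
for(i in seq_len(nrow(teams))){
x <- teams[i,]
coords <- as.matrix(expand.grid(x,x))
res[coords] <- res[coords]+1
}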
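The loop above indexes the result by team number, which only works when teams are numbered 1..N. For real team numbers, one possible alternative is to build a match-by-team incidence matrix and take its cross-product, which counts shared matches for every pair. The schedule below is hypothetical (made-up matches using team numbers of the kind mentioned in the question), and sched, inc and co are illustrative names.

```r
# Hypothetical schedule with non-sequential team numbers (not the real event data)
sched <- data.frame(Team.1 = c(5732, 1345, 5732),
                    Team.2 = c(3451, 1987, 1345),
                    Team.3 = c(16, 1730, 3451))

teams <- sort(unique(unlist(sched)))

# Incidence matrix: one row per match, one column per team
inc <- matrix(0L, nrow(sched), length(teams),
              dimnames = list(NULL, as.character(teams)))
# unlist() walks the data frame column by column, so match numbers repeat per column
match_id <- rep(seq_len(nrow(sched)), times = ncol(sched))
inc[cbind(match_id, match(unlist(sched), teams))] <- 1L

# t(inc) %*% inc: cell (a, b) is the number of matches in which a and b were allied;
# the diagonal is the number of matches each team played
co <- crossprod(inc)
co
```

Because the rows and columns are labelled with the team numbers, cells can be looked up by name, for example co["5732", "3451"].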
Here is my suggestion using mostly base functions (magrittr is only loaded for the pipe). My approach was to collect the index positions of the cells that should be 1 and then fill a matrix with them.
library(magrittr)
mydf <- data.frame(Team.1 = 1:3, Team.2 = 4:6,Team.3 = 7:9)
### Create a matrix with position indexes
lapply(1:nrow(mydf), function(x){
a <- t(combn(mydf[x, ], 2)) # Get some combination
b <- a[, 2:1] # Get other combination by reversing columns
foo <- rbind(a, b)
foo
}) %>%
do.call(rbind, .) -> ana
ana <- matrix(unlist(ana), nrow = nrow(ana))
### Another set: Get indexes for self (e.g., (1,1), (2,2), (3,3))
foo <- rep(1:max(mydf), times = 2)
matrix(foo, nrow = length(foo) / 2) -> bob
### A matrix with all position indexes
cammy <- rbind(ana, bob)
### Create a plain matrix
mat <- matrix(0, nrow = max(mydf), ncol = max(mydf))
### Fill in the matrix with 1
mat[cammy] <- 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 0 0 1 0 0 1 0 0
# [2,] 0 1 0 0 1 0 0 1 0
# [3,] 0 0 1 0 0 1 0 0 1
# [4,] 1 0 0 1 0 0 1 0 0
# [5,] 0 1 0 0 1 0 0 1 0
# [6,] 0 0 1 0 0 1 0 0 1
# [7,] 1 0 0 1 0 0 1 0 0
# [8,] 0 1 0 0 1 0 0 1 0
# [9,] 0 0 1 0 0 1 0 0 1
EDIT
Here is a revised version based on the previous idea. It is not as concise as Heroka's base-R approach. In my modified data, teams 1 and 4 played two matches together. The idea is to count how many times each pair appears in the data set, which is what the dplyr part does. The for loop then fills in the matrix mat by going through each row of cammy.
mydf <- data.frame(Team.1=c(1:3,1),Team.2=c(4:6,4),Team.3=c(7:9,5))
# Team.1 Team.2 Team.3
#1 1 4 7
#2 2 5 8
#3 3 6 9
#4 1 4 5
library(dplyr)
lapply(1:nrow(mydf), function(x){
a <- t(combn(mydf[x, ], 2)) # Get some combination
b <- a[, 2:1] # Get other combination by reversing columns
foo <- rbind(a, b)
foo
}) %>%
do.call(rbind, .) -> ana
ana <- data.frame(matrix(unlist(ana), nrow = nrow(ana)))
### Another set: Get indexes for self (e.g., (1,1), (2,2), (3,3))
foo <- rep(1:max(mydf), times = 2)
data.frame(matrix(foo, nrow = length(foo) / 2)) -> bob
cammy <- bind_rows(ana, bob) %>%
group_by(X1, X2) %>%
mutate(total = n()) %>%
as.matrix
### Create a plain matrix
mat <- matrix(0, nrow = max(mydf), ncol = max(mydf))
for(i in 1:nrow(cammy)){
mat[cammy[i, 1], cammy[i, 2]] <- cammy[i, 3]
}
print(mat)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 1 0 0 2 1 0 1 0 0
# [2,] 0 1 0 0 1 0 0 1 0
# [3,] 0 0 1 0 0 1 0 0 1
# [4,] 2 0 0 1 1 0 1 0 0
# [5,] 1 1 0 1 1 0 0 1 0
# [6,] 0 0 1 0 0 1 0 0 1
# [7,] 1 0 0 1 0 0 1 0 0
# [8,] 0 1 0 0 1 0 0 1 0
# [9,] 0 0 1 0 0 1 0 0 1

R comparing features to create matrix of distances

I have an R Question. I have an algorithm in mind which does this, but was wondering if there are neater ways of doing the following:
Say you have the following matrix:
[,1] [,2] [,3] [,4] [,5]
[A,] 0 0 0 0 1
[B,] 0 0 0 1 1
[C,] 0 0 1 1 1
[D,] 0 0 1 1 0
[E,] 1 0 0 0 0
[F,] 1 1 1 0 0
Now I want to create another matrix holding the difference between each row and every other row (i.e., a matrix of distances), something like this (I have only filled in the lower half; the upper half is just its mirror image):
[,A] [,B] [,C] [,D] [,E] [,F]
[A,] 0
[B,] 1 0
[C,] 2 1 0
[D,] 3 2 1 0
[E,] 2 3 4 3 0
[F,] 4 5 4 3 2 0
My method is to use a loop that compares each row's columns with the corresponding columns of the rows below it, but with large matrices it's not efficient. Any ideas on how to do this better?
thx
As suggested in the comments, you can use dist with the manhattan method:
dt <- read.table(text=' [,1] [,2] [,3] [,4] [,5]
[A,] 0 0 0 0 1
[B,] 0 0 0 1 1
[C,] 0 0 1 1 1
[D,] 0 0 1 1 0
[E,] 1 0 0 0 0
[F,] 1 1 1 0 0')
mm <- as.matrix(dt)
dist(mm,method='manhattan' ,diag=TRUE)
[A,] [B,] [C,] [D,] [E,] [F,]
[A,] 0
[B,] 1 0
[C,] 2 1 0
[D,] 3 2 1 0
[E,] 2 3 4 3 0
[F,] 4 5 4 3 2 0
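For 0/1 data the Manhattan distance is simply the number of positions in which two rows differ, so the same numbers can also be obtained with two matrix products. This is a sketch under the assumption that every entry is 0 or 1; m and h are illustrative names.

```r
# The example matrix, with row names A..F
m <- matrix(c(0, 0, 0, 0, 1,
              0, 0, 0, 1, 1,
              0, 0, 1, 1, 1,
              0, 0, 1, 1, 0,
              1, 0, 0, 0, 0,
              1, 1, 1, 0, 0),
            nrow = 6, byrow = TRUE, dimnames = list(LETTERS[1:6], NULL))

# Cell (i, j) counts the columns where row i is 1 and row j is 0, plus the reverse
h <- tcrossprod(m, 1 - m) + tcrossprod(1 - m, m)
h

# Agrees with dist() on this input
all(h == as.matrix(dist(m, method = "manhattan")))
```

Unlike dist, this returns the full symmetric matrix directly, which can matter for memory on large inputs.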

How to transform an item set matrix in R

How to transform a matrix like
A 1 2 3
B 3 6 9
C 5 6 9
D 1 2 4
into form like:
1 2 3 4 5 6 7 8 9
1 0 2 1 1 0 0 0 0 0
2 0 0 1 1 0 0 0 0 0
3 0 0 0 0 0 1 0 0 1
4 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 1 0 0 1
6 0 0 0 0 0 0 0 0 2
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
I have an implementation for this, but it uses a for loop.
I wonder if there is a built-in function in R for this (for example "apply").
Added:
Sorry for the confusion. The first matrix just holds item sets, and every set of items is expanded into pairs. For example, the first set is "1 2 3" and becomes (1,2),(1,3),(2,3), which correspond to cells of the second matrix.
Another question:
If the matrix is very large (10000000*10000000) and sparse,
should I use a sparse matrix or big.matrix?
Thanks!
Removing the row names from M gives this:
m <- matrix(c(1,3,5,1,2,6,6,2,3,9,9,4), nrow=4)
> m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 3 6 9
## [3,] 5 6 9
## [4,] 1 2 4
# The indices that you want to increment in x, but some are repeated
# combn() is used to compute the combinations of columns
indices <- matrix(t(m[,combn(1:3,2)]),,2,byrow=TRUE)
# Count repeated rows
ones <- rep(1,nrow(indices))
cnt <- aggregate(ones, by=as.data.frame(indices), FUN=sum)
# Set each value to the appropriate count
x <- matrix(0, 9, 9)
x[as.matrix(cnt[,1:2])] <- cnt[,3]
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 2 1 1 0 0 0 0 0
## [2,] 0 0 1 1 0 0 0 0 0
## [3,] 0 0 0 0 0 1 0 0 1
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 1 0 0 1
## [6,] 0 0 0 0 0 0 0 0 2
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
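On the follow-up about very large sparse results, one option is sparseMatrix from the Matrix package, which stores only the nonzero cells and adds up the values of repeated (i, j) pairs, so the counting step comes for free. The sketch below is not part of the answer above and does not settle the sparse matrix versus big.matrix question; whether it fits at 10000000 x 10000000 depends on how many nonzero cells there are. The names pairs, i, j and s are illustrative.

```r
library(Matrix)

# Same item sets as above, one set per row
m <- matrix(c(1, 3, 5, 1, 2, 6, 6, 2, 3, 9, 9, 4), nrow = 4)

# All column pairs: (1,2), (1,3), (2,3)
pairs <- combn(ncol(m), 2)

# Row and column index of every item pair, across all item sets
i <- as.vector(m[, pairs[1, ]])
j <- as.vector(m[, pairs[2, ]])

# Repeated (i, j) pairs have their x values summed, giving the counts
s <- sparseMatrix(i = i, j = j, x = rep(1, length(i)), dims = c(9, 9))
s
```

Here as.matrix(s) reproduces the dense matrix x shown above.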
