R: comparing features to create a matrix of distances

I have an R question. I have an algorithm in mind that does this, but I was wondering if there are neater ways of doing the following.
Say you have the following matrix:
[,1] [,2] [,3] [,4] [,5]
[A,] 0 0 0 0 1
[B,] 0 0 0 1 1
[C,] 0 0 1 1 1
[D,] 0 0 1 1 0
[E,] 1 0 0 0 0
[F,] 1 1 1 0 0
Now I want to create another matrix holding the difference between each pair of rows (i.e., a distance matrix), something like this (only the lower triangle is shown; the upper triangle is its mirror image):
[,A] [,B] [,C] [,D] [,E] [,F]
[A,] 0
[B,] 1 0
[C,] 2 1 0
[D,] 3 2 1 0
[E,] 2 3 4 3 0
[F,] 4 5 4 3 2 0
My method is to use a loop that compares each row's columns with the corresponding columns of every row below it, but with large matrices it's not efficient. Any ideas on how to do this better?
thx

As mentioned in the comments, use dist() with the Manhattan method:
dt <- read.table(text=' [,1] [,2] [,3] [,4] [,5]
[A,] 0 0 0 0 1
[B,] 0 0 0 1 1
[C,] 0 0 1 1 1
[D,] 0 0 1 1 0
[E,] 1 0 0 0 0
[F,] 1 1 1 0 0')
mm <- as.matrix(dt)
dist(mm, method = "manhattan", diag = TRUE)
[A,] [B,] [C,] [D,] [E,] [F,]
[A,] 0
[B,] 1 0
[C,] 2 1 0
[D,] 3 2 1 0
[E,] 2 3 4 3 0
[F,] 4 5 4 3 2 0
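For 0/1 data like this, the Manhattan distance between two rows is just the number of positions where they differ (the Hamming distance). A minimal sketch, assuming the matrix contains only 0s and 1s, computes every pairwise count at once with matrix algebra, because for binary vectors the sum of |x - y| equals sum(x) + sum(y) - 2 * sum(x * y). The names m, ones and hamming are illustrative only:

```r
# Example 0/1 matrix from the question, rows labelled A-F
m <- matrix(c(0, 0, 0, 0, 1,
              0, 0, 0, 1, 1,
              0, 0, 1, 1, 1,
              0, 0, 1, 1, 0,
              1, 0, 0, 0, 0,
              1, 1, 1, 0, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(LETTERS[1:6], NULL))

ones <- rowSums(m)  # number of 1s in each row
# For 0/1 values |x - y| = x + y - 2xy; summing over columns gives the count of differences
hamming <- outer(ones, ones, "+") - 2 * tcrossprod(m)
hamming
```

The result is the full symmetric matrix rather than the lower triangle that dist() prints; for 0/1 data it should match as.matrix(dist(m, method = "manhattan")).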

Related

Matrix generation in R without loop

I am trying to create a matrix of the following kind in R: the number of rows is equal to n (supplied); in row i, for all i=1:n, the elements at positions n(i-1)+1 through n(i-1)+n inclusive are 1, all other elements are 0.
For example, if n=3, the matrix looks like
1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 1 1 1
Or for n=4:
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Is there any way of constructing this matrix in R, for general n, without using for loops (or any other kind of loop preferably)?
The simplest / most efficient method (in base R) would be ideal.
Solution 1: diag(3) returns the 3 x 3 identity matrix. Repeat each of its elements 3 times and coerce the result back into a 3-row matrix, filling by row:
matrix(rep(diag(3), each=3), nrow=3, byrow=TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> [1,] 1 1 1 0 0 0 0 0 0
#> [2,] 0 0 0 1 1 1 0 0 0
#> [3,] 0 0 0 0 0 0 1 1 1
Solution 2: table() treats the two vectors as factors and counts each combination of their levels. Since each combination occurs exactly once, you get the same 0/1 pattern (as a contingency table rather than a plain matrix):
table(rep(1:3, each = 3), 1:9)
#>
#> 1 2 3 4 5 6 7 8 9
#> 1 1 1 1 0 0 0 0 0 0
#> 2 0 0 0 1 1 1 0 0 0
#> 3 0 0 0 0 0 0 1 1 1
Created on 2021-02-21 by the reprex package (v1.0.0)
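Both solutions hard-code n = 3. A sketch of the general case wraps Solution 1 in a function (block_ones is a hypothetical name) and, as a cross-check, builds the same matrix with a Kronecker product, kronecker(diag(n), matrix(1, 1, n)), which replaces each 1 of the identity matrix with a row of n ones and each 0 with a row of n zeros:

```r
# Row i has ones in columns n*(i - 1) + 1 through n*i, zeros elsewhere
block_ones <- function(n) {
  matrix(rep(diag(n), each = n), nrow = n, byrow = TRUE)
}

# Same matrix via a Kronecker product: each entry of diag(n) becomes a 1 x n block
block_ones_kron <- function(n) {
  kronecker(diag(n), matrix(1, nrow = 1, ncol = n))
}

block_ones(4)
```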

Working With a Diagonal Matrix

Hi, I'm pretty stumped trying to figure this out and could use a little help. I have an n x n matrix where the diagonal is set to a value k and every other value is 0.
1 2 3 4 5
1 k 0 0 0 0
2 0 k 0 0 0
3 0 0 k 0 0
4 0 0 0 k 0
5 0 0 0 0 k
I need to add two more diagonals, directly above and below the main one, with the value 1, so that it ends up looking like this:
1 2 3 4 5
1 k 1 0 0 0
2 1 k 1 0 0
3 0 1 k 1 0
4 0 0 1 k 1
5 0 0 0 1 k
So far, all I have is code that makes the diagonal matrix,
m = diag(k, n, n), but I have no idea how to add the other two diagonals. Would I use apply() and cbind() or rbind()?
You can use row() and col() to build a logical index that selects the upper and lower diagonals, and then assign to that subset.
k=3
m <- k* diag(6)
m[abs(row(m) - col(m)) == 1] <- 1
m
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 1 0 0 0 0
#[2,] 1 3 1 0 0 0
#[3,] 0 1 3 1 0 0
#[4,] 0 0 1 3 1 0
#[5,] 0 0 0 1 3 1
#[6,] 0 0 0 0 1 3
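If you want only one of them, drop abs(): col(m) - row(m) == 1 selects the upper diagonal and row(m) - col(m) == 1 the lower one. Anti-diagonals (running top-right to bottom-left) are constant in row(m) + col(m) instead.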
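Try this function; it makes a row x col matrix with the number k on the main diagonal and 1s on the diagonals directly above and below it.
matfun <- function(k, row = 4, col = 4) {
  x <- diag(1, row, col)                       # 1s on the main diagonal
  k * x +
    rbind(rep(0, col), x[1:(row - 1), ]) +     # shift down: 1s below the diagonal
    cbind(rep(0, row), x[, 1:(col - 1)])       # shift right: 1s above the diagonal
}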
HTH
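Another base-R option, sketched here with k = 3 and n = 6 as example values, is to assign through diag<- on a submatrix: the diagonal of m without its first row is the sub-diagonal of m, and the diagonal of m without its first column is the super-diagonal:

```r
k <- 3
n <- 6
m <- diag(k, n, n)       # n x n matrix with k on the diagonal

diag(m[-1, ]) <- 1       # diagonal of m minus row 1 is m[i + 1, i] (below the diagonal)
diag(m[, -1]) <- 1       # diagonal of m minus column 1 is m[i, i + 1] (above the diagonal)
m
```

This avoids building the full row()/col() index matrices, at the cost of being limited to the two diagonals adjacent to the main one.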

Matrix from rows with delimited items in R

I have a database like this, with semicolon-delimited values in each row:
A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8
Each row has a different number of items, and no item is repeated within a row.
I'd like to build a matrix for item-based collaborative filtering. The first column (the letters) is dropped, and the numbers are transformed like this:
1 2 3 4 5 6 7 8 9
-----------------
1 0 1 0 1 0 1 0 1
1 1 1 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0
0 1 0 1 0 0 0 1 0
Can you please advise me on how to do this?
Here is an option. We read the lines into a character vector, strsplit() them on ";", initialize an all-zero matrix, and then, for each row, assign 1s using a two-column (row, column) matrix index that covers all of that row's values:
DAT <- readLines(textConnection("A;1;3;5;7;9
B;1;2;3
C;1;3;5
D;2;4;8"))
DAT.NUM <- lapply(strsplit(DAT, ";"), function(x) as.integer(x[-1]))
RES <- matrix(0L, length(DAT), max(unlist(DAT.NUM)))
for(i in seq_along(DAT)) RES[cbind(i, DAT.NUM[[i]])] <- 1L
Produces:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 1 0 1 0 1 0 1
[2,] 1 1 1 0 0 0 0 0 0
[3,] 1 0 1 0 1 0 0 0 0
[4,] 0 1 0 1 0 0 0 1 0
Alternatively, inspired by user227710, you can use:
t(table(stack(setNames(DAT.NUM, seq_along(DAT.NUM)))))
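Which produces the following (there is no column for 6, because no row contains a 6 and table() only tabulates values that occur):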
values
ind 1 2 3 4 5 7 8 9
1 1 0 1 0 1 1 0 1
2 1 1 1 0 0 0 0 0
3 1 0 1 0 1 0 0 0
4 0 1 0 1 0 0 1 0
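The for loop in the first approach can also be dropped by building a single two-column (row, column) index for all items at once: each row number is repeated as many times as that row has items. A minimal sketch, reusing the question's data as a plain character vector (idx is an illustrative name):

```r
DAT <- c("A;1;3;5;7;9", "B;1;2;3", "C;1;3;5", "D;2;4;8")
# drop the leading letter and keep the item numbers of each row
DAT.NUM <- lapply(strsplit(DAT, ";"), function(x) as.integer(x[-1]))

# one (row, column) pair per item: row ids repeated by each row's item count
idx <- cbind(rep(seq_along(DAT.NUM), lengths(DAT.NUM)), unlist(DAT.NUM))

RES <- matrix(0L, length(DAT), max(idx[, 2]))
RES[idx] <- 1L   # a single vectorized assignment instead of one per row
RES
```

Unlike the table() version, this keeps an all-zero column for any item number that never occurs.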

List all permutations of k numbers, taken from 0:k, that sum to k

This question is closely related to another question, R:sample(). I want to find a way in R to list all the permutations of k numbers that sum to k, where each number is chosen from 0:k. If k=7, I can choose 7 numbers from 0,1,...,7. One feasible solution is 0,1,2,3,1,0,0; another is 1,1,1,1,1,1,1. I don't want to generate all permutations, since their number explodes once k is even moderately larger than 7.
Of course in the k=7 example I could use the following:
perms7<-matrix(numeric(7*1716),ncol=7)
count=0
for(i in 0:7)
for(j in 0:(7-i))
for(k in 0:(7-i-j))
for(l in 0:(7-i-j-k))
for(n in 0:(7-i-j-k-l))
for(m in 0:(7-i-j-k-l-n)){
res<-7-i-j-k-l-n-m
count<-count+1
perms7[count,]<-c(i,j,k,l,n,m,res)
}
head(perms7,10)
But how can I generalize this approach to account for any k without having to write (k-1) loops?
I tried to come up with a recursive scheme:
perms7<-matrix(numeric(7*1716),ncol=7) #store solutions (adjustable size later)
k<-7 #size of interest
d<-0 #depth
count=0 #count of permutations
rec<-function(j,d,a){
a<-a-j #max loop
d<-d+1 #depth (position)
for(i in 0:a ) {
if(d<(k-1)) rec(i,d,a)
count<<-count+1
perms7[count,d]<<-i
perms7[count,k]<<-k-sum(perms7[count,-k])
}
}
rec(0,0,k)
But I got stuck, and I'm not sure this is the right way to go. I wonder if there is a "magic" R function that handles this (admittedly very specific) problem, or just part of it.
In the k=7 case, all 2,097,152 permutations, and the 1,716 that sum to k=7, can be found with:
library(gtools)
k=7
perms <- permutations(k+1, k, 0:k, repeats.allowed=TRUE) #all permutations
perms.k <- perms[rowSums(perms) == k,] #permutations which sums to k
For k=8 there are 43,046,721 permutations, but I only want to list the 6,435 that sum to k.
Any help is greatly appreciated!
There's a package for that...
library(partitions)
parts(7)
#[1,] 7 6 5 5 4 4 4 3 3 3 3 2 2 2 1
#[2,] 0 1 2 1 3 2 1 3 2 2 1 2 2 1 1
#[3,] 0 0 0 1 0 1 1 1 2 1 1 2 1 1 1
#[4,] 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1
#[5,] 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
#[6,] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
#[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
You appear to be looking for compositions(). Each composition is returned as a column, so transpose with t() if you want one per row. For example, for k=4:
parts(4)
#[1,] 4 3 2 2 1
#[2,] 0 1 2 1 1
#[3,] 0 0 0 1 1
#[4,] 0 0 0 0 1
compositions(4,4)
#[1,] 4 3 2 1 0 3 2 1 0 2 1 0 1 0 0 3 2 1 0 2 1 0 1 0 0 2 1 0 1 0 0 1 0 0 0
#[2,] 0 1 2 3 4 0 1 2 3 0 1 2 0 1 0 0 1 2 3 0 1 2 0 1 0 0 1 2 0 1 0 0 1 0 0
#[3,] 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 0 0 0 0 1 1 1 2 2 3 0 0 0 1 1 2 0 0 1 0
#[4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4
And just to check your math... :-)
ncol(compositions(8,8))
#[1] 6435
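If you would rather avoid a package dependency, the recursive idea from the question can be made to work in base R: fix the first entry at each value 0, ..., k and recurse on the remaining entries with whatever is left of the total. This is a minimal sketch (compositions_k is a hypothetical helper, not part of the partitions package); like the nested loops, it returns one composition per row, in the same order, and it is only practical while choose(2k - 1, k) stays manageable:

```r
# All vectors of `parts` non-negative integers that sum to `total`, one per row
compositions_k <- function(total, parts) {
  if (parts == 1) return(matrix(total, nrow = 1))  # last entry takes the remainder
  do.call(rbind, lapply(0:total, function(first) {
    rest <- compositions_k(total - first, parts - 1)
    cbind(first, rest, deparse.level = 0)  # prepend `first` to every row of `rest`
  }))
}

perms7 <- compositions_k(7, 7)
nrow(perms7)   # 1716
head(perms7, 10)
```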

Experimental design table in R [duplicate]

This question already has answers here:
How to create design matrix in r
(6 answers)
Closed 8 years ago.
How can I generate the following experimental design table in R?
It looks like you want every combination except 0 0 0 0.
> # create all combinations of 4 0s/1s
> design <- expand.grid(0:1, 0:1, 0:1, 0:1)
> design
Var1 Var2 Var3 Var4
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
> # remove the single run you don't want
> design[-1,]
Var1 Var2 Var3 Var4
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
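The same construction generalizes to any number of two-level factors, because expand.grid() also accepts a single list of level vectors. A short sketch, with n_factors as an illustrative variable:

```r
n_factors <- 4
# one 0:1 level vector per factor, passed to expand.grid() as a single list
design <- expand.grid(rep(list(0:1), n_factors))
design <- design[-1, ]   # drop the all-zero run (always the first row)
nrow(design)             # 15
```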
You can use a neat trick based on the binary representations of consecutive integers (I assume you do not want the all-zero row):
n <- 4
M <- matrix(NA_integer_, nrow=2^n-1, ncol=n)
for (i in 1:(2^n-1))
M[i, ] <- as.integer(intToBits(i)[1:n])
print(M)
which gives for n==4:
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 1 0 0
[4,] 0 0 1 0
[5,] 1 0 1 0
[6,] 0 1 1 0
[7,] 1 1 1 0
[8,] 0 0 0 1
[9,] 1 0 0 1
[10,] 0 1 0 1
[11,] 1 1 0 1
[12,] 0 0 1 1
[13,] 1 0 1 1
[14,] 0 1 1 1
[15,] 1 1 1 1
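The loop over rows can also be replaced with integer arithmetic, since bit b of an integer i is (i %/% 2^b) %% 2. A short sketch (M2 is an illustrative name) that builds the same 15 x 4 matrix for n = 4 with outer(), least significant bit in the first column as in the intToBits() version:

```r
n <- 4
i <- seq_len(2^n - 1)   # 1, 2, ..., 15 (0, the all-zero row, is skipped)
b <- 0:(n - 1)          # bit positions, least significant first
# bit b of integer i is floor(i / 2^b) mod 2
M2 <- outer(i, b, function(i, b) (i %/% 2^b) %% 2)
M2
```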
If you're going to analyze factorial designs in R, you're better off using one of the many DoE packages. For instance, the DoE.base package has a function, fac.design(), which does essentially what you want:
library(DoE.base)
df <- fac.design(nlevels=2,nfactors=4,randomize=F,
factor.names=list(0:1,0:1,0:1,0:1))
As pointed out in another answer, your design is a full factorial, except that it is missing two of the combinations (which makes me wonder if it's a factorial design at all...).
