r - Make adjacency matrix with tcrossprod capturing only positive values

I need to create an adjacency matrix from a dataframe using tcrossprod, but the resulting matrix needs to obey a restriction that I will explain below. Consider the following dataframe:
z <- data.frame(Person = c("a","b","c","d"), Man_United = c(1,0,1,0))
z
Person Man_United
1 a 1
2 b 0
3 c 1
4 d 0
I make an adjacency matrix from z using tcrossprod.
x <- tcrossprod(table(z))
diag(x) <- 0
x
Person
Person a b c d
a 0 0 1 0
b 0 0 0 1
c 1 0 0 0
d 0 1 0 0
I need the resulting adjacency matrix to indicate a tie (here signaled with the number 1), only when both persons have value 1 in the original dataframe (i.e. are fans of Manchester United, in this example). For example, persons "a" and "c" of dataframe z are fans, so in the resulting adjacency matrix I want their intersecting cell to be valued 1. That works fine here. However, persons "b" and "d" are not fans, and the fact that both have value 0 in the original dataframe does not mean that they are connected in any meaningful way. tcrossprod, however, produces a matrix that suggests that they are in fact connected.
How can I use tcrossprod so that it captures only the positive values of a dataframe when producing an adjacency matrix?

We can restrict attention to the column of ones in the table with
tcrossprod(table(z)[, "1"])
# [,1] [,2] [,3] [,4]
# [1,] 1 0 1 0
# [2,] 0 0 0 0
# [3,] 1 0 1 0
# [4,] 0 0 0 0
or, if you want to preserve the names,
tcrossprod(table(z)[, "1", drop = FALSE])
# Person
# Person a b c d
# a 1 0 1 0
# b 0 0 0 0
# c 1 0 1 0
# d 0 0 0 0
If there can be more than one nonzero value, you can replace "1" with -1 so as to drop only the column for zeroes (the first column of the table).
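The reason b and d end up tied in the question is that table(z) has one indicator column for "0" and one for "1", so tcrossprod counts a match in either column. Here is a minimal sketch that puts the answer together with the diag(x) <- 0 step from the question; the names fans and adj are only illustrative.

```r
# Toy data from the question: 1 = fan, 0 = not a fan
z <- data.frame(Person = c("a", "b", "c", "d"), Man_United = c(1, 0, 1, 0))

# Keep only the indicator column for "1"; drop = FALSE preserves the names
fans <- table(z)[, "1", drop = FALSE]

# Cross-product of the fan indicator with itself: 1 only where both are fans
adj <- tcrossprod(fans)

# Remove self-ties, as the question does
diag(adj) <- 0
adj
```

With this, a and c stay connected while b and d are no longer tied.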

Related

R - In new dataframe: if cell matches another column of same row, then

d <- data.frame(B1 = c(1,2,3,4),B2 = c(0,1,2,3))
d$total=rowSums(d)
B1 B2 total
1 0 1
2 1 3
3 2 5
4 3 7
Using the dataframe above, I want to create a new dataframe with the following logic:
Going by rows, if a cell in B1:B2 matches d$total, return 1, else 0.
Ideally output to look like:
B1n B2n
1 0
0 0
0 0
0 0
What is the best way to do this in R?
Thank you.
You can compare the first two columns with the total value.
res <- +(d[1:2] == d$total)
res
# B1 B2
#[1,] 1 0
#[2,] 0 0
#[3,] 0 0
#[4,] 0 0
The result is a matrix; if you want a dataframe as output, you can do res <- data.frame(res).
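For completeness, a small base R sketch that goes all the way to the B1n/B2n layout the question asks for. The renaming step with paste0 is my own addition, not part of the answer above.

```r
d <- data.frame(B1 = c(1, 2, 3, 4), B2 = c(0, 1, 2, 3))
d$total <- rowSums(d)

# Compare each of the two columns against total; unary + turns TRUE/FALSE into 1/0
res <- as.data.frame(+(d[c("B1", "B2")] == d$total))

# Rename to the column names used in the question's desired output
names(res) <- paste0(names(res), "n")
res
```

Selecting the columns by name rather than by position keeps the comparison from accidentally including total itself.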
Here is an alternative way to solve this problem. You can use dplyr::transmute, which, unlike dplyr::mutate, keeps only the newly created columns, so you get just the two new columns. The arguments inside transmute are simply the conditions.
library(dplyr)
newdf <- d %>% transmute(B1n=ifelse(B1+B2==B1,1,0),B2n=ifelse(B1+B2==B2,1,0))
> newdf
B1n B2n
1 1 0
2 0 0
3 0 0
4 0 0

Extract column names with value 1 in binary matrix

I have a problem: I would like to create a new matrix starting from a binary matrix structured like this:
A B C D E F
G 0 0 1 1 0 0
H 0 0 0 1 1 0
I 0 0 0 0 1 0
L 1 1 0 0 0 0
I want to create a new matrix with the row names of the original one and a single new column, called X, which contains, for each row, the name or names of the columns where the corresponding matrix entry is 1.
How could I do this?
Try this where m is your matrix:
as.matrix(apply(m==1,1,function(a) paste0(colnames(m)[a], collapse = "")))
# [,1]
#G "CD"
#H "DE"
#I "E"
#L "AB"
Another option which might be faster if m is large:
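idx <- which(m==1, arr.ind = TRUE)
# aggregate() needs a data frame here, and col must stay numeric to index colnames(m)
as.matrix(aggregate(col~row, data.frame(row=rownames(idx), col=unname(idx[,"col"])), function(x)
paste0(colnames(m)[x], collapse = "")))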
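Since the thread never shows how m is built, here is a self-contained sketch that reconstructs the example matrix and applies the apply-based approach, naming the single column X as requested. The variable names X and out are only illustrative.

```r
# Rebuild the binary matrix from the question
m <- matrix(c(0, 0, 1, 1, 0, 0,
              0, 0, 0, 1, 1, 0,
              0, 0, 0, 0, 1, 0,
              1, 1, 0, 0, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("G", "H", "I", "L"), LETTERS[1:6]))

# For each row, paste together the names of the columns that hold a 1
X <- apply(m == 1, 1, function(a) paste(colnames(m)[a], collapse = ""))

# One-column character matrix with the original row names and a column called X
out <- matrix(X, ncol = 1, dimnames = list(rownames(m), "X"))
out
```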

transition matrix force ncol to equal nrows

I have created a transition matrix as a 'from cluster' (rows) 'to cluster' (columns) frequency. Think Markov chain.
Assume I have 5 'from' clusters but only 3 'to' clusters; then I get a 5*3 transition matrix. How do I force it to be a 5*5 transition matrix? Effectively, how do I show the all-zero columns?
I'm after an elegant solution, as this will be applied to a much larger problem involving hundreds of clusters. I am quite unfamiliar with R matrices, and I don't know of an elegant way to force the number of columns to equal the number of rows and fill in zeros where there is no match, other than a for loop, which my hunch says is not the best solution.
Example code:
# example data
cluster_before <- c(1,2,3,4,5)
cluster_after <- c(1,2,4,4,1)
# Table output
table(cluster_before,cluster_after)
# ncol does not = nrows. I want to rectify that
# I want output to look like this:
what_I_want <- matrix(
c(1,0,0,0,0,
0,1,0,0,0,
0,0,0,1,0,
0,0,0,1,0,
1,0,0,0,0),
byrow=TRUE,ncol=5
)
# Possible solution. But for loop can't be best solution?
empty_mat <- matrix(0,ncol=5,nrow=5)
matrix_to_update <- empty_mat
for (i in 1:length(cluster_before)) {
val_before <- cluster_before[i]
val_after <- cluster_after[i]
matrix_to_update[val_before,val_after] <- matrix_to_update[val_before,val_after]+1
}
matrix_to_update
# What's the more elegant solution?
Thanks in advance for your help. It's much appreciated.
Make them factors and then table:
levs <- union(cluster_before, cluster_after)
table(factor(cluster_before,levs), factor(cluster_after,levs))
# 1 2 3 4 5
# 1 1 0 0 0 0
# 2 0 1 0 0 0
# 3 0 0 0 1 0
# 4 0 0 0 1 0
# 5 1 0 0 0 0
Another solution is to use matrix indices:
what_I_want <- matrix(0,ncol=5,nrow=5)
what_I_want[cbind(cluster_before,cluster_after)] <- 1
print(what_I_want)
## [,1] [,2] [,3] [,4] [,5]
##[1,] 1 0 0 0 0
##[2,] 0 1 0 0 0
##[3,] 0 0 0 1 0
##[4,] 0 0 0 1 0
##[5,] 1 0 0 0 0
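The second line sets the elements corresponding to the row (cluster_before) and column (cluster_after) indices to 1. Note that this marks each observed transition with 1 rather than counting it, so it matches the table output only when no (before, after) pair occurs more than once.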
Hope this helps.
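Since the question mentions Markov chains, here is a short sketch of the factor approach followed by row normalisation. The prop.table step is my own addition and is not part of either answer; it assumes every 'from' cluster appears at least once, otherwise that row would come out as NaN.

```r
cluster_before <- c(1, 2, 3, 4, 5)
cluster_after  <- c(1, 2, 4, 4, 1)

# Use the same set of levels for both factors so the table is square
levs <- sort(union(cluster_before, cluster_after))
counts <- table(from = factor(cluster_before, levels = levs),
                to   = factor(cluster_after,  levels = levs))
counts

# Divide each row by its total to get transition probabilities
probs <- prop.table(counts, margin = 1)
probs
```

Unlike the matrix-index assignment, table() counts repeated (before, after) pairs, which is usually what you want for transition frequencies.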

Creating a Matrix From a Vector in R

I have a vector with two columns, one column containing numerical values and one column containing names. I'm a novice to R, but essentially I want to take a vector and create a matrix with it wherein the values within the matrix would add together. So for example, where the vector A has a value of 1 and B has a value of 1, in the matrix at the intersection of A and B I want the values to add and become 2.
I've tried to use a for loop but I'm having trouble with the arguments to put within the loop. Any help would be greatly appreciated and I'd be glad to clarify stuff if it doesn't make sense.
Essentially what I want is to take this:
A 1
B 0
C 0
D 1
And turn it into this:
  A B C D
A   1 1 2
B 1   0 1
C 1 0   1
D 2 1 1
Thanks!
R > x <- c(1,0,0,1)
R > outer(x, x, "+")
[,1] [,2] [,3] [,4]
[1,] 2 1 1 2
[2,] 1 0 0 1
[3,] 1 0 0 1
[4,] 2 1 1 2
The next thing is to ignore the diagonal. Updated by Vincent:
names(x) <- c("A","B","C","D")
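The answer stops before the diagonal step, so here is one way it could be finished. Using NA for the diagonal is my assumption about what "ignore" should mean; the question's desired output simply leaves those cells blank.

```r
# Name the vector first so outer() carries the names into the dimnames
x <- c(A = 1, B = 0, C = 0, D = 1)

# Pairwise sums: entry [i, j] is x[i] + x[j]
m <- outer(x, x, "+")

# Blank out the self-pairs
diag(m) <- NA
m
```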

How to randomize (or permute) a dataframe rowwise and columnwise?

I have a dataframe (df1) like this.
f1 f2 f3 f4 f5
d1 1 0 1 1 1
d2 1 0 0 1 0
d3 0 0 0 1 1
d4 0 1 0 0 1
The d1...d4 column is the rowname, the f1...f5 row is the columnname.
When I do sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe but not for each row or each column.
Is it possible to do the randomization row-wise or column-wise?
I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same but the count of 1s in each row is different):
f1 f2 f3 f4 f5
d1 1 0 0 0 1
d2 0 1 0 1 1
d3 1 0 0 1 1
d4 0 0 1 1 0
Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries could be different). For example, a randomized df3 could be something like this:
f1 f2 f3 f4 f5
d1 0 1 1 1 1 <- two entries are different
d2 0 0 1 0 1 <- four entries are different
d3 1 0 0 0 1 <- two entries are different
d4 0 0 1 0 1 <- two entries are different
PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.
Given the R data.frame:
> df1
a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Shuffle row-wise:
> df2 <- df1[sample(nrow(df1)),]
> df2
a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0
By default sample() randomly reorders the elements passed as the first argument. This means that the default size is the size of the passed array. Passing parameter replace=FALSE (the default) to sample(...) ensures that sampling is done without replacement which accomplishes a row wise shuffle.
Shuffle column-wise:
> df3 <- df1[,sample(ncol(df1))]
> df3
c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
This is another way to shuffle the data.frame using package dplyr:
row-wise:
df2 <- slice(df1, sample(1:n()))
or
df2 <- sample_frac(df1, 1L)
column-wise:
df2 <- select(df1, one_of(sample(names(df1))))
Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.
library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
This gives:
R> out$perm[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 0 1 0
[3,] 0 0 0 1 1
[4,] 1 0 0 0 1
R> out$perm[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 1
[2,] 0 0 0 1 1
[3,] 1 0 0 1 0
[4,] 0 0 1 0 1
To explain the call:
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.
A couple of things to note, this doesn't guarantee that any column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard ones that don't match all your requirements.
Your requirement to have different numbers of changes per row, also isn't covered here. Again you could sample more matrices than you want and then discard the ones that don't meet this requirement also.
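The "draw more and discard" idea can also be sketched in base R for the column-wise case, without vegan. shuffle_until_changed is a hypothetical helper name of mine, and the retry limit is an arbitrary choice.

```r
# df1 from the question
df1 <- data.frame(f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
                  row.names = paste0("d", 1:4))

shuffle_until_changed <- function(df, max_tries = 1000) {
  # Constant columns can never change, so they are exempt from the check
  can_change <- vapply(df, function(x) length(unique(x)) > 1, logical(1))
  for (i in seq_len(max_tries)) {
    out <- df
    out[] <- lapply(df, sample)   # shuffle each column independently
    changed <- mapply(function(a, b) any(a != b), out, df)
    if (all(changed[can_change])) return(out)   # accept this draw
  }
  stop("no acceptable permutation found within max_tries draws")
}

set.seed(42)
df2 <- shuffle_until_changed(df1)
df2
```

Column sums are preserved by construction, and any draw that leaves a column identical to the original is rejected.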
You can also use the randomizeMatrix function in the R package picante.
Example:
library(picante)
test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "frequency",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 0 0
[3,] 1 0 1 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "richness",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 1
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
>
The option null.model = "frequency" maintains column sums, and null.model = "richness" maintains row sums.
Though mainly used for randomizing species presence/absence datasets in community ecology, it works well here.
This function has other null model options as well; see page 36 of the picante documentation for more details.
Of course you can sample each row:
sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))
This shuffles the entries within each row, so the number of 1s in each row doesn't change. With small changes it also works for columns, but this is an exercise for the reader :-P
If the goal is to randomly shuffle each column independently, some of the above answers don't work, since the columns are shuffled jointly (this preserves inter-column correlations). Others require installing a package. Yet a one-liner exists:
# lapply returns a list, so convert it back to a data frame
df2 <- as.data.frame(lapply(df1, sample))
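The same idea works row-wise if you go through a matrix. Below is a sketch under the assumption that all columns are numeric (as in the question's 0/1 data); the names m, row_mat and row_shuffled are only illustrative.

```r
# df1 from the question
df1 <- data.frame(f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
                  row.names = paste0("d", 1:4))

set.seed(1)
m <- as.matrix(df1)

# apply() over rows returns the shuffled rows as columns, hence the t()
row_mat <- t(apply(m, 1, function(r) sample(unname(r))))
dimnames(row_mat) <- dimnames(m)
row_shuffled <- as.data.frame(row_mat)

# The number of 1s per row is unchanged; column sums are free to differ
rowSums(row_shuffled)
rowSums(df1)
```

As with the column-wise version, nothing here guarantees that every row actually ends up different from the original.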
You can also "sample" the same number of items in your data frame with something like this:
nr<-dim(M)[1]
random_M = M[sample.int(nr),]
Random samples and permutations in a dataframe:
If the data is in matrix form, convert it into a data.frame, then
use the sample function from the base package (see ?sample, "Random Samples and Permutations"):
indexes = sample(1:nrow(df1), size=1*nrow(df1))
df1[indexes, ]
Here is a data.table option using .N with sample like this:
library(data.table)
setDT(df)
df[sample(.N)]
#> a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0
Created on 2023-01-28 with reprex v2.0.2
Data:
df <- read.table(text = " a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)
