R: how to convert a binary interactions dataframe into a matrix?

I have an interaction data frame in R, like this:
> interaction
x y z
[1,] 4 1 112
[2,] 3 1 104
[3,] 2 4 19
[4,] 1 3 154
[5,] 3 5 332
[6,] 4 1 187
[7,] 5 5 489
[8,] 2 2 149
I want to convert it into a matrix, taking x as the row names, y as the column names, and z as the interaction value. x and y can take the same values.
Does anybody know how to do this conversion? Maybe it is just one step in R.
Thank you very much!
-------------------2017/3/31---------------------------------------
Or here is another version of my question:
interactions <-data.frame(x=c(40,30,20,10,30,40,50,80),y=c(50,10,40,30,50,10,50,90),z=c(112,104,19,154,332,187,489,149))
m <- matrix(0,10,10)
colnames(m)<-c(10,20,30,40,50,60,70,80,90,100)
rownames(m)<-c(10,20,30,40,50,60,70,80,90,100)
How do I convert the interactions data into the matrix "m"?
Thank you!

Is this the sort of thing you are after? It assumes you want to add up duplicates, such as rows 1 and 6 (both (4,1)). (See the much better solution in the comment below!)
intn <- data.frame(x=c(4,3,2,1,3,4,5,2),y=c(1,1,4,3,5,1,5,2),z=c(112,104,19,154,332,187,489,149))
m <- matrix(0,nrow=max(intn$x),ncol=max(intn$y))
for(i in seq_len(nrow(intn))) {
m[intn$x[i],intn$y[i]] <- m[intn$x[i],intn$y[i]] + intn$z[i]
}
m
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 154 0 0
[2,] 0 149 0 19 0
[3,] 104 0 0 0 332
[4,] 299 0 0 0 0
[5,] 0 0 0 0 489
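The comment referred to above is not reproduced on this page. Judging from the follow-up below, it relied on xtabs, which sums z over every (x, y) combination. A minimal sketch under that assumption, reusing the data from the answer (m_xt is an illustrative name):

```r
# Same data as in the answer above
intn <- data.frame(x = c(4, 3, 2, 1, 3, 4, 5, 2),
                   y = c(1, 1, 4, 3, 5, 1, 5, 2),
                   z = c(112, 104, 19, 154, 332, 187, 489, 149))

# xtabs sums z within each (x, y) cell, so duplicate pairs are added up
m_xt <- xtabs(z ~ x + y, data = intn)
m_xt
```

Rows and columns are labelled by the observed values of x and y, and cell ("4", "1") holds 112 + 187 = 299, matching the loop.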
In response to the follow-up question: if there are more possible values of x and y than appear in the data, you can still use xtabs, but add in some dummy rows (with z = 0) that contain the valid x and y values. The row and column names will be the combined dummy and actual values (as characters rather than numbers). Something like this:
xvals <- c(-2,0,1,2,3,4,5,2.5,7) #possible x values
yvals <- c(-1,1,2,2.5,3,4,5,6,7) #possible y values
dum <- data.frame(x=xvals,y=yvals) #xvals and yvals need to be same length
dum$z <- 0
m2 <- xtabs(z~x+y,rbind(dum,intn))
m2
y
x -1 1 2 2.5 3 4 5 6 7
-2 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 154 0 0 0 0
2 0 0 149 0 0 19 0 0 0
2.5 0 0 0 0 0 0 0 0 0
3 0 104 0 0 0 0 332 0 0
4 0 299 0 0 0 0 0 0 0
5 0 0 0 0 0 0 489 0 0
7 0 0 0 0 0 0 0 0 0
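For the follow-up question with labels 10 to 100, an alternative to dummy rows is to declare the full set of labels as factor levels before calling xtabs. This is a sketch rather than part of the original answer; lev, tab and m_full are illustrative names.

```r
interactions <- data.frame(x = c(40, 30, 20, 10, 30, 40, 50, 80),
                           y = c(50, 10, 40, 30, 50, 10, 50, 90),
                           z = c(112, 104, 19, 154, 332, 187, 489, 149))

lev <- seq(10, 100, by = 10)  # every row/column label wanted in the result

# Factors keep levels that never occur, so xtabs returns a 10 x 10 table
interactions$x <- factor(interactions$x, levels = lev)
interactions$y <- factor(interactions$y, levels = lev)
tab <- xtabs(z ~ x + y, data = interactions)

# Convert the table to a plain numeric matrix with the same labels
m_full <- matrix(as.vector(tab), nrow = nrow(tab),
                 dimnames = list(rownames(tab), colnames(tab)))
m_full
```

Labels that never occur in the data, such as 60 or 100, simply get rows and columns of zeros.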

Related

Intersection of two integer matrices by position R

I would like to know which positions of one matrix intersect with another matrix and which values, for example
lab <- as.matrix(read.table(text="[1,] 0 0 0 0 0 0 0 0 0 1
[2,] 2 0 2 2 2 2 2 2 2 0
[3,] 2 0 2 0 0 0 0 0 2 2
[4,] 2 2 2 0 0 0 0 0 2 2
[5,] 2 0 2 0 0 0 0 0 0 0
[6,] 2 0 2 0 0 0 0 0 0 0
[7,] 2 0 2 0 0 0 0 0 0 0
[8,] 2 0 2 0 0 0 0 3 3 3
[9,] 2 0 2 0 0 0 0 0 3 3
[10,] 2 0 2 0 0 0 0 0 0 3")[,-1])
str(lab)
la1 <- as.matrix(read.table(text="[1,] 0 1 0 0 0 0 0 0 0 2
[2,] 3 0 4 4 4 4 4 4 4 0
[3,] 3 0 4 0 0 0 0 0 4 4
[4,] 3 0 4 0 5 5 0 0 4 4
[5,] 3 0 4 0 5 5 0 0 0 0
[6,] 3 0 4 0 0 0 0 0 0 0
[7,] 3 0 4 0 0 0 0 0 0 0
[8,] 3 0 4 0 0 0 0 6 6 6
[9,] 3 0 4 0 0 0 0 6 6 6
[10,] 3 0 4 0 0 0 0 0 0 6")[,-1])
These numbers represent patches: patch 2 of lab intersects patches 3 and 4 of la1, patch 1 of lab intersects patch 2 of la1, and patch 3 of lab intersects patch 6 of la1. I am using the following code:
require(dplyr)
tuples <- tibble()
dx <- dim(lab)[1]
for( i in seq_len(dx))
for( j in seq_len(dx))
{
ii <- tibble(l0=lab[i,j],l1=la1[i,j])
tuples <- bind_rows(tuples,ii)
}
tuples %>% distinct()
As I will use big 3000x3000 matrices, I am wondering whether there is a faster way of doing it, maybe with Rcpp or raster.
Without a double for loop, we can flatten the transposed matrices into a two-column tibble and get the distinct rows:
out <- tibble(l0 = c(t(lab)), l1 = c(t(la1))) %>%
distinct
Checking against the OP's output:
out_old <- tuples %>%
distinct()
all.equal(out, out_old, check.attributes = FALSE)
#[1] TRUE
Benchmarks
lab2 <- matrix(sample(0:9, size = 3000 * 3000, replace = TRUE), 3000, 3000)
la2 <- matrix(sample(0:9, size = 3000 * 3000, replace = TRUE), 3000, 3000)
system.time({out2 <- tibble(l0 = c(t(lab2)), l1 = c(t(la2))) %>%
distinct})
# user system elapsed
# 0.398 0.042 0.440
If you just want to speed things up, you can try unique over a data.table, e.g.,
library(data.table)
unique(data.table(c(lab), c(la1)))
Here is a base R solution.
as.vector might be faster than c.
unique(cbind(as.vector(lab), as.vector(la1)))
# [,1] [,2]
# [1,] 0 0
# [2,] 2 3
# [3,] 0 1
# [4,] 2 0
# [5,] 2 4
# [6,] 0 5
# [7,] 3 6
# [8,] 0 6
# [9,] 1 2
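If the patch labels are non-negative integers, another base R option is to encode each (lab, la1) pair as a single number and call unique on that vector, which avoids building a two-column object before deduplication. This is only a sketch with small made-up matrices; whether it is actually faster on 3000x3000 input would need to be benchmarked.

```r
# Small made-up matrices standing in for lab and la1
lab <- matrix(c(0, 2, 2, 3, 2, 0), nrow = 2)
la1 <- matrix(c(1, 3, 4, 6, 3, 1), nrow = 2)

k <- max(la1) + 1                                   # base larger than any la1 label
key <- unique(as.vector(lab) * k + as.vector(la1))  # one number per pair

# Decode the keys back into the two labels
pairs <- cbind(l0 = key %/% k, l1 = key %% k)
pairs
```

The rows of pairs come out in the same order as unique(cbind(as.vector(lab), as.vector(la1))), because unique keeps first occurrences in both cases.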

How to add matrix across the diagonal ("folding it in half") in R?

I need to add one half of my matrix to the other half across the diagonal. In my matrix (shown below), I need the "1" in 63,25 to be added to the "2" in 25,63, and so on for all values in the matrix.
Then I need a way to clear out half of the matrix, either above or below the diagonal.
I tried:
sum(diag(lakes_matrix))
but this did not work.
25 63 1567 40 50 60 70 80
25 0 2 0 0 0 0 0 0
63 1 0 0 0 0 0 0 0
1567 0 1 0 0 0 0 0 0
40 0 0 1 0 0 0 0 0
50 0 0 0 2 0 0 0 0
60 0 0 0 0 0 0 0 0
70 0 0 0 0 0 1 0 0
80 0 0 0 0 0 0 0 0
m <- matrix(1:9, nrow = 3)
m
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
n <- m + t(m) # add transpose to original
n
[,1] [,2] [,3]
[1,] 2 6 10
[2,] 6 10 14
[3,] 10 14 18
n * upper.tri(n) # clear out the lower triangle and the diagonal
[,1] [,2] [,3]
[1,] 0 6 10
[2,] 0 0 14
[3,] 0 0 0
So you can make a function
my_func <- function(m) {
# do some assertions: m is matrix, square and numeric etc
(m + t(m)) * upper.tri(m)
}
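Note that upper.tri() excludes the diagonal by default, so my_func also zeroes the diagonal. That is harmless for the matrix in the question, whose diagonal is all zeros. If diagonal entries must survive, a variant along these lines may help (fold_upper is a hypothetical name):

```r
fold_upper <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m), nrow(m) == ncol(m))
  out <- m + t(m)            # add each cell to its mirror image
  diag(out) <- diag(m)       # the diagonal has no mirror, so undo the doubling
  out[lower.tri(out)] <- 0   # clear everything below the diagonal
  out
}

m <- matrix(1:9, nrow = 3)
fold_upper(m)
```

Row and column names of m, such as the lake ids in the question, are carried through to the result.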

Clustering of Count data

I am currently trying to find clusters in a data set that looks like this:
Dienstag 19 Mittwoch 20 Donnerstag 21 Freitag 22 Montag 25 Dienstag 26 Donnerstag 28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0
It corresponds to the number of times a user used an application, by day and hour.
I want to find patterns/clusters that relate the usage to the hour, but I don't know how to go about it. It would really be helpful if you could give me some suggestions about methods.
There are statistical means of clustering as well, but here's a visual approach. I was lazy and used libraries I am familiar with to accomplish this, but it can likely be done more efficiently with some base tools.
dat <- read.table(text=" Dienstag.19 Mittwoch.20 Donnerstag.21 Freitag.22 Montag.25 Dienstag.26 Donnerstag.28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0", header=TRUE)
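dat$hour <- factor(1:nrow(dat)) # one row per hour
library(reshape2); library(qdap); library(ggplot2); library(plyr)
dat2 <- melt(dat) # long format: hour, variable (day), value (count)
dat2[, 2] <- beg2char(dat2[, 2], ".") # keep only the day name before the "."
dat2 <- ddply(dat2, .(variable), transform,
rescale = scale(value)) # standardise the counts within each day
ggplot(dat2, aes(variable, hour)) + geom_tile(aes(fill=rescale)) +
scale_fill_gradient(low = "white", high = "red")
ggsave("heat.png") # save the plot drawn above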
Most clustering algorithms assume continuous data. While you can of course "cast" integers to double values, the results will not be as meaningful as they would be for truly continuous values.
I like Tyler's visual approach. If there is a meaningful pattern, your brain's visual cortex is probably the best tool to discover it.
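As one example of the statistical route mentioned above, hierarchical clustering of the hours can be sketched in base R. The choices made here (dropping the day with missing values, Manhattan distance, average linkage, three groups) are assumptions for illustration only, not recommendations from the answers, and the caveat about count data still applies.

```r
# Counts per hour (rows) for the six days without missing values
counts <- matrix(c(0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 1, 0,
                   1, 0, 1, 1, 1, 1,
                   0, 0, 0, 0, 0, 0,
                   4, 0, 1, 0, 2, 1,
                   0, 1, 2, 1, 0, 2,
                   0, 0, 1, 0, 0, 0,
                   1, 0, 0, 0, 0, 1,
                   2, 0, 1, 0, 0, 5,
                   1, 0, 0, 0, 0, 1,
                   0, 1, 0, 0, 0, 0,
                   0, 0, 1, 0, 4, 1),
                 nrow = 14, byrow = TRUE)

# Manhattan distance is the sum of absolute count differences between two hours
hc <- hclust(dist(counts, method = "manhattan"), method = "average")

groups <- cutree(hc, k = 3)  # assign each hour to one of three groups
groups
```

Calling plot(hc) draws the dendrogram, which can be read alongside the heat map to judge whether three groups is a sensible cut.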

How to create weighted adjacency list/matrix from edge list?

My problem is very simple: I need to create an adjacency list/matrix from a list of edges.
I have an edge list stored in a csv document with column1 = node1 and column2 = node2 and I would like to convert this to a weighted adjacency list or a weighted adjacency matrix.
To be more precise, here's what the data looks like, where the numbers are simply node ids:
node1,node2
551,548
510,512
548,553
505,504
510,512
552,543
512,510
512,510
551,548
548,543
543,547
543,548
548,543
548,542
Any tips on how to achieve the conversion from this to a weighted adjacency list/matrix?
This is how I tried to do it previously, without success (courtesy of Dai Shizuka):
dat=read.csv(file.choose(),header=TRUE) # choose an edgelist in .csv file format
el=as.matrix(dat) # coerces the data into a two-column matrix format that igraph likes
el[,1]=as.character(el[,1])
el[,2]=as.character(el[,2])
g=graph.edgelist(el,directed=FALSE) # turns the edgelist into a 'graph object'
Thank you!
This response uses base R only. The result is a standard matrix used to represent the adjacency matrix.
el <- cbind(a=1:5, b=5:1) #edgelist (a=origin, b=destination)
mat <- matrix(0, 5, 5)
mat[el] <- 1
mat
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 1
#[2,] 0 0 0 1 0
#[3,] 0 0 1 0 0
#[4,] 0 1 0 0 0
#[5,] 1 0 0 0 0
Here mat is your adjacency matrix defined from edgelist el, which is a simple cbind of the vectors 1:5 and 5:1.
If your edgelist includes weights, then you need a slightly different solution.
el <- cbind(a=1:5, b=5:1, c=c(3,1,2,1,1)) # edgelist (a=origin, b=destination, c=weight)
mat<-matrix(0, 5, 5)
for(i in 1:NROW(el)) mat[ el[i,1], el[i,2] ] <- el[i,3] # SEE UPDATE
mat
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 3
#[2,] 0 0 0 1 0
#[3,] 0 0 2 0 0
#[4,] 0 1 0 0 0
#[5,] 1 0 0 0 0
UPDATE
Some time later I realized that the for loop (3rd line) in the previous weighted edgelist example is unnecessary. You can replace it with the following vectorized operation:
mat[el[,1:2]] <- el[,3]
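One caveat with the vectorized assignment: if the same edge appears more than once, later rows overwrite earlier ones instead of being added. For the edge list in the question, where the number of repetitions is what defines the weight, counting the repeats with table() is one base R option. A sketch, assuming a directed interpretation and using the question's data typed in by hand (edges, nodes and adj are illustrative names):

```r
edges <- data.frame(
  node1 = c(551, 510, 548, 505, 510, 552, 512, 512, 551, 548, 543, 543, 548, 548),
  node2 = c(548, 512, 553, 504, 512, 543, 510, 510, 548, 543, 547, 548, 543, 542)
)

# Every node id that occurs on either side of an edge
nodes <- sort(unique(c(edges$node1, edges$node2)))

# Cross-tabulate origins against destinations; repeated edges are counted
tab <- table(factor(edges$node1, levels = nodes),
             factor(edges$node2, levels = nodes))

# Plain matrix with node ids as row and column names
adj <- matrix(as.vector(tab), nrow = length(nodes),
              dimnames = list(as.character(nodes), as.character(nodes)))
adj["551", "548"]
```

For an undirected graph, adj + t(adj) would combine both directions of each edge.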
The post on my website you mention in the question (https://sites.google.com/site/daishizuka/toolkits/sna/sna_data) uses the igraph package, so make sure that is loaded.
Moreover, I recently realized that igraph provides a much easier way to create a weighted adjacency matrix from edgelists, using graph.data.frame(). I've updated this on my site, but here is a simple example:
library(igraph)
el=matrix(c('a','b','c','d','a','d','a','b','c','d'),ncol=2,byrow=TRUE) #a sample edgelist
g=graph.data.frame(el)
get.adjacency(g,sparse=FALSE)
That should do it. The sparse=FALSE argument tells it to show the 0s in the adjacency matrix.
If you really don't want to use igraph, I think this is a clunky way to do it:
el=matrix(c('a','b','c','d','a','d','a','b','c','d'),ncol=2,byrow=TRUE) #a sample edgelist
lab=names(table(el)) #extract the existing node IDs
mat=matrix(0,nrow=length(lab),ncol=length(lab),dimnames=list(lab,lab)) #create a matrix of 0s with the node IDs as rows and columns
for (i in 1:nrow(el)) mat[el[i,1],el[i,2]]=mat[el[i,1],el[i,2]]+1 #for each row in the edgelist, find the appropriate cell in the empty matrix and add 1.
Start with your data frame edges and use igraph to obtain adjacency matrix:
head(edges)
node1 node2
1 551 548
2 510 512
3 548 553
4 505 504
5 510 512
6 552 543
library(igraph)
as.matrix(get.adjacency(graph.data.frame(edges)))
551 510 548 505 552 512 543 553 504 547 542
551 0 0 2 0 0 0 0 0 0 0 0
510 0 0 0 0 0 2 0 0 0 0 0
548 0 0 0 0 0 0 2 1 0 0 1
505 0 0 0 0 0 0 0 0 1 0 0
552 0 0 0 0 0 0 1 0 0 0 0
512 0 2 0 0 0 0 0 0 0 0 0
543 0 0 1 0 0 0 0 0 0 1 0
553 0 0 0 0 0 0 0 0 0 0 0
504 0 0 0 0 0 0 0 0 0 0 0
547 0 0 0 0 0 0 0 0 0 0 0
542 0 0 0 0 0 0 0 0 0 0 0
Another possibility with the qdapTools package, using the weighted edgelist el (columns a, b and c) from the base R answer above:
library(qdapTools)
el[rep(seq_len(nrow(el)), el[,'c']), c('a', 'b')] %>%
{split(.[,'b'], .[,'a'])} %>%
mtabulate()
## 1 2 3 4 5
## 1 0 0 0 0 3
## 2 0 0 0 1 0
## 3 0 0 2 0 0
## 4 0 1 0 0 0
## 5 1 0 0 0 0

How to transform a item set matrix in R

How can I transform a matrix like
A 1 2 3
B 3 6 9
C 5 6 9
D 1 2 4
into a form like:
1 2 3 4 5 6 7 8 9
1 0 2 1 1 0 0 0 0 0
2 0 0 1 1 0 0 0 0 0
3 0 0 0 0 0 1 0 0 1
4 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 1 0 0 1
6 0 0 0 0 0 0 0 0 2
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
I have an implementation of it, but it uses a for loop.
I wonder whether there is a built-in function in R for this (for example "apply").
Added:
Sorry for the confusion. The first matrix just lists item sets, and every set of items yields pairs. For example, the first set is "1 2 3", which becomes (1,2), (1,3), (2,3), corresponding to the entries in the second matrix.
Another question:
If the matrix is very large (10000000*10000000) and sparse, should I use a sparse matrix or big.matrix?
Thanks!
Removing the row names from M gives this:
m <- matrix(c(1,3,5,1,2,6,6,2,3,9,9,4), nrow=4)
m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 3 6 9
## [3,] 5 6 9
## [4,] 1 2 4
# The indices that you want to increment in x, but some are repeated
# combn() is used to compute the combinations of columns
indices <- matrix(t(m[,combn(1:3,2)]),,2,byrow=TRUE)
# Count repeated rows
ones <- rep(1,nrow(indices))
cnt <- aggregate(ones, by=as.data.frame(indices), FUN=sum)
# Set each value to the appropriate count
x <- matrix(0, 9, 9)
x[as.matrix(cnt[,1:2])] <- cnt[,3]
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 2 1 1 0 0 0 0 0
## [2,] 0 0 1 1 0 0 0 0 0
## [3,] 0 0 0 0 0 1 0 0 1
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 1 0 0 1
## [6,] 0 0 0 0 0 0 0 0 2
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
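A shorter variant of the same idea, sketched here with table() doing the counting. It assumes the items are integers from 1 to 9, as in the example; pairs and x2 are illustrative names.

```r
m <- matrix(c(1, 3, 5, 1, 2, 6, 6, 2, 3, 9, 9, 4), nrow = 4)

# All item pairs within each row (one item set per row)
pairs <- do.call(rbind, lapply(seq_len(nrow(m)),
                               function(i) t(combn(m[i, ], 2))))

# Cross-tabulate the pairs over the full item range 1..9
tab <- table(factor(pairs[, 1], levels = 1:9),
             factor(pairs[, 2], levels = 1:9))
x2 <- matrix(as.vector(tab), nrow = 9)
x2
```

For the very large, sparse case raised in the question, the same pairs could probably be passed to Matrix::sparseMatrix(i = pairs[, 1], j = pairs[, 2], x = 1, dims = c(n, n)), with n the number of items; by default it adds up repeated (i, j) entries. That route is untested here.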
