Intersection of two integer matrices by position in R

I would like to know which positions of one matrix intersect with those of another matrix, and with which values. For example:
lab <- as.matrix(read.table(text="[1,] 0 0 0 0 0 0 0 0 0 1
[2,] 2 0 2 2 2 2 2 2 2 0
[3,] 2 0 2 0 0 0 0 0 2 2
[4,] 2 2 2 0 0 0 0 0 2 2
[5,] 2 0 2 0 0 0 0 0 0 0
[6,] 2 0 2 0 0 0 0 0 0 0
[7,] 2 0 2 0 0 0 0 0 0 0
[8,] 2 0 2 0 0 0 0 3 3 3
[9,] 2 0 2 0 0 0 0 0 3 3
[10,] 2 0 2 0 0 0 0 0 0 3")[,-1])
str(lab)
la1 <- as.matrix(read.table(text="[1,] 0 1 0 0 0 0 0 0 0 2
[2,] 3 0 4 4 4 4 4 4 4 0
[3,] 3 0 4 0 0 0 0 0 4 4
[4,] 3 0 4 0 5 5 0 0 4 4
[5,] 3 0 4 0 5 5 0 0 0 0
[6,] 3 0 4 0 0 0 0 0 0 0
[7,] 3 0 4 0 0 0 0 0 0 0
[8,] 3 0 4 0 0 0 0 6 6 6
[9,] 3 0 4 0 0 0 0 6 6 6
[10,] 3 0 4 0 0 0 0 0 0 6")[,-1])
These numbers represent patches: patch 2 of lab intersects patches 3 and 4 of la1, patch 1 of la1 intersects 0 (no other patch), and patch 3 of lab intersects patch 6 of la1. I am using the following code:
require(dplyr)
tuples <- tibble()
dx <- dim(lab)[1]
for( i in seq_len(dx))
for( j in seq_len(dx))
{
ii <- tibble(l0=lab[i,j],l1=la1[i,j])
tuples <- bind_rows(tuples,ii)
}
tuples %>% distinct()
Since I will be using large 3000x3000 matrices, I am wondering whether there is a faster way of doing it, maybe with Rcpp or raster.

Without a double for loop, we can transpose the matrices, flatten them into a two-column tibble, and get the distinct rows:
out <- tibble(l0 = c(t(lab)), l1 = c(t(la1))) %>%
distinct
Checking against the OP's output:
out_old <- tuples %>%
distinct()
all.equal(out, out_old, check.attributes = FALSE)
#[1] TRUE
Benchmarks
lab2 <- matrix(sample(0:9, size = 3000 * 3000, replace = TRUE), 3000, 3000)
la2 <- matrix(sample(0:9, size = 3000 * 3000, replace = TRUE), 3000, 3000)
system.time({out2 <- tibble(l0 = c(t(lab2)), l1 = c(t(la2))) %>%
distinct})
# user system elapsed
# 0.398 0.042 0.440

If you just want to speed things up, you can try unique over a data.table, e.g.,
library(data.table)
unique(data.table(c(lab), c(la1)))

Here comes a base R solution.
as.vector might be faster than c.
unique(cbind(as.vector(lab), as.vector(la1)))
# [,1] [,2]
# [1,] 0 0
# [2,] 2 3
# [3,] 0 1
# [4,] 2 0
# [5,] 2 4
# [6,] 0 5
# [7,] 3 6
# [8,] 0 6
# [9,] 1 2
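For non-negative integer patch labels, one further option is to pack each (lab, la1) pair into a single integer key, so that unique() runs on a plain vector instead of a two-column matrix. The following is only a minimal sketch under that assumption: the small matrices and the names base, key and pairs are made up for illustration, and whether it is actually faster on 3000x3000 input would need to be benchmarked.

```r
# Small stand-in matrices (hypothetical data, not the ones from the question)
lab <- matrix(c(0, 2, 2, 0,
                0, 2, 0, 3), nrow = 2, byrow = TRUE)
la1 <- matrix(c(0, 3, 4, 1,
                0, 3, 0, 6), nrow = 2, byrow = TRUE)

# Assumption: all labels are non-negative integers, so every pair maps
# to a distinct key lab * base + la1 when base is larger than max(la1)
base <- max(la1) + 1
key  <- unique(as.vector(lab) * base + as.vector(la1))

# Decode the keys back into the two label columns
pairs <- cbind(l0 = key %/% base, l1 = key %% base)
pairs
```

The rows of pairs come out in order of first appearance (column-major), the same order that unique(cbind(...)) uses.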

Related

One-hot encode each column in an Int matrix in R

I am having trouble translating a matrix into a one-hot encoding in R. I implemented this in Matlab, but I have difficulty handling the object in R. Here I have an object of type 'matrix'.
I would like to apply one-hot encoding to this matrix. I have a problem with the column names.
Here is an example:
> set.seed(4)
> t <- matrix(floor(runif(10, 1,9)),5,5)
[,1] [,2] [,3] [,4] [,5]
[1,] 5 3 5 3 5
[2,] 1 6 1 6 1
[3,] 3 8 3 8 3
[4,] 3 8 3 8 3
[5,] 7 1 7 1 7
> class(t)
[1] "matrix"
Expecting:
1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 ...
[1,] 0 0 1 0 0 1 0 0 ...
[2,] 1 0 0 0 0 0 1 0 ...
[3,] 0 1 0 0 0 0 0 1 ...
[4,] 0 1 0 0 0 0 0 1 ...
[5,] 0 0 0 1 1 0 0 0 ...
I tried the following, but the matrix remains the same.
library(data.table)
library(mltools)
test_table <- one_hot(as.data.table(t))
Any suggestions would be very much appreciated.
Your data table must contain some columns (variables) that have class "factor". Try this:
> t <- data.table(t)
> t[,V1:=factor(V1)]
> one_hot(t)
V1_1 V1_3 V1_5 V1_7 V2 V3 V4 V5
1: 0 0 1 0 3 5 3 5
2: 1 0 0 0 6 1 6 1
3: 0 1 0 0 8 3 8 3
4: 0 1 0 0 8 3 8 3
5: 0 0 0 1 1 7 1 7
But I have read that the dummyVars function from the caret package is quicker if your matrix is large.
Edit: Forgot to set the seed. :P
And a quick way to factor all variables in a data table:
t.f <- t[, lapply(.SD, as.factor)]
There are probably more concise ways to do this but this should work (and is at least easy to read and understand ;)
Suggested solution using base R and double loop:
set.seed(4)
t <- matrix(floor(runif(10, 1,9)),5,5)
# initialize result object
#
t_hot <- NULL
# for each column in original matrix
#
for (col in seq_along(t[1,])) {
# for each unique value in this column (sorted so the resulting
# columns appear in order)
#
for (val in sort(unique(t[, col]))) {
t_hot <- cbind(t_hot, ifelse(t[, col] == val, 1, 0))
# make name for this column
#
colnames(t_hot)[ncol(t_hot)] <- paste0(col, "_", val)
}
}
This returns:
1_1 1_3 1_5 1_7 2_1 2_3 2_6 2_8 3_1 3_3 3_5 3_7 4_1 4_3 4_6 4_8 5_1 5_3 5_5 5_7
[1,] 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0
[2,] 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0
[3,] 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
[4,] 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
[5,] 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1
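The same result can probably be built without growing t_hot inside a loop. Below is a rough sketch that compares each column against its sorted unique values with outer() and binds the resulting blocks once at the end. The function name one_hot_matrix and the two-column example matrix m are assumptions made for illustration (the values are the first two columns of the matrix shown in the question).

```r
# Hypothetical helper: one-hot encode every column of a numeric matrix
one_hot_matrix <- function(m) {
  blocks <- lapply(seq_len(ncol(m)), function(j) {
    vals  <- sort(unique(m[, j]))
    # rows x values logical matrix, turned into 0/1 integers
    block <- outer(m[, j], vals, "==") * 1L
    colnames(block) <- paste0(j, "_", vals)
    block
  })
  do.call(cbind, blocks)
}

# First two columns of the example matrix from the question
m <- matrix(c(5, 1, 3, 3, 7,
              3, 6, 8, 8, 1), nrow = 5)
res <- one_hot_matrix(m)
res
```

Binding once with do.call(cbind, ...) avoids copying the growing result on every iteration, which may matter for wide matrices.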

How to find number of times a change happens in a matrix, using R

Let's say I have a matrix
>tmp
[,1] [,2] [,3]
[1,] 0 0 3
[2,] 0 2 0
[3,] 1 0 0
[4,] 1 0 0
[5,] 0 2 0
[6,] 1 0 0
[7,] 0 0 3
[8,] 0 0 3
[9,] 0 2 0
I now want to count the number of changes in the matrix. For example, in the first row I have a 3, then it changes to 2 in the next row, and so on. I want to add these changes to a table like this:
1 2 3
1 1 1 1
2 2 0 0
3 0 2 1
So it says that 1 changes to 1 one time, 1 changes to 2 one time, 2 changes to 1 two times, and so on. I have thought about it for some time, but I can't figure out a smart method. I was thinking of using the function table() in R, but I am not sure how. Does anyone have a smart solution to this problem?
Thanks!
t2 = as.vector(t(tmp))
t2 = t2[t2 != 0]
trans = data.frame(from = t2[-length(t2)], to = t2[-1])
with(trans, table(from, to))
# to
# from 1 2 3
# 1 1 1 1
# 2 2 0 0
# 3 0 2 1
You could, of course, skip the data frame entirely and jump to table(from = t2[-length(t2)], to = t2[-1]).
Using this data:
tmp = as.matrix(read.table(text = " 0 0 3
0 2 0
1 0 0
1 0 0
0 2 0
1 0 0
0 0 3
0 0 3
0 2 0"))
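One thing to watch with the table() approach is that a state which never occurs as a "from" (or as a "to") would drop its row (or column), so the result may not be square. A possible variant, sketched here with a hypothetical helper name count_transitions, fixes the factor levels on both sides. It assumes, as in the example, that each row holds exactly one non-zero entry.

```r
# Hypothetical helper: square transition table over all observed states
count_transitions <- function(m) {
  states <- as.vector(t(m))          # read the matrix row by row
  states <- states[states != 0]      # keep the single non-zero state per row
  lev    <- sort(unique(states))
  table(from = factor(head(states, -1), levels = lev),
        to   = factor(tail(states, -1), levels = lev))
}

# Same values as the example matrix in the question
tmp <- matrix(c(0, 0, 3,
                0, 2, 0,
                1, 0, 0,
                1, 0, 0,
                0, 2, 0,
                1, 0, 0,
                0, 0, 3,
                0, 0, 3,
                0, 2, 0), ncol = 3, byrow = TRUE)
res <- count_transitions(tmp)
res
```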
library(zoo)
library(magrittr)
tmp %>%
apply(1, function(x) x[x!=0]) %>% # Get non-zero element from each row
rollapplyr(2, I) %>% # Make matrix whose rows are all 2-windows of above
{table(from = .[,1], to = .[,2])} # make into table
# to
# from 1 2 3
# 1 1 1 1
# 2 2 0 0
# 3 0 2 1
Data used
tmp <- data.table::fread("
a b c d
[1,] 0 0 3
[2,] 0 2 0
[3,] 1 0 0
[4,] 1 0 0
[5,] 0 2 0
[6,] 1 0 0
[7,] 0 0 3
[8,] 0 0 3
[9,] 0 2 0
")[, -'a']
tmp <- as.matrix(tmp)

R: when both events (columns) are true, refer to one of them to decide the value

I want to manipulate two columns in R, so that when both events are true, refer to one of the columns to decide the value. For example:
a<- c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
b<- c(0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
When a and b are both true, as at a[9] and a[10], refer to b to decide the value of another column c in the following lines. Then, if b is FALSE at some line (here, line 17), check again whether both a and b are true. So, the desired output is like this:
c<- c(0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
data <- cbind(a,b,c)
data
a b c
[1,] 0 0 0
[2,] 0 1 0
[3,] 0 1 0
[4,] 0 0 0
[5,] 0 0 0
[6,] 1 0 0
[7,] 1 0 0
[8,] 1 0 0
[9,] 1 1 1
[10,] 1 1 1
[11,] 0 1 1
[12,] 0 1 1
[13,] 0 1 1
[14,] 0 1 1
[15,] 0 1 1
[16,] 0 1 1
[17,] 0 0 0
[18,] 0 0 0
As the data has many lines, I would prefer to use a vectorized method like ifelse() to handle this.
Many thanks to all the people who can help me with this.
I'm not 100% sure I understand the issue but here is a tidyverse solution that reproduces the output:
a<- c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
b<- c(0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)
df <- data.frame(a,b)
library(tidyverse)
df %>% mutate(c=case_when(
a == 1 & b == 1 ~ 1,
a == 0 & b == 0 ~ 0,
TRUE ~ NA_real_
)) %>% fill(c)
# a b c
# 1 0 0 0
# 2 0 1 0
# 3 0 1 0
# 4 0 0 0
# 5 0 0 0
# 6 1 0 0
# 7 1 0 0
# 8 1 0 0
# 9 1 1 1
# 10 1 1 1
# 11 0 1 1
# 12 0 1 1
# 13 0 1 1
# 14 0 1 1
# 15 0 1 1
# 16 0 1 1
# 17 0 0 0
# 18 0 0 0
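For reference, the same "mark, then carry the last known value forward" idea can be sketched in base R with nested ifelse() and cummax(). This is only an illustration of the logic used in the tidyverse answer, not a claim about what the asker ultimately needs. The names signal, last_known and c_out are made up, and the sketch assumes the first element is not NA (true here, since a[1] and b[1] are both 0).

```r
a <- c(0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0)
b <- c(0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0)

# 1 where both are true, 0 where both are false, NA where undecided
signal <- ifelse(a == 1 & b == 1, 1,
                 ifelse(a == 0 & b == 0, 0, NA))

# Index of the most recent non-NA position for every row
# (assumes signal[1] is not NA, otherwise the index would be 0)
last_known <- cummax(seq_along(signal) * !is.na(signal))

# Carry the last decided value forward
c_out <- signal[last_known]
cbind(a, b, c = c_out)
```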

Clustering of Count data

I am currently trying to find clusters in a data set that looks like this:
Dienstag 19 Mittwoch 20 Donnerstag 21 Freitag 22 Montag 25 Dienstag 26 Donnerstag 28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0
It corresponds to the number of times a user used an application, given the day and the hour.
I want to find patterns/clusters that relate the usage to the hour, but I don't know how to go about it. It would be really helpful if you could give me some suggestions about methods.
There are statistical means of clustering as well, but here's a visual approach. I was lazy and used libraries I am familiar with to accomplish this goal, but it can likely be accomplished more efficiently with some base tools.
dat <- read.table(text=" Dienstag.19 Mittwoch.20 Donnerstag.21 Freitag.22 Montag.25 Dienstag.26 Donnerstag.28
[1,] 0 0 0 0 0 0 NA
[2,] 0 0 0 0 0 0 NA
[3,] 0 0 0 0 0 0 NA
[4,] 0 0 0 0 1 0 NA
[5,] 1 0 1 1 1 1 NA
[6,] 0 0 0 0 0 0 NA
[7,] 4 0 1 0 2 1 NA
[8,] 0 1 2 1 0 2 NA
[9,] 0 0 1 0 0 0 NA
[10,] 1 0 0 0 0 1 0
[11,] 2 0 1 0 0 5 0
[12,] 1 0 0 0 0 1 1
[13,] 0 1 0 0 0 0 0
[14,] 0 0 1 0 4 1 0", header=TRUE)
dat$hour <- factor(1:nrow(dat))
library(reshape2); library(qdap); library(ggplot2); library(plyr)
dat2 <- melt(dat)
dat2[, 2] <- beg2char(dat2[, 2], ".")
dat2 <- ddply(dat2, .(variable), transform,
rescale = scale(value))
ggplot(dat2, aes(variable, hour)) + geom_tile(aes(fill=rescale)) +
scale_fill_gradient(low = "white", high = "red")
# save the plot that was just drawn
ggsave("heat.png")
Most clustering algorithms will assume continuous data. While of course you can "cast" integers to double values, the results will no longer be as meaningful as they were for true continuous values.
I like Tyler's visual approach. If there is a meaningful pattern, your brain's visual cortex is probably the best tool to discover it.
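If a numeric grouping is still wanted alongside the heat map, one simple starting point is hierarchical clustering of the hourly rows with base R. The sketch below is an assumption-laden illustration rather than a recommendation: it drops the last day column because of its NA values, and the choice of Manhattan distance, average linkage and three groups is arbitrary. Given the caveat above about count data, the output should be read with care.

```r
# Counts for the first six day columns from the question (rows = hours 1..14);
# the seventh column is left out here because it contains NA
counts <- matrix(c(0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 1, 0,
                   1, 0, 1, 1, 1, 1,
                   0, 0, 0, 0, 0, 0,
                   4, 0, 1, 0, 2, 1,
                   0, 1, 2, 1, 0, 2,
                   0, 0, 1, 0, 0, 0,
                   1, 0, 0, 0, 0, 1,
                   2, 0, 1, 0, 0, 5,
                   1, 0, 0, 0, 0, 1,
                   0, 1, 0, 0, 0, 0,
                   0, 0, 1, 0, 4, 1), ncol = 6, byrow = TRUE)
rownames(counts) <- seq_len(nrow(counts))

# Assumed settings: Manhattan distance, average linkage, 3 groups
d      <- dist(counts, method = "manhattan")
hc     <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
groups
```

Calling plot(hc) draws the dendrogram, which can help in judging whether three groups is a sensible cut at all.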

How to transform an item set matrix in R

How to transform a matrix like
A 1 2 3
B 3 6 9
c 5 6 9
D 1 2 4
into form like:
1 2 3 4 5 6 7 8 9
1 0 2 1 1 0 0 0 0 0
2 0 0 1 1 0 0 0 0 0
3 0 0 0 0 0 1 0 0 1
4 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 1 0 0 1
6 0 0 0 0 0 0 0 0 2
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
I have an implementation for it, but it uses a for loop.
I wonder whether there is a built-in function in R for this (for example, "apply").
Addendum:
Sorry for the confusion. The first matrix just represents item sets; every set of items yields pairs. For example, the first set is "1 2 3", which becomes (1,2), (1,3), (2,3), corresponding to entries in the second matrix.
Another question:
If the matrix is very large (10000000*10000000) and sparse, should I use a sparse matrix or big.matrix?
Thanks!
Removing the row names from M gives this:
m <- matrix(c(1,3,5,1,2,6,6,2,3,9,9,4), nrow=4)
m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 3 6 9
## [3,] 5 6 9
## [4,] 1 2 4
# The indices that you want to increment in x, but some are repeated
# combn() is used to compute the combinations of columns
indices <- matrix(t(m[,combn(1:3,2)]),,2,byrow=TRUE)
# Count repeated rows
ones <- rep(1,nrow(indices))
cnt <- aggregate(ones, by=as.data.frame(indices), FUN=sum)
# Set each value to the appropriate count
x <- matrix(0, 9, 9)
x[as.matrix(cnt[,1:2])] <- cnt[,3]
x
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 2 1 1 0 0 0 0 0
## [2,] 0 0 1 1 0 0 0 0 0
## [3,] 0 0 0 0 0 1 0 0 1
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 1 0 0 1
## [6,] 0 0 0 0 0 0 0 0 2
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
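A variation on the same idea, sketched below, builds the pairs row by row with combn() and lets table() do the counting over fixed factor levels. The names n_items, pairs and counts are made up for this illustration, and sorting each row first is an added assumption so that every pair lands in the upper triangle.

```r
m <- matrix(c(1, 3, 5, 1, 2, 6, 6, 2, 3, 9, 9, 4), nrow = 4)
n_items <- 9   # assumed size of the item universe

# All 2-combinations of each (sorted) item set, stacked into a two-column matrix
pairs <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) {
  t(combn(sort(m[i, ]), 2))
}))

# Count each (row item, column item) pair over the full 1..n_items range
counts <- table(factor(pairs[, 1], levels = seq_len(n_items)),
                factor(pairs[, 2], levels = seq_len(n_items)))
x <- matrix(as.vector(counts), n_items, n_items)
x
```

For the very large sparse case raised in the question, a dense 9x9-style result is not feasible. The same two-column pair list could instead be passed as the i and j arguments of Matrix::sparseMatrix(), which adds up duplicated (i, j) entries, though that route is not tested here.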

Resources