Looping through all combinations of row products in a matrix - r

I have a matrix called Dataset which contains high-throughput data, like below:
  V1 V2 V3
A  2  3  3
B  4  2  7
C  3  1  4
What I want to do is find all combinations of row products for this matrix in the form of a list, as shown below (I'll call it Comb):
    V1 V2 V3
A A  4  9  9
A B  8  6 21
A C  6  3 12
B B 16  4 49
B C 12  2 28
C C  9  1 16
What I have so far is the following:
combs <- combn(seq_len(nrow(Dataset)), 2)
Comb <- Dataset[combs[1,], ] * Dataset[combs[2,], ]
rownames(Comb) <- apply(combn(rownames(Dataset), 2), 2, paste, collapse = " ")
Unfortunately, the main problem with this script is that I don't get the products of rows multiplied by themselves. Using the script above, I get the following matrix:
    V1 V2 V3
A B  8  6 21
A C  6  3 12
B C 12  2 28
So I was wondering whether it would be possible to modify my code so that it also multiplies each row by itself. Or would there be another way to do this that might be more efficient? When I tried the script on a high-throughput dataset (which is fairly large), it took several seconds to output a list for a table with 1000 rows, so if anyone knows of a faster way to do this task, I'd love to hear your thoughts.
Thanks for your help!

Here is a minimal adaptation of your implementation. We just add to your combn result the values you want, and then pretty much just use the same logic:
r.seq <- seq_len(nrow(Dataset))
combs <- matrix(c(combn(r.seq, 2), rep(r.seq, each=2)), nrow=2) # notice how we add values here
Comb <- Dataset[combs[1,], ] * Dataset[combs[2,], ]
rownames(Comb) <- apply(matrix(rownames(Dataset)[c(combs)], nrow=2), 2, paste, collapse=" ")
Produces:
    V1 V2 V3
A B  8  6 21
A C  6  3 12
B C 12  2 28
A A  4  9  9
B B 16  4 49
C C  9  1 16
You can always sort by row name afterwards. One advantage over expand.grid is that this only calculates the combinations you want.
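If you want the pairs in the order shown in the question (A A, A B, A C, B B, ...) without sorting afterwards, one possible sketch is to build the two index vectors directly. The names `i` and `j` are mine, and the small Dataset below is just the example from the question rebuilt by hand. The row names are pasted in one vectorized call rather than with apply, which may help a little on larger inputs, though I have not timed it.

```r
# Example matrix from the question (rebuilt here so the snippet is self-contained)
Dataset <- matrix(c(2, 4, 3,  3, 2, 1,  3, 7, 4), nrow = 3,
                  dimnames = list(c("A", "B", "C"), c("V1", "V2", "V3")))

n <- nrow(Dataset)
# i repeats each row index once for every partner j >= i: 1,1,1,2,2,3
i <- rep(seq_len(n), times = n:1)
# j runs from i up to n for each i: 1,2,3,2,3,3
j <- unlist(lapply(seq_len(n), function(k) k:n))

# Element-wise product of the paired rows
Comb <- Dataset[i, , drop = FALSE] * Dataset[j, , drop = FALSE]
# paste() is vectorized, so no apply() is needed for the names
rownames(Comb) <- paste(rownames(Dataset)[i], rownames(Dataset)[j])
Comb
```

Since only the n * (n + 1) / 2 pairs with i <= j are generated, this does the same amount of multiplication as the combn approach above.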

You can apply expand.grid to the sequence of row indices ('d1'), subset the rows of 'df1' (your Dataset) by each column of 'd1', and multiply the two results. Then create an index ('indx') of row-name pairs with outer and use its upper triangle to drop the rows that are not needed.
d1 <- expand.grid(1:nrow(df1), 1:nrow(df1))
res <- df1[d1[,1],] * df1[d1[,2],]
indx <- outer(rownames(df1), rownames(df1), FUN=paste)
v1 <- indx[upper.tri(indx, diag=TRUE)]
res1 <- res[do.call(paste,expand.grid(rownames(df1), rownames(df1))) %in% v1,]
row.names(res1) <- v1
res1
# V1 V2 V3
#A A 4 9 9
#A B 8 6 21
#B B 16 4 49
#A C 6 3 12
#B C 12 2 28
#C C 9 1 16

Related

R combinations of column vectors with names of new vectors as combination of original vectors

Apologies for the embarrassingly simple problem. I want to create combinations of all column vectors in a data frame, add the new vectors and rename them as a combination of the original column vector names.
For example
  A B V3 V4 V5 V6
1 1 3  1  3  3  9
2 2 4  4  8  8 16
3 3 5  9 15 15 25
I'd like V3 to be named A*A, V4 to be A*B, V5 to be B*A, etc.
The closest I have come is a Python-style solution via a 'for loop'. It makes sense, but what is the R syntax to name the columns?
df<-data.frame(x1=1:3,x2=3:5)
for (i in 1:ncol(df)){
for (j in 1:ncol(df){
df[i,"-",j]<-df([,i]*df[,j])
}
}
Alternatively, I could use sapply instead of a loop but I am still stuck with renaming the new columns:
df<-data.frame(x1=1:3,x2=3:5)
a[3:6]<-sapply(a[1:2],"*",a[1:2])
Many thanks,
LR
One option is expand.grid
d1 <- expand.grid(names(df), names(df))
df[do.call(paste0, d1)] <- apply(d1, 1, FUN = function(x) do.call(`*`, df[x]))
df
# x1 x2 x1x1 x2x1 x1x2 x2x2
#1 1 3 1 3 3 9
#2 2 4 4 8 8 16
#3 3 5 9 15 15 25
Or another option is
do.call(polym, c(df, degree = 2, raw = TRUE))
Or
poly(as.matrix(df), degree = 2, raw = TRUE)
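If you would rather keep the explicit loop from the question, here is a minimal sketch of the naming syntax. The variable `base_cols` and the "*" separator are my choices, not anything required by R; `[[<-` accepts any string as a column name, including non-syntactic ones such as "x1*x2".

```r
df <- data.frame(x1 = 1:3, x2 = 3:5)

# Fix the set of original columns first, so the loop does not
# pick up the product columns it adds along the way
base_cols <- names(df)

for (a in base_cols) {
  for (b in base_cols) {
    # Build the new column name from the two source names
    df[[paste(a, b, sep = "*")]] <- df[[a]] * df[[b]]
  }
}
df
```

Because the names contain "*", you would later need backticks or `[[` to refer to them (for example df[["x1*x2"]]); using sep = "" or sep = "_" avoids that.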

R - Subset rows of a data frame on a condition in all the columns

I want to subset the rows of a data frame on a single condition applied to all the columns, without using subset.
I understand how to subset on a single column, but I cannot generalize this to all columns (without naming every column).
Initial data frame :
  V1 V2 V3
1  1  8 15
2  2  0 16
3  3 10 17
4  4 11 18
5  5  0 19
6  0 13 20
7  7 14 21
In this example, I want to subset the rows without zeros.
Expected output :
  V1 V2 V3
1  1  8 15
2  3 10 17
3  4 11 18
4  7 14 21
Thanks
# create your data
a <- c(1,2,3,4,5,0,7)
b <- c(8,0,10,11,0,13,14)
c <- c(15,16,17,18,19,20,21)
data <- cbind(a, b, c)
# keep only rows whose minimum is above 0, i.e. drop rows containing a 0
# (this relies on the data having no negative values)
data[apply(data, 1, min) > 0,]
A solution using rowSums function after matching to 0.
# creating your data
data <- data.frame(a = c(1,2,3,4,5,0,7),
                   b = c(8,0,10,11,0,13,14),
                   c = c(15,16,17,18,19,20,21))
# Selecting rows containing no 0.
data[which(rowSums(as.matrix(data)==0) == 0),]
Another way
data[-unique(row(data)[grep("^0$", unlist(data))]),]
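One caveat with negative indexing is that if no row contains a 0, the index is empty and R returns zero rows instead of all of them. A logical index avoids that, and it also works when the data has negative values. The following is only a sketch on the question's data; `keep` and `result` are names I made up, and it assumes there are no NA values (a row with an NA and no 0 would come back as an all-NA row).

```r
data <- data.frame(V1 = c(1, 2, 3, 4, 5, 0, 7),
                   V2 = c(8, 0, 10, 11, 0, 13, 14),
                   V3 = c(15, 16, 17, 18, 19, 20, 21))

# TRUE for rows in which every column is non-zero
keep <- apply(data != 0, 1, all)

result <- data[keep, ]
# Renumber the rows to match the expected output in the question
rownames(result) <- NULL
result
```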

Matching and merging headers in R

In R, I want to match and merge two matrices.
For example,
> A
   ID a b c d e f g
1  ex 3 8 7 6 9 8 4
2  am 7 5 3 0 1 8 3
3 ple 8 5 7 9 2 3 1
> B
  col1
1    a
2    c
3    e
4    f
Then, I want to match the header of matrix A against the first column of matrix B.
The final result should be a matrix like below.
> C
   ID a c e f
1  ex 3 7 9 8
2  am 7 3 1 8
3 ple 8 7 2 3
*(My original data has more than 500 columns and more than 20,000 rows.)
Are there any tips for that? Would really appreciate your help.
*Also, if matrix B is instead laid out like below,
> B
  col1 col2 col3 col4
1    a    c    e    f
How to make the matrix C in this case?
You want:
A[, c('ID', B[, 1])]
For the second case, you want to use row number 1 of the second matrix, instead of its first column.
A[, c('ID', B[1, ])]
If B is a data.frame instead of a matrix, the syntax changes somewhat: you can use B$col1 instead of B[, 1]. To select by row, you need to convert the result to a vector, because selecting a row of a data.frame gives you a data.frame again, i.e. you need to do unlist(B[1, ]).
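You can also subset the columns with a logical vector (this assumes A and B are data frames). Naming the first argument keeps the ID column from being labelled "A$ID":
cbind(ID = A$ID, A[names(A) %in% B$col1])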
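For what it's worth, here is a small self-contained sketch of the data frame case. It assumes A and B are data frames shaped like the ones in the question, and `wanted` is just a name I picked. Using intersect guards against names in B that do not exist in A, which would otherwise raise an "undefined columns selected" error.

```r
A <- data.frame(ID = c("ex", "am", "ple"),
                a = c(3, 7, 8), b = c(8, 5, 5), c = c(7, 3, 7),
                d = c(6, 0, 9), e = c(9, 1, 2), f = c(8, 8, 3),
                g = c(4, 3, 1),
                stringsAsFactors = FALSE)
B <- data.frame(col1 = c("a", "c", "e", "f"), stringsAsFactors = FALSE)

# Keep only the requested names that really are columns of A,
# in the order they appear in B
wanted <- intersect(as.character(B$col1), names(A))

C <- A[, c("ID", wanted)]
C
```

For the wide layout of B (one row, several columns), replacing B$col1 with unlist(B[1, ]) should give the same result.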

Subset columns using logical vector

I have a data frame from which I want to drop the columns whose NA rate is above 70%, or in which a single dominant value takes up over 99% of the rows. How can I do that in R?
I find it easy to select rows with a logical vector in the subset function, but how can I do something similar for columns? For example, if I write:
isNARateLt70 <- function(column) {
  # some code
}
apply(dataframe, 2, isNARateLt70)
Then how can I go on to use the resulting vector to subset the data frame?
If you have a data.frame like
dd <- data.frame(matrix(rpois(7*4,10),ncol=7, dimnames=list(NULL,letters[1:7])))
# a b c d e f g
# 1 11 2 5 9 7 6 10
# 2 10 5 11 13 11 11 8
# 3 14 8 6 16 9 11 9
# 4 11 8 12 8 11 6 10
You can subset with a logical vector using one of
mycols <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
dd[mycols]
dd[, mycols]
There's really no need to write a function when we have colMeans (thanks to @MrFlick for the advice to switch from colSums()/nrow(); it is shown at the bottom of this answer).
Here's how I would approach your function if you want to use sapply on it later.
> d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
z = c(rep(NA, 3), 1, 2))
> isNARateLt70 <- function(x) mean(is.na(x)) <= 0.7
> sapply(d, isNARateLt70)
# x y z
# FALSE TRUE TRUE
Then, to subset your data using the result of the line above, it's
> d[sapply(d, isNARateLt70)]
But as mentioned, colMeans works just the same,
> d[colMeans(is.na(d)) <= 0.7]
# y z
# 1 1 NA
# 2 NA NA
# 3 NA NA
# 4 1 1
# 5 1 2
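The question also asks about dropping columns where one value dominates more than 99% of the rows. A possible sketch follows, where `dominant_rate` is a hypothetical helper of mine (not a base R function) and I am assuming the rate is measured against all rows, NAs included. The extra column w is added only to have something that gets dropped.

```r
d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
                z = c(rep(NA, 3), 1, 2), w = rep(7, 5))

# Share of rows taken by the most frequent non-NA value
# (hypothetical helper; table() ignores NA by default)
dominant_rate <- function(x) {
  counts <- table(x)
  if (length(counts) == 0) return(0)  # column is all NA
  max(counts) / length(x)
}

na_ok       <- colMeans(is.na(d)) <= 0.7
dominant_ok <- vapply(d, dominant_rate, numeric(1)) <= 0.99

# Keep columns that pass both checks
res <- d[na_ok & dominant_ok]
res
```

Here x is dropped for its NA rate and w because the value 7 fills every row, leaving y and z.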
Maybe this will help too. The second argument of apply() (MARGIN = 2) means the function is applied column-wise to the data.frame cars.
> columns <- apply(cars, 2, function(x) {mean(x) > 10})
> columns
speed dist
TRUE TRUE
> cars[1:10, columns]
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17

Convert a matrix with dimnames into a long format data.frame

Hoping there's a simple answer here but I can't find it anywhere.
I have a numeric matrix with row names and column names:
# 1 2 3 4
# a 6 7 8 9
# b 8 7 5 7
# c 8 5 4 1
# d 1 6 3 2
I want to melt the matrix to a long format, with the values in one column and matrix row and column names in one column each. The result could be a data.table or data.frame like this:
# col row value
#   1   a     6
#   1   b     8
#   1   c     8
#   1   d     1
#   2   a     7
#   2   b     7
#   2   c     5
#   2   d     6
# ...
Any tips appreciated.
Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...
The as.table and as.data.frame functions together will do this:
> m <- matrix( sample(1:12), nrow=4 )
> dimnames(m) <- list( One=letters[1:4], Two=LETTERS[1:3] )
> as.data.frame( as.table(m) )
One Two Freq
1 a A 7
2 b A 2
3 c A 1
4 d A 5
5 a B 9
6 b B 6
7 c B 8
8 d B 10
9 a C 11
10 b C 12
11 c C 3
12 d C 4
Assuming 'm' is your matrix...
data.frame(col = rep(colnames(m), each = nrow(m)),
row = rep(rownames(m), ncol(m)),
value = as.vector(m))
This runs quickly even on a large matrix, since it only uses vectorized calls, and it also shows you a bit about how a matrix is stored (column by column), how to access things in it, and how to construct your own vectors.
A modification that doesn't require you to know anything about the storage structure, and that easily extends to higher-dimensional arrays if you use the dimnames and slice.index functions:
data.frame(row=rownames(m)[as.vector(row(m))],
col=colnames(m)[as.vector(col(m))],
value=as.vector(m))
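To illustrate the higher-dimensional case, here is a rough sketch with a 3-d array. The array `arr`, its dimnames, and the column name `layer` are made up for the example. slice.index(arr, k) returns, for every cell, its index along dimension k, so it plays the role that row() and col() play for a matrix.

```r
# Made-up 2 x 3 x 4 array with names on every dimension
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(c("a", "b"), c("x", "y", "z"),
                             as.character(1:4)))

# Look up each cell's name along dimension k
dn <- function(k) dimnames(arr)[[k]][as.vector(slice.index(arr, k))]

long <- data.frame(row   = dn(1),
                   col   = dn(2),
                   layer = dn(3),
                   value = as.vector(arr),
                   stringsAsFactors = FALSE)
head(long)
```

For named dimensions, as.data.frame(as.table(arr)) gives a similar result in one call, with the value column called Freq.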
