How to enumerate all combinations in a matrix in R?

I am trying to construct a matrix that includes all the possible combinations. For example,
a=(1:2)^3 #=c(1,8)
b=(1:3)^2 #=c(1,4,9)
And I would like to define c such that c=c(1+1,1+4,1+9,8+1,8+4,8+9). I learned from my previous question how to get such a c with the function outer. My current question is: how can I get a matrix M as follows:
Thanks in advance!

OK, here it is:
z <- outer(b, a, "+")
cbind(a[col(z)], b[row(z)], c(z))
# [,1] [,2] [,3]
#[1,] 1 1 2
#[2,] 1 4 5
#[3,] 1 9 10
#[4,] 8 1 9
#[5,] 8 4 12
#[6,] 8 9 17
A slightly adapted expand.grid solution.
ref <- expand.grid(b = b, a = a)
val <- do.call("+", ref) ## or `rowSums(ref)` with an implicit `as.matrix`
cbind(ref, c = val)
# b a c
#1 1 1 2
#2 4 1 5
#3 9 1 10
#4 1 8 9
#5 4 8 12
#6 9 8 17
In this case the result is a data frame rather than a matrix.
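If a matrix is needed, as the question asks, the data frame can be converted afterwards. A minimal sketch, assuming the a and b from the question; the name M and the column order a, b, c are illustrative choices, not part of the original answer:

```r
a <- (1:2)^3  # c(1, 8)
b <- (1:3)^2  # c(1, 4, 9)

# All (b, a) pairs as a data frame; b varies fastest
ref <- expand.grid(b = b, a = a)

# Reorder the columns, append the sum, and convert to a numeric matrix
M <- as.matrix(cbind(ref[c("a", "b")], c = ref$a + ref$b))
M
```

Because every column is numeric, as.matrix returns a numeric matrix rather than a character one.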

We can use expand.grid with outer
data.frame(expand.grid(a, b), c = c(outer(a, b, "+")))
# Var1 Var2 c
#1 1 1 2
#2 8 1 9
#3 1 4 5
#4 8 4 12
#5 1 9 10
#6 8 9 17
where
outer(a, b, "+") #gives
# [,1] [,2] [,3]
#[1,] 2 5 10
#[2,] 9 12 17

Or another option is CJ
library(data.table)
CJ(a, b)[, C := V1 + V2][]
# V1 V2 C
#1: 1 1 2
#2: 1 4 5
#3: 1 9 10
#4: 8 1 9
#5: 8 4 12
#6: 8 9 17

Related

Extract positions in a data frame based on a vector

In a dataset I want to know where the missing values are, so I use which(is.na(df)). I then impute the missing values in this dataset, and afterwards I want to extract the values at the imputed positions. But I don't know how to extract these data. Does anyone have suggestions? Thanks!
id <- factor(rep(letters[1:2], each=5))
A <- c(1,2,NA,67,8,9,0,6,7,9)
B <- c(5,6,31,9,8,1,NA,9,7,4)
C <- c(2,3,5,NA,NA,2,7,6,4,6)
D <- c(6,5,89,3,2,9,NA,12,69,8)
df <- data.frame(id, A, B,C,D)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a NA 31 5 89
4 a 67 9 NA 3
5 a 8 8 NA 2
6 b 9 1 2 9
7 b 0 NA 7 NA
8 b 6 9 6 12
9 b 7 7 4 69
10 b 9 4 6 8
pos_na <- which(is.na(df))
pos_na
[1] 13 27 34 35 47
# after imputation
id <- factor(rep(letters[1:2], each=5))
A <- c(1,2,4,67,8,9,0,6,7,9)
B <- c(5,6,31,9,8,1,65,9,7,4)
C <- c(2,3,5,8,2,2,7,6,4,6)
D <- c(6,5,89,3,2,9,6,12,69,8)
df <- data.frame(id, A, B,C,D)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a 4 31 5 89
4 a 67 9 8 3
5 a 8 8 2 2
6 b 9 1 2 9
7 b 0 65 7 6
8 b 6 9 6 12
9 b 7 7 4 69
10 b 9 4 6 8
Wanted output: 4, 65, 8, 2, 6
To store positions of NA use which with arr.ind = TRUE which gives row and column numbers.
pos_na <- which(is.na(df), arr.ind = TRUE)
pos_na
# row col
#[1,] 3 2
#[2,] 7 3
#[3,] 4 4
#[4,] 5 4
#[5,] 7 5
So that after imputation you can extract the values directly.
as.numeric(df[pos_na])
[1] 4 65 8 2 6
Instead of wrapping with which, we can keep it as a logical matrix
i1 <- is.na(df[-1])
Then, after the imputation, just use the i1
df[-1][i1]
#[1] 4 65 8 2 6
Note that the -1 column index drops the first column, id, which is a factor rather than numeric.
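Putting the pieces together, the workflow is: record the NA positions, impute, then look the same positions up again. A rough sketch on two columns of the question's data; median imputation and the names num_cols, na_pos and imputed are assumptions made for illustration, since the question does not say which imputation method is used:

```r
# Data from the question, restricted to two columns to keep the sketch short
df <- data.frame(
  id = factor(rep(letters[1:2], each = 5)),
  A  = c(1, 2, NA, 67, 8, 9, 0, 6, 7, 9),
  B  = c(5, 6, 31, 9, 8, 1, NA, 9, 7, 4)
)

# Record the NA positions BEFORE imputing, on the numeric columns only
num_cols <- names(df)[sapply(df, is.numeric)]
na_pos   <- which(is.na(df[num_cols]), arr.ind = TRUE)

# Assumption: median imputation stands in for whatever method is really used
df[num_cols] <- lapply(df[num_cols], function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
})

# Look the imputed cells up again by (row, column)
imputed <- data.frame(
  row    = na_pos[, "row"],
  column = num_cols[na_pos[, "col"]],
  value  = as.matrix(df[num_cols])[na_pos]
)
imputed
```

Restricting everything to the numeric columns keeps the lookup matrix numeric, so the imputed values are not passed through a character conversion on the way out.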

For loop in matrix or similar structure for solving large matrix [duplicate]

Can a for loop or something similar solve the following problem?
Matrix A (given 6 x 16)
a 1 5 6 9 5 8 5 6 7 9 4 6 2 5 4 6
b 8 6 2 4 7 9 2 3 4 8 6 2 1 6 8 2
c 9 5 1 7 5 3 7 5 3 9 5 1 2 6 9 3
d 2 5 6 3 4 1 8 4 2 6 9 5 1 3 7 1
e 7 4 2 3 6 5 7 4 1 2 3 6 9 8 5 2
f 1 5 3 7 8 9 4 6 3 1 5 2 8 9 5 4
Output (6 x 4)
a 1+5+6+9 5+8+5+6 7+9+4+6 2+5+4+6
b 8+6+2+4 7+9+2+3 4+8+6+2 1+6+8+2
c 9+5+1+7 5+3+7+5 3+9+5+1 2+6+9+3
d 2+5+6+3 4+1+8+4 2+6+9+5 1+3+7+1
e 7+4+2+3 6+5+7+4 1+2+3+6 9+8+5+2
f 1+5+3+7 8+9+4+6 3+1+5+2 8+9+5+4
I have a large matrix of 4519 x 4519, so I am looking for a for loop.
matb <- matrix(data = 0, nrow =6 ,ncol = 6)
for (a in 1: nrow (data)) {
for (b in 1:seq (1,5,by=2)) {
c <- b+1
matb [a,1:3] <- rbind (sum(data[a,b:c]))
}
}
I tried the code above, but it did not work. I am therefore looking for help with a for loop or a function that solves this problem.
We can use recycling to select alternating columns, then add:
# example matrix
m <- matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
m[, c(TRUE, FALSE)] + m[, c(FALSE, TRUE)]
# [,1] [,2]
# [1,] 5 17
# [2,] 7 19
# [3,] 9 21
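The recycling trick above adds pairs of columns, while the question asks for blocks of four. One way to generalise to blocks of k columns is to label each column with its block number and let rowsum do the grouping. This is a sketch rather than the answerer's method, and sum_every_k is a hypothetical helper name:

```r
# Hypothetical helper: sum each consecutive block of k columns of m
sum_every_k <- function(m, k) {
  # Block label for each column: 1,1,...,2,2,...; a short last block is allowed
  grp <- ceiling(seq_len(ncol(m)) / k)
  # rowsum() adds up rows by group, so transpose, sum, and transpose back
  t(rowsum(t(m), grp))
}

# First two rows of matrix A from the question
A <- matrix(c(1, 5, 6, 9, 5, 8, 5, 6, 7, 9, 4, 6, 2, 5, 4, 6,
              8, 6, 2, 4, 7, 9, 2, 3, 4, 8, 6, 2, 1, 6, 8, 2),
            nrow = 2, byrow = TRUE, dimnames = list(c("a", "b"), NULL))
res <- sum_every_k(A, 4)
res
```

For row a this gives 21, 24, 26, 17, matching 1+5+6+9 and so on. Since 4519 is not a multiple of 4, the last block of the full matrix would simply contain fewer columns.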

Get value from matrix based on address from two other tables in R

I'm trying to construct a table C that gets values from a set of matrices X, Y and Z based on the "address" given in two other tables A and B.
To do this I've first added ID columns to tables A and B by:
A$A.ID <- seq.int(nrow(A))
B$B.ID <- seq.int(nrow(B))
and found all the permutations of the IDs using:
C <- expand.grid(A$A.ID, B$B.ID)
Now I want to add columns X, Y and Z to C, but I have no idea what I'm doing (pretty new to programming :/ )
To explain the process I've drawn a picture. Hopefully it helps...
Let me know if you need to know anything else.
I think this works going by the pattern you describe. First of all, here's some example data:
A <- data.frame(A.ID=1:2, X=1:2, Y=3:2, Z=2:1)
B <- data.frame(B.ID=1:2, X=1:2, Y=2:1, Z=1:2)
A;B
# A.ID X Y Z
#1 1 1 3 2
#2 2 2 2 1
# B.ID X Y Z
#1 1 1 2 1
#2 2 2 1 2
X <- matrix(1:9,nrow=3); Y <- matrix(1:16,nrow=4); Z <- matrix(1:4,nrow=2)
X;Y;Z
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
# [,1] [,2] [,3] [,4]
#[1,] 1 5 9 13
#[2,] 2 6 10 14
#[3,] 3 7 11 15
#[4,] 4 8 12 16
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
Then, for each of X, Y and Z, Map pairs the indexes from A and B into a two-column (row, column) matrix, which is used to subset the corresponding X/Y/Z object:
arep <- rep(1:nrow(A),nrow(B))
brep <- rep(1:nrow(B),each=nrow(A))
cells <- Map(
  `[`,
  list(X=X,Y=Y,Z=Z),
  Map(function(x,y) t(mapply(c,x,y)), A[arep,-1], B[brep,-1])
)
data.frame(A["A.ID"][arep,,drop=FALSE], B["B.ID"][brep,,drop=FALSE], cells)
# A.ID B.ID X Y Z
#1 1 1 1 7 2
#2 2 1 2 6 1
#1.1 1 2 4 3 4
#2.1 2 2 5 2 3
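The key step in the answer above is indexing a matrix with a two-column matrix, where each row of the index is one (row, column) address. A small sketch for X alone, using the addresses that A and B produce for it; the name idx is illustrative:

```r
X <- matrix(1:9, nrow = 3)

# Each row of idx is one (row, column) address into X:
# rows come from A$X (recycled), columns from B$X (each repeated)
idx <- cbind(row = c(1, 2, 1, 2), col = c(1, 1, 2, 2))

# Matrix indexing returns one value per address
X[idx]
# [1] 1 2 4 5
```

This reproduces the X column of the result above; the Map call just repeats the same lookup for Y and Z.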

R sum rows of matrix by column name

This seems like it should be easy but I can't figure it out. I would like to sum all of the columns of my matrix that have the same name. So, in the example below, I would like to end up with another matrix with only three columns.
set.seed(4)
z<-matrix(sample(1:10,20, replace=T), nrow=4)
colnames(z)<-c("a","c","b","a","b")
z
a c b a b
[1,] 6 9 10 2 10
[2,] 1 3 1 10 6
[3,] 3 8 8 5 10
[4,] 3 10 3 5 8
should yield:
a c b
[1,] 8 9 20
[2,] 11 3 7
[3,] 8 8 18
[4,] 8 10 11
I tried:
z<-aggregate(colnames(z), data=z, sum)
but it did not work. I would prefer to use base R if possible.
You can use rowsum with the column names as group variable:
t(rowsum(t(z), colnames(z)))
# a b c
#[1,] 8 20 9
#[2,] 11 7 3
#[3,] 8 18 8
#[4,] 8 11 10
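By default rowsum sorts the groups, which is why the columns come back as a, b, c rather than in the a, c, b order the question asked for. Passing reorder = FALSE keeps the order of first appearance. A sketch with the matrix from the question typed in directly, so it does not depend on the random number generator:

```r
# The matrix z from the question, entered column by column
z <- matrix(c(6, 1, 3, 3,   9, 3, 8, 10,   10, 1, 8, 3,
              2, 10, 5, 5,  10, 6, 10, 8),
            nrow = 4,
            dimnames = list(NULL, c("a", "c", "b", "a", "b")))

# Transpose so that columns become rows, sum rows by name, transpose back;
# reorder = FALSE keeps groups in order of first appearance (a, c, b)
res <- t(rowsum(t(z), colnames(z), reorder = FALSE))
res
```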
Try this:
sapply(unique(colnames(z)), function(x) rowSums(z[, colnames(z)==x, drop=FALSE]))
Here is an option using xtabs
library(reshape2)
xtabs(value~Var1 +Var2, melt(z))
# Var2
#Var1 a c b
# 1 8 9 20
# 2 11 3 7
# 3 8 8 18
# 4 8 10 11
Or with tapply
tapply(z, list(row(z), colnames(z)[col(z)]), FUN = sum)
# a b c
# 1 8 20 9
# 2 11 7 3
# 3 8 18 8
# 4 8 11 10
This will also do.
set.seed(4)
z<-matrix(sample(1:10,20, replace=T), nrow=4)
colnames(z)<-c("a","c","b","a","b")
library(data.table) # needed for as.data.table, melt and dcast.data.table
z <- as.data.table(z)
z[,id:=.I]
z <- melt(z,id.vars="id")
z[,sum:=sum(value),by=.(variable,id)]
z[,value:=NULL]
z <- dcast.data.table(z, id~variable,value.var = "sum", fun.aggregate = max)
z[,id:=NULL]
Resulting in
a c b
1: 8 9 20
2: 11 3 7
3: 8 8 18
4: 8 10 11

How to change the way split returns values in R?

I'm working on a project and I want to take a matrix, split it by the values w and x, and then for each of those splits find the maximum value of y.
Here's an example matrix
>rah = cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)
>rah
w x y z
[1,] 1 1 12 1
[2,] 2 2 11 2
[3,] 3 3 10 3
[4,] 4 1 9 4
[5,] 5 2 8 5
[6,] 6 3 7 6
[7,] 1 1 6 7
[8,] 2 2 5 8
[9,] 3 3 4 9
[10,] 4 1 3 10
[11,] 5 2 2 11
[12,] 6 3 1 12
So I run split
> doh = split(rah, list(rah[,1], rah[,2]))
> doh
$`1.1`
[1] 1 1 1 1 12 6 1 7
$`2.1`
integer(0)
$`3.1`
integer(0)
$`4.1`
[1] 4 4 1 1 9 3 4 10
$`5.1`
integer(0)
$`6.1`
integer(0)
$`1.2`
integer(0)
$`2.2`
[1] 2 2 2 2 11 5 2 8
$`3.2`
integer(0)
$`4.2`
integer(0)
$`5.2`
[1] 5 5 2 2 8 2 5 11
...
So I'm a bit confused as to how to take the output of split and use it to group the rows with the matching combination of w and x values (such as row 1 and row 7), and then compare them to find the one with the highest y value.
EDIT: Informative answers so far but I just realized that I forgot to mention one very important part: I want to keep the whole row (x,w,y,z).
Use aggregate instead
> aggregate(y ~ w + x, max, data=rah)
w x y
1 1 1 12
2 4 1 9
3 2 2 11
4 5 2 8
5 3 3 10
6 6 3 7
If you want to use split, try
> split_rah <- split(rah[,"y"], list(rah[, "w"], rah[, "x"]))
> ind <- sapply(split_rah, function(x) length(x)>0)
> sapply(split_rah[ind], max)
1.1 4.1 2.2 5.2 3.3 6.3
12 9 11 8 10 7
Just for the record, summaryBy from the doBy package works in the same fashion as aggregate
> library(doBy)
> summaryBy(y ~ w + x, FUN=max, data=as.data.frame(rah))
w x y.max
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
data.table solution:
> library(data.table)
> dt <- data.table(rah)
> dt[, max(y), by=list(w, x)]
w x V1
1: 1 1 12
2: 2 2 11
3: 3 3 10
4: 4 1 9
5: 5 2 8
6: 6 3 7
> tapply(rah[,"y"], list( rah[,"w"], rah[,"x"]), max)
1 2 3
1 12 NA NA
2 NA 11 NA
3 NA NA 10
4 9 NA NA
5 NA 8 NA
6 NA NA 7
Another option using plyr package:
ddply(as.data.frame(rah),.(w,x),summarize,z=max(y))
w x z
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
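The answers above return only w, x and the maximum of y, whereas the edit to the question asks to keep the whole row, z included. One possible base R approach, offered as a sketch rather than as any answerer's method, is to sort so that the largest y comes first within each (w, x) group and then keep the first row of each group; the names ord and best are illustrative:

```r
rah <- cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)

# Sort by w, then x, then y descending, so the max-y row leads each group
ord <- rah[order(rah[, "w"], rah[, "x"], -rah[, "y"]), ]

# Keep the first row of every (w, x) combination; duplicated() works row-wise
best <- ord[!duplicated(ord[, c("w", "x")]), , drop = FALSE]
best
```

Ties in y are resolved by taking whichever tied row sorts first, which may or may not be the desired behaviour.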

Resources