R sum rows of matrix by column name

This seems like it should be easy but I can't figure it out. I would like to sum all of the columns of my matrix that have the same name. So, in the example below, I would like to end up with another matrix with only three columns.
set.seed(4)
z <- matrix(sample(1:10, 20, replace = TRUE), nrow = 4)
colnames(z)<-c("a","c","b","a","b")
z
a c b a b
[1,] 6 9 10 2 10
[2,] 1 3 1 10 6
[3,] 3 8 8 5 10
[4,] 3 10 3 5 8
should yield:
a c b
[1,] 8 9 20
[2,] 11 3 7
[3,] 8 8 18
[4,] 8 10 11
I tried:
z<-aggregate(colnames(z), data=z, sum)
but it did not work. I would prefer to use base R if possible.

You can use rowsum with the column names as group variable:
t(rowsum(t(z), colnames(z)))
# a b c
#[1,] 8 20 9
#[2,] 11 7 3
#[3,] 8 18 8
#[4,] 8 11 10
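rowsum() sorts the group labels by default, which is why the columns above come out as a, b, c rather than in the a, c, b order of the expected output. A minimal sketch, assuming you want the order of first appearance: rowsum() has a reorder argument, and setting it to FALSE keeps that order. The matrix is typed in from the printed z in the question rather than regenerated with sample(), because sample() can return different values for the same seed in different R versions.

```r
# z typed in from the printed matrix in the question (column-major order)
z <- matrix(c(6, 1, 3, 3,  9, 3, 8, 10,  10, 1, 8, 3,  2, 10, 5, 5,  10, 6, 10, 8),
            nrow = 4, dimnames = list(NULL, c("a", "c", "b", "a", "b")))

# rowsum() adds up rows that share a group label; transposing lets us group columns.
# reorder = FALSE keeps the groups in order of first appearance (a, c, b).
res <- t(rowsum(t(z), colnames(z), reorder = FALSE))
res
```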

Try this:
sapply(unique(colnames(z)), function(x) rowSums(z[, colnames(z)==x, drop=FALSE]))
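The drop = FALSE in this answer matters: "c" occurs only once, and without it z[, colnames(z) == "c"] collapses to a plain vector, on which rowSums() stops with an error because it needs at least two dimensions. A small sketch, again using z typed in from the question:

```r
z <- matrix(c(6, 1, 3, 3,  9, 3, 8, 10,  10, 1, 8, 3,  2, 10, 5, 5,  10, 6, 10, 8),
            nrow = 4, dimnames = list(NULL, c("a", "c", "b", "a", "b")))

# A single matching column is simplified to a vector by default ...
is.matrix(z[, colnames(z) == "c"])                 # FALSE
# ... but stays a 4 x 1 matrix with drop = FALSE, so rowSums() can handle it
is.matrix(z[, colnames(z) == "c", drop = FALSE])   # TRUE

# sapply() over the unique names keeps their order of first appearance (a, c, b)
res <- sapply(unique(colnames(z)),
              function(x) rowSums(z[, colnames(z) == x, drop = FALSE]))
res
```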

Here is an option using xtabs (with melt() from reshape2):
library(reshape2)
xtabs(value~Var1 +Var2, melt(z))
# Var2
#Var1 a c b
# 1 8 9 20
# 2 11 3 7
# 3 8 8 18
# 4 8 10 11
Or with tapply:
tapply(z, list(row(z), colnames(z)[col(z)]), FUN = sum)
#    a  b  c
# 1  8 20  9
# 2 11  7  3
# 3  8 18  8
# 4  8 11 10

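This will also do, using the data.table package:
library(data.table)
set.seed(4)
z <- matrix(sample(1:10, 20, replace = TRUE), nrow = 4)
colnames(z) <- c("a","c","b","a","b")
z <- as.data.table(z)
z[, id := .I]                                  # row number as an id column
z <- melt(z, id.vars = "id")                   # long format: id, variable, value
z[, sum := sum(value), by = .(variable, id)]   # total per column name within each row
z[, value := NULL]
# every row of a group carries the same total, so max() just picks it out
z <- dcast.data.table(z, id ~ variable, value.var = "sum", fun.aggregate = max)
z[, id := NULL]
Resulting in: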
a c b
1: 8 9 20
2: 11 3 7
3: 8 8 18
4: 8 10 11

Related

Extract positions in a data frame based on a vector

In a dataset I want to know where the missing values are, so I use which(is.na(df)). Then I do, for example, imputation on this dataset, and afterwards I want to extract the values at the imputed positions. But I don't know how to extract these data. Does anyone have suggestions? Thanks!
id <- factor(rep(letters[1:2], each=5))
A <- c(1,2,NA,67,8,9,0,6,7,9)
B <- c(5,6,31,9,8,1,NA,9,7,4)
C <- c(2,3,5,NA,NA,2,7,6,4,6)
D <- c(6,5,89,3,2,9,NA,12,69,8)
df <- data.frame(id, A, B,C,D)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a NA 31 5 89
4 a 67 9 NA 3
5 a 8 8 NA 2
6 b 9 1 2 9
7 b 0 NA 7 NA
8 b 6 9 6 12
9 b 7 7 4 69
10 b 9 4 6 8
pos_na <- which(is.na(df))
pos_na
[1] 13 27 34 35 47
# after imputation
id <- factor(rep(letters[1:2], each=5))
A <- c(1,2,4,67,8,9,0,6,7,9)
B <- c(5,6,31,9,8,1,65,9,7,4)
C <- c(2,3,5,8,2,2,7,6,4,6)
D <- c(6,5,89,3,2,9,6,12,69,8)
df <- data.frame(id, A, B,C,D)
df
id A B C D
1 a 1 5 2 6
2 a 2 6 3 5
3 a 4 31 5 89
4 a 67 9 8 3
5 a 8 8 2 2
6 b 9 1 2 9
7 b 0 65 7 6
8 b 6 9 6 12
9 b 7 7 4 69
10 b 9 4 6 8
Wanted output: 4, 65, 8, 2, 6
To store the positions of the NAs, use which() with arr.ind = TRUE, which returns row and column numbers.
pos_na <- which(is.na(df), arr.ind = TRUE)
pos_na
# row col
#[1,] 3 2
#[2,] 7 3
#[3,] 4 4
#[4,] 5 4
#[5,] 7 5
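After imputation, you can then extract the values directly. Indexing a data frame with a matrix converts it with as.matrix(), which gives a character matrix here because of the factor column id, hence the as.numeric():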
as.numeric(df[pos_na])
[1] 4 65 8 2 6
Instead of wrapping with which, we can keep it as a logical matrix:
i1 <- is.na(df[-1])
Then, after the imputation, just use i1:
df[-1][i1]
#[1] 4 65 8 2 6
Note: the -1 column index drops the first column, id, which is a factor rather than numeric.
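Putting the logical-mask approach together in one place: record the NA mask before imputing, impute, then index with the mask. A minimal sketch using the df from the question; the column-mean imputation is only a hypothetical stand-in for whatever imputation method you actually use.

```r
id <- factor(rep(letters[1:2], each = 5))
A <- c(1, 2, NA, 67, 8, 9, 0, 6, 7, 9)
B <- c(5, 6, 31, 9, 8, 1, NA, 9, 7, 4)
C <- c(2, 3, 5, NA, NA, 2, 7, 6, 4, 6)
D <- c(6, 5, 89, 3, 2, 9, NA, 12, 69, 8)
df <- data.frame(id, A, B, C, D)

# 1. Remember where the NAs are (numeric columns only; id is a factor)
i1 <- is.na(df[-1])

# 2. Impute. Hypothetical stand-in: replace each NA by its column mean
df[-1] <- lapply(df[-1], function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# 3. Read back exactly the cells that were imputed (column-major: A, B, C, C, D)
imputed <- df[-1][i1]
imputed
```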

How to enumerate all combinations in a matrix in R?

I am trying to construct a matrix that includes all the possible combinations. For example,
a=(1:2)^3 #=c(1,8)
b=(1:3)^2 #=c(1,4,9)
And I would like to define c such that c=c(1+1,1+4,1+9,8+1,8+4,8+9). I learned from my previous question how to get such a c with the function outer. My current question is: how can I get a matrix M as follows:
Thanks in advance!
OK, here it is:
z <- outer(b, a, "+")
cbind(a[col(z)], b[row(z)], c(z))
# [,1] [,2] [,3]
#[1,] 1 1 2
#[2,] 1 4 5
#[3,] 1 9 10
#[4,] 8 1 9
#[5,] 8 4 12
#[6,] 8 9 17
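The matrix above has no column names. If you want the columns labelled, name the arguments to cbind(); the labels a, b and c below are only a choice for readability.

```r
a <- (1:2)^3   # c(1, 8)
b <- (1:3)^2   # c(1, 4, 9)

z <- outer(b, a, "+")   # z[i, j] = b[i] + a[j]
# row(z) and col(z) give the b- and a-index of every cell in the same
# column-major order as c(z), so the three columns line up
M <- cbind(a = a[col(z)], b = b[row(z)], c = c(z))
M
```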
A slightly adapted expand.grid solution.
ref <- expand.grid(b = b, a = a)
val <- do.call("+", ref) ## or `rowSums(ref)` with an implicit `as.matrix`
cbind(ref, c = val)
# b a c
#1 1 1 2
#2 4 1 5
#3 9 1 10
#4 1 8 9
#5 4 8 12
#6 9 8 17
In this case the result is a data frame rather than a matrix.
We can use expand.grid with outer:
data.frame(expand.grid(a, b), c = c(outer(a, b, "+")))
# Var1 Var2 c
#1 1 1 2
#2 8 1 9
#3 1 4 5
#4 8 4 12
#5 1 9 10
#6 8 9 17
where
outer(a, b, "+") #gives
# [,1] [,2] [,3]
#[1,] 2 5 10
#[2,] 9 12 17
Another option is CJ from data.table:
library(data.table)
CJ(a, b)[, C := V1 + V2][]
#    V1 V2 C
#1: 1 1 2
#2: 1 4 5
#3: 1 9 10
#4: 8 1 9
#5: 8 4 12
#6: 8 9 17

For loop in matrix or similar structure for solving large matrix [duplicate]

This question already has answers here:
R Sum every k columns in matrix
(5 answers)
Closed 4 years ago.
Can we use a for loop or something similar to solve the following matrix problem?
Matrix A (given 6 x 16)
a 1 5 6 9 5 8 5 6 7 9 4 6 2 5 4 6
b 8 6 2 4 7 9 2 3 4 8 6 2 1 6 8 2
c 9 5 1 7 5 3 7 5 3 9 5 1 2 6 9 3
d 2 5 6 3 4 1 8 4 2 6 9 5 1 3 7 1
e 7 4 2 3 6 5 7 4 1 2 3 6 9 8 5 2
f 1 5 3 7 8 9 4 6 3 1 5 2 8 9 5 4
Output (6 x 4)
a 1+5+6+9 5+8+5+6 7+9+4+6 2+5+4+6
b 8+6+2+4 7+9+2+3 4+8+6+2 1+6+8+2
c 9+5+1+7 5+3+7+5 3+9+5+1 2+6+9+3
d 2+5+6+3 4+1+8+4 2+6+9+5 1+3+7+1
e 7+4+2+3 6+5+7+4 1+2+3+6 9+8+5+2
f 1+5+3+7 8+9+4+6 3+1+5+2 8+9+5+4
I have a large matrix of 4519 x 4519, so I am looking for a for loop.
matb <- matrix(data = 0, nrow =6 ,ncol = 6)
for (a in 1: nrow (data)) {
for (b in 1:seq (1,5,by=2)) {
c <- b+1
matb [a,1:3] <- rbind (sum(data[a,b:c]))
}
}
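I tried using the syntax above, but it did not work, so I am looking for help with a for loop or a function that solves this problem.
We can use recycling of a logical column index to select alternating columns and then add them. Note that this sums pairs of adjacent columns, as the loop in the question attempts, not the blocks of 4 shown in the example output: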
# example matrix
m <- matrix(1:12, ncol = 4)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
m[, c(TRUE, FALSE)] + m[, c(FALSE, TRUE)]
# [,1] [,2]
# [1,] 5 17
# [2,] 7 19
# [3,] 9 21
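For the 6 x 16 example in the question, where each output column is the sum of 4 consecutive input columns, a general sketch (not from the original answer) reuses rowsum() on the transpose with one group label per block of k columns. The function name sum_every_k_cols is hypothetical. There is no explicit R loop, so it should also be practical for a large matrix such as 4519 x 4519; if ncol is not a multiple of k, the last group simply has fewer columns.

```r
# Hypothetical helper: sum every k consecutive columns of a numeric matrix
sum_every_k_cols <- function(m, k) {
  grp <- ceiling(seq_len(ncol(m)) / k)  # 1,1,1,1,2,2,2,2,... for k = 4
  out <- t(rowsum(t(m), grp))           # rowsum() adds up rows of t(m) sharing a label
  colnames(out) <- NULL
  out
}

# Matrix A from the question (6 x 16), entered row by row
A <- matrix(c(1, 5, 6, 9, 5, 8, 5, 6, 7, 9, 4, 6, 2, 5, 4, 6,
              8, 6, 2, 4, 7, 9, 2, 3, 4, 8, 6, 2, 1, 6, 8, 2,
              9, 5, 1, 7, 5, 3, 7, 5, 3, 9, 5, 1, 2, 6, 9, 3,
              2, 5, 6, 3, 4, 1, 8, 4, 2, 6, 9, 5, 1, 3, 7, 1,
              7, 4, 2, 3, 6, 5, 7, 4, 1, 2, 3, 6, 9, 8, 5, 2,
              1, 5, 3, 7, 8, 9, 4, 6, 3, 1, 5, 2, 8, 9, 5, 4),
            nrow = 6, byrow = TRUE, dimnames = list(letters[1:6], NULL))

res <- sum_every_k_cols(A, 4)   # 6 x 4
res
```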

Rearranging the columns of a data frame [duplicate]

This question already has answers here:
Splitting triplicates into duplicates
(3 answers)
Closed 8 years ago.
Given a data frame, I'd like to rearrange it and return another data frame with 2 columns. Each row of the new data frame is a pair of elements taken from the same row of the original data frame, so the new data frame has choose(ncol, 2) * nrow rows. Here's an example: given the data frame z, I'd like to return x. How can I do this?
> z = data.frame(A = c(1,2,3), B = c(4,5,6), C = c(7,8,9))
> z
A B C
1 1 4 7
2 2 5 8
3 3 6 9
> x
A B
1 1 4
2 1 7
3 4 7
4 2 5
5 2 8
6 5 8
7 3 6
8 3 9
9 6 9
Or, you could try:
matrix(apply(z, 1, combn,2), ncol=2, byrow=TRUE)
# [,1] [,2]
#[1,] 1 4
#[2,] 1 7
#[3,] 4 7
#[4,] 2 5
#[5,] 2 8
#[6,] 5 8
#[7,] 3 6
#[8,] 3 9
#[9,] 6 9
To get a data frame as output:
setNames(as.data.frame(matrix(apply(z, 1, combn,2), ncol=2, byrow=TRUE)), LETTERS[1:2])
Something like this would work:
newz <- setNames(do.call(rbind.data.frame, lapply(split(z, 1:nrow(z)), function(x)
t(combn(x,2)))),
c("A","B"))
newz
# A B
# 1.1 1 4
# 1.2 1 7
# 1.3 4 7
# 2.1 2 5
# 2.2 2 8
# 2.3 5 8
# 3.1 3 6
# 3.2 3 9
# 3.3 6 9
This generates the new rows from all combinations of the columns via combn(). If you don't want the default row names, you can remove them with
rownames(newz)<-NULL
newz
# A B
# 1 1 4
# 2 1 7
# 3 4 7
# 4 2 5
# 5 2 8
# 6 5 8
# 7 3 6
# 8 3 9
# 9 6 9
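Both answers apply combn() to every row. A vectorized alternative, sketched here rather than taken from the answers, computes the column pairs once with combn(ncol(z), 2) and then pulls all values out of as.matrix(z) in one step using two-column (row, column) matrix indexing. Rows of z vary slowest, which gives the same order as x in the question.

```r
z <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6), C = c(7, 8, 9))

m <- as.matrix(z)
pairs <- combn(ncol(m), 2)   # 2 x choose(ncol, 2) matrix of column-index pairs

# One entry per (row of z, pair of columns); rows of z vary slowest
idx_row <- rep(seq_len(nrow(m)), each = ncol(pairs))
idx_p   <- rep(seq_len(ncol(pairs)), times = nrow(m))

# m[cbind(i, j)] picks element m[i, j] for every (i, j) pair at once
x <- data.frame(A = m[cbind(idx_row, pairs[1, idx_p])],
                B = m[cbind(idx_row, pairs[2, idx_p])])
x
```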

How to change the way split returns values in R?

I'm working on a project where I want to take a matrix, split it by the values of w and x, and then find the maximum value of y within each of those splits.
Here's an example matrix
>rah = cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)
>rah
w x y z
[1,] 1 1 12 1
[2,] 2 2 11 2
[3,] 3 3 10 3
[4,] 4 1 9 4
[5,] 5 2 8 5
[6,] 6 3 7 6
[7,] 1 1 6 7
[8,] 2 2 5 8
[9,] 3 3 4 9
[10,] 4 1 3 10
[11,] 5 2 2 11
[12,] 6 3 1 12
So I run split
> doh = split(rah, list(rah[,1], rah[,2]))
> doh
$`1.1`
[1] 1 1 1 1 12 6 1 7
$`2.1`
integer(0)
$`3.1`
integer(0)
$`4.1`
[1] 4 4 1 1 9 3 4 10
$`5.1`
integer(0)
$`6.1`
integer(0)
$`1.2`
integer(0)
$`2.2`
[1] 2 2 2 2 11 5 2 8
$`3.2`
integer(0)
$`4.2`
integer(0)
$`5.2`
[1] 5 5 2 2 8 2 5 11
...
So I'm a bit confused about how to take the output of split and use it to group the rows that share the same combination of w and x values (such as row 1 and row 7), and then compare them to find the one with the highest y value.
EDIT: The answers so far are informative, but I just realized I forgot to mention one very important part: I want to keep the whole row (w, x, y, z).
Use aggregate instead:
> aggregate(y ~ w + x, max, data=rah)
w x y
1 1 1 12
2 4 1 9
3 2 2 11
4 5 2 8
5 3 3 10
6 6 3 7
If you want to use split, try
> split_rah <- split(rah[,"y"], list(rah[, "w"], rah[, "x"]))
> ind <- sapply(split_rah, function(x) length(x)>0)
> sapply(split_rah[ind], max)
1.1 4.1 2.2 5.2 3.3 6.3
12 9 11 8 10 7
Just for the record, summaryBy() from the doBy package works in the same fashion as aggregate:
> library(doBy)
> summaryBy(y ~ w + x, FUN=max, data=as.data.frame(rah))
w x y.max
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
data.table solution:
> library(data.table)
> dt <- data.table(rah)
> dt[, max(y), by=list(w, x)]
w x V1
1: 1 1 12
2: 2 2 11
3: 3 3 10
4: 4 1 9
5: 5 2 8
6: 6 3 7
Or with tapply in base R, which returns a w-by-x table with NA for combinations that do not occur:
> tapply(rah[,"y"], list( rah[,"w"], rah[,"x"]), max)
1 2 3
1 12 NA NA
2 NA 11 NA
3 NA NA 10
4 9 NA NA
5 NA 8 NA
6 NA NA 7
Another option uses the plyr package:
library(plyr)
ddply(as.data.frame(rah),.(w,x),summarize,z=max(y))
w x z
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
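The EDIT asks to keep the whole row (w, x, y, z), which the summaries above drop. A base R sketch, not one of the posted answers: split the row indices (rather than the matrix itself) by w and x, use drop = TRUE to skip empty combinations instead of filtering them afterwards, and keep the row with the largest y in each group. which.max() returns the first maximum, so ties keep the earliest row.

```r
rah <- cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)

# One integer vector of row numbers per (w, x) combination that actually occurs
groups <- split(seq_len(nrow(rah)), list(rah[, "w"], rah[, "x"]), drop = TRUE)

# For each group, keep the complete row whose y is largest
best <- do.call(rbind, lapply(groups, function(i) rah[i[which.max(rah[i, "y"])], ]))
best
```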
