I have a matrix
df<-matrix(data=c(3,7,5,0,1,0,0,0,0,8,0,9), ncol=2)
rownames(df)<-c("a","b","c","d","e","f")
[,1] [,2]
a 3 0
b 7 0
c 5 0
d 0 8
e 1 0
f 0 9
and I would like to order the matrix in descending order first by column 1 and then by column two resulting in the matrix
df.ordered<-matrix(data=c(7,5,3,1,0,0,0,0,0,0,9,8),ncol=2)
rownames(df.ordered)<-c("b","c","a","e","f","d")
[,1] [,2]
b 7 0
c 5 0
a 3 0
e 1 0
f 0 9
d 0 8
Any suggestions on how I could achieve this? Thanks.
The order function should do it.
df[order(df[,1],df[,2],decreasing=TRUE),]
To complete the main answer, here is a way to do it programmatically, without having to specify the columns by hand:
set.seed(2013) # preparing my example
mat <- matrix(sample.int(10,size = 30, replace = T), ncol = 3)
mat
[,1] [,2] [,3]
[1,] 5 1 6
[2,] 10 3 1
[3,] 8 8 1
[4,] 8 9 9
[5,] 3 7 3
[6,] 8 8 5
[7,] 10 10 2
[8,] 8 10 7
[9,] 10 1 9
[10,] 9 4 5
As a simple example, let say I want to use all the columns in their order of appearance to sort the rows of the matrix: (One could easily give a vector of indexes to the matrix)
mat[do.call(order, as.data.frame(mat)),] #could be ..as.data.frame(mat[,index_vec])..
[,1] [,2] [,3]
[1,] 3 7 3
[2,] 5 1 6
[3,] 8 8 1
[4,] 8 8 5
[5,] 8 9 9
[6,] 8 10 7
[7,] 9 4 5
[8,] 10 1 9
[9,] 10 3 1
[10,] 10 10 2
order function will help you out, try this:
df[order(-df[,1],-df[,2]),]
[,1] [,2]
b 7 0
c 5 0
a 3 0
e 1 0
f 0 9
d 0 8
The minus before df indicates that the order is decreasing. You will get the same result setting decreasing=TRUE.
df[order(df[,1],df[,2],decreasing=TRUE),]
Related
I would like to write a function that takes columns and rows of matrices as arguments and gives a matrix as an output.
For example, a function that takes rows i of an m by k matrix A and columns j of a k by n matrix B, and return a matrix M with elements m_i,j that equals to min(A[i,] * B[,j]) (element-wise multiplication):
Is there any simple way to avoid using loops? Does an sapply equivalent for matrices exists?
> matrix_A
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 6
[3,] 3 4 5 6 7
[4,] 0 1 2 3 4
[5,] 5 6 7 8 9
> matrix_B
[,1] [,2] [,3] [,4] [,5]
[1,] 7 6 5 4 3
[2,] 6 5 4 3 2
[3,] 1 2 3 4 5
[4,] 8 7 6 5 4
[5,] 9 8 7 6 5
>
> output_matrix <- matrix(, nrow=nrow(matrix_A), ncol=ncol(matrix_B))
> for (row_i in 1:nrow(matrix_A)) {
+ for (col_j in 1:ncol(matrix_B)) {
+ output_matrix[row_i, col_j] <- min(matrix_A[row_i,]*matrix_B[,col_j])
+ }
+ }
> output_matrix
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
>
Using apply from base R,
apply(m2, 2, function(i) apply(m1, 1, function(j) min(j*i)))
which gives,
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
A fully vectorized solution can be,
t(matrix(do.call(pmin,
as.data.frame(
do.call(rbind, rep(split(m1, 1:nrow(m1)), each = 5)) * do.call(rbind, rep(split(t(m2), 1:nrow(m2)), 5)))),
nrow(m1)))
You can avoid R loops (*apply functions are loops too) for this specific example. Often an efficient solution is possible, but needs a specific algorithm as I demonstrate here. If you don't need to optimize speed, use loops. Your for loop offers the best readability and is easy to understand.
matrix_A <- matrix(c(1,2,3,0,5,
2,3,4,1,6,
3,4,5,2,7,
4,5,6,3,8,
5,6,7,4,9), 5)
matrix_B <- matrix(c(7,6,1,8,9,
6,5,2,7,8,
5,4,3,6,7,
4,3,4,5,6,
3,2,5,4,5), 5)
#all combinations of i and j
inds <- expand.grid(seq_len(nrow(matrix_A)), seq_len(ncol(matrix_B)))
#subset A and transposed B then multiply the resulting matrices
#then calculate rowwise min and turn result into a matrix
library(matrixStats)
matrix(rowMins(matrix_A[inds[[1]],] * t(matrix_B)[inds[[2]],]), nrow(matrix_A))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 3 6 5 4 3
#[2,] 4 8 10 8 6
#[3,] 5 10 15 12 8
#[4,] 0 0 0 0 0
#[5,] 7 14 21 18 12
We use expand.grid to create all possible combinations of row and col pairs. We then use mapply to multiply all the row-column combination element wise and then select the min from it.
mat <- expand.grid(1:nrow(A),1:nrow(B))
mapply(function(x, y) min(matrix_A[x,] * matrix_B[, y]) , mat[,1], mat[,2])
#[1] 3 4 5 0 7 6 8 10 0 14 5 10 15 0 21 4 8 12 0 18 3 6 8 0 12
Assuming matrix_A, matrix_B and output_matrix all have the same dimensions we can relist the output from mapply to get the original dimensions.
output_matrix <- mapply(function(x, y) min(matrix_A[x,] * matrix_B[, y]),
mat[,1], mat[,2])
relist(output_matrix, matrix_A)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 3 6 5 4 3
#[2,] 4 8 10 8 6
#[3,] 5 10 15 12 8
#[4,] 0 0 0 0 0
#[5,] 7 14 21 18 12
Here we use pmap to iterate over the rows and columns of A and B:
library(tidyverse)
pmap_dbl(expand.grid(1:nrow(A), 1:nrow(B)), ~ min(A[..1, ] * B[ , ..2])) %>%
matrix(nrow=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 3 6 5 4 3
[2,] 4 8 10 8 6
[3,] 5 10 15 12 8
[4,] 0 0 0 0 0
[5,] 7 14 21 18 12
I have a list of matrices, generated with the code below
a<-c(0,5,0,1,5,1,5,4,6,7)
b<-c(3,1,0,2,4,2,5,5,7,8)
c<-c(5,9,0,1,3,2,5,6,2,7)
d<-c(6,5,0,1,3,4,5,6,7,1)
k<-data.frame(a,b,c,d)
k<-as.matrix(k)
#dimnames(k)<-list(cntry,cntry)
e<-c(0,5,2,2,1,2,3,6,9,2)
f<-c(2,0,4,1,1,3,4,5,1,4)
g<-c(3,3,0,2,0,9,3,2,1,9)
h<-c(6,1,1,1,5,7,8,8,0,2)
l<-data.frame(e,f,g,h)
l<-as.matrix(l)
#dimnames(l)<-list(cntry,cntry)
list<-list(k,l)
names(list)<-2010:2011
list
list
$`2010`
a b c d
[1,] 0 3 5 6
[2,] 5 1 9 5
[3,] 0 3 2 2
[4,] 1 2 1 1
[5,] 5 4 3 3
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[9,] 6 7 2 7
[10,] 7 8 7 1
$`2011`
e f g h
[1,] 0 2 3 6
[2,] 5 0 3 1
[3,] 2 4 0 1
[4,] 2 1 2 1
[5,] 1 1 0 5
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[9,] 9 1 1 0
[10,] 2 4 9 2
In each matrix I would like to delete the rows that are smaller than 1. But when I delete in matrix "2010" the first row (because <1), all other first rows in 2010 and 2011 should be deleted. Then the third row of first column is <1, then all other third columns should be deleted and so on...
The result should look like:
a b c d
[4,] 1 2 1 1
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[10,] 7 8 7 1
$`2011`
e f g h
[4,] 2 1 2 1
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[10,] 2 4 9 2
We can use rowSums
lapply(list, function(x) x[!rowSums(x <1),])
If we need to remove the rows that are common
ind <- Reduce(`&`, lapply(list, function(x) !rowSums(x < 1)))
lapply(list, function(x) x[ind,])
# a b c d
#[1,] 1 2 1 1
#[2,] 1 2 2 4
#[3,] 5 5 5 5
#[4,] 4 5 6 6
#[5,] 7 8 7 1
#$`2011`
# e f g h
#[1,] 2 1 2 1
#[2,] 2 3 9 7
#[3,] 3 4 3 8
#[4,] 6 5 2 8
#[5,] 2 4 9 2
Update
Based on the OP's comments about removing rows where the row is greater than the standard deviation of each columns,
lapply(list, function(x) {
for(i in seq_len(ncol(x))) x <- x[!rowSums(x > sd(x[,i])),]
x
})
# get union of the row index with at least one of the elements less 1
removed <- Reduce(union, lapply(list, function(x) which(rowSums(x < 1) != 0)))
lapply(list, function(x) x[-removed, ])
$`2010`
a b c d
[1,] 1 2 1 1
[2,] 1 2 2 4
[3,] 5 5 5 5
[4,] 4 5 6 6
[5,] 7 8 7 1
$`2011`
e f g h
[1,] 2 1 2 1
[2,] 2 3 9 7
[3,] 3 4 3 8
[4,] 6 5 2 8
[5,] 2 4 9 2
For example suppose I have matrix A
x y z f
1 1 2 A 1005
2 2 4 B 1002
3 3 2 B 1001
4 4 8 C 1001
5 5 10 D 1004
6 6 12 D 1004
7 7 11 E 1005
8 8 14 E 1003
From this matrix I want to find the repeated values like 1001, 1005, D, 2 (in third column) and I also want to find their index (which row, or which position).
I am new to R!
Obviously it is possible to do with simple searching element by element by using a for loop, but I want to know, is there any function available in R for this kind of problem.
Furthermore, I tried using duplicated and unique, both functions are giving me the duplicated row number or column number, they are also giving me how many of them were repeated, but I can not search for whole matrix using both of them!
You can write a rather simple function to get this information. Though note that this solution works with a matrix. It does not work with a data.frame. A similar function could be written for a data.frame using the fact that the data.frame data structure is a subset of a list.
# example data
set.seed(234)
m <- matrix(sample(1:10, size=100, replace=T), 10)
find_matches <- function(mat, value) {
nr <- nrow(mat)
val_match <- which(mat == value)
out <- matrix(NA, nrow= length(val_match), ncol= 2)
out[,2] <- floor(val_match / nr) + 1
out[,1] <- val_match %% nr
return(out)
}
R> m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 8 6 6 7 6 7 4 10 6 9
[2,] 8 6 6 3 10 4 5 4 6 9
[3,] 1 6 9 2 9 2 3 6 4 2
[4,] 8 6 7 8 3 9 9 4 9 2
[5,] 1 1 5 6 7 1 5 1 10 6
[6,] 7 5 4 7 8 2 4 4 7 10
[7,] 10 4 7 8 3 1 8 6 3 4
[8,] 8 8 2 2 7 5 6 4 10 4
[9,] 10 2 9 6 6 9 7 2 4 7
[10,] 3 9 9 4 2 7 7 2 9 6
R> find_matches(m, 8)
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 4 1
[4,] 8 1
[5,] 8 2
[6,] 4 4
[7,] 7 4
[8,] 6 5
[9,] 7 7
In this function, the row index is output in column 1 and the column index is output in column 2
I have a list of 6 with 10 values in each. I would like to fill a 10x6 (10 rows, 6 columns) matrix with these values. I've tried some things but it's not working. I'm sure there must be an easy way to do it, but I haven't found it yet. Could anyone please help?
Here some example data:
l = lapply(1:6, rep, 10)
then use ?do.call and cbind to paste the list elements as columns:
do.call(cbind, l)
and you get a matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 1 2 3 4 5 6
[3,] 1 2 3 4 5 6
[4,] 1 2 3 4 5 6
[5,] 1 2 3 4 5 6
[6,] 1 2 3 4 5 6
[7,] 1 2 3 4 5 6
[8,] 1 2 3 4 5 6
[9,] 1 2 3 4 5 6
[10,] 1 2 3 4 5 6
I have a sparse matrix represented as
> (f <- data.frame(row=c(1,2,3,1,2,1,2,3,4,1,1,2),value=1:12))
row value
1 1 1
2 2 2
3 3 3
4 1 4
5 2 5
6 1 6
7 2 7
8 3 8
9 4 9
10 1 10
11 1 11
12 2 12
Here the first column is always present (in fact, the first few are present, the rest are not).
I want to get the data into the matrix format:
> t(matrix(c(1,2,3,NA,4,5,NA,NA,6,7,8,9,10,NA,NA,NA,11,12,NA,NA),nrow=4,ncol=5))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 NA
[2,] 4 5 NA NA
[3,] 6 7 8 9
[4,] 10 NA NA NA
[5,] 11 12 NA NA
Here is what seems to be working:
> library(Matrix)
> as.matrix(sparseMatrix(i = cumsum(f[[1]] == 1), j=f[[1]], x=f[[2]]))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 0
[2,] 4 5 0 0
[3,] 6 7 8 9
[4,] 10 0 0 0
[5,] 11 12 0 0
Except that I have to replace 0 with NA myself.
Is there a better solution?
You can do everything with base functions. The trick is to use indexing by a 2-col (row and col indices) matrix:
j <- f$row
i <- cumsum(j == 1)
x <- f$value
m <- matrix(NA, max(i), max(j))
m[cbind(i, j)] <- x
m
Whether it is better or not than using the Matrix package is subjective. Overkill in my opinion if you are not doing anything else with it. Also if your data had 0 in the f$value column, they would end up being converted as NA if you are not too careful.