find unique elements in matrix based on a subset of columns - r

I have a table which I want to transform:
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 4 9 1 2
[5,] 2 3 5 1 2
[6,] 2 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
I want to filter the data so that rows which differ only by the number in the first column are removed (not completely; only the duplicates). So for rows 1 and 4, only row 1 should remain in the table, and for rows 3 and 6, only row 3 should remain. It is important that the information in the first column is retained and that the earliest occurrence of each row stays in the table, not the later ones.

You can use duplicated:
mat[!duplicated(as.data.frame(mat[, -1])), ]
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
where mat is the name of your matrix.
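As a minimal sketch of why this keeps the earliest row: duplicated() flags a row only when an identical row has already appeared, so negating it retains first occurrences. The matrix is rebuilt from the question's table; the names key_cols, is_dup and res are illustrative only.

```r
# Rebuild the question's matrix (values copied from the table in the question).
mat <- matrix(c(1,4,9,1,2,  1,3,5,1,2,  1,1,6,1,2,
                2,4,9,1,2,  2,3,5,1,2,  2,1,6,1,2,
                2,7,2,2,2,  3,3,5,3,4,  3,1,6,3,4,
                3,7,2,3,5,  3,4,9,3,5),
              ncol = 5, byrow = TRUE,
              dimnames = list(NULL, c("t", "LabelA", "LabelB", "start", "stop")))

# Columns that define a duplicate: everything except the first column, t.
key_cols <- c("LabelA", "LabelB", "start", "stop")

# TRUE for rows whose key columns repeat an earlier row.
is_dup <- duplicated(mat[, key_cols])
which(is_dup)

# Keep first occurrences; drop = FALSE keeps a matrix even if one row remains.
res <- mat[!is_dup, , drop = FALSE]
res
```

Selecting the key columns by name rather than with -1 is a design choice that keeps working if the column order changes.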

Try using the duplicated function:
mymx <- matrix(c(1,4,9,1,2 ,1,3,5,1,2 ,1,1,6,1,2 ,2,4,9,1,2 ,2,3,5,1,2 ,2,1,6,1,2 ,2,7,2,2,2 ,3,3,5,3,4 ,3,1,6,3,4 ,3,7,2,3,5 ,3,4,9,3,5), ncol=5, byrow=TRUE)
mymx[!duplicated(mymx[,-1]),]
> mymx[!duplicated(mymx[,-1]),]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 7 2 2 2
[5,] 3 3 5 3 4
[6,] 3 1 6 3 4
[7,] 3 7 2 3 5
[8,] 3 4 9 3 5

Related

manipulation of list of matrices in R

I have a list of matrices, generated with the code below
a<-c(0,5,0,1,5,1,5,4,6,7)
b<-c(3,1,0,2,4,2,5,5,7,8)
c<-c(5,9,0,1,3,2,5,6,2,7)
d<-c(6,5,0,1,3,4,5,6,7,1)
k<-data.frame(a,b,c,d)
k<-as.matrix(k)
#dimnames(k)<-list(cntry,cntry)
e<-c(0,5,2,2,1,2,3,6,9,2)
f<-c(2,0,4,1,1,3,4,5,1,4)
g<-c(3,3,0,2,0,9,3,2,1,9)
h<-c(6,1,1,1,5,7,8,8,0,2)
l<-data.frame(e,f,g,h)
l<-as.matrix(l)
#dimnames(l)<-list(cntry,cntry)
list<-list(k,l)
names(list)<-2010:2011
list
$`2010`
a b c d
[1,] 0 3 5 6
[2,] 5 1 9 5
[3,] 0 3 2 2
[4,] 1 2 1 1
[5,] 5 4 3 3
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[9,] 6 7 2 7
[10,] 7 8 7 1
$`2011`
e f g h
[1,] 0 2 3 6
[2,] 5 0 3 1
[3,] 2 4 0 1
[4,] 2 1 2 1
[5,] 1 1 0 5
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[9,] 9 1 1 0
[10,] 2 4 9 2
In each matrix I would like to delete the rows that contain a value smaller than 1. But when I delete the first row in matrix "2010" (because it contains a value < 1), the first row of every other matrix, here "2011", should be deleted too. The third row also contains a value < 1, so the third row should be deleted from all matrices as well, and so on.
The result should look like:
$`2010`
a b c d
[4,] 1 2 1 1
[6,] 1 2 2 4
[7,] 5 5 5 5
[8,] 4 5 6 6
[10,] 7 8 7 1
$`2011`
e f g h
[4,] 2 1 2 1
[6,] 2 3 9 7
[7,] 3 4 3 8
[8,] 6 5 2 8
[10,] 2 4 9 2
We can use rowSums
lapply(list, function(x) x[!rowSums(x <1),])
If we need to remove the same rows from every matrix, i.e. keep only the rows that pass the test in all of them:
ind <- Reduce(`&`, lapply(list, function(x) !rowSums(x < 1)))
lapply(list, function(x) x[ind,])
#$`2010`
# a b c d
#[1,] 1 2 1 1
#[2,] 1 2 2 4
#[3,] 5 5 5 5
#[4,] 4 5 6 6
#[5,] 7 8 7 1
#$`2011`
# e f g h
#[1,] 2 1 2 1
#[2,] 2 3 9 7
#[3,] 3 4 3 8
#[4,] 6 5 2 8
#[5,] 2 4 9 2
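To make the Reduce step easier to follow, here is a sketch that builds the per-matrix logical vectors first and then combines them. The data are the question's; the list is called mats here (an illustrative name) to avoid masking base::list.

```r
k <- cbind(a = c(0,5,0,1,5,1,5,4,6,7), b = c(3,1,0,2,4,2,5,5,7,8),
           c = c(5,9,0,1,3,2,5,6,2,7), d = c(6,5,0,1,3,4,5,6,7,1))
l <- cbind(e = c(0,5,2,2,1,2,3,6,9,2), f = c(2,0,4,1,1,3,4,5,1,4),
           g = c(3,3,0,2,0,9,3,2,1,9), h = c(6,1,1,1,5,7,8,8,0,2))
mats <- list(`2010` = k, `2011` = l)

# One logical vector per matrix: TRUE where no element in the row is below 1.
keep_each <- lapply(mats, function(x) rowSums(x < 1) == 0)

# A row survives only if it is TRUE in every matrix.
keep_all <- Reduce(`&`, keep_each)
which(keep_all)

# Apply the shared index to each matrix.
res <- lapply(mats, function(x) x[keep_all, , drop = FALSE])
res
```

Writing rowSums(x < 1) == 0 instead of !rowSums(x < 1) does the same thing but states the condition explicitly.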
Update
Based on the OP's comments about removing rows in which any value is greater than the standard deviation of each column:
lapply(list, function(x) {
for(i in seq_len(ncol(x))) x <- x[!rowSums(x > sd(x[,i])),]
x
})
# get the union of the row indices that have at least one element less than 1
removed <- Reduce(union, lapply(list, function(x) which(rowSums(x < 1) != 0)))
lapply(list, function(x) x[-removed, ])
$`2010`
a b c d
[1,] 1 2 1 1
[2,] 1 2 2 4
[3,] 5 5 5 5
[4,] 4 5 6 6
[5,] 7 8 7 1
$`2011`
e f g h
[1,] 2 1 2 1
[2,] 2 3 9 7
[3,] 3 4 3 8
[4,] 6 5 2 8
[5,] 2 4 9 2
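One caveat with negative indexing, sketched here under the assumption that some input might contain no values below 1: when removed is empty, x[-removed, ] selects zero rows rather than all of them. A logical mask avoids that. The helper name drop_rows and the toy list m are hypothetical.

```r
# Hypothetical helper: drop the given row numbers from every matrix in a list.
drop_rows <- function(mats, removed) {
  lapply(mats, function(x) {
    keep <- !(seq_len(nrow(x)) %in% removed)  # logical mask, safe when removed is empty
    x[keep, , drop = FALSE]
  })
}

# Toy input in which no value is below 1, so nothing should be removed.
m <- list(p = matrix(1:6, nrow = 3), q = matrix(7:12, nrow = 3))
removed <- Reduce(union, lapply(m, function(x) which(rowSums(x < 1) != 0)))

length(removed)                 # 0: nothing to remove
nrow(m$p[-removed, ])           # 0: an empty negative index drops every row
nrow(drop_rows(m, removed)$p)   # 3: the mask keeps all rows
```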

I would like to fill a matrix with values of a list

I have a list of 6 with 10 values in each. I would like to fill a 10x6 (10 rows, 6 columns) matrix with these values. I've tried some things but it's not working. I'm sure there must be an easy way to do it, but I haven't found it yet. Could anyone please help?
Here is some example data:
l = lapply(1:6, rep, 10)
Then use do.call (see ?do.call) and cbind to bind the list elements together as columns:
do.call(cbind, l)
and you get a matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 1 2 3 4 5 6
[3,] 1 2 3 4 5 6
[4,] 1 2 3 4 5 6
[5,] 1 2 3 4 5 6
[6,] 1 2 3 4 5 6
[7,] 1 2 3 4 5 6
[8,] 1 2 3 4 5 6
[9,] 1 2 3 4 5 6
[10,] 1 2 3 4 5 6
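For comparison, a short sketch of two other base R routes that should give the same layout when every list element has the same length. The names m1, m2 and m3 are illustrative only.

```r
l <- lapply(1:6, rep, 10)   # six vectors of length 10, as in the example

m1 <- do.call(cbind, l)                     # bind each list element as a column
m2 <- matrix(unlist(l), ncol = length(l))   # column-major fill gives the same layout
m3 <- sapply(l, identity)                   # sapply simplifies equal-length results to columns

dim(m1)
all(m1 == m2)
all(m1 == m3)
```

do.call(cbind, l) is usually the clearest of the three because it states the intent directly: each element becomes one column.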

Creating a matrix from multiple column vectors

How can I create a matrix from multiple column vectors?
I know that I can easily create a data frame with column vectors:
> colA <- 1:5
> colB <- 21:25
> colC <- 31:35
> data.frame(colA, colB, colC)
colA colB colC
1 1 21 31
2 2 22 32
3 3 23 33
4 4 24 34
5 5 25 35
However, when I try matrix(), it gives me unexpected results, as shown below. How can I create my desired matrix? I know I can do as.matrix(df), which nicely preserves the column names, but I'm looking for a more direct approach.
> matrix(colA, colB, colC)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[2,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[3,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[4,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[5,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[6,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[7,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[8,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[9,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[10,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[11,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[12,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[13,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[14,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[15,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[16,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[17,] 2 3 4 5 1 2 3 4 5 1 2 3 4
[18,] 3 4 5 1 2 3 4 5 1 2 3 4 5
[19,] 4 5 1 2 3 4 5 1 2 3 4 5 1
[20,] 5 1 2 3 4 5 1 2 3 4 5 1 2
[21,] 1 2 3 4 5 1 2 3 4 5 1 2 3
[,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
[1,] 4 5 1 2 3 4 5 1 2 3 4 5
[2,] 5 1 2 3 4 5 1 2 3 4 5 1
[3,] 1 2 3 4 5 1 2 3 4 5 1 2
[4,] 2 3 4 5 1 2 3 4 5 1 2 3
[5,] 3 4 5 1 2 3 4 5 1 2 3 4
[6,] 4 5 1 2 3 4 5 1 2 3 4 5
[7,] 5 1 2 3 4 5 1 2 3 4 5 1
[8,] 1 2 3 4 5 1 2 3 4 5 1 2
[9,] 2 3 4 5 1 2 3 4 5 1 2 3
[10,] 3 4 5 1 2 3 4 5 1 2 3 4
[11,] 4 5 1 2 3 4 5 1 2 3 4 5
[12,] 5 1 2 3 4 5 1 2 3 4 5 1
[13,] 1 2 3 4 5 1 2 3 4 5 1 2
[14,] 2 3 4 5 1 2 3 4 5 1 2 3
[15,] 3 4 5 1 2 3 4 5 1 2 3 4
[16,] 4 5 1 2 3 4 5 1 2 3 4 5
[17,] 5 1 2 3 4 5 1 2 3 4 5 1
[18,] 1 2 3 4 5 1 2 3 4 5 1 2
[19,] 2 3 4 5 1 2 3 4 5 1 2 3
[20,] 3 4 5 1 2 3 4 5 1 2 3 4
[21,] 4 5 1 2 3 4 5 1 2 3 4 5
[,26] [,27] [,28] [,29] [,30] [,31]
[1,] 1 2 3 4 5 1
[2,] 2 3 4 5 1 2
[3,] 3 4 5 1 2 3
[4,] 4 5 1 2 3 4
[5,] 5 1 2 3 4 5
[6,] 1 2 3 4 5 1
[7,] 2 3 4 5 1 2
[8,] 3 4 5 1 2 3
[9,] 4 5 1 2 3 4
[10,] 5 1 2 3 4 5
[11,] 1 2 3 4 5 1
[12,] 2 3 4 5 1 2
[13,] 3 4 5 1 2 3
[14,] 4 5 1 2 3 4
[15,] 5 1 2 3 4 5
[16,] 1 2 3 4 5 1
[17,] 2 3 4 5 1 2
[18,] 3 4 5 1 2 3
[19,] 4 5 1 2 3 4
[20,] 5 1 2 3 4 5
[21,] 1 2 3 4 5 1
Warning message:
In matrix(colA, colB, colC) :
data length [5] is not a sub-multiple or multiple of the number of rows [21]
You can use cbind to produce the desired matrix:
mat <- cbind(colA, colB, colC)
mat
# colA colB colC
# [1,] 1 21 31
# [2,] 2 22 32
# [3,] 3 23 33
# [4,] 4 24 34
# [5,] 5 25 35
class(mat)
# [1] "matrix" "array"   (R >= 4.0.0; earlier versions print only "matrix")
You don't get the matrix you're expecting with the call of matrix(colA, colB, colC), because your arguments are getting interpreted as the first, second, and third arguments to the matrix function (aka data, nrow, and ncol). If you wanted to use the matrix function, you would need to provide your data as a single argument, with something like mat <- matrix(c(colA, colB, colC), ncol=3). If you used this syntax, you would not get the column names from the variables like we did with cbind.
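If the column names matter when going through matrix(), one option is to pass them via the dimnames argument. This is a small sketch; mat2 is an illustrative name.

```r
colA <- 1:5
colB <- 21:25
colC <- 31:35

# Concatenate the vectors, fill column by column, and name the columns explicitly.
mat2 <- matrix(c(colA, colB, colC), ncol = 3,
               dimnames = list(NULL, c("colA", "colB", "colC")))
mat2

# Compare with the cbind() result.
all(mat2 == cbind(colA, colB, colC))
```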

matrix flow: to the right instead of downwards?

I'm not sure what you call this, but the default 'flow' of matrices is downwards (as seen below)
matrix(1,7,5)*(1:7)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
What if you intend to multiply by the vector across the columns (to the right) instead of down the rows? Is there a better way to write the command below? Is there a toggle for working by column instead of by row? The same question applies to replicate(7,1:7), which also assumes downward flow. Is transposing the solution?
t(t(matrix(1,7,5))*(1:5))
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
If you really want to do this a lot after defining the matrix you can always make an operator yourself:
'%mat%'<- function(x,y)t(t(x)*y)
matrix(1,7,5)%mat%1:5
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 1 2 3 4 5
[3,] 1 2 3 4 5
[4,] 1 2 3 4 5
[5,] 1 2 3 4 5
[6,] 1 2 3 4 5
[7,] 1 2 3 4 5
But I think it is easier to just transpose twice, as you said in the question:
t(t(matrix(1,7,5))*1:5)
Or of course opt to transpose the matrix once in the beginning, do everything you need to do with it and then transpose it back.
As far as I know there is no way to change the default behaviour of *, nor would you probably want to.
A matrix is simply a vector with a dim attribute. The elements of the matrix are stored in the vector in column-major order and there is no way to change this. * is an element-by-element operator that recycles its arguments as necessary. You can see the recycling rule at work via:
> x <- matrix(1,7,5)
> x*1:5
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 2 4
[2,] 2 4 1 3 5
[3,] 3 5 2 4 1
[4,] 4 1 3 5 2
[5,] 5 2 4 1 3
[6,] 1 3 5 2 4
[7,] 2 4 1 3 5
You can see the multiplication is taking place by column and the vector (1:5) is being recycled to be the same length as the matrix. Rather than transposing, you could use the matrix function to re-size your matrix by row.
> matrix(x*1:5,nrow(x),ncol(x),byrow=TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 1 2 3 4 5
[3,] 1 2 3 4 5
[4,] 1 2 3 4 5
[5,] 1 2 3 4 5
[6,] 1 2 3 4 5
[7,] 1 2 3 4 5
I'm not sure that's the most efficient solution, but it's the best I can think of at the moment and it's slightly faster than using t twice.
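Another base R option, not mentioned in the answers so far, is sweep(), which applies a vector along a chosen margin. The sketch below also shows an equivalent that lines the multipliers up with the column-major storage by hand. The names by_col and by_col2 are illustrative.

```r
x <- matrix(1, 7, 5)

# MARGIN = 2 aligns the vector with the columns; `*` is applied element-wise.
by_col <- sweep(x, 2, 1:5, `*`)
by_col

# Same idea by hand: repeat each multiplier nrow(x) times so that it matches
# the column-major order in which x is stored.
by_col2 <- x * rep(1:5, each = nrow(x))

# Both agree with the double-transpose approach.
all(by_col == t(t(x) * 1:5))
all(by_col2 == by_col)
```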
Do you mean this?
> matrix(rep(1:7,5), nrow=7, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 3
[4,] 4 4 4 4 4
[5,] 5 5 5 5 5
[6,] 6 6 6 6 6
[7,] 7 7 7 7 7
> matrix(rep(1:7,5), nrow=5, ncol=7, byrow=TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 7
[2,] 1 2 3 4 5 6 7
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
[5,] 1 2 3 4 5 6 7

How do I make a matrix from a list of vectors in R?

Goal: from a list of vectors of equal length, create a matrix where each vector becomes a row.
Example:
> a <- list()
> for (i in 1:10) a[[i]] <- c(i,1:5)
> a
[[1]]
[1] 1 1 2 3 4 5
[[2]]
[1] 2 1 2 3 4 5
[[3]]
[1] 3 1 2 3 4 5
[[4]]
[1] 4 1 2 3 4 5
[[5]]
[1] 5 1 2 3 4 5
[[6]]
[1] 6 1 2 3 4 5
[[7]]
[1] 7 1 2 3 4 5
[[8]]
[1] 8 1 2 3 4 5
[[9]]
[1] 9 1 2 3 4 5
[[10]]
[1] 10 1 2 3 4 5
I want:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
One option is to use do.call():
> do.call(rbind, a)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
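As a brief sketch of what do.call is doing here: it calls rbind with each list element as a separate argument, so it is equivalent to writing the arguments out by hand. A shorter three-element list is used for illustration; m_call and m_explicit are illustrative names.

```r
a <- lapply(1:3, function(i) c(i, 1:5))   # shorter version of the question's list

m_call <- do.call(rbind, a)                   # every list element becomes an argument
m_explicit <- rbind(a[[1]], a[[2]], a[[3]])   # the same call written out

all(m_call == m_explicit)
dim(m_call)
```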
simplify2array is a base function that is fairly intuitive. However, since R's default is to fill in data by columns first, you will need to transpose the output. (sapply uses simplify2array, as documented in help(sapply).)
> t(simplify2array(a))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
The built-in matrix function has the nice option to enter data byrow. Combining that with unlist on your source list gives you a matrix. We also need to specify the number of rows so it can break up the unlisted data. That is:
> matrix(unlist(a), byrow=TRUE, nrow=length(a) )
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
Not straightforward, but it works:
> t(sapply(a, unlist))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
t(sapply(a, '[', 1:max(sapply(a, length))))
where 'a' is a list. This also works when the vectors have unequal lengths; shorter rows are padded with NA.
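A small sketch of the unequal-length case, using a made-up list b (not from the question): indexing a vector past its end returns NA, which is what pads the shorter rows.

```r
# Hypothetical list with vectors of different lengths.
b <- list(1:3, 1:5, 1:2)

n <- max(lengths(b))                       # the longest vector sets the column count
padded <- t(sapply(b, `[`, seq_len(n)))    # indexing past the end yields NA

padded
dim(padded)
```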
> library(plyr)
> as.matrix(ldply(a))
V1 V2 V3 V4 V5 V6
[1,] 1 1 2 3 4 5
[2,] 2 1 2 3 4 5
[3,] 3 1 2 3 4 5
[4,] 4 1 2 3 4 5
[5,] 5 1 2 3 4 5
[6,] 6 1 2 3 4 5
[7,] 7 1 2 3 4 5
[8,] 8 1 2 3 4 5
[9,] 9 1 2 3 4 5
[10,] 10 1 2 3 4 5
