Extracting unique rows in a 3+ column matrix - r

Using R, I am trying to extract unique rows in a matrix, where two rows count as duplicates if they contain the same values, regardless of order.
For example if I had this data set:
x = matrix(c(1,1,1,2,2,5,1,2,2,1,2,1,5,3,5,2,1,1),6,3)
Rows 1 & 6, and rows 4 & 5 are duplicated since (1,1,5) = (5,1,1) and (2,1,2) = (2,2,1).
Ultimately, I'm trying to end up with something in the form of:
y = matrix(c(1,1,1,2,1,2,2,1,5,3,5,2),4,3)
or
z = matrix(c(1,1,2,5,2,2,2,1,3,5,1,1),4,3)
The order doesn't matter as long as only one row of each duplicated group remains. I've searched online, but functions such as unique() and duplicated() only work for exactly matching rows.
Thanks in advance for any help you provide.
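One base-R way to get this, sketched here as an assumption rather than a method taken from the thread: sort the values within each row so that rows holding the same values become identical, then let duplicated() find the repeats. The name `sorted_rows` is introduced only for this illustration.

```r
x <- matrix(c(1,1,1,2,2,5,1,2,2,1,2,1,5,3,5,2,1,1), 6, 3)

# Sort within each row; apply() returns the sorted rows as columns, so transpose back
sorted_rows <- t(apply(x, 1, sort))

# duplicated() on a matrix compares whole rows; keep the first row of each group
result <- x[!duplicated(sorted_rows), , drop = FALSE]
result
```

Because the first occurrence of each group is kept, this should return the rows of y for the example data.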

Another answer: use sets. Slightly modified matrix:
library(sets)
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1),7,3)
x
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
[5,] 2 2 1
[6,] 5 1 1
[7,] 5 5 1
If (5,1,1) = (5,5,1), that is, if repeated values within a row should be ignored, you can use ordinary sets:
a <- sapply(1:nrow(x), function(i) as.set(x[i,]))
x[!duplicated(a),]
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
Note: rows 6 and 7 are both gone.
If (5,1,1) != (5,5,1), that is, if the number of repeats matters, use generalized sets:
b <- sapply(1:nrow(x), function(i) as.gset(x[i,]))
x[!duplicated(b),]
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
[5,] 5 5 1
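The same distinction can be sketched without the sets package by building one string key per row. This is an illustrative alternative, not part of the answer above; `row_key` and its `ignore_repeats` argument are hypothetical names.

```r
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1), 7, 3)

# Build one string key per row; vapply() guarantees a character vector
row_key <- function(m, ignore_repeats) {
  vapply(seq_len(nrow(m)), function(i) {
    v <- sort(m[i, ])
    if (ignore_repeats) v <- unique(v)  # set semantics: drop repeated values
    paste(v, collapse = ",")
  }, character(1))
}

# Set semantics: (5,1,1) and (5,5,1) share the key "1,5"
as_sets <- x[!duplicated(row_key(x, ignore_repeats = TRUE)), , drop = FALSE]

# Multiset semantics: "1,1,5" and "1,5,5" are different keys
as_multisets <- x[!duplicated(row_key(x, ignore_repeats = FALSE)), , drop = FALSE]
```

String keys assume the values print unambiguously, which holds for small integers like these but may not for arbitrary doubles.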

Related

R: How to permute only column in a data frame/matrix

I want to create x randomised matrices in which only the columns are permuted while the rows are kept constant. I already took a look at permatfull() in the vegan package. Nevertheless, I was not able to generate the desired result, even though I am quite sure that this should be possible somehow.
df = matrix(c(2,3,1,4,5,1,3,6,2,4,1,3), ncol=3)
This is (one possible) desired result
[,1] [,2] [,3]
[1,] 2 5 2
[2,] 3 1 4
[3,] 1 3 1
[4,] 4 6 3
v
v permutation
v
[,1] [,2] [,3]
[1,] 5 2 2
[2,] 1 4 3
[3,] 3 1 1
[4,] 6 3 4
I tried something like permatfull(df, times=1, fixedmar = "rows", shuffle = "samp") which results in
[,1] [,2] [,3]
[1,] 5 2 2
[2,] 1 4 3
[3,] 3 1 1
[4,] 3 4 6
Now column 1 (originally column 2) has changed from 5,1,3,6 to 5,1,3,3.
Does anyone have an idea why I do not get the expected result?
Thanks in advance,
Christian
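If the goal is only to reorder whole columns, a minimal base-R sketch is to index the matrix with a random permutation of its column numbers. As far as I know, permatfull() is aimed at null models for community data, so this plain approach is offered as an assumption about what is wanted here. The names `n_perm` and `perms` are made up for the example.

```r
df <- matrix(c(2,3,1,4,5,1,3,6,2,4,1,3), ncol = 3)

set.seed(1)   # only to make the illustration reproducible
n_perm <- 5   # hypothetical number of randomised matrices

# sample(ncol(df)) is a random ordering of the column indices;
# every column moves as a unit, so the values inside it stay together
perms <- replicate(n_perm, df[, sample(ncol(df)), drop = FALSE], simplify = FALSE)
perms[[1]]
```

Each element of `perms` holds the same three columns as df, possibly in a different order.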

Restore matrix row and column names to defaults in R (e.g., [1,], [2,]...)

Can matrix row and column names be set to defaults (e.g., [1,], [2,]... [,1], [,2]...) in R?
For example, is there a quick way to transform a matrix like this
x1 <- matrix(1:9,nrow=3,ncol=3,dimnames=list(1:3,letters[1:3]))
> x1
a b c
1 1 4 7
2 2 5 8
3 3 6 9
into this
> x1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
You're looking for dimnames<-:
dimnames(x1) <- NULL
x1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
You can see the help file by typing ?dimnames. It is also linked from ?matrix.
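A short sketch of the two usual options: assigning NULL to the dimnames modifies the matrix in place, while unname() returns a stripped copy. The variable `x2` is introduced only for this example.

```r
x1 <- matrix(1:9, nrow = 3, ncol = 3, dimnames = list(1:3, letters[1:3]))

# unname() returns a copy without dimnames and leaves x1 itself untouched
x2 <- unname(x1)

# Assigning NULL strips the names from x1 in place
dimnames(x1) <- NULL

identical(x1, x2)
```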

Flattening 3-Dimensional data to elongated 2-D in R

I have data with dim 10,5,2 (t,x,y) and I want to convert it to dimensions 10*5,3, i.e. to stack the (x,y) frame of every t and append the t value as a third column.
e.g.:
data[1,,]=
x y
1 2
1 3
data[2,,]=
x y
5 2
1 6
I would like to flatten this data into an array like this:
x y t
1 2 1
1 3 1
5 2 2
1 6 2
I was wondering whether there is already an R function to do this, or whether I should loop over every t frame and append each rebuilt array to the bottom of the main array.
a <- array(1:8, c(2,2,2))
a[1,,]
# [,1] [,2]
#[1,] 1 5
#[2,] 3 7
a[2,,]
# [,1] [,2]
#[1,] 2 6
#[2,] 4 8
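# Swap the first two dimensions so the rows come out grouped by the first index (t)
m <- matrix(aperm(a, c(2, 1, 3)), nrow=prod(dim(a)[1:2]))
# Append the t index: each t value repeats once per row of its frame
cbind(m, rep(seq_len(dim(a)[1]), each=dim(a)[2]))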
# [,1] [,2] [,3]
#[1,] 1 5 1
#[2,] 3 7 1
#[3,] 2 6 2
#[4,] 4 8 2
Here's a different approach:
a <- array(c(1,5,1,1,2,2,3,6), dim = c(2,2,2) )
do.call('rbind',lapply(seq_len(dim(a)[1]), function(x) cbind(a[x,,], t = x)))
t
[1,] 1 2 1
[2,] 1 3 1
[3,] 5 2 2
[4,] 1 6 2
Also, if a is the array:
ft <- ftable(a)
cbind(ft[,1:2], as.numeric(factor(gsub("\\_.*","",row.names(as.matrix(ft))))))
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 3 1
[3,] 5 2 2
[4,] 1 6 2
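The toy arrays above all have dimensions 2,2,2, so it is worth checking the idea on the 10,5,2 shape from the question. The following is a sketch with made-up data; `n_t`, `n_p`, `xy` and `flat` are names chosen for the illustration.

```r
# Hypothetical data: 10 time frames, 5 rows per frame, 2 coordinates (x, y)
a <- array(seq_len(10 * 5 * 2), dim = c(10, 5, 2))
n_t <- dim(a)[1]
n_p <- dim(a)[2]

# Put the within-frame index first so that the rows of each frame stay together
xy <- matrix(aperm(a, c(2, 1, 3)), nrow = n_t * n_p)

# Append the frame number, repeated once per row of the frame
flat <- cbind(xy, rep(seq_len(n_t), each = n_p))
colnames(flat) <- c("x", "y", "t")

dim(flat)
head(flat)
```

Rows 1 to 5 of `flat` hold a[1,,], rows 6 to 10 hold a[2,,], and so on.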

R Sum complete cases of two columns

How can I sum the number of complete cases of two columns?
With c equal to:
a b
[1,] NA NA
[2,] 1 1
[3,] 1 1
[4,] NA 1
Applying something like
rollapply(c, 2, function(x) sum(complete.cases(x)),fill=NA)
I'd like to get back a single number, 2 in this case. This will be for a large data set with many columns, so I'd like to use rollapply across the whole set instead of simply doing sum(complete.cases(a,b)).
Am I overthinking it?
Thanks!
Did you try sum(complete.cases(x))?!
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 15 , TRUE ) , 5 )
# [,1] [,2] [,3]
#[1,] 1 NA 5
#[2,] 4 3 2
#[3,] 2 5 4
#[4,] 5 3 3
#[5,] 5 2 NA
sum(complete.cases(x))
#[1] 3
To find the complete.cases() of the first two columns:
sum(complete.cases(x[,1:2]))
#[1] 4
And to apply to two columns of a matrix across the whole matrix you could do this:
# Bigger data for example
set.seed(123)
x <- matrix( sample( c(NA,1:5) , 50 , TRUE ) , 5 )
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 1 NA 5 5 5 4 5 2 NA NA
#[2,] 4 3 2 1 4 3 5 4 2 1
#[3,] 2 5 4 NA 3 3 4 1 2 2
#[4,] 5 3 3 1 5 1 4 1 2 1
#[5,] 5 2 NA 5 3 NA NA 1 NA 5
# Column indices
id <- seq( 1 , ncol(x) , by = 2 )
#[1] 1 3 5 7 9
apply( cbind(id,id+1) , 1 , function(i) sum(complete.cases(x[,c(i)])) )
#[1] 4 3 4 4 3
complete.cases() works row-wise across the whole data.frame or matrix returning TRUE for those rows which are not missing any data. A minor aside, "c" is a bad variable name because c() is one of the most commonly used functions.
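To make the row-wise behaviour concrete, here is a small sketch on the two-column matrix from the question, stored under the hypothetical name `cc` instead of `c`:

```r
# The matrix from the question, renamed to avoid shadowing c()
cc <- cbind(a = c(NA, 1, 1, NA), b = c(NA, 1, 1, 1))

# One logical per row: TRUE only where no value in the row is missing
row_ok <- complete.cases(cc)
row_ok
#[1] FALSE  TRUE  TRUE FALSE

sum(row_ok)
#[1] 2
```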
You can calculate the number of complete cases in neighboring matrix columns using rollapply like this:
m <- matrix(c(NA,1,1,NA,1,1,1,1),ncol=4)
# [,1] [,2] [,3] [,4]
#[1,] NA 1 1 1
#[2,] 1 NA 1 1
library(zoo)
rowSums(rollapply(is.na(t(m)), 2, function(x) !any(x)))
#[1] 0 1 2
This should work for both matrix and data.frame:
> sum(apply(c, 1, function(x)all(!is.na(x))))
[1] 2
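and you could simply iterate through a large matrix M, storing one count per pair of neighbouring columns:
agreement <- numeric(ncol(M) - 1)
for (i in seq_len(ncol(M) - 1)) {
pair <- M[, c(i, i + 1)]
agreement[i] <- sum(apply(pair, 1, function(x) all(!is.na(x))))
}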
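For neighbouring column pairs the loop can also be avoided altogether. The following base-R sketch, an alternative that does not come from the answers above, compares the matrix of non-missing flags with a copy of itself shifted by one column; `ok` and `pair_counts` are illustrative names.

```r
m <- matrix(c(NA,1,1,NA,1,1,1,1), ncol = 4)

# TRUE where a value is present
ok <- !is.na(m)

# Column j of the left block lines up with column j + 1 of the right block,
# so the element-wise AND marks rows that are complete for the pair (j, j + 1)
pair_counts <- colSums(ok[, -ncol(ok), drop = FALSE] & ok[, -1, drop = FALSE])
pair_counts
#[1] 0 1 2
```

This matches the rollapply() result shown earlier for the same matrix.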

In R, using `unique()` with extra conditions to extract submatrices: easy solution without plyr

In R, let M be the matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 3
[3,] 2 4 5
[4,] 6 7 8
I would like to select the submatrix m
[,1] [,2] [,3]
[1,] 1 3 3
[2,] 2 4 5
[3,] 6 7 8
using unique on M[,1], while keeping, for each repeated value, the row with the maximal value in the second column of M.
In the end, the algorithm should keep row [2,] from the set {[1,], [2,]}. Unfortunately, unique() returns a vector of the actual values, not row numbers, after eliminating duplicates.
Is there a way to get the answer without the package plyr?
Thanks a lot,
Avitus
Here's how:
is.first.max <- function(x) seq_along(x) == which.max(x)
M[as.logical(ave(M[, 2], M[, 1], FUN = is.first.max)), ]
# [,1] [,2] [,3]
# [1,] 1 3 3
# [2,] 2 4 5
# [3,] 6 7 8
You're looking for duplicated.
m <- as.matrix(read.table(text="1 2 3
1 3 3
2 4 5
6 7 8"))
m <- m[order(m[,2], decreasing=TRUE), ]
m[!duplicated(m[,1]),]
# V1 V2 V3
# [1,] 6 7 8
# [2,] 2 4 5
# [3,] 1 3 3
Not the most efficient:
M <- matrix(c(1,1,2,6,2,3,4,7,3,3,5,8),4)
t(sapply(unique(M[,1]),function(i) {temp <- M[M[,1]==i,,drop=FALSE]
temp[which.max(temp[,2]),]
}))
# [,1] [,2] [,3]
#[1,] 1 3 3
#[2,] 2 4 5
#[3,] 6 7 8
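The ordering and duplicated() ideas can also be combined in a single order() call, so that the result stays sorted by the first column. This is a sketch of a variation, not one of the posted answers; `o` and `m_sub` are names made up for the example.

```r
M <- matrix(c(1,1,2,6,2,3,4,7,3,3,5,8), 4)

# Sort by column 1 ascending, then by column 2 descending within ties,
# so the first row of every group is the one with the largest second value
o <- order(M[, 1], -M[, 2])

# Keep the first row of each group, expressed as row numbers of M
m_sub <- M[o[!duplicated(M[o, 1])], , drop = FALSE]
m_sub
```

`o[!duplicated(M[o, 1])]` also gives the original row numbers that were kept, which is what unique() alone does not provide.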
