Removing rows based off of duplicate answers in different columns [duplicate] - r

I have a data frame of integers that is a subset of all of the n choose 3 combinations of 1...n.
E.g., for n=5, it is something like:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 3 4
[5,] 1 3 5
[6,] 1 4 5
[7,] 2 1 3
[8,] 2 1 4
[9,] 2 1 5
[10,] 2 3 4
[11,] 2 3 5
[12,] 2 4 5
[13,] 3 1 2
[14,] 3 1 4
[15,] 3 1 5
[16,] 3 2 4
[17,] 3 2 5
[18,] 3 4 5
[19,] 4 1 2
[20,] 4 1 3
[21,] 4 1 5
[22,] 4 2 3
[23,] 4 2 5
[24,] 4 3 5
[25,] 5 1 2
[26,] 5 1 3
[27,] 5 1 4
[28,] 5 2 3
[29,] 5 2 4
[30,] 5 3 4
What I'd like to do is remove any rows containing duplicate combinations, irrespective of ordering. E.g., [1,] 1 2 3, [7,] 2 1 3, and [13,] 3 1 2 are all the same combination.
unique, duplicated, &c. don't seem to take this into account. Also, I am working with quite a large amount of data (n is ~750), so it ought to be a pretty fast operation. Are there any base functions or packages that can do this?

Sort within each row first, then use duplicated(); see below:
# example data
dat = matrix(scan('data.txt'), ncol = 3, byrow = TRUE)
# Read 90 items
dat[ !duplicated(apply(dat, 1, sort), MARGIN = 2), ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 4
# [3,] 1 2 5
# [4,] 1 3 4
# [5,] 1 3 5
# [6,] 1 4 5
# [7,] 2 3 4
# [8,] 2 3 5
# [9,] 2 4 5
# [10,] 3 4 5
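An equivalent, arguably more readable formulation: apply(dat, 1, sort) returns the sorted triples as columns (hence MARGIN = 2 above), so you can transpose them back into rows and let duplicated() work row-wise as usual. A small sketch, assuming dat is the matrix read in above:
srt <- t(apply(dat, 1, sort))    # each row sorted ascending, one row per original row
dat[ !duplicated(srt), ]         # keep only the first occurrence of each sorted triple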

Related

get.edgelist or get.data.frame to transform matrix'values into weights of an edgelist

I have a matrix that, interpreted as a network, should give me an edge list with the matrix's values as edge weights.
M4 <- matrix(rpois(36,5),nrow=6) #randomly googled that to have sample data
Mg <- graph.adjacency(M4) # make matrix into graph
Mw <- get.data.frame(Mg) # 1st try
Mw2 <- cbind( get.edgelist(Mg) , round( E(Mg)$weight, 3 )) # second try
You'll probably see that Mw, built with get.data.frame, gives me an edge list but without the edge weights. There is probably an attribute I'm missing, but I can't find or understand it.
Mw2, the second try, returns:
Error in round(E(Mg)$weight, 3) : non-numeric argument to mathematical function
How can I work around that? It leaves me puzzled.
If you would like to assign weights to the edges, you need to enable the weighted option, as below:
Mg <- graph.adjacency(M4, weighted = TRUE)
and then Mw2 will look like this:
> Mw2
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 1 2 9
[3,] 1 3 4
[4,] 1 4 4
[5,] 1 5 7
[6,] 1 6 2
[7,] 2 1 4
[8,] 2 2 1
[9,] 2 3 4
[10,] 2 4 5
[11,] 2 5 5
[12,] 2 6 4
[13,] 3 1 5
[14,] 3 2 2
[15,] 3 3 6
[16,] 3 4 5
[17,] 3 5 6
[18,] 3 6 4
[19,] 4 1 3
[20,] 4 2 4
[21,] 4 3 9
[22,] 4 4 8
[23,] 4 5 3
[24,] 4 6 5
[25,] 5 1 7
[26,] 5 2 5
[27,] 5 3 3
[28,] 5 4 3
[29,] 5 5 5
[30,] 5 6 6
[31,] 6 1 2
[32,] 6 2 7
[33,] 6 3 4
[34,] 6 4 7
[35,] 6 5 11
[36,] 6 6 3
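For reference, here is a minimal self-contained version of the whole pipeline; the set.seed() call is an addition so the example is reproducible, otherwise your values will differ:
library(igraph)
set.seed(1)                                   # only so the sample data are reproducible
M4 <- matrix(rpois(36, 5), nrow = 6)          # random adjacency matrix, as in the question
Mg <- graph.adjacency(M4, weighted = TRUE)    # store the matrix values as the 'weight' edge attribute
Mw <- get.data.frame(Mg)                      # edge list as a data frame, now including a weight column
Mw2 <- cbind(get.edgelist(Mg), round(E(Mg)$weight, 3))
In more recent igraph versions these functions are called graph_from_adjacency_matrix(), as_data_frame(), and as_edgelist(); the older names used here still work.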

rows less than 5 [duplicate]

I'm trying to find which simple events have a difference between the maximum and minimum of no more than 10.
The code I have is:
library(gtools)  # combinations() comes from the gtools package
sampleSpace <- combinations(100, 3, seq(1, 100, by = 1), repeats.allowed = FALSE)
diff <- sample_max - sample_min  # sample_max / sample_min are not defined yet; this is where I'm stuck
Can anyone help me?
The rows of sampleSpace where the difference between max and min is no more than 10 can be flagged with:
ss <- apply(sampleSpace, 1, function(x) abs(diff(range(x))) <= 10)
So you can do
sampleSpace[ss,]
to get just the rows meeting that criterion. There are 25,020 remaining rows. The first 30 are:
sampleSpace[ss,]
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 1 2 4
#> [3,] 1 2 5
#> [4,] 1 2 6
#> [5,] 1 2 7
#> [6,] 1 2 8
#> [7,] 1 2 9
#> [8,] 1 2 10
#> [9,] 1 2 11
#> [10,] 1 3 2
#> [11,] 1 3 4
#> [12,] 1 3 5
#> [13,] 1 3 6
#> [14,] 1 3 7
#> [15,] 1 3 8
#> [16,] 1 3 9
#> [17,] 1 3 10
#> [18,] 1 3 11
#> [19,] 1 4 2
#> [20,] 1 4 3
#> [21,] 1 4 5
#> [22,] 1 4 6
#> [23,] 1 4 7
#> [24,] 1 4 8
#> [25,] 1 4 9
#> [26,] 1 4 10
#> [27,] 1 4 11
#> [28,] 1 5 2
#> [29,] 1 5 3
#> [30,] 1 5 4
You can simply use which(), as follows:
sampleSpace = sampleSpace[which(diff <= 10), ]
head(sampleSpace)
Here is the output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[4,] 1 2 6
[5,] 1 2 7
[6,] 1 2 8
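Note that this relies on diff being a per-row vector of range differences, which the code in the question never actually builds (sample_max and sample_min are undefined). One way to construct it, assuming sampleSpace is the matrix generated above:
sample_max <- apply(sampleSpace, 1, max)    # row-wise maximum
sample_min <- apply(sampleSpace, 1, min)    # row-wise minimum
diff <- sample_max - sample_min             # per-row difference between max and min
sampleSpace <- sampleSpace[which(diff <= 10), ]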
We can use rowRanges from matrixStats
library(matrixStats)
m1 <- rowRanges(sampleSpace)
out <- sampleSpace[(m1[,2]- m1[,1]) <= 10,]

Why does the allPerms function in R always give one combination less?

I am trying to generate all possible orderings of a set of elements, i.e., essentially factorial-many permutations, and display each of them.
When I use the allPerms function I am supposed to get all possible permutations, but it always gives one fewer. Why is this so?
library(permute)
allPerms(3)
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 1 3
[3,] 2 3 1
[4,] 3 1 2
[5,] 3 2 1
allPerms(4)
[,1] [,2] [,3] [,4]
[1,] 1 2 4 3
[2,] 1 3 2 4
[3,] 1 3 4 2
[4,] 1 4 2 3
[5,] 1 4 3 2
[6,] 2 1 3 4
[7,] 2 1 4 3
[8,] 2 3 1 4
[9,] 2 3 4 1
[10,] 2 4 1 3
[11,] 2 4 3 1
[12,] 3 1 2 4
[13,] 3 1 4 2
[14,] 3 2 1 4
[15,] 3 2 4 1
[16,] 3 4 1 2
[17,] 3 4 2 1
[18,] 4 1 2 3
[19,] 4 1 3 2
[20,] 4 2 1 3
[21,] 4 2 3 1
[22,] 4 3 1 2
[23,] 4 3 2 1
As you can see, the identity orderings 1 2 3 and 1 2 3 4 are missing from the two results, respectively.
I know I can get all possible permutations using the permn() function from the combinat package.
I just wanted to know if there is a way to use allPerms itself for this purpose, or some other function. Any info on this would be very useful. Thank you.
You want to set the observed flag to TRUE using the how() helper function.
h <- how(observed = TRUE)
allPerms(3, h)
> h <- how(observed = TRUE)
> allPerms(3, h)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
Why is observed = FALSE the default? This is intentional: the entire package was designed from the viewpoint of restricted permutation tests, which are common in applied uses of ordination methods in ecology. Since we already have the observed permutation (the data themselves), we don't want it among the permutations used to define the null distribution of the test statistic; or rather, we want it represented only through the observed data, not through any extra copies that might turn up during permutation.
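As a quick check of the effect of the flag (the counts follow directly from the outputs above):
library(permute)
nrow(allPerms(4))                           # 23: the observed ordering 1 2 3 4 is excluded by default
nrow(allPerms(4, how(observed = TRUE)))     # 24 = factorial(4): the observed ordering is included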
