Finding Cross-ranking matches between multiple columns in R - r

I have a sampling problem: I want to search through columns of data for the highest-ranked matches (each value corresponds to another set of data). Some matches are already ruled out:
-9 because it was the first selection (col 1 values)
-2 because it was highest ranked in set 9 (col 2 values)
-7 because it was highest ranked between cols 1 & 2 (col 3 values)
For example, the highest ranked match between columns 1, 2 and 3 should be 1, because it sits at ranks [2,7,2] across the three columns, which is the lowest sum. The ranking for this new match would then be appended to the 10x3 matrix to make it a 10x4 matrix, and a fifth match would be sought.
[,1] [,2] [,3]
[1,] 2 9 9
[2,] 1 10 1
[3,] 4 6 3
[4,] 5 8 2
[5,] 7 7 4
[6,] 8 3 6
[7,] 10 1 10
[8,] 3 5 5
[9,] 6 4 8
[10,] 9 2 7
The goal is to do this for much larger data sets (say 500 rows), building a subset of about 20 values, so something besides visual inspection is needed! Every column has the same length and contains the same values, just in a different order.
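One way to automate a single selection step, as a minimal sketch: look up the row position (rank) of every value in each column, sum those positions, and take the smallest sum among values not yet picked. The names `m`, `picked`, `pos`, `score` and `next_pick` are made up for this illustration, and how the new fourth column is obtained depends on data not shown in the question.

```r
# The 10x3 matrix from the question (each column is a permutation of 1..10)
m <- cbind(c(2, 1, 4, 5, 7, 8, 10, 3, 6, 9),
           c(9, 10, 6, 8, 7, 3, 1, 5, 4, 2),
           c(9, 1, 3, 2, 4, 6, 10, 5, 8, 7))

# Values already ruled out (assumed to be known up front)
picked <- c(9, 2, 7)

# pos[v, j] = row at which value v appears in column j, i.e. its rank there
pos <- apply(m, 2, function(col) match(seq_len(nrow(m)), col))

# Sum of ranks per value; exclude values that were already selected
score <- rowSums(pos)
score[picked] <- Inf

# The next match is the value with the lowest rank sum
next_pick <- which.min(score)
next_pick
```

For the example data this selects the value 1, whose ranks are 2, 7 and 2. For a larger problem the same step could be repeated in a loop, binding the new ranking column onto `m` and adding the selected value to `picked` each time.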

Related

Remove duplicate rows based on a column values by storing the row whose entry in another column is maximum

I have the following matrix
> mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
> mat
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 12 8
[6,] 12 9
[7,] 12 10
[8,] 12 11
[9,] 12 12
[10,] 13 12
I would like to remove duplicate rows based on the first column's values, keeping the row whose entry in the second column is the maximum. E.g. for the example above, the desired outcome is
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
I tried with
> mat[!duplicated(mat[,1]),]
but I obtained
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 13 12
which differs from the desired outcome in entry [4,2]. Suggestions?
You can sort the matrix first, using ascending order for column 1 and descending order for column 2. Then the duplicated function will remove all but the maximum column 2 value for each column 1 value.
mat <- mat[order(mat[,1],-mat[,2]),]
mat[!duplicated(mat[,1]),]
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
Like Joseph's solution, but if you add row names first you can restore the original order afterwards (which will be the same in this case). The row names that survive are those of the original rows that were kept.
rownames(mat) <- 1:nrow(mat)
mat <- mat[order(mat[,1], -mat[,2]),]
mat <- mat[!duplicated(mat[,1]),]
mat[order(as.numeric(rownames(mat))),]
# [,1] [,2]
# 1 9 6
# 2 10 6
# 3 11 7
# 9 12 12
# 10 13 12
First sort, then keep only the first row for each duplicated value:
mat <- mat[order(mat[,1], mat[,2]),]
mat[!duplicated(mat[,1]),]
EDIT: Sorry, I thought your desired result was the last output. OK, so you want the max value:
mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
#Reverse sort
mat <- mat[order(mat[,1], mat[,2], decreasing=TRUE),]
#Keep only the first row for each duplicate, this will give the largest values
mat <- mat[!duplicated(mat[,1]),]
#finally sort it
mat <- mat[order(mat[,1], mat[,2]),]

calculate all combination less the all value of a vector [duplicate]

Given a vector such as c(3,4,5) (the length of the vector is variable), I would like to return a matrix of all combinations in which each element is less than or equal to the corresponding value in the vector,
like this:
[x_1] [x_2] [x_3]
[1,] 3 4 5
[2,] 2 4 5
[3,] 1 4 5
[4,] 3 3 5
[5,] 3 2 5
[6,] 3 1 5
[7,] 2 3 5
[8,] 1 2 5
[9,] 1 4 4
[10,] 1 4 3
[11,] 1 4 2
.....
This is only part of the result; I would like all possible combinations.
I believe this is it.
x <- c(3, 4, 5)
lst <- lapply(x, ':', 1)
Map(expand.grid, list(lst))
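A variation on the same idea, as a sketch: passing the list straight to `expand.grid` gives a single data frame, which can then be converted to a matrix. The column names `x_1`, `x_2`, ... follow the layout in the question; the variable name `combos` is made up for this illustration.

```r
x <- c(3, 4, 5)

# One descending sequence per element: 3:1, 4:1, 5:1
lst <- lapply(x, function(v) seq(v, 1))
names(lst) <- paste0("x_", seq_along(x))

# expand.grid accepts a list and varies the first column fastest
combos <- as.matrix(expand.grid(lst))

nrow(combos)   # 3 * 4 * 5 = 60 combinations
head(combos, 4)
```

The first rows come out as (3,4,5), (2,4,5), (1,4,5), (3,3,5), matching the start of the desired output.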

Transformation of matrix by criteria

Professionals of R, I have a question:
I have the matrix below and want to build a new matrix from only some of its rows: the 1st row, then each next row after skipping i rows, where i=3. So I want a new matrix made of the 1st, 5th and 9th rows of the initial matrix, with dimension 3x3. Is there a special function in R for this, or do I need to implement it with a custom function?
[,1] [,2] [,3]
[1,] 1 2 15
[2,] 2 3 16
[3,] 3 4 1
[4,] 4 5 2
[5,] 5 6 3
[6,] 6 7 4
[7,] 7 8 5
[8,] 8 9 6
[9,] 9 10 7
Below the desired matrix:
1 2 15
5 6 3
9 10 7
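Plain row indexing with `seq` is enough for this, as a minimal sketch. The names `m`, `i` and `res` are chosen for this illustration; skipping `i` rows between picks means a step of `i + 1`.

```r
# The 9x3 matrix from the question
m <- cbind(1:9, 2:10, c(15, 16, 1:7))

i <- 3                                    # rows to skip between picks
res <- m[seq(1, nrow(m), by = i + 1), ]   # rows 1, 5, 9
res
```

If only one row were selected, adding `drop = FALSE` to the subscript would keep the result a matrix.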

Why group_by() not working with apply() function

I have a matrix and want to calculate, for each column, the product of its values within each group. Each group contains different values, and each column should be computed separately.
For example, for user 1 and column y the result should be 1*1; for user 2 and column y it should be 2*3*1.
I used apply with group_by, but the output is incorrect.
#data
user<-c(1,1,2,2,2)
y<-c(1,1,2,3,1)
z<-c(2,2,3,3,3)
dt<-data.frame(user,y,z)
user y z
1 1 1 2
2 1 1 2
3 2 2 3
4 2 3 3
5 2 1 3
The output below is not computed per user:
dt%>%group_by(user)%>% (function(x){apply(x[,2:3],2,prod)})
y z
6 108
Desired output (you can remove the NAs too; it does not matter whether a new column is added or not):
[,1] [,2]
[1,] NA NA
[2,] 1 4
[3,] NA NA
[4,] NA NA
[5,] 6 27
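The likely cause is that the anonymous function receives the whole grouped data frame at once, and `apply` knows nothing about the grouping, so the products run over all rows. A minimal base R sketch that computes the per-user products instead uses `aggregate`; the name `res` is made up for this illustration, and the result has one row per user rather than the NA-padded layout shown above.

```r
user <- c(1, 1, 2, 2, 2)
y <- c(1, 1, 2, 3, 1)
z <- c(2, 2, 3, 3, 3)
dt <- data.frame(user, y, z)

# Product of y and z within each user
res <- aggregate(dt[c("y", "z")], by = dt["user"], FUN = prod)
res
```

With dplyr 1.0 or later, the equivalent should be `dt %>% group_by(user) %>% summarise(across(c(y, z), prod))`, which applies `prod` per group because `summarise` respects the grouping.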

How to traverse matrix in diagonal strips and return the index of each position?

I have an N x N matrix and want to traverse it in diagonal strips, returning the index of each position.
For a 4x4 matrix the code should return (1,1); (1,2); (2,1); (1,3); (2,2); (3,1); (1,4); (2,3); (3,2); (4,1); and so on.
I'm trying to do this in RStudio.
1) row(m) + col(m) is constant along reverse diagonals and within reverse diagonal we order by row:
m <- matrix(1:16, 4, 4) # test matrix
m[order(row(m) + col(m), row(m))]
## [1] 1 5 2 9 6 3 13 10 7 4 14 11 8 15 12 16
2) Not quite as compact as (1) but here is a variation that uses the same principle but uses outer and recycling instead of row and col:
k <- nrow(m)
m[ order(outer(1:k, 1:k, "+") + 0:(k-1)/k) ]
## [1] 1 5 2 9 6 3 13 10 7 4 14 11 8 15 12 16
You can use three for loops - the outermost one counts which diagonal you're on. It goes from 1 to 2*N - 1 (one diagonal for each X value, one for each Y value, minus the one that they share, which starts at (1,N) and goes to (N,1)).
From there you only need to calculate the X and Y values in the inner 2 loops, using the diagonal counter.
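A loop-based version can be sketched as follows. This is one reading of the idea, using two nested loops rather than three, and the function name `diag_indices` is hypothetical: the outer loop counts the diagonal `d`, and on diagonal `d` the row `i` and column `j` satisfy `i + j = d + 1`.

```r
diag_indices <- function(n) {
  out <- matrix(NA_integer_, nrow = n * n, ncol = 2)
  k <- 0L
  for (d in seq_len(2L * n - 1L)) {              # 2n - 1 diagonals
    # rows that exist on diagonal d, top to bottom
    for (i in max(1L, d - n + 1L):min(d, n)) {
      k <- k + 1L
      out[k, ] <- c(i, d + 1L - i)               # column follows from i + j = d + 1
    }
  }
  out
}

idx <- diag_indices(4)
head(idx, 6)

M <- matrix(1:16, 4, 4)
M[idx]   # values visited in diagonal order
```

The two-column result can be used directly as a matrix index, so `M[idx]` returns the values in the same order as the vectorised answers.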
No loops needed with R's matrix indexing.
Two (row, col) positions lie on the same diagonal strip when row+col is the same. You can order the row and column indices of a matrix by this principle, so use a two-column matrix to deliver the values in that order:
M <- matrix(1:16, 4, 4)
idxs <- cbind( c(row(M)), c(col(M)) )
imat <- idxs[ order( rowSums(idxs), idxs[,1] ), ] # returns two columns
# turns out you don't need to sort by both rows and columns
# but could have used rev(col(M)) as secondary sort
> imat
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 2 1
[4,] 1 3
[5,] 2 2
[6,] 3 1
[7,] 1 4
[8,] 2 3
[9,] 3 2
[10,] 4 1
[11,] 2 4
[12,] 3 3
[13,] 4 2
[14,] 3 4
[15,] 4 3
[16,] 4 4
M[ imat ]
#[1] 1 5 2 9 6 3 13 10 7 4 14 11 8 15 12 16
