Find the largest values on a matrix in R - r

I am working with a 230 x 230 matrix in R and want to extract the 10 largest entries (or any other number greater than 1), both their positions and their values.
An extra complication is that this is a similarity matrix, so the diagonal contains 1s, which I want to exclude from the search.
Are there any functions or approaches for this?

A neat way to do this in general is with the underused arrayInd function, which gives you row and column positions for plain jane vector positions. That's how which(..., arr.ind = TRUE) does it. Here's how you might do it:
## creating a random 230x230 matrix
n <- 230;
set.seed(1);
m <- matrix(sample.int(100000, n*n, replace = TRUE), n, n);
diag(m) <- 1;
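## function to return the n largest values and their positions in matrix m
## sim = TRUE assumes a symmetric (similarity) matrix, where every off-diagonal
## value appears twice, so only every second ordered entry is kept
nlargest <- function(m, n, sim = TRUE) {
  mult <- if (sim) 2 else 1
  ## decreasing = TRUE puts the largest values first; NAs are sorted last
  res <- order(m, decreasing = TRUE)[seq_len(n) * mult]
  pos <- arrayInd(res, dim(m), useNames = TRUE)
  list(values = m[res],
       position = pos)
}
## exclude the diagonal from the search
diag(m) <- NA
## the random example matrix is not symmetric, so use sim = FALSE here
nlargest(m, 10, sim = FALSE)
# Returns a list: $values holds the 10 largest off-diagonal entries,
# and $position is a 10 x 2 matrix with their row and col indices.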
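For a symmetric similarity matrix, another option is to look only at the part above the diagonal, which skips both the diagonal and the mirrored duplicates. The following is a minimal sketch under that assumption; the helper name top_n_upper is hypothetical and not part of the answer above.

```r
# Hypothetical helper: n largest off-diagonal entries of a symmetric matrix,
# with each (row, col) pair reported only once.
top_n_upper <- function(m, n) {
  # row/col pairs strictly above the diagonal
  idx <- which(upper.tri(m), arr.ind = TRUE)
  # indexing with a two-column matrix returns one value per pair
  vals <- m[idx]
  ord <- order(vals, decreasing = TRUE)[seq_len(min(n, length(vals)))]
  data.frame(row = idx[ord, "row"], col = idx[ord, "col"], value = vals[ord])
}

# Small symmetric example with 1s on the diagonal
sim <- matrix(c(1.0, 0.2, 0.9,
                0.2, 1.0, 0.5,
                0.9, 0.5, 1.0), nrow = 3, byrow = TRUE)
top2 <- top_n_upper(sim, 2)
top2
```

Because only upper.tri(m) is searched, there is no need to overwrite the diagonal with NA first.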

Related

How to calculate the mean for every n vectors from a df

How can I calculate the mean of every n columns of a data frame, creating a new data frame with the results?
I expect to get:
column 1: mean (V1,V2),
column 2: mean (V3,V4),
column 3: mean (V5,V6),
and so forth.
Data:
df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)
Here is a base R option:
n <- 2 # Mean across every n = 2 columns
do.call(cbind, lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)])))
# [,1] [,2] [,3]
#[1,] 4 16 28
#[2,] 5 17 29
#[3,] 6 18 30
#[4,] 7 19 31
#[5,] 8 20 32
#[6,] 9 21 33
This returns a matrix rather than a data.frame (which makes more sense here since you're dealing with "all-numeric" data).
Explanation: The idea is a non-overlapping sliding window approach. seq(1, ncol(df), by = n) creates the start indices of the columns (here: 1, 3, 5). We then loop over those indices idx and calculate the row means of df[c(idx, idx + 1)]. This returns a list which we then cbind into a matrix.
As a minor modification, you can also predefine a data.frame with the right dimensions and then skip the do.call(cbind, ...) step by having R do an implicit list to data.frame typecast.
out <- data.frame(matrix(NA, ncol = ncol(df) / n, nrow = nrow(df)))
out[] <- lapply(seq(1, ncol(df), by = n), function(idx) rowMeans(df[c(idx, idx + 1)]))
# X1 X2 X3
#1 4 16 28
#2 5 17 29
#3 6 18 30
#4 7 19 31
#5 8 20 32
#6 9 21 33
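The snippets above index df[c(idx, idx + 1)], which only covers windows of two columns. A possible generalisation to any window size n is sketched below; the function name mean_every_n and the output column names are assumptions made for illustration.

```r
# Hypothetical helper: row means over consecutive, non-overlapping
# blocks of n columns, returned as a data frame.
mean_every_n <- function(df, n) {
  stopifnot(ncol(df) %% n == 0)          # columns must split evenly
  starts <- seq(1, ncol(df), by = n)     # first column of each block
  out <- lapply(starts, function(idx) rowMeans(df[idx:(idx + n - 1)]))
  names(out) <- paste0("mean_", seq_along(starts))
  as.data.frame(out)
}

df <- data.frame(v1 = 1:6, V2 = 7:12, V3 = 13:18,
                 v4 = 19:24, v5 = 25:30, v6 = 31:36)
res2 <- mean_every_n(df, 2)   # three columns, as in the output above
res3 <- mean_every_n(df, 3)   # two columns
```

With n = 2 this reproduces the values shown above; with n = 3 the first column holds the means of v1, V2 and V3.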
You may try,
dummy <- data.frame(
v1 = c(1:10),
v2 = c(1:10),
v3 = c(1:10),
v4 = c(1:10),
v5 = c(1:10),
v6 = c(1:10)
)
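nvec_mean <- function(df, n) {
  # fail early if the columns cannot be split into groups of n
  if (ncol(df) %% n != 0) {
    stop("ncol(df) must be a multiple of n")
  }
  # each row of m holds the column indices of one group
  m <- matrix(seq_len(ncol(df)), ncol = n, byrow = TRUE)
  res <- c()
  for (i in seq_len(nrow(m))) {
    v <- rowMeans(df[, m[i, ], drop = FALSE])
    res <- cbind(res, v)
  }
  colnames(res) <- seq_len(nrow(m))
  res
}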
nvec_mean(dummy,3)
1 2
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 9 9
[10,] 10 10
If you don't want rowMeans, or the result is not what you wanted, please let me know.
Simple(?) version
df <- data.frame(v1=1:6,V2=7:12,V3=13:18,v4=19:24,v5=25:30,v6=31:36)
n <- 2
res <- c()
# use n here rather than a hard-coded 2, so changing n changes the grouping
m <- matrix(1:ncol(df), ncol = n, byrow = TRUE)
for (i in 1:nrow(m)){
v <- rowMeans(df[,m[i,]])
res <- cbind(res, v)
}
res
v v v
[1,] 4 16 28
[2,] 5 17 29
[3,] 6 18 30
[4,] 7 19 31
[5,] 8 20 32
[6,] 9 21 33

sum vectors in a matrix by distance from a cell (R)

Suppose I have a matrix A of dimensions n x m, a starting cell (i,j), and a constant k which satisfies k < n x m.
I need a way to extract the values of A that are within k steps of the starting cell, where a step is either a column move or a row move.
I then want to sum the extracted values in two groups: one group holds the sums taken along each column of the original matrix, and the other holds the sums taken along each row of the original matrix.
It is important that this also handles cases where the starting cell is within k steps of the edge of the matrix.
Example set (I'm heavily simplifying here):
> #create matrix where m = 7,n = 7
> Mat <- sample(1:49,49) %>% matrix(7,7)
>
> #declare starting cell where (i = 4, j = 2)
> i = 4
> j = 2
>
> #declare number of steps
> k = 2
>
> Mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 25 35 29 10 16 46 23
[2,] 32 43 7 5 31 1 14
[3,] 36 19 49 45 13 41 47
[4,] 17 18 48 9 3 28 12
[5,] 26 6 30 33 20 2 11
[6,] 40 24 39 21 37 38 8
[7,] 4 15 34 22 27 44 42
> Mat[i,j]
[1] 18
For this example, the output would be two vectors (one for column sums and one for row sums):
> Columnsum <- c(sum(36,17,26) , #sum(Mat[3:5,1])
+ sum(43,19,18,6,24), #sum(Mat[2:6,2])
+ sum(49,48,30), #sum(Mat[3:5,3])
+ sum(9)) #sum(Mat[4:4,4])
>
> Rowsum <- c(sum(43), #sum(Mat[2,2:2])
+ sum(36,19,49), #sum(Mat[3,1:3])
+ sum(17,18,48,9), #sum(Mat[4,1:4])
+ sum(26,6,30), #sum(Mat[5,1:3])
+ sum(24)) #sum(Mat[6,2:2])
>
> Columnsum
[1] 79 110 127 9
> Rowsum
[1] 43 104 92 62 24
You could 'remove' parts of your matrix Mat with entries more than k steps away from (i,j) by overwriting them with NA:
Mat[abs(row(Mat) - i) + abs(col(Mat) - j) > k] <- NA
Then remove the rows and columns that are entirely NA:
Mat <- Mat[rowSums(is.na(Mat)) != ncol(Mat), colSums(is.na(Mat)) != nrow(Mat)]
And finally you can compute the row and column sums:
Columnsum <- colSums(Mat, na.rm = TRUE)
Rowsum <- rowSums(Mat, na.rm = TRUE)
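The same steps can be wrapped in a function that leaves the original matrix untouched. This is only a sketch: the name sums_within_k is hypothetical, and the matrix is rebuilt here from the values printed in the question so the result can be compared with the expected Columnsum and Rowsum.

```r
# Hypothetical wrapper around the masking approach above.
sums_within_k <- function(A, i, j, k) {
  # TRUE for cells at most k row/column steps away from (i, j)
  keep <- abs(row(A) - i) + abs(col(A) - j) <= k
  masked <- A
  masked[!keep] <- NA
  # keep only rows and columns that contain at least one selected cell;
  # drop = FALSE keeps a matrix even if a single row or column remains
  sub <- masked[rowSums(keep) > 0, colSums(keep) > 0, drop = FALSE]
  list(Columnsum = colSums(sub, na.rm = TRUE),
       Rowsum = rowSums(sub, na.rm = TRUE))
}

# Matrix from the question, entered row by row
Mat <- matrix(c(25, 35, 29, 10, 16, 46, 23,
                32, 43,  7,  5, 31,  1, 14,
                36, 19, 49, 45, 13, 41, 47,
                17, 18, 48,  9,  3, 28, 12,
                26,  6, 30, 33, 20,  2, 11,
                40, 24, 39, 21, 37, 38,  8,
                 4, 15, 34, 22, 27, 44, 42), nrow = 7, byrow = TRUE)
out <- sums_within_k(Mat, i = 4, j = 2, k = 2)
out$Columnsum  # 79 110 127 9
out$Rowsum     # 43 104 92 62 24
```

Cells beyond the matrix edge simply never appear in keep, so a starting cell close to the border needs no special handling.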

Convert bigger dimension matrix to smaller dimension matrix with a loop

I currently have a 185*185 matrix, and the goal is to convert it into a 35*35 matrix by aggregating values over groups of rows and columns of the 185*185 matrix.
Example:
I have a 8*8 matrix as below:
matrix_x <- matrix(1:64, nrow = 8)
Then I want to convert it into a 4*4 matrix:
matrix_y <- matrix(NA, nrow = 4, ncol = 4)
The list below is created for aggregating the 8*8 matrix cols to a 4*4 matrix
col_list <- list(
1,
2:3,
c(4,8),
5:7
)
What I've done to achieve this is by assigning the value manually as below
matrix_y[1,1] <- sum(matrix_x[col_list[[1]],col_list[[1]]])
matrix_y[1,2] <- sum(matrix_x[col_list[[1]],col_list[[2]]])
matrix_y[1,3] <- sum(matrix_x[col_list[[1]],col_list[[3]]])
matrix_y[1,4] <- sum(matrix_x[col_list[[1]],col_list[[4]]])
matrix_y[2,1] <- sum(matrix_x[col_list[[2]],col_list[[1]]])
matrix_y[2,2] <- sum(matrix_x[col_list[[2]],col_list[[2]]])
matrix_y[2,3] <- sum(matrix_x[col_list[[2]],col_list[[3]]])
matrix_y[2,4] <- sum(matrix_x[col_list[[2]],col_list[[4]]])
matrix_y[3,1] <- sum(matrix_x[col_list[[3]],col_list[[1]]])
matrix_y[3,2] <- sum(matrix_x[col_list[[3]],col_list[[2]]])
matrix_y[3,3] <- sum(matrix_x[col_list[[3]],col_list[[3]]])
matrix_y[3,4] <- sum(matrix_x[col_list[[3]],col_list[[4]]])
matrix_y[4,1] <- sum(matrix_x[col_list[[4]],col_list[[1]]])
matrix_y[4,2] <- sum(matrix_x[col_list[[4]],col_list[[2]]])
matrix_y[4,3] <- sum(matrix_x[col_list[[4]],col_list[[3]]])
matrix_y[4,4] <- sum(matrix_x[col_list[[4]],col_list[[4]]])
This approach works, but I'm looking for a more efficient way to achieve it, since writing every assignment by hand takes too many lines of code.
There should be a neater/easier way to do this, but here is one straightforward option:
n <- 4
t(sapply(seq_len(n), function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 2 28 84 126
#[3,] 3 30 86 129
#[4,] 4 32 88 132
This gives the same matrix as matrix_y in the post.
For the updated question, we can use outer
apply_fun <- function(x, y) sum(matrix_x[x, y])
outer(col_list, col_list, Vectorize(apply_fun))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 5 58 170 255
#[3,] 12 72 184 276
#[4,] 18 108 276 414
Or following the same approach as in original answer with nested sapply
t(sapply(col_list, function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
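For larger matrices, such as the 185*185 case, an alternative that avoids calling sum once per output cell is base R's rowsum, applied first to the rows and then to the columns. This is a sketch rather than part of the answers above; the grouping vector grp is a helper introduced here and assumes every index appears in exactly one element of col_list.

```r
matrix_x <- matrix(1:64, nrow = 8)
col_list <- list(1, 2:3, c(4, 8), 5:7)

# grp[i] is the group that row/column i of matrix_x belongs to
grp <- integer(nrow(matrix_x))
for (g in seq_along(col_list)) grp[col_list[[g]]] <- g

# sum rows by group, then sum columns by group via transposing
row_agg <- rowsum(matrix_x, grp)          # 4 x 8
matrix_y <- t(rowsum(t(row_agg), grp))    # 4 x 4
matrix_y
```

The result carries the group labels as dimnames; unname(matrix_y) gives the same values as the outer() call above.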

sample sequence of successive integer of an interval

I have a question about sampling: I would like to sample runs of successive numbers from a vector without replacement. Is there a simple way to do so?
For example,
sample(c(1:100), 10, replace = F)
76 99 94 53 12 34 5 82 75 30
gives me 10 numbers between 1 and 100. Now I would like to have 10 sequences of 3 successive integers without replacement: c(2,3,4), c(10,11,12), c(82,83,84) etc.
The different sequences can't overlap; that is, if c(2,3,4) is my first sample, then none of the following ones can contain these numbers.
I would also like to be able to sample 10 sequences of different sizes, with the sizes given by a vector like
sizevec <- sample(c(1:4),10,replace = T)
Thanks for the help
set.seed(42)
lapply(sample(1:10, 1) + cumsum(sample(4:10, 10, TRUE)), function(x) x + 1:3)
# [[1]]
# [1] 21 22 23
# [[2]]
# [1] 27 28 29
# [[3]]
# [1] 36 37 38
# [[4]]
# [1] 44 45 46
# [[5]]
# [1] 51 52 53
# [[6]]
# [1] 60 61 62
# [[7]]
# [1] 64 65 66
# [[8]]
# [1] 72 73 74
# [[9]]
# [1] 80 81 82
# [[10]]
# [1] 87 88 89
A solution using two while loops to take the samples. After running the code, x is a list containing the desired output.
# Set seed for reproducibility
set.seed(123)
# Create a list to store values
x <- list()
# Create a vector to store values in x
y <- integer()
# Set the threshold to stop
threshold <- 4
# Set the condition
condition <- TRUE
while (length(x) < threshold){
while (condition){
# Sample a number between 1 to 98
s <- sample(c(1:98), 1)
# Create a sequence
se <- s:(s + 2)
# Check whether any value in se is already in y, and save the result as the condition
condition <- any(se %in% y)
}
# Save se to the list
x[[length(x) + 1]] <- se
# Update y
y <- unlist(x)
# Reset the condition
condition <- TRUE
}
# View the results
x
# [[1]]
# [1] 29 30 31
#
# [[2]]
# [1] 79 80 81
#
# [[3]]
# [1] 41 42 43
#
# [[4]]
# [1] 89 90 91
Hi, it is unclear to me whether the vectors may overlap or not. Assuming they may overlap, this should work:
lapply(sample(c(1:97), 10, replace = F),function(i){ 0:2 + i})
A random length would then look like this:
lapply(sample(c(1:97), 10, replace = F),function(i){ 0:sample(1:10,1) + i})
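If the runs must not overlap and must stay inside 1:N, one possible approach is to sample where the runs sit among the leftover "free" positions instead of sampling start values directly. The sketch below assumes that runs may touch (for example 2:4 followed by 5:7) and that they are returned in increasing order; the function name sample_runs is hypothetical.

```r
# Hypothetical helper: non-overlapping runs of the given sizes within 1:N.
sample_runs <- function(N, sizes) {
  m <- length(sizes)
  free <- N - sum(sizes)                 # positions not covered by any run
  if (free < 0) stop("sizes do not fit into 1:N")
  # treat each run as one item and choose which of the free + m slots are runs
  slots <- sort(sample.int(free + m, m))
  free_before <- slots - seq_len(m)      # free positions left of each run
  starts <- free_before + cumsum(c(0, sizes[-m])) + 1
  Map(function(s, len) s + seq_len(len) - 1, starts, sizes)
}

set.seed(1)
runs <- sample_runs(100, rep(3, 10))          # 10 runs of length 3
sizevec <- sample(1:4, 10, replace = TRUE)    # sizes as in the question
runs_var <- sample_runs(100, sizevec)         # 10 runs of varying length
```

Since no rejection loop is involved, the function always finishes in one pass, and it fails with an error when the requested runs cannot fit.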

Sum Every N Values in Matrix

I have looked at an earlier question, "sum specific columns among rows", which sums every 2 values in each row of a matrix. I also looked at another question, "R Sum every k columns in matrix", which is more similar to mine, but I could not get its solution to work in my case. Here is the code that I am working with:
y <- matrix(1:27, nrow = 3)
y
m1 <- as.matrix(y)
n <- 3
dim(m1) <- c(nrow(m1)/n, ncol(m1), n)
res <- matrix(rowSums(apply(m1, 1, I)), ncol=n)
identical(res[1,],rowSums(y[1:3,]))
sapply(split.default(y, 0:(length(y)-1) %/% 3), rowSums)
I just get an error message when applying this. The desired output is a matrix with the following values:
[,1] [,2] [,3]
[1,] 12 39 66
[2,] 15 42 69
[3,] 18 45 72
To sum consecutive sets of n elements from each row, you just need to write a function that does the summing and apply it to each row:
n <- 3
t(apply(y, 1, function(x) tapply(x, ceiling(seq_along(x)/n), sum)))
# 1 2 3
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
Transform the matrix to an array and use colSums (as suggested by @nongkrong):
y <- matrix(1:27, nrow = 3)
n <- 3
a <- y
dim(a) <- c(nrow(a), ncol(a)/n, n)
b <- aperm(a, c(2,1,3))
colSums(b)
# [,1] [,2] [,3]
#[1,] 12 39 66
#[2,] 15 42 69
#[3,] 18 45 72
Of course this assumes that ncol(y) is divisible by n.
PS: You can of course avoid creating so many intermediate objects. They are there for didactic purposes.
I would do something similar to the OP -- apply rowSums on subsets of the matrix:
n = 3
ng = ncol(y)/n
sapply( 1:ng, function(jg) rowSums(y[, (jg-1)*n + 1:n ]))
# [,1] [,2] [,3]
# [1,] 12 39 66
# [2,] 15 42 69
# [3,] 18 45 72
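Another way to express the same grouping is a matrix product with a 0/1 indicator matrix, where column g of the indicator marks the columns of y that belong to group g. This is a sketch, not one of the answers above; the function name sum_every_n_cols is hypothetical and, like the array approach, it assumes ncol(y) is divisible by n.

```r
# Hypothetical helper: sum every n consecutive columns via matrix multiplication.
sum_every_n_cols <- function(y, n) {
  stopifnot(ncol(y) %% n == 0)
  ng <- ncol(y) / n
  # ncol(y) x ng block indicator: n ones per column, stacked down the diagonal
  indicator <- kronecker(diag(ng), matrix(1, nrow = n, ncol = 1))
  y %*% indicator
}

y <- matrix(1:27, nrow = 3)
res <- sum_every_n_cols(y, 3)
res
```

For y and n = 3 this gives the same 3 x 3 matrix as the outputs shown above.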
