Ranking and Counting Matrix Elements in R

I know there are similar questions but I couldn't find an answer to my question. I'm trying to rank elements in a matrix and then extract data of 5 highest elements.
Here is my attempt.
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
high<-d[1,1]
rowind<-1
colind<-1
for (i in 1:10) {
for (j in 1:10) {
if (high < d[i,j])
{high<-d[i,j] # keep the running maximum so later cells are compared against it
rowind<-i
colind<-j
}
}
}
Although this gives me the data of the highest element, including row and column numbers, I can't think of a way to do the same for elements ranked from 2 to 5. I also tried
rank(d, ties.method="max")
But it wasn't helpful because it just spits out the rank in vector format.
What I ultimately want is a data frame (or any sort of table) that contains the rank, column name, row name, and value of the 5 highest elements in the matrix.
Edit
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
d[1,2]<-5
d[2,1]<-5
d[1,3]<-4
d[3,1]<-4
Thanks for the answers. They worked perfectly for my purpose, but I'm running this code on a correlation matrix, where every pair of variables appears twice (once on each side of the diagonal), so I want to count only one of the two numbers for ranking purposes. Is there any way to do this? Thanks.

Here's a very crude way:
DF = data.frame(row = c(row(d)), col = c(col(d)), v = c(d))
DF[order(DF$v, decreasing=TRUE), ][1:5, ]
row col v
91 1 10 2.208443
82 2 9 1.921899
3 3 1 1.785465
32 2 4 1.590146
33 3 4 1.556143
It would be nice to only have to partially sort, but in ?order, it looks like this option is only available for sort, not for order.
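As a rough sketch of the partial-sort idea, sort() with its partial argument can locate the 5th-largest value without fully sorting, and which() can then pick out the cells at or above that threshold. The names n, thr, idx and top are illustrative only, and ties at the threshold would return more than 5 rows.

```r
set.seed(20)
d <- matrix(rnorm(100), nrow = 10, ncol = 10)

n <- length(d)
# After a partial sort, position n - 4 holds the 5th-largest value
thr <- sort(d, partial = n - 4)[n - 4]

# Row/column indices of every cell at or above the threshold
idx <- which(d >= thr, arr.ind = TRUE)
top <- data.frame(idx, v = d[idx])

# Only these few rows need a full sort
top <- top[order(top$v, decreasing = TRUE), ]
top
```

Only the handful of selected rows is fully ordered at the end, which is the saving a partial sort is meant to give.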
If the matrix has row and col names, it might be convenient to see them instead of numbers. Here's what I might do:
dimnames(d) <- list(letters[1:10], letters[1:10])
DF = data.frame(as.table(d))
DF[order(DF$Freq, decreasing=TRUE), ][1:5, ]
Var1 Var2 Freq
91 a j 2.208443
82 b i 1.921899
3 c a 1.785465
32 b d 1.590146
33 c d 1.556143
The column names don't make much sense here, unfortunately, but you can change them with names(DF) <- as usual.

Here is one option with Matrix
library(Matrix)
m1 <- summary(Matrix(d, sparse=TRUE))
head(m1[order(-m1[,3]),],5)
# i j x
#93 3 10 2.359634
#31 1 4 2.234804
#23 3 3 1.980956
#55 5 6 1.801341
#16 6 2 1.678989
Or use melt
library(reshape2)
m2 <- melt(d)
head(m2[order(-m2[,3]), ], 5)

Here is something quite simple in base R.
# set.seed(20)
# d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d.rank <- matrix(rank(-d), nrow = 10, ncol = 10)
which(d.rank <= 5, arr.ind=TRUE)
row col
[1,] 3 1
[2,] 2 4
[3,] 3 4
[4,] 2 9
[5,] 1 10
d[d.rank <= 5]
[1] 1.785465 1.590146 1.556143 1.921899 2.208443
Results can (easily) be made clearer (see comment from Frank):
cbind(which(d.rank <= 5, arr.ind=TRUE), v = d[d.rank <= 5], rank = rank(-d[d.rank <= 5]))
row col v rank
[1,] 3 1 1.785465 3
[2,] 2 4 1.590146 4
[3,] 3 4 1.556143 5
[4,] 2 9 1.921899 2
[5,] 1 10 2.208443 1
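For the follow-up in the question's edit (a symmetric matrix where each pair appears twice), one possible approach is to rank only the cells above the diagonal. This is a sketch under the assumption that the mirrored cells really hold identical values; upper.tri() is base R, while keep, DF and top5 are illustrative names.

```r
set.seed(20)
d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d[1, 2] <- 5; d[2, 1] <- 5
d[1, 3] <- 4; d[3, 1] <- 4
dimnames(d) <- list(letters[1:10], letters[1:10])

# TRUE only for cells strictly above the diagonal, so each pair is seen once
keep <- upper.tri(d)

DF <- data.frame(row = rownames(d)[row(d)[keep]],
                 col = colnames(d)[col(d)[keep]],
                 v   = d[keep])

top5 <- head(DF[order(DF$v, decreasing = TRUE), ], 5)
top5$rank <- seq_len(nrow(top5))
top5
```

With the example data only d[1,2] and d[1,3] are considered for the two mirrored pairs, so the values 5 and 4 each appear once in the result.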

Related

How does the 'group' argument in rowsum work?

I understand what rowsum() does, but I'm trying to get it to work for myself. I've used the example provided in R which is structured as such:
x <- matrix(runif(100), ncol = 5)
group <- sample(1:8, 20, TRUE)
xsum <- rowsum(x, group)
What is the matrix of values that is produced by xsum, and how are the values obtained? What I thought was happening was that the values obtained from group were going to be used to state how many entries from the matrix to use in a rowsum. For example, say that group = (2,4,3,1,5). What I thought this would mean is that the first two entries going by row would be selected as the first entry to xsum. It appears as though this is not what is happening.
rowsum adds all rows that have the same group value. Let us take a simpler example.
m <- cbind(1:4, 5:8)
m
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
group <- c(1, 1, 2, 2)
rowsum(m, group)
## [,1] [,2]
## 1 3 11
## 2 7 15
Since the first two rows correspond to group 1 and the last 2 rows to group 2 it sums the first two rows giving the first row of the output and it sums the last 2 rows giving the second row of the output.
rbind(`1` = m[1, ] + m[2, ], `2` = m[3, ] + m[4, ])
## [,1] [,2]
## 1 3 11
## 2 7 15
That is, the 3 is formed by adding the 1 from row 1 of m and the 2 from row 2 of m. The 11 is formed by adding the 5 from row 1 of m and the 6 from row 2 of m.
7 and 15 are formed similarly.
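The rows of a group do not have to be adjacent. As a small sketch with made-up group labels, the following shows rowsum() collecting interleaved rows and returning the groups in sorted order; manual is an illustrative name for the hand-computed equivalent.

```r
m <- cbind(1:4, 5:8)
group <- c("b", "a", "b", "a")   # rows 1 and 3 are "b", rows 2 and 4 are "a"

rs <- rowsum(m, group)

# The same sums written out by hand, one output row per group label
manual <- rbind(a = m[2, ] + m[4, ],
                b = m[1, ] + m[3, ])
rs
manual
```

The group vector is therefore a label per row of the input, not a count of how many rows to take.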

Calculating moving differences across columns per row in r

I would like to do calculations across columns in my data, by row. The calculations are "moving" in that I would like to know the difference between two numbers in column 1 and 2, then columns 3 and 4, and so on. I have looked at "loops" and "rollapply" functions, but could not figure this out. Below are three options of what was attempted. Only the third option gives me the result I am after, but it is very lengthy code and also does not allow for automation (the input data will be a much larger matrix, so typing out the calculation for each row won't work).
Please advise how to make this code shorter and/or suggest any other packages/functions to check out which will do the job. THANK YOU!
MY TEST SCRIPT IN R + errors/results
Sample data set
a<- c(1,2,3, 4, 5)
b<- c(1,2,3, 4, 5)
c<- c(1,2,3, 4, 5)
test.data <- data.frame(cbind(a,b*2,c*10))
names(test.data) <- c("a", "b", "c")
Sample of calculations attempted:
OPTION 1
require(zoo)
rollapply(test.data, 2, diff, fill = NA, align = "right", by.column=FALSE)
RESULT 1 (not what we're after. What we need is at the bottom of Option 3)
# a b c
#[1,] NA NA NA
#[2,] 1 2 10
#[3,] 1 2 10
#[4,] 1 2 10
#[5,] 1 2 10
OPTION 2:
results <- for (i in 1:length(nrow(test.data))) {
diff(as.numeric(test.data[i,]), lag=1)
print(results)}
RESULT 2: (again not what we're after)
# NULL
OPTION 3: works, but long way, so would like to simplify code and make generic for any length of observations in my dataframe and any number of columns (i.e. more than 3). I would like to "automate" the steps below, if know number of observations (i.e. rows).
row1=diff(as.numeric(test.data[1,]), lag=1)
row2=diff(as.numeric(test.data[2,]), lag=1)
row3=diff(as.numeric(test.data[3,]), lag=1)
row4=diff(as.numeric(test.data[4,]), lag=1)
row5=diff(as.numeric(test.data[5,]), lag=1)
results.OK=cbind.data.frame(row1, row2, row3, row4, row5)
transpose.results.OK=data.frame(t(as.matrix(results.OK)))
names(transpose.results.OK)=c("diff.ab", "diff.bc")
Final.data = transpose.results.OK
print(Final.data)
RESULT 3: (THIS IS WHAT I WOULD LIKE TO GET, "row1" can be "obs1" etc)
# diff.ab diff.bc
#row1 1 8
#row2 2 16
#row3 3 24
#row4 4 32
#row5 5 40
THE END
Here are the 3 options redone plus a 4th option:
# 1
library(zoo)
d <- t(rollapplyr(t(test.data), 2, diff, by.column = FALSE))
# 2
d <- test.data[-1]
for (i in 1:nrow(test.data)) d[i, ] <- diff(unlist(test.data[i, ]))
# 3
d <- t(diff(t(test.data)))
# 4 - also this works
nc <- ncol(test.data)
d <- test.data[-1] - test.data[-nc]
For any of them to set the names:
colnames(d) <- paste0("diff.", head(names(test.data), -1), tail(names(test.data), -1))
(2) and (4) give this data.frame and (1) and (3) give the corresponding matrix:
> d
diff.ab diff.bc
1 1 8
2 2 16
3 3 24
4 4 32
5 5 40
Use as.matrix or as.data.frame if you want the other.
An apply-based solution that calls diff on each row:
# Result
res <- t(apply(test.data, 1, diff)) #One can change it to data.frame
# Name of the columns
colnames(res) <- paste0("diff.", head(names(test.data), -1),
tail(names(test.data), -1))
res
# diff.ab diff.bc
# [1,] 1 8
# [2,] 2 16
# [3,] 3 24
# [4,] 4 32
# [5,] 5 40
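To make this generic for any number of rows and columns, the subtraction of shifted column blocks can be wrapped in a small helper. This is only a sketch: col_diffs is a hypothetical name, and the fourth column d is added here purely to show that the approach extends beyond three columns.

```r
# Hypothetical helper: differences between each pair of adjacent columns
col_diffs <- function(df) {
  nc <- ncol(df)
  out <- df[-1] - df[-nc]   # columns 2..nc minus columns 1..(nc - 1)
  names(out) <- paste0("diff.", names(df)[-nc], names(df)[-1])
  out
}

test.data <- data.frame(a = 1:5,
                        b = 2 * (1:5),
                        c = 10 * (1:5),
                        d = 100 * (1:5))
res <- col_diffs(test.data)
res
```

Because the subtraction is vectorised over whole columns, no explicit loop over rows is needed.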

Circumvent aggregation in for-loop R

I want to find the first index k of an array where the aggregate up to that k is bigger than a given cutoff. In code, this looks as follows:
k <- 0
agg <- 0
while (agg < cutoff) {
k <- k +1
agg <- sum(array[1:k])
}
I was told there is a way to rewrite this without the loop, and that the which function would be helpful. I'm new to R and couldn't find the way. Any thoughts on this?
First we find array of partial sums:
x <- 1:10
partial_sums <- Reduce('+', x, accumulate = TRUE)
partial_sums
[1] 1 3 6 10 15 21 28 36 45 55
Next we find the indices of all the elements of the partial_sums array which are bigger than the cutoff:
cutoff <- 17
indices <- which(partial_sums > cutoff)
indices[1]
[1] 6
Note that indices can be empty if no partial sum exceeds the cutoff, in which case indices[1] is NA.
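The same idea can be written more compactly with the built-in cumsum(), which returns the running totals directly. The wrapper below is a sketch with a hypothetical name, first_exceeding, and it returns NA explicitly when the cutoff is never exceeded.

```r
# Hypothetical helper: first index whose running total exceeds the cutoff
first_exceeding <- function(x, cutoff) {
  idx <- which(cumsum(x) > cutoff)
  if (length(idx) == 0) NA_integer_ else idx[1]
}

first_exceeding(1:10, 17)    # running totals are 1 3 6 10 15 21 ..., so index 6
first_exceeding(1:10, 100)   # the total of 1:10 is 55, so no index qualifies
```

To match the while loop in the question exactly, which stops as soon as the total reaches the cutoff, the comparison would be >= instead of >.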
You can use the following:
set.seed(123)#in order to have reproducible "random" numbers
m1 <- matrix(sample(10),nrow = 5,ncol = 2)# create a matrix
m1
[,1] [,2]
[1,] 7 5
[2,] 4 2
[3,] 9 8
[4,] 1 6
[5,] 3 10
cutoff <- 5 #applying cutoff value
apply(m1,2,function(x){x<cutoff})#checking each column using apply instead of loop
OR:
which(m1 < cutoff) #results in the indices of m1 that comply to the condition <cutoff
[1] 2 4 5 7
EDIT
cutoff<-30# a new cutoff
v1<-sapply(seq_along(m1),function(x){sum(m1[1:x])})#running total of the cells in column-major order (equivalent to cumsum(m1))
which(v1>=cutoff)[1]#find the 1st of occurrence

R matrix getting row and column number and actual value

I have a matrix as below
B = matrix(
c(2, 4, 3, 1, 5, 7),
nrow=3,
ncol=2)
B # B has 3 rows and 2 columns
# [,1] [,2]
#[1,] 2 1
#[2,] 4 5
#[3,] 3 7
I would like to create a data.frame with 3 columns: row number, column number and actual value from above matrix. I am thinking of writing 2 for loops. Is there a more efficient way to do this?
The output that i want (i am showing only first 2 rows below)
rownum columnnum value
1 1 2
1 2 1
Try
cbind(c(row(B)), c(col(B)), c(B))
Or
library(reshape2)
melt(B)
As per @nicola's comments, the output needed may be in row-major order. In that case, take the transpose of the matrix and do the same:
TB <- t(B)
cbind(rownum = c(col(TB)), colnum = c(row(TB)), value = c(TB))
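Another option uses which() with arr.ind=TRUE (this assumes B contains no NA values, since NA==NA is not TRUE):
data.frame(which(B==B, arr.ind=TRUE), value=as.vector(B))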
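Since the question asks for a data.frame with the rows listed in row-major order, the pieces above can be combined as in this sketch. The column names rownum, columnnum and value are taken from the desired output in the question; out is an illustrative name.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)

# Transposing before flattening walks the matrix row by row
out <- data.frame(rownum    = c(t(row(B))),
                  columnnum = c(t(col(B))),
                  value     = c(t(B)))
out
```

The first two rows of out correspond to B[1, 1] and B[1, 2], matching the sample output requested.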

Swap (selected/subset) data frame columns in R

What is the simplest way to swap the order of a selected subset of columns in a data frame in R? The answers I have seen (Is it possible to swap columns around in a data frame using R?) use all indices / column names for this. If one has, say, 100 columns and needs either 1) to swap column 99 with column 1, or 2) to move column 99 before column 1 (keeping column 1, now as column 2), the suggested approaches appear cumbersome. Funny that there is no small package around for this (Wickham's "reshape"?). Can anyone suggest some simple code?
If you really want a shortcut for this, you could write a couple of simple functions, such as the following.
To swap the position of two columns:
swapcols <- function(x, col1, col2) {
if(is.character(col1)) col1 <- match(col1, colnames(x))
if(is.character(col2)) col2 <- match(col2, colnames(x))
if(any(is.na(c(col1, col2)))) stop("One or both columns don't exist.")
i <- seq_len(ncol(x))
i[col1] <- col2
i[col2] <- col1
x[, i]
}
To move a column from one position to another:
movecol <- function(x, col, to.pos) {
if(is.character(col)) col <- match(col, colnames(x))
if(is.na(col)) stop("Column doesn't exist.")
if(to.pos > ncol(x) || to.pos < 1) stop("Invalid position.")
x[, append(seq_len(ncol(x))[-col], col, to.pos - 1)]
}
And here are examples of each:
(m <- matrix(1:12, ncol=4, dimnames=list(NULL, letters[1:4])))
# a b c d
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
swapcols(m, col1=1, col2=3) # using column indices
# c b a d
# [1,] 7 4 1 10
# [2,] 8 5 2 11
# [3,] 9 6 3 12
swapcols(m, 'd', 'a') # or using column names
# d b c a
# [1,] 10 4 7 1
# [2,] 11 5 8 2
# [3,] 12 6 9 3
movecol(m, col='a', to.pos=2)
# b a c d
# [1,] 4 1 7 10
# [2,] 5 2 8 11
# [3,] 6 3 9 12
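For the 100-column case in the question, the same index manipulation can also be done inline without helper functions. This is a sketch on a made-up data frame with columns V1 to V100; idx, swapped and moved are illustrative names.

```r
# Made-up data frame with 100 columns named V1..V100
df <- as.data.frame(matrix(seq_len(200), nrow = 2,
                           dimnames = list(NULL, paste0("V", 1:100))))

# 1) Swap column 99 with column 1
idx <- seq_along(df)
idx[c(1, 99)] <- c(99, 1)
swapped <- df[idx]

# 2) Move column 99 in front of column 1, shifting the rest right
moved <- df[c(99, setdiff(seq_along(df), 99))]

names(swapped)[c(1, 99)]
names(moved)[1:3]
```

Both variants build a permutation of the column indices and subset once, so only the positions that change need to be spelled out.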
