After several hours of searching, I am turning to your expertise. I am a beginner in R and I am trying to speed up my code. My goal is to replace values in a matrix A, where the cells to replace are given by the two columns of another matrix B: B[, 1] holds the row name in A and B[, 2] holds the column name in A.
The first version of my code used the match function in a loop.
for(k in 1:L){
i <- B[k,1]
j <- B[k,2]
d <- match(i,rownames(A))
e <- match(j,colnames(A))
A[d, e] <- 0
}
The second version was slightly faster:
for( k in 1:L) {
A[match(B[k,1],rownames(A)), match(B[k,2],colnames(A))] <- 0
}
However, the processing time is still far too long, so I thought of using the apply function over the rows of B.
Is apply a good approach here, or am I going in the wrong direction?
It appears to me that you can simply do A[B[, 1:2]] <- 0, using the power of matrix indexing.
For example, A[cbind(1:4, 1:4)] <- 0 sets A[1,1], A[2,2], A[3,3] and A[4,4] to 0. In fact, if A has a "dimnames" attribute (the "rownames" and "colnames" you refer to), a two-column character matrix can be used as the index as well.
Reproducible example
A <- matrix(1:16, 4, 4, dimnames = list(letters[1:4], LETTERS[1:4]))
# A B C D
#a 1 5 9 13
#b 2 6 10 14
#c 3 7 11 15
#d 4 8 12 16
set.seed(0); B <- cbind(sample(letters[1:4]), sample(LETTERS[1:4]))
# [,1] [,2]
#[1,] "d" "D"
#[2,] "a" "A"
#[3,] "c" "B"
#[4,] "b" "C"
## since `B` has just 2 columns, we can use `B` rather than `B[, 1:2]`
A[B] <- 0
# A B C D
#a 0 5 9 13
#b 2 6 0 14
#c 3 0 11 15
#d 4 8 12 0
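To check that matrix indexing zeroes the same cells as the loop from the question, and to get a rough feel for the speed difference, here is a minimal sketch. The sizes and the names n, L, A_loop and A_vec are assumptions made up for illustration, and timings will differ from machine to machine.

```r
set.seed(1)
n <- 200                                   # assumed matrix size
A <- matrix(1, n, n, dimnames = list(paste0("r", 1:n), paste0("c", 1:n)))
L <- 5000                                  # assumed number of cells to zero out
B <- cbind(sample(rownames(A), L, replace = TRUE),
           sample(colnames(A), L, replace = TRUE))

# loop version, as in the question
A_loop <- A
t_loop <- system.time(
  for (k in 1:L) {
    A_loop[match(B[k, 1], rownames(A_loop)), match(B[k, 2], colnames(A_loop))] <- 0
  }
)

# vectorised version: each row of B is a (row name, column name) pair
A_vec <- A
t_vec <- system.time(A_vec[B] <- 0)

identical(A_loop, A_vec)                   # both should zero exactly the same cells
c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]])
```

Repeated pairs in B are harmless here, since the same cell is simply set to 0 more than once.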
I want to find the first index k of an array where the running total up to k is bigger than a given cutoff. In code, this looks as follows:
k <- 0
agg <- 0
while (agg < cutoff) {
k <- k +1
agg <- sum(array[1:k])
}
I was told there is a way to rewrite this without the loop, and that the which function would be helpful. I'm new to R and couldn't find the way. Any thoughts on this?
First we find the array of partial sums:
x <- 1:10
partial_sums <- Reduce('+', x, accumulate = TRUE)
partial_sums
[1] 1 3 6 10 15 21 28 36 45 55
Next we find the indices of all the elements of partial_sums that are bigger than cutoff:
cutoff <- 17
indices <- which(partial_sums > cutoff)
indices[1]
[1] 6
Please note that indices can be empty (when no partial sum exceeds cutoff), in which case indices[1] is NA.
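The same idea can be written with the built-in cumsum, which computes the running totals directly. This is a minimal sketch, not the answerer's code: the helper name first_index_reaching is made up, and it uses >= so that it matches the while (agg < cutoff) loop in the question rather than the strict > used above.

```r
# hypothetical helper: first k with sum(x[1:k]) >= cutoff, or NA if never reached
first_index_reaching <- function(x, cutoff) {
  which(cumsum(x) >= cutoff)[1]
}

x <- 1:10
first_index_reaching(x, 17)    # 6, since 1 + 2 + ... + 6 = 21 is the first total >= 17
first_index_reaching(x, 1000)  # NA: the running total never reaches the cutoff
```

Unlike the loop, this does not run forever when the cutoff is never reached; the NA result can be tested with is.na().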
You can use the following:
set.seed(123) # in order to have reproducible "random" numbers
m1 <- matrix(sample(10),nrow = 5,ncol = 2)# create a matrix
m1
[,1] [,2]
[1,] 7 5
[2,] 4 2
[3,] 9 8
[4,] 1 6
[5,] 3 10
cutoff <- 5 #applying cutoff value
apply(m1,2,function(x){x<cutoff})#checking each column using apply instead of loop
OR:
which(m1 < cutoff) #results in the indices of m1 that comply to the condition <cutoff
[1] 2 4 5 7
EDIT
cutoff <- 30 # a new cutoff
v1 <- unlist(lapply(seq_along(m1), function(x){sum(m1[1:x])})) # running total of the cells, in column-major order
which(v1 >= cutoff)[1] # find the first occurrence
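The lapply call above re-sums the cells from the start for every position. A shorter sketch of the same computation uses cumsum, which treats the matrix as a vector in column-major order, and arrayInd to turn the resulting linear index back into a row and column. The matrix is typed in by hand from the printout above so the result does not depend on the random number generator; the names running and k are made up here.

```r
# m1 as printed above, entered by hand
m1 <- matrix(c(7, 4, 9, 1, 3, 5, 2, 8, 6, 10), nrow = 5, ncol = 2)
cutoff <- 30

running <- cumsum(m1)               # running total over cells, column by column
k <- which(running >= cutoff)[1]    # first linear index where the total reaches the cutoff
k                                   # 7: the totals are 7, 11, 20, 21, 24, 29, 31, ...
arrayInd(k, dim(m1))                # the same cell as (row, column): row 2, column 2
```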
I know there are similar questions, but I couldn't find an answer to mine. I'm trying to rank the elements of a matrix and then extract data for the 5 highest elements.
Here is my attempt.
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
high<-d[1,1] # running maximum, starting from the first cell
rowind<-1
colind<-1
for (i in 1:10) {
for (j in 1:10) {
if (high < d[i,j])
{high<-d[i,j]
rowind<-i
colind<-j
}
}
}
Although this gives me the data of the highest element, including row and column numbers, I can't think of a way to do the same for elements ranked from 2 to 5. I also tried
rank(d, ties.method="max")
But it wasn't helpful because it just spits out the rank in vector format.
What I ultimately want is a data frame (or any sort of table) that contains the rank, column name, row name, and value of the 5 highest elements in the matrix.
Edit
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
d[1,2]<-5
d[2,1]<-5
d[1,3]<-4
d[3,1]<-4
Thanks for the answers. They worked perfectly for my purpose, but I'm running this code for a correlation chart, where every pair appears twice with the same value, so I want to count only one of the two numbers when ranking. Is there any way to do this? Thanks.
Here's a very crude way:
DF = data.frame(row = c(row(d)), col = c(col(d)), v = c(d))
DF[order(DF$v, decreasing=TRUE), ][1:5, ]
row col v
91 1 10 2.208443
82 2 9 1.921899
3 3 1 1.785465
32 2 4 1.590146
33 3 4 1.556143
It would be nice to only have to partially sort, but in ?order, it looks like this option is only available for sort, not for order.
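One possible way around the full sort, sketched here on the assumption that ties at the threshold are not a concern: sort() with its partial argument finds the 5th-largest value, which() picks the cells at or above it, and only those few rows are ordered at the end. The names n_top, pos, thr and top are made up for this sketch.

```r
set.seed(20)
d <- matrix(rnorm(100), nrow = 10, ncol = 10)

n_top <- 5
pos <- length(d) - n_top + 1
thr <- sort(c(d), partial = pos)[pos]   # 5th-largest value, without a full sort

top <- which(d >= thr, arr.ind = TRUE)  # (row, col) of the cells at or above the threshold
top <- data.frame(top, v = d[top])      # matrix indexing pulls out the matching values
top[order(top$v, decreasing = TRUE), ]  # order only the handful of selected rows
```

Whether this is actually faster than order() on the whole matrix depends on the matrix size; for a 10 x 10 matrix the difference is negligible.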
If the matrix has row and col names, it might be convenient to see them instead of numbers. Here's what I might do:
dimnames(d) <- list(letters[1:10], letters[1:10])
DF = data.frame(as.table(d))
DF[order(DF$Freq, decreasing=TRUE), ][1:5, ]
Var1 Var2 Freq
91 a j 2.208443
82 b i 1.921899
3 c a 1.785465
32 b d 1.590146
33 c d 1.556143
The column names don't make much sense here, unfortunately, but you can change them with names(DF) <- as usual.
Here is one option with Matrix
library(Matrix)
m1 <- summary(Matrix(d, sparse=TRUE))
head(m1[order(-m1[,3]),],5)
# i j x
#93 3 10 2.359634
#31 1 4 2.234804
#23 3 3 1.980956
#55 5 6 1.801341
#16 6 2 1.678989
Or use melt
library(reshape2)
m2 <- melt(d)
head(m2[order(-m2[,3]), ], 5)
Here is something quite simple in base R.
# set.seed(20)
# d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d.rank <- matrix(rank(-d), nrow = 10, ncol = 10)
which(d.rank <= 5, arr.ind=TRUE)
row col
[1,] 3 1
[2,] 2 4
[3,] 3 4
[4,] 2 9
[5,] 1 10
d[d.rank <= 5]
[1] 1.785465 1.590146 1.556143 1.921899 2.208443
Results can (easily) be made clearer (see comment from Frank):
cbind(which(d.rank <= 5, arr.ind=TRUE), v = d[d.rank <= 5], rank = rank(-d[d.rank <= 5]))
row col v rank
[1,] 3 1 1.785465 3
[2,] 2 4 1.590146 4
[3,] 3 4 1.556143 5
[4,] 2 9 1.921899 2
[5,] 1 10 2.208443 1
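For the edit to the question (each pair appearing twice, as in a correlation matrix), one possible approach is to blank out the diagonal and everything below it before ranking, so that each pair is counted once. This is a sketch under the assumption that the matrix is symmetric and that the diagonal should be ignored; the names u, idx and DF are made up here.

```r
set.seed(20)
d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d[1, 2] <- 5; d[2, 1] <- 5                  # duplicated pairs from the question's edit
d[1, 3] <- 4; d[3, 1] <- 4

u <- d
u[!upper.tri(u)] <- NA                      # keep only the cells above the diagonal

idx <- which(!is.na(u), arr.ind = TRUE)     # (row, col) of the remaining cells
DF <- data.frame(idx, v = u[idx])
DF <- DF[order(DF$v, decreasing = TRUE), ][1:5, ]
DF$rank <- seq_len(nrow(DF))                # rank 1 = largest value
DF
```

With this mask the value 5 appears once (row 1, column 2) instead of twice, and likewise for 4.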
I have a data set that looks like this:
a<-c(1,1,1,2,2,2,3,3,3)
b<-rep(1:3,3)
c<-c(rep(c("i","j","k","l"),2),"o")
d<-data.frame(a,b,c)
which gives:
a b c
1 1 1 i
2 1 2 j
3 1 3 k
4 2 1 l
5 2 2 i
6 2 3 j
7 3 1 k
8 3 2 l
9 3 3 o
I am looking for a way to transform c into the following form:
1 2 3
1 i j k
2 l i j
3 k l o
So basically I hope to use a as the row index, b as column index, then transform the column c to a matrix. Is there any way this could be done efficiently by using data.table or other packages?
Thanks a lot guys!
@docendo's solution is pretty clean; you just have to make sure the data frame is sorted properly. Here is a slightly more generic version that uses a matrix index to build what you're after. It also works if the data frame doesn't specify every value, if a value is specified more than once (the last value prevails), or if the data isn't sorted (although of course in the last case you can always sort):
mx <- with(d, matrix(nrow=max(a), ncol=max(b))) # rows are indexed by a, columns by b
mx[as.matrix(d[1:2])] <- as.character(d$c)
[,1] [,2] [,3]
[1,] "i" "j" "k"
[2,] "l" "i" "j"
[3,] "k" "l" "o"
As docendo said, you can use just c, since it is already in row-major order:
matrix(c, 3, byrow=TRUE)
Or, if you really want to use the a and b vectors, I suggest the following trick:
mymatrix = matrix(NA_character_, 3, 3) # create an initial character matrix of the required size
for(i in seq_along(a)){mymatrix[a[i],b[i]]=c[i]} # fill in the matrix
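As a quick check of the claim that the matrix-index approach does not depend on row order, here is a small sketch that shuffles the data frame first. The name d_shuffled and the use of stringsAsFactors = FALSE are choices made for this illustration, not part of the original answers.

```r
set.seed(1)
a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
b <- rep(1:3, 3)
c <- c(rep(c("i", "j", "k", "l"), 2), "o")
d <- data.frame(a, b, c, stringsAsFactors = FALSE)

d_shuffled <- d[sample(nrow(d)), ]          # deliberately destroy the row order

mx <- matrix(NA_character_, nrow = max(d_shuffled$a), ncol = max(d_shuffled$b))
mx[cbind(d_shuffled$a, d_shuffled$b)] <- d_shuffled$c   # each (a, b) pair picks one cell

identical(mx, matrix(c, 3, byrow = TRUE))   # same result as the row-major shortcut
```

Any (a, b) combination missing from the data frame would simply stay NA in mx.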
I initially have a matrix, p:
# p is a matrix
p
A B
[1,] 1 1
[2,] 2 3
[3,] 3 2
[4,] 1 1
[5,] 8 2
For a given matrix, I want to iterate through the rows and remove any inversions, so that the new matrix is:
p
A B
[1,] 1 1
[2,] 2 3
[3,] 8 2
This is what I got:
p<-unique(p) # gets rid of duplicates
output<-lapply(p, function(x){
check<-which(p$A[x,] %in% p$B[x,])#is the value in row x of column A found in
#column B if so return the row number it was found in column B
if (length(check)!=0 ){
if(p$A[check,]== p$B[x]){ # now check if at the found row (check)of p$A is equal to p$B[x]
p<-p[-check,] #if so remove that inverse
}
}
}
)
I get this message Error in which(p$A[x] %in% p$B[x]) :
Why am I getting this Error?
Is there a better way to find inversions?
Try
p <- unique(p)
# a separator in collapse keeps pairs such as (1, 12) and (11, 2) from producing the same key
p[!duplicated(apply(p, 1, function(x) paste(sort(x), collapse='_'))),]
# A B
#[1,] 1 1
#[2,] 2 3
#[3,] 8 2
data
p <- matrix(c(1,2,3,1,8, 1,3,2,1,2),
dimnames=list(NULL, c("A", "B")), ncol=2)
It's not clear whether the order of values is important in your final output, but perhaps you can make use of pmin and pmax.
Here's an approach using those functions within "data.table":
library(data.table)
unique(as.data.table(p)[, list(A = pmin(A, B), B = pmax(A, B))])
# A B
# 1: 1 1
# 2: 2 3
# 3: 2 8
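If the original orientation of each row should be kept (8 2 rather than 2 8), the same pmin/pmax idea can be used only to build a comparison key, in base R. This is a sketch; the names key and res are made up here.

```r
p <- matrix(c(1, 2, 3, 1, 8,  1, 3, 2, 1, 2),
            dimnames = list(NULL, c("A", "B")), ncol = 2)

# order-free version of each pair: smaller value first, larger value second
key <- cbind(pmin(p[, "A"], p[, "B"]), pmax(p[, "A"], p[, "B"]))

# keep the first occurrence of each pair, with its original orientation
res <- p[!duplicated(key), , drop = FALSE]
res
```

On the example data this keeps rows 1, 2 and 5 of p, that is (1, 1), (2, 3) and (8, 2).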
The question is a bit unclear. Based on your example, I am assuming that you want to remove the row containing "3 2" because its first value occurs in the second column (in a different row). In that case
check <- which(p[,1] %in% p[,2])
returns the rows whose first value also appears somewhere in the second column. On the example data this flags more than the inverted row (rows 1 to 4 of the original p), so treat it as a list of candidates: a second check that the partner row holds the reversed pair is still needed before deleting anything.
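On the "why am I getting this error" part of the question: p is a matrix, and the $ operator only works on lists and data frames, so p$A fails (in an English locale the message is typically "$ operator is invalid for atomic vectors"). In addition, lapply(p, ...) visits every single cell of a matrix rather than every row. A small sketch that demonstrates both points; the name msg is made up here.

```r
p <- matrix(c(1, 2, 3, 1, 8,  1, 3, 2, 1, 2),
            dimnames = list(NULL, c("A", "B")), ncol = 2)

# `$` is meant for lists and data frames; on a matrix it raises an error
msg <- tryCatch(p$A, error = function(e) conditionMessage(e))
msg

p[, "A"]                      # columns of a matrix are selected with brackets
length(lapply(p, identity))   # 10: lapply iterates over the cells, not the 5 rows
```

To loop over rows, apply(p, 1, ...) or seq_len(nrow(p)) is the usual starting point.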