I have initially a matrix, p:
# p is a matrix
p
A B
[1,] 1 1
[2,] 2 3
[3,] 3 2
[4,] 1 1
[5,] 8 2
For a given matrix, I want to iterate through the rows and removing any inversions. So that the new matrix is:
p
A B
[1,] 1 1
[2,] 2 3
[3,] 8 2
This is what I got:
p<-unique(p) # gets rid of duplicates
output<-lapply(p, function(x){
check<-which(p$A[x,] %in% p$B[x,])#is the value in row x of column A found in
#column B if so return the row number it was found in column B
if (length(check)!=0 ){
if(p$A[check,]== p$B[x]){ # now check if at the found row (check)of p$A is equal to p$B[x]
p<-p[-check,] #if so remove that inverse
}
}
}
)
I get this message Error in which(p$A[x] %in% p$B[x]) :
Why am I getting this Error?
Is there a better way to find inversions?
Try
p <- unique(p)
p[!duplicated(apply(p, 1, function(x) paste(sort(x), collapse=''))),]
# A B
#[1,] 1 1
#[2,] 2 3
#[3,] 8 2
data
p <- matrix(c(1,2,3,1,8, 1,3,2,1,2),
dimnames=list(NULL, c("A", "B")), ncol=2)
It's not clear whether the order of values is important in your final output, but perhaps you can make use of pmin and pmax.
Here's an approach using those functions within "data.table":
library(data.table)
unique(as.data.table(p)[, list(A = pmin(A, B), B = pmax(A, B))])
# A B
# 1: 1 1
# 2: 2 3
# 3: 2 8
The question is a bit unclear. I am assuming based on your example that you want to remove the row containing "3 2" because first value occurs in the second column (in a different row). In that case
check <- which(p[,1] %in% p[,2])
should return the rows that you want to delete. Your second round of checking is not needed. You could just delete the rows returned.
Related
I understand what rowsum() does, but I'm trying to get it to work for myself. I've used the example provided in R which is structured as such:
x <- matrix(runif(100), ncol = 5)
group <- sample(1:8, 20, TRUE)
xsum <- rowsum(x, group)
What is the matrix of values that is produced by xsum and how are the values obtained. What I thought was happening was that the values obtained from group were going to be used to state how many entries from the matrix to use in a rowsum. For example, say that group = (2,4,3,1,5). What I thought this would mean is that the first two entries going by row would be selected as the first entry to xsum. It appears as though this is not what is happening.
rowsum adds all rows that have the same group value. Let us take a simpler example.
m <- cbind(1:4, 5:8)
m
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
group <- c(1, 1, 2, 2)
rowsum(m, group)
## [,1] [,2]
## 1 3 11
## 2 7 15
Since the first two rows correspond to group 1 and the last 2 rows to group 2 it sums the first two rows giving the first row of the output and it sums the last 2 rows giving the second row of the output.
rbind(`1` = m[1, ] + m[2, ], `2` = m[3, ] + m[4, ])
## [,1] [,2]
## 1 3 11
## 2 7 15
That is the 3 is formed by adding the 1 from row 1 of m and the 2 of row 2 of m. The 11 is formed by adding 5 from row 1 of m and 6 from row 2 of m.
7 and 15 are formed similarly.
After several hours of searching, I am turning to your expertise. Beginner in R, I try to speed up my code. My goal is to replace the values in a matrix A. However, I want to replace values based on two vectors of another matrix B. B[, 1] is the name of row i of the matrix A. The second column, B[, 2] corresponds to the name of column of the matrix A.
The first version of my code was to use the match function in a loop.
for(k in 1:L){
i <- B[k,1]
j <- B[k,2]
d <- match(i,rownames(A))
e <- match(j,colnames(A))
A[d, e] <- 0
}
The second version allowed me to speed a little bit:
for( k in 1:L) {
A[match(B[k,1],rownames(A)), match(B[k,2],colnames(A))] <- 0
}
However, the processing time is long, too long. So I thought to use the apply function. For this, I have to use apply in each row vectors of B.
Is Using apply function a great way? Or I am going in the wrong way?
It appears to me that you can simply do A[B[, 1:2]] <- 0, by using the power of matrix indexing.
For example, A[cbind(1:4, 1:4)] <- 0 will replace A[1,1], A[2,2], A[3,3] and A[4,4] to 0. In fact, if A has "dimnames" attributes (the "rownames" and "colnames" you refer to), we can also use the character strings as index.
Reproducible example
A <- matrix(1:16, 4, 4, dimnames = list(letters[1:4], LETTERS[1:4]))
# A B C D
#a 1 5 9 13
#b 2 6 10 14
#c 3 7 11 15
#d 4 8 12 16
set.seed(0); B <- cbind(sample(letters[1:4])), sample(LETTERS[1:4]))
# [,1] [,2]
#[1,] "d" "D"
#[2,] "a" "A"
#[3,] "c" "B"
#[4,] "b" "C"
## since `B` has just 2 columns, we can use `B` rather than `B[, 1:2]`
A[B] <- 0
# A B C D
#a 0 5 9 13
#b 2 6 0 14
#c 3 0 11 15
#d 4 8 12 0
I know there are similar questions but I couldn't find an answer to my question. I'm trying to rank elements in a matrix and then extract data of 5 highest elements.
Here is my attempt.
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
start<-d[1,1]
for (i in 1:10) {
for (j in 1:10) {
if (start < d[i,j])
{high<-d[i,j]
rowind<-i
colind<-j
}
}
}
Although this gives me the data of the highest element, including row and column numbers, I can't think of a way to do the same for elements ranked from 2 to 5. I also tried
rank(d, ties.method="max")
But it wasn't helpful because it just spits out the rank in vector format.
What I ultimately want is a data frame (or any sort of table) that contains
rank, column name, row name, and the data(number) of highest 5 elements in matrix.
Edit
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
d[1,2]<-5
d[2,1]<-5
d[1,3]<-4
d[3,1]<-4
Thanks for the answers. Those perfectly worked for my purpose, but as I'm running this code for correlation chart -where there will be duplicate numbers for every pair- I want to count only one of the two numbers for ranking purpose. Is there any way to do this? Thanks.
Here's a very crude way:
DF = data.frame(row = c(row(d)), col = c(col(d)), v = c(d))
DF[order(DF$v, decreasing=TRUE), ][1:5, ]
row col v
91 1 10 2.208443
82 2 9 1.921899
3 3 1 1.785465
32 2 4 1.590146
33 3 4 1.556143
It would be nice to only have to partially sort, but in ?order, it looks like this option is only available for sort, not for order.
If the matrix has row and col names, it might be convenient to see them instead of numbers. Here's what I might do:
dimnames(d) <- list(letters[1:10], letters[1:10])
DF = data.frame(as.table(d))
DF[order(DF$Freq, decreasing=TRUE), ][1:5, ]
Var1 Var2 Freq
91 a j 2.208443
82 b i 1.921899
3 c a 1.785465
32 b d 1.590146
33 c d 1.556143
The column names don't make much sense here, unfortunately, but you can change them with names(DF) <- as usual.
Here is one option with Matrix
library(Matrix)
m1 <- summary(Matrix(d, sparse=TRUE))
head(m1[order(-m1[,3]),],5)
# i j x
#93 3 10 2.359634
#31 1 4 2.234804
#23 3 3 1.980956
#55 5 6 1.801341
#16 6 2 1.678989
Or use melt
library(reshape2)
m2 <- melt(d)
head(m2[order(-m2[,3]), ], 5)
Here is something quite simple in base R.
# set.seed(20)
# d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d.rank <- matrix(rank(-d), nrow = 10, ncol = 10)
which(d.rank <= 5, arr.ind=TRUE)
row col
[1,] 3 1
[2,] 2 4
[3,] 3 4
[4,] 2 9
[5,] 1 10
d[d.rank <= 5]
[1] 1.785465 1.590146 1.556143 1.921899 2.208443
Results can (easily) be made clearer (see comment from Frank):
cbind(which(d.rank <= 5, arr.ind=TRUE), v = d[d.rank <= 5], rank = rank(-d[d.rank <= 5]))
row col v rank
[1,] 3 1 1.785465 3
[2,] 2 4 1.590146 4
[3,] 3 4 1.556143 5
[4,] 2 9 1.921899 2
[5,] 1 10 2.208443 1
I have a matrix mat.
mat<-matrix(
c('a','a','b','a','b','b'),
nrow=3, ncol=2)
I want to make a vector of the count matches in each row of the matrix. For example, let's say I wanted to count the number of matches of the letter a in each row. The first row of the matrix has an a,a: two matches of a. The second row of the matrix has an a,b: one match of a.
I can count the number of matches of the character a in a row with this line of code:
sum(!is.na(charmatch(mat[1,c(1,2)],"a"))) # first row, returns 2
sum(!is.na(charmatch(mat[2,c(1,2)],"a"))) # second row, returns 1
I want to vectorize this counting procedure. In other words, I want to do something like this
as.vector(rowsum(!is.na(charmatch(mat[,c(1,2)], "a"))))
So that it returns a vector like this 2,1,0 which means 2 matches of a in row 1 of the matrix, 1 match of a in row 2 of the matrix, 0 matches of a in row 3 of the matrix.
You can just do
rowSums(mat=='a', na.rm=TRUE)
#[1] 2 1 0
For all unique values
Un <- sort(unique(c(mat)))
res <- sapply(Map(`==`, list(mat), Un), rowSums, na.rm=TRUE)
colnames(res) <- Un
res
# a b
#[1,] 2 0
#[2,] 1 1
#[3,] 0 2
Or as contributed by #Ananda Mahto, a faster approach would be
lvl <- sort(unique(c(mat)))
vapply(lvl, function(x) rowSums(mat == x, na.rm = TRUE), numeric(nrow(mat)))
If you wanted to do this for all values, you can try one of the following:
table with factor in apply
levs <- unique(c(mat))
t(apply(mat, 1, function(x) table(factor(x, levs))))
# a b
# [1,] 2 0
# [2,] 1 1
# [3,] 0 2
melt and dcast with fun.aggregate = length from "reshape2"
library(reshape2)
dcast(melt(mat), Var1 ~ value, value.var = "Var2")
# Aggregation function missing: defaulting to length
# Var1 a b
# 1 1 2 0
# 2 2 1 1
# 3 3 0 2
Better yet would just be table after manually creating the values to tabulate:
table(rep(sequence(nrow(mat)), ncol(mat)), c(mat))
#
# a b
# 1 2 0
# 2 1 1
# 3 0 2
What is the simplest way that one can swap the order of a selected subset of columns in a data frame in R. The answers I have seen (Is it possible to swap columns around in a data frame using R?) use all indices / column names for this. If one has, say, 100 columns and need either: 1) to swap column 99 with column 1, or 2) move column 99 before column 1 (but keeping column 1 now as column 2) the suggested approaches appear cumbersome. Funny there is no small package around for this (Wickham's "reshape" ?) - or can one suggest a simple code ?
If you really want a shortcut for this, you could write a couple of simple functions, such as the following.
To swap the position of two columns:
swapcols <- function(x, col1, col2) {
if(is.character(col1)) col1 <- match(col1, colnames(x))
if(is.character(col2)) col2 <- match(col2, colnames(x))
if(any(is.na(c(col1, col2)))) stop("One or both columns don't exist.")
i <- seq_len(ncol(x))
i[col1] <- col2
i[col2] <- col1
x[, i]
}
To move a column from one position to another:
movecol <- function(x, col, to.pos) {
if(is.character(col)) col <- match(col, colnames(x))
if(is.na(col)) stop("Column doesn't exist.")
if(to.pos > ncol(x) | to.pos < 1) stop("Invalid position.")
x[, append(seq_len(ncol(x))[-col], col, to.pos - 1)]
}
And here are examples of each:
(m <- matrix(1:12, ncol=4, dimnames=list(NULL, letters[1:4])))
# a b c d
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
swapcols(m, col1=1, col2=3) # using column indices
# c b a d
# [1,] 7 4 1 10
# [2,] 8 5 2 11
# [3,] 9 6 3 12
swapcols(m, 'd', 'a') # or using column names
# d b c a
# [1,] 10 4 7 1
# [2,] 11 5 8 2
# [3,] 12 6 9 3
movecol(m, col='a', to.pos=2)
# b a c d
# [1,] 4 1 7 10
# [2,] 5 2 8 11
# [3,] 6 3 9 12