I have a matrix with 5 columns and 4 rows. I also have a vector with 3 elements. I want to subtract the values in the vector from columns 3, 4 and 5 respectively in each row of the matrix.
b <- matrix(rep(1:20), nrow=4, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
c <- c(5,6,7)
to get
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 4 7 10
[2,] 2 6 5 8 11
[3,] 3 7 6 9 12
[4,] 4 8 7 10 13
This is exactly what sweep was made for:
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- sweep(b[,3:5], 2, x)
b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13
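Or, without subsetting, pad x with zeros for the columns that should stay unchanged. This returns the adjusted matrix rather than modifying b, so apply it to the original b: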
sweep(b, 2, c(0,0,x))
Perhaps not that elegant, but
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- t(t(b[,3:5])-x)
should do the trick. We subset the matrix to change only the part we need, and we use t() (transpose) to flip it so that R's column-wise vector recycling lines each element of x up with the correct row of the transposed matrix, that is, the correct column of the original.
If you want to avoid the transpose, you could do something like
b[,3:5] <- b[,3:5]-x[col(b[,3:5])]
as well. Here we subset twice: col() on the second subset gives the column index of every cell, so x[col(...)] expands x to one value per cell, and both matrices are indexed in the same order.
I think my favorite from the question that @thelatemail linked was
b[,3:5] <- sweep(b[,3:5], 2, x, `-`)
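As a quick sanity check, here is a minimal base R sketch confirming that the sweep, transpose and col() variants produce the same 4x3 block. The names sub, via_sweep, via_transpose and via_col are introduced here purely for illustration.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)
sub <- b[, 3:5]                      # the part of b we want to adjust

via_sweep     <- sweep(sub, 2, x)    # subtract x[j] from column j
via_transpose <- t(t(sub) - x)       # recycle x down the transposed columns
via_col       <- sub - x[col(sub)]   # expand x to one value per cell

# All three should agree
stopifnot(isTRUE(all.equal(via_sweep, via_transpose)),
          isTRUE(all.equal(via_sweep, via_col)))
via_sweep[1, ]
# [1]  4  7 10
```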
Another way, with apply:
b[,3:5] <- t(apply(b[,3:5], 1, function(x) x-c))
A simple solution:
b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(5,6,7)
for(i in 1:nrow(b)) {
b[i,3:5] <- b[i,3:5] - c
}
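The loop can also be written without iteration. The sketch below is an alternative not taken from the answers here: it repeats each element of x once per row with rep(x, each = nrow(b)), which matches the column-major order in which R stores the matrix. It uses x rather than c for the vector to avoid masking the base function c().

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

# rep(x, each = 4) is 5 5 5 5 6 6 6 6 7 7 7 7, one value per cell of b[, 3:5]
b[, 3:5] <- b[, 3:5] - rep(x, each = nrow(b))
b[1, ]
# [1]  1  5  4  7 10
```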
This can be done with the rray package in a very satisfying way (using its (numpy-like) broadcasting - operator %b-%):
#install.packages("rray")
library(rray)
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5, 6, 7)
b[, 3:5] <- b[, 3:5] %b-% matrix(x, 1)
b
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 5 4 7 10
#> [2,] 2 6 5 8 11
#> [3,] 3 7 6 9 12
#> [4,] 4 8 7 10 13
For large matrices this is even faster than sweep:
#install.packages("bench")
res <- bench::press(
size = c(10, 1000, 10000),
frac_selected = c(0.1, 0.5, 1),
{
B <- matrix(sample(size*size), nrow=size, ncol=size)
B2 <- B
x <- sample(size, size=ceiling(size*frac_selected))
idx <- sample(size, size=ceiling(size*frac_selected))
bench::mark(rray = {B2[, idx] <- B[, idx, drop = FALSE] %b-% matrix(x, nrow = 1); B2},
sweep = {B2[, idx] <- sweep(B[, idx, drop = FALSE], MARGIN = 2, x); B2}
)
}
)
plot(res)
The pattern list looks like:
pattern <- c('aaa','bbb','ccc','ddd')
X comes from df and looks like:
df$X <- c('aaa-053','aaa-001','aab','bbb')
What I tried to do: use agrep to find the matching name in pattern for each value of df$X, then assign a value to an existing column 'column2' based on the result. For example, if 'aaa-053' matches 'aaa', then 'aaa' should be the value in 'column2'; if nothing matches, the value in that column should be NA.
for (i in 1:length(pattern)) {
match <- agrep(pattern, df$X, ignore.case=TRUE, max=0)
if agrep = TRUE {
df$column2 <- pattern
} else {df$column2 <- na
}
}
Ideal column2 in df looks like:
'aaa','aaa',NA,'bbb'
agrep by itself isn't going to give you much to determine which to use when multiples match. For instance,
agrep(pattern[1], df$X)
# [1] 1 2 3
which makes sense for the first two, but the third is not among your expected values. Similarly, it's feasible that it might select multiple patterns for a given string.
Here's an alternative:
D <- adist(pattern, df$X, fixed = FALSE)
D
# [,1] [,2] [,3] [,4]
# [1,] 0 0 1 3
# [2,] 3 3 2 0
# [3,] 3 3 3 3
# [4,] 3 3 3 3
D[D > 0] <- NA
D
# [,1] [,2] [,3] [,4]
# [1,] 0 0 NA NA
# [2,] NA NA NA 0
# [3,] NA NA NA NA
# [4,] NA NA NA NA
apply(D, 2, function(z) which.min(z)[1])
# [1] 1 1 NA 2
pattern[apply(D, 2, function(z) which.min(z)[1])]
# [1] "aaa" "aaa" NA "bbb"
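Putting those steps together, a minimal sketch of filling column2 might look like the following. The data frame is rebuilt here from the question's example, and idx is a name introduced for illustration.

```r
pattern <- c("aaa", "bbb", "ccc", "ddd")
df <- data.frame(X = c("aaa-053", "aaa-001", "aab", "bbb"),
                 stringsAsFactors = FALSE)

# Partial-match edit distance: rows are patterns, columns are values of df$X
D <- adist(pattern, df$X, fixed = FALSE)
D[D > 0] <- NA                       # keep only exact partial matches

# For each value of df$X, the first pattern with distance 0 (NA if none)
idx <- apply(D, 2, function(z) which.min(z)[1])
df$column2 <- pattern[idx]
df$column2
# [1] "aaa" "aaa" NA    "bbb"
```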
I have the following R matrix, which is a combination of 2x3 and 3x3 submatrices. There can be more than 2 submatrices with different dimensions (e.g. m1xp, m2xp and m3xp, where each of m1, m2, m3 <= p).
A2 <- list(rbind(c(1,1,1),c(-1,1,-1)),
rbind(c(-1,1,1),c(1,-1,2),c(2,-1,2)))
library(Matrix)
A2 <- as.matrix(Matrix::bdiag(A2))
Rhs <- matrix(c(0,5,0.5,4),nrow = 4)
beta <- c(rep(1.2,3),c(0.5,0.2,0.1))
> A2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 0 0 0
[2,] -1 1 -1 0 0 0
[3,] 0 0 0 -1 1 1
[4,] 0 0 0 1 -1 2
[5,] 0 0 0 2 -1 2
I would like to get all combinations of row indices between the first sub-matrix and the second sub-matrix in order to solve a linear optimization problem. Each combination has to include rows from both sub-matrices. For each one I solve for a new beta and check whether the condition Aq %*% beta == Rhs is satisfied; if so, stop, and if not, take another combination. I think the following are all the row combinations between the sub-matrices:
A combination as one from the first sub-matrix and one from the second sub-matrix
Aq <- A2[c(1,3),]
Aq <- A2[c(1,4),]
Aq <- A2[c(1,5),]
Aq <- A2[c(2,3),]
Aq <- A2[c(2,4),]
Aq <- A2[c(2,5),]
Then, a combination as one from the first and 2 from the second matrix
Aq <- A2[c(1,3,4),]
Aq <- A2[c(1,3,5),]
Aq <- A2[c(1,4,5),]
Aq <- A2[c(2,3,4),]
Aq <- A2[c(2,3,5),]
Aq <- A2[c(2,4,5),]
Then, a combination as one from the first and 3 from the second matrix
Aq <- A2[c(1,3,4,5),]
Aq <- A2[c(2,3,4,5),]
Then, a combination as 2 from the first and one from the second matrix
Aq <- A2[c(1,2,3),]
Aq <- A2[c(1,2,4),]
Aq <- A2[c(1,2,5),]
Then, a combination as 2 from the first and 2 from the second matrix
Aq <- A2[c(1,2,3,4),]
Aq <- A2[c(1,2,3,5),]
Aq <- A2[c(1,2,4,5),]
Then, a combination as 2 from the first and 3 from the second matrix
Aq <- A2[c(1,2,3,4,5),]
Is there a better way to get all the combinations?
Then I would like to create a loop that chooses one of the above combinations at a time and checks the condition:
if (Aq %*% beta == Rhs) {
break
} else {
TAKE ANOTHER COMBINATION Aq
}
Please note I could have more than 2 submatrices making up the block matrix. In that case I have to create all row combinations across the first, second and third matrices. I am hoping there is an easy way to do this in R. I have tried the expand.grid function, but it is not giving me the desired output.
A possible base R approach:
indices1 <- 1:2
indices2 <- 3:5
apply(expand.grid(seq_along(indices1), seq_along(indices2)), 1,
function(x) t(apply(
expand.grid(combn(indices1, x[1], simplify=FALSE),
combn(indices2, x[2], simplify=FALSE)),
1, unlist)))
output:
[[1]]
Var1 Var2
[1,] 1 3
[2,] 2 3
[3,] 1 4
[4,] 2 4
[5,] 1 5
[6,] 2 5
[[2]]
Var11 Var12 Var2
[1,] 1 2 3
[2,] 1 2 4
[3,] 1 2 5
[[3]]
Var1 Var21 Var22
[1,] 1 3 4
[2,] 2 3 4
[3,] 1 3 5
[4,] 2 3 5
[5,] 1 4 5
[6,] 2 4 5
[[4]]
Var11 Var12 Var21 Var22
[1,] 1 2 3 4
[2,] 1 2 3 5
[3,] 1 2 4 5
[[5]]
Var1 Var21 Var22 Var23
[1,] 1 3 4 5
[2,] 2 3 4 5
[[6]]
Var11 Var12 Var21 Var22 Var23
[1,] 1 2 3 4 5
edit: adding a more general version:
#identifying the indices
indices <- split(seq_len(nrow(A2)), max.col(abs(A2) > 0, "first"))
#generating the combinations
apply(expand.grid(lapply(indices, seq_along)), 1L,
function(idx) {
t(apply(
expand.grid(
lapply(seq_along(idx),
function(k) {
combn(indices[[k]], idx[k], simplify=FALSE)
})),
1L, unlist))
})
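For the "try one combination at a time and stop" part of the question, it may be easier to work with a flat list of index vectors. The sketch below is one possible way to build it, not the method of the answer above. The helper subsets and the stopping rule accept are hypothetical: accept is a placeholder, and the real check would go in its place (for floating-point results, something like isTRUE(all.equal(...)) is usually safer than ==). Calling combn on positions rather than on the indices themselves sidesteps the case where a block has a single row, say row 5, and combn(5, 1) would be read as choosing from 1:5.

```r
# Row indices of each block, as in the example (rows 1-2 and rows 3-5)
indices <- list(1:2, 3:5)

# All non-empty subsets of one block, as a list of integer vectors
subsets <- function(v) {
  unlist(lapply(seq_along(v), function(k) {
    combn(length(v), k, FUN = function(i) v[i], simplify = FALSE)
  }), recursive = FALSE)
}
per_block <- lapply(indices, subsets)

# Pair every subset of one block with every subset of the others
grid <- expand.grid(lapply(per_block, seq_along))
combos <- lapply(seq_len(nrow(grid)), function(r) {
  j <- unlist(grid[r, ], use.names = FALSE)
  unlist(Map(function(s, jj) s[[jj]], per_block, j), use.names = FALSE)
})
length(combos)  # 3 subsets of block 1 times 7 subsets of block 2
# [1] 21

# Hypothetical stopping rule, for illustration only
accept <- function(rows) length(rows) == 3 && all(c(2, 5) %in% rows)

found <- NULL
for (rows in combos) {
  # Aq <- A2[rows, , drop = FALSE] would be built here for the real check
  if (accept(rows)) {
    found <- rows
    break
  }
}
```

Adding a third block only requires adding its row indices to the indices list.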
I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in% operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in% because the row vector c("A",3) is not a row vector in data.set. Using the %in% operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find them (even if they exist), I tried to write my own function to do it. It works for an input matrix of row vectors, but I'm wondering if any experts have proposed improvements:
is.tuple.in <- function(matrix1, matrix2){
# Apply rbind() so that matrix1 has columns even if it is a row vector.
matrix1 = rbind(matrix1)
if(ncol(matrix1) != ncol(matrix2)){
stop("Matrices must have the same number of columns.") }
# Now check for the first row and handle other rows recursively
row.vec = matrix1[1,]
tuple.found = FALSE
for(i in 1:nrow(matrix2)){
# If we find a match, then this row exists in matrix 2 and we can break the loop
if(all(row.vec == matrix2[i,])){
tuple.found = TRUE
break
}
}
# If there are more rows to be checked, use a recursive call
if(nrow(matrix1) > 1){
return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1),],matrix2)))
} else {
return(tuple.found)
}
}
I see a couple of problems with it that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage to do this because I pass matrix1[2:nrow(matrix1),] in the recursive call, which produces an error if matrix1 has one row. So instead of getting to a case where matrix1 is empty, I have an if condition at the end deciding if more iterations are necessary.
Second, I think the use of rbind() at the start is sloppy, but I needed it for when matrix1 had been reduced to a single row. Without using rbind(), ncol(matrix1) produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.
I'm wondering if you have made this a bit more complicated than it is. For example,
set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# gives me this
# [,1] [,2]
# 6 3 1
# 38 3 1
# 39 3 1
# 85 1 3
# 88 1 3
# 89 1 3
# 95 3 1
# 113 1 3
# ...
you could subset this further if you care about the order
or you could modify the function slightly:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2]
# 85 1 3
# 88 1 3
# 89 1 3
# 113 1 3
# 133 1 3
# 139 1 3
# 187 1 3
# ...
another example with a longer vector
set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# [,1] [,2] [,3] [,4]
# 57 2 5 1 4
# 147 1 5 2 4
# 279 1 2 5 4
# 303 1 5 2 4
# 437 1 5 4 2
# 443 1 4 5 2
# 580 5 4 2 1
# ...
I see a couple that match:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2] [,3] [,4]
# 443 1 4 5 2
# 901 1 4 5 2
# 1047 1 4 5 2
but only three
for your single row case:
vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [1] 1 4 5 2
here is a simple function with the above code
is.tuplein <- function(vec, mat, exact = TRUE) {
rownames(mat) <- 1:nrow(mat)
if (exact)
tmp <- mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
return(tmp)
}
is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2
seems to work, so let's make our own %in% operator:
`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)
and try her out
set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 133 1 2 3
# 190 1 2 3
# 321 1 2 3
set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 48 2 3 1
# 64 2 3 1
# 71 1 3 2
# 73 3 1 2
# 108 3 1 2
# 112 1 3 2
# 133 1 2 3
# 166 2 1 3
Does this do what you want (even for more than 2 columns)?
paste(row.vec,collapse="_") %in% apply(data.set,1,paste,collapse="_")
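Wrapped up as the operator from the question, that idea might look like the sketch below. It returns one logical per row of the left-hand side, as %in% does per element. The separator "\037" is an assumption: it must be a character that never occurs in the data, otherwise different rows could collapse to the same key.

```r
`%tuple.in%` <- function(x, table) {
  # Treat a plain vector as a single row
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  stopifnot(ncol(x) == ncol(table))
  # Collapse each row into one string key, then compare keys
  key <- function(m) apply(m, 1, paste, collapse = "\037")
  key(x) %in% key(table)
}

data.set <- rbind(c("A", 1), c("B", 3), c("C", 2))
c("A", 3) %tuple.in% data.set
# [1] FALSE
rbind(c("A", 1), c("A", 3)) %tuple.in% data.set
# [1]  TRUE FALSE
```

Because the comparison goes through strings, numeric rows are matched by their printed representation, which is fine for integers but worth keeping in mind for doubles.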
Now I have a data set that looks like this:
> data
a b c d
[1,] 0.5943590 2.195610 0.5332164 1.3004142
[2,] 0.7635876 1.917823 0.9714945 1.3251010
[3,] 0.9942722 2.350122 1.2048159 1.1675700
[4,] 0.3736785 1.876318 0.9109197 0.8520509
And then I want to use a function for every two columns, for example,
F2<- function(x,y) (sum((x - y) ^ 2)) #define function
F2(data$a, data$b) #use function for first two columns
F2(data$a, data$c) #use function for first and third columns
F2(data$b, data$c) #use function for second and third columns
..................
How can I use the apply family to do this? Any help is greatly appreciated.
That's a job for combn:
#some data
set.seed(42)
m <- matrix(rnorm(16),4)
F2<- function(x,y) (sum((x - y) ^ 2))
res <- matrix(NA, ncol(m), ncol(m))
res[lower.tri(res)] <- combn(ncol(m), 2,
FUN=function(ind) F2(m[,ind[1]], m[,ind[2]]))
print(res)
# [,1] [,2] [,3] [,4]
# [1,] NA NA NA NA
# [2,] 2.992875 NA NA NA
# [3,] 4.293073 8.320698 NA NA
# [4,] 7.944818 6.484424 16.44946 NA
#for nicer printing
as.dist(res)
# 1 2 3
# 2 2.992875
# 3 4.293073 8.320698
# 4 7.944818 6.484424 16.449463
And of course, for this specific function you are better off using dist, which is optimized for this kind of distance calculation:
dist(t(m))^2
# 1 2 3
# 2 2.992875
# 3 4.293073 8.320698
# 4 7.944818 6.484424 16.449463
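The two results line up because combn enumerates column pairs in the same order in which a dist object stores its lower triangle: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). A minimal check, with pairwise and via_dist as illustrative names:

```r
set.seed(42)
m <- matrix(rnorm(16), 4)
F2 <- function(x, y) sum((x - y)^2)

# One value per pair of columns, in combn order
pairwise <- combn(ncol(m), 2,
                  FUN = function(ind) F2(m[, ind[1]], m[, ind[2]]))

# Squared Euclidean distances between columns, stripped to a plain vector
via_dist <- as.vector(dist(t(m))^2)

all.equal(pairwise, via_dist)
# [1] TRUE
```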