I have an n x 3 x m array, call it I. It has n rows (say n = 10), 3 columns, and m slices. For each slice, the third column must be recomputed from the other two columns of that slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(I, MARGIN = c(1,2,3), FUN = insertNewRows), but that just produces gibberish.
Thank you in advance!
IK
The question does not define the input, the transformation, or the expected result, so it can't be answered precisely. Here is an example that appends a row of ones to a[,,i] for each i, which may suggest how to solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping using matrix/aperm and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
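For the question's specific case (replacing the third column of every slice), a minimal sketch might look like the following. The body of insertNewRows() is not shown in the question, so the version here is a hypothetical stand-in that assumes the success probability is the product of the first two columns; the input array is made up as well. Note that apply() is itself a loop internally, so any speed-up over the for loop is likely to be modest.

```r
set.seed(1)
n <- 10
m <- 4
# Made-up input: columns 1 and 2 hold values in (0, 1); column 3 is a placeholder
I <- array(runif(n * 3 * m), dim = c(n, 3, m))

# Hypothetical stand-in for insertNewRows(): ASSUMES the probability is the
# product of the first two columns (the real rule is not given in the question)
insertNewRows <- function(slice) {
  p <- slice[, 1] * slice[, 2]
  slice[, 3] <- rbinom(nrow(slice), size = 1, prob = p)
  slice
}

# apply() over margin 3 flattens each returned matrix into one column,
# so array() is used to restore the n x 3 x m shape
I2 <- array(apply(I, 3, insertNewRows), dim = dim(I))

# Because rbinom() is vectorized, this particular stand-in could also be
# computed for all slices at once, without any loop
I3 <- I
I3[, 3, ] <- rbinom(n * m, size = 1, prob = I[, 1, ] * I[, 2, ])
```

If the real function can be expressed with vectorized operations like this, the second form is usually the one worth trying first.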
I've seen a few solutions to similar problems, but they all require iteration over the number of items to be added together.
Here's my goal: from a list of numbers, find all of the combinations (without replacement) that add up to a certain total. For example, if I have numbers 1,1,2,3,5 and total 5, it should return 5,2,3, and 1,1,3.
I was trying to use combn but it required you to specify the number of items in each combination. Is there a way to do it that allows for solution sets of any size?
This is precisely what combo/permuteGeneral from RcppAlgos (I am the author) were built for. Since we have repetition of specific elements in our sample vector, we will be finding combinations of multisets that meet our criteria. Note that this is different than the more common case of generating combinations with repetition where each element is allowed to be repeated m times. For many combination generating functions, multisets pose problems as duplicates are introduced and must be dealt with. This can become a bottleneck in your code if the size of your data is decently large. The functions in RcppAlgos handle these cases efficiently without creating any duplicate results. I should mention that there are a couple of other great libraries that handle multisets quite well: multicool and arrangements.
Moving on to the task at hand, we can utilize the constraint arguments of comboGeneral to find all combinations of our vector that meet a specific criterion:
vec <- c(1,1,2,3,5) ## using variables from @r2evans
uni <- unique(vec)
myRep <- rle(vec)$lengths
ans <- 5
library(RcppAlgos)
lapply(seq_along(uni), function(x) {
comboGeneral(uni, x, freqs = myRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = ans)
})
[[1]]
[,1]
[1,] 5
[[2]]
[,1] [,2]
[1,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 1 3
[[4]]
[,1] [,2] [,3] [,4] ## no solutions of length 4
These functions are highly optimized and extend well to larger cases. For example, consider the following example that would produce over 30 million combinations:
## N.B. Using R 4.0.0 with new updated RNG introduced in 3.6.0
set.seed(42)
bigVec <- sort(sample(1:30, 40, TRUE))
rle(bigVec)
Run Length Encoding
lengths: int [1:22] 2 1 2 3 4 1 1 1 2 1 ...
values : int [1:22] 1 2 3 4 5 7 8 9 10 11 ...
bigUni <- unique(bigVec)
bigRep <- rle(bigVec)$lengths
bigAns <- 199
len <- 12
comboCount(bigUni, len, freqs = bigRep)
[1] 32248100
All 280,018 results are returned very quickly:
system.time(bigTest <- comboGeneral(bigUni, len, freqs = bigRep,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = bigAns))
user system elapsed
0.273 0.004 0.271
head(bigTest)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 2 3 4 25 26 26 26 27 28 30
[2,] 1 1 2 3 5 24 26 26 26 27 28 30
[3,] 1 1 2 3 5 25 25 26 26 27 28 30
[4,] 1 1 2 3 7 24 24 26 26 27 28 30
[5,] 1 1 2 3 7 24 25 25 26 27 28 30
[6,] 1 1 2 3 7 24 25 26 26 26 28 30
nrow(bigTest)
[1] 280018
all(rowSums(bigTest) == bigAns)
[1] TRUE
Addendum
I must mention that when I see a problem like "finding all combinations that sum to a particular number", my first thought is usually integer partitions. For example, the related problem Getting all combinations which sum up to 100 in R is easily solved with the partitions library. However, this approach does not extend to the general case (as we have here) where the vector contains specific repetition, or where its values don't easily convert to integer equivalents. E.g. the vector (0.1, 0.2, 0.3, 0.4) can easily be treated as 1:4, but treating c(3.98486, 7.84692, 0.0038937, 7.4879) as integers and then applying an integer partitions approach would require an extravagant amount of computing power, rendering the method useless.
I took your combn idea and looped over the possible sizes of the sets.
func = function(x, total){
M = length(x)
y = NULL
for (m in 1:M){
tmp = combn(x, m)
ind = which(colSums(tmp) == total)
if (length(ind) > 0){
for (j in 1:length(ind))
y = c(y, list(tmp[,ind[j]]))
}
}
return (unique(lapply(y, sort)))
}
x = c(1,1,2,3,5,8,13)
> func(x, 15)
[[1]]
[1] 2 13
[[2]]
[1] 1 1 13
[[3]]
[1] 2 5 8
[[4]]
[1] 1 1 5 8
[[5]]
[1] 1 1 2 3 8
Obviously, this will have problems as M grows since tmp will get big pretty quickly and the length of y can't be (maybe?) pre-determined.
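One way around materializing every combination is a depth-first search that prunes branches as soon as the running sum overshoots. This is a different technique from the combn loop above, shown only as a sketch: subsetSums is a hypothetical helper name, and the code assumes strictly positive, integer-valued inputs (so that exact comparison with == is safe).

```r
# Hypothetical helper: all distinct sub-multisets of x that sum to total.
# ASSUMES x contains strictly positive, integer-valued numbers.
subsetSums <- function(x, total) {
  x <- sort(x)
  n <- length(x)
  out <- list()
  recurse <- function(start, remaining, chosen) {
    if (remaining == 0) {
      # target reached: record the current selection
      if (length(chosen) > 0) out[[length(out) + 1L]] <<- chosen
      return(invisible(NULL))
    }
    i <- start
    # x is sorted, so stop as soon as a value exceeds what is left
    while (i <= n && x[i] <= remaining) {
      # skip a value equal to its left neighbour at the same depth:
      # it would only reproduce a combination already explored
      if (i == start || x[i] != x[i - 1L]) {
        recurse(i + 1L, remaining - x[i], c(chosen, x[i]))
      }
      i <- i + 1L
    }
    invisible(NULL)
  }
  recurse(1L, total, numeric(0))
  out
}

res5 <- subsetSums(c(1, 1, 2, 3, 5), 5)
# res5 is list(c(1, 1, 3), c(2, 3), 5)
res15 <- subsetSums(c(1, 1, 2, 3, 5, 8, 13), 15)
# res15 has the same five combinations as listed above
```

The result list still grows one element at a time, but no intermediate matrix of all combinations is ever built.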
Similar to mickey's answer, we can use combn inside another looping mechanism. I'll use lapply:
vec <- c(1,1,2,3,5)
ans <- 5
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 5
# [[2]]
# [,1]
# [1,] 2
# [2,] 3
# [[3]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 3
You can omit the Filter(length, portion, though it may return a number of empty matrices. They're easy enough to deal with and ignore, I just thought removing them would be aesthetically preferred.
This method returns a matrix with one candidate combination per column, so repeated values in vec can produce duplicate columns:
ans <- 4
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v[, colSums(v) == ans, drop = FALSE]
}))
# [[1]]
# [,1] [,2]
# [1,] 1 1
# [2,] 3 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
If duplicates are a problem, you can always do:
Filter(length, lapply(seq_len(length(vec)),
function(i) {
v <- combn(vec, i)
v <- v[, colSums(v) == ans, drop = FALSE]
v[,!duplicated(t(v)),drop = FALSE]
}))
# [[1]]
# [,1]
# [1,] 1
# [2,] 3
# [[2]]
# [,1]
# [1,] 1
# [2,] 1
# [3,] 2
Now here is a solution involving gtools:
# Creating lists of all permutations of the vector x
df1 <- gtools::permutations(n=length(x),r=length(x),v=1:length(x),repeats.allowed=FALSE)
ls1 <- list()
for(j in 1:nrow(df1)) ls1[[j]] <- x[df1[j,1:ncol(df1)]]
# Taking all cumulative sums and filtering entries equaling our magic number
sumsCum <- t(vapply(1:length(ls1), function(j) cumsum(ls1[[j]]), numeric(length(x))))
indexMN <- which(sumsCum == magicNumber, arr.ind = T)
finalList <- list()
for(j in 1:nrow(indexMN)){
magicRow <- indexMN[j,1]
magicCol <- 1:indexMN[j,2]
finalList[[j]] <- ls1[[magicRow]][magicCol]
}
finalList <- unique(finalList)
where x = c(1,1,2,3,5) and magicNumber = 5. This is a first draft, I am sure it can be improved here and there.
Not the most efficient but the most compact so far:
x <- c(1,1,2,3,5)
n <- length(x)
res <- 5
unique(combn(c(x,rep(0,n-1)), n, function(x) x[x!=0][sum(x)==res], FALSE))[-1]
# [[1]]
# [1] 1 1 3
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 5
#
I want to apply a function over one margin (columns in my example) of a matrix. The problem is that the function returns a matrix, and apply flattens each result into a vector, so the overall result is a matrix. My goal is to get a three-dimensional array. Here is an example (note that matrix() is not the function of interest, just an illustration):
x <- matrix(1:12, 4, 3)
apply(x, 2, matrix, nrow = 2, ncol = 2)
The output is exactly the same as the input. I have a pretty dull solution to this:
library(abind)
library(magrittr) # provides %>%
abind2 <- function (x, ...)
abind(x, ..., along = length(dim(x)) + 1)
apply(x, 2, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
I believe there must exist something better than this. Something that does not include list()ing and unlist()ing columns.
Edit:
Also, the solution should be easily applicable to an array of any dimension with any choice of MARGIN, which my solution is not.
For this example, I want a 4-dimensional array to be returned.
x <- array(1:24, c(4,3,2))
apply(x, 2:3, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
Not that complicated at all. Simply use
array(x, dim = c(2, 2, ncol(x)))
Matrices and general arrays are stored column by column as one long 1D vector in memory, so you can simply reassign the dimensions.
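A short sketch of why this works, using the two inputs from the question (the names y, x3 and y3 are introduced here for illustration only). It relies on the function of interest being a pure reshape, as matrix() is in the example:

```r
x <- matrix(1:12, 4, 3)
y <- array(x, dim = c(2, 2, ncol(x)))

# Each slice y[, , j] is column j of x refilled column-wise into a 2 x 2 matrix
identical(y[, , 2], matrix(x[, 2], 2, 2))   # TRUE

# The same idea applied to the 3-d input: split the first dimension into 2 x 2
# and keep the remaining dimensions as they are
x3 <- array(1:24, c(4, 3, 2))
y3 <- array(x3, dim = c(2, 2, dim(x3)[-1]))
dim(y3)                                      # 2 2 3 2
identical(y3[, , 2, 1], matrix(x3[, 2, 1], 2, 2))   # TRUE
```

No data is rearranged here; only the dim attribute changes, which is why this does not help when the applied function actually computes something.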
OK, here is possibly what you want to do in general:
tapply(x, col(x), FUN = matrix, nrow = 2, ncol = 2)
#$`1`
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#
#$`2`
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
#
#$`3`
# [,1] [,2]
#[1,] 9 11
#[2,] 10 12
You can try to convert your matrix into a data.frame and use lapply to apply your function on the columns (as a data.frame is a list), it will return a list, where each element represents the function result for a column:
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
EDIT with the second definition of x:
x <- array(1:24, c(4,3,2))
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
# $V4
# [,1] [,2]
# [1,] 13 15
# [2,] 14 16
# $V5
# [,1] [,2]
# [1,] 17 19
# [2,] 18 20
# $V6
# [,1] [,2]
# [1,] 21 23
# [2,] 22 24
EDIT2: an attempt to get an array as the result
Based on this similar question, you may try this code:
x <- array(1:24, c(4,3,2))
sapply(1:3,
function(y) sapply(1:ncol(x[, y, ]),
function(z) matrix(x[,y,z], ncol=2, nrow=2),
simplify="array"),
simplify="array")
Dimension of the result is 2 2 2 3.
Actually, the problem here is that two different calls to apply are needed when x is an array with more than 2 dimensions. In the last example of the question (with x <- array(1:24, c(4,3,2))), we want to apply, to each element of the third dimension, a function that applies the matrix function to each element of the second dimension.
Here is minimal sample data to illustrate the matrix I want:
test <- list( c(1, 2, 3, 4) )
test2 <- list( c(2, 3) )
and my matrix should be:
2 4 6 8
3 6 9 12
It's like a nested for loop: I go over each row, take its value, and multiply it by each column value.
After a few hours I have this:
sapply(2, function(j) lapply(seq_along(test), function(i) test[[i]] * test2[[i]][j]))
it gives the final simulated row two: (param for row is '2' after sapply)
[[1]]
[1] 3 6 9 12
Iterating over the rows could be done with seq_along(test2), but I don't know how to save the data after each row. This was my last attempt, which failed:
a=matrix(data=0, nrow=2, ncol=4)
lapply(seq_along(test2), function(k) a[k,]<-unlist(sapply(2, function(j) lapply(seq_along(test), function(i) test[[i]] * test2[[i]][j])) ) )
output:
[1] 3 6 9 12
Later on, I would like to have more vectors in the input lists and repeat the whole procedure described above.
We can use outer after unlisting the list
t(outer(unlist(test), unlist(test2)))
# [,1] [,2] [,3] [,4]
#[1,] 2 4 6 8
#[2,] 3 6 9 12
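For the follow-up case with several vectors per list, one possible sketch pairs the lists element by element with Map. The second pair of vectors below is made up for illustration, and swapping the arguments of outer avoids the transpose:

```r
# Made-up inputs: the first pair is from the question, the second is hypothetical
test  <- list(c(1, 2, 3, 4), c(10, 20))
test2 <- list(c(2, 3), c(1, 2, 3))

# outer(v, u)[i, j] is v[i] * u[j], so rows follow test2 and columns follow test
res <- Map(function(u, v) outer(v, u), test, test2)

res[[1]]
#      [,1] [,2] [,3] [,4]
# [1,]    2    4    6    8
# [2,]    3    6    9   12
```

The result is a list with one matrix per pair, and the pairs may have different lengths.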
You mean matrix multiplication? Quick example:
> t(matrix(unlist(test)) %*% matrix(unlist(test2), nrow = 1))
[,1] [,2] [,3] [,4]
[1,] 2 4 6 8
[2,] 3 6 9 12
If I have an n-dimensional array, it can be indexed by an m x n matrix like this:
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:3,3),3)
# This will return the index a[1,1,1] a[2,2,2] and a[3,3,3]
a[b]
# Output
[1] 1 14 27
Is there an efficient and easy way to do a similar slice but keep some dimensions free?
That is, slice an n-dimensional array with an m x (n-i) matrix and
get an (i+1)-dimensional array as the result.
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:2,2),2)
# This will return a vector of the index a[1] a[2] a[1] and a[2]
a[b]
# Output
[1] 1 2 1 2
# This will return the indexes of the cartesian product between the vectors,
# that is a array consisting of a[1,,1] a[1,,2] a[2,,1] and a[2,,2]
a[c(1,2),,c(1,2)]
# Output
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
The desired result would be for the last command to return an array
with a[1,,1] and a[2,,2].
For now I solve this problem with a for loop and abind, but I'm sure there must be a better way.
# Desired functionality
a <- array(1:27,c(3,3,3))
b <- array(c(c(1,2),c(1,2)),c(2,2))
sliceem(a,b,freeDimension=2)
# Desired output (In this case rbind(a[1,,1],a[2,,2]) )
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 11 14 17
I think this is the cleanest way -- making a separate function:
slicem <- function(a,idx,drop=FALSE) do.call(`[`,c(list(a),idx,list(drop=drop)))
# usage for OP's example
a <- array(1:27, c(3,3,3))
idx <- list(1:2, TRUE, 1:2)
slicem(a,idx)
which gives
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
You have to write TRUE for each dimension that you aren't selecting from.
Following the OP's new expectations...
library(abind)
nistfun <- function(a,list_o_idx,drop=FALSE){
lens <- lengths(list_o_idx)
do.call(abind, lapply(seq.int(max(lens)), function(i)
slicem(a, mapply(`[`, list_o_idx, pmin(lens,i), SIMPLIFY=FALSE), drop=drop)
))
}
# usage for OP's new example
nistfun(a, idx)
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 4 7
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 11 14 17
Now, any non-TRUE indices must have the same length, since they will be matched up.
abind is used here instead of rbind (see an earlier edit on this answer) because it is the only sensible general way to think about slicing up an array. If you really want to drop dimensions, it's quite ambiguous which should be dropped and how, so the vector alone is returned:
nistfun(a, idx, drop=TRUE)
# [1] 1 4 7 11 14 17
If you want to throw this back into an array of some sort, you can do that after the fact:
matrix(nistfun(a, idx), max(lengths(idx)), dim(a)[sapply(idx, isTRUE)], byrow = TRUE)
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 11 14 17
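An alternative sketch stays with the matrix indexing shown in the question: expand the partial index so that every row of b is paired with every position along the free dimension, then index once. This is written for the 3-d example only (free dimension 2), and the names free and idx are introduced here for illustration:

```r
a <- array(1:27, c(3, 3, 3))
b <- cbind(1:2, 1:2)              # rows are (dim 1, dim 3) pairs: (1,1) and (2,2)
free <- seq_len(dim(a)[2])        # all positions along the free dimension

# Full index matrix: each row of b repeated once per free position
idx <- cbind(rep(b[, 1], times = length(free)),
             rep(free, each = nrow(b)),
             rep(b[, 2], times = length(free)))

# One row per row of b, one column per free position
res <- matrix(a[idx], nrow = nrow(b))
res
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]   11   14   17
```

This avoids abind and the per-slice calls, at the cost of building an index matrix with nrow(b) * length(free) rows.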
I want to go from something like this:
1> a = matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
1> a
[,1] [,2] [,3] [,4]
[1,] 1 2 4 1
[2,] 4 5 4 6
[3,] 2 2 3 7
[4,] 5 1 2 4
To something like this:
[,1] [,2]
[1,] 12 15
[2,] 10 16
...without using for-loops, plyr, or otherwise without looping. Possible? I'm trying to shrink a geographic lat/long dataset from 5 arc-minutes to half-degree, and I've got an ascii grid. A little function where I specify blocksize would be great. I've got hundreds of such files, so things that allow me to do it quickly without parallelization/supercomputers would be much appreciated.
You can use matrix multiplication for this.
# Computation matrix:
mat <- function(n, r) {
suppressWarnings(matrix(c(rep(1, r), rep(0, n)), n, n/r))
}
Square-matrix example, uses a matrix and its transpose on each side of a:
# Reduce a 4x4 matrix by a factor of 2:
x <- mat(4, 2)
x
## [,1] [,2]
## [1,] 1 0
## [2,] 1 0
## [3,] 0 1
## [4,] 0 1
t(x) %*% a %*% x
## [,1] [,2]
## [1,] 12 15
## [2,] 10 16
Non-square example:
b <- matrix(1:24, 4 ,6)
t(mat(4, 2)) %*% b %*% mat(6, 2)
## [,1] [,2] [,3]
## [1,] 14 46 78
## [2,] 22 54 86
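Since the question asks for a small function that takes the block size, here is one possible sketch. It swaps the matrix products for base R's rowsum(), applied first to the rows and then to the columns; blockSum is a hypothetical name, and the code assumes both dimensions are exact multiples of the block size:

```r
# Hypothetical helper: sum non-overlapping r x r blocks of a matrix.
# ASSUMES nrow(a) and ncol(a) are both divisible by r.
blockSum <- function(a, r) {
  stopifnot(nrow(a) %% r == 0, ncol(a) %% r == 0)
  rowBlocks <- (seq_len(nrow(a)) - 1L) %/% r   # block label for each row
  colBlocks <- (seq_len(ncol(a)) - 1L) %/% r   # block label for each column
  # rowsum() adds up rows that share a label; transpose to reuse it on columns
  unname(t(rowsum(t(rowsum(a, rowBlocks)), colBlocks)))
}

a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4), 4)
blockSum(a, 2)
#      [,1] [,2]
# [1,]   12   15
# [2,]   10   16

b <- matrix(1:24, 4, 6)
blockSum(b, 2)
#      [,1] [,2] [,3]
# [1,]   14   46   78
# [2,]   22   54   86
```

For a 5 arc-minute to half-degree reduction the block size would be 6, provided the grid dimensions divide evenly.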
tapply(a, list((row(a) + 1L) %/% 2L, (col(a) + 1L) %/% 2L), sum)
# 1 2
# 1 12 15
# 2 10 16
I used 1L and 2L instead of 1 and 2 so indices remain integers (as opposed to numerics) and it should run faster that way.
I guess this might help you, although it still uses sapply, which can be considered a loop-ish tool.
a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
block.step <- 2
res <- sapply(seq(1, ncol(a), by=block.step), function(y) # outer loop: column blocks
sapply(seq(1, nrow(a), by=block.step), function(x) # inner loop: row blocks
sum(a[x:(x+block.step-1), y:(y+block.step-1)])
)
)
res
Is this helpful at all?