I have a data frame in R that consists of two (numeric) columns, say A and B. I would like to construct the column consisting of the elements
A1+B1
(A1+B1)*A2+B2
((A1+B1)*A2+B2)*A3+B3
...
This probably is an easy one-liner, but I don't see it.
Edit: deleted the word vectorized from the title, since I'm basically just interested in any elegant solution (the dumb ones I can do myself). In F# - which I'm more familiar with - this would be something like the following (assuming the elements were in a list as tuples, which would be more idiomatic):
ABlist |> List.fold (fun acc (a,b) -> acc*a+b) 1
That is still something very short and clear. I'm bringing this up because I'm an R noob and unfamiliar with it, but I have read somewhere that it's a functional language, so I would guess a solution in terms of a fold over a data frame exists?
This is a different answer, driven by the failure of Bram's effort at using Reduce. It builds a list that holds the (A, B) pairs and then sets the initial value for the accumulator to 1 so that the first multiplication doesn't get zeroed out:
abList <- mapply(c, A, B, SIMPLIFY=FALSE)  # keeps as a list
Reduce( function(acc,x) {acc*x[1]+x[2]},
        abList, init=c(1,0),
        accumulate=TRUE)[-1]  # removes the initialization value
#--------
[[1]]
[1] 4 3
[[2]]
[1] 10 8
[[3]]
[1] 31 25
[[4]]
[1] 128 104
[[5]]
[1] 645 525
It might take some further work with sapply/lapply( ..., "[", 1) to pull out the accumulator:
> sapply( Reduce( function(acc,x) {acc*x[1]+x[2]},
+ abList,init=c(1,0),
+ accumulate=TRUE)[-1], "[", 1)
[1] 4 10 31 128 645
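For comparison, a minimal sketch that folds over indices instead of building a pair list (using the A and B consistent with the output above, i.e. A = c(1,2,3,4,5) and B = c(3,2,1,4,5)):
A <- c(1, 2, 3, 4, 5); B <- c(3, 2, 1, 4, 5)
# fold over positions; acc*A[i]+B[i] is the recurrence from the question
Reduce(function(acc, i) acc * A[i] + B[i], seq_along(A),
       init = 1, accumulate = TRUE)[-1]
# [1]   4  10  31 128 645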
OK, it turns out I was too lazy; I figured it out myself (note: before this there was already a good answer using Rcpp, but I can't use that at work). It's basically just a translation to R of what I wrote in my edit about how I would do this in F#:
a <- c(1,2,3)
b <- c(4,5,6)
abList <- list(a,b)
Reduce( function(acc,x) {acc*x[[1]]+x[[2]]},
        abList,
        accumulate=TRUE)
Does the trick. Edit: as per the comment below, it actually doesn't do the trick. If one builds abList by
abList <- apply(rbind(a,b),2,as.pairlist)
and then folds by:
Reduce(function(acc,x) {(acc*x[[1]])+x[[2]]},abList,1,accumulate=TRUE)
one gets the right answer (with a 1 prepended, because that's the initial value for the accumulator).
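Putting the corrected pieces together, here is a minimal end-to-end sketch (using Map to build the pairs, which should be equivalent to the apply/as.pairlist construction above, and dropping the prepended initial value):
a <- c(1, 2, 3)
b <- c(4, 5, 6)
abList <- Map(c, a, b)  # list of (a_i, b_i) pairs
Reduce(function(acc, x) acc * x[[1]] + x[[2]], abList, 1, accumulate = TRUE)[-1]
# [1]  5 15 51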
This is relatively straightforward in Rcpp, which won't have the performance problems you would see if you tried to implement this with loops in R.
library(Rcpp)
sum.prod <- cppFunction(
  "NumericVector sum_prod(NumericVector x, NumericVector y) {
     NumericVector result(x.size());
     result[0] = x[0] + y[0];
     for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
     return result;
   }")
sum.prod(c(1, 2, 3, 4, 5), c(3, 2, 1, 4, 5))
# [1] 4 10 31 128 645
I've found Rcpp to be the simplest way to speed up hard-to-vectorize computations.
X = 1; for ( i in seq(length(A) ) ) { X= B[i]+A[i]*X; print(X) }
[1] 4
[1] 10
[1] 31
[1] 128
[1] 645
If you want to accumulate rather than report:
X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]
#[1] 4 10 31 128 645
This will be ploddingly slow compared to the Rcpp solution, but if you need to do the compilation step on the fly, it's only when the lengths are more than 1000 that you might even notice the difference:
> A <- sample(1:10000, 1000); B <- sample(1:10000, 1000)
> system.time( {X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]})
user system elapsed
0.014 0.002 0.017
> library(Rcpp)
> system.time( {sum.prod <- cppFunction(
+ "NumericVector sum_prod(NumericVector x, NumericVector y) {
+ NumericVector result(x.size());
+ result[0] = x[0] + y[0];
+ for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
+ return result;
+ }")
+ sum.prod(A,B) } )
user system elapsed
0.012 0.002 0.014
I want to produce a vector starting at 1 and stopping at 900, where each step takes the last generated number, adds 1, multiplies the result by 1.55, and then rounds it to the nearest whole number.
That is the sequence should be:
First number: 1 (i.e., start at 1)
Second number: 3 (i.e., (1+1)*1.55)
Third number: 6 (i.e., (3+1)*1.55)
I have tried:
x0=1
seq(x0, 600, (+1*1.55))
A not-so-smart approach using a while loop:
stop = 900
new_num = 1
num = 1
while(new_num < stop) {
  new_num = round((new_num + 1) * 1.55)
  num = c(num, new_num)
}
head(num,-1)
# [1] 1 3 6 11 19 31 50 79 124 194 302 470 730
Reduce Solution
Here is a solution using Reduce
# framework
x0 <- 1
bound <- 900
r <- 1.55
estimate <- round(log(bound, r))
# iterations and filtering
vec <- Reduce(f = function(y,x) round((1+y)*r), x = 1:estimate,
init = x0, accumulate = TRUE)
vec <- vec[vec <= bound]
# result
[1] 1 3 6 11 19 31 50 79 124 194 302 470 730
Notes
As mentioned in the comments, to get whole numbers one must use round, floor, ceiling, etc.; I chose round. But the "true" values are not whole numbers.
I calculate an estimate to use in Reduce in order to have a rough idea of the size of the result: the sequence grows roughly geometrically with ratio r, so about log_r(bound) iterations suffice. That can surely be done more cleverly.
Rcpp Solution
Here is a solution using the Rcpp package (so the function is written in C++ and "transformed" into an R function). The function here uses a while loop, so it's pretty much @RonakShah's solution written in C++.
C++ file
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector nateCpp(int init, int bound) {
  int current = init;
  int outLength = 1;
  NumericVector out(outLength);
  out[0] = current;
  while ((int) round((1+current)*1.55) <= bound) {
    current = (int) round((1+current)*1.55);
    out.insert(out.length(), current);
  }
  return out;
}
R file
# sourcing the file to have the function
Rcpp::sourceCpp('MyCppFile.cpp')
nateCpp(1,900)
[1] 1 3 6 11 19 31 50 79 124 194 302 470 730
Benchmarking
Here are benchmarking figures:
Unit: microseconds
expr min lq mean median uq max neval cld
ronak(1, 10^5) 34.644 38.656 56.628269 40.844 52.878 25973.41 1e+05 c
nate(1, 10^5) 36.103 40.844 57.294825 43.032 53.243 26714.79 1e+05 c
darren(1, 10^5) 29.903 33.550 46.599951 35.374 41.209 12248.98 1e+05 b
nateCpp(1, 10^5) 2.553 4.012 6.578349 4.741 5.471 10963.14 1e+05 a
Unsurprisingly, the C++ solution is by far the fastest. Among the R-only solutions, @DarrenTsai's recursion is the fastest (recursion should be used with caution though; problems such as too many nested expressions might arise - in this case it should be fine, as the numbers grow rapidly), followed by @RonakShah's and my function.
The functions
ronak <- function(x0,stop) {
  new_num = 1
  num = 1
  while(new_num < stop) {
    new_num = round((new_num + 1) * 1.55)
    num = c(num, new_num)
  }
  head(num,-1)
}
nate <- function(x0, bound) {
  r <- 1.55
  estimate <- round(log(bound, r))
  vec <- Reduce(f = function(y,x) round((1+y)*r), x = 1:estimate, init = x0, accumulate = TRUE)
  vec <- vec[vec <= bound]
  vec
}
darren <- function(start, end){
  n <- length(start) ; last <- start[n]
  if(last >= end)
    return(start[-n])
  else
    darren(c(start, round((last + 1) * 1.55)), end)
}
A recursive solution:
FUN <- function(start, end){
  n <- length(start) ; last <- start[n]
  if(last >= end)
    return(start[-n])
  else
    FUN(c(start, round((last + 1) * 1.55)), end)
}
FUN(1, 900)
# [1] 1 3 6 11 19 31 50 79 124 194 302 470 730
Consider the following function:
addAmount <- function(x, amount) {
  stopifnot(length(x) == 1)
  return(x + amount)
}
It can be used to add some amount to x:
> addAmount(x = 5, amount = 3)
[1] 8
> addAmount(x = 2, amount = 3)
[1] 5
However, x must be of length 1:
> addAmount(x = 7:9, amount = 3)
Error: length(x) == 1 is not TRUE
I added this restriction intentionally for exemplification.
Using Vectorize, it is possible to pass in a vector for x:
> Vectorize(addAmount)(x = 7:9, amount = 3)
[1] 10 11 12
So far, so good.
However, I'd like to turn my addAmount function into a "add 3" function, using currying:
add3 <- functional::Curry(addAmount, amount = 3)
This works as expected if x is of length 1 and fails (as expected) if x is not of length 1:
> add3(x = 5)
[1] 8
> add3(x = 7:9)
Error: length(x) == 1 is not TRUE
The problem is: add3 cannot be vectorized:
> Vectorize(add3)(x = 7:9)
Error: length(x) == 1 is not TRUE
Somehow, the curried function is not "compatible" with Vectorize, i.e. it behaves as if it had not been vectorized at all.
Question: What can I do about this? How can currying and vectorization be combined? (And: What is going wrong?)
I found a workaround (heavily inspired by Hadley's add function) using environments instead of Curry, but I'm looking for a cleaner solution that doesn't require this kind of clumsy "factory" function:
getAdder <- function(amount) {
  force(amount)
  addAmount <- function(x) {
    stopifnot(length(x) == 1)
    return(x + amount)
  }
  return(addAmount)
}
add3 <- getAdder(3)
Vectorize(add3)(x = 7:9)
[1] 10 11 12
Tested with R 3.4.1 and the functional package (version 0.6).
You can vectorize before currying:
add3 <- functional::Curry(Vectorize(addAmount), amount = 3)
add3(1:10)
[1] 4 5 6 7 8 9 10 11 12 13
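As for what is going wrong: my reading of the two implementations (an assumption worth verifying against the sources) is that functional::Curry returns a function whose only formal argument is ..., and Vectorize ignores ... when collecting argument names to vectorize over; with nothing left to vectorize, it simply returns the function unchanged:
add3 <- functional::Curry(addAmount, amount = 3)
formals(add3)                     # only `...` - nothing for Vectorize to target
identical(Vectorize(add3), add3)  # should be TRUE: Vectorize is a no-op here
This would also explain why vectorizing before currying works: at that point addAmount still has the named arguments x and amount for Vectorize to act on.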
I have a nonzero symmetric matrix 'matr' that is 12000x12000. I need to find the indices of the top 10000 elements in 'matr' in R. The code I have written takes a long time - I was wondering if there are any pointers to make it faster.
listk <- numeric(0)
for( i in 1:10000) {
  idx <- which(matr == max(matr), arr.ind=T)
  if( length(idx) != 0) {
    listk <- rbind( listk, idx[1,])
    matr[idx[1,1], idx[1,2]] <- 0
    matr[idx[2,1], idx[2,2]] <- 0
  }
}
Here's how you might find the indices (ij) of the 4 largest elements in a 10-by-10 matrix m.
## Sample data
m <- matrix(runif(100), ncol=10)
## Extract the indices of the 4 largest elements
(ij <- which(m >= sort(m, decreasing=T)[4], arr.ind=TRUE))
# row col
# [1,] 2 1
# [2,] 5 1
# [3,] 6 2
# [4,] 3 10
## Use the indices to extract the values
m[ij]
# [1] 0.9985190 0.9703268 0.9836373 0.9914510
Edit:
For large matrices, performing a partial sort will be a faster way to find the 10,000th largest element:
v <- runif(1e7)
system.time(a <- sort(v, decreasing=TRUE)[10000])
# user system elapsed
# 4.35 0.03 4.38
system.time(b <- -sort(-v, partial=10000)[10000])
# user system elapsed
# 0.60 0.09 0.69
a==b
# [1] TRUE
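To connect this back to the question's matrix, the partial-sort threshold can be combined with the which(..., arr.ind=TRUE) idiom from above (a sketch; ties at the threshold value may return slightly more than 10,000 indices):
thresh <- -sort(-matr, partial=10000)[10000]  # value of the 10,000th largest element
ij <- which(matr >= thresh, arr.ind=TRUE)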
I like @JoshO'Brien's answer; the use of partial sorting is great! Here's an Rcpp solution (I'm not a strong C++ programmer, so there are probably bone-headed errors; corrections welcome... how would I template this in Rcpp, to handle different types of input vector?)
I start by including the appropriate headers and using namespaces for convenience
#include <Rcpp.h>
#include <queue>
using namespace Rcpp;
using namespace std;
Then arrange to expose my C++ function to R
// [[Rcpp::export]]
IntegerVector top_i_pq(NumericVector v, int n)
and define some variables, most importantly a priority_queue holding the numeric value and its index as a pair. The queue is ordered so the smallest value is at the 'top', with 'smallest' determined by the standard pair<> comparator.
typedef pair<double, int> Elt;
priority_queue< Elt, vector<Elt>, greater<Elt> > pq;
vector<int> result;
Now I'll walk through the input data, adding it to the queue if either (a) I don't yet have enough values or (b) the current value is larger than the smallest value in the queue. In the latter case, I pop off the smallest value and insert its replacement. In this way the priority queue always contains the n largest elements.
for (int i = 0; i != v.size(); ++i) {
  if (pq.size() < n)
    pq.push(Elt(v[i], i));
  else {
    Elt elt = Elt(v[i], i);
    if (pq.top() < elt) {
      pq.pop();
      pq.push(elt);
    }
  }
}
And finally I pop the indexes from the priority queue into the return vector, remembering to translate to 1-based R coordinates.
result.reserve(pq.size());
while (!pq.empty()) {
  result.push_back(pq.top().second + 1);
  pq.pop();
}
and return the result to R
return wrap(result);
This has nice memory use (the priority queue and return vector are both small relative to the original data) and is fast
> library(Rcpp); sourceCpp("top_i_pq.cpp"); z <- runif(12000 * 12000)
> system.time(top_i_pq(z, 10000))
user system elapsed
0.992 0.000 0.998
Problems with this code include:
The default comparator greater<Elt> works so that, in the case of a tie spanning the value of the nth element, the last, rather than the first, duplicate is retained.
NA values (and non-finite values?) may not be handled correctly; I'm not sure whether this is true or not.
The function only works for NumericVector input, but the logic is appropriate for any R data type for which an appropriate ordering relationship is defined.
Problems 1 and 2 can likely be dealt with by writing an appropriate comparator; maybe for 2 this is already implemented in Rcpp? I don't know how to leverage C++ language features and the Rcpp design to avoid re-implementing the function for each data type I want to support.
Here's the full code:
#include <Rcpp.h>
#include <queue>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
IntegerVector top_i_pq(NumericVector v, int n)
{
    typedef pair<double, int> Elt;
    priority_queue< Elt, vector<Elt>, greater<Elt> > pq;
    vector<int> result;

    for (int i = 0; i != v.size(); ++i) {
        if (pq.size() < n)
            pq.push(Elt(v[i], i));
        else {
            Elt elt = Elt(v[i], i);
            if (pq.top() < elt) {
                pq.pop();
                pq.push(elt);
            }
        }
    }

    result.reserve(pq.size());
    while (!pq.empty()) {
        result.push_back(pq.top().second + 1);
        pq.pop();
    }

    return wrap(result);
}
A bit late to the party, but I came up with this, which avoids the sort.
Say you want the top 10k elements from your 12k x 12k matrix. The idea is to "clip" the data to the elements corresponding to a quantile of that size.
find_n_top_elements <- function( x, n ){
  # set the quantile corresponding to the n top elements
  quant <- n / (dim(x)[1]*dim(x)[2])
  # select the cutpoint to get the quantile above quant
  lvl <- quantile(x, probs=1.0-quant)
  # select (and return) the elements above the cutpoint
  x[x>lvl[[1]]]
}
#create a 12k x 12k matrix (1.1 GB!)
n <- 12000
x <- matrix( runif(n*n), ncol=n)
system.time( res <- find_n_top_elements( x, 10e3 ) )
Resulting in
system.time( res <- find_n_top_elements( x, 10e3 ) )
user system elapsed
3.47 0.42 3.89
For comparison, just sorting x on my system takes
system.time(sort(x))
user system elapsed
30.69 0.21 31.33
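Note that find_n_top_elements returns the values above the cutpoint, while the question asked for indices; a sketch of how the same cutpoint could be reused for that (with x and lvl computed as inside the function):
ij <- which(x > lvl[[1]], arr.ind=TRUE)  # roughly n (row, col) index pairs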
A matrix in R is stored like a vector, in column-major order, so the linear indices from order() can be converted back to (row, column) pairs with modular arithmetic:
mat <- matrix(sample(1:5000, 10000, rep=T), 100, 100)
mat.od <- order(mat, decreasing = T)  # linear indices, largest element first
mat.od.arr <- cbind(mat.od%%nrow(mat), mat.od%/%nrow(mat)+1)  # (row, column)
# fix up the boundary case where the linear index is an exact multiple of nrow
mat.od.arr[,2][mat.od.arr[,1]==0] <- mat.od.arr[,2][mat.od.arr[,1]==0] - 1
mat.od.arr[,1][mat.od.arr[,1]==0] <- nrow(mat)
head(mat.od.arr)
# [,1] [,2]
# [1,] 58 5
# [2,] 59 72
# [3,] 38 22
# [4,] 23 10
# [5,] 38 14
# [6,] 90 15
mat[58, 5]
# [1] 5000
mat[59, 72]
# [1] 5000
mat[38, 22]
# [1] 4999
mat[23, 10]
# [1] 4998
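For what it's worth, base R already has a function for this linear-to-array index conversion, arrayInd, which should reproduce the hand-rolled modular arithmetic above:
mat.od.arr2 <- arrayInd(mat.od, dim(mat))
head(mat.od.arr2)  # same (row, column) pairs as mat.od.arr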
I set up a 3-dimensional array of size 365x5x4.
x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
Now I would like to use a for loop to fill each element with a value.
Let's say the value of each element should be the sum of its row, column and depth indices.
I guess this is relatively easy.
Thanks! best, F
Using a simpler example so we can see what is being done
arr <- array(seq_len(3*3*3), dim = c(3, 3, 3))
the following code gives the requested output:
dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
arr[] <- rowSums(ind)
The above gives
> arr
, , 1
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 4 5 6
[3,] 5 6 7
, , 2
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 5 6 7
[3,] 6 7 8
, , 3
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 6 7 8
[3,] 7 8 9
> arr[1,1,1]
[1] 3
> arr[1,2,3]
[1] 6
> arr[3,3,3]
[1] 9
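The reason this works: expand.grid varies its first factor fastest, which matches the column-major order in which arr[] <- ... fills the array, so row i of ind holds exactly the (row, column, slice) indices of the i-th element of arr. A quick look at the grid:
head(ind, 4)
#   Var1 Var2 Var3
# 1    1    1    1
# 2    2    1    1
# 3    3    1    1
# 4    1    2    1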
Update: Using the example in @TimP's answer, here I update the answer to show how it can be done in a more R-like fashion.
Given
arr <- array(seq_len(3*3*3), dim = c(3, 3, 3))
Replace elements of arr with i + j + k unless k > 2, in which case j*k-i is used instead.
dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
## which k > 2
want <- ind[,3] > 2
arr[!want] <- rowSums(ind[!want, ])
arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
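An equivalent one-liner sketch for the same update, using ifelse on the index grid instead of two subset assignments:
arr[] <- with(ind, ifelse(Var3 > 2, Var2 * Var3 - Var1, Var1 + Var2 + Var3))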
Whilst it is tempting to stick with familiar idioms like looping (and, contrary to popular belief, loops are not inefficient in R), learning to think in a vectorised way will pay off many times over as you learn the language and start applying it to data analysis tasks.
Here are some timings on Fabian's example:
> x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+ for (i in seq_len(dim(x)[1])) {
+ for (j in seq_len(dim(x)[2])) {
+ for (k in seq_len(dim(x)[3])) {
+ val = i+j+k
+ if (k > 2) {
+ val = j*k-i
+ }
+ x[i,j,k] = val
+ }
+ }
+ }
+ })
user system elapsed
0.043 0.000 0.044
> arr <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+ dims <- dim(arr)
+ ind <- expand.grid(lapply(dims, seq_len))
+ ## which k > 2
+ want <- ind[,3] > 2
+ arr[!want] <- rowSums(ind[!want, ])
+ arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
user system elapsed
0.005 0.000 0.006
and for a much larger (for my ickle laptop at least!) problem
> x <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+ for (i in seq_len(dim(x)[1])) {
+ for (j in seq_len(dim(x)[2])) {
+ for (k in seq_len(dim(x)[3])) {
+ val = i+j+k
+ if (k > 2) {
+ val = j*k-i
+ }
+ x[i,j,k] = val
+ }
+ }
+ }
+ })
user system elapsed
51.759 0.129 53.090
> arr <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+ dims <- dim(arr)
+ ind <- expand.grid(lapply(dims, seq_len))
+ ## which k > 2
+ want <- ind[,3] > 2
+ arr[!want] <- rowSums(ind[!want, ])
+ arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
user system elapsed
2.282 1.036 3.397
but even that may be modest to small by today's standards. You can see that the looping becomes ever less competitive because of all the function calls required by that method.
Fabian: from the phrasing of your question, I believe you're just looking for a simple way of setting values in the array to follow any set of rules you might devise. No problem.
Your array is small (and from the context I strongly suspect you only want to use the code for something of that size). So good practice is simply to use a set of three for loops, which will run almost instantly - no need for any unnecessary complications. My code below shows an example: here we set element x[i,j,k] to be i+j+k, unless k>2, in which case we set it to be j*k-i instead.
Obviously, you can have as many rules as you want - just add an if statement for each one, and define val to be the value you want x[i,j,k] to take if that condition is true. (There are a few different ways to set this up, but this one seems the simplest to understand.) At the end of the innermost loop, x[i,j,k] gets set to the required value (val), and we then go on and do the next element until they're all done. That's it!
x = array(rep(1, 365*5*4), dim=c(365, 5, 4))
for (i in seq_len(dim(x)[1])) {
  for (j in seq_len(dim(x)[2])) {
    for (k in seq_len(dim(x)[3])) {
      val = i+j+k
      if (k > 2) {
        val = j*k-i
      }
      x[i,j,k] = val
    }
  }
}
Hope this helps :)
Quick update (non-loopy method): For completeness, if you're in a real hurry and want your code to run in 0.07 seconds rather than 0.19 seconds... you could also set things up in a vectorised way like this:
comb = expand.grid(seq_len(365), seq_len(5), seq_len(4))
i = comb$Var1; j = comb$Var2; k = comb$Var3
val = i+j+k
subs = which(k>2); val[subs] = (j*k-i)[subs]
x = array(val, dim = c(365, 5, 4))
In the above, the variables i, j and k are vectors with length 7300 (the number of cells in the array). As before, the default choice for val is the sum i+j+k, except on the subset k>2, where val is j*k-i instead - exactly the same as the example in the first part of my answer. Obviously the notation in this method is quite a bit harder to follow, which is why I thought it'd be better to show you the loop-based solution first. Hopefully you'll see how you could add in other conditions to the above, though. The final line maps the vector val over to the array x in the right way so that each x[i,j,k] takes on the correct value of val. Try it and see :)
One small point to note though: if you were ever to want to run this sort of algorithm on a massive array (much, much, much bigger than the one you have now), then the approach immediately above would definitely be the one to use to minimise runtime. For your case, my advice is to use whichever one you feel more comfortable with as the runtime isn't really an issue.
Cheers! :)