How to make an efficient combination of numbers in R?

I am trying to make a matrix of all combinations of 5 numbers between 1 and 100 (integers) that sum to 100. If I could set a min and max for each of the 5 numbers, that would be even better.
The easy way I have done it is with nested loops (shown here with 8 numbers):
portif <- NULL
for (a in min:max) {
  for (b in min:max) {
    for (c in min:max) {
      for (d in min:max) {
        for (e in min:max) {
          for (f in min:max) {
            for (g in min:max) {
              for (h in min:max) {
                port <- c(a, b, c, d, e, f, g, h)
                if (a + b + c + d + e + f + g + h == 100) portif <- rbind(portif, port)
              }
            }
          }
        }
      }
    }
  }
}
But I am pretty sure there is a better way in R than these pretty slow loops.
Edit :
- Yes, the order is important.
- It would be even better if I could set a different min and max for each of a, b, c, ...
Thanks a lot for your help

Get all choose(100, 5) = 75,287,520 combinations:
x <- combn(1L:100L, 5)
Compute the column sums and keep the columns that equal 100:
x[, colSums(x) == 100]
This results in 25,337 combinations, e.g.:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 90
[2,] 1 2 3 5 89
[3,] 1 2 3 6 88
[4,] 1 2 3 7 87
[5,] 1 2 3 8 86
...
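If different bounds per position are needed (as in the edit above), one possibility is to filter the result afterwards. A sketch; note that combn() returns each set once in increasing order, so the bounds here apply to the sorted tuple, and the lo/hi vectors are made-up examples:
y <- x[, colSums(x) == 100, drop = FALSE]
lo <- c(1, 1, 5, 5, 10)     # hypothetical per-position minima
hi <- c(50, 50, 60, 60, 90) # hypothetical per-position maxima
keep <- colSums(y >= lo & y <= hi) == 5L  # lo and hi recycle down each 5-row column
y[, keep, drop = FALSE]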

Dynamic programming might be faster for you, but harder to implement. Here's a recursive solution:
f <- function(min, max, cnt) {
  if (max < min) return(NULL)
  if (cnt == 1) return(max)
  do.call(rbind, lapply(min:max, function(i) {
    X <- f(min, max - i, cnt - 1)
    if (!is.null(X)) cbind(i, X)
  }))
}
To avoid including permutations of the same set, you can change the recursion to
X <- f(i + 1, max - i, cnt - 1)
Edit: to have a different min and max for each ply, you can make min and max vectors, then change the usage accordingly, e.g. min[cnt]; you may also want to swap the order to cbind(X, i) for sanity.
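A quick sanity check of f (the original, permutation-including version) on a small case:
x <- f(1, 10, 3)       # ordered 3-part compositions of 10, parts >= 1
nrow(x)                # choose(9, 2) = 36
all(rowSums(x) == 10)  # TRUE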

Thank you, both of your solutions are much faster.
I found another bit of code which also seems quite good:
library("partitions")
numparts <- 8
sumparts <- 20
weights <- compositions(n=sumparts, m=numparts, include.zero=TRUE)/sumparts
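For the original problem (5 parts between 1 and 100 that sum to 100, order mattering), the analogous call would presumably be the following; be warned that it enumerates choose(99, 4) = 3,764,376 compositions:
library(partitions)
x <- compositions(n = 100, m = 5, include.zero = FALSE)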

Related

Generating "Non-Random" Numbers in R?

I know how to generate 100 random numbers in R (without replacement):
random_numbers = sample.int(100, 100, replace = FALSE)
I was now curious about learning how to generate 100 "non-random" numbers (without replacement). The first idea that comes to mind is to generate a random number, where each next number is the previous number + 1 with probability 0.5, or a new random number with probability 0.5. Thus, these numbers are not "fully random".
This was my attempt to write this code for numbers in a range of 0 to 100 (suppose I want to repeat this procedure 100 times):
library(dplyr)
all_games <- vector("list", 100)
for (i in 1:100) {
  index_i = i
  guess_sets <- 1:100
  prob_i = runif(n = 1, min = 1e-12, max = .9999999999)
  guess_i = ifelse(prob_i > 0.5, sample.int(1, 100, replace = FALSE), guess_i + 1)
  guess_sets_i <- setdiff(guess_sets_i, guess_i)
  all_games_i = as.list(index_i, guess_i, all_games_i)
  all_games[[i]] <- all_games_i
}
all_games <- do.call("rbind", all_games)
I tried to make a list that stores all guesses such that the range for the next guess automatically excludes numbers that have already been guessed, but I get this error:
Error in sample.int(1, 100, replace = FALSE) :
cannot take a sample larger than the population when 'replace = FALSE'
Ideally, I am trying to get the following results (format doesn't matter):
index_1: 5,6,51,4,3,88,87,9 ...
index_2: 77,78,79,2,65,3,1,99,100,4...
etc.
Can someone please show me how to do this? Are there easier ways in R to generate "non-random numbers"?
Thank you!
Note: I think an extra line of logic needs to be added. Suppose I guess the number 100: after guessing 100 I must guess a new random number, since 100 + 1 is not included in the original range. Also, suppose I guess 5, then 17, then 4; after guessing 4, the rule tells me to guess 4 + 1, which is impossible because 5 has already been guessed. In such a case, would I also have to guess a new random number?
It would be tricky to make your algorithm very efficient in R... it doesn't lend itself nicely to vectorization. Here's how I'd write it directly as a for loop:
semirandom = function(n) {
  safe_sample = function(x, ...) {
    # sample(x) treats a length-1 numeric x as 1:x, so guard against that
    if (length(x) == 1) return(x)
    sample(x, ...)
  }
  result = numeric(n)
  result[1] = sample.int(n, size = 1)
  for (i in 2:length(result)) {
    if (runif(1) < .5 &&
        result[i - 1] < n &&
        !((result[i - 1] + 1) %in% result)) {
      result[i] = result[i - 1] + 1
    } else {
      result[i] = safe_sample(x = setdiff(1:n, result), size = 1)
    }
  }
  result
}
# generate 10 semirandom numbers 5 times
replicate(semirandom(10), n = 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 6 4 4 2 6
# [2,] 3 5 5 3 7
# [3,] 4 3 6 4 5
# [4,] 5 1 2 5 2
# [5,] 7 9 3 6 3
# [6,] 9 10 10 1 1
# [7,] 10 2 8 9 4
# [8,] 2 8 1 8 10
# [9,] 1 7 9 10 9
# [10,] 8 6 7 7 8
You get the error cannot take a sample larger than the population when 'replace = FALSE' because you attempt to extract 100 values from a vector of length one without replacement.
The following draws numbers between 1 and 100, drawing each number not more than once. At each step there is a 50 percent chance of drawing the previous number + 1 and a 50 percent chance of drawing another random number, provided the previous number + 1 has not been drawn yet; if it has already been drawn (or the previous number was 100), another random number is drawn with certainty.
i <- sample.int(100, 1)
j <- i
for (x in 1:99) {
  if ((i + 1L) %in% j) {
    i <- sample((1:100)[-j], 1L)
  } else {
    if (runif(1L) > 0.5 || i == 100L) {
      i <- sample((1:100)[-j], 1L)
    } else {
      i <- i + 1L
    }
  }
  j <- c(j, i)
}
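A quick check after the loop (j collects the whole sequence):
length(j)              # 100
all(sort(j) == 1:100)  # TRUE: each number drawn exactly once
mean(diff(j) == 1)     # a bit under 0.5: the share of previous + 1 steps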

Choose closest x elements by index in a list/vector

If I have a vector such as x <- c(1,2,3,4,5,6,7,8,9), I want a function f(vector, index, num) that takes the vector and gives me the num "closest" elements to the element at that index.
Examples:
f(x,3,4) = c(1,2,4,5)
f(x,1,5) = c(2,3,4,5,6)
f(x,8,3) = c(6,7,9)
Since num may be odd, we need to choose whether to take the extra element from the left or the right side; by symmetry, let's go with the left side (the right side is OK too),
i.e. f(x,4,5) = c(1,2,3,5,6) and f(x,7,3) = c(5,6,8)
I hope my question is clear, thank you for any help/responses!
Edit: the original vector c(1:9) is arbitrary; the vector could be a vector of strings, or a vector of length 1000 with shuffled numbers and repeats, e.g.
c(1,7,4,2,3,7,2,6,234,56,8)
num_closest_by_indices <- function(v, idx, num) {
  # Rank every index by its distance from idx
  i <- abs(seq_along(v) - idx)
  i[idx] <- +Inf # sentinel, so the element itself is never returned
  # Try the base case first, where idx is not within num/2 of an edge;
  # if there are not enough elements, incrementally widen the cutoff
  for (cutoff_idx in seq(floor(num / 2), num)) {
    if (sum(i <= cutoff_idx) >= num) {
      # Each widening can add two extra indices; strictly, for an even
      # count we should add the leftmost one first to break ties left
      return(v[i <= cutoff_idx])
    }
  }
}
Here's an illustration of this algorithm: we rank the indices in order of desirability, then pick the lowest num legal ones. With v = 1:9 and idx = 3:
> seq_along(v)
1 2 3 4 5 6 7 8 9
> seq_along(v) - idx
-2 -1 0 1 2 3 4 5 6
> i <- abs(seq_along(v) - idx)
2 1 0 1 2 3 4 5 6
> i[idx] <- +Inf # sentinel to prevent us returning the element itself
2 1 Inf 1 2 3 4 5 6
Now we can just find the num elements with the smallest values (breaking ties arbitrarily, unless you have a preference for the left).
Our first guess is all indices <= num/2; this might not be enough if idx is within num/2 of the start or end.
> i <= 2
TRUE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE
> v[i <= 2]
1 2 4 5
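With the function above, the examples from the question check out (subject to the tie-breaking caveat in the comment):
x <- 1:9
num_closest_by_indices(x, 3, 4)  # 1 2 4 5
num_closest_by_indices(x, 8, 3)  # 6 7 9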
So, adapting #dash2's code to handle the corner cases where some indices are illegal (nonpositive, or greater than length(vec)), i.e. not in 1:length(vec): min(elems) tells us how far below 1 the window has slipped, hence we must shift it up by 1 - min(elems).
Notes:
In the end the code is simpler and faster handling it as three piecewise cases.
It actually seems to simplify things if we pick n + 1 candidate indices, then remove elem itself (the setdiff below) before returning the answer.
Like so:
f <- function(vec, elem, n) {
  elems <- seq(elem - ceiling(n / 2), elem + floor(n / 2))
  if (max(elems) > length(vec)) elems <- elems - (max(elems) - length(vec))
  if (elems[1] < 1) elems <- elems + (1 - elems[1])
  elems <- setdiff(elems, elem)
  vec[elems]
}
Giving results:
> f(1:9, 1, 5)
[1] 2 3 4 5 6
> f(1:9, 9, 5)
[1] 4 5 6 7 8
> f(1:9, 2, 5)
[1] 1 3 4 5 6
> f(1:9, 4, 5)
[1] 1 2 3 5 6
> f(1:9, 4, 4)
[1] 2 3 5 6
> f(1:9, 2, 4)
[1] 1 3 4 5
> f(1:9, 1, 4)
[1] 2 3 4 5
> f(1:9, 9, 4)
[1] 5 6 7 8
Start the function with the variable argument x first, and the reference table and n after:
.nearest_n <- function(x, table, n) {
The algorithm assumes that table is numeric, without any duplicates, and all values finite; n has to be less than or equal to the length of the table
  ## assert & setup
  stopifnot(
    is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
    n <= length(table)
  )
Sort the table and then 'clamp' maximum and minimum values
  ## sort and clamp
  table <- c(-Inf, sort(table), Inf)
  len <- length(table)
Find the interval in table where x occurs; findInterval() uses an efficient search. Use the interval index as the initial lower index, and add 1 for the upper index, making sure to stay in-bounds.
  ## where to start?
  lower <- findInterval(x, table)
  upper <- min(lower + 1L, len)
Find the nearest n neighbors by comparing the distances from x to table[lower] and table[upper], record the nearer value, and decrement the lower or increment the upper index as appropriate, making sure to stay in-bounds:
  ## find
  nearest <- numeric(n)
  for (i in seq_len(n)) {
    if (abs(x - table[lower]) < abs(x - table[upper])) {
      nearest[i] = table[lower]
      lower = max(1L, lower - 1L)
    } else {
      nearest[i] = table[upper]
      upper = min(len, upper + 1L)
    }
  }
Then return the solution and finish the function:
  nearest
}
The code might seem verbose, but it is actually relatively efficient, because the only operations on the entire vector (sort() and findInterval()) are implemented efficiently in R.
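For example, a quick check of this scalar version, reusing the table from the example further down:
set.seed(123)
table <- sample(100, 10)
.nearest_n(30, table, 4)
# [1] 29 41 42 50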
A particular advantage of this approach is that it can be vectorized in its first argument, calculating the test for using lower (use_lower = ...) as a vector and using pmin() / pmax() as clamps.
.nearest_n <- function(x, table, n) {
  ## assert & setup
  stopifnot(
    is.numeric(table), !anyDuplicated(table), all(is.finite(table)),
    n <= length(table)
  )
  ## sort and clamp
  table <- c(-Inf, sort(table), Inf)
  len <- length(table)
  ## where to start?
  lower <- findInterval(x, table)
  upper <- pmin(lower + 1L, len)
  ## find
  nearest <- matrix(0, nrow = length(x), ncol = n)
  for (i in seq_len(n)) {
    use_lower <- abs(x - table[lower]) < abs(x - table[upper])
    nearest[, i] <- ifelse(use_lower, table[lower], table[upper])
    lower[use_lower] <- pmax(1L, lower[use_lower] - 1L)
    upper[!use_lower] <- pmin(len, upper[!use_lower] + 1L)
  }
  ## return
  nearest
}
For instance
> set.seed(123)
> table <- sample(100, 10)
> sort(table)
[1] 5 29 41 42 50 51 79 83 86 91
> .nearest_n(c(30, 20), table, 4)
[,1] [,2] [,3] [,4]
[1,] 29 41 42 50
[2,] 29 5 41 42
Generalize this by taking any argument and coercing it to the required form, using a reference look-up table table0 and indexes into it, table1:
nearest_n <- function(x, table, n) {
  ## coerce to common form
  table0 <- sort(unique(c(x, table)))
  x <- match(x, table0)
  table1 <- match(table, table0)
  ## find nearest
  m <- .nearest_n(x, table1, n)
  ## result in original form
  matrix(table0[m], nrow = nrow(m))
}
As an example...
> set.seed(123)
> table <- sample(c(letters, LETTERS), 30)
> nearest_n(c("M", "Z"), table, 5)
[,1] [,2] [,3] [,4] [,5]
[1,] "o" "L" "O" "l" "P"
[2,] "Z" "z" "Y" "y" "w"

R: How to do this matrix operation without loops or more efficient?

I'm trying to perform this operation on a matrix: multiply columns 2, 3 and 4 by the first column while keeping the first column's value, then multiply columns 3 and 4 by the (updated) second column, keep the third column's value and multiply the fourth column by the (updated) third. I want to do this without using a for loop; I wanted to use functions like sapply or mapply. Does anyone have an idea how to do it?
Example with one row:
a[1,1] * (a[1,2], a[1,3], a[1,4])  ->  2 4 4 4     # keep a[1,1]
a[1,2] * (a[1,3], a[1,4])          ->  2 4 16 16   # keep a[1,1], a[1,2]
a[1,3] * (a[1,4])                  ->  2 4 16 256  # keep a[1,1], a[1,2], a[1,3]
Input:
> a <- matrix(2,4,4) # or any other matrix, e.g. a <- matrix(c(1,8,10,1,4,1),3,3)
> a
[,1] [,2] [,3] [,4]
[1,] 2 2 2 2
[2,] 2 2 2 2
[3,] 2 2 2 2
[4,] 2 2 2 2
Output:
> a
[,1] [,2] [,3] [,4]
[1,] 2 4 16 256
[2,] 2 4 16 256
[3,] 2 4 16 256
[4,] 2 4 16 256
EDIT: LOOP VERSION
a <- matrix(2, 4, 4)
ai <- a[, 1, drop = FALSE]
b <- matrix(numeric(0), nrow(a), ncol(a) - 1)
for (i in 1:(ncol(a) - 1)) {
  a <- a[, 1] * a[, -1, drop = FALSE]
  b[, i] <- a[, 1]
}
b <- cbind(ai[, 1], b)
b
If I understand correctly, what you are trying to do is, starting with a matrix A with N columns, perform the following steps:
Step 1. Multiply columns 2 through N of A by column 1 of A. Call the resulting matrix A1.
Step 2. Multiply columns 3 through N of A1 by column 2 of A1. Call the resulting matrix A2.
...
Step (N-1). Multiply column N of A(N-2) by column (N-1) of A(N-2). This is the desired result.
If this is indeed what you are trying to do, you need to either write a double for loop (which you want to avoid, as you say) or come up with some iterative method of performing the above steps.
The double for way would look something like this
DoubleFor <- function(m) {
  res <- m
  for (i in 1:(ncol(res) - 1)) {
    for (j in (i + 1):ncol(res)) {
      res[, j] <- res[, i] * res[, j]
    }
  }
  res
}
Using R's vectorized operations, you can avoid the inner for loop
SingleFor <- function(m) {
  res <- m
  for (i in 1:(ncol(res) - 1))
    res[, (i + 1):ncol(res)] <- res[, i] * res[, (i + 1):ncol(res)]
  res
}
When it comes to iterating a procedure, you may want to define a recursive function, or use Reduce. The recursive function would be something like
RecursiveFun <- function(m, i = 1) {
  if (i == ncol(m)) return(m)
  n <- ncol(m)
  m[, (i + 1):n] <- m[, (i + 1):n] * m[, i]
  Recall(m, i + 1) # Thanks to #batiste for suggesting Recall()!
}
while Reduce would use a similar function without the recursion (which is provided by Reduce)
ReduceFun <- function(m) {
  Reduce(function(i, m) {
    n <- ncol(m)
    m[, (i + 1):n] <- m[, (i + 1):n] * m[, i]
    m
  }, c((ncol(m) - 1):1, list(m)), right = TRUE)
}
These will all produce the same result, e.g. testing on your matrix
a <- matrix(c(1, 8, 10, 1, 4, 1), 3, 3)
DoubleFor(a)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 8 32 2048
# [3,] 10 10 1000
all(DoubleFor(a) == SingleFor(a) & SingleFor(a) == RecursiveFun(a) &
RecursiveFun(a) == ReduceFun(a))
# [1] TRUE
Just out of curiosity, I did a quick speed comparison, but I don't think any one of the above will be significantly faster than the others for your size of matrices, so I would just go with the one you find most readable.
a <- matrix(rnorm(1e6), ncol = 1e3)
system.time(DoubleFor(a))
# user system elapsed
# 22.158 0.012 22.220
system.time(SingleFor(a))
# user system elapsed
# 27.349 0.004 27.415
system.time(RecursiveFun(a))
# user system elapsed
# 25.150 1.336 26.534
system.time(ReduceFun(a))
# user system elapsed
# 26.574 0.004 26.626

Counting column data in a matrix with resets

I'm gathering data on how much my cats poop into a matrix:
m <- cbind(fluffy=c(1.1,1.2,1.3,1.4),misterCuddles=c(0.9,NA,1.1,1.0))
row.names(m) <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
Which gives me this:
fluffy misterCuddles
2013-01-01 1.1 0.9
2013-01-02 1.2 NA
2013-01-03 1.3 1.1
2013-01-04 1.4 1.0
On every date, I'd like to know how many days in a row each cat has gone number 2. So the resulting matrix should look like this:
fluffy misterCuddles
2013-01-01 1 1
2013-01-02 2 0
2013-01-03 3 1
2013-01-04 4 2
Is there a way to do this efficiently? The cumsum function does something similar, but that's a primitive so I can't modify it to suit my dirty, dirty needs.
I could run a for loop and store a count like so:
m.output <- matrix(nrow = nrow(m), ncol = ncol(m))
for (column in 1:ncol(m)) {
  sum <- 0
  for (row in 1:nrow(m)) {
    if (is.na(m[row, column])) sum <- 0
    else sum <- sum + 1
    m.output[row, column] <- sum
  }
}
Is this the most efficient way to do this? I have a lot of cats, and I've recorded years' worth of poop data. Can I parallelize this by column somehow?
All of the answers here are actually too complicated (including my own, from earlier, copied below). The Reduce family of answers is just masking a for-loop in a single function call. I like Roland's and Ananda's, but both I think have a little too much going on.
Thus, here's a simple vectorized solution:
reset <- function(x) {
  # s holds each NA's position and 0 elsewhere, so cummax(s) is the
  # position of the most recent NA; subtracting it restarts the count
  s <- seq_along(x)
  s[!is.na(x)] <- 0
  seq_along(x) - cummax(s)
}
> apply(m, 2, reset)
fluffy misterCuddles
[1,] 1 1
[2,] 2 0
[3,] 3 1
[4,] 4 2
It also works on Roland's example:
m2 <- cbind(fluffy=c(NA,1.1,1.2,1.3,1.4,1.0,2),
misterCuddles=c(NA,1.3,2,NA,NA,1.1,NA))
> apply(m2, 2, reset)
fluffy misterCuddles
[1,] 0 0
[2,] 1 1
[3,] 2 2
[4,] 3 0
[5,] 4 0
[6,] 5 1
[7,] 6 0
From earlier: this is not vectorized, but also works:
pooprun <- function(x) {
  z <- numeric(length = length(x))
  count <- 0
  for (i in 1:length(x)) {
    if (is.na(x[i]))
      count <- 0
    else
      count <- count + 1
    z[i] <- count
  }
  return(z)
}
> apply(m, 2, pooprun)
fluffy misterCuddles
[1,] 1 1
[2,] 2 0
[3,] 3 1
[4,] 4 2
THE BENCHMARKING
Here I simply wrap everyone's answers in a function call (based on their name).
> library(microbenchmark)
> microbenchmark(alexis(), hadley(), thomas(), matthew(), thomasloop(), usobi(), ananda(), times=1000)
Unit: microseconds
expr min lq median uq max neval
alexis() 1.540 4.6200 5.3890 6.1590 372.185 1000
hadley() 87.755 92.758 94.298 96.6075 1767.012 1000
thomas() 92.373 99.6860 102.7655 106.6140 315.223 1000
matthew() 128.168 136.2505 139.7150 145.4880 5196.344 1000
thomasloop() 133.556 141.6390 145.1030 150.4920 84131.427 1000
usobi() 148.182 159.9210 164.7320 174.1620 5010.445 1000
ananda() 720.507 742.4460 763.6140 801.3335 5858.733 1000
And here are the results for Roland's example data:
> microbenchmark(alexis(), hadley(), thomas(), matthew(), thomasloop(), usobi(), ananda(), times=1000)
Unit: microseconds
expr min lq median uq max neval
alexis() 2.310 5.3890 6.1590 6.9290 75.438 1000
hadley() 75.053 78.902 80.058 83.136 1747.767 1000
thomas() 90.834 97.3770 100.2640 104.3050 358.329 1000
matthew() 139.715 149.7210 154.3405 161.2680 5084.728 1000
thomasloop() 144.718 155.4950 159.7280 167.4260 5182.103 1000
usobi() 177.048 188.5945 194.3680 210.9180 5360.306 1000
ananda() 705.881 729.9370 753.4150 778.8175 8226.936 1000
Note: Alexis's and Hadley's solutions took quite a while to actually define as functions on my machine, whereas the others work out-of-the-box, but Alexis's is otherwise the clear winner.
This should work. Note that each of your cats is an independent individual, so you can turn your data frame into a list and use mclapply (from the parallel package), which uses a parallel approach:
library(parallel)
count <- function(y, x) {
  if (is.na(x)) return(0)
  y + 1
}
oneCat <- m[, 1]
Reduce(count, oneCat, init = 0, accumulate = TRUE)[-1]
EDIT: here is the full answer:
count <- function(x, y) {
  if (is.na(y)) return(0)
  x + 1
}
mclapply(as.data.frame(m), Reduce, f = count, init = 0, accumulate = TRUE)
EDIT 2: The main remaining problem is that this leaves an extra 0 at the top of each column, so
result <- mclapply(as.data.frame(m), Reduce, f = count, init = 0, accumulate = TRUE)
finalResult <- do.call('cbind', result)[-1, ]
rownames(finalResult) <- rownames(m)
does the job.
Another option, similar to #Usobi's in that it uses Reduce, but with a slightly different approach:
apply(!is.na(m), 2, Reduce, f=function(x,y) if (y) x + y else y, accumulate=TRUE)
# fluffy misterCuddles
# [1,] 1 1
# [2,] 2 0
# [3,] 3 1
# [4,] 4 2
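Step by step for one column (misterCuddles), to see what the reducer does:
v <- unname(!is.na(m[, 2]))
v
# [1]  TRUE FALSE  TRUE  TRUE
Reduce(function(x, y) if (y) x + y else y, v, accumulate = TRUE)
# [1] 1 0 1 2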
I had saved a snippet from here that translates almost exactly for a problem like this:
countReset <- function(x) {
  x[!is.na(x)] <- 1
  y <- ave(x, rev(cumsum(rev(is.na(x)))), FUN = cumsum)
  y[is.na(y)] <- 0
  y
}
apply(m, 2, countReset)
# fluffy misterCuddles
# 2013-01-01 1 1
# 2013-01-02 2 0
# 2013-01-03 3 1
# 2013-01-04 4 2
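The trick is the grouping vector: rev(cumsum(rev(is.na(x)))) gives each run between NAs its own group id, so the cumsum restarts after every NA. For misterCuddles:
x <- m[, 2]                  # 0.9 NA 1.1 1.0
rev(cumsum(rev(is.na(x))))   # 1 1 0 0: the groups split at the NA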
Since I'm in a period where I'm trying to get used to .Call, here's another idea that seems to work and is probably fast. (Don't take my word for it, though, my skills are not trustworthy!)
library(inline) # use the "inline" package for convenience
f <- cfunction(sig = c(R_mat = "numeric", R_dims = "integer"), body = '
  R_len_t *dims = INTEGER(R_dims);
  R_len_t rows = dims[0], cols = dims[1];
  double *mat = REAL(R_mat);
  SEXP ans;
  PROTECT(ans = allocMatrix(INTSXP, rows, cols));
  R_len_t *pans = INTEGER(ans);
  for(int ic = 0; ic < cols; ic++)
  {
    pans[0 + ic*rows] = ISNA(mat[0 + ic*rows]) ? 0 : 1;
    for(int ir = 1; ir < rows; ir++)
    {
      if(ISNA(mat[ir + ic*rows]))
      {
        pans[ir + ic*rows] = 0;
      } else
      {
        if(!ISNA(mat[(ir - 1) + ic*rows]))
        {
          pans[ir + ic*rows] = pans[(ir - 1) + ic*rows] + 1;
        } else
        {
          pans[ir + ic*rows] = 1;
        }
      }
    }
  }
  UNPROTECT(1);
  return(ans);
')
f(m, dim(m))
# [,1] [,2]
#[1,] 1 1
#[2,] 2 0
#[3,] 3 1
#[4,] 4 2
f(mm, dim(mm)) #I named Roland's matrix, mm ; I felt that I had to pass this test!
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 2 2
#[4,] 3 0
#[5,] 4 0
#[6,] 5 1
#[7,] 6 0
So the solution to this problem has two parts:
A function that accepts a vector per cat and returns a vector telling me at each date, how many days since the last NA
A function that accepts an NxM matrix and returns an NxM matrix, applying function (1) to each column
For (2), I adapted this from #Usobi's answer:
library(parallel) # for mclapply
daysSinceLastNA <- function(matrix, vectorFunction, cores = 1) {
  listResult <- mclapply(as.data.frame(matrix), vectorFunction, mc.cores = cores)
  result <- do.call('cbind', listResult)
  rownames(result) <- rownames(matrix)
  result
}
For (1), I have two solutions:
#ananda-mahto's solution:
daysSinceLastNA_1 <- function(vector) {
  vector[!is.na(vector)] <- 1
  result <- ave(vector, rev(cumsum(rev(is.na(vector)))), FUN = cumsum)
  result[is.na(result)] <- 0
  result
}
#Usobi's solution:
daysSinceLastNA_2 <- function(vector) {
  reduction <- function(total, additional) ifelse(is.na(additional), 0, total + 1)
  Reduce(reduction, vector, init = 0, accumulate = TRUE)[-1]
}
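The test matrix used below is not shown in the original; a hypothetical stand-in matching the roughly 2500x2500 size mentioned at the end would be:
test <- matrix(rnorm(2500 * 2500), nrow = 2500)
test[sample(length(test), length(test) / 10)] <- NA  # sprinkle in some NAs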
Then I call them like this:
> system.time(result1 <- daysSinceLastNA (test, daysSinceLastNA_1 ))
user system elapsed
5.40 0.01 5.42
> system.time(result2 <- daysSinceLastNA (test, daysSinceLastNA_2 ))
user system elapsed
58.02 0.00 58.03
On my test dataset, which is roughly a 2500x2500 matrix, the first approach is an order of magnitude faster.
If I run on linux with 64 cores, solution (1) runs in 2 seconds, and solution (2) runs in 6 seconds.
For this sort of problem, which is easily solved with a for loop, I find Rcpp a very natural answer.
library(Rcpp)
cppFunction("NumericVector cumsum2(NumericVector x) {
int n = x.length();
NumericVector out(x);
for(int i = 0; i < n; ++i) {
if (NumericVector::is_na(x[i]) || i == 0) {
x[i] = 0;
} else {
x[i] = x[i - 1] + 1;
}
}
return out;
}")
The code requires a little more bookkeeping than the equivalent R code, but the bulk of the function is a very simple for loop.
You can then apply in R like any other vectorised function:
m2 <- cbind(
fluffy=c(NA,1.1,1.2,1.3,1.4,1.0,2),
misterCuddles=c(NA,1.3,2,NA,NA,1.1,NA)
)
apply(m2, 2, cumsum2)
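This reproduces the counts from the earlier answers:
#      fluffy misterCuddles
# [1,]      0             0
# [2,]      1             1
# [3,]      2             2
# [4,]      3             0
# [5,]      4             0
# [6,]      5             1
# [7,]      6             0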
You could of course make the C++ code iterate over the columns of the matrix, but I think that since this is already easily expressed in R, you might as well use the built in tools.

replace diagonal elements in an array

Does anyone know a neat/efficient way to replace diagonal elements in an array, similar to the use of diag(x) <- value for a matrix? In other words, something like this:
> m<-array(1:27,c(3,3,3))
> for(k in 1:3){
+ diag(m[,,k])<-5
+ }
> m
, , 1
[,1] [,2] [,3]
[1,] 5 4 7
[2,] 2 5 8
[3,] 3 6 5
, , 2
[,1] [,2] [,3]
[1,] 5 13 16
[2,] 11 5 17
[3,] 12 15 5
, , 3
[,1] [,2] [,3]
[1,] 5 22 25
[2,] 20 5 26
[3,] 21 24 5
but without the use of a for loop (my arrays are pretty large, and this manipulation will already be inside a loop).
Many thanks.
Try this:
with(expand.grid(a = 1:3, b = 1:3), replace(m, cbind(a, a, b), 5))
EDIT:
The question asked for neat/efficient but, of course, those are not the same thing. The one-liner here is compact and loop-free, but if you are looking for speed, I think you will find that the loop in the question is actually the fastest of all the answers.
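For reference, a sketch of the same idea for a general d x d x K array (assuming, as with diag(), that the first two dimensions are equal):
d <- dim(m)[1]
K <- dim(m)[3]
with(expand.grid(a = seq_len(d), b = seq_len(K)), replace(m, cbind(a, a, b), 5))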
You can use the following function for that, provided you have only 3 dimensions in your array. You can generalize to more dimensions based on this code, but I'm too lazy to do that for you ;-)
`arraydiag<-` <- function(x, value) {
  dims <- dim(x)
  # linear indices of the diagonal within the first slice
  id <- seq_len(dims[1]) + dims[2] * (seq_len(dims[2]) - 1)
  # offset that diagonal into every slice
  id <- outer(id, (seq_len(dims[3]) - 1) * prod(dims[1:2]), `+`)
  x[id] <- value
  dim(x) <- dims
  x
}
This works like :
m<-array(1:36,c(3,3,4))
arraydiag(m)<-NA
m
Note that, contrary to the diag() function, this function cannot deal with matrices that are not square. You can look at the source code of diag() to find out how to adapt this code so that it can.
diagArr <- function(dim) {
  n <- dim[2]
  if (dim[1] != n) stop("expecting first two dimensions to be equal")
  d <- seq(1, n * n, by = n + 1)
  as.vector(outer(d, seq(0, by = n * n, length.out = prod(dim[-1:-2])), "+"))
}
m[diagArr(dim(m))] <- 5
This is written with the intention that it works for dimensions higher than 3 but I haven't tested it in that case. Should be okay though.
