Filling a 3D matrix with values using for loops in R

I set up a 3-dimensional matrix of size 365x5x4.
x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
Now I would like to use a for loop to fill each element with a value.
Let's say the value of each element should be the sum of its row, column and depth indices.
I guess this is relatively easy.
Thanks! best, F

Using a simpler example so we can see what is being done:
arr <- array(seq_len(3*3*3), dim = c(3, 3, 3))
the following code gives the requested output:
dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
arr[] <- rowSums(ind)
The above gives
> arr
, , 1
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 4 5 6
[3,] 5 6 7
, , 2
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 5 6 7
[3,] 6 7 8
, , 3
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 6 7 8
[3,] 7 8 9
> arr[1,1,1]
[1] 3
> arr[1,2,3]
[1] 6
> arr[3,3,3]
[1] 9
Update: Using the example in @TimP's answer below, I have updated this answer to show how the same thing can be done in a more R-like fashion.
Given
arr <- array(seq_len(3*3*3), dim = c(3, 3, 3))
Replace elements of arr with i + j + k unless k > 2, in which case j*k-i is used instead.
dims <- dim(arr)
ind <- expand.grid(lapply(dims, seq_len))
## which k > 2
want <- ind[,3] > 2
arr[!want] <- rowSums(ind[!want, ])
arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
Whilst it is tempting to stick with familiar idioms like looping (and, contrary to popular belief, loops are not inherently inefficient in R), learning to think in a vectorised way will pay off many times over as you learn the language and start applying it to data analysis tasks.
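As an aside, for the simple i + j + k fill (without the k > 2 condition) base R's slice.index() offers another vectorised route. This is a small sketch of my own, not part of the original answers:
## slice.index(x, d) returns an array of the same shape whose entries are the
## index along dimension d, so summing over the three dimensions gives
## row + column + depth for every cell
x <- array(1, dim = c(365, 5, 4))
x[] <- slice.index(x, 1) + slice.index(x, 2) + slice.index(x, 3)
x[1, 2, 3]  # 1 + 2 + 3 = 6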
Here are some timings on Fabian's example:
> x <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+   for (i in seq_len(dim(x)[1])) {
+     for (j in seq_len(dim(x)[2])) {
+       for (k in seq_len(dim(x)[3])) {
+         val = i+j+k
+         if (k > 2) {
+           val = j*k-i
+         }
+         x[i,j,k] = val
+       }
+     }
+   }
+ })
user system elapsed
0.043 0.000 0.044
> arr <- array(rep(1, 365*5*4), dim=c(365, 5, 4))
> system.time({
+ dims <- dim(arr)
+ ind <- expand.grid(lapply(dims, seq_len))
+ ## which k > 2
+ want <- ind[,3] > 2
+ arr[!want] <- rowSums(ind[!want, ])
+ arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
user system elapsed
0.005 0.000 0.006
and for a much larger (for my ickle laptop at least!) problem
> x <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+   for (i in seq_len(dim(x)[1])) {
+     for (j in seq_len(dim(x)[2])) {
+       for (k in seq_len(dim(x)[3])) {
+         val = i+j+k
+         if (k > 2) {
+           val = j*k-i
+         }
+         x[i,j,k] = val
+       }
+     }
+   }
+ })
user system elapsed
51.759 0.129 53.090
> arr <- array(rep(1, 200*200*200), dim=c(200, 200, 200))
> system.time({
+ dims <- dim(arr)
+ ind <- expand.grid(lapply(dims, seq_len))
+ ## which k > 2
+ want <- ind[,3] > 2
+ arr[!want] <- rowSums(ind[!want, ])
+ arr[want] <- ind[want, 2] * ind[want, 3] - ind[want, 1]
+ })
user system elapsed
2.282 1.036 3.397
but even that may be modest to small by today's standards. You can see that the looping approach becomes ever less competitive because of all the function calls it requires.

Fabian: from the phrasing of your question, I believe you're just looking for a simple way of setting values in the array to follow any set of rules you might devise. No problem.
Your array is small (and from the context I strongly suspect you only want to use the code for something of that size). So good practice is simply to use a set of three for loops, which will run almost instantly - no need for any unnecessary complications. My code below shows an example: here we set element x[i,j,k] to be i+j+k, unless k>2, in which case we set it to be j*k-i instead.
Obviously, you can have as many rules as you want - just add an if statement for each one, and define val to be the value you want x[i,j,k] to take if that condition is true. (There are a few different ways to set this up, but this one seems the simplest to understand.) At the end of the innermost loop, x[i,j,k] gets set to the required value (val), and we then go on and do the next element until they're all done. That's it!
x = array(rep(1, 365*5*4), dim=c(365, 5, 4))
for (i in seq_len(dim(x)[1])) {
  for (j in seq_len(dim(x)[2])) {
    for (k in seq_len(dim(x)[3])) {
      val = i+j+k
      if (k > 2) {
        val = j*k-i
      }
      x[i,j,k] = val
    }
  }
}
Hope this helps :)
Quick update (non-loopy method): For completeness, if you're in a real hurry and want your code to run in 0.07 seconds rather than 0.19 seconds... you could also set things up in a vectorised way like this:
comb = expand.grid(seq_len(365), seq_len(5), seq_len(4))
i = comb$Var1; j = comb$Var2; k = comb$Var3
val = i+j+k
subs = which(k>2); val[subs] = (j*k-i)[subs]
x = array(val, dim = c(365, 5, 4))
In the above, the variables i, j and k are vectors with length 7300 (the number of cells in the array). As before, the default choice for val is the sum i+j+k except on the subset k>2, where val is j*k-i instead - exactly the same as the example in the first part of my answer. Obviously the notation in this method is quite a bit harder, which is why I thought it'd be better to show you the loop-based solution first. Hopefully you'll see how you could add other conditions to the above, though. The final line maps the vector val over to the array x in the right way so that each x[i,j,k] takes on the correct value of val. Try it and see :)
One small point to note though: if you were ever to want to run this sort of algorithm on a massive array (much, much, much bigger than the one you have now), then the approach immediately above would definitely be the one to use to minimise runtime. For your case, my advice is to use whichever one you feel more comfortable with as the runtime isn't really an issue.
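If you want to convince yourself that the two approaches agree, here is a quick check of my own (not part of the original answer) comparing the vectorised fill x against a fresh triple-loop fill:
x_loop <- array(1, dim = c(365, 5, 4))
for (i in 1:365) for (j in 1:5) for (k in 1:4) {
  val <- i + j + k
  if (k > 2) val <- j*k - i
  x_loop[i, j, k] <- val
}
all(x == x_loop)
#[1] TRUE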
Cheers! :)

Related

Faster ways to generate Yellowstone sequence (A098550) in R?

I just saw a YouTube video from Numberphile on the Yellowstone sequence (A098550). It's based on a sequence starting with 1 and 2, with subsequent terms generated by the rules:
1. no repeated terms
2. always pick the lowest integer
3. gcd(a_n, a_(n-1)) = 1
4. gcd(a_n, a_(n-2)) > 1
The first 15 terms would be: 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7
A quick-and-dirty approach in R could be something like this, but understandably it becomes very slow when you try to make longer sequences. It also makes some assumptions about the highest number possible within the sequence (for reference: the sequence of 10,000 items never goes higher than 5000).
What can we do to make this faster?
library(DescTools)
a <- c(1, 2, 3)
p <- length(a)
# all natural numbers
all_ints <- 1:5000
for (n in p:1000) {
  # rule 1 - remove all numbers that are in the sequence already
  next_a_set <- all_ints[which(!all_ints %in% a)]
  # rule 3 - search the remaining set for numbers that have gcd == 1
  next_a_option <- next_a_set[which(
    sapply(
      next_a_set,
      function(x) GCD(a[n], x)
    ) == 1
  )]
  # rule 4 - search the remaining numbers for gcd > 1
  next_a <- next_a_option[which(
    sapply(
      next_a_option,
      function(x) GCD(a[n - 1], x)
    ) > 1
  )]
  # select the lowest
  a <- c(a, min(next_a))
  n <- n + 1
}
Here's a version that's about 20 times faster than yours, with comments about the changes:
# Set a to the final length from the start.
a <- c(1, 2, 3, rep(NA, 997))
p <- 3
# Define a vectorized gcd() function. We'll be testing
# lots of gcds at once. This uses the Euclidean algorithm.
gcd <- function(x, y) { # vectorized gcd
  while (any(y != 0)) {
    x1 <- ifelse(y == 0, x, y)
    y <- ifelse(y == 0, 0, x %% y)
    x <- x1
  }
  x
}
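# (Added note, not part of the original answer: gcd() as defined above is
#  vectorised, e.g. gcd(12, c(8, 9, 30)) returns 4 3 6.)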
# Guess at a reasonably large vector to work from,
# but we'll grow it later if not big enough.
allnum <- 1:1000
# Keep a logical record of what has been used
used <- c(rep(TRUE, 3), rep(FALSE, length(allnum) - 3))
for (n in p:1000) {
  # rule 1 - remove all numbers that are in the sequence already
  # nothing to do -- used already records that.
  repeat {
    # rule 3 - search the remaining set for numbers that have gcd == 1
    keep <- !used & gcd(a[n], allnum) == 1
    # rule 4 - search the remaining numbers for gcd > 1
    keep <- keep & gcd(a[n-1], allnum) > 1
    # If we found anything, break out of this loop
    if (any(keep))
      break
    # Otherwise, make the set of possible values twice as big,
    # and try again
    allnum <- seq_len(2*length(allnum))
    used <- c(used, rep(FALSE, length(used)))
  }
  # select the lowest
  newval <- which.max(keep)
  # Assign into the appropriate place
  a[n+1] <- newval
  # Record that it has been used
  used[newval] <- TRUE
}
If you profile it, you'll see it spends most of its time in the gcd() function. You could probably make that a lot faster by redoing it in C or C++.
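For instance, a minimal Rcpp sketch of that idea (my own, using a hypothetical helper gcd_vec(), not code from the linked answer) could look like this:
library(Rcpp)
cppFunction("
IntegerVector gcd_vec(int a, IntegerVector v) {
  // gcd of a scalar against every element of v, Euclidean algorithm
  int n = v.size();
  IntegerVector out(n);
  for (int i = 0; i < n; i++) {
    int x = a, y = v[i];
    while (y != 0) {
      int r = x % y;
      x = y;
      y = r;
    }
    out[i] = x;
  }
  return out;
}
")
gcd_vec(12L, c(8L, 9L, 30L))
#> [1] 4 3 6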
The biggest change here is pre-allocation and restricting the search to numbers that have not yet been used.
library(numbers)
N <- 5e3
a <- integer(N)
a[1:3] <- 1:3
b <- logical(N) # which numbers have been used already?
b[1:3] <- TRUE
NN <- 1:N
system.time({
  for (n in 4:N) {
    a1 <- a[n - 1L]
    a2 <- a[n - 2L]
    for (k in NN[!b]) {
      if (GCD(k, a1) == 1L & GCD(k, a2) > 1L) {
        a[n] <- k
        b[k] <- TRUE
        break
      }
    }
    if (!a[n]) {
      a <- a[1:(n - 1L)]
      break
    }
  }
})
#> user system elapsed
#> 1.28 0.00 1.28
length(a)
#> [1] 1137
For a fast C++ algorithm, see here.

Loop calculation with previous value not using for in R

I'm a beginning R programmer. I'm having trouble with a loop calculation that uses a previous value, like a recursion.
An example of my data:
library(data.table)
dt <- data.table(a = c(0:4), b = c(0, 1, 2, 1, 3))
The calculated value y is y[n] = (y[n-1] + b[n]) * a[n], with initial value y[1] = 0.
I used a for loop, and the code and result are below.
dt$y <- 0
for (i in 2:nrow(dt)) {
dt$y[i] <- (dt$y[i - 1] + dt$b[i]) * dt$a[i]
}
a b y
1: 0 0 0
2: 1 1 1
3: 2 2 6
4: 3 1 21
5: 4 3 96
This result is what I want. However, my data has over 1,000,000 rows and several columns, so I'm trying to find another way that doesn't use a for loop. I tried to use Reduce(), but it only seems to work with a single vector (e.g. y[n] = y[n-1] + b[n]). As shown above, my function uses two vectors, a and b, so I can't find a solution.
Is there a more efficient way to be faster without using a for loop, such as using a recursive function or any good package functions?
This kind of computation cannot make use of R's advantage of vectorization because of the iterative dependencies. But the slow-down appears to really be coming from indexing performance on a data.frame or data.table.
Interestingly, I was able to speed up the loop considerably by accessing a, b, and y directly as numeric vectors (1000+ fold advantage for 2*10^5 rows) or as matrix "columns" (100+ fold advantage for 2*10^5 rows) versus as columns in a data.table or data.frame.
This old discussion may still shed some light on this rather surprising result: https://stat.ethz.ch/pipermail/r-help/2011-July/282666.html
Please note that I also made a different toy data.frame, so I could test a larger example without returning Inf as y grew with i:
Option data.frame (numeric vectors embedded in a data.frame or data.table per your example):
vec_length <- 200000
dt <- data.frame(a=seq(from=0, to=1, length.out = vec_length), b=seq(from=0, to=-1, length.out = vec_length), y=0)
system.time(for (i in 2:nrow(dt)) {
dt$y[i] <- (dt$y[i - 1] + dt$b[i]) * dt$a[i]
})
#user system elapsed
#79.39 146.30 225.78
#NOTE: Sorry, I didn't have the patience to let the data.table version finish for vec_length=2*10^5.
tail(dt$y)
#[1] -554.1953 -555.1842 -556.1758 -557.1702 -558.1674 -559.1674
Option vector (numeric vectors extracted in advance of loop):
vec_length <- 200000
dt <- data.frame(a=seq(from=0, to=1, length.out = vec_length), b=seq(from=0, to=-1, length.out = vec_length), y=0)
y <- as.numeric(dt$y)
a <- as.numeric(dt$a)
b <- as.numeric(dt$b)
system.time(for (i in 2:length(y)) {
y[i] <- (y[i - 1] + b[i]) * a[i]
})
#user system elapsed
#0.03 0.00 0.03
tail(y)
#[1] -554.1953 -555.1842 -556.1758 -557.1702 -558.1674 -559.1674
Option matrix (data.frame converted to matrix before loop):
vec_length <- 200000
dt <- as.matrix(data.frame(a=seq(from=0, to=1, length.out = vec_length), b=seq(from=0, to=-1, length.out = vec_length), y=0))
system.time(for (i in 2:nrow(dt)) {
dt[i, 3] <- (dt[i - 1, 3] + dt[i, 2]) * dt[i, 1]
})
#user system elapsed
#0.67 0.01 0.69
tail(dt[,3])
#[1] -554.1953 -555.1842 -556.1758 -557.1702 -558.1674 -559.1674
#NOTE: a matrix is actually a vector, but with an additional attribute (its "dim") that says how the "matrix" should be organized into rows and columns
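A tiny illustration of that note (my own addition, not from the original answer):
v <- 1:6
dim(v) <- c(2, 3)   # attaching a "dim" attribute turns the vector into a 2x3 matrix
is.matrix(v)
#[1] TRUE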
Option data.frame with matrix style indexing:
vec_length <- 200000
dt <- data.frame(a=seq(from=0, to=1, length.out = vec_length), b=seq(from=0, to=-1, length.out = vec_length), y=0)
system.time(for (i in 2:nrow(dt)) {
dt[i, 3] <- (dt[(i - 1), 3] + dt[i, 2]) * dt[i, 1]
})
#user system elapsed
#110.69 0.03 112.01
tail(dt[,3])
#[1] -554.1953 -555.1842 -556.1758 -557.1702 -558.1674 -559.1674
An option is to use Rcpp, since this recursive equation is easy to code in C++:
library(Rcpp)
cppFunction("
NumericVector func(NumericVector b, NumericVector a) {
  int len = b.size();
  NumericVector y(len);
  for (int i = 1; i < len; i++) {
    y[i] = (y[i-1] + b[i]) * a[i];
  }
  return(y);
}
")
func(c( 0, 1, 2, 1, 3), c(0:4))
#[1] 0 1 6 21 96
timing code:
vec_length <- 1e7
dt <- data.frame(a=1:vec_length, b=1:vec_length, y=0)
y <- as.numeric(dt$y)
a <- as.numeric(dt$a)
b <- as.numeric(dt$b)
system.time(for (i in 2:length(y)) {
y[i] <- (y[i - 1] + b[i]) * a[i]
})
# user system elapsed
# 19.22 0.06 19.44
system.time(func(b, a))
# user system elapsed
# 0.09 0.02 0.09
Here is a base R solution.
According to the information from @ThetaFC, one way to speed things up is to use a matrix or vector (rather than a data.frame or data.table). Thus, it is better to have the following preprocessing before calculating df$y, i.e.,
a <- as.numeric(df$a)
b <- as.numeric(df$b)
Then, you have two approaches to get df$y:
writing your customized recursion function
f <- function(k) {
  if (k == 1) return(0)
  c(f(k-1), (tail(f(k-1), 1) + b[k])*a[k])
}
df$y <- f(nrow(df))
Or a non-recursive function (I guess this will be much faster than the recursive approach):
g <- Vectorize(function(k) sum(rev(cumprod(rev(a[2:k])))*b[2:k]))
df$y <- g(seq(nrow(df)))
such that
> df
a b y
1 0 0 0
2 1 1 1
3 2 2 6
4 3 1 21
5 4 3 96
I don't think this will be any faster, but here's one way to do it without an explicit loop
dt[, y := purrr::accumulate2(a, b, function(last, a, b) (last + b)*a, .init = 0)[-1]]
dt
# a b y
# 1: 0 0 0
# 2: 1 1 1
# 3: 2 2 6
# 4: 3 1 21
# 5: 4 3 96
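As a footnote on the Reduce() remark in the question: Reduce() can handle the two-vector recursion if you accumulate over row indices rather than over a single data vector. A small base R sketch of my own (not from the answers above), using a separate copy dt2 of the example data:
dt2 <- data.frame(a = 0:4, b = c(0, 1, 2, 1, 3))
dt2$y <- Reduce(
  function(prev, i) (prev + dt2$b[i]) * dt2$a[i],  # y[i] = (y[i-1] + b[i]) * a[i]
  x = seq_len(nrow(dt2))[-1],                      # iterate over rows 2..n
  init = 0,                                        # y[1] = 0
  accumulate = TRUE
)
dt2$y
# [1]  0  1  6 21 96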

Summation of a sequence

If n(1) = 1, n(2) = 5, n(3) = 13, n(4) = 25, ...
I am using a for loop for the summation of these terms:
1 + (1*4 - 4) + (2*4 - 4) + (3*4 - 4) + ...
This is the function I am using with a for loop:
shapeArea <- function(n) {
  terms <- as.numeric(1)
  for (i in 1:n) {
    terms <- append(terms, (i*4 - 4))
  }
  sum(terms)
}
This works fine (as shown here):
> shapeArea(3)
[1] 13
> shapeArea(2)
[1] 5
> shapeArea(4)
[1] 25
Yet I was also wondering how I could do this without saving the terms of the series in the numeric vector terms. In other words, is there a way to find the sum of the terms without saving them in a vector first, or is this already the efficient way to do it?
Thanks
You can change your shapeArea function to a one-liner
shapeArea <- function(num) {
  1 + sum(seq(num) * 4) - (4 * num)
}
shapeArea(1)
#[1] 1
shapeArea(2)
#[1] 5
shapeArea(3)
#[1] 13
shapeArea(4)
#[1] 25
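For what it's worth (my own addition, not part of the original answer), the series also has a closed form: 1 + the sum of (4*i - 4) for i = 1..n simplifies to 2*n^2 - 2*n + 1, so the whole calculation reduces to arithmetic with no loop or vector at all:
shapeAreaClosed <- function(n) 2 * n^2 - 2 * n + 1  # closed form of 1 + sum(4*i - 4)
shapeAreaClosed(1:4)
#[1]  1  5 13 25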

How to combine two loops

In a tutorial on for() loops I came across the following exercise:
Exercise 4.4. Write a function to perform matrix-vector multiplication. It should take a matrix A and a vector b as arguments, and return the vector Ab. Use two loops to do this, rather than %*% or any vectorization.
Let's say I use a specific matrix A (the 3x3 example shown below) and a vector b of length 3.
> # Ex 4.4
> out <- c(1,1,1)
> Ab <- function(A, b) {
+   for (i in 1:dim(A)[1]) {
+     out[i] = sum(A[i,]*b)
+   }
+   out
+ }
> a = c(1,1,1)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 10
> a
[1] 1 1 1
> Ab(A,a)
[1] 12 15 19
This works for a very specific case, i.e. a matrix with 3 rows and a vector of length 3, but it leaves much to be desired. I don't know what a good solution to this exercise would be, but the question says to 'use two loops'. Suggestions will be much appreciated.
thx
You are hiding the inner loop with A[i,]*b, which is doing vectorized multiplication (i.e. a hidden loop). So, if you expand that out explicitly you will have the two required loops.
Ab <- function(A, b) {
  if (dim(A)[2] != NROW(b)) stop("wrong dimensions")
  out <- matrix(, nrow(A), 1)
  for (i in 1:dim(A)[1]) {
    s <- 0
    for (j in 1:dim(A)[2]) s <- s + A[i,j] * b[j]
    out[i] <- s
  }
  out
}
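A quick sanity check of my own (using the 3x3 matrix from the question) against the built-in operator:
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 10), nrow = 3)  # the matrix shown in the question
b <- c(1, 1, 1)
Ab(A, b)
#      [,1]
# [1,]   12
# [2,]   15
# [3,]   19
A %*% b   # the built-in matrix product gives the same result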

Looping through selected values in R

I want to iterate a loop only for some values so I am using this:
present <- c(3,5,7,8)
for (i in present)
{
  print(i)
}
which gives me
[1] 3
[1] 5
[1] 7
[1] 8
However, I need to jump to the next value within the loop; say I don't want 5 to be printed in the above example.
I cannot use next since I want it in a nested loop like this:
present <- c(3,5,7,8)
for (i in present)
{
  k <- i
  "Jump to next value of present"
  while (k < "The next value for i should come here")
  {
    k <- k + 1
    print(k)
  }
}
The output would be 3 4 5 6 7 8, but the condition must check whether the value of k exceeds the next value of i.
Is there any way to accomplish this?
I'll use C to explain further:
for (i = 0; i < 10; i++)
{
    for (k = i; k <= i+1; k++)
    {
        printf("%d", k);
    }
}
The link contains the output of the above code:
http://codepad.org/relkenY3
It is easy in C since the next value is in sequence, but here the next value is not known, hence the problem.
What you should do is loop through two vectors:
x <- head(present, -1)
# [1] 3 5 7
y <- tail(present, -1)
# [1] 5 7 8
and the function to do that is mapply (have a look at ?mapply). A close translation of your pseudo-code would be:
invisible(mapply(function(x, y) while(x < y) {x <- x + 1; print(x)}, x, y))
but maybe you'll find this more interesting:
mapply(seq, x + 1, y)
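If you prefer an explicit loop, here is a small sketch of my own that is equivalent to the mapply() translation above - it walks each value of present up to the following one:
present <- c(3, 5, 7, 8)
for (idx in seq_len(length(present) - 1)) {
  k   <- present[idx]
  nxt <- present[idx + 1]   # the "next value of present" the question asks about
  while (k < nxt) {
    k <- k + 1
    print(k)
  }
}
# prints 4 5 6 7 8, the same values the mapply() version produces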
I suspect the answer is to use seq_along and use it as an index into present, but as others have pointed out your code does not promise to deliver what you expect, even with that simple modification. The k <- k + 1 assignment jumps ahead too far to deliver a value of 3 at any point, and the termination condition is likewise not clear; it turns into an infinite loop in the form you construct. Work with this:
present <- c(3,5,7,8)
for (i in seq_along(present))
{
  k <- i
  while (k < length(present))
  {
    k <- k + 1
    print(present[k])
  }
}
