Applying a function to the edges of a multidimensional array in R - r

How best can I apply a function to the edges of a multidimensional array in R without hard-coding the number of dimensions in advance? In a two-dimensional array I could, for instance, write:
myarray[1,] = f(myarray[1,])
myarray[M,] = f(myarray[M,])
myarray[,1] = f(myarray[,1])
myarray[,N] = f(myarray[,N])
But what if I want a function that does this for an array of any dimension? In particular, how can I handle the indexing in a relatively painless way? (Assume that applying the function more than once at the corners is not a problem.)
If I flatten the array, I can do this, but I'd prefer a vectorized approach. Alternatively, I could hard-code this for arrays of every dimension up to some limit and fail on anything higher, but I'd prefer something prettier, if possible.

Here's a solution that should be able to handle an arbitrary number of dimensions. The basic idea is:
For each dimension, apply() is called with the function of choice.
Each result of that function is wrapped in a list of length one.
This makes apply() return a list of results for each dimension.
The first and last list items for each dimension are stored in the results list.
This would be very time-consuming for arrays with large dimensions and/or expensive functions, since the function is also applied to a potentially large number of slices whose results are never used. But it should allow for arbitrary functions and arbitrary results of those functions. Here it goes:
## Set up array
xx<-array(1:24,dim=c(1,2,3,4))
## Determine number of dimensions in array
ndim<-length(dim(xx))
## Set up results vector (a list)
myAns<-vector("list",ndim)
## Iterating over the number of dimensions, apply a function
for(ii in seq_len(ndim)){
tempAns<-apply(xx,ii,function(x)list(mean(x)))
## Store first and last results in myAns vector
## If result is length 1, only store the single result
if(length(tempAns)==1){
myAns[[ii]]<-tempAns
} else {
myAns[[ii]]<-c(head(tempAns,1),tail(tempAns,1))
}
}
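If the goal is to modify the edge slices in place, as in the two-dimensional example from the question, the indexing can also be built programmatically. The following is only a sketch under a few assumptions: apply_to_edges is a hypothetical helper name, f must return a result of the same shape as the slice it receives, and corners and shared edges are transformed more than once (which the question says is acceptable).

```r
# Hypothetical helper: apply f to the first and last slice along every dimension
apply_to_edges <- function(arr, f) {
  d <- dim(arr)
  for (k in seq_along(d)) {
    # unique() avoids applying f twice when a dimension has extent 1
    for (pos in unique(c(1L, d[k]))) {
      # Start with the full index range for every dimension, then pin dimension k
      idx <- lapply(d, seq_len)
      idx[[k]] <- pos
      # Extract the edge slice; drop = FALSE keeps all dimensions
      slice <- do.call(`[`, c(list(arr), idx, list(drop = FALSE)))
      # Write the transformed slice back into the same positions
      arr <- do.call(`[<-`, c(list(arr), idx, list(value = f(slice))))
    }
  }
  arr
}

# Usage: add 1 to every edge cell of a 3 x 3 x 3 array of zeros
res <- apply_to_edges(array(0, c(3, 3, 3)), function(x) x + 1)
```

Because the index list is rebuilt from dim(arr), the same function works for a matrix, a 3-D array, or anything higher.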

Did you have something like this in mind?
> ary <- array(1:27, c(3,3,3))
> apply(ary, MARGIN = 3, function(x) {
+ lastColMean <- mean(x[, ncol(x)])
+ lastRowMean <- mean(x[nrow(x), ])
+ data.frame(lastColMean, lastRowMean)
+ })
[[1]]
  lastColMean lastRowMean
1           8           6
[[2]]
  lastColMean lastRowMean
1          17          15
[[3]]
  lastColMean lastRowMean
1          26          24

Related

Storing matrix after every iteration

I have following code.
for(i in 1:100)
{
for(j in 1:100)
R[i,j]=gcm(i,j)
}
gcm() is a function that returns a number based on the values of i and j, so R ends up holding all the values. But this calculation takes a lot of time, and my machine lost power several times, so I had to start over. How can I save R somewhere after every iteration, to be safe? Any help is highly appreciated.
You can use the saveRDS() function to save the result of each calculation in a file.
To understand the difference between save and saveRDS, here is a link I found useful. http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/
If you want to save the R-workspace have a look at ?save or ?save.image (use the first to save a subset of your objects, the second one to save your workspace in toto).
Your edited code should look like
for(i in 1:100)
{
for(j in 1:100)
R[i,j]=gcm(i,j)
save.image(file="path/to/your/file.RData")
}
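Saving alone does not help unless the loop can also pick up where it left off. Here is a rough sketch of a resumable version using saveRDS() and readRDS(). The gcm() below is only a cheap stand-in for the real function, the checkpoint path is made up, and the sketch assumes gcm() never returns NA (NA is used to mark cells that have not been computed yet).

```r
# Stand-in for the expensive gcm() from the question (assumption)
gcm <- function(i, j) i * j

# Hypothetical checkpoint location; point this at a persistent path in practice
checkpoint <- file.path(tempdir(), "R_checkpoint.rds")

# Resume from the checkpoint if one exists, otherwise start with an empty matrix
R <- if (file.exists(checkpoint)) readRDS(checkpoint) else matrix(NA_real_, 100, 100)

for (i in 1:100) {
  # Skip rows that were fully computed in a previous run
  if (!anyNA(R[i, ])) next
  for (j in 1:100) R[i, j] <- gcm(i, j)
  # Persist progress after each finished row
  saveRDS(R, checkpoint)
}
```

Saving once per row rather than once per cell keeps the file-writing overhead small compared with the computation.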
As for your code taking a long time, I would suggest trying the apply function (see ?apply), which
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix
You want gcm to be run for each cell, which means you want to apply it to every combination of row and column coordinates
nR = 100; # number of rows
nC = 100; # number of columns
M = expand.grid(1:nR, 1:nC); # Cartesian product of the coordinates
# each row of M contains the indexes of one of R's cells
# head(M); # just to see it
# apply() hands each row of M to the function as a single vector (see ?apply for the details),
# so I create a wrapper which takes one row of M and tells gcm that the first element is the row index and the second is the column index
gcmWrapper = function(x) { return(gcm(x[1], x[2])); }
# run apply, which will return a vector containing *all* the evaluated results
R = apply(M, 1, gcmWrapper);
# re-shape R into a matrix (expand.grid varies the row index fastest, which matches column-wise filling)
R = matrix(R, nrow=nR, ncol=nC);
If the apply approach is still slow, consider the snowfall package, which lets you keep the apply approach while using parallel computing. An introduction to snowfall usage can be found in this pdf; look at pages 5 and 6 in particular.
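For what it's worth, the expand.grid() and apply() steps can also be written in one line with outer(). This is just a sketch: the gcm() here is a made-up, non-vectorized stand-in, and Vectorize() is needed because outer() calls the function once with whole vectors of indices rather than once per cell.

```r
# Made-up stand-in for gcm(); it only works on scalar i and j
gcm <- function(i, j) if (i > j) i - j else i + j

# Vectorize() wraps gcm so that it is evaluated once per (i, j) pair
R <- outer(1:100, 1:100, Vectorize(gcm))
dim(R)
```

Like apply(), this mainly tidies the code; the cost of evaluating gcm() for every cell stays the same.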

R add to a list in a loop, using conditions

I have a data.frame dim = (200,500)
I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However this is failing. Some pointers? (background is mainly python, not much of an R user)
Consider lapply(): when a data frame is passed to it, the function is run on each column, and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector which contains the minimal and maximal values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. Not surprising that it fails!
The problem is that R's function range and Python's function with the same name do completely different things. The closest equivalent of Python's range is called seq. For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
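A few lines you can paste into the console to see these differences for yourself. The seq_len() comparison at the end is an addition of mine: it is the safer choice for loop bounds, because 1:n counts downwards when n is 0.

```r
range(c(4, 10, 1))   # 1 10: the smallest and largest value, not a sequence
range(5)             # 5 5
seq(3, 5)            # 3 4 5
seq(1, 10, 2)        # 1 3 5 7 9
1:2 * 3              # 3 6, because ':' binds tighter than '*'
1:(2 * 3)            # 1 2 3 4 5 6
1:0                  # 1 0: a loop over this runs twice
seq_len(0)           # integer(0): a loop over this does not run at all
```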
Generally, if something does not work in R, you should print the subexpressions, working from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! A function with the same name may well do something different.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
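Putting these points together, a corrected version of the original loop might look like the sketch below. The data frame here is random stand-in data with only 5 columns, since the real I.df.nocov isn't available, and pvals is just an illustrative name.

```r
set.seed(1)
# Stand-in for I.df.nocov: 200 rows, 5 numeric columns (assumption)
I.df.nocov <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200))

# Loop over column positions 1..ncol, not over range()
pvals <- numeric(ncol(I.df.nocov))
for (i in seq_len(ncol(I.df.nocov))) {
  pvals[i] <- shapiro.test(I.df.nocov[[i]])$p.value
}

# The same result without an explicit loop
pvals2 <- vapply(I.df.nocov, function(col) shapiro.test(col)$p.value, numeric(1))
```

Extracting $p.value gives a plain numeric vector, which is easier to filter on than a list of one-element lists.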

R: How to create a loop for, for a range of data in a function?

I have this parameter:
L_inf <- seq(17,20,by=0.1)
and this function:
fun <- function(x){
L_inf*(1-exp(-B*(x-0)))}
I would like to apply this function for a range of values of L_inf.
I tried with a for loop, like this:
A <- matrix() # maybe 10 col and 31 row or vice versa
for (i in L_inf){
A[i] <- fun(1:10)
}
But R responds: longer object length is not a multiple of shorter object length.
My expected output is a matrix (or data frame, or maybe a list) with 10 results (fun(1:10)) for each value of the vector L_inf (length = 31).
How can I do this?
You are trying to put a vector of 10 elements into a single matrix cell. You want to assign it to a matrix row instead (you can access the ith row with A[i,]). Note also that for (i in L_inf) iterates over the values 17, 17.1, ... rather than over positions, so i is not a usable row index.
A for loop is not needed here, though; it is straightforward to use one of the "apply" functions. lapply, the most basic one, returns a list (the most versatile container, since it puts basically no constraint on its elements).
Here sapply is an apply function which tries to Simplify its result to a convenient data structure. In this case, since all results have the same length (10), sapply will simplify the result to a matrix with 10 rows and one column per value of L_inf.
Note that I modified your function to make it explicitly depend on L_inf. Otherwise it will not do what you think it should do (see the keyword "closures" if you want more info).
L_inf_range <- seq(17,20,by=0.1)
B <- 1
fun <- function(x, L_inf) {
L_inf*(1-exp(-B*(x-0)))
}
sapply(L_inf_range, function(L) fun(1:10, L_inf=L))
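If you do want to keep the for loop, the following sketch shows the row-wise assignment mentioned above next to the sapply call. It preallocates A with one row per L_inf value and loops over positions with seq_along(). The names A and S are just for illustration.

```r
L_inf_range <- seq(17, 20, by = 0.1)
B <- 1
fun <- function(x, L_inf) {
  L_inf * (1 - exp(-B * (x - 0)))
}

# Preallocate: one row per L_inf value, one column per x in 1:10
A <- matrix(NA_real_, nrow = length(L_inf_range), ncol = 10)
for (i in seq_along(L_inf_range)) {
  A[i, ] <- fun(1:10, L_inf = L_inf_range[i])
}

# sapply returns the results as columns, so S is the transpose of A
S <- sapply(L_inf_range, function(L) fun(1:10, L_inf = L))
dim(A)  # 31 10
dim(S)  # 10 31
```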

R: how do you apply a function to a vector and get a vector of different length?

I have a list of means, for example
avgs = c(1,2,3)
and a function:
simulate <- function (avg)
{ rnorm(n=10,m=avg,sd=1) }
What is the best way to get a vector of 30 values, rather than a multidimensional array from
sapply(avgs,simulate)?
In your case, just take advantage of the fact that rnorm is vectorized and thus will accept entire vectors as arguments:
rnorm(30, avgs, 1)
You can also remove dimensions from your matrix with c:
c(sapply(avgs, simulate))
but this approach is slower and less direct.
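One thing to be aware of: rnorm(30, avgs, 1) recycles avgs, so the means alternate 1, 2, 3, 1, 2, 3, ..., whereas c(sapply(avgs, simulate)) gives ten values with mean 1, then ten with mean 2, then ten with mean 3. If you need the grouped order, repeat each mean explicitly, as in this sketch (n_each is just a name I chose for the group size):

```r
avgs <- c(1, 2, 3)
n_each <- 10

set.seed(42)
# rep(..., each = n_each) yields 1 x10, 2 x10, 3 x10, matching the sapply ordering
x <- rnorm(n_each * length(avgs), mean = rep(avgs, each = n_each), sd = 1)
length(x)  # 30
```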

R assign several list elements the same object

I currently have a loop - well actually a loop in loop, in a simulation model which gets slow with larger numbers of individuals. I've vectorised most of it and made it a heck of a lot faster. But there's a part where I assign multiple elements of a list as the same thing, simplifying a big loop to just the task I want to achieve:
new.matrices[[length(new.matrices)+1]]<-old.matrix
With each iteration of the loop the line above is called, and the same matrix object is assigned to the next new element of a list.
I'm trying to vectorize this - if possible, or make it faster than a loop or apply statement.
So far I've tried stuff along the lines of:
indices <- seq(from = length(new.matrices) + 1, to = length(new.matrices) + reps)
new.matrices[indices] <- old.matrix
However this results in the message:
Warning message:
In new.effectors[effectorlength] <- matrix :
number of items to replace is not a multiple of replacement length
It also tries to assign one value of the old.matrix to one element of new.matrices like so:
[[1]]
[1] 8687
[[2]]
[1] 1
[[3]]
[1] 5486
[[4]]
[1] 0
When the desired result is one list element = one whole matrix, a copy of old.matrix
Is there a way I can vectorize sticking a matrix in list elements without looping? With loops how it is currently implemented we are talking many thousands of repetitions which slows things down considerably, hence my desire to vectorize this if possible.
You have probably solved this already, but the issue in your code
new.matrices[indices] <- old.matrix
was caused by assigning a matrix, rather than a list, to several list elements. R treats old.matrix as a vector of values and puts one value into each list element, recycling as needed (that's why you got this result, and you also get the warning whenever reps is not a multiple of the number of values in old.matrix). Doing
new.matrices[indices] <- list(old.matrix)
will work, and R will recycle the single-element list list(old.matrix) "reps" times automatically.
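A small self-contained check of this, using a toy 2 x 2 matrix and reps = 3 (both made up for the example). The second variant with rep() is an alternative that I would expect to behave the same way when appending:

```r
old.matrix <- matrix(1:4, nrow = 2)   # toy matrix for illustration
new.matrices <- list()
reps <- 3

# Wrapping the matrix in list() makes each target element receive the whole matrix
indices <- seq(from = length(new.matrices) + 1, length.out = reps)
new.matrices[indices] <- list(old.matrix)

# Alternative: build the copies explicitly and append them
appended <- c(list(), rep(list(old.matrix), reps))
```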
