Multidimensional binding - r

For dealing with two-dimensional matrices, rbind and cbind are useful functions. Are there more generic functions to perform the same operation in more dimensions? Suppose I have data like this:
data <- lapply(c(11,22,33), function(i) matrix(i, nrow=2, ncol=4))
What I'd like to obtain is this:
data <- do.call(c, data)
dim(data) <- c(2, 4, 3)
but without having to work out all the dimensions myself.
Is there a function providing this functionality, either built-in or as part of a reasonably common package? Or do you want to share your own ideas of how such a function could be implemented most elegantly?
Bonus points:
If the function gives some control over the order of dimensions, then a subsequent call to aperm could be avoided.
It would be nice if it accepted either multiple function arguments or a single list of arguments, although either one will suffice, since do.call or list can convert between the two.
I'd like to use such a function as the .combine argument to a foreach call. So it should be able to construct multidimensional arrays from calls of the form f(f(f(a, b), c), d) (each call takes exactly two arguments, the first usually being the result of the previous call) or even f(f(a, b), c, d) (more than two arguments, where the first might still be the result of a previous call). With a, b, c, d all of the same size, the result should be an array with one more dimension than its inputs and an extent of 4 in that dimension, corresponding to the 4 elements a through d.

The abind package has precisely this function, with most of the features you mention, although I haven't checked all of them in detail.
At the very least, it would give you a start on how one would implement something along these lines.
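For reference, the stacking that the question performs by hand can be sketched in base R. The helper name bind_last is hypothetical, and this is only an illustration of the idea, not how abind is implemented; it assumes all inputs share the same dimensions.

```r
# bind_last is a hypothetical helper, not part of base R or abind.
# It stacks equally sized arrays along a new last dimension.
bind_last <- function(...) {
  parts <- list(...)
  # Accept a single list of arrays as well as separate arguments.
  if (length(parts) == 1L && is.list(parts[[1L]])) parts <- parts[[1L]]
  dims <- lapply(parts, function(p) if (is.null(dim(p))) length(p) else dim(p))
  # Assumption: every input has identical dimensions.
  stopifnot(all(vapply(dims, identical, logical(1), dims[[1L]])))
  array(unlist(parts), dim = c(dims[[1L]], length(parts)))
}

data <- lapply(c(11, 22, 33), function(i) matrix(i, nrow = 2, ncol = 4))
res <- bind_last(data)
dim(res)      # 2 4 3
res[1, 1, ]   # 11 22 33
```

With the abind package installed, abind(data, along = 3) should give the same array. A nested form such as f(f(a, b), c) needs extra care in a hand-written helper like this one, because the first argument then already carries the extra dimension.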

Related

Handling matrices using Brobdingnag package

I need to build a matrix with extremely small entries.
So far, the fastest way I have found to define the kind of matrix I need is:
Define a vectorized function of coordinates:
func = function(m,n){...}
Combine every possible coordinate using outer:
matrix = outer(1:100,1:100,FUN=func)
Because I have to deal with extremely small numbers, I use brob numbers inside func, so its output is also a brob:
typeof(func(0:100,0:100) )
[1] "S4"
If I pass two vectors 0:100 directly to func, it returns a vector of brobs, but if I try to use it with outer I get the error:
Error in outer(1:100, 1:100, FUN = func) : invalid first argument
I suppose this is because the Brobdingnag package can somehow deal with vectors but not with matrices. Is that right? Is there any way to make it work?

R: possible to use function with two arguments for Map?

What's the right approach to using Map for a function with two arguments in R?
I could get the same effect by using a function which takes 1 argument that consists of a list, and then pass in a list of lists, but I'd like to know if there's a better solution.
Just pass the extra arguments as additional vectors, as you would with mapply.
Map('+', 1:5, 2:6)
You can name them if you want. If they're not long enough, they're recycled to the right length (e.g. n here).
Map(rnorm, n=1, mean=1:5, sd=1:5)
Since mapply(f, c(a,b,c,...)) = c(f(a), f(b), f(c), ...), it is unclear what those extra arguments should be. If the additional arguments are fixed (or are derived from the element itself), you can use an anonymous function: mapply(function(x) g(1, TRUE, x, 42), c(a,b,c,...)).
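A small runnable sketch of both answers may help. The functions f and g are hypothetical and exist only for illustration; MoreArgs is the mapply argument for values that should be passed unchanged to every call.

```r
# f and g are hypothetical functions used only for illustration.
f <- function(x, y) x * 10 + y

# Map walks the vectors in parallel: f(1, 4), f(2, 5), f(3, 6).
pairs <- Map(f, 1:3, 4:6)
unlist(pairs)   # 14 25 36

# A fixed extra argument can be bound with an anonymous function ...
g <- function(scale, x) scale * x
fixed1 <- mapply(function(x) g(2, x), c(1, 2, 3))

# ... or passed once through MoreArgs instead of being recycled.
fixed2 <- mapply(g, x = c(1, 2, 3), MoreArgs = list(scale = 2))
fixed1          # 2 4 6
```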

R: passing by parameter to function and using apply instead of nested loop and recursive indexing failed

I have two lists of lists, humanSplit and ratSplit. humanSplit has elements of the form:
> humanSplit[1]
$Fetal_Brain_408_AGTCAA_L001_R1_report.txt
   humanGene                            humanReplicate alignment RNAtype
66      DGKI Fetal_Brain_408_AGTCAA_L001_R1_report.txt         6     reg
68   ARFGEF2 Fetal_Brain_408_AGTCAA_L001_R1_report.txt         5     reg
If you type humanSplit[[1]], it gives the data without the name $Fetal_Brain_408_AGTCAA_L001_R1_report.txt.
ratSplit is essentially similar to humanSplit, with a difference in column order. I want to apply Fisher's test to every possible pairing of replicates from humanSplit and ratSplit. I defined the following empty vectors, which I will use to store the results of my Fisher's tests:
humanReplicate <- vector(mode = 'character', length = 0)
ratReplicate <- vector(mode = 'character', length = 0)
pvalue <- vector(mode = 'numeric', length = 0)
For Fisher's test between two replicates of humanSplit and ratSplit, I define the following function. In the function I use geneList, which is a data.frame made by reading a file and has the form:
> head(geneList)
    human     rat
1 5S_rRNA 5S_rRNA
2 5S_rRNA 5S_rRNA
Now here is the main function, where I use a function getGenetype which I have already defined in another part of the code. Also, x and y are integers:
fishertest <-function(x,y) {
  ratReplicateName <- names(ratSplit[x])
  humanReplicateName <- names(humanSplit[y])
  ## merging above two based on the one-to-one gene mapping as in geneList
  ## defined above.
  mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
  mergedRatData <- merge(geneList, ratSplit[[x]], by.x = "rat", by.y = "ratGene")
  ## [here i do other manipulation with using already defined function
  ## getGenetype that is defined outside of this function and make things
  ## necessary to define following contingency table]
  contingencyTable <- matrix(c(HnRn,HnRy,HyRn,HyRy), nrow = 2)
  fisherTest <- fisher.test(contingencyTable)
  humanReplicate <- c(humanReplicate,humanReplicateName )
  ratReplicate <- c(ratReplicate,ratReplicateName )
  pvalue <- c(pvalue , fisherTest$p)
}
After doing all this, I make the matrix eg to use in apply. Here I am basically trying to do something similar to a double for loop and then apply the Fisher test:
eg <- expand.grid(i = 1:length(ratSplit),j = 1:length(humanSplit))
junk = apply(eg, 1, fishertest(eg$i,eg$j))
Now the problem is that when I try to run this, it gives the following error when it tries to use the function fishertest in apply:
Error in humanSplit[[y]] : recursive indexing failed at level 3
RStudio points to the problem in the following line:
mergedHumanData <-merge(geneList,humanSplit[[y]], by.x = "human", by.y = "humanGene")
Ultimately, I want to do the following:
result <- data.frame(humanReplicate,ratReplicate, pvalue ,alternative, Conf.int1, Conf.int2, oddratio)
I am struggling with these questions:
In defining the fishertest function, how should I pass ratSplit, humanSplit, and the already defined function getGenetype?
And how should I use apply here?
Any help would be much appreciated.
Up front: read ?apply. Additionally, the first few hits on Google when searching for "R apply tutorial" are helpful.
Errors in fishertest()
The error message itself has nothing to do with apply. The call got as far as it did because the arguments you provided actually resolved. Evaluate eg$i by itself and you'll see that it returns a vector: the corresponding column of the eg data.frame. You are passing this whole vector as an index argument. The primary reason your function failed is that double-bracket indexing ([[) expects a single index; a vector of length greater than 1 is treated as a recursive index into nested lists. This is a good example of where production or deployed functions need type checking to ensure that each argument is a numeric of length 1; that is often skipped in quick code, but it would have caught this mistake. Had it not been for the [[ limit, your function might have silently returned incorrect results. (I've been bitten by that many times!)
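The [[ behavior described above can be reproduced on a small list. The list lst and the helper check_index below are made up for illustration; the point is only that a vector index is applied level by level rather than element by element.

```r
# lst is a made-up nested list used only for illustration.
lst <- list(a = 1:3, b = list(c = "x", d = "y"))

lst[[2]]         # a single index selects one element (the inner list)
lst[[c(2, 1)]]   # a vector index recurses: same as lst[[2]][[1]], i.e. "x"

# A longer vector keeps recursing until it runs out of nested levels;
# tryCatch stores the error message instead of stopping the script.
msg <- tryCatch(lst[[c(1, 2, 3)]], error = function(e) conditionMessage(e))

# A cheap guard of the kind suggested above (hypothetical helper name).
check_index <- function(i) stopifnot(is.numeric(i), length(i) == 1L)
check_index(2)   # passes silently; check_index(1:2) would stop with an error
```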
BTW: your code is also incorrect in its scoped access to pvalue et al. If you make your function return just the numbers you need and then aggregate them outside of the function, your life will be simpler. (pvalue <- c(pvalue, ...) will find the pvalue assigned outside the function but will not update it as you want.) You are defeating one purpose of writing this as a function. When thinking about writing this function, try to answer only this question: "how do I compare a single rat record with a single human record?" Only after that works correctly and simply, without having to overwrite variables in the parent environment, should you try to answer the question "how do I apply this function to all pairs and aggregate the results?" Try very hard to have your function not change anything outside of its own environment.
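As a rough sketch of that advice, the function below handles exactly one pair, returns one row, and leaves aggregation to the caller. Everything here is an assumption made for illustration: rat_counts and human_counts are made-up stand-ins for ratSplit and humanSplit, and compare_pair is a hypothetical replacement for fishertest that skips the merge step.

```r
# Toy inputs standing in for ratSplit and humanSplit (made-up counts).
rat_counts   <- list(r1 = c(8, 2), r2 = c(5, 5))
human_counts <- list(h1 = c(1, 9), h2 = c(6, 4), h3 = c(3, 7))

# Hypothetical replacement for fishertest(): handles ONE pair, returns ONE
# row, and never modifies variables outside itself.
compare_pair <- function(x, y) {
  tab <- rbind(rat_counts[[x]], human_counts[[y]])   # 2 x 2 contingency table
  ft  <- fisher.test(tab)
  data.frame(ratReplicate   = names(rat_counts)[x],
             humanReplicate = names(human_counts)[y],
             pvalue         = ft$p.value)
}

# Aggregation happens outside the function, over all index pairs.
eg     <- expand.grid(i = seq_along(rat_counts), j = seq_along(human_counts))
rows   <- Map(compare_pair, eg$i, eg$j)
result <- do.call(rbind, rows)
nrow(result)   # 6, one row per rat/human pair
```

Map is used here only because it passes the two index columns as separate arguments; it is one of several ways to drive such a function.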
Errors in apply()
Had your function worked properly despite these errors, you would have received the following error from apply:
apply(eg, 1, fishertest(eg$i, eg$j))
## Error in match.fun(FUN) :
## 'fishertest(eg$i, eg$j)' is not a function, character or symbol
When you call apply this way, the third argument is evaluated as a call. Since it is simply a call to fishertest(eg$i, eg$j), which is intended to return a data.frame row (inferred from your previous question), it resolves to such, and apply then sees something akin to:
apply(eg, 1, data.frame(...))
Now you can see that apply is being handed a data.frame and not a function.
The third argument (FUN) needs to be a function itself that takes as its first argument a vector containing the elements of the row (1) or column (2) of the matrix/data.frame. As an example, consider the following contrived example:
eg <- data.frame(aa = 1:5, bb = 11:15)
apply(eg, 1, mean)
## [1] 6 7 8 9 10
# similar to your use, will not work; this error comes from mean not getting
# any arguments, whereas your error above is because the call's result is not a function
apply(eg, 1, mean())
## Error in mean.default() : argument "x" is missing, with no default
Realize that mean is a function itself, not the return value from a function (there is more to it, but this definition works). Because we're iterating over the rows of eg (because of the 1), the first iteration takes the first row and calls mean(c(1, 11)), which returns 6. The equivalent of your code here is mean()(c(1, 11)), which will fail for a couple of reasons: (1) mean requires an argument and is not getting one, and (2) regardless, it does not return a function itself (returning functions belongs to the "functional programming" paradigm, which is easy in R but uncommon for most programmers).
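The difference between passing a function and passing the result of a call can be checked directly. In this sketch, make_scaled_mean is a hypothetical function factory added only to show the one case where a call in the FUN position does work, namely when the call itself returns a function.

```r
eg <- data.frame(aa = 1:5, bb = 11:15)

# Passing the function object: apply calls mean() once per row.
ok <- apply(eg, 1, mean)

# Passing the result of a call: mean(1:3) is the number 2, not a function,
# so apply stops with an error before it touches any row.
msg <- tryCatch(apply(eg, 1, mean(1:3)), error = function(e) conditionMessage(e))

# A call is fine when it returns a function. make_scaled_mean is a
# hypothetical function factory used only for illustration.
make_scaled_mean <- function(k) function(v) k * mean(v)
scaled <- apply(eg, 1, make_scaled_mean(2))
```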
In the example here, mean will accept a single argument which is typically a vector of numerics. In your case, your function fishertest requires two arguments (templated by my previous answer to your question), which does not work. You have two options here:
First option: change your fishertest function to accept a single vector as an argument and parse the index numbers from it. Both of the following versions do this:
fishertest <- function(v) {
x <- v[1]
y <- v[2]
ratReplicateName <- names(ratSplit[x])
## ...
}
or
fishertest <- function(x, y) {
if (missing(y)) {
y <- x[2]
x <- x[1]
}
ratReplicateName <- names(ratSplit[x])
## ...
}
The second version allows you to continue using the manual form of fishertest(1, 57) while also allowing you to do apply(eg, 1, fishertest) verbatim. Very readable, IMHO. (Better error checking and reporting can be used here, I'm just providing a MWE.)
Second option: write an anonymous function to take the vector and split it up appropriately. This anonymous function could look something like function(ii) fishertest(ii[1], ii[2]). This is typically how it is done for functions that either do not transform as easily as in the first option above, or for functions you cannot or do not want to modify. You can either assign this intermediary function to a variable (which makes it no longer anonymous, go figure) and pass that intermediary to apply, or just pass it directly to apply, as in:
.func <- function(ii) fishertest(ii[1], ii[2])
apply(eg, 1, .func)
## equivalently
apply(eg, 1, function(ii) fishertest(ii[1], ii[2]))
There are two reasons why many people opt to name the function: (1) if the function is used multiple times, better to define once and reuse; (2) it makes the apply line easier to read than if it contained a complex multi-line function definition.
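To see that both options give the same answer, here is a small runnable comparison. pair_code is a hypothetical two-argument function standing in for fishertest; it simply encodes its two indices in one number so the pairing is easy to read.

```r
# pair_code is a hypothetical two-argument function standing in for fishertest().
pair_code <- function(x, y) {
  if (missing(y)) {   # called with a single vector, as apply does
    y <- x[2]
    x <- x[1]
  }
  x * 100 + y
}

eg <- expand.grid(i = 1:2, j = 1:3)

# Option 1: the function itself accepts the row vector.
by_vector  <- apply(eg, 1, pair_code)
# Option 2: an anonymous wrapper splits the row vector.
by_wrapper <- apply(eg, 1, function(ii) pair_code(ii[1], ii[2]))
# Alternative without any wrapper: iterate over the two columns in parallel.
by_mapply  <- mapply(pair_code, eg$i, eg$j)

as.numeric(by_vector)   # 101 201 102 202 103 203
```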
As a side note, there are some gotchas with apply and family that will be confusing if you don't understand them. Not the least of these is that when your function returns vectors, the matrix returned from apply will need to be transposed (with t()), after which you'll still need to rbind or otherwise aggregate.
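The transposition gotcha is easy to demonstrate on toy data. The column names total and spread below are invented for the example.

```r
eg <- data.frame(aa = 1:3, bb = 11:13)

# Each call returns a length-2 vector, so apply builds a 2 x 3 matrix:
# one COLUMN per input row.
res <- apply(eg, 1, function(v) c(total = sum(v), spread = max(v) - min(v)))
dim(res)        # 2 3

# Transposing restores one row per input row.
tidy <- t(res)
dim(tidy)       # 3 2
colnames(tidy)  # "total" "spread"
```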
This is one area where using ddply (from the plyr package) may provide a more readable solution. There are several tutorials showing it off. For a more in-depth discussion of the bigger picture in which ddply plays a part, read Hadley Wickham's "The Split-Apply-Combine Strategy for Data Analysis" paper from JSS.

what data type is produced by mapply()?

This is what my code looks like. a, b, c, and d are scalars, e is a list of vectors. A, B, C and D are vectors.
GetOutput=function(a,b,c,d){
e=FunOther(a,b,c,d)
i=mean(e$f)
j=mean(e$g)
k=abs(mean(e$h))
return(list(b=b,i=i,j=j,k=k))
}
Output=mapply(GetOutput,A,B,C,D)
GetOutput will return a list of 4 scalars. I want to scale this up to a matrix of inputs and a matrix of outputs. I had been using a for loop but I thought I would try mapply instead.
Suppose A, B, C and D have length 100. I just want to get a vector of length 100 that gives me all of the i's so that I can calculate their minima, and then the same for the j's and k's. This is part of a Monte Carlo study. But I am having trouble understanding the Output object. It appears to be a list of lists. What I thought would be a one-liner turns into several operations. The best I can come up with is:
Output2=as.data.frame(t(Output))
OutputMeans=c(mean(as.numeric(Output2$i)),
mean(as.numeric(Output2$j)),
mean(as.numeric(Output2$k)))
This seems just bananas to me. I thought I could operate on Output directly with the mean function without having to bother with all of these transformations.
If you had instead written return(c(b=b,i=i,j=j,k=k)), then each result from mapply would have been a named vector rather than what you did get: a list of lists. And since the SIMPLIFY argument (TRUE by default) would then have let mapply return a matrix, you could have received a non-recursive result as well. Because mapply is so versatile, it gives you multiple levels of control over the returned structure.
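A cut-down sketch shows the shape of the result. get_output is a simplified, hypothetical version of GetOutput with two inputs and no call to FunOther; only the idea of returning a named vector is taken from the answer.

```r
# get_output is a simplified, hypothetical version of GetOutput that
# returns a named vector instead of a list.
get_output <- function(a, b) c(b = b, i = a + b, j = a * b)

out <- mapply(get_output, 1:4, 5:8)
dim(out)          # 3 4: one row per returned name, one column per input
out["i", ]        # 6 8 10 12
rowMeans(out)     # means of b, i and j across all inputs
min(out["i", ])   # 6
```

Each statistic is then a single row of the matrix, so per-name summaries such as rowMeans or min need no further conversion.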
Another R programming tip: don't use '$' as an extraction function inside functions. If you are sure of your column name you can use [['colname']], and by using '[[' you can later generalize your function to accept column names as arguments, a feature which '$' does not support.
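The difference shows up as soon as the column name arrives in a variable. The helpers col_mean and bad and the data frame df are hypothetical and serve only to illustrate the tip.

```r
# col_mean is a hypothetical helper; df is toy data.
col_mean <- function(df, col) mean(df[[col]])

df <- data.frame(f = c(1, 2, 3), g = c(10, 20, 30))
col_mean(df, "g")       # 20

# With `$`, the name is taken literally, so this looks for a column
# that is actually called "col" and finds nothing.
bad <- function(df, col) df$col
is.null(bad(df, "g"))   # TRUE
```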

Applying apply to a function w multiple parameters

I want to use the apply function with the train function in the caret package.
The train function requires three parameters a, b and c. For my purposes a and b do not vary, but I'd like to iterate over many values of c which are contained in a vector.
How can I use apply (or one of its cousins) to do this storing the results in a list (or other structure)? I've read the documentation and have used apply to find row and column means. But, mean only requires 1 parameter. I have three. Also, my b parameter is a large data frame. I've thought about replicating a and b for each c, but this seems wasteful.
Try this:
lapply( c.vector, train, a=1, b=2)
Obviously I am guessing at values for a and b, but the principle should be clear. The ... argument mechanism allows you to supply named fixed parameters to functions called with lapply, sapply, or apply. The problem with using apply is that it expects a matrix or data frame argument, and that can be unwise since the row vectors will get coerced to the lowest common denominator class, often "character".
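Both points can be tried without caret. In this sketch, toy_train is a hypothetical stand-in for train; only the argument names a, b and c come from the question, and big_df and mixed are toy data.

```r
# toy_train is a hypothetical stand-in for caret's train(); only the
# argument names a, b and c come from the question.
toy_train <- function(a, b, c) a + nrow(b) * c

big_df   <- data.frame(x = 1:10)
c.vector <- c(0.1, 1, 10)

# a and b are matched by name, so each element of c.vector fills c.
# big_df is passed along as-is; no replicated copies are built by hand.
fits <- lapply(c.vector, toy_train, a = 1, b = big_df)
unlist(fits)   # 2 11 101

# apply() on a mixed data frame converts every row to character first.
mixed <- data.frame(n = 1:2, s = c("u", "v"))
row_classes <- apply(mixed, 1, class)
```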
