I am trying to optimise some code that I have written using apply() and similar functions (e.g. lapply()). Unfortunately I do not see much improvement, so while searching I came across the post "apply() is slow - how to make it faster or what are my alternatives?", where one suggestion is to use with() instead of apply(), which is certainly much faster.
What I want to do is apply a user-defined function to every row of a matrix. This function takes the data from the row as input, performs some calculations and returns a vector with the results.
A toy example using apply(), with() and a vectorised version:
#Generate a matrix 10x3
prbl1=matrix(runif(30),nrow=10)
prbl2=data.frame(prbl1)
prbl3=prbl2
#function for the apply()
fn1=function(row){
x=row[1]
y=row[2]
z=row[3]
k1=2*x+3*y+4*z
k2=2*x*3*y*4*z
k3=2*x*y+3*x*z
return(c(k1,k2,k3))
}
#function for the with()
fn2=function(x,y,z){
k1=2*x+3*y+4*z
k2=2*x*3*y*4*z
k3=2*x*y+3*x*z
return(c(k1,k2,k3))
}
#Vectorise fn2
fn3=Vectorize(fn2)
#apply the functions:
rslt1=t(apply(prbl1,1,fn1))
rslt2=t(with(prbl2,fn2(X1,X2,X3)))
rslt2=cbind(rslt2[1:10],rslt2[11:20],rslt2[21:30])
rslt3=t(with(prbl3,fn3(X1,X2,X3)))
All three produce the same output, a 10x3 matrix, which is what I want. Note, however, that for rslt2 I need to bind the results, because the output of with() is a vector of length 30. I suspected this is because the function is not vectorised (if I understood this correctly). In rslt3 I use a vectorised version of fn2, which produces the output in the expected shape.
When I compare the performance of the three, I get:
library(rbenchmark)
benchmark(rslt1=t(apply(prbl1,1,fn1)),
rslt2=with(prbl2,fn2(X1,X2,X3)),
rslt3=with(prbl3,fn3(X1,X2,X3)),
replications=1000000)
test replications elapsed relative user.self sys.self user.child sys.child
1 rslt1 1000000 103.51 7.129 102.63 0.02 NA NA
2 rslt2 1000000 14.52 1.000 14.41 0.01 NA NA
3 rslt3 1000000 123.44 8.501 122.41 0.05 NA NA
So with() without Vectorize() is clearly the fastest.
My question: since rslt2 is the most efficient approach, is there a way to use it correctly without having to bind the results afterwards? It does the job, but it does not feel like efficient coding.
The first and third functions you give are applied one row at a time, so they are called 10 times in your example. The second function takes advantage of the fact that multiplication and addition in R are already vectorised, so any form of loop or *apply function is unnecessary: the function is called only once. To keep your current code, all you need to do is change c to cbind in fn2:
fn2=function(x,y,z){
k1=2*x+3*y+4*z
k2=2*x*3*y*4*z
k3=2*x*y+3*x*z
return(cbind(k1,k2,k3))
}
All that with() does is evaluate the given expression inside the given list, data frame or environment. So with(prbl2, fn2(X1, X2, X3)) is entirely equivalent to fn2(prbl2$X1, prbl2$X2, prbl2$X3).
Is this your real function? If it is, then the problem is solved. If not, it depends on whether your real function consists entirely of operations and functions that are already vectorised, or that can be replaced with vectorised equivalents.
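A quick check of the cbind() version, as a sketch: it rebuilds the toy data (the seed is an arbitrary assumption, since the question set none) and confirms that with() now returns the 10x3 matrix directly, with the same values as the row-wise apply() result.

```r
set.seed(1)  # arbitrary seed (an assumption; the question did not set one)
prbl1 <- matrix(runif(30), nrow = 10)
prbl2 <- data.frame(prbl1)  # columns are named X1, X2, X3

# Row-at-a-time version from the question
fn1 <- function(row) {
  x <- row[1]; y <- row[2]; z <- row[3]
  c(2*x + 3*y + 4*z, 2*x*3*y*4*z, 2*x*y + 3*x*z)
}

# Vectorised version: cbind() stacks the three result vectors as columns
fn2 <- function(x, y, z) {
  cbind(k1 = 2*x + 3*y + 4*z,
        k2 = 2*x*3*y*4*z,
        k3 = 2*x*y + 3*x*z)
}

rslt1 <- t(apply(prbl1, 1, fn1))
rslt2 <- with(prbl2, fn2(X1, X2, X3))

dim(rslt2)                                           # 10 3, no binding needed
all.equal(rslt1, rslt2, check.attributes = FALSE)    # TRUE: same values as apply()
identical(rslt2, fn2(prbl2$X1, prbl2$X2, prbl2$X3))  # TRUE: with() is only a convenience
```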
For the amended function per the comments:
Single row:
fn1 <- function(row){
x <- row[1]
y <- row[2]
z <- row[3]
k1 <- 2*x+3*y+4*z
k2 <- 2*x*3*y*4*z
k3 <- 2*x*y+3*x*z
if (k1 > 0 && k2 > 0 && k3 > 0) {
return(cbind(k1,k2,k3))
} else {
k1 <- 5*x+3*y+4*z
k2 <- 5*x*3*y*4*z
k3 <- 5*x*y+3*x*z
if (k1<0 || k2<0 || k3<0) {
return(cbind(0,0,0))
} else {
return(cbind(k1,k2,k3))
}
}
}
Whole matrix:
fn2 <- function(mat) {
x <- mat[, 1]
y <- mat[, 2]
z <- mat[, 3]
k1 <- 2*x+3*y+4*z
k2 <- 2*x*3*y*4*z
k3 <- 2*x*y+3*x*z
l1 <- 5*x+3*y+4*z
l2 <- 5*x*3*y*4*z
l3 <- 5*x*y+3*x*z
out <- array(0, dim = dim(mat))
useK <- k1 > 0 & k2 > 0 & k3 > 0
useL <- !useK & l1 >= 0 & l2 >= 0 & l3 >= 0
out[useK, ] <- cbind(k1, k2, k3)[useK, ]
out[useL, ] <- cbind(l1, l2, l3)[useL, ]
out
}
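As a sanity check (a sketch, using arbitrary normally distributed test data so that all three branches are very likely to occur), the row-wise and whole-matrix versions should agree. fn1 and fn2 are restated compactly here with the same arithmetic so that the block runs on its own.

```r
# Row-wise version (same logic as fn1 above, written compactly)
fn1 <- function(row) {
  x <- row[1]; y <- row[2]; z <- row[3]
  k1 <- 2*x+3*y+4*z; k2 <- 2*x*3*y*4*z; k3 <- 2*x*y+3*x*z
  if (k1 > 0 && k2 > 0 && k3 > 0) return(cbind(k1, k2, k3))
  k1 <- 5*x+3*y+4*z; k2 <- 5*x*3*y*4*z; k3 <- 5*x*y+3*x*z
  if (k1 < 0 || k2 < 0 || k3 < 0) cbind(0, 0, 0) else cbind(k1, k2, k3)
}

# Whole-matrix version (same as fn2 above)
fn2 <- function(mat) {
  x <- mat[, 1]; y <- mat[, 2]; z <- mat[, 3]
  k1 <- 2*x+3*y+4*z; k2 <- 2*x*3*y*4*z; k3 <- 2*x*y+3*x*z
  l1 <- 5*x+3*y+4*z; l2 <- 5*x*3*y*4*z; l3 <- 5*x*y+3*x*z
  out <- array(0, dim = dim(mat))
  useK <- k1 > 0 & k2 > 0 & k3 > 0
  useL <- !useK & l1 >= 0 & l2 >= 0 & l3 >= 0
  out[useK, ] <- cbind(k1, k2, k3)[useK, ]
  out[useL, ] <- cbind(l1, l2, l3)[useL, ]
  out
}

set.seed(42)                         # arbitrary seed
mat <- matrix(rnorm(300), ncol = 3)  # mixed signs, so all branches are likely hit
rowwise <- t(apply(mat, 1, fn1))     # one function call per row
whole <- fn2(mat)                    # one function call for the whole matrix
all.equal(rowwise, whole, check.attributes = FALSE)
```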
I have just started learning R, and I wrote this code to learn about functions and loops.
squared<-function(x){
m<-c()
for(i in 1:x){
y<-i*i
c(m,y)
}
return (m)
}
squared(5)
NULL
Why does this return NULL? I want the i*i values to be appended to the end of m and the function to return a vector. Can someone please point out what's wrong with this code?
You never put anything into m inside the loop: c(m, y) builds a new vector that is immediately thrown away, because you did not assign it back to m. So m is still the empty c(), which is NULL:
m <- c()
m
# NULL
You can change the function to return the desired values by assigning m in the loop.
squared <- function(x) {
m <- c()
for(i in 1:x) {
y <- i * i
m <- c(m, y)
}
return(m)
}
squared(5)
# [1] 1 4 9 16 25
But this is inefficient: c(m, y) copies m on every iteration, even though we know the resulting vector will have length 5 (or x). So we want to allocate the memory before looping. This is the better way to use a for() loop:
squared <- function(x) {
m <- vector("integer", x)
for(i in seq_len(x)) {
m[i] <- i * i
}
m
}
squared(5)
# [1] 1 4 9 16 25
Also notice that I have removed return() from the second function. It is not necessary there, because a function returns the value of its last evaluated expression; whether to keep it in this situation is a matter of personal preference. Sometimes it is necessary, for example to return early from inside an if() statement.
I know the question is about looping, but I should also mention that this can be done more efficiently, in seven characters, with the primitive ^, like this:
(1:5)^2
# [1] 1 4 9 16 25
^ is a primitive function, which means it is implemented in C and called directly; because it is also vectorised, it squares the whole vector in a single call and is the most efficient of these three methods:
`^`
# function (e1, e2) .Primitive("^")
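The preallocated version also switched from 1:x to seq_len(x). A small sketch of why that matters: when x is 0, 1:0 is c(1, 0), so the loop still runs twice, whereas seq_len(0) is empty. The two function names below are hypothetical, chosen only to tell the versions apart.

```r
squared_grow <- function(x) {
  m <- c()
  for (i in 1:x) m <- c(m, i * i)      # 1:0 is c(1, 0), so this runs twice for x = 0
  m
}

squared_prealloc <- function(x) {
  m <- vector("integer", x)
  for (i in seq_len(x)) m[i] <- i * i  # seq_len(0) is empty, so the body never runs
  m
}

squared_grow(0)                          # 1 0 (not what we want)
squared_prealloc(0)                      # integer(0)
all.equal(squared_prealloc(5), (1:5)^2)  # TRUE: same values (integer vs double)
```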
Here's a general approach:
# Create empty vector
vec <- c()
for(i in 1:10){
# Inside the loop, make one or more elements to add to the vector
new_elements <- i * 3
# Use 'c' to combine the existing vector with the new_elements
vec <- c(vec, new_elements)
}
vec
# [1] 3 6 9 12 15 18 21 24 27 30
If your loop has a lot of iterations or the vectors are large, growing the vector with c() becomes slow and memory-hungry, because the whole vector is copied on every iteration; preallocating the vector is then more efficient. That's not usually necessary unless your vectors are particularly large, though.
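A minimal sketch of preallocation for the general approach above, assuming you know the number of iterations but not necessarily how many elements each iteration produces: preallocate a list with one slot per iteration, fill it, and combine once at the end with unlist().

```r
n <- 10
pieces <- vector("list", n)   # one preallocated slot per iteration
for (i in seq_len(n)) {
  pieces[[i]] <- i * 3        # each slot could hold a vector of any length
}
vec <- unlist(pieces)         # combine once, after the loop
vec                           # same values as the c()-growing loop: 3, 6, ..., 30
```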
I have a function with two arguments. The first argument takes a vector, and the second takes a scalar. I want to apply this function to each row of a matrix, but with a different second argument for each row. I tried the following, but it didn't work. I expected to compute the p-value for each row and then divide it by the row number, so I expected the result to be a vector, but I got a matrix instead. This is a pseudo-example, but it illustrates my purpose.
> foo = matrix(rnorm(100),ncol=20)
> f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
> goo = 1:5
> apply(foo,1,f,y=goo)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.9406881 0.6134117 0.5484542 0.11299535 0.20420786
[2,] 0.4703440 0.3067059 0.2742271 0.05649767 0.10210393
[3,] 0.3135627 0.2044706 0.1828181 0.03766512 0.06806929
[4,] 0.2351720 0.1533529 0.1371135 0.02824884 0.05105196
[5,] 0.1881376 0.1226823 0.1096908 0.02259907 0.04084157
The following for loop strategy produces the expected result, except that it would be very slow for the real data.
> res = numeric(5)
> for (i in 1:5){
res[i]=f(foo[i,],i)
}
> res
[1] 0.94068810 0.30670585 0.18281807 0.02824884 0.04084157
Any suggestions would be appreciated!
If your real purpose is like your example, you can vectorize the division:
f <- function(x) t.test(x[1:10], x[11:20])$p.value
apply(foo, 1, f) / goo
Based on the comment, the above is not appropriate.
In the case of the example, you might observe that the diagonal of the returned matrix is the desired result:
f = function (x,y) t.test(x[1:10],x[11:20])$p.value/y
goo = 1:5
diag(apply(foo,1,f,y=goo))
Besides being inefficient in time and space, this suffers from another problem: it only gives the correct result for the example because the operation on y is vectorised. And in that case, the former solution is better. So I suspect that in your actual problem, the operation is not vectorised.
Sometimes a for loop really is the best answer. The apply family of functions are not magical; they are still loops.
Here is an sapply solution. It won't beat the for loop for time (but probably won't lose either), and it doesn't have a high space overhead. The idea is to apply over the row index and use it to extract the row of foo and the element of goo to pass to f:
sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i]))
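A variant of the sapply() call, as a sketch: vapply() with FUN.VALUE = numeric(1) guarantees a plain numeric vector of length nrow(foo) (it stops with an error instead of silently returning a matrix or a list), and it gives the same values as the for loop from the question. The data setup mirrors the benchmark below.

```r
set.seed(101)
foo <- matrix(rnorm(100), ncol = 20)
goo <- 1:5
f <- function(x, y) t.test(x[1:10], x[11:20])$p.value / y

# Reference: the for loop from the question
res_loop <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) res_loop[i] <- f(foo[i, ], goo[i])

# Loop over row indices; each call must return exactly one number
res_vapply <- vapply(seq_len(nrow(foo)),
                     function(i) f(foo[i, ], goo[i]),
                     FUN.VALUE = numeric(1))
identical(res_loop, res_vapply)  # TRUE
```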
f <- function (x,y) t.test(x[1:10],x[11:20])$p.value/y
f2 <- function(a, b){
tt <- t.test(x = a[1:10], y = a[11:20])$p.value
tt/b
}
f3 <- function() {
res <- numeric(5)
for (i in 1:5){
res[i] <- f(foo[i,],i)
}
res
}
f4 <- function(x) t.test(x[1:10], x[11:20])$p.value
set.seed(101)
foo <- matrix(rnorm(100),ncol=20)
goo <- 1:5
library(rbenchmark)
benchmark(
apply(foo, 1, f4) / goo,
mapply(f, split(foo, row(foo)), goo),
f2(foo, goo),
f3(),
sapply(seq(nrow(foo)), function(i) f(foo[i,], goo[i])),
replications = 1000,
columns = c("test", "replications", "elapsed", "relative"))
## test replications elapsed relative
## 1 apply(foo, 1, f4)/goo 1000 1.581 5.528
## 3 f2(foo, goo) 1000 0.286 1.000
## 4 f3() 1000 1.458 5.098
## 2 mapply(...) 1000 1.599 5.591
## 5 sapply(...) 1000 1.486 5.196
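Note that f2(foo, goo) is fastest only because it does not compute the same thing: a[1:10] and a[11:20] index the whole matrix in column-major order, so it runs a single t-test and divides that one p-value by goo. The direct division, apply(foo, 1, f4) / goo, is not actually applicable to your real problem (see above). For this example there's not much difference between the other solutions, although the for loop is slightly faster than sapply, which is slightly faster than mapply. You should try this on a more realistic example to see how it's going to scale for your problem.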
I am new to R, so I am not sure how to use apply.
I would like to speed up my function by using apply:
for(i in 1: ncol(exp)){
for (j in 1: length(fe)){
tmp =TRUE
id = strsplit(colnames(exp)[i],"\\.")
if(id == fe[j]){
tmp = FALSE
}
if(tmp ==TRUE){
only = cbind(only,c(names(exp)[i],exp[,i]) )
}
}
}
How can I use the apply function to do this above?
EDIT:
Thank you so much for the very good explanation, and sorry for my bad description. You guessed everything right, except that I wanted to delete the matches in fe.
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
Then the result should contain only the column whose name starts with 'C'. Everything else should be deleted.
1 C.z
2 11
3 12
4 13
5 14
6 15
7 16
8 17
9 18
10 19
11 20
EDIT: If you only want to delete the columns whose name (the part before the dot) appears in fe, you can simply do:
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
id <- sapply(strsplit(names(Exp),"\\."),
function(i)!i[1] %in% fe)
Exp[id]
This code does exactly what your (updated) for loop does, only much more efficiently. You don't have to loop through fe, because the %in% function is vectorised.
In case the name can appear anywhere between the dots, use:
id <- sapply(strsplit(names(Exp),"\\."),
function(i)sum(i %in% fe)==0)
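To see how the two rules (prefix only vs. any part) differ, here is a small sketch on the sample data with a hypothetical fe chosen so that the rules disagree. It uses vapply() instead of sapply() and, as an alternative for the prefix rule, sub() to strip everything from the first dot onwards, which avoids strsplit() entirely.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- c("B", "z")  # hypothetical fe, chosen so the two rules give different answers

parts <- strsplit(names(Exp), "\\.")

# Rule 1: drop a column if the part before the first dot is in fe
keep_prefix <- vapply(parts, function(p) !(p[1] %in% fe), logical(1))
names(Exp)[keep_prefix]  # "A.x" "C.z" "A.z"

# Rule 2: drop a column if any dot-separated part is in fe
keep_any <- vapply(parts, function(p) !any(p %in% fe), logical(1))
names(Exp)[keep_any]     # "A.x"

# Rule 1 without strsplit(): remove everything from the first dot onwards
keep_prefix2 <- !(sub("\\..*$", "", names(Exp)) %in% fe)
identical(keep_prefix, keep_prefix2)  # TRUE

Exp[keep_prefix]  # the filtered data frame
```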
Your code does some very odd things, and I have no clue what exactly you're trying to do. For one, strsplit() returns a list, so id == fe[j] will always return FALSE unless fe[j] is a list itself, and I doubt it is. So I'd correct your code to
id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]
in case you want to compare with everything that is before the dot, or to
id = unlist(strsplit(colnames(Exp)[i],"\\."))
if you want to compare with every part of the string. In that case, you should use %in% instead of == as well.
Second, what you get is a character matrix in which each column is essentially repeated. If all elements of fe are unique and every column's prefix is one of them, you could just as well do:
only <- rbind(names(Exp), Exp)
only <- do.call(cbind, lapply(only, function(x)
matrix(rep(x, length(fe) - 1), nrow = nrow(Exp) + 1)
))
Assuming that the logic in your code does make sense (as you didn't supply any sample data, this is impossible to know), the optimised version is:
mat <- rbind(names(Exp),Exp)
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
matrix(rep(x,n),nrow=nrow(mat))
}))
Note that if you are interested in whether fe[j] appears anywhere in the name, you can change the code to:
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
matrix(rep(x,n),nrow=nrow(mat))
}))
If this doesn't return what you want, then your code doesn't do that either. I checked with the following sample data, and all versions give the same result:
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]
The apply() family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:
set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)
system.time({
yLoop <- x[,1]*0 # preallocate result
for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
# user system elapsed
# 13.39 0.00 13.39
system.time(yApply <- apply(x, 1, mean))
# user system elapsed
# 16.19 0.28 16.51
system.time(yRowMean <- rowMeans(x))
# user system elapsed
# 0.02 0.00 0.02
identical(yLoop, yApply)  # identical() compares only two objects
# TRUE
all.equal(yApply, yRowMean)
# TRUE
The reason your code is so slow is that, as Gavin pointed out, you're growing your array on every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
Still trying to get into the R logic... what is the "best" way to unpack (on LHS) the results from a function returning multiple values?
I can't do this apparently:
R> functionReturningTwoValues <- function() { return(c(1, 2)) }
R> functionReturningTwoValues()
[1] 1 2
R> a, b <- functionReturningTwoValues()
Error: unexpected ',' in "a,"
R> c(a, b) <- functionReturningTwoValues()
Error in c(a, b) <- functionReturningTwoValues() : object 'a' not found
Must I really do the following?
R> r <- functionReturningTwoValues()
R> a <- r[1]; b <- r[2]
or would the R programmer write something more like this:
R> functionReturningTwoValues <- function() {return(list(first=1, second=2))}
R> r <- functionReturningTwoValues()
R> r$first
[1] 1
R> r$second
[1] 2
--- edited to answer Shane's questions ---
I don't really need to give names to the parts of the result. I am applying one aggregate function to the first component and another to the second component (min and max; if it were the same function for both components, I would not need to split them).
(1) list[...] <- I had posted this over a decade ago on r-help. Since then it has been added to the gsubfn package. It does not require a special operator, but it does require that the left-hand side be written using list[...], like this:
library(gsubfn) # need 0.7-0 or later
list[a, b] <- functionReturningTwoValues()
If you only need the first or second component these all work too:
list[a] <- functionReturningTwoValues()
list[a, ] <- functionReturningTwoValues()
list[, b] <- functionReturningTwoValues()
(Of course, if you only needed one value then functionReturningTwoValues()[[1]] or functionReturningTwoValues()[[2]] would be sufficient.)
See the cited r-help thread for more examples.
(2) with: If the intent is merely to combine the multiple values afterwards, and the return values are named, then a simple alternative is to use with:
myfun <- function() list(a = 1, b = 2)
list[a, b] <- myfun()
a + b
# same
with(myfun(), a + b)
(3) attach: Another alternative is attach:
attach(myfun())
a + b
ADDED: with and attach
I somehow stumbled on this clever hack on the internet. I'm not sure if it's nasty or beautiful, but it lets you create a "magical" operator that allows you to unpack multiple return values into their own variables. The := function is defined here, and included below for posterity:
':=' <- function(lhs, rhs) {
frame <- parent.frame()
lhs <- as.list(substitute(lhs))
if (length(lhs) > 1)
lhs <- lhs[-1]
if (length(lhs) == 1) {
do.call(`=`, list(lhs[[1]], rhs), envir=frame)
return(invisible(NULL))
}
if (is.function(rhs) || is(rhs, 'formula'))
rhs <- list(rhs)
if (length(lhs) > length(rhs))
rhs <- c(rhs, rep(list(NULL), length(lhs) - length(rhs)))
for (i in 1:length(lhs))
do.call(`=`, list(lhs[[i]], rhs[[i]]), envir=frame)
return(invisible(NULL))
}
With that in hand, you can do what you're after:
functionReturningTwoValues <- function() {
return(list(1, matrix(0, 2, 2)))
}
c(a, b) := functionReturningTwoValues()
a
#[1] 1
b
# [,1] [,2]
# [1,] 0 0
# [2,] 0 0
I don't know how I feel about that. Perhaps you might find it helpful in your interactive workspace. Using it to build (re-)usable libraries (for mass consumption) might not be the best idea, but I guess that's up to you.
... you know what they say about responsibility and power ...
Usually I wrap the output in a list, which is very flexible (you can have any combination of numbers, strings, vectors, matrices, arrays, lists and objects in the output),
like so:
func2<-function(input) {
a<-input+1
b<-input+2
output<-list(a,b)
return(output)
}
output<-func2(5)
for (i in output) {
print(i)
}
[1] 6
[1] 7
I put together an R package zeallot to tackle this problem. zeallot includes a multiple assignment or unpacking assignment operator, %<-%. The LHS of the operator is any number of variables to assign, built using calls to c(). The RHS of the operator is a vector, list, data frame, date object, or any custom object with an implemented destructure method (see ?zeallot::destructure).
Here are a handful of examples based on the original post,
library(zeallot)
functionReturningTwoValues <- function() {
return(c(1, 2))
}
c(a, b) %<-% functionReturningTwoValues()
a # 1
b # 2
functionReturningListOfValues <- function() {
return(list(1, 2, 3))
}
c(d, e, f) %<-% functionReturningListOfValues()
d # 1
e # 2
f # 3
functionReturningNestedList <- function() {
return(list(1, list(2, 3)))
}
c(f, c(g, h)) %<-% functionReturningNestedList()
f # 1
g # 2
h # 3
functionReturningTooManyValues <- function() {
return(as.list(1:20))
}
c(i, j, ...rest) %<-% functionReturningTooManyValues()
i # 1
j # 2
rest # list(3, 4, 5, ..)
Check out the package vignette for more information and examples.
functionReturningTwoValues <- function() {
results <- list()
results$first <- 1
results$second <-2
return(results)
}
a <- functionReturningTwoValues()
This returns a named list, so the values can be accessed as a$first and a$second.
There's no right answer to this question. It really depends on what you're doing with the data. In the simple example above, I would strongly suggest:
Keep things as simple as possible.
Wherever possible, it's a best practice to keep your functions vectorized. That provides the greatest amount of flexibility and speed in the long run.
Is it important that the values 1 and 2 above have names? In other words, why is it important in this example that 1 and 2 be named a and b, rather than just r[1] and r[2]? One important thing to understand in this context is that a and b are also both vectors of length 1. So you're not really changing anything in the process of making that assignment, other than having 2 new vectors that don't need subscripts to be referenced:
> r <- c(1,2)
> a <- r[1]
> b <- r[2]
> class(r)
[1] "numeric"
> class(a)
[1] "numeric"
> a
[1] 1
> a[1]
[1] 1
You can also assign the names to the original vector if you would rather reference the letter than the index:
> names(r) <- c("a","b")
> names(r)
[1] "a" "b"
> r["a"]
a
1
[Edit] Given that you will be applying min and max to each vector separately, I would suggest either using a matrix (if a and b will be the same length and the same data type) or data frame (if a and b will be the same length but can be different data types) or else use a list like in your last example (if they can be of differing lengths and data types).
> r <- data.frame(a=1:4, b=5:8)
> r
a b
1 1 5
2 2 6
3 3 7
4 4 8
> min(r$a)
[1] 1
> max(r$b)
[1] 8
If you want to return the output of your function to the Global Environment, you can use list2env, like in this example:
myfun <- function(x) {
a <- 1:x
b <- 5:x
df <- data.frame(a = a, b = b)
newList <- list("my_obj1" = a, "my_obj2" = b, "myDF" = df)
list2env(newList, .GlobalEnv)
}
myfun(3)
This function will create three objects in your Global Environment:
> my_obj1
[1] 1 2 3
> my_obj2
[1] 5 4 3
> myDF
a b
1 1 5
2 2 4
3 3 3
Lists seem perfect for this purpose. For example within the function you would have
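myfun <- function() { # hypothetical function name
x <- 1:3 # desired_return_value_1 (vector, matrix, etc.); hypothetical value
y <- diag(2) # desired_return_value_2 (vector, matrix, etc.); hypothetical value
returnlist <- list(x, y)
returnlist
} # end of function
# main program
returnlist <- myfun()
x <- returnlist[[1]]
y <- returnlist[[2]]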
Yes to your second and third questions: that's what you need to do, as you cannot have multiple 'lvalues' on the left of an assignment.
How about using assign?
functionReturningTwoValues <- function(a, b) {
assign(a, 1, pos=1)
assign(b, 2, pos=1)
}
You pass in the names of the variables you want to create, which mimics passing by reference.
> functionReturningTwoValues('a', 'b')
> a
[1] 1
> b
[1] 2
If you need to access the existing values, the converse of assign is get.
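A sketch of a somewhat safer variant, assuming you want the variables created in the calling function rather than always in the global workspace (which is what pos = 1 targets): pass envir = parent.frame() to assign(), and use get() to read a value back by name. The function names are hypothetical.

```r
# Hypothetical helper: creates variables named by the strings a and b
set_two_values <- function(a, b, envir = parent.frame()) {
  assign(a, 1, envir = envir)  # parent.frame() is the caller's environment
  assign(b, 2, envir = envir)
  invisible(NULL)
}

caller <- function() {
  set_two_values("p", "q")     # creates p and q inside caller(), not globally
  c(p, get("q"))               # get() is the converse of assign()
}
caller()                       # 1 2
```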
[A]
If each of foo and bar is a single number, then there's nothing wrong with c(foo,bar); and you can also name the components: c(Foo=foo,Bar=bar). So you could access the components of the result 'res' as res[1], res[2]; or, in the named case, as res["Foo"], res["Bar"].
[B]
If foo and bar are vectors of the same type and length, then again there's nothing wrong with returning cbind(foo,bar) or rbind(foo,bar); likewise nameable. In the 'cbind' case, you would access foo and bar as res[,1], res[,2] or as res[,"Foo"], res[,"Bar"]. You might also prefer to return a dataframe rather than a matrix:
data.frame(Foo=foo,Bar=bar)
and access them as res$Foo, res$Bar. This would also work well if foo and bar were of the same length but not of the same type (e.g. foo is a vector of numbers, bar a vector of character strings).
[C]
If foo and bar are sufficiently different not to combine conveniently as above, then you should definitely return a list.
For example, your function might fit a linear model and
also calculate predicted values, so you could have
LM<-lm(....) ; foo<-summary(LM); bar<-LM$fit
and then you would return list(Foo=foo,Bar=bar) and then access the summary as res$Foo, the predicted values as res$Bar
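A runnable sketch of case [C], using the built-in mtcars data and a hypothetical model mpg ~ wt in place of the elided lm(....) call; fitted() is used here to extract the predicted values.

```r
fit_and_predict <- function(data) {
  LM <- lm(mpg ~ wt, data = data)  # hypothetical model, for illustration only
  foo <- summary(LM)               # a summary.lm object
  bar <- fitted(LM)                # predicted values, one per row of data
  list(Foo = foo, Bar = bar)       # very different objects, so return a list
}

res <- fit_and_predict(mtcars)
res$Foo$r.squared  # access the summary
head(res$Bar)      # access the predicted values
```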
source: http://r.789695.n4.nabble.com/How-to-return-multiple-values-in-a-function-td858528.html
As of 2021, this is something I frequently use.
The tidyverse (via the tibble package) has a function called lst() that names the list elements automatically when creating the list.
After that, I use list2env() to assign the variables, or I use the list directly:
library(tidyverse)
fun <- function(){
a<-1
b<-2
lst(a,b)
}
list2env(fun(), envir = .GlobalEnv) # unpacks the named list elements into variables in the global environment
This is only for the sake of completeness and not because I personally prefer it. You can pipe the result with %>%, evaluate it inside curly braces {}, and write variables to an enclosing environment using the double arrow <<-.
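If you'd rather not write into the global environment, the same idea works inside another function by passing envir = environment(). A sketch, using base R's list(a = a, b = b) as an explicit equivalent of lst(a, b); the function names are hypothetical.

```r
fun <- function() {
  a <- 1
  b <- 2
  list(a = a, b = b)  # base-R equivalent of lst(a, b): names given explicitly
}

use_values <- function() {
  list2env(fun(), envir = environment())  # creates a and b in this function only
  a + b
}
use_values()  # 3
```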
library(tidyverse)
functionReturningTwoValues() %>% {a <<- .[1]; b <<- .[2]}
UPDATE:
You can also use the multiple assignment operator %<-% from the zeallot package:
c(a, b) %<-% list(0, 1)
I will post a function that returns multiple objects by way of vectors:
Median <- function(X){
X_Sort <- sort(X)
if (length(X)%%2==0){
Median <- (X_Sort[(length(X)/2)]+X_Sort[(length(X)/2)+1])/2
} else{
Median <- X_Sort[(length(X)+1)/2]
}
return(Median)
}
That is a function I created to calculate the median. I know that R has a built-in median() function, but I wrote my own in order to build another function that calculates the quartiles of a numeric data set using the Median() function I just programmed. The Median() function works like this:
If a numeric vector X has an even number of elements (i.e., length(X)%%2==0), the median is calculated by averaging the elements sort(X)[length(X)/2] and sort(X)[(length(X)/2+1)].
If X doesn't have an even number of elements, the median is sort(X)[(length(X)+1)/2].
On to the QuartilesFunction():
QuartilesFunction <- function(X){
X_Sort <- sort(X) # Data is sorted in ascending order
if (length(X)%%2==0){
# Data number is even
HalfDN <- X_Sort[1:(length(X)/2)]
HalfUP <- X_Sort[((length(X)/2)+1):length(X)]
QL <- Median(HalfDN)
QU <- Median(HalfUP)
QL1 <- QL
QL2 <- QL
QU1 <- QU
QU2 <- QU
QL3 <- QL
QU3 <- QU
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
} else{ # Data number is odd
# Including the median
Half1DN <- X_Sort[1:((length(X)+1)/2)]
Half1UP <- X_Sort[(((length(X)+1)/2)):length(X)]
QL1 <- Median(Half1DN)
QU1 <- Median(Half1UP)
# Not including the median
Half2DN <- X_Sort[1:(((length(X)+1)/2)-1)]
Half2UP <- X_Sort[(((length(X)+1)/2)+1):length(X)]
QL2 <- Median(Half2DN)
QU2 <- Median(Half2UP)
# Methods (1) and (2) averaged
QL3 <- (QL1+QL2)/2
QU3 <- (QU1+QU2)/2
Quartiles <- c(QL1,QU1,QL2,QU2,QL3,QU3)
names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)","QL (3)", "QU (3)")
}
return(Quartiles)
}
This function returns the quartiles of a numeric vector by using three methods:
Keeping the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Discarding the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Averaging the results obtained by using methods 1 and 2.
When the number of elements in the numeric vector X is even, the three methods coincide.
The result of the QuartilesFunction() is a vector that depicts the first and third quartiles calculated by using the three methods outlined.
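For comparison, here is a compact sketch of the same three methods using base R's median() instead of the custom Median(), returning a small matrix (one row per method) rather than a named vector; the function name is hypothetical.

```r
quartiles_three_ways <- function(X) {
  s <- sort(X)
  n <- length(s)
  if (n %% 2 == 0) {
    # Even length: the three methods coincide
    q <- c(median(s[1:(n / 2)]), median(s[(n / 2 + 1):n]))
    return(rbind(include = q, exclude = q, average = q))
  }
  m <- (n + 1) / 2                                      # position of the median
  inc <- c(median(s[1:m]), median(s[m:n]))              # method 1: halves keep the median
  exc <- c(median(s[1:(m - 1)]), median(s[(m + 1):n]))  # method 2: halves drop it
  rbind(include = inc, exclude = exc, average = (inc + exc) / 2)
}

quartiles_three_ways(1:7)
# include: 2.5, 5.5; exclude: 2, 6; average: 2.25, 5.75
```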
With R 3.6.1, I can do the following
fr2v <- function() { c(5,3) }
a_b <- fr2v()
(a_b[[1]]) # prints "5"
(a_b[[2]]) # prints "3"
To obtain multiple outputs from a function and keep them in the desired format you can save the outputs to your hard disk (in the working directory) from within the function and then load them from outside the function:
myfun <- function(x) {
df1 <- ...
df2 <- ...
save(df1, file = "myfile1")
save(df2, file = "myfile2")
}
load("myfile1")
load("myfile2")
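A variant of the save/load idea, sketched with saveRDS() and readRDS() instead of save() and load(): each file holds a single object, so you choose the variable name when reading it back, rather than having load() recreate the original names. The data frames and the use of tempdir() are placeholders for illustration.

```r
myfun <- function(x, dir = tempdir()) {
  df1 <- data.frame(v = x)      # placeholder outputs standing in for the "..."
  df2 <- data.frame(v = x * 2)
  saveRDS(df1, file.path(dir, "myfile1.rds"))
  saveRDS(df2, file.path(dir, "myfile2.rds"))
  invisible(NULL)
}

myfun(1:3)
first  <- readRDS(file.path(tempdir(), "myfile1.rds"))  # name chosen on reading
second <- readRDS(file.path(tempdir(), "myfile2.rds"))
```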
I understand how outer() works in R:
> outer(c(1,2,4),c(8,16,32), "*")
[,1] [,2] [,3]
[1,] 8 16 32
[2,] 16 32 64
[3,] 32 64 128
It basically takes two vectors, forms every pair of their elements (the Cartesian product), and then applies the function to each pair.
I don't have two vectors, however. I have two lists of matrices:
M = list();
M[[1]] = matrix(...)
M[[2]] = matrix(...)
M[[3]] = matrix(...)
And I want to do an operation on my list of matrices. I want to do:
outer(M, M, "*")
In this case, I want to take the dot product of each combination of matrices I have.
Actually, I am trying to generate a kernel matrix (and I have written a kernel function), so I want to do:
outer(M, M, kernelFunction)
where kernelFunction calculates a distance between my two matrices.
The problem is that outer() only takes "vector" arguments, rather than "list"s etc. Is there a function that does the equivalent of outer() for non-vector entities?
Alternately, I could use a for-loop to do this:
M = list() # Each element in M is a matrix
for (i in 1:numElements)
{
for (j in 1:numElements)
{
k = kernelFunction(M[[i]], M[[j]])
kernelMatrix[i,j] = k;
}
}
but I am trying to avoid this in favor of an R construct (which might be more efficient). (Yes, I know I can modify the for loop to exploit the symmetry of the kernel matrix, computing only one triangle and saving about 50% of the computations. But that's not the code that I'm trying to optimize!)
Is this possible? Any thoughts/suggestions?
The outer function actually DOES work on lists, but the function that you provide is called once, with the two inputs repeated so that together they contain all possible combinations. It therefore has to handle all the pairs itself, which f1 below does with vapply.
As for which is faster, combining outer with vapply is 3x faster than the double for-loop on my machine. If the actual kernel function does "real work", the difference in looping speed is probably not so important.
f1 <- function(a,b, fun) {
outer(a, b, function(x,y) vapply(seq_along(x), function(i) fun(x[[i]], y[[i]]), numeric(1)))
}
f2 <- function(a,b, fun) {
kernelMatrix <- matrix(0L, length(a), length(b))
for (i in seq_along(a))
{
for (j in seq_along(b))
{
kernelMatrix[i,j] = fun(a[[i]], b[[j]])
}
}
kernelMatrix
}
n <- 300
m <- 2
a <- lapply(1:n, function(x) matrix(runif(m*m),m))
b <- lapply(1:n, function(x) matrix(runif(m*m),m))
kernelFunction <- function(x,y) 0 # dummy, so we only measure the loop overhead
> system.time( r1 <- f1(a,b, kernelFunction) )
user system elapsed
0.08 0.00 0.07
> system.time( r2 <- f2(a,b, kernelFunction) )
user system elapsed
0.23 0.00 0.23
> identical(r1, r2)
[1] TRUE
Just use the for loop. Any built-in functions will degenerate to that anyway, and you'll lose clarity of expression, unless you carefully build a function that generalises outer to work with lists.
The biggest improvement you could make would be to preallocate the matrix:
kernelMatrix <- matrix(0, nrow = numElements, ncol = numElements)
PS. A list is a vector.
Although this is an old question, here is another solution that is more in the spirit of the outer function. The idea is to apply outer along the indices of list1 and list2:
cor2 <- Vectorize(function(x,y) {
vec1 <- list1[[x]]
vec2 <- list2[[y]]
cor(vec1,vec2,method="spearman")
})
outer(1:length(list1), 1:length(list2), cor2)
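A self-contained sketch of the same index-based idea applied to the original kernel-matrix question. The list M and the Frobenius-distance kernel below are hypothetical stand-ins for the asker's data and kernelFunction; the double loop is included only to check the result.

```r
set.seed(1)
M <- lapply(1:4, function(i) matrix(runif(4), 2))  # hypothetical list of 2x2 matrices

# Hypothetical kernel: Frobenius distance between two matrices
kernelFunction <- function(A, B) sqrt(sum((A - B)^2))

# outer() over indices; Vectorize() lets the scalar kernel accept index vectors
kernelMatrix <- outer(seq_along(M), seq_along(M),
                      Vectorize(function(i, j) kernelFunction(M[[i]], M[[j]])))

# Reference: preallocated double loop
ref <- matrix(0, length(M), length(M))
for (i in seq_along(M)) {
  for (j in seq_along(M)) ref[i, j] <- kernelFunction(M[[i]], M[[j]])
}
identical(kernelMatrix, ref)  # TRUE
```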