In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:
z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
sum(z)
# [1] 498
table(z)["TRUE"]
# TRUE
# 498
Which do you prefer? Is there anything even better?
The safest way is to use sum() with na.rm = TRUE:
sum(z, na.rm = TRUE) # best way to count TRUE values
There are problems with the other solutions when the logical vector contains NA values; with the three-element example below, sum(z, na.rm = TRUE) still correctly returns 1.
See for example:
z <- c(TRUE, FALSE, NA)
sum(z) # gives you NA
table(z)["TRUE"] # gives you 1
length(z[z == TRUE]) # f3lix answer, gives you 2 (because indexing with an NA returns an NA element)
Additionally, the table() solution is less efficient (look at the code of the table function).
You should also be careful with the table() solution in case there are no TRUE values in the logical vector. See for example:
z <- c(FALSE, FALSE)
table(z)["TRUE"] # gives you `NA`
or
z <- c(NA, FALSE)
table(z)["TRUE"] # gives you `NA`
Another option which hasn't been mentioned is to use which:
length(which(z))
Just to actually provide some context on the "which is faster" question, it's always easiest just to test it yourself. I made the vector much larger for the comparison:
z <- sample(c(TRUE,FALSE),1000000,rep=TRUE)
system.time(sum(z))
user system elapsed
0.03 0.00 0.03
system.time(length(z[z==TRUE]))
user system elapsed
0.75 0.07 0.83
system.time(length(which(z)))
user system elapsed
1.34 0.28 1.64
system.time(table(z)["TRUE"])
user system elapsed
10.62 0.52 11.19
So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.
Just to add a note regarding NA values and the which function:
> which(c(T, F, NA, NULL, T, F))
[1] 1 4
> which(!c(T, F, NA, NULL, T, F))
[1] 2 5
Note that which only checks for logical TRUE, so it essentially ignores non-logical values.
Another way is
> length(z[z==TRUE])
[1] 498
While sum(z) is nice and short, for me length(z[z==TRUE]) is more self-explanatory. Though, I think with a simple task like this it does not really make a difference...
If it is a large vector, you probably should go with the fastest solution, which is sum(z). length(z[z==TRUE]) is about 10x slower and table(z)[TRUE] is about 200x slower than sum(z).
Summing up, sum(z) is the fastest to type and to execute.
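If you want to reproduce these relative timings on your own machine, a small sketch with the microbenchmark package (assuming it is installed) looks like this:
library(microbenchmark)
z <- sample(c(TRUE, FALSE), 1e6, replace = TRUE)
microbenchmark(
  sum(z),
  length(z[z == TRUE]),
  length(which(z)),
  table(z)["TRUE"],
  times = 20
)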
Another option is to use the summary() function. It gives a tally of the TRUE, FALSE and NA values.
> summary(hival)
Mode FALSE TRUE NA's
logical 4367 53 2076
which() is a good alternative, especially when you operate on matrices (check ?which and notice the arr.ind argument). But I suggest you stick with sum, because of the na.rm argument that can handle NAs in the logical vector.
For instance:
# create dummy variable
set.seed(100)
x <- round(runif(100, 0, 1))
x <- x == 1
# create NA's
x[seq(1, length(x), 7)] <- NA
If you type in sum(x) you'll get NA as a result, but if you pass na.rm = TRUE in sum function, you'll get the result that you want.
> sum(x)
[1] NA
> sum(x, na.rm=TRUE)
[1] 43
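As an aside, here is a small sketch of the arr.ind argument mentioned above; it makes which() report row/column positions when applied to a matrix:
m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
which(m)                  # linear indices: 1 4
which(m, arr.ind = TRUE)  # two-column matrix of row/col positions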
Is your question strictly theoretical, or do you have a practical problem concerning logical vectors?
There's also a package called bit that is specifically designed for fast boolean operations. It's especially useful if you have large vectors or need to do many boolean operations.
z  <- sample(c(TRUE, FALSE), 1e8, rep = TRUE)
zb <- bit::as.bit(z)   # convert the logical vector to a bit vector first
system.time({
sum(z)   # ~0.17s on the plain logical vector
})
system.time({
sum(zb)  # ~0.02s on the bit vector, roughly an order of magnitude faster
})
I was doing something similar a few weeks ago. Here's a possible solution; it's written from scratch, so it's kind of a beta release or something like that. I'll try to improve it by removing the loops from the code...
The main idea is to write a function that takes 2 (or 3) arguments. The first one is a data.frame that holds the data gathered from the questionnaire, and the second one is a numeric vector with the correct answers (this is only applicable to a single-choice questionnaire). Optionally, a third argument controls whether a numeric vector with the final score is returned, or a data.frame with the score embedded.
fscore <- function(x, sol, output = 'numeric') {
  if (ncol(x) != length(sol)) {
    stop('Number of items differs from length of correct answers!')
  } else {
    inc <- matrix(ncol = ncol(x), nrow = nrow(x))
    for (i in 1:ncol(x)) {
      inc[, i] <- x[, i] == sol[i]
    }
    if (output == 'numeric') {
      res <- rowSums(inc)
    } else if (output == 'data.frame') {
      res <- data.frame(x, result = rowSums(inc))
    } else {
      stop('Type not supported!')
    }
  }
  return(res)
}
I'll try to do this in a more elegant manner with some *ply function. Note that I didn't include an na.rm argument... I'll add that later.
# create dummy data frame - values from 1 to 5
set.seed(100)
d <- as.data.frame(matrix(round(runif(200,1,5)), 10))
# create solution vector
sol <- round(runif(20, 1, 5))
Now apply a function:
> fscore(d, sol)
[1] 6 4 2 4 4 3 3 6 2 6
If you pass output = 'data.frame', it will return the modified data.frame.
I'll keep polishing this one... Hope it helps!
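As a follow-up to the loop-removal idea mentioned above, here is a minimal sketch of the same scoring logic using sapply() instead of the for loop (assuming x and sol are shaped as described; fscore2 is just an illustrative name):
# sketch only: vectorised version of the comparison loop inside fscore()
fscore2 <- function(x, sol) {
  inc <- sapply(seq_along(sol), function(i) x[, i] == sol[i])
  rowSums(inc, na.rm = TRUE)   # na.rm handles missing answers
}
fscore2(d, sol)   # should match fscore(d, sol) when there are no NAs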
I've just had a particular problem where I had to count the number of true statements from a logical vector and this worked best for me...
length(grep(TRUE, (gene.rep.matrix[i,1:6] > 1))) > 5
This takes a subset of the gene.rep.matrix object and applies a logical test, returning a logical vector. That vector is passed as an argument to grep, which returns the locations of any TRUE entries. length() then counts how many entries grep finds, giving the number of TRUE entries.
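For comparison, here is a minimal sketch (using a hypothetical stand-in matrix m instead of gene.rep.matrix) showing that sum() yields the same count more directly:
m <- matrix(c(0, 2, 3, 1, 5, 0), nrow = 1)   # hypothetical stand-in data
length(grep(TRUE, (m[1, 1:6] > 1)))          # grep-based count: 3
sum(m[1, 1:6] > 1)                           # same count via sum(): 3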
I need to create an R function which takes a numeric matrix A of arbitrary dimensions n*m as input and returns a vector, as long as A has columns, containing the sum of the positive elements of each column.
I must do this in two ways: the first with a nested loop and the second as a one-liner using vector/matrix operations.
So far I have come up with the following code, which creates a vector sized to the number of columns of matrix A, but I can only seem to get it to give me the sum of all positive elements of the whole matrix instead of each column:
colSumPos <- function(A){
  columns <- ncol(A)
  v1 <- vector("numeric", columns)
  for(i in 1:columns)
  {
    v1[i] <- sum(A[which(A > 0)])
  }
  v1
}
Could someone please explain how I get the sum of each column separately and then how I can simplify the code to dispose of the nested loop?
Thanks in advance for the help!
We can use apply with MARGIN=2 to loop through the columns and get the sum of elements that are greater than 0
apply(A, 2, function(x) sum(x[x >0], na.rm = TRUE))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or another option is colSums after replacing the values less than or equal to 0 with NA
colSums(A*NA^(A<=0), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or by more direct approach
colSums(replace(A, A<=0, NA), na.rm = TRUE)
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
Or if there are no NA elements (no need for na.rm=TRUE), we can replace the values that are less than or equal to 0 with 0 and make it compact (as #ikop commented)
colSums(A*(A>0))
#[1] 1.8036685 0.7129192 0.9305136 2.6625824 0.0000000
data
set.seed(24)
A <- matrix(rnorm(25), 5, 5)
You can try the following code if you want to use a for loop:
sumColum <- function(A){
  v <- numeric(ncol(A))
  for(j in 1:ncol(A)){
    for(i in 1:nrow(A)){
      if(!is.na(A[i, j]) && A[i, j] > 0){
        v[j] <- v[j] + A[i, j]
      }
    }
  }
  v
}
I am relatively new to R, and to matrix-based scripting languages in general. I have written this function to return the indexes of each row whose content is similar to any other row's content. It is a primitive form of spam reduction that I am developing.
if (!require("RecordLinkage")) install.packages("RecordLinkage")
library("RecordLinkage")
# Takes a column of strings, returns a vector of indexes
check_similarity <- function(x) {
threshold <- 0.8
values <- NULL
for(i in 1:length(x)) {
values <- c(values, which(jarowinkler(x[i], x[-i]) > threshold))
}
return(values)
}
is there a way that I could write this to avoid the for loop entirely?
We can simplify the code somewhat using sapply.
# some test data #
threshold = 0.8
x = c('hello', 'hollow', 'cat', 'turtle', 'bottle', 'xxx')
# create an x by x matrix specifying which strings are alike
m = sapply(x, jarowinkler, x) > threshold
# set diagonal to FALSE: we're not interested in strings being identical to themselves
diag(m) = FALSE
# And find index positions of all strings that are similar to at least one other string
which(rowSums(m) > 0)
# [1] 1 2 4 5
I.e. this returns the index positions of 'hello', 'hollow', 'turtle', and 'bottle' as being similar to another string
If you prefer, you can use colSums instead of rowSums to get a named vector, but this could be messy if the strings are long:
which(colSums(m) > 0)
# hello hollow turtle bottle
# 1 2 4 5
function(q,b,Data1,Data2){
x<-sum(
ifelse(Data1[13+q,b]/Data1[12+q,b]>Data2[13+q,1]/Data2[12+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[11+q,b]>Data2[13+q,1]/Data2[11+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[10+q,b]>Data2[13+q,1]/Data2[10+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[9+q,b]>Data2[13+q,1]/Data2[9+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[8+q,b]>Data2[13+q,1]/Data2[8+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[7+q,b]>Data2[13+q,1]/Data2[7+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[6+q,b]>Data2[13+q,1]/Data2[6+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[5+q,b]>Data2[13+q,1]/Data2[5+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[4+q,b]>Data2[13+q,1]/Data2[4+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[3+q,b]>Data2[13+q,1]/Data2[3+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[2+q,b]>Data2[13+q,1]/Data2[2+q,1],1,0)+
ifelse(Data1[13+q,b]/Data1[1+q,b]>Data2[13+q,1]/Data2[1+q,1],1,0)
)/12
}
Is there a way to simplify this? (no characters, only numbers in the data sets)
Thank you
Two pieces of knowledge you can combine to improve your code:
Firstly, you can divide a single number by a vector and R will return a vector with elementwise divisions. For example:
5 / c(1,2,3,4,5,6)
# [1] 5.0000000 2.5000000 1.6666667 1.2500000 1.0000000 0.8333333
The numerators on both sides of the inequality are the same every time, so you can use this to compute all twelve ratios in one call instead of writing each inequality out explicitly.
Secondly, an expression with TRUE or FALSE will be coerced to 1 and 0 when you perform arithmetic operations on it (in your case division, or calculating a mean). Inequalities return TRUE or FALSE values, so explicitly telling R to convert them to 0 and 1 is wasted effort; R does it automatically in the last step.
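A two-line illustration of that coercion:
x <- c(2, 4, 6) > 3   # logical vector: FALSE TRUE TRUE
sum(x); mean(x)       # arithmetic coerces to 0/1, giving 2 and 0.6666667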
Putting this together in a simplified function:
function(q, b, Data1, Data2){
  qseq <- (1:12) + q       # Replaces all "q+1", "q+2", ... , "q+12"
  dat1 <- Data1[qseq, b]   # Replaces all "Data1[q+1, b]", ... , "Data1[q+12, b]"
  dat2 <- Data2[qseq, 1]   # Replaces all "Data2[q+1, 1]", ... , "Data2[q+12, 1]"
  mean(Data1[13+q, b]/dat1 > Data2[13+q, 1]/dat2)
}
This simplifies it a bit:
function(q, b, Data1, Data2){
  data1_num <- Data1[13+q, b]
  data2_num <- Data2[13+q, 1]
  x <- 0
  for (i in 1:12) {
    x <- x + ((data1_num/Data1[i+q, b]) > (data2_num/Data2[i+q, 1]))
  }
  x / 12
}
But if you provide example data and the output you're expecting, I'm sure there is a way to simplify it further.
I am new to R, so I am not sure how to use apply.
I would like to speed up my function by using apply:
for(i in 1:ncol(exp)){
  for (j in 1:length(fe)){
    tmp = TRUE
    id = strsplit(colnames(exp)[i], "\\.")
    if(id == fe[j]){
      tmp = FALSE
    }
    if(tmp == TRUE){
      only = cbind(only, c(names(exp)[i], exp[,i]))
    }
  }
}
How can I use the apply function to do this above?
EDIT :
Thank you so much for the very good explanation and sorry for my bad description. You guessed everything right, but I want to delete the columns that match fe.
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
then the result should contain only the column whose name starts with 'C'. Everything else should be deleted.
1 C.z
2 11
3 12
4 13
5 14
6 15
7 16
8 17
9 18
10 19
11 20
EDIT : If you only want to delete the columns whose names appear in fe, you can simply do :
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]
id <- sapply(strsplit(names(Exp),"\\."),
function(i)!i[1] %in% fe)
Exp[id]
This code does exactly what your (updated) for-loop does, only a lot more efficiently. You don't have to loop through fe; the %in% function is vectorized.
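A quick illustration of that vectorization:
c("A", "C", "B") %in% LETTERS[1:2]   # TRUE FALSE TRUE in a single vectorized call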
In case the name can appear anywhere between the dots, then
id <- sapply(strsplit(names(Exp),"\\."),
function(i)sum(i %in% fe)==0)
Your code does some very funny things, and I have no clue what exactly you're trying to do. For one, strsplit gives a list, so id == fe[j] will always return FALSE, unless fe[j] is a list itself. And I doubt it is... So I'd correct your code to
id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]
in case you want to compare with everything that is before the dot, or to
id = unlist(strsplit(colnames(Exp)[i],"\\."))
if you want to compare with everything in the string. In that case, you should use %in% instead of == as well.
Second, what you get is a character matrix, which essentially multiplies rows. If all elements in fe[j] are unique, you could just as well do :
only <- rbind(names(exp),exp)
only <- do.call(cbind,lapply(mat,function(x)
matrix(rep(x,ncol(exp)-1),nrow=nrow(exp)+1)
))
Assuming that the logic in your code does make sense (as you didn't supply any sample data, this is impossible to know), the optimized version runs :
mat <- rbind(names(Exp),Exp)
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
matrix(rep(x,n),nrow=nrow(mat))
}))
Note that - in case you are interested if fe[j] appears anywhere in the name - you can change the code to :
do.call(cbind,
lapply(mat, function(x){
n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
matrix(rep(x,n),nrow=nrow(mat))
}))
If this doesn't return what you want, then your code doesn't do that either. I checked with following sample data, and all gives the same result :
Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]
The apply() family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:
set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)
system.time({
yLoop <- x[,1]*0 # preallocate result
for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
# user system elapsed
# 13.39 0.00 13.39
system.time(yApply <- apply(x, 1, mean))
# user system elapsed
# 16.19 0.28 16.51
system.time(yRowMean <- rowMeans(x))
# user system elapsed
# 0.02 0.00 0.02
identical(yLoop, yApply)    # TRUE
identical(yApply, yRowMean) # TRUE
The reason your code is so slow is that--as Gavin pointed out--you're growing your array for every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
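A minimal sketch of that difference, contrasting a growing loop with a preallocated one:
n <- 1e5
system.time({ yGrow <- numeric(0); for(i in 1:n) yGrow <- c(yGrow, i^2) })  # grows each iteration: slow
system.time({ yPre <- numeric(n); for(i in 1:n) yPre[i] <- i^2 })           # preallocated: fast
identical(yGrow, yPre)  # TRUE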
Still trying to get into the R logic... what is the "best" way to unpack (on LHS) the results from a function returning multiple values?
I can't do this apparently:
R> functionReturningTwoValues <- function() { return(c(1, 2)) }
R> functionReturningTwoValues()
[1] 1 2
R> a, b <- functionReturningTwoValues()
Error: unexpected ',' in "a,"
R> c(a, b) <- functionReturningTwoValues()
Error in c(a, b) <- functionReturningTwoValues() : object 'a' not found
must I really do the following?
R> r <- functionReturningTwoValues()
R> a <- r[1]; b <- r[2]
or would the R programmer write something more like this:
R> functionReturningTwoValues <- function() {return(list(first=1, second=2))}
R> r <- functionReturningTwoValues()
R> r$first
[1] 1
R> r$second
[1] 2
--- edited to answer Shane's questions ---
I don't really need giving names to the result value parts. I am applying one aggregate function to the first component and an other to the second component (min and max. if it was the same function for both components I would not need splitting them).
(1) list[...]<- I had posted this over a decade ago on r-help. Since then it has been added to the gsubfn package. It does not require a special operator but does require that the left hand side be written using list[...] like this:
library(gsubfn) # need 0.7-0 or later
list[a, b] <- functionReturningTwoValues()
If you only need the first or second component these all work too:
list[a] <- functionReturningTwoValues()
list[a, ] <- functionReturningTwoValues()
list[, b] <- functionReturningTwoValues()
(Of course, if you only needed one value then functionReturningTwoValues()[[1]] or functionReturningTwoValues()[[2]] would be sufficient.)
See the cited r-help thread for more examples.
(2) with If the intent is merely to combine the multiple values subsequently and the return values are named then a simple alternative is to use with :
myfun <- function() list(a = 1, b = 2)
list[a, b] <- myfun()
a + b
# same
with(myfun(), a + b)
(3) attach Another alternative is attach:
attach(myfun())
a + b
I somehow stumbled on this clever hack on the internet... I'm not sure if it's nasty or beautiful, but it lets you create a "magical" operator that allows you to unpack multiple return values into their own variables. The := function is defined here, and included below for posterity:
':=' <- function(lhs, rhs) {
  frame <- parent.frame()
  lhs <- as.list(substitute(lhs))
  if (length(lhs) > 1)
    lhs <- lhs[-1]
  if (length(lhs) == 1) {
    do.call(`=`, list(lhs[[1]], rhs), envir=frame)
    return(invisible(NULL))
  }
  if (is.function(rhs) || is(rhs, 'formula'))
    rhs <- list(rhs)
  if (length(lhs) > length(rhs))
    rhs <- c(rhs, rep(list(NULL), length(lhs) - length(rhs)))
  for (i in 1:length(lhs))
    do.call(`=`, list(lhs[[i]], rhs[[i]]), envir=frame)
  return(invisible(NULL))
}
With that in hand, you can do what you're after:
functionReturningTwoValues <- function() {
return(list(1, matrix(0, 2, 2)))
}
c(a, b) := functionReturningTwoValues()
a
#[1] 1
b
# [,1] [,2]
# [1,] 0 0
# [2,] 0 0
I don't know how I feel about that. Perhaps you might find it helpful in your interactive workspace. Using it to build (re-)usable libraries (for mass consumption) might not be the best idea, but I guess that's up to you.
... you know what they say about responsibility and power ...
Usually I wrap the output into a list, which is very flexible (you can have any combination of numbers, strings, vectors, matrices, arrays, lists, and objects in the output),
so like:
func2<-function(input) {
a<-input+1
b<-input+2
output<-list(a,b)
return(output)
}
output<-func2(5)
for (i in output) {
print(i)
}
[1] 6
[1] 7
I put together an R package zeallot to tackle this problem. zeallot includes a multiple assignment or unpacking assignment operator, %<-%. The LHS of the operator is any number of variables to assign, built using calls to c(). The RHS of the operator is a vector, list, data frame, date object, or any custom object with an implemented destructure method (see ?zeallot::destructure).
Here are a handful of examples based on the original post,
library(zeallot)
functionReturningTwoValues <- function() {
return(c(1, 2))
}
c(a, b) %<-% functionReturningTwoValues()
a # 1
b # 2
functionReturningListOfValues <- function() {
return(list(1, 2, 3))
}
c(d, e, f) %<-% functionReturningListOfValues()
d # 1
e # 2
f # 3
functionReturningNestedList <- function() {
return(list(1, list(2, 3)))
}
c(f, c(g, h)) %<-% functionReturningNestedList()
f # 1
g # 2
h # 3
functionReturningTooManyValues <- function() {
return(as.list(1:20))
}
c(i, j, ...rest) %<-% functionReturningTooManyValues()
i # 1
j # 2
rest # list(3, 4, 5, ..)
Check out the package vignette for more information and examples.
functionReturningTwoValues <- function() {
results <- list()
results$first <- 1
results$second <-2
return(results)
}
a <- functionReturningTwoValues()
I think this works.
There's no right answer to this question. It really depends on what you're doing with the data. In the simple example above, I would strongly suggest:
Keep things as simple as possible.
Wherever possible, it's a best practice to keep your functions vectorized. That provides the greatest amount of flexibility and speed in the long run.
Is it important that the values 1 and 2 above have names? In other words, why is it important in this example that 1 and 2 be named a and b, rather than just r[1] and r[2]? One important thing to understand in this context is that a and b are also both vectors of length 1. So you're not really changing anything in the process of making that assignment, other than having 2 new vectors that don't need subscripts to be referenced:
> r <- c(1,2)
> a <- r[1]
> b <- r[2]
> class(r)
[1] "numeric"
> class(a)
[1] "numeric"
> a
[1] 1
> a[1]
[1] 1
You can also assign the names to the original vector if you would rather reference the letter than the index:
> names(r) <- c("a","b")
> names(r)
[1] "a" "b"
> r["a"]
a
1
[Edit] Given that you will be applying min and max to each vector separately, I would suggest either using a matrix (if a and b will be the same length and the same data type) or data frame (if a and b will be the same length but can be different data types) or else use a list like in your last example (if they can be of differing lengths and data types).
> r <- data.frame(a=1:4, b=5:8)
> r
a b
1 1 5
2 2 6
3 3 7
4 4 8
> min(r$a)
[1] 1
> max(r$b)
[1] 8
If you want to return the output of your function to the Global Environment, you can use list2env, like in this example:
myfun <- function(x) {
  a <- 1:x
  b <- 5:x
  df <- data.frame(a = a, b = b)
  newList <- list("my_obj1" = a, "my_obj2" = b, "myDF" = df)
  list2env(newList, .GlobalEnv)
}
myfun(3)
This function will create three objects in your Global Environment:
> my_obj1
[1] 1 2 3
> my_obj2
[1] 5 4 3
> myDF
a b
1 1 5
2 2 4
3 3 3
Lists seem perfect for this purpose. For example, within the function you would have
x = desired_return_value_1 # (vector, matrix, etc)
y = desired_return_value_2 # (vector, matrix, etc)
returnlist = list(x, y)    # add further elements as needed
return(returnlist)
} # end of function
Then in the main program, after calling the function and storing its result in returnlist:
x = returnlist[[1]]
y = returnlist[[2]]
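A concrete toy instance of this pattern (the function name and values are purely illustrative):
toyfun <- function(n) {   # illustrative name, not from the answer above
  x <- 1:n                # first return value
  y <- (1:n)^2            # second return value
  list(x, y)
}
returnlist <- toyfun(3)
returnlist[[1]]  # 1 2 3
returnlist[[2]]  # 1 4 9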
Yes to your second and third questions -- that's what you need to do as you cannot have multiple 'lvalues' on the left of an assignment.
How about using assign?
functionReturningTwoValues <- function(a, b) {
assign(a, 1, pos=1)
assign(b, 2, pos=1)
}
You can pass the names of the variables that you want the results assigned to.
> functionReturningTwoValues('a', 'b')
> a
[1] 1
> b
[1] 2
If you need to access the existing values, the converse of assign is get.
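And for completeness, the reverse lookup with get():
get("a")   # 1, retrieving by name the value that assign() stored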
[A]
If each of foo and bar is a single number, then there's nothing wrong with c(foo, bar); and you can also name the components: c(Foo=foo, Bar=bar). So you could access the components of the result 'res' as res[1], res[2]; or, in the named case, as res["Foo"], res["Bar"].
[B]
If foo and bar are vectors of the same type and length, then again there's nothing wrong with returning cbind(foo,bar) or rbind(foo,bar); likewise nameable. In the 'cbind' case, you would access foo and bar as res[,1], res[,2] or as res[,"Foo"], res[,"Bar"]. You might also prefer to return a dataframe rather than a matrix:
data.frame(Foo=foo,Bar=bar)
and access them as res$Foo, res$Bar. This would also work well if foo and bar were of the same length but not of the same type (e.g. foo is a vector of numbers, bar a vector of character strings).
[C]
If foo and bar are sufficiently different not to combine conveniently as above, then you should definitely return a list.
For example, your function might fit a linear model and also calculate predicted values, so you could have
LM <- lm(....); foo <- summary(LM); bar <- LM$fit
and then you would return list(Foo=foo, Bar=bar), accessing the summary as res$Foo and the predicted values as res$Bar.
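A minimal runnable sketch of that last case, using the built-in mtcars data for illustration (fit_and_predict is just an illustrative name):
fit_and_predict <- function() {
  LM <- lm(mpg ~ wt, data = mtcars)        # fit a linear model
  list(Foo = summary(LM), Bar = LM$fit)    # return the summary and fitted values together
}
res <- fit_and_predict()
res$Foo$r.squared  # R-squared from the model summary
head(res$Bar)      # first few fitted values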
source: http://r.789695.n4.nabble.com/How-to-return-multiple-values-in-a-function-td858528.html
Year 2021, and this is something I frequently use.
The tidyverse has a function called lst() that assigns names to the list elements when creating the list.
After that, I use list2env() to assign the variables, or I use the list directly.
library(tidyverse)
fun <- function(){
a<-1
b<-2
lst(a,b)
}
list2env(fun(), envir=.GlobalEnv)#unpacks list key-values to variable-values into the current environment
This is only for the sake of completeness and not because I personally prefer it. You can pipe %>% the result, evaluate it with curly braces {} and write variables to the parent environment using double-arrow <<-.
library(tidyverse)
functionReturningTwoValues() %>% {a <<- .[1]; b <<- .[2]}
UPDATE:
You can also use the multiple assignment operator %<-% from the zeallot package:
c(a, b) %<-% list(0, 1)
I will post a function that returns multiple objects by way of vectors:
Median <- function(X){
  X_Sort <- sort(X)
  if (length(X) %% 2 == 0){
    Median <- (X_Sort[(length(X)/2)] + X_Sort[(length(X)/2)+1]) / 2
  } else{
    Median <- X_Sort[(length(X)+1)/2]
  }
  return(Median)
}
That was a function I created to calculate the median. I know that there's a built-in function in R called median(), but nonetheless I programmed it in order to build another function that calculates the quartiles of a numeric data set using the Median() function I just programmed. The Median() function works like this:
If a numeric vector X has an even number of elements (i.e., length(X) %% 2 == 0), the median is calculated by averaging the elements sort(X)[length(X)/2] and sort(X)[(length(X)/2)+1].
If X doesn't have an even number of elements, the median is sort(X)[(length(X)+1)/2].
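A quick sanity check against base R's median() (assuming the Median() function above has been defined):
x_even <- c(4, 1, 3, 2)
x_odd  <- c(5, 1, 3)
Median(x_even); median(x_even)  # both 2.5
Median(x_odd);  median(x_odd)   # both 3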
On to the QuartilesFunction():
QuartilesFunction <- function(X){
  X_Sort <- sort(X) # Data is sorted in ascending order
  if (length(X) %% 2 == 0){
    # Data number is even
    HalfDN <- X_Sort[1:(length(X)/2)]
    HalfUP <- X_Sort[((length(X)/2)+1):length(X)]
    QL <- Median(HalfDN)
    QU <- Median(HalfUP)
    QL1 <- QL
    QL2 <- QL
    QU1 <- QU
    QU2 <- QU
    QL3 <- QL
    QU3 <- QU
    Quartiles <- c(QL1, QU1, QL2, QU2, QL3, QU3)
    names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)", "QL (3)", "QU (3)")
  } else { # Data number is odd
    # Including the median
    Half1DN <- X_Sort[1:((length(X)+1)/2)]
    Half1UP <- X_Sort[((length(X)+1)/2):length(X)]
    QL1 <- Median(Half1DN)
    QU1 <- Median(Half1UP)
    # Not including the median
    Half2DN <- X_Sort[1:(((length(X)+1)/2)-1)]
    Half2UP <- X_Sort[(((length(X)+1)/2)+1):length(X)]
    QL2 <- Median(Half2DN)
    QU2 <- Median(Half2UP)
    # Methods (1) and (2) averaged
    QL3 <- (QL1+QL2)/2
    QU3 <- (QU1+QU2)/2
    Quartiles <- c(QL1, QU1, QL2, QU2, QL3, QU3)
    names(Quartiles) = c("QL (1)", "QU (1)", "QL (2)", "QU (2)", "QL (3)", "QU (3)")
  }
  return(Quartiles)
}
This function returns the quartiles of a numeric vector by using three methods:
Discarding the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Keeping the median for the calculation of the quartiles when the number of elements of the numeric vector X is odd.
Averaging the results obtained by using methods 1 and 2.
When the number of elements in the numeric vector X is even, the three methods coincide.
The result of the QuartilesFunction() is a vector that depicts the first and third quartiles calculated by using the three methods outlined.
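An illustrative call (assuming the two functions above are defined):
set.seed(1)
v <- rnorm(9)         # odd length, so the three methods can differ
QuartilesFunction(v)  # named vector: QL/QU for methods (1), (2) and their average (3)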
With R 3.6.1, I can do the following
fr2v <- function() { c(5,3) }
a_b <- fr2v()
(a_b[[1]]) # prints "5"
(a_b[[2]]) # prints "3"
To obtain multiple outputs from a function and keep them in the desired format you can save the outputs to your hard disk (in the working directory) from within the function and then load them from outside the function:
myfun <- function(x) {
  df1 <- ...
  df2 <- ...
  save(df1, file = "myfile1")
  save(df2, file = "myfile2")
}
load("myfile1")
load("myfile2")