Function argument as a part of the output name - r

Perhaps a silly question, but I can't find an answer anywhere I've looked :P. I am trying to create a function with two arguments, both of which will be vectors (e.g. x = c(a, b, c) and y = c(50, 75, 100)). The function should calculate all combinations of these and use the argument values as part of the output names. E.g.:
function(x,y)
df$output_a_50 = a*2+50^2
df$output_a_75 = a*2+75^2
.....
Any suggestions will be appreciated :)

As @Spacedman and others discussed, your problem is that if you pass c(a, b, c) to your function, the names will be lost. The best alternative, in my opinion, is to pass a list:
foo <- function(x, y) {
  df <- list()
  for (xx in names(x)) {
    for (yy in y) {
      varname <- paste("output", xx, yy, sep = "_")
      df[[varname]] <- x[[xx]]*2 + yy^2
    }
  }
  df
}
foo(x = list(a = NA, b = 1, c = 2:3),
    y = c(50, 75, 100))
# $output_a_50
# [1] NA
#
# $output_a_75
# [1] NA
#
# $output_a_100
# [1] NA
#
# $output_b_50
# [1] 2502
#
# $output_b_75
# [1] 5627
#
# $output_b_100
# [1] 10002
#
# $output_c_50
# [1] 2504 2506
#
# $output_c_75
# [1] 5629 5631
#
# $output_c_100
# [1] 10004 10006
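For the special case where each name maps to a single number, the same naming scheme can also be built without explicit loops. A minimal sketch, assuming x is a named numeric vector with one value per name (foo() above is more general, since its list elements can have any length); foo_outer is a hypothetical name:

```r
# Sketch: vectorized variant of foo(), assuming x is a *named numeric vector*
# (one value per name), not a list of arbitrary-length elements.
foo_outer <- function(x, y) {
  vals <- outer(x, y, function(a, b) a * 2 + b^2)   # length(x) x length(y) matrix of results
  nms  <- outer(names(x), y, paste, sep = "_")      # matching matrix of "a_50", "a_75", ...
  # transpose so the order matches foo(): all y values for a, then all for b, ...
  setNames(as.list(t(vals)), paste("output", t(nms), sep = "_"))
}

res <- foo_outer(c(a = 0, b = 1), c(50, 75, 100))
names(res)
```

outer() evaluates the function once on the fully expanded vectors, which is usually much faster than nested for loops for large inputs.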


Function to apply mean on a list of vectors without need to list the vectors (changes to a function call)

I have multiple objects and I need to apply some function to them (in my example, mean). But the function call shouldn't include list; it must look like this: my_function(a, b, d).
Please advise how to do it. I probably need quote or substitute, but I'm not sure how to use them.
a <- c(1:15)
b <- c(1:17)
d <- c(1:19)
my_function <- function(objects) {
  lapply(objects, mean)
}
my_function(list(a, b, d))
A possible solution:
a <- c(1:15)
b <- c(1:17)
d <- c(1:19)
my_function <- function(...) {
  lapply(list(...), mean)
}
my_function(a, b, d)
#> [[1]]
#> [1] 8
#>
#> [[2]]
#> [1] 9
#>
#> [[3]]
#> [1] 10
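If you also want the results labelled with the names of the objects you passed (the question mentions substitute), one option is to deparse the unevaluated arguments. A sketch, assuming R >= 4.0 for deparse1(); my_function_named is a hypothetical name:

```r
my_function_named <- function(...) {
  # substitute(list(...)) gives the call list(a, b, d) with unevaluated arguments;
  # dropping the first element (the function name `list`) leaves the argument expressions
  arg_exprs <- as.list(substitute(list(...)))[-1L]
  nms <- vapply(arg_exprs, deparse1, character(1L))
  setNames(lapply(list(...), mean), nms)
}

a <- 1:15; b <- 1:17; d <- 1:19
res <- my_function_named(a, b, d)
str(res)
```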
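To keep the ability to pass the other arguments of mean, such as na.rm= and trim= (i.e., to generalize), we can match formalArgs(mean.default) against the argument names in the call and split the call accordingly.
my_function <- function(...) {
  cl <- match.call()
  pf <- parent.frame()  # evaluate the arguments where my_function() was called
  # positions in the call of arguments that belong to mean.default (e.g. na.rm, trim)
  m <- match(formalArgs(base:::mean.default), names(cl), 0L)
  # every remaining argument is a vector to average
  vapply(as.list(cl)[-c(1L, m)], function(x) {
    eval(as.call(c(quote(base:::mean.default), list(x), as.list(cl[m]))), pf)
  }, numeric(1L))
}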
## OP's example
my_function(a, b, d)
# [1] 8 9 10
## generalization:
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)))
# [1] 0.7553736 -0.2898547 NA
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)), na.rm=TRUE)
# [1] 0.7553736 -0.2898547 -1.2589363
set.seed(42)
my_function(rnorm(12), rnorm(5), c(NA, rnorm(3)), na.rm=TRUE, trim=.5)
# [1] 0.5185655 -0.2787888 -2.4404669
Data:
a <- 1:15; b <- 1:17; d <- 1:19

Any speedier way to randomly subset vectors inside a list?

I'm looking for a fast way to randomly subset vectors nested in a list.
If we simulate the following data, we get a list l that holds 3 million vectors, each of length 5. But I want the length of each vector to vary, so I thought I should apply a function that randomly subsets each vector. The problem is that this method is not as fast as I'd like.
Simulate data: the list l
library(stringi)
set.seed(123)
vec_n <- 15e6
vec_vals <- 1:vec_n
vec_names <- stringi::stri_rand_strings(vec_n, 5)
my_named_vec <- setNames(vec_vals, vec_names)
split_func <- function(x, n) {
  unname(split(x, rep_len(1:n, length(x))))
}
l <- split_func(my_named_vec, n = vec_n / 5)
head(l)
#> [[1]]
#> HmPsw Qk8NP Quo3T 8f0GH nZmjN
#> 1 3000001 6000001 9000001 12000001
#>
#> [[2]]
#> 2WtYS ZaHFl 6YjId jbGuA tAG65
#> 2 3000002 6000002 9000002 12000002
#>
#> [[3]]
#> xSgZ6 jM5Uw ujPOc CTV5F 5JRT5
#> 3 3000003 6000003 9000003 12000003
#>
#> [[4]]
#> tF2Kx r4ZCI Ooklo VOLHU M6z6H
#> 4 3000004 6000004 9000004 12000004
#>
#> [[5]]
#> tgdze w8d1B FYERK jlClo NQfsF
#> 5 3000005 6000005 9000005 12000005
#>
#> [[6]]
#> hXaH9 gsY1u CjBwC Oqqty dxJ4c
#> 6 3000006 6000006 9000006 12000006
Now that we have l, I wish to subset each vector randomly, meaning that the number of elements kept in each vector will be random. One option is to define the following utility function:
randomly_subset_vec <- function(x) {
  my_range <- 1:length(x)
  x[-sample(my_range, sample(my_range))]
}
lapply(head(l), randomly_subset_vec)
#> [[1]]
#> Quo3T
#> 6000001
#>
#> [[2]]
#> 6YjId jbGuA
#> 6000002 9000002
#>
#> [[3]]
#> xSgZ6 jM5Uw ujPOc CTV5F
#> 3 3000003 6000003 9000003
#>
#> [[4]]
#> Ooklo
#> 6000004
#>
#> [[5]]
#> named integer(0)
#>
#> [[6]]
#> CjBwC Oqqty dxJ4c
#> 6000006 9000006 12000006
But running this procedure over the entire l takes forever. I've tried using rrapply, which is a fast package for dealing with lists, and it takes "only" 110 seconds on my machine.
library(rrapply)
library(tictoc)
tic()
l_subsetted <- rrapply(object = l, f = randomly_subset_vec)
toc()
#> 110.23 sec elapsed
I will be happy with either of the following:
Is there a speedier alternative to:
rrapply(object = l, f = randomly_subset_vec)
Or more generally, is there a speedier way to start with my_named_vec and arrive at l_subsetted?
UPDATE 1 to fix the name behavior in stack for large objects
Your subsets never include the full set, so this first removes one random element from each vector, then keeps each of the remaining elements independently with probability 1/2:
library(stringi)
set.seed(123)
vec_n <- 15e6
vec_vals <- 1:vec_n
vec_names <- stringi::stri_rand_strings(vec_n, 5)
my_named_vec <- setNames(vec_vals, vec_names)
split_func <- function(x, n) {
  unname(split(x, rep_len(1:n, length(x))))
}
l <- split_func(my_named_vec, n = vec_n / 5)
system.time({
  lenl <- lengths(l)
  vec_names <- names(unlist(l))
  # keep each element with probability 1/2, but always drop one random element per vector
  blnKeep <- replace(sample(c(FALSE, TRUE), length(vec_names), replace = TRUE), ceiling(runif(length(l))*lenl) + c(0, head(cumsum(lenl), -1)), FALSE)
  # use stack to unlist the list while keeping the originating list index for each value
  temp <- stack(setNames(l, seq_along(l)))[blnKeep,]
  # re-list
  l_subsetted <- unname(split(setNames(temp$values, vec_names[blnKeep]), temp$ind))
})
#> user system elapsed
#> 22.999 0.936 23.934
head(l_subsetted)
#> [[1]]
#> HmPsw nZmjN
#> 1 12000001
#>
#> [[2]]
#> 2WtYS 6YjId
#> 2 6000002
#>
#> [[3]]
#> xSgZ6 jM5Uw ujPOc
#> 3 3000003 6000003
#>
#> [[4]]
#> tF2Kx r4ZCI
#> 4 3000004
#>
#> [[5]]
#> FYERK NQfsF
#> 6000005 12000005
#>
#> [[6]]
#> gsY1u
#> 3000006
Created on 2021-11-01 by the reprex package (v2.0.0)
UPDATE 2 for vectors of uniformly distributed lengths:
@runr is correct in the comments that the above code results in binomially distributed vector lengths, while the OP's original code results in uniformly distributed vector lengths. Below is an example of how to use the same idea to get uniformly distributed vector lengths. The code is more complex, but the run time seems to be a bit faster (possibly because it avoids stack):
library(stringi)
set.seed(123)
vec_n <- 15e6
vec_vals <- 1:vec_n
vec_names <- stringi::stri_rand_strings(vec_n, 5)
my_named_vec <- setNames(vec_vals, vec_names)
split_func <- function(x, n) {
  unname(split(x, rep_len(1:n, length(x))))
}
l <- split_func(my_named_vec, n = vec_n / 5)
system.time({
  idx <- seq_along(l)
  lenl <- lengths(l)
  ul <- unlist(l)
  # get a random number of elements to remove from each vector
  nRemove <- ceiling(runif(length(l))*lenl)
  nRemove2 <- nRemove
  blnNotEmpty <- nRemove != lenl # will the subset vector have any elements?
  blnKeep <- rep(TRUE, length(l))
  # loop until the predetermined number of elements have been removed from each vector
  while (length(nRemove)) {
    # remove a random element from vectors that have too many
    ul <- ul[-(ceiling(runif(length(idx))*lenl[idx]) + c(0, head(cumsum(lenl), -1))[idx])]
    lenl[idx] <- lenl[idx] - 1L # decrement the vector lengths
    blnKeep <- nRemove != 1
    idx <- idx[blnKeep]
    nRemove <- nRemove[blnKeep] - 1L # decrement the number of elements left to remove
  }
  l_subsetted <- rep(list(integer(0)), length(l))
  l_subsetted[blnNotEmpty] <- unname(split(ul, rep.int(seq_along(l), lenl)))
})
#> user system elapsed
#> 18.396 0.935 19.332
head(l_subsetted)
#> [[1]]
#> Qk8NP Quo3T 8f0GH
#> 3000001 6000001 9000001
#>
#> [[2]]
#> integer(0)
#>
#> [[3]]
#> xSgZ6 ujPOc CTV5F 5JRT5
#> 3 6000003 9000003 12000003
#>
#> [[4]]
#> tF2Kx Ooklo VOLHU
#> 4 6000004 9000004
#>
#> [[5]]
#> tgdze w8d1B jlClo NQfsF
#> 5 3000005 9000005 12000005
#>
#> [[6]]
#> gsY1u CjBwC Oqqty dxJ4c
#> 3000006 6000006 9000006 12000006
# check that vector lengths are uniformly-distributed (lengths of 0-4 are equally likely)
table(lengths(l_subsetted))
#>
#> 0 1 2 3 4
#> 599633 599041 601209 600648 599469
Created on 2021-11-02 by the reprex package (v2.0.1)
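The difference between the two length distributions can be checked directly. A sketch, assuming vectors of length 5: UPDATE 1 always drops one element and keeps each of the other four with probability 1/2, so its lengths should follow Binomial(4, 1/2), while the OP's randomly_subset_vec() keeps 0 to 4 elements with equal probability:

```r
# Exact probabilities of lengths 0..4 under each scheme
p_binom <- dbinom(0:4, size = 4, prob = 0.5)   # UPDATE 1: 1/16, 4/16, 6/16, 4/16, 1/16
p_unif  <- rep(1 / 5, 5)                       # OP's code: 1/5 each

# Simulate the UPDATE 1 rule on n vectors of length 5 (one column per vector)
set.seed(1)
n <- 1e5
keep <- matrix(sample(c(FALSE, TRUE), 5 * n, replace = TRUE), nrow = 5)
keep[cbind(sample.int(5, n, replace = TRUE), seq_len(n))] <- FALSE   # force one element out
p_sim <- tabulate(colSums(keep) + 1L, nbins = 5) / n                 # observed share of lengths 0..4

round(rbind(p_binom, p_sim, p_unif), 3)
```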
Very rough, and I'm not particularly proud of this. I'm sure there is a more elegant way, but this ran in a matter of seconds on my machine:
> # Make some fake data
> out <- lapply(1:3000000, function(i){sample(LETTERS, 5, replace = FALSE)})
> out[1:5]
[[1]]
[1] "D" "H" "C" "Y" "V"
[[2]]
[1] "M" "E" "H" "G" "S"
[[3]]
[1] "R" "P" "O" "L" "M"
[[4]]
[1] "C" "U" "G" "Q" "X"
[[5]]
[1] "Q" "L" "W" "O" "V"
> # Create list with ids to sample
> id <- lapply(1:3000000, function(i){sample(1:5, sample(1:5, 1), replace = FALSE)})
> id[1:5]
[[1]]
[1] 2
[[2]]
[1] 2 3 4 1 5
[[3]]
[1] 4
[[4]]
[1] 5
[[5]]
[1] 1 2
> # Extract the ids from the original data using the id list.
> # Like I said I'm not particularly proud of this but it gets the job
> # done quick enough on my computer
> out <- lapply(1:3000000, function(i){out[[i]][id[[i]]]})
> out[1:5]
[[1]]
[1] "H"
[[2]]
[1] "E" "H" "G" "M" "S"
[[3]]
[1] "L"
[[4]]
[1] "X"
[[5]]
[1] "Q" "L"
Simplify the sampling function:
randomly_subset_vec_2 <- function(x) {
  my_range <- length(x)
  x[-sample(my_range, sample(my_range, 1))]
}
This alone can give a significant speed-up.
I have not tested it, but given the problem description, removing some elements (minus sign before sample) is the same as keeping the others. Why not extract some elements directly (no minus sign) and keep those?
Simpler and faster: sampling directly from x is the fastest option so far.
randomly_subset_vec_3 <- function(x) {
  sample(x, sample(length(x), 1))
}
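One caveat with sampling directly from x: if x is a numeric vector of length one, sample(x, ...) samples from 1:x instead (this is documented in ?sample). That cannot happen with the length-5 vectors here, but it matters once lengths vary. A sketch of an index-based variant that avoids it; randomly_subset_vec_safe is a hypothetical name:

```r
randomly_subset_vec_safe <- function(x) {
  n <- length(x)
  if (n == 0L) return(x)                # nothing to sample from
  # draw a size between 1 and n, then that many distinct positions
  x[sample.int(n, sample.int(n, 1L))]
}

set.seed(1)
sample(c(v = 10L), 1)                   # draws from 1:10, not from the vector c(v = 10L)
randomly_subset_vec_safe(c(v = 10L))    # always returns the single element v
```

Like randomly_subset_vec_3, this keeps between 1 and length(x) elements in random order.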
It seems that the largest bottleneck is running all the sample calls, so we could try the following. One way is the solution by Julius Vainora. First, we generate funFast with Rcpp (compiled through inline's cxxfunction):
library(inline)
library(Rcpp)
src <-
'
int num = as<int>(size), x = as<int>(n);
Rcpp::NumericVector vx = Rcpp::clone<Rcpp::NumericVector>(x);
Rcpp::NumericVector pr = Rcpp::clone<Rcpp::NumericVector>(prob);
Rcpp::NumericVector rnd = rexp(x) / pr;
for(int i= 0; i<vx.size(); ++i) vx[i] = i;
std::partial_sort(vx.begin(), vx.begin() + num, vx.end(), Comp(rnd));
vx = vx[seq(0, num - 1)] + 1;
return vx;
'
incl <-
'
struct Comp{
Comp(const Rcpp::NumericVector& v ) : _v(v) {}
bool operator ()(int a, int b) { return _v[a] < _v[b]; }
const Rcpp::NumericVector& _v;
};
'
funFast <- cxxfunction(signature(n = "Numeric", size = "integer", prob = "numeric"),
src, plugin = "Rcpp", include = incl)
Then, define an alternative to your randomly_subset_vec using funFast instead of sample:
'randomly_subset_vec_2' <- function(x) {
  range <- length(x)
  probs <- rep(1/range, range)
  o <- funFast(range, size = funFast(range, size = 1, prob = probs), prob = probs)
  return(x[-o])
}
tic();obj <- rrapply(object = l, f = randomly_subset_vec_2);toc();
Maybe we can replace randomly_subset_vec with something simpler with sample and sample.int:
lapply(l, function(x) x[sample.int(5, sample(5, 1))])
A probably more efficient approach is to replace the many individual sample calls with a single large sample call. Below is an approach that samples a large logical matrix keep (possible because l initially has a rectangular format) and keeps only the entries for which keep is TRUE:
system.time({
  keep <- matrix(sample(c(TRUE, FALSE), size = vec_n, replace = TRUE), nrow = 5, ncol = length(l))
  l1 <- lapply(seq_along(l), function(i) l[[i]][keep[, i]])
})
#> user system elapsed
#> 8.667 0.448 9.114
head(l1)
#> [[1]]
#> HmPsw Quo3T 8f0GH
#> 1 6000001 9000001
#>
#> [[2]]
#> 2WtYS ZaHFl 6YjId
#> 2 3000002 6000002
#>
#> [[3]]
#> xSgZ6 jM5Uw ujPOc CTV5F 5JRT5
#> 3 3000003 6000003 9000003 12000003
#>
#> [[4]]
#> M6z6H
#> 12000004
#>
#> [[5]]
#> tgdze w8d1B FYERK jlClo NQfsF
#> 5 3000005 6000005 9000005 12000005
#>
#> [[6]]
#> hXaH9 CjBwC Oqqty
#> 6 6000006 9000006
NB: here the order of the entries within each vector of l is preserved (i.e., no resampling), and the list elements of l1 are not guaranteed to contain at least one value.
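The keep matrix above gives binomially distributed lengths. If uniformly distributed lengths are wanted instead (as in the OP's randomly_subset_vec(), which keeps 0 to 4 of 5 elements), the single-draw idea can be adapted by ranking one set of uniform draws within each vector. A sketch, assuming all vectors in l have the same length m; subset_uniform is a hypothetical name and has not been benchmarked on the full data:

```r
subset_uniform <- function(l, m = 5L) {
  n <- length(l)
  u <- runif(m * n)                                # one uniform draw per element
  # rank the draws within each block of m (i.e. within each vector) with a single order() call
  ord <- order(rep(seq_len(n), each = m), u)
  rk <- integer(m * n)
  rk[ord] <- rep.int(seq_len(m), n)
  # target length per vector, uniform on 0..(m - 1)
  k <- sample.int(m, n, replace = TRUE) - 1L
  # keep the k elements with the smallest ranks: a uniformly random subset of size k
  keep <- matrix(rk <= rep(k, each = m), nrow = m)
  lapply(seq_len(n), function(i) l[[i]][keep[, i]])
}
```

As with l1, the original order within each vector is preserved.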
I'm putting this in a new answer so as not to further confuse my previous one.
I noticed from some of the comments that the vectors in l are intended to have all the same lengths (5) and that you may not need l at all. It's also a little unclear whether you want the lengths of l_subsetted to be between 0 and 4 or between 0 and 5. You also seem to be interested in the distribution of the lengths of l_subsetted (uniform vs. binomial).
Below is a generic function for the case length(unique(lengths(l))) == 1, i.e. when all vectors in l have the same length. It subsets directly from my_named_vec without creating l, and it pretty consistently runs in the 5-13 second range.
set.seed(123)
vec_n <- 15e6L
my_named_vec <- setNames(1:vec_n, stringi::stri_rand_strings(vec_n, 5))
fSub <- function(nv, vecLen = 5L, maxLen = 5L, unif = FALSE) {
  # subset each named vector from the list l (l is not generated):
  # l <- unname(split(nv, rep_len(seq(length(nv)/vecLen), length(nv))))
  # INPUTS:
  #   nv: named vector whose length is a multiple of vecLen
  #   vecLen: the length of the vectors in l
  #   maxLen: the maximum length of the subsetted vectors
  #   unif: FALSE = binomial subset vector lengths
  #         TRUE = uniform subset vector lengths
  # OUTPUT: a list of named vectors subset from l
  nrw <- length(nv)%/%vecLen # length of the output list
  # get all possible logical indices for sampling each vector in l
  mKeep <- as.matrix(expand.grid(rep(list(c(TRUE, FALSE)), vecLen)), ncol = vecLen)
  nKeep <- rowSums(mKeep)
  # remove logical indices that would result in vectors longer than maxLen
  blnKeep <- nKeep <= maxLen
  mKeep <- mKeep[blnKeep,]
  nKeep <- nKeep[blnKeep]
  if (unif) {
    # sample mKeep with non-uniform probability in order to get uniform lengths
    iKeep <- sample(length(nKeep), nrw, replace = TRUE, prob = 1/choose(vecLen, nKeep))
  } else {
    iKeep <- sample(length(nKeep), nrw, replace = TRUE)
  }
  blnKeep <- c(mKeep[iKeep,])
  l <- rep(list(integer(0L)), nrw)
  l[iKeep != length(nKeep)] <- unname(split(nv[blnKeep], rep(1:nrw, vecLen)[blnKeep]))
  return(l)
}
lbinom5 <- fSub(my_named_vec) # binomial vector lengths (0 to 5)
lunif5 <- fSub(my_named_vec, unif = TRUE) # uniform vector lengths (0 to 5)
lbinom4 <- fSub(my_named_vec, maxLen = 4L) # binomial vector lengths (0 to 4)
lunif4 <- fSub(my_named_vec, maxLen = 4L, unif = TRUE) # uniform vector lengths (0 to 4)
> microbenchmark::microbenchmark(
+ lbinom5 = {lbinom5 <- fSub(my_named_vec)},
+ lunif5 = {lunif5 <- fSub(my_named_vec, unif = TRUE)},
+ lbinom4 = {lbinom4 <- fSub(my_named_vec, maxLen = 4L)},
+ lunif4 = {lunif4 <- fSub(my_named_vec, maxLen = 4L, unif = TRUE)},
+ times = 10)
Unit: seconds
expr min lq mean median uq max neval
lbinom5 5.974837 8.060281 9.192600 9.014967 10.15609 13.01182 10
lunif5 5.240133 6.618115 9.688577 10.799230 11.44718 12.73518 10
lbinom4 5.082508 6.497218 8.636434 8.656817 11.40678 11.81519 10
lunif4 5.468311 6.639423 8.310269 7.919579 10.28546 11.28075 10
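Why prob = 1/choose(vecLen, nKeep) in fSub() yields uniform lengths: there are choose(vecLen, k) keep-patterns of length k, so giving each pattern weight 1/choose(vecLen, k) gives every length the same total weight (sample() normalizes prob, so equal totals mean equal probabilities). A quick check of that arithmetic, reusing the expand.grid() construction from fSub() for vecLen = 5:

```r
vecLen <- 5L
# all 2^vecLen keep/drop patterns, as in fSub()
mKeep <- as.matrix(expand.grid(rep(list(c(TRUE, FALSE)), vecLen)))
nKeep <- rowSums(mKeep)            # length of the subset each pattern produces
w <- 1 / choose(vecLen, nKeep)     # sampling weight used when unif = TRUE
table(nKeep)                       # number of patterns giving each length
tapply(w, nKeep, sum)              # total weight per length
```

Dropping the all-TRUE pattern (maxLen = 4L) removes length 5 entirely and leaves the totals for lengths 0 to 4 equal.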
You can try the code below:
lapply(
  l,
  function(x) {
    head(sample(x), sample(length(x), 1))
  }
)

Actively logging assignments in R

I'm trying to set up a way in R to print details of each assignment while R code is run. For example, if the code x <- 1 is run, then x has been assigned 1 should automatically be printed.
Is this possible?
I have two thoughts on how this might be done but can't figure out if either is possible.
redefine the = primitive so that it also prints a message
have an assignment trigger another function to run
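For the second idea, base R does offer a hook that runs after every top-level expression: addTaskCallback(). A minimal sketch, not taken from the answers below; log_assignments is a hypothetical name. It only sees complete top-level expressions, so assignments inside functions, or inside a single source() call, are not reported individually:

```r
log_assignments <- function(expr, value, ok, visible) {
  # expr is the top-level call that just finished; report it only if it was an assignment
  if (ok && is.call(expr) && is.name(expr[[1L]]) &&
      as.character(expr[[1L]]) %in% c("<-", "=", "<<-")) {
    target <- paste(deparse(expr[[2L]]), collapse = " ")
    shown <- substr(paste(deparse(value), collapse = " "), 1L, 80L)  # keep log lines short
    message(target, " has been assigned ", shown)
  }
  TRUE  # returning TRUE keeps the callback registered
}

id <- addTaskCallback(log_assignments, name = "log_assignments")
# At the console, x <- 1 should now print "x has been assigned 1".
# removeTaskCallback(id)   # stop logging
```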
One possible solution, though it requires editing the code, would be:
# custom assignment function -----------------------------------------------------------------
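`%<-%` <- function(lhs, rhs) {
  lhs_name <- deparse(substitute(lhs))
  rhs_text <- paste(deparse(substitute(rhs)), collapse = " ")
  env <- parent.frame()
  message("Info: `", lhs_name, "` defined as `", rhs_text, "`")
  # assign in the caller's environment and return the value invisibly, like `<-`
  assign(x = lhs_name, value = rhs, envir = env)
  invisible(rhs)
}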
# some tests ----------------------------------------------------------------------------------
ad %<-% c(1,2,33)
#> Info: `ad` defined as `c(1, 2, 33)`
ac %<-% 22
#> Info: `ac` defined as `22`
ad %<-% 22
#> Info: `ad` defined as `22`
df <- mtcars
df %<-% mtcars
#> Info: `df` defined as `mtcars`
If you don't want to modify files, you can define a modified source() function that replaces the assignments with the newly defined %<-% function. Note that this is a plain text substitution: it also rewrites "<-" inside strings, comments and "<<-", and because %<-% binds more tightly than + or *, a line such as x <- a + b is run as (x %<-% a) + b, so such right-hand sides need parentheses.
source_loudly <- function(filePath, ...) {
  txt <- readLines(filePath)
  txt_mod <- gsub(pattern = "<-", replacement = "%<-%", x = txt, fixed = TRUE)
  source(textConnection(txt_mod), ...)
}
filePath <- "R/bits/example.R" # point to a local file on your pc
source_loudly(filePath = filePath, echo = TRUE)
Created on 2021-03-19 by the reprex package (v1.0.0)
Here's a getter/setter hack that comes close without costing too much. While it does require you to change existing code, it has the benefit that you can change the initial assignment to list instead of tracer and everything continues to work unchanged.
tracer <- local({
  .e <- NULL
  function(..., name = "unk") {
    .e <<- list(...)
    .e$.name <<- name
    `class<-`(.e, c("tracer", "environment"))
  }
})
`[.tracer` <- `[[.tracer` <- `$.tracer` <- function(x, i) {
  cat(sprintf("get: %s\n", deparse(substitute(i))))
  NextMethod()
}
`[<-.tracer` <- `[[<-.tracer` <- `$<-.tracer` <- function(x, i, value) {
  cat(sprintf("set: %s <- %s\n", deparse(substitute(i)),
              substr(paste(deparse(substitute(value)), collapse = " "), 1, 80)))
  NextMethod()
}
Notes:
deparse tends to split long lines into a vector of strings; this is mitigated here with paste(..., collapse=" ");
... but long literal values (e.g., data frames) can be a bit annoying in the logs, so I arbitrarily chose substr(., 1, 80) as a reasonable size to log.
this hints at one problem I'll expand on below: this doesn't tell you which columns have been modified, just that the object has been updated.
Demonstration with "simple" objects:
quux <- tracer(a=1, b=2:3, d=list(pi, "a"), mt=mtcars[1:2,])
quux$a
# get: "a"
# [1] 1
quux$a <- 11
# set: "a" <- 11
quux$b
# get: "b"
# [1] 2 3
quux$b <- 2:5
# set: "b" <- 2:5
quux$b
# get: "b"
# [1] 2 3 4 5
So far, so good. Now onto the list:
quux$d
# get: "d"
# [[1]]
# [1] 3.141593
# [[2]]
# [1] "a"
quux$d[[1]]
# get: "d"
# [1] 3.141593
quux$d[[1]] <- pi^2
# get: "d"
# set: "d" <- list(9.86960440108936, "a")
The latter needs some explanation, notably about the order of operations. The assignment is really `[[<-`(quux$d, 1, pi^2), which is not traced. This adjusts the first element of the list, and then assigns this new list back to quux$d, where our $<-.tracer sees that full-list reassignment.
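The expansion described above can be reproduced by hand. A sketch, using a plain list so that no tracing methods are involved, and an ordinary variable tmp in place of R's internal `*tmp*` (the R Language Definition describes the exact rules for nested replacement):

```r
quux <- list(d = list(pi, "a"))

# Roughly what R does for quux$d[[1]] <- pi^2:
tmp <- quux                                  # 1. R works on a temporary copy (`*tmp*`)
inner <- `[[<-`(tmp$d, 1, value = pi^2)      # 2. `$` extracts d (the "get"), `[[<-` modifies it
quux <- `$<-`(tmp, "d", value = inner)       # 3. the whole new d is assigned back (the "set")

str(quux$d)
```

Step 2 is where the traced object only sees a "get" for d, and step 3 is the single "set" of the full list.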
That is not completely unreasonable for small objects, but it becomes a little more annoying with larger objects:
quux$mt$cyl
# get: "mt"
# [1] 6 6
quux$mt$cyl <- quux$mt$cyl + 5
# get: "mt"
# get: "mt"
# set: "mt" <- structure(list(mpg = c(21, 21), cyl = c(11, 11), disp = c(160, 160), hp = c(110, 110), drat = c(3.9, 3.9), wt = c(2.62, 2.875 ), qsec = c(16.46, 17.02), vs = c(0, 0), am = c(1, 1), gear = c(4, 4), c
quux$mt$cyl
# get: "mt"
# [1] 11 11
Similarly, for an assignment we see both the initial "get" step and then the whole-object reassignment. (The log line is cut off because I used substr(., 1, 80).)
Also, note that in both quux$d and quux$mt, the tracer functions never see the sub-element or column being adjusted. Because of the order in which R performs these operations, our tracer functions cannot (easily) reveal what is going on there.
Now, when you're ready to remove this level of activity-logging, just replace your initial call to tracer(.) with list(.), and all operations continue to work but without logging.
quux <- list(a=1, b=2:3, d=list(pi, "a"), mt=mtcars[1:2,])
quux$a
# [1] 1
quux$a <- 11
quux$b
# [1] 2 3
quux$b <- 2:5
quux$b
# [1] 2 3 4 5
quux$d
# [[1]]
# [1] 3.141593
# [[2]]
# [1] "a"
quux$d[[1]]
# [1] 3.141593
quux$d[[1]] <- pi^2
quux$mt$cyl
# [1] 6 6
quux$mt$cyl <- quux$mt$cyl + 5
quux$mt$cyl
# [1] 11 11

How to rename list objects in self defined function?

quote <- function(namefoo, namebar) {
  set.seed(3)
  foo <- rnorm(n = 5)
  bar <- rnorm(n = 5)
  return(list(namefoo=foo,namebar=bar))
}
With the function above, if I run quote(test, test1), the names of the two objects in the list remain namefoo and namebar instead of the names I specified in the function call.
If I just run the code separately as:
set.seed(3)
foo <- rnorm(n = 5)
bar <- rnorm(n = 5)
obj <- list(test=foo,test1=bar)
Then obj contains foo and bar under the new names. How do I make my function do this? I've also tried several combinations of quoting, both in the function call and inside the function itself, but it doesn't seem to work.
One way is this:
quote <- function(namefoo, namebar) {
  set.seed(3)
  foo <- rnorm(n = 5)
  bar <- rnorm(n = 5)
  out <- list(foo, bar)
  names(out) <- c(namefoo, namebar)
  out
}
You can save the list to a variable and then name the elements with names.
# quote('foo', 'bar')
# $foo
# [1] -0.9619334 -0.2925257 0.2587882 -1.1521319
# [5] 0.1957828
#
# $bar
# [1] 0.03012394 0.08541773 1.11661021
# [4] -1.21885742 1.26736872
It's a very bad idea to name your function quote: base R already has a very important function with that name.
Use setNames:
fun <- function(namefoo, namebar) {
  set.seed(3)
  foo <- rnorm(n = 5)
  bar <- rnorm(n = 5)
  setNames(list(foo, bar), c(namefoo, namebar))
}
fun("hi","there")
# $hi
# [1] -0.9619334 -0.2925257 0.2587882 -1.1521319 0.1957828
#
# $there
# [1] 0.03012394 0.08541773 1.11661021 -1.21885742 1.26736872
You might also see this kind of code, which uses more advanced features of rlang / tidyverse:
library(tidyverse)
fun2 <- function(namefoo, namebar) {
  set.seed(3)
  foo <- rnorm(n = 5)
  bar <- rnorm(n = 5)
  lst(!!namefoo := foo, !!namebar := bar)
}
fun2("hi","there")
# $hi
# [1] -0.9619334 -0.2925257 0.2587882 -1.1521319 0.1957828
#
# $there
# [1] 0.03012394 0.08541773 1.11661021 -1.21885742 1.26736872
We can do
quotefn <- function(...) {
  nm <- c(...)
  out <- replicate(length(nm), rnorm(n = 5), simplify = FALSE)
  names(out) <- nm
  out
}
quotefn("foo", "bar")
#$foo
#[1] -0.5784837 -0.9423007 -0.2037282 -1.6664748 -0.4844551
#$bar
#[1] -0.74107266 1.16061578 1.01206712 -0.07207847 -1.13678230

Calling setdiff() on multiple vectors

How can I use setdiff() in R to get the elements that are in one vector but not in the others? My example is as follows:
dat1 <- c("osa", "bli", "usd", "mnl")
dat2 <- c("mnu", "erd", "usd", "mnl")
dat3 <- c("ssu", "erd", "usd", "mnl")
The following code only returns what is different in dat1 compared to dat2 and dat3:
diffs <- Reduce(setdiff,
                list(A = dat1,
                     B = dat2,
                     C = dat3))
How can I modify this code to get all the elements that are present in only one vector compared to the others? Thanks
Another solution using setdiff:
myl <- list(A = dat1,
B = dat2,
C = dat3)
lapply(1:length(myl), function(n) setdiff(myl[[n]], unlist(myl[-n])))
[[1]]
[1] "osa" "bli"
[[2]]
[1] "mnu"
[[3]]
[1] "ssu"
A second possibility:
f <- function(...) {
  aux <- list(...)
  ind <- rep(1:length(aux), sapply(aux, length))  # which vector each value came from
  x <- unlist(aux)
  boo <- !(duplicated(x) | duplicated(x, fromLast = TRUE))  # values occurring exactly once overall
  split(x[boo], ind[boo])
}
f(dat1, dat2, dat3)
$`1`
[1] "osa" "bli"
$`2`
[1] "mnu"
$`3`
[1] "ssu"
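One caveat with f(): duplicated() also flags a value that is repeated within a single vector, so c("osa", "osa", "bli") would lose "osa" even though no other vector contains it. A sketch that de-duplicates each vector first and then counts in how many vectors each value occurs; unique_to_each is a hypothetical name:

```r
dat1 <- c("osa", "bli", "usd", "mnl")
dat2 <- c("mnu", "erd", "usd", "mnl")
dat3 <- c("ssu", "erd", "usd", "mnl")

unique_to_each <- function(...) {
  vecs <- lapply(list(...), unique)        # ignore repeats inside a single vector
  counts <- table(unlist(vecs))            # number of vectors containing each value
  singles <- names(counts)[counts == 1L]   # values found in exactly one vector
  lapply(vecs, function(v) v[v %in% singles])
}

unique_to_each(A = dat1, B = dat2, C = dat3)
unique_to_each(c("osa", "osa", "bli"), dat2)   # "osa" is kept here
```

Named arguments carry through to the result, so A, B and C label the output.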
Try this:
all.dat <- list(dat1, dat2, dat3)
from.dat <- rep(seq_along(all.dat), sapply(all.dat, length))
in.dat <- split(from.dat, unlist(all.dat))
in.one.dat <- in.dat[sapply(in.dat, length) == 1]
in.one.dat
# $bli
# [1] 1
# $mnu
# [1] 2
# $osa
# [1] 1
# $ssu
# [1] 3
This tells you which items are found in only one of the dat objects, and in which one. If you only care about the names, finish with names(in.one.dat).
