Create a generalized function in R

a<-c(0,1,1,0)
b<-c(1,0,0,0)
c<-c(0,0,0,1)
binary_subset<-function(a){
a_seq = lapply(a, seq, 0) # keep 0s as 0, make 1s c(1, 0)
subset=do.call(expand.grid, a_seq)
colnames(subset)=(1:length(a))
return(subset)
}
test_fun<-function(a,b,c,d){
list <- list(a,b,c,d)
interactions_abc<-do.call("rbind",lapply(list, binary_subset))
interactions_no_duplicate<-unique(interactions_abc[1:length(a)])
rownames(interactions_no_duplicate)=1:nrow(interactions_no_duplicate)
interactions_no_duplicate
}
>test_fun(a,b,c,d)
Error in test_fun(a, b, c, d) : object 'd' not found
I am trying to write a function whose number of inputs is not fixed. I have defined it for four binary vectors, so if I pass only three, I get an error because the fourth vector is missing; it works only when I pass exactly four.
How can I fix this? That is, if I pass two or three vectors, the function should produce the corresponding output using only the vectors that were supplied.

Maybe you can use ... for the function arguments, e.g.,
test_fun <- function(...) {
list <- list(...)
interactions_abc <- do.call("rbind", lapply(list, binary_subset))
interactions_no_duplicate <- unique(interactions_abc[1:length(list[[1]])])
rownames(interactions_no_duplicate) <- 1:nrow(interactions_no_duplicate)
interactions_no_duplicate
}
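A minimal sketch of how the variadic version might be exercised, assuming binary_subset behaves as in the question. The equal-length check and the name vecs are additions for illustration and are not part of the original answer.

```r
# Expand one binary vector: 0 stays 0, 1 becomes c(1, 0)
binary_subset <- function(x) {
  grid <- do.call(expand.grid, lapply(x, seq, 0))
  setNames(grid, as.character(seq_along(x)))
}

# Accept any number of binary vectors through ...
test_fun <- function(...) {
  vecs <- list(...)
  # Assumption: all vectors must share one length so the rows can be stacked
  stopifnot(length(vecs) > 0, length(unique(lengths(vecs))) == 1)
  combined <- do.call(rbind, lapply(vecs, binary_subset))
  out <- unique(combined)
  rownames(out) <- NULL  # renumber rows after dropping duplicates
  out
}

a <- c(0, 1, 1, 0)
b <- c(1, 0, 0, 0)
c <- c(0, 0, 0, 1)

test_fun(a, b)     # two vectors
test_fun(a, b, c)  # three vectors
```

Because the arguments are collected with list(...), the same function handles two, three, or more vectors without any change to its signature.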

Consider passing a list as the sole argument, built by retrieving all numeric vectors from the global environment with eapply and Filter. The functions below are refactored so that each body is a single expression, which makes { and } optional.
a <- c(0,1,1,0)
b <- c(1,0,0,0)
c <- c(0,0,0,1)
binary_subset <- function(x) {
setNames(do.call(expand.grid, lapply(x, seq, 0)), seq_along(x)) # name columns from x itself, not the global a
}
test_fun <- function(mylist) {
data.frame(unique(do.call("rbind", lapply(mylist, binary_subset))),
row.names = NULL, check.names = FALSE)
}
vecs <- Filter(is.numeric, eapply(.GlobalEnv, identity))
test_fun(vecs)

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found".
I can use attach() and detach() within the function as a workaround:
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))
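One caveat with the character-vector design, sketched under an assumption: do.call("paste", data[vnames]) passes the columns as named arguments, so a column that happens to be called sep or collapse would be matched to paste's own argument. The helper name paste_cols and the sample data frame df2 are hypothetical; the sketch drops the names before calling paste.

```r
# Hypothetical variant of example.fun.5 that is robust to column names
paste_cols <- function(data, vnames) {
  stopifnot(is.character(vnames), all(vnames %in% names(data)))
  # unname() so that no column can be mistaken for paste's sep/collapse
  cols <- unname(as.list(data[vnames]))
  data.frame(data, pasted = do.call(paste, cols))
}

# A column literally named "sep" is handled like any other column
df2 <- data.frame(x = c(1, 2), sep = c("a", "b"))
paste_cols(df2, c("x", "sep"))
```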

Keep last n characters of cells in a function in R

Consider the following data.frame:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))
I only want to keep the 4 last characters of the cells in the column "id". Therefore, I can use the following code:
df$id <- substr(df$id,nchar(df$id)-3,nchar(df$id))
However, I want to create a function that does the same. Therefore, I create the following function and apply it:
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
}
df <- testfunction(df)
But I do not get the same result. Why is that?
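Add return(x) to your function to return the changed object. As written, the last expression in the function is an assignment, so the function returns the assigned value (the vector of substrings) instead of the modified data frame.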
testfunction <- function(x) {
x$id <- substr(x$id,nchar(x$id)-3,nchar(x$id))
return(x)
}
df <- testfunction(df)
However, an explicit return statement is not always needed (although it is clearer to have one). By default, R returns the value of the last evaluated expression in a function, so here you can also do
testfunction <- function(x) {
transform(x, id = substring(id, nchar(id)-3))
}
df <- testfunction(df)
which should work the same.
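To see why the original version misbehaves, here is a small sketch comparing the two functions side by side. The names no_return and with_return are made up for this illustration.

```r
df <- data.frame(id = rep("text_2010", 5), value = rep(1, 5),
                 stringsAsFactors = FALSE)

# Last expression is an assignment: its value is the right-hand side
no_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
}

# Last expression is x itself: the modified data frame is returned
with_return <- function(x) {
  x$id <- substr(x$id, nchar(x$id) - 3, nchar(x$id))
  x
}

out1 <- no_return(df)
out2 <- with_return(df)

class(out1)  # a character vector, not a data frame
class(out2)  # a data frame
```

In both cases the df passed in is left untouched, because R modifies a local copy inside the function.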
We can also create a function that takes an argument n, so that the number of characters kept is not hard-coded, and that constructs a regex pattern to be used with sub
testfunction <- function(x, n) {
pat <- sprintf(".*(%s)$", strrep(".", n))
x$id <- sub(pat, "\\1", x$id)
return(x)
}
Testing:
testfunction(df, n = 4)
# id value
#1 2010 1
#2 2010 1
#3 2010 1
#4 2010 1
#5 2010 1
Base R solution attempting to mirror Excel's RIGHT() function:
# Function to extract the right n characters from each element of a provided vector:
right <- function(char_vec, n = 1){
# Check if vector provided isn't of type character:
if(!is.character(char_vec)){
# Coerce it, if not: char_vec => character vector
char_vec <- vapply(char_vec, as.character, "character")
}
# Store the number of characters in each element of the provided vector:
# num_chars => integer vector
num_chars <- nchar(char_vec)
# Return the right hand n characters of the string: character vector => Global Env()
return(substr(char_vec, (num_chars + 1) - n, num_chars))
}
# Application:
right(df$id, 4)
Data:
df <- setNames(data.frame(rep("text_2010"),rep(1,5)), c("id", "value"))

Return a named list with various elements from function call

Question
I have a function like this:
myfunc <- function(x){
a1 = 1
a2 = c(2,4)
a3 = data.frame(x = 1:10)
...
an = 'str'
res = list(a1 = a1,a2 = a2,..., an=an)
return(res)
}
As we can see, I return my results as a named list. However, if the number of elements is large, I do not want to type a_i = a_i one by one. I use the code snippet below to save about half of the typing (but I still need to type quotes around each element's name, which is a waste of time):
res_short = sapply(c('a1','a2',...,'an'),FUN = function(x){list(get(x))})
return(res_short)
Note that the element names may not follow any pattern; I only use a1, a2, ..., an to keep the example simple.
I think returning a named list is good, since a list can store elements of different types. Is there another way to write my function's return value? I want it to be clear and time-saving!
mget: Use mget as shown below. To return all variables use mget(ls()), or to return all variables except x use mget(setdiff(ls(), "x")). ls does not return object names that begin with a dot unless the all argument is used, i.e. ls(all = TRUE), so giving a variable a name that starts with a dot is one way to keep it out of the result. Another possibility is to use the mode= argument of mget to restrict the objects returned to ones that are numeric, say. See ?mget. Yet another approach to restrict the objects returned is to use Filter on the result of mget. For example, res <- Filter(is.data.frame, mget(ls())) only returns data frames.
myfunc <- function(x){
a1 = 1
a2 = c(2,4)
a3 = data.frame(x = 1:10)
an = 'str'
res = mget(ls(pattern = "^a"))
return(res)
}
myfunc(3) # test
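The variations mentioned above (excluding x, hiding a dot-prefixed variable, and filtering by type) could be combined roughly as follows. This is only a sketch; the function name myfunc_variants and the variable .hidden are made up for the example.

```r
myfunc_variants <- function(x) {
  a1 <- 1
  a2 <- c(2, 4)
  a3 <- data.frame(x = 1:10)
  .hidden <- "skipped"  # dot-prefixed, so plain ls() does not list it

  # Every local variable except the argument x (and the dot-prefixed one)
  all_but_x <- mget(setdiff(ls(), "x"))
  # Of those, keep only the data frames
  only_frames <- Filter(is.data.frame, all_but_x)

  list(all_but_x = all_but_x, only_frames = only_frames)
}

out <- myfunc_variants(3)
names(out$all_but_x)
names(out$only_frames)
```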
environment: Another possibility is to return the environment within the executing function. All objects in the function (not just the ones beginning with a) will be in the environment.
myfunc2 <- function(x) {
a1 = 1
a2 = c(2,4)
a3 = data.frame(x = 1:10)
an = 'str'
res = environment()
return(res)
}
out <- myfunc2(3) # test
out$a1 # use the full name: $ does no partial matching on environments
within: Another possibility is to use within. Only variables created in the within will be returned. x is used in the within but not created in the within so it is not returned.
myfunc3 <- function(x) {
res <- within(list(), {
a1 <- x
a2 <- BOD
})
return(res)
}
myfunc3(3) # test
Multiple ls: Perform an ls() before and after the section creating the variables to be output and then mget the difference.
myfunc4 <- function(x) {
.excl <- ls()
a1 <- x
a2 <- BOD
res <- mget(setdiff(ls(), .excl))
return(res)
}
myfunc4(3) # test
If I understand it correctly, your requirements are very flexible. You have a bunch of variables with names that have no pattern. You want to apply a different computation for each variable. Well, you realize that you do need to type everything in at least once. One approach is to have a list of all possible variable names and their computations. You can then apply all of them, or a subset to your input. Here is an example for 3 names with 3 different computations.
mycomputer = list(
add5 = function(x) {
x + 5
},
mymean = function(x) {
mean(x)
},
square = function(x) {
x*x
}
)
computeall = function(x) {
result = lapply(names(mycomputer), function(f) {
mycomputer[[f]](x)
})
names(result) = names(mycomputer)
result
}
computeall(c(1,2,3))
## $add5
## [1] 6 7 8
##
## $mymean
## [1] 2
##
## $square
## [1] 1 4 9
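As a possible simplification of the approach above, lapply can iterate over the list of functions directly, which keeps the names without a separate names<- step. The name computeall2 and the optional steps argument are hypothetical additions for this sketch.

```r
mycomputer <- list(
  add5   = function(x) x + 5,
  mymean = function(x) mean(x),
  square = function(x) x * x
)

# lapply over a named list returns a list with the same names
computeall2 <- function(x, steps = names(mycomputer)) {
  lapply(mycomputer[steps], function(f) f(x))
}

res_all  <- computeall2(c(1, 2, 3))                       # all computations
res_some <- computeall2(c(1, 2, 3), c("add5", "square"))  # a chosen subset
```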

Function mapped to reduce function to concatenate vectors together

I'm trying to write a function that can be passed to reduce to concatenate a list of vectors into a single vector, running from the very first entry to the very last entry.
For example,
reduce(list(1:10, 11:20, 21:100), r_cat, .init = NULL)
should return a vector equal to
1:100
This is what I have so far.
r_cat = function(x, y) {
out <- y[[1]]
for(i in seq(2, length(y))) {
out <- x(out, y[[i]])
}
out
}
Any thoughts?
No need to write a new function, unlist solves your problem:
List <- list(1:10, 11:20, 21:100)
unlist(List)
If you want to use Reduce from base R, then you can use c
Reduce("c", List)
You can also get the same result plugging c into reduce from purrr
library(purrr)
reduce(List, c)
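If a custom r_cat is still wanted, note that the function given to Reduce (or purrr's reduce) is called with two arguments: the accumulated result so far and the next element. A minimal base R sketch, with the argument names acc and nxt chosen here for illustration:

```r
# Binary combining function: append the next vector to the accumulator
r_cat <- function(acc, nxt) c(acc, nxt)

parts <- list(1:10, 11:20, 21:100)

result <- Reduce(r_cat, parts)

# Same outcome as flattening the list in one step
identical(result, unlist(parts))
```

The loop in the question is not needed, since Reduce already walks through the list and calls r_cat once per element.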

set matrix element using apply in R

I am trying to assign the values from the dataframe into a matrix. The columns 2 and 3 are mapped to rows and columns respectively in the matrix. This is not working since the sim.mat is not storing the values.
score <- function(x, sim.mat) {
r <- as.numeric(x[2])
c <- as.numeric(x[3])
sim.mat[r,c] <- as.numeric(x[4])
}
mat <- apply(sim.data, 1, score, sim.mat)
Is this the right approach? If yes, how can I get it to work?
No need for apply, try this:
score <- function(x, sim.mat) {
r <- as.numeric(x[[2]])
c <- as.numeric(x[[3]])
sim.mat[cbind(r,c)] <- as.numeric(x[[4]])
sim.mat
}
mat <- score(sim.data, sim.mat)
Check the "Matrices and arrays" section of ?"[" for documentation.
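A small worked sketch of the matrix-indexing approach. The question does not show sim.data or sim.mat, so the column layout (id, row, col, value) and the 3 x 3 size below are assumptions made for this example.

```r
# Assumed layout: column 2 = row index, column 3 = column index, column 4 = value
sim.data <- data.frame(
  id    = c("a", "b", "c"),
  row   = c(1, 2, 3),
  col   = c(2, 3, 1),
  value = c(0.5, 0.7, 0.9)
)
sim.mat <- matrix(0, nrow = 3, ncol = 3)

score <- function(x, sim.mat) {
  # A two-column index matrix addresses one (row, col) cell per line
  idx <- cbind(as.integer(x[[2]]), as.integer(x[[3]]))
  sim.mat[idx] <- as.numeric(x[[4]])
  sim.mat  # return the filled copy; the caller's sim.mat is not changed
}

mat <- score(sim.data, sim.mat)
mat
```

All cells are assigned in one vectorized step, and the result has to be captured (here in mat) because the function works on a copy of sim.mat.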
If you really wanted to use apply as you did, your function would need to modify sim.mat outside its own scope, using <<- to assign in the enclosing (here, global) environment:
score <- function(x, sim.mat) {
r <- as.numeric(x[2])
c <- as.numeric(x[3])
sim.mat[r,c] <<- as.numeric(x[4])
}
apply(sim.data, 1, score, sim.mat)
sim.mat
This type of programming where functions have side-effects is really not recommended.
