Giving arguments from "..." argument to right function in R [duplicate] - r

I have a function that computes the correlation matrix for a data set containing both categorical and continuous variables:
correlation <- function(mtrc, ...) {
  # wtype() is not shown in the question; it presumably returns "numeric" or "factor" for a column
  xx <- do.call(rbind, lapply(colnames(mtrc), function(ex_i) {
    ty_i <- wtype(mtrc, ex_i)
    yy <- sapply(colnames(mtrc), function(ex_j) {
      ty_j <- wtype(mtrc, ex_j)
      if (ty_i == "numeric" && ty_j == "numeric") {
        cor(mtrc[, c(ex_i, ex_j)], ...)[1, 2]
      } else if (ty_i == "factor" && ty_j == "factor") {
        cramersV(table(mtrc[, c(ex_i, ex_j)]), ...)
      } else {
        fm <- paste(ex_i, "~", ex_j)
        if (ty_i == "factor") {
          fm <- paste(ex_j, "~", ex_i)
        }
        fm <- lm(fm, data = mtrc[, c(ex_i, ex_j)], ...)
        lm.beta(fm)
      }
    })
    names(yy) <- colnames(mtrc)
    yy
  }))
  rownames(xx) <- colnames(mtrc)
  xx
}
My question is how to properly pass the ... argument to cor, cramersV and lm. Since the argument names of these three functions do not match, if the user supplies an argument meant for cor and the matrix contains a categorical variable, cramersV or lm raises an error (unused argument ...).
So... I'm open to any solution or idea you can have.

I did not realize that there was an excellent question by Richard Scriven from 2014, "Split up `...` arguments and distribute to multiple functions", when I wrote my answer below. So yes, this is a duplicate question. But I will keep my answer here, as it represents what I thought (and what I think).
Original answer
I think it is better to give your correlation function finer control:
correlation <- function(mtrc, cor.opt = list(), cramersV.opt = list(), lm.opt = list()) {
  xx <- do.call(rbind, lapply(colnames(mtrc), function(ex_i) {
    ty_i <- wtype(mtrc, ex_i)
    yy <- sapply(colnames(mtrc), function(ex_j) {
      ty_j <- wtype(mtrc, ex_j)
      if (ty_i == "numeric" && ty_j == "numeric") {
        do.call("cor", c(list(x = mtrc[, c(ex_i, ex_j)]), cor.opt))[1, 2]
      } else if (ty_i == "factor" && ty_j == "factor") {
        do.call("cramersV", c(list(x = table(mtrc[, c(ex_i, ex_j)])), cramersV.opt))
      } else {
        fm <- paste(ex_i, "~", ex_j)
        if (ty_i == "factor") {
          fm <- paste(ex_j, "~", ex_i)
        }
        fm <- do.call("lm", c(list(formula = fm, data = mtrc[, c(ex_i, ex_j)]), lm.opt))
        lm.beta(fm)
      }
    })
    names(yy) <- colnames(mtrc)
    yy
  }))
  rownames(xx) <- colnames(mtrc)
  xx
}
You can pass arguments intended for the different functions via cor.opt, cramersV.opt and lm.opt. Then, inside correlation, use do.call() for every relevant function call.
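As a minimal, self-contained sketch of this do.call() pattern (using only base R's cor(); the vectors x and y and the wrapper pair_cor are made up for illustration), an empty option list leaves the defaults untouched, while a non-empty list is spliced into the call:

```r
# Illustrative data, not taken from the question
x <- c(1, 2, 3, 4, 100)
y <- c(2, 4, 6, 8, 10)

# Hypothetical wrapper mirroring the cor.opt argument above
pair_cor <- function(x, y, cor.opt = list()) {
  # c() concatenates the fixed arguments with the user-supplied options
  do.call("cor", c(list(x = x, y = y), cor.opt))
}

r_default  <- pair_cor(x, y)                                     # cor()'s default method (Pearson)
r_spearman <- pair_cor(x, y, cor.opt = list(method = "spearman")) # rank correlation
```

Because every option travels inside its own list, an option meant for cor can never reach cramersV or lm by accident.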
Comment
I like @Roland's idea. He chooses to keep ..., splitting list(...) according to the formal arguments of the different functions. I, on the other hand, ask you to specify those arguments manually in separate lists. In the end, both of us ask you to use do.call() for the function calls.
Roland's idea is broadly applicable, as it is easier to extend to more functions requiring ....
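The splitting approach can be sketched roughly as follows. This is not Roland's actual code, only an illustration under assumptions: keep_formals and summarise_pair are hypothetical names, and cor() and mean.default() stand in for the three functions in the question.

```r
# Hypothetical helper: keep only the named arguments that `fun` declares as formals
keep_formals <- function(fun, dots) {
  dots[names(dots) %in% names(formals(fun))]
}

# Hypothetical wrapper that forwards ... to two different functions
summarise_pair <- function(x, y, ...) {
  dots <- list(...)
  list(
    cor    = do.call(cor, c(list(x = x, y = y), keep_formals(cor, dots))),
    mean_x = do.call(mean.default, c(list(x = x), keep_formals(mean.default, dots)))
  )
}

# `method` is a formal of cor() only, `trim` a formal of mean.default() only
res <- summarise_pair(c(1, 2, 3, 4, 100), c(2, 4, 6, 8, 10),
                      method = "spearman", trim = 0.2)
```

One caveat of matching by name: an argument that several target functions share is forwarded to all of them. For example, both cor() and lm() have a formal called method, so in the original problem a method argument would reach both.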

Related

Finding all variables created by assignment - Not working for pairlist

I'm currently working through Advanced R, chapter 18 (Expressions).
The topic is section 18.5.2, Finding all variables created by assignment, but the given code doesn't work for a pairlist.
I followed all the given code, but the results are not quite what I expect.
To begin with, expr_type() is needed to figure out the type of the input.
expr_type <- function(x) {
  if (rlang::is_syntactic_literal(x)) {
    "constant"
  } else if (is.symbol(x)) {
    "symbol"
  } else if (is.call(x)) {
    "call"
  } else if (is.pairlist(x)) {
    "pairlist"
  } else {
    typeof(x)
  }
}
And the author, Hadley, couples this with a wrapper around the switch function.
switch_expr <- function(x, ...) {
  switch(expr_type(x),
    ...,
    stop("Don't know how to handle type ", typeof(x), call. = FALSE)
  )
}
The base cases, symbol and constant, are trivial because neither represents an assignment.
find_assign_rec <- function(x) {
  switch_expr(x,
    constant = ,
    symbol = character()
  )
}
For the recursive cases, especially pairlists, he suggests
flat_map_chr <- function(.x, .f, ...) {
  purrr::flatten_chr(purrr::map(.x, .f, ...))
}
So, putting it together:
find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),
    # Recursive cases
    pairlist = flat_map_chr(as.list(x), find_assign_rec)
  )
}
find_assign <- function(x) find_assign_rec(rlang::enexpr(x))
Then, for pl <- pairlist(x = 1, y = 2), I expect find_assign(pl) to return #> [1] "x" "y"
But the actual output is character(0).
What is wrong with this?
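A likely explanation, sketched here with base R's substitute() standing in for rlang::enexpr() (an assumption made so that the snippet runs without rlang): find_assign() captures the expression typed by the caller, not the value stored in pl. The captured expression is the symbol pl, which falls into the symbol case and yields character(). The helper name capture is hypothetical.

```r
# Hypothetical stand-in for find_assign()'s capturing step
capture <- function(x) substitute(x)

pl <- pairlist(x = 1, y = 2)
captured <- capture(pl)

is.symbol(captured)     # the captured expression is the name `pl` ...
is.pairlist(captured)   # ... not the pairlist it refers to
```

Note also that pairlist(x = 1, y = 2) holds the constants 1 and 2 under the names x and y. Neither element is an assignment, so with the cases shown above even the evaluated pairlist would produce character(0).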

Passing column name and data frame to custom function in R

I am trying to write a function in R that:
1) Receives a data frame and column name as parameters.
2) Performs an operation on the column in the data frame.
func <- function(col, df)
{
  col = deparse(substitute(col))
  print(paste("Levels: ", levels(df[[col]])))
}
func(Col1, DF)
func(Col2, DF)
mapply(func, colnames(DF)[1:2], DF)
Output
> func(Col1, DF)
[1] "Levels: GREEN" "Levels: YELLOW"
> func(Col2, DF)
[1] "Levels: 0.1" "Levels: 1"
> mapply(func, colnames(DF)[1:2], DF)
Error in `[[.default`(df, col) : subscript out of bounds
Two things:
In your function func, you apply deparse(substitute(col)) to col, which expects col to be an unquoted name, not a string. So it works with func(Col1, DF). But in your mapply() call, the argument colnames(...) is a character vector, so it raises an error. The same error is obtained with func('Col1', DF).
In a mapply() call, all arguments need to be vectors or lists. So you need to use list(df, df) or, if you don't want to replicate the data frame, remove the df argument from your function func.
This is one alternative that should work:
func <- function(col, df)
{
  print(paste("Levels: ", levels(df[, col])))
}
mapply(FUN = func, colnames(DF)[1:2], list(DF, DF))
Please have a look at the last comment of @demarsylvain. Maybe it was a copy-paste error on your side; you should have done:
func <- function(col, df) {
  print(paste("Levels: ", levels(df[, col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
but you did:
func <- function(col) {
  print(paste("Levels: ", levels(df[, col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
Please upvote and accept the solution of @demarsylvain, it works.
EDIT to address your comment:
For a generic version that handles an arbitrary list of column names you can use this code, sorry for the loop :)
func <- function(col, df) {
  print(paste("Levels: ", levels(df[, col])))
}
cnames <- colnames(iris)
i <- 1
l <- list()
while (i <= length(cnames)) {
  l[[i]] <- iris
  i <- i + 1
}
mapply(FUN = func, cnames, l)
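As an alternative to building a list of repeated data frames, mapply() has a MoreArgs argument for values that should be passed unchanged to every call. The sketch below is an illustration, not part of the answers above: DF is a made-up data frame resembling the one in the question, and col_levels is a hypothetical name.

```r
# Hypothetical data frame resembling the one in the question
DF <- data.frame(
  Col1 = factor(c("GREEN", "YELLOW")),
  Col2 = factor(c(0.1, 1))
)

# col is a string here, so no deparse(substitute()) is needed
col_levels <- function(col, df) levels(df[[col]])

# Only `col` varies; the data frame is passed once through MoreArgs
res <- mapply(col_levels, colnames(DF),
              MoreArgs = list(df = DF), SIMPLIFY = FALSE)

# Since only one argument varies, lapply() does the same job
res2 <- lapply(setNames(nm = colnames(DF)), col_levels, df = DF)
```

Both calls return a list named by column, so res$Col1 gives the levels of Col1.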

R how to pass NULL for optional parameters to function (e.g. in for loop)

I wrote a for loop to test different settings for an ordination function in R (package "vegan", called by "phyloseq"). I have several subsets of my data within a list (sample_subset_list) and therefore, testing different parameters for all these subsets results in many combinations.
The ordination function contains the optional argument formula and I would like to perform my ordinations with and without a formula. I assume NULL would be the correct way to not use the formula parameter? But how do I pass NULL when using a for loop (or apply etc)?
Using the phyloseq example data:
library(phyloseq)
data(GlobalPatterns)
ps <- GlobalPatterns
ps1 <- filter_taxa(ps, function (x) {sum(x > 0) > 10}, prune = TRUE)
ps2 <- filter_taxa(ps, function (x) {sum(x > 0) > 20}, prune = TRUE)
sample_subset_list <- list()
sample_subset_list <- c(ps1, ps2)
I tried:
formula <- c("~ SampleType", NULL)
> formula
[1] "~ SampleType"
ordination_list <- list()
for (current_formula in formula) {
  tmp <- lapply(sample_subset_list,
                ordinate,
                method = "CCA",
                formula = as.formula(current_formula))
  ordination_list[[paste(current_formula)]] <- tmp
}
This way, formula only consists of "~ SampleType". If I put NULL in quotes, it is wrongly interpreted as a formula:
formula <- c("~ SampleType", "NULL")
Error in parse(text = x, keep.source = FALSE)
What is the right way to solve this?
Regarding Lyzander's answer:
# make sure to use (as suggested)
formula <- list("~ SampleType", NULL)
# and not
formula <- list()
formula <- c("~ SampleType", NULL)
You can use a list instead:
formula <- list("~ my_constraint", NULL)
# for (i in formula) print(i)
#[1] "~ my_constraint"
#NULL
If your function accepts NULL for that argument, you should also guard the as.formula() call:
ordination_list <- list()
for (current_formula in formula) {
  tmp <- lapply(sample_subset_list,
                ordinate,
                method = "CCA",
                formula = if (is.null(current_formula)) NULL else as.formula(current_formula))
  ordination_list[[length(ordination_list) + 1]] <- tmp
}
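The difference between c() and list() can be checked in isolation with base R only. In this small sketch, the variable names and the "no formula" label are illustrative:

```r
# c() silently drops NULL, list() keeps it as an element
as_vector <- c("~ SampleType", NULL)
as_list   <- list("~ SampleType", NULL)

n_vector <- length(as_vector)
n_list   <- length(as_list)

# Looping over the list visits the NULL element as well
seen <- character()
for (f in as_list) {
  seen <- c(seen, if (is.null(f)) "no formula" else f)
}
```

Here n_vector is 1 and n_list is 2, which is why only the list version runs the loop body a second time with current_formula set to NULL.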

Is it possible to see source code of a value of function

I am using a function from a package. This function returns some values. For example:
k<-dtw(v1,v2, keep.internals=TRUE)
and I can get this value:
k$costMatrix
Is it possible to see the source code that produces costMatrix? If yes, how can I do that?
UPDATE
This is the source code of the function:
function (x, y = NULL, dist.method = "Euclidean", step.pattern = symmetric2,
    window.type = "none", keep.internals = FALSE, distance.only = FALSE,
    open.end = FALSE, open.begin = FALSE, ...)
{
    lm <- NULL
    if (is.null(y)) {
        if (!is.matrix(x))
            stop("Single argument requires a global cost matrix")
        lm <- x
    }
    else if (is.character(dist.method)) {
        x <- as.matrix(x)
        y <- as.matrix(y)
        lm <- proxy::dist(x, y, method = dist.method)
    }
    else if (is.function(dist.method)) {
        stop("Unimplemented")
    }
    else {
        stop("dist.method should be a character method supported by proxy::dist()")
    }
    wfun <- .canonicalizeWindowFunction(window.type)
    dir <- step.pattern
    norm <- attr(dir, "norm")
    if (!is.null(list(...)$partial)) {
        warning("Argument `partial' is obsolete. Use `open.end' instead")
        open.end <- TRUE
    }
    n <- nrow(lm)
    m <- ncol(lm)
    if (open.begin) {
        if (is.na(norm) || norm != "N") {
            stop("Open-begin requires step patterns with 'N' normalization (e.g. asymmetric, or R-J types (c)). See papers in citation().")
        }
        lm <- rbind(0, lm)
        np <- n + 1
        precm <- matrix(NA, nrow = np, ncol = m)
        precm[1, ] <- 0
    }
    else {
        precm <- NULL
        np <- n
    }
    gcm <- globalCostMatrix(lm, step.matrix = dir, window.function = wfun,
        seed = precm, ...)
    gcm$N <- n
    gcm$M <- m
    gcm$call <- match.call()
    gcm$openEnd <- open.end
    gcm$openBegin <- open.begin
    gcm$windowFunction <- wfun
    lastcol <- gcm$costMatrix[np, ]
    if (is.na(norm)) {
    }
    else if (norm == "N+M") {
        lastcol <- lastcol/(n + (1:m))
    }
    else if (norm == "N") {
        lastcol <- lastcol/n
    }
    else if (norm == "M") {
        lastcol <- lastcol/(1:m)
    }
    gcm$jmin <- m
    if (open.end) {
        if (is.na(norm)) {
            stop("Open-end alignments require normalizable step patterns")
        }
        gcm$jmin <- which.min(lastcol)
    }
    gcm$distance <- gcm$costMatrix[np, gcm$jmin]
    if (is.na(gcm$distance)) {
        stop("No warping path exists that is allowed by costraints")
    }
    if (!is.na(norm)) {
        gcm$normalizedDistance <- lastcol[gcm$jmin]
    }
    else {
        gcm$normalizedDistance <- NA
    }
    if (!distance.only) {
        mapping <- backtrack(gcm)
        gcm <- c(gcm, mapping)
    }
    if (open.begin) {
        gcm$index1 <- gcm$index1[-1] - 1
        gcm$index2 <- gcm$index2[-1]
        lm <- lm[-1, ]
        gcm$costMatrix <- gcm$costMatrix[-1, ]
        gcm$directionMatrix <- gcm$directionMatrix[-1, ]
    }
    if (!keep.internals) {
        gcm$costMatrix <- NULL
        gcm$directionMatrix <- NULL
    }
    else {
        gcm$localCostMatrix <- lm
        if (!is.null(y)) {
            gcm$query <- x
            gcm$reference <- y
        }
    }
    class(gcm) <- "dtw"
    return(gcm)
}
But if I type globalCostMatrix, I don't get the source code of that function.
The easiest way to find out how a function works is to look at its source. There is a good chance that typing the function name (without parentheses) in the R console will print the function definition (although not always with a good layout, so looking at the original source files, where the formatting is preserved, is a viable option).
In your case, you have the function dtw from the package of the same name. This function uses a function called globalCostMatrix. If you type that name into R, you will get an error saying that the object was not found. This happens because the function was not exported when the package was created, probably because the author thinks a regular user would not need to call it (which does not mean they may not look at it!), or to prevent clashes with other packages that may use the same function name.
However, for an interested reader, there are at least two ways to access the code of this function. One is to go to CRAN, download the source tarball and find the function in the R folder of the tarball. The other, easier, way is to use the getAnywhere function. This gives you the definition of the function just as you are used to for exported, user-accessible functions like dtw.
> library(dtw)
> getAnywhere("globalCostMatrix")
A single object matching ‘globalCostMatrix’ was found
It was found in the following places
namespace:dtw
with value
function (lm, step.matrix = symmetric1, window.function = noWindow,
    native = TRUE, seed = NULL, ...)
{
    if (!is.stepPattern(step.matrix))
        stop("step.matrix is no stepMatrix object")
    n <- nrow(lm)
... omitted for brevity
I think you want to see what the function dtw() does with your data. It seems that it returns a list of class "dtw" containing an element named costMatrix.
To find out how costMatrix was generated, just type and execute dtw (without parentheses!). R will then show you the source of the function dtw().
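The lookup of a non-exported function can be tried without installing dtw. The sketch below uses t.test.default from the stats package as a stand-in (an assumption chosen only so that the snippet runs in a plain R session); it is an S3 method that lives in the stats namespace without being exported.

```r
# Three ways to reach a function that is in a namespace but not exported
f1    <- utils::getFromNamespace("t.test.default", "stats")  # by name and namespace
f2    <- stats:::t.test.default                              # triple-colon operator
found <- utils::getAnywhere("t.test.default")                # searches all loaded namespaces

same_object <- identical(f1, f2)

# Printing f1 (or found) shows the source, just like typing an exported function's name
```

For the case in the question, the analogous calls would be dtw:::globalCostMatrix or getFromNamespace("globalCostMatrix", "dtw"), given that getAnywhere() reports the function in namespace:dtw.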

Simplify ave() or aggregate() with several inputs

How can I write this all in one line?
mydata is a "zoo" series, limit is a numeric vector of the same size
tmp <- ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cummax(x)-x)
tmp <- (tmp < limit)
final <- ave(tmp, as.Date(index(mydata)),
FUN = function(x) cumprod(x))
I've tried to pass two vectors as arguments to ave(...), but it seems to accept just one, even if I join them into a matrix.
This is just an example; any other function could be used.
Here I need to compare the value of cummax(mydata)-mydata with a numeric vector and,
once it exceeds the limit, keep zeros until the end of the day. The cummax is calculated from the beginning of each day.
If limit were a single number instead of a vector (with different possible values), I could write:
ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cumprod((cummax(x) - x) < limit))
But I can't use a vector longer than x there (it would need to have the same length as each day), and I don't know how to pass it as another argument to ave().
This routine seems to impose an intraday stop-loss based on maximum drawdown. So I assume you want to pass the variable limit as a second argument to your aggregation function, which currently takes only one vector because of the way ave works.
If putting all this in one line is not an absolute must, I can share a function I've written that generalizes aggregation via "cut variables". Here's the code:
mtapplylist2 <- function(t, IDX, DEF, MoreArgs = NULL, ...)
{
  if (mode(DEF) != "list")
  {
    cat("Definition must be list type\n")
    return(NULL)
  }
  a <- c()
  colnames <- names(DEF)
  for (i in seq_along(DEF))
  {
    def <- DEF[[i]]
    func <- def[1]
    if (mode(func) == "character") { func <- get(func) }
    cols <- def[-1]
    # build the argument list passed to mapply()
    arglist <- list()
    arglist[[1]] <- func
    for (j in seq_along(cols))
    {
      col <- cols[j]
      grp <- split(t[, col], IDX)
      arglist[[1 + j]] <- grp
    }
    arglist[["MoreArgs"]] <- MoreArgs
    v <- do.call("mapply", arglist)
    # is.matrix() rather than class(v) == "matrix": since R 4.0 a matrix
    # has class c("matrix", "array"), so the comparison has length 2
    if (is.matrix(v))
    {
      a <- cbind(a, as.vector(v))
    } else {
      a <- cbind(a, v)
    }
  }
  colnames(a) <- colnames
  return(a)
}
And you can use it like this:
# assuming you have the data in the data.frame
df <- data.frame(date=rep(1:10,10), ret=rnorm(100), limit=rep(c(0.25,0.50),50))
dfunc <- function(x, ...) { return(cummax(x)-x ) }
pfunc <- function(x,y, ...) { return((cummax(x)-x) < y) }
# assumes you have the function declared in the same namespace
def <- list(
"drawdown" = c("dfunc", "ret"),
"hasdrawdown" = c("pfunc", "ret", "limit")
);
# from R console
> def <- list("drawdown" = c("dfunc", "ret"),"happened" = c("pfunc","ret","limit"))
> dim( mtapplylist2(df, df$date, def) )
[1] 100 2
Notice that the "def" variable is a list in which each item consists of:
the computed column name (the name of the item)
the name of the function to apply, as a string
the names of the columns in the input data.frame that are passed to the function
If you look at the guts of "mtapplylist2" function, the key components would be "split" and "mapply". These functions are sufficiently fast (I think split is implemented in C).
This works with functions requiring multiple arguments, and also for functions returning vector of the same size or aggregated value.
Try it out and let me know if this solves your problem.
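For the one-line form asked about in the question, one possible workaround is to let ave() group the row indices instead of the values, so that the function can index into any number of vectors. The sketch below uses plain vectors and made-up numbers; with a zoo series, x would presumably be coredata(mydata) and day would be as.Date(index(mydata)).

```r
# Illustrative data: two "days" with three observations each
x     <- c(1, 3, 2, 5, 4, 1)
day   <- c(1, 1, 1, 2, 2, 2)
limit <- c(2, 2, 0.5, 0.5, 0.5, 10)

# ave() splits the indices by day; i selects the matching slice of x and of limit
final <- ave(seq_along(x), day,
             FUN = function(i) cumprod((cummax(x[i]) - x[i]) < limit[i]))
```

Within each day the result stays at 1 until the drawdown first reaches the limit and is 0 from then on, which matches the two-step version in the question.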
