Pass distribution functions, named in vectors, to a function - r

This question is a follow up on two questions I had answered before:
Create the function
Calculate mean
I have a couple of variables (var1, var2 and var3), which have different distribution functions:
var1_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 3, sd = 1))
var1_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 6, sd = 1))
var1_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 2, sd = 2))
var2_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 5, sd = 3))
var2_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 3, sd = 1))
var2_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 4, sd = 2))
var3_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 4, sd = 1))
var3_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 5, sd = 1))
var3_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 7, sd = 2))
To create a proportional distribution function that matches a combination of two or three different variables with their appropriate probability functions, I wrote the following function, which I learned from the first question:
foo <- function(...){
  # set x values
  x <- seq(1, 10, by = 1)
  # create y values
  y <- 1L
  for (fun in list(...)) y <- y * fun(x)
  # create new PDF
  p <- data.frame(x, y)
  pdqr::new_d(p, type = "continuous")
}
So, if I want to create a proportional distribution function var2_distr1__var3_distr3 of var2_distr1 and var3_distr3, I can just do this: var2_distr1__var3_distr3 <- foo(var2_distr1, var3_distr3), which works like a charm.
Now, for each variable and each case, I have selected the appropriate distribution using a simple if_else, which returns the name of the appropriate distribution in a data frame like this:
df <- data.frame(var1 = c("var1_distr1", "var1_distr3", "var1_distr1", "var1_distr2", "var1_distr2", "var1_distr1", "var1_distr3"),
var2 = c("var2_distr2", "var2_distr1", "var2_distr2", "var2_distr1", "var2_distr3", "var2_distr3", "var2_distr1"),
var3 = c("var3_distr2", "var3_distr3", "var3_distr1", "var3_distr1", "var3_distr2", "var3_distr3", "var3_distr1"))
If I want the mean of the relevant individual distribution per case for a single variable, I can use this:
df$var2_mean <- sapply(mget(df$var2), pdqr::summ_mean)
df$var3_mean <- sapply(mget(df$var3), pdqr::summ_mean)
which I learned in the second question.
However, if I want to get the mean of the proportional distributions given in var1 and var2 I get into trouble.
> df$var1_2_mean <- mapply(pdqr::summ_mean, foo(df$var1, df$var2))
Error in fun(x) : could not find function "fun"
While if I individually pass the distribution functions, this happens:
> df$var1_2_mean <- mapply(summ_mean, foo(var1_distr1, var2_distr2))
Error in dots[[1L]][[1L]] : object of type 'closure' is not subsettable
As suggested by @Limey, I put the PDFs in a list:
PDFS <- list(var1_distr1 = var1_distr1, var1_distr2 = var1_distr2, var1_distr3 = var1_distr3,
var2_distr1 = var2_distr1, var2_distr2 = var2_distr2, var2_distr3 = var2_distr3,
var3_distr1 = var3_distr1, var3_distr2 = var3_distr2, var3_distr3 = var3_distr3)
However, when calling that (using this approach apply-list-of-functions-to-list-of-values) I get this:
> df$var1_2_mean <- foo(sapply(PDFS, mapply, df$var1, df$var2))
Error in (function (x) : unused argument (dots[[2]][[1]])
> sapply(PDFS, mapply, df$var1, df$var2)
Error in (function (x) : unused argument (dots[[2]][[1]])
> sapply(PDFS, mapply, df$var1)
Error: `x` must be 'numeric', not 'character'.
> df$var1_2_mean <- foo(sapply(PDFS, mapply, paste(df$var1, df$var2, sep = ", ")))
Error: `x` must be 'numeric', not 'character'.
> df$var1_2_mean <- summ_mean(foo(sapply(PDFS, mapply, paste(df$var1, df$var2, sep = ", "))))
Error: `x` must be 'numeric', not 'character'.
> df$var1_2_mean <- sapply(foo(mget(mapply(PDFS, sapply, df$var1, df$var2))), pdqr::summ_mean)
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'PDFS' of mode 'function' was not found
> lapply(PDFS, function(x) x())
Error in x() : argument "x" is missing, with no default
I'm still missing something, and I believe it's on vectorisation. Might invoke_map work?

I don't have the pdqr package, so I can't solve your exact problem, but here's a proof-of-concept example that may be helpful. As I mention in the comments, you haven't specified your exact use case, but I do feel you are imposing constraints that make your life more difficult than it needs to be. For example, you pass function names rather than functions to your summary function, use a data frame rather than a list, and so on.
Anyway, start by defining some functions and store them in a list.
foo1 <- function() {"Foo 1"}
foo2 <- function() {"Foo 2"}
foo3 <- function() {"Foo 3"}
funcList <- list(foo1, foo2, foo3)
Now use utils::combn() to generate all combinations of two of these three functions and call each member of each pair in turn.
combn(
  funcList,
  m=2,
  FUN=function(combination) {
    lapply(combination, function(x) x())
  }
)
Giving
     [,1]    [,2]    [,3]
[1,] "Foo 1" "Foo 1" "Foo 2"
[2,] "Foo 2" "Foo 3" "Foo 3"
combn() takes the list of functions as input. m=2 requests all combinations of 2 elements from the list. FUN= specifies a function to be applied to each combination. The anonymous function supplied here simply calls each element of the combination in turn.
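For the row-wise case in the question, looking the functions up by name in a named list and iterating with mapply() is one possible route. The following is a minimal base-R sketch, not a pdqr solution: it assumes plain dnorm() closures in place of the pdqr objects, and the names pdfs, cases and product_mean are hypothetical. It approximates the mean of the normalised product density on a fixed grid.

```r
# Hypothetical stand-ins for the pdqr densities: plain closures over dnorm()
pdfs <- list(
  var1_distr1 = function(x) dnorm(x, mean = 3, sd = 1),
  var1_distr2 = function(x) dnorm(x, mean = 6, sd = 1),
  var2_distr1 = function(x) dnorm(x, mean = 5, sd = 3),
  var2_distr2 = function(x) dnorm(x, mean = 3, sd = 1)
)

# One row per case; each cell holds the NAME of a distribution
cases <- data.frame(
  var1 = c("var1_distr1", "var1_distr2", "var1_distr1"),
  var2 = c("var2_distr2", "var2_distr1", "var2_distr1"),
  stringsAsFactors = FALSE
)

# Mean of the normalised product of the densities named in `...`
# (the grid range and step are assumptions chosen to cover these densities)
product_mean <- function(..., grid = seq(-20, 30, by = 0.01)) {
  funs <- pdfs[c(...)]                                  # look functions up by name
  y <- Reduce(`*`, lapply(funs, function(f) f(grid)))   # pointwise product
  sum(grid * y) / sum(y)                                # discrete approximation of E[X]
}

# mapply() walks the two name columns in parallel, one case at a time
cases$var1_2_mean <- mapply(product_mean, cases$var1, cases$var2,
                            USE.NAMES = FALSE)
```

The key point is that the names are turned into functions inside product_mean, one case at a time, instead of passing whole character vectors to a function that expects functions.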

Related

apply function for multiple fixed parameter in R

I have created a function, say myfunc, which has 4 parameters, say para1, para2, para3 and para4. In my problem, para1 is a matrix, para2 is a real number, para3 is a vector, and para4 is a real number. The function returns a list. I have defined the function in the following manner :
myfunc <- function(para1, para2, para3 = NULL, para4 = 100){
  Body ## Body of the function
  return(list("A" = a, "B" = b, "C" = c))
}
Now, let lambda <- c(2,3,6,10). I am trying to write a code so that the function outputs the following :
myfunc(my_data, 2, my_vec, 100)
myfunc(my_data, 3, my_vec, 100)
myfunc(my_data, 6, my_vec, 100)
myfunc(my_data, 10, my_vec, 100)
This can easily be done with a for loop, but I was wondering whether apply, sapply or tapply could be used for this purpose. So, keeping the other parameters fixed, I want the outputs of the same function for different values of para2 (namely the values in lambda). Can this be done?
I found a quite similar question here, and saw some answers. I followed those answers, but I'm getting an error. I wrote the following code :
myfunc <- function(para1, para2, para3 = NULL, para4 = 100) { Body }
para1 <- my_data
para3 <- my_vec
para4 <- 100
lambda <- c(2,3,6,10)
sapply(lambda, myfunc, para1=para1, para3, para4=para4)
Can I please get some assistance ? Thanks in advance.
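We can use lapply to loop over lambda with an anonymous function:
lapply(lambda, function(x) myfunc(my_data, x, my_vec, 100))
Or, without an anonymous function, pass the fixed arguments by name so that each element of lambda is matched to para2:
lapply(lambda, myfunc, para1 = my_data, para3 = my_vec, para4 = 100)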
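To make the argument matching concrete, here is a small runnable sketch. The body of myfunc and the values of my_data and my_vec are hypothetical stand-ins, since the question does not show them.

```r
# Hypothetical stand-in for myfunc: the real body is not shown in the question
myfunc <- function(para1, para2, para3 = NULL, para4 = 100) {
  list(A = sum(para1) * para2, B = length(para3), C = para4)
}

my_data <- matrix(1:4, nrow = 2)   # assumed example inputs
my_vec  <- c(1, 2, 3)
lambda  <- c(2, 3, 6, 10)

# para1, para3 and para4 are matched by name, so each lambda value fills para2
res <- lapply(lambda, myfunc, para1 = my_data, para3 = my_vec, para4 = 100)
names(res) <- paste0("lambda_", lambda)

# Pull one component out of every result
a_values <- sapply(res, function(r) r$A)
```

lapply is used rather than sapply because the function returns a list; the result stays a list with one element per lambda value.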

apply a function across columns in R

Let's say I have a dataframe (df) in R:
df <- data.frame(x = rnorm(5, mean = 5), u = rnorm(5, mean = 5), y = rnorm(5, mean = 5), z = rnorm(5, mean = 5))
print(df)
I want to get the mean absolute difference (MAD) between the first column (x) and the other columns.
With this function, I can find the MAD between the first column and another (the second for example):
mad <- function(dat){
  abs(mean(dat[,1] - dat[,2], na.rm = TRUE))
}
mad(dat = df)
But I want to generalize the function to apply across all of the columns. Changing the function to something like this:
mad <- function(dat) {
  abs(mean(dat[,1] - dat[,2:4], na.rm = TRUE))
}
mad(dat = df)
does not work and returns this error: "argument is not numeric or logical: returning NA"
I was thinking of using apply() across the dataframe, as that seems to be the general advice that I've found on here. But I don't understand how to keep the first column constant and subtract the other columns from the first.
We can create the function with two arguments
mad <- function(x, y) abs(mean(x - y, na.rm = TRUE))
and use sapply/lapply to loop over every column except the first, calling mad with the first column and the current column:
sapply(df[-1], function(x) mad(df[,1], x))
#          u           y           z
#0.003399429 0.991685267 0.710553411
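Here is another option without defining the mad function. Note that it takes the mean of the absolute differences, whereas mad above takes the absolute value of the mean difference, so the two can give different results: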
sapply(abs(df[-1] - df[["x"]]), mean, na.rm = TRUE)
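A small deterministic sketch may help to compare the two quantities. The data frame and the names abs_mean_diff and mean_abs_diff are made up for illustration.

```r
# Small made-up data so the results can be checked by hand
df <- data.frame(x = c(1, 2, 3), u = c(2, 2, 2), y = c(0, 2, 4))

# |mean(x - col)|: positive and negative differences can cancel out
abs_mean_diff <- sapply(df[-1], function(col) abs(mean(df[[1]] - col, na.rm = TRUE)))

# mean(|x - col|): every difference counts with its magnitude
mean_abs_diff <- sapply(df[-1], function(col) mean(abs(df[[1]] - col), na.rm = TRUE))
```

Here the differences are (-1, 0, 1) for u and (1, 0, -1) for y, so the first measure is 0 for both columns while the second is 2/3.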

How to make custom function more robust and less error prone

Here is my data:
LoDFs <- list(first = mtcars[, c(1:3)], second = mtcars[, c(4:6)])
row.names(LoDFs[[1]]) <- NULL
row.names(LoDFs[[2]]) <- NULL
Here is my function:
RollapplyMultipleFuncsAndWins <- function(df.val, df.name, window.size, funs, ..., GroupByWindowSize = TRUE){
  library(zoo) # REQUIRED FOR rollapply
  by.rows <- 1
  combinations <- expand.grid(window.size, funs)
  combinations <- cbind(combinations, rep(names(funs), each = length(window.size)))
  colnames(combinations) <- c("window.size", "func.call", "func.name")
  combinations$window.size <- sprintf(paste0("%0", max(nchar(combinations$window.size)), "d"),
                                      combinations$window.size)
  LoMs <- apply(combinations, by.rows, function(x) {
    rollapply(
      df.val,
      width = as.numeric(x[["window.size"]]),
      by = as.numeric(x[["window.size"]]),
      FUN = x[["func.call"]],
      align = "left")})
  # COLUMN NAMING CONVENTION: column_name.function_name
  LoMs <- lapply(seq_along(LoMs), function(x) {
    colnames(LoMs[[x]]) <- paste(colnames(LoMs[[x]]),
                                 combinations$func.name[x],
                                 sep=".");
    LoMs[[x]] })
  # MULTIPLE FUNCTIONS WITH SAME WINDOW SIZE IN ONE DATASETS
  # LIST ELEMENTS NAMING CONVENTION: dataset_name.window_size
  if (GroupByWindowSize){
    df.win.grps <- lapply(unique(combinations$window.size), function(x) { grep(x, combinations$window.size) })
    LoMs <- lapply(df.win.grps, function(x){ do.call(cbind, LoMs[x]) })
    names(LoMs) <- paste(rep(df.name, each=length(df.win.grps)),
                         unique(combinations$window.size),
                         sep=".")
  }
  # MULTIPLE FUNCTIONS WITH SAME WINDOW SIZE IN MULTIPLE DATASETS
  # LIST ELEMENTS NAMING CONVENTION: dataset_name.function_name.window_size
  else {
    names(LoMs) <- paste(rep(df.name, each=nrow(combinations)),
                         combinations$func.name,
                         combinations$window.size,
                         sep=".")
  }
  return(LoMs)
}
The purpose of this function is to apply multiple functions with multiple rolling/moving window sizes over one dataset. It takes the window sizes and the functions as inputs and creates all possible combinations of those values. For example, when you pass c(2, 3, 10) as window.size and c(median = median, mean = mean) as funs, it creates the following combinations (meaning that median and mean will each be called with a rolling/moving window of size 2, 3 and 10 on the specified dataset):
  window.size                                         func.call func.name
1          02 function (x, na.rm = FALSE) , UseMethod("median")    median
2          03 function (x, na.rm = FALSE) , UseMethod("median")    median
3          10 function (x, na.rm = FALSE) , UseMethod("median")    median
4          02             function (x, ...) , UseMethod("mean")      mean
5          03             function (x, ...) , UseMethod("mean")      mean
6          10             function (x, ...) , UseMethod("mean")      mean
The function then returns a list of matrices, where each matrix holds the results for one window size across all functions (if GroupByWindowSize is TRUE), or the results for one window size and one function (if GroupByWindowSize is FALSE). Try the following to see what I mean:
res_one_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(2, 3), c(median = median, mean = mean))
res_one_non_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(2, 3), c(median = median, mean = mean), GroupByWindowSize=FALSE)
The problem appears when I want a single window size but multiple functions, e.g.:
res_one_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(1), c(median = median, mean = mean))
I've figured out that the problem is the LoMs <- apply(combinations, by.rows, function(x) { .... line. Instead of a list of matrices (as before), it now returns a single matrix, and I do not know why (combinations has the same structure as before, just fewer rows):
  window.size                                         func.call func.name
1           1 function (x, na.rm = FALSE) , UseMethod("median")    median
2           1             function (x, ...) , UseMethod("mean")      mean
Questions:
Why do I get the behaviour described above?
If you check the code, you can see that I'm building combinations as expand.grid(window.size, funs). What if I also want to handle expand.grid(funs, window.size) (notice the reordered arguments)? Will the if (GroupByWindowSize){ branch still work correctly in that case (let's pretend that combinations is passed as an argument to the function, so I want to be able to handle various orderings)?
Is it possible to define the naming convention for list elements at the beginning of the function and easily switch it from dataset_name.window_size to e.g. dataset_name.function_name.window_size in both if-else branches? As you can see, the names(LoMs) ... code in the two branches is currently very different; I'm curious whether it can be unified somehow.
How can I make this code more robust and more generic in general? Is my approach correct, or is there a better way? Any ideas are welcome.
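On the first question, one likely explanation (an assumption, since it has not been checked against the full function above) is the way apply() simplifies its result: when every call returns a vector of the same length greater than one, apply() returns a matrix, and it only returns a list when the lengths differ. With a single window size, every rollapply() result has the same dimensions. The toy sketch below shows this behaviour with made-up data and hypothetical names.

```r
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

# Every call returns a vector of length 3, so apply() simplifies to a matrix
same_len <- apply(m, 1, function(row) row * 2)

# The calls return vectors of length 1 and 2, so apply() falls back to a list
diff_len <- apply(m, 1, function(row) seq_len(row[1]))

# Iterating over row indices with lapply() always yields a list
as_list <- lapply(seq_len(nrow(m)), function(i) m[i, ] * 2)
```

Looping over seq_len(nrow(combinations)) with lapply(), instead of apply() over the rows, would therefore give a list regardless of how many window sizes are supplied.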

R input names from a character vector to a function formula statement

I am well aware there are much better solutions for the particular problem described below (e.g., cor and rcorr in Hmisc, as discussed here). This is just an illustration for a more general R issue I just can't figure out: passing multiple variable names from a character vector to a formula statement within a function.
Assume there is a dataset consisting of numeric variables.
vect.a <- rnorm(n = 20, mean = 0, sd = 1)
vect.b <- rnorm(n = 20, mean = 0, sd = 1)
vect.c <- rnorm(n = 20, mean = 0, sd = 1)
vect.d <- rnorm(n = 20, mean = 0, sd = 1)
dataset <- data.frame(vect.a, vect.b, vect.c, vect.d)
names(dataset) <- c("var1", "var2", "var3", "var4")
A correlation test has to be performed for each possible pair of variables within this data set, using a formula statement of the type ~ VarA + VarB within the function cor.test:
for (i in 1:(length(names(dataset))-1)){
  for (j in (i+1):length(names(dataset))) {
    cor.test(~ names(dataset)[i] + names(dataset)[j], data = "dataset")
  }
}
which returns an error: invalid 'envir' argument of type 'character'
I assume a character string is incompatible with the formula statement but which class would be compatible with it? If the entire approach is wrong, please explain why and provide or point to an alternative solution. If the approach is somehow "ugly" or "non-R", please explain why.
You get that formula by using as.formula with a string argument.
> x <- c('x1','x2','x3')
> f <- as.formula(paste('~ ', x[1], ' + ', x[2]))
> f
~x1 + x2
> class(f)
[1] "formula"
There is another issue here: data="dataset" should be data=dataset, because data must be the data frame itself, not a string containing its name.
> dataset <- data.frame(a=1:5, b=sample(1:5))
> cor.test(~ a + b, data="dataset")
Error in eval(predvars, data, env) :
invalid 'envir' argument of type 'character'
> cor.test(~ a + b, data=dataset)
Pearson's product-moment correlation
...
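Putting both fixes together, the pairwise loop could look like the sketch below. It assumes a data frame shaped like the one in the question; the names pairs, tests and estimates are illustrative only.

```r
set.seed(1)
dataset <- data.frame(var1 = rnorm(20), var2 = rnorm(20),
                      var3 = rnorm(20), var4 = rnorm(20))

# All unordered pairs of column names, as a list of length-2 character vectors
pairs <- combn(names(dataset), 2, simplify = FALSE)

tests <- lapply(pairs, function(p) {
  f <- as.formula(paste("~", p[1], "+", p[2]))  # build the formula from strings
  cor.test(f, data = dataset)                   # pass the data frame, not its name
})
names(tests) <- vapply(pairs, paste, character(1), collapse = "_")

# Extract the correlation estimate from each test
estimates <- vapply(tests, function(t) unname(t$estimate), numeric(1))
```

Using combn() on the names replaces the nested index loops, and storing the results in a list keeps them, whereas the original loop discarded each cor.test result.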

Error using the "prob" package in an R function

I'm attempting to write a function that uses the prob package to compute conditional probabilities. When using the function I continue to encounter the same error, which states an object within the function cannot be found.
Below is a reproducible example in which I compute a conditional probability without the function and then attempt to use the function to produce the same result. I'm not sure if the error is due to limitations with the prob package or an error on my part.
# Load prob package
library(prob)
# Set seed for reproducibility
set.seed(30)
# Sample data frame
sampledata <- data.frame(
X <- sample(1:10),
Y <- sample(c(-1, 0, 1), 10, replace=TRUE))
# Set probability space
S <- probspace(sampledata)
# Subset Y between -1 and 0
A <- subset(S, Y>=-1 & Y<=0)
# Subset X greater than 6
B <- subset(S, X>6)
# Compute conditional probability
P <- prob(A, given=B)
The above code produces the following probability:
> P
[1] 0.25
Attempting to write a function to calculate the same probability:
# Create function with data frame, variables, and conditional inputs
prob.function <- function(df, variable1, variable2, state1, state2, cond1){
  s <- probspace(df)
  a <- subset(s, variable1>=state1 & variable1<=state2)
  b <- subset(s, variable2>cond1)
  p <- prob(a, given=b)
  return(p)
}
# Demonstrate the function
test <- prob.function(sampledata, Y, X, -1, 0, 6)
This function gives the following error:
Error in eval(expr, envir, enclos) : object 'b' not found
Any help you can provide would be great.
Thanks!
This looks like a bug in prob.
When I run this in vanilla R, I get the same error. But when I create an object b in my workspace, the error disappears:
> print(b)
Error in print(b) : object 'b' not found
> test <- prob.function(sampledata, Y, X, -1, 0, 6)
Error in eval(expr, envir, enclos) : object 'b' not found
>
> b <- "dummy variable"
> print(b)
[1] "dummy variable"
> test <- prob.function(sampledata, Y, X, -1, 0, 6)
> test
[1] 0.25
>
As a temporary workaround, just create a dummy b in your current environment.
As for the bug, if you look at the source for prob.default (which in the example above is what prob(a, given=b) is eventually calling), you'll see the following section:
if (missing(given)) {
< cropped >
}
else {
f <- substitute(given)
g <- eval(f, x) <~~~~
if (!is.logical(g)) { <~~~~
if (!is.data.frame(given)) <~~~~
stop("'given' must be data.frame or evaluate to logical")
B <- given
}
...
< cropped >
}
It jumps from g to given, perhaps inadvertently. I would reach out to the package maintainer, as this may be an oversight.
I don't think this is a bug in package prob.
First, you should create your sampledata as
sampledata <- data.frame(
X = sample(1:10),
Y = sample(c(-1, 0, 1), 10, replace=TRUE))
Your original code uses <- inside data.frame(), which creates not only this data frame but also variables X and Y in the global environment, and those are what actually get used later when you call your function.
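The difference between the two forms can be checked in base R without the prob package. This is a small sketch with made-up column names P and Q:

```r
# `<-` inside the call assigns P and Q in the calling environment
# and leaves the columns with mangled, deparsed names
bad <- data.frame(P <- 1:3, Q <- 4:6)

# `=` only names the columns; nothing is created outside the data frame
good <- data.frame(P = 1:3, Q = 4:6)

side_effect <- exists("P")   # TRUE: P leaked out of the first data.frame() call
```

Printing names(bad) and names(good) side by side shows why code that refers to the columns by name behaves differently for the two data frames.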
Second, you shouldn't call subset() inside a function. Use bracket subsetting instead:
prob.function <- function(df, variable1, variable2, state1, state2, cond1){
  s <- probspace(df)
  a <- s[s[[variable1]]>=state1 & s[[variable1]]<=state2, ]
  b <- s[s[[variable2]]>cond1, ]
  p <- prob(a, given=b)
  return(p)
}
And pass variable1 and variable2 as strings:
test <- prob.function(sampledata, "Y", "X", -1, 0, 6)
Now you have test==0.25, and no error.
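The same string-based subsetting can be tried without the prob package. The sketch below is only an approximation of the idea: it assumes every row is equally likely (so a conditional probability reduces to a ratio of row counts), and the function cond_share and the data are hypothetical.

```r
# Made-up data: 10 equally likely outcomes
sampledata <- data.frame(
  X = 1:10,
  Y = c(-1, 0, 1, 1, 0, -1, 1, 1, 0, 1)
)

# Column names arrive as strings and are resolved with [[ ]], so no
# non-standard evaluation is involved and nothing depends on global variables
cond_share <- function(df, variable1, variable2, state1, state2, cond1) {
  in_a <- df[[variable1]] >= state1 & df[[variable1]] <= state2
  in_b <- df[[variable2]] > cond1
  sum(in_a & in_b) / sum(in_b)   # share of rows in A among the rows in B
}

test <- cond_share(sampledata, "Y", "X", -1, 0, 6)
```

With this data, the rows with X > 6 have Y values 1, 1, 0 and 1, so exactly one of the four lies between -1 and 0.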
References for what is going on:
http://adv-r.had.co.nz/Computing-on-the-language.html#non-standard-evaluation-in-subset
Assignment operators in R: '=' and '<-'
Why is `[` better than `subset`?
