Using ddply inside a function - r

I'm trying to write a function that uses ddply inside it. However, I can't get it to work. This is a dummy example reproducing what I get. Does this have anything to do with this bug?
library(plyr)    # provides ddply
library(ggplot2) # provides the diamonds data set
data(diamonds)
foo <- function(data, fac1, fac2, bar) {
res <- ddply(data, .(fac1, fac2), mean(bar))
res
}
foo(diamonds, "color", "cut", "price")

I don't believe this is a bug. ddply expects a function as its third argument, and mean(bar) is not one: it is a call that R evaluates immediately. You need to write a complete function that calculates the mean you'd like:
foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, c(fac1, fac2), function(x, ind) {
    # [[ ]] extracts the column as a plain vector
    mean(x[[ind]])
  }, bar)
  res
}
Also, you shouldn't pass strings to .(), so I changed that to c(), so that you can pass the function arguments directly to ddply.

There are quite a few things wrong with your code, but the main issue is: you are passing column names as character strings.
Just doing a 'find-and-replace' with your parameters within the function yields:
res <- ddply(diamonds, .("color", "cut"), mean("price"))
If you understand how ddply works (I kind of doubt this, given the rest of the code), you will understand that this is not supposed to work. Ignoring the error in the last part (the function), it should read as follows. Notice the lack of quotes: the .() notation is nothing more than plyr's way of providing the quotes.
res <- ddply(diamonds, .(color, cut), mean(price))
Fortunately, ddply also supports passing its second argument as a character vector, i.e. the names of the columns, so (once again disregarding issues with the last parameter), this should become:
foo <- function(data, facs, bar) {
res <- ddply(data, facs, mean(bar))
res
}
foo(diamonds, c("color", "cut"), "price")
Finally: the function you pass to ddply should take a data.frame as its first argument. On each call it receives the part of the data.frame you passed along (diamonds) that matches the current values of color and cut. Neither mean("price") nor mean(price) is such a function. If you insist on using ddply, here's what you need to do:
foo <- function(data, facs, bar) {
  # [[ ]] returns a plain vector for data frames and tibbles alike
  res <- ddply(data, facs, function(dfr, colnm) { mean(dfr[[colnm]]) }, bar)
  res
}
foo(diamonds, c("color", "cut"), "price")
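For comparison, the same idea (column names passed as character vectors) can be sketched without plyr. This is a base R swap-in using aggregate(), not the ddply approach above. The name foo_base is hypothetical, and mtcars stands in for diamonds so that nothing beyond base R is needed.

```r
# Hypothetical base R counterpart of foo(): group means via aggregate().
# facs: character vector of grouping columns; bar: name of the column to average.
foo_base <- function(data, facs, bar) {
  # data[bar] and data[facs] index by name, so plain strings work here
  aggregate(data[bar], by = data[facs], FUN = mean)
}

# mtcars is an assumption made for this sketch (it ships with base R)
res <- foo_base(mtcars, c("cyl", "gear"), "mpg")
head(res)
```

The result has one row per observed combination of the grouping columns, with the mean stored under the original column name.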

Related

Using non-standard evaluation to call an argument in a nested function

I am trying to take an argument from a simple function "adder" and then use a loop to look at the effect of incrementing that argument.
I know there must be better approaches, such as building a single function that makes a longer data frame or maybe a nested loop without the second function... so I welcome those!
But what I'm more specifically interested in is how to quote(?) and then parse(?) the argument, here called either "a" or "b" (the function would declare it as "arg_to_change"), inside the new function, here called "change_of_adder_arguments".
adder <- function(a=1,b=2){
data.frame(t=1:100) %>% mutate(x=a*t, y=b*2)
}
change_of_adder_arguments <- function(arg_to_change) {
output <- list()
arg_to_change_enquo <- enquo(arg_to_change)
for (i in 1:5) {
output[[i]] <- ggplot(adder(!!arg_to_change_enquo := i), aes(x, y)) + geom_point()
}
return(output)
}
change_of_adder_arguments(a)
change_of_adder_arguments(b)
Error: Problem with mutate() input x.
x could not find function ":="
i Input x is a * t.
The nail in the coffin seems to be using arg_to_change_enquo on the LHS of the assignment operator. I know there are many articles here about non-standard evaluation, but I have tried quote, enquo, bquote, parse/eval, sym, substitute, !!, {{}}, =, :=, assign and combinations of all these with no luck. My instinct is that the answer lies in specifying which environment to use? If anybody knows of any good references that "ELI5" environments, I would greatly appreciate it. Thanks!
You can use do.call and pass the arguments to change as a list.
library(ggplot2)
change_of_adder_arguments <- function(arg_to_change) {
output <- vector('list', 5)
arg_to_change_string <- deparse(substitute(arg_to_change))
for (i in 1:5) {
output[[i]] <- ggplot(do.call(adder, setNames(as.list(i),
arg_to_change_string)), aes(x, y)) + geom_point()
}
return(output)
}
plot <- change_of_adder_arguments(b)
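The core of this answer, stripped of plotting, is building a one-element named list and handing it to do.call. Here is a minimal sketch under some assumptions: the argument name is passed as a string rather than captured with substitute(), adder is rewritten in base R so that no packages are required, and vary_argument is a hypothetical helper name.

```r
# Base R stand-in for the adder() from the question (no dplyr needed)
adder <- function(a = 1, b = 2) {
  t <- 1:100
  data.frame(t = t, x = a * t, y = b * 2)
}

# Hypothetical helper: call adder() once per value, setting only the
# argument named by arg_name and leaving the other at its default.
vary_argument <- function(arg_name, values = 1:5) {
  lapply(values, function(v) {
    # setNames(list(v), "b") is list(b = v), so this runs adder(b = v)
    do.call(adder, setNames(list(v), arg_name))
  })
}

res_b <- vary_argument("b")
res_b[[3]]$y[1]   # adder(b = 3) gives y = 3 * 2
```

Passing the name as a string sidesteps non-standard evaluation entirely; deparse(substitute(arg_to_change)) in the answer above only exists to turn the bare symbol into that string.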

How can I create a function using variables in a data frame

I'm sure the question is a bit dumb (sorry)... I'm trying to create a function using different variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+0.016031*Turb_in -0.026006*nm250_i +0.093138*nm400_o - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem comes when I try to run my function on a data frame (with the same variable names). It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But the column is actually there, as are all the other variables.
I think I misplaced or missed something in the function that would let it take the variables from the dataset. I have searched a lot about this but have not found any answer...
No dumb questions!
I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
You have run into a standard problem in R: standard evaluation (SE) versus non-standard evaluation (NSE). If you need more detail, you can have a look at this blog post I wrote.
I think the most convenient way to write a function using variables is to use the variable names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. Inside a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (the global environment in this example), and then in the attached packages.
In myFun, R expects vectors. If you give a column name instead, you will get an error. What if you want to pass a column name? You must tell R that the name should be looked up as a column of a data frame. You can, for instance, do something like this:
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
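A short usage sketch of this string-based version, assuming the data frames below (both made up for illustration). It uses [[ ]] instead of [, ], which behaves the same on a plain data frame but always returns a vector.

```r
# Column names arrive as strings and are looked up inside df
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

# Default names match the columns, so no extra arguments are needed
myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)
res_default <- myFun(myData)

# Hypothetical data frame with different column names
other <- data.frame(p = 1:3, q = 4:6, r = c(5, 7, 9))
res_other <- myFun(other, col1 = "p", col2 = "q", col3 = "r")
res_other   # (1+4)/5, (2+5)/7, (3+6)/9
```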
You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at this package.
I like Muon's answer, but I couldn't get it to work if there are columns in the data.frame not in the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
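If you would rather keep do.call, one possible workaround (a sketch, not something taken from the answers above) is to subset the data frame to the columns named in the function's formal arguments before unpacking it. It assumes every argument of the function has a column of the same name.

```r
myFun <- function(x, y, z) {
  (x + y) / z
}

myData <- data.frame(x = 1:5,
                     y = (1:5) * pi,
                     z = 11:15,
                     a = 6:10)   # extra column not used by myFun

# names(formals(myFun)) is c("x", "y", "z"); keep only those columns
wanted <- names(formals(myFun))
res_docall <- do.call(myFun, myData[wanted])

# Same computation through with(), for comparison
res_with <- with(myData, myFun(x, y, z))
all.equal(res_docall, res_with)
```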

R mapply with named arguments

One fear I have when using mapply in R is that I may mess up the order of arguments & hence unconsciously generate garbage results.
mydata<-data.frame(Temperature=foobar,Pressure=foobar2)
myfunction<-function(P,T)
{
....
}
mapply(FUN = myfunction,mydata$Temperature,mydata$Pressure)
Is there a way to utilize named arguments to avoid this sort of error via mapply?
If we need to match the function arguments, name the arguments passed to Map/mapply after the arguments of the function:
mapply(FUN = myfunction,T=mydata$Temperature,P=mydata$Pressure)
We can apply the function directly instead of mapply though (based on the example provided below in my post)
do.call(myfunction, unname(mydata[2:1]))
data
mydata <- data.frame(Temperature = 1:5, Pressure = 16:20)
myfunction <- function(P, T) {P*5 + T*10}
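To see why the names matter, here is a small sketch using the data and function above. It compares the positional call from the question with the named call. The variable names positional and named are made up for illustration.

```r
mydata <- data.frame(Temperature = 1:5, Pressure = 16:20)
myfunction <- function(P, T) {P*5 + T*10}

# Positional: Temperature is silently matched to P and Pressure to T
positional <- mapply(FUN = myfunction, mydata$Temperature, mydata$Pressure)

# Named: each vector goes to the argument it is named after
named <- mapply(FUN = myfunction, T = mydata$Temperature, P = mydata$Pressure)

positional[1]   # 1*5 + 16*10
named[1]        # 16*5 + 1*10
```

Both calls run without any warning, which is exactly the risk described in the question: only the named call computes what was intended.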

pass a character vector as the funs argument for summarize_at

I'm working on a function that takes an argument funs that is a string of functions to be applied to a set of variables vars. The simplest way to do this seemed to be to use dplyr's summarize_at and the SE version of funs function.
This works with the functions that are built-in R but doesn't seem to work with user-defined functions. It reports an error that it can't find the user-defined function. However, summarize_at works when done "manually."
This function is part of a larger function that produces a box for a Shiny Dashboard. I'd prefer not to have (and have to maintain) a different Shiny module for each type of box (each function and function argument combination).
A minimal reproducible example is below:
# function to compute summary stat
compute_box_value <- function(data, vars, funs) {
f = funs_(funs)
result <- data %>%
summarize_at(.cols = vars, .funs = f)
}
# simple user defined function that gets count of rows with certain values of x
equals <- function(x, test_value) {
sum(x %in% test_value)
}
x <- data.frame(value = sample(1:5, 10, TRUE))
vars <- c("value")
# this works
print(compute_box_value(x, vars, "mean(., na.rm = TRUE)"))
# this works
summarize_at(x, vars, .funs = "equals", test_value = 1)
# this doesn't work (error: couldn't find function equals)
print(compute_box_value(x, vars, "equals(., test_value = 1)"))
You need to use the formula (~) instead of string:
print(compute_box_value(x, vars, ~equals(., test_value = 1)))
Which gives:
# value
#1 3
From the documentation:
It’s best to use a formula because a formula captures both the
expression to evaluate and the environment where the evaluation
occurs. This is important if the expression is a mixture of variables
in a data frame and objects in the local environment:
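The point about environments can be illustrated in base R alone. The sketch below is not dplyr's internal mechanism. It only shows that a formula remembers where it was created, while a string does not. The names make_spec and local_target are hypothetical.

```r
equals <- function(x, test_value) {
  sum(x %in% test_value)
}

make_spec <- function() {
  local_target <- 1   # exists only inside make_spec()
  ~ equals(., test_value = local_target)
}

spec <- make_spec()
env <- environment(spec)   # the environment captured by the formula
values <- c(1, 3, 1, 5)

# Evaluate the right-hand side with `.` bound to the data; every other
# name is looked up starting from the captured environment.
from_formula <- eval(spec[[2]], list(. = values), env)
from_formula

# The same expression as a string carries no environment, so this fails
# unless local_target happens to exist where eval() is called.
from_string <- tryCatch(
  eval(parse(text = "equals(., test_value = local_target)"), list(. = values)),
  error = function(e) conditionMessage(e)
)
```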

R: Convert variable name to string in sapply

I have found that to convert a variable name into a string I would use deparse(substitute(x)) where x is my variable name. But what if I want to do this in an sapply function call?
sapply( myDF, function(x) { hist( x, main=VariableNameAsString ) } )
When I use deparse(substitute(x)), I get something like X[[1L]] as the title. I would like to have the actual variable name. Any help would be appreciated.
David
If you need the names, then iterate over the names, not the values:
sapply(names(myDF), function(nm) hist(myDF[[nm]], main=nm))
Alternatively, iterate over both names and values at the same time using mapply or Map:
Map(function(name, values) hist(values, main=name),
names(myDF), myDF)
For the most part, you shouldn't be using deparse and substitute unless you are doing metaprogramming (if you don't know what it is, you're not doing it).
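A plot-free sketch of the difference, assuming a small made-up data frame: inside sapply, substitute() only sees the placeholder that sapply uses internally, whereas iterating over names(myDF) keeps each label together with its values.

```r
# Hypothetical data frame for illustration
myDF <- data.frame(height = c(1.6, 1.7, 1.8), weight = c(60, 70, 80))

# substitute() captures sapply's internal expression, not the column name
placeholder <- sapply(myDF, function(x) deparse(substitute(x)))
placeholder

# Iterating over the names gives access to both the name and the values
labels <- sapply(names(myDF), function(nm) {
  sprintf("%s (mean %.1f)", nm, mean(myDF[[nm]]))
})
labels
```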
Here is a piece of code that worked for me:
deparse(substitute(variable))
