I'm trying to create an enclosing function which will:
process some data,
cat() results of that data,
request user input (i.e., via readline()) based on the results of that cat(),
then return a function where one of the argument defaults of the returned function is the value inputted by readline().
Additionally, I'd like the remaining default values of the arguments of the returned function to be user-interpretable. That is, I don't want the defaults to be names of variables hidden in the parent environment (this stipulation precludes simple argument passing). Specifically, I'd like args() to return actual evaluated numbers, etc.
I've cooked up this solution below, but it feels clunky and awkward. Is there a more elegant way of approaching this?
top <- function(year=1990, n.times=NULL){
if(is.null(n.times)){
###in the real function, data would be processed here,
###results would be printed via cat(), and
###the user would be prompted to enter values that reflect a decision
###made from the processed data
n.times <- as.numeric(readline("how many times?"))
}
out <- function(year, n.times){
###in the real function, this is where most of the work would happen
rep(year, n.times)
}
###this entire section below is clunky.
if( !identical( names(formals()), names(formals(out)) ) ){
stop("internal error: mismatching formals")
}
pass.formals <- setdiff( names(formals()), "n.times")
formals(out)[pass.formals] <- formals()[pass.formals]
formals(out)$n.times <- n.times
out
}
x <- top()
x
It looks generally OK to me; there are only a few things I'd do differently.
Is there any reason that the parameters of top() and out() seem to correspond in some
way? That is, do you need the identical() check? I wasn't sure, so I took it out. This seems to do what you
want, and is slightly shorter:
top <- function(year=1990, n.times=NULL){
if (is.null(n.times)) {
n.times <- as.numeric(readline("how many times?"))
}
out <- function(year, n.times) {
rep(year, n.times)
}
out.formals = formals(out)
out.formals['n.times'] = n.times
formals(out) = out.formals
out
}
edit: And if you want to use super R magic, you can write
top <- function(year=1990, n.times=NULL){
if (is.null(n.times)) {
n.times <- as.numeric(readline("how many times?"))
}
`formals<-`(
function() {
rep(year, n.times)
},
value=list(year=alist(x=)$x, n.times=n.times)
)
}
edit: You could also use something like DWin suggested (though I couldn't
get it to work with substitute):
out = function(year, n.times) {
rep(year, n.times)
}
`formals<-`(out, v=`[[<-`(formals(out), 'n.times', n.times))
Or using bquote:
eval(bquote(
function(year, n.times=.(n.times)) {
rep(year, n.times)
}
)[-4L])
You have so many options.
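For comparison, here is a minimal non-interactive sketch of the same idea. The name make_out is hypothetical, and n.times is passed in directly rather than read with readline(); the only point is that assigning a list of already-evaluated values to formals() makes the defaults show up as plain numbers.

```r
# Hypothetical, non-interactive variant: n.times is supplied by the caller
# instead of being read with readline().
make_out <- function(year = 1990, n.times = 3) {
  out <- function(year, n.times) rep(year, n.times)
  # Store the evaluated values (not variable names) as the defaults.
  formals(out) <- list(year = year, n.times = n.times)
  out
}

x <- make_out(n.times = 2)
formals(x)$n.times  # the stored default is the number 2
x()                 # uses both stored defaults: 1990 repeated twice
```

Printing x should then show the defaults as literal values rather than as names that refer to the enclosing environment.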
I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { # df is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing on additional arguments to other functions, but I do not see both of these events happening here. Do let me know if I am missing out something or if you have a better solution for my code.
The dots don't do much in this case, but group_modify() requires them for your function to work. The function you pass is run through a helper, as_group_map_function(), which checks that it has at least two arguments or, failing that, a ... argument:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why this is done, but from a quick look at the source code there is a point where two arguments and ... are passed to the 'converted' version of your function (technically no conversion happens here; it only takes place if you pass a formula instead of a function). My best guess is that this is the reason: the function needs some way of dealing with at least two arguments, and if it doesn't need them, it needs ... to 'absorb' them; otherwise the call would fail.
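The effect can be reproduced in base R without dplyr. In the sketch below, call_with_key is a hypothetical stand-in for the internal call and is not dplyr's actual code; it assumes only that the caller always passes two positional arguments (the rows of one group and that group's key).

```r
# Two candidate functions: one with a single argument, one that adds `...`.
f_one  <- function(df) nrow(df)
f_dots <- function(df, ...) nrow(df)

# Hypothetical stand-in for the internal call: it always passes two
# positional arguments (the group's rows and the group's key).
call_with_key <- function(.f, rows, key) .f(rows, key)

rows <- data.frame(userid = c(1, 2, 2))
key  <- data.frame(shop = "A")

call_with_key(f_dots, rows, key)   # works: `...` absorbs `key`

# The one-argument version fails because it has nowhere to put `key`.
res <- tryCatch(call_with_key(f_one, rows, key),
                error = function(e) conditionMessage(e))
res                                # the error message from the failed call
```

If the grouping key is actually useful, declaring it explicitly, as in function(df, key), should satisfy the same check without needing the dots.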
I'm trying to write an R function that can take either a quoted or unquoted data frame variable name, or a vector of variable names, as a parameter. The problem is that when the user passes unquoted data frame column names as function arguments, the result is an "object not found" error. How can I check whether the variable name is quoted?
I've tried exists(), missing(), substitute() but none of them works for all combinations.
# consider printfun to be something I can't change;
# I wrote it just for demonstration purposes
printfun <- function(df, ...){
for(item in list(...)){
print(df[item])
}
}
myfun<-function(df,x){
#should check if input is quoted or unquoted here
# substitute works for some cases not all (see below)
new_args<-c(substitute(df),substitute(x))
do.call(printfun,new_args)
}
#sample data
df<-data.frame(abc=1,dfg=2)
#these are working
myfun(df,c("abc"))
myfun(df,c("abc","dfg"))
myfun(df,"abc")
#these are failing with object not found
myfun(df,abc)
myfun(df,c(abc))
I can differentiate myfun(df,abc) from myfun(df,"abc") with a tryCatch block, although this does not seem very neat.
But I haven't found any way to differentiate the second argument in myfun(df,c(abc)) from the one in myfun(df,abc).
Alternatively, can I somehow check whether the error comes from missing quotes? I guess the object not found error might also arise from something else (e.g. the data frame name) being mistyped.
This appears to work for all your cases:
myfun<-function(df,x){
sx <- substitute(x)
a <- tryCatch(is.character(x), error = function(e) FALSE)
if (a) {
new_x <- x
} else {
cx <- as.character(sx)
if (is.name(sx)) {
new_x <- cx
} else if (is.call(sx) && cx[1] == "c") {
new_x <- cx[-1]
} else {
stop("Invalid x")
}
}
new_args <- c(substitute(df), as.list(new_x))
do.call(printfun, new_args)
}
However, I feel there is something strange about what you are trying to do.
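A variant that avoids tryCatch() is to inspect only the unevaluated argument. The helper name as_names below is hypothetical, and the sketch assumes the caller always writes the column names literally in the call: a variable that holds the names (for example cols <- "abc"; as_names(cols)) would come back as "cols".

```r
# Hypothetical helper: turn abc, "abc", c(abc, dfg) or c("abc", "dfg")
# into a character vector without evaluating the argument.
as_names <- function(x) {
  sx <- substitute(x)
  if (is.character(sx) || is.name(sx)) {
    return(as.character(sx))      # a single string or a bare name
  }
  if (is.call(sx) && identical(sx[[1]], quote(c))) {
    return(as.character(sx)[-1])  # drop the leading "c"
  }
  stop("Invalid x")
}

df <- data.frame(abc = 1, dfg = 2)
cols <- as_names(c(abc, dfg))
df[cols]  # selects both columns
```

Because nothing is evaluated, a mistyped column name surfaces later as an indexing error on the data frame rather than as "object not found".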
I am currently writing a function that will take an equation as an argument. The function expects the variables in the equation to be among the column names of data.
library(stringr) # provides str_detect()
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
EQU.sub <- deparse(substitute(equ))
#Check if colnames are used
for(i in 1:length(colnames(data))) {
if(str_detect(string = EQU.sub, pattern = colnames(data)[i])) {
#if used, create variable with its name.
assign(x = colnames(data)[i],
value = eval(parse(text = paste("data$",
colnames(data)[i],
sep = ""))))
} else {
warning(paste(colnames(data)[i], "was not used in EQU"))
}
}
}
data$new.value <- eval(equ)
output <- new_function(new.equ = equ)
return(data)
}
my_function(data = mydata, equ = x+(y^2))
I know what you may be thinking, this is a big workaround for just doing
mydata$x+(mydata$y^2)
THE ISSUE
The issue is that I want to pass my input of equ into a new function.
new_function <- function(new.equ) {
string <- deparse(substitute(new.equ))
#does some stuff....
return(output) }
However, when moving from the execution environment of my_function to that of new_function, calling deparse(substitute(new.equ)) returns "equ" instead of "x+(y^2)".
I know that substitute() returns what was explicitly passed as the argument (here, equ), but I am wondering if there is a way for new_function() to see into the execution environment of my_function() so I can get the desired output of "x+(y^2)".
UPDATE
After thinking about it, I could change what I pass to new.equ to the deparsed version of equ as follows...
output <- new_function(new.equ = EQU.sub)
new_function <- function(new.equ) {
#given that these variables are available
value <- parse(text = new.equ)
#does some stuff....
return(output) }
but my original question still stands because I'm still new to R environments. Is there a more elegant way to go through execution environments?
Using non-standard evaluation like this can be pretty messy. Rather than trying to capture expressions from promises passed to functions, it's much safer just to pass a formula. For example:
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
stopifnot(inherits(equ, "formula"))
eval(equ[[2]], data)
}
new_function <- function(newequ) {
my_function(mydata, newequ)
}
my_function(mydata, ~x+(y^2))
new_function(~x+(y^2))
Or give your function an extra parameter where you can pass an expression instead so you don't have to rely on a promise. This makes it much easier to write other functions that can call your function.
my_function <- function(data, equ, .equ=substitute(equ)) {
eval(.equ, data)
}
new_function <- function(newequ) {
equ <- substitute(newequ)
my_function(mydata, .equ=equ)
}
my_function(mydata, x+(y^2))
new_function(x+(y^2))
my_function(mydata, .equ=quote(x+(y^2)))
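To see the behaviour described in the question in isolation, here is a small sketch. The names inner, outer_plain and outer_forward are hypothetical; the second wrapper shows one possible way to forward the caller's expression by building the call with substitute() before evaluating it.

```r
inner <- function(new.equ) deparse(substitute(new.equ))

# Passing the argument along by name: inner() only sees the symbol `equ`.
outer_plain <- function(equ) inner(equ)

# Building the call with substitute() first, so that inner() receives
# the caller's original expression instead of the symbol.
outer_forward <- function(equ) eval(substitute(inner(equ)))

outer_plain(x + (y^2))    # "equ"
outer_forward(x + (y^2))  # "x + (y^2)"
```

Neither call evaluates the expression, so x and y do not need to exist. This forwarding trick only reaches one level up, which is part of why passing a formula or an explicit expression argument tends to be easier to reason about.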
I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (e.g. numeric, factor strings, etc.). I am using a simplified version with vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
for (j in 1:20) {
trialvec[i*j] <- segmentvalues[i]
}
}
#vector categorisation
vcategorise <- function(categories, data) {
#categorises a vector into a list of vectors
#requires plyr and dplyr
assignment <- list()
catlength <- length(categories)
for (i in 1:length(catlength)){
for (j in 1:length(data)) {
if (any(contains(categories[i], ignore.case = TRUE,
as.vector(data[j])))) {
assignment[[i]][j] <- data[j]
}
}
}
return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in `*tmp*`[[i]] : subscript out of bounds
You are indexing assignment (which is OK, even at an index that doesn't have a value, since that just gives you NULL) and then indexing into what you get there, which won't work if you get NULL. And NULL is what you will get, because you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
assignment <- vector("list", length = length(data) * length(categories))
n <- length(data)
for (i in 1:length(categories)){
for (j in 1:length(data)) {
assignment[(i-1)*n + j] <-
if (any(contains(categories[i],
ignore.case = TRUE,
as.vector(data[j])))) {
data[j]
} else {
NA
}
}
}
dim(assignment) <- c(length(data), length(categories))
assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
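If the goal is simply one vector per category, a loop-free sketch in base R might look like the following. The name vcategorise_list is hypothetical, and the sketch assumes that "categorising" means keeping the elements of data that contain the category text, ignoring case.

```r
# Hypothetical helper: returns a named list with one character vector
# per category, holding the elements of `data` that contain it.
vcategorise_list <- function(categories, data) {
  matches <- lapply(categories, function(category) {
    # Case-insensitive literal match of the category inside each element.
    hit <- grepl(tolower(category), tolower(data), fixed = TRUE)
    data[hit]
  })
  setNames(matches, categories)
}

segmentvalues <- c("1_P", "2_B", "3_R")
trialvec <- rep(segmentvalues, each = 2)
result <- vcategorise_list(segmentvalues, trialvec)
result[["2_B"]]  # the two elements containing "2_B"
```

Because lapply() builds the list, nothing has to be preallocated, and categories with no matches end up as empty character vectors.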
I am working on a function that would behave similarly to Reduce, where you pass in a function and dispatch it over the arguments. Here is a simple example to demonstrate what I am working on.
fun <- function(f){
switch(f,
`+` = "addition",
stop("undefined")
)
}
fun(`+`)
Now this clearly won't work as it stands because switch requires a character or numeric EXPR. What I don't know how to do is convert the function f that is passed to fun to a string.
One approach is to capture the input and deparse the call.
fun <- function(f){
switch(deparse(substitute(f)),
`+` = "addition",
stop("undefined")
)
}
fun(`+`)
#[1] "addition"
Going off of Pierre's comment above, one can use identical to test whether two functions are the same. This doesn't work well with switch, but an if/else tree is still relatively simple:
fun <- function(f) {
if (identical(f, `+`)) {
return('addition')
} else if (identical(f, mean)) {
return('mean')
} else {
return('undefined')
}
}
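The if/else tree can also be written as a lookup over a named list of functions. This is only a sketch: known and fun_lookup are hypothetical names, and the list holds just the two functions from the example above.

```r
# Hypothetical lookup table: label -> function object.
known <- list(addition = `+`, mean = mean)

fun_lookup <- function(f) {
  # Compare f with each known function object.
  hit <- vapply(known, function(g) identical(f, g), logical(1))
  if (any(hit)) names(known)[hit][1] else "undefined"
}

plus <- `+`
fun_lookup(`+`)   # "addition"
fun_lookup(plus)  # "addition": matching is by object, not by name
fun_lookup(sum)   # "undefined"
```

Unlike deparse(substitute(f)), which depends on how the caller spelled the argument, this compares the function objects themselves, so an alias such as plus is still recognised.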