dataset and UDFs using sapply

dataset and UDFs using sapply - r

I have a dataset (employee) created from a csv, that displays data as given below;
employee[1,]
age name designation
28 Tony Manager
I have created a function that returns a decision based on an input parameter;
loan_eligible_decision <- function(p)
{
if(p$designation == "manager")
{
decision <- "yes"
}
return(decision)
}
when the function is called directly it works fine and gives the result below;
loan_eligible_decision(employee[1,])
gives me output: yes
However when called within an sapply family it throws a reference error;
sapply(data.frame(employee[1,]),loan_eligible_decision(x))
Error in p$marital : $ operator is invalid for atomic vectors
Any suggestions as to what could be a possible workaround/solution?
I have also tried replacing the if condition with;
if(p[[designation]] == "manager")
and called upon the function like so;
sapply(employee['1',],loan_eligible_decision(x))
The error:
Error in loan_eligible_decision(x) : object 'designation' not found

You are calling the function incorrectly. It should be
myfun <- function(x) x^2
sapply(xy, FUN = myfun)
In any case, try inserting a browser() call within the function and inspect what is going on. See ?browser for more info.
myfun <- function(x) {
browser()
x^2
}

Related

Using group_modify to apply function to grouped dataframe

I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { #x is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing on additional arguments to other functions, but I do not see both of these events happening here. Do let me know if I am missing out something or if you have a better solution for my code.

The dots don't do much in that case, but there is a condition that requires them in your functions case for group_modify()to work. The function you are passing is getting converted using a helper function as_group_map_function(). This function checks if the function has more than two arguments and if not it should have ... to pass:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why it is done, but based on a quick peek on the source code it looks like there is a point where they pass two arguments and ... to the 'converted' version of your function (technically there is no conversion that happens – the conversion only takes place if you pass a formula instead of a function...), so my best guess is that is the reason: it needs to have some way of dealing with at least two arguments — if it doesn't need them, then it needs ... to 'absorb' them, otherwise it would fail.

How to check if a variable is passed to a function with or without quotes?

I'm trying to write a R function that can take either quoted or unquoted data frame variable name or vector of variable names as a parameter. The problem is when the user inserts unquoted dataframe column names as function input parameters it results in "object not found" error. How can I check if the variable name is quoted?
I've tried exists(), missing(), substitute() but none of them works for all combinations.
# considering this printfun as something I can't change
#made it just for demosnstration purposeses
printfun <- function(df, ...){
for(item in list(...)){
print(df[item])
}
}
myfun<-function(df,x){
#should check if input is quoted or unquoted here
# substitute works for some cases not all (see below)
new_args<-c(substitute(df),substitute(x))
do.call(printfun,new_args)
}
#sample data
df<-data.frame(abc=1,dfg=2)
#these are working
myfun(df,c("abc"))
myfun(df,c("abc","dfg"))
myfun(df,"abc")
#these are failing with object not found
myfun(df,abc)
myfun(df,c(abc))
I can differentiate the myfun(df,abc) and myfun(df,"abc") with a try Catch block. Although this does not seem very neat.
But I haven't found any way to differentiate the second argument in myfun(df,c(abc)) from myfun(df,abc) ?
Alternatively, can I somehow check if the error comes from missing quotes, as I guess the object not found error might arise also from something else (eg the dataframe name) being mistyped?

This appears to work for all your cases:
myfun<-function(df,x){
sx <- substitute(x)
a <- tryCatch(is.character(x), error = function(e) FALSE)
if (a) {
new_x <- x
} else {
cx <- as.character(sx)
if (is.name(sx)) {
new_x <- cx
} else if (is.call(sx) && cx[1] == "c") {
new_x <- cx[-1]
} else {
stop("Invalid x")
}
}
new_args <- c(substitute(df), as.list(new_x))
do.call(printfun, new_args)
}
However, I feel there is something strange about what you are trying to do.

R: eval parse function call not accessing correct environments

I'm trying to read a function call as a string and evaluate this function within another function. I'm using eval(parse(text = )) to evaluate the string. The function I'm calling in the string doesn't seem to have access to the environment in which it is nested. In the code below, my "isgreater" function finds the object y, defined in the global environment, but can't find the object x, defined within the function. Does anybody know why, and how to get around this? I have already tried adding the argument envir = .GlobalEnv to both of my evals, to no avail.
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval(y > x))
}
y <- 4
test <- function() {
x <- 3
return(eval(parse(text = str)))
}
test()
Error:
Error in eval(y > x) : object 'x' not found

Thanks to #MrFlick and #r2evans for their useful and thought-provoking comments. As far as a solution, I've found that this code works. x must be passed into the function and cannot be a default value. In the code below, my function generates a list of results with the x variable being changed within the function. If anyone knows why this is, I would love to know.
str <- "isgreater(y, x)"
isgreater <- function(y, x) {
return(eval(y > x))
}
y <- 50
test <- function() {
list <- list()
for(i in 1:100) {
x <- i
bool <- eval(parse(text = str))
list <- append(list, bool)
}
return(list)
}
test()
After considering the points made by #r2evans, I have elected to change my approach to the problem so that I do not arrive at this string-parsing step. Thanks a lot, everyone.

I offer the following code, not as a solution, but rather as an insight into how R "works". The code does things that are quite dangerous and should only be examined for its demonstration of how to assert a value for x. Unfortunately, that assertion does destroy the x-value of 3 inside the isgreater-function:
str <- "isgreater(y)"
isgreater <- function(y) {
return(eval( y > x ))
}
y <- 4
test <- function() {
environment(isgreater)$x <- 5
return(eval(parse(text = str) ))
}
test()
#[1] FALSE
The environment<- function is used in the R6 programming paradigm. Take a look at ?R6 if you are interested in working with a more object-oriented set of structures and syntax. (I will note that when I first ran your code, there was an object named x in my workspace and some of my efforts were able to succeed to the extent of not throwing an error, but they were finding that length-10000 vector and filling up my console with logical results until I escaped the console. Yet another argument for passing both x and y to isgreater.)

R: function assigned to object producing unexpected results

EDIT: I solved this one on my own. It had nothing to do with the function object assignment, it was that I was assigning the results to a vector "[]" rather then to a list "[[]]"
here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that choses which function does the filtering, and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
filter_method <- NULL
out_list <- list()
if(key_list$kind == "games") {
filter_method <- function(key_list) {
drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
}
} else if (key_list$kind == "skills") {
filter_method <- function(key_list) {
filter_by_val(data, "Skill", key_list)
}
}
# Separate data with keys
out_list["ELA"] <- filter_method(key_list[["ELA"]])
out_list["MATH"] <- filter_method(key_list[["MATH"]])
out_list["SCI"] <- filter_method(key_list[["SCI"]])
return (out_list)
}
I'm trying to filter data based on "skills" (ie. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this is the debug browser -- ie. filter_method(key_list[["ELA"]]) -- it works as expected, I get the data frame. But the values held in my output list: out_list[[ELA]] is the list of indexes. Any idea what's happening?

R data tables accessing columns by name

If I have a data table, foo, in R with a column named "date", I can get the vector of date values by the notation
foo[, date]
(Unlike data frames, date doesn't need to be in quotes).
How can this be done programmatically? That is, if I have a variable x whose value is the string "date", then how to I access the column of foo with that name?
Something that sort of works is to create a symbol:
sym <- as.name(x)
v <- foo[, eval(sym)]
...
As I say, that sort of works, but there is something not quite right about it. If that code is inside a function myFun in package myPackage, then it seems that it doesn't work if I explicitly use the package through:
myPackage::myFun(...)
I get an error message saying "undefined columns selected".
[edited] Some more details
Suppose I create a package called myPackage. This package has a single file with the following in it:
library(data.table)
#' export
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.Name(name1)
table1[, eval(sym)]
}
If I load that function using R Studio, then
myFun(tbl)
returns the first column of the data table tbl.
On the other hand, if I call
myPackage::myFun(tbl)
it doesn't work. It complains about
Error in .subset(x, j) : invalid subscript type 'builtin'
I'm just curious as to why myPackage:: would make this difference.

A quick way which points to a longer way is this:
subset(foo, TRUE, date)
The subset function accepts unquoted symbol/names for its 'subset' and 'select' arguments. (Its author, however, thinks this was a bad idea and suggests we use formulas instead.) This was the jumping off place for sections of Hadley Wickham's Advanced Programming webpages (and book).: http://adv-r.had.co.nz/Computing-on-the-language.html and http://adv-r.had.co.nz/Functional-programming.html . You can also look at the code for subset.data.frame:
> subset.data.frame
function (x, subset, select, drop = FALSE, ...)
{
r <- if (missing(subset))
rep_len(TRUE, nrow(x))
else {
e <- substitute(subset)
r <- eval(e, x, parent.frame())
if (!is.logical(r))
stop("'subset' must be logical")
r & !is.na(r)
}
vars <- if (missing(select))
TRUE
else {
nl <- as.list(seq_along(x))
names(nl) <- names(x)
eval(substitute(select), nl, parent.frame())
}
x[r, vars, drop = drop]
}
The problem with the use of "naked" expressions that get passed into functions is that their evaluation frame is sometimes not what is expected. R formulas, like other functions, carry a pointer to the environment in which they were defined.

I think the problem is that you've defined myFun in your global environment, so it only appeared to work.
I changed as.Name to as.name, and created a package with the following functions:
library(data.table)
myFun <- function(table1) {
names1 <- names(table1)
name1 <- names1[[1]]
sym <- as.name(name1)
table1[, eval(sym)]
}
myFun_mod <- function(dt) {
# dt[, eval(as.name(colnames(dt)[1]))]
dt[[colnames(dt)[1]]]
}
Then, I tested it using this:
library(data.table)
myDt <- data.table(a=letters[1:3],b=1:3)
myFun(myDt)
myFun_mod(myDt)
myFun didn't work
myFun_mod did work
The output:
> library(test)
> myFun(myDt)
Error in eval(expr, envir, enclos) : object 'a' not found
> myFun_mod(myDt)
[1] "a" "b" "c"
then I added the following line to the NAMESPACE file:
import(data.table)
This is what #mnel was talking about with this link:
Using data.table package inside my own package
After adding import(data.table), both functions work.
I'm still not sure why you got the particular .subset error, which is why I went though the effort of reproducing the result...

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

dataset and UDFs using sapply - r

You are calling the function incorrectly. It should be myfun <- function(x) x^2 sapply(xy, FUN = myfun) In any case, try inserting a browser() call within the function and inspect what is going on. See ?browser for more info. myfun <- function(x) { browser() x^2 }

Related

Using group_modify to apply function to grouped dataframe

How to check if a variable is passed to a function with or without quotes?

R: eval parse function call not accessing correct environments

R: function assigned to object producing unexpected results

R data tables accessing columns by name

Categories

Resources