Using group_modify to apply function to grouped dataframe - r

I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { #x is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing on additional arguments to other functions, but I do not see both of these events happening here. Do let me know if I am missing out something or if you have a better solution for my code.

The dots don't do much in that case, but there is a condition that requires them in your functions case for group_modify()to work. The function you are passing is getting converted using a helper function as_group_map_function(). This function checks if the function has more than two arguments and if not it should have ... to pass:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why it is done, but based on a quick peek on the source code it looks like there is a point where they pass two arguments and ... to the 'converted' version of your function (technically there is no conversion that happens – the conversion only takes place if you pass a formula instead of a function...), so my best guess is that is the reason: it needs to have some way of dealing with at least two arguments — if it doesn't need them, then it needs ... to 'absorb' them, otherwise it would fail.

Related

mockery::mock and mockery::stub do not work properly with quasiquotation?

I've written an import function that gets a single file from an aws s3-bucket.
That function itself is a wrapper around aws.s3::s3read_using() which takes a reading function as its first argument.
Why do I wrap around aws.s3::s3read_using() ? Because I need to do some special error-handling and want the wrapping function to do some Recall() up to a limit... but that's a different story.
Now that i've successfully build and tested my wrapping function i want to do another wrapping arround that:
I want to iterate n times over my wrapper to bind the downloaded files together. I now have the difficulty to hand the 'reading_function' to the FUN argument of aws.s3::s3read_using().
I could do that by simply using ... - BUT!
I want to make clear to the USER of my wrapping wrapper, that he needs to specify that argument.
So I've decided to use rlangs rlang::enexpr() to capture the argument and to hand it over to my first wrapper via !! - which in return captures that argument again with rlang::enexpr() and hands it over - finally - to aws.s3::s3read_using() via rlang::expr(aws.s3::s3read_using(FUN = !!reading_fn, object = s3_object))
That works perfectly fine and smooth. My Problem is with testing that function construct using testthat and mockery
Here is some broadly simplyfied code:
my_workhorse_function <- function(fn_to_work_with, value_to_work_on) {
fn <- rlang::enexpr(fn_to_work_with)
# Some other magic happens here - error handling, condition-checking, etc...
out <- eval(rlang::expr((!!fn)(value_to_work_on)))
}
my_iterating_function <- function(fn_to_iter_with, iterate_over) {
fn <- rlang::enexpr(fn_to_iter_with)
out <- list()
for(i in seq_along(iterate_over)) {
out[[i]] <- my_workhorse_function(!!fn, iterate_over[i])
}
return(out)
}
# Works just fine
my_iterating_function(sqrt, c(9:16))
Now, to the test:
# Throws an ERROR: 'Error in `!fn`: invalid argument type'
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function", mock_1)
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
I've used a workarround, but that just doesn't feel right, even though, it works:
# Test passed
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function",
function(fn_to_work_with, value_to_work_on) {
fn <- rlang::enexpr(fn_to_work_with)
out <- mock_1(fn, value_to_work_on)
out})
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
I'm using version of R: 4.1.1
I'm using versions of testthat(3.1.1), mockery(0.4.2), rlang(0.4.12)
I think you're complicating things here, although maybe I'm not fully understanding your end goal. You can directly pass functions through arguments without any issue. Your example code above can be easily simplified to (keeping the loop just to match your test_that() call):
library(testthat)
library(mockery)
my_workhorse_function <- function(fn_to_work_with, value_to_work_on) {
fn_to_work_with(value_to_work_on)
}
my_iterating_function <- function(fn_to_iter_with, iterate_over) {
out <- list()
for(i in seq_along(iterate_over)) {
out[[i]] <- my_workhorse_function(fn_to_iter_with, iterate_over[i])
}
return(out)
}
# Works just fine
my_iterating_function(sqrt, c(9:16))
#> [[1]]
#> [1] 3
#>
#> ...
test_that("my_iterating_function iterates length(iterate_over) times over my_workhorse_function", {
mock_1 <- mockery::mock(1, cycle = TRUE)
stub(my_iterating_function, "my_workhorse_function", mock_1)
expect_equal(my_iterating_function(sqrt, c(9:16)), list(1,1,1,1,1,1,1,1))
expect_called(mock_1, 8)
})
#> Test passed 🥇
You can just pass FUN directly through all of your nested functions. The functions you're wrapping with enexpr() were never going to be evaluated in the first place until you explicitly call them. You usually use enexpr when users are supplying expressions, not just functions.

How to check if a variable is passed to a function with or without quotes?

I'm trying to write a R function that can take either quoted or unquoted data frame variable name or vector of variable names as a parameter. The problem is when the user inserts unquoted dataframe column names as function input parameters it results in "object not found" error. How can I check if the variable name is quoted?
I've tried exists(), missing(), substitute() but none of them works for all combinations.
# considering this printfun as something I can't change
#made it just for demosnstration purposeses
printfun <- function(df, ...){
for(item in list(...)){
print(df[item])
}
}
myfun<-function(df,x){
#should check if input is quoted or unquoted here
# substitute works for some cases not all (see below)
new_args<-c(substitute(df),substitute(x))
do.call(printfun,new_args)
}
#sample data
df<-data.frame(abc=1,dfg=2)
#these are working
myfun(df,c("abc"))
myfun(df,c("abc","dfg"))
myfun(df,"abc")
#these are failing with object not found
myfun(df,abc)
myfun(df,c(abc))
I can differentiate the myfun(df,abc) and myfun(df,"abc") with a try Catch block. Although this does not seem very neat.
But I haven't found any way to differentiate the second argument in myfun(df,c(abc)) from myfun(df,abc) ?
Alternatively, can I somehow check if the error comes from missing quotes, as I guess the object not found error might arise also from something else (eg the dataframe name) being mistyped?
This appears to work for all your cases:
myfun<-function(df,x){
sx <- substitute(x)
a <- tryCatch(is.character(x), error = function(e) FALSE)
if (a) {
new_x <- x
} else {
cx <- as.character(sx)
if (is.name(sx)) {
new_x <- cx
} else if (is.call(sx) && cx[1] == "c") {
new_x <- cx[-1]
} else {
stop("Invalid x")
}
}
new_args <- c(substitute(df), as.list(new_x))
do.call(printfun, new_args)
}
However, I feel there is something strange about what you are trying to do.

User defined function - issue with return values

I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
for (j in 1:20) {
trialvec[i*j] <- segmentvalues[i]
}
}
#vector categorisation
vcategorise <- function(categories, data) {
#categorises a vector into a list of vectors
#requires plyr and dyplyr
assignment <- list()
catlength <- length(categories)
for (i in 1:length(catlength)){
for (j in 1:length(data)) {
if (any(contains(categories[i], ignore.case = TRUE,
as.vector(data[j])))) {
assignment[[i]][j] <- data[j]
}
}
}
return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
You are indexing assignments -- which is ok, even if at an index that doesn't have a value, that just gives you NULL -- and then indexing into what you get there -- which won't work if you get NULL. And NULL you will get, because you haven't allocated the list to be the right size.
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
assignment <- vector("list", length = length(data) * length(categories))
n <- length(data)
for (i in 1:length(categories)){
for (j in 1:length(data)) {
assignment[(i-1)*n + j] <-
if (any(contains(categories[i],
ignore.case = TRUE,
as.vector(data[j])))) {
data[j]
} else {
NA
}
}
}
dim(assignment) <- c(length(data), length(categories))
assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.

R: function assigned to object producing unexpected results

EDIT: I solved this one on my own. It had nothing to do with the function object assignment, it was that I was assigning the results to a vector "[]" rather then to a list "[[]]"
here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that choses which function does the filtering, and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
filter_method <- NULL
out_list <- list()
if(key_list$kind == "games") {
filter_method <- function(key_list) {
drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
}
} else if (key_list$kind == "skills") {
filter_method <- function(key_list) {
filter_by_val(data, "Skill", key_list)
}
}
# Separate data with keys
out_list["ELA"] <- filter_method(key_list[["ELA"]])
out_list["MATH"] <- filter_method(key_list[["MATH"]])
out_list["SCI"] <- filter_method(key_list[["SCI"]])
return (out_list)
}
I'm trying to filter data based on "skills" (ie. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this is the debug browser -- ie. filter_method(key_list[["ELA"]]) -- it works as expected, I get the data frame. But the values held in my output list: out_list[[ELA]] is the list of indexes. Any idea what's happening?

Delete data.frame columns and loop through data.frame assignment function

I found the following piece of code here at stackoverflow:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)
So i wanted to use it for my own proposes, but it did not work out as planned.
What i try to archive is to use it in a for loop or with lapply to use it with multiple data.frames. Amongst others I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
Well, what i need additionally is that I can loop though that function so that i can iterate through different data.frames.
Last but not least I would need something like:
for (xyz in l.frames){
xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have any idea how i could loop though that function or how i could change the function so that it performs all those steps and is loopable.
I`m quite new to R so perhaps Im missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])

Resources