EDIT: I solved this one on my own. It had nothing to do with the function object assignment; I was assigning the results with single brackets "[]" rather than with double brackets "[[]]", so each data frame was treated as a list of columns instead of being stored as one list element.
Here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
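A minimal sketch of the difference described in the edit, using a made-up two-column data frame (the names df, X and Skill are illustrative only). Assigning with single brackets keeps just the first column of the data frame and raises a warning, while double brackets store the whole data frame:

```r
# Hypothetical toy data standing in for the filtered event data
df <- data.frame(X = 1:3, Skill = c("a", "b", "c"))

out_list <- list()

# Single brackets: the data frame is treated as a list of columns, so only
# the first column fits into the single slot (R also emits a warning)
suppressWarnings(out_list["ELA"] <- df)

# Double brackets: the whole data frame is stored as one element
out_list[["MATH"]] <- df

class(out_list[["ELA"]])   # an integer vector, not a data frame
class(out_list[["MATH"]])  # a data frame
```

This matches the symptom in the question: the value stored under "ELA" looks like a column of indexes because it is only the first column.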
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that chooses which function does the filtering and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
filter_method <- NULL
out_list <- list()
if(key_list$kind == "games") {
filter_method <- function(key_list) {
drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
}
} else if (key_list$kind == "skills") {
filter_method <- function(key_list) {
filter_by_val(data, "Skill", key_list)
}
}
# Separate data with keys
out_list["ELA"] <- filter_method(key_list[["ELA"]])
out_list["MATH"] <- filter_method(key_list[["MATH"]])
out_list["SCI"] <- filter_method(key_list[["SCI"]])
return (out_list)
}
I'm trying to filter data based on "skills" (i.e. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this in the debug browser, i.e. filter_method(key_list[["ELA"]]), it works as expected and I get the data frame. But the value held in my output list, out_list[["ELA"]], is the list of indexes. Any idea what's happening?
I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I rectified this by adding the argument "..." to my max_conc_fx() function, which leads to this code which works:
max_conc_fx <- function(df, ...) { # df is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing additional arguments on to other functions, but I do not see either of these happening here. Do let me know if I am missing something or if you have a better solution for my code.
The dots don't do much in that case, but group_modify() requires them for your function to work. The function you pass is run through a helper function, as_group_map_function(), which checks that the function accepts at least two arguments or, failing that, has ... in its signature:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why it is done, but based on a quick peek on the source code it looks like there is a point where they pass two arguments and ... to the 'converted' version of your function (technically there is no conversion that happens – the conversion only takes place if you pass a formula instead of a function...), so my best guess is that is the reason: it needs to have some way of dealing with at least two arguments — if it doesn't need them, then it needs ... to 'absorb' them, otherwise it would fail.
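To illustrate the point about absorbing arguments, here is a base R sketch that does not use dplyr at all. The helper call_per_group() is hypothetical and only mimics the idea of a caller that always supplies two arguments (the group's rows and its key); it is not how dplyr is implemented:

```r
# Hypothetical stand-in for a caller that always passes two arguments
call_per_group <- function(f, rows, key) f(rows, key)

one_arg   <- function(df) nrow(df)
with_dots <- function(df, ...) nrow(df)   # ... silently absorbs the key

rows <- data.frame(userid = c(1, 2, 2))
key  <- data.frame(shop = "A")

# Without ..., the extra argument causes an "unused argument" error
res_one <- tryCatch(call_per_group(one_arg, rows, key),
                    error = function(e) "error")

# With ..., the call succeeds and the key is simply ignored
res_dots <- call_per_group(with_dots, rows, key)
```

Naming the second argument explicitly, for example function(df, key), would presumably satisfy the same check if the key is actually needed.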
I'm trying to write an R function that can take either a quoted or an unquoted data frame variable name, or a vector of variable names, as a parameter. The problem is that when the user passes unquoted data frame column names as input parameters, the result is an "object not found" error. How can I check whether the variable name is quoted?
I've tried exists(), missing(), substitute() but none of them works for all combinations.
# consider printfun as something I can't change
# made it just for demonstration purposes
printfun <- function(df, ...){
for(item in list(...)){
print(df[item])
}
}
myfun<-function(df,x){
#should check if input is quoted or unquoted here
# substitute works for some cases not all (see below)
new_args<-c(substitute(df),substitute(x))
do.call(printfun,new_args)
}
#sample data
df<-data.frame(abc=1,dfg=2)
#these are working
myfun(df,c("abc"))
myfun(df,c("abc","dfg"))
myfun(df,"abc")
#these are failing with object not found
myfun(df,abc)
myfun(df,c(abc))
I can differentiate myfun(df,abc) from myfun(df,"abc") with a tryCatch block, although this does not seem very neat.
But I haven't found any way to differentiate the second argument in myfun(df,c(abc)) from myfun(df,abc) ?
Alternatively, can I somehow check if the error comes from missing quotes, as I guess the object not found error might arise also from something else (eg the dataframe name) being mistyped?
This appears to work for all your cases:
myfun<-function(df,x){
sx <- substitute(x)
a <- tryCatch(is.character(x), error = function(e) FALSE)
if (a) {
new_x <- x
} else {
cx <- as.character(sx)
if (is.name(sx)) {
new_x <- cx
} else if (is.call(sx) && cx[1] == "c") {
new_x <- cx[-1]
} else {
stop("Invalid x")
}
}
new_args <- c(substitute(df), as.list(new_x))
do.call(printfun, new_args)
}
However, I feel there is something strange about what you are trying to do.
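The answer relies on what substitute() captures for each style of call. A small sketch, assuming a throwaway helper named capture() (not part of the original code), shows the three cases the branches distinguish:

```r
# Hypothetical helper: return the unevaluated expression passed as x
capture <- function(x) substitute(x)

is.name(capture(abc))            # a bare symbol
is.call(capture(c(abc, dfg)))    # a call to c()
is.character(capture("abc"))     # a string literal stays a character vector

# as.character() on a call yields the function name followed by its arguments
parts <- as.character(capture(c(abc, dfg)))
parts[-1]                        # drop "c" to keep only the column names
```

This is why the answer checks cx[1] == "c" before using cx[-1]: any other call, such as paste(abc), would otherwise be misread as a list of column names.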
I regularly come up against the issue of how to categorise dataframes from a list of dataframes according to certain values within them (E.g. numeric, factor strings, etc). I am using a simplified version using vectors here.
After writing messy for loops for this task a bunch of times, I am trying to write a function to repeatedly solve the problem. The code below returns a subscripting error (given at the bottom), however I don't think this is a subscripting problem, but to do with my use of return.
As well as fixing this, I would be very grateful for any pointers on whether there are any cleaner / better ways to code this function.
library(plyr)
library(dplyr)
#dummy data
segmentvalues <- c('1_P', '2_B', '3_R', '4_M', '5_D', '6_L')
trialvec <- vector()
for (i in 1:length(segmentvalues)){
for (j in 1:20) {
trialvec[i*j] <- segmentvalues[i]
}
}
#vector categorisation
vcategorise <- function(categories, data) {
#categorises a vector into a list of vectors
#requires plyr and dplyr
assignment <- list()
catlength <- length(categories)
for (i in 1:length(catlength)){
for (j in 1:length(data)) {
if (any(contains(categories[i], ignore.case = TRUE,
as.vector(data[j])))) {
assignment[[i]][j] <- data[j]
}
}
}
return (assignment)
}
result <- vcategorise(categories = segmentvalues, data = trialvec)
Error in *tmp*[[i]] : subscript out of bounds
You are indexing assignment with [[i]] and then indexing into what you get there with [j]. The list is empty, so there is no element i to index into, and reading assignment[[i]] past the end of a list fails with "subscript out of bounds". That is bound to happen, because you haven't allocated the list to be the right size.
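A minimal sketch of the allocation issue, with made-up values. The nested assignment fails on an empty list but works once the list has been preallocated with vector("list", n), because each slot then exists and holds NULL:

```r
# An empty list has no element 1, so the nested assignment fails
assignment <- list()
msg <- tryCatch({
  assignment[[1]][2] <- "x"
  "no error"
}, error = function(e) conditionMessage(e))

# A preallocated list has NULL in every slot, which can be extended in place
prealloc <- vector("list", 3)
prealloc[[1]][2] <- "x"
```

Here msg holds the error message from the failed assignment, and prealloc[[1]] ends up as a length-2 vector whose second entry is "x".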
In any case, I don't think it is necessary for you to allocate a table. You are already using a flat indexing structure in your test data generation, so why not do the same with assignment and then set its dimensions afterwards?
Something like this, perhaps?
vcategorise <- function(categories, data) {
assignment <- vector("list", length = length(data) * length(categories))
n <- length(data)
for (i in 1:length(categories)){
for (j in 1:length(data)) {
assignment[(i-1)*n + j] <-
if (any(contains(categories[i],
ignore.case = TRUE,
as.vector(data[j])))) {
data[j]
} else {
NA
}
}
}
dim(assignment) <- c(length(data), length(categories))
assignment
}
It is not the prettiest code, but without fully understanding what you want to achieve, I don't know how to go further.
I have the following code:
df<- iris
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are chosen:\n")
print(column.D)
for (z in column.D) {
x[[z]] <- NULL #with this part I wanted to delete the above selected columns
}
}
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically" so: x[[z]] <- NULL should translate to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and that for every selected column in every data.frame chosen for the function.
Does anyone know how to achieve something like that? I tried several things, such as paste, sprintf and deparse, but I didn't get it working. I also tried to address the data.frame as a global variable by using <<-, but that didn't help either. (It's the first time I've even heard of that.) It looks like I'm missing the right way to pass x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are chosen:\n")
print(column.D)
# drop = FALSE keeps the result a data frame even if one column remains
x <- x[, !names(x) %in% column.D, drop = FALSE]
}
return(x)
}
df <- columnFunction(df)
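The same return-and-reassign pattern can be seen without the dialog. In this sketch the columns to remove are passed through a hypothetical cols argument instead of being picked with dlgList(), and drop = FALSE is used as a precaution so the result stays a data frame even if only one column is left:

```r
# Hypothetical non-interactive variant: columns to remove are given by name
drop_selected <- function(x, cols) {
  x[, !names(x) %in% cols, drop = FALSE]
}

df <- iris

# Calling the function without reassigning leaves df untouched, because the
# function works on its own copy of the data frame
invisible(drop_selected(df, "Species"))
n_before <- ncol(df)

# Reassigning the result is what actually updates df
df <- drop_selected(df, "Species")
n_after <- ncol(df)
```

The modification inside the function is only visible to the caller through the returned value, which is why the last line of the answer reassigns df.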
I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames but am getting an error message. The function and my attempt at using it with lapply is below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
You have two solutions. You can either
(a) assign the result to a new list: newList = lapply(scenbase, function(x) { x[33:152,,drop=F]} )
(b) use the <<- operator to assign the trimmed data in place: lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152,,drop=F]} ).
The assignment inside trim() only changes i within the function, not the data frames in scenbase, so calling lapply() without storing its result leaves the list untouched. You can work around that with the <<- operator, which assigns to the first matching variable it finds in successive parent environments, or by creating a new trimmed list.
Here is some code that reproduces solution (a):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152,,drop=F]} )
Here is some code that reproduces solution (b):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
lapply(1:length(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152,,drop=F]} )
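The message "incorrect number of dimensions" is what R reports when two-dimensional indexing is applied to something without rows and columns, such as a plain vector. As a hedged diagnostic, assuming that some elements of the list might not be data frames (for example, file names rather than the data frames themselves), each element could be checked first. The list below is made up for illustration:

```r
# Hypothetical list mixing a real data frame with a file name
mixed <- list(a = data.frame(x = 1:200), b = "soil11base.txt")

# TRUE for elements that can be indexed with [rows, ]
ok <- vapply(mixed, is.data.frame, logical(1))

# Indexing the character element reproduces the error message
msg <- tryCatch(mixed$b[33:152, ], error = function(e) conditionMessage(e))
```

If every element passes the check, the trimming code in solutions (a) and (b) should apply as written.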