Chaining multiple replacement functions in R - r

I am using R to work with a large JS object (using the library rjsonio). As such, I have a lot of nested lists, which are getting somewhat cumbersome to work with. I have a simplified example below. I am trying to work with this object by creating some form of ‘getter’ and ‘setter’ functions. After looking around, I have found a pretty nice ‘getter’ function that recurses through the object and returns the first matching label. This is especially great because it lends itself to chaining functions together. However, I can not figure out a way to get the same effect for a ‘setter’ function. Any thoughts on how to create a ‘setter’ function that can be chained together in a similar fashion?
#example, simplified, object
app = list(
1,
2,
d=list(a=123,
b=456,
list(
FirstKey=list(attr1='good stuff', attr2=12345),
SecondKey=list(attr1='also good stuff', attr2=4321)
)
)
)
#Return a function that returns the value
#associated with first label that matches 'name'
getByName <- function(name){
rmatch <- function(x) {
pos <- match(name, names(x))
if (!is.na(pos))
return(x[[pos]])
for (el in x) {
if (class(el) == "list") {
out <- Recall(el)
if (!is.null(out)) return(out)
}
}
}
rmatch
}
getFirstKey <- getByName("FirstKey")
getAttr1 <- getByName("attr1")
getAttr2 <- getByName("attr2")
#I like that I can chain these functions together
getAttr1(getFirstKey(app))
getAttr2(getFirstKey(app))
# I would like to be able to do something like this
# But this won't work
### getAttr1(getFirstKey(app)) <- 9876
# This does work,,, but I loose the ability to chain functions together
# Closure around a replacement function
setterKeyAttr <- function(keyName, attr){
function(x, value){
x$d[[3]][[keyName]][[attr]] <- value
x
}
}
`setFirstKeyAttr2<-` <- setterKeyAttr("FirstKey", "attr2")
setFirstKeyAttr2(app) <- 22222
#check the answer is correct
getAttr2(getFirstKey(app))
references:
R decorator to change both input and output
http://r.789695.n4.nabble.com/How-to-get-a-specific-named-element-in-a-nested-list-td3037430.html
http://adv-r.had.co.nz/Functions.html

This is what I came up with. It makes the recursive function return the position of the 'name' and still be able to chain the calls together. I am not sure if this is a great way to do it... but it seems to be working... This is based off the fact that app[[c(3,3,1,)]] is a valid way to index in R.
rmatch.pos <- function(object, name, seq=NA, level=NULL){
##return the vector of integers corresponding to the first match
##of 'name' to a label in object or NULL if no match is found
###object: a list, likely deeply nested
##name: the name of the label to look for
##seq: starting point to search for 'name' in 'object' i.e. c(2,3,3)
##level: don't touch this; it keeps track of how deep the recursive execution is
##can be chained together to reduce ambiguity or result:
##obj <- list(a=1, b=list(c=2, d=list(e=1, attr1="really?", f=list(attr1 = "found me!"))))
##obj[[rmatch.pos(obj, "attr1", rmatch.pos(obj, "f"))]]
if(is.null(seq)){
#short circuit if NULL gets passed
#when chaining, this forces the whole 'chain'
#to NULL when any 'link' is NULL
return(NULL)
}
if(is.null(level)){
level <- length(na.omit(seq))
}
if(any(is.na(seq))){
temp <- object
}else{
temp <- object[[seq]]
}
level <- level + 1
pos <- match(name, names(temp))
if(!is.na(pos)){
seq[level] <- pos
return(seq)
}
for(el in seq_along(temp)){
if(class(temp[[el]]) == "list"){
seq[level] <- el
out <- Recall(object, name, seq, level)
if(!is.null(out)){
return(out)
}
}
}
}
###Examples
rmatch.pos(app, "ThirdKey")
rmatch.pos(app, "attr2")
###chaining example
rmatch.pos(app, "attr2", rmatch.pos(app, "FirstKey"))
rmatch.pos(app, "attr2", rmatch.pos(app, "SecondKey"))
rmatch.pos(app, "attr1", rmatch.pos(app, "ERROR"))
rmatch.pos(app, "ERROR", rmatch.pos(app, "attr1"))

Related

How to modify the list/vector argument in place in the R function lapply?

Perhaps an odd question but I have a real use case for it: is it possible to modify in place, i.e. when the function runs and not on a copy, the list/vector argument in the lapply function in R? And if yes, how to do it?
I need it because of some (desired) side effects.
To be more specific, after each element of the list/vector argument is 'read' (or 'used') by lapply, I need to delete it.
Best regards,
Olivier
Some code to illustrate what I am looking for.
Blocked is an atomic vector and B a list.
Unblock <- function(M)
{
Blocked[M] <<- FALSE
Length_B_M <- length(B[[M]])
if(Length_B_M > 0)
{
# Vectorised version
UnblockAndUpdateAdjacencyList <- function(Node.Value)
{
Node.Name <- as.character(Node.Value)
Node.Value <<- NULL # This is not correct because it won't delete the argument in place
if(Blocked[Node.Name]) Unblock(Node.Name)
}
lapply(X = B[[M]], FUN = UnblockAndUpdateAdjacencyList)
# Unvectorised version
# 'i' is a vector/list index - integer
# Important: it is necessary to browse items backwards as items from the atomic vector 'B[[M]]' are deleted and thus its length decreases at each iteration
for(i in Length_B_M:1)
{
P <- as.character(B[[M]][i])
B[[M]] <<- B[[M]][-i]
if(Blocked[P]) Unblock(P)
}
}
}
Actually I found a somewhat elegant solution, i.e. without any explicit loop and any copy of the vector/list in the argument of lapply.
Unblock <- function(M)
{
Blocked[M] <<- FALSE
Length_B_M <- length(B[[M]])
if(Length_B_M > 0)
{
# Vectorised version
# One solution is to copy the vector/list in the argument of 'lapply' - Actually not even needed
UnblockAndUpdateAdjacencyList <- function(Node.Value)
{
Node.Name <- as.character(Node.Value)
B[[M]] <<- B[[M]][-1]
if(Blocked[Node.Name]) Unblock(Node.Name)
}
#B_M_copy <- B[[M]]
#lapply(X = B_M_copy, FUN = UnblockAndUpdateAdjacencyList)
lapply(X = B[[M]], FUN = UnblockAndUpdateAdjacencyList)
}
}

Evaluate listed strings to create function object in r

I need a function created by a list of commands to fully evaluate so that it is identical to the "manual" version of the function.
Background: I am using ScaleR functions in Microsoft R Server and need to apply a set of transformations as a function. ScaleR is very picky about needing to be passed a function that is phrased exactly as specified below:
functionThatWorks <- function(data) {
data$marital_status_p1_ismarried <- impute(data$marital_status_p1_ismarried)
return(data)
}
I have a function that creates this list of transformations (and hundreds more, hence the need to functionalize its writing).
transformList <- list ("data$ismarried <- impute(data$ismarried)",
"data$issingle <- impute(data$issingle)")
This line outputs the evaluated string that I want to the console, but I am unaware of a way to move it from console output to being used in a function:
cat(noquote(unlist(bquote( .(noquote(transformList[1]))))))
I need to evaluate functionIWant so that it is identical to functionThatWorks.
functionIWant <- function(data){
eval( cat(noquote(unlist(bquote( .(noquote(transformList[1])))))) )
return(data)
}
identical(functionThatWorks, functionIWant)
EDIT: Adding in the answer based on #dww 's code. It works well in ScaleR. It is identical, minus meaningless spacing.
functionIWant <- function(){}
formals(functionIWant) <- alist(data=NULL)
functionIWant.text <- parse(text = c(
paste( bquote( .(noquote(transformList[1]))), ";", "return(data)\n")
))
body(functionIWant) <- as.call(c(as.name("{"), functionIWant.text))
Maybe something like this?
# 1st define a 'hard-coded' function
f1 <- function (x = 2)
{
y <- x + 1
y^2
}
f1(3)
# [1] 16
# now create a similar function from a character vector
f2 <- function(){}
formals(f2) <- alist(x=2)
f2.text <- parse(text = c('y <- x + 1', 'y^2'))
body(f2) <- as.call(c(as.name("{"), f2.text))
f2(3)
# [1] 16

R: function assigned to object producing unexpected results

EDIT: I solved this one on my own. It had nothing to do with the function object assignment, it was that I was assigning the results to a vector "[]" rather then to a list "[[]]"
here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that choses which function does the filtering, and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
filter_method <- NULL
out_list <- list()
if(key_list$kind == "games") {
filter_method <- function(key_list) {
drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
}
} else if (key_list$kind == "skills") {
filter_method <- function(key_list) {
filter_by_val(data, "Skill", key_list)
}
}
# Separate data with keys
out_list["ELA"] <- filter_method(key_list[["ELA"]])
out_list["MATH"] <- filter_method(key_list[["MATH"]])
out_list["SCI"] <- filter_method(key_list[["SCI"]])
return (out_list)
}
I'm trying to filter data based on "skills" (ie. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this is the debug browser -- ie. filter_method(key_list[["ELA"]]) -- it works as expected, I get the data frame. But the values held in my output list: out_list[[ELA]] is the list of indexes. Any idea what's happening?

Delete data frame column within function

I have the following code:
df<- iris
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
for (z in column.D) {
x[[z]] <- NULL #with this part I wanted to delete the above selected columns
}
}
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically" so: x[[z]] <- NULL should translate to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and that for every selected column in every data.frame chosen for the function.
Well does anyone know how to archive something like that? I tried several things like with the paste command or sprintf, deparse but i didnt get it working. I also tied to address the data.frame as a global variable by using <<- but didn`t help, too. (Well its the first time i even heard about that). It looks like i miss the right method transferring x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)

combination of expand.grid and mapply?

I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) which is the result of applying FUN to all combinations of the elements of the list.
If expand.grid worked to generate lists of lists rather than data frames, one might be able to use it, but I have in mind that the lists may be lists of things that won't necessarily fit into a data frame nicely.
This function works OK if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)
xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
retlist <- list()
count <- 1
for (i in seq_along(L1)) {
for (j in seq_along(L2)) {
for (k in seq_along(L3)) {
retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
count <- count+1
}
}
}
retlist
}
edit: forgot to return the result. One might be able to solve this by making a list of the indices with combn and going from there ...
I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)
xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
retlist <- list()
for (i in 1:nrow(inds)) {
arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
if (FLATTEN) {
retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
}
}
retlist
}
edit: I tried #baptiste's suggestion, but it's not easy (or wasn't for me). The closest I got was
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
L <- list(...)
xx <- do.call(expand.grid,L)
f <- function(...) {
do.call(FUN,lapply(list(...),"[[",1))
}
mlply(xx,f)
}
which still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic is happening inside mlply that I can't quite make it work.
Here is a test case:
L1 <- list(data.frame(x=1:10,y=1:10),
data.frame(x=runif(10),y=runif(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
L2 <- list(y~1,y~x,y~poly(x,2))
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
#ben-bolker, I had a similar desire and think I have a preliminary solution worked out, that I've also tested to work in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid) takes an arbitrarily large named list mvars (that gets expand.grid-ed within the function) and a FUN that utilizes the list names as if they were arguments to the function itself (gmcmapply will update the formals of FUN so that by the time FUN is passed to mcmapply it's arguments reflect the variables that the user would like to iterate over (which would be layers in a nested for loop)). mcmapply then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars.
I've posted the preliminary code as a gist (reprinted with an example below) and would be curious to get your feedback on it. I'm a grad student, that is self-described as an intermediately-skilled R enthusiast, so this is pushing my R skills for sure. You or other folks in the community may have suggestions that would improve on what I have. I do think even as it stands, I'll be coming to this function quite a bit in the future.
gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
require(parallel)
FUN <- match.fun(FUN)
funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.
expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function
# Implement non-default arg substitutions passed through dots.
if(any(names(funArgs) %in% names(expand.dots))){
dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
funArgs[dot_overwrite] <- expand.dots[dot_overwrite]
#for arg naming and matching below.
expand.dots[dot_overwrite] <- NULL
}
## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
# specify formals of the function to be evaluated by merging the grid to mapply over with expanded dot args
argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")
argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # replace with proper dot arg inputs.
argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.
formals(FUN) <- argdefs
if(SIMPLIFY) {
#standard mapply
do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
} else{
#standard Map
do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
}
}
example code below:
# Example 1:
# just make sure variables used in your function appear as the names of mvars
myfunc <- function(...){
return_me <- paste(l3, l1^2 + l2, sep = "_")
return(return_me)
}
mvars <- list(l1 = 1:10,
l2 = 1:5,
l3 = letters[1:3])
### list output (mapply)
lreturns <- gmcmapply(mvars, myfunc)
### concatenated output (Map)
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)
## N.B. This is equivalent to running:
lreturns <- c()
for(l1 in 1:10){
for(l2 in 1:5){
for(l3 in letters[1:3]){
lreturns <- c(lreturns,myfunc(l1,l2,l3))
}
}
}
### concatenated outout run on 2 cores.
lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)
Example 2. Pass non-default args to FUN.
## Since the apply functions dont accept full calls as inputs (calls are internal), user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
# e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not, the correct way to specify non-default/additional args to FUN is:
# gmcmapply(mvars, FUN, arg_to_change = not_default)
## update myfunc to have a default argument
myfunc <- function(rep_letters = 3, ...){
return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
return(return_me)
}
lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)
A bit of additional functionality I would like to add but am still trying to work out is
cleaning up the output to be a pretty nested list with the names of mvars (normally, I'd create multiple lists within a nested for loop and tag lower-level lists onto higher level lists all the way up until all layers of the gigantic nested loop were done). I think using some abstracted variant of the solution provided here will work, but I haven't figured out how to make the solution flexible to the number of columns in the expand.grid-ed data.frame.
I would like an option to log the outputs of the child processesthat get called in mcmapply in a user-specified directory. So you could look at .txt outputs from every combination of variables generated by expand.grid (i.e. if the user prints model summaries or status messages as a part of FUN as I often do). I think a feasible solution is to use the substitute() and body() functions, described here to edit FUN to open a sink() at the beginning of FUN and close it at the end if the user specifies a directory to write to. Right now, I just program it right into FUN itself, but later it would be nice to just pass gmcmapply an argument called something like log_children = "path_to_log_dir. and then editing the body of the function to (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")
Let me know what you think!
-Nate

Resources