R - store functions in a data.frame - r

I would like to return a matrix/data.frame each row containing arguments and the content of a file.
However, there may be many files, so I would prefer if I could load the file lazily, so the file is only read if the actual content is requested. The function below loads the files actively if as.func=F.
It would be perfect if it could load them lazily, but it would also be acceptable, if instead of the content a function is returned that would read the content.
I can make functions that read the content (see below with as.func=T), but for some reason I cannot put that into the data.frame to return.
load_parallel_results <- function(resdir,as.func=F) {
## Find files called .../stdout
stdoutnames <- list.files(path=resdir, pattern="stdout", recursive=T);
## Find files called .../stderr
stderrnames <- list.files(path=resdir, pattern="stderr", recursive=T);
if(as.func) {
## Create functions to read them
stdoutcontents <-
lapply(stdoutnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } );
stderrcontents <-
lapply(stderrnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } );
} else {
## Read them
stdoutcontents <-
lapply(stdoutnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } );
stderrcontents <-
lapply(stderrnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } );
}
if(length(stdoutnames) == 0) {
## Return empty data frame if no files found
return(data.frame());
}
## Make the columns containing the variable values
m <- matrix(unlist(strsplit(stdoutnames, "/")),nrow = length(stdoutnames),byrow=T);
mm <- as.data.frame(m[,c(F,T)]);
## Append the stdout and stderr column
mmm <- cbind(mm,unlist(stdoutcontents),unlist(stderrcontents));
colnames(mmm) <- c(strsplit(stdoutnames[1],"/")[[1]][c(T,F)],"stderr");
## Example:
## parallel --results my/res/dir --header : 'echo {};seq {myvar1}' ::: myvar1 1 2 ::: myvar2 A B
## > load_parallel_results("my/res/dir")
## myvar1 myvar2 stdout stderr
## [1,] "1" "A" "1 A\n1" ""
## [2,] "1" "B" "1 B\n1" ""
## [3,] "2" "A" "2 A\n1\n2" ""
## [4,] "2" "B" "2 B\n1\n2" ""
return(mmm);
}
Background
GNU Parallel has a --results option that stores output in a structured way. If there are 1000000 outputfiles it may be hard to manage them. R is good for that, but it would be awfully slow if you had to read all 1000000 files just to get the ones where argument 1 = "Foo" and argument 2 = "Bar".

Unfortunately I don't think you can save a function in a data.frame column.
But you could store the deparsed text of the function and evaluate it when needed:
e.g.
myFunc <- function(x) { print(x) }
# convert the function to text
funcAsText <- deparse(myFunc)
# convert the text back to a function
newMyFunc <- eval(parse(text=funcAsText))
# now you can use the function newMyFunc exactly like myFunc
newMyFunc("foo")
> [1] "foo"
EDIT:
Since the files are a lot, I suggest you to simply store a string indicating the type of the file and create a function that understands the types and reads the file accordingly; so you can call it when needed by passing the type and filepath.

(Without reading the question body:)
You can store functions in a data.frame like this:
df <- data.frame(fun = 1:3)
df$fun <- c(mean, sd, function(x) x^2)
I am not sure if this will break other things, so consider using tibble or data.table from the same named packages which really support arbitrary object types.

You can use 2D lists to store your functions. Obviously, you lose some of the checks you get with DFs, but that's the whole point here:
> funs <- c(replicate(5, function(x) NULL), replicate(5, function(y) TRUE))
> names <- as.list(letters[1:10])
> # df doesn't work
> df <- data.frame(names=names)
> df.2 <- cbind(df, funs)
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ""function"" to a data.frame
# but 2d lists do
> lst.2d <- cbind(funs, names)
> lst.2d[2, 1]
$funs
function (x)
NULL
> lst.2d[6, 1]
$funs
function (y)
TRUE

Related

Create an R function programmatically with non-fixed body

In a for loop I make a "string-formula" and allocate it to e.g. body1. And when I try to make a function with that body1 it fails... And I have no clue what I should try else...
This question How to create an R function programmatically? helped me a lot but sadly only quote is used to set the body...
I hope you have an idea how to work around with this issue.
And now my code:
A.m=matrix(c(3,4,2,2,1,1,1,3,2),ncol=3,byrow=TRUE)
for(i in 1:dim(A.m)[1]) {
body=character()
# here the string-formula emerges
for(l in 1:dim(A.m)[2]) {
body=paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
# only the last plus-sign is cutted off
assign(paste0("body",i),substr(body,1,nchar(body)-1))
}
args=alist(x = )
# just for your convenience the console output
body1
## [1] "A.m[1,1]*x[1]+A.m[1,2]*x[2]+A.m[1,3]*x[3]"
# in this code-line I don't know how to pass body1 in feasible way
assign("Function_1", as.function(c(args, ???body1???), env = parent.frame())
And this is my aim:
Function_1(x=c(1,1,1))
## 9 # 3*1 + 4*1 + 2*1
Since you have a string, you need to parse that string. You can do
assign("Function_1",
as.function(c(args, parse(text=body1)[[1]])),
env = parent.frame())
Though I would strongly discourage the use of assign for filling your global environment with a bunch of variables with indexes in their name. In general that makes things much tougher to program with. It would be much easier to collect all your functions in a list. For example
funs <- lapply(1:dim(A.m)[1], function(i) {
body <- ""
for(l in 1:dim(A.m)[2]) {
body <- paste0(body,"A.m[",i,",",l,"]","*x[",l,"]+")
}
body <- substr(body,1,nchar(body)-1)
body <- parse(text=body)[[1]]
as.function(c(alist(x=), body), env=parent.frame())
})
And then you can call the different functions by extracting them with [[]]
funs[[1]](x=c(1,1,1))
# [1] 9
funs[[2]](x=c(1,1,1))
# [1] 4
Or you can ever call all the functions with an lapply
lapply(funs, function(f, ...) f(...), x=c(1,1,1))
# [[1]]
# [1] 9
# [[2]]
# [1] 4
# [[3]]
# [1] 6
Although if this is actually what your function is doing, there are easier ways to do this in R using matrix multiplication %*%. Your Function_1 is the same as A.m[1,] %*% c(1,1,1). You could make a generator funciton like
colmult <- function(mat, row) {
function(x) {
as.numeric(mat[row,] %*% x)
}
}
And then create the same list of functions with
funs <- lapply(1:3, function(i) colmult(A.m, i))
Then you don't need any string building or parsing which tends to be error prone.

How could I define an application method on an S3 object in R? (like a "function object" in c++)

The problem I am trying to tackle here is needing to apply (execute) an S3 object which is essentially a vector-like structure. This may contain various formulas which at some stage I need to evaluate for a single argument, in order to get back a vector-like object of the original shape, containing the evaluation of its constituent formulas at the given argument.
Examples of this (just to illustrate) might be a matrix of transformation - say rotation - which would take the angle to rotate by, and produce a matrix of values by which to multiply a point, for the given rotation. Another example might be the vector of states in a problem in classical mechanics. Then given t, v, a, etc, it could return s...
Now, I have created my container object in S3, and its working fine in most respects, using generic methods; I also found the Ops.myClass system of operator overloading very useful.
To complete my class, all I need now is a way to specify it as executable.
I see that there are various mechanisms that will do what I want in part, for instance I suppose that as.function() will convert the object to behave as I want, and something like lapply() could be used for the "reverse" application of the argument to the functions. What I am not sure how to do is link it all up so that I can do something like this mock-up:
new_Object <- function(<all my function vector stuff spec>)
vtest <- new_Object(<say, sin, cos, tan>)
vtest(1)
==>
myvec(.8414709848078965 .5403023058681398 1.557407724654902)
(Yes, I have already specified a generic print() routine that will make it appear nice)
All suggestions, sample code, links to examples are welcome.
PS =====
I have added some basic example code as per request.
I am not sure how much would be too much, so the full working minimal example, including operator overloading is in this gist here.
I am only showing the constructor and helper functions below:
# constructor
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <-function(...){
vec <- unlist(list(...),use.names = FALSE)
new_Struct("up",vec)
}
down <-function(...){
vec <- unlist(list(...),use.names = FALSE)
new_Struct("down",vec)
}
The above code behaves thus:
> u1 <- up(1,2,3)
> u2 <- up(3,4,5)
> d1 <- down(u1)
> d1
[1] down(1, 2, 3)
> u1+u2
[1] up(4, 6, 8)
> u1+d1
Error: '+' not defined for opposite tuple types
> u1*d1
[1] 14
> u1*u2
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 6 8 10
[3,] 9 12 15
> u1^2
[1] 14
> s1 <- up(sin,cos,tan)
> s1
[1] up(.Primitive("sin"), .Primitive("cos"), .Primitive("tan"))
> s1(1)
Error in s1(1) : could not find function "s1"
What I need, is for it to be able to do this:
> s1(1)
[1] up(.8414709848078965 .5403023058681398 1.557407724654902)
You can not call each function in a list of functions without a loop.
I'm not fully understanding all requirements, but this should give you a start:
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec) || is.function(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <- function(...) UseMethod("up")
up.default <- function(...){
vals <- list(...)
stopifnot(all(vapply(vals, is.vector, FUN.VALUE = logical(1))))
vec <- unlist(vals, use.names = FALSE)
new_Struct("up",vec)
}
up.function <- function(...){
funs <- list(...)
stopifnot(all(vapply(funs, is.function, FUN.VALUE = logical(1))))
new_Struct("up", function(x) new_Struct("up", sapply(funs, do.call, list(x))))
}
up(1, 2, 3)
#[1] 1 2 3
#attr(,"class")
#[1] "Struct"
#attr(,"type")
#[1] "up"
up(1, 2, sin)
#Error in up.default(1, 2, sin) :
# all(vapply(vals, is.vector, FUN.VALUE = logical(1))) is not TRUE
up(sin, 1, 2)
#Error in up.function(sin, 1, 2) :
# all(vapply(funs, is.function, FUN.VALUE = logical(1))) is not TRUE
s1 <- up(sin, cos, tan)
s1(1)
#[1] 0.8414710 0.5403023 1.5574077
#attr(,"class")
#[1] "Struct"
#attr(,"type")
#[1] "up"
After some thought I have come up with a way to approach this, it's not perfect, it would be great if someone could figure out a way to make the function call implicit/transparent.
So, for now I just use the call() mechanism on the object, and that seems to work fine. Here's the pertinent part of the code, minus checks. I'll put up the latest full version on the same gist as above.
# constructor
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <- function(...){
vec <- unlist(list(...), use.names = FALSE)
new_Struct("up",vec)
}
down <- function(...){
vec <- unlist(list(...), use.names = FALSE)
new_Struct("down",vec)
}
# generic print for tuples
print.Struct <- function(s){
outstr <- sprintf("%s(%s)", attributes(s)$type, paste(c(s), collapse=", "))
print(noquote(outstr))
}
# apply the structure - would be nice if this could be done *implicitly*
call <- function(...) UseMethod("call")
call.Struct <- function(s,x){
new_Struct(attributes(s)$type, sapply(s, do.call, list(x)))
}
Now I can do:
> s1 <- up(sin,cos,tan)
> length(s1)
[1] 3
> call(s1,1)
[1] up(0.841470984807897, 0.54030230586814, 1.5574077246549)
>
Not as nice as my ultimate target of
> s1(1)
[1] up(0.841470984807897, 0.54030230586814, 1.5574077246549)
but it will do for now...

Set atomic vector names by reference

I am wondering if it is possible to set vector names by reference in R.
I often use data.table::fread to read text files, and then I clean up the variable names by wrapping setnames (which also works on a plain data.frame) and a string cleanup function similar to:
clean_var_name <- function(s) {
gsub("^_+|_+$","",gsub("(\\s|\\-|[[:punct:]])+", "_", tolower(s) ) )
}
so my function looks like:
clean_names <- function(x){
require(data.table)
if(is.data.frame(x)){setnames(x, names(x), clean_var_name(names(x)))} # this part works
else if(is.vector(x)){ do_something_here } # this is the question
}
I'm wondering if there is a way to include the case of vectors in the same function in a way that performs names(x) <- clean_var_name(names(x)) by reference.
v <- c(`thIs.Is.A.Terrible-Name`=1, `this One is TOO`=2)
dt <- data.table(t(v))
clean_names(dt)
dt
# this_is_a_terrible_name this_one_is_too
# 1: 1 4
# would like to be able to do same for clean_names(v)
I'm also open to explanations of why this is a bad idea (side effects, functional programming, etc.)
Use setattr function:
library(data.table)
x <- 1:10
address(x)
# [1] "0x713cfd0"
setattr(x,"names",letters[1:10])
address(x)
# [1] "0x713cfd0"

Recursively editing a list in R

In my program, I am recursively going over a nested list and adding elements to an overall list that I will return. There are a few details to be taken care of, so I can't just use unlist.
formulaPart is taken to be a formula object.
My code is:
parseVariables <- function(formulaPart, myList){
for(currentVar in as.list(formulaPart))
if(typeof(currentVar == 'language'
parseVariables(currentVar, myList)
else
if(! toString(currentVar) %in% c(\\various characters)
list <- c(list, currentVar)
}
I have checked that the function correctly adds elements to the list when it should. The problem is that the list loses elements due to recursion. The elements added during one inner recursive call are not saved for another recursive call.
If this was in C++, I could just use a pointer; the same for Java. However, I do not understand how to handle this error in R.
R does something like pass-by-value, so you can't modify (most) existing objects just by passing them into a function. If you want to add on to something recursively, one trick would be to use an environment instead, which get passed by reference. This can easily be coerced to list when you're done.
parseVariables <- function(formulaPart, myList){
for(currentVar in as.list(formulaPart)) {
if(typeof(currentVar) == 'language') {
parseVariables(currentVar, myList)
}
else {
if(! toString(currentVar) %in% c(':', '+', '~'))
assign(toString(currentVar), currentVar, myList)
}
}
}
f1 <- z ~ a:b + x
f2 <- z ~ x + y
myList <- new.env()
parseVariables(f1, myList)
parseVariables(f2, mylist)
ls(myList)
# [1] "a" "b" "x" "z"
as.list(myList)
# $x
# x
#
# $z
# z
#
# $a
# a
#
# $b
# b

Writing functions to handle multiple data types in R/Splus?

I would like to write a function that handles multiple data types. Below is an example that works but seems clunky. Is there a standard (or better) way of doing this?
(It's times like this I miss Matlab where everything is one type :>)
myfunc = function(x) {
# does some stuff to x and returns a value
# at some point the function will need to find out the number of elements
# at some point the function will need to access an element of x.
#
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
x.vec <- as.vector(as.matrix(as.data.frame(x)))
n <- length(x.vec)
ret <- x.vec[n/3] # this line only for concreteness
return(ret)
}
Use S3 methods. A quick example to get you started:
myfunc <- function(x) {
UseMethod("myfunc",x)
}
myfunc.data.frame <- function(x) {
x.vec <- as.vector(as.matrix(x))
myfunc(x.vec)
}
myfunc.numeric <- function(x) {
n <- length(x)
ret <- x[n/3]
return(ret)
}
myfunc.default <- function(x) {
stop("myfunc not defined for class",class(x),"\n")
}
Two notes:
The ... syntax passes any additional arguments on to functions. If you're extending an existing S3 method (e.g. writing something like summary.myobject), then including the ... is a good idea, because you can pass along arguments conventionally given to the canonical function.
print.myclass <- function(x,...) {
print(x$keyData,...)
}
You can call functions from other functions and keep things nice and parsimonious.
Hmm, your documentation for the function is
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
and if one supplies an object as you claim is require, isn't it already a vector and not a matrix or a data frame, hence obviating the need for separate methods/specific handling?
> dat <- data.frame(A = 1:10, B = runif(10))
> class(dat[,1])
[1] "integer"
> is.vector(dat[,1])
[1] TRUE
> is.vector(dat$A)
[1] TRUE
> is.numeric(dat$A)
[1] TRUE
> is.data.frame(dat$A)
[1] FALSE
I would:
myfunc <- function(x) {
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
n <- length(x)
ret <- x[n/3] # this line only for concreteness
return(ret)
}
> myfunc(dat[,1])
[1] 3
Now, if you want to handle different types of objects and extract a column, then S3 methods would be a way to go. Perhaps your example is over simplified for actual use? Anyway, S3 methods would be something like:
myfunc <- function(x, ...)
UseMethod("myfunc", x)
myfunc.matrix <- function(x, j = 1, ...) {
x <- x[, j]
myfunc.default(x, ...)
}
myfunc.data.frame <- function(x, j = 1, ...) {
x <- data.matrix(x)
myfunc.matrix(x, j, ...)
}
myfunc.default <- function(x, ...) {
n <- length(x)
x[n/3]
}
Giving:
> myfunc(dat)
[1] 3
> myfunc(data.matrix(dat))
[1] 3
> myfunc(data.matrix(dat), j = 2)
[1] 0.2789631
> myfunc(dat[,2])
[1] 0.2789631
You probably should try to use an S3 method for writing a function that will handle multiple datatypes.
A good reference is here: http://www.biostat.jhsph.edu/~rpeng/biostat776/classes-methods.pdf

Resources