data.table by with substitute, eval and deparse

data.table by with substitute, eval and deparse - r

Problem 1: Why it is not consistent?
dt <- data.table(x=1:4, y=c(1,1,2,2), z=c(1,2,1,2))
test1 <- function(dt, a){
t <- deparse(substitute(a))
dt[,list(x=sum(x)), by=t]
}
test1(dt, y) # Works well
y x
1: 1 3
2: 2 7
test2 <- function(dt, a){
dt[,list(x=sum(x)), by=deparse(substitute(a))]
}
test2(dt, y)
# Error: 'by' appears to evaluate to column names but isn't c() or key().
Problem 2:
It seems I can do the following in both frames? Why is that? Which one should I use?
test1 <- function(dt, a){
dt[,list(x=sum(x)), by=eval(substitute(a))]
}
test1(dt, y)
substitute x
1: 1 3
2: 2 7
>
test2 <- function(dt, a){
dt[,list(x=sum(x)), by=eval(substitute(a), parent.frame())]
}
test2(dt, y)
substitute x
1: 1 3
2: 2 7

You didn't reproduce the full error:
test2(dt, y)
Error in [.data.table(dt, , list(x = sum(x)), by = deparse(substitute(a))) :
'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(deparse(substitute(a))) should work. This is for efficiency so data.table can detect which columns are needed.
As suggested (or perhaps merely hinted at) you could get success by just enclosing in c
test2 <- function(dt, a){
dt[,list(x=sum(x)), by=c(deparse(substitute(a)))]
}
> test2(dt, y)
y x
1: 1 3
2: 2 7
I think the c() forces an evaluation.

Related

Assign values to slice of data.frame column returned from function in R

I have a function that does some (complex) selecting of rows, and returns the corresponding values from one column. I then want to overwrite these values.
MWE
x = data.frame(a=c(1,2,3),b=c(4,5,6))
f = function(x,i){ return(x[x$a==i,'b']) }
f(x,2) <- 3
throws:
Error in f(x, 2) = 3 : could not find function "f<-"
Is there a way to assign these values from the function return?
No tidyverse please. Only base R.

The function should be
f <- function(x, i, val) {
if(missing(val)) {
x<- x[x$a==i,]
} else {
x$b[x$a ==i] <- val
}
return(x)
}
Then, when we run the code
> f(x, 2, 3)
a b
1 1 4
2 2 3
3 3 6
> f(x, 2) # missing the val
a b
2 2 5
If we want to update the object, use <-
x <- f(x, 2, 3)

An alternative would be to write your function in two ways: the original one, and a specific assignment function, so that the R parser will work on your original syntax:
f <- function(x, i) {
return(x[x$a == i, 'b'])
}
`f<-` <- function(x, i, value) {
x[x$a == i, 'b'] <- value
return(x)
}
So now you can do:
f(x, 2)
#> [1] 5
f(x, 2) <- 3
x
#> a b
#> 1 1 4
#> 2 2 3
#> 3 3 6
f(x, 2)
#> [1] 3

functional programing, iterations of a loop within child frame

I would like to build a function to use within a loop function, however the child cannot find the i from the parent function's for (i in 1:3). How would I avoid this problem? I was thinking it could be be solved with assign(X, envir=sys.frame(which=-1)) , although I'm not sure where to place it or the frame to select in this case...
#simplified issue
parent.fxn = function(X) {
for (i in 1:3)
child.fxn(X)
}
child.fxn = function(X) {
return(X[i])
}
#ex. run_script
#parent.fxn(1:10)
#Error in child.fxn(X) : object 'i' not found
For reference, here is a version of the actual script I tried to write (below)... the error from my script (below) was very slightly different from my simplified example (above), although I'm not sure why.
library(cluster)
library(data.table)
library(magrittr)
data = data.table(x=runif(300), y=runif(300), z=runif(300))
master.fxn = function(DT, nlvls=3, data.cols=c("x", "y", "z")) {
Lvls = paste0("Level_", 1:nlvls);
for (i in 1:nlvls) {
core.fxn(DT, Lvls, return.k=TRUE)
}
}
core.fxn = function(X, mkey, ...) {
X[, c(mkey[i]):=cluster.fxn(.SD, ...), .SDcols=data.cols, by=c(mkey[0:(i-1)])] %>%
setkeyv(c(mkey[1:i]))
}
cluster.fxn = function(X, return.ac=FALSE, return.k=FALSE) {
a = agnes(X, metric="euclidean", method="ward", stand=TRUE)
if (return.ac) return(a$ac)
if (return.k) return(cutree(a, k=2))
}
#ex. run_script
#master.fxn(data)
#Error in eval(bysub, parent.frame(), parent.frame()) : object 'i' not found
And finally, here is a working version of the actual script, which does not use a function that utilizes [i] within the loop, but achieves the desired outcome. These scripts are condensed to exemplify the error. It would really simplify things in the final product if I could get the core.fxn to work within the master.fxn loop.
library(cluster)
library(data.table)
library(magrittr)
data = data.table(x=runif(300), y=runif(300), z=runif(300))
master.fxn = function(DT, nlvls=3, data.cols=c("x", "y", "z")) {
Lvls = paste0("Level_", 1:nlvls);
for (i in 1:nlvls) {
DT[, c(Lvls[i]):=cluster.fxn(.SD, return.k=TRUE), .SDcols=data.cols, by=c(Lvls[0:(i-1)])] %>%
setkeyv(c(Lvls[1:i]))
}
}
cluster.fxn = function(X, return.ac=FALSE, return.k=FALSE) {
a = agnes(X, metric="euclidean", method="ward", stand=TRUE)
if (return.ac) return(a$ac)
if (return.k) return(cutree(a, k=2))
}
#ex. run_script
#> data
# x y z
# 1: 0.1934689 0.67631296 0.3083592
# 2: 0.5267910 0.93186454 0.9583132
# 3: 0.5533244 0.37712457 0.4022132
# 4: 0.1886627 0.07535931 0.1171205
# 5: 0.7499003 0.90682684 0.6104284
# ---
#296: 0.7196245 0.80206991 0.6657839
#297: 0.2453930 0.06807955 0.8126690
#298: 0.3481978 0.23024162 0.4734052
#299: 0.2123976 0.27191432 0.1753336
#300: 0.7312911 0.89491793 0.5417281
#
#>master.fxn(data)
#data[, .SD[1], by=.(Level_1, Level_2, Level_3)]
#Level_1 Level_2 Level_3 x y z
#1: 1 1 1 0.0584953 0.77933040 0.76432541
#2: 1 1 2 0.1814877 0.65263178 0.41425295
#3: 1 2 1 0.9932725 0.99409350 0.96849477
#4: 1 2 2 0.9102010 0.76071068 0.69283525
#5: 2 1 1 0.9040033 0.85361443 0.30636660
#6: 2 1 2 0.8026868 0.08595128 0.43176372
#7: 2 2 1 0.2167962 0.73551203 0.01174373
#8: 2 2 2 0.5592571 0.84508641 0.37382253
#>

How to write a R function to ensure that we get a variable but not a col of a data.table?

R Code
library(data.table)
x <- 4
f1 <- function(){
x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==get("x", envir=parent.env(environment()))]
}
f1()
I got this:
x y
1: 1 1
2: 1 5
3: 1 9
f2 is a new function that remove x <- 1 in the function.
f2 <- function(){
#x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==get("x", envir=parent.env(environment()))]
}
f2()
I got this:
x y
1: 4 4
2: 4 8
3: 4 12
that's right, my question is how to write a function to replace the get("x", envir=parent.env(environment()))?
Thanks!

I only just realized that the OP is grabbing x from the function's enclosing environment instead of passing it as an argument. I consider this bad practice and don't really have a recommendation for that case. I might delete this answer (which only covers passing x to the function) if it's too much of a distraction.
library(data.table)
dt <- data.table(x=1:4,y=1:12)
ff = function(x, ...){
mDT = data.table(x)
dt[mDT, on=.(x), ...]
}
ff(4L, verbose = TRUE)
# Calculated ad hoc index in 0 secs
# Starting bmerge ...done in 0 secs
# x y
# 1: 4 4
# 2: 4 8
# 3: 4 12
This only addresses the OP's specific example, of DT[x == get("x", ...)], and not broader expressions. For those, constructing and evaluating an expression should work:
fs = function(x, ...){
e = substitute(x == ..x, list(..x = x))
dt[eval(e), ...]
}
fs(4L, verbose = TRUE)
# Creating new index 'x'
# Starting bmerge ...done in 0 secs
# x y
# 1: 4 4
# 2: 4 8
# 3: 4 12
fs(3L, verbose = TRUE)
# Using existing index 'x'
# Starting bmerge ...done in 0 secs
# x y
# 1: 3 3
# 2: 3 7
# 3: 3 11
The verbose output indicates that fs creates indices, which can be helpful for speed. See vignette("datatable-secondary-indices-and-auto-indexing").
Eventually, there might be syntax so we can simply write ...
dt[..x == x]
perhaps using the proposed inherits = TRUE argument from the link for safety (so that x must be a column and either (i) x must exist in the parent environment or ..x must be a column name).

#Frank, Thanks! Based on this post variable usage in data.table, I wrote a function:
`..` <- function(x){
stopifnot(inherits(x, "character"))
stopifnot(length(x)==1)
get(x, parent.frame(4))
}
x <- 4
f1 <- function(){
x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==..("x")]
}
f1()
f2 <- function(){
#x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==..("x")]
}
f2()
Both f1 and f2 got the correct results!
Why parent.frame(4)?
We see the code first:
current_frame <- sys.nframe()
dt <- data.table()
dt[, sys.nframe() - current_frame]
We got 4, this should be the reason.

I found the old solution does not work for data.table 1.11.4, and I wrote a new one:
.. <- function (x, env = parent.frame())
{
stopifnot(inherits(x, "character"))
stopifnot(length(x) == 1)
get(x, envir = parent.env(env))
}
x <- 4
f1 <- function(){
x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==..("x")]
}
f1j <- function(){
x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[, ..("x")]
}
f2 <- function(){
#x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[x==..("x")]
}
f2j <- function(){
#x <- 1
dt <- data.table(x=1:4,y=1:12)
dt[,..("x")]
}
stopifnot(all(f1()$y==c(1,5,9)))
stopifnot(all(f1j()==c(1)))
stopifnot(all(f2()$y==c(4,8,12)))
stopifnot(all(f2j()==c(4)))
I have tested that this worked for data.table_1.10.4-3 and data.table_1.11.4.
Actually, I am confused about R's parent.frame and how to find the right variables in R, I just tested some possible ways until I got the expected results.

Vector names error

Im trying to set names for a vector using the function names, but R gives me an error. I want to create a vector with function and then set name for each number in that vector. I want to do all this step by writing just one order (for example v(x)). This is example of my code script
v <- c(2,6,5)
d <- function(x) x*9
names(d(x))<-paste("q=", 1:3, sep="")
and R says
Error in names(d(x)) <- paste("q=", 1:3, sep = "") :
could not find function "d<-"

I don't really know what you are doing, but assuming this approximates it I can reproduce the error:
qn <- 1:11
div_1 <- function(x) { x <- x + 1 }
div_2 <- function(x) { x <- x + 2 }
div<- function(x) c(div_1(x)[1],div_2(x),div_1(x)[2:10])
x <- 1
names(div(x))<- paste("q=", qn, sep="" )
# Error in names(div(x)) <- paste("q=", qn, sep = "") :
# could not find function "div<-"
and I can fix it with this (breaking it into two steps):
qn <- 1:11
div_1 <- function(x) { x <- x + 1 }
div_2 <- function(x) { x <- x + 2 }
div<- function(x) c(div_1(x)[1],div_2(x),div_1(x)[2:10])
x <- 1
v <- div(x)
names(v)<- paste("q=", qn, sep="" )
# q=1 q=2 q=3 q=4 q=5 q=6 q=7 q=8 q=9 q=10 q=11
# 2 3 NA NA NA NA NA NA NA NA NA
It may be a bug, or a limitation in assigning names to a temporary variable (note that the result gets thrown out in your version). Out of curiousity, what are you doing with div?

R ff ffbase ffwhich error in a function call?

Here is My code call ffwhich in a function:
library(ffbase)
rm(a,b)
test <- function(x) {
a <- 1
b <- 3
ffwhich(x, x > a & x < b)
}
x <- ff(1:10)
test(x)
Error in eval(expr, envir, enclos) (from <text>#1) : object 'a' not found
traceback()
6: eval(expr, envir, enclos)
5: eval(e)
4: which(eval(e))
3: ffwhich.ff_vector(x, x > a & x < b)
2: ffwhich(x, x > a & x < b) at #4
1: test(x)
It may caused by lazy evaluation? The eval() can not find the a and b which is bounded in function test. How can I use ffwhich in a function?
R 2.15.2
ffbase 0.6-3
ff 2.2-10
OS opensuse 12.2 64 bit

Yes, it looks like an eval issue like Arun is indicating. I normally use the following when using ffwhich which is like an eval.
library(ffbase)
rm(a,b)
test <- function(x) {
a <- 1
b <- 3
idx <- x > a & x < b
idx <- ffwhich(idx, idx == TRUE)
idx
}
x <- ff(1:10)
test(x)

I was having the same problem, and the answer given was not solving it, because we can not pass the argument "condition" to the function.
I just got a way to do that.
Here it is ::
require(ffdf)
# the data ::
x <- as.ffdf( data.frame(a = c(1:4,1),b=5:1))
x[,]
# Now the function below is working ::
idx_ffdf <- function(data, condition){
exp <-substitute( (condition) %in% TRUE)
# substitute will take the value of condition (non-evaluated).
# %in% TRUE makes the condition be false when there is NAs...
idx <- do.call(ffwhich, list(data, exp) ) # here is the trick: do.call !!!
return(idx)
}
# testing :
idx <- idx_ffdf(x,a==1)
idx[] # gives the correct 1,5 ...
idx <- idx_ffdf(x,b>3)
idx[] # gives the correct 1,2 ...
Hope this helps somebody !

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

data.table by with substitute, eval and deparse - r

Related

Assign values to slice of data.frame column returned from function in R

functional programing, iterations of a loop within child frame

How to write a R function to ensure that we get a variable but not a col of a data.table?

Vector names error

R ff ffbase ffwhich error in a function call?

Categories

Resources