How to use data.table inside a function?

As a minimal working example, I want to be able to dynamically pass expressions to a data.table object to create new columns or modify existing ones:
library(data.table)
dt <- data.table(x = 1, y = 2)
dynamicDT <- function(...) {
dt[, list(...)]
}
dynamicDT(z = x + y)
I was expecting:
z
1: 3
but instead, I get the error:
Error in eval(expr, envir, enclos) : object 'x' not found
So how can I fix this?
Attempts:
I've seen this post, which suggests using quote or substitute, but
> dynamicDT(z = quote(x + y))
Error in `rownames<-`(`*tmp*`, value = paste(format(rn, right = TRUE), :
length of 'dimnames' [1] not equal to array extent
or
> dynamicDT <- function(...) {
+ dt[, list(substitute(...))]
+ }
> dynamicDT(z = x + y)
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, :
first argument must be atomic
haven't worked for me.

This should be a better alternative to David's answer:
dynamicDT <- function(...) {
dt[, eval(substitute(...))]
}
dynamicDT(z := x + y)
# x y z
#1: 1 2 3
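If several expressions need to be passed at once, or the table should be an argument rather than a global, one variation is to capture the dots as an unevaluated list(...) call and evaluate that in j. This is only a minimal sketch: dynamicDT2 and j_expr are names made up for this illustration, and the table is passed in explicitly.

```r
library(data.table)

# Hypothetical variant of dynamicDT: takes the table as an argument and
# accepts any number of named expressions.
dynamicDT2 <- function(dt, ...) {
  # substitute() turns the dots into the unevaluated call list(z = x + y, ...)
  j_expr <- substitute(list(...))
  # eval() in j makes data.table evaluate the captured call against the columns
  dt[, eval(j_expr)]
}

dt <- data.table(x = 1, y = 2)
res <- dynamicDT2(dt, z = x + y, w = x - y)
res
```

Because the call is built with list(...), this form returns a new table with only the computed columns; it does not modify dt by reference the way := does.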

You will need to use the eval(parse(text = )) combination: parse transforms the string into an expression, and eval evaluates it.
library(data.table)
dt <- data.table(x = 1, y = 2)
dynamicDT <- function(temp = "") {
dt[, eval(parse(text = temp))]
}
In order to get your previous desired output
dynamicDT("z := x + y")
## x y z
## 1: 1 2 3
In order to get your current desired output
dynamicDT("z = x + y")
## [1] 3
In order to parse multiple arguments you can do
dynamicDT('c("a","b") := list(x + y, x - y)')
## x y a b
##1: 1 2 3 -1

Related

Function in data.table with two columns as arguments

I have the following function:
DT <- data.table(col1 = 1:4, col2 = c(2:5))
fun <- function(DT, fct){
DT_out <- DT[,new_col := fct]
return(DT_out)
}
fun(DT, fct = function(x = col1, y = col2){y - x})
In reality I have some processing before and after this code snippet, so I do not want to use the statement DT[,new_col := fct] directly with a fixed fct (the fct should be flexible). I know this question is very similar to this one, but I cannot figure out how to reformulate the code so that the function can take two columns as arguments. The code above gives the error:
Error in `[.data.table`(DT, , `:=`(new_col, fct)) :
RHS of assignment is not NULL, not an an atomic vector (see ?is.atomic) and not a list column.
One option, if you don't mind adding quotes around the variable names:
fun <- function(DT, fun, ...){
fun_args <- c(...)
DT[,new_col := do.call(fun, setNames(mget(fun_args), names(fun_args)))]
}
fun(DT, fun = function(x, y){y - x}, x = 'col1', y = 'col2')
DT
# col1 col2 new_col
# 1: 1 2 1
# 2: 2 3 1
# 3: 3 4 1
# 4: 4 5 1
Or use .SDcols (same result as above)
fun <- function(DT, fun, ...){
fun_args <- c(...)
DT[, new_col := do.call(fun, setNames(.SD, names(fun_args))),
.SDcols = fun_args]
}

`:=` used for multiple simultaneous assign in data table does not respect updated values

When := is used to assign several columns at once in a data.table, it does not use the updated values. Below, the column x is incremented, and I then intend to assign the updated value of x to y. Why does y not get the updated value?
> z = data.table(x = 1:5, y= 1:5)
> z[, `:=` (x = x + 1, y = x)]
> # Actual
> z
x y
1: 2 1
2: 3 2
3: 4 3
4: 5 4
5: 6 5
> # Expected
> z
x y
1: 2 2
2: 3 3
3: 4 4
4: 5 5
5: 6 6
Here are two more alternatives for you to consider. As noted, data.table does not update columns sequentially within a single call the way dplyr::mutate does, so y = x still refers to the original z$x in the second part of your statement. You can consider filing an issue if you strongly prefer that behavior.
Explicitly assign the new x inline:
z[, `:=` (x = (x <- x + 1), y = x)]
In the environment where j is evaluated, now an object x is created to overwrite z$x temporarily. This should be very similar to what dplyr is doing internally -- evaluating the arguments of mutate sequentially and updating the column values iteratively.
Switch to LHS := RHS form (see ?set):
z[ , c('x', 'y') := {
x = x + 1
.(x, x)
}]
. is shorthand in data.table for list. In LHS := RHS form, RHS must evaluate to a list; each element of that list will be one column in the assignment.
More compactly:
z[ , c('x', 'y') := {x = x + 1; .(x, x)}]
; allows you to write multiple statements on the same line (e.g. 3+4; 4+5 will run 3+4 then 4+5). { creates a way to wrap multiple statements and return the final value, see ?"{". Implicitly you're using this whenever you write if (x) { do_true } else { do_false } or function(x) { function_body }.
The value of x is not updated while y is being calculated. You can repeat the expression used for x when assigning y:
library(data.table)
z[, `:=` (x = x + 1, y = x + 1)]
Or update it separately.
z[, x := x + 1][, y:= x]
This behavior differs from mutate in dplyr, where the following works:
library(dplyr)
z %>% mutate(x = x + 1, y = x)
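The difference between the simultaneous and the stepwise forms can be checked directly. A minimal sketch using fresh copies of the table from the question; the names z_simul, z_chain, and z_brace are introduced here only for illustration.

```r
library(data.table)

# Simultaneous form: both right-hand sides see the original x
z_simul <- data.table(x = 1:5, y = 1:5)
z_simul[, `:=`(x = x + 1L, y = x)]

# Chained form: the second [ ] call sees the already updated x
z_chain <- data.table(x = 1:5, y = 1:5)
z_chain[, x := x + 1L][, y := x]

# LHS := RHS form: x is updated as a local variable inside the braces
z_brace <- data.table(x = 1:5, y = 1:5)
z_brace[, c("x", "y") := { x <- x + 1L; list(x, x) }]

z_simul$y  # 1 2 3 4 5 (old x)
z_chain$y  # 2 3 4 5 6 (updated x)
z_brace$y  # 2 3 4 5 6 (updated x)
```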

Creating an object of a custom class and assigning methods to it

I am trying to create an object of class "weeknumber", which would have the following format: "2019-W05"
Additionally, I need to be able to use this object with the + and - operators, similar to how "Date" variables behave in base R. For instance:
"2019-W05" + 1 = "2019-W06"
"2019-W01" - 1 = "2018-W52"
"2019-W03" - "2019-W01" = 2
I managed to partially achieve my goal. This is what I got so far:
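library(lubridate)  # assumed source of isoyear() and isoweek()
library(ISOweek)    # provides ISOweek2date()
weeknum <- function(date){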
# Function that creates weeknumber object from a date
weeknumber <- paste(isoyear(date), formatC(isoweek(date), width = 2, format = "d", flag = "0"), sep = "-W")
class(weeknumber) <- c("weeknumber", class(weeknumber))
weeknumber
}
week2date <- function(weeknumber, weekday = 4) {
# Wrapper around ISOweek2date function from the 'ISOweek' package
ISOweek2date(paste(weeknumber, weekday, sep = "-"))
}
"+.weeknumber" <- function(x, ...) {
# Creating a method for addition
x <- week2date(x) + sum(...)*7
weeknum(x)
}
"-.weeknumber" <- function(x, ...) {
# Creating a method for subtraction
x <- week2date(x) - sum(...)*7
weeknum(x)
}
What works:
> x <- weeknum("2019-01-01")
> x
[1] "2019-W01"
attr(,"class")
[1] "weeknumber" "character"
> x + 1
[1] "2019-W02"
attr(,"class")
[1] "weeknumber" "character"
> x - 1
[1] "2018-W52"
attr(,"class")
[1] "weeknumber" "character"
Works as expected! The only annoying thing is that calling the variable also prints out the attributes. Is there any way to hide them in the default printout?
What doesn't work:
> 1 + x
Error: all(is.na(weekdate) | stringr::str_detect(weekdate, kPattern)) is not TRUE
> y <- weeknum("2019-03-01")
> y - x
Error in as.POSIXlt.default(x) :
do not know how to convert 'x' to class “POSIXlt”
Any help appreciated!
Edit:
I figured out how to make 1 + x (where x is a weeknumber) work. It is not very elegant, but it does the job.
"+.weeknumber" <- function(...) {
# Creating a method for addition
vector <- c(...)
week_index <- which(unlist(lapply(list(...), function(x) class(x)[1]))=="weeknumber")
week <- vector[week_index]
other_values <- sum(as.numeric(c(...)[-week_index]))
x <- week2date(week) + other_values*7
weeknum(x)
}
> x <- weeknum("2019-01-01")
> x
[1] "2019-W01"
> 5 + x + 1 + 2 - 1
[1] "2019-W08"
For the first part, define a custom print method for your class:
print.weeknumber <- function(x,...)
{
attributes(x) <- NULL
print(x)
}
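The second part of the question (y - x between two weeknumbers) could be approached by letting the subtraction method check the class of its second operand. What follows is only a base-R sketch under stated assumptions: as_weeknumber and week_thursday are hypothetical stand-ins for the question's weeknum and week2date, written without lubridate or ISOweek, and format() is assumed to support the ISO 8601 codes "%G" and "%V" on the platform in use. The print method here also returns its argument invisibly, as print methods conventionally do.

```r
# Hypothetical constructor: ISO year and week of a date, e.g. "2019-W01"
as_weeknumber <- function(date) {
  structure(format(as.Date(date), "%G-W%V"),
            class = c("weeknumber", "character"))
}

# Hypothetical helper: Thursday of the given ISO week.
# ISO week 1 is the week that contains 4 January.
week_thursday <- function(w) {
  w <- unclass(w)
  year <- as.integer(substr(w, 1, 4))
  week <- as.integer(substr(w, 7, 8))
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  monday1 <- jan4 - (as.integer(format(jan4, "%u")) - 1L)  # Monday of week 1
  monday1 + (week - 1L) * 7L + 3L
}

"-.weeknumber" <- function(e1, e2) {
  if (inherits(e2, "weeknumber")) {
    # weeknumber - weeknumber: number of weeks between the two
    return(as.numeric(week_thursday(e1) - week_thursday(e2)) / 7)
  }
  # weeknumber - number: shift back by that many weeks
  as_weeknumber(week_thursday(e1) - as.numeric(e2) * 7)
}

print.weeknumber <- function(x, ...) {
  print(unclass(x), ...)  # drop the class attribute for display only
  invisible(x)
}

x <- as_weeknumber("2019-01-01")
y <- as_weeknumber("2019-03-01")
x      # [1] "2019-W01"
x - 1  # [1] "2018-W52"
y - x  # [1] 8
```

This sketch does not handle a number on the left-hand side (1 - x) or unary minus; those cases would need extra checks.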

purrr::pmap with user-defined functions and named list

The following piece of code works as expected:
library(tidyverse)
tib <- tibble(x = c(1,2), y = c(2,4), z = c(3,6))
tib %>% pmap(c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
But if I define the function
my_c_1 <- function(u, v, w) c(u, v, w)
I get an error:
tib %>% pmap(my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
# unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
Equivalently, for a named list with the base vector function all works well:
lili_1 <- list(x = list(1,2), y = list(2,4), z = list(3,6))
pmap(lili_1, c)
#[[1]]
#x y z
#1 2 3
#
#[[2]]
#x y z
#2 4 6
And with the user-defined function I get the same error:
pmap(lili_1, my_c_1)
#Error in .f(x = .l[[c(1L, i)]], y = .l[[c(2L, i)]], z = .l[[c(3L, i)]], :
#unused arguments (x = .l[[c(1, i)]], y = .l[[c(2, i)]], z = .l[[c(3, i)]])
However, for an un-named list with the user-defined function, it works:
lili_2 <- list(list(1,2), list(2,4), list(3,6))
pmap(lili_2, my_c_1)
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 2 4 6
I don't quite understand why things break with named lists and user-defined functions. Any insight?
BTW, I found a temporary workaround by defining:
my_c_2 <- function(...) c(...)
Then all works well, even with named lists... which leaves me even more puzzled.
This is in the spirit of a minimal reproducible example. In my current working code I would like to be able to pipe tibbles to pmap with my more general defined function without using the ... workaround for my variables.
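Your function my_c_1 has arguments u, v, w, but you pass a list with names x, y, z. pmap matches the names of the list to the argument names of the function, so the call fails with "unused arguments". Functions that take ... (such as base's c, or your my_c_2) accept any names, which is why they work. If you don't want to use ..., make sure the names of the list match the argument names of your function.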
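Two ways of making the names line up can be sketched as follows, using a plain named list in place of the tibble. The variable names lili, by_name, and by_position are introduced here for illustration only.

```r
library(purrr)

my_c_1 <- function(u, v, w) c(u, v, w)
lili <- list(x = c(1, 2), y = c(2, 4), z = c(3, 6))

# Option 1: rename the list elements to match the function's arguments
by_name <- pmap(setNames(lili, c("u", "v", "w")), my_c_1)

# Option 2: drop the names so the arguments are matched by position
by_position <- pmap(unname(lili), my_c_1)

by_name[[1]]      # 1 2 3
by_position[[2]]  # 2 4 6
```

Matching by name is the safer choice when the column order may change; matching by position is shorter but depends on the order of the list.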

correct braces placement in := within data.table

Here is an example of a problem I am having. Am I misusing := or is this a bug?
require(data.table)
x <- data.table(a = 1:4)
# this does not work
x[ , {b = a + 3; `:=`(c = b)}]
# Error in `:=`(c = b) : unused argument(s) (c = b)
# this works fine
x[ ,`:=`(c = a + 3)]
This is not a bug; the braces are just in the wrong place.
Use the braces to wrap only the RHS argument in `:=`(LHS, RHS).
Example:
# sample data
x <- data.table(a = 1:4)
# instead of:
x[ , {b = a + 3; `:=`(c, b)}] # <~~ Notice braces are wrapping LHS AND RHS
# use this:
x[ , `:=`(c, {b = a + 3; b})] # <~~ Braces wrapping only RHS
x
# a c
# 1: 1 4
# 2: 2 5
# 3: 3 6
# 4: 4 7
More succinctly and naturally, you are probably looking for this:
x[ , c := {b = a + 3; b}]
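The same pattern extends to several new columns: the braces still wrap only the RHS, and their last expression is a list with one element per column. A minimal sketch, in which the column name d and the doubling step are made up for illustration:

```r
library(data.table)

x <- data.table(a = 1:4)

# b exists only inside the braces; the final list supplies the two new columns
x[, c("c", "d") := { b <- a + 3; list(b, b * 2) }]
x
```

The intermediate b is not added to the table; only the columns named on the LHS are created.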
Update from Matthew
Exactly. Using := in other incorrect ways gives this (long) error:
x := 1
# Error: := is defined for use in j only, and (currently) only once; i.e.,
# DT[i,col:=1L] and DT[,newcol:=sum(colB),by=colA] are ok, but not
# DT[i,col]:=1L, not DT[i]$col:=1L and not DT[,{newcol1:=1L;newcol2:=2L}].
# Please see help(":="). Check is.data.table(DT) is TRUE.
but not in the case that the question showed, which gives just:
x[ , {b = a + 3; `:=`(c = b)}]
# Error in `:=`(c = b) : unused argument(s) (c = b)
I've just changed this in v1.8.9. Both these incorrect ways of using := now give a more succinct error:
x[ , {b = a + 3; `:=`(c = b)}]
# Error in `:=`(c = b) :
# := and `:=`(...) are defined for use in j only, in particular ways. See
# help(":="). Check is.data.table(DT) is TRUE.
and we'll embellish ?":=". Thanks @Alex for highlighting!
