R - How to pass the environment of a data.table to a function? - r

I'd like to do the following. For example, I have this data.table:
dt <- data.table(a=1:3, b=5:7, c=10:8)
# a b c
#1: 1 5 10
#2: 2 6 9
#3: 3 7 8
and I want to pass the environment of one row at a time to a function, for example:
f <- function(a,b,c){
x <- a*b
y <- a*c
z <- a/b
return( x + y + z)
}
I know I could use mapply in this case to handle this multivariate function, but in my real use case I have a function that manipulates almost 150 variables of a data.table, and I don't want to pass the variable names one by one. I also tried some .SD manipulations, but that didn't work either.
I would like to pass the row number of the data.table and have the function find the objects a, b and c inside the data.table environment for that row.
Something similar to this:
f <- function(row_id){
# set function parent env as data.table[row_id]
# and *a = data.table[row_id, a]* and successively to b and c...
x <- a*b
y <- a*c
z <- a/b
return( x + y + z)
}

One way would be to adapt the function to take in a given data.table and a row and output your x + y + z:
f <- function(dataTable,row_id){
a <- dataTable[row_id,a]
b <- dataTable[row_id,b]
c <- dataTable[row_id,c]
x <- a*b
y <- a*c
z <- a/b
return( x + y + z)
}
If you call f(dt), it'll give you all of the x + y + z values; if you call f(dt, 1), it'll return the value for the first row only.
EDIT:
Assuming that your column names are the variable names you want to assign, you could try this:
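f <- function(dataTable,row_id){
# create one local variable per column, holding that column's value for row_id
for(i in colnames(dataTable)){
# [[i]] extracts the column as a plain vector (..i would give a one-column data.table)
assign(i, dataTable[[i]][row_id])
}
x <- a*b
y <- a*c
z <- a/b
return( x + y + z)
}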
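Another option, sketched here under some assumptions, is to evaluate the function body with the selected row acting as the environment, using base R's with(). The name f_row is hypothetical, and the example uses a plain data.frame so that it runs without extra packages; the same call should work on a data.table, since both are lists of columns and both accept the [row_id, ] form for selecting rows.

```r
# Hypothetical helper: evaluate the body with one row's columns visible by name.
f_row <- function(dataTable, row_id) {
  # with() looks up a, b and c in the selected row before the calling scope,
  # so no column has to be named in the argument list.
  with(dataTable[row_id, ], {
    x <- a * b
    y <- a * c
    z <- a / b
    x + y + z
  })
}

df <- data.frame(a = 1:3, b = 5:7, c = 10:8)
f_row(df, 1)    # 1*5 + 1*10 + 1/5 = 15.2
f_row(df, 1:3)  # vectorised over several rows
```

With 150 columns the body can still refer to each of them by name, because with() resolves the names against the row first.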

Related

`:=` used for multiple simultaneous assign in data table does not respect updated values

The column x is incremented, and then I intend to assign the updated value of x to y. Why is the result not what I intended?
> z = data.table(x = 1:5, y= 1:5)
> z[, `:=` (x = x + 1, y = x)]
> # Actual
> z
x y
1: 2 1
2: 3 2
3: 4 3
4: 5 4
5: 6 5
> # Expected
> z
x y
1: 2 2
2: 3 3
3: 4 4
4: 5 5
5: 6 6
Here are two more alternatives for you to consider. As noted, data.table doesn't do dynamic scoping in the way that dplyr::mutate does, so y = x still refers to the original z$x in the second part of your statement. You can consider filing an issue if you strongly prefer this behavior.
Explicitly assign the new x inline:
z[, `:=` (x = (x <- x + 1), y = x)]
In the environment where j is evaluated, now an object x is created to overwrite z$x temporarily. This should be very similar to what dplyr is doing internally -- evaluating the arguments of mutate sequentially and updating the column values iteratively.
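The effect of the inline assignment can be imitated in base R. This is only an analogy, assuming with() on a data.frame is an acceptable stand-in for the environment in which j is evaluated; it is not data.table's actual implementation, and res_plain and res_inline are hypothetical names.

```r
df <- data.frame(x = 1:5, y = 1:5)

# Both list elements are computed from the original column x.
res_plain <- with(df, list(x = x + 1, y = x))

# The inline assignment creates a temporary x in the evaluation environment,
# so the second element sees the incremented values.
res_inline <- with(df, list(x = (x <- x + 1), y = x))

res_plain$y   # 1 2 3 4 5
res_inline$y  # 2 3 4 5 6
df$x          # unchanged: 1 2 3 4 5
```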
Switch to LHS := RHS form (see ?set):
z[ , c('x', 'y') := {
x = x + 1
.(x, x)
}]
. is shorthand in data.table for list. In LHS := RHS form, RHS must evaluate to a list; each element of that list will be one column in the assignment.
More compactly:
z[ , c('x', 'y') := {x = x + 1; .(x, x)}]
; allows you to write multiple statements on the same line (e.g. 3+4; 4+5 will run 3+4 then 4+5). { creates a way to wrap multiple statements and return the final value, see ?"{". Implicitly you're using this whenever you write if (x) { do_true } else { do_false } or function(x) { function_body }.
The value of x is not updated while the value for y is being calculated. You can repeat the expression used for x when assigning y:
library(data.table)
z[, `:=` (x = x + 1, y = x + 1)]
Or update it separately.
z[, x := x + 1][, y:= x]
This behavior differs from dplyr's mutate, where the following works:
library(dplyr)
z %>% mutate(x = x + 1, y = x)

Simplifying matrix product with one unknown variable

I have to compute a product of 3 matrices D=ABC with:
A is a (1x3) matrix,
B is a (3x3) matrix,
C is a (3x1) matrix (and is equal to A', if it matters)
The result of this product is a simple value, and the calculation is very straightforward in R.
My problem is that there is one unknown, namely X, inside A and C, and I would like to get the result as a formula: D = ABC = f(X).
Is there any way I could achieve this with R?
Define D as shown below, where argument B is the square matrix and A is a function of x returning a vector. The drop() call turns the 1x1 matrix result into a plain number.
D <- function(B, A) function(x) drop(t(A(x)) %*% B %*% A(x))
# test
A <- function(x) seq(3) * x
B <- matrix(1:9, 3)
Dfun <- D(B, A)
Dfun(10)
## [1] 22800
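If the unknown enters A only as a scalar factor, as it does in the test above where A(x) = seq(3) * x, the product reduces to a constant times x^2, which gives an explicit formula. The sketch below relies on that assumption; v, k and Dformula are hypothetical names.

```r
# Assumption: A(x) = x * v for a fixed vector v, so
# t(x * v) %*% B %*% (x * v) = x^2 * (t(v) %*% B %*% v)
v <- seq(3)
B <- matrix(1:9, 3)

k <- drop(t(v) %*% B %*% v)  # constant coefficient, here 228
Dformula <- function(x) k * x^2

k             # 228
Dformula(10)  # 22800
```

If X appears in A in a more complicated way, this shortcut does not apply and the closure approach above is the more general one.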

Why does str() change evaluation in R

Using str() appears to change the evaluation. Why?
MWE:
f1 <- function(x, y = x) {
str(y)
x <- x + 1
y }
f1(1) # result is 1
f2 <- function(x, y = x) {
x <- x + 1
y }
f2(1) # result is 2
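Why does this happen? I tried to use the pryr library to debug it, but I cannot see the references being updated.
Lazy evaluation. What matters is when the default y = x is evaluated: it is evaluated right before the first statement that uses y, not when the function is called. The two function bodies therefore behave as if they were written like this: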
## f1
y <- x
str(y) ## first use of y
x <- x + 1
y
## f2
x <- x + 1
y <- x
y ## first use of y
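A small sketch of how to control this explicitly: base R's force() evaluates a promise at a chosen point, so it has the same effect on y that str(y) had in f1. The function names f_forced and f_lazy are hypothetical.

```r
f_forced <- function(x, y = x) {
  force(y)    # evaluate the default now, while x still has the caller's value
  x <- x + 1
  y
}

f_lazy <- function(x, y = x) {
  x <- x + 1
  y           # the default is evaluated here, after x was changed
}

f_forced(1)  # 1
f_lazy(1)    # 2
```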

Multidimensional Matrix Lookup, how to improve slow solution

Sometimes I want to transform several data columns (usually character or factor) into one new column (usually a number). I try to do this using a lookup matrix. For example, my dataset is
dset <- data.frame(
x=c("a", "a", "b"),
y=c("v", "w", "w"),
stringsAsFactors=FALSE
)
lookup <- matrix(c(1:4), ncol=2)
rownames(lookup) <- c("a", "b")
colnames(lookup) <- c("v", "w")
Ideally (for my purpose here), I would now do
transform(dset, z=lookup[x,y])
and get my new data column. While this works in the one-dimensional case, this fails here, as lookup[x,y] returns a matrix. I came up with this function, which looks rather slow:
fill_from_matrix <- function(m, ...) {
arg <- list(...)
len <- sapply(arg, length)
if(sum(diff(len))!=0) stop("differing lengths in fill_from_matrix")
if(length(arg)!=length(dim(m))) stop("differing dimensions in fill_from_matrix")
n <- len[[1]]
dims <- length(dim(m))
res <- rep(NA, n)
for (i in seq(1,n)) {
one_arg <- list(m)
for (j in seq(1,dims)) one_arg[[j+1]] <- arg[[j]][[i]]
res[i] <- do.call("[", one_arg)
}
return(res)
}
With this function, I can call transform and get the result I wanted:
transform(dset, z=fill_from_matrix(lookup,x,y))
# x y z
# 1 a v 1
# 2 a w 3
# 3 b w 4
However, I am not satisfied with the code and wonder if there is a more elegant (and faster) way to perform this kind of transformation. How do I get rid of the for loops?
This is quite easy, and I suspect fast, with base R indexing, because the "[" function accepts a two-column matrix for this precise purpose:
> dset$z <- lookup[ with(dset, cbind(x,y)) ]
> dset
x y z
1 a v 1
2 a w 3
3 b w 4
If you needed it as a specific function then:
lkup <- function(tbl, rowidx, colidx){ tbl[ cbind(rowidx, colidx)]}
zvals <- lkup(lookup, dset$x, dset$y)
zvals
#[1] 1 3 4
(I'm pretty sure you can also use three and four column matrices if you have arrays of those dimensions.)
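A quick check of the higher-dimensional case, using a made-up 2x2x2 array (arr, idx and the dimnames are hypothetical): an index matrix with one column per dimension selects one element per row.

```r
arr <- array(1:8, dim = c(2, 2, 2),
             dimnames = list(c("a", "b"), c("v", "w"), c("p", "q")))

# One row per element to look up, one column per array dimension.
idx <- cbind(c("a", "a", "b"),
             c("v", "w", "w"),
             c("p", "q", "q"))

arr[idx]
# [1] 1 7 8
```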
You can use inner_join from the dplyr library, with a data.frame instead of a matrix as the lookup table:
library(dplyr)
lookup = transform(expand.grid(c('a','b'),c('v','w')), v=1:4) %>%
setNames(c('x','y','val'))
inner_join(dset, lookup, by=c('x','y'))
# x y val
#1 a v 1
#2 a w 3
#3 b w 4
Another fast way is to use the data.table package, with my definition of lookup:
library(data.table)
setDT(lookup)
setDT(dset)
setkey(lookup, x ,y)[dset]
# x y val
#1: a v 1
#2: a w 3
#3: b w 4
If for any reason your lookup comes as a matrix, transform it into a data frame first:
lookup = transform(expand.grid(rownames(lookup), colnames(lookup)), v=c(lookup))
names(lookup) = c('x','y','val')

for loop on R function

I'm new to R (and programming generally), and confused about why the following bits of code yield different results:
x <- 100
for(i in 1:5){
x <- x + 1
print(x)
}
This incrementally prints the sequence 101:105 as I would expect.
x <- 100
f <- function(){
x <- x + 1
print(x)
}
for(i in 1:5){
f()
}
But this just prints 101 five times.
Why does packaging the logic into a function cause it to revert to the original value on each iteration rather than incrementing? And what can I do to make this work as a repeatedly-called function?
The problem
It is because in your function you are dealing with a local variable x on the left side, and a global variable x on the right side. You are not updating the global x in the function, but assigning the value of 101 to the local x. Each time you call the function, the same thing happens, so you assign local x to be 101 5 times, and print it out 5 times.
To help visualize:
# this is the "global" scope
x <- 100
f <- function(){
# Get the "global" x which has value 100,
# add 1 to it, and store it in a new variable x.
x <- x + 1
# The new x has a value of 101
print(x)
}
This would be similar to the following code:
y <- 100
f <- function(){
x <- y + 1
print(x)
}
One possible fix
As for how to fix it: take the variable as an argument, and pass it back as the update. Something like this:
f <- function(old.x) {
new.x <- old.x + 1
print(new.x)
return(new.x)
}
You would want to store the return value, so your updated code would look like:
x <- 100
f <- function(old.x) {
new.x <- old.x + 1
print(new.x)
return(new.x)
}
for (i in 1:5) {
x <- f(x)
}
This does what you want:
f <- function(){
x <<- x + 1
print(x)
}
But you shouldn't do this. Globals are not a good construct. Functions with side-effects cause code to be hard to understand and hard to debug.
A safer way to use a global is to encapsulate it into another environment. Here is an example:
create.f <- function(x) {
return(function() {
x <<- x + 1
print(x)
})
}
f <- create.f(100)
for (i in 1:5) f()
## [1] 101
## [1] 102
## [1] 103
## [1] 104
## [1] 105
Here, the "global" x is in the environment of the body of create.f, where f is defined, and not the global environment. The environment of a function is the environment in which it is defined (and not that in which it is called).
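To illustrate the encapsulation, here is a variant of the same idea in which the inner function returns the value instead of printing it. The names make_counter, c1 and c2 are hypothetical. Each call to the factory creates its own environment, so the counters do not interfere with each other or with a global x.

```r
make_counter <- function(x) {
  function() {
    x <<- x + 1  # updates the x stored in this counter's own environment
    x
  }
}

x <- 0                    # a global x, left untouched by the counters
c1 <- make_counter(100)
c2 <- make_counter(10)

first  <- c1()  # 101
second <- c1()  # 102
other  <- c2()  # 11
x               # still 0
```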
