How to update data.table variable within a function? - r

Sorry if this is a duplicate. I am very new to data.table. Basically, I am able to get my code to work outside of functions, but when I pack the operations inside of a function, they breakdown. Ultimately, I had hoped to make the functions age.inds and m.inds internal functions in a package.
# required functions ------------------------------------------------------
# create object
create.obj <- function(n = 100){
obj = list()
obj$inds <- data.table(age = rep(0.1, n), m = NA)
obj$m$model <- function(age, a){return(age^a)}
obj$m$params <- list(a = 2)
return(obj)
}
# calculate new 'age' of inds
age.inds <- function(obj){
obj$inds[, age := age + 1]
return(obj)
}
# calculate new 'm' of inds
m.inds <- function(obj){
ARGS <- list()
args.incl <- which(names(obj$m$params) %in% names(formals(obj$m$model)))
ARGS <- c(ARGS, obj$m$params[args.incl])
args.incl <- names(obj$inds)[names(obj$inds) %in% names(formals(obj$m$model))]
ARGS <- c(ARGS, obj$inds[, ..args.incl]) # double dot '..' version
# ARGS <- c(ARGS, inds[, args.incl, with = FALSE]) # 'with' version
obj$inds[, m := do.call(obj$m$model, ARGS)]
return(obj)
}
# advance object
adv.obj <- function(obj, times = 1){
for(i in seq(times)){
obj <- age.inds(obj)
obj <- m.inds(obj)
}
return(obj)
}
# Example ----------------------------------------------------------------
# this doesn't work
obj <- create.obj(n = 10)
obj # so far so good
obj <- age.inds(obj)
obj # 'inds' gone
# would ultimately like to call adv.obj
obj <- adv.obj(obj, times = 5)
Also (as a side note), most of what I would like to do in my code would be vectorized calculations (i.e. updating variables in obj$inds), so I don't even know if going to data.tables makes too much sense for me (i.e. no by grouping operations as of yet). I am dealing with large objects and wondered if switching from data.frame objects would speed things up (I can get my code to work using data.frames).
Thanks
Update
OK, the issue with the printing has been solved thanks to #eddi. I am however unable to use these "inds" functions when they are located internally within a package (i.e not exported). I made a small package (DTtester), that has this example in the help file for adv.obj:
obj <- create.obj(n=10)
obj <- adv.obj(obj, times = 5)
# Error in `:=`(age, new.age) :
# Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are
# defined for use in j, once only and in particular ways. See help(":=").
Any idea why the functions would fail in this way?

Related

R : How to create objects with a function which name and value depend on an argument, and that these objects are found in the global environment?

I have the following situation: I have different dataframes, I would like to be able, for each dataframe, to create 2 dataframes according to the value of one of the columns (log2FoldChange>1 and logFoldChange<-1).
For this I use the following code:
DJ29_T0_Overexpr = DJ29_T0[which(DJ29_T0$log2FoldChange > 1),]
DJ29_T0_Underexpr = DJ29_T0[which(DJ21_T0$log2FoldChange < -1),]
DJ229_T0 being one of my dataframe.
First problem: the sign for the dataframe where log2FoldChange < -1 is not taken into account.
But the main problem is at the time of making the function, I wrote the following:
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),])
assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),])
}
Which I then ran with :
spliteOverUnder(DJ29_T0)
No error message, but my objects are not exported in my global environment. I tried with return(paste(nm,"_Overexpr", sep="") but it only returns the object name but not the associated dataframe.
Using paste() forces the use of assign(), so I can't do :
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
paste(nm,"_Overexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > 1),]
paste(nm,"_Underexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) < -1),]
}
spliteOverUnder(DJ24_T0)
I encounter the following error:
Error in paste(nm, "_Overexpr", sep = "") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > :
could not find function "paste<-"
If you've encountered this difficulty before, I'd appreciate a little help.
And if you knew, once the function works, how to use a For loop going through a list containing all my dataframes to apply this function to each of them, I'm also a taker.
Thanks
When assigning, use the pos argument to hoist the new objects out of the function.
function(){
assign(x = ..., value = ...,
pos = 1 ## see below
)
}
... where 0 = the function's local environment, 1 = the environment next up (in which the function is defined) etc.
edit
A general function to create the split dataframes in your global environment follows. However, you might rather want to save the new dataframes (from within the function) or just forward them to downstream functions than cram your workspace with intermediary objects.
splitOverUnder <- function(the_name_of_the_frame){
df <- get(the_name_of_the_frame)
df$cat <- cut(df$log2FoldChange,
breaks = c(-Inf, -1, 1, Inf),
labels = c('underexpr', 'normal', 'overexpr')
)
split_data <- split(df, df$cat)
sapply(c('underexpr', 'overexpr'),
function(n){
new_df_name <- paste(the_name_of_the_frame, n, sep = '_')
assign(x = new_df_name,
value = split_data$n,
envir = .GlobalEnv
)
}
)
}
## say, df1 and df2 are your initial dataframes to split:
sapply(c('df1', 'df2'), function(n) splitOverUnder(n))

Name seurat function in r with name of each experiment/variable

I am using seurat to analyze some scRNAseq data, I have managed to put all the SCT integration one line codes from satijalab into a function with basically
SCT_normalization <- function (f1, f2) {
f_merge <- merge (f1, y=f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <<- PrepSCTIntegration(object.list = f.list, anchor.features = features)
return (f.list)
}
so that I will have f.list in the global environment for downstream analysis and making plots. The problem I am running into is that, every time I run the function, the output would be f.list, I want it to be specific to the input value name (i.e., f1 and/or f2). Basically something that I can set so that I would know which input value was used to generate the final output. I saw something using the assign function but someone wrote a warning about "the evil and wrong..." so I am not sure as to how to approach this.
From what it sounds like you don't need to use the super assign function <<-. In my opinion, I don't think <<- should be used as it can cause unexpected changes in objects. This is what I assume the other person was saying. For example, if you have the following function:
AverageVector <- function(v) x <<- mean(v, rm.na = TRUE)
Now you're trying to find the average of a vector you have, along with more analysis
library(tidyverse)
x <- unique(iris$Species)
avg_sl <- AverageVector(iris$Sepal.Length)
Now where x used to be a character vector, it's not a numeric vector with a length of 1.
So I would remove the <<- and call your function like this
object_list_1_2 <- SCT_normalize(object1, object2)
If you wanted a slightly more programatic way you could do something like this to keep track of objects you could do something like this:
SCT_normalization <- function(f1, f2) {
f_merge <- merge (f1, y = f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <- PrepSCTIntegration(object.list = f.list, anchor.features = features)
to_return <- list(inputs = list(f1, f2), normalized = f.list)
return(to_return)
}

Getting name of an object from list in Map

Given the following data:
list_A <- list(data_cars = mtcars,
data_air = AirPassengers,
data_list = list(A = 1,
B = 2))
I would like to print names of objects available across list_A.
Example:
Map(
f = function(x) {
nm <- deparse(match.call()$x)
print(nm)
# nm object is only needed to properly name flat file that may be
# produced within Map call
if (any(class(x) == "list")) {
length(x) + 1
} else {
length(x) + 1e6
saveRDS(object = x,
file = tempfile(pattern = make.names(nm), fileext = ".RDS"))
}
},
list_A
)
returns:
[1] "dots[[1L]][[1L]]"
[1] "dots[[1L]][[2L]]"
[1] "dots[[1L]][[3L]]"
$data_cars
NULL
$data_air
NULL
$data_list
[1] 3
Desired results
I would like to get:
`data_cars`
`data_air`
`data_list`
Update
Following the comments, I have modified the example to make it more reflective of my actual needs which are:
While using Map to iterate over list_A I'm performing some operations on each element of the list
Periodically I want to create a flat file with name reflecting name of object that was processed
In addition to list_A, there are also list_B, list_C and so forth. Therefore, I would like to avoid calling names(list) inside the function f of the Map as I will have to modify it n number of times. The solution I'm looking to find should lend itself for:
Map(function(l){...}, list_A)
So I can later replace list_A. It does not have to rely on Map. Any of the apply functions would do; same applied to purrr-based solutions.
Alternative example
do_stuff <- function(x) {
nm <- deparse(match.call()$x)
print(nm)
# nm object is only needed to properly name flat file that may be
# produced within Map call
if (any(class(x) == "list")) {
length(x) + 1
} else {
length(x) + 1e6
saveRDS(object = x,
file = tempfile(pattern = make.names(nm), fileext = ".RDS"))
}
}
Map(do_stuff, list_A)
As per the notes below, I want to avoid having to modify do_stuff function as I will be looking to do:
Map(do_stuff, list_A)
Map(do_stuff, list_B)
Map(do_stuff, list_...)
We could wrap it into a function, and do it in two steps:
myFun <- function(myList){
# do stuff
res <- Map(
f = function(x) {
#do stuff
head(x)
},
myList)
# write to a file, here we might add control
# if list is empty do not output to a file
for(i in names(res)){
write.table(res[[ i ]], file = paste0(i, ".txt"))
}
}
myFun(list_A)
Would something like this work ?
list_A2 <- Map(list, x = list_A,nm = names(list_A) )
trace(do_stuff, quote({ nm <- x$nm; x<- x$x}), at=3)
Map(do_stuff, list_A2)

Vectorizing 'deparse(substitute(d))' in R?

I'm wondering why, when running my below using: bb(d = c(dnorm, dcauchy) ) I get an error saying: object 'c(dnorm, dcauchy)' not found?
P.S. But as I show below, the function has no problem with bb(d = c(dnorm)).
bb <- function(d){
d <- if(is.character(d)) d else deparse(substitute(d))
h <- numeric(length(d))
for(i in 1:length(d)){
h[i] <- get(d[i])(1) ## is there something about `get` that I'm missing?
}
h
}
# Two Examples of Use:
bb(d = dnorm) # Works OK
bb(d = c(dnorm, dcauchy) ) # Error: object 'c(dnorm, dcauchy)' not found
# But if you run:
bb(d = c("dnorm", "dcauchy"))# Works OK
Try this alternative where you pass the functions directly to your function
bb <- function(d){
if (!is.list(d)) d <- list(d)
sapply(d, function(x) x(1))
}
bb(d = list(dnorm, dcauchy))
bb(d = dnorm)
The c() function is meant to combine vectors, it's not a magic "array" function or anything. If you have collections of simple atomic types, you can join them with c(), but for more complicated objects like functions, you need to collect those in a list, not a vector.

Loops and variable scope in R

I have following for loop in R:
v = c(1,2,3,4)
s = create.some.complex.object()
for (i in v){
print(i)
s = some.complex.function.that.updates.s(s)
}
# s here has the right content.
Needless to say, this loop is horribly slow in R.
I tried to write it in functional style:
lapply(v, function(i){
print(i)
s = some.complex.function.that.updates.s(s)
})
# s wasn't updated.
But this doesn't work, because s is passed by value and not by reference.
I only need the result of the last iteration, not all of the intermediate steps.
How do I formulate the first loop in R-style?
Mulone
lapply(v, function(i){
print(i)
s = some.complex.function.that.updates.s(s)
return(s)
})
the result will be a list of object s created for each value of v. Even if it should have passed the value of v anyway cause it was the last operation performed by the function.
If you can't afford to create it many times then there are not a lot of options. It is hard to say as well without seeing the object that you are operating on. If the object is growing/appending you could collect the intermediate results and do the appending at the end. If it is actually mutating you should try to get away from the pass value and use reference classes (http://www.inside-r.org/r-doc/methods/ReferenceClasses). Then the function that modifies it will actually be a method you just call n times.
Is the loop itself really the problem? Or is it rather the time the execution of some.complex.function.that.updates.s needs?
Some R programers will jump through hoops to avoid loops but have a look at this example:
f <- function(a) a/1.001
loop <- function(n) { s = (1/f(1)^n); for (i in 1:n) s <- f(s); s}
system.time(loop(1E7))
user system elapsed
7.011 0.030 7.008
This is 0.7 micro seconds (on a MacBook Pro) per call of a very trivial function in a loop.
v = c(1,2,3,4)
s = create.some.complex.object()
lapply(v, function(i){
print(i)
s <<- some.complex.function.that.updates.s(s)
}) |> invisible()
Use of the <<- operator can sometimes get you into trouble and is (somewhat) discouraged, but when I want to mimic a for loop with side-effects this is a pattern I have found useful.
v = c(1,2,3,4)
s = create.some.complex.object()
lapply(v, function(i){
print(i)
assign('s', some.complex.function.that.updates.s(s), envir = .GlobalEnv)
}) |> invisible()
Using assign allows you to avoid the use of <<- operator. Using <<- is significantly faster than invoking the assign function. For performance reasons in more intensive applications it is very much worth it to replace sequential for loops with vectorized operations as the median execution time of lapply can be several orders of magnitude faster! Here are some toy benchmarks to support this assertion:
v <- c(1, 2, 3, 4)
microbenchmark::microbenchmark({
s <- 1
lapply(v, function(i) {
s <<- s + i
})
}, times = 1e4, unit = 'microseconds')
Median: ~ 4 microseconds
v <- c(1, 2, 3, 4)
microbenchmark::microbenchmark({
s <- 1
for(i in v) {
s <- s + i
}
}, times = 1e4, unit = 'microseconds')
Median: ~ 1488 microseconds

Resources