lapply not working with my own variable - r

I created this custom function of mine here:
generate_portfolio <- function(price_list_w_returns, initial_AUM){
price_list_w_returns1 = lapply(price_list_w_returns, transform, USD_portfolio = initial_AUM*cum_returns )
}
where initial_AUM is something that someone can change to whatever number he or she wants.
USD_portfolio in this case is a new column that i am trying to create and cum_returns is an existing column. price_list_w_returns is a list of dataframes with similar columns and rows.
The error i am getting is:
Error in eval(substitute(list(...)), `_data`, parent.frame()) :
object 'initial_AUM' not found

The problem is of not fully correct definition of lambda-function in lapply. You are trying to pass an argument, which is not defined, to the function. Please see below how it is solved in generate_portfolio2:
# simulate list of data.frames
price_list_w_returns <- replicate(10, data.frame(id = 1:10, cum_returns = abs(rnorm(10)/ 10)), simplify = FALSE)
str(price_list_w_returns)
## Not run:
generate_portfolio <- function(price_list_w_returns, initial_AUM){
price_list_w_returns1 = lapply(price_list_w_returns, transform, USD_portfolio = initial_AUM * cum_returns )
}
generate_portfolio(price_list_w_returns, 2)
## Error in eval(substitute(list(...)), `_data`, parent.frame()) :
## object 'initial_AUM' not found
## End(Not run)
generate_portfolio2 <- function(price_list_w_returns, initial_AUM){
lapply(price_list_w_returns, function(x) {
x$USD_portfolio = initial_AUM * x$cum_returns
x
})
}
generate_portfolio2(price_list_w_returns, 2)

Related

R : How to create objects with a function which name and value depend on an argument, and that these objects are found in the global environment?

I have the following situation: I have different dataframes, I would like to be able, for each dataframe, to create 2 dataframes according to the value of one of the columns (log2FoldChange>1 and logFoldChange<-1).
For this I use the following code:
DJ29_T0_Overexpr = DJ29_T0[which(DJ29_T0$log2FoldChange > 1),]
DJ29_T0_Underexpr = DJ29_T0[which(DJ21_T0$log2FoldChange < -1),]
DJ229_T0 being one of my dataframe.
First problem: the sign for the dataframe where log2FoldChange < -1 is not taken into account.
But the main problem is at the time of making the function, I wrote the following:
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
assign(paste(nm,"_Overexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) > 1),])
assign(paste(nm,"_Underexpr", sep=""), res[which(as.numeric(as.character(res$log2FoldChange)) < -1),])
}
Which I then ran with :
spliteOverUnder(DJ29_T0)
No error message, but my objects are not exported in my global environment. I tried with return(paste(nm,"_Overexpr", sep="") but it only returns the object name but not the associated dataframe.
Using paste() forces the use of assign(), so I can't do :
spliteOverUnder <- function(res){
nm <-deparse(substitute(res))
paste(nm,"_Overexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > 1),]
paste(nm,"_Underexpr", sep="") <<- res[which(as.numeric(as.character(res$log2FoldChange)) < -1),]
}
spliteOverUnder(DJ24_T0)
I encounter the following error:
Error in paste(nm, "_Overexpr", sep = "") <<- res[which(as.numeric(as.character(res$log2FoldChange)) > :
could not find function "paste<-"
If you've encountered this difficulty before, I'd appreciate a little help.
And if you knew, once the function works, how to use a For loop going through a list containing all my dataframes to apply this function to each of them, I'm also a taker.
Thanks
When assigning, use the pos argument to hoist the new objects out of the function.
function(){
assign(x = ..., value = ...,
pos = 1 ## see below
)
}
... where 0 = the function's local environment, 1 = the environment next up (in which the function is defined) etc.
edit
A general function to create the split dataframes in your global environment follows. However, you might rather want to save the new dataframes (from within the function) or just forward them to downstream functions than cram your workspace with intermediary objects.
splitOverUnder <- function(the_name_of_the_frame){
df <- get(the_name_of_the_frame)
df$cat <- cut(df$log2FoldChange,
breaks = c(-Inf, -1, 1, Inf),
labels = c('underexpr', 'normal', 'overexpr')
)
split_data <- split(df, df$cat)
sapply(c('underexpr', 'overexpr'),
function(n){
new_df_name <- paste(the_name_of_the_frame, n, sep = '_')
assign(x = new_df_name,
value = split_data$n,
envir = .GlobalEnv
)
}
)
}
## say, df1 and df2 are your initial dataframes to split:
sapply(c('df1', 'df2'), function(n) splitOverUnder(n))

Name seurat function in r with name of each experiment/variable

I am using seurat to analyze some scRNAseq data, I have managed to put all the SCT integration one line codes from satijalab into a function with basically
SCT_normalization <- function (f1, f2) {
f_merge <- merge (f1, y=f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <<- PrepSCTIntegration(object.list = f.list, anchor.features = features)
return (f.list)
}
so that I will have f.list in the global environment for downstream analysis and making plots. The problem I am running into is that, every time I run the function, the output would be f.list, I want it to be specific to the input value name (i.e., f1 and/or f2). Basically something that I can set so that I would know which input value was used to generate the final output. I saw something using the assign function but someone wrote a warning about "the evil and wrong..." so I am not sure as to how to approach this.
From what it sounds like you don't need to use the super assign function <<-. In my opinion, I don't think <<- should be used as it can cause unexpected changes in objects. This is what I assume the other person was saying. For example, if you have the following function:
AverageVector <- function(v) x <<- mean(v, rm.na = TRUE)
Now you're trying to find the average of a vector you have, along with more analysis
library(tidyverse)
x <- unique(iris$Species)
avg_sl <- AverageVector(iris$Sepal.Length)
Now where x used to be a character vector, it's not a numeric vector with a length of 1.
So I would remove the <<- and call your function like this
object_list_1_2 <- SCT_normalize(object1, object2)
If you wanted a slightly more programatic way you could do something like this to keep track of objects you could do something like this:
SCT_normalization <- function(f1, f2) {
f_merge <- merge (f1, y = f2)
f.list <- SplitObject(f_merge, split.by = "stim")
f.list <- lapply(X = f.list, FUN = SCTransform)
features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
f.list <- PrepSCTIntegration(object.list = f.list, anchor.features = features)
to_return <- list(inputs = list(f1, f2), normalized = f.list)
return(to_return)
}

Correct use of tryCatch to skip plot error

This is a follow up question to this question which didn't get any traction. I realise now that this has nothing to do with purrr::possibly specifically as tryCatch also doesn't work like I thought.
I'm trying to write a function which will run any other arbitrary function without throwing an error. This might not be good practice but I want to understand why this does not work:
library(ggplot2)
## function I thought I could use to run other functions safely
safe_plot <- function(.FUN, ...) {
tryCatch({
.FUN(...)
},
error = function(e) {
message(e)
print("I have got passed the error")
})
}
## Simple plot function
plot_fun <- function(df) {
ggplot(df, aes(xvar, yvar)) +
geom_point()
}
## Works for good data
some_data <- data.frame(xvar = 1:10, yvar = 1:10)
safe_plot(.FUN = plot_fun, df = some_data)
## Works here for data that is not defined:
> safe_plot(.FUN = plot_fun, df = not_defined)
object 'not_defined' not found[1] " I have got passed the error"
## Falls over for an empty data frame
empty_data <- data.frame()
safe_plot(.FUN = plot_fun, df = empty_data)
## Why does't this get past this error and print the message?
> safe_plot(.FUN = plot_fun, df = empty_data)
Error in FUN(X[[i]], ...) : object 'xvar' not found
Why doesn't the last call get to the print statement? I suspect I am abusing tryCatch but I don't know why. Reading some other questions (1, 2) I checked for the class of what tryCatch returns:
> ## What is returned
> empty_data <- data.frame()
> what_is_this <- safe_plot(.FUN = plot_fun, df = empty_data)
> what_is_this
Error in FUN(X[[i]], ...) : object 'xvar' not found
> class(what_is_this)
[1] "gg" "ggplot"
So in the plot function above ggplot(df, does not fail because there is a df of empty_data. The error must occur in aes(xar, . But why does this not return an error class and instead returns a ggplot?

R: how to find what S3 method will be called on an object?

I know about methods(), which returns all methods for a given class. Suppose I have x and I want to know what method will be called when I call foo(x). Is there a oneliner or package that will do this?
The shortest I can think of is:
sapply(class(x), function(y) try(getS3method('foo', y), silent = TRUE))
and then to check the class of the results... but is there not a builtin for this?
Update
The full one liner would be:
fm <- function (x, method) {
cls <- c(class(x), 'default')
results <- lapply(cls, function(y) try(getS3method(method, y), silent = TRUE))
Find(function (x) class(x) != 'try-error', results)
}
This will work with most things but be aware that it might fail with some complex objects. For example, according to ?S3Methods, calling foo on matrix(1:4, 2, 2) would try foo.matrix, then foo.numeric, then foo.default; whereas this code will just look for foo.matrix and foo.default.
findMethod defined below is not a one-liner but its body has only 4 lines of code (and if we required that the generic be passed as a character string it could be reduced to 3 lines of code). It will return a character string representing the name of the method that would be dispatched by the input generic given that generic and its arguments. (Replace the last line of the body of findMethod with get(X(...)) if you want to return the method itself instead.) Internally it creates a generic X and an X method corresponding to each method of the input generic such that each X method returns the name of the method of the input generic that would be run. The X generic and its methods are all created within the findMethod function so they disappear when findMethod exits. To get the result we just run X with the input argument(s) as the final line of the findMethod function body.
findMethod <- function(generic, ...) {
ch <- deparse(substitute(generic))
f <- X <- function(x, ...) UseMethod("X")
for(m in methods(ch)) assign(sub(ch, "X", m, fixed = TRUE), "body<-"(f, value = m))
X(...)
}
Now test it. (Note that the one-liner in the question fails with an error in several of these tests but findMethod gives the expected result.)
findMethod(as.ts, iris)
## [1] "as.ts.default"
findMethod(print, iris)
## [1] "print.data.frame"
findMethod(print, Sys.time())
## [1] "print.POSIXct"
findMethod(print, 22)
## [1] "print.default"
# in this example it looks at 2nd component of class vector as no print.ordered exists
class(ordered(3))
## [1] "ordered" "factor"
findMethod(print, ordered(3))
## [1] "print.factor"
findMethod(`[`, BOD, 1:2, "Time")
## [1] "[.data.frame"
I use this:
s3_method <- function(generic, class, env = parent.frame()) {
fn <- get(generic, envir = env)
ns <- asNamespace(topenv(fn))
tbl <- ns$.__S3MethodsTable__.
for (c in class) {
name <- paste0(generic, ".", c)
if (exists(name, envir = tbl, inherits = FALSE)) {
return(get(name, envir = tbl))
}
if (exists(name, envir = globalenv(), inherits = FALSE)) {
return(get(name, envir = globalenv()))
}
}
NULL
}
For simplicity this doesn't return methods defined by assignment in the calling environment. The global environment is checked for convenience during development. These are the same rules used in r-lib packages.

I do not understand error "object not found" inside the function

I have roughly this function:
plot_pca_models <- function(models, id) {
library(lattice)
splom(models, groups=id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
wich results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Please can anybody explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, there is full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel=c(1:4), comp=1) {
# 'data' ... princomp objects
# 'id' ... list of samples id (classes)
# 'sel' ... list of models to compare
# 'comp' ... which pca component to compare
library(lattice)
models <- c()
models.size <- 1:length(data)
for(model in models.size) {
models <- c(models, list(data[[model]]$scores[,comp]))
}
names(models) <- 1:length(data)
models <- do.call(cbind, models[sel])
splom(models, groups=id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produce the same error like above.
The problem is that splom evaluates its groups argument in a nonstandard way.A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
eval(substitute(splom(data, groups=.id), list(.id=id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
log is a function in base R. Good practice is to not name objects after functions...it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
Here's a modification of Hong Oi's answer. First I would recommend to include id in the main data frame, i.e
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100), id = sample(letters[1:4], 100, replace = TRUE))
.. and then
plot_pca_models2 <- function(x, ajdi) {
Call <- bquote(splom(x, group = x[[.(ajdi)]]))
eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes using splom (and other functions e.g. substitute which use "nonstandard evaluation") harder to use from within other functions, and is against the philosophy that is "mostly" followed in the rest of R.

Resources