I have roughly this function:
plot_pca_models <- function(models, id) {
  library(lattice)
  splom(models, groups = id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
which results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Can anybody please explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, here is the full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel = c(1:4), comp = 1) {
  # 'data' ... list of princomp objects
  # 'id'   ... list of sample ids (classes)
  # 'sel'  ... list of models to compare
  # 'comp' ... which pca component to compare
  library(lattice)
  models <- c()
  models.size <- 1:length(data)
  for (model in models.size) {
    models <- c(models, list(data[[model]]$scores[, comp]))
  }
  names(models) <- 1:length(data)
  models <- do.call(cbind, models[sel])
  splom(models, groups = id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
  splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produces the same error as above.
The problem is that splom evaluates its groups argument in a nonstandard way. A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
  eval(substitute(splom(data, groups = .id), list(.id = id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
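The same trick fixes the reproducible example from edit2 above (a sketch; plot_pca_models2, my.data and my.id are as defined there):
plot_pca_models2 <- function(x, ajdi) {
  # Substitute the actual group vector into the call before splom's
  # nonstandard evaluation gets a chance to see the symbol 'ajdi':
  eval(substitute(splom(x, groups = .g), list(.g = ajdi)))
}
plot_pca_models2(x = my.data, ajdi = my.id$id)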
log is a function in base R. Good practice is not to name objects after functions, as it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
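A minimal illustration (run_log is just an example name):
log$id                          # errors: log is the base logarithm function
run_log <- data.frame(id = 1:3) # store your data under a non-function name
run_log$id                      # works: 1 2 3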
Here's a modification of Hong Ooi's answer. First, I would recommend including id in the main data frame, i.e.
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100),
                      id = sample(letters[1:4], 100, replace = TRUE))
... and then
plot_pca_models2 <- function(x, ajdi) {
  Call <- bquote(splom(x, group = x[[.(ajdi)]]))
  eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes splom (and other functions that use "nonstandard evaluation", e.g. subset) harder to use from within other functions, and goes against the philosophy that is "mostly" followed in the rest of R.
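A deliberately simplified model of that line shows why wrapping breaks (a sketch; nse_plot is a made-up stand-in for splom, and globalenv() stands in for the environment lattice actually falls back to):
nse_plot <- function(data, groups) {
  # capture the groups expression unevaluated, look it up in 'data',
  # then fall back to an environment that is NOT the wrapper's frame:
  g <- eval(substitute(groups), data, globalenv())
  table(g)
}
d <- data.frame(val = 1:4, cls = c("a", "a", "b", "b"))
nse_plot(d, cls)          # works: 'cls' is a column of d

wrapper <- function(x, id) nse_plot(x, id)
wrapper(d, d$cls)         # Error: object 'id' not found -- the captured
                          # symbol 'id' is not a column of d, nor visible
                          # where it ends up being evaluated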
Related
I am trying something pretty simple: I want to run a bunch of regressions in parallel. When I use the following data generator (PART 1), the parallel part does not work and gives the error listed below.
# PART 1 (mvrnorm is from MASS)
library(MASS)
p <- 20; rho <- 0.7
cdc <- diag(p)
for (i in 1:(p-1)) {
  for (j in (i+1):p) {
    cdc[i,j] <- cdc[j,i] <- rho^abs(i-j)
  }
}
my.data <- mvrnorm(n = 100, mu = rep(0, p), Sigma = cdc)
The parallel part below does work, however, if I generate the data as in PART 2:
# PART 2
my.data <- matrix(rnorm(1000, 0, 1), nrow = 100, ncol = 10)
I configured the function that I want to run in parallel as follows:
parallel_fun <- function(obj, my.data) {
  p1 <- nrow(cov(my.data))
  store.beta <- matrix(0, p1, length(obj))
  count <- 1
  for (iteration in obj) {
    my_df <- data.frame(my.data)
    colnames(my_df)[iteration] <- "y"
    my.model <- bas.lm(y ~ ., data = my_df, alpha = 3,
                       prior = "ZS-null", force.heredity = FALSE, pivot = TRUE)
    cf <- coef(my.model, estimator = "MPM")
    betas <- cf$postmean[-1]
    store.beta[-iteration, count] <- betas
    count <- count + 1
  }
  result <- list('Beta' = store.beta)
}
So I run parLapply in the following way:
{
  library(parallel)
  library(doParallel)
  no_cores <- detectCores(logical = TRUE)
  myclusternumber <- (no_cores - 1)
  cl <- makeCluster(myclusternumber)
  registerDoParallel(cl)
  p1 <- ncol(my.data)
  obj <- splitIndices(p1, myclusternumber)
  clusterExport(cl, list('parallel_fun', 'my.data', 'obj'), envir = environment())
  clusterEvalQ(cl, {
    library(MASS)
    library(Matrix)
    library(BAS)
  })
  newresult <- parallel::parLapply(cl, obj, fun = parallel_fun, my.data)
  stopCluster(cl)
}
But whenever I am using the PART 1 data, I get the following error:
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: object 'my_df' not found
But this should not happen: the data frame should be created inside the function. I have no idea why this is happening. Any help is appreciated.
Posting this as one possible workaround, see if it works:
parallel_fun <- function(obj, my.data) {
  p1 <- nrow(cov(my.data))
  store.beta <- matrix(0, p1, length(obj))
  count <- 1
  for (iteration in obj) {
    my_df <- data.frame(my.data)
    colnames(my_df)[iteration] <- "y"
    my_df <<- my_df  # force my_df into the global environment for coef.bas
    my.model <- bas.lm(y ~ ., data = my_df, alpha = 3,
                       prior = "ZS-null", force.heredity = FALSE, pivot = TRUE)
    cf <- BAS:::coef.bas(my.model, estimator = "MPM")
    betas <- cf$postmean[-1]
    store.beta[-iteration, count] <- betas
    count <- count + 1
  }
  result <- list('Beta' = store.beta)
}
The issue seems to be with the BAS:::coef.bas function, which calls eval in order to get my_df and fails to do so when called in parallel. The "hack" here is to force my_df out to the parent environment by calling my_df <<- my_df.
There should be a better way to do this, but <<- might be the fastest one. In general, <<- may cause unwanted behaviour, especially when used in loops. Assigning a unique variable name before exporting (and not forgetting to remove it after use) is one way to tackle that.
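A sketch of that unique-name idea, inside the loop body (this assumes coef() re-evaluates whatever data name is recorded in the model call; the PID-based name is just illustrative):
unique_name <- paste0("my_df_", Sys.getpid())
assign(unique_name, my_df, envir = globalenv())
# build the bas.lm call so the unique name is what gets recorded in it:
my.model <- eval(bquote(bas.lm(y ~ ., data = .(as.name(unique_name)),
                               alpha = 3, prior = "ZS-null",
                               force.heredity = FALSE, pivot = TRUE)))
cf <- coef(my.model, estimator = "MPM")
rm(list = unique_name, envir = globalenv())  # clean up after use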
I am using Seurat to analyze some scRNAseq data. I have managed to put all the SCT integration one-liners from satijalab into a function, basically:
SCT_normalization <- function(f1, f2) {
  f_merge <- merge(f1, y = f2)
  f.list <- SplitObject(f_merge, split.by = "stim")
  f.list <- lapply(X = f.list, FUN = SCTransform)
  features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
  f.list <<- PrepSCTIntegration(object.list = f.list, anchor.features = features)
  return(f.list)
}
so that I will have f.list in the global environment for downstream analysis and making plots. The problem I am running into is that every time I run the function, the output is named f.list; I want it to be specific to the input names (i.e., f1 and/or f2), so that I know which input values were used to generate the final output. I saw something about using the assign function, but someone wrote a warning about it being "evil and wrong...", so I am not sure how to approach this.
From what it sounds like, you don't need to use the superassignment operator <<-. In my opinion, <<- should generally be avoided, as it can cause unexpected changes in objects; this is presumably what the warning you saw was about. For example, take the following function:
AverageVector <- function(v) x <<- mean(v, na.rm = TRUE)
Now suppose you're trying to find the average of a vector you have, alongside some other analysis:
library(tidyverse)
x <- unique(iris$Species)
avg_sl <- AverageVector(iris$Sepal.Length)
Now where x used to be a character vector, it's now a numeric vector of length 1.
So I would remove the <<- and call your function like this
object_list_1_2 <- SCT_normalize(object1, object2)
If you wanted a slightly more programmatic way to keep track of objects, you could do something like this:
SCT_normalization <- function(f1, f2) {
  f_merge <- merge(f1, y = f2)
  f.list <- SplitObject(f_merge, split.by = "stim")
  f.list <- lapply(X = f.list, FUN = SCTransform)
  features <- SelectIntegrationFeatures(object.list = f.list, nfeatures = 3000)
  f.list <- PrepSCTIntegration(object.list = f.list, anchor.features = features)
  to_return <- list(inputs = list(f1, f2), normalized = f.list)
  return(to_return)
}
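Hypothetical usage (object1 and object2 stand in for your Seurat objects; the result name encodes the inputs):
res_1_2 <- SCT_normalization(object1, object2)
res_1_2$normalized   # the prepped object list, ready for integration
res_1_2$inputs       # the original inputs, recording provenance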
Question: I have the following R code (below). It doesn't work with x set to "ARIMA" or "ETS" from my.list. The problem is this:
"fabletools::model(arima_auto = fable::ARIMA(Trips))" works,
but "fabletools::model(arima_auto = fable::x(Trips))" doesn't.
Does anyone know the solution to my problem? Is it even possible in R?
library(tidyverse)
library(fable)
library(fabletools)
library(tsibble)
tourism <- tsibble::tourism
my.list <- list("ARIMA","ETS")
my.list[[1]] ## "ARIMA"
my.list[[2]] ## "ETS"
f_test <- function(.df1, .n) {
  x <- .df1[[.n]][[1]] ### 1) "ARIMA", "ETS"
  print(x)
  fit <- tourism %>%
    dplyr::filter(Region == "Adelaide") %>%
    # fabletools::model(arima_auto = fable::ARIMA(Trips)) ### it works
    fabletools::model(arima_auto = fable::x(Trips)) ### didn't work
  assign("fit", fit, envir = globalenv())
}
purrr::map(.x = seq(my.list), .f = ~ f_test(my.list, Counter <- .x))
When you call fable::x(Trips), R looks for an object literally named x inside the fable namespace (there is none), and even plain x(Trips) would fail: your x is the character vector "ARIMA". R has no idea what [character vector](Trips) means. It would be like trying to call "Alice"(y) and expecting R to treat "Alice" as a function, even though it clearly is not one.
What you want is a way for R to swap the string "ARIMA" for its corresponding function. This is what match.fun is for. Try this instead:
working <- match.fun(x)
fabletools::model(arima_auto = working(Trips))
Note that we didn't need to use any namespaces for this, unlike your original approach. Good practice would be to find a way to do so, e.g. working <- get(x, envir = asNamespace("fable")), but we didn't need it here.
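Putting it together, a sketch of the loop from the question with match.fun swapped in (setup objects as defined above; it returns the fits instead of assign()ing them into the global environment):
f_test <- function(.df1, .n) {
  x <- .df1[[.n]][[1]]   # "ARIMA" or "ETS"
  fn <- match.fun(x)     # resolve the string to the attached function
  tourism %>%
    dplyr::filter(Region == "Adelaide") %>%
    fabletools::model(arima_auto = fn(Trips))
}
fits <- purrr::map(.x = seq(my.list), .f = ~ f_test(my.list, .x))
Returning the mable from the function keeps every fit in the fits list, which avoids the assign-to-globalenv pattern from the question.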
I am having problems with my S4 object res after I appended a vector of values to it. The object was created with the DESeq2 package, via:
dds <- DESeqDataSetFromMatrix(countData = count.matrix,
                              colData = coldata,
                              design = ~ Condition)
dds <- DESeq(dds, test = "Wald")
res <- results(dds)
I did the following:
x <- qvalue(res@listData[["pvalue"]])    # calc qvalues based on pvalues from S4 object 'res'
res@listData[["qval"]] <- x[["qvalues"]] # append qvalues from x to 'res' as new col named "qval"
Now when I try to inspect the object with head() I get the following error:
> head(res)
Error in `rownames<-`(`*tmp*`, value = names(x)) :
invalid rownames length
The funny thing is that with View() I can inspect the S4 object in RStudio, and I can see that adding the qvalues went fine. Does anyone know why this happens? Is there a way to avoid it?
To get the qvalues, you can do this first:
library(qvalue)
library(DESeq2)
dds = makeExampleDESeqDataSet()
dds = DESeq(dds)
res = results(dds)
res$qvalue = qvalue(res$pvalue)$qvalues
As for why there is an error: you need to look into how the DESeqResults object is constructed.
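A quick check of the difference (a sketch; the idea is that the $<- accessor keeps the DataFrame's column bookkeeping in sync, while writing into @listData directly bypasses it):
res$qvalue <- qvalue(res$pvalue)$qvalues              # via the accessor
head(res)                                             # works, res is still valid
res@listData[["qval"]] <- qvalue(res$pvalue)$qvalues  # via the raw slot
head(res)                                             # errors: invalid rownames length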
This is a follow-up question to Error in calling `lm` in a `lapply` with `weights` argument, but it may not be exactly the same problem (though still related).
Here is a reproducible example:
dd <- data.frame(y = rnorm(100),
                 x1 = rnorm(100),
                 x2 = rnorm(100),
                 x3 = rnorm(100),
                 x4 = rnorm(100),
                 wg = runif(100, 1, 100))
ls.form <- list(
  formula(y ~ x1 + x2),
  formula(y ~ x3 + x4),
  formula(y ~ x1 | x2 | x3),
  formula(y ~ x1 + x2 + x3 + x4)
)
I have a function that takes several arguments: (1) a subsample, (2) a column name for the weights argument, (3) a list of formulas to try, and (4) the data.frame to use.
f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt])
  lapply(forms, update, object = baselm)
}
If I call the function, I get an error:
f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
Error in is.data.frame(data) : object 'dat' not found
I don't really get why it doesn't find the dat object; it should be part of the function environment. The problem is in the update part of the code: if you remove that line from the function, the code works.
In the end, this function will be called with lapply:
lapply(list(1:66, 33:99), f1, dat=dd, forms = ls.form, wgt="wg")
I think your problems are due to the scoping rules used by lm which are quite frankly a pain in the r-squared.
One option is to use do.call to get it to work, but you get some ugly output, because the inputs are deparsed to build the call stored for the standard print method.
f1 <- function(samp, dat, forms, wgt){
  baselm <- do.call(lm, list(formula = y ~ x1, data = dat[samp, ], weights = dat[samp, wgt]))
  lapply(forms, update, object = baselm)
}
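For example (a sketch), the call recorded in each fitted model now embeds the fully deparsed data and weights instead of the symbols dat and wgt, which is what makes the printed output ugly:
mods <- f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
mods[[1]]$call  # embeds the whole data frame as structure(list(...))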
A better way is to use an eval(substitute(...)) construct which gives the output you originally expected:
f2 <- function(samp, dat, forms, wgt){
  baselm <- eval(substitute(lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt])))
  lapply(forms, update, object = baselm)
}
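Usage is unchanged (a sketch; substitute() splices the caller's expressions, such as dd and "wg", into the lm call before it is evaluated in f2's frame, so update() can later re-evaluate it):
f2(1:66, dat = dd, forms = ls.form, wgt = "wg")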
Such scoping issues are very common with lm objects. You can solve this by specifying the correct environment for evaluation:
f1 <- function(samp, dat, forms, wgt){
  baselm <- lm(y ~ x1, data = dat[samp, ], weights = dat[samp, wgt])
  mods <- lapply(forms, update, object = baselm, evaluate = FALSE)
  e <- environment()
  lapply(mods, eval, envir = e)
}
f1(1:66, dat = dd, forms = ls.form, wgt = "wg")
# works
The accepted answer works, but I continued digging and found this old r-help thread (here) which gives more options and explanation. I thought I would post it here in case somebody else needs it.