Question: i have the following R code (below):
It didn't work with "x" = "ARIMA" and "ETS" from "my.list".
That's the problem:
"fabletools::model(arima_auto = fable::ARIMA(Trips))" = it works,
but this: "fabletools::model(arima_auto = fable::x(Trips))" didn't work.
Does anyone know the solution to my problem. Is it even possible in R?
library(tidyverse)
library(fable)
library(fabletools)
library(tsibble)
tourism <- tsibble::tourism
my.list <- list("ARIMA","ETS")
my.list[[1]] ## "ARIMA"
my.list[[2]] ## "ETS"
f_test <- function(.df1,.n){
x <- .df1[[.n]][[1]] ### 1) "ARIMA", "ETS"
print(x)
fit <- tourism %>%
dplyr::filter(Region == "Adelaide") %>%
#fabletools::model(arima_auto = fable::ARIMA(Trips)) ### it works
fabletools::model(arima_auto = fable::x(Trips)) ### didn't work
assign("fit", fit, envir= globalenv())
}
purrr::map(.x = seq(my.list), .f = ~(f_test(my.list, Counter <- .x)))
When you're calling x(Trips), your x is the character vector "ARIMA". R has no idea what [character vector](Trips) means. It would be like trying to call "Alice"(y) and expecting R to treat "Alice" as a function, even though it clearly is not one.
What you want is a way for R to swap the string "ARIMA" for its corresponding function. This is what match.fun is for. Try this instead:
working<-match.fun(x)
fabletools::model(arima_auto = working(Trips))
Note that we didn't need to use any namespaces for this, unlike your original approach. Good practice would be to find a way to do so, e.g. working<-get(x,envir = environment(fable)), but we didn't need it here.
Related
I am trying something pretty simple, want to run a bunch of regressions parallelly. When I use the following data generator (PART 1), The parallel part does not work and give the error listed below
#PART 1
p <- 20; rho<-0.7;
cdc<- diag(p)
for( i in 1:(p-1) ){ for( j in (i+1):p ){
cdc[i,j] <- cdc[j,i] <- rho^abs(i-j)
}}
my.data <- mvrnorm(n=100, mu = rep(0, p), Sigma = cdc)
The following Parallel Part does work but if I generate the data as PART 2
# PART 2
my.data<-matrix(rnorm(1000,0,1),nrow=100,ncol=10)
I configured the function that I want to run parallelly... as
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-coef(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
So I write the following way of running parlapply
{
no_cores <- detectCores(logical = TRUE)
myclusternumber<-(no_cores-1)
cl <- makeCluster(myclusternumber)
registerDoParallel(cl)
p1 <- ncol(my.data)
obj<-splitIndices(p1, myclusternumber)
clusterExport(cl,list('parallel_fun','my.data','obj'),envir=environment())
clusterEvalQ(cl, {
library(MASS)
library(Matrix)
library(BAS)
})
newresult<-parallel::parLapply(cl,obj,fun = parallel_fun,my.data)
stopCluster(cl)
}
But whenever am doing PART 1 I get the following error
Error in checkForRemoteErrors(val) :
7 nodes produced errors; first error: object 'my_df' not found
But this should not happen, the data frame should be created, I have no idea why this is happening. Any help is appreciated.
Posting this as one possible workaround, see if it works:
parallel_fun<-function(obj,my.data){
p1 <- nrow(cov(my.data));store.beta<-matrix(0,p1,length(obj))
count<-1
for (itration in obj) {
my_df<-data.frame(my.data)
colnames(my_df)[itration] <- "y"
my_df <<- my_df
my.model<-bas.lm(y ~ ., data= my_df, alpha=3,
prior="ZS-null", force.heredity = FALSE, pivot = TRUE)
cf<-BAS:::coef.bas(my.model, estimator="MPM")
betas<-cf$postmean[-1]
store.beta[ -itration, count]<- betas
count<-count+1
}
result<-list('Beta'=store.beta)
}
The issue seems to be with BAS:::coef.bas function, that calls eval in order to get my_df and fails to do that when called in parallel. The "hack" here is to force my_df out to the parent environment by calling my_df <<- my_df.
There should be a better way to do this, but <<- might be the fastest one. In general, <<- may cause unwanted behaviour, especially when used in loops. Assigning unique variable name before exporting (and don't forgetting to remove after use) is one way to tackle them.
I have a question about passing parameters for functions down through a series of wrappers and the correct way to these sorts of things. Each of these functions aside from the wrapper function are supposed to also work outside of the wrapper if that makes any difference. I apologize for the stupidity or errors within the example, I was struggling to think of something that would explain the problem.
# data sets
data("mtcars")
data("starwars")
# Data set list
d.list <- list(mtcars, starwars)
names(d.list) <- c("mtcars", "starwars")
foo_1 <- function(event,data.in){
data.in[grep(event, names(data.in))]
}
foo_2 <- function(event, data.in, extra, ...){
a.df <- foo_1(...)
a.df %>%
mutate(across(is.numeric, ~ . + extra))
}
foo_wrapper <- function(event, data.in, extra, ...){
b.df <- foo_2(...)
c.df <- foo_2(..., extra = 15)
return(list(b.df, c.df))}
foo.this <- foo_wrapper(event = "starwars",
data.in = d.list,
extra = 12)
# Error in foo_1(...) : argument "data.in" is missing, with no default
Imagine you have a simple function that specifies which statistical tests to run for each variable. Its syntax, simplified for the purposes of this question is as follows:
test <- function(...) {
x <- list(...)
return(x)
}
which takes argument pairs such as Gender = 'Tukey', and intends to pass its result to other functions down the line. The output of test() is as follows:
test(Gender = 'Tukey')
# $Gender
# [1] "Tukey"
What is desired is the ability to replace the literal Gender by a dynamically assigned variable varname (e.g., for looping purposes). Currently what happens is:
varname <- 'Gender'
test(varname = 'Tukey')
# $varname
# [1] "Tukey"
but what is desired is this:
varname <- 'Gender'
test(varname = 'Tukey')
# $Gender
# [1] "Tukey"
I tried tinkering with functions such as eval() and parse(), but to no avail. In practice, I resolved the issue by simply renaming the resulting list, but it is an ugly solution and I am sure there is an elegant R way to achieve it. Thank in advance for the educational value of your answer.
NB: This question occurred to me while trying to program a custom function which uses mcp() from the effects package in its internals. The said mcp() function is the real world counterpart of test().
EDIT1: Perhaps it needs to be clarified that (for educational purposes) changing test() is not an option. The question is about how to pass the tricky argument to test(). If you take a look at NB, it becomes clear why: the real world counterpart of test(), namely mcp(), comes with a package. And while it is possible to create a modified copy of it, I am really curious whether there exists a simple solution in somehow 'converting' the dynamically assigned variable to a literal in the context of dot-arguments.
This works:
test <- function(...) {
x = list(...)
names(x) <- sapply(names(x),
function(p) eval(as.symbol(p)))
return(x)
}
apple = "orange"
test(apple = 5)
We can use
test <- function(...) {
x <- list(...)
if(exists(names(x))) names(x) <- get(names(x))
x
}
test(Gender = 'Tukey')
#$Gender
#[1] "Tukey"
test(varname = 'Tukey')
#$Gender
#[1] "Tukey"
What about this:
varname <- "Gender"
args <- list()
args[[varname]] <- "Tukey"
do.call(test, args)
I have a function, biovars(prec,tmin,tmax) with three variables as input. prec,tmin,tmax are multi bandGeoTiffs. However there are several Tiffs stored in different directories, that needed to be stacked before the can be passed to the function:
stack("/path1/to/file/prec1.tif")
stack("/path2/to/file/tmin1.tif")
stack("/path3/to/file/tmax1.tif")
How can I loop through each directory, stack the first file and use the result of the loop as input for the function:
biovars(prec1,tmin1,tmax1)
biovars(prec2,tmin2,tmax2)
...
biovars(precn,tminn,tmaxn)
Is there an easy way for doing this? I don't think it is possible with three for loops...
From your description, I think you can do something like this:
library(dismo)
# setwd(root of data files")
fprec <- list.files(".", pattern="prec", recursive=TRUE, full.names=TRUE)
ftmin <- gsub("prec", "tmin", fprec)
ftmax <- gsub("prec", "tmax", fprec)
fbio <- gsub("prec", "bio", fprec)
for (i in 1:length(fprec)) {
prec <- brick(fprec[i])
tmin <- brick(ftmin[i])
tmax <- brick(ftmax[i])
b <- biovars(prec, tmin, tmax, filename=fbio[i])
}
I always prefer to use *apply functions and avoid get. It also leads to easily read code (IMO).
prec_files = c("/path/prec1.tif", "/path/prec2.tif", ..., "/path/precn.tif")
tmin_files = c("/path/tmin1.tif", "/path/tmin2.tif", ..., "/path/tminn.tif")
tmax_files = c("/path/tmax1.tif", "/path/tmax2.tif", ..., "/path/tmaxn.tif")
prec_stacks <- lapply(prec_files, stack)
tmin_stacks <- lapply(tmin_files, stack)
tmax_stacks <- lapply(tmax_files, stack)
results <- mapply(FUN = biovar, prec_stacks, tmin_stacks, tmax_stacks)
If biovars returns a useful object, you can store the results of the mapply call. If you hit memory limitations by (I assume) loading all the rasters at once, you can wrap the loading and biovar call into one function:
stack_and_biovar <- function(prec_path, tmin_path, tmax_path) {
prec <- stack(prec_path)
tmin <- stack(tmin_path)
tmax <- stack(tmax_path)
biovar(prec, tmin, tmax)
}
mapply(FUN = stack_and_biovar, prec_files, tmin_files, tmax_files)
If I understand you correctly then there is a simple way to do it using either pmap or pwalk from purrr package.
Basically, you list all the files with given patterns then use pmap/pwalk to loop through that list and pass corresponding file names to raster::stack and biovars functions.
library(tidyverse)
pathGIS <- "/path1/to/file/"
filePattern <- list("^prec*", "^tmax*", "^tmin*")
# List files and sort them numerically using mixedsort from gtools
fileList <- map(filePattern, ~ gtools::mixedsort(list.files(path = pathGIS, recursive = TRUE,
pattern = .x, full.names = TRUE)))
head(fileList)
# ..1, ..2, etc. correspond to the sublist order within fileList
pwalk(fileList, ~ biovars(raster::stack(..1), raster::stack(..2), raster::stack(..3)))
pmap(fileList, ~ biovars(raster::stack(..1), raster::stack(..2), raster::stack(..3)))
Simple example
fileList <- list(c("./GIS/prec1.tif", "./GIS/prec2.tif"),
c("./GIS/tmax1.tif", "./GIS/tmax2.tif"),
c("./GIS/tmin1.tif", "./GIS/tmin2.tif"))
dummy_function <- function(precFile, tmaxFile, tminFile) {
cat("\n-----\n")
print(precFile)
print(tmaxFile)
print(tminFile)
cat("-----\n")
}
pwalk(fileList, ~ dummy_function(..1, ..2, ..3))
-----
[1] "./GIS/prec1.tif"
[1] "./GIS/tmax1.tif"
[1] "./GIS/tmin1.tif"
-----
-----
[1] "./GIS/prec2.tif"
[1] "./GIS/tmax2.tif"
[1] "./GIS/tmin2.tif"
-----
I have roughly this function:
plot_pca_models <- function(models, id) {
library(lattice)
splom(models, groups=id)
}
and I'm calling it like this:
plot_pca_models(data.pca, log$id)
wich results in this error:
Error in eval(expr, envir, enclos) : object 'id' not found
when I call it without the wrapping function:
splom(data.pca, groups=log$id)
it raises this error:
Error in log$id : object of type 'special' is not subsettable
but when I do this:
id <- log$id
splom(models, groups=id)
it behaves as expected.
Please can anybody explain why it behaves like this and how to correct it? Thanks.
btw:
I'm aware of similar questions here, eg:
Help understand the error in a function I defined in R
Object not found error with ddply inside a function
Object disappears from namespace in function
but none of them helped me.
edit:
As requested, there is full "plot_pca_models" function:
plot_pca_models <- function(data, id, sel=c(1:4), comp=1) {
# 'data' ... princomp objects
# 'id' ... list of samples id (classes)
# 'sel' ... list of models to compare
# 'comp' ... which pca component to compare
library(lattice)
models <- c()
models.size <- 1:length(data)
for(model in models.size) {
models <- c(models, list(data[[model]]$scores[,comp]))
}
names(models) <- 1:length(data)
models <- do.call(cbind, models[sel])
splom(models, groups=id)
}
edit2:
I've managed to make the problem reproducible.
require(lattice)
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100))
my.id <- data.frame(id = sample(letters[1:4], 100, replace = TRUE))
plot_pca_models2 <- function(x, ajdi) {
splom(x, group = ajdi)
}
plot_pca_models2(x = my.data, ajdi = my.id$id)
which produce the same error like above.
The problem is that splom evaluates its groups argument in a nonstandard way.A quick fix is to rewrite your function so that it constructs the call with the appropriate syntax:
f <- function(data, id)
eval(substitute(splom(data, groups=.id), list(.id=id)))
# test it
ir <- iris[-5]
sp <- iris[, 5]
f(ir, sp)
log is a function in base R. Good practice is to not name objects after functions...it can create confusion. Type log$test into a clean R session and you'll see what's happening:
object of type 'special' is not subsettable
Here's a modification of Hong Oi's answer. First I would recommend to include id in the main data frame, i.e
my.data <- data.frame(pca1 = rnorm(100), pca2 = rnorm(100), pca3 = rnorm(100), id = sample(letters[1:4], 100, replace = TRUE))
.. and then
plot_pca_models2 <- function(x, ajdi) {
Call <- bquote(splom(x, group = x[[.(ajdi)]]))
eval(Call)
}
plot_pca_models2(x = my.data, ajdi = "id")
The cause of the confusion is the following line in lattice:::splom.formula:
groups <- eval(substitute(groups), data, environment(formula))
... whose only point is to be able to specify groups without quotation marks, that is,
# instead of
splom(DATA, groups="ID")
# you can now be much shorter, thanks to eval and substitute:
splom(DATA, groups=ID)
But of course, this makes using splom (and other functions e.g. substitute which use "nonstandard evaluation") harder to use from within other functions, and is against the philosophy that is "mostly" followed in the rest of R.