Writing a function for initializing parameters in R/Splus

I'd like to write a function that will create and return a set of parameters to be used in a function mySimulation I've created. Until now, I've basically been doing, e.g., mySimulation(parm1 = 3, parm2 = 4). This is now suboptimal because (1) in the actual version, the number of parameters is becoming unwieldy and (2) I'd like to keep track of different combinations of the parameters that produce the different models I'm using. So, I wrote createParms (a minimally sufficient version shown below) to do the trick. My whole approach just seems so clunky though. With all the statisticians using R, I'm sure there's a more standard way of handling my issue...right?
createParms <- function(model = "default", ...) {
  # Returns a list `parms` of parameters which will then be used in
  # mySimulation(parms)
  #
  # Args:
  #   model: ["default" | "mymodel"] character string representation of a model
  #          with known parameters
  #   ...:   parameters of the existing `model` to overwrite.
  #          if nothing is supplied then the model parameters will be left as is.
  #          passed variables must be named.
  #          e.g., `parm1 = 10, parm2 = 20` is good. `10, 20` is bad.
  #
  # Returns:
  #   parms: a list of parameters to be used in mySimulation(parms)
  #
  parms.names <- c("parm1", "parm2")
  parms <- vector(mode = "list", length = length(parms.names))
  names(parms) <- parms.names
  overwrite <- list(...)
  overwrite.names <- names(overwrite)
  if (model == "default") {
    parms$parm1 <- 0
    parms$parm2 <- 0
  } else if (model == "mymodel") {
    parms$parm1 <- 1
    parms$parm2 <- 2
  }
  if (length(overwrite) != 0) {
    parms[overwrite.names] <- overwrite
  }
  return(parms)
}

I think if you know the combination of parameters to be used for each model, then it is better to create a data frame of model names and parameters as shown below
# create a data frame with model names and parameters
# NOTE: i am assuming all models have equal number of parameters
# if they are unequal, then store as list of models
model = c('default', 'mymodel');
parm1 = c(0.5, 0.75);
parm2 = c(1, 2);
models.df = data.frame(model, parm1, parm2)
You can now simulate any of the models by passing its name as an argument to your mySimulation function. I have used a dummy simulation example, which you can replace with your code.
# function to run simulation based on model name
mySimulation = function(model = 'default'){
  # find row corresponding to model of interest
  mod.row = match(model, models.df$model)
  # extract parameters corresponding to model
  parms = models.df[mod.row, -1]
  # run dummy simulation of drawing normal random variables
  sim.df = rnorm(100, mean = parms[,1], sd = parms[,2])
  return(sim.df)
}
If you now want to run all your simulations in one step, you can use the excellent plyr package and invoke
library(plyr)
sim.all = ldply(models.df$model, mySimulation)
If each of your simulations returns an unequal number of values, then you can use the function llply instead of ldply.
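For example, a minimal sketch, assuming each run returns a plain vector whose length may differ by model:
library(plyr)
# llply keeps each result as its own list element, so lengths may differ
sim.list <- llply(models.df$model, mySimulation)
names(sim.list) <- models.df$model
lengths(sim.list) # inspect how many values each model produced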
If you provide more information about the return values of your simulation and details on what it does, this code can be easily tweaked to get what you want.
Let me know if this works

If the simulation function always takes the same set of arguments, then Ramnath's approach of storing them in a data frame is best. For the more general case of variable inputs to mySimulation, you should store each set of inputs in a list, probably using a list of lists for running several simulations.
The idea behind your createParms function looks sound; you can simplify the code a little bit.
createParms <- function(model = "default", ...)
{
  # default case
  parms <- list(
    parm1 = 0,
    parm2 = 0
  )
  # other special cases
  if(model == "mymodel")
  {
    parms <- within(parms,
    {
      parm1 <- 1
      parm2 <- 2
    })
  }
  # overwrite from ...
  dots <- list(...)
  parms[names(dots)] <- dots
  parms
}
Test this with, e.g.,
createParms()
createParms("mymodel")
createParms("mymodel", parm2 = 3)
do.call may come in handy for running your simulation, as in
do.call(mySimulation, createParms())
EDIT: What do.call does for you
If you have parms <- createParms(), then
do.call(mySimulation, parms)
is the same as
with(parms, mySimulation(parm1, parm2))
The main advantage is that you don't need to spell out each parameter that you are passing into mySimulation (or to modify that function to accept the parameters in list form).
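For example, combining the test call from above with do.call (assuming mySimulation accepts parm1 and parm2 as named arguments, as in the question's original call):
parms <- createParms("mymodel", parm2 = 3)
do.call(mySimulation, parms) # same as mySimulation(parm1 = 1, parm2 = 3)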

Related

How do I repeat codes with names changing at every block? (with R)

I'm dealing with several outputs I obtain from QIIME, text files which I want to manipulate to obtain boxplots. Every input is formatted in the same way, so the manipulation is always the same; only the source name changes. For each input, I want to extract the last 5 rows, compute a mean for each column/sample, associate the values with the sample experimental labels (Group) taken from the mapfile, and put them in the order I use for making a boxplot of all 6 datasets obtained.
In bash, I do something like "for i in GG97 GG100 SILVA97 SILVA100 NCBI RDP; do cp ${i}/alpha/collated_alpha/chao1.txt alpha_tot/${i}_chao1.txt; done" to do a command various times changing the names in the code in an automatic way through ${i}.
I'm struggling to find a way to do the same with R. I thought of creating a vector containing the names and then using a for loop, indexing with [1], [2], etc., but it doesn't work: it stops at the read.delim line, not finding the file in the working directory.
Here's the manipulation code I wrote. After the comment, it will repeat itself 6 times with the 6 databases I'm using (GG97 GG100 SILVA97 SILVA100 NCBI RDP).
PLUS, I repeat this process 4 times because I have 4 metrics to use (here I'm showing shannon, but I also have a copy of the code for chao1, observed_species and PD_whole_tree).
library(tidyverse)
library(labelled)
mapfile <- read.delim(file="mapfile_HC+BV.txt", check.names=FALSE);
mapfile <- mapfile[,c(1,4)]
colnames(mapfile) <- c("SampleID","Pathology_group")
#GG97
collated <- read.delim(file="alpha_diversity/GG97_shannon.txt", check.names=FALSE);
collated <- tail(collated,5); collated <- collated[,-c(1:3)]
collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]
labels <- t(mapfile)
colnames(collated_reorder) <- labels[2,]
mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
mean = as.matrix(mean); mean <- t(mean)
GG97_shannon <- as.data.frame(rbind(labels[2,],mean))
GG97_shannon <- t(GG97_shannon);
DB_type <- list(DB = "GG97"); DB_type <- rep(DB_type, 41)
GG97_shannon <- as.data.frame(cbind(DB_type,GG97_shannon))
colnames(GG97_shannon) <- c("DB","Group","value")
rm(collated,collated_reorder,DB_type,labels,mean)
Here I paste all the outputs together, freeze the order and make the boxplot.
alpha_shannon <- as.data.frame(rbind(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon))
rownames(alpha_shannon) <- NULL
rm(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon)
alpha_shannon$Group = factor(alpha_shannon$Group, unique(alpha_shannon$Group))
alpha_shannon$DB = factor(alpha_shannon$DB, unique(alpha_shannon$DB))
library(ggplot2)
ggplot(data = alpha_shannon) +
  aes(x = DB, y = value, colour = Group) +
  geom_boxplot() +
  labs(title = 'Shannon',
       x = 'Database',
       y = 'Diversity') +
  theme(legend.position = 'bottom') +
  theme_grey(base_size = 16)
How do I keep this code "DRY" so that I don't need 146 rows of code repeating the same things over and over? Thank you!!
You didn't provide a minimal reproducible example, so this answer cannot guarantee correctness.
An important point to note is that you use rm(...), so this means some variables are only relevant within a certain scope. Therefore, encapsulate this scope into a function. This makes your code reusable and spares you the manual variable removal:
process <- function(file, DB){
  # -> Use the function parameter `file` instead of a hardcoded filename
  collated <- read.delim(file=file, check.names=FALSE);
  collated <- tail(collated,5); collated <- collated[,-c(1:3)]
  collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]
  labels <- t(mapfile)
  colnames(collated_reorder) <- labels[2,]
  mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
  mean = as.matrix(mean); mean <- t(mean)
  # -> rename this variable to a more general name, e.g. `result`
  result <- as.data.frame(rbind(labels[2,],mean))
  result <- t(result);
  # -> Use the function parameter `DB` instead of a hardcoded string
  DB_type <- list(DB = DB); DB_type <- rep(DB_type, 41)
  result <- as.data.frame(cbind(DB_type,result))
  colnames(result) <- c("DB","Group","value")
  # -> After the end of this function, the variables defined in this function
  #    vanish automatically, you just need to specify the result
  return(result)
}
Now you can reuse that block:
GG97_shannon <- process(file = "alpha_diversity/GG97_shannon.txt", DB = "GG97")
GG100_shannon <- process(file =...., DB = ....)
SILVA97_shannon <- ...
SILVA100_shannon <- ...
NCBI_shannon <- ...
RDP_shannon <- ...
Alternatively, you can use looping structures:
General-purpose for:
datasets <- c("GG97_shannon", "GG100_shannon", "SILVA97_shannon",
              "SILVA100_shannon", "NCBI_shannon", "RDP_shannon")
files <- c("alpha_diversity/GG97_shannon.txt", .....)
DBs <- c("GG97", ....)
result <- list()
for(i in seq_along(datasets)){
  result[[datasets[i]]] <- process(files[i], DBs[i])
}
mapply, a "specialized for" for looping over several vectors in parallel:
# the first argument is the function from above, the other ones are given as arguments
# to our process(.) function; SIMPLIFY = FALSE keeps the data-frame results in a list
results <- mapply(process, files, DBs, SIMPLIFY = FALSE)
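The question also mentions four metrics; the same machinery extends to every metric/database combination. A sketch, assuming the file names follow the "alpha_diversity/<DB>_<metric>.txt" pattern shown in the question:
metrics <- c("shannon", "chao1", "observed_species", "PD_whole_tree")
DBs     <- c("GG97", "GG100", "SILVA97", "SILVA100", "NCBI", "RDP")
combos  <- expand.grid(DB = DBs, metric = metrics, stringsAsFactors = FALSE)
files   <- file.path("alpha_diversity", paste0(combos$DB, "_", combos$metric, ".txt"))
# one processed data frame per DB/metric combination, named e.g. "GG97_shannon"
results <- mapply(process, files, combos$DB, SIMPLIFY = FALSE)
names(results) <- paste(combos$DB, combos$metric, sep = "_")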

Create multiple functions/varying arguments in R using a list

I'm trying to create multiple functions with varying arguments.
Just some background: I need to compute functions describing 75 days respectively and multiply them later to create a maximum-likelihood function. They all have the same form; they only differ in some arguments. That's why I wanted to do this via a loop.
I've tried to put all the equations in a list to have access to them later on.
The list this loop generates has 75 elements, but they're all the same, as the [i] in the defined function is not taken into account by the loop, meaning that M_b[i] (where M_b is a vector with 75 elements) does not vary.
Does someone know why this is the case?
simplified equation used
for (i in 1:75){
  log_likelihood[[i]] <-
    list(function(e_b,mu_b){M_b[i]*log(e_b*mu_b)})
}
I couldn't find an answer to this in other questions. I'm sorry if there's already a similar thread.
You need to force the evaluation of the variable M_b[i]; see https://adv-r.hadley.nz/function-factories.html. Below I try to make it work:
func = function(i){
  force(i)
  f = function(e_b,mu_b){ i*log(e_b*mu_b) }
  return(f)
}
# test
func(9)(7,3) == 9*log(7*3)
#some simulated values for M_b
M_b = runif(75)
log_likelihood = vector("list",75)
for (idx in 1:75){
  log_likelihood[[idx]] <- func(M_b[idx])
}
# we test it on say e_b=5, mu_b=6
test = sapply(log_likelihood,function(i)i(5,6))
actual = sapply(M_b,function(i)i*log(5*6))
identical(test,actual)
[1] TRUE
This is called lazy evaluation, where R doesn't evaluate an expression until it is used. As correctly pointed out by @SDS0, the value you get is the one at i = 75. We try it with your original function:
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
M_b = 1:3
log_likelihood = vector("list",3)
for (idx in 1:3){
  log_likelihood[[idx]] = func(M_b[idx])
}
sapply(log_likelihood,function(f)f(5,6))
[1] 10.20359 10.20359 10.20359
#you get 10.20359 which is M_b[3]*log(5*6)
There is one last option, which I just learned of: use lapply, which avoids the trap because each call to the anonymous function gets its own idx binding (and, since R 3.2.0, lapply also forces its arguments):
func = function(i){function(e_b,mu_b){i*log(e_b*mu_b) }}
log_likelihood = lapply(1:3,function(idx)func(M_b[idx]))
sapply(log_likelihood,function(f)f(5,6))
[1] 3.401197 6.802395 10.203592
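A compact equivalent is Map(), which wraps mapply() with SIMPLIFY = FALSE, so it builds the whole list of closures in one call (a small sketch):
log_likelihood <- Map(func, M_b)
sapply(log_likelihood, function(f) f(5,6))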

passing default values from outer function to repeatedly called inner function in R

This question differs from my original; it adheres more to a minimal reproducible example and incorporates a recommendation by be_green against silently loading entire libraries within the context of a function.
The outer function starts by defining a number of cases, default values, and a list of any case exceptions. The inner function assembles each case by using the default values in a computation unless exceptions are defined. Finally, the outer function assembles these cases into a data frame.
Here is the function:
outerfun <- function(cases, var_default, exceptions=list()){
  # Inner Function to create a case
  innerfun <- function(var=var_default) { # Case
    result = var
    return(result)
  }
  # Combine Cases
  datlist <- list()
  for(case in 1:cases){
    datlist[[paste0("X",case)]] <- do.call(innerfun, as.list(exceptions[[paste0("X",case)]]))
  }
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
This function works fine when I define values for the inner function as exceptions:
data <- outerfun(cases = 3, var_default = 10, exceptions = list("X2" = c(var = 14)))
But not when I mix the two:
data <- outerfun(cases = 3, var_default = 10, exceptions =
list("X2" = c(var = var_default + 4)))
Being able to mix the two is important, since it makes the function more intuitive and easier to program for a variety of cases.
I think the problem might result from using do.call, and I have seen other threads detailing this issue (having to do with environments and frames), but I haven't been able to find an optimal solution for me. I like do.call since I can pass a list of arguments into a function. I could make the inner function accept everything through the ellipsis (think: function(...) { }), but then I would have to define every variable instead of relying on the default.
Any help or suggestions you might have would be great.
The problem is that lvl_default is not defined outside the context of the function, and yet you call it as an input to a parameter. Because there is no variable called lvl_default in the global environment, when the function tries to evaluate the parameter exceptions = list(X3 = c(lvl = lvl_default + 10)), it fails to find a variable to evaluate. You are not able to specify parameters by setting them equal to the names of other unevaluated parameters.
Instead, what I would recommend doing is setting a variable outside the function associated with the value you were hoping to pass into lvl_default and then pass it into the function like so:
level <- 1000
data <- genCaseData(n_signals = 3, datestart = "2017-07-01T15:00:00",
                    n_cycles = 4, period_default = 10, phase_default = 0, ampl_default = 15,
                    lvl_default = level, exceptions = list(X1 = c(lvl = 980),
                                                           X3 = c(lvl = level + 10)))
Also as I noted in a comment, I would recommend against silently loading entire libraries within the context of a function. You can end up masking things you didn't mean to, and running into strange errors because the require call doesn't actually throw one if a library is unavailable. Instead I would reference the functions through pkgname::fncname.
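For example, a minimal sketch of that pattern, using the dplyr call that already appears in the function above:
# check availability explicitly instead of require()-ing inside the function,
# then reference the function with :: so nothing is attached or masked
if (!requireNamespace("dplyr", quietly = TRUE)) stop("dplyr is required")
casedata <- do.call(dplyr::data_frame, datlist)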
be_green did solve this first, but I wanted to follow up with what I actually did for my project.
As be_green pointed out, I couldn't call var_default within the exception list since it hadn't yet been defined. I didn't understand this at first, since you can actually define the default of an argument in terms of a variable defined within the function itself:
addfun <- function(x, y = z + x + 2) {
  z = 20
  c(x, y)
}
addfun(x = 20)
[1] 20 42
This is because function arguments in R are lazily evaluated. I thought this gave me a pass to call the function like this:
addfun(x = 10, y = x + z)
Error in addfun(x = 10, y = x + z) : object 'x' not found
If you remove x, it then throws an error for z. So even though the default for y depends on x and z, you can't call the function using x or z.
be_green suggested that I pass arguments in a string and then parse it within the function. But I was afraid that others on my team would find the resulting syntax confusing.
Instead, I used the ellipsis (...) and evaluated the ellipsis arguments within my function. I did this using this line of code:
list2env(eval(substitute(alist(...))), envir = as.environment(-1))
Here the eval(substitute(alist(...))) pattern is common but results in a named list of arguments. Due to some other features, it becomes more convenient to evaluate the arguments as objects within the function. list2env(x, envir = as.environment(-1)) accomplishes this with an additional step. Once the argument is called, you need to explicitly evaluate the call. So if I wanted to change my addfun() above:
addfun <- function(x, ...) {
  z = 20
  list2env(eval(substitute(alist(...))),
           envir = as.environment(-1))
  c(x, eval(y))
}
addfun(x = 10, y = x + z)
This is a trite example: I now need to define y even though it's not an argument in the function. But now I can even re-define z within the function call:
addfun(x = 10, y = z + 2, z = 10)
This is all possible because of non-standard evaluation. There can be trade-offs but in my application of non-standard evaluation, I was able to increase the usability and flexibility of the function while making it more intuitive to use.
Final code:
outerfun <- function(caseIDs, var_default, ...){
  list2env(eval(substitute(alist(...))), envir = as.environment(-1))
  # Inner Function to create a case
  innerfun <- function(var=var_default) { # Case
    result = var
    return(result)
  }
  # Combine Cases
  datlist <- lapply(caseIDs, function(case) {
    do.call(innerfun, eval(get0(case, ifnotfound = list())))
  })
  names(datlist) <- caseIDs
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
Now both examples work with full functionality:
data <- outerfun(caseIDs = c("X1","X2","X3"), var_default = 10,
                 X2 = list(var = 14))
data <- outerfun(caseIDs = c("X1","X2","X3"), var_default = 10,
                 X2 = list(var = var_default + 4))
I hope this helps someone else! Enjoy!

In a custom R function that calls ezANOVA: How do I parameterize the dv?

I'm trying to use ezANOVA from the ez package within a function where I want to allow the dv to be specified using a parameter. Normally, ezANOVA will accept the column variable as a symbol or character string (see "This Works" below). However, trying to give ezANOVA a parameter that holds a symbol or character doesn't work (see "This Does Not Work" below). ezANOVA complains that '"the_dv" is not a variable in the data frame provided'. I've tried wrapping the variable name in various methods like as.symbol(), as.formula(), and even tried various ways to incorporate eval() and substitute(), but all with no luck. How is this achieved?
If the why of it helps, I have a project where I need to do many compound analyses (means, ANOVAs, post-hocs, graphs) that are identical except for the dataset or the variable being analyzed. I want a function so I can write it once and run it many times. The code below is just a simple example.
library(ez)
df <- data.frame(ID = as.factor(101:120),
                 Training = rep(c("Jedi", "Sith"), 10),
                 Wins = sample(1:50, 20),
                 Losses = sample(1:50, 20))
# ----------
# This Works
# ----------
myfunc1 <- function(the_data) {
  ezANOVA(
    data = the_data,
    wid = ID,
    dv = Wins,
    between = Training
  )
}
myfunc1(the_data = df)
# ------------------
# This Does Not Work
# -------------------
myfunc2 <- function(the_data, the_dv) {
  ezANOVA(
    data = the_data,
    wid = ID,
    dv = the_dv,
    between = Training
  )
}
myfunc2(the_data = df, the_dv = Wins) # 'Wins' also fails
Had to solve this one myself. Turns out that a combination of eval() and substitute() solves this puzzle:
# ----------------------------------
# Aha, it works!
# ----------------------------------
library(ez)
df <- data.frame(ID = as.factor(101:120),
                 Training = rep(c("Jedi", "Sith"), 10),
                 Wins = sample(1:50, 20),
                 Losses = sample(1:50, 20))
myfunc2 <- function(the_data, the_dv) {
  eval(
    substitute(
      ezANOVA(data = the_data,
              wid = ID,
              dv = the_dv,
              between = Training),
      list(the_dv = the_dv)))
}
myfunc2(the_data = df, the_dv = 'Wins')
myfunc2(the_data = df, the_dv = 'Losses')
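With the dv parameterized, running the same analysis over several DVs becomes a one-liner; a small usage sketch based on the example data above:
# loop the parameterized function over both outcome columns
results <- lapply(c("Wins", "Losses"),
                  function(dv) myfunc2(the_data = df, the_dv = dv))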
Enjoy!!

How to run a script for multiple inputs and save result objects based on the name of input?

To run the same set of commands and save the result objects for each time series, I wrote the script in the following manner:
# Specify time series to be used
dat <- tsname
# Run a set of commands and fit models with different parameters
dat.1 <- model1(dat)
dat.2 <- model2(dat)
dat.3 <- model3(dat)
# Save objects for further analysis
tsname.1 <- dat.1
tsname.2 <- dat.2
tsname.3 <- dat.3
save(tsname.1, tsname.2, tsname.3, file = paste0("tsname", ".rda"))
In this way, we just need to change the script at the beginning and end, save the script for each time series, and run each of them individually or in a main script.
The main reason for this method was that I could not find a way to rename the objects created, and some searching suggested that the above is the only way to do it.
Now as the number of series has increased, it is preferable to either use a for loop, foreach, batch script or commandArgs() to run one script and specify all time series as arguments.
To make that work though, the script must find a way to assign these objects with the name of the series itself, so that they can be loaded later for further analysis.
How can we make such a script work, or is there a better approach? Which method of looping will work in that case?
A MWE
set.seed(1)
tsdata <- ts(rnorm(250), start = c(1980,1), frequency = 12)
dat <- tsdata
dat.11 <- arima(dat, order = c(1, 1, 1))
dat.21 <- arima(dat, order = c(2, 1, 0))
tsname.11 <- dat.11 # problem is to specify this step in each script
tsname.21 <- dat.21
save(tsname.11, tsname.21, file = "tsname.rda")
REVISED the code
How can we execute this script for multiple time series and store the results and result objects for further analysis? If a batch command can be used, what is the best way to input a set of multiple time series?
How can we run the script, written for one series, over a set of time series of the same or mixed lengths?
I show a couple ways to create and retrieve individual objects using assign and get, but also provide an alternative where all model runs are stored as different elements of a list. Similarly, I show how you can save each model run in separate files (soi.1.rda, etc), but that you can also save everything together, in one step :)
# ===========================================
# = Set up model params, generate test data =
# ===========================================
mod.param <- 1:5 # orders of AR to try ...
test.soi <- arima.sim(model=list(ar=c(0.5, -0.2)), n=20)
# ===========================================================
# = Create empty vectors/ list to store data and data names =
# ===========================================================
dat.names <- c() # a place to store the names of the individual objects that we'll create
save.names <- c() # the names of the files to save, e.g., "soi.1"
dat.all <- list() # as an alternative, you can save each analysis in different elements of a list
# ===================================================
# = Loop through each type of model, saving results =
# ===================================================
for(i in 1:length(mod.param)){ # loop through each model you want to run
  temp.dat <- arima(test.soi, order=c(mod.param[i], 0, 0)) # temp.dat is the current model result
  dat.names[i] <- paste("dat", i, sep=".") # dat.names stores the names of all the dat.x objects
  assign(dat.names[i], temp.dat) # use assign() to create an object with name of temp.dat.name
  # dat.all[[dat.names[i]]] <- temp.dat # store the object in a list
  dat.all[[dat.names[i]]] <- get(dat.names[i]) # same as above, but using get(), which complements assign() nicely
  save.name <- paste("soi", i, "rda", sep=".") # I'm assuming the file should be named soi.1.rda, not soi.rda
  save(list=dat.names[i], file=save.name) # save soi.1.rda, soi.2.rda ... etc.
}
# But we don't have to save each file individually!
# We can save a file that contains our list of models (dat.all), as well as each model object (dat.1, dat.2 ... etc.)
all.objs <- ls() # what are all of the object names in our working memory?
dat.objs <- all.objs[all.objs%in%c(dat.names, "dat.all")] # subset to the names of objects we want to save
save(list=dat.objs, file="everything.rda") # save all relevant objects in 1 .rda file
print(dat.1)
print(dat.all$dat.1)
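Later, e.g. in a fresh session, the single file restores everything at once; a quick sketch:
load("everything.rda") # restores dat.all as well as dat.1 ... dat.5
print(dat.all$dat.3)   # individual model runs are available again by name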
Edit: A different approach that applies each of several models to several time series
Note that the approach might change slightly depending on which models you want to apply to which time series. I've assumed that several models should be applied to each time series, and that the models differ only in the ARIMA order.
The results can be saved as 1 nested list (different model results grouped under different time series), or with model results for each time series being saved as a separate file.
# ============================================================
# = Generate many time series, many sets of model parameters =
# ============================================================
# Model parameters
n.Params <- 5
ar.orders <- 1:n.Params # orders of AR to try ...
i.orders <- rep(0, n.Params)
ma.orders <- rep(0,n.Params)
arima.params <- as.list(as.data.frame(rbind(ar.orders, i.orders, ma.orders)))
# Time Series Data
n.ts <- 10 # number of time series
test.soi <- quote(as.numeric(arima.sim(model=list(ar=c(0.2, 0.4)), n=sample(20:30, 1))))
all.soi.ts <- replicate(n.ts, eval(test.soi))
names(all.soi.ts) <- paste("soi", 1:n.ts, sep=".")
# ==============================================
# = Function to be applied to each time series =
# ==============================================
# Analyze time series
ats <- function(TS, arimaParams){
  dat.all <- list() # as an alternative, you can save each analysis in different elements of a list
  for(i in 1:length(arimaParams)){ # loop through each model you want to run
    temp.dat <- arima(TS, order=arimaParams[[i]]) # temp.dat is the current model result
    dat.all[[i]] <- temp.dat # store the object in a list
  }
  dat.all
}
# =========================
# = All Results in 1 List =
# =========================
AllResults <- lapply(all.soi.ts, ats, arima.params) # multilevel list – top level is each TS, within each TS group are the results of all models applied to that time series
save(AllResults, file="everything.rda") # save this big list as 1 file
# ========================================================================
# = Each time series gets its own file and its own list of model results =
# ========================================================================
for(i in 1:length(all.soi.ts)){ # if you want many files, 1 file per time series, use this for loop
  temp.ts <- all.soi.ts[[i]]
  soi.name <- paste("soi", i, sep=".")
  assign(soi.name, ats(temp.ts, arima.params))
  save(list=soi.name, file=paste(soi.name, "rda", sep=".")) # each file will have a name like "soi.1.rda", containing the results of all models applied to the first time series
}
The function sets datname to the name of the input variable (via deparse(substitute(dat))). Then it defines a list L of model outputs and adds names. Finally it uses with(L, ...) to regard the list component names as variable names inside ..., and save(list = ..., ...), which allows the variables to be specified as a character vector of names. Now we only have to set up the data and call the function to run it. If you have several data sets, call the function for each one.
run <- function(dat, datname = deparse(substitute(dat))) {
  L <- list(
    arima(dat, order = c(1, 1, 1)),
    arima(dat, order = c(2, 1, 0))
  )
  names(L) <- paste(datname, seq_along(L), sep = ".")
  with(L, save(list = names(L), file = paste0(datname, ".rda")))
}
set.seed(1)
soi <- ts(rnorm(250), start = c(1980,1), frequency = 12)
run(soi)
Another possibility might be to save the entire list rather than its components separately. That is, replace the with statement with
listname <- paste0(datname, ".models")
assign(listname, L)
save(list = listname, file = paste0(datname, ".rda"))
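To retrieve the saved list later, a small sketch (the names follow from the run(soi) example above):
load("soi.rda")       # restores the soi.models list
soi.models[["soi.1"]] # the arima(order = c(1, 1, 1)) fit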
REVISED Some corrections and added alternative at end.
When you want to manipulate objects whose names are themselves stored inside a variable, just use assign() and its reverse get(). And use ls() to see which objects exist in a particular scope.
Do the objects really need to be stored separately as tsname.1/2/3 and model1/2/3?
You can make it really simple if you just store them in a list, dat[1:3].
Indeed you can have a list of models, model[1:3], too. Use vectorization; it's your friend.
Use the assign("tsname.21", object,...) command and its reverse get("tsname.21") to manipulate objects by string name. Just be consistent about whether you prefer to refer to objnames or objects.
set.seed(1)
tsdata <- ts(rnorm(250), start = c(1980,1), frequency = 12)
dat <- tsdata
create_model <- function(data, params, objname.prefix='tsname.', envir=.GlobalEnv) {
  objname = paste(objname.prefix, params[1], params[2], sep='') # build the object name, e.g. "tsname.11"
  the.model <- arima(data, order = params)
  assign(objname, the.model, envir) # create the var in the global env
  # If you want, you can return the varname
  return(objname)
}
# dat.11 <- arima(dat, order = c(1, 1, 1))
create_model(dat, c(1, 1, 1))
# dat.21 <- arima(dat, order = c(2, 1, 0))
create_model(dat, c(2, 1, 0))
#tsname.11 <- dat.11 # problem is to specify this step in each script
#tsname.21 <- dat.21
save(tsname.11, tsname.21, file = "tsname.rda")
# Use `ls(pattern=...)` to find object-names, with wildcard matching.
all.models <- ls(pattern='tsname.*')
#[1] "tsname.11" "tsname.21"
#############
# Refactor your original code similarly.
dat <- tsname
# Run a set of commands and fit models with different parameters
# (model fits are complex objects, so collect them in a list rather than a vector)
dat.models <- list()
dat.models[[1]] <- model1(dat)
dat.models[[2]] <- model2(dat)
dat.models[[3]] <- model3(dat)
# or maybe figure out how to use sapply here
# Save objects for further analysis
tsname <- dat.models[1:2] # instead of tsname.1 <- dat.1, tsname.2 <- dat.2
#
save(tsname, file = paste0("tsname", ".rda"))
