Variable in a function is not used as the value - r

I am using the package robust.arima in R, which works fine when I call it in a script. However, I want to organize my files and therefore call robust arima in a function. Here all of a sudden the variable is not found. Let me give an example
# Works fine
ts_list <- rnorm(100)
arima.rob(ts_list~1)
# Breaks down
get_rob_estimate <- function(x){
out <- arima.rob(x~1)
return(out)
ts_list <- rnorm(100)
get_rob_estimate(ts_list)
Error in eval(formula[[2]]) : object 'x' not found
Does anyone know what's going on? I think the problem looks similar to R : Pass argument to glm inside an R function , but I still can't seem to figure it out and I am curious how R processes these functions?
Edit
Okay for the basic option I understand it now, but I don't get why it works. What if I have
check_func <- function(ind_ts){
out <- substitute(arima.rob(ind_ts~1))
return(eval(out))
}
analyze_ts <- function(){
df <- mvrnorm(100, mu=c(0,0,0), Sigma=diag(c(1,1,1)))
p <- list()
for (i in ncol(df)){
sel <- df[,i]
check_func(sel)
p <- append(p, sel)
}
return(p)
}
analyze_ts()
I then get the error
Error in eval(formula[[2]]) : object 'sel' not found
How does it work? What is going on here? I just want my list to go as a list in my function, shouldn't be so hard right? Does not matter how many functions it goes through?

Using substitute()
get_rob_estimate <- function(x) {
out <- substitute(robustarima::arima.rob(x ~ 1))
return(eval(out))
}
get_rob_estimate(ts_list)
# Call:
# robustarima::arima.rob(formula = ts_list ~ 1)
#
# Regression Coefficients:
# (Intercept)
# 0.1032
#
# Degrees of freedom: 100 total; 99 residual
# Innovations standard deviation: 0.9832
#
# Number of outliers detected: 1
#
# Outlier index
# [1] 59
#
# Outlier type
# [1] "AO"
#
# Outlier impact
# [1] -3.0963
#
# Outlier t-statistics
# [1] 3.1493
edit
You can write your Arima wrapper correctly like so:
analyze_ts <- function(){
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1, 1, 1)))
for (i in seq_len(ncol(df))) {
sel <- df[,i]
sel <- check_func(sel)
p <- append(p, sel)
}
return(p)
}
Better using lapply
analyze_ts <- function() {
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1,1,1)))
return(lapply(seq_len(ncol(df)), \(i) check_func(df[,i])))
}
Usage:
set.seed(42) ## for sake of reproducibility
analyze_ts()
Data:
set.seed(42)
ts_list <- rnorm(100)

Related

How to cycle through a list of functions in a for loop

I am fairly new to programming in R, so I apologize if this question is too basic. I am trying to study the properties of OLS with error terms created by three different processes (i.e., normal1, normal2, and chi-square). I include these in a list, 'fun_list'.
I would like to iterate through 1,000 (iter) regressions, each with sample size 500 (n). I would like to save all 1,000 X 500 observations in a dataset (big_data) as well as the regression results (reg_results).
At the end of the program, I would like 1,000 regressions for each of the three processes (for a total of 3,000 regressions). I have set up nested loops for the three functions on one level and the 1,000 iterations on a different (sub-) level. I am having trouble getting the program to loop through the three different functions. I am not sure how to call out each element of the list in this embedded loop. Any help would be greatly appreciated!
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- rep(NA,iter)
sen <- rep(NA,iter)
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
#if I remove the following line (for (j ....)), I can get this to work by
#referencing each function independently (i.e., using 'en <- f1(n)').
for (j in fun_list) {
for (s in 1:iter) {
# en <- f1(n)
en <- fun_list[[1]]
x <- rnorm(n, 0, .5)
yn <- .3*x + en
#this is the part that saves the data#
dat <- data.frame(yn, x, en)
dat$s <- s
datalist[[s]] <- dat
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef (lm1n)[1]
b.hatn <- coef (lm1n)[2]
se.hatn <- se.coef (lm1n) [2]
##save them for each iteration
bn[s] = b.hatn
sen[s] = se.hatn
}
}
reg_results<- tibble(bn, sen)
big_data = do.call(rbind,datalist)
When using the loop, I get the following error:
Error in 0.3 * x + en : non-numeric argument to binary operator
I am assuming this is because I do not fully understand how to call out each of the three functions in the list.
Here is a complete solution which wraps the multiple points discussed in the comments:
library(psych)
library(arm)
library(dplyr)
library(fBasics)
library(sjstats)
#set sample size and number of iterations
set.seed(12345)
n <- 500
iter <- 1000
#setting empty vectors. Probably a better way to do this. :)
bn <- c()
sen <- c()
#these are the three functions I want to use to generate en,
#which is the error term below. I want one loop for each of the three.
# I can get f1, f2 and f3 to work independently, but I can't get the list
#to work to cycle through all three.
f1 <- function (n) {rnorm(n, 0, 2)}
f2 <- function (n) {rnorm(n, 0, 10)}
f3 <- function (n) {rchisq(n, 2)}
fun_list <- list(f1, f2, f3)
#following line starting point for saving all iterations in one big
#dataset
datalist = list()
#if I remove the following line (for (j ....)), I can get this to work by
#referencing each function independently (i.e., using 'en <- f1(n)').
for (j in c(1:length(fun_list))) {
en <- fun_list[[j]]
for (s in 1:iter) {
x <- rnorm(n, 0, .5)
random_part <- en(n)
yn <- .3*x + random_part
#this is the part that saves the data#
dat <- data.frame(yn, x, random_part)
dat$s <- s
datalist[[s]] <- dat
#### run model for normal data and save parameters###
lm1n <- lm(yn ~ x)
int.hatn <- coef(lm1n)[1]
b.hatn <- coef(lm1n)[2]
se.hatn <- se.coef(lm1n)[2]
##save them for each iteration
bn = c(bn,b.hatn)
sen = c(sen,se.hatn)
}
}
reg_results<- tibble(bn, sen)
big_data = do.call(rbind,datalist)

Using snowfall package in R to do some simulation

I am staring work with snowfall package in that way:
library(snowfall)
sfInit(parallel=TRUE, cpus=6, type="SOCK")
#loading packages
sfLibrary(package = lars)
sfLibrary(package=covTest)
Function that I want to compute multiple times using sfLapply:
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
Y <- X%*%beta+rnorm(100)
lasso.lars <- lars(X,Y,intercept=FALSE,use.Gram=FALSE)
test <- covTest(lasso.lars,X,Y,sigma.est=1)
test
}
But when I try this
sfLapply(1:100,funkcja)
I get error:
"Error in checkForRemoteErrors(val): 6 nodes produced errors; first error: object 'Y' not found". But when I hide the last but one line and change test for lasso.lars then there is no longer trobule about vector Y:
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
Y <- X%*%beta+rnorm(100)
lasso.lars <- lars(X,Y,intercept=FALSE,use.Gram=FALSE)
#test <- covTest(lasso.lars,X,Y,sigma.est=1)
lasso.lars
}
I dont understand this because the line
test <- covTest(lasso.lars,X,Y,sigma.est=1)
should work since
lars(X,Y,intercept=FALSE,use.Gram=FALSE)
can work. I will be grateful for your help.
My guess is that Y is hiding and internal variable. The following function works (changed the case of "Y" to "y")
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
y <- X %*% beta + rnorm(100)
lasso.lars <- lars(X,y,intercept=FALSE,use.Gram=FALSE)
test <- covTest(lasso.lars,X,y,sigma.est=1)
test
}

R issue : performing lm and then a boxcox to find a proper lambda value

I have a dataset of climate data in a data.frame (columns are measuring stations, and rows indicate time of measurement), and I'm trying to find the proper lambda values in a Yeo-Johnson transform to limit skewness impact on a principal component analysis.
Obviously, the first step is to get log likelihoods to find the best lambda : I use the following, where i is the index of a column :
getYeoJohsnonLambda <- function(myClimateData,cols,lambda_min, lambda_max,eps)
...
lambda <- seq(lambda_min,lambda_max,eps)
for(i in cols)
{
formula <- as.formula(paste("myClimateData$",colnames(myClimateData)[i],"~1"))
currentModel <- lm(formula,myClimateData)
print(currentModel)
myboxCox <- boxCox(currentModel, lambda = lambda ,family="yjPower", plotit = FALSE)
...
}
When I am trying to call it for a climateData time series which could be, for example :
`climateData <-data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))`
I get this error : Error in is.data.frame(data) : object 'myClimateData' not found
This is weird, as lm seems to find it and return a correct fit, and myClimateData should be found as it is one of the arguments of the function, right ?
Sadly, it seems that the problem comes from the function boxCox rather than your getYeoJohsnonLambda function. As BrodieG pointed out in a related question, this function uses parent.frame as an argument to eval which is considered as bad practice in the doc.
One way to solve this is to build the models before the call, as suggested in Adam Quek's answer:
library(car)
climateData <- data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))
names(climateData) <- c("a","b")
modelList <- list()
for(k in 1:ncol(climateData)) {
modelList[[k]] <- lm(as.formula(paste0(names(climateData)[k],"~1")),data=climateData)
}
getYeoJohnsonLambda <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
#Recommended values for lambda_min = -0.5 and lambda_max = 2.0, eps = 0.1
myboxCox <- list()
lmd <- seq(lambda_min,lambda_max,eps)
for(i in cols)
{
cat("Creating model for column # ",i,"\n")
currentModel <- modelList[[i]]
myboxCox[[i]] <- boxCox(currentModel, lambda = lmd ,family="yjPower", plotit = FALSE)
}
return(myboxCox)
}
test <- getYeoJohnsonLambda(climateData,c(1,2) ,-0.5,2,0.1)
Other solution (arguably cleaner): use yeo.johnson in VGAM
library(VGAM)
getYeoJohnsonLambda_VGAM <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
#Recommended values for lambda_min = -0.5 and lambda_max = 2.0, eps = 0.1
myboxCox <- list()
lmd <- seq(lambda_min,lambda_max,eps)
return(apply(climateData,2,yeo.johnson,lambda=lmd))
}
test2 <- getYeoJohnsonLambda_VGAM(climateData,c(1,2) ,-0.5,2,0.1)
Here's a solution without troubleshooting the function getYeoJohsnonLambda:
iris.dat <- iris[-5]
vars <- names(iris.dat)
lmd <- seq(.1, 1, .1) #lambda_min, lambda_max, eps
all.form <- lapply(vars, function(x) as.formula(paste0(x, "~ 1")))
all.lm <- lapply(all.form, lm, data=iris.dat)
library(MASS)
all.bcox <- lapply(all.form, boxcox, data=iris.dat,
lambda=lmd, family="yjPower", plotit=FALSE)

Plotting specific variable in runjags output

When plotting runjags output, how does one plot a single specific variable, when many other variables have similar names? Providing a quoted variable name with the varsargument doesn't seem to do it (it still provides all partial matches).
Here is a simple reproducible example.
N <- 200
nobs <- 3
psi <- 0.35
p <- 0.45
z <- rbinom(n=N, size=1,prob=psi)
y <- rbinom(n=N, size=nobs,prob=p*z)
sink("model.txt")
cat("
model {
for (i in 1:N){
z[i] ~ dbern(psi)
pz[i] <- z[i]*p
y[i] ~ dbin(pz[i],nobs)
} #i
psi ~ dunif(0,1)
p ~ dunif(0,1)
}
",fill = TRUE)
sink()
m <-list(y=y,N=N,nobs=nobs)
inits <- function(){list(psi=runif(1),p=runif(1),z=as.numeric(y>0))}
parameters <- c("p","psi")
ni <- 1000
nt <- 1
nb <- 200
nc <- 3
ad <- 100
library(runjags)
out <- run.jags(model="model.txt",monitor=parameters,data=m,n.chains=nc,inits=inits,burnin=nb,
sample=ni,adapt=ad,thin=nt,modules=c("glm","dic"),method="parallel")
windows(9,4)
plot(out,plot.type=c("trace","histogram"),vars="p",layout=c(1,2),new.window=FALSE)
It should be possible to double quote variables to get an exact match, but this seems to be broken. It should also be possible to specify a logical vector to vars but this seems to be broken for the plot method ... how embarrassing. The following does work though:
# Generate a logical vector to use with matching variable names:
variables <- extract(out, 'stochastic')
variables['psi'] <- FALSE
# Add summary statistics only for the specified variables and pre-draw plots:
out2 <- add.summary(out, vars=variables, plots=TRUE)
plot(out2, plot.type=c("trace","histogram"))
I will fix the other issues for the next release.
Matt

Linear Regression in For Loop

I get an error for running the code below. I haven not figured out what I am doing wrong - sorry if it is obvious, I am new to R. The idea is to "generate" 100 regressions and output the estimated slope 100 times.
set.seed(21)
x <- seq(1,40,1)
for (i in 1:100 ) {
y[i] = 2*x+1+5*rnorm(length(x))
reg[i] <- lm(y[i]~x)
slp[i] <- coef(reg[i])[2]
}
There are several problems with the way you use indexing. You'll probably need to spend some time again on a short tutorial about R for beginners, and not "rush" to loops and regressions...
In the end, you want to have a vector containing 100 slope values. You need to define this (empty) vector 'slp' prior to running the loop and then fill each ith element with its value in the loop.
On the other hand,
1) at each iteration you don't fill the ith element of y but create a whole new vector y with as many values as there are in x...
2) you don't need to keep every regression so you don't need to "index" your object reg.
So here it is:
set.seed(21)
x <- seq(1,40,1)
slp=rep(NA,100)
for (i in 1:100) {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i]<-coef(reg)[2]
}
print(slp)
In addition to the other answers, there is a better (more efficient and easier) possibility. lm accepts a matrix as input for y:
set.seed(21)
y <- matrix(rep(2*x + 1, 100) + 5 *rnorm(length(x) * 100), ncol = 100)
reg1 <- lm(y ~ x)
slp1 <- coef(reg1)[2,]
all.equal(slp, slp1)
#[1] TRUE
If you had a function other than lm and needed a loop, you should use replicate instead of a for loop:
set.seed(21)
slp2 <- replicate(100, {
y = 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
unname(coef(reg)[2])
})
all.equal(slp, slp2)
#[1] TRUE
You need to create the matrix/vector y, reg, slp first, to be able to write to position i like: y[i] <-. You can do something along:
set.seed(21)
x <- seq(1,40,1)
slp <- numeric(100)
for (i in 1:100 ) {
y <- 2*x+1+5*rnorm(length(x))
reg <- lm(y~x)
slp[i] <- coef(reg)[2]
}
> slp
[1] 2.036344 1.953487 1.949170 1.961897 2.098186 2.027659 2.002638 2.107278
[9] 2.036880 1.980800 1.893701 1.925230 1.927503 2.073176 2.101303 1.943719
...
[97] 1.966039 2.041239 2.063801 2.066801

Resources