Using snowfall package in R to do some simulation - r

I am staring work with snowfall package in that way:
library(snowfall)
sfInit(parallel=TRUE, cpus=6, type="SOCK")
#loading packages
sfLibrary(package = lars)
sfLibrary(package=covTest)
Function that I want to compute multiple times using sfLapply:
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
Y <- X%*%beta+rnorm(100)
lasso.lars <- lars(X,Y,intercept=FALSE,use.Gram=FALSE)
test <- covTest(lasso.lars,X,Y,sigma.est=1)
test
}
But when I try this
sfLapply(1:100,funkcja)
I get error:
"Error in checkForRemoteErrors(val): 6 nodes produced errors; first error: object 'Y' not found". But when I hide the last but one line and change test for lasso.lars then there is no longer trobule about vector Y:
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
Y <- X%*%beta+rnorm(100)
lasso.lars <- lars(X,Y,intercept=FALSE,use.Gram=FALSE)
#test <- covTest(lasso.lars,X,Y,sigma.est=1)
lasso.lars
}
I dont understand this because the line
test <- covTest(lasso.lars,X,Y,sigma.est=1)
should work since
lars(X,Y,intercept=FALSE,use.Gram=FALSE)
can work. I will be grateful for your help.

My guess is that Y is hiding and internal variable. The following function works (changed the case of "Y" to "y")
funkcja <- function(i,k=5)
{
beta <- c(k,k,0,k,k,rep(0,35))
X <- matrix(rnorm(100*40),100,40)
y <- X %*% beta + rnorm(100)
lasso.lars <- lars(X,y,intercept=FALSE,use.Gram=FALSE)
test <- covTest(lasso.lars,X,y,sigma.est=1)
test
}

Related

Variable in a function is not used as the value

I am using the package robust.arima in R, which works fine when I call it in a script. However, I want to organize my files and therefore call robust arima in a function. Here all of a sudden the variable is not found. Let me give an example
# Works fine
ts_list <- rnorm(100)
arima.rob(ts_list~1)
# Breaks down
get_rob_estimate <- function(x){
out <- arima.rob(x~1)
return(out)
ts_list <- rnorm(100)
get_rob_estimate(ts_list)
Error in eval(formula[[2]]) : object 'x' not found
Does anyone know what's going on? I think the problem looks similar to R : Pass argument to glm inside an R function , but I still can't seem to figure it out and I am curious how R processes these functions?
Edit
Okay for the basic option I understand it now, but I don't get why it works. What if I have
check_func <- function(ind_ts){
out <- substitute(arima.rob(ind_ts~1))
return(eval(out))
}
analyze_ts <- function(){
df <- mvrnorm(100, mu=c(0,0,0), Sigma=diag(c(1,1,1)))
p <- list()
for (i in ncol(df)){
sel <- df[,i]
check_func(sel)
p <- append(p, sel)
}
return(p)
}
analyze_ts()
I then get the error
Error in eval(formula[[2]]) : object 'sel' not found
How does it work? What is going on here? I just want my list to go as a list in my function, shouldn't be so hard right? Does not matter how many functions it goes through?
Using substitute()
get_rob_estimate <- function(x) {
out <- substitute(robustarima::arima.rob(x ~ 1))
return(eval(out))
}
get_rob_estimate(ts_list)
# Call:
# robustarima::arima.rob(formula = ts_list ~ 1)
#
# Regression Coefficients:
# (Intercept)
# 0.1032
#
# Degrees of freedom: 100 total; 99 residual
# Innovations standard deviation: 0.9832
#
# Number of outliers detected: 1
#
# Outlier index
# [1] 59
#
# Outlier type
# [1] "AO"
#
# Outlier impact
# [1] -3.0963
#
# Outlier t-statistics
# [1] 3.1493
edit
You can write your Arima wrapper correctly like so:
analyze_ts <- function(){
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1, 1, 1)))
for (i in seq_len(ncol(df))) {
sel <- df[,i]
sel <- check_func(sel)
p <- append(p, sel)
}
return(p)
}
Better using lapply
analyze_ts <- function() {
df <- MASS::mvrnorm(100, mu=c(0, 0, 0), Sigma=diag(c(1,1,1)))
return(lapply(seq_len(ncol(df)), \(i) check_func(df[,i])))
}
Usage:
set.seed(42) ## for sake of reproducibility
analyze_ts()
Data:
set.seed(42)
ts_list <- rnorm(100)

Error when creating list of time series linear models with a loop

I have the following data set (code requires the forecast package for the tslm call.
x <- rnorm(11, mean = 534, sd = 79)
y <- rnorm(9, mean = 800, sd = 56)
p <- list(x, y)
tsl <- list(); ts_trend <- list()
for(i in seq_along(p)) {
tsl[[i]] <- ts(p[[i]], start = c(2018, 1), frequency = 52)
}
for(i in seq_along(tsl)) {
ts_trend[[i]] <- tslm(tsl[[i]] ~ trend)
}
When I run it, I get the error
Error in tsl[[i]] : subscript out of bounds
The subscript, to my knowledge, is clearly not out of bounds. I use the same reference in the prior loop, with no error.
I have no idea how to fix this. What am I missing?
We can use lapply and it would work
ts_trendN <- lapply(tsl, function(x) tslm(x ~ trend))
The reason why for loop is not working is based on the environment on which trend is calculated. We can create a new environment and it would work fine
for(i in seq_along(tsl)) {
ev1 <- new.env()
ev1$tsl1 <- tsl[[i]]
ts_trend[[i]] <- tslm(ev1$tsl1 ~ trend)
}
There may be some difference in attributes. The model output is the same
library(broom)
identical(tidy(ts_trendN[[1]]), tidy(ts_trend[[1]]))
#[1] TRUE

R issue : performing lm and then a boxcox to find a proper lambda value

I have a dataset of climate data in a data.frame (columns are measuring stations, and rows indicate time of measurement), and I'm trying to find the proper lambda values in a Yeo-Johnson transform to limit skewness impact on a principal component analysis.
Obviously, the first step is to get log likelihoods to find the best lambda : I use the following, where i is the index of a column :
getYeoJohsnonLambda <- function(myClimateData,cols,lambda_min, lambda_max,eps)
...
lambda <- seq(lambda_min,lambda_max,eps)
for(i in cols)
{
formula <- as.formula(paste("myClimateData$",colnames(myClimateData)[i],"~1"))
currentModel <- lm(formula,myClimateData)
print(currentModel)
myboxCox <- boxCox(currentModel, lambda = lambda ,family="yjPower", plotit = FALSE)
...
}
When I am trying to call it for a climateData time series which could be, for example :
`climateData <-data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))`
I get this error : Error in is.data.frame(data) : object 'myClimateData' not found
This is weird, as lm seems to find it and return a correct fit, and myClimateData should be found as it is one of the arguments of the function, right ?
Sadly, it seems that the problem comes from the function boxCox rather than your getYeoJohsnonLambda function. As BrodieG pointed out in a related question, this function uses parent.frame as an argument to eval which is considered as bad practice in the doc.
One way to solve this is to build the models before the call, as suggested in Adam Quek's answer:
library(car)
climateData <- data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))
names(climateData) <- c("a","b")
modelList <- list()
for(k in 1:ncol(climateData)) {
modelList[[k]] <- lm(as.formula(paste0(names(climateData)[k],"~1")),data=climateData)
}
getYeoJohnsonLambda <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
#Recommended values for lambda_min = -0.5 and lambda_max = 2.0, eps = 0.1
myboxCox <- list()
lmd <- seq(lambda_min,lambda_max,eps)
for(i in cols)
{
cat("Creating model for column # ",i,"\n")
currentModel <- modelList[[i]]
myboxCox[[i]] <- boxCox(currentModel, lambda = lmd ,family="yjPower", plotit = FALSE)
}
return(myboxCox)
}
test <- getYeoJohnsonLambda(climateData,c(1,2) ,-0.5,2,0.1)
Other solution (arguably cleaner): use yeo.johnson in VGAM
library(VGAM)
getYeoJohnsonLambda_VGAM <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
#Recommended values for lambda_min = -0.5 and lambda_max = 2.0, eps = 0.1
myboxCox <- list()
lmd <- seq(lambda_min,lambda_max,eps)
return(apply(climateData,2,yeo.johnson,lambda=lmd))
}
test2 <- getYeoJohnsonLambda_VGAM(climateData,c(1,2) ,-0.5,2,0.1)
Here's a solution without troubleshooting the function getYeoJohsnonLambda:
iris.dat <- iris[-5]
vars <- names(iris.dat)
lmd <- seq(.1, 1, .1) #lambda_min, lambda_max, eps
all.form <- lapply(vars, function(x) as.formula(paste0(x, "~ 1")))
all.lm <- lapply(all.form, lm, data=iris.dat)
library(MASS)
all.bcox <- lapply(all.form, boxcox, data=iris.dat,
lambda=lmd, family="yjPower", plotit=FALSE)

compute function in neuralnet R package not working when reproduced

I trained a model in neuralnet and I am trying to figure out how to compute results in Excel. Using the compute function that you call from the package everything works fine. But I went into the source code using F2 in Rstudio and the github page and the function is not working and stops on the relist() function and gives the error: Error in relist(weights, nrow.weights, ncol.weights) :
unused argument (ncol.weights)
I think the problem is the relist() function but I do not know how to transform the weights without it. And the neuralnet package does not come with its own version of relist(). If you ignore the relist line you get the following error: Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments because the weights wasn't transformed correctly. I tried the same thing on my own data and got the same error.
library(neuralnet)
normalize <-function(x) {
return((x - min(x))/(max(x) - min(x)))
}
newdf <- Cars93
newdf = na.omit(newdf)
newdf <- newdf[complete.cases(newdf),]
newdf$Cylinders <- as.numeric(levels(newdf$Cylinders))[newdf$Cylinders]
newdf$Horsepower <- normalize(newdf$Horsepower)
newdf$EngineSize <- normalize(newdf$EngineSize)
newdf$Cylinders <- normalize(newdf$Cylinders)
smp_size <- floor(0.75 * nrow(newdf))
set.seed(12)
train_ind <- sample(seq_len(nrow(newdf)), size = smp_size)
train <- newdf[train_ind, ]
test <- newdf[-train_ind, ]
carsNN <- neuralnet(Horsepower ~ Cylinders+EngineSize,
data = train,hidden = c(1))
cars_results = compute(carsNN,test[11:12])
#this is the source code using F2 in RStudio and on github
sourceCodeCompute = function (x, covariate, rep = 1)
{
nn <- x
linear.output <- nn$linear.output
weights <- nn$weights[[rep]]
nrow.weights <- sapply(weights, nrow)
ncol.weights <- sapply(weights, ncol)
weights <- unlist(weights)
if (any(is.na(weights)))
weights[is.na(weights)] <- 0
weights <- relist(weights, nrow.weights, ncol.weights)
length.weights <- length(weights)
covariate <- as.matrix(cbind(1, covariate))
act.fct <- nn$act.fct
neurons <- list(covariate)
if (length.weights > 1)
for (i in 1:(length.weights - 1)) {
temp <- neurons[[i]] %*% weights[[i]]
act.temp <- act.fct(temp)
neurons[[i + 1]] <- cbind(1, act.temp)
}
temp <- neurons[[length.weights]] %*% weights[[length.weights]]
if (linear.output)
net.result <- temp
else net.result <- act.fct(temp)
list(neurons = neurons, net.result = net.result)
}
sourceCodeCompute(carsNN,test[11:12])
You use the wrong relist function. Try explicitly calling neuralnet:::relist, which is the (unexported) function used automatically within the package namespace.
(I don't know how this question relates to Excel.)

Updated: Parallel computing using R result in "attempt to replicate an object of type 'closure'"

I have set up a Metropolis-Hastings algorithm, and now I am trying to run the algorithm using parallel computing. I have set up a single-chain function
library(parallel)
library(foreach)
library(mvtnorm)
library(doParallel)
n<-100
mX <- 1:n
vY <- rnorm(n)
chains <- 4
iter <- n
p <- 2
#Loglikelihood
post <- function(y, theta) dmvnorm(t(y), rep(0,length(y)), theta[1]*exp(- abs(matrix(rep(mX,n),n) - matrix(rep(mX,each=n),n))/theta[2]),log=TRUE)
geninits <- function() list(theta = runif(p, 0, 1))
dist <- 0.01
jump <- function(x, dist) exp(log(x) + rmvnorm(1,rep(0,p),diag(rep(dist,p))))
MCsingle <- function(){ # This is part of a larger function, so no input are needed
inits <- geninits()
theta.post <- matrix(NA,nrow=p,ncol=iter)
for (i in 1:p) theta.post[i,1] <- inits$theta[i]
for (t in 2:iter){
theta_star <- c(jump(theta.post[, t-1],dist))
pstar <- post(vY, theta = theta_star) # post is the loglikelihood using dmvnorm.
pprev <- post(vY, theta = theta.post[,t-1])
r <- min(exp(pstar - pprev) , 1)
accept <- rbinom(1, 1, prob = r)
if (accept == 1){
theta.post[, t] <- theta_star
} else {
theta.post[, t] <- theta.post[, t-1]
}
}
return(theta.post)
}
, which returns an p x iter matrix, with p parameters and iter iterations.
cl<-makeCluster(4)
registerDoParallel(cl)
posterior <- foreach(c = 1:chains) %dopar% {
MCsingle() }
UPDATE: When I tried to simplify the problem the code suddenly seemed to work. Even though I purposely tried to make errors, the code ran perfectly and the results were as wanted. So for others with similar problems unfortunately I cannot give an answer.
A follow-up question:
My initial purpose was to built up an entire function, such that
MCmulti <- function(mX,vY,iter,chains){
posterior <- foreach(c = 1:chains) %dopar% {
MCsingle() }
return(posterior)
}
but the foreach-loop does not seem to read all the required functions like:
Error in FUN() : task 1 failed - "could not find function "geninits""
Can anybody answer how to implement custom functions inside a foreach loop? Am I to input it as MCmulti <- function(FUN,...) FUN() and call MCmulti(MCsingle,...) ?

Resources