I'm modifying an existing model using rjags. I'd like to run chains in parallel and occasionally check the Gelman-Rubin convergence diagnostic to see whether I need to keep running. The problem is that if I need to resume running based on the diagnostic value, the recompiled chains restart from the initial prior values rather than from the position in parameter space where each chain stopped. If I do not recompile the model, rjags complains. Is there a way to store the positions of the chains when they stop so I can re-initialize from where I left off? Here is a very simplified example.
example1.bug:
model {
  for (i in 1:N) {
    x[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
parallel_test.R:
#Make some fake data
N <- 1000
x <- rnorm(N, 0, 5)
write.table(x,
            file = 'example1.data',
            row.names = FALSE,
            col.names = FALSE)

library('rjags')
library('doParallel')
library('random')

nchains <- 4
c1 <- makeCluster(nchains)
registerDoParallel(c1)

jags <- list()
for (i in 1:getDoParWorkers()) {
  jags[[i]] <- jags.model('example1.bug',
                          data = list('x' = x, 'N' = N))
}

# Function to combine multiple mcmc lists into a single one
mcmc.combine <- function(...) {
  return(as.mcmc.list(sapply(list(...), mcmc)))
}

#Start with some burn-in
jags.parsamples <- foreach(i = 1:getDoParWorkers(),
                           .inorder = FALSE,
                           .packages = c('rjags', 'random'),
                           .combine = 'mcmc.combine',
                           .multicombine = TRUE) %dopar%
{
  jags[[i]]$recompile()
  update(jags[[i]], 100)
  jags.samples <- coda.samples(jags[[i]], c('mu', 'tau'), 100)
  return(jags.samples)
}
#Check the diagnostic output
print(gelman.diag(jags.parsamples[, 'mu']))

counter <- 0
#my model doesn't converge so quickly, so let's simulate doing
#this updating 5 times:
#while (gelman.diag(jags.parsamples[, 'mu'])[[1]][[2]] > 1.04)
while (counter < 5)
{
  counter <- counter + 1
  jags.parsamples <- foreach(i = 1:getDoParWorkers(),
                             .inorder = FALSE,
                             .packages = c('rjags', 'random'),
                             .combine = 'mcmc.combine',
                             .multicombine = TRUE) %dopar%
  {
    #Here I lose the progress I've made
    jags[[i]]$recompile()
    jags.samples <- coda.samples(jags[[i]], c('mu', 'tau'), 100)
    return(jags.samples)
  }
}

print(gelman.diag(jags.parsamples[, 'mu']))
print(summary(jags.parsamples))

stopCluster(c1)
In the output, I see:
Iterations = 1001:2000
where I know there should be > 5000 iterations.
(cross-posted to stats.stackexchange.com, which may be the more appropriate venue)
Every time your JAGS model runs on the worker nodes, the coda samples are returned but the state of the model is lost, so the next time it recompiles it restarts from the beginning, as you are seeing. To get around this, you need to capture and return the state of the model in your function (on the worker nodes), like so:
endstate <- jags[[i]]$state(internal=TRUE)
You then return this to the master process along with the samples, and on the next round re-generate the model within the worker function using jags.model() with inits=endstate (for the appropriate chain).
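A minimal sketch of that idea (names such as run_round and endstates are purely illustrative; it assumes one chain per worker, with x and N available on the master, and it reuses the mcmc.combine helper from the question):
run_round <- function(endstates = NULL) {
  foreach(i = 1:getDoParWorkers(),
          .packages = 'rjags',
          .export = c('x', 'N')) %dopar% {
    # rebuild the model, seeding it from the saved state (including the RNG state) if we have one
    inits <- if (is.null(endstates)) NULL else endstates[[i]]
    m <- jags.model('example1.bug', data = list('x' = x, 'N' = N),
                    inits = inits, quiet = TRUE)
    samp <- coda.samples(m, c('mu', 'tau'), 100)
    # capture this chain's final position for the next round
    list(samples = samp, endstate = m$state(internal = TRUE)[[1]])
  }
}

round1 <- run_round()
round2 <- run_round(lapply(round1, `[[`, 'endstate'))
jags.parsamples <- do.call(mcmc.combine, lapply(round2, `[[`, 'samples'))
print(gelman.diag(jags.parsamples[, 'mu']))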
I would actually recommend looking at the runjags package that does all this for you. For example:
library('runjags')

parsamples <- run.jags('example1.bug', data = list('x' = x, 'N' = N),
                       monitor = c('mu', 'tau'), sample = 100, method = 'rjparallel')
summary(parsamples)

newparsamples <- extend.jags(parsamples, sample = 100)
summary(newparsamples)
# etc.
Or even:
parsamples <- autorun.jags('example1.bug', data = list('x' = x, 'N' = N),
                           monitor = c('mu', 'tau'), method = 'rjparallel')
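If you want to feed the runjags output back into the coda-based diagnostics from the question, you can convert it (a small illustrative addition; as.mcmc.list() has a method for runjags objects):
library('coda')
# convert the runjags object to a coda mcmc.list and check convergence as before
codasamples <- as.mcmc.list(parsamples)
gelman.diag(codasamples[, 'mu'])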
Version 2 of runjags will hopefully be uploaded to CRAN soon, but for now you can download binaries from: https://sourceforge.net/projects/runjags/files/runjags/
Matt
Related
I am currently working on the code below. Everything works, except that it needs 12 hours to finish.
I would really like to change this code to use doParallel, but I have failed over and over again. I have to use foreach, but it doesn't seem to improve the efficiency, and I don't know how to design a "parallel" version for this array, as I need the array for further analysis.
Can anybody help me?
Data <- read_csv("C:/Users/Desktop/200144.csv")

ourModels <- list("forecast(naive(data),h=h)",
                  "forecast(ets(data,model='ANN'),h=h,level=95)",
                  "forecast(ets(data,model='AAN',damped=FALSE),h=h,level=95)",
                  "forecast(ets(data,model='MAM',damped=FALSE),h=h,level=95)",
                  "forecast(ets(data,model='AAN',damped=TRUE),h=h,level=95)",
                  "forecast(ets(data,ic='aicc'),h=h,level=95)",
                  "mapa(data,fh=h,conf.lvl=0.95)",
                  "forecast(auto.ssarima(data,ic='AICc'),h=h,level=95)")

ourHorizon <- 2 # 1, 2, 6
ourOrigins <- 144 - 36 - ourHorizon + 1 # number of rolling origins
ourForecasts <- array(NA, c(ourHorizon, ourOrigins, length(ourModels), ncol(Data))) # max horizon, max origin, number of models, number of time series
ourHoldoutValues <- array(NA, c(ourHorizon, ourOrigins, ncol(Data))) # max horizon, max origin, number of time series
EMTable <- array(NA, c(ncol(Data), length(ourModels), 3)) # 3 metrics
RMSETABLE <- array(NA, c(1, ourOrigins, length(ourModels), ncol(Data)))

# Collect forecasting results and standard deviation of past forecasting errors
foreach(a = 1:ncol(Data)) %do% { # a is the index of the time series
  foreach(b = 1:8) %do% { # b is the index of the model
    ourData <- as.data.frame(Data[, a])
    #ourData <- na.omit(ourData)
    ourData <- ts(ourData, frequency = 12) # monthly data
    # Change models here:
    ourCall <- ourModels[[b]]
    if (b == 7) { # MAPA
      ourValue <- c("outfor", "MSE")
    } else if (b == 8) { # SARIMA
      ourValue <- c("mean", "model$lossValue")
    } else if (b == 1) {
      ourValue <- c("mean", "model$sigma2") # ?????
    } else {
      ourValue <- c("mean", "model$mse")
    }
    ourROReturn <- ro(ourData, h = ourHorizon, origins = ourOrigins, call = ourCall,
                      parallel = TRUE, value = ourValue, ci = FALSE, co = TRUE)
    if (b == 7) { # MAPA is special
      ourForecasts[, , b, a] <- ourROReturn$outfor
      ourHoldoutValues[, , a] <- ourROReturn$holdout
      # Measure forecasting performance here (MPE)
      EMTable[a, b, 1] <- mean(apply((ourROReturn$holdout - ourROReturn$outfor) / ourROReturn$holdout, 1, mean, na.rm = TRUE)) # MPE
      RMSETABLE[, , b, a] <- (ourROReturn$MSE)^0.5
    } else { # ourROReturn$model gives the in-sample MSE
      ourForecasts[, , b, a] <- ourROReturn$mean
      ourHoldoutValues[, , a] <- ourROReturn$holdout
      EMTable[a, b, 1] <- mean(apply((ourROReturn$holdout - ourROReturn$mean) / ourROReturn$holdout, 1, mean, na.rm = TRUE)) # MPE
      RMSETABLE[, , b, a] <- (ourROReturn$model)^0.5
    }
  }
  print(a)
}
You are almost there. All you need to do is switch the outer loop to parallel execution, as follows:
# load libraries
library(parallel)
library(doParallel)

# create parallel cluster
cl <- makeCluster(9, timeout = 120)
doParallel::registerDoParallel(cl)
getDoParWorkers()

## Your parallel for loop
# Collect forecasting results and standard deviation of past forecasting errors
foreach(a = 1:ncol(Data),
        .packages = c("forecast", "greybox", "MAPA", "smooth")) %dopar% { # every package used inside the loop body
  ....
  ....
  ....
  ....
}
If you want to improve the execution time further, turn the inner loop foreach(b = 1:8) %do% { ... } into a function and call it with an apply-family function, as in the sketch below.
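Something along these lines (an illustrative sketch only; fit_one_model is a hypothetical name, and the value selection is abbreviated to two of the cases from the question):
fit_one_model <- function(b, ourData, ourModels, ourHorizon, ourOrigins) {
  ourCall <- ourModels[[b]]
  ourValue <- if (b == 7) c("outfor", "MSE") else c("mean", "model$mse")
  ro(ourData, h = ourHorizon, origins = ourOrigins, call = ourCall,
     value = ourValue, ci = FALSE, co = TRUE)
}

# inside the outer %dopar% body, replace the inner foreach with something like:
# roResults <- lapply(seq_along(ourModels), fit_one_model,
#                     ourData = ourData, ourModels = ourModels,
#                     ourHorizon = ourHorizon, ourOrigins = ourOrigins)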
There are several packages in R to simplify running code in parallel, like foreach and future. Most of these have constructs which are like lapply or a for loop: they carry on until all the tasks have finished.
Is there a simple parallel version of Find? That is, I would like to run several tasks in parallel. I don't need all of them to finish, I just need to get the first one that finishes (maybe with a particular result). After that the other tasks can be killed, or left to finish on their own.
Conceptual code:
hunt_needle <- function (x, y) x %in% (y-1000):y
x <- sample.int(1000000, 1)
result <- parallel_find(seq(1000, 1000000, 1000), hunt_needle)
# should return the first value for which hunt_needle is true
You can use shared memory so that processes can communicate with one another.
For that, you can use package bigstatsr (disclaimer: I'm the author).
Choose a block size and do:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
# Data example
cond <- logical(1e6)
cond[sample(length(cond), size = 1)] <- TRUE
ind.block <- bigstatsr:::CutBySize(length(cond), block.size = 1000)
cl <- parallel::makeCluster(nb_cores())
doParallel::registerDoParallel(cl)
# This value (in an on-disk matrix) is shared by processes
found_it <- FBM(1, 1, type = "integer", init = 0L)
library(foreach)
res <- foreach(ic = sample(rows_along(ind.block)), .combine = 'c') %dopar% {
  if (found_it[1]) return(NULL)
  ind <- bigstatsr:::seq2(ind.block[ic, ])
  find <- which(cond[ind])
  if (length(find)) {
    found_it[1] <- 1L
    return(ind[find[1]])
  } else {
    return(NULL)
  }
}
parallel::stopCluster(cl)
# Verification
all.equal(res, which(cond))
Basically, once a solution is found, the remaining blocks no longer need to be processed, and the other workers know this because a 1 has been written to found_it, which is shared between all processes.
As your question is not reproducible and I don't understand everything you need, you may have to adapt this solution a little.
I have written a program that generates a very large amount of multivariate normal random data (25 x 30 x 10,000,000) using mvtnorm, and then does some simple calculations and manipulations on the matrices.
I am using the foreach and doParallel packages to run operations in parallel to reduce time. A completely arbitrary example, just to demonstrate the packages is:
foreach(x = matr) %dopar% {
  x[time_horizon + 1] <- x[time_horizon]
  x <- cbind(100, x)
  for (m in 2:(time_horizon + 1)) {
    # loop through each column of the matrix to apply the function
    x[, m] <- x[, m - 1] + x[, m]
  }
  return(x)
}
I have created an implicit cluster of cores to run these foreach functions on:
registerDoParallel(4)
The problem
When I run with multiple cores, the memory used appears to be multiplied or duplicated per core when I monitor performance in Task Manager (i.e. 2 cores use more memory than 1 core, and 4 cores use more than 2).
When I run my program for (25 x 30 x 1,000,000), running in parallel helps the speed of execution (i.e. 4 cores are faster than 1 core). However, when I run my program for (25 x 30 x 2,500,000) and above, too much memory is used and that appears to slow it down.
A friend suggested it could be paging: once I run out of RAM, the hard drive must be accessed.
Why is the duplication of memory across cores happening? Is it supposed to happen? Can I stop it? Are there other solutions?
Edit (Full Code):
library(mvtnorm)
library(foreach)
library(doParallel)
library(ggplot2)
library(reshape2)
library(plyr)
# Calculate the number of cores
no_cores <- detectCores()
# Create an implicit cluster and regular cluster
registerDoParallel(no_cores)
daily_pnl <- function() {
  time_horizon <- 30
  paths <- 2500000
  asset <- 25
  path_split <- 100
  corr_mat <- diag(asset)
  expected_returns <- runif(asset, 0.0, 0.25)
  # Create a list of vectors to store pnl information for each asset
  foreach(icount(time_horizon), .packages = "mvtnorm") %dopar% {
    average_matrix <- matrix(NA, (paths/path_split), asset)
    split_start <- 1
    my_day <- rmvnorm(paths, expected_returns, corr_mat, method = "chol")
    for (n in 1:(paths/path_split)) {
      average_matrix[n, ] <- colMeans(my_day[split_start:(split_start + path_split - 1), ])
      split_start <- split_start + path_split
    }
    return(average_matrix)
  }
}
matrix_splitter <- function(matr) {
  time_horizon <- 30
  paths <- 2500000
  path_split <- 100
  asset <- 25
  # use the function argument rather than the global `daily`
  alply(array(unlist(matr), c(paths/path_split, time_horizon, asset)), 3)
}
cum_returns <- function(matr) {
  time_horizon <- 30
  paths <- 2500000
  asset <- 25
  foreach(x = matr) %dopar% {
    x[time_horizon + 1] <- x[time_horizon]
    x <- cbind(100, x)
    for (m in 2:(time_horizon + 1)) {
      # loop through each column of the matrix to apply the function
      x[, m] <- x[, m - 1] + x[, m]
    }
    return(x)
  }
}

plotting <- function(path_matr) {
  security_paths <- as.data.frame(t(path_matr))
  security_paths$id <- 1:nrow(security_paths)
  plot_paths <- melt(security_paths, id.var = "id")
  ggplot(plot_paths, aes(x = id, y = value, group = variable, colour = variable)) +
    geom_line(aes(lty = variable))
}
system.time(daily <- daily_pnl())
system.time(daily_by_security <- matrix_splitter(daily))
rm(daily)
gc()
system.time(security_paths <- cum_returns(daily_by_security))
rm(daily_by_security)
gc()
plot_list <- foreach(x = security_paths, .packages = c("reshape2", "ggplot2")) %dopar% {
  if (nrow(x) > 100) {
    plotting(head(x, 100))
  } else {
    plotting(x)
  }
}
#Stop implicit cluster and regular cluster
stopImplicitCluster()
gc()
This seems to be a really old problem, and I am having a similar issue. I don't need compute parallelization so much as memory parallelization (if such a thing can exist).
What works for me is doAzureParallel: instead of registering local system cores, register cores from the cloud using registerDoAzureParallel(cluster).
Your JSON file defines the size (and therefore the memory) of the machines you hire for the job. Make sure each worker has enough memory to get a copy of your R environment. This will probably hammer your network, since you will be sending data from your machine to 30-40 workers (depending on how many you have asked for).
More documentation here:
https://github.com/Azure/doAzureParallel
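For reference, a minimal sketch of the workflow following that repository's README (the JSON file names are whatever you generate and edit yourself; check the repository for the current API):
library(doAzureParallel)
setCredentials("credentials.json")      # Azure credentials generated beforehand
cluster <- makeCluster("cluster.json")  # cluster.json defines the VM size and node count
registerDoAzureParallel(cluster)
getDoParWorkers()
# ... run your foreach(...) %dopar% loops as usual ...
stopCluster(cluster)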
Can we do something with sparklyr to address such issues?
Problem description:
I have a big matrix c loaded in RAM. My goal is to give parallel processes read-only access to it. However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.
Is there a way to properly create shared memory, from which all the processes may read, without creating a local copy of all the data?
Example:
libs <- function(libraries) { # installs missing libraries and then loads them
  for (lib in libraries) {
    if (!is.element(lib, .packages(all.available = TRUE))) {
      install.packages(lib)
    }
    library(lib, character.only = TRUE)
  }
}
libra <- list("foreach", "parallel", "doSNOW", "bigmemory")
libs(libra)

# create a matrix of approximately 1 GB
c <- matrix(runif(10000^2), 10000, 10000)
# convert it to a big.matrix
x <- as.big.matrix(c)
# get a description of the matrix
mdesc <- describe(x)
# create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  # load bigmemory
  require(bigmemory)
  # attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  # dummy expression to test data acquisition
  c <- m[1, 1]
}
closeAllConnections()
RAM usage (screenshot omitted): the memory use increases considerably until the foreach loop ends, and only then is it freed.
I think the solution to the problem can be seen from the post of Steve Weston, the author of the foreach package, here. There he states:
The doParallel package will auto-export variables to the workers that are referenced in the foreach loop.
So I think the problem is that in your code the big matrix c is referenced in the assignment c <- m[1,1]. Just try xyz <- m[1,1] instead and see what happens, as in the snippet below.
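For instance, an illustrative tweak of your loop (note also .combine = 'c' as a quoted function name, so the 1 GB matrix object c is not captured through the .combine argument either):
out <- foreach(linID = 1:10, .combine = 'c') %dopar% {
  require(bigmemory)
  m <- attach.big.matrix(mdesc)  # attach via the shared-memory descriptor
  xyz <- m[1, 1]                 # renamed so the big matrix `c` is not referenced
  xyz
}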
Here is an example with a file-backed big.matrix:
# create a matrix of approximately 1 GB
n <- 10000
m <- 10000
c <- matrix(runif(n*m), n, m)
# convert it to a big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# get a description of the matrix
mdesc <- describe(x)
# create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
## 1) No referencing
out <- foreach(linID = 1:4, .combine = 'c') %dopar% { # quote 'c' so the matrix c is not used as the combine function
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}

## 2) Referencing
out <- foreach(linID = 1:4, .combine = 'c') %dopar% {
  invisible(c) ## c is referenced and thus exported to the workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}
closeAllConnections()
Alternatively, if you are on Linux/Mac and you want copy-on-write (CoW) shared memory, use forks: first load all your data into the main process, and then launch worker processes (forks) with mcparallel() from the parallel package.
You can collect their results with mccollect(), or use truly shared memory through bigmemory (originally Rdsm; see the edit below), like this:
library(parallel)
library(bigmemory) # for shared variables
size <- 10 # length of the shared vector (any value works for this example)
shared <- bigmemory::big.matrix(nrow = size, ncol = 1, type = 'double')
shared[1] <- 1 # init shared memory with some number

job <- mcparallel({shared[1] <- 23}) # ...change it in another forked process
shared[1, 1] # ...and confirm that it gets changed
# [1] 23
You can confirm that the value really gets updated in the background if you delay the write:
fn <- function() {
  Sys.sleep(1) # one-second delay
  shared[1] <- 11
}
job <- mcparallel(fn())
shared[1] # execute immediately after the last command
# [1] 23
shared[1] # execute after one second
# [1] 11
mccollect() # to destroy all forked processes (and possibly collect their output)
To control concurrency and avoid race conditions, use locks:
library(synchronicity) # for locks
m <- boost.mutex() # let's create a mutex "m"

bad.incr <- function() { # this function doesn't protect the shared resource with locks
  a <- shared[1]
  Sys.sleep(1)
  shared[1] <- a + 1
}

good.incr <- function() {
  lock(m)
  a <- shared[1]
  Sys.sleep(1)
  shared[1] <- a + 1
  unlock(m)
}

shared[1] <- 1
for (i in 1:5) job <- mcparallel(bad.incr())
shared[1] # you can verify that the value didn't get increased 5 times, due to race conditions
mccollect() # to clear all forked processes, not to get the values

shared[1] <- 1
for (i in 1:5) job <- mcparallel(good.incr())
shared[1] # as expected, eventually (after about 5 seconds of waiting) you get 6
# [1] 6
mccollect()
Edit:
I simplified the dependencies a bit by replacing Rdsm::mgrmakevar with bigmemory::big.matrix. mgrmakevar internally calls big.matrix anyway, and we don't need anything more.
I would like to bootstrap residuals of nls fits in a loop. I use nlsBoot, and in order to decrease computation time I would like to run it in parallel (on a Windows 7 system at the moment). Here is some code which reproduces my problem:
#function for fitting
Falge2000 <- function(GP2000, alpha, PAR) {
  (GP2000 * alpha * PAR) / (GP2000 + alpha * PAR - GP2000 / 2000 * PAR)
}
#some data
PAR <- 10:1600
GPP <- Falge2000(-450,-0.73,PAR) + rnorm(length(PAR),sd=0.0001)
df1 <- data.frame(PAR,GPP)
#nls fit
mod <- nls(GPP~Falge2000(GP2000,alpha,PAR),start=list(GP2000=-450,alpha=-0.73),data=df1, upper=c(0,0),algorithm="port")
#bootstrap of residuals
library(nlstools)
summary(nlsBoot(mod,niter=5))
#works
#now do it several times
#and in parallel
library(foreach)
library(doParallel)
cl <- makeCluster(1)
registerDoParallel(cl)
ttt <- foreach(1:5, .packages = 'nlstools', .export = "df1") %dopar% {
  res <- nlsBoot(mod, niter = 5)
  summary(res)
}
#Error in { :
#task 1 failed - "Procedure aborted: the fit only converged in 1 % during bootstrapping"
stopCluster(cl)
I suspect this is an issue with environments, and after looking at the code of nlsBoot the problem seems to arise from the use of an anonymous function in an lapply call:
l1 <- lapply(1:niter, function(i) {
data2[, var1] <- fitted1 + sample(scale(resid1, scale = FALSE),
replace = TRUE)
nls2 <- try(update(nls, start = as.list(coef(nls)), data = data2),
silent = TRUE)
if (inherits(nls2, "nls"))
return(list(coef = coef(nls2), rse = summary(nls2)$sigma))
})
if (sum(sapply(l1, is.null)) > niter/2)
stop(paste("Procedure aborted: the fit only converged in",
round(sum(sapply(l1, is.null))/niter), "% during bootstrapping"))
Is there a way to use nlsBoot in a parallel loop? Or do I need to modify the function? (I could try to use a for loop instead of lapply.)
By moving the creation of the mod object into the %dopar% loop, it looks like everything works OK. Also, this automatically exports the df1 object, so you can remove the .export argument.
ttt <- foreach(1:5, .packages = 'nlstools') %dopar% {
  mod <- nls(GPP ~ Falge2000(GP2000, alpha, PAR),
             start = list(GP2000 = -450, alpha = -0.73),
             data = df1, upper = c(0, 0), algorithm = "port")
  res <- nlsBoot(mod, niter = 5)
  capture.output(summary(res))
}
However, you might need to work out what you want returned: capture.output was used here just to check that things were working, since summary(res) seemed to return only NULL. One option is sketched below.
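For instance, you could return the bootstrap coefficient matrix directly (a sketch; coefboot is the component of an nlsBoot object that holds the bootstrapped parameter estimates):
ttt <- foreach(1:5, .packages = 'nlstools') %dopar% {
  mod <- nls(GPP ~ Falge2000(GP2000, alpha, PAR),
             start = list(GP2000 = -450, alpha = -0.73),
             data = df1, upper = c(0, 0), algorithm = "port")
  res <- nlsBoot(mod, niter = 5)
  res$coefboot # one row of bootstrapped parameter estimates per iteration
}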