I have run a very long computation in WinBUGS (a million iterations) using the R2WinBUGS package from within R:
bugs.object <- bugs(...)
but R crashed. How do I reload the bugs.object into R without running WinBUGS again? I tried this (I have 3 chains):
out <- read.bugs(paste("coda", 1:3, ".txt", sep = ""))
but the out data structure is completely different from the bugs object (as it is, it is unusable). I tried to convert it with as.bugs.array:
bugs.object <- as.bugs.array(out, model.file = "ttest.txt", n.iter = 1000000, n.burnin = 300000, n.thin = 2, program = "WinBUGS")
but it doesn't work. Please help. Thanks.
It is likely that you are seeing an error message because R ran out of memory while creating the bugs.array object.
You can get around this problem by setting codaPkg = TRUE in the bugs function. This saves the CODA files in your specified working directory rather than creating the R2WinBUGS object in memory (before R crashes). Then you can read the CODA files back in with read.bugs (or read.coda from the coda package) and, if you really want, convert the resulting mcmc object to a bugs object with as.bugs.array.
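A minimal sketch of that workflow (the file names and chain count follow the question; treat the exact call as an assumption about your setup):

library(R2WinBUGS)
library(coda)

# Run WinBUGS but keep only the CODA files on disk instead of building the
# full bugs object in memory:
# bugs.object <- bugs(..., codaPkg = TRUE, working.directory = getwd())

# Later (e.g. after a crash), read the CODA files back in as an mcmc.list:
out <- read.bugs(paste("coda", 1:3, ".txt", sep = ""))
summary(out)   # posterior summaries straight from the mcmc.list
plot(out)      # trace and density plots via the coda package

If you still need a full bugs object, note that as.bugs.array() expects a 3-d sims.array (iterations x chains x parameters) rather than an mcmc.list, so you would have to reshape the draws first.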
This might not work if your MCMC is too big or you do not have enough memory for R.
I'm new here. I've been struggling with analysing some data with the BaSTA package. The data works fine after running the "Datacheck" code, but right after running the following code this happens:
multiout <- multibasta(object = datosJ, studyStart = 1999, studyEnd = 2018, model = "LO",
shape = "simple", niter = 20001, burnin = 2001, thinning = 100,
parallel = TRUE)
No problems were detected with the data.
Starting simulation to find jump sd's... done.
Multiple simulations started...
Error in setDefaultClusterOptions(type = .sfOption$type) :
  could not find function "setDefaultClusterOptions"
I believe this error has something to do with the use of parallel = TRUE, which relies on the snow package that comes bundled with BaSTA and makes the analysis run faster. If I don't use parallel, the analysis takes weeks to run, and I've been told that's not normal for the package I'm using.
Any help would be very helpful, thank you.
I came across this same behavior when using another R package that depends on snowfall. setDefaultClusterOptions is housed within a dependency of BaSTA, so this error message appears because that package is not being loaded. Try calling library(snowfall) before running the BaSTA command to see if that fixes it for you.
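For example (a sketch; the multibasta call below is copied from the question):

library(snowfall)   # snowfall and its dependency snow provide the cluster functions BaSTA uses
library(BaSTA)

multiout <- multibasta(object = datosJ, studyStart = 1999, studyEnd = 2018,
                       model = "LO", shape = "simple",
                       niter = 20001, burnin = 2001, thinning = 100,
                       parallel = TRUE)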
I am using MXNet to train a CNN (in R), and I can train the model without any error with the following code:
model <- mx.model.FeedForward.create(symbol=network,
X=train.iter,
ctx=mx.gpu(0),
num.round=20,
array.batch.size=batch.size,
learning.rate=0.1,
momentum=0.1,
eval.metric=mx.metric.accuracy,
wd=0.001,
batch.end.callback=mx.callback.log.speedometer(batch.size, frequency = 100)
)
But as this process is time-consuming, I run it on a server overnight, and I want to save the model so that I can use it after training finishes.
I used:
save(list = ls(), file="mymodel.RData")
and
mx.model.save("mymodel", 10)
But neither of them saves the model! For example, when I load "mymodel.RData", I cannot predict the labels for the test set!
Another example is when I load the "mymodel.RData" and try to plot it with the following code:
graph.viz(model$symbol$as.json())
I get the following error:
Error in model$symbol$as.json() : external pointer is not valid
Can anybody give me a solution for saving and then loading this model for future use?
Thanks
You can save the model during training by adding a checkpoint callback:
model <- mx.model.FeedForward.create(symbol=network,
X=train.iter,
ctx=mx.gpu(0),
num.round=20,
array.batch.size=batch.size,
learning.rate=0.1,
momentum=0.1,
eval.metric=mx.metric.accuracy,
wd=0.001,
epoch.end.callback=mx.callback.save.checkpoint("model_prefix"),
batch.end.callback=mx.callback.log.speedometer(batch.size, frequency = 100)
)
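With that callback in place (assuming the "model_prefix" prefix and the 20 training rounds shown above), a checkpoint is written after each epoch and can be reloaded later, for example:

# mx.callback.save.checkpoint typically writes files such as
# model_prefix-symbol.json and model_prefix-0020.params to the working directory.
model <- mx.model.load("model_prefix", iteration = 20)   # reload the round-20 checkpoint
preds <- predict(model, test.iter)                       # test.iter is a placeholder data iterator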
An MXNet model is an R list, but its first component is not an R object but a C++ pointer, so it can't be saved and reloaded as an R object. Therefore, the model needs to be serialized to behave as an actual R object. The serialized object is also a list, but its first component is a text string containing the model information.
To save a model:
modelR <- mx.serialize(model)
save(modelR, file="~/model1.RData")
To retrieve it and use it again:
load("~/model1.RData", verbose=TRUE)
model <- mx.unserialize(modelR)
The best practice for saving a snapshot of your training progress is to use save_checkpoint (http://mxnet.io/api/python/module.html#mxnet.module.Module.save_checkpoint) as part of the callback after every training epoch. In R the equivalent command is probably mx.callback.save.checkpoint, but I'm not using R and am not sure about its usage.
Using these snapshots also lets you take advantage of the low-cost AWS Spot market (https://aws.amazon.com/ec2/spot/pricing/), which, for example, currently offers an instance with 16 K80 GPUs for $3.8/hour, compared to the on-demand price of $14.4/hour. Such 80-90% discounts are common in the spot market and can optimize the speed and cost of your training, as long as you use these snapshots correctly.
In a nutshell, I am trying to parallelise my whole script over dates using snow and adply, but I continually get the error below.
Error in unserialize(socklist[[n]]) : error reading from connection
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
I have set up the parallelisation process in the following way:
Cores = detectCores(all.tests = FALSE, logical = TRUE)
cl = makeCluster(Cores, type="SOCK")
registerDoSNOW(cl)
clusterExport(cl, c("Var1","Var2","Var3","Var4"), envir = environment())
exposureDaily <- adply(.data = dateSeries,.margins = 1,.fun = MainCalcFunction,
.expand = TRUE, Var1, Var2, Var3,
Var4,.parallel = TRUE)
stopCluster(cl)
Where dateSeries might look something like
> dateSeries
marketDate
1 2016-04-22
2 2016-04-26
MainCalcFunction is a very long script containing many of my own functions. As the script is so long, reproducing it wouldn't be practical, and a hypothetical small function would defeat the purpose, as I have already got this methodology to work with other, smaller functions. I can say that within MainCalcFunction I load all my libraries, source the necessary functions, and read a file containing all other variables apart from those exported above, so that I don't have to export a long list of libraries and other objects.
MainCalcFunction runs successfully in its entirety over two dates using adply without parallelisation, which tells me that it is not a bug in the code itself that is causing the parallelisation to fail.
Initially I thought (from experience) that the parallelisation over dates was failing because another function within the code also used parallelisation; however, I have since rebuilt the whole code to make sure that there is no such function.
I have pored over the script with a fine-tooth comb to see whether there is any place where I accidentally didn't export something I needed, and I can't find anything.
Some ideas as to what could be causing the code to fail are:
The use of various option valuation functions in fOptions and RQuantLib
The use of type = "SOCK"
I am aware of this question already asked and also this question, and while the first question has helped me, it hasn't yet helped solve the problem. (Note: that may be because I haven't used it correctly, having mainly used loginfo("text") to track where the code is. Potentially, there is a way to change that so that I log warning and/or error messages instead?)
Please let me know if there is any other information I can provide to help in solving this. I would be so appreciative if someone could provide some guidance, as the code takes close to 40 minutes to run for a single day and I need to run it for close to a year, so parallelisation is essential!
EDIT
I have tried to implement the suggestion in the first question included above by utilising the outfile option. Given that I am using Windows, I have done this by including the following lines before exporting the key objects and running MainCalcFunction:
reportLogName <- paste("logout_parallel.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting log file", getwd()))
mc<-detectCores()
cl<-makeCluster(mc, outfile="")
registerDoParallel(cl)
Similarly, at the beginning of MainCalcFunction, after having sourced my libraries and functions I have included the following to print to file:
reportLogName <- paste(testDate,"_logout.txt", sep="")
addHandler(writeToFile,
file = paste(Save_directory,reportLogName, sep="" ),
level='DEBUG')
with(getLogger(), names(handlers))
loginfo(paste("Starting test function ",getwd(), sep = ""))
In the MainCalcFunction function I have then put loginfo("text") statements at key junctures to inform me of where the code is at.
This has resulted in some text files being available after the code fails with the aforementioned error. However, these text files provide no more information on the cause of the error beyond the point at which it occurred. This is despite having a tryCatch statement embedded in MainCalcFunction in which, on any error, I call logerror(e).
I am posting this answer in case it helps anyone else with a similar problem in the future.
Essentially, the error unserialize(socklist[[n]]) doesn't tell you a lot, so solving it is a matter of narrowing down the issue.
Firstly, be absolutely sure the code runs over several dates in non-parallel mode with no errors.
Secondly, ensure the parallelisation is set up correctly. There are some obvious initial errors that many other questions address, e.g., hidden parallelisation inside the code, which means parallelisation occurs twice.
Once you are sure that there is no problem with the code and the parallelisation is set up correctly, start narrowing down. The issue is likely (unless something has been missed above) something in the code which isn't a problem when run in serial, but becomes a problem when run in parallel. The easiest way to narrow down is by setting outfile = "Log.txt" in whichever makeCluster function you use, e.g., cl <- makeCluster(cores - 1, outfile = "Log.txt"). Then add as many print("Point in code") statements in your function as needed to narrow down where the issue is occurring (see the sketch below).
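A minimal sketch of that debugging set-up (the cluster type matches the question; the worker body and log file name are placeholders):

library(parallel)   # detectCores
library(doSNOW)     # registerDoSNOW; attaches snow, which provides makeCluster
library(plyr)       # adply

cl <- makeCluster(detectCores() - 1, type = "SOCK", outfile = "Log.txt")
registerDoSNOW(cl)

MainCalcFunction <- function(piece, ...) {
  print(paste("Started date", piece$marketDate))    # written to Log.txt by the worker
  # ... actual calculations ...
  print(paste("Finished date", piece$marketDate))
  data.frame(marketDate = piece$marketDate, done = TRUE)
}

exposureDaily <- adply(.data = dateSeries, .margins = 1,
                       .fun = MainCalcFunction, .parallel = TRUE)
stopCluster(cl)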
In my case, the problem was the line jj = closeAllConnections(). This line works fine in non-parallel mode but breaks the code when run in parallel. I suspect it has something to do with the function closing all connections, including the socket connections required for the parallelisation.
Try running the script in plain R instead of in RStudio.
First, let's create some sample categorical data with 3 levels.
y<-sample(c("A","B","C"),50,replace=TRUE)
I'm trying to formulate a Bayesian statistical model in which the y variable follows a categorical distribution with parameters theta1, theta2, theta3. These parameters describe the probability that a single y[i] belongs to the corresponding category. From the Bayesian perspective, these parameters are also random variables, and we usually assign a Dirichlet prior to them with hyper-parameters alpha1, alpha2, alpha3.
I'm having some problems with the syntax as it seems.
CODE
model <- function(){
  # likelihood
  for(i in 1:N){
    y[i] ~ dcat(theta[])
  }
  # prior
  theta[1:3] ~ ddirch(alpha[])
}
library(R2OpenBUGS)
model.file <- file.path(tempdir(),"model.txt")
write.model(model, model.file)
y<-sample(c("A","B","C"),50,replace=TRUE)
N<-50
alpha<-c(1,1,1)
data<-list('y','N','alpha')
params<-c('theta')
inits<-function(){theta=c(1/3,1/3,1/3)}
We call OpenBUGS through R, with the bugs function
out<-bugs(data,inits,params,model.file,n.chains = 2
,n.iter=6000,codaPkg = TRUE,n.burnin = 1000,DIC = TRUE)
I've tried different ways of formulating the above code syntactically, working through the errors and getting familiar with the log.txt file (the file that holds the OpenBUGS output), until this code gave me a log.txt with no errors, yet R still reports a problem.
R output
Error in bugs.run(n.burnin, OpenBUGS.pgm, debug = debug, WINE = WINE, :
Look at the log file in /tmp/Rtmpofdk0t and
try again with 'debug=TRUE' to figure out what went wrong within OpenBUGS.
In addition: Warning message:
In FUN(X[[i]], ...) : class of 'x' was discarded
log.txt
OpenBUGS version 3.2.3 rev 1012
model is syntactically correct
data loaded
model compiled
initial values generated, model initialized
1000 updates took 0 s
monitor set
monitor set
monitor set
monitor set
deviance set
Thanks in advance for your help
I think you should rename theta1, theta2, theta3 to alpha1, alpha2, alpha3, because you use alpha1, ... in the ddirch function but never declare them; instead you declare theta1 and so on but never use them.
If there are any other issues, you might have a look at the log file, like the compiler suggests.
After numerous experiments, I figured out that for some reason OpenBUGS can't accept factor variables supplied as usual. So I changed my data (format "A", "B", "C") to numeric (format 1, 2, 3) with the as.numeric R function, and everything ran smoothly!
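One way to do that recoding (a sketch; note that calling as.numeric directly on a character vector would give NAs, so convert to a factor first):

y <- sample(c("A", "B", "C"), 50, replace = TRUE)
y <- as.numeric(as.factor(y))   # "A" -> 1, "B" -> 2, "C" -> 3, as dcat() expects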
I have been using the deSolve package in an MCMC algorithm to estimate parameters in an ODE, and I wrote the functions used by the solver in C to speed up the algorithm. Sometimes, but not always, I get the error Error in .C("unlock_solver") when running the ode function. I am able to successfully compile and link the C files using the commands
system("R CMD SHLIB [insert-file-path]")
dyn.load("[dll-file-path]")
but when I try to solve the ODE using the .dll file, the error is thrown. Then, even when running a simple script like the one below, I get the same error. I think the issue is related to using compiled code, but I don't know how, and I cannot find any references to this error.
> require(deSolve)
> initVal <- c(y=1)
> times <- seq(0, 1, 0.001)
> parms <- c(k=1)
> model1 <- function(t, y, parms){
+ with(as.list(c(y, parms)),{
+ dy <- -k*y;
+ list(c(dy))
+ })
+ }
> out <- ode(y=initVal, times=times, parms=parms, func=model1)
Error in .C("unlock_solver") :
"unlock_solver" not resolved from current namespace (deSolve)
Partial Solution
If I restart R and only load the DLL using the dyn.load function, but don't compile the code, the ode function runs without an error. This fixes my problem, but I still have no idea why.
Edit:
REAL solution from Thomas Petzoldt on the R help list:
[The error] occurs if package deSolve is loaded after the compiled model... The solution is to load deSolve first [before loading any DLLs], ideally at the very beginning of your script, and at least before loading/unloading the DLL/.so file.
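A sketch of that load order (the file and routine names are placeholders for your own compiled model):

library(deSolve)                  # load deSolve FIRST

system("R CMD SHLIB mymodel.c")   # then compile the model code
dyn.load("mymodel.dll")           # and load it ("mymodel.so" on Linux/macOS)

out <- ode(y = initVal, times = times, parms = parms,
           func = "derivs", dllname = "mymodel", initfunc = "initmod")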
If that doesn't work, the below might as well (old answer):
I have found a somewhat inelegant solution.
The problem seems to be that the "unlock_solver" function within deSolve is somehow not being accessed correctly. You can unload and reload the entire deSolve.so file, rather than restarting R.
To do this you can use something like:
require(deSolve)
# encounter error
library.dynam.unload("deSolve", libpath=paste(.libPaths()[1], "//deSolve", sep=""))
library.dynam("deSolve", package="deSolve", lib.loc=.libPaths()[1])
You'll need to replace ".libPaths()[1]" with wherever you have installed deSolve, if it isn't in the first element of your .libPaths variable.
This is something of a sledgehammer, though. I've sent a request to the r-help list asking whether there is some way to either change where R looks for "unlock_solver" or to unload/reload just a portion of deSolve.
Make sure you have the following packages installed and loaded (at the beginning of your script) before compiling the .dll file.
packages <- c("deSolve", "coda", "adaptMCMC")
new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new.packages) > 0) {
  install.packages(new.packages, dependencies = TRUE)
}
ppp <- lapply(packages, require, character.only = TRUE)
First unload the current .dll file in your working directory
c_compile <- "your_c_file"
dyn.unload(paste0(c_compile,".dll")) # unload dll (Windows only)
Then compile the C file and load the new .dll
system(paste0("R CMD SHLIB ",c_compile,".c"))
dyn.load(paste0(c_compile,".dll"))# Load dll (Windows only)
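Optionally, check that the compiled routine is now visible to R before calling ode ("derivs" here is a placeholder for the name of your C function):

is.loaded("derivs")   # should return TRUE once dyn.load has succeeded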