I've been trying to run some Stan models in a Jupyter notebook using rstan with the IRkernel. I set up an environment for this using conda, and I believe I have installed all the necessary packages. I can run ordinary R functions without problems, but when I try to create a model using something like
model <- stan(model_code = code, data = dat)
the kernel just dies without any further explanation. The command line output is
memset [0x0x7ffa665b6e3b+11835]
RtlFreeHeap [0x0x7ffa665347b1+81]
free_base [0x0x7ffa640cf05b+27]
(No symbol) [0x0x7ffa2f723b44]
[I 15:25:11.757 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 658481b8-0c64-4612-9cad-1f199dabce3a restarted
which I do not know how to interpret. This happens 100% of the time, even with toy models (a minimal example is below). I can run the same models just fine in RStudio. Could this be a memory issue? For reference, I don't experience this problem when training deep learning models in TensorFlow.
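For concreteness, even a trivial toy model along these lines (a made-up example; any minimal model behaves the same way for me) kills the kernel:
library(rstan)

# Toy model: estimate the mean of a handful of normal draws
code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  y ~ normal(mu, 1);
}
"
dat <- list(N = 5, y = c(1.2, 0.8, 1.1, 0.9, 1.0))

model <- stan(model_code = code, data = dat)  # the kernel dies here under IRkernel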
Thanks in advance for any help.
I am running the sklearn DBSCAN algorithm on a dataset with dimensionality 300000x50 in a Jupyter notebook on AWS SageMaker (an "ml.t2.medium" compute instance). The dataset contains binary feature vectors (1s and 0s).
Once I run the cell, an orange "Gateway Timeout" prompt appears in the upper right corner after a while. Clicking the icon dismisses it without providing any further information, and the notebook is unresponsive until you restart the notebook instance.
I have tried different values for the parameters eps and min_samples to no avail.
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.1, min_samples=100).fit(transformed_vectors)
Does "Gateway Timeout" mean that the notebook kernel has crashed or can I expect any results by waiting?
So far the calculation has been running for about 2 hours.
You could always pick a larger size for your notebook instance (ml.t2.medium is pretty small), but I think the better way would be to run your training code on a managed SageMaker training instance. Scikit-learn is built in on SageMaker, so all you have to do is bring your script, e.g.:
from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point="my_code.py",
    train_instance_type="ml.c4.xlarge",
    role=role,
    sagemaker_session=sagemaker_session)
Here's a complete example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_iris/Scikit-learn%20Estimator%20Example%20With%20Batch%20Transform.ipynb
I'm trying to run some modeling (random forest, using caret) in RStudio Server 1.1.423 (with R version 3.4.4, running on an Ubuntu 16.04 server), and it comes back with the following error:
Error: protect(): protection stack overflow
This error doesn't come up if I run the same analysis in an interactive R session. I seem to recall that in the past (in RStudio Server running an older version of R) I was able to resolve this error by issuing memory.limit(500000) in an interactive RStudio Server session, but these days this comes back with:
> memory.limit(500000)
[1] Inf
Warning message:
'memory.limit()' is Windows-specific
A solution that works and that I use routinely is to run my analysis from a script, like Rscript --max-ppsize=500000 --vanilla /location/of/the/script.R, but that’s not what I want to do, as in this particular case I need to run the analysis interactively.
I've also tried adding R_MAX_VSIZE=500000 at the end of my ~/.profile, adding rsession-memory-limit-mb=500000 to /etc/rstudio/rserver.conf, putting options(expressions = 5e5) in my ~/.Rprofile, and running options(expressions = 5e5) in an interactive RStudio Server session. No luck so far; the "protect()" error keeps popping up.
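For what it's worth, here is the quick diagnostic (not a fix) I run inside the RStudio Server session to check whether any of these settings actually reached it; these are standard base-R calls:
# Check what the running session actually picked up
getOption("expressions")    # should report 5e5 if the .Rprofile / options() setting was read
Cstack_info()               # C stack size and usage for this session
Sys.getenv("R_MAX_VSIZE")   # whether the ~/.profile variable is visible to the session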
Any ideas as to how to remove the memory limit in RStudio Server?
I am an H2O R user and I have a question regarding the local H2O cluster. I set up the cluster by executing the following command in R:
h2o.init()
However, the cluster gets shut down automatically when I do not use it for a few hours. For example, I run my model during the night, but when I come back to my office in the morning to check on it, it says:
Error in h2o.getConnection() : No active connection to an H2O cluster. Did you run h2o.init()?
Is there a way to fix or work around this?
If the H2O cluster is still running, then your models are all still there (assuming they finished training successfully). There are a number of ways that you can check if the H2O Java cluster is still running. In R, you can check the output of these functions:
h2o.clusterStatus()
h2o.clusterInfo()
At the command line (look for a Java process):
ps aux | grep java
If you started H2O from R, then you should see a line that looks something like this:
yourusername 26215 0.0 2.7 8353760 454128 ?? S 9:41PM 21:25.33 /usr/bin/java -ea -cp /Library/Frameworks/R.framework/Versions/3.3/Resources/library/h2o/java/h2o.jar water.H2OApp -name H2O_started_from_R_me_iqv833 -ip localhost -port 54321 -ice_root /var/folders/2j/jg4sl53d5q53tc2_nzm9fz5h0000gn/T//Rtmp6XG99X
H2O models do not live in the R environment, they live in the H2O cluster (a Java process). It sounds like what's happening is that the R object representing your model (which is actually just a pointer to the model in the H2O cluster) is having issues finding the model since your cluster disconnected. I don't know exactly what's going on because you haven't posted the errors you're receiving when you try to use h2o.predict() or h2o.performance().
To get the model back, you can use the h2o.getModel() function. You will need to know the ID of your model. If your model object (the one that's not working properly) is still accessible, then you can see the model ID easily that way: model@model_id. You can also head over to H2O Flow in the browser (by typing http://127.0.0.1:54321 if you started H2O with the defaults) and view all the models by ID that way.
Once you know the model ID, then refresh the model by doing:
model <- h2o.getModel("model_id")
This should re-establish the connection to your model and the h2o.predict() and h2o.performance() functions should work again.
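If the H2O cluster itself was shut down (rather than the R session merely losing its handle), the model is gone with it. In that case the usual workaround is to persist models to disk before walking away, roughly as in the sketch below (using the standard h2o.saveModel()/h2o.loadModel() functions; the path is only an example):
# Save the trained model to disk while the cluster is still up
path <- h2o.saveModel(model, path = "/tmp/h2o_models", force = TRUE)

# Later, after restarting H2O, load the model back into the new cluster
h2o.init()
model <- h2o.loadModel(path)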
I don't know if this issue is dada2-specific or not. I would guess that it is not, but I am not able to reproduce it otherwise.
I am trying to use mclapply from the parallel package inside a Jupyter notebook with dada2. The parallel job runs, but the moment it finishes the kernel dies and I am unable to restart it. Running the same workflow inside an R terminal has no issues.
Running it on a small dataset works with no issues:
library(dada2)
library(parallel)
derepFs <- mclapply('seqs/test_f.fastq', derepFastq)
derepFs
Running the same workflow with the full dataset (I'm sorry, I am not able to provide it here; it is too large and not public) causes the kernel to die, which makes me think it is a memory issue, yet running it outside of the Jupyter environment has no issues. Running this with lapply has no issues either. Attempting to run it on an AWS instance with more memory results in the same error. The terminal output when the kernel dies is:
Error in poll.socket(list(sockets$hb, sockets$shell, sockets$control), :
Interrupted system call
Calls: <Anonymous> -> <Anonymous> -> poll.socket -> .Call
Execution halted
Monitoring memory shows it never gets very high (~200 MB). So my question is: if it is not memory, what could it be? I realize it may be difficult to answer this question since, as I said, I cannot post the full dataset. R version 3.2.2, Jupyter version 1.0.0, dada2 version 0.99.8, OS X 10.11.4.
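In case it is relevant: since lapply works and mclapply forks the kernel process (which, as I understand it, can interfere with the IRkernel's ZeroMQ sockets), a fork-free variant along the following lines is what I plan to try next (sketch only; the worker count and file list are placeholders):
library(dada2)
library(parallel)

# PSOCK workers are separate R processes, so nothing is forked from the kernel
cl <- makeCluster(4)                       # arbitrary number of workers
clusterEvalQ(cl, library(dada2))           # load dada2 on each worker
files <- c("seqs/test_f.fastq")            # full list of fastq files in practice
derepFs <- parLapply(cl, files, derepFastq)
stopCluster(cl)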
I've recently moved my Windows R code to a Linux installation to run DEoptim on a function. On my Windows system it all worked fine using:
ans <- DEoptim(Calibrate, lower, upper,
               DEoptim.control(trace = TRUE, parallelType = 1, parVar = parVarnames3,
                               packages = c("hydromad", "maptools", "compiler", "tcltk", "raster")))
where the function 'Calibrate' consists of multiple functions. On the Windows system I simply downloaded the various packages needed into the R library. The option parallelType=1 ran the code across a series of cores.
However, now I want to run this code on a Linux-based computing cluster. The function 'Calibrate' works fine standalone, as does DEoptim if I run the code on one core. However, when I specify parallelType=1, the code fails and returns:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
7 nodes produced errors; first error: there is no package called ‘raster’
This error is reproduced for whatever package I try to load, even though the
library(raster)
command worked fine and 'raster' is clearly shown as okay when I call all the libraries using:
library()
So, my gut feeling is that even though all the packages and libraries load okay, the problem is that I have used a personal library and the packages element of DEoptim.control is looking in a different place. An example of how the packages were installed is below:
install.packages("/home/antony/R/Pkges/raster_2.4-15.tar.gz",rpeo=NULL,target="source",lib="/home/antony/R/library")
I also set the lib paths option as below:
.libPaths('/home/antony/R/library')
Has anybody any idea what I am doing wrong, and how to set the 'packages' option in DEoptim.control so I can run DEoptim across multiple cores in parallel?
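For reference, the workaround I am considering (untested; it simply moves the library path and package loading into the objective function so the worker processes see my personal library, instead of relying on the 'packages' argument) looks roughly like this:
# Sketch only: point the workers at the personal library and load the
# packages inside the objective function itself (wrapper name is arbitrary).
CalibratePar <- function(par) {
  .libPaths(c("/home/antony/R/library", .libPaths()))
  library(hydromad); library(maptools); library(compiler)
  library(tcltk);    library(raster)
  Calibrate(par)
}

ans <- DEoptim(CalibratePar, lower, upper,
               DEoptim.control(trace = TRUE, parallelType = 1,
                               parVar = c("Calibrate", parVarnames3)))
Another option I have seen suggested is setting R_LIBS_USER=/home/antony/R/library in ~/.Renviron, so that freshly spawned worker processes pick up the personal library automatically.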
Many thanks, Antony