I am running scikit-learn's DBSCAN algorithm on a dataset of shape 300000x50 in a Jupyter Notebook on AWS SageMaker (an "ml.t2.medium" compute instance). The dataset consists of binary feature vectors (1s and 0s).
Once I run the cell, an orange "Gateway Timeout" prompt appears in the upper right corner after a while. The icon disappears when I click on it, providing no further information. The notebook is unresponsive until I restart the notebook instance.
I have tried different values for the parameters eps and min_samples to no avail.
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.1, min_samples=100).fit(transformed_vectors)
Does "Gateway Timeout" mean that the notebook kernel has crashed or can I expect any results by waiting?
So far the calculation has been running for about 2 hours.
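For a sense of scale, here is a rough back-of-the-envelope estimate of the input size (a sketch; DBSCAN's neighborhood queries and intermediate arrays can need considerably more memory than the raw matrix itself):

n_samples, n_features = 300_000, 50
# Raw float64 input matrix only; neighborhood query results come on top of this.
print(n_samples * n_features * 8 / 1e6, "MB")  # ~120 MB, while an ml.t2.medium has 4 GiB of RAM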
You could always pick a larger size for your notebook instance (ml.t2.medium is pretty small), but I think the better way would be to train your code on a managed SageMaker instance. Scikit-learn is built into SageMaker, so all you have to do is bring your script, e.g.:
from sagemaker.sklearn.estimator import SKLearn

sklearn = SKLearn(
    entry_point="my_code.py",
    train_instance_type="ml.c4.xlarge",
    role=role,
    sagemaker_session=sagemaker_session)
Here's a complete example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_iris/Scikit-learn%20Estimator%20Example%20With%20Batch%20Transform.ipynb
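For reference, a minimal sketch of what such an entry-point script could look like. The file name my_code.py comes from the estimator above, but everything else (the .npy file name, the hyperparameter plumbing, saving with joblib) is a placeholder assumption, not the asker's actual code:

# my_code.py -- hypothetical SageMaker script-mode entry point (sketch)
import argparse
import os

import joblib
import numpy as np
from sklearn.cluster import DBSCAN

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--eps", type=float, default=0.1)
    parser.add_argument("--min-samples", type=int, default=100)
    # SageMaker exposes the input channel and model directory via environment variables
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    args = parser.parse_args()

    # Assumes the feature matrix was uploaded to the training channel as a single .npy file
    X = np.load(os.path.join(args.train, "transformed_vectors.npy"))

    db = DBSCAN(eps=args.eps, min_samples=args.min_samples).fit(X)

    # Persist the fitted model so SageMaker copies it back to S3 when the job finishes
    joblib.dump(db, os.path.join(args.model_dir, "model.joblib"))

Training is then launched with sklearn.fit(), pointing the train channel at your data in S3.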
Related
I've been trying to run some Stan models in a Jupyter notebook using rstan with the IRkernel. I set up an environment for this using conda and believe I have installed all the necessary packages. I can run ordinary R functions without problems, but when I try to create a model using something like
model <- stan(model_code = code, data = dat)
the kernel just dies without any further explanation. The command line output is
memset [0x0x7ffa665b6e3b+11835]
RtlFreeHeap [0x0x7ffa665347b1+81]
free_base [0x0x7ffa640cf05b+27]
(No symbol) [0x0x7ffa2f723b44]
[I 15:25:11.757 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 658481b8-0c64-4612-9cad-1f199dabce3a restarted
which I do not know how to interpret. This happens 100% of the time, even with toy models. I can run the models just fine in RStudio. Could this be a memory issue? For reference, I don't experience this problem when training deep learning models in TensorFlow.
Thanks in advance for any help.
I get this error message in VS Code when using the Jupyter extension connected to a remote server over SSH.
Error: Session cannot generate requests
Error: Session cannot generate requests
at w.executeCodeCell (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:327199)
at w.execute (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:326520)
at w.start (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:322336)
at async t.CellExecutionQueue.executeQueuedCells (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:336863)
at async t.CellExecutionQueue.start (/root/.vscode-server/extensions/ms-toolsai.jupyter-2021.8.1236758218/out/client/extension.js:90:336403)
I got this error after running the code below.
import pandas as pd
from itertools import product
pd.DataFrame(product(item_table, user_table), columns=['item_id', 'user_id'])
The product function outputs all combinations of the given tables.
item_table has 39729 items (39729 by 1).
user_table has 251350 users (251350 by 1).
The above code therefore outputs a 251350 x 39729 combination table.
I guess this is because of the large computation, but I want to know what the error messages mean and how to solve the problem.
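For a rough sense of why this fails, here is a sketch using the row counts from the question; the cross_in_chunks helper is hypothetical, just one possible way of avoiding materializing the whole table at once:

from itertools import islice, product

import pandas as pd

n_items, n_users = 39729, 251350
n_rows = n_items * n_users
print(n_rows)                      # ~9.99 billion rows
print(n_rows * 2 * 8 / 1e9, "GB")  # ~160 GB for two int64 columns alone

# Hypothetical workaround: build the cross product in manageable chunks
def cross_in_chunks(items, users, chunk_size=1_000_000):
    pairs = product(items, users)
    while True:
        chunk = list(islice(pairs, chunk_size))
        if not chunk:
            break
        yield pd.DataFrame(chunk, columns=["item_id", "user_id"])

Each chunk can then be processed or written to disk instead of holding one ~10-billion-row DataFrame in memory.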
I have encountered the same problem (more or less).
It happened when I tried to import tensorflow.keras.
I was no longer able to import packages.
I switched to another conda environment and then came back to the one I was working in, and it worked (but trying to import keras still caused the same problem).
I use R with Keras and TensorFlow 2.0 on the GPU.
After connecting a second monitor to my GPU, I receive an error while running a deep learning script.
I concluded that the GPU is short of memory, and a solution seems to be this code:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True # dynamically grow the memory used on the GPU
config.log_device_placement = True # to log device placement (on which device the operation ran)
# (nothing gets printed in Jupyter, only if you run it standalone)
sess = tf.Session(config=config)
set_session(sess) # set this TensorFlow session as the default session for Keras
According to this post:
https://github.com/tensorflow/tensorflow/issues/7072#issuecomment-422488354
However, this code is not accepted by R.
It says
unexpected token for the TensorFlow code, and
Error in tf.ConfigProto() : could not find function "tf.ConfigProto"
It seems that TensorFlow 2.0 does not accept this code, if I understand correctly from this post:
https://github.com/tensorflow/tensorflow/issues/33504
Does anyone know how I can maximize GPU usage from my R script with the Keras library and TensorFlow 2.0?
Thank you!
To enable GPU memory growth using Keras or TensorFlow in R with TensorFlow 2.0, you need to find the correct functions in the tf object.
First, find your GPU device:
library(tensorflow)
gpu <- tf$config$experimental$get_visible_devices('GPU')[[1]]
Then enable memory growth for that device:
tf$config$experimental$set_memory_growth(device = gpu, enable = TRUE)
You can find more relevant functions by typing tf$config$experimental$ and then using tab autocomplete in RStudio.
Since these functions are labeled as experimental, they will likely change or move location in the future.
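For comparison, the Python TensorFlow 2.x calls that these R wrappers map onto look roughly like this (a sketch assuming a single GPU):

import tensorflow as tf

# Mirror of the R snippet above: take the first visible GPU and enable on-demand allocation
gpus = tf.config.experimental.get_visible_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)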
I am an H2O R user and I have a question regarding the H2O local cluster. I set up the cluster by executing this command in R:
h2o.init()
However, the cluster turns off automatically when I do not use it for a few hours. For example, I run my model during the night, but when I come back to my office in the morning to check on it, it says:
Error in h2o.getConnection() : No active connection to an H2O cluster. Did you run h2o.init()?
Is there a way to fix or work around this?
If the H2O cluster is still running, then your models are all still there (assuming they finished training successfully). There are a number of ways that you can check if the H2O Java cluster is still running. In R, you can check the output of these functions:
h2o.clusterStatus()
h2o.clusterInfo()
At the command line (look for a Java process):
ps aux | grep java
If you started H2O from R, then you should see a line that looks something like this:
yourusername 26215 0.0 2.7 8353760 454128 ?? S 9:41PM 21:25.33 /usr/bin/java -ea -cp /Library/Frameworks/R.framework/Versions/3.3/Resources/library/h2o/java/h2o.jar water.H2OApp -name H2O_started_from_R_me_iqv833 -ip localhost -port 54321 -ice_root /var/folders/2j/jg4sl53d5q53tc2_nzm9fz5h0000gn/T//Rtmp6XG99X
H2O models do not live in the R environment, they live in the H2O cluster (a Java process). It sounds like what's happening is that the R object representing your model (which is actually just a pointer to the model in the H2O cluster) is having issues finding the model since your cluster disconnected. I don't know exactly what's going on because you haven't posted the errors you're receiving when you try to use h2o.predict() or h2o.performance().
To get the model back, you can use the h2o.getModel() function. You will need to know the ID of your model. If your model object (the one that's not working properly) is still accessible, then you can see the model ID easily that way: model@model_id. You can also head over to H2O Flow in the browser (by typing http://127.0.0.1:54321 if you started H2O with the defaults) and view all the models by ID that way.
Once you know the model ID, then refresh the model by doing:
model <- h2o.getModel("model_id")
This should re-establish the connection to your model and the h2o.predict() and h2o.performance() functions should work again.
I don't know if this issue is dada2-specific or not. I would guess that it is not, but I am not able to reproduce it otherwise.
I am trying to use mclapply from the parallel library inside a Jupyter notebook with dada2. The parallel job runs, but the moment it finishes, the kernel dies and I am unable to restart it. Running the same workflow inside an R terminal has no issues.
Running it on a small dataset works with no issues:
library(dada2)
library(parallel)
derepFs <- mclapply('seqs/test_f.fastq', derepFastq)
derepFs
Running the same workflow with the full dataset (I'm sorry, I am not able to provide it here; it is too large and not public) causes the kernel to die, which makes me think it is a memory issue, although running it outside of the Jupyter environment has no issues. Running this with lapply also has no issues, and attempting to run it on an AWS instance with more memory results in the same error. The terminal output when the kernel dies is:
Error in poll.socket(list(sockets$hb, sockets$shell, sockets$control), :
Interrupted system call
Calls: <Anonymous> -> <Anonymous> -> poll.socket -> .Call
Execution halted
Monitoring memory shows it never gets very high (~200 MB). So my question is: if it is not memory, what could it be? I realize it may be difficult to answer this question since, as I said, I cannot post the full dataset. R version 3.2.2, Jupyter version 1.0.0, dada2 version 0.99.8, OS X 10.11.4.