Has anyone run into the R haven package generating a "recursive gc invocation" error in a shared environment?
Here is what I am running.
RStudio 2021.09.2 Build 382
R 4.1.2
Rtools 4.0
haven 2.4.3
The code works on my local install. I have users accessing a shared instance to work with large data files, and I can't get it to run consistently in that environment. The essence of the call is:
library(haven)
SFF05 <- read_sas("file/path/name/filename.sas7bdat")
Users in this space often get this error from the library() statement:
*******recursive gc invocation
Adding gc() to the start of the code fixes it temporarily, but it eventually stops working.
My users also get this error from the read_sas() call:
Error in df_parse_sas_file(….)
We occasionally see other recursive errors or namespace errors as well. It's not very consistent, and sometimes the session just hangs until R crashes and reloads.
These programs worked flawlessly on R 3.6.1 and RStudio 1.1.456 (July 19th, 2018). The code also runs fine in the R 4.1.2 console, so it appears to be an issue with the combination of R 4.1.2 and RStudio 2021.09.2 Build 382 in a shared environment.
This could be related to a bug in RStudio; e.g., see:
https://github.com/rstudio/rstudio/issues/9868
https://github.com/rstudio/rstudio/issues/10565
https://github.com/rstudio/rstudio/issues/10040
It seems to be fixed in the latest RStudio release, which should arrive any day now:
https://github.com/rstudio/rstudio/milestone/20?closed=1
A temporary fix might be to set session-handle-offline-timeout-ms=0 in your rsession.conf. See: https://github.com/rstudio/rstudio/issues/10565#issuecomment-1035517692
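For example, a minimal rsession.conf might look like this (assuming a standard server install where the file lives at /etc/rstudio/rsession.conf):

# /etc/rstudio/rsession.conf
# Workaround: disable the offline-session handle timeout
session-handle-offline-timeout-ms=0

After editing it, restart the service (e.g. sudo rstudio-server restart) so the setting takes effect.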
Related
I am running Red Hat Enterprise Linux (RHEL) 8.5 with Linux kernel 4.18 and GNOME 3.32.2. On this system, I have R 4.1.2 compiled with the asdf version manager, with shared libraries enabled. On top of that, I installed RStudio 2021.09.01-372 from an RPM from the official RStudio website.
When I start RStudio, the first line of output after the usual R startup is an error:
Error in tools::startDynamicHelp() : internet routines cannot be loaded
I can't figure out what's causing this error, and while it persists I can't do things like refresh CRAN or update packages. But if I start a plain R session from the terminal (instead of RStudio), the error does not occur.
Some things I tried:
Installed the krb5 and libssh2 packages on my host system: didn't help.
Starting a "pure" R session (both with and without the --vanilla argument) from the Terminal tab within RStudio also gives this error. If I run update.packages() from that session, it pops up a window to select a CRAN mirror and then fails with the following:
Warning: failed to download mirrors file (internet routines cannot be loaded); using local file '/home/[my username]/.asdf/installs/R/4.1.2/lib64/R/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://cloud.r-project.org/src/contrib:
internet routines cannot be loaded
Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
unable to load shared object '/home/penyuan/.asdf/installs/R/4.1.2/lib64/R/modules//internet.so':
/lib64/libssh.so.4: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
But as I said, the strange thing is that these errors don't happen if I start an R session outside of RStudio.
Within RStudio, the only workaround I've found is to run this command at startup (suggested in this thread):
options(download.file.method="wget")
Once this is done, everything else seems to work, such as package updates.
However, I don't want to do this manually every time I start RStudio, so I tried putting it into ~/.Rprofile, along with a test print(), as follows:
print("This is `~/.Rprofile`")
options(download.file.method="wget")
When I open RStudio, I can see the output from the print() call, but the options() setting does not take effect: the original error shows up again, and I still have to enter options(download.file.method="wget") manually every time.
I also tried to fold everything into a .First function in ~/.Rprofile as follows:
.First <- function() {
  options(download.file.method = "wget")
  print("This is the `.First` function in `~/.Rprofile`")
}
Unfortunately, the result is the same as before: print()'s output appears, but the options() setting does not take effect.
I also made sure that my ~/.Rprofile ends with a trailing newline, as discussed here. That didn't help either.
The above are the steps I've tried so far.
Why does this error only occur when running RStudio or a terminal within RStudio? Why doesn't it happen when I start R from a terminal outside of RStudio?
Is there a way to solve the problem so that the error doesn't happen in the first place? If it can't be solved, how do I set up my ~/.Rprofile so that options(download.file.method="wget") is actually run?
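One idea I have not tested yet: if RStudio itself resets the download method after ~/.Rprofile runs, perhaps the call can be deferred using RStudio's rstudio.sessionInit hook, along these lines:

# ~/.Rprofile
# Sketch: apply the option only once the RStudio session has finished
# initializing, in case RStudio overrides it during startup.
setHook("rstudio.sessionInit", function(newSession) {
  options(download.file.method = "wget")
}, action = "append")

In a plain terminal R session the hook simply never fires, so this should be harmless there.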
Thank you.
I used RStudio to submit a job to cloudml (AI Platform) a few months ago, and it was successful.
Today I tried to use an AI Platform notebook to submit the same job, but I get:
"ERROR: (gcloud.ai-platform.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '1.9' with the Python version ''"
I even ran which python in the terminal and then, in the R environment:
library(reticulate)
use_python("/path/from/which/python")  # the path that `which python` printed (placeholder)
I tried R in the terminal as well and got the same error.
I don't know if it helps or not, but the previous run and this one were in different regions:
us-central was successful.
australia-southeast1 gets this error.
This error occurs because, as of March 16, 2020, you can no longer create training jobs that use runtime version 1.9. Try submitting the job with runtime version 1.15, which is the only TensorFlow 1.x version currently supported for training jobs. You may still experience errors due to incompatibilities in the code, though.
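For example, resubmitting with the runtime pinned might look like this (illustrative only; the job name, module name, and package path are placeholders):

gcloud ai-platform jobs submit training my_job \
  --region australia-southeast1 \
  --runtime-version 1.15 \
  --python-version 3.7 \
  --module-name trainer.task \
  --package-path trainer/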
I'm having issues with the Azure Machine Learning SDK for R: "module 'azureml' has no attribute 'core'"...
For reasons that aren't my own, I have to use azureml to apply machine learning (my own stuff, written in R) to data from our data warehouse that is put in blob storage. The modelled output should be written back to blob storage so it can be accessed from the data warehouse.
I've written the code in R on my local machine (stored in a git repo). Preferably, I'd find some method to pull my code from git into a pipeline in the azureml environment so that it can be directly run whenever new data is available in the blob storage.
I've embarked on a tutorial spree and found this seemingly relevant walkthrough: Train and deploy your first model with Azure ML (and this one).
But... after trying everything I could think of, I'm stuck at the first steps. After installing all the packages, modules, apps, etc. (or at least, that's what I think), and running the following code in RStudio:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
I run into an error that I haven't been able to fix:
AttributeError: module 'azureml' has no attribute 'core'
It seems that azureml is supposed to have an attribute "core", but when looking at it more closely, there is indeed no such attribute.
The function get_workspace() is trying to access azureml$core$Workspace$get.
I found that azureml$Workspace does exist, but I can't figure out how to make that work.
Can anyone explain to me why I'm encountering this error?
Does anyone know of a better tutorial on how to connect my R code to azureml's cloud service?
Any pointers in the right direction are much appreciated!
EDITS - still not solved:
After advice from others, I double-, triple-, and quadruple-checked the installation.
I updated R and I'm now running:
R.version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.2
year 2019
month 12
day 12
svn rev 77560
language R
version.string R version 3.6.2 (2019-12-12)
nickname Dark and Stormy Night
I installed Conda with Python 3.6.10.
I installed the azuremlsdk R package (I tried both provided options).
I then realized that there are some inconsistencies in the versions of the azure modules, so I also tried installing it with the "--no-multiarch" install option:
remotes::install_cran('azuremlsdk', repos = 'http://cran.us.r-project.org', INSTALL_opts = c("--no-multiarch"))
Then I installed the azureml Python SDK.
I had another look at all the versions (using python -m pip freeze):
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.69
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.69
azureml-pipeline-core==1.0.69
azureml-pipeline-steps==1.0.69
azureml-sdk==1.0.69
azureml-telemetry==1.0.69
azureml-train==1.0.69
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.69
azureml-train-restclients-hyperdrive==1.0.69
Since I was surprised to see all the 1.0.69 versions instead of the 1.0.83 versions, I re-installed the azureml Python SDK using:
azuremlsdk::install_azureml(version = "1.0.83")
This worked, in the sense that all versions are now indeed 1.0.83:
azure-common==1.1.24
azure-graphrbac==0.61.1
azure-mgmt-authorization==0.60.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.0.0
azure-mgmt-resource==7.0.0
azure-mgmt-storage==7.1.0
azureml==0.2.7
azureml-automl-core==1.0.83.1
azureml-core==1.0.83
azureml-dataprep==1.1.36
azureml-dataprep-native==13.2.0
azureml-pipeline==1.0.83
azureml-pipeline-core==1.0.83
azureml-pipeline-steps==1.0.83
azureml-sdk==1.0.83
azureml-telemetry==1.0.83
azureml-train==1.0.83
azureml-train-automl-client==1.0.83
azureml-train-core==1.0.83
azureml-train-restclients-hyperdrive==1.0.83
But still... I get the error about the missing core. I get it both when running:
library(azuremlsdk)
get_current_run()
and also when running:
library(azuremlsdk)
existing_ws <- get_workspace(name = name,
                             subscription_id = subscription_id,
                             resource_group = resource_group)
Note that the first time I run this code after starting RStudio, I get the error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute '_base_sdk_common'
And every time after that, I get this error:
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute 'core'
Any help would be much appreciated!
This issue was introduced by the recent reticulate 1.14 release, in which reticulate creates a default r-reticulate conda environment. Since Azure ML was installing the Python SDK into an environment named r-azureml, the r-reticulate environment used by reticulate was missing the Python SDK. A fix for this issue was addressed in a PR and has been merged into master. Please install from GitHub for now if you have reticulate 1.14 and are running into this issue. We will be releasing an update to CRAN shortly.
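For example (assuming the remotes package is available):

# Install the development version of azuremlsdk containing the fix
remotes::install_github("Azure/azureml-sdk-for-r")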
I seem to have fixed the issue by specifically installing the Python package azureml AND azureml.core:
python -m pip install azureml
and then...
python -m pip install azureml.core
I did this for the Conda environment that R calls (r-reticulate). It's a bit odd not to be able to use the Conda environment 'r-azureml' without R switching back to 'r-reticulate', but ah well... at least I no longer get the 'azureml' has no attribute 'core' error.
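In case anyone wants to try the other direction: I believe you can also point reticulate at the r-azureml environment before loading the SDK, though I haven't verified that this avoids the error:

library(reticulate)
use_condaenv("r-azureml", required = TRUE)  # select the env the Python SDK was installed into
library(azuremlsdk)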
I don't know if this issue is dada2-specific or not. I would guess that it is not, but I am unable to reproduce it otherwise.
I am trying to use mclapply from the parallel library inside a Jupyter notebook with dada2. The parallel job runs, but the moment it finishes the kernel dies and I am unable to restart it. Running the same workflow inside an R terminal causes no issues.
Running it on a small dataset works with no issues:
library(dada2)
library(parallel)
derepFs <- mclapply('seqs/test_f.fastq', derepFastq)
derepFs
Running the same workflow with the full dataset (I'm sorry, I am not able to provide it here; it is too large and not public) causes the kernel to die. This makes me think it is a memory issue, but running it outside of the Jupyter environment causes no issues. Running it with lapply causes no issues either. Attempting to run it on an AWS instance with more memory results in the same error. The terminal output when the kernel dies is:
Error in poll.socket(list(sockets$hb, sockets$shell, sockets$control), :
Interrupted system call
Calls: <Anonymous> -> <Anonymous> -> poll.socket -> .Call
Execution halted
Monitoring memory shows it never gets very high (~200 MB). So my question is: if it is not memory, what could it be? I realize it may be difficult to answer this question since, as I said, I cannot post the full dataset. R version 3.2.2, Jupyter version 1.0.0, dada2 version 0.99.8, OS X 10.11.4.
My R script worked fine in RStudio (version 0.98.1091) on Windows 7. Then I restarted my laptop, opened RStudio again, and now it produces the following error messages each time I try to execute my code:
cl <- makeCluster(mc); # build the cluster
Error: could not find function "makeCluster"
> registerDoParallel(cl)
Error: could not find function "registerDoParallel"
> fileIdndexes <- gsub("\\.[^.]*","",basename(SF))
Error in basename(SF) : object 'SF' not found
These error messages are slightly different each time I run the code. It seems that RStudio cannot find any of the functions used in the code.
I restarted the R session, cleaned the workspace, and restarted RStudio. Nothing helps.
It should be noted that after many attempts to execute the code, it finally ran. However, after 100 iterations it crashed with a message about the unavailability of localhost.
Add library(<the package that provides the function>) for each of the packages you're using, as in the sketch below.
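For the code above, that would be something like this (a sketch; I'm assuming mc was meant to hold a worker count):

library(parallel)        # provides makeCluster()
library(doParallel)      # provides registerDoParallel()
mc <- detectCores() - 1  # worker count; adjust to whatever 'mc' held before
cl <- makeCluster(mc)    # build the cluster
registerDoParallel(cl)

Remember that a restarted R session starts with no packages attached, so these library() calls have to run again in every session.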