rstudio - removing the memory limit - r

I’m trying to run some modeling (random forest, using caret) in rstudio server 1.1.423 (with R version 3.4.4, running on an Ubuntu 16.04 server), and it comes back with the following error:
Error: protect(): protection stack overflow
This error doesn't come up if I run the same analysis in an interactive R session. I seem to recall that in the past (in rstudio server running an older version of R) I was able to resolve this error by issuing memory.limit(500000) in an interactive rstudio server session, but these days this comes back with:
> memory.limit(500000)
[1] Inf
Warning message:
'memory.limit()' is Windows-specific
A solution that works and that I use routinely is to run my analysis from a script, like Rscript --max-ppsize=500000 --vanilla /location/of/the/script.R, but that’s not what I want to do, as in this particular case I need to run the analysis interactively.
I’ve also tried adding R_MAX_VSIZE=500000 at the end of my ~/.profile, or rsession-memory-limit-mb=500000 into /etc/rstudio/rserver.conf, as well as putting options(expressions = 5e5) in my ~/.Rprofile, or running options(expressions = 5e5) in an interactive rstudio server session. No luck so far, the “protect()” error keeps on popping up.
Any ideas as to how to remove the memory limit in rstudio server?
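One thing that may be worth checking: the protect() message refers to R's pointer protection stack, which is sized at startup by --max-ppsize, rather than to a general memory limit. The sketch below only inspects what the current session was started with and changes nothing. It uses base R functions (commandArgs, getOption, Cstack_info); the variable names are illustrative, and what rstudio server reports as its command line may differ from a terminal session.

```r
# Diagnostic sketch: inspect the limits this R session was started with.
args <- commandArgs()                          # full command line of this R process
ppsize_flag <- grep("^--max-ppsize", args, value = TRUE)

if (length(ppsize_flag) == 0) {
  message("No --max-ppsize flag found; R is using its built-in default.")
} else {
  message("Started with: ", ppsize_flag)
}

print(getOption("expressions"))                # limit on nested expressions
print(Cstack_info())                           # C stack size and current usage
```

Running this once under Rscript --max-ppsize=500000 and once in the rstudio server console should show whether the flag ever reaches the interactive session.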

Related

R: "internet routines cannot be loaded" when starting from RStudio

I am running Red Hat Enterprise Linux (RHEL) 8.5 with Linux kernel 4.18 and Gnome 3.32.2. In this system, I've got R 4.1.2 compiled with the tool asdf with shared libraries enabled. On top of that, I installed RStudio 2021.09.01-372 from an RPM from the official RStudio website.
When I start Rstudio, the first line of output after the usual R startup is an error:
Error in tools::startDynamicHelp() : internet routines cannot be loaded
I am unable to figure out what's causing this error, and with it I can't run things like refresh CRAN or update packages. But if I start a pure R session from the terminal (instead of Rstudio) this error does not occur.
Some things I tried:
Install the krb5 and libssh2 packages on my host system: Didn't help.
Starting a "pure" R session (both with and without the --vanilla argument) from the Terminal tab within Rstudio also gives this error. If I try to run update.packages() from this session, it pops up a window to select a CRAN mirror then fails with the following:
Warning: failed to download mirrors file (internet routines cannot be loaded); using local file '/home/[my username]/.asdf/installs/R/4.1.2/lib64/R/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://cloud.r-project.org/src/contrib:
internet routines cannot be loaded
Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
unable to load shared object '/home/penyuan/.asdf/installs/R/4.1.2/lib64/R/modules//internet.so':
/lib64/libssh.so.4: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
But like I said, the strange thing is if I start an R session outside of Rstudio, these errors don't happen.
Within RStudio, the only workaround I can find is to run this command upon startup (suggested in this thread):
options(download.file.method="wget")
Once this is done, everything else seems to work, such as package updates.
However, I don't want to manually do this every time I start RStudio. So I tried to put it into ~/.Rprofile including a test print() as follows:
print("This is `~/.Rprofile`")
options(download.file.method="wget")
When I open RStudio, I can see the output from the print() call, but the options() command is not run because the original error shows up again. I still have to manually enter options(download.file.method="wget") every time.
I also tried to fold everything into a .First function in ~/.Rprofile as follows:
.First <- function() {
options(download.file.method="wget")
print("This is the `.First` function in `~/.Rprofile`")
}
Unfortunately, same result as before: print()'s output is seen, but options() is not run.
I also made sure that my ~/.Rprofile includes a trailing newline as discussed here. But this didn't help.
The above are the steps I've tried so far.
Why does this error only occur when running RStudio or a terminal within Rstudio? Why doesn't it happen if I start R from a terminal outside of Rstudio?
Is there a way to solve the problem so that the error doesn't happen in the first place? If it can't be solved, how do I set up my ~/.Rprofile so that options(download.file.method="wget") will be run?
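Because the failure is an undefined symbol in a shared library and only shows up under RStudio, one way to narrow it down might be to compare the environment each session sees. The sketch below is a diagnostic under that assumption, not a fix; the list of variables is a guess at what is likely to differ, and env_vars is an illustrative name.

```r
# Run this in RStudio and in a plain terminal R session, then compare the output.
env_vars <- c("LD_LIBRARY_PATH", "PATH", "R_HOME")   # assumed to be the relevant ones

for (v in env_vars) {
  cat(v, "=", Sys.getenv(v, unset = "<unset>"), "\n")
}

# NULL here means the option has not been set in this session.
print(getOption("download.file.method"))
```

If the option prints as "wget" right after startup, ~/.Rprofile did run options(), and the startup error was most likely raised before or independently of it.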
Thank you.

R/RStudio unable to run, with looping socketConnection error

A few days ago, I was having an error running models in R using 'brms', which said that my posterior samples didn't exist. Upon reading further, these links (1, 2, 3, 4) led me to think it was an rstan problem playing with my macOS (Catalina 10.15.6).
I followed their instructions, namely:
- updated packages Rcpp, rstan, arm, and brms
- followed these workaround instructions to alter the 'parallel' settings for stan: https://github.com/rstudio/rstudio/issues/6692
- updated R and RStudio, since this problem was supposedly fixed a few months ago with R 4.0
- updated XCode 11, Quartz 11, GNU Fortran 8.2
- updated latest macOS Catalina bug fixes
- ran sudo rm -rf [path to R] to uninstall R
- tried to do a thorough uninstall of all R and RStudio files, including deleting files in my Library/Frameworks folder, any .plist files in Library/Preferences, and any .Rprofile, .Rscript, .Rapp, .Rhistory, or .Renvirons files
- reinstalled R and RStudio after restart
Now, instead of having a "blank slate" to start from, I am experiencing some super weird behaviors. First, RStudio opens on a completely white blank screen and never loads. Second, when I try to open R directly either via terminal or with R Console, I get stuck in a loop for nearly 20 min that says:
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
file descriptor is too large for select()
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
Execution halted
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
cannot open the connection
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
port 11537 cannot be opened
Execution halted
At the very end, when it finally stops looping forever, it says:
/Library/Frameworks/R.framework/Versions/4.0/Resources/bin/R: cannot make pipe for command substitution: Too many open files
ERROR: option '-e' requires a non-empty argument
rm: /var/folders/54/km__8z8x78x8_ct1pw8w8bbh0000gn/T//RtmpVORdTy: Too many open files
I can't access a console or enter anything in R to try to troubleshoot. Moreover, it causes a massive slowdown to my computer and Activity Monitor shows more than 150 'R' processes running, which don't go away after quitting R, only after using 'killall R' in Terminal.
However, someone in IT helped me determine that it's something in my Mac user library or preferences, because we created a brand new user on my machine, installed R and RStudio, and had no problems loading them.
I am just a psychology grad student, so I really don't understand the back end that makes R work and I am totally baffled by these symptoms.
I suspect that these links (5, 6, 7) might help, but I don't know how to execute the solutions because right now I can't enter or run anything in R without triggering that endless loop of 'Execution halted.'
I could really use a hand, thanks!

IRkernel dies when running mclapply with dada2

I don't know if this issue is dada2 specific or not. I would guess that it is not but I am not able to reproduce it otherwise.
I am trying to use mclapply from the parallel library inside of a Jupyter notebook with dada2. The parallel job runs, though the moment it finishes the kernel dies and I am unable to restart it. Running the same workflow inside of an R terminal has no issues.
Running it on a small dataset works with no issues:
library(dada2)
library(parallel)
derepFs <- mclapply('seqs/test_f.fastq', derepFastq)
derepFs
Running the same workflow with the full dataset (I'm sorry I can't provide it here; it is too large and not public) causes the kernel to die. This makes me think it is a memory issue, yet running it outside of the Jupyter environment has no issues, and running it with lapply has no issues either. Running it on an AWS instance with more memory also results in the same error. The terminal output when the kernel dies is:
Error in poll.socket(list(sockets$hb, sockets$shell, sockets$control), :
Interrupted system call
Calls: <Anonymous> -> <Anonymous> -> poll.socket -> .Call
Execution halted
Monitoring memory shows it never gets very high (~200MB). So my question is: if it is not memory, what could it be? I realize it may be difficult to answer this question, since, as I said, I cannot post the full dataset. R version 3.2.2, Jupyter version 1.0.0, Dada2 version 0.99.8, OSX 10.11.4
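The lapply versus mclapply comparison described above can be reproduced without dada2 along these lines. This is only a sketch: the inputs and the work function are hypothetical stand-ins for the FASTQ paths and derepFastq, and two cores is an arbitrary choice.

```r
library(parallel)

# Hypothetical stand-ins for the FASTQ paths and for derepFastq().
inputs <- c("sample_a", "sample_b", "sample_c")
work <- function(x) nchar(x)

# mclapply() forks the current process; forking is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(inputs, work)
forked <- mclapply(inputs, work, mc.cores = n_cores)

print(identical(unlist(serial), unlist(forked)))
```

If a stand-in like this also kills the kernel once the inputs get large, that would suggest the problem lies in how forked children interact with the kernel process rather than in dada2 itself.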

RStudio cannot find any package after laptop restart

My R script worked fine in RStudio (Version 0.98.1091) on Windows 7. Then I restarted my laptop, opened RStudio again, and now it gives the following error messages each time I try to execute my code:
cl <- makeCluster(mc); # build the cluster
Error: could not find function "makeCluster"
> registerDoParallel(cl)
Error: could not find function "registerDoParallel"
> fileIdndexes <- gsub("\\.[^.]*","",basename(SF))
Error in basename(SF) : object 'SF' not found
These error messages are slightly different each time I run the code. It seems that RStudio cannot find any function that is used in the code.
I restarted the R session, cleared the workspace, and restarted RStudio. Nothing helps.
It should be noted that after many attempts to execute the code, it finally did start. However, after 100 iterations, it crashed with a message about localhost being unavailable.
Add library(*the package needed/where the function is*) for each of the packages you're using.
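A minimal sketch of that advice, assuming the functions in the error messages come from the parallel and doParallel packages (the question does not say which packages the script used). Two workers is an arbitrary choice, and the doParallel step is skipped if the package is not installed.

```r
library(parallel)                        # provides makeCluster() and stopCluster()

cl <- makeCluster(2)                     # build the cluster; 2 workers is arbitrary

# registerDoParallel() lives in doParallel; only call it if that package is installed.
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cl)
}

res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)                          # always release the workers

print(unlist(res))
```

Putting the library() calls at the top of the script means a fresh session after a restart loads them again, instead of relying on packages attached in an earlier session.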

"Cannot open the connection" - HPC in R with snow

I'm attempting to run a parallel job in R using snow. I've been able to run extremely similar jobs with no trouble on older versions of R and snow. R package dependencies prevent me from reverting.
What happens: My jobs terminate at the parRapply step, i.e., the first time the nodes have to do anything short of reporting Sys.info(). The error message reads:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: cannot open the connection
Calls: parRapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Specs: R 2.14.0, snow 0.3-8, RedHat Enterprise Linux Client release 5.6. The snow package has been built on the correct version of R.
Details:
The following code appears to execute fine:
cl <- makeCluster(3)
clusterEvalQ(cl,library(deSolve,lib="~/R/library"))
clusterCall(cl,function() Sys.info()[c("nodename","machine")])
I'm an end-user, not a system admin, but I'm desperate for suggestions and insights into what could be going wrong.
This cryptic error appeared because an input file that's requested during program execution wasn't actually present. Each node would attempt to load this file and then fail, but this would result only in a "cannot open the connection" message.
What this means is that almost anything can cause a "connection" error. Incredibly annoying!
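The behaviour described above can be reproduced, and guarded against, roughly as follows. This sketch uses the parallel package as a stand-in for snow (its cluster functions derive from snow), and input_path is a hypothetical, deliberately missing file.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical input file that does not exist, to mimic the missing input.
input_path <- file.path(tempdir(), "no_such_input.csv")

# Guard: ask every worker whether it can see the file before reading it.
visible <- unlist(clusterCall(cl, file.exists, input_path))
print(visible)

# Without the guard, the read fails on each node and the master only
# receives a generic error summary.
result <- tryCatch(
  clusterCall(cl, function(p) read.csv(p), input_path),
  error = function(e) conditionMessage(e)
)
print(result)

stopCluster(cl)
```

Checking file.exists() on the workers first turns the vague connection error into a direct statement about which nodes cannot see the input.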
