IRkernel dies when running mclapply with dada2

I don't know whether this issue is dada2-specific; I would guess not, but I haven't been able to reproduce it any other way.
I am trying to use mclapply from the parallel library inside a Jupyter notebook with dada2. The parallel job runs, but the moment it finishes the kernel dies and I am unable to restart it. Running the same workflow in an R terminal has no issues.
Running it on a small dataset works with no issues:
library(dada2)
library(parallel)
derepFs <- mclapply('seqs/test_f.fastq', derepFastq)
derepFs
Running the same workflow with the full dataset (I'm sorry I cannot provide it here; it is too large and not public) causes the kernel to die. This made me think it is a memory issue, but running it outside the Jupyter environment has no issues, and running it with lapply has no issues either. Attempting to run it on an AWS instance with more memory results in the same error. The terminal output when the kernel dies is:
Error in poll.socket(list(sockets$hb, sockets$shell, sockets$control), :
Interrupted system call
Calls: <Anonymous> -> <Anonymous> -> poll.socket -> .Call
Execution halted
Monitoring memory shows usage never gets very high (~200 MB). So my question is: if it is not memory, what could it be? I realize it may be difficult to answer, since as I said I cannot post the full dataset. R version 3.2.2, Jupyter 1.0.0, dada2 0.99.8, OS X 10.11.4.
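One workaround worth trying: IRkernel talks to Jupyter over ZeroMQ sockets, and the children forked by mclapply inherit that socket state, which may be what trips up poll.socket when the job finishes. The sketch below avoids fork() entirely by using a PSOCK cluster instead; the worker count and file vector are assumptions, not part of the original workflow.
library(dada2)
library(parallel)
# PSOCK workers are fresh R processes, so they never share the
# kernel's ZeroMQ sockets the way fork()ed children do
cl <- makeCluster(4)                   # worker count: adjust to your machine
clusterEvalQ(cl, library(dada2))       # load dada2 on every worker
fastq_files <- c('seqs/test_f.fastq')  # stand-in for the full file list
derepFs <- parLapply(cl, fastq_files, derepFastq)
stopCluster(cl)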

Related

R/RStudio unable to run, with looping socketConnection error

A few days ago, I was having an error running models in R using 'brms', which said that my posterior samples didn't exist. Upon reading further, these links (1, 2, 3, 4) led me to think it was an rstan problem playing with my macOS (Catalina 10.15.6).
I followed their instructions, namely:
-updated packages Rcpp, rstan, arm, and brms
-followed these workaround instructions to alter the 'parallel' settings for stan (sketched below, after this list): https://github.com/rstudio/rstudio/issues/6692
-updated R and RStudio, since this problem was supposedly fixed a few months ago with R 4.0
-updated XCode 11, Quartz 11, GNU Fortran 8.2
-updated latest macOS Catalina bug fixes
-ran sudo rm -rf [path to R] to uninstall R
-tried to do a thorough uninstall of all R and RStudio files, including deleting files in my Library/Frameworks folder, any .plist files in Library/Preferences, and any .Rprofile, .Rscript, .Rapp, .Rhistory, or .Renviron files
-reinstalled R and RStudio after restart
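For reference, the workaround from that GitHub issue amounts to a few lines in ~/.Rprofile. Here is a sketch as I understand it; note it calls an internal function of the parallel package, so treat it as a stopgap rather than a supported API.
# force PSOCK workers to be set up one at a time on macOS + R >= 4.0;
# the default "parallel" setup strategy is what hangs there
if (Sys.info()["sysname"] == "Darwin" && getRversion() >= "4.0.0") {
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
}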
Now, instead of having a "blank slate" to start from, I am experiencing some super weird behaviors. First, RStudio opens on a completely white blank screen and never loads. Second, when I try to open R directly either via terminal or with R Console, I get stuck in a loop for nearly 20 min that says:
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
file descriptor is too large for select()
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
Execution halted
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
cannot open the connection
Calls: <Anonymous> ... makePSOCKcluster -> newPSOCKnode -> socketConnection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
port 11537 cannot be opened
Execution halted
At the very end, when the looping finally stops, it says:
/Library/Frameworks/R.framework/Versions/4.0/Resources/bin/R: cannot make pipe for command substitution: Too many open files
ERROR: option '-e' requires a non-empty argument
rm: /var/folders/54/km__8z8x78x8_ct1pw8w8bbh0000gn/T//RtmpVORdTy: Too many open files
I can't access a console or enter anything in R to try to troubleshoot. Moreover, it causes a massive slowdown to my computer and Activity Monitor shows more than 150 'R' processes running, which don't go away after quitting R, only after using 'killall R' in Terminal.
However, someone in IT helped me determine that it's something in my Mac user library or preferences, because we created a brand new user on my machine, installed R and RStudio, and had no problems loading them.
I am just a psychology grad student, so I really don't understand the back end that makes R work and I am totally baffled by these symptoms.
I suspect that these links (5, 6, 7) might help, but I don't know how to execute the solutions because right now I can't enter or run anything in R without triggering that endless loop of 'Execution halted.'
I could really use a hand, thanks!
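One way to at least get a working prompt, sketched under the assumption that a startup file is what spawns the PSOCK cluster: launch R from Terminal as R --vanilla, which skips all startup files, then look for the culprit from inside that clean session.
# started via:  R --vanilla
# (--vanilla skips ~/.Rprofile, ~/.Renviron, and the site-wide files,
# so nothing gets a chance to call makePSOCKcluster before the prompt)
file.exists("~/.Rprofile")                              # user profile
file.exists("~/.Renviron")                              # user environment file
file.exists(file.path(R.home("etc"), "Rprofile.site"))  # site profile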

Removing the memory limit in RStudio Server

I’m trying to run some modeling (random forest, using caret) in RStudio Server 1.1.423 (with R version 3.4.4, running on an Ubuntu 16.04 server), and it comes back with the following error:
Error: protect(): protection stack overflow
This error doesn't come up if I run the same analysis in an interactive R session. I seem to recall that in the past (in RStudio Server running an older version of R) I was able to resolve this error by issuing memory.limit(500000) in an interactive RStudio Server session, but these days this comes back with:
> memory.limit(500000)
[1] Inf
Warning message:
'memory.limit()' is Windows-specific
A solution that works and that I use routinely is to run my analysis from a script, like Rscript --max-ppsize=500000 --vanilla /location/of/the/script.R, but that’s not what I want to do, as in this particular case I need to run the analysis interactively.
I’ve also tried adding R_MAX_VSIZE=500000 at the end of my ~/.profile, or rsession-memory-limit-mb=500000 into /etc/rstudio/rserver.conf, as well as putting options(expressions = 5e5) in my ~/.Rprofile, or running options(expressions = 5e5) in an interactive rstudio server session. No luck so far, the “protect()” error keeps on popping up.
Any ideas as to how to remove the memory limit in rstudio server?
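One possible middle ground, offered as an untested sketch: --max-ppsize only exists as a startup flag, so the protect stack cannot be resized from a running session. You could keep the RStudio Server session for interactive work but push the heavy caret call into a child R process launched with a larger protect stack (the script path is the placeholder from above):
system2("Rscript",
        args = c("--max-ppsize=500000", "--vanilla",
                 "/location/of/the/script.R"))
The child process isn't interactive, but the script can saveRDS() its results for you to read back into the session afterwards.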

R execution fails with "This application has requested the Runtime to terminate it in an unusual way"

I am new to R; kindly help me with the error below.
I am calling R code from a batch file (e.g. c:\batchfile\x.bat) on a machine with dynamic memory, i.e. memory and cores increase based on load.
With this approach everything executes without error. The R code uses the RODBCext, koRpus, akmeans, lsa, stringr, topicmodels, RWeka, lda, snowfall, tm, openNLP, reshape2, plyr, and RODBC packages.
But calling the x.bat file on a remote server from PowerShell (e.g. Invoke-Command -Computername 'Servername' {Start-Process 'C:\batchfile\x.bat' -wait}) results in the errors below:
LoadLibrary failure: The paging file is too small for this operation to complete.
just-in-time debugging errors
This application has requested the Runtime to terminate it in an unusual way.
Thanks in advance

RStudio cannot find any package after laptop restart

My R script worked fine in RStudio (version 0.98.1091) on Windows 7. Then I restarted my laptop, opened RStudio again, and now it produces the following error messages each time I try to execute my code:
cl <- makeCluster(mc); # build the cluster
Error: could not find function "makeCluster"
> registerDoParallel(cl)
Error: could not find function "registerDoParallel"
> fileIdndexes <- gsub("\\.[^.]*","",basename(SF))
Error in basename(SF) : object 'SF' not found
These error messages are slightly different each time I run the code. It seems that RStudio cannot find any function used in the code.
I restarted the R session, cleaned the workspace, and restarted RStudio. Nothing helps.
Notably, after many attempts to execute the code, it finally ran; however, after 100 iterations it crashed with a message about localhost being unavailable.
Add library(*the package needed/where the function lives*) for each of the packages you're using; a freshly restarted session starts with none of them attached.
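A minimal sketch of that fix for the snippets shown above, assuming the script relies on the parallel and doParallel packages:
library(parallel)     # provides makeCluster()
library(doParallel)   # provides registerDoParallel()
cl <- makeCluster(mc) # mc is the core count from the original script
registerDoParallel(cl)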

"Cannot open the connection" - HPC in R with snow

I'm attempting to run a parallel job in R using snow. I've been able to run extremely similar jobs with no trouble on older versions of R and snow. R package dependencies prevent me from reverting.
What happens: My jobs terminate at the parRapply step, i.e., the first time the nodes have to do anything beyond reporting Sys.info(). The error message reads:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: cannot open the connection
Calls: parRapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors
Specs: R 2.14.0, snow 0.3-8, RedHat Enterprise Linux Client release 5.6. The snow package has been built on the correct version of R.
Details:
The following code appears to execute fine:
cl <- makeCluster(3)
clusterEvalQ(cl,library(deSolve,lib="~/R/library"))
clusterCall(cl,function() Sys.info()[c("nodename","machine")])
I'm an end-user, not a system admin, but I'm desperate for suggestions and insights into what could be going wrong.
This cryptic error appeared because an input file requested during program execution wasn't actually present. Each node would attempt to load the file and fail, but that failure surfaced only as a "cannot open the connection" message.
What this means is that almost anything can cause a "connection" error. Incredibly annoying!
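A quick diagnostic in the same spirit, as a sketch: before calling parRapply, ask every node whether it can actually see the file the program loads ('input.dat' is a hypothetical stand-in for that file).
library(snow)
cl <- makeCluster(3)
# each node reports whether the file is visible from where it runs
clusterCall(cl, function() file.exists("input.dat"))
stopCluster(cl)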
