makeCluster function in R snow hangs indefinitely

I am using the makeCluster function from the R package snow on a Linux machine to start a SOCK cluster on a remote Linux machine. Everything seems in place for the two machines to communicate successfully (I am able to establish ssh connections between the two). But:
makeCluster("192.168.128.24", type="SOCK")
never returns a result; it just hangs indefinitely.
What am I doing wrong?
Thanks a lot

Unfortunately, there are a lot of things that can go wrong when creating a snow (or parallel) cluster object, and the most common failure mode is to hang indefinitely. The problem is that makeSOCKcluster launches the cluster workers one by one, and each worker (if successfully started) must make a socket connection back to the master before the master proceeds to launch the next worker. If any of the workers fail to connect back to the master, makeSOCKcluster will hang without any error message. The worker may issue an error message, but by default any error message is redirected to /dev/null.
In addition to ssh problems, makeSOCKcluster could hang because:
R not installed on a worker machine
snow not installed on the worker machine
R or snow not installed in the same location as on the local machine
current user doesn't exist on a worker machine
networking problem
firewall problem
and there are many more possibilities.
In other words, no one can diagnose this problem without further information, so you have to do some troubleshooting in order to get that information.
In my experience, the single most useful troubleshooting technique is manual mode which you enable by specifying manual=TRUE when creating the cluster object. It's also a good idea to set outfile="" so that error messages from the workers aren't redirected to /dev/null:
cl <- makeSOCKcluster("192.168.128.24", manual=TRUE, outfile="")
makeSOCKcluster will display an Rscript command to execute in a terminal on the specified machine, and then it will wait for you to execute that command. In other words, makeSOCKcluster will hang until you manually start the worker on host 192.168.128.24, in your case. Remember that this is a troubleshooting technique, not a solution to the problem, and the hope is to get more information about why the workers aren't starting by trying to start them manually.
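For reference, the displayed command looks roughly like the following (a sketch only, not verbatim output: the exact Rscript path, library paths, hostname, and port all depend on your installation and snow version, and the values shown here are made up):
Rscript /usr/lib64/R/library/snow/RSOCKnode.R MASTER=master-host PORT=10187 OUT=/dev/null SNOWLIB=/usr/lib64/R/library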
Obviously, the use of manual mode bypasses any ssh issues (since you're not using ssh), so if you can create a SOCK cluster successfully in manual mode, then probably ssh is your problem. If the Rscript command isn't found, then either R isn't installed, or it's installed in a different location. But hopefully you'll get some error message that will lead you to the solution.
If makeSOCKcluster still just hangs after you've executed the specified Rscript command on the specified machine, then you probably have a networking or firewall issue.
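To rule out basic TCP connectivity independently of snow, you can run a hand-rolled socket test from R. This is a minimal sketch: the master address 192.168.128.23 is an assumption (substitute your master's real address), and port 10187 is snow's usual default, which you can verify with snow::getClusterOption("port").
# On the master: open a listening socket (this call blocks until a client connects)
srv <- socketConnection(port=10187, server=TRUE, blocking=TRUE, open="a+b")
close(srv)
# On the worker (192.168.128.24), in a second R session: try to reach the master
con <- socketConnection("192.168.128.23", port=10187, blocking=TRUE, open="a+b", timeout=10)
close(con)  # if this succeeds, basic connectivity and the firewall are not the problem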
For more troubleshooting advice, see my answer for making cluster in doParallel / snowfall hangs.

Related

RStudio Server browser freezes upon login

I have been working in my RStudio session hosted on a Linux server and recently ran a piece of code that was taking way too long to execute, so I decided to kill it.
Here is the sequence of steps that I took - none of them helped me restore the health of my session.
1) Hit the stop button in RStudio and waited patiently.
2) SSH'd into my Linux server and ran the following command to kill all the processes running under my userid:
killall -u myuserid
3) Removed the .RData, .Renviron, .Rhistory files from my workspace.
4) Ran the following R command via the Linux server for garbage collection:
gc(reset=TRUE)
5) Restarted the entire Linux server.
I am running out of ideas and would really appreciate any other suggestions before I take more drastic steps like revoking access and granting it again (not sure if that would be the right fix).
Note: The browser window freezes every time I login, and it happens only for my R studio session, the rest of the users in the same network have no issues.
I solved this RStudio Server freezing problem. I think it was a network problem, since I couldn't get any response from the call to "~~~~~~.cache.js". You can spot this case by watching "~~~~~~~~~.cache.js" receive no response (for example, in your browser's developer tools) before you click the log-in button.
Anyway, here is my way.
Reset your network with the following commands, entered in a cmd terminal in administrator mode:
netsh winsock reset
netsh int ip reset
Reboot.
The IP information may be erased, so if you are using a fixed IP address, re-enter your previous IP settings.
That's all. Following these steps may recover the connection.

What is the status of long-running remote R sessions in ESS/Emacs?

I routinely run R remotely and have had great success with RStudio Server for that. However, Emacs/ESS is still preferable in many cases, particularly since I often work on multiple projects simultaneously. What is the state of the art for running ESS/R in Emacs when the expectation is that the connection will be broken? To be more concrete, I'd love to run a tmux session in Emacs so that I can connect to a long-running R session running in tmux (or screen). What is the status of ESS/Emacs support for such a scenario? This seems to be changing over time and I haven't found the "definitive" approach (perhaps there isn't one).
I do that all the time. At both home, and work.
Key components:
Start emacs in daemon mode: emacs --daemon &. Now emacs is long-running and persistent as it is disconnected from the front-end.
Connect using emacsclient -nw in text mode using tmux (or in my case, the byobu wrapper around tmux). As tmux persists, I can connect, disconnect, reconnect,... at will while having several tabs, split panes, ... from byobu/tmux.
When nearby -- on home desktop connecting to home server, or at work with several servers -- connect via emacsclient -c. Now I have the standard X11 goodness, plotting etc pp. That is my default 'working' mode.
But because each emacs session has an R session (or actually several, particularly at work) I can actually get to them as I can ssh into the tmux/byobu session too.
Another nice feature is tramp-mode allowing you to edit a remote file (possibly used by a remote R session) in a local Emacs buffer as tramp wraps around ssh and scp making the remote file appear local.
Last but not least mosh is very nice on the (Ubuntu) laptop as it automagically resumes sessions when I am back on the local network at home or work. In my case mosh from Debian/Ubuntu on server and client; may also work for you OS X folks.
In short, it works like a dream, but it may require the extra step of "disconnecting" emacs from the particular tmux shell in which you launched it. Daemon mode is key. Some of these sessions run on for weeks.
I started working like this maybe half a decade ago. Possibly longer. But using ESS to connect to a remote Emacs session is much older -- I think the ESS manual already had entries for it when I first saw it in the late 1990s.
But I find this easier as it gives me "the whole emacs" including whatever other buffers and sessions I may need.
Edit: And just to be plain, I also use RStudio (Server) at home and work, but generally spend more time in Emacs for all the usual reasons.
More Edits: In follow-up to #kjhealy, I added that I am also a fan of both tramp-mode (edit remote files locally in Emacs thanks to the magic that is ssh and scp) and mosh (sessions that magically resume when I get to work or back home).

How to set up cluster slave nodes (on Windows)

I need to run thousands* of models on 15 machines (each with 4 cores), all Windows. I started to learn the parallel, snow and snowfall packages and read a bunch of intros, but they mainly focus on the setup of the master. There is only a little information on how to set up the worker (slave) nodes on Windows, and that information is often contradictory: some say that a SOCK cluster is practically the easiest way to go, others claim that SOCK cluster setup is complicated on Windows (sshd setup) and that the best way to go is MPI.
So, what is an easiest way to install slave nodes on Windows? MPI, PVM, SOCK or NWS? My, possibly naive ideas were (listed by priority):
To use all 4 cores on the slave nodes (required).
Ideally, I need only R with some packages and a slave R script or R function that would listen on some port and wait for tasks from master.
Ideally, nodes can be added/removed dynamically from the cluster.
Ideally, the slaves would connect to the master - so I wouldn't have to list all the slaves IP's in configuration of the master.
Only 1 is 100% required; 2-4 "would be good". Is that too naive a request?
I am sorry, but I have not been able to figure this out from the available docs and tutorials. I would be grateful if you could point me to the right source.
* Note that each of those thousands of models will take at least 7 minutes, so there won't be a big communication overhead.
It's a shame that all these APIs (like parallel/snow/snowfall) are so complex to work with - lots of docs, but not what you need... I have found an API which is very simple and goes straight to the ideas I sketched!! It is redis and the doRedis R package (as recommended here). And finally, a very simple tutorial exists! I just modified it a bit and got this:
The workers need only R, doRedis package and this script:
require(doRedis)
redisWorker('jobs', '10.0.0.7') # IP of the server
The master needs redis server running (installed the experimental windows binaries for Windows), and this R code:
require(doRedis)
registerDoRedis('jobs')
foreach(j=1:10, .combine=sum, .multicombine=TRUE) %dopar% {
  ... # whatever you need to run
}
removeQueue('jobs')
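For instance, a complete toy run on the master could look like the following (the loop body is invented purely for illustration; substitute whatever work you actually need):
require(doRedis)
registerDoRedis('jobs')  # assumes the redis server is running and reachable
ans <- foreach(j=1:10, .combine=sum, .multicombine=TRUE) %dopar% {
  sum(rnorm(1e6, mean=j))  # stand-in for a real model fit
}
removeQueue('jobs')
ans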
Adding/removing workers is fully dynamic, there is no need to specify IPs on the master, you get automatic "load balancing", it is simple, and there is no need for tons of docs! This solution fulfills all the requirements and even more - as stated in ?registerDoRedis:
The doRedis parallel back end tolerates faults among the worker processes and automatically resubmits failed tasks.
I don't know how complex this would be using parallel/snow/snowfall with SOCK/MPI/PVM/NWS, or whether it would be possible at all, but I guess it would be very complex...
The only disadvantages of using redis I found:
It is a database server. I wonder if this API exists somewhere without the need to install the database server, which I don't need at all. I guess it must exist!
There is a bug in the current doRedis package ("object '.doRedisGlobals' not found") with no solution yet and I am not able to install the old working doRedis 1.0.5 package into R 3.0.1.

Process stops getting network data

We have a process (written in C++/managed) which receives network data via TCP/IP.
After running the process for a while, while tracking network load, it seems that the network gets into a frozen state and the process stops getting data; other processes in the system that use the network (same NIC) operate normally.
The process gets out of this frozen state by itself after several minutes.
Any idea what is happening?
Is there any counter I can track to see if my process is reaching some limitation?
It is going to be very difficult to answer specifically:
-- without knowing what exactly your process/application is about,
-- whether it is a network chat application, or a file server/client, or ...,
-- without other details about how your process is implemented and what libraries it uses, if relevant to the problem.
Also, since you haven't mentioned what OS and environment you are running this process under, there is very little anyone can do to help. It could be anything: a busy-wait loop in your code, locking problems if it's multi-threaded code, ...
Nonetheless, here are some options to check.
If it's Linux, try the commands below to debug and monitor the behaviour of the process and see what the problem could be:
top
Check top to see how much CPU and memory your process is using and whether any of the values are abnormally high.
pstack
This should show the stack frames of the process executing at the time of the problem.
netstat
Run this with the necessary options (tcp/udp) to check the state of the network sockets opened by your process.
gcore -s -c
This forces your process to dump core when the mentioned problem happens; you can then analyze that core file using gdb:
gdb
and then use the command where at the gdb prompt to get a full backtrace of the process (which function it was executing last, and the previous function calls).

Amazon EC2 / RStudio: Is there a way to run a job without maintaining a connection?

I have a long-running job that I'd like to run using EC2 + RStudio. I set up the EC2 instance and then set up RStudio as a page in my web browser. I need to physically move the laptop that I use to set up the connection and run the web browser throughout the course of the day, and my job gets terminated in RStudio even though the instance is still running on the EC2 dashboard.
Is there a way to keep a job running without maintaining an active connection?
Does it have to be started / controlled via RStudio?
If you make your task a "normal" R script, executed via Rscript or littler, then you can run it from the shell ... and get to
use old-school tools like nohup, batch or at to control running in the background (see the sketch after this list)
use tools like screen, tmux or byobu to maintain one or multiple sessions in which you launch the jobs, and connect / disconnect / reconnect at leisure.
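A minimal sketch of that workflow (the script name and job body here are made up for illustration):
# myjob.R - a hypothetical long-running job
res <- lapply(1:100, function(i) mean(rnorm(1e7)))  # stand-in for the real work
saveRDS(res, "results.rds")                         # persist results to disk
# Launched from the shell, detached from the terminal, e.g.:
#   nohup Rscript myjob.R > myjob.log 2>&1 &
# You can then disconnect, and later reconnect to inspect myjob.log / results.rds.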
RStudio Server works in similar ways but AFAICT limits you to a single session per user / machine -- which makes perfect sense for interactive work but is limiting if you need multiple sessions.
FWIW, I like byobu with tmux a lot for this.
My original concern that it needed to maintain a live connection was incorrect. It turns out the error was from running out of memory; it just coincided with being disconnected from the internet.
An instance is started from the AWS dashboard and stopped or terminated from there as well. As long as it is still running, it can be accessed from an RStudio tab by copying the public DNS into the address bar of the web page and logging in again.
