Is it possible to pause an R script that is running?

I am running some analysis in R which is going to take at least 24 hours to finish. Is it possible to pause the function midway, so that I can take my computer to work and back?

This is not possible AFAIK, but I believe you can just suspend your computer, and the processes will automatically be paused.
If you are using Linux, you can also stop and continue a process manually using the killall -STOP R and killall -CONT R commands. Take a look at this article and its comment section, which contain useful information on this.
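A minimal sketch of that pause/resume cycle (this assumes R is the only matching process name; with several R processes, target a specific PID with kill instead):
killall -STOP R    # freeze all running R processes (SIGSTOP)
# ...move the machine, keeping it powered on or merely suspended...
killall -CONT R    # resume them right where they left off (SIGCONT)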
On Windows, you may be able to use the Task Manager or install special software capable of doing that, but I really do not know, as I do not use Windows on a regular basis.
EDIT: even if you use kill or killall to pause the process, you will still lose the data if you shut down the computer.

Related

Best way to execute code on remote machine

I am looking for the best way to execute code on a distant machine. Ideally, I am looking for a solution such as CUDA, which lets you allocate execution to the GPU or CPU, but across distinct machines.
I have tried several ways to do that:
I connect to my machines with ssh, export my script, and execute it. No particular issue, but not very handy; maybe this solution could be optimised, since I open my ssh connection with the terminal or Termius.
I tried another way with mosh: same outcome, but quicker.
Currently, I am working on a Spyder kernel to have a direct link to the place of execution.
I've seen there is also a possibility with nohup, but I still have to work on this solution to understand its possibilities well.
Everything works well, but I am looking for a more convenient solution.
Thank you in advance for your answers!
You could use sshfs together with ssh to mount the remote filesystem on your machine; it is easier than always copying the code by hand. If you do, I would recommend using screen or something like it, so that a dropped connection causes no problems.
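A rough sketch of that workflow (user, host and paths are placeholders, and run.py is a hypothetical script):
sshfs user@remote:/home/user/project ~/remote-project   # edit remote files as if they were local
ssh user@remote
screen -S run                 # named screen session on the remote machine
python project/run.py
# Ctrl-a d detaches; screen -r run reattaches after a dropped connection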
Personally, I like to work with Visual Studio Code and the ssh fs extension for this purpose.
Another alternative is to work with X2Go. X2Go enables you to access a graphical desktop of a remote computer over a low-bandwidth (or high-bandwidth) connection.

What is the merit of a terminal multiplexer compared to a standard terminal app and job control?

I don't know what the merit of a terminal multiplexer like screen or tmux is compared to the combination of a standard terminal application and the job-control feature of a shell.
The typically cited good features of a terminal multiplexer are:
persistence
multiple windows
session sharing
The first two features are, however, achieved with a terminal application like iTerm2 and the job-control feature of a shell like bash.
Session sharing is a novel feature, but it seems to be required only in quite rare situations.
What is the merit of a terminal multiplexer? Why do you use it?
I'm especially interested in its merit for daily tasks.
I can tell you from my perspective as a devops/developer.
Almost every day I have to deploy a bunch of apps (a particular version) on multiple servers. Handling that without something like Terminator or Tmux would be a real pain.
In a single window I can put something like 4 panes (four windows in one) and monitor stuff on 4 different servers...which by itself is a huge deal...without tabs or other terminal instances and whatnot....
In the first pane I can shut down nginx, in the second I can shut down all the processes with supervisord (a process manager), and in the third pane I can run the deploy process...if I quickly need to jump to some other server, I just use the fourth pane...
Colleagues that only use a bunch of terminal instances can get really confused when they have to do a bunch of things quickly, constantly ssh-ing in and out...and if they are not careful, they can end up on the wrong server because they switched to the wrong terminal instance and entered a command that wasn't meant for that particular server :)...
A terminal multiplexer like Tmux really does help me to be quick and accurate.
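As a rough illustration of that pane workflow (the session name deploy is just an example):
tmux new-session -s deploy    # start a named session
tmux split-window -h          # second pane, side by side
tmux split-window -v          # third pane
tmux select-layout tiled      # arrange the panes into an even grid
# Ctrl-b + arrow keys switches panes; Ctrl-b d detaches, and
# tmux attach -t deploy picks the session up again later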
There is a package manager for Tmux (tpm), which lets you install plugins and really supercharge your terminal even more!
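For reference, a typical tpm setup in ~/.tmux.conf looks roughly like this (the second plugin is just an example):
set -g @plugin 'tmux-plugins/tpm'
set -g @plugin 'tmux-plugins/tmux-resurrect'   # e.g. save/restore sessions
run '~/.tmux/plugins/tpm/tpm'                  # keep this line at the very bottom
# inside tmux, prefix + I then fetches and installs the listed plugins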
On a different note, a lot of people use Tmux in combination with Vim...which lets you build some awesome setups together...
All in all, those were my two cents on the benefit of using a terminal multiplexer...

How to stop the running cell if interrupting the kernel does not work in Jupyter Notebook

I have been using Jupyter Notebook for a while. Often when I try to stop a cell execution, interrupting the kernel does not work. In this case, what else can I do, other than just closing the notebook and relaunching it again? I guess this might be a common situation for many people.
Currently this is an open issue in the Jupyter GitHub repository as well:
https://github.com/ipython/ipython/issues/3400
There seems to be no exact solution other than killing the kernel.
If you're ok with losing all currently defined variables, then going to Kernel > Restart will stop execution without closing the notebook.
This worked for me:
- Put the laptop to sleep (one of the power options)
- Wait 10 s
- Wake up computer (with power button)
The kernel then says reconnecting, and it is either interrupted already or you can press interrupt.
This probably isn't foolproof, but it's worth a try so you don't lose previous computation time.
(I had Windows 10 running a Jupyter Notebook that wouldn't stop running a piece of Selenium code)
There are a few options here:
Change the folder name of the data:
This works if the cell is already running and pulling data from a particular folder. For example, I had a for loop that, when interrupted this way, just moved on to the next item in the list it was processing.
Change the code in the cell to generate an error:
This works if the cell has not been run yet but is just queued.
Restart the kernel:
If all else fails.
Recently I also faced a similar issue.
It turned out to be a known issue in IPython, https://github.com/ipython/ipython/issues/3400, that had been open for some six years; it was resolved as of 1 March 2020.
One thing that might work is hitting interrupt a bunch of times. It's possible that a library you are using catches the interrupt signal and only stops after receiving the signal multiple times.
For example, when using sklearn's cross_val_score() I found that I have to interrupt once for each cross validation fold.
If you know in advance that you might want to stop without losing all your variables, the following solution might be useful:
In cells that take a while because of long loops, you may implement something like this in the loop:
if os.path.exists(os.path.join(os.getcwd(), 'stop_true.txt')):  # requires import os at the top
    break
Then, if you want to stop, just create the file 'stop_true.txt' and the loop stops before the next round.
Usually, the file is called 'stop_false.txt' until I rename it to stop the loop.
Additionally, the results of each iteration are stored separately in a dictionary. Therefore I'm able to keep all results up to the break and can restart the loop from that point onwards.
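With that naming convention, stopping the loop from a terminal next to the notebook is just a rename:
mv stop_false.txt stop_true.txt    # the loop exits at its next iteration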
If the IPython kernel did not die, you might be able to inject Python code into it that saves important data, using pyrasite. You need to install and run pyrasite as root, i.e. with sudo python -m pip install pyrasite (or python3 as needed). Then figure out the process id (PID) of the IPython kernel (e.g. via htop or ps aux | grep ipython), say 3873. Then write a script, say inject.py, that saves the state, for example pickling a Pandas dataframe df from the global scope:
df.to_pickle("rescued_df.pkl")
Finally, inject it into the process as follows:
sudo pyrasite 3873 inject.py
You may need to relax the kernel's ptrace restrictions first, like so:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
For me, setting a time limit worked: https://github.com/scipopt/PySCIPOpt/issues/197. Specifically, I added model.setRealParam("limits/time", 60), which automatically stops the calculation after 60 seconds; you can set any time instead of 60. But this is for the pyscipopt package (solving an optimization model), and I am not sure how to set a time limit for your specific problem.
Try this:
Close the browser tab in which Jupyter is running
Run jupyter server list
Kill each running server with jupyter server stop <PORT>
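On older installations that still run the classic Notebook server rather than the newer Jupyter Server, the equivalents should be:
jupyter notebook list
jupyter notebook stop <PORT>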
You can force the termination by deleting the cell. I copy the code, delete the cell, create a new cell, paste the code back in, and execute it again. Works like a charm.
I suggest restarting the kernel (Kernel -> Restart Kernel), as suggested by @hamdog.
It will be ready to use after that. However, it will certainly delete all variables stored in memory.

Is it possible to run R as a daemon?

I have a script in R that is frequently called during the day (by other scripts). I call R in a terminal using
Rscript code.R
I notice it takes a lot of time to load packages and set up R.
Is it possible to run R as a background service which I hit using a port or something?
Yes, look into Rserve, which has been available for over a dozen years for exactly this reason. There are a couple of fairly high-profile applications, too.
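A minimal sketch of how that could look (this assumes the Rserve package is installed; 6311 is Rserve's default port):
R CMD Rserve --vanilla    # start the daemon once; it keeps R warm between calls
# clients then connect instead of paying the Rscript startup cost each time,
# e.g. from R with the RSclient package:
#   con <- RSclient::RS.connect(port = 6311)
#   RSclient::RS.eval(con, sum(1:10))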
You can also check out this add-in for RStudio; it is not a port-like solution, but maybe it can help you: https://github.com/bnosac/taskscheduleR

When running R, how to exit from Emacs-ESS gracefully?

Sometimes, right after I submit a computation-intensive run to R in ESS, my whole screen freezes. When that happens, none of the Emacs commands work (I use a laptop running XP). My crude solution is to press Control-Alt-Del, go to the Task Manager, and kill the R process, not the Emacs process (I actually use Process Explorer). Once I kill the R process, I get the Emacs buffers back but lose the R session. I can then do Meta-R and start again.
Does anyone know of a more graceful way to exit/abort from R within ESS?
C-g (Control-G) will get you control of Emacs again. Then C-c C-c (Control-C twice) will interrupt R and probably get your prompt back.
