👋
I have a long-running script (a few days) that I run in two environments (staging + prod). Each execution produces about 3 MB of log output, printed to the screen over time.
I'm running these in tmux, splitting my pane so that one half of the screen is staging and the other half is prod. I ran the scripts and checked in every morning, until suddenly only one split was left. I checked the logs that I had tee'd to a file, and the script that was running in the pane that disappeared had indeed finished.
I first thought that I had accidentally closed it the previous evening, but I then repeated the process twice and the exact same thing happened every time. I'm now asking for help to root-cause this :)
Thoughts I had so far:
Does tmux close panes if the process running in them takes very long (days)?
Could the log contain escape sequences that close the pane?
Could the pane be crashing because too much log output gets printed at one point in time?
Note that this script is doing no "active" tmux interaction or anything like that, so it is not calling any tmux commands.
Was the script the only thing running in the pane or did you run it from a shell? If the former, or if you run it from the shell with "exec", then the pane will close when the script finishes.
Other possibilities are that something killed the parent shell, or that the shell has TMOUT or similar set. Or of course you closed it accidentally, or another script did (perhaps by mixing up pane IDs or something).
Try setting the remain-on-exit option on the window; tmux will then not close panes when the program running in them exits, and it will tell you whether the program exited normally or on a signal.
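For example, a minimal sketch (run inside the session, or put the option in ~/.tmux.conf with -g to make it global):
tmux set-option -w remain-on-exit on
With this set, a pane whose program has exited stays on screen as a dead pane; you can dismiss it with tmux kill-pane or restart the program with tmux respawn-pane.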
I have a desktop entry that launches a Shiny app.
Here's my desktop entry:
[Desktop Entry]
Type=Application
Comment=App
Name=App
Exec=/usr/bin/Rscript -e "shiny::runApp('~/raspberry_IP/app.R', launch.browser=TRUE)"
Icon=/home/path/to/logo/rpi_logo.png
Terminal=false
Categories=Utility
StartupNotify=true
This runs well and the Shiny application works as expected. However, when I close my browser tab, I see the process is still running.
I am running this application on Ubuntu 20.04 LTS.
If I change Terminal=false to Terminal=true, I see that the output of my terminal freezes (the app no longer updates, hence no diagnostic printing), but the terminal is still active (which makes sense given the process is still running). I can Ctrl+C out of it to kill the app from my terminal, but that's not the desired user behavior.
I see some strategies to stop the app using stopApp(), like here. This strategy uses a separate action button to kill the app, so the user has to "manually" stop it. This might work, but I believe it's natural for the user to just close the tab or the browser (and there's no way for them to know the process is still running, unless they check).
Using this on the server side does kill the process, but it's recommended against (see here):
session$onSessionEnded(stopApp)
In this case, because my app is running on a local machine, I think I could potentially get away with this (there is only one user at a time), but I was wondering whether there's a better practice to implement.
I have a remote server with a long-running jupyter notebook process (keras training).
When I reconnect to the notebook (either by reloading the chrome tab or by clicking on the notebook in the list of running notebooks), I get a tab marked as "running" (hourglass icon instead of the book icon), and I can confirm that the process is running by ssh to the server.
However, the progress indicator and stdout of the running process are lost. The bar
71255/225127 [========>.....................] - ETA: 3:32:43 - loss: 2.1890
is never updated. Other (non-keras) processes lose their stdout (unless I also write to a file).
Is this a known problem?
Is there a way to recover the output stream after reconnecting?
This is a known problem, and as far as I know IPython has already published a collaborative Jupyter notebook option that they said "could solve the problem in the collaboration mode".
Sadly, if the output is lost, there's no way to get it back (unless you wrote it to a file or a variable, or you know some intricate tricks that can recover the data).
One way to prevent the issue in advance is to use the magic command
%%capture <VARIABLE NAME> in the cell whose stdout you want to keep when you close the notebook in your browser (don't terminate the notebook). This way the output is saved in <VARIABLE NAME>, and after the process finishes you can access it with print(<VARIABLE NAME>.stdout).
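A minimal sketch (the training call and the name cap are assumptions; %%capture must be the first line of the cell):
%%capture cap
model.fit(x_train, y_train, epochs=10)  # long-running call whose stdout you want to keep
Then, in another cell once the run has finished:
print(cap.stdout)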
I'm running a long R script, which takes 2 or 3 days to finish. I accidentally ran another script which, if R works as it usually does, will go into some queue, and R will run it as soon as the first script is over. I need to stop that, as it would compromise the results of the first script. Is there a visible queue, or any other way to stop R from running that code?
I'm working in an interactive session in RStudio, on Windows 10.
Thanks a lot for any help!
Assuming you're running in the console (or an interactive session in RStudio; that's unclear from your question), and that what you did was source a script/paste code and, while it was running, paste another chunk of code:
What is going on is that you pushed data into the R process's input stream. It's a buffered input, so each line will run once the previous line's call has ended and freed the process.
There's no easy way to play with an input buffer; that's R's internal input/output system, and for now it's mostly the operating system that holds this information in cache.
Asking R itself is not possible, as it already has this buffer to read; any new command would be queued after it.
Last-chance option: if you can spot your other chunk of code starting in the console, you can try pressing the Esc key to stop the running code.
You may try messing with the process buffers with procexp, but there's a fair chance you'll just make your R session segfault anyway.
To avoid this in the future, use scripts and run them separately on the command line with Rscript (present in R's bin directory under Windows too, despite the link pointing to a Linux manpage).
This creates one session per script and lets you kill them independently. That said, if they both write to the same place (a database, or a file, which would raise an error if accessed by two processes), that won't prevent data corruption.
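For example, a minimal sketch (the script names and log files are assumptions):
Rscript long_job.R > long_job.log 2>&1
Rscript other_job.R > other_job.log 2>&1
Each invocation gets its own R process, so one can be killed without touching the other.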
I am guessing the OP has the problem below:
# my big code, running for a long time
Sys.sleep(10); print("hello 1")
# other big code I dropped in console while R was still busy with above code
print("hello 2")
If this is the case, I don't think it is possible to stop the second chunk of code from running.
I don't know what the merit of a terminal multiplexer like screen or tmux is, compared to the combination of a standard terminal application and the job control features of a shell.
The typically cited good features of a terminal multiplexer are:
persistence
multiple windows
session sharing
The first two features, however, can be achieved with a terminal application like iTerm2 and the job control features of a shell like bash.
Session sharing is a novel feature, but it seems to be needed only in quite rare situations.
What is the merit of a terminal multiplexer? Why do you use it?
I'm especially interested in its merits for daily tasks.
I can tell you from my perspective as a devops/developer.
Almost every day I have to deploy a bunch of apps (a particular version) on multiple servers. Handling that without something like Terminator or Tmux would be a real pain.
In a single window I can put something like 4 panes (four windows in one) and monitor stuff on 4 different servers... which by itself is a huge deal... without tabs or other terminal instances and whatnot...
In the first pane I can shut down nginx, in the second I can shut down all the processes with supervisord (a process manager), and in the third pane I can run the deploy process... if I quickly need to jump to some other server, I just use the fourth pane...
Some colleagues who only use a bunch of terminal instances can get really confused when they have to do a bunch of things quickly, constantly ssh-ing in and out... and if they are not careful they can end up on the wrong server, because they switched to the wrong terminal instance and entered a command that wasn't meant for that particular server :)...
A terminal multiplexer like Tmux really does help me to be quick and accurate.
There is a package manager for Tmux, which lets you install plugins and really supercharge your terminal even more!
On a different note, a lot of people use Tmux in combination with Vim... which lets you create some awesome things together...
All in all, those were my two cents on the benefit of using a terminal multiplexer...
I have been using Jupyter Notebook for a while. Often when I try to stop a cell execution, interrupting the kernel does not work. In this case, what else can I do, other than just closing the notebook and relaunching it again? I guess this might be a common situation for many people.
Currently this is an open issue in the IPython repository on GitHub as well:
https://github.com/ipython/ipython/issues/3400
There seems to be no exact solution for it, except killing the kernel.
If you're ok with losing all currently defined variables, then going to Kernel > Restart will stop execution without closing the notebook.
This worked for me:
- Put the laptop to sleep (one of the power options)
- Wait 10 s
- Wake up computer (with power button)
The kernel then says it is reconnecting, and either the execution is interrupted or you can press interrupt yourself.
It probably isn't foolproof, but it's worth a try so you don't waste the computation time already spent.
(I had Windows 10 running a Jupyter Notebook that wouldn't stop running a piece of Selenium code)
There are a few options here:
Change the folder name of data:
Works if the cell is already running and pulling data from a particular folder. For example, I had a for loop that, when interrupted this way, just moved on to the next item in the list it was processing.
Change the code in the cell to generate an error:
Works if the cell has not run yet but is just queued.
Restart Kernel:
If all else fails
Recently I also faced a similar issue.
I found out that there is an issue in IPython (https://github.com/ipython/ipython/issues/3400); it had been open for some 6 years and was resolved as of 1 March 2020.
One thing that might work is hitting interrupt a bunch of times. It's possible that a library you are using catches the interrupt signal and only stops after receiving the signal multiple times.
For example, when using sklearn's cross_val_score() I found that I have to interrupt once for each cross validation fold.
If you know in advance that you might want to stop without losing all your variables, the following solution might be useful:
In cells that take a while because of long loops, you may implement something like this in the loop:
if os.path.exists(os.path.join(os.getcwd(),'stop_true.txt')):
break
Then if you want to stop, just create the file 'stop_true.txt', and the loop stops before the next round.
Usually, the file is called 'stop_false.txt' until I rename it to stop the loop.
Additionally, the results of each iteration are stored separately in a dictionary. Therefore I'm able to keep all results up to the break and can restart the loop from that point onwards.
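A minimal sketch of the pattern (items and expensive_step are assumptions standing in for your own loop and work):
import os

results = {}  # everything computed before the break is kept here
for i, item in enumerate(items):       # items: whatever your loop iterates over
    results[i] = expensive_step(item)  # hypothetical long-running step
    # rename stop_false.txt to stop_true.txt to stop before the next round
    if os.path.exists(os.path.join(os.getcwd(), 'stop_true.txt')):
        break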
If the IPython kernel did not die, you might be able to inject Python code into it that saves important data, using pyrasite. You need to install and run pyrasite as root, i.e. with sudo python -m pip install pyrasite (or python3 as needed). Then you need to figure out the process id (PID) of the IPython kernel (e.g. via htop or ps aux | grep ipython), say 3873. Then write a script, say inject.py, that saves the state, for example to a pickle; suppose it is a Pandas dataframe df in the global scope:
df.to_pickle("rescued_df.pkl")
Finally, inject it into the process as follows:
sudo pyrasite 3873 inject.py
You may need to allow ptrace first, like so:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
For me, setting a time limit worked: https://github.com/scipopt/PySCIPOpt/issues/197. Specifically, I added model.setRealParam("limits/time", 60) and it automatically stops the calculation after 60 seconds. You can set any time instead of 60. But this is for the pyscipopt package (solving an optimization model); I am not sure how to set a time limit for your specific problem.
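A minimal sketch with PySCIPOpt (the variables and constraints are left out; building the model is up to you):
from pyscipopt import Model

model = Model()
# ... add variables and constraints here ...
model.setRealParam("limits/time", 60)  # stop the solver after 60 seconds
model.optimize()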
Try this:
Close the browser tab in which Jupyter is running
Run jupyter server list
Kill each running server with jupyter server stop <PORT>
You can force the termination by deleting the cell. I copy the code, delete the cell, create a new cell, paste, and execute again. Works like a charm.
I suggest restarting the kernel (Kernel -> Restart Kernel), as suggested by @hamdog.
It will be ready to use after that. However, it will certainly delete all variables stored in memory.