How can I run Jupyter notebooks, disconnecting from and reconnecting to their kernels, without losing the output of completed cells?
I know this has been asked several times and there are plenty of Q&As and web pages about it, but I could not work out whether what I would like to do is possible and, if it is, what I am doing wrong. Many of those pages and questions date back several years, so maybe such a basic functionality has finally been implemented.
I set up a remote Jupyter server inside screen/tmux and use SSH port forwarding, so I am able to connect to the remote machine using a web browser.
This is the notebook I run:
cell 1:
import time
i = 0
while i < 10:
    i += 1
    print(i, flush=True)
    time.sleep(5)
print("done")
cell 2:
print(i)
When I run the first cell, close the browser, and reconnect to the notebook (Jupyter -> Running -> notebook), it looks as if nothing is running. However, when I run the second cell, I realize the kernel is still executing the first cell: there is no output for a while, and then 10 is printed.
But:
When I reopen the notebook, there is no indication of which cell is currently running;
The output is lost: not only the output produced after disconnecting from the kernel (closing the browser tab), but also all the output produced by the running cell between the last save and the disconnection.
This makes Jupyter completely useless when running (long) notebooks whose output needs to be monitored every once in a while.
My question is this: without using tricks (such as capturing output with a magic, writing the output to external log files, or running the notebook as a Python script), is it possible to configure Jupyter so that I can disconnect from and reconnect to kernels running ipynb files using only the web interface, keeping the output produced while no browser was attached?
Related
I have a remote server with a long-running jupyter notebook process (keras training).
When I reconnect to the notebook (either by reloading the chrome tab or by clicking on the notebook in the list of running notebooks), I get a tab marked as "running" (hourglass icon instead of the book icon), and I can confirm that the process is running by ssh to the server.
However, the progress indicator and the stdout of the running process are lost. The bar
71255/225127 [========>.....................] - ETA: 3:32:43 - loss: 2.1890
is never updated. Other (non-keras) processes lose their stdout (unless I also write to a file).
Is this a known problem?
Is there a way to recover the output stream after reconnecting?
This is a known problem, and as far as I know the IPython/Jupyter team has already announced a collaborative notebook option which, they said, "could solve the problem in the collaboration mode".
Sadly, once the output is lost there is no way to get it back (unless you wrote it to a file or a variable, or you know some intricate tricks that could recover the data).
One way to prevent the issue in advance is to use the cell magic %%capture <VARIABLE NAME> in the cell whose stdout you want to keep while the notebook is closed in your browser (just don't terminate the notebook). The output will be saved in <VARIABLE NAME>, and after the process has finished you can access it with print(<VARIABLE NAME>.stdout).
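A minimal sketch of that approach, assuming the variable name cap is free to use (the loop just stands in for any long-running cell):

%%capture cap
# Everything printed in this cell is stored in `cap` instead of being sent to the browser.
import time
for i in range(10):
    print(i, flush=True)
    time.sleep(5)

Then, in a later cell, once the first cell has finished:

# cap is an IPython CapturedIO object; the captured stdout is available as a string.
print(cap.stdout)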
I am used to using R in RStudio. For a new project, I have to use R on the command line, because the data storage and analysis are only allowed to be on a specific server that I connect to using ssh. This server doesn't have rstudio-server to support remote RStudio sessions.
The project involves an extremely large dataset, and some pre-written code to load/format the data that I have been told to run using "source()" before I do anything else. This takes several minutes to run and load the data each time.
What would a good workflow be for something like this? Editing my code in a .R file, saving it, and then running it would mean waiting several minutes for the data to load each time. But just running R in an interactive session would make it hard to keep track of what I am doing and repeat things if necessary.
Is there some command-line equivalent to RStudio where you can have an interactive session but be editing/saving a file of your code as you go?
Sounds like JuPyteR might be your friend here.
The R kernel works great.
You can use it on a remote server either by exposing an open port (and setting up Jupyter login credentials)
or via port forwarding over SSH.
It is a lot like an interactive REPL, except that it holds state.
And you can go back and rerun cells.
(Of course, state can be dangerous for reproducibility.)
With RStudio you can launch a terminal and SSH into your remote servers even if they don't run the (expensive) RStudio Server platform. You can then send commands from RStudio directly to the SSH session with the default shortcut key. This lets you keep using RStudio, track what you're doing in an R script, and execute it interactively.
I am currently running R on a Microsoft Azure instance (an Ubuntu virtual machine) using RStudio as my IDE, which I connect to simply through my browser. I am trying to run, from within RStudio, some commands that take quite some time to complete, and I figured I could simply close the tab with RStudio open and the process would keep running. However, when I try to reconnect to see how the process is doing, the page keeps loading and I am unable to see RStudio.
I have a few questions regarding running RStudio on a server:
First, am I correct in thinking that I can close my tab and keep the process running?
Second, is it normal behaviour that I am unable to connect to the server while the process is running?
Third, am I going about this the correct way or are there better ways?
Yes, you can close your tab and keep it running.
RStudio Server waits on updates from the R process to update the UI. This means that if you have a long-running computation, your tab may not fully reload until it's finished. You may also have seen this in the middle of a session: when R is busy, you can have problems saving scripts that are open in the editor pane.
Logging out in the middle of a computation should be safe, but be aware that RStudio will save your workspace and shut R down after a period of inactivity. It then reloads everything when you log back in. But this only extends to objects in memory; if you have any files saved in your temp directory, they'll have disappeared when you come back. They're probably still on the disk, but since your new R session has a new temp directory, you'll have to do a manual search for them.
I love notebooks. I love them so much that I have many of them running at the same time, often in different browsers, sometimes on different remote clients. But I miss one feature: when I close the tab of a running notebook, it warns me that the run will be stopped.
My question:
How do I make a Jupyter notebook continue its run even if the page is closed?
such that I can:
re-open the tab in another browser (possibly on a remote computer such as a tablet),
restart the browser when I need to,
close those with long running time for later inspection.
From what I understand, the client-server architecture could make that possible, but there may be issues with multiple concurrent runs...
PS: I created an issue on GitHub
In fact, this was answered in the github issue:
takluyver commented on 26 Apr 2017: Anything already running in the notebook will keep running, and the kernel it started for that will stay running - so it won't lose your variables. However, any output produced while the notebook isn't open in a browser tab is lost; there isn't an easy way to change this until we have the notebook server able to track the document state, which has been on the plan for ages.
Thanks!
I am using the latest (2.2.0) IPython notebook. When I create a notebook with a loop that writes many lines (about 20,000), it seems to run forever: I always see the running icon at the top right. Even if I restart the computer and reopen the notebook, it goes into running mode automatically and I am almost unable to do anything on the page. I have to copy the code into a new notebook to work around it.
How can I fix this hang when opening such a large notebook? I have tried the kernel "Interrupt" and "Restart" menu entries and they seem to have no effect at all.
The IPython notebook is not intended for tasks involving heavy computation or very large amounts of output; such work belongs in a standalone program rather than a notebook.
To avoid the issue, put the heavy work in a standalone script, run it from the console, and then paste only the meaningful results into the IPython notebook.
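A minimal sketch of that pattern, using the hypothetical file names heavy_job.py and results.txt: the heavy loop runs as a plain script that writes to disk, and the notebook only reads back a short summary.

# heavy_job.py -- run from the console with: python heavy_job.py
with open("results.txt", "w") as f:
    for i in range(20000):
        f.write(f"line {i}\n")  # the bulk output goes to a file, not to notebook cells

Then, in a notebook cell, load just what you need:

# Read the results file and show only a short summary in the notebook.
with open("results.txt") as f:
    lines = f.readlines()
print(f"{len(lines)} lines produced; last line: {lines[-1].strip()}")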