Jupyter: child processes of "%%bash" - jupyter-notebook

How does jupyter execute %%bash cells? I ran the following experiment:
Run this cell:
%%bash
{
while true; do
echo test;
sleep 1;
done | tee test.txt
} &
It outputs the "test" every second.
After you kill the cell and go to run a different cell, the different cell will still output "test" once every second.
It's clear that jupyter doesn't clean up bash child processes. But can you give a high-level overview of how `%%bash`` is implemented and in what way the bash execution is separated from the python execution?

Related

Parallelize a bash script and wait for each loop to finish

I'm trying to write a script, that we call pippo.R. pippo.R aim, is to run another script (for.sh) in a for loop with a parallelization using two values :
nPerm= total number of times the script has to be run
permAtTime= number of script that can run at the same time.
A very important thing to do, is to wait for each loop to be concluded, thats why I added a file in which all the PID are stored and then I use the wait function to wait for each of them. The main problem of this script is the following error :
./wait.sh: line 2: wait: pid 836844 is not a child of this shell
For reproducibility sake you can put in a folder the following files :
pippo.R
nPerm=10
permAtTime=2
cycles=nPerm/permAtTime
for(i in 1:cycles){
d=1
system(paste("./for.sh ", i," ",permAtTime,sep=""))
}
for.sh
#!/bin/bash
for X in $(seq $1)
do
nohup ./script.sh $(($X +($2 -1)*$1 )) &
echo $! >> ./save_pid.txt
done
./wait.sh
wait.sh
#!/bin/bash
while read p; do wait $p; done < ./save_pid.txt
Running Rscript pippo.R you will have the explained error. I know that there is the parallel function that can help me in this but for several reasons i cannot use that package.
Thanks
You don't need to keep track of PIDs, because if you call wait without any argument, the script will wait for all the child processes to finish.
#!/bin/bash
for X in $(seq $1)
do
nohup ./script.sh $(($X +($2 -1)*$1 )) &
done
wait

sleep inside inotifywait in a shell function not working

I am trying to run the following function
foo () {
sleep 1
echo "outside inotify"
(inotifywait . -e create |
while read path action file; do
echo "test"
sleep 1
done)
echo "end"
}
Until inotifywait it runs correctly; I see:
>> foo
outside inotify
Setting up watches.
Watches established.
However as soon as I create a file, I get
>>> fooo
outside inotify
Setting up watches.
Watches established.
test
foo:6: command not found: sleep
end
Any idea why? Plus do I need to spawn the subprocess ( ) around inotifywait? what are the benefits?
thank you.
Edit
I realized I am running on zsh
The read path is messing you up, because unlike POSIX-compliant shells -- which guarantee that only modification to variables with all-uppercase names can have unwanted side effects on the shell itself -- zsh also has special-cased behavior for several lower-case names, including path.
In particular, zsh presents path as an array corresponding to the values in PATH. Assigning a string to this array will overwrite your PATH as well.

make zsh prompt update each time a command is executed

TLDR;
I need a way for .zshrc to automatically be sourced each time a command is executed. PROMPT needs to be updated each time a command is executed in order to show relevant information in the prompt.
Reason
I use Watson cli for tracking time. On my previous bash setup, I prepended my prompt ($PS1) with a symbol that indicates whether the timer is running or not (red/green). I have mimicked this functionality with Oh My Zsh, as follows (in the theme file):
WATSON_DIR="$HOME/Library/Application Support/watson"
watson_status() {
local txtred="${fg_bold[red]}"
local txtgrn="${fg_bold[green]}"
local txtrst="${reset_color}"
# Started
local status_color="$txtgrn"
# Stopped
if [[ $(cat "$WATSON_DIR/state") == '{}' ]]; then
status_color="$txtred"
fi
echo -e "$status_color""◉""$txtrst"
}
PROMPT="╭── %{$(watson_status) $fg_bold[green]%}%~%{$reset_color%}$(git_prompt_info) ⌚ %{$FG[130]%}%*%{$reset_color%}
╰─➤ $ "
Current issue
The icon will indicate the color of the state at the time that .zshrc was executed. For example, if the timer is running and the icon is properly indicating green, stopping the timer will not cause the icon to turn red. In order to see the icon change color, I have to source .zshrc.
This indicates that the function watson_status() needs to be run each time a command is executed, to give the latest status at the time of the command
I recently ported some prompt code from bash to zsh - on OSX Big Sur - and these were the two big "things to know" for me:
add setopt PROMPT_SUBST to your .zshrc. This "allows for functions in the prompt"
use single quotes when defining your PS1 / PROMPT. If you use double quotes, then the whole string will be evaluated once when the terminal starts, and then that evaluated value is "re-executed" every time the command changes. But you want the functions to be re-evaluated, not the output of the function at the time when the terminal is created.
Easiest example I used to confirm I had it working:
# print_epoch() { date '+%s' }
# this is what you want
# export PS1='$(print_epoch) > '
# this is not what you want
# export PS1="$(print_epoch) > "
Also worth noting PS1 and PROMPT are interchangeable
Extra:
While we are sharing here is my fairly minimal echo my user, directory, and git branch PROMPT with some colors and a pretty leaf:
parse_git_branch() {
git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/\1/'
}
setopt PROMPT_SUBST
autoload -U colors && colors
export PROMPT='%n %~ %F{blue}🌿$(parse_git_branch)%f > '
noting that:
%n prints name
%~ prints directory relative to home
%F{blue} changes text to blue
%f resets color
Documentation: http://zsh.sourceforge.net/Doc/Release/Prompt-Expansion.html#Visual-effects

Process not listed on "ps -ef" (AIX 7.1)

I have an unusual problem involving the output from the ps -ef command on AIX 7.1.
A shell script monitors processes by parsing this output. I've noticed on two occasions a process (a Perl program) was omitted from this list. Everything I've read on the subject says this is not possible. The program in question starts via crontab at 6am and runs until 11pm, when it self terminates. I checked the output of ps -ef immediately after being omitted by the monitor script, and it displays:
user 1249864 9569338 0 06:00:00 - 0:19 /usr/bin/perl -w /path/to/omittedProgram.pl
... which means it's the same process that was started at 6am. The program did not terminate, then restart.
What is causing it to be omitted from the ps -ef output?
Edit: This is the program that examines the output of ps -ef, which has been running successfully for about five years. I've only noticed this problem twice, but both have been in the last 2 months:
# set global variables
PROCESS_FILE=/tmp/processList.txt
TEMP_FILE=/tmp/greppedProcesses.tmp
BOX=`uname -n`
DATE=`date`
EMAIL_LIST="Support#email.address"
# Get list of running processes
ps -ef > $PROCESS_FILE
checkProcess() {
PROCESS_NAME=$1
PROCESS_ABBREVIATION=$2
PROCESS_COUNT=$3
UNIQUE_PROCESS_IDENTIFIER=$4
GREPPED_LINES=$TEMP_FILE-$PROCESS_ABBREVIATION
grep $UNIQUE_PROCESS_IDENTIFIER $PROCESS_FILE | grep -v grep > $GREPPED_LINES
NUM=`cat $GREPPED_LINES | wc -l`
if [[ $NUM -ne $PROCESS_COUNT ]]
# Incorrect number of processes running!
then MESSAGE=`printf "The \"$PROCESS_NAME\" process count is %1d, but it should be $PROCESS_COUNT!!!" $NUM`
echo "Monitor - starting on $DATE\n\n$MESSAGE\n\n`cat $GREPPED_LINES`" | mail -s "Problem with $PROCESS_NAME on $BOX" $EMAIL_LIST
fi
# Delete the temp file
rm $GREPPED_LINES
}
checkProcess "Full Name of Program" "Program Abbreviation" <expected number of processes running> "Unique string to identify program in ps output"
checkProcess ... (for other processes) ...
exit 0
This might be a long shot in your case but I had same experience with "ps -ef" in the past (don't remember the exact OS type where I seen it, but my script had to work on any Linux, AIX, Solaris and HP-UX).
The "ps -ef" output might be limited to a certain number of columns when used inside a script executed without a terminal. The user, pid, ppid, cputime columns are dynamic and breaking the format sometimes (when the data is larger then the reserved space).
For example if the PID of the process gets to large then the name of the process might be "cut" so that it doesn't appear in the already limited number of column displayed by "ps -ef" then your monitor script would fail.
You could try to keep the file containing the "ps -ef" output and check if it's this problem. No need to wait for when the issue happens, just check if you have the extra long process names in the file (anything longer then the process you're looking for).
My workaround for this problem is to specify a large enough number of columns to be used, like this: COLUMNS=8192 ps -ef > file.out the variable is set just for this 1 purpose.
I just heard from my server support team that the AIX 7.1 TL4 SP4 patch will fix this! We're installing it on our servers now and hopefully this won't happen again.

'tee' and exit status

Is there an alternative to tee which captures standard output and standard error of the command being executed and exits with the same exit status as the processed command?
Something like the following:
eet -a some.log -- mycommand --foo --bar
Where "eet" is an imaginary alternative to "tee" :) (-a means append and -- separates the captured command). It shouldn't be hard to hack such a command, but maybe it already exists and I'm not aware of it?
This works with Bash:
(
set -o pipefail
mycommand --foo --bar | tee some.log
)
The parentheses are there to limit the effect of pipefail to just the one command.
From the bash(1) man page:
The return status of a pipeline is the exit status of the last command, unless the pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully.
I stumbled upon a couple of interesting solutions at Capture Exit Code Using Pipe & Tee.
There is the $PIPESTATUS variable available in Bash:
false | tee /dev/null
[ $PIPESTATUS -eq 0 ] || exit $PIPESTATUS
And the simplest prototype of "eet" in Perl may look as follows:
open MAKE, "command 2>&1 |" or die;
open (LOGFILE, ">>some.log") or die;
while (<MAKE>) {
print LOGFILE $_;
print
}
close MAKE; # To get $?
my $exit = $? >> 8;
close LOGFILE;
Here's an eet. Works with every Bash I can get my hands on, from 2.05b to 4.0.
#!/bin/bash
tee_args=()
while [[ $# > 0 && $1 != -- ]]; do
tee_args=("${tee_args[#]}" "$1")
shift
done
shift
# now ${tee_args[*]} has the arguments before --,
# and $* has the arguments after --
# redirect standard out through a pipe to tee
exec | tee "${tee_args[#]}"
# do the *real* exec of the desired program
exec "$#"
(pipefail and $PIPESTATUS are nice, but I recall them being introduced in 3.1 or thereabouts.)
This is what I consider to be the best pure-Bourne-shell solution to use as the base upon which you could build your "eet":
# You want to pipe command1 through command2:
exec 4>&1
exitstatus=`{ { command1; echo $? 1>&3; } | command2 1>&4; } 3>&1`
# $exitstatus now has command1's exit status.
I think this is best explained from the inside out – command1 will execute and print its regular output on stdout (file descriptor 1), then once it's done, echo will execute and print command1's exit code on its stdout, but that stdout is redirected to file descriptor three.
While command1 is running, its stdout is being piped to command2 (echo's output never makes it to command2 because we send it to file descriptor 3 instead of 1, which is what the pipe reads). Then we redirect command2's output to file descriptor 4, so that it also stays out of file descriptor one – because we want file descriptor one clear for when we bring the echo output on file descriptor three back down into file descriptor one so that the command substitution (the backticks) can capture it.
The final bit of magic is that first exec 4>&1 we did as a separate command – it opens file descriptor four as a copy of the external shell's stdout. Command substitution will capture whatever is written on standard out from the perspective of the commands inside it – but, since command2's output is going to file descriptor four as far as the command substitution is concerned, the command substitution doesn't capture it – however, once it gets "out" of the command substitution, it is effectively still going to the script's overall file descriptor one.
(The exec 4>&1 has to be a separate command to work with many common shells. In some shells it works if you just put it on the same line as the variable assignment, after the closing backtick of the substitution.)
(I use compound commands ({ ... }) in my example, but subshells (( ... )) would also work. The subshell will just cause a redundant forking and awaiting of a child process, since each side of a pipe and the inside of a command substitution already normally implies a fork and await of a child process, and I don't know of any shell being coded to recognize that it can skip one of those forks because it's already done or is about to do the other.)
You can look at it in a less technical and more playful way, as if the outputs of the commands are leapfrogging each other: command1 pipes to command2, then the echo's output jumps over command2 so that command2 doesn't catch it, and then command2's output jumps over and out of the command substitution just as echo lands just in time to get captured by the substitution so that it ends up in the variable, and command2's output goes on its way to the standard output, just as in a normal pipe.
Also, as I understand it, at the end of this command, $? will still contain the return code of the second command in the pipe, because variable assignments, command substitutions, and compound commands are all effectively transparent to the return code of the command inside them, so the return status of command2 should get propagated out.
A caveat is that it is possible that command1 will at some point end up using file descriptors three or four, or that command2 or any of the later commands will use file descriptor four, so to be more hygienic, we would do:
exec 4>&1
exitstatus=`{ { command1 3>&-; echo $? 1>&3; } 4>&- | command2 1>&4; } 3>&1`
exec 4>&-
Commands inherit file descriptors from the process that launches them, so the entire second line will inherit file descriptor four, and the compound command followed by 3>&1 will inherit the file descriptor three. So the 4>&- makes sure that the inner compound command will not inherit file descriptor four, and the 3>&- makes sure that command1 will not inherit file descriptor three, so command1 gets a 'cleaner', more standard environment. You could also move the inner 4>&- next to the 3>&-, but I figure why not just limit its scope as much as possible.
Almost no programs uses pre-opened file descriptor three and four directly, so you almost never have to worry about it, but the latter is probably best to keep in mind and use for general-purpose cases.
{ mycommand --foo --bar 2>&1; ret=$?; } | tee -a some.log; (exit $ret)
KornShell, all in one line:
foo; RET_VAL=$?; if test ${RET_VAL} != 0;then echo $RET_VAL; echo Error occurred!>/tmp/out.err;exit 2;fi |tee >>/tmp/out.err ; if test ${RET_VAL} != 0;then exit $RET_VAL;fi
#!/bin/sh
logfile="$1"
shift
exec 2>&1
exec "$#" | tee "$logfile"
Hopefully this works for you.
Assuming Bash or Z shell (zsh),
my_command >>my_log 2>&1
N.B. The sequence of redirection and duplication of standard error onto standard output is significant!
I didn't realise you wanted to see the output on screen as well. This will of course direct all output to the file my_log.

Resources