Timeout in tests when running pintos

I am just getting started with the Pintos projects, working from my home computer running Ubuntu 14.04 x64.
I can compile the project from the src/threads/ directory, and the initial test pintos run alarm-multiple seems to work okay (notice that it runs QEMU by default):
zay@ubuntu:~/Documents/pintos/src/threads/build$ pintos run alarm-multiple
Prototype mismatch: sub main::SIGVTALRM () vs none at /home/zay/Documents/pintos/src/utils/pintos line 935.
Constant subroutine SIGVTALRM redefined at /home/zay/Documents/pintos/src/utils/pintos line 927.
qemu-system-x86_64 -drive cache=writeback,file=/tmp/YS3E7FICwo.dsk -m 4 -net none -serial stdio
PiLo hda1
Loading..........
Kernel command line: run alarm-multiple
Pintos booting with 4,088 kB RAM...
382 pages available in kernel pool.
382 pages available in user pool.
Calibrating timer... 286,310,400 loops/s.
Boot complete.
Executing 'alarm-multiple':
(alarm-multiple) begin
(alarm-multiple) Creating 5 threads to sleep 7 times each.
(alarm-multiple) Thread 0 sleeps 10 ticks each time,
(alarm-multiple) thread 1 sleeps 20 ticks each time, and so on.
(alarm-multiple) If successful, product of iteration count and
(alarm-multiple) sleep duration will appear in nondescending order.
(alarm-multiple) thread 0: duration=10, iteration=1, product=10
(alarm-multiple) thread 0: duration=10, iteration=2, product=20
However, when I run make check under src/threads/build, every test fails with a timeout:
zay@ubuntu:~/Documents/pintos/src/threads/build$ make check
pintos -v -k -T 60 --qemu -- -q run alarm-multiple < /dev/null 2> tests/threads/alarm-multiple.errors > tests/threads/alarm-multiple.output
perl -I../.. ../../tests/threads/alarm-multiple.ck tests/threads/alarm-multiple tests/threads/alarm-multiple.result
FAIL tests/threads/alarm-multiple
run: TIMEOUT after 61 seconds of wall-clock time - load average: 0.20, 0.45, 0.26
pintos -v -k -T 60 --qemu -- -q run alarm-simultaneous < /dev/null 2> tests/threads/alarm-simultaneous.errors > tests/threads/alarm-simultaneous.output
perl -I../.. ../../tests/threads/alarm-simultaneous.ck tests/threads/alarm-simultaneous tests/threads/alarm-simultaneous.result
FAIL tests/threads/alarm-simultaneous
run: TIMEOUT after 61 seconds of wall-clock time - load average: 0.18, 0.40, 0.25
pintos -v -k -T 60 --qemu -- -q run alarm-priority < /dev/null 2> tests/threads/alarm-priority.errors > tests/threads/alarm-priority.output
perl -I../.. ../../tests/threads/alarm-priority.ck tests/threads/alarm-priority tests/threads/alarm-priority.result
FAIL tests/threads/alarm-priority
run: TIMEOUT after 61 seconds of wall-clock time - load average: 0.10, 0.34, 0.2
What changes should I make to solve this problem?

Apparently, QEMU no longer supports the power-off sequence on port 0x8900. Here is a fix that made it work for me (found in chaOs): in the file devices/shutdown.c, patch shutdown_power_off as follows:
void
shutdown_power_off (void)
{
// ...
printf ("Powering off...\n");
serial_flush ();
outw (0xB004, 0x2000); // <-- Add this line
// ...
}
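
After applying the patch, rebuild and rerun a single test by hand to confirm the kernel now powers off instead of hanging; this is the same command make check runs above:

# Rebuild with the patched devices/shutdown.c, then rerun one test.
cd ~/Documents/pintos/src/threads/build
make
pintos -v -k -T 60 --qemu -- -q run alarm-multiple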

If you are using QEMU for Pintos, you need to add one line of code in devices/shutdown.c.
Insert the line outw( 0x604, 0x0 | 0x2000 ); after printf ("Powering off...\n"); and serial_flush ();, as shown below:
/* This is a special power-off sequence supported by Bochs and
   QEMU, but not by physical hardware. */
for (p = s; *p != '\0'; p++)
  outb (0x8900, *p);

printf ("Powering off...\n");
serial_flush ();
//add the following line
outw( 0x604, 0x0 | 0x2000 );
Follow this guide to find out more

Related

Simulate the wait command's -n flag in zsh

I am creating two shell jobs as follows
sleep 5 &
completion_pid=$!
sleep 40 && exit 1 &
failure_pid=$!
In bash I am able to get the exit code of the first job to finish by using the wait command's -n flag:
# capture exit code of the first subprocess to exit
wait -n $completion_pid $failure_pid
It seems, however, that this flag is not available in the version of wait on my macOS Big Sur machine (probably because I am using zsh?):
▶ wait -n
wait: job not found: -n
Are there any alternative tools to do this that are also available on MacOS?
What is perhaps weird is that I get the same error when invoking a script containing wait -n as bash myscript.sh...
Since you are waiting on specific PIDs, you can simply do:
wait $completion_pid $failure_pid
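
For background: wait -n was only added in bash 4.3, and macOS ships bash 3.2, which is why invoking the script as bash myscript.sh also fails. If you genuinely need first-to-exit semantics, one hedged workaround that works in both zsh and old bash is to poll the PIDs; this is a sketch, not the only approach:

#!/bin/bash
# Emulate "wait -n pid1 pid2" by polling with kill -0.
sleep 5 & completion_pid=$!
sleep 40 && exit 1 & failure_pid=$!

# Spin until one of the two jobs has exited and been reaped.
while kill -0 "$completion_pid" 2>/dev/null \
   && kill -0 "$failure_pid" 2>/dev/null; do
  sleep 0.2
done

# Collect the exit status of whichever job finished first.
if ! kill -0 "$completion_pid" 2>/dev/null; then
  wait "$completion_pid"
else
  wait "$failure_pid"
fi
echo "first job to exit returned $?"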

Asterisk EAGI audio while running AMD or other asterisk app via "EXEC"

Is it possible to use "AMD" to detect silence in an EAGI script and receive the audio on fd 3 at the same time?
Is this scenario supported, or am I doing something wrong?
A simple demonstration bash script, which is run as EAGI(/home/agi/eagi.sh) from Asterisk:
#!/bin/bash
log=/tmp/eagi.log
# Read all variables sent by Asterisk to array
declare -a array
while read -e ARG && [ "$ARG" ] ; do
  array=(`echo $ARG | sed -e 's/://'`)
  export ${array[0]}=${array[1]}
  echo $ARG | sed -e 's/://' >>$log
done
# Capture the caller audio that Asterisk feeds us on fd 3.
/usr/bin/dd if=/dev/fd/3 of=/tmp/eagi.tmp.out &>>$log &
### or just sleep 10 ###
sleep 1
echo "EXEC AMD"
read line # blocks until silence is detected by AMD
echo $line >>$log
sleep 1
### ###
# USR1 makes dd print its I/O statistics before we stop it.
kill -USR1 %1; sleep 0.1; kill %1
ls -lh /tmp/eagi.tmp.out >>$log
echo "EXEC HANGUP "
read line
echo $line >>$log
exit
The script starts capturing the audio data from fd 3 via dd, started as a background process. When I have just sleep 10 instead of the echo "EXEC AMD", dd has recorded the full audio file after the 10 seconds.
However, with "AMD", dd stops receiving data on fd 3 as soon as "AMD" is executed (confirmed also via strace) and continues after "AMD" finishes. So while "AMD" is running, no audio is recorded.
Output in the logfile looks like this:
Working (with just sleep):
1522+501 records in
1897+0 records out
971264 bytes (971 kB, 948 KiB) copied, 10.0023 s, 97.1 kB/s
-rw-r--r-- 1 asterisk asterisk 958K Sep 24 10:16 /tmp/eagi.tmp.out
Non-working (with "AMD" which detected silence after 6 seconds, and dd was running the whole time but only 1 second before and 1 second after "AMD" was recorded into the file):
322+101 records in
397+0 records out
203264 bytes (203 kB, 198 KiB) copied, 8.06516 s, 25.2 kB/s
-rw-r--r-- 1 asterisk asterisk 208K Sep 24 10:13 /tmp/eagi.tmp.out
So is this some kind of bug in Asterisk, or just unsupported usage? I didn't find much info about EAGI in the Asterisk documentation, so I am not sure what is supported and what is not. The Asterisk version is 16.2.1 on Debian 10; the testing call was made via a webphone in the Chrome browser, and the audio passed via fd 3 was 48 kHz, 16-bit, mono. (Maybe with some other audio format/codec, both fd 3 and "AMD" would work at the same time?)
EDIT2: Removed info about my complicated setup and added simple reproducible example.
EDIT3: During further debugging I used "EXEC Background" to output some short audio file to the caller and also during this no audio was recorded. So the issue seems to be not only with "EXEC AMD", but also "EXEC Background" and probably also other asterisk applications invoked by "EXEC".

How to get supervisord to restart hung workers?

I have a number of Python workers managed by supervisord that should continuously print to stdout (after each completed task) if they are working properly. However, they tend to hang, and we've had difficulty finding the bug. Ideally supervisord would notice that they haven't printed in X minutes and restart them; the tasks are idempotent, so non-graceful restarts are fine. Is there any supervisord feature or addon that can do this? Or another supervisor-like program that has this out of the box?
We are already using http://superlance.readthedocs.io/en/latest/memmon.html to kill if memory usage skyrockets, which mitigates some of the hangs, but a hang that doesn't cause a memory leak can still cause the workers to reach a standstill.
One possible solution would be to wrap your python script in a bash script that'd monitor it and exit if there isn't output to stdout for a period of time.
For example:
kill-if-hung.sh
#!/usr/bin/env bash
set -e

TIMEOUT=60
LAST_CHANGED="$(date +%s)"

# Background ticker: signal ourselves once per second so the
# check_output trap runs regularly.
{
  set -e
  while true; do
    sleep 1
    kill -USR1 $$
  done
} &

trap check_output USR1
check_output() {
  CURRENT="$(date +%s)"
  if [[ $((CURRENT - LAST_CHANGED)) -ge $TIMEOUT ]]; then
    echo "Process STDOUT hasn't printed in $TIMEOUT seconds"
    echo "Considering process hung and exiting"
    exit 1
  fi
}

STDOUT_PIPE=$(mktemp -u)
mkfifo $STDOUT_PIPE

trap cleanup EXIT
cleanup() {
  kill -- -$$ # Send TERM to child processes
  [[ -p $STDOUT_PIPE ]] && rm -f $STDOUT_PIPE
}

# Run the wrapped command with its stdout attached to the pipe.
"$@" >$STDOUT_PIPE || exit 2 &

# Echo each line from the pipe and record when output last appeared.
while true; do
  if read tmp; then
    echo "$tmp"
    LAST_CHANGED="$(date +%s)"
  fi
done <$STDOUT_PIPE
Then you would run a python script in supervisord like: kill-if-hung.sh python -u some-script.py (-u to disable output buffering, or set PYTHONUNBUFFERED).
I'm sure you could imagine a python script that'd do something similar.

Grabbing .jar application output stream to console after console was closed and new one opened on Oracle Solaris 11

On an Oracle Solaris 11 console, when I issue ps -ef | grep java, I can see the PID of a running Java process that was started in another console window, which has since been closed (the .jar application's output was visible there). Is there some way to grab that application's output again without restarting the .jar file?
Application was started like this (as a root user):
java -jar SomeFile.jar &
Write output to file is not an option in this case.
Yes, you can do that, but it involves some mad skills with gdb. Here is how to do it on Linux, and I believe you can do the same on Solaris (since it has gdb and all the system calls I am going to use).
There are 3 file descriptors for standard streams:
stdin: 0
stdout: 1
stderr: 2
You are interested in stdout and stderr (both are console output), so you need file descriptors with numbers 1 and 2, just keep it in mind.
Now I will show how to do what you ask for the "okular" application (instead of your "java" application), for the stderr stream.
Run "okular" in terminal, like this:
$ okular &
and then close this terminal. This is just to simulate your situation.
Open another terminal
Look for "okular" process:
$ ps aux | grep okular
Output:
joe 27599 2.2 0.9 515644 73944 ? S 23:46 0:00 okular
So "okular" PID is 27599.
Look for open file descriptors of "okular" process:
$ ls -l /proc/27599/fd
Output:
lrwx------ 1 joe joe 64 Feb 18 23:46 0 -> /dev/pts/0 (deleted)
lrwx------ 1 joe joe 64 Feb 18 23:46 1 -> /dev/pts/0 (deleted)
lrwx------ 1 joe joe 64 Feb 18 23:46 2 -> /dev/pts/0 (deleted)
You see that all 3 streams are deleted.
Now let's attach to our process with gdb:
$ gdb -p 27599 /usr/bin/okular
Inside of gdb perform next operations:
(gdb) p close(2)
(gdb) p creat("/tmp/okular_2", 0600)
(gdb) detach
(gdb) quit
Here we invoked 2 system calls:
close(), to close file for stderr stream of our process
creat(), to create new file for stderr stream of our process
p is a gdb command; here it prints the system calls' return values.
Now all new stderr output of our process will be appended to the text file /tmp/okular_2. We can follow it continuously this way:
$ tail -f /tmp/okular_2
Conclusion
OK, that's it: we revived the stderr stream. You can do the same for the stdout stream; the only difference is that you need to call "close(1)" instead of "close(2)" in gdb. Also, in your case, be sure to replace "okular" with your "java" process throughout.
Most of this answer was inspired by this article.
If you need to revive stdin stream, you can attach it to pipe (FIFO) file, see details here.
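
If you would rather not type the gdb commands interactively, the same three steps can be scripted with gdb's batch mode. This is a sketch: 27599 stands in for your java PID, and /tmp/java_1 is a hypothetical output file for stdout.

# Redirect fd 1 (stdout) of an already-running process to a file,
# non-interactively; same steps as the gdb session above.
gdb -p 27599 --batch \
  -ex 'p close(1)' \
  -ex 'p creat("/tmp/java_1", 0600)' \
  -ex 'detach'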
Yes, it is possible to snoop any process output with Solaris native tools.
One way would be using dtrace which allows tracing processes even when they are already grabbed by a debugger or similar tool.
This dtrace script will display a given process stdout:
#!/bin/ksh
pid=$1
dtrace -qn "syscall::write:entry /pid == $pid && arg0 == 1 /
{ printf(\"%s\",copyinstr(arg1)); }"
You should pass the process id of the java application to trace as its first argument, e.g. $(pgrep -f "java -jar SomeFile.jar").
Replace arg0 == 1 with arg0 == 2 if you want to trace stderr instead of stdout.
Should you want to see non displayable characters (in octal), you might use this slightly modified version:
#!/bin/ksh
pid=$1
dtrace -qn "syscall::write:entry /pid == $pid && arg0 == 1 /
{ printf(\"%s\",copyinstr(arg1)); }" | od -c
Another native way is to use the truss command. The following command will show all writes from your process to any file descriptor, including a full detailed trace of the data written to stdout and stderr (3799 is your target process pid):
truss -w1,2 -t write -p 3799
dtrace:
http://docs.oracle.com/cd/E18752_01/html/819-5488/gcgkk.html
truss:
http://docs.oracle.com/cd/E36784_01/html/E36870/truss-1.html#scrolltoc

How to do parallel processing in Unix Shell script?

I have a shell script that transfers a build.xml file to a remote unix machine (devrsp02) and executes the ANT task wldeploy on that machine (devrsp02). Now, this wldeploy task takes around 15 minutes to complete and while this is running, the last line at the unix console is -
"task {some digit} initialized".
Once this task is complete, we get a "task Completed" message, and only then is the next task in the script executed.
But sometimes there might be a problem with the WebLogic domain, and the deployment might fail internally with no effect on the status of the wldeploy task; the Unix console will still be stuck at "task {some digit} initialized". The deployment error will be logged to a file called output.a.
So, what I want now is -
Start a time counter before running wldeploy. If the wldeploy runs for more than 15 minutes, the following command should be run -
tail -f output.a ## without terminating the wldeploy
or
cat output.a ## after terminating the wldeploy forcefully
Point to be noted here is - I can't run the wldeploy task in background, as in that case the user won't get to know when the task is complete, which is crucial for this script.
Could you please suggest anything to achieve this?
Create this script (deploy.sh for example):
#!/bin/sh
# Watchdog: after 15 minutes, kill the newest process named wldeploy
# (if it is still running) and show the deployment log.
sleep 900 && pkill -n wldeploy && cat output.a &
wldeploy
Then from the console
chmod +x deploy.sh
Then run
./deploy.sh
This script will start a counter (15 minutes) that will forcibly kill the wldeploy process if it's running, and if the process was running you'll see the contents of output.a.
If wldeploy has already terminated, pkill will find nothing to kill and return false, so output.a will not be shown.
I would call this task monitoring rather than "parallel processing" :)
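
If your system happens to ship the GNU coreutils timeout(1) utility (an assumption about your environment), the same watchdog idea can be written without pkill:

#!/bin/sh
# Run wldeploy in the foreground but kill it after 15 minutes;
# GNU timeout exits with status 124 when the time limit was hit.
if timeout 900 wldeploy; then
  echo "wldeploy completed"
elif [ $? -eq 124 ]; then
  cat output.a
fi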
This will only kill the wldeploy process it started, tell you whether wldeploy returned success or failure, and run no more than 30 seconds after wldeploy finishes.
It should be sh-compatible, but the /bin/sh I've got access to now seems to have a broken wait command.
#!/bin/ksh
wldeploy &
while [ ${slept:-0} -le 900 ]; do
  sleep 30 && slept=`expr ${slept:-0} + 30`
  # While we are still the parent of $!, wldeploy is running.
  if [ $$ = "`ps -o ppid= -p $!`" ]; then
    echo wldeploy still running
  else
    wait $! && echo "wldeploy succeeded" || echo "wldeploy failed"
    break
  fi
done
if [ $$ = "`ps -o ppid= -p $!`" ]; then
  echo "wldeploy did not finish in $slept seconds, killing it"
  kill $!
  cat output.a
fi
For the part without terminating wldeploy, it is easy: just execute this beforehand (see the sketch below):
{ sleep 900; tail -f output.a; } &
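
If you also want the watcher to go away once wldeploy completes on its own, here is a small sketch; remembering the watcher's PID is my addition:

# Start the watcher first, run wldeploy in the foreground as required,
# then cancel the watcher if the deployment finished in time.
{ sleep 900; tail -f output.a; } &
watcher=$!
wldeploy
kill "$watcher" 2>/dev/null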
For the part where you kill it, it is more complex, since you have to determine the PID of the wldeploy process. pra's answer does exactly that, so I would just refer to it.
