How to get the trace of process? - unix

Suppose there is a process which is in inactive state from many days and i want to know until what time the process is in active state. Other than the log records where i can get that information?
This is unix platform.

Use strace debugging utility. You can attach to already running process, save output to log file and analyse it later.
[root#localhost ~]#
[root#localhost ~]# strace -o log -p 7166
Process 7166 attached - interrupt to quit

Related

AWS Code Deploy - Script at specified location: scripts/validate_service.sh failed with exit code 1

My deployments fail on last step Validate Service with error message:
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.
Events log
No lines are selected.
My validate_service.sh contain
#!/bin/bash
# verify we can access our webpage successfully
curl -v --silent localhost:80 2>&1 | grep Welcome
Can someone advice what should I change ?
Script return value matters. Yours looks good to me. I just added couple of seconds to wait until application starts up.
In case you use bash -x together with pipeline of commands, you better add shopt -s pipefail so all pipeline fails when one of the commands fails.
Checkout my script:
#!/bin/bash
sleep 5
curl http://localhost:3009 | grep Welcome

Phabricator Daemon: `phd` was unable to switch to the correct user with `sudo`

I am currently trying to install and run Phabricator on a Raspberry Pi for personal use (even though It's not recommended by Phacility, I thought I still give it a try). So far, I was able to setup everything except the phd user as daemon.
/etc/passwd
phd:x:1001:1001:,,,:/home/phd:/bin/bash
/etc/shadow
phd:NP:17107:0:99999:7:::
I created the user phd and gave im NP in shadow, but that still makes Phabricator unable to switch to phd when starting the daemon.
sudo ./bin/phd restart
Interrupting process 19517...
Process 19517 exited.
Freeing active task leases...
Freed 0 task lease(s).
Starting daemons as phd
Launching daemons:
(Logs will appear in "/var/tmp/phd/log/daemons.log".)
PhabricatorRepositoryPullLocalDaemon (Static)
PhabricatorTriggerDaemon (Static)
PhabricatorTaskmasterDaemon (Autoscaling: group=task, pool=4, reserve=0)
Usage Exception: Daemons are configured to run as user "phd" in
configuration option `phd.user`, but the current user is "root" and
`phd` was unable to switch to the correct user with `sudo`. Command output:
Command failed with error #255!
COMMAND
exec sudo -En -u 'phd' -- ./phd-daemon '--verbose'
STDOUT
(empty)
STDERR
[2016-11-04 08:54:54] EXCEPTION: (Exception) Specified daemon PID directory
('/var/tmp/phd/pid') does not exist or is not writable by the daemon user!
at [<phutil>/src/daemon/PhutilDaemonOverseer.php:115]
arcanist(head=master, ref.master=fad85844314b), phabricator(head=master,
ref.master=6982bded7124), phutil(head=master, ref.master=2b7b1007bf87)
#0 PhutilDaemonOverseer::__construct(array) called at
[<phabricator>/scripts/daemon/launch_daemon.php:13]
What I tried is starting the phd user via su phd -c "/home/phd/phabricator/bin/phd restart" but that queries a password from me.
I kept close to this guide https://secure.phabricator.com/book/phabricator/article/diffusion_hosting/ as well as this https://gist.github.com/sparrc/b4eff48a3e7af8411fc1
Any help is really, really appreciated!
Thanks to #JSON who just made me aware of a line that I apparently always missed, the solution was:
sudo chmod go+w /var/tmp/phd/pid
This will make the directoy writeable and free for all and let me start the error
We usually run
sudo -u phd ./bin/phd restart

Kill all R processes that hang for longer than a minute

I use crontask to regularly run Rscript. Unfortunately, I need to do this on a small instance of aws and the process may hang, building more and more processes on top of each other until the whole system is lagging.
I would like to write a crontask to kill all R processes lasting longer than one minute. I found another answer on Stack Overflow that I've adapted that I think would solve the problem. I came up with;
if [[ "$(uname)" = "Linux" ]];then killall --older-than 1m "/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";fi
I copied the task directly from htop, but it does not work as I expect. I get the No such file or directory error but I've checked it a few times.
I need to kill all R processes that have lasted longer than a minute. How can I do this?
You may want to avoid killing processes from another user and try SIGKILL (kill -9) after SIGTERM (kill -15). Here is a script you could execute every minute with a CRON job:
#!/bin/bash
PROCESS="R"
MAXTIME=`date -d '00:01:00' +'%s'`
function killpids()
{
PIDS=`pgrep -u "${USER}" -x "${PROCESS}"`
# Loop over all matching PIDs
for pid in ${PIDS}; do
# Retrieve duration of the process
TIME=`ps -o time:1= -p "${pid}" |
egrep -o "[0-9]{0,2}:?[0-9]{0,2}:[0-9]{2}$"`
# Convert TIME to timestamp
TTIME=`date -d "${TIME}" +'%s'`
# Check if the process should be killed
if [ "${TTIME}" -gt "${MAXTIME}" ]; then
kill ${1} "${pid}"
fi
done
}
# Leave a chance to kill processes properly (SIGTERM)
killpids "-15"
sleep 5
# Now kill remaining processes (SIGKILL)
killpids "-9"
Why imply an additional process every minute with cron?
Would it not be easier to start R with timeout from coreutils, the processes will then be killed automatically after the time you chose.
timeout [option] duration command [arg]…
I think the best option is to do this with R itself. I am no expert, but it seems the future package will allow executing a function in a separate thread. You could run the actual task in a separate thread, and in the main thread sleep for 60 seconds and then stop().
Previous Update
user1747036's answer which recommends timeout is a better alternative.
My original answer
This question is more appropriate for superuser, but here are a few things wrong with
if [[ "$(uname)" = "Linux" ]];then
killall --older-than 1m \
"/usr/lib/R/bin/exec/R --slave --no-restore --file=/home/ubuntu/script.R";
fi
The name argument is either the name of image or path to it. You have included parameters to it as well
If -s signal is not specified killall sends SIGTERM which your process may ignore. Are you able to kill a long running script with this on the command line? You may need SIGKILL / -9
More at http://linux.die.net/man/1/killall

rsync over ssh hangs in a file sync daemon

I am writing a file syncing application where I collect event from the filesystem whenever the file is modified and than later I copy it over to remote share via rsync over ssh. In my setup I have a slot which is connected to a QTimer. Each 5 seconds I pick a file from a sqlite db for synchronization and start a QProcess::start with the following parameters
/usr/bin/rsync -a /aufs/another-test-folder/testfile286.txt --rsh="ssh -p 8023" user#myserver.de:/home/neox/another-test-folder/testfile286.txt --rsync-path="mkdir -p /home/neox/another-test-folder && rsync"
I have at most 2 rsync processes running in parallel. This results in a process tree:
MyApp
\_rsync
| \_ssh
|_rsync
\_ssh
The problem is that sometimes the application hangs and the ps says that ssh processes have gone zombie. First I have tried to kill MyApp with SIGKILL but no luck. Than I moved on to kill rsync and ssh but still no luck. The whole tree hangs. And if I try to start the daemon from another console or even try to ssh to another box, I can't. My idea here is that somewhere ssh blocks some IO resources. Any idea how to solve this?
P.S. This happens randomly and not often

LSF bsub: job always in PENDING state, not going to RUN state

I stuck on a small problem.
I'm launching many bsub commands at the same time each one on a specified host:
bsub -sp 20 -W 0:5 -m $myhostname -q "myQueue" -J "mkdir_script" -o $log_file "script_to_launch param1 param2 param3"
all this inside a for, for each hostName.
The problem is that everything is OK for all hosts except one (always the same one). The job is always in PENDING state, and is not moving to RUN state.
The script to execute is a script that will check for a folder and creating it if is not there (so a very small task to do).
Is there a way to see what happens on that host and why my job is not going to RUN state ?
PS: I just found the bjobs -p command and I have the following message:
Not specified in job submission: 81 hosts;
Closed by LSF administrator: 3 hosts;
What does this message mean?
The -m option limits you to a particular host, which excludes 81 hosts. The other three have been closed by your system administrator. You would have to contact them to find out why.

Resources