I'm making a project in which the parent forks a terminator process that kills one random child of the parent. It seems that it creates some problems.
Is it allowed?
Not only is that allowed: a process can kill any other process running under the same user ID, and a process running as root can kill any process.
Running as my user ID in a terminal I can kill anything with my user ID. Even the terminal that I am running in. Or my GUI process.
The only user process on Unix-like systems that is immune to kill is PID 1, a.k.a. init, because losing init would be fatal to the system (on Linux, the kernel only delivers signals to init that init has installed handlers for). If PID 1 exits for any reason, such as an internal bug and segmentation fault, there's an immediate kernel panic.
It is allowed. Write the following code in parent.sh
terminator() {
sleep 2;
echo "(terminator) Going to kill Pid $1"
kill -9 "$1" && echo "(terminator) Pid $1 killed"
}
sleep 7 &
sleep 7 &
sleep 7 &
pid=$!
echo "Random pid=${pid} will be killed"
sleep 7 &
sleep 7 &
terminator ${pid} &
echo "All started"
ps -ef | sed -n '1p; /sleep 7/p'
sleep 3
echo "After kill"
ps -ef | sed -n '1p; /sleep 7/p'
Background processes are children of the script. The terminator child will kill one of the other children (here, the third sleep) after 2 seconds.
Random pid=6781 will be killed
All started
UID PID PPID C STIME TTY TIME CMD
notroot 6779 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6780 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6781 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6782 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6783 6777 0 16:59 pts/0 00:00:00 sleep 7
(terminator) Going to kill Pid 6781
(terminator) Pid 6781 killed
parent.sh: line ...: 6781 Killed sleep 7
After kill
UID PID PPID C STIME TTY TIME CMD
notroot 6779 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6780 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6782 6777 0 16:59 pts/0 00:00:00 sleep 7
notroot 6783 6777 0 16:59 pts/0 00:00:00 sleep 7
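The script above always kills the third sleep. A minimal sketch of picking the victim at random instead, assuming bash (arrays and the RANDOM variable are bash features, not plain sh); the names pids, victim and status are only illustrative:

```shell
#!/bin/bash
# Terminator: waits a moment, then kills the PID it was given.
terminator() {
    sleep 1
    kill -9 "$1"
}

# Start five children and remember their PIDs.
pids=()
for _ in 1 2 3 4 5; do
    sleep 7 &
    pids+=("$!")
done

# Pick one child at random.
idx=$(( RANDOM % ${#pids[@]} ))
victim=${pids[idx]}
echo "Random pid=${victim} will be killed"

# The terminator is itself a child of this script.
terminator "$victim" &

# wait reports 128 + signal number when the child died from a signal.
status=0
wait "$victim" 2>/dev/null || status=$?
echo "Pid ${victim} exited with status ${status}"

# Clean up the survivors so the demo ends promptly.
for p in "${pids[@]}"; do
    if [ "$p" != "$victim" ]; then kill "$p" 2>/dev/null || true; fi
done
wait 2>/dev/null
```

Because the parent calls wait on the victim, the killed child is reaped and does not linger as a zombie.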
Here is a little reproducible example:
library(doMC)
library(doParallel)
registerDoMC(4)
timing <- system.time( fitall <- foreach(i=1:1000, .combine = "c") %dopar% {
print(i)
})
I start up R and look at the process table:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 5 80 0 - 21399 wait 10:58 ? 00:00:00 /usr/local/lib/R/bin/exec/R --no-save --no-restore
0 S chbr 9 1 0 80 0 - 1113 wait 10:58 ? 00:00:00 sh -c ps -efl
0 R chbr 10 9 0 80 0 - 4294 - 10:58 ? 00:00:00 ps -efl
If I use the aforementioned simple foreach loop, doMC or doParallel leaves a zombie process behind. Output of ps -efl after running the loop:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 4 80 0 - 25256 wait 11:00 ? 00:00:00 /usr/local/lib/R/b
1 Z chbr 10 1 0 80 0 - 0 exit 11:00 ? 00:00:00 [R] <defunct>
0 S chbr 12 1 0 80 0 - 1113 wait 11:00 ? 00:00:00 sh -c ps -efl
0 R chbr 13 12 0 80 0 - 4294 - 11:00 ? 00:00:00 ps -efl
If I repeat the loop without issuing registerDoMC(4) again no additional zombie process gets created. However, if I issue registerDoMC(4) an additional zombie process gets created:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 0 80 0 - 25554 wait 11:00 ? 00:00:01 /usr/local/lib/R/b
1 Z chbr 21 1 0 80 0 - 0 exit 11:02 ? 00:00:00 [R] <defunct>
1 Z chbr 22 1 0 80 0 - 0 exit 11:02 ? 00:00:00 [R] <defunct>
0 S chbr 26 1 0 80 0 - 1113 wait 11:03 ? 00:00:00 sh -c ps -efl
0 R chbr 27 26 0 80 0 - 4294 - 11:03 ? 00:00:00 ps -efl
That's how I figured it could be doMC doing something it should not. If doMC is causing this, is there a way to stop doMC from leaving zombie processes behind? (stopCluster() does not work, as no cluster gets created in the first place.)
> sessionInfo()
R Under development (unstable) (2014-08-16 r66404)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_IE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_IE.UTF-8 LC_COLLATE=en_IE.UTF-8
[5] LC_MONETARY=en_IE.UTF-8 LC_MESSAGES=en_IE.UTF-8
[7] LC_PAPER=en_IE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] doParallel_1.0.8 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.2.0
This really has nothing to do with foreach or doMC; as Steve Weston has pointed out in answer to other StackOverflow queries, doMC is essentially just a wrapper for mclapply, and you can see zombie processes created with a simple call to mclapply:
library(parallel)
mclapply(rep(5,4), rnorm)
On my system, this leaves two zombie processes:
[richcalaway@richcalaway-pc ~]$ ps -efl | grep defunct
1 Z 1660945517 28701 28624 0 77 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
1 Z 1660945517 28702 28624 0 78 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
0 S 1660945517 28704 28308 0 78 0 - 15306 pipe_w 12:00 pts/2 00:00:00 grep defunct
Under normal circumstances, these zombie processes won't cause any trouble, and they do disappear when the R session ends. You can avoid them by using doParallel and a fork cluster instead of using doMC.
Cheers,
Rich Calaway
Principal Program Manager
Revolution Analytics
When I am running a long-running process and stop it with Ctrl+Z, I get the following message in my terminal:
76381 suspended git clone git@bitbucket.org:kevinburke/<large-repo>.git
What actually happens when the process is suspended? Is the state held in memory? Is this functionality implemented at the operating system level? How is the process able to resume execution right where it left off when I restart it with fg?
When you hit Ctrl+Z in a terminal, the line-discipline of the (pseudo-)terminal device driver (the kernel) sends a SIGTSTP signal to all the processes in the foreground process group of the terminal device.
That process group is an attribute of the terminal device. Typically, your shell is the process that defines which process group is the foreground process group of the terminal device.
In shell terminology, a process group is called a "job", and you can put a job in foreground and background with the fg and bg command and find out about the currently running jobs with the jobs command.
The SIGTSTP signal is like the SIGSTOP signal except that contrary to SIGSTOP, SIGTSTP can be handled by a process.
Upon reception of such a signal, the process is suspended. That is, it's paused and still there, only it won't be scheduled for running any more until it's killed or sent a SIGCONT signal to resume execution. The shell that started the job will be waiting for the leader of the process group in it. If it is suspended, the wait() will return indicating that the process was suspended. The shell can then update the state of the job and tell you it is suspended.
$ sleep 100 | sleep 200 & # start job in background: two sleep processes
[1] 18657 18658
$ ps -lj # note the PGID
F S UID PID PPID PGID SID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 10031 18657 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 S 10031 18658 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 R 10031 18692 26500 18692 26500 0 80 0 - 2964 - pts/2 00:00:00 ps
0 S 10031 26500 26498 26500 26500 0 80 0 - 10775 - pts/2 00:00:01 zsh
$ jobs -p
[1] + 18657 running sleep 100 |
running sleep 200
$ fg
[1] + running sleep 100 | sleep 200
^Z
zsh: suspended sleep 100 | sleep 200
$ jobs -p
[1] + 18657 suspended sleep 100 |
suspended sleep 200
$ ps -lj # note the "T" under the S column
F S UID PID PPID PGID SID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 T 10031 18657 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 T 10031 18658 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 R 10031 18766 26500 18766 26500 0 80 0 - 2964 - pts/2 00:00:00 ps
0 S 10031 26500 26498 26500 26500 0 80 0 - 10775 - pts/2 00:00:01 zsh
$ bg %1
[1] + continued sleep 100 | sleep 200
$ ps -lj
F S UID PID PPID PGID SID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 10031 18657 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 S 10031 18658 26500 18657 26500 0 85 5 - 2256 - pts/2 00:00:00 sleep
0 R 10031 18824 26500 18824 26500 0 80 0 - 2964 - pts/2 00:00:00 ps
0 S 10031 26500 26498 26500 26500 0 80 0 - 10775 - pts/2 00:00:01 zsh
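The same stop and continue cycle can be reproduced from a script, without a terminal. This is a minimal sketch, assuming bash and a ps that accepts -o state= and -p (procps and BSD ps do). It sends SIGSTOP rather than SIGTSTP so the result does not depend on whether the child handles or ignores the signal; the stopped state is the same.

```shell
#!/bin/bash
# Start a child, stop it, inspect its state, then resume it.
sleep 30 &
pid=$!

kill -STOP "$pid"          # like SIGTSTP, but cannot be caught or ignored
sleep 0.5                  # give the kernel a moment to apply the stop
stopped_state=$(ps -o state= -p "$pid" | tr -d ' ')
echo "after SIGSTOP: ${stopped_state}"   # T means stopped

kill -CONT "$pid"          # resume execution where it left off
sleep 0.5
resumed_state=$(ps -o state= -p "$pid" | tr -d ' ')
echo "after SIGCONT: ${resumed_state}"   # typically S (sleeping)

kill "$pid"                # clean up the demo child
wait "$pid" 2>/dev/null || true
```

While stopped, the process keeps its memory, open files and registers; it is simply never scheduled, which is why it can continue exactly where it left off.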
I need a shell script to search for files and replace missing ones. Details are below. Please help.
Basically, every day I get some files in my directory. For example, I get 100 files like these:
AllResponses_11003_6_20_2013.txt
AllResponses_11004_6_20_2013.txt
AllResponses_11005_6_20_2013.txt
AllResponses_11006_6_20_2013.txt
AllResponses_11007_6_20_2013.txt
AllResponses_11008_6_20_2013.txt
AllResponses_11009_6_20_2013.txt
AllResponses_11010_6_20_2013.txt
AllResponses_11011_6_20_2013.txt
AllResponses_11012_6_20_2013.txt
Among those, I need to copy 5 files to another directory based on the file number (11003, 11004, 11005, 11006, 11007):
AllResponses_11003_6_20_2013.txt
AllResponses_11004_6_20_2013.txt
AllResponses_11005_6_20_2013.txt
AllResponses_11006_6_20_2013.txt
AllResponses_11007_6_20_2013.txt
If a file is not found, I need to put a 0-byte file with that name into the other directory instead.
But how do I pass the numbers 11003, 11004, 11005, 11006, 11007 when there are hundreds of numbers, such as 11003 to 11100?
Please help.
export SRCDIR=/informat/PowerCenter/9.1.0/server/infa_shared/SrcFiles/CSI/historical
export TGTDIR=/informat/PowerCenter/9.1.0/server/infa_shared/SrcFiles/CSI/incoming
export FILEDT=6_15_2013
export FILEDT=$(date +"%-m_%-d_%Y")
# loop to search for and copy files
for FILE_NUM in "$@";
do
GET_FNAME="AllResponses_"${FILE_NUM}"_"${FILEDT}"*.txt"
if [ -f ${GET_FNAME} ]; then
cp ${SRCDIR}/${GET_FNAME} ${TGTDIR}
else
echo "File ${GET_FNAME} is missing in ${SRCDIR}"
touch ${TGTDIR}/AllResponses_${FILE_NUM}_${FILEDT}.txt
echo "Created ${GET_FNAME} touch file in ${TGTDIR}"
fi
done
I have written it as above and am running it as ksh -x csi_file_copy_bala.ksh 11003 11004 99999,
but it always goes to the else clause. Please help me.
My file names look like AllResponses_11004_6_11_20132_18_00AM1.txt
I am running out of time, so any help is appreciated.
Thanks in advance
Assuming by shell you mean bash:
Skeleton to start with:
luk32:~/projects/tests$ cat ./process_files.sh
#!/bin/bash
DEST=./copies
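# Loop over every number given on the command line.
for num in "$@"; do
    file="AllResponses_${num}_6_20_2013.txt"
    if [ -f "$file" ]; then
        cp "$file" "$DEST"
    else
        # Missing: create an empty placeholder instead.
        touch "$DEST/$file"
    fi
done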
It takes numbers as arguments, then tries to find a file matching the given pattern in the current working directory. If the file is found, it is copied to the destination folder; otherwise an empty file with that name is touched there.
You will probably have to tinker a little bit to get friendlier than hard-coded date handling.
Example:
luk32:~/projects/tests$ ls -l
total 40116
-rw-r--r-- 1 luk32 luk32 4 cze 21 11:33 AllResponses_1_6_20_2013.txt
-rw-r--r-- 1 luk32 luk32 5 cze 21 11:33 AllResponses_3_6_20_2013.txt
-rw-r--r-- 1 luk32 luk32 0 cze 21 11:32 AllResponses_4_6_20_2013.txt
luk32:~/projects/tests$ ls -l ./copies/
total 0
luk32:~/projects/tests$ ./process_files.sh 1 2 3 4
luk32:~/projects/tests$ ls -l ./copies/
total 8
-rw-r--r-- 1 luk32 luk32 4 cze 21 11:35 AllResponses_1_6_20_2013.txt
-rw-r--r-- 1 luk32 luk32 0 cze 21 11:35 AllResponses_2_6_20_2013.txt
-rw-r--r-- 1 luk32 luk32 5 cze 21 11:35 AllResponses_3_6_20_2013.txt
-rw-r--r-- 1 luk32 luk32 0 cze 21 11:35 AllResponses_4_6_20_2013.txt
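The skeleton above expects exact file names, while the asker's files carry a suffix after the date (AllResponses_11004_6_11_20132_18_00AM1.txt) and live in a separate source directory. The asker's test also ran [ -f ... ] without the source directory in front of the name, which is a likely reason it always fell through to the else branch. A minimal sketch that handles both, assuming bash; the function name copy_or_placeholder and the temporary demo directories are hypothetical:

```shell
#!/bin/bash
# copy_or_placeholder SRCDIR TGTDIR FILEDT NUM...
# Copies every file matching AllResponses_<NUM>_<FILEDT>*.txt from SRCDIR
# to TGTDIR; if none exists, creates an empty placeholder in TGTDIR.
copy_or_placeholder() {
    local src=$1 tgt=$2 filedt=$3
    shift 3
    mkdir -p "$tgt"
    local num f found
    for num in "$@"; do
        found=0
        # The * stays outside the quotes so the shell expands it.
        for f in "$src"/AllResponses_"${num}"_"${filedt}"*.txt; do
            [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
            cp "$f" "$tgt"/
            found=1
        done
        if [ "$found" -eq 0 ]; then
            echo "No file for ${num} in ${src}; creating empty placeholder"
            touch "$tgt/AllResponses_${num}_${filedt}.txt"
        fi
    done
}

# Demo in a throwaway directory: 11004 exists, 11003 does not.
demo=$(mktemp -d)
mkdir -p "$demo/src"
echo "data" > "$demo/src/AllResponses_11004_6_11_20132_18_00AM1.txt"
copy_or_placeholder "$demo/src" "$demo/tgt" 6_11_2013 11003 11004
ls -l "$demo/tgt"
```

Any list of numbers can be passed as arguments, for example a brace expansion such as {11003..11007} in bash.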
I am unsure why this rsync command is not syncing.
rsync -v -e root@ec2-X.compute-1.amazonaws.com:/var/log/apache2/USAGE-log.txt splunk-rync-logs/log.txt
I see this returned after that command which appears OK.
building file list ... done
-rw-r--r-- 0 2012/03/26 19:28:00 log.txt
sent 28 bytes received 12 bytes 80.00 bytes/sec
total size is 0 speedup is 0.00
BUT no data is added to the local file that is supposed to be synced with the remote file:
ls -al
total 0
drwxr-xr-x 3 bd staff 102 Mar 26 19:28 .
drwxr-xr-x+ 54 bd staff 1836 Mar 26 19:28 ..
-rwxrwxrwx 1 bd staff 0 Mar 26 19:28 log.txt
Any advice?
The syntax for the -e option (rsync version 3.0.8 protocol version 30) is:
-e, --rsh=COMMAND specify the remote shell to use
It is meant for uses such as -e 'ssh -p 2234'.
Maybe you have a different version, but that's where I'd start looking.
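In the command from the question, -e is followed directly by the remote path, so rsync takes that path as the remote shell command and is left with a single file argument. With only a source and no destination, rsync just lists the source, which matches the output shown. A sketch of the invocation with an explicit remote shell, assuming ssh is what was intended; the host and paths are copied from the question, and the command is only printed here rather than run:

```shell
#!/bin/bash
# Source and destination exactly as given in the question.
remote='root@ec2-X.compute-1.amazonaws.com:/var/log/apache2/USAGE-log.txt'
local_copy='splunk-rync-logs/log.txt'

# -e now gets its own argument (the remote shell), so the remote path
# is treated as the source and the local path as the destination.
cmd=(rsync -v -e ssh "$remote" "$local_copy")

# Print the command for inspection; run it with: "${cmd[@]}"
printf '%q ' "${cmd[@]}"
echo
```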
I am submitting a job using qsub that runs parallelized R. My intention is to have the R program run on 4 cores rather than 8. Here are some of the settings in my PBS file:
#PBS -l nodes=1:ppn=4
....
time R --no-save < program1.R > program1.log
I am issuing the command ta job_id and I'm seeing that 4 cores are listed. However, the job occupies a large amount of memory (31944900k used vs 32949628k total). If I were to use 8 cores, the job would hang due to the memory limitation.
top - 21:03:53 up 77 days, 11:54, 0 users, load average: 3.99, 3.75, 3.37
Tasks: 207 total, 5 running, 202 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.4%us, 1.6%sy, 0.0%ni, 66.8%id, 0.0%wa, 0.0%hi, 1.2%si, 0.0%st
Mem: 32949628k total, 31944900k used, 1004728k free, 269812k buffers
Swap: 2097136k total, 8360k used, 2088776k free, 6030856k cached
Here is a snapshot when issuing command ta job_id
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1794 x 25 0 6247m 6.0g 1780 R 99.2 19.1 8:14.37 R
1795 x 25 0 6332m 6.1g 1780 R 99.2 19.4 8:14.37 R
1796 x 25 0 6242m 6.0g 1784 R 99.2 19.1 8:14.37 R
1797 x 25 0 6322m 6.1g 1780 R 99.2 19.4 8:14.33 R
1714 x 18 0 65932 1504 1248 S 0.0 0.0 0:00.00 bash
1761 x 18 0 63840 1244 1052 S 0.0 0.0 0:00.00 20016.hpc
1783 x 18 0 133m 7096 1128 S 0.0 0.0 0:00.00 python
1786 x 18 0 137m 46m 2688 S 0.0 0.1 0:02.06 R
How can I prevent other users from using the other 4 cores? I like to mask somehow that my job is using 8 cores with 4 cores idling.
Could anyone kindly help me out on this? Can this be solved using pbs?
Many Thanks
"How can I prevent other users from using the other 4 cores? I like to mask somehow that my job is using 8 cores with 4 cores idling."
Maybe a simple way around it is to send a 'sleep' job to the other 4? Seems hackish though! (And warning, my PBS is rusty!)
Why not do the following -
ask PBS for ppn=4 and, additionally, ask for all the memory on the node, i.e.
#PBS -l nodes=1:ppn=4 -l mem=31944900k
This might not be possible on your setup.
I am not sure how R is parallelized, but if it is OpenMP you could definitely ask for 8 cores but set OMP_NUM_THREADS to 4.