How to exclude parent Unix processes from grepped output from ps

I have a file of pids and am using ps -f to get information about them.
Here is an example:
ps -eaf | grep -f myfilename
myuser 14216 14215 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh
myuser 14286 14216 0 10:00 ? 00:00:00 /usr/bin/ksh /home/myScript.ksh
where myfilename contains only 14216.
I've got a tiny problem where the output is giving me the parent process IDs as well as the child. I want to exclude the line for the parent process ID.
Does anyone know how I could modify my command to exclude the parent process, keeping in mind that I could have many process IDs in my input file?

Hard to do with just grep but easy to do with awk.
Invoke the awk script below from the following command:
ps -eaf | awk -f script.awk myfilename -
Here's the script:
# process the first file on the command line (aka myfilename)
# this is the list of pids
ARGIND == 1 {
    pids[$0] = 1
}

# second and subsequent files ("-"/stdin in the example)
ARGIND > 1 {
    # is column 2 of the ps -eaf output [i.e. the pid] in the list of desired
    # pids? -- if so, print the entire line
    if ($2 in pids)
        printf("%s\n", $0)
}
UPDATE:
When using GNU awk (gawk), the following may be ignored. For older versions that lack ARGIND, insert the following code at the top of the script:
# work around old, obsolete versions
ARGIND == 0 {
    defective_awk_flag = 1
}
defective_awk_flag != 0 {
    if (FILENAME != defective_awk_file) {
        defective_awk_file = FILENAME
        ARGIND += 1
    }
}
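Alternatively, since there are only two inputs here, the portable NR==FNR idiom sidesteps ARGIND entirely. Here's a minimal one-liner sketch of the same logic (untested; it assumes myfilename is non-empty and holds one pid per line):
ps -eaf | awk 'NR == FNR { pids[$0] = 1; next } $2 in pids' myfilename -
NR equals FNR only while the first file (myfilename) is being read, so the PID-column test only runs against the ps output.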
UPDATE #2:
The above is all fine. Just for fun, here's an alternate way to do the same thing with perl. One of the advantages is that everything can be contained in the script and no pipeline is necessary.
Invoke the script via:
./script.pl myfilename
And, here's script.pl. Note: I don't write idiomatic perl. My style is more akin to what one would expect to see in other languages like C, javascript, etc.:
#!/usr/bin/perl

master(@ARGV);
exit(0);

# master -- master control
sub master
{
    my(@argv) = @_;
    my($xfsrc);
    my($pidfile);
    my($buf);

    # NOTE: "chomp" is a perl function that strips newlines

    # get filename with list of pids (e.g. myfilename)
    $pidfile = shift(@argv);
    open($xfsrc,"<$pidfile") ||
        die("master: unable to open '$pidfile' -- $!\n");

    # create an associative array (a "hash" in perl parlance) of the desired
    # pid numbers
    while ($pid = <$xfsrc>) {
        chomp($pid);
        $pid_desired{$pid} = 1;
    }
    close($xfsrc);

    # run the 'ps' command and capture its output into an array
    @pslist = (`ps -eaf`);

    # process the command output, line-by-line
    foreach $buf (@pslist) {
        chomp($buf);

        # the pid number we want is in the second column
        (undef,$pid) = split(" ",$buf);

        # print the line if the pid is one of the ones we want
        print($buf,"\n")
            if ($pid_desired{$pid});
    }
}

Use this command:
ps -eaf | grep -f myfilename | grep -v grep | grep -f myfilename

Related

Unable to use -C of grep in Unix Shell Script

I am able to use grep in normal command line.
grep "ABC" Filename -C4
This is giving me the desired output which is 4 lines above and below the matched pattern line.
But if I use the same command in a Unix shell script, I am unable to grep the lines above and below the pattern. It gives me only the lines where the pattern matched, plus an error at the end that says: cannot open grep : -C4
The results are similar if I use -A4 and -B4
I'll assume you need a portable POSIX solution without the GNU extensions (-C NUM, -A NUM, and -B NUM are all GNU, as are arguments following the pattern and/or file name).
POSIX grep can't do this, but POSIX awk can. This can be invoked as e.g. grepC -C4 "ABC" Filename (assuming it is named "grepC", is executable, and is in your $PATH):
#!/bin/sh
die() { echo "$*\nUsage: $0 [-C NUMBER] PATTERN [FILE]..." >&2; exit 2; }
CONTEXT=0 # default value
case $1 in
-C ) CONTEXT="$2"; shift 2 ;; # extract "4" from "-C 4"
-C* ) CONTEXT="${1#-C}"; shift ;; # extract "4" from "-C4"
--|-) shift ;; # no args or use std input (implicit)
-* ) [ -f "$1" ] || die "Illegal option '$1'" ;; # non-option non-file
esac
[ "$CONTEXT" -ge 0 ] 2>/dev/null || die "Invalid context '$CONTEXT'"
[ "$#" = 0 ] && die "Missing PATTERN"
PATTERN="$1"
shift
awk '
/'"$PATTERN"'/ {
    matched = '$CONTEXT'
    # print up to CONTEXT lines of leading context, oldest first
    for (i = '$CONTEXT'; i >= 1; i--) if (NR > i) print last[i]
    print
    next
}
matched { print; matched-- }
{ for (i = '$CONTEXT'; i > 1; i--) last[i] = last[i-1]; last[1] = $0 }
' "$@"
This sets up die as a fatal error function, then finds the desired lines of context from your arguments (either -C NUMBER or -CNUMBER), with an error for unsupported options (unless they're files).
If the context is not a number or there is no pattern, we again fatally error out.
Otherwise, we save the pattern, shift it away, and reserve the rest of the options for handing to awk as files ("$@").
There are three stanzas in this awk call:
Match the pattern itself. This requires ending the single-quote portion of the string in order to incorporate the $PATTERN variable (which may not behave correctly if imported via awk -v). Upon that match, we store the number of lines of context into the matched variable (named so it doesn't clash with awk's built-in match() function), loop through the previous lines saved in the last hash (if we've gone far enough to have had them), and print them, oldest first. We then skip to the next line without evaluating the other two stanzas.
If there was a match, we need the next few lines for context. As this stanza prints them, it decrements the counter. A new match (previous stanza) will reset that count.
We need to save previous lines for recalling upon a match. This loops through the number of lines of context we care about and stores them in the last hash. The current line ($0) is stored in last[1].
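A quick sanity check with a made-up file (assuming the script above is saved as grepC and made executable):
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > nums.txt
./grepC -C2 '^5$' nums.txt     # expect 3 4 5 6 7, one per line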

How to use awk for multiple file search in two directories, print records only from files with matching string in second directory

Remade a previous question so that it is more clear. I'm trying to search files in two directories and print matching character strings (+ line immediately following) into a new file from the second directory only if they match a record in the first directory. I have found similar examples but nothing quite the same. I don't know how to use awk for multiple files from different directories and I've tortured myself trying to figure it out.
Directory 1, 28,000 files, formatted viz.:
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
Directory 2, 15 files, formatted viz.:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Desired output:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Directories 1 and 2 are located in my home directory: (./Test1 & ./Test2)
If anyone could advise how to specify the different directories in the command, I'd be immensely grateful! Currently, when I include the file path (e.g., /Test1/*.fa), I get the following error:
awk: can't open file /Test1/*.fa
You'll want something like this (untested):
awk '
FNR==1 {
    # derive the directory name from the current file name (text before the first "/")
    dirname = FILENAME
    sub("/.*","",dirname)
    if (NR==1) {
        dirname1 = dirname     # remember the first directory on the command line
    }
}
dirname == dirname1 {
    # files from the first directory: odd lines are ">key" headers,
    # even lines are stored as the value for the preceding key
    if (FNR % 2) {
        key = $0
    }
    else {
        map[key] = $0
    }
    next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
    # remaining files: print each header present in the map, followed by the
    # stored value, skipping duplicates
    print $0 ORS map[$0]
}
' Test1/* Test2/*
Given that you're getting the error message /usr/bin/awk: Argument list too long, which means you're exceeding your shell's maximum argument length for a command, and that 28,000 of your files are in the Test1 directory, try this:
find Test1 -type f -exec cat {} \; |
awk '
NR == FNR {
    # first input ("-"): the concatenated Test1 files
    if (FNR % 2) {
        key = $0
    }
    else {
        map[key] = $0
    }
    next
}
(FNR % 2) && ($0 in map) && !seen[$0,map[$0]]++ {
    print $0 ORS map[$0]
}
' - Test2/*
Solution in TXR:
Data:
$ ls dir*
dir1:
file1 file2
dir2:
file1 file2
$ cat dir1/file1
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
$ cat dir1/file2
>XYZ
SDOIWEUROIUOIWUEROIWUEROIWUEROIWUEROUIEIDIDIIDFIFI
>MNO
OOIWEPOIUWERHJSDHSDFJSHDF
$ cat dir2/file1
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
$ cat dir2/file2
>STP
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
$
Run:
$ txr filter.txr dir1/* dir2/*
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
Code in filter.txr:
#(bind want #(hash :equal-based))
#(next :args)
#(all)
#dir/#(skip)
#(and)
# (repeat :gap 0)
#dir/#file
# (next `#dir/#file`)
# (repeat)
>#key
# (do (set [want key] t))
# (end)
# (end)
#(end)
#(repeat)
#path
# (next path)
# (repeat)
>#key
#datum
# (require [want key])
# (output)
>#key
#datum
# (end)
# (end)
#(end)
To separate the dir1 paths from the rest, we use an #(all) match (try multiple pattern branches, which must all match) with two branches. The first branch matches one #dir/#(skip) pattern, binding the variable dir to text that is preceded by a slash, and ignores the rest. The second branch matches a whole consecutive sequence of #dir/#file patterns via #(repeat :gap 0). Because the same dir variable appears that already has a binding from the first branch of the all, this constrains the matches to the same directory name.

Inside this repeat we recurse into each file via next and gather the >-delimited keys into the want hash.

After that, we process the remaining arguments as path names of files to process; they don't all have to be in the same directory. We scan through each one for the >#key pattern followed by a line of #datum. The #(require ...) directive will fail the match if key is not in the want hash; otherwise we fall through to the #(output).

How to retrieve unique IDs from the txt file?

I have a text file containing sequence IDs. This file contains some duplicate IDs, and a few IDs are present more than 2 times. I want to find the unique IDs in one file and the repeated IDs in another file. Furthermore, I am also interested in finding out how many times each repeated ID is present in the file.
I found the duplicated sequences using the following command:
$ cat id.txt | grep '^>' | sort | uniq -d > dupid.txt
This gives me the duplicated sequences in the "dupid.txt" file. But how do I get those that are present more than 2 times, and how many times they are present? Secondly, how do I find the unique sequences?
A bit of searching might have found this answer, with many suggestions on traditional uses of uniq.
Also, note that:
$ cat id.txt | grep '^>'
...is basically the same as:
$ grep '^>' id.txt
The so-called "Useless Use Of Cat"
But to your question - find uniq ids, dupes, and dupes with counts - here's a try using awk that processes its stdin, and writes to three output files the user must name, trying to avoid clobbering output files that already exist. One pass, but holds all input in memory before starting output.
#!/bin/bash
[ $# -eq 3 ] || { echo "Usage: $(basename $0) <uniqs> <dupes> <dupes_counts>" 1>&2; exit 1; }
chk() {
[ -e "$1" ] && { echo "$1: already exists" 1>&2; return 1; }
return $2
}
chk "$1" 0; chk "$2" $?; chk "$3" $? || exit 1
awk -v u="$1" -v d="$2" -v dc="$3" '
{
idc[$0]++
}
END {
for (id in idc) {
if (idc[id] == 1) {
print id >> u
} else {
print id >> d
printf "%d:%s\n", idc[id], id >> dc
}
}
}
'
Save as (for example) "doit.sh", and then invoke via:
$ grep '^>' id.txt | doit.sh uniques.txt dupes.txt dupes_counts.txt
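For comparison, the same kind of outputs can be produced with sort and uniq alone; a rough sketch (multiple passes over the data, file names are just examples):
grep '^>' id.txt | sort > ids.sorted
uniq -u ids.sorted > uniques.txt          # IDs occurring exactly once
uniq -d ids.sorted > dupes.txt            # IDs occurring more than once
uniq -c ids.sorted | awk '$1 > 1' > dupes_counts.txt   # counts for the repeated IDs
uniq -c ids.sorted | awk '$1 > 2'                      # just the IDs present more than twice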

Awk command in paragraph mode but skipping blank lines

I have one file with several elements <elem>...</elem>. I need to split this file into n files with m elements each one (argument passed to awk command I am using). For example if my original file has 40 elements, I would want to split in 3 files (10 elements, 13 elements and 17 elements).
The problem is that the original file has elements with different structures.
EDITED AFTER fedorqui comment:
I run as many awk commands as files I want to get at the end of the
process. That means if I need 3 files with m1, m2 and m3 elements, I
will execute 3 awk commands with different parameters.
Example of input (file.txt) (5 elements)
<elem>aaaaaaaa1</elem>
<elem>aaaaaaaa2</elem>
<elem>bbbbbbbb
bbbbbbbbb
bbbbbbbbb</elem>
<elem>bbbbbbbb2</elem>
<elem>ccccc
cccc</elem>
As you can see, the 1st/2nd/4th elements are on one line, the 3rd element spans 3 lines without blank lines, and the 5th element spans 3 lines with a blank line.
Blank lines between elements are not a problem, but blank lines inside an element make it fail.
Example of desired output:
file_1.txt (2 elements)
<elem>aaaaaaaa1</elem>
<elem>aaaaaaaa2</elem>
file_2.txt (2 elements)
<elem>bbbbbbbb
bbbbbbbbb
bbbbbbbbb</elem>
<elem>bbbbbbbb2</elem>
file_3.txt (1 element)
<elem>ccccc
cccc</elem>
AWK command
(suffixFile is the suffix number of the file. For example fileAux_1.txt, fileAux_2.txt...)
Attempt1
awk -v numElems=$1 -v suffixFile=$2 '{
for(i=1;i<=numElems;i++) {
printf "<doc>"$i > "fileAux_" suffixFile".txt"
}
}' RS='' FS='<doc>' file.txt
It works except for blank lines inside an element. I understand why it fails: RS='' tells awk to split records on blank lines.
Attempt 2
awk -v numElems=$1 -v suffixFile=$2 '{
for(i=1;i<=numElems;i++) {
printf $i > "fileAux_" suffixFile".txt"
}
}' RS='<doc>' FS='<doc>' file.txt
Another approach, but it also fails.
Can anyone help me?
Thanks in advance!
Assuming I understood your challenge correctly, here is my attempt:
$ cat script.sh
#!/bin/bash
awk -v numElems=$1 -v suffixFile=$2 '
/<elem>/   { var++ }
/<\/elem>/ { var--; count++ }
{
    if (count < numElems || (count == numElems && var == 0)) {
        print $0 >> "file_" suffixFile ".txt"
    } else {
        print $0
    }
}' $3
The script keeps track of open <elem> and </elem> tags with the var variable and counts completed elements with count. Then an if statement decides whether to push the line to the output file or not. Once the requested number of elements is reached, the rest of the file is printed to stdout so you can reiterate the process using pipes.
Here is an example of how to run it with the final output:
$ ./script.sh 2 1 file.txt | ./script.sh 2 2 | ./script.sh 1 3
$ tail -n +1 file_*
==> file_1.txt <==
<elem>aaaaaaaa1</elem>
<elem>aaaaaaaa2</elem>
==> file_2.txt <==
<elem>bbbbbbbb
bbbbbbbbb
bbbbbbbbb</elem>
<elem>bbbbbbbb2</elem>
==> file_3.txt <==
<elem>ccccc
cccc</elem>
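If GNU awk is available, another option is to read one element per record using a multi-character RS, so embedded blank lines don't matter. This is an untested sketch with the same three arguments as the script above (numElems, suffixFile, input file):
#!/bin/bash
awk -v numElems=$1 -v suffixFile=$2 '
BEGIN { RS = "</elem>\n" }                 # one record per element (GNU awk multi-character RS)
NR <= numElems { print $0 "</elem>" > ("file_" suffixFile ".txt"); next }
{ print $0 "</elem>" }                     # pass remaining elements to stdout for the next pipe stage
' $3
One caveat: a blank line between two elements is carried into the start of the following record.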

How can I show file sizes with commas when getting a directory listing with 'ls -l'?

You can do 'ls -l' to get a detailed directory listing like this:
-rw-rw-rw- 1 alice themonkeys 1159995999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244251992 2008-08-20 05:30 bar.txt
But notice how you have to slide your finger along the screen to figure out the order of magnitude of those file sizes.
What's a good way to add commas to the file sizes in the directory listing, like this:
-rw-rw-rw- 1 alice themonkeys 1,159,995,999 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244,251,992 2008-08-20 05:30 bar.txt
I just discovered that it's built-in to GNU Core Utils and works for ls and du!
ls -l --block-size="'1"
du --block-size="'1"
It works on Ubuntu but sadly doesn't on OSX. More on variants of block size here
If the order of magnitude is all you're interested in, ls -lh does something like this:
-rw-r----- 1 alice themonkeys 626M 2007-02-05 01:15 foo.log
-rw-rw-r-- 1 bob bob 699M 2007-03-12 23:14 bar.txt
I don't think 'ls' has exactly that capability. If you are looking for readability, 'ls -lh' will give you file sizes that are easier for humans to parse.
-rw-rw-rw- 1 alice themonkeys 1.2G 2008-08-20 07:01 foo.log
-rw-rw-rw- 1 bob bob 244M 2008-08-20 05:30 bar.txt
Here's an improvement to commafy.pl, it allows you to use ls with or without listing the file sizes. Alias ls to commafy.pl to use it.
#!/usr/bin/perl
# Does ls and adds commas to numbers if ls -l is used.

$largest_number_of_commas = 0;
$result = `ls -C @ARGV`;

# First substitution adds five spaces before the file size
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+)/$1     /gm;
$result =~ s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/eg;

$remove_extra_spaces = 5 - $largest_number_of_commas;
$result =~ s/(^[-lrwxds]{10,}\s*[^\s]+\s*[^\s]+\s*[^\s]+) {$remove_extra_spaces}/$1/gm;

print $result;

# adds commas to an integer as appropriate
sub commafy
{
    my($num) = @_;
    my $len = length($num);
    if ($len <= 3) { return $num; }
    return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}

# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
    my($str, $num) = @_;
    $numCommas = int((length($num)-1) / 3);
    if ($numCommas > $largest_number_of_commas) {$largest_number_of_commas = $numCommas}
    return substr($str, 0, length($str) - $numCommas);
}
This common sed script should work:
ls -l | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
However, I agree with the earlier comment suggesting ls -lh is probably the better general solution for the desired effect.
This is on OS X, so you might have to tweak it a bit for your Unix flavor. I created such a function for this purpose in my ~/.bashrc dot file. The trick is using ' in the awk printf format string for the file size. A caveat: the awk mangles the "total" first line somewhat, and loses terminal coloration as well. Otherwise, one of its merits is that it tries to keep columns aligned as much as possible. To me this instantly gives a visual estimation of how big a file is. The -h switch solution is okay, but your brain needs to convert those Ks, Bs, Gs. The biggest advantage to the solution below is that you can pipe it to sort and sort would understand it. As in "lc | sort -k5,5nr" for example.
lc() {
/bin/ls -l -GPT | /usr/bin/awk "{
printf \"%-11s \", \$1;
printf \"%3s \", \$2;
printf \"%-6s \", \$3;
printf \"%-6s \", \$4;
printf \"%'12d \", \$5;
printf \"%3s \", \$6;
printf \"%2s \", \$7;
for (i=8; i<=NF; i++) {
printf \"%s \", \$i
};
printf \"\n\";
}"
}
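The %' flag used above is the C printf thousands-grouping flag; a quick way to check whether your awk and locale honor it (illustrative one-liner, output varies by platform):
awk "BEGIN { printf(\"%'d\n\", 1234567) }"     # 1,234,567 if the flag and locale are supported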
Here's a perl script that will filter the output of 'ls -l' to add the commas.
If you call the script commafy.pl then you can alias 'ls' to 'ls -l | commafy.pl'.
#!/usr/bin/perl -p
# pipe the output of ls -l through this to add commas to numbers.

s/(.{5} )(\d{4,}) /truncatePre($1,$2).commafy($2).' '/e;

# adds commas to an integer as appropriate
sub commafy
{
    my($num) = @_;
    my $len = length($num);
    if ($len <= 3) { return $num; }
    return commafy(substr($num, 0, $len - 3)) . ',' . substr($num, -3);
}

# removes as many chars from the end of str as there are commas to be added
# to num
sub truncatePre
{
    my($str, $num) = @_;
    $numCommas = int((length($num)-1) / 3);
    return substr($str, 0, length($str) - $numCommas);
}
Actually, I was looking for a test for a young trainee and this seemed ideal. Here's what he came up with:
for i in $(ls -1)
do
sz=$(expr $(ls -ld $i | awk '{print $5}' | wc -c) - 1)
printf "%10d %s\n" $sz $i
done
It gives the order of magnitude for the size in a horribly inefficient way. I'll make this community wiki since we're both interested in how you rate his code, but I don't want my rep to suffer.
Feel free to leave comments (be gentle, he's a newbie, though you wouldn't guess it by his shell scripting :-).
I wrote this several years ago; it works on stdin:
Read stdin & insert commas in numbers for readability; emit to stdout.
Example:
$ ls -l testdatafile.1000M
-rw-rw-r--+ 1 mkm wheel 1048576000 Apr 24 12:45 testdatafile.1000M
$ ls -l testdatafile.1000M | commas
-rw-rw-r--+ 1 mkm wheel 1,048,576,000 Apr 24 12:45 testdatafile.1000M
https://github.com/mikemakuch/commas
export LS_BLOCK_SIZE="'1"
will do this for recent versions of GNU ls.
