Get a list of unique sender (from=) domains in postfix maillog

I am currently trying to extract all the sender domains from the maillog. I am able to do some of that with the command below, but the output is not quite what I want. What would be the best approach to retrieve a unique list of sender domains from the maillog?
grep from= /var/log/maillog | awk '{print $7}' | sort | uniq -c | sort -n
output
1 from=<user@test.com>,
1 from=<apache@app1.com>,
2 from=<bounceld_5BFa-bx0p-P3tQ-67Nn@example.com>,
2 from=<bounceld_19iI-HqaS-usVU-fqe5@example.com>,
12 reject:
666 from=<>,
desired output
test.com
app1.com
example.com

This is a useless use of grep; if you are using Awk anyway, you don't really need grep at all.
awk '$7 ~ /from=.*@/ { split($7, a, /[@>]/); ++count[a[2]] }
END { for (dom in count) print count[dom], dom }' /var/log/maillog
Collecting the counts in an associative array does away with the need to call sort and uniq, too. Obviously, if you don't care about the count, don't print count[dom] at the end.
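If you only want the unique domains themselves, a minimal variant of the same idea (still assuming the from=<...> token is field 7) is:
awk '$7 ~ /from=.*@/ { split($7, a, /[@>]/); domains[a[2]] } END { for (dom in domains) print dom }' /var/log/maillog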

This should give you the answer:
grep from= /var/log/maillog | awk '{print $7}' | grep -Po '(?=@).{1}\K.*(?=>)' | sort | uniq -c
... change the last two commands to "| sort | uniq" to remove the counts.
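If your grep supports -P, a shorter single-pass sketch (assuming every interesting line carries a from=<user@domain> token) is:
grep -oP 'from=<[^>@]*@\K[^>]*' /var/log/maillog | sort -u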
References:
https://www.baeldung.com/linux/bash-remove-first-characters (the {1}\K idiom)
"Extract email addresses from log with grep or sed" (the grep -Po approach)

Related

Unix piping in system() does not work in R

I am using the system() command to run a Bash command from R, but every time I try to pipe the result of one command into the next (using '|'), I get an error.
For example:
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
returns the error:
Error: unexpected '{' in "system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{"
If I remove awk -F "\t" '{print $2}' so that I'm left with system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p'), I get the following:
/usr/bin/grep: 2-: No such file or directory
[1] 2
I have to keep removing parts of it until I am left with only system('grep ^SN bam_stats.txt'), with no pipes at all, before it works.
Here is a sample from the file 'bam_stats.txt' from which I'm extracting information:
SN filtered sequences: 0
SN sequences: 137710356
SN is sorted: 1
SN 1st fragments: 68855178
SN last fragments: 68855178
SN reads mapped: 137642653
SN reads mapped and paired: 137602018 # paired-end technology bit set + both mates mapped
SN reads unmapped: 67703
SN percentage of properly paired reads (%): 99.8
Can someone tell me why piping is not working? Apologies if this is a stupid question. Please let me know if I should provide more information.
Thank you in advance.
I don't know R, but if R's implementation of system() just passes its argument to a shell, then, in terms of standard Unix quoting, your example
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
contains 2 strings within quotes and a string in the middle that's outside of quotes:
Inside: grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t"
Outside: {print $2}
Inside: <a null string>
because the two quotes in the middle, around {print $2}, end the first quoted string and then later start a second quoted string.
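A toy shell example (hypothetical, not the R call itself) shows the same gluing:
printf '%s\n' 'foo'bar'baz'
This prints foobarbaz, because the shell concatenates the two quoted pieces and the unquoted middle into a single word.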
You don't need sed, grep, or cut if you're using awk anyway, though, so try just this:
system('awk -F"\t" "/^SN/ && (++cnt==8){print \\$3}" bam_stats.txt')
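Alternatively, you can keep the original pipeline and avoid the quoting clash by using double quotes for the R string and single quotes for the shell (a sketch; R turns the \t into a literal tab, which awk accepts as a field separator, and the single quotes protect {print $2} from both R and the shell):
system("grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F '\t' '{print $2}'")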

Can someone explain the following Unix command

I want to validate the file. As part of the validation, I need to check the length of each column, whether each field is null or not, and the primary key constraint of the file.
cat File_name | awk -F '|' '{print NF}' | sort | uniq
This command splits each line of the file into tokens using the pipe | as the delimiter, prints the number of tokens on each row (the NF variable), sorts the output (sort command), and at the end keeps only the unique numbers (uniq command). A well-formed file should produce a single number; multiple numbers mean some rows have extra or missing columns.
The pipeline can be optimised by getting rid of the cat command, letting awk read the file directly, and using sort's -u option to get unique records:
awk -F '|' '{print NF}' file_name | sort -u
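For the null checks mentioned in the question, a similar awk one-liner can flag offending rows (a sketch that assumes pipe-delimited data and treats an empty field as null):
awk -F '|' '{ for (i = 1; i <= NF; i++) if ($i == "") print "line " NR ": field " i " is empty" }' file_name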

grep multiple files get count of unique cut

I think I'm close on this, and I saw similar questions, but I couldn't get it to work the way I want. So, I have several log files and I would like to count the occurrences of several different service calls by date.
First I tried the below; the cut is just to get the first element (the date) and the 11th element (the name of the service call), which is specific to my log file:
grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
But this returned something that looks like:
5 log_1.log:2017-12-05 getLegs()
10 log_1.log:2017-12-05 getArms()
7 log_2.log:2017-12-05 getLegs()
13 log_2.log:2017-12-04 getLegs()
What I really want is:
12 2017-12-05 getLegs()
10 2017-12-05 getArms()
13 2017-12-04 getLegs()
I've seen examples where they cat * first, but looks like the same problem.
cat * | grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
What am I doing wrong? As always, thanks a lot!
Your issue seems to be that grep prefixes the matched lines with the filenames. (grep does this when multiple filenames are specified, to disambiguate the results.) You can pass the -h flag to grep so it doesn't print the filenames:
grep -h "invoking webservice" *.log | cut -d ' ' -f1 -f11 | sort | uniq -c
Note that I dropped the -E flag, because it is used to enable extended regex support, and your example doesn't need it.
Alternatively, you could use cat to dump the content of files to standard output, and pipe that to grep. That would work, because it removes the need for filename parameters for grep:
cat *.log | grep "invoking webservice" | cut -d ' ' -f1 -f11 | sort | uniq -c
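You could also let awk do the matching, extracting and counting in a single pass (assuming the same field positions the cut command relies on):
awk '/invoking webservice/ { count[$1 " " $11]++ } END { for (k in count) print count[k], k }' *.log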

sorting ls -l owners in Unix

I want to sort the owners in alphabetical order from a call to ls -l and cannot figure out a way to do it. I know something like ls -l | sort would sort the file names, but how do I sort the owners in order?
The owner is the third field, so use -k 3:
ls -l | sort -k 3
You can extend this idea to sorting based on other fields, and you can have multiple -k options. For instance, maybe you want to sort by owner, and then by size in descending order:
ls -l | sort -k 3,3 -k 5rn
(-k 3,3 restricts the key to field 3 only; a plain -k 3 would sort from field 3 through the end of the line.)
I am not sure if you want only the owners or the whole listing sorted by owner. In the former case superfo's solution is almost correct.
Additionally, you need to squeeze the repeated whitespace in ls's output with tr, because otherwise cut, which uses a single space as its delimiter, won't work in all directories.*
So in the end you get this:
ls -l | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq
*Some directories have a two-digit value in the second (link count) field, and all the other lines, with a single digit there, get an additional space to preserve the column layout.
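An awk alternative sidesteps the whitespace issue entirely, since awk splits on runs of whitespace by default (NR > 1 skips the "total" line that ls -l prints first):
ls -l | awk 'NR > 1 { print $3 }' | sort -u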
How about ...
ls -l | cut -d ' ' -f 3 | sort | uniq
Try this:
ls -l | awk '{print $3, $4, $9}' | sort
It will print the user name, the group name and the file name ($9 on typical ls -l output; file names containing spaces will not print in full, since awk splits on whitespace).
ls -l | awk '{print $3, $4, $0}' | sort
This will print the user name, the group name and the full ls -l line, sorted by the user name first, then the group name, then the rest of the ls -l output.

Unix uniq utility: What is wrong with this code?

What I want to accomplish: print duplicated lines
This is what uniq man says:
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
...
-d, --repeated
only print duplicate lines
This is what I try to execute:
root@laptop:/var/www# cat file.tmp
Foo
Bar
Foo
Baz
Qux
root@laptop:/var/www# cat file.tmp | uniq --repeated
root@laptop:/var/www#
So I was expecting Foo in this example, but it returns nothing.
What is wrong with this snippet?
uniq only checks consecutive lines against each other. So you can only expect to see something printed if there are two or more Foo lines in a row, for example.
If you want to get around that, sort the file first with sort.
$ sort file.tmp | uniq -d
Foo
If you really need to have all the non-consecutive duplicate lines printed in the order they occur in the file, you can use awk for that:
$ awk '{ if ($0 in lines) print $0; lines[$0]=1; }' file.tmp
but for a large file, that may be less efficient than sort and uniq. (Maybe; I haven't tried.)
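If you also want to see how often each duplicate occurs, uniq -c prefixes every line with its count, and awk can then filter to counts above one (a sketch of the same idea):
sort file.tmp | uniq -c | awk '$1 > 1'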
cat file.tmp | sort | uniq --repeated
or
sort file.tmp | uniq --repeated
cat file.tmp | sort | uniq --repeated
the lines need to be sorted first
uniq operates on adjacent lines, so what you want is
cat file.tmp | sort | uniq --repeated
On OS X, I would actually use
sort file.tmp | uniq -d
I've never tried this myself, but I think the word "successive" is the key.
This would probably work if you sorted the input before running uniq over it.
Something like
sort file.tmp | uniq -d
