grep multiple files get count of unique cut

I think I'm close on this, and saw similar questions but couldn't get it to work as I want. So, I have several log files and I would like to count the occurrences of several different service calls by date.
First I tried the below, the cut is just to get the first element (date) and 11th element (name of service call), which is specific to my log file:
grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
But this returned something that looks like:
5 log_1.log:2017-12-05 getLegs()
10 log_1.log:2017-12-05 getArms()
7 log_2.log:2017-12-05 getLegs()
13 log_2.log:2017-12-04 getLegs()
What I really want is:
12 2017-12-05 getLegs()
10 2017-12-05 getArms()
13 2017-12-04 getLegs()
I've seen examples where they cat * first, but looks like the same problem.
cat * | grep -E "invoking webservice" *.log* | cut -d ' ' -f1 -f11 | sort | uniq -c
What am I doing wrong? As always, thanks a lot!

Your issue is that grep prefixes each matched line with its filename. (grep does this whenever more than one file is given, to disambiguate the results.) Pass the -h flag to grep to suppress the filenames:
grep -h "invoking webservice" *.log | cut -d ' ' -f1,11 | sort | uniq -c
Note that I dropped the -E flag, because it enables extended regex support and your pattern doesn't need it. I also wrote the field list as -f1,11, since some cut implementations reject or ignore a repeated -f option.
Alternatively, you could use cat to dump the content of the files to standard output and pipe that to grep. That works because grep then reads a single stream and has no filenames to print:
cat *.log | grep "invoking webservice" | cut -d ' ' -f1,11 | sort | uniq -c
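To see the effect of -h in isolation, here is a minimal sketch that can be run in a scratch directory. The log lines, the file names, and the `prefix` variable are made up for illustration; they only imitate the shape described in the question (date in field 1, service call in field 11).

```shell
#!/bin/sh
# Hypothetical sample data: two small log files in a temporary directory.
tmpdir=$(mktemp -d)
prefix='10:00:00 INFO [main] Client - now invoking webservice call'
printf '%s\n' "2017-12-05 $prefix getLegs()" "2017-12-05 $prefix getArms()" > "$tmpdir/log_1.log"
printf '%s\n' "2017-12-05 $prefix getLegs()" "2017-12-04 $prefix getLegs()" > "$tmpdir/log_2.log"

# -h drops the "file:" prefix, so identical date/service pairs coming
# from different files end up in the same uniq -c bucket.
counts=$(grep -h "invoking webservice" "$tmpdir"/*.log | cut -d ' ' -f1,11 | sort | uniq -c)
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

With this sample, the two getLegs() calls on 2017-12-05 are counted together even though they come from different files.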

Related

Get a list of unique sender(from=) domains in postfix maillog

I am currently trying to extract all the sender domains from maillog. I can get part of the way with the command below, but the output is not quite what I want. What would be the best approach to retrieve a unique list of sender domains from maillog?
grep from= /var/log/maillog | awk '{print $7}' | sort | uniq -c | sort -n
output
1 from=<user#test.com>,
1 from=<apache#app1.com>,
2 from=<bounceld_5BFa-bx0p-P3tQ-67Nn#example.com>,
2 from=<bounceld_19iI-HqaS-usVU-fqe5#example.com>,
12 reject:
666 from=<>,
desired output
test.com
app1.com
example.com

See useless use of grep; if you are using Awk anyway, you don't really need grep at all.
awk '$7 ~ /from=.*#/{split($7, a, /[#>]/); ++count[a[2]] }
END { for(dom in count) print count[dom], dom }' /var/log/maillog
Collecting the counts in an associative array does away with the need to call sort and uniq, too. Splitting on [#>] rather than on # alone keeps the trailing >, out of the domain. Obviously, if you don't care about the count, don't print count[dom] at the end.
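The associative-array approach can be tried without a real maillog. The sketch below feeds a few invented log lines to awk; the host name, queue IDs, and addresses are hypothetical, and "#" separates user and domain only because the output shown in the question does so (adjust the character class if your log uses a different separator).

```shell
#!/bin/sh
# Hypothetical maillog lines; field 7 carries the from= address.
sample='Dec  5 10:00:01 mx1 postfix/qmgr[111]: A1: from=<user#test.com>, size=100
Dec  5 10:00:02 mx1 postfix/qmgr[111]: A2: from=<apache#app1.com>, size=100
Dec  5 10:00:03 mx1 postfix/qmgr[111]: A3: from=<bounce_1#example.com>, size=100
Dec  5 10:00:04 mx1 postfix/qmgr[111]: A4: from=<bounce_2#example.com>, size=100
Dec  5 10:00:05 mx1 postfix/qmgr[111]: A5: from=<>, size=100'

# Split field 7 on "#" or ">" so a[2] holds only the domain, count each
# domain in an array, then sort by domain for a stable display order.
domains=$(printf '%s\n' "$sample" |
  awk '$7 ~ /from=.*#/ { split($7, a, /[#>]/); ++count[a[2]] }
       END { for (dom in count) print count[dom], dom }' |
  LC_ALL=C sort -k 2)
printf '%s\n' "$domains"
```

The empty sender (from=<>) never matches the pattern, so it does not show up in the counts. The final sort is only there because awk's for-in loop visits array keys in an unspecified order.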

This should give you the answer:
grep from= /var/log/maillog | awk '{print $7}' | grep -Po '(?=#).{1}\K.*(?=>)' | sort -n | uniq -c
... change last items to "| sort | uniq" to remove the counts.
References:
https://www.baeldung.com/linux/bash-remove-first-characters {1}\K use
Extract email addresses from log with grep or sed -Po grep function

Unix piping in system() does not work in R

I am using the system() command to run a Bash command in R, but every time I try to pipe the results of one command into the next (using '|'), I get some error.
For example:
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
returns the error:
Error: unexpected '{' in "system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{"
If I remove awk -F "\t" '{print $2}', so that I'm left with
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p')
I get the following:
/usr/bin/grep: 2-: No such file or directory
[1] 2
I have to keep removing parts of it till I am left with only system('grep ^SN bam_stats.txt'), AKA no pipes are left, for it to work.
Here is a sample from the file 'bam_stats.txt' from which I'm extracting information:
SN filtered sequences: 0
SN sequences: 137710356
SN is sorted: 1
SN 1st fragments: 68855178
SN last fragments: 68855178
SN reads mapped: 137642653
SN reads mapped and paired: 137602018 # paired-end technology bit set + both mates mapped
SN reads unmapped: 67703
SN percentage of properly paired reads (%): 99.8
Can someone tell me why piping is not working? Apologies if this is a stupid question. Please let me know if I should provide more information.
Thank you in advance.

I don't know R, but if R's implementation of system() just passes its argument to a shell, then, in terms of standard Unix quoting, your example
system('grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t" '{print $2}'')
contains 2 strings within quotes and a string in the middle that's outside of quotes:
Inside: grep ^SN bam_stats.txt | cut -f 2- | sed -n 8p | awk -F "\t"
Outside: {print $2}
Inside: <a null string>
because the 2 quotes in the middle around '{print $2}' are ending the first quoted string then later starting a second quoted string.
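You don't need sed, grep, or cut if you're using awk anyway, though, so try just this (the backslashes are doubled because R rejects \$ as an unrecognized escape inside a string literal):
system('awk -F"\\t" "/^SN/ && (++cnt==8){print \\$3}" bam_stats.txt')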
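Leaving R's quoting aside, the awk command itself can be checked directly in a shell. This sketch rebuilds the sample from the question as tab-separated input, which is an assumption based on the use of cut -f and awk -F "\t" in the question, and picks the third field of the eighth SN line.

```shell
#!/bin/sh
# Hypothetical tab-separated data modelled on the bam_stats.txt sample.
# printf reuses the format string for each label/value pair.
value=$(printf 'SN\t%s\t%s\n' \
    'filtered sequences:' 0 \
    'sequences:' 137710356 \
    'is sorted:' 1 \
    '1st fragments:' 68855178 \
    'last fragments:' 68855178 \
    'reads mapped:' 137642653 \
    'reads mapped and paired:' 137602018 \
    'reads unmapped:' 67703 \
    'percentage of properly paired reads (%):' 99.8 |
  awk -F '\t' '/^SN/ && (++cnt == 8) { print $3 }')

# The eighth SN line is "reads unmapped:", so this prints its value.
printf '%s\n' "$value"
```

Once the command behaves as expected in the shell, the only remaining work is getting it through R's string quoting intact.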

Most frequently used commands during the last x months

I know how to get the most used shell commands in zsh with
history 1 | awk '{$1="";print substr($0,2)}' | sort | uniq -c | sort -n | tail -n 20
but is there a way to restrict myself to let's say the last two or three months?
I need this because I would like to create aliases for the commands I am currently using most.

history in zsh has several flags to show date and time stamps. For this to work, you have to add setopt extended_history to your .zshrc file.
If you have extended_history enabled, history -i will show full time-date stamps in ISO8601 'yyyy-mm-dd hh:mm' format. Dates in this format can be compared as strings, so just change your awk script to select only the lines on or after some date.
history -i 1 | awk '{ if ($2 >= "2020-05-01") { $1=$2=$3="";print $0; } }' | sort | uniq -c | sort -n -r | head -n 20
Be aware that if you have HIST_IGNORE_ALL_DUPS or HIST_IGNORE_DUPS options enabled, this will not work as intended.
You can also use the date command to compute the cutoff date automatically.
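As a rough sketch of that idea, the cutoff can be computed once and handed to awk with -v. The sample lines below are invented and only mimic the column layout of history -i 1 (event number, date, time, command); in zsh you would pipe history -i 1 into the same awk command instead. GNU date understands -d '3 months ago', while BSD and macOS date use -v-3m, so the sketch tries one and falls back to the other.

```shell
#!/bin/sh
# Cutoff three months back: GNU date syntax first, BSD/macOS as fallback.
cutoff=$(date -d '3 months ago' +%F 2>/dev/null || date -v-3m +%F)

# Hypothetical lines in the shape printed by `history -i 1`.
sample='    1  1999-01-01 09:00  ls -l
    2  2999-01-01 09:00  git status
    3  2999-01-02 09:00  git status'

# Keep entries dated on or after the cutoff, blank out the first three
# columns, trim the leftover leading spaces, then count and rank.
recent=$(printf '%s\n' "$sample" |
  awk -v cutoff="$cutoff" '$2 >= cutoff { $1 = $2 = $3 = ""; sub(/^ +/, ""); print }' |
  sort | uniq -c | sort -rn)
printf '%s\n' "$recent"
```

The far-past and far-future dates in the sample are deliberate, so the result does not depend on the day the sketch is run.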

I tried various cut commands but unable to get the output I desire

cat DecisionService.txt
/MAGI/Household/MAGI_EDG_FLOW.erf;/Medicaid/MAGI_EDG_FLOW;4;4
/VCL/VCL_Ruleflow_1.erf;/VCL/VCL1_EBDC_FLOW;4;4
/VCL/VCL_Ruleflow_2.erf;/VCL/VCL2_EBDC_FLOW;4;4
I tried this:
cat DecisionService.txt | cut -d ';' -f2 | cut -d '/' -f2 | tr -s ' ' '\n'
My output is:
$i=Medicaid
VCL
VCL
Whereas I need the output to be:
$a=Medicaid
$b=VCL

If you just want the unique values then:
awk -F'/' 'NF&&!a[$(NF-1)]++{print $(NF-1)}' file
Medicaid
VCL
If you actually want the output to contain prefixed incremental variables then:
awk -F'/' 'NF&&!a[$(NF-1)]++{printf "$%c=%s\n",i++,$(NF-1)}' i=97 file
$a=Medicaid
$b=VCL
Note: If your input may contain more than 26 unique values, you will need to do something cleverer to avoid output such as $|=VCL.
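One possible way around the 26-letter limit, offered only as a sketch, is to number the variables instead of lettering them. The $v1, $v2 naming is an invented convention, not something the question asked for, and the `seen` and `n` names are arbitrary.

```shell
#!/bin/sh
# Sample input copied from the question.
input='/MAGI/Household/MAGI_EDG_FLOW.erf;/Medicaid/MAGI_EDG_FLOW;4;4
/VCL/VCL_Ruleflow_1.erf;/VCL/VCL1_EBDC_FLOW;4;4
/VCL/VCL_Ruleflow_2.erf;/VCL/VCL2_EBDC_FLOW;4;4'

# Same de-duplication idiom as above, but with a numeric suffix, so the
# names never run past the end of the alphabet.
vars=$(printf '%s\n' "$input" |
  awk -F'/' 'NF && !seen[$(NF-1)]++ { printf "$v%d=%s\n", ++n, $(NF-1) }')
printf '%s\n' "$vars"
```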

From the question, it's not entirely clear what you want, but I guess you don't want VCL repeated in the output. Try adding sort and uniq at the end.
cat DecisionService.txt
/MAGI/Household/MAGI_EDG_FLOW.erf;/Medicaid/MAGI_EDG_FLOW;4;4
/VCL/VCL_Ruleflow_1.erf;/VCL/VCL1_EBDC_FLOW;4;4
/VCL/VCL_Ruleflow_2.erf;/VCL/VCL2_EBDC_FLOW;4;4
cat DecisionService.txt | cut -d ';' -f2 | cut -d '/' -f2 | tr -s ' ' '\n'|sort|uniq
Medicaid
VCL

Sorting ls -l owners in Unix

I want to sort the owners in alphabetical order from a call to ls -l and cannot figure out a way to do it. I know something like ls -l | sort would sort on the start of each line, but how do I sort by owner?

The owner is the third field, so use -k 3:
ls -l | sort -k 3
You can extend this idea to sorting based on other fields, and you can have multiple -k options. For instance, maybe you want to sort by owner, and then size in descending order:
ls -l | sort -k 3,3 -k 5rn
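The two-key sort can be tried on a fixed listing, which keeps the result independent of whatever is in the current directory. The three lines below are made-up ls -l style output. The size key is written as 5,5rn here so that it stops at the end of field 5, a slightly stricter form of the key used above.

```shell
#!/bin/sh
# Hypothetical ls -l style lines: owner in field 3, size in field 5.
listing='-rw-r--r-- 1 bob staff 100 Jan 1 10:00 a.txt
-rw-r--r-- 1 alice staff 50 Jan 1 10:00 b.txt
-rw-r--r-- 1 alice staff 900 Jan 1 10:00 c.txt'

# Primary key: owner, ascending. Secondary key: size, numeric, descending.
sorted=$(printf '%s\n' "$listing" | LC_ALL=C sort -k 3,3 -k 5,5rn)
printf '%s\n' "$sorted"
```

Both of alice's files come first, the larger one ahead of the smaller, followed by bob's file.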

I am not sure if you want only the owners or the whole information sorted by owner. In the former case superfo's solution is almost correct.
Additionally you need to remove repeating white spaces from ls's output with tr because otherwise cut that uses them as a delimiter won't work in all directories.*
So in the end you get this:
ls -l | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq
*Some directories have a two digit value in the second field and all other lines with a single digit get an additional whitespace to preserve the layout.
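The padding problem is easy to reproduce on a fixed listing. In this sketch the two lines are invented ls -l style output (the "total" header line is left out), with a two-digit link count on one line so that the other line is padded with extra spaces.

```shell
#!/bin/sh
# Hypothetical listing: the second line is padded to keep columns aligned.
listing='drwxr-xr-x 12 alice staff 4096 Jan 1 10:00 dir
-rw-r--r--  1 bob   staff  100 Jan 1 10:00 a.txt'

# Without squeezing, the doubled space creates an empty field, so
# "field 3" of the padded line is the link count rather than the owner.
raw=$(printf '%s\n' "$listing" | cut -d ' ' -f 3 | sort | uniq)

# With tr -s, runs of spaces collapse and field 3 is the owner on every line.
squeezed=$(printf '%s\n' "$listing" | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq)

printf 'without tr: %s\n' "$(printf '%s' "$raw" | tr '\n' ' ')"
printf 'with tr -s: %s\n' "$(printf '%s' "$squeezed" | tr '\n' ' ')"
```

awk splits on runs of whitespace by default, so something like awk '{print $3}' would sidestep the issue without tr.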

How about ...
ls -l | cut -d ' ' -f 3 | sort | uniq

Try this:
ls -l | awk '{print $3, $4, $8}' | sort
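It will print the user name, the group name and the file name, provided the file name is field 8 in your ls -l output and contains no spaces. (Where the date takes up three fields, the file name is $9 instead.)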
ls -l | awk '{print $3, $4, $0}' | sort
This will print the user name, group name and the full ls -l output, sorted by the user name first, then the group name, then what ls -l prints first