How to append a suffix to every string matching a regular expression in Unix

I need to replace every occurrence of a string in a specific format (in my case, a colon followed by a number) in a file with the same string plus a suffix, like this:
:123456 -> :123456_suffix
Is there a way to do it with sed or another Unix command-line tool?

Sed should do that:
sed -i~ -e 's/:\([0-9]\{1,\}\)/:\1_suffix/g' file
Breaking down the expression:
\(        start capture group
[0-9]     any digit
\{1,\}    one or more times
\)        end capture group
\1        the contents of the first capture group
g         globally, i.e. not just the first occurrence on a line
If -i is not supported, just create a new file and replace the old one:
sed ... > newfile
mv oldfile oldfile~ # a backup
mv newfile oldfile
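When sed -i is unavailable, the new-file-and-rename steps above can be wrapped in a small function. A minimal sketch, assuming a POSIX sh with mktemp available; add_suffix_inplace and the demo input are hypothetical. It copies the result back with cat instead of mv, so the original file keeps its permissions.

```shell
# Hypothetical helper: append "_suffix" to every ":<digits>" token in a file
# without sed -i. Assumes mktemp exists on the system.
add_suffix_inplace() {
    target=$1
    tmp=$(mktemp) || return 1
    if sed -e 's/:\([0-9]\{1,\}\)/:\1_suffix/g' "$target" > "$tmp"; then
        cp -p "$target" "$target~"   # backup, like the one sed -i~ keeps
        cat "$tmp" > "$target"       # overwrite in place: permissions stay intact
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a throwaway file (hypothetical sample content).
demo=$(mktemp)
printf 'id :123456 and :42\nno match here\n' > "$demo"
add_suffix_inplace "$demo"
cat "$demo"
```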

Use sed:
sed 's/\(:[0-9]\+\)/\1_suffix/g' file
Add the -i option if you want to do an in-place edit.
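The \+ in the command above is a GNU sed extension to basic regular expressions. A sketch of the same substitution written as an extended regular expression, assuming your sed accepts -E (GNU sed and BSD sed both do); append_suffix is a hypothetical name.

```shell
# Hypothetical wrapper: with -E, "+" and "(...)" are special without backslashes.
append_suffix() {
    sed -E 's/(:[0-9]+)/\1_suffix/g'
}

printf 'a:1 b:23 c:x\n' | append_suffix
# prints: a:1_suffix b:23_suffix c:x
```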

Related

Find the most frequently occurring words in a text file

I have a log file that records the category and subcategory names that failed, along with an error message. My goal is to find the most frequently occurring categories.
Example log:
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Now I want to identify the top 10 categories that failed.
Using sed:
sed -e 's/\s/\n/g' < file.log | grep ERROR | sort | uniq -c | sort -nr | head -10
I am getting 1636 [ERROR
whereas I was looking for a list of categories sorted by number of occurrences, e.g.:
139 category1
23 category 2
...
You say you want to do the counting with sed, but you actually have an entire pipeline of sed, grep, sort, uniq and head. Generally, when this happens, your problem is screaming for awk:
awk 'BEGIN{FS="\047"; PROCINFO["sorted_in"]="@val_num_desc"}
/\[ERROR /{c[$2]++}
END{for(i in c) { print c[i],i; if(++j == 10) exit } }' file
The above is a GNU awk solution because it uses non-POSIX features such as setting the array traversal order (PROCINFO["sorted_in"]; "@val_num_desc" visits the counts from highest to lowest). The field separator is set to the single quote ('), which has octal value \047, because the category name is assumed to be between single quotes.
If you are not using GNU awk, you could pipe into sort and head, or do the sorting yourself. One way is:
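awk 'BEGIN{ FS="\047"; n=10 }
/\[ERROR /{ c[$2]++ }
END {
    # insertion sort of the categories into the top-n list s[1..n]
    for (l in c) {
        for (i=1; i<=n; ++i) {
            # test s[i] first so that c[""] is never created while iterating over c
            if (s[i] == "" || c[l] > c[s[i]]) {
                for (j=n; j>i; --j) s[j]=s[j-1]
                s[i]=l
                break
            }
        }
    }
    for (i=1; i<=n; ++i) {
        if (s[i]=="") break
        print c[s[i]], s[i]
    }
}' file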
or just do (without exiting early, so that sort sees every category):
awk 'BEGIN{FS="\047"}
/\[ERROR /{c[$2]++}
END{for(i in c) print c[i],i}' file \
| sort -nr | head -10
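A self-contained run of the sort-based variant, as a sketch: it assumes the category is the first single-quoted string on each ERROR line, as in the sample log. top_categories is a hypothetical name, and the third log line is made up so that one category repeats.

```shell
# Hypothetical helper: count ERROR lines per category, most frequent first.
top_categories() {
    awk -F"'" '/\[ERROR /{ c[$2]++ } END { for (k in c) print c[k], k }' "$1" |
        sort -nr | head -n 10
}

log=$(mktemp)
cat > "$log" <<'EOF'
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:52:10 +0100 | 301: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name3' ref: '111'
EOF
top_categories "$log"
# prints: 2 mcat-name1, then 1 mcat-name2
```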
You got 1636 [ERROR because you changed every whitespace character into a newline, then grepped for the word ERROR, and then counted: every matching line is the same [ERROR token.
This:
sed -e 's/\s/\n/g' < file.log | grep ERROR
gives you this:
[ERROR
[ERROR
[ERROR
[ERROR
[ERROR
[ERROR
... (1630 more)
You need to grep first and then run sed (you can probably do better with sed alone, but I'm just explaining the logic behind the commands):
grep ERROR file.log | sed -e 's/\s/\n/g' | sort | uniq -c | sort -nr | head -10
This may not be the best solution, as it also counts the word ERROR and other irrelevant words, but you didn't give us much information about the input file.
Assuming 'Bulgari' is an example of a category you want to extract, try
sed -n "s/.*ERROR.*\] Category '\([^']*\)'.*/\1/p" file.log |
sort | uniq -c | sort -rn | head -n 10
The sed command finds lines which match a fairly complex regular expression and captures part of the line, then replaces the match with the captured substring, and prints it (the -n option disables the default print action, so we only print the extracted lines). The rest is basically identical to what you already had.
In the regex, we look for (the beginning of the line followed by) anything (except a newline), followed by ERROR, later followed by ] Category ' and then a string that doesn't contain a single quote, then the closing single quote followed by anything. The leading and trailing "anything (except newline)" parts are required so that the entire line is replaced with just the captured string from inside the single quotes. The backslashed parentheses are what capture an expression; search for "backreference" for the full scoop.
Your original attempt would only extract the actual ERROR strings, because you replaced all the surrounding spaces with newlines (assuming vaguely that your sed accepts the Perl \s shorthand, which isn't standard in sed, and that \n gets interpreted as a literal newline in the replacement, which also isn't entirely standard or portable).
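The regex above expects "] Category '" directly before the name, while the sample log in the question reads "Category ID not found for 'name'". A sketch adapted to that sample format, assuming the main category is the first quoted string after "for"; extract_categories and the INFO line in the sample are hypothetical.

```shell
# Hypothetical helper: print only the captured category name of each ERROR line.
extract_categories() {
    sed -n "s/.*\[ERROR .*Category ID not found for '\([^']*\)'.*/\1/p" "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:51:09 +0100 | 279: [INFO] import finished
EOF
# Non-matching lines (like the INFO line) are not printed because of -n.
extract_categories "$sample" | sort | uniq -c | sort -rn | head -n 10
```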
The way to go is to select the lines of the failed categories and replace each whole line with only the category name, using sed.
Give this a try:
sed -e "s/^.* [[]ERROR .*[]] Category '\([^']*\)' .*$/\1/g" file.log | sort | uniq -c | sort -nr | head -10
^ is the start of the line
\( ... \): the character sequence enclosed in escaped parentheses can be referred to with \1 for the first pair appearing in the regex, \2 for the second pair, etc.
$ is the end of the line.
The sed command matches a line that contains [ERROR and some characters up to a ], followed by a space, the word Category and another space; it then captures, with the pair of escaped parentheses, everything between the next pair of single quotes, followed by a space and any characters up to the end of the line. If such a line is found, it is replaced with the captured name. Lines that do not match are printed unchanged (and are therefore counted too); add -n and the p flag to print only the extracted names.
Using Perl
> cat merlin.txt
Mon, 26 Nov 2018 07:51:07 +0100 | 164: [ERROR ***] Category ID not found for 'mcat-name1' 'subcat-name1' ref: '073'
Mon, 26 Nov 2018 07:51:08 +0100 | 278: [ERROR ***] Category ID not found for 'mcat-name2' 'subcat-name2' ref: '020'
Mon, 26 Nov 2018 07:51:21 +0100 | 1232: [ERROR ***] Category ID not found for 'make' 'model' ref: '228239'
> perl -ne ' { s/(.*)Category.*for(.+)ref.*/\2/g and s/(\047\S+\047)/$kv{$1}++/ge if /ERROR/} END { foreach (sort keys %kv) { print "$_ $kv{$_}\n" } } ' merlin.txt | sort -nr
'subcat-name2' 1
'subcat-name1' 1
'model' 1
'mcat-name2' 1
'mcat-name1' 1
'make' 1

Handling quotes and a few special characters in a string

I want to execute the command below on remote servers:
find /usr/nsh/NSH/Transactions/log -name "bldeploy-*" -and -printf '%T#:%p\n' | sort -V | sed -r 's/^[^:]+://'|xargs egrep -i "VANTAGE_CORE-APP"|tail -1|cut -d '"' -f2
How can I put this single command in a string?
I have tried it this way, but it's not working:
Dim str as string = "-above command-"
Can anyone let me know how I can place this whole command in one string, taking all the quotes into account?
Thanks for your help.
Since this looks like VB.NET, you simply need to escape the double quotes (" --> "").
Like so:
Dim str as string = "find /usr/nsh/NSH/Transactions/log -name ""bldeploy-*"" -and -printf '%T#:%p\n' | sort -V | sed -r 's/^[^:]+://'|xargs egrep -i ""VANTAGE_CORE-APP""|tail -1|cut -d '""' -f2"

Update value within a unix flat file

Kindly help. I want to add a leading 0 to values that start with a dot (for example, .00191 should become 0.00191) in a Unix file, yyyyy.csv:
603905209;47.824;USD
603905477;57.199;USD
603938657;3.2281;USD
603949388;.00191;USD
603937274;.00563;USD
603911160;.00287;USD
I want the result to be
603905209;47.824;USD
603905477;57.199;USD
603938657;3.2281;USD
603949388;0.00191;USD
603937274;0.00563;USD
603911160;0.00287;USD
but I got this result:
603905209;0.4.7824;USD
603905477;0.5.7199;USD
603938657;0.3.2281;USD
603949388;0.00191;USD
603937274;0.00563;USD
603911160;0.00287;USD
Below is my command:
sed 's/;.0/;0.0/g' yyyyy.csv | sed 's/;.2/;0.2/g' | sed 's/;.1/;0.1/g' | sed 's/;.3/;0.3/g' | sed 's/;.4/;0.4/g' | sed 's/;.5/;0.5/g'| sed 's/;.6/;0.6/g' | sed 's/;.7/;0.7/g' | sed 's/;.8/;0.8/g' | sed 's/;.9/;0.9/g' > xxxxx.csv
You need to escape all the dots in your regex; otherwise each one matches any character. In a regex, . is a special metacharacter that matches any single character, so to match a literal dot you need to escape it with a backslash (\.), e.g.:
sed 's/;\.0/;0.0/g' yyyyy.csv
A single substitution with a capturing group is enough to handle every digit:
$ sed 's/;\.\([0-9]\)/;0.\1/g' file
603905209;47.824;USD
603905477;57.199;USD
603938657;3.2281;USD
603949388;0.00191;USD
603937274;0.00563;USD
603911160;0.00287;USD
In basic sed, \(...\) is called a capturing group; it captures the characters matched by the pattern inside it. Here the pattern inside the group is [0-9], which matches a single digit from 0 to 9. The captured characters can be referred to with a backreference: \1 in the replacement part refers to the characters captured by group 1.
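To check the capture-group version against the full sample from the question, here is a sketch that writes the result to a new file as the original command did. It assumes a POSIX shell and sed; the file names come from the question and are created in a temporary directory, and fix_leading_dot is a hypothetical name.

```shell
# Hypothetical helper: turn ";.<digit>" into ";0.<digit>" on every line.
fix_leading_dot() {
    sed 's/;\.\([0-9]\)/;0.\1/g' "$1"
}

work=$(mktemp -d)
cat > "$work/yyyyy.csv" <<'EOF'
603905209;47.824;USD
603905477;57.199;USD
603938657;3.2281;USD
603949388;.00191;USD
603937274;.00563;USD
603911160;.00287;USD
EOF
fix_leading_dot "$work/yyyyy.csv" > "$work/xxxxx.csv"
cat "$work/xxxxx.csv"
```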
Change your command to:
sed 's/;\./;0./g' File
i.e., just substitute ;. with ;0.
Just print your file using a tool that understands numbers:
$ awk -F';' '{printf "%d;%f;%s\n", $1,$2,$3}' file
603905209;47.824000;USD
603905477;57.199000;USD
603938657;3.228100;USD
603949388;0.001910;USD
603937274;0.005630;USD
603911160;0.002870;USD
$ awk -F';' '{printf "%d;%.5f;%s\n", $1,$2,$3}' file
603905209;47.82400;USD
603905477;57.19900;USD
603938657;3.22810;USD
603949388;0.00191;USD
603937274;0.00563;USD
603911160;0.00287;USD
$ awk -F';' '{printf "%d;%.2f;%s\n", $1,$2,$3}' file
603905209;47.82;USD
603905477;57.20;USD
603938657;3.23;USD
603949388;0.00;USD
603937274;0.01;USD
603911160;0.00;USD
Whatever precision you like...
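printf rounds or pads every value to a fixed precision. If the goal is only to add the missing leading zero and keep the original digits, awk can edit just that field. A sketch, assuming the number is always the second ;-separated field; pad_leading_dot is a hypothetical name.

```shell
# Hypothetical helper: prepend "0" to field 2 only when it starts with a dot.
# Assigning to $2 rebuilds the line using OFS, so OFS must also be ";".
pad_leading_dot() {
    awk 'BEGIN { FS = OFS = ";" } $2 ~ /^\./ { $2 = "0" $2 } { print }' "$1"
}

csv=$(mktemp)
printf '%s\n' '603905209;47.824;USD' '603949388;.00191;USD' > "$csv"
pad_leading_dot "$csv"
# prints 603905209;47.824;USD and 603949388;0.00191;USD
```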

Unix - sed: replacing a substring

I am new to sed. I want to replace a substring,
for example:
var1=server1:game1,server2:game2,server3:game1
The output should be
server1 server2 server3 (separated by spaces only)
I have tried this:
echo $var1 | sed 's/,/ /g' | sed 's/:* / /g'
This is not working. Please suggest a solution.
You can try this sed:
echo $var1 | sed 's/:[^,]\+,\?/ /g'
Explanation:
:[^,]\+, - matches from the : up to the next ,
\? - makes the preceding , optional (the last item has no trailing ,)
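\+ and \? in basic regular expressions are GNU sed extensions. A portable sketch that needs neither: delete each :... part up to the next comma, then turn the commas into spaces. servers_only is a hypothetical name.

```shell
# Hypothetical helper: keep only the part before each ":" in a comma list.
servers_only() {
    printf '%s\n' "$1" | sed -e 's/:[^,]*//g' -e 's/,/ /g'
}

var1=server1:game1,server2:game2,server3:game1
servers_only "$var1"
# prints: server1 server2 server3 (no trailing space)
```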
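echo "$var1" | sed 's/:game[0-9]*,*/ /g'
This assumes each substring is game followed by a number ([0-9]*). Quoting the sed script keeps the shell from treating [0-9]* as a file-name pattern, and the g flag replaces every occurrence, not just the first.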
An awk variant using the same regex as the sed answer:
awk '{gsub(/:[^,]+,?/," ")}1' <<< "$var1"
PS: It's always good practice to quote variables.
Just for info, you are really only matching, not replacing, so grep can be your friend (with -P):
echo "$var1" | grep -oP '[^:,=]+(?=:)'
This matches a run of characters other than :, , and =, followed by a : (checked with a lookahead, so the : itself is not printed).
This puts the servers on separate lines, which may be what you want anyway. You can put them on one line by adding tr:
echo "$var1" | grep -oP '[^:,=]+(?=:)' | tr '\n' ' '
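grep -P is a GNU grep option and is not available everywhere. A sketch of the same result with standard tools only, assuming the server names themselves contain no : or ,; servers_list is a hypothetical name.

```shell
# Hypothetical helper: one item per line, keep the field before ":",
# then join the lines back together with single spaces.
servers_list() {
    printf '%s\n' "$1" | tr ',' '\n' | cut -d: -f1 | paste -s -d ' ' -
}

servers_list "server1:game1,server2:game2,server3:game1"
# prints: server1 server2 server3
```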

In a Unix terminal, how do I get part of the file names in a folder?

I have n files in a folder whose names follow a common format,
e.g.: ABCD.EXXXX.ZZZZ.ZZZZZ.txt
In the file name above, ABCD.E is common to all the files and ZZZZ.ZZZZZ is a user-chosen string. I need to extract XXXX from all the files and display the distinct XXXX values to the user. Is there any way to do this? Thanks in advance.
Use ls -1 to make a list of the relevant files. Pipe it into sed to strip the beginning 'ABCD.E'. Then pipe it into sed again to remove everything after the first '.'
ls -1 ABCD\.E*\.txt | sed 's/^ABCD\.E//' | sed 's/\..*//'
Alternatively, if you want a little more control of the output you can do the second bit with awk
ls -1 ABCD\.E*\.txt | sed 's/^ABCD\.E//' | awk 'BEGIN{FS="."}{print "value =", $1, "user=", $2"."$3}'
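The pipelines above print one line per file, so a repeated XXXX value shows up more than once, while the question asks for distinct values. A sketch adding sort -u to the two sed steps, assuming names of the form ABCD.E<XXXX>.<anything>.txt without newlines; distinct_middle_parts and the sample file names are hypothetical.

```shell
# Hypothetical helper: distinct XXXX parts of ABCD.E<XXXX>.*.txt names in a directory.
distinct_middle_parts() {
    ( cd "$1" && ls -1 ABCD.E*.txt ) |
        sed -e 's/^ABCD\.E//' -e 's/\..*//' |
        sort -u
}

dir=$(mktemp -d)
touch "$dir/ABCD.E1234.foo.bar.txt" "$dir/ABCD.E1234.baz.qux.txt" "$dir/ABCD.E9876.foo.bar.txt"
distinct_middle_parts "$dir"
# prints 1234 and 9876, one per line
```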
awk -F'.' '{print $2}' filename
You can try printing $1, $2, $3... to understand the command better. Note that with . as the separator, $2 is EXXXX, including the leading E.
You can use the bash/ksh parameter substitutions # and % for this, from inside the shell.
function get_filename_section {
    typeset f=${1:?}
    typeset r=${f#ABCD.E}      # remove the common ABCD.E prefix
    printf '%s\n' "${r%%.*}"   # remove everything from the first remaining dot
}
Testing:
[[ $( get_filename_section ABCD.EXXXX.ZZZZ.ZZZZZ.txt ) == XXXX ]] &&
echo ok || echo no
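To apply the same parameter-expansion idea to every file in a folder and show each value once, a loop can feed sort -u. A sketch in POSIX sh, assuming the same naming format; middle_parts_in and the sample files are hypothetical.

```shell
# Hypothetical helper: ${f#ABCD.E} drops the fixed prefix,
# ${r%%.*} drops everything from the first remaining dot.
middle_parts_in() {
    for f in "$1"/ABCD.E*.txt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        f=${f##*/}                # keep only the file name
        r=${f#ABCD.E}
        printf '%s\n' "${r%%.*}"
    done | sort -u
}

folder=$(mktemp -d)
touch "$folder/ABCD.EAAAA.x.y.txt" "$folder/ABCD.EBBBB.x.y.txt" "$folder/ABCD.EAAAA.z.w.txt"
middle_parts_in "$folder"
# prints AAAA and BBBB, one per line
```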
