I have a blob of text like this:
abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....
Can you help me replace the 4th comma (,) with a newline using awk or any other Unix (Mac) magic?
With GNU sed, to replace the 4th occurrence of , you can use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/,/\n/4'
To replace every 4th occurrence, use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
To change only the 4th comma:
sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/'
(Note: rush shows a much cooler way to do this: s/,/\n/4.)
To change every 4th comma, add the g flag:
$ echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' |\
> sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
abcd,def,geff,hij
klmn,nop,qrs,tuv
wxyz,....
In a nutshell, the command finds the pattern
((non-commas, comma) 3 times, non-commas), comma
and replaces it with
whatever is in the outer \( \) group (\1) + a newline.
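echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' | xargs -d, -n4 | tr ' ' ','
It works because the default command of xargs is /bin/echo: -d, splits the input on commas, -n4 hands four items to each echo call, and tr turns the spaces echo puts between its arguments back into commas.
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs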
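The sed answers rely on \n in the replacement and the xargs answer relies on -d, which is a GNU xargs option. The question asks about macOS: POSIX leaves \n in a sed replacement unspecified, and the BSD sed shipped with macOS has historically not turned it into a newline. A minimal sketch of the every-4th-comma split using only POSIX awk, so it should behave the same on macOS and Linux; the function name split_every is hypothetical.

```shell
# split_every N: replace every Nth comma on each input line with a newline.
# Hypothetical helper; uses only POSIX awk features.
split_every() {
  awk -F, -v n="$1" '{
    out = $1
    # the separator in front of field i is comma number i-1
    for (i = 2; i <= NF; i++) {
      sep = ((i - 1) % n == 0) ? "\n" : ","
      out = out sep $i
    }
    print out
  }'
}

echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' | split_every 4
```

Splitting on FS=, and rebuilding the line keeps the logic to a counter, so changing 4 to any other N needs no regex edits.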
I am new to shell scripting. I have a text file with multiple records, and the 1st record ends and the 2nd record starts on the same line, as below:
"-}{"
So I want to split it into two lines like this:
"-} #line1
{ #line2"
I tried the following:
Method 1
sed 's/\-\}\{//\-\} \n \{' file.txt
Method 2
tr '-}{' '\n'
Can anyone please help me with this?
With your shown samples, please try the following awk code. It simply substitutes -}{ with -}, a newline and {, then prints the line.
echo '"-}{"' | awk '{sub(/-}{/,"-}\n{")} 1'
Too much escaping.
Also, the syntax is s/<pattern>/<replacement>/: there are three /, the last one at the end.
$ echo '"-}{"' | sed 's/-}{/-}\n{/'
"-}
{"
It's not possible with tr, because tr only translates single characters. With tr -- '-}{' '\n', tr would replace each of -, } and { with a newline.
This might work for you (GNU sed):
sed 'G;:a;s/-}\({.*\(.\)\)/-}\2\1/;ta;s/.$//' file
Append a newline to the current line.
Use pattern matching to insert the newline between -} and { repeatedly.
When all is done, remove the introduced newline.
You can use () to capture the delimiters and do this:
echo '"-}{" -} -}{' | sed -E 's/(-})({)/\1\n\2/g'
"-}
{" -} -}
{
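The awk answer above uses sub(), which replaces only the first -}{ on each line. A sketch that uses gsub() instead, so every occurrence on a line is split (like the g flag in the last sed command); writing the braces as the bracket expressions [}] and [{] avoids any question of whether a given awk reads { as the start of an interval expression. The function name split_records is hypothetical.

```shell
# split_records: turn every "-}{" into "-}", a newline, then "{".
split_records() {
  # [}] and [{] match literal braces in any POSIX awk
  awk '{ gsub(/-[}][{]/, "-}\n{") } 1' "$@"
}

echo '"-}{" -} -}{' | split_records
```

Called with a file name (split_records file.txt) it reads the file instead of stdin.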
I have a Unix file containing semicolon-separated records; the 2nd part/column is a string of comma-separated values, like below:
789651234512;TEST-10=5,TEST-136=6,TEST-3=1,TEST-4=2,TEST-5=3,TEST-9=4,TEST-9013=100
132567123784;TEST-3=1,TEST-136=5,TEST-15=4,TEST-4=2,TEST-5=3
132564013784;TEST-3=1,TEST-15=4,TEST-4=2,TEST-5=8
132496583212;TEST-13=4,TEST-136=7,TEST-23=1,TEST-6=2,TEST-5=3,TEST-4=5,TEST-6=11
I want to find TEST-136=X, when it exists, where X is an integer of 1 to 3 digits, and return it like this for the example above:
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
I am using the awk below, but it returns the whole string of the 2nd part/column:
awk -F'[;]' '/TEST-136/{ print $1";"$2 }' file.txt
However, I need only the 1st part/column plus the TEST-136=X part of the 2nd part/column.
This assumes ONE match per line/record:
$ awk -F';' 'match($0, /TEST-136=[[:digit:]]+/) {print $1, substr($0,RSTART,RLENGTH)}' OFS=';' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
This might work for you (GNU sed):
sed -En 's/^([^;]*;).*(TEST-136=[^,]*).*/\1\2/p' file
Simple Perl:
$ perl -F";" -lane ' /(TEST-136=\w+)/ and print "$F[0];$1" ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$
Another awk:
$ awk -F"[;,]" ' { for(i=2;i<=NF;i++) if($i~/TEST-136/) print $1 ";" $i } ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$
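The last awk loop tests $i ~ /TEST-136/, which would also accept a field such as TEST-1360=9 (a hypothetical key, not in the sample data). A sketch of the same loop that anchors the whole field and accepts only 1 to 3 digits, as the question asks; the digits are spelled out as [0-9][0-9]?[0-9]? rather than {1,3} because some older awks do not support interval expressions. The function name extract_test136 and the last sample record are made up for illustration.

```shell
# extract_test136: print "<id>;TEST-136=X" for each field that is exactly
# TEST-136= followed by 1 to 3 digits.
extract_test136() {
  awk -F'[;,]' '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /^TEST-136=[0-9][0-9]?[0-9]?$/)
        print $1 ";" $i
  }' "$@"
}

# Two records from the question plus a made-up one containing TEST-1360.
printf '%s\n' \
  '789651234512;TEST-10=5,TEST-136=6,TEST-3=1,TEST-4=2,TEST-5=3,TEST-9=4,TEST-9013=100' \
  '132564013784;TEST-3=1,TEST-15=4,TEST-4=2,TEST-5=8' \
  '000000000001;TEST-1360=9,TEST-136=7' | extract_test136
```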
Each record comes with column names and is pipe-delimited. I have to remove the names from each record, as shown below:
Input:
COMPILES=1|PROPS=inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|SCPU=30828
Output:
1|inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|30828
I tried sed 's/[^|]*=//g' to delete every run of non-| characters followed by =, but for the 2nd column it keeps only the last value. Is there a way to replace only the 1st instance in each field? My command prints:
1|en-US|30828
Using GNU sed:
$ sed 's/\(^\||\)[^=]\+=/\1/g' file
1|inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|30828
Explained:
s/ replace
\(^\||\)[^=]\+= the beginning (^) or (\|) a separator (|), then one or more non-= characters and a =
/\1/g with beginning or separator (\1) globally (g)
i.e. replace ^THIS= with ^ and |THIS= with |.
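The \| alternation and \+ in that command are GNU extensions to basic regular expressions. A sketch of the same substitution written as an extended regular expression with sed -E, which both GNU sed and the BSD sed on macOS accept; it also keeps | out of the name part ([^=|]+), so a field that contains no = cannot swallow the name of the next field. The function name strip_names is hypothetical.

```shell
# strip_names: remove the NAME= prefix at the start of every |-separated field.
strip_names() {
  # (^|\|)   start of line or a field separator, captured as \1
  # [^=|]+=  the column name (no = or |) followed by its =
  sed -E 's/(^|\|)[^=|]+=/\1/g' "$@"
}

echo 'COMPILES=1|PROPS=inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|SCPU=30828' | strip_names
```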
Try this (it makes | the record separator, so it suits a single-line input like the sample):
awk -v RS='|' -v ORS='|' '{sub("[^.]*=","")}1' input | sed "s|\|$||g"
RS, the record separator, is normally a newline; here it is set to |, so each field becomes a record, e.g. COMPILES=1 or PROPS=inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US
ORS, the output record separator, is also normally a newline; setting it to | makes print separate the output records with |
sub("[^.]*=","") removes the first NAME= in each record. awk has no lazy quantifier, but [^.]* cannot run past a dot, so in the PROPS record it stops at PROPS= instead of the last =. More about reducing greediness in awk: https://unix.stackexchange.com/questions/49601/how-to-reduce-the-greediness-of-a-regular-expression-in-awk
sed "s|\|$||g" deletes the trailing | that ORS adds after the last record
Another awk:
$ awk 'BEGIN{FS=OFS="|"} {for(i=1;i<=NF;i++) sub(/[^=]+=/,"",$i)}1' file
which results in:
1|inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|30828
Using Perl:
$ cat mullapudi.log
COMPILES=1|PROPS=inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|SCPU=30828
$ perl -F"\|" -ane ' s/^.+?=//g for @F; print join("|",@F) ' mullapudi.log
1|inet.timeoutDownload=5000;inet.timeoutIO=5000;inet.timeoutOpen=5000;inet.urlBase=vxml3-elr:7000/CVP/;swirec_language=en-US|30828
I have a Unix file with data like this:
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
(FYI, there are no empty lines between the numbers; it's just a single column of numbers, one per line.)
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr -d '\n' < file.txt | sed 's/,$//'
tr -d '\n' deletes the newlines, and sed 's/,$//' then removes the last comma.
With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
awk 'BEGIN { ORS="" } { print }' file
ORS : Output Record separator.
Each Record will be separated with this delimiter.
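With ORS="" the lines are printed back to back, but the last comma stays and the output has no final newline. A sketch in POSIX awk that strips the comma from each line, puts a comma between values instead, and ends with a newline; the function name join_lines is hypothetical.

```shell
# join_lines: join all input lines into one comma-separated line.
join_lines() {
  awk '
    { sub(/,$/, "")              # drop the comma at the end of each line
      printf "%s%s", sep, $0     # sep is empty before the first value
      sep = "," }
    END { print "" }             # finish with a single newline
  ' "$@"
}

printf '1379545632,\n1051908588,\n229102020,\n' | join_lines
```

Building the separator in front of each value, rather than after it, means there is never a trailing comma to clean up afterwards.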
* Each line consists of two fields, separated by a pipe '|', where
* the first field is a comma-separated list of items, and
* the second field is a tag.
This is my INPUT:
100,210,354,462|acct
331,746,50|mis
90,263,47,14|sales
and the required OUTPUT:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
sed '{s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/}' filename
One way using GNU awk:
awk -F "[,|]" '{ for (i=1; i<NF; i++) print $i$NF }' file.txt
Results:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
Use the following (it handles records with exactly 3 or 4 items):
sed 's/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/g;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/g'
This might work for you (GNU sed):
sed 's/\s*//;:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta;s/|//' file
Explanation:
s/\s*// remove whitespace at the front of the record.
:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta replace each , by the last field and a newline
s/|// remove the |
To preserve whitespace use:
sed -r 's/(\s*)(.*\|)/\2\1/;:a;s/,(.*\|(.*))/\2\n\1/;ta;s/\|//;s/(\S+)(\s+)(\S+)/\2\1\3/g' file
sed 's/\([0-9]\),\([0-9]*\),\([0-9]*\),*\([0-9]*\)\([,|]\)\(.*\)/\1\6\n\2\6\n\3\6\n\4\6/' input | sed '/^[a-z]*$/d'
This expression gives the correct output for your sample.