Pattern conversion in Unix - unix

I am new to shell scripting. I have a text file with multiple records, where the first record ends and the second record starts on the same line, as below:
"-}{"
So I want to break the chain as
"-} #line1
{ #line2"
I tried like below:
Method 1
sed 's/\-\}\{//\-\} \n \{' file.txt
Method 2
tr '-}{' '\n'
Can anyone please help me with this?

With your shown samples, please try the following awk code. It simply substitutes -}{ with -}, a newline, and {, then prints the line.
echo '"-}{"' | awk '{sub(/-}{/,"-}\n{")} 1'

Too much escaping.
Also it's s/<pattern>/<replacement>/. There are 3 /, the last one on the end.
$ echo '"-}{"' | sed 's/-}{/-}\n{/'
"-}
{"
It's not possible with tr: tr translates single characters, not strings. If you ran tr -- '-}{' '\n', tr would replace each of -, } and { with a newline.
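To make the difference concrete, here is a small sketch contrasting the two tools. The function name split_records is made up for illustration, and awk is used instead of sed only because "\n" inside an awk string is a newline in every awk. Treat it as one possible approach, not the only one.

```shell
# tr maps single characters: each of '-', '}' and '{' becomes a newline.
printf '%s\n' 'a-}{b' | tr -- '-}{' '\n\n\n'

# split_records (hypothetical helper): break the line between every "-}" and
# the "{" that directly follows it, leaving all other characters alone.
split_records() {
    # [}] and [{] keep the braces literal inside the regex.
    awk '{ gsub(/-[}][{]/, "-}\n{"); print }' "$@"
}

printf '%s\n' '{a-}{b-}{c-}' | split_records
# {a-}
# {b-}
# {c-}
```

The function accepts file names as arguments or reads standard input, so split_records file.txt works as well.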

This might work for you (GNU sed):
sed 'G;:a;s/-}\({.*\(.\)\)/-}\2\1/;ta;s/.$//' file
Append a newline to the current line.
Use pattern matching to insert the newline between -} and { repeatedly.
When all is done, remove the introduced newline.

You can use () to capture the delimiters and do this:
echo '"-}{" -} -}{' | sed -E 's/(-})({)/\1\n\2/g'
"-}
{" -} -}
{

Related

Enclose columns containing alphabets with single quotes using awk

Can awk process this?
Input
Neil,23,01-Jan-1990
25,Reena,19900203
Output
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
awk approach:
awk -F, '{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]/) $i="\047"$i"\047"}1' OFS="," file
The output:
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
if($i~/[[:alpha:]]/) - if field contains alphabetic character
\047 - octal code of single quote ' character
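The same logic can be wrapped in a small shell function. This is only a sketch: the name quote_alpha_fields and the variable q are made up for illustration, the quote character is passed in with -v instead of being written as \047, and [A-Za-z] is used in case the local awk lacks POSIX character classes.

```shell
# quote_alpha_fields (hypothetical helper): wrap every comma-separated field
# that contains at least one ASCII letter in single quotes.
quote_alpha_fields() {
    awk -F, -v OFS=, -v q="'" '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /[A-Za-z]/) $i = q $i q
        print
    }' "$@"
}

printf '%s\n' 'Neil,23,01-Jan-1990' '25,Reena,19900203' | quote_alpha_fields
# 'Neil',23,'01-Jan-1990'
# 25,'Reena',19900203
```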
My first attempt was incorrect:
sed -r 's/([^,]*[a-zA-Z]+[^,]*)(,{0,1})/"\1"\2/g' inputfile
@Sundeep made an excellent comment: single quotes are needed, and the command can be shorter.
I had tried to match the trailing , or the end of the line as well, which complicated the match. You can just match the text between the separators, making sure there is an alphabetic character somewhere in it.
sed 's/[^,]*[a-zA-Z][^,]*/\x27&\x27/g' inputfile
You might use this script:
script.awk
BEGIN { OFS=FS="," }
{
    for (i = 1; i <= NF; i++) {
        if (!match($i, /^[0-9]+$/)) $i = "'" $i "'"
    }
    print
}
and run it like this: awk -f script.awk yourfile .
Explanation
The first line sets both the input and the output field separator to ,.
The loop tests whether each field consists only of digits (/^[0-9]+$/).
If it does not, the field is put in quotes.
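Note that "not only digits" is a broader test than "contains a letter": it also quotes decimals and empty fields. The sketch below shows this on a made-up input line; the function name quote_non_numeric and the variable q are hypothetical.

```shell
# quote_non_numeric (hypothetical helper): quote every field that is not
# made up entirely of digits.
quote_non_numeric() {
    awk -v q="'" 'BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[0-9]+$/) $i = q $i q
        print
    }' "$@"
}

printf '%s\n' '3.14,abc,,42' | quote_non_numeric
# '3.14','abc','',42
```

For the sample data in the question both tests give the same result, so which one to prefer depends on how such edge cases should be treated.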

transpose a column in unix

I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
(FYI, there is no empty line between the numbers as shown above; that is just the editor here. It is a single column with the numbers one below the other.)
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr -d '\n' < file.txt
To remove the last comma you can pipe the result through sed 's/,$//'.
With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
awk 'BEGIN { ORS="" } { print }' file
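ORS is the output record separator: it is printed after every record.
Setting it to the empty string joins all lines into one. The trailing comma is still present and has to be removed separately.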
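As a sketch of joining the lines and dropping the final comma in one awk call, the function below strips the trailing comma from each line and puts a comma back only between lines. The name join_lines is made up for illustration.

```shell
# join_lines (hypothetical helper): print all input lines on one line,
# separated by single commas and without a trailing comma.
join_lines() {
    awk '{
        sub(/,$/, "")                                  # drop this line'"'"'s trailing comma
        printf "%s%s", (NR > 1 ? "," : ""), $0         # comma only between lines
    }
    END { if (NR) print "" }' "$@"                     # finish with one newline
}

printf '1379545632,\n1051908588,\n229102020,\n' | join_lines
# 1379545632,1051908588,229102020
```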

Use sed to delete everything after '>' and add index number plus a string?

I know this should be pretty simple to do, but I can't get it to work. My file looks like this
>c12345|random info goes here that I want to delete
AAAAATTTTTTTTCCCC
>c45678| more | random info| here
GGGGGGGGGGG
And what I want to do is just make this far simpler so it might look like this
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGGG
>seq3 [organism=human]
etc....
I know I can append that constant easily once I get the indexed part in there by doing:
sed '/^>/ s/$/\[organism-human]/g'
But how do I get that index built?
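With sed (this assumes every header is followed by exactly one sequence line, because it numbers the sequence lines rather than the headers):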
sed '/^>/d' filename | sed '=' | sed 's/^[0-9]*$/>seq& [organism=human]/'
(Thanks to NeronLeVelu for the simplification.)
Here's one way you could do it using awk:
$ awk '/^>/ { $0 = ">seq" ++i " [organism=human]" } 1' file
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGG
When the line begins with >, replace it with >seq followed by the counter i (incremented by 1 each time), then [organism=human]. The 1 at the end of the command is always true, so awk performs the default action, which is to print the line.
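A slightly more general sketch of the same idea takes the organism name as a parameter. The function name renumber_fasta and the variables org and n are made up for illustration; the function reads from standard input.

```shell
# renumber_fasta (hypothetical helper): replace each header line (starting
# with ">") by ">seqN [organism=ORG]"; sequence lines pass through unchanged.
renumber_fasta() {
    awk -v org="${1:-human}" '
        /^>/ { $0 = ">seq" (++n) " [organism=" org "]" }
        { print }
    '
}

printf '%s\n' '>c12345|random info' 'AAAAATTTT' 'TTTTCCCC' '>c45678| more' 'GGGG' | renumber_fasta
# >seq1 [organism=human]
# AAAAATTTT
# TTTTCCCC
# >seq2 [organism=human]
# GGGG
```

Because only header lines advance the counter, records whose sequence spans several lines are numbered correctly.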
Might be easier with a Perl one-liner (note that this keeps the original ID, e.g. >c12345, instead of numbering the sequences):
perl -ne 'chomp; if (/^>/) { s/\|.*$//; print "$_ \[organism=human\]\n";} else { print "$_\n";}' filename

The characteristics of the input data

* Each line consists of two fields, separated by a pipe '|', where
* the first field is a comma-separated list of items, and
* the second field is a tag.
This is my INPUT:
100,210,354,462|acct
331,746,50|mis
90,263,47,14|sales
and required OUTPUT:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
sed '{s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/}' filename
One way using GNU awk:
awk -F "[,|]" '{ for (i=1; i<NF; i++) print $i$NF }' file.txt
Results:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
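The same result can be reached by splitting on the pipe first and then splitting the item list explicitly, which may be easier to read. This is a sketch under the assumption that each line has exactly one | character; the names expand_tags and items are made up for illustration.

```shell
# expand_tags (hypothetical helper): for a line "a,b,c|tag" print
# "atag", "btag" and "ctag", one per line, for any number of items.
expand_tags() {
    awk -F'|' '{
        n = split($1, items, ",")        # items[1..n] = the comma-separated list
        for (i = 1; i <= n; i++)
            print items[i] $2            # append the tag to each item
    }' "$@"
}

printf '%s\n' '100,210,354,462|acct' '331,746,50|mis' | expand_tags
# 100acct
# 210acct
# 354acct
# 462acct
# 331mis
# 746mis
# 50mis
```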
Use the following (it only handles lines with exactly three or four items):
sed 's/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/g;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/g'
This might work for you (GNU sed):
sed 's/\s*//;:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta;s/|//' file
Explanation:
s/\s*// remove whitespace at the front of the record.
:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta replace each , by the last field and a newline
s/|// remove the |
To preserve whitespace use:
sed -r 's/(\s*)(.*\|)/\2\1/;:a;s/,(.*\|(.*))/\2\n\1/;ta;s/\|//;s/(\S+)(\s+)(\S+)/\2\1\3/g' file
sed 's/\([0-9]\),\([0-9]*\),\([0-9]*\),*\([0-9]*\)\([,|]\)\(.*\)/\1\6\n\2\6\n\3\6\n\4\6/' input | sed '/^[a-z]*$/d'
This expression gives the correct output for the sample input.

replacing nth character using unix magic

I have a blob of text like this:
abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....
Can you guys help me in replacing the 4th comma (,) with a newline using awk or any unix (mac) magic!
To replace only the 4th occurrence of , you can use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/,/\n/4'
To replace every 4th occurrence use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
To change only the 4th comma:
sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/'
(note: rush shows a much cooler way to do this): s/,/\n/4
To change every 4th comma, add the g flag:
$ echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' |\
> sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
abcd,def,geff,hij
klmn,nop,qrs,tuv
wxyz,....
Here's a sed reference.
In a nutshell, the command finds the pattern
(( non-commas - comma ) (3 times) - (non-commas)) comma
and changes it to
"whatever is in outer brackets" + newline.
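With GNU xargs (the -d option is a GNU extension) and tr:
echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' | xargs -d, -n4 | tr ' ' ','
It works because the default action of xargs is /bin/echo, so each group of four items is printed on its own line, separated by spaces, and tr turns those spaces back into commas. It breaks if the data itself contains spaces.
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs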
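For comparison, here is an awk sketch of the "every Nth comma" replacement that does not rely on \n in a sed replacement (which is not portable to every sed) or on GNU xargs. The function name break_every_nth_comma and its parameter are made up for illustration; it reads standard input and defaults to every 4th comma.

```shell
# break_every_nth_comma (hypothetical helper): replace every Nth comma on
# each input line with a newline. N is the first argument (default 4).
break_every_nth_comma() {
    awk -v n="${1:-4}" '{
        cnt = split($0, f, ",")
        out = f[1]
        for (i = 2; i <= cnt; i++)
            # the comma before field i is comma number i-1
            out = out ((i - 1) % n == 0 ? "\n" : ",") f[i]
        print out
    }'
}

echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' | break_every_nth_comma 4
# abcd,def,geff,hij
# klmn,nop,qrs,tuv
# wxyz,....
```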
