* Each line consists of two fields, separated by a pipe '|', where
* the first field is a comma-separated list of items, and
* the second field is a tag.
This is my INPUT:
100,210,354,462|acct
331,746,50|mis
90,263,47,14|sales
and required OUTPUT:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
sed '{s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/}' filename
One way using GNU awk:
awk -F "[,|]" '{ for (i=1; i<NF; i++) print $i$NF }' file.txt
Results:
100acct
210acct
354acct
462acct
331mis
746mis
50mis
90sales
263sales
47sales
14sales
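If awk isn't available, the same transformation can be sketched in plain POSIX shell. This is just a sketch; a here-doc stands in for the input file shown above.

```shell
# Read each record, splitting on '|' into the item list and the tag,
# then split the items on ',' and print each item with the tag appended.
while IFS='|' read -r items tag; do
  IFS=','
  for item in $items; do
    printf '%s%s\n' "$item" "$tag"
  done
  unset IFS
done <<'EOF'
100,210,354,462|acct
331,746,50|mis
90,263,47,14|sales
EOF
```

Unlike the sed answers, this doesn't care how many items are in each list.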
Use the following:
sed 's/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\5\n\2\5\n\3\5\n\4\5/g;s/^\([^a-z].*\),\([^a-z].*\),\([^a-z].*\)|\([^0-9].*\)$/\1\4\n\2\4\n\3\4/g'
This might work for you (GNU sed):
sed 's/\s*//;:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta;s/|//' file
Explanation:
s/\s*// remove whitespace at the front of the record.
:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta replace each , by the last field and a newline
s/|// remove the |
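To see the loop in action on one of the sample records (GNU sed assumed):

```shell
# Each pass moves a copy of the tag up behind the first remaining item;
# the loop stops when no comma is left, then the '|' is deleted.
echo '331,746,50|mis' | sed 's/\s*//;:a;s/,\(.*|\(.*\)\)/\2\n\1/;ta;s/|//'
# 331mis
# 746mis
# 50mis
```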
To preserve whitespace use:
sed -r 's/(\s*)(.*\|)/\2\1/;:a;s/,(.*\|(.*))/\2\n\1/;ta;s/\|//;s/(\S+)(\s+)(\S+)/\2\1\3/g' file
sed 's/\([0-9]\),\([0-9]*\),\([0-9]*\),*\([0-9]*\)\([,|]\)\(.*\)/\1\6\n\2\6\n\3\6\n\4\6/' input | sed '/^[a-z]*$/d'
This expression gives the correct output for your input.
I am new to shell scripting. I have a text file with multiple records, and the 1st record ends and the 2nd record starts on the same line, as below:
"-}{"
So I want to break the chain as
"-} #line1
{ #line2"
I tried like below:
Method 1
sed 's/\-\}\{//\-\} \n \{' file.txt
Method 2
tr '-}{' '\n'
Can anyone please help me with this?
With your shown samples, please try the following awk code. It simply substitutes -}{ with -}, a newline, and {, then prints the result.
echo '"-}{"' | awk '{sub(/-}{/,"-}\n{")} 1'
Too much escaping.
Also, the syntax is s/<pattern>/<replacement>/: three / characters, the last one at the end.
$ echo '"-}{"' | sed 's/-}{/-} \n {/'
"-}
{"
It's not possible with tr; tr does single-character translation only. If you ran tr -- '-}{' '\n', tr would replace each of -, } and { with a newline.
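A quick demonstration of why the tr approach falls apart (GNU tr assumed; when SET2 is shorter it is padded by repeating its last character):

```shell
# Every occurrence of '-', '}' or '{' becomes its own newline,
# so the two characters '-}' turn into two separate line breaks.
echo '"-}{"' | tr -- '-}{' '\n'
# prints '"', two blank lines, then '"'
```

Note the -- is needed because the first set starts with a - character.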
This might work for you (GNU sed):
sed 'G;:a;s/-}\({.*\(.\)\)/-}\2\1/;ta;s/.$//' file
Append a newline to the current line.
Use pattern matching to insert the newline between -} and { repeatedly.
When all is done, remove the introduced newline.
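Tracing it on the sample (GNU sed; within the pattern space, . also matches the appended newline, which is what lets the rotation trick work):

```shell
echo '"-}{"' | sed 'G;:a;s/-}\({.*\(.\)\)/-}\2\1/;ta;s/.$//'
# "-}
# {"
```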
You can use () to capture the delimiters and do this:
echo '"-}{" -} -}{' | sed -E 's/(-})({)/\1\n\2/g'
"-}
{" -} -}
{
I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique combinations of the first column, the first 7 characters of the second column, and the third column, when the third column matches the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code but looks like its not the correct way to do it.
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
sorry I am very new to shell scripting
1st solution: With your shown sample, please try the following, written and tested with GNU awk.
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: OR in case your awk doesn't support NF-- then try following.
awk '
BEGIN{
FS=OFS=":"
}
{
$2=substr($2,1,7)
}
!arr[$1,$2]++ && $3~/19\.0\.0/{
$4=""
sub(/:$/,"")
print
}
' Input_file
Explanation: Set the field separator and output field separator to :. In the main program, set the 2nd field to the first 7 characters of its value. Then, if the ($1,$2) combination hasn't occurred before and the 3rd field contains 19.0.0, drop the last field and print the line.
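To try the 1st solution on the sample without creating Input_file, the data can be fed through a pipe (GNU awk assumed for the NF-- trick):

```shell
printf '%s\n' \
  'hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N' \
  'hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N' \
  'hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N' \
  'hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N' |
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}'
# hostname01:DBNAME1:/oracle_home/A_19.0.0.0
# hostname02:DBNAME2:/oracle_home/B_19.0.0.0
```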
You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate associative array seen only if we find 19.0.0 in $3.
If the input can contain lines like these, both matching 19.0.0:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is used as the uniqueness key, you might miss a line.
You could match the pattern using sed, with 2 capture groups for the parts you want to keep and a match for the parts you don't want.
Then pipe the output to uniq to keep only unique adjacent lines, rather than deduplicating on the first column alone.
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
(FYI, there is no empty line between two numbers as it appears above; that's just an artifact of the editor here. It's just a column with all the numbers one below the other.)
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr -d '\n' < file.txt
To remove the last comma you can then pipe through sed 's/,$//'.
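For example, on a short sample, deleting every newline and then stripping the trailing comma:

```shell
# tr -d '\n' joins all lines; sed removes the comma left at the end.
printf '%s\n' '1379545632,' '1051908588,' '229102020,' |
  tr -d '\n' | sed 's/,$//'
# 1379545632,1051908588,229102020
```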
With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
awk 'BEGIN { ORS="" } { print }' file
ORS : Output Record separator.
Each Record will be separated with this delimiter.
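With ORS empty the trailing comma survives, so combine it with the sed from the answer above to strip it:

```shell
# ORS="" joins all records; sed removes the leftover trailing comma.
printf '%s\n' '1379545632,' '1051908588,' '229102020,' |
  awk 'BEGIN{ORS=""}{print}' | sed 's/,$//'
# 1379545632,1051908588,229102020
```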
I know this should be pretty simple to do, but I can't get it to work. My file looks like this
>c12345|random info goes here that I want to delete
AAAAATTTTTTTTCCCC
>c45678| more | random info| here
GGGGGGGGGGG
And what I want to do is just make this far simpler so it might look like this
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGGG
>seq3 [organism=human]
etc....
I know I can append that constant easily once I get the indexed part in there by doing:
sed '/^>/ s/$/ [organism=human]/'
But how do I get that index built?
With sed:
sed '/^>/d' filename | sed '=' | sed 's/^[0-9]*$/>seq& [organism=human]/'
(Thanks to NeronLeVelu for the simplification.)
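Step by step on the sample: the first sed drops the old headers, the second prints a line number before every remaining line, and the third turns those number-only lines into new headers:

```shell
printf '%s\n' '>c12345|random info goes here' 'AAAAATTTTTTTTCCCC' \
  '>c45678| more | random info| here' 'GGGGGGGGGGG' |
  sed '/^>/d' | sed '=' | sed 's/^[0-9]*$/>seq& [organism=human]/'
# >seq1 [organism=human]
# AAAAATTTTTTTTCCCC
# >seq2 [organism=human]
# GGGGGGGGGGG
```

This assumes no sequence line consists only of digits, which holds for DNA data.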
Here's one way you could do it using awk:
$ awk '/^>/ { $0 = ">seq" ++i " [organism=human]" } 1' file
>seq1 [organism=human]
AAAAATTTTTTTTCCCC
>seq2 [organism=human]
GGGGGGGGGGG
When the line begins with >, replace the whole line with >seq followed by i (which increases by 1 every time) and then [organism=human]. The 1 at the end of the command is true, so awk performs the default action, which is to print the line.
Might be easier with a Perl one-liner:
perl -ne 'chomp; if (/^>/) { s/\|.*$//; print "$_ \[organism=human\]\n";} else { print "$_\n";}' filename
I have a blob of text like this:
abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....
Can you guys help me in replacing the 4th comma (,) with a newline using awk or any unix (mac) magic!
To replace the 4th occurrence of , you can use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/,/\n/4'
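Which, on the sample, breaks the line only at the fourth comma (GNU sed, where \n in the replacement is a newline):

```shell
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/,/\n/4'
# abcd,def,geff,hij
# klmn,nop,qrs,tuv,wxyz,....
```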
To replace every 4th occurrence use:
echo "abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,...." | sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
To change only the 4th comma:
sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/'
(Note: rush shows a much cooler way to do this: s/,/\n/4.)
To change every 4th comma, add the g flag:
$ echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' |\
> sed 's/\(\([^,]*,\)\{3\}[^,]*\),/\1\n/g'
abcd,def,geff,hij
klmn,nop,qrs,tuv
wxyz,....
Here's a sed reference.
In a nutshell, the command finds the pattern
(( non-commas, then a comma ) three times, then non-commas) followed by a comma
and changes it to whatever is in the outer brackets, plus a newline.
It works because the default command run by xargs is /bin/echo.
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
echo 'abcd,def,geff,hij,klmn,nop,qrs,tuv,wxyz,....' | xargs -d, -n4 | tr ' ' ','