transpose a column in unix - unix

I have a Unix file which has data like this.
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
(FYI, there is no empty line between two numbers as you see above. It's just because of the editor here. It's just a column with all the numbers one below the other.)
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.

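tr -d '\n' < file.txt | sed 's/,$//'
tr -d '\n' deletes the newlines, so the values are joined without spaces (tr '\n' ' ' would leave a space after every comma). sed 's/,$//' then removes the last comma.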
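A minimal sketch of the tr and sed combination on a few of the sample values. The variable names sample and joined are only illustrative, and the data is inlined instead of being read from file.txt. Newlines are deleted rather than replaced with spaces so that the result matches the requested format.

```shell
# Three of the sample values, inlined instead of being read from file.txt.
sample='1379545632,
1051908588,
229102020,'

# tr -d '\n' deletes every newline, so the lines are joined directly;
# sed then strips the single comma left at the end of the joined line.
joined=$(printf '%s\n' "$sample" | tr -d '\n' | sed 's/,$//')
printf '%s\n' "$joined"
```

The final printf adds the newline back, since tr has removed all of them.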

With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z

awk 'BEGIN { ORS="" } { print }' file
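ORS is the output record separator: awk writes it after every record it prints.
Setting it to the empty string joins the lines. Note that this alone keeps the last comma and prints no final newline.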
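One possible variant, sketched here under the assumption that every input line ends with a comma: instead of changing ORS, strip the comma from each line and write the separator before every line except the first. The names sample, joined and sep are illustrative only.

```shell
# Stand-in input; every line ends with a comma, as in the question.
sample='x,
y,
z,'

# sub() removes the trailing comma of each line. sep is empty for the
# first line and "," afterwards, so no comma is left at the end.
# The END block adds the final newline.
joined=$(printf '%s\n' "$sample" |
  awk '{ sub(/,$/, ""); printf "%s%s", sep, $0; sep = "," } END { print "" }')
printf '%s\n' "$joined"
```

This uses only POSIX awk features, so it does not depend on GNU awk's multi-character RS.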

Related

unix ksh how to print $1 and first n characters of $2

I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique values of the first column, the first 7 characters of the second column, and the third column, when the third column contains the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code, but it looks like it's not the correct way to do it.
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
Sorry, I am very new to shell scripting.
1st solution: With your shown samples, please try the following, written and tested with GNU awk.
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: In case your awk doesn't support NF--, try the following.
awk '
BEGIN{
  FS=OFS=":"
}
{
  $2=substr($2,1,7)
}
!arr[$1,$2]++ && $3~/19\.0\.0/{
  $4=""
  sub(/:$/,"")
  print
}
' Input_file
Explanation: Set the field separator and the output field separator to :. In the main program, set the 2nd field to its first 7 characters. Then, if the combination of the 1st and 2nd fields has not occurred before and the 3rd field matches 19.0.0, drop the last field and print the line.
You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/oratab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate associative array seen only if we find 19.0.0 in $3.
If the lines can look like this, with different values following 19.0.0,
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is required to be unique, you might miss a line.
You could match the pattern using sed, with 2 capture groups for the parts you want to keep, and match the parts you don't want outside the groups.
Then pipe the output to uniq to get all unique lines instead of deduplicating on the first column only. Note that uniq only removes adjacent duplicates, so sort first if identical lines may not be next to each other.
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
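The difference between deduplicating on the first column and on the whole output line can be sketched with awk as well. This is only an illustration: the sample rows (including the 19.0.0.1 home and the repeated last row) and the names oratab_sample, dedup_rows and row are assumptions, not taken from a real oratab.

```shell
# Hypothetical rows: hostname01 has two different homes, and the first
# row is repeated at the end (a non-adjacent duplicate).
oratab_sample='hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.1:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N'

dedup_rows() {
  awk 'BEGIN { FS = OFS = ":" }
       $3 ~ /19\.0\.0/ {
         # Build the output row first, then use the whole row as the key.
         row = $1 OFS substr($2, 1, 7) OFS $3
         if (!seen[row]++) print row
       }'
}

result=$(printf '%s\n' "$oratab_sample" | dedup_rows)
printf '%s\n' "$result"
```

Keying on the full row keeps both hostname01 lines, and the seen array also catches duplicates that are not adjacent, which a plain uniq would not.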
$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0

awk/sed/grep to search for substring within string of second semicolon separated part/column and return only first part/column plus the substring

I have a Unix file containing semicolon separated records like below, having 2nd part/column a string with comma separated values, like below:
789651234512;TEST-10=5,TEST-136=6,TEST-3=1,TEST-4=2,TEST-5=3,TEST-9=4,TEST-9013=100
132567123784;TEST-3=1,TEST-136=5,TEST-15=4,TEST-4=2,TEST-5=3
132564013784;TEST-3=1,TEST-15=4,TEST-4=2,TEST-5=8
132496583212;TEST-13=4,TEST-136=7,TEST-23=1,TEST-6=2,TEST-5=3,TEST-4=5,TEST-6=11
I want to find all TEST-136=X entries, when they exist, where X can be any integer of 1 to 3 digits, and return them like this for the above example:
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
I am using the below awk, but that returns whole string of 2nd part/column:
awk -F'[;]' '/TEST-136/{ print $1";"$2 }' file.txt
However, I need to get only the 1st part/column and also the TEST-136=X part of the 2nd part/column, as said.
This assumes ONE match per line/record.
$ awk -F';' 'match($0, /TEST-136=[[:digit:]]+/) {print $1, substr($0,RSTART,RLENGTH)}' OFS=';' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
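If a record could contain more than one TEST-136=X entry, the same match()/RSTART/RLENGTH idea can be repeated in a loop. The following is a sketch under that assumption; the second TEST-136 entry in the sample and the names records, extract_all and rest are made up for illustration.

```shell
# First record carries two TEST-136 entries (hypothetical); the second has none.
records='789651234512;TEST-136=6,TEST-3=1,TEST-136=42
132564013784;TEST-3=1,TEST-15=4'

extract_all() {
  awk -F';' '{
    rest = $2
    # Keep matching in the unscanned remainder of the 2nd column.
    while (match(rest, /TEST-136=[0-9]+/)) {
      print $1 ";" substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}

result=$(printf '%s\n' "$records" | extract_all)
printf '%s\n' "$result"
```

Each match is printed on its own line with the first column repeated; records without a match produce no output.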
This might work for you (GNU sed):
sed -En 's/^([^;]*;).*(TEST-136=[^,]*).*/\1\2/p' file
Simple Perl,
$ perl -F";" -lane ' /(TEST-136=\w+)/ and print "$F[0];$1" ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$
Another awk
$ awk -F"[;,]" ' { for(i=2;i<=NF;i++) if($i~/TEST-136/) print $1 ";" $i } ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$

How to use Awk to filter rows using a column value under double quotes

"A","B",123,"C","AAB"
"A","BB",234,"CC","BA"
"AA","B",123,"CC","CBB"
"AA","BB",213,"C","CCA"
I want to get those rows where $1 == AA
awk 'BEGIN { FS = ","; OFS = FS;} {if ($1=="AA") print}'
but it's not working. It works if the data is not in double quotes.
Just match the literal " with an escape character. This is the straightforward filter to match the literal "AA" in the first column. Since awk works on a pattern { action } basis, the test for whether the first column is "AA" can be written as a bare pattern, without an explicit { print }.
When the condition is true for a line, awk applies the default action, just as in awk 1 file, and the line is printed.
awk -v FS=, '$1=="\"AA\""' file
Also, you can avoid escapes, by putting the match string in a variable under single-quotes and let it match the variable
awk -v FS=, -v m='"AA"' '$1==m' file
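A small sketch running the variable-based filter on the sample rows from the question. The data is inlined in a shell variable (csv, an illustrative name) rather than read from a file.

```shell
# The four sample rows from the question.
csv='"A","B",123,"C","AAB"
"A","BB",234,"CC","BA"
"AA","B",123,"CC","CBB"
"AA","BB",213,"C","CCA"'

# m holds the quotes as part of the value, so $1 is compared
# against the five-character string "AA" including both quotes.
result=$(printf '%s\n' "$csv" | awk -F, -v m='"AA"' '$1 == m')
printf '%s\n' "$result"
```

Only the two rows whose first field is exactly "AA" (quotes included) are printed; "A" does not match.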
The following awk may also help.
awk -F, '{val=$1;gsub(/\"/,"",val)} val=="AA"' Input_file
2nd solution:
awk -F"[\",]" '$2=="AA"' Input_file

Enclose columns containing alphabets with single quotes using awk

Can awk process this?
Input
Neil,23,01-Jan-1990
25,Reena,19900203
Output
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
awk approach:
awk -F, '{for(i=1;i<=NF;i++) if($i~/[[:alpha:]]/) $i="\047"$i"\047"}1' OFS="," file
The output:
'Neil',23,'01-Jan-1990'
25,'Reena',19900203
if($i~/[[:alpha:]]/) - if field contains alphabetic character
\047 - octal code of single quote ' character
My first attempt was incorrect:
sed -r 's/([^,]*[a-zA-Z]+[^,]*)(,{0,1})/"\1"\2/g' inputfile
@Sundeep gave an excellent comment: I need single quotes, and it can be shorter.
I tried to match including the , or the end of the line, which made the match more complex. You can just match between the separators, making sure there is an alphabetic character somewhere.
sed 's/[^,]*[a-zA-Z][^,]*/\x27&\x27/g' inputfile
You might use this script:
script.awk
BEGIN { OFS=FS="," }
{ for(i= 1; i<=NF; i++) {
    if( !match( $i, /^[0-9]+$/ ) ) $i = "'" $i "'"
  }
  print
}
and run it like this: awk -f script.awk yourfile
Explanation
the first line sets the input and output field separators to ,
the loop tests whether each field contains only digits (/^[0-9]+$/):
if not, the field is put in quotes
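The two tests used in the answers above ("contains a letter" versus "is not all digits") agree on the sample input but can differ on other data. The sketch below compares them on a row with an extra, hypothetical fourth field 2020-01-01; the function names quote_alpha and quote_nondigit are illustrative.

```shell
# Sample row from the question plus a hypothetical date-like field.
row='Neil,23,01-Jan-1990,2020-01-01'

# Quote a field only if it contains an alphabetic character.
quote_alpha() {
  awk -F, -v OFS=, -v q="'" \
    '{ for (i = 1; i <= NF; i++) if ($i ~ /[[:alpha:]]/) $i = q $i q } 1'
}

# Quote a field unless it consists of digits only.
quote_nondigit() {
  awk -F, -v OFS=, -v q="'" \
    '{ for (i = 1; i <= NF; i++) if ($i !~ /^[0-9]+$/) $i = q $i q } 1'
}

alpha_out=$(printf '%s\n' "$row" | quote_alpha)
nondigit_out=$(printf '%s\n' "$row" | quote_nondigit)
printf '%s\n%s\n' "$alpha_out" "$nondigit_out"
```

The first variant leaves 2020-01-01 unquoted because it contains no letter, while the second quotes it because of the hyphens. Which one is right depends on what the quoted output is meant for.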

Replacing a String Pattern with another sequence in unix

I want to replace the string TaskID_1 with a sequence starting from 1001; TaskID_1 can occur on any number of lines in my input file.
Similarly, I need to replace all occurrences of TaskID_2 in my input file with the next sequence value, 1002.
Input file:
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
The output file should look like:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
Here's one way using awk:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR }1' file
Or less verbosely:
awk -F '|' '{ $3=1000 + NR }1' OFS='|' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1002|dksj|kdjfdsjf|12
1234|25345|1003|dksj|kdjfdsjf|12
123425|65345|1004|dksj|kdjfdsjf|12
123425|15325|1005|dksj|kdjfdsjf|12
11345|55315|1006|dksj|kdjfdsjf|12
6345|15345|1007|dksj|kdjfdsjf|12
72345|25345|1008|dksj|kdjfdsjf|12
9345|411345|1009|dksj|kdjfdsjf|12
For the first example, the field separator and output field separator are set to a single pipe character. This is done in the BEGIN block, so that it is executed only once, and not on every line of input. We then set the third column to 1000 plus an incrementing variable. We could use ++i for this, but NR (the current record number, i.e. the line number) already does the job and avoids creating an extra variable. The 1 at the end is an always-true pattern that triggers the default action, printing the line. A more verbose solution would look like:
awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR; print }' file
EDIT:
Using the updated data file, try:
awk 'BEGIN { FS=OFS="|" } { sub(/.*_/,"",$3); $3+=1000 }1' file
Results:
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
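If the sequence should follow the order in which each distinct task ID first appears, rather than the number after the underscore, an associative array can hand out the values. This is a sketch of that alternative reading of the question; TaskID_7 is a hypothetical label added to show the difference, and the names tasks, renumber, id and n are illustrative.

```shell
# TaskID_7 is hypothetical: it is the third distinct ID seen,
# so it receives 1003 rather than 1007.
tasks='12345|45345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
6345|15345|TaskID_7|dksj|kdjfdsjf|12'

renumber() {
  awk 'BEGIN { FS = OFS = "|" }
       # First time an ID is seen: assign the next sequence value.
       !($3 in id) { id[$3] = 1000 + (++n) }
       # Replace the ID with its assigned value and print the line.
       { $3 = id[$3] }
       1'
}

result=$(printf '%s\n' "$tasks" | renumber)
printf '%s\n' "$result"
```

For the input in the question both approaches give the same output, because the IDs first appear in numeric order.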
A Perl solution using Steve's logic of adding 1000:
perl -pne 's/TaskID_(\d+)/$1+1000/e;' file
This replaces the 'TaskID_n' with 1000+n. 'e' is used to evaluate the replacement.
Replacing TaskID_ with 100 is super easy with sed for single-digit IDs:
$ sed 's/TaskID_/100/' file
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
To store this change back to the file use the -i option:
sed -i 's/TaskID_/100/' file
Note: this works for TaskID_[0-9] only. If you want TaskID_23 mapped to 1023, this won't do it: it maps TaskID_23 to 10023.
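The multi-digit case from the note can be checked with a one-line input. The sketch compares the textual sed substitution with an arithmetic awk version; the variable names line, with_sed and with_awk are illustrative.

```shell
# A single record with a two-digit task ID, as in the note above.
line='1234|25345|TaskID_23|dksj|kdjfdsjf|12'

# Textual replacement: "TaskID_" becomes "100", giving 10023.
with_sed=$(printf '%s\n' "$line" | sed 's/TaskID_/100/')

# Arithmetic: strip the prefix from field 3, then add 1000, giving 1023.
with_awk=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = OFS = "|" } { sub(/^TaskID_/, "", $3); $3 += 1000 } 1')

printf '%s\n%s\n' "$with_sed" "$with_awk"
```

sed has no arithmetic, so once IDs can have more than one digit a tool that can add numbers (awk, perl, or shell arithmetic) is the safer choice.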
I can't come up with a better solution than the one steve suggested in awk.
So here's a worse solution, using only bash.
#!/bin/bash
# Split each line on | and add 1000 to the number after the underscore in field 3.
while IFS='|' read -r f1 f2 f3 f4 f5 f6; do
  printf '%s|%s|%d|%s|%s|%s\n' "$f1" "$f2" "$((${f3#*_}+1000))" "$f4" "$f5" "$f6"
done < input
It's "worse" only because it'll be much slower than awk, which is fast and efficient with this sort of problem.
perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' your_file
Tested below:
> cat temp
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
> perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' temp
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
>
