unix ksh: how to print $1 and the first n characters of $2

I have a file as follows:
$ cat /etc/oratab
hostname01:DBNAME11:/oracle_home/A_19.0.0.0:N
hostname01:DBNAME1_DC:/oracle_home/A_19.0.0.0:N
hostname02:DBNAME21:/oracle_home/B_19.0.0.0:N
hostname02:DBNAME2_DC:/oracle_home/B_19.0.0.0:N
I want to print the unique values of the first column, the first 7 characters of the second column, and the third column, when the third column matches the string "19.0.0".
The output I want to see is:
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
I put together this piece of code, but it looks like it's not the correct way to do it:
cat /etc/oratab|grep "19.0.0"|awk '{print $1}' || awk -F":" '{print subsrt($2,1,8)}
Sorry, I am very new to shell scripting.

1st solution: with your shown sample, please try the following, written and tested with GNU awk.
awk 'BEGIN{FS=OFS=":"} {$2=substr($2,1,7)} !arr[$1,$2]++ && $3~/19\.0\.0/{NF--;print}' Input_file
2nd solution: or, in case your awk doesn't support NF--, try the following.
awk '
BEGIN {
  FS = OFS = ":"
}
{
  $2 = substr($2, 1, 7)
}
!arr[$1,$2]++ && $3 ~ /19\.0\.0/ {
  $4 = ""
  sub(/:$/, "")
  print
}
' Input_file
Explanation: a simple explanation would be: set the field separator and output field separator to :. Then, in the main program, set the 2nd field to the first 7 characters of its value. Then check whether the combination is unique (didn't occur before) and the 3rd field matches 19.0.0; if so, drop the last field and print that line.

You may try this awk:
awk 'BEGIN{FS=OFS=":"} $3 ~ /19\.0\.0/ && !seen[$1]++ {
print $1, substr($2,1,7), $3}' /etc/fstab
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
We check and populate the associative array seen only if we find 19.0.0 in $3.

If the lines can be like this, both ending in a 19.0.0 match,
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname01:DBNAME1:/oracle_home/A_19.0.0.1
and only hostname01 is used as the uniqueness key, you might miss a line.
You could match the pattern using sed with 2 capture groups for the parts you want to keep, matching what you don't want in between.
Then pipe the output to uniq to get all unique lines, instead of keying uniqueness on the first column alone.
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq
Output
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0
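For readability, here is the same sed expression annotated piece by piece (nothing new, just comments):
# -n suppresses automatic printing; -E enables extended regular expressions
# ^([^:]+:.{7})            group 1: the 1st field, its colon, and the first
#                          7 characters of the 2nd field
# [^:]*                    the rest of the 2nd field (dropped)
# (:[^:]*19\.0\.0[^:]*)    group 2: the 3rd field, which must contain 19.0.0
# .*                       everything after the 3rd field (dropped)
sed -nE 's/^([^:]+:.{7})[^:]*(:[^:]*19\.0\.0[^:]*).*/\1\2/p' file | uniq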

$ awk 'BEGIN{FS=OFS=":"} index($3,"19.0.0"){print $1, substr($2,1,7), $3}' file | sort -u
hostname01:DBNAME1:/oracle_home/A_19.0.0.0
hostname02:DBNAME2:/oracle_home/B_19.0.0.0

Related

awk/sed/grep to search for substring within string of second semicolon separated part/column and return only first part/column plus the substring

I have a Unix file containing semicolon-separated records, where the 2nd part/column is a string with comma-separated values, like below:
789651234512;TEST-10=5,TEST-136=6,TEST-3=1,TEST-4=2,TEST-5=3,TEST-9=4,TEST-9013=100
132567123784;TEST-3=1,TEST-136=5,TEST-15=4,TEST-4=2,TEST-5=3
132564013784;TEST-3=1,TEST-15=4,TEST-4=2,TEST-5=8
132496583212;TEST-13=4,TEST-136=7,TEST-23=1,TEST-6=2,TEST-5=3,TEST-4=5,TEST-6=11
I want to find every TEST-136=X, when it exists, where X can be any integer number from 1 up to 3 digits, and return them like this for the above example:
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
I am using the below awk, but that returns the whole string of the 2nd part/column:
awk -F'[;]' '/TEST-136/{ print $1";"$2 }' file.txt
However, I need to get only the 1st part/column and also the TEST-136=X part of the 2nd part/column, as said.
This assumes ONE match per line/record.
$ awk -F';' 'match($0, /TEST-136=[[:digit:]]+/) {print $1, substr($0,RSTART,RLENGTH)}' OFS=';' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
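If a record could ever carry more than one TEST-136 entry, the same match()/RSTART/RLENGTH idea extends with a loop over the remainder of the line (a sketch for that hypothetical input, not needed for the sample shown):
awk -F';' '{
  rest = $0
  while (match(rest, /TEST-136=[[:digit:]]+/)) {     # find the next occurrence
    print $1 ";" substr(rest, RSTART, RLENGTH)       # key + the matched pair
    rest = substr(rest, RSTART + RLENGTH)            # continue after the match
  }
}' kostas.txt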
This might work for you (GNU sed):
sed -En 's/^([^;]*;).*(TEST-136=[^,]*).*/\1\2/p' file
Simple Perl:
$ perl -F";" -lane ' /(TEST-136=\w+)/ and print "$F[0];$1" ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$
Another awk
$ awk -F"[;,]" ' { for(i=2;i<=NF;i++) if($i~/TEST-136/) print $1 ";" $i } ' kostas.txt
789651234512;TEST-136=6
132567123784;TEST-136=5
132496583212;TEST-136=7
$

How to use Awk to filter rows using a column value under double quotes

"A","B",123,"C","AAB"
"A","BB",234,"CC","BA"
"AA","B",123,"CC","CBB"
"AA","BB",213,"C","CCA"
I want to get those rows where $1 == AA
awk 'BEGIN { FS = ","; OFS = FS;} {if ($1=="AA") print}'
but it's not working. It works if the data is not in double quotes.
Just match the literal " with an escape character. This is a straight-forward filter to match the literal "AA" on the first column. Since awk works on a pattern { action } basis, the condition checking whether the first column is "AA" can be used directly, without an explicit { print }.
If the condition is met for a line, it acts as a true pattern with no action, just as in awk 1 file, in which case the line is printed.
awk -v FS=, '$1=="\"AA\""' file
Also, you can avoid the escapes by putting the match string in a variable under single quotes and matching against the variable:
awk -v FS=, -v m='"AA"' '$1==m' file
The following awk may help you with the same.
awk -F, '{val=$1;gsub(/\"/,"",val)} val=="AA"' Input_file
2nd solution:
awk -F"[\",]" '$2=="AA"' Input_file

Find the Record with Null Values and Display the Column Names using Unix

I have the following input file and need to find which fields are null, then display the key column and the name of the null-value column.
Note: in the future, new fields might be added too.
Input.txt
Keyfeild1|Over|Loan|cc|backup
200|12||0|
100||15|1|200
100|100|100|100|100
50||50||11
ExpectedOutput.txt:
200|Loan
200|backup
100|Over
50|Over
50|cc
Command used:
cat Input.txt | awk -F"|" '{for(i=1;i<=NF;i++) if($i=="") { print $1"|"i} }'
Achieved Output:
200|3
200|5
100|2
50|2
50|4
The following awk may help you with the same.
awk -F"|" 'FNR>1{for(i=2;i<=NF;i++){if($i==""){print $1,"field"i}}}' OFS="|" Input_file
Output will be as follows:
200|field3
200|field5
100|field2
50|field2
50|field4
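The ExpectedOutput.txt in the question actually uses the header names (Over, Loan, ...) rather than field numbers, and the note says new fields may be added later. Reading the names from the header row keeps the same idea generic; a sketch along those lines:
awk -F"|" '
FNR==1 { for (i=2; i<=NF; i++) name[i] = $i; next }   # remember the header names
{ for (i=2; i<=NF; i++) if ($i == "") print $1, name[i] }
' OFS="|" Input_file
With the sample Input.txt this prints the column names, matching ExpectedOutput.txt.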

transpose a column in unix

I have a Unix file which has data like this:
1379545632,
1051908588,
229102020,
1202084378,
1102083491,
1882950083,
152212030,
1764071734,
1371766009,
I want to transpose it and print as a single line.
Like this:
1379545632,1051908588,229102020,1202084378,1102083491,1882950083,152212030,1764071734,1371766009
Also remove the last comma.
Can someone help? I need a shell/awk solution.
tr '\n' ' ' < file.txt
To remove the last comma you can try sed 's/,$//'.
With GNU awk for multi-char RS:
$ printf 'x,\ny,\nz,\n' | awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1'
x,y,z
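In case that one-liner looks cryptic, the same command annotated:
# RS='^$' makes gawk read the whole file as a single record (a GNU awk
#         slurp idiom), so one gsub() sees every line at once
# /\n|(,\n$)/ deletes each newline; at the end of the record the ,\n$
#         branch also eats the comma before the final newline
# 1       is an always-true pattern with no action, so the record is printed
awk -v RS='^$' '{gsub(/\n|(,\n$)/,"")} 1' file.txt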
awk 'BEGIN { ORS="" } { print }' file
ORS : Output Record separator.
Each Record will be separated with this delimiter.
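The same idea can also handle the trailing comma in a single pass, by stripping each line's own comma and re-inserting one only between records (a sketch):
awk '{ sub(/,$/, "") }          # drop this line's trailing comma
     NR > 1 { printf "," }      # re-insert a comma between records
     { printf "%s", $0 }        # print the number itself, no newline
     END { print "" }           # finish with a single newline
' file.txt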

Maximum number of characters in a field of a csv file using unix shell commands?

I have a csv file. In one of the fields, say the second field, I need to know the maximum number of characters in that field. For example, given the file below:
adf,jlkjl,lkjlk
jf,j,lkjljk
jlkj,lkejflkj,adfafef,
jfje,jj,lkjlkj
jjee,eeee,ereq
the answer would be 8 because row 3 has 8 characters in the second field. I would like to integrate this into a bash script, so common unix command line programs are preferred. Imaginary bonus points for explaining what the command is doing.
EDIT: Here is what I have so far:
cut --delimiter=, -f 2 test.csv | wc -m
This gives me the character count for all of the fields, not just one, so I still have progress to make.
I would use awk for the task. It uses a comma to split each line into fields and, for each line, checks whether the length of the second field is bigger than the value already saved.
awk '
BEGIN {
  FS = ","
}
{ c = length( $2 ) > c ? length( $2 ) : c }
END {
  print c
}
' infile
Use it as a one-liner and assign the return value to a variable, like:
num=$(awk 'BEGIN { FS = "," } { c = length( $2 ) > c ? length( $2 ) : c } END { print c }' infile)
Well @oob, you basically provided the answer with your last edit, and it's the simplest of all the answers given. However, I also like @Birei's answer, just because I enjoy AWK. :-)
I too had to find the longest possible value for a given field inside a text file today. Tested with your sample and got the expected 8.
cut -d, -f2 test.csv | wc -L
As you see, it is just a matter of using the correct option for wc (which I hope you have already figured out by now).
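Note that wc -L is a GNU coreutils extension. Where it is unavailable, the same measurement can be done portably with awk, essentially @Birei's approach applied to the cut output (a sketch):
cut -d, -f2 test.csv | awk '{ if (length($0) > m) m = length($0) } END { print m }'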
My solution is to loop over the lines. Then I exchange the commas for newlines to loop over the words, then I check which is the longest word and save the data.
#!/bin/bash
lineno=1
matchline=0
matchlen=0
for line in $(cat input.txt); do
    # split the line's comma-separated values into one word per line
    words=$(echo "$line" | sed -e 's/,/\n/g')
    for word in $words; do
        # echo "line: $lineno; length: ${#word}; input: $word"
        if [ "$matchlen" -lt "${#word}" ]; then
            matchlen=${#word}
            matchline=$lineno
        fi
    done
    lineno=$((lineno + 1))
done
echo "max length is $matchlen in line $matchline"
Bash and Coreutils Solution
There are a number of ways to solve this, but I vote for simplicity. Here's a solution that uses Bash parameter expansion and a few standard shell utilities to measure each line:
cut -d, -f2 /tmp/foo |
while read; do
    echo ${#REPLY}
done | sort -n | tail -n1
The idea here is to split the CSV file, and then use the parameter length expansion of the implicit REPLY variable to measure the characters on each line. When we sort the measurements numerically (hence sort -n; a plain text sort would order "9" after "10" and report the wrong maximum), the last line of the sorted output will hold the length of the longest field found.
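The ${#name} length expansion used above is plain Bash parameter expansion; a tiny illustration with the sample's longest field:
s='lkejflkj'
echo "${#s}"    # prints 8, the number of characters in $s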
cut out the desired column
print each line length
sort the line lengths
grab the max line length
cut -d, -f2 test.csv | awk '{print length($0);}' | sort -n | tail -n 1
