Adding Double quotes to last value - unix

I have data in the below format.
abc ssg,"-149,684.58","-149,469.05",-215.53
efg sfg,-80.99,-77.46,-3.53
hij sf,"4,341.23","4,131.90",209.33
kilm mm,"2,490,716.13","-180,572.48","9,223.06"
I want to use perl or unix tools to add double quotes around the final value on each line that does not already have them.
The output should look as below:
abc ssg,"-149,684.58","-149,469.05","-215.53"
efg sfg,-80.99,-77.46,"-3.53"
hij sf,"4,341.23","4,131.90","209.33"
kilm mm,"2,490,716.13","-180,572.48","9,223.06"

This might work for you:
gawk -F ',' -v Q='"' 'BEGIN {OFS=FS} $NF !~ Q {gsub(/.*/,Q $NF Q,$NF); print ; next} 1' INPUTFILE
-F ',' sets the input field separator
-v Q='"' sets the Q variable to ", this helps to avoid some escaping problems
BEGIN {OFS=FS} sets the output field separator to the same as the input
$NF !~ Q if the last field does not match Q (i.e. contains no "), then
gsub(/.*/,Q $NF Q,$NF) wraps the last field in " quotes
print prints the line
next skips the remaining rules and moves on to the next line
1 executes the default print action for every other line (where the last field already contains ")
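A simpler variant along the same lines, assuming you only ever need to wrap the whole last field, is to assign it directly instead of using gsub:

```shell
# Wrap the last comma-separated field in double quotes unless it already has them.
printf '%s\n' 'abc ssg,"-149,684.58","-149,469.05",-215.53' 'efg sfg,-80.99,-77.46,-3.53' |
awk -F',' -v Q='"' 'BEGIN{OFS=FS} $NF !~ Q {$NF = Q $NF Q} 1'
```

Because awk splits on every comma, the quoted fields are themselves split internally, but rejoining with OFS=FS leaves them untouched; only the last field changes.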

OFS when using if-else statement in awk

I have a simple text file, delimited by multiple spaces, with a varying number of columns (5 or 6).
For the rows with more than 5 columns, I am trying to combine the last 2 columns into one, doing:
cat data.txt | awk '{if(NF>5) print $1,$2,$3,$4,$5"_"$6; else print $0} OFS="," ' > data.csv
The problem is that the OFS is not working for the else statement.
Example - input:
a d e t er ap
b q j n mm
Output that I am getting:
a,d,e,t,er_ap
b q j n mm
Desirable output:
a,d,e,t,er_ap
b,q,j,n,mm
Any suggestions?
Set your OFS in the BEGIN block so that it's a comma before any processing happens. Also when you do print $0 without manipulating the line in any way, awk will just spit out the line as-is with whatever delimiters are in place in the source file. Personally I think that's dumb, but that's awk. As a workaround, just set one column equal to itself, then print:
awk 'BEGIN{OFS=","}{if(NF>5) print $1,$2,$3,$4,$5"_"$6; else {$1=$1;print $0}}' data.txt
If you anticipate more than 6 columns, you can have it join everything after column 5 with underscores using some printf trickery:
awk '{for (i=1;i<=NF;i++){printf (i==NF)?"%s\n":(i>=5)?"%s_":"%s,", $i}}' data.txt
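Run against the sample input, the first answer's $1=$1 trick forces awk to rebuild the record with the new OFS:

```shell
# $1=$1 touches a field, so awk rebuilds $0 using OFS="," before printing.
printf '%s\n' 'a d e t er ap' 'b q j n mm' |
awk 'BEGIN{OFS=","}{if(NF>5) print $1,$2,$3,$4,$5"_"$6; else {$1=$1; print $0}}'
```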

Unix - print distinct field lengths

I would like to print the total number of times each length occurs in a field.
The column type is varChar and the strings in that field are either 9, 10, or 15 characters long. I want to know how many exist of each length.
My code:
awk -F'|'
'NR>1 $61!="" &&
if /length($61)=15/ then {a++}
elif /length($61)=10/ then {b++}
else /length($61)=9/ then {c++}
fi {print a ", " b ", " c}'
ERROR:
awk -F'|' 'NR>1 $61!="" && if /length($61)=15/ then {a++} elif /length($61)=10/ then {b++} else /length($61)=9/ then {c++} fi {print a ", " b ", " c}'
Syntax Error The source line is 1.
The error context is
NR>1 >>> $61!= <<<
awk: 0602-500 Quitting The source line is 1.
INPUT
A pipe delimited .sqf file with 1.2 million rows and column 61 is varChar 15.
Based on your pseudo-code, I guess you want:
awk -F'|' -v OFS=', ' 'NR>1 {count[length($61)]++}
END {print count[15],count[10],count[9]}' file
The array will also hold counts for any other lengths that appear, which is handy as a data-quality check.
If you want 0 instead of an empty string for missing counts, change each to count[n]+0 as suggested in the comments.
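A self-contained sketch of the same idea, counting lengths in column 1 instead of column 61 so the sample data fits on a line (the values here are made up for illustration):

```shell
# Count how many values in column 1 have each string length, skipping the header row.
printf '%s\n' 'HEADER|x' 'abcdefghi|x' 'abcdefghij|x' 'abcdefghijklmno|x' 'abcdefghi|x' |
awk -F'|' -v OFS=', ' 'NR>1 {count[length($1)]++}
  END {print count[15]+0, count[10]+0, count[9]+0}'
```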

Use awk to replace word in file

I have a file with some lines:
a
b
c
d
I would like to cat this file into a awk command to produce something like this:
letter is a
letter is b
letter is c
letter is d
using something like this:
cat file.txt | awk 'letter is $1'
But it's not printing out as expected:
$ cat raw.txt | awk 'this is $1'
a
b
c
d
At the moment, you have no { action } block, so your condition evaluates the two empty variables this and is, concatenating them with the first field $1, and checks whether the result is true (a non-empty string). It is, so the default action prints each line.
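You can see the mechanism with a stripped-down case: the bare condition concatenates two unset variables with $1, which is a non-empty (truthy) string for every non-blank line, so the default print fires:

```shell
# 'this' and 'is' are empty variables; concatenated with $1 the condition is
# a non-empty string for each line, so awk applies the default action: print.
printf 'a\nb\n' | awk 'this is $1'
```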
It sounds like you want to do this instead:
awk '{ print "letter is", $1 }' raw.txt
Although in this case, you might as well just use sed:
sed 's/^/letter is /' raw.txt
This command matches the start of each line and adds the string.
Note that I'm passing the file as an argument, rather than using cat with a pipe.
Not sure if you wanted sed or awk but this is in awk:
$ awk '{print "letter is " $1}' file
letter is a
letter is b
letter is c
letter is d

How to split and replace strings in columns using awk

I have a tab-delim text file with only 4 columns as shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:2:d:c:a:FAIL
If the string "FAIL" is found in a column (from column 2 to column N; the strings within a column are separated by ":"), then the second element in that column needs to be replaced with "-1". Sample output is shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
Any help using awk?
With any awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=2;i<=NF;i++) if ($i~/:FAIL$/) sub(/:[^:]+/,":-1",$i)} 1' file
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
To split a string in awk you can use the split() function:
split(string, array, separator)
string is the string you want to split
array is the array the pieces go into
and separator is the character to split on
e.g
string="hello:world"
result=`echo $string | awk '{ split($1,ARR,":"); printf("%s ",ARR[1]);}'`
In this case result equals hello, because we split the string on the ":" character and printed the first element of ARR; if we printed the second element instead (printf("%s ",ARR[2])), result would be world.
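Note also that split() returns the number of pieces it produced, which can be handy; a quick sketch:

```shell
# split() returns how many elements it put into the array.
printf 'hello:world\n' | awk '{n = split($1, ARR, ":"); print n, ARR[1], ARR[2]}'
```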
With gawk:
awk '{$0=gensub(/[^:]*(:[^:]*:[^:]*:[^:]*:FAIL)/,"-1\\1", "g" , $0)};1' File
with sed:
sed 's/[^:]*\(:[^:]*:[^:]*:[^:]*:FAIL\)/-1\1/g' File
If you are using GNU awk, you can take advantage of the RT feature1 and split the records at tabs and newlines:
awk '$NF == "FAIL" { $2 = "-1" } { printf "%s", $0 RT }' RS='[\t\n]' FS=':' OFS=':' infile
Output:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
1 The record separator that follows the current record.
Your requirements are somewhat vague, but I'm pretty sure this does what you want with bog standard awk (no gnu-awk extensions):
awk '/FAIL/{$2=-1}1' ORS=\\t RS=\\t FS=: OFS=: input

How do I get the maximum value for a text in a UNIX file as shown below?

I have a unix file with the following contents.
$cat myfile.txt
abc:1
abc:2
hello:3
hello:6
wonderful:1
hai:2
hai:4
hai:8
How do I get the max value given for each text in the file above?
'abc' value 2
'hello' value 6
'hai' value 8
'wonderful' value 1
Based on the current example in your question, minus the first line of expected output:
awk -F':' '{arr[$1]=$2 ; next} END {for (i in arr) {print i, arr[i]} } ' inputfile
Your example input and expected output are very confusing... I posted this partly to get feedback from the OP.
This handles unsorted data as well, by sorting on the value first so that the last value stored for each key is its maximum:
sort -t: -k2n inputfile | awk -F':' '{arr[$1]=$2 ; next} END {for (i in arr) {print i, arr[i]} } '
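Alternatively, the maximum can be tracked directly in awk without sorting (a sketch, assuming the values are numeric):

```shell
# Keep the largest value seen for each key.
printf '%s\n' 'abc:1' 'abc:2' 'hello:3' 'hello:6' 'wonderful:1' 'hai:2' 'hai:4' 'hai:8' |
awk -F':' '!($1 in max) || $2 > max[$1] {max[$1] = $2}
  END {for (k in max) print k, max[k]}'
```

Note that for (k in max) iterates in an unspecified order; pipe the result through sort if the order matters.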