Extract values of a variable occurring multiple times in a file

I have to extract the value of a variable that occurs multiple times in a file. For example, I have a text file abc.txt that contains a variable named result. Suppose the value of result is 2 on the first line, 55 on the third line and 66 on the last line.
Then my desired output is:
result:2,55,66
I am new to Unix, so I could not figure out how to do this. Please help.
The contents of the text file can be as follows:
R$#$#%$W%^BHGF, result=2,
fsdfsdsgf
VSDF$TR$R,result=55
fsdf4r54
result=66

Try this:
Using awk:
awk -F'(,| |^)result=' '
/result=/{
gsub(",", "", $2)
v = $2
str = (str) ? str","v : v
}
END{print "result:"str}
' abc.txt
Using perl:
perl -lane '
push @arr, $& if /\bresult=\K\d+/;
END{print "result:" . join ",", @arr}
' abc.txt
Output:
result:2,55,66
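For awk implementations where a regular-expression field separator behaves differently, the same idea can be sketched with match() and substr(). This is only an illustration: the function name extract_results is hypothetical, and it assumes each line holds at most one result= with an unsigned integer value.

```shell
# Hypothetical helper: collect every number that follows "result=" on stdin.
# Assumes at most one "result=" per line and unsigned integer values.
extract_results() {
  awk '
    match($0, /result=[0-9]+/) {
      # RSTART points at the "r"; skip the 7 characters of "result="
      v = substr($0, RSTART + 7, RLENGTH - 7)
      str = (str == "") ? v : str "," v
    }
    END { print "result:" str }
  '
}
```

It reads standard input, so it can be called as extract_results < abc.txt. Comparing str against the empty string, rather than testing its truth value, keeps a leading value of 0 from being overwritten.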

Related

calculate the percentage of not null recs in a file in unix

How do I figure out the percentage of non-null records in my file in UNIX?
My file looks like the sample below. I want to know the number of records and the percentage of non-null records. I tried a whole lot of grep and cut commands, but nothing seems to work. Can anyone help me here, please?
"name","country","age","place"
"sam","US","30","CA"
"","","",""
"joe","UK","34","BRIS"
,,,,
"jake","US","66","Ohio"

Perl solution:
#!/usr/bin/perl
use warnings;
use strict;
use 5.012;    # say, keys @arr
use Text::CSV_XS qw{ csv };
my ($count_all, @count_nonempty);
csv(in      => shift,
    out     => \ 'skip',
    headers => 'skip',
    on_in   => sub {
        my (undef, $columns) = @_;
        ++$count_all;
        # count the field as non-empty when its length is non-zero
        length $columns->[$_] and $count_nonempty[$_]++
            for 0 .. $#$columns;
    },
);
for my $column (keys @count_nonempty) {
    say "Column ", 1 + $column, ": ",
        100 * $count_nonempty[$column] / $count_all, '%';
}
It uses Text::CSV_XS to read the CSV file. It skips the header line, and for each subsequent line it calls the callback specified in on_in, which increments the count of all lines and, for each column, the count of non-empty fields whenever the length of a field is non-zero.

Along with choroba, I would normally recommend using a CSV parser on CSV data.
But in this case, all we want to look for is that a record contains any character that is not a comma or quote: if a record contains only commas and/or quotes, it is a "null" record.
awk '
/[^",]/ {nonnull++}
END {printf "%d / %d = %.2f\n", nonnull, NR, nonnull/NR}
' file
To handle leading/trailing whitespace:
awk '
{sub(/^[[:blank:]]+/,""); sub(/[[:blank:]]+$/,"")}
/[^",]/ {nonnull++}
END {printf "%d / %d = %.2f\n", nonnull, NR, nonnull/NR}
' file
If a record whose fields contain only whitespace, such as
" ","",,," "
should also count as a null record, we can simply ignore all whitespace:
awk '
/[^",[:blank:]]/ {nonnull++}
END {printf "%d / %d = %.2f\n", nonnull, NR, nonnull/NR}
' file
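The commands above print a ratio and count the header line as a record. Since the question asks for a percentage, a variant along these lines might be closer to what is wanted. It is a sketch only: the function name nonnull_percent is hypothetical, and it assumes the first line is always a header and that whitespace-only records count as null.

```shell
# Hypothetical helper: report non-null data records as a percentage.
# Assumes line 1 is a header and whitespace-only records are null.
nonnull_percent() {
  awk '
    NR == 1 { next }                  # skip the header line
    { total++ }                       # every remaining line is a data record
    /[^",[:blank:]]/ { nonnull++ }    # any character other than quote, comma, blank
    END {
      if (total) printf "%d / %d = %.2f%%\n", nonnull, total, 100 * nonnull / total
    }
  ' "$1"
}
```

With the sample file from the question, nonnull_percent file reports 3 non-null records out of 5 data records.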

Replace column in header of a large .txt file - unix

I need to replace the date in the header of a large file. The header has multiple columns separated by | (pipe), like this:
A|B05|1|xxc|2018/06/29|AC23|SoOn
I need the same header but with the date (5th column) updated: A|B05|1|xxc|2018/08/29|AC23
Any solutions? I tried awk and sed, but both gave me errors I could not work out. I'm new to this and I really want to understand the solution. Could you please help me?

You can use the command below, which replaces the 5th column of every line with the content of the newdate variable:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}{ $5 = newdate }1' infile > outfile
Explanation
awk -v newdate="2018/08/29" ' # call awk, and set variable newdate
BEGIN{
FS=OFS="|" # set input and output field separator
}
{
$5 = newdate # assign fifth field with a content of variable newdate
}1 # 1 at the end does default operation
# print current line/row/record, that is print $0
' infile > outfile
If you want to skip the first line (for example, because it is a header), use FNR>1:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR>1{ $5 = newdate }1' infile > outfile
If you want to replace the 5th column in the 1st row only, use FNR==1:
awk -v newdate="2018/08/29" 'BEGIN{FS=OFS="|"}FNR==1{ $5 = newdate }1' infile > outfile
If you still have a problem, frame your question with sample input and expected output so that it is easier to interpret.

Short sed solution:
sed -Ei '1s~\|[0-9]{4}/[0-9]{2}/[0-9]{2}\|~|2018/08/29|~' file
-i - modify the file in-place
1s - substitute only in the 1st(header) line
[0-9]{4}/[0-9]{2}/[0-9]{2} - date pattern

Is this script right to give the expected output?

Will this script give the expected output? The provided files are small sample files. I need to find the column names in the input file and, according to those column names, aggregate the data to generate a report. If the script is not right, what would a possible solution be?
#!/bin/sh
temp_file1=$(mktemp /tmp/temp_file1.XXXXX)
contact=""
not_Type=""
count=""
awk 'BEGIN{
OFS=FS=","
split(target,fields,FS)
for (i in fields)
print i
field_idx[fields[i]] = i
print field_idx[fields[i]]
}
NR==1 {
for (i=1;i<=NF;i++)
head[i] = $i
print $head[i]
next
} ' $1 > temp_file1
myarr=( $( cat temp_file1 ) ) # to get the columns name in temp file
for i in ${myarr[*]} #To check required columns to do aggregations
do
case i in
Contact_Id)
contact=Contact_id ;;
Not_Type)
not_Type=Not_Type ;;
Count)
count=Count ;;
esac
done
actual aggregations logic
awk 'BEGIN{FS=OFS=","}{a[$contact OFS $not_Type)]+=1}END{for(i in a)print i,a[i]}' $1 > $2
Note: the awk command fails with an error at input line 1:
awk: 0602-562 Field $() is not correct.
The input line number is 1.
The source line number is 12.
a.sh[20]: 0403-057 Syntax error at line 20 : '(' is not expected.
input :-
Sr_No,Contact_Id,Not_Type,Count
1,A,RC,1
2,B,OTC,1
3,C,RC,1
4,A,OTC,1
5,D,PB,1
6,A,RC,1
7,B,OTC,1
Expected OutPut:-
Sr_No,Contact_Id,Not_Type,Count
A,OTC,1
A,RC,2
B,OTC,2
C,RC,1
D,PB,1
Thanks in advance.
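One possible approach, offered as a sketch rather than a drop-in fix: the array syntax myarr=( ... ) is not part of POSIX sh, which fits the syntax error reported at line 20, and shell variables such as $contact are not expanded inside a single-quoted awk program. Both problems disappear if awk itself looks the columns up by header name. The function name count_by_contact_and_type is hypothetical, the sketch assumes the header really contains Contact_Id and Not_Type, and it counts rows per pair as the original script does with +=1.

```shell
# Hypothetical helper: count rows per (Contact_Id, Not_Type) pair,
# locating both columns by their header names.
count_by_contact_and_type() {
  awk -F, -v OFS=, '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i    # map header name -> field number
      print $(col["Contact_Id"]), $(col["Not_Type"]), "Count"
      next
    }
    { seen[$(col["Contact_Id"]) OFS $(col["Not_Type"])]++ }
    END { for (k in seen) print k, seen[k] }
  ' "$1"
}
```

It could be called as count_by_contact_and_type input.csv > report.csv. The order in which awk walks the array in the END block is unspecified, so the data lines may need to be sorted afterwards. The header printed here has three columns to match the data lines, which differs from the four-column header shown in the expected output.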

pattern match and create multiple files LINUX

I have a pipe delimited file with over 20M rows. In 4th column I have a date field. I have to take the partial value (YYYYMM) from the date field and write the matching data to a new file appending it to file name. Thanks for all your inputs.
Inputfile.txt
XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC
XM|1245|CODES|20170124|RC
XZ|1256|CODES|20170228|RC
OutputFile_201601.txt
XX|1234|PROCEDURES|20160101|RC
XY|1634|PROCEDURES|20160115|RC
OutputFile_201701.txt
XM|1245|CODES|20170124|RC
OutputFile_201702.txt
XZ|1256|CODES|20170228|RC

Using awk:
$ awk -F\| '{f="outputfile_" substr($4,1,6) ".txt"; print >> f ; close (f)}' file
$ ls outputfile_201*
outputfile_201601.txt outputfile_201701.txt outputfile_201702.txt
Explained:
$ awk -F\| ' # pipe as delimiter
{
f="outputfile_" substr($4,1,6) ".txt" # form output filename
print >> f # append record to file
close(f) # close output file
}' file
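Closing the output file after every record keeps the number of open files at one, but on 20M rows it also means 20M open/close cycles. A possible refinement, sketched here under the assumption that rows for the same month tend to be adjacent, is to close a file only when the target file name changes. The function name split_by_month is hypothetical.

```shell
# Hypothetical helper: write each record to OutputFile_YYYYMM.txt,
# closing the previous output file only when the month changes.
split_by_month() {
  awk -F'|' '
    {
      f = "OutputFile_" substr($4, 1, 6) ".txt"
      if (f != prev) {               # key changed: close the previous file, if any
        if (prev != "") close(prev)
        prev = f
      }
      print >> f                     # append, so a re-opened file keeps earlier rows
    }
  ' "$1"
}
```

Because the files are opened in append mode, the result is still correct when months are interleaved; the input order only affects how often files are re-opened. As with the original command, running it twice appends to the files left by the first run.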

How to split and replace strings in columns using awk

I have a tab-delim text file with only 4 columns as shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:2:d:c:a:FAIL
If the string "FAIL" is found in any column from column 2 to column N (the elements within a column are separated by ":"), then the second element in that column needs to be replaced with "-1". Sample output is shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
Any help using awk?

With any awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=2;i<=NF;i++) if ($i~/:FAIL$/) sub(/:[^:]+/,":-1",$i)} 1' file
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL

To split a string in awk you can use the built-in split function.
Its general form is:
split(s, arr, sep);
s is the string you want to split,
arr is the array that receives the pieces,
and sep is the separator you want to split on.
For example:
string="hello:world"
result=$(echo "$string" | awk '{ split($1,ARR,":"); printf("%s",ARR[1]);}')
In this case result is hello, because the string is split at ":" and the first element of ARR is printed. Printing the second element instead (printf("%s",ARR[2])) would set result to world.
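Applied to the question, split can break each tab-separated column into its ":" elements, after which the second element is overwritten and the column is joined back together. The following is a sketch only: the function name mark_failed is hypothetical, and it assumes that FAIL, when present, is the last element of a column.

```shell
# Hypothetical helper: set the second ":" element to -1 in every column
# (from column 2 onward) whose last element is FAIL.
mark_failed() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 2; i <= NF; i++) {
      n = split($i, part, ":")       # n is the number of elements
      if (part[n] == "FAIL") {
        part[2] = "-1"
        s = part[1]                  # join the elements back with ":"
        for (j = 2; j <= n; j++) s = s ":" part[j]
        $i = s
      }
    }
    print
  }'
}
```

It reads standard input, for example mark_failed < file. The explicit join loop is needed because awk has no built-in function that joins an array.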

With gawk:
awk '{$0=gensub(/[^:]*(:[^:]*:[^:]*:[^:]*:FAIL)/,"-1\\1", "g" , $0)};1' File
With sed:
sed 's/[^:]*\(:[^:]*:[^:]*:[^:]*:FAIL\)/-1\1/g' File

If you are using GNU awk, you can take advantage of the RT feature [1] and split the records at tabs and newlines:
awk '$NF == "FAIL" { $2 = "-1"; } { printf "%s", $0 RT }' RS='[\t\n]' FS=':' infile
Output:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
[1] RT holds the record separator text that follows the current record.

Your requirements are somewhat vague, but I'm pretty sure this does what you want with bog standard awk (no gnu-awk extensions):
awk '/FAIL/{$2=-1}1' ORS=\\t RS=\\t FS=: OFS=: input
