Unix file comparison

I have two files, each of which has a component name and a version number separated by a space:
cat file1
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
cat file2
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110
The needed output is all components from file2 with a higher version than in file1. Components from file2 should be ignored if they are not present in file1, or if their version is the same as or lower than the version in file1.
In this example the desired output is:
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
I hope my requirement is clear.

Here's a simple solution which is "almost there":
join file1 file2 | awk '$3 > $2 {print $1, $3}'
It produces:
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
Note that the BDMap entry is missing because awk doesn't know how to parse your version numbers, so they're compared textually, and "100.0.10" sorts before "100.0.9". If you could use version numbers with fixed numbers of digits like 100.000.009 this would fix it, but I suppose you don't want to do that, so we'll need to work on the above a little more.
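To see why fixed-width components would help, here is a quick illustrative sketch of awk's plain string comparison on both forms:

```shell
# String comparison mis-orders variable-width versions,
# but gets zero-padded ones right.
awk 'BEGIN {
  print (("100.0.10" > "100.0.9") ? "textual: 100.0.10 wins" : "textual: 100.0.9 wins")
  print (("100.000.010" > "100.000.009") ? "padded: 100.000.010 wins" : "padded: 100.000.009 wins")
}'
```

Textually "100.0.9" wins the first comparison (the '9' outranks the leading '1' of "10"), while the zero-padded form orders correctly.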

$ cat tst.awk
{ split($2,a,/\./); curr = a[1]*10000 + a[2]*100 + a[3] }
NR==FNR { prev[$1] = curr; next }
($1 in prev) && (curr > prev[$1])
$ awk -f tst.awk file1 file2
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
Reading file1 first stores each component's encoded version in prev; a file2 line is then printed only when its component exists in file1 with a strictly lower version.
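The approach can be reproduced end to end in a shell session; a minimal sketch, with the sample data copied from the question:

```shell
# Recreate the sample inputs from the question.
cat > file1 <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
EOF
cat > file2 <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110
EOF
# Encode x.y.z as one number (the multipliers assume each part is
# below 100; the sample's .113 still compares correctly, but use a
# larger multiplier for bigger components). Remember file1's
# versions, then print file2 lines whose component exists in file1
# with a strictly lower version.
awk '
  { split($2, a, /\./); curr = a[1]*10000 + a[2]*100 + a[3] }
  NR == FNR { prev[$1] = curr; next }
  ($1 in prev) && (curr > prev[$1])
' file1 file2
```

Components only in file2 (distri_cob here) are excluded because they never appear in prev.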

Compare 2nd columns from 2 files

Compare the 2nd columns from 2 files and write the first file's unmatched records into an output file.
Example (the field delimiter is "#"):
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
Expected output:
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
I used this awk command but it is not working; can you help me?
awk -F '#' 'NR==FNR{a[$2]; next} FNR==1 || !($1 in a)' Client_id.txt Filename_clientid.txt
An alternative:
$ join -t# -j2 <(sort -t# -k2 file1) <(sort -t# -k2 file2)
RIA000026#RIA000026_MA_plan_BTR_09282022_6.xml#ramesh
The number of zeroes is not the same in both files. If it were the same, you could check that the field 2 value of Filename_clientid.txt does not occur in the array a:
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA00025
suresh#RIA00024
vamshi#RIA00027
Example
awk -F'#' 'NR==FNR{a[$2]; next} !($2 in a)' Client_id.txt Filename_clientid.txt
Output
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
With corrected inputs (the number of zeroes was wrong):
file1
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA000026_MA_plan_BTR_09282022_6.xml#RIA000026
file2
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
ramesh#RIA000026
code
awk -F'#' 'NR==FNR{a[$2]=$1;next} $2 in a{print a[$2]}' file1 file2
Output
RIA000026_MA_plan_BTR_09282022_6.xml
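The id-keyed lookup can be exercised end to end; a minimal sketch (sample data from the corrected inputs) that stores each filename under its client id, field 2 of file1:

```shell
cat > file1 <<'EOF'
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA000026_MA_plan_BTR_09282022_6.xml#RIA000026
EOF
cat > file2 <<'EOF'
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
ramesh#RIA000026
EOF
# Remember each file1 filename keyed by its client id ($2), then
# print the stored filename for every file2 id that was seen.
awk -F'#' 'NR==FNR{a[$2]=$1; next} $2 in a{print a[$2]}' file1 file2
```

Only RIA000026 appears in both files, so only its filename is printed.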

Append data from 1 file to another using AWK

I have an existing script that finds the data exclusive to one of two files and loads it into a 3rd file. The command is below:
var='FNR == NR {keys[$1 $2]; next} !($1 $2 in keys)'
awk -F\| $var file1.dat file2.dat > file3.dat
The requirement is to reuse the same command but just append the data from file2 to file3, ignoring file1. I tried the below, but it outputs the data from both file1 and file2. Even though two file names are provided in the awk command, I need only the 2nd file's data to be appended.
var='{print $0}'
awk -F\| $var file1.dat file2.dat > file3.dat
Can anyone help with the exact command?
Below is the data in each file and expected output.
File1 (can have 0 or more records; we should not look at this file at all):
123
456
789
File2:
123
ABC
XYZ
456
Expected output in File3 (all from file2, ignoring the file1 input, but I have to have the file1 name in the awk command):
123
ABC
XYZ
456
If you must use file1 and file2 as arguments to the awk command and want to output content from file2 only, then you can just use:
awk 'BEGIN {delete ARGV[1]} 1' file1 file2 > file3
123
ABC
XYZ
456
delete ARGV[1] deletes the first file argument from the argument list.
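The effect can be seen with a quick sketch (hypothetical one-line sample files, just for illustration):

```shell
printf 'from file1\n' > file1
printf 'from file2\n' > file2
# delete ARGV[1] removes "file1" from the argument list before any
# input is read, so awk only ever opens file2; the bare pattern 1
# then prints every line it reads.
awk 'BEGIN {delete ARGV[1]} 1' file1 file2
```

This prints only "from file2".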
With your shown samples and attempts, please try the following awk code, written and tested in GNU awk. It simply uses nextfile to skip the first input file (file1) and read from the 2nd file onwards.
awk 'NR==1{nextfile} 1' file1 file2
Also remember not to waste time splitting unneeded fields:
{m,g}awk 'BEGIN { delete ARGV[_^=FS="^$"] }_' file1 file2
And it's much faster not to read the input a row at a time:
mawk2 'BEGIN { delete ARGV[_^=FS="^$"] }_' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:01 [1.11GiB/s] [1.11GiB/s] [ <=>]
f9d2e18d22eb58e5fc2173863cff238e stdin
mawk2 'BEGIN { delete ARGV[_^=RS=FS="^$"] }_^(ORS=__)' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:00 [1.92GiB/s] [1.92GiB/s] [<=> ]
f9d2e18d22eb58e5fc2173863cff238e stdin
And try to avoid the slow default mode of gawk:
gawk 'BEGIN { delete ARGV[_^=FS="^$"] }_' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:03 [ 620MiB/s] [ 620MiB/s] [ <=> ]
f9d2e18d22eb58e5fc2173863cff238e stdin

compare two fields from two different files using awk

I have two files where I want to compare certain fields and produce an output file.
I also have a variable:
echo ${CURR_SNAP}
123
File1
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1
DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3
DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4
File2
ORG1|PRGPATH1
ORG3|PRGPATH3
ORG5|PRGPATH5
ORG6|PRGPATH6
ORG7|PRGPATH7
The output I am expecting is below, where the last column is the CURR_SNAP value and the 4th column of File1 is matched against the 1st column of File2:
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
I tried the code below, but it looks like I am not doing it correctly:
awk -v CURRSNAP="${CURR_SNAP}" '{FS="|"} NR==FNR {x[$0];next} {if(x[$1]==$4) print $1"|"$2"|"$3"|"$4"|"$5"|"$6"|"CURRSNAP}' File2 File1
With awk:
#! /bin/bash
CURR_SNAP="123"
awk -F'|' -v OFS='|' -v curr_snap="$CURR_SNAP" '{
    if (FNR == NR) {
        # this stores the ORG* as an index
        # here you can store other values if needed
        orgs_arr[$1] = 1
    }
    else if (orgs_arr[$4] == 1) {
        # overwrite $7 to contain the CURR_SNAP value
        $7 = curr_snap
        print
    }
}' file2 file1
As your expected output doesn't include RSCNAME*, I have overwritten $7 (the RSCNAME* column) with $CURR_SNAP. If you want to display the RSCNAME* column as well, remove $7=curr_snap and change the print statement to print $0, curr_snap.
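That variant can be sketched end to end with the sample data, appending CURR_SNAP as an extra column so RSCNAME* is kept:

```shell
CURR_SNAP="123"
cat > file1 <<'EOF'
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|RSCNAME1
DOMAIN2|USER2|LE2|ORG2|ACCES2|RSCTYPE2|RSCNAME2
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|RSCNAME3
DOMAIN4|USER4|LE4|ORG4|ACCES4|RSCTYPE4|RSCNAME4
EOF
cat > file2 <<'EOF'
ORG1|PRGPATH1
ORG3|PRGPATH3
ORG5|PRGPATH5
ORG6|PRGPATH6
ORG7|PRGPATH7
EOF
# Remember the ORG* keys from file2, then append CURR_SNAP to every
# file1 line whose 4th field is a known ORG (RSCNAME* is kept).
awk -F'|' -v OFS='|' -v curr_snap="$CURR_SNAP" '
  FNR == NR { orgs_arr[$1] = 1; next }
  orgs_arr[$4] { print $0, curr_snap }
' file2 file1
```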
I wouldn't use awk at all. This is what join(1) is meant for (plus sed to append the extra column):
$ join -14 -21 -t'|' -o 1.1,1.2,1.3,1.4,1.5,1.6 File1 File2 | sed "s/$/|${CURR_SNAP}/"
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
It does require that the files be sorted based on the common field, like your examples are.
You can do this with awk with two rules. For the first file (where NR==FNR), use string concatenation to append fields 1 through (NF-1), assigning the concatenated result to an array indexed by $4. For the second file (where NR>FNR), in rule two, test whether array[$1] has content and if so output it with "|"CURR_SNAP appended (with CURR_SNAP shortened to c and the array named a in the example below), e.g.
CURR_SNAP=123
awk -F'|' -v c="$CURR_SNAP" '
NR==FNR {
for (i=1;i<NF;i++)
a[$4]=i>1?a[$4]"|"$i:a[$4]$1
}
NR>FNR {
if(a[$1])
print a[$1]"|"c
}
' file1 file2
Example Use/Output
After setting the filenames to match yours, you can simply copy/middle-mouse-paste in your console to test, e.g.
$ awk -F'|' -v c="$CURR_SNAP" '
> NR==FNR {
> for (i=1;i<NF;i++)
> a[$4]=i>1?a[$4]"|"$i:a[$4]$1
> }
> NR>FNR {
> if(a[$1])
> print a[$1]"|"c
> }
> ' file1 file2
DOMAIN1|USER1|LE1|ORG1|ACCES1|RSCTYPE1|123
DOMAIN3|USER3|LE3|ORG3|ACCES3|RSCTYPE3|123
Look things over and let me know if you have further questions.

How to get a pattern from a file and search in another file in unix

I have 2 files File1 and File2.
File1 has some values separated by "|". For example,
A|a
C|c
F|f
File2 also has some values separated by "|". For example,
a|1
b|2
c|3
d|4
e|5
That is, the 2nd column of File1 corresponds to the 1st column of File2.
I have to create 3rd file File3 with expected output
A|a|1
C|c|3
I tried to take each record in a loop and search for it in File2 using awk.
It worked, but the problem is that both File1 and File2 have more than 5 million records.
I need an optimized solution.
You can use this awk:
awk -F'|' 'NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' OFS="|" file1 file2 > file3
A clearer way:
awk 'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$1,$2 }' file1 file2 > file3
As per @Kent's suggestion: if your file2 has more than two columns that you want in file3, then
awk 'BEGIN{ OFS=FS="|";} NR==FNR{a[$2]=$1;next} $1 in a { print a[$1],$0 }' file1 file2 > file3
Here,
FS - Field Separator
OFS - Output Field Separator
This is what join was created to do:
$ join -t '|' -o '1.1,1.2,2.2' -1 2 -2 1 file1 file2
A|a|1
C|c|3
See man join for more details, and pay particular attention to the requirement that the files be sorted on the join fields (i.e. the 2nd field for file1 and the 1st field for file2), as your posted sample input is.
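A self-contained sketch with the sample data, sorting both inputs explicitly first (here they already happen to be sorted, but this makes the step visible):

```shell
cat > file1 <<'EOF'
A|a
C|c
F|f
EOF
cat > file2 <<'EOF'
a|1
b|2
c|3
d|4
e|5
EOF
# join requires both inputs sorted on their join fields:
# field 2 of file1 and field 1 of file2.
sort -t '|' -k2,2 file1 > file1.sorted
sort -t '|' -k1,1 file2 > file2.sorted
join -t '|' -o '1.1,1.2,2.2' -1 2 -2 1 file1.sorted file2.sorted
```

For the sample data this prints A|a|1 and C|c|3; the unmatched F|f and file2-only lines are dropped.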

How to interleave lines from two text files

What's the easiest/quickest way to interleave the lines of two (or more) text files? Example:
File 1:
line1.1
line1.2
line1.3
File 2:
line2.1
line2.2
line2.3
Interleaved:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Sure, it's easy to write a little Perl script that opens them both and does the task. But I was wondering if it's possible to get away with less code, maybe a one-liner using Unix tools?
paste -d '\n' file1 file2
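The paste one-liner can be tried directly on the sample files above:

```shell
cat > file1 <<'EOF'
line1.1
line1.2
line1.3
EOF
cat > file2 <<'EOF'
line2.1
line2.2
line2.3
EOF
# -d '\n' joins each pair of lines with a newline instead of the
# default tab, which interleaves the two files line by line.
paste -d '\n' file1 file2
```

This prints the six lines in the interleaved order shown in the question.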
Here's a solution using awk:
awk '{print; if(getline < "file2") print}' file1
produces this output:
line 1 from file1
line 1 from file2
line 2 from file1
line 2 from file2
...etc
Using awk can be useful if you want to add some extra formatting to the output, for example if you want to label each line based on which file it comes from:
awk '{print "1: "$0; if(getline < "file2") print "2: "$0}' file1
produces this output:
1: line 1 from file1
2: line 1 from file2
1: line 2 from file1
2: line 2 from file2
...etc
Note: this code assumes that file1 is at least as long as file2.
If file1 contains more lines than file2 and you want to output blank lines for file2 after it finishes, add an else clause to the getline test:
awk '{print; if(getline < "file2") print; else print ""}' file1
or
awk '{print "1: "$0; if(getline < "file2") print "2: "$0; else print"2: "}' file1
@Sujoy's answer points in a useful direction. You can add line numbers, sort, and strip the line numbers:
(cat -n file1 ; cat -n file2 ) | sort -n | cut -f2-
Note (of interest to me) that this needs a little more work to get the ordering right if, instead of static files, you use the output of commands that may run slower or faster than one another. In that case you need to add/sort/remove another tag in addition to the line numbers:
(cat -n <(command1...) | sed 's/^/1\t/' ; cat -n <(command2...) | sed 's/^/2\t/' ; cat -n <(command3) | sed 's/^/3\t/' ) \
| sort -n | cut -f2- | sort -n | cut -f2-
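The basic numbered-merge variant can be sketched end to end; this uses sort's -s (stable) flag, a GNU/BSD extension, so that equal line numbers keep file1's line ahead of file2's:

```shell
cat > file1 <<'EOF'
line1.1
line1.2
line1.3
EOF
cat > file2 <<'EOF'
line2.1
line2.2
line2.3
EOF
# cat -n prefixes each line with its line number and a tab; merging
# numerically on that number and then cutting it away interleaves
# the files.
(cat -n file1; cat -n file2) | sort -s -k1,1n | cut -f2-
```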
With GNU sed:
sed 'R file2' file1
Output:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Here's a GUI way to do it: Paste them into two columns in a spreadsheet, copy all cells out, then use regular expressions to replace tabs with newlines.
cat file1 file2 | sort -t. -k 2.1
Here it's specified that the separator is "." and that we are sorting on the first character of the second field.
