Compare 2nd columns from 2 files (unix)

Compare the 2nd columns from 2 files and write the unmatched records from the first file into an output file.
Example:
The delimiter is '#'.
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
Expected output:
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
I used the awk command below, but it is not working. Can you help me?
awk -F '#' 'NR==FNR{a[$2]; next} FNR==1 || !($1 in a)' Client_id.txt Filename_clientid.txt

An alternative:
$ join -t# -j2 <(sort -t# -k2 file1) <(sort -t# -k2 file2)
RIA000026#RIA000026_MA_plan_BTR_09282022_6.xml#ramesh
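The join above prints the matched line; since the question asks for the unmatched record of the first file, `join -v1` (POSIX) does that directly. A sketch, assuming the zero counts are made consistent between the two files (the originals disagree):

```shell
# Sample files with the zero counts normalized (an assumption)
cat > Filename_clientid.txt <<'EOF'
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
EOF
cat > Client_id.txt <<'EOF'
ramesh#RIA00025
suresh#RIA00024
vamshi#RIA00027
EOF

# join needs its inputs sorted on the join field (field 2 here)
sort -t'#' -k2 Filename_clientid.txt > f1.sorted
sort -t'#' -k2 Client_id.txt > f2.sorted

# -v1 prints only the lines of the first file that have no match in the second;
# note join puts the join field first, so the two fields come out swapped
join -t'#' -v1 -j2 f1.sorted f2.sorted
```

This prints RIA00026#RIA00026_MA_plan_BTR_09282022_6.xml, i.e. the expected record with the fields reordered.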

The number of zeroes is not the same in both files. If you make them the same, you can check that the field 2 value of Filename_clientid.txt does not occur in array a:
Filename_clientid.txt
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026
Client_id.txt
ramesh#RIA00025
suresh#RIA00024
vamshi#RIA00027
Example
awk -F'#' 'NR==FNR{a[$2]; next} !($2 in a)' Client_id.txt Filename_clientid.txt
Output
RIA00026_MA_plan_BTR_09282022_6.xml#RIA00026

With corrected inputs (the originals were inconsistent in the number of zeroes):
file1
RIA00024_MA_plan_BTR_09282022_4.xml#RIA00025
RIA00024_MA_plan_BTR_09282022_5.xml#RIA00024
RIA000026_MA_plan_BTR_09282022_6.xml#RIA000026
file2
ramesh#RIA000025
suresh#RIA000024
vamshi#RIA000027
ramesh#RIA000026
code
awk -F'#' 'NR==FNR{a[$2]=$1;next} $2 in a{print a[$2]}' file1 file2
Output
RIA000026_MA_plan_BTR_09282022_6.xml

Related

Append data from 1 file to another using AWK

I have an existing script that extracts the exclusive data between 2 files and loads it into a 3rd file. The command is below.
var='FNR == NR {keys[$1 $2]; next} !($1 $2 in keys)'
awk -F\| $var file1.dat file2.dat > file3.dat
The requirement is to reuse the same script but just append the data from file2 to file3, ignoring file1. I tried the below, but it spools the data from both file1 and file2. Although there are 2 file names provided in the awk command, I only need the 2nd file's data to be appended.
var='{print $0}'
awk -F\| $var file1.dat file2.dat > file3.dat
Can anyone help with the exact command?
Below is the data in each file and expected output.
File1 (can have 0 or more lines) - we should not look at this file at all
123
456
789
File2:
123
ABC
XYZ
456
Expected output in File3 (all of file2, ignoring file1 entirely, but file1's name must still appear in the awk command):
123
ABC
XYZ
456
If you must pass file1 and file2 as arguments to the awk command but want to output the content of file2 only, then you can just use:
awk 'BEGIN {delete ARGV[1]} 1' file1 file2 > file3
123
ABC
XYZ
456
delete ARGV[1] deletes the first argument from the argument list, so awk never opens file1.
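A quick end-to-end check of the delete ARGV[1] approach against the sample data from the question:

```shell
# Re-create the question's sample files
printf '%s\n' 123 456 789 > file1.dat
printf '%s\n' 123 ABC XYZ 456 > file2.dat

# file1.dat stays on the command line but is dropped from ARGV before any
# input is read, so only file2.dat is actually opened
awk 'BEGIN {delete ARGV[1]} 1' file1.dat file2.dat > file3.dat
cat file3.dat
```

file3.dat ends up holding exactly the four lines of file2.dat.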
With your shown samples and attempts, please try the following awk code (written and tested with GNU awk). It simply uses nextfile to skip the first input file, file1, and read from the 2nd file onwards.
awk 'NR==1{nextfile} 1' file1 file2
Also remember not to waste time splitting unneeded fields:
{m,g}awk 'BEGIN { delete ARGV[_^=FS="^$"] }_' file1 file2
and it's much faster not reading it a row at a time:
mawk2 'BEGIN { delete ARGV[_^=FS="^$"] }_' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:01 [1.11GiB/s] [1.11GiB/s] [ <=>]
f9d2e18d22eb58e5fc2173863cff238e stdin
mawk2 'BEGIN { delete ARGV[_^=RS=FS="^$"] }_^(ORS=__)' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:00 [1.92GiB/s] [1.92GiB/s] [<=> ]
f9d2e18d22eb58e5fc2173863cff238e stdin
and try to avoid the slow default mode of gawk:
gawk 'BEGIN { delete ARGV[_^=FS="^$"] }_' "${m2p}" "${m3t}"
out9: 1.85GiB 0:00:03 [ 620MiB/s] [ 620MiB/s] [ <=> ]
f9d2e18d22eb58e5fc2173863cff238e stdin

Using awk to do a SQL-style left outer join on files in unix

I'm trying to do a join based on the 1st column present in both files.
So far I have tried the code below:
awk '{if (NR==FNR) {a[$1]=$2; next} if ($1 in a) {print $2"|"$3"|""Found"} if(!($1 in a)) {print $2"|"$3"|""Not Found"}}' file1.txt file2.txt > TARGET.txt
file1 file contents
1as.pdf
2as.pdf
3as.pdf
45.pdf
as.pdf
file2 file contents
3ss.pdf 1_2_3_45.csv 4
3s.pdf 1_2_3_45.csv 4
2as.pdf 1_2_3_45.csv 4
1as.pdf 1_2_3_45.csv 4
45.pdf 1_2_3_45_5.csv 1
3.pdf 1_2_3_5.csv 1
$ awk -v OFS='|' 'NR==FNR{a[$1];next}
{print $2,$3,(($1 in a)?"Found":"Not Found")}' file1 file2
1_2_3_45.csv|4|Not Found
1_2_3_45.csv|4|Not Found
1_2_3_45.csv|4|Found
1_2_3_45.csv|4|Found
1_2_3_45_5.csv|1|Found
1_2_3_5.csv|1|Not Found
However, since you're not printing the key, what is found and what is not found is not very clear. Perhaps keep the first field in the output as well...
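Following that suggestion, a sketch that keeps the key ($1) in the output as well, with the question's sample files re-created:

```shell
# Re-create the question's sample files
printf '%s\n' 1as.pdf 2as.pdf 3as.pdf 45.pdf as.pdf > file1
printf '%s\n' \
  '3ss.pdf 1_2_3_45.csv 4' \
  '3s.pdf 1_2_3_45.csv 4' \
  '2as.pdf 1_2_3_45.csv 4' \
  '1as.pdf 1_2_3_45.csv 4' \
  '45.pdf 1_2_3_45_5.csv 1' \
  '3.pdf 1_2_3_5.csv 1' > file2

# Same left outer join, but print the key ($1) as the first output column
awk -v OFS='|' 'NR==FNR{a[$1];next}
  {print $1,$2,$3,(($1 in a)?"Found":"Not Found")}' file1 file2
```

Now each Found/Not Found verdict is attached to the file name it applies to.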

No results while making precise matching using awk

I have rows like this in my source file:
"Sumit|My Application|PROJECT|1|6|Y|20161103084527"
I want an exact match on column 3, i.e. I do not want to use the '~' operator in my awk command. However, the command:
awk -F '|' '($3 ~ /'"$Var_ApPJ"'/) {print $3}' ${Var_RDR}/${Var_RFL};
fetches the correct result, but the command:
awk -F '|' '($3 == "${Var_ApPJ}") {print $3}' ${Var_RDR}/${Var_RFL};
fails to do so. Can anyone explain why this happens? I want to use '==' because I do not want a match when the value is "PROJECT1" in the source file.
Parameter Var_ApPJ="PROJECT"
${Var_RDR}/${Var_RFL} -> Refers to source file.
Refer to the awk documentation on passing shell variables to awk (the -v option).
I found an alternative to '==' using '~' with an anchored pattern (the quoting has to let the shell expand ${Var_ApPJ}):
awk -F '|' '($3 ~ "^'"${Var_ApPJ}"'$") {print $3}' ${Var_RDR}/${Var_RFL};
Here is the problem: inside the single-quoted awk program, the shell never expands "${Var_ApPJ}", so $3 is compared against that literal string. Try the below command, which passes the shell variable in with -v:
awk -F '|' -v Var_ApPJ="$Var_ApPJ" '$3 == Var_ApPJ {print $3}' ${Var_RDR}/${Var_RFL};
vipin#kali:~$ cat kk.txt
a 5 b cd ef gh
vipin#kali:~$ awk -v var1="5" '$2 == var1 {print $3}' kk.txt
b
vipin#kali:~$
OR
#cat kk.txt
a 5 b cd ef gh
#var1="5"
#echo $var1
5
#awk '$2 == "'"$var1"'" {print $3}' kk.txt ### without "{}"
b
#
#awk '$2 == "'"${var1}"'" {print $3}' kk.txt ### with "{}"
b
#
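To summarize, passing the shell variable with -v keeps the quoting simple and gives a true exact match. A sketch using a hypothetical src.txt standing in for ${Var_RDR}/${Var_RFL}; the second, made-up row is there to show that an unanchored '~' would over-match:

```shell
# Hypothetical sample file; the PROJECT1 row must NOT match an exact comparison
printf '%s\n' \
  'Sumit|My Application|PROJECT|1|6|Y|20161103084527' \
  'Ankit|Other App|PROJECT1|2|7|N|20161103084528' > src.txt

Var_ApPJ="PROJECT"

# -v copies the shell variable into an awk variable; == is an exact string comparison
awk -F'|' -v pat="$Var_ApPJ" '$3 == pat {print $3}' src.txt
```

This prints PROJECT once; the PROJECT1 row is excluded, which is exactly what '==' buys you over an unanchored '~'.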

Unix file comparison

I have two files which have a component name and a version number separated by a space:
cat file1
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
cat file2
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110
The needed output is all components from file2 with a higher version than in file1. We have to ignore components from file2 that are not in file1, and components whose version is the same as or lower than in file1.
In this example the desired output is:
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
I hope my requirement is clear.
Here's a simple solution which is "almost there":
join -a1 file1 file2 | awk '$2 > $3 {print $1,$2}'
It produces:
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
Note that the BDMap entry is produced because awk doesn't know how to parse your version numbers, so they're compared textually. If you could use version numbers with a fixed number of digits, like 100.000.009, that would fix it, but I suppose you don't want to do that, so we'll need to work on the above a little more.
$ cat tst.awk
{ split($2,a,/\./); curr = a[1]*1000000 + a[2]*1000 + a[3] }
NR==FNR { prev[$1] = curr; next }
($1 in prev) && (curr > prev[$1])
$ awk -f tst.awk file1 file2
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
Reading file1 first records each component's version; the ($1 in prev) test drops file2-only components, and the 1000000/1000 weights leave room for three-digit parts like 113.
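A self-contained, runnable check of the numeric-comparison idea with the question's sample data. The 1000000 and 1000 weights are an assumption that no version component exceeds 999 (a *100 weight would be too small for the three-digit part 113):

```shell
cat > file1 <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.1.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.9
com.acc.invm:SendEmail 29.6.113
com.acc.invm:SendSms 12.23.65
EOF
cat > file2 <<'EOF'
com.acc.invm:FNS_PROD 94.0.5
com.acc.invm:FNS_TEST_DCCC_Mangment 94.0.6
com.acc.invm:FNS_APIPlat_BDMap 100.0.10
com.acc.invm:SendEmail 29.60.113
com.acc.invm:SendSms 133.28.65
com.acc.invm:distri_cob 110
EOF

# Record file1's versions, then print file2 lines whose component exists in
# file1 with a strictly higher version
awk '{ split($2,a,/\./); curr = a[1]*1000000 + a[2]*1000 + a[3] }
     NR==FNR { prev[$1] = curr; next }
     ($1 in prev) && (curr > prev[$1])' file1 file2
```

This yields exactly the three desired lines: BDMap, SendEmail, and SendSms, with file2's versions.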

How to interleave lines from two text files

What's the easiest/quickest way to interleave the lines of two (or more) text files? Example:
File 1:
line1.1
line1.2
line1.3
File 2:
line2.1
line2.2
line2.3
Interleaved:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Sure, it's easy to write a little Perl script that opens them both and does the task. But I was wondering if it's possible to get away with less code, maybe a one-liner using Unix tools?
paste -d '\n' file1 file2
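With the sample files from the question this can be verified directly:

```shell
# Re-create the two sample files
printf 'line1.%s\n' 1 2 3 > file1
printf 'line2.%s\n' 1 2 3 > file2

# -d '\n' makes paste join corresponding lines with a newline instead of a tab
paste -d '\n' file1 file2
```

The output is the interleaved sequence line1.1, line2.1, line1.2, line2.2, line1.3, line2.3.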
Here's a solution using awk:
awk '{print; if(getline < "file2") print}' file1
produces this output:
line 1 from file1
line 1 from file2
line 2 from file1
line 2 from file2
...etc
Using awk can be useful if you want to add some extra formatting to the output, for example if you want to label each line based on which file it comes from:
awk '{print "1: "$0; if(getline < "file2") print "2: "$0}' file1
produces this output:
1: line 1 from file1
2: line 1 from file2
1: line 2 from file1
2: line 2 from file2
...etc
Note: this code assumes that file1 is at least as long as file2.
If file1 contains more lines than file2 and you want to output blank lines for file2 after it finishes, add an else clause to the getline test:
awk '{print; if(getline < "file2") print; else print ""}' file1
or
awk '{print "1: "$0; if(getline < "file2") print "2: "$0; else print"2: "}' file1
@Sujoy's answer points in a useful direction. You can add line numbers, sort, and strip the line numbers:
(cat -n file1 ; cat -n file2 ) | sort -n | cut -f2-
Note (of interest to me): this needs a little more work to get the ordering right if, instead of static files, you use the output of commands that may run slower or faster than one another. In that case you need to add, sort on, and remove another tag in addition to the line numbers:
(cat -n <(command1...) | sed 's/^/1\t/' ; cat -n <(command2...) | sed 's/^/2\t/' ; cat -n <(command3) | sed 's/^/3\t/' ) \
| sort -n | cut -f2- | sort -n | cut -f2-
With GNU sed:
sed 'R file2' file1
Output:
line1.1
line2.1
line1.2
line2.2
line1.3
line2.3
Here's a GUI way to do it: Paste them into two columns in a spreadsheet, copy all cells out, then use regular expressions to replace tabs with newlines.
cat file1 file2 | sort -t. -k2.1
Here it is specified that the separator is "." and that we are sorting on the first character of the second field.
