Find words that match a pattern file (txt file) in unix

On a Unix system, I want to find all lines in a txt file that match keywords listed in a pattern file.
Example pattern file (pattern.txt):
1
2
3
Example a.txt file, where we want to find the lines that contain 1, 2 or 3:
a
2
4
3
5
4
1
2
The result should look like:
2
3
1
2
I tried awk, but it didn't work:
awk '/1/,/2/,/3/,.... a.txt

You want an exact match between pattern.txt and a.txt. (Your awk attempt fails because /1/,/2/ is awk's range-pattern syntax, not an OR, and a regex like /1|2|3/ would also match substrings.) Exact match implies that if pattern.txt contains a line:
foo
then this line can only match "foo" and not
bar foo
foo bar
foo123
For a perfect match you can do:
$ awk '(NR==FNR){a[$0];next}($0 in a)' pattern.txt a.txt
$ grep -xFf pattern.txt a.txt
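The awk command reads pattern.txt first (NR==FNR is only true for the first file), stores each of its lines as an array key, and then prints the lines of a.txt that are present in the array. The grep flags mean: match whole lines only (-x), treat the patterns as fixed strings rather than regular expressions (-F), and read them from a file (-f). With the sample files above, either command prints:
$ grep -xFf pattern.txt a.txt
2
3
1
2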

Finding amount of sequence matches per line

I'm looking to use grep or something similar to find the total number of matches of a 5-letter sequence (AATTC) on every line of a file, and then print the results to a new file. For example:
File 1:
GGGGGAATTCGAATTC
GGGGGAATTCGGGGGG
GGGGGAATTCCAATTC
Then another file should contain the number of matches, line by line:
File 2:
2
1
2
Awk solution:
awk '{ print gsub(/AATTC/,"") }' file1 > file2
The gsub() function returns the number of substitutions made:
$ cat file2
2
1
2
If you have to use grep, put it in a while loop; grep -o prints each match on its own line, so wc -l counts the matches in each line:
$ while read -r line; do grep -o 'AATTC' <<< "$line" | wc -l >> file2; done < file1
$ cat file2
2
1
2
Another way: using Perl, where the s/// substitution operator returns the number of replacements made:
$ perl -ne 'print s/AATTC/x/g ."\n"' file1 > file2

Splitting text file based on column value in unix

I have a text file:
head train_test_split.txt
1 0
2 1
3 0
4 1
5 1
What I want to do is save to a file train.txt the first-column values for which the second-column value is 1.
The first-column values whose second-column value is 1 are 2, 4 and 5, so in my train.txt file I want:
2
4
5
How can I do this easily in unix?
You can use awk for this:
awk '$2 == 1 { print $1 }' inputfile
That is, $2 == 1 is a filter matching lines where the 2nd column is 1, and print $1 prints the first column.
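To save the result to train.txt as asked, redirect the output:
$ awk '$2 == 1 { print $1 }' train_test_split.txt > train.txt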
In Perl:
$ perl -lane 'print "$F[0]" if $F[1]==1' file
Or GNU grep:
$ grep -oP '^(\S+)(?=[ \t]+1$)' file
But awk is the best. Use awk...

How do I list specific lines using awk?

I have a file that is sorted like the following:
2 Good
2 Hello
3 Goodbye
3 Begin
3 Yes
3 No
I want to search for the highest value in the file and display the lines that contain it:
3 Goodbye
3 Begin
3 Yes
3 No
How would I do this?
awk to the rescue!
$ awk 'FNR==NR{if(max<$1) max=$1; next} $1==max' file{,}
3 Goodbye
3 Begin
3 Yes
3 No
A double pass: find the maximum, then filter out the rest. The brace expansion file{,} expands to file file, so awk reads the same file twice.
cat file.txt | sort -r | awk '{if ($1>=prev) {print $0; prev=$1}}'
3 Yes
3 No
3 Goodbye
3 Begin
Assuming file.txt contains
2 Good
2 Hello
3 Goodbye
3 Begin
3 Yes
3 No
First get the highest value in the file into a variable. Since the file is already sorted, pick up the last line of the file, then parse out the number using awk.
highest=`tail -1 file.list|awk '{print $1}'`
Then grep the file using that value.
grep "^${highest} " file.list
This should do the job. I am only using awk as required in the question:
awk 'BEGIN {v=0} {l = l "\n" $0} {if ($1>v) {l = $0; v = $1}} END {print l}' file.txt
The variable v is initialized to 0 before the file is parsed. Each line read is appended to l; if the first field ($1) is greater than v, then v is updated and l is reset to the current line, discarding what was collected so far. At the end, the content of l is printed.
It's easier than you think, if you already know the highest value is 3:
awk '/^3/' file
3 Goodbye
3 Begin
3 Yes
3 No

Convert specific column of file into upper case in unix (without using awk and sed)

My file is as below
file name = test
1 abc
2 xyz
3 pqr
How can I convert the second column of the file to upper case without using awk or sed?
You can use tr to transform lowercase to uppercase, cut to extract the individual columns, and paste to combine the columns again.
Assumption: columns are delimited by tabs.
paste <(cut -f1 file) <(cut -f2 file | tr '[:lower:]' '[:upper:]')
Replace file with your file name (that is test in your case).
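If the columns are separated by single spaces instead, as in the sample, the same approach should work with an explicit delimiter (a sketch, assuming exactly one space between columns):
$ paste -d' ' <(cut -d' ' -f1 test) <(cut -d' ' -f2 test | tr '[:lower:]' '[:upper:]')
1 ABC
2 XYZ
3 PQR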
In pure bash:
#!/bin/bash
# ${col2^^} upper-cases the second column (bash 4+)
while read -r col1 col2;
do
printf "%s %s\n" "$col1" "${col2^^}"
done < file > output-file
Input file:
$ cat file
1 abc
2 xyz
3 pqr
Output file:
$ cat output-file
1 ABC
2 XYZ
3 PQR

AWK to use multiple spaces as delimiter

I am using the below command to join two files on their first two columns.
awk 'NR==FNR{a[$1,$2]=substr($0,3);next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt
Now, by default awk uses whitespace as the field separator, but my files may contain a single space between two words inside a field, e.g.
File 1:
ABCD TEXT1 TEXT2 123123112312312312312312312312312312
BCDEFG TEXT3TEXT4 133123123123123123123123123125423423
QWERT TEXT5TEXT6 123123123123125456678786789698758567
File 2:
ABCD TEXT1 TEXT2 12312312312312312312312312312
BCDEFG TEXT3TEXT4 31242342342342342342342342343
MNHT TEXT8 TEXT9 31242342342342342342342342343
I want the result file to be:
ABCD TEXT1 TEXT2 123123112312312312312312312312312312 12312312312312312312312312312
BCDEFG TEXT3TEXT4 133123123123123123123123123125423423 31242342342342342342342342343
QWERT TEXT5TEXT6 123123123123125456678786789698758567
MNHT TEXT8 TEXT9 31242342342342342342342342343
Any hints?
awk supports a regular expression as the value of FS, so you can specify a regular expression that matches at least two spaces, something like -F '[[:space:]][[:space:]]+'.
$ awk '{print NF}' File2
4
3
4
$ awk -F '[[:space:]][[:space:]]+' '{print NF}' File2
3
3
3
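Applied to the original join, a sketch might look like this (it swaps in the new field separator and stores $3 instead of substr($0,3); like the original, it only prints the br02.txt lines that have a match):
awk -F '[[:space:]][[:space:]]+' 'NR==FNR{a[$1,$2]=$3;next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt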
You are using fixed-width fields, so you should be using GNU awk FIELDWIDTHS (or similar) to separate the fields, e.g. if the 2nd field is the 15 chars from char 8 to char 22 inclusive in this file:
$ cat file
abc def ghi klm
AAAAAAAB C D E F G H IJJJJ
abc def ghi klm
$ awk -v FIELDWIDTHS="7 15 4" '{print "<" $2 ">"}' file
<def ghi >
<B C D E F G H I>
< def ghi >
Any solution that relies on a certain number of spaces between fields will fail when you have 1 or zero spaces between your fields.
If you want to strip leading/trailing blanks from your target field(s):
$ awk -v FIELDWIDTHS="7 15 4" '{gsub(/^\s+|\s+$/,"",$2); print "<" $2 ">"}' file
<def ghi>
<B C D E F G H I>
<def ghi>
awk treats runs of spaces as a single separator if the field separator is set to " " (which is in fact awk's default). Thus, this simply works:
awk -F' ' '{ print $2 }'
to get the second column of a table like the one mentioned. Note, however, that this cannot keep two words separated by a single space together in one field.
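For example (a quick illustration, not taken from the original files):
$ echo 'ABCD   TEXT1   123' | awk -F' ' '{ print $2 }'
TEXT1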
