Using inverse grep to compare two .txt files - unix

I have two .txt files, "test1.txt" and "test2.txt", and I want to use inverse grep (UNIX) to find all lines in test2.txt that do not contain any of the lines in test1.txt.
test1.txt contains only user names, while test2.txt contains longer strings of text. I only want the lines in test2.txt that DO NOT contain the usernames found in test1.txt.
Would it be something like this?
grep -v test1.txt test2.txt > answer.txt

You were almost there; you just missed one option in your command (i.e. -f).
Your solution should use the -f flag; see below for a sample session demonstrating it.
Demo Session
$ # first file
$ cat a.txt
xxxx yyyy
kkkkkk
zzzzzzzz
$ # second file
$ cat b.txt
line doesnot contain any name
This person is xxxx yyyy good
Another line which doesnot contain any name
Is kkkkkk a good name ?
This name itself is sleeping ...zzzzzzzz
I can't find any other name
Lets try the command now
$ # -i is used to ignore the case while searching
$ # output contains only lines from second file not containing text for first file lines
$ grep -v -i -f a.txt b.txt
line doesnot contain any name
Another line which doesnot contain any name
I can't find any other name
Lets try the command now

There are probably better ways to do this, i.e. without grep, but here's a solution which will work:
grep -v -P "($(sed ':a;N;$!ba;s/\n/)|(/g' test1.txt))" test2.txt > answer.txt
To explain this:
$(sed ':a;N;$!ba;s/\n/)|(/g' test1.txt) is an embedded sed command which outputs a string where each newline in test1.txt is replaced by )|(. That output is then inserted into a Perl-style regex (-P) for grep to use, so grep searches test2.txt for every line in test1.txt and, because of the -v flag, returns only those lines in test2.txt which don't contain any line from test1.txt.
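For example, assuming test1.txt contains the two hypothetical user names alice and bob, the embedded sed joins them into the middle of an alternation, which the outer parentheses in the grep command then complete:
$ cat test1.txt
alice
bob
$ sed ':a;N;$!ba;s/\n/)|(/g' test1.txt
alice)|(bob
so the pattern grep actually runs against test2.txt is (alice)|(bob).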

What flavor of unix are you using? This will give us a better understanding of what is available to you from the command line. Currently what you have will not work; you're looking for the diff command, which compares two files.
You can do the following on OS X 10.6; I have tested this at home.
diff -i -y FILE1 FILE2
diff compares the files. -i will ignore case if it does not matter, so Hi and HI will still mean the same. Finally, -y will output the results side by side. If you want to send the output to a file you could do diff -i -y FILE1 FILE2 >> /tmp/Results.txt
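As a rough sketch of what that looks like (the two files and the exact column spacing here are only illustrative), -y prints matching lines in both columns, marks changed lines with |, lines only in FILE1 with <, and lines only in FILE2 with >:
$ cat FILE1
alice
bob
carol
$ cat FILE2
alice
bobby
carol
$ diff -i -y FILE1 FILE2
alice            alice
bob            | bobby
carol            carol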

Related

Linux - Get Substring from 1st occurrence of character

FILE1.TXT
0020220101
or
01 20220101
Need to extract the date part from the file, where the text starts from 2.
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
I'm new to linux scripting. Could someone help guide me on where I'm going wrong?
Use grep like so:
echo "0020220101\n01 20220101" | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions
Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question - PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts is telling awk and cut to use the character 2 as the field separator, so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes, which make strings literal; you were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
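Putting those fixes together, a sketch of what the first assignment could look like (reusing the substr command from above; FILE1.TXT is the OP's file name):
t_FILE_DT1=$(awk '{print substr($0,length()-7)}' FILE1.TXT)
echo "$t_FILE_DT1"
For the 2-line sample input the variable then holds both dates, one per line.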
Instead of looking for what comes "after" the 2 (and having to worry about whether there is a space involved as well), think about extracting the last 8 characters, which you know for a fact are your date.
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
# read in the last 8 characters of $line .. You KNOW this is the date ..
# No need to worry about exact matching at that point, or spaces ..
myDate=${line: -8}
echo "$myDate"
done < "$input"
About the cut and awk commands that you tried:
Using awk -F"2" '{PRINT $NF}' file will set the field separator to 2, and $NF is the last field, so printing the value of the last field is 0101
Using cut -d'2' -f2- file uses a delimiter of 2 as well, and then print all fields starting at the second field, which is 0220101
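To see that splitting concretely, using the first sample line as input:
$ echo "0020220101" | awk -F"2" '{print $NF}'
0101
$ echo "0020220101" | cut -d'2' -f2-
0220101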
If you want to match the 2 followed by 7 digits until the end of the string:
awk '
match ($0, /2[0-9]{7}$/) {
print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101
The accepted answer shows how to extract the first eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract all the digits after every occurrence of 2. If you specifically want eight digits, try
grep -Eo '2[0-9]{7}'
maybe also with a -w option if you want to only accept a match between two word boundaries. If you specifically want only digits after the first occurrence of 2, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file

Unix command to replace first column of a .csv file

I want a unix command (that I will call in a ControlM job) that changes the value of the first column of my .csv file (not the header line), with the date of the previous day (expected format : YYYY-MM-DD).
I tried many commands but none of them do what I want:
tmp=$(mktemp) && awk -F\| -v val=`date -d yesterday +%F` 'NR>1 {gsub($1,val)}' file.csv > "$tmp" && mv "$tmp" file.csv
or :
awk -F\| -v val=`date -d yesterday +%F` '{gsub($1, val)}1' file.csv
even tried gensub but not working.
Example of what I want :
Input :
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-05;2017-11-15;BRIDGE;HELLO
2019-03-05;2018-03-17;WORK;DATA
Output I want (as today is 2019-03-07):
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-06;2017-11-15;BRIDGE;HELLO
2019-03-06;2018-03-17;WORK;DATA
Can you help please and give me examples of commands that should work, I'm not finding a solution.
Thanks a lot
Could you please try the following first? (It does not save the output into file.csv itself; it will print the output on the terminal. Once you are happy with it, you could use the command provided at the end of this post.)
awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
Problems identified in OP's code (and fixed in my suggestion):
1- Use of backticks to capture a command's output into a shell variable is deprecated, so instead use val=$(date ...) to declare awk's variable named val.
2- With -F you have set your field separator to \| which is a pipe, but looking carefully at your provided sample Input_file, it is delimited by ; (semicolon), NOT |, so that is also one of the reasons it is not reflected in the output.
3- gsub($1,val) replaces the whole line with only the value of the variable val, because the syntax of gsub is gsub(regex_or_value_to_replace, new_value, field_or_variable_to_modify); see the short gsub sketch after this list. Since you defined the wrong field separator, the whole line is treated as $1, so when you print it by doing awk -F\| -v val=$(date -d yesterday +%F) 'NR>1 {gsub($1,val)} 1' file.csv it will print only the previous date on those lines.
4- The 4th and main issue is that you have NOT printed anything, so regardless of the other mistakes you will NOT see any output, either on the terminal or in an output file.
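For reference, a minimal gsub sketch (the replacement text TUNNEL here is only illustrative): gsub takes the regex or value to replace, the new value, and optionally the field or variable to modify, e.g.
$ echo "2019-03-05;BRIDGE" | awk 'BEGIN{FS=OFS=";"}{gsub(/BRIDGE/,"TUNNEL",$2)} 1'
2019-03-05;TUNNEL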
If you are happy with it, you could then run the command below to make the changes in the Input_file itself. (I am assuming that you have a proper value in your tmp variable here.)
tmp=$(mktemp) && awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv > "$tmp" && mv "$tmp" file.csv

unix combine grep w and v command

I want to search a file and include the text #!/bin/bash, but exclude any other line that has a # sign. These two commands: grep -w '#!/bin/bash' file and grep -v '^#' file each do one part of this job. I would like this to be a single command, so here's what I've tried.
grep -w '#!/bin/bash' | grep -v '^#' file
This excludes lines beginning with #, but doesn't include the line #!/bin/bash
grep -w '#!/bin/bash' -v '^#' file
This just prints every line but #!/bin/bash
grep "^[^#]\|^#\!/bin/bash$" test.sh
Explanation:
^[^#] means the line starts with something other than #
\| is an "or"
^#\!/bin/bash$ is the exact line #!/bin/bash
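For instance, on a hypothetical test.sh:
$ cat test.sh
#!/bin/bash
# set things up
echo "hello"
$ grep "^[^#]\|^#\!/bin/bash$" test.sh
#!/bin/bash
echo "hello"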
So .. it looks as if you're trying to strip comments from bash files without removing their shebang.
The grep command can search for regular expressions, but isn't so good at applying rules of logic. You could do something like this:
grep -v '^#[^!]' input.sh
But you'd fail to strip comments that are affixed to the ends of lines. Note that I'm being a little more liberal with this regex, since it's entirely possible that a script might use something other than /bin/bash for its shebang. :-)
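For instance, on a hypothetical input.sh, a full-line comment is removed but a trailing comment survives:
$ cat input.sh
#!/bin/bash
# full-line comment
echo "hello"   # trailing comment
$ grep -v '^#[^!]' input.sh
#!/bin/bash
echo "hello"   # trailing comment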
Another possibility would be to use awk. This lets you apply logic that cannot be expressed within a regular expression. For example, if you want to keep the commented line only if it is a shebang on the first line of the file, and remove all other comments, awk can express that as follows:
awk '
NR==1 && /^#!/; # if we're on the first line and find shebang, print.
/^#/ { next } # if this is a comment line, skip it.
1 # print everything else.
' input.sh

difference between grep Vs cat and grep

I would like to know the difference between the 2 commands below. I understand that 2) should be used, but I want to know the exact sequence of what happens in 1) and 2).
suppose filename has 200 characters in it
1) cat filename | grep regex
2) grep regex filename
Functionally (in terms of output), those two are the same. The first one actually creates a separate process, cat, which simply sends the contents of the file to standard output; that shows up on the standard input of grep, because the shell has connected the two with a pipe.
In that sense grep regex <filename is also equivalent but with one less process.
Where you'll start seeing the difference is in variants when the extra information (the file names) is used by grep, such as with:
grep -n regex filename1 filename2
The difference between that and:
cat filename1 filename2 | grep -n regex
is that the former knows about the individual files whereas the latter sees it as one file (with no name).
While the former may give you:
filename1:7:line with regex in 10-line file
filename2:2:another regex line
the latter will be more like:
7:line with regex in 10-line file
12:another regex line
Another executable that acts differently if it knows the file names is wc, the word-count program:
$ cat qq.in
1
2
3
$ wc -l qq.in # knows file so prints it
3 qq.in
$ cat qq.in | wc -l # does not know file
3
$ wc -l <qq.in # also does not know file
3
First one:
cat filename | grep regex
Normally cat opens the file and prints its contents line by line to stdout. Here, though, it writes its output to the pipe '|'. grep then reads from the pipe (it takes the pipe as its stdin) and, if a line matches the regex, prints that line to stdout. One detail: grep runs as a separate process, so the pipe forwards cat's output to that process as its input.
Second one:
grep regex filename
Here grep reads directly from the file (above it was reading from the pipe) and, if a line matches the regex, prints that line to stdout.
If you want to check the actual execution-time difference, first create a file with 100000 lines:
user#server ~ $ for i in $(seq 1 100000); do echo line${i} >> test_f; done
user#server ~ $ wc -l test_f
100000 test_f
Now measure:
user#server ~ $ time grep line test_f
#...
real 0m1.320s
user 0m0.101s
sys 0m0.122s
user#server ~ $ time cat test_f | grep line
#...
real 0m1.288s
user 0m0.132s
sys 0m0.108s
As we can see, the difference is not too big...
Actually, though the outputs are the same:
$ cat filename | grep regex
This command reads the contents of the file "filename", then searches for regex in that stream; while
$ grep regex filename
This command searches for regex directly in the file "filename".
Functionally they are equivalent; however, the shell will fork two processes for cat filename | grep regex and connect them with a pipe.

use of grep commands in unix

I have a file and I want to sort it according to a word and to remove the special characters.
The grep command is used to search for the characters
-b Display the block number at the beginning of each line.
-c Display the number of matched lines.
-h Display the matched lines, but do not display the filenames.
-i Ignore case sensitivity.
-l Display the filenames, but do not display the matched lines.
-n Display the matched lines and their line numbers.
-s Silent mode.
-v Display all lines that do NOT match.
-w Match whole word
but
how can I use the grep command to sort the file and remove the special characters and numbers?
grep searches inside all the files to find matching text. It doesn't really sort and it doesn't really chop and change output. What you want is probably to use the sort command
sort <filename>
and send the output to either the awk command or the sed command, which are common tools for manipulating text.
sort <filename> | sed 's/REPLACE/NEW_TEXT/g'
Something like the above, I'd imagine.
The following command would do it.
sort FILE | tr -d 'LIST OF SPECIAL CHARS' > NEW_FILE
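For example, assuming the "special characters" to strip are punctuation and digits (the file names here are hypothetical; adjust the character set as needed):
sort input.txt | tr -d '[:punct:][:digit:]' > cleaned.txt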
