Finding the differences of two variables containing strings (unix)

How do I use diff on variables instead of files?
All the tutorials have examples with files, but not with variables.
I want it to print just the differences.
for example:
TEXTA=abcdefghijklmnopqrstuvxyz; TEXTB=abcdefghijklmnopqrstuvxyr

diff is a utility to compare two files. If you really want to compare two variables, and you are using bash for your shell, you can "fake it" this way:
diff <(echo "${TEXTA}") <(echo "${TEXTB}")
Otherwise, you can just write your variables to two temporary files and compare them.
However, note that in your example, since each variable holds a single line, diff will just tell you that the lines differ, unless you use a version of diff that can show the specific positions within a line where they differ.
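One way to get those character positions with a plain diff is to split each string into one character per line first, so that diff's line numbers become character positions. A sketch, assuming bash (for the process substitution and here-string):

```shell
TEXTA=abcdefghijklmnopqrstuvxyz
TEXTB=abcdefghijklmnopqrstuvxyr

# Split each string into one character per line, then diff the two streams;
# the line numbers in diff's output are character positions in the strings.
diff <(fold -w1 <<<"$TEXTA") <(fold -w1 <<<"$TEXTB")
# prints: 25c25
#         < z
#         ---
#         > r
```

For the strings in the question, this reports that character 25 changed from z to r.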

I would use sdiff.
sdiff <(echo $TEXTA) <(echo $TEXTB)
sdiff points out just the differences between the two strings and shows them side-by-side separated by |.
abcdefghijklmnopqrstuvxyz | abcdefghijklmnopqrstuvxyr
This can be useful when your strings are long: sdiff highlights only the part of the string that is different.


Is unix join strictly lexical?

I have to process many ('00s) two-column delimited files that are numerically sorted by their first column (a long int that can range from 857 to 293823421 for example).
The processing is simple enough: iterate through a loop to left-join the files using one of them as 'anchor' (the 'left' file in the join), using join's -e and -o options to fill in the NULLs.
Question: is there any way join (from Core Utils 8.13) can process these joins as-is, or must I add a sort -k1,1 step to ensure lexical order prior to each join?
Everything I've read searching this tells me I have to, but I wanted to make sure I wasn't missing some clever trick to avoid the extra sorting. Thank you.
Indeed, join does not support numeric comparisons. However, from your description, it sounds like you can convert your first field into an already-string-sorted form by zero-padding it, and then convert it back by de-zero-padding it. For example, here is a function that performs a join -e NULL on two files that match your description (as I understand it):
function join_by_numeric_first_field () {
    local file1="$1"
    local file2="$2"
    join -e NULL <(awk '{printf("%020d\t%s\n", $1, $2)}' "$file1") \
                 <(awk '{printf("%020d\t%s\n", $1, $2)}' "$file2") \
        | awk '{printf("%d\t%s\t%s\n", $1, $2, $3)}'
}
(The awk '{printf("%020d\t%s\n", $1, $2)}' reads each line of a two-column input and re-prints the two columns, separated by a tab, treating the first column as a decimal integer and zero-padding it out to twenty characters. The final awk '{printf("%d\t%s\t%s\n", $1, $2, $3)}' undoes this for the joined output: printing the key with %d strips the zero-padding, and the second and third fields are the value columns contributed by the two files.)
Whether this is a better approach than sort-ing will depend on the size of your files, and on how flexible you need to be in supporting files that don't quite match your description. This approach scales linearly with the file-size, but is significantly more complicated, and is also a bit more fragile, in that the awk commands expect a pretty specific input-format. The sort approach is much simpler, but will not perform as well for large files.
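As a quick sanity check of the zero-padding trick on its own, here is the pipeline run over two tiny made-up inputs (file names and contents are illustrative only):

```shell
# Two-column, tab-separated inputs, numerically sorted on the first column.
printf '857\tfoo\n12345\tbar\n'      > left.txt
printf '857\tFOO\n293823421\tbaz\n'  > right.txt

# Pad the keys so lexical order matches numeric order, join, strip the padding.
join <(awk '{printf("%020d\t%s\n", $1, $2)}' left.txt) \
     <(awk '{printf("%020d\t%s\n", $1, $2)}' right.txt) \
  | awk '{printf("%d\t%s\t%s\n", $1, $2, $3)}'
# prints: 857	foo	FOO
```

Only the pairable key 857 is emitted here, since no -a/-o options were given.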

Search for multiple patterns in a file not necessarily on the same line

I need a unix command that searches for multiple patterns (basically an AND); however, those patterns need not be on the same line (otherwise I could combine them in a single grep). For example, suppose I have a file like the following:
This is first line.
This is second line.
This is last line.
If I search for the words 'first' and 'last', the above file should be included in the result.
Try this question, seems to be the same as yours with plenty of solutions: How to find patterns across multiple lines using grep?
I think instead of AND you actually mean OR:
grep 'first\|last' file.txt
Results:
This is first line.
This is last line.
If you have a large number of patterns, add them to a file; for example if patterns.txt contains:
first
last
Run:
grep -f patterns.txt file.txt
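If you really do want an AND across lines, i.e. every pattern must occur somewhere in the file, one common trick is to chain quiet greps on the same file. A sketch, using the file from the question:

```shell
# grep -q exits 0 on a match without printing; chaining with && means the
# message is printed only if BOTH words occur somewhere in the file.
grep -q 'first' file.txt && grep -q 'last' file.txt && echo 'file.txt matches'
```

This scales to many files with a shell loop over them, printing only the names that pass every grep.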

Remove lines which are between given patterns from a file (using Unix tools)

I have a text file (more correctly, a “German style” CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
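Applied to the sample data above, anchoring on the two timestamps (with the dots escaped, since . is a regex metacharacter; data.csv is a made-up file name):

```shell
# Deletes the two boundary lines and everything between them.
sed '/^28\.01\.2005 14:52:38/,/^01\.02\.2005 00:11:43/d' data.csv
```

The begin/end pairs can be kept in a small script of such sed expressions, which documents the corrections and makes them replayable.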
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line (the -1 range) to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
You can also use awk:
awk '/start/,/end/ {next} {print}' file
The range /start/,/end/ selects the lines from start to end inclusive; next skips them, and every other line is printed.
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which if you are used to using awk, sed, grep etc are pretty simple.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
Use grep -v (print non-matching lines).
Sorry - thought you just wanted the lines without 0,000 at the end.

Compare two files by key and replace or append records (script, unix)

I have two files ...
file1:
002009092312291100098420090922111
010555101070002956200453T+00001190.81+00001295.920010.87P
010555101070002956200449J+00003128.85+00003693.90+00003128
010555101070002956200176H+00000281.14+00000300.32+00000281
file2:
002009092410521000098420090709111
010560458520002547500432M+00001822.88+00001592.96+00001822
010560458520002547500432D+00000106.68+00000114.77+00000106
In both files, in every record starting with 01, the string from the 3rd to the 25th character (i.e. up to the letter) is the key.
Based on this key, I have to compare the two files: if a record in file2 has a matching key in file1, I have to replace that record in file1 with it; otherwise, I have to append it.
Well, this is a fairly unspecific (and basic) programming question. We'll be better able to help you if you explain exactly what you did and where you got stuck.
Also, it looks a bit like homework, and people are wary of giving too much help on homework problems, as it might look like cheating.
To get you started:
I'd recommend Perl to solve this, but awk or another scripting language will also do. I'd recommend against sh/bash, as they are weak on text manipulation; also combining grep et al will become rather cumbersome.
First write a Perl program that filters records starting with 01. Then extract the key and put it into a hash (a Perl structure). Then output a new, combined file as required.
Using awk, extract the key (characters 3 to 25) from the records starting with 01, by doing something like
awk '/^01/' file_name | cut -c 3-25
for both files, collect the matching lines into two different buffers, and compare the buffers with a for line in ... loop in a shell script.
Whenever a line in the second buffer matches one in the first, grep for that line from the second buffer in the first file and replace the line in the first file with the line from the second. I think you need to work the logic out a bit.
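The whole replace-or-append step can also be sketched as a single awk pass (untested against real data; file1, file2 and merged.txt are the names from the question, and the key is characters 3 to 25 of each 01-record):

```shell
# Read file2 first and index its 01-records by key; then stream file1,
# substituting the file2 version on a key match; finally, append the
# file2 records whose keys never matched anything in file1.
awk '
  NR == FNR { if (/^01/) f2[substr($0, 3, 23)] = $0; next }
  /^01/ {
    k = substr($0, 3, 23)
    if (k in f2) { print f2[k]; seen[k] = 1; next }
  }
  { print }
  END { for (k in f2) if (!(k in seen)) print f2[k] }
' file2 file1 > merged.txt
```

Note that the appended records come out in awk's arbitrary array order; pipe through sort if the order of the appended tail matters.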

Compare two folders which have many files inside contents

I have two folders with approx. 150 Java property files each.
In a shell script, how do I compare the two folders to see whether either contains any new property files, and what the differences between the property files are?
The output should be in a report format.
To get summary of new/missing files, and which files differ:
diff -arq folder1 folder2
-a treats all files as text, -r recursively searches subdirectories, and -q reports 'briefly', only when files differ.
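For example, with two made-up folders (the exact message wording may vary slightly between diff implementations):

```shell
mkdir -p folder1 folder2
echo 'key=1' > folder1/only-here.properties
echo 'key=1' > folder1/common.properties
echo 'key=2' > folder2/common.properties

# diff exits non-zero when differences are found, so guard it in scripts.
diff -rq folder1 folder2 || true
```

This reports common.properties as differing and only-here.properties as present only in folder1, which is exactly the summary needed for a report.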
diff -r will do this, telling you both if any files have been added or deleted, and what's changed in the files that have been modified.
I used
diff -rqyl folder1 folder2 --exclude=node_modules
in my nodejs apps.
Could you use dircmp ?
The diff command in Unix is used to find the differences between files (of all types). Since a directory is also a type of file, the differences between two directories can easily be figured out by using diff. For more options, use man diff on your unix box.
-b  Ignores trailing blanks (spaces and tabs) and treats other strings of blanks as equivalent.
-i  Ignores the case of letters. For example, `A' will compare equal to `a'.
-t  Expands <TAB> characters in output lines. Normal or -c output adds character(s) to the front of each line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source's indentation.
-w  Ignores all blanks (<SPACE> and <TAB> characters) and treats all other strings of blanks as equivalent. For example, `if ( a == b )' will compare equal to `if(a==b)'.
and there are many more.
