How to find unique files using the md5sum command? - unix

I am using the md5sum command, which gives me a checksum of each file's content.
I want to list only the files whose content does not also appear in another file.
For example:
$ md5sum file1 file2 file3 file4
c8675a129a538248bf9b0f8104c8e817 file1
9d3df2c17bfa06c6558cfc9d2f72aa91 file2
9d3df2c17bfa06c6558cfc9d2f72aa91 file3
2e7261df11a2fcefee4674fc500aeb7f file4
I want the output to contain only the files with no matching content, which means
I need file1 and file4:
c8675a129a538248bf9b0f8104c8e817 file1
2e7261df11a2fcefee4674fc500aeb7f file4
I need only the files whose content is not the same as that of any other file.
Thanks in advance.

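You can say:
md5sum file1 file2 file3 file4 | sort | uniq -u -w33
in order to get the unique files. The sort is needed because uniq only compares adjacent lines, and -w33 (a GNU extension) limits the comparison to the first 33 characters, that is, the hash and the space that follows it.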
Quoting man uniq:
-u, --unique
only print unique lines
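EDIT: You seem to be looking for alternatives. Try the following, which compares the full 32-character hash and, like uniq, expects lines with equal hashes to be adjacent:
md5sum ... | sed ':a;$bb;N;/^\(.\{32\}\).*\n\1[^\n]*$/ba;:b;s/^\(.\{32\}\).*\n\1[^\n]*\n*//;ta;/./P;D'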
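The sort-then-uniq approach can be tried end to end with a small sketch. The file contents below are made up for illustration, and -w is assumed to be available (it is a GNU uniq extension). The duplicates are deliberately not adjacent in the argument list, which is why the sort step matters.

```shell
# Hypothetical demo data: file1 and file3 share content, file2 and file4 are unique.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'dup\n'       > "$dir/file1"
printf 'only-two\n'  > "$dir/file2"
printf 'dup\n'       > "$dir/file3"
printf 'only-four\n' > "$dir/file4"

# Sort so equal hashes become adjacent, keep lines whose first 32 characters
# (the MD5 hash) occur only once, then print just the file names.
unique=$(cd "$dir" && md5sum file1 file2 file3 file4 |
  sort | uniq -u -w32 | awk '{ print $2 }' | sort)
printf '%s\n' "$unique"
```

Without the sort, uniq would treat file1 and file3 as unrelated lines and report all four files.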

Try this in Bash:
find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
Explanation:
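Find all files, calculate their MD5 sums, group the duplicates by comparing the sums, and print the file names. Note that this lists the duplicated files; replace --all-repeated=separate with -u to list the unique ones instead.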
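For the original goal (files whose content occurs only once), a variant that avoids the GNU-only uniq options might look like the sketch below. The directory layout is invented for the demo, and it assumes file names contain no newlines or backslashes, since md5sum escapes those.

```shell
# Hypothetical demo tree: b.txt and sub/c.txt have identical content.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
printf 'one\n' > "$dir/a.txt"
printf 'two\n' > "$dir/b.txt"
printf 'two\n' > "$dir/sub/c.txt"

# Count how often each hash occurs and remember the file name, which starts
# at column 35 of md5sum's output; print the names whose hash was seen once.
unique_files=$(find "$dir" -type f -exec md5sum {} + |
  awk '{ count[$1]++; name[$1] = substr($0, 35) }
       END { for (h in count) if (count[h] == 1) print name[h] }' |
  sort)
printf '%s\n' "$unique_files"
```

Because awk counts hashes in an array, the input does not need to be sorted first.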

Related

I would like to write the unzipped content of 24 big zipped files into 1 file in order to count the number of distinct lines - Python

I have tried this :
!gunzip file1.gz
!cat file1 >> data
!rm -Rf file1
!gunzip file2.gz
!cat file2 >> data
!rm -Rf file2
but when doing that for file2 I get the error message "cat: write error: No space left on device". Knowing that I have to do this for 24 files, is there an alternative that gives me the content of all the files in one file? (The final file is supposed to have more than two million lines.)
If you just want to count unique lines, there's no need to create any files:
gunzip -c file1.gz file2.gz ... | sort -u | wc -l
or even
gunzip -c file*.gz | sort -u | wc -l
gunzip -c writes the unzipped files to stdout, one after another. sort -u sorts these, and filters out duplicates. wc -l finally counts the lines.
If uniqueness is determined by the first two columns, cut them out first,
perhaps with cut -d"," -f1,2 or cut -c1-16. Use the cut that fits your data in
gunzip -c file*.gz | cut -f 1,2 | sort -u | wc -l
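The pipelines above can be tried on a small scale with the sketch below. The part*.gz names and the comma-separated sample rows are invented for the demo; nothing is ever unzipped to disk.

```shell
# Hypothetical demo data: two small gzip files with overlapping rows.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'k1,x,1\nk1,x,2\nk2,y,3\n' | gzip > "$dir/part1.gz"
printf 'k2,y,3\nk3,z,4\n'         | gzip > "$dir/part2.gz"

# Distinct whole lines across all files.
distinct_lines=$(gunzip -c "$dir"/part*.gz | sort -u | wc -l)

# Distinct values of the first two comma-separated columns.
distinct_keys=$(gunzip -c "$dir"/part*.gz | cut -d, -f1,2 | sort -u | wc -l)

echo "distinct lines: $distinct_lines"
echo "distinct keys:  $distinct_keys"
```

For very large inputs, sort spills to temporary files on its own, so disk use stays well below that of a fully unzipped copy as long as its temporary directory has room.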

Copy content of a file to multiple files using CAT command in UNIX

I have 3 files:
File1, File2, File3
I want to copy the content of File1 to File2 and File3 in a single command.
Is it possible with the cat command?
If yes, how? If not, which command can be used for this task?
Maybe this code can help you. Note that >> appends to file2.txt and file3.txt; use > instead to overwrite them.
cat file1.txt >> file2.txt && cat file1.txt >> file3.txt
Use tee:
$ cat file1 | tee file2 > file3
man tee:
NAME
tee - read from standard input and write to standard output and files
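A minimal sketch of the tee approach, without the extra cat process. The file contents are made up; tee is given both targets as arguments and its copy to standard output is discarded.

```shell
# Hypothetical demo data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'hello\nworld\n' > "$dir/file1"

# tee writes its input to every named file; the stdout copy goes to /dev/null.
tee "$dir/file2" "$dir/file3" < "$dir/file1" > /dev/null
```

Adding -a to tee appends to the targets instead of overwriting them.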

Unix- Using Grep to get unmatched lines

I am new to unix. I want to grep the lines of file1 that do not match any of the patterns listed in file2. The real files have more than 1000 lines.
Example:
File1:
Hi(Everyone)
How(u)people(are)doing?
ThanksInadvance
File2:
Hi(Every
ThanksI
Required Result:
How(u)people(are)doing?
I want only the pattern (like "Hi(Every") to be used for the grep. It should return the unmatched lines from file1.
This line works for the given example:
grep -Fvf file2 file1
The 3 options used above:
-F makes grep do fixed-string match
-v invert matching
-f get patterns from file
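The command can be checked against the example from the question with the sketch below, which recreates the two files in a temporary directory.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cat > "$dir/file1" <<'EOF'
Hi(Everyone)
How(u)people(are)doing?
ThanksInadvance
EOF
cat > "$dir/file2" <<'EOF'
Hi(Every
ThanksI
EOF

# -F: patterns are literal strings, so "(" needs no escaping
# -v: print the lines that match none of the patterns
# -f: read the patterns from file2, one per line
unmatched=$(grep -Fvf "$dir/file2" "$dir/file1")
printf '%s\n' "$unmatched"
```

One thing to watch for: an empty line in file2 acts as a pattern that matches every line, so with -v the output would be empty.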
The grep flag -v inverts the match.
grep -Fv 'Hi(Every' File1
should return all lines from File1 that do not contain Hi(Every. The pattern must be quoted, because unquoted parentheses are a syntax error in the shell.
best regards,
Jan

Appending multiple files into one file

I append multiple data files into a single data file using the cat command. How can I copy the content of that single file into a new file?
I am using the command:
cat file1 file2 file3 > Newfile.txt
AnotherFile=`cat Newfile.txt`
sort $AnotherFile | uniq -c
It shows an error like "cannot open AnotherFile".
How do I assign the content of Newfile.txt to another file?
Original answer to original question
Well, the easiest way is probably cp:
cat file1 file2 file3 > Newfile.txt
cp Newfile.txt AnotherFile.txt
Failing that, you can use:
cat file1 file2 file3 > Newfile.txt
AnotherFile=$(cat Newfile.txt)
echo "$AnotherFile" > AnotherFile.txt
Revised answer to revised question
The original question had echo "$AnotherFile" as the third line; the revised question has sort $AnotherFile | uniq -c as the third line.
Assuming that sort $AnotherFile is not sorting all the contents of the files mentioned in the list created from concatenating the original files (that is, assuming that file1, file2 and file3 do not contain just lists of file names), then the objective is to sort and count the lines found in the source files.
The whole job can be done in a single command line:
cat file1 file2 file3 | tee Newfile.txt | sort | uniq -c
Or (more usually):
cat file1 file2 file3 | tee Newfile.txt | sort | uniq -c | sort -n
which lists the lines in increasing order of frequency.
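A small sketch of that pipeline, with invented file contents, shows both effects at once: Newfile.txt receives the concatenated data, and the counts come out in increasing order of frequency.

```shell
# Hypothetical demo data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'apple\nbanana\n' > "$dir/file1"
printf 'apple\ncherry\n' > "$dir/file2"
printf 'banana\napple\n' > "$dir/file3"

# tee saves the concatenation while passing it on; uniq -c needs sorted
# input, and the final sort -n orders the result by count.
counts=$(cat "$dir/file1" "$dir/file2" "$dir/file3" |
  tee "$dir/Newfile.txt" | sort | uniq -c | sort -n)
printf '%s\n' "$counts"
```

Use sort -rn as the last step to list the most frequent lines first.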
If you really do want to sort the contents of the files listed in file1, file2, file3 but only list the contents of each file once, then:
cat file1 file2 file3 | tee Newfile.txt | sort -u | xargs sort | sort | uniq -c
It looks weird having three sort-related commands in a row, but there is justification for each step. The sort -u ensures each file name is listed once. The xargs sort converts a list of file names on standard input into a list of file names on the sort command line. The output of this is the sorted data from each batch of files that xargs produces. If there are so few files that xargs doesn't need to run sort more than once, then the following plain sort is redundant. However, if xargs has to run sort more than once, then the final sort has to deal with the fact that the first lines from the second batch produced by xargs sort probably come before the last lines produced by the first batch produced by xargs sort.
This becomes a judgement call based on knowledge of the data in the original files. If the files are small enough that xargs won't need to run multiple sort commands, omit the final sort. A heuristic would be "if the sum of the sizes of the source files is smaller than the maximum command-line argument length, don't include the extra sort".
You can probably do that in one go:
# Write to two files at once. Both files have a constantly varying
# content until cat is finished.
cat file1 file2 file3 | tee Newfile.txt > Anotherfile.txt
# Save the output filename, just in case you need it later
filename="Anotherfile.txt"
# This reads the contents of Newfile into a variable called AnotherText
AnotherText=$(cat Newfile.txt)
# This is the same as "cat Newfile.txt"
echo "$AnotherText"
# This saves AnotherText into Anotherfile.txt
echo "$AnotherText" > Anotherfile.txt
# This too, using cp and the saved name above
cp Newfile.txt "$filename"
If you want to create the second file all in one go, this is a common pattern:
# During this process the contents of tmpfile.tmp are constantly changing
# (slow_process_creating_text is a placeholder for your own commands)
{ slow_process_creating_text; } > tmpfile.tmp
# Very quickly create a complete Anotherfile.txt
mv tmpfile.tmp Anotherfile.txt
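A runnable sketch of this write-then-rename pattern follows. The two printf calls stand in for the slow producer, and the mktemp template name is an assumption made for the demo.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Create the temporary file in the same directory as the target, so the
# later mv is a rename rather than a copy across filesystems.
tmp=$(mktemp "$dir/tmpfile.XXXXXX")

# Stand-in for a slow producer; the partial output is only ever visible
# under the temporary name.
{ printf 'line 1\n'; printf 'line 2\n'; } > "$tmp"

# Publish the finished file under its final name.
mv "$tmp" "$dir/Anotherfile.txt"
```

Readers of Anotherfile.txt then see either the old file or the complete new one, not a half-written state.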
Create the file and redirect into it in append mode:
touch Newfile.txt
cat files* >> Newfile.txt

Compare two files in unix using awk

I need to compare two files, File1.txt and File2.txt, in unix. The values present in File1.txt but not in File2.txt have to be written into diff.txt. I guess we can implement this using awk alone. Can anyone please guide me to achieve this?
File1.txt
apple
bat
cat
File2.txt
apple
cat
diff.txt
bat
Try this one-liner:
awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1 > diff.txt
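The one-liner reads more easily when spread over several lines. The sketch below uses the sample data from the question; the array name seen is just a more descriptive choice than a.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'apple\nbat\ncat\n' > "$dir/File1.txt"
printf 'apple\ncat\n'      > "$dir/File2.txt"

# NR counts lines over all inputs, FNR restarts for each file, so
# NR == FNR is true only while the first file (File2.txt) is being read.
awk '
  NR == FNR { seen[$0]; next }   # first file: remember every line
  !($0 in seen)                  # second file: print lines not remembered
' "$dir/File2.txt" "$dir/File1.txt" > "$dir/diff.txt"
cat "$dir/diff.txt"
```

Unlike comm, this does not need sorted input, but it holds all of File2.txt in memory. If File2.txt is empty, NR == FNR is also true for File1.txt and nothing is printed.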
diff file2 file1 | perl -lne 'print $1 if(/^\> (.*)/)'
This is the job that "comm" was created to do:
comm -23 file1 file2
See man comm for details. The caveat is that the input files have to be sorted, as yours are.
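When the inputs are not already sorted, they can be sorted into temporary files first. A sketch with the question's data in a deliberately shuffled order (the .sorted file names are made up):

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'cat\napple\nbat\n' > "$dir/File1.txt"   # deliberately unsorted
printf 'cat\napple\n'      > "$dir/File2.txt"

sort "$dir/File1.txt" > "$dir/File1.sorted"
sort "$dir/File2.txt" > "$dir/File2.sorted"

# -2 drops lines found only in the second file, -3 drops lines common to
# both, leaving the lines that appear only in File1.txt.
comm -23 "$dir/File1.sorted" "$dir/File2.sorted" > "$dir/diff.txt"
cat "$dir/diff.txt"
```

sort and comm should run under the same locale, since comm expects the collation order that sort produced.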
