Appending multiple files into one file

I append multiple data files into a single data file using the cat command. How can I assign the contents of that single file to a new file?
I am using these commands:
cat file1 file2 file3 > Newfile.txt
AnotherFile=`cat Newfile.txt`
sort $AnotherFile | uniq -c
It shows an error like "cannot open AnotherFile".
How do I assign the contents of Newfile.txt to another file?

Original answer to original question
Well, the easiest way is probably cp:
cat file1 file2 file3 > Newfile.txt
cp Newfile.txt AnotherFile.txt
Failing that, you can use:
cat file1 file2 file3 > Newfile.txt
AnotherFile=$(cat Newfile.txt)
echo "$AnotherFile" > AnotherFile.txt
Revised answer to revised question
The original question had echo "$AnotherFile" as the third line; the revised question has sort $AnotherFile | uniq -c as the third line.
Assuming that file1, file2 and file3 contain data rather than just lists of file names (so that sort $AnotherFile is not meant to sort the contents of the files named in the concatenated list), the objective is to sort and count the lines found in the source files.
The whole job can be done in a single command line:
cat file1 file2 file3 | tee Newfile.txt | sort | uniq -c
Or (more usually):
cat file1 file2 file3 | tee Newfile.txt | sort | uniq -c | sort -n
which lists the lines in increasing order of frequency.
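The error in the question arises because $AnotherFile holds the contents of Newfile.txt, and sort treats each word of those contents as a file name to open. Below is a minimal sketch of the pipeline above run against made-up sample data; the scratch directory and the file contents are assumptions for illustration only.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data standing in for file1, file2 and file3.
printf 'apple\nbanana\n' > file1
printf 'banana\ncherry\n' > file2
printf 'banana\napple\n' > file3

# Keep a copy of the combined data in Newfile.txt and count each distinct
# line, least frequent first. awk only strips the padding that uniq -c adds.
counts=$(cat file1 file2 file3 | tee Newfile.txt | sort | uniq -c | sort -n |
         awk '{print $1, $2}')
printf '%s\n' "$counts"

# sort takes file names as arguments, so pass the file itself,
# not a variable holding its contents.
sort Newfile.txt | uniq -c
```

With this sample data, cherry appears once, apple twice and banana three times, and Newfile.txt holds all six input lines.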
If you really do want to sort the contents of the files listed in file1, file2, file3 but only list the contents of each file once, then:
cat file1 file2 file3 | tee Newfile.txt | sort -u | xargs sort | sort | uniq -c
It looks weird having three sort-related commands in a row, but there is justification for each step. The sort -u ensures each file name is listed once. The xargs sort converts a list of file names on standard input into a list of file names on the sort command line. The output of this is the sorted data from each batch of files that xargs produces. If there are so few files that xargs doesn't need to run sort more than once, then the following plain sort is redundant. However, if xargs has to run sort more than once, then the final sort has to deal with the fact that the first lines from the second batch produced by xargs sort probably come before the last lines produced by the first batch produced by xargs sort.
This becomes a judgement call based on knowledge of the data in the original files. If the files are small enough that xargs won't need to run multiple sort commands, omit the final sort. A heuristic would be "if the sum of the sizes of the source files is smaller than the maximum command line argument list, don't include the extra sort".

You can probably do that in one go:
# Write to two files at once. Both files have a constantly varying
# content until cat is finished.
cat file1 file2 file3 | tee Newfile.txt > Anotherfile.txt
# Save the output filename, just in case you need it later
filename="Anotherfile.txt"
# This reads the contents of Newfile into a variable called AnotherText
AnotherText=`cat Newfile.txt`
# This is the same as "cat Newfile.txt"
echo "$AnotherText"
# This saves AnotherText into Anotherfile.txt
echo "$AnotherText" > Anotherfile.txt
# This too, using cp and the saved name above
cp Newfile.txt "$filename"
If you want to create the second file all in one go, this is a common pattern:
# During this process the contents of tmpfile.tmp is constantly changing
{ slow process creating text } > tmpfile.tmp
# Very quickly create a complete Anotherfile.txt
mv tmpfile.tmp Anotherfile.txt
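A slightly more defensive sketch of the same write-then-rename pattern follows. The slow_process function and its output are hypothetical stand-ins, and creating the temporary file next to the destination is an assumption made so that mv is a rename on the same filesystem rather than a copy.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a slow process that produces text.
slow_process() {
    printf 'line one\n'
    printf 'line two\n'
}

# Create the temporary file in the destination directory so that mv is a
# rename on the same filesystem rather than a copy.
tmpfile=$(mktemp ./Anotherfile.txt.XXXXXX)

# Only rename when the producer succeeded; otherwise discard the partial file.
if slow_process > "$tmpfile"; then
    mv "$tmpfile" Anotherfile.txt
else
    rm -f "$tmpfile"
fi
```

Readers of Anotherfile.txt then see either the old file or the complete new one, never a half-written file.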

Create the file, then redirect into it in append mode.
touch Newfile.txt
cat files* >> Newfile.txt

Related

find similar rows in a text file in unix system

I have a file named tt.txt and the contents of this file are as follows:
fdgs
jhds
fdgs
I am trying to get the rows that are the same as the output in a text file.
My expected output is:
fdgs
fdgs
to do so, I used this command:
uniq -u tt.txt > output.txt
but it returns:
fdgs
jhds
fdgs
Do you know how to fix it?

If by similar rows you mean rows with the same content:
According to the uniq man page, uniq only detects repeated lines when they are adjacent. So you need to sort the input first and use the -D option to print all duplicated lines, as shown below. However, the -D option is limited to the GNU implementation, and the output will be in a different order from the input.
sort tt.txt | uniq -D
If you want the output in the same order as the input, number the lines first, then sort by line number again at the end:
cat -n tt.txt | sort -k 2 | uniq -f 1 -D | sort -k 1,1n | sed -E 's/^\s*[0-9]+\s+//'
cat -n prints the content with line numbers.
sort -k 2 sorts the input starting at the 2nd column.
uniq -f 1 ignores the first column when comparing lines.
sort -k 1,1n sorts the output back into the original line order, numerically.
sed -E 's/^\s*[0-9]+\s+//' deletes the first column with the line number (-E enables extended regular expressions, where + means "one or more").
By contrast, uniq -u outputs only the lines that are not repeated, which is the opposite of what you want.
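Where uniq -D is not available, a two-pass awk command is one possible alternative that also keeps the original line order. This is a different technique from the pipeline above, not a restatement of it, and the scratch directory is an assumption for illustration; the sample data is the tt.txt from the question.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample file from the question.
printf 'fdgs\njhds\nfdgs\n' > tt.txt

# Pass 1 (NR==FNR holds only while reading the first file argument):
#   count how many times each line occurs.
# Pass 2: print every line whose count is greater than one, in input order.
dups=$(awk 'NR==FNR { count[$0]++; next } count[$0] > 1' tt.txt tt.txt)
printf '%s\n' "$dups" > output.txt
cat output.txt
```

The file is read twice, so this suits regular files rather than data arriving on a pipe.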

One in awk:
$ awk '++seen[$0]==2;seen[$0]>1' file
fdgs
fdgs

I would like to write the content of 24 big zipped files into 1 unzipped file in order to count the number of distinct lines (Python)

I have tried this :
!gunzip file1.gz
!cat file1 >> data
!rm -Rf file1
!gunzip file2.gz
!cat file2 >> data
!rm -Rf file2
But when doing that for file2, I get the error message "cat: write error: No space left on device". Given that I have to do this for 24 files, is there an alternative way to get the content of all the files into one? (The final file is supposed to have more than two million lines.)

If you just want to count unique lines, there's no need to create any files:
gunzip -c file1.gz file2.gz ... | sort -u | wc -l
or even
gunzip -c file*.gz | sort -u | wc -l
gunzip -c writes the unzipped files to stdout, one after another. sort -u sorts these, and filters out duplicates. wc -l finally counts the lines.
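A small self-contained sketch of that pipeline, using two tiny made-up compressed files (the names and contents are assumptions for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: two small compressed files that share one line.
printf 'a\nb\nc\n' | gzip > file1.gz
printf 'b\nd\n' | gzip > file2.gz

# Decompress to stdout only; no uncompressed copy is written to disk.
# tr strips the padding that some wc implementations add.
distinct=$(gunzip -c file1.gz file2.gz | sort -u | wc -l | tr -d ' ')
echo "$distinct"
```

For inputs that do not fit in memory, sort spills to temporary files; if the default temporary directory is short on space, the -T option of sort selects a different one.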

If the unique data is in the first two columns, cut them out first,
perhaps with cut -d"," -f1,2 or cut -c1-16. Use the cut that matches your data in
gunzip -c file*.gz | cut -f 1,2 | sort -u | wc -l

Grep from multiple files and get the first n lines of each output

Let's say I have f files.
From each file I want to grep a pattern.
I just want n pattern matches from each file.
What I have:
strings <files_*> | grep <pattern> | head -<n>
I do need to use strings because I'm dealing with binaries, and from this command I am only getting n lines from the total.

grep has a -mX option that lets you specify how many matches to return. However, adding it to your piped command line stops at the first X matches in total, not per file.
To get per-file count, I came up with this:
for FILE in <files_*> ; do strings "$FILE" | grep -m<X> <pattern> ; done
For example, searching for "aa" in the files that match x* and returning up to 3 lines from each would be:
for FILE in x* ; do strings "$FILE" | grep -m3 aa ; done
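The per-file limit can be checked with a small sketch. The sample files here are plain text and entirely made up; the fallback to cat when strings is not installed is an assumption added only so the demo runs anywhere, and grep -m is an extension that GNU and BSD grep both provide.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files: x1 has four matching lines, x2 has one.
printf 'aa line 1\nbb line 2\naa line 3\naa line 4\naa line 5\n' > x1
printf 'aa only one\ncc nothing\n' > x2

# Use strings when it is installed; fall back to cat for this text-only demo.
if command -v strings >/dev/null 2>&1; then extract=strings; else extract=cat; fi

# Up to 3 matches from each file, rather than 3 matches overall.
matches=$(for FILE in x*; do "$extract" "$FILE" | grep -m3 aa; done)
printf '%s\n' "$matches"
```

Here x1 contributes three lines and x2 contributes one, whereas piping everything through head -3 would have returned lines from x1 only.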

How to find unique files using the md5sum command?

I am using the md5sum command to get a checksum of each file's content.
I want the result to exclude files whose content also appears in another file.
for example
$ md5sum file1 file2 file3 file4
c8675a129a538248bf9b0f8104c8e817 file1
9d3df2c17bfa06c6558cfc9d2f72aa91 file2
9d3df2c17bfa06c6558cfc9d2f72aa91 file3
2e7261df11a2fcefee4674fc500aeb7f file4
I want the output to contain only the files that do not match any other file, which means
I need file1 and file4.
c8675a129a538248bf9b0f8104c8e817 file1
2e7261df11a2fcefee4674fc500aeb7f file4
I only need the files whose content is not the same as that of any other file.
Thanks in advance.

You can say:
md5sum file1 file2 file3 file4 | uniq -u -w33
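in order to get the unique files. Note that uniq only compares adjacent lines, so sort the md5sum output first if identical files may not be listed next to each other.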
Quoting man uniq:
-u, --unique
only print unique lines
EDIT: You seem to be looking for alternatives. Try
md5sum ... | sed ':a;$bb;N;/^\(.\).*\n\1[^\n]*$/ba;:b;s/^\(.\).*\n\1[^\n]*\n*//;ta;/./P;D'
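To see how the uniq-based approach behaves when duplicate files are not listed next to each other, here is a minimal sketch. It assumes GNU coreutils (md5sum and uniq -w), and the four sample files are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files; file1 and file3 have identical content
# but are not adjacent in the argument list.
printf 'alpha\n' > file1
printf 'beta\n'  > file2
printf 'alpha\n' > file3
printf 'gamma\n' > file4

# Sort by checksum so identical files become adjacent, keep only lines whose
# first 32 characters (the MD5 digest) occur once, then cut out the names.
unique=$(md5sum file1 file2 file3 file4 | sort | uniq -u -w32 | cut -c35- | sort)
printf '%s\n' "$unique"
```

Only file2 and file4 remain, because file1 and file3 share a digest.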

Try this (bash):
find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
Explanation:
Find all files, calculate their MD5SUM, find duplicates by comparing the MD5SUM, print the names

sort and uniq oneliner

Is there a one-liner for sort and uniq, given a filename, in Unix?
I googled and found the following, but it is not sorting, and I am also not sure what the command below is doing. Are there any better ways using awk or any other Unix tool?
cut -d, -f1 file | uniq | xargs -I{} grep -m 1 "{}" file
On a side note, is there one that can be used on both Windows and Unix? This is not important, I am just checking.
C:\Users\Chola>sort -t "#" -k2,2 email-list.txt
Input text file:-
436485
422636
429228
427041
433414
425810
422636
431526
428808

If your file consists only of numbers, one per line:
sort -n FILENAME | uniq
or
sort -u -n FILENAME
(You can add -u to the sort command instead of piping through uniq in all of the following.)
If you want to extract just one column of a file, and then sort that column numerically removing duplicates, you could do this:
cut -f7 FILENAME | sort -n | uniq
Cut assumes that there is a single tab between columns. If your file is CSV, you might be able to do this:
cut -f7 -d, FILENAME | sort -n | uniq
but that won't work if there is a , in a text field in the file (where CSV will protect it with "'s).
If you want to sort by the column but remove only completely duplicate lines, then you can do this:
sort -k7,7n FILENAME | uniq
sort assumes that columns are separated by whitespace. Again, if you want to separate with ,, you can use:
sort -k7,7n -t, FILENAME | uniq
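As a quick check of the commands above, here is a sketch that runs them on the sample input from the question. The small CSV file and its column layout are assumptions made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample input from the question, one number per line.
printf '%s\n' 436485 422636 429228 427041 433414 425810 422636 431526 428808 > numbers.txt

# Numeric sort with duplicates removed in a single step.
sorted=$(sort -u -n numbers.txt)
printf '%s\n' "$sorted"

# Hypothetical CSV with the key in column 2: extract it, then sort uniquely.
printf 'a,30\nb,4\nc,30\n' > data.csv
keys=$(cut -d, -f2 data.csv | sort -n -u)
printf '%s\n' "$keys"
```

The nine input lines contain one duplicate (422636), so eight lines come out, and the numeric sort places 4 before 30 where a plain text sort would not.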
