unix: how to concatenate files matched in grep

I want to concatenate the files whose names do not include "_BASE_". I thought it would be something along the lines of ...
ls | grep -v _BASE_ | cat > all.txt
the cat part is what I am not getting right. Can anybody give me some idea about this?

Try this
ls | grep -v _BASE_ | xargs cat > all.txt
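Plain cat didn't work in the original pipe because cat with no file arguments copies its standard input, so it just echoes the file names instead of opening them; xargs is what turns those names into arguments. A quick illustration, assuming two files a.txt and b.txt exist:
printf 'a.txt\nb.txt\n' | cat          # prints the two names themselves
printf 'a.txt\nb.txt\n' | xargs cat    # runs: cat a.txt b.txt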

You can make ls skip some files using its --ignore option and then cat the rest into a file.
ls --ignore="*_BASE_*" | xargs cat > all.txt
You can also do it without xargs:
cat $( ls --ignore="*_BASE_*" ) > all.txt
Update:
Dale Hagglund noticed that a filename like "Some File" will appear as two filenames, "Some" and "File". To avoid that you can use the --quoting-style=WORD option, where WORD can be shell or escape.
For example, with --quoting-style=shell, Some File will be printed as 'Some File' and interpreted as one file.
Another problem is that the output file could be the same as one of the listed files. We need to ignore it too.
So the answer is:
outputFile=a.txt; ls --ignore="*sh*" --ignore="${outputFile}" --quoting-style=shell | xargs cat > ${outputFile}
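If you are in bash you can also sidestep parsing ls output entirely with an extended glob. A minimal sketch, assuming a flat directory, the extglob option, and all.txt as the output file (excluded by name so a rerun doesn't re-read it):
shopt -s extglob
cat !(*_BASE_*|all.txt) > all.txt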

If you also want to get files from subdirectories, `find' is your friend:
find . -type f ! -name '*_BASE_*' ! -path ./all.txt -exec cat {} >> all.txt \+
It searches the current directory and its subdirectories, finds only regular files (-type f), ignores files matching the wildcard pattern *_BASE_*, ignores all.txt, and executes cat in the same manner as xargs would.
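Roughly the same thing in xargs style, as a sketch that assumes GNU find and xargs for the NUL-separated -print0/-0 handling:
find . -type f ! -name '*_BASE_*' ! -path './all.txt' -print0 | xargs -0 cat > all.txt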


Recursively finding files in list of directories

How do I recursively count files in a list of Linux directories?
Example:
/dog/
    /a.txt
    /b.txt
    /c.ipynb
/cat/
    /d.txt
    /e.pdf
    /f.png
    /g.txt
/owl/
    /h.txt
I want the following output:
5 .txt
1 .ipynb
1 .pdf
1 .png
I tried the following, with no luck.
find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
This find + gawk may work for you:
find . -type f -print0 |
awk -v RS='\0' -F/ '{sub(/^.*\./, ".", $NF); ++freq[$NF]} END {for (i in freq) print freq[i], i}'
Using -print0 in find is safe for filenames containing whitespace and other special characters. Likewise, -v RS='\0' in awk (this needs GNU awk) makes the NUL byte the record separator.
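Applied to the example tree above, this would print something like the following (the order can vary, since awk's for (i in freq) iteration order is unspecified):
5 .txt
1 .ipynb
1 .pdf
1 .png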
Use Perl one-liners to make the output in the format you need, like so:
find . -type f | perl -pe 's{.*[.]}{.}' | sort | uniq -c | perl -lane 'print join "\t", @F;' | sort -nr
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
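For reference, run against the example tree above, the pipeline would end up printing the counts tab-separated, largest first, along the lines of:
5	.txt
1	.png
1	.pdf
1	.ipynb
(the relative order of the 1-counts depends on sort's fallback whole-line comparison).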
Assume you have a known directory path with the subdirectories foo, bar, baz, qux, quux, corge, and you want to count the file types by extension, but only for the subdirectories foo, baz and qux.
The simplest is to just do
$ find /path/{foo,baz,qux} -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
The -exec part just uses simple sh parameter expansion (${0##*.}) to print the extension.
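If the subdirectories hold many files, a variant of the same idea that spawns far fewer sh processes (a sketch using the batching + form of -exec) is:
find /path/{foo,baz,qux} -type f -exec sh -c 'for f; do echo "${f##*.}"; done' sh {} + | sort | uniq -c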

UNIX: cat large number of files - output being doubled

I need to concatenate a large number of text files from a series of directories. All the text files have the same name but some folders do not contain the file and just need to be skipped.
When I use cat ./**/File.txt > newFile.txt I get the following error /bin/cat: Argument list too long.
I tried using the ulimit command a few different ways but that did not work.
I have tried:
find . -name File.txt -exec cat {} \; > newFile.txt
find . -name File.txt -exec cat {} \+ > newFile.txt
find . -type f -name File.txt | xargs cat
and this results in the files being concatenated twice. For example, I have 3 text files named File.txt, each in a different directory, each with a different line of text:
test1
test2
test3
When I do the above commands my newFile.txt looks like:
test1
test2
test3
test1
test2
test3
I can't figure out why this is happening twice. When I use the command cat ./**/File.txt > newFile.txt on my small test set, it works fine and I end up with one file that has:
test1
test2
test3
I also tried
for a in File.txt ; do cat $a >> newFile.txt ; done
but get the message
cat: File.txt: No such file or directory
My guess is that this is because some of the directories do not contain this text file.
Is there another way to do this, or is there a reason my files are being concatenated twice?
Here's how I would do it
find . -name File.txt -exec cat {} >> output.txt \;
This searches for all occurrences of the file File.txt and appends the cat'ed output of that file to the file output.txt
However, I have tried your find command and it also works.
find . -name File.txt -exec cat {} \; > newFile.txt
I would suggest that you clear down the output file newFile.txt before you try either your find or my find as follows:
>newFile.txt
This is a handy way to empty a file's contents. (Although it should not matter to you right now, emptying a file by redirecting nothing to it works even if another process is writing to the file.)
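One likely cause of the doubled output is exactly this kind of leftover content: with >> every run appends to whatever is already in the file. A sketch that is safe to rerun, using find's batching + form and a single truncating redirection, would be:
find . -type f -name File.txt -exec cat {} + > newFile.txt
Here > empties newFile.txt once before find starts, so repeated invocations don't accumulate.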
Hope this helps.

How to cat all files with filename with certain words in unix

I have a bunch of file in one directory, what I wanted to do is:
cat a-12-08.json b-12-08_others.json b-12-08-mian.json >> new.json
But there are too many files, is there any command I can use to cat all files with "12-08" in their filename?
I found the solution myself. Here is the answer:
cat *12-08* >> new.json
You can use find to achieve what you want:
find . -type f -name '*12-08*' -exec sh -c 'grep -q "one" {} && cat {} >> /tmp/output.txt' \;
This way you can cat only the files that contain a word you are looking for (here "one" is just an example search term, and grep -q tests for a match without printing it).
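If you don't actually need the content filter, a plain find version of the same concatenation (a sketch; -maxdepth 1 restricts it to the current directory) would be:
find . -maxdepth 1 -type f -name '*12-08*' -exec cat {} + >> new.json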
Use a wildcard name:
cat *12-08* >>new.json
This will work as long as there aren't so many files that you exceed the maximum length of a command line, ARG_MAX (2MB on the Linux systems I checked).
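You can check the limit on your system with getconf:
getconf ARG_MAX
(On Linux the effective limit is also reduced by the size of your environment, so treat the number as an upper bound.)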

Remove underscores from all filenames within a directory

I have a folder "model" with files named like:
a_EmployeeData
a_TableData
b_TestData
b_TestModel
I basically need to drop the underscore and make them:
aEmployeeData
aTableData
bTestData
bTestModel
Is there a way in the Unix command line to do so?
This will correctly process files containing odd characters like spaces or even newlines, and should work on any Unix / Linux distribution since it relies only on POSIX syntax.
find model -type f -name "*_*" -exec sh -c 'd=$(dirname "$1"); mv "$1" "$d/$(basename "$1" | tr -d _)"' sh {} \;
Here is what it does:
For each file (not directory) containing an underscore in its name under the model directory and its subdirectories, rename the file in place with all the underscores stripped out.
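To preview the renames before committing, a dry-run sketch of the same command simply echoes each mv instead of executing it:
find model -type f -name "*_*" -exec sh -c 'd=$(dirname "$1"); echo mv "$1" "$d/$(basename "$1" | tr -d _)"' sh {} \;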
You can do this simply with bash.
for file in /path/to/model/*; do
mv "$file" "${file/_/}"
done
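Note that ${file/_/} removes only the first underscore, which is enough for the sample names. If a name may contain several underscores, use the global form (assuming the directory path itself contains no underscores):
for file in /path/to/model/*_*; do
    mv "$file" "${file//_/}"    # // makes the substitution global
done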
If you have the rename command available, then simply do
rename 's/_//' /path/to/model/*
for f in model/* ; do mv "$f" `echo "$f" | sed 's/_//g'` ; done
Edit: modified a few things thanks to suggestions by others, but I'm afraid my code is still bad for strange filenames.
maybe this:
find model -maxdepth 1 -type f -name "*_*" -print | sed -e 'p;s/_//g' | xargs -n2 echo mv
Decomposition:
find all plain files directly in the model directory (no subdirectories) whose names contain at least one underscore
with sed, make the filename adjustment: replace each _ with nothing
also print the old name
feed the two filenames to xargs, which will rename the files with mv
The above is a dry run. When satisfied, remove the echo before mv for the actual rename.
Warning: this will not work if a filename contains spaces. If you have GNU sed you can do
find . -maxdepth 1 -name "*_*" -print0 | sed -z 'p;s/_//g' | xargs -0 -n2 echo mv
which works with filenames containing spaces too...
In zsh:
autoload zmv # in ~/.zshrc
cd model && zmv '(**/)(*)' '$1${2//_}'
marc@panic:~$ echo 'a_EmployeeData' | tr -d '_'
aEmployeeData
I had the same problem on my machine, but the filenames had more than one underscore. I used rename with the g option so that all underscores get removed:
find model/ -maxdepth 1 -type f | rename 's/_//g'
Or if there are no subdirectories, just
rename 's/_//g'
If you don't have rename, see Jaypal Singh's answer.
Use the global flag /g with your replace pattern to replace all occurrences within the filename.
find . -type f -print0 | xargs -0 rename 's/_//g'
Or if you want underscores replaced with spaces then use this:
find . -type f -print0 | xargs -0 rename 's/_/ /g'
If you like to live dangerously, add the force flag -f in front of your replace pattern: rename -f 's/_//g'

unix - how to deal with too many args for cat

I have a bunch of files in a directory, each with one line of text. I want to cat all of these files together (all the one liners) into a single, large file. However, when I use cat there are too many arguments. How can I get around this?
bash$ (ls | xargs cat) > /tmp/some_big_file
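Note that parsing ls output breaks on filenames containing newlines; a more robust sketch of the same idea, assuming GNU find and xargs, is:
find . -maxdepth 1 -type f -print0 | xargs -0 cat > /tmp/some_big_file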
Try using -n with xargs to limit the number of arguments passed to each invocation of cat:
find . | xargs -n 100 cat >> out
look into xargs
find . <whatever> | xargs cat > outfile.txt
Replace the find . <whatever> bit with your own way of getting all the files
Replace outfile.txt with your output file.
