unix - how to deal with too many args for cat - unix

I have a bunch of files in a directory, each with one line of text. I want to cat all of these files together (all the one liners) into a single, large file. However, when I use cat there are too many arguments. How can I get around this?

bash$ (ls | xargs cat) > /tmp/some_big_file

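Try -n with xargs to limit the number of arguments passed to each cat invocation (restricting find to regular files and excluding the output file itself):
find . -type f ! -name out | xargs -n 100 cat >> out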
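To see what -n does, here is a small sketch that uses echo in place of cat, so the batching is visible. The numbers are just placeholder arguments, not real file names.

```shell
# Five arguments, at most two per echo invocation,
# so xargs runs echo three times.
batched=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
printf '%s\n' "$batched"
```

Each output line corresponds to one invocation of the command, which is how xargs keeps any single command line below the system limit.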

Look into xargs:
find . <whatever> | xargs cat > outfile.txt
Replace the find . <whatever> part with your own way of listing the files, and replace outfile.txt with your output file.
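A minimal sketch of the find plus xargs approach that also survives spaces in file names. It assumes a find and xargs that support -print0 and -0 (GNU and BSD do, but these are not in POSIX), and the directory and file names are made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo directory and output file (placeholders).
demo=$(mktemp -d)
out=$(mktemp)

# A few one-line files, one of them with a space in its name.
printf 'one\n'   > "$demo/a.txt"
printf 'two\n'   > "$demo/b c.txt"
printf 'three\n' > "$demo/d.txt"

# find emits NUL-terminated names; sort -z makes the order predictable;
# xargs -0 splits the list into as many cat invocations as needed.
# The redirection applies to xargs, so all batches land in one file.
find "$demo" -maxdepth 1 -type f -print0 | sort -z | xargs -0 cat > "$out"
cat "$out"
```

Keeping the output file outside the directory being read avoids cat reading its own output.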

Related

ls and xargs to output specific file extensions

I am trying to use ls and xargs to print files with specific extensions (.bam and .vcf) without the path. The command below is close, but when I pipe (|) the two ls commands together I get the error below. Run separately they work fine, except each file is printed on its own line (my actual data has hundreds of files, and paired output would make it easier to read). Thank you :).
files in directory
1.bam
1.vcf
2.bam
2.vcf
command with error
ls /home/cmccabe/Desktop/NGS/test/R_folder/*.bam | xargs -n1 basename | ls /home/cmccabe/Desktop/NGS/test/R_folder/*.vcf | xargs -n1 basename >> /home/cmccabe/Desktop/NGS/test/log
xargs: basename: terminated by signal 13
desired output
1.bam 1.vcf
2.bam 2.vcf
You cannot pipe output into ls and have it print that along with its own output, because ls does not read standard input. Give all the patterns to the first ls and it will output everything.
ls *.a *.b *.c | xargs ...
ls isn't really doing anything for you currently, it's the shell that's listing all your files. Since you're piping ls's output around, you're actually vulnerable to dangerous file names.
basename can take multiple arguments with the -a option:
basename -a "path/to/files/"*.{bam,vcf}
To print that in two columns, you could use printf via xargs, with sort for... sorting. The -z or -0 flags throughout cause null bytes to be used as the filename separators:
basename -az "path/to/files/"*.{bam,vcf} | sort -z | xargs -0n 2 printf "%b\t%b\n"
If you're going to be doing any more processing after printing to columns, you may want to replace the %bs in the printf format with %qs. That will escape non-printable characters in the output, but might look a bit ugly to human eyes.
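If the pairing matters more than the pipeline, a plain loop is another option (this is a different technique from the xargs one above). A rough sketch, assuming every .bam has a .vcf with the same stem; the temporary directory stands in for the real folder.

```shell
#!/bin/sh
# Hypothetical stand-in for the real data directory.
dir=$(mktemp -d)
touch "$dir/1.bam" "$dir/1.vcf" "$dir/2.bam" "$dir/2.vcf"
log="$dir/log"

# The shell expands the glob itself, so ls is not needed.
for bam in "$dir"/*.bam; do
  stem=$(basename "$bam" .bam)        # strip the path and the .bam suffix
  if [ -e "$dir/$stem.vcf" ]; then    # only print complete pairs
    printf '%s %s\n' "$stem.bam" "$stem.vcf"
  fi
done > "$log"
cat "$log"
```

With the four files from the question this writes one "N.bam N.vcf" pair per line to the log.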

How to cat all files with filename with certain words in unix

I have a bunch of files in one directory, and what I want to do is:
cat a-12-08.json b-12-08_others.json b-12-08-mian.json >> new.json
But there are too many files, is there any command I can use to cat all files with "12-08" in their filename?
I found the solution below.
Here is the answer:
cat *12-08* >> new.json
You can use find to do what you want to achieve:
find . -type f -name '*12-08*' -exec sh -c 'grep "one" {} && cat {} >> /tmp/output.txt' \;
This way you cat only the files that contain the word you are looking for.
Use a wildcard name:
cat *12-08* >>new.json
This will work as long as there aren't so many files that you exceed the maximum length of a command line, ARG_MAX (2MB on the Linux systems I checked).
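If the wildcard does run into that limit, one workaround is to let the shell's builtin printf emit the names and have xargs batch them. A sketch, assuming bash or another shell where printf is a builtin (so the expanded glob is never passed through exec) and an xargs with -0; the .json files here are invented for the demo.

```shell
#!/bin/sh
# Show the limit (in bytes) on this system.
getconf ARG_MAX

# Hypothetical demo files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '{"a": 1}\n' > a-12-08.json
printf '{"b": 2}\n' > b-12-08_others.json
printf '{"c": 3}\n' > c-11-30.json   # does not match, must be skipped

# NUL-separated names from the glob, batched by xargs into cat calls.
printf '%s\0' *12-08* | xargs -0 cat >> new.json
cat new.json
```

new.json itself does not match *12-08*, so it is safe to keep it in the same directory.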

Grep from multiple files and get the first n lines of each output

Let's say I have f files.
From each file I want to grep a pattern.
I just want n pattern matches from each file.
What I have:
strings <files_*> | grep <pattern> | head -<n>
I do need to use strings because I'm dealing with binaries, and from this command I am only getting n lines from the total.
grep has a -mX option that allows you to specify how many matches to return. However, adding this to your piped command line is going to stop at the first X matches in total, not per file.
To get per-file count, I came up with this:
for FILE in `ls -f <files_*>` ; do strings "$FILE" | grep -m<X> <pattern> ; done
Example: searching for "aa" in the files that match x* and returning up to 3 lines from each would be:
for FILE in `ls -f x*` ; do strings "$FILE" | grep -m3 aa ; done
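The same loop can be written with a plain glob instead of backticks around ls, which avoids word splitting on odd file names. A sketch with made-up sample files; note that strings only reports runs of at least 4 printable characters by default, so the sample lines are kept longer than that.

```shell
#!/bin/sh
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'aa-one\naa-two\naa-three\naa-four\nbbbb\n' > "$dir/x1"
printf 'bbbb\naa-five\n' > "$dir/x2"

n=3   # maximum matches wanted per file
for file in "$dir"/x*; do
  # grep -m stops after n matches, and the loop restarts the count per file.
  strings "$file" | grep -m "$n" aa
done > "$dir/matches"
cat "$dir/matches"
```

Here x1 contributes its first three matches and x2 its single match.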

Unix - Using ls with grep

How can I use ls (or other commands) and grep together to search from specific files for a certain word inside that file?
Example I have a file - 201503003_315_file.txt and I have other files in my dir.
I only want to search files that have a file name that contains _315_ and inside that file, search for the word "SAMPLE".
Hope this is clear and thanks in advance for any help.
You can do:
ls *_315_* | xargs grep "SAMPLE"
The first part, ls *_315_*, will list only files that have _315_ as part of the file name. This list of files is piped to xargs, which passes them to grep, which will scan each one of them and look for "SAMPLE".
UPDATE
A bit easier (and actually safer) approach was mentioned by David in the comments below:
grep "SAMPLE" *_315_*.txt
The reason it's safer is that parsing the output of ls breaks on file names with special characters.
Another option, as mentioned by Charles Duffy in the comments below:
printf '%s\0' *_315_* | xargs -0 grep "SAMPLE"
Change to that directory (using cd dir) and try:
grep SAMPLE *_315_*
If you really MUST use ls AND grep try this:
ls *_315_* | xargs grep SAMPLE
The first example, however, requires less typing...
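A quick way to check the glob approach on throwaway files. This is only a sketch: the file names are invented, and -l (print matching file names only) is used to keep the output short.

```shell
#!/bin/sh
# Hypothetical scratch directory with three sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'SAMPLE data\n'  > 201503003_315_file.txt
printf 'nothing here\n' > 201503004_315_file.txt
printf 'SAMPLE data\n'  > 201503003_999_file.txt   # wrong name, never searched

# The shell expands *_315_* before grep runs; -- guards against
# file names that start with a dash.
matches=$(grep -l "SAMPLE" -- *_315_*)
printf '%s\n' "$matches"
```

Only the _315_ file that actually contains the word is reported.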

unix: how to concatenate files matched in grep

I want to concatenate the files whose name does not include "_BASE_". I thought it would be somewhere along the lines of ...
ls | grep -v _BASE_ | cat > all.txt
the cat part is what I am not getting right. Can anybody give me some idea about this?
Try this
ls | grep -v _BASE_ | xargs cat > all.txt
You can ignore some files with ls using --ignore option and then cat them into a file.
ls --ignore="*_BASE_*" | xargs cat > all.txt
Also you can do that without xargs:
cat $( ls --ignore="*_BASE_*" ) > all.txt
UPD:
Dale Hagglund noticed that a filename like "Some File" will appear as two filenames, "Some" and "File". To avoid that you can use the --quoting-style=WORD option, where WORD can be shell or escape.
For example, with --quoting-style=shell, Some File will print as 'Some File' and will be interpreted as one file.
Another problem is that the output file could be the same as one of the listed files. We need to ignore it too.
So the answer is:
outputFile=a.txt; ls --ignore="*sh*" --ignore="${outputFile}" --quoting-style=shell | xargs cat > ${outputFile}
If you also want to include files from subdirectories, find is your friend:
find . -type f ! -name '*_BASE_*' ! -path ./all.txt -exec cat {} + >> all.txt
It searches the current directory and its subdirectories, selects only regular files (-type f), ignores files matching the wildcard pattern *_BASE_*, ignores all.txt, and executes cat in the same manner as xargs would.
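For the single-directory case, a loop with a case filter does the same job without parsing ls output at all (a different technique from the ls --ignore one). A sketch with invented file names, including one with a space:

```shell
#!/bin/sh
# Hypothetical scratch directory and sample files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'keep 1\n' > 'Some File'
printf 'skip\n'   > x_BASE_y.txt
printf 'keep 2\n' > z.txt

out=all.txt
: > "$out"                            # create or truncate the output file
for f in *; do
  case $f in
    *_BASE_*|"$out") continue ;;      # skip _BASE_ files and the output itself
  esac
  [ -f "$f" ] && cat -- "$f" >> "$out"
done
cat "$out"
```

Quoting "$f" keeps "Some File" as a single name, so no quoting-style tricks are needed.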
