I have 8 files that need to be merged into one text file, with each of the file names being on a separate line.
The output should be as follows:
file.txt:
output1/transcripts.gtf
output2/transcripts.gtf
output3/transcripts.gtf
and so on...
I have read several other suggestions and I know it should be an easy fix. I have tried dir and awk but have only gotten results that have all the files on one line. I am using Unix.
How about this?
ls -1 output*/*.gtf > file.txt
or, if the nesting of your subdirectories is deeper and you want all files with names ending in ".gtf":
find . -type f -name "*.gtf" -print | cut -b 3- > file.txt
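If your find is GNU find, the -printf '%P\n' action prints each path with the leading "./" already stripped, so the cut step isn't needed; a sketch under that assumption:
# GNU find only: %P is the path relative to the starting point (no leading ./)
find . -type f -name "*.gtf" -printf '%P\n' > file.txt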
I have a directory containing over a thousand subdirectories. I only want to 'ls' all the directories that contain more than 2 files. I don't need the directories that contain less than 2 files. This is in C-shell, not bash. Anyone know of a good command for this?
I tried this command but it's not giving the desired output. I simply want the full list of directories with more than 2 files. One reason it isn't working is that it descends into the subdirectories of those directories to check whether they have more than 2 files. I don't want a recursive search, just a list of the first-level directories in the main directory they are in.
$ find . -type f -printf '%h\n' | sort | uniq -c | awk '$1 > 2'
My mistake, I was thinking bash rather than csh. Although I don't have a csh to test with, I think this is the csh syntax for the same thing:
foreach d (*)
    if (-d "$d" && `ls -1 "$d" | wc -l` > 2) echo "$d"
end
I've added a guard so that non-directories aren't unnecessarily processed, and I've included double-quotes in case there are any "funny" file or directory names (e.g. containing spaces).
One possible problem (I don't know what your exact task is): any immediate subdirectories will also count as files.
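If a plain pipeline is acceptable, another option is to restrict find's depth so that only regular files directly inside the first-level directories are counted; this is a sketch assuming GNU find (for -mindepth/-maxdepth/-printf), and it works the same from csh or bash since it is a single pipeline:
# count regular files exactly one level inside each first-level directory,
# then keep the directories holding more than 2 of them
# (directory names containing spaces would need extra care in the awk step)
find . -mindepth 2 -maxdepth 2 -type f -printf '%h\n' | sort | uniq -c | awk '$1 > 2 {print $2}'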
Sorry, I was working in bash here:
for d in *; do if [ $(ls -1 "$d" | wc -l) -gt 2 ]; then echo "$d"; fi; done
For a faster solution, you could try "cheating" by deconstructing the on-disk format of the directories themselves, if you're on a traditional Unix where they are just files whose contents can be read and analyzed. Needless to say, that is NOT PORTABLE (e.g. to a bash running on Windows), so it's not recommended.
I need to concatenate a large number of text files from a series of directories. All the text files have the same name but some folders do not contain the file and just need to be skipped.
When I use cat ./**/File.txt > newFile.txt I get the following error: /bin/cat: Argument list too long.
I tried using the ulimit command a few different ways but that did not work.
I have tried:
find . -name File.txt -exec cat {} \; > newFile.txt
find . -name File.txt -exec cat {} \+ > newFile.txt
find . -type f -name File.txt | xargs cat
and this results in the files being concatenated twice. For example, I have 3 text files named File.txt, each in a different directory, each with a different line of text:
test1
test2
test3
When I do the above commands my newFile.txt looks like:
test1
test2
test3
test1
test2
test3
I can't figure out why this is happening twice. When I use the command cat ./**/File.txt > newFile.txt on my small test set, it works fine and I end up with one file that has:
test1
test2
test3
I also tried
for a in File.txt ; do cat $a >> newFile.txt ; done
but get the message
cat: File.txt: No such file or directory
My guess is that this is because some of the directories do not contain this text file.
Is there another way to do this, or is there a reason my files are being concatenated twice?
Here's how I would do it
find . -name File.txt -exec cat {} >> output.txt \;
This searches for all occurrences of the file File.txt and appends the cat'ed output of that file to the file output.txt
However, I have tried your find command and it works too.
find . -name File.txt -exec cat {} \; > newFile.txt
I would suggest that you clear down the output file newFile.txt before you try either your find or my find as follows:
>newFile.txt
This is a handy way to empty a file's contents. (Although this should not matter to you right now, emptying a file by redirecting nothing into it works even if another process is writing to the file.)
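Putting the clearing step and the find together, a minimal sketch:
>newFile.txt                                         # start with an empty output file
find . -name File.txt -exec cat {} + >> newFile.txt  # append every File.txt found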
Hope this helps.
I have a list of 50 names that look like this:
O8-E7
O8-F2
O8-F6
O8-F8
O8-H2
O9-A5
O9-B8
O9-D8
O9-E2
O9-F5
O9-H12
S37-A5
S37-B11
S37-B12
S37-C12
S37-D12
S37-E8
S37-G2
I want to look inside a specific directory for all the subdirectories whose name contains one of these elements.
For example, the directory Sample_S37-G2-from-Specimen-001 would be a match.
Inside those subdirectories, there is a file called accepted_hits.bam (unfortunately named the same way in all of them). I want to find these files and copy them into a single folder, with the name of the sample subdirectory that they came from.
For example, I would copy the accepted_hits.bam file from the subdirectory Sample_S37-G2-from-Specimen-001 to the new_dir as S37-G2_accepted_hits.bam
I tried using find, but it's not working and I don't really understand why.
cat sample.list | while read FILENAME; do find /path/to/sampleDirectories -name "$FILENAME" -exec cp '{}' new_dir \; done
Any ideas? Thanks!
You are looking for dirs that are exactly the same as the lines in your input.
The first improvement would be using wildcards
cat sample.list | while read FILENAME; do
find /path/to/sampleDirectories -name "*${FILENAME}*" -exec cp '{}' new_dir\; done
Your new problem is that now you will be looking for directories, not files. What you want is to find the directories that contain a file named accepted_hits.bam.
So your next try would be parsing the output of
find /path/to/sampleDirectories -name accepted_hits.bam | grep "${FILENAME}"
but you do not want to call find for each entry in sample.list.
You need to start with one find command and get the relevant subdirectories from it.
A complication is that you want to have the matched substring from the original path in your destination file name. Look at the grep options -o and -f; they help!
find /path/to/sampleDirectories -name accepted_hits.bam | while read orgfile; do
    matched_part=$(echo "${orgfile}" | grep -o -f sample.list)
    if [ -n "${matched_part}" ]; then
        cp "${orgfile}" new_dir/"${matched_part}_accepted_hits.bam"
    fi
done
This will only work when your sample.list has no additional spaces. When it does have spaces and you cannot change the file, you need to copy/parse sample.list into another file first.
When one of your 50 entries in sample.list is a substring of "accepted_hits.bam", you need to do some extra work.
Edit: if [ -n "${matched_part}" ] was missing the $.
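For the case where a sample.list entry could match inside "accepted_hits.bam" itself, one possible workaround (a sketch, not the only way) is to run the grep against the directory part of the path only, so the entries can never match the file name:
find /path/to/sampleDirectories -name accepted_hits.bam | while read orgfile; do
    # dirname strips the trailing /accepted_hits.bam before matching
    matched_part=$(dirname "${orgfile}" | grep -o -f sample.list)
    if [ -n "${matched_part}" ]; then
        cp "${orgfile}" new_dir/"${matched_part}_accepted_hits.bam"
    fi
done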
Try using egrep with alternation:
build a text file with a single line of patterns: (pat1|pat2|pat3)
call find to list all of the regular files
use egrep to select the ones that match the patterns in the pattern file
awk 'BEGIN { printf("(") } FNR==1 {printf("%s", $0)} FNR>1 {printf("|%s", $0)} END{printf(")\n") } ' sample.list > t.sed
find /path/to/sampleDirectories -type f | egrep -f t.sed > filelist
Using cat command as follows we can display content of multiple files on screen
cat file1 file2 file3
But if a directory contains more than 20 files, I want the content of all those files to be displayed on the screen without having to name every file on the cat command line as above.
How can I do this?
You can use the * character to match all the files in your current directory.
cat * will display the content of all the files.
If you want to display only files with .txt extension, you can use cat *.txt, or if you want to display all the files whose filenames start with "file" like your example, you can use cat file*
If it's just one level of subdirectory, use cat * */*
Otherwise,
find . -type f -exec cat {} \;
which means run the find command, to search the current directory (.) for all ordinary files (-type f). For each file found, run the application (-exec) cat, with the current file name as a parameter (the {} is a placeholder for the filename). The escaped semicolon is required to terminate the -exec clause.
I also found it useful to print each filename before printing its content:
find ./ -type f | xargs tail -n +1
It will go through all subdirectories as well.
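If some of the file names contain spaces, a variant that avoids xargs word-splitting is to let find invoke tail directly; a sketch:
# -exec ... {} + hands the file names to tail directly, so spaces in names are safe;
# tail still prints a "==> name <==" header before each file when given more than one
find . -type f -exec tail -n +1 {} +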
Have you tried this command?
grep . *
It's not suitable for large files but works for /sys or /proc, if this is what you meant to see.
You could use awk too. Let's say we need to print the content of all the text files in a directory some-directory:
awk '{print}' some-directory/*.txt
If you want to run more than just one command for every file, a for loop gives you more flexibility. For example, if you would like to print each filename and its contents:
for file in parent_dir/*.file_extension; do echo "$file"; cat "$file"; echo; done
I'm very new to Unix, and currently taking a class learning the basics of the system and its commands.
I'm looking for a single command line to list off all of the user home directories in alphabetical order from the /etc/passwd directory. This applies only to the home directories, and not the contents within them. There should be no duplicate entries. I've tried many permutations of commands such as the following:
sort -d | find /etc/passwd /home/* -type -d | uniq | less
I've tried using -path, -name, removing -type, using -prune, and changing the search pattern to things like /home/*/$, but haven't gotten good results once. At best I can get a list of my own directory (complete with every directory inside it, which is bad), and the directories of the other students on the server (without the contained directories, which is good). I just can't get it to display the /home/user directories and nothing else for my own account.
Many thanks in advance.
/etc/passwd is a file. The home directory is usually in field/column 6, where ":" is the delimiter. When you are dealing with a file format that uses a distinct delimiter character, you should use a tool that can break the data down into fields for easier manipulation. awk, cut, etc., or even the shell itself with the IFS variable set, can do the job, e.g.
awk -F":" '{print $6}' /etc/passwd | sort
cut -d":" -f6 /etc/passwd |sort
using the shell to read the file
while IFS=":" read -r a b c d e home_dir g
do
echo "$home_dir"
done < /etc/passwd | sort
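Since the question also asks for no duplicate entries, any of these can end in sort -u instead of sort, which sorts and deduplicates in one step; for example:
# unique home directories, in alphabetical order
cut -d":" -f6 /etc/passwd | sort -u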
I think the tools you want are grep, tr and awk. Grep will give you lines from the file that actually contain home directories. tr will let you break up the delimiter into spaces, which makes each line easier to parse.
Awk is just one program that would help you display the results that you want.
Good luck :)
Another hint: try ls --color=auto /etc; passwd isn't the kind of file that you think it is. Directories show up in blue.
In Unix, find is a command for finding files under one or more directories. I think you are looking for a command for finding lines within a file that match a pattern? Look into the command grep.
sed 's|\(.[^:]*\):\(.[^:]*\):\(.*\):\(.[^:]*\):\(.[^:]*\)|\4|' /etc/passwd|sort
I think all this processing could be avoided. There is a utility to list directory contents.
ls -1 /home
If you'd like the order of the sorting reversed
ls -1r /home
Granted, this lists out just the directory names and doesn't include the '/home/' prefix, but that can be added back easily enough if desired, with something like this:
ls -1 /home | (while read line; do echo "/home/$line"; done)
I used something like:
ls -l -d $(cut -d':' -f6 /etc/passwd) 2>/dev/null | sort -u
The only thing I didn't do is sort alphabetically; I haven't figured that out yet.
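One way to get the alphabetical order, as a sketch: drop -l so that only the paths themselves are printed, one per line, and let sort -u sort them and remove duplicates (paths containing spaces would still need extra care):
# one home directory path per line, sorted alphabetically, duplicates removed
ls -1 -d $(cut -d':' -f6 /etc/passwd) 2>/dev/null | sort -u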