UNIX: how to use the base of an input file as part of an output file

I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from each file the loop grabs, and then pipe that output to a file in another directory whose name is identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and send that output to /home/sub_directory_2/level_2/ as a file called file_1_novoalign_out.bam. However, it doesn't work; it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
Ideally I would like to strip off the '_novoalign.bam' part of the output filename and replace it with '_out.txt'. I'm sure this will be easy for a regular unix user, but I have searched and can't find a quick answer, and I don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far; alternative approaches are also welcome.
p.s. I don't have permission to write files to the directory containing the input folders

Below is an explanation for filenames without spaces, keeping it simple.
When you want files, not directories, you should end your for-loop glob with * and not */.
When you only want to process files ending with _novoalign.bam, say so in the glob.
The easiest way to replace part of the filename is sed.
The dollar sign anchors the pattern to the end of the string. The full script will be:
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC/*_novoalign.bam; do
  echo "Debug: input file including path: ${file}"
  OUTPUTFILE=$(basename "${file}" | sed -e 's/_novoalign\.bam$/_out.txt/')
  echo "Debug: output file without path: ${OUTPUTFILE}"
  samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}"
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without the path, but you will forget the syntax within the hour.
Easier to remember are basename and dirname, but you still have to do some processing.
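For completeness, here is a minimal sketch of the parameter-expansion variant of the script above (same paths and behaviour as the sed version; only the filename handling changes):
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC/*_novoalign.bam; do
  name=${file##*/}   # strip the leading path
  # ${name%_novoalign.bam} strips the suffix, so the output becomes <base>_out.txt
  samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${name%_novoalign.bam}_out.txt"
done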
Note 2:
When your script first changes directory to /home/sub_directory1/samples/SSTC, the loop variable holds bare filenames and you can skip the basename call.
When all the files in the directory are to be processed, you can use a plain asterisk.
When all files have at most one underscore before the part to strip, you can use cut.
You might want to add some error handling, and when you want the STDERR from samtools in your output file, add 2>&1.
These will turn your script into
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC
for file in *; do
  echo "Debug: input file: ${file}"
  OUTPUTFILE="$(basename "${file}" | cut -d_ -f1)_out.txt"
  echo "Debug: output file: ${OUTPUTFILE}"
  samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}" 2>&1
done

Related

How can I invoke the name of a file associated with a batch process?

I'm trying to batch process a folder full of text files with pandoc, and I'd like to maintain the current filenames. How do I call the filename as a variable in the output? For example, I want to write a command like this:
pandoc -s notes/*.txt -o rtf/$1.rtf
Where $1 represents the filename grabbed with the * character.
I'm sure this is a simple question, but I don't quite know the right language to search for it properly.
Thanks for any help!
Try
for file in notes/*.txt
do
  file_base_name=$(basename "${file}" | cut -d'.' -f1)
  pandoc -s "$file" -o "rtf/${file_base_name}.rtf"
done
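If the filenames can contain more than one dot, basename with a suffix argument is a safer way to strip just the trailing .txt (a small variation on the loop above):
for file in notes/*.txt
do
  file_base_name=$(basename "${file}" .txt)   # strips only the trailing .txt
  pandoc -s "$file" -o "rtf/${file_base_name}.rtf"
done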

Trim a file name in Unix

I have a file with the name
ROCKET_25_08:00.csv
I want to trim the name of the file to
ROCKET_25_.csv
I tried mv, but mv is not what I need because there may be more than one such file.
I want the name up to the second _.
How can I get that in unix?
Please advise.
There are some utilities that provide more flexible renaming, but one solution that uses nothing but standard UNIX tools (like sed) would be:
ls -d * | sed -re 's/^([^_]*_[^_]*_)(.*)(\....)$/mv -v \1\2\3 \1\3/' | bash
This will only work in one directory; it won't process subdirectories.
It's not at all clear what you are actually trying to do, but if you just want to remove text between the last underscore and the period, you can do:
f=ROCKET_25_08:00.csv
echo ${f%_*}_.csv
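To actually rename every matching file, a minimal loop sketch (assuming the files sit in the current directory and each has at least two underscores in its name):
for f in *_*_*.csv; do
  mv -v "$f" "${f%_*}_.csv"   # drop everything after the last underscore
done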

Is there any way to extract only one file (or a regular expression) from tar file

I have a tar.gz file.
Because of space issues and the long time required to extract everything, I need to extract only selected files.
I have tried the below to find the files I need:
grep -l '<text>' *
file1
file2
Only file1 and file2 should be extracted.
You can do this with the --extract option like this:
tar --extract --file=test.tar.gz main.c
Here, --file specifies the archive name, and at the end you specify the member filename you want to extract.
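If you need several files, you can list them all, and GNU tar can also select members by shell wildcard. A sketch (assuming GNU tar; file1 and file2 stand for the names found by grep above):
tar --extract --file=test.tar.gz file1 file2
tar --extract --file=test.tar.gz --wildcards 'file*'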

How to display contents of all files under a directory on the screen using unix commands

Using the cat command as follows, we can display the contents of multiple files on the screen:
cat file1 file2 file3
But if a directory contains more than 20 files and I want the contents of all of them on the screen, I don't want to name every file as above.
How can I do this?
You can use the * character to match all the files in your current directory.
cat * will display the content of all the files.
If you want to display only files with the .txt extension, you can use cat *.txt. If you want to display all files whose names start with "file", as in your example, you can use cat file*
If it's just one level of subdirectory, use cat * */*
Otherwise,
find . -type f -exec cat {} \;
which means run the find command, to search the current directory (.) for all ordinary files (-type f). For each file found, run the application (-exec) cat, with the current file name as a parameter (the {} is a placeholder for the filename). The escaped semicolon is required to terminate the -exec clause.
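If your find supports it (POSIX does), terminating -exec with + instead of \; passes many filenames to a single cat invocation, which is faster on large trees:
find . -type f -exec cat {} +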
I also found it useful to print the filename before the content of each file:
find . -type f -print0 | xargs -0 tail -n +1
It will go through all subdirectories as well; the -print0/-0 pair keeps filenames with spaces intact, and tail prints a ==> filename <== header before each file when given more than one.
Have you tried this command?
grep . *
It's not suitable for large files but works for /sys or /proc, if this is what you meant to see.
You could use awk too. Let's say we need to print the content of all the text files in a directory some-directory:
awk '{print}' some-directory/*.txt
If you want to run more than just one command for every file, a for loop is more flexible. For example, to print each filename followed by its contents:
for file in parent_dir/*.file_extension; do echo "$file"; cat "$file"; echo; done

Unix: prepending a file without a dummy-file?

I do not want:
$ cat file > dummy; $ cat header dummy > file
I want something similar to the command below, but writing to the beginning, not the end:
$ cat header >> file
You can't append to the beginning of a file without rewriting the file. The first way you gave is the correct way to do this.
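If mktemp is available, a slightly safer variant of that temp-file approach avoids leaving a half-written file behind (a minimal sketch, not specific to any answer here):
tmp=$(mktemp) &&
cat header file > "$tmp" &&
mv "$tmp" file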
This is easy to do with sed if you can embed the header string directly in the command:
$ sed -i "1iheader1,header2,header3" file
Or if you really want to read it from a file, you can do so with bash's help:
$ sed -i "1i$(<header)" file
BEWARE that -i overwrites the input file with the results. If you want sed to make a backup, change it to -i.bak or similar, and of course always test first with sample data in a temp directory to be sure you understand what is going to happen before you apply it to your real data.
The whole dummy-file thing is pretty annoying. Here's a one-liner solution that I just tried out which seems to work:
echo "`cat header file`" > file
The backticks make the part inside the quotes execute first, so the shell doesn't complain about the output file also being an input file. It seems related to hhh's solution but a bit shorter. If the files are really large this might cause problems, though, because the shell has to hold the whole command substitution in memory before the original file is overwritten.
You can't prepend to a file without reading all the contents of the file and writing a new file with your prepended text + contents of the file. Think of a file in Unix as a stream of bytes - it's easy to append to an end of a stream, but there is no easy operation to "rewind" the stream and write to it. Even a seek operation to the beginning of the file will overwrite the beginning of with any data you write.
One possibility is to use a here-document:
cat > "prependedfile" << ENDENDEND
prepended line(s)
`cat "file"`
ENDENDEND
There may be a memory limitation to this trick.
Thanks to the right search term!
echo -e "include .headers.java\n$(cat fileObject.java)" > fileObject.java
(The -e is needed for bash's echo to interpret \n.) Then with a header file:
echo -e "$(cat .headers.java)\n\n$(cat fileObject.java)" > fileObject.java
If you want to prepend "header" to "file", why not append "file" to "header"?
cat file >> header
Below is a simple C shell attempt to solve this problem. This "prepend.sh" script takes two parameters:
$1 - the file containing the text to prepend.
$2 - the original/target file to be modified.
#!/bin/csh
if (-e ./tmp.txt) then
    rm ./tmp.txt
endif
cat $1 > ./tmp.txt
cat $2 >> ./tmp.txt
mv $2 $2.bak
mv ./tmp.txt $2
