UNIX C Shell Scripting. Copying files and adding extension

I'm trying to write a script that copies files from one directory to another and adds a .bak extension to them. I'm having a hard time figuring out how to add the extension.
foreach file in ($argv[1]/*)
cp $file $argv[2]
end

Making a bunch of assumptions (mainly that spaces in file names are not an issue), you probably need to use the basename command. Note that csh's foreach takes the form foreach var (wordlist), with no in keyword:
foreach file ($argv[1]/*)
cp $file $argv[2]/`basename $file`.bak
end
The basename command strips the leading directory components from each pathname, so the files are copied directly into the directory named by $argv[2]. If you want to preserve directory hierarchies too, you have to work a fair bit harder.
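For reference, basename keeps only the part after the last slash (the path here is purely illustrative):
basename /home/user/samples/report.txt   # prints: report.txt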

$1 and $2 are the arguments (directories) to the script:
for f in "$1"/*
do
fname=$(basename "$f")
cp "$f" "$2/$fname.bak"
done
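Saved as, say, bakcopy.sh (the name is just an example), it would be run as:
sh bakcopy.sh /path/to/source /path/to/dest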

Related

Rename files in a directory the simplest way in a script

I want to write a script that adds a '0' at the end of the filenames that don't already have one.
This is what I wrote:
#!/bin/bash
for file in $1
do
echo $file
ls $file | grep "\0$"
if ["$?"="1"]
then
fi
done
I don't know how to target the files in a way that lets me rename them.
for file in *[!0]; do mv "$file" "${file}0"; done
For each name that does not end in 0, rename it so that it does. Note that this handles names with spaces etc. in them.
I want to give the script a directory and have it rename the files in it that do not end in 0. How can I tell the script which directory to work with?
So, make the trivial necessary changes, working with a single directory (and not rejecting the command line if more than one directory is specified; just quietly ignoring the extras):
for file in "${1:?}"/*[!0]; do mv "$file" "${file}0"; done
The "${1:?}" notation ensures that $1 is set and is not empty, generating an error message if it isn't. You could alternatively write "${1:-.}" instead; that would work on the current directory instead of a remote directory. The glob then generates the list of file names in that directory that do not end with a 0 and renames them so that they do. If you have Bash, you can use shopt -s nullglob you won't run into problems if there are no files without the 0 suffix in the directory.
You can generalize to handle any number of arguments (all supposed to be directories, defaulting to the current directory if no directory is specified):
for dir in "${#:-.}"
do
for file in "$dir"/*[!0]; do mv "$file" "${file}0"; done
done
Or (forcing directories):
for dir in "${#:-.}"
do
(cd "$dir" && for file in *[!0]; do mv "$file" "${file}0"; done)
done
This has the merit of reporting which arguments are not directories, or are inaccessible directories.
There are endless variations of this sort that could be made; some of them might even be useful.
Now, I want to do the same but, instead of files ending with '0', have the script rename files that do not end with '.0' so that they do end with '.0'. How?
This is slightly trickier because of the revised ending. Simply using *[!.][!0] is insufficient. For example, if the list of files includes 30, x.0, x0, z.9, and z1, then echo *[!.][!0] only lists z1 (omitting 30, x0 and z.9, which also do not end with .0 and so should be renamed).
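You can see this in a scratch directory (the path is illustrative):
mkdir /tmp/glob-demo && cd /tmp/glob-demo
touch 30 x.0 x0 z.9 z1
echo *[!.][!0]   # prints only: z1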
I'd probably use something like this instead:
for dir in "${#:-.}"
do
(
cd "$dir" &&
for file in *
do
case "$file" in
(*.0) : skip it;;
(*) mv "$file" "${file}0";;
esac
done
)
done
The other alternative is to list more glob patterns:
for dir in "${#:-.}"
do
(cd "$dir" && for file in *[!.][!0] *.[!0] *[!.]0; do mv "$file" "${file}0"; done)
done
Note that this rapidly gets a lot trickier if you want to look for files not ending in .00: there would be seven glob expressions (though the case variant would work equally straightforwardly), and shopt -s nullglob becomes increasingly important (or you need [ -f "$file" ] && mv "$file" "${file}.0" instead of the simpler move command).
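For the .00 ending, a case-based sketch along the same lines (untested):
for dir in "${@:-.}"
do
(
cd "$dir" &&
for file in *
do
case "$file" in
(*.00) : skip it;;
(*) mv "$file" "${file}.00";;
esac
done
)
done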

UNIX how to use the base of an input file as part of an output file

I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from the specific files that the loop grabs, then redirect that output to a file in another directory whose name will be identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam; however, it doesn't work. It says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
I would ideally like to be able to strip off the '_novoalign.bam' part of the outfile and replace it with '_out.txt'. I'm sure this will be easy for a regular unix user, but I have searched and can't find a quick answer, and I don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far; alternative suggestions are also welcome.
p.s. I don't have permission to write files to the directory containing the input folders
Below is an explanation for filenames without spaces, keeping it simple.
When you want files, not directories, you should end your for-loop glob with * and not */.
When you only want to process files ending with _novoalign.bam, you should tell unix that too.
The easiest way to replace part of the string is sed. A dollar sign anchors the match to the end of the string. The total script will be
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC*/*_novoalign.bam; do
echo Debug: Inputfile including path: ${file}
OUTPUTFILE=$(basename "$file" | sed -e 's/_novoalign\.bam$/_out.txt/')
echo Debug: Outputfile without path: ${OUTPUTFILE}
samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE}
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without path, but you will forget the syntax in one hour.
Easier to remember are basename and dirname, but you still have to do some processing.
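For completeness, the pure parameter-expansion version of the same rename (the filename is illustrative):
fullfile=/home/sub_directory1/samples/SSTC/file_1_novoalign.bam
name=${fullfile##*/}                        # strip the path: file_1_novoalign.bam
OUTPUTFILE=${name%_novoalign.bam}_out.txt   # strip the suffix: file_1_out.txt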
Note 2:
When your script first changes directory to /home/sub_directory1/samples/SSTC, you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk.
When all files have at most one underscore, you can use cut.
You might want to add some error handling. When you want the STDERR from samtools in your output file, add 2>&1.
These notes turn your script into
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC
for file in *; do
echo Debug: Inputfile: ${file}
OUTPUTFILE="$(basename $file | cut -d_ -f1)_out.txt"
echo Debug: Outputfile: ${OUTPUTFILE}
samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE} 2>&1
done

Makefile rule depend on directory content changes

Using Make, is there a nice way to depend on a directory's contents?
Essentially I have some generated code which the application code depends on. The generated code only needs to change if the contents of a directory changes, not necessarily if the files within change their content. So if a file is removed or added or renamed I need the rule to run.
My first thought is to generate a text file listing of the directory and diff that with the last listing. A change means rerun the build. I think I will have to hand the generate-and-diff part off to a bash script.
I am hoping someone in their infinite wisdom might have an easier solution.
Kudos to gjulianm, who got me on the right track. His solution works perfectly for a single directory.
To get it working recursively I did the following.
ASSET_DIRS = $(shell find ../../assets/ -type d)
ASSET_FILES = $(shell find ../../assets/ -type f -name '*')
codegen: ../../assets/ $(ASSET_DIRS) $(ASSET_FILES)
generate-my-code
It appears now any changes to the directory or files (add, delete, rename, modify) will cause this rule to run. There is likely some issue with file names here (spaces might cause issues).
Let's say your directory is called dir, then this makefile will do what you want:
FILES = $(wildcard dir/*)
codegen: dir # Add $(FILES) here if you want the rule to run on file changes too.
generate-my-code
As the comment says, you can also add the FILES variable if you want the code to depend on file contents too.
A disadvantage of having the rule depend on a directory is that any change to that directory will cause the rule to be out-of-date — including creating generated files in that directory. So unless you segregate source and target files into different directories, the rule will trigger on every make.
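One way around that is to segregate the generated output into its own directory; a sketch (the directory names are hypothetical, and generate-my-code stands in for the real generator):
# replace indentation with tabs if copy-pasting
SRC_DIRS := $(shell find assets -type d)
SRC_FILES := $(shell find assets -type f)
gen/output.c: $(SRC_DIRS) $(SRC_FILES)
    mkdir -p gen
    generate-my-code > $@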
Here is an alternative approach that allows you to specify a subset of files for which additions, deletions, and changes are relevant. Suppose for example that only *.foo files are relevant.
# replace indentation with tabs if copy-pasting
.PHONY: codegen
codegen:
find . -name '*.foo' |sort >.filelist.new
diff .filelist.current .filelist.new || cp -f .filelist.new .filelist.current
rm -f .filelist.new
$(MAKE) generate
generate: .filelist.current $(shell cat .filelist.current)
generate-my-code
.PHONY: clean
clean:
rm -f .filelist.*
The second line in the codegen rule ensures that .filelist.current is only modified when the list of relevant files changes, avoiding false-positive triggering of the generate rule.

How to programmatically add files to an existing tar file

I have one process that creates a tar based on some existing files, then I want another process to take that tar file and add MORE files to it.
How is this accomplished programmatically?
There are no folders as such in a tarfile. Each file can have a path, so a tarfile might contain
/some/path/foo
/some/path/bar
/another/path/baz
If you have a file /elsewhere/quartz which you wish to add to the tarfile as /some/path/quartz, this will do it:
tar -rvf tarfilename --transform 's,.*/,/some/path/,' /elsewhere/quartz
(This will work in GNU tar, I can't make promises about other versions.)
The stuff inside the single quotes is a regular expression substitution command: roughly, "take everything up to and including the last slash (matching as much as possible) and replace it with /some/path/".
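You can verify the stored name afterwards by listing the archive:
tar -tvf tarfilename | grep quartz   # should show /some/path/quartz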

Can I symlink multiple directories into one?

I have a feeling that I already know the answer to this one, but I thought I'd check.
I have a number of different folders:
images_a/
images_b/
images_c/
Can I create some sort of symlink such that this new directory has the contents of all those directories? That is, this new "images_all" would contain all the files in images_a, images_b and images_c?
No. You would have to symbolically link all the individual files.
What you could do is create a job to run periodically which basically removes all of the existing symbolic links in images_all, then re-creates the links for all files from the three other directories. It's a bit of a kludge, something like this:
rm -f images_all/*
for i in images_[abc]/* ; do ln -s "../$i" "images_all/$(basename "$i")" ; done   # link target is relative to images_all
Note that, while this job is running, it may appear to other processes that the files have temporarily disappeared.
You will also need to watch out for the case where a single file name exists in two or more of the directories.
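A quick way to spot such name collisions before linking:
for i in images_[abc]/* ; do basename "$i" ; done | sort | uniq -d   # prints any duplicated names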
Having come back to this question after a while, it also occurs to me that you can minimise the time during which the files are not available.
If you create the links in a different directory and then do two relatively fast mv operations, that would minimise the time. Something like:
mkdir images_new
for i in images_[abc]/* ; do
ln -s "../$i" "images_new/$(basename "$i")"
done
# These next two commands are the minimal-time switchover.
mv images_all images_old
mv images_new images_all
rm -rf images_old
I haven't tested that so anyone implementing it will have to confirm the suitability or otherwise.
You could try a unioning file system like unionfs!
http://www.filesystems.org/project-unionfs.html
http://aufs.sourceforge.net/
To add on to paxdiablo's great answer, I think you could use cp -s
(-s or --symbolic-link)
which makes symbolic links instead of literal copies,
to speed up or simplify the bulk adding of symlinks to the "merge" folder A of the files from folders B and C.
(I have not tested this though.)
cp also has a -n (--no-clobber) option that refuses to overwrite existing files, so only symlinks for new files will be "cp -s"ed.
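A sketch (untested; GNU cp, which generally wants absolute source paths for -s when the target is a different directory, hence $PWD):
cp -sn "$PWD"/images_[abc]/* images_all/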
