Loop through folders inside a zip file using a unix shell script

My zip file contains several folders. After unzipping it, I want to loop over the folders that were extracted.
The condition inside the loop is as follows:
if a folder contains an index file (a file holding some data), only then do I want to run some process (I know what that process is); otherwise the folder can be ignored.
The loop should then continue with the remaining folders, if there are any.
Thanks in advance.

Something like this?
(Note: I assume $destdir will only contain the zipfile and its extracted contents!)
zipfile="/path/to/the/zipfile.zip"
destdir="/path/to/where/you/want/to/unzip"
indexfile="index.txt"   # name of the index files

mkdir -p "$destdir" 2>/dev/null   # make "sure" it exists.. but ignore errors in case it already exists
cd "$destdir" || { echo "Can not go into destdir=$destdir" ; exit 1 ; }
# at that point, we are inside $destdir : we can start to work:
unzip "$zipfile"
for i in ./*/ ; do   # you could change ./*/ to ./*/*/ if the zip contains a master directory too
    cd "$i" && {   # the && is important: you want to be sure you could enter that subdir!
        if [ -e "./$indexfile" ]; then
            dosomething   # you can define the function dosomething and use it here..
                          # or just place commands here
        fi
        cd -   # we can safely assume this works, as we started from there
    }
done
Note: I iterate over ./*/ instead of */ because a dirname could start with a leading -, which would make cd -something fail (cd would complain about unrecognised options). With the ./ prefix this goes away: cd ./-something works.
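
If you prefer to avoid cd - altogether, a subshell per directory keeps the outer working directory untouched; a minimal sketch, assuming the same $destdir layout and $indexfile as above:

for i in ./*/ ; do
    (
        cd "$i" || exit 1        # exit only leaves the subshell; the loop goes on
        if [ -e "./$indexfile" ]; then
            dosomething          # same placeholder as in the answer above
        fi
    )                            # leaving the subshell restores the previous directory
done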

Rename files in a directory the simplest way in a script

I want to write a script that adds '0' at the end of the file names that don't have it.
This is what I wrote:
#!/bin/bash
for file in $1
do
    echo $file
    ls $file | grep "\0$"
    if ["$?"="1"]
    then
    fi
done
I don't know how to target the files in a way that lets me rename them.
for file in *[!0]; do mv "$file" "${file}0"; done
For each name that does not end in 0, rename it so that it does. Note that this handles names with spaces etc. in them.
I want to give the script a directory, and it will rename the files in it that do not end in 0. How can I adapt this so that I can tell the script which directory to work with?
So, make the trivial necessary changes, working with a single directory (and not rejecting the command line if more than one directory is specified; just quietly ignoring the extras):
for file in "${1:?}"/*[!0]; do mv "$file" "${file}0"; done
The "${1:?}" notation ensures that $1 is set and is not empty, generating an error message if it isn't. You could alternatively write "${1:-.}"; that would fall back to the current directory when no directory argument is given. The glob then generates the list of file names in that directory that do not end with a 0 and renames them so that they do. If you have Bash, you can use shopt -s nullglob so that you won't run into problems if there are no files without the 0 suffix in the directory.
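For completeness, a minimal sketch that combines the argument check with nullglob (Bash only):

#!/bin/bash
shopt -s nullglob                # an empty match expands to nothing instead of the literal glob
for file in "${1:?directory argument required}"/*[!0]
do
    mv "$file" "${file}0"
done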
You can generalize to handle any number of arguments (all supposed to be directories, defaulting to the current directory if no directory is specified):
for dir in "${@:-.}"
do
    for file in "$dir"/*[!0]; do mv "$file" "${file}0"; done
done
Or (forcing directories):
for dir in "${@:-.}"
do
    (cd "$dir" && for file in *[!0]; do mv "$file" "${file}0"; done)
done
This has the merit of reporting which arguments are not directories, or are inaccessible directories.
There are endless variations of this sort that could be made; some of them might even be useful.
Now I want to do the same, but instead of the file ending with '0', the script should rename files that do not end with '.0' so that they do end with '.0'. How can I do that?
This is slightly trickier because of the revised ending. Simply using *[!.][!0] is insufficient. For example, if the list of files includes 30, x.0, x0, z.9, and z1, then echo *[!.][!0] lists only z1, omitting 30, x0 and z.9 even though they too do not end with .0 and should be renamed.
I'd probably use something like this instead:
for dir in "${@:-.}"
do
    (
    cd "$dir" &&
    for file in *
    do
        case "$file" in
        (*.0) : skip it;;
        (*)   mv "$file" "${file}.0";;
        esac
    done
    )
done
The other alternative lists more glob patterns:
for dir in "${@:-.}"
do
    (cd "$dir" && for file in *[!.][!0] *.[!0] *[!.]0; do mv "$file" "${file}.0"; done)
done
Note that this rapidly gets a lot trickier if you want to look for files not ending .00: there would be seven glob expressions to write (though the case variant would work equally straightforwardly), and shopt -s nullglob becomes increasingly important (or you need [ -f "$file" ] && mv "$file" "${file}.00" instead of the simpler move command).
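
For illustration, a sketch of the case variant adapted to the .00 goal (note it naively appends the whole .00 suffix; treating a name that already ends in .0 specially would need an extra branch):

for dir in "${@:-.}"
do
    (
    cd "$dir" &&
    for file in *
    do
        case "$file" in
        (*.00) : skip it;;
        (*)    mv "$file" "${file}.00";;
        esac
    done
    )
done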

UNIX how to use the base name of an input file as part of an output file name

I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from the specific files that the loop grabs, and then pipe that output to a file in another directory whose name will be identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
    samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam. However, it doesn't work: it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
I would ideally like to strip off the '_novoalign.bam' part of the output file name and replace it with '_out.txt'. I'm sure this will be easy for a regular unix user, but I have searched and can't find a quick answer and don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far; alternative suggestions are also welcome.
p.s. I don't have permission to write files to the directory containing the input folders
Below is an explanation for filenames without spaces, keeping it simple.
When you want files, not directories, you should end your for-loop glob with * and not */.
When you only want to process files ending with _novoalign.bam, you should tell the shell so.
The easiest way to replace part of a string is sed.
A dollar sign anchors the pattern to the end of the string. The complete script becomes:
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC/*_novoalign.bam; do
    echo Debug: Inputfile including path: ${file}
    OUTPUTFILE=$(basename $file | sed -e 's/_novoalign.bam$/_out.txt/')
    echo Debug: Outputfile without path: ${OUTPUTFILE}
    samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE}
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without its path, but you will have forgotten the syntax in an hour.
basename and dirname are easier to remember, but you still need to do some processing; see the short comparison below.
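A quick side-by-side of the two approaches, using a path taken from the question:

fullfile=/home/sub_directory1/samples/SSTC/file_1_novoalign.bam   # example path
echo "${fullfile##*/}"   # parameter expansion -> file_1_novoalign.bam
echo "${fullfile%/*}"    # parameter expansion -> /home/sub_directory1/samples/SSTC
basename "$fullfile"     # same filename, easier to remember
dirname "$fullfile"      # same directory, easier to remember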
Note 2:
When your script first changes directory to /home/sub_directory1/samples/SSTC (as below), you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk.
When all files have at most one underscore, you can use cut.
You might want to add some error handling. When you want the STDERR from samtools in your output file as well, add 2>&1.
These changes turn your script into:
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC
for file in *; do
    echo Debug: Inputfile: ${file}
    OUTPUTFILE="$(basename $file | cut -d_ -f1)_out.txt"
    echo Debug: Outputfile: ${OUTPUTFILE}
    samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE} 2>&1
done

unix: compare two directories; if a directory exists only in directory 1, then do something

I have two directories, and I would like to do something based on the result of comparing them.
Below is my script:
#!/bin/sh
# the script below doesn't work if the line above says bash
for i in $(\ls -d /data_vis/upload/);
do
    diff ${i} /data_vis/upload1/;
done
The output from the above script is
Common subdirectories: /data_vis/upload/2012_06 and /data_vis/upload1/2012_06
Common subdirectories: /data_vis/upload/2012_07 and /data_vis/upload1/2012_07
Only in data_vis/upload/: 2012_08
Only in /data_vis/upload/: 2012_09
Only in /data_vis/upload/: index.php
Only in /data_vis/upload/: index.php~
Question:
How can I use this output to do something? E.g. see below.
Pseudocode:
    if Only in /data_vis/upload/: 2012_08   # e.g. the directory exists only in the upload directory
    then do something
    else
    do something else
    Finish
Any comments or better solutions/commands welcome!
I understand that you want to parse the output of diff.
First, your outermost for-loop is not necessary, since the ls operation returns only one item. The task could be done as follows:
#!/bin/sh
diff data_vis/upload/ data_vis/upload1/ | while read line
do
    if echo $line | grep "Only in" >/dev/null; then
        # parse the name of the directory in which the unmatched entry is located
        dironlyin=$(echo $line | awk -F ":" '{split($1,f," "); print f[3];}')
        # parse the name of the unmatched entry (strip the leading space)
        fileonlyin=$(echo $line | awk -F ":" '{print substr($2,2);}')
        # prove that the parsing worked correctly
        echo "do something with file \""$fileonlyin"\" in dir \""$dironlyin"\""
    else
        # do your own parsing here if needed
        echo "do something else with "\"$line\"
    fi
done
You need to do the parsing of the lines starting with "Common subdirectories" yourself. I hope the awk mini-scripts can help you do it!
Cheers
Jörg
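
For what it's worth, here is a sketch that avoids parsing diff output altogether and simply tests each entry of the first directory against the second (same paths as in the question):

#!/bin/sh
for entry in /data_vis/upload/*; do
    name=$(basename "$entry")
    if [ ! -e "/data_vis/upload1/$name" ]; then
        echo "only in upload/: $name"    # exists only in upload/ -- do something
    else
        echo "common: $name"             # exists in both -- do something else
    fi
done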

tcsh creating backup files

I'm trying to write a script that backs up a file given as a parameter, in such a way that a running number is added to each copy of the file. For example, if the name of the original file is aa.c, then the first backup copy will be called aa.1.c. The next time the backup is run, the copy should be called aa.2.c, then aa.3.c, and so on. In addition, the script should automatically find the copy with the highest number and use it to create the new number.
Does anyone know how I can do that with a foreach loop?
#!/usr/bin/env tcsh
foreach file ($*:q)
    @ numb = 1
    while (-e $file:r.$numb.$file:e)
        @ numb++
    end
    cp -p $file $file:r.$numb.$file:e
end
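
A quick usage sketch, assuming the script above is saved as backup.tcsh (hypothetical name) and made executable:

./backup.tcsh aa.c        # first run creates aa.1.c
./backup.tcsh aa.c        # second run creates aa.2.c
./backup.tcsh aa.c bb.h   # several arguments work too: creates aa.3.c and bb.1.h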

Batch renaming / moving / hashing of files

I have a highly structured hierarchical directory containing multiple files that need to be moved into a flat structure and renamed at the same time. The original path and name must be logged along with the new path and name, and eventually loaded into a database. Finally, each renamed file must get a unique, unguessable (i.e. encrypted or hashed) file name. When the renamed files are moved into the new directory structure, I also want to limit the number of files in each directory: each directory would be created with a sequential number for its name, and files would be loaded into it until a maximum count is reached (e.g. 255), before rolling over into a new directory with the next sequential number as its name.
Is there a tool or piece of software that does this? I did some initial research and nothing came up meeting the following criteria:
batch rename & copy into alternative (flatter) structure
hash / encrypt filename and ensure uniqueness
sequentially name folders and limit file count
log each file's original name and path, and new (encrypted) name and path
I have several Bash scripts I have used in the past to migrate hand-made file repositories to hashed repositories, to be accessed and managed from a web application (mostly PHP apps). In these repositories filenames are hashed (to avoid collisions between files with the same content/name) and files are distributed evenly (deterministically or randomly) to keep the files-per-directory count low for performance reasons. The following is one fully working example:
#!/bin/bash

MAXFILESPERDIR=500
TARGETROOTDIR="./newrepository"
RANDOMDISTRIBUTION=1

if [ -d "$1" ]; then
    LOGFILE=$(basename $0).$(date +"_%Y%m%d_%H%M").${$}.log
    SQLFILE=$(basename $0).$(date +"_%Y%m%d_%H%M").${$}.sql
    SOURCEDIR="$1"
    TOTALSOURCEFILES=$(find "$1" -type f | wc -l)
    let "TOTALTARGETDIRS=$TOTALSOURCEFILES / $MAXFILESPERDIR"
    [ "$TOTALTARGETDIRS" -lt 1 ] && TOTALTARGETDIRS=1   # avoid division by zero for small source trees
    PADLENTARGETDIRS=${#TOTALTARGETDIRS}
    PADLENTARGETFILE=${#TOTALSOURCEFILES}
    echo "We will create $TOTALTARGETDIRS directories to hold $MAXFILESPERDIR files per directory."
    if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
        echo "We will rename and distribute each file randomly."
    else
        echo "We will rename and distribute each file uniformly."
    fi
    echo "Do you want to continue?"
    select choice in yes no ; do
        if [ "$choice" == "yes" ] ; then
            COUNTER=1
            find "$1" -type f | while read SOURCEFILE ; do {
                CHECKSUMFILE=$(sha1sum "$SOURCEFILE" | cut -d " " -f 1)
                CHECKSUMNAME=$(echo "$SOURCEFILE" | sha1sum | cut -d " " -f 1)
                DETERMINISTICNONCE=$(printf "%0${PADLENTARGETFILE}d\n" $COUNTER)
                if [ "$RANDOMDISTRIBUTION" == "1" ] ; then
                    PROBABILISTICNONCE=$(let "XX=$RANDOM % $TOTALTARGETDIRS + 1" ; printf "%0${PADLENTARGETDIRS}d\n" $XX;)
                else
                    PROBABILISTICNONCE=$(let "XX=$COUNTER % $TOTALTARGETDIRS + 1" ; printf "%0${PADLENTARGETDIRS}d\n" $XX;)
                fi
                FILEDATE=$(stat -c %z "$SOURCEFILE" | cut -d "." -f 1)
                FILESIZE=$(stat -c %s "$SOURCEFILE")
                echo "Source file $SOURCEFILE" >> $LOGFILE
                echo "Target file $TARGETROOTDIR/$PROBABILISTICNONCE/$PROBABILISTICNONCE$CHECKSUMFILE$DETERMINISTICNONCE" >> $LOGFILE
                echo "INSERT INTO files (Filename, Location, Checksum, CDate, Size) VALUES ('$PROBABILISTICNONCE$CHECKSUMFILE$DETERMINISTICNONCE', '$PROBABILISTICNONCE', '$CHECKSUMFILE', '$FILEDATE', $FILESIZE);" >> $SQLFILE
                mkdir -p $TARGETROOTDIR/$PROBABILISTICNONCE
                cp -v "$SOURCEFILE" $TARGETROOTDIR/$PROBABILISTICNONCE/$PROBABILISTICNONCE$CHECKSUMFILE$DETERMINISTICNONCE
                let "COUNTER+=1"
            } ; done
            echo "Done."
            echo
            break
        fi
        if [ "$choice" == "no" ] ; then
            echo
            echo "Operation cancelled"
            echo
            break
        fi
    done
else
    echo
    echo "Missing source directory"
    echo
fi
Just run it from the root of your new repository. You can configure it by modifying the first variables: MAXFILESPERDIR defines how many files to store per directory, TARGETROOTDIR is the name of the root directory to create (the layout uses only two levels, and the first one is really a single root), and RANDOMDISTRIBUTION defines whether the files will be distributed randomly (it may look uneven, especially for small runs) or deterministically (just counting).
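A usage sketch, assuming the script is saved as migrate.sh and the old repository lives in ./oldrepository (both names hypothetical):

# edit the variables at the top first, e.g. to match the 255-files-per-directory requirement:
#   MAXFILESPERDIR=255
#   TARGETROOTDIR="./newrepository"
#   RANDOMDISTRIBUTION=0           # deterministic distribution
chmod +x migrate.sh
./migrate.sh ./oldrepository       # answer "yes" at the prompt to start copying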
How it works (FYI, just in case this is not what you are looking for but maybe you can get some ideas):
Count the source files.
Calculate how many target directories will be created.
Ask for confirmation.
For each file:
Calculate the SHA1 hash for the file content.
Create a deterministic nonce.
Create a probabilistic nonce (if RANDOMDISTRIBUTION is 1, otherwise just a counter).
Get the size and modification date.
Combine the random value with the hash and the counter to get the new file name (the path will be the random value).
Log the source and target full paths.
Create and log a SQL insert query.
Create the target directory (if it does not exist).
Copy the file. (You can move it if you want but I'm playing safe).
Finish
If you set RANDOMDISTRIBUTION to 1 and run the script several times, you'll get duplicates of your source files, as each file will get a different target filename/path on every run. If RANDOMDISTRIBUTION is set to something else, every time you run the script the files will be renamed the same way (for the same file set; if you add or remove files, they will get different names/paths).
The objective of using a random value + hash + counter is to be sure we can handle duplicates (they won't collide, thanks to the counter) while still distributing the files randomly (for long enough runs, this will distribute the files evenly).
Also, the prefix of the generated file name is the name of its directory, so if you have the file name and the length of the directory names, you can compute the directory name (in case you don't store it in your database table).
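A minimal sketch of that calculation, with hypothetical values (a padding length of 2 and a made-up generated name):

TARGETROOTDIR="./newrepository"
PADLENTARGETDIRS=2                                       # length of the zero-padded directory names
name="07da39a3ee5e6b4b0d3255bfef95601890afd807090001"    # hypothetical: nonce + sha1 + counter
dir="${name:0:$PADLENTARGETDIRS}"                        # the prefix is the directory name: 07
echo "$TARGETROOTDIR/$dir/$name"                         # ./newrepository/07/07da39a3...0001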
Finally, this is a one-time migration script; it was not really written to be executed regularly over the same set of files.
