Move files with certain name to folder with certain name - Unix

I have been trying for a while now; can anyone help me, please?
I want to move files with certain names, e.g.
tree.txt
apple.txt
....
To their corresponding folder
tree
apple
I tried this but it takes too much time to do it individually:
mv *tree* destination_directory/tree
because then I need to repeat this 200 times
mv *apple* destination_directory/apple
.....
Is there any way to make this faster?
I have a list.txt with all the file names.
Thank you so much,
Bine

Assuming you have the list of .txt file names in a file called "filewithtxts", you can read that file in a while loop and process each entry:
while IFS= read -r file
do
dir=${file%.txt} # Strip .txt from the entry to get the folder name, e.g. tree.txt -> tree
mv -- *"$dir"* "destination_directory/$dir"
done < filewithtxts
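If some target folders might not exist yet, or some list entries might match nothing, a slightly more defensive variant of the loop could look like the sketch below. The function name move_by_list and its two arguments are made up for illustration, and it assumes every list entry ends in .txt and that the files sit in the current directory.

```shell
# move_by_list LIST DEST (hypothetical helper): for each entry NAME.txt in
# LIST, move regular files in the current directory whose names contain NAME
# into DEST/NAME, creating that folder if needed.
move_by_list() {
    list=$1
    dest=$2
    while IFS= read -r entry; do
        name=${entry%.txt}              # "tree.txt" -> "tree"
        [ -n "$name" ] || continue      # skip blank lines
        mkdir -p "$dest/$name"
        for f in *"$name"*; do
            [ -f "$f" ] || continue     # skips directories and an unmatched glob
            mv -- "$f" "$dest/$name/"
        done
    done < "$list"
}
```

It would be called as: move_by_list list.txt destination_directory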

Related

Is there a way to make Unix diff -r compare only differences in filenames, but not check if any single file actually differs?

I need to compare two large directories with a lot of files in them. I tried using:
diff -r Directory1 Directory2
but the process is really slow due to the amount of files and their huge size.
So I thought about making the process faster by just comparing the content of the folders and not the actual content of the files.
Is there a way to make diff recursively check only if every subdirectory of Directory1 and Directory2 match in name and file content, but not check if every single file in Directory1 actually matches every single file in Directory2?
For example, let's say I have Directory1/SubDirectory1 and Directory2/Subdirectory1.
I want to check only if Directory1/SubDirectory1.1 and Directory2/Subdirectory2.1 have the same number of files with the same filenames (let's say, file1, file2, ... fileN), but I don't care about matching every file1, file2 ... fileN of Directory1/SubDirectory1.1 to every file1, file2 ... fileN of SubDirectory2.1 to see if their content is actually the same.
Is there a way of doing this?
Edit:
I tried using:
diff <(find path1) <(find path2)
but unfortunately, diff outputs the full path for each file. The output I get is thus:
< /Volume1/.../.../Directory1/SubDirectory1.1/file1
< /Volume1/.../.../Directory1/SubDirectory1.1/file2
...
> /Volume2/.../.../Directory2/SubDirectory2.1/file1
> /Volume2/.../.../Directory2/SubDirectory2.1/file2
...
Here every single filename clearly differs, because the full paths differ.
Is there a way to force find to output paths only starting from the directory you give as argument? For example:
find -(some option I'm not aware of) /Volume1/.../.../Directory1
outputs:
/Directory1/SubDirectory1.1/file1
/Directory1/SubDirectory1.1/file2
...
A simple way:
cd /.../Directory1
find . | sort >/tmp/dir1.lst
cd /.../Directory2
find . | sort >/tmp/dir2.lst
diff /tmp/dir1.lst /tmp/dir2.lst
It will fail if your filenames contain newlines, but in many cases that isn't a concern.
If scripting this, make sure to use auto-generated temp file names, e.g. with mktemp(1), to avoid symlink attacks and other problems.
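A minimal sketch of the scripted version with mktemp(1), assuming both arguments are existing directories. The function name compare_names is made up for illustration; it returns diff's exit status (0 when both trees contain the same names).

```shell
# compare_names DIR1 DIR2 (hypothetical helper): diff the sorted relative
# path listings of two trees, using auto-generated temp files.
compare_names() {
    list1=$(mktemp) || return 2
    list2=$(mktemp) || { rm -f "$list1"; return 2; }
    # The subshells keep the cd from affecting the caller.
    (cd "$1" && find . | sort) > "$list1"
    (cd "$2" && find . | sort) > "$list2"
    diff "$list1" "$list2"
    status=$?
    rm -f "$list1" "$list2"            # clean up before returning
    return "$status"
}
```

Only names are compared here, so files with the same name but different content are reported as equal.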
Nate Eldredge, thank you for your answer!
However, I was able to solve my problem creating a script named fast_diff.sh, with just a line of code, as follows:
diff <(find "$1" | sed "s|$1\/||g" | sort) <(find "$2" | sed "s|$2\/||g" | sort)
The script takes two arguments, let's say path1 and path2:
./fast_diff.sh /Volume1/.../.../Directory1 /Volume2/.../.../Directory2
Now the variable $1 is equal to "/Volume1/.../.../Directory1" and the variable $2 is equal to "/Volume2/.../.../Directory2".
The command find gives as output something like:
/Volume1/.../.../Directory1/SubDirectory1.1/file1
/Volume1/.../.../Directory1/SubDirectory1.1/file2
...
Now I pipe this output to sed, using:
sed "s|$1||g"
which replaces every occurrence of "/Volume1/.../.../Directory1" with nothing. I used | as a separator instead of / because there are many occurrences of / in the directory path.
Employing the previous line of code, though, lists all subdirectories and files starting with a slash:
/SubDirectory1.1/file1
/SubDirectory1.1/file2
...
To remove the slash, I added \/:
sed "s|$1\/||g"
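One caveat with the sed approach: $1 is interpreted as a regular expression, so a dot or other special character in the directory path can match more than intended. A possible alternative, sketched below, removes the prefix by length instead. The function name list_relative is made up for illustration, and the sketch assumes the path has at most one trailing slash and contains no newlines.

```shell
# list_relative DIR (hypothetical helper): print every path under DIR
# relative to DIR, sorted. The prefix is cut by length, not by pattern,
# so regex metacharacters in DIR do no harm.
list_relative() {
    root=${1%/}                        # drop one trailing slash, if any
    find "$root" |
        awk -v n="${#root}" 'length($0) > n { print substr($0, n + 2) }' |
        sort
}
```

In bash the two listings can then be compared with: diff <(list_relative "$1") <(list_relative "$2")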

Combining two big files from specific lines in Unix

I have two large files that I want to combine in one file and gzip it as well. However for the second file I want to exclude the first two lines. How can I do it? What I have done so far is:
awk 'FNR>2' /application/psmcHard_0.msOut.gz /JPT/psmcHard_0.msOut.gz > /all_data/psmcHard_0.msOut
Do you think this is the right way to do it? And how can I gzip the file?
Your input files are gzip-compressed ('.gz'), so awk cannot process them directly. You have to decompress the data you want to filter and recompress the result. Because concatenated gzip streams form a valid gzip file, the first file can be passed through as-is:
(
# Pass first file as-is (no need to unzip/rezip)
cat /application/psmcHard_0.msOut.gz
# Unzip second file, filter required lines, and re-zip
zcat /JPT/psmcHard_0.msOut.gz | awk 'FNR > 2' | gzip
) > /all_data/psmcHard_0.msOut.gz
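The same idea can be wrapped in a small function and checked on toy data. This is only a sketch: the name append_skipping_header and its three arguments are made up for illustration, and gzip -dc is used in place of zcat because some systems ship a zcat that expects .Z files.

```shell
# append_skipping_header FIRST.gz SECOND.gz OUT.gz (hypothetical helper):
# write FIRST.gz unchanged, followed by SECOND.gz without its first two
# lines, as a single multi-member gzip file.
append_skipping_header() {
    {
        cat "$1"                               # first file, still compressed
        gzip -dc "$2" | awk 'NR > 2' | gzip    # second file, minus two lines
    } > "$3"
}
```

Decompressing the result with gzip -dc OUT.gz yields the lines of both parts in order.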

UNIX how to use the base of an input file as part of an output file

I use UNIX fairly infrequently so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, then generate an output from the specific files that the loop grabs, then pipe an output to a file in another directory whose name will be identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam however it doesn't work - it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
I would ideally like to be able to strip off the '_novoalign.bam' part of the outfile and replace with '_out.txt'. I'm sure this will be easy for a regular unix user but I have searched and can't find a quick answer and don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far or any alternate suggestions are welcome.
p.s. I don't have permission to write files to the directory containing the input folders
Below is an explanation that assumes filenames without spaces, to keep things simple.
When you want files, not directories, end the glob in your for-loop with * and not */.
When you only want to process files ending with _novoalign.bam, say so in the glob.
The easiest way to replace part of the string is with sed.
In the sed pattern, a dollar sign matches the end of the string. The complete script is:
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC/*_novoalign.bam; do
echo "Debug: Inputfile including path: ${file}"
# The dot is escaped so it only matches a literal "."
OUTPUTFILE=$(basename "$file" | sed -e 's/_novoalign\.bam$/_out.txt/')
echo "Debug: Outputfile without path: ${OUTPUTFILE}"
samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}"
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without path, but you will forget the syntax in one hour.
Easier to remember are basename and dirname, but you still have to do some processing.
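For reference, the parameter-expansion route would look roughly like this. The function name output_name is made up for illustration, and the sketch assumes every input name ends in _novoalign.bam.

```shell
# output_name PATH (hypothetical helper): map ".../file_1_novoalign.bam"
# to "file_1_out.txt" using only shell parameter expansion.
output_name() {
    name=${1##*/}                                    # strip the directory part
    printf '%s\n' "${name%_novoalign.bam}_out.txt"   # swap the suffix
}
```

Here ## removes the longest matching prefix (everything up to the last slash) and % removes the shortest matching suffix.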
Note 2:
When your script first changes the directory to /home/sub_directory_2/level_2 you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk.
When all files have at most one underscore, you can use cut.
You might want to add some error handling. When you want the STDERR from samtools in your outputfile, add 2>&1.
These will turn your script into
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC || exit 1
for file in *; do
echo "Debug: Inputfile: ${file}"
# No basename needed: after the cd, ${file} has no directory part
OUTPUTFILE="$(echo "$file" | cut -d_ -f1)_out.txt"
echo "Debug: Outputfile: ${OUTPUTFILE}"
samtools depth -r chr9:218026635-21994999 < "${file}" > "${OUTDIR}/${OUTPUTFILE}" 2>&1
done

Converting Filename to Filename_Inode

I'm writing my first script that takes a file and moves it to another folder, except that I want to change the filename of the file to filename_inode instead of just filename, in case there are any files with the same name.
I've figured out how to show this by creating the following 4 variables
inode=$(ls -i $1 | cut -c1-7) #lists the file the user types, cuts the inode from it
space="_" #used to put inbetween the filename and bname
bname=$(basename $1) #gets the basename of the file without the directory etc
bnamespaceinode=$bname$space$inode #combines the 3 values into one variable
echo "$bnamespaceinode" #prints filename_inode to the window
So the bottom echo shows filename_inode, which is what I want, except now when I try to move this using mv or cp I'm getting the following errors
I don't think there's anything wrong with the syntax I'm using for the mv and cp commands, so I'm thinking I need to concatenate the 3 variables into a new file, or use the result of the first and then append the other 2 to that file?
I've tried both of the above but still not having any luck, any ideas?
Thanks
Without clearer examples, I guess this could work:
TARGETDIR=/my/target/directory
mv "$1" "$TARGETDIR/$(basename "$1" | sed 's/_.*/_inode/')"
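To build the filename_inode name the question describes, with the real inode number, something along these lines might work. Both function names are made up for illustration. The inode is read from the first field of ls -di rather than from fixed character columns, since inode numbers vary in length.

```shell
# inode_name FILE (hypothetical helper): print "<basename>_<inode>" for FILE.
inode_name() {
    inode=$(ls -di -- "$1" | awk '{ print $1 }')
    [ -n "$inode" ] || return 1        # ls failed, e.g. the file is missing
    printf '%s_%s\n' "${1##*/}" "$inode"
}

# move_with_inode FILE TARGETDIR (hypothetical helper): move FILE to
# TARGETDIR/<basename>_<inode>.
move_with_inode() {
    new=$(inode_name "$1") || return 1
    mv -- "$1" "$2/$new"
}
```

The name records the inode the file had before the move; if the target is on a different filesystem, the moved file gets a new inode.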

How to use mv command to rename multiple files in unix?

I am trying to rename multiple files with extension xyz[n] to extension xyz
example :
mv *.xyz[1] to *.xyz
but the error is coming as - " *.xyz No such file or directory"
I don't know if mv can work directly with * here, but this would work:
find ./ -name "*.xyz\[*\]" | while IFS= read -r line
do
mv "$line" "${line%.*}.xyz"
done
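Some background on why the original command fails: the shell expands patterns before mv runs, and an unescaped [1] is a bracket expression that matches the single character 1, so *.xyz[1] looks for names ending in .xyz1. Escaping the brackets gives a glob-only loop, sketched below. The function name strip_bracket_suffix is made up for illustration, and the sketch assumes the files are in the current directory.

```shell
# strip_bracket_suffix (hypothetical helper): rename every NAME.xyz[N] in
# the current directory to NAME.xyz.
strip_bracket_suffix() {
    for f in *.xyz\[*\]; do            # escaped brackets match literally
        [ -e "$f" ] || continue        # glob matched nothing
        new=${f%\[*}                   # remove from the last "[" onward
        mv -- "$f" "$new"
    done
}
```

If two files differ only in the bracketed number, they map to the same target and the later one overwrites the earlier one.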
Let's say we have some files as shown below. Now I want to remove the -(ab...) part from those file names.
> ls -1 foo*
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
So the expected file names would be :
> ls -1 foo*
foo-bar-foo-bar-bar.txt
foo-bar-foo-bar.txt
foo-bar.txt
>
Below is a simple way to do it.
> ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
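The same rename can also be done without parsing ls output, using a glob and parameter expansion. This is a sketch only: the function name strip_ab_tag is made up for illustration, and it assumes every file of interest ends in -(ab-...).txt as in the listing above.

```shell
# strip_ab_tag (hypothetical helper): rename NAME-(ab-DIGITS).txt in the
# current directory to NAME.txt.
strip_ab_tag() {
    for f in *-\(ab-*\).txt; do
        [ -e "$f" ] || continue        # glob matched nothing
        new=${f%-\(ab-*}.txt           # cut from the last "-(ab-", restore ".txt"
        mv -- "$f" "$new"
    done
}
```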
Here is another way using the automated tools of StringSolver. Let us say your first file is named abc.xyz[1] a second named def.xyz[1] and a third named ghi.jpg (not the same extension as the previous two).
First, filter the files you want by giving examples (ok and notok are any words such that the first describes the accepted files):
filter abc.xyz[1] ok def.xyz[1] ok ghi.jpg notok
Then perform the move with the filter it created:
mv abc.xyz[1] abc.xyz
mv --filter --all
The second line generalizes the first transformation on all files ending with .xyz[1].
The last two lines can also be abbreviated in just one, which performs the moves and immediately generalizes it:
mv --filter --all abc.xyz[1] abc.xyz
DISCLAIMER: I am a co-author of this work for academic purposes. Other examples are available on youtube.
I think mv can't rename multiple files in one call without a loop.
Use the rename command instead (the Perl-based version; the util-linux rename takes different arguments). It uses regular expressions, but it is easy to use once mastered and more powerful.
rename 's/^text-to-replace/new-text-you-want/' text-to-replace*
E.g., to rename all .jar files in a directory to .jar_bak:
rename 's/\.jar$/.jar_bak/' *.jar
