What is the usage of braces in unix commands?

I have been looking all over for how to use the {} feature in unix filenames. For example, I have a directory whose subdirectories are named with an ID#. Each subdirectory contains a recorded audio file named after a specific test question ID#. The following command:
for file in */1-01*; do echo "$file"; done
will list the names of the audio files within each subdirectory of the current directory. Doing so gives me a list:
809043250/1-01-20131.wav
813777079/1-01-20131.wav
817786199/1-01-20131.wav
827832538/1-01-20131.wav
834820477/1-01-20131.wav
I want to rename each of the above .wav files as a different ID#, so it should end up like this:
809043250/5001.wav
813777079/5001.wav
817786199/5001.wav
827832538/5001.wav
834820477/5001.wav
So how do I use the ${file/firstOccurrence/replaceWith} notation with the mv command to keep the person's ID# (809043250, 813777079, etc.) and the first /, but strip off the 1-01-20131.wav and tack on 5001.wav?
I don't know how to search for this on Google. Is it called regular expressions (I don't think so...)? Brace notation? Filename patterns? Does anyone know of a good explanation of these?
Thanks!

In bash, one variation on the notation is called Brace Expansion. It generates strings, and is often used to build lists of file names.
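For instance, a quick illustration of brace expansion (unrelated to the rename itself):
$ echo 1-0{1,2,3}.wav
1-01.wav 1-02.wav 1-03.wav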
The other variation is called Shell Parameter Expansion. Here, bash provides a substitution notation ${var/match/replace}, where match is a shell pattern (a glob), not a regular expression.
For your purposes, you need the parameter expansion:
$ x="809043250/1-01-20131.wav
> 813777079/1-01-20131.wav
> 817786199/1-01-20131.wav
> 827832538/1-01-20131.wav
> 834820477/1-01-20131.wav"
$ for file in $x; do echo $file ${file/1-01-20131/5001}; done
809043250/1-01-20131.wav 809043250/5001.wav
813777079/1-01-20131.wav 813777079/5001.wav
817786199/1-01-20131.wav 817786199/5001.wav
827832538/1-01-20131.wav 827832538/5001.wav
834820477/1-01-20131.wav 834820477/5001.wav
$
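Applied to your layout, a minimal sketch of the actual rename might look like this (assuming each subdirectory contains exactly one 1-01-20131.wav, as in the listing above):
for file in */1-01-20131.wav
do
    # e.g. 809043250/1-01-20131.wav -> 809043250/5001.wav
    mv "$file" "${file/1-01-20131/5001}"
done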

Related

Issue while renaming a file with file pattern in unix

As part of our process, we get an input file in .gz format. We need to unzip this file and add a suffix at the end of the file name. The input file name contains a timestamp, so I am trying to use a wildcard pattern while unzipping and renaming it.
Input file name:
Mem_Enrollment_20200515130341.dat.gz
Step 1: Unzipping this file (working as expected):
gzip -d Mem_Enrollment_*.dat.gz
Output:
Mem_Enrollment_20200515130341.dat
Step 2: Renaming this file (issues while renaming):
Again, I am going with the pattern, but I know this won't work in this case. So what should I do to rename this file?
mv Mem_Enrollment_*.dat Mem_Enrollment_*.dat_D11
Output:
Mem_Enrollment_*.dat_D11
Expected output:
Mem_Enrollment_20200515130341.dat_D11
Try:
for fn in Mem_Enrollment_*.dat
do
    mv "${fn}" "${fn}_D11"
done
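If you would rather do the unzip and the rename in one pass, a small sketch along the same lines (assuming one or more Mem_Enrollment_*.dat.gz files in the current directory):
for fn in Mem_Enrollment_*.dat.gz
do
    gzip -d "$fn"                    # leaves Mem_Enrollment_<timestamp>.dat
    mv "${fn%.gz}" "${fn%.gz}_D11"   # appends the _D11 suffix
done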
With just DataStage you could loop over ls output from an Execute Command stage via "ls Mem_Enrollment_*.dat.gz" and then use an #FM as the delimiter when looping over the output list. You could then break out the gzip and rename into two separate commands, which helps with readability in your job.
The only caveat here is that the Start Loop stage doesn't accept the #FM as the delimiter due to some internal funkiness inside DataStage, so you need to set a user variable equal to it and pass that in as the delimiter.

How to use AWS CLI to only copy files in S3 bucket that match a given string pattern

I'm using the AWS CLI to copy files from an S3 bucket to my R machine using a command like below:
system(
"aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '*trans*' --region us-east-1"
)
This works as expected, i.e. it copies all files in my_bucket_location that have "trans" in the filename at that location.
The problem that I am facing is that I have other files with similar naming conventions that I don't want to import in this step. As an example, in the list below I only want to copy the first two files, not the last two:
File list
trans_120215.csv
trans_130215.csv
sum_trans_120215.csv
sum_trans_130215.csv
If I was using regex I could make it more specific, like "^trans_\\d+", to bring in just the first two files, but this doesn't seem possible using the AWS CLI. So my question is: is there a way to do more complex pattern matching with the AWS CLI, like below?
system(
"aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '^trans_\\d+' --region us-east-1"
)
Please note that I can only use information about the file in question, i.e. that I want to import a file matching the pattern "^trans_\\d+". I can't rely on the fact that the other unwanted files start with sum_, because this is only an example; there could be other files with similar names, like "check_trans_120215.csv".
I have considered other alternatives, like those below, but I am hoping there is a way to adjust the copy command to avoid going down either of these routes:
Listing all items in the bucket > using regex in R to specify the files that I want > Only importing those files
Keeping the copy command as it is > delete unwanted files on the R machine after the copy
The alternatives that you have listed are the best options, because the S3 CLI doesn't support regexes.
Use of Exclude and Include Filters:
Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "<value>" and --include "<value>" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object. The following pattern symbols are supported.
*: Matches everything
?: Matches any single character
[sequence]: Matches any character in sequence
[!sequence]: Matches any character not in sequence
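These are shell-style wildcards, not regexes, but for the listing in the question the [sequence] form may already be enough: requiring a digit right after trans_ rules out the sum_ files. A sketch, reusing the same bucket path as above:
aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include 'trans_[0-9]*' --region us-east-1
This is still weaker than "^trans_\\d+", so double-check it against your real file names before relying on it.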
Putting this here for others to find, since I just had to figure this out. Here's what I came up with:
s3cmd del $(s3cmd ls s3://[BUCKET]/ | grep '.*s3://[BUCKET]/[FILENAME]' | cut -c 41-)
You can put the regex in the grep search string. For instance, I was searching for specific files to delete (hence the s3cmd del). My regex looked like: '2016-11-04.*s3.*[DN][RS].*'. You may have to adjust the cut for your use. Should also work with s3cmd get.

converting all files in a directory in unix to another format

I have a list of files in a unix directory. All of them are called something.bed and I am looking to convert them all to something.txt.
The .bed format can easily be read by text editors but is not recognised as a txt file by coding languages such as Python, so it doesn't seem to be read by standard txt-parsing scripts.
This is probably quite an easy question for anyone with intermediate unix experience, but I don't use unix much and have looked around and can't find a quick answer.
I tried:
for i in *.bed
do cat > *.txt
done
It just gives one file called *.txt, whereas I'm looking for 1.bed to become 1.txt, 2.bed to become 2.txt, etc.
Any pointers or solutions would be much appreciated
for i in *.bed; do mv "$i" "${i/bed/txt}"; done
Actually, the following would be safer, since my first attempt would convert "bed.bed" into "txt.bed". The /% form anchors the match to the end of the name:
for i in *.bed; do mv "$i" "${i/%bed/txt}"; done
Just add the variable properly in the for loop:
for i in *.bed
do
cat "$i" > "${i%.*}.txt"
done
While it would be better to use mv for this:
for i in *.bed
do
mv "$i" "${i%.*}.txt"
done
${i%.*} strips the shortest suffix starting with a dot (here .bed), and .txt is then appended, so i.bed becomes i.txt.
Why didn't your attempt work?
for i in *.bed
do
cat > *.txt
done
As you are inside a for loop, the variable to use is i, not *. That is, when looping over the *.bed files, it is the i variable that holds each file name, so you need to refer to it as $i. Also, cat with no argument reads from standard input, and > *.txt just creates a single file literally named *.txt.

How to use mv command to rename multiple files in unix?

I am trying to rename multiple files with extension xyz[n] to extension xyz
example:
mv *.xyz[1] to *.xyz
but the error is coming as - " *.xyz No such file or directory"
I don't know if mv can work on * directly, but this would work:
find ./ -name "*.xyz\[*\]" | while read -r line
do
    mv "$line" "${line%.*}.xyz"
done
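A plain glob loop works too, as long as the brackets are escaped so the shell treats them as literal characters rather than a character class (a sketch, assuming the files really end in .xyz[<number>]):
for f in *.xyz\[*\]
do
    mv -- "$f" "${f%.*}.xyz"    # abc.xyz[1] -> abc.xyz
done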
Let's say we have some files as shown below. Now I want to remove the -(ab...) part from those files.
> ls -1 foo*
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
So the expected file names would be :
> ls -1 foo*
foo-bar-foo-bar-bar.txt
foo-bar-foo-bar.txt
foo-bar.txt
>
Below is a simple way to do it.
> ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
for detailed explanation check here
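If you prefer plain bash and want to avoid parsing ls, a loop with the same substitution idea might look like this (a sketch, assuming the names follow the foo-bar-(ab-<digits>).txt pattern shown above):
for f in foo*-\(ab-*\).txt
do
    mv -- "$f" "${f/-\(ab-*\)/}"    # foo-bar-(ab-4529111094).txt -> foo-bar.txt
done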
Here is another way using the automated tools of StringSolver. Let us say your first file is named abc.xyz[1], a second def.xyz[1], and a third ghi.jpg (not the same extension as the previous two).
First, filter the files you want by giving examples (ok and notok are any words such that the first describes the accepted files):
filter abc.xyz[1] ok def.xyz[1] ok ghi.jpg notok
Then perform the move with the filter it created:
mv abc.xyz[1] abc.xyz
mv --filter --all
The second line generalizes the first transformation to all files ending with .xyz[1].
The last two lines can also be abbreviated into just one, which performs the move and immediately generalizes it:
mv --filter --all abc.xyz[1] abc.xyz
DISCLAIMER: I am a co-author of this work for academic purposes. Other examples are available on youtube.
I think mv can't operate on multiple files directly without a loop.
Use the rename command instead. It uses regular expressions, but it is easy to use once mastered and more powerful.
rename 's/^text-to-replace/new-text-you-want/' text-to-replace*
e.g. to rename all .jar files in a directory to .jar_bak:
rename 's/\.jar$/.jar_bak/' *.jar
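Applied to the original question, and assuming the Perl-based rename (where the first argument is a real regular expression), a sketch could be:
rename 's/\.xyz\[\d+\]$/.xyz/' *.xyz*
Note that some distributions ship the util-linux rename instead, which takes a plain from/to pair rather than a regex.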

Unix wildcard selectors? (Asterisks)

In Ryan Bates' Railscast about git, his .gitignore file contains the following line:
tmp/**/*
What is the purpose of using the double asterisks followed by an asterisk as such: **/*?
Would using simply tmp/* instead of tmp/**/* not achieve the exact same result?
Googling the issue, I found an unclear IBM article about it, and I was wondering if someone could clarify the issue.
It means: match everything in all the subdirectories below tmp, as well as the direct contents of tmp itself.
e.g. I have the following:
$ find tmp
tmp
tmp/a
tmp/a/b
tmp/a/b/file1
tmp/b
tmp/b/c
tmp/b/c/file2
matched output:
$ echo tmp/*
tmp/a tmp/b
matched output:
$ echo tmp/**/*
tmp/a tmp/a/b tmp/a/b/file1 tmp/b tmp/b/c tmp/b/c/file2
It is a default feature of zsh; to get it to work in bash 4, run:
shopt -s globstar
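Using the same tmp tree as above, a quick before/after check in bash (hypothetical session):
$ echo tmp/**/*        # without globstar, ** behaves like a single *
tmp/a/b tmp/b/c
$ shopt -s globstar
$ echo tmp/**/*
tmp/a tmp/a/b tmp/a/b/file1 tmp/b tmp/b/c tmp/b/c/file2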
From http://blog.privateergroup.com/2010/03/gitignore-file-for-android-development/:
(kwoods)
"The double asterisk (**) is not a git thing per say, it’s really a linux / Mac shell thing.
It would match on everything including any sub folders that had been created.
You can see the effect in the shell like so:
# ls ./tmp/* = should show you the contents of ./tmp (files and folders)
# ls ./tmp/** = same as above, but it would also go into each sub-folder and show the contents there as well."
According to the documentation of gitignore, this syntax is supported since git version 1.8.2.
Here is the relevant section:
Two consecutive asterisks (**) in patterns matched against full pathname may have special meaning:
A leading ** followed by a slash means match in all directories. For example, **/foo matches file or directory foo anywhere, the same as pattern foo. **/foo/bar matches file or directory bar anywhere that is directly under directory foo.
A trailing /** matches everything inside. For example, abc/** matches all files inside directory abc, relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, a/**/b matches a/b, a/x/b, a/x/y/b and so on.
Other consecutive asterisks are considered invalid.
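Putting the three rules quoted above together, a small hypothetical .gitignore might look like:
# a directory named build, anywhere in the tree
**/build/
# everything inside the top-level logs directory
logs/**
# .tmp files anywhere under doc, at any depth
doc/**/*.tmp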
