Removing an extension from a filename in HDFS using shell script - unix

How can I remove an extension from a filename in HDFS using a
Unix shell script?
For example, my initial filename is sample.txt.gz and I want to remove
the .gz from the filename.
Here is what I have done so far.
# Parameters
baseDirHdfs=${1}
dss=${2}
ds=${3}
processDirHdfs=${4}
filename=${5}
kerberosKeytab=${6}
kerberosPrincipal=${7}

kinit -kt ${kerberosKeytab} ${kerberosPrincipal}

# Removing the .gz extension
newFilename=$(echo "${filename}" | cut -f1-6 -d '.')

# Decompressing the .gz file
hdfs dfs -cat /${baseDirHdfs}/${dss}/${ds}/${processDirHdfs}/${filename} | gzip -d | hdfs dfs -put - /${baseDirHdfs}/${dss}/${ds}/${processDirHdfs}/${newFilename}

Normally, the filename and extension are separated by a dot (.),
so something like this will do it:
mayankp@mayank:~$ file=myfile.sh
mayankp@mayank:~$ file_name=`echo $file | awk -F'.' '{print $1}'`
mayankp@mayank:~$ echo $file_name
myfile
Let me know if this helps.
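One caveat worth noting: `awk -F'.' '{print $1}'` keeps only the text before the *first* dot, so the asker's `sample.txt.gz` would become `sample` rather than `sample.txt`. A minimal parameter-expansion sketch (my addition, not part of the answer above) that strips just the trailing `.gz`:

```shell
# Strip only the trailing .gz, keeping earlier dots intact
filename="sample.txt.gz"
newFilename="${filename%.gz}"   # %.gz removes the shortest matching suffix
echo "$newFilename"             # sample.txt
```

This works in any POSIX shell and needs no external commands.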

Related

Rename filename to filename_inode (Unix)

I would like to move my file from one location to another.
In the process, I want my filename to change from filename to filename_inode
Any idea how I can do that?
I know I can get the inode using
ls -i $filename
I would prefer a solution that does not require me to install any additional tools.
In Bash, you can do the following:
FILE=<your-file-name>
mv "$FILE" "${FILE}_$(ls -li "$FILE" | awk '{ print $1 }')"
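A self-contained variant of the same idea (my sketch, using a scratch directory so it can be run safely): `ls -i` prints the inode as its first field, and quoting the expansions keeps filenames with spaces intact.

```shell
# Rename a file to <name>_<inode>
dir=$(mktemp -d)                          # scratch directory for the demo
FILE="$dir/myfile.txt"
touch "$FILE"
inode=$(ls -i "$FILE" | awk '{print $1}') # first field of ls -i is the inode
mv "$FILE" "${FILE}_${inode}"
ls "$dir"
```

The rename produces, e.g., `myfile.txt_1234567` where the number is the file's inode.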

Unix Create directory based on first value in file, then copy file from the full path listed in the same file to the directory

I have the following file containing a value before the '|' (I can change the delimiter if needed). I want to create two directories, 368126 and 368153 (these values can change), and then copy the files listed after the '|' from their full path location into the directories 368126 and 368153. How can I do this? Any help greatly appreciated.
368126|/nfs/filesEU/UA08039_512.png
368126|/nfs/filesEU/UA08039_256.png
368153|/nfs/filesUS/UA06495_512.png
368153|/nfs/filesUS/UA06495_256.png
368153|/nfs/filesUS/UA06495_64.png
I want to end up with the files sitting in the new directories like this
368126
UA08039_512.png
UA08039_256.png
368153
UA06495_512.png
UA06495_256.png
UA06495_64.png
You can use something like this to generate the script to be executed for the movement:
awk -F\| '{print "mkdir -p " $1 " ; cp " $2 " " $1}' input_file > script.sh
Then execute the script as
sh ./script.sh
Not very efficient but hope you get the idea.
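An alternative sketch that skips the intermediate script.sh entirely: read each line, split on '|' with `IFS`, and run mkdir/cp directly (this assumes the same `input_file` layout as above).

```shell
# One pass over input_file: field 1 is the target directory, field 2 the source path
while IFS='|' read -r dir path; do
  mkdir -p "$dir"
  cp "$path" "$dir/"
done < input_file
```

Quoting `"$path"` also keeps the loop safe if a source path ever contains spaces, which the generated-script approach would mishandle.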

Download and change filename to a list of urls in a txt file

Let's say I have a .txt file where I have a list of image links that I want to download.
example:
image.jpg
image2.jpg
image3.jpg
I use: cat images.txt | xargs wget and it works just fine
What I want to do now is to provide another .txt file with the following format:
some_id1:image.jpg
some_id2:image2.jpg
some_id3:image3.jpg
What I want to do is to split each line in the ':' , download the link to the right, and change the downloaded file-name with the id provided to the left.
I want to somehow use wget image.jpg -O some_id1.jpg
So the output will be:
some_id1.jpg
some_id2.jpg
some_id3.jpg
Any ideas ?
My go-to for such tasks is awk:
while read -r line; do
  lfn=$(echo "$line" | awk -F":" '{ print $1".jpg" }')
  rfn=$(echo "$line" | awk -F":" '{ print $2 }')
  wget "$rfn" -O "$lfn"
done < images.txt
This presumes, of course, all the local file names should have the .jpg extension.
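The same split can be done without awk: with `IFS=':'`, `read` splits each line at the first colon, and everything after it (further colons included, so `http://` URLs survive) lands in the second variable. A dry-run sketch, with a here-doc standing in for images.txt and `echo` guarding the wget call; drop the `echo` and switch to `< images.txt` to actually download:

```shell
# Dry run: print the wget command for each id:url line
while IFS=':' read -r id url; do
  echo wget "$url" -O "${id}.jpg"
done <<'EOF'
some_id1:image.jpg
some_id2:image2.jpg
EOF
```

This prints one `wget <url> -O <id>.jpg` command per input line.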

How can I rename a file on unix using characters in filename?

I have a bunch of files in a unix directory that look like the following:
filename_1234567.txt
I need to rename them by copying the last three characters of each filename
to the front of the filename like this:
567_filename_1234567.txt
Note: Both the filename and extension are variable.
I'm running this on a Solaris box.
Thanks in advance!
One possibility:
\ls *.txt | sed 's/\(.*\)\(...\).txt/mv \1\2.txt \2_\1.txt/' | sh
(probably wise to write that with echo mv, while you're double-checking it does what you think it does).
I can't decide if the alternative
sed 's/\(\(.*\)\(...\).txt\)/mv \1 \3_\2.txt/'
is more robust, or just way too fussy.
for file in *_*.txt
do
  last_3="$(echo "$file" | grep -o "...\...." | cut -d'.' -f1)"
  cp "$file" "${last_3}_${file}"
done
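Since the question mentions a Solaris box with a limited tool set, it may help that POSIX parameter expansion can pull out the last three characters with no sed, awk, or grep at all. A sketch for a single name (older non-POSIX Bourne shells may not support the nested expansion, so test on the target system):

```shell
file="filename_1234567.txt"
base="${file%.*}"                # strip the extension  -> filename_1234567
last3="${base#"${base%???}"}"    # keep the last 3 chars -> 567
echo "${last3}_${file}"          # 567_filename_1234567.txt
```

`${base%???}` drops the last three characters, and stripping that prefix from `$base` leaves exactly those three characters.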

Is there any way to extract only one file (or a regular expression) from tar file

I have a tar.gz file.
Because of space issues and the long extraction time, I need to extract only selected files.
I have tried the below:
grep -l '<text>' *
file1
file2
Only file1 and file2 should be extracted.
You can do this with the --extract option, like this:
tar --extract --file=test.tar.gz main.c
With --file, specify the .gz filename, and at the end specify the
filename you want to extract.
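Extraction is not limited to one member: several names can be listed after the archive. A self-contained sketch (it builds a demo archive on the fly; substitute your real archive name):

```shell
cd "$(mktemp -d)"                  # scratch directory for the demo
printf 'a\n' > file1
printf 'b\n' > file2
printf 'c\n' > file3
tar -czf test.tar.gz file1 file2 file3
rm file1 file2 file3
tar -xzf test.tar.gz file1 file2   # extract only the named members
ls                                 # file3 is not extracted
```

GNU tar additionally accepts patterns, e.g. `tar -xzf test.tar.gz --wildcards 'file*'`; `--wildcards` is a GNU extension, so check your tar's manual before relying on it.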
