Search files and run a script on every result (continued) - unix

I would like to know how to search for a certain pattern of files (gzip files) in all subdirectories (month-wise / date-wise subdirectories), and then execute a script on the files found. I also need to include the FILENAME with the output, for tracking purposes and further analysis of those particular files.
Step 1: for example, currently searching for files matching the pattern TT_DETAIL*.gz:
find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz'
output#1:
/cygdrive/c/Test/Feb2014/TT_DETAIL_20141115.csv.gz
/cygdrive/c/Test/Jan2014/TT_DETAIL_20141110.csv.gz
/cygdrive/c/Test/Mar2014/TT_DETAIL_20141120.csv.gz
Step 2:
zcat TT_DETAIL*.gz | awk 'BEGIN { FS=OFS=","} { if ($11=="10") print $2,$3,$6,$10,$11,$17}' >Op_TT_Detail.txt
cat Op_TT_Detail.txt
ZZZ,AAA,ECH,1,10,XXX
ZZZ,BBB,ECH,1,10,XXX
ZZZ,CCC,ECH,1,10,XXX
ZZZ,DDD,ECH,1,10,XXX
Thanks to fedorqui for the script below, which is working fine, but without the FILENAME:
while IFS= read -r file
do
    awk 'BEGIN { FS=OFS="," } { if ($11=="10") print $2,$3,$6,$10,$11,$17 }' <(zcat "$file") >> Op_TT_Detail.txt
done < <(find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz')
I have tried the command below to include the FILENAME with the output for tracking purposes:
while IFS= read -r file
do
    awk 'BEGIN { FS=OFS="," } { if ($11=="10") print $2,$3,$6,$10,$11,$17,FILENAME }' <(zcat "$file") >> Op_TT_Detail.txt
done < <(find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz')
Desired Output:
ZZZ,AAA,ECH,1,10,XXX,/cygdrive/c/Test/Feb2014/TT_DETAIL_20141115.csv.gz
ZZZ,BBB,ECH,1,10,XXX,/cygdrive/c/Test/Feb2014/TT_DETAIL_20141115.csv.gz
ZZZ,CCC,ECH,1,10,XXX,/cygdrive/c/Test/Mar2014/TT_DETAIL_20141120.csv.gz
ZZZ,DDD,ECH,1,10,XXX,/cygdrive/c/Test/Mar2014/TT_DETAIL_20141120.csv.gz
Since FILENAME is not working for *.gz files, should I write the output of find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz' to another file and then feed that file to the script? I don't have write access on the server where the source files are located.
Looking for your suggestions!

Nice to see you are using the snippet I wrote in the previous question!
I would use this:
while IFS= read -r file
do
    awk -v file="$file" 'BEGIN { FS=OFS="," }
    { if ($11=="10") print $2,$3,$6,$10,$11,$17,file }' \
        <(zcat "$file") >> Op_TT_Detail.txt
done < <(find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz')
That is, with -v file="$file" you pass the file name to awk as a variable, and then you use it in your print command.
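As a side note, if any of the paths could contain spaces, a null-delimited loop is safer than line-by-line reading. A minimal sketch of that variant (same logic, assuming bash and GNU find, as on Cygwin):
while IFS= read -r -d '' file
do
    # file holds the real .gz path; awk's own FILENAME would only show the
    # /dev/fd/NN name of the process substitution, hence -v file="$file"
    awk -v file="$file" 'BEGIN { FS=OFS="," }
    { if ($11=="10") print $2,$3,$6,$10,$11,$17,file }' \
        <(zcat "$file") >> Op_TT_Detail.txt
done < <(find /cygdrive/c/Test/ -name 'TT_DETAIL*.gz' -print0)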

Related

Download and change filename to a list of urls in a txt file

Let's say I have a .txt file where I have a list of image links that I want to download.
example:
image.jpg
image2.jpg
image3.jpg
I use: cat images.txt | xargs wget and it works just fine
What I want to do now is to provide another .txt file with the following format:
some_id1:image.jpg
some_id2:image2.jpg
some_id3:image3.jpg
What I want to do is to split each line in the ':' , download the link to the right, and change the downloaded file-name with the id provided to the left.
I want to somehow use wget image.jpg -O some_id1.jpg
So the output will be:
some_id1.jpg
some_id2.jpg
some_id3.jpg
Any ideas?
My go-to for such tasks is awk:
while read -r line; do
    lfn=$(echo "$line" | awk -F":" '{ print $1".jpg" }')
    rfn=$(echo "$line" | awk -F":" '{ print $2 }')
    wget "$rfn" -O "$lfn"
done < images.txt
This presumes, of course, all the local file names should have the .jpg extension.
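Since the separator is a single colon, plain read can also do the splitting without awk. A minimal alternative sketch, assuming no extra colons appear in the lines:
while IFS=: read -r id url; do
    # name the download after the id to the left of the colon
    wget "$url" -O "$id.jpg"
done < images.txt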

unix: Can I delete files in a directory that do not contain text?

Can I delete files in a directory that do NOT contain any text? These are text files with the extension '.fasta'. Initially I am running this script:
for g in `cat genenames.txt` ; do cat *${g}_*.fa > $g.fasta ; done
On a list of files that look like:
id_genename_othername.fa
But in some directories, not all the genenames from the list (genenames.txt) have files with names that match. So sometimes I will get this message:
cat: *genename_*.fa: No such file or directory
The above code still makes a '.fasta' file with the genename that doesn't exist and I would like to remove it. THANK YOU.
Assuming your script is using #!/bin/bash, I'd do
shopt -s nullglob
while IFS= read -r pattern; do
    files=( *"$pattern"_*.fa )
    if [[ "${#files[@]}" -eq 0 ]]; then
        echo "no files match pattern *${pattern}_*.fa"
    else
        cat "${files[@]}" > "$pattern.fasta"
    fi
done < genenames.txt
Have you tried the following?
for g in `cat genenames.txt` ; do cat *${g}_*.fa 2>/dev/null > $g.fasta ; done
This suppresses the "No such file or directory" messages, but note that the shell still creates an empty $g.fasta via the redirection before cat runs, so the empty files have to be removed afterwards.
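To directly answer the title question, the empty leftover files can be deleted in one pass afterwards; a minimal sketch, assuming GNU or BSD find:
# remove zero-byte .fasta files left behind by the redirection
find . -maxdepth 1 -name '*.fasta' -empty -delete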

I want to recursively insert two lines into all files of my directory where they are not present

I have a directory customer, with many customers inside it.
Now I want to add two lines to some process_config files within the customer directory, where they are not already present.
For example:
/home/sam/customer/a1/na/process_config.txt
/home/sam/customer/p1/emea/process_config.txt
and so on.
Is this possible with a single command, like find and sed?
With a simple for loop:
for file in /home/sam/customer/*/*/process_config.txt; do
    printf "one line\nanother line\n" >> "$file"
done
find /home/sam/customer -name 'process_config.txt' -exec DoYourAddWithSedAwkEchoOrWhatever {} \;
find gives you the possibility to select each wanted file.
The -exec option runs your command on each selected file; {} stands for the file name (the full path) in this case, and \; marks the end of the command for each iteration.
Note that -exec runs the command directly rather than through a shell, so shell redirection such as -exec echo 'line1' >> {} \; does not append to the found file; wrap the command in sh -c if redirection is needed.
sed, awk, or echo, as in the sample above, can then modify the file.
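Neither approach above checks whether the lines are already present; a minimal sketch of that check, assuming the first added line is unique enough to serve as a marker:
find /home/sam/customer -name process_config.txt | while IFS= read -r f; do
    # append only when the marker line is not already there (exact match)
    if ! grep -qxF 'one line' "$f"; then
        printf 'one line\nanother line\n' >> "$f"
    fi
done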

Split csv according to a field, create a subdirectory, then save

I would like to split a csv file according to the 2nd field. For instance, the csv file contains:
Input.csv :: /c/Test/
aaa,K1,ppp
ddd,M3,ppp
bbb,K1,ppp
ccc,L2,ppp
This file should be split into three separate files according to the second field.
First file: /c/Test/K1/K1.csv
aaa,K1,ppp
bbb,K1,ppp
Second file: /c/Test/M3/M3.csv
ddd,M3,ppp
Third file: /c/Test/L2/L2.csv
ccc,L2,ppp
I tried the command below to split the file based on the 2nd column, and it works fine; however, the split files end up in the same directory,
like /c/Test/K1.csv and /c/Test/M3.csv etc.:
awk -F, '{ print > $2".csv"}' Input.csv
I have tried the commands below to also create the subdirectories, but they are not working and incomplete; please help:
awk -F, 'BEGIN { system("mkdir -p $2") } { print > $10".csv"}' Input.csv
awk -F, '{ print >/system("mkdir -p $2")/ $2".txt"}' Input.csv
awk -F, '{ system("mkdir -p "$2); print > ($2 "/" $2 ".csv") }' Input.csv
Assuming your Input.csv contains:
aaa,K1,ppp
ddd,M3,ppp
bbb,K1,ppp
ccc,L2,ppp
And this file is in /c/Test/ and you want directories to be created in /c/Test/.
The main difference from your attempts is system("mkdir -p "$2), i.e. $2 placed outside of the quotes. This concatenates "mkdir -p " with the value of $2. When you put it inside the quotes it becomes a literal $2, and the value is not available to the mkdir command.
After the directory is created, it prints the output to the desired file, whose path is $2 "/" $2 ".csv" (parenthesized above, since an unparenthesized concatenation after > is ambiguous in some awks).
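One caveat: with many distinct keys, awk can run out of open file descriptors, and mkdir is re-run for every line. A sketch of a variant that closes each file after writing (same output, under the same input assumptions):
awk -F, '{
    system("mkdir -p " $2)   # create the subdirectory for this key
    f = $2 "/" $2 ".csv"
    print >> f               # append, because the file is reopened per line
    close(f)                 # release the descriptor each time
}' Input.csv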
You could do this in bash using read:
#!/bin/bash
while IFS="," read -r a b c; do
    mkdir -p "$b" && echo "$a,$b,$c" >> "$b/$b.csv"
done < Input.csv
read splits the input line on the input field separator (IFS). The loop makes the directory based on the name of the second column and echoes the line to the relevant file.
Or if you can use bash arrays:
#!/bin/bash
(
    IFS=","
    while read -r -a l
    do
        mkdir -p "${l[1]}" && echo "${l[*]}" >> "${l[1]}/${l[1]}.csv"
    done < Input.csv
)
The use of ( ) runs the loop in a subshell, so the original value of $IFS is preserved once the script has run.
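Alternatively, the subshell can be avoided by scoping IFS to the read call itself; a sketch of that form, assuming three fields per line as in the sample:
while IFS="," read -r -a l; do
    # IFS is scoped to read here, so join explicitly with printf instead of
    # "${l[*]}", which would join on the default (space) IFS
    mkdir -p "${l[1]}" && printf '%s,%s,%s\n' "${l[@]}" >> "${l[1]}/${l[1]}.csv"
done < Input.csv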

Shell script to process files

I need to write a shell script to process a huge folder nearly 20 levels deep. I have to process each and every file and check which files contain statements like
select
insert
update
By a line, I mean everything up to the next semicolon in the file.
I should get a result like this
C:/test.java select * from dual
C:/test.java select * from test
C:/test1.java select * from tester
C:/test1.java select * from dual
and so on. Right now I have a script to read all the files:
#!/bin/ksh
FILE=<FILEPATH to be traversed>
TEMPFILE=<Location of Temp file>
cd "$FILE"
for f in `find . ! -type d`;
do
    cat "$FILE/addedText.txt" >> "$TEMPFILE/newFile.txt"
    cat "$f" >> "$TEMPFILE/newFile.txt"
    rm "$f"
    cat "$TEMPFILE/newFile.txt" >> "$f"
    rm "$TEMPFILE/newFile.txt"
done
I have very little knowledge of awk and sed to proceed further with reading each file and achieving what I want. Can anyone help me with this?
If you have GNU find/gawk:
find /path -type f -name "*.java" | while read -r FILE
do
    awk -v file="$FILE" 'BEGIN { RS=";" }
    /select|update|insert/ {
        b = gensub(/(.*)(select|update|insert)(.*)/, "\\2\\3", "g", $0)
        gsub(/\n+/, "", b)
        print file, b
    }' "$FILE"
done
If you are on Solaris, use nawk:
find /path -type f -name "test*file" | while read -r FILE
do
    nawk -v file="$FILE" 'BEGIN { RS=";" }
    /select/ { gsub(/.*select/, "select"); gsub(/\n+/, ""); print file, $0 }
    /update/ { gsub(/.*update/, "update"); gsub(/\n+/, ""); print file, $0 }
    /insert/ { gsub(/.*insert/, "insert"); gsub(/\n+/, ""); print file, $0 }
    ' "$FILE"
done
Note this is a simplistic case; your SQL statements might be more complicated.
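As a loop-free variant, gawk already tracks the current input file in its built-in FILENAME variable, so find can hand the files to a single gawk process. A sketch, assuming GNU find and gawk:
# print "filename statement" for each statement containing a keyword
find /path -type f -name '*.java' -exec gawk 'BEGIN { RS=";" }
    /select|insert|update/ {
        gsub(/\n+/, " ")          # flatten the statement onto one line
        print FILENAME, $0
    }' {} +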
