Issues with iconv command in script - unix

I am trying to create a script which detects whether files in a directory contain non-UTF-8 characters and, if they do, grabs the encoding of that particular file and performs the iconv operation on it.
The code is as follows:
find <directory> |sed '1d'><directory>/filelist.txt
while read filename
do
file_nm=${filename%%.*}
ext=${filename#*.}
echo $filename
q=`grep -axv '.*' $filename|wc -l`
echo $q
r=`file -i $filename|cut -d '=' -f 2`
echo $r
#file_repair=$file_nm
if [ $q -gt 0 ]; then
iconv -f $r -t utf-8 -c ${file_nm}.${ext} >${file_nm}_repaired.${ext}
mv ${file_nm}_repaired.${ext} ${file_nm}.${ext}
fi
done< <directory>/filelist.txt
While running the script, several files turn into 0-byte files and .bak gets appended to the file name.
ls| grep 'bak' | wc -l
36
Where am I making a mistake?
Thanks for the help.

It's really not clear what some parts of your script are supposed to do.
Probably the error is that you are assuming file -i will output a string which always contains =; but it often doesn't.
find <directory> |
# avoid temporary file
sed '1d' |
# use IFS='' read -r
while IFS='' read -r filename
do
    # indent loop body
    file_nm=${filename%%.*}
    ext=${filename#*.}
    # quote variables, print diagnostics to stderr
    echo "$filename" >&2
    # use grep -q instead of useless wc -l; don't enter condition needlessly; quote variable
    if grep -qaxv '.*' "$filename"; then
        # indent condition body
        # use modern command substitution syntax, quote variable
        # check if result contains =
        r=$(file -i "$filename")
        case $r in
            *=*)
                # only perform decoding if we can establish encoding
                echo "$r" >&2
                iconv -f "${r#*=}" -t utf-8 -c "${file_nm}.${ext}" >"${file_nm}_repaired.${ext}"
                mv "${file_nm}_repaired.${ext}" "${file_nm}.${ext}" ;;
            *)
                echo "$r: could not establish encoding" >&2 ;;
        esac
    fi
done
See also Why is testing “$?” to see if a command succeeded or not, an anti-pattern? (tangential, but probably worth reading) and useless use of wc
The grep regex is kind of mysterious. I'm guessing you want to check if the file contains non-empty lines? grep -qa . "$filename" would do that.
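A minimal demonstration of what grep -axv '.*' is doing (a sketch; the file names are made up, and the behavior assumes the current locale is UTF-8):

```shell
# In a UTF-8 locale, "." does not match bytes that form invalid UTF-8
# sequences, so "grep -axv '.*'" selects exactly the lines containing
# invalid bytes.  (In the C locale "." matches any byte, so this test
# finds nothing -- run it under a UTF-8 LANG.)
cd "$(mktemp -d)"
printf 'plain ascii line\n' > good.txt
printf 'broken \377\376 line\n' > bad.txt   # \377\376 is never valid UTF-8
if grep -qaxv '.*' good.txt; then echo "good.txt has invalid bytes"; fi
if grep -qaxv '.*' bad.txt;  then echo "bad.txt has invalid bytes";  fi
```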

dynamically pass string to Rscript argument with sed

I wrote a script in R that has several arguments. I want to iterate over 20 directories and execute my script on each while passing in a substring from the file path as my -n argument using sed. I ran the following:
find . -name 'xray_data' -exec sh -c 'Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
which results in this error:
ubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
sh: command substitution: line 0: syntax error near unexpected token `('
sh: command substitution: line 0: `sed -e s/.*DeMMO.*[/](.*)_.*[/]xray_data/1/ "./DeMMO1/D1T3rep_Dec2019_Ellison/xray_data"'
When I try to use sed with my pattern on an example file path, it works:
echo "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/'
which produces the correct substring:
D1T1exp_Dec2019
I think there's an issue with using single quotes inside the interpreted string, but I don't know how to deal with it. I have tried replacing the single quotes around the sed pattern with double quotes, and also removing them; both result in this error:
sed: RE error: illegal byte sequence
How should I extract the substring from the file path dynamically in this case?
To loop through the output of find:
while IFS= read -ru "$fd" -d '' files; do
    echo "$files" ##: do whatever you want to do with the files here.
done {fd}< <(find . -type f -name 'xray_data' -print0)
No commands are embedded in quotes.
It uses a dynamically allocated fd just in case something inside the loop is eating/slurping stdin.
Also, -print0 delimits the files with null bytes, so it is safe for spaces, tabs, and newlines in paths and file names.
A good habit is to put an echo in front of every command you intend to run on the files, so you can see what is going to be executed before it actually happens.
This is the solution that ultimately worked for me due to issues with quotes in sed:
for dir in $(find . -name 'xray_data'); do
    sampleID=$(basename "$(dirname "$dir")" | cut -f1 -d'_')
    Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f "$dir" -b "$dir/SEM_images" -c "$dir/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "$sampleID"
done
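The sample-ID extraction can also be done inside the null-delimited while loop from the answer above, so no command substitution ever has to be embedded in an sh -c string. This is only a sketch: the Rscript path and most flags are omitted, the echo makes it a dry run, and the ${base%_*} expansion assumes the ID is everything before the last underscore, like the sed pattern in the question:

```shell
# Hedged sketch: extract the ID inside the loop instead of inside a
# find -exec string.  Remove the echo once the printed commands look right.
run_stitch() {
    while IFS= read -r -d '' dir; do
        base=$(basename "$(dirname "$dir")")
        sampleID=${base%_*}   # D1T3rep_Dec2019_Ellison -> D1T3rep_Dec2019
        echo Rscript dataStitchR.R -f "$dir" -n "$sampleID"
    done < <(find "$1" -name 'xray_data' -print0)
}
```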

Bash Shell - Print names of all self-referential files

The idea is to write a bash script that prints the names of files in the current directory that contain their own name in their content.
E.g. if a file called hello contains the string hello, we print hello; we do this for all files in the current directory.
Here's what I wrote, and I have no idea why it doesn't work.
#!/bin/bash
for file in *
do
if (cat $file | grep $file) 2> /dev/null
then
echo $file
fi
done
Fixed:
#!/bin/bash
for file in *
do
if grep $file $file 2> /dev/null
then
echo $file
fi
done
Apart from quoting issues, potential regex escaping issues, and the useless use of cat and (...), your code should work in principle.
Try this version - if it doesn't work, something else must be going on:
#!/bin/bash
for file in *
do
    if grep -qF "$file" "$file" 2> /dev/null
    then
        echo "$file"
    fi
done
-q makes grep not output matching lines (whether a match was found is implied by the exit code).
-F ensures that the search term is treated as a literal (rather than a regex).
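A small illustration of why -F matters (hypothetical file names, just for the demo): a file name containing a regex metacharacter can produce a false positive without it.

```shell
# The file name "x.y" is also a regex in which "." matches any character,
# so plain grep "finds" the name in a file that does not literally contain
# it; grep -F does not.
cd "$(mktemp -d)"
printf 'xzy\n' > 'x.y'                 # contents do NOT contain the text "x.y"
grep -q  'x.y' 'x.y' && echo "regex grep: matched (false positive)"
grep -qF 'x.y' 'x.y' || echo "fixed-string grep: no match (correct)"
```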

Handling Special characters from Unix to DOS

I have a file that fetches descriptions from the DB. The values contain special characters.
When the file is converted to DOS format after being written, those characters change into something else.
As a correction, I used a sed command to replace the converted characters with the original special characters, and that works.
I could do this for every special character present in the DB.
Examples:
Original: CANDELE 50°
After conversion to DOS it is visible as CANDELE 50Áø, so I used the sed command sed -e 's/Áø/°/g'.
What I want to do now is a permanent fix that automatically handles any special character that comes in. Is there a command that automatically converts a special character back to its original after the conversion to DOS, so that I can avoid adding a manual rule for every character?
Here is my current function:
changeFileFormat () {
    cd $1
    echo "Changing file format Unix -> DOS." >> $LOG_FILE
    for file in `ls *.csv`
    do
        mv ${file} ${file}_unix
        unix2dos ${file}_unix >> ${file}_dos
        sed -e 's/ÁøU/°/g' -e 's/Âû/Ó/g' -e 's/‹¨«//g' -e 's/ª/ì/g' -e 's/¸/ù/g' -e 's/Áœ/£/g' -e 's/Á¨/¿/g' -e 's/ƒâª/€/g' ${file}_dos >> ${file}
        if [ $? -ne 0 ]; then
            echo "Conversion failed for file: [ $file ]." >> $LOG_FILE
            mv ${file}_unix ${file}
        else
            rm -f ${file}_dos
            rm -f ${file}_unix
        fi;
    done
    echo "Conversion finished." >> $LOG_FILE
}
DB description : CANDELE 50°
CSV file that gets created in unix : ART|M|02A_1057M5706 |CANDELE 50°
After DOS conversion : ART|M|02A_1057M5706 |CANDELE 50Áø
After SED command : ART|M|02A_1057M5706 |CANDELE 50°
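A hedged sketch of an encoding-first approach: the mangled characters suggest the bytes are being reinterpreted in a different code page, so converting the character set explicitly with iconv before the line-ending conversion avoids patching characters one by one. The encodings below (UTF-8 source, CP1252 target) are assumptions; check the real ones with file -i and the list from iconv -l.

```shell
# Assumed-encodings sketch: transcode first, then add DOS line endings.
convert_csv() {
    src=$1
    # explicit character-set conversion (UTF-8 -> CP1252 is an assumption)
    iconv -f UTF-8 -t CP1252 "$src" > "${src}.tmp" || return 1
    # CRLF line endings; `unix2dos "${src}.tmp"` does the same if installed
    awk '{ printf "%s\r\n", $0 }' "${src}.tmp" > "$src" && rm -f "${src}.tmp"
}
```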

UNIX feed $PATH to find

Is there any way I can search through all folders in my PATH for a file? Something like
for f in $PATH ; do ; find "$f" -print | grep lapack ; done
So that every folder in PATH is recursively searched for lapack
This should do it, I ran a few tests, seems to be working:
echo -n $PATH | xargs -d: -i find "{}" -name "*lapack*"
The -n in echo prevents it from writing a newline in the end (otherwise the newline would be passed as part of the last directory name to find(1)).
The -d in xargs(1) says that the delimiter is :. The -i makes it replace {} with the current path.
The rest is self-explanatory, I guess.
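Note that -d is a GNU xargs extension. A more portable alternative (a sketch, assuming bash) splits $PATH on colons with read -a and runs find on each directory; "lapack" is just the pattern from the question:

```shell
# Split $PATH on ":" into an array and search each existing directory.
search_path() {
    pattern=$1
    IFS=: read -ra dirs <<< "$PATH"
    for d in "${dirs[@]}"; do
        [ -d "$d" ] && find "$d" -name "*${pattern}*"
    done
    return 0
}
```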

mutt command with multiple attachments in single mail unix

My requirement is to attach all the .csv files in a folder and send them in a single mail.
Here is what I have tried:
mutt -s "subject" -a *.csv -- abc@gmail.com < subject.txt
The above command does not work (it does not recognize multiple files) and throws the error
Error sending message, child exited 67 (User unknown.).
Could not send the message.
Then I tried using multiple -a options as follows:
mutt -s "subject" -a aaa.csv -a bbb.csv -- abc@gmail.com < subject.txt
This works as expected.
But this is not feasible for, say, 100 files. I should be able to use a file mask (like *.csv to take all CSV files). Is there any way to use something like *.csv in a single command?
Thanks
Mutt doesn't support such syntax, but that doesn't mean it's impossible. You just have to build the mutt command yourself.
mutt -s "subject" $( printf -- '-a %q ' *.csv ) ...
The command in $( ... ) produces something like this:
-a aaa.csv -a bbb.csv -a ...
Here is an example of sending multiple files with a single command:
mutt -s "Subject" -i "Mail_body text" email_id@abc.com -c email_cc_id@abc.com -a attachment1.pdf -a attachment2.pdf
Put the -a options for the attachments at the end of the command line.
Note that some Linux systems impose an attachment size limit, often a fairly small one.
I'm getting a backslash ( \ ) added:
Daily_Batch_Status{20131003}.PDF
Daily_System_Monitoring{20131003}.PDF
printf -- '-a %q ' *.PDF
-a Daily_Batch_Status\{20131003\}.PDF -a Daily_System_Monitoring\{20131003\}.PDF
#!/bin/bash
from="me@address.com"
to="target@address.com"
subject="pdfs $(date +%B) $(date +%Y)"
body="You can find the pdfs from $(date +%B) $(date +%Y)"
# here comes the attachments
mutt -s "$subject" $( printf -- ' -a %q' $PWD/*.pdf ) -- $to <<EOF
Dear Mr and Ms,
$(echo $body)
$(cat ~/.signature)
EOF
But it does not work with escape characters in the file name, such as "\[5\]", which can occur on macOS.
I created it as a script: I collect the needed PDFs in a folder and just run the script from that location, so the monthly reports get sent. It does not matter how many PDFs there are (the number can vary), but the names should contain no white space.
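An alternative that sidesteps the %q escaping problems entirely (a sketch, not specific to mutt): build the attachment options in a bash array, where each element stays one word no matter what characters the file name contains.

```shell
# Collect -a options in an array; "${args[@]}" expands each element as its
# own word, so braces, spaces, and backslashes need no escaping.  The two
# file names are the ones from the comment above.
args=()
for f in 'Daily_Batch_Status{20131003}.PDF' 'Daily_System_Monitoring{20131003}.PDF'; do
    args+=(-a "$f")
done
# the real call would then be:
#   mutt -s "subject" "${args[@]}" -- someone@example.com < subject.txt
printf '%s\n' "${args[@]}"
```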