unix diff to file

I'm having a little trouble getting the output of diff to write to a file. I have new and old versions of a .strings file and I want to be able to write the diff between these two files to a .strings.diff file.
Here's where I am right now:
diff -u -a -B $PROJECT_DIR/new/Localizable.strings $PROJECT_DIR/old/Localizable.strings >> $PROJECT_DIR/diff/Localizable.strings.diff
fgrep + $PROJECT_DIR/diff/Localizable.strings.diff > $PROJECT_DIR/diff/Localizable.txt
The result of the diff command writes to Localizable.strings.diff without any issues, but Localizable.strings.diff appears to be a binary file. Is there any way to output the diff to a UTF-8 encoded file instead?
Note that I'm trying to just get the additions using fgrep in my second command. If there's an easier way to do this, please let me know.
Thanks,
Sean

First, you probably need to identify the encoding of the Localizable.strings files. This might be done in a manner described by How to find encoding of a file in Unix via script(s), for example.
Then you probably need to convert the Localizable.strings files to UTF-8 with a tool like iconv, using commands something like:
iconv -f x -t UTF-8 $PROJECT_DIR/new/Localizable.strings >Localizable.strings.new.utf8
iconv -f x -t UTF-8 $PROJECT_DIR/old/Localizable.strings >Localizable.strings.old.utf8
Where x is the actual encoding in a form recognized by iconv. You can use iconv --list to show all the encodings it knows about.
Then you should be able to diff without having to use -a:
diff -u -B Localizable.strings.old.utf8 Localizable.strings.new.utf8 >Localizable.strings.diff.utf8
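Once both files are UTF-8, extracting just the additions is also easier with a pattern anchored to the start of the line; fgrep + matches any line that merely contains a +, including the +++ header. Here is a sketch of the whole pipeline, assuming the .strings files turn out to be UTF-16 (a common encoding for Xcode .strings files; substitute whatever encoding you actually identify):
iconv -f UTF-16 -t UTF-8 "$PROJECT_DIR/old/Localizable.strings" > old.utf8
iconv -f UTF-16 -t UTF-8 "$PROJECT_DIR/new/Localizable.strings" > new.utf8
# old file first, new file second, so additions appear as + lines
diff -u -B old.utf8 new.utf8 > "$PROJECT_DIR/diff/Localizable.strings.diff"
# keep lines that start with + but drop the +++ filename header
grep '^+' "$PROJECT_DIR/diff/Localizable.strings.diff" | grep -v '^+++' > "$PROJECT_DIR/diff/Localizable.txt"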

Related

How can I invoke the name of a file associated with a batch process?

I'm trying to batch process a folder full of text files with pandoc, and I'd like to maintain the current filenames. How do I call the filename as a variable in the output? For example, I want to write a command like this:
pandoc -s notes/*.txt -o rtf/$1.rtf
Where $1 represents the filename grabbed with the * character.
I'm sure this is a simple question, but I don't quite know the right language to search for it properly.
Thanks for any help!
Try
for file in notes/*.txt
do
    # strip the directory and extension to reuse the base name
    file_base_name=$(basename "${file}" | cut -d'.' -f1)
    pandoc -s "$file" -o "rtf/${file_base_name}.rtf"
done
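One caveat: cut -d'.' -f1 truncates names that contain more than one dot (notes/1.2-draft.txt would become 1). A sketch that strips only the .txt suffix instead, assuming every input ends in .txt:
for file in notes/*.txt
do
    base=$(basename "$file" .txt)   # removes the directory and the trailing .txt only
    pandoc -s "$file" -o "rtf/${base}.rtf"
done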

Replace Â with space in a file

In my file somehow Â is getting added. I am not sure what it is or how it is getting added.
12345AÂ 210 CBCDEM
I want to remove this character from the file. I tried a basic sed command to remove it, but was unsuccessful:
sed -i -e 's/\Â//g'
I also read that dos2unix would do the job, but unfortunately that also didn't work. Assuming it was a hex character, I also tried to remove it using its hex value, sed -i 's/\xc2//g', but that didn't work either.
I really want to understand what this character is and how it is getting added. Moreover, is there a possible way to delete all such characters in a file?
Adding encoding details:
file test.txt
test.txt: ISO-8859 text
echo $LANG
en_US.UTF-8
OS details:
uname -a
Linux vm-testmachine-001 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Regards.
It looks like you have an encoding mismatch between the program that writes the file (in some variant of ISO-8859) and the program reading the file (assuming it to be UTF-8). This is a textbook use case for iconv. In fact, the sample in the man page is almost exactly applicable to your case:
iconv -f iso-8859-1 -t utf-8 test.txt
iconv is a fairly standard program on almost every Unix distribution I have seen, so you should not have any issues here.
Based on the fact that you appear to be writing in English, you are probably looking for iso-8859-1 (Latin-1), which is the most widely used of the ISO-8859 variants.
If that does not fix your issue, you probably need to find the proper encoding of whatever program produces the file. You can do
iconv -l
to get a list of encodings available for iconv, and use the one that works for you. Keep in mind that the output of file saying ISO-8859 text is not absolute. There is no way to distinguish things like pure ASCII and UTF-8 in many cases. If I am not mistaken, file uses heuristics based on frequencies of character codes in the file to determine the encoding. It is quite liable to make a mistake if the sample is small and/or ambiguous.
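To see for yourself what the mystery character actually is, od can dump the raw bytes. Â corresponds to byte 0xC2, which is also the lead byte of several two-byte UTF-8 sequences, such as 0xC2 0xA0, a non-breaking space and a frequent culprit for invisible characters:
od -c test.txt | head    # non-printable bytes appear as octal, e.g. 302 240 for 0xC2 0xA0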
If you want to save the output of iconv and your version supports the -o flag, you can use it. Otherwise, use redirection, but carefully: redirecting straight onto the input file would truncate it before iconv reads it, hence the temporary file:
TMP=$(mktemp)
iconv -f iso-8859-1 -t utf-8 test.txt > "$TMP" && mv "$TMP" test.txt
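As an aside, if the goal is only to delete the stray bytes rather than re-encode the whole file, forcing the C locale makes sed match single bytes; this is likely why the earlier sed -i 's/\xc2//g' attempt failed under en_US.UTF-8. A sketch, assuming GNU sed:
LC_ALL=C sed -i 's/\xc2//g' test.txt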

(grep) Regex to match non-ASCII characters? (from Windows)

I'm developing a pre-commit hook to avoid committing files with non-ASCII chars. It works well from a Unix system, using the regex below:
grep -P -n '[\x80-\xFF]' /tmp/app.txt
Now the issue that is giving me a lot of pain is that when I commit from Windows, the result of the grep changes, flagging many more characters than just the non-ASCII ones...
Does someone know how to fix this? I have really tried a lot of different things...
strings -n 1 filename will show the normal (printable) characters, and file filename will show the kind of file, but I am afraid neither does quite what you want.
You might try something like:
cat /tmp/app.txt | tr -d "[:print:]\r\n" | wc -c
or, avoiding the cat:
tr -d "[:print:]\r\n" < /tmp/app.txt | wc -c
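Used in a pre-commit hook, that byte count can drive the pass/fail decision directly. A minimal sketch building on the tr pipeline above; pinning LC_ALL=C makes [:print:] mean printable ASCII regardless of the platform's locale, which may also explain the differing results you see on Windows:
# fail the commit when anything outside printable ASCII (plus CR/LF) remains
if [ "$(LC_ALL=C tr -d '[:print:]\r\n' < /tmp/app.txt | wc -c)" -gt 0 ]; then
    echo "non-ASCII characters found in /tmp/app.txt" >&2
    exit 1
fi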

GNU `ls` has a `--quoting-style` option; what's the equivalent in BSD `ls`?

I will use ls output as pipe input, so I need to escape the file names. When I use GNU ls, it works well. What's the equivalent in BSD ls? I am hoping for output like this:
$ gls --quoting-style escape t*1
text\ 1 text1
Why are/were you trying to use ls in a pipeline? You should probably be using find (maybe with -print0 and xargs -0, or -exec).
I suppose you could use ls -1f and then run the output through vis (or some similar filter) with some appropriate options to add the necessary quoting or escaping of your choice, but without knowing what you are feeding filenames into, and what (if any) other options you would want to use with ls, it's impossible to give much better guidance.
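For instance, the find-based approach suggested above sidesteps quoting entirely by delimiting names with NUL bytes, which cannot appear in filenames (a sketch; grep -c 'pattern' is just a stand-in for whatever command your pipeline feeds the names to):
find . -maxdepth 1 -type f -print0 | xargs -0 grep -c 'pattern'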
From the FreeBSD man page on ls there is no such option; however, you can try -m, which will give you comma-separated, streamed output:
-m      Stream output format; list files across the page, separated by commas.
I tried it on OS X and it gave me:
$ ls -m
Hello World, Hello World.txt, foo.txt
That is a lot easier to parse from a script.

Converting this code from R to a shell script?

So I'm running a program that works, but the issue is that my computer is not powerful enough to handle the task. I have the code written in R, and I have access to a supercomputer that runs a Unix system (as one would expect).
The program is designed to read a .csv file and find everything with the unit ft3(monthly total) in the "Units" column and select the value in the column before it. The files are charts that list things in multiple units.
Here is the program in R:
getwd()
setwd("/Users/youruserName/Desktop")
myData= read.table("yourFileName.csv", header=T, sep=",")
funData = subset(myData, units=="ft3(monthly total)", select=units:value)
write.csv(funData, file="funData.csv")
To turn it into a shell script, I tried:
pwd
cd /Users/yourusername/Desktop
touch RunThisProgram
nano RunThisProgram
(((In nano, I wrote)))
if
grep -r yourFileName.csv ft3(monthly total)
cat > funData.csv
else
cat > nofun.csv
fi
control+x (((used control x to close nano)))
chmod +x RunThisProgram
./RunThisProgram
(((It runs for a while)))
We get a funData.csv file output but that file is empty
What am I doing wrong?
It isn't actually running, because there are a couple of problems with your script:
- grep needs the pattern first, and quoted; -r is for recursing a directory
- if without a then
- cat is called wrong, so it is actually reading from stdin
You really only need one line:
grep -F "ft3(monthly total)" yourFileName.csv > funData.csv
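If you only need the value itself rather than the whole matching row, awk can select the neighbouring column. A sketch assuming, hypothetically, that "Units" is the third comma-separated field and the value sits in the field just before it (adjust the field numbers to your file):
awk -F',' '$3 == "ft3(monthly total)" { print $2 }' yourFileName.csv > funData.csv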
