How do I change a shell script's character encoding?

I am using Gina Trapani's excellent todo.sh to organize my to-do list.
However, being a Dane, it would be nice if the script accepted special Danish characters like ø and æ.
I am an absolute UNIX n00b, so it would be a great help if anybody could tell me how to fix this! :)

Slowly, the Unix world is moving from ASCII and other regional encodings to UTF-8. You need to be running a UTF-8-capable terminal, such as a modern xterm or PuTTY.
In your ~/.bash_profile, set your language to one of the UTF-8 variants.
export LANG=C.UTF-8
or
export LANG=en_AU.UTF-8
etc.
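If you are not sure which UTF-8 locales are available on your system, you can list them first (a quick check; exact locale names vary between systems):
locale -a | grep -i utf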
You should then be able to write UTF-8 characters in the terminal, and include them in bash scripts.
#!/bin/bash
echo "UTF-8 is græat ☺"
See also: https://serverfault.com/questions/11015/utf-8-and-shell-scripts

What does this command show?
locale
It should show something like this for you:
LC_CTYPE="da_DK.UTF-8"
LC_NUMERIC="da_DK.UTF-8"
LC_TIME="da_DK.UTF-8"
LC_COLLATE="da_DK.UTF-8"
LC_MONETARY="da_DK.UTF-8"
LC_MESSAGES="da_DK.UTF-8"
LC_PAPER="da_DK.UTF-8"
LC_NAME="da_DK.UTF-8"
LC_ADDRESS="da_DK.UTF-8"
LC_TELEPHONE="da_DK.UTF-8"
LC_MEASUREMENT="da_DK.UTF-8"
LC_IDENTIFICATION="da_DK.UTF-8"
LC_ALL=
If not, you might try doing this before you run your script:
export LANG=da_DK.UTF-8
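You can also set it just for a single command by prefixing it, for example (assuming todo.sh is in the current directory; the task text is only an illustration):
LANG=da_DK.UTF-8 ./todo.sh add "Køb mælk"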
You don't say what happens when you run the script and it encounters these characters. Are they in the todo file? Are they entered at a prompt? Is there an error message? Is something output in place of the expected output?
Try this and see what you get:
read -p "Enter some characters: " string
echo "$string"

Related

.REnviron with special characters

I am having trouble trying to add environment variables to a REnviron file that have special characters. This is on a Debian machine with the file located at /usr/lib/R/etc/Renviron. If my value has a &, I get a weird error when installing packages (although the package installs fine):
REnviron file: TEST_KEY=HEY&X&THERE
Command: install.packages(futures)
Error:
/usr/lib/R/bin/Rcmd: 468: /usr/lib/R/etc/Renviron: THERE: not found
/usr/lib/R/bin/Rcmd: 468: /usr/lib/R/etc/Renviron: X: not found
Which seems like it's because & is a special character. I can fix this by putting quotes around the value, like this: TEST_KEY="HEY&X&THERE". However, at that point I can't figure out how to handle a value that itself has a " in it. For example, if I wanted the value to be HEY&"&THERE, I am not sure how to format that (a backslash in front of the quote didn't work). I tried "HEY&\"&THERE", but that left the \ in the string once loaded into R. Which leads me to my broader question:
How can I ensure that anything that satisfies linux environment variable styling rules works in an REnviron file?
Update: this seems to be a Debian specific issue. You can recreate it using the debian:bullseye-slim docker image, installing R, then editing the Renviron to have a & in it.
Okay, I spent an hour looking into this and I think I have the answer.
In both Ubuntu and Debian (and maybe other systems too), the Renviron file gets sourced by the shell, so what you're typing in the file is exactly shell syntax. You can see the commands in lines 39-40 of Rcmd:
. "${R_HOME}/etc${R_ARCH}/Renviron"
export `sed 's/^ *#.*//; s/^\([^=]*\)=.*/\1/' "${R_HOME}/etc${R_ARCH}/Renviron"`
The first line sources the Renviron file in the shell; the second then exports the variable names taken from lines that contain an =.
So in our case the way to handle this is to put double quotes around all the values, and any double quote within the string should get a \ before it. The reason I didn't realize the solution before I posted the question is that I didn't use cat() when printing my text in R: print() displays the escaping backslash, so it looked like the \ was still in the string, while cat() shows the actual value. So: "HEY&\"&THERE" would be the right way to do it.
To recap:
The Renviron file is executed by the shell.
To handle special characters in strings, use the same quoting you would in the shell (double quotes, with \ to escape literal double quotes).
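As a quick sanity check outside of R, you can reproduce the quoting in a plain shell, since the file is sourced the same way (a sketch; Renviron.test is just an illustrative file name):
cat > Renviron.test <<'EOF'
TEST_KEY="HEY&\"&THERE"
EOF
. ./Renviron.test
printf '%s\n' "$TEST_KEY"    # prints HEY&"&THERE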

How to get a zsh prompt of the form [working-directory] # under macOS

In macOS Catalina (10.15.6), I want to use zsh for Terminal sessions. Formerly I had been using the default bash. For bash, I had a .profile containing the line
export PS1="[\u#\h:\w]$ "
which gave a prompt of the form:
[me#myhost:current-dir]$
I want something similar for zsh, but without the user-name#host-name prefix and with # instead of $ for the actual prompt.
In a zsh Terminal session, the command
PROMPT='[%/]%% '
gives the expected prompt, with the current directory enclosed in square brackets.
Of course I don't want to enter that manually each time. Instead, I want to set this in .zprofile. So in .zprofile I included the line
export PROMPT='[%/]%% '
However, that does not work as expected -- the prompt now has the form:
me#myhost current-dir %
Question: How can I get the zsh prompt to have the desired form as follows?
[current-dir] %
Just add the following export to ~/.zshrc (not ~/.zprofile), otherwise it won't work.
export PROMPT='[%1~] %%'
That will give you the following (my directory name is test-workflow-branch-only):
[test-workflow-branch-only] %
NOTE: This will give you [~] % when you are in the ~/ directory, so don't be alarmed when you see that.
UPDATE - per comment questions
We add it to ~/.zshrc because this file gets sourced by all interactive shells. The file ~/.zprofile is for commands that we want to execute when we log in, so a non-login shell won't source it.
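If you want to confirm which files get sourced, you can compare an interactive non-login shell with a login shell (a quick check, assuming your prompt is set in those files):
zsh -i -c 'print -r -- $PROMPT'       # interactive non-login: sources ~/.zshrc but not ~/.zprofile
zsh -l -i -c 'print -r -- $PROMPT'    # login shell: sources ~/.zprofile as well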
Thanks for the info from Edward Romero. My critique of that answer is that it contains four wasted characters: '[', ']', ' ', '%'. Using PROMPT='%d>' instead yields a nice clear absolute path, something like this:
/Users/myuser/test-workflow-branch-only>
In any case, nice to get this headache behind me, and begin reaping the wonderful benefits of using zsh, whatever they may be.

Replace Â with space in a file

In my file somehow Â is getting added. I am not sure what it is and how it is getting added.
12345A 210 CBCDEM
I want to remove this character from the file. I tried a basic sed command to remove it, but was unsuccessful.
sed -i -e 's/\Â//g'
I also read that dos2unix will do the job, but unfortunately that also didn't work. Assuming it was a hex character, I also tried to remove it using its hex value with sed -i 's/\xc2//g', but that also didn't work.
I really want to understand what this character is and how it is getting added. Moreover, is there a possible way to delete all such characters in a file?
Adding encoding details:
file test.txt
test.txt: ISO-8859 text
echo $LANG
en_US.UTF-8
OS details:
uname -a
Linux vm-testmachine-001 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Regards.
It looks like you have an encoding mismatch between the program that writes the file (some variant of ISO-8859) and the program reading the file (which assumes UTF-8). This is a textbook use case for iconv. In fact, the sample in the man page is almost exactly applicable to your case:
iconv -f iso-8859-1 -t utf-8 test.txt
iconv is a fairly standard program on almost every Unix distribution I have seen, so you should not have any issues here.
Based on the fact that you appear to be writing with English as your primary language, you are probably looking for iso-8859-1, which is the most widely used variant.
If that does not fix your issue, you probably need to find the proper encoding for the output of your database. You can do
iconv -l
to get a list of encodings available for iconv, and use the one that works for you. Keep in mind that the output of file saying ISO-8859 text is not absolute. There is no way to distinguish things like pure ASCII and UTF-8 in many cases. If I am not mistaken, file uses heuristics based on frequencies of character codes in the file to determine the encoding. It is quite liable to make a mistake if the sample is small and/or ambiguous.
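If you want to see exactly which bytes are in the file before converting, you can dump them directly (a quick check; test.txt is the file from above):
od -c test.txt | head                 # non-ASCII bytes show up as octal escapes, e.g. 302 for 0xc2
LC_ALL=C grep -n $'\xc2' test.txt     # list the lines containing the 0xc2 byte (bash $'...' syntax)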
If you want to save the output of iconv and your version supports the -o flag, you can use it. Otherwise, use redirection, but carefully:
TMP=$(mktemp)
iconv -f iso-8859-1 -t utf-8 test.txt > "$TMP" && mv "$TMP" test.txt

unix diff to file

I'm having a little trouble getting the output of diff to write to file. I have a new and old version of a .strings file and I want to be able to write the diff between these two files to a .strings.diff file.
Here's where I am right now:
diff -u -a -B $PROJECT_DIR/new/Localizable.strings $PROJECT_DIR/old/Localizable.strings >> $PROJECT_DIR/diff/Localizable.strings.diff
fgrep + $PROJECT_DIR/diff/Localizable.strings.diff > $PROJECT_DIR/diff/Localizable.txt
The result of the diff command writes to Localizable.strings.diff without any issues, but Localizable.strings.diff appears to be a binary file. Is there any way to output the diff to a UTF-8 encoded file instead?
Note that I'm trying to just get the additions using fgrep in my second command. If there's an easier way to do this, please let me know.
Thanks,
Sean
First, you probably need to identify the encoding of the Localizable.strings files. This might be done in a manner described by How to find encoding of a file in Unix via script(s), for example.
Then you probably need to convert the Localizable.strings files to UTF-8 with a tool like iconv, using commands something like:
iconv -f x -t UTF-8 $PROJECT_DIR/new/Localizable.strings >Localizable.strings.new.utf8
iconv -f x -t UTF-8 $PROJECT_DIR/old/Localizable.strings >Localizable.strings.old.utf8
Where x is the actual encoding in a form recognized by iconv. You can use iconv --list to show all the encodings it knows about.
Then you should be able to diff without having to use -a:
diff -u -B Localizable.strings.old.utf8 Localizable.strings.new.utf8 >Localizable.strings.diff.utf8
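If the goal is just the added lines, one way to pull them out (a sketch; adjust the file names to your layout) is to keep lines starting with + and drop the +++ header:
grep '^+' Localizable.strings.diff.utf8 | grep -v '^+++' > Localizable.txt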

simple shell script in cygwin

#!/bin/bash
echo 'first line' >foo.xml
echo 'second line' >>foo.xml
I am a total newbie to shell scripting.
I am trying to run the above script in cygwin. I want to be able to write one line after the other to a new file.
However, when I execute the above script, I see the following contents in foo.xml:
second line
The second time I run the script, I see in foo.xml:
second line
second line
and so on.
Also, I see the following error displayed at the command prompt after running the script:
: No such file or directory.xml
I will eventually be running this script on a Unix box; I am just trying to develop it using Cygwin. So I would appreciate it if you could point out if this is a Cygwin oddity and, if so, whether I should avoid using Cygwin for developing such scripts.
Thanks in advance.
Run dos2unix on your shell script. That will fix the problem.
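To confirm whether the script really has Windows (CRLF) line endings, you can check it first (a quick check; replace script.sh with the name of your script):
file script.sh            # reports "with CRLF line terminators" if DOS endings are present
cat -A script.sh | head   # a carriage return shows up as ^M before the end-of-line $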
I had the same kind of problem as the original poster: A very simple script file was not working in Cygwin.
Thanks to Don Branson for the clue.
The fix for me was built into the text editor I'm using. (Most programmer's editors have a feature like this.) For example, in my case I'm using Notepad++, which has a menu item to convert the file line endings to Unix-style. From the menu: [Edit]->[EOL Conversion]->[Unix (LF)]
Then the script behaved as expected.
But there must be something else that is wrong here. When I try it, it works as expected.
> foo.xml puts the line into foo.xml, replacing any previous contents.
>> foo.xml appends to the file.
It sounds like you may have a typo somewhere. Also keep in mind that while the Windows command prompt can be forgiving about paths with embedded spaces, cygwin's shells will not be, so if you have a filename that contains embedded spaces, you need to either quote the filename or escape the spaces:
echo 'first line' > 'My File.txt'
echo 'first line' > My\ File.txt
The same goes for certain "special" characters including quotes, ampersand (&), semicolons (;) and generally most punctuation other than period/full-stop (.).
So if you are seeing those issues using the exact script that you are running (i.e. you copy and pasted it, there is no possibility of transcription errors) then something truly strange may be happening that I can't explain. Otherwise, there may be a misplaced space or unquoted character somewhere.
I cannot reproduce your results. The script you quote looks correct, and indeed works as expected in my installation of Cygwin here, producing the file foo.xml containing the lines first line and second line; implying that what you are actually running differs from what you quoted in some way that is causing the problem.
The error message implies some sort of problem with the filename in the first echo line. Do you have some nonprintable characters in the script you are running? Have you missed escaping a space in the filename? Are you substituting shell variables and mistyping the name of the variable, or failing to escape the resulting string?
The above should work normally.
However, you can always use a heredoc:
#!/bin/bash
cat <<EOF > foo.xml
first line
second line
EOF
