UTF-8 support in R on Windows

Since the new option 'Beta: Use Unicode UTF-8 for worldwide language support' was added in Windows 10, I thought it would be possible for R to switch its locale to UTF-8. However, when I try to change the locale to UTF-8 with
Sys.setlocale(locale = "Japanese_Japan.65001")
or
Sys.setlocale(locale = "Japanese_Japan.UTF-8")
I get
In Sys.setlocale("Japanese_Japan.65001") :
OS reports request to set locale to "Japanese_Japan.65001" cannot be honored
Does Windows currently allow R to use UTF-8?
(I am not very familiar with locale problems, so comments are welcome if more information is needed.)
Information:
> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"

UPDATE: The (upcoming) R 4.2.0 should fully support UTF-8 on Windows: https://developer.r-project.org/Blog/public/2021/12/07/upcoming-changes-in-r-4.2-on-windows/
It appears that the R developers have built experimental binaries that fully support UTF-8 on Windows 10. However, as of 2020-07-30 the project was still marked as "experimental", and the official conclusion was:
Based also on this experience, I believe that switching to UCRT is already possible and I expect that building a complete toolchain should take a small number of months. It is I think the only realistic way to support Unicode characters (not representable in native encoding) reliably in R on Windows.
This means that, at the time of that post, full UTF-8 support in R on Windows was still planned for the somewhat more distant future.
Source: https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html
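You can check which of these situations a given installation is in using only base R. Here is a minimal sketch. l10n_info() reports whether the session's locale is UTF-8 and, on Windows, the active code page. getRversion() gives the R version. The helper name utf8_status is hypothetical.

```r
# Hypothetical helper: summarize whether the running R session uses UTF-8.
# l10n_info() and getRversion() are base R; the "codepage" element is only
# present on Windows, so it is reported as NA elsewhere.
utf8_status <- function() {
  info <- l10n_info()
  list(
    r_version    = as.character(getRversion()),
    at_least_4_2 = getRversion() >= "4.2.0",
    utf8_locale  = isTRUE(info[["UTF-8"]]),
    codepage     = if (is.null(info[["codepage"]])) NA_integer_ else info[["codepage"]]
  )
}

str(utf8_status())
```

On a UTF-8 build you would expect utf8_locale to be TRUE. On Windows you would also expect a code page of 65001, the same number used in the locale names above.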

Sys.setlocale(locale = foo) defaults to category = "LC_ALL". Instead, you could try setting each aspect of the locale for the R process individually, e.g. as follows:
locales <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
for (x in locales) {
  Sys.setlocale(category = x, locale = "Japanese_Japan.65001")
}
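The loop above discards the return values of Sys.setlocale(). The minimal sketch below records which categories the OS actually accepted. It relies on the documented behavior that Sys.setlocale() returns an empty string (with a warning) when a request cannot be honored. The helper name try_set_locale is hypothetical. LC_NUMERIC is deliberately left out because of the warning quoted below.

```r
# Hypothetical helper: try to set each locale category separately and report,
# per category, whether the OS accepted the request. Sys.setlocale() returns ""
# (and warns) on failure, so an empty string marks a rejected category.
try_set_locale <- function(locale,
                           categories = c("LC_COLLATE", "LC_CTYPE",
                                          "LC_MONETARY", "LC_TIME")) {
  vapply(categories, function(category) {
    result <- suppressWarnings(Sys.setlocale(category = category, locale = locale))
    nzchar(result)
  }, logical(1))
}

# Example: on a pre-4.2 R for Windows this may return FALSE for LC_CTYPE,
# as in the Czech output in this answer.
# try_set_locale("Czech_Czechia.65001")
```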
Please pay attention to all warnings from the above snippet, and note the following from the help page "locales: Query or Set Aspects of the Locale" (?locales):
Attempts to change the character set (by Sys.setlocale("LC_CTYPE", ), if that implies a different character set) during a session may not work and are likely to lead to some confusion.
Setting "LC_NUMERIC" to any value other than "C" may cause R to function anomalously, so gives a warning.
Almost all the output routines used by R itself under Windows ignore the setting of "LC_NUMERIC" since they make use of the Trio library which is not internationalized.
For instance, my locale is Czech, so I tried the following snippet (the loop above unrolled, to see the results and warnings in sequence):
Sys.getlocale(category = "LC_ALL")
Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_CTYPE" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_TIME" , locale="Czech_Czechia.65001")
Sys.getlocale(category = "LC_ALL")
Output (pasted into the RStudio console):
> Sys.getlocale()
[1] "LC_COLLATE=Czech_Czechia.1250;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.1250;LC_NUMERIC=C;LC_TIME=Czech_Czechia.1250"
> Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_CTYPE" , locale="Czech_Czechia.65001")
[1] ""
Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "Czech_Czechia.65001") :
OS reports request to set locale to "Czech_Czechia.65001" cannot be honored
> Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
Warning message:
In Sys.setlocale(category = "LC_NUMERIC", locale = "Czech_Czechia.65001") :
setting 'LC_NUMERIC' may cause R to function strangely
> Sys.setlocale(category = "LC_TIME" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=Czech_Czechia.65001;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.65001;LC_NUMERIC=Czech_Czechia.65001;LC_TIME=Czech_Czechia.65001"
>

In my opinion, the best way to use R on Windows as of today (August 22, 2020) is to install WSL 2 (Windows Subsystem for Linux) and connect to RStudio Server via a web browser.
Instructions:
Install WSL 2:
https://learn.microsoft.com/en-us/windows/wsl/install-win10
(which requires Windows 10, updated to version 1903 or higher).
If you want a GUI for WSL 2, here are instructions: https://most-useful.com/ubuntu-20-04-desktop-gui-on-wsl-2-on-surface-pro-4/ (but it eats almost all of my RAM and is extremely laggy).
Install R and RStudio Server: https://rstudio.com/products/rstudio/download-server/
Start RStudio Server: sudo rstudio-server start
Open a web browser (I recommend Chrome), go to http://localhost:8787 and log in with your Linux account; RStudio Server will open and run smoothly. I use it in full-screen mode and have even created a desktop shortcut to the address that opens it in full-screen mode by default.

Related

Reading weekday data in R [duplicate]

I’m using R version 2.15.3 (2013-03-01) on Ubuntu 12.10. The system is in German, and so is R. This is inconvenient when searching for error messages.
Starting R in xterm with $ LANG="C" R partially solves the issue: R then displays everything in English. But when I launch RStudio this way, the R interpreter is still in German. So I’m looking for a way to change the locale of R from within R itself.
I found this: How to change language settings in R, but Sys.setenv(LANG = "en") doesn’t work for me:
2+x
# Fehler: Objekt 'x' nicht gefunden  (Error: object 'x' not found)
Sys.setenv(LANG = "en")
2+x
# Fehler: Objekt 'x' nicht gefunden  (Error: object 'x' not found)
I also tried Sys.setenv(LANG = "en_US.UTF-8") with no success.
Output of Sys.getlocale()
Sys.getlocale()
# [1] "LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;
# LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;
# LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;
# LC_IDENTIFICATION=C"
(line breaks added for readability)
I just had the same problem and found a solution that worked for me on Windows/Linux:
Sys.setlocale("LC_ALL","English")
Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')
Sys.setenv(LANG = "en_US.UTF-8")
These worked for me: no more Polish error messages in Eclipse R, though I think only the second one had any effect. Thanks.
Edit: however, I have to run these every time I restart the R environment.
If you want to do this temporarily, you can try starting R from the command line preceded by setting the language in-line:
# start R with LANGUAGE set to Mandarin
LANGUAGE=zh_CN.UTF-8 R --no-save
# do R stuff
q()
# any LANGUAGE set in your env will be unaffected afterwards
env | grep LANGUAGE
On Ubuntu 14.04, this is the solution that worked for me:
Edit the .Renviron file in your home directory and add this line:
LANGUAGE="en_US.utf8"
# for R with British accent use en_GB.utf8
Then restart R.
In my case (macOS High Sierra and Ubuntu 14.04), I could switch the language of R's output to English only with this command (it takes effect immediately, without restarting the R session):
Sys.setenv("LANGUAGE"="EN")
To change the language permanently, either add the above line to your Rprofile.site file (see ?Startup), or create/edit the file .Renviron in your home folder (~/) and add a line such as LANGUAGE=en (or e.g. LANGUAGE="fr_FR.utf8" for French with UTF-8 encoding, which is the default on Linux).
Surprisingly, among so many answers I don't see the one I would prefer myself:
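To switch languages for a single call without permanently changing the session, you can set LANGUAGE temporarily and then restore it. Below is a minimal sketch using only Sys.getenv(), Sys.setenv() and Sys.unsetenv(). The helper name with_language is hypothetical. Whether an already-translated message really switches mid-session may depend on the R version and platform, so treat this as an assumption to verify.

```r
# Hypothetical helper: evaluate an expression with the LANGUAGE environment
# variable temporarily set, then restore the previous value (or unset it).
with_language <- function(lang, expr) {
  old <- Sys.getenv("LANGUAGE", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("LANGUAGE") else Sys.setenv(LANGUAGE = old),
    add = TRUE
  )
  Sys.setenv(LANGUAGE = lang)
  expr  # lazy evaluation: expr is only evaluated here, after LANGUAGE is set
}

# Capture the error message for an undefined object with LANGUAGE set to English
msg <- with_language("en",
                     tryCatch(undefined_object_for_demo + 2, error = conditionMessage))
print(msg)
```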
echo 'LC_ALL=C' >> ~/.Renviron
This appends an environment configuration line to the .Renviron file (creating the file if it doesn't exist), which is meant to be used for exactly this purpose.
After that, any newly started R process should already use the locale specified in .Renviron.
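If you prefer to do this from within R, here is a minimal sketch of a counterpart to the echo command. It only adds the line if it is not already there, so running it twice does not duplicate the setting. The helper name append_renviron is hypothetical. readRenviron() is base R and loads the file's variables into the current session. The locale of an already running session is probably not changed by that, so a restart is still the safe route.

```r
# Hypothetical helper: an R counterpart of  echo 'LC_ALL=C' >> ~/.Renviron
# that adds the line only if the file does not already contain it.
append_renviron <- function(line, path = path.expand("~/.Renviron")) {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (!(line %in% existing)) {
    # Rewrite the file with the new line at the end (writeLines adds newlines),
    # which also avoids gluing onto a last line that lacks a trailing newline.
    writeLines(c(existing, line), path)
  }
  invisible(path)
}

# Usage: add the setting, then optionally load it into the current session.
# path <- append_renviron("LC_ALL=C")
# readRenviron(path)
```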
Try Sys.setlocale("LC_TIME", "English")
Try:
Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')
Taken from: http://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Localization-of-messages which should be consulted for further details.
I had the same problem. I solved it by setting my MacBook's System Preferences > Region to US and then reinstalling R. After that, the language finally changed:
sessionInfo()
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
You just need to:
Open Terminal
Write or paste in: defaults write org.R-project.R force.LANG en_US.UTF-8
Close Terminal and restart R
It worked for me on OS X.
I think this is an issue with your Ubuntu setup, not with R. If the OS does not have a correctly installed "en" locale, R cannot use it. Check which locales the OS provides. Alternatively, using the locale 'C' instead of 'en' may still work:
Sys.setenv(LANG='C')
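To check which locales the OS provides without leaving R, a minimal sketch can call the locale -a command through system(). This assumes a Linux or macOS system, since locale -a is not available on Windows. The helper name list_os_locales is hypothetical.

```r
# Hypothetical helper: list the locales the operating system provides
# (Linux/macOS), optionally filtered by a case-insensitive regular expression.
# It shells out to `locale -a`; on failure it returns an empty character vector.
list_os_locales <- function(pattern = NULL) {
  locs <- tryCatch(
    suppressWarnings(system("locale -a", intern = TRUE)),
    error = function(e) character(0)
  )
  if (!is.null(pattern)) {
    locs <- grep(pattern, locs, value = TRUE, ignore.case = TRUE)
  }
  locs
}

# For example, look for English locales before calling Sys.setlocale():
# list_os_locales("^en")
```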

Setting locale failed

I keep getting the following error message in the R Markdown log:
cropping document_files/figure-latex/ranking_time_output-1.pdf
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LC_CTYPE = "en_NL.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
I've tried numerous things, such as:
Sys.setlocale("LC_ALL", 'en_US.UTF-8')
Sys.setenv(LANG = "en_US.UTF-8")
Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')
in R. However, none of this seems to work.
Do I have to do something on the command line, or is it an issue that I can fix in R? I'm not an expert in either, so I would appreciate help!
RStudio version: 0.99.903, system: Mac OS X 10_11_6
Furthermore I'm located in the Netherlands, but I run everything on my system in English.
LC_CTYPE is set to "en_NL.UTF-8". Such a locale does not exist on Mac OS X (and probably not on any other OS). Try to find where the wrong setting comes from, because it will probably cause other problems, too.
Setting the locale with Sys.setlocale() is useless because Perl is running in a child process created with fork() and exec() and then switches locale based on the process environment.
Setting the environment for the Perl process is probably the right approach but you have to overwrite the erroneous value LC_CTYPE, not LC_ALL:
Sys.setenv(LC_CTYPE = "en_US.UTF-8")
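The difference between the two approaches can be seen directly. Below is a minimal sketch, assuming a Unix-like system where system() runs its command through /bin/sh. A variable set with Sys.setenv() is visible to the child shell, which is the same mechanism a perl child process relies on. Sys.setlocale() does not touch the environment at all.

```r
# Environment variables set with Sys.setenv() are inherited by child processes
# started from R; Sys.setlocale() only changes the running R process.
old_ctype <- Sys.getenv("LC_CTYPE", unset = NA)
Sys.setenv(LC_CTYPE = "en_US.UTF-8")

# On a Unix-like system, system() runs the command through /bin/sh,
# so the child shell sees the variable we just set.
child_value <- system("echo $LC_CTYPE", intern = TRUE)
print(child_value)

# Restore the previous value when done (optional)
if (is.na(old_ctype)) Sys.unsetenv("LC_CTYPE") else Sys.setenv(LC_CTYPE = old_ctype)
```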
You can set those environment variables system-wide by editing /etc/environment:
sudo vi /etc/environment
and adding these lines:
LANG=en_US.utf-8
LC_ALL=en_US.utf-8

Changed locale in R. Reset fails

OS X 10.9.2 + R 3.0.2 and R 3.1.0
I have set the locale in R with
Sys.setlocale(category = "LC_TIME", locale = "C")
because I wanted English weekday names in my plots (the LC_TIME locale was "de_DE.UTF-8").
This worked, but the change has become permanent. Restarting R gives:
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
..
5: Setting LC_MONETARY failed, using "C"
I tried to reset the locale with these commands:
Sys.setlocale(category = "LC_TIME", locale = "")
Sys.setlocale(category = "LC_ALL", locale = "")
In both cases I got a warning:
..
OS reports request to set locale to "" cannot be honored
I also reinstalled R (together with an upgrade from R 3.0.2 to R 3.1.0).
Nothing changed. Maybe the locale settings are stored in a dotfile that is kept when upgrading, but I can't find it.
So if nobody knows a working reset command, a hint about which file stores the locale setting would suffice.
Enforcing the language setting with
system("defaults write org.R-project.R force.LANG de_DE.UTF-8")
or
system("defaults write org.R-project.R force.LANG en_US.UTF-8")
followed by a restart fixed R on my computer. (I tested both settings.)
R starts now without error
The setting is permanent. I.e. I can quit and restart R and the setting "survives".
The Sys.setlocale(..) command sets the locale temporarily
The reset with Sys.setlocale(.., locale = "") works now!
Information on enforcing the language setting can be found in
R -- Help -- R for Mac OS X FAQ -- 7 Internationalization of the R.app:
If you use a non-standard setup .. you can override the auto-detection ...
It is unclear whether it really was the Sys.setlocale() command that corrupted my system or something I did later. It is also unclear whether there is a way to reset the system to its original state; in my eyes that would be a more natural solution than enforcing the language setting.
The OP has promised to post a more complete answer, but it should be noted that this is covered in the R for Mac OS X FAQ:
7 Internationalization of the R.app
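For the original goal (English weekday names in plots), you can avoid leaving LC_TIME changed at all. Save the current value and restore exactly that value afterwards, rather than calling Sys.setlocale(locale = ""), which in this thread failed once the environment was broken. Below is a minimal sketch using only base R. The helper name with_locale is hypothetical. It does not repair a broken startup; the force.LANG setting above addresses that.

```r
# Hypothetical helper: evaluate an expression under a temporary locale
# category and always restore the previous explicit value afterwards,
# instead of relying on Sys.setlocale(locale = "") to reset it.
with_locale <- function(category, locale, expr) {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  if (!nzchar(Sys.setlocale(category, locale))) {
    stop("could not set ", category, " to ", locale)
  }
  expr  # evaluated lazily, i.e. under the temporary locale
}

# English weekday names regardless of the session's LC_TIME
with_locale("LC_TIME", "C", format(as.Date("2014-05-05"), "%A"))  # returns "Monday"
```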

How to read Chinese in RStudio on Linux

I encountered an issue when reading a Chinese file with RStudio on a Linux system.
The error is below:
dt <- read.csv(file = "/home/..../aa-0912.csv", header = T , sep=",")
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<be><ba><b5><c3><c8><cb>'
This CSV file was written by RStudio on a Windows system without specifying an encoding, as below:
write.csv(file = "/home/.../aa-0912.csv", data)
I can read it correctly on Windows, but when I copy the file to my Linux system, read.csv doesn't work.
The locale on Linux is :
Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
The locale on Windows is:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
I tried reading the data with encoding="utf-8", but got a similar error message.
Any help?
I'm not sure that this is the answer to your question.
I'll try to be as general as possible so that people having trouble in any language might have a solution:
First, running locale -a in a terminal displays all the locales available on your system.
Once you have found the right locale, run this in RStudio:
Sys.setlocale("LC_ALL","fr_FR.utf8")
Sorry, I don't seem to have any Chinese locale on my system. Other people have had the same issue: here and here.
Also have a look at ?Sys.setlocale in R.
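A different angle is to tell read.csv() which encoding the file's bytes are in, via its fileEncoding argument, instead of changing the locale. This assumes the file was saved in a Chinese Windows code page such as GBK/CP936. The question does not say so; the all-high bytes in the error message only make a double-byte encoding plausible. Below is a minimal sketch that tries a few candidate encodings in turn. The helper name read_csv_guess_encoding and the candidate list are hypothetical.

```r
# Hypothetical helper: try reading a CSV file with several candidate
# encodings and return the first attempt that finishes without errors
# or warnings. The candidate list is an assumption; adjust it to your data.
read_csv_guess_encoding <- function(path,
                                    encodings = c("UTF-8", "GBK", "GB18030")) {
  for (enc in encodings) {
    df <- tryCatch(
      read.csv(path, fileEncoding = enc, check.names = FALSE),
      error   = function(e) NULL,  # e.g. no readable lines
      warning = function(w) NULL   # e.g. invalid input for this encoding
    )
    if (!is.null(df)) {
      return(list(data = df, encoding = enc))
    }
  }
  stop("none of the candidate encodings could read ", path)
}

# res <- read_csv_guess_encoding("/home/..../aa-0912.csv")
# res$encoding  # which candidate worked
```

Treating any warning as a failure is deliberately strict. A file that only triggers a harmless warning will be rejected for every candidate, so relax the warning handler if that happens.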
