How to read Chinese in RStudio on Linux

I encountered an issue when reading a Chinese file with RStudio on a Linux system.
The error is below:
dt <- read.csv(file = "/home/..../aa-0912.csv", header = T , sep=",")
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<be><ba><b5><c3><c8><cb>'
This CSV file was written by RStudio on a Windows system without specifying an encoding:
write.csv(file = "/home/.../aa-0912.csv", data)
I can read it correctly on Windows, but when I copy the file to my Linux system, read.csv
doesn't work.
The locale on Linux is:
Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
The locale on Windows is:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
I tried reading the data with encoding="utf-8", but I got a similar error message.
Any help?
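The bytes quoted in the error message ('<be><ba><b5><c3><c8><cb>') are not valid UTF-8, which would explain why encoding="utf-8" does not help. They appear to be GBK/GB18030-encoded Chinese text. A minimal sketch, assuming the file really is GBK-encoded: decode the bytes with iconv() to confirm, then read the file with fileEncoding. According to ?read.table, the encoding argument only marks strings and does not re-encode the input, whereas fileEncoding does. The helper name read_gbk_csv is hypothetical.

```r
# Bytes copied from the error message: '<be><ba><b5><c3><c8><cb>'
bytes <- as.raw(c(0xbe, 0xba, 0xb5, 0xc3, 0xc8, 0xcb))

# These bytes do not form valid UTF-8, so marking the input as UTF-8 cannot help
is_utf8 <- validUTF8(rawToChar(bytes))

# Decode them assuming GB18030 (a superset of GBK/GB2312) and convert to UTF-8
header <- iconv(list(bytes), from = "GB18030", to = "UTF-8")
print(header)

# Hypothetical helper: fileEncoding re-encodes the file while it is read,
# whereas read.csv(encoding = ...) only marks strings without converting them
read_gbk_csv <- function(path) {
  read.csv(path, header = TRUE, fileEncoding = "GB18030")
}
# dt <- read_gbk_csv("/home/..../aa-0912.csv")
```

If the GBK assumption holds, header decodes to the column name 竞得人; if it does not, try the encoding of the Windows machine that wrote the file.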

I'm not sure that this is the answer to your question.
I'll try to be as general as possible so that people having trouble in any language might have a solution:
First, in the terminal, locale -a displays all the locales available on your system.
Once you have found the right locale, set it in RStudio:
Sys.setlocale("LC_ALL","fr_FR.utf8")
Sorry, I don't seem to have a Chinese locale on my system. Other people have had the same issue: here and here.
Also have a look at ?Sys.setlocale in R.
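The two steps above (list the installed locales, then set one) can also be done from inside R. A minimal sketch, assuming a Linux or macOS system where the locale command exists; the zh_CN name pattern is an assumption, since distributions spell it zh_CN.utf8 or zh_CN.UTF-8.

```r
# List the locales installed on this system; fall back to character(0)
# if the `locale` command is unavailable (e.g. on Windows)
list_locales <- function() {
  tryCatch(
    suppressWarnings(system("locale -a", intern = TRUE)),
    error = function(e) character(0)
  )
}

locales <- list_locales()

# Look for a Simplified Chinese UTF-8 locale such as "zh_CN.utf8" or "zh_CN.UTF-8"
zh <- grep("^zh_CN\\.(utf8|UTF-8)$", locales, value = TRUE)

if (length(zh) > 0) {
  Sys.setlocale("LC_ALL", zh[1])
} else {
  message("No zh_CN UTF-8 locale installed; it has to be added at the OS level first")
}
```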

Related

Reading weekday data in R [duplicate]

I’m using R version 2.15.3 (2013-03-01) on Ubuntu 12.10. The system is in German, and so is R. This is inconvenient when searching for error messages.
Executing R in xterm this way $ LANG="C" R partially solves the issue. Then R displays everything in English. But when loading RStudio this way, the R interpreter is still in German. So I’m looking for a way to change the locale of R in R itself.
I found this: How to change language settings in R, but Sys.setenv(LANG = "en") doesn't work for me:
2+x
# Fehler: Objekt 'x' nicht gefunden
Sys.setenv(LANG = "en")
2+x
# Fehler: Objekt 'x' nicht gefunden
I also tried Sys.setenv(LANG = "en_US.UTF-8") with no success.
Output of Sys.getlocale()
Sys.getlocale()
# [1] "LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;
# LC_COLLATE=de_DE.UTF-8;LC_MONETARY=de_DE.UTF-8;LC_MESSAGES=de_DE.UTF-8;
# LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;
# LC_IDENTIFICATION=C"
(line breaks added for convenience)
Just had the same problem and found the solution that worked for me on Windows/Linux:
Sys.setlocale("LC_ALL","English")
Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')
Sys.setenv(LANG = "en_US.UTF-8")
These worked for me: no more Polish error messages in Eclipse R, though I think only the second one had any effect.
Edit: I have to execute these every time I restart the R environment.
If you want to do this temporarily, you can try starting R from the command line preceded by setting the language in-line:
# start R with LANGUAGE set to Mandarin
LANGUAGE=zh_CN.UTF-8 R --no-save
# do R stuff
q()
# any LANGUAGE set in your env will be unaffected afterwards
env | grep LANGUAGE
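The same one-off trick can be driven from an R session. A sketch, assuming a Unix-alike where system2()'s env argument prefixes the variable to the child's command line; it starts a separate Rscript with LANGUAGE=fr, captures the child's error message, and shows the parent session is untouched. The child's message is in French only if French translations are installed.

```r
# Remember the parent's LANGUAGE value to show it is not changed
before <- Sys.getenv("LANGUAGE")

# Rscript belonging to the currently running R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Expression run by the child: trigger an error and print its message
expr <- "tryCatch(2 + x, error = function(e) writeLines(conditionMessage(e)))"

# env = sets LANGUAGE=fr for the child process only
out <- system2(rscript, args = c("-e", shQuote(expr)),
               env = "LANGUAGE=fr", stdout = TRUE)
print(out)

# The parent session still has its original LANGUAGE value
print(identical(Sys.getenv("LANGUAGE"), before))
```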
In Ubuntu (14.04) this is the solution that worked for me:
Edit the .Renviron file in your home directory and add this line:
LANGUAGE="en_US.utf8"
# for R with British accent use en_GB.utf8
Then restart R.
In my case (OSX High Sierra + Ubuntu 14.04) I could switch the language of R output to English only by using this command (with immediate effect, without restarting the R session):
Sys.setenv("LANGUAGE"="EN")
To permanently change the language either add the above line to your Rprofile.site file (see ?Startup) or create/edit the file .Renviron in your home folder (~/) and enter a line with LANGUAGE=en or similar (like LANGUAGE="fr_FR.utf8" for French with UTF-8 encoding which is used by default in Linux).
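A small sketch of the in-session switch, which also checks the result by capturing an error message instead of letting it stop the script and restores the previous value afterwards. The object name undefined_object is just a deliberately missing variable.

```r
# Remember the current value so it can be restored later
old_language <- Sys.getenv("LANGUAGE", unset = NA)

# Switch message translations to English for the running session
Sys.setenv(LANGUAGE = "en")

# Trigger an error on purpose and capture its message
msg <- tryCatch(2 + undefined_object, error = function(e) conditionMessage(e))
print(msg)

# Restore the previous setting (unset it if it was not set before)
if (is.na(old_language)) Sys.unsetenv("LANGUAGE") else Sys.setenv(LANGUAGE = old_language)
```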
Surprisingly, among so many answers I don't see the one I would prefer myself.
echo 'LC_ALL=C' >> ~/.Renviron
This appends an environment configuration line to the .Renviron file (creating the file if it doesn't exist), which is meant to be used exactly for this purpose.
After that, any newly started R process should already have the locale specified in .Renviron.
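The same edit can be made from R without creating duplicate lines. A sketch using a hypothetical helper, append_renviron(), demonstrated on a temporary file rather than the real ~/.Renviron; readRenviron() then loads the file into the current session. A locale variable such as LC_ALL is only applied when R starts, so for the line in the answer above a restart is still needed; the demo uses LANGUAGE, which is read when messages are translated.

```r
# Hypothetical helper: append `line` to an Renviron-style file unless an
# identical line is already present (the file is created if missing)
append_renviron <- function(line, path = "~/.Renviron") {
  path <- path.expand(path)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  if (!line %in% existing) {
    cat(line, "\n", file = path, append = TRUE, sep = "")
  }
  invisible(path)
}

# Demonstrate on a temporary file instead of the real ~/.Renviron
tmp <- tempfile()
append_renviron("LANGUAGE=en", tmp)
append_renviron("LANGUAGE=en", tmp)  # second call does nothing

# Load the file into the running session
readRenviron(tmp)
print(Sys.getenv("LANGUAGE"))
```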
Try Sys.setlocale("LC_TIME", "English")
Try:
Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')
Taken from: http://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Localization-of-messages which should be consulted for further details.
I had the same problem. I solved it by changing my MacBook's System Preferences > Region to US and then reinstalling R. After that, the language finally changed.
sessionInfo()
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
You just need to:
Open Terminal.
Write or paste in: defaults write org.R-project.R force.LANG en_US.UTF-8
Close Terminal and restart R.
It worked for me in OS X.
I think this is an issue with your Ubuntu installation, not R. If the OS does not have a correct "en" locale installed, R cannot use it. Check the OS locales, or try locale 'C' instead of 'en', which may still work.
Sys.setenv(LANG='C')

R Console Language Setting [duplicate]

My error messages are displayed in French. How can I change my system language setting so the error messages will be displayed in English?
You can set this using the Sys.setenv() function. My R session defaults to English, so I'll set it to French and then back again:
> Sys.setenv(LANG = "fr")
> 2 + x
Erreur : objet 'x' introuvable
> Sys.setenv(LANG = "en")
> 2 + x
Error: object 'x' not found
A list of the abbreviations can be found here.
Sys.getenv() gives you a list of all the environment variables that are set.
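Rather than scanning the full Sys.getenv() listing, you can query just the variables that usually control R's language. Which variables matter on a given system is an assumption here; the four below are the ones discussed in this thread.

```r
# Query only the variables that influence R's language and locale;
# unset variables come back as ""
vars <- Sys.getenv(c("LANG", "LANGUAGE", "LC_ALL", "LC_MESSAGES"))
print(vars)
```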
In the case of RStudio for Windows, I succeeded in changing the language by following the instructions in the R for Windows FAQ; in particular, I wrote:
language = EN
inside the file Rconsole (in my installation it is C:\Program Files\R\R-2.15.2\etc\Rconsole); this also works for the Rscript command.
For example, you can locate the Rconsole file with these two commands from a command prompt:
cd \
dir Rconsole /s
The first makes the root the current directory; the second searches for the Rconsole file.
In the following screenshot, the Rconsole file is in the folder C:\Program Files\R\R-3.4.1\etc.
You may find more than one location; in that case, you may edit all the Rconsole files.
Then open the Rconsole file with your favorite editor, find the line language = and append EN to the end of that line.
In the following screenshot, the relevant line is number 70, and you have to append EN to the end of it.
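Instead of searching the whole drive, you can ask the running R for its own etc directory. A sketch, assuming the Rconsole file of interest is the one in R.home("etc") of the installation you are using (a per-user Rconsole, if you have one, is not checked here).

```r
# R.home("etc") is the etc directory of the running R installation,
# e.g. C:\Program Files\R\R-3.4.1\etc on Windows
rconsole <- file.path(R.home("etc"), "Rconsole")

if (file.exists(rconsole)) {
  # Show the current language line(s) before editing
  print(grep("^[[:space:]]*language", readLines(rconsole), value = TRUE))
} else {
  message("No Rconsole file at ", rconsole, " (it is used by Rgui on Windows)")
}
```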
For Mac users, I found this in the R for Mac FAQ:
If you use a non-standard setup (e.g. different language than
formats), you can override the auto-detection performed by setting
`force.LANG' defaults setting, such as for example
defaults write org.R-project.R force.LANG en_US.UTF-8
when run in Terminal it will enforce US-english setting regardless of the system
setting. If you don't know what Terminal is you can use this R command
instead:
system("defaults write org.R-project.R force.LANG en_US.UTF-8")
but do not forget to quit R and start R.app again afterwards. Please
note that you must always use `.UTF-8' version of the locale,
otherwise R.app will not work properly.
This helped me to change my console language from Chinese to English.
This works from the command line:
$ export LANG=en_US.UTF-8
None of the other answers above worked for me
If you use Ubuntu, set
LANGUAGE=en
in /etc/R/Renviron.site.
You may also want to be aware of the difference between, for example, Sys.setenv(LANG = "ru") and Sys.setlocale(locale = "ru_RU.utf8").
> Sys.setlocale(locale = "ru_RU.utf8")
[1] "LC_CTYPE=ru_RU.utf8;LC_NUMERIC=C;LC_TIME=ru_RU.utf8;LC_COLLATE=ru_RU.utf8;LC_MONETARY=ru_RU.utf8;LC_MESSAGES=en_IE.utf8;LC_PAPER=en_IE.utf8;LC_NAME=en_IE.utf8;LC_ADDRESS=en_IE.utf8;LC_TELEPHONE=en_IE.utf8;LC_MEASUREMENT=en_IE.utf8;LC_IDENTIFICATION=en_IE.utf8"
If you are interested in changing the behaviour of functions that refer to one of these elements (e.g. strptime to extract dates), you should use Sys.setlocale().
See ?Sys.setlocale for more details.
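To illustrate the difference: day and month names produced by format() and strptime() follow LC_TIME, which Sys.setlocale() changes, while setting LANG with Sys.setenv() in a running session does not re-run the locale setup. A minimal sketch using the "C" locale, which is available on every system and uses English names; 2024-01-01 is just a sample date.

```r
# Save the current LC_TIME setting so it can be restored
old_time <- Sys.getlocale("LC_TIME")

# "C" is available everywhere and uses English day and month names
Sys.setlocale("LC_TIME", "C")
weekday <- format(as.Date("2024-01-01"), "%A")
print(weekday)

# Restore the original time locale
Sys.setlocale("LC_TIME", old_time)
```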
In order to see all available languages on a Linux system, you can run
system("locale -a", intern = TRUE)
To make it work permanently in both R and RStudio (on Windows 10),
one way is to have a script run automatically in the background at every startup.
No more changing the system language, which affects all of Windows.
No more fixes that work in R but fail in RStudio. No more running a script manually every time. No more failing despite admin rights. No more shortcut settings that fail.
Step 1.
Use your system search to find the file named "Rprofile".
My results were:
C:\Program Files\R\R-4.0.5\library\base\R
C:\Program Files\R\R-4.0.5\etc
Step 2.
Edit C:\Program Files\R\R-4.0.5\library\base\R\Rprofile
The content:
This is the system Rprofile file. It is always run on startup.
Additional commands can be placed in site or user Rprofile files
(see ?Rprofile)
... and so on.
Step 3. Add Sys.setenv(LANGUAGE="en") at the end of the script:
local({
Sys.setenv(LANGUAGE="en")
})
P.S. If you run into permission problems when saving,
move the file to the desktop, edit it there, and then replace the original file.
Type this first:
system("defaults write org.R-project.R force.LANG en_US.UTF-8")
Then you will get a number (in my case, 127).
Then type:
Sys.setenv(LANG = "en")
Then type the number and press Enter:
127
What worked for me:
Sys.setlocale("LC_MESSAGES", "en_US.utf8")
Testing:
> Sys.setlocale("LC_MESSAGES", "en_US.utf8")
[1] "en_US.utf8"
> x[3]
Error: object 'x' not found
This also works to get English messages:
Sys.setlocale("LC_MESSAGES", "C")
To reset to German messages, I used:
Sys.setlocale("LC_MESSAGES", "de_DE.utf8")
Here is the start of my sessionInfo:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
A simple solution is to set export LANG=C in your bash script (environment variable names are case-sensitive, so Lang will not work).
I had a similar issue where the default language was German, and this switched it back to English.
If you want to change R's language in the terminal to English permanently, this works for me on macOS:
Open Terminal.app and run:
touch .bash_profile
Then run:
open -a TextEdit.app .bash_profile
These two commands open the .bash_profile file in TextEdit.
Add this to the .bash_profile file:
export LANG=en_US.UTF-8
Then save the file, reopen Terminal, and type R; you will find its language has changed to English.
To restore the original language, simply add a # before export LANG=en_US.UTF-8.
The only thing that worked for me was uninstalling R entirely (make sure to remove it from Program Files as well) and reinstalling it, but unselecting Message Translations during the installation process. When I installed R, and subsequently Rcmdr, it finally came up in English.
In Ubuntu 14.04 LTS, I had to remove the # from the commented-out line #LANGUAGE=EN.
None of the other options worked for me.
In Windows, change your current regional format in the Time & Language settings: click the time/date in the lower right corner > Adjust date/time > Region > change the regional format to UK or US.
This worked for me on a Windows 10 laptop in German, where I wanted, e.g., lubridate to return dates in English:
Sys.setlocale("LC_TIME", "English")
I'm using RStudio on a Mac and couldn't find the Rconsole file, so I took a more brutal approach and simply deleted the unwanted language files from the R app.
Go to R.app in your Applications folder, right-click, choose Show Package Contents, then open Contents/Resources/. The language files are there, e.g. English.lproj, or in my case de.lproj, which I deleted. After restarting R, error messages appear in English.
Maybe that's helpful!
You simply have to change the Windows display language on your computer!
Press the Windows key together with R, and type the following into the window that opens:
control.exe /name Microsoft.Language
Load the language pack you want to use and change the options. But take care: this will also change your keyboard layout!
On Windows, when you don't have admin rights, just create a new program shortcut to Rgui.exe. Then, in the properties of that shortcut, go to the 'Shortcut' tab and modify the target to include the language of your choice, e.g. "C:\Program Files\R\R-3.5.3\bin\x64\Rgui.exe" LANGUAGE=en

UTF-8 support in R on Windows

Since the new option 'Beta: Use Unicode UTF-8 for worldwide language support' was added in Windows 10, I thought it would be possible for R to switch its locale to UTF-8. However, when I try to change the system locale to UTF-8 with
Sys.setlocale(locale = "Japanese_Japan.65001")
or
Sys.setlocale(locale = "Japanese_Japan.UTF-8")
I get
In Sys.setlocale("Japanese_Japan.65001") :
OS reports request to set locale to "Japanese_Japan.65001" cannot be honored
For now, does Windows allow R to use UTF-8?
(Because I am not very familiar with locale problem, I welcome comments if there should be more information.)
Information:
> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"
UPDATE: The (upcoming) R 4.2.0 should fully support UTF-8 on Windows: https://developer.r-project.org/Blog/public/2021/12/07/upcoming-changes-in-r-4.2-on-windows/
It appears that R has built experimental binaries that fully support UTF-8 on Windows 10, but the project was marked as "experimental" as of 2020-07-30, and the official conclusion was:
Based also on this experience, I believe that switching to UCRT is already possible and I expect that building a complete toolchain should take a small number of months. It is I think the only realistic way to support Unicode characters (not representable in native encoding) reliably in R on Windows.
This clearly means that, at that point, full UTF-8 support in R on Windows was still planned for a somewhat more distant future.
Source: https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html
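Before experimenting with Sys.setlocale(), it may help to check what the running session already uses. A sketch, assuming only that l10n_info() reports the native encoding; the R 4.2.0 threshold comes from the blog post in the update above, and the snippet just reports the facts rather than deciding anything.

```r
# l10n_info() reports whether the current session's native encoding is UTF-8
info <- l10n_info()
native_utf8 <- isTRUE(info[["UTF-8"]])

# The update above points to R 4.2.0 as the first release with full
# UTF-8 support on Windows
is_windows <- .Platform$OS.type == "windows"
recent_r <- getRversion() >= "4.2.0"

cat("Windows:", is_windows, " R >= 4.2.0:", recent_r,
    " native UTF-8:", native_utf8, "\n")
```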
Sys.setlocale(locale = foo) defaults to category = "LC_ALL"; you could instead set aspects of the locale for the R process individually, e.g. as follows:
locales <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
for (x in locales) Sys.setlocale(category = x, locale = "Japanese_Japan.65001")
Please note all warnings from the above code snippet, as well as these notes from the R help page
"locales: Query or Set Aspects of the Locale":
Attempts to change the character set (by Sys.setlocale("LC_CTYPE", ), if that implies a different character set) during a session may not work and are likely to lead to some confusion.
Setting "LC_NUMERIC" to any value other than "C" may cause R to function anomalously, so gives a warning.
Almost all the output routines used by R itself under Windows ignore the setting of "LC_NUMERIC" since they make use of the Trio library which is not internationalized.
For instance, my locale is Czech, so I tried the following code snippet (the loop above unrolled, to see the results and warnings in sequence):
Sys.getlocale(category = "LC_ALL")
Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_CTYPE" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_TIME" , locale="Czech_Czechia.65001")
Sys.getlocale(category = "LC_ALL")
Output (pasted into the RStudio console):
> Sys.getlocale()
[1] "LC_COLLATE=Czech_Czechia.1250;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.1250;LC_NUMERIC=C;LC_TIME=Czech_Czechia.1250"
> Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_CTYPE" , locale="Czech_Czechia.65001")
[1] ""
Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "Czech_Czechia.65001") :
OS reports request to set locale to "Czech_Czechia.65001" cannot be honored
> Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
Warning message:
In Sys.setlocale(category = "LC_NUMERIC", locale = "Czech_Czechia.65001") :
setting 'LC_NUMERIC' may cause R to function strangely
> Sys.setlocale(category = "LC_TIME" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=Czech_Czechia.65001;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.65001;LC_NUMERIC=Czech_Czechia.65001;LC_TIME=Czech_Czechia.65001"
>
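The trial above can be wrapped so that each category is attempted separately and the outcome is reported instead of read off the warnings. A sketch using a hypothetical helper, try_set_locale(); it leaves out LC_NUMERIC on purpose because of the warning quoted above, and relies on Sys.setlocale() returning "" when the OS refuses a request.

```r
# Hypothetical helper: try to set each locale category separately and report
# which ones the OS accepted (LC_NUMERIC is deliberately not included)
try_set_locale <- function(locale,
                           categories = c("LC_COLLATE", "LC_CTYPE",
                                          "LC_MONETARY", "LC_TIME")) {
  vapply(categories, function(category) {
    # Sys.setlocale() returns "" (with a warning) when the request is refused
    result <- suppressWarnings(Sys.setlocale(category = category, locale = locale))
    nzchar(result)
  }, logical(1))
}

# On Windows one might try, e.g.:
# try_set_locale("Czech_Czechia.65001")
ok <- try_set_locale("C")
print(ok)
```

With the Czech example above, the result would be expected to show FALSE for LC_CTYPE and TRUE for the others, matching the warnings.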
The best way to use R on Windows as of today (August 22nd, 2020) is to install WSL 2 (Windows Subsystem for Linux) and connect to RStudio Server via a web browser.
Instructions:
Install WSL 2:
https://learn.microsoft.com/en-us/windows/wsl/install-win10
(which requires Windows 10, updated to version 1903 or higher).
If you want a GUI for WSL 2, here are instructions: https://most-useful.com/ubuntu-20-04-desktop-gui-on-wsl-2-on-surface-pro-4/ (but it eats almost all of my RAM and is extremely laggy)
Install R and RStudio Server: https://rstudio.com/products/rstudio/download-server/
Start RStudio Server: sudo rstudio-server start
Open a web browser (I recommend Chrome), go to http://localhost:8787 and log in with your Linux account; RStudio Server will open and run smoothly. I use it in full-screen mode and even created a desktop shortcut for the address that opens it in full-screen mode by default.

source() not recognising characters of Sys.setlocale in R

When I run my source() line with encoding = "UTF-8", R gives me an error message:
INCOMPLETE_STRING
The script stops at the first special character (ö).
Before running the script, I set (on a Windows 10 PC):
Sys.setlocale("LC_ALL", "German")
[1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"
Running the code manually works when I open the script with UTF-8 encoding. If I understood correctly, I cannot set the locale to UTF-8 on a Windows PC, right? What can I do?
I think I found the answer myself.
encoding = "windows-1252"
is the correct encoding for source(), although I saved the file as UTF-8. Very strange! I hope it helps someone else.
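One possible explanation is that the file was not actually saved as UTF-8. A sketch that checks the raw bytes before choosing the encoding for source(); the helpers guess_encoding() and source_guessed() are hypothetical, and the two-way choice between UTF-8 and windows-1252 is an assumption that fits this question only.

```r
# Hypothetical helper: report "UTF-8" if the file's raw bytes are valid UTF-8,
# otherwise assume Windows-1252
guess_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  if (validUTF8(rawToChar(bytes))) "UTF-8" else "windows-1252"
}

# Hypothetical wrapper that passes the guessed encoding on to source()
source_guessed <- function(path, ...) {
  source(path, encoding = guess_encoding(path), ...)
}

# Two tiny scripts containing the letter ö in different encodings
utf8_file <- tempfile(fileext = ".R")
writeBin(c(charToRaw('x <- "'), as.raw(c(0xc3, 0xb6)), charToRaw('"\n')), utf8_file)

cp1252_file <- tempfile(fileext = ".R")
writeBin(c(charToRaw('x <- "'), as.raw(0xf6), charToRaw('"\n')), cp1252_file)

print(guess_encoding(utf8_file))
print(guess_encoding(cp1252_file))
```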
