Encoding of cyrillic symbols in R - r

all.
I have to use cyrillic symbols, but I have trouble with it. When I print it in console everything is fine. But in view function and write to .csv characters are unreadable. Look at picture

Related

how to export unicode character to csv file in R? Is it possible or impossible?

I have text that contains Unicode.
Export this text to CSV.
I don't want to see Unicode in CSV files created in R.
This is the Unicode in question.
<U+00A0>
This unicode will appear blank when exported to an xlsx file.
However, when exporting as csv, it comes out as Unicode. It looks like <U+00A0> in the csv file.
How can solve this problem. and I want to know it is possible.
I tried changing the encoding option of the write.table function.
I tried using the iconv function.
But it was not resolved.

RStudio character encoding

I am trying to view characters of multiple of languages in RStudio. What I find unusual is I am able to view these in the console, but not in the viewer. UTF-8 encoded characters appear like 'U+3042', 'U+500B', etc. in the viewer.
Is there a way to get the viewer to display the actual characters instead of the encoded character?
Here are a couple of images showing what I mean -
In console: https://ibb.co/T0681H7
In viewer: https://ibb.co/QnxF25c
This is a known issue in RStudio. Feel free to comment/upvote here:
https://github.com/rstudio/rstudio/issues/4193

Corrupted Rmarkdown script: How can I get the Cyrillic characters back?

I was working with a script with lots of Cyrillic characters (throughout chunks and out of them) for weeks. One day I have opened a new Rmarkdown script where I wrote English, while the other document is still in my R session. Afterwards, I have returned to the Cyrillic document and everything written turns to something like this 8 иÑлÑ 1995 --> ÐлаÑÑÑ - наÑодÑ
The question is: Where is the source of problem? And, how can the corrupted script turn to its original form (with the Cyrillic characters)?
UPDATE!!
I have tried reopeining the Rstudio scrip with encoding CP1251, CP1252, windows1251 and UTF8, but it does not work. Certaintly the weird symbols change to another weird symbols. The problem is that I have saved the document with the default encoding CP1251 and windows1251) at the very begining.
Solution:
If working with cyrillic and lating characters, be sure you save the Rstudio script with UTF-8 encoding always, when you computer is windows (I do not know mac). If you close the script and open it again, re-open the file with UTF8 encoding.
Assuming you're using RStudio: Open your *.Rmd file and then try to reopen it "with encoding". Therefore simply use the File-Menu as shown below.
Select "Show all encodings" and choose your specific encoding, I suggest windows-1251 for cyrillic encoding:
Note: Apparently the issue can also occur while at the one time opening the *.Rmd file as "standalone" and at the other time from within an R Project.
Hope that would help.

RStudio - Weird characters becoming regular characters

I'm working with a messy database, in which I need to give format to some columns of data. For this, I use a lot of GSub and other forms of regular expressions. My problem is some of the characters I need to clean are "weird" characters, specially the A with the curly thing above followed by other weird character (Ñ).
When I copy from the database and then paste on my gsub function:
gsub("CALLÑE", "CALLE", data)
It works fine until I close and RStudio and reopen it. Then the characters are different in the RScript file. It is as if RStudio didn't support weird characters itself, and removes them from the Scripts when they are reopened:
gsub("CALLÃ'E", "CALLE", data)
How can I avoid this? And keep my weird characters even after closing the file.
In RStudio, go to File -> Save with Encoding...
Select UTF-8 option.

Displaying UTF-8 encoded Chinese characters in R

I try to open a UTF-8 encoded .csv file that contains (traditional) Chinese characters in R. For some reason, R displays the information sometimes as Chinese characters, sometimes as unicode characters.
For instance:
data <-read.csv("mydata.csv", encoding="UTF-8")
data
will produce unicode characters, while:
data <-read.csv("mydata.csv", encoding="UTF-8")
data[,1]
will actually display Chinese characters.
If I turn it into a matrix, it will also display Chinese characters, but if I try to look at the data (command View(data) or fix(data)) it is in unicode again.
I've asked for advice from people who use a Mac (I'm using a PC, Windows 7), and some of them got Chinese characters throughout, others didn't. I tried to save the original data as a table instead and read it into R this way - same result. I tried running the script in RStudio, Revolution R, and RGui. I tried to adjust the locale (e.g. to chinese), but either R didn't let me change it or else the result was gibberish instead of unicode characters.
My current locale is:
"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
Any help to get R to consistently display Chinese characters would be greatly appreciated...
Not a bug, more a misunderstanding of the underlying type system conversions (the character type and the factor type) when constructing a data.frame.
You could start first with data <-read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE) which will make your Chinese characters to be of the character type and so by printing them out you should see waht you are expecting.
#nograpes: similarly x=c('中華民族');x; y <- data.frame(x, stringsAsFactors=FALSE) and everything should be ok.
In my case, the utf-8 encoding does not work in my r. But the Gb* encoding works.The utf8 wroks in ubuntu. First you need to figure out the default encoding in your OS. And encode it as it is. Excel can not encode it as utf8 properly even it claims that it save as utf8.
(1) Download 'Open Sheet' software.
(2) Open it properly. You can scroll the encoding method until you
see the Chinese character displayed in the preview windows.
(3) Save it as utf-8(if you want utf-8). (UTF-8 is not solution to every problem, you HAVE TO know the default encoding in your system first)

Resources