Read Japanese characters in RStudio - r

I have a Japanese text CSV file separated by tabs.
It was written in UTF-8 using the Python csv package.
However, when I import it in RStudio with the command below:
A <- read.csv("reviews4.csv", sep = "\t", header = FALSE, encoding = "UTF-8")
the Japanese characters display like this:
<U+8AAC>明無<U+3057><U+306B><U+5185>容量<U.....
I think only the kanji parts show correctly.
I've tried encoding = "CP932".
It would show:
隤祆<98><81><86>捆<87><....
Then I tried another way: clicking the file in the lower-right pane and selecting "Import Dataset".
Then something strange happened:
when I choose "First rows as names", the column names show Japanese properly,
but when I disable that, they show incorrectly.
Can anyone help me import a Japanese CSV properly?
Thank you so much!

Use fileEncoding="UTF-8" instead of encoding="UTF-8".
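For example, a minimal sketch assuming the same file as above — fileEncoding declares the encoding of the file itself so it is re-encoded while reading, whereas encoding only marks the strings after they have been read:

# Declare the file's encoding up front so the bytes are decoded correctly
A <- read.csv("reviews4.csv", sep = "\t", header = FALSE, fileEncoding = "UTF-8")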

Related

Storing special characters with R in csv

I need a solution for storing special characters like emojis, Arabic, or Chinese characters in a CSV. I tried base write.csv and write.csv2 with the fileEncoding="UTF-8" parameter, as well as the readr function write_csv, but nothing worked properly. The special characters are shown in R, so I guess there must be a solution for storing them.
Example-Code:
df <- data.frame("x" = c("ö", "ä"),
                 "y" = c("مضر السامرائي", "🐇"))
write.csv(df, "~/TubeWork/data/test2.csv", fileEncoding = "UTF-8")
To check the results I use Excel, where the special characters do not display correctly.
Maybe it's a problem with Excel, which cannot display the characters correctly? If so, how should I check whether the characters were stored correctly?
Maybe there is a solution that converts the characters to Unicode escapes and saves those? That would be fine for me as well. But the best solution would be a CSV with the special characters displayed.
Thank you in advance!
Windows 10 64-bit; R 4.2.1; RStudio 2022.12.0+353
Update!
If I read the exported CSV back into R, all the emojis are displayed correctly. So, as you all wrote, Excel can't display the emojis and special characters correctly. If you want the special characters displayed in Excel, you should use readr::write_excel_csv() (big thanks to Ritchie Scramenta for the useful comment).
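A minimal sketch of that fix, assuming the same data frame as above — write_excel_csv() prepends a UTF-8 byte-order mark so Excel detects the encoding:

library(readr)
# Same data as above; the BOM written by write_excel_csv() lets Excel
# recognise the file as UTF-8.
write_excel_csv(df, "~/TubeWork/data/test2.csv")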
Once again: Problem solved!
Thank you!

Which is the correct encoding for a degree character?

I have a line of code that alters text
temperature <- as.numeric(gsub("°.*", "", temp))
R does not like the "°" character. When I save the file, RStudio says I need to use a different encoding.
I have tried all sorts of encodings from the list, but they all save the code as some variation of
temperature <- as.numeric(gsub("??.*", "", temp))
My current workaround is to open the script in Notepad and copy-paste the code into RStudio. Which encoding do I need to save a "°" in RStudio?
The full solution in RStudio was to go to File -> Save with Encoding -> select ISO-8859-1 -> check the box "Set as default encoding for source files". Now the file opens properly with the degree character every time.
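An alternative that sidesteps source-file encoding entirely (a sketch, using the standard Unicode escape for the degree sign) is to keep the script pure ASCII:

# "\u00b0" is the degree sign, so the source file needs no non-ASCII bytes
temperature <- as.numeric(gsub("\u00b0.*", "", temp))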

Corrupted Rmarkdown script: How can I get the Cyrillic characters back?

I was working for weeks with a script containing lots of Cyrillic characters (both inside chunks and outside them). One day I opened a new R Markdown script where I wrote in English, while the other document was still in my R session. Afterwards, I returned to the Cyrillic document and everything written had turned into something like this: 8 иÑлÑ 1995 --> ÐлаÑÑÑ - наÑодÑ
The question is: where is the source of the problem? And how can the corrupted script be turned back to its original form (with the Cyrillic characters)?
UPDATE!!
I have tried reopening the RStudio script with the encodings CP1251, CP1252, windows-1251, and UTF-8, but it does not work. The weird symbols just change to other weird symbols. The problem is that I saved the document with the default encoding (CP1251/windows-1251) at the very beginning.
Solution:
When working with Cyrillic and Latin characters on Windows, always save the RStudio script with UTF-8 encoding (I do not know about Mac). If you close the script and open it again, re-open the file with UTF-8 encoding.
Assuming you're using RStudio: open your *.Rmd file and then try to reopen it "with encoding", via File -> Reopen with Encoding.
Select "Show all encodings" and choose your specific encoding; I suggest windows-1251 for Cyrillic.
Note: apparently the issue can also occur when the *.Rmd file is at one time opened "standalone" and at another time from within an R Project.
Hope that helps.
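If reopening with another encoding keeps failing, the text may have been re-saved after being mis-decoded, i.e. double-encoded. A sketch of the classic repair for that case (illustrated with a hypothetical garbled string; the exact from/to encodings depend on how the file was actually mangled):

# UTF-8 bytes of "привет" that were mis-read as windows-1252 and re-saved
garbled <- "Ð¿Ñ€Ð¸Ð²ÐµÑ‚"
# Converting back to windows-1252 bytes recovers the original UTF-8 bytes
recovered <- iconv(garbled, from = "UTF-8", to = "windows-1252")
Encoding(recovered) <- "UTF-8"   # declare those bytes as UTF-8
recovered                        # "привет"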

wrong text file output of special characters using UTF-8 encoding in R 3.1.2 with Mac OS X

I am having problems writing a CSV file with Spanish accents, using R 3.1.2 and Mac OS X 10.6.
I cannot write words with accents into a text file.
When I do:
con <- file("y.csv", encoding = "UTF-8")
write.csv("Ú", con)
I get a y.csv file with the following content:
"","x"
"1","√ö"
Ie, "√ö" instead of "Ú".
When using write.table the outcome is equivalent.
Encoding("Ú") is "UTF-8"
If I do write.xlsx("Ú", "y.xlsx") I get a y.xlsx file which successfully shows Ú.
I have also tried converting to other encodings using iconv(), with no success.
I have set the default encoding to "UTF-8" in RStudio and in TextEdit. When using plain R (not RStudio) the problem is the same.
In RStudio the special characters appear correctly (in files), and also in the R console.
Sys.getlocale() gives
"es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8"
In Mac OS X Terminal
file -I y.csv
gives
y.csv: text/plain; charset=utf-8
I don't see where the problem is. Any help, please?
Just came across this other question that seems like a near duplicate of this one:
Export UTF-8 BOM to .csv in R
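For reference, a base-R sketch of that BOM approach (file name reused from above): write the three BOM bytes first so editors and Excel detect UTF-8, then append the data.

# Write the UTF-8 byte-order mark as raw bytes
con <- file("y.csv", open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
close(con)
# Append the CSV body (write.table, since write.csv ignores append=TRUE;
# R warns that column names are appended, which is expected here)
write.table(data.frame(x = "Ú"), "y.csv", sep = ",", row.names = FALSE,
            append = TRUE, fileEncoding = "UTF-8")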
The problem was not one of encoding in R, but of TextEdit, which did not show the right characters even though I had selected UTF-8 encoding in its preferences. It got solved by using a different editor. I was using Mac OS X 10.6.8 and TextEdit 1.6.
Maybe you can write words with accents, but Excel expects a different encoding. Try writing your CSV with, for example, write_csv(), and open the CSV in Excel with a workaround:
open Excel
then choose tab Data | Get External Data | From Text
choose your file
and in step 1 of the text import wizard, choose file origin 65001: Unicode (UTF-8).
See also http://www.openforis.org/support/questions/279/wrong-characters-display-when-exporting-files-to-csv-from-collect

How to import data with Chinese Characters correctly in R

Here is the link to the file I am having trouble with. https://www.dropbox.com/s/m7dllf3ec884mte/help.csv?dl=0
I am using RStudio with R 3.1.2 installed. My problem is that when using read.csv to import this file, the Chinese characters within the file turn into gibberish. Could someone help me with this? It would be much appreciated.
I have tried changing my locale using Sys.setlocale(category = "LC_ALL", locale = "chs"). It didn't work. I have also tried changing the encoding of the file using data <- read.csv("help.csv", encoding = "UTF-8", stringsAsFactors = FALSE); it didn't work either. All it did was change the gibberish to a different kind of random characters instead of proper Chinese characters.
There is no problem with my RStudio console when just typing in Chinese characters.
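Given the fileEncoding fix from the first question above, a sketch worth trying (assuming the file really is UTF-8; the readr alternative is my own suggestion, not from this thread):

# Declare the file's encoding while reading, rather than marking strings after
data <- read.csv("help.csv", fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Alternative: readr lets you pin the input encoding explicitly
library(readr)
data <- read_csv("help.csv", locale = locale(encoding = "UTF-8"))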
