Encoding issues with German umlauts ä, ö, ü and ß in R

I work on a Mac and I'm currently working with a large CSV file (German language) that I import into R. The encoding of ä, ö, ü, and ß is fine in the CSV file itself. When I import it, however, those letters get mangled:
ü becomes <c3><bc>,
ä becomes <c3><a4>
and so on.
I tried specifying UTF-8 when importing:
df <- read.csv("file.csv", sep=";", encoding = "UTF-8")
Still it looks the same. The standard encoding in my settings is also UTF-8.
Does anyone have an idea?

Open the CSV in Excel and use its Save As function to convert it. Save the file as CSV UTF-8 (Comma delimited) (called CSV UTF-8 (durch Trennzeichen getrennt) in the German version). Then import this file with readr::read_csv('newfile.csv').
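If re-saving in Excel is not an option, reading the original file with an explicit encoding may also work. A sketch, assuming the source really is UTF-8 and semicolon-separated (the file name is a placeholder):
library(readr)
# read_csv2() expects semicolon separators; locale() declares
# the encoding of the input file
df <- read_csv2("file.csv", locale = locale(encoding = "UTF-8"))
# Base R alternative: fileEncoding re-encodes the file as it is
# read, unlike encoding, which only tags the resulting strings
df <- read.csv("file.csv", sep = ";", fileEncoding = "UTF-8")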

Related

How to export a Unicode character to a CSV file in R? Is it possible at all?

I have text that contains a Unicode character, and I want to export this text to CSV. I don't want the raw Unicode escape to appear in CSV files created from R.
The character in question is <U+00A0> (a non-breaking space).
It appears blank when exported to an xlsx file. When exported to CSV, however, it comes out as the literal escape: the file shows <U+00A0>.
How can I solve this problem? I'd also like to know whether it is possible at all.
I tried changing the encoding option of the write.table function and using the iconv function, but neither resolved it.
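One possible workaround (a sketch, not from the original post) is to replace the non-breaking space with a plain space before writing, so no escape sequence ends up in the file; df is a placeholder name:
# Replace U+00A0 with a regular space in all character columns,
# then write the result as UTF-8
df[] <- lapply(df, function(col) {
  if (is.character(col)) gsub("\u00a0", " ", col, fixed = TRUE) else col
})
write.csv(df, "out.csv", row.names = FALSE, fileEncoding = "UTF-8")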

Saving Umlauts in a .csv in R

I am trying to save a data frame to a CSV file. The data frame contains umlauts, which I want to keep.
I tried exporting with
write.csv2(x, fileEncoding = "UTF-8")
as well as
readr::write_csv2(x)
in both cases the umlauts do not get exported correctly: instead of ä, mojibake such as Ã¤ shows up when the file is opened in Excel. My R is set to save files as UTF-8 as well. Is there something else I can try?
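One more thing worth trying (a sketch; x and the file name are placeholders): readr's Excel-flavoured writer prepends a UTF-8 byte-order mark, which Excel uses to detect the encoding:
# write_excel_csv2() uses a semicolon separator and writes a
# UTF-8 BOM, so Excel decodes the umlauts correctly
readr::write_excel_csv2(x, "out.csv")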

R write.csv not handling characters like é correctly

When I look at the data in R, characters like "é" are displayed correctly.
I export it to a CSV file using write.csv. When I open the CSV file in Excel, "é" is displayed as "√©". Is the problem with write.csv or with Excel? What can I do to fix it?
Thanks
Try the write_excel_csv() function from the readr package:
readr::write_excel_csv(your_dataframe, "file_path")
It's a problem with Excel. Try importing the data instead of opening the file directly:
go to 'Data' --> 'From Text/CSV' and then select '65001: Unicode (UTF-8)'. That will match the encoding coming from R.
Try experimenting with the fileEncoding parameter of write.csv:
write.csv(..., fileEncoding="UTF-16LE")
From write.csv documentation:
fileEncoding character string: if non-empty declares the encoding to
be used on a file (not a connection) so the character data can be
re-encoded as they are written. See file.
CSV files do not record an encoding, and this causes problems if they
are not ASCII for many other applications. Windows Excel 2007/10 will
open files (e.g., by the file association mechanism) correctly if they
are ASCII or UTF-16 (use fileEncoding = "UTF-16LE") or perhaps in the
current Windows codepage (e.g., "CP1252"), but the ‘Text Import
Wizard’ (from the ‘Data’ tab) allows far more choice of encodings.
Excel:mac 2004/8 can import only ‘Macintosh’ (which seems to mean Mac
Roman), ‘Windows’ (perhaps Latin-1) and ‘PC-8’ files. OpenOffice 3.x
asks for the character set when opening the file.
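Filled out, that call might look like this (a sketch; the data frame and file name are placeholders):
# UTF-16LE lets Windows Excel decode the accented character when
# the file is opened directly (e.g. by double-clicking)
df <- data.frame(name = "café")
write.csv(df, "out.csv", fileEncoding = "UTF-16LE", row.names = FALSE)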

How to output a data frame with Chinese characters to a .csv file that's compatible with Excel?

I want to export a data.frame with a character vector in Chinese.
Writing it to a text file works perfectly with the following code:
# Use a UTF-8 locale so the Chinese strings are handled consistently
Sys.setlocale(category = "LC_ALL", locale = "zh_cn.utf-8")
data <- data.frame(ID = c('小李', '小王', '小宗'), number = 1:3)
write.table(data, 'test.txt', quote = FALSE, row.names = FALSE, sep = '\t')
But when I use write.csv and open the resulting test.csv in Excel, the Chinese part of the data is displayed incorrectly:
write.csv(data, 'test.csv', row.names = FALSE)
I found a similar post on Stack Overflow, How to export a csv in utf-8 format?, but couldn't figure out how to apply it to my case.
Is there any solution that outputs a data file compatible with Excel?
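A sketch that often works for this case (not from the original post): readr::write_excel_csv() prepends a UTF-8 byte-order mark, which Excel uses to detect the encoding:
library(readr)
data <- data.frame(ID = c('小李', '小王', '小宗'), number = 1:3)
# The BOM written by write_excel_csv() tells Excel the file is
# UTF-8, so the Chinese characters render instead of mojibake
write_excel_csv(data, 'test.csv')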

UTF-8 file encoding in R

I have a .csv file which should be in UTF-8 encoding; I exported it from SQL Server Management Studio. However, when importing it into R, it fails on the lines containing ÿ. I use read.csv2 and specify fileEncoding = "UTF-8-BOM".
Notepad++ displays the ÿ correctly and says the file is UTF-8 encoded. Is this a bug in R's encoding handling, or is ÿ in fact not part of the UTF-8 encoding scheme?
I have uploaded a small tab delimited .txt file that fails here:
https://www.dropbox.com/s/i2d5yj8sv299bsu/TestData.txt
Thanks
That is probably part of a BOM at the beginning of the file: FF FE, the UTF-16LE byte-order mark, reads as ÿþ when interpreted as Latin-1. If an editor or parser doesn't recognize BOM markers, it treats them as garbage characters. See https://www.ultraedit.com/support/tutorials-power-tips/ultraedit/unicode.html for more details.
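If the file really is UTF-16 (SQL Server Management Studio exports often are), declaring that encoding when reading may fix it; a sketch against the uploaded tab-delimited test file:
# read.delim() defaults to tab separation; fileEncoding re-encodes
# the UTF-16LE input while reading
df <- read.delim("TestData.txt", fileEncoding = "UTF-16LE")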
