How to export a CSV in UTF-8 format in R?

I am trying to export a data.frame to a CSV file with UTF-8 encoding. I have tried generating the file with write.csv with no success, and help(write.csv) does not offer any specific advice on producing that output. Here is my current export line:
write.csv(prod_out, file = "product_output.csv", append = FALSE, eol = "\r")
Any advice you can offer is appreciated.

This question is pretty old - I guess things have changed a lot since 2010. Anyway, I just came across this post and I happen to know the solution. You just add the fileEncoding = "UTF-8" option directly to write.csv.
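A minimal sketch of that fix, applied to the export line from the question (keeping the original eol argument):
# Same call as in the question, plus an explicit UTF-8 file encoding
write.csv(prod_out, file = "product_output.csv", fileEncoding = "UTF-8", eol = "\r")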

Try opening a UTF-8 connection:
con <- file('filename', encoding = "UTF-8")
write.csv(..., file = con, ...)
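A fuller sketch of the connection approach; the data frame and file names here are placeholders, not from the question:
# Open a connection that encodes everything written to it as UTF-8
con <- file("my_output.csv", encoding = "UTF-8")
write.csv(my_data, file = con, row.names = FALSE)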

You can try this solution:
write.csv(data, "data.csv", fileEncoding = "UTF-8")

Related

R version 4.2.0 and Swedish letters (ä ö å) not working in newest R release. Anyone found a solution?

I have updated to the latest R release (R version 4.2.0), but I am now facing the problem that the Swedish special letters cannot be read anymore. I am working with a database that has many Swedish letters in its factor labels, and even if I read them in as strings R does not recognise them, with the consequence that all summary tables based on these factors as groups are no longer calculated correctly. The code worked fine under the previous release (but I had issues with knitting R Markdown files, hence the need for updating).
I have set the encoding to ISO-8859-4 (which covers Northern European languages) after UTF-8 did not work. Is there anything else I could try? Or has anyone found a fix other than renaming all labels before reading in the .csv files? (I would really like to avoid that, since I often work with similar data.)
I have used read.csv() and it produces cryptic output, replacing the special letters with, for example, <d6> instead of ö and <c4> instead of ä.
I hope that someone has an idea for a fix. Thanks.
edit: I use Windows.
Sys.getlocale("LC_CTYPE")
[1] "Swedish_Sweden.utf8"
Use the encoding parameter
I have been able to detect failed loads by attempting to apply toupper to strings, which gives me errors such as
Error in toupper(dataset$column) :
invalid multibyte string 999751
This is resolved, and the expected outcomes obtained, by using
read.csv(..., encoding = 'latin1')
or
data.table::fread(..., encoding = 'Latin-1')
I believe this solution should apply to Swedish characters as they are also covered by the Latin-1 encoding.
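A minimal sketch of both calls with the placeholders filled in; the file and object names are illustrative only:
# Base R: interpret the input bytes as Latin-1
dat <- read.csv("swedish_data.csv", encoding = "latin1")
# data.table: note the slightly different spelling of the encoding name
dat <- data.table::fread("swedish_data.csv", encoding = "Latin-1")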
I had the same problem; what worked for me was like the answer above, except that I used the encoding ISO-8859-1 instead. It works both for reading from file and for saving to file for the Swedish characters å, ä, ö, Å, Ä, Ö, i.e.:
read.csv("~/test.csv", fileEncoding = "ISO-8859-1")
and
write.csv2(x, file = "test.csv", row.names = FALSE, na = "", fileEncoding = "ISO-8859-1")
It is tedious, but it works right now. Another tip, if you use RStudio, is to go to Global Options -> Code -> Saving, set your default text encoding to ISO-8859-1, and restart RStudio. It will then save and read your scripts in that encoding by default, if I understand correctly. I had the problem that when I opened my scripts with Swedish characters, they would display the wrong characters; this solution fixed that.

How to read csv file with unknown formatting and unknown encoding in R Program? (example file provided)

I have tried my best to read a CSV file in R but failed. I have provided a sample of the file in the following Google Drive link.
Data
I found that it is a tab-delimited file by opening it in a text editor. The file is read in Excel without issues, but when I try to read it in R using the "readr" package or the base R functions, it fails, and I am not sure why. I have tried different encodings such as UTF-8, UTF-16 and UTF-16LE. Could you please help me to write the correct script to read this file? Currently, I am converting the file in Excel to a comma-delimited file in order to read it in R, but I am sure there must be something that I am doing wrong. Any help would be appreciated.
Thanks
Amal
PS: What I don't understand is how Excel reads the file without any parameters provided. Can we build the same logic in R to read any file?
This is a Windows-related encoding problem.
When I open your file in Notepad++ it tells me it is encoded as UCS-2 LE BOM. There is a trick to reading in files with unusual encodings into R. In your case this seems to do the trick:
read.delim(con <- file("temp.csv", encoding = "UCS-2LE"))
(adapted from R: can't read unicode text files even when specifying the encoding).
BTW "CSV" stands for "comma separated values". This file has tab-separated values, so you should give it either a .tsv or .txt suffix, not .csv, to avoid confusion.
In terms of your second question - could we build the same logic in R to guess the encoding and delimiter and read in many types of file without explicitly saying what they are - yes, this would certainly be possible. Whether it is desirable, I'm not sure.
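As a partial step in that direction, readr's guess_encoding() can at least rank the likely encodings of the raw bytes before you commit to one; a minimal sketch, reusing the file name from the answer above:
library(readr)
# Rank the most likely encodings for the file's raw bytes
guess_encoding("temp.csv")
# Feed the top guess back into the base-R connection trick shown above
read.delim(file("temp.csv", encoding = "UCS-2LE"))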

R changing names when there is ä ü ö

OK, this is an extremely annoying problem and I was not able to find a solution on the internet, therefore I come to you.
When importing data sets that contain German names with umlauts (ä, ö, ü), R modifies the names, something like Möhlin -> M<f6>hlin.
When writing code, words containing umlauts cause no problem, until the script is saved. After reloading a saved script, all my beloved umlauts are modified, i.e. all the names of my plots, the names of my variables, etc.
Can anyone help me, please?
Try setting the locale:
Sys.setlocale(category = "LC_ALL", locale = "German")
Try changing the default codepage to UTF-8 in RStudio via:
Tools -> Global Options -> Code -> Saving -> Default Text Encoding -> UTF-8
then restart RStudio, and save and reopen your script with umlauts.
I'd just try to make sure all your files are UTF-8 encoded, i.e. know their umlauts.
Thus, when writing and reading files, try to always explicitly set the file encoding to "UTF-8".
For instance, when writing a data frame df to file:
write.csv(df, "output.csv", fileEncoding = "UTF-8")
The same logic applies to read.csv(), etc.
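For example, a matching read call for the file written above could look like:
# Tell read.csv that the bytes on disk are UTF-8
df <- read.csv("output.csv", fileEncoding = "UTF-8")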
Note that opening files that way will only work properly when you saved them as UTF-8 in the first place.
I know that some people like to use stringr for string manipulation in general when working with non-English text, but I have never used it.

Fast method to read csv with UTF-16LE encoding

I'm dealing with .csv files with UTF-16LE encoding. This method works to read the files, but read.csv is very slow compared to read_csv:
read.csv2(path, dec = ",", skip = 1, header = TRUE, fileEncoding = "UTF-16LE", sep = "\t")
Unfortunately I can't make read_csv work; I only get empty rows, and I can't find a way to even specify the encoding in the function.
I can't share my data, but if anyone dealt with this encoding any help would be appreciated.
You can specify file encodings with readr functions like read_csv via the locale option: locale = locale(encoding = "UTF-16LE"). However, I haven't successfully read a UTF-16LE file with read_csv; I get an "Incomplete multibyte sequence" error. There's a related issue filed, but I still have problems with my file -- hopefully others will have more success.
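For reference, a sketch of what that readr call could look like for the file described in the question (tab-separated, one header line skipped, comma as decimal mark); it may still hit the error mentioned above:
library(readr)
# locale() carries both the encoding and the decimal mark
dat <- read_tsv(path, skip = 1, locale = locale(encoding = "UTF-16LE", decimal_mark = ","))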

write.csv strange encoding in R

I am encountering a very strange problem that I am not able to resolve by myself.
Suddenly, write.csv is encoding the csv file in a way that makes it impossible to read in LibreOffice.
The command has always worked until today. Now, if I try to use write.csv (or its more general equivalent write.table) and then open the file with LibreOffice, all I get is a bunch of symbols and Asian characters.
I don't really understand what is happening here; it seems that the default encoding of write.csv has changed by itself.
The only different thing I did today was read some text files that were generated by the program E-Prime, and so I had to use the following command in order to read the file:
A <- read.delim("Pre_NewTask_Run1.txt", fileEncoding = "UCS-2LE")
Is it possible that this has changed the default encoding of write.csv? And if this is the case, how can I change it back?
Thanks in advance for any help.
It may be difficult to provide a precise answer without sample data or reproducible code. Having said that, as a first attempt you can force the export of your data with a specific encoding; for example, the code:
con <- file('filename', encoding = "UTF-8")
write.csv(..., file = con, ...)
would write the file using UTF-8. You may also run the l10n_info() command, which reports the local encoding you currently have:
> l10n_info()
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] TRUE
$codepage
[1] 1252
