I am working with a dataset that contains data in multiple languages.
Is there a way to export my work as a CSV file and have R maintain the use of characters in a foreign language instead of replacing them with gibberish English symbols?
Update for anyone who reaches this by Google:
It looks like R only pretends to screw up foreign languages. When you use write_csv, it actually does create a .csv that uses the correct foreign characters.
However, you'll only see them if you open the file in Notepad. If you open it in Excel, Excel will screw it up, and if you open it with read_csv, R will screw it up (but will still export it correctly when you use write_csv again).
write_excel_csv() from the readr package seems to work without messing up the foreign characters when opening with Excel.
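For example, a minimal sketch (the data frame name df and the file name are just placeholders):
library(readr)
# write_excel_csv() writes UTF-8 with a byte-order mark, which is what lets
# Excel detect the encoding and show the non-English characters correctly
write_excel_csv(df, "multilingual_data.csv")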
Is there any way to deal with this letter in R -Å?
In some configurations I'm able to read this letter from SQL via RODBC, but I haven't found any way to save this letter to csv or txt. It always gets converted to a plain A or to Ĺ.
Also, how to read this letter correctly from Excel file?
I understand from your question that the letter displays properly inside R but you have problems writing it to files.
R's writing functions usually have an encoding parameter (for example, for write.csv and write.table it's called fileEncoding).
When you don't set it explicitly, the function will encode the file using your OS's (or R installation's) native encoding, which can sometimes cause problems with special characters. What exactly goes wrong and how to fix it depends heavily on your system setup - especially if you're also interacting with databases, as you describe.
But very often, an easy fix is writing files in UTF-8 encoding, i.e.
write.csv(your_df, your_path, fileEncoding='UTF-8')
as most external programs (such as Excel) are able to automatically detect and properly read UTF-8 encoded files.
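For instance, a small round-trip sketch (your_df and the file name are placeholders):
# write the file as UTF-8 so that special characters like Å are preserved
write.csv(your_df, "letters.csv", fileEncoding = "UTF-8", row.names = FALSE)
# when reading it back, tell R that the file is UTF-8 as well
your_df2 <- read.csv("letters.csv", fileEncoding = "UTF-8")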
Set the fileEncoding argument on write.table to fit your needs (e.g., if your text is encoded as UTF-8, try write.table(my_tab, file = "my_tab.txt", fileEncoding = "UTF-8")).
Our users have RStudio installed on their local machines and are using Shiny to filter data and exporting dataframes to an .xlsx file.
This works really well for most characters, but not for Japanese and Mandarin ones. For those, they see ??????? instead of the actual text.
Data is residing in a SQL DB and we're using RODBC to connect to DB.
RODBC doesn't seem to like reading these Japanese and Mandarin characters. Is there a way to get around this?
Any help is much appreciated!
Thanks
I had a similar problem with French the other day. Maybe these options can help you:
In RStudio, try going to Tools > Global Options > Code > Saving and then choose the right encoding for Japanese and Mandarin. The UTF-8 encoding might work for you.
The blog post Escaping from character encoding hell in R on Windows explains how to set the encoding to import external documents. It should work with data imported with RODBC as well. The author uses Japanese characters in his examples.
In the odbcDriverConnect() function of the RODBC package, the argument DBMSencoding="UTF-8" might work for you.
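For the third option, a rough sketch (the connection string, table and object names are placeholders for your own setup):
library(RODBC)
# ask RODBC to translate text coming from the DBMS into UTF-8
con <- odbcDriverConnect("DSN=my_dsn;UID=my_user;PWD=my_password",
                         DBMSencoding = "UTF-8")
# as.is = TRUE keeps text columns as character instead of converting to factors
dat <- sqlQuery(con, "SELECT * FROM my_table", as.is = TRUE)
odbcClose(con)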
I've been researching this question, which I assume should be easy to fix, but am not having any luck. I have an Excel file where each cell is some text of variable length. I want to read this into R so I can eventually do some text classification, but am failing. I get errors when using read.table and am struggling with all the other alternatives. I've never worked with text data before, so perhaps that is my issue. I'm having trouble finding good examples of importing text data into R when it isn't in a corpus format.
There are special packages for reading data from the excel format. I mostly use readxl when I need to do this, but I know that there are several (a lot of them are described in this tutorial by datacamp, in the section Importing Excel files into R).
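A minimal readxl sketch (file and sheet are placeholders):
library(readxl)
# read the first worksheet; each cell of text becomes a value in a character column
texts <- read_excel("my_texts.xlsx", sheet = 1)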
Another possibility (assuming you are using Windows) is to copy the cells to the clipboard and use
read.table("clipboard")
For macOS and Linux there are similar commands, but I don't know them by heart.
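For what it's worth, on macOS something along these lines is often suggested (untested here, so treat it as a sketch; ranges copied from Excel are tab-separated):
# macOS has no "clipboard" connection, but the pbpaste utility prints the
# clipboard contents, so it can be read through a pipe
dat <- read.table(pipe("pbpaste"), sep = "\t", header = TRUE)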
My data.frame uses scientific notation when parsing files, e.g. 3.007530e+07.
I definitely like using it in R; however, for this analysis I have to export my data to csv and open it in Excel (German version), which cannot handle this notation.
My df looks something like this:
df <- c(6.402000e+05,9.312903e+05,1.007800e+06,1.142000e+06,1.298500e+06,1.511700e+06,1.749000e+06,1.869357e+06)
I tried changing my global options, e.g. options(scipen=999), but that does not work because it then causes problems with my fread function.
Therefore, my question:
How can I change the notation in a data.frame before exporting it with write.csv()?
I appreciate your replies!
As an alternative to altering the R format (since you want to keep scientific notation in R), could you change how Excel imports your file?
For example, by naming your csv file with a non-standard extension to trigger the manual import process (the import wizard) instead of having Excel automatically open the file in the wrong format.
I tried a simple test with a csv formatted file of numbers in scientific notation, saved with a ".sci" filename. My version of Excel launched the wizard, then imported the file and handled the scientific notation correctly [MS Excel Starter 2010, English version].
Edit: I found the reference to using an unrecognized file extension to trigger Excel's import wizard: http://excelribbon.tips.net/T012201_Avoiding_Scientific_Notation_on_File_Imports.html
[The article suggests using .DAT, which I wouldn't use for an ASCII file, but I wanted to give credit where it's due for the idea.]
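On the R side the trick is just a matter of writing an ordinary csv under an unrecognized extension, e.g. (file name made up):
# same content as a normal csv export, but the non-standard extension makes
# Excel launch the import wizard instead of opening the file directly
write.csv(df, "my_numbers.sci", row.names = FALSE)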
I have a Flex application with a couple of DataGrids with data. I'd like to save the data to a file so that the user can keep working with them in Excel, OpenOffice or Numbers.
I'm currently writing a csv file straight off, which opens well in OpenOffice or Numbers, but not in Excel. The problem is with the Swedish characters ÅÄÖ, which turn up as other characters when opening in Excel. Converting (in Notepad++) the csv-file to ANSI encoding makes the ÅÄÖ show up correctly in Excel.
Is there any way to write ANSI-encoded files straight from Flex?
Any other options for writing a file that can be opened in Excel and OpenOffice?
(I've looked at the as3xls library, but according to the comments those files cannot be opened in OpenOffice)
Using the writeMultiByte function from the ByteArray class allows you to specify a character set. See:
http://www.adobe.com/livedocs/flash/9.0/ActionScriptLangRefV3/flash/utils/ByteArray.html#writeMultiByte%28%29
There is also the option of the as3xls package at http://code.google.com/p/as3xls/. I like this as it produces a straight Excel file that can also easily be opened in OpenOffice.