I want to export strings that include Greek letters to Excel using R.
For example, I want to export the expression below:
β0=α0+1
I am using the XLConnectJars and XLConnect libraries to export expressions from R to Excel.
Is there any way to export such an expression from R to Excel?
For example, the code below creates an Excel file named "example" on my desktop. That file has an "Expression" sheet, and in that sheet the following expression is written into cell B3:
B0=A0+1
library(XLConnectJars)
library(XLConnect)

# Create the workbook and an "Expression" sheet
wb <- loadWorkbook("data.xlsx", create = TRUE)
createSheet(wb, "Expression")

# Write the expression into cell B3 (row 3, column 2)
writeWorksheet(wb, "B0=A0+1", "Expression", startRow = 3, startCol = 2, header = FALSE)
saveWorkbook(wb, file = "C:/Users/ozgur/Desktop/example.xlsx")
I want the same thing, but with Greek letters.
I would be very glad for any help. Thanks a lot.
You can do this using Unicode escapes in the expression for any Greek letters. In the example code below, I also changed the 0 to a subscript 0 using Unicode. For this particular expression, beta is Unicode U+03B2, which in R is written as "\U03B2".
library(XLConnectJars)
library(XLConnect)

wb <- loadWorkbook("data.xlsx", create = TRUE)
createSheet(wb, "Expression")

# Build the expression with Unicode escapes: beta, subscript zero, alpha
ex <- "\U03B2\U2080=\U03B1\U2080+1"
writeWorksheet(wb, ex, "Expression", startRow = 3, startCol = 2, header = FALSE)

# Save to the current user's desktop
saveWorkbook(wb, file = paste0(Sys.getenv("USERPROFILE"), "\\Desktop\\example.xlsx"))
I also used Sys.getenv() so the file is saved to the current user's desktop rather than to a path hard-coded for one specific user.
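If you want to double-check the string before exporting, you can simply print it in the R console; other Greek letters follow the same pattern (for example, alpha is U+03B1 and gamma is U+03B3). A quick check:
cat("\U03B2\U2080=\U03B1\U2080+1\n")  # the expression with Greek letters and subscript zeros
cat("\U03B1 \U03B3 \U03B4\n")         # alpha, gamma, delta written the same way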
Related
Using the read_excel function, I read an Excel sheet that has a column containing data in both English and Arabic.
The English text is shown normally in R, but the Arabic text is shown like this: <U+0627><U+0644><U+0639><U+0645><U+0644>
dataset <- read_excel("Dataset_Draft v1.xlsx", skip = 1)
dataset %>% select(description)
I tried Sys.setlocale("LC_ALL", "en_US.UTF-8") but with no success.
I want to display the Arabic text normally, and I want to be able to filter this column on Arabic values.
Thank you.
You could try the read.xlsx() function from the xlsx library.
Here you can specify an encoding.
data <- xlsx::read.xlsx("file.xlsx", sheetIndex = 1, encoding = "UTF-8")
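Once the text is read in with the correct encoding, filtering on the Arabic column should work with the usual tools. A minimal sketch using dplyr, assuming the column is called description (as in the question) and using the Arabic word from the question, written with Unicode escapes, as the value to match:
library(dplyr)
# Filter rows whose description equals the Arabic value
arabic_value <- "\U0627\U0644\U0639\U0645\U0644"
filtered <- data %>% filter(description == arabic_value)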
Good day! I need some help with this. I am doing some text mining where I count the word frequencies for every word in a text. I was able to do it fine in R with all the different characters, but the problem arises when I export the results to a .csv file.
Basically, I am working on Hungarian text, and when I save my data frame to .csv, three accented letters (ő, ű, ú) get converted to non-accented ones (o, u and u). This doesn't happen when the file is an .rds, but I need to convert it to a .csv file so that one of my consultants (zero knowledge of programming) can look at it in a normal Excel file. I have tried some tricks, e.g. making sure Notepad++ is set to UTF-8 and adding fileEncoding = "UTF-8" (or encoding = "UTF-8") to the write.csv call, but it doesn't work.
Hope you can help me.
Thank you.
write.csv() works with the three characters you mentioned in the question.
Example
First, create a data frame containing the special characters:
library(tidyverse)
# Create a data frame holding the accented characters
test_df <- "ő, ű, ú" %>%
  as.data.frame()
test_df
# .
# 1 ő, ű, ú
Save it as an RDS
saveRDS(test_df, "test.RDS")
Now read in the RDS, save as csv, and read it back in:
# Read in the RDS
df_with_special_characters <- readRDS("test.RDS")
write.csv(df_with_special_characters, "first.csv", row.names=FALSE)
first <- read.csv("first.csv")
first
# .
# 1 ő, ű, ú
We can see above that the special characters are still there!
Extra note
If you have even rarer special characters, you could try setting the file encoding, like so:
write.csv(df_with_special_characters, "second.csv", fileEncoding = "UTF-8", row.names=FALSE)
second <- read.csv("second.csv")
second
# .
# 1 ő, ű, ú
With the writexl package you can use write_xlsx() to write an xlsx file instead. It should handle Unicode just fine.
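A minimal sketch, reusing the data frame from the example above (the file name third.xlsx is only for illustration):
library(writexl)
# write_xlsx() writes the data frame to xlsx and keeps the UTF-8 text intact
write_xlsx(df_with_special_characters, "third.xlsx")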
I am using openxlsx's read.xlsx to import a data frame from a column containing mixed types. The desired result is to import all values as strings, exactly as they are displayed in Excel. However, some decimals are imported as very long floats.
Sample data is simply an Excel file with a column containing the following rows:
abc123,
556.1,
556.12,
556.123,
556.1234,
556.12345
require(openxlsx)
df <- read.xlsx('testnumbers.xlsx')
Using the above R code to read the file results in df containing these string values:
abc123,
556.1,
556.12,
556.12300000000005,
556.12339999999995,
556.12345000000005
The Excel file provided in production has the column formatted as "General". If I format the column as Text, there is no change unless I explicitly double-click each cell in Excel and hit enter. In that case, the number is correctly displayed as a string. Unfortunately, clicking each cell isn't an option in the production environment. Any solution, Excel, R, or otherwise is appreciated.
Edit:
I've read through this question and believe I understand the math behind what's going on. At this point, I suppose I'm looking for a workaround. How can I get a float from Excel to an R dataframe as text without changing the representation?
Why Are Floating Point Numbers Inaccurate?
I was able to get the correct formats into a data frame using pandas in Python.
import pandas as pd
test = pd.read_excel('testnumbers.xlsx', dtype = str)
This will suffice as a workaround, but I'd like to see a solution built in R.
Here is a workaround in R using openxlsx that I used to solve a similar issue. I think it will solve your question, or at least allow you to format cells as text in the Excel files programmatically.
I use it to reformat specific cells in a large number of files (in my case I'm converting from General to Scientific, as an example of how you might alter this for another format).
This uses functions from the openxlsx package that you reference in the OP.
First, load the xlsx file in as a workbook (stored in memory, which preserves all the xlsx formatting etc.; this is slightly different from the method shown in the question, which pulls in only the data):
library(openxlsx)
testnumbers <- loadWorkbook(here::here("test_data/testnumbers.xlsx"))
Then create a "style" that formats the numbers as text, and apply it to the worksheet in memory:
numbersAsText <- createStyle(numFmt = "TEXT")
addStyle(testnumbers, sheet = "Sheet1", style = numbersAsText, cols = 1, rows = 1:10)
Finally, save the workbook (here to a new file):
saveWorkbook(testnumbers,
             file = here::here("test_data/testnumbers_formatted.xlsx"),
             overwrite = TRUE)
When you open the Excel file, the numbers will be stored as text.
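To check the result, you could read the reformatted file back into R and inspect the imported column (just a sketch; whether the values now come through as text depends on how Excel stores them once the style is applied):
# Re-import the reformatted file and look at the column classes
df_formatted <- read.xlsx(here::here("test_data/testnumbers_formatted.xlsx"))
str(df_formatted)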
I'm trying to use R to create the content of a .tex file. The content contains many accented letters, and I am not able to write them correctly to the .tex file.
Here is a short minimal example of what I would like to perform:
I have a file texinput.tex, which already exists and is encoded as UTF-8 without BOM. When I manually type é in Notepad++ and save this file, it compiles correctly in LaTeX and the output is as expected.
Then I tried to do this in R:
str.to.write <- "é"
cat(str.to.write, file = "tex_list.tex", append=TRUE)
As a result, the encoded character xe9 appears in the .tex file. LaTeX throws this error when trying to compile:
! File ended while scanning use of \UTFviii#three#octets.<inserted text>\par \include{texinput}
I then tried all of the following things before the cat command:
Encoding(str.to.write) <- "latin1"
-> same output and error as above
str.to.write <- enc2utf8(str.to.write)
-> same output and error as above
Encoding(str.to.write) <- "UTF-8"
-> this appears in the tex file: \xe9. LaTeX throws this error: ! Undefined control sequence. \xe
Encoding(str.to.write) <- "bytes"
-> this appears in the tex file: \\xe9. LaTeX compiles without error, and the output is xe9
I know that I could replace é by \'{e}, but I would like to have an automatic method, because the real content is very long and contains words from 3 different Latin languages, so it has lots of different accented characters.
However, I would also be happy with a function that automatically sanitizes R output for use with LaTeX. I tried xtable with sanitize.text.function, but it appears that it doesn't accept character vectors as input.
After quite a bit of searching and trial-and-error, I found something that worked for me:
# Create an output function that appends UTF-8 text to the tex file
writeTex <- function(x) {
  write.table(x, "tex_list.tex",
              append = TRUE, row.names = FALSE,
              col.names = FALSE, quote = FALSE,
              fileEncoding = "UTF-8")
}
writeTex("é")
Output is as expected (é), and it compiles perfectly well in LaTeX.
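An alternative that is not from the original answer, but achieves the same effect, is to open a file connection explicitly in UTF-8 and write to it with writeLines; a minimal sketch:
# Open the file in append mode with an explicit UTF-8 encoding
con <- file("tex_list.tex", open = "a", encoding = "UTF-8")
writeLines("é", con)
close(con)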
Use TIPA for processing International Phonetic Alphabet (IPA) symbols in LaTeX. It has become standard in the linguistics field.
I am trying to read an Excel spreadsheet into an R data frame. However, some of the columns contain formulas or are linked to other external spreadsheets. Whenever I read the spreadsheet into R, many cells become NA. Is there a good way to fix this problem so that I can get the original values of those cells?
The R script I used to do the import is like the following:
options(java.parameters = "-Xmx8g")
library(XLConnect)
# Step 1 import the "raw" tab
path_cost = "..."
wb = loadWorkbook(...)
raw = readWorksheet(wb, sheet = '...', header = TRUE, useCachedValues = FALSE)
UPDATE: read_excel from the readxl package looks like a better solution. It's very fast (0.14 sec on the 1400 x 6 file I mentioned in the comments), and it imports the values of formula cells rather than returning NA. It doesn't use java, so no need to set any java options.
# sheet can be a string (name of sheet) or integer (position of sheet)
raw = read_excel(file, sheet=sheet)
For more information and examples, see the short vignette.
ORIGINAL ANSWER: Try read.xlsx from the xlsx package. The help file implies that by default it evaluates formulas before importing (see the keepFormulas parameter). I checked this on a small test file and it worked for me. Formula results were imported correctly, including formulas that depend on other sheets in the same workbook and formulas that depend on other workbooks in the same directory.
One caveat: If an externally linked sheet has changed since the last time you updated the links on the file you're reading into R, then any values read into R that depend on external links will be the old values, not the latest ones.
The code in your case would be:
# Set the java options before loading xlsx (it uses java via rJava)
options(java.parameters = "-Xmx8g")
library(xlsx)
# Replace file and sheetName with appropriate values for your file
# keepFormulas=FALSE and header=TRUE are the defaults. I added them only for illustration.
raw = read.xlsx(file, sheetName=sheetName, header=TRUE, keepFormulas=FALSE)
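As a side note, if you want to stay with XLConnect (as in your code), readWorksheet has a useCachedValues argument; setting it to TRUE reads the values Excel last cached for formula cells instead of trying to re-evaluate them, which often avoids the NAs from externally linked cells. A sketch, where sheet_name is a placeholder for your sheet:
library(XLConnect)
wb = loadWorkbook(path_cost)
# useCachedValues = TRUE returns the cached formula results instead of NA
raw = readWorksheet(wb, sheet = sheet_name, header = TRUE, useCachedValues = TRUE)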