Showing Unicode characters in R

Using the read_excel function, I read an Excel sheet that has a column containing data in both English and Arabic.
English is shown normally in R, but Arabic text is shown as escaped code points, like <U+0627><U+0644><U+0639><U+0645><U+0644>.
dataset <- read_excel("Dataset_Draft v1.xlsx", skip = 1)
dataset %>% select(description)
I tried Sys.setlocale("LC_ALL", "en_US.UTF-8"), but with no success.
I want to display the Arabic text normally, and I want to filter this column by an Arabic value.
Thank you.

You could try the read.xlsx() function from the xlsx package, which lets you specify an encoding:
data <- xlsx::read.xlsx("file.xlsx", encoding = "UTF-8")
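A hedged sketch of the full round trip, assuming the file and column names from the question and the xlsx package suggested above. The Arabic literal is written with Unicode escapes so the script itself survives any re-encoding, and the read is guarded so the snippet is safe to run without the file:

```r
# the Arabic value from the question ("العمل"), written as Unicode escapes
target <- "\u0627\u0644\u0639\u0645\u0644"

# read with an explicit encoding; startRow = 2 mirrors skip = 1 above
if (requireNamespace("xlsx", quietly = TRUE) && file.exists("Dataset_Draft v1.xlsx")) {
  dataset <- xlsx::read.xlsx("Dataset_Draft v1.xlsx", sheetIndex = 1,
                             startRow = 2, encoding = "UTF-8")
  # filter the description column by the Arabic value
  dataset[dataset$description == target, ]
}
```

Filtering compares the column against the escaped literal, which is exactly the same string as the Arabic text shown above.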

Related

Removing tags after reading PDF in R

I am reading a PDF in Hebrew into R using textreadr::read_document, and getting tags which I can't remove, such as <U+202B>. Looking at the data in the console, the tags are absent; if I try to remove them using gsub or stringr::str_replace, nothing happens. However, they are clearly there (see image), and worse, if I export to Excel they are exported as part of the data. What can I do?
Could you try something like this? This is code I used to replace non-ASCII characters.
library(textclean)

# CA_videos_df is the data frame; title is the column holding the text
Encoding(CA_videos_df$title) <- "latin1"

# replace titles containing non-ASCII characters with NA
name <- replace_non_ascii(CA_videos_df$title, replacement = NA, remove.nonconverted = TRUE)
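If the goal is only to strip the invisible marks, note that <U+202B> is the Unicode right-to-left embedding control. A base-R sketch that removes the bidirectional control characters; the exact character range below is an assumption about which marks the PDF extraction leaves behind:

```r
# strip Unicode bidirectional control characters: the U+202A-U+202E
# embedding/pop controls and the U+200E/U+200F directional marks that
# PDF extraction of right-to-left text often leaves behind
strip_bidi <- function(x) gsub("[\u202a-\u202e\u200e\u200f]", "", x)

# Hebrew "shalom" wrapped in an RLE/PDF pair, with the marks removed
strip_bidi("\u202b\u05e9\u05dc\u05d5\u05dd\u202c")
```

Because gsub matches the characters themselves (not their `<U+...>` console rendering), this works even though the tags are invisible when printed.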

In R, the data.frame() or write.csv2() functions change the encoding

I have a text in Persian:
tabs <- "سرگرمی"
and I need to have it in a data frame.
When I try:
final <- data.frame(tabs)
I get the escaped code points (<U+...>) instead of the Persian text.
Exporting the text to .csv using write.csv2() gives me the same problem.
Any idea how to keep the text in its original encoding?
We can set the locale with Sys.setlocale (here using the Windows locale name):
Sys.setlocale("LC_ALL", "Persian")
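For the export side specifically, a sketch that declares the encoding when writing: the console may still show escaped code points, but the CSV on disk keeps the Persian text. The file name is a placeholder:

```r
tabs <- "\u0633\u0631\u06af\u0631\u0645\u06cc"  # "سرگرمی", as Unicode escapes
final <- data.frame(tabs)

# declare the encoding explicitly when exporting
write.csv2(final, "tabs.csv", fileEncoding = "UTF-8", row.names = FALSE)
```

Readers of the file (Excel, readr, etc.) then see the original text, provided they open it as UTF-8.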

Exporting greek letters from R to Excel

I want to export strings which include Greek letters to Excel using R.
For example, I want to export the expression below:
β0=α0+1
I am using the XLConnectJars and XLConnect libraries for exporting expressions from R to Excel.
Is there any way to export such an expression from R to Excel?
For example, the code below creates an Excel file named "example" on my desktop. That file has an "Expression" sheet and, in that sheet, the expression below is printed into cell B3:
B0=A0+1
library(XLConnectJars)
library(XLConnect)
wb <- loadWorkbook("data.xlsx", create = TRUE)
createSheet(wb, "Expression")
writeWorksheet(wb, "B0=A0+1", "Expression", startRow = 3, startCol = 2, header = FALSE)
saveWorkbook(wb, file = "C:/Users/ozgur/Desktop/example.xlsx")
I want the same thing, but with Greek letters.
I would be very glad for any help. Thanks a lot.
You can do this using Unicode escapes in the expression for any Greek letters. In the example code below, I also changed the 0 to a subscript zero using Unicode. For this particular expression, beta is Unicode U+03B2, which in R is written as "\U03B2".
library(XLConnectJars)
library(XLConnect)
wb <- loadWorkbook("data.xlsx", create = TRUE)
createSheet(wb, "Expression")
ex <- "\U03B2\U2080=\U03B1\U2080+1"
writeWorksheet(wb, ex, "Expression", startRow = 3, startCol = 2, header = FALSE)
saveWorkbook(wb, file = paste0(Sys.getenv("USERPROFILE"), "\\Desktop\\example.xlsx"))
I also used Sys.getenv() so that saving to the desktop works for any user rather than a specific one.

Import dataset with Spanish special characters into R

I'm new to R and I've imported a dataset in CSV format (created in Excel) into my project using the "Import Dataset from Text File" function. However, the dataset displays the Spanish special characters (á, é, í, ó, ú, ñ) as the � symbol, as below:
Nombre Direccion Beneficiado
Mu�oz B�rbara H�medo
...
Subsequently I tried this code to make R display the Spanish special characters:
Encoding(dataset) <- "UTF-8"
And received the following answer:
Error in `Encoding<-`(`*tmp*`, value = "UTF-8") :
a character vector argument expected
So far I haven't been able to find a solution.
I'm working in RStudio version 0.98.1083 on Windows 7.
Thanks in advance for your help.
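The error itself points at the cause: Encoding() expects a character vector, not a whole data frame. A hedged sketch of two possible fixes; the sample values mirror the question, while the file name and the Latin-1 guess are assumptions:

```r
# fix 1: declare the encoding column by column, not on the data frame;
# Encoding()<- marks (does not convert) the encoding of a character vector
nombres <- c("Mu\u00f1oz", "B\u00e1rbara")
Encoding(nombres) <- "UTF-8"

# fix 2: declare the file's encoding at import time. CSVs saved by Excel
# on Windows are typically Latin-1 / Windows-1252 (an assumption here);
# "archivo.csv" is a placeholder file name:
# dataset <- read.csv("archivo.csv", fileEncoding = "latin1")
```

If the characters were already mangled at import, re-reading with the right fileEncoding is usually the cleaner fix, since marking the encoding afterwards cannot restore bytes lost as �.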

Reading Chinese characters in R

I am using read.xlsx to load a spreadsheet that has several columns containing Chinese characters.
slides <- read.xlsx("test1.xlsx", sheetName = "Sheet1", encoding = "UTF8", stringsAsFactors = FALSE)
I have tried with and without specifying the encoding, and I have tried reading from a text file, a CSV file, etc. No matter the approach, the result is always:
樊志强 -> é‡åº†æ–°æ¡¥åŒ»é™¢
Is there any package/format/sys.locale setup that might help me get the right information into R?
Any help would be greatly appreciated.
You can try the readr package with the following code:
library(readr)
loc <- locale(encoding = "UTF-8")
slides <- read_table("filename", locale = loc)
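One caveat: readr's read_table() parses delimited text, so it fits the text/CSV attempts rather than the .xlsx file itself. As an alternative sketch (file and sheet names taken from the question; the call is guarded so the snippet runs without the file), readxl returns text as UTF-8 regardless of the OS locale:

```r
# readxl parses .xlsx directly and returns character columns as UTF-8,
# independent of the system locale, so Chinese text survives the import
if (requireNamespace("readxl", quietly = TRUE) && file.exists("test1.xlsx")) {
  slides <- readxl::read_excel("test1.xlsx", sheet = "Sheet1")
  Encoding(slides[[1]])   # text columns should report "UTF-8"
}

# the same guarantee holds for \u escapes in R source: they are always UTF-8
s <- "\u6a0a\u5fd7\u5f3a"   # 樊志强
```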
