Chinese in stylo (R and RStudio)

I'm quite new to R and RStudio, but I wanted to use the stylo package for stylometric analyses.
I collected two Chinese corpora as .txt files and followed the instructions on https://sites.google.com/site/computationalstylistics/ , but neither R nor RStudio seemed to be able to read the Chinese text.
I know Chinese isn't one of the languages the stylo package supports by default, but here they mention that results for Chinese with the stylo package are 'quite promising'.
Also, I can't seem to type Chinese characters in R, although I can in RStudio. Am I doing something wrong? I believe my files are encoded in Unicode.
I hope you can help me.
Thomas
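A minimal base-R sketch for checking whether R itself decodes a UTF-8 Chinese file correctly, independently of stylo. It assumes the corpus files are UTF-8; the temporary file and the sample text "中文本" are made up for illustration. Because Chinese is written without spaces between words, the sketch also splits the text into characters and character bigrams, a kind of feature that tends to suit Chinese better than whitespace-delimited words (stylo itself can be configured to use character n-grams; check its documentation for the exact settings).

```r
# Hypothetical sample: UTF-8 bytes of "中文本" (U+4E2D U+6587 U+672C) plus a newline,
# written as raw bytes so the file is UTF-8 whatever the session locale is.
f <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87, 0xe6, 0x9c, 0xac, 0x0a)), f)

# encoding = "UTF-8" marks the strings as UTF-8; it does not convert the bytes.
x <- readLines(f, encoding = "UTF-8")

# Three characters rather than nine bytes means the text was decoded correctly.
nchar(x, type = "chars")
sprintf("U+%04X", utf8ToInt(x))

# Split into single characters, then build character bigrams.
chars <- strsplit(x, "")[[1]]
bigrams <- paste0(head(chars, -1), tail(chars, -1))
bigrams
```

If `nchar()` reports 9 instead of 3, the bytes are being treated as something other than UTF-8, which points at the reading step rather than at stylo.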

Related

Unicode version and conformance info in R

Unicode has many versions; the current version is 14.
The documentation of the utf8 package explicitly says it supports Unicode 10.0.0.
But what about R itself? I cannot find any statement of which Unicode version R supports.
Unicode also defines different levels of support an implementation can provide.
When I print '\u0061\u20de\u0308\u20dd' in RStudio, the output does not match what the Unicode 14 guidelines say about enclosing marks.
So, for a complete specification of its Unicode support, I think R should state the level at which it supports Unicode, but I could not find this information either. Does anyone know anything about it?
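One way to separate R's string handling from how the result is drawn on screen is to look at the code points R actually stores. A minimal base-R sketch; it assumes that combining the enclosing and combining marks into a single glyph is the job of the front end (RStudio or the terminal) and its font, which this check cannot show.

```r
s <- "\u0061\u20de\u0308\u20dd"

# Code points as stored by R: the base letter "a" followed by three marks.
cp <- utf8ToInt(s)
sprintf("U+%04X", cp)

# nchar() with type = "chars" counts code points, not rendered glyphs.
nchar(s, type = "chars")

# Round trip: R keeps the sequence unchanged.
identical(intToUtf8(cp), s)
```

If all four code points survive the round trip, R has stored the string faithfully and any difference from the Unicode rendering guidelines comes from the display layer.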

Using accented letters in R package documentation using bibtex and roxygen2

I am developing an R package in RStudio (R version 3.6.1; RStudio version 1.2.1335) using roxygen2 (version 6.1.1), and I use the \insertCite{} command together with a BibTeX file to cite references in the documentation for individual functions. I am following the instructions in "Inserting references in Rd and roxygen2 documentation". Everything works fine, except when I try to include a reference with accented characters. My REFERENCES.bib file contains the following entry:
@ARTICLE{Cabcdef15,
author={John {\c C}abcdef},
title={A title},
journal={Journal of Applied Stuff},
year={2015},
volume={81},
number={1},
pages={100--200},
}
The {\c C} is the LaTeX command for a C-cedilla (Ç). (I also tried \c{C} and pasting Ç directly; neither resolved the issue.)
I cited this reference in the roxygen2 preamble for my R function myfunction using \insertCite{Cabcdef15}{mypackage}. However, in the documentation output (after running devtools::document() and devtools::build(), installing the package, and running library(mypackage) and ?myfunction), the citation appears in my browser (Google Chrome) with garbled characters in place of the Ç rather than as (Çabcdef 2015).
Presumably this is an encoding issue. However, from what I read in the aforementioned instructions (under 4.4 Encoding of file REFERENCES.bib) this should be working, provided that I have the line Encoding: UTF-8 in the DESCRIPTION file for my R package, which I do. Hence I am stumped.
I have a strong suspicion you are using a Microsoft operating system.
I have code in a roxygen2 examples block that outputs accented French characters. It works fine with non-French locales on macOS and Linux, but Windows makes a mess of it. I have UTF-8 declared in the package DESCRIPTION. For me, the obvious workaround is not to use Windows for documenting the package: UTF-8 everywhere works well for me, except on Windows. The R documentation links are helpful, and, in a related post, the mighty Yihui Xie writes about this issue.
This WONTFIX R issue also hints at the root cause: Windows.
A more palatable and Windows-compatible workaround is discussed in the platform-specific sections of Writing R Extensions.
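The symptom fits a common pattern: UTF-8 bytes being decoded with a Windows single-byte code page somewhere between the .bib file and the rendered help page. The base-R sketch below reproduces that pattern for Ç. It assumes Windows-1252 (CP1252) as the misapplied code page, which is only an illustrative guess, not a diagnosis of this particular build.

```r
x <- "\u00c7"                      # Ç, U+00C7
bytes <- charToRaw(enc2utf8(x))    # its UTF-8 encoding: two bytes, c3 87
bytes

# Decode those same two bytes as Windows-1252 instead of UTF-8.
garbled <- iconv(rawToChar(bytes), from = "CP1252", to = "UTF-8")
garbled                            # two characters, U+00C3 and U+2021, instead of one
sprintf("U+%04X", utf8ToInt(garbled))
```

Seeing "Ã‡" (or something similar) where Ç should be is a strong hint that a UTF-8 file is being read with a legacy code page at some step.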

Arabic text not showing in R

I have just started working with Arabic in R, as I plan to do text analysis and text mining on a Hadith corpus. I have been reading threads related to my question, but I still can't manage to get the REAL basics here (sorry, absolute beginner).
So, I entered:
textarabic.v <- scan("data/arabic-text.txt", encoding="UTF-8", what= "character",sep="\n")
What comes out in textarabic.v is, of course, symbols. Before this, I saved my text as UTF-8, as I read in a thread, but still nothing shows in Arabic.
I can type Arabic in R, but scan() brings the text in as symbols.
I also read and tried to implement other users' code to make Arabic text work, but I don't even know how and where to implement it.
I added the tm and NLP packages to R.
What do you suggest I do next?
Thanks in advance,
I just posted an answer saying that you must be using R on Windows before I saw your comment that you're on OS X. On OS X the situation is not quite so dire. The problem is that you're using too old a version of R. If I remember rightly, versions prior to 3.2 do not handle Unicode correctly. Try installing 3.3.3 from https://cran.r-project.org/bin/macosx/ and, if necessary, reinstall the packages you need. Then you should be fine. بالتوفيق! (Good luck!)
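A minimal sketch of reading a UTF-8 Arabic file with scan(), assuming a reasonably recent R and a UTF-8 file. The path from the question is replaced by a temporary file holding the word سلام so the example is self-contained. Per the scan() documentation, encoding = "UTF-8" only marks the strings as UTF-8, while fileEncoding = "UTF-8" re-encodes the file's contents as they are read.

```r
# Hypothetical stand-in for data/arabic-text.txt: the UTF-8 bytes of "سلام" plus a newline.
f <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xd8, 0xb3, 0xd9, 0x84, 0xd8, 0xa7, 0xd9, 0x85, 0x0a)), f)

# Same call shape as in the question; quiet = TRUE suppresses the "Read 1 item" message.
textarabic.v <- scan(f, what = "character", sep = "\n",
                     encoding = "UTF-8", quiet = TRUE)

# Four Arabic letters should come back: U+0633 U+0644 U+0627 U+0645.
sprintf("U+%04X", utf8ToInt(textarabic.v))
```

If the code points are right but the console still shows symbols, the problem lies in how the R front end displays text rather than in scan().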

R doesn't recognize Latin7 characters

I have a really strange problem. I am using a Lithuanian keyboard, but R doesn't recognize letters such as į, š, č.
For example when I write:
žodis <- "žibutė"
in R console I see
þodis <- "þibutë".
I have R on several computers, and all work fine except this one. Can you help me with this issue? Is there any function to let R know that I'm using a Lithuanian keyboard? My computer's operating system is Windows 10 and my R version is 3.3.2.
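The garbling is consistent with Lithuanian text in Windows-1257 (the Windows Baltic code page, close to Latin-7, ISO-8859-13) being displayed as Windows-1252: in Windows-1257, byte 0xFE is ž and 0xEB is ė, while Windows-1252 shows those same bytes as þ and ë. The base-R sketch below reproduces that substitution; it illustrates this assumption and is not a confirmed diagnosis. Sys.getlocale("LC_CTYPE") reports which locale the R session uses, a reasonable first thing to compare between the working computers and this one.

```r
word <- "\u017eibut\u0117"                     # "žibutė"

# Encode as Windows-1257 (Baltic) bytes ...
baltic <- iconv(word, from = "UTF-8", to = "CP1257")
# ... then decode those bytes as Windows-1252 (Western European).
shown <- iconv(baltic, from = "CP1252", to = "UTF-8")
shown                                          # "þibutë", as seen in the console

# The session's character-type locale, for comparison across machines.
Sys.getlocale("LC_CTYPE")
```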

knitr, LyX, Mac, Korean

Dear friends in the knitr community using LyX on an Apple Mac,
Hello! I am using an Apple Mac for reproducible research with R. Currently, I enjoy using knitr in LyX to write R scripts. The problem I have is related to non-English characters, especially Korean, in knitr chunks.
When I type any Korean character in a chunk, whether for a file name or for extra explanations in a # comment, the PDF output does not print the Korean properly. Whether or not I choose the option "eval=FALSE", R in LyX does not show Korean in the output. In the ordinary text of the document, however, Korean shows fine. This problem seems to be specific to the Mac, because another LyX environment on Linux (Ubuntu) does not have a similar problem at all. Is it an issue with LyX and MacTeX, or with knitr? Thank you in advance.
regards,
Jong-Hwa
