Unicode version and conformance info in R

Unicode has many versions.
The current version is 14.
The documentation of the utf8 package explicitly says it supports Unicode 10.0.0.
But what about R itself? I cannot find any statement about which version of Unicode R supports.
Unicode also defines different levels of support (conformance) that an implementation can provide.
When I print '\u0061\u20de\u0308\u20dd' in RStudio,
the output is not what the Unicode 14 guidelines say for enclosing marks.
So, for a complete specification of its Unicode support,
I think R should state the level at which it supports Unicode, but I could not find any information about this either. Does anyone know anything about it?
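A small base-R sketch can at least show what R itself stores for the string in the question. It does not answer which Unicode version R conforms to; it only lists the code points R holds and the versions of the external libraries R was built against, via `extSoftVersion()`. The idea that ICU carries its own Unicode data (used, for example, for collation) is background knowledge rather than something stated in the question, and how combining and enclosing marks are drawn is presumably up to the front end (RStudio, the console) and its fonts, not R.

```r
# The string from the question: 'a' followed by three combining/enclosing marks
x <- "\u0061\u20de\u0308\u20dd"

# List the code points R stores, in the usual U+XXXX notation
cps <- sprintf("U+%04X", utf8ToInt(x))
print(cps)

# nchar(type = "chars") counts code points, not grapheme clusters
n_chars <- nchar(x, type = "chars")
print(n_chars)

# Versions of external libraries (including ICU, if R was built with it)
print(extSoftVersion())
```

Here `cps` should be `"U+0061" "U+20DE" "U+0308" "U+20DD"` and `n_chars` is 4, which suggests R passes all four code points through unchanged and leaves their visual composition to whatever renders the text.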

Related

Problem with spell checking packages in R

I'm trying to spell-check some Russian words using the "hunspell" package in R.
bad_words <- hunspell("Язвенная болзень", dict='ru_RU.dic')
I have installed the Russian dictionary from here: https://code.google.com/archive/p/hunspell-ru/
It is encoded in UTF-8. However, I get the following error:
Failed to convert line 1 to ISO8859-1 encoding. Try spelling with a UTF8 dictionary.
This seems strange, since neither the dictionary nor the R file is encoded in ISO8859-1...
What is the problem?
If you are operating on Windows, my first guess would be that this is related to the lack of native UTF-8 support in R on Windows. This will be resolved when R 4.2 is released; you might wish to try the development release and see whether the problem persists.
Another thing to check is whether your DESCRIPTION file contains the line Encoding: UTF-8, such that your source files are treated as having this encoding.
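The error message says a conversion to ISO8859-1 failed. Cyrillic letters have no code points in ISO 8859-1, so any attempt to convert this text to that encoding has to fail. The base-R sketch below reproduces that failure without hunspell, using `iconv()` with `"latin1"` (R's name for ISO 8859-1, which all R platforms support), and also checks whether the session's native encoding is UTF-8. It is a diagnostic illustration under these assumptions, not the hunspell package's own code path.

```r
# "Язвенная" written with \u escapes, so the script file's encoding does not matter
x <- "\u042f\u0437\u0432\u0435\u043d\u043d\u0430\u044f"

# The string itself is valid UTF-8
ok_utf8 <- validUTF8(x)
print(ok_utf8)

# Cyrillic cannot be represented in latin1 (ISO 8859-1): iconv() returns NA
latin1 <- iconv(x, from = "UTF-8", to = "latin1")
print(is.na(latin1))

# TRUE if the current session uses UTF-8 as its native encoding
print(l10n_info()[["UTF-8"]])
```

If `l10n_info()[["UTF-8"]]` is `FALSE` (typical for R before 4.2 on Windows), text may be converted to the native single-byte encoding somewhere along the way, which fits the first guess above.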

Using accented letters in R package documentation using bibtex and roxygen2

I am developing an R package in RStudio (R version 3.6.1; RStudio version 1.2.1335) using roxygen2 (version 6.1.1), and I am using the \insertCite{} command together with a BibTeX file to cite references in the documentation for individual functions. I am following the instructions in the Rdpack vignette "Inserting references in Rd and roxygen2 documentation". Everything works fine, except when I try to include a reference with accented characters. My REFERENCES.bib file contains the following entry:
@ARTICLE{Cabcdef15,
author={John {\c C}abcdef},
title={A title},
journal={Journal of Applied Stuff},
year={2015},
volume={81},
number={1},
pages={100--200},
}
The {\c C} is the LaTeX command for a C-cedilla (Ç). (I also tried \c{C} and pasting Ç directly and neither resolved the issue.)
I cited this reference in the roxygen2 preamble for my R function myfunction using \insertCite{Cabcdef15}{mypackage}. However, in the documentation output (after running devtools::document() and devtools::build(), installing the package, and running library(mypackage) and ?myfunction), the citation does not appear in my browser (Google Chrome) as (Çabcdef 2015): the Ç is rendered as garbled characters.
Presumably this is an encoding issue. However, from what I read in the aforementioned instructions (under 4.4 Encoding of file REFERENCES.bib) this should be working, provided that I have the line Encoding: UTF-8 in the DESCRIPTION file for my R package, which I do. Hence I am stumped.
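One way to see why this looks like an encoding issue: in UTF-8, Ç is stored as two bytes, and if those bytes are decoded as a single-byte encoding, one character turns into two. The sketch below reproduces that mechanism in base R by re-declaring the bytes as latin1. It only illustrates the general effect and does not diagnose where Rdpack, roxygen2, or the help system actually misreads the file. With latin1 the second byte (0x87) is an invisible control character; in Windows-1252 the same byte would display as ‡.

```r
x <- "\u00c7"                          # "Ç" as a UTF-8 string
bytes <- charToRaw(enc2utf8(x))
print(bytes)                           # its UTF-8 bytes: c3 87

# Pretend those same bytes were latin1 text (bytes unchanged, only the label)
garbled <- x
Encoding(garbled) <- "latin1"

# Converting back to UTF-8 now yields two code points instead of one
garbled_utf8 <- enc2utf8(garbled)
print(utf8ToInt(garbled_utf8))         # 195 135, i.e. U+00C3 and U+0087
```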
I have a strong suspicion you are using a Microsoft operating system.
I have code in a roxygen2 examples block which outputs accented French characters. It works fine with non-French locales on macOS and Linux, but Windows makes a mess of it. I have Encoding: UTF-8 in the package DESCRIPTION. For me, the obvious workaround is not to use Windows for documenting the package. UTF-8 everywhere seems to work well for me, except on Windows. The R documentation links are helpful, and, in a related post, the mighty Yihui Xie writes about this issue.
This WONTFIX R issue also hints at the root cause: Windows.
A more palatable and Windows-compatible workaround is discussed in platform specific sections in Writing R Extensions.

R - How to set text language when exporting plots with devEMF

I work with RStudio on Kubuntu 16.04. My language settings are:
> Sys.getlocale()
[1] "LC_CTYPE=de_AT.UTF-8;LC_NUMERIC=C;LC_TIME=de_AT.UTF-8;LC_COLLATE=de_AT.UTF-8;LC_MONETARY=de_AT.UTF-8;LC_MESSAGES=de_AT.UTF-8;LC_PAPER=de_AT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_AT.UTF-8;LC_IDENTIFICATION=C"
> Sys.getenv()
...
LANG de_AT.UTF-8
LANGUAGE de_AT:de
...
However, if I export a plot with the "Enhanced Metafile Graphics Device" (devEMF), for instance:
emf("file.emf"); hist(somedata, main = "Überschrift"); dev.off()
and then import file.emf into MS Word (on another PC) and make it editable, all the text of the plot is in US English.
Question 1: Is it possible to obtain plots with text languages other than English?
Question 2: How?
(I realize my answer is late, but adding information for posterity)
There are two graphics formats supported by devEMF:
EMF. Locale / language information is not included in the EMF format specification except to distinguish languages with vertical vs. horizontal text.
EMF+. Locale / language information IS included in the EMF+ specification, but this feature is not implemented by devEMF.
That said, Unicode characters are fully supported for both formats and should not disappear or otherwise change when viewed/edited in MS Word.
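Since devEMF offers no way to tag the text language (as described above), the practical point is that non-ASCII text itself should survive. The sketch below writes the German title with a `\u` escape so it does not depend on the encoding of the script file, and only runs the export if devEMF is installed. The temporary file and random data are hypothetical stand-ins for the question's `file.emf` and `somedata`; only the `file` argument of `devEMF::emf()` is used.

```r
# "Überschrift" written with an escape, independent of the script's encoding
title_text <- "\u00dcberschrift"

if (requireNamespace("devEMF", quietly = TRUE)) {
  out <- tempfile(fileext = ".emf")   # hypothetical output path
  devEMF::emf(out)
  hist(rnorm(100), main = title_text) # hypothetical data in place of somedata
  invisible(dev.off())
}
```

Note that this does not change the language MS Word assigns to the text after import; it only makes sure the characters themselves are passed to the device as Unicode.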

Chinese in stylo (R and R studio)

I'm quite new to R and RStudio, but I wanted to use the stylo package for stylometric analyses.
I collected two Chinese corpora in .txt files and followed the instructions on https://sites.google.com/site/computationalstylistics/, but neither R nor RStudio seemed to be able to read the Chinese text.
I know Chinese isn't one of the languages supported by default in the stylo package, but here they mentioned that results for Chinese with the stylo package are 'quite promising'.
Also, I can't seem to type Chinese characters in R, although I can in RStudio. Am I doing something wrong? I believe my files are encoded in Unicode.
I hope you can help me.
Thomas
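Before looking at stylo itself, it can help to check that base R reads a UTF-8 text file correctly when told its encoding. The sketch below writes a hypothetical sample file with Chinese characters to a temporary path and reads it back with `readLines(..., encoding = "UTF-8")`, which marks the strings as UTF-8 instead of assuming the session's native encoding. Whether and how stylo's own corpus-loading functions expose an encoding setting is not covered here.

```r
# Hypothetical sample text: "中文文本", written with \u escapes
txt <- "\u4e2d\u6587\u6587\u672c"
path <- tempfile(fileext = ".txt")

# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the session locale
writeLines(txt, path, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8
lines <- readLines(path, encoding = "UTF-8")
print(Encoding(lines))
print(nchar(lines))
```

If this works but the real corpus does not, the files may not actually be UTF-8 (for example, GBK or UTF-16 from some Windows editors), which would be worth checking first.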

R documentation, how to force Rcmd Rd2pdf to automatically break an overfull line in examples R-like section?

My package passed Rcmd check successfully, but in the generated PDF version of the R documentation, half of one line of R code in the examples section runs off the page. I also found another package on CRAN, ftsa, that suffers from an overfull line too; see the ftsa reference manual.
I guess this problem is rooted in the behavior of the verbatim environment in LaTeX. There are some LaTeX packages to deal with this (https://tex.stackexchange.com/questions/14342/verbatim-environment-that-can-break-too-long-lines), but I do not know how to use them with Rcmd.
Why does Rcmd check not show any error, warning, or note in these cases, as LaTeX does?
How can I force line breaking in examples section?
Thank you
R CMD check does not analyze / forward all LaTeX warnings. Yes, in an ideal world it would.
It has always been the case that you should format the \examples{ ... } section well yourself, notably including sensible line breaks (and leading spaces) for nice alignment of multi-line examples.
Use the sources of R itself or of the recommended packages (or those authored by me, as I do pay attention to this quite a bit) to see good examples in their *.Rd files.
Remember: the current development sources of R are always accessible on the web in the R (devel) source tree at svn.r-project.org; R's standard packages are in src/library of the R sources.
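Because check may not report these overflows, a quick self-check before building the PDF can help. The sketch below extracts the example code from an Rd file with `tools::Rd2ex()` and flags lines longer than a chosen width. The Rd content is a hypothetical demo written to a temporary file, and the 80-character limit is an assumption: the width at which a line actually overflows the PDF depends on the fonts and page layout used by Rd2pdf. The second example call also shows the kind of manual line break and alignment recommended above.

```r
# Hypothetical Rd file with one deliberately long example line
rd_file <- tempfile(fileext = ".Rd")
long_line <- paste0("x <- c(", paste(1:40, collapse = ", "), ")")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{Demo}",
  "\\description{Demo.}",
  "\\examples{",
  long_line,
  "y <- c(1, 2,",
  "       3, 4)",
  "}"
), rd_file)

# Extract the R code of the examples section
ex_file <- tempfile(fileext = ".R")
tools::Rd2ex(rd_file, out = ex_file)
ex_lines <- readLines(ex_file)

# Flag lines wider than an assumed limit
max_width <- 80
too_wide <- ex_lines[nchar(ex_lines) > max_width]
print(too_wide)
```

To scan a real package, the same loop could be run over the files in its `man/` directory.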
