How can I use R-scripts containing Umlauts cross-platform? - r

I'm using Windows on my desktop PC at work and Linux on my laptop. I frequently work on R scripts in RStudio, alternating between the two machines. Whenever I open a script containing umlauts on one system after working on it with the other, the umlauts (e.g. ä, ü, ß, ö) are replaced with question marks. Importantly, I'm not talking about data that I am importing but about the text in the script itself. For example, writing the following script file in Linux:
# This iß an exämple
text <- c("R kann äußerst nervig sein")
Will be displayed differently when opened on Windows:
# This i? an ex?mple
text <- c("R kann ?u?erst nervig sein")
Are there any settings that prevent this from happening? I've already tried to set the standard encoding to utf-8 on both machines but it didn't seem to change anything.

The standard R build on Windows doesn't fully support UTF-8, because Windows itself just added that capability very recently. So you could download the "WinUCRT" build of R (though I forget the location, Google probably knows), and then things would be fine.
Alternatively, for widest portability you could write your scripts in pure ascii by encoding the accented letters as Unicode escapes. The stringi package can help with this, e.g.
cat(stringi::stri_escape_unicode("R kann äußerst nervig sein"))
#> R kann \u00e4u\u00dferst nervig sein
Created on 2021-11-09 by the reprex package (v2.0.1)
so you'd put this in your code:
text <- "R kann \u00e4u\u00dferst nervig sein"
(There's no need to call c() for one element.) This is inconvenient, but should work on all systems.
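To convince yourself that the escaped form no longer depends on the file's encoding, you can look at the bytes and at the encoding mark R attaches to the string. This is only a rough sketch in base R (it doesn't use stringi), and the variable names are just for illustration:

```r
# The source line below is pure ASCII; the R parser expands the \u escapes,
# so the script file's on-disk encoding cannot corrupt this string.
text <- "R kann \u00e4u\u00dferst nervig sein"

# Strings built from \u escapes carry an explicit UTF-8 mark.
enc_mark <- Encoding(text)

# The two non-ASCII letters take two bytes each in UTF-8,
# so the byte count is larger than the character count.
n_chars <- nchar(text, type = "chars")
n_bytes <- nchar(text, type = "bytes")

# Raw UTF-8 bytes of a-umlaut on its own.
umlaut_bytes <- charToRaw("\u00e4")

print(enc_mark)
print(c(chars = n_chars, bytes = n_bytes))
print(umlaut_bytes)
```

If the encoding mark or the bytes differ between your two machines for the same escaped string, the problem is probably somewhere other than the script file.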

Related

Problem with spell checking packages in R

I'm trying to spell-check some Russian words using the "hunspell" library in R.
bad_words <- hunspell("Язвенная болзень", dict='ru_RU.dic')
I have installed Russian dictionary, from here: https://code.google.com/archive/p/hunspell-ru/
It is encoded as UTF-8. However, I get the following error:
Failed to convert line 1 to ISO8859-1 encoding. Try spelling with a UTF8 dictionary.
This seems strange, since neither the dictionary nor the R file is encoded as ISO8859-1...
What is the problem?
If you are operating on Windows, my first guess would be that this is related to the lack of native UTF-8 support in R on Windows. This will be resolved when R 4.2 is released; you might wish to try the development release and see whether the problem persists.
Another thing to check is whether your DESCRIPTION file contains the line Encoding: UTF-8, such that your source files are treated as having this encoding.
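Before blaming the dictionary, it may also help to check how R itself sees the input string. Here is a minimal base-R sketch; it does not call hunspell, and the phrase from the question is written with \u escapes only so that the snippet stays ASCII:

```r
# The phrase from the question, written with Unicode escapes.
words <- "\u042f\u0437\u0432\u0435\u043d\u043d\u0430\u044f \u0431\u043e\u043b\u0437\u0435\u043d\u044c"

info <- list(
  mark        = Encoding(words),   # "UTF-8" means R knows the string's encoding
  valid       = validUTF8(words),  # TRUE if the bytes form valid UTF-8
  n_chars     = nchar(words),      # characters, not bytes
  # Whether the current session runs in a UTF-8 locale; this is typically
  # FALSE on Windows builds of R without native UTF-8 support.
  utf8_locale = isTRUE(l10n_info()[["UTF-8"]])
)
str(info)
```

If utf8_locale comes back FALSE, the string may be converted to the native encoding somewhere on its way to the spell checker, which would be consistent with the ISO8859-1 message.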

Using accented letters in R package documentation using bibtex and roxygen2

I am developing an R package in RStudio (R version 3.6.1; RStudio version 1.2.1335) using roxygen2 (version 6.1.1) and am using the \insertCite{} command together with a bibtex file in order to cite references in the documentation for individual functions. I am following the instructions Inserting references in Rd and roxygen2 documentation. Everything works fine, except when I try to include a reference with accented characters. So my REFERENCES.bib file contains the following entry:
@ARTICLE{Cabcdef15,
author={John {\c C}abcdef},
title={A title},
journal={Journal of Applied Stuff},
year={2015},
volume={81},
number={1},
pages={100--200},
}
The {\c C} is the LaTeX command for a C-cedilla (Ç). (I also tried \c{C} and pasting Ç directly and neither resolved the issue.)
I cited this reference in the roxygen2 preamble for my R function myfunction using \insertCite{Cabcdef15}{mypackage}. However, in the documentation output (after running devtools::document() and devtools::build(), installing the package and running library(mypackage) and ?myfunction), the citation appears in my browser (Google Chrome) with the Ç garbled rather than as (Çabcdef 2015).
Presumably this is an encoding issue. However, from what I read in the aforementioned instructions (under 4.4 Encoding of file REFERENCES.bib) this should be working, provided that I have the line Encoding: UTF-8 in the DESCRIPTION file for my R package, which I do. Hence I am stumped.
I have a strong suspicion you are using a Microsoft operating system.
I have code in a roxygen2 examples block that outputs accented French characters. It works fine with non-French locales on macOS and Linux, but Windows makes a mess of it, even though I have UTF-8 in the package DESCRIPTION. For me, the obvious workaround is not to use Windows for documenting the package. UTF-8 everywhere seems to work well for me, except on Windows. The R documentation links are helpful, and, in a related post, the mighty Yihui Xie writes about this issue.
This WONTFIX R issue also hints at the root cause: Windows.
A more palatable and Windows-compatible workaround is discussed in platform specific sections in Writing R Extensions.
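As far as I understand it, the portable route is to keep source files ASCII and write non-ASCII characters as \uxxxx escapes. Here is a rough base-R sketch for finding the lines that would need attention. Note that find_non_ascii is a hypothetical helper, not part of roxygen2 or Rdpack, and the temp file only stands in for a real REFERENCES.bib:

```r
# Hypothetical helper: return the line numbers of a file that contain
# bytes outside the ASCII range.
find_non_ascii <- function(path) {
  lines <- readLines(path, warn = FALSE)
  has_high_byte <- vapply(
    lines,
    # Any byte value above 127 means the line is not pure ASCII.
    function(l) any(as.integer(charToRaw(l)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
  which(has_high_byte)
}

# Throwaway demo file; only the second line contains a C-cedilla.
demo <- tempfile(fileext = ".bib")
writeLines(
  c("author={John Smith},", "author={John \u00c7abcdef},", "year={2015},"),
  demo,
  useBytes = TRUE  # write the UTF-8 bytes unchanged
)

flagged <- find_non_ascii(demo)
print(flagged)
```

Running this over the R/ directory and the .bib file of a package should show which lines can behave differently on Windows.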

R I want to remove those red lines

I want to remove those red lines in RStudio.
I upgraded to the latest version, according to someone's suggestion.
But it is not working.
The problem occurs when I write Korean words.
The default encoding is UTF-8.
I found a similar problem here, but it didn't work for me.
https://community.rstudio.com/t/why-and-where-is-a-an-unexpected-token-in-r-and-how-should-i-deal-with-it/26496/4
df$번호
df$이름
df$성별
This is a bug -- unfortunately, the RStudio diagnostics system does not correctly handle multibyte characters in R Markdown documents on Windows. This will hopefully be fixed in the next release (v1.3).

Arabic text not showing in R-

Just started working with R in Arabic as I plan to do text analysis and text mining with Hadith corpus. I have been reading threads related to my question but nevertheless, still can't manage to get the REAL basics here (sorry, absolute beginner).
So, I entered:
textarabic.v <- scan("data/arabic-text.txt", encoding="UTF-8", what= "character",sep="\n")
What textarabic.v then contains is, of course, just symbols. Before this, I saved my text as UTF-8, as I read in a thread, but still nothing shows in Arabic.
I can type Arabic in R, but scan brings the text in as symbols.
I also read and tried to implement other users' code to make Arabic text work, but I don't even know how and where to implement it.
I added the tm and NLP packages to R.
What do you suggest for me to do next?
Thanks in advance,
I had just posted an answer saying that you must be using R on Windows when I saw your comment that you're on OSX. On OSX the situation is not quite so dire. The problem is that your version of R is too old. If I remember correctly, anything prior to 3.2 does not handle Unicode correctly. Try installing 3.3.3 from https://cran.r-project.org/bin/macosx/ and, if necessary, re-install the packages you need. Then you should be fine. بالتوفيق!
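Independently of the R version, it is worth confirming that a UTF-8 file round-trips through scan unchanged. Here is a minimal sketch, assuming a throwaway temp file rather than data/arabic-text.txt, with the Arabic word written as escapes so the snippet stays ASCII:

```r
# The closing word of the answer above, written with Unicode escapes.
arabic <- "\u0628\u0627\u0644\u062a\u0648\u0641\u064a\u0642"

path <- tempfile(fileext = ".txt")
writeLines(arabic, path, useBytes = TRUE)  # write the UTF-8 bytes unchanged

# encoding = "UTF-8" only marks the strings as UTF-8; it does not convert them.
from_scan <- scan(path, what = "character", sep = "\n",
                  encoding = "UTF-8", quiet = TRUE)
from_lines <- readLines(path, encoding = "UTF-8")

# Compare the raw bytes so the check does not depend on how the console prints.
same_bytes <- identical(charToRaw(from_scan), charToRaw(arabic))

print(same_bytes)
print(Encoding(from_scan))
print(R.version.string)  # the version the answer suggests checking
```

If same_bytes is TRUE but the console still shows symbols, the data are intact and the remaining problem is only how the text is displayed.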

The wrong symbols in place of Cyrillic in Rpres file (RStudio)

I have a problem: a presentation compiled in RStudio doesn't produce the correct output. Instead of Cyrillic characters (Ukrainian/Russian) I get other symbols (�).
Could you help me with Cyrillic?
So far I have not found any clear hints on how to fix this.
PC:
win7 bit
RAM 8Gb
R 3.3.0 64bit
Rstudio - Version 0.99.486
Try "save with encoding" (UTF-8 or whatever is the right encoding).
You can also change the default encoding in RStudio from Tools --> Global options --> Default text encoding.
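Before picking an encoding in that dialog, it may help to check whether the file on disk is really UTF-8. Here is a small base-R sketch. Note that is_utf8_file is a hypothetical helper, and the two temp files only simulate a UTF-8 save and a Windows-1251 save of the same word:

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
is_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

utf8_file   <- tempfile(fileext = ".Rpres")
cp1251_file <- tempfile(fileext = ".Rpres")

# A Russian word ("privet"), written with escapes to keep this snippet ASCII.
word <- "\u041f\u0440\u0438\u0432\u0435\u0442"

# Save it once as UTF-8 bytes plus a newline ...
writeBin(c(charToRaw(word), as.raw(0x0a)), utf8_file)
# ... and once as the Windows-1251 bytes of the same word, written directly.
writeBin(as.raw(c(0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2, 0x0a)), cp1251_file)

utf8_ok   <- is_utf8_file(utf8_file)
cp1251_ok <- is_utf8_file(cp1251_file)
print(c(utf8 = utf8_ok, cp1251 = cp1251_ok))
```

A FALSE result suggests the file was saved in a legacy encoding, so it would need to be reopened with that encoding and then re-saved as UTF-8.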
