Warning about UTF-8 with roxygen2

I have a problem with UTF-8.
After running roxygen2::roxygenise() on my package,
I get the warning message 'roxygen2 requires Encoding: UTF-8'.
How can I fix it?
roxygen2::roxygenise()
Writing NAMESPACE
Loading ABXTT
Writing NAMESPACE
Warning message:
roxygen2 requires Encoding: UTF-8

Add:
Encoding: UTF-8
to your DESCRIPTION file.
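If you prefer to add the field from R rather than by hand, here is a minimal base-R sketch using read.dcf() and write.dcf(); the function name set_utf8_encoding() is hypothetical. write.dcf() may reformat some fields, so check the diff afterwards; desc::desc_set(Encoding = "UTF-8") from the desc package is a more careful alternative.

```r
# Sketch: make sure a DESCRIPTION file declares Encoding: UTF-8.
# set_utf8_encoding() is a hypothetical helper, not part of any package.
set_utf8_encoding <- function(desc_path = "DESCRIPTION") {
  fields <- read.dcf(desc_path)                   # one-row character matrix
  if ("Encoding" %in% colnames(fields)) {
    fields[, "Encoding"] <- "UTF-8"               # overwrite an existing value
  } else {
    fields <- cbind(fields, Encoding = "UTF-8")   # append the missing field
  }
  write.dcf(fields, file = desc_path)
  invisible(fields)
}

# Usage on a throwaway DESCRIPTION file
tmp <- tempfile()
writeLines(c("Package: demo", "Version: 0.1.0"), tmp)
set_utf8_encoding(tmp)
read.dcf(tmp, fields = "Encoding")
```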

As James said, add
Encoding: UTF-8
to your DESCRIPTION file to eliminate the warning. The format of the DESCRIPTION file is documented on CRAN (in the "Writing R Extensions" manual). That standard includes this field to tell the rest of the build process which file encoding is used. R assumes ASCII by default, so if you use anything other than ASCII, this field is needed.
As it turns out, roxygen2 reads and writes files as UTF-8 as of version 6.1.0. As noted above, this conflicts with the CRAN default of ASCII, so at some point roxygen2 may stop writing files as UTF-8. Until then, your package documentation is written as UTF-8, and you will probably need to add this line to avoid the warning.
Side note: ASCII is a subset of UTF-8, so a UTF-8 file that contains no characters outside the ASCII range (i.e. no special characters) is byte-for-byte identical to an ASCII file. In practice, then, whether you remember to include this field may make no difference.
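To find out whether the field matters for your text, you can check whether any string leaves the ASCII range. A minimal sketch; the helper name is_pure_ascii() is hypothetical and simply tests every code point against 128:

```r
# Sketch: TRUE for each string whose characters all lie in 7-bit ASCII.
is_pure_ascii <- function(x) {
  vapply(enc2utf8(x), function(s) all(utf8ToInt(s) < 128L),
         logical(1), USE.NAMES = FALSE)
}

is_pure_ascii(c("plain text", "caf\u00e9"))
#> [1]  TRUE FALSE
```

For whole files, base R's tools::showNonASCIIfile() prints the lines that contain non-ASCII characters.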

Related

Problem with spell checking packages in R

I'm trying to spell-check some Russian words using the hunspell package in R.
library(hunspell)
# "Язвенная болзень" is roughly "ulcer disease"; "болзень" is a misspelling of "болезнь"
bad_words <- hunspell("Язвенная болзень", dict = 'ru_RU.dic')
I have installed the Russian dictionary from here: https://code.google.com/archive/p/hunspell-ru/
It is UTF-8 encoded. However, I get the following error:
Failed to convert line 1 to ISO8859-1 encoding. Try spelling with a UTF8 dictionary.
This seems strange, since neither the dictionary nor my R file is encoded as ISO8859-1...
What is the problem?
If you are working on Windows, my first guess would be that this is related to the lack of native UTF-8 support in older versions of R on Windows. R 4.2.0 and later use UTF-8 as the native encoding on recent versions of Windows, so it is worth upgrading and checking whether the problem persists.
Another thing to check is whether your DESCRIPTION file contains the line Encoding: UTF-8, so that your source files are treated as having this encoding.
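When chasing an error like this, it helps to confirm which encoding the R session actually uses. A small diagnostic sketch using base R's l10n_info(), Sys.getlocale() and Encoding(); the Russian word is written with \u escapes only so that the script itself stays ASCII:

```r
# Diagnostic sketch: what encoding is this R session using?
x <- "\u042f\u0437\u0432\u0435\u043d\u043d\u0430\u044f"  # "Язвенная" via \u escapes
info <- l10n_info()
info[["UTF-8"]]            # TRUE if the native encoding is UTF-8
Sys.getlocale("LC_CTYPE")  # a locale ending in ".1252" means a Windows code page
Encoding(x)                # the declared encoding of the string
R.version.string           # R >= 4.2.0 can use UTF-8 natively on recent Windows
```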

Encoding problem when your package contains functions with non-English characters

I am building my own package, and I keep running into encoding issues because the functions in my package contain non-English (non-ASCII) characters.
Korean characters are an inherent part of many of the functions in my package. A sample function:
library(rvest)
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  # "연결재무제표 주석" means "notes to the consolidated financial statements"
  result <- grepl("연결재무제표 주석", html_text(read_html(url)))
  return(result)
}
However, when installing the package I run into encoding problems.
I created a sample package (https://github.com/hyk0127/KorEncod/) with just this one function and uploaded it to GitHub as a reproducible example. I run the following code to install it:
library(devtools)
install_github("hyk0127/KorEncod")
Below is the error message that I see:
Error : (converted from warning) unable to re-encode 'hello.R' line 7
ERROR: unable to collate and parse R files for package 'KorEncod'
* removing 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
* restoring previous 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
Error: Failed to install 'KorEncod' from GitHub:
(converted from warning) installation of package ‘C:/Users/myname/AppData/Local/Temp/RtmpmS5ZOe/file48c02d205c44/KorEncod_0.1.0.tar.gz’ had non-zero exit status
The error message about line 7 refers to the Korean characters in the function.
It is possible to install the package locally from the tar.gz file, but then the function does not run as intended, because the Korean characters are read in the wrong encoding and come out garbled.
This cannot be the first time that someone has tried building a package with non-English (or non-ASCII) characters, and yet I couldn't find a solution. Any help will be deeply appreciated.
A few pieces of info that I think are related:
The DESCRIPTION file currently specifies "Encoding: UTF-8".
I have used Sys.setlocale() to set the locale to Korean and back, to no avail.
I have added the roxygen tag @encoding UTF-8 to the function, also to no avail.
I am currently using Windows with the administrative language set to English. I have tried a different laptop with Windows and the administrative language set to Korean, and the same problem appears.
The key trick is replacing the non-ASCII characters with their Unicode escapes, i.e. the \uxxxx encoding.
These can be generated with the stringi::stri_escape_unicode() function.
Note that R CMD check expects the Korean characters to be removed from your code entirely, so you will have to do this for every R script in the package: copy each string, re-encode it with {stringi} at the console, and paste the result back.
I am not aware of an available automated solution for this problem.
In the specific example provided, the escaped code would read like this:
sampleprob <- function(url) {
  # stringi::stri_escape_unicode("연결재무제표 주석") to get the \uxxxx codes
  result <- grepl("\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d",
                  rvest::html_text(xml2::read_html(url)))
  return(result)
}
sampleprob("http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851")
[1] TRUE
This will be a hassle, but it seems to be the only way to make your code platform-neutral (a key CRAN requirement, and therefore enforced by R CMD check).
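The escaping step itself can also be done in base R, without stringi. A minimal sketch, assuming you only need to escape the contents of string literals; the helper name escape_non_ascii() is hypothetical and is not the stringi implementation. ASCII characters, including quotes and backslashes, are left untouched, so paste the result back inside the original quotes. For the Korean string above it produces the same escapes as in the answer's code.

```r
# Base-R sketch: turn non-ASCII characters into \uxxxx (or \Uxxxxxxxx) escapes.
escape_non_ascii <- function(x) {
  cps <- utf8ToInt(enc2utf8(x))               # Unicode code points
  chars <- intToUtf8(cps, multiple = TRUE)    # one string per code point
  out <- ifelse(cps < 128L, chars,            # keep ASCII as-is
                ifelse(cps <= 0xFFFFL,
                       sprintf("\\u%04x", cps),   # BMP characters
                       sprintf("\\U%08x", cps)))  # characters beyond the BMP
  paste(out, collapse = "")
}

# The input is written with escapes here only to keep this example ASCII;
# interactively you would paste the Korean text directly.
escaped <- escape_non_ascii("\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d")
writeLines(escaped)
#> \uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d
```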
Adding this for future readers facing similar problems: you can also solve this by saving the non-ASCII characters in a data file, then loading the value and using it.
First, save the characters as a data file (using the standard package folder names, usethis, and roxygen2):
# In your package, save this as a separate script in ./data-raw
kor_chrs <- list(sampleprob = "연결재무제표 주석")
usethis::use_data(kor_chrs)
Then use the data in your functions:
# This is the R file for the function, in the ./R folder
#' @importFrom rvest html_text
#' @importFrom xml2 read_html
#' @export
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  result <- grepl(kor_chrs$sampleprob[1], html_text(read_html(url)))
  return(result)
}
This is admittedly still a workaround, but it runs on Windows machines without any trouble.
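Why this works can be shown with a save()/load() round trip: an .rda file stores the string's bytes together with its encoding mark, so no non-ASCII R source has to be parsed at install time. A minimal sketch; the temporary file stands in for data/kor_chrs.rda, and save() is only roughly what usethis::use_data() does under the hood.

```r
# Sketch: a UTF-8 string survives a save()/load() round trip unchanged.
kor_chrs <- list(sampleprob = "\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d")

rda <- tempfile(fileext = ".rda")
save(kor_chrs, file = rda)        # stand-in for data/kor_chrs.rda

restored <- new.env()
load(rda, envir = restored)
identical(restored$kor_chrs$sampleprob, kor_chrs$sampleprob)
#> [1] TRUE
```

If the strings are only needed inside the package's own functions, usethis::use_data(kor_chrs, internal = TRUE) stores them in R/sysdata.rda instead of exposing them as a documented dataset.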

R GitHub package with devtools: warning unknown macro '\item'

I made a package with the help of RStudio, devtools (for NAMESPACE and DESCRIPTION) and roxygen2 (for the man pages). This worked fine, and the help pages I had recently added worked too. I then decided to add the author name, email, and some details: initially by manually editing the man page file (BAD), then by editing the roxygen2 comments in the R script and pushing the change to the .Rd file with document().
But when I install my package with
devtools::install_github('SimonDedman/gbm.auto')
I get the following warnings:
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:35: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:37: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:39: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:41: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:43: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:45: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:47: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:49: unknown macro '\item'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:51: unexpected section header '\value'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:55: unexpected section header '\description'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:65: unexpected section header '\examples'
Warning: /tmp/RtmpNladba/devtools27303e05b1fc/SimonDedman-gbm.auto-dbe3cb0/man/gbm.valuemap.Rd:69: unexpected END_OF_INPUT '
'
Those items are just simple @param entries which I haven't changed; they look fine and worked before. The same goes for the value / description / examples sections, which are all standard (these are probably downstream errors that will disappear once the upstream issue is fixed).
Can anyone think of what might cause this? None of my help pages are clickable now, even though you would expect that, whatever the problem is with this one script (gbm.valuemap.R), the others should be fine.
Thanks in advance.
RStudio can help with package development in several ways:
Use the build and reload tools in the Build pane of a package project. You can build the package, update the documentation (you may want to check the roxygen2 options in the project settings, since some are not turned on by default), and load the package in one click. You would have caught this error earlier, and you don't need to install manually to test.
Use the preview feature for .Rd files.
Another method is to compare the working and non-working versions of the source file and the .Rd file to find the difference.
In general, avoid editing generated files directly.
To make this answer more complete, here are tips from @Benjamin about formatting in Rd files:
It may be the & in lines like "Import with (e.g.) read.csv & specify",
although I'm not entirely sure. I recall some of the punctuation being
problematic, especially %. Changing & to "and" might be worth a shot.
N.B. Changing % to \% in the .Rd documentation file, or in the underlying roxygen comment, should prevent issues related to the % symbol. This is because Rd, like LaTeX, treats an unescaped % as the start of a comment, so the rest of that line is discarded.
From @Thomas:
This can also come up if you have a stray { somewhere.
For me, this was happening when I used \n in the description of a function in a package I was writing (I was describing what the backslash-n escape sequence means). I fixed it by escaping the escape sequence itself in the function's .Rd file. Example:
BAD/ERRORED:
\description{
Blah blah and `\n` is an example of blah blah blah
}
FIXED/SOLUTION:
\description{
Blah blah and `\\n` is an example of blah blah blah
}
For me the issue was using the "%" sign in the text. When I removed it or escaped it ("\%"), it no longer threw an error. If you open the roxygen-generated .Rd file in the editor, it can give you a hint: for me, the text after the % was coloured differently.
I had the same error with the unknown macro '\item' and resolved it by removing a repeated @author XXX line from the file. It seems that repeating roxygen tags that are not meant to be repeated may raise that error.
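Several of the fixes above come down to escaping characters that Rd treats specially. A minimal sketch of a helper (hypothetical name escape_rd_text()) that backslash-escapes \, %, { and } in plain text, i.e. what the \% and \\n fixes above do by hand. It assumes the text goes into a free-text section such as \description: R-like sections such as \code{} and \examples{} follow different escaping rules, and roxygen2's markdown mode does its own processing, so check the generated .Rd file.

```r
# Sketch: escape characters that Rd treats specially in plain-text sections.
escape_rd_text <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]                     # split into characters
    special <- chars %in% c("\\", "%", "{", "}")
    chars[special] <- paste0("\\", chars[special])    # prefix with a backslash
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

writeLines(escape_rd_text(c("50% of cases", "`\\n` is a newline")))
#> 50\% of cases
#> `\\n` is a newline
```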

Documenting special (infix) functions in R packages

In an earlier post, I asked about declaring such functions in R packages and making them work. Having succeeded, I'm now trying to document one such function.
I created an Rd file with the function's name as the title, but when running R CMD check, I get the following warning:
* checking for missing documentation entries ... WARNING
Undocumented code objects:
'%IN%'
I tried several names such as %IN%.Rd or '%IN%'.Rd, to no avail. Any hints on how to make this work?
The go-to guide is section 2.1.1, "Documenting functions"[1], of the "Writing R Extensions" manual. As @joran pointed out in a comment, the important part may be the use of an \alias. According to the "Writing R Extensions" manual, the % signs need to be escaped at least in \alias and in the text. About \name it states: "[name should not contain] ‘!’ ‘|’ nor ‘@’, and to avoid possible problems with the HTML help system it should not contain ‘/’ nor a space. (LaTeX special characters are allowed, but may not be collated correctly in the index.)"[2] and about \alias: "Percent and left brace need to be escaped by a backslash."[3]
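A minimal roxygen2 sketch, assuming a hypothetical case-insensitive %IN% (the question does not show what the real operator does). roxygen2 should generate the \alias and \usage entries, with % escaped, from the backtick-quoted name. For a hand-written Rd file, the relevant lines would look like those in the header comment; \name{IN} there is an arbitrary choice that simply avoids special characters.

```r
# Hypothetical %IN% operator documented with roxygen2.
# A hand-written Rd file would instead need lines such as:
#   \name{IN}
#   \alias{\%IN\%}
#   \usage{x \%IN\% table}

#' Case-insensitive value matching
#'
#' @param x,table Character vectors.
#' @return A logical vector as long as `x`.
#' @export
`%IN%` <- function(x, table) {
  tolower(x) %in% tolower(table)
}

c("Apple", "kiwi") %IN% c("APPLE", "banana")
#> [1]  TRUE FALSE
```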

Non-English special characters in knitr

I am using knitr 1.1 in R 3.0.0 within WinEdt (RWinEdt 2.0). I am having problems with knitr recognizing Swedish characters (ä, ö, å). This is not an issue with R; those characters are even recognized in file names, directory names, objects, etc. In Sweave it was not a problem either.
I already have \usepackage[utf8]{inputenc} in my document, but knitr does not seem able to handle the special characters. After running knitr, I get the following message:
Warning in remind_sweave(if (in.file) input) :
It seems you are using the Sweave-specific syntax; you may need Sweave2knitr("deskriptiv 130409.Rnw") to convert it to knitr
processing file: deskriptiv 130409.Rnw
(*) NOTE: I saw chunk options "label=läser_in_data"
please go to http://yihui.name/knitr/options (it is likely that you forgot to
quote "character" options)
Error in parse(text = str_c("alist(", quote_label(params), ")"), srcfile = NULL) :
1:15: unexpected input
1: alist(label=lä
^
Calls: knit ... parse_params -> withCallingHandlers -> eval -> parse
Execution halted
The particular label it complains about is label=läser. Changing the label is not enough, since knitr even complains if R objects use äåö.
I used Sweave2knitr() since the file originally was created for Sweave, but the result was not better: now all äåö have been transformed to äpåö, both in the R chunks and in the latex text, and knitr still gives an error message.
Session info:
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.1
loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 stringr_0.6.2 tools_3.0.0
As I mentioned there are file names and objects with Swedish characters (since that has not been a problem before), and also the text needs to be in Swedish.
Thank you for any help in getting knitr to work outside of English.
I think you have to contact the maintainer of the R-Sweave mode in WinEdt if you are using this mode to call knitr. The issue is that WinEdt has to pass the encoding of the file to knit() if you are not using the native encoding of your OS. You mentioned UTF-8, but that is not the native encoding on Windows, so you must not use \usepackage[utf8]{inputenc} unless you are sure your file is UTF-8-encoded.
There are several problems mixed up here, and it is unlikely to solve them all with a single answer.
The first problem is label=läser, which really should be label='läser', i.e. you must quote all the chunk labels (check other labels in the document as well); knitr tries to automatically quote your labels when you write <<foo>>= (it is turned to <<'foo'>>=), but this does not work when you use <<label=foo>>= (you have to write <<label='foo'>>= explicitly). But this problem is perhaps not essential here.
I think the real problem here is the file encoding (which is nasty under Windows). You seem to be using UTF-8 on a system that does not respect UTF-8 by default. In this case you have to call knit('yourfile.Rnw', encoding = 'UTF-8'), i.e. pass the encoding to knit(). I do not use WinEdt, so I have no idea how to do that. You can hard-code the encoding in the configuration, but that is not recommended.
Two suggestions:
do not use UTF-8 under Windows; use your system native encoding (Windows-1252, I guess) instead;
or use RStudio instead of WinEdt, which can pass the encoding to knitr;
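For the first suggestion, a file that is currently UTF-8 has to be re-encoded (and the \usepackage[utf8]{inputenc} line changed to match). A minimal base-R sketch, assuming the text only uses characters that exist in Windows-1252; the function name utf8_to_cp1252() is hypothetical. As far as I know, recent knitr releases assume UTF-8 input anyway, so this mainly applies to older setups like the knitr 1.1 one in the question.

```r
# Sketch: re-encode a UTF-8 text file (e.g. an .Rnw) to Windows-1252 ("CP1252").
# Stops instead of writing a damaged file if a character has no CP1252 equivalent.
utf8_to_cp1252 <- function(path, out = path) {
  lines <- readLines(path, warn = FALSE)          # bytes are not re-encoded here
  converted <- iconv(lines, from = "UTF-8", to = "CP1252")
  if (anyNA(converted)) stop("some lines cannot be represented in CP1252")
  con <- file(out, open = "wb")                   # binary mode: write bytes as-is
  on.exit(close(con))
  writeLines(converted, con, useBytes = TRUE)
  invisible(out)
}

# Usage on a throwaway file containing "köttbullar" in UTF-8
src <- tempfile(fileext = ".Rnw")
con <- file(src, open = "wb")
writeLines("k\u00f6ttbullar", con, useBytes = TRUE)
close(con)
utf8_to_cp1252(src)
readBin(src, what = "raw", n = 20)  # "ö" is now the single byte f6
```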
BTW, since the Sweave2knitr() reminder popped up, there must be other problems in your Rnw document. To diagnose them, there are two ways to go:
if you use UTF-8, run Sweave2knitr('deskriptiv 130409.Rnw', encoding = 'UTF-8')
if you use the native encoding of your OS, just run Sweave2knitr('deskriptiv 130409.Rnw')
Please read the documentation if you have questions about the diagnostic information printed out by Sweave2knitr().
R-Sweave invokes knitr through the knitr.edt macro, which itself uses the code in knitrSweave.R to launch knit. The knit command in this latter script is near the top and reads res <- knit(filename).
Following Yihui's suggestion, you can try to replace this command with
res <- knit(filename, encoding = 'UTF-8')
The knitr.edt and knitrSweave.R files should be in your %b\Contrib\R-Sweave folder, where %b is your WinEdt user folder (something like "C:\Users\userA\AppData\Roaming\WinEdt Team\WinEdt 7" under Windows 7).
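If you would rather not edit knitrSweave.R by hand, the one-line change can be scripted. A minimal sketch; the helper name patch_knit_call() is hypothetical, it assumes the line reads exactly res <- knit(filename) as described above, and it keeps a .bak copy. It is still the same hard-coded solution, just applied from R.

```r
# Sketch: replace the knit() call in knitrSweave.R with one that passes the encoding.
patch_knit_call <- function(script) {
  code <- readLines(script, warn = FALSE)
  old <- "res <- knit(filename)"
  new <- "res <- knit(filename, encoding = 'UTF-8')"
  hits <- which(trimws(code) == old)               # only the exact original line
  if (length(hits) == 0L) stop("expected line not found in ", script)
  file.copy(script, paste0(script, ".bak"), overwrite = TRUE)  # keep a backup
  code[hits] <- sub(old, new, code[hits], fixed = TRUE)
  writeLines(code, script)
  invisible(hits)
}

# Example call; the folder is the example path from above, adjust to your %b:
# patch_knit_call(file.path("C:/Users/userA/AppData/Roaming/WinEdt Team/WinEdt 7",
#                           "Contrib/R-Sweave/knitrSweave.R"))
```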
Currently, I do not know how we could pass the encoding as an argument to avoid this hard coding solution.
I would suggest avoiding extended characters in file names, which can only cause problems. Personally, I never use such names.

Resources