Unicode hyphen causes error when building R documentation

I'm trying to put together my first package for R, and I'm unable to generate the PDF for the documentation using R CMD Rd2pdf. The error I get is
! Package inputenc Error: Unicode char ‐ (U+2010)
(inputenc) not set up for use with LaTeX.
I don't really understand what's going on here: it seems absurd that a hyphen would break the documentation build for an R package. I'm using roxygen2 to build the package, and the main file is saved with UTF-8 encoding.
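For what it's worth, U+2010 is the dedicated Unicode hyphen character, not the ASCII hyphen-minus (U+002D) it looks identical to, and pdfLaTeX's inputenc does not accept it by default; roxygen2 copies it from the .R comments straight into the .Rd files. A quick way to locate it, assuming the usual package layout with sources in ./R and generated docs in ./man:

# Print every non-ASCII line (with line numbers) in the package sources,
# which will flag stray characters such as the U+2010 hyphen
srcs <- list.files(c("./R", "./man"), recursive = TRUE, full.names = TRUE)
invisible(lapply(srcs, tools::showNonASCIIfile))

Replacing the flagged character with a plain ASCII hyphen should make the error go away.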

Related

LaTeX Error: Unicode character √ (U+221A) not set up for use with LaTeX

I am writing an R package where one of the outputs has the unit √(mg/√kg L). When I submit the package to CRAN, it shows me the following error:
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! Package inputenc Error: Unicode character √ (U+221A)
(inputenc) not set up for use with LaTeX.
How can I solve the problem?
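One common fix, sketched here as a made-up roxygen2 fragment: express the unit with Rd's \eqn{latex}{ascii} markup, giving pdfLaTeX the \sqrt command and the text help an ASCII fallback, so the literal √ character never reaches LaTeX:

#' @description The result is reported in units of
#'   \eqn{\sqrt{mg / \sqrt{kg \cdot L}}}{sqrt(mg/sqrt(kg L))}.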

Encoding problem when your package contains functions with non-English characters

I am building my own package, and I keep running into encoding issues because the functions in my package have non-English (non-ASCII) characters.
Korean characters are inherently part of many of the functions in my package. A sample function:
library(rvest)
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  result <- grepl("연결재무제표 주석", html_text(read_html(url)))
  return(result)
}
However, when installing the package I run into encoding problems.
I created a sample package (https://github.com/hyk0127/KorEncod/) with just the one function shown above and uploaded it to my GitHub page as a reproducible example. I run the following code to install:
library(devtools)
install_github("hyk0127/KorEncod")
Below is the error message that I see
Error : (converted from warning) unable to re-encode 'hello.R' line 7
ERROR: unable to collate and parse R files for package 'KorEncod'
* removing 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
* restoring previous 'C:/Users/myname/Documents/R/win-library/3.6/KorEncod'
Error: Failed to install 'KorEncod' from GitHub:
(converted from warning) installation of package ‘C:/Users/myname/AppData/Local/Temp/RtmpmS5ZOe/file48c02d205c44/KorEncod_0.1.0.tar.gz’ had non-zero exit status
The error message about line 7 refers to the Korean characters in the function.
It is possible to install the package locally from the tar.gz file, but then the function does not run as intended, because the Korean characters come through with broken encoding.
This cannot be the first time that someone has tried building a package that contains non-English (non-ASCII) characters, and yet I couldn't find a solution. Any help will be deeply appreciated.
A few pieces of info that I think are related:
Currently the DESCRIPTION file specifies "Encoding: UTF-8".
I have used Sys.setlocale() to set the locale to Korean and back, to no avail.
I have specified #' @encoding UTF-8 on the function, to no avail as well.
I am currently using Windows with the administrative language set to English. I have tried a different laptop with Windows and the administrative language set to Korean, and the same problem appears.
The key trick is replacing the non-ASCII characters with their Unicode escapes, i.e. the \uxxxx encoding.
These can be generated via the stringi::stri_escape_unicode() function.
Note that since you need to remove the Korean characters from your code entirely in order to pass R CMD check, you will have to manually copy each string, re-encode it via {stringi} on the command line, and paste the escaped version back, for every R script included in the package.
I am not aware of an automated solution for this problem.
For the specific example provided, the escaped version would read like this:
sampleprob <- function(url) {
  # stringi::stri_escape_unicode("연결재무제표 주석") to get the \uxxxx codes
  result <- grepl("\uc5f0\uacb0\uc7ac\ubb34\uc81c\ud45c \uc8fc\uc11d",
                  rvest::html_text(xml2::read_html(url)))
  return(result)
}
sampleprob("http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851")
[1] TRUE
This will be a hassle, but it seems to be the only way to make your code platform neutral (which is a key CRAN requirement, and thus subject to R CMD check).
Adding for future value (for those facing similar problems): you can also solve this by saving the non-ASCII characters in a data file, then loading the value and using it.
So save the characters as a data file (using the standard package folder layout and the roxygen2 package):
# In your package, save as a separate file within ./data-raw
kor_chrs <- list(sampleprob = "연결재무제표 주석")
usethis::use_data(kor_chrs)
Then, in your functions, load the data and use it.
# This is your R file for the function within ./R folder
#' @importFrom rvest html_text
#' @importFrom xml2 read_html
#' @export
sampleprob <- function(url) {
  # sample url: "http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20200330003851"
  result <- grepl(kor_chrs$sampleprob[1], html_text(read_html(url)))
  return(result)
}
This, yes, is still a workaround, but it runs in Windows machines without any troubles.
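One possible refinement, not part of the original answer: if kor_chrs is only needed inside the package's own functions, usethis can store it as internal data instead, which keeps it out of the user-facing data index:

# Stores kor_chrs in R/sysdata.rda rather than data/, making it available
# to package code without having to document or export it
usethis::use_data(kor_chrs, internal = TRUE)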

I have Unicode characters / LaTeX errors in my documentation causing issues in R CMD check. How do I troubleshoot?

I am building my first package, and when running R CMD check after devtools::build() I found a bunch of issues with "checking PDF version of the manual." I struggled to find directions on how to troubleshoot, so I'm going to ask this question: "How can I identify the problematic text in my .R files that Roxygen uses to make the .Rd files that make up the manual?" and then go ahead and answer it, in hopes that it is useful to someone else down the line.
I would get a long series of WARNINGS, and the unique components are below:
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
LaTeX errors found:
! LaTeX Error: Bad math environment delimiter.
See the LaTeX manual or LaTeX Companion for explanation. Type H
for immediate help.
! Package inputenc Error: Unicode character ℓ (U+2113)
(inputenc) not set up for use with LaTeX.
See the inputenc package documentation for explanation. Type H
for immediate help.
I thought these were pretty clear: somewhere I have bad LaTeX (like a missing curly bracket or dollar sign, I imagine), and there is also some issue with including this Unicode character.
My question is, "How do I identify the offending lines?"
Following up on @user2554330's comment, here's a simple way to check for stray Unicode characters in your functions and documentation:
# functions
functions <- list.files(path = './R', all.files = TRUE, recursive = TRUE, full.names = TRUE)
lapply(X = functions, FUN = tools::showNonASCIIfile)
# documentation
docs <- list.files(path = './man', all.files = TRUE, recursive = TRUE, full.names = TRUE)
lapply(X = docs, FUN = tools::showNonASCIIfile)
Running R CMD check produces a directory, mypackagename.Rcheck/.
In my case there is a file in there called Rdlatex.log.
This log file has more verbose (and very helpful, clear) warning and error messages, with line numbers and file names, which let me find the offending text and develop some ideas of how to fix it :-)
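To pull the relevant lines out of that log quickly, something along these lines should work (the file and directory names follow the ones above):

# Surface the inputenc/Unicode complaints from the check log
log <- readLines("mypackagename.Rcheck/Rdlatex.log", warn = FALSE)
grep("inputenc|Unicode|LaTeX Error", log, value = TRUE)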
There may be other issues (e.g. I noticed in other related questions the suggestion to check the installation and paths for the underlying latex infrastructure), but it seems like starting with my .R files is likely to be fruitful.
This is platform-specific, but I'm pretty sure there are equivalent text editors on other platforms.
On macOS I use BBEdit (from Bare Bones Software), which has commands like "Zap Gremlins" that eliminate all non-ASCII characters and/or optionally replace things like 'smart quotes' with ASCII quote characters.
As for fixing embedded LaTeX command strings: the best I can suggest is using a tool like "KLatexFormula" (Windows) or equivalent to play with the strings until they render.

RJSONIO vs jsonlite lexical error

New to this, so please excuse any mistakes.
I am using R to convert some JSON files, and jsonlite is perfect in terms of the format it produces. However, I keep getting a lexical error similar to the one below:
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
ed that it was delivered to "letterbox" but it wasn't\" ]\n
(right here) ------^
So I decided to use the RJSONIO package; there is no lexical error with that package, but the format is not as easy to work with.
So I suppose I have two questions:
1. Does anyone know why the error is produced by jsonlite and not RJSONIO?
2. Is there an easy way to get RJSONIO to produce output like jsonlite's?
Thanks
Ibrahim
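A plausible first step, assuming the JSON sits in a file (data.json is a made-up name): jsonlite is stricter about the JSON grammar than RJSONIO, so asking it where validation fails usually points at an unescaped quote or control character:

# Hypothetical file name; read the raw text and ask jsonlite where it fails
txt <- paste(readLines("data.json", warn = FALSE), collapse = "\n")
ok <- jsonlite::validate(txt)
if (!ok) attr(ok, "err")  # the parse error and its position are attached as an attribute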

cannot knit PDF in R studio

I am totally new to RStudio, and I was working on an R Markdown document. When I click on Knit PDF, I keep getting a result like this:
output file: A1-soln-template__2__molly.knit.md
! Package inputenc Error: Unicode char μ (U+3BC)
(inputenc) not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type H for immediate help.
...
l.142 null hypothesis: Ho:μ
Try running pandoc with --latex-engine=xelatex.
pandoc: Error producing PDF
Error: pandoc document conversion failed with error 43
Execution halted
How can I solve this and get things to work?
Thank you
Try:
install.packages('jsonlite', dependencies=TRUE, repos='http://cran.rstudio.com/')
Also, you might need to install MiKTeX.
The issue here is related to encoding. You are trying to include Unicode characters where they should not be.
In particular, you have typed:
Ho:μ
This needs to be changed to:
$H_0: \mu$
You will need to learn some LaTeX.
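Alternatively, the log's own hint ("Try running pandoc with --latex-engine=xelatex") can be followed: xelatex handles Unicode natively, so the μ could stay as typed. A sketch with a hypothetical file name:

# Render with xelatex instead of pdflatex so Unicode characters like μ pass through;
# equivalent to setting latex_engine: xelatex under output: pdf_document: in the YAML header
rmarkdown::render("report.Rmd",
                  output_format = rmarkdown::pdf_document(latex_engine = "xelatex"))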
