knitr: generating UTF-8 output from chunks

I have a doc.Rnw that is supposed to produce some Russian UTF-8 strings:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
@
Здравствуйте
\selectlanguage {english}
\end{document}
string.rus contains a UTF-8 string which shows correctly in the R console:
print(readLines('string.rus', encoding="UTF-8"))
# [1] "Здравствуйте"
doc.Rnw displays correctly in Windows Notepad, while both:
file.show("doc.Rnw")
file.show("doc.Rnw", encoding="UTF-8")
fail to show the UTF-8 strings properly.
Using:
knit("doc.Rnw")
The document part of the output doc.tex shows:
\begin{document}
\selectlanguage {russian}
[1] "<U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>"
[1] " <U+0097>д <U+0080>авс <U+0082>в <U+0083>й <U+0082>е"
Здравствуйте
\selectlanguage {english}
\end{document}
which of course does not compile in PDFLaTeX. Using:
knit("doc.Rnw", encoding="UTF-8")
gives even worse results.
Commenting out the lines which should generate UTF-8 strings:
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
gives a valid doc.tex which compiles in MiKTeX and properly shows the remaining UTF-8 string.
Even if I comment out only the first print(...) and leave the second one, I can't compile. This seems to prove that the original encoding of doc.Rnw is correct.
I tried to replace both print commands with:
a="Здравствуйте"
Encoding(a)="UTF-8"
print(a)
In this case I can compile, but the PDF output is (the first string is cut off at the margin):
[1] «U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443>
Здравствуйте
So the chunk output is still wrong.
How can I properly print UTF-8 strings from chunks?
R version is 3.3.3 (2017-03-06) for Windows and knitr is 1.15.1 (2016-11-22).

An extended working example is below:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
s=readLines('string.rus', , encoding="UTF-8")
message("s ", Encoding(s), ": ", s)
Encoding(s)="latin1"
message("s latin1: ", s)
Encoding(s)="unkwnown"
message("s unkwnown: ", s)
Encoding(s)="utf8"
message("s utf8: ", a)
a="Здравствуйте"
message("a ", Encoding(a), ": ", a)
Encoding(a)="latin1"
message("a latin1: ", a)
Encoding(a)="utf8"
message("a utf8: ", a)
Encoding(a)="UTF-8"
message("a UTF-8: ", a)
u=("\U0417")
message("u ", Encoding(u), ": ", u)
Encoding(u)="latin1"
message("u latin1: ", u)
Encoding(u)="unkwnown"
message("u unkwnown: ", u)
@
Здравствуйте
\selectlanguage {english}
\end{document}
After knit("doc.Rnw"), this is the output of the test chunk found in doc.tex (with the knitr code decoration removed for readability):
s UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
s latin1: Здравствуйте
s unkwnown: Здравствуйте
s utf8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
a unknown: Здравствуйте
a latin1: Здравствуйте
a utf8: Здравствуйте
a UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
u UTF-8: <U+0417>
u latin1: З
u unkwnown: З
Some comments follow.
First, only message() works; print() always gives errors.
For both the externally read string s and the locally set a, the behavior is weird:
keeping the encoding as UTF-8, or explicitly setting it to UTF-8, produces the wrong results (although utf8 works for a).
One might think that the UTF-8 encoding of the documents (doc.Rnw and string.rus) is not properly set. This is why I added the line u=("\U0417"), which is certainly UTF-8. Again, only removing the UTF-8 encoding mark gives proper output.
In a similar fashion, explicitly requesting UTF-8 output:
knit("doc.Rnw", encoding="UTF-8")
does not produce the UTF-8 characters, but their Unicode code points or garbled ones.
In the end, I can produce the desired .tex file and compile it with LaTeX, but why the counter-intuitive behavior above occurs is beyond me.
Hopefully someone will give a good explanation.
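One way to make sense of the listing above is to separate a string's bytes from its declared encoding. The sketch below is only an illustration, not a confirmed diagnosis. It assumes that Encoding<- changes the encoding mark without touching the bytes, and that R 3.3.3 on Windows, running in a non-UTF-8 locale, tries to translate strings marked as UTF-8 to the native encoding on output (escaping what it cannot represent as <U+XXXX>), while unmarked strings are written byte for byte. The variable names are made up for the example.

```r
# Build the first three letters from escapes so the script itself is pure ASCII.
s <- "\u0417\u0434\u0440"

# A string created from \u escapes is marked as UTF-8.
mark_before <- Encoding(s)
bytes_before <- charToRaw(s)

# Dropping the mark does not alter the underlying bytes.
Encoding(s) <- "unknown"
mark_after <- Encoding(s)
bytes_after <- charToRaw(s)

print(mark_before)
print(mark_after)
print(bytes_before)
print(identical(bytes_before, bytes_after))
```

If that assumption holds, it would be consistent with the listing above, where the strings come out correctly only once the UTF-8 mark is removed: the bytes written to doc.tex are then already valid UTF-8.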

Related

How to avoid "! LaTeX Error: Environment axis undefined" when using include_tikz with pgfplots?

I have successfully included several graphics made in TikZ in an R/exams .Rmd file. The same does not work when I try to include a pgfplots plot using include_tikz(). Whenever \begin{axis} and \end{axis} are included, I get the error "! LaTeX Error: Environment axis undefined".
In the RStudio console the following message appears: "This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format = pdflatex) restricted \write18 enabled. entering extended mode", even though I have enabled write18 in TeXstudio. None of these messages appear when I include TikZ graphics that do not use pgfplots.
Every TikZ graphic works when compiled in Texmaker or TeXstudio, which indicates that the problem is not a missing LaTeX library or package.
I include a part of my code, adapted from https://www.latex4technics.com/?note=1HCT:
```{r datos1, echo = FALSE, results = "hide"}
library(exams)
typ <- match_exams_device()
image01<-'
\\begin{tikzpicture}
\\begin{axis}[legend pos=south east]
\\addlegendimage{empty legend}
\\addplot {sqrt(x)};
\\addplot {ln(x)};
\\addlegendentry{\\hspace{-.6cm}\\textbf{A title}}
\\addlegendentry{$\\sqrt{x}$}
\\addlegendentry{$\\ln{x}$}
\\end{axis}
\\end{tikzpicture}
'
```
```{r grafica01, echo = FALSE, results = "asis"}
include_tikz(image01, name = "grafiko1", markup = "markdown",
  format = typ, library = c("arrows"), packages = "booktabs", width = "7cm",
  header = "\\usepackage{/home/r-exams/Documentos/NuevoRStudio/Rmarkdowns/Esqueleto/exercises/schoice/LaboratorioTikZ/3dplot}")
```
The answer is right there in your question title. You need to include the pgfplots package:
include_tikz(image01, packages = "pgfplots", ...)
The other packages, library, and header arguments from the call in your question are not needed.
The reason is that for include_tikz() you just use the {tikzpicture} code while in the full .tex file that you linked you additionally have:
\documentclass{standalone}
\usepackage{pgfplots}
\begin{document}
\begin{tikzpicture}
...
\end{tikzpicture}
\end{document}
Note the \usepackage{pgfplots} in the second line!

R library "XML" doesn't recognize encoding

Problem
I have an XML file that I would like to parse in R. I know that this file is not corrupted because the following Python code seems to work:
>>> import xml.etree.ElementTree as ET
>>> xml_tree = ET.parse(PATH_TO_MY_XML_FILE)
>>> do_my_regular_xml_stuff_that_seems_to_work_no_problem(xml_tree)
Now, when I try to run the following code in R, I get an error message:
> library("XML")
> xml_tree <- XML::xmlParse(PATH_TO_MY_XML_FILE)
Error in nchar(text_repr): invalid multibyte string, element 1
Traceback:
Alright, maybe the parser doesn't recognize the encoding. Luckily this should be specified in a decent XML file. So, I go to my shell and check:
$ head -n1 PATH_TO_MY_XML_FILE
??<?xml version="1.0" encoding="utf-16"?>
Now, I can go back to R and explicitly pass the encoding, only to hit the next error message, which is where I am stuck:
> library("XML")
> xml_tree <- XML::xmlParse(PATH_TO_MY_XML_FILE, encoding='UTF-16')
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
Traceback:
1. XML::xmlParse(filePath, encoding = "UTF-16")
2. (function (msg, ...)
. {
. if (length(grep("\\\n$", msg)) == 0)
. paste(msg, "\n", sep = "")
. if (immediate)
. cat(msg)
. if (length(msg) == 0) {
. e = simpleError(paste(1:length(messages), messages, sep = ": ",
. collapse = ""))
. class(e) = c(class, class(e))
. stop(e)
. }
. messages <<- c(messages, msg)
. })(character(0))
A last attempt to check (in R) if the file is in fact "UTF-16" encoded yields:
> f <- file(filePath, 'r', encoding = "UTF-16")
> firstLine <- readLines(f, n=1)
> close(f)
> print(firstLine)
[1] "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
Which looks just about right to me.
Question(s)
Does anyone know what is happening here? Is this a bug from the XML library? Is the file maybe not 'UTF-16' encoded, even though it claims it is? What are the two question marks ?? that I see when I print the file into the shell? These question marks don't appear when reading in the file properly...
Is this a bug from the XML library?
I think there could be a bug here. If I generate a valid UTF-16 XML document, which will have an initial byte-order mark:
$ echo '<a>😊</a>' | iconv -t utf-16 >a-utf16.xml
$ xxd a-utf16.xml
00000000: fffe 3c00 6100 3e00 3dd8 0ade 3c00 2f00 ..<.a.>.=...<./.
00000010: 6100 3e00 0a00 a.>...
then I can parse it with:
> XML::xmlParse('a-utf16.xml')
<?xml version="1.0"?>
<a>😊</a>
but not if I specify the encoding:
> XML::xmlParse('a-utf16.xml', encoding='utf-16')
Start tag expected, '<' not found
Error: 1: Start tag expected, '<' not found
Your original problem was when you weren't specifying the encoding. However:
I know that this file is not corrupted because the following Python code seems to work
That's a good hint, but I think you'll find edge cases where that doesn't hold. Try iconv for a second opinion on whether the file is encoded correctly.
For a more specific response, you'll need to post a reproducible XML file.
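The two question marks at the start of the head output are plausibly the byte-order mark mentioned above, shown by a terminal that cannot display it. A minimal sketch for checking this directly in R follows. The helper detect_bom() and the demo file are made up for this example, and the check only looks at the first two or three bytes, so it is a quick sanity test rather than a full encoding detector.

```r
# Hypothetical helper: report which byte-order mark, if any, starts a file.
detect_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(prefix) {
    length(b) >= length(prefix) && identical(b[seq_along(prefix)], prefix)
  }
  if (starts_with(as.raw(c(0xef, 0xbb, 0xbf)))) return("UTF-8")
  if (starts_with(as.raw(c(0xff, 0xfe)))) return("UTF-16LE")
  if (starts_with(as.raw(c(0xfe, 0xff)))) return("UTF-16BE")
  NA_character_
}

# Demo file: BOM ff fe followed by "<a/>" in UTF-16LE, written byte by byte.
demo_path <- tempfile(fileext = ".xml")
writeBin(
  as.raw(c(0xff, 0xfe, 0x3c, 0x00, 0x61, 0x00, 0x2f, 0x00, 0x3e, 0x00)),
  demo_path
)

bom <- detect_bom(demo_path)
print(bom)
```

Running detect_bom() on the real file would show whether it starts with ff fe or fe ff, which is the same information the xxd dump above gives for the generated test document.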

Parameterized RMarkdown File Name

I am trying to use 1 parameterized RMarkdown file to run 3 reports and output a different file name for each report (e.g. Report_A.html, Report_B.html, Report_C.html).
I have tried modifying the knit hook of the YAML, but am unable to get it to resolve the parameters.
I do not want to create a separate R file to loop through the parameters.
So far I have the following. The title is parameterized, but the report name comes out as Report_r params$site_2019-12-12.html:
params:
  site: "A"
title: "Report `r params$site`"
knit: (function(inputFile, encoding) {
  rmarkdown::render(inputFile,
    output_file=paste0("Report_", `r params$site`, "_", Sys.Date(),'.html')) })
Any suggestions would be appreciated - thank you!
Maybe don't use the inline `r ...` syntax in:
output_file=paste0("Report_", `r params$site`, "_", Sys.Date(),'.html')) })
as in:
output_file=paste0("Report_", params$site, "_", Sys.Date(),'.html')) })
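The point is that the body of the knit function is ordinary R code, so the file name can be assembled with plain R values rather than inline `r ...` markup. The sketch below shows only that string-building step. The helper report_name() and the fixed date are made up for the example, and it does not show how the parameter value reaches the knit hook, which is not confirmed here.

```r
# Hypothetical helper: build the output file name from a site label and a date.
report_name <- function(site, date = Sys.Date()) {
  paste0("Report_", site, "_", format(date, "%Y-%m-%d"), ".html")
}

# One file name per site, using a fixed date so the result is reproducible.
sites <- c("A", "B", "C")
names_out <- vapply(sites, report_name, character(1),
                    date = as.Date("2019-12-12"))
print(unname(names_out))
```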

knitr sanitize_fn warning incorporating plots into latex via knit2pdf

I am dynamically generating pdf reports in R using a driver script that calls knit2pdf. My report source is latex, in a .Rnw file, and the call is like this:
knit2pdf("source.Rnw",output=paste0(fname,".tex"),quiet=T)
fname does not contain any dots.
source.Rnw contains:
<<setup, echo=FALSE >>=
opts_chunk$set(fig.path=tempfile(tmpdir="work",pattern=fname,fileext=".pdf"))
@
<<custom-dev, echo=FALSE >>=
my_pdf<-function(file,width,height) {
pdf(file,width=5,height=2)
}
@
<<plot, echo=FALSE, results="asis", dev="my_pdf", fig.ext="pdf">>=
# A ggplot chart
print(g)
@
The reports are fine, but the following warning is generated from knitr's sanitize_fn:
dots in figure paths replaced with _ ("work/fname_pdfplot")
Clearly, the offending . is coming from the fileext in opts_chunk. However, if I change that fileext to "_pdf", I don't get the plot in the report at all, and latex throws an error about the file (fname_pdfplot-1) not being found.
Ideas on how to (a) do this right so there's no warning, or (b) do this as I'm doing it but suppress this particular warning?
Edit 1:
Here is a working example of source.Rnw without using fileext. This does seem to be closer, because now it breaks with an error due to putting work\fname... in includegraphics rather than work/fname..., and if I change the backslash to a proper slash, it compiles cleanly.
tempfile is returning work\fname..., so perhaps my fix is just to re-escape those backslashes (or replace them with a forward slash). Is this something I should have known to do already?
\documentclass[titlepage]{article}
\usepackage[utf8]{inputenc}
\usepackage[headheight=36pt, foot=24pt, top=1in, bottom=1in, left=1in, right=1in, landscape]{geometry}
\usepackage{hyperref}
\usepackage{bookmark}
\usepackage{fancyhdr}
\usepackage{longtable}
\usepackage{multirow}
\usepackage{float}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{microtype}
\usepackage{libertine}
\usepackage{parskip}
\usepackage{environ}
\usepackage{preview}
\usepackage[labelformat=empty]{caption}
\usepackage{amssymb}
\usepackage[usenames,dvipsnames,svgnames,table]{xcolor}
\usepackage{picture}
\usepackage{needspace}
\usepackage{adjustbox}
\usepackage{graphicx}
\pagestyle{fancy}
\raggedbottom
\renewcommand\familydefault{\sfdefault}
\newcommand{\helv}{%
\fontfamily{phv}\fontseries{m}\fontsize{8}{10}\selectfont}
\newcommand{\mycopyright}{\helv Copyright.}
\cfoot{\mycopyright}
\rhead{\textbf{\Sexpr{firstname} \Sexpr{lastname}} \\ \Sexpr{oafr} to \Sexpr{eoafr} \\ Page \thepage}
\renewcommand{\headrulewidth}{0.4pt}
\renewcommand{\footrulewidth}{0.4pt}
\fancypagestyle{fancytitlepage}
{
\fancyhf{}
\cfoot{\mycopyright}
\rhead{}
\renewcommand{\headrulewidth}{0pt}
}
\linespread{1.2}
\usepackage{sectsty}
\allsectionsfont{\sffamily}
\partfont{\centering}
\makeatletter
\newcommand{\sectbox}[1]{%
\noindent\protect\colorbox{gray!40}{%
\@tempdima=\hsize
\advance\@tempdima by-2\fboxsep
\advance\@tempdima by-2\fboxrule
\protect\parbox{\@tempdima}{%
\smallskip
% extra commands here
\centering
#1\smallskip
}}}
\newcommand{\subsectbox}[1]{%
\noindent\protect\colorbox{gray!20}{%
\@tempdima=\hsize
\advance\@tempdima by-2\fboxsep
\advance\@tempdima by-2\fboxrule
\protect\parbox{\@tempdima}{%
\smallskip
% extra commands here
#1\smallskip
}}}
\makeatother
\sectionfont{\sectbox}
\subsubsectionfont{\subsectbox}
\makeatletter
\newcommand\cellwidth{\TX@col@width}
\makeatother
\newlength\foo
\NewEnviron{recipe}{%
\begin{adjustbox}{minipage=\linewidth,gstore totalheight=\foo, gobble}
\BODY
\end{adjustbox}
\needspace{\foo}
\BODY%
}
<<setup, echo=FALSE >>=
opts_chunk$set(fig.path = tempfile(tmpdir="work",pattern=fname))
@
\hyphenpenalty=100000
\begin{document}
\raggedbottom
\setlength{\parskip}{0pt}
<<custom-dev,echo=FALSE>>=
wkld_pdf<-function(file,width,height) {
pdf(file,width=5,height=2)
}
@
<<wkld, echo=FALSE, results='asis',fig.align="center",dev="wkld_pdf",fig.ext="pdf">>=
if (!is.na(wkld.team) | !is.na(wkld.res)) {
g<-pltr$workload.chart(wkld.team,wkld.res,firstname)
print(g)
}
@
\end{document}
In the above example, the file work\fname61c28cd1a0awkld-1.pdf is correctly created, but the tex generated has:
{\centering \includegraphics[width=\maxwidth]{work\fname61c28cd1a0awkld-1}
}
and thus doesn't find it.
It appears that leaving out fileext works (and likely setting it to _pdf would as well) to remove the warning.
It was also necessary to replace the \ generated by tempfile with a / to prevent another warning from the generated includegraphics call, as somewhere in the chain the \\ was evaluated down to \. This worked:
opts_chunk$set(fig.path = gsub('\\\\','/',tempfile(tmpdir="work",pattern=fname)))
Thank you for helping me track that down.
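The separator replacement can be tried on its own, outside knitr. This is a minimal sketch: the helper name to_forward_slashes() is made up, and the sample path simply mimics the one reported above. With fixed = TRUE the pattern is a literal backslash, which avoids the four-backslash regular expression.

```r
# Hypothetical helper: turn Windows-style separators into forward slashes.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# "\\" in R source is a single backslash in the actual string.
win_path <- "work\\fname61c28cd1a0a"
fixed_path <- to_forward_slashes(win_path)
print(fixed_path)
```

The result could then be passed to opts_chunk$set(fig.path = ...) in the same way as the gsub() call shown above.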

Decreasing space between commands and output in knitr chunks

I'm using knitr with LaTeX and there seems to be a lot of space between the commands echoed by a code chunk and the start of the output:
The LaTeX code for this looks like:
\begin{knitrout}\scriptsize
\definecolor{shadecolor}{rgb}{1, 1, 1}\color{fgcolor}\begin{kframe}
\begin{alltt}
\hlstd{> }\hlstd{lda_test_pred} \hlkwb{<-} \hlkwd{predict}\hlstd{(lda_fit,} \hlkwc{newdata} \hlstd{= seg_test)}
\hlstd{> }\hlkwd{library}\hlstd{(pROC)}
\hlstd{> }
\hlstd{> }\hlstd{lda_roc} \hlkwb{<-} \hlkwd{roc}\hlstd{(}\hlkwc{response} \hlstd{= seg_test}\hlopt{$}\hlstd{Class,}
\hlstd{+ } \hlkwc{predictor} \hlstd{= lda_test_pred}\hlopt{$}\hlstd{posterior[,} \hlstr{"PS"}\hlstd{],}
\hlstd{+ } \hlcom{## we need to tell the function that the _first_ level}
\hlstd{+ } \hlcom{## is our event of interest}
\hlstd{+ } \hlkwc{levels} \hlstd{=} \hlkwd{rev}\hlstd{(}\hlkwd{levels}\hlstd{(seg_test}\hlopt{$}\hlstd{Class)))}
\hlstd{> }\hlstd{lda_roc}
\end{alltt}
\begin{verbatim}
Call:
roc.default(response = seg_test$Class, predictor = lda_test_pred$posterior[, "PS"], levels = rev(levels(seg_test$Class)))
Data: lda_test_pred$posterior[, "PS"] in 346 controls (seg_test$Class WS) < 664 cases (seg_test$Class PS).
Area under the curve: 0.874
\end{verbatim}
\begin{alltt}
\hlstd{> }\hlcom{# plot(exRoc print.thres = .5)}
\end{alltt}
\end{kframe}
\end{knitrout}
The space is generated between the end of alltt and the start of verbatim. Part of the gap, for this example, is the blank line prior to the call output.
Any ideas on how to modulate this in knitr (without affecting any spacing between paragraphs etc)?
Follow the advice found in "control vertical space before and after verbatim environment?" and add the following lines to your document:
\usepackage{etoolbox}
\makeatletter
\preto{\@verbatim}{\topsep=0pt \partopsep=0pt }
\makeatother
For some more detail you can check this answer.
