Input Chinese characters not correctly echoed in ESS - r

I had this weird encoding issue for my Emacs and R environment. Display of Chinese characters are all good with my .Rprofile setting Sys.setlocale("LC_ALL","zh_CN.utf-8"); except the echo of input ones.
> linkTexts[5]
font
"使用帮助"
> functionNotExist()
错误: 没有"functionNotExist"这个函数
> fire <- "你好"
> fire
[1] " "
As we can see, Chinese characters contained in the vector linkTexts, Chinese error messages, and input Chinese characters all can be perfectly shown, yet the echo of input characters were only shown as blank placeholders.
sessionInfo() is here, which is as expected given the Sys.setlocale("LC_ALL","zh_CN.utf-8"); setting:
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] zh_CN.utf-8/zh_CN.utf-8/zh_CN.utf-8/C/zh_CN.utf-8/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.96-1.1
loaded via a namespace (and not attached):
[1] compiler_2.15.2 tools_2.15.2
And I have no locale settings in the .Emacs file.
To me, this seems to be an Emacs encoding issue, but I just don't know how to correct it. Any idea or suggestion? Thanks.

Your examples work for me out of the box. You can set emacs process decoding/encoding with M-x set-buffer-process-coding-system. Once you figure out what encoding works (if any) you can make the change permanent with:
(add-hook 'ess-R-post-run-hook
(lambda () (set-buffer-process-coding-system
'utf-8-unix 'utf-8-unix)))
Replace utf-8-unix with your chosen encoding.
I am not very convinced that the above will help. LinkText in your example displays well, but fire does not, doesn't look like an emacs/ESS issue.

VitoshKa has made the perfectly correct suggestion. I just wanna add more of own findings here, as people may meet different but similar special character problems. Yet they can be solved in the same way.
The root cause is the input encoding setting to the current buffer process. As shown by the M-x describe-current-coding-system command, default buffer process encoding setting was good for output (utf-8-unix) but deteriorated for input:
Coding systems for process I/O:
encoding input to the process: 1 -- iso-latin-1-unix (alias: iso-8859-1-unix latin-1-unix)
decoding output from the process: U -- utf-8-unix (alias: mule-utf-8-unix)
Changing the coding system for input into utf-8-unix, either by 'M-x set-buffer-process-coding-system' or adding the ess-post-run-hook into .emacs like suggested by VitoshKa, would suffice for solving the Chinese character display problem.
The other problem people may meet due to this setting is special character in ESS. When trying to input special characters, you may get the error message, 错误: 句法分析器%d行里不能有多字节字符
, or invalid multibyte character in parser at line %d in English.
> x <- data.frame(part = c("målløs", "ny"))
错误: 句法分析器1行里不能有多字节字符
And with the correct utf-8-unix setting for input coding system of buffer process, the above error for special characters disappears.

Related

Erratic behaviour of traceback() with Rstudio: different output every time

I use RStudio; I have script1 importing script2 with source ; a function in script2 causes an error. The first time it happens, Rstudio tells me that script 1 at row x caused an error in script2 at row y. If I rerun it, it only tells me about the error in script2. Why? This can make debugging much more painful than it needs to be.
More in detail:
I have a myfun.R file which contains this function (of course this is just a toy example):
sum_2_nums <- function(x,y) {
out <- x + y
return(out)
}
I then import the function into another file using source:
source("myfun.R")
mysum <- sum_2_nums(3,"hello")
RStudio is set to debug -> on error -> error inspector. When I run the code above, I see:
which tells me that an error in myfun.R, row 12 was caused by try_traceback.R, row 13. Great.
However, if I run the script again, I get:
i.e. the error no longer gets traced back to try_traceback.R. I need to type traceback() in the console to see that. Why? The different behaviour the second time really puzzles me. This makes debug needlessly more painful than it needs to be! Is there a way to avoid it?
Note this questions is not a duplicate of this: they may look similar but the answer given there of using echo=TRUE or verbose=TRUE does not address my point about tracing the error to the first .R file.
I saw the same question here, but it remains unanswered.
Edit
To clarify, in answer to some comments:
like I said, if I click on Debug -> on Error -> I see that "error inspector" is ticked.
if I type options(error=function()traceback(1)) in the console, nothing happens on screen, but "error inspector" becomes deselected
I do not have anything else in my code. It is a toy example with only the lines I have shown, and nothing else. I do not know what else I can clarify - any guidance would be most appreciated.
I am very new to R. I use Python for data analysis but am curious to understand about R. I had heard that debugging was much more cumbersome in R and wanted to try for myself. Let's say I am not going to invest a lot of my time learning R if even only tracing back such a banal error is so problematic, so I'd like to understand if I am doing something wrong, or debugging and tracing back are always like this in R
when I say "run" I mean I click the "source" button in RStudio (the button next to "run"
sessionInfo() shows:
sessionInfo() R version 3.5.3 (2019-03-11) Platform:
x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3 yaml_2.2.0
Edit #2 (in reply to duckmayr's answer)
Let's maybe take a step back. In Python I am used to always seeing a rather detailed traceback of what called what when there is an error. E.g. when I see something like this, I find it very helpful:
The first screenshot I posted of R studio is quite similar, and similarly helpful. So my expected output would be to get the full traceback every single time the error occurs.
Why? Because that's most helpful to me; what would possibly be the advantages of NOT showing it every time? I don't get it.
Are you instead saying that it is by design that the full traceback is not listed every time? If so:
Why? Is there a fundamental reason I am missing? Is there any scenario where this makes more sense than what I am used to with Python?
Is this documented anywhere? I couldn't find it.
Is there a way to get the full traceback every single time? A setting to change? Would exiting and reopening RStudio do?
Yes, I understand I can always type traceback(), but Python's behaviour of showing the full traceback every single time is much more convenient - I genuinely fail to see what the upside of showing the traceback only the first time would be.
I find it somewhat difficult to get precisely what behavior you are wanting RStudio to give you, but it may help you if I post some information about what is expected.
The full traceback is not listed every time: In my experience, RStudio only automatically prints the full traceback the first time a function is executed in its current form. If you change your code, then the very next call should also automatically print the full traceback. This behavior may or may not be automatically configurable. However, more importantly:
The R functions traceback() and debug() are always available to you, regardless of RStudio's settings! So, the first time I source your script, I see:
Then if I source your script with nothing else changed, I get:
> source('~/try_traceback.R')
Error in x + y : non-numeric argument to binary operator
But, this is no problem; I can just run:
> traceback()
5: sum_2_nums(3, "hello") at try_traceback.R#2
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/try_traceback.R")
I could also use debug(sum_2_nums) to re-run the function with debug.
Additionally, if I change sum_2_nums() in myfun.R to the following:
sum_2_nums <- function(x,y) {
cat("")
out <- x + y
return(out)
}
I again see
when I try to source your script.
So you may disagree, but I do not find debugging to be difficult in R, just remember the functions traceback() and debug() and you'll be alright.

Non-english special characters in knitr

I am using knitr 1.1. in R 3.0.0 and within WinEdt (RWinEdt 2.0). I am having problems with knitr recognizing Swedish characters (ä, ö, å). This is not an issue with R; those characters are even recognized in file names, directory names, objects, etc. In Sweave it was not a problem either.
I already have \usepackage[utf8]{inputenc} in my document, but knitr does not seem able to handle the special characters. After running knitr, I get the following message:
Warning in remind_sweave(if (in.file) input) :
It seems you are using the Sweave-specific syntax; you may need Sweave2knitr("deskriptiv 130409.Rnw") to convert it to knitr
processing file: deskriptiv 130409.Rnw
(*) NOTE: I saw chunk options "label=läser_in_data"
please go to http://yihui.name/knitr/options (it is likely that you forgot to
quote "character" options)
Error in parse(text = str_c("alist(", quote_label(params), ")"), srcfile = NULL) :
1:15: unexpected input
1: alist(label=lä
^
Calls: knit ... parse_params -> withCallingHandlers -> eval -> parse
Execution halted
The particular label it complains about is label=läser. Changing the label is not enough, since knitr even complains if R objects use äåö.
I used Sweave2knitr() since the file originally was created for Sweave, but the result was not better: now all äåö have been transformed to äpåö, both in the R chunks and in the latex text, and knitr still gives an error message.
Session info:
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=Swedish_Sweden.1252 LC_CTYPE=Swedish_Sweden.1252 LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C LC_TIME=Swedish_Sweden.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.1
loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 stringr_0.6.2 tools_3.0.0
As I mentioned there are file names and objects with Swedish characters (since that has not been a problem before), and also the text needs to be in Swedish.
Thank you for any help in getting knitr to work outside of English.
I think you have to contact the maintainer of the R-Sweave mode in WinEdt if you are using this mode to call knitr. The issue is WinEdt has to pass the encoding of the file to knit() if you are not using the native encoding of your OS. You mentioned UTF-8 but that is not the native encoding for Windows, so you must not use \usepackage[utf8]{inputenc} unless you are sure your file is UTF8-encoded.
There are several problems mixed up here, and it is unlikely to solve them all with a single answer.
The first problem is label=läser, which really should be label='läser', i.e. you must quote all the chunk labels (check other labels in the document as well); knitr tries to automatically quote your labels when you write <<foo>>= (it is turned to <<'foo'>>=), but this does not work when you use <<label=foo>>= (you have to write <<label='foo'>>= explicitly). But this problem is perhaps not essential here.
I think the real problem here is the file encoding (which is nasty under Windows). You seem to be using UTF-8 under a system that does not respect UTF-8 by default. In this case you have call knit('yourfile.Rnw', encoding = 'UTF-8'), i.e. pass the encoding to knit(). I do not use WinEdt, so I have no idea how to do that. You can hard-code the encoding in the configurations, but that is not recommended.
Two suggestions:
do not use UTF-8 under Windows; use your system native encoding (Windows-1252, I guess) instead;
or use RStudio instead of WinEdt, which can pass the encoding to knitr;
BTW, since Sweave2knitr() was popped up, there must be other problems in your Rnw document. To diagnose the problem, there are two ways to go:
if you use UTF-8, run Sweave2knitr('deskriptiv 130409.Rnw', encoding = 'UTF-8')
if you use the native encoding of your OS, just run Sweave2knitr('deskriptiv 130409.Rnw')
Please read the documentation if you have questions about the diagnostic information printed out by Sweave2knitr().
R-Sweave invokes knitr through the knitr.edt macro, which itself uses the code in knitrSweave.R to launch knit. The knitcommand in this later script is near the top and reads res <- knit(filename).
Following Yihui's suggestion, you can try to replace this command with
res <- knit(filename, encoding = 'UTF-8')
The knitr.edt and knitrSweave.R files should be in your %b\Contrib\R-Sweave folder, where %b is your winEdt user folder (something like "C:\Users\userA\AppData\Roaming\WinEdt Team\WinEdt 7" under Win 7).
Currently, I do not know how we could pass the encoding as an argument to avoid this hard coding solution.
I would suggest to avoid extended characters in file names which can only be sources of problems. Personally, I never use such names.

Encoding: knitr and child files

I am using Windows 7, R2.15.3 and RStudio 0.97.320 with knitr 1.1. Not sure what my pandoc version is, but I downloaded it a couple of days ago.
sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Spanish_Argentina.1252 LC_CTYPE=Spanish_Argentina.1252 LC_MONETARY=Spanish_Argentina.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Argentina.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.15.3
I would like to get my reports both in html and Word, so I'm using markdown and pandoc.
I write in spanish with accents on vowels and tildes on the n: á-ú and ñ.
I have read many posts and I see problems similar to the one I'm having have been solved with new versions of knitr. But there is one issue I haven't found a solution for.
When I started, I used the 'system default' encoding that appears in the RStudio dialog, i.e. ISO 8859-1, and the RStudio previews worked great. However when I tried to get Word documents, pandoc choked on the accentuated vowels. I found a post showing how to solve this using iconv:
iconv -t utf-8 "myfile.md" | pandoc -o "myfile.docx"| iconv -f utf-8
While this did solve pandoc's unrecognized utf-8 characters complaints, for some reason pandoc stops finding my plots, with an error like this one:
pandoc: Could not find image `figure/Parent.png', skipping...
If I use only non-accent characters, pandoc finds the images with no problems. I looked at the two .md files with an hex editor, and I can't see any difference when I compare the sections that handle the figures:
![plot of chunk Parent](figure/Parent.png)
although obviously the accentuated characters are completely different... I have verified that the image files do exist in the figure folder
Anyway, after reading many posts I decided to set RStudio to use UTF-8 encoding. With only one level of files things work great. For example, I can -independently- knit and then pandoc into Word the following 2 Rmd files:
Parent - SAVED WITH utf-8 encoding in RStudio
========================================================
u with an accent: "ú" SAVED WITH utf-8 encoding in RStudio
```{r fig.width=7, fig.height=6}
plot(cars, main='Parent ú')
```
and separately:
Child - SAVED WITH utf-8 encoding in RStudio
========================================================
u with an accent: "ú" Child file
```{r fig.width=7, fig.height=6}
plot(cars, main='One File Child ú')
```
and I get both 2 perfect prevues in RStudio and 2 perfect Word documents from pandoc.
The problem arises when I try to call the child part from the parent part. In other words, if I add to the first file the following lines:
```{r CallChild, child='TestUTFChild.Rmd'}
```
then all the accents in the child file become garbled as if the UTF-8 was beeing interpreted as ISO 8859-1. Pandoc stops reading the file as well, complaining it's not utf-8.
If anybody could point me in the right direction, either:
1. With pandoc not finding the plots if I stay with ISO 8859-1. I have also tried Windows-1252 because it's what I saw in the sessionInfo, but the result is the same.
or
2. With the call to the child file, if UTF-8 is the way to go. I have looked for a way of setting some option to force the encoding in the child call, but I haven't found it yet.
Many thanks!
I think this problem should be fixed in the latest development version. See instructions in the development repository on how to install the devel version. Then you should be able to choose UTF-8 in RStudio, and get a UTF-8 encoded output file.
Just in case anyone is interested in the gory details: the reason for the failure before was that I wrote the child output with the encoding you provided, but did not read it with the same encoding. Now I just avoid writing output files for child documents.

Mathematical notion/superscripts in Rd files

I'm working on filling in an Rd file for a function.
When I use \eqn{2^{x}} in the Details section, then build and install the package, there is no superscripted exponent.
Looking at R-exts.pdf, it points to Poisson.Rd as an example on how to use \eqn or \deqn. In the example in that file, there is a superscripted exponent.
When I look at the help file for Poisson (?Poisson), There are no superscripted exponents.
Is this an issue on my computer or is this standard behavior?
Thanks!
> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cimis_0.1-3 RLastFM_0.1-4 RCurl_1.4-2 bitops_1.0-4.1 XML_3.1-0 lattice_0.18-8
loaded via a namespace (and not attached):
[1] grid_2.11.1 tools_2.11.1
Nowadays, people mainly use the HTML help.
To get the superscript in the HTML help as well as in the PDF help, do:
\ifelse{html}{\out{2<sup>x</sup>}}{\eqn{2^x}}
The syntax is:
\ifelse{html}{\out{HTML CODE}}{\eqn{LATEX-LIKE CODE}{ASCII}}
with {ASCII} optional.
You don't say where you looked to see if there was a superscripted exponent. I presume the text based help, not the PDF version of the manual?
The syntax for the \eqn macro is \eqn{latex}{ascii}. The {ascii} bit is optional, in which case R will do it's best to render the LaTeX version. Conventionally, subscripts in ASCII would be wrapped in [] and superscipts with ^.
So I would write:
\eqn{2^{x}}{2^x}
But in all practical senses these are the same. The issue is just that the text help can't display superscipts, but the PDF can.

invalid multibyte character crashes when script is loaded from source (umlauts / special characters)

EDIT:
Thx to suggestions from the mailing list I realized that the problem I got has nothing to do with Sweave or Latex. It´s some Mac OS X related issue. Whenever I run my script by selecting all and sending it to R it works.
When I use
source("myplainRcode.R")
i get the error message stated below
finally I got sweave working together with ggplot2 on my Mac OS X. I invoke Sweave inside R with
Sweave("myfile.Rnw")
which creates the desired latex output. Now that the basic tests work, I try to source my real world file and it crashes at the following line:
gl_bybranch = ddply(new_wans,.(period,Branchen),
function(X)data.frame(Geschäftslage=mean(X$sentiment)))
I guess it has either to do with the ".(period...)" or the "ä" . Unfortunately I can't change these labels because they are also used in legends. So, somewhere in my code these ugly umlauts will appear. Is there a way to escape them in Sweave? I can't believe that this is problem since Sweave is written by a German who probably have second most umlaut characters (behind Turkey).
The error message I get is: invalid multibyte character in Parser on line 195
Thx for any ideas in advance!
YAY ! i got it. Sorry for the noise everybody. I switched all three files (.Rnw, mysource.R , invokeSweave.R) to UTF-8 it finally worked. So everybody who works with Komodo on a Mac make sure to change your default encoding to UTF-8 !

Resources