Using RStudio (Windows 8), when I use the dygraph function to plot a time series, I have a problem when trying to use UTF-8 characters in the main title.
library(dygraphs)
dygraph(AirPassengers, main = "Título")
This results in a title: "T?tulo"
I have tried to convert "Título" to the UTF-8 encoding, but it doesn't work.
You can use enc2utf8.
dygraph(AirPassengers, main = enc2utf8("Título"))
You need to make sure your locale settings support the character that you want to use, and that the file is saved with the right encoding. Saving as UTF-8 worked for me.
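For instance, a quick check of what your session is working with (a minimal sketch; the exact locale string varies by OS):

```r
Sys.getlocale("LC_CTYPE")   # the character-handling locale currently in use
x <- enc2utf8("Título")     # convert from the native encoding to UTF-8
Encoding(x)                 # should now report "UTF-8"
```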
I was able to replicate your situation on Windows 7 and tried a bunch of things. Here is a minimal working example, embedded in Rmarkdown.
```{r}
Sys.setlocale("LC_ALL","German")
# Note that Windows locale names usually differ from those on Unix & Mac;
# the name of the nationality/language usually works here.
# This also works with "Faroese", "Hungarian", and other locales that have this letter.
# The locale has to be set in a preceding chunk to take effect.
```
```{r}
Encoding("Título")
library(dygraphs)
dygraph(AirPassengers, main = "Título")
```
You can inspect the encoding assigned to the title with Encoding(). Locales like Faroese, Hungarian, and German encode "Título" as latin1 or unknown, both of which seem to cause no problems for dygraph's JavaScript. UTF-8 wrote it as <U+00ED>, which was a problem for the JavaScript, as well as for some, but not all, other functions. With a matching locale, converting to UTF-8 as @Michele recommended has the same result.
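As a sketch of that kind of inspection, iconv() can move the title string between the encodings mentioned above (results depend on your locale):

```r
x <- enc2utf8("Título")
Encoding(x)                                   # "UTF-8"
y <- iconv(x, from = "UTF-8", to = "latin1")  # re-encode to latin1
Encoding(y)                                   # "latin1"
```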
Also, if the title doesn't occur in many places, it is possible to just manually find and replace it in the generated HTML/JavaScript file. The problem occurs on conversion, but once the file is already made, the title variable can be changed successfully. The letter still shows a question mark in the RStudio "Viewer" output, but I recommend generating the entire file for the JavaScript regularly anyway, as I've seen other functions malfunction in the Viewer window.
Is there a set of best practices or documentation for working with Unicode in knitr and Rmarkdown? I can't seem to get any glyphs to show up properly when knitting a document.
For example, this works in the console (in Rstudio):
> cat("\U2660 \U2665 \U2666 \U2663")
♠ ♥ ♦ ♣
But when knitting I get garbled output instead of the glyphs:
[screenshots of the garbled HTML and Word output omitted]
It looks like an encoding issue specific to Windows, possibly related to this issue: https://github.com/hadley/evaluate/issues/59. Unfortunately we have to wait for a fix in base R, but if you don't have to use cat(), and the expression is a top-level expression in your code chunk (i.e. not inside a for-loop or if-statement), this may work:
knitr::asis_output("\U2660 \U2665 \U2666 \U2663")
It passes the character string directly to knitr and bypasses cat(), since knitr cannot reliably catch multibyte characters written out by cat() on Windows -- it depends on whether the characters can be represented by your system's native encoding.
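Because of that top-level requirement, build the string first and make the asis_output() call itself the chunk's top-level expression; a minimal sketch:

```{r}
# asis_output() is a top-level expression here, so knitr can intercept
# its result directly instead of relying on cat()'s output on Windows
suits <- paste("\U2660", "\U2665", "\U2666", "\U2663")
knitr::asis_output(suits)
```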
For anyone else who came across this while trying to get emoji support in RStudio/Rmarkdown documents, another possible issue is that if the file encoding isn't set to UTF-8, the resulting compiled document won't support emoji either.
In order for emoji to work in Rmarkdown, you must change the file encoding of the Rmd document. Go to File -> Reopen with encoding, then select UTF-8.
Once you have ensured the file is open in UTF-8 encoding, you should be able to compile with emoji support.
You should even be able to paste emoji from a browser directly into the document. 😺
It is probably a good idea to change the default encoding for all files to UTF-8 so that you don't have to deal with this issue again.
Unicode: Inline
Phew, that was close `r knitr::asis_output("\U1F605 \U2660 \U2665 \U2666 \U2663")`
Unicode: Block
```{r, echo=FALSE}
knitr::asis_output("Phew, that was close \U1F605 \U2660 \U2665 \U2666 \U2663")
```
The emo package
Unfortunately, this package isn't yet on CRAN, but it can be installed with devtools::install_github("hadley/emo")
emo::ji("face")
There are some more examples in the package's README on GitHub.
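A brief sketch of basic usage, assuming the GitHub version installed above (emoji names follow the package's lookup table):

```r
library(emo)
ji("smile")   # look up a single emoji by name
ji("heart")
```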
I have imported several .txt files (texts written in Spanish) to RStudio using the following code:
content = readLines(paste("my_texts", "text1", sep = "/"))
However, when I read the texts in RStudio, they contain codes instead of diacritics. For example, I see the code <97> instead of an "ó" or the code <96> instead of an "ñ".
I have also realized that if the .txt file was originally written on a computer configured in Spanish, I don't see the codes but the actual diacritics. And if the texts were written on a computer configured in English, then I do get the codes (even though when opening the .txt file in TextEdit I see the diacritics).
I don't know why R displays those symbols and what I can do to retain the diacritics I see in the original .txt files.
I read I could possibly solve this by changing the encoding to UTF-8, so I tried this:
content = readLines(paste("my_texts", "text1", sep = "/"), encoding = "UTF-8")
But that didn't work. Any ideas what those codes are and how to keep my diacritics?
As you figured out, you need to set the correct encoding. Unfortunately the text file was written using a legacy encoding rather than UTF-8: namely, Mac Roman (in that encoding, byte <97> is "ó" and <96> is "ñ", which matches what you're seeing). Ideally the application producing the file would not use this encoding, and Apple products by default no longer produce it.
But since this is what you've got, we have to deal with it, and we can. Unfortunately we need to take a detour, because the encoding argument of readLines merely declares the encoding of the input rather than converting it, which makes it a bit useless here. Instead, we need to open a file connection manually:
con = file(file.path("my_texts", "text1"), encoding = "macintosh")
contents = readLines(con)
close(con) # Always make sure to close connections!
# (Inside a function, register on.exit(close(con)) right after opening instead.)
Do note that the encoding name “macintosh” is strictly speaking not portable, so this might not work on all platforms.
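If you want to guard against that, you can check whether your platform's iconv knows the name before opening the connection; a small sketch:

```r
# "macintosh" must be among the encodings this platform's iconv supports
if ("macintosh" %in% iconvlist()) {
  con <- file(file.path("my_texts", "text1"), encoding = "macintosh")
  contents <- readLines(con)
  close(con)
} else {
  stop("this platform's iconv does not know the 'macintosh' encoding")
}
```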
The RGui (Windows; R version 3.5.3) appears to ignore tab characters that occur at the beginning of a line within a character string (press CTRL+R over the lines of code):
# REPLACE "<TAB>" WITH AN ACTUAL TAB CHARACTER TO GET THE CODE INTENDED BELOW.
foo <- 'LINE1
<TAB>LINE2
<TAB>LINE3
'
foo
# [1] "LINE1\nLINE2\nLINE3\n"
longstring <- removetabsatbeginningoflines('
<TAB>Sometimes I have really long strings that I format
<TAB>so that they read nicely (not with too long of a
<TAB>line length). Tabs at the beginning of the lines
<TAB>within a string preserve my code indenting scheme
<TAB>that I use to make the code more readable. If the
<TAB>tabs are not removed automatically by the parser,
<TAB>then I need to wrap the string in a function that
<TAB>removes them.')
The tab characters are preserved when the above code is source'd from a file.
Why doesn't RGui keep the tab characters?
Where is this behavior documented?
What other non-intuitive, related behaviors does RGui have with regard to parsing (multiline) strings?
I can't find anywhere that this is documented, so this is a worthwhile question to publicly answer.
If you are using RGui's built-in 'R Editor', then Tab characters, whether entered via the Tab key or already present in a text file that you have opened in the 'R Editor', will not be respected when submitting code with Ctrl-R. (This is difficult to demonstrate in an example here, given that tabs are stripped from answers.)
I imagine the 'R Editor' is not meant to be used for serious code editing, and you may be better off using a dedicated IDE (e.g. RStudio) or more full-featured editor (e.g. Emacs, Notepad++).
You can work around this issue in RGui by manually replacing literal tabs with the \t escape sequence while editing, but this may not be appropriate if you want to keep actual Tab characters in your file. Tabs are also processed correctly when you use source() to run the code stored in the text file directly; see the sketch below for the wrapper-function alternative.
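If you do want the wrapper approach the question alludes to, a function along these lines works (a hypothetical helper, named here for illustration only):

```r
# strip tab characters that appear at the beginning of lines in a string
remove_leading_tabs <- function(x) {
  gsub("(^|\n)\t+", "\\1", x)
}

remove_leading_tabs("LINE1\n\tLINE2\n\tLINE3\n")
# [1] "LINE1\nLINE2\nLINE3\n"
```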
What?
An .Rmd file renders error-free via knitr (or rmarkdown) from within Linux. Related material (i.e. child R scripts and CSV input data) is all encoded in UTF-8.
Executing the same script from within Windows (the script actually lives inside a cloned git repository) does not render all characters cleanly, since the system is set to Windows-1252.
Examples
For example, the string "sans réserves", sourced from a CSV into some data.frame's column, is typeset as "sans rÃ©serves". To read this one correctly, it suffices to add encoding = 'UTF-8' to read.csv when reading in the data.
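For instance (the file name here is a placeholder):

```r
# declare the CSV's encoding at read time
dat <- read.csv("my_data.csv", encoding = "UTF-8")
```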
Another example, which concerns a string among other R code lines, is "Trésorier Général". It is typeset as "TrÃ©sorier GÃ©nÃ©ral". Fortunately, the following advice
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))
taken from https://stackoverflow.com/a/15714617/1172302, works and the string is rendered as expected.
Related
[Update] There are some related Q&As, but they are more than 2-3 years old. This page, https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding, also points to the very issue.
Questions
Is there another, easier way to overcome this issue regarding UTF-8 and Windows inside R? Any recommendations on how to approach such a problem? I am trying to follow a "one source for all" principle.
PS: An interesting read: https://superuser.com/a/221602/128768
I am reading in data from a web site, with text identifying each row. I simply copied and pasted the data into Excel, and the file is then read by R. One of these rows contains the name of a German city, "Würzburg", which includes a lower-case u with an umlaut. I have no problem seeing the special character on the web or in Excel. The problem is that when this word is passed to ggplot2, it is displayed in the plot as "WÃ¼rzburg", with a tilde over a capital A. RStudio shows both forms depending on the area in which the text is displayed. I assume that ggplot2 uses a different encoding when interpreting the special characters.
Is there a way to tell ggplot how to read, interpret and display the special characters? I do not want to write specialized code just for this city, but to solve the problem in general. I am likely to encounter other characters as the data expands over time.
I encountered a similar error with ggplot2 when I used a hardcoded data.frame (i.e., I would write Großbritannien (Great Britain) and it would get encoded to some gibberish).
My solution was to include
Sys.setlocale("LC_ALL", "German")
options(encoding = "UTF-8")
in the beginning of the script.
Read the file in as follows:
library(data.table)
fread('path_to_file', ..., encoding = 'UTF-8')
My solution to this problem is switching to cairo for PDF plotting; all special characters are then shown properly by ggplot2. It is enough to put this line of code among the knitr settings:
knitr::opts_chunk$set(dev='cairo_pdf')