The RGui (Windows; R version 3.5.3) appears to ignore tab characters that occur at the beginning of a line within a character string (press Ctrl+R over the lines of code below):
# REPLACE "<TAB>" WITH AN ACTUAL TAB CHARACTER TO GET THE CODE INTENDED BELOW.
foo <- 'LINE1
<TAB>LINE2
<TAB>LINE3
'
foo
# [1] "LINE1\nLINE2\nLINE3\n"
longstring <- removetabsatbeginningoflines('
<TAB>Sometimes I have really long strings that I format
<TAB>so that they read nicely (not with too long of a
<TAB>line length). Tabs at the beginning of the lines
<TAB>within a string preserve my code indenting scheme
<TAB>that I use to make the code more readable. If the
<TAB>tabs are not removed automatically by the parser,
<TAB>then I need to wrap the string in a function that
<TAB>removes them.')
The tab characters are preserved when the above code is sourced from a file with source().
Why doesn't RGui keep the tab characters?
Where is this behavior documented?
What other non-intuitive, related behaviors does RGui have with regard to parsing (multiline) strings?
I can't find anywhere that this is documented, so this is a worthwhile question to publicly answer.
If you are using RGui's built-in 'R Editor', then Tab characters, whether entered via the Tab key or already present in a text file that you have opened in the 'R Editor', will not be respected when submitting code with Ctrl+R. (This is difficult to demonstrate in an example here, given that tabs are stripped from answers.)
I imagine the 'R Editor' is not meant for serious code editing, and you may be better off using a dedicated IDE (e.g. RStudio) or a more full-featured editor (e.g. Emacs, Notepad++).
You can work around this issue in RGui by writing the escape sequence \t in place of literal tabs when editing, but this may not be appropriate if you want to keep actual Tab characters in your file. Tabs will also be processed correctly when using source() to run the code stored in the text file directly.
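If you want to keep literal tabs in the file but strip them from the resulting string, a small helper along the lines of the hypothetical removetabsatbeginningoflines() from the question will do. A minimal sketch (the function name is just illustrative):
removetabsatbeginningoflines <- function(x) {
  # strip any run of tabs that starts the string or immediately follows a newline
  gsub("(^|\n)\t+", "\\1", x)
}
removetabsatbeginningoflines("LINE1\n\tLINE2\n\tLINE3\n")
# [1] "LINE1\nLINE2\nLINE3\n"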
Related
I occasionally work with data frames where unorthodox special characters are used that look identical to standard characters in RStudio's in-built viewing functionality. I refer to these characters in my scripts, but sometimes when I open the file, these characters have been changed to standard keyboard characters within the script.
For example, in my script, ’ changes to a standard apostrophe ' and – changes to a standard hyphen -.
These scripts are ones I have to run regularly, so having to manually correct this each time is a chore. I also haven't worked out what it is that triggers RStudio to make these changes. I've tried closing and reopening to test if that's the trigger, and the characters have remained correct. It only seems to happen after I've turned off my computer.
Does anyone know of a workaround for this and/or what is causing this? TIA
EDIT: the reason I need to do this is I export to csv which is UTF-8 encoded.
I've found a workaround, although I welcome any feedback on any drawbacks to this.
If you have already written your code (including the special characters):
Click File > Save with Encoding... > Show all encodings > unicodeFFFE
Now when you reopen the file:
Click File > Reopen with Encoding... > Show all encodings > unicodeFFFE
If you haven't already written your code, it should just be a case of saving your file from the start with the unicodeFFFE encoding (instructions above) before you write the code, and then using the 'Reopen with Encoding' option whenever you open the file.
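Since the end goal (per the edit above) is a UTF-8 encoded CSV export, you can also force the encoding at the point of export rather than relying on the script file's encoding. A minimal sketch, assuming your data frame is called df and the output file name is a placeholder:
# write the CSV explicitly as UTF-8, independent of the native locale encoding
write.csv(df, "output.csv", fileEncoding = "UTF-8", row.names = FALSE)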
I'm working with a messy database, in which I need to format some columns of data. For this I use a lot of gsub and other regular-expression functions. My problem is that some of the characters I need to clean are "weird" characters, especially Ñ, which when mangled shows up as an A with the curly thing above (Ã) followed by another weird character.
When I copy from the database and then paste on my gsub function:
gsub("CALLÑE", "CALLE", data)
It works fine until I close RStudio and reopen it. Then the characters in the R script file are different. It is as if RStudio didn't support weird characters itself and mangled them in scripts when they are reopened:
gsub("CALLÃ'E", "CALLE", data)
How can I avoid this, and keep my weird characters even after closing the file?
In RStudio, go to File -> Save with Encoding... and select the UTF-8 option.
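Alternatively, you can sidestep file encoding entirely by spelling the character as a Unicode escape, so the script itself contains only ASCII and survives any re-encoding. A minimal sketch (assuming data is a character vector):
# "\u00d1" is the Unicode escape for N-with-tilde, so no literal weird
# character needs to live in the script file
data <- gsub("CALL\u00d1E", "CALLE", data)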
What?
An .Rmd file renders error-free via knitr (or rmarkdown) from within Linux. All related material (i.e. child R scripts and CSV input data) is encoded in UTF-8.
Executing the same script from within Windows (the script actually lives inside a cloned git repository) does not render all characters cleanly, since the system there is set to Windows-1252.
Examples
For example, the string "sans réserves", sourced from a CSV into some data.frame's column, is typeset as "sans rÃ©serves". To read this one correctly, it suffices to add encoding='UTF-8' to read.csv while reading in the data.
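For instance, a minimal sketch (the file name is a placeholder):
# declare the file's encoding so the strings are marked as UTF-8 on read
df <- read.csv("data.csv", encoding = "UTF-8")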
Another example, this one among other R code lines, is the string "Trésorier Général", which is typeset as "TrÃ©sorier GÃ©nÃ©ral". Fortunately, the following advice
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))
taken from https://stackoverflow.com/a/15714617/1172302, works, and the string is rendered as expected.
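In practice, that line goes near the top of the .Rmd in a setup chunk, so it runs before any of the external code is referenced. A sketch, reusing the file name from above:
```{r setup, include=FALSE}
library(knitr)
# read the child script with an explicit encoding so its strings stay UTF-8
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))
```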
Related
[Update] There are some related Q&As, but they are more than 2-3 years old. In addition, this page, https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding, points to the very issue.
Questions
Is there another, easier way to overcome this issue regarding UTF-8 and Windows inside R? Any recommendations on how to approach such a problem? I am trying to follow a 'one source for all' principle.
P.S. An interesting read: https://superuser.com/a/221602/128768
This is more of a tip question that can save a lot of time in many cases. I have a script.R file which I try to save, and I get the error:
Not all of the characters in ~/folder/script.R could be encoded using ASCII. To save using a different encoding, choose "File | Save with Encoding..." from the main menu.
I was working on this file for months, and today I was editing my code like crazy when I got this error for the first time, so obviously I inserted a character today that cannot be encoded.
My question is: can I track down this specific character and find where exactly in the document it is?
There are about 1000 lines in my code, so it's almost impossible to search manually.
Use tools::showNonASCIIfile() to spot the non-ASCII characters.
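For the file from the question, that would be:
tools::showNonASCIIfile("~/folder/script.R")
# prints every line containing non-ASCII characters, with its line number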
Let me suggest two slight improvements to this.
Process:
Save your file using a different encoding (e.g. UTF-8).
Set a variable f to the name of that file, something like this: f <- "yourpath\\yourfile.R"
Then use tools::showNonASCIIfile(f) to display the faulty characters.
Something to check:
I have a Markdown file which I knit to a Word document (not important).
Some of the packages I load on initialisation mask previously loaded functions. I have found that the resulting warning messages sometimes contain non-ASCII characters, and this seems to have caused the message for me: some fault put all that output at the end of the file, and I had to delete it anyway!
Check whether the characters are coming back from warnings!
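As a sketch of that check (the warning messages are the names of the list returned by warnings()):
# scan the most recent warning messages for non-ASCII characters
tools::showNonASCII(names(warnings()))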
Cheers
Expanding the accepted answer with this answer to another question: to check for offending characters in the script currently open in RStudio, you can use this:
tools::showNonASCIIfile(rstudioapi::getSourceEditorContext()$path)
Using RStudio [Windows 8], when I use the dygraph function to plot a time series, I have a problem when trying to use UTF-8 characters in the main title.
library(dygraphs)
dygraph(AirPassengers, main = "Título")
This results in a title: "T?tulo"
I have tried converting "Título" to UTF-8 encoding, but it doesn't work.
You can use enc2utf8.
dygraph(AirPassengers, main = enc2utf8("Título"))
You need to make sure your locale settings support the character that you want to use, and that the file is saved with the right encoding. Saving as UTF-8 worked for me.
I was able to replicate your situation on Windows 7 and tried a bunch of things. Here is a minimal working example, embedded in R Markdown.
```{r}
Sys.setlocale("LC_ALL", "German")
# Note that Windows locale names differ from Unix & Mac; usually the
# name of the nationality works here.
# This also works with "Faroese", "Hungarian", and other locales whose
# languages have this letter.
# The locale has to be set in a preceding chunk to take effect.
```
```{r}
Encoding("Título")
library(dygraphs)
dygraph(AirPassengers, main = "Título")
```
You can check the encoding assigned to the title with Encoding(). Locales like Faroese, Hungarian, and German encode "Título" as latin1 or unknown, both of which seem to cause no problems for dygraph's JavaScript. UTF-8 wrote it as <U+00ED>, which was a problem for the JavaScript, as well as for some, but not all, other functions. With a matching locale, converting to UTF-8 as @Michele recommended has the same result.
Also, if you don't use the title in many places, it is possible to just manually find and replace it in the generated html/JavaScript file. The problem occurs on conversion, but once the file has been made, the title variable can be changed successfully. The letter still shows as a question mark in the RStudio Viewer output, but I recommend generating the entire file for the JavaScript regularly anyway, as I've seen other functions malfunction in the viewer window.