Corrupted Rmarkdown script: How can I get the Cyrillic characters back?

I had been working for weeks on a script full of Cyrillic characters (both inside the chunks and outside them). One day I opened a new Rmarkdown script in which I wrote English, while the other document was still open in my R session. When I afterwards returned to the Cyrillic document, everything written had turned into something like this: 8 иÑлÑ 1995 --> ÐлаÑÑÑ - наÑодÑ
The question is: what is the source of the problem? And how can the corrupted script be restored to its original form (with the Cyrillic characters)?
UPDATE!!
I have tried reopening the RStudio script with the encodings CP1251, CP1252, windows-1251 and UTF-8, but it does not work; the weird symbols merely change into other weird symbols. The problem is that I saved the document with the default encoding (CP1251, i.e. windows-1251) at the very beginning.
Solution:
If you work with Cyrillic and Latin characters, always make sure you save the RStudio script with UTF-8 encoding when your computer runs Windows (I do not know about Mac). If you close the script and open it again, re-open the file with UTF-8 encoding.
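The garbage shown above (runs of Ð and Ñ) is typically UTF-8 text that was decoded as Latin-1/CP1252 and then re-saved. Under that assumption, and only that assumption, one layer of the damage can sometimes be undone from the R console. This is a sketch, not a guaranteed recovery:

```
good <- "власть"

# Simulate the corruption: the file's UTF-8 bytes get mis-read as Latin-1
broken <- rawToChar(charToRaw(enc2utf8(good)))
Encoding(broken) <- "latin1"
broken <- enc2utf8(broken)        # now looks like "Ð²Ð»Ð°Ñ..."

# Repair: reverse the two steps
bytes <- iconv(broken, from = "UTF-8", to = "latin1")
Encoding(bytes) <- "UTF-8"
identical(bytes, good)            # TRUE when the assumption holds
```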

Assuming you're using RStudio: open your *.Rmd file and then try to reopen it "with encoding". To do so, simply use the File menu (File > Reopen with Encoding...).
Select "Show all encodings" and choose your specific encoding; I suggest windows-1251 for Cyrillic.
Note: apparently the issue can also occur when the *.Rmd file is opened standalone at one time and from within an R Project at another.
Hope that helps.
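If you are unsure which codepage the file was actually saved with, a hedged way to find out before reopening is to read a few raw bytes and try candidate encodings from the console ("broken.Rmd" is a hypothetical path):

```
raw <- readBin("broken.Rmd", what = "raw", n = 200)  # first 200 bytes suffice
txt <- rawToChar(raw)
for (enc in c("windows-1251", "windows-1252", "KOI8-R")) {
  cat(enc, "->", iconv(txt, from = enc, to = "UTF-8", sub = "?"), "\n\n")
}
```

Whichever candidate prints readable Cyrillic is the one to select in the Reopen with Encoding... dialog.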

Related

RStudio - some non-standard special characters in the R Script change by themselves

I occasionally work with data frames where unorthodox special characters are used that look identical to standard characters in RStudio's in-built viewing functionality. I refer to these characters in my scripts, but sometimes when I open the file, these characters have been changed to standard keyboard characters within the script.
For example, in my script, ’ changes to a standard apostrophe ' and – changes to a standard hyphen -.
These scripts are ones I have to run regularly, so having to manually correct this each time is a chore. I also haven't worked out what it is that triggers RStudio to make these changes. I've tried closing and reopening to test if that's the trigger, and the characters have remained correct. It only seems to happen after I've turned off my computer.
Does anyone know of a workaround for this and/or what is causing this? TIA
EDIT: the reason I need to do this is I export to csv which is UTF-8 encoded.
I've found a workaround, although I welcome any feedback on any drawbacks to this.
If you have already written your code (including the special characters):
Click File > Save with Encoding... > Show all encodings > unicodeFFFE
Now when you reopen the file:
Click File > Reopen with Encoding... > Show all encodings > unicodeFFFE
If you haven't already written your code, it should just be a case of saving your file from the start with the unicodeFFFE encoding (instructions above) before you write the code and then using the reopen with encoding option whenever you open the file.
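As a quick check (my addition, not part of the original workaround), you can confirm from the console whether a character is still the "fancy" one or has been silently swapped for its ASCII lookalike by inspecting its code point:

```
utf8ToInt("’")   # 8217 = right single quotation mark, not ASCII ' (39)
utf8ToInt("–")   # 8211 = en dash, not ASCII hyphen-minus (45)
```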

How can I display UTF-8 characters in knitr chunk outputs? [duplicate]

Is there a set of best practices or documentation for working with Unicode in knitr and Rmarkdown? I can't seem to get any glyphs to show up properly when knitting a document.
For example, this works in the console (in RStudio):
> cat("\U2660 \U2665 \U2666 \U2663")
♠ ♥ ♦ ♣
But when knitting, the glyphs come out garbled in both the HTML and the Word output (screenshots omitted).
It looks like an encoding issue specific to Windows, and may be related to this issue: https://github.com/hadley/evaluate/issues/59. Unfortunately we have to wait for a fix in base R, but if you don't have to use cat(), and the expression is a top-level expression in your code chunk (i.e. not inside a for-loop or if-statement), this may work:
knitr::asis_output("\U2660 \U2665 \U2666 \U2663")
It passes the character string directly to knitr and bypasses cat(), since knitr cannot reliably catch multibyte characters written out by cat() on Windows -- it depends on whether the characters can be represented by your system's native encoding.
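To illustrate the top-level restriction (a hedged sketch, not from the original answer): a call to asis_output() inside a loop is discarded, so build the string first and emit a single top-level call:

```
suits <- c("\U2660", "\U2665", "\U2666", "\U2663")

# Does NOT render: the values returned inside the loop are discarded
# for (s in suits) knitr::asis_output(s)

# Works: one top-level expression whose value knitr can catch
knitr::asis_output(paste(suits, collapse = " "))
```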
For anyone else who came across this after trying to get emoji support in RStudio/Rmarkdown documents: another possible issue is that if the file encoding isn't set to UTF-8, the resulting compiled document won't support emoji either.
In order for emoji to work in Rmarkdown, you must change the file encoding of the Rmd document. Go to File -> Reopen with encoding, then select UTF-8.
Once you have ensured the file is open in UTF-8 encoding, you should be able to compile with emoji support.
You should even be able to paste emoji from a browser directly into the document. 😺
It is probably a good idea to change the default encoding for all files to UTF-8 so that you don't have to deal with this issue again.
Unicode: Inline
Phew, that was close `r knitr::asis_output("\U1F605 \U2660 \U2665 \U2666 \U2663")`
Unicode: Block
```{r, echo=FALSE}
knitr::asis_output("Phew, that was close \U1F605 \U2660 \U2665 \U2666 \U2663")
```
The emo package
Unfortunately, this package isn't yet on CRAN, but it can be installed with devtools::install_github("hadley/emo")
emo::ji("face")
There are some more examples in the package's README on GitHub.

How to handle encodings in knitr spin_child

Following the recommendations at https://yihui.name/en/2018/11/biggest-regret-knitr, I started saving my .R files with UTF-8 encoding (using RStudio's "Save with Encoding"). This seems to work well, until it comes to using spin_child().
Under Windows, for example I have a file mainfile.R with this code:
print("Bär 1"); spin_child("subfile.R")
subfile.R has this code:
print("Bär 2")
In RStudio under Windows, I run
rmarkdown::render("mainfile.R", encoding = "UTF-8")
If both files are saved with encoding UTF-8, Bär 1 comes out fine, but Bär 2 comes out as "Bär 2" (which seems like an encoding problem to me).
Interestingly, if I change the encoding of subfile.R to ISO-8859-1 (the system default), both instances of Bär come out correctly.
It seems strange to me that I should have to use different encodings for different files, so I'm wondering what I have to do if the input file for spin_child is UTF-8 as well.
This is a bug in knitr, and I just fixed it on Github. Please try the development version and continue using UTF-8:
remotes::install_github('yihui/knitr')

.Rmd files open as completely empty

When opening .rmd files in RStudio 3.3.2, they show up as completely empty. There is text if I open using Notepad or if I open on another machine. What is going on?
I had a similar issue with older R files that opened as empty. It turned out that RStudio didn't use the correct encoding as default and therefore wasn't able to read the file (presented the file as empty).
You can make sure that you are using the correct encoding by:
Opening the file in RStudio as you normally would (the file will be empty)
Navigate to File -> Reopen with Encoding...
Select UTF-8 and click OK
UTF-8 will most likely be the encoding you need. You can also choose to set this as the default for all source files.
This issue was also addressed on RStudio Support
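Before re-saving anything, a quick sanity check from the console (the path is hypothetical) confirms that the text is still on disk even though the editor shows nothing:

```
file.size("notes.Rmd")    # a non-zero size means the content is still there
readLines("notes.Rmd", n = 5, encoding = "UTF-8")  # peek at the first lines
```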
In RStudio go to:
Tools > Global Options...
Select "Code" on the left-hand side
Under "Saving", change "Default text encoding" to UTF-8
Then, as #Richard N mentioned, save the files with UTF-8 encoding; this will solve the issue.
In case you saved a file without that encoding, use the "Reopen with Encoding..." option in the File menu.

Wrong text file output of special characters using UTF-8 encoding in R 3.1.2 with Mac OS X

I am having problems writing a csv file with Spanish accents, using R 3.1.2 on Mac OS X 10.6.
I cannot write words with accents into a text file.
When I do:
con <- file("y.csv",encoding="UTF-8")
write.csv("Ú",con)
I get y.csv file which has the following content:
"","x"
"1","√ö"
That is, "√ö" instead of "Ú".
When using write.table the outcome is equivalent.
Encoding("Ú") is "UTF-8"
If I do write.xlsx("Ú","y.xlsx") I get y.xlsx file which successfully shows Ú.
I have also tried to convert to other encodings using iconv() with no success.
I have set default encoding "UTF-8" in RStudio and on TextEdit. When using only R (not RStudio) the problem is the same.
In RStudio the special characters appear correctly (in files), and also in the console in R.
Sys.getlocale() gives
"es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8"
In Mac OS X Terminal
file -I y.csv
gives
y.csv: text/plain; charset=utf-8
I don't see where the problem is. Any help, please?
Just came across this other question, which seems like a near duplicate of this one:
Export UTF-8 BOM to .csv in R
The problem was not one of encoding in R, but of TextEdit, which did not show the right characters even though I had selected UTF-8 encoding in its preferences. It was solved by using a different editor. I was using Mac OS X 10.6.8 and TextEdit 1.6.
Maybe you can write words with accents, but Excel expects a different encoding. Try writing your csv with, for example, write_csv(), and open the csv with a workaround (see the sketch after this list):
open Excel
then choose tab Data | Get External Data | From Text
choose your file
and in step 1 of the text import wizard, choose file origin 65001: Unicode (UTF-8).
See also http://www.openforis.org/support/questions/279/wrong-characters-display-when-exporting-files-to-csv-from-collect
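If you would rather skip the import wizard entirely, a hedged alternative (not part of the original answer) is to write the file with a UTF-8 byte-order mark, which many Excel versions use to auto-detect the encoding; readr ships a helper for exactly this:

```
library(readr)

# write_excel_csv() prepends a UTF-8 BOM so that Excel opens the file
# with the right encoding straight away
write_excel_csv(data.frame(x = "Ú"), "y.csv")
```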
