How to make RStudio treat Cyrillic (Russian) symbols properly - r

The problem is simple and annoying. I cannot print Russian text either in the console or to a file.
Input:
print("hello world")
print("привет мир")
Output:
> print("hello world")
[1] "hello world"
> print("привет мир")
[1] "ïðèâåò ìèð"
I would avoid Russian altogether, but sometimes I get error messages in Russian, and since they are unreadable I have no way to deal with those errors:
> load(swirl)
Error in load(swirl) : íåïðàâèëüíûé àðãóìåíò 'file'
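The garbage above is Cyrillic text in the CP1251 code page being rendered as Latin-1: "íåïðàâèëüíûé àðãóìåíò" is the CP1251 byte sequence for "неправильный аргумент" ("invalid argument"). A minimal sketch (not from the original post) that reproduces the mojibake and shows the mechanism:

```r
x <- "\u043f\u0440\u0438\u0432\u0435\u0442"  # "привет" as a UTF-8 string
y <- iconv(x, "UTF-8", "CP1251")             # re-encode to CP1251 bytes
Encoding(y) <- "latin1"                      # declare those bytes as Latin-1...
print(y)                                     # ...and R displays "ïðèâåò", as above
```

A commonly suggested workaround on Windows is Sys.setlocale("LC_CTYPE", "Russian"), so that R's character handling matches the Cyrillic code page; whether it fully fixes RStudio's console depends on the Windows setup.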

Related

Natural sorting with R differs on deployment (maybe OS/Locale issue)

I am using the package "naturalsort", found here: https://github.com/kos59125/naturalsort. As far as I know, natural sorting is not implemented well elsewhere in R, so I was happy to find this package.
I use the naturalsort function to sort file names just like Windows Explorer does, which works great locally.
But when I use it in my production environment, deployed with Docker on Google Cloud Run, the sorting changes. I don't know if this is due to a difference in locale (I am from Denmark) or due to OS differences between my Windows PC and the Docker/Google Cloud Run deployment.
I have created an example ready to be run in R:
######## Code start ###########
require(plumber)
require(naturalsort) # for name sorting

#* Retrieve sorted string list
#* @get /sortstrings
#* @param nothing
function(nothing) {
  print(nothing)
  test <- c("0.jpg", "file (4_5_1).jpeg", "1 tall thin image.jpeg",
            "8.jpeg", "8.jpg", "file (2.1.2).jpeg", "file (0).jpeg", "3.jpeg",
            "file (1).jpeg", "file (2.1.1).jpeg", "file (0) (3).jpeg", "file (2).jpeg",
            "file (2.1).jpeg", "file (4_5).jpeg", "file (4).jpeg", "file (39).jpeg")
  print("Direct sort")
  print(naturalsort(text = test))
  sorted_strings <- naturalsort(text = test)
  return(sorted_strings)
}
######## Code end ###########
I would expect it to sort the file names like below, which it does locally, both when run directly in the script and when called through the running plumber API:
c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2).jpeg",
"file (2.1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (4).jpeg",
"file (4_5).jpeg",
"file (4_5_1).jpeg",
"file (39).jpeg"
)
But instead it sorts it like this:
c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (2.1).jpeg",
"file (2).jpeg",
"file (4_5_1).jpeg",
"file (4_5).jpeg",
"file (4).jpeg",
"file (39).jpeg")
which is not how Windows Explorer sorts them.
Try fixing the collating sequence prior to the naturalsort call. It varies by locale and can affect how strings are compared (and therefore sorted).
## Get initial value
lcc <- Sys.getlocale("LC_COLLATE")
## Use fixed value
Sys.setlocale("LC_COLLATE", "C")
sorted_strings <- naturalsort(text = test)
## Restore initial value
Sys.setlocale("LC_COLLATE", lcc)
You can find more details in ?sort, ?Comparison, and ?locales.
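The effect is easy to see even with plain sort(), since the locale's string comparison is exactly what LC_COLLATE changes. A small self-contained sketch (the en_US.UTF-8 locale name is an assumption and may not exist on every system):

```r
test <- c("file (2.1.1).jpeg", "file (2).jpeg", "file (2.1).jpeg")

lcc <- Sys.getlocale("LC_COLLATE")

Sys.setlocale("LC_COLLATE", "C")   # byte-wise: ")" (0x29) sorts before "." (0x2E)
print(sort(test))                  # "file (2).jpeg" comes first, as in Windows Explorer

Sys.setlocale("LC_COLLATE", "en_US.UTF-8")  # linguistic collation, if available
print(sort(test))                  # punctuation may be ignored first, reordering the names

Sys.setlocale("LC_COLLATE", lcc)   # restore the original setting
```

Under the "C" collation the comparison is pure byte order, which matches what Windows Explorer produces for these names; linguistic collations typically compare letters and digits before punctuation, which is the reordering seen on Cloud Run.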

In R cannot get function from imported python file using reticulate

I would like to import a Python file and then use a function defined in it, but it does not work (it only works with source_python). Is it supposed to be this way?
The Python file, called the_py_module.py, includes this code:
def f1():
    return "f one"

def f2():
    return "f two"
R script
# Trying to import the python file, which appears to work:
reticulate::import("the_py_module")
Gives this output:
Module(the_py_module)
# But when calling the function:
f1()
I get error saying:
Error in f1() : could not find function "f1"
It does work when sourcing the Python script, though:
reticulate::source_python("the_py_module.py")
f1()
Try the following approach:
> library(reticulate)
> my_module <- import("the_py_module")
> my_module$f1()
[1] "f one"
or, using your approach:
> my_module_2 <- reticulate::import("the_py_module")
> my_module_2$f1()
[1] "f one"

setting a tracepoint in a file about to be sourced in R

I have a file, test.R :
somefunc <- function() {
  print("Hello")
  print("World")
}
somefunc()
print("The End")
I want to set a tracepoint on the first line of somefunc. So I try,
> trace(what='somefunc', tracer=browser, at=1)
Error in getFunction(what, where = whereF) : no function ‘somefunc’ found
No traceback available
Ok, so it is not in the namespace. Let's load the file (so the function is in our current namespace) and then set the tracepoint...
> source("test.R")
[1] "Hello"
[1] "World"
[1] "The End"
> trace(what='somefunc', tracer=browser, at=1)
[1] "somefunc"
Now run the file again.
> source("test.R")
[1] "Hello"
[1] "World"
[1] "The End"
Alas, the breakpoint isn't hit. Presumably, the act of loading the file again clobbered the previous function (and tracepoint) in namespace.
I can only hit the tracepoint after I've loaded the file and called the function directly through the interpreter. E.g.
> source("test.R")
[1] "Hello"
[1] "World"
[1] "The End"
> trace(what='somefunc', tracer=browser, at=1)
[1] "somefunc"
> somefunc()
Tracing somefunc() step 1
Called from: eval(expr, p)
Browse[1]>
QUESTIONS :
How do I set a breakpoint on a function that is to be loaded / sourced in R?
How do I keep previously set tracepoints when reloading the sourced file?
NOTE :
I'm not looking for the 'I like RStudio, why don't you try using that?' answer.
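For question 1, one non-RStudio approach (a sketch, not a canonical answer) is to split the parsed file into function definitions and other top-level calls: evaluate the definitions, set the trace, then evaluate the rest, so the tracepoint is live when the file's own somefunc() call runs:

```r
source_with_trace <- function(file, fn, tracer = browser) {
  exprs <- parse(file)
  # identify top-level expressions of the form `name <- function(...) ...`
  is_def <- vapply(exprs, function(e) {
    is.call(e) && length(e) == 3 && is.symbol(e[[1]]) &&
      as.character(e[[1]]) %in% c("<-", "=") &&
      is.call(e[[3]]) && identical(e[[3]][[1]], as.name("function"))
  }, logical(1))
  for (e in exprs[is_def]) eval(e, globalenv())   # define the functions first
  trace(fn, tracer = tracer, at = 1)              # tracepoint is live from here on
  for (e in exprs[!is_def]) eval(e, globalenv())  # run the rest; the trace fires
  invisible(NULL)
}

# source_with_trace("test.R", "somefunc")  # drops into browser at step 1
```

For question 2 the same idea applies: re-run trace() after every source(), since source() rebinds somefunc and silently discards the traced copy.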

knitr: generating UTF-8 output from chunks

I have a doc.Rnw that is supposed to produce some Russian UTF-8 strings:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
@
Здравствуйте
\selectlanguage {english}
\end{document}
string.rus has a UTF-8 string which correctly shows in the R console:
print(readLines('string.rus', encoding="UTF-8"))
# [1] "Здравствуйте"
doc.Rnw correctly shows in Windows Notepad, while both:
file.show("doc.Rnw")
file.show("doc.Rnw", encoding="UTF-8")
fail to properly show the UTF-8 strings.
Using:
knit("doc.Rnw")
The document part of the output doc.tex shows:
\begin{document}
\selectlanguage {russian}
[1] "<U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>"
[1] " <U+0097>д <U+0080>авс <U+0082>в <U+0083>й <U+0082>е"
Здравствуйте
\selectlanguage {english}
\end{document}
which of course does not compile in PDFLaTeX. Using:
knit("doc.Rnw", encoding="UTF-8")
gives even worse results.
Commenting out the chunk lines which should generate UTF-8 strings:
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
gives a valid doc.tex, which compiles in MiKTeX and properly shows the remaining UTF-8 string.
Even if I comment out the first print... and leave only the second one, I can't compile. This seems to prove that the original encoding of doc.Rnw is correct.
I tried to replace both print commands with:
a="Здравствуйте"
Encoding(a)="UTF-8"
print(a)
In this case I can compile, but the PDF output is (the first string is cut off at the margin):
[1] <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443>
Здравствуйте
So the chunk output is still wrong.
How to properly print UTF-8 strings from chunks?
R version is 3.3.3 (2017-03-06) for Windows and knitr is 1.15.1 (2016-11-22).
An extended working example is below:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
s=readLines('string.rus', encoding="UTF-8")
message("s ", Encoding(s), ": ", s)
Encoding(s)="latin1"
message("s latin1: ", s)
Encoding(s)="unkwnown"
message("s unkwnown: ", s)
Encoding(s)="utf8"
message("s utf8: ", s)
a="Здравствуйте"
message("a ", Encoding(a), ": ", a)
Encoding(a)="latin1"
message("a latin1: ", a)
Encoding(a)="utf8"
message("a utf8: ", a)
Encoding(a)="UTF-8"
message("a UTF-8: ", a)
u=("\U0417")
message("u ", Encoding(u), ": ", u)
Encoding(u)="latin1"
message("u latin1: ", u)
Encoding(u)="unkwnown"
message("u unkwnown: ", u)
@
Здравствуйте
\selectlanguage {english}
\end{document}
After knit("doc.Rnw"), this is the output related to the test chunk found in doc.tex (without the knitr code decoration, for readability):
s UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
s latin1: Здравствуйте
s unkwnown: Здравствуйте
s utf8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
a unknown: Здравствуйте
a latin1: Здравствуйте
a utf8: Здравствуйте
a UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
u UTF-8: <U+0417>
u latin1: З
u unkwnown: З
Some comments follow.
First, only message() works; print() always gives errors.
In both the externally read string s and the locally set a, the behavior is weird:
in fact, keeping or explicitly setting the encoding to UTF-8 produces the wrong results (while utf8 works for a).
One might think the UTF-8 encoding of the documents (doc.Rnw and string.rus) is not properly set. This is why I added the line u=("\U0417"), which is UTF-8 for sure. Again, only removing the UTF-8 mark gives a proper output.
In a similar fashion, explicitly requesting UTF-8 output:
knit("doc.Rnw", encoding="UTF-8")
does not produce the UTF-8 characters, but their Unicode code points or garbled ones.
In the end, I can produce the desired .tex file and compile the LaTeX, but why the above counter-intuitive behavior occurs is beyond me.
Hopefully someone will give a good explanation.
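Part of the confusion here is that Encoding() is only a declaration of how a string's bytes should be interpreted; reassigning it never changes the bytes, only which translation R applies when displaying or writing them. A minimal demonstration, independent of knitr:

```r
a <- "\u0417"            # З; \u escapes always produce UTF-8-encoded strings in R
print(charToRaw(a))      # d0 97 - the UTF-8 bytes for З
print(Encoding(a))       # "UTF-8"

Encoding(a) <- "latin1"  # only the declaration changes...
print(charToRaw(a))      # ...the bytes are still d0 97
```

So flipping the mark to latin1 or unknown does not "fix" the string; it changes the translation applied when the .tex file is written, and on a non-UTF-8 Windows locale that can accidentally pass the raw UTF-8 bytes through untouched. That would explain why removing the UTF-8 mark appears to give the correct output.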

rPython method python.get returns weird encoding

I'm trying to use the rPython package to pass some arguments into Python code and get results back, but for some reason I'm getting weirdly encoded output from my Python code. Maybe someone has some hints for me.
Here is my simple code to test:
require(rPython)
# pass the test word "аудієнція" ("audience" in Ukrainian)
word <- "аудієнція"
python.assign("input", word)
python.exec("input = input.encode('utf-8')")
python.exec("print input") # the console output is correct at this step: аудієнція
x <- python.get("input")
cat(x) # the output is: 0C4VT=FVO
Does anybody have a suggestion as to why the output of python.get is encoded weirdly?
My Sys.getlocale() output is:
Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=uk_UA.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=uk_UA.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=uk_UA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=uk_UA.UTF-8;LC_IDENTIFICATION=C"
Thank you in advance for any hints!
I have recently built a new package based on the original rPython code called SnakeCharmR that addresses this and other problems rPython had.
A quick comparison:
> library(SnakeCharmR)
> py.assign("a", "'")
> py.get("a")
[1] "'"
> py.assign("a", "áéíóú")
> py.get("a")
[1] "áéíóú"
> library(rPython)
> python.assign("a", "'")
File "<string>", line 2
a =' [ "'" ] '
^
SyntaxError: EOL while scanning string literal
> python.assign("a", "áéíóú")
> python.get("a")
[1] "\xe1\xe9\xed\xf3\xfa"
You can install SnakeCharmR like this:
> library(devtools)
> install_github("asieira/SnakeCharmR")
Hope this helps.
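As a footnote on the failure mode: judging from the SyntaxError quoted above, rPython appears to generate Python source by pasting the R value between quote characters, so an embedded quote breaks the generated code. A sketch of that fragile pattern (the exact paste format is inferred from the error message, not taken from rPython's source):

```r
# hypothetical string-pasting serialization; a quote inside `value` breaks it
serialize_naively <- function(name, value) {
  paste0(name, " =' [ \"", value, "\" ] '")
}
cat(serialize_naively("a", "'"))  # a =' [ "'" ] '  -> a Python SyntaxError
```

A JSON-based marshalling layer (which SnakeCharmR reportedly uses) escapes quotes and declares encodings explicitly, avoiding both the SyntaxError and the \xe1\xe9... byte garbage.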
