I was working on a toy project and tried using some Unicode variable names to match a paper I was attempting to implement.
The following code works fine on R 3.4.3 on Windows (RStudio version 1.1.456) and R 3.5.1 on OSX:
> µ <- function(ß, n) ß * n
> µ(2, 3)
[1] 6
This code gives the following error, with α typed as ALT+224:
> α <- 2
Error: unexpected input in "\"
The file was saved as UTF-8, so this is surprising to me.
make.names is consistent with the results above:
> make.names('µ')
[1] "µ"
> make.names('α')
[1] "a"
What is the rule for non-ASCII letters? Why are mu and scharfes S OK, but alpha isn't?
Edit: Output of sessionInfo()
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3 yaml_2.2.0
Edit2: It seems like Sys.setlocale should be the answer, but here is what happens when I try this:
> Sys.setlocale("LC_ALL", 'en_US.UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
Working with Ben Bolker we determined the issue was that the current session was using character encoding Windows-1252, which has some non-ASCII characters but not many. This is despite the fact that RStudio saved the file as UTF-8.
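One way to check which letters Windows-1252 can represent is to try converting them with iconv(). This is only a sketch: it assumes the platform's iconv recognises the name "windows-1252" (some builds call it "CP1252"), and the letters are written as \u escapes so the snippet does not depend on how the file itself is encoded.

```r
# Letters from the question, written as \u escapes so the snippet is pure ASCII.
letters_utf8 <- c(mu = "\u00b5", eszett = "\u00df", alpha = "\u03b1")

# With the default sub = NA, iconv() returns NA for input that cannot be
# represented in the target encoding.
# Assumption: this platform's iconv knows the name "windows-1252".
converted <- iconv(letters_utf8, from = "UTF-8", to = "windows-1252")

representable <- !is.na(converted)
names(representable) <- names(letters_utf8)
print(representable)
```

Windows-1252 has code points for µ and ß but none for α, which fits the behaviour seen in the question.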
Changing the locale of a running R session to a UTF-8 one does not seem to be possible, at least on Windows, where I get the warning shown in the question.
I have a partial solution. If you are given a file like this and want to run it and have interactive access to the results, the following will mostly work (variable names will be translated to Win-1252):
> source('utf-8-file.r', encoding='UTF-8')
I would be very excited to see a better solution, one which allows editing and running the file and entering such snippets into the console of RStudio on Windows.
Related
I'm working with files containing text in Hindi and parsing them. I wrote my code in RStudio and executed it without many issues. But now I need to execute the same script from the command line using R.exe/Rscript.exe, and it doesn't work the same way. I've run a simple script from both RStudio and the terminal:
n_p<-'नाम'
Encoding(n_p)
gregexpr(n_p,c('adfdafc','नाम adsfdfa'))
sessionInfo()
Output In RStudio:
> n_p<-'नाम'
>
> Encoding(n_p)
[1] "UTF-8"
>
> gregexpr(n_p,c('adfdafc','नाम adsfdfa'))
[[1]]
[1] -1
attr(,"match.length")
[1] -1
[[2]]
[1] 1
attr(,"match.length")
[1] 3
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rJava_0.9-10
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0
Output with R.exe in cmd (For debugging purposes. Rscript.exe gives a similar if not identical output)
> n_p<-'à☼"à☼_à☼r'
>
> Encoding(n_p)
[1] "latin1"
>
> gregexpr(n_p,c('adfdafc','à☼"à☼_à☼r adsfdfa'))
[[1]]
[1] -1
attr(,"match.length")
[1] -1
[[2]]
[1] 1
attr(,"match.length")
[1] 9
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.0
I've tried changing locales, but Sys.setlocale refuses to work properly. In some cases, gregexpr gives an error when it can't parse non-ASCII code. And finally, when it does run without errors, it doesn't match regular expressions properly. I can't provide a reproducible example at the moment, but I will try to later.
Help.
The right answer is that you should run Rscript with the option --encoding=<file encoding>.
There is no need to set locale, and as you probably found out, it doesn't work anyway. If your file is UTF-8:
Rscript.exe --encoding=UTF-8 file.R
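The two outputs in the question are consistent with the script's UTF-8 bytes being read as a single-byte encoding. The sketch below is an illustration, not a diagnosis of that exact session. It counts the code points and bytes of the pattern, then reinterprets the same bytes as latin1, using \u escapes so it does not depend on the file encoding.

```r
# The pattern from the question, written with \u escapes (three code points).
n_p <- "\u0928\u093e\u092e"

n_codepoints <- length(utf8ToInt(n_p))   # code points when read as UTF-8
n_bytes      <- length(charToRaw(n_p))   # bytes in the UTF-8 encoding

# Treat the same bytes as latin1: every byte becomes its own character.
misread   <- iconv(n_p, from = "latin1", to = "UTF-8")
n_misread <- length(utf8ToInt(misread))

print(c(codepoints = n_codepoints, bytes = n_bytes, misread = n_misread))
```

The counts of 3 and 9 line up with the match.length values reported by RStudio and by R.exe respectively.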
You need to ensure that R is running in a suitable locale:
When running Rterm, use Sys.getlocale() to find your current locale.
You can set your locale using:
Sys.setlocale(category = "LC_ALL", locale = "hi-IN")
# Try "hi-IN.UTF-8" too...
You can find locale names on MSDN.
If you have the correct value, put the Sys.setlocale() command in your ~/.Rprofile.
References
https://cran.r-project.org/bin/windows/base/rw-FAQ.html
http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/
I'm trying to bring up news for a package while in RStudio. While the default utils::news() works for generating the base R changelog in the built-in viewer, I can't get it to work for a specific package; it throws an error. The function works fine for specific packages in RGui.
Fresh R session in RStudio 1.2.1335:
news() # this works
news(package = "ggplot2") # this doesn't
Error that I get in viewer: Error in UseMethod("toHTML") : no applicable method for 'toHTML' applied to an object of class "NULL"
Fresh R session in RGui:
news()
news(package = "ggplot2") # both work perfectly
Session info:
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0 Rcpp_1.0.1 xml2_1.2.0 commonmark_1.7
How can I get the function to output without error in RStudio? If this is not possible, how can I run the function in RStudio but tell it to view the HTML outside of the viewer, e.g., in the browser like RGui?
This looks like an RStudio bug, so probably the best action is to report it to them. As a workaround, you can avoid using their built-in browser by changing the setting for options("browser").
For example, on a Mac outside of RStudio I see
options("browser")
# $browser
# [1] "/usr/bin/open"
and in RStudio running
options(browser = "/usr/bin/open")
disables the built-in browser. I don't know what it defaults to in RGui on Windows, but setting it to the same in RStudio as it is in RGui should get it to work.
Unfortunately, this disables it for everything, not just for news(), so you probably want something like this instead:
save <- options(browser = "/usr/bin/open")
news(package = "ggplot2")
options(save)
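The save-and-restore pattern above can be wrapped in a function so that the old setting comes back even if the call fails. This is a minimal sketch: with_browser is a hypothetical helper name, not part of base R or RStudio, and the browser path in the usage comment is just the macOS value shown earlier.

```r
# Hypothetical helper: temporarily set options(browser = ...) while evaluating expr.
with_browser <- function(browser, expr) {
  old <- options(browser = browser)   # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore even if expr throws an error
  force(expr)                         # expr is lazy, so it runs after the option is set
}

# Usage (not run here):
# with_browser("/usr/bin/open", news(package = "ggplot2"))
```

Because the expression is only evaluated inside the function, the option is already changed when news() runs and is restored as soon as the function exits.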
I have installed RStudio on a new computer, and it has developed encoding issues. When I type accented text in the console (no file writing or reading involved, just the plain console), I lose Czech accents, as in this example (notice the accented Ň and č):
> "Ňuf ňuf ňufičky"
[1] "Nuf nuf nuficky"
I know it is a settings issue, since I have other R installations that behave correctly, but I am unable to find the exact place in my settings to force UTF-8 behavior. Any help would be appreciated.
My session info is:
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2 yaml_2.1.14
For the benefit of posterity: I overcame my problem by setting the code page to 1250 (while keeping US English as my language).
if (.Platform$OS.type == 'windows') {
  Sys.setlocale(category = 'LC_ALL', 'English_United States.1250')
} else {
  Sys.setlocale(category = 'LC_ALL', 'en_US.UTF-8')
}
in the .Rprofile
Perhaps:
new.locale <- ifelse(.Platform$OS.type=="windows", "Czech_Czech Republic.1250", "en_US.UTF-8")
Sys.setlocale("LC_CTYPE", new.locale)
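Since Sys.setlocale() returns an empty string (with a warning) when the OS refuses a locale, a .Rprofile can try several candidate names and keep the first one that is accepted. This is a rough sketch: first_working_locale is a hypothetical helper, and the candidate names in the usage comment are only the ones mentioned in the answers above.

```r
# Hypothetical helper: try locale names in order and return the first one accepted.
first_working_locale <- function(candidates, category = "LC_CTYPE") {
  for (loc in candidates) {
    # A refused request yields "" plus a warning, so silence the warning
    # and check the return value instead.
    res <- suppressWarnings(Sys.setlocale(category, loc))
    if (nzchar(res)) return(res)
  }
  NA_character_   # none of the candidates was accepted
}

# Usage (not run here):
# first_working_locale(c("Czech_Czech Republic.1250", "en_US.UTF-8"))
```

An NA result means none of the names worked on this machine, so the session's locale is left unchanged.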
Also learn to specify your OS.
I am having trouble running Rcpp on my PC in RStudio. Whenever I sourceCpp() a cpp file, even the Hello World file that comes with Rcpp::Rcpp.package.skeleton(), I get the warning
In normalizePath(path.expand(path), winslash, mustWork) :
path[1]=".../anRpackage/src/../inst/include": The system cannot find the path specified
I searched Stack Overflow and it looks like some people get this warning if they don't have Depends: Rcpp in the DESCRIPTION of their package, but I am just running sourceCpp(), so the DESCRIPTION file shouldn't matter (I also changed my DESCRIPTION file).
It is just a warning so the class and functions I wrote do appear in R, but RStudio frequently crashes after I use the functions in R a few times, which may or may not be related.
My session info:
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rcpp_0.12.1 RevoUtilsMath_7.4.1 RevoUtils_7.4.1 RevoMods_7.4.1 RevoScaleR_7.4.1 lattice_0.20-30 rpart_4.1-9
loaded via a namespace (and not attached):
[1] codetools_0.2-10 foreach_1.4.2 grid_3.1.3 iterators_1.0.7 tools_3.1.3
I suppose it is possible that Revolution R is the culprit here, but I have no way of knowing. I would appreciate help, because I don't want to ignore this warning, and it's obviously not ideal for RStudio to crash repeatedly.
Kind Regards
This is still relevant today, so here are my findings.
Rcpp can generate interfaces to and from C++ and R.
These are generated with the help of attributes specified in source-files.
From these attributes, the call to Rcpp::compileAttributes() produces the headers. While doing so, it also creates the folder <package directory>/inst/include. If you have specified no attributes anywhere, then compileAttributes() does not create these directories.
To get rid of this warning, create the <package directory>/inst/include directory.
For more on attributes, see the Rcpp attributes vignette.
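Creating the missing directory by hand can be done from R itself. In this sketch the package root is a hypothetical path under tempdir(); substitute the path to your own package.

```r
# Hypothetical package root; replace with the path to your own package.
pkg_dir <- file.path(tempdir(), "anRpackage")
include_dir <- file.path(pkg_dir, "inst", "include")

# recursive = TRUE also creates inst/ (and the package directory) if missing;
# showWarnings = FALSE keeps this quiet when the directory already exists.
dir.create(include_dir, recursive = TRUE, showWarnings = FALSE)
```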