UTF-8 conversion issue on Windows using rmongodb

I am having trouble getting proper UTF-8 characters back from rmongodb.
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmongodb_1.8.0
loaded via a namespace (and not attached):
[1] plyr_1.8.3 rsconnect_0.4.1.4 tools_3.2.2 rstudioapi_0.3.1 Rcpp_0.12.1 jsonlite_0.9.17
Using the mongo shell, I get:
"organization" : [
{
"name" : "Birkhäuser Verlag GmbH",
}
],
In RStudio:
[1] "WERK Birkhäuser Verlag GmbH"
To get the data in R, I am using:
mongo.find.all(mongo = mdb, ns = colname, limit = 10)
So the Umlaut is not handled properly.
Any help much appreciated!
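The "Ã¤" in the RStudio output is what the two UTF-8 bytes of "ä" look like when they are interpreted as Windows-1252, which is the session's LC_CTYPE. A minimal sketch of one possible workaround follows, assuming the strings returned by mongo.find.all() contain UTF-8 bytes that simply carry no encoding mark. The helper name fix_utf8 and the sample list are hypothetical stand-ins, not part of rmongodb.

```r
# Hypothetical stand-in for one value returned by mongo.find.all():
# the UTF-8 bytes of "Birkhäuser Verlag GmbH" with no encoding mark.
raw_name <- "Birkh\xc3\xa4user Verlag GmbH"

# Recursively declare every character vector in a nested list as UTF-8.
# Encoding<- only sets the mark; it does not change the bytes.
fix_utf8 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "UTF-8"
    return(x)
  }
  if (is.list(x)) {
    return(lapply(x, fix_utf8))
  }
  x
}

# Assumed shape of a query result, mirroring the document in the question.
res <- list(organization = list(list(name = raw_name)))
fixed <- fix_utf8(res)
print(Encoding(fixed$organization[[1]]$name))
print(fixed$organization[[1]]$name)
```

If the bytes are not actually UTF-8, marking them this way will not help, and converting explicitly with iconv() would be the thing to try instead.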

Related

R 3.6.3 faster than R 4.2.2?

I am not sure whether this is the right place for this question, but I've upgraded R from 3.6.3 to 4.2.2, and in my experience 4.2.2 is much slower than the older version in some cases. Is this really an issue with R 4.x compared to R 3.6.x, or do I need to adjust something in my setup?
I found this link regarding a problem similar to mine with R 4.0.2: https://github.com/rocker-org/rocker/issues/412
Sample code (runs on average twice as fast on 3.6.3 as on 4.2.2).
I know this could be a package implementation issue rather than an R version issue, but I had problems with other scripts that do not use any of the packages below:
library(tictoc)
library(seasonal)
times <- c()
for (j in 1:100) {
  print(j)
  tic()
  for (i in 1:20) {
    try({ final(seas(ts(1:120 + runif(120), start = c(1990, 1), frequency = 12))) })
  }
  elapsed <- toc()
  # Append this iteration's elapsed seconds to the running vector
  times <- c(times, unname(elapsed$toc - elapsed$tic))
}
Other information:
→ Everything ran on the same machine
Session Info for 3.6.3
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] seasonal_1.9.0 tictoc_1.0
loaded via a namespace (and not attached):
[1] x13binary_1.1.39-2 compiler_3.6.3 tools_3.6.3
Session Info for 4.2.2
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8 LC_CTYPE=Portuguese_Brazil.utf8 LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] seasonal_1.9.0 tictoc_1.1
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 scales_1.2.1 x13binary_1.1.57-3 compiler_4.2.2 R6_2.5.1 cli_3.6.0 tools_4.2.2 rstudioapi_0.14
[9] lifecycle_1.0.3 munsell_0.5.0 rlang_1.0.6

RPostgres error with locale encoding language issue

I am trying to connect to a local PostgreSQL database from RStudio via the RPostgres package.
con <- dbConnect(RPostgres::Postgres(),
dbname="####",
host="127.0.0.1",
port=5432,
user="postgre",
password="####")
It returns unreadable characters:
Error: ��������: �û� "postgre" Password ��֤ʧ��
I learned it might be a local encoding language issue.
Sys.getlocale()
[1] "LC_COLLATE=Chinese (Simplified)_China.utf8;LC_CTYPE=Chinese (Simplified)_China.utf8;LC_MONETARY=Chinese (Simplified)_China.utf8;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.utf8"
So I try setting the locale as:
Sys.setlocale(category = "LC_ALL",locale = "English_United States.1252")
Then the error becomes:
Error: <d6><c2><c3><fc><b4><ed><ce><f3>: <d3><U+00FB><a7> "postgre" Password <c8><cf><U+05A4><U+02A7><b0><dc>
How to solve this issue?
The PostgreSQL database on my Windows PC is encoded in Chinese (Simplified)_China.936.
The RStudio session info is:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=zh-cn.UTF-8 LC_CTYPE=zh-cn.UTF-8 LC_MONETARY=zh-cn.UTF-8 LC_NUMERIC=C LC_TIME=zh-cn.UTF-8
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] DBI_1.1.3 RPostgres_1.4.4 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4 readr_2.1.2
[8] tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.2
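The byte escapes in the second error message look like GBK (code page 936) text, which would match the database encoding stated above. A minimal sketch of decoding such bytes by hand follows, assuming the message really is GBK. The byte vector is copied from the first eight escapes of the error, and the conversion relies on the platform's iconv supporting the "GBK" name.

```r
# First eight bytes of the garbled message, copied from the error output:
# <d6><c2><c3><fc><b4><ed><ce><f3>
garbled <- as.raw(c(0xd6, 0xc2, 0xc3, 0xfc, 0xb4, 0xed, 0xce, 0xf3))

# Reinterpret the bytes as GBK and convert them to UTF-8.
# iconv() returns NA if the bytes are not valid in the source encoding.
decoded <- iconv(rawToChar(garbled), from = "GBK", to = "UTF-8")
print(decoded)
```

Under that assumption the prefix decodes to the Chinese for "fatal error", and the readable parts of the message ("postgre", Password) suggest an ordinary password authentication failure for that user rather than an encoding fault in the connection itself.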

Rstudio multiline commenting shortcut does not do anything

As the title says, I normally use ctrl + shift + c to comment out multiple lines of code in RStudio. It no longer works (nothing happens when I use the shortcut).
I am using RStudio version:
2022.02.3 Build 492
"Prairie Trillium" Release (1db809b8, 2022-05-20) for Windows
Session info:
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_New Zealand.1252 LC_CTYPE=English_New Zealand.1252 LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C
[5] LC_TIME=English_New Zealand.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.1 tools_4.1.1
I have had a look in "Global options" but cannot see any setting relating to multiline commenting, and the "keyboard shortcuts help" section still lists ctrl + shift + c as the correct shortcut. Does anyone know what my issue could be?

Error while trying to use 'ez' library

I get an error when trying to load the ez library for rAnova. The issue is that the pbkrtest package cannot be found, even though install.packages("ez") ran successfully. library(ez) returns the following error:
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called 'pbkrtest'
In addition: Warning message:
package 'ez' was built under R version 3.2.5
Error: package or namespace load failed for 'ez'
How can I use this library properly? Thanks.
EDIT: sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_1.12.0
loaded via a namespace (and not attached):
Error in x[["Version"]] : subscript out of bounds
In addition: Warning messages:
1: In FUN(X[[i]], ...) :
DESCRIPTION file of package 'digest' is missing or broken
2: In FUN(X[[i]], ...) :
DESCRIPTION file of package 'nlme' is missing or broken
EDIT 2: sessionInfo() after restarting the machine and R
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] MASS_7.3-45 Matrix_1.2-7.1 tools_3.2.2 mgcv_1.8-15
[5] nnet_7.3-12 nlme_3.1-121 grid_3.2.2 lattice_0.20-34
But I still get the same error (above) when I type library(ez).
So it seems it was a question of the R version. As the warning notes, 'ez' was built under R version 3.2.5. I am now on 3.3.1, installed ez successfully, and can load the library and use its functions. The harder part is reinstalling R and all packages, which seems painful, but thanks to this great post I was able to store all my packages in a temp folder and reinstall them in the newer version.
In case it helps someone else, my sessionInfo() returns:
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ez_4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 magrittr_1.5 splines_3.3.1 MASS_7.3-45 munsell_0.4.3
[6] colorspace_1.2-6 lattice_0.20-33 minqa_1.2.4 stringr_1.1.0 car_2.1-3
[11] plyr_1.8.4 tools_3.3.1 nnet_7.3-12 parallel_3.3.1 pbkrtest_0.4-6
[16] grid_3.3.1 nlme_3.1-128 gtable_0.2.0 mgcv_1.8-12 quantreg_5.29
[21] MatrixModels_0.4-1 lme4_1.1-12 Matrix_1.2-6 nloptr_1.0.4 reshape2_1.4.1
[26] ggplot2_2.1.0 stringi_1.1.2 scales_0.4.0 SparseM_1.72

How to read extra-ASCII characters in R?

I am reading input text file line by line with the following function:
lines_reader<-function(filename){
conn<-file(filename,open="r")
linn<-readLines(conn,encoding="UCS-2LE")
close(conn)
return(linn)
}
If I try to display these lines in the R environment, accented letters are not rendered correctly and appear as "Ã" or "è" instead of "à" or "è".
How can I deal with this? Which encoding should I choose?
Here are my session and locale details:
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.2.0
> Sys.getlocale()
[1] "LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252"
How about changing the encoding that you are using:
lines_reader<-function(filename){
conn<-file(filename,open="r")
linn<-readLines(conn,encoding="UTF-8")
close(conn)
return(linn)
}
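Note that the encoding argument of readLines() only marks the resulting strings; it does not convert anything. An alternative sketch is to declare the file's encoding on the connection, so that R converts to the session's native encoding while reading. This assumes the file is actually UTF-8. The parameter name file_encoding and the temporary sample file are hypothetical and only there for illustration.

```r
# Variant of lines_reader(): the connection converts from `file_encoding`
# to the session's native encoding while reading.
# `file_encoding` is a hypothetical parameter; "UTF-8" is an assumption
# about the input file.
lines_reader <- function(filename, file_encoding = "UTF-8") {
  conn <- file(filename, open = "r", encoding = file_encoding)
  on.exit(close(conn), add = TRUE)
  readLines(conn)
}

# Usage: write a small UTF-8 file byte for byte, then read it back.
tmp <- tempfile(fileext = ".txt")
out <- file(tmp, open = "wb")
writeBin(charToRaw(enc2utf8("perch\u00e8 citt\u00e0\n")), out)
close(out)
print(lines_reader(tmp))
```

If the file really is a Windows "Unicode" file, as the UCS-2LE in the question suggests, that name would be passed as file_encoding instead. The conversion can only succeed for characters that the session's native encoding can represent.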

Resources