RJDBC parallel query with parallelMap in R

I'm trying to run my queries in parallel, and I get this error:
00001: Error in .jcheck() : No running JVM detected. Maybe .jinit() would help.
The queries work when I run them one by one.
I tried calling .jinit() and Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jdk1.8.0_102') in the slave (worker) processes, but it doesn't work.
I know it's not really reproducible, but I can't give you my login/password :)
My script:
library(RJDBC)
library(parallelMap)
# Driver and connection are created in the master R session
jdbcDriver <- JDBC(driverClass = "oracle.jdbc.OracleDriver", classPath = "ojdbc6.jar")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//mybase", "login", "pass")
query_list <- list("SELECT * FROM table1",
                   "SELECT * FROM table2",
                   "SELECT * FROM table3",
                   "SELECT * FROM table4",
                   "SELECT * FROM table5")
# Runs one query on the shared connection
import_base_fonction <- function(query) {
  return(dbGetQuery(jdbcConnection, query))
}
# Start 5 socket workers, load the packages there and export the objects
parallelStartSocket(5)
parallelLibrary("RJDBC", "rJava")
parallelExport("query_list", "import_base_fonction", "jdbcConnection")
mes_tables <- parallelMap(import_base_fonction, query_list)
parallelStop()
My session info:
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] parallelMap_1.3 PhViD_1.0.8 MCMCpack_1.4-0 MASS_7.3-47 coda_0.19-1 LBE_1.44.0 dplyr_0.7.1
[8] plyr_1.8.4 shiny_1.0.3 DT_0.2 shinydashboard_0.6.1 data.table_1.10.4 RJDBC_0.2-5 rJava_0.9-8
[15] DBI_0.7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 compiler_3.4.1 bindr_0.1 tools_3.4.1 digest_0.6.12 checkmate_1.8.3 tibble_1.3.3 lattice_0.20-35
[9] pkgconfig_2.0.1 rlang_0.1.1 Matrix_1.2-10 parallel_3.4.1 SparseM_1.77 bindrcpp_0.2 htmlwidgets_0.9 MatrixModels_0.4-1
[17] grid_3.4.1 glue_1.1.1 R6_2.2.2 magrittr_1.5 backports_1.1.0 BBmisc_1.11 htmltools_0.3.6 mcmc_0.9-5
[25] assertthat_0.2.0 mime_0.5 xtable_1.8-2 httpuv_1.3.5 quantreg_5.33
The database runs on an Oracle 11.xx server.
Any guidance would be appreciated.

I think you can change import_base_fonction to:
import_base_fonction <- function(query) {
  .jinit("ojdbc6.jar")  # start a JVM in the worker process
  return(dbGetQuery(jdbcConnection, query))
}
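One hedged caveat: even after .jinit() on a worker, the exported jdbcConnection is most likely still unusable there, because it wraps a reference into the master's JVM, and such references do not survive being serialized to another process. A sketch that avoids exporting it at all is to build the driver and the connection inside the worker function, one connection per query. It assumes RJDBC, DBI and parallelMap are installed on every worker; import_on_worker and run_parallel_import are hypothetical names, and jar_path, url, user and pass are placeholders for your own values.

```r
# Sketch: open the JDBC connection inside each worker instead of exporting it.
# Assumes RJDBC, DBI and parallelMap are installed; function names are hypothetical.

import_on_worker <- function(query, jar_path, url, user, pass) {
  # Runs in the worker process; RJDBC::JDBC() initialises the JVM there (via .jinit()).
  drv <- RJDBC::JDBC(driverClass = "oracle.jdbc.OracleDriver", classPath = jar_path)
  con <- DBI::dbConnect(drv, url, user, pass)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always release the connection
  DBI::dbGetQuery(con, query)
}

run_parallel_import <- function(query_list, jar_path, url, user, pass, cpus = 5) {
  parallelMap::parallelStartSocket(cpus)
  on.exit(parallelMap::parallelStop(), add = TRUE)  # stop the workers even on error
  # Only plain strings are sent to the workers; no Java objects cross processes.
  parallelMap::parallelMap(
    import_on_worker, query_list,
    more.args = list(jar_path = jar_path, url = url, user = user, pass = pass)
  )
}
```

Usage with the question's placeholders would be mes_tables <- run_parallel_import(query_list, normalizePath("ojdbc6.jar"), "jdbc:oracle:thin:@//mybase", "login", "pass"). The absolute jar path is a precaution in case a worker resolves relative paths differently from the master. Opening one connection per query is the simplest design; with many small queries, one connection per worker (set up once, for example with parallel::clusterEvalQ) would avoid paying the connection cost repeatedly.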

Related

Error when connecting sparklyr to local

I'm trying to run sparklyr from my local environment to replicate a production environment. However, I can't even get started. I successfully installed the latest version of Spark using spark_install(), but when trying to run spark_connect() I get this vague and unhelpful error.
> library(sparklyr)
> spark_installed_versions()
spark hadoop dir
1 2.3.1 2.7 C:\\Users\\...\\AppData\\Local/spark/spark-2.3.1-bin-hadoop2.7
> spark_connect(master = "local")
Error in if (is.na(a)) return(-1L) : argument is of length zero
Here is what my session info looks like.
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sparklyr_0.8.4.9003
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 dbplyr_1.2.1 compiler_3.5.0 pillar_1.2.3 later_0.7.3
[6] plyr_1.8.4 bindr_0.1.1 base64enc_0.1-3 tools_3.5.0 digest_0.6.15
[11] jsonlite_1.5 tibble_1.4.2 nlme_3.1-137 lattice_0.20-35 pkgconfig_2.0.1
[16] rlang_0.2.1 psych_1.8.4 shiny_1.1.0 DBI_1.0.0 rstudioapi_0.7
[21] yaml_2.1.19 parallel_3.5.0 bindrcpp_0.2.2 stringr_1.3.1 dplyr_0.7.5
[26] httr_1.3.1 rappdirs_0.3.1 rprojroot_1.3-2 grid_3.5.0 tidyselect_0.2.4
[31] glue_1.2.0 R6_2.2.2 foreign_0.8-70 reshape2_1.4.3 purrr_0.2.5
[36] tidyr_0.8.1 magrittr_1.5 backports_1.1.2 promises_1.0.1 htmltools_0.3.6
[41] assertthat_0.2.0 mnormt_1.5-5 mime_0.5 xtable_1.8-2 httpuv_1.4.3
[46] config_0.3 stringi_1.1.7 lazyeval_0.2.1 broom_0.4.4
Well, with a bit of guessing, I was able to solve my problem: I had to set the SPARK_HOME environment variable manually.
spark_installed_versions()[1, 3] %>% spark_home_set()
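A slightly more defensive variant of the same fix selects the dir column by name and, when several Spark versions are installed, picks the newest one instead of whatever happens to be in row 1. pick_spark_home() is a hypothetical helper; it only assumes the spark and dir columns visible in the spark_installed_versions() output above.

```r
# Hypothetical helper: choose a SPARK_HOME from a data frame shaped like
# sparklyr::spark_installed_versions() (columns spark, hadoop, dir).
pick_spark_home <- function(versions) {
  stopifnot(is.data.frame(versions), all(c("spark", "dir") %in% names(versions)))
  if (nrow(versions) == 0) stop("no Spark installation found; run spark_install() first")
  # numeric_version() orders "2.10.0" after "2.3.1", unlike plain string sorting.
  newest <- order(numeric_version(as.character(versions$spark)), decreasing = TRUE)[1]
  as.character(versions$dir[newest])
}
```

With sparklyr loaded, Sys.setenv(SPARK_HOME = pick_spark_home(spark_installed_versions())) sets the same environment variable that, as far as I know, spark_home_set() sets, after which spark_connect(master = "local") can be retried.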

R: kableExtra package- cannot save (LaTeX) table as image using kable_as_image()

Error message:
Error in open.connection(path, "wb") : cannot open the connection
Calls: ... kable_as_image -> write_file -> open -> open.connection
Execution halted
This happens while knitting an R Markdown file containing the following code:
library(knitr)
library(kableExtra)
dt <- mtcars[1:5, 1:6]
options(knitr.table.format = "latex")
kable_as_image(kable_input = kable(dt), filename = "abcd", file_format = "png",
               latex_header_includes = NULL, keep_pdf = FALSE)
Working Directory: "C:/Users/xyz/Documents"
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
[5] LC_TIME=English_India.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kableExtra_0.5.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 digest_0.6.12 rprojroot_1.2
[4] R6_2.2.2 backports_1.1.0 magrittr_1.5
[7] evaluate_0.10.1 httr_1.3.0 rlang_0.1.2
[10] stringi_1.1.5 magick_1.3 xml2_1.1.1
[13] rmarkdown_1.6.0.9004 tools_3.4.1 stringr_1.2.0
[16] readr_1.1.1 hms_0.3 yaml_2.1.14
[19] compiler_3.4.1 rvest_0.3.2 htmltools_0.3.6
[22] knitr_1.17 tibble_1.3.3
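Because the failure is in open.connection(path, "wb"), one cheap first check is whether R can create a file for binary writing in the working directory at all. This is a diagnostic sketch under the assumption that the problem is file-system access, not a confirmed fix; can_write_binary() is a hypothetical helper.

```r
# Hypothetical diagnostic helper: can R open a new file for binary writing
# ("wb", the same mode as the failing open.connection call) in `dir`?
can_write_binary <- function(dir = getwd()) {
  probe <- tempfile(pattern = "write_probe_", tmpdir = dir)
  ok <- tryCatch({
    con <- file(probe, open = "wb")
    close(con)
    TRUE
  }, warning = function(w) FALSE, error = function(e) FALSE)
  if (file.exists(probe)) unlink(probe)  # remove the probe file again
  ok
}
```

Run can_write_binary() in the same session that knits the document. If it returns FALSE for "C:/Users/xyz/Documents", the problem points to access to that folder rather than to kableExtra itself.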

POSIXlt error when quitting RStudio

When I try to close RStudio on my Mac, I get the following error:
Error in as.POSIXlt(x, tz = tz(x)) :
argument "x" is missing, with no default
I have not tried reinstalling RStudio, as I am hoping the issue can be fixed without that.
These are the packages loaded in the session:
library(readr)
library(dplyr)
library(lubridate)
library(ggplot2)
library(scales)
These are the objects in the global env:
> ls()
[1] "csv" "csv2" "csv3" "p"
>
I have tried base::q() and base::quit() and get the same error.
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.0.3 scales_0.4.1 ggplot2_2.2.1 bindrcpp_0.2 dplyr_0.7.1 lubridate_1.6.0 readr_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 compiler_3.4.0 colourpicker_0.3 plyr_1.8.4 bindr_0.1 shinyjs_0.9.1
[7] tools_3.4.0 digest_0.6.12 jsonlite_1.5 tibble_1.3.3 gtable_0.2.0 pkgconfig_2.0.1
[13] rlang_0.1.1 rstudioapi_0.6 curl_2.7 stringr_1.2.0 knitr_1.16 htmlwidgets_0.8
[19] hms_0.3 grid_3.4.0 glue_1.1.1 R6_2.2.2 magrittr_1.5 htmltools_0.3.6
[25] assertthat_0.2.0 mime_0.5 colorspace_1.3-2 xtable_1.8-2 httpuv_1.3.3 labeling_0.3
[31] V8_1.5 stringi_1.1.5 miniUI_0.1.1 lazyeval_0.2.0 munsell_0.4.3
This could be because lubridate's date() masks base::date(): the lubridate version expects an input, whereas base::date() takes none. Removing lubridate, or using base::date() instead, might solve the issue. See this thread for a discussion of the issue.
I hope this helps!
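The masking effect described in the answer can be reproduced with base R alone. Here a hypothetical date() defined in the global environment plays the role of lubridate's version: a bare date() call finds it first and fails with the same "missing argument" error, while base::date() still works.

```r
# Hypothetical stand-in for lubridate's date(), which requires an argument.
date <- function(x) as.POSIXlt(x)

# A bare date() call now resolves to the stand-in and fails as in the question.
msg <- tryCatch(date(), error = function(e) conditionMessage(e))
print(msg)

# Qualifying the call bypasses the mask: base::date() takes no arguments.
print(base::date())

# Remove the stand-in so that later code sees base::date() again.
rm(date)
```

In a real session, find("date") lists every environment on the search path that defines date, which shows whether lubridate's version sits ahead of base's.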

dplyr::left_join with PostgreSQL backend not working?

I'm running something like this (sorry, it is not 100% reproducible unless you have Postgres running on your machine, with mydb and the tables created, and the given user and password):
library(RPostgreSQL)
library(tidyverse)
library(dbplyr)
pg_conn <- RPostgreSQL::dbConnect(
drv = "PostgreSQL", dbname = "mydb",
user = "postgres", password = "postgres"
)
table1_pg <- dplyr::tbl(pg_conn, "table1")
table2_pg <- dplyr::tbl(pg_conn, "table2")
table_join <- table1_pg %>%
left_join(table2_pg, by = c("x" = "x"))
And I get the following error:
Error in nlevels(object) : argument "object" is missing, with no default
And I have no clue what is going on (I am 100% sure the tables exist and each of them has the x column; I can query them, both using tbl() and by sending SQL directly with RPostgreSQL::dbGetQuery).
I've googled the problem and searched GitHub and SO, but I found no solution, nor anybody reporting the problem. The closest thing I found is this issue about left_join duplicating join variables. I tried @hadley's dx example there using SQLite, and it works fine on my machine, so perhaps this is a Postgres-specific issue?
Here's my sessionInfo()
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dbplyr_1.0.0 dplyr_0.7.1 purrr_0.2.2.2 readr_1.1.1 tidyr_0.6.3 tibble_1.3.3
[7] ggplot2_2.2.1 tidyverse_1.1.1 RPostgreSQL_0.4-1 DBI_0.6-1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 cellranger_1.1.0 compiler_3.4.0 plyr_1.8.4 bindr_0.1 forcats_0.2.0 tools_3.4.0
[8] lubridate_1.6.0 jsonlite_1.5 nlme_3.1-131 gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.1
[15] psych_1.7.5 parallel_3.4.0 haven_1.0.0 bindrcpp_0.2 xml2_1.1.1 httr_1.2.1 stringr_1.2.0
[22] hms_0.3 grid_3.4.0 glue_1.1.1 R6_2.2.1 readxl_1.0.0 foreign_0.8-68 modelr_0.1.0
[29] reshape2_1.4.2 magrittr_1.5 scales_0.4.1 rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 colorspace_1.3-2
[36] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 broom_0.4.2

Importing data from GA split by dimensions containing spaces in their names

I'm using the RGA package to import data from GA into R. The package is tremendously helpful, but when I try to import data split by a dimension whose name contains a space, the following error message is returned:
Error: Client error: (400) Bad Request Bad request: Unknown dimension(s): ga:Week_of_Year
The code:
library(RGA)
authorize()
id <- "95872896"
Sessions.by.source <- get_ga(id, metrics = "ga:New Users",
dimensions = c("ga:Date","ga:Source","ga:Week of Year"),
sort = "ga:Date")
sessionInfo():
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] RGA_0.4.2 zoo_1.7-13 tidyr_0.6.0 stringr_1.1.0 RODBC_1.3-14
[6] devtools_1.12.0 curl_1.2 RGA_0.4.2 httr_1.2.1 jsonlite_1.1
[11] RCurl_1.95-4.8 bitops_1.0-6 taskscheduleR_1.0 gridExtra_2.2.1 dplyr_0.5.0
[16] plyr_1.8.4 ggplot2_2.1.0 scales_0.4.0 data.table_1.9.6 r2excel_1.0.0
[21] xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-8
loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 tools_3.3.1 digest_0.6.10 lubridate_1.5.6 memoise_1.0.0 tibble_1.1
[7] gtable_0.2.0 lattice_0.20-33 DBI_0.4-1 withr_1.0.2 R6_2.1.2 magrittr_1.5
[13] assertthat_0.1 colorspace_1.2-6 httpuv_1.3.3 labeling_0.3 stringi_1.1.1 openssl_0.9.4
[19] lazyeval_0.2.0 munsell_0.4.3 chron_2.3-47
Regards,
The API dimension names are different from the labels shown in the GA interface itself.
For the API dimension definitions, refer to:
https://developers.google.com/analytics/devguides/reporting/core/dimsmets
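As a sketch of what that means for the query above: API names are camelCase identifiers without spaces, so "New Users" becomes ga:newUsers, and the date and source dimensions become ga:date and ga:source. For "Week of Year" I am assuming the API dimension is ga:week; that mapping should be checked against the reference linked above. fetch_new_users() is a hypothetical wrapper.

```r
# API names (not GA interface labels) for the fields used in the question.
# Assumption: the "Week of Year" interface label corresponds to ga:week in the API.
ga_metrics    <- "ga:newUsers"
ga_dimensions <- paste("ga:date", "ga:source", "ga:week", sep = ",")

# Hypothetical wrapper; requires the RGA package and a prior RGA::authorize().
fetch_new_users <- function(id) {
  RGA::get_ga(id, metrics = ga_metrics, dimensions = ga_dimensions,
              sort = "ga:date")
}

# Sessions.by.source <- fetch_new_users("95872896")
```

The dimensions are passed as one comma-separated string, which is the format the Core Reporting API itself uses, so it does not depend on how the wrapper handles character vectors.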
