rquery: Connect to specific schema in Postgres DB - r

The rquery package has been out for some time now, but the documentation is still very sparse. There isn't even a tag for it on SO yet; this question will create it.
Maybe there is someone who can help me nevertheless.
I want to connect to a schema in my Postgres DB via rquery to read the data into R with all the speed it promises.
Using this code it works with all the tables in the public-schema.
library(RPostgres)
library(rquery)
con <- dbConnect(RPostgres::Postgres(),
host = #####,
dbname = #####,
user = #####,
password = ######)
df <- db_td(con, "tablename") %.>%
execute(con, .)
Now, to access a table in a specific schema, db_td() has the argument qualifiers =, which is an
optional named ordered vector of strings carrying
additional db hierarchy terms, such as schema
So I did:
db_td(con, "tablename", qualifiers = c(schema = "schema"))
But:
Error in result_create(conn@ptr, statement) : Failed to prepare
query: ERROR: relation "tablename" does not exist LINE 1: SELECT
* FROM "tablename" LIMIT 1
So the qualifiers = argument seems to be completely ignored.
My question is thus pretty basic:
How can I connect to a schema in a PostgresDB via rquery?

All my attempts to solve this "within" rquery failed, but you can work around it by doing something like:
dbExecute(con, "SET search_path = foo_schema, public;")
before you run db_td.
I think it's caused by rq_colnames doing:
paste0("SELECT * FROM ", quote_identifier(db, table_name),
" LIMIT 1")
and hence not doing anything with its qualifiers. At least, this matches the error I get back.
If this workaround isn't enough, consider reporting a bug/issue with rquery.
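To illustrate the suspected cause, here is a minimal sketch of the difference between the unqualified query from the error message and a schema-qualified one. This is not rquery's code: it uses DBI::ANSI() as a stand-in connection purely for quoting (an assumption for illustration), and foo_schema and tablename are the placeholder names used above.

```r
library(DBI)

# Stand-in connection used only for identifier quoting; no database is needed.
quoting_con <- ANSI()

# Unqualified identifier: this is the shape of the query in the error message.
unqualified <- as.character(dbQuoteIdentifier(quoting_con, "tablename"))
unqualified_sql <- paste0("SELECT * FROM ", unqualified, " LIMIT 1")

# Schema-qualified identifier: what a query honouring the schema would need.
qualified <- as.character(
  dbQuoteIdentifier(quoting_con, Id(schema = "foo_schema", table = "tablename"))
)
qualified_sql <- paste0("SELECT * FROM ", qualified, " LIMIT 1")

print(unqualified_sql)
print(qualified_sql)
```

With the search_path workaround, the unqualified form resolves because Postgres looks up "tablename" in foo_schema before falling back to public.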

I have created an issue on GitHub. So far, the released version of rquery indeed doesn't have schema support. The development version of rquery (1.3.4), however, has basic schema support as of today.
To be installed via:
library(devtools)
install_github("WinVector/rquery", host = "https://api.github.com")
Here's a short set of instructions. It seems to have been intended to work just as I was trying in my question.
Be careful though, rquery hasn't been fully tested in schema-mode and some things might not work.
EDIT: rquery now has full schema support.

Related

Creating a parameter in neo4j through R driver

I am trying to generate a graph using the neo4r R driver. I have no problems performing standard queries such as
"MATCH (n:Node {nodeName: 'A Name'}) RETURN COUNT(n)" %>% call_neo4j(con)
However when I try to create a parameter with the following query
":params {Testnode: {testNodeName: 'Node Name'}}" %>% call_neo4j(con)
I get the following syntax error
$error_code
[1] "Neo.ClientError.Statement.SyntaxError"
$error_message
[1] "Invalid input ':': expected <init> (line 1, column 1 (offset: 0))\n\":params {Testnode: {testNodeName: 'Node Name'}}\"\n ^"
The parameter query works fine when I run it directly in the neo4j browser so I do not understand how there is a syntax error?
Any ideas on how to fix this would be greatly appreciated!
:params only works in the Neo4j Browser; it's not really Cypher.
Worse, the R Neo4j driver doesn't seem to support passing parameters. There's an open GitHub issue that points to a fork containing the relevant changes, but that fork also has other changes that make it deviate from the main driver.
I'd try the fork to see if it gets you anywhere. If it does, either open the relevant PR against the project or maintain a local fork that tracks the main driver and contains only that parameter change.

Avoiding warning message “There is a result object still in use” when using dbSendQuery to create table on database

Background:
I use dbplyr and dplyr to extract data from a database, then I use the command dbSendQuery() to build my table.
Issue:
After the table is built, if I run another command I get the following warning:
Warning messages:
1. In new_result(connection@ptr, statement): Cancelling previous query
2. In connection_release(conn@ptr) :
 There is a result object still in use.
The connection will be automatically released when it is closed.
Question:
Because I don't have a result to fetch (I am sending a command to build a table), I'm not sure how to avoid this warning. At the moment I disconnect after building a table and the warning goes away. Is there anything I can do to avoid this warning?
Currently everything works, I just have this warning. I'd just like to avoid it as I assume I should be clearing something after I've built my table.
Code sample
# establish connection
con = DBI::dbConnect(<connection stuff here>)
# connect to table and database
transactions = tbl(con,in_schema("DATABASE_NAME","TABLE_NAME"))
# build query string
query_string = "SELECT * FROM some_table"
# drop current version of table
DBI::dbSendQuery(con,paste('DROP TABLE MY_DB.MY_TABLE'))
# build new version of table
DBI::dbSendQuery(con,paste('CREATE TABLE MY_DB.MY_TABLE AS (',query_string,') WITH DATA'))
Even though you're not retrieving stuff with a SELECT clause, DBI still allocates a result set after every call to DBI::dbSendQuery().
Try calling DBI::dbClearResult() on the result of each DBI::dbSendQuery() call before sending the next one.
DBI::dbClearResult() does:
Clear A Result Set
Frees all resources (local and remote) associated with a
result set. In some cases (e.g., very large result sets) this
can be a critical step to avoid exhausting resources
(memory, file descriptors, etc.)
The example from the man page shows how the function should be called:
con <- dbConnect(RSQLite::SQLite(), ":memory:")
rs <- dbSendQuery(con, "SELECT 1")
print(dbFetch(rs))
dbClearResult(rs)
dbDisconnect(con)
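For statements that return no rows, such as building a table, the same idea can be sketched as follows. This assumes an in-memory RSQLite database like the man page example rather than the asker's database, and my_table is a hypothetical table name. DBI::dbSendStatement() and DBI::dbExecute() are standard DBI functions; dbExecute() sends the statement and clears its result in one call.

```r
library(DBI)

# In-memory SQLite database, used here only so the sketch is self-contained.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Option 1: send the statement, then explicitly release its result.
rs <- dbSendStatement(con, "CREATE TABLE my_table (id INTEGER, label TEXT)")
dbClearResult(rs)

# Option 2: dbExecute() sends the statement, clears the result for you,
# and returns the number of rows affected.
rows_inserted <- dbExecute(con, "INSERT INTO my_table VALUES (1, 'a'), (2, 'b')")

# Later commands on the same connection run without a pending result.
table_names <- dbListTables(con)
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM my_table")$n

dbDisconnect(con)
```

Either option leaves no open result on the connection, so there is no need to disconnect just to silence the warning.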

Grant SQL permissions in PostgreSQL using R

I'm accessing a PostgreSQL database through the R library RPostgreSQL. The following line successfully reads my table into object DF:
DF <- dbReadTable(conn = con, name = c("my_schema","my_table"))
However, attempting to write back into the database with the following line throws ERROR: permission denied for schema my_schema:
dbWriteTable(conn = con, name = c("my_schema", "my_table"), value = DF)
I've discovered from the question Writing to specific schemas with RPostgreSQL that the solution is to SET search_path = my_schema, public;, but I have no idea how to run this from the R Console. I've tried lines such as dbSendQuery(conn = con, statement = "SET search_path = my_schema, public;"), and I recognize that setting permissions is not querying at all, but there's not a dbSetPermissions function in RPostgreSQL.
I'm clearly missing something fundamental since the answer to the aforementioned question satisfied the user who asked it, so I appreciate your patience.

RImpala: Query Failed When Larger Data

check1<-rimpala.query("select * from sum2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.sql.SQLException: Method not supported
dim(sum2) is 49501 rows and 18 columns.
check1<-rimpala.query("select *from sum3")
dim(sum3) is 102 rows and 6 columns.
It worked with smaller sample size.
Sorry that I can't provide a reproducible example for this. Has anyone encountered the same problem with larger data sizes? Any idea how to solve this? Thanks.
As noted elsewhere on StackOverflow, RImpala does not implement executeUpdate and so cannot run any query that modifies state. I suspect you hit your error not by running a larger SELECT query but rather because you tried to insert, update, or delete some data.
If you'd like to use Impala from R, I'd recommend using dplyrimpaladb.
The RImpala build (v0.1.6) has been updated with support for executing DDL queries using executeUpdate.
The latest build contains the following fixes / additions:
Support for DDL query execution.
fetchSize parameter in query function to state the number of records that can be retrieved in one round trip read from Impala.
Fix for query failing when NULL values are being returned.
Compatibility with CDH 5.x.x
You can run DDL queries using the query function as illustrated below:
rimpala.query(Q="drop table sample_table",isDDL="true")
You can also specify the fetchSize in the query function to aid reading large data efficiently.
rimpala.query(Q="select * from sample_table",fetchSize="10000")
The latest build is available on CRAN: http://cran.r-project.org/web/packages/RImpala/index.html
Source Code : https://github.com/Mu-Sigma/RImpala
I have the same problem with the RImpala package and recommend using the RJDBC package instead:
library(RJDBC)
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
classPath = list.files("path_to_jars",pattern="jar$",full.names=TRUE),
identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:21050/;auth=noSasl")
check1 <- dbGetQuery(conn, "select *from sum3")
I used these jar files and everything works as expected:
https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip
For more information and a speed comparison look at this blog post:
http://datascience.la/r-and-impala-its-better-to-kiss-than-using-java/

sqlFetch Table not found error

After I use
cn<-odbcConnect(...)
to connect to MS SQL Server, I can successfully get data using:
tmp <- sqlQuery(cn, "select * from MyTable")
But if I use
tmp <- sqlFetch(cn,"MyTable")
R complains with "Error in odbcTableExists(channel, sqtable) : table not found on channel". Did I miss anything here?
Assuming you are on Windows: when you define your "dsn" in Control panel > Administrative tools > System and Security > Data Sources (ODBC), you have to select a database as well. If you do that, your code should work as expected.
So the problem is not in your R code but in your "dsn" definition, which, as far as I can tell, does not contain the required reference to a database.

Resources