syntax for database.table in dbplyr? - r

I have a connection to our database:
con <- dbConnect(odbc::odbc(), "myHive")
I know this is successful because when I run it, in the top right of RStudio I can see all of our databases and tables.
My question is, how can I select a specific database and table combination? The documentation shows a user selecting a single table, "flights", but I need to do the equivalent of somedatabase.sometable.
Tried:
mytable <- tbl(con, "somedb.sometable")
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42S02: [Hortonworks][SQLEngine] (31740) Table or view not found: HIVE..dp_enterprise.uds_order
Then tried:
mytable <- tbl(con, "somedb::sometable")
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42S02: [Hortonworks][SQLEngine] (31740) Table or view not found: HIVE..somedb::sometable
I tried removing the quotes "" too.
Within the connections pane of RStudio I can see somedb.sometable. It's there! How can I save it to variable mytable?

You select the database when creating the connection and the table when creating the tbl (with the from argument).
There is no standard interface to dbConnect, so the exact way to pass the database name depends on the DBDriver you use. Indeed DBI::dbConnect is simply a generic dispatching to the driver-specific dbConnect.
In your case, the driver is odbc so you can check out the documentation for odbc::dbConnect and you'll see the relevant argument is database.
This will work:
con <- dbConnect(odbc::odbc(), "myHive", database = "somedb")
df <- tbl(con, from = "sometable")
With most other drivers (e.g. RMariaDB, RMySQL, RPostgres, RSQLite), the argument is called dbname, so you'd do this:
con <- dbConnect(RMariaDB::MariaDB(), dbname = "somedb")
df <- tbl(con, from = "sometable")

I think I found it: use in_schema.
mytable <- tbl(con, in_schema("somedb", "sometable"))
This returns a list not a tbl though so I'm not sure.
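For what it's worth, here is a minimal sketch of what in_schema gives you. It assumes an in-memory SQLite database as a stand-in for Hive (the ATTACH statement and the table contents are made up for illustration). The object that comes back is a lazy table: internally it is a list, which may be why it looks like one, and it only becomes a local data frame once you call collect().

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the Hive connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Attach a second in-memory database under the name "somedb"
dbExecute(con, "ATTACH DATABASE ':memory:' AS somedb")
dbExecute(con, "CREATE TABLE somedb.sometable (id INTEGER, label TEXT)")
dbExecute(con, "INSERT INTO somedb.sometable VALUES (1, 'a'), (2, 'b')")

# Refer to the table as somedb.sometable
mytable <- tbl(con, in_schema("somedb", "sometable"))
class(mytable)  # includes "tbl_lazy": a query description, not the data yet

# Pull the rows into local memory as a regular tibble
local_copy <- collect(mytable)

dbDisconnect(con)
```

You can keep piping dplyr verbs onto mytable before collect(); they are translated to SQL and run on the database side.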

Related

How to save to pre-existing Snowflake table from R using pool

I am using pool to handle connections to my Snowflake warehouse. I have created a connection to my database and can read data in a pre-existing table with no issues e.g:
my_pool <- dbPool(odbc::odbc(),
                  Driver = "Snowflake",
                  Server = Sys.getenv('WH_URL'),
                  UID = Sys.getenv('WH_USER'),
                  PWD = Sys.getenv('WH_PW'),
                  Warehouse = Sys.getenv('WH_WH'),
                  Database = "MY_DB")
my_data <- tbl(my_pool, in_schema(sql("schema_name"), sql("table_name"))) %>%
  collect()
I would like to save back to a table (table_name) and I believe the best way to do this is with pool::dbWriteTable:
# Create some data to save to db
data <- data.frame("user_email" = "tim@apple.com",
                   "query_run" = "arrivals_departures",
                   "data_downloaded" = FALSE,
                   "created_at" = as.character(Sys.time()))
# Define where to save the data
table_id <- Id(database="MY_DB", schema="MY_SCHEMA", table="TABLE_NAME")
# Write to database
pool::dbWriteTable(my_pool, table_id, data, append=TRUE)
However this returns the error:
Error in new_result(connection@ptr, statement, immediate) :
nanodbc/nanodbc.cpp:1594: 00000: SQL compilation error:
Object 'MY_DB.MY_SCHEMA.TABLE_NAME' already exists.
I have read/write/update permissions for this database for the user specified in my_pool.
I have explored the accepted answers here and here to create the above attempt and can't figure out what I'm doing wrong. It's probably something simple that I've forgotten to do - any thoughts?
EDIT: Wondering if my issue is anything to do with: https://github.com/r-dbi/odbc/issues/480

Save dplyr query to different schema in dbplyr

I have a JDBC connection and would like to query data from one schema and save to another
library(tidyverse)
library(dbplyr)
library(rJava)
library(RJDBC)
# access the temp table in the native schema
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
# I would like to save temp_ed to a new schema "schema_new"
I would like to use something like dbplyr::compute() but define the output schema specifically. It seems dbplyr::copy_to could be used, but would require bringing the data through the local machine.
I want something like RJDBC::dbSendUpdate(), but ideally one that integrates nicely with the data manipulation pipeline above.
I do this using dbExecute from the DBI package.
The key idea is to extract the query that defines the current remote table and make this a sub-query in a larger SQL query that writes the table. This requires that (1) the schema exists, (2) you have permission to write new tables, and (3) you know the correct SQL syntax.
Doing this directly might look like:
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
save_table_query <- paste(
  "SELECT * INTO my_database.schema_new.my_table FROM (\n",
  dbplyr::sql_render(temp_ed),
  "\n) AS sub_query"
)
dbExecute(conn, as.character(save_table_query))
INTO is the clause for writing a new table in SQL Server (the flavour of SQL I use). You will need to find the equivalent clause for your database.
In practice I use a custom function that looks something like this:
write_to_database <- function(input_tbl, db, schema, tbl_name){
  # connection
  tbl_connection <- input_tbl$src$con
  # SQL query
  sql_query <- glue::glue(
    "SELECT *\n",
    "INTO {db}.{schema}.{tbl_name}\n",
    "FROM (\n",
    dbplyr::sql_render(input_tbl),
    "\n) AS sub_query"
  )
  result <- dbExecute(tbl_connection, as.character(sql_query))
}
Applying this in your context:
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
write_to_database(temp_ed, "my_database", "schema_new", "my_table")
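If you want to see the query that gets built before running anything, a sketch like the following may help. It assumes dbplyr's lazy_frame() with simulate_mssql() as a stand-in for the real tbl(conn, "temp"), so no live connection is needed; the columns id and value are hypothetical, and the target name my_database.schema_new.my_table is the same placeholder as above.

```r
library(dplyr)
library(dbplyr)

# Assumption: a lazy table on a simulated SQL Server backend replaces
# tbl(conn, "temp"); the columns are invented for illustration.
temp <- lazy_frame(id = 1L, value = 2.5, con = simulate_mssql())
temp_ed <- temp %>% mutate(new = 1)

# The SQL that defines the current remote table
inner_sql <- as.character(sql_render(temp_ed))

# Wrap it as a sub-query of a SELECT ... INTO statement (SQL Server syntax)
save_table_query <- paste0(
  "SELECT * INTO my_database.schema_new.my_table FROM (\n",
  inner_sql,
  "\n) AS sub_query"
)

# Inspect the statement; with a real connection this string is what
# would be passed to dbExecute()
cat(save_table_query)
```

Printing the statement first makes it easier to adapt the INTO clause to another database's syntax (for example CREATE TABLE ... AS) before anything is written.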

Reading Data from a SQL server in RStudio (dplyr)

I am sure this question is very basic, but this is the first time I am using R connected to a server, so a few things still confuse me.
I used ODBC Data Sources on Windows to create a DSN, and used
con <- dbConnect(odbc::odbc(), "TEST_SERVER")
This worked, and now under the Connections tab I can see the server, and if I double-click I can see the databases and tables that exist on the server. How would I go about reading something inside one of those databases?
For Example, if the database name is db1, and the table name is t1, what is the code needed to read that table into local memory? I would prefer using dbplyr as I am familiar with the syntax. I am just unsure how to refer to a particular database and table after making the connection to the server.
I haven't used dbplyr before, but you can query the database using dbGetQuery.
test <- dbGetQuery(
  con,
  "SELECT *
   FROM db1.t1"
)
You can also pass the database into the connection string.
con <- dbConnect(
  drv = odbc::odbc(),
  dsn = "TEST_SERVER",
  database = "db1"
)
And then your query would just be "SELECT * FROM t1".
EDIT: To query the table using dbplyr:
tbl1 <- tbl(con, "t1")
qry <- tbl1 %>% head() %>% collect()
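To show that the two routes give the same rows, here is a small self-contained sketch. It assumes an in-memory SQLite database in place of the SQL Server DSN, and the contents of t1 are made up for illustration.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the TEST_SERVER connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "t1", data.frame(id = 1:10, value = letters[1:10]))

# Route 1: plain SQL through DBI
via_sql <- dbGetQuery(con, "SELECT * FROM t1 LIMIT 6")

# Route 2: dbplyr; head() is translated to a row limit and
# collect() pulls the result into local memory
via_dbplyr <- tbl(con, "t1") %>% head() %>% collect()

dbDisconnect(con)
```

Until collect() is called, the dbplyr version is only a query description, so filters and selects added before it run on the server rather than in R.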
I like to use RODBC:
con <- RODBC::odbcConnect(dsn = 'your_dsn',
                          uid = 'userid',
                          pwd = 'password')
table_output <- RODBC::sqlQuery(con, 'SELECT * FROM Table')

Appending records to a table in SQL Server from R

I am able to establish a connection to a Microsoft SQL Server and am also able to read tables.
pool <- pool::dbPool(drv = odbc::odbc(),
                     dsn = "MYDSN",
                     uid = "MYUID",
                     pwd = "XXXXX")
con <- poolCheckout(pool)
WVDListFull <- tbl(con, in_schema('Midas',"WVDListFull")) %>% head() %>% collect()
However, I am unable to append new records to the table. Assuming that I have new records in a data frame called x, I tried the following code:
dbWriteTable(pool,'[Midas].[WVDListFull]', x, append=TRUE)
This gave me an error:
nanodbc/nanodbc.cpp:1587: 42000: [FreeTDS][SQL Server]CREATE TABLE permission denied in database 'ScorpioEDW'.
I do have read and write permissions on the said database. I also tried this:
dbWriteTable(con,DBI::SQL("Midas.WVDListFull"), x, append=TRUE)
Which resulted in another error:
Error: Can't unquote Midas.WVDListFull
Here Midas is the schema containing the table WVDListFull. Can someone tell me what's going on here?

Writing to specific schemas with RPostgreSQL

I'm using RPostgreSQL to read and write data. Reading from any schema works perfectly, but I'm not able to write to non-public schemas. For example, the following code places a table in the public schema, with the name myschema.tablex
# write dataframe to postgres
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host="localhost", user="postgres", password="zzzz", dbname="mydatabase", port="5436")
if (dbExistsTable(con, "myschema.tablex")) {
  dbRemoveTable(con, "myschema.tablex")
}
dbWriteTable(con, "myschema.tablex", dataframe, row.names = FALSE)
What I want to do is place the table tablex in the schema myschema. I've also tried naming the schema in the connection (dbname="mydatabase.myschema") and using the argument schemaname, which I saw referred to in an earlier bug.
None of these approaches work, so I'm wondering if there is another method that I can use.
Use this:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "db", host = "host", port = 5432,
                 user = "user", password = "pwd")
dbWriteTable(con, c("yourschema", "yourtable"), value = yourRdataframe)
dbDisconnect(con)
More details: https://stat.ethz.ch/pipermail/r-sig-db/2011q1/001043.html
The default schema where objects are created is defined by the search_path. One way would be to set it accordingly. For instance:
SET search_path = myschema, public;
I quote the manual:
When objects are created without specifying a particular target
schema, they will be placed in the first schema listed in the search
path. An error is reported if the search path is empty.
You can also make this the default for a role, so it is set automatically for every connection made by this role. More:
How does the search_path influence identifier resolution and the "current schema"
In case a reader is using the newer package RPostgres to do this, the code to specify schemas is:
dbCreateTable(conn = con, name = Id(schema = "yourschema", table = "yourtable"), fields = yourRdataframe)
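A quick way to see why a single string like "myschema.tablex" ends up as one oddly named table, while Id() targets the schema, is to look at how DBI quotes each form. This sketch uses DBI's generic ANSI() dummy connection, so it runs without a database; that a real RPostgres connection quotes in the same style is an assumption here, not something checked against a server.

```r
library(DBI)

# ANSI() is DBI's dummy connection, used here only to illustrate quoting
dummy <- ANSI()

# A single string is treated as ONE identifier, dot included
single <- dbQuoteIdentifier(dummy, "yourschema.yourtable")

# Id() keeps schema and table separate, so each part is quoted on its own
qualified <- dbQuoteIdentifier(
  dummy,
  Id(schema = "yourschema", table = "yourtable")
)

print(single)
print(qualified)
```

The same Id() object can be passed as the name argument to other DBI generics such as dbWriteTable() or dbExistsTable() on backends that support schema-qualified names.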
