How do I access nested SQL tables in R?

From RStudio's ODBC database documentation I can see a simple example of how to read a SQL table into an R data frame:
data <- dbReadTable(con, "flights")
Let me paste a graphic of the BGBUref table(?) I'm trying to read into an R data frame. This is from my Connections pane in RStudio.
If I use the same syntax as above, where con is the output of my dbConnect(...), I get the following:
df <- dbReadTable(con, "BGBURef")
#> Error: <SQL> 'SELECT * FROM "BGBURef"' nanodbc/nanodbc.cpp:1587: 42S02:
#> [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name
#> 'BGBURef'.
Is my understanding of what a "table" is incorrect? Or do I need to do something like this to get to the nested BGBUref table:
df <- dbReadTable(con, "QnRStore\dbo\BGBURef")
#> Error: '\d' is an unrecognized escape in character string starting ""QnRStore\d"
The BGBUref data will come up in RStudio if I click on the little spreadsheet icon. I just can't figure out how to get it into a defined data frame, in my case df.
Here's the output when I run these commands:
df <- dbReadTable(con, "QnRStore")
#> Error: <SQL> 'SELECT * FROM "QnRStore"'
#> nanodbc/nanodbc.cpp:1587: 42S02: [Microsoft][ODBC Driver 17 for SQL
#> Server][SQL Server]Invalid object name 'QnRStore'.
and:
dbListTables(con)
#> [1] "spt_fallback_db"
#> [2] "spt_fallback_dev"
#> [3] "spt_fallback_usg"
#> [4] "spt_monitor"
#> [5] "trace_xe_action_map"
#> [6] "trace_xe_event_map"
#> [7] "spt_values"
#> [8] "CHECK_CONSTRAINTS"
#> [9] "COLUMN_DOMAIN_USAGE"
#> [10] "COLUMN_PRIVILEGES"
#> ...
#> [650] "xml_schema_types"
#> [651] "xml_schema_wildcard_namespaces"
#> [652] "xml_schema_wildcards"

General Background
Before anything, consider reading up on relational database architecture, where tables are encapsulated in schemas, which themselves are encapsulated in databases, which in turn are encapsulated in servers or clusters. Notice the icons in your image correspond to the object types:
cluster/server < catalog/database < schema/namespace < table
Hence, there are no nested tables in your situation, just a typical architecture:
myserver < QnRStore < dbo < BGBURef
To access this architecture from the server level in an SQL query, you would use period-qualified names:
SELECT * FROM database.schema.table
SELECT * FROM QnRStore.dbo.BGBURef
The default schema for SQL Server is dbo (by comparison, for Postgres it is public). Usually, DB-APIs like R's odbc connect to a database, which allows access to any underlying schemas and their corresponding tables, assuming the connected user has permission on those schemas. Note that this rule is not generalizable: for example, Oracle's schema aligns to owner, and MySQL's database is synonymous with schema.
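For instance, a minimal sketch of such a connection with R's odbc package; the driver name, server, and credentials below are placeholders, not taken from the question:
library(DBI)
library(odbc)

# the connection is scoped to one catalog/database; the schemas it contains
# (here dbo) and their tables are then reachable without further qualification
con <- dbConnect(odbc::odbc(),
                 Driver   = "ODBC Driver 17 for SQL Server",
                 Server   = "myserver",    # cluster/server level
                 Database = "QnRStore",    # catalog/database level
                 UID      = "myuser",
                 PWD      = "mypassword")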
See further reading:
What is the difference between a schema and a table and a database?
Differences between Database and Schema using different databases?
Difference between database and schema
What's the difference between a catalog and a schema in a relational database?
A database schema vs a database tablespace?
Specific Case
Therefore, to read an SQL Server table in the default schema, simply reference the table by name, BGBURef; this assumes the table resides in the dbo schema of the database you connected to.
df <- dbReadTable(con, "BGBURef")
If you use a non-default schema, you will need to specify it explicitly, which you can now do with DBI::Id; it works the same way for dbReadTable and dbWriteTable:
s <- Id(schema = "myschema", table = "mytable")
df <- dbReadTable(con, s)
dbWriteTable(con, s, mydataframe)
Alternatively, you can run the equivalent period-qualified SQL query:
df <- dbGetQuery(con, "SELECT * FROM [myschema].[mytable]")
And you can use SQL() for writing to persistent tables:
dbWriteTable(con, SQL("myschema.mytable"), mydataframe)

When using dbplyr it appears that
df = tbl(con, from = 'BGBUref')
is roughly equivalent to
USE QnRStore
GO
SELECT * FROM BGBUref;
From @David_Browne's comment and the image, it looks like you have:
A table named 'BGBUref'
In a schema named 'dbo'
In a database called 'QnRStore'
In this case you need dbplyr's in_schema() function.
If your connection (con) is to the QnRStore database then try this:
df = tbl(con, from = in_schema('dbo', 'BGBUref'))
If your connection (con) is not to the QnRStore database directly then this may work:
df = tbl(con, from = in_schema('QnRStore.dbo', 'BGBUref'))
(I use this form when accessing multiple databases via the same connection, because dbplyr performs best if you use the same connection when joining tables from different databases.)
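A rough sketch of that pattern; the second database (OtherDB), table (SomeTable), and join key (some_key) are hypothetical:
library(dplyr)
library(dbplyr)

ref   <- tbl(con, from = in_schema("QnRStore.dbo", "BGBUref"))
other <- tbl(con, from = in_schema("OtherDB.dbo", "SomeTable"))  # hypothetical second database

joined <- ref %>%
  inner_join(other, by = "some_key") %>%  # "some_key" is a placeholder column
  collect()                               # the join itself runs on the server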

Related

JDBC error when querying Oracle 11.1 (ORA-00933)

I have searched high and low for answers so apologies if it has already been answered!
Using R I am trying to perform a lazy evaluation of Oracle 11.1 databases. I have used JDBC to facilitate the connection and I can confirm it works fine. I am also able to query tables using dbGetQuery, although the results are so large that I quickly run out of memory.
I have tried dbplyr/dplyr's tbl(con, "ORACLE_TABLE"), but I get the following error:
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT *
FROM "ORACLE_TABLE" AS "zzz39"
WHERE (0 = 1) (ORA-00933: SQL command not properly ended)
I have also tried using db_table <- tbl(con, in_schema('ORACLE_TABLE'))
This is happening with all databases I am connected to, despite being able to perform a regular dbGetQuery.
Full Code:
# Libraries
library(odbc)
library(DBI)
library(config)
library(RJDBC)
library(dplyr)
library(tidyr)
library(magrittr)
library(stringr)
library(xlsx)
library(RSQLite)
library(dbplyr)
# Oracle Connection
db <- config::get('db')
drv1 <- JDBC(driverClass=db$driverClass, classPath=db$classPath)
con_db <- dbConnect(drv1, db$connStr, db$orauser, db$orapw, trusted_connection = TRUE)
# Query (This one works but the data set is too large)
db_data <- dbSendQuery(con_db, "SELECT end_dte, reference, id_number FROM ORACLE_TABLE where end_dte > '01JAN2019'")
# Query (this one won't work)
oracle_table <- tbl(con_db, "ORACLE_TABLE")
Solved:
Updated RStudio + packages.
Follow this manual:
https://www.linkedin.com/pulse/connect-oracle-database-r-rjdbc-tianwei-zhang/
Insert the following code after 'con':
sql_translate_env.JDBCConnection <- dbplyr:::sql_translate_env.Oracle
sql_select.JDBCConnection <- dbplyr:::sql_select.Oracle
sql_subquery.JDBCConnection <- dbplyr:::sql_subquery.Oracle
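With those translations in place, the lazy tbl() call from the question should work. A rough sketch reusing con_db from the question; the filter and the upper-case column names are illustrative assumptions, not confirmed by the original post:
library(dplyr)
library(dbplyr)

# lazy reference: no rows are pulled until collect()
oracle_table <- tbl(con_db, "ORACLE_TABLE")

df <- oracle_table %>%
  filter(END_DTE > TO_DATE("2019-01-01", "YYYY-MM-DD")) %>%  # unknown functions are passed through to Oracle SQL
  select(END_DTE, REFERENCE, ID_NUMBER) %>%
  collect()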

Problems Connecting to MS SQL Server Through R DBI Package

I'm trying to set up a connection to a SQL Server from my Mac using the Microsoft ODBC Driver and the DBI package.
The connection is established; however, character fields, even those with no special characters, come back garbled. The database is proprietary, so I'm limited in what actual output I can show. Numeric fields return fine.
Some other notes.
If I submit a query, I'm able to receive a record set from the correct table. For example, the query below returns results and the column name is correct, but the data in the column is garbled:
> dbGetQuery(con, "Select name from tb1", n = 1)
Warning: Pending rows
name
1 CalteMtrSeda
dbListTables() also returns garbled output, even though as shown above I can receive output from the table referencing it by name.
dbListTables() returns the correct number of tables, but the names are not intelligible.
grep("tb1", dbListTables(con), value = TRUE)
character(0)
Output from my con object
> con
<OdbcConnection> user#ExpectedDataBase
Database: NameIWouldExpect
Microsoft SQL Server Version: 13.00.1742
Update: I've noticed a pattern. I'm getting every other character returned. From the example above:
CalteMtrSeda == CharlotteMotorSpeedway
This is the first time I've attempted to connect to this database from a Mac.
This turned out to be related to R 3.6. Reverting to R 3.5 fixed the issue. Link to the relevant issue in the odbc repo:
https://github.com/r-dbi/odbc/issues/283

Avoiding warning message “There is a result object still in use” when using dbSendQuery to create table on database

Background:
I use dbplyr and dplyr to extract data from a database, then I use the command dbSendQuery() to build my table.
Issue:
After the table is built, if I run another command I get the following warning:
Warning messages:
1. In new_result(connection@ptr, statement): Cancelling previous query
2. In connection_release(conn@ptr) :
 There is a result object still in use.
The connection will be automatically released when it is closed.
Question:
Because I don't have a result to fetch (I am sending a command to build a table), I'm not sure how to avoid this warning. At the moment I disconnect after building the table and the warning goes away. Is there anything I can do to avoid this warning?
Currently everything works; I'd just like to avoid the warning, as I assume I should be clearing something after I've built my table.
Code sample
# establish connection
con = DBI::dbConnect(<connection stuff here>)
# connect to table and database
transactions = tbl(con, in_schema("DATABASE_NAME", "TABLE_NAME"))
# build query string
query_string = "SELECT * FROM some_table"
# drop current version of table
DBI::dbSendQuery(con,paste('DROP TABLE MY_DB.MY_TABLE'))
# build new version of table
DBI::dbSendQuery(con, paste('CREATE TABLE MY_DB.MY_TABLE AS (', query_string, ') WITH DATA'))
Even though you're not retrieving stuff with a SELECT clause, DBI still allocates a result set after every call to DBI::dbSendQuery().
Try DBI::dbClearResult() in between your DBI::dbSendQuery() calls.
DBI::dbClearResult() does:
Clear A Result Set
Frees all resources (local and remote) associated with a
result set. In some cases (e.g., very large result sets) this
can be a critical step to avoid exhausting resources
(memory, file descriptors, etc.)
The example from the man page gives a hint of how the function should be called:
con <- dbConnect(RSQLite::SQLite(), ":memory:")
rs <- dbSendQuery(con, "SELECT 1")
print(dbFetch(rs))
dbClearResult(rs)
dbDisconnect(con)
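Applied to the table-rebuild flow from the question (same con and query_string), a sketch might look like this:
# drop the current version of the table, then free the result it allocated
res <- DBI::dbSendQuery(con, "DROP TABLE MY_DB.MY_TABLE")
DBI::dbClearResult(res)

# build the new version and clear its result as well
res <- DBI::dbSendQuery(con, paste("CREATE TABLE MY_DB.MY_TABLE AS (", query_string, ") WITH DATA"))
DBI::dbClearResult(res)
For statements that return no rows, DBI::dbExecute() wraps this send-and-clear pattern in a single call.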

R - RPostgreSQL Package - dbWriteTable to non-default schema where target table contains more fields than dataframe

The Issue
I am attempting to copy the contents of an R dataframe df to a PostgreSQL table table_name located in schema schema_name. By default, PostgreSQL will write tables to the public schema and I do not want to change this setting. The two unique aspects of this transfer are:
Writing to a table under a non-default schema; and
The dataframe df contains fewer fields than table_name. All the fields contained in df, however, do exist in table_name.
What I've Tried
I first attempted to use dbWriteTable from the RPostgreSQL package by using a workaround:
dbWriteTable(con, c("schema_name","table_name"), df, append = T)
resulting in the following exception:
Error in postgresqlgetResult(new.con) :
RS-DBI driver: (could not Retrieve the result : ERROR: missing data for column "extra_col"
CONTEXT: COPY df, line 1: " [removed contents] "
I then attempted to use dbWriteTable2 from the caroline package (a wrapper for the aforementioned dbWriteTable function), but the non-default schema hack employed above does not appear to work:
dbWriteTable2(con, c("schema_name","table_name"), df, append = T, add.id = FALSE)
creates the following exception:
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: relation "schema_name" does not exist
LINE 1: SELECT * FROM schema_name ORDER BY id DESC LIMIT 1
Add the missing NULL fields before the query:
df$extr_col1 <- NA
df$extr_col2 <- NA
...
then run your original dbWriteTable()...
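If the target table has many extra fields, here is a sketch that derives them instead of listing them by hand; it assumes the same RPostgreSQL connection con and the schema_name/table_name placeholders used above:
# look up the target table's columns, in table order
target_cols <- dbGetQuery(con,
  "SELECT column_name FROM information_schema.columns
   WHERE table_schema = 'schema_name' AND table_name = 'table_name'
   ORDER BY ordinal_position")$column_name

# add NA columns for every field the dataframe is missing, then match the table's column order
missing <- setdiff(target_cols, names(df))
if (length(missing) > 0) df[missing] <- NA
df <- df[target_cols]

dbWriteTable(con, c("schema_name", "table_name"), df, append = TRUE, row.names = FALSE)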

Unable to debug a function in R

While debugging an R script I have come across a strange error: "Error in debug(fun, text, condition) : argument must be a closure".
PC features: Win7/64 bit, Oracle client 12 (both 32 and 64bit), R (64bit)
Earlier, the script debugged fine without errors. I have looked for a clue on the Internet but found no clear explanation of what the mistake is or how to fix it.
Running the script as a plain script but not a function produces no errors.
I would be very grateful for your ideas.
The source script (it connects to an Oracle DB and executes a simple query) is as follows:
download1 <- function(){
  # install the packages if they are not already available
  if (!require("dplyr")){
    install.packages("dplyr")
  }
  if (!require("RODBC")){
    install.packages("RODBC")
  }
  library(RODBC)
  library(dplyr)
  # establish a connection with the DB or schema
  con <- odbcConnect("DB", uid="ANALYTICS", pwd="122334fgcx", rows_at_time = 500, believeNRows=FALSE)
  # check that the connection is working (optional)
  odbcGetInfo(con)
  # query the database and put the results into the data frame "x"
  ptm <- proc.time()
  x <- sqlQuery(con, "select * from my_table")
  proc.time() - ptm
  # to extract all field names into a separate vector
  #field_names <- sqlQuery(con, "SELECT column_name FROM all_tab_cols WHERE table_name = 'MY_TABLE'")
  close(con)
}
debug(download1(),text = "", condition = NULL)
debug() expects a function object (a closure), not the result of calling it, so pass the function name without parentheses and then call it:
debug(download1)
download1()
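If you only want to step through a single call, debugonce() sets the debug flag for just the next invocation:
debugonce(download1)
download1()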
