Reading Data from a SQL Server in RStudio (dplyr)

I am sure this question is very basic, but this is the first time I am using R connected to a server, so a few things still confuse me.
I used ODBC Data Sources on Windows to create a DSN, and used
con <- dbConnect(odbc::odbc(), "TEST_SERVER")
This worked, and now under the Connections tab I can see the server; if I double-click, I can see the databases and tables that exist on the server. How would I go about reading something inside one of those databases?
For example, if the database name is db1 and the table name is t1, what is the code needed to read that table into local memory? I would prefer using dbplyr as I am familiar with the syntax. I am just unsure how to refer to a particular database and table after making the connection to the server.

I haven't used dbplyr before, but you can query the database using dbGetQuery.
test <- dbGetQuery(
  con,
  "SELECT * FROM db1.t1"
)
You can also pass the database into the connection string.
con <- dbConnect(
  drv = odbc(),
  dsn = "TEST_SERVER",
  database = "db1"
)
And then your query would just be "SELECT * FROM t1".
EDIT: To query the table using dbplyr:
tbl1 <- tbl(con, "t1")
qry <- tbl1 %>% head() %>% collect()
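If you want to refer to the table without relying on the connection's default database, dbplyr's in_schema() helper can qualify the name. A minimal sketch, assuming the qualifier db1 is what your server expects to the left of the dot (for SQL Server the fully qualified form is database.schema.table, so you may need to combine the database argument shown above with a schema qualifier here):
library(dplyr)
library(dbplyr)
# lazy reference to db1.t1; nothing is pulled yet
tbl1 <- tbl(con, in_schema("db1", "t1"))
# build the query on the server, then bring the result into local memory
local_copy <- tbl1 %>% head() %>% collect()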

I like to use RODBC:
con <- RODBC::odbcConnect(dsn = 'your_dsn',
                          uid = 'userid',
                          pwd = 'password')
table_output <- RODBC::sqlQuery(con, 'SELECT * FROM Table')

Related

Save dplyr query to different schema in dbplyr

I have a JDBC connection and would like to query data from one schema and save to another
library(tidyverse)
library(dbplyr)
library(rJava)
library(RJDBC)
# access the temp table in the native schema
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
# I would like to save temp_ed to a new schema "schema_new"
I would like to use something like dbplyr::compute() but define the output schema specifically. It seems dbplyr::copy_to could be used, but would require bringing the data through the local machine.
I want to use something like RJDBC::dbSendUpdate(), but ideally something that integrates nicely with the data manipulation pipeline above.
I do this using dbExecute from the DBI package.
The key idea is to extract the query that defines the current remote table and make this a sub-query in a larger SQL query that writes the table. This requires that (1) the schema exists, (2) you have permission to write new tables, and (3) you know the correct SQL syntax.
Doing this directly might look like:
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
save_table_query <- paste(
  "SELECT * INTO my_database.schema_new.my_table FROM (\n",
  dbplyr::sql_render(temp_ed),
  "\n) AS sub_query"
)
dbExecute(conn, as.character(save_table_query))
SELECT ... INTO is the clause for writing a new table in SQL Server (the flavour of SQL I use). You will need to find the equivalent syntax for your database.
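For databases that use CREATE TABLE ... AS SELECT instead (e.g. PostgreSQL or Hive), a sketch of the same idea might look like this; the qualified table name is a placeholder and may need adjusting for your database:
# same pattern with CREATE TABLE ... AS, used by e.g. PostgreSQL and Hive
save_table_query <- paste(
  "CREATE TABLE schema_new.my_table AS\n",
  dbplyr::sql_render(temp_ed)
)
dbExecute(conn, as.character(save_table_query))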
In practice I use a custom function that looks something like this:
write_to_database <- function(input_tbl, db, schema, tbl_name){
  # connection
  tbl_connection <- input_tbl$src$con
  # SQL query
  sql_query <- glue::glue(
    "SELECT *\n",
    "INTO {db}.{schema}.{tbl_name}\n",
    "FROM (\n",
    dbplyr::sql_render(input_tbl),
    "\n) AS sub_query"
  )
  result <- dbExecute(tbl_connection, as.character(sql_query))
}
Applying this in your context:
temp <- tbl(conn, "temp")
temp_ed <- temp %>% mutate(new = 1)
write_to_database(temp_ed, "my_database", "schema_new", "my_table")
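To confirm the write worked, you can point a new lazy table at the result (a sketch using dbplyr::in_schema(), assuming the connection's default database is my_database as above):
new_tbl <- dplyr::tbl(conn, dbplyr::in_schema("schema_new", "my_table"))
new_tbl %>% head() %>% dplyr::collect()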

How to get access to the SAP sandbox via R?

I registered on the SAP ID Service to check how their sandbox works.
How is it possible to connect to the sandbox data tables using R?
The examples I found:
library ("RODBC")
# 1
ch <- odbcConnect("data source name", uid = "test_hana" , pwd = "test12")
sqlQuery(ch, "SELECT * FROM '_SYS_BIC'.'BILLING_DATA'")
# 2
ch <- odbcConnect("HANA_TK", uid="xxxx", pwd="xxxx")
odbcQuery(ch, "SELECT table_name from SYS.CS_TABLES_ where schema_name = 'SFLIGHT'")
tables <- sqlGetResults(ch)
odbcClose(ch)
Neither works, and it's unclear how to get access to even one SAP table in the sandbox.
Any ideas are welcome!
You need to set up an ODBC connection to HANA in your system settings, and then install and load the RODBC package in R:
> install.packages("RODBC")
> library("RODBC")
Then connect to HANA through the connection string:
> channel <- odbcConnect("data source name", uid = "test_hana", pwd = "test12")
and pull data like this:
> sqlQuery(channel, 'SELECT * FROM "_SYS_BIC"."BILLING_DATA"')
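If you are not sure which tables your sandbox user can actually see, RODBC can list them before you query; a small sketch, reusing the same channel as above:
> tables <- sqlTables(channel)   # list tables visible to your user
> head(tables)
> odbcClose(channel)             # close the connection when done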

RStudio - ODBC connection with SQL query taken from sql file

I have my ODBC query to connect to Teradata and just wonder if I can read in a SQL file as opposed to having the SQL code inline. I am trying to find an R equivalent of Python's pd.read_sql_query(f, con), where f is my SQL file with code.
So for my connection, it would change from:
con <- function(){
  query <- paste0("
    SELECT * FROM table1
  ")
  print(queryData(query))
}
con <- data.frame(con())
to
con <- function(){
  query <- "SQL_code.sql"
  print(queryData(query))
}
con <- data.frame(con())
Read your SQL from a file:
sql_query <- read.delim('/path/SQL_code.sql', header = FALSE) %>% as.character()
then define the connection and use it:
library(DBI)
db <- dbConnect(...)
dbGetQuery(db, sql_query)
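Note that read.delim() treats the file as tabular data and can mangle multi-line SQL; a more robust base R sketch (assuming the file contains a single statement) is to read the lines and collapse them yourself:
library(DBI)
# read the whole file and collapse it into one query string
sql_query <- paste(readLines('/path/SQL_code.sql'), collapse = "\n")
dbGetQuery(db, sql_query)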
If I understand your question correctly, you could try something like this?
library(DBI)
library(readr)
df <- dbGetQuery(con, statement = read_file('SQL_code.sql'))
# con is your connection
If it does not solve your problem, there may be some solutions here: How to read the contents of an .sql file into an R script to run a query?

Are dplyr joins between different data sources performed on the database side or locally?

I would like to have a list of the queries my R script sends to the database, but it is a bit unclear to me how/where some operations involving local data frames and database tables are performed.
As mentioned in this post, it seems that when performing an operation between a data frame in the local environment and a table from a DBI connection (e.g. a left_join(..., copy = TRUE), with copy = TRUE needed because the data comes from different sources), the operations are performed on the database side, working with temporary tables.
I tried to verify this using show_query() to see exactly what is sent to the database and what is not.
I cannot give a proper reproducible example as it involves a database connection, but here is the logic:
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "SQL Server",
                      Server = "server",
                      Database = "database",
                      UID = "user",
                      PWD = "pwd",
                      Port = port)
db_table <- tbl(con, "tbl_A")
local_df <- read.csv("/.../file.csv",stringsAsFactors = FALSE)
q1 <- local_df %>% inner_join(db_table, by = c('id' = 'id'), copy = TRUE)
Below are the outputs of the show_query() statements :
> db_table %>% show_query()
<SQL>
SELECT *
FROM "tbl_A"
q1 %>% show_query()
Error in UseMethod("show_query") :
no applicable method for 'show_query' applied to an object of class "data.frame"
This makes me think that in that sequence, the only operation performed on the database side is SELECT * FROM "tbl_A", and that q1 is performed on the local environment using the local_df and a local copy of the database table.
I tried to have a look at the dplyr documentation, but there is no information about the case where the data comes from multiple sources.
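That reading matches how copy = TRUE works: the table passed as the second argument is copied into the source of the first argument, so with the local data frame on the left the database table is collected locally and the join happens in R. A minimal sketch of keeping the work on the database side instead (assuming the same con, db_table and local_df as above) is to put the remote table first, which uploads local_df as a temporary table and keeps the result lazy:
# remote table first: local_df is copied to the database as a temporary table
q2 <- db_table %>% inner_join(local_df, by = "id", copy = TRUE)
q2 %>% show_query()        # now works, because q2 is still a lazy tbl
result <- q2 %>% collect() # only here is data pulled back into R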

syntax for database.table in dbplyr?

I have a connection to our database:
con <- dbConnect(odbc::odbc(), "myHive")
I know this is successful because when I run it, in the top right of RStudio I can see all of our databases and tables.
My question is, how can I select a specific database and table combination? The documentation shows a user selecting a single table, "flights", but I need to do the equivalent of somedatabase.sometable.
Tried:
mytable <- tbl(con, "somedb.sometable")
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42S02: [Hortonworks][SQLEngine] (31740) Table or view not found: HIVE..dp_enterprise.uds_order
Then tried:
mytable <- tbl(con, "somedb::sometable")
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42S02: [Hortonworks][SQLEngine] (31740) Table or view not found: HIVE..somedb::sometable
I tried removing the quotes "" too.
Within the connections pane of RStudio I can see somedb.sometable. It's there! How can I save it to variable mytable?
You select the database when creating the connection and the table when creating the tbl (with the from argument).
There is no standard interface to dbConnect, so the exact way to pass the database name depends on the DBDriver you use. Indeed DBI::dbConnect is simply a generic dispatching to the driver-specific dbConnect.
In your case, the driver is odbc so you can check out the documentation for odbc::dbConnect and you'll see the relevant argument is database.
This will work:
con <- dbConnect(odbc::odbc(), "myHive", database = "somedb")
df <- tbl(con, from = "sometable")
With most other drivers (e.g. RMariaDB, RMySQL, RPostgres, RSQLite), the argument is called dbname, so you'd do this:
con <- dbConnect(RMariaDB::MariaDB(), dbname = "somedb")
df <- tbl(con, from = "sometable")
I think I found it: use in_schema()
mytable <- tbl(con, in_schema("somedb", "sometable"))
This returns a list rather than a tbl, though, so I'm not sure.
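A quick way to check the object is usable (a sketch, reusing the connection above): tbl() returns a lazy tbl backed by the database, which prints list-like internals when inspected, and only materialises when you collect():
class(mytable)                   # should include "tbl_dbi" and "tbl_lazy"
mytable %>% show_query()         # the SQL that would be sent
mytable %>% head() %>% collect() # pull a few rows into a local data frame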
