Here is my code:
library(DBI)
library(dplyr)
con <- dbConnect(odbc::odbc(), some_credentials)
dbListTables(con, table_name = "Table_A")
The above call returns Table_A, indicating the table is present. Now I try to query Table_A:
df <- as.data.frame(tbl(con, "Table_A"))
and get back:
Error: <SQL> 'SELECT *
FROM "Table_A" AS "zzz18"
WHERE (0 = 1)'
nanodbc/nanodbc.cpp:1587: 42S02: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name 'Table_A'.
so dplyr does not see it. How can I reconcile this? I have already double-checked the spelling.
As mentioned, any object (table, stored procedure, function, etc.) residing in a non-default schema requires an explicit reference to the schema. Default schemas include dbo in SQL Server and public in PostgreSQL. Therefore, as the docs indicate, use in_schema in dbplyr, or Id or SQL in DBI:
# dbplyr VERSION
df <- tbl(con, in_schema("myschema", "Table_A"))
# DBI VERSION
t <- Id(schema = "myschema", table = "Table_A")
df <- dbReadTable(con, t)
df <- dbReadTable(con, SQL("myschema.Table_A"))
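If you are not sure which schema the table actually lives in, you can ask SQL Server directly through INFORMATION_SCHEMA (a minimal sketch; con is assumed to be the open connection from the question):

```r
library(DBI)

# List every schema that contains a table named Table_A;
# INFORMATION_SCHEMA.TABLES is built into SQL Server.
dbGetQuery(con, "
  SELECT TABLE_SCHEMA, TABLE_NAME
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_NAME = 'Table_A'
")
```

Whatever TABLE_SCHEMA comes back is the value to pass to in_schema() or Id().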
Without a reproducible example it is hard to be sure, but I will try my best. I think you should load the dbplyr package, which is often used for connecting to databases.
library(DBI)
library(dbplyr)
library(tidyverse)
con <- dbConnect(odbc::odbc(), some_credentials)
df <- tbl(con, "Table_A") %>%
  collect() # creates a data frame in R, usable with dplyr
Here are some additional resources:
https://cran.r-project.org/web/packages/dbplyr/vignettes/dbplyr.html
Hope that helps!
I'm joining two relatively simple tables using ODBC and dbplyr. However, I'm getting an ambiguous column name error on my join key. This doesn't normally happen with dplyr joins, and I don't know how to express something like a.key = b.key in dbplyr.
Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Ambiguous column name 'Calendar_key'. [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared.
<SQL> 'SELECT "Calendar_key", "Organization_key", "Product_Key", "Promotion_Key", "Shift_Key", "ETL_source_system_key", "Pack_Size", "Qty_Sold", "Inv_Unit_Qty", "Extended_Cost", "Extended_Purchase_Rebate", "Extended_Sales_Rebate", "Extended_Sales", "Ent_Source_Hdr_Key", "Ent_Source_Dtl_Key", "Day_Date", "Day_Of_Week_ID", "Day_Of_Week", "Holiday", "Type_Of_Day", "Calendar_Month_No", "Calendar_Month_Name", "Calendar_Qtr_No", "Calendar_Qtr_Desc", "Calendar_Year", "Fiscal_Week", "Fiscal_Period_No", "Fiscal_Period_Desc", "Fiscal_Year"
FROM "Item_Sales_Fact" AS "LHS"
LEFT JOIN "calendar" AS "RHS"
ON ("LHS"."Calendar_key" = "RHS"."calendar_key")
This is the code block; my connection is called con:
con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "192.168.139.1",
                 Database = "pdi_warehouse_2304_01",
                 UID = XXXX,
                 PWD = XXXX,
                 Port = 1433)
item.sales <- tbl(con, "Item_Sales_Fact")
calendar <- tbl(con, "calendar")
organization <- tbl(con, "Organization")
test.df <- item.sales %>%
  left_join(calendar, by = c("Calendar_key" = "calendar_key")) %>%
  collect()
The SQL generated by dbplyr is ambiguous: "Calendar_key" can come from either the RHS or the LHS, because SQL Server compares identifiers case-insensitively and, unlike R, does not distinguish Calendar_key from calendar_key:
SELECT "Calendar_key", ...
The problem stems from the fact that R is case-sensitive, so dbplyr treats the two spellings as distinct columns and emits an unqualified reference that SQL Server cannot resolve.
A workaround is to rename one of the two keys so that both sides carry exactly the same name:
item.sales <- tbl(con, "Item_Sales_Fact")
calendar <- tbl(con, "calendar") %>% rename(Calendar_key = calendar_key)

test.df <- item.sales %>%
  left_join(calendar, by = "Calendar_key") %>%
  collect()
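If you would rather not rename, recent dbplyr versions (1.4.0+) also let you spell out the join condition yourself via the sql_on argument. A sketch against the same connection, using the LHS/RHS aliases that dbplyr assigns:

```r
test.df <- item.sales %>%
  left_join(calendar,
            sql_on = '"LHS"."Calendar_key" = "RHS"."calendar_key"') %>%
  collect()
```

Because sql_on qualifies the columns explicitly, both key columns should survive in the result (with .x/.y suffixes) and the ambiguity disappears.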
I connected R to SQL using the following:
library(dplyr)
library(dbplyr)
library(odbc)
library(RODBC)
library(DBI)
con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "srv name",
                 Database = "Warehouse")
I pull in the table I want using
data <- tbl(con, in_schema("prc", "PricingLawOfUniv"))
The following things show me what I expect to see (a 38 X 1000 table of data):
head(data)
colnames(data)
The following things behave as I expect:
In the Environment data is a "list of 2"
View(data) shows a list with "src" and "ops" - each of those is also a list of 2.
Ultimately I want to work with the 38 X 1000 table as a dataframe using dplyr. How can I do this? I tried data[1] and data[2] but neither worked. Where is the actual table I want hiding?
You could use DBI::Id to specify the table/schema, and then dbReadTable:
tbl <- DBI::Id(
schema = "prc",
table = "PricingLawOfUniv"
)
data <- DBI::dbReadTable(con, tbl)
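If you want to stay in dplyr instead, the 38 x 1000 table is not "hiding" inside data at all: data is a lazy query description, and collect() is what executes it and returns a regular data frame. A self-contained sketch, using an in-memory SQLite database (via the RSQLite package) to stand in for the SQL Server connection:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Stand-in for the real connection: an in-memory SQLite database
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "PricingLawOfUniv", data.frame(x = 1:3, y = 4:6))

lazy <- tbl(con_demo, "PricingLawOfUniv") # a lazy query, not a data frame
df <- lazy %>% collect()                  # runs the query, returns a tibble

class(df) # "tbl_df" "tbl" "data.frame"
dbDisconnect(con_demo)
```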
In another project working with Amazon Athena I could do this:
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "path-to-driver",
                      S3OutputLocation = "location",
                      AwsRegion = "eu-west-1",
                      AuthenticationType = "IAM Profile",
                      AWSProfile = "profile",
                      Schema = "prod")
tbl(con,
    # Run SQL query
    sql('SELECT *
         FROM TABLE')) %>%
  # Without having collected the data, I could further wrangle it inside
  # the database using dplyr code
  select(var1, var2) %>%
  mutate(var3 = var1 + var2)
However, now using BigQuery I get the following error:
con <- DBI::dbConnect(bigrquery::bigquery(),
project = "project")
tbl(con,
sql(
'SELECT *
FROM TABLE'
))
Error: dataset is not a string (a length one character vector).
Any idea if with BigQuery is not possible to do what I'm trying to do?
I am not a BigQuery user, so I can't test this, but from looking at this example it appears unrelated to how you are piping queries (%>%). Instead, it appears BigQuery does not support receiving a tbl() call with an sql string as the second argument.
So it is likely to work when the second argument is a string with the name of the table:
tbl(con, "db_name.table_name")
But you should expect it to fail if the second argument is of type sql:
query_string = "SELECT * FROM db_name.table_name"
tbl(con, sql(query_string))
Other things to test:
Using odbc::odbc() to connect to BigQuery instead of bigrquery::bigquery(). The problem could be caused by the bigrquery package.
The second approach without the conversion to sql: tbl(con, query_string)
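One more possibility worth checking: bigrquery's dbConnect() has a dataset argument, and the error text ("dataset is not a string") suggests it is being looked up but was never supplied. A sketch, where "my_dataset" is a placeholder for your actual dataset name:

```r
library(dplyr)

con <- DBI::dbConnect(bigrquery::bigquery(),
                      project = "project",
                      dataset = "my_dataset")

# With the dataset set on the connection, a plain table name works:
tbl(con, "table_name")

# and a raw SQL string may then resolve as well:
tbl(con, sql("SELECT * FROM my_dataset.table_name"))
```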
I'm connected to Hive using dbplyr and odbc.
A table I would like to connect to is called "pros_year_month":
library(odbc)
library(tidyverse)
library(dbplyr)
con <- dbConnect(odbc::odbc(), "HiveProd")
prosym <- tbl(con, in_schema("my_schema_name", "pros_year_month"))
Table pros_year_month has several fields, two of which are "country" and "year_month".
This appears to work without any problem:
pros_nov <- prosym %>% filter(country == "United States") %>% collect()
However this does not:
pros_nov <- prosym %>%
  filter(year_month == ymd(as.character(paste0(year_month, "01")))) %>%
  collect()
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42000: [Hortonworks][Hardy] (80) Syntax or
semantic analysis error thrown in server while executing query. Error
message from server: Error while compiling statement: FAILED:
SemanticException [Error 10004]: Line 1:7 Invalid table alias or
column reference 'zzz1.year_month': (possible column names are:
year_month, country, ...
It looks like the field name year_month has somehow become zzz1.year_month? I am not sure what this is or how to get around it.
How can I apply a filter for country then year_month before calling collect on a dbplyr object?
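One way around it is to keep lubridate out of the translated SQL entirely: dbplyr cannot translate ymd() for Hive, so compute any date values locally in R, filter on the raw year_month column (assumed here to be stored as "YYYYMM" text), and do the date parsing after collect(). A sketch:

```r
library(dplyr)
library(lubridate)

cutoff <- "201811" # e.g. November 2018, computed however you like in R

pros_nov <- prosym %>%
  filter(country == "United States", year_month == cutoff) %>%
  collect() %>%
  # lubridate now runs locally, on an ordinary data frame
  mutate(year_month = ymd(paste0(year_month, "01")))
```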
Does anyone have a connection string example for using RODBC to connect to MS SQL Server 2005 or 2008? Thank you.
library(RODBC)
dbhandle <- odbcDriverConnect('driver={SQL Server};server=mysqlhost;database=mydbname;trusted_connection=true')
res <- sqlQuery(dbhandle, 'select * from information_schema.tables')
Taken from a posting to r-help:
library(RODBC)
channel <- odbcDriverConnect("driver=SQL Server;server=01wh155073")
initdata <- sqlQuery(channel, "select * from test_DB..test_vikrant") # db..table uses the default schema
dim(initdata)
odbcClose(channel)
If you have to include the USERNAME and PASSWORD:
library(RODBC) # don't forget to install it beforehand
my_server="ABC05"
my_db="myDatabaseName"
my_username="JohnDoe"
my_pwd="mVwpR55zobUldrdtXqeHez"
db <- odbcDriverConnect(paste0("DRIVER={SQL Server};",
                               "server=", my_server, ";",
                               "database=", my_db, ";",
                               "uid=", my_username, ";",
                               "pwd=", my_pwd))
sql="SELECT * FROM dbo.MyTableName" #dbo is the schema here
df <- sqlQuery(db,sql)
Try the RSQLS package: https://github.com/martinkabe/RSQLS
It pushes data from a data.frame to SQL Server, or pulls data from SQL Server into a data.frame, very quickly.
Example:
library(devtools)
install_github("martinkabe/RSQLS")
library(RSQLS)
cs <- set_connString("LAPTOP-USER\\SQLEXPRESS", "Database_Name")
push_data(cs, dataFrame, "dbo.TableName", append = TRUE, showprogress = TRUE)
df <- pull_data(cs, "SELECT * FROM dbo.TableName", showprogress = TRUE)
This solution is much faster and more robust than RODBC::sqlSave or DBI::dbWriteTable.
First, you have to create/configure a DSN (an ODBC connection to a specific database).
Then install the RODBC library:
library(RODBC)
myconn <- odbcConnect("MyDSN", uid = "***", pwd = "*******")
fetchData <- sqlQuery(myconn, "select * from tableName")
View(fetchData)
close(myconn)