Does anyone have an example connection string for using RODBC to connect to MS SQL Server 2005 or 2008?
Thank you.
library(RODBC)
dbhandle <- odbcDriverConnect('driver={SQL Server};server=mysqlhost;database=mydbname;trusted_connection=true')
res <- sqlQuery(dbhandle, 'select * from information_schema.tables')
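When you are done, close the handle with odbcClose(dbhandle) (or close(dbhandle), which RODBC also supports).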
Taken from a posting to r-help:
library(RODBC)
channel <- odbcDriverConnect("driver=SQL Server;server=01wh155073")
initdata <- sqlQuery(channel, "select * from test_DB..test_vikrant")
dim(initdata)
odbcClose(channel)
If you have to include the USERNAME and PASSWORD:
library(RODBC) # don't forget to install it beforehand
my_server   <- "ABC05"
my_db       <- "myDatabaseName"
my_username <- "JohnDoe"
my_pwd      <- "mVwpR55zobUldrdtXqeHez"
# build the connection string on one logical line; breaking a string literal
# across lines would put newlines inside it and break the connection
db <- odbcDriverConnect(paste0("DRIVER={SQL Server};",
                               "server=", my_server, ";",
                               "database=", my_db, ";",
                               "uid=", my_username, ";",
                               "pwd=", my_pwd))
sql="SELECT * FROM dbo.MyTableName" #dbo is the schema here
df <- sqlQuery(db,sql)
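As a small optional check (not part of the original answer): odbcDriverConnect() returns -1 with a warning, rather than an RODBC handle, when the connection fails, so you can guard against that before querying.
# Optional: odbcDriverConnect() returns -1 (not an RODBC object) on failure
if (!inherits(db, "RODBC")) stop("Connection failed; check driver, server and credentials")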
Try the RSQLS package: https://github.com/martinkabe/RSQLS
It pushes data from a data.frame to SQL Server, or pulls data from SQL Server into a data.frame, very quickly.
Example:
library(devtools)
install_github("martinkabe/RSQLS")
library(RSQLS)
cs <- set_connString("LAPTOP-USER\\SQLEXPRESS", "Database_Name")
push_data(cs, dataFrame, "dbo.TableName", append = TRUE, showprogress = TRUE)
df <- pull_data(cs, "SELECT * FROM dbo.TableName", showprogress = TRUE)
This solution is much faster and more robust than RODBC::sqlSave or DBI::dbWriteTable.
First you have to create/configure a DSN (an ODBC connection to a specific database).
Then install the RODBC package.
library(RODBC)
myconn <-odbcConnect("MyDSN", uid="***", pwd="*******")
fetchData<- sqlQuery(myconn, "select * from tableName")
View(fetchData)
close(myconn)
I have a problem with some data I'm working with.
I extract data from SQL Server and work on it in R, but in some name fields a letter comes through as the REPLACEMENT CHARACTER (Unicode character 'REPLACEMENT CHARACTER', U+FFFD).
I don't want to use a replace function to change the entire name.
Any ideas?
For example, the name MAGAÑA arrives as MAGA�A.
I use the following code for the connection and query:
library(odbc)
library(tidyverse)
library(dgof)
library(pROC)
library(ggplot2)
library(dbplyr)
library(dplyr)
library(lubridate)
library(janitor)
library(DBI)
library(readxl)
library(data.table)
## Connection
conex1 <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "xxx.xxx.xxx.xx",
Database = "xxxxxxxx",
UID = "xxxxxxx",
PWD = "xxxxxxxxx",
Port = 1433)
# Query
Fecha_nac<- dbSendQuery(conex1, "SELECT id_orden,
fecha_nacimiento
FROM zzgm_clientes_xxxxxxx") %>%
dbFetch()
I think iconv can help you in this situation:
dataframe_with_right_symbols <- raw_dataframe %>%
mutate_if(is.character, function(col) iconv(col, to="UTF-8"))
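If that alone does not fix the characters, a variant worth trying (an assumption about your source encoding, so adjust as needed) is to state the source encoding explicitly, since SQL Server text often arrives as Latin-1:
# assumes the raw bytes are Latin-1; change `from` to match your collation
dataframe_with_right_symbols <- raw_dataframe %>%
  mutate_if(is.character, function(col) iconv(col, from = "latin1", to = "UTF-8"))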
I connected R to SQL using the following:
library(dplyr)
library(dbplyr)
library(odbc)
library(RODBC)
library(DBI)
con <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "srv name",
Database = "Warehouse")
I pull in the table I want using
data <- tbl(con, in_schema("prc", "PricingLawOfUniv"))
The following things show me what I expect to see (a 38 X 1000 table of data):
head(data)
colnames(data)
The following things do not behave as I expect:
In the Environment data is a "list of 2"
View(data) shows a list with "src" and "ops" - each of those is also a list of 2.
Ultimately I want to work with the 38 X 1000 table as a dataframe using dplyr. How can I do this? I tried data[1] and data[2] but neither worked. Where is the actual table I want hiding?
You could use DBI::Id to specify the table/schema, and then dbReadTable:
tbl <- DBI::Id(
schema = "prc",
table = "PricingLawOfUniv"
)
data <- DBI::dbReadTable(con, tbl)
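Alternatively, if you would rather stay in the dplyr workflow from the question, you can collect() the lazy table into a local data frame (same schema and table names as above):
library(dplyr)
library(dbplyr)

data <- tbl(con, in_schema("prc", "PricingLawOfUniv")) %>%
  collect()  # runs the query and returns a local data frame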
In another project working with Amazon Athena I could do this:
con <- DBI::dbConnect(odbc::odbc(), Driver = "path-to-driver",
S3OutputLocation = "location",
AwsRegion = "eu-west-1", AuthenticationType = "IAM Profile",
AWSProfile = "profile", Schema = "prod")
tbl(con,
# Run SQL query
sql('SELECT *
FROM TABLE')) %>%
# Without having collected the data, I could further wrangle the data inside the database
# using dplyr code
select(var1, var2) %>%
mutate(var3 = var1 + var2)
However, now using BigQuery I get the following error:
con <- DBI::dbConnect(bigrquery::bigquery(),
project = "project")
tbl(con,
sql(
'SELECT *
FROM TABLE'
))
Error: dataset is not a string (a length one character vector).
Any idea if with BigQuery is not possible to do what I'm trying to do?
Not a BigQuery user, so can't test this, but from looking at this example it appears unrelated to how you are piping queries (%>%). Instead it appears BigQuery does not support receiving a tbl with an sql string as the second argument.
So it is likely to work when the second argument is a string with the name of the table:
tbl(con, "db_name.table_name")
But you should expect it to fail if the second argument is of type sql:
query_string = "SELECT * FROM db_name.table_name"
tbl(con, sql(query_string))
Other things to test:
Using odbc::odbc() to connect to BigQuery instead of bigrquery::bigquery(). The problem could be caused by the bigrquery package.
The second approach without the conversion to sql: tbl(con, query_string).
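A third thing worth testing (an assumption on my part, not something I have verified against your project): the "dataset is not a string" error may come from the connection itself, since bigrquery's dbConnect() accepts a dataset argument, and referring to tables by bare name needs one. A minimal sketch with placeholder project/dataset/table names:
library(DBI)
library(dplyr)

con <- DBI::dbConnect(bigrquery::bigquery(),
                      project = "project",
                      dataset = "db_name")  # placeholder dataset name

tbl(con, "table_name") %>%  # bare table name, resolved inside `dataset`
  select(var1, var2) %>%
  mutate(var3 = var1 + var2)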
Here is my code
library(DBI)
library(dplyr)
con <- dbConnect(odbc::odbc(), some_credentials)
dbListTables(con, table_name = "Table_A")
The above code returns Table_A, indicating the table is present. Now I am trying to query Table_A:
df <- as.data.frame(tbl(con, "Table_A"))
and get back:
Error: <SQL> 'SELECT *
FROM "Table_A" AS "zzz18"
WHERE (0 = 1)'
nanodbc/nanodbc.cpp:1587: 42S02: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name 'Table_A'.
so dplyr does not see it. How can I reconcile this? I have already double-checked the spelling.
As mentioned, any object (table, stored procedure, function, etc.) residing in a non-default schema requires an explicit reference to the schema. Default schemas include dbo in SQL Server and public in PostgreSQL. Therefore, as the docs indicate, use in_schema in dbplyr, and Id or SQL in DBI:
# dbplyr VERSION
df <- tbl(con, in_schema("myschema", "Table_A"))
# DBI VERSION
t <- Id(schema = "myschema", table = "Table_A")
df <- dbReadTable(con, t)
df <- dbReadTable(con, SQL("myschema.Table_A"))
Without a reproducible example it is kind of hard, but I will try my best. I think you should add the dbplyr package, which is often used for connecting to databases.
library(DBI)
library(dbplyr)
library(tidyverse)
con <- dbConnect(odbc::odbc(), some_credentials)
df <- tbl(con, "Table_A") %>%
  collect()  # pulls the data into a local R data frame for use with dplyr
Here are some additional resources:
https://cran.r-project.org/web/packages/dbplyr/vignettes/dbplyr.html
Hope that can help!
Normally we have no trouble using the connection method below to run queries against Redshift:
require("RPostgreSQL")
drv <- dbDriver("PostgreSQL")
conn <- dbConnect(drv, dbname = "redshiftdb",
host = "XX.XX.XX.XX", port = "1234",
user = "userid", password = "pwd")
my_data <- dbGetQuery(conn, "select a.*, b.* from redshiftdb.schema1.table1 a inner join redshiftdb.schema2.table2 b on a.key = b.key")
But the problem with this method is that people can write long, complex SQL queries which become hard to debug and illustrate when re-engineering, unless you are a hard-core SQL coder.
I have been learning R since September, and I thought it would be interesting to use dplyr joins and pipes to do the same work.
I connected using
conn <- src_postgres(dbname = "redshiftdb",
host = "XX.XX.XX.XX", port = 1234,
user = "userid",
password = "pwd")
my_tbl1 <- tbl(conn, dplyr::sql('select * from schema1.table1'))
my_tbl2 <- tbl(conn, dplyr::sql('select * from schema1.table2'))
my_tbl3 <- tbl(conn, dplyr::sql('select * from schema1.table3'))
my_tbl4 <- tbl(conn, dplyr::sql('select * from schema1.table4'))
my_tbl5 <- tbl(conn, dplyr::sql('select * from schema2.table1'))
my_tbl6 <- tbl(conn, dplyr::sql('select distinct var1, var2 from schema2.table2'))
my_tbl7 <- tbl(conn, dplyr::sql('select * from schema2.table3'))
I get the above error when I use left_join and %>% to join tables within schema1 as well as across schemas (i.e. schema1 and schema2).
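For illustration, a minimal sketch of the kind of join that fails (joining on the key column from the SQL above; adjust to the real column names):
joined <- my_tbl1 %>%
  left_join(my_tbl5, by = "key")   # cross-schema join (schema1 x schema2); this is where the error appears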
When I use copy = TRUE it is very time-consuming and gives a warning that only 100,000 records were copied.
I have checked
https://github.com/hadley/dplyr/issues/244
but the pool method does not seem to work.
Any help would be much appreciated; otherwise learning dplyr will not serve my immediate purpose.