I'm trying to connect R to Netezza using the JDBC driver.
I manage to connect successfully to the database, but the results are not correct.
# Here are the connection details
library(RJDBC)
drv <- JDBC(driverClass="org.netezza.Driver", classPath = "C://JDBC//nzjdbc.jar", "'")
con <- dbConnect(drv, "jdbc:netezza://10.206.0.66:5480//DBASE", "USER", "PASS")
# > con
# An object of class "JDBCConnection"
# Slot "jc":
# [1] "Java-Object{org.netezza.sql.NzConnection#bce3d7}"
# Slot "identifier.quote":
# [1] "'"
res <- dbSendQuery(con, "SELECT * FROM DBASE.MARBEL.DATOS limit 10000;")
res <- fetch(res, n = -1)
The problem is that the fields are returned as a list of "vertical" variables instead of as the columns of a table!
head(res)
SUBSCRIPTION_ID
1 245206318120314
2 235109338101206
3 238463669110624
4 214177015090830
5 212403495090830
6 13874138618090824
SUB_ACCOUNT_ID
1 MV_SUBCTA_45206318_20120316
2 MV_SUBCTA_35109338_20101207
3 MV_SUBCTA_38463669_20110627
4 MV_SUBCTA_45223848_20120316
5 MV_SUBCTA_12403495_20081224
6 MV_SUBCTA_18932919_20091012
ACCOUNT_ID
1 MV_CTA_44123765_20120316
2 MV_CTA_35213277_20101207
3 MV_CTA_37772612_20110627
4 MV_CTA_14217213_20090330
5 MV_CTA_12477560_20081224
6 MV_CTA_18758944_20091012
ACCESS_METHOD_ID
1 1167391804
2 1159354610
3 2966407995
4 1153360304
5 1131960835
6 3874138618
Any idea how to solve this? I have a working ODBC connection, but I'd rather use JDBC.
I scrolled your output all the way to the right, and it looks like the strings in your columns are very wide (are they CHAR instead of VARCHAR?), so the result does not fit the width of the R console. Hence R wraps each column onto its own block of lines.
So try to either trim them in your query
select rtrim(SUB_ACCOUNT_ID), ...
or in R:
require('stringr')
res$SUB_ACCOUNT_ID <- str_trim(res$SUB_ACCOUNT_ID)
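If you'd rather not list every column by hand, a minimal base-R sketch (no extra packages; assuming res is the fetched data frame) that right-trims all character columns at once:
# Right-trim the CHAR padding from every character column in one pass
char_cols <- vapply(res, is.character, logical(1))
res[char_cols] <- lapply(res[char_cols], trimws, which = "right")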
Based on Alex's answer, I wrote this function to apply trim() to all variables.
query_nzz <- function(con, select = "select * ", from = "", where = "", limit = 10000){
  # scipen keeps deparse() from turning a large limit into scientific notation
  options(scipen = 666)
  # Get the variable names by fetching a single row
  query_names <- paste(select, " from ", from, where, "limit 1;", sep = " ")
  col_names <- names(dbGetQuery(con, query_names))
  # Wrap every column in trim() to strip the CHAR padding
  select <- paste0("trim(", col_names, ") as ", col_names, collapse = ",")
  query <- paste("select", select, "from", from, where, "limit", deparse(limit), ";", sep = " ")
  dbGetQuery(con, query)
}
Function usage
dt <- query_nzz(
  con,
  select = "select * ",
  from = "DATABASE.TABLENAME",
  where = "",
  limit = 100000
)
Using RODBC you can query a database like this:
library(RODBC)
dbHandle <- odbcDriverConnect('driver=SQL Server;server=SOME_SERVER;trusted_connection=true')
returnDf <- sqlQuery(dbHandle, query, stringsAsFactors = FALSE)
odbcClose(dbHandle)
This assumes that the object query is a character vector of length 1. What happens if it is not? If query contains two elements, is the database queried twice?
Thanks @r2evans for pointing out the solution:
query <- c("select 1 as a", "select 2 as b")
dbhandle <- odbcDriverConnect('driver=SQL Server;server=SOME_SERVER;database=csn_pricing;trusted_connection=true')
df <- sqlQuery(dbhandle, query, stringsAsFactors = FALSE)
odbcClose(dbhandle)
This results in
> df
a
1 1
Hence, only the first element is used.
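If you actually want every element of query executed, a minimal sketch is to loop over the vector yourself (assuming each element is a complete, standalone statement):
# Run each query separately and collect the resulting data frames in a list
dfs <- lapply(query, function(q) sqlQuery(dbhandle, q, stringsAsFactors = FALSE))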
I am trying to process a large number of queries with parameters in R using ROracle. I know which parameters appear in each query, but I don't know in which order they appear. I am therefore looking for a way to submit the parameters by name in each query. Sample code:
library(ROracle)
# establish connection to DB
drv <- dbDriver("Oracle")
con <- dbConnect(drv, "User", "password", dbname = "DB")
# create table
createTab <- "create table RORACLE_TEST(num1 number, num2 number)"
dbGetQuery(con, createTab)
# insert String
insStr <- "insert into RORACLE_TEST values(:row1, :row2)"
dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
# check output
dbGetQuery(con, "SELECT * FROM RORACLE_TEST")
# Output is:
# NUM1 NUM2
#1 0 1
# Desired output should be:
# NUM1 NUM2
#1 1 0
Any workaround for this will be appreciated, except solutions of the kind
dbGetQuery(con, gsub(":row2", "0", gsub(":row1", "1", insStr)))
since string substitution does not sanitize against SQL injection (the parameters will come from user input).
I have spent some time on the same question recently myself and found no perfect solution for this. In my opinion it is misleading, from a syntax point of view, to use named placeholders, as it gives the impression that the argument order is insignificant.
I believe this is neither an issue with, nor the responsibility of, the ROracle library, as the equivalent PL/SQL produces the same result, which anyone who doesn't know PL/SQL well enough would not expect:
DECLARE
  row1 number;
  row2 number;
BEGIN
  row1 := 1;
  row2 := 0;
  -- binding here is positional, so row2 feeds :row1 and row1 feeds :row2
  EXECUTE IMMEDIATE 'insert into RORACLE_TEST values(:row1, :row2)' USING row2, row1;
END;
/
As we migrated our applications from RODBC (with RODBCext) to ROracle, we kept using ? as the bind-variable placeholder and replaced the ?s with colon-style placeholders in our database-connectivity API. At least this shouldn't raise anyone's eyebrows:
# placeholders '?' are replaced with :1 and :2 in custom_dbGetQuery()
insStr <- "insert into RORACLE_TEST values(?, ?)"
custom_dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
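custom_dbGetQuery() is part of our in-house API and not shown here, but a minimal sketch of the replacement step might look like this (an assumption on my part: exactly one ? per bound column, listed in column order):
# Sketch: replace each '?' with a positional placeholder :1, :2, ... then delegate to ROracle
custom_dbGetQuery <- function(con, statement, data) {
  for (i in seq_len(ncol(data))) {
    statement <- sub("?", paste0(":", i), statement, fixed = TRUE)
  }
  dbGetQuery(con, statement, data)
}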
Edit: added suggestion to reorder data frame
You could go further by reordering the data frame yourself by checking placeholder occurrences in the query string:
custom_dbGetQuery <- function(con, insStr, data) {
  names <- names(data)
  # Locate each :placeholder in the query; each must occur exactly once
  # (the regex expects a non-word character after the name, e.g. a comma or parenthesis)
  name.pos <- sort(sapply(names, function(ph) {
    regexp <- paste0(":", ph, "[^\\w]")
    matches <- gregexpr(regexp, insStr, perl = TRUE)
    matches <- unlist(matches)
    stopifnot(length(matches) == 1, all(matches != -1))
    matches
  }))
  # Reorder the data frame columns to match the order of the placeholders
  data <- data[, names(name.pos)]
  print("running query:")
  print(insStr)
  print("using data:")
  print(data)
  dbGetQuery(con, insStr, data)
}
insStr <- "insert into RORACLE_TEST values(:row1, :row2)"
custom_dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
# Output:
# [1] "running query:"
# [1] "insert into RORACLE_TEST values(:row1, :row2)"
# [1] "using data:"
# row1 row2
# 1 1 0
A tidyverse solution inspired by sqlInterpolate (thanks @Scarabee!):
readr::read_file('query.sql') %>%
stringr::str_replace_all(., ':','?') %>%
DBI::sqlInterpolate(con, ., row2 = 0, row1 = 1) %>%
DBI::dbGetQuery(con, .)
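For instance, if query.sql contained the insert statement from above, :row1 and :row2 are first rewritten as ?row1 and ?row2, and sqlInterpolate() then substitutes the named values safely (a sketch; note that str_replace_all would also rewrite any other colon in the file):
DBI::sqlInterpolate(con, "insert into RORACLE_TEST values(?row1, ?row2)", row2 = 0, row1 = 1)
# <SQL> insert into RORACLE_TEST values(1, 0)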
This question is an extension of How to quickly export data from R to SQL Server. Currently I am using the following code:
# DB Handle for config file #
dbhandle <- odbcDriverConnect()
# save the data in the table finally
sqlSave(dbhandle, bp, "FACT_OP", append=TRUE, rownames=FALSE, verbose = verbose, fast = TRUE)
# varTypes <- c(Date="datetime", QueryDate = "datetime")
# sqlSave(dbhandle, bp, "FACT_OP", rownames=FALSE,verbose = TRUE, fast = TRUE, varTypes=varTypes)
# DB handle close
odbcClose(dbhandle)
I have also tried the following approach, which works beautifully, and I gained significant speed as well.
toSQL = data.frame(...);
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel,"BULK
INSERT Yada.dbo.yada
FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\\n'
)");
But my issue is that I cannot keep my data at rest between the transactions (writing the data to a file is not an option because of data security), so I am looking for a solution to bulk insert directly from memory or from a cache. Thanks for the help.
Good question - also useful in instances where the BULK INSERT permissions cannot be setup for whatever reason.
I threw together this poor man's solution a while back when I had enough data that sqlSave was too slow, but not enough to justify setting up BULK INSERT, so it does not require any data to be written to a file. The primary reason that sqlSave and parameterized queries are so slow for inserting data is that each row is inserted with a new INSERT statement. Having R write the multi-row INSERT statement manually bypasses this, as in my example below:
library(RODBC)
channel <- ...
dataTable <- ...relevant data...
numberOfThousands <- floor(nrow(dataTable)/1000)
extra <- nrow(dataTable)%%1000
thousandInsertQuery <- function(channel, dat, range){
  sqlQuery(channel, paste0("INSERT INTO Database.dbo.Responses (IDNum,State,Answer)
    VALUES "
    , paste0(
        sapply(range, function(k) {
          # Build one "(id,'state','answer')" tuple per row, doubling any single quotes
          paste0("(", dat$IDNum[k], ",'",
                 dat$State[k], "','",
                 gsub("'", "''", dat$Answer[k], fixed = TRUE), "')")
        })
      , collapse = ",")))
}
if(numberOfThousands)
  for(n in 1:numberOfThousands)
  {
    # note the argument order: (channel, dat, range)
    thousandInsertQuery(channel, dataTable, (1000*(n-1)+1):(1000*n))
  }
if(extra)
  thousandInsertQuery(channel, dataTable, (1000*numberOfThousands+1):(1000*numberOfThousands+extra))
SQL Server's INSERT ... VALUES statement will only accept up to 1000 rows at a time, so this code breaks the data up into chunks (much more efficiently than one row at a time).
The thousandInsertQuery function will obviously have to be customized to handle whatever columns your data frame has; note also that there are single quotes around the character/factor columns, and a gsub to escape any single quotes that might be inside them. Other than this there are no safeguards against SQL injection attacks.
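If you would rather avoid hand-built strings entirely, a hedged alternative sketch using the RODBCext package (assuming it is installed): it binds the parameters properly and escapes values for you, though its row-at-a-time execution is slower than the chunked INSERT above.
library(RODBCext)
# One parameterized INSERT, executed for each row of dataTable, with proper escaping
sqlExecute(channel,
           "INSERT INTO Database.dbo.Responses (IDNum, State, Answer) VALUES (?, ?, ?)",
           data = dataTable)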
What about using DBI::dbWriteTable() function?
Example below (I am connecting my R code to an AWS RDS instance of MS SQL Express):
library(DBI)
library(RJDBC)
library(tidyverse)
# Specify where your driver lives
drv <- JDBC(
"com.microsoft.sqlserver.jdbc.SQLServerDriver",
"c:/R/SQL/sqljdbc42.jar")
# Connect to AWS RDS instance
conn <- drv %>%
dbConnect(
host = "jdbc:sqlserver://xxx.ccgqenhjdi18.ap-southeast-2.rds.amazonaws.com",
user = "xxx",
password = "********",
port = 1433,
dbname= "qlik")
if(0) { # check what the conn object has access to
queryResults <- conn %>%
dbGetQuery("select * from information_schema.tables")
}
# Create test data
example_data <- data.frame(animal=c("dog", "cat", "sea cucumber", "sea urchin"),
feel=c("furry", "furry", "squishy", "spiny"),
weight=c(45, 8, 1.1, 0.8))
# Works in 20ms in my case
system.time(
conn %>% dbWriteTable(
"qlik.export.test",
example_data
)
)
# Let us see if we see the exported results
conn %>% dbGetQuery("select * FROM qlik.export.test")
# Let's clean the mess and force-close connection at the end of the process
conn %>% dbDisconnect()
It works pretty fast for small amounts of data and seems rather elegant if you want a data.frame -> SQL table solution.
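One caveat: depending on the backend, dbWriteTable may refuse to touch an existing table. RJDBC's method accepts overwrite and append arguments to control this (a sketch; the defaults vary between DBI backends):
# Append to an existing table instead of recreating it
conn %>% dbWriteTable("qlik.export.test", example_data, overwrite = FALSE, append = TRUE)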
Enjoy!
Building on @jpd527's solution, which I found really worth digging into...
require(RODBC)
channel <- #connection parameters
dbPath <- # path to your table, database.table
data <- # the DF you have prepared for insertion, /!\ beware of column names and value types...
# Function to insert 1000 rows of data in one sqlQuery call, coming from
# any DF and into any database.table
insert1000Rows <- function(channel, dbPath, data, range){
# Defines columns names for the database.table
columns <- paste(names(data), collapse = ", ")
# Initialize a string which will incorporate all 1000 rows of values
values <- ""
# Not very elegant, but appropriately builds the values (a, b, c...), (d, e, f...) into a string
for (i in range) {
for (j in 1:ncol(data)) {
# First column
if (j == 1) {
if (i == min(range)) {
# First row, only "("
values <- paste0(values, "(")
} else {
# Next rows, ",("
values <- paste0(values, ",(")
}
}
# Value Handling
values <- paste0(
values
# Handling NA values you want to insert as NULL values
, ifelse(is.na(data[i, j])
, "null"
# Handling numeric values you want to insert as INT
, ifelse(is.numeric(data[i, j])
          , data[i, j]
# Else handling as character to insert as VARCHAR
, paste0("'", data[i, j], "'")
)
)
)
# Separator for columns
if (j == ncol(data)) {
# Last column, close parenthesis
values <- paste0(values, ")")
} else {
# Other columns, add comma
values <- paste0(values, ",")
}
}
}
# Once the string is built, insert it into SQL Server
sqlQuery(channel,paste0("insert into ", dbPath, " (", columns, ") values ", values))
}
This insert1000Rows function is used in a loop in the next function, sqlInsertAll, for which you simply define which DF you want to insert into which database.table.
# Main function which uses the insert1000rows function in a loop
sqlInsertAll <- function(channel, dbPath, data) {
numberOfThousands <- floor(nrow(data) / 1000)
extra <- nrow(data) %% 1000
if (numberOfThousands) {
for(n in 1:numberOfThousands) {
insert1000Rows(channel, dbPath, data, (1000 * (n - 1) + 1):(1000 * n))
print(paste0(n, "/", numberOfThousands))
}
}
if (extra) {
insert1000Rows(channel, dbPath, data, (1000 * numberOfThousands + 1):(1000 * numberOfThousands + extra))
}
}
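Hypothetical usage, with channel, dbPath, and data as sketched at the top of this answer:
# Insert the whole data frame in 1000-row batches
sqlInsertAll(channel, dbPath, data)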
With this, I am able to insert 250k rows of data in 5 minutes or so, whereas it took more than 24 hours using sqlSave from the RODBC package.
Basically, I am trying to derive the WHERE part of a SELECT statement by unlisting and pasting a list, where the list names represent the database table's columns and the respective list values are the parameters for the WHERE clause. Here is a simplified example...
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
lst
$DATE
[1] "2015-10-25"
$NUM
[1] "0001"
$PROD
[1] "SOMETHING"
This would ideally be transformed into (the interesting bit starting in the second line after the WHERE):
"SELECT SOME_COLUMNS WHERE
DATE = '", lst$DATE, "' AND
NUM = '", lst$NUM, "' AND
PROD = '" lst$PROD ,"'")
I am quite sure that someone knows of some fancy combination of apply(),
paste(..,collapse ="' AND ") and/or substitute() that can accomplish that in an elegant form, but I am stuck.
I don't know if this is elegant enough but it should work:
sql <- paste0("SELECT ",
              paste0(names(lst), collapse = ","),
              " WHERE\n",
              paste(lapply(names(lst), function(x) paste0(x, " = '", lst[[x]], "'")), collapse = " AND\n"))
> cat(sql)
SELECT DATE,NUM,PROD WHERE
DATE = '2015-10-25' AND
NUM = '0001' AND
PROD = 'SOMETHING'
sprintf is generally useful:
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
q <- "SELECT SOME_COLUMNS WHERE DATE = '%s' AND NUM = '%s' AND PROD = '%s'"
> sprintf(q,lst[[1]],lst[[2]],lst[[3]])
[1] "SELECT SOME_COLUMNS WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
Also, see my other answer here for more ideas. If you do this a lot, it pays to build up some specialized tools for it, as I outline in that answer.
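If you do this a lot, a small reusable sketch (a hypothetical helper, not from any package) that builds the predicates from any named list:
# Build "name = 'value'" predicates from a named list and join them with AND
where_clause <- function(lst) {
  vals <- vapply(lst, as.character, character(1))  # as.character keeps Dates readable
  paste(sprintf("%s = '%s'", names(lst), vals), collapse = " AND ")
}
where_clause(lst)
# [1] "DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"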
Are you looking for something like this?
lst2sql <- function(lst) {
sql <- "SELECT col1, col2 FROM table1 WHERE"
predicates <- vapply(names(lst), function(n) { paste0(n, " = '", lst[[n]], "'") }, character(1))
paste(sql, paste(predicates, collapse=" AND "))
}
When called on your example list will produce:
"SELECT col1, col2 FROM table1 WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
I'm in the process of learning R so I can wave SAS goodbye. I'm still new to this, and I somehow have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable into the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
  paste("insert into TestTable (UniqueID) Values (", b, ")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because an extra c is generated, I assume from R combining the values, when I give the command paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
  ## note paste(b[[1]], collapse = ", ") in the line below
  paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "), ")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can provide weird results because paste must return a character vector.
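A tiny self-contained illustration of that behaviour:
b <- data.frame(Noinscr = c(8535735, 8449336))
paste(b)                           # "c(8535735, 8449336)" - the whole column is deparsed as one element
paste(b$Noinscr, collapse = ", ")  # "8535735, 8449336"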