String length limitation using dbBind of DBI in R

I want to use DBI::dbBind to run parameterized queries that update a SQL Server database, but the string values I write get truncated at 256 characters. With the code below I expect to see a string of 500 "n" characters in the database, but I only see 256.
conn <- DBI::dbConnect(odbc::odbc(), Driver = "xxx", Server = "serverx", Database = "dbx", UID = "pathx", PWD = "passwd", PORT = 1234)
query <- "UPDATE tableA SET fieldA = ? WHERE rowID = ?"
para <- list(strrep("n", 500), "id12345")
sentQuery <- dbSendQuery(conn, query)
dbBind(sentQuery, para)
dbClearResult(sentQuery)
I also tried writing the 500 "n" characters without using dbBind, and the result is fine: I see all 500. I guess this rules out some possible culprits, such as the connection and the field definition in the database. This is the code that works.
query <- (paste0("UPDATE tableA SET fieldA = '", strrep("n", 500), "' WHERE rowID = 'id12345'"))
dbExecute(conn, query)
I found one similar question without an answer (Truncated updated string with R DBI package). However, that question doesn't mention dbBind, so I am posting this one for higher specificity.
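One diagnostic worth trying (a sketch only, reusing the conn, table, and column names from the question; whether it changes anything depends on the ODBC driver): run the same update through dbExecute(), which sends, binds, and clears in one call, and see whether casting the placeholder to a wide type on the server side avoids the truncation. The NVARCHAR(MAX) cast is an assumption and should match the actual definition of fieldA.
library(DBI)
# Same update, but with DBI handling send/bind/clear in one call, and the
# parameter explicitly cast to a wide type on the SQL Server side (assumption)
query <- "UPDATE tableA SET fieldA = CAST(? AS NVARCHAR(MAX)) WHERE rowID = ?"
dbExecute(conn, query, params = list(strrep("n", 500), "id12345"))
# Check how many characters actually landed in the table
dbGetQuery(conn,
           "SELECT LEN(fieldA) AS len_fieldA FROM tableA WHERE rowID = ?",
           params = list("id12345"))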

Related

How to connect to Teradata database using Dask?

The pandas code I have used for connecting to Teradata is:
import teradatasql
import pandas as pd

database = config.get('Teradata connection', 'database')
host = config.get('Teradata connection', 'host')
user = config.get('Teradata connection', 'user')
pwd = config.get('Teradata connection', 'pwd')

with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    query1 = "SELECT * FROM {}.{}".format(database, tables)
    df = pd.read_sql_query(query1, connect)
Now I need to use the Dask library for loading this big data as an alternative to pandas.
Please suggest a way to make the same connection to Teradata with Dask.
Teradata appears to have a SQLAlchemy dialect, so you should be able to install that, set your connection string appropriately, and use Dask's existing read_sql_table function.
Alternatively, you could do this by hand: you need to decide on a set of conditions which will partition the data for you, each partition being small enough for your workers to handle. Then you can make a set of partitions and combine them into a dataframe as follows
import dask
import dask.dataframe as dd

def get_part(condition):
    with teradatasql.connect(host=host, user=user, password=pwd) as connect:
        query1 = "SELECT * FROM {}.{} WHERE {}".format(database, tables, condition)
        return pd.read_sql_query(query1, connect)

parts = [dask.delayed(get_part)(cond) for cond in conditions]
df = dd.from_delayed(parts)
(ideally, you can derive the meta= parameter for from_delayed beforehand, perhaps by getting the first 10 rows of the original query).

In R connected to an Access Database through ODBC, how can I update a text field?

I have R linked to an Access database using the ODBC and DBI packages. I scoured the internet and couldn't find a way to write an update query, so I'm using the dbSendStatement function to update entries individually. Combined with a for loop, this effectively works like an update query, with one snag: when I try to update any field in the database that is text, I get an error that says "[Microsoft][ODBC Microsoft Access Driver] One of your parameters is invalid."
DBI::dbSendStatement(conn = dB.Connection, statement = paste("UPDATE DC_FIMs_BLDG_Lvl SET kWh_Rate_Type = ",dquote(BLDG.LVL.Details[i,5])," WHERE FIM_ID = ",BLDG.LVL.Details[i,1]," AND BUILDING_ID = ",BLDG.LVL.Details[i,2],";", sep = ""))
If it's easier, when pasted, the code reads like this:
DBI::dbSendStatement(conn = dB.Connection, statement = paste("UPDATE DC_FIMs_BLDG_Lvl SET kWh_Rate_Type = “Incremental” WHERE FIM_ID = 26242807 AND BUILDING_ID = 515;", sep = ""))
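Not part of the original post, but one way to sidestep the quoting problem entirely (a sketch only, reusing the connection, table, and data frame names from the question; whether the Access ODBC driver accepts bound parameters here is an assumption) is to let DBI bind the values instead of pasting them into the statement:
# Parameterized UPDATE: the text value is passed to the driver separately,
# so no quote characters have to be pasted into the SQL at all.
stmt <- DBI::dbSendStatement(
  conn = dB.Connection,
  statement = "UPDATE DC_FIMs_BLDG_Lvl SET kWh_Rate_Type = ? WHERE FIM_ID = ? AND BUILDING_ID = ?"
)
DBI::dbBind(stmt, list(BLDG.LVL.Details[i, 5],
                       BLDG.LVL.Details[i, 1],
                       BLDG.LVL.Details[i, 2]))
DBI::dbClearResult(stmt)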

How to use glue_data_sql to write safe parameterized queries on an SQL server database?

The problem
I want to write a wrapper around some DBI functions that allows safe execution of parameterized queries. I've found this resource that explains how to use the glue package to insert parameters into an SQL query. However, there seem to be two distinct ways to use the glue package to insert parameters:
Method 1 involves using ? in the SQL query where the parameters need to be inserted, and then using dbBind to fill them in. Example from the link above:
library(glue)
library(DBI)
airport_sql <- glue_sql("SELECT * FROM airports WHERE faa = ?")
airport <- dbSendQuery(con, airport_sql)
dbBind(airport, list("GPT"))
dbFetch(airport)
Method 2 involves using glue_sql or glue_data_sql to fill in the parameters directly (no use of dbBind). Again, an example from the link above:
airport_sql <-
  glue_sql(
    "SELECT * FROM airports WHERE faa IN ({airports*})",
    airports = c("GPT", "MSY"),
    .con = con
  )
airport <- dbSendQuery(con, airport_sql)
dbFetch(airport)
I would prefer using the second method because it has a lot of extra functionality, such as collapsing multiple values for an IN statement in the WHERE clause of an SQL statement. See the second example above for how that works (note the * after the parameter, which indicates it must be collapsed). The question is: is this safe against SQL injection? (Are there other things I need to worry about?)
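As a side illustration (not from the original post): glue_sql routes character values through DBI's dbQuoteString, which is what the safety question hinges on. DBI::ANSI() is used below purely as a stand-in connection; the exact escaping rules depend on the real backend.
library(DBI)
library(glue)
# A value that would break out of the string literal if pasted in unquoted
malicious <- "GPT'; DROP TABLE airports; --"
# glue_sql() escapes the embedded single quote instead of letting it
# terminate the literal
glue_sql("SELECT * FROM airports WHERE faa = {malicious}", .con = DBI::ANSI())
# <SQL> SELECT * FROM airports WHERE faa = 'GPT''; DROP TABLE airports; --'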
My code
This is currently the code I have for my wrapper.
paramQueryWrapper <- function(
  sql,
  params = NULL,
  dsn = standard_dsn,
  login = user_login,
  pw = user_pw
){
  if(missing(sql) || length(sql) != 1 || !is.character(sql)){
    stop("Please provide sql as a character vector of length 1.")
  }
  if(!is.null(params)){
    if(!is.list(params)) stop("params must be a (named) list (or NULL).")
    if(length(params) < 1) stop("params must be either NULL, or contain at least one element.")
    if(is.null(names(params)) || any(names(params) == "")) stop("All elements in params must be named.")
  }

  con <- DBI::dbConnect(
    odbc::odbc(),
    dsn = dsn,
    UID = login,
    PWD = pw
  )
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Replace params with corresponding values and execute query
  sql <- glue::glue_data_sql(.x = params, sql, .con = con)
  query <- DBI::dbSendQuery(conn = con, sql)
  on.exit(DBI::dbClearResult(query), add = TRUE, after = FALSE)

  return(tibble::as_tibble(DBI::dbFetch(query)))
}
My question
Is this safe against SQL injection, especially since I am not using dbBind?
Epilogue
I know that there is already a wrapper called dbGetQuery that allows parameters (see this question for more info; look for the answer by @krlmlr for an example with a parameterized query). But this again relies on the first method using ?, which is much more basic in terms of functionality.
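For reference (not from the original post), the ?-based route mentioned here looks roughly like this; the con, airports, and faa names are carried over from the earlier examples:
library(DBI)
# dbGetQuery() accepts a params argument, so send, bind, fetch and clear
# are handled in a single call with positional ? placeholders
airports_df <- dbGetQuery(
  con,
  "SELECT * FROM airports WHERE faa = ?",
  params = list("GPT")
)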

How do I find the schema of a table in an ODBC connection by name?

I'm using the odbc package to connect to a MS SQL Server
con <- dbConnect(odbc::odbc(),
                 Driver = "ODBC Driver 13 for SQL Server",
                 Server = "server",
                 Database = "database",
                 UID = "user",
                 PWD = "pass",
                 Port = 1111)
This server has many tables, so I'm using dbListTables(con) to search for the ones containing a certain substring. But once I find them I need to discover which schema they are in to be able to query them. I'm currently doing this manually (looking for the name of the table in each schema), but is there any way I can get the schema of all tables that match a string?
Consider running an SQL query with a LIKE search against the built-in INFORMATION_SCHEMA metadata views, if your user has sufficient privileges.
SELECT SCHEMA_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME LIKE '%some string%'
Call the above from R with odbc, using a parameterized query for the wildcard search:
# PREPARED STATEMENT
strSQL <- paste("SELECT SCHEMA_NAME",
                "FROM INFORMATION_SCHEMA.SCHEMATA",
                "WHERE SCHEMA_NAME LIKE ?SEARCH")

# SAFELY INTERPOLATED QUERY
query <- sqlInterpolate(conn, strSQL, SEARCH = '%some string%')

# DATA FRAME BUILT FROM RESULT SET
schema_names_df <- dbGetQuery(conn, query)
I found a workaround using the RODBC package:
library('RODBC')
# First connect to the DB
dbconn <- odbcDriverConnect("driver = {ODBC Driver xx for SQL Server};
                             server = server;
                             database = database;
                             uid = username;
                             pwd = password")
# Now fetch the DB tables
sqlTables(dbconn)
For my specific DB I get:
names(sqlTables(dbconn))
[1] "TABLE_CAT" "TABLE_SCHEM" "TABLE_NAME" "TABLE_TYPE" "REMARKS"

Data from ODBC blob not matching return from SQL query

I'm reading a BLOB field from an ODBC data connection (the BLOB field is a file). I connect and query the database, returning the blob and the filename. However, the blob itself does not contain the same data as I find in the database. My code is as follows, along with the data returned versus what is in the DB.
library(RODBC)
sqlret<-odbcConnect('ODBCConnection')
qry<-'select content,Filename from document with(nolock) where documentid = \'xxxx\''
df<-sqlQuery(sqlret,qry)
close(sqlret)
rootpath<-paste0(getwd(),'/DocTest/')
dir.create(rootpath,showWarnings = FALSE)
content<-unlist(df$content)
fileout<-file(paste0(rootpath,df$Filename),"w+b")
writeBin(content, fileout)
close(fileout)
database blob is
0x50726F642050434E203A0D0A35363937313533320D0A33383335323133320D0A42463643453335380D0A0D0A574C4944203A0D0A0D0…
the dataframe’s content is
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004b020000000000000d0000f1000000008807840200000000d0f60c0c0000…
The filenames match up, as does the size of the content/blob.
The exact approach you take may vary depending on your ODBC driver. I'll demonstrate how I do this on MS SQL Server, and hopefully you can adapt it to your needs.
I'm going to use a table in my database called InsertFile with the following definition:
CREATE TABLE [dbo].[InsertFile](
    [OID] [int] IDENTITY(1,1) NOT NULL,
    [filename] [varchar](50) NULL,
    [filedata] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Now let's create a file that we will push into the database.
file <- "hello_world.txt"
write("Hello world", file)
I need to do a little work to prep the bytes of this file to go into SQL. I use this function for that.
prep_file_for_sql <- function(filename){
  # Read the file as raw bytes, convert each byte to its two-character hex
  # representation, and collapse them into one long string for the INSERT
  bytes <-
    mapply(FUN = readBin,
           con = filename,
           what = "raw",
           n = file.info(filename)[["size"]],
           SIMPLIFY = FALSE)
  chars <-
    lapply(X = bytes,
           FUN = as.character)
  vapply(X = chars,
         FUN = paste,
         collapse = "",
         FUN.VALUE = character(1))
}
Now, this is a bit strange, but the SQL Server ODBC driver is pretty good at writing VARBINARY columns, but terrible at reading them.
Coincidentally, the SQL Server Native Client 11.0 ODBC driver is terrible at writing VARBINARY columns, but okay-ish with reading them.
So I'm going to have two RODBC objects, conn_write and conn_read.
conn_write <-
  RODBC::odbcDriverConnect(
    paste0("driver=SQL Server; server=[server_name]; database=[database_name];",
           "uid=[user_name]; pwd=[password]")
  )

conn_read <-
  RODBC::odbcDriverConnect(
    paste0("driver=SQL Server Native Client 11.0; server=[server_name]; database=[database_name];",
           "uid=[user_name]; pwd=[password]")
  )
Now I'm going to insert the text file into the database using a parameterized query.
# sqlExecute() comes from the RODBCext package
library(RODBCext)

sqlExecute(
  channel = conn_write,
  query = "INSERT INTO dbo.InsertFile (filename, filedata) VALUES (?, ?)",
  data = list(file,
              prep_file_for_sql(file)),
  fetch = FALSE
)
And now to read it back out using a parameterized query. The unpleasant trick to use here is recasting your VARBINARY(MAX) column as a smaller VARBINARY (don't ask me why, but it works).
X <- sqlExecute(
  channel = conn_read,
  query = paste0("SELECT OID, filename, ",
                 "CAST(filedata AS VARBINARY(8000)) AS filedata ",
                 "FROM dbo.InsertFile WHERE filename = ?"),
  data = list("hello_world.txt"),
  fetch = TRUE,
  stringsAsFactors = FALSE
)
Now you can look at the contents with
unlist(X$filedata)
And write the file with
writeBin(unlist(X$filedata),
         con = "hello_world2.txt")
BIG DANGEROUS CAVEAT
You need to be aware of the size of your files. I usually store files as VARBINARY(MAX), and SQL Server isn't very friendly about exporting those through ODBC (I'm not sure about other SQL engines; see RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX) for more details).
The only way I've found to get around this is to recast the VARBINARY(MAX) as a VARBINARY(8000). That is obviously a terrible solution if you have more than 8000 bytes in your file. When I need to get around this, I've had to split the VARBINARY(MAX) column into multiple new columns, each of length 8000, and then paste them all together in R (check out: Reconstitute PNG file stored as RAW in SQL Database).
As of yet, I've not come up with a generalized solution to this problem. Perhaps that's something I should spend more time on, though.
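To make the chunking idea above concrete, here is a rough sketch (not part of the original answer): it reuses conn_read, sqlExecute(), and dbo.InsertFile from the example above, pulls fixed 8000-byte slices with SUBSTRING (three slices are shown purely for illustration; a larger file needs more, or a dynamically built query), and reassembles the raw vectors in R.
# Pull fixed-size slices of the VARBINARY(MAX) column and stitch them back together
chunk_sql <- paste0(
  "SELECT OID, filename, ",
  "CAST(SUBSTRING(filedata, 1, 8000) AS VARBINARY(8000)) AS chunk1, ",
  "CAST(SUBSTRING(filedata, 8001, 8000) AS VARBINARY(8000)) AS chunk2, ",
  "CAST(SUBSTRING(filedata, 16001, 8000) AS VARBINARY(8000)) AS chunk3 ",
  "FROM dbo.InsertFile WHERE filename = ?"
)
X <- sqlExecute(
  channel = conn_read,
  query = chunk_sql,
  data = list("hello_world.txt"),
  fetch = TRUE,
  stringsAsFactors = FALSE
)
# Keep only the chunks that actually came back with bytes, then write the file
chunks <- list(unlist(X$chunk1), unlist(X$chunk2), unlist(X$chunk3))
chunks <- Filter(function(ch) is.raw(ch) && length(ch) > 0, chunks)
writeBin(do.call(c, chunks), con = "hello_world3.txt")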
The 8000-byte limit is imposed by the ODBC driver, not by the RODBC, DBI, or odbc packages.
Use the latest driver to remove the limitation: ODBC Driver 17 for SQL Server
https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017
There is no need to cast the column to VARBINARY with this latest driver. The following should work:
X <- sqlExecute(
  channel = conn_read,
  query = paste0("SELECT OID, filename, ",
                 "filedata ",
                 "FROM dbo.InsertFile WHERE filename = ?"),
  data = list("hello_world.txt"),
  fetch = TRUE,
  stringsAsFactors = FALSE
)
