Basically I am trying to derive the WHERE part of a SELECT statement by unlisting and paste()-ing a list, where the list names represent the database table columns and the respective list values are the parameters for the WHERE clause. Here is a simplified example ...
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
lst
$DATE
[1] "2015-10-25"
$NUM
[1] "0001"
$PROD
[1] "SOMETHING"
This would ideally be transformed into (the interesting bit starting in the second line after the WHERE):
"SELECT SOME_COLUMNS WHERE
DATE = '", lst$DATE, "' AND
NUM = '", lst$NUM, "' AND
PROD = '" lst$PROD ,"'")
I am quite sure that someone knows of some fancy combination of apply(),
paste(..,collapse ="' AND ") and/or substitute() that can accomplish that in an elegant form, but I am stuck.
I don't know if this is elegant enough but it should work:
sql <- paste0("SELECT ",
paste0(names(lst),collapse=','),
" WHERE\n",
paste(lapply(names(lst),function(x)paste0(x," = '",lst[[x]],"'")),collapse="AND\n"))
> cat(sql)
SELECT DATE,NUM,PROD WHERE
DATE = '2015-10-25'AND
NUM = '0001'AND
PROD = 'SOMETHING'
sprintf is generally useful:
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
q <- "SELECT SOME_COLUMNS WHERE DATE = '%s' AND NUM = '%s' AND PROD = '%s'"
> sprintf(q,lst[[1]],lst[[2]],lst[[3]])
[1] "SELECT SOME_COLUMNS WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
Also, see my other answer here for more ideas. If you do this a lot, it pays to build up some specialized tools for it, as I outline in that answer.
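For illustration, one possible shape for such a tool (a sketch only, not the helper from the linked answer; it leans on DBI::dbQuoteLiteral() for escaping and uses ANSI() as a stand-in connection, so adapt it to your driver):
library(DBI)
# Hypothetical helper: build a WHERE clause from a named list, quoting each
# value with dbQuoteLiteral() so strings and dates are escaped properly.
where_clause <- function(con, lst) {
  preds <- vapply(
    names(lst),
    function(col) paste0(col, " = ", dbQuoteLiteral(con, lst[[col]])),
    character(1)
  )
  paste(preds, collapse = " AND ")
}
lst <- list(DATE = as.Date("2015-10-25"), NUM = "0001", PROD = "SOMETHING")
where_clause(ANSI(), lst)
# roughly: "DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"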
Are you looking for something like this?
lst2sql <- function(lst) {
  sql <- "SELECT col1, col2 FROM table1 WHERE"
  predicates <- vapply(names(lst), function(n) paste0(n, " = '", lst[[n]], "'"), character(1))
  paste(sql, paste(predicates, collapse = " AND "))
}
When called on your example list, it will produce:
"SELECT col1, col2 FROM table1 WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
Related
I am currently building a (large) survey and need to send the responses people provide to a database. I have set up my database connection using the pool and RMariaDB packages, and I have written the following function to construct the SQL queries and submit my data (the data is secured with SSL certificates and all this information is passed through the list db_config).
save_db <- function(db_pool, x, db_name, db_config, replace_val) {
  # Construct the DB query to be sent to the database
  if (!replace_val) {
    query <- sprintf(
      "INSERT INTO %s (%s) VALUES ('%s')",
      db_name,
      paste(names(x), collapse = ", "),
      paste(x, collapse = "', '")
    )
  } else {
    query <- sprintf(
      "UPDATE %s SET %s WHERE %s;",
      db_name,
      paste(paste0(names(x)[-1], " = '", x[-1], "'"), collapse = ", "),
      paste0(names(x)[1], " = '", x[1], "'")
    )
  }
  # Submit the query to the database via the opened connection
  RMariaDB::dbExecute(db_pool, query)
}
db_pool is the pool object handling my database connections; x is a named vector with the data that I am sending to the database, where the names correspond to the column names in my MariaDB table and the values are stored as data blobs; db_name is the name of my database; replace_val is a boolean.
The data blobs are essentially different output objects from the survey, e.g. vectors or matrices of responses, turned into character strings using toJSON() from the jsonlite package.
So far, so good. I am able to send data to the database, download it and reconstruct the responses using the fromJSON() command. All is good. However, I do have one security concern. In my survey, I do have a few open-ended questions where people can write what they want. While unlikely, I am concerned that someone might use a SQL injection attack. Worst case scenario, I lose all my data.
I know of the sqlInterpolate() function from the DBI package. From my understanding, the function escapes any quotation marks, meaning that any value submitted will be turned into a safe string.
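For instance, a quick check with DBI::ANSI() as a stand-in connection (so no live database is needed) shows the kind of escaping I mean:
library(DBI)
# The single quote inside the value gets doubled, so it can no longer break
# out of the string literal:
sqlInterpolate(ANSI(), "SELECT * FROM tbl WHERE comment = ?val", val = "don't")
# <SQL> SELECT * FROM tbl WHERE comment = 'don''t'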
What I have not been able to do is modify my function above to work with sqlInterpolate(). In my case, x is a named vector of length seven where each vector element is a JSON string. Essentially, I need to use sqlInterpolate() on each of the JSON strings. I was wondering if there is an "easy" way of doing this, or if my best course of action would be to completely rewrite my function to send seven individual inserts to the DB, i.e. one for each vector element.
A rather simplified example would be something like this:
library(jsonlite)
# Create some data to test the string on
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
  "INSERT INTO %s (%s) VALUES ('%s')",
  db_name,
  paste(names(x), collapse = ", "),
  paste(x, collapse = "', '")
)
What I would need to interpolate are the values captured by ('%s') in the sprintf() statement (and similarly for the update query). Or am I right in thinking that just turning everything into a JSON string already sanitizes my DB input?
Any help would be much appreciated.
Having spent several hours trying and failing at this today, I believe I managed to find a workaround. I have done some testing and it appears to be working. I am posting an answer to my own question in case someone has a similar problem at a different time.
My updated function now looks like this:
save_db <- function(db_pool, x, db_name, db_config, replace_val) {
  # Interpolate the elements of x
  x <- do.call(c, lapply(x, function(y) {
    sql <- "?value"
    sqlInterpolate(db_pool, sql, value = y)
  }))
  # Construct the DB query to be sent to the database
  if (!replace_val) {
    query <- sprintf(
      "INSERT INTO %s (%s) VALUES (%s)",
      db_name,
      paste(names(x), collapse = ", "),
      paste(x, collapse = ", ")
    )
  } else {
    query <- sprintf(
      "UPDATE %s SET %s WHERE %s;",
      db_name,
      paste(paste0(names(x)[-1], " = ", x[-1]), collapse = ", "),
      paste0(names(x)[1], " = ", x[1])
    )
  }
  # Submit the query to the database via the opened connection
  RMariaDB::dbExecute(db_pool, query)
}
It appears that the key was to only use the interpolation on the actual JSON string itself, like so:
x <- do.call(c, lapply(x, function(y) {
  sql <- "?value"
  sqlInterpolate(db_pool, sql, value = y)
}))
And the rest of the function can be used as is. To see this, let's use the example I provided in my original question:
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
  "INSERT INTO %s (%s) VALUES ('%s')",
  db_name,
  paste(names(x), collapse = ", "),
  paste(x, collapse = "', '")
)
Which yields the output:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don't\"]')"
If I now transform my x as above and use the updated sprintf() call (note that the extra single quotation marks around the %s are removed):
x <- do.call(c, lapply(x, function(y) {
  sql <- "?value"
  sqlInterpolate(ANSI(), sql, value = y)
}))
sprintf(
  "INSERT INTO %s (%s) VALUES (%s)",
  db_name,
  paste(names(x), collapse = ", "),
  paste(x, collapse = ", ")
)
I will get:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don''t\"]')"
And we see that the single quotation mark in don't is correctly escaped by doubling. If I have missed something crucial in my own solution, please feel free to comment on it.
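For what it's worth, a possibly cleaner alternative (a sketch only, assuming x still holds the raw JSON strings and db_name the target table; I have not tested it against my real schema) would be to keep placeholders in the statement and let dbExecute() bind the values itself:
# Sketch: fully parameterised insert; the driver handles all escaping.
# Only the values are bound; table and column names still come from trusted code.
insert_db <- function(db_pool, x, db_name) {
  query <- sprintf(
    "INSERT INTO %s (%s) VALUES (%s)",
    db_name,
    paste(names(x), collapse = ", "),
    paste(rep("?", length(x)), collapse = ", ")
  )
  RMariaDB::dbExecute(db_pool, query, params = as.list(unname(x)))
}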
I am trying to process a large number of queries with parameters in R using ROracle. I know which parameters appear in each query, but I don't know in which order they appear. I am therefore looking for a way to submit the parameters by name in each query. Sample code:
library(ROracle)
# establish connection to DB
drv <- dbDriver("Oracle")
con <- dbConnect(drv, "User", "password", dbname = "DB")
# create table
createTab <- "create table RORACLE_TEST(num1 number, num2 number)"
dbGetQuery(con, createTab)
# insert String
insStr <- "insert into RORACLE_TEST values(:row1, :row2)"
dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
# check output
dbGetQuery(con, "SELECT * FROM RORACLE_TEST")
# Output is:
# NUM1 NUM2
#1 0 1
# Desired output should be:
# NUM1 NUM2
#1 1 0
Any workaround for this will be appreciated except solutions of the kind
dbGetQuery(con,gsub(":row2", "0", gsub(":row1", "1", insStr)))
since this will not sanitize against SQL injection (the parameters will come from user input).
I have spent some time on the same question recently myself and found no perfect solution. In my opinion it is misleading, from a syntax point of view, to use named placeholders, as it gives the impression that the argument order is insignificant.
I believe this is neither an issue with nor the responsibility of the ROracle library, as the following PL/SQL would also produce a result that someone who does not know PL/SQL well enough would not expect:
DECLARE
  row1 number;
  row2 number;
BEGIN
  row1 := 1;
  row2 := 0;
  EXECUTE IMMEDIATE 'insert into RORACLE_TEST values(:row1, :row2)' USING row2, row1;
END;
/
When we migrated our applications from RODBC (with RODBCext) to ROracle, we kept using ? as the bind-variable placeholder and replace it with the colon style inside our database connectivity API. At least this shouldn't raise anyone's eyebrows:
# placeholders '?' are replaced with :1 and :2 in custom_dbGetQuery()
insStr <- "insert into RORACLE_TEST values(?, ?)"
custom_dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
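For completeness, a minimal sketch of that rewriting step (hypothetical; our real custom_dbGetQuery() wraps more than this):
# Hypothetical helper: turn '?' placeholders into Oracle-style positional
# binds :1, :2, ... before handing the query to dbGetQuery().
qmark_to_oracle <- function(query) {
  i <- 0
  while (grepl("?", query, fixed = TRUE)) {
    i <- i + 1
    query <- sub("?", paste0(":", i), query, fixed = TRUE)
  }
  query
}
qmark_to_oracle("insert into RORACLE_TEST values(?, ?)")
# [1] "insert into RORACLE_TEST values(:1, :2)"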
Edit: added suggestion to reorder data frame
You could go further by reordering the data frame yourself by checking placeholder occurrences in the query string:
custom_dbGetQuery <- function(con, insStr, data) {
  names <- names(data)
  name.pos <- sort(sapply(names, function(ph) {
    regexp <- paste0(":", ph, "[^\\w]")
    matches <- gregexpr(regexp, insStr, perl = TRUE)
    matches <- unlist(matches)
    stopifnot(length(matches) == 1, all(matches != -1))
    matches
  }))
  data <- data[, names(name.pos)]
  print("running query:")
  print(insStr)
  print("using data:")
  print(data)
  dbGetQuery(con, insStr, data)
}
insStr <- "insert into RORACLE_TEST values(:row1, :row2)"
custom_dbGetQuery(con, insStr, data.frame(row2 = 0, row1 = 1))
# Output:
# [1] "running query:"
# [1] "insert into RORACLE_TEST values(:row1, :row2)"
# [1] "using data:"
# row1 row2
# 1 1 0
A tidyverse solution inspired by sqlInterpolate (thanks #Scarabee!):
readr::read_file('query.sql') %>%
stringr::str_replace_all(., ':','?') %>%
DBI::sqlInterpolate(con, ., row2 = 0, row1 = 1) %>%
DBI::dbGetQuery(con, .)
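For a quick sanity check without the file, the same idea applied to the insert statement from the question (DBI::ANSI() stands in for a real connection here; note that str_replace_all() would also rewrite any other colons in the SQL):
library(magrittr)
"insert into RORACLE_TEST values(:row1, :row2)" %>%
  stringr::str_replace_all(":", "?") %>%
  DBI::sqlInterpolate(DBI::ANSI(), ., row2 = 0, row1 = 1)
# roughly: <SQL> insert into RORACLE_TEST values(1, 0)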
In the code below, the dataframe offshore_Sites already exists and contains about 1,000 records. I am building a function so I can re-use it for all the other dataframes I have.
The dataframes are obtained from SQL Server. At the moment I have only got the offshore_Sites one but the others will be produced in the same way.
The idea is to call this function, which has a switch statement inside; depending on the dataframe, it will perform different transformations. For offshore_Sites, I need to concatenate some of the fields, as in the example.
myStringConn <- "Driver=SQL Server;Server=SQL-SPATIAL;Database=AreasProt;Trusted_Connection=True;"
conn <- odbcDriverConnect(myStringConn)
offshore_Sites <- sqlQuery(conn, "select * from Offshore_Sites")
formatDataFrame <- function(dataframe) {
switch(dataframe, "offshore_Sites" = {
offshore_sites <- as.data.table(offshore_Sites)
offshore_sites <- setnames(offshore_sites, 1:6, c("status","country","region","area","long","lat"))
offshore_sites <- unique(offshore_sites[, list(status,
country = paste(sort(unique(country)), collapse = ' & '),
region = paste(sort(unique(region)), collapse = ' & '),
area,
long,
lat), by = code])
})
}
formatDataFrame(offshore_Sites)
However, when I run this, I get the error:
Error in switch(dataframe, offshore_Sites = { :
EXPR must be a length 1 vector
Does anyone understand what is happening?
I had some inspiration today and spotted where the problem was: switch() needs a length-one character vector, not the whole data frame. The function therefore needs two arguments, the dataframe name and the dataframe itself.
myStringConn <- "Driver=SQL Server;Server=SQL-SPATIAL;Database=AreasProt;Trusted_Connection=True;"
conn <- odbcDriverConnect(myStringConn)
offshore_Sites <- sqlQuery(conn, "select * from Offshore_Sites")
formatDataFrame <- function(dataframe, dataframeName) {
switch(dataframeName, "offshore_Sites" = {
offshore_sites <- as.data.table(dataframe)
offshore_sites <- setnames(offshore_sites, 1:6, c("status","country","region","area","long","lat"))
offshore_sites <- unique(offshore_sites[, list(status,
country = paste(sort(unique(country)), collapse = ' & '),
region = paste(sort(unique(region)), collapse = ' & '),
area,
long,
lat), by = code])
})
}
formatDataFrame(offshore_Sites, "Offshore_Sites")
Thanks for all the comments :)
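Another option (a sketch only, assuming the function is always called with a bare variable name) is to derive the name inside the function with deparse(substitute()), so the caller only passes the dataframe itself:
formatDataFrame <- function(dataframe) {
  dataframeName <- deparse(substitute(dataframe))  # "offshore_Sites" for the call below
  switch(dataframeName,
         "offshore_Sites" = {
           # ... same data.table transformations as in the answer above ...
           dataframe
         },
         stop("No formatting rules defined for: ", dataframeName))
}
# formatDataFrame(offshore_Sites)  # dispatches on the variable name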
I would like to take the values from a data frame and paste them into a text string that can be used as a SQL query. In SAS I would do it like this:
proc sql noprint; Select Names into :names separated by ", " from df; quit;
This would create a macro variable &names storing all the names, like: Id, Name, Account. I would like to do the same type of thing in R, but do not know how. I can create a vector in which the names are separated by commas and each one is surrounded by quotes, and I can take away the quotes using the noquote() function, but I cannot get those elements into another paste() statement that adds the SELECT and FROM. Is there a way to pull the values of a column and create a text string that can be used as a SQL query inside R? Here is what I have tried:
name = c("Id", "IsDeleted", "Name", "Credit__Loan__c")
label = c("Record Id", "Deleted", "ID", "Loan")
df = data.frame(name, label)
names(df) <- c("name", "label")
as.query.fields = noquote(paste(df$name, collaspe=", "))
as.query.final <- paste("SELECT " , noquote(paste(df$name, collaspe=", ")), " id FROM Credit_Amortization_Schedule__c")
data(iris)
colnames(iris)
a <- paste(colnames(iris), collapse = ", ")
as.query.final <- paste0("SELECT ", a, ", id FROM Credit_Amortization_Schedule__c")
The result is:
"SELECT Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, id FROM Credit_Amortization_Schedule__c"
which you can then use with SQL like this:
require(RODBC)
result <- sqlQuery(db, as.query.final)
where db is your database connection
Or, since I see your sqldf tag now, if you want to use sqldf it's just:
sqldf(as.query.final)
The gsubfn package supports string interpolation:
library(gsubfn)
Names <- toString( sprintf("%s '%s'", df$name, df$label) )
fn$identity("select $Names from myTable")
giving:
[1] "select Id 'Record Id', IsDeleted 'Deleted', Name 'ID', Credit__Loan__c 'Loan' from myTable"
Here are some additional examples: SO example 1 and SO example 2.
I'm in the process of learning R to wave SAS goodbye. I'm still new to this, and I somehow have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
         paste("insert into TestTable (UniqueID) Values (", b, ")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because there is an extra c that is generated, I assume for column when I give the command: paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit: As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
         ## note paste(b[[1]], collapse = ", ") in the line below
         paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "), ")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste() on a data.frame (or any list) can give weird results, because paste() has to coerce each list element to a character string, which deparses the whole column (hence the c(...)).
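To see where the c(...) comes from, here is a small reproduction (values taken from the error message):
b <- data.frame(Noinscr = c(8535735, 8449336))

paste(b)                        # "c(8535735, 8449336)"  -- the whole column gets deparsed
paste(b[[1]], collapse = ", ")  # "8535735, 8449336"     -- collapse the vector instead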