I have a textInput in a Shiny app for the user to write three-character product codes separated by commas. For example: F03, F04, F05.
The output of the textInput is used in a function that calls a SQL script; it serves as a filter in the SQL statement, e.g.
sqlfunction <- function(text){
  sqlQuery(conn, stri_paste("select .... where product_code in (", text, ");"))
}
To convert the textInput to a string that I can use in the sql statement, I have used
toString(sprintf("'%s'", unlist(strsplit(input$text_input, ","))))
This works and converts the textInput to 'F03', 'F04', 'F05'. However, when it is used in the SQL, only the first code, 'F03', is used in the search despite using product_code in (): the data returned is only rows with product code F03.
How do I get all three codes (or more) written in the textInput into a string I can use in the SQL IN clause?
Andrew
If your input is
F03, F04, F05
with whitespace between each comma and the next value, the statement gives:
select .... where product_code in ('F03', ' F04', ' F05');
Note the whitespace: the values ' F04' and ' F05' are then not found.
What if the input is 'F03,F04,F05' (without spaces)?
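To see the difference at the console (a quick sketch using the same split-and-quote idiom as the question):

# with spaces after the commas, the split keeps the leading whitespace
toString(sprintf("'%s'", unlist(strsplit("F03, F04, F05", ","))))
# [1] "'F03', ' F04', ' F05'"

# without spaces, each code comes out clean
toString(sprintf("'%s'", unlist(strsplit("F03,F04,F05", ","))))
# [1] "'F03', 'F04', 'F05'"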
I'll combine the important components of the other answers and discussion into one suggested answer. Feel free to accept one of them, they had the ideas first.
As Jrm_FRL said, it is possible that whitespace around the commas is being preserved by the split, and the padded values will not match in SQL.
toString(sQuote(unlist(strsplit("hello, world, again", ","))))
# [1] "'hello', ' world', ' again'"
### ^-- ^-- leading spaces on the strings
Some options:
If you think it is important for the user to be able to intentionally introduce whitespace around the commas (meaning: at the beginning or end of a code), then your only option is to instruct the user to use whitespace only when intended.
Otherwise, you can use trimws:
toString(sQuote(trimws(unlist(strsplit("hello, world, again", ",")))))
# [1] "'hello', 'world', 'again'"
strsplit(..., ",") can also be incorrect if the user quotes values to keep them together. You might consider using read.csv:
trimws(unlist(read.csv(text="hello, \"world, too\", again", header = FALSE, stringsAsFactors = FALSE)))
# V1 V2 V3
# "hello" "world, too" "again"
This is not perfectly compatible with option 1 above.
Second, as Tim Biegeleisen and Jrm_FRL both agreed, you are specifically prone to SQL injection here. A malformed (either accidentally or intentionally) search string from the user can at best corrupt this query's results, at worst (depending on connection permissions) corrupt or delete the data in the database. (I strongly suggest you read https://db.rstudio.com/best-practices/run-queries-safely/.)
Ways to safeguard this:
Don't manually add single-quotes around your data: if a string includes a single-quote, it will not be escaped, and at best it will cause a SQL error.
toString(sQuote(trimws(unlist(strsplit("hello'; drop table students; --", ",")))))
# [1] "'hello'; drop table students; --'"
### this query may delete the 'students' table
### notice that `sQuote` is not enough here, it is not escaping this correctly
Instead, use DBI::dbQuoteString. While I believe most DBMSes use the same convention of single-quotes (and your question suggests that yours does, too), it can be good practice to let the database driver determine how to deal with literal strings and those with embedded quotes.
toString(DBI::dbQuoteString(con, trimws(unlist(strsplit("hello'; drop table students; --", ",")))))
# [1] "'hello''; drop table students; --'"
### ^^ this is SQL's way of escaping an embedded single-quote
### this is now a single string, allegedly SQL-safe
Instead of including the strings in the query, use DBI::dbBind, though admittedly you'll need to include multiple binding placeholders (often, but not always, ?) based on the length of your values vector.
val_str <- "hello, world, again"
val_vec <- trimws(unlist(strsplit(val_str, ",")))
qmarks <- paste(rep("?", length(val_vec)), collapse = ",")
qmarks
# [1] "?,?,?"
qry <- paste("select ... where product_code in (", qmarks, ")")
out <- tryCatch({
  res <- NULL
  res <- DBI::dbSendQuery(con, qry)   # a select, so dbSendQuery rather than dbSendStatement
  DBI::dbBind(res, as.list(val_vec))  # dbBind expects its parameters as a list
  DBI::dbFetch(res)
}, finally = { if (!is.null(res)) suppressWarnings(DBI::dbClearResult(res)) })
The use of ? varies by DBMS, so you may need to do some research for your specific situation.
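For instance, placeholder syntax differs across backends (these are the conventions I believe the common DBI drivers use; verify against your driver's documentation):

# RSQLite, RMariaDB, odbc:  ?         "... where product_code in (?, ?, ?)"
# RPostgres:                $1, $2    "... where product_code in ($1, $2, $3)"
# ROracle:                  :1, :2    "... where product_code in (:1, :2, :3)"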
(While I used tryCatch here to "guarantee" that res is cleared on exit, this pattern is a little more robust than doing it without: if part of the query or binding fails without the finally= clause, the connection could be left in an imperfect state.)
Here is what I suspect is what you want:
text_input = "A,B,C"
in_clause <- paste0("'", unlist(strsplit(text_input, ",")), "'", collapse=",")
sql <- paste0("WHERE product_code IN (", in_clause, ")")
sql
[1] "WHERE product_code IN ('A','B','C')"
Here I am still using your combination of unlist and strsplit to generate a character vector of terms for the IN clause, but then I use paste0 with collapse= to get the output you want.
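Folding either approach back into the original function might look like this (a sketch: conn, sqlQuery, and the query body are assumed from the question, and the SQL-injection caveats from the other answer still apply to this paste-based version):

sqlfunction <- function(text){
  codes <- trimws(unlist(strsplit(text, ",")))   # split on commas, drop padding
  in_clause <- toString(sprintf("'%s'", codes))  # "'F03', 'F04', 'F05'"
  sqlQuery(conn, paste0("select .... where product_code in (", in_clause, ");"))
}
sqlfunction("F03, F04, F05")  # now matches all three codes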
Related
I am trying to turn the Influx query into a function in R so I can change the fields as I see fit. Here is an example of the code I am running:
my_bucket <- "my_bucket"
start <- "start_time"
stop <- "stop_time"
q <- paste('from(bucket:',my_bucket,')|> range(start:',start,',stop:',stop,')',sep = "")
data <- client$query(q)
Error in private$.throwIfNot2xx(resp) :
API client error (400): compilation failed: error at #1:1-1:2: invalid statement: '
This particular method uses paste(), and it keeps the escape character \ in the query. I would like to get rid of that. I have tried using cat(), but that is for printing to the console, and I have also tried capture.output() of the cat() string, which still captures the escape characters.
What I would like to see and be stored as an object is the output below. I used cat() to show you exactly what I need (I know I can't use it to store things).
cat('\'from(bucket:\"',my_bucket,'\")|> range(start:',start,',stop:',stop,')\'', sep = "")
>'from(bucket:"my_bucket")|> range(start:start_time,stop:stop_time)'
Note the single quotes around the query, beginning at from and ending after the parentheses after stop_time. In addition, the double quotes must be present around the bucket I call to. This is required syntax for the query from R.
I would suggest you try sprintf; I find it much easier to properly format the query.
q <- sprintf('from(bucket: "%s") |> range(start: %s, stop: %s)', my_bucket, start, stop)
Anyway, the same can be done with paste:
q <- paste('from(bucket: "',my_bucket,'") |> range(start: ',start,',stop: ',stop,')',sep = "")
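Either way, the backslashes you saw are only how print() displays embedded double quotes; the stored string itself contains none. You can confirm with cat() (shown here for the sprintf version above):

q
# [1] "from(bucket: \"my_bucket\") |> range(start: start_time, stop: stop_time)"
cat(q)
# from(bucket: "my_bucket") |> range(start: start_time, stop: stop_time)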
Basic setup: I connect to database A, get some data back into R, and write it to another connection, database B.
The database collation is SQL_Latin1_General_CP1_CI_AS, and I'm using encoding = "windows-1252" on connections A and B.
Display in RStudio is fine, special characters show as they should.
When I try to write the data, I get a "Cannot insert the value NULL into column" error.
I narrowed it down to at least one offending field: a cell containing a PHI symbol (Φ), which causes the error.
How do I make it so the PHI symbol and presumably other special characters are kept the same from source to destination?
conA <- dbConnect(odbc(),
                  Driver = "ODBC Driver 17 for SQL Server",
                  Server = "DB",
                  Database = "serverA",
                  Trusted_connection = "yes",
                  encoding = "1252")
# This write causes the "cannot insert null value" error:
dbWriteTable(conB, SQL("schema.table"), failing_row, append = TRUE)
I suggest working around this problem without dbplyr. As the overwhelming majority of encoding questions have nothing to do with dbplyr (the encoding tag has 23k questions, while the dbplyr tag has fewer than 400), this may be a non-trivial problem to resolve within dbplyr.
Here are two work-arounds to consider:
Use a text file as an intermediate step
R will have no problem writing an in-memory table out to a text file/csv. And SQL server has standard ways of reading in a text file/csv. This gives you the added advantage of validating the contents of the text file before loading it into SQL.
Documentation for SQL Server BULK INSERT is in Microsoft's docs. UTF-8 encoding can be specified with CODEPAGE = '65001', and unicode input with DATAFILETYPE = 'widechar'.
If you want to take this approach entirely within R, it will likely look something like:
write.csv(failing_row, "output_file.csv", row.names = FALSE)
# consider fileEncoding = "UTF-16LE" so the file matches DATAFILETYPE = 'widechar'

# each column listed along with a suitable data type
query_for_creating_table = "CREATE TABLE schema_name.table_name (
  col1 INT,
  col2 NCHAR(10)
)"

query_for_bulk_insert = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
  DATAFILETYPE = 'widechar',
  FIRSTROW = 2,
  ROWTERMINATOR = '\\n'
)"
# '\\n' so that SQL Server receives the two-character sequence \n

DBI::dbExecute(con, query_for_creating_table)
DBI::dbExecute(con, query_for_bulk_insert)
Load all the non-error rows and append the final row after
I have had some success in the past using the INSERT INTO syntax, so I would recommend loading the failing row with this approach.
Something like the following:
failing_row = local_df %>%
filter(condition_to_get_just_the_failing_row)
non_failing_rows = local_df %>%
filter(! condition_to_get_just_the_failing_row)
# write non-failing rows
dbWriteTable(con, SQL("schema.table"), non_failing_rows, append = TRUE)
# prep for insert failing row
insert_query = "INSERT INTO schema.table VALUES ("
for(col in colnames(failing_row)){
value = failing_row[[col]]
if(is.numeric(value)){
insert_query = paste0(insert_query, value, ", ")
} else {
insert_query = paste0(insert_query, "'", value, "', ")
}
}
insert_query = paste0(substr(insert_query, 1, nchar(insert_query) - 2), ");")  # drop the trailing ", "
# insert failing row
dbExecute(con, insert_query)
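As a safer alternative to pasting values into the INSERT (a sketch, assuming the driver accepts ? placeholders and the column order matches the table definition), you can let DBI bind the row's values as parameters:

placeholders <- paste(rep("?", ncol(failing_row)), collapse = ", ")
insert_query <- paste0("INSERT INTO schema.table VALUES (", placeholders, ")")
DBI::dbExecute(con, insert_query, params = unname(as.list(failing_row)))

This also avoids hand-rolling the numeric-versus-string quoting logic.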
other resources
If you have not seen them already, here are several related Q&A that might assist: Arabic characters, reading UTF-8, encoding to MySQL, and non-Latin characters as question marks. Though some of these are for reading data into R.
I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This gives me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
This code should work:
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
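For reference, the constructed query looks like this:

cat(sqlcomp)
# SELECT name, registered_address FROM company WHERE  LOWER(name) LIKE LOWER('%apple%') OR  LOWER(name) LIKE LOWER('%microsoft%') OR  LOWER(name) LIKE LOWER('%facebook%')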
Hope this helps you move on.
Using paste to splice data into a query is generally a bad idea, due to SQL injection (whether true injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse the optimized plan every time they see the same query; if you encode data in the query, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the optimized query (granted, if you run it with other than three companies, it will have to re-optimize, so the gain is limited; see the sketch after this list);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user entry), then pasting the value into the query itself will either cause the query to fail or force you to jump through hoops to ensure all quotes are escaped properly for the DBMS to see the correct strings.
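Here is a sketch of that reuse pattern (assuming con is a DBI connection whose driver uses ? placeholders; the second set of company names is invented for illustration):

res <- DBI::dbSendQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks))
DBI::dbBind(res, as.list(tolower(df$names)))
first_batch <- DBI::dbFetch(res)
DBI::dbBind(res, as.list(c("netflix", "oracle", "amazon")))  # same prepared query, new values
second_batch <- DBI::dbFetch(res)
DBI::dbClearResult(res)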
Trying to run a SQL statement in an RStudio environment, but I'm having difficulty extracting Java-style comments from the statement. I cannot edit the SQL statements / comments themselves, so trying a sequence of gsub to remove the unwanted special characters so I'm left with only the SQL statement in the R string.
I'm trying to use gsub to remove the special characters and the comment between them, but I'm struggling to find the right regex to do so (especially one that does not treat the division symbol in the SELECT statement as part of a comment).
SELECT
id
, metric
, SUM(numerator)/SUM(denominator) AS rate
/*
This is an example of the comment.
I want to remove this. */
FROM table
WHERE id = 2
You can remove anything between /* and */ using this regex:
gsub(pattern = "/\\*[^*]*\\*/", replacement = "", x = text)
Result:
"SELECT\n id\n, metric\n, SUM(numerator)/SUM(denominator) AS rate\n/\nFROM table\nWHERE id = 2"
I built a script in R that automatically creates a very long and complex SQL query to create a view over similar tables in 5 databases.
Of course there were integration issues to solve. The only one remaining to make this happen is the problem I am going to present now.
Considering one very long string like
'"/*NOTES*/", "/*TABLE_ID*/", "/*TABLE_SUB_ID*/", "/*TABLE_SUB_SUB_ID*/", "OTHER_COLUMNS",'
My objective is to replace
this string '"/*' with this string '/*'
this string '*/",' with this string '*/'
I tried with:
gsub('"/*', '/*', '"/*NOTES*/", "/*TABLE_ID*/", "/*TABLE_SUB_ID*/", "/*TABLE_SUB_SUB_ID*/", "OTHER_COLUMNS",')
but it returns the string
'/**NOTES*//*, /**TABLE_ID*//*, /**TABLE_SUB_ID*//*, /**TABLE_SUB_SUB_ID*//*, /*OTHER_COLUMNS/*,'
whereas my expected output is the following string:
'/*NOTES*/ /*TABLE_ID*/ /*TABLE_SUB_ID*/ /*TABLE_SUB_SUB_ID*/ "OTHER_COLUMNS",'
Note that the * is not escaped here; it represents the start (/*) and end (*/) of comments when the string is later run through a SQL compiler.
Escaping regex special characters in R requires two backslashes, so the following will get you what you want:
gsub('"?(/\\*|\\*/)"?', '\\1', '"/*NOTES*/", "/*TABLE_ID*/", "/*TABLE_SUB_ID*/", "/*TABLE_SUB_SUB_ID*/", "OTHER_COLUMNS",')
# [1] "/*NOTES*/, /*TABLE_ID*/, /*TABLE_SUB_ID*/, /*TABLE_SUB_SUB_ID*/, \"OTHER_COLUMNS\","
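Alternatively, since your two replacements are literal strings, fixed = TRUE sidesteps regex escaping entirely and reproduces your expected output exactly:

x <- '"/*NOTES*/", "/*TABLE_ID*/", "/*TABLE_SUB_ID*/", "/*TABLE_SUB_SUB_ID*/", "OTHER_COLUMNS",'
x <- gsub('"/*', '/*', x, fixed = TRUE)   # '"/*'  -> '/*'
x <- gsub('*/",', '*/', x, fixed = TRUE)  # '*/",' -> '*/'
x
# [1] "/*NOTES*/ /*TABLE_ID*/ /*TABLE_SUB_ID*/ /*TABLE_SUB_SUB_ID*/ \"OTHER_COLUMNS\","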
FYI, double-backslashes are required for most special characters, but the following are legitimate single-backslash escapes:
'\a\b\f\n\r\t\v'
# [1] "\a\b\f\n\r\t\v"
'\u0101' # unicode, numbers are variable
# [1] "ā"
'\x0A' # hex, hex-numbers are variable
# [1] "\n"
Perhaps there are more; the authoritative list is documented in R's ?Quotes help page.
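For the record, you can pull that list up directly from base R's help system:

help("Quotes")  # base R's list of escape sequences: \a \b \f \n \r \t \v, \\, \', \", plus \u, \U, \x, and octal forms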