In perl/python DBI APIs have a mechanism to safely interpolate in parameters to an sql query. For example in python I would do:
cursor.execute("SELECT * FROM table WHERE value > ?", (5,))
Where the second parameter to the execute method is a tuple of parameters to add into the sql query
Is there a similar mechanism for R's DBI compliant APIs? The examples I've seen never show parameters passed to the query. If not, what is the safest way to interpolate in parameters to a query? I'm specifically looking at using RPostgresSQL.
Just for completeness, I'll add an answer based on Hadley's comment. The DBI package now has the function sqlInterpolate which can also perform this. It requires a list of function arguments to be named in the sql query that all must start with a ?. Excerpt from the DBI manual below
sql <- "SELECT * FROM X WHERE name = ?name"
sqlInterpolate(ANSI(), sql, name = "Hadley")
# This is safe because the single quote has been double escaped
sqlInterpolate(ANSI(), sql, name = "H'); DROP TABLE--;")
Indeed the use of bind variables is not really well documented. Anyway the ODBC commands in R work differently for different databases. One possibility for postgres would be like this:
res <- postgresqlExecStatement(con, "SELECT * FROM table WHERE value > $1", c(5))
postgresqlFetch(res)
postgresqlCloseResult(res)
Hope it helps.
Related
I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This give me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
this code should work
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Please vote my solution if it is helpful.
Using paste to paste in data into a query is generally a bad idea, due to SQL injection (whether truly injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized query every time it sees the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.
Using R-3.5.0 and RODBC v. 1.3-15 on Windows.
I am trying to query data from a remote database. I can connect fine and if I do a query to count the rows, the answer comes out correctly. But if I try to remove the count statement select count(*) and actually get the data via select *, I yield an empty query (with some rather strange headers). Only two of the column names come out correctly and the rest are question marks and a number (as shown below). I can using sql developer to query the data no problem.
I include the simplest version of the code below but I get the same results if I try to limit to just a few rows or certain conditions, etc. Sorry I cannot create a reproducible example but as this is a remote db and I have no idea what the problem is, I'm not sure how I could even do that.
I can query other tables from different schemas within the same odbc connection, so I don't think it is that. I have tried with and without the believeNRows and the rows_at_time.
Thank you for any thoughts.
channel <- odbcConnect("mydb", uid="myuser", pwd="mypass", believeNRows=FALSE,rows_at_time = 1)
myquery <- paste("select count(*) from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
COUNT(*)
1 149712361
myquery <- paste("select * from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
[1] ID FMC_IN_ID ? ?.1 ?.2 ?.3 ?.4 ?.5 ?.6 ?.7 ?.8 ?.9 ?.10 ?.11 ?.12 ?.13 ?.14 ?.15
<0 rows> (or 0-length row.names)
I would try the following:
add a simple limit 100 to your query to see if you can get some data back
add the believeNRows option to the sqlQuery call -- in my experience it is needed at that level
In case it helps others, the problem was that the database contained an Oracle spatial field (MDSYS.SDO_GEOMETRY). R did not know what to do with it. I assumed it would just convert it to a character but instead it just got confused. By omitting the spatial field, the query worked fine.
I'm currently trying to connect my R session to a MySQL server using the RMySQL package.
One of the tables on the server is called "order", I already searched how you can import a table called order with MySQL (by putting it into ''), yet the syntax does not work for the RMySQL query.
when I run the following statement:
order_query = dbSendQuery(mydb,"SELECT * FROM 'order'")
It returns the following error:
Error in .local(conn, statement, ...) : could not run statement:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ''order'' at line 1
Anyone knows how to get around this in R?
Single quotes in MySQL indicate string literals, and you should not be putting them around your table names. Try the query without the quotes:
order_query = dbSendQuery(mydb,"SELECT * FROM `order`")
If you did, for some reason, need to escape your table name, then use backticks, e.g.
SELECT * FROM `some table` -- table name with a space (generally a bad thing)
Edit:
As #Ralf pointed out, in this case you do need backticks because ORDER is a MySQL keyword and you should not be using it to name your tables and columns.
How to get the list of table names from database for certain scheme?
tabellen <- dbListTables(con, all=T)
gives all the tables from database, but i would like to specify the scheme. I read in the ROracle package that i can specify scheme like:
tabellen <- dbListTables(con, schema="K")
However i get an empty character...
when i use sql command:
rs <- dbSendQuery(con, "SELECT * FROM ALL_TABLES WHERE OWNER ='K'")
data <- fetch(rs)
It works but i get a table, not a list what i would prefer too.. Is there a way to get directly the list of tables? [SOLVED] - too much programming..I wrote scheme instead of schema...Thanks for pointing it out, my bad, sorry for that
And additionally how i can get the name of columns for certain table which i choosed [NOT SOLVED]
Thanks for help
I am using TSQLT AssertResultSetsHaveSameMetaData to compare metadata between two tables.But the problem is that i cannot hardcode the table name since i am passing the table name as the parameter at the runtime.So is there any way to do that
You use tSQLt.AssertResultSetsHaveSameMetaData by passing two select statements like this:
exec tSQLt.AssertResultSetsHaveSameMetaData
'SELECT TOP 1 * FROM mySchema.ThisTable;'
, 'SELECT TOP 1 * FROM mySchema.ThatTable;';
So it should be quite easy to parameterise the names of the tables you are comparing and build the SELECT statements based on those table name parameters.
However, if you are using the latest version of tSQLt you can also now use tSQLt.AssertEqualsTableSchema to do the same thing. You would use this assertion like this:
exec tSQLt.AssertEqualsTableSchema
'mySchema.ThisTable'
, 'mySchema.ThatTable';
Once again, parameterising the tables names would be easy since they are passed to AssertEqualsTableSchema as parameters.
If you explain the use case/context and provide sample code to explain what you are trying to do you stand a better chance of getting the help you need.