Querying mixed case columns in SQL with R

I have a mixed-case column in my_table that can only be queried using double quotes in psql. For example, this is the correct way to write the query in psql, and it returns records successfully:
select "mixedCase" from my_table limit 5;
However, I am unable to replicate this query in R. I have tried the following:
dbGetQuery(con, "SELECT '\"mixedCase\"' from my_table limit 5;")
which throws: RS-DBI driver warning: (unrecognized PostgreSQL field type unknown (id:705) in column 0)
dbGetQuery(con, "SELECT 'mixedCase' from my_table limit 5;")
which throws: RS-DBI driver warning: (unrecognized PostgreSQL field type unknown (id:705) in column 0)
dbGetQuery(con, "SELECT "mixedCase" from my_table limit 5;")
which throws Error: unexpected symbol in "dbGetQuery(con, "SELECT "mixedCase"
What is the solution for mixed case columns with the RPostgreSQL package?

You seem to understand the problem, yet you never actually tried just using the literal correct query in R. Just escape the double quotes in the query string and it should work:
dbGetQuery(con, "SELECT \"mixedCase\" from my_table limit 5;")
Your first two attempts would have failed because you are passing in mixedCase as a string literal, not as a column name. And the third attempt would fail on the R side because you are passing in a broken string/code.
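A minimal sketch of the escaping point, using only base R and no database connection. It assumes nothing beyond the column and table names from the question, and shows that a single-quoted R string is an equivalent way to avoid the backslashes:

```r
# Two ways to write the same SQL text as an R string.
escaped <- "SELECT \"mixedCase\" FROM my_table LIMIT 5;"
single  <- 'SELECT "mixedCase" FROM my_table LIMIT 5;'

# The backslashes exist only in the R source code; both strings
# contain exactly the same characters.
print(identical(escaped, single))

# cat() shows the text as the database driver will receive it.
cat(escaped, "\n")
```

Either form can be passed to dbGetQuery(con, ...); which one reads better is a matter of taste.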

Related

Declaring variable in R for DBI query to MS SQL

I'm writing an R script that runs several SQL queries using the DBI package to create reports. To make this work, I need to be able to declare a variable in R (such as a Period End Date) that is then referenced from within the SQL query. When I run my query, I get errors.
If I simply use the field name (PeriodEndDate), I get the following error:
Error in (function (classes, fdef, mtable) : unable to find an
inherited method for function ‘dbGetQuery’ for signature ‘"Microsoft
SQL Server", "character"’
If I use @ to access the field name (@PeriodEndDate), I get the following error:
Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server
Driver][SQL Server]Must declare the scalar variable "@PeriodEndDate".
[Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not
be prepared. '
An example query might look like this:
library(DBI) # Used for connecting to SQL server and submitting SQL queries.
library(tidyverse) # Used for data manipulation and creating/saving CSV files.
library(lubridate) # Used to calculate end of month, start of month in queries
# Define time periods for queries.
PeriodEndDate <<- ceiling_date(as.Date('2021-10-31'),'month') # Enter Period End Date on this line.
PeriodStartDate <<- floor_date(PeriodEndDate, 'month')
# Connect to SQL Server.
con <- dbConnect(
odbc::odbc(),
driver = "SQL Server",
server = "SERVERNAME",
trusted_connection = TRUE,
timeout = 5,
encoding = "Latin1")
samplequery <- dbGetQuery(con, "
SELECT * FROM [TableName]
WHERE OrderDate <= @PeriodEndDate
")
I believe one way might be to use the paste function, like this:
samplequery <- dbGetQuery(con, paste("
SELECT * FROM [TableName]
WHERE OrderDate <=", PeriodEndDate))
However, that can get unwieldy if it involves several variables being referenced outside the query or in several places within the query.
Is there a relatively straightforward way to do this?
Thanks in advance for any thoughts you might have!
The mechanism in most DBI-based connections is to use ?-placeholders[1] in the query and params= in the call to DBI::dbGetQuery or DBI::dbExecute.
Perhaps this:
samplequery <- dbGetQuery(con, "
SELECT * FROM [TableName]
WHERE OrderDate <= ?
", params = list(PeriodEndDate))
In general, the mechanisms for including an R object as a data item are enumerated well in https://db.rstudio.com/best-practices/run-queries-safely/. In order of my recommendation:
1. Parameterized queries (as shown above);
2. glue::glue_sql;
3. sqlInterpolate (which uses the same ?-placeholders as #1);
4. The link also mentions "manual escaping" using dbQuoteString.
Anything else is in my mind more risky due to inadvertent SQL corruption/injection.
I've seen many questions here on SO that try to use one of the following techniques: paste and/or sprintf using sQuote or hard-coded paste0("'", PeriodEndDate, "'"). These are too fragile in my mind and should be avoided.
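For reference, here is a small sketch of sqlInterpolate from the list above. It is an illustration under assumptions, not your actual setup: DBI::ANSI() is a dummy connection object that applies ANSI quoting rules, used here only so the snippet runs without a server, and the table and column names are taken from your example. With a real connection you would pass con instead.

```r
library(DBI)

# Stand-in for a real connection (assumption: ANSI quoting rules).
con_stub <- ANSI()

# The named placeholder ?end_date is replaced by a safely quoted literal.
qry <- sqlInterpolate(
  con_stub,
  "SELECT * FROM TableName WHERE OrderDate <= ?end_date",
  end_date = "2021-10-31"
)

# Inspect the SQL text that would be sent to the server.
print(qry)
```

The value is quoted by DBI rather than by hand, which is the difference from the paste/sprintf approaches.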
My preference for parameterized queries extends beyond usability: it can also have a not insignificant impact on repeated use of the same query, since DBMSes tend to analyze and optimize a query and cache the result for the next use. Consider this:
### parameterized queries
DBI::dbGetQuery(con, "select ... where OrderDate >= ?", params=list("2020-02-02"))
DBI::dbGetQuery(con, "select ... where OrderDate >= ?", params=list("2020-02-03"))
### glue_sql
PeriodEndDate <- as.Date("2020-02-02")
qry <- glue::glue_sql("select ... where OrderDate >= {PeriodEndDate}", .con=con)
# <SQL> select ... where OrderDate >= '2020-02-02'
DBI::dbGetQuery(con, qry)
PeriodEndDate <- as.Date("2021-12-22")
qry <- glue::glue_sql("select ... where OrderDate >= {PeriodEndDate}", .con=con)
# <SQL> select ... where OrderDate >= '2021-12-22'
DBI::dbGetQuery(con, qry)
In the case of parameterized queries, the "query" itself never changes, so its optimized query (internal to the server) can be reused.
In the case of the glue_sql queries, the query itself changes (albeit by just a handful of characters), so most (all?) DBMSes will re-analyze and re-optimize the query. While they tend to do it quickly, and most analysts' queries are not complex, it is still unnecessary overhead, and a missed opportunity in cases where your query and/or the indices require a little more work to optimize well.
Notes:
? is used by most DBMSes but not all. Others use $name or $1 or such. With odbc::odbc(), however, it is always ? (no name, no number), regardless of the actual DBMS.
Not sure if you are using this elsewhere, but the use of <<- (instead of <- or =) can encourage bad habits and/or produce unreliable or unexpected results.
It is not uncommon to use the same variable multiple times in a query. Unfortunately, you will need to include the variable multiple times, and order is important. For example,
samplequery <- dbGetQuery(con, "
SELECT * FROM [TableName]
WHERE OrderDate <= ?
or (SomethingElse = ? and OrderDate > ?)
", params = list(PeriodEndDate, 99, PeriodEndDate))
If you have a list/vector of values and want to use SQL's IN operator, then you have two options, my preference being the first (for the reasons stated above):
Create a string of question marks and paste it into the query. (Yes, this is pasting into the query, but we are not dealing with the risk of incorrectly single-quoting or double-quoting. Since DBI does not support any other mechanism, this is what we have.)
MyDates <- c(..., ...)
qmarks <- paste(rep("?", length(MyDates)), collapse=",")
samplequery <- dbGetQuery(con, sprintf("
SELECT * FROM [TableName]
WHERE OrderDate IN (%s)
", qmarks), params = as.list(MyDates))
glue_sql supports expanding internally:
MyDates <- c(..., ...)
qry <- glue::glue_sql("
SELECT * FROM [TableName]
WHERE OrderDate IN ({MyDates*})", .con=con)
DBI::dbGetQuery(con, qry)
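To make the first option more concrete, this base-R sketch builds the placeholder string without touching a database, so you can see what is actually sent. The three dates are hypothetical values chosen for illustration:

```r
# Hypothetical values to match against.
my_dates <- c("2021-10-29", "2021-10-30", "2021-10-31")

# One "?" per value, comma-separated.
qmarks <- paste(rep("?", length(my_dates)), collapse = ",")

# Only the placeholders are pasted into the SQL text, never the values.
sql <- sprintf("SELECT * FROM [TableName] WHERE OrderDate IN (%s)", qmarks)
cat(sql, "\n")
# SELECT * FROM [TableName] WHERE OrderDate IN (?,?,?)

# The values travel separately, one list element per placeholder.
params <- as.list(my_dates)
n_placeholders <- lengths(regmatches(sql, gregexpr("?", sql, fixed = TRUE)))
```

The count of placeholders must equal length(params); checking that before calling dbGetQuery(con, sql, params = params) catches mismatches early.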

R with postgresql database

I've been trying to query data from a PostgreSQL database (via pgAdmin) into R for analysis. Most of the queries work, except when I add a condition to filter the rows. Please find the code below:
dbGetQuery(con, 'select * from "db_name"."User" where "db_name"."User"."FirstName" = "Mani" ')
Error in result_create(conn@ptr, statement) :
Failed to prepare query: ERROR: column "Mani" does not exist
LINE 1: ...from "db_name"."User" where "db_name"."User"."FirstName" = "Mani"
^
This is the error I get. Why is it treating Mani as a column when it is just a value? Could someone please assist?
String literals in Postgres (and most flavors of SQL) take single quotes; double quotes mark identifiers, which is why "Mani" was read as a column name. This, combined with a few other simplifications to your code, leaves us with this:
sql <- "select * from \"db_name\".\"User\" u where u.\"FirstName\" = 'Mani'"
dbGetQuery(con, sql)
Note that I introduced a table alias for the User table, so that we don't have to repeat the fully qualified name in the WHERE clause. The mixed-case identifiers keep their double quotes, because Postgres folds unquoted identifiers to lowercase.
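If you would rather not write the quotes by hand, DBI can produce them. The sketch below is an illustration under assumptions: DBI::ANSI() is a dummy connection used so the snippet runs without a server (ANSI identifier quoting uses double quotes, as Postgres does), and with a real connection you would pass con instead.

```r
library(DBI)

# Stand-in for a real Postgres connection (assumption).
con_stub <- ANSI()

# Identifiers get double quotes; the value gets single quotes.
schema <- dbQuoteIdentifier(con_stub, "db_name")
tbl    <- dbQuoteIdentifier(con_stub, "User")
col    <- dbQuoteIdentifier(con_stub, "FirstName")
val    <- dbQuoteString(con_stub, "Mani")

sql <- paste0("SELECT * FROM ", schema, ".", tbl, " WHERE ", col, " = ", val)
cat(sql, "\n")
```

For a value that comes from user input, a parameterized query is still the safer choice; this only shows which kind of quote belongs where.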

loading in a MySQL table called "order" with RMySQL

I'm currently trying to connect my R session to a MySQL server using the RMySQL package.
One of the tables on the server is called "order". I already searched for how to query a table called order in MySQL (by putting it into ''), yet the syntax does not work in the RMySQL query.
When I run the following statement:
order_query = dbSendQuery(mydb,"SELECT * FROM 'order'")
It returns the following error:
Error in .local(conn, statement, ...) : could not run statement:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ''order'' at line 1
Does anyone know how to get around this in R?
Single quotes in MySQL indicate string literals, so you should not put them around table names. Quote the table name with backticks instead:
order_query = dbSendQuery(mydb,"SELECT * FROM `order`")
Backticks are how MySQL escapes any identifier that needs it, e.g.
SELECT * FROM `some table` -- table name with a space (generally a bad thing)
Edit:
As @Ralf pointed out, in this case you do need backticks, because ORDER is a MySQL keyword. For that reason you should avoid using it to name your tables and columns.

How do I run a SQL update statement in RODBC?

When trying to run an update with a SQL statement with the sqlQuery function in RODBC, it brings up an error
"[RODBC] ERROR: Could not SQLExecDirect '.
How do you run a direct update statement with R?
You cannot use a plain SQL update statement with the sqlQuery function; it needs to return a result set. For example, the following statement won't work:
sql="update mytable set column=value where column=value"
cn <-odbcDriverConnect(connection="yourconnectionstring")
resultset <- sqlQuery(cn,sql)
But if you add an output clause, the sqlQuery function will work fine. For example:
sql="update mytable set column=value output inserted.column where column=value"
cn <-odbcDriverConnect(connection="yourconnectionstring")
resultset <- sqlQuery(cn,sql)
I just added a function to make it easy to take your raw sql and quickly turn it into an update statement.
setUpdateSql <-function(updatesql, wheresql, output="inserted.*"){
sql=paste(updatesql," output ",output, wheresql)
sql=gsub("\n"," ",sql) #remove new lines if they appear in sql
return(sql)
}
So now I just need to split the SQL statement and it will run. I could also add an "inserted.columnname" if I didn't want to return the whole thing.
sql=setUpdateSql("update mytable set column=value","where column=value","inserted.column")#last parameter is optional
cn <-odbcDriverConnect(connection="yourconnectionstring")
resultset <- sqlQuery(cn,sql)
The other advantage with this method is you can find out what has changed in the resultset.

RS-DBI driver warning: (unrecognized MySQL field type 7 in column 1 imported as character)

I'm trying to run a simple query that works with MySQL and other MySQL connector APIs:
SELECT * FROM `table` WHERE type = 'farmer'
I've tried various methods using the RMySQL package, and they all produce the same warning:
RS-DBI driver warning: (unrecognized MySQL field type 7 in column 1 imported as character)
Type = 'farmer'
(Query<-paste0("SELECT * FROM `table` WHERE type = '%",Type,"%'"))
res<-dbGetQuery(con, Query)
Query<-paste("SELECT * FROM `table` WHERE type = \'farmer\'")
Query<-paste("SELECT * FROM `table` WHERE type = 'farmer'")
What am I doing wrong?
"type" is a keyword in MySQL. Surround it with backticks to escape it as a field name:
SELECT * FROM `table` WHERE `type` = 'farmer'
Also, you probably have a timestamp column in your table, which R is known not to recognize. Convert it to a Unix timestamp in the SELECT portion of the SQL statement.
It looks like the db schema has a column of type 7, and that type appears to be unknown to the RMySQL driver.
I would try to exclude column one from the query, or cast it at the select * ... level, e.g. via something like
select cast(foo as char) as foo, bar, bim, bom from `table` where ...
To be clear, when I encountered this error message, it was because my data field was a time stamp.
I verified this by changing my query to SELECT created_at FROM ... which caused the error. I also verified this by changing the query not to include the column names that were timestamps, then I had no errors.
Note, too, that the error message counts columns starting from 0 (instead of 1, as R does).
IMHO, the answer is you aren't doing anything wrong, but it's something that needs to be fixed in RMySQL.
The workaround is that, after you read in your data, you need to call one of the several possible character-to-datetime conversion functions. (Which one depends on what exactly you want to do with the timestamp.)
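As a rough sketch of that workaround in base R: the data frame below stands in for a query result whose timestamp column arrived as character. The column name created_at comes from the example above; the sample values, the format string, and the UTC time zone are assumptions you would adjust to your own data.

```r
# Stand-in for the result of dbGetQuery(), with the timestamp as character.
res <- data.frame(
  id = 1:2,
  created_at = c("2021-10-31 12:34:56", "2021-11-01 08:00:00"),
  stringsAsFactors = FALSE
)

# Convert the character column to a proper date-time class.
res$created_at <- as.POSIXct(
  res$created_at,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"  # assumption: the server stores UTC
)

str(res)
```

If you only need the date, as.Date() on the same column is an alternative; lubridate::ymd_hms() is another option if that package is already loaded.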
