Matching dates in sqldf - r

I have a data frame with stock data (date, symbol, high, low, open, close, volume). Using R with MySQL, sqldf, and RMySQL, I have built a list of unique dates and a list of unique stock symbols.
What I need now is to loop through the data and find the close on two specified dates. For instance:
stkData contains (date, symbol, high, low, open, close, volume)
dates contains unique dates
symbol contains unique symbols
I want to loop through the lists in a sqldf statement as such:
'select stkData$close from stkData where symbol = symbol[k] and date = dates[j]'
k and j would be looped numbers, but my problem is the symbol[k] and dates[j] parts.
sqldf won't read them properly (or I can't code properly). I've tried as.Date, as.character with no luck. I get the following error message:
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "[4,]": syntax error)

You're pretty far off in terms of syntax for sqldf, unfortunately. You can't use $ or [] notations in sqldf calls because those are both R syntax, not SQL syntax. It's an entirely separate language. What's happening is that sqldf is taking your data frame, importing it into SQLite3, executing the SQL query that you supply against the resulting table, and then importing the result set back into R as a data frame. No R functionality is available within the SQL.
It's not clear to me what you're trying to do, but if you want to run multiple queries in a loop, you probably want to construct the SQL query as a string using the R function paste(), so that when it gets to SQLite3 it'll just be static values where you currently have symbol[k] and dates[j].
So, you'll have something like the following, but wrapped in a loop for j and k:
# SQL string constants must be single-quoted, so the quotes are pasted in too
sqldf(paste0("select close from stkData where symbol = '", symbol[k],
"' and date = '", dates[j], "'"))

You might need to construct the select statement as a string with paste before it gets passed to your SQL caller. Something like:
combo_kj <- expand.grid(ksym=symbol[1:k], jdates=dates[1:j])
# one query per symbol/date combination; values are single-quoted for SQL
SQLcalls <- paste("select close from stkData where symbol = '",
combo_kj$ksym,
"' and date = '",
combo_kj$jdates,
"'",
sep="")
And then loop over SQLcalls with whatever code you are using.
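As a minimal sketch of the same idea, the queries can also be built with sprintf(), which is vectorised over its arguments. The symbol and dates values below are made-up placeholders, and the sketch assumes the date column in stkData is stored as text in YYYY-MM-DD form; if it is stored some other way, the comparison would need adjusting.

```r
# Hypothetical inputs standing in for the asker's vectors.
symbol <- c("AAPL", "MSFT")
dates  <- as.Date(c("2012-01-03", "2012-01-04"))

# One row per (symbol, date) pair; format() turns the Dates into text.
combos <- expand.grid(sym = symbol, day = format(dates),
                      stringsAsFactors = FALSE)

# sprintf() recycles over the columns, giving one query string per row.
queries <- sprintf(
  "select close from stkData where symbol = '%s' and date = '%s'",
  combos$sym, combos$day
)

writeLines(queries)
```

Each element of queries can then be passed to sqldf() inside a loop or with lapply().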

Preface sqldf with fn$ as shown below. Strings within backticks are then replaced by the result of running them in R, and strings of the form $variable are replaced by the contents of that variable (provided the variable name contains only word characters). Note that SQL requires character constants to be quoted, so be sure to surround the backticks or $variable with quotes:
fn$sqldf("select close from stkData
where symbol = '`symbol[k]`' and
date = '`dates[j]`' ")
To use the $variable syntax try this:
mysymbol <- symbol[k]
mydate <- dates[j]
fn$sqldf("select close from stkData
where symbol = '$mysymbol' and
date = '$mydate' ")
Also see example 5 on the sqldf github page: https://github.com/ggrothendieck/sqldf

Related

How to change dbplyr's show_query() quote convention for Snowflake SQL

I am using the dbplyr package to translate my dplyr query to SQL, and it works really well. However, when I copy and paste the translated SQL statement, it won't run in Snowflake because dbplyr quotes the columns with ` (the backtick, the key above Tab), whereas my Snowflake SQL will only run if its columns are quoted with either " (double quote), ' (single quote), or no quote at all (if there are no breaks).
Is there a way to change the dbplyr::show_query() argument so that the outcome is in double quotes or single quotes instead of backticks? There is a con argument, which I've set to simulate_snowflake(), but that doesn't change anything.
The error I get is: SQL compilation error: error line 2 at position 0 invalid identifier '"COL_NAME"'
#This will not work in my snowflake SQL
SELECT
`COL_NAME`
FROM
TABLENAME
#This will work though:
SELECT
"COL_NAME"
FROM
TABLENAME
One possibility would be to use sql_render(), convert to a character string, and use any regex replace process that you like to change the quotes. For example:
<pipeline> %>%
sql_render() %>%
as.character() %>%
str_replace_all(pattern="`",replacement = "\\\"")
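The same replacement can be sketched in base R without stringr. The rendered string below is a made-up stand-in for the output of as.character(sql_render(...)); fixed = TRUE is used so the backtick is matched literally and no regex escaping is needed.

```r
# Hypothetical rendered SQL, standing in for as.character(sql_render(...)).
rendered <- "SELECT `COL_NAME`\nFROM TABLENAME"

# fixed = TRUE matches the backtick as a plain character, not a regex.
snowflake_sql <- gsub("`", "\"", rendered, fixed = TRUE)

# cat() shows the string as the database will receive it.
cat(snowflake_sql, "\n")
```

Note that a blanket replacement like this would also change any backticks that appear inside string literals in the query.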

Create Influx query as a function in R

I am trying to turn the Influx query into a function in R so I can change the fields as I see fit. Here is an example of the code I am running
my_bucket <- "my_bucket"
start <- "start_time"
stop <- "stop_time"
q <- paste('from(bucket:',my_bucket,')|> range(start:',start,'stop:,'stop')',sep = "")
data <- client$query(q)
Error in private$.throwIfNot2xx(resp) :
API client error (400): compilation failed: error at #1:1-1:2: invalid statement: '
This particular method uses paste(), and it keeps the escape character \ in the query. I would like to get rid of that. I have tried using cat(), but that is for printing to the console, and I have also tried capture.output() on the cat() string, which still captures the escape characters.
What I would like to see and be stored as an object is the output below. I used cat() to show you exactly what I need (I know I can't use it to store things).
cat('\'from(bucket:\"',my_bucket,'\")|> range(start:',start,',stop:,',stop,')\'', sep = "")
>'from(bucket:"my_bucket")|> range(start:start_time,stop:,stop_time)'
Note the single quotes around the query, beginning at from and ending after the parenthesis after stop_time. In addition, the double quotes must be present around the bucket I call. This is required syntax for the query from R.
I would suggest you try sprintf; I find it much easier for formatting the query properly.
q <- sprintf('from(bucket: "%s") |> range(start: %s, stop: %s)', my_bucket, start, stop)
Anyway, the same can be done with paste:
q <- paste('from(bucket: "',my_bucket,'") |> range(start: ',start,',stop: ',stop,')',sep = "")
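A short sketch may help show that the backslashes only appear when R prints the string; they are not stored in it. The variable values are the placeholders from the question, and range_stop is a name chosen here to avoid reusing stop.

```r
my_bucket  <- "my_bucket"
start      <- "start_time"
range_stop <- "stop_time"   # placeholder value, as in the question

q <- sprintf('from(bucket: "%s") |> range(start: %s, stop: %s)',
             my_bucket, start, range_stop)

print(q)      # print() shows R's escaped representation, with \" for each quote
cat(q, "\n")  # cat() shows the actual characters held in the string

# The string itself contains no backslash character.
has_backslash <- grepl("\\", q, fixed = TRUE)
has_backslash
```

Because the escapes are only part of how print() displays the value, q can be passed to client$query() as it is.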

RSQLite dbGetQuery with input from Data Frame

I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This gives me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
This code should work:
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Using paste to paste in data into a query is generally a bad idea, due to SQL injection (whether truly injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized query every time it sees the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
# db is the connection object from the question
dbGetQuery(db, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.
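The points above can be sketched end to end as follows. The table contents and addresses are invented for illustration, and the database part only runs if DBI and RSQLite happen to be installed; the placeholder string and parameter list are built in base R either way.

```r
# Hypothetical lookup values; note the embedded single quote in one of them.
companies <- c("Apple", "Microsoft", "O'Reilly")

# One "?" per value, so the SQL text never contains the data itself.
placeholders <- paste(rep("?", length(companies)), collapse = ", ")
sql <- sprintf(
  "SELECT name, registered_address FROM company WHERE LOWER(name) IN (%s)",
  placeholders
)
params <- as.list(tolower(companies))

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # Toy in-memory table standing in for the real database.
  db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(db, "company", data.frame(
    name = c("Apple", "O'Reilly", "Initech"),
    registered_address = c("address A", "address B", "address C")
  ))
  print(DBI::dbGetQuery(db, sql, params = params))
  DBI::dbDisconnect(db)
}
```

The value containing a quote is bound as a parameter, so it needs no escaping in the query text.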

Using pl/sql trunc function with dbplyr in R

I am trying to use dbplyr and the trunc function from pl/sql, to mutate the date column to the start of the month.
df %>% mutate(start_month = sql(trunc(date_column, 'month'))
however this throws an error invalid identifier when executing the query. I think it is because when it is parsed to pl/sql as a string the query reads select .... trunc("date_column",'month') as start_month so it doesn't recognise as a column name due to the quotes inside the sql function.... Any ideas on how to do this another way or get around this error would be great.
You can probably achieve this by removing the quote marks from month and possibly the sql function from inside your mutate.
dbplyr works by translating dplyr commands into the equivalent sql. Where there is no translation it defaults to leaving the command as is. You can make use of this feature to pass in sql commands.
I recommend trying
df %>% mutate(start_month = TRUNC(date_column, MONTH))
As dbplyr translations are not defined for TRUNC or MONTH, these should appear in your PL/SQL query in the same way as they appear in your R code:
SELECT ... TRUNC(date_column, MONTH) AS start_month
I recommend writing the commands you do not want translated in capitals because R is case sensitive but sql is not. So even if month is an R function, MONTH is probably not an R function and hence there is no chance dbplyr will try and translate it.
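If the bare call does not render the way you hope, another option is to pass the whole expression to sql() as a single string, which dbplyr inserts into the query verbatim. The sketch below is an assumption-laden illustration rather than a tested recipe for your database: it uses a simulated Oracle connection and a made-up one-column lazy frame, and it only runs if dplyr and dbplyr are installed.

```r
# The raw SQL fragment to pass through untranslated.
trunc_sql <- "TRUNC(date_column, 'MONTH')"

if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("dbplyr", quietly = TRUE)) {
  library(dplyr)
  library(dbplyr)

  # Hypothetical table with a single date column, backed by a simulated
  # Oracle connection so no real database is needed.
  lf <- lazy_frame(date_column = Sys.Date(), con = simulate_oracle())

  # sql() marks the string as literal SQL, so it is not quoted or translated.
  lf %>%
    mutate(start_month = sql(trunc_sql)) %>%
    show_query()
}
```

With this form the string literal 'MONTH' stays inside the SQL fragment, so its quoting is not left to dbplyr.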

Date shows up as number

I have fetched a set of dates from postgresql, they look correct:
[1] "2007-07-13" "2007-07-14" "2007-07-22" "2007-07-23" "2007-07-24"
[6] "2007-07-25" "2007-08-13" "2007-08-14" "2007-08-15" "2007-08-16"
etc.
Then I want to run a loop on them to make new sql sentences to fetch some other data sets (yes, I know what I am doing, it would not have been possible to do all the processing in the database server)
So I tried
for(date in geilodates)
mapdate(date,geilo)
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: invalid input syntax for type date: "13707"
LINE 1: ...id_date_location where not cowid is null and date='13707' or...
mapdate is a function I have written, the use of date within that is
sql=paste('select * from gps_coord where cowid=',cowid," and date='",date,"'",sep='')
So, what has happened is that R silently converted my formatted dates to their integer representations before I tried to paste the SQL together.
How do I get the original textual representation of the date? I tried
for(date in geilodates){
d=as.Date(date,origin="1970-01-01")
mapdate(d,geilo)
}
Error in charToDate(x) :
character string is not in a standard unambiguous format
And I have not managed to find any other functions to create a date string (or to "serve" the date as the string I get when listing the variable).
Thanks to wush978 for pointing me in the right direction. In the end I had to do:
for(d in geilodates){
date=format(as.Date(d,origin="1970-01-01"))
mapdate(date,geilo)
}
For some reason, inside the loop the "date" variable was seen as an integer, so I explicitly had to convert it to a date and then format it.
try ?format.Date
x <- Sys.Date()
class(x)
class(format(x))
In R, an object of class Date is stored as a number (days since 1970-01-01).
The standard way to represent a Date as a string is to call format().
I suspect the format of date is not defined in your case, so paste does something unexpected.
Maybe you need to put format(x, "%Y-%m-%d") in your paste call instead of date, to tell R which format you want for the Date.
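The behaviour in the question comes from the for loop itself: iterating directly over a Date vector drops the class and yields plain numbers. A small sketch, using two of the dates from the question and a simplified version of the query (the cowid condition is omitted), shows one way to keep the text form by looping over indices instead.

```r
geilodates <- as.Date(c("2007-07-13", "2007-07-14"))

# Looping over the vector directly hands back bare numbers, not Dates.
for (d in geilodates) print(class(d))

# Looping over indices keeps the Date class, so format() can be applied.
statements <- character(0)
for (i in seq_along(geilodates)) {
  d <- format(geilodates[i], "%Y-%m-%d")
  statements <- c(statements,
                  sprintf("select * from gps_coord where date='%s'", d))
}
writeLines(statements)
```

Converting up front with for (d in format(geilodates)) would work as well, since the loop then iterates over character strings.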

Resources