Using Dates with RSQLite

How do you write a SQL query with a date using RSQLite? Here is an example below. The dbGetQuery call does not return any rows.
require(RSQLite)
require(ggplot2)
data(presidential)
m <- dbDriver("SQLite")
tmpfile <- tempfile('presidential', fileext='.db')
conn <- dbConnect(m, dbname=tmpfile)
dbWriteTable(conn, "presidential", presidential)
dbGetQuery(conn, "SELECT * FROM presidential WHERE Date(start) >= Date('1980-01-01')")

Just to illustrate, this works fine:
tmpfile <- tempfile('presidential', fileext='.db')
conn <- dbConnect(m, dbname=tmpfile)
p <- presidential
p$start <- as.character(p$start)
p$end <- as.character(p$end)
dbWriteTable(conn, "presidential", p)
dbGetQuery(conn, "SELECT * FROM presidential WHERE start >= '1980-01-01'")
You can read about the lack of native date types in SQLite in the docs here. I've been using strings as dates for so long in SQLite that I'd actually forgotten about the issue completely.
And yes, I've written a small R function that converts any Date column in a data frame to character. For simple comparisons, keeping them in YYYY-MM-DD is enough, and if I need to do arithmetic I convert them after the fact in R.
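A minimal sketch of what such a helper might look like (the function name is mine, not from the original answer):
# hypothetical helper: convert every Date column to character before dbWriteTable()
dates_to_char <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  df[is_date] <- lapply(df[is_date], as.character)
  df
}
dbWriteTable(conn, "presidential", dates_to_char(presidential))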

Following on from #joran's answer, here's a simple one-liner to convert the date columns of a data frame to strings (using dplyr and lubridate's is.Date):
library(dplyr)
mutate(df, across(where(lubridate::is.Date), ~ format(.x, "%Y.%m.%d")))

I found working with RSQLite and dplyr to be the most convenient way to stay type-consistent when using R and SQLite. In particular, extended_types = TRUE ensures that columns of type DATE, DATETIME / TIMESTAMP, and TIME are mapped to the corresponding R classes (for RSQLite versions 2.2.8 and later).
library(dplyr)
library(RSQLite)
library(ggplot2)
data(presidential)
mydb <- dbConnect(SQLite(), "presidential.sqlite", extended_types = TRUE)
dbWriteTable(mydb, "presidential", presidential)
tbl(mydb, "presidential") %>%
  filter(start >= as.Date("1980-01-01")) %>%
  collect()
You can also formulate the latter collection as a plain query with dbGetQuery():
dbGetQuery(mydb, "SELECT * FROM presidential WHERE start >= CAST('1980-01-01' AS DATE)")

As #joran suggests, keeping dates as text in SQLite seems like the best way to go for the time being.
I used #Richard Knight's approach for the conversion on the way in, but with ISO format, to change the dates to strings before writing the data frame:
local_df %>% mutate(across(where(lubridate::is.Date), ~ format(.x, "%Y-%m-%d")))
Manipulating the dates remotely can be done using SQL translation, in particular:
remote_df %>% mutate(date_as_number = julianday(date_as_string))
remote_df %>% mutate(date_as_string = date(date_as_number))
N.b. that is date(), not as.Date(), in the second one. This is because as.Date would be translated to CAST(date_as_number AS DATE), whereas what we want is SQLite's date() function applied to the floating point number returned by julianday().
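If you want to verify what is actually sent to the database, show_query() makes the translation visible (a sketch; remote_df is a dbplyr tbl as above):
remote_df %>%
  mutate(date_as_number = julianday(date_as_string)) %>%
  show_query()
dbplyr leaves functions it does not recognise, such as julianday(), untranslated in the generated SQL, which is what makes this trick work.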
Mapping the remote date strings back into dates can be done automatically, if you override collect():
collect <- function(remote_df, ...) {
  raw = remote_df %>% dplyr::collect(...)
  # character columns that look like ISO date strings -> Date
  isoDateString = function(x) return(is.character(x) & all(na.omit(stringr::str_detect(x, "[0-9]{4}-[0-9]{2}-[0-9]{2}"))) & !all(is.na(x)))
  raw = raw %>% mutate(across(where(isoDateString), ~ as.Date(.x, "%Y-%m-%d")))
  # double columns named like dates, with values in a plausible Julian day range -> Date
  maybeJulian = function(x) {return(is.double(x) & all(na.omit(x > 2440587.5)) & all(na.omit(x < 2488069.5)) & !all(is.na(x)))}
  raw = raw %>% mutate(across(matches(".*(D|d)ate.*") & where(maybeJulian), ~ as.Date(.x - 2440587.5, origin = "1970-01-01")))
  return(raw)
}
The apparently random numbers in the maybeJulian function are the Julian day numbers corresponding to 1970-01-01 and 2100-01-01.
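A quick sanity check of those constants (the Julian day number is days since 1970-01-01 plus 2440587.5):
as.numeric(as.Date("1970-01-01")) + 2440587.5 # 2440587.5
as.numeric(as.Date("2100-01-01")) + 2440587.5 # 2488069.5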

Related

R: JDBC result query using a loop

I am trying to run several queries with one chunk of code to save time. Normally, I enter the variable and time, and return the results one at a time. However, there should be a way to loop several variables through with one line of code and collect the results in a data.frame. However, I keep getting the error "Error in dbSendQuery(con, sql) : Unable to retrieve JDBC result set..."
#enter sensor
sensor <- a
#enter start
t_start <- c("2022-07-29 04:54:00.0")
#enter end
t_end <- c("2022-08-01 23:59:59.0")
#open connection
con <- openConn()
s <- tbl(con, in_schema("doc", "sensors"))
consol_final <- s %>%
  filter(timeStamp >= t_start & t_end >= timeStamp,
         topic %in% sensor) %>%
  select("timeStamp", "ID", "running", "value") %>%
  collect() %>%
  arrange(timeStamp)
closeConn()
There must be a way to do this much faster with iteration than by changing the inputs manually. Here is my attempt, which won't work no matter how much I tinker with it:
solution <- list()
sensor_list <- c(a, b, c, d, e, f) #note each of these is an object containing a topic URL
s <- tbl(con, in_schema("doc", "sensors"))
con <- openConn()
for (i in 1:length(sensor_list)){
  consol_final <- s %>%
    filter(timeStamp >= t_start & t_end >= timeStamp,
           topic %in% sensor_list[i]) %>%
    select("timeStamp", "ID", "running", "value") %>%
    collect() %>%
    arrange(timeStamp)
  rbind(consol_final, solution)
}
I am very limited in my understanding of JDBC queries, so it is difficult to wrap my head around the problem here. There must also be a solution to iterate through several time periods stored in a list. If possible, please advise on this as well. Thank you!!
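One way the loop might be restructured (a sketch, assuming openConn(), sensor_list, t_start and t_end as above): the connection needs to exist before tbl() is called, and each iteration's result has to be assigned and accumulated, e.g. with lapply() and bind_rows():
con <- openConn()
s <- tbl(con, in_schema("doc", "sensors"))
# one collected data frame per sensor, then combine
solution <- lapply(sensor_list, function(sensor) {
  s %>%
    filter(timeStamp >= t_start & t_end >= timeStamp,
           topic %in% sensor) %>%
    select("timeStamp", "ID", "running", "value") %>%
    collect()
})
consol_final <- dplyr::bind_rows(solution) %>% arrange(timeStamp)
The same pattern extends to several time periods by looping over a list of start/end pairs.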

How to lookup data and print values based on criteria in R?

So I have a CSV file that has 12 columns of data. What I want to do is get specific values from the CSV file based on the desired criteria.
A snip of the data is provided, so I have this list of Maps:
Maps <- c("Nuke","Vertigo","Inferno","Mirage","Train","Overpass","Dust2")
The goal is to get the CTWinProb & TWinProb values for each of the maps in the Maps list, e.g. CTWinProbs:
Nuke = 0.5758
Dust2 = 0.4965
Inferno = 0.4885
etc., and vice versa for TWinProb.
So far I have been using the sqldf library, which is very tedious. This is what I am currently doing:
T1NukeCT <- sqldf("select CTWinProb from Team1 where MapName like '%Nuke%'")
which outputs T1NukeCT = 0.5758,
and then repeating for each map and again for TWinProb.
I am sure there is an easier way; I'm quite new to R, so I'm not 100% sure of the best method here or how to go about it in a less tedious manner.
You may use a WHERE IN (...) clause:
Maps <- c("Nuke","Vertigo","Inferno","Mirage","Train","Overpass","Dust2")
where_in <- paste0("('", paste(Maps, collapse="','"), "')")
sql <- paste0("SELECT CTWinProb FROM Team1 WHERE MapName IN ", where_in)
T1NukeCT <- sqldf(sql)
To be clear, the SQL query generated by the above script is:
SELECT CTWinProb
FROM Team1
WHERE MapName IN ('Nuke','Vertigo','Inferno','Mirage','Train','Overpass','Dust2')
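Note that this only returns the probability column; to keep track of which value belongs to which map, you could select MapName alongside it (a sketch along the same lines):
sql <- paste0("SELECT MapName, CTWinProb, TWinProb FROM Team1 WHERE MapName IN ", where_in)
sqldf(sql)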
What output/results are you looking for exactly?
If you want results in R, these are two simple functions to return the desired values.
They require the dplyr and readr packages to be loaded.
library(dplyr)
library(readr)
YourData <- read_csv("./yourfile.csv")
CTWinFunc <- function(x){
  YourData %>% filter(MapName == x) %>% pull(CTWinProb)
}
TWinFunc <- function(x){
  YourData %>% filter(MapName == x) %>% pull(TWinProb)
}
Now CTWinFunc("Nuke") should return CTWinProb result for Nuke, ie: 0.5758
And TWinFunc("Nuke") should return TWinProb result for Nuke, ie: 0.4242
If you want to return a vector with all the results together, I guess you could use the sapply() function. Something like this...
TWins <- sapply(Maps, TWinFunc)
TWins[lengths(TWins)==0] <- NA
TWins <- unlist(TWins)
And this should give you a table with the results:
cbind(Maps, TWins)
Of course, it seems like all this data is already in the original table and you could just subset that.
YourData[,c(4,11,12)]
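Equivalently, and more robustly than positional indices, you can subset by column name (assuming the column names used above):
YourData[, c("MapName", "CTWinProb", "TWinProb")]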

How to read data from a database by chunk in R?

In dplyr, if tbl is a table in a database then head(tbl) gets translated into
select *
from tbl
limit 6
but there doesn't seem to be a way to use the offset keyword to read data in chunks. E.g. the equivalent of
select *
from tbl
limit 6 offset 5
doesn't seem possible with dplyr. In dbplyr, there is a do() function that lets you choose a chunk_size to bring back data chunk by chunk.
Is that the only way to do it in R? The solution doesn't have to be in dplyr or the tidyverse.
Another approach would be to construct your own offset function. This assumes your database supports it, and the function is unlikely to be transferable to databases of other types.
Something like the following:
library(dplyr)
library(dbplyr)

offset_head = function(table, num, offset){
  # get the DBI connection underlying the lazy tbl
  db_connection = table$src$con
  # render the current query and append LIMIT/OFFSET manually
  sql_query = build_sql(con = db_connection,
                        sql_render(table),
                        "\nLIMIT ", num,
                        "\nOFFSET ", offset)
  return(tbl(db_connection, sql(sql_query)))
}
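Usage might look like this (a sketch; con is assumed to be an open DBI connection):
my_tbl = tbl(con, "tbl")
offset_head(my_tbl, 6, 5) # equivalent to LIMIT 6 OFFSET 5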
The way I have done this in dbplyr is based on the addition of a reference/ID column:
my_tbl = tbl(con, "table_name")
for(i in 0:99){ # remainders of %% 100 run from 0 to 99
  sub_tbl = my_tbl %>% filter(ID %% 100 == i)
  # further processing using 'sub_tbl'
  ...
}
If you add a row number to your dataset, then your filter could be replaced by filter(LowerBound < row_number & row_number < UpperBound).
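A sketch of such a chunked read (row_number is assumed to be a pre-computed column in the table, and chunk_size is arbitrary):
chunk_size <- 10000
num_rows <- my_tbl %>% summarise(n = n()) %>% collect() %>% pull(n)
for (lower in seq(0, num_rows, by = chunk_size)) {
  chunk <- my_tbl %>%
    filter(row_number > lower, row_number <= lower + chunk_size) %>%
    collect()
  # process 'chunk' here
}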

Can I dynamically generate a WITH RESULT SETS clause for a stored procedure in SQL Server 2016?

I have a stored procedure where I execute an R script, and return a data frame as the output. The procedure queries a linked server, performs a variety of transformations on the result of the original query, and returns the resulting data frame.
Here are details about my environment:
Product: Microsoft SQL Server Enterprise (64-bit)
Operating System: Microsoft Windows NT 6.3 (9600)
Platform: NT x64
Version: 13.0.5216.0
The script in the stored procedure is as follows:
EXECUTE sp_execute_external_script
@language = N'R',
@script = N'
library(jsonlite)
library(purrr)
library(tidyr)
library(dplyr)
library(lubridate)
##store initial query as a data frame
goldInstrumentDF <- data.frame(InputDataSet)
##set load_ts as a timestamp
formattedDF <- transform(goldInstrumentDF, load_ts = ymd_hms(as.character(goldInstrumentDF$load_ts)))
##set all other column values as characters
i <- sapply(formattedDF, is.factor)
formattedDF[i] <- lapply(formattedDF[i], as.character)
##unpack json object
firstTransform <- formattedDF %>%
mutate(event = map(event, ~ fromJSON(.) %>% as.data.frame())) %>%
unnest(event)
##store load events in a data frame
loadEvents <- firstTransform[firstTransform$object.method == "IMM_Equipment_RePro_Load_Event", ]
##store equipment events in a data frame
equipmentEvents <- firstTransform[firstTransform$object.method != "IMM_Equipment_RePro_Load_Event", ]
##parse the initial characters from the non-standard data exchange format
equipmentSubstring <- equipmentEvents %>% mutate(object.object = substring(equipmentEvents$object.object, 19))
##remove the curly bracket from the end of the non-standard data exchange format
equipmentSubstring2 <- equipmentSubstring %>% mutate(object.object = gsub(''.$'', '''', equipmentSubstring$object.object))
##remove the single quotes from the non-standard data exchange format
equipmentSubstring3 <- equipmentSubstring2 %>% mutate(object.object = gsub("''", "", equipmentSubstring2$object.object))
##split the data from the non-standard data exchange format into a header and a value
namev<-function(x) {
a<-strsplit(x,"=")
setNames(sapply(a,''['',2), sapply(a,''['',1))
}
##turn each row into a named vector
secondTransform <- lapply(strsplit(equipmentSubstring3$object.object, ","), namev)
##find list of all column names
thirdTransform <- unique(unlist(sapply(secondTransform, names)))
##extract data from all rows for every column
fourthTransform <-do.call(rbind, lapply(secondTransform, ''['', thirdTransform))
##rejoin with original data
fifthTransform <-cbind(equipmentSubstring3[,-25], fourthTransform)
##remove extraneous columns
drops <- c(" error", "object.object", NA)
sixthTransform <- fifthTransform[ , !(names(fifthTransform) %in% drops)]
##output the data frame
OutputDataSet <- as.data.frame(sixthTransform)',
@input_data_1 = N'SELECT * FROM openquery(KMhivehttp, ''select * from dmfwk_gold.instrumentapps_event;'');'
Is there any way for me to dynamically define a WITH RESULT SETS clause based upon the data in my data frame?
No, you cannot do that, as WITH RESULT SETS is part of the stored procedure execution. I guess you'd want to do that due to your sixthTransform, where you remove erroneous columns?
If that is the case (removal of columns), perhaps it would be better if you returned the erroneous columns, but had them assigned a "magic" value? That way you'd always get a deterministic resultset back.
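On the R side of the script, that suggestion might look like the following sketch (the expected column list is hypothetical):
##ensure a deterministic set of output columns, filling absent ones with a sentinel
expectedCols <- c("load_ts", "object.method", "plant", "line") ##hypothetical names
missingCols <- setdiff(expectedCols, names(sixthTransform))
sixthTransform[missingCols] <- "N/A" ##the "magic" value
OutputDataSet <- as.data.frame(sixthTransform[expectedCols])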

Global variable inside an sqlQuery

I have a variable that stores a time string.
library(lubridate)
date_n <- today() - years(2)
And I want to use date_n within the following sqlQuery call.
transactions_july <- sqlQuery(con,
"select DATA, VREME, PARTIJA, IZNOS
from pts
where DATA > '2016-08-10'")
So basically, date_n would replace the date - '2016-08-10'.
Any ideas?
You can use sprintf:
transactions_july <- sqlQuery(con,
  sprintf("select DATA, VREME, PARTIJA, IZNOS
           from pts where DATA > '%s'", date_n))
The quoted %s will be replaced by the value of date_n, so the date ends up as a proper SQL string literal.
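To see what the substitution produces, try it with a fixed date (sprintf formats a Date through as.character):
sprintf("where DATA > '%s'", as.Date("2016-08-10"))
# [1] "where DATA > '2016-08-10'"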
And for SQL queries you can also use sqldf.
