RPostgreSQL: datetime converted as date

I'm using the RPostgreSQL package to load data from a PostgreSQL database.
The problem is that a datetime column (POSIXct) is automatically converted into a date.
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "abc", host = "def", port = 1234, user = "ghi", password = "jkl")
Instead of using this:
df = dbGetQuery(con, "
SELECT customer_id, dttm_utc
FROM schema.table;")
I have to use that:
df = dbGetQuery(con, "
SELECT customer_id, to_char(dttm_utc, 'MM-DD-YYYY HH24:MI:SS') as dttm_utc
FROM schema.table;")
If I don't, I lose the time component and only get dates back.
I noticed this problem doesn't occur if I only fetch the first 1000 rows. It appears almost every time there are more than 300,000 rows.
How can I fix this?
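For completeness, a hedged sketch of that workaround end to end: pull the timestamp as text with to_char() (exactly as above) and rebuild the POSIXct column on the R side. The tz = "UTC" choice is my assumption, based only on the column name dttm_utc.
df <- dbGetQuery(con, "
SELECT customer_id, to_char(dttm_utc, 'MM-DD-YYYY HH24:MI:SS') as dttm_utc
FROM schema.table;")
# Convert the text back to POSIXct; the format mirrors the to_char() mask above
df$dttm_utc <- as.POSIXct(df$dttm_utc, format = "%m-%d-%Y %H:%M:%S", tz = "UTC")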

Related

Hive and RJDBC: date datatype is not detected and is converted to R data frame column type "character"

I am using RStudio with RJDBC to process data from Hive. RJDBC returns the result of a SELECT statement as a data frame, but the Hive column types "date" and "timestamp" are not recognized, so they are converted to character during dbReadTable(conn, "db2.ibor_lending"), which is bad.
Do you have any idea about this? I don't want to recast the characters back to dates in R because that (1) adds overhead, (2) leads to coupling, and (3) increases maintenance effort.
library(DBI)
library(rJava)
library(RJDBC)
print("Attempting Hive Connection...")
hadoop.class.path = list.files(path=c("/usr/hdp/current/hadoop-client"),pattern="jar", full.names=T);
hadoop.client.lib = list.files(path=c("/usr/hdp/current/hadoop-client/lib"),pattern="jar", full.names=T);
hive.class.path = list.files(path=c("/usr/hdp/current/hive-client/lib"),pattern="jar", full.names=T);
hadoop.hdfs.lib.path = list.files(path=c("/usr/hdp/current/hadoop-hdfs-client"),pattern="jar",full.names=T);
zookeeper.lib.path = list.files(path=c("/usr/hdp/current/zookeeper-client"),pattern="jar",full.names=T);
mapred.class.path = list.files(path=c("/usr/hdp/current/hadoop-mapreduce-client"),pattern="jar",full.names=T);
cp = c(hive.class.path, mapred.class.path, hadoop.class.path, hadoop.client.lib, hadoop.hdfs.lib.path, zookeeper.lib.path)
.jinit(classpath=cp, parameters="-Djavax.security.auth.useSubjectCredsOnly=false")
drv <- JDBC("org.apache.hive.jdbc.HiveDriver","/usr/hdp/current/hive-client/lib/hive-jdbc.jar",identifier.quote="`")
conn <- dbConnect(drv,"jdbc:hive2://xxx:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST#yyyy")
show_databases <- dbGetQuery(conn, "select * from db2.ibor_lending LIMIT 100")
show_datatypes <- dbGetQuery(conn, "describe db2.ibor_lending")
show_table <- dbReadTable(conn, "db2.ibor_lending")
The result is:
Hive (describe): col_name = cutoffdate, data_type = timestamp
R data frame: ibor_lending.cutoffdate is character
Br, Dennis
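No answer is recorded here, but if recasting in R turns out to be acceptable despite the concerns above, a hedged sketch: the "describe" output already fetched into show_datatypes can drive a generic post-processing helper. The ibor_lending. column prefix follows the output shown above; everything else reuses names from the question.
restore_hive_types <- function(df, types) {
  # types: the data frame returned by dbGetQuery(conn, "describe db2.ibor_lending")
  for (i in seq_len(nrow(types))) {
    col <- paste0("ibor_lending.", types$col_name[i])
    if (!col %in% names(df)) col <- types$col_name[i]
    if (!col %in% names(df)) next
    if (types$data_type[i] == "date")      df[[col]] <- as.Date(df[[col]])
    if (types$data_type[i] == "timestamp") df[[col]] <- as.POSIXct(df[[col]])
  }
  df
}
show_table <- restore_hive_types(show_table, show_datatypes)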

Timezone shift and losing milliseconds when inserting dates into Oracle using the R odbc package

I'm switching over from using the ROracle package to using the odbc package to connect to Oracle. Using ROracle, I was able to insert datetimes with milliseconds into a table that had a field with the timestamp data type. Using the odbc package, milliseconds are lost. Additionally, when I query back the date I just inserted, the time shifts four hours forward (I'm in Eastern so presumably it's shifting to UTC time). I've confirmed the time is being inserted correctly into Oracle. Is there an option that can be set so that milliseconds are retained and is there a way to prevent the time from shifting?
library(odbc)
options(digits.secs = 6)
Sys.setenv(TZ = "EST5EDT",
           ORA_SDTZ = "EST5EDT")
conn <- DBI::dbConnect(odbc::odbc(),
                       driver = "Oracle12c",
                       uid = rstudioapi::showPrompt(title = "username", message = "username", default = ""),
                       pwd = rstudioapi::askForPassword(),
                       dbq = "dbname",
                       timezone = Sys.timezone())
DBI::dbExecute(conn, "create table test_table (datetime timestamp(6))")
df <- data.frame(DATETIME = Sys.time(), stringsAsFactors = FALSE)
# the time has milliseconds in R
print(df)
# insert data
res <- dbSendStatement(conn, "insert into test_table (datetime) values (:1)")
dbBind(res, df)
dbGetRowsAffected(res)
dbClearResult(res)
# the time does not have milliseconds when read back from Oracle and is shifted four hours forward
dbGetQuery(conn, "select * from test_table")
Pass the timestamp value as a string and convert it in the SQL:
insert into test_table (datetime) values (TO_TIMESTAMP(:1, 'DD-MON-YYYY HH24:MI:SS.FF'))
Pass your parameter as a string which matches the format in the TO_TIMESTAMP function, e.g.
"22-MAR-2019 17:46:57.123456"
and change the date format to whatever you're comfortable with.
Best of luck
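A hedged sketch of that suggestion in R, reusing conn and test_table from the question. The exact format string and the toupper() call are my assumptions (Oracle's MON token expects month abbreviations, and %b is locale-dependent):
# Format the POSIXct as text; %OS6 keeps six fractional-second digits
df_str <- data.frame(DATETIME = toupper(format(Sys.time(), "%d-%b-%Y %H:%M:%OS6")),
                     stringsAsFactors = FALSE)
res <- dbSendStatement(conn,
  "insert into test_table (datetime) values (TO_TIMESTAMP(:1, 'DD-MON-YYYY HH24:MI:SS.FF6'))")
dbBind(res, df_str)
dbClearResult(res)
# Oracle parses the literal itself, so the fractional seconds survive and
# no client-side timezone conversion is applied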

Global variable inside an sqlQuery

I have a variable that stores a time string.
library(lubridate)
date_n <- today() - years(2)
And I want to use the date_n within the following sqlQuery.
transactions_july <- sqlQuery(con,
"select DATA, VREME, PARTIJA, IZNOS
from pts
where DATA > '2016-08-10'")
So basically, date_n would replace the date - '2016-08-10'.
Any ideas?
You can use sprintf:
transactions_july <- sqlQuery(con,
  sprintf("select DATA, VREME, PARTIJA, IZNOS
           from pts where DATA > '%s'", date_n))
The '%s' will be replaced by date_n as you want; note the single quotes around it, so the date is passed as a SQL string literal.
For SQL queries you can also use the sqldf package.
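A quick way to sanity-check the interpolated SQL before sending it (a small sketch, assuming the same names as above):
library(lubridate)
date_n <- today() - years(2)
sql <- sprintf("select DATA, VREME, PARTIJA, IZNOS from pts where DATA > '%s'", date_n)
cat(sql)  # inspect the literal query; sprintf coerces the Date to "YYYY-MM-DD"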

R text substitution in odbc sqlquery

I'm trying to use text substitution in R to put custom dates into my SQL query over an ODBC connection.
For example, I could change date1 to 2016-01-31 and the query would run with that date. However, bquote() text replacement doesn't seem to work inside the query string.
Any ideas?
library("rodbc")
date1 <- c("2016-12-31")
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,"
SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<=bquote(.date1) -- note: bquote() is just text inside this string; R never evaluates it
) AS a
")
odbcClose(myconn)
This is a job for package infuser. It allows you to change one part of the SQL request, in this case date1.
library(infuser)
date1 <- c("2016-12-31")
sql_query_template <- "SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<='{{date1}}'
) AS a;"
sql_query <- infuse(sql_query_template, date1 = date1)
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,sql_query)
odbcClose(myconn)
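If you'd rather avoid an extra dependency, sprintf() from the previous question works here too; a minimal sketch, assuming the same table and date1:
sql_query <- sprintf("SELECT a.*
FROM (SELECT id, status_code, rate_plan, publication, active_count
      FROM prod_view.fct_active
      WHERE snap_start_date <= '%s'
     ) AS a;", date1)
# then pass sql_query to sqlQuery() before closing the connection, as above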

Using Dates with RSQLite

How do you write a SQL query with a date using RSQLite? In the example below, dbGetQuery does not return any rows.
require(RSQLite)
require(ggplot2)
data(presidential)
m <- dbDriver("SQLite")
tmpfile <- tempfile('presidential', fileext='.db')
conn <- dbConnect(m, dbname=tmpfile)
dbWriteTable(conn, "presidential", presidential)
dbGetQuery(conn, "SELECT * FROM presidential WHERE Date(start) >= Date('1980-01-01')")
Just to illustrate, this works fine:
tmpfile <- tempfile('presidential', fileext='.db')
conn <- dbConnect(m, dbname=tmpfile)
p <- presidential
p$start <- as.character(p$start)
p$end <- as.character(p$end)
dbWriteTable(conn, "presidential", p)
dbGetQuery(conn, "SELECT * FROM presidential WHERE start >= '1980-01-01'")
You can read about SQLite's lack of native date types in its documentation. I've been using strings as dates in SQLite for so long that I'd actually forgotten about the issue completely.
And yes, I've written a small R function that converts any Date column in a data frame to character. For simple comparisons, keeping them in YYYY-MM-DD is enough, and if I need to do arithmetic I convert them after the fact in R.
Following on from #joran's answer, here's a simple dplyr one-liner to convert the Date columns of a data frame to strings (lubridate::is.Date is used because base R does not provide is.Date):
library(dplyr)
mutate(df, across(where(lubridate::is.Date), ~ format(.x, "%Y.%m.%d")))
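Hypothetical usage with the presidential data from the question (overwrite = TRUE only so the sketch can be re-run against the table created above):
p <- mutate(presidential, across(where(lubridate::is.Date), ~ format(.x, "%Y-%m-%d")))
dbWriteTable(conn, "presidential", p, overwrite = TRUE)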
I found working with RSQLite and dplyr to be the most convenient way to stay type-consistent using R and SQLite. In particular, extended_types = TRUE ensures that columns of type DATE, DATETIME / TIMESTAMP, and TIME are mapped to the corresponding R classes (available since RSQLite 2.2.8).
library(dplyr)
library(RSQLite)
library(ggplot2)
data(presidential)
mydb <- dbConnect(SQLite(), "presidential.sqlite", extended_types = TRUE)
dbWriteTable(mydb, "presidential", presidential)
tbl(mydb, "presidential") %>%
filter(start >= as.Date("1980-01-01")) %>%
collect()
You can also express the same collection as a direct query:
dbGetQuery(mydb, "SELECT * FROM presidential WHERE start >= CAST('1980-01-01' AS DATE)")
As #joran suggests, keeping dates as text in SQLite seems like the best way to go for the time being.
I used #Richard Knight's approach for the conversion in, but with ISO format, converting the dates to strings before writing the data frame:
local_df %>% mutate(across(where(lubridate::is.Date), ~ format(.x, "%Y-%m-%d")))
Manipulating the dates remotely can be done using sql translation, particularly:
remote_df %>% mutate(date_as_number = julianday(date_as_string))
remote_df %>% mutate(date_as_string = date(date_as_number))
N.b. that is date, not as.Date, in the second one. This is because as.Date would get translated to CAST(date_as_number AS DATE), whereas what we want is SQLite's date() function applied to the floating-point number returned by julianday().
Mapping the remote date strings back into dates can be done automatically when you collect, if you wrap dplyr::collect():
collect <- function(remote_df, ...) {
  raw <- remote_df %>% dplyr::collect(...)
  # Character columns whose non-missing values all look like YYYY-MM-DD
  isoDateString <- function(x) is.character(x) &&
    all(na.omit(stringr::str_detect(x, "[0-9]{4}-[0-9]{2}-[0-9]{2}"))) && !all(is.na(x))
  raw <- raw %>% mutate(across(where(isoDateString), ~ as.Date(.x, "%Y-%m-%d")))
  # Double columns named like "date" whose values fall in a plausible Julian-day range
  maybeJulian <- function(x) is.double(x) &&
    all(na.omit(x > 2440587.5)) && all(na.omit(x < 2488069.5)) && !all(is.na(x))
  raw <- raw %>% mutate(across(matches(".*(D|d)ate.*") & where(maybeJulian),
                               ~ as.Date(.x - 2440587.5, origin = "1970-01-01")))
  return(raw)
}
The apparently random numbers in the maybeJulian function are the Julian day numbers for 1970-01-01 and 2100-01-01.
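Hypothetical usage of the wrapped collect(), assuming a remote table handle like the one from the extended_types answer:
remote_df <- tbl(mydb, "presidential")
local_df <- collect(remote_df)  # ISO date strings and Julian-day columns come back as Date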