R text substitution in an ODBC sqlQuery

I'm trying to use text substitution in R to insert custom dates into a SQL query that I run over an ODBC connection.
For example, I'd like to set date1 to 2016-01-31 and have the query run with that value. However, substituting the value with bquote doesn't seem to work.
Any ideas?
library("RODBC")
date1 <- c("2016-12-31")
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,"
SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<=bquote(.date1)
) AS a
")
odbcClose(myconn)

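This is a job for the infuser package. In the question, bquote() is never evaluated because it sits inside a string literal, so the database receives the text bquote(.date1) verbatim. infuser instead fills {{placeholder}} markers in a template with the values you supply, in this case date1.
library(RODBC)
library(infuser)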
date1 <- c("2016-12-31")
sql_query_template <- "SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<='{{date1}}'
) AS a;"
sql_query <- infuse(sql_query_template, date1 = date1)
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,sql_query)
odbcClose(myconn)
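If adding a dependency is not an option, the same substitution can be sketched in base R. The helper name build_query below is hypothetical, and passing the value through as.Date() is an assumption about what counts as valid input; it makes unparseable strings fail before anything reaches the database.

```r
# Hypothetical helper: fill a date into the query text with base R only.
build_query <- function(date) {
  # as.Date() raises an error for strings it cannot parse as a date,
  # and format() re-emits only the parsed date as YYYY-MM-DD.
  date <- format(as.Date(date), "%Y-%m-%d")
  sprintf("SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date <= '%s'
) AS a", date)
}

sql_query <- build_query("2016-12-31")
cat(sql_query, "\n")
```

The resulting string can be passed to sqlQuery(myconn, sql_query) exactly as in the answer above.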

Related

Hive and RJDBC: Datatype date is not detected -> thus: converted to dataframe R column type "character"

I am using RStudio to process data from Hive via RJDBC, which returns the result of a select statement as a data frame. Unfortunately, the Hive column types "date" and "timestamp" do not seem to be recognized, so those columns come back as character from dbReadTable(conn, "db2.ibor_lending"), which is bad.
Do you have any idea about this? I don't want to recast the character columns to dates in R because that 1. adds overhead, 2. leads to coupling, and 3. increases maintenance effort.
library(DBI)
library(rJava)
library(RJDBC)
print("Attempting Hive Connection...")
hadoop.class.path = list.files(path=c("/usr/hdp/current/hadoop-client"),pattern="jar", full.names=TRUE)
hadoop.client.lib = list.files(path=c("/usr/hdp/current/hadoop-client/lib"),pattern="jar", full.names=TRUE)
hive.class.path = list.files(path=c("/usr/hdp/current/hive-client/lib"),pattern="jar", full.names=TRUE)
hadoop.hdfs.lib.path = list.files(path=c("/usr/hdp/current/hadoop-hdfs-client"),pattern="jar",full.names=TRUE)
zookeeper.lib.path = list.files(path=c("/usr/hdp/current/zookeeper-client"),pattern="jar",full.names=TRUE)
mapred.class.path = list.files(path=c("/usr/hdp/current/hadoop-mapreduce-client"),pattern="jar",full.names=TRUE)
cp = c(hive.class.path,mapred.class.path,hadoop.class.path,hadoop.client.lib,hadoop.hdfs.lib.path)
.jinit(classpath=cp, parameters="-Djavax.security.auth.useSubjectCredsOnly=false")
drv <- JDBC("org.apache.hive.jdbc.HiveDriver","/usr/hdp/current/hive-client/lib/hive-jdbc.jar",identifier.quote="`")
conn <- dbConnect(drv,"jdbc:hive2://xxx:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST#yyyy")
show_databases <- dbGetQuery(conn, "select * from db2.ibor_lending LIMIT 100")
show_datatypes <- dbGetQuery(conn, "describe db2.ibor_lending")
show_table <- dbReadTable(conn, "db2.ibor_lending")
The result is:
Hive (describe db2.ibor_lending):
col_name    data_type  comment
cutoffdate  timestamp
R data frame:
ibor_lending.cutoffdate  character
Br, Dennis
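If the driver keeps returning character columns, one way to limit the coupling is to let the describe output drive the conversion, so that no column names are hard-coded in R. The sketch below is a fallback under stated assumptions, not a fix inside RJDBC: convert_by_hive_type is a hypothetical helper, the UTC time zone is assumed, and the two data frames are stand-ins for the results of the dbGetQuery calls above.

```r
# Hypothetical helper: cast columns based on the types Hive reports.
convert_by_hive_type <- function(df, types, tz = "UTC") {
  # Result columns may carry a table prefix such as "ibor_lending.cutoffdate".
  bare_names <- sub("^.*\\.", "", names(df))
  for (i in seq_len(nrow(types))) {
    hits <- which(bare_names == trimws(as.character(types$col_name[i])))
    hive_type <- tolower(trimws(as.character(types$data_type[i])))
    for (j in hits) {
      if (hive_type == "timestamp") {
        df[[j]] <- as.POSIXct(df[[j]], tz = tz)  # tz is an assumption
      } else if (hive_type == "date") {
        df[[j]] <- as.Date(df[[j]])
      }
    }
  }
  df
}

# Stand-ins for dbGetQuery(conn, "describe ...") and the fetched rows.
types <- data.frame(col_name = "cutoffdate", data_type = "timestamp",
                    stringsAsFactors = FALSE)
rows <- data.frame(ibor_lending.cutoffdate = "2016-12-31 10:15:00",
                   stringsAsFactors = FALSE)
converted <- convert_by_hive_type(rows, types)
class(converted$ibor_lending.cutoffdate)
```

This still converts in R, so the overhead remains, but the mapping follows the table definition instead of being maintained by hand.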

How to use a subquery in postgresql using R

query2 <- dbGetQuery(con,paste0("select uniqueno from
postgres.asset where location_id in (",dbGetQuery(con,"select
location_id from postgres.location where city_name
='",cityname,"')")
Both postgres.asset and postgres.location are Postgres tables.
The error is raised for the inner query, which reports that it cannot find the connection parameter.
How can I get rid of the error? I suspect it comes from misplacing the paste0 call.
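Run the inner query first, then query postgres.asset once per returned location_id:
locationid <- dbGetQuery(con, paste0("select location_id from postgres.location where city_name = '", cityname, "'"))
# Keep every result; assigning to a plain variable would overwrite it on each pass
uniqueid <- vector("list", nrow(locationid))
for (i in seq_len(nrow(locationid))) {
  uniqueid[[i]] <- dbGetQuery(con, paste0("select uniqueno from postgres.asset where location_id = '", locationid$location_id[i], "'"))
}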
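The two steps can also be folded into a single statement, so that the database evaluates the subquery itself. This is a minimal sketch: the cityname value is made up, and escaping single quotes by doubling them is a simplification, not a substitute for a parameterized query.

```r
cityname <- "O'Fallon"  # made-up example value

# Double any single quote so the value stays inside the SQL string literal.
safe_city <- gsub("'", "''", cityname, fixed = TRUE)

sql <- paste0(
  "select uniqueno from postgres.asset where location_id in (",
  "select location_id from postgres.location where city_name = '",
  safe_city, "')"
)
cat(sql, "\n")
# query2 <- dbGetQuery(con, sql)  # con is the open connection from the question
```

The whole SQL text is built first and dbGetQuery is called once, which avoids nesting one dbGetQuery call inside the string of another.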

Global variable inside an sqlQuery

I have a variable that stores a time string.
library(lubridate)
date_n <- today() - years(2)
And I want to use the date_n within the following sqlQuery.
transactions_july <- sqlQuery(con,
"select DATA, VREME, PARTIJA, IZNOS
from pts
where DATA > '2016-08-10'")
So basically, date_n would replace the date - '2016-08-10'.
Any ideas?
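You can use sprintf to build the query string:
transactions_july <- sqlQuery(con,
sprintf("select DATA, VREME, PARTIJA, IZNOS
from pts where DATA > '%s'", format(date_n, "%Y-%m-%d")))
The %s is replaced by the value of date_n. The single quotes around it are needed so that the database receives a date literal rather than a bare token such as 2016-08-10, which SQL would read as a subtraction.
If you want to run SQL against R data frames, you can also use sqldf.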

RPostgreSQL: datetime convert as date

I'm using the RPostgreSQL package to load data from a PostgreSQL data base.
The problem is that a datetime column (POSIXct) is automatically converted into a date.
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="abc",host="def ",port=1234,user="ghi",password="jkl" )
Instead of using this:
df = dbGetQuery(con, "
SELECT customer_id, dttm_utc
FROM schema.table;")
I have to use that:
df = dbGetQuery(con, "
SELECT customer_id, to_char(dttm_utc, 'MM-DD-YYYY HH24:MI:SS') as dttm_utc
FROM schema.table;")
If I don't, I lose the time and only get dates back.
I noticed this problem doesn't occur if I only fetch the first 1,000 rows. It appears almost every time there are more than 300,000 rows.
How can I fix this?
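The to_char workaround returns the timestamps as text. A minimal sketch of turning that text back into POSIXct in R, assuming the values are UTC (as the column name suggests) and using a made-up data frame in place of the query result:

```r
# Stand-in for the data frame returned by the to_char query.
df <- data.frame(customer_id = 1:2,
                 dttm_utc = c("12-31-2016 23:59:58", "01-01-2017 00:00:03"),
                 stringsAsFactors = FALSE)

# The format string mirrors the to_char pattern 'MM-DD-YYYY HH24:MI:SS'.
df$dttm_utc <- as.POSIXct(df$dttm_utc, format = "%m-%d-%Y %H:%M:%S", tz = "UTC")
str(df)
```

This does not explain why the time part is dropped on large result sets; it only restores a proper datetime type after the workaround.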

Add to contents of vector to use in an RJDBC query

I'm using an RJDBC connection to query results from a Vertica database into R. I'm creating a comma-separated string of zip codes that I then paste into my query, as shown below.
b <- paste("'20882'", "'01441'", "'20860'", "'02139'", sep = ", ")
SQL <- paste("select zip, count(*)
from tablea a
inner join tableb b on a.id = b.id
inner join tablec c on c.col = b.col
where b.zip in (", b, ") group by 1 order by 1", sep = " ")
result <- dbGetQuery(vertica, SQL)
I'm using this in a loop within a function, where I'm going to be adding zip codes to vector b. Is there an easy way to do this?
I've been trying, but I can't add items to the vector in a way that still lets the query execute.
Something like the following
b <- c(add_zip, b)
which could then be re-run in the body of the query.
Any suggestions?
Thanks,
Ben
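One way to approach this is to keep the zip codes as a plain character vector and build the quoted, comma-separated list only when the query text is assembled. A minimal sketch, with add_zip holding a made-up value:

```r
zips <- c("20882", "01441", "20860", "02139")
add_zip <- "10001"  # made-up value added inside the loop
zips <- unique(c(add_zip, zips))

# Quote each element, then collapse into one comma-separated string.
in_list <- paste0("'", zips, "'", collapse = ", ")

SQL <- paste0("select zip, count(*)
from tablea a
inner join tableb b on a.id = b.id
inner join tablec c on c.col = b.col
where b.zip in (", in_list, ") group by 1 order by 1")
cat(SQL, "\n")
# result <- dbGetQuery(vertica, SQL)  # vertica is the connection from the question
```

Because the vector holds unquoted values, c(add_zip, zips) works as intended and the quoting happens in one place.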
