Casting PostgresqlResult to R dataframe - r

I am trying to fetch data from a local Postgresql instance into R. I need to work with parameterized queries because the queries will later depend on the users input.
res <- postgresqlExecStatement(con, "SELECT * FROM patient_set WHERE
instance_id = $1", c(100))
postgresqlFetch(res,n=-1)
postgresqlCloseResult(res)
dataframe = data.frame(res)
dbDisconnect(con)
Unfortunately this still gives me the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "structure("PostgreSQLResult", package ="RPostgreSQL")" to a data.frame
I also tried switching to dbGetQuery and dbBind but didn't get it running properly. What is the best way to fetch the result of parameterized queries from Postgresql directly into an R dataframe or table?

Related

How can I unserialize a model object using PL/R in Greenplum/Postgres?

Error unserializing model object in Greenplum via PL/R
I store model objects in a greenplum database (the open source version) and I've successfully been able to serialize my model objects, insert them into a table in greenplum and unserialize when needed, but using R version 3.5 installed on my machine (local). This is the R code below that runs successfully:
Code:
fromtable = 'modelObjDevelopment'
mod.id = '7919'
model_obj <-
dbGetQuery(conn,
sprintf("SELECT val from standard.%s where model_id::int = '%s';",
fromtable, mod.id))
iter_model <- postgresqlUnescapeBytea(model_obj)
lm_obj_back <- unserialize(iter_model)
summary(lm_obj_back)
Recently, I have installed PL/R on greenplum with all the necessary libraries that I generally use. I am attempting to recreate the code I use in local R (mentioned above) to run on greenplum. After much research I have been trying to run the following transformed code, which relentlessly keeps failing and giving me the same error.
Code:
DROP FUNCTION IF EXISTS mdl_load(val bytea);
CREATE FUNCTION mdl_load(val bytea)
RETURNS text AS
$$
require("RPostgreSQL")
iter_model<-postgresqlUnescapeBytea(val)
model<-unserialize(iter_model)
return(length(val))
$$
LANGUAGE 'plr';
select length(val::bytea) as len, mdl_load(val) as t
from modelObjDevelopment
where model_id::int = 7919
At this point I don't care what I return, I just want the unserialize function to work.
Error:
[22000] ERROR: R interpreter expression evaluation error Detail: Error in unserialize(iter_model) : unknown input format Where: In PL/R function mdl_load
Hope someone had a similar issue and might have a clue for me. It seems that the bytea object changes size after being passed into Pl/R. I am new to this method and hope someone can help.
$$
require(RPostgreSQL)
## load the PostgresSQL driver
drv <- dbDriver("PostgreSQL")
## connect to the default db
con <- dbConnect(drv, dbname = 'XXX')
rows<-dbGetQuery(con, 'SELECT encode(val::bytea,'escape') from standard.modelObjDevelopment where model_id::int=1234')
iter_model<-postgresqlUnescapeBytea(rows[[model_obj_column]])
model<-unserialize(iter_model)
$$
We solved this problem together. For future people coming to this site, get and unserialize model object inside R code is the way to go.

No applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

I am trying to transfer data from the Thingspeak API into a postgres database. The API limits each request to 8000 observations, but I need to pull millions! I'm using R to iteratively pull from the API, do a bunch of wrangling, and then submit the results as a data.frame to my table within the db.
The current way I am doing this relies on the dbWriteTable() function from the RPostgres package. However, this method does not account for existing observations in the db. I have to manually DELETE FROM table_name before running the script or I'll end up writing duplicate observations each time I try to update the db. I'm still wasting time re-writing observations that I deleted, and the script takes ~2 days to complete because of this.
I would prefer a script that incorporates the functionality of postgres-9.5' ON CONLFICT DO NOTHING clause, so I don't have to waste time re-uploading observations that are already within the db. I've found the st_write() and st-read() functions from the sf packages to be useful for running SQL queries directly from R, but have hit a roadblock. Currently, I'm stuck trying to upload the 8000 observations within each df from R to my db. I am getting the following error:
Connecting to database:
# db, host, port, pw, and user are all objects in my R environment
con <- dbConnect(drv = RPostgres::Postgres()
,dbname = db
,host = host
,port = port
,password = pw
,user = user)
Current approach using RPostgres:
dbWriteTable(con
,"table_name"
,df
,append = TRUE
,row.names = FALSE)
New approach using sf:
st_write(conn = conn
,obj = df
,table = 'table_name'
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE)
Error message:
Error in UseMethod("st_write") :
no applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
Edit:
Converting to strictly a dataframe, i.e. df <- as.data.frame(df) or attributes(df)$class <- "data.frame", resulted in a similar error message, only without the tbl_df or tbl classes.
Most recent approach with sf:
I'm making some progress with using st_write() by changing to the following:
# convert geom from WKT to feature class
df$geom <- st_as_sfc(df$geom)
# convert from data.frame to sf class
df <- st_as_sf(df)
# write sf object to db
st_write(dsn = con # changed from drv to dsn argument
,geom_name = "geom"
,table = "table_name"
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE
)
New Error:
Error in result_create(conn#ptr, statement) :
Failed to fetch row: ERROR: type "geometry" does not exist at character 345
I'm pretty sure that this is because I have not yet installed the PostGIS extension within my PostgreSQL database. If anyone could confirm I'd appreciate it! Installing PostGIS a pretty lengthy process, so I won't be able to provide an update for a few days. I'm hoping I've solved the problem with the st_write() function though!

dbGetQuery and dbReadTable failing to return a really large table DBI

I have a really large table (8M rows) that I need to import in R on which I will be doing some processing. Problem is when I try to bring it in R using the DBI package I get an error
My code is below
options(java.parameters = "-Xmx8048m")
psql.jdbc.driver <- "../postgresql-42.2.1.jar"
jdbc.url <- "jdbc:postgresql://server_url:port"
pgsql <- JDBC("org.postgresql.Driver", psql.jdbc.driver, "`")
con <- dbConnect(pgsql, jdbc.url, user="", password= '')
tbl <- dbGetQuery(con, "SELECT * FROM my_table;")
And the error I get is
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT * FROM my_table; (Ran out of memory retrieving query results.)
I can understand its because the result set is too big but I am not sure how to retrieve it by batches instead of all of it together. I have tried using dBSendQuery, dbReadTable and dbGetQuery all of them give the same error.
Any help would be appreciated!
I got it to work by using the RPostgreSQL package instead of the default RJDBC and DBI package.
It was able to do a sendQuery and then used fetch recursively to get the data in chunks of 10,000.
main_tbl <- dbFetch(postgres_query, n=-1) #didnt work so tried in chunks
df<- data.frame()
while (!dbHasCompleted(postgres_query)) {
chunk <- dbFetch(postgres_query, 10000)
print(nrow(chunk))
df = rbind(df, chunk)
}

"The requested fetchSize is more than the allowed value in Athena" with JDBC driver

I'm trying to pull data from an Athena DB into R using RJDBC as described in detail on AWS's own blog. Alas, the amount of data I'm trying to pull is substantial and so I'm getting the following error message:
Error in .jcall(rp, "I", "fetch", stride, block) :
java.sql.SQLException: The requested fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try again. Refer to the Athena documentation for valid fetchSize values.
The Athena documentation doesn't actually give any such fetchSize values but I gather from this github issue that the value should be lower than 1000. I gather from the same github issue that there is no way to pass this fetchSize to RJDBC. So are there other ways of querying Athena that respect this limit?
The basic problem is that dbGetQuery doesn't allow one to specify the fetchSize. As per the RJDBC package author one workaround is to call the two functions that dbGetQuery wraps separately and pass the fetchSize to fetch():
q <- dbSendQuery(c, ...)
fetch(q, -1, block=999)
More generally:
setMethod("dbGetQuery", signature(conn="JDBCConnection", statement="character"), def=function(conn, statement, ...) {
r <- dbSendQuery(conn, statement, ...)
on.exit(.jcall(r#stat, "V", "close"))
if (conn#jc %instanceof% "com.amazonaws.athena.jdbc.AthenaConnection") fetch(r, -1, 999) # Athena can only pull 999 rows at a time
else fetch(r, -1)
})
For what it's worth, I fixed this in the AWR.Athena R package, so you can use it if you like.
From RJDBC version >= 0.2-10 you can use dbGetQuery with n = -1 and block = 999 to fetch more than 1000 lines from Athena:
d = dbGetQuery(con, statement = "select * from tmp limit 1001", n = -1, block = 999)

R: How to use RJDBC to download blob data from oracle database?

Does anyone know of a way to download blob data from an Oracle database using RJDBC package?
When I do something like this:
library(RJDBC)
drv <- JDBC(driverClass=..., classPath=...)
conn <- dbConnect(drv, ...)
blobdata <- dbGetQuery(conn, "select blobfield from blobtable where id=1")
I get this message:
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Ongeldig kolomtype.: getString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
Well, the message is clear, but still I hope there is a way to download blobs. I read something about 'getBinary()' as a way of getting blob information. Can I find a solution in that direction?
The problem is that RJDBC tries to convert the SQL data type it reads to either double or String in Java. Typically the trick works because JDBC driver for Oracle has routines to convert different data types to String (accessed by getString() method of java.sql.ResultSet class). For BLOB, though, the getString() method has been discontinued from some moment. RJDBC still tries calling it, which results in an error.
I tried digging into the guts of RJDBC to see if I can get it to call proper function for BLOB columns, and apparently the solution requires modification of fetch S4 method in this package and also the result-grabbing Java class within the package. I'll try to get this patch to package maintainers. Meanwhile, quick and dirty fix using rJava (assuming conn and q as in your example):
s <- .jcall(conn#jc, "Ljava/sql/Statement;", "createStatement")
r <- .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", q, check=FALSE)
listraws <- list()
col_num <- 1L
i <- 1
while(.jcall(r, 'Z', 'next')){
listraws[[i]] <- .jcall(r, '[B', 'getBytes', col_num)
i <- i + 1
}
This retrieves list of raw vectors in R. The next steps depend on the nature of data - in my application these vectors represent PNG images and can be handled pretty much as file connections by png package.
Done using R 3.1.3, RJDBC 0.2-5, Oracle 11-2 and OJDBC driver for JDK >= 1.6

Resources