No applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

I am trying to transfer data from the Thingspeak API into a postgres database. The API limits each request to 8000 observations, but I need to pull millions! I'm using R to iteratively pull from the API, do a bunch of wrangling, and then submit the results as a data.frame to my table within the db.
My current approach relies on the dbWriteTable() function from the RPostgres package. However, this method does not account for observations already in the db: I have to manually DELETE FROM table_name before running the script, or I end up writing duplicate observations each time I try to update the db. Even then I waste time re-writing observations that I just deleted, and the script takes ~2 days to complete because of this.
I would prefer a script that incorporates the functionality of Postgres 9.5's ON CONFLICT DO NOTHING clause, so I don't have to waste time re-uploading observations that are already in the db. I've found the st_write() and st_read() functions from the sf package useful for running SQL queries directly from R, but have hit a roadblock. Currently, I'm stuck trying to upload the 8000 observations within each df from R to my db. I am getting the following error:
Connecting to database:
# db, host, port, pw, and user are all objects in my R environment
con <- dbConnect(drv = RPostgres::Postgres()
,dbname = db
,host = host
,port = port
,password = pw
,user = user)
Current approach using RPostgres:
dbWriteTable(con
,"table_name"
,df
,append = TRUE
,row.names = FALSE)
New approach using sf:
st_write(conn = con
,obj = df
,table = 'table_name'
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE)
Error message:
Error in UseMethod("st_write") :
no applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
Edit:
Converting strictly to a data.frame, i.e. df <- as.data.frame(df) or attributes(df)$class <- "data.frame", resulted in a similar error message, only without the tbl_df and tbl classes.
Most recent approach with sf:
I'm making some progress with using st_write() by changing to the following:
# convert geom from WKT to feature class
df$geom <- st_as_sfc(df$geom)
# convert from data.frame to sf class
df <- st_as_sf(df)
# write sf object to db
st_write(dsn = con # changed from conn to dsn argument
,obj = df
,geom_name = "geom"
,table = "table_name"
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE
)
New Error:
Error in result_create(conn@ptr, statement) :
Failed to fetch row: ERROR: type "geometry" does not exist at character 345
I'm pretty sure that this is because I have not yet installed the PostGIS extension in my PostgreSQL database. If anyone could confirm, I'd appreciate it! Installing PostGIS is a pretty lengthy process, so I won't be able to provide an update for a few days. I'm hoping I've solved the problem with the st_write() function, though!
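In the meantime, a fallback I'm considering is to stage each 8000-row batch with plain RPostgres and let the server handle the conflicts. This is only a sketch of the idea: it assumes table_name has a primary key for ON CONFLICT to check against and that the staging table's columns line up with the target's; the staging_batch name is made up.
library(DBI)
# write the current batch to a temporary staging table
dbWriteTable(con
,"staging_batch"
,df
,temporary = TRUE
,overwrite = TRUE)
# merge into the real table, skipping rows that already exist
dbExecute(con
,"INSERT INTO table_name SELECT * FROM staging_batch ON CONFLICT DO NOTHING;")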

Related

In sparklyr, how do I save an ml_transformer for future use?

Let's say I want to train a feature transformer on a dataset that has a particular distribution, then apply it to different datasets later on, where it would be computationally inefficient to recalculate the feature transformer.
I would like to be able to save the transformer in one Spark session, then load it in a different one. However, I am currently unable to do so. Consider the following example code:
library(dplyr)
library(sparklyr)
sc <- spark_connect(master = 'local[1]')
siris <- copy_to(sc, iris)
decile <- sc %>%
ft_quantile_discretizer(input_col = 'Sepal_Length', output_col = 'decile', num_buckets = 10) %>%
ml_fit(siris)
saveRDS(decile, 'decile.rds')
#ml_save(decile, 'decile.rds', overwrite = T)
spark_disconnect(sc)
Then I would like to load that object and run it, like so.
library(dplyr)
library(sparklyr)
sc <- spark_connect(master = 'local[1]')
siris <- copy_to(sc, iris)
decile <- readRDS('decile.rds')
#decile <- ml_load(sc, 'decile.rds')
final <- ml_transform(decile, siris)
The problem is, regardless of which method I use to save the object, I get an error. If I use saveRDS, I get an error after I load the object and attempt to use it (for example: java.lang.IllegalArgumentException: Object not found 40). If I use ml_save, I get an error when saving the object (Error: org.apache.spark.SparkException: Job aborted., followed by a massive stack trace).
It seems that for whatever reason the spark connection is saved as part of the object. I also tried overwriting that with the new connection via
decile$.jobj$connection <- sc
Unfortunately, the object also stores an id, but that appears to be associated with an object created at ml_fit time and cannot be pointed towards the dataset that I want to transform. There is no need for this particular object to have a Spark connection: the information necessary to apply the transformation is available in decile$param_map, so the worst case is that I could write up a mutate to handle this, as sketched below. However, I would prefer to use the built-in sparklyr capabilities if possible, and I would like to do this with other transformers, not just ft_quantile_discretizer.
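For reference, that worst-case fallback might look something like the following. It is only a sketch: it assumes the fitted split points can be pulled out of the model in the first session (e.g. from decile$param_map) and saved as plain R data, and the decile_splits.rds file name is hypothetical.
library(dplyr)
library(sparklyr)
sc <- spark_connect(master = 'local[1]')
siris <- copy_to(sc, iris)
# hypothetical: a numeric vector of split points extracted from the fitted
# discretizer in the first session and saved with saveRDS()
splits <- readRDS('decile_splits.rds')
# a bucketizer with fixed splits reproduces the fitted discretizer
final <- ft_bucketizer(siris, input_col = 'Sepal_Length', output_col = 'decile', splits = splits)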

Grant SQL permissions in PostgreSQL using R

I'm accessing a PostgreSQL database through the R library RPostgreSQL. The following line successfully reads my table into object DF:
DF <- dbReadTable(conn = con, name = c("my_schema","my_table"))
However, attempting to write back into the database with the following line throws ERROR: permission denied for schema my_schema:
dbWriteTable(conn = con, name = c("my_schema", "my_table"), value = DF)
I've discovered from the question Writing to specific schemas with RPostgreSQL that the solution is to SET search_path = my_schema, public;, but I have no idea how to run this from the R console. I've tried lines such as dbSendQuery(conn = con, statement = "SET search_path = my_schema, public;"), and I recognize that setting permissions is not really querying at all, but there's no dbSetPermissions function in RPostgreSQL.
I'm clearly missing something fundamental since the answer to the aforementioned question satisfied the user who asked it, so I appreciate your patience.
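For what it's worth, here is a sketch of what I'm imagining, assuming the connected role actually has the right to run these statements (my_user is a placeholder):
# dbSendQuery() accepts any SQL statement, not just SELECTs
res <- dbSendQuery(conn = con, statement = "SET search_path TO my_schema, public;")
dbClearResult(res)
# permission changes can be sent the same way
res <- dbSendQuery(conn = con, statement = "GRANT ALL ON SCHEMA my_schema TO my_user;")
dbClearResult(res)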

How can I unserialize a model object using PL/R in Greenplum/Postgres?

Error unserializing model object in Greenplum via PL/R
I store model objects in a Greenplum database (the open source version). I've successfully been able to serialize my model objects, insert them into a table in Greenplum, and unserialize them when needed, but only using R 3.5 installed on my local machine. This is the R code below that runs successfully:
Code:
fromtable = 'modelObjDevelopment'
mod.id = '7919'
model_obj <-
dbGetQuery(conn,
sprintf("SELECT val from standard.%s where model_id::int = '%s';",
fromtable, mod.id))
iter_model <- postgresqlUnescapeBytea(model_obj$val)
lm_obj_back <- unserialize(iter_model)
summary(lm_obj_back)
Recently, I installed PL/R on Greenplum along with all the libraries that I generally use. I am attempting to recreate the local R code above so that it runs inside Greenplum. After much research, I have been trying to run the following transformed code, which keeps failing with the same error.
Code:
DROP FUNCTION IF EXISTS mdl_load(val bytea);
CREATE FUNCTION mdl_load(val bytea)
RETURNS text AS
$$
require("RPostgreSQL")
iter_model<-postgresqlUnescapeBytea(val)
model<-unserialize(iter_model)
return(length(val))
$$
LANGUAGE 'plr';
select length(val::bytea) as len, mdl_load(val) as t
from modelObjDevelopment
where model_id::int = 7919
At this point I don't care what I return, I just want the unserialize function to work.
Error:
[22000] ERROR: R interpreter expression evaluation error Detail: Error in unserialize(iter_model) : unknown input format Where: In PL/R function mdl_load
Hope someone has had a similar issue and might have a clue for me. It seems that the bytea object changes size after being passed into PL/R. I am new to this method and hope someone can help.
$$
require(RPostgreSQL)
## load the PostgreSQL driver
drv <- dbDriver("PostgreSQL")
## connect to the default db
con <- dbConnect(drv, dbname = 'XXX')
## select the escaped bytea, then unescape and unserialize it in R
rows <- dbGetQuery(con, "SELECT encode(val::bytea, 'escape') AS val from standard.modelObjDevelopment where model_id::int = 1234")
iter_model <- postgresqlUnescapeBytea(rows$val)
model <- unserialize(iter_model)
$$
We solved this problem together. For future people coming to this site: fetching and unserializing the model object inside the R code, as above, is the way to go.
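Putting the pieces together, the complete PL/R function might look roughly like this. It is a sketch only; the dbname, the text return value, and the exact signature are placeholders:
DROP FUNCTION IF EXISTS mdl_load(mod_id int);
CREATE FUNCTION mdl_load(mod_id int)
RETURNS text AS
$$
require(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = 'XXX')
## fetch the escaped bytea from inside PL/R so it survives the round trip
rows <- dbGetQuery(con, sprintf(
"SELECT encode(val::bytea, 'escape') AS val from standard.modelObjDevelopment where model_id::int = %s;", mod_id))
model <- unserialize(postgresqlUnescapeBytea(rows$val))
dbDisconnect(con)
return(class(model)[1])
$$
LANGUAGE 'plr';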

Casting PostgresqlResult to R dataframe

I am trying to fetch data from a local PostgreSQL instance into R. I need to work with parameterized queries because the queries will later depend on the user's input.
res <- postgresqlExecStatement(con, "SELECT * FROM patient_set WHERE
instance_id = $1", c(100))
postgresqlFetch(res,n=-1)
postgresqlCloseResult(res)
dataframe = data.frame(res)
dbDisconnect(con)
Unfortunately this still gives me the following error:
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "structure("PostgreSQLResult", package ="RPostgreSQL")" to a data.frame
I also tried switching to dbGetQuery and dbBind but didn't get it running properly. What is the best way to fetch the result of parameterized queries from Postgresql directly into an R dataframe or table?
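For reference, this is roughly the dbSendQuery()/dbBind() pattern I have been attempting, assuming a DBI backend that supports $1 placeholders (e.g. RPostgres):
library(DBI)
# send the parameterized query, bind the value, then fetch the rows
res <- dbSendQuery(con, "SELECT * FROM patient_set WHERE instance_id = $1")
dbBind(res, list(100))
dataframe <- dbFetch(res, n = -1)
dbClearResult(res)
dbDisconnect(con)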

R: How to use RJDBC to download blob data from oracle database?

Does anyone know of a way to download blob data from an Oracle database using RJDBC package?
When I do something like this:
library(RJDBC)
drv <- JDBC(driverClass=..., classPath=...)
conn <- dbConnect(drv, ...)
blobdata <- dbGetQuery(conn, "select blobfield from blobtable where id=1")
I get this message:
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Ongeldig kolomtype.: getString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
Well, the message is clear ("Ongeldig kolomtype" is Dutch for "invalid column type"), but I still hope there is a way to download blobs. I read something about getBinary() as a way of getting blob information. Can I find a solution in that direction?
The problem is that RJDBC tries to convert every SQL data type it reads to either double or String in Java. Typically this trick works because the JDBC driver for Oracle has routines to convert different data types to String (accessed by the getString() method of the java.sql.ResultSet class). For BLOBs, though, the getString() method was discontinued at some point. RJDBC still tries calling it, which results in the error.
I tried digging into the guts of RJDBC to see if I could get it to call the proper function for BLOB columns. Apparently the solution requires modifying the fetch S4 method in this package and also the result-grabbing Java class within the package. I'll try to get this patch to the package maintainers. Meanwhile, here is a quick and dirty fix using rJava (assuming conn and q as in your example):
s <- .jcall(conn@jc, "Ljava/sql/Statement;", "createStatement")
r <- .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", q, check = FALSE)
listraws <- list()
col_num <- 1L
i <- 1
# iterate over the result set, reading each BLOB as a raw byte vector
while (.jcall(r, 'Z', 'next')) {
  listraws[[i]] <- .jcall(r, '[B', 'getBytes', col_num)
  i <- i + 1
}
This retrieves list of raw vectors in R. The next steps depend on the nature of data - in my application these vectors represent PNG images and can be handled pretty much as file connections by png package.
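For instance, if the raw vectors are PNG images, they can be written straight to disk (the file-name pattern here is made up):
# write each retrieved BLOB to its own file
for (i in seq_along(listraws)) {
  writeBin(listraws[[i]], sprintf("blob_%03d.png", i))
}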
Done using R 3.1.3, RJDBC 0.2-5, Oracle 11-2 and OJDBC driver for JDK >= 1.6
