Insert URL path into database using dbplyr - r

I'm trying to insert a url into a postgresql database using
db_insert_into(con, "url", "http://www.google.com")
Error in file(fn, open = "r") : cannot open the connection
In addition: Warning message:
In file(fn, open = "r") :
cannot open file 'http:/www.google.com': No such file or directory
How can I solve this?

You need to specify both the table name and the field name. I'm going to guess that "url" is the field name and that the table name is as yet undefined here. It doesn't really matter, though: take the solution and adapt it as needed.
The expectation of db_insert_into is that values (the third argument) is a data.frame or something that can easily be converted to one. So you can probably do:
newdata <- data.frame(url = "http://www.google.com", stringsAsFactors = FALSE)
db_insert_into(con, "tablename", newdata)
If you're lazy or playing code-golf, you might be able to do it with:
db_insert_into(con, "tablename", list(url = "http://google.com"))
since some of the underlying S3 or S4 methods around dbplyr sometimes check if (!is.data.frame(values)) values <- as.data.frame(values). (But I wouldn't necessarily rely on that, it's usually better to be explicit.)
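To see what that coercion would produce, here is a minimal base-R sketch with no database involved. It only compares the explicit data frame with what as.data.frame() makes of a named list; whether a given db_insert_into method actually performs this coercion is an assumption, as noted above.

```r
# Explicit one-row data frame, as in the first suggestion.
explicit <- data.frame(url = "http://www.google.com", stringsAsFactors = FALSE)

# What an internal as.data.frame() coercion would make of a named list.
# stringsAsFactors is passed explicitly so the result does not depend on
# the default of the installed R version.
from_list <- as.data.frame(list(url = "http://www.google.com"),
                           stringsAsFactors = FALSE)

# A bare string, as in the original call, carries no column name at all.
bare <- "http://www.google.com"

str(explicit)
str(from_list)
is.null(names(bare))
```

Both data frames carry the column name "url", which is the piece of information the bare string in the original call was missing.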

Related

How to append a R data frame into Redshift?

I'm trying to upload a data frame into SQL, but I keep receiving errors. I've tried the following function:
DBI::dbWriteTable(rposgre_rsql, name = "my_schema.the_table", value = base, row.names = FALSE, append=TRUE)
That function returns the error:
RPosgreSQL error: could not Retrieve the result : ERROR: syntax error at or near "STDIN"
So I tried:
insert <- "INSERT INTO my_schema.the_table VALUES base"
M2_results <- RPostgreSQL::dbSendQuery(conn = rposgre_rsql, statement = insert)
but it returns:
RPosgreSQL error: could not Retrieve the result : ERROR: syntax error at or near "base"
I'm positive the connection works, since I can either select from the table or use "dbExistsTable", but I don't understand why it doesn't work with INSERT INTO. The connection is for a corporate environment, so maybe it's a permission issue? Also, I don't quite understand what "STDIN" is.

RODBC connection issue

I am trying to use RODBC to connect to an Access database. I have used the same structure several times in this project with success. However, in this instance it is now failing and I cannot figure out why. The code is not really a reprex, as I can't provide the DB, but...
This works for a single table:
library(magrittr);library(RODBC)
#xWalk_path is simply the path to the accdb
#xtabs generated by querying the available tables
x=1
tab=xtabs$TABLE_NAME[x]
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
odbcCloseAll()
#that worked perfectly
However, I really want to use this in a function so I can read several similar tables into a list. As a function it does not work:
xWalk_ls<- lapply(seq_along(xtabs$TABLE_NAME), function(x, xWalk_path=xWalk_path, tab=xtabs$TABLE_NAME[x]){
#print(tab) #debug code
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
return(temp)
odbcCloseAll()
})
#error every time
The above code will return the error:
Warning in odbcDriverConnect(con, ...) :
[RODBC] ERROR: Could not SQLDriverConnect
Warning in odbcDriverConnect(con, ...) : ODBC connection failed
Error in RODBC::sqlFetch(., tab, stringsAsFactors = FALSE) :
first argument is not an open RODBC channel
I am baffled. I accessed the db to pull table names and generate the xtabs variable using sqlTables. Also, earlier in my code I used a similar code structure (not identical, but the same core: sqlFetch to retrieve a table into a list) and it worked without a problem. The only difference between then and now is this: then I was opening and closing different .accdb files, but pulling the same table name from each. Now I am opening and closing the same .accdb file, but pulling different sheet names each time.
Am I somehow opening and closing this too fast and it is getting irritated with me? That seems unlikely, because if I force it to print(tab) as the first line of the function it will only print the first table name. If it were getting annoyed about the speed of opening and closing, I would expect it to print 2 table names before throwing the error.
return() returns its argument and exits the function immediately, so the remaining code (odbcCloseAll()) is never executed and the opened file (the Access DB) remains locked, as you supposed.
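A minimal base-R sketch of that behaviour, and of one common way around it with on.exit(), follows. The open_channel() and close_channel() functions are hypothetical stand-ins that only record what was called; they are not RODBC functions, and in real code the cleanup would be the RODBC close call on the opened channel.

```r
# Event log kept in an environment so the helper functions can append to it.
events <- new.env()
events$log <- character(0)

# Hypothetical stand-ins for opening and closing a database channel.
open_channel <- function() {
  events$log <- c(events$log, "open")
  "channel"
}
close_channel <- function() {
  events$log <- c(events$log, "close")
  invisible(NULL)
}

# Pattern from the question: the close call sits after return(),
# so it is never reached.
fetch_leaky <- function() {
  ch <- open_channel()
  return(ch)
  close_channel()
}

# on.exit() registers the cleanup up front; it runs when the function
# exits, whether through return() or through an error.
fetch_safe <- function() {
  ch <- open_channel()
  on.exit(close_channel(), add = TRUE)
  return(ch)
}

fetch_leaky()
leaky_log <- events$log

events$log <- character(0)
fetch_safe()
safe_log <- events$log

print(leaky_log)
print(safe_log)
```

The first log contains only the open event, while the second also records the close, which is the step the function in the question never reaches.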

rquery: Connect to specific schema in Postgres DB

The rquery package has been out for some time now, but the documentation is still very sparse. There isn't even a tag for it on SO yet; this question will create it.
Maybe there is someone who can help me nevertheless.
I want to connect to a schema in my Postgres DB via rquery to read the data into R with all the speed it promises.
Using this code it works with all the tables in the public-schema.
library(RPostgres)
library(rquery)
con <- dbConnect(RPostgres::Postgres(),
host = #####,
dbname = #####,
user = #####,
password = ######)
df <- db_td(con, "tablename") %.>%
execute(con, .)
To access a table in a specific schema, db_td() has the argument qualifiers =, which is an
optional named ordered vector of strings carrying
additional db hierarchy terms, such as schema
So I did:
db_td(db, "tablename", qualifiers = c(schema = "schema"))
But:
Error in result_create(conn@ptr, statement) : Failed to prepare
query: FEHLER: Relation »tablename« existiert nicht LINE 1: SELECT
* FROM "tablename" LIMIT 1
So the qualifiers = argument seems to be completely ignored.
My question is thus pretty basic:
How can I connect to a schema in a PostgresDB via rquery?
All my attempts to solve this "within" rquery seem to fail miserably, but you can work around it by doing something like:
dbExecute(con, "SET search_path = foo_schema, public;")
before you run db_td.
I think it's caused by rq_colnames doing:
paste0("SELECT * FROM ", quote_identifier(db, table_name),
" LIMIT 1")
and hence not doing anything with its qualifiers. At least, this matches the error I get back.
If this isn't enough, maybe report a bug/issue with rquery.
I have created an issue on GitHub. So far, regular rquery indeed doesn't have schema ability. The development version of rquery (1.3.4), however, has basic schema ability as of today.
To be installed via:
library(devtools)
install_github("WinVector/rquery", host = "https://api.github.com")
Here's a small instruction. It seems to have been intended to work just as I was trying in my question.
Be careful though, rquery hasn't been fully tested in schema-mode and some things might not work.
EDIT: rquery now has full schema support.

No applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

I am trying to transfer data from the Thingspeak API into a postgres database. The API limits each request to 8000 observations, but I need to pull millions! I'm using R to iteratively pull from the API, do a bunch of wrangling, and then submit the results as a data.frame to my table within the db.
The current way I am doing this relies on the dbWriteTable() function from the RPostgres package. However, this method does not account for existing observations in the db. I have to manually DELETE FROM table_name before running the script or I'll end up writing duplicate observations each time I try to update the db. I'm still wasting time re-writing observations that I deleted, and the script takes ~2 days to complete because of this.
I would prefer a script that incorporates the functionality of PostgreSQL 9.5's ON CONFLICT DO NOTHING clause, so I don't have to waste time re-uploading observations that are already within the db. I've found the st_write() and st_read() functions from the sf package to be useful for running SQL queries directly from R, but have hit a roadblock. Currently, I'm stuck trying to upload the 8000 observations within each df from R to my db. I am getting the following error:
Connecting to database:
# db, host, port, pw, and user are all objects in my R environment
con <- dbConnect(drv = RPostgres::Postgres()
,dbname = db
,host = host
,port = port
,password = pw
,user = user)
Current approach using RPostgres:
dbWriteTable(con
,"table_name"
,df
,append = TRUE
,row.names = FALSE)
New approach using sf:
st_write(conn = conn
,obj = df
,table = 'table_name'
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE)
Error message:
Error in UseMethod("st_write") :
no applicable method for 'st_write' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
Edit:
Converting to strictly a dataframe, i.e. df <- as.data.frame(df) or attributes(df)$class <- "data.frame", resulted in a similar error message, only without the tbl_df or tbl classes.
Most recent approach with sf:
I'm making some progress with using st_write() by changing to the following:
# convert geom from WKT to feature class
df$geom <- st_as_sfc(df$geom)
# convert from data.frame to sf class
df <- st_as_sf(df)
# write sf object to db
st_write(dsn = con # changed from drv to dsn argument
,geom_name = "geom"
,table = "table_name"
,query = "INSERT INTO table_name ON CONFLICT DO NOTHING;"
,drop_table = FALSE
,try_drop = FALSE
,debug = TRUE
)
New Error:
Error in result_create(conn@ptr, statement) :
Failed to fetch row: ERROR: type "geometry" does not exist at character 345
I'm pretty sure that this is because I have not yet installed the PostGIS extension within my PostgreSQL database. If anyone could confirm, I'd appreciate it! Installing PostGIS is a pretty lengthy process, so I won't be able to provide an update for a few days. I'm hoping I've solved the problem with the st_write() function though!
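For the ON CONFLICT DO NOTHING part of the question, one commonly used pattern that does not involve sf is to write each batch to a staging table and then copy it into the target with a single INSERT ... SELECT. The sketch below is only an illustration under several assumptions: the target table has a primary key or unique constraint (otherwise no conflict is ever detected), the staging columns match the target's column types (a WKT geometry column would likely need an explicit cast), and the names insert_ignore_sql, append_new_rows, quote_ident and staging_tmp are all hypothetical.

```r
# Simplified identifier quoting for the sketch; real code could use
# DBI::dbQuoteIdentifier() with the live connection instead.
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

# Build the statement that copies rows from the staging table into the
# target, skipping rows that would violate a unique constraint.
insert_ignore_sql <- function(target, staging) {
  paste0("INSERT INTO ", quote_ident(target),
         " SELECT * FROM ", quote_ident(staging),
         " ON CONFLICT DO NOTHING;")
}

# Hypothetical wrapper. It needs a live connection, so it is only defined
# here and not called.
append_new_rows <- function(con, df, target, staging = "staging_tmp") {
  # Write the batch to a temporary table, replacing any previous batch.
  DBI::dbWriteTable(con, staging, df, temporary = TRUE, overwrite = TRUE)
  # Let the database discard the rows that already exist in the target.
  DBI::dbExecute(con, insert_ignore_sql(target, staging))
}

sql <- insert_ignore_sql("table_name", "staging_tmp")
cat(sql, "\n")
```

With a connection like con from the question, each 8000-row batch would then go through something like append_new_rows(con, df, "table_name"), so duplicates are dropped by the database rather than deleted and re-uploaded.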

Using R package BerkeleyEarth

I'm working for the first time with the R package BerkeleyEarth, and attempting to use its convenience functions to access the BEST data. I think maybe it's just a problem with their servers (a matter I've separately addressed to the package's maintainer) but I wanted to know if it's instead something silly I'm doing.
To reproduce my fault
library(BerkeleyEarth)
downloadBerkeley()
which provides the following error message
trying URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
Error in download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
cannot open URL 'http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip'
In addition: Warning message:
In download.file(urls$Url[thisUrl], destfile = file.path(destDir, :
InternetOpenUrl failed: 'A connection with the server could not be established'
Has anyone had a better experience using this package?
The error message points to a different URL than the ones listed at http://berkeleyearth.org/data/ for the zip-formatted files. There is also another set of .nc files that appear to be more recent. I would replace the entries in the BerkeleyUrls dataframe with the ones that match your analysis strategy:
This is the current URL that should be in position 1,1:
http://berkeleyearth.lbl.gov/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip
And this is the one that is in the package dataframe:
> BerkeleyUrls[1,1]
[1] "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip"
I suppose you could try:
BerkeleyUrls[, 1] <- sub( "download\\.berkeleyearth\\.org", "berkeleyearth.lbl.gov", BerkeleyUrls[, 1])
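The effect of that substitution can be checked without the package. In this small sketch the one-row data frame urls is a stand-in for BerkeleyUrls (its real shape is an assumption; only the URL in position 1,1 is taken from above, and the column name Url is inferred from the error message).

```r
# Stand-in for the package's BerkeleyUrls data frame, holding only the
# entry shown above.
urls <- data.frame(
  Url = "http://download.berkeleyearth.org/downloads/TAVG/LATEST%20-%20Non-seasonal%20_%20Quality%20Controlled.zip",
  stringsAsFactors = FALSE
)

# Swap the old host for the new one; the dots are escaped so they match
# literal dots rather than any character.
urls[, 1] <- sub("download\\.berkeleyearth\\.org", "berkeleyearth.lbl.gov",
                 urls[, 1])

print(urls[1, 1])
```

The rewritten entry matches the URL given above as the one that should be in position 1,1.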
