I am working within Databricks, trying to use the sparklyr function spark_write_jdbc to write a dataframe to a SQL Server table. The server name, driver, etc. are correct and working, as I successfully used sparklyr::spark_read_jdbc() earlier in the code.
Per the documentation (here), spark_write_jdbc should accept a Spark Dataframe.
I used SparkR::createDataFrame() to convert the dataframe I was working with to a Spark dataframe.
Here is the relevant code:
events_long_test <- SparkR::createDataFrame(events_long, schema = NULL, samplingRatio = 1, numPartitions = NULL)
sparklyr::spark_write_jdbc(events_long_test,
                           name = "who_status_long_test",
                           options = list(url = url,
                                          user = user,
                                          driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                                          password = pw,
                                          dbtable = "who_status_long_test"))
However, when I run this, it gives me the following error:
Error in UseMethod("spark_write_jdbc") :
  no applicable method for 'spark_write_jdbc' applied to an object of class "SparkDataFrame"
I have searched around and cannot find other people asking about this error. Why would it say this function cannot work with a Spark Dataframe, when the documentation says it does?
Any help is appreciated.
What is in events_long? The syntax looks correct, so also make sure your connection properties in options are right. More importantly, make sure events_long_test is a sparklyr Spark DataFrame (a tbl_spark), not a SparkR SparkDataFrame: sparklyr and SparkR are separate APIs, and sparklyr::spark_write_jdbc() has no method for an object created by SparkR::createDataFrame(), which is exactly what the "no applicable method" error is telling you.
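A minimal sketch of a sparklyr-only route, assuming sc is the sparklyr connection you already used for spark_read_jdbc() and events_long is a local R data frame (the temporary table name is illustrative):

library(sparklyr)

# Copy the local data frame into Spark with sparklyr (not SparkR),
# which produces a tbl_spark that spark_write_jdbc() can dispatch on.
events_long_tbl <- sdf_copy_to(sc, events_long,
                               name = "events_long_tmp", overwrite = TRUE)

spark_write_jdbc(
  events_long_tbl,
  name = "who_status_long_test",
  mode = "overwrite",   # or "append", depending on what you need
  options = list(
    url      = url,     # e.g. "jdbc:sqlserver://<server>:1433;databaseName=<db>"
    user     = user,
    password = pw,
    driver   = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    dbtable  = "who_status_long_test"
  )
)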
I am using pool to handle connections to my Snowflake warehouse. I have created a connection to my database and can read data from a pre-existing table with no issues, e.g.:
library(pool)
library(DBI)
library(dplyr)
library(dbplyr)

my_pool <- dbPool(odbc::odbc(),
                  Driver = "Snowflake",
                  Server = Sys.getenv('WH_URL'),
                  UID = Sys.getenv('WH_USER'),
                  PWD = Sys.getenv('WH_PW'),
                  Warehouse = Sys.getenv('WH_WH'),
                  Database = "MY_DB")

my_data <- tbl(my_pool, in_schema(sql("schema_name"), sql("table_name"))) %>%
  collect()
I would like to save back to a table (table_name) and I believe the best way to do this is with pool::dbWriteTable:
# Create some data to save to db
data <- data.frame("user_email" = "tim@apple.com",
                   "query_run" = "arrivals_departures",
                   "data_downloaded" = FALSE,
                   "created_at" = as.character(Sys.time()))
# Define where to save the data
table_id <- Id(database="MY_DB", schema="MY_SCHEMA", table="TABLE_NAME")
# Write to database
pool::dbWriteTable(my_pool, table_id, data, append=TRUE)
However this returns the error:
Error in new_result(connection@ptr, statement, immediate) :
  nanodbc/nanodbc.cpp:1594: 00000: SQL compilation error:
  Object 'MY_DB.MY_SCHEMA.TABLE_NAME' already exists.
I have read/write/update permissions for this database for the user specified in my_pool.
I have explored the accepted answers here and here to create the above attempt and can't figure out what I'm doing wrong. It's probably something simple that I've forgotten to do - any thoughts?
EDIT: Wondering if my issue has anything to do with https://github.com/r-dbi/odbc/issues/480
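One workaround worth trying — a sketch only, not a confirmed fix for the linked issue, and it assumes the table already exists and that your pool/DBI versions expose dbAppendTable() for pool objects — is to skip the table-creation step entirely and only append rows:

# dbAppendTable() only issues an INSERT into an existing table, so it avoids
# the "Object ... already exists" error raised when dbWriteTable() attempts
# to create the table again.
table_id <- Id(database = "MY_DB", schema = "MY_SCHEMA", table = "TABLE_NAME")
DBI::dbAppendTable(my_pool, table_id, data)

# If dbAppendTable() isn't available for Pool objects in your version,
# check out a raw connection and return it when done:
con <- poolCheckout(my_pool)
DBI::dbAppendTable(con, table_id, data)
poolReturn(con)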
I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works, and the data shows up (the full url: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API is working and not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but on my office computer it is unsuccessful. So I gather it is a network issue, but if somebody could point me in any direction as to what the problem might be I would be grateful (my IT department couldn't help).
library(XML)
api.key = "e122a1411ca0ac941eb192ede51feebe"
series.id = "PET.MCREXUS1.M"
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=", api.key, "&out=xml", sep="")
doc = xmlParse(file=my.url, isURL=TRUE) # yields error
Error msg:
No such file or directoryfailed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
Error: 1: No such file or directory2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
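Since it works at home but fails at the office, a corporate proxy is a common cause of "could not resolve host". A hedged sketch of routing requests through a proxy with httr (the proxy host, port, and credentials below are placeholders you would need to get from IT):

library(httr)

# Route all subsequent httr requests through the office proxy
# (values below are hypothetical).
set_config(use_proxy(url = "proxy.mycompany.example", port = 8080,
                     username = "myuser", password = "mypassword"))

res <- GET(my.url)                    # my.url as built above (&out=xml)
xml2::read_xml(content(res, "text"))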
To get XML, make sure your url requests XML output (&out=xml):
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=",
api.key, "&out=xml", sep="")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(res)
Otherwise, with the url as posted (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res,"text"))
or this:
xml2::read_xml(httr::content(res,"text"))
Please note that this answer simply provides a way to get the data; whether it is in the desired form is up to whoever is processing the data.
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or previously in your R session with eia_set_key). But you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result returned is a tidyverse-style data frame, one row per series ID and including a data list column that contains the data frame for each time series (can be unnested with tidyr::unnest if desired).
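A short sketch of both options, assuming the key handling of a current eia release and that the list column is named data (check names(x) if not):

library(eia)
library(tidyr)

# Option 1: set the key once for the session (or put EIA_KEY in .Renviron).
eia_set_key("yourkeyhere")
x <- eia_series("PET.MCREXUS1.M")

# Option 2: pass the key directly in the call.
x <- eia_series("PET.MCREXUS1.M", key = "yourkeyhere")

# Unnest the per-series data into one long data frame.
unnest(x, data)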
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
There are more comprehensive examples and vignettes at the eia package website I created.
I created an empty table from the BigQuery GUI with the schema for table_name. Later I'm trying to append data to the existing empty table from R using the bigrquery package.
I have tried the code below:
library(bigrquery)

upload_job <- insert_upload_job(project = "project_id",
                                dataset = "dataset_id",
                                table = "table_name",
                                values = values_table,
                                write_disposition = "WRITE_APPEND")
wait_for(upload_job)
But it throws the following error:
Provided Schema does not match Table. Field alpha has changed mode from REQUIRED to NULLABLE [invalid]
My table doesn't have any NULL or NA values in the mentioned column, and the data types in the schema match the data types of values_table exactly.
I tried uploading directly from R without creating the schema first. When I do that, the mode is automatically converted to NULLABLE, which is not what I'm looking for.
I also tried changing write_disposition = "WRITE_TRUNCATE", which also converts the mode to NULLABLE.
I also looked at this and this which didn't really help me.
Can someone explain what is happening behind the scenes, and what is the best way to upload data without recreating the schema?
Note: There was an obvious typo earlier; wirte_disposition has been edited to write_disposition.
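For what it's worth, a hedged sketch using the newer bigrquery interface, which lets you pass an explicit field list rather than having the modes inferred from the data frame (field names are illustrative, and whether BigQuery accepts REQUIRED modes on an append is something to verify against your table):

library(bigrquery)

tb <- bq_table("project_id", "dataset_id", "table_name")

# Describe the schema explicitly instead of letting bigrquery
# infer NULLABLE modes from the data frame.
flds <- bq_fields(list(
  bq_field("alpha", "STRING", mode = "REQUIRED")  # ...plus your other columns
))

bq_table_upload(tb, values_table,
                fields = flds,
                create_disposition = "CREATE_NEVER",
                write_disposition = "WRITE_APPEND")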
So I am currently working on connecting to an Access database. I am able to connect to the Access DB, which is located on my local system and is itself linked to a SharePoint list. I would love to automate the handling of this SharePoint list with an R and Access combo! What I want to do is actually pretty basic: introduce new data via a .csv, process it for the relevant content, compare it to the current Access DB, and finally upload the new information from R to Access.
I've learned that you need to match the bit versions of your Windows OS, Office install, and R, so I am x64 on all of the above. This allowed me to connect to the Access DB. You also need the 'Microsoft Access Database Engine 2016 Redistributable', which is essentially the driver for the connection.
So what I have so far is:
library(odbc)
library(DBI)
file_path <- "C:/user/Documents/R Projects/...pathtofile.../filename.accdb"
accdb_con <- dbConnect(drv = odbc(), .connection_string = paste0("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",file_path,";"))
access.db <- dbReadTable(accdb_con, "sNPS Deep Dives")
That now connects!
I then read in a .csv of new information
new.df <- read.csv("C:/user/Documents/R projects/...pathtofile.csv", header=T, stringsAsFactors=FALSE, na.strings=c("","NA"))
An example of the data set might look something like this:
date <- c("15/10/2018","15/10/2018", "16/10/2018", "12/11/2018", "07/09/2018")
score <- c("6", "10", "7", "10", "9")
group <- c("a","b", "b", "a", "b")
CaseID <- c("301", "302", "303", "304", "305")
new.df <- data.frame(date,score,group,CaseID)
new.df$date <- as.character(new.df$date)
new.df$score <- as.numeric(new.df$score)
new.df$group <- as.character(new.df$group)
new.df$CaseID <- as.numeric(new.df$CaseID)
Notably there are more columns in the Access DB that people will fill in by hand with further information.
and I process it to be ready to go into the Access DB.
probably not that interesting...
Then I compare the new data against the Access DB as follows:
library(dplyr)
new <- anti_join(new.df, access.db, by = "Case.ID")
Now I've tried:
dbWriteTable(access.db.copy, new, append = TRUE)
dbAppendTable(access.db.copy, new)
I don't seem to be able to get either of these to go anywhere.
I am getting an error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"ACCESS", "data.frame", "missing"’
I've seen plenty of posts in which people are having trouble connecting to an Access DB but I haven't seen anything about writing new data into that database.
I know this isn't quite a reproducible example, but it seems like a difficult problem to recreate since it's a connection problem between different tools. I would be happy to provide example sets that might make this easier.
I would appreciate any direction you all can provide.
Thanks!
Edit:
It appears that Bing Sun was right: I was missing an argument. It seems we need something more like:
dbWriteTable(access.db.copy, "Name of table",new, append = TRUE)
Which produces the error:
Error in result_insert_dataframe(rs@ptr, values) :
  nanodbc/nanodbc.cpp:1944: HY104: [Microsoft][ODBC Microsoft Access Driver]Invalid precision value
I wonder if this may be an error from Access about a file type?
Now if I use dbAppendTable() I don't get an error; I just get 0 as output:
dbAppendTable(access.db.copy, "Name of table", new, append= TRUE)
With output:
[1] 0
But I don't see any of the new values when I check the Access file.
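For reference, a minimal note on the API: DBI::dbAppendTable() takes no append argument (the extra argument is simply swallowed by ...) and only inserts into a table that already exists. A bare-bones call would look like this (the table name is the placeholder from above):

# Appends the rows of `new` to the existing Access table and returns a row
# count; getting 0 back is consistent with nothing actually being written.
rows_added <- DBI::dbAppendTable(access.db.copy, "Name of table", new)
rows_added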
I know it's years later, but hopefully this will help someone else with this issue since you're right CrayCrayTown, there aren't very many posts covering this issue.
I've run into this problem repeatedly when dealing with R and MS Access. The solution I've come up with is pretty "hacky", but it accomplishes what we're trying to do... just not very elegantly.
The way I do this is with a combo of RODBC and DBI packages.
First, I open a connection to the DB with RODBC, and use that connection to write my data to the DB as an intermediary table:
# Note: the RODBC function is odbcDriverConnect(), and it expects a full
# connection string rather than just the file path.
chan <- RODBC::odbcDriverConnect(
  connection = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb;")
RODBC::sqlSave(channel = chan,
               dat = df,
               tablename = "tbl_intermediary",
               rownames = FALSE,
               append = FALSE)
RODBC::odbcClose(chan)
rm(chan)
Make sure to close the RODBC connection; I also destroy it for good measure, because why not? I use RODBC for the intermediary table because it supports batch insert statements. I know that the same thing can, in theory, be done with DBI via DBI::dbAppendTable() (but we wouldn't be on this post if that worked how we had hoped). I tried this in a previous SO question here, but it didn't solve my problem. I also don't know how big my intermediary tables could get in the future. Hopefully by the time they get too big we'll be in a different DBMS.
Next, I reopen the connection, this time with DBI, and send a statement to the DB to write those data from the intermediary table to the final resting place for those data, and then drop the intermediary table.
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb;")
DBI::dbSendStatement(
  conn = con,
  statement = 'UPDATE
    tbl_intermediary INNER JOIN final_tbl ON tbl_intermediary.SampleID = final_tbl.sampleNumber
  SET
    final_tbl.field1 = [tbl_intermediary].[field1],
    final_tbl.notes = IIf(Nz([tbl_intermediary].[Notes],"")="",[final_tbl].[notes],[final_tbl].[notes] & "; Newest Notes: " & [tbl_intermediary].[Notes]);'
)
DBI::dbSendStatement(
  conn = con,
  statement = 'DROP TABLE tbl_intermediary;'
)
DBI::dbDisconnect(con)
rm(con)
The main reason I chose this method is that some of the SQL I use with Access also has some VBA in it. When I send the SQL-VBA hybrid string with RODBC, I get assorted errors in the IIf() and Nz() functions (see example above). From the RODBC CRAN docs, the query argument for the sqlQuery() function is strictly assumed to be a valid SQL statement, so RODBC has no clue how to interpret the IIf() and Nz() MS Access functions. I think this also has to do with how the ODBC driver handles communication (please, someone correct me if I'm wrong about this).
As I understand it, DBI::dbSendStatement() however lets the database engine you're working with interpret how to use the statement argument you provide. In the situation above, the VBA is executed exactly how I would expect if it were run in Access directly. As per the DBI docs, for interactive use you'll generally want to use dbGetQuery() or dbExecute().
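On that last point, a small sketch of the interactive-use pattern from the DBI docs, using the same DROP statement as above; dbExecute() runs the statement, returns the number of affected rows, and cleans up the result in one call:

# Equivalent to dbSendStatement() + dbGetRowsAffected() + dbClearResult()
DBI::dbExecute(con, 'DROP TABLE tbl_intermediary;')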
I am using the acs.R package and I am having trouble collecting data from the DP tables and S tables. The tables beginning with B are fine though. Here is an example of my code and the error I receive:
national = geo.make(us="*")
Race_US <- acs.fetch(endyear = 2015, span = 1, geography = national,
table.number = "DP04", col.names = "pretty")
Warning message:
In (function (endyear, span = 5, dataset = "acs", keyword, table.name, :
Sorry, no tables/keyword meets your search.
Suggestions:
try with 'case.sensitive=F',
remove search terms,
change 'keyword' to 'table.name' in search (or vice-versa)
For some reason it is unable to find the table. I have tried acs.lookup with various keywords that should work and still nothing.
Thanks for using the acs.R package.
The problem here is with the "DP" tables: although they are available through the Census API, they are not fetched via the acs.R package, since they are in a different format -- not really "raw data" so much as pre-formatted tables built from data found in other places. That said, you should be able to find the underlying data in other tables that are available with acs.fetch.
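To illustrate that last suggestion, a hedged sketch only: DP04 is the housing-characteristics profile, and one of the detailed B-series tables behind it is B25024 ("Units in Structure"); the table number here is an example, not a full mapping of DP04:

library(acs)

national <- geo.make(us = "*")

# Fetch a detailed B-series table that feeds the DP04 housing profile.
units_us <- acs.fetch(endyear = 2015, span = 1, geography = national,
                      table.number = "B25024", col.names = "pretty")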