I'm pulling in data from an API and keep getting the following error.
I put together the SQL query and am connecting to the instance to pull the data. However, when I run collect, it gives me an error.
soql_query = paste("SELECT Id, subject FROM Table")
myDF2 <- read.df(sqlContext, source="...", username=sf_username, password=sf_password, version=apiVersion, soql=soql_query)
temp2 <- SparkR::collect(myDF2)
Error in rawToChar(string) :
embedded nul in string: 'VOID: \xe5,nq\b\x92ƹ\xc8Y\x8b\n\nAdd a new comment by Asako:\0\xb3\xe1\xf3Ȓ\xfd\xa0\bE\xe4\t06/29 09:23'
In addition: Warning message:
closing unused connection 6 (col)
I've gone through and identified which column it is. It contains a lot of string data and full sentences, so the error partially makes sense.
I was wondering if there was any way to get around this issue.
Related
I'm trying to upload a data frame into SQL, but I keep receiving errors. I've tried the following function:
DBI::dbWriteTable(rposgre_rsql, name = "my_schema.the_table", value = base, row.names = FALSE, append=TRUE)
that function returns the error
RPosgreSQL error: could not Retrieve the result : ERROR: syntax error at or near "STDIN"
so I tried:
insert <- "INSERT INTO my_schema.the_table VALUES base"
M2_results <- RPostgreSQL::dbSendQuery(conn = rposgre_rsql, statement = insert)
but it returns
RPosgreSQL error: could not Retrieve the result : ERROR: syntax error at or near "base"
I'm positive the connection works, since I can either select the table or use dbExistsTable, but I don't understand why it doesn't work with INSERT INTO. The connection is to a corporate environment, so maybe it's a permission issue? Also, I don't quite understand what "STDIN" is.
I am trying to use RODBC to connect to an Access database. I have used the same structure several times in this project with success. However, in this instance it is now failing and I cannot figure out why. The code is not really a reprex since I can't provide the DB, but...
This works for a single table:
library(magrittr);library(RODBC)
#xWalk_path is simply the path to the accdb
#xtabs generated by querying the available tables
x <- 1
tab <- xtabs$TABLE_NAME[x]
temp <- RODBC::odbcConnectAccess2007(xWalk_path) %>%
  RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
odbcCloseAll()
#that worked perfectly
However, I really want to use this in a function so I can read several similar tables into a list. As a function, it does not work:
xWalk_ls <- lapply(seq_along(xtabs$TABLE_NAME), function(x, xWalk_path = xWalk_path, tab = xtabs$TABLE_NAME[x]) {
  # print(tab) # debug code
  temp <- RODBC::odbcConnectAccess2007(xWalk_path) %>%
    RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
  return(temp)
  odbcCloseAll()
})
#error every time
The above code will return the error:
Warning in odbcDriverConnect(con, ...) :
[RODBC] ERROR: Could not SQLDriverConnect
Warning in odbcDriverConnect(con, ...) : ODBC connection failed
Error in RODBC::sqlFetch(., tab, stringsAsFactors = FALSE) :
first argument is not an open RODBC channel
I am baffled. I accessed the db to pull the table names and generate the xtabs variable using sqlTables. Also, earlier in my code I used a similar structure (not identical, but the same core: sqlFetch to retrieve a table into a list) and it worked without a problem. The only difference between then and now: then I was opening and closing different .accdb files but pulling the same table name from each; now I am opening and closing the same .accdb file but pulling a different table name each time.
Am I somehow opening and closing this too fast and it is getting irritated with me? That seems unlikely, because if I force it to print(tab) as the first line of the function, it only prints the first table name. If it were getting annoyed about the speed of opening and closing, I would expect it to print two table names before throwing the error.
return() returns its argument and exits the function, so the remaining code (odbcCloseAll()) is never executed and the opened file (the Access DB) remains locked, as you supposed.
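For example, a minimal sketch of that fix, reusing the asker's xWalk_path and xtabs objects: register the cleanup with on.exit() so the channel is closed even if sqlFetch() fails, and let the fetched table be the function's return value.
xWalk_ls <- lapply(xtabs$TABLE_NAME, function(tab) {
  # open a fresh channel for each table
  channel <- RODBC::odbcConnectAccess2007(xWalk_path)
  # on.exit() runs when the function exits, even on error, so the DB is always released
  on.exit(RODBC::odbcClose(channel), add = TRUE)
  RODBC::sqlFetch(channel, tab, stringsAsFactors = FALSE)
})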
I'm trying to insert a url into a postgresql database using
db_insert_into(con, "url", "http://www.google.com")
Error in file(fn, open = "r") : cannot open the connection
In addition: Warning message:
In file(fn, open = "r") :
cannot open file 'http:/www.google.com': No such file or directory
How can I solve this?
You need to somehow specify both the table name and the field name. I'm going to guess that "url" is the field name and that the table name is as yet undefined here. But it doesn't matter, frankly; take the solution and adapt as needed.
The expectation of db_insert_into is that the values (third argument) are a data.frame, or something that can easily be converted to one. So you can probably do:
newdata <- data.frame(url = "http://www.google.com", stringsAsFactors = FALSE)
db_insert_into(con, "tablename", newdata)
If you're lazy or playing code-golf, you might be able to do it with:
db_insert_into(con, "tablename", list(url = "http://google.com"))
since some of the underlying S3 or S4 methods around dbplyr sometimes check if (!is.data.frame(values)) values <- as.data.frame(values). (But I wouldn't necessarily rely on that, it's usually better to be explicit.)
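Either way, a quick sanity check afterwards (assuming con is, or wraps, an ordinary DBI connection and the table is literally called "tablename") could be:
# read the table back to confirm the row landed
DBI::dbReadTable(con, "tablename")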
I'm trying to read API data from the BLS into R. I am using version 1.0, which does not require registration and is open for public use.
Here is my code:
url <-"http://api.bls.gov/publicAPI/v1/timeseries/data/LAUCN040010000000005"
raw.data <- readLines(url, warn = F)
library(rjson)
rd <- fromJSON(raw.data)
And here is the error message I receive:
Error in fromJSON(raw.data) : incomplete list
If I just go to the URL in my web browser it seems to work (it pulls up a JSON page). I'm not really sure what is going on when I try to get this into R.
When you've used readLines, the object returned is a vector of length 4:
length(raw.data)
You can look at the individual pieces via:
raw.data[1]
If you stick the pieces back together using paste,
fromJSON(paste(raw.data, collapse = ""))
everything works. Alternatively,
jsonlite::fromJSON(url)
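Putting those pieces together, a minimal end-to-end sketch of both routes (same public v1 endpoint as in the question) would be:
library(rjson)

url <- "http://api.bls.gov/publicAPI/v1/timeseries/data/LAUCN040010000000005"

# readLines() returns one element per line of the response; collapse them into
# a single JSON string before parsing
raw.data <- readLines(url, warn = FALSE)
rd <- fromJSON(paste(raw.data, collapse = ""))

# or let jsonlite fetch and parse the URL in one step
rd2 <- jsonlite::fromJSON(url)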
I have a number of large dataframes in R which I was planning to store using redis. I am totally new to redis but have been reading about it today and have been using the R package rredis.
I have been playing around with small data and have saved and retrieved small dataframes using the redisSet() and redisGet() functions. However, when it came to saving my larger dataframes (the largest of which is 4.3 million rows and 365 MB when saved as an .RData file) using the code redisSet('bigDF', bigDF), I get the following error message:
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection
Presumably this is because the dataframe is too large to save. I know that redisSet writes the dataframe as a string, which is perhaps not the best way to do it for large dataframes. Does anyone know the best way to do this?
EDIT: I have recreated the error by creating a very large dummy dataframe:
bigDF <- data.frame(
'lots' = rep('lots',40000000),
'of' = rep('of',40000000),
'data' = rep('data',40000000),
'here'=rep('here',40000000)
)
Running redisSet('bigDF',bigDF) gives me the error:
Error in .redisError("Invalid agrument") : Invalid agrument
the first time; running it again immediately afterwards, I get the error
Error in doTryCatch(return(expr), name, parentenv, handler) :
ERR Protocol error: invalid bulk length
In addition: Warning messages:
1: In writeBin(v, con) : problem writing to connection
2: In writeBin(.raw("\r\n"), con) : problem writing to connection
Thanks
In short: you cannot. Redis can store a maximum of 512 MB of data in a string value, and your serialized demo data frame is bigger than that:
> length(serialize(bigDF, connection = NULL)) / 1024 / 1024
[1] 610.352
Technical background:
serialize is called in the .cerealize function of the package via redisSet and rredis:::.redisCmd:
> rredis:::.cerealize
function (value)
{
if (!is.raw(value))
serialize(value, ascii = FALSE, connection = NULL)
else value
}
<environment: namespace:rredis>
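So one pragmatic pre-flight check is to measure the serialized size yourself before calling redisSet, using the same serialize() call the package performs internally (a sketch; it assumes redisConnect() has already been run):
obj_mb <- length(serialize(bigDF, connection = NULL)) / 1024 / 1024
if (obj_mb < 512) {
  rredis::redisSet("bigDF", bigDF)  # fits in a single redis string
} else {
  message("object is ~", round(obj_mb), " MB -- over the 512 MB string value limit")
}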
Off-topic: why would you store such a big dataset in Redis anyway? Redis is meant for small key-value pairs. On the other hand, I had some success storing big R datasets in CouchDB and MongoDB (with GridFS) by adding the compressed RData there as an attachment.