After I use
cn<-odbcConnect(...)
to connect to MS SQL Server, I can successfully get data using:
tmp <- sqlQuery(cn, "select * from MyTable")
But if I use
tmp <- sqlFetch(cn,"MyTable")
R complains with "Error in odbcTableExists(channel, sqtable) : table not found on channel". Did I miss anything here?
I assume you are working on Windows. When you define your DSN in Control Panel > System and Security > Administrative Tools > Data Sources (ODBC), you have to select a database as well. If you do that, your code should work as expected.
So the problem is not in your R code but in your DSN, which in my opinion does not contain the reference to a database that sqlFetch needs.
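If editing the DSN is not an option, one alternative is to name the database in a DSN-less connection string. The sketch below only assembles such a string and does not connect to anything. The driver name, server, and database are placeholders, and `make_conn_string` is a hypothetical helper, not part of RODBC.

```r
# Hypothetical helper: builds a DSN-less connection string that names the
# database explicitly. Driver, server, and database values are placeholders.
make_conn_string <- function(server, database, driver = "SQL Server") {
  paste0("driver={", driver, "};",
         "server=", server, ";",
         "database=", database, ";",
         "Trusted_Connection=yes")
}

conn_string <- make_conn_string("MyServer", "MyDatabase")
print(conn_string)

# With a reachable server, the string could then be handed to RODBC:
# cn  <- RODBC::odbcDriverConnect(conn_string)
# tmp <- RODBC::sqlFetch(cn, "MyTable")
```

Whether this resolves the sqlFetch error depends on the table living in the database named in the string, which is the same assumption the DSN fix makes.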
I am trying to use RODBC to connect to an Access database. I have used the same structure several times in this project with success. However, in this instance it is failing and I cannot figure out why. The code is not a true reprex because I can't provide the DB, but...
This works for a single table:
library(magrittr);library(RODBC)
#xWalk_path is simply the path to the accdb
#xtabs generated by querying the available tables
x=1
tab=xtabs$TABLE_NAME[x]
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
odbcCloseAll()
#that worked perfectly
However, I really want to use this in a function so I can read several similar tables into a list. As a function it does not work:
xWalk_ls<- lapply(seq_along(xtabs$TABLE_NAME), function(x, xWalk_path=xWalk_path, tab=xtabs$TABLE_NAME[x]){
#print(tab) #debug code
temp<-RODBC::odbcConnectAccess2007(xWalk_path)%>%
RODBC::sqlFetch(., tab, stringsAsFactors = FALSE)
return(temp)
odbcCloseAll()
})
#error every time
The above code will return the error:
Warning in odbcDriverConnect(con, ...) :
[RODBC] ERROR: Could not SQLDriverConnect
Warning in odbcDriverConnect(con, ...) : ODBC connection failed
Error in RODBC::sqlFetch(., tab, stringsAsFactors = FALSE) :
first argument is not an open RODBC channel
I am baffled. I accessed the db to pull table names and generate the xtabs variable using sqlTables. Also, earlier in my code I used a similar code structure (not identical, but the same core: sqlFetch to retrieve a table into a list) and it worked without a problem. The only difference between then and now is this: then I was opening and closing different .accdb files but pulling the same table name from each; now I am opening and closing the same .accdb file but pulling a different table name each time.
Am I somehow opening and closing this too fast and it is getting irritated with me? That seems unlikely, because if I force it to print(tab) as the first line of the function it will only print the first table name. If it were getting annoyed about the speed of opening and closing, I would expect it to print 2 table names before throwing the error.
return() returns its argument and exits the function immediately, so the code after it (odbcCloseAll()) is never executed and the opened Access file remains locked, as you suspected. Close the connection before returning the result.
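The difference can be seen without Access by using a stand-in for the connection. In this minimal sketch, `open_fake` and `close_fake` are hypothetical placeholders for RODBC::odbcConnectAccess2007() and RODBC::odbcClose(), and a counter tracks how many "channels" are still open. It also shows on.exit(), which is one common way to guarantee the cleanup runs.

```r
# Stand-in for a database connection, so the sketch runs without Access.
# open_fake() and close_fake() are hypothetical placeholders.
state <- new.env()
state$open <- 0L
open_fake  <- function() { state$open <- state$open + 1L; "channel" }
close_fake <- function(ch) { state$open <- state$open - 1L; invisible(NULL) }

# Cleanup placed after return() is never reached.
fetch_leaky <- function(tab) {
  ch <- open_fake()
  return(paste("rows of", tab))
  close_fake(ch)   # dead code
}

# on.exit() registers the cleanup up front; it runs however the function exits.
fetch_safe <- function(tab) {
  ch <- open_fake()
  on.exit(close_fake(ch), add = TRUE)
  paste("rows of", tab)
}

leaky_result <- lapply(c("a", "b"), fetch_leaky)
open_after_leaky <- state$open   # both channels are still open

state$open <- 0L
safe_result <- lapply(c("a", "b"), fetch_safe)
open_after_safe <- state$open    # every channel was closed
```

In the real function, the same shape would be to open the channel, register on.exit(RODBC::odbcClose(ch)), and let the sqlFetch() result be the return value.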
I'm trying to set up a connection to a SQL Server from my Mac using the Microsoft ODBC Driver and the DBI package.
The connection is established; however, character fields, even those that have no special characters, come back garbled. The database is proprietary, so I'm limited as to what actual output I can show. Numeric fields return fine.
Some other notes.
If I submit a query, I'm able to receive a record set from the correct table. For example, the query below returns results and the column name is correct, but the data in the column is garbled:
> dbGetQuery(con, "Select name from tb1", n = 1)
Warning: Pending rows
name
1 CalteMtrSeda
dbListTables() also returns garbled output, even though as shown above I can receive output from the table referencing it by name.
dbListTables() returns the correct number of tables, but the names are not intelligible.
grep("tb1", dbListTables(con), value = TRUE)
character(0)
Output from my con object
> con
<OdbcConnection> user@ExpectedDataBase
Database: NameIWouldExpect
Microsoft SQL Server Version: 13.00.1742
** Updated to include pattern.
I'm getting every other character returned. From the example above:
CalteMtrSeda == CharlotteMotorSpeedway
This is the first time I've attempted to connect to this database from a Mac.
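The pattern can be checked in base R. This sketch assumes the stored value contains spaces between the words ("Charlotte Motor Speedway"), which is not stated above but is what makes the odd character positions line up with the garbled output.

```r
# Assumption: the stored value has spaces between the words.
expected <- "Charlotte Motor Speedway"
chars <- strsplit(expected, "")[[1]]

# Keep characters 1, 3, 5, ... (the logical index recycles over the vector).
every_other <- paste(chars[c(TRUE, FALSE)], collapse = "")
print(every_other)
```

Under that assumption, the result matches the garbled value shown in the query output.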
This turned out to be related to R 3.6. Reverting to R 3.5 fixed the issue. The relevant issue in the odbc repo:
https://github.com/r-dbi/odbc/issues/283
The rquery package has been out for some time now, but the documentation is still very sparse. There isn't even a tag for it on SO yet; this question will create it.
Maybe there is someone who can help me nevertheless.
I want to connect to a schema in my Postgres DB via rquery to read the data into R with all the speed it promises.
Using this code it works with all the tables in the public-schema.
library(RPostgres)
library(rquery)
con <- dbConnect(RPostgres::Postgres(),
host = #####,
dbname = #####,
user = #####,
password = ######)
df <- db_td(con, "tablename") %.>%
execute(con, .)
Now when I want to access a table in a specific schema db_td() has the argument qualifiers = which is an
optional named ordered vector of strings carrying
additional db hierarchy terms, such as schema
So I did:
db_td(con, "tablename", qualifiers = c(schema = "schema"))
But I get this error (the German message means: ERROR: relation "tablename" does not exist):
Error in result_create(conn@ptr, statement) :
  Failed to prepare query: FEHLER: Relation »tablename« existiert nicht
LINE 1: SELECT * FROM "tablename" LIMIT 1
So the qualifiers = argument seems to be completely ignored.
My question is thus pretty basic:
How can I connect to a schema in a PostgresDB via rquery?
All my attempts to solve this "within" rquery seem to fail miserably, but you can work around it by doing something like:
dbExecute(con, "SET search_path = foo_schema, public;")
before you run db_td.
I think it's caused by rq_colnames doing:
paste0("SELECT * FROM ", quote_identifier(db, table_name),
" LIMIT 1")
and hence not doing anything with its qualifiers, at least this matches the error I get back.
If this isn't enough, consider reporting a bug/issue with rquery.
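To make the diagnosis concrete, the sketch below builds the two query strings side by side. `quote_ident` is a hypothetical stand-in written for this example; it is not rquery's quote_identifier, and `foo_schema` is the placeholder schema name from the workaround.

```r
# Hypothetical stand-in for an identifier-quoting function; not rquery's own.
quote_ident <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

# The shape of the query in the error message: table name only.
unqualified <- paste0("SELECT * FROM ", quote_ident("tablename"), " LIMIT 1")

# What a schema-aware version would need to send instead.
qualified <- paste0("SELECT * FROM ",
                    quote_ident("foo_schema"), ".", quote_ident("tablename"),
                    " LIMIT 1")
print(unqualified)
print(qualified)
```

Setting search_path works because Postgres resolves the unqualified name against the schemas on that path, so the first form then finds the table without being schema-qualified.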
I have created an issue on github. So far regular rquery indeed doesn't have schema ability. The development version of rquery (1.3.4) however has, as of today, basic schema ability.
To be installed via:
library(devtools)
install_github("WinVector/rquery", host = "https://api.github.com")
Here's a short set of instructions. It seems to have been intended to work just as I was trying in my question.
Be careful though, rquery hasn't been fully tested in schema-mode and some things might not work.
EDIT: rquery now has full schema support.
Does anyone know of a way to download blob data from an Oracle database using RJDBC package?
When I do something like this:
library(RJDBC)
drv <- JDBC(driverClass=..., classPath=...)
conn <- dbConnect(drv, ...)
blobdata <- dbGetQuery(conn, "select blobfield from blobtable where id=1")
I get this message (the Dutch "Ongeldig kolomtype" means "invalid column type"):
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Ongeldig kolomtype.: getString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
Well, the message is clear, but still I hope there is a way to download blobs. I read something about 'getBinary()' as a way of getting blob information. Can I find a solution in that direction?
The problem is that RJDBC tries to convert every SQL data type it reads to either double or String in Java. Typically the trick works because the JDBC driver for Oracle has routines to convert different data types to String (accessed by the getString() method of the java.sql.ResultSet class). For BLOB, though, getString() stopped being supported at some point. RJDBC still tries to call it, which results in an error.
I tried digging into the guts of RJDBC to see if I can get it to call proper function for BLOB columns, and apparently the solution requires modification of fetch S4 method in this package and also the result-grabbing Java class within the package. I'll try to get this patch to package maintainers. Meanwhile, quick and dirty fix using rJava (assuming conn and q as in your example):
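library(rJava)
# conn is the RJDBC connection; q is the SQL query string
s <- .jcall(conn@jc, "Ljava/sql/Statement;", "createStatement")
r <- .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", q, check=FALSE)
listraws <- list()
col_num <- 1L  # index of the BLOB column in the result set
i <- 1
# step through the rows, reading each BLOB as a byte array (raw vector)
while(.jcall(r, 'Z', 'next')){
  listraws[[i]] <- .jcall(r, '[B', 'getBytes', col_num)
  i <- i + 1
}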
This retrieves a list of raw vectors in R. The next steps depend on the nature of the data. In my application these vectors represent PNG images and can be handled pretty much as file connections by the png package.
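If the BLOBs should end up as files, one option is to write each raw vector out with writeBin(). This is a minimal sketch using stand-in data: the two raw vectors below replace `listraws` from the database, and the file names are made up for illustration.

```r
# Stand-in for `listraws`: two raw vectors instead of BLOBs from Oracle.
# The first is the 8-byte PNG signature, the second is arbitrary bytes.
listraws <- list(
  as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a)),
  as.raw(1:5)
)

# Write each raw vector to its own file in a temporary directory.
out_dir <- tempfile("blobs_")
dir.create(out_dir)
paths <- file.path(out_dir, sprintf("blob_%03d.bin", seq_along(listraws)))
for (k in seq_along(listraws)) {
  writeBin(listraws[[k]], paths[k])
}

# Read the first file back to confirm the bytes round-trip unchanged.
round_trip <- readBin(paths[1], what = "raw", n = file.size(paths[1]))
```

For PNG data specifically, png::readPNG() should also accept a raw vector directly, so writing to disk is only needed when the files themselves are wanted.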
Done using R 3.1.3, RJDBC 0.2-5, Oracle 11-2 and OJDBC driver for JDK >= 1.6
I am using R 2.14.1 and Cassandra 1.2.11. I have a separate program which has written data to a single Cassandra table, and I am failing to read it from R.
The Cassandra schema is defined like this:
create table chosen_samples (id bigint , temperature double, primary key(id))
I have first tried the RCassandra package (http://www.rforge.net/RCassandra/)
> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")
The connection seems to succeed but the parsing of the table into data frame fails:
> cs
Error in data.frame(..dfd. = c("@\"ffffff", "@(<cc><cc><cc><cc><cc><cd>", :
duplicate row.names:
I have also tried using JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")
But this one fails like this:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Even though the location to the java driver is correct
$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
You have to download apache-cassandra-2.0.10-bin.tar.gz and cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar.
There is no need to install Cassandra on your local machine; just put the cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar files in the lib directory of the unzipped apache-cassandra-2.0.10-bin.tar.gz. Then you can use
library(RJDBC)
drv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
            list.files("D:/apache-cassandra-2.0.10/lib",
                       pattern="jar$", full.names=TRUE))
That works on my Unix machine but not on my Windows machine.
Hope that helps.
This question is old now, but since it's one of the top hits for R and Cassandra I thought I'd leave a simple solution here, as I found frustratingly little up-to-date support for what I thought would be a fairly common task.
Sparklyr makes this pretty easy to do from scratch now, as it exposes a Java context so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in this simple package, crassy, but it's not necessary to use.
I mostly made it to demystify the config around how to make sparklyr load the connector, and as the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given CQL can't be submitted directly to the cluster.
I've not found a way to submit more general CQL queries that doesn't involve writing custom Scala; however, there's an example of how this can work here.
Right, I found an (admittedly ugly) way: calling Python from R, parsing the NAs manually, and re-assigning the data frame's names in R, like this:
# install.packages("rPython")
# (don't forget to "pip install cql")
library(rPython)
python.exec("import sys")
# adding libraries from virtualenv
python.exec("sys.path.append('/Users/svend/dev/pyVe/playground/lib/python2.7/site-packages/')")
python.exec("import cql")
python.exec("connection=cql.connect('192.168.33.10', cql_version='3.0.0')")
python.exec("cursor = connection.cursor()")
python.exec("cursor.execute('use poc1_samples')")
python.exec("cursor.execute('select * from chosen_samples' )")
# coding python None into NA (rPython seem to just return nothing )
python.exec("rep = lambda x : '__NA__' if x is None else x")
python.exec( "def getData(): return [rep(num) for line in cursor for num in line ]" )
data <- python.call("getData")
df <- as.data.frame(matrix(unlist(data), ncol=15, byrow=TRUE))
names(df) <- c("temperature", "maxTemp", "minTemp",
"dewpoint", "elevation", "gust", "latitude", "longitude",
"maxwindspeed", "precipitation", "seelevelpressure", "visibility", "windspeed")
# and decoding NA's
parsena <- function (x) { x[x %in% "__NA__"] <- NA; x }  # vectorized: lapply passes whole columns
df <- as.data.frame(lapply(df, parsena))
Does anybody have a better idea?
I had the same error message when executing Rscript with RJDBC connection via batch file (R 3.2.4, Teradata driver).
Also, when run in RStudio it worked fine in the second run but not first.
What helped was to explicitly call:
library(rJava)
.jinit()
It is not enough to just download the driver; you also have to download the dependencies and put them into your Java classpath (macOS: /Library/Java/Extensions), as stated on the project main page:
Include the Cassandra JDBC dependencies in your classpath : download dependencies
As for the RCassandra package, right now it's still too primitive compared to RJDBC.