R and cassandra connection error - r

library(RJDBC)
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("/home/beyhan/Downloads/jars/",pattern="jar$",full.names=T))
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9042")
Output
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
+ list.files("/home/beyhan/Downloads/jars/",pattern="jar$",full.names=T))
> casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9042")
Error in .jcall(drv#jdrv, "Ljava/sql/Connection;", "connect",
as.character(url)[1], : java.lang.NoClassDefFoundError:
org/apache/thrift/transport/TTransportException

Okay, the ODBC Connector is based on the THRIFT Protocol. The THRIFT Connection to Cassandra is deprecated. I think the Python in solution is the best approach for you. Here a example: How to read data from Cassandra with R?
And here is a blog post about Thrift vs. CQL: http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster

Our JDBC Driver for Cassandra allows you to access your Cassandra data in R. To be clear, our driver creates a relational interface to your Cassandra data, allowing you to submit SQL queries to Cassandra through our driver (internally, we translate the SQL to CQL, send the request and return the results as a relational database).
We have an article in our Knowledge Base for connecting, but I'll transcribe it here as well.
Load the RJDBC Package:
library(RJDBC)
Set the driver class and classpath:
driver <- JDBC(driverClass = "cdata.jdbc.cassandra.CassandraDriver", classPath = "MyInstallationDir\lib\cdata.jdbc.cassandra.jar", identifier.quote = "'")
Initialize the JDBC connection:
conn <- dbConnect(driver,"Database=MyCassandraDB;Port=7000;Server=127.0.0.1;")
(Set the Server, Port, and Database connection properties to connect to Cassandra.)
At this point, you can perform standards actions available in R, like:
Listing the tables:
dbListTables(conn)
Executing any SQL query supported by the Cassandra API:
customer <- dbGetQuery(conn,"SELECT City, SUM(TotalDue) FROM Customer GROUP BY City")
Viewing the results:
View(customer)
Feel free to download a free Beta of the driver! If you have any questions, please let us know.

Related

R teradata DBI:dbConnect() error: TimedOut: No response received when attempting to connect to the Teradata server

I am going to ask and answer this question because I spent more time than I'd like to admit searching for a response and couldn't find one. I installed Teradata ODBC Driver 16.20. In the ODBC Data Source Administrator, I added a Data Source. I named it teradata, put in the name of the Teradata Server to connect to and my username and password for authentication. When I tried running the following code in RStudio:
con <- DBI::dbConnect(odbc::odbc(),
"teradata")
I would get an error:
Error: nanodbc/nanodbc.cpp:1021: HY000: [Teradata][WSock32 DLL] (434) WSA E TimedOut: No response received when attempting to connect to the Teradata server
To solve this, I needed to pass a timeout argument:
con <- DBI::dbConnect(odbc::odbc(),
"teradata",
timeout = 20)

R connection to Redshift using AWS driver doesn't work but does work with Postgre driver

I am trying establish a connection to my redshift database after following the example provided by AWS https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift. However, I get errors when trying to establish the connection using their recommended driver. However, when I use the Postgre driver I can establish a connection to the redshift DB.
AWS says their driver is "optimized for performance and memory management", so I would rather use it. Can someone please review my code below, and let me know if they see something wrong? I suspect that I am not setting the URL up correctly, but not sure what I should be using instead? Thanks in advance for any help.
#' This code attempts to establish a connection to redshift database. It
#' attempts to establish a connection using the suggested redshift but doesn't
#' work.
## Clear up space and set working directory
#Clear Variables
rm(list=ls(all=TRUE))
gc()
## Libriries for analyis
library(RJDBC)
library(RPostgreSQL)
#Create DBI driver for working with redshift driver directly
# download Amazon Redshift JDBC driver
download.file('http://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC41-1.1.9.1009.jar',
'RedshiftJDBC41-1.1.9.1009.jar')
# connect to Amazon Redshift using specific driver
driver_redshift <- JDBC("com.amazon.redshift.jdbc41.Driver",
"RedshiftJDBC41-1.1.9.1009.jar", identifier.quote="`")
## Using postgre connection that works
#postgre driver
driver_postgre <- dbDriver("PostgreSQL")
#establish connection
conn_postgre <- dbConnect(driver_postgre, host="nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com",
port="5439",dbname="dev",
user="xxxx", password="xxxx")
#list the tables available
tables = dbListTables(conn_postgre)
## Use URL option to establish connection like the example on AWS website
# url <- "<JDBCURL>:<PORT>/<DBNAME>?user=<USER>&password=<PW>
# url <- "jdbc:redshift://demo.ckffhmu2rolb.eu-west-1.redshift.amazonaws.com
# :5439/demo?user=XXX&password=XXX" #useses example from AWS instructions
#url using my redshift database
url <- "jdbc:redshift://nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com
:5439/dev?user=xxxx&password=xxxx"
#attempt connect but gives an error
conn_redshift <- dbConnect(driver_redshift, url)
#gives the following error:
# Error in .jcall(drv#jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
# java.sql.SQLException: Error message not found: CONN_GENERAL_ERR. Can't find bundle for base name com.amazon.redshift.core.messages, locale en
## Similier to postgre example that works but doesn't work when using redshift specific driver
#gives an error saying url is missing, but I am not sure which url to use?
conn <- dbConnect(driver_redshift, host="nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com",
port="5439",dbname="dev",
user="xxxx", password="xxxx")
# gives the following error:
#Error in .jcall("java/sql/DriverManager", "Ljava/sql/Connection;", "getConnection", :
# argument "url" is missing, with no default
I've done it this way it works for me:
drv <- JDBC("com.amazon.redshift.jdbc41.Driver","PathTO/RedshiftJDBC41-1.1.2.0002.jar")
conn <- dbConnect(drv,"jdbc:redshift://......redshift.amazonaws.com:5439/dev",User,PWD)
The difference I see in yours is that you don't mention the full path to redshift jar in driver_redshift.
Hope it works.

Connect to MSSQL using DBI

I can not connect to MSSQL using DBI package.
I am trying the way shown in package itself
m <- dbDriver("RODBC") # error
Error: could not find function "RODBC"
# open the connection using user, passsword, etc., as
# specified in the file \file{\$HOME/.my.cnf}
con <- dbConnect(m, dsn="data.source", uid="user", pwd="password"))
Any help appreciated. Thanks
As an update to this question: RStudio have since created the odbc package (or GitHub version here) that handles ODBC connections to a number of databases through DBI. For SQL Server you use:
con <- DBI::dbConnect(odbc::odbc(),
driver = "SQL Server",
server = <serverURL>,
database = <databasename>,
uid = <username>,
pwd = <passwd>)
You can also set a dsn or supply a connection string.
It looks like there used to be a RODBC driver for DBI, but not any more:
http://cran.r-project.org/src/contrib/Archive/DBI.RODBC/
A bit of tweaking has got this to install in a version 3 R but I don't have any ODBC sources to test it on. But m = dbDriver("RODBC") doesn't error.
> m = dbDriver("RODBC")
> m
<ODBCDriver:(29781)>
>
Suggest you ask on the R-sig-db mailing list to maybe find out what happened to this code and/or the author...
Solved.
I used library RODBC. It has great functionality to connect sql and run sql queries in R.
Loading Library:
library(RODBC)
# dbDriver is connection string with userID, database name, password etc.
dbhandle <- odbcDriverConnect(dbDriver)
Running Sql query
sqlQuery(channel=dbhandle, query)
Thats It.

Connect R and Vertica using RODBC

This is my first time connecting to Vertica. I have already connected to a MySQL database sucessfully by using RODBC library.
I have the database setup in vertica and I installed the windows 64-bit ODBC driver from https://my.vertica.com/download-community-edition/
When I tried to connect to vertica using R, I get the below error:
channel = odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password")
Warning messages:
1: In odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password") :
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
2: In odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password") :
ODBC connection failed
Can someone tell me how to fix this? Or is there any other ways to connect to vertica using R?
It may not be the fastest, but I prefer to use the Vertica JDBC driver from R. Getting the ODBC drivers working is a little messy across different operating systems. If you already have a Java Runtime Environment (JRE) installed for other applications then this is fairly straightforward.
Download the Vertica JDBC drivers for your Vertica server version from the MyVertica portal. Place the driver (a .jar file) in a reasonable location for your operating system.
Install RJDBC into your workspace:
install.packages("RJDBC",dep=TRUE)
In your R script, load the RJDBC module and create an instance of the Vertica driver, adjusting the classPath argument to point to the location and filename of the driver you downloaded:
library(RJDBC)
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver", classPath="full\path\to\driver\vertica_jdbc_VERSION.jar")
Make a new connection using the driver object, substituting your connection details for the host, username and password:
vertica <- dbConnect(vDriver, "jdbc:vertica://host:5433/db", "username", "password")
Then run your SQL queries:
myframe = dbGetQuery(vertica, "select Address,City,State,ZipCode from MyTable")
You have to use double slash in the classPath arguement in JDBC function.
for example,
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver",
classPath="C:\\Program Files\\Vertica Systems\\JDBC\\vertica-jdk5-6.1.2-0.jar")
worked for me, while just copying and pasting the route failed.

Connection R to Cassandra using RCassandra

I have an instance of Cassandra running on my localhost. For this example I have used the default configuration provided in conf\cassandra.yaml
I tried to connect R to Cassandra using the RCassandra package.
Basically, i have just installed the RCassandra package in R and tried to connect.
library("RCassandra")
RC.connect('localhost','9160')
RC.connect('127.0.0.1','9160')
None of those are working. Here is the error I get:
Error in RC.connect("localhost", port = "9160") :
cannot connect to locahost:9160
Using Cassandra-cli with the same parameters work. Can you please help on that.
Thank you
Set start_rpc: true in cassandra.yaml file.
Could not fix it but found a way to make it work: initiate a jdbc connection and then launch RCassandra
#Load RJDBC
library(RJDBC)
#Load in the Cassandra-JDBC diver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("C://Users//aab_ITSolutions//apache-cassandra-1.0.10//lib",pattern="jar$",full.names=T))
#Connect to Cassandra node and Keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9160/DEMO")
#Query timeseries data
res <- dbGetQuery(casscon, "select * from StockHist limit 10")
library("RCassandra")
connx = RC.connect('localhost',9160)

Resources