RCassandra is not connecting to Cassandra Database - r

I'm new to Cassandra and R. When I connect to a Cassandra database using the RCassandra package, the connection is established. But when I try to use any keyspace, R stops responding. I used the following statements:
c <- RC.connect('192.168.1.20', 9042)
RC.use(c, 'effesensors')
Please give me a brief idea about how to use RCassandra to avoid this problem.

Are you aware that you may be using a non-default port for Cassandra? If you can provide your Cassandra version and RStudio version, I may be able to update my answer. I found this tutorial by tarkalabs useful as a checklist of steps to take before attempting any connection.
From the tutorial:
Now connect to your database with
connect.handle <- RC.connect(host="127.0.0.1", port=9160)
Cassandra by default listens to port 9160 but you can change it according to your configuration. To show the cluster, type into your prompt
RC.cluster.name(connect.handle)
Just to verify that you are connected and your Cassandra instance is running try the following command:
RC.describe.keyspaces(connect.handle)
That should bring back a list of the settings in your keyspaces. If nothing returns, you are either not connected or your Cassandra instance is not properly installed.
EXAMPLE OUTPUT
$system_traces$strategy_options
replication_factor
"2"
$system_traces$cf_defs
named list()
$system_traces$durable_writes
[1] TRUE
Let me know what your results are if my answer does not work and I will update my answer. Good Luck!
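The checklist above can be sketched as a single function. This is a minimal sketch, assuming the RCassandra package is installed and that the server exposes its Thrift interface (port 9160 by default; 9042 is the default CQL native-transport port, which RCassandra does not speak). The helper name `rc_check` is hypothetical and not part of RCassandra.

```r
# Hypothetical helper: connect over Thrift, confirm the server answers,
# then switch to a keyspace. It uses the RCassandra calls named in this
# thread, plus RC.close to release the connection.
rc_check <- function(host, keyspace, port = 9160L) {
  if (!requireNamespace("RCassandra", quietly = TRUE)) {
    stop("RCassandra is not installed")
  }
  conn <- RCassandra::RC.connect(host = host, port = port)
  on.exit(RCassandra::RC.close(conn), add = TRUE)

  # Both calls need a Thrift reply from the server, so they are a better
  # liveness test than RC.connect alone, which only opens the socket.
  name <- RCassandra::RC.cluster.name(conn)
  keyspaces <- names(RCassandra::RC.describe.keyspaces(conn))

  if (!keyspace %in% keyspaces) {
    stop("keyspace '", keyspace, "' not found; available: ",
         paste(keyspaces, collapse = ", "))
  }
  RCassandra::RC.use(conn, keyspace)
  list(cluster = name, keyspaces = keyspaces)
}
```

With the values from the question, the call would be rc_check("192.168.1.20", "effesensors"). The connection is closed on exit, so this only verifies the setup; reopen the connection for real work.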

Use RODBC instead of RCassandra. You will need to install the Cassandra ODBC driver first.

Thanks @D. Venkata Naresh, your suggestion of using the RODBC driver resolved my issue.
I am using R and DataStax Cassandra Community Edition.
This is the link I followed to configure the ODBC driver on my Windows machine:
https://www.datastax.com/dev/blog/using-the-datastax-odbc-driver-for-apache-cassandra
Then, in RStudio, these are the commands to connect to Cassandra and fetch data from it:
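install.packages("RODBC")  # one-time installation
library(RODBC)
# Replace the placeholder strings with your own data source and table names
conn <- odbcConnect("<ODBC datasource name>")
dataframe <- sqlFetch(conn, "<column family / table name>")
dataframe
odbcClose(conn)  # release the connection when done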
I hope this answer helps someone who is facing issues with RCassandra.

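I read your comments above: you are using the wrong port. Port 9042 is Cassandra's CQL native-transport port, whereas RCassandra talks to the Thrift interface, which listens on 9160 by default. You should run the following command:
c <- RC.connect('192.168.1.20', 9160)
This should work for you, provided the Thrift interface is enabled on the server.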

Related

Connecting to an Azure SQL Server Data Warehouse from R on a Mac - See random names instead of tables

I'm trying to connect to an Azure SQL Server (12.00.1900) from R on a Mac, using Microsoft's unixodbc SQL Server drivers (17).
I get a connection, but instead of seeing the 12 or so tables that live in the database, dbListTables returns 442 tables, all with nonsensical names, beginning with 'Csoe', 'Ote', and ending in 'xlshm_idad'. Instead of seeing the single schema that lives in the database, I see cin_1mro__e, IFRAINSHM, and s, none of which have any tables in them.
Note that when I use an ordinary SQL visualization app, that doesn't use the MS drivers, I'm able to see the tables and their content properly.
In addition, the RSQLServer package gets a working connection and sees the tables correctly, but isn't compatible with dplyr semantics.
Can anyone help or advise? I've looked for third party SQL Server unixodbc drivers for Mac, and I can't find any.
Until I see more info from OP, I'll leave as my answer the general recommendation to use R's odbc package. Assuming the correct drivers are installed, connection is configured correctly in odbc.ini, and assuming trusted_connection=yes is used in the same, then connecting from R is as simple as:
library(odbc)
dbConn <- dbConnect(odbc(), dsn = "myDSN")
If trusted connection is not on, then you just need to pass the uid and pwd arguments.
Also, OP, it may be the case that you did not install FreeTDS, so try the following (replace with the equivalent for the package manager you're using):
brew install freetds --with-unixodbc
This gives you the libtdsodbc.so driver. Make sure the DSN points to this.
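As a minimal sketch of the non-trusted case, the function below passes uid and pwd to dbConnect and restricts the table listing to one schema. It assumes the odbc and DBI packages are installed and that the DSN is already defined in odbc.ini. The helper name `list_schema_tables` and the default schema "dbo" are assumptions for illustration, not something the question states.

```r
# Hypothetical helper: open a DSN-based connection, list the tables in one
# schema, and close the connection again.
list_schema_tables <- function(dsn, schema = "dbo", uid = NULL, pwd = NULL) {
  if (!requireNamespace("odbc", quietly = TRUE) ||
      !requireNamespace("DBI", quietly = TRUE)) {
    stop("the odbc and DBI packages are required")
  }
  # Only send credentials when they are supplied (trusted connections omit them).
  args <- list(odbc::odbc(), dsn = dsn)
  if (!is.null(uid)) args$uid <- uid
  if (!is.null(pwd)) args$pwd <- pwd

  conn <- do.call(DBI::dbConnect, args)
  on.exit(DBI::dbDisconnect(conn), add = TRUE)

  # schema_name is handled by the odbc package's dbListTables method.
  DBI::dbListTables(conn, schema_name = schema)
}
```

A call such as list_schema_tables("myDSN", uid = "user", pwd = "secret") would then return the table names for that schema only, which makes it easier to tell whether the odd names come from the driver or from the listing itself.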

Trouble connecting to Oracle database using RODBC

I recently upgraded from Windows 7 to Windows 10 and had to reset some remote database connections. I had previously been connecting quite successfully to an Oracle database using the Oracle 11g client and RODBC.
library(RODBC)
channel <- odbcConnect(dsn="myoracleDB",
                       uid='myusername',
                       pw='mypassword',
                       believeNRows=FALSE)
result<- sqlQuery(channel,"select * from schema_name.table_name")
close(channel)
Since the Windows 10 upgrade, the above connection protocol no longer works. Specifically, I get the following error:
channel <- odbcConnect(dsn="myoracleDB",
                       uid='myusername',
                       pw='mypassword',
                       believeNRows=FALSE)
Warning messages:
1: In RODBC::odbcDriverConnect("DSN=myoracleDB;UID=myusername;
PWD=mypassword",:
[RODBC] ERROR: state HY000, code 12170, message [Oracle][ODBC]
[Ora]ORA-12170: TNS:Connect timeout occurred
2: In RODBC::odbcDriverConnect("DSN=myoracleDB;UID=myusername;
PWD=mypassword",:ODBC connection failed
Two additional observations are relevant here:
I use the Windows command line to execute tnsping myoracleDB which returns a successful connection to the database
I can also use Oracle's SQL Developer Application to successfully connect to and query from the database.
So I feel confident that the Oracle Client and the ODBC Data Sources are set up correctly.
Interestingly, I AM able to connect to my database using the RODBC library if I use the following code:
mycon = odbcDriverConnect("Driver={Oracle in OraClient11g_home1}; Dbq=myoracleDB; Uid=myusername; Pwd=mypassword;",
                          believeNRows=FALSE)
My question for the community is:
This new connection protocol works (which I'm happy about). However, since I don't really understand why it works when the approach that worked before no longer works, I fear I may be ignoring some underlying problem that could really hurt me down the road.
I have found the following SO threads to be helpful, though neither really addresses my issue exactly:
Failure to connect to odbc database in R
Connect to ORACLE via R, using the info in sql developer
UPDATE:
I have accessed the Windows ODBC 64 bit menu and verified that I do have a DSN called "myoracleDB" which is assigned to the "Oracle in OraClient11g_home1" driver. I have tested this connection and find that it works fine. I have also used the RODBC line:
odbcDataSources()
in RStudio and found that the data source "myoracleDB" is recognized. However, when I try to execute:
channel <- odbcConnect(dsn="myoracleDB",
                       uid='myusername',
                       pw='mypassword',
                       believeNRows=FALSE)
I still get the error:
"TNS: Connect timeout occurred ODBC connection failed"
If you check out the docs, DSN=myoracleDB tells RODBC to connect to the Windows DSN "myoracleDB", while Dbq=myoracleDB tells RODBC to connect to the TNSNAMES entry "myoracleDB". They're two different ways of resolving database names. tnsping and SQL Developer also both use TNSNAMES to resolve databases.
So I think your DSN probably got deleted when you reset things. You can test it by going to Control Panel > Administrative Tools > Data Sources (ODBC). If your database is there, you should be able to Configure it and click Test Connection to make sure it's working. Otherwise you can add it there, and your original configuration should work again.
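To make the difference between the two resolution paths concrete, here is a minimal sketch that builds both kinds of connection string. The helper name `oracle_conn_string` is hypothetical, and the driver, DSN, and credential values are the placeholders from the question. It needs no database or driver to run.

```r
# Hypothetical helper: build either a DSN-based or a Driver/Dbq-based
# connection string for RODBC::odbcDriverConnect.
oracle_conn_string <- function(uid, pwd, dsn = NULL, driver = NULL, dbq = NULL) {
  if (!is.null(dsn)) {
    # Resolved through the Windows ODBC data source list.
    parts <- c(DSN = dsn)
  } else if (!is.null(driver) && !is.null(dbq)) {
    # Resolved through the TNSNAMES entry named by Dbq.
    parts <- c(Driver = paste0("{", driver, "}"), Dbq = dbq)
  } else {
    stop("supply either dsn, or both driver and dbq")
  }
  parts <- c(parts, Uid = uid, Pwd = pwd)
  # Join as key=value pairs separated by semicolons, with a trailing semicolon.
  paste0(paste0(names(parts), "=", parts, collapse = ";"), ";")
}

via_dsn <- oracle_conn_string("myusername", "mypassword", dsn = "myoracleDB")
via_tns <- oracle_conn_string("myusername", "mypassword",
                              driver = "Oracle in OraClient11g_home1",
                              dbq = "myoracleDB")
```

Either string can be passed as the connection argument of odbcDriverConnect together with believeNRows=FALSE. If via_tns connects and via_dsn times out, the DSN definition is the part to inspect.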

Connecting cassandra to Tableau Software

I want to connect Tableau to my Cassandra database. Note that I'm using Tableau on Windows 7 and Cassandra on Ubuntu (in a virtual machine).
For this I've installed the Cassandra ODBC driver (I also tried the Simba Cassandra ODBC driver and got the same problem). The connection succeeds and I can find my keyspace, but no tables are listed in the "cim" keyspace.
Note that my "cim" keyspace has 3 tables that I can query without any problem in Cassandra. Is there something I should do before creating the ODBC data source?
Thank you for your help.
The ODBC driver as it stands currently uses thrift so won't be able to communicate directly with cql3 to display the table names. Describe commands also won't work. However, you should still be able to select data from your tables. Updates to the ODBC driver should provide cql3 support at some point in the new year.
Update: the Simba ODBC driver for Cassandra supports CQL3 and should solve your problem.
http://www.simba.com/connectors/apache-cassandra-odbc

OBIEE connect to impala

I'm trying to connect OBIEE to Impala. When I run my test, I hit a problem that I can't resolve. Here are my steps:
I downloaded the latest Cloudera Impala ODBC driver for Windows and imported metadata from Impala. I can see the data in the Admin Tool.
I uploaded the rpd file to the server, then downloaded and configured the Cloudera Impala ODBC driver for Linux. The driver test succeeds, which suggests it is configured correctly.
I tried to create a new analysis through 【Create Direct Database Request】 to test whether I can connect to Impala, but the connection always fails and I can't figure out why.
Has anybody done this successfully, or can anyone tell me how to resolve the problem? Thanks!
I recommend connecting directly to Hive instead. According to Oracle's documentation, Impala connectivity is not directly supported. OBIEE is built to connect to Hive, and Impala is not referenced:
http://docs.oracle.com/cd/E28280_01/bi.1111/e10540/deploy_rpd.htm#BABGIAJH

R Hive Thrift Client

I'm working on adding HiveServer2 support to my company's R data-access package. I'm curious what the best way of generating an R Thrift client would be. I'm considering writing an R wrapper around the Java Thrift client, similar to what rhbase does, but I'd prefer a pure R solution, if possible.
Things to note:
HiveServer2 thrift server is different from the original Hive Thrift server.
I've looked at and used the RHive package. Among other issues I have with it, it requires a system-install of Hadoop and Hive, which will not always be available on R client machines.
My somewhat horrible - but currently sufficient - workaround is to wrap the beeline client in some R goodness.
The exact scope of this question may be too broad for Stackoverflow and the asker confirmed he abandoned this quest, but for future readers this is probably the thing to look for:
From R you can connect to Hive with JDBC.
This is not exactly what the asker came for, but it should serve the purpose in most cases.
The key part of the solution is the RJDBC package. Here is some example code found on the Cloudera Community:
library(DBI)
library(rJava)
library(RJDBC)
# Collect the jars the Hive JDBC driver needs (paths are specific to this HDP install)
hadoop.class.path <- list.files(path="/usr/hdp/2.4.0.0-169/hadoop", pattern="jar", full.names=TRUE)
hive.class.path <- list.files(path="/usr/hdp/current/hive-client/lib", pattern="jar", full.names=TRUE)
hadoop.lib.path <- list.files(path="/usr/hdp/current/hive-client/lib", pattern="jar", full.names=TRUE)
mapred.class.path <- list.files(path="/usr/hdp/current/hadoop-mapreduce-client/lib", pattern="jar", full.names=TRUE)
cp <- c(hive.class.path, hadoop.lib.path, mapred.class.path, hadoop.class.path)
# Pass the collected jars as the driver class path so the driver class can be found
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", classPath=cp, identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://ixxx:10000/default", "hive", "hive")
show_databases <- dbGetQuery(conn, "show databases")
Full disclosure: I am an employee of cloudera.
