R Hive Thrift Client

I'm working on adding HiveServer2 support to my company's R data-access package. I'm curious what the best way of generating an R Thrift client would be. I'm considering writing an R wrapper around the Java Thrift client, similar to what rhbase does, but I'd prefer a pure R solution, if possible.
Things to note:
The HiveServer2 Thrift server is different from the original Hive Thrift server.
I've looked at and used the RHive package. Among other issues I have with it, it requires a system-install of Hadoop and Hive, which will not always be available on R client machines.
My somewhat horrible - but currently sufficient - workaround is to wrap the beeline client in some R goodness.
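For readers wondering what such a wrapper might look like, here is a minimal sketch, not the asker's actual code. It assumes the beeline executable is on the PATH; the function names, JDBC URL, and user are hypothetical.

```r
# Sketch only: function names, URL, and user are hypothetical.
# Assumes the `beeline` executable is on the PATH of the R client machine.

# Build the argument vector for a single non-interactive beeline query.
beeline_args <- function(jdbc_url, user, query) {
  c("-u", shQuote(jdbc_url),
    "-n", shQuote(user),
    "--outputformat=csv2",   # comma-separated output
    "--silent=true",         # suppress log chatter
    "-e", shQuote(query))
}

# Run the query through beeline and parse its CSV output into a data frame.
beeline_query <- function(jdbc_url, user, query, beeline = "beeline") {
  out <- system2(beeline, beeline_args(jdbc_url, user, query), stdout = TRUE)
  read.csv(text = paste(out, collapse = "\n"), stringsAsFactors = FALSE)
}
```

A call might look like beeline_query("jdbc:hive2://myhost:10000/default", "hive", "show databases"), where the host name is a placeholder.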

The exact scope of this question may be too broad for Stack Overflow, and the asker confirmed he abandoned this quest, but for future readers this is probably the thing to look for:
From R you can connect to Hive with JDBC.
This is not exactly what the asker came for, but it should serve the purpose in most cases.
The key part of the solution is the RJDBC package. Here is some example code found on the Cloudera Community:
library(DBI)
library(rJava)
library(RJDBC)
# Collect every jar the Hive JDBC driver depends on (paths are from an HDP install).
hadoop.class.path <- list.files(path = "/usr/hdp/2.4.0.0-169/hadoop", pattern = "jar", full.names = TRUE)
hive.class.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
hadoop.lib.path <- list.files(path = "/usr/hdp/current/hive-client/lib", pattern = "jar", full.names = TRUE)
mapred.class.path <- list.files(path = "/usr/hdp/current/hadoop-mapreduce-client/lib", pattern = "jar", full.names = TRUE)
cp <- c(hive.class.path, hadoop.lib.path, mapred.class.path, hadoop.class.path)
# Start the JVM with those jars on the classpath so the driver class can be loaded.
.jinit(classpath = cp)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "hive-jdbc.jar", identifier.quote = "`")
conn <- dbConnect(drv, "jdbc:hive2://ixxx:10000/default", "hive", "hive")
show_databases <- dbGetQuery(conn, "show databases")
Full disclosure: I am an employee of Cloudera.

Related

Connecting to an Azure SQL Server Data Warehouse from R on a Mac - See random names instead of tables

I'm trying to connect to an Azure SQL Server (12.00.1900) from R on a Mac, using Microsoft's unixodbc SQL Server drivers (17).
I get a connection, but instead of seeing the 12 or so tables that live in the database, dbListTables returns 442 tables, all with nonsensical names, beginning with 'Csoe', 'Ote', and ending in 'xlshm_idad'. Instead of seeing the single schema that lives in the database, I see cin_1mro__e, IFRAINSHM, and s, none of which have any tables in them.
Note that when I use an ordinary SQL visualization app, that doesn't use the MS drivers, I'm able to see the tables and their content properly.
In addition, the RSQLServer package gets a working connection and sees the tables correctly, but isn't compatible with dplyr semantics.
Can anyone help or advise? I've looked for third party SQL Server unixodbc drivers for Mac, and I can't find any.
Until I see more info from OP, I'll leave as my answer the general recommendation to use R's odbc package. Assuming the correct drivers are installed, connection is configured correctly in odbc.ini, and assuming trusted_connection=yes is used in the same, then connecting from R is as simple as:
library(odbc)
dbConn <- dbConnect(odbc(), dsn = "myDSN")
If trusted connection is not on, then you just need to pass the uid and pwd arguments.
Also, OP, it may be the case that you did not install FreeTDS, so try the following (replace with the equivalent for the package manager you're using):
brew install freetds --with-unixodbc
This gives you the libtdsodbc.so driver. Make sure the DSN points to this.

R integration with asp.net and ssas

Does anyone know if it's possible to send data from an MS SQL database to an R server, so that it can calculate some columns and send them back to MS SQL?
I am not very familiar with R Server's integration options, and I am concerned that this may not even be possible. If it is not, maybe the data could be sent from ASP.NET MVC 5 using a .NET integration library, but I do not think that is a good solution: the data can have more than 500k rows, so it would be extremely slow.
Querying a database from a server running R requires three things:
Network security that allows you to communicate between the machines
Drivers installed on the R server
Configurations that allow you to connect to the database from R
In general, it is best to have your IT/Ops team take care of the Networking security and the installation of drivers, since these are things that they likely have security procedures around. We recommend using the RStudio Professional Drivers, which are easy to install and designed to work with our products.
Then, when it comes to the connection from R to the database, we recommend using the odbc package, which is a DBI compliant interface to using ODBC drivers. You can acquire the latest stable version from CRAN with install.packages("odbc").
In general, a connection looks something like this:
library(odbc)
con <- dbConnect(odbc(),
                 Driver = "SQLServer",
                 Server = "mysqlhost",
                 Database = "mydbname",
                 UID = "myuser",
                 PWD = rstudioapi::askForPassword("Database password"),
                 Port = 1433)
The rstudioapi::askForPassword function will prompt the user for a password and reduce the need to store passwords in code. For more options on securing credentials, there is a dedicated article on the topic. Note that there is also support for DSNs:
# Using a DSN
con <- dbConnect(odbc::odbc(), "mydbalias")
For further reference, please visit this url.
Hope it helps.
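To address the round trip the question asks about (fetch rows, calculate columns in R, write them back), a minimal sketch could look like the following. The table name sales, the output table sales_scored, and the columns id, price, and quantity are hypothetical placeholders, and con is assumed to be an open DBI connection like the one above.

```r
# Sketch only: table and column names are hypothetical placeholders.

# Pure R step: derive the calculated column from the fetched rows.
add_columns <- function(df) {
  df$total <- df$price * df$quantity
  df
}

# Fetch rows, compute the new column in R, and write the result back.
# `con` is an open DBI connection, e.g. one created with odbc::odbc().
round_trip <- function(con) {
  df <- DBI::dbGetQuery(con, "SELECT id, price, quantity FROM sales")
  scored <- add_columns(df)
  DBI::dbWriteTable(con, "sales_scored", scored, overwrite = TRUE)
  invisible(scored)
}
```

Keeping the calculation in its own function makes it easy to check on a small data frame before pointing it at 500k rows.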

Cassandra Database is not connecting with R via Rcassandra

When I connect to a Cassandra database using the RCassandra package, the connection is established. But when I try to use any keyspace, R stops responding. I used the following statements.
library(RCassandra)
rc <- RC.connect(host ="localhost", port = 9042)
RC.use(rc, "db1", cache.def = TRUE)
Any suggestions, please?
Your problem is that you're specifying the port directly, and you're using the port of the native protocol (9042), while RCassandra uses the Thrift protocol (port 9160). When RCassandra talks to port 9042, it simply doesn't understand the replies. So you need to either remove the port argument completely or specify it as 9160, and make sure that the start_rpc parameter is set to true in cassandra.yaml.
I've looked into the source code of RCassandra and see that it hasn't been updated for more than 5 years. Because it uses Thrift instead of the native protocol, you have many limitations compared to using the native protocol. Support for Thrift will also be removed in the next major version of Cassandra, 4.0. A better alternative would be to write a wrapper around the DataStax C/C++ driver and expose the underlying functionality to R.
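As a rough sketch of the fix described above, the following checks that the Thrift port is reachable before connecting. port_is_open() and connect_thrift() are hypothetical helper names, not part of RCassandra; the host and keyspace defaults are taken from the question.

```r
# Sketch only: port_is_open() and connect_thrift() are hypothetical helpers.

# Return TRUE if a TCP connection to host:port can be opened within `timeout` seconds.
port_is_open <- function(host, port, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port, open = "r+", blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Connect on the Thrift port (9160) rather than the native-protocol port (9042).
connect_thrift <- function(host = "localhost", keyspace = "db1") {
  if (!port_is_open(host, 9160L)) {
    stop("Thrift port 9160 is not reachable; check start_rpc in cassandra.yaml")
  }
  rc <- RCassandra::RC.connect(host = host, port = 9160L)
  RCassandra::RC.use(rc, keyspace, cache.def = TRUE)
  rc
}
```

Failing fast on an unreachable port gives a clear error message up front.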

RCassandra is not connecting to Cassandra Database

I'm new to Cassandra and R. When I connect to a Cassandra database using the RCassandra package, the connection is established. But when I try to use any keyspace, R stops responding. I used the following statements.
c <- RC.connect('192.168.1.20', 9042)
RC.use(c, 'effesensors')
Please give me a brief idea about how to use RCassandra to avoid this problem.
Are you aware that you may be using a non default port for Cassandra? If you can provide the Cassandra version and RStudio version I may be able to update my answer. I found this tutorial by tarkalabs useful as a checklist of steps to take before any connection is attempted.
From the tutorial,
Now connect to your database with
connect.handle <- RC.connect(host="127.0.0.1", port=9160)
Cassandra by default listens to port 9160 but you can change it according to your configuration. To show the cluster type into your prompt
RC.cluster.name(connect.handle)
Just to verify that you are connected and your Cassandra instance is running try the following command:
RC.describe.keyspaces(connect.handle)
That should bring back a list of the settings in your keyspaces. If nothing returns, you are either not connected or your Cassandra instance is not properly installed.
EXAMPLE OUTPUT
$system_traces$strategy_options
replication_factor
"2"
$system_traces$cf_defs
named list()
$system_traces$durable_writes
[1] TRUE
Let me know what your results are if my answer does not work and I will update my answer. Good Luck!
Use RODBC instead of RCassandra. You will need to install the Cassandra ODBC driver.
Thanks #D. Venkata Naresh, your suggestion of using RODBC driver resolved my issue.
I am using R and datastax cassandra community edition.
This is the link I followed to configure the ODBC driver in my windows machine.
https://www.datastax.com/dev/blog/using-the-datastax-odbc-driver-for-apache-cassandra
Then, in RStudio, these are the commands to connect to and fetch from Cassandra:
install.packages("RODBC")  # one-time install
library(RODBC)
# Replace the placeholders with your ODBC data source name and table name.
conn <- odbcConnect("<ODBC datasource name>")
dataframe <- sqlFetch(conn, "<column family / table name>")
dataframe
I hope this answer helps someone who is facing issues with RCassandra.
I read your comments above, you are using the wrong port. You should run the following command
c <- RC.connect('192.168.1.20', 9160)
This will definitely work for you.

Querying Oracle DB from Revolution R using RODBC

RODBC error in Revolution R 64bit on winxp64 bit connected to Oracle using a 64bit ODBC driver thru a DSN
library(RODBC)
db <- odbcConnect("oraclemiso",uid="epicedf",pwd="…")
rslts = sqlQuery(db, "select count(*) from FTRAuction")
Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, :
negative length vectors are not allowed
I am able to connect but get an error when I run a query.
The following, however, works:
library(RODBC)
channel <- odbcConnect("OraLSH", "<user>", "<password>")
odbcQuery (channel, "select sysdate from dual")
sqlGetResults(channel, as.is=FALSE, errors=FALSE, max=1, buffsize=1,
nullstring=NA, na.strings="NA", believeNRows=TRUE, dec=getOption("dec"))
SYSDATE
1 2010-01-24 15:10:02
But what if I don't know the number of rows (max=1) beforehand?
Thanks,
Arun
believeNRows=FALSE seems to be the key. Best to use it when opening the connection:
db <- odbcConnect(dsn="testdsn", uid="testuser", pwd="testpasswd", believeNRows=FALSE )
When testing with unixODBC's isql, it reports SQLRowCount to be 4294967295 (even if there's just one row) on 64bit Linux while it reports -1 on 32 bit Linux. This is probably an optimization as it enables quicker answers. It saves the database the burden of retrieving the complete response data set immediately. E.g. there might be lots of records while only the first few hits will ever be fetched.
4294967295 is (2^32)-1, which is the maximum value for an unsigned 32-bit int, but it will be treated as -1 when read as a signed int. Thus R complains about a vector with negative length.
So I assume it's an issue about signed vs. unsigned integer (or sizeof(long) between 32 and 64 bit).
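The arithmetic can be checked directly in R. This is only an illustration of the two's-complement reinterpretation, not RODBC's actual code; as_signed32() is a hypothetical helper name.

```r
# Reinterpret an unsigned 32-bit value as a signed 32-bit (two's complement) value.
as_signed32 <- function(x) {
  ifelse(x >= 2^31, x - 2^32, x)
}

as_signed32(4294967295)  # -1: the "negative length" R complains about
as_signed32(12)          # 12: small row counts are unchanged
```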
Setting believeNRows=FALSE solved the issue for me so I can use the same R code on both systems.
BTW: I'm using R 2.10.1, RODBC 1.3.2, unixODBC 2.3.0 with Oracle 10.2.0.4 on Linux 64 bit.
Be sure to use
export CFLAGS="-DBUILD_REAL_64_BIT_MODE -DSIZEOF_LONG=8 -fshort-wchar"
when doing configure for unixODBC as the Oracle ODBC driver expects REAL_64_BIT_MODE, not LEGACY_64_BIT_MODE.
And be aware of internationalization issues: R uses $LANG while Oracle uses $NLS_LANG.
I experienced problems with UTF8 so I use e.g.
LANG=en_US; NLS_LANG=American_America
The error
Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, :
negative length vectors are not allowed
very much looks like a 32-bit / 64-bit porting issue so I kindly suggest you get in touch with the two commercial vendors involved to have that fixed. I prefer direct database driver where available over ODBC but there is no reason why it shouldn't work as 64-bit Linux merrily plays along.
Dirk is right -- RODBC doesn't support 64-bit drivers for Oracle, at least not as of a few months ago. You may be out of luck. We had a similar issue trying to get R to access an Oracle database from a 64-bit Linux box using the following tools: 64-bit R, RODBC, unixODBC, Oracle Instant Client. I asked the R-sig-db list, including the package author (Prof. Ripley) about this, and there was no conclusive answer. I then asked Revolution if they would be willing to solve the problem, if we were to purchase licenses from them (at 5-figures/year!), and they said no.
My company is now trying to minimize use of R to areas where it is best suited. We will be using other tools (web services, JVM-based systems) to access the database, and sharing data with R only when necessary.
The underlying problem is that very few major users of R also use Oracle. R is primarily used by academics (Excel, MySQL), finance types (Postgres), and more cutting-edge analytics teams. Oracle is used by old businesses that value reliability over innovation, the exact opposite of what most R users are looking for. So this explains why support for Oracle has fallen away, in my view.
Try max=0 and believeNRows=FALSE - that worked for me.
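A minimal sketch of that suggestion, assuming an already open RODBC channel; query_all_rows() is a hypothetical wrapper name. Both arguments are passed through sqlQuery to sqlGetResults.

```r
# Sketch only: query_all_rows() is a hypothetical wrapper name.
# `channel` is an open RODBC connection returned by odbcConnect().
query_all_rows <- function(channel, sql) {
  RODBC::sqlQuery(channel, sql,
                  believeNRows = FALSE,  # ignore the driver-reported row count
                  max = 0)               # 0 means no limit on rows fetched
}
```

For the query in the question, this would be called as query_all_rows(db, "select count(*) from FTRAuction").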

Resources