Want to connect Redshift to R

I tried to use the code from this link, but I got an error:
driver <- JDBC("com.amazon.redshift.jdbc41.Driver", "RedshiftJDBC41-1.1.9.1009.jar", identifier.quote="`")
JavaVM: requested Java version ((null)) not available. Using Java at "" instead.
JavaVM: Failed to load JVM: /bundle/Libraries/libserver.dylib
JavaVM FATAL: Failed to load the jvm library.
Error in .jinit(classPath) : JNI_GetCreatedJavaVMs returned -1
I get this when loading the driver and trying to connect. I don't know how to connect Redshift to R.

This will not solve the error, but if you want to connect to Redshift from R, you can use the RPostgreSQL library, as in the answer to another R-Redshift connection question:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
conn <- dbConnect(drv, host="your.host.us-east-1.redshift.amazonaws.com",
                  port="5439",
                  dbname="your_db_name",
                  user="user",
                  password="password")
You also need to make sure that your IP address is on the whitelist of the Redshift security group.
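If you would rather not hard-code the host and password in the script, one option is to read them from environment variables. This is only a sketch: the helper name and the variable names (REDSHIFT_HOST and so on) are made up for illustration, and the commented usage assumes RPostgreSQL is installed and the cluster is reachable.

```r
# Hypothetical helper: collect connection settings from environment variables.
# The variable names below are assumptions, not anything Redshift defines.
redshift_conn_args <- function() {
  args <- list(
    host     = Sys.getenv("REDSHIFT_HOST"),
    port     = Sys.getenv("REDSHIFT_PORT", unset = "5439"),
    dbname   = Sys.getenv("REDSHIFT_DB"),
    user     = Sys.getenv("REDSHIFT_USER"),
    password = Sys.getenv("REDSHIFT_PASSWORD")
  )
  # Fail early with a readable message if anything is empty.
  missing <- names(args)[!nzchar(unlist(args))]
  if (length(missing) > 0) {
    stop("Missing connection settings: ", paste(missing, collapse = ", "))
  }
  args
}

# Usage (needs RPostgreSQL and network access to the cluster):
# library(RPostgreSQL)
# drv  <- dbDriver("PostgreSQL")
# conn <- do.call(dbConnect, c(list(drv), redshift_conn_args()))
# dbDisconnect(conn)
```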

Related

Simba Athena ODBC: unable to use SQLGetPrivateProfileString functions

This is very strange. I want to set up a connection from RStudio to my instance in AWS Athena.
I am using unixODBC as the driver manager, and I succeeded in testing the connection using isql -v 'Simba Athena'. However, when I test the connection in RStudio with...
con <- DBI::dbConnect(
odbc::odbc(),
"Simba Athena"
)
... it gives me the error Error: nanodbc/nanodbc.cpp:1021: 00000: [Simba][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function. Any clue about it? I am a bit stuck.
The driver is not finding the correct ODBC installer library. By default, Simba's setup file /Library/simba/athenaodbc/lib/simba.athenaodbc.ini points to libodbc.dylib, but it should point to libodbcinst.dylib, at least on macOS.
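For anyone who wants to script that change instead of editing the file by hand, here is a rough sketch. It assumes the setting is a key=value line whose key is ODBCInstLib; I believe that is the key Simba uses, but check your own file first. The function name is made up, and it only touches the lines you pass in, so nothing is written until you call writeLines yourself.

```r
# Rewrite the ODBCInstLib entry in the lines of a Simba .ini file.
# Assumption: the setting is a "key=value" line whose key is ODBCInstLib.
set_odbcinst_lib <- function(ini_lines, lib = "libodbcinst.dylib") {
  is_key <- grepl("^[[:space:]]*ODBCInstLib[[:space:]]*=", ini_lines)
  if (!any(is_key)) {
    warning("No ODBCInstLib entry found; lines returned unchanged.")
    return(ini_lines)
  }
  ini_lines[is_key] <- paste0("ODBCInstLib=", lib)
  ini_lines
}

# Usage (back up the file first; the path is the one mentioned above):
# ini <- "/Library/simba/athenaodbc/lib/simba.athenaodbc.ini"
# writeLines(set_odbcinst_lib(readLines(ini)), ini)
```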
This solved my problem.
I got the same error when I linked against the static library "libodbc.a"; the connection succeeded once I switched to linking against the dynamic library "libodbc.so".

I am getting a class not found error when I try to connect R with AWS Redshift

I am trying to connect R with Redshift using the JDBC template they provide on their website.
I got the most recent version of the Redshift JDBC driver and called JDBC(), and it's not working.
install.packages("RJDBC",dep=TRUE)
library(RJDBC)
download.file('https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC42-1.2.10.1009.jar','RedshiftJDBC42-1.2.10.1009.jar')
driver_redshift <- JDBC("com.amazon.redshift.jdbc42.Driver",
"RedshiftJDBC41-1.1.9.1009.jar", identifier.quote="`")
I am getting an error that says: Error in .jfindClass(as.character(driverClass)[1]) : class not found
Try downloading the driver in binary mode:
download.file('https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC42-1.2.10.1009.jar','RedshiftJDBC42-1.2.10.1009.jar', mode="wb")
Then make sure that you're referring to the correct jar (your JDBC() call names the 4.1 jar, not the 4.2 jar you downloaded):
driver <- JDBC("com.amazon.redshift.jdbc42.Driver", "RedshiftJDBC42-1.2.10.1009.jar", identifier.quote="`")
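Before calling JDBC(), it can help to confirm that the file you point at exists and at least starts like a jar. A jar is a ZIP archive, so the first four bytes should be the ZIP signature. The helper below is a quick sanity check of my own (the name is made up); it catches a missing file or an HTML error page saved under a .jar name, but it will not detect every kind of corruption.

```r
# Return TRUE if the file exists and starts with the ZIP signature "PK\003\004".
looks_like_jar <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 4)
  identical(header, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Usage:
# jar <- "RedshiftJDBC42-1.2.10.1009.jar"
# if (!looks_like_jar(jar)) stop("Jar is missing or not a valid archive: ", jar)
```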

R connection to Redshift using AWS driver doesn't work but does work with PostgreSQL driver

I am trying to establish a connection to my Redshift database, following the example provided by AWS at https://blogs.aws.amazon.com/bigdata/post/Tx1G8828SPGX3PK/Connecting-R-with-Amazon-Redshift. I get errors when trying to establish the connection using their recommended driver, but when I use the PostgreSQL driver I can connect to the Redshift DB.
AWS says their driver is "optimized for performance and memory management", so I would rather use it. Can someone please review my code below and let me know if they see something wrong? I suspect that I am not setting the URL up correctly, but I am not sure what I should be using instead. Thanks in advance for any help.
#' This code attempts to establish a connection to a Redshift database. It
#' tries the suggested Redshift driver, but that doesn't
#' work.
## Clear up space and set working directory
#Clear Variables
rm(list=ls(all=TRUE))
gc()
## Libraries for analysis
library(RJDBC)
library(RPostgreSQL)
#Create DBI driver for working with redshift driver directly
# download Amazon Redshift JDBC driver
download.file('http://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC41-1.1.9.1009.jar',
'RedshiftJDBC41-1.1.9.1009.jar')
# connect to Amazon Redshift using specific driver
driver_redshift <- JDBC("com.amazon.redshift.jdbc41.Driver",
"RedshiftJDBC41-1.1.9.1009.jar", identifier.quote="`")
## Using postgre connection that works
#postgre driver
driver_postgre <- dbDriver("PostgreSQL")
#establish connection
conn_postgre <- dbConnect(driver_postgre, host="nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com",
port="5439",dbname="dev",
user="xxxx", password="xxxx")
#list the tables available
tables = dbListTables(conn_postgre)
## Use URL option to establish connection like the example on AWS website
# url <- "<JDBCURL>:<PORT>/<DBNAME>?user=<USER>&password=<PW>
# url <- "jdbc:redshift://demo.ckffhmu2rolb.eu-west-1.redshift.amazonaws.com
# :5439/demo?user=XXX&password=XXX" #uses example from AWS instructions
#url using my redshift database
url <- "jdbc:redshift://nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com
:5439/dev?user=xxxx&password=xxxx"
#attempt connect but gives an error
conn_redshift <- dbConnect(driver_redshift, url)
#gives the following error:
# Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
# java.sql.SQLException: Error message not found: CONN_GENERAL_ERR. Can't find bundle for base name com.amazon.redshift.core.messages, locale en
## Similar to the postgre example that works, but doesn't work when using the Redshift-specific driver
#gives an error saying url is missing, but I am not sure which url to use?
conn <- dbConnect(driver_redshift, host="nhdev.c6htwjfdocsl.us-west-2.redshift.amazonaws.com",
port="5439",dbname="dev",
user="xxxx", password="xxxx")
# gives the following error:
#Error in .jcall("java/sql/DriverManager", "Ljava/sql/Connection;", "getConnection", :
# argument "url" is missing, with no default
I've done it this way, and it works for me:
drv <- JDBC("com.amazon.redshift.jdbc41.Driver","PathTO/RedshiftJDBC41-1.1.2.0002.jar")
conn <- dbConnect(drv,"jdbc:redshift://......redshift.amazonaws.com:5439/dev",User,PWD)
The difference I see in yours is that you don't give the full path to the Redshift jar in driver_redshift.
Hope it works.
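One more thing that may be worth checking: an R string literal that is broken across two lines contains a newline character, so a URL written that way is not the URL you think it is. A small sketch of building the URL on one line and passing the credentials separately, as in the call above. The helper name is made up, and the commented usage assumes RJDBC is loaded and the jar is in the working directory.

```r
# Build a single-line Redshift JDBC URL from its parts.
redshift_jdbc_url <- function(host, port = 5439, dbname) {
  sprintf("jdbc:redshift://%s:%d/%s", host, as.integer(port), dbname)
}

# Usage (needs RJDBC, the driver jar, and a reachable cluster):
# jar  <- normalizePath("RedshiftJDBC41-1.1.9.1009.jar", mustWork = TRUE)
# drv  <- JDBC("com.amazon.redshift.jdbc41.Driver", jar)
# url  <- redshift_jdbc_url("your.host.us-west-2.redshift.amazonaws.com", 5439, "dev")
# conn <- dbConnect(drv, url, "user", "password")
```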

Connecting to Hive in R

I am trying to connect to Hive in R. I have loaded the RJDBC and rJava libraries in my R environment.
I am using a Linux server with Hadoop (Hortonworks Sandbox 2.1) and R (3.1.1) installed on the same box. This is the script I am using to connect:
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/default")
I get this error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hive.service.auth.HiveAuthFactory
I have checked that my classpath contains all the jar files in /usr/lib/hive and /usr/lib/hadoop, but I cannot be sure whether anything else is missing. Any idea what is causing the problem?
I am fairly new to R (and programming for that matter) so any specific steps are much appreciated.
I successfully connect to Hive from R with just RJDBC and a few configuration lines. I prefer RJDBC to RHive because RHive needs complex installations on all the nodes of the cluster (and I don't really understand why).
Here is my R solution:
# load libraries
library("DBI")
library("rJava")
library("RJDBC")
# initialize the classpath (works with Hadoop 2.6 on a CDH 5.4 installation)
cp <- c("/usr/lib/hive/lib/hive-jdbc.jar",
        "/usr/lib/hadoop/client/hadoop-common.jar",
        "/usr/lib/hive/lib/libthrift-0.9.2.jar",
        "/usr/lib/hive/lib/hive-service.jar",
        "/usr/lib/hive/lib/httpclient-4.2.5.jar",
        "/usr/lib/hive/lib/httpcore-4.2.5.jar",
        "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
# initialize the connection to the Hive server
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
# work with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
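The jar names and versions in that classpath differ between distributions, so it may be worth checking which entries actually exist on your machine before calling .jinit(). A minimal sketch (the helper name is made up):

```r
# Return the classpath entries that do not exist on disk.
missing_jars <- function(classpath) {
  classpath[!file.exists(classpath)]
}

# Usage, with cp defined as in the answer above:
# absent <- missing_jars(cp)
# if (length(absent) > 0) stop("Missing jars: ", paste(absent, collapse = ", "))
# .jinit(classpath = cp)
```

A missing jar will not always cause a NoClassDefFoundError, and the check says nothing about version conflicts, but it rules out the simplest cause.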
You can also connect to HiveServer2 from R using the RHive package.
Below are the commands that I had to use.
library(RHive)
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
rhive.env(ALL=TRUE)
rhive.init()
rhive.connect("localhost")

Connect R and Vertica using RODBC

This is my first time connecting to Vertica. I have already connected to a MySQL database successfully using the RODBC library.
I have the database set up in Vertica, and I installed the Windows 64-bit ODBC driver from https://my.vertica.com/download-community-edition/
When I try to connect to Vertica using R, I get the error below:
channel = odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password")
Warning messages:
1: In odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password") :
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
2: In odbcDriverConnect(connection = "Server=myserver.edu;Database=mydb;User=mydb;Password=password") :
ODBC connection failed
Can someone tell me how to fix this? Or is there any other way to connect to Vertica using R?
It may not be the fastest, but I prefer to use the Vertica JDBC driver from R. Getting the ODBC drivers working is a little messy across different operating systems. If you already have a Java Runtime Environment (JRE) installed for other applications then this is fairly straightforward.
Download the Vertica JDBC drivers for your Vertica server version from the MyVertica portal. Place the driver (a .jar file) in a reasonable location for your operating system.
Install RJDBC into your workspace:
install.packages("RJDBC",dep=TRUE)
In your R script, load the RJDBC module and create an instance of the Vertica driver, adjusting the classPath argument to point to the location and filename of the driver you downloaded:
library(RJDBC)
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver", classPath="full/path/to/driver/vertica_jdbc_VERSION.jar")
Make a new connection using the driver object, substituting your connection details for the host, username and password:
vertica <- dbConnect(vDriver, "jdbc:vertica://host:5433/db", "username", "password")
Then run your SQL queries:
myframe = dbGetQuery(vertica, "select Address,City,State,ZipCode from MyTable")
On Windows, you have to use double backslashes in the classPath argument of the JDBC function.
For example,
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver",
                classPath="C:\\Program Files\\Vertica Systems\\JDBC\\vertica-jdk5-6.1.2-0.jar")
worked for me, while just copying and pasting the path failed.
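The reason is that a backslash starts an escape sequence inside an R string literal, so a pasted Windows path either fails to parse or turns into something else. Forward slashes also work on Windows, so another option is to convert the path. A small sketch (the helper name is made up):

```r
# Convert backslashes in a path to forward slashes.
# With fixed = TRUE, the pattern "\\" is one literal backslash.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# Usage:
# jar <- to_forward_slashes("C:\\Program Files\\Vertica Systems\\JDBC\\vertica-jdk5-6.1.2-0.jar")
# vDriver <- JDBC(driverClass = "com.vertica.jdbc.Driver", classPath = jar)
```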
