Writing data rom R to AWS Redshift db directly - r

Is there a way to write large datasets from R studio to AWS Redshift db directly? I used the following solution that I got online. but it throws error -unused argument: tablename=".."
install.packages('devtools')
devtools::install_github("RcppCore/Rcpp")
devtools::install_github("rstats-db/DBI")
devtools::install_github("rstats-db/RPostgres")
install.packages("aws.s3", repos = c(getOption("repos"), "http://cloudyr.github.io/drat"))
devtools::install_github("sicarul/redshiftTools")
library("aws.s3") library(RPostgres) library(redshiftTools)
pconn_r <- dbConnect(RPostgres::Postgres(), dbname="db",
host='xyz.db.amazon.com', port='1234',
user='user', password='pwd',sslmode='require')
rs_replace_table(tst, dbcon=pconn_r, tableName='abc', bucket="pqr")
Please help!

You can look that library: redshift-r
Or connect throught RJDBC: RJDBC

Related

Athena Connection with R

I am new to Athena. I want to connect this with R
Sys.getenv()
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC42_2.0.14.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
drv <- JDBC(driverClass="com.simba.athena.jdbc.Driver", fil, identifier.quote="'")
This is the error message
Error in .jfindClass(as.character(driverClass)[1]) :
java.lang.ClassNotFoundException
Referred this article
https://aws.amazon.com/blogs/big-data/running-r-on-amazon-athena/
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.ap-south-1.amazonaws.com:443/',
s3_staging_dir="s3://aws-athena-query-results-ap-south-1-region/",
user=("xxx"),
password=("xxx"))
Need help really struggling from two days
Thanks in advance. I downloaded jar files and java.
You are using a newer driver version and the driver is now developed by simba and therefore the driver class name has changed.
The driver class is now com.simba.athena.jdbc.Driver.
You may also want to check out AWR.Athena - A nice R package to interact with Athena.
If you are still having difficulty with the JDBC drivers for Athena you could always try: RAthena or noctua. These two packages opt in using AWS SDK to make the connection to AWS Athena.
RAthena uses Python boto3 sdk (similar to pyathena), through reticulate.
noctua uses R paws sdk.
Code Example:
library(DBI)
# connect to AWS
# using ~/.aws/credentials to store aws credentials
con <- dbConnect(RAthena::athena(),
s3_staging_dir = "s3://mybucket/")
# upload some data into aws athena
dbWriteTable(con, "iris", iris)
# query iris in aws athena
dbGetQuery(con, "select * from iris")
NOTE: noctua works extactly the same way as code example above but instead the driver is: noctua::athena()

Cannot connect R to DB2

I want to connect R 3.2.3 to DB2. I tried this code :
library(RJDBC)
jcc = JDBC("com.ibm.db2.jcc.DB2Driver","c:/installed/sqllib/java/db2jcc4.jar")
I get this error :
Error: could not find function "JDBC"
Any Ideas ?
Tantaoui, I suggest you take a look at ibmdbR, which is a dedicated R data frame API to DB2 and dashDB data: https://cran.r-project.org/web/packages/ibmdbR/ibmdbR.pdf

Connecting to Hive in R

I am trying to connect to hive in R. I have loaded RJDBC and rJava libraries on my R env.
I am using a Linux server with hadoop (hortonworks sandbox 2.1) and R (3.1.1) installed in the same box. This is the script I am using to connect:
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/default")
I get this error:
Error in .jcall(drv#jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :java.lang.NoClassDefFoundError: Could not initialize class org.apache.hive.service.auth.HiveAuthFactory
I have checked that my classpath contains all the jar files in /usr/lib/hive and /usr/lib/hadoop,but can not be sure if anything else is missing. Any idea what is causing the problem??
I am fairly new to R (and programming for that matter) so any specific steps are much appreciated.
I succesffuly connect to Hive from R just with RJDBC and a few configuration lines. I prefere RJDBC to rHive because rHive needs complex installations on all the node of the cluster (and I don't really understand why).
Here is my R solution :
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#init of the connexion to Hive server
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases
You can simply connect to hiveserver2 from R using the RHIVE package
Below are the commands that I had to use.
Sys.setenv(HIVE_HOME="/usr/local/hive") Sys.setenv(HADOOP_HOME="/usr/local/hadoop") rhive.env(ALL=TRUE) rhive.init() rhive.connect("localhost")

Connect to MSSQL using DBI

I can not connect to MSSQL using DBI package.
I am trying the way shown in package itself
m <- dbDriver("RODBC") # error
Error: could not find function "RODBC"
# open the connection using user, passsword, etc., as
# specified in the file \file{\$HOME/.my.cnf}
con <- dbConnect(m, dsn="data.source", uid="user", pwd="password"))
Any help appreciated. Thanks
As an update to this question: RStudio have since created the odbc package (or GitHub version here) that handles ODBC connections to a number of databases through DBI. For SQL Server you use:
con <- DBI::dbConnect(odbc::odbc(),
driver = "SQL Server",
server = <serverURL>,
database = <databasename>,
uid = <username>,
pwd = <passwd>)
You can also set a dsn or supply a connection string.
It looks like there used to be a RODBC driver for DBI, but not any more:
http://cran.r-project.org/src/contrib/Archive/DBI.RODBC/
A bit of tweaking has got this to install in a version 3 R but I don't have any ODBC sources to test it on. But m = dbDriver("RODBC") doesn't error.
> m = dbDriver("RODBC")
> m
<ODBCDriver:(29781)>
>
Suggest you ask on the R-sig-db mailing list to maybe find out what happened to this code and/or the author...
Solved.
I used library RODBC. It has great functionality to connect sql and run sql queries in R.
Loading Library:
library(RODBC)
# dbDriver is connection string with userID, database name, password etc.
dbhandle <- odbcDriverConnect(dbDriver)
Running Sql query
sqlQuery(channel=dbhandle, query)
Thats It.

Connection R to Cassandra using RCassandra

I have an instance of Cassandra running on my localhost. For this example I have used the default configuration provided in conf\cassandra.yaml
I tried to connect R to Cassandra using the RCassandra package.
Basically, i have just installed the RCassandra package in R and tried to connect.
library("RCassandra")
RC.connect('localhost','9160')
RC.connect('127.0.0.1','9160')
None of those are working. Here is the error I get:
Error in RC.connect("localhost", port = "9160") :
cannot connect to locahost:9160
Using Cassandra-cli with the same parameters work. Can you please help on that.
Thank you
Set start_rpc: true in cassandra.yaml file.
Could not fix it but found a way to make it work: initiate a jdbc connection and then launch RCassandra
#Load RJDBC
library(RJDBC)
#Load in the Cassandra-JDBC diver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("C://Users//aab_ITSolutions//apache-cassandra-1.0.10//lib",pattern="jar$",full.names=T))
#Connect to Cassandra node and Keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://localhost:9160/DEMO")
#Query timeseries data
res <- dbGetQuery(casscon, "select * from StockHist limit 10")
library("RCassandra")
connx = RC.connect('localhost',9160)

Resources