R using RJDBC not writing data to Hive table

RJDBC connects to Hive fine and reads data from Hive without problems, but it fails to write data to Hive via dbWriteTable. See below:
options(java.parameters = "-Xmx8g")
library(DBI)
library(rJava)
library(RJDBC)

# Collect the Cloudera Hive JDBC jars plus the Hadoop client jars
cp <- c(list.files("/tmp/R_hive_libs/cloudera_hive_jars", pattern = "[.]jar", full.names = TRUE, recursive = TRUE),
        list.files("/tmp/R_hive_libs/R_hadoop_libs", pattern = "[.]jar", full.names = TRUE, recursive = TRUE),
        list.files("/tmp/R_hive_libs/R_hadoop_libs/lib", pattern = "[.]jar", full.names = TRUE, recursive = TRUE))
drv <- JDBC(driverClass = "com.cloudera.hive.jdbc4.HS2Driver", classPath = cp)
conn <- dbConnect(drv, "jdbc:hive2://XXXXXX:10000/default", "user", "password")

show_databases <- dbGetQuery(conn, "show databases")
List_of_Tables <- dbListTables(conn)
data1 <- dbGetQuery(conn, "select * from XXX.xxx limit 10000")

data_to_write_back_to_hive <- aggregate(data1$xxx.xxx, by = list(Month = data1$xxx.cmp_created_timestamp_month), FUN = sum)
data_to_write_back_to_hive[[2]] <- c(10, 20)
colnames(data_to_write_back_to_hive) <- c("Month", "Energy")
dbWriteTable(conn, "xxxx.checking", data_to_write_back_to_hive)
How do I write data back to Hive? dbWriteTable fails with the error below:
Error in .local(conn, statement, ...) : execute JDBC update query failed in dbSendUpdate
([Simba]HiveJDBCDriver ERROR processing query/statement. Error Code: 40000, SQL state:
TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:
Error while compiling statement: FAILED: ParseException line 1:36 mismatched input 'PRECISION'
expecting ) near 'DOUBLE' in create table statement:28:27,
org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:102,
org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:171,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:268,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391,
sun.reflect.GeneratedMethodAccessor56:invoke::-1,
sun.reflect.DelegatingMeth

This question comes up a fair bit. The short answer is that you can't do what you want with dbWriteTable at present: the DBI/JDBC drivers don't generate syntactically correct HiveQL. The ParseException above shows why: the generated CREATE TABLE declares a column as DOUBLE PRECISION, a type Hive's parser rejects.
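A common workaround is to skip dbWriteTable and issue the HiveQL yourself through RJDBC's dbSendUpdate. A minimal sketch, reusing conn and the data frame from the question (the table name and column types are assumptions, and INSERT INTO ... VALUES requires Hive 0.14 or later):
# Create the table with HiveQL-legal types (DOUBLE, not DOUBLE PRECISION)
dbSendUpdate(conn, "CREATE TABLE IF NOT EXISTS xxxx.checking (Month STRING, Energy DOUBLE)")

# Insert the rows one statement at a time; fine for small frames,
# but for large data, stage a file in HDFS and use LOAD DATA instead
for (i in seq_len(nrow(data_to_write_back_to_hive))) {
  dbSendUpdate(conn, sprintf("INSERT INTO xxxx.checking VALUES ('%s', %f)",
                             data_to_write_back_to_hive$Month[i],
                             data_to_write_back_to_hive$Energy[i]))
}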

Related

Error when inserting a data frame using RMySQL in R

Using R, I tried to insert a data frame. My script looks like this:
con <- dbConnect(RMySQL::MySQL(), username = "xxxxxx", password = "xxxxxx", host = "127.0.0.1", dbname = "xxxxx")
dbWriteTable(conn = con, name = 'table', value = as.data.frame(df), append = TRUE, row.names = FALSE)
dbDisconnect(con)
When the script hits the line below:
dbWriteTable(conn = con, name = 'table', value = as.data.frame(df), append = TRUE, row.names = FALSE)
I get this error:
Error in .local(conn, statement, ...) : could not run statement: Invalid utf8 character string: 'M'
I am not sure why this error occurred. This is part of a script that has run fine on another machine.
Please advise.
You need a working connection before you can insert a data frame into your DB. When creating the connection, the username, password, host name, and database name must all be correct. The code below is the same as yours; I just removed some parameters.
Try this:
mydb <- dbConnect(MySQL(), user = 'root', password = 'password', dbname = 'my_database', host = 'localhost')
To test it, I inserted the iris data into my_database:
data(iris)
dbWriteTable(mydb, name = 'db', value = iris)
This writes the iris data frame into my_database under the table name db.
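If the connection itself is fine, the "Invalid utf8 character string" error usually points at non-UTF-8 text in the data frame itself (commonly latin1 on Windows, which would explain why the same script works on another machine). A hedged sketch of re-encoding the character columns before writing, assuming df is the frame from the question and its text is latin1:
# Convert every character column to UTF-8 before calling dbWriteTable
# (the source encoding "latin1" is an assumption; adjust to your data)
char_cols <- vapply(df, is.character, logical(1))
df[char_cols] <- lapply(df[char_cols], iconv, from = "latin1", to = "UTF-8")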

RPostgreSQL loading multiple CSV files into a PostgreSQL table

I'm new to PostgreSQL, and I'm having trouble populating a table I created with multiple *.csv files. I was working first in pgAdmin4, then decided to switch to RPostgreSQL, as R is my main language.
Anyway, I am dealing (for now) with 30 csv files located in one folder. All have the same headers and general structure, for instance:
Y:/Clickstream/test1/video-2016-04-01_PARSED.csv
Y:/Clickstream/test1/video-2016-04-02_PARSED.csv
Y:/Clickstream/test1/video-2016-04-03_PARSED.csv
... and so on.
I tried to load all the csv files following the RPostgreSQL-specific answer from Parfait. Sadly, it didn't work. My code is below:
library(RPostgreSQL)

dir <- list.dirs(path = "Y:/Clickstream/test1")
num <- length(dir)

psql.connection <- dbConnect(PostgreSQL(),
                             dbname = "coursera",
                             host = "127.0.0.1",
                             user = "postgres",
                             password = "xxxx")

for (d in dir) {
  filenames <- list.files(d)
  for (f in filenames) {
    csvfile <- paste0(d, '/', f)
    # IMPORT USING COPY COMMAND
    sql <- paste("COPY citl.courses FROM '", csvfile, "' DELIMITER ',' CSV ;")
    dbSendQuery(psql.connection, sql)
  }
}

# CLOSE CONNECTION
dbDisconnect(psql.connection)
I don't understand the error I got:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: could not open file
" Y:/Clickstream/test1/video-2016-04-01_PARSED.csv " for reading: Invalid
argument
)
If I'm understanding correctly, there is an invalid argument in the name of my first file. I'm not sure about that, but again, I have only recently started using PostgreSQL and RPostgreSQL in R. Any help will be much appreciated.
Thanks in advance!
Edit: I found the problem, but cannot solve it for some reason. When I print the SQL statement built inside the for loop:
# IMPORT USING COPY COMMAND
sql <- paste("COPY citl.courses FROM '", csvfile, "' DELIMITER ',' CSV ;")
I have the following result:
sql
[1] "COPY citl.courses FROM ' Y:/Clickstream/test1/video-2016-04-01_PARSED.csv ' DELIMITER ',' CSV ;"
This means that the invalid argument is the blank space padding the file path. I've tried to remove it, unsuccessfully. Any help will be deeply appreciated!
Try something like this:
Files <- list.files("Y:/Clickstream/test1", pattern = "*.csv", full.names = TRUE)
CSVs <- lapply(Files, read.csv)

psql.connection <- dbConnect(PostgreSQL(),
                             dbname = "coursera",
                             host = "127.0.0.1",
                             user = "postgres",
                             password = "xxxx")

for (i in 1:length(Files)) {
  dbWriteTable(psql.connection,
               c("citl", "courses"),  # schema and table
               CSVs[[i]],             # [[ ]] extracts the data frame itself
               append = TRUE,         # add rows to the bottom
               row.names = FALSE)
}
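If you would rather keep the server-side COPY approach from the question, note that paste() separates its arguments with a space by default, which is exactly where the padding around the path comes from. Building the statement with paste0() (or paste(..., sep = "")) avoids it:
# paste0() concatenates with no separator, so the path reaches
# the server without the stray leading/trailing spaces
sql <- paste0("COPY citl.courses FROM '", csvfile, "' DELIMITER ',' CSV;")
dbSendQuery(psql.connection, sql)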

R: ORA-01805 error while connecting to an Oracle database using R

I am connecting to an Oracle database server using R. When I try to fetch the data, I get the error:
Error in .oci.fetch(res, as.integer(n)) :
ORA-01805: possible error in date/time operation
I don't have an Oracle skill set; I connect to the database only to fetch data.
If you need any other information, I am ready to provide it.
Also, if you think I posted this question under an inappropriate tag, kindly point me to the right one. Thank you in advance.
EDIT: Posting some more information.
This is the code and table details (masked) that I use to connect to Oracle from R:
library(ROracle)  # provides the "Oracle" DBI driver

drv <- dbDriver("Oracle")
host <- "xyzdbqa"
port <- nnnnn
sid <- "abc1"
connect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "xyz", password = "xyz1", dbname = connect.string)
query.string <- "SELECT * FROM data_base_Table WHERE VALUE_DATE='10-dec-2015'"
print('Connection Established')
rs_testdata <- dbSendQuery(con, query.string)
print('Query Sent')
test_data <- fetch(rs_testdata)
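For what it's worth, ORA-01805 generally means the client and the database server disagree about time zone data. With ROracle, a commonly suggested workaround is to pin the session time zone through environment variables before connecting (the zone name "UTC" below is an assumed example; use whatever your server reports):
# Set the client-side time zone to match the server, before dbConnect()
# ("UTC" is an assumption here, not necessarily your server's zone)
Sys.setenv(TZ = "UTC")
Sys.setenv(ORA_SDTZ = "UTC")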

Update table in PostgreSQL database through R

How do I update data in a PostgreSQL db through R with new data?
I've tried
dbGetQuery(con, "UPDATE table SET column1=:1, column2=:2, column3=:3
                 where id=:4",
           data = Rdata[, c("column1", "column3", "column3", "id")])
I also tried with the colons replaced with $ but that didn't work either. I keep getting:
Error in postgresqlExecStatement(conn, statement, ...) :
unused argument(s)
I figured it out using:
update <- function(i) {
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="db_name", host="localhost", port="5432", user="chris", password="password")
txt <- paste("UPDATE data SET column_one=",data$column_one[i],",column_two=",data$column_two[i]," where id=",data$id[i])
dbGetQuery(con, txt)
dbDisconnect(con)
}
registerDoMC()
foreach(i = 1:length(data$column_one), .inorder=FALSE,.packages="RPostgreSQL")%dopar%{
update(i)
}
At least RODBC has a dedicated function for this, sqlUpdate:
sqlUpdate updates the table where the rows already exist. Data frame dat should contain columns with names that map to (some of) the columns in the table.
See http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
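A minimal sketch of how that call looks, assuming an existing table named data keyed by an id column (the DSN and credentials are hypothetical):
library(RODBC)

channel <- odbcConnect("my_dsn", uid = "chris", pwd = "password")  # hypothetical DSN
# Rows of Rdata are matched to existing table rows through the index column
sqlUpdate(channel, Rdata, tablename = "data", index = "id")
close(channel)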

How to insert a dataframe into a SQL Server table?

I'm trying to upload a data frame to a SQL Server table. I tried breaking it down into a simple SQL query string:
library(RODBC)
con <- odbcDriverConnect("driver=SQL Server; server=database")
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE)
...which seems to work but does not scale very well. Is there an easier way?
[edited] Perhaps pasting the names(df) would solve the scaling problem:
values <- paste(" df[ , c(",
                paste(names(df), collapse = ","),
                ")] ", collapse = "")
values
#[1] " df[ , c( a,b,c )] "
You say your code is "working"... I would have thought one would use sqlSave rather than sqlQuery for an "upload". I would have guessed this is more likely to do what you described:
sqlSave(con, df, tablename = "MyTable")
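If the table already exists and you only want to add rows, sqlSave's append and rownames arguments (per the RODBC documentation; by default it creates the table and writes row names as a column) are worth knowing:
sqlSave(con, df, tablename = "MyTable", append = TRUE, rownames = FALSE)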
This worked for me and I found it to be simpler.
library(DBI)   # dbConnect() and dbWriteTable() generics
library(odbc)

con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "ServerName",
                 Database = "DBName",
                 UID = "UserName",
                 PWD = "Password")

dbWriteTable(conn = con,
             name = "TableName",
             value = x)  ## x is any data frame
Since INSERT INTO ... VALUES is limited to 1000 rows per statement in SQL Server, you can use dbBulkCopy from the rsqlserver package.
dbBulkCopy is a DBI extension that interfaces with bcp, Microsoft SQL Server's popular command-line utility, to quickly bulk-copy large files into a table. For example:
url <- "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer', url = url)
# I assume the table already exists
dbBulkCopy(conn, name = 'T_BULKCOPY', value = df, overwrite = TRUE)
dbDisconnect(conn)
