I want tu insert this data frame to my table datatable...
library(RODBC)
conn <- odbcConnect("CData Azure Source")
sqlTables(conn)
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into datatable values ", values)
result <- sqlQuery(conn, cmd, as.is=FALSE)
print(result)
But I'm having this problem...
[1] "42000 -1 Malformed SQL Statement: Expected '('. Found: values\r\nStatement:insert into datatable values (1,10,11),(2,9,12),(3,8,13),(4,7,14),(5,6,15),(6,5,16),(7,4,17),(8,3,18),(9,2,19),(10,1,20)"
[2] "[RODBC] ERROR: Could not SQLExecDirect 'insert into datatable values (1,10,11),(2,9,12),(3,8,13),(4,7,14),(5,6,15),(6,5,16),(7,4,17),(8,3,18),(9,2,19),(10,1,20)'"
Can any anybody point where i'm wrong...
Related
As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.
I can make the following work:
# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")
# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")
The insert works. Next I try to make an INSERT of a CSV-file with the same data, that is separated by the default "tab" and I am working on windows.
# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)
# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")
Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)
Do you have any suggestions to what I am doing wrong, when trying to insert the csv-file? I do not get the Incorrect syntax near the keyword 'FROM' error?
What if you create a statement from your data? Something like:
# Data from your example
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
# Formatting rows to insert into SQL statement
rows <- apply(df, 1, function(x){paste0('"', x, '"', collapse = ', ')})
rows <- paste0('(', rows, ')')
# SQL statement
statement <- paste0(
"INSERT INTO cpr_esben.CPR000_Startrecord (",
paste0(colnames(df), collapse = ', '),
')',
' VALUES ',
paste0(rows, collapse = ', ')
)
dbSendQuery(ch, statement)
This should work for any number of rows in your df
RJDBC is built on DBI, which has many useful functions to do tasks like this. What you want is dbWriteTable. Syntax would be:
dbWriteTable(ch, 'cpr_esben.CPR000_Startrecord', df, append = TRUE)
and would replace your write.table line.
I am not that familiar with RJDBC specifically, but I think the issue with your sendQuery is that you are referencing test.csv inside your SQL statement, which does not locate the file that you created with write.table as the scope of that SQL statement is not in your working directory.
Have you tried loading the file directly to the database as below.
library(RJDBC)
drv <- JDBC("connections")
conn <- dbConnect(drv,"...")
query = "LOAD DATA INFILE 'test.csv' INTO TABLE test"
dbSendUpdate(conn, query)
You can also try to include other statements in the end like delimiter for column like "|" for .txt file and "," for csv file.
RJDBC connecting to Hive fine and also reading the data from Hive. But it is not writing data to Hive using --> dbWriteTable.
see below-
options(java.parameters = "-Xmx8g")
library(DBI)
library(rJava)
library(RJDBC)
cp <- c(list.files("/tmp/R_hive_libs/cloudera_hive_jars", pattern = "[.]jar", full.names=TRUE, recursive=TRUE),list.files("/tmp/R_hive_libs/R_hadoop_libs", pattern = "[.]jar", full.names=TRUE, recursive=TRUE),list.files("/tmp/R_hive_libs/R_hadoop_libs/lib", pattern = "[.]jar", full.names=TRUE, recursive=TRUE), recursive=TRUE)
drv <- JDBC(driverClass = "com.cloudera.hive.jdbc4.HS2Driver", classPath=cp)
conn <- dbConnect(drv, "jdbc:hive2://XXXXXX:10000/default", "user", "password")
show_databases <- dbGetQuery(conn, "show databases")
List_of_Tables <- dbListTables(conn)
data1 <- dbGetQuery(conn, "select * from XXX.xxx limit 10000")
data_to_write_back_to_hive <- data.frame(aggregate(data1$xxx.xxx, by=list(Month=data1$xxx.cmp_created_timestamp_month), FUN=sum))
data_to_write_back_to_hive[[2]] <-c(10,20)
colnames(data_to_write_back_to_hive) <- c("Month", "Energy")
dbWriteTable(conn, "xxxx.checking",data_to_write_back_to_hive)
How to write data back to hive? it is giving below error-
Error in .local(conn, statement, ...) : execute JDBC update query failed in dbSendUpdate ([Simba]HiveJDBCDriver ERROR
processing query/statement. Error Code: 40000, SQL state:
TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:36 mismatched
input 'PRECISION' expecting ) near 'DOUBLE' in create table
statement:28:27,
org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:102,
org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:171,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:268,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391,
sun.reflect.GeneratedMethodAccessor56:invoke::-1,
sun.reflect.DelegatingMeth
This question comes up a fair bit. I think the short answer is that you can't do what you want at present. The DBI/JDBC drivers don't metaprogram syntactically correct HiveQL.
I have an SQLite database connection to a database file. I want to extract some data from one of the tables, do some processing in R and then create a temporary table on the same connection from the processed data. It needs to be a temp table because users may not have write access to the database, but I want to be able to query this new data alongside the data already in the database.
so, for example:
require(sqldf)
db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
# do some processing in R:
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris;")
names(d) <- c("length_2", "width_2")
d <- exp(d)
and then I want to make a temporary table in the connection db from d
I know I could do:
dbWriteTable(conn=db, name= "iris_proc", value = d)
but I need it in a temp table and there doesn't seem to be an option for this in dbWriteTable.
One workaround I thought of was to add a temp table and then add columns and update them:
dbGetQuery(db, "CREATE TEMP TABLE iris_proc AS SELECT Species FROM iris;")
dbGetQuery(db, "ALTER TABLE iris_proc ADD COLUMN length_2;")
But then I can't get the data from d into the columns:
dbGetQuery(db, paste("UPDATE iris2 SET length_2 =", paste(d$length_2, collapse = ", "), ";"))
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "4.05519996684467": syntax error)
I imagine that, even if I get this to work, it will be horribly inefficient.
I thought there might have been some way to do this with read.csv.sql but this does not seem to work with open connection objects.
Use an in-memory database for the temporary table:
library(RSQLite)
db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris")
d <- exp(d)
dbGetQuery(db, "attach ':memory:' as mem")
dbWriteTable(db, "mem.d", d, row.names = FALSE) # d now in mem database
dbGetQuery(db, "select * from iris limit 3")
dbGetQuery(db, "select * from mem.d limit 3")
dbGetQuery(db, "select * from sqlite_master")
dbGetQuery(db, "select * from mem.sqlite_master")
How do I update data in a postgresql db through R with new data?
I've tried
dbGetQuery(con,"UPDATE table SET column1=:1,column2=:2, column3=:3
where id=:4", data=Rdata[,c("column1", "column3", "column3","id")])
I also tried with the colons replaced with $ but that didn't work either. I keep getting:
Error in postgresqlExecStatement(conn, statement, ...) :
unused argument(s)
I figured it out using:
update <- function(i) {
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="db_name", host="localhost", port="5432", user="chris", password="password")
txt <- paste("UPDATE data SET column_one=",data$column_one[i],",column_two=",data$column_two[i]," where id=",data$id[i])
dbGetQuery(con, txt)
dbDisconnect(con)
}
registerDoMC()
foreach(i = 1:length(data$column_one), .inorder=FALSE,.packages="RPostgreSQL")%dopar%{
update(i)
}
At least the RODBC has a specific function sqlUpdate:
sqlUpdate updates the table where the rows already exist. Data frame
dat should contain columns
with names that map to (some of) the columns in the table
See http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
I'm trying to upload a dataframe to a SQL Server table, I tried breaking it down to a simple SQL query string..
library(RODBC)
con <- odbcDriverConnect("driver=SQL Server; server=database")
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE)
..which seems to work but does not scale very well. Is there an easier way?
[edited] Perhaps pasting the names(df) would solve the scaling problem:
values <- paste( " df[ , c(",
paste( names(df),collapse=",") ,
")] ", collapse="" )
values
#[1] " df[ , c( a,b,c )] "
You say your code is "working".. I would also have thought one would use sqlSave rather than sqlQuery if one wanted to "upload".
I would have guessed this would be more likely to do what you described:
sqlSave(con, df, tablename = "MyTable")
This worked for me and I found it to be simpler.
library(sqldf)
library(odbc)
con <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "ServerName",
Database = "DBName",
UID = "UserName",
PWD = "Password")
dbWriteTable(conn = con,
name = "TableName",
value = x) ## x is any data frame
Since insert INTO is limited to 1000 rows, you can dbBulkCopy from rsqlserver package.
dbBulkCopy is a DBI extension that interfaces the Microsoft SQL Server popular command-line utility named bcp to quickly bulk copying large files into table. For example:
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## I assume the table already exist
dbBulkCopy(conn,name='T_BULKCOPY',value=df,overwrite=TRUE)
dbDisconnect(conn)