How do I update data in a postgresql db through R with new data?
I've tried
dbGetQuery(con,"UPDATE table SET column1=:1,column2=:2, column3=:3
where id=:4", data=Rdata[,c("column1", "column3", "column3","id")])
I also tried with the colons replaced with $ but that didn't work either. I keep getting:
Error in postgresqlExecStatement(conn, statement, ...) :
unused argument(s)
I figured it out using:
library(RPostgreSQL)
library(foreach)
library(doMC)

update <- function(i) {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = "db_name", host = "localhost",
                   port = "5432", user = "chris", password = "password")
  txt <- paste("UPDATE data SET column_one =", data$column_one[i],
               ", column_two =", data$column_two[i],
               "where id =", data$id[i])
  dbGetQuery(con, txt)
  dbDisconnect(con)
}

registerDoMC()
foreach(i = 1:length(data$column_one), .inorder = FALSE,
        .packages = "RPostgreSQL") %dopar% {
  update(i)
}
At least RODBC has a dedicated function for this, sqlUpdate:
sqlUpdate updates the table where the rows already exist. Data frame dat should contain columns with names that map to (some of) the columns in the table.
See http://cran.r-project.org/web/packages/RODBC/RODBC.pdf
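For what it's worth, a newer DBI back end such as RPostgres (my substitution, not from the original post) supports bound parameters, which avoids pasting values into the SQL and the quoting problems that come with it. A minimal sketch against the same hypothetical table and columns:

library(DBI)
library(RPostgres)  # assumed driver; the question used RPostgreSQL

con <- dbConnect(RPostgres::Postgres(), dbname = "db_name",
                 host = "localhost", port = 5432,
                 user = "chris", password = "password")

# $1..$3 are RPostgres placeholders; vectors in `params` are bound row by
# row, so the whole data frame is updated without pasting values into SQL.
dbExecute(con,
          "UPDATE data SET column_one = $1, column_two = $2 WHERE id = $3",
          params = list(data$column_one, data$column_two, data$id))

dbDisconnect(con)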
I have ~250 CSV files that I want to load into a SQLite database. I've loaded all of the CSVs into my global environment as data frames. I'm using the following code to copy all of them to the database, but I get Error: df must be local dataframe or a remote tbl_sql
library(DBI)
library(odbc)
library(rstudioapi)
library(tidyverse)
library(dbplyr)
library(RSQLite)
library(dm)
# Create DB Instance ---------------------------------------------
my_db <- dbConnect(RSQLite::SQLite(), "test_db.sqlite", create = TRUE)
# Load all csv files ---------------------------------------------
filenames <- list.files(pattern = ".*csv")
names <- substr(filenames, 1, nchar(filenames)-4)
for (i in names) {
filepath <- file.path(paste(i, ".csv", sep = ""))
assign(i, read.csv(filepath, sep = ","))
}
# Get list of data.frames ----------------------------------------
tables <- as.data.frame(sapply(mget(ls(), .GlobalEnv), is.data.frame))
colnames(tables) <- "is_data_frame"
tables <- tables %>%
filter(is_data_frame == "TRUE")
table_list <- row.names(tables)
# Copy dataframes to db ------------------------------------------
for (j in table_list) {
copy_to(my_db, j)
}
I have had mixed success using copy_to. I recommend the dbWriteTable command from the DBI package. Example code below:
DBI::dbWriteTable(
db_connection,
DBI::Id(
catalog = db_name,
schema = schema_name,
table = table_name
),
r_table_name
)
This would replace your copy_to command. You will need to provide a string to name the table; the database and schema names are likely optional and can be omitted for SQLite.
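For the loop in the question, note that the error comes from passing the table's name (a string) where a data frame is expected. A sketch against the question's own objects, using get() to look each data frame up by name:

# `table_list` holds data-frame names as strings; get() fetches the
# actual data frame so dbWriteTable receives data, not a name.
for (j in table_list) {
  DBI::dbWriteTable(my_db, name = j, value = get(j), overwrite = TRUE)
}

The same fix should work with copy_to itself: copy_to(my_db, get(j), name = j).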
I am trying to read Excel files using the odbcConnectExcel2007 function from the RODBC package. Reading an individual file works, but when I run it in a for loop it throws the following error
3 stop(sQuote(tablename), ": table not found on channel")
2 odbcTableExists(channel, sqtable)
1 sqlFetch(conn1, sqlTables(conn1)$TABLE_NAME[1])
Below is the code:-
file_list <- list.files("./Raw Data")
file_list
for (i in 1:length(file_list)) {
  conn1 <- odbcConnectExcel2007(paste0("./Raw Data/", file_list[i])) # open a connection to the Excel file
  sqlTables(conn1)$TABLE_NAME
  data <- sqlFetch(conn1, sqlTables(conn1)$TABLE_NAME[1])
  close(conn1)
  data <- data[, c("Branch", "Custome", "Category", "Sub Category", "SKU",
                   "Weight", "Order Type", "Invoice Date")]
  if (i == 1) alldata <- data else alldata <- rbind(alldata, data)
}
I would appreciate any kind of help.
Thanks in advance.
I think the problem is that the table name returned by sqlTables(conn1)$TABLE_NAME contains quotes. Try removing the quotes from the table name, something like this:
table <- sqlTables(conn1)$TABLE_NAME
table <- gsub("'", "", table)  # strip the single quotes, e.g. 'Sheet1$' -> Sheet1$
And then just do:
data <- sqlFetch(conn1, table)
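Untested, but folding that into the loop from the question might look like this (the column subsetting is dropped for brevity):

library(RODBC)

file_list <- list.files("./Raw Data")
for (i in 1:length(file_list)) {
  conn1 <- odbcConnectExcel2007(paste0("./Raw Data/", file_list[i]))
  # first sheet name, with the surrounding quotes stripped: 'Sheet1$' -> Sheet1$
  tab  <- gsub("'", "", sqlTables(conn1)$TABLE_NAME[1])
  data <- sqlFetch(conn1, tab)
  close(conn1)
  if (i == 1) alldata <- data else alldata <- rbind(alldata, data)
}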
RJDBC connects to Hive fine and also reads data from Hive, but it does not write data to Hive using dbWriteTable. See below:
options(java.parameters = "-Xmx8g")
library(DBI)
library(rJava)
library(RJDBC)
cp <- c(
  list.files("/tmp/R_hive_libs/cloudera_hive_jars", pattern = "[.]jar",
             full.names = TRUE, recursive = TRUE),
  list.files("/tmp/R_hive_libs/R_hadoop_libs", pattern = "[.]jar",
             full.names = TRUE, recursive = TRUE),
  list.files("/tmp/R_hive_libs/R_hadoop_libs/lib", pattern = "[.]jar",
             full.names = TRUE, recursive = TRUE)
)
drv <- JDBC(driverClass = "com.cloudera.hive.jdbc4.HS2Driver", classPath=cp)
conn <- dbConnect(drv, "jdbc:hive2://XXXXXX:10000/default", "user", "password")
show_databases <- dbGetQuery(conn, "show databases")
List_of_Tables <- dbListTables(conn)
data1 <- dbGetQuery(conn, "select * from XXX.xxx limit 10000")
data_to_write_back_to_hive <- data.frame(aggregate(data1$xxx.xxx, by=list(Month=data1$xxx.cmp_created_timestamp_month), FUN=sum))
data_to_write_back_to_hive[[2]] <-c(10,20)
colnames(data_to_write_back_to_hive) <- c("Month", "Energy")
dbWriteTable(conn, "xxxx.checking",data_to_write_back_to_hive)
How do I write data back to Hive? It gives the error below:
Error in .local(conn, statement, ...) : execute JDBC update query failed in dbSendUpdate ([Simba]HiveJDBCDriver ERROR
processing query/statement. Error Code: 40000, SQL state:
TStatus(statusCode:ERROR_STATUS,
infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:36 mismatched
input 'PRECISION' expecting ) near 'DOUBLE' in create table
statement:28:27,
org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326,
org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:102,
org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:171,
org.apache.hive.service.cli.operation.Operation:run:Operation.java:268,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410,
org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391,
sun.reflect.GeneratedMethodAccessor56:invoke::-1,
sun.reflect.DelegatingMeth
This question comes up a fair bit. I think the short answer is that you can't do what you want at present: the DBI/JDBC drivers don't generate syntactically valid HiveQL. As the ParseException shows, the auto-generated CREATE TABLE uses DOUBLE PRECISION, which Hive's parser rejects.
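One workaround I'd expect to help (a sketch, not tested against the poster's cluster): write the DDL by hand with Hive's own types so the driver never generates it, then insert with RJDBC's dbSendUpdate. This assumes a Hive version new enough (0.14+) to support INSERT INTO ... VALUES; the table and column names below are taken from the question.

# Hand-written DDL avoids the auto-generated "DOUBLE PRECISION" that
# Hive's parser rejects; Hive's numeric type is plain DOUBLE.
dbSendUpdate(conn, "CREATE TABLE xxxx.checking (Month STRING, Energy DOUBLE)")

# Insert row by row (fine for small frames; slow for large ones).
for (r in seq_len(nrow(data_to_write_back_to_hive))) {
  dbSendUpdate(conn, sprintf(
    "INSERT INTO TABLE xxxx.checking VALUES ('%s', %s)",
    data_to_write_back_to_hive$Month[r],
    data_to_write_back_to_hive$Energy[r]
  ))
}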
I have an SQLite database connection to a database file. I want to extract some data from one of the tables, do some processing in R and then create a temporary table on the same connection from the processed data. It needs to be a temp table because users may not have write access to the database, but I want to be able to query this new data alongside the data already in the database.
so, for example:
require(sqldf)
db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
# do some processing in R:
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris;")
names(d) <- c("length_2", "width_2")
d <- exp(d)
and then I want to make a temporary table in the connection db from d
I know I could do:
dbWriteTable(conn=db, name= "iris_proc", value = d)
but I need it in a temp table and there doesn't seem to be an option for this in dbWriteTable.
One workaround I thought of was to add a temp table and then add columns and update them:
dbGetQuery(db, "CREATE TEMP TABLE iris_proc AS SELECT Species FROM iris;")
dbGetQuery(db, "ALTER TABLE iris_proc ADD COLUMN length_2;")
But then I can't get the data from d into the columns:
dbGetQuery(db, paste("UPDATE iris2 SET length_2 =", paste(d$length_2, collapse = ", "), ";"))
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "4.05519996684467": syntax error)
I imagine that, even if I get this to work, it will be horribly inefficient.
I thought there might have been some way to do this with read.csv.sql but this does not seem to work with open connection objects.
Use an in-memory database for the temporary table:
library(RSQLite)
db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris")
d <- exp(d)
dbGetQuery(db, "attach ':memory:' as mem")
dbWriteTable(db, "mem.d", d, row.names = FALSE) # d now in mem database
dbGetQuery(db, "select * from iris limit 3")
dbGetQuery(db, "select * from mem.d limit 3")
dbGetQuery(db, "select * from sqlite_master")
dbGetQuery(db, "select * from mem.sqlite_master")
I'm trying to upload a data frame to a SQL Server table. I tried breaking it down to a simple SQL query string:
library(RODBC)
con <- odbcDriverConnect("driver=SQL Server; server=database")
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE)
...which seems to work but does not scale very well. Is there an easier way?
[edited] Perhaps pasting the names(df) would solve the scaling problem:
values <- paste( " df[ , c(",
paste( names(df),collapse=",") ,
")] ", collapse="" )
values
#[1] " df[ , c( a,b,c )] "
You say your code is "working", but I would have thought one would use sqlSave rather than sqlQuery if one wanted to "upload".
I would have guessed this would be more likely to do what you described:
sqlSave(con, df, tablename = "MyTable")
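If MyTable already exists, sqlSave will fail trying to create it unless you tell it to append; append and rownames are standard sqlSave arguments:

# Append to the existing table; rownames = FALSE stops sqlSave from
# adding a row-names column that the table doesn't have.
sqlSave(con, df, tablename = "MyTable", append = TRUE, rownames = FALSE)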
This worked for me and I found it to be simpler.
library(DBI)
library(odbc)
con <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "ServerName",
Database = "DBName",
UID = "UserName",
PWD = "Password")
dbWriteTable(conn = con,
name = "TableName",
value = x) ## x is any data frame
Since INSERT INTO ... VALUES is limited to 1000 rows per statement in SQL Server, you can use dbBulkCopy from the rsqlserver package.
dbBulkCopy is a DBI extension that wraps bcp, Microsoft SQL Server's command-line bulk-copy utility, to quickly copy large amounts of data into a table. For example:
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## I assume the table already exist
dbBulkCopy(conn,name='T_BULKCOPY',value=df,overwrite=TRUE)
dbDisconnect(conn)