R: Insert csv-file into database using RJDBC - r

As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.
I can make the following work:
# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")
# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")
The insert works. Next I try to make an INSERT of a CSV-file with the same data, that is separated by the default "tab" and I am working on windows.
# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)
# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")
Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)
Do you have any suggestions to what I am doing wrong, when trying to insert the csv-file? I do not get the Incorrect syntax near the keyword 'FROM' error?

What if you create a statement from your data? Something like:
# Data from your example
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
# Formatting rows to insert into SQL statement
rows <- apply(df, 1, function(x){paste0('"', x, '"', collapse = ', ')})
rows <- paste0('(', rows, ')')
# SQL statement
statement <- paste0(
"INSERT INTO cpr_esben.CPR000_Startrecord (",
paste0(colnames(df), collapse = ', '),
')',
' VALUES ',
paste0(rows, collapse = ', ')
)
dbSendQuery(ch, statement)
This should work for any number of rows in your df

RJDBC is built on DBI, which has many useful functions to do tasks like this. What you want is dbWriteTable. Syntax would be:
dbWriteTable(ch, 'cpr_esben.CPR000_Startrecord', df, append = TRUE)
and would replace your write.table line.
I am not that familiar with RJDBC specifically, but I think the issue with your sendQuery is that you are referencing test.csv inside your SQL statement, which does not locate the file that you created with write.table as the scope of that SQL statement is not in your working directory.

Have you tried loading the file directly to the database as below.
library(RJDBC)
drv <- JDBC("connections")
conn <- dbConnect(drv,"...")
query = "LOAD DATA INFILE 'test.csv' INTO TABLE test"
dbSendUpdate(conn, query)
You can also try to include other statements in the end like delimiter for column like "|" for .txt file and "," for csv file.

Related

Encoding problem when using R packages odbc and DBI [duplicate]

I'm trying to write Unicode strings from R to SQL, and then use that SQL table to power a Power BI dashboard. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or Power BI.
require(odbc)
require(DBI)
require(dplyr)
con <- DBI::dbConnect(odbc::odbc(),
.connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=R9-0KY02L01\\SQLEXPRESS;Database=Test;trusted_connection=yes;")
testData <- data_frame(Characters = "❤")
dbWriteTable(con,"TestUnicode",testData,overwrite=TRUE)
result <- dbReadTable(con, "TestUnicode")
result$Characters
Successfully yields:
> result$Characters
[1] "❤"
However, when I pull that table in SSMS:
SELECT * FROM TestUnicode
I get two different characters:
Characters
~~~~~~~~~~
â¤
Those characters are also what appear in Power BI. How do I correctly pull the heart character outside of R?
It turns out this is a bug somewhere in R/DBI/the ODBC driver. The issue is that R stores strings as UTF-8 encoded, while SQL Server stores them as UTF-16LE encoded. Also, when dbWriteTable creates a table, it by default creates a VARCHAR column for strings which can't even hold Unicode characters. Thus, you need to both:
Change the column in the R data frame from being a string column to a list column of UTF-16LE raw bytes.
When using dbWriteTable, specify the field type as being NVARCHAR(MAX)
This seems like something that should still be handled by either DBI or ODBC or something though.
require(odbc)
require(DBI)
# This function takes a string vector and turns it into a list of raw UTF-16LE bytes.
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
lapply(s, function(x) unlist(iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE)))
}
# create a connection to a sql table
connectionString <- "[YOUR CONNECTION STRING]"
con <- DBI::dbConnect(odbc::odbc(),
.connection_string = connectionString)
# our example data
testData <- data.frame(ID = c(1,2,3), Char = c("I", "❤","Apples"), stringsAsFactors=FALSE)
# we adjust the column with the UTF-8 strings to instead be a list column of UTF-16LE bytes
testData$Char <- convertToUTF16(testData$Char)
# write the table to the database, specifying the field type
dbWriteTable(con,
"UnicodeExample",
testData,
append=TRUE,
field.types = c(Char = "NVARCHAR(MAX)"))
dbDisconnect(con)
Inspired by last answer and github: r-dbi/DBI#215: Storing unicode characters in SQL Server
Following field.types = c(Char = "NVARCHAR(MAX)") but with vector and compute of max because of the error dbReadTable/dbGetQuery returns Invalid Descriptor Index .... :
vector_nvarchar<-c(Filter(Negate(is.null),
(
lapply(testData,function(x){
if (is.character(x) ) c(
names(x),
paste0("NVARCHAR(",
max(
# nvarchar(max) gave error dbReadTable/dbGetQuery returns Invalid Descriptor Index error on SQL server
# https://github.com/r-dbi/odbc/issues/112
# so we compute the max
nchar(
iconv( #nchar doesn't work for UTF-8 : help (nchar)
Filter(Negate(is.null),x)
,"UTF-8","ASCII",sub ="x"
)
)
,na.rm = TRUE)
,")"
)
)
})
)
))
con= DBI::dbConnect(odbc::odbc(),.connection_string=xxxxt, encoding = 'UTF-8')
DBI::dbWriteTable(con,"UnicodeExample",testData, overwrite= TRUE, append=FALSE, field.types= vector_nvarchar)
DBI::dbGetQuery(con,iconv('select * from UnicodeExample'))
Inspired by the last answer I also tried to find an automated way for writing data frames to SQL server. I can not confirm the nvarchar(max) errors, so I ended up with these functions:
convertToUTF16_df <- function(df){
output <- cbind(df[sapply(df, typeof) != "character"]
, list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
return(lapply(x, function(y) unlist(iconv(y, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))))
}))
)[colnames(df)]
return(output)
}
field_types <- function(df){
output <- list()
output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
return(output)
}
DBI::dbWriteTable(odbc_connect
, name = SQL("database.schema.table")
, value = convertToUTF16_df(df)
, overwrite = TRUE
, row.names = FALSE
, field.types = field_types(df)
)
I found the previous answer very useful but ran into problems with character vectors that had another encoding such as 'latin1' instead of UTF-8. This resulted in random NULLs in the database column due to special characters such as non-breaking spaces.
In order to avoid these encoding issues, I've made the following modifications to detect the character vector encoding or otherwise default back to UTF-8 before conversion to UTF-16LE:
library(rlist)
convertToUTF16_df <- function(df){
output <- cbind(df[sapply(df, typeof) != "character"]
, list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
return(lapply(x, function(y) {
if (Encoding(y)=="unknown") {
unlist(iconv(enc2utf8(y), from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))
} else {
unlist(iconv(y, from = Encoding(y), to = "UTF-16LE", toRaw = TRUE))
}
}))
}))
)[colnames(df)]
return(output)
}
field_types <- function(df){
output <- list()
output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
return(output)
}
DBI::dbWriteTable(odbc_connect
, name = SQL("database.schema.table")
, value = convertToUTF16_df(df)
, overwrite = TRUE
, row.names = FALSE
, field.types = field_types(df)
)
Ideally, I'd still modify this to remove the rlist dependency but it seems to work now.
You could consider using the package RODBC instead of odbc/DBI. I've have used RODBC with SQL Server and with Microsoft Access as permanent data storage system. I never had trouble with german umlaut (e.g. Ä, ä, ..., ß)
I wonder if using iconv is an appealing alternative as there seem to boe some '\X00' issues (e.g. https://www.r-bloggers.com/2010/06/more-powerful-iconv-in-r/)
I am posting this answer as an Extension to the top answer, because some people might find it useful.
If you need Unicode strings in SQL statements such as INSERT or UPDATE where you cannot use dbWriteTable(), you can constructing your query with dbBind() like this:
x <- "äöü"
x <- iconv(x, from="UTF-8", to="UTF-16LE", toRaw = TRUE)
q <-
"
update foobar
set umlauts = ?
where id = 1
")
query <- DBI::dbSendStatement(con, q)
DBI::dbBind(query, list(x))
DBI::dbClearResult(query)

Copy Multiple data frames to SQLite db in R

I have ~ 250 csv files I want to load into SQLite db. I've loaded all the csv into my global environment as data frames. I'm using the following function to copy all of them to db but get Error: df must be local dataframe or a remote tbl_sql
library(DBI)
library(odbc)
library(rstudioapi)
library(tidyverse)
library(dbplyr)
library(RSQLite)
library(dm)
# Create DB Instance ---------------------------------------------
my_db <- dbConnect(RSQLite::SQLite(), "test_db.sqlite", create = TRUE)
# Load all csv files ---------------------------------------------
filenames <- list.files(pattern = ".*csv")
names <- substr(filenames, 1, nchar(filenames)-4)
for (i in names) {
filepath <- file.path(paste(i, ".csv", sep = ""))
assign(i, read.csv(filepath, sep = ","))
}
# Get list of data.frames ----------------------------------------
tables <- as.data.frame(sapply(mget(ls(), .GlobalEnv), is.data.frame))
colnames(tables) <- "is_data_frame"
tables <- tables %>%
filter(is_data_frame == "TRUE")
table_list <- row.names(tables)
# Copy dataframes to db ------------------------------------------
for (j in table_list) {
copy_to(my_db, j)
}
I have had mixed success using copy_to. I recommend the dbWriteTable command from the DBI package. Example code below:
DBI::dbWriteTable(
db_connection,
DBI::Id(
catalog = db_name,
schema = schema_name,
table = table_name
),
r_table_name
)
This would replace your copy_to command. You will need to provide a string to name the table, but the database and schema names are likely optional and can probably be omitted.

Data truncated with sqlQuery in R with unixodbc from PostgreSQL

I'm storing data as very long character strings in a text field in PostgreSQL, but I'm hitting a limit when I retrieve the data. The table is as follows:
CREATE TABLE test
(
a integer,
b text
)
I insert data using R and RODBC with unixodbc configured with MaxLongVarcharSize=256000. As running the code below shows, the data is inserted into the table correctly with no truncation, but extracting the data with sqlQuery truncates the data at 65534 characters.
library(RODBC)
pg <- odbcConnect("pgScarabParallel")
test <- data.frame(a = 1:3,
b = c(
paste(rep("test", 10000), collapse = " "),
paste(rep("test", 15000), collapse = " "),
paste(rep("test", 20000), collapse = " ")
)
)
test$b <- as.character(test$b)
nchar(test$b)
sqlSave(pg, test, append = TRUE, rownames = FALSE)
sqlQuery(pg, "SELECT LENGTH(b) FROM test")[[1]]
test2 <- sqlQuery(pg, "SELECT * FROM test", stringsAsFactors = FALSE)
nchar(test2$b)
The inserted fields are 49999, 74999, and 99999 characters long, but when I query them they are truncated to 49999, 65534, and 65534 respectively.
Is there any way to avoid the truncation? Is there an easy way to find out if this is caused by odbc or R?

Escape single quote in R variables

I have a table with names in one column. I have an R script to read this table and then do a write.table to a CSV file for further processing. The script barfs when writing my table if it encounters a name with an apostrophe (single quote) character such as "O'Reilly" in the matrix
library(RCurl)
library(RJSONIO)
dir <- "C:/Users/rob/Data"
setwd(dir)
filename <- "employees.csv"
url <- "https://obscured/employees.html"
html <- getURL(url, ssl.verifypeer = FALSE)
initdata <- gsub("^.*?emp.allEployeeData = (.*?);.*", "\\1", html)
initdata <- gsub("'", '"', initdata)
data <- fromJSON( initdata )
table <- list()
for(i in seq_along(data))
{
job <- data[[i]][[1]]
name <- data[[i]][[2]]
age <- data[[i]][[6]]
sex <- data[[i]][[7]]
m <- matrix(nrow = 1, ncol = 4)
colnames(m) <- c("job", "name", "age", "sex")
m[1, ] <- c(job, name, age, sex)
table[[i]] <- as.data.frame(m)
write.table(table[[i]],file = filename,append = TRUE,sep = ",",col.names = FALSE,row.names = FALSE)
}
When I encounter O'Reilly, the error I am receiving is:
Error in m[1, ] <- c(job, name, age, sex) :
number of items to replace is not a multiple of replacement length
I end up with a csv file that includes data for all employees before O'Reilly is encountered. My Googling revealed people trying to add quotes to strings or parse strings already containing escape characters.
Is there a way to escape or remove single quotes inside my data?
I was replacing single quotes with double quotes in line 11, which I don't need to do in this data set. So it wasn't a single quote in a name messing things up, it was replacing that single quote with a double messing things up.
Removed this line:
initdata <- gsub("'", '"', initdata)

How to insert a dataframe into a SQL Server table?

I'm trying to upload a dataframe to a SQL Server table, I tried breaking it down to a simple SQL query string..
library(RODBC)
con <- odbcDriverConnect("driver=SQL Server; server=database")
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE)
..which seems to work but does not scale very well. Is there an easier way?
[edited] Perhaps pasting the names(df) would solve the scaling problem:
values <- paste( " df[ , c(",
paste( names(df),collapse=",") ,
")] ", collapse="" )
values
#[1] " df[ , c( a,b,c )] "
You say your code is "working".. I would also have thought one would use sqlSave rather than sqlQuery if one wanted to "upload".
I would have guessed this would be more likely to do what you described:
sqlSave(con, df, tablename = "MyTable")
This worked for me and I found it to be simpler.
library(sqldf)
library(odbc)
con <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "ServerName",
Database = "DBName",
UID = "UserName",
PWD = "Password")
dbWriteTable(conn = con,
name = "TableName",
value = x) ## x is any data frame
Since insert INTO is limited to 1000 rows, you can dbBulkCopy from rsqlserver package.
dbBulkCopy is a DBI extension that interfaces the Microsoft SQL Server popular command-line utility named bcp to quickly bulk copying large files into table. For example:
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## I assume the table already exist
dbBulkCopy(conn,name='T_BULKCOPY',value=df,overwrite=TRUE)
dbDisconnect(conn)

Resources