Data truncated with sqlQuery in R with unixodbc from PostgreSQL

I'm storing data as very long character strings in a text field in PostgreSQL, but I'm hitting a limit when I retrieve the data. The table is as follows:
CREATE TABLE test
(
  a integer,
  b text
)
I insert data using R and RODBC with unixodbc configured with MaxLongVarcharSize=256000. As running the code below shows, the data is inserted into the table correctly with no truncation, but extracting the data with sqlQuery truncates the data at 65534 characters.
library(RODBC)
pg <- odbcConnect("pgScarabParallel")

test <- data.frame(
  a = 1:3,
  b = c(
    paste(rep("test", 10000), collapse = " "),
    paste(rep("test", 15000), collapse = " "),
    paste(rep("test", 20000), collapse = " ")
  )
)
test$b <- as.character(test$b)
nchar(test$b)

sqlSave(pg, test, append = TRUE, rownames = FALSE)
sqlQuery(pg, "SELECT LENGTH(b) FROM test")[[1]]

test2 <- sqlQuery(pg, "SELECT * FROM test", stringsAsFactors = FALSE)
nchar(test2$b)
The inserted fields are 49999, 74999, and 99999 characters long, but when I query them they are truncated to 49999, 65534, and 65534 respectively.
Is there any way to avoid the truncation? Is there an easy way to find out if this is caused by odbc or R?
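Two things worth trying (not from the original question, just suggestions): first, read the same table with a non-ODBC driver such as RPostgreSQL/RPostgres; if the full strings come back, the truncation is happening in the ODBC layer rather than in R. Second, as a workaround that stays on RODBC, you can pull the text column in chunks with substr() on the PostgreSQL side and paste the pieces back together in R. A minimal sketch, reusing the DSN and test table from above (the 60000-character chunk size is an arbitrary choice below the observed limit):
library(RODBC)
pg <- odbcConnect("pgScarabParallel")

# Pull the text column back in fixed-size pieces and reassemble in R.
chunk_size <- 60000L
max_len    <- sqlQuery(pg, "SELECT MAX(LENGTH(b)) FROM test")[[1]]
n_chunks   <- ceiling(max_len / chunk_size)

chunks <- lapply(seq_len(n_chunks), function(i) {
  offset <- (i - 1L) * chunk_size + 1L
  sqlQuery(pg,
           sprintf("SELECT SUBSTR(b, %d, %d) AS b FROM test ORDER BY a",
                   offset, chunk_size),
           stringsAsFactors = FALSE)$b
})

b_full <- do.call(paste0, chunks)
nchar(b_full)   # should now report 49999, 74999, 99999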

Related

Encoding problem when using R packages odbc and DBI

I'm trying to write Unicode strings from R to SQL, and then use that SQL table to power a Power BI dashboard. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or Power BI.
require(odbc)
require(DBI)
require(dplyr)

con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=R9-0KY02L01\\SQLEXPRESS;Database=Test;trusted_connection=yes;")

testData <- data_frame(Characters = "❤")
dbWriteTable(con, "TestUnicode", testData, overwrite = TRUE)
result <- dbReadTable(con, "TestUnicode")
result$Characters
Successfully yields:
> result$Characters
[1] "❤"
However, when I pull that table in SSMS:
SELECT * FROM TestUnicode
I get two different characters:
Characters
~~~~~~~~~~
â¤
Those characters are also what appear in Power BI. How do I correctly pull the heart character outside of R?
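A quick diagnostic (my addition, not part of the question) is to look at the bytes SQL Server actually stored; if the column came out as VARCHAR, you will see the three UTF-8 bytes of the heart character rather than its UTF-16 code unit:
# Inspect the stored value and its raw bytes. A VARCHAR column holding the
# UTF-8 bytes 0xE2 0x9D 0xA4 is what SSMS renders roughly as "â¤" under a
# Windows-1252 collation.
DBI::dbGetQuery(con, "SELECT Characters,
                             CAST(Characters AS VARBINARY(MAX)) AS RawBytes
                      FROM TestUnicode")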
It turns out this is a bug somewhere in R/DBI/the ODBC driver. The issue is that R stores strings as UTF-8, while SQL Server stores them as UTF-16LE. Also, when dbWriteTable creates a table, it creates a VARCHAR column for strings by default, which cannot hold arbitrary Unicode characters. Thus, you need to do both of the following:
1) Change the column in the R data frame from a string column to a list column of UTF-16LE raw bytes.
2) When using dbWriteTable, specify the field type as NVARCHAR(MAX).
This seems like something that should still be handled by either DBI or ODBC or something though.
require(odbc)
require(DBI)

# This function takes a string vector and turns it into a list of raw UTF-16LE bytes.
# These will be needed to load into SQL Server.
convertToUTF16 <- function(s){
  lapply(s, function(x) unlist(iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)))
}

# create a connection to a SQL table
connectionString <- "[YOUR CONNECTION STRING]"
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = connectionString)

# our example data
testData <- data.frame(ID = c(1, 2, 3), Char = c("I", "❤", "Apples"), stringsAsFactors = FALSE)

# we adjust the column with the UTF-8 strings to instead be a list column of UTF-16LE bytes
testData$Char <- convertToUTF16(testData$Char)

# write the table to the database, specifying the field type
dbWriteTable(con,
             "UnicodeExample",
             testData,
             append = TRUE,
             field.types = c(Char = "NVARCHAR(MAX)"))

dbDisconnect(con)
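As a quick round-trip check (my addition; run it before the dbDisconnect(con) call above), you can read the row back. Selecting the NVARCHAR(MAX) column last avoids the "Invalid Descriptor Index" issue mentioned in the next answer, and the odbc driver converts the value back to a UTF-8 R string:
# Sketch: read the row back with the long NVARCHAR(MAX) column selected last.
check <- DBI::dbGetQuery(con, "SELECT ID, Char FROM UnicodeExample")
print(check$Char)   # should print the heart character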
Inspired by the last answer and by the GitHub issue r-dbi/DBI#215: Storing unicode characters in SQL Server.
This follows field.types = c(Char = "NVARCHAR(MAX)"), but builds a vector of field types and computes the maximum length per column, because NVARCHAR(MAX) produced the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index ....":
# Build a named vector of field types: every character column of testData gets
# NVARCHAR(n), where n is the longest value in that column.
# nvarchar(max) gave the "dbReadTable/dbGetQuery returns Invalid Descriptor Index"
# error on SQL Server (https://github.com/r-dbi/odbc/issues/112),
# so we compute the max instead.
char_cols <- names(testData)[sapply(testData, is.character)]
vector_nvarchar <- sapply(char_cols, function(col) {
  # nchar doesn't work for UTF-8 (see help(nchar)), so transliterate to ASCII first
  lens <- nchar(iconv(testData[[col]], "UTF-8", "ASCII", sub = "x"))
  paste0("NVARCHAR(", max(lens, na.rm = TRUE), ")")
})

con <- DBI::dbConnect(odbc::odbc(), .connection_string = xxxxt, encoding = 'UTF-8')
DBI::dbWriteTable(con, "UnicodeExample", testData,
                  overwrite = TRUE, append = FALSE, field.types = vector_nvarchar)
DBI::dbGetQuery(con, 'select * from UnicodeExample')
Inspired by the last answer, I also tried to find an automated way of writing data frames to SQL Server. I cannot confirm the nvarchar(max) errors, so I ended up with these functions:
library(DBI)
library(rlist)   # provides list.cbind()

convertToUTF16_df <- function(df){
  output <- cbind(df[sapply(df, typeof) != "character"],
                  list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
                    return(lapply(x, function(y) unlist(iconv(y, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))))
                  }))
  )[colnames(df)]
  return(output)
}

field_types <- function(df){
  output <- list()
  output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
  return(output)
}

DBI::dbWriteTable(odbc_connect,
                  name = SQL("database.schema.table"),
                  value = convertToUTF16_df(df),
                  overwrite = TRUE,
                  row.names = FALSE,
                  field.types = field_types(df))
I found the previous answer very useful but ran into problems with character vectors that had another encoding such as 'latin1' instead of UTF-8. This resulted in random NULLs in the database column due to special characters such as non-breaking spaces.
In order to avoid these encoding issues, I've made the following modifications to detect the character vector encoding or otherwise default back to UTF-8 before conversion to UTF-16LE:
library(DBI)
library(rlist)   # provides list.cbind()

convertToUTF16_df <- function(df){
  output <- cbind(df[sapply(df, typeof) != "character"],
                  list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
                    return(lapply(x, function(y) {
                      if (Encoding(y) == "unknown") {
                        unlist(iconv(enc2utf8(y), from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))
                      } else {
                        unlist(iconv(y, from = Encoding(y), to = "UTF-16LE", toRaw = TRUE))
                      }
                    }))
                  }))
  )[colnames(df)]
  return(output)
}

field_types <- function(df){
  output <- list()
  output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
  return(output)
}

DBI::dbWriteTable(odbc_connect,
                  name = SQL("database.schema.table"),
                  value = convertToUTF16_df(df),
                  overwrite = TRUE,
                  row.names = FALSE,
                  field.types = field_types(df))
Ideally, I'd still modify this to remove the rlist dependency but it seems to work now.
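For reference, a minimal base-R sketch of the same conversion without the rlist dependency (my own variant, not part of the answer above): it replaces each character column with a list column of UTF-16LE raw vectors, using the declared encoding where there is one and falling back to UTF-8 otherwise.
convertToUTF16_df_base <- function(df){
  char_cols <- names(df)[sapply(df, typeof) == "character"]
  for (col in char_cols) {
    # turn the column into a list column of raw UTF-16LE vectors
    df[[col]] <- lapply(df[[col]], function(y) {
      if (Encoding(y) == "unknown") {
        unlist(iconv(enc2utf8(y), from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))
      } else {
        unlist(iconv(y, from = Encoding(y), to = "UTF-16LE", toRaw = TRUE))
      }
    })
  }
  return(df)
}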
You could consider using the package RODBC instead of odbc/DBI. I have used RODBC with SQL Server and with Microsoft Access as permanent data storage systems and never had trouble with German umlauts (e.g. Ä, ä, ..., ß).
I wonder if using iconv is an appealing alternative, as there seem to be some '\X00' issues (e.g. https://www.r-bloggers.com/2010/06/more-powerful-iconv-in-r/).
I am posting this answer as an extension to the top answer, because some people might find it useful.
If you need Unicode strings in SQL statements such as INSERT or UPDATE where you cannot use dbWriteTable(), you can construct your query with dbBind() like this:
x <- "äöü"
x <- iconv(x, from="UTF-8", to="UTF-16LE", toRaw = TRUE)
q <-
"
update foobar
set umlauts = ?
where id = 1
")
query <- DBI::dbSendStatement(con, q)
DBI::dbBind(query, list(x))
DBI::dbClearResult(query)
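To verify the update (my addition; this assumes the umlauts column is an NVARCHAR type and that con is still open), read the row back; the odbc driver converts it back to a UTF-8 string:
# Should return a one-row data frame with umlauts == "äöü"
DBI::dbGetQuery(con, "select umlauts from foobar where id = 1")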

R: Insert csv-file into database using RJDBC

As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.
I can make the following work:
# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")
# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")
The insert works. Next, I try to INSERT a CSV file with the same data, separated by the default "tab" delimiter; I am working on Windows.
# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)
# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")
Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)
Do you have any suggestions as to what I am doing wrong when trying to insert the CSV file? I do not understand the "Incorrect syntax near the keyword 'FROM'" error.
What if you create a statement from your data? Something like:
# Data from your example
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
# Format rows for the SQL statement; string literals in SQL use single quotes
rows <- apply(df, 1, function(x){ paste0("'", x, "'", collapse = ", ") })
rows <- paste0('(', rows, ')')

# SQL statement
statement <- paste0(
  "INSERT INTO cpr_esben.CPR000_Startrecord (",
  paste0(colnames(df), collapse = ', '),
  ')',
  ' VALUES ',
  paste0(rows, collapse = ', ')
)

dbSendQuery(ch, statement)
This should work for any number of rows in your df
RJDBC is built on DBI, which has many useful functions to do tasks like this. What you want is dbWriteTable. Syntax would be:
dbWriteTable(ch, 'cpr_esben.CPR000_Startrecord', df, append = TRUE)
and would replace your write.table line.
I am not that familiar with RJDBC specifically, but I think the issue with your dbSendQuery call is that you reference test.csv inside the SQL statement: the statement is executed by the database server, so it cannot see the file you created with write.table in your R working directory.
Have you tried loading the file directly into the database, as below?
library(RJDBC)
drv <- JDBC("connections")
conn <- dbConnect(drv,"...")
query = "LOAD DATA INFILE 'test.csv' INTO TABLE test"
dbSendUpdate(conn, query)
You can also append other clauses at the end, such as the column delimiter: "|" for a .txt file or "," for a CSV file.
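Note that LOAD DATA INFILE is MySQL syntax. For the SQL Server target in this question, the server-side equivalent would be BULK INSERT; here is a sketch with an assumed file path (the path must be readable by the SQL Server service, not by the R session):
# Sketch: BULK INSERT is SQL Server's counterpart to MySQL's LOAD DATA INFILE.
# 'C:\\data\\test.csv' is a placeholder path on the database server; adjust the
# terminators to match how the file was actually written.
query <- paste("BULK INSERT cpr_esben.CPR000_Startrecord",
               "FROM 'C:\\data\\test.csv'",
               "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n')")
dbSendUpdate(conn, query)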

ROracle dbWriteTable affirms insertion that didn't happen

My goal is to create a table in the database and fill it with data afterwards. This is my code:
library(ROracle)
# ... "con" is the connection string, created in an earlier stage!
# 1 create example
testdata <- data.frame(A = c(1,2,3), B = c(4,5,6))
# 2 create-statement
createTable <- paste0("CREATE TABLE TestTable(", paste(paste(colnames(testdata), c("integer", "integer")), collapse = ","), ")")
# 3 send and execute query
dbGetQuery(con, createTable)
# 4 write example data
dbWriteTable(con, "TestTable", testdata, row.names = TRUE, append = TRUE)
I already succeeded a few times: the table was created and filled.
Now step 4 doesn't work anymore, even though R returns TRUE after dbWriteTable executes. The table stays empty.
I know this is a vague question, but does anyone have an idea what could be wrong here?
I found the solution for my problem. After creating the table in step 3, you have to commit! After that, the data is written into the table.
library(ROracle)
# ... "con" is the connection string, created in an earlier stage!
# 1 create example
testdata <- data.frame(A = c(1,2,3), B = c(4,5,6))
# 2 create-statement
createTable <- paste0("CREATE TABLE TestTable(", paste(paste(colnames(testdata), c("integer", "integer")), collapse = ","), ")")
# 3 send and execute query
dbGetQuery(con, createTable)
# NEW LINE: COMMIT!
dbCommit(con)
# 4 write example data
dbWriteTable(con, "TestTable", testdata, row.names = TRUE, append = TRUE)

Connecting R To Teradata VOLATILE TABLE

I am using R to try to connect to a Teradata database and am running into difficulties.
The steps in the process are below
1) Create Connection
2) Create a VOLATILE TABLE
3) Load information from a data frame into the Volatile table
Here is where it fails, giving me an error message
Error in sqlSave(conn, mydata, tablename = "TEMP", rownames = FALSE, :
first argument is not an open RODBC channel
The code is below
# Import Data From Text File and remove duplicates
mydata = read.table("Keys.txt")
mydata.unique = unique(mydata)
strSQL.TempTable = "CREATE VOLATILE TABLE TEMP………[Table Details]"
"UNIQUE PRIMARY INDEX(index)"
"ON COMMIT PRESERVE ROWS;"
# Connect To Database
conn <- tdConnect('Teradata')
# Execute Temp Table
tdQuery(strSQL.TempTable)
sqlSave(conn, mydata, tablename = "TEMP ",rownames = FALSE, append = TRUE)
Can anyone help, Is it closing off the connection before I can upload the information into the Table?
My mistake: I have been confusing libraries.
Basically the lines
# Connect To Database
conn <- tdConnect('Teradata')
# Execute Temp Table
tdQuery(strSQL.TempTable)
sqlSave(conn, mydata, tablename = "TEMP ",rownames = FALSE, append = TRUE)
can all be replaced by this
# Connect To Database
channel <- odbcConnect('Teradata')
# Execute Temp Table
sqlQuery(channel, paste(strSQL.TempTable))
sqlSave(channel, mydata, table = "TEMP",rownames = FALSE, append = TRUE)
Now I'm being told I don't have access to do this, but that is another question for another forum.
Thanks

How to insert a dataframe into a SQL Server table?

I'm trying to upload a data frame to a SQL Server table. I tried breaking it down to a simple SQL query string...
library(RODBC)
con <- odbcDriverConnect("driver=SQL Server; server=database")
df <- data.frame(a=1:10, b=10:1, c=11:20)
values <- paste("(",df$a,",", df$b,",",df$c,")", sep="", collapse=",")
cmd <- paste("insert into MyTable values ", values)
result <- sqlQuery(con, cmd, as.is=TRUE)
...which seems to work but does not scale very well. Is there an easier way?
[edited] Perhaps pasting the names(df) would solve the scaling problem:
values <- paste( " df[ , c(",
paste( names(df),collapse=",") ,
")] ", collapse="" )
values
#[1] " df[ , c( a,b,c )] "
You say your code is "working".. I would also have thought one would use sqlSave rather than sqlQuery if one wanted to "upload".
I would have guessed this would be more likely to do what you described:
sqlSave(con, df, tablename = "MyTable")
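If MyTable already exists, the same RODBC options used in the first question above apply (a small sketch of my own):
# Append to an existing table and skip writing R row names as a column.
sqlSave(con, df, tablename = "MyTable", rownames = FALSE, append = TRUE)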
This worked for me and I found it to be simpler.
library(DBI)
library(odbc)

con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "ServerName",
                 Database = "DBName",
                 UID = "UserName",
                 PWD = "Password")

dbWriteTable(conn = con,
             name = "TableName",
             value = x)   ## x is any data frame
Since INSERT INTO ... VALUES is limited to 1000 rows per statement in SQL Server, you can use dbBulkCopy from the rsqlserver package.
dbBulkCopy is a DBI extension that interfaces with bcp, the popular Microsoft SQL Server command-line utility, to quickly bulk copy large files into a table. For example:
url = "Server=localhost;Database=TEST_RSQLSERVER;Trusted_Connection=True;"
conn <- dbConnect('SqlServer',url=url)
## I assume the table already exist
dbBulkCopy(conn,name='T_BULKCOPY',value=df,overwrite=TRUE)
dbDisconnect(conn)
