Parameterized SQL query in R not working - r

I have a query in R that extracts data from SQL Server that takes column values as command line arguments. But it does not result in any output.
library(RODBC)
argv <- commandArgs(TRUE)
dbhandle <- odbcDriverConnect('driver={SQL Server};server=<srvr_nm>; database=<db>; trusted_connection=true')
res <- sqlQuery(dbhandle, 'select * from table where col = \'argv[1]\'')
This is how I am calling it
C:\Users\uid>"C:\Program Files\R\R-3.1.0\bin\x64\Rscript.exe" --slave --vanilla "c:\R\script.R" "abc"
(Removing the quotation marks from the command line argument does not help either.)
The output that I get is:
<0 rows> (or 0-length row.names)
When I checked what was being passed, the value had quotation marks, e.g. "abc", whereas the value stored in the table is abc (without quotation marks). I tried to remove the quotation marks with
as.name(argv[1])
but that did not help either.
Then I inserted a value into the table with quotation marks, i.e. "abc" instead of abc, but it is still not selected.
Can you help me with the query?

Try fn$ in the gsubfn package:
library(gsubfn)
argv <- gsub('"', '', commandArgs(TRUE)) # remove double quotes
# ...
res <- fn$sqlQuery(dbhandle, "select * from table where col = '`argv[1]`' ")
or replace last line with:
argv1 <- argv[1]
res <- fn$sqlQuery(dbhandle, "select * from table where col = '$argv1' ")

Unless sqlQuery has some special internal query parsing, the string 'argv[1]' will not evaluate to the value of argv[1] but remain unparsed.
Try any of these instead
res <- sqlQuery(dbhandle,
paste('select * from table where col = \'', argv[1], '\'', sep=""))
res <- sqlQuery(dbhandle,
sprintf('select * from table where col = \'%s\'', argv[1]))
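Both variants splice the value straight into the SQL text, so a value that itself contains a single quote would break the statement. The following is a minimal sketch of one way to guard against that, assuming the value should always be treated as a string literal. `build_query` is a hypothetical helper name, not part of RODBC, and the table and column names are the placeholders from the question.

```r
# Hypothetical helper (not part of RODBC): build the query text from one
# command-line value.
build_query <- function(value) {
  value <- gsub('"', "", value, fixed = TRUE)    # drop stray double quotes
  value <- gsub("'", "''", value, fixed = TRUE)  # double embedded single quotes
  sprintf("select * from table where col = '%s'", value)
}

print(build_query("abc"))
# [1] "select * from table where col = 'abc'"

# With a live connection this would be used as:
# res <- sqlQuery(dbhandle, build_query(argv[1]))
```

Printing the generated string before sending it is a quick way to confirm that the value, and not the text argv[1], ended up in the query.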

Related

Encoding problem when using R packages odbc and DBI [duplicate]

I'm trying to write Unicode strings from R to SQL, and then use that SQL table to power a Power BI dashboard. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or Power BI.
require(odbc)
require(DBI)
require(dplyr)
con <- DBI::dbConnect(odbc::odbc(),
.connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=R9-0KY02L01\\SQLEXPRESS;Database=Test;trusted_connection=yes;")
testData <- data_frame(Characters = "❤")
dbWriteTable(con,"TestUnicode",testData,overwrite=TRUE)
result <- dbReadTable(con, "TestUnicode")
result$Characters
Successfully yields:
> result$Characters
[1] "❤"
However, when I pull that table in SSMS:
SELECT * FROM TestUnicode
I get two different characters:
Characters
~~~~~~~~~~
â¤
Those characters are also what appear in Power BI. How do I correctly pull the heart character outside of R?
It turns out this is a bug somewhere in R/DBI/the ODBC driver. The issue is that R stores strings as UTF-8 encoded, while SQL Server stores them as UTF-16LE encoded. Also, when dbWriteTable creates a table, it by default creates a VARCHAR column for strings which can't even hold Unicode characters. Thus, you need to both:
Change the column in the R data frame from being a string column to a list column of UTF-16LE raw bytes.
When using dbWriteTable, specify the field type as being NVARCHAR(MAX)
This seems like something that should still be handled by either DBI or ODBC or something though.
require(odbc)
require(DBI)
# This function takes a string vector and turns it into a list of raw UTF-16LE bytes.
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
lapply(s, function(x) unlist(iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE)))
}
# create a connection to a sql table
connectionString <- "[YOUR CONNECTION STRING]"
con <- DBI::dbConnect(odbc::odbc(),
.connection_string = connectionString)
# our example data
testData <- data.frame(ID = c(1,2,3), Char = c("I", "❤","Apples"), stringsAsFactors=FALSE)
# we adjust the column with the UTF-8 strings to instead be a list column of UTF-16LE bytes
testData$Char <- convertToUTF16(testData$Char)
# write the table to the database, specifying the field type
dbWriteTable(con,
"UnicodeExample",
testData,
append=TRUE,
field.types = c(Char = "NVARCHAR(MAX)"))
dbDisconnect(con)
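To make the encoding difference concrete, this small base R sketch prints the bytes of the heart character in both encodings. It only illustrates the conversion and does not touch the database. The interpretation in the note after the code is an assumption about how the client displays the bytes, not something the original answer confirms.

```r
heart <- "\u2764"  # the character from the question

# Bytes R holds for the string (UTF-8) ...
utf8_bytes <- charToRaw(enc2utf8(heart))
# ... and the bytes produced by the UTF-16LE conversion used in the answer.
utf16_bytes <- iconv(heart, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

print(utf8_bytes)   # e2 9d a4
print(utf16_bytes)  # 64 27
```

If the three UTF-8 bytes are stored in a VARCHAR column and read back as a single-byte code page such as Windows-1252, E2 displays as â and A4 as ¤, which would be consistent with the two characters seen in SSMS.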
Inspired by last answer and github: r-dbi/DBI#215: Storing unicode characters in SQL Server
This follows the field.types = c(Char = "NVARCHAR(MAX)") approach, but builds a named vector of types and computes the maximum length per column, because NVARCHAR(MAX) caused the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index ....":
# Build a named vector of NVARCHAR(n) types, one entry per character column.
# nvarchar(max) gave the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index" on SQL Server
# https://github.com/r-dbi/odbc/issues/112
# so we compute the max length instead
char_cols <- names(testData)[vapply(testData, is.character, logical(1))]
vector_nvarchar <- vapply(testData[char_cols], function(x) {
  # nchar doesn't work reliably for UTF-8 (see help(nchar)), so non-ASCII bytes are replaced by "x" first
  widths <- nchar(iconv(x, "UTF-8", "ASCII", sub = "x"))
  paste0("NVARCHAR(", max(widths, na.rm = TRUE), ")")
}, character(1))
con= DBI::dbConnect(odbc::odbc(),.connection_string=xxxxt, encoding = 'UTF-8')
DBI::dbWriteTable(con,"UnicodeExample",testData, overwrite= TRUE, append=FALSE, field.types= vector_nvarchar)
DBI::dbGetQuery(con,iconv('select * from UnicodeExample'))
Inspired by the last answer, I also tried to find an automated way of writing data frames to SQL Server. I cannot confirm the nvarchar(max) errors, so I ended up with these functions:
library(rlist) # provides list.cbind()
convertToUTF16_df <- function(df){
output <- cbind(df[sapply(df, typeof) != "character"]
, list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
return(lapply(x, function(y) unlist(iconv(y, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))))
}))
)[colnames(df)]
return(output)
}
field_types <- function(df){
output <- list()
output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
return(output)
}
DBI::dbWriteTable(odbc_connect
, name = SQL("database.schema.table")
, value = convertToUTF16_df(df)
, overwrite = TRUE
, row.names = FALSE
, field.types = field_types(df)
)
I found the previous answer very useful but ran into problems with character vectors that had another encoding such as 'latin1' instead of UTF-8. This resulted in random NULLs in the database column due to special characters such as non-breaking spaces.
In order to avoid these encoding issues, I've made the following modifications to detect the character vector encoding or otherwise default back to UTF-8 before conversion to UTF-16LE:
library(rlist)
convertToUTF16_df <- function(df){
output <- cbind(df[sapply(df, typeof) != "character"]
, list.cbind(apply(df[sapply(df, typeof) == "character"], 2, function(x){
return(lapply(x, function(y) {
if (Encoding(y)=="unknown") {
unlist(iconv(enc2utf8(y), from = "UTF-8", to = "UTF-16LE", toRaw = TRUE))
} else {
unlist(iconv(y, from = Encoding(y), to = "UTF-16LE", toRaw = TRUE))
}
}))
}))
)[colnames(df)]
return(output)
}
field_types <- function(df){
output <- list()
output[colnames(df)[sapply(df, typeof) == "character"]] <- "nvarchar(max)"
return(output)
}
DBI::dbWriteTable(odbc_connect
, name = SQL("database.schema.table")
, value = convertToUTF16_df(df)
, overwrite = TRUE
, row.names = FALSE
, field.types = field_types(df)
)
Ideally, I'd still modify this to remove the rlist dependency but it seems to work now.
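The random NULLs can be reproduced without a database. In this base R sketch, a latin1 non-breaking space is converted once with the wrong declared source encoding and once with the encoding R has recorded for the string. The variable names are illustrative only.

```r
# A latin1-encoded non-breaking space (byte 0xA0), built from raw bytes so the
# example does not depend on the encoding of this source file.
nbsp <- rawToChar(as.raw(0xa0))
Encoding(nbsp) <- "latin1"

# Declaring the wrong source encoding: a lone 0xA0 byte is not valid UTF-8.
wrong <- iconv(nbsp, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)
# Using the encoding R has recorded for the string.
right <- iconv(nbsp, from = Encoding(nbsp), to = "UTF-16LE", toRaw = TRUE)

print(is.null(wrong[[1]]))  # TRUE: the failed conversion becomes a NULL element
print(right[[1]])           # a0 00
```

With toRaw = TRUE, iconv returns NULL for an element it cannot convert, which is a plausible source of the NULL values that reached the database column.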
You could consider using the package RODBC instead of odbc/DBI. I have used RODBC with SQL Server and with Microsoft Access as a permanent data storage system, and I never had trouble with German umlauts (e.g. Ä, ä, ..., ß).
I wonder whether using iconv is an appealing alternative, as there seem to be some '\X00' issues (e.g. https://www.r-bloggers.com/2010/06/more-powerful-iconv-in-r/)
I am posting this answer as an extension to the top answer, because some people might find it useful.
If you need Unicode strings in SQL statements such as INSERT or UPDATE, where you cannot use dbWriteTable(), you can construct your query with dbBind() like this:
x <- "äöü"
x <- iconv(x, from="UTF-8", to="UTF-16LE", toRaw = TRUE)
q <-
"
update foobar
set umlauts = ?
where id = 1
"
query <- DBI::dbSendStatement(con, q)
DBI::dbBind(query, list(x))
DBI::dbClearResult(query)

filepath from concatenated string in R

I'm getting a "no such file or directory" error from the file() function in R when I pass a concatenated string as the path argument.
folder <- "Trades"
account <- "333000"
symbol <- "EURUSD"
date <- "2016.09.09"
filepath <- sprintf("%s/%s %s %s alpha count.bin",folder, account, symbol, date)
count <- file('filepath', 'rb')
If I simply write out the full file path as the argument I get no such errors:
count <- file('Trades/333000 EURUSD 2016.09.09 alpha count.bin', 'rb')
I inspected the filepath in the first code example and the output is the same by comparison:
countstring <- "Trades/333000 EURUSD 2016.09.09 alpha count.bin"
countstring == filepath
output: TRUE
I can see that if I use the dplyr library and pipe the concatenated string to the file(), then it works.
library(dplyr)
folder <- "Trades"
account <- "333000"
symbol <- "EURUSD"
date <- "2016.09.09"
filepath <- sprintf("%s/%s %s %s alpha",folder, account, symbol, date)
count <- paste(filepath, "count.bin") %>% file('rb')
I feel like I am misunderstanding a fundamental concept in R in regards to string manipulation.
I am new to R and just learning. Please help me understand, thank you!!!
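A likely explanation, based only on the code shown: file('filepath', 'rb') passes the literal text "filepath" as the path, not the contents of the variable filepath. That would also explain why the piped version works, since it passes the string value itself. The sketch below uses a temporary directory as a stand-in for the Trades folder (an assumption, so that it runs anywhere) and opens the file through the unquoted variable.

```r
# Stand-in for the "Trades" folder so the sketch runs anywhere (assumption).
folder  <- tempdir()
account <- "333000"
symbol  <- "EURUSD"
date    <- "2016.09.09"
filepath <- sprintf("%s/%s %s %s alpha count.bin", folder, account, symbol, date)

# Create a small binary file to read back.
writeBin(as.raw(1:4), filepath)

# Pass the variable itself: no quotes around filepath.
count <- file(filepath, "rb")
bytes <- readBin(count, what = "raw", n = 4)
close(count)
print(bytes)  # 01 02 03 04
```

Quoting a name in R always produces a string literal, so 'filepath' and filepath are unrelated values even though they look alike.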

Paste variable in RMariaDB dbGetQuery 'where clause' [1054]

I am having issues pasting a variable into the query string for RMariaDB. I can run the query without paste and then find the rows I am looking for within the returned data frame (e.g. MIN). When I try to use a variable in the query, it fails. I have searched Stack Overflow up and down and read the dbGetQuery docs, but nothing seems to work. I am sure it is something simple; I just can't seem to find it.
library(RMariaDB)
team <- "MIN"
# This returns entire database with MIN in tm column.
filename <- dbGetQuery(conn, "select * from nhl_lab_lines_today")
# These will all give me a [1054] error.
test <- paste("select * from nhl_lab_lines_today WHERE tm = ",paste(team,collapse=", "),sep ="")
test <- paste("select * from nhl_lab_lines_today WHERE tm = team")
test <- paste("select * from nhl_lab_lines_today WHERE tm =", team,sep=" ")
filename <- dbGetQuery(conn, test)
dbGetQuery(conn, paste0("select * from nhl_lab_lines_today WHERE tm = '", team ,"'"))
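Printing the generated strings shows why the attempts fail: the value ends up unquoted, so the server reads MIN as a column name, and error 1054 is MariaDB's "Unknown column" error. This is a minimal base R sketch of the string building only. The commented-out line at the end assumes a live connection named conn and uses DBI's params argument.

```r
team <- "MIN"

# What the failing attempts actually send: MIN is unquoted, so the server
# reads it as a column name.
unquoted <- paste("select * from nhl_lab_lines_today WHERE tm =", team, sep = " ")
print(unquoted)
# [1] "select * from nhl_lab_lines_today WHERE tm = MIN"

# Quoting the value turns it into a string literal.
quoted <- sprintf("select * from nhl_lab_lines_today WHERE tm = '%s'", team)
print(quoted)
# [1] "select * from nhl_lab_lines_today WHERE tm = 'MIN'"

# Assumption: with a live connection `conn`, a bound parameter avoids
# building the literal by hand:
# dbGetQuery(conn, "select * from nhl_lab_lines_today WHERE tm = ?", params = list(team))
```

The second attempt in the question fails for a related reason: team inside the quoted SQL text is never replaced by the R variable.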

R: Insert csv-file into database using RJDBC

As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.
I can make the following work:
# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")
# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")
The insert works. Next, I try to INSERT a CSV file containing the same data, separated by the default "tab"; I am working on Windows.
# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)
# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")
Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)
Do you have any suggestions as to what I am doing wrong when trying to insert the CSV file? I do not understand the Incorrect syntax near the keyword 'FROM' error.
What if you create a statement from your data? Something like:
# Data from your example
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
# Formatting rows to insert into SQL statement
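# String values are wrapped in single quotes; SQL Server normally treats double-quoted tokens as identifiers
rows <- apply(df, 1, function(x){paste0("'", x, "'", collapse = ', ')})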
rows <- paste0('(', rows, ')')
# SQL statement
statement <- paste0(
"INSERT INTO cpr_esben.CPR000_Startrecord (",
paste0(colnames(df), collapse = ', '),
')',
' VALUES ',
paste0(rows, collapse = ', ')
)
dbSendQuery(ch, statement)
This should work for any number of rows in your df
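The statement-building idea can be checked without a database by printing the generated SQL. The sketch below wraps it in a hypothetical helper, build_insert, which is not part of RJDBC or DBI. It assumes every column should be sent as a string literal, and it doubles embedded single quotes so that values such as O'NEIL do not break the statement.

```r
# Hypothetical helper: build one multi-row INSERT statement from a data frame,
# treating every value as a string literal.
build_insert <- function(table, df) {
  quote_value <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
  rows <- apply(df, 1, function(r) {
    paste0("(", paste(quote_value(r), collapse = ", "), ")")
  })
  paste0("INSERT INTO ", table,
         " (", paste(colnames(df), collapse = ", "), ") VALUES ",
         paste(rows, collapse = ", "))
}

df <- data.frame(SORTFELT_10 = c("TEST", "O'NEIL"),
                 OPGAVENR = c("123", "456"),
                 stringsAsFactors = FALSE)
statement <- build_insert("cpr_esben.CPR000_Startrecord", df)
print(statement)
```

For large data frames a single statement may become very long, so splitting the rows into batches before sending them is worth considering.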
RJDBC is built on DBI, which has many useful functions to do tasks like this. What you want is dbWriteTable. Syntax would be:
dbWriteTable(ch, 'cpr_esben.CPR000_Startrecord', df, append = TRUE)
and would replace your write.table line.
I am not that familiar with RJDBC specifically, but I think the issue with your sendQuery is that you are referencing test.csv inside your SQL statement, which does not locate the file that you created with write.table as the scope of that SQL statement is not in your working directory.
Have you tried loading the file directly into the database, as below? Note that LOAD DATA INFILE is MySQL syntax; SQL Server's counterpart is BULK INSERT.
library(RJDBC)
drv <- JDBC("connections")
conn <- dbConnect(drv,"...")
query = "LOAD DATA INFILE 'test.csv' INTO TABLE test"
dbSendUpdate(conn, query)
You can also add further clauses at the end of the statement, such as the column delimiter: "|" for a .txt file or "," for a CSV file.

Warning: Error in result_create: no such column: tmp

I'm running a database query with sqldf in Shiny in R, but I am getting an error.
ui.R:
observeEvent (input$uploadForTest_1, {
inFile=input$uploadForTest_1
inFileName=input$uploadForTest_1$name
file <-"tss.txt"
tmp = paste("audio/street", inFileName, sep = "/")
res <- read.csv.sql(file,header=FALSE,sql = "select * from file where V1=tmp",sep="\t")
print(res)
})
I'm successfully running the following query:
res <- read.csv.sql(file,header=FALSE,sql = "select * from file where V1='audio/street/b098.wav'",sep="\t")
But if I run the query shown in ui.R, I get an error saying that the tmp column doesn't exist:
Warning: Error in result_create: no such column: tmp 86:
I don't want to hard-code the string in my query; I want to use a variable name instead. Can I use a variable name in the query instead of a string? If so, how can I do this? I didn't find a solution to my problem on the Internet. Thanks.
Preface read.csv.sql with fn$, and use '$tmp' in the SQL statement.
fn$read.csv.sql(file, sql = "select * from file where V1 = '$tmp'",
header = FALSE, sep = "\t")
See ?fn and the gsubfn vignette for more info. Note that sqldf automatically loads the gsubfn package so it will already be available.
You could use sprintf. Another option would be to paste together a string, but I find sprintf far more elegant for this task.
> tmp <- "audio/street/somefile.txt"
> "select * from file where V1=tmp"
[1] "select * from file where V1=tmp"
> sprintf("select * from file where V1='%s'", tmp)
[1] "select * from file where V1='audio/street/somefile.txt'"
