Unable to insert values into SQLite table - R

I need to create a table whose name contains some special characters. I am using the RSQLite package. The table name I need to create is port.3.1. It is not possible to create a table with this name directly, so I changed the table name to [port.3.1] based on What are valid table names in SQLite?.
Now I can create the table, but I can't insert a dataframe to that table.
The code I used is as follows:
createTable <- function(tableName){
  c <- c(portDate = 'varchar(20) not null',
         ticker = 'varchar(20)',
         quantities = 'double(20,10)')
  lite <- dbDriver("SQLite", max.con = 25)
  db <- dbConnect(lite, dbname = "sql.db")
  # check whether the portfolio name contains special characters or not
  if( length(which(strsplit(toString(tableName), '')[[1]] == '.')) != 0 ){
    tableName <- paste("[", tableName, "]", sep = "")
  }
  sql <- dbBuildTableDefinition(db, tableName, NULL, field.types = c, row.names = FALSE)
  print(sql)
  dbGetQuery(db, sql)
}
datedPf <- data.frame(date = c("2001-01-01", "2001-01-01"),
                      ticker = c("a", "b"),
                      quantity = c(12, 13))
for(port in c("port1", "port2", "port.3.1")){
  createTable(port)
  lite <- dbDriver("SQLite", max.con = 25)
  db <- dbConnect(lite, dbname = "sql.db")
  # check whether the portfolio name contains special characters or not
  if( length(which(strsplit(toString(port), '')[[1]] == '.')) != 0 ){
    port <- paste("[", port, "]", sep = "")
  }
  dbWriteTable(db, port, datedPf, append = TRUE, row.names = FALSE)
}
In this example, I can insert the data frame into tables port1 and port2, but it does not insert into table [port.3.1]. What is the reason behind this? How can I solve this problem?

Have a look at the sqliteWriteTable implementation, simply by entering that name at the prompt and pressing Enter. You will notice two things:
[…]
foundTable <- dbExistsTable(con, name)
new.table <- !foundTable
createTable <- (new.table || foundTable && overwrite)
[…]
if (createTable) {
[…]
And looking at the output of showMethods("dbExistsTable", includeDefs=T), you'll see that it uses dbListTables(conn), which returns the unquoted version of your table name. So if you pass the quoted table name to sqliteWriteTable, it will incorrectly assume that your table does not exist, try to create it and then encounter an error. If you pass the unquoted table name, the creation statement will be wrong.
I'd consider this a bug in RSQLite. In my opinion, SQL statements the user passes have to be correctly quoted, but wherever a table name is passed as a separate argument to a function, that table name should be unquoted by default and should get quoted in the SQL statements generated from it. It would be even better if the name were accepted in either quoted or unquoted form, but that's mainly to maximize portability. If you feel like it, you can try contacting the developers to report this issue.
You can work around the problem:
setMethod(dbExistsTable, signature(conn = "SQLiteConnection", name = "character"),
  function(conn, name, ...) {
    lst <- dbListTables(conn)
    lst <- c(lst, paste("[", lst, "]", sep = ""))
    match(tolower(name), tolower(lst), nomatch = 0) > 0
  }
)
This will overwrite the default implementation of dbExistsTable for SQLite connections with a version which checks for both quoted and unquoted table names. After this change, passing "[port.3.1]" as the table name will cause foundTable to be true, so RSQLite won't attempt to create that table for you.
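With that override in place, a quick usage sketch (my own, reusing the objects from the question) would be:
# Hedged sketch: after the setMethod() override above, the quoted name is
# found, so the append should no longer trigger a spurious CREATE TABLE.
lite <- dbDriver("SQLite", max.con = 25)
db <- dbConnect(lite, dbname = "sql.db")
dbExistsTable(db, "[port.3.1]")   # now TRUE
dbWriteTable(db, "[port.3.1]", datedPf, append = TRUE, row.names = FALSE)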

Related

Retain the text values with special character (such as hyphen and space) when updating to PostgreSQL database through R

I want to update a table in PostgreSQL from a local newData dataframe in a loop when the id matches in both tables. However, I ran into issues where the text values do not end up in the database exactly as they are in newData. Numbers update correctly, but there are two issues when updating text:
1) I have a column for house_nbr that can be '120-12', but it somehow gets evaluated as arithmetic and updated as '108', when it should stay as the text '120-12'.
2) I have a column for street_name that can be 'Main Street', but I received an error that I couldn't resolve:
(Error in { :
task 1 failed - "Failed to prepare query: ERROR: syntax error at or near "Street")
The database table datatype is char. It seems something is wrong with special characters in the text, such as hyphens and spaces. Please advise how to retain the character text when updating a Postgres database. Below is the code I am using. Thanks!
Update <- function(i) {
  con <- dbConnect(RPostgres::Postgres(),
                   user = "xxx",
                   password = "xxx",
                   host = "xxx",
                   dbname = "xxx",
                   port = 5432)
  text <- paste("UPDATE dbTable SET house_nbr=", newData$house_nbr[i],
                ",street_name=", newData$street_name[i],
                "where id=", newData$id[i])
  dbExecute(con, text)
  dbDisconnect(con)
}
foreach(i = 1:length(newData$id), .inorder = FALSE, .packages = "RPostgreSQL") %dopar% {
  Update(i)
}
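For comparison, here is a minimal sketch (my own, not part of the question) of the same update written as a parameterized statement, so the driver quotes text values such as '120-12' and 'Main Street' itself:
# Hedged sketch: bound parameters instead of paste(); RPostgres uses $1, $2, ... placeholders.
library(DBI)
update_row <- function(con, i) {
  dbExecute(con,
            "UPDATE dbTable SET house_nbr = $1, street_name = $2 WHERE id = $3",
            params = list(newData$house_nbr[i], newData$street_name[i], newData$id[i]))
}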

Unique identifier not recognized by SQL query

I am making a table for users to fill out in Shiny using SQLite. At the end of each session I want to delete all entries containing the unique sessionID:
library(RSQLite)
library(pool)
library(DBI)

# Generates unique token. For example "ce20ca2792c26a702653ce54896fc10a"
sessionID <- session$token

pool <- dbPool(RSQLite::SQLite(), dbname = "db.sqlite")

df <- data.frame(sessionID = character(),
                 name = character(),
                 group = character(),
                 stringsAsFactors = FALSE)

dbWriteTable(pool, "user_data", df, overwrite = FALSE, append = TRUE)
-------------#Code to fill out the table-----------------
At the end of the session I delete the session specific entries using:
dbExecute(pool, sprintf('DELETE FROM "user_data" WHERE "sessionID" == (%s)', sessionID))
I get the following error:
Warning: Error in result_create: no such column: ce20ca2792c26a702653ce54896fc10a
If I replace the session ID with a random generated number for example "4078540723057" the entries are deleted without any problem. Why is the session$token not recognized?
As the sessionID column is text in your SQLite database, SQLite expects the literal value to be surrounded by single quotes. Normally you would use a prepared statement for this, but you may try:
dbExecute(pool, sprintf("DELETE FROM user_data WHERE sessionID = '%s'", sessionID))
Waiving the need for a prepared statement here may be justified, as your script is not open/accessible to the outside.
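For completeness, a minimal sketch (my own) of the prepared-statement route mentioned above, which avoids manual quoting entirely:
# Hedged sketch: let RSQLite bind and quote the value itself.
dbExecute(pool,
          'DELETE FROM "user_data" WHERE "sessionID" = ?',
          params = list(sessionID))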

How to use glue_data_sql to write safe parameterized queries on an SQL server database?

The problem
I want to write a wrapper around some DBI functions that allows safe execution of parameterized queries. I've found this resource that explains how to use the glue package to insert parameters into an SQL query. However, there seem to be two distinct ways to use the glue package to insert parameters:
Method 1 involves using ? in the SQL query where the parameters need to be inserted, and then using dbBind to fill them in. An example from the link above:
library(glue)
library(DBI)
airport_sql <- glue_sql("SELECT * FROM airports WHERE faa = ?")
airport <- dbSendQuery(con, airport_sql)
dbBind(airport, list("GPT"))
dbFetch(airport)
Method 2 involves using glue_sql or glue_data_sql to fill in the parameters directly (no use of dbBind). Again, an example from the link above:
airport_sql <- glue_sql(
  "SELECT * FROM airports WHERE faa IN ({airports*})",
  airports = c("GPT", "MSY"),
  .con = con
)
airport <- dbSendQuery(con, airport_sql)
dbFetch(airport)
I would prefer using the second method because it has a lot of extra functionality, such as collapsing multiple values for an IN clause in the WHERE part of an SQL statement. See the second example above for how that works (note the * after the parameter, which indicates it must be collapsed). The question is: is this safe against SQL injection? (And are there other things I need to worry about?)
My code
This is currently the code I have for my wrapper.
paramQueryWrapper <- function(sql,
                              params = NULL,
                              dsn = standard_dsn,
                              login = user_login,
                              pw = user_pw){
  if(missing(sql) || length(sql) != 1 || !is.character(sql)){
    stop("Please provide sql as a character vector of length 1.")
  }
  if(!is.null(params)){
    if(!is.list(params)) stop("params must be a (named) list (or NULL).")
    if(length(params) < 1) stop("params must be either NULL, or contain at least one element.")
    if(is.null(names(params)) || any(names(params) == "")) stop("All elements in params must be named.")
  }

  con <- DBI::dbConnect(
    odbc::odbc(),
    dsn = dsn,
    UID = login,
    PWD = pw
  )
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Replace params with corresponding values and execute query
  sql <- glue::glue_data_sql(.x = params, sql, .con = con)
  query <- DBI::dbSendQuery(conn = con, sql)
  on.exit(DBI::dbClearResult(query), add = TRUE, after = FALSE)

  return(tibble::as_tibble(DBI::dbFetch(query)))
}
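For reference, a usage sketch of this wrapper (my own illustration, reusing the airports example from above):
# Hedged usage sketch: glue collapses the named vector into the IN clause.
result <- paramQueryWrapper(
  sql    = "SELECT * FROM airports WHERE faa IN ({airports*})",
  params = list(airports = c("GPT", "MSY"))
)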
My question
Is this safe against SQL injection? Especially since I am not using dbBind.
Epilogue
I know that there already exists a wrapper called dbGetQuery that allows parameters (see this question for more info - look for the answer by @krlmlr for an example of a parameterized query). But this again relies on the first method using ?, which is much more basic in terms of functionality.
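For reference, a minimal sketch (my own, assuming the same airports table and con connection) of what that dbGetQuery() route looks like:
# Hedged sketch: dbGetQuery() binds the parameters in one call, replacing the
# dbSendQuery()/dbBind()/dbFetch() sequence of method 1.
airport <- DBI::dbGetQuery(con,
                           "SELECT * FROM airports WHERE faa = ?",
                           params = list("GPT"))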

Use R's mongolite to correctly (insert? update?) add data to an existing collection

I have the following function written in R that (I think) is doing a poor job of updating my mongo database's collections.
library(mongolite)
con <- mongolite::mongo(collection = "mongo_collection_1", db = 'mydb', url = 'myurl')
myRdataframe1 <- con$find(query = '{}', fields = '{}')
rm(con)
con <- mongolite::mongo(collection = "mongo_collection_2", db = 'mydb', url = 'myurl')
myRdataframe2 <- con$find(query = '{}', fields = '{}')
rm(con)
... code to update my dataframes (rbind additional rows onto each of them) ...
# write dataframes to database
write.dfs.to.mongodb.collections <- function() {
  collections <- c("mongo_collection_1", "mongo_collection_2")
  my.dataframes <- c("myRdataframe1", "myRdataframe2")
  # loop over the dataframes, write the collections
  for(i in 1:length(collections)) {
    # connect and add data to this collection
    con <- mongo(collection = collections[i], db = 'mydb', url = 'myurl')
    con$remove('{}')
    con$insert(get(my.dataframes[i]))
    con$count()
    rm(con)
  }
}
write.dfs.to.mongodb.collections()
My dataframes myRdataframe1 and myRdataframe2 are very large dataframes, currently ~100K rows and ~50 columns. Each time my script runs, it:
uses con$find('{}') to pull the mongodb collection into R, saved as a dataframe myRdataframe1
scrapes new data from a data provider that gets appended as new rows to myRdataframe1
uses con$remove() and con$insert to fully remove the data in the mongodb collection, and then re-insert the entire myRdataframe1
This last bullet point is iffy, because I run this R script daily in a cron job, and I don't like that each run entirely wipes the mongo collection and re-inserts the R dataframe into it.
If I remove the con$remove() line, I receive an error that states I have duplicate _id keys. It appears I cannot simply append using con$insert().
Any thoughts on this are greatly appreciated!
When you attempt to insert documents into MongoDB that already exist in the database according to their primary key, you will get a duplicate key exception. To work around that, you can simply drop the _id column before the con$insert, using something like this:
df <- get(my.dataframes[i])  # resolve the data frame from its name
df$`_id` <- NULL             # drop _id so MongoDB assigns fresh ids
and then insert df instead of get(my.dataframes[i]). This way, each newly inserted document automatically gets a new _id assigned.
You can use an upsert, which matches a document against the query: if a match is found it is updated, otherwise a new document is inserted.
First you need to separate the _id from each document:
doc <- get(my.dataframes[i])   # resolve the data frame from its name
id <- doc$`_id`                # original _id values, used for matching
updateData <- doc
updateData$`_id` <- NULL       # drop _id from the fields being set
Then use upsert (there might be an easier way to build the JSON strings in R):
for (j in seq_len(nrow(updateData))) {
  set_doc <- jsonlite::toJSON(as.list(updateData[j, , drop = FALSE]), auto_unbox = TRUE)
  con$update(paste0('{"_id":"', id[j], '"}'), paste0('{"$set":', set_doc, '}'), upsert = TRUE)
}

Amazon Redshift - table columns declared as varchar(max) but forced as varchar(255)

I'm coding a data extraction tool to load data from Google Search Console (GSC from now on) and store it in an Amazon Redshift (AR from now on) database. I coded a function that parses the elements of the data frame coming from GSC to determine the field structure when creating tables in AR.
This is the R function I created:
get_table_fields <- function (d) {
  r <- FALSE
  if (is.data.frame(d)) {
    r <- vector()
    t <- d[1, ]
    c <- colnames(t)
    for (k in c) {
      v <- t[, k]
      if (is.character(v)) {
        r[k] <- "nvarchar(max)"
      } else if (!is.na(as.Date(as.character(v), format = c("%Y-%m-%d")))) {
        r[k] <- "date"
      } else if (is.numeric(v)) {
        r[k] <- ifelse(grepl(".", v, fixed = TRUE), "real", "integer")
      }
    }
  }
  return(r)
}
So far, so good. I pass the full data frame and the function extracts all relevant information from the first row, giving me the structure needed to create a table on AR.
This is the code I use to extract data from GSC and write it onto AR:
# retrieve the table fields schema
s_fields <- get_table_fields(data)

# compose the table creation definition out of the fields schema
d_fields <- paste(toString(sapply(names(s_fields), function (x) {
  return(sprintf('"%s" %s', x, s_fields[x]))
})))

# compose the table creation query
c_query <- sprintf("CREATE TABLE IF NOT EXISTS %s (%s);", t_table_name, d_fields)

if (nrow(data) > 0) {
  # create the table if it doesn't exist
  dbSendUpdate(db, c_query)
  # delete previous saved records for the specified date
  dbSendUpdate(db, sprintf("DELETE FROM %s WHERE date = '%s' AND gsc_domain = '%s';",
                           t_table_name, date_range[d], config.gsc.domain))
  # upload the Google Search Console (GSC) data to Amazon Redshift (AR)
  dbWriteTable(db, t_table_name, data, append = TRUE, row.names = FALSE)
}
db is the database connection object, declared like this:
# initialize the Amazon Redshift JDBC driver
driver <- JDBC("com.amazon.redshift.jdbc42.Driver",
               "drivers/RedshiftJDBC42-1.2.16.1027.jar",
               identifier.quote = "`")

# connect to the Amazon Redshift database instance
db <- dbConnect(driver,
                sprintf("jdbc:redshift://%s:%s/%s?user=%s&password=%s",
                        config.ar.host, config.ar.port, config.ar.database,
                        config.ar.user, config.ar.password))
t_table_name is a string concatenating the different dimensions in the GSC extraction definition, with gsc_by as a prefix and joined with underscores, so if we wanted to extract date, page and device, the table name would be gsc_by_date_page_device.
So, basically, what this code does is gather a data frame from GSC and ensure the table for the specified extraction exists. If not, it creates it. Otherwise, it removes any existing data (in case the extraction is re-launched, so as not to duplicate any entries) and stores the new data in AR.
The problem is that either the AR database or the Amazon Redshift JDBC driver seems to be forcing my column definitions to varchar(255) instead of the nvarchar(max) or varchar(max) I'm trying to write. I've tried different combinations, but the result is always the same:
<simpleError in .local(conn, statement, ...): execute JDBC update query failed in dbSendUpdate ([Amazon](500310) Invalid operation: Value too long for character type
Details:
-----------------------------------------------
error: Value too long for character type
code: 8001
context: Value too long for type character varying(255)
query: 116225
location: funcs_string.hpp:395
process: padbmaster [pid=29705]
-----------------------------------------------;)>
If I print the c_query variable (the table creation query) before sending the query, it prints out correctly:
CREATE TABLE IF NOT EXISTS gsc_by_date_query_device ("date" date, "query" nvarchar(max), "device" nvarchar(max), "clicks" integer, "impressions" integer, "ctr" real, "position" integer, "gsc_domain" nvarchar(max));
CREATE TABLE IF NOT EXISTS gsc_by_date_query_country_device ("date" date, "query" nvarchar(max), "country" nvarchar(max), "device" nvarchar(max), "countryName" nvarchar(max), "clicks" integer, "impressions" integer, "ctr" real, "position" integer, "gsc_domain" nvarchar(max));
CREATE TABLE IF NOT EXISTS gsc_by_date_page_device ("date" date, "page" nvarchar(max), "device" nvarchar(max), "clicks" integer, "impressions" integer, "ctr" real, "position" real, "gsc_domain" nvarchar(max));
If I execute this in SQLWorkbench/J (the tool I'm using for checking), it creates the table correctly, and even then, what fails is the data insertion.
Can you give me a hint on what I am doing wrong, or on how I can specify the text columns as larger than 256 characters? I'm having a nightmare with this, and I think I've tried everything I could.
I've written an extensive blogpost explaining a lot of nuances of reading/writing data to/from Amazon Redshift: https://auth0.com/blog/a-comprehensive-guide-for-connecting-with-r-to-redshift/
In particular, the best way to read data with R is using the RPostgres library, and to write data I recommend using the R package I created: https://github.com/sicarul/redshiftTools
It does not have the issue you are reporting: varchars are created based on the length of the strings, using the function calculateCharSize: https://github.com/sicarul/redshiftTools/blob/master/R/table_definition.R#L2
Though, as a best practice, I'd say that unless it's a temporary or staging table, try to always create the table yourself; that way you can control sortkeys, distkeys and compression, which are very important for performance in Amazon Redshift.
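For illustration only (my own sketch, not from the answer; column widths and keys are assumptions), a hand-written DDL with explicit varchar sizes, a distribution style and a sortkey could be sent through the same JDBC connection:
# Hedged sketch: explicit DDL instead of letting the tool guess column types.
ddl <- paste(
  'CREATE TABLE IF NOT EXISTS gsc_by_date_page_device (',
  '  "date" date,',
  '  "page" varchar(max),',
  '  "device" varchar(256),',
  '  "clicks" integer,',
  '  "impressions" integer,',
  '  "ctr" real,',
  '  "position" real,',
  '  "gsc_domain" varchar(256)',
  ') DISTSTYLE EVEN SORTKEY ("date");',
  sep = "\n")
dbSendUpdate(db, ddl)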
If you already have created the table, you can do something like:
rs_replace_table(data, dbcon=db, table_name=t_table_name, bucket="mybucket", split_files=4)
If you haven't created the table, you can do practically the same thing with rs_create_table
You'll need an S3 bucket and the AWS keys to access it, since this package uploads to S3 and then directs Redshift to that bucket; this is the fastest way to bulk-upload data.
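For instance, mirroring the rs_replace_table call above (my assumption being that rs_create_table accepts the same arguments):
rs_create_table(data, dbcon=db, table_name=t_table_name, bucket="mybucket", split_files=4)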
