I'm in the process of learning R so I can wave SAS goodbye. I'm still new to this, and I sometimes have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
         paste("insert into TestTable (UniqueID) Values (", b, ")", sep = ""))
When I replace top 1 with any other number, say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because an extra c is generated (I assume it stands for column) when I give the command paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit: As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
         ## note paste(b[[1]], collapse = ", ") in the line below
         paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "), ")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can provide weird results because paste must return a character vector.
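For example, with the two IDs from your error message:
# a data.frame is a list of columns, so paste() deparses the whole column
b <- data.frame(Noinscr = c(8535735, 8449336))
paste(b)
# [1] "c(8535735, 8449336)"

# subsetting with [[1]] extracts the underlying vector, which collapses cleanly
paste(b[[1]], collapse = ", ")
# [1] "8535735, 8449336"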
I am currently building a (large) survey and need to send the responses people provide to a database. I have set up my database connection using the pool and RMariaDB packages, and I have written the following function to construct the SQL queries and submit my data (the data is secured with SSL certificates and all this information is passed through the list db_config).
save_db <- function (db_pool, x, db_name, db_config, replace_val) {
# Construct the DB query to be sent to the database
if (!replace_val) {
query <- sprintf(
"INSERT INTO %s (%s) VALUES ('%s')",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = "', '")
)
} else {
query <- sprintf(
"UPDATE %s SET %s WHERE %s;",
db_name,
paste(paste0(names(x)[-1], " = \'", x[-1], "\'"), collapse = ", "),
paste0(names(x)[1], " = \'", x[1], "\'")
)
}
# Submit the query to the database via the open connection
RMariaDB::dbExecute(db_pool, query)
}
db_pool is the pool object handling my database connections; x is a named vector with the data I am sending to the database, where the names correspond to the column names of my MariaDB table and the values are stored as data blobs; db_name is the name of my database; replace_val is a boolean.
The data blobs are essentially different output objects from the survey, e.g. vectors or matrices of responses, turned into character strings using toJSON() from the jsonlite package.
So far, so good. I am able to send data to the database, download it and reconstruct the responses using the fromJSON() command. All is good. However, I do have one security concern. In my survey, I do have a few open-ended questions where people can write what they want. While unlikely, I am concerned that someone might use a SQL injection attack. Worst case scenario, I lose all my data.
I know of the sqlInterpolate() function from the DBI package. From my understanding, the function escapes any quotation marks, meaning that any value submitted will be turned into a safe string.
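For example, as far as I understand it would double an embedded single quote like this (responses, answer and val are just placeholder names):
library(DBI)
# the single quote in "don't" is doubled so it cannot break out of the string literal
sqlInterpolate(ANSI(), "SELECT * FROM responses WHERE answer = ?val", val = "don't")
# <SQL> SELECT * FROM responses WHERE answer = 'don''t'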
What I have not been able to do is modify my function above to work with sqlInterpolate. In my case x is a named vector of length seven where each vector element is a JSON string. Essentially, I need to use sqlInterpolate() on each of the JSON strings. I was wondering if there is an "easy" way of doing this, or if my best course of action would be to completely rewrite my function to send seven individual deposits to the DB, i.e. one for each vector element.
A rather simplified example would be something like this:
library(jsonlite)
# Create some data to test the string on
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
"INSERT INTO %s (%s) VALUES ('%s')",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = "', '")
)
What I would need to interpolate are the values captured by ('%s') in the sprintf() statement (and similarly for the update query). Or would just turning everything into a JSON string already be enough to sanitize my DB input?
Any help would be much appreciated.
Having spent several hours today trying and failing at this, I believe I have found a workaround. I have done some testing and it appears to work. I am posting an answer to my own question in case someone runs into a similar problem in the future.
My updated function now looks like this:
save_db <- function (db_pool, x, db_name, db_config, replace_val) {
# Interpolate the elements of x
x <- do.call(c, lapply(x, function(y) {
sql <- "?value"
sqlInterpolate(db_pool, sql, value = y)
}))
# Construct the DB query to be sent to the database
if (!replace_val) {
query <- sprintf(
"INSERT INTO %s (%s) VALUES (%s)",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = ", ")
)
} else {
query <- sprintf(
"UPDATE %s SET %s WHERE %s;",
db_name,
paste(paste0(names(x)[-1], " = ", x[-1]), collapse = ", "),
paste0(names(x)[1], " = ", x[1])
)
}
# Submit the query to the database via the open connection
RMariaDB::dbExecute(db_pool, query)
}
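For completeness, a call to the updated function would then look roughly like this (a sketch only: the pool settings are placeholders and would in practice come from db_config, which the function body above does not otherwise use):
library(pool)
library(RMariaDB)
library(jsonlite)

# hypothetical pool; in the real app the SSL options from db_config would go here
db_pool <- dbPool(MariaDB(), dbname = "my_db", username = "user", password = "pass")

x <- c(toJSON(1:3), toJSON(c("one", "don't")))
names(x) <- c("var_1", "var_2")

save_db(db_pool, x, db_name = "my_db", db_config = list(), replace_val = FALSE)
poolClose(db_pool)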
It appears that the key was to only use the interpolation on the actual JSON string itself, like so:
x <- do.call(c, lapply(x, function(y) {
sql <- "?value"
sqlInterpolate(db_pool, sql, value = y)
}))
And the rest of the function can be used as is. To see this, let's use the example I provided in my original question:
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
"INSERT INTO %s (%s) VALUES ('%s')",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = "', '")
)
Which yields the output:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don't\"]')"
If I now transform my x as above and use the updated sprintf() call (note that the extra single quotation marks around %s are removed):
x <- do.call(c, lapply(x, function(y) {
  sql <- "?value"
  # ANSI() is a dummy ANSI-SQL connection from DBI, used here just to show the escaping;
  # inside the actual function the live db_pool connection is passed instead
  sqlInterpolate(ANSI(), sql, value = y)
}))
sprintf(
"INSERT INTO %s (%s) VALUES (%s)",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = ", ")
)
I will get:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don''t\"]')"
And we see that the single quotation mark in don't is correctly quoted out. If I have missed something crucial in my own solution, please feel free to comment on it.
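As a small sanity check, decoding the stored value should give back the original vector, since the database un-doubles the quote when it stores the string:
library(jsonlite)
fromJSON('["one","don\'t"]')
# [1] "one"   "don't"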
I am very new to R, so please forgive any obvious or naive errors. I need to insert multiple rows of data from R into an Oracle database table.
Make the data frame (I have made the RJDBC connection, con, earlier in the script):
df <- data.frame("field_1" = 1:2, "field_2" = c("f","k"), "field_3"= c("j","t"))
This code runs without error, but inserts only the first row into the table:
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
dbSendUpdate(con, insert)
This code runs:
insert <- sprintf("into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
insert_all <- c("insert all", insert, "select * from dual")
dbSendUpdate(con, insert_all)
But gives me this error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate (ORA-00905: missing keyword
Both of the queries work on their own in Oracle. WHAT am I doing wrong?
Thank you!
Multiple SQL statements are not supported in dbGetQuery, dbSendQuery, or dbSendUpdate calls; you have to run one statement per call, which is why only the first statement is processed. To resolve this, extend the anonymous function inside apply to call dbSendUpdate for each row:
apply(df, 1, function(i) {
# BUILD SQL STATEMENT
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
paste0("'", i, collapse="',"))
# RUN QUERY
dbSendUpdate(con, insert)
})
However, RJDBC extends the DBI standard by supporting parameterization in dbSendUpdate, as mentioned in the rForge docs, which allows bulk inserts without iteratively concatenating strings:
dbSendUpdate(conn, statement, ...) This function is analogous to
dbSendQuery, but works with DBML statements and thus doesn't return a
result set. It is more efficient than dbSendQuery. In addition, as of
RJDBC 0.2-9 it supports vectors in prepared statements which allows
bulk-inserts.
# ALL CHARACTER DATAFRAME
df <- data.frame(field_1=as.character(1:2), field_2=c("f","k"), field_3=c("j","t"),
stringsAsFactors=FALSE)
# PREPARED STATEMENT
sql <- "insert into temp_r_test_u_suck values (?, ?, ?)"
# RUN QUERY
dbSendUpdate(con, sql, df$field_1, df$field_2, df$field_3)
I'm using an RJDBC connection to query results from a Vertica database into R. I'm creating a comma-separated vector of zip codes that I then paste into my query, as shown below.
b <- paste("'20882'", "'01441'", "'20860'", "'02139'", sep = ", ")
SQL <- paste("select zip, count(*)
from tablea a
inner join tableb b on a.id = b.id
inner join tablec c on c.col = b.col
where b.zip in (", b, ") group by 1 order by 1", sep = " ")
result <- dbGetQuery(vertica, SQL)
I'm using this in a loop within a function, where I'll keep adding zip codes to the vector b. I was wondering if there is an easy way to do this?
I've been trying, but I'm unable to add items to the vector in a way that lets the query execute.
Something like the following
b <- c(add_zip, b)
which could then be re-run in the body of the query.
Any suggestions?
Thanks,
Ben
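A minimal sketch of one way to do this, reusing the table names from the question (add_zip stands in for whatever new zip codes the loop produces): keep the zips in a plain character vector, append to it, and rebuild the quoted IN list on each pass.
zips <- c("20882", "01441", "20860", "02139")
zips <- c(add_zip, zips)   # add the new zip code(s) inside the loop

# wrap each zip in single quotes and join with commas
in_list <- paste(sprintf("'%s'", zips), collapse = ", ")

SQL <- paste0("select zip, count(*)
  from tablea a
  inner join tableb b on a.id = b.id
  inner join tablec c on c.col = b.col
  where b.zip in (", in_list, ") group by 1 order by 1")

result <- dbGetQuery(vertica, SQL)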
I would like to take the values from a data frame and paste them into a text string that can be used as a SQL query. In SAS I would do it like this:
proc sql noprint; Select Names into :names separated by ", " from df; quit;
This would create a macro variable &names storing all the names, like: Id, Name, Account. I would like to do the same type of thing in R, but I do not know how. I can create a vector of names separated by commas, each surrounded by quotes, and I can strip the quotes with the noquote function and keep them in a vector, but I cannot get the elements into another paste statement to add the SELECT and FROM. Is there a way to pull the values of a column and create a text string that can be used as a SQL query inside R? Here is what I have tried in R:
name = c("Id", "IsDeleted", "Name", "Credit__Loan__c")
label = c("Record Id", "Deleted", "ID", "Loan")
df = data.frame(name, label)
names(df) <- c("name", "label")
as.query.fields = noquote(paste(df$name, collaspe=", "))
as.query.final <- paste("SELECT " , noquote(paste(df$name, collaspe=", ")), " id FROM Credit_Amortization_Schedule__c")
data(iris)
colnames(iris)
a <- paste(colnames(iris), collapse = ", ")
as.query.final <- paste("SELECT", a, ", id FROM Credit_Amortization_Schedule__c")
as.query.final
The result is:
[1] "SELECT Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species , id FROM Credit_Amortization_Schedule__c"
which you can then use with SQL like this:
require(RODBC)
result <- sqlQuery(db, as.query.final)
where db is your database connection
Or, since I see your sqldf tag now, if you want to use sqldf it's just:
sqldf(as.query.final)
The gsubfn package supports string interpolation:
library(gsubfn)
Names <- toString( sprintf("%s '%s'", df$name, df$label) )
fn$identity("select $Names from myTable")
giving:
[1] "select Id 'Record Id', IsDeleted 'Deleted', Name 'ID', Credit__Loan__c 'Loan' from myTable"
Here are some additional examples: SO example 1 and SO example 2.
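The fn$ prefix can also be wrapped directly around the query call itself, for example (assuming db is an open RODBC connection as in the answer above):
library(gsubfn)
library(RODBC)
result <- fn$sqlQuery(db, "select $Names from myTable")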
I'm trying to add data to a MySQL table using RMySQL. I only need to add one row at a time, and it's not working. What I'm trying to do is this:
dbGetQuery(con,"INSERT INTO names VALUES(data[1,1], data[1,2])")
So what I'm doing is this: I have values in a data frame named "data" and I need to put them into a MySQL table. Before that, I check whether they are already in the table, and only add them if they are not, but the approach above isn't working. The data is read from a .csv file with read.csv.
You can use paste to construct the actual query.
data <- matrix(1:4, 2, 2)  # stand-in for the data frame read from your .csv
query <- paste("INSERT INTO names VALUES(",data[1,1], ",", data[1,2], ")")
query
#[1] "INSERT INTO names VALUES( 1 , 3 )"
dbGetQuery(con, query)
# If there are a lot of columns this could be tedious...
# So we could also use paste to add all the values at once.
query <- paste("INSERT INTO names VALUES(", paste(data[1,], collapse = ", "), ")")
query
#[1] "INSERT INTO names VALUES( 1, 3 )"
You could try with:
dbWriteTable(con, "names", data[1, ], append = TRUE)
as described in the DBI documentation.
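Since you mentioned checking whether the values are already in the table first, a rough sketch of that check could look like this (the key column name "name" is a guess; use whatever your actual key column is called):
existing <- dbGetQuery(con, paste0("SELECT 1 FROM names WHERE name = '", data[1, 1], "' LIMIT 1"))
if (nrow(existing) == 0) {
  dbWriteTable(con, "names", data[1, ], append = TRUE)
}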