I would like to take the values from a data frame and paste them into a text string that can be used as a SQL query. In SAS I would do it like this:
proc sql noprint; Select Names into :names separated by ", " from df; quit;
This would create a macro variable &names storing all the names, like: Id, Name, Account. I would like to do the same type of thing in R, but do not know how. I can create a vector with the names separated by commas, each one surrounded by quotes, and I can take away the quotes using the noquote function, but I cannot get the elements into another paste statement that adds the "SELECT" and "FROM"; I cannot get it all to paste together. Is there a way to pull the values of a column and create a text string that can be used as a SQL query inside R? Here is what I have tried in R:
name = c("Id", "IsDeleted", "Name", "Credit__Loan__c")
label = c("Record Id", "Deleted", "ID", "Loan")
df = data.frame(name, label)
names(df) <- c("name", "label")
as.query.fields = noquote(paste(df$name, collaspe=", "))
as.query.final <- paste("SELECT " , noquote(paste(df$name, collaspe=", ")), " id FROM Credit_Amortization_Schedule__c")
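data(iris)
colnames(iris)
# note the spelling of the collapse argument; a misspelled name is pasted in as extra text
a <- paste(colnames(iris), collapse = ", ")
# paste0() returns the string; cat() only prints it and returns NULL
as.query.final <- paste0("SELECT ", a, ", id FROM Credit_Amortization_Schedule__c")
cat(as.query.final)
The result is:
SELECT Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, id FROM Credit_Amortization_Schedule__c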
which you can then use with SQL like this:
require(RODBC)
result <- sqlQuery(db, as.query.final)
where db is your database connection.
Or, since I see your sqldf tag now, if you want to use sqldf it's just:
sqldf(as.query.final)
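For the data frame in the question, the same idea can be sketched in base R as follows. This is a minimal sketch that assumes the column names are trusted identifiers; the table name is the one used in the question.

```r
# Data frame from the question
df <- data.frame(
  name  = c("Id", "IsDeleted", "Name", "Credit__Loan__c"),
  label = c("Record Id", "Deleted", "ID", "Loan"),
  stringsAsFactors = FALSE
)

# collapse = ", " joins the whole vector into one comma-separated string
fields <- paste(df$name, collapse = ", ")

# paste0() adds no separator between the pieces
query <- paste0("SELECT ", fields, " FROM Credit_Amortization_Schedule__c")
print(query)
```

Because the result is an ordinary character string, no noquote() call is needed before passing it to a query function.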
The gsubfn package supports string interpolation:
library(gsubfn)
Names <- toString( sprintf("%s '%s'", df$name, df$label) )
fn$identity("select $Names from myTable")
giving:
[1] "select Id 'Record Id', IsDeleted 'Deleted', Name 'ID', Credit__Loan__c 'Loan' from myTable"
Here are some additional examples: SO example 1 and SO example 2.
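If adding a package dependency is not wanted, the same string can probably be built with base R alone. The sketch below reuses the df from the question and swaps fn$identity() for a plain sprintf() template; it is an alternative illustration, not part of gsubfn.

```r
# Data frame from the question
df <- data.frame(
  name  = c("Id", "IsDeleted", "Name", "Credit__Loan__c"),
  label = c("Record Id", "Deleted", "ID", "Loan"),
  stringsAsFactors = FALSE
)

# One "column 'alias'" fragment per row; toString() joins them with ", "
Names <- toString(sprintf("%s '%s'", df$name, df$label))

# %s in the template is replaced by the joined fragment
query <- sprintf("select %s from myTable", Names)
print(query)
```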
I have a list with the following data:
ids <- list(memberPersonID = c("2892056", "2894545", "2894546", "2894548",
"2894550", "2894551", "2894553", "2894555", "2894556"))
I need to use this in a string so that the output looks like the following:
select * from data where id in ('2892056', '2894545', '2894546', '2894548', '2894550', '2894551', '2894553', '2894555', '2894556')
so, I did the following:
ids <- paste0(ids, collapse="','")
query <- paste0("select * from data where id in (", ids, ")")
I keep getting the following error:
select * from teams where teamID = '3246492' and assessmentStatus = 'C' and memberPersonID in (c("2892056", "2894544", "2894545", "2894546", "2894547", "2894548", "2894550", "2894553", "2894555", "2894556"))
<Rcpp::exception: no such function: c>
How do I get rid of 'c(' from the vector so that it looks like the one shown above in the desired output?
You could paste the memberPersonID value into one comma-separated string.
sprintf("select * from data where id in (%s)", paste0(ids$memberPersonID, collapse = ','))
#[1] "select * from data where id in (2892056,2894545,2894546,2894548,2894550,2894551,2894553,2894555,2894556)"
If you want the ids to be surrounded by quotes:
sprintf("select * from data where id in ('%s')", paste0(ids$memberPersonID, collapse = "','"))
#[1] "select * from data where id in ('2892056','2894545','2894546','2894548','2894550','2894551','2894553','2894555','2894556')"
Because ids is a list, you need to use it like this:
ids <- paste0(ids$memberPersonID, collapse="','")
And you need extra ' before and after ids.
query <- paste0("select * from data where id in ('", ids, "')")
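The "c(" in the error appears because paste0() was given the whole list rather than the vector inside it. A small helper can wrap the quoting and joining in one place. The function name sql_in is hypothetical, and doubling embedded single quotes is a basic precaution assumed here, not a replacement for the driver's own quoting.

```r
# Hypothetical helper: quote each value as a SQL string literal,
# doubling embedded single quotes, then join the literals with commas.
sql_in <- function(values) {
  escaped <- gsub("'", "''", as.character(values), fixed = TRUE)
  paste0("'", escaped, "'", collapse = ", ")
}

ids <- list(memberPersonID = c("2892056", "2894545", "2894546"))

# Extract the vector from the list before building the IN (...) clause
query <- sprintf("select * from data where id in (%s)",
                 sql_in(ids$memberPersonID))
print(query)
```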
I am currently building a (large) survey and need to send the responses people provide to a database. I have set up my database connection using the pool and RMariaDB packages, and I have written the following function to construct the SQL queries and submit my data (the data is secured with SSL certificates and all this information is passed through the list db_config).
save_db <- function (db_pool, x, db_name, db_config, replace_val) {
  # Construct the DB query to be sent to the database
  if (!replace_val) {
    query <- sprintf(
      "INSERT INTO %s (%s) VALUES ('%s')",
      db_name,
      paste(names(x), collapse = ", "),
      paste(x, collapse = "', '")
    )
  } else {
    query <- sprintf(
      "UPDATE %s SET %s WHERE %s;",
      db_name,
      paste(paste0(names(x)[-1], " = \'", x[-1], "\'"), collapse = ", "),
      paste0(names(x)[1], " = \'", x[1], "\'")
    )
  }
  # Submit the insert query to the database via the opened connection
  RMariaDB::dbExecute(db_pool, query)
}
db_pool is the pool object handling my database connections; x is a named vector with the data that I am sending to the database, where the names correspond to the column names of my MariaDB and the values are stored as data blobs; db_name is the name of my database; replace_val is a boolean.
The data blobs are essentially different output objects from the survey, e.g. vectors or matrices of responses, turned into character strings using the toJSON() from the jsonlite package.
So far, so good. I am able to send data to the database, download it and reconstruct the responses using the fromJSON() command. All is good. However, I do have one security concern. In my survey, I do have a few open-ended questions where people can write what they want. While unlikely, I am concerned that someone might use a SQL injection attack. Worst case scenario, I lose all my data.
I know of the sqlInterpolate() function from the DBI package. From my understanding, the function escapes any quotation marks, meaning that any value submitted will be turned into a safe string.
What I have not been able to do is modify my function above to work with sqlInterpolate. In my case x is a named vector of length seven where each vector element is a JSON string. Essentially, I need to use sqlInterpolate() on each of the JSON strings. I was wondering if there is an "easy" way of doing this, or if my best course of action would be to completely rewrite my function to send seven individual deposits to the DB, i.e. one for each vector element.
A rather simplified example would be something like this:
library(jsonlite)
# Create some data to test the string on
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
"INSERT INTO %s (%s) VALUES ('%s')",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = "', '")
)
What I would need to interpolate are the values captured by ('%s') in the sprintf() statement (and similarly for the update query). Or would just turning everything into a JSON string already sanitize my DB input?
Any help would be much appreciated.
Having spent several hours trying and failing at this today, I believe I managed to find a workaround. I have done some testing and it appears to be working. I am posting an answer to my own question in case someone has a similar problem later.
My updated function now looks like this:
save_db <- function (db_pool, x, db_name, db_config, replace_val) {
  # Interpolate the elements of x
  x <- do.call(c, lapply(x, function(y) {
    sql <- "?value"
    sqlInterpolate(db_pool, sql, value = y)
  }))
  # Construct the DB query to be sent to the database
  if (!replace_val) {
    query <- sprintf(
      "INSERT INTO %s (%s) VALUES (%s)",
      db_name,
      paste(names(x), collapse = ", "),
      paste(x, collapse = ", ")
    )
  } else {
    query <- sprintf(
      "UPDATE %s SET %s WHERE %s;",
      db_name,
      paste(paste0(names(x)[-1], " = ", x[-1]), collapse = ", "),
      paste0(names(x)[1], " = ", x[1])
    )
  }
  # Submit the insert query to the database via the opened connection
  RMariaDB::dbExecute(db_pool, query)
}
It appears that the key was to only use the interpolation on the actual JSON string itself, like so:
x <- do.call(c, lapply(x, function(y) {
sql <- "?value"
sqlInterpolate(db_pool, sql, value = y)
}))
And the rest of the function can be used as is. To see this, let's use the example I provided in my original question:
y <- 1:3
z <- matrix(runif(4), 2, 2)
q <- c("one", "don't")
x <- c(toJSON(y), toJSON(z), toJSON(q))
names(x) <- c("var_1", "var_2", "var_3")
db_name <- "my_db"
# Current sprintf() statement
sprintf(
"INSERT INTO %s (%s) VALUES ('%s')",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = "', '")
)
Which yields the output:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don't\"]')"
If I now transform my x as above and use the updated sprintf() call (Note that the extra single quotation marks are removed):
x <- do.call(c, lapply(x, function(y) {
sql <- "?value"
sqlInterpolate(ANSI(), sql, value = y)
}))
sprintf(
"INSERT INTO %s (%s) VALUES (%s)",
db_name,
paste(names(x), collapse = ", "),
paste(x, collapse = ", ")
)
I will get:
"INSERT INTO my_db (var_1, var_2, var_3) VALUES ('[1,2,3]', '[[0.6573,0.1726],[0.3291,0.9903]]', '[\"one\",\"don''t\"]')"
And we see that the single quotation mark in don't is correctly escaped (it is doubled). If I have missed something crucial in my own solution, please feel free to comment on it.
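A possible variation on the workaround above is to interpolate all values in one call instead of one call per element. The sketch below assumes that sqlInterpolate() accepts a named list through its .dots argument; the placeholder names v1, v2 are made up for illustration, ANSI() stands in for a real connection as in the example above, and the JSON strings are written out by hand rather than produced by toJSON(). The table and column names are still pasted in unescaped, so they must come from trusted code.

```r
library(DBI)

# Stand-in values; in the question these come from jsonlite::toJSON()
x <- c(var_1 = "[1,2,3]", var_3 = "[\"one\",\"don't\"]")
db_name <- "my_db"

# Illustrative placeholder names: v1, v2, ... (one per value)
placeholders <- paste0("v", seq_along(x))

# Template with one named placeholder per column: ?v1, ?v2
template <- sprintf(
  "INSERT INTO %s (%s) VALUES (%s)",
  db_name,
  paste(names(x), collapse = ", "),
  paste0("?", placeholders, collapse = ", ")
)

# Named list of values, matched to the placeholders by name
values <- setNames(as.list(unname(x)), placeholders)
query <- sqlInterpolate(ANSI(), template, .dots = values)
print(query)
```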
Basically I am trying to derive the WHERE part of a SELECT statement by unlisting and paste-ing a list where the list names represent the database TABLE Columns and the respective list values equal the parameters for the WHERE clause. Here is a simplified example ...
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
lst
$DATE
[1] "2015-10-25"
$NUM
[1] "0001"
$PROD
[1] "SOMETHING"
This would ideally be transformed into (the interesting bit starting in the second line after the WHERE):
paste0("SELECT SOME_COLUMNS WHERE
DATE = '", lst$DATE, "' AND
NUM = '", lst$NUM, "' AND
PROD = '", lst$PROD, "'")
I am quite sure that someone knows of some fancy combination of apply(), paste(..,collapse ="' AND ") and/or substitute() that can accomplish that in an elegant form, but I am stuck.
I don't know if this is elegant enough but it should work:
sql <- paste0("SELECT ",
paste0(names(lst),collapse=','),
" WHERE\n",
paste(lapply(names(lst),function(x)paste0(x," = '",lst[[x]],"'")),collapse=" AND\n"))
> cat(sql)
SELECT DATE,NUM,PROD WHERE
DATE = '2015-10-25' AND
NUM = '0001' AND
PROD = 'SOMETHING'
sprintf is generally useful:
lst <- list(DATE=as.Date('2015-10-25'), NUM="0001", PROD="SOMETHING")
q <- "SELECT SOME_COLUMNS WHERE DATE = '%s' AND NUM = '%s' AND PROD = '%s'"
> sprintf(q,lst[[1]],lst[[2]],lst[[3]])
[1] "SELECT SOME_COLUMNS WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
Also, see my other answer here for more ideas. If you do this a lot, it pays to build up some specialized tools for it, as I outline in that answer.
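The sprintf() template above hard-codes three placeholders. For a list of arbitrary length, a vectorised sprintf() call can build one predicate per element instead. This is a minimal sketch: the helper name where_clause is hypothetical, and doubling single quotes is an assumed basic precaution.

```r
lst <- list(DATE = as.Date("2015-10-25"), NUM = "0001", PROD = "SOMETHING")

# Hypothetical helper: turn a named list into "NAME = 'value'" predicates.
# Each value is converted to character and embedded single quotes are doubled.
where_clause <- function(lst) {
  values <- vapply(
    lst,
    function(v) gsub("'", "''", as.character(v), fixed = TRUE),
    character(1)
  )
  # sprintf() is vectorised over names and values; collapse joins with AND
  paste(sprintf("%s = '%s'", names(lst), values), collapse = " AND ")
}

query <- paste("SELECT SOME_COLUMNS WHERE", where_clause(lst))
print(query)
```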
Are you looking for something like this?
lst2sql <- function(lst) {
sql <- "SELECT col1, col2 FROM table1 WHERE"
predicates <- vapply(names(lst), function(n) { paste(n, " = '", lst[[n]], "'", sep="") }, character(1))
paste(sql, paste(predicates, collapse=" AND "))
}
When called on your example list will produce:
"SELECT col1, col2 FROM table1 WHERE DATE = '2015-10-25' AND NUM = '0001' AND PROD = 'SOMETHING'"
I'm trying to add data to a MySQL table using RMySQL. I only need to add one row at a time, and it's not working. What I'm trying to do is this:
dbGetQuery(con,"INSERT INTO names VALUES(data[1,1], data[1,2])")
I have values in a data frame named "data" that I need to put into a MySQL table. Before that I check whether they are already in the table, and if they are not, I add them; but the statement above isn't working. The data is read from a .csv file with read.csv.
You can use paste to construct the actual query.
data <- matrix(1:4, 2, 2)
query <- paste("INSERT INTO names VALUES(",data[1,1], ",", data[1,2], ")")
query
#[1] "INSERT INTO names VALUES( 1 , 3 )"
dbGetQuery(con, query)
# If there are a lot of columns this could be tedious...
# So we could also use paste to add all the values at once.
query <- paste("INSERT INTO names VALUES(", paste(data[1,], collapse = ", "), ")")
query
#[1] "INSERT INTO names VALUES( 1, 3 )"
You could try with:
dbWriteTable(con, "names", data[1, ], append = TRUE)
as the DBI package documentation details.
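The paste() examples above work for numeric values; character values also need to be wrapped in quotes. A rough sketch of a row-to-INSERT builder follows. The helper names quote_sql_value and build_insert are hypothetical, and the two-column data frame is invented for illustration.

```r
# Hypothetical helper: numbers pass through unquoted, everything else
# becomes a SQL string literal with embedded single quotes doubled.
quote_sql_value <- function(v) {
  if (is.numeric(v)) return(as.character(v))
  sprintf("'%s'", gsub("'", "''", as.character(v), fixed = TRUE))
}

# Hypothetical helper: build one INSERT statement from a one-row data frame
build_insert <- function(table, row) {
  values <- vapply(row, quote_sql_value, character(1))
  sprintf(
    "INSERT INTO %s (%s) VALUES (%s)",
    table,
    paste(names(row), collapse = ", "),
    paste(values, collapse = ", ")
  )
}

# Invented example data; stringsAsFactors = FALSE keeps name as character
data <- data.frame(id = 1L, name = "O'Brien", stringsAsFactors = FALSE)
query <- build_insert("names", data[1, ])
print(query)
```

For anything beyond a quick script, dbWriteTable() or the driver's own quoting functions are likely the safer choice.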
I'm in the process of learning R, to wave SAS goodbye. I'm still new to this and have some difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
paste("insert into TestTable (UniqueID) Values (",b,")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because there is an extra c that is generated, I assume for column when I give the command: paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit: As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
## note paste(b[[1]], collapse = ", ") in line below
paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "),")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can provide weird results because paste must return a character vector.
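The difference between pasting the data.frame and pasting its column can be seen in a short sketch. It also shows one way to build a multi-row VALUES list, assuming the intent is to insert one row per value and that the database accepts several parenthesised groups in one INSERT; both are assumptions, not something stated in the question.

```r
# Stand-in for the data.frame returned by sqlQuery()
b <- data.frame(Noinscr = c(8535735, 8449336))

# paste() on a data.frame deparses each column, hence the "c(...)" text
pasted_frame <- paste(b)
print(pasted_frame)

# One "(value)" group per row, joined with commas
rows <- paste0("(", b$Noinscr, ")", collapse = ", ")
query <- paste0("insert into TestTable (UniqueID) Values ", rows)
print(query)
```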