I have a list with the following data:
ids <- list(memberPersonID = c("2892056", "2894545", "2894546", "2894548",
"2894550", "2894551", "2894553", "2894555", "2894556"))
I need to use this in a string so that the output looks like the following:
select * from data where id in ('2892056', '2894545', '2894546', '2894548', '2894550', '2894551', '2894553', '2894555', '2894556')
so, I did the following:
ids <- paste0(ids, collapse="','")
query <- paste0("select * from data where id in (", ids, ")")
I keep getting the following error:
select * from teams where teamID = '3246492' and assessmentStatus = 'C' and memberPersonID in (c("2892056", "2894544", "2894545", "2894546", "2894547", "2894548", "2894550", "2894553", "2894555", "2894556"))
<Rcpp::exception: no such function: c>
How do I get rid of 'c(' from the vector so that it looks like the one shown above in the desired output?
You could paste the memberPersonID value into one comma-separated string.
sprintf("select * from data where id in (%s)", paste0(ids$memberPersonID, collapse = ','))
#[1] "select * from data where id in (2892056,2894545,2894546,2894548,2894550,2894551,2894553,2894555,2894556)"
If you want the ids to be surrounded by quotes:
sprintf("select * from data where id in ('%s')", paste0(ids$memberPersonID, collapse = "','"))
#[1] "select * from data where id in ('2892056','2894545','2894546','2894548','2894550','2894551','2894553','2894555','2894556')"
Because ids is a list, you need to use it like this:
ids <- paste0(ids$memberPersonID, collapse="','")
You also need an extra ' before and after ids:
query <- paste0("select * from data where id in ('", ids, "')")
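To see where the stray c( comes from, here is a minimal sketch using a shortened copy of the ids list from the question (two ids instead of nine, purely for brevity). paste0() turns each list element into a single string, so an element holding several values is deparsed; extracting the vector first avoids that.

```r
# Shortened copy of the list from the question (assumption: two ids for brevity)
ids <- list(memberPersonID = c("2892056", "2894545"))

# Pasting the list itself deparses the element, which is where c(...) comes from
deparsed <- paste0(ids, collapse = "','")

# Extract the character vector first, then quote each id and collapse
in_list <- paste0("'", ids$memberPersonID, "'", collapse = ", ")
query <- sprintf("select * from data where id in (%s)", in_list)
print(query)
```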
Related
Here is the data frame that I have:
trail_df= data.frame(d= seq.Date(as.Date("2020-01-01"), as.Date("2020-02-01"), by= 1),
AA= NA,
BB= NA,
CC= NA)
Now I loop over the columns of trail_df and, for each column name, fetch the data for the given dates from the Oracle database. I am doing it like this:
for (i in 2:ncol(trail_df)) {
  c_name <- colnames(trail_df)[i]
  # this query returns Date and price for one ID
  query <- paste0("SELECT * FROM tablename WHERE ID = '", c_name, "'")
  result <- dbGetQuery(con, query) # con is the DB connection
  # iterate over every returned row, not just the last index
  for (k in seq_len(nrow(result))) {
    # match the date in trail_df and store the price in the respective column
    trail_df[which(as.Date(result[k, 1]) == as.Date(trail_df[, 1])), i] <- result[k, 2]
  }
}
This is a snippet of the code; the date filtering and the rest are handled in the real code.
The problem is that I have more than 6000 columns and 500 rows, for which I have to match the dates (because the dates are random) and fill in the price, which is taking forever.
I am new to R and would appreciate any help speeding this code up, maybe with multiprocessing if that is possible in R.
There are two steps to this answer:
Use parameterized queries to get the raw data; and
Get this data into the "wide" format you desire.
Parameterized query
My (first) suggestion is to use parameterized queries, which are safer. It may not improve the speed relative to @RonakShah's answer (using sprintf), at least not the first time.
However, it might help a touch if the query is repeated: DBMSes tend to parse/optimize queries and cache this optimization. When a query changes even a little, this caching cannot happen, and the query is re-optimized. In this case, this cache-invalidation is unnecessary, and can be avoided if we use binding parameters.
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s)",
paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
# [1] "SELECT * FROM tablename WHERE ID IN (?,?,?)"
# bind one value per placeholder: the IDs are the column names after the date column
res <- dbGetQuery(con, query, params = as.list(colnames(trail_df)[-1]))
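The placeholder construction above can be wrapped in a small helper. This is only a sketch: build_in_query is a hypothetical name, the ? placeholder style depends on the database driver, and the table name is interpolated directly (it cannot be bound), so it should never come from user input.

```r
# Hypothetical helper: returns the SQL text plus a matching parameter list
build_in_query <- function(table, ids) {
  stopifnot(length(ids) > 0)
  # one "?" per id, e.g. "?,?,?"
  placeholders <- paste(rep("?", length(ids)), collapse = ",")
  list(
    sql = sprintf("SELECT * FROM %s WHERE ID IN (%s)", table, placeholders),
    params = as.list(ids) # one list element per placeholder
  )
}

q <- build_in_query("tablename", c("AA", "BB", "CC"))
print(q$sql)
```

With a live connection, the two pieces would then be passed along the lines of dbGetQuery(con, q$sql, params = q$params).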
Some thoughts:
If the database has many more dates than you have here, you can restrict the data returned by adding a date range to the query. This works well if your trail_df dates are close together:
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s) and Date between ? and ?",
                 paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
# the ID values come first, followed by the earliest and latest date
res <- dbGetQuery(con, query,
                  params = c(as.list(colnames(trail_df)[-1]), as.list(range(trail_df$d))))
If your dates are more variable and you end up querying many more rows than you actually need, I suggest uploading your trail_df dates into a temporary table and running something like:
"select tb.Date, tb.ID, tb.Price
from mytemptable tmp
left join tablename tb on tmp.d = tb.Date
where ..."
Reshape
It appears as if your database table may be more "long" shaped and you want it "wide" in your frame. There are many ways to reshape from long-to-wide (examples), but these should work:
reshape2::dcast(res, Date ~ ID, value.var = "Price") # 'Price' is the 'value' column, unk here
tidyr::pivot_wider(res, id_cols = "Date", names_from = "ID", values_from = "Price")
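If neither package is available, base R's reshape() can do the same long-to-wide step. The sketch below uses made-up rows, and the column names Date, ID and Price are assumptions about what the query returns.

```r
# Made-up long-format result (assumed columns: Date, ID, Price)
res <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02")),
  ID = c("AA", "BB", "AA"),
  Price = c(10, 20, 11)
)

# One row per Date, one column per ID; missing combinations become NA
wide <- reshape(res, idvar = "Date", timevar = "ID", direction = "wide")

# reshape() names the new columns "Price.AA", "Price.BB"; drop the prefix
names(wide) <- sub("^Price\\.", "", names(wide))
print(wide)
```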
I have written the following code:
my_values <- dbGetQuery(con, stri_encode(stores_query, to = "UTF-8")) %>%
as.data.table()
table_query <- glue("
SELECT
*
FROM MY_DB
WHERE
values IN {my_values} AND
LIMIT 100
")
The output is:
SELECT
*
FROM MY_DB
WHERE
values IN c("john", "mike", "alex") AND
LIMIT 100
However I need this:
SELECT
*
FROM MY_DB
WHERE
values IN ('john', 'mike', 'alex') AND
LIMIT
As you can see, I need to get rid of the vector sign "c" to make my query work. I also need the values to be in '', not in "". How can I do this?
Use glue_sql(), and add * after the variable name placeholder inside {…}:
table_query <- glue_sql("
SELECT
*
FROM MY_DB
WHERE
values IN ({my_values*}) AND
LIMIT 100
", .con = con)
SELECT
*
FROM MY_DB
WHERE
values IN ('john', 'mike', 'alex') AND
LIMIT 100
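As a rough base R illustration of what the expanded placeholder amounts to (not of how glue implements it): each value is wrapped in single quotes, embedded single quotes are doubled, and the pieces are joined with commas. The helper name quote_sql_string and the value "o'neil" are made up for this sketch; in real code, glue_sql() or bound parameters remain the safer route.

```r
# Hypothetical helper: single-quote a string and double any embedded quotes
quote_sql_string <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

my_values <- c("john", "mike", "o'neil") # made-up values
in_clause <- paste(quote_sql_string(my_values), collapse = ", ")

# The dangling AND from the question is left out so the statement is complete
table_query <- sprintf("SELECT * FROM MY_DB WHERE values IN (%s) LIMIT 100", in_clause)
print(table_query)
```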
PostgreSQL table structure looks like
if (!dbExistsTable(pg, c("store", "a"))) {
dbGetQuery(pg, "
CREATE TABLE a (
company_name text,
num text,
date date,
file text
)
")
}
and R dataframe is
df <- data.frame(num=c('13','15', '100', '700'))
What I want is to search for every value of the df column num in PostgreSQL, something like:
file.list <- dbGetQuery(pg, "
SET work_mem='1GB';
SELECT *
FROM store.a
WHERE num = ?????
")
I don't know how to create the WHERE clause (or a loop) to get the desired results. Is it also possible to restrict the results to a specific date or year?
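One possible shape for such a query, sketched without a database connection: an IN list with one numbered placeholder per value of df$num, plus a half-open date range for a single year. The $1, $2, ... placeholder style is PostgreSQL's own; whether the driver in use accepts a params argument for it is an assumption, and the year 2020 is only an example.

```r
df <- data.frame(num = c("13", "15", "100", "700"))
nums <- as.character(df$num)
n <- length(nums)

# "$1, $2, $3, $4": one numbered placeholder per value
num_marks <- paste0("$", seq_len(n), collapse = ", ")

# Two further placeholders bound the date range (start inclusive, end exclusive)
sql <- paste0(
  "SELECT * FROM store.a WHERE num IN (", num_marks, ")",
  " AND date >= $", n + 1L, " AND date < $", n + 2L
)

# Values in placeholder order; the dates are example values for the year 2020
params <- c(as.list(nums), list("2020-01-01", "2021-01-01"))
print(sql)
```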
I would like to take the values from a data frame and paste them into a text string that can be used as a SQL query. In SAS I would do it like this:
proc sql noprint; Select Names into :names separated by ", " from df; quit;
This would create a macro variable &names storing all the names, like: Id, Name, Account. I would like to do the same kind of thing in R, but do not know how. I can create a vector of names separated by commas, each surrounded by quotes, and I can remove the quotes with the noquote function, but I cannot get the elements into another paste statement that adds the SELECT and FROM. Is there a way to pull the values from a column and create a text string that can be used as a SQL query inside R? Here is what I have tried in R:
name = c("Id", "IsDeleted", "Name", "Credit__Loan__c")
label = c("Record Id", "Deleted", "ID", "Loan")
df = data.frame(name, label)
names(df) <- c("name", "label")
as.query.fields = noquote(paste(df$name, collaspe=", "))
as.query.final <- paste("SELECT " , noquote(paste(df$name, collaspe=", ")), " id FROM Credit_Amortization_Schedule__c")
data(iris)
colnames(iris)
# collapse (not "collaspe") joins the names into a single string
a <- paste(colnames(iris), collapse = ", ")
# build the query with paste0(); cat() only prints and returns NULL
as.query.final <- paste0("SELECT ", a, ", id FROM Credit_Amortization_Schedule__c")
The result is:
SELECT Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, id FROM Credit_Amortization_Schedule__c
which you can then use with SQL like this:
require(RODBC)
result <- sqlQuery(db, as.query.final)
where db is your database connection
Or, since I see your sqldf tag now, if you want to use sqldf it's just:
sqldf(as.query.final)
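The attempt in the question misspells collapse as collaspe. paste() does not reject unknown argument names; it treats the misspelled argument as one more string to paste onto every element, so the result stays a vector. A small sketch of the difference, using the names from the question:

```r
name <- c("Id", "IsDeleted", "Name", "Credit__Loan__c")

# Misspelled argument: ", " is pasted onto each element; still four strings
typo <- paste(name, collaspe = ", ")

# Correct spelling: the elements are joined into one string
fixed <- paste(name, collapse = ", ")

query <- paste("SELECT", fixed, "FROM Credit_Amortization_Schedule__c")
print(typo)
print(query)
```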
The gsubfn package supports string interpolation:
library(gsubfn)
Names <- toString( sprintf("%s '%s'", df$name, df$label) )
fn$identity("select $Names from myTable")
giving:
[1] "select Id 'Record Id', IsDeleted 'Deleted', Name 'ID', Credit__Loan__c 'Loan' from myTable"
Here are some additional examples: SO example 1 and SO example 2.
I'm in the process of learning R so that I can wave SAS goodbye. I'm still new to this and have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
paste("insert into TestTable (UniqueID) Values (",b,")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that this happens because an extra c is generated (I assume it stands for column) when I call paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit: As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
## note paste(b[[1]], collapse = ", ") in line below
paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "),")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
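You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can give weird results: paste must return a character vector, so it converts each column to a single string, and a column holding more than one value comes out as deparsed text such as c(8535735, 8449336).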
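One further point, offered as a sketch rather than a tested fix: a single parenthesised list such as Values (8535735,8449336) is one row with two columns, while TestTable (UniqueID) names only one column. If the goal is one row per id, and assuming the database accepts multi-row VALUES lists, each id can be wrapped in its own parentheses:

```r
# Stand-in for the data.frame returned by sqlQuery (values from the question)
b <- data.frame(Noinscr = c(8535735L, 8449336L))

# One "(id)" group per row, joined with commas
rows <- paste0("(", b[[1]], ")", collapse = ", ")

query <- paste0("insert into TestTable (UniqueID) Values ", rows)
print(query)
```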