I'm using an RJDBC connection to query results from a Vertica database into R. I'm creating a comma-separated string of zip codes that I then paste into my query, as shown below.
b <- paste("'20882'", "'01441'", "'20860'", "'02139'", sep = ", ")
SQL <- paste("select zip, count(*)
from tablea a
inner join tableb b on a.id = b.id
inner join tablec c on c.col = b.col
where b.zip in (", b, ") group by 1 order by 1", sep = " ")
result <- dbGetQuery(vertica, SQL)
I'm using this in a loop within a function in which I'm going to be adding zip codes to vector b. I was wondering if there was a way to easily do this?
I've been trying, but I'm unable to add items to vector in a way where the query would execute.
Something like the following
b <- c(add_zip, b)
which could then be re-run in the body of the query.
Any suggestions?
Thanks,
Ben
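One way to approach this, as a minimal sketch rather than a definitive answer: keep the zip codes as a plain character vector and add the quotes and commas only when the query text is assembled. The helper name build_zip_query and the value of add_zip are hypothetical; the table and column names are copied from the question, and qualifying the selected column as b.zip is an assumption.

```r
# Keep zip codes as a plain character vector; quote them only when building SQL.
zips <- c("20882", "01441", "20860", "02139")

# Hypothetical helper: wraps each zip in single quotes and joins them with commas.
build_zip_query <- function(zips) {
  # Refuse anything that is not a 5-digit zip before pasting it into SQL.
  stopifnot(all(grepl("^[0-9]{5}$", zips)))
  in_list <- paste0("'", zips, "'", collapse = ", ")
  paste0(
    "select b.zip, count(*) ",
    "from tablea a ",
    "inner join tableb b on a.id = b.id ",
    "inner join tablec c on c.col = b.col ",
    "where b.zip in (", in_list, ") ",
    "group by 1 order by 1"
  )
}

# Inside the loop: grow the vector, then rebuild the query from it.
add_zip <- "10001"                      # hypothetical new zip code
zips <- unique(c(add_zip, zips))
SQL <- build_zip_query(zips)
# result <- dbGetQuery(vertica, SQL)    # needs the live connection from the question
```

Because the vector holds bare zip codes, c(add_zip, zips) works as hoped, and the quoting lives in one place.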
Here is the data frame that I have:
trail_df= data.frame(d= seq.Date(as.Date("2020-01-01"), as.Date("2020-02-01"), by= 1),
AA= NA,
BB= NA,
CC= NA)
Now I want to loop over the columns of trail_df and, for each column name, fetch the data from the Oracle database for the given dates. I am doing it like this:
for ( i in 2:ncol(trail_df)){
c_name = colnames(trail_df)[i]
query = paste0("SELECT * FROM tablename WHERE ID= '",c_name,"' ") # this query would return Date and price
result= dbGetQuery(con, query) # con is the connection variable from db
for (k in seq_len(nrow(result))){
trail_df [which(as.Date(result[k,1])==as.Date(trail_df[,1])),i]= result[k,2]
# just matching the date in trail_df dataframe and pasting the value in front of respective column
}
}
This is a snippet of the code; the date filtering and so on are taken care of in the real code.
The problem is that I have more than 6000 columns and 500 rows, for which I have to match the dates (BECAUSE THE DATES ARE RANDOM) and put the price in the matching cell, which is taking forever.
I am new to R and would appreciate any help speeding up this code, maybe with multiprocessing if that is possible in R.
There are two steps to this answer:
Use parameterized queries to get the raw data; and
Get this data into the "wide" format you desire.
Parameterized query
My (first) suggestion is to use parameterized queries, which are safer. It may not improve the speed relative to @RonakShah's answer (using sprintf), at least not the first time the query runs.
However, it might help a touch if the query is repeated: DBMSes tend to parse/optimize queries and cache this optimization. When a query changes even a little, this caching cannot happen, and the query is re-optimized. In this case, this cache-invalidation is unnecessary, and can be avoided if we use binding parameters.
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s)",
paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
# [1] "SELECT * FROM tablename WHERE ID IN (?,?,?)"
res <- dbGetQuery(con, query, params = as.list(names(trail_df)[-1]))
Some thoughts:
if the database has many more dates than what you have here, you can restrict the data returned by reducing the date range queries. This will work well if your trail_df dates are close together:
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s) and Date between ? and ?",
paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
res <- dbGetQuery(con, query, params = c(as.list(names(trail_df)[-1]), as.list(range(trail_df$d))))
if your dates are more variable and you end up querying many more rows than you actually need, I suggest you upload your trail_df dates into a temporary table and run something like:
"select tb.Date, tb.ID, tb.Price
from mytemptable tmp
left join tablename tb on tmp.d = tb.Date
where ..."
Reshape
It appears that your database table is "long" shaped and you want it "wide" in your frame. There are many ways to reshape from long to wide, but these should work:
reshape2::dcast(res, Date ~ ID, value.var = "Price") # 'Price' is the 'value' column, unk here
tidyr::pivot_wider(res, id_cols = "Date", names_from = "ID", values_from = "Price")
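If neither package is available, the same long-to-wide step, including the date matching that the original loop did row by row, can be sketched in base R with match() and matrix indexing. This is only an illustration on toy data: the column names Date, ID and Price are assumptions about what the query returns, and dates stands in for the d column of trail_df.

```r
# Toy stand-in for the query result: one row per (Date, ID) pair.
res <- data.frame(
  Date  = as.Date(c("2020-01-01", "2020-01-03", "2020-01-01")),
  ID    = c("AA", "AA", "BB"),
  Price = c(10, 11, 20)
)
# Stand-in for the date column of trail_df.
dates <- data.frame(d = seq.Date(as.Date("2020-01-01"), as.Date("2020-01-03"), by = 1))

ids <- unique(res$ID)
# Start from an all-NA price matrix: one row per date, one column per ID.
prices <- matrix(NA_real_, nrow = nrow(dates), ncol = length(ids),
                 dimnames = list(NULL, ids))

# Locate every result row's target cell in one vectorised step.
rows <- match(res$Date, dates$d)
cols <- match(res$ID, ids)
keep <- !is.na(rows)                  # drop dates that are not in the frame
prices[cbind(rows[keep], cols[keep])] <- res$Price[keep]

trail_df <- cbind(dates, as.data.frame(prices))
trail_df
```

Dates with no price stay NA, and no explicit loop over rows or columns is needed.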
I'm trying to use text substitution in R to put custom dates into the SQL query I send over my ODBC connection.
For example, I would like to change date1 to 2016-01-31 and have the query pick it up automatically. However, text replacement with bquote doesn't seem to work.
Any ideas?
library("RODBC")
date1 <- c("2016-12-31")
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,"
SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<=bquote(.date1)
) AS a
")
odbcClose(myconn)
This is a job for package infuser. It allows you to change one part of the SQL request, in this case date1.
library(RODBC)
library(infuser)
date1 <- c("2016-12-31")
sql_query_template <- "SELECT a.*
FROM (SELECT id
,status_code
,rate_plan
,publication
,active_count
FROM prod_view.fct_active
WHERE snap_start_date<='{{date1}}'
) AS a;"
sql_query <- infuse(sql_query_template, date1 = date1)
myconn <- odbcConnect("edwPROD",uid="username",pwd="BBBBB")
data1 <- sqlQuery(myconn,sql_query)
odbcClose(myconn)
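If adding a package is not an option, a rough base R equivalent is sprintf() combined with a check that the value really is a date. This is a sketch under that assumption, not a replacement for proper parameter binding; the variable name safe_date is made up for the example.

```r
date1 <- "2016-12-31"

# as.Date() raises an error unless the string starts with a parseable date,
# and format() re-emits only the yyyy-mm-dd part before it reaches the SQL text.
safe_date <- format(as.Date(date1), "%Y-%m-%d")

sql_query <- sprintf(
  "SELECT a.*
   FROM (SELECT id, status_code, rate_plan, publication, active_count
         FROM prod_view.fct_active
         WHERE snap_start_date <= '%s') AS a",
  safe_date
)
cat(sql_query, "\n")
# data1 <- sqlQuery(myconn, sql_query)   # needs the RODBC connection from the question
```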
First I am executing the following R commands, which return a set of records from PostgreSQL:
col_qry <- paste("select column_name from table1",sep="")
rs_col <- dbSendQuery(r,col_qry)
temp_list <- fetch(rs_col,n=-1)
The data returned is displayed in the following format when printed in R using print(temp_list):
column_name
1 col1
2 col2
3 col3
4 col4
Now based on this returned data I want to generate another sql statement which should be like this
copy (select "col1","col2","col3","col4" from table2 )
When I do this
tmp_cp <- paste("copy (select ",col_list,",","from table2",sep="")
and print tmp_cp, then instead of one copy statement a bunch of copy statements are printed, one for each column name inside the select, like this
copy (select col1 from table2 )
copy (select col2 from table2 )
copy (select col3 from table2 )
copy (select col4 from table2 )
and so on...
I want only one copy statement with all column names mentioned together, each quoted with "" and separated by ,. How can I do that?
UPDATE: When I am using these statement
col_list <- toString(shQuote(temp_list$column_name))
tmp_cp <- paste("copy (select ",col_list,",","from table2",sep="")
then only one statement is generated but the column names are inside single quote instead of double quotes like this :
copy (select 'col1','col2','col3','col4' from table2 )
NOTE: I have mentioned 4 columns above, but there can be many more. I have shown 4 columns only for the sake of explanation.
Try this:
library(gsubfn)
sql <- fn$identity(
"select `toString(shQuote(temp_list$column_name, 'cmd'))` from table2"
)
giving:
> sql
[1] "select \"col1\", \"col2\", \"col3\", \"col4\" from table2"
> cat(sql, "\n")
select "col1", "col2", "col3", "col4" from table2
This would work too and does not require any packages:
sprintf("select %s from table2",
toString(shQuote(temp_list$column_name, 'cmd')))
Nested paste with the collapse argument:
paste("copy (select", paste(cols, collapse=", "), "from table2)")
If you want quoted column names:
paste("copy (select", paste(shQuote(cols, "cmd"), collapse=", "), "from table2)")
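One caveat with shQuote(type = "cmd"): it escapes an embedded double quote with a backslash, whereas SQL identifiers conventionally escape it by doubling it. For ordinary column names this makes no difference. A minimal base R sketch of the doubling variant follows; quote_ident is a hypothetical helper name, and with a live connection DBI::dbQuoteIdentifier() would be the more robust route.

```r
cols <- c("col1", "col2", "odd\"name")   # last name contains a double quote

# Hypothetical helper: wrap in double quotes, doubling any embedded double quote.
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

sql <- sprintf("copy (select %s from table2)",
               paste(quote_ident(cols), collapse = ", "))
cat(sql, "\n")
```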
I'm trying to add data to a MySQL table using RMySQL. I only need to add one row at a time, and it's not working. What I'm trying to do is this:
dbGetQuery(con,"INSERT INTO names VALUES(data[1,1], data[1,2])")
I have values in a data frame named "data" that I need to put into a MySQL table. Before that, I check whether they are already in the table, and if they are not, I add them, but the statement above isn't working. The data is read from a .csv file with read.csv.
You can use paste to construct that actual query.
data <- matrix(1:4, 2, 2)
query <- paste("INSERT INTO names VALUES(",data[1,1], ",", data[1,2], ")")
query
#[1] "INSERT INTO names VALUES( 1 , 3 )"
dbGetQuery(con, query)
# If there are a lot of columns this could be tedious...
# So we could also use paste to add all the values at once.
query <- paste("INSERT INTO names VALUES(", paste(data[1,], collapse = ", "), ")")
query
#[1] "INSERT INTO names VALUES( 1, 3 )"
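Since the real data comes from read.csv, some columns are likely to be character, and pasting them in as-is would leave them unquoted in the SQL. A rough sketch of one way to handle that is below; sql_literal is a hypothetical helper, the example row is made up, and the escaping shown (doubling single quotes) does not cover every MySQL edge case such as backslashes.

```r
# Made-up one-row example with a character and a numeric column.
data <- data.frame(name = "O'Brien", age = 42, stringsAsFactors = FALSE)

# Hypothetical helper: numbers pass through, everything else is single-quoted
# with embedded single quotes doubled.
sql_literal <- function(x) {
  if (is.numeric(x)) return(as.character(x))
  paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
}

# A one-row data frame is a list of length-1 columns, so vapply visits each cell.
values <- vapply(data[1, ], sql_literal, character(1))
query <- paste0("INSERT INTO names VALUES (", paste(values, collapse = ", "), ")")
query
# dbGetQuery(con, query)   # needs the RMySQL connection from the question
```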
You could try with:
dbWriteTable(con, "names", data[1, ], append = TRUE)
as described in the DBI package documentation.
I'm in the process of learning R so that I can wave SAS goodbye. I'm still new to this, and I sometimes have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
paste("insert into TestTable (UniqueID) Values (",b,")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because there is an extra c that is generated, I assume for column when I give the command: paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit: As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
## note paste(b[[1]], collapse = ", ") in line below
paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "),")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can provide weird results because paste must return a character vector.
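A statement of the form Values (8535735,8449336) supplies two values for the single column UniqueID. If the goal is one inserted row per ID, each value needs its own parenthesised tuple. A minimal sketch, assuming SQL Server 2008 or later (which accepts several row tuples in one VALUES clause):

```r
b <- data.frame(Noinscr = c(8535735, 8449336))

# One "(value)" tuple per row, separated by commas.
tuples <- paste0("(", b$Noinscr, ")", collapse = ", ")
query <- paste0("insert into TestTable (UniqueID) Values ", tuples)
query
# sqlQuery(channel, query)   # needs the RODBC channel from the question
```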