I have read-only access to a Postgres database; I cannot write to the database.
Q. Is there a way to construct and run a SQL query where I join a data frame (or other R object) to a table in a read-only Postgres database?
This is for accessing data from WRDS, https://wrds-www.wharton.upenn.edu/
Here's an attempt, in pseudocode:
# establish a connection to the database
library(DBI)
library(RPostgres)

con <- dbConnect(Postgres(),
                 host = 'host.org',
                 port = 1234,
                 dbname = 'db_name',
                 sslmode = 'require',
                 user = 'username',
                 password = 'password')
# create an R data frame (or other object) holding the keys to join on
df <- data.frame(customer_id = c('a123', 'a-345', 'b0'))

# the SQL query we would like to run -- pseudocode, since df exists
# only in the R session, not in the database
sql_query <- "
  SELECT df.customer_id, t.*
  FROM df
  LEFT JOIN table_name t
    ON df.customer_id = t.customer_id
"

res <- dbSendQuery(con, sql_query)
my_query_results <- dbFetch(res)
dbClearResult(res)
my_query_results
Note and edit: the example query above is intentionally very simple, for illustration purposes. In my actual queries there might be 3 or more columns to join on, and millions of rows to join.
Use the copy_inline function from the dbplyr package, which was added following an issue filed on this topic. See also the question here.
An example of its use is found here.
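A minimal sketch of that approach, assuming the connection con and data frame df from the question; copy_inline() embeds the rows in the generated SQL (as a VALUES clause), so no write access is needed:
library(dplyr)
library(dbplyr)

# turn the local data frame into a lazy remote table; its rows travel
# inside the SQL itself, so nothing is written to the database
df_remote <- copy_inline(con, df)

result <- df_remote %>%
  left_join(tbl(con, "table_name"), by = "customer_id") %>%
  collect()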
If your join is on a single condition, it can be rewritten using an IN clause.
In SQL:
SELECT customer_id
FROM table_name
WHERE customer_id in ('a123', 'a-345', 'b0')
Programmatically from R:
sql_query <- sprintf(
  "SELECT customer_id
   FROM table_name
   WHERE customer_id IN (%s)",
  paste(sQuote(df$customer_id, q = FALSE), collapse = ", ")
)
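If the IDs can contain quote characters, it may be safer to let DBI do the quoting; a sketch of the same query, assuming the connection con from above:
ids <- paste(DBI::dbQuoteString(con, df$customer_id), collapse = ", ")
sql_query <- sprintf(
  "SELECT customer_id FROM table_name WHERE customer_id IN (%s)",
  ids
)
result <- DBI::dbGetQuery(con, sql_query)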
Some time ago I asked this question: Speeding up PostgreSQL queries (Check if entry exists in another table)
But since I'm working with DBI, with dbplyr as the backend, I'd like to know what the dbplyr equivalent of PostgreSQL's EXISTS function is.
For now, I'm performing the query using literal SQL syntax:
myQuery <- 'SELECT "genomic_accession",
"assembly",
"product_accession",
"tmpcol",
( EXISTS (SELECT 1
FROM "cachedb" c
WHERE c.product_accession IN ( pt.product_accession, pt.tmpcol
)) )
AS CACHE,
( EXISTS (SELECT 1
FROM "sbpdb" s
WHERE s.product_accession IN ( pt.product_accession, pt.tmpcol
)) )
AS SBP
FROM (SELECT *
FROM "pairtable2") pt; '
tmp <- dbGetQuery(db, myQuery)
Then I tried to pass literal SQL instructions to mutate():
pairTable %>%
head(n=5000) %>%
mutate(
CACHE = sql('EXISTS( select 1 FROM "cacheDB" AS c
WHERE c.product_accession IN ( product_accession, tmpcol) )' ),
SBP = sql('EXISTS( select 1 FROM "SBPDB" AS s
WHERE s.product_accession IN ( product_accession, tmpcol) )' )
)
But this way, for reasons I don't understand, I lose all the rows where the comparison is false.
I expect there is an implementation of this in dbplyr, or even some DBI method for it.
Thanks
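For what it's worth, dbplyr does translate semi_join() and anti_join() into EXISTS / NOT EXISTS subqueries. That filters rows rather than adding a TRUE/FALSE column, but here is a minimal sketch of the filtering form, assuming lazy tbls over the question's tables:
library(dplyr)
library(dbplyr)

pair_tbl  <- tbl(db, "pairtable2")
cache_tbl <- tbl(db, "cachedb")

# keep only rows of pairtable2 whose product_accession has a match in
# cachedb; dbplyr renders this as WHERE EXISTS (SELECT 1 FROM cachedb ...)
matched <- pair_tbl %>%
  semi_join(cache_tbl, by = "product_accession")

show_query(matched)  # inspect the generated SQL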
I have a data frame in R containing 10 rows and 7 columns. There's a stored procedure that does a few logic checks in the background and then inserts the data into the table 'commodity_price'.
library(RMySQL)

# connection settings
mydb <- dbConnect(MySQL(),
                  user = 'uid',
                  password = 'pwd',
                  dbname = 'database_name',
                  host = 'localhost')
# list the tables
dbListTables(mydb)

f <- data.frame(
  location = rep('Bhubaneshwar', 4),
  sourceid = c(8, 8, 9, 2),
  product = c("Ingot", "Ingot", "Sow Ingot", "Alloy Ingot"),
  Specification = c('ie10', 'ic20', 'se07', 'se08'),
  Price = c(14668, 14200, 14280, 20980),
  currency = rep('INR', 4),
  uom = rep('INR/MT', 4)
)
For inserting multiple rows, there's a pre-created stored proc 'PROC_COMMODITY_PRICE_INSERT' which I need to call.
for (i in 1:nrow(f))
{
dbGetQuery(mydb,"CALL PROC_COMMODITY_PRICE_INSERT(
paste(f$location[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification[i],',',
f$Price[i],',',f$currency[i],',', f$uom[i],',',#xyz,')',sep='')
);")
}
I am repeatedly getting this error:
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification' at line 2
I tried using RODBC but it's not getting connected at all. How can I insert the data from the R data frame into the 'commodity_price' table by calling the stored proc? Thanks in advance!
That is probably due to your use of quotes: the paste() call is inside the query string, so it is sent to MySQL literally instead of being evaluated by R. This might work:
for (i in 1:nrow(f)) {
  dbGetQuery(mydb, paste0("CALL PROC_COMMODITY_PRICE_INSERT(",
                          f$location[i], ',', f$sourceid[i], ',',
                          f$product[i], ',', f$Specification[i], ',',
                          f$Price[i], ',', f$currency[i], ',',
                          f$uom[i], ',', "#xyz", ");"))
}
or the one-liner:
dbGetQuery(mydb,paste0("CALL PROC_COMMODITY_PRICE_INSERT('",apply(f, 1, paste0, collapse = "', '"),"');"))
Or, sticking with the for loop and quoting each value:
for (i in 1:nrow(f)) {
  dbGetQuery(mydb, paste0("CALL PROC_COMMODITY_PRICE_INSERT(",
                          "'", f$location[i], "','", f$sourceid[i], "','",
                          f$product[i], "','", f$Specification[i], "','",
                          f$Price[i], "','", f$currency[i], "','",
                          f$uom[i], "',", "#xyz", ");"))
}
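A tidier variant (a sketch, untested against the stored proc) is to let DBI quote each value, which also copes with apostrophes inside the data:
for (i in seq_len(nrow(f))) {
  # quote every value safely; "#xyz" is kept verbatim from the question
  vals <- vapply(f[i, ],
                 function(x) as.character(DBI::dbQuoteString(mydb, as.character(x))),
                 character(1))
  sql <- paste0("CALL PROC_COMMODITY_PRICE_INSERT(",
                paste(vals, collapse = ", "), ", #xyz);")
  dbGetQuery(mydb, sql)
}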
I have an R script for data analysis. I tried it on 6 different tables from my MySQL database. On 5 of them the script works fine, but on the last table it doesn't. Here is the relevant part of my code:
sql <- ""
#write union select for just one access to database which will optimize code
for (i in 2:length(awq)-1){
num <- awq[i]-1
sql <- paste(sql, "(SELECT * FROM mytable LIMIT ", num, ",1) UNION ")
}
sql <- paste(sql, "(SELECT * FROM mytable LIMIT ", awq[length(awq)-1], ",1)")
#database query
nb <- dbGetQuery(mydb, sql)
The MySQL table where the script doesn't work has 21,676 rows; my other tables have under 20,000 rows, and with them the script works. When it fails, it gives me this error:
Error in .local(conn, statement, ...) :
could not run statement: memory exhausted near '1) UNION (SELECT * FROM mytable LIMIT 14107 ,1) UNION (SELECT * FROM mytabl' at line 1
I understand there is a memory problem, but how do I solve it? I don't want to delete rows from my table. Is there another way?
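One way around the server's statement-size limit is to send the UNION in batches instead of one giant query; a sketch, assuming awq holds 1-based row positions as in the loop above (UNION ALL keeps duplicate rows, unlike the UNION in the question):
batch_size <- 1000
offsets <- awq - 1                       # LIMIT offsets are 0-based
nb <- data.frame()
for (start in seq(1, length(offsets), by = batch_size)) {
  chunk <- offsets[start:min(start + batch_size - 1, length(offsets))]
  sql <- paste0("(SELECT * FROM mytable LIMIT ", chunk, ",1)",
                collapse = " UNION ALL ")
  nb <- rbind(nb, dbGetQuery(mydb, sql))
}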
I'm reading 3 million records from a table and I want to write them to a text file, but the program runs out of memory, throwing this error:
Exceeded maximum space of Memory 3096 MB.
My system configuration is an i5 processor with 4 GB of RAM. Please find my code below.
library(RODBC)
con <- odbcConnect("REGION", uid="", pwd="")
a <- sqlQuery(con, "SELECT * FROM dbo.GERMANY where CHARGE_START_DATE = '04/01/2017'");
write.table(a,"C:/Users/609354986/Desktop/R/Data/1Germany.txt",na="",sep="|",row.names = FALSE,col.names = FALSE)
close(con)
What you can do is add an index column to your DB table, so that you can loop through it and extract/write your data piece by piece without filling up your memory. Here is an example:
# create the index column (Oracle-style syntax shown; adapt the type
# and rownum for your database)
sqlQuery(channel, 'ALTER TABLE dbo.GERMANY ADD MY_COL NUMBER')
sqlQuery(channel, 'UPDATE dbo.GERMANY SET MY_COL = rownum')

# the function: fetch and write one chunk per iteration, using
# half-open ranges so no row lands in two chunks
g <- function(a) {
  for (i in 1:(length(a) - 1)) {
    query <- gsub('\n', ' ', paste("SELECT * FROM dbo.GERMANY WHERE
      CHARGE_START_DATE = '04/01/2017' AND
      MY_COL >=", a[i], "AND MY_COL <", a[i + 1], collapse = ' '))
    df <- sqlQuery(channel, query)
    write.csv(df, paste0('my_', i, '_df.csv'))
  }
}
# use reasonable chunks; the final boundary is one past the last row
a <- seq(1, 3000001, by = 250000)
g(a)
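An alternative that avoids altering the table is to page through a single result set: RODBC's sqlGetResults() can be called repeatedly after odbcQuery() to fetch the next max rows each time. A sketch, assuming the same DSN and query:
library(RODBC)
con <- odbcConnect("REGION", uid = "", pwd = "")
odbcQuery(con, "SELECT * FROM dbo.GERMANY WHERE CHARGE_START_DATE = '04/01/2017'")
repeat {
  chunk <- sqlGetResults(con, max = 100000)   # next 100,000 rows
  if (!is.data.frame(chunk) || nrow(chunk) == 0) break
  write.table(chunk, "Germany.txt", append = TRUE, na = "", sep = "|",
              row.names = FALSE, col.names = FALSE)
}
close(con)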