Easy way to Copy-Paste SQL Query into R - r

I'm trying to move all my data-processes into R from SQL using odbc since that's where I do all my data cleaning/analysis anyways. In moving my queries to R, I haven't been able to find an easy way to just copy-paste the strings into a readable format. Say I have the following query in SQL:
SELECT
col1,
sum(col2) sum_col2,
col3
FROM
db.table1 t1
WHERE
col1 > 0
AND col6 BETWEEN .5 AND 1.76
GROUP BY
col1,
col3
If I were to try to assign this to a querystring variable in R, I'd put it in a paste, but even then I'd have to go through and separate each line with a comma for it to pass through getDBQuery correctly. Has anyone found an elegant way to copy and paste SQL syntax into R that doesn't require too many fixes? Is there an option in paste that allows you to ignore new lines '\n', or could I create a custom function?
Thank you

As Benjamin explains in the comments, you can simply put your text in quotation marks. Example:
library(sqldf)
df = data.frame(x=c(1,2,3,4,5),y=c(2,2,3,3,3))
my_query = 'SELECT DISTINCT
y FROM
df
where x>3'
df2 <- sqldf(my_query)

there are also tools which generate out of plain SQL the apropriate Java,C#, VB, etc. statement. Just cut & paste.
The formatter I wrote for this purpose is SQLinForm which has several free formatter according to which environment you are working on. Link is enter link description here

Related

RSQLite dbGetQuery with input from Data Frame

I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This give me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
this code should work
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Please vote my solution if it is helpful.
Using paste to paste in data into a query is generally a bad idea, due to SQL injection (whether truly injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized query every time it sees the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.

RODBC gives proper row count but yields empty query

Using R-3.5.0 and RODBC v. 1.3-15 on Windows.
I am trying to query data from a remote database. I can connect fine and if I do a query to count the rows, the answer comes out correctly. But if I try to remove the count statement select count(*) and actually get the data via select *, I yield an empty query (with some rather strange headers). Only two of the column names come out correctly and the rest are question marks and a number (as shown below). I can using sql developer to query the data no problem.
I include the simplest version of the code below but I get the same results if I try to limit to just a few rows or certain conditions, etc. Sorry I cannot create a reproducible example but as this is a remote db and I have no idea what the problem is, I'm not sure how I could even do that.
I can query other tables from different schemas within the same odbc connection, so I don't think it is that. I have tried with and without the believeNRows and the rows_at_time.
Thank you for any thoughts.
channel <- odbcConnect("mydb", uid="myuser", pwd="mypass", believeNRows=FALSE,rows_at_time = 1)
myquery <- paste("select count(*) from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
COUNT(*)
1 149712361
myquery <- paste("select * from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
[1] ID FMC_IN_ID ? ?.1 ?.2 ?.3 ?.4 ?.5 ?.6 ?.7 ?.8 ?.9 ?.10 ?.11 ?.12 ?.13 ?.14 ?.15
<0 rows> (or 0-length row.names)
I would try the following:
add a simple limit 100 to your query to see if you can get some data back
add the believeNRows option to the sqlQuery call -- in my experience it is needed at that level
In case it helps others, the problem was that the database contained an Oracle spatial field (MDSYS.SDO_GEOMETRY). R did not know what to do with it. I assumed it would just convert it to a character but instead it just got confused. By omitting the spatial field, the query worked fine.

Please assist in understanding an R import from csv

I have R code like this, that I cannot execute because I do not have rights to install the packages, so need some help understanding what it is doing.
raw_data<- read.csv("raw_data.csv")
attach(raw_data)
raw_data$new_col<- raw_data$Employee.Name
raw_data <- select(raw_data, - Employee.Name)
Am I correct that line 3 is creating a new field called new_col and assigning the value from the csv field Employee Name . The . is supposed to mask the space between Employee and Name
In the 4th line, we are just dropping the original column from the dataset?
Yes, the fourth line (raw_data <- select(raw_data, - Employee.Name)) is using the select() function from the dplyr package to drop a column/variable from the data set. The base R equivalents would be
subset(raw_data, select = -Employee.Name)
or
raw_data[,!(names(raw_data)=="Employee.Name")]
Almost every modern R lesson recommends avoiding attach() (even its own help page!)
The operation here creates a new column by copying the employee name column, then drops the employee name column. It might be more efficient and easier to understand to rename the column instead.
names(raw_data)[names(raw_data)=="Employee.Name"] <- "new_col"
or in tidyverse
rename(raw_data, new_col = Employee.Name)
(see here)

Adding value to existing database table in RSQLite

I am new to RSQLite.
I have an input document in text format in which values are seperately by '|'
I created a table with the required variables (dummy code as follows)
db<-dbconnect(SQLite(),dbname="test.sqlite")
dbSendQuery(conn=db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER
NAME CHAR(25)
DATED DATE)"
)
However I am struck at how to import values into the created table.
I cannot use INSERT INTO Values command as there are thousands of rows and more than 20+ columns in the original data file and it is impossible to manually type in each data point.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The deal of this is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have corrected loaded a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R using e.g. df <-
read.table('textfile.txt', sep='|') (modify arguments to fit your text file).
To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows for both named or positioned placeholder. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', list(1, 16, 'Big fellow'))
You see? The first ? got value 1, the second ? got value 16, and the last ? got the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has some requirements. 1) That each vector has the same number of entries (not an issue when providing a data.frame). And 2) You may only submit the same number of vectors as you have placeholders.
I assume your data frame, df contains columns mark, roll, and name, corrsponding to the columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is execute for each row, inserting thousands of rows can take a long time, because after each insert, data is written to file and indices are updated. Insert, enclose it in an transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will save the data to a journal file, and only save the result when you execute the dbCommit(db). Try both methods and compare the speed!
2: Ah, yes. The second way. This can be done in SQLite entirely.
With the SQLite command utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do a INSERT INTO ... SELECT ... ; command. Alternately, read the text file in sqlite3 into a temporary table and run a INSERT INTO ... SELECT ... ;.
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db<-dbConnect(RSQLite::SQLite(),dbname=":memory")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.

Writing and Updating DB2 tables with r

I can't figure out how to update an existing DB2 database in R or update a single value in it.
I can't find much information on this topic online other than very general information, but no specific examples.
library(RJDBC)
teachersalaries=data.frame(name=c("bob"), earnings=c(100))
dbSendUpdate(conn, "UPDATE test1 salary",teachersalaries[1,2])
AND
teachersalaries=data.frame(name=c("bob",'sally'), earnings=c(100,200))
dbSendUpdate(conn, "INSERT INTO test1 salary", teachersalaries[which(teachersalaries$earnings>200,] )
Have you tried passing a regular SQL statement like you would in other languages?
dbSendUpdate(conn, "UPDATE test1 set salary=? where id=?", teachersalary, teacherid)
or
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
Basically you specify the regular SQL DML statement using parameter markers (those question marks) and provide a list of values as comma-separated parameters.
Try this, it worked for me well.
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
You just need to pass a regular SQL piece in the same way you do in any programing langs. Try it out.
To update multiple rows at the same time, I have built the following function.
I have tested it with batches of up to 10,000 rows and it works perfectly.
# Libraries
library(RJDBC)
library(dplyr)
# Function upload data into database
db_write_table <- function(conn,table,df){
# Format data to write
batch <- apply(df,1,FUN = function(x) paste0("'",trimws(x),"'", collapse = ",")) %>%
paste0("(",.,")",collapse = ",\n")
#Build query
query <- paste("INSERT INTO", table ,"VALUES", batch)
# Send update
dbSendUpdate(conn, query)
}
# Push data
db_write_table(conn,"schema.mytable",mydataframe)
Thanks to the other authors.

Resources