I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This give me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
this code should work
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Please vote my solution if it is helpful.
Using paste to paste in data into a query is generally a bad idea, due to SQL injection (whether truly injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized query every time it sees the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.
Related
For example I made an SQL table with column names "Names", "Class", "age" and I have a data-frame which I made using R code:
data_structure1<- as.data.frame("Name")
data_structure1
data_structure2<-as.data.frame("Class")
data_structure2
data_structure3<-as.data.frame("age")
data_structure3
final_df<- cbind(data_structure1,data_structure2,data_structure3)
final_df
#dataframe "Name" contains multiple entries as different names of students.
#dataframe "Class" contains multiple entries as classes in which student are studying like 1,2,3,4 etc.
#dataframe "age" contains multiple entries as ages of students.
I want to insert final_df in my SQL table containing column names as "Names", "Class", "age".
Below is the command which I am using for inserting only one column in one column of SQL table and this is working fine for me.
title<- sqlQuery(conn,paste0("INSERT INTO AMPs(Names) VALUES('",Names, "')"))
title
But this time I want to insert a dataframe of multiple columns in multiple columns of SQL table.
Please help me with this. Thanks in advance.
Using the DBI package and a package for your database (e.g., RPostgres) you can do something like this, assuming the table already exists:
AMPs <- data.frame(name = c("Foo", "Bar"), Class = "Student", age = c(21L, 22L))
conn <- DBI::dbConnect(...) # read doc to set parameters
## Create a parameterized INSERT statement
db_handle <- DBI::dbSendStatement(
conn,
"INSERT INTO AMPs (Name, Class, age) VALUES ($1, $2, $3)"
)
## Passing the data.frame
DBI::dbBind(db_handle, params = unname(as.list(AMPs)))
DBI::dbHasCompleted(db_handle)
## Close statement handle and connection
DBI::dbClearResult(db_handle)
DBI::dbDisconnect(conn)
You may also want to look at DBI::dbAppendTable.
Edit:
The example above works with PostgreSQL, and I see that the OP tagged the question with 'sql-server'. I don't have means for testing it with SQL Server, but the DBI interface aims at being general so I believe it should work. One important thing to note is the syntax, and, in particular, the syntax for the bind parameters. In PostgreSQL the parameters are defined as $1 for the first parameter, $2 for the second and so on. This might be different for other databases. I think SQL Server is using ?, and that that the binding is done based on position - i.e., the first ? is bind to the first supplied parameter, the second ? to the second and so on.
Using R-3.5.0 and RODBC v. 1.3-15 on Windows.
I am trying to query data from a remote database. I can connect fine and if I do a query to count the rows, the answer comes out correctly. But if I try to remove the count statement select count(*) and actually get the data via select *, I yield an empty query (with some rather strange headers). Only two of the column names come out correctly and the rest are question marks and a number (as shown below). I can using sql developer to query the data no problem.
I include the simplest version of the code below but I get the same results if I try to limit to just a few rows or certain conditions, etc. Sorry I cannot create a reproducible example but as this is a remote db and I have no idea what the problem is, I'm not sure how I could even do that.
I can query other tables from different schemas within the same odbc connection, so I don't think it is that. I have tried with and without the believeNRows and the rows_at_time.
Thank you for any thoughts.
channel <- odbcConnect("mydb", uid="myuser", pwd="mypass", believeNRows=FALSE,rows_at_time = 1)
myquery <- paste("select count(*) from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
COUNT(*)
1 149712361
myquery <- paste("select * from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
[1] ID FMC_IN_ID ? ?.1 ?.2 ?.3 ?.4 ?.5 ?.6 ?.7 ?.8 ?.9 ?.10 ?.11 ?.12 ?.13 ?.14 ?.15
<0 rows> (or 0-length row.names)
I would try the following:
add a simple limit 100 to your query to see if you can get some data back
add the believeNRows option to the sqlQuery call -- in my experience it is needed at that level
In case it helps others, the problem was that the database contained an Oracle spatial field (MDSYS.SDO_GEOMETRY). R did not know what to do with it. I assumed it would just convert it to a character but instead it just got confused. By omitting the spatial field, the query worked fine.
Sorry in advance due to being new to Rstudio...
There are two parts to this question:
1) I have a large database that has almost 6,000 tables in it. Many of these tables have no data in them. Is there a code using R to only pull a list of tables names that have data in them?
I know how to pull a list of all table names and how to pull specific table data using the code below..
test<-odbcDriverConnect('driver={SQL Server};server=(SERVER);database=(DB_Name);trusted_connection=true')
rest<-sqlQuery(test,'select*from information_schema.tables')
Table1<-sqlFetch(test, "PROPERTY")
Above is the code I use to access the database and tables.
"test" is the connection
"rest" shows the list of 5,803 tables names.. one of which is called "PROPERTY"
"Table1" is simply pulling one of the tables named "PROPERTY".
I am looking to make "rest" only show the data tables that have data in them.
2) My ultimate goal, which leads to the second question, is to create a table that shows a list of every table from this database in column#1 and then column 2,3,4,etc... would include every one of the column headers that is contained in each table. Any idea how do to that?
Thanks so much!
The Tables object below returns a data frame giving all of the tables in the database and how many rows are in each table. As a condition, it requires that any table selected have at least one record. This is probably the fastest way to get your list of non-empty tables. I pulled the query to get that information from https://stackoverflow.com/a/14163881/1017276
My only reservation about that query is that it doesn't give the schema name, and it is possible to have tables with the same name in different schemas. So this is likely only going to work well within one schema at a time.
library(RODBCext)
Tables <-
sqlExecute(
channel = test,
query = "SELECT T.name TableName, I.rows Records
FROM sysobjects t, sysindexes i
WHERE T.xtype = ? AND I.id = T.id AND I.indid IN (0,1) AND I.rows > 0
ORDER BY TableName;",
data = list(xtype = "U"),
fetch = TRUE,
stringsAsFactors = FALSE
)
This next part uses the tables you found above and then gets the column information from each of those tables. Lastly, it makes on single data frame with all of the column names.
Columns <-
lapply(Tables$TableName,
function(x) sqlColumns(test, x))
Columns <- do.call("rbind", Columns)
sqlColumns is a function in RODBC.
sqlExecute is a function in RODBCext that allows for parameterized queries. I tend to use that anytime I need to use quoted strings in a query.
How to get the list of table names from database for certain scheme?
tabellen <- dbListTables(con, all=T)
gives all the tables from database, but i would like to specify the scheme. I read in the ROracle package that i can specify scheme like:
tabellen <- dbListTables(con, schema="K")
However i get an empty character...
when i use sql command:
rs <- dbSendQuery(con, "SELECT * FROM ALL_TABLES WHERE OWNER ='K'")
data <- fetch(rs)
It works but i get a table, not a list what i would prefer too.. Is there a way to get directly the list of tables? [SOLVED] - too much programming..I wrote scheme instead of schema...Thanks for pointing it out, my bad, sorry for that
And additionally how i can get the name of columns for certain table which i choosed [NOT SOLVED]
Thanks for help
I can't figure out how to update an existing DB2 database in R or update a single value in it.
I can't find much information on this topic online other than very general information, but no specific examples.
library(RJDBC)
teachersalaries=data.frame(name=c("bob"), earnings=c(100))
dbSendUpdate(conn, "UPDATE test1 salary",teachersalaries[1,2])
AND
teachersalaries=data.frame(name=c("bob",'sally'), earnings=c(100,200))
dbSendUpdate(conn, "INSERT INTO test1 salary", teachersalaries[which(teachersalaries$earnings>200,] )
Have you tried passing a regular SQL statement like you would in other languages?
dbSendUpdate(conn, "UPDATE test1 set salary=? where id=?", teachersalary, teacherid)
or
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
Basically you specify the regular SQL DML statement using parameter markers (those question marks) and provide a list of values as comma-separated parameters.
Try this, it worked for me well.
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
You just need to pass a regular SQL piece in the same way you do in any programing langs. Try it out.
To update multiple rows at the same time, I have built the following function.
I have tested it with batches of up to 10,000 rows and it works perfectly.
# Libraries
library(RJDBC)
library(dplyr)
# Function upload data into database
db_write_table <- function(conn,table,df){
# Format data to write
batch <- apply(df,1,FUN = function(x) paste0("'",trimws(x),"'", collapse = ",")) %>%
paste0("(",.,")",collapse = ",\n")
#Build query
query <- paste("INSERT INTO", table ,"VALUES", batch)
# Send update
dbSendUpdate(conn, query)
}
# Push data
db_write_table(conn,"schema.mytable",mydataframe)
Thanks to the other authors.