Using R-3.5.0 and RODBC v. 1.3-15 on Windows.
I am trying to query data from a remote database. I can connect fine, and if I run a query to count the rows, the answer comes out correctly. But if I replace select count(*) with select * to actually get the data, I get an empty result (with some rather strange headers). Only two of the column names come out correctly; the rest are question marks and a number (as shown below). I can use SQL Developer to query the data with no problem.
I include the simplest version of the code below but I get the same results if I try to limit to just a few rows or certain conditions, etc. Sorry I cannot create a reproducible example but as this is a remote db and I have no idea what the problem is, I'm not sure how I could even do that.
I can query other tables from different schemas within the same odbc connection, so I don't think it is that. I have tried with and without the believeNRows and the rows_at_time.
Thank you for any thoughts.
channel <- odbcConnect("mydb", uid="myuser", pwd="mypass", believeNRows=FALSE,rows_at_time = 1)
myquery <- paste("select count(*) from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
COUNT(*)
1 149712361
myquery <- paste("select * from MYSCHEMA.MYTABLE")
sqlQuery(channel, myquery)
[1] ID FMC_IN_ID ? ?.1 ?.2 ?.3 ?.4 ?.5 ?.6 ?.7 ?.8 ?.9 ?.10 ?.11 ?.12 ?.13 ?.14 ?.15
<0 rows> (or 0-length row.names)
I would try the following:
add a simple limit 100 to your query to see if you can get some data back
add the believeNRows option to the sqlQuery call -- in my experience it is needed at that level
In case it helps others, the problem was that the database contained an Oracle spatial field (MDSYS.SDO_GEOMETRY). R did not know what to do with it. I assumed it would just convert it to a character but instead it just got confused. By omitting the spatial field, the query worked fine.
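For anyone who wants to keep every column except the spatial one, the workaround above can be sketched as follows. This is only an illustration: the column names (including GEOM for the spatial field) are made up, and in practice the list might come from something like RODBC's sqlColumns().

```r
# Hypothetical column names; in practice they could be read from
# RODBC::sqlColumns(channel, "MYTABLE", schema = "MYSCHEMA")$COLUMN_NAME.
all_cols <- c("ID", "FMC_IN_ID", "NAME", "GEOM")

# Assumed name of the MDSYS.SDO_GEOMETRY column that R cannot convert.
spatial_cols <- "GEOM"

# Keep everything except the spatial column and name the rest explicitly.
keep <- setdiff(all_cols, spatial_cols)
myquery <- sprintf("select %s from MYSCHEMA.MYTABLE", paste(keep, collapse = ", "))
myquery
```

The resulting string can then be passed to sqlQuery(channel, myquery) in place of the select * version.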
Related
I have a database called "db" with a table called "company" which has a column named "name".
I am trying to look up a company name in db using the following query:
dbGetQuery(db, 'SELECT name,registered_address FROM company WHERE LOWER(name) LIKE LOWER("%APPLE%")')
This gives me the following correct result:
name
1 Apple
My problem is that I have a bunch of companies to look up and their names are in the following data frame
df <- as.data.frame(c("apple", "microsoft","facebook"))
I have tried the following method to get the company name from my df and insert it into the query:
sqlcomp <- paste0("'SELECT name, ","registered_address FROM company WHERE LOWER(name) LIKE LOWER(",'"', df[1,1],'"', ")'")
dbGetQuery(db,sqlcomp)
However this gives me the following error:
tinyformat: Too many conversion specifiers in format string
I've tried several other methods but I cannot get it to work.
Any help would be appreciated.
This code should work:
df <- as.data.frame(c("apple", "microsoft","facebook"))
comparer <- paste(paste0(" LOWER(name) LIKE LOWER('%",df[,1],"%')"),collapse=" OR ")
sqlcomp <- sprintf("SELECT name, registered_address FROM company WHERE %s",comparer)
dbGetQuery(db,sqlcomp)
Hope this helps you move on.
Please vote my solution if it is helpful.
Using paste to paste in data into a query is generally a bad idea, due to SQL injection (whether truly injection or just accidental spoiling of the query). It's also better to keep the query free of "raw data" because DBMSes tend to optimize a query once and reuse that optimized query every time it sees the same query; if you encode data in it, it's a new query each time, so the optimization is defeated.
It's generally better to use parameterized queries; see https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
For you, I suggest the following:
df <- data.frame(names = c("apple", "microsoft","facebook"))
qmarks <- paste(rep("?", nrow(df)), collapse = ",")
qmarks
# [1] "?,?,?"
dbGetQuery(con, sprintf("select name, registered_address from company where lower(name) in (%s)", qmarks),
params = tolower(df$names))
This takes advantage of three things:
the SQL IN operator, which takes a list (vector in R) of values and conditions on "set membership";
optimized queries; if you subsequently run this query again (with three arguments), then it will reuse the query. (Granted, if you run with other than three companies, then it will have to reoptimize, so this is limited gain);
no need to deal with quoting/escaping your data values; for instance, if it is feasible that your company names might include single or double quotes (perhaps typos on user-entry), then adding the value to the query itself is either going to cause the query to fail, or you will have to jump through some hoops to ensure that all quotes are escaped properly for the DBMS to see it as the correct strings.
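Note that IN does exact matching, whereas the original question used LIKE with wildcards. If substring matching is still needed, one possible sketch (an assumption on my part, not something tested against your database) is to keep one placeholder per company and put the % wildcards into the bound values rather than into the SQL text:

```r
company_names <- c("apple", "microsoft", "facebook")

# One "lower(name) like ?" clause per company, joined with OR.
clauses <- paste(rep("lower(name) like ?", length(company_names)),
                 collapse = " or ")
sql <- sprintf("select name, registered_address from company where %s", clauses)

# The wildcards live in the bound values, so the SQL text contains no data.
patterns <- paste0("%", tolower(company_names), "%")

# With a DBI connection `con` (assumed), this could then be run as:
# dbGetQuery(con, sql, params = as.list(patterns))
```

The query text still only changes with the number of companies, so the quoting and reuse arguments above apply in the same way.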
I have a script that runs a function on many databases. It has been working correctly up to now, but today, it returned the fields of a new db I just built in a very strange order. This is the order of the fields in the db (Azure):
ID, IndNum,CorpCode,IndName,Type,LOP_Tar,Active,UpisBetter,LOPisCumm
The script tries to download that full table. This is how I do that from R:
Q <- 'SELECT * FROM XXXX'
T <- sqlQuery(channel, Q, rows_at_time = 5)
names(T)
[1] "ID" "Active" "UpIsBetter" "LOPisCumm" "IndNum" "CorpCode" "IndName" "Type"
[9] "LOP_Tar"
So it's returning the fields in the following order: 1,7:9,2:6. Why?
Now, two things are special about THIS database: 1) it's in Spanish, and 2) it has fewer fields than its sister databases. As a sanity check, I just reran one of the other dbs with the same code, and it gives the field names perfectly in order... Does anyone know what's going on? Is it related to this?
If SELECT * doesn't give a reproducible result.... I need to reconsider some things about my life...
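This does not explain why the order changes, but a script can avoid depending on the order SELECT * happens to return. A minimal sketch, using the column names from the question and a made-up one-row data frame standing in for the sqlQuery() result (reorder_cols is a hypothetical helper, not part of RODBC):

```r
# Column order as listed in the question.
wanted <- c("ID", "IndNum", "CorpCode", "IndName", "Type",
            "LOP_Tar", "Active", "UpisBetter", "LOPisCumm")

# Option 1: spell the columns out instead of relying on SELECT *.
Q <- sprintf("SELECT %s FROM XXXX", paste(wanted, collapse = ", "))

# Option 2: reorder after fetching, matching names case-insensitively.
reorder_cols <- function(df, wanted) {
  idx <- match(tolower(wanted), tolower(names(df)))
  df[, idx[!is.na(idx)], drop = FALSE]
}

# Stand-in for the data frame returned by sqlQuery(), in the order R reported.
fetched <- data.frame(ID = 1, Active = 1, UpIsBetter = 1, LOPisCumm = 0,
                      IndNum = 10, CorpCode = "A", IndName = "x", Type = "t",
                      LOP_Tar = 5)
reordered <- reorder_cols(fetched, wanted)
names(reordered)
```

Either option makes the downstream code independent of whatever order the driver or the table definition produces.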
How do I get the list of table names from a database for a certain schema?
tabellen <- dbListTables(con, all=T)
gives all the tables from the database, but I would like to specify the schema. I read in the ROracle package documentation that I can specify the schema like this:
tabellen <- dbListTables(con, schema="K")
However, I get an empty character vector...
When I use the SQL command:
rs <- dbSendQuery(con, "SELECT * FROM ALL_TABLES WHERE OWNER ='K'")
data <- fetch(rs)
It works, but I get a table rather than the list I would prefer. Is there a way to get the list of tables directly? [SOLVED] Too much programming: I wrote scheme instead of schema. Thanks for pointing it out, my bad, sorry for that.
Additionally, how can I get the column names for a table that I have chosen? [NOT SOLVED]
Thanks for help
I am using R in combination with SQLite (via RSQLite) to persist my data, since I do not have sufficient RAM to keep all columns in memory and calculate with them. I have added an empty column to the SQLite database using:
dbGetQuery(db, "alter table test_table add column newcol real")
Now I want to fill this column using data I calculated in R and which is stored in my data.table column dtab$newcol. I have tried the following approach:
dbGetQuery(db, "update test_table set newcol = ? where id = ?", bind.data = data.frame(transactions$sum_year, transactions$id))
Unfortunately, R seems like it is doing something but is not using any CPU time or RAM allocation. The database does not change size and even after 24 hours nothing has changed. Therefore, I assume it has crashed - without any output.
Am I using the update statement wrong? Is there an alternative way of doing this?
UPDATE
I have also tried the RSQLite functions dbSendQuery and dbGetPreparedQuery - both with the same result. However, what does work is updating a single row without the use of bind.data. A loop to update the column, therefore, seems possible but I will have to evaluate the performance since the dataset is huge.
As mentioned by @jangorecki, the problem had to do with SQLite performance. I disabled synchronous and set journal_mode to off (which has to be done for every session).
dbGetQuery(transDB, "PRAGMA synchronous = OFF")
dbGetQuery(transDB, "PRAGMA journal_mode = OFF")
I also changed my RSQLite code to use dbBegin(), dbSendPreparedQuery() and dbCommit(). It takes a while, but at least it works now and has acceptable performance.
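For reference, the same transaction-wrapped bulk update can be sketched with the current DBI interface. This is an illustration under assumptions, not the exact code I ran: it uses an in-memory database and a three-row stand-in table, and it uses dbExecute() with params instead of the older dbSendPreparedQuery().

```r
library(DBI)

# In-memory database standing in for the real file-backed one.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "test_table", data.frame(id = 1:3, newcol = NA_real_))

# Stand-in for the values computed in R.
updates <- data.frame(id = 1:3, newcol = c(10.5, 20.5, 30.5))

# One transaction around all updates, so SQLite does not commit
# (and sync to disk) after every single row.
dbBegin(con)
dbExecute(con,
          "update test_table set newcol = ? where id = ?",
          params = list(updates$newcol, updates$id))
dbCommit(con)

result <- dbGetQuery(con, "select id, newcol from test_table order by id")
dbDisconnect(con)
```

If id is not indexed, every single-row update has to scan the whole table, so on a huge table it may also be worth checking that an index (or primary key) on id exists.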
I am using DB Browser for SQLite to extract some interesting data from my database but I encountered one big problem with GROUP BY statement.
Even the most basic SELECT I can imagine is not working properly.
(Filename nvarchar(2147483647))
SELECT FileName FROM TableName WHERE FileName LIKE '%Nieminen%' GROUP BY FileName gives 5 rows even though I know that there are 9 distinct FileNames containing the phrase 'Nieminen' (I've browsed it).
Can it be possible that GROUP BY in sqlite compares only N (e.g. 10) initial characters? From my observation it might be true...
Any clues?
Why don't you try the following:
SELECT DISTINCT FileName FROM TableName WHERE FileName LIKE '%Nieminen%'