R - RMySQL - could not run statement: memory exhausted

I have an R script for data analysis. I have tried it on 6 different tables from my MySQL database. On 5 of them the script works fine, but on the last table it does not. Here is the relevant part of my code:
sql <- ""
# build a UNION of single-row SELECTs so the database is hit only once
for (i in 2:length(awq)-1){  # note: parses as (2:length(awq)) - 1 in R
  num <- awq[i]-1
  sql <- paste(sql, "(SELECT * FROM mytable LIMIT ", num, ",1) UNION ")
}
sql <- paste(sql, "(SELECT * FROM mytable LIMIT ", awq[length(awq)-1], ",1)")
#database query
nb <- dbGetQuery(mydb, sql)
The MySQL table where the script fails has 21 676 rows. My other tables all have under 20 000 rows, and the script works with them. When it fails, it gives me this error:
Error in .local(conn, statement, ...) :
could not run statement: memory exhausted near '1) UNION (SELECT * FROM mytable LIMIT 14107 ,1) UNION (SELECT * FROM mytabl' at line 1
I understand this is a memory problem, but how do I solve it? I don't want to delete rows from my table. Is there another way?
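One way to sidestep the huge UNION (it is the sheer length of that generated statement that exhausts MySQL's parser memory) is to fetch the table in a single round trip and pick the rows client-side; 21 676 rows is small enough to pull into R whole. A minimal sketch, assuming awq holds the 1-based positions of the wanted rows and that 'id' stands in for a real key column (without an ORDER BY, MySQL does not guarantee row order, so the LIMIT offsets in the original query were already order-dependent):
all_rows <- dbGetQuery(mydb, "SELECT * FROM mytable ORDER BY id")  # 'id' is a placeholder key column
nb <- all_rows[awq, ]  # select the wanted rows in R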

Related

select into temporary table

I believe I should be able to do select * into #temptable from othertable (where #temptable does not previously exist), but it does not work. Assuming that othertable exists and has valid data, and that #sometemp does not exist:
# conn <- DBI::dbConnect(...)
DBI::dbExecute(conn, "select top 1 * into #sometemp from othertable")
# [1] 1
DBI::dbGetQuery(conn, "select * from #sometemp")
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name '#sometemp'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
The non-temporary version works without error:
DBI::dbExecute(conn, "select top 1 * into sometemp from othertable")
# [1] 1
DBI::dbGetQuery(conn, "select * from sometemp")
### ... valid data ...
System info:
conn
# <OdbcConnection> myuser@otherdomain-DATA01
# Database: dbname
# Microsoft SQL Server Version: 13.00.5026
DBI::dbGetQuery(conn, "select ##version")
#
# 1 Microsoft SQL Server 2016 (SP2) (KB4052908) - 13.0.5026.0 (X64) \n\tMar 18 2018 09:11:49 \n\tCopyright (c) Microsoft Corporation\n\tStandard Edition (64-bit) on Windows Server 2016 Standard 10.0 <X64> (Build 14393: )\n
Tested on Win11 and Ubuntu. R-4.1.2, DBI-1.1.2, odbc-1.3.3.
I've seen some comments that suggest "select into ..." isn't for temporary tables, but I've also seen several tutorials demonstrate that it works (for them).
Back-story: this is for a generic accessor function for upserting data: I insert into a temp table, do the upsert, then remove the temp table. I could use a non-temp table, but I think there are valid reasons to use temp tables when justified, and I want to understand why this doesn't or shouldn't work as intended. Other than switching away from temp tables, I could try to reconstitute the structure of othertable programmatically, but that is prone to interpretive error with some column types. I also can't just create the temp table from R's inferred types and insert into it, since the data types are sometimes imperfectly mapped (such as when I should use nvarchar(max), or when a new column's type is indeterminate because it is all-NA).
Related links:
Insert Data Into Temp Table with Query from 2013
https://www.sqlshack.com/select-into-temp-table-statement-in-sql-server/ from 2021
There are a few different approaches:
Use the immediate arg in your DBI::dbExecute statement. By default, odbc submits queries as prepared statements, and SQL Server drops a local temp table created inside a prepared statement as soon as that statement is released, so #local is gone before the next query runs; immediate=TRUE executes the statement directly:
DBI::dbExecute(conn, "select top 5 * into #local from sometable", immediate=TRUE)
DBI::dbGetQuery(conn, "select * from #local")
Use a global temp table
DBI::dbExecute(conn, "select top 5 * into ##global from sometable")
DBI::dbGetQuery(conn, "select * from ##global")
Use dplyr/dbplyr
library(dplyr)  # dbplyr must also be installed
tt <- tbl(conn, sql("select top 5 * from sometable")) %>% compute()
tt
Also see here: https://github.com/r-dbi/odbc/issues/127

R, ClickHouse: Expected: FixedString(34). Got: UInt64: While processing

I am trying to query data from a ClickHouse database from R, restricting the query to a subset of IDs.
Here is the example
library(data.table)
library(RClickhouse)
library(DBI)
subset <- paste(traffic[,unique(IDs)][1:30], collapse = ',')
conClickHouse <- DBI::dbConnect('here is the connection')
DataX <- dbGetQuery(conClickHouse, paste0("select * from database
  where IDs in (", subset, ")"))
As a result I get this error:
DB::Exception: Type mismatch in IN or VALUES section. Expected: FixedString(34).
Got: UInt64: While processing (IDs IN ....
Any help is appreciated
Thanks to the comment of @DennyCrane,
"select * from database where toFixedString(IDs,34) in
(toFixedString(ID1, 34), toFixedString(ID2,34 ))"
With that cast, the query subsets properly.
https://clickhouse.tech/docs/en/sql-reference/functions/#strong-typing
Strong Typing
In contrast to standard SQL, ClickHouse has strong typing. In other words, it doesn’t make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
https://clickhouse.tech/docs/en/sql-reference/functions/type-conversion-functions/#tofixedstrings-n
select * from (select 'x' B ) where B in (select toFixedString('x',1))
DB::Exception: Types of column 1 in section IN don't match: String on the left, FixedString(1) on the right.
Use a type-conversion function, toString or toFixedString:
select * from (select 'x' B ) where toFixedString(B,1) in (select toFixedString('x',1))
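Applied back to the R snippet from the question, a minimal sketch (assuming the IDs are 34-byte strings; "database" is kept as the placeholder table name from the question):
ids <- traffic[, unique(IDs)][1:30]
in_list <- paste0("toFixedString('", ids, "', 34)", collapse = ", ")
sql <- paste0("select * from database where toFixedString(IDs, 34) in (", in_list, ")")
DataX <- DBI::dbGetQuery(conClickHouse, sql)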

Why is sqlQuery from RODBC not always returning the same data when querying an Impala DB?

I'm trying to get some data from an Impala database using the sqlQuery function from the RODBC package. The results I get change from one execution of a query to another execution of the exact same query.
The data.frame I get doesn't always have the same number of rows:
library("RODBC")
conn <- odbcConnect("Cloudera Impala DSN;host=mydb;port=21050")
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 26600 220
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 142561 220
df<-sqlQuery(conn, "select * from hydrau.hydr where flight= 'V0051'")
dim(df)
[1] 23500 220
This query should in fact return a 142561 x 220 data frame.
On the other hand, the following query always returns the same (correct) result:
sqlQuery(conn, "select count(*) from hydr where flight= 'V0051' ")
count(*)
1 142561
It seems my problem was that Impala didn't have enough memory to perform well.
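Until the server-side memory problem is addressed, one defensive pattern is to check the row count (which was reliable here) and re-run the query when the result comes back truncated. A minimal sketch; the retry cap of 5 is arbitrary:
expected <- sqlQuery(conn, "select count(*) from hydrau.hydr where flight = 'V0051'")[1, 1]
for (attempt in 1:5) {
  df <- sqlQuery(conn, "select * from hydrau.hydr where flight = 'V0051'")
  if (nrow(df) == expected) break  # full result set received
}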

Using variable in "IN" function of SQL query in R

I have a variable x which contains 20 000 IDs. I want to write an SQL query like:
select * from tablename where ID in x;
I am trying to implement this in R so that I get values only for the IDs in x. Here is my attempt:
dbSendQuery(mydb, "select * from tablename where ID in ('$x') ")
This runs without error, but it returns 0 rows.
Next I tried:
sprintf("select * from tablename where ID in %s",x)
But this creates 20 000 individual queries (sprintf is vectorized over x), which could prove costly on the database.
Can anybody suggest a way to get all the IDs in x into a data frame in R with a single query?
You need to put the codes in the actual query string. Here is how I would do it with gsub:
x <- LETTERS[1:3]
sql <- "select * from tablename where ID in X_ID_CODES "
x_codes <- paste0("('", paste(x, collapse="','"), "')")
sql <- gsub("X_ID_CODES", x_codes, sql)
# see new output
cat(sql)
select * from tablename where ID in ('A','B','C')
# then submit the query
#dbSendQuery(mydb, sql)
How about pasting it? (As written, this suits numeric IDs; character IDs need quoting, as in the gsub answer above.)
dbSendQuery(mydb, paste("select * from tablename where ID in (", paste(x, collapse = ","), ")"))
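If the IDs are strings, DBI can also do the quoting for you, which avoids both hand-rolled escaping and injection problems. A minimal sketch, assuming mydb is a DBI connection and tablename is the placeholder from the question:
quoted <- DBI::dbQuoteLiteral(mydb, x)  # safely quotes each ID for this backend
sql <- paste0("select * from tablename where ID in (",
              paste(quoted, collapse = ", "), ")")
df <- DBI::dbGetQuery(mydb, sql)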

dbGetQuery java.sql.SQLException: Bigger type length than Maximum

I am trying to fetch a decently large result set (about 1-2M records) using RJDBC with the following:
library(RJDBC)
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
classPath="../oracle11g/ojdbc6.jar", " ")
con <- dbConnect(drv, "jdbc:oracle:thin:@hostname:1521/servname", "user", "pswd")
data <- dbGetQuery(con, "select * from largeTable where rownum < xxx")
The above works if xxx is less than 32768. Above 32800, I get the following exception
> data <- dbGetQuery(con, "select * from dba_objects where rownum < 32768")
> dim(data)
[1] 32767 15
> data <- dbGetQuery(con, "select * from dba_objects where rownum < 32989")
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Bigger type length than Maximum
In https://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf, I see "fetch retrieves the content of the result set in the form of a data frame. If n is -1 then the current implementation fetches 32k rows first and then (if not sufficient) continues with chunks of 512k rows, appending them." followed by "Note that some databases (like Oracle) don’t support a fetch size of more than 32767."
Sorry for the newbie question, but I don't see how I can tell dbGetQuery to fetch the result set in chunks of 32K only. I believe my fetch is dying because it went on to fetch 512K records.
Would really appreciate any suggestions. Thanks in advance.
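One way to keep each fetch at or below Oracle's 32 767-row limit is to skip dbGetQuery and drive the cursor yourself with dbSendQuery plus fetch(n = ...). A minimal sketch:
res <- dbSendQuery(con, "select * from largeTable")
chunks <- list()
repeat {
  chunk <- fetch(res, n = 32767)  # stay at Oracle's maximum fetch size
  if (nrow(chunk) == 0) break     # no rows left
  chunks[[length(chunks) + 1]] <- chunk
}
dbClearResult(res)
data <- do.call(rbind, chunks)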
