R delete rows from data table using sqldf - r

I am wondering if R does not support using sqldf to delete rows from a data table. I am trying to delete rows from a data table using a DELETE statement. There is no underlying database, just a data.table. But when I enter the following SQL statement:
loans_good <- sqldf("Delete from LoansDT1 where status not in ('Current','Default')")
I get the following error message:
'SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().'
Since I get the same message for UPDATE, I am wondering if it is a limitation.

This question is a FAQ. See FAQ 8 on the sqldf GitHub home page.
The operation did work: the message is a warning, not an error, and it can safely be ignored. Note that the question does not show the complete message; the complete message does state that it is a warning.
The warning comes from RSQLite, not from sqldf itself. It is caused by a non-backwards-compatible change that was introduced into RSQLite at some point; however, as stated, the actual operation works anyway.
Also, delete and update act on tables in the database. They do not return values, so even if they work you won't see any result. If you want a result, you have to issue a select statement after the delete or update to retrieve the modified table.
Here is an example using the built-in 6-row BOD data.frame. It deletes the last row, as that row has a Time greater than 5.
library(sqldf)
sqldf(c("delete from BOD where Time > 5", "select * from BOD"))
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## Warning message:
## In result_fetch(res@ptr, n = n) :
## SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
Note that this is listed in the sqldf issues where a workaround for the message is provided: https://github.com/ggrothendieck/sqldf/issues/40

You need to use dbExecute() to perform delete, update or insert queries.
conn <- dbConnect(...)  # put your connection to your database here
dbExecute(
  conn,
  "Delete from LoansDT1 where status not in ('Current','Default')"
)
dbReadTable(conn, "LoansDT1")  # check
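For a self-contained illustration, here is a minimal sketch using an in-memory RSQLite database; it assumes the LoansDT1 data frame already exists in your session:
library(DBI)
library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "LoansDT1", LoansDT1)  # copy the data frame into SQLite
dbExecute(conn, "DELETE FROM LoansDT1 WHERE status NOT IN ('Current','Default')")
loans_good <- dbReadTable(conn, "LoansDT1")  # read the modified table back
dbDisconnect(conn)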

Related

Sqoop trying to --split-by ROWID (Oracle) fails

(Be kind, this is my first question, and I did extensive research here and on the net beforehand. The question Oracle ROWID for Sqoop Split-By Column did not really solve this issue, as the original asker resorted to using another column.)
I am using sqoop to copy data from an Oracle 11 DB.
Unfortunately, some tables have no index and no primary key, only partitions (date). These tables are very large, hundreds of millions if not billions of rows.
So far, I have decided to access data in the source by explicitly addressing the partitions. That works well and speeds up the process nicely.
I need to do the splits by data that resides in each and every table, in order to avoid too many if-branches in my bash script (we're talking some 200+ tables here).
I noticed that a split into 8 tasks results in a very uneven spread of workload among the tasks. I considered using the Oracle ROWID to define the split.
To do this, I must define a boundary-query. In a standard query 'select * from xyz', the rowid is not part of the result set; therefore, it is not an option to let Sqoop derive the boundary-query from --query.
Now, when I run this, I get the following error:
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Sqoop does not have the splitter for the given SQL
data type. Please use either different split column (argument --split-by)
or lower the number of mappers to 1. Unknown SQL data type: -8
Samples of ROWID:
AAJXFWAKPAAOqqKAAA
AAJXFWAKPAAOqqKAA+
AAJXFWAKPAAOqqKAA/
It is static and unique once it is created for any row.
I cast this funny datatype into something else in my boundary-query:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
  --split-by ROWID \
  --boundary-query "select cast(min(ROWID) as varchar(18)), cast(max(ROWID) as varchar(18)) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
  --query "select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD') and \$CONDITIONS" \
  --null-string '\\N' \
  --null-non-string '\\N'
But then I get ugly ROWIDs that are rejected by Oracle:
select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')
and ( ROWID >= 'AAJX6oAG聕聁AE聉N:' ) AND ( ROWID < 'AAJX6oAH⁖⁁AD䁔䀷' ) ,
Error Msg = ORA-01410: invalid ROWID
How can I resolve this properly?
I am a Linux embryo and have painfully chewed my way through the topics of bash shell scripting and Sqooping so far, but I would like to make better use of an evenly spread mapper-task workload - it would cut the sqoop time in half, I guess, saving some 5 to 8 hours.
TIA!
wahlium
You can try ROWNUM, but I think sqoop import does not work with pseudocolumns.
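If you want to experiment anyway, one commonly suggested workaround (untested here; the table and column names are the placeholders from the question) is to materialize ROWNUM as an ordinary numeric column in a subquery so that Sqoop can split on it:
sqoop import --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
  --query "select * from (select t.*, ROWNUM as rn from table t where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')) where \$CONDITIONS" \
  --split-by rn \
  --boundary-query "select 1, count(*) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')"
Be aware that ROWNUM is assigned at query execution time: each mapper re-runs the inner query, so without a deterministic ordering the rn ranges are not guaranteed to cover each row exactly once.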

RODBC Teradata Copy Table

I am using RODBC with R to connect to Teradata.
I am trying to copy a large table EXAMPLE (25 GB) from the READ_ONLY database to the WORK database. The two databases are under the same DB system, so I only need one connection.
I have tried the sqlQuery, sqlCopy and sqlCopyTable functions but do not succeed.
sqlQuery
EDIT: syntax error corrected as suggested by @dnoeth.
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH DATA;
OR
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH NO DATA;
INSERT INTO WORK.EXAMPLE SELECT * FROM READ_ONLY.EXAMPLE;
I let the latter method run for 15h but it did not complete the copy.
sqlCopy
sqlCopy(ch,
query='SELECT * FROM READ_ONLY.EXAMPLE',
destination = 'WORK.EXAMPLE')
Error: cannot allocate vector of size 155.0 Mb
Does sqlCopy try to first copy the data to R's memory before creating the new table? If so, how can I bypass this step and work exclusively on the Teradata server? Also, the error persists even if I use the option fast=F.
In case R's memory was the issue, I tried creating a smaller table of 1000 rows:
sqlCopy(ch,
query='SELECT * FROM READ_ONLY.EXAMPLE SAMPLE 1000',
destination = 'WORK.EXAMPLE')
Error in sqlSave(destchannel, dataset, destination, verbose = verbose, :
[RODBC] Failed exec in Update
22018 0 [Teradata][ODBC Teradata Driver] Data is not a numeric-literal.
In addition: Warning message:
In odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
character data '2017-03-20 12:08:25' truncated to 15 bytes in column 'ExtractionTS'
With this command a table was actually created but it only includes the column names without any rows.
sqlCopyTable
sqlCopyTable(ch,
srctable = 'READ_ONLY.EXAMPLE',
desttable = 'WORK.EXAMPLE')
Error in if (as.character(keys[[4L]]) == colnames[i]) create <- paste(create, :
argument is of length zero
The syntax in your sqlQuery is not correct; the WITH DATA option is missing:
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH DATA;
Caution: this will lose all NOT NULL and CHECK constraints and all indexes, resulting in the first column becoming a Non-Unique Primary Index.
Either add a PI manually or switch to
CREATE TABLE WORK.EXAMPLE AS READ_ONLY.EXAMPLE WITH DATA;
if READ_ONLY.EXAMPLE is a table and you actually want an exact copy.
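Since both databases live on the same Teradata system, the copy can be executed entirely server-side through the existing RODBC channel, so no data passes through R's memory. A minimal sketch, assuming ch is the open connection from the question:
sqlQuery(ch, "CREATE TABLE WORK.EXAMPLE AS READ_ONLY.EXAMPLE WITH DATA;")
sqlQuery(ch, "SELECT COUNT(*) FROM WORK.EXAMPLE")  # sanity-check the row count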

more than 500 result rows in dplyr generic SQL

How can I receive full results from any (general) SQL query in dplyr? Here is a toy example where the SQL query simply returns the full table.
library("plyr")
library("dplyr")
## connect to a database
hflights_sqlite <- tbl(hflights_sqlite(), "hflights")
my_con <- src_sqlite(hflights_sqlite$src$path)
## here is the problem
tbl(my_con, sql("SELECT * FROM hflights"))
## ...
## Warning message:
## Only first 500 results retrieved. Use n = -1 to retrieve all.
tbl(my_con, sql("SELECT * FROM hflights"), n=-1)
## ...
## Warning message:
## Only first 500 results retrieved. Use n = -1 to retrieve all.
(This is not a question about the particular simple SQL used here, of course)
Use collect(n=Inf) to force dplyr to fetch all data.
Here is an example:
results <- CONNECTION %>% tbl(sql(SQL_QUERY)) %>% collect(n=Inf)
where
CONNECTION in your case is from src_sqlite(hflights_sqlite$src$path)
and
SQL_QUERY in your case is "SELECT * FROM hflights"
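Putting those together for this question, a concrete sketch using the objects defined above:
results <- my_con %>% tbl(sql("SELECT * FROM hflights")) %>% collect(n = Inf)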
It looks like there used to be some bugs in setting the limit for how many would be cached, but it's been fixed: https://github.com/hadley/dplyr/issues/407
@Andreas: If I understand dplyr, it's always lazy for as long as possible. When you execute the tbl call above, or any tbl call, it fetches just enough data to show you that it worked. If you want the entire result set, you need to collect the results, per @hadley's comment, or in some other way force full evaluation, e.g.,
head(tbl(my_con, sql("SELECT * FROM hflights")), n=999999)
... n=-1 should work, but I haven't yet seen it work properly in my testing.

RODBC sqlQuery() returning error messages on successful execution

This is a bit of a time-sensitive/emergency problem. I have some R code that involves a number of SQL queries using the RODBC package. This code runs every morning on a dedicated Linux server - it pulls down some data, does some statistics, and inserts the data back into an MSSQL DB.
Today, our sysarch upgraded to Ubuntu 12.04.1, and since then, I'm having a bunch of problems with ODBC. After making a few changes, we got it so that ODBC connections can be established, but now there's an even bigger problem. Basically, I'm getting error messages from RODBC every time I do a CREATE or DROP command, even though the identified tables are created/dropped. Example (bts.connect is the result of odbcConnect([connection information])):
> sqlQuery(bts.connect, "SELECT OBJECT_ID('tempdb.dbo.#tempscores')")
1 1050250967
So the #tempscores table exists
> sqlQuery(bts.connect, "DROP TABLE #tempscores")
[1] "[RODBC] ERROR: Could not SQLExecDirect 'DROP TABLE #tempscores'"
Even though it exists, we “can’t” drop it
> sqlQuery(bts.connect, "SELECT OBJECT_ID('tempdb.dbo.#tempscores')")
1 NA
But we have dropped it.
> sqlQuery(bts.connect, "CREATE TABLE #tempscores (dummy int)")
[1] "[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE #tempscores (dummy int)'"
We also can’t create it, but it exists and we can SELECT from it:
sqlQuery(bts.connect, "SELECT OBJECT_ID('tempdb.dbo.#tempscores')")
1 1066251024
sqlQuery(bts.connect, "SELECT * FROM #tempscores")
[1] dummy
<0 rows> (or 0-length row.names)
I'm at wits' end, and I really need this to run successfully tomorrow morning for a client. Anybody have any idea what could be causing this strange behavior?

SQL query error with ODBC connection in R using Informix driver

With functionality from the RODBC package, I have successfully created an ODBC connection but receive error messages when I try to query the database. I am using the INFORMIX 3.31 32-bit driver (version 3.31.00.10287).
channel <- odbcConnect("exampleDSN")
unclass(channel)
[1] 3
attr(,"connection.string")
[1] "DSN=exampleDSN;UID=user;PWD=****;DB=exampleDB;HOST=exampleHOST;SRVR=exampleSRVR;SERV=exampleSERV;PRO=onsoctcp ... (more parameters)"
attr(,"handle_ptr")
<pointer: 0x0264c098>
attr(,"case")
[1] "nochange"
attr(,"id")
[1] 4182
attr(,"believeNRows")
[1] TRUE
attr(,"colQuote")
[1] "\""
attr(,"tabQuote")
[1] "\""
attr(,"interpretDot")
[1] TRUE
attr(,"encoding")
[1] ""
attr(,"rows_at_time")
[1] 100
attr(,"isMySQL")
[1] FALSE
attr(,"call")
odbcDriverConnect(connection = "DSN=exampleDSN")
When I try to query and investigate the structure of the returned object, I receive an error message 'chr [1:2] "42000 -201 [Informix][Informix ODBC Driver][Informix]A syntax error has occurred." ...'
Specifically, I wrote an expression to loop through all tables in the database, retrieve 10 rows, and investigate the structure of the returned object.
for (i in 1:153) {
  res <- sqlFetch(channel, sqlTables(channel, tableType = "TABLE")$TABLE_NAME[i], max = 10)
  str(res)
}
Each iteration returns the same error message. Any ideas where to start?
ADDITIONAL INFO: When I return the object 'res', I receive the following -
> res
[1] "42000 -201 [Informix][Informix ODBC Driver][Informix]A syntax error has occurred."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT * FROM \"exampleTABLE\"'"
The error message you quote is:
"[RODBC] ERROR: Could not SQLExecDirect 'SELECT * FROM \"exampleTABLE\"'"
Informix only recognizes table names enclosed in double quotes if the DELIMIDENT environment variable is set, either in the server's environment or the client's (or both). It doesn't much matter what it is set to; I use DELIMIDENT=1 when I want delimited identifiers.
How did you create the table in the Informix database? Unless you created the table with DELIMIDENT set, the table name will not be case sensitive; you do not need the quotes around the table name.
The fact that you're getting error -201 means you've got through the connection process; that is a good start, and simplifies what follows.
I'm not sure whether you're on a Unix machine or a Windows machine - it often helps to indicate that. On Windows, you might have to set the environment with SETNET32 (an Informix program), or there may be a way to specify the DELIMIDENT in the connect string. On Unix, you probably set it in your environment and the R software picks it up. However, there might be problems if you launch R via some sort of menu button or option in a GUI environment; the chances are that the profile is not executed before the R program is.
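If you cannot easily control the shell environment that launches R, you can also try setting the variable from within R before the connection is opened (a sketch; whether the driver picks it up depends on when it reads the environment):
Sys.setenv(DELIMIDENT = "1")  # must be set before odbcConnect() is called
library(RODBC)
channel <- odbcConnect("exampleDSN")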
You can try using the sqlQuery() function in RODBC to retrieve your results. This is the function I use at work and have never had a problem with it:
sqlQuery(channel, "select first 10 * from exampleTABLE")
You should be able to put all of your queries into a list and iterate through them as you were before:
dat <- lapply(queries, function(x) sqlQuery(channel, x))
where queries is your list of queries and channel is your open ODBC connection. I guess I should also encourage you to close said connection when you're done, with odbcCloseAll().
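For example, to build that list from the table names you were already retrieving (a sketch reusing the sqlTables call from your loop and Informix's FIRST syntax):
tables <- sqlTables(channel, tableType = "TABLE")$TABLE_NAME
queries <- paste("select first 10 * from", tables)
dat <- lapply(queries, function(x) sqlQuery(channel, x))
odbcCloseAll()  # close the connection when done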
