dbplyr function equivalent to EXISTS in sql - r

Some time ago I asked this question: Speeding up PostgreSQL queries (Check if entry exists in another table)
But since I'm working with DBI with dbplyr as the backend, I'd like to know what the dbplyr equivalent of PostgreSQL's EXISTS function is.
For now, I'm performing the query using literal SQL syntax:
myQuery <- 'SELECT "genomic_accession",
       "assembly",
       "product_accession",
       "tmpcol",
       ( EXISTS (SELECT 1
                 FROM "cachedb" c
                 WHERE c.product_accession IN ( pt.product_accession, pt.tmpcol )) ) AS CACHE,
       ( EXISTS (SELECT 1
                 FROM "sbpdb" s
                 WHERE s.product_accession IN ( pt.product_accession, pt.tmpcol )) ) AS SBP
FROM (SELECT *
      FROM "pairtable2") pt;'
dbExecute(db, myQuery) -> tmp
Then, I tried to pass literal SQL instructions to mutate:
pairTable %>%
  head(n = 5000) %>%
  mutate(
    CACHE = sql('EXISTS( SELECT 1 FROM "cacheDB" AS c
                 WHERE c.product_accession IN ( product_accession, tmpcol ) )'),
    SBP = sql('EXISTS( SELECT 1 FROM "SBPDB" AS s
               WHERE s.product_accession IN ( product_accession, tmpcol ) )')
  )
But with this approach, for a reason I don't understand, all the rows where the comparison is FALSE are missing from the result.
I expect there is an implementation of this in dbplyr, or perhaps some DBI method for it.
Thanks
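As far as I know, dbplyr has no mutate()-level translation for EXISTS (I believe semi_join() / anti_join() do generate EXISTS, but they filter rows rather than add a TRUE/FALSE column). One workaround, sketched below purely as an illustration, is to emulate each EXISTS(... IN (product_accession, tmpcol)) test with left joins against the lookup table and a NULL check. It reuses the db connection and the cachedb, sbpdb, and pairtable2 tables from the query above; everything else (the *_keys helpers and flag columns) is made up for the example:
library(dplyr)
library(dbplyr)

# Lazy references to the lookup tables; distinct() keeps the joins one-to-one
cache_keys <- tbl(db, "cachedb") %>% distinct(product_accession) %>% mutate(cache_hit = 1L)
sbp_keys   <- tbl(db, "sbpdb")   %>% distinct(product_accession) %>% mutate(sbp_hit = 1L)

tbl(db, "pairtable2") %>%
  # match on product_accession ...
  left_join(cache_keys, by = "product_accession") %>%
  # ... and on tmpcol, renaming the flag so the two matches stay separate
  left_join(rename(cache_keys, cache_hit2 = cache_hit),
            by = c("tmpcol" = "product_accession")) %>%
  left_join(sbp_keys, by = "product_accession") %>%
  left_join(rename(sbp_keys, sbp_hit2 = sbp_hit),
            by = c("tmpcol" = "product_accession")) %>%
  mutate(CACHE = !is.na(cache_hit) | !is.na(cache_hit2),
         SBP   = !is.na(sbp_hit)   | !is.na(sbp_hit2)) %>%
  select(genomic_accession, assembly, product_accession, tmpcol, CACHE, SBP)
Calling show_query() on the result should reveal a chain of LEFT JOINs rather than EXISTS; not the same SQL, but it should yield the same TRUE/FALSE flags as long as the distinct() calls are kept so the joins stay one-to-one.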

Related

I'd like to keep alphabetic characters only (no symbols, numbers, or spaces at all)

It should be done with SQLite
just like this;
Yes, I know it is quite an easy task if I use a UDF (user-defined function),
but I have severe difficulty with that,
so I'm looking for another way (a no-UDF way) to achieve my goal.
Thanks
For your reference, I leave a link to my failed attempt at making the UDF (using AutoHotkey):
SQLite/AutoHotkey, I have problem with Encoding of sqlite3_result_text return function
I believe that you could base the resolution on :-
WITH RECURSIVE eachchar(counter,rowid,c,rest) AS (
SELECT 1,rowid,'',mycolumn AS rest FROM mytable
UNION ALL
SELECT counter+1,rowid,substr(rest,1,1),substr(rest,2) FROM eachchar WHERE length(rest) > 0 LIMIT 100
)
SELECT group_concat(c,'') AS mycolumn, myothercolumn, mycolumn AS original
FROM eachchar JOIN mytable ON eachchar.rowid = mytable.rowid
WHERE length(c) > 0
AND (
unicode(c) BETWEEN unicode('a') AND unicode('z')
OR unicode(c) BETWEEN unicode('A') AND unicode('Z')
)
GROUP BY rowid;
Demo :-
Perhaps consider the following :-
/* Create the Test Environment */
DROP TABLE IF EXISTS mytable;
CREATE TABLE IF NOT EXISTS mytable (mycolumn TEXT, myothercolumn);
/* Add the Testing data */
INSERT INTO mytable VALUES
('123-abc_"D E F()[]{}~`!##$%^&*-+=|\?><<:;''','A')
,('123-xyz_"X Y Z()[]{}~`!##$%^&*-+=|\?><<:;''','B')
,('123-abc_"A B C()[]{}~`!##$%^&*-+=|\?><<:;''','C')
;
/* split into single characters, then concatenate only the required characters */
WITH RECURSIVE eachchar(counter,rowid,c,rest) AS (
SELECT 1,rowid,'',mycolumn AS rest FROM mytable
UNION ALL
SELECT counter+1,rowid,substr(rest,1,1),substr(rest,2) FROM eachchar WHERE length(rest) > 0 LIMIT 100
)
SELECT group_concat(c,'') AS mycolumn, myothercolumn, mycolumn AS original
FROM eachchar JOIN mytable ON eachchar.rowid = mytable.rowid
WHERE length(c) > 0
AND (
unicode(c) BETWEEN unicode('a') AND unicode('z')
OR unicode(c) BETWEEN unicode('A') AND unicode('Z')
)
GROUP BY rowid;
/* Cleanup Test Environment */
DROP TABLE IF EXISTS mytable;
This results in each row's mycolumn reduced to its alphabetic characters only, returned alongside myothercolumn and the original value.

How can I set query_band for Block Level Compression using Aster's load_to_teradata function?

When you are loading a Teradata table in BTEQ you can set the query band for block-level compression. This even works when you are using QueryGrid and inserting from a foreign server.
SET QUERY_BAND = 'BlockCompression=Yes;' UPDATE FOR SESSION;
My issue is that I am creating a table on an Aster system and then using the load_to_teradata function. I suspect there is a syntax option that lets me set the query band as part of the load_to_teradata call, but after searching the internet and through a ream of Teradata documentation I haven't found anything yet.
-- Load Agg data for the YYYYMM to Teradata
SELECT SUM(loaded_row_count),SUM(error_row_count)
FROM load_to_teradata (
ON ( select
Cust_id
, cast(lst_cnf_actvy_dt_tm as date) as lst_cnf_actvy_dt
, cast(sum(str_cnt ) as INTEGER) as acct_open_brnch_use_cnt
, cast(sum(phone_cnt ) as INTEGER) as acct_open_phn_use_cnt
, cast(sum(mail_cnt ) as INTEGER) as acct_open_mail_use_cnt
, cast(sum(onlnchnl_cnt) as INTEGER) as acct_open_onln_use_cnt
, cast(sum(mblbnk_cnt ) as INTEGER) as acct_open_mbl_dvc_use_cnt
, cast(sum(acctopen_cnt) as INTEGER) as acct_open_trck_chnl_cnt
from <someDB>.<someTBL>
where acctopen_cnt > 0
and lower(lst_cnf_actvy_typ_cd) = 'acctopen'
and cast(lst_cnf_actvy_dt_tm as date) between
cast(substring('${YYYYMM}' from 1 for 4) || '-' || substring('${YYYYMM}' from 5 for 2) || '-01' as date) and
cast((cast(substring('${YYYYMM}' from 1 for 4) || '-' || substring('${YYYYMM}' from 5 for 2) || '-01' as date) + interval '1 month') - interval '1 day' as date)
and (str_cnt > 0 or phone_cnt > 0 or mail_cnt > 0 or onlnchnl_cnt > 0 or mblbnk_cnt > 0)
group by 1,2 )
TDPID('TD_RDBMS_C2T') USERNAME('${c2tUID}') PASSWORD('${c2tPWD}') ${LDAP_IND_AST_C2T}
TARGET_TABLE ( 'C2T.t_yyyymm_agg' ) LOG_TABLE ('C2T.t_yyyymm_aggLOG')
MAX_SESSIONS(120));
I was able to get the syntax for the load_to_teradata options. You can see the QUERY_BAND_SESS_INFO argument after MAX_SESSIONS and before QUERY_TIMEOUT, so presumably adding something like QUERY_BAND_SESS_INFO('BlockCompression=Yes;') to the call above would do it:
load_to_teradata(
ON (source query)
TDPID('tdpid')
TARGET_TABLE('fully-qualified table name')
[ERROR_TABLES('error table'[, 'unique constraint violation table'])]
[LOG_TABLE('table name')]
[USERNAME('username')]
[PASSWORD('password')]
[LOGON_MECHANISM('TD2' | 'LDAP' | 'KRB5')]
[LOGON_DATA('mechanism-specific logon data')]
[ACCOUNT_ID('account-id')]
[TRACE_LEVEL('trace-level')]
[NUM_INSTANCES('instance-count')]
[START_INSTANCE('start-instance')]
[MAX_SESSIONS('max-sessions-number')]
[QUERY_BAND_SESS_INFO('key1=value1;key2=value2;...')]
[QUERY_TIMEOUT('timeout-in-seconds')]
[AUTO_TUNE_INSTANCES('yes'|'no')]
[WORKINGDATABASE('dbname')]
[DIAGNOSTIC_MODE('All'|['GetCOPEntries','CheckConnectivity',
'CheckAuthentication','GetTPTSessions',
'TargetTableOrQuerySchema'])])
);

Translate SQLite query, with subquery, into Peewee statement

I've got a SQL statement that does what I need, but I'm having trouble converting it into the corresponding Peewee statement. Here's the SQL I have now; note that I'm using a subquery, but I don't care whether the final version uses one or not.
select t.name,
count(a.type_id) as total,
(
select count(id)
from assignment a
where a.course_id = 7
and a.due_date < date()
and a.type_id = t.id
group by a.type_id
order by a.type_id
) as completed
from assignment a
inner join type t on t.id = a.type_id
where a.course_id = 7
group by a.type_id
order by a.type_id
Here's the closest I've come to the Peewee statement. Currently I'm aliasing a static number in the query just so that I have a value to work with in my template, so please ignore that part.
Assignment.select(
Type.name,
fn.Lower('1').alias('completed'),
fn.Count(Type.id).alias('total'),
).naive().join(Type).where(
Assignment.course==self,
).group_by(Type.id).order_by(Type.id)
Have you tried just including the subquery as part of the select?
Something like this?
query = (Assignment
         .select(
             Type.name,
             fn.COUNT(Type.id).alias('total'),
             Assignment.select(fn.COUNT(Assignment.id)).where(
                 (Assignment.due_date < fn.DATE()) &
                 (Assignment.course == 7) &
                 (Assignment.type == Type.id)
             ).group_by(Assignment.type).alias('completed'))
         .join(Type)
         .where(Assignment.course == 7)
         .group_by(Type.name))

SQL Paging on more than 10 lac Records

I am using MS SQL 2008 R2. One of my tables has more than 10 lac rows (1 lac is 10^5, or 100,000, so 10 lac is 1,000,000).
I want to bind this to an ASP GridView. I tried custom paging with a page size and index, but the grid is not bound; a timeout error occurs.
I also tried executing the stored procedure directly, but it takes a long time.
How can I optimize this procedure?
My procedure:
ALTER PROCEDURE SP_LOAN_APPROVAL_GET_LIST
    @USERCODE NVARCHAR(50) ,
    @FROMDATE DATETIME = NULL ,
    @TODATE DATETIME = NULL ,
    @PAGESIZE INT ,
    @PAGENO INT ,
    @TOTALROW BIGINT OUTPUT
AS
BEGIN
    SELECT *
    FROM ( SELECT DOC_NO ,
                  DOC_DATE_GRE ,
                  EMP_CODE ,
                  EMP_NAME_ENG AS Name ,
                  LOAN_AMOUNT ,
                  DESC_ENG AS Discription ,
                  REMARKS ,
                  ROW_NUMBER() OVER( ORDER BY ( SELECT 1 ) ) AS [ROWNO]
           FROM VW_PER_LOAN
           WHERE ISNULL( POST_FLAG , 'N' ) = 'N'
             AND ISNULL( CANCEL_FLAG , 'N' ) != 'Y'
             AND DOC_DATE_GRE BETWEEN ISNULL( @FROMDATE , DOC_DATE_GRE )
                                  AND ISNULL( @TODATE , DOC_DATE_GRE )
             AND BRANCH IN ( SELECT *
                             FROM DBO.FN_SSP_GetAllowedBranches(@USERCODE) )
         ) T
    WHERE T.ROWNO BETWEEN ((@PAGENO - 1) * @PAGESIZE) + 1 AND @PAGESIZE * @PAGENO

    SELECT @TOTALROW = COUNT(*)
    FROM VW_PER_LOAN
    WHERE ISNULL( POST_FLAG , 'N' ) = 'N'
      AND ISNULL( CANCEL_FLAG , 'N' ) != 'Y'
      AND DOC_DATE_GRE BETWEEN ISNULL( @FROMDATE , DOC_DATE_GRE ) AND ISNULL( @TODATE , DOC_DATE_GRE )
      AND BRANCH IN ( SELECT *
                      FROM DBO.FN_SSP_GetAllowedBranches(@USERCODE) )
END
Thanks
The first thing to do is to look at your execution plan and discuss it with a DBA if you don't understand it.
The obvious thing that stands out is that your WHERE clause has pretty much every column reference wrapped in some sort of function. That turns them into expressions and makes the SQL optimizer unable to use any covering indices that might exist.
It also looks like you are calling a table-valued function as an uncorrelated subquery. That would worry me with respect to performance. I'd probably move it out of the query and instead run it just once to populate a temporary table.

Rsqlite takes hours to write table to sqlite database

I have a simple R program that reads a table (1,000,000 rows, 10 columns) from a SQLite database into an R data.table, does some operations on the data, and then tries to write the result back into a new table of the same SQLite database. Reading the data takes a few seconds, but writing the table back into the SQLite database takes hours. I don't know exactly how long because it has never finished; the longest I have let it run is 8 hours.
This is the simplified version of the program:
library(DBI)
library(RSQLite)
library(data.table)
driver = dbDriver("SQLite")
con = dbConnect(driver, dbname = "C:/Database/DB.db")
DB <- data.table(dbGetQuery(con, "SELECT * from Table1"))
dbSendQuery(con, "DROP TABLE IF EXISTS Table2")
dbWriteTable(con, "Table2", DB)
dbDisconnect(con)
dbUnloadDriver(driver)
I'm using R version 2.15.2; the package versions are:
data.table_1.8.6 RSQLite_0.11.2 DBI_0.2-5
I have tried this on multiple systems and different Windows versions, and in all cases it takes an incredible amount of time to write this table into the SQLite database. Watching the file size of the SQLite database, it grows at about 50 KB per minute.
My question is: does anybody know what causes this slow write speed?
Tim had the answer but I can't flag it as such because it is in the comments.
As in:
ideas to avoid hitting memory limit when using dbWriteTable to save an R data table inside a SQLite database
I wrote the data to the database in chunks
chunks <- 100
starts.stops <- floor( seq( 1 , nrow( DB ) , length.out = chunks ) )
system.time({
for ( i in 2:( length( starts.stops ) ) ){
if ( i == 2 ){
rows.to.add <- ( starts.stops[ i - 1 ] ):( starts.stops[ i ] )
} else {
rows.to.add <- ( starts.stops[ i - 1 ] + 1 ):( starts.stops[ i ] )
}
dbWriteTable( con , 'Table2' , DB[ rows.to.add , ] , append = TRUE )
}
})
It takes:
user system elapsed
4.49 9.90 214.26
time to finish writing the data to the database. Apparently I was hitting the memory limit without knowing it.
Use a single transaction (commit) for all the records. Add a
dbSendQuery(con, "BEGIN")
before the insert and a
dbSendQuery(con, "END")
to complete. Much faster.
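Putting the two answers together, a rough sketch (reusing the con, DB, and starts.stops objects from above; this is just the chunked loop wrapped in a single transaction, not a separately tested recipe):
# One commit for all chunks instead of one implicit commit per statement;
# with SQLite each autocommitted statement forces its own sync to disk.
dbSendQuery(con, "BEGIN")
for (i in 2:length(starts.stops)) {
  if (i == 2) {
    rows.to.add <- starts.stops[i - 1]:starts.stops[i]
  } else {
    rows.to.add <- (starts.stops[i - 1] + 1):starts.stops[i]
  }
  dbWriteTable(con, "Table2", DB[rows.to.add, ], append = TRUE)
}
dbSendQuery(con, "END")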
