I need the desired result with the least execution time.
I have a table with many rows (over 100k); it includes a field notes varchar2(1800).
It contains values like the following:
notes
CASE Transfer
Surnames AAA : BBBB
Case Status ACCOUNT TXFERRED TO BORROWERS
Completed Date 25/09/2022
Task Group 16
Message sent at 12/10/2012 11:11:21
Sender : lynxfailures123#google.com
Recipient : LFRB568767#yahoo.com
Received : 21:31 12/12/2002
Rows containing the value ACCOUNT TXFERRED TO BORROWERS should be returned.
I have used the following queries, but they take a long time (72150436 sec) to execute:
Select * from cps_case_history
where (dbms_lob.instr(notes, 'ACCOUNT TFR TO UFSS') > 1)

Select * from cps_case_history
where notes like '%ACCOUNT TFR TO UFSS%'
Could you please share a query that will take less time to execute?
You can try parallel hints (optimizer hints):
Select /*+ PARALLEL(a,8) */ a.* from cps_case_history a
where INSTR(NOTES,'Text you want to search') > 0; -- your condition
Replace 8 with 16 and see if the performance improves further.
Avoid % at the beginning of the LIKE pattern, i.e. where notes like '%Account...'; a leading wildcard prevents the use of a normal index (see the sketch below).
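For contrast, a pattern with no leading wildcard can use a plain B-tree index, assuming one exists on the column. A minimal sketch (the index is hypothetical, and this only helps when the search text is known to appear at the very start of notes):

-- Hypothetical one-time setup: a normal B-tree index on the column
create index cps_notes_ix on cps_case_history (notes);

-- Anchored pattern: no leading %, so the index can be range-scanned
Select * from cps_case_history
where notes like 'ACCOUNT TFR TO UFSS%';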
Updated answer: Try partitioning the table. You can go with range partitioning on the completed_date column, as sketched below.
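A minimal sketch of what that could look like, assuming completed_date exists as its own DATE column (all names and boundaries below are illustrative):

-- Range-partitioned variant of the table; queries filtering on
-- completed_date can then prune to a single partition.
CREATE TABLE cps_case_history (
    notes          VARCHAR2(1800),
    completed_date DATE
    -- ... remaining columns ...
)
PARTITION BY RANGE (completed_date) (
    PARTITION p_2021 VALUES LESS THAN (TO_DATE('2022-01-01', 'YYYY-MM-DD')),
    PARTITION p_2022 VALUES LESS THAN (TO_DATE('2023-01-01', 'YYYY-MM-DD')),
    PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);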
I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist):
SELECT tr_sim_sn FROM PUB.tr_hist
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_lot = '99524136'
AND tr_part = '6684112-001')
The actual table has ~190 million records. The excerpt below contains only a few sample records, and only fields relevant to the search to illustrate the query above:
tr_sim_sn |tr_host_sn* |tr_host_pn |tr_domain |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
... |
356136072015140|99524135 |6684112-000 |vattal_us |178415271 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178424418 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178628048 |1.0000000000
356136072015050|99524136 |6684112-001 |vattal_us |178628051 |-1.0000000000
356136072015836|99524137 |6684112-005 |vattal_us |178645337 |-1.0000000000
...
* = key field
The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.
This query works, but it is very slow, ~8 minutes.
I would appreciate suggestions to improve or refactor this query to improve its speed.
Check with your admins to determine when they last updated the SQL statistics. If the answer is "we don't know" or "never" then you might want to ask them to run the following 4gl program which will create a SQL script to accomplish that:
/* genUpdateSQL.p
*
* mpro dbName -p util/genUpdateSQL.p -param "tmp/updSQLstats.sql"
*
 * sqlexp -user userName -password passWord -db dbName -S servicePort -infile tmp/updSQLstats.sql -outfile tmp/updSQLstats.log
*
*/
output to value( ( if session:parameter <> "" then session:parameter else "updSQLstats.sql" )).
for each _file no-lock where _hidden = no:
put unformatted
"UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."
'"' _file._file-name '"' ";"
skip
.
put unformatted "commit work;" skip.
end.
output close.
return.
This will generate a script that updates statistics for all tables and all indexes. You could edit the output to only update the tables and indexes that are part of this query if you want.
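For this particular query the edited script would boil down to something like the following; the statement format comes straight from the PUT above, and PUB.tr_hist is the only table involved:

UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."tr_hist";
commit work;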
Also, if the admins are nervous they could, of course, try this on a test db or a restored backup before implementing in a production environment.
I am posting this as a response to my request for an improved query.
As it turns out, the following query includes two distinct changes that greatly improved its speed. One is to include the tr_domain search criterion in both the main and nested portions of the query. The second is to narrow the search by increasing the number of search criteria, which in the following are all included in the nested section:
SELECT tr_sim_sn
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_trnsactn_nbr IN (
SELECT MAX(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_part = '6684112-001'
AND tr_lot = '99524136'
AND tr_type = 'ISS-WO'
AND tr_qty_loc < 0)
This syntax results in ~0.5s response time. (credit to my colleague, Daniel V.)
To be fair, this query uses criteria outside the parameters stated in the original post, making it difficult or impossible for others to attempt a reasonable answer. This omission was not on purpose, of course; it was due to my being fairly new to the fundamentals of good query design. This query is partly the result of learning that when too few, or non-indexed, fields are used as search criteria in a large table, it is sometimes helpful to narrow the search by increasing the number of search criteria. The original had 3; this one has 5.
I'm making a flight tracking map that will need to pull live data from a SQLite db. I'm currently just using the sqlite executable to navigate the db and understand how to interact with it. Each aircraft is identified by a unique hex_ident. I want to get a list of all aircraft that have sent out a signal in the last minute, as a way of identifying which aircraft are actually active right now. I tried:
select distinct hex_ident, parsed_time
from squitters
where parsed_time >= Datetime('now','-1 minute')
I expected a list of only 4 or 5 hex_idents, but I'm getting a list of every entry (today's entries only), and some are outside the 1-minute bound. I'm new to SQL, so I don't really know how to do this yet. Here's what each entry looks like; the table is called squitters.
{
"message_type":"MSG",
"transmission_type":8,
"session_id":"111",
"aircraft_id":"11111",
"hex_ident":"A1B4FE",
"flight_id":"111111",
"generated_date":"2021/02/12",
"generated_time":"14:50:42.403",
"logged_date":"2021/02/12",
"logged_time":"14:50:42.385",
"callsign":"",
"altitude":"",
"ground_speed":"",
"track":"",
"lat":"",
"lon":"",
"vertical_rate":"",
"squawk":"",
"alert":"",
"emergency":"",
"spi":"",
"is_on_ground":0,
"parsed_time":"2021-02-12T19:50:42.413746"
}
Any ideas?
You must remove the 'T' from the value of parsed_time, or also apply datetime() to it, for the comparison to work:
where datetime(parsed_time) >= datetime('now', '-1 minute')
Note that the datetime() function does not take microseconds into account, so if you need 100% accuracy you must append them with concatenation:
where replace(parsed_time, 'T', ' ') >=
datetime('now', '-1 minute') || substr(parsed_time, instr(parsed_time, '.'))
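Putting that together with the original query, the full statement becomes this sketch:

select distinct hex_ident
from squitters
where datetime(parsed_time) >= datetime('now', '-1 minute');

Dropping parsed_time from the select list also matters: with it, DISTINCT applies to the (hex_ident, parsed_time) pair, so nearly every row is distinct; without it, you get one row per active aircraft, which matches the 4 or 5 hex_idents you expected.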
I wrote a query which contains multiple FOR EACH statements. The query takes more than 20 minutes to fetch the data. Is there a way to check what time each loop started and ended (how much time each loop takes to execute, and the total time taken to complete the program)?
You could do as you ask (just follow JensD's suggestions), but you would likely be better served by using the profiler. You can easily add profiling around a code snippet:
assign
profiler:enabled = yes
profiler:description = "description of this test"
profiler:profiling = yes
profiler:file-name = "filename.prf"
.
/* this is deliberately awful code that should take a long time to run */
for each orderline no-lock:
for each order no-lock:
for each customer no-lock:
if customer.custNum = order.custNum and orderLine.orderNum = order.orderNum then
. /* do something */
end.
end.
end.
/* end of test snippet */
assign
profiler:enabled = no
profiler:profiling = no
.
profiler:write-data().
You can then load that .prf file into an analysis tool. The specifics depend on your development environment: if you are using an up-to-date version of PDSOE there is a Profiler analyzer included; if not, you might want to download ProTop (https://demo.wss.com/download.php) and use the simple report included in lib/zprof_topx.p.
Ultimately what you are going to discover is that one or more of your FOR EACH statements is almost certainly using a WHERE clause that is a poor match for your available indexes.
To fix that you will need to determine which indexes are actually being selected and review the index selection rules. Some excellent material on that topic can be found here: http://pugchallenge.org/downloads2019/303_FindingData.pdf
If you don't want to go to the trouble of reading that then you should at least take a look at the actual index selection as shown by:
compile program.p xref program.xref
Do the selected indexes match your expectation? Did WHOLE-INDEX (aka "table scan") show up?
Using ETIME you can start a counter of milliseconds. It can be read once or several times to tell how much time has passed since the reset.
ETIME(TRUE).
/*
Loop is here but instead I'll insert a small pause.
*/
PAUSE 0.5.
MESSAGE "This took" ETIME "milliseconds" VIEW-AS ALERT-BOX.
Milliseconds might not be useful when dealing with several minutes. In that case you can use TIME to keep track of seconds, but then you need to handle the start time yourself.
DEFINE VARIABLE iStart AS INTEGER NO-UNDO.
iStart = TIME.
/*
Loop is here but instead I'll insert a slightly longer pause.
*/
PAUSE 2.
MESSAGE "This took" TIME - iStart "seconds" VIEW-AS ALERT-BOX.
If you want to record several timings, it might be better to output to a log file instead of using a MESSAGE box, which stops execution until it's clicked.
DEFINE VARIABLE i AS INTEGER NO-UNDO.
DEFINE STREAM str.
OUTPUT STREAM str TO c:\temp\timing.txt.
ETIME(TRUE).
/*
Fake loop
*/
DO i = 1 TO 20:
PAUSE 0.1.
PUT STREAM str UNFORMATTED "Timing no " i " " ETIME "ms" SKIP.
END.
OUTPUT CLOSE.
(Be kind, this is my first question and I did extensive research here and on the net beforehand. The question Oracle ROWID for Sqoop Split-By Column did not really solve this issue, as the original person asking resorted to using another column.)
I am using sqoop to copy data from an Oracle 11 DB.
Unfortunately, some tables have no index and no primary key, only partitions (date). These tables are very large, hundreds of millions if not billions of rows.
So far, I have decided to access data in the source by explicitly addressing the partitions. That works well and speeds up the process nicely.
I need to do the splits by data that resides in each and every table, in order to avoid too many if-branches in my bash script (we're talking some 200+ tables here).
I notice that a split across 8 tasks results in a very uneven spread of workload among the tasks. I considered using Oracle ROWID to define the split.
To do this, I must define a boundary query. In a standard query 'select * from xyz' the ROWID is not part of the result set; therefore, it is not an option to let Sqoop derive the boundary query from --query.
Now, when I run this, I get the error:
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: Sqoop does not have the splitter for the given SQL
data type. Please use either different split column (argument --split-by)
or lower the number of mappers to 1. Unknown SQL data type: -8
Samples of ROWID:
AAJXFWAKPAAOqqKAAA
AAJXFWAKPAAOqqKAA+
AAJXFWAKPAAOqqKAA/
It is static and unique once it is created for any row.
I cast this funny datatype into something else in my boundary query:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
    --split-by ROWID \
    --boundary-query "select cast(min(ROWID) as varchar(18)), cast(max(ROWID) as varchar(18)) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
    --query "select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD') and \$CONDITIONS" \
    --null-string '\\N' \
    --null-non-string '\\N'
But then I get ugly ROWIDs that are rejected by Oracle:
select * from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')
and ( ROWID >= 'AAJX6oAG聕聁AE聉N:' ) AND ( ROWID < 'AAJX6oAH⁖⁁AD䁔䀷' ) ,
Error Msg = ORA-01410: invalid ROWID
How can I resolve this properly?
I am a Linux embryo and have painfully chewed my way through the topics of bash shell scripting and Sqooping so far, but I would like to make better use of an evenly spread mapper-task workload. It would cut sqoop time in half, I guess, saving some 5 to 8 hours.
TIA!
wahlium
You can try ROWNUM, but I think sqoop import does not work with pseudocolumns.
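If you want to experiment anyway, one heavily hedged, untested sketch is to expose ROWNUM as an ordinary numeric column inside --query, so Sqoop only ever sees a plain number. The alias RN is made up here, and the boundary query exploits the fact that min and max of ROWNUM over the filtered set are simply 1 and the row count:

sqoop import --connect jdbc:oracle:thin:@127.0.0.1:port:mydb --username $USER --P --m 8 \
    --split-by RN \
    --boundary-query "select 1, count(*) from table where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')" \
    --query "select * from (select t.*, ROWNUM as RN from table t where laufbzdt > TO_DATE('2019-02-27', 'YYYY-MM-DD')) where \$CONDITIONS" \
    --null-string '\\N' \
    --null-non-string '\\N'

Beware that each mapper still has to scan the filtered set to number the rows, and ROWNUM ordering is not guaranteed to be stable across executions, so rows could be duplicated or missed. Treat this as an experiment, not a fix.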
I am using RODBC with R to connect to Teradata.
I am trying to copy a large table EXAMPLE (25GB) from the READ_ONLY database to the WORK database. The two databases are under the same DB system, so I only need one connection.
I have tried the sqlQuery, sqlCopy and sqlCopyTable functions but do not succeed.
sqlQuery
EDIT: syntax error corrected as suggested by @dnoeth.
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH DATA;
OR
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH NO DATA;
INSERT INTO WORK.EXAMPLE SELECT * FROM READ_ONLY.EXAMPLE;
I let the latter method run for 15h but it did not complete the copy.
sqlCopy
sqlCopy(ch,
query='SELECT * FROM READ_ONLY.EXAMPLE',
destination = 'WORK.EXAMPLE')
Error: cannot allocate vector of size 155.0 Mb
Does sqlCopy try to first copy the data to R's memory before creating the new table? If so, how can I bypass this step and work exclusively on the Teradata server? Also, the error persists even if I use the option fast=F.
In case R's memory was the issue, I tried creating a smaller table of 1000 rows:
sqlCopy(ch,
query='SELECT * FROM READ_ONLY.EXAMPLE SAMPLE 1000',
destination = 'WORK.EXAMPLE')
Error in sqlSave(destchannel, dataset, destination, verbose = verbose, :
[RODBC] Failed exec in Update
22018 0 [Teradata][ODBC Teradata Driver] Data is not a numeric-literal.
In addition: Warning message:
In odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
character data '2017-03-20 12:08:25' truncated to 15 bytes in column 'ExtractionTS'
With this command a table was actually created but it only includes the column names without any rows.
sqlCopyTable
sqlCopyTable(ch,
srctable = 'READ_ONLY.EXAMPLE',
desttable = 'WORK.EXAMPLE')
Error in if (as.character(keys[[4L]]) == colnames[i]) create <- paste(create, :
argument is of length zero
The syntax in your sqlQuery is not correct; the WITH DATA option is missing:
CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH DATA;
Caution: this will lose all NOT NULL & CHECK constraints and all indexes, resulting in the 1st column becoming a Non-Unique Primary Index.
Either add a PI manually (sketched below) or switch to
CREATE TABLE WORK.EXAMPLE AS READ_ONLY.EXAMPLE WITH DATA;
if READ_ONLY.EXAMPLE is a table and you actually want an exact copy.
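For the first option, a minimal sketch of adding the PI explicitly (the column name is illustrative only; pick one that suits your distribution and access pattern):

CREATE TABLE WORK.EXAMPLE AS (SELECT * FROM READ_ONLY.EXAMPLE) WITH DATA
PRIMARY INDEX (example_id);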