Avoid Full Table Scan - Extract First Row Only - oracle11g

I am trying to write a query that only extracts the first (random) row when a condition is met.
-- Create table
create table TRANSACTIONS_SAMPLE
(
institution_id NUMBER(5) not null,
id NUMBER(10) not null,
partitionkey NUMBER(10) default 0 not null,
cardid NUMBER(10),
accountid NUMBER(10),
batchid NUMBER(10) not null,
amt_bill NUMBER(16,3),
load_date DATE not null,
trxn_date DATE not null,
single_msg_flag NUMBER(5),
authaccounttype VARCHAR2(2 BYTE),
originator VARCHAR2(50),
amount NUMBER(16,3) default 0.000 not null,
embeddedfee NUMBER(16,3) default 0.000 not null,
valuedate DATE,
startofinterest DATE,
minduevaluedate DATE,
postdate DATE,
posttimestamp DATE,
Status CHAR(4 BYTE) default 'NEW' not null
)
partition by list (PARTITIONKEY)
(
partition "0002913151" values (1234567)
tablespace LIVE
pctfree 10
initrans 16
maxtrans 255
storage
(
initial 8M
next 1M
minextents 1
maxextents unlimited
)
);
-- Create/Recreate indexes
create index TRANSACTIONS_SAMPLEI01 on TRANSACTIONS_SAMPLE (ACCOUNTID)
local;
create index TRANSACTIONS_SAMPLEI02 on TRANSACTIONS_SAMPLE (LOAD_DATE)
local;
create index TRANSACTIONS_SAMPLEI03 on TRANSACTIONS_SAMPLE (BATCHID)
local;
create index TRANSACTIONS_SAMPLEI04 on TRANSACTIONS_SAMPLE (POSTDATE)
local;
create index TRANSACTIONS_SAMPLEI05 on TRANSACTIONS_SAMPLE (POSTTIMESTAMP)
local;
create index TRANSACTIONS_SAMPLEI06 on TRANSACTIONS_SAMPLE (STATUS, PARTITIONKEY)
local;
create index TRANSACTIONS_SAMPLEI07 on TRANSACTIONS_SAMPLE (CARDID, TRXN_DATE)
local;
create unique index TRANSACTIONS_SAMPLEUI01 on TRANSACTIONS_SAMPLE (ID, PARTITIONKEY)
local;
-- Create/Recreate primary, unique and foreign key constraints
alter table TRANSACTIONS_SAMPLE
add constraint TRANSACTIONS_SAMPLEPK primary key (ID, PARTITIONKEY);
--QUERY
Select * From (
  Select t.AccountId
  From Transactions_sample t
  Group by t.AccountId
  Having Count(t.AccountId) > 10
  Order by dbms_random.random)
Where Rownum = 1
The problem with this query is the full table scan. I want to achieve the same result without having to access the whole table. Any ideas?
Thanks

You can get it down to a full index scan, using TRANSACTIONS_SAMPLEI01, if you add a filter for where AccountId is not null. But only if you don't want to count null values, of course.
The column is nullable, but the index doesn't contain null values. To include the count of nulls it has to do a full table scan because it cannot get that count from the index. If you have that filter the optimizer knows all the account ID values must be in the index, so it only has to refer to that, not the table itself.
explain plan for
Select * From (
  Select t.AccountId
  From Transactions_sample t
  Where AccountId Is Not Null
  Group by t.AccountId
  Having Count(t.AccountId) > 10
  Order by dbms_random.random)
Where Rownum = 1;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------
Plan hash value: 381125580
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 0 (0)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | VIEW | | 1 | 13 | 0 (0)| 00:00:01 | | |
|* 3 | SORT ORDER BY STOPKEY | | 1 | 13 | 0 (0)| 00:00:01 | | |
|* 4 | FILTER | | | | | | | |
| 5 | SORT GROUP BY NOSORT | | 1 | 13 | 0 (0)| 00:00:01 | | |
| 6 | PARTITION LIST SINGLE| | 1 | 13 | 0 (0)| 00:00:01 | 1 | 1 |
|* 7 | INDEX FULL SCAN | TRANSACTIONS_SAMPLEI01 | 1 | 13 | 0 (0)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM=1)
3 - filter(ROWNUM=1)
4 - filter(COUNT("T"."ACCOUNTID")>10)
7 - filter("ACCOUNTID" IS NOT NULL)
Note
-----
- dynamic sampling used for this statement (level=2)
Alternatively, if the column can be made not-nullable then the filter wouldn't be required.
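For example, a minimal sketch of making the column not-nullable (only safe if no existing rows actually contain a NULL ACCOUNTID):
ALTER TABLE TRANSACTIONS_SAMPLE MODIFY (ACCOUNTID NOT NULL);
Once the column is declared NOT NULL the optimizer knows every row appears in TRANSACTIONS_SAMPLEI01, so the explicit IS NOT NULL predicate is no longer needed.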

Related

SQLite delete all connected rows from 3 joined tables

How do you delete all connected rows from 3 different tables in SQLite?
**Table1**
| ID | Number |
| 1  | 0      |
| 2  | 1      |
| 3  | 0      |
**Table2**
| ID | Tax | Table1ID |
| 1  | 21  | 1        |
| 2  | 15  | 2        |
| 3  | 10  | 3        |
**Table3**
| ID | Price | Table2ID |
| 1  | 56    | 1        |
| 2  | 5     | 2        |
| 3  | 98    | 3        |
I want to delete all related rows from Table1-3 where Table1.Number = 0. How can I do that?
What you need is to define the tables properly so that Table2.Table1ID references Table1.ID and Table3.Table2ID references Table2.ID with the action ON DELETE CASCADE:
PRAGMA foreign_keys = ON;
CREATE TABLE Table1(
`ID` INTEGER PRIMARY KEY,
`Number` INTEGER
);
CREATE TABLE Table2(
`ID` INTEGER PRIMARY KEY,
`Tax` INTEGER,
`Table1ID` INTEGER REFERENCES Table1(ID) ON DELETE CASCADE
);
CREATE TABLE Table3(
`ID` INTEGER PRIMARY KEY,
`Price` INTEGER,
`Table2ID` INTEGER REFERENCES Table2(ID) ON DELETE CASCADE
);
Note that you must turn on the foreign key support because it is off by default.
Now every time you delete a row from Table1, all rows of Table2 that hold a reference to the deleted row of Table1 will be deleted too.
Also all rows of Table3 that hold a reference to the deleted rows of Table2 will be deleted.
So all you need is:
DELETE FROM Table1 WHERE Number = 0;
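As a quick sanity check, here is a sketch using the sample data from the question, with the expected row counts as comments:
INSERT INTO Table1(`ID`, `Number`) VALUES (1, 0), (2, 1), (3, 0);
INSERT INTO Table2(`ID`, `Tax`, `Table1ID`) VALUES (1, 21, 1), (2, 15, 2), (3, 10, 3);
INSERT INTO Table3(`ID`, `Price`, `Table2ID`) VALUES (1, 56, 1), (2, 5, 2), (3, 98, 3);
DELETE FROM Table1 WHERE `Number` = 0;
SELECT COUNT(*) FROM Table1; -- 1 (only ID 2 remains)
SELECT COUNT(*) FROM Table2; -- 1 (only the row referencing Table1.ID = 2)
SELECT COUNT(*) FROM Table3; -- 1 (only the row referencing Table2.ID = 2)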

Optimizing query that looks at a specific time window each day

This is a follow-up to my previous question:
Optimizing query to get entire row where one field is the maximum for a group
I'll change the names from what I used there to make them a little more memorable, but these don't represent my actual use-case (so don't estimate the number of records from them).
I have a table with a schema like this:
OrderTime DATETIME(6),
Customer VARCHAR(50),
DrinkPrice DECIMAL,
Bartender VARCHAR(50),
TimeToPrepareDrink TIME(6),
...
I'd like to extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day. So for instance I'd want results like
Date | Customer | OrderTime | MaxPrice | Bartender | ...
-------+----------+-------------+------------+-----------+-----
1/1/18 | Alice | 1/1/18 3:45 | 13.15 | Jane | ...
1/1/18 | Bob | 1/1/18 5:12 | 9.08 | Jane | ...
1/1/18 | Carol | 1/1/18 4:45 | 20.00 | Tarzan | ...
1/2/18 | Alice | 1/2/18 3:45 | 13.15 | Jane | ...
1/2/18 | Bob | 1/2/18 5:57 | 6.00 | Tarzan | ...
1/2/18 | Carol | 1/2/18 3:13 | 6.00 | Tarzan | ...
...
The table has an index on OrderTime, and contains tens of billions of records. (My customers are heavy drinkers).
Thanks to the previous question I'm able to extract this for a specific day pretty easily. I can do something like:
SELECT * FROM orders b
INNER JOIN (
SELECT Customer, MAX(DrinkPrice) as MaxPrice
FROM orders
WHERE OrderTime >= '2018-01-01 15:00'
AND OrderTime <= '2018-01-01 18:00'
GROUP BY Customer
) AS a
ON a.Customer = b.Customer
AND a.MaxPrice = b.DrinkPrice
WHERE b.OrderTime >= '2018-01-01 15:00'
AND b.OrderTime <= '2018-01-01 18:00';
This query runs in less than a second. The explain plan looks like this:
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| id| select_type | table | type | possible_keys | key | ref | Extra |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| 1 | PRIMARY | b | range | OrderTime | OrderTime | NULL | Using index condition |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | b.Customer,b.Price | |
| 2 | DERIVED | orders | range | OrderTime | OrderTime | NULL | Using index condition; Using temporary; Using filesort |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
I can also get the information about the relevant rows for my query:
SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0;
This query also runs in less than a second. Here's how its explain plan looks:
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 1 | PRIMARY | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
The problem now is retrieving the remaining fields from the table. I tried adapting the trick from before, like so:
SELECT * FROM
orders a
INNER JOIN
(SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0) b
ON a.OrderTime >= TIMESTAMP(b.Date, '15:00:00')
AND a.OrderTime <= TIMESTAMP(b.Date, '18:00:00')
AND a.Customer = b.Customer;
However, for reasons I don't understand, the database chooses to execute this in a way that takes forever. Explain plan:
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| 1 | PRIMARY | a | ALL | OrderTime | NULL | NULL | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | a.Customer | Using where |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 2 | DERIVED | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 4 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
Questions:
What is going on here?
How can I fix it?
To extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day I would use row_number() over() within a case expression evaluating the hour of day, like this:
CREATE TABLE mytable(
Date DATE
,Customer VARCHAR(10)
,OrderTime DATETIME
,MaxPrice NUMERIC(12,2)
,Bartender VARCHAR(11)
);
Note: the OrderTime values below were changed from the question's sample data, so that only some rows fall inside the 15:00-18:00 happy-hour window:
INSERT INTO mytable(Date,Customer,OrderTime,MaxPrice,Bartender)
VALUES
('1/1/18','Alice','1/1/18 13:45',13.15,'Jane')
, ('1/1/18','Bob' ,'1/1/18 15:12', 9.08,'Jane')
, ('1/2/18','Alice','1/2/18 13:45',13.15,'Jane')
, ('1/2/18','Bob' ,'1/2/18 15:57', 6.00,'Tarzan')
, ('1/2/18','Carol','1/2/18 13:13', 6.00,'Tarzan')
;
The suggested query is this:
select
*
from (
select
*
, case when hour(OrderTime) between 15 and 18 then
row_number() over(partition by `Date`, customer
order by MaxPrice DESC)
else null
end rn
from mytable
) d
where rn = 1
;
and the result will give access to all columns you include in the derived table.
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | -:
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
To help display how this works, running the derived table subquery:
select
*
, case when hour(OrderTime) between 15 and 18 then
row_number() over(partition by `Date`, customer order by MaxPrice DESC)
else null
end rn
from mytable
;
produces this interim resultset:
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | ---:
0001-01-18 | Alice | 0001-01-18 13:45:00 | 13.15 | Jane | null
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Alice | 0001-02-18 13:45:00 | 13.15 | Jane | null
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
0001-02-18 | Carol | 0001-02-18 13:13:00 | 6.00 | Tarzan | null
The task seems to be a "groupwise-max" problem. Here's one approach, involving only 2 'queries' (the inner one is called a "derived table").
SELECT x.OrderDate, x.Customer, b.OrderTime,
x.MaxPrice, b.Bartender
FROM
(
SELECT DATE(OrderTime) AS OrderDate,
Customer,
Max(Price) AS MaxPrice
FROM tbl
WHERE TIME(OrderTime) BETWEEN '15:00' AND '18:00'
GROUP BY OrderDate, Customer
) AS x
JOIN tbl AS b
ON DATE(b.OrderTime) = x.OrderDate
AND b.Customer = x.Customer
AND b.Price = x.MaxPrice
WHERE TIME(b.OrderTime) BETWEEN '15:00' AND '18:00'
ORDER BY x.OrderDate, x.Customer
Desirable index:
INDEX(Customer, Price)
(There's no good reason to be using MyISAM.)
Billions of new rows per day
This adds new wrinkles. That's upwards of a terabyte of additional disk space needed each and every day?
Is it possible to summarize the data? The goal here is to add summary info as the new data comes in, and never have to re-scan the billions of old data. This may also let you remove all the secondary indexes on the Fact table.
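For instance, a rough sketch of such a daily happy-hour summary table (the orders_summary and staging_orders names are hypothetical), maintained from each newly loaded batch so the old rows never need re-scanning:
CREATE TABLE orders_summary (
    OrderDate DATE NOT NULL,
    Customer  VARCHAR(50) NOT NULL,
    MaxPrice  DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (OrderDate, Customer)
) ENGINE=InnoDB;
-- Refresh from the newly loaded rows only (staging_orders holds the latest batch):
INSERT INTO orders_summary (OrderDate, Customer, MaxPrice)
SELECT DATE(OrderTime), Customer, MAX(DrinkPrice)
FROM staging_orders
WHERE TIME(OrderTime) BETWEEN '15:00' AND '18:00'
GROUP BY DATE(OrderTime), Customer
ON DUPLICATE KEY UPDATE MaxPrice = GREATEST(MaxPrice, VALUES(MaxPrice));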
Normalization will help shrink the table size, hence speeding up the queries. Bartender and Customer are prime candidates for such -- perhaps a SMALLINT UNSIGNED (2 bytes; 65K values) for the former and MEDIUMINT UNSIGNED (3 bytes, 16M) for the latter. That would probably shrink by 50% the 5 columns you currently show. You may get a 2x speedup on many operations after normalizing.
Normalization is best done by 'staging' the data -- Load the data into a temporary table, normalize within it, summarize it, then copy into the main Fact table.
See http://mysql.rjweb.org/doc.php/summarytables
and http://mysql.rjweb.org/doc.php/staging_table
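And a rough illustration of the normalize-in-staging idea (the Customers dimension table and the staging_orders.customer_id column are assumptions, not part of the original schema):
CREATE TABLE Customers (
    customer_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (customer_id),
    UNIQUE KEY (name)
) ENGINE=InnoDB;
-- While the batch sits in the staging table, register any new names ...
INSERT IGNORE INTO Customers (name)
SELECT DISTINCT Customer FROM staging_orders;
-- ... then swap the string for the 3-byte id before copying into the Fact table:
UPDATE staging_orders s
JOIN Customers c ON c.name = s.Customer
SET s.customer_id = c.customer_id;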
Before getting back to the question of optimizing the one query, we need to see the schema, the data flow, whether things can be normalized, whether summary tables can be effective, etc. I would hope to have the 'answer' for the query to be mostly digested in a summary table. Sometimes this leads to a 10x speedup.

Add columns from file to an existing table in MariaDB 10.1

I want to add a new column from a file to an existing table, in the way cbind does in R.
The file has 1 column, 23710 lines, all numbers:
me#my_server:/var/www/html/my_website$ head my_sample.txt
61
66
0
330
76
9
10
16
6
0
Using the code:
ALTER TABLE my_table ADD COLUMN IF NOT EXISTS sample69 INT(10) DEFAULT NULL;
LOAD DATA LOCAL INFILE '/var/www/html/my_website/my_sample.txt' INTO TABLE my_table LINES TERMINATED BY '\n' (sample69);
Before:
MariaDB [my_database]> select * from my_table limit 10;
+------------+-----------+
| geneSymbol | sample000 |
+------------+-----------+
| A1BG | 61 |
| A1BG-AS1 | 66 |
| A1CF | 0 |
| A2M | 330 |
| A2M-AS1 | 76 |
| A2ML1 | 9 |
| A2MP1 | 10 |
| A4GALT | 16 |
| A4GNT | 6 |
| AA06 | 0 |
+------------+-----------+
MariaDB [my_database]> select count(*) from my_table;
+----------+
| count(*) |
+----------+
| 23710 |
+----------+
After:
MariaDB [my_database]> select * from my_table limit 10;
+------------+-----------+-----------+
| geneSymbol | sample000 | sample69 |
+------------+-----------+-----------+
| A1BG | 61 | NULL |
| A1BG-AS1 | 66 | NULL |
| A1CF | 0 | NULL |
| A2M | 330 | NULL |
| A2M-AS1 | 76 | NULL |
| A2ML1 | 9 | NULL |
| A2MP1 | 10 | NULL |
| A4GALT | 16 | NULL |
| A4GNT | 6 | NULL |
| AA06 | 0 | NULL |
+------------+-----------+-----------+
MariaDB [my_database]> select count(*) from my_table;
+----------+
| count(*) |
+----------+
| 47420 |
+----------+
It apparently appends the data as 23710 new rows at the end of the table. Instead I want the new column to line up with the existing 23710 rows, filled with the data from the file.
What am I doing wrong?
LOAD only loads whole rows.
Even if it could load just one column, how would it know which row each number goes with?
You must reconstruct the data with two columns (geneSymbol and sample69), load that into a temp table, then do a multi-table UPDATE with a JOIN to move the data into the main table.
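A minimal sketch of that approach; the temp-table name, the reconstructed file name, and the tab-separated layout are assumptions:
CREATE TEMPORARY TABLE tmp_sample69 (
    geneSymbol VARCHAR(64) NOT NULL PRIMARY KEY,
    sample69 INT DEFAULT NULL
);
-- The file is assumed to have been rebuilt with two columns per line: geneSymbol<TAB>value
LOAD DATA LOCAL INFILE '/var/www/html/my_website/my_sample69_with_symbols.txt'
INTO TABLE tmp_sample69
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(geneSymbol, sample69);
-- Multi-table UPDATE joining on geneSymbol fills the new column in place:
UPDATE my_table t
JOIN tmp_sample69 s ON s.geneSymbol = t.geneSymbol
SET t.sample69 = s.sample69;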
Addenda
If you have 69 columns of samples, that is the wrong way to design the schema. At some point, you will hit a limit.
Plan A: Lots of rows, not lots of columns:
CREATE TABLE x (
geneSymbol VARCHAR(..) ...,
num SMALLINT UNSIGNED NOT NULL,
value SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(geneSymbol, num)
) ENGINE=InnoDB
Plan B (This will require code to add each new sample):
CREATE TABLE x (
geneSymbol VARCHAR(..) ...,
samples TEXT NOT NULL, -- JSON-encoded list of samples for that gene
PRIMARY KEY(geneSymbol)
) ENGINE=InnoDB
Plan C (aimed at reading one sample):
CREATE TABLE x (
num SMALLINT UNSIGNED NOT NULL,
samples TEXT NOT NULL, -- JSON-encoded list of values for that sample
PRIMARY KEY(num)
) ENGINE=InnoDB
What will your queries be like? I suspect you will be reading all the data, not doing any WHERE clauses based on symbol or num?

Oracle 11g r2: strange behavior on index

I have the query:
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
--TBELENCO.RAGIONE_SOCIALE,
--ROW_NUMBER() OVER (ORDER BY TBELENCO.DATA_PROC DESC) R
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
--,C.RAGIONE_SOCIALE
--ROW_NUMBER() OVER (ORDER BY LOG.DATA_PROC DESC) R
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
-- LEFT JOIN(
-- SELECT CUST.RAGIONE_SOCIALE,STR.POD,RSC.DATA_DA, RSC.DATA_A
-- FROM
-- MS038_METERS STR JOIN MS036_REL_SITES_CUSTOMERS RSC ON
-- STR.ID_SITO=RSC.ID_SITO
-- JOIN MS030_CUSTOMERS CUST ON
-- CUST.ID_CLIENTE=RSC.ID_CLIENTE
-- ) C ON
-- C.POD=LOG.POD
--AND LOG.DATA_PROC BETWEEN C.DATA_DA AND C.DATA_A
WHERE
1=1
--AND LOG.DATA_PROC>=TRUNC(SYSDATE)
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3
--TO_DATE('01/11/2014', 'DD/MM/YYYY')
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
If I execute the query with AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3, Oracle uses the index on the DATA_PROC field of the MS042_LOADING_LOGS (LOG) table. If I use AND LOG.DATA_PROC>=TRUNC(SYSDATE)-4 (or -5, -6, etc.) instead, it does a full table scan. Why this behavior?
I also executed:
ALTER INDEX MS042_DATA_PROC_IDX REBUILD;
but with no changes.
Thanks,
Igor
--***********************************************************
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
WHERE
1=1
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-1
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
Plan hash value: 2191058229
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 30866 (2)| 00:06:11 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 94236 | 1196K| 30866 (2)| 00:06:11 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 94236 | 1104K| 30866 (2)| 00:06:11 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MS042_LOADING_LOGS | 94236 | 920K| 30864 (2)| 00:06:11 |
|* 7 | INDEX RANGE SCAN | MS042_DATA_PROC_IDX | 94236 | | 25742 (2)| 00:05:09 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
7 - access(SYS_OP_DESCEND("DATA_PROC")<=SYS_OP_DESCEND(TRUNC(SYSDATE#!)-1))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("DATA_PROC"))>=TRUNC(SYSDATE#!)-1)
Plan hash value: 69930686
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 95921 (1)| 00:19:12 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 1467K| 18M| 95921 (1)| 00:19:12 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 1467K| 16M| 95921 (1)| 00:19:12 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| MS042_LOADING_LOGS | 1467K| 13M| 95912 (1)| 00:19:11 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
6 - filter("LOG"."DATA_PROC">=TRUNC(SYSDATE#!)-4)
The larger the fraction of rows that will be returned, the more efficient a table scan is and the less efficient it is to use an index. Apparently, Oracle expects that inflection point to come when the query returns more than 3 days of data. If that estimate is wrong, the statistics on your table or indexes are probably stale or inaccurate.
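If so, re-gathering optimizer statistics is the usual first step; a minimal sketch, assuming the table belongs to the current schema:
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => USER,
        tabname          => 'MS042_LOADING_LOGS',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade          => TRUE   -- also refresh the index statistics
    );
END;
/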

Oracle Index Organized Table (IOT) order by primary key suffix better than by primary key prefix

I use an Index Organized Table (IOT) for a table having 550 M rows. The primary key is composed of two columns (id1 and id2), which are also foreign keys to 2 other tables (id1 references table1, id2 references table2).
When using an IOT, according to Oracle doc (http://docs.oracle.com/cd/B28359_01/server.111/b28310/tables012.htm#i1007389), sorting by prefix of the primary key should be faster than sorting by suffix of the primary key. Indeed, rows are sorted according to the primary key and the order of the columns composing it.
However, here are the explain plans I get when joining the IOT with the two other tables and ordering by either id1 or id2. I would have expected a better cost when sorting by id1, but that is not the case.
The value used for id1 corresponds to 58000 rows among 489000 rows in total in table1. The value used for id2 corresponds to 760 rows among 248900 rows in total in table2.
The SQL query :
SELECT a.id1, a.id2, a.some_column
FROM iot_table a
INNER JOIN table1 t1 ON t1.id = a.id1
INNER JOIN table2 t2 ON t2.id = a.id2
WHERE t1.col_x = x AND t2.col_y = y
ORDER BY {a.id1|a.id2};
Explain plan order by id1 :
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 11243 (100)| |
| 1 | NESTED LOOPS | | 1311K| 42M| 11243 (1)| 00:01:44 |
| 2 | MERGE JOIN CARTESIAN | | 46M| 842M| 11173 (1)| 00:01:43 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE1 | 58152 | 511K| 4745 (1)| 00:00:44 |
| 4 | INDEX FULL SCAN | TABLE1_ID1_IDX | 488K| | 15 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 799 | 7990 | 6429 (1)| 00:01:00 |
| 6 | TABLE ACCESS BY INDEX ROWID| TABLE2 | 799 | 7990 | 1 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | TABLE2_COL_Y_IDX | 799 | | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IOT_TABLE_PK | 1 | 15 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Explain plan order by id2 :
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5159 (100)| |
| 1 | NESTED LOOPS | | 1311K| 42M| 5159 (2)| 00:00:48 |
| 2 | MERGE JOIN CARTESIAN | | 46M| 842M| 5089 (1)| 00:00:47 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE2 | 799 | 7990 | 2512 (1)| 00:00:24 |
| 4 | INDEX FULL SCAN | TABLE2_ID2_IDX | 248K| | 28 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 58152 | 511K| 2577 (2)| 00:00:24 |
| 6 | TABLE ACCESS BY INDEX ROWID| TABLE1 | 58152 | 511K| 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | TABLE1_COL_X_IDX | 58152 | | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IOT_TABLE_PK | 1 | 15 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
I am wondering why the cost is worse when sorting by id1, given that the rows should already be sorted primarily on that column and Oracle would just have to walk the IOT B*-tree as it is.
Thank you for your help
I use Oracle 11.2 and the stats are up to date.
